Conducting Cross-Tabulation Analysis

Cross-tabulation is a cornerstone of survey data analysis: it shows how the responses to one question are distributed across the categories of another. The scgUtils package provides tools to compute and visualise these relationships. This guide covers two functions: crosstab(), which tabulates a single pair of variables, and compile(), which processes many pairs at once.

Dynamically Structuring Data with crosstab()

crosstab() turns survey responses into two-way (contingency) tables of one variable against another, optionally weighted and accompanied by summary statistics. The result can be returned as a wide data frame, with one column per category of the column variable, or as a long data frame with one row per cell. Long is the default and is the format used for plotting.
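For a concrete picture of the wide layout, here is a minimal base-R sketch of a weighted two-way table with column percentages plus a Total column. The toy data frame `survey` and its values are hypothetical, and the code illustrates the idea only; it is not how scgUtils implements crosstab().

```r
# Hypothetical toy survey: a row variable, a column variable and a survey weight
survey <- data.frame(
  partyId = c("Conservative", "Labour", "Labour", "Conservative",
              "Green Party", "Labour", "Conservative", "Green Party"),
  gender  = c("Female", "Female", "Male", "Male",
              "Female", "Male", "Male", "Female"),
  wt      = c(1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 0.7, 1.3)
)

# Weighted counts: each cell is the sum of weights for that partyId/gender pair
counts <- xtabs(wt ~ partyId + gender, data = survey)

# Column percentages: each gender column sums to 100
col_pct <- prop.table(counts, margin = 2) * 100

# Total column: weighted distribution of partyId across all respondents
total_pct <- rowSums(counts) / sum(counts) * 100

# Wide layout: one row per partyId, a Total column, then one column per gender
wide <- data.frame(
  partyId = rownames(counts),
  Total   = round(total_pct, 2),
  round(unclass(col_pct), 2),  # unclass() turns the table into a plain matrix
  row.names = NULL
)
wide
```

Column percentages answer "within each gender, how is party identification distributed?", which is the comparison the wide output below makes.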

Wide Format Cross-Tabulation

This example generates a wide-format table of party identification by gender, with optional weighting, rounding and summary statistics.

# Wide format
library(scgUtils)
library(dplyr) # provides the %>% pipe used below

# `df` is your survey data frame
crosstab(df,
         rowVar = "partyId",
         colVar = "gender",
         weight = "wt", # optional
         format = "df_wide", # default is "df_long", which is useful for plotting
         round_decimals = 2, # optional
         statistics = TRUE # optional
) %>%
  head()

#> [1] "partyId x gender: Chisq = 29.054 | DF = 9 | Cramer's V = 0.028 | p-value = 0.001"

partyId	Total	Female	Male
Conservative	29.26	27.41	30.85
Labour	24.14	25.38	23.09
Liberal Democrat	5.84	5.64	6.00
Scottish National Party (SNP)	2.55	2.89	2.26
Plaid Cymru	0.37	0.32	0.41
Green Party	2.52	2.17	2.81
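The summary line above reports four statistics. A minimal base-R sketch of their standard textbook definitions follows, using a small hypothetical table of weighted counts. We have not checked that scgUtils computes them identically (in particular, how it handles weights), and passing weighted counts straight into a chi-square formula treats them as if they were raw counts, which is a simplification.

```r
# Hypothetical weighted counts (rows = partyId, columns = gender)
observed <- matrix(c(30, 25, 8,    # Female
                     34, 20, 6),   # Male
                   nrow = 3,
                   dimnames = list(partyId = c("Conservative", "Labour", "Green Party"),
                                   gender  = c("Female", "Male")))
n <- sum(observed)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson chi-square statistic and its degrees of freedom
chisq <- sum((observed - expected)^2 / expected)
dof   <- (nrow(observed) - 1L) * (ncol(observed) - 1L)

# Cramér's V rescales chi-square to the range [0, 1]
cramers_v <- sqrt(chisq / (n * (min(dim(observed)) - 1)))

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(chisq, df = dof, lower.tail = FALSE)

cat(sprintf("partyId x gender: Chisq = %.3f | DF = %d | Cramer's V = %.3f | p-value = %.3f\n",
            chisq, dof, cramers_v, p_value))

# Cross-check the statistic against base R's chisq.test()
stopifnot(isTRUE(all.equal(unname(chisq.test(observed, correct = FALSE)$statistic), chisq)))
```

Because Cramér's V divides chi-square by the sample size, it does not grow simply because more people were surveyed, whereas the p-value does shrink as the sample grows.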

Visual Insights from Crosstabs

Set plot = TRUE to produce a chart of the crosstab as well as the data. A chart makes distribution patterns easier to read at a glance, while the statistics line still summarises the strength of the association.

crosstab(df,
         rowVar = "p_eurefvote",
         colVar = "p_edlevel",
         weight = "wt",
         plot = TRUE,
         statistics = TRUE,
         round_decimals = 2
) %>%
  head()

#> [1] "p_eurefvote x p_edlevel: Chisq = 371.026 | DF = 10 | Cramer's V = 0.243 | p-value = 0"

p_eurefvote	p_edlevel	Freq	Perc
I voted to remain	No qualifications	56.18	23.49
I voted to leave	No qualifications	182.98	76.51
Don’t know	No qualifications	0.00	0.00
I voted to remain	Below GCSE	43.66	27.47
I voted to leave	Below GCSE	115.30	72.53
Don’t know	Below GCSE	0.00	0.00
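In the long output, each row is one cell of the table. Freq holds the weighted count and, judging by the values shown, Perc is that count as a percentage of its p_edlevel group (56.18 and 182.98 give 23.49% and 76.51% of the No qualifications total). A base-R sketch of that reshaping, reusing the counts printed above and written as an illustration rather than scgUtils' own code:

```r
# Weighted counts copied from the output above: EU referendum vote by education
counts <- as.table(matrix(
  c(56.18, 182.98, 0,    # No qualifications
    43.66, 115.30, 0),   # Below GCSE
  nrow = 3,
  dimnames = list(p_eurefvote = c("I voted to remain", "I voted to leave", "Don't know"),
                  p_edlevel   = c("No qualifications", "Below GCSE"))
))

# One row per cell; as.data.frame() on a table names the count column "Freq"
long <- as.data.frame(counts, stringsAsFactors = FALSE)

# Percentage of each count within its p_edlevel group (0 for empty groups)
group_total <- ave(long$Freq, long$p_edlevel, FUN = sum)
long$Perc <- round(ifelse(group_total > 0, long$Freq / group_total * 100, 0), 2)
long
```

One row per cell with a grouping column is the shape plotting libraries expect for stacked or dodged bar charts, which is why long is the default.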

Enhancing Plot Readability

When a variable has many categories or long labels, set adjustX = TRUE to adjust the x-axis labels so they remain legible.

crosstab(df,
         rowVar = "polAttention",
         colVar = "gender",
         weight = "wt",
         plot = TRUE,
         statistics = TRUE,
         adjustX = TRUE,
         round_decimals = 2
) %>%
  head()

#> [1] "polAttention x gender: Chisq = 163.768 | DF = 11 | Cramer's V = 0.061 | p-value = 0"

polAttention	gender	Freq	Perc
Pay no attention	Male	65.46	3.04
1	Male	39.89	1.85
2	Male	47.53	2.21
3	Male	54.82	2.55
4	Male	56.93	2.64
5	Male	161.07	7.48
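The guide does not spell out what adjustX changes, so the sketch below only illustrates the underlying problem and one common base-R remedy, not what crosstab() does internally: with many categories, horizontal x-axis labels collide, and drawing them perpendicular to the axis (las = 2) with a wider bottom margin keeps them legible. The percentages are hypothetical.

```r
# Hypothetical percentages for an 11-category attention scale, by gender
pct <- rbind(
  Female = c(4, 2, 3, 3, 4, 9, 8, 12, 16, 14, 25),
  Male   = c(3, 2, 2, 3, 3, 7, 8, 13, 18, 15, 26)
)
colnames(pct) <- c("Pay no attention", as.character(1:10))

# Widen the bottom margin so the long labels are not clipped
old_par <- par(mar = c(10, 4, 2, 1))

# las = 2 draws axis labels perpendicular to the axis;
# beside = TRUE places the Female and Male bars side by side per category
mids <- barplot(pct, beside = TRUE, las = 2, legend.text = rownames(pct),
                ylab = "Percentage within gender")

par(old_par)  # restore the previous graphics settings
```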

Streamlining Analysis with compile()

When you need crosstabs for many variables, compile() processes every combination of row and column variables and collects the crosstabs, or their statistical summaries, into a single data frame.

Statistical Compilation

With format = "statistics", compile() summarises each variable pair in a single row, reporting the sample size, chi-square statistic, degrees of freedom, Cramér's V and p-value. Sorting by Cramér's V then brings the strongest associations to the top.

# The row variables are typically the survey questions; dplyr::select() makes it easy to pick ranges of columns
rowVars <- names(df %>% dplyr::select(turnoutUKGeneral:partyIdStrength,
                                      partyIdSqueeze:likeGrn,
                                      pcon:p_hh_size,
                                      p_disability:p_past_vote_2019,
                                      p_eurefturnout))

# the column variables tend to be the demographic variables
colVars <- c("gender", "ageGroup", "p_socgrade", "partyId", "p_eurefvote", "p_edlevel")

# compile stats and save to data frame called `stats`
stats <- compile(df,
                 rowVars = rowVars,
                 colVars = colVars,
                 weight = "wt", # optional
                 save = FALSE, # FALSE prevents saving the output as a .csv file
                 format = "statistics")

# View the top 10 pairs, sorted by Cramer's V (highest first)
head(stats[order(-stats$CramersV),], 10)

	Row_Var	Col_Var	Size	Chisq	DF	CramersV
10	generalElectionVote	partyId	3953.314	10049.807	81	0.531
316	p_past_vote_2017	partyId	3545.672	5251.916	72	0.430
52	bestOnMII	partyId	3719.803	5886.195	81	0.419
310	p_past_vote_2015	partyId	3567.690	5054.699	81	0.397
258	p_education_age	p_edlevel	3465.861	3157.122	30	0.390
322	p_past_vote_2019	partyId	3551.306	4736.332	90	0.365
248	p_job_sector	ageGroup	3991.109	1706.840	20	0.327
200	p_work_stat	ageGroup	3991.109	2857.155	35	0.320
298	p_past_vote_2010	partyId	3511.013	3182.663	81	0.317
304	p_past_vote_2005	partyId	3222.173	2868.127	81	0.314
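Conceptually, format = "statistics" reduces each row/column pair to one row of summary statistics. A minimal base-R sketch of that idea loops over every pair, applies the textbook formulas and sorts by Cramér's V. The toy data, the helper pair_stats() and the p_value column name are hypothetical; this is not compile()'s implementation.

```r
set.seed(1)
n_resp <- 200

# Hypothetical toy survey: two questions, two demographics and a weight
survey <- data.frame(
  q1     = sample(c("Yes", "No"), n_resp, replace = TRUE),
  q2     = sample(c("Agree", "Neutral", "Disagree"), n_resp, replace = TRUE),
  gender = sample(c("Female", "Male"), n_resp, replace = TRUE),
  age    = sample(c("18-34", "35-54", "55+"), n_resp, replace = TRUE),
  wt     = runif(n_resp, 0.5, 1.5)
)

# Hypothetical helper: textbook statistics for one weighted row/column pair
pair_stats <- function(data, row_var, col_var, weight) {
  # Weighted counts: sum of weights in each row/column cell
  obs <- tapply(data[[weight]], list(data[[row_var]], data[[col_var]]), sum)
  obs[is.na(obs)] <- 0  # combinations with no respondents have zero weight
  n     <- sum(obs)
  expct <- outer(rowSums(obs), colSums(obs)) / n
  chisq <- sum((obs - expct)^2 / expct)
  dof   <- (nrow(obs) - 1L) * (ncol(obs) - 1L)
  data.frame(Row_Var = row_var, Col_Var = col_var, Size = n,
             Chisq = chisq, DF = dof,
             CramersV = sqrt(chisq / (n * (min(dim(obs)) - 1))),
             p_value = pchisq(chisq, df = dof, lower.tail = FALSE))
}

row_vars <- c("q1", "q2")
col_vars <- c("gender", "age")

# Every row/column combination, one row of statistics each
combos <- expand.grid(row_var = row_vars, col_var = col_vars,
                      stringsAsFactors = FALSE)
stats <- do.call(rbind, Map(function(r, cv) pair_stats(survey, r, cv, "wt"),
                            combos$row_var, combos$col_var))
rownames(stats) <- NULL

# Strongest associations first, as in the compile() example
head(stats[order(-stats$CramersV), ], 10)
```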

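NB: treat chi-square statistics and p-values with caution in two situations. With large samples (more than about 500 respondents), even trivial associations come out as statistically significant, so judge the strength of an association by Cramér's V instead. When expected cell counts are small (below 5), the chi-square approximation becomes unreliable, so use Fisher's exact test instead.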
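Base R makes the small-count check straightforward: compute the expected counts under independence and, if any fall below 5, use fisher.test() instead of chisq.test(). A minimal sketch with a hypothetical 2×2 table of raw counts (Fisher's exact test needs integer counts, so it applies to unweighted tables):

```r
# Hypothetical table of raw (unweighted) counts
small <- matrix(c(3, 1,
                  1, 3),
                nrow = 2, byrow = TRUE,
                dimnames = list(answer = c("Yes", "No"), group = c("A", "B")))

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(small), colSums(small)) / sum(small)

if (any(expected < 5)) {
  result <- fisher.test(small)   # exact test, valid for small counts
} else {
  result <- chisq.test(small)    # large-sample approximation
}
result$method
result$p.value  # 34/70, about 0.486
```

Here every expected count is 2, so the exact test is used.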

Expansive Tables with compile()

When called without format = "statistics", as below, compile() builds the full crosstab for every pair of row and column variables. The results can be saved as a CSV file, ready to include in a report or to analyse further.

rowVars <- names(df %>% dplyr::select(turnoutUKGeneral:partyIdStrength,
                                      partyIdSqueeze:likeGrn,
                                      pcon:p_hh_size,
                                      p_disability:p_past_vote_2019,
                                      p_eurefturnout))

colVars <- c("gender", "ageGroup", "p_socgrade", "partyId", "p_eurefvote", "p_edlevel")

compile(df,
        rowVars = rowVars,
        colVars = colVars,
        weight = "wt", # optional
        name = "crosstabs" # this will save as "crosstabs.csv"
)
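compile() writes the file for you. To show what a stacked crosstab file involves, here is a minimal base-R sketch that builds one long crosstab per variable pair, binds them with columns identifying the pair and writes the result with write.csv(). The toy data, the helper long_crosstab() and the column layout are hypothetical and need not match the contents of crosstabs.csv.

```r
set.seed(2)
n_resp <- 100

# Hypothetical toy survey: one question, two demographics and a weight
survey <- data.frame(
  q1     = sample(c("Yes", "No"), n_resp, replace = TRUE),
  gender = sample(c("Female", "Male"), n_resp, replace = TRUE),
  age    = sample(c("18-34", "35-54", "55+"), n_resp, replace = TRUE),
  wt     = runif(n_resp, 0.5, 1.5)
)

# Hypothetical helper: long crosstab with weighted count and column percentage
long_crosstab <- function(data, row_var, col_var, weight) {
  tab <- xtabs(as.formula(paste(weight, "~", row_var, "+", col_var)), data = data)
  out <- as.data.frame(tab, responseName = "Freq", stringsAsFactors = FALSE)
  names(out)[1:2] <- c("Row_Value", "Col_Value")
  out$Perc <- round(out$Freq / ave(out$Freq, out$Col_Value, FUN = sum) * 100, 2)
  cbind(Row_Var = row_var, Col_Var = col_var, out)  # label which pair each row is from
}

# One crosstab per column variable, stacked into a single data frame
tables   <- lapply(c("gender", "age"),
                   function(col_var) long_crosstab(survey, "q1", col_var, "wt"))
all_tabs <- do.call(rbind, tables)

# Written to a temporary file here; in practice choose a path such as "crosstabs.csv"
csv_path <- tempfile(fileext = ".csv")
write.csv(all_tabs, csv_path, row.names = FALSE)
```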

Because every question is cross-tabulated against every demographic variable in a single file, this output gives a complete overview of the survey results and the relationships within them.
