Cross-tabulation is a cornerstone of survey data analysis,
offering deep dives into the interplay between different variables. The
scgUtils
package equips researchers with robust tools to
execute and visualise these complex relationships. This guide explores
the nuanced functionalities of crosstab()
and
compile()
, designed to streamline your analytical workflow.
Dynamically Structuring Data with crosstab()
crosstab()
transforms survey responses into meaningful
two-by-two tables, enriched with statistical analyses. Tailor the
presentation of your data with flexible output formats, catering to wide
or long data frames for diverse analytical approaches.
####
Wide Format Cross-Tabulation This example demonstrates
how to generate a wide-format table, incorporating optional statistical
measures for enhanced insights.
# Wide format
crosstab(df,
rowVar = "partyId",
colVar = "gender",
weight = "wt", # optional
format = "df_wide", # default = df_long which is useful for plotting
round_decimals = 2, # optional
statistics = TRUE # optional
) %>%
head()
#> [1] "partyId x gender: Chisq = 29.054 | DF = 9 | Cramer's V = 0.028 | p-value = 0.001"
partyId | Total | Female | Male |
---|---|---|---|
Conservative | 29.26 | 27.41 | 30.85 |
Labour | 24.14 | 25.38 | 23.09 |
Liberal Democrat | 5.84 | 5.64 | 6.00 |
Scottish National Party (SNP) | 2.55 | 2.89 | 2.26 |
Plaid Cymru | 0.37 | 0.32 | 0.41 |
Green Party | 2.52 | 2.17 | 2.81 |
Visual Insights from Crosstabs
Leverage crosstab()
with plot = TRUE
to
convert tabular data into visual representations. This fusion of data
and design aids in the intuitive grasp of distribution patterns,
supported by statistical depth.
crosstab(df,
rowVar = "p_eurefvote",
colVar = "p_edlevel",
weight = "wt",
plot = TRUE,
statistics = TRUE,
round_decimals = 2
) %>%
head()
#> [1] "p_eurefvote x p_edlevel: Chisq = 371.026 | DF = 10 | Cramer's V = 0.243 | p-value = 0"
p_eurefvote | p_edlevel | Freq | Perc |
---|---|---|---|
I voted to remain | No qualifications | 56.18 | 23.49 |
I voted to leave | No qualifications | 182.98 | 76.51 |
Don’t know | No qualifications | 0.00 | 0.00 |
I voted to remain | Below GCSE | 43.66 | 27.47 |
I voted to leave | Below GCSE | 115.30 | 72.53 |
Don’t know | Below GCSE | 0.00 | 0.00 |
Enhancing Plot Readability
Adjust X-axis labels with adjustX = TRUE
for clearer
interpretation of densely populated variables, ensuring data
accessibility.
crosstab(df,
rowVar = "polAttention",
colVar = "gender",
weight = "wt",
plot = TRUE,
statistics = TRUE,
adjustX = TRUE,
round_decimals = 2
) %>%
head()
#> [1] "polAttention x gender: Chisq = 163.768 | DF = 11 | Cramer's V = 0.061 | p-value = 0"
polAttention | gender | Freq | Perc |
---|---|---|---|
Pay no attention | Male | 65.46 | 3.04 |
1 | Male | 39.89 | 1.85 |
2 | Male | 47.53 | 2.21 |
3 | Male | 54.82 | 2.55 |
4 | Male | 56.93 | 2.64 |
5 | Male | 161.07 | 7.48 |
Streamlining Analysis with compile()
For extensive variable sets, compile()
emerges as a
powerful ally. It aggregates crosstabs and statistical summaries into a
comprehensive data frame, simplifying the exploration of intricate data
relationships.
#### Statistical Compilation
Demonstrate the compile()
function’s capability to organise
a broad spectrum of statistics, including Chi-square, Degrees of
Freedom, Cramer’s V, and p-value, offering a scaffold for informed
decision-making.
# the row variables are typically your questions within the survey. For ease, utilise dplyr to select the variables
rowVars <- names(df %>% dplyr::select(turnoutUKGeneral:partyIdStrength,
partyIdSqueeze:likeGrn,
pcon:p_hh_size,
p_disability:p_past_vote_2019,
p_eurefturnout))
# the column variables tend to be the demographic variables
colVars <- c("gender", "ageGroup", "p_socgrade", "partyId", "p_eurefvote", "p_edlevel")
# compile stats and save to data frame called `stats`
stats <- compile(df,
rowVars = rowVars,
colVars = colVars,
weight = "wt", # optional
save = FALSE, # turn this to FALSE to prevent saving as a .csv
format = "statistics")
# View first 10, sorted by Cramer's V
head(stats[order(-stats$CramersV),], 10)
Row_Var | Col_Var | Size | Chisq | DF | CramersV | p_value | |
---|---|---|---|---|---|---|---|
10 | generalElectionVote | partyId | 3953.314 | 10049.807 | 81 | 0.531 | 0 |
316 | p_past_vote_2017 | partyId | 3545.672 | 5251.916 | 72 | 0.430 | 0 |
52 | bestOnMII | partyId | 3719.803 | 5886.195 | 81 | 0.419 | 0 |
310 | p_past_vote_2015 | partyId | 3567.690 | 5054.699 | 81 | 0.397 | 0 |
258 | p_education_age | p_edlevel | 3465.861 | 3157.122 | 30 | 0.390 | 0 |
322 | p_past_vote_2019 | partyId | 3551.306 | 4736.332 | 90 | 0.365 | 0 |
248 | p_job_sector | ageGroup | 3991.109 | 1706.840 | 20 | 0.327 | 0 |
200 | p_work_stat | ageGroup | 3991.109 | 2857.155 | 35 | 0.320 | 0 |
298 | p_past_vote_2010 | partyId | 3511.013 | 3182.663 | 81 | 0.317 | 0 |
304 | p_past_vote_2005 | partyId | 3222.173 | 2868.127 | 81 | 0.314 | 0 |
NB caution using chi-square and p-values when the sample
size is >500 or <5. In these circumstances, use Cramer’s V or
Fisher’s Exact test, respectively.
Expansive Tables with compile()
The compile()
function in scgUtils
excels
in generating comprehensive crosstab tables. It efficiently processes
each variable pair within your dataset, producing detailed tabular
outputs. These tables can be formatted and saved as CSV files, making
them perfect for inclusion in reports or further analysis.
rowVars <- names(df %>% dplyr::select(turnoutUKGeneral:partyIdStrength,
partyIdSqueeze:likeGrn,
pcon:p_hh_size,
p_disability:p_past_vote_2019,
p_eurefturnout))
colVars <- c("gender", "ageGroup", "p_socgrade", "partyId", "p_eurefvote", "p_edlevel")
compile(df,
rowVars = rowVars,
colVars = colVars,
weight = "wt", # optional
name = "crosstabs" # this will save as "crosstabs.csv"
)
The ability
to create such extensive tables is invaluable for presenting a holistic
view of your survey results, encompassing various aspects and
relationships within your data.