Visualising Data

This article demonstrates the powerful visualisation capabilities of the scgUtils package, offering tools for diverse data presentations ranging from personality profiles to demographic and flow analyses.

Sankey

Flow visualisation helps in understanding how different categories of respondents transition between various stages or choices. The plot_sankey() function is instrumental in depicting the flow of data, especially useful in understanding voting patterns or changes in preferences over time.

Preparing Data with `grp_freq()`

Before visualising, prepare your data using grp_freq(), which aggregates frequencies necessary for the Sankey diagram.

# Subset the required columns from the dataset
sankey_df <- survey_df[, c("wt", "generalElectionVote", "p_past_vote_2019")]

# Get the frequency
sankey_df <- grp_freq(sankey_df,
                      groups = c("generalElectionVote", "p_past_vote_2019"),
                      weight = "wt", # optional
                      round_decimals = 0, # optional
)
head(sankey_df)
# NB. The `dplyr` equivalent is:
# df %>%
#   group_by(generalElectionVote, p_past_vote_2019) %>%
#   summarise(Freq = sum(wt))

generalElectionVote	p_past_vote_2019	Freq
I would/did not vote	Conservative	75
Conservative	Conservative	793
Labour	Conservative	124
Liberal Democrat	Conservative	50
Scottish National Party (SNP)	Conservative	2
Plaid Cymru	Conservative	2

Customising the Sankey Diagram

The plot_sankey() function offers extensive customisation, allowing the diagram to be tailored to specific data narratives. The colour_prep() function enhances this customisation by facilitating the assignment of meaningful colours based on categories like political party affiliations. Such customisation not only improves the aesthetic appeal of the Sankey diagram but also boosts its interpretability and effectiveness in conveying complex data flows.

plot_sankey(sankey_df,
            source = "p_past_vote_2019", # on the left side
            target = "generalElectionVote", # on the right side
            value = "Freq",
            units = " votes",
            colours = colour_prep(df, c("generalElectionVote", "p_past_vote_2019"), pal_name = "polUK"),
            fontSize = 16, # change font size
            fontFamily = "Calibri", # default
            nodeWidth = 20, # default
            nodePadding = 10, # default
            margin = list(top = 0, right = 130, bottom = 0, left = 0), # adjust the margin
            width = 1200, # default
            height = 800, # default
            shiftLabel = NULL, # default
            heading = "Flow of Votes",
            sourceTitle = "2019 Vote",
            targetTitle = "VI"
) # %>%
# save from viewer to html
# htmlwidgets::saveWidget(file = "sankey_VI.html", selfcontained = TRUE)

Parliament

Understanding the distribution of parliamentary seats among political parties is crucial for grasping the political landscape. The plot_parliament() function in scgUtils is designed to visualise this distribution in a semicircular parliament layout. It is particularly useful for illustrating the composition of a parliament following an election.

Basic Parliament

The basic usage of plot_parliament() involves creating a plot that shows the number of seats each party holds. This representation helps in quickly understanding the strength of each party within the parliament.

# Prepare Data
de_parliament <- data.frame(
  Party = c("SPD", "Greens", "FDP", "The Left", "Other", "AfD", "CDU/CSU"),
  Result = c(206, 118, 92, 39, 1, 83, 97)
)

# Plot
plot_parliament(de_parliament,
                partyCol = "Party",
                seatCol = "Result",
                colours = c("#e3000f", "#409a3c", "#ffed00", "#be3075", "#dcdcdc", "#00a2de", "black") # optional
)

Adding a Percentage Bar

For a more detailed analysis, plot_parliament() can also include a percentage bar that shows the popular vote won by each party. This feature provides additional context to the seat distribution, reflecting how party popularity translates into parliamentary seats.

# Prepare Data
uk_parliament <- data.frame(
  Party = c("Labour", "SNP", "Other", "Liberal Democrat", "Conservative"),
  Seats = c(202, 48, 24, 11, 365),
  Percentage = c(32.1, 3.9, 8.8, 11.6, 43.6)
)

# Plot
plot_parliament(uk_parliament,
                partyCol = "Party",
                seatCol = "Seats",
                percentCol = "Percentage",
                majorityLine = TRUE, # add line down centre
                title = "2019 UK General Election", # add title
                subtitle = "Results", # add subtitle
                legend = "bottom", # add legend to bottom
                colours = colour_prep(uk_parliament, "Party", "polUK"), # match colours using `colour_prep()`
)

This plot offers an intuitive way to analyse election results, party strengths, and their representation in the parliament. The inclusion of a majority line further enhances the plot by delineating the threshold needed for a majority.

Population

Understanding demographic distribution is vital in survey analysis. plot_popn() creates visual representations of population profiles.

Using `plot_popn`

The plot_popn() function is designed to visualise the population structure of your survey respondents. It creates a population pyramid showing distributions across gender and age groups. If a variable like average age (meanVar) is specified, the plot can also display this information, adding another layer of insight into the demographic composition.

plot_popn(data = survey_df,
          xVar = "gender",
          yVar = "ageGroup",
          weight = "wt", # optional
          meanVar = "age", # optional (must be numeric)
          addLabels = TRUE # to add % labels
)

Faceting by Group

Enhance your population pyramid by faceting the plot_popn plot by a specific group, such as voter turnout. This feature overlays the selected group’s data onto the total population structure, providing a comparative view that highlights differences or similarities within subgroups.

plot_popn(data = survey_df,
          xVar = "gender",
          yVar = "ageGroup",
          group = "turnoutUKGeneral",
          weight = "wt", # optional
)

Personality

The plot_bigfive() function returns a ggplot2 chart to help visualise the personality profile of the survey data. This radar chart is primarily to visualise the Big Five personality traits (neuroticism, extroversion, openness, agreeableness, and conscientiousness) but can be amended for other quantitative data types with a scale between 0 and 100.

# Create single plot using unweighted data
plot_bigfive(bigfive_df,
             bigfive = c("Neuroticism", "Extroversion", "Openness", "Agreeableness", "Conscientiousness"))

When a group is provided, the function returns faceted plots with the variables within the group plotted on top of the average. This provides an easy comparison between the variable and the rest of the cohort in the survey.

# Create faceted plot using age groups and weighted data
plot_bigfive(bigfive_df,
             bigfive = c("Neuroticism", "Extroversion", "Openness", "Agreeableness", "Conscientiousness"),
             group = "Gender",
             weight = "Weight")

Binary

The plot_binary() function visualises binary survey responses (e.g., “Yes” vs “No”). It is particularly effective for comparative analysis. This function utilises the grid_vars() function to help transform the data into the correct format.

# Create list for dummy data
vars <- list(Q1a = "Art",
             Q1b = "Automobiles",
             Q1c = "Birdwatching",
             Q1d = "Music",
             Q1e = "Reading",
             Q1f = "Cooking",
             Q1g = "Hiking",
             Q1h = "Watching Sport",
             Q1i = "Computers",
             Q1j = "Gaming"
)
# Create plot of total dataset using unweighted data
plot_binary(binary_df,
            vars = vars,
            value = "Yes",
            title = "Hobbies", # option
            totalColour = "maroon" # optional (default = grey)
)

When a group is provided, the function return faceted plots with the variables within the group plotted against the total. This provides an easy comparison between the variable and the rest of the cohort in the survey.

# Create faceted plot using Gender and weighted data
plot_binary(binary_df,
            vars = vars,
            value = "Yes",
            group = "Gender", # optional
            weight = "Weight", # optional
            title = "Hobbies",
            subtitle = "by Gender"
)

Waffle

Waffle plots provide a unique and compelling way to visualise categorical data, making them ideal for representing proportions or percentages in datasets, including survey responses. The plot_waffle() function in the scgUtils package offers a straightforward method to create these plots.

Basic Usage

The basic use of plot_waffle() involves creating a plot that represents the distribution of different categories. This method is especially effective for visually demonstrating the relative sizes of groups within a population.

# Prepare Data
waffle_df <- data.frame(
  Category = c("A", "B", "C"),
  Count = c(30, 40, 30)
)

# Plot
plot_waffle(waffle_df,
            group = "Category",
            values = "Count",
            isolateVar = "A" # show a single plot only
)

Using `plot_waffle()` with Survey Data

plot_waffle() is particularly adept at handling survey data, supporting both weighted and unweighted analysis. The function can automatically extract and display relevant labels from survey questions, enhancing the plot’s interpretability.

# Waffle plot with unweighted survey data
plot_waffle(survey_df %>% filter(p_socgrade != "Unknown"), # removing unknowns
            group = "p_socgrade",
            title = "p_socgrade"
)

For a more refined visual presentation, plot_waffle() allows customisation of colours and plot order. This flexibility is invaluable for aligning the visual aesthetic with specific data narratives or brand guidelines.

Likert Scales

Likert scales are a staple in survey research, providing detailed insights into respondents’ attitudes and opinions. The plot_likert() function within the scgUtils package offers versatile visualisation options to effectively communicate the nuances of Likert scale data. This function includes three distinct visualisation styles:

100% Stacked Bars
Diverging Bars
Faceted Bars

Visualising Likert scale responses with a 100% stacked bar chart allows for an intuitive comparison of agreement levels across different questions or groups.

Stacked Bar Chart with One Question and One Group
To enhance clarity and interpretability, the plot_likert() function supports the optional varLevels argument to dictate the order of response variables. Colour customisation is achievable through the colour_pal() function, aiding in distinguishing between different levels of agreement or sentiment. Additionally, text colour is set based on an internal contrast test to ensure readability of labels.

# Prepare palette with 5 colours from the divergent colour scale Blue to Green
colours <- colour_pal(pal_name = "divBlueGreen",
                      n = 5, # number of colours required
                      assign = c("Strongly disagree", "Disagree",
                                 "Don't know",
                                 "Agree", "Strongly agree")
)

# Prepare varLevels as named list with 'left', 'neutral', and 'right' elements.
# (NB these must match all columns contained within vars
varLevels <- list(left = c("Strongly disagree", "Disagree"),
                  neutral = c("Don't know"),
                  right = c("Agree", "Strongly agree"))

# Likert plot with custom settings (weighted data and group on the y-axis)
plot_likert(survey_df,
            vars = "pidWeThey",
            group = "partyId",
            weight = "wt",
            varLevels = varLevels,
            total = TRUE, # set TRUE to add comparison against total population (available when group is present)
            NET = TRUE, # set TRUE to add NET score
            addLabels = TRUE, # Add % labels
            threshold = 5, # Set threshold for % that labels will be shown on plot
            order_by = "NET", # order bars by NET score
            title = get_question(survey_df, "pidWeThey"), # Add title
            subtitle = "by Party Identification", # Add subtitle
            colours = colours, # Add colours
            legend = "bottom" # Move legend to bottom

)
#> Warning in geom_text(aes(x = upperLim + 5, y = net_label_y_pos, label = heading), : All aesthetics have length 1, but the data has 46 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data
#>   containing a single row.

Basic Chart with Multiple Questions
This approach allows for a comprehensive overview of several questions simultaneously, with the varLevels list controlling the response variable ordering and the inclusion of a Net Promoter Score-like metric for additional insight.

# Prepare vars argument as list. The strings will be plotted on the y axis.
vars <- list(likeSunak = "Sunak",
             likeStarmer = "Starmer",
             likeCon = "Conservative",
             likeLab = "Labour",
             likeLD = "Lib Dems",
             likeSNP = "SNP",
             likePC = "Plaid",
             likeBrexitParty = "Reform",
             likeGrn = "Greens"
)

# Response level settings
varLevels <- list(left = c("Strongly dislike", "1", "2", "3", "4"),
                  neutral = c("5", "Don't know"),
                  right = c("6", "7", "8", "9", "Strongly like"))

# Custom colours
colours <- colour_pal("divRedBlue", 11, c("Strongly dislike", "1", "2", "3", "4", "5", "6", "7", "8", "9", "Strongly like"))
colours$`Don't know` <- "grey90" # Make "Don't know" grey

# Plotting multiple questions (vars are along the y-axis)
plot_likert(survey_df,
            vars = vars,
            varLevels = varLevels,
            weight = "wt",
            colours = colours,
            NET = TRUE,
            order_by = "left", # ordering by the mean of Strongly dislike etc (left side)
            title = "Un*favourability of Political Parties and Leaders",
            legend = "bottom",
            nrow = 2, # put the legend on two rows
) +
  ggplot2::geom_vline(xintercept = 50, colour = "white") # add white intercept line at the 50% point
#> Warning in geom_text(aes(x = upperLim + 5, y = net_label_y_pos, label = heading), : All aesthetics have length 1, but the data has 108 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data
#>   containing a single row.

Grouped Likert Chart
Adding a grouping variable allows for the nuanced comparison of Likert scale responses across different segments of the survey population, further enriching the data’s interpretative value.

# Plot weighted data faceted by EU referendum vote
plot_likert(survey_df,
            vars = vars,
            group = "p_eurefvote",
            varLevels = varLevels,
            weight = "wt",
            colours = colours,
            order_by = "left", # ordering by the mean of Strongly dislike etc (left side)
            title = "Un*favourability of Political Parties and Leaders",
            legend = "bottom",
            ncol = 1, # change number of columns from 3 to one to stack plots
            nrow = 2, # put the legend on two rows
            ratio = 3, # reduce the fixed ratio of the coordinates from 6 to 3
            base_size = 9 # reduce the base font from 10 to 9
) +
  ggplot2::geom_vline(xintercept = 50, colour = "white")

Divergent with NET and vars on Y-Axis

varLevels <- list(left = c("Strongly dislike", "1", "2", "3", "4"),
                       neutral = c("5", "Don't know"),
                       right = c("6", "7", "8", "9", "Strongly like"))

plot_likert(survey_df,
            vars = vars,
            weight = "wt",
            type = "divergent",
            varLevels = varLevels,
            NET = TRUE, # add NET column
            colours = colours,
            order_by = "NET" # order by the NET values
)
#> Warning in geom_text(aes(x = upperLim + 5, y = net_label_y_pos, label = heading), : All aesthetics have length 1, but the data has 126 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data
#>   containing a single row.

Divergent with neutrals on the right and group on Y-Axis

varLevels <- list(left = c("Strongly disagree", "Disagree"),
                             neutral = "Don't know",
                             right = c("Agree", "Strongly agree"))

plot_likert(survey_df,
            vars = "pidWeThey",
            group = "gender",
            weight = "wt",
            type = "divergent",
            varLevels = varLevels,
            addLabels = TRUE, # turn labels on
            total = TRUE, # add totals
            neutrals = "right", # Place neutrals on right instead
            order_by = "left", # order by the negative/lhs values hand side
            legend = "bottom"
)
#> Warning in geom_text(aes(x = upperLim + 5, y = net_label_y_pos, label = heading), : All aesthetics have length 1, but the data has 12 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data
#>   containing a single row.

Facetted with group on Y-Axis

# Prepare palette with 5 colours from the divergent colour scale Blue to Green
colours <- colour_pal(pal_name = "divBlueGreen",
                      n = 5, # number of colours required
                      assign = c("Strongly disagree", "Disagree",
                                 "Don't know",
                                 "Agree", "Strongly agree")
)

# Prepare varLevels as named list with 'left', 'neutral', and 'right' elements.
# (NB these must match all columns contained within vars
varLevels <- list(left = c("Strongly disagree", "Disagree"),
                  neutral = c("Don't know"),
                  right = c("Agree", "Strongly agree"))

# Likert plot with custom settings (weighted data and group on the y-axis)
plot_likert(survey_df,
            vars = "pidWeThey",
            group = "partyId",
            weight = "wt",
            varLevels = varLevels,
            type = "facetted", # Change type of plot
            order_by = "right", # order bars by left side
            title = get_question(survey_df, "pidWeThey"), # Add title
            subtitle = "by Party Identification", # Add subtitle
            colours = colours, # Add colours
            ratio = 10,
            legend = "none" # Turn off legend
)

Facetted with vars on Y-Axis

# Prepare vars argument as list. The strings will be plotted on the y axis.
vars <- list(likeSunak = "Sunak",
             likeStarmer = "Starmer",
             likeCon = "Conservative",
             likeLab = "Labour",
             likeLD = "Lib Dems",
             likeSNP = "SNP",
             likePC = "Plaid",
             likeBrexitParty = "Reform",
             likeGrn = "Greens"
)

# Response level settings
varLevels <- list(left = c("Strongly dislike", "1", "2", "3", "4"),
                  neutral = c("5", "Don't know"),
                  right = c("6", "7", "8", "9", "Strongly like"))

# Custom colours
colours <- colour_pal("divRedBlue", 11, c("Strongly dislike", "1", "2", "3", "4", "5", "6", "7", "8", "9", "Strongly like"))
colours$`Don't know` <- "grey90" # Make "Don't know" grey

# Plotting multiple questions (vars are along the y-axis)
plot_likert(survey_df,
            vars = vars,
            varLevels = varLevels,
            weight = "wt",
            type = "facetted",
            colours = colours,
            addLabels = TRUE,
            ratio = 10,
            order_by = "left", # ordering by the mean of Strongly dislike etc (left side)
            title = "Un*favourability of Political Parties and Leaders",
            legend = "none"
)

Other Plots

Future updates to scgUtils will introduce other plots such as:

plot_dumbbell() which can be used to compare two categories and view the differences between numeric data.
plot_wordcloud() to highlight keywords from either qualitative or quantitative results.
plot_donut() to illustrate numerical proportions.
plot_radar() which will expand the capabilities of plot_bigfive() in order to allow the comparison of any numeric multivariate data.
plot_mekko()to represent categorical data with multiple subcategories.