Stacked area charts

The following example shows how the usage of household appliances evolved over the period from 1970 to 2012. For such an example, the source of the data is not relevant.

The libraries used in this example are: ggplot2¹, scales², colortools³, plyr⁴, dplyr⁵, reshape2⁶.

  library(ggplot2)    # grammar of graphics library
  library(scales)     # to provide some format on the plots
  library(colortools) # to make color functions
  library(plyr)       # fit for purpose data tool
  library(dplyr)      # a grammar of data manipulation
  library(reshape2)   # adapts dataframes for optimal plotting

To start, the original .xls table is cleaned up and converted to a comma-separated values (.csv) file, which is then easy to manage in R.

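  dB.apps_mtoe <- read.csv("dataEvolutionApps.csv")
  dplyr::tbl_df(dB.apps_mtoe)   # compact preview, printed while Year is still a column

  # keep the year as row names and drop it from the columns
  rownames(dB.apps_mtoe) <- dB.apps_mtoe$Year
  dB.apps_mtoe$Year <- NULL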

          Source: local data frame [43 x 28]

        Year Standard.Light.Bulb Halogen Fluorescent.Strip.Lighting Energy.Saving.Light.Bulb   LED Chest.Freezer Fridge.freezer Refrigerator
       (int)               (int)   (int)                      (int)                    (int) (int)         (int)          (int)        (int)
    1   1970                 841       0                         97                        0     0            22              0          569
    2   1971                 863       0                        103                        0     0            43              0          612
    3   1972                 882       0                        108                        0     0            68             17          652
    4   1973                 906       0                        115                        0     0            97             35          698
    5   1974                 928       0                        121                        0     0           127             60          737
    6   1975                 949       0                        127                        0     0           155             91          750
    7   1976                 966       0                        132                        0     0           181            126          739
    8   1977                 985       0                        137                        0     0           204            164          715
    9   1978                1004       0                        143                        0     0           222            202          686
    10  1979                1025       0                        148                        0     0           235            240          659
    ..   ...                 ...     ...                        ...                      ...   ...           ...            ...          ...
    Variables not shown: Upright.Freezer (int), Washing.Machine (int), Washer.dryer (int), Dishwasher (int), Tumble.Dryer (int), TV (int),
      Set.Top.Box (int), DVD.VCR (int), Games.Consoles (int), Power.Supply.Units (int), Desktops (int), Laptops (int), Monitors (int), Printers
      (int), MultiFunction.Devices (int), Electric.Oven (int), Electric.Hob (int), Microwave (int), Kettle (int)
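
As an aside, dotted column names such as Standard.Light.Bulb or Fridge.freezer are what read.csv() produces by default when the headers contain spaces or punctuation. The following is a minimal sketch of that behaviour; the header names and values are assumptions made up for illustration, and a temporary file is used so that it runs without dataEvolutionApps.csv:

```r
# Toy table with headers as they might appear in a spreadsheet
# (names and values are assumptions, not the real data).
toy <- data.frame(Year = 1970:1972,
                  a = c(841, 863, 882),
                  b = c(0, 0, 17))
names(toy) <- c("Year", "Standard Light Bulb", "Fridge-freezer")

# Round trip through a temporary CSV file.
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)
toy_csv <- read.csv(tmp)   # check.names = TRUE by default

# Same step as in the import code: year as row names, column dropped.
rownames(toy_csv) <- toy_csv$Year
toy_csv$Year <- NULL

names(toy_csv)
```

Passing check.names = FALSE to read.csv() would keep the original headers, at the cost of needing backticks to refer to them.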

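Then, to give the graphic a visual structure, each column is assigned to a category (Lighting, Cold, Wet, Brown, Computing or Cooking).

Finally, as is common practice with ggplot, the data frame is melted into a long format with a reduced number of columns: variable, value and category. In the code below, variable ends up holding the year, .type the appliance and .subtype the category.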

  names(dB.apps_mtoe) <- tolower(names(dB.apps_mtoe))
  # no id variables: every column is melted into (variable, value) pairs
  dB.apps_mtoe.melt <- suppressMessages(melt(dB.apps_mtoe))
  dB.apps_mtoe.melt$Year <- seq(1970, 2012)   # 43 years, recycled for each appliance
  # 43 rows per appliance; the ranges follow the column order of the table
  dB.apps_mtoe.melt$System <- "-"
  dB.apps_mtoe.melt$System[1:215] <- "Lighting"
  dB.apps_mtoe.melt$System[216:387] <- "Cold"
  dB.apps_mtoe.melt$System[388:559] <- "Wet"
  dB.apps_mtoe.melt$System[560:774] <- "Brown"
  dB.apps_mtoe.melt$System[775:989] <- "Computing"
  dB.apps_mtoe.melt$System[990:1161] <- "Cooking"
  # rename: .type = appliance, variable = year, .subtype = category
  dB.apps_mtoe.melt$.type <- as.factor(dB.apps_mtoe.melt$variable)
  dB.apps_mtoe.melt$variable <- as.factor(dB.apps_mtoe.melt$Year)
  dB.apps_mtoe.melt$value <- as.numeric(dB.apps_mtoe.melt$value)
  dB.apps_mtoe.melt$.subtype <- dB.apps_mtoe.melt$System
  dB.apps_mtoe.melt$Year <- dB.apps_mtoe.melt$System <- NULL
  #dB.apps_mtoe.melt <- dB.apps_mtoe.melt[order(dB.apps_mtoe.melt$.subtype),]

  # one row per (appliance, category) pair, then count appliances per category
  dB_categories <- as.data.frame(aggregate(dB.apps_mtoe.melt["value"],
                by=dB.apps_mtoe.melt[c(".type",".subtype")], FUN=length))
  dB_categories$.type <- as.character(dB_categories$.type)
  dB_categories$.subtype <- as.character(dB_categories$.subtype)
  dB_categories$value <- NULL
  dB_categories <- plyr::count(dB_categories, ".subtype")
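
The row ranges used above (1:215, 216:387 and so on) depend on the column order and on there being 43 years per appliance. A less position-dependent alternative might be to map each appliance name to its category with a named vector. The sketch below uses a three-column toy table with made-up values; the `category_of` lookup is a hypothetical helper, not part of the original script:

```r
library(reshape2)

# Toy wide table: one row per year, one column per appliance (made-up values).
toy <- data.frame(year = 1970:1972,
                  halogen = c(0, 1, 2),
                  led = c(0, 0, 1),
                  kettle = c(5, 6, 7))

# Long format: one row per (year, appliance) pair.
toy_long <- melt(toy, id.vars = "year",
                 variable.name = ".type", value.name = "value")

# Hypothetical lookup: appliance name -> category.
category_of <- c(halogen = "Lighting", led = "Lighting", kettle = "Cooking")
toy_long$.subtype <- unname(category_of[as.character(toy_long$.type)])

# Number of appliances per category.
table(unique(toy_long[c(".type", ".subtype")])$.subtype)
```

With this approach, adding or reordering columns in the source table only requires updating the lookup vector.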

To improve the visualisation, a vector of colors is built according to the number of elements in each category. A small function creates this vector so that colors contrast between groups and are related within each group. The vector is then passed to the plotting function, which also takes the type of plot and a facet flag.

  colorCategories <- function(DataToBePlotted, colorId = "#606FEF",
                              levelSat = 0.85){
    # one base color per category
    DataToBePlotted$col <- setColors(colorId, dim(DataToBePlotted)[1])
    DataToBePlotted$new <- ""
    DataToBePlotted.types <- dim(DataToBePlotted)[1]

    for(i in seq_len(DataToBePlotted.types)){
      # sequence of shades of the category's base color
      ColorSet <- sequential(DataToBePlotted$col[i], 1, what = "value",
                             s = levelSat, alpha = 1, fun = "sqrt", plot = FALSE)
      # pick freq[i] evenly spaced shades, one per element of the category
      firstColor <- floor(length(ColorSet) / (DataToBePlotted$freq[i]*4+1))
      groupColor <- NULL
      for(j in seq_len(DataToBePlotted$freq[i])) groupColor[j] <- ColorSet[firstColor*j*4]
      DataToBePlotted$new[i] <- paste(groupColor, collapse=",")
    }

    # flatten to a single character vector of colors
    outputColors <- paste(DataToBePlotted$new,collapse=",")
    outputColors <- unlist(strsplit(outputColors, ","))
    return(outputColors)
  }

  drawAppsTyp <- function(DataToBePlotted, cbbPalette, typePlot="stack", facets=FALSE,
                          titleGraph, xAxis, yAxis){
    DataToBePlotted$variable <- as.numeric(as.character(DataToBePlotted$variable))
    DataToBePlotted$.subtype <- as.factor(DataToBePlotted$.subtype)
    g <- ggplot(DataToBePlotted, aes(variable, value, fill=.type)) +
      geom_ribbon(aes(ymin=0, ymax=value), position = typePlot, alpha = 0.84) +
      xlab(xAxis) +
      ylab(yAxis) +
      ggtitle(titleGraph) +
      theme(
        axis.text = element_text(color = "black", size = 13,
                                 margin=unit(0.04, "in")),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 10),
        axis.title = element_text(size = 18, vjust = 0.8),
        legend.position = "right",
        legend.background = element_rect(fill="transparent"),
        legend.key = element_rect(fill = "white", color = "white"),
        legend.key.width = unit(0.25, "in"),
        legend.key.height = unit(0.25, "in"),
        legend.title = element_blank(),
        panel.grid.major = element_line(colour = "gray",linetype = "dotted"),
        panel.grid.minor = element_line(colour = "gray",linetype = "dotted"),
        panel.background = element_rect(fill = "transparent"),
        axis.ticks = element_line(colour = "gray"),
        axis.ticks.x = element_line(size = rel(4)),
        plot.background=element_blank(),
        plot.title = element_text(colour = "#2E2E2E", size = 20,
                                  hjust = 0, vjust = 2, angle = 0)
      ) +
      scale_x_continuous(breaks=c(1970,1980,1990,2000,2012),
                         labels=c("70s", "80s", "90s", "2k", "now")) +
      scale_y_continuous(labels=comma) +
      guides(fill=guide_legend(title=NULL, reverse=TRUE,
                               label.position="right", keywidth = 0.5,
                               keyheight = 1,
                               nrow = length(unique(DataToBePlotted$.type)))) +
      scale_fill_manual(values=cbbPalette) +
      labs(fill = "variable")

    if(isTRUE(facets)){
      g + facet_wrap(~ .subtype, shrink = TRUE, scales="free_y")
    }else{
      g
    }
  }

  ColorSet <- colorCategories(dB_categories,"#2ea473", 0.88)

  # normalised ("fill") plot, faceted by category
  drawAppsTyp(dB.apps_mtoe.melt, ColorSet, "fill", TRUE,
              "evolution of house appliances' usage \nby categories",
              "","appliances usage")

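The idea behind the color vector (one hue per category, graded shades within it) can also be sketched with base R only. This is not the colortools approach used above: it swaps in grDevices::hsv(), and `shadesByGroup` is a hypothetical helper name chosen for illustration:

```r
# Hypothetical helper: one hue per category, one brightness step per member.
shadesByGroup <- function(groupSizes, sat = 0.85) {
  # evenly spaced hues in (0, 1], one per category
  hues <- seq(0, 1, length.out = length(groupSizes) + 1)[-1]
  unlist(lapply(seq_along(groupSizes), function(i) {
    # brightness runs from dark to light within a category
    vals <- seq(0.45, 0.95, length.out = groupSizes[i])
    hsv(h = hues[i], s = sat, v = vals)
  }))
}

# Example: a category with 5 elements followed by one with 4.
palette_toy <- shadesByGroup(c(Lighting = 5, Cold = 4))
```

Such a vector can be passed to scale_fill_manual(values = ...), provided its order matches the order of the levels of the fill variable.
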
The first plot stacks the data cumulatively to show the overall evolution.

Plot Stack
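
What stacking does can be checked on a small example. The sketch below uses made-up values and geom_area() (a special case of geom_ribbon() with the lower edge at zero) rather than the full function above; the top edge of the stack in each year is simply the sum of all series:

```r
library(ggplot2)

# Toy long-format data: two series over three years (made-up values).
toy <- data.frame(
  year  = rep(1970:1972, times = 2),
  type  = rep(c("tv", "kettle"), each = 3),
  value = c(10, 20, 30, 5, 5, 10)
)

# Top edge of the stack = sum of all series in each year.
totals <- tapply(toy$value, toy$year, sum)

# Stacked area chart of the same data.
p_stack <- ggplot(toy, aes(year, value, fill = type)) +
  geom_area(position = "stack", alpha = 0.84)
```

With the function defined earlier, the stacked version would presumably be obtained by calling drawAppsTyp() with typePlot = "stack" and facets = FALSE.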

However, since some of the elements have very low values, an alternative is to split the plot by category and to show each element as a percentage (normalised value) of its category total.

Plot Fill
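
The normalisation behind a "fill" position amounts to dividing each value by the total of its group for that year. A minimal sketch with made-up values, computing the shares by hand and building the corresponding plot with geom_area() and percent labels from the scales package:

```r
library(ggplot2)

# Toy long-format data: two series over three years (made-up values).
toy <- data.frame(
  year  = rep(1970:1972, times = 2),
  type  = rep(c("tv", "kettle"), each = 3),
  value = c(10, 20, 30, 5, 5, 10)
)

# Share of each series within its year; shares add up to 1 per year.
toy$share <- toy$value / ave(toy$value, toy$year, FUN = sum)

# Normalised area chart with a percentage axis.
p_fill <- ggplot(toy, aes(year, value, fill = type)) +
  geom_area(position = "fill", alpha = 0.84) +
  scale_y_continuous(labels = scales::percent)
```

Because every year sums to 100%, small series remain visible, but the absolute growth seen in the stacked version is no longer shown.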

As seen, ggplot is flexible and may be used with ease for reproducible documentation.

I hope this has been useful.

[- Download dataEvolutionApps]

¹ H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.
² Hadley Wickham (2016). scales: Scale Functions for Visualization. R package version 0.4.0.
³ Gaston Sanchez (2013). colortools: Tools for colors in a Hue-Saturation-Value (HSV) color model. R package version 0.1.5.
⁴ Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29.
⁵ Hadley Wickham and Romain Francois (2015). dplyr: A Grammar of Data Manipulation. R package version 0.4.3.
⁶ Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20.