Introduction

High-volume scatterplots are scatterplots with many points. Plotting that many points typically leads to overplotting, and when the plots are saved to files using vector-based devices such as pdf() or postscript(), the resulting files are very large.

A first solution to this problem is to use raster-based graphics devices such as png() or jpeg(). However, those devices require X11 library support to be installed. On an Ubuntu system, this support can be installed using the package xorg-dev.
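Whether the raster devices are usable on a given machine can be checked from within R before any plotting is attempted. The following is a minimal sketch using base R's capabilities() function; the variable names and the fallback rule are assumptions made for illustration and are not part of the original notebook. "X11" can be queried in the same way.

```r
# Named logical vector: TRUE where this R build supports the feature
l_raster_support <- capabilities(c("png", "jpeg", "cairo"))
print(l_raster_support)

# Hypothetical fallback rule: use png() when available, otherwise pdf()
s_device <- if (isTRUE(unname(l_raster_support["png"]))) "png" else "pdf"
print(s_device)
```

If the entry for "png" is FALSE, the alternatives described in the following sections become relevant.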

What follows describes alternative ways to handle high-volume scatterplots when the native raster-based devices such as png() and jpeg() are not available.

Alternative Solutions

This section outlines a few alternative solutions.

smoothScatter

Instead of plotting each point in a high-volume scatterplot, a smoothed version of the plot is drawn using a 2D kernel density estimate. This can be done with the function graphics::smoothScatter(), which is available in base R. This approach is described in more detail under https://www.stat.ubc.ca/~jenny/STAT545A/block91_latticeGraphics.html.

grid.raster

If we are able to represent the scatterplot data in a matrix, we can use grid.raster() to draw a rasterized image of the plot. For more details, please have a look at https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Murrell.pdf.
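The text does not say how the scatterplot data would be turned into a matrix. One possible reading, sketched below under that assumption, is to count the points falling into each cell of a regular grid and to draw the counts as grey levels. The helper bin_index(), the grid size of 100 and the grey-level mapping are all illustrative choices, and the sketch draws no axes.

```r
library(grid)

set.seed(1)
n <- 10^4
x <- runif(n)
y <- 3.24 + 0.7 * x + rnorm(n)

# Hypothetical helper: map each value to a bin index in 1..n_bins
bin_index <- function(v, n_bins) {
  r <- range(v)
  pmin(n_bins, 1 + floor((v - r[1]) / (r[2] - r[1]) * n_bins))
}

n_bins <- 100
x_idx <- bin_index(x, n_bins)
y_idx <- bin_index(y, n_bins)

# Count points per cell; rows correspond to y bins, columns to x bins
idx <- (x_idx - 1) * n_bins + y_idx
mat_counts <- matrix(tabulate(idx, nbins = n_bins^2), nrow = n_bins)

# Grey levels in [0, 1]: empty cells white, densest cell black.
# Rows are reversed so that large y values end up at the top.
mat_grey <- (1 - mat_counts / max(mat_counts))[n_bins:1, ]

# Draw into a vector device; the image itself is stored as a raster
s_out_path_raster <- tempfile(fileext = ".pdf")
pdf(s_out_path_raster)
grid.newpage()
grid.raster(as.raster(mat_grey), interpolate = FALSE)
invisible(dev.off())
file.info(s_out_path_raster)$size
```

With this representation the file size depends on the number of grid cells rather than on the number of points.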

ggplot2

Instead of saving the plots as raster-based images, we can use ggplot2 for plotting and save the plot as an R object using the save() function. The saved R object can be re-loaded on a different system with support for png() and jpeg() and viewed on this second system.

Numerical Example

In this section we compare the file sizes produced by the different proposed approaches. We first create a random dataset, then draw a scatterplot of it with each approach and save the results to files.

n_nr_obs <- 10^5
n_inter <- 3.24
n_slope <- 0.7
x <- runif(n_nr_obs)
y <- n_inter + n_slope * x + rnorm(n_nr_obs)
tbl_plot_data <- tibble::tibble(x = x, y = y)

The scatterplots that will be saved to files all look more or less as follows:

plot(x, y)

The scatterplot shown above contains large areas of overplotting. A smoothed scatterplot might give more information in all areas of the plot.

smoothScatter(y ~ x, xlim = c(0,1), ylim = c(0, 8))

pdf

As a reference for the upper end of the file sizes, we start by creating a vector-based version of the scatterplot. The resulting file is very large.

s_out_path_vec <- "Rvecplot.pdf"
pdf(s_out_path_vec)
plot(x, y)
dev.off()
null device 
          1 
n_vec_filesize <- file.info(s_out_path_vec)$size

png

The lower bound of the file size, which serves as the target value, is reached with an ordinary png() plot.

s_out_path_png <- "Rpngplot.png"
png(s_out_path_png)
plot(x, y)
dev.off()
null device 
          1 
n_png_filesize <- file.info(s_out_path_png)$size

smoothScatter

The graphics package from base R contains the function smoothScatter(), which shades the plane according to a 2D kernel density estimate.

s_out_path_smooth <- "Rsmoothplot.pdf"
pdf(s_out_path_smooth)
#smoothScatter(y ~ x, colramp = colorRampPalette(c("white", blues9)), xlim = c(0,1), ylim = c(0, 8))
smoothScatter(y ~ x, xlim = c(0,1), ylim = c(0, 8))
dev.off()
null device 
          1 
n_smooth_filesize <- file.info(s_out_path_smooth)$size

ggplot2

This approach is different in the sense that it does not store a graphics file but the graphics object itself. Of the approaches compared here, only ggplot2 supports this, because a ggplot2 plot is an ordinary R object.

library(ggplot2)
s_put_path_plot_obj <- "Rplotobj.rda"
p <- ggplot(tbl_plot_data, aes(x, y)) + geom_point()
save(p, file = s_put_path_plot_obj)
n_ggplot_filesize <- file.info(s_put_path_plot_obj)$size

On a different machine, the plot object is loaded again and then plotted.

s_out_path_ggplot <- "Rggplot.png"
load(s_put_path_plot_obj)
ggsave(s_out_path_ggplot, plot = p, width = 4, height = 4)
n_png_ggplot_filesize <- file.info(s_out_path_ggplot)$size

Resulting Filesizes

The table below gives an overview of the file sizes.

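| Filename        | Method     | File Size (in KB) |
|-----------------|------------|-------------------|
| Rvecplot.pdf    | pdf        | 699.95            |
| Rpngplot.png    | png        | 64.90             |
| Rsmoothplot.pdf | smooth     | 86.37             |
| Rplotobj.rda    | ggplot2obj | 1286.01           |
| Rggplot.png     | ggsave     | 128.55            |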
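One way such an overview could be assembled is sketched below. The helper file_size_table() is hypothetical and uses a base R data.frame. To keep the sketch self-contained, it is demonstrated on two throw-away files of known size rather than on the plot files from above.

```r
# Hypothetical helper: sizes of a set of files in KB, rounded to two digits
file_size_table <- function(paths, methods) {
  data.frame(
    Filename = basename(paths),
    Method = methods,
    SizeKB = round(file.info(paths)$size / 1024, digits = 2),
    stringsAsFactors = FALSE
  )
}

# Demonstration with two temporary files of known size
s_tmp_a <- tempfile(fileext = ".bin")
s_tmp_b <- tempfile(fileext = ".bin")
writeBin(raw(2048), s_tmp_a)  # 2048 bytes = 2 KB
writeBin(raw(512), s_tmp_b)   # 512 bytes = 0.5 KB

tbl_demo <- file_size_table(c(s_tmp_a, s_tmp_b), c("a", "b"))
print(tbl_demo)

# Remove the temporary files again
unlink(c(s_tmp_a, s_tmp_b))
```

Applied to the paths and method labels used in this example, the same helper would yield one row per output file.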

Clean Up

All the plot files are removed.

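# Remove every file written in the sections above
file.remove(c(s_out_path_vec, s_out_path_png, s_out_path_smooth, s_put_path_plot_obj, s_out_path_ggplot))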
[1] TRUE TRUE TRUE TRUE TRUE