Introduction
High-volume scatterplots are scatterplots with many points. This typically leads to overplotting, and when such plots are saved to files using vector-based devices such as pdf() or postscript(), the resulting files are very large.
A first solution to this problem is to use raster-based graphics devices such as png() or jpeg(). But those devices require X11 library support to be installed. On an Ubuntu system this support can be installed using the package xorg-dev.
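Whether these devices are usable on a given machine can be checked from within R before relying on them. The following is a minimal sketch using base R's capabilities(); the helper name has_raster_devices is only illustrative and does not come from the original text.

```r
# Query which graphics capabilities this R build reports.
# capabilities() is part of base R and returns a named logical vector.
caps <- capabilities(c("png", "jpeg", "X11", "cairo"))
print(caps)

# Hypothetical helper: TRUE if both raster devices are reported as available.
has_raster_devices <- function(caps) {
  isTRUE(unname(caps["png"])) && isTRUE(unname(caps["jpeg"]))
}
has_raster_devices(caps)
```

If the helper returns FALSE, the alternatives described below are the ones to consider.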
The following sections describe alternative ways to handle high-volume scatterplots when the native raster-based devices such as png() and jpeg() are not available.
Alternative Solutions
This section presents a few alternative solutions.
smoothScatter
Instead of plotting each point in a high-volume scatterplot, a smoothed version of the plot is drawn using a 2D kernel density estimate. This can be done with the function graphics::smoothScatter(), which is available in base R. This approach is described in more detail at https://www.stat.ubc.ca/~jenny/STAT545A/block91_latticeGraphics.html.
ggplot2
Instead of saving the plots as raster-based images, we can use ggplot2 for plotting and save the plots as R objects using the save() function. The saved R objects can be re-loaded and viewed on a different system with support for png() and jpeg().
Numerical Example
In this section we investigate the file sizes produced by the different approaches. For this we first create a random dataset, then create scatterplots from it and save those plots to files.
n_nr_obs <- 10^5
n_inter <- 3.24
n_slope <- 0.7
x <- runif(n_nr_obs)
y <- n_inter + n_slope * x + rnorm(n_nr_obs)
tbl_plot_data <- tibble::tibble(x = x, y = y)
The scatter plots that will be saved to files all look more or less as follows:
plot(x, y)
The scatter plot shown above contains large regions of overplotting. A smoothed scatter plot might give more information in all areas of the plot.
smoothScatter(y ~ x, xlim = c(0,1), ylim = c(0, 8))
pdf
As a reference for the upper bound of the file size, we start by creating a vector-based version of the scatterplot and see that the file size is very large.
s_out_path_vec <- "Rvecplot.pdf"
pdf(s_out_path_vec)
plot(x, y)
dev.off()
null device
1
n_vec_filesize <- file.info(s_out_path_vec)$size
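A vector-based file stores every plotted point, so its size grows with the number of observations. The following sketch illustrates this under the same simulated model as above; the helper pdf_size_kb, the seed, and the chosen sample sizes are assumptions made for illustration only.

```r
# Write a scatterplot of n random points to a temporary PDF and
# return the resulting file size in KB.
pdf_size_kb <- function(n) {
  path <- tempfile(fileext = ".pdf")
  on.exit(unlink(path))          # remove the temporary file afterwards
  x <- runif(n)
  y <- 3.24 + 0.7 * x + rnorm(n)
  pdf(path)
  plot(x, y)
  dev.off()
  file.size(path) / 1024
}

set.seed(1)                      # arbitrary seed, only for repeatability
n_points <- c(1e3, 1e4, 1e5)
sizes_kb <- vapply(n_points, pdf_size_kb, numeric(1))
data.frame(n = n_points, size_kb = round(sizes_kb, 2))
```

The exact sizes depend on the R version and the PDF compression settings, so no specific numbers are claimed here.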
png
The lower bound of the file size, which serves as the target value, is reached with an ordinary png plot.
s_out_path_png <- "Rpngplot.png"
png(s_out_path_png)
plot(x, y)
dev.off()
null device
1
n_png_filesize <- file.info(s_out_path_png)$size
smoothScatter
The graphics package from base R contains the function smoothScatter(), which shades the plane according to a 2D kernel density estimate.
s_out_path_smooth <- "Rsmoothplot.pdf"
pdf(s_out_path_smooth)
#smoothScatter(y ~ x, colramp = colorRampPalette(c("white", blues9)), xlim = c(0,1), ylim = c(0, 8))
smoothScatter(y ~ x, xlim = c(0,1), ylim = c(0, 8))
dev.off()
null device
1
n_smooth_filesize <- file.info(s_out_path_smooth)$size
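Because smoothScatter() draws the density on a fixed grid instead of drawing every point, the size of the resulting PDF should depend mostly on the grid resolution (argument nbin) and much less on the number of observations. The sketch below probes this; the helper smooth_size_kb and the seed are illustrative assumptions, and smoothScatter() needs the recommended package KernSmooth to be installed.

```r
# Save a smoothed scatterplot of n points to a temporary PDF and
# return the file size in KB. nbin sets the density grid resolution.
smooth_size_kb <- function(n, nbin = 128) {
  path <- tempfile(fileext = ".pdf")
  on.exit(unlink(path))          # remove the temporary file afterwards
  x <- runif(n)
  y <- 3.24 + 0.7 * x + rnorm(n)
  pdf(path)
  smoothScatter(x, y, nbin = nbin)
  dev.off()
  file.size(path) / 1024
}

set.seed(1)                      # arbitrary seed, only for repeatability
smooth_sizes_kb <- vapply(c(1e3, 1e5), smooth_size_kb, numeric(1))
round(smooth_sizes_kb, 2)
```

Lowering nbin gives a coarser image and a smaller file, at the cost of detail in the density shading.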
ggplot2
This approach differs in that it does not store graphics files but the graphics objects themselves. Among the approaches considered here, this is only possible with ggplot2, because a ggplot2 plot is an ordinary R object.
library(ggplot2)
s_put_path_plot_obj <- "Rplotobj.rda"
p <- ggplot(tbl_plot_data, aes(x, y)) + geom_point()
save(p, file = s_put_path_plot_obj)
n_ggplot_filesize <- file.info(s_put_path_plot_obj)$size
On a different machine the plot object is loaded again and then plotted.
s_out_path_ggplot <- "Rggplot.png"
load(s_put_path_plot_obj)
ggsave(s_out_path_ggplot, plot = p, width = 4, height = 4)
n_png_ggplot_filesize <- file.info(s_out_path_ggplot)$size
Resulting Filesizes
The table below gives an overview of the file sizes.
| Filename | Method | File Size (in KB) |
|---|---|---|
| Rvecplot.pdf | pdf | 699.95 |
| Rpngplot.png | png | 64.90 |
| Rsmoothplot.pdf | smooth | 86.37 |
| Rplotobj.rda | ggplot2obj | 1286.01 |
| Rggplot.png | ggsave | 128.55 |
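The code that assembles this table is not shown in the text. A minimal sketch of how such an overview could be built with base R is given below; the helper file_size_table and the stand-in files are hypothetical and serve only to make the example run on its own.

```r
# Hypothetical helper: collect file sizes (in KB) for a set of paths.
file_size_table <- function(paths, methods) {
  data.frame(
    Filename = basename(paths),
    Method = methods,
    `File Size (in KB)` = round(file.size(paths) / 1024, digits = 2),
    check.names = FALSE            # keep the column name with spaces as-is
  )
}

# Stand-in files so that the sketch is self-contained.
paths <- c(tempfile(fileext = ".pdf"), tempfile(fileext = ".txt"))
pdf(paths[1])
plot(1:10)
invisible(dev.off())
writeLines(rep("x", 1000), paths[2])

tbl_sizes <- file_size_table(paths, c("pdf", "text"))
print(tbl_sizes)

# Remove the stand-in files again.
unlink(paths)
```

With the real plot files, the paths and method labels from the sections above would be passed in place of the stand-ins.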
Clean Up
All the plot files are removed.
file.remove(tbl_file_size$Filename)
[1] TRUE TRUE TRUE TRUE TRUE