Reduce the PDF file size of a plot by filtering out hidden points
When generating a scatter plot of many points in R (for example, using ggplot()), many points may be hidden behind other points and not be visible at all. For example, see the following figure:
This is a scatter plot of hundreds of thousands of points, but most of them are hidden behind other points. The problem is that when the output is saved as a vector file (such as a PDF), the invisible points still increase the file size, as well as the memory and CPU usage when viewing the file.
A simple solution is to save the output as a bitmap image (such as TIFF or PNG), but bitmaps lose vector quality and may even be larger. I tried some online PDF compressors, but the resulting files were the same size as the originals.
Is there a good solution? For example, is there a way to filter out the invisible points, either while generating the plot or by editing the PDF file afterwards?
Solution
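First, as a baseline, plot all the points:
```r
set.seed(42)
x <- runif(1e6)
DF <- data.frame(x = x, y = x + rnorm(1e6, sd = 0.1))
pdf("plot1.pdf")  # output file name is illustrative
plot(y ~ x, data = DF, pch = ".", cex = 4)
dev.off()
```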
PDF size: 6334 KB
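The file sizes quoted here can be checked from within R. A minimal sketch, assuming base R's `pdf()` device with its default settings and a temporary output file; `pdf_size_kb` is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: draw a plot into a temporary PDF file and
# return the resulting file size in KB.
pdf_size_kb <- function(plot_fun) {
  path <- tempfile(fileext = ".pdf")
  pdf(path)
  plot_fun()
  dev.off()  # close the device so the file is fully written
  size <- file.size(path) / 1024
  unlink(path)
  size
}
```

For example, `pdf_size_kb(function() plot(y ~ x, data = DF, pch = ".", cex = 4))` measures the baseline plot; the exact number can vary with the R version and device settings.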
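Then round the coordinates and keep only one point per rounded position:
```r
# Round coordinates to 3 decimals and drop points whose rounded
# position has already been seen (they would nearly overlap anyway).
# The kept rows still have their original, unrounded coordinates.
DF2 <- data.frame(x = round(DF$x, 3), y = round(DF$y, 3))
DF2 <- DF[!duplicated(DF2), ]
nrow(DF2)
#[1] 373429
pdf("plot2.pdf")  # output file name is illustrative
plot(y ~ x, data = DF2, pch = ".", cex = 4)
dev.off()
```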
PDF size: 2373 KB
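To see what the rounding step does, consider a tiny hand-made data set (the values are invented purely for illustration). Rows whose coordinates agree after rounding count as duplicates, and `duplicated()` marks every occurrence after the first, so only the first point at each rounded position is kept:

```r
# Rows 1 and 2 both round to (0.1, 0.2); row 3 rounds to (0.5, 0.5)
pts <- data.frame(x = c(0.11, 0.12, 0.50), y = c(0.21, 0.19, 0.50))
key <- data.frame(x = round(pts$x, 1), y = round(pts$y, 1))
# Keep the first point at each rounded position, so row 2 is dropped
kept <- pts[!duplicated(key), ]
print(kept)
```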
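The number of decimal places passed to round() controls how many points are removed: rounding to fewer digits generally treats more points as overlapping and removes more of them, but if the rounding grid becomes coarser than the marker size, points that would have been visible may be dropped too. You would need to modify this approach to handle points of different colors.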
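One possible way to handle colors is sketched below, assuming the data frame has a column holding each point's color (called `col` here, a hypothetical name). Including the color in the duplicate key means only points with the same rounded position and the same color are merged. `fromLast = TRUE` keeps the last such point, which is the one drawn on top when rows are plotted in order, so the color visible at each position is preserved. `thin_points` is likewise a hypothetical helper:

```r
# Hypothetical helper: thin points that would overplot each other,
# treating points as duplicates only if they share the same rounded
# position AND the same color. fromLast = TRUE keeps the last row of
# each group, i.e. the one drawn on top when rows are plotted in order.
thin_points <- function(df, digits = 3, color_col = "col") {
  key <- data.frame(x = round(df$x, digits),
                    y = round(df$y, digits),
                    col = df[[color_col]])
  df[!duplicated(key, fromLast = TRUE), , drop = FALSE]
}
```

This keeps points of different colors at the same position even when one fully hides the other, so it removes fewer points than the single-color version.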