Efficiently find unique vectors in a list
I have a list of numeric vectors, and I need to create a list that contains only one copy of each vector. There is no built-in list method for this, so I wrote a function that checks each vector against every other one with identical():
F1 <- function(x) {
  to_remove <- c()
  for (i in seq_along(x)) {
    for (j in seq_along(x)) {
      # flag only later copies (i < j) so the first occurrence is kept
      if (i < j && identical(x[[i]], x[[j]])) to_remove <- c(to_remove, j)
    }
  }
  if (is.null(to_remove)) x else x[-unique(to_remove)]
}
The problem is that this function becomes very slow as the input list x grows, partly because the for loops allocate two large vectors. I'm hoping to run it on a list of 1.5 million vectors, each of length 15, in under a minute, but that may be optimistic.
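A rough count suggests why any pairwise approach struggles at that size. Even counting each unordered pair of vectors only once, the number of comparisons for n = 1.5 million is

$$\frac{n(n-1)}{2}\bigg|_{n = 1.5\times 10^{6}} \approx 1.1\times 10^{12}$$

so the work grows quadratically with the length of the list. Hash-based functions such as duplicated() and unique() avoid comparing every pair, so their expected work grows roughly linearly with n (plus the cost of hashing each vector).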
Does anyone know a more efficient way to compare every vector in the list with every other? All of the vectors are guaranteed to have the same length.
Sample input and expected output:
x <- list(1:4, 1:4, 2:5, 3:6)
F1(x)
# expected result: list(1:4, 2:5, 3:6)
Solution
As @Joshua Ulrich and @thelatemail pointed out, ll[!duplicated(ll)] works fine.
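To see what this idiom does, here is a small sketch on the question's sample input (the names flags and kept are just for illustration). duplicated() on a list returns TRUE for every element identical to an earlier one, so negating it keeps the first copy of each vector:

```r
# duplicated() on a list flags each element that repeats an earlier element
x <- list(1:4, 1:4, 2:5, 3:6)
flags <- duplicated(x)
print(flags)        # FALSE TRUE FALSE FALSE

# Negating the flags keeps only the first occurrence of each vector
kept <- x[!flags]
print(kept)
```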
Since efficiency is the goal, the options are worth benchmarking.
# Let's create some sample data
xx <- lapply(rep(100, 15), sample)
ll <- as.list(sample(xx, 1000, TRUE))
ll
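The sample data above draws 1000 elements from 15 random permutations of 1:100, so each vector has length 100, whereas the question targets 1.5 million vectors of length 15. A sketch for generating data closer to that shape is below; make_test_list is a hypothetical helper, and the value range (1 to 100) and number of distinct vectors are arbitrary assumptions:

```r
# Hypothetical helper (not from the original answer): build a list of
# n_vectors vectors of length vec_len, drawn from a pool of n_distinct
# vectors, so duplicates are guaranteed when n_vectors > n_distinct.
make_test_list <- function(n_vectors, vec_len, n_distinct) {
  pool <- lapply(seq_len(n_distinct),
                 function(k) sample(100, vec_len, replace = TRUE))
  pool[sample(n_distinct, n_vectors, replace = TRUE)]
}

set.seed(1)
# Start small; make_test_list(1.5e6, 15, ...) matches the question's size
# but takes noticeably more memory and time.
small <- make_test_list(n_vectors = 1000, vec_len = 15, n_distinct = 50)
length(small)
length(unique(small))  # at most 50
```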
Then benchmark the candidates against each other:
library(digest)  # needed by fun2
fun1 <- function(ll) {
  ll[c(TRUE, !sapply(2:length(ll), function(i) ll[i] %in% ll[1:(i-1)]))]
}
fun2 <- function(ll) {
  ll[!duplicated(sapply(ll, digest))]
}
fun3 <- function(ll) {
  ll[!duplicated(ll)]
}
fun4 <- function(ll) {
  unique(ll)
}
# Make sure all four return the same result
all(identical(fun1(ll), fun2(ll)), identical(fun2(ll), fun3(ll)),
    identical(fun3(ll), fun4(ll)), identical(fun4(ll), fun1(ll)))
# [1] TRUE
library(rbenchmark)
benchmark(digest = fun2(ll), duplicated = fun3(ll), unique = fun4(ll),
          replications = 100, order = "relative")[, c(1, 3:6)]
        test elapsed relative user.self sys.self
3     unique   0.048    1.000     0.049    0.000
2 duplicated   0.050    1.042     0.050    0.000
1     digest   8.427  175.563     8.415    0.038
# I left out fun1 because it ran extremely slowly when ll is large
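Because the question guarantees that all vectors have the same length, another candidate (not part of the benchmark above) is to stack the vectors as rows of a matrix and drop duplicated rows. This is a sketch only: dedup_rows is a hypothetical name, its speed relative to unique(ll) is untested here, and rbind coerces everything to a common type, so unlike identical() it may treat an integer vector and a numerically equal double vector as duplicates.

```r
# Sketch: deduplicate a list of equal-length vectors via matrix rows.
# Assumes every vector in x has the same length.
dedup_rows <- function(x) {
  if (length(x) == 0) return(x)
  m <- do.call(rbind, x)   # one row per vector
  x[!duplicated(m)]        # duplicated() on a matrix compares rows (MARGIN = 1)
}

x <- list(1:4, 1:4, 2:5, 3:6)
res <- dedup_rows(x)
print(res)
```

If you try it, add it to the benchmark() call alongside unique(ll) before relying on it.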
Fastest option:
unique(ll)
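Applied to the sample input from the question, unique() returns the first occurrence of each distinct vector, in their original order, which is the same result as x[!duplicated(x)]:

```r
# The question's sample input
x <- list(1:4, 1:4, 2:5, 3:6)

# unique() on a list keeps the first copy of each distinct vector
result <- unique(x)
print(result)
```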