Efficiently find unique vector elements in a list
I have a list of numeric vectors, and I need to create a list that contains only one copy of each vector. There is no list method for the identical function, so I wrote a function that checks every vector against every other one:
F1 <- function(x){
  to_remove <- c()
  for(i in seq_along(x)){
    for(j in seq_along(x)){
      # i < j keeps the first copy and marks only later copies for removal
      if(i < j && identical(x[[i]], x[[j]])) to_remove <- c(to_remove, j)
    }
  }
  if(is.null(to_remove)) x else x[-to_remove]
}
The problem is that this function becomes very slow as the size of the input list x increases, partly because the for loops allocate two large vectors. I am hoping for a method that runs in under one minute on a list of 1.5 million vectors, each of length 15, but this may be optimistic.
Does anyone know a more efficient way to compare every vector in a list with every other one? The vectors themselves are guaranteed to be equal in length.
The desired output for a small example is shown below.
x <- list(1:4, 1:4, 2:5, 3:6)
F1(x)
# list(1:4, 2:5, 3:6)
Solution
As @Joshua Ulrich and @thelatemail pointed out, ll[!duplicated(ll)] works just fine.
Since efficiency is a goal, we should benchmark the candidates.
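As a quick sanity check on the small example list from the question, here is a minimal sketch of what duplicated() and unique() do with a list. The variable names keep and deduped are only illustrative.

```r
# Example list from the question: the first two vectors are identical.
x <- list(1:4, 1:4, 2:5, 3:6)

# duplicated() flags every element that repeats an earlier one.
keep <- !duplicated(x)
keep
# [1]  TRUE FALSE  TRUE  TRUE

# Subsetting with the flags keeps the first copy of each vector,
# which is the same result unique() returns.
deduped <- x[keep]
identical(deduped, unique(x))
# [1] TRUE
length(deduped)
# [1] 3
```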
# Let's create some sample data
xx <- lapply(rep(100,15),sample)
ll <- as.list(sample(xx,1000,TRUE))
ll
Put it up against some benchmarks:
fun1 <- function(ll) {
ll[c(TRUE,!sapply(2:length(ll),function(i) ll[i] %in% ll[1:(i-1)]))]
}
library(digest)  # provides digest(), used here to hash each vector
fun2 <- function(ll) {
ll[!duplicated(sapply(ll,digest))]
}
fun3 <- function(ll) {
ll[!duplicated(ll)]
}
fun4 <- function(ll) {
unique(ll)
}
# Make sure all four functions return the same result
all(identical(fun1(ll),fun2(ll)),identical(fun2(ll),fun3(ll)),identical(fun3(ll),fun4(ll)),identical(fun4(ll),fun1(ll)))
# [1] TRUE
library(rbenchmark)
benchmark(digest=fun2(ll),duplicated=fun3(ll),unique=fun4(ll),replications=100,order="relative")[,c(1,3:6)]
        test elapsed relative user.self sys.self
3     unique   0.048    1.000     0.049    0.000
2 duplicated   0.050    1.042     0.050    0.000
1     digest   8.427  175.563     8.415    0.038
# I left fun1 out of the benchmark, since it runs extremely slowly when ll is large
Fastest option:
unique(ll)
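
To get a feel for how this scales toward the size mentioned in the question, one option is to time unique() on a larger synthetic list of length-15 vectors. This is only a sketch: make_test_list() is a hypothetical helper, the pool size and n are arbitrary assumptions, and timings depend on the machine, so no numbers are shown here.

```r
# Hypothetical helper: build a list of n vectors of length len,
# drawn with replacement from a pool of n_pool random vectors.
make_test_list <- function(n, n_pool = 1000, len = 15) {
  pool <- lapply(seq_len(n_pool), function(i) sample(100, len))
  sample(pool, n, replace = TRUE)
}

set.seed(1)
big <- make_test_list(1e5)     # raise n toward 1.5e6 if memory allows

timing <- system.time(res <- unique(big))
timing["elapsed"]              # wall-clock seconds; varies by machine

# Sanity checks: no duplicates remain, and at most n_pool vectors survive.
stopifnot(anyDuplicated(res) == 0L, length(res) <= 1000)
```

If the elapsed time at 1e5 elements looks acceptable, increasing n step by step gives a rough idea of whether the one-minute target is realistic on a given machine.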
