Efficiently find unique vector elements in a list

I have a list of numeric vectors, and I need to create a list that contains only one copy of each vector. There is no list method for the identical function, so I wrote a function that checks every vector against every other vector.

F1 <- function(x){

    to_remove <- c()
    for(i in 1:length(x)){
        for(j in 1:length(x)){
            # i < j keeps the first copy and marks only later copies for removal
            if(i < j && identical(x[[i]],x[[j]])) to_remove <- c(to_remove,j)
        }
    }
    if(is.null(to_remove)) x else x[-c(to_remove)] 
}

The problem is that this function becomes very slow as the size of the input list x increases, partly because the for loops assign two large vectors. I am hoping for a method that runs in under one minute for a list of length 1.5 million containing vectors of length 15, but that may be optimistic.
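
To get a feel for why the nested loops struggle at that size, here is a rough back-of-the-envelope count. It is only a sketch: the names `n`, `iterations` and `per_second` are introduced here for illustration, and it counts inner-loop iterations only, ignoring the cost of growing `to_remove`.

```r
# Target list length from the question
n <- 1.5e6

# The two nested loops each run over the whole list,
# so the inner body is visited n * n times
iterations <- n^2
print(iterations)

# Iterations per second needed to finish within one minute
per_second <- iterations / 60
print(per_second)
```

That is on the order of 10^12 iterations, so any approach that compares every pair explicitly is unlikely to meet the one-minute target.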

Does anyone know a more efficient way to compare every vector in the list with every other? The vectors themselves are guaranteed to be of equal length.

The sample output is shown below

x = list(1:4,1:4,2:5,3:6)
F1(x)
> list(1:4,2:5,3:6)

Solution

As per @Joshua Ulrich and @thelatemail, ll[!duplicated(ll)] works just fine.
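
As a quick check, the one-liner can be tried on the sample list from the question. This is only a small sketch; `dup` and `result` are names introduced here for illustration.

```r
# Sample list from the question: the second element repeats the first
x <- list(1:4, 1:4, 2:5, 3:6)

# duplicated() flags every element that matches an earlier one
dup <- duplicated(x)
print(dup)

# Keep only the first occurrence of each vector
result <- x[!dup]
str(result)
```

Only the second element is flagged, so the first `1:4`, `2:5` and `3:6` are kept.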

Since efficiency is a goal, we should benchmark these

# Let's create some sample data:
# 15 random permutations of 1:100, then 1000 draws from them with replacement
xx <- lapply(rep(100,15),sample)
ll <- as.list(sample(xx,1000,TRUE))
ll

Put it up against some benchmarks

fun1 <- function(ll) {
  ll[c(TRUE,!sapply(2:length(ll),function(i) ll[i] %in% ll[1:(i-1)]))]
}

library(digest)  # provides digest(), used to hash each vector

fun2 <- function(ll) {
  ll[!duplicated(sapply(ll,digest))]
}

fun3 <- function(ll)  {
  ll[!duplicated(ll)]
}

fun4 <- function(ll)  {
  unique(ll)
}

# Make sure all four functions return the same result
all(identical(fun1(ll),fun2(ll)),identical(fun2(ll),fun3(ll)),identical(fun3(ll),fun4(ll)),identical(fun4(ll),fun1(ll)))
# [1] TRUE


library(rbenchmark)

benchmark(digest=fun2(ll),duplicated=fun3(ll),unique=fun4(ll),replications=100,order="relative")[,c(1,3:6)]

        test elapsed relative user.self sys.self
3     unique   0.048    1.000     0.049    0.000
2 duplicated   0.050    1.042     0.050    0.000
1     digest   8.427  175.563     8.415    0.038
# I took out fun1, since it ran extremely slowly when ll is large

Fastest option:

unique(ll)
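
To see how this behaves on data shaped more like the question's (many vectors of length 15), something along these lines could be timed. This is a sketch under assumptions: the pool of 1000 distinct candidate vectors, the seed, and the names `pool`, `big`, `res` and `timing` are all made up for illustration, and `n` is kept at 1e5 so it runs quickly. Raise `n` toward 1.5e6 to approximate the original problem size.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical pool of candidate vectors, each of length 15
pool <- lapply(seq_len(1000), function(i) sample(100, 15))

# Build a long list with many repeats by sampling from the pool
n <- 1e5
big <- sample(pool, n, replace = TRUE)

# Time the deduplication
timing <- system.time(res <- unique(big))
print(timing)

# Number of distinct vectors that remain
print(length(res))
```

Timings will vary by machine and by how many distinct vectors the list really contains, so treat the result as indicative only.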