The rcpp function adds the elements of a vector

2020-08-26 • Java

I have a very long parameter vector (about 4 ^ 10 elements) and an index vector My goal is to add up all the parameter values of the index in the index vector

For example, if I have para = [1,2,3,4,5,5] and indices = [3,1,6], then I want to find the cumulative sum (3) of the third value twice, and the first value (1) and the sixth value (5) get 12 There is also the option to twist parameter values according to their position

I'm trying to speed up R implementation because I call it millions of times

My current code always returns Na, and I can't see where it goes wrong

This is the rcpp function:

double dot_prod_c(NumericVector indices,NumericVector paras,NumericVector warp = NA_REAL) {
int len = indices.size();
LogicalVector indices_ok;
for (int i = 0; i < len; i++){
    indices_ok.push_back(R_IsNA(indices[i]));
}
if(is_true(any(indices_ok))){
    return NA_REAL;
}
double counter = 0;
if(NumericVector::is_na(warp[1])){
    for (int i = 0; i < len; i++){
        counter += paras[indices[i]];
    }
} else {
    for (int i = 0; i < len; i++){
        counter += paras[indices[i]] * warp[i];
    }
}
return counter;
}

This is the working r version:

dot_prod <- function(indices,paras,warp = NA){
    if(is.na(warp[1])){
        return(sum(sapply(indices,function(ind) paras[ind + 1])))
    } else {
        return(sum(sapply(1:length(indices),function(i){
            ind <- indices[i]
            paras[ind + 1] * warp[i]
        })))
    }
}

The following are some codes for testing and benchmarking using the microbbenchmark software package:

# testing
library(Rcpp)
library(microbenchmark)

parameters <- list()
indices <- list()
indices_Trad <- list()

set.seed(2)
for (i in 4:12){
    size <- 4^i
    window_size <- 100
    parameters[[i-3]] <- runif(size)
    indices[[i-3]] <- floor(runif(window_size)*size)
    temp <- rep(0,size)
    for (j in 1:window_size){
        temp[indices[[i-3]][j] + 1] <- temp[indices[[i-3]][j] + 1] + 1
    }
    indices_Trad[[i-3]] <- temp
}

microbenchmark(
    x <- sapply(1:9,function(i) dot_prod(indices[[i]],parameters[[i]])),x_c <- sapply(1:9,function(i) dot_prod_c(indices[[i]],x_base <- sapply(1:9,function(i) indices_Trad[[i]] %*% parameters[[i]])
)
all.equal(x,x_base) # is true,does work
all.equal(x_c,x_base) # not true - C++ version returns only NAs

Solution

I ran into some trouble trying to explain your overall goal through your code, so I just want to explain this problem

Because I know best

There are some problems with your C code First, don't do this – numericvector warp = Na_ Real – use the RCP:: nullable < > template (shown below) This will solve some problems:

>It is more readable If you're not familiar with the nullable class, it's almost what it sounds like - an object that may or may not be null. > You don't have to do any awkward initialization, such as numericvector warp = Na_ REAL. Frankly, I'm surprised that the compiler accepted this. > You don't have to worry about accidentally forgetting that C uses a zero based index. Unlike R, this line is: if (numeric vector:: is_na (warp [1]) {. There is an ambiguous behavior written on it

This is a revised version that cancels your reference to the above problem:

#include <Rcpp.h>

typedef Rcpp::Nullable<Rcpp::NumericVector> nullable_t;
// [[Rcpp::export]]
double DotProd(Rcpp::NumericVector indices,Rcpp::NumericVector params,nullable_t warp_ = R_NilValue) {
  R_xlen_t i = 0,n = indices.size();
  double result = 0.0;

  if (warp_.isNull()) {
    for ( ; i < n; i++) {
      result += params[indices[i]];
    }    
  } else {
    Rcpp::NumericVector warp(warp_);
    for ( ; i < n; i++) {
      result += params[indices[i]] * warp[i];
    }  
  }

  return result;
}

You have some carefully designed code to generate sample data I didn't take the time to finish this because there was no need and no benchmark You said that your version of C did not produce the correct results Your first priority should be to let your code handle simple data Then give it some more complex data Then the reference The above revision applies to simple data:

args <- list(
  indices = c(3,6),params = c(1,5),warp = c(.25,.75,1.25,1.75)
)

all.equal(
  DotProd(args[[1]],args[[2]]),dot_prod(args[[1]],args[[2]]))
#[1] TRUE

all.equal(
  DotProd(args[[1]],args[[2]],args[[3]]),args[[3]]))
#[1] TRUE

It is also faster than the R version on this sample data I have no reason to believe that it is not applicable to larger and more complex data – * the apply function has no magic or special efficiency; They are just more conventional / readable R

microbenchmark::microbenchmark(
  "Rcpp" = DotProd(args[[1]],"R" = dot_prod(args[[1]],args[[2]]))
#Unit: microseconds
#expr    min      lq     mean  median      uq    max neval
#Rcpp  2.463  2.8815  3.52907  3.3265  3.8445 18.823   100
#R    18.869 20.0285 21.60490 20.4400 21.0745 66.531   100
#
microbenchmark::microbenchmark(
  "Rcpp" = DotProd(args[[1]],args[[3]]))
#Unit: microseconds
#expr    min      lq     mean median      uq    max neval
#Rcpp  2.680  3.0430  3.84796  3.701  4.1360 12.304   100
#R    21.587 22.6855 23.79194 23.342 23.8565 68.473   100

I omitted the Na check from the above example, but it can also be modified to something more conventional by using a little rcpp sugar In the past, you did this:

LogicalVector indices_ok;
for (int i = 0; i < len; i++){
  indices_ok.push_back(R_IsNA(indices[i]));
}
if(is_true(any(indices_ok))){
  return NA_REAL;
}

It's a bit radical – you're testing the entire value vector (using r_isna) and then applying is_ True (any (indications_ok)) – when you may interrupt prematurely and_ Returns Na on the first instance of ISNA (indices [i])_ When real, the result is true In addition, use push_ Back will slow down your function - you'd better put indices_ OK initializes to a known size and populates it with index access in the loop However, this is one way to compress:

if (Rcpp::na_omit(indices).size() != indices.size()) return NA_REAL;

For completeness, here is a fully saccharified version that allows you to completely avoid loops:

#include <Rcpp.h> 

typedef Rcpp::Nullable<Rcpp::NumericVector> nullable_t;
// [[Rcpp::export]]
double DotProd3(Rcpp::NumericVector indices,nullable_t warp_ = R_NilValue) {
  if (Rcpp::na_omit(indices).size() != indices.size()) return NA_REAL; 

  if (warp_.isNull()) {
    Rcpp::NumericVector tmp = params[indices];
    return Rcpp::sum(tmp);    
  } else {
    Rcpp::NumericVector warp(warp_),tmp = params[indices];
    return Rcpp::sum(tmp * warp); 
  }
}

/*** R

all.equal(
  DotProd3(args[[1]],args[[2]]))
#[1] TRUE

all.equal(
  DotProd3(args[[1]],args[[3]]))
#[1] TRUE

*/

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.

THE END

Java

二维码

Java – HashMap can only be copied through hashcode()

< <上一篇

Java – use the original string in the modified mail request body

下一篇>>

搜索内容

The rcpp function adds the elements of a vector

Solution

热门文章