The rcpp function adds the elements of a vector
I have a very long parameter vector (about 4 ^ 10 elements) and an index vector My goal is to add up all the parameter values of the index in the index vector
For example, if I have para = [1,2,3,4,5,5] and indices = [3,1,6], then I want to find the cumulative sum (3) of the third value twice, and the first value (1) and the sixth value (5) get 12 There is also the option to twist parameter values according to their position
I'm trying to speed up R implementation because I call it millions of times
My current code always returns Na, and I can't see where it goes wrong
This is the rcpp function:
double dot_prod_c(NumericVector indices,NumericVector paras,NumericVector warp = NA_REAL) { int len = indices.size(); LogicalVector indices_ok; for (int i = 0; i < len; i++){ indices_ok.push_back(R_IsNA(indices[i])); } if(is_true(any(indices_ok))){ return NA_REAL; } double counter = 0; if(NumericVector::is_na(warp[1])){ for (int i = 0; i < len; i++){ counter += paras[indices[i]]; } } else { for (int i = 0; i < len; i++){ counter += paras[indices[i]] * warp[i]; } } return counter; }
This is the working r version:
dot_prod <- function(indices,paras,warp = NA){ if(is.na(warp[1])){ return(sum(sapply(indices,function(ind) paras[ind + 1]))) } else { return(sum(sapply(1:length(indices),function(i){ ind <- indices[i] paras[ind + 1] * warp[i] }))) } }
The following are some codes for testing and benchmarking using the microbbenchmark software package:
# testing library(Rcpp) library(microbenchmark) parameters <- list() indices <- list() indices_Trad <- list() set.seed(2) for (i in 4:12){ size <- 4^i window_size <- 100 parameters[[i-3]] <- runif(size) indices[[i-3]] <- floor(runif(window_size)*size) temp <- rep(0,size) for (j in 1:window_size){ temp[indices[[i-3]][j] + 1] <- temp[indices[[i-3]][j] + 1] + 1 } indices_Trad[[i-3]] <- temp } microbenchmark( x <- sapply(1:9,function(i) dot_prod(indices[[i]],parameters[[i]])),x_c <- sapply(1:9,function(i) dot_prod_c(indices[[i]],x_base <- sapply(1:9,function(i) indices_Trad[[i]] %*% parameters[[i]]) ) all.equal(x,x_base) # is true,does work all.equal(x_c,x_base) # not true - C++ version returns only NAs
Solution
I ran into some trouble trying to explain your overall goal through your code, so I just want to explain this problem
Because I know best
There are some problems with your C code First, don't do this – numericvector warp = Na_ Real – use the RCP:: nullable < > template (shown below) This will solve some problems:
>It is more readable If you're not familiar with the nullable class, it's almost what it sounds like - an object that may or may not be null. > You don't have to do any awkward initialization, such as numericvector warp = Na_ REAL. Frankly, I'm surprised that the compiler accepted this. > You don't have to worry about accidentally forgetting that C uses a zero based index. Unlike R, this line is: if (numeric vector:: is_na (warp [1]) {. There is an ambiguous behavior written on it
This is a revised version that cancels your reference to the above problem:
#include <Rcpp.h> typedef Rcpp::Nullable<Rcpp::NumericVector> nullable_t; // [[Rcpp::export]] double DotProd(Rcpp::NumericVector indices,Rcpp::NumericVector params,nullable_t warp_ = R_NilValue) { R_xlen_t i = 0,n = indices.size(); double result = 0.0; if (warp_.isNull()) { for ( ; i < n; i++) { result += params[indices[i]]; } } else { Rcpp::NumericVector warp(warp_); for ( ; i < n; i++) { result += params[indices[i]] * warp[i]; } } return result; }
You have some carefully designed code to generate sample data I didn't take the time to finish this because there was no need and no benchmark You said that your version of C did not produce the correct results Your first priority should be to let your code handle simple data Then give it some more complex data Then the reference The above revision applies to simple data:
args <- list( indices = c(3,6),params = c(1,5),warp = c(.25,.75,1.25,1.75) ) all.equal( DotProd(args[[1]],args[[2]]),dot_prod(args[[1]],args[[2]])) #[1] TRUE all.equal( DotProd(args[[1]],args[[2]],args[[3]]),args[[3]])) #[1] TRUE
It is also faster than the R version on this sample data I have no reason to believe that it is not applicable to larger and more complex data – * the apply function has no magic or special efficiency; They are just more conventional / readable R
microbenchmark::microbenchmark( "Rcpp" = DotProd(args[[1]],"R" = dot_prod(args[[1]],args[[2]])) #Unit: microseconds #expr min lq mean median uq max neval #Rcpp 2.463 2.8815 3.52907 3.3265 3.8445 18.823 100 #R 18.869 20.0285 21.60490 20.4400 21.0745 66.531 100 # microbenchmark::microbenchmark( "Rcpp" = DotProd(args[[1]],args[[3]])) #Unit: microseconds #expr min lq mean median uq max neval #Rcpp 2.680 3.0430 3.84796 3.701 4.1360 12.304 100 #R 21.587 22.6855 23.79194 23.342 23.8565 68.473 100
I omitted the Na check from the above example, but it can also be modified to something more conventional by using a little rcpp sugar In the past, you did this:
LogicalVector indices_ok; for (int i = 0; i < len; i++){ indices_ok.push_back(R_IsNA(indices[i])); } if(is_true(any(indices_ok))){ return NA_REAL; }
It's a bit radical – you're testing the entire value vector (using r_isna) and then applying is_ True (any (indications_ok)) – when you may interrupt prematurely and_ Returns Na on the first instance of ISNA (indices [i])_ When real, the result is true In addition, use push_ Back will slow down your function - you'd better put indices_ OK initializes to a known size and populates it with index access in the loop However, this is one way to compress:
if (Rcpp::na_omit(indices).size() != indices.size()) return NA_REAL;
For completeness, here is a fully saccharified version that allows you to completely avoid loops:
#include <Rcpp.h> typedef Rcpp::Nullable<Rcpp::NumericVector> nullable_t; // [[Rcpp::export]] double DotProd3(Rcpp::NumericVector indices,nullable_t warp_ = R_NilValue) { if (Rcpp::na_omit(indices).size() != indices.size()) return NA_REAL; if (warp_.isNull()) { Rcpp::NumericVector tmp = params[indices]; return Rcpp::sum(tmp); } else { Rcpp::NumericVector warp(warp_),tmp = params[indices]; return Rcpp::sum(tmp * warp); } } /*** R all.equal( DotProd3(args[[1]],args[[2]])) #[1] TRUE all.equal( DotProd3(args[[1]],args[[3]])) #[1] TRUE */