R: Delete the numbers at the beginning and end of the string
I have the following vector:
words <- c("5lang","kasverschil2","b2b")
I want to delete the "5" in "5lang" and the "2" in "kasverschil2", but I don't want to delete the "2" in "b2b".
Solution
gsub("^\\d+|\\d+$", "", words)
#[1] "lang" "kasverschil" "b2b"
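To make the behaviour of the pattern more concrete, here is a small sketch that wraps the same call in a helper and tries it on a few extra inputs. The name `strip_edge_digits` and the additional test strings are my own illustrations, not part of the original answer. The pattern has two alternatives: `^\\d+` matches a run of digits anchored at the start, and `\\d+$` matches a run anchored at the end, so digits in the middle are never touched.

```r
# Hypothetical helper (not from the original answer): strip digit runs
# anchored at the start or at the end of each string.
strip_edge_digits <- function(x) gsub("^\\d+|\\d+$", "", x)

# The first three strings are from the question; the rest are made-up edge cases.
words <- c("5lang", "kasverschil2", "b2b", "12abc34", "a1b2", "2024")
result <- strip_edge_digits(words)
print(result)

# sub() replaces only the first match, so trailing digits survive.
first_only <- sub("^\\d+|\\d+$", "", "12abc34")
print(first_only)
```

Note that a string made up entirely of digits, such as "2024", becomes the empty string, and that `gsub` rather than `sub` is needed when a string has digits at both ends: `sub` leaves "abc34" where `gsub` gives "abc".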
Another option is to use stringi
library(stringi)
stri_replace_all_regex(words, "^\\d+|\\d+$", "")
#[1] "lang" "kasverschil" "b2b"
Using a variation of the data provided by the OP, here are benchmarks for the three main solutions (note that these strings are very short and artificial; results may differ on larger real-world datasets):
words <- rep(c("5lang", "b2b"), 100000)
library(stringi)
library(microbenchmark)

# Base R: a single vectorized gsub() call
GSUB <- function() gsub("^\\d+|\\d+$", "", words)

# stringi: regex replacement via stri_replace_all_regex()
STRINGI <- function() stri_replace_all_regex(words, "^\\d+|\\d+$", "")

# Base R: locate the matches, then paste the unmatched pieces back together
GREGEXPR <- function() {
  mm <- gregexpr(pattern = "(^[0-9]+|[0-9]+$)", text = words)
  sapply(regmatches(words, mm, invert = TRUE), paste, collapse = "")
}

microbenchmark(GSUB(), STRINGI(), GREGEXPR(), times = 100L)
## Unit: milliseconds
##        expr       min        lq    median        uq       max neval
##      GSUB()  301.0988  349.9952  396.3647  431.6493  632.7568   100
##   STRINGI()  465.9099  513.1570  569.1972  629.4176  738.4414   100
##  GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904   100
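Since the gregexpr-based variant is the least obvious of the three, the following base-R sketch walks through its intermediate steps on the original three words and checks that it agrees with `gsub`. The variable names are my own. My reading, which the benchmark does not itself establish, is that the extra cost plausibly comes from building a list of pieces and pasting each element back together at the R level.

```r
words <- c("5lang", "kasverschil2", "b2b")

# Step 1: find the positions of digit runs at the start or end of each string.
mm <- gregexpr("(^[0-9]+|[0-9]+$)", words)

# Step 2: with invert = TRUE, regmatches() returns the parts that did NOT match,
# as a list with one character vector per input string.
pieces <- regmatches(words, mm, invert = TRUE)

# Step 3: glue the unmatched parts of each string back together.
via_gregexpr <- sapply(pieces, paste, collapse = "")

# The single-call gsub() version for comparison.
via_gsub <- gsub("^\\d+|\\d+$", "", words)

# Both approaches should give the same cleaned vector.
stopifnot(identical(via_gregexpr, via_gsub))
print(via_gregexpr)
```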