R: Delete the numbers at the beginning and end of the string
I have the following vector:
words <- c("5lang","kasverschil2","b2b")
I want to delete the "5" in "5lang" and the "2" in "kasverschil2", but I don't want to delete the "2" in "b2b".
Solution
gsub("^\\d+|\\d+$","",words)
#[1] "lang" "kasverschil" "b2b"
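The reason "b2b" is left alone is that `^\\d+` only matches a run of digits anchored at the start of the string and `\\d+$` only a run anchored at the end, so digits in the middle never match. Here is a small sketch with a few extra inputs (the `extra` vector is made up for illustration and is not from the question). It also shows why gsub is needed rather than sub when a string has digits at both ends:

```r
# Hypothetical test strings, not part of the original question
extra <- c("123abc456", "b2b", "2020", "a1")

# gsub replaces every match, so both the leading and trailing runs are removed
cleaned <- gsub("^\\d+|\\d+$", "", extra)
cleaned
#[1] "abc" "b2b" ""    "a"

# sub replaces only the first match, so the trailing digits survive
first_only <- sub("^\\d+|\\d+$", "", "123abc456")
first_only
#[1] "abc456"
```

Note that a string consisting only of digits, such as "2020", becomes an empty string.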
Another option is to use stringi:
library(stringi)
stri_replace_all_regex(words,"^\\d+|\\d+$","")
#[1] "lang" "kasverschil" "b2b"
Using a variation of the data provided by the OP, here are benchmarks for the three main solutions (note that these strings are very short and contrived; results may differ on larger, real datasets):
words <- rep(c("5lang","b2b"),100000)
library(stringi)
library(microbenchmark)
GSUB <- function() gsub("^\\d+|\\d+$","",words)
STRINGI <- function() stri_replace_all_regex(words,"^\\d+|\\d+$","")
GREGEXPR <- function() {
  mm <- gregexpr(pattern='(^[0-9]+|[0-9]+$)',text = words)
  sapply(regmatches(words,mm,invert=TRUE),paste,collapse="")
}
microbenchmark(
  GSUB(),STRINGI(),GREGEXPR(),times=100L
)
## Unit: milliseconds
##        expr       min        lq    median        uq       max neval
##      GSUB()  301.0988  349.9952  396.3647  431.6493  632.7568   100
##   STRINGI()  465.9099  513.1570  569.1972  629.4176  738.4414   100
##  GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904   100
