R: Delete the numbers at the beginning and end of the string

I have the following vector:

words <- c("5lang","kasverschil2","b2b")

I want to delete the "5" in "5lang" and the "2" in "kasverschil2", but I don't want to delete the "2" in "b2b".

Solution

gsub("^\\d+|\\d+$","",words)    
 #[1] "lang"        "kasverschil" "b2b"
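
To see why the pattern leaves "b2b" alone, and why gsub is used rather than sub, here is a small sketch on a few extra strings. The inputs in `extra` are made up for illustration and are not from the question.

```r
# Pattern: a run of digits anchored at the start (^\d+) OR at the end (\d+$).
# Digits in the middle of a string touch neither anchor, so they never match.
pattern <- "^\\d+|\\d+$"

# Hypothetical inputs, not from the original question
extra <- c("5lang5", "b2b", "123", "12a34")

# gsub replaces every match, so both ends are stripped
stripped_all <- gsub(pattern, "", extra)
print(stripped_all)
# [1] "lang" "b2b"  ""     "a"

# sub replaces only the first match, so trailing digits survive
stripped_first <- sub(pattern, "", extra)
print(stripped_first)
# [1] "lang5" "b2b"   ""      "a34"
```

Note that a string made up only of digits, such as "123", is reduced to an empty string, which may or may not be what you want.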

Another option is to use stringi

library(stringi)
 stri_replace_all_regex(words,"^\\d+|\\d+$","")
  #[1] "lang"        "kasverschil" "b2b"

Using a variation of the data provided by the OP, here are benchmarks for the three main solutions (note that these strings are very short and contrived; the results may differ on larger, real datasets):

words <- rep(c("5lang","b2b"),100000)

library(stringi)
library(microbenchmark)

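# Base R: strip leading/trailing digits with a single gsub call
GSUB <- function() gsub("^\\d+|\\d+$","",words)
# stringi: same pattern, empty replacement
STRINGI <- function() stri_replace_all_regex(words,"^\\d+|\\d+$","")
# gregexpr: locate the matches, then paste together everything that is not a match
GREGEXPR <- function() {
    mm <- gregexpr(pattern='(^[0-9]+|[0-9]+$)',text = words)
    sapply(regmatches(words,mm,invert=TRUE),paste,collapse="")
}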

microbenchmark( 
    GSUB(),STRINGI(),GREGEXPR(),times=100L
)

## Unit: milliseconds
##        expr       min        lq    median        uq       max neval
##      GSUB()  301.0988  349.9952  396.3647  431.6493  632.7568   100
##   STRINGI()  465.9099  513.1570  569.1972  629.4176  738.4414   100
##  GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904   100
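
Timings are only comparable if the approaches return the same result, so a quick sanity check on a small sample may be worth running first. This is a minimal sketch: `sample_words` and the `via_*` names are made up here, "7up7" is an extra test string not in the question, and the stringi comparison only runs if that package is installed.

```r
# First three strings are from the question; "7up7" is a hypothetical extra
sample_words <- c("5lang", "kasverschil2", "b2b", "7up7")
pattern <- "^\\d+|\\d+$"

# Base R gsub approach
via_gsub <- gsub(pattern, "", sample_words)

# gregexpr approach: keep the non-matching pieces and paste them together
mm <- gregexpr("(^[0-9]+|[0-9]+$)", sample_words)
via_gregexpr <- sapply(regmatches(sample_words, mm, invert = TRUE),
                       paste, collapse = "")

# unname() guards against sapply attaching names to the result
stopifnot(identical(via_gsub, unname(via_gregexpr)))

# Compare against stringi only when the package is available
if (requireNamespace("stringi", quietly = TRUE)) {
  via_stringi <- stringi::stri_replace_all_regex(sample_words, pattern, "")
  stopifnot(identical(via_gsub, via_stringi))
}

print(via_gsub)
# [1] "lang"        "kasverschil" "b2b"         "up"
```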