In R, split a character vector on a specific character and save the 3rd piece in a new vector

I have a vector of data in the form 'aaa_9999_1', where the first part is an alphabetic location code, the second part is a four-digit year, and the last part is a unique point identifier. For example, there are multiple sil_2007_X points, each with a different final digit. I need to split this field on the "_" character and save only the unique ID number into a new vector. I tried:

oss$point <- unlist(strsplit(oss$id,split='_',fixed=TRUE))[3]

based on a reply here: R remove part of string. I get back a single result, "1". If I run

strsplit(oss$id, split = '_', fixed = TRUE)

I can generate a list of the split pieces:

> head(oss$point)
[[1]]
[1] "sil"  "2007" "1"   

[[2]]
[1] "sil"  "2007" "2"   

[[3]]
[1] "sil"  "2007" "3"   

[[4]]
[1] "sil"  "2007" "4"   

[[5]]
[1] "sil"  "2007" "5"   

[[6]]
[1] "sil"  "2007" "6"

Adding [3] to the end gives me only the [[3]] result: "sil" "2007" "3". What I want is a vector of the 3rd part (the unique number) of all records. I think I'm close to understanding this, but I have been spending a lot of time on deadline projects (as I do most of the time). Thanks for any feedback.

Solution

strsplit creates a list, so I would try the following:

lapply(strsplit(oss$id, "_", fixed = TRUE), `[`, 3) ## Output a list
sapply(strsplit(oss$id, "_", fixed = TRUE), `[`, 3) ## Output a vector (even though a list is also a vector)

The "[" passed as the function is what extracts the third element from each piece of the list. If you prefer a vector, use sapply instead of lapply.
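To see why the original unlist(...)[3] attempt returned only "1", here is a small sketch using hypothetical ids in the question's format (the data frame below is an assumption, not the asker's real oss). Note stringsAsFactors = FALSE: before R 4.0.0, data.frame() converted strings to factors by default, and strsplit() fails on a factor with a "non-character argument" error.

```r
# Hypothetical sample data in the question's "aaa_9999_N" format
oss <- data.frame(id = c("sil_2007_1", "sil_2007_2", "sil_2007_3"),
                  stringsAsFactors = FALSE)

pieces <- strsplit(oss$id, "_", fixed = TRUE)

# unlist() flattens all pieces into ONE long vector:
# "sil" "2007" "1" "sil" "2007" "2" "sil" "2007" "3"
# so [3] picks only the third element overall, which is "1"
unlist(pieces)[3]

# sapply() calls `[`(piece, 3) on each list element separately,
# giving one third piece per record
oss$point <- sapply(pieces, `[`, 3)
oss$point
```

The result is still character; wrap it in as.integer() if the point IDs should be numeric.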

Here is an example:

mystring <- c("A_B_C","D_E_F")

lapply(strsplit(mystring, "_"), `[`, 3)
# [[1]]
# [1] "C"
# 
# [[2]]
# [1] "F"
sapply(strsplit(mystring, "_"), `[`, 3)
# [1] "C" "F"
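A variation on the sapply approach, swapping in base R's vapply(): it declares the expected type of each result, so the output is guaranteed to be a character vector. In particular, an empty input gives character(0) rather than the empty list() that sapply() returns. This is a sketch of an alternative, not part of the original answer.

```r
mystring <- c("A_B_C", "D_E_F")

# FUN.VALUE = character(1) says each call must return one string;
# the trailing 3 is passed on to `[` as the index
vapply(strsplit(mystring, "_"), `[`, character(1), 3)

# With no input, vapply() still returns a character vector
vapply(strsplit(character(0), "_"), `[`, character(1), 3)
```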

If there is an easily defined pattern, gsub is probably a good option too, and it avoids splitting altogether. See the comments for improved (more robust) versions by DWin and Josh O'Brien.

gsub(".*_.*_(.*)","\\1",mystring)
# [1] "C" "F"
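One possible tighter pattern, sketched here as an assumption (not necessarily the versions mentioned in the comments): anchoring the regex and using [^_]+ for each field means only ids with exactly three underscore-separated fields are rewritten. The ids below are hypothetical. Be aware that sub() returns a non-matching string unchanged rather than failing.

```r
ids <- c("sil_2007_1", "sil_2007_12", "abc_2010_7")

# ^ and $ anchor the whole string; [^_]+ matches one field
# without underscores; the capture group keeps the third field
point <- sub("^[^_]+_[^_]+_([^_]+)$", "\\1", ids)
point
as.integer(point)

# A string that does not fit the pattern comes back unchanged
sub("^[^_]+_[^_]+_([^_]+)$", "\\1", "not-an-id")
```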

Finally, for fun, you can extend the unlist approach to extract every third item by recycling a vector of FALSE, FALSE, TRUE values (this works because we know in advance that every split produces the same number of pieces):

unlist(strsplit(mystring, "_"), use.names = FALSE)[c(FALSE, FALSE, TRUE)]
# [1] "C" "F"
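Relying on the same assumption that every split has the same length, a different base R technique is to stack the pieces into a character matrix with do.call(rbind, ...) and take the third column. This is a sketch, not from the original answer. If the pieces differ in length, rbind() recycles shorter rows instead of failing, so the result can be silently wrong.

```r
mystring <- c("A_B_C", "D_E_F")

# One row per input string, one column per piece
parts <- do.call(rbind, strsplit(mystring, "_", fixed = TRUE))
parts

# Third column = third piece of every string
parts[, 3]
```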

If you want to extract the last value after the separator rather than the value at a numbered position, you have a few different options.

Use a greedy regular expression:

gsub(".*_(.*)", "\\1", mystring)
# [1] "C" "F"
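The same greedy idea can be written without a capture group: since ".*" matches as much as possible, ".*_" runs up to the last underscore, and replacing that prefix with an empty string leaves only the final piece. The third input below is a hypothetical id in the question's format.

```r
mystring <- c("A_B_C", "D_E_F", "sil_2007_12")

# Greedy ".*_" consumes everything up to the LAST "_";
# sub() is enough because there is only one such prefix per string
sub(".*_", "", mystring)
```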

Use the convenience functions in the "stringi" package, such as stri_extract_*:

library(stringi)
stri_extract_last_regex(mystring,"[A-Z]+")
# [1] "C" "F"
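If you would rather not add the stringi dependency, base R's regexpr() and regmatches() can sketch the same "last piece" extraction. The pattern "[^_]+$" is an assumption chosen to match whatever follows the final underscore; note that "[A-Z]+" only matches uppercase letters, so it would not pick up the digits in ids like sil_2007_12. Strings with no match are dropped from the result, so it can be shorter than the input.

```r
mystring <- c("A_B_C", "D_E_F", "sil_2007_12")

# regexpr() finds the run of non-underscore characters at the end
# of each string; regmatches() extracts the matched text
regmatches(mystring, regexpr("[^_]+$", mystring))
```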