Posted: 2018-06-06 06:54:51
Question:
I am trying to split the first column of df into separate columns on the delimiter ("_").
country_city_gender_age_name_state =c("US_Dallas_Male_23_hanes_TX","US_LosAngeles_CA",
"US_Atlanta_Female_jenny_GA","US_Orlando_kane_FL")
df = data.frame(country_city_gender_age_name_state)
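The string before the first delimiter (country), the string after the first delimiter (city) and the string after the last delimiter (state) are always present.
Based on that:
row_1: every field is present, so the whole string splits on the delimiter; number of delimiters is 5
row_2: delimiters 2, 3 and 4 are missing, so gender, age and name are empty; number of delimiters is 2
row_3: delimiter 3 is missing, so age is empty; number of delimiters is 4
row_4: delimiters 2 and 3 are missing, so gender and age are empty; number of delimiters is 3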
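The delimiter counts listed above can be checked by splitting each string once and counting the pieces. A minimal base-R sketch; the names `x`, `parts` and `n_delims` are only illustrative:

```r
x <- c("US_Dallas_Male_23_hanes_TX", "US_LosAngeles_CA",
       "US_Atlanta_Female_jenny_GA", "US_Orlando_kane_FL")

# Split on the literal "_" (fixed = TRUE skips regex interpretation)
parts <- strsplit(x, "_", fixed = TRUE)

# Number of delimiters = number of fields minus one (kept as integer)
n_delims <- lengths(parts) - 1L
n_delims
#> [1] 5 2 4 3
```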
I have tried the following code:
df$country<- sapply(strsplit(as.character(df$country_city_gender_age_name_state),"_"), "[",1)
df$city<- sapply(strsplit(as.character(df$country_city_gender_age_name_state),"_"), "[",2)
df$gender<- sapply(strsplit(as.character(df$country_city_gender_age_name_state),"_"), "[",3)
df$age<- sapply(strsplit(as.character(df$country_city_gender_age_name_state),"_"), "[",4)
df$name<- sapply(strsplit(as.character(df$country_city_gender_age_name_state),"_"), "[",5)
df$state<- sapply(strsplit(as.character(df$country_city_gender_age_name_state), "_"), tail, 1)
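This attempt assigns fields purely by position, which only works when every field is present. In the shorter rows the later tokens slide left into the wrong columns: the state code of row 2 lands in `gender`, and in row 4 the name lands in `gender` and the state in `age`. A small sketch reproducing this with the same `strsplit()`/`sapply()` pattern (variable names are illustrative):

```r
x <- c("US_Dallas_Male_23_hanes_TX", "US_LosAngeles_CA",
       "US_Atlanta_Female_jenny_GA", "US_Orlando_kane_FL")
parts <- strsplit(x, "_", fixed = TRUE)

# Third token of each row, which the attempt stores as gender
gender_by_position <- sapply(parts, `[`, 3)
# c("Male", "CA", "Female", "kane")

# Fifth token, stored as name: row 3 gets its state "GA",
# rows 2 and 4 get NA because they have fewer than five tokens
name_by_position <- sapply(parts, `[`, 5)
# c("hanes", NA, "GA", NA)
```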
The desired output data frame is:
country = c("US","US","US","US")
city = c("Dallas","LosAngeles","Atlanta","Orlando")
gender = c("Male","","Female","")
age = c("23","","","")
name = c("hanes","","jenny","kane")
state = c("TX","CA","GA","FL")
out_df = data.frame(country_city_gender_age_name_state,country,city,gender,age,name,state)
Thanks in advance.
Comments:
-
This is hard to do. How would the computer know that LosAngeles is not a name like "Jenny"?
-
A possible starting point:
library(tidyverse);df$country_city_gender_age_name_state %>% map(~str_split(.,"_") %>% unlist %>% trimws)
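Building on that split-then-inspect idea: since position alone cannot tell the optional middle tokens apart, one workaround is to classify them by content. This sketch rests on assumptions that the question does not state: country, city and state are always the first, second and last tokens; gender is spelled exactly `Male` or `Female`; age is all digits; and any other middle token is the name. `split_record()` and `first_or_empty()` are hypothetical helpers, and the sketch uses base R only:

```r
country_city_gender_age_name_state <- c("US_Dallas_Male_23_hanes_TX", "US_LosAngeles_CA",
                                        "US_Atlanta_Female_jenny_GA", "US_Orlando_kane_FL")
df <- data.frame(country_city_gender_age_name_state, stringsAsFactors = FALSE)

# Return the first element of v, or "" if v is empty
first_or_empty <- function(v) if (length(v) > 0) v[1] else ""

# Split one string and assign its tokens to the six fields.
# Assumption: first = country, second = city, last = state;
# the optional middle tokens are classified by their content.
split_record <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)[[1]]
  n <- length(parts)
  # Tokens between city and state (none when there are only 3 tokens)
  middle <- if (n > 3) parts[3:(n - 1)] else character(0)
  is_gender <- middle %in% c("Male", "Female")
  is_age <- grepl("^[0-9]+$", middle)
  c(country = parts[1],
    city    = parts[2],
    gender  = first_or_empty(middle[is_gender]),
    age     = first_or_empty(middle[is_age]),
    name    = first_or_empty(middle[!is_gender & !is_age]),
    state   = parts[n])
}

# One row of six named fields per input string
fields <- do.call(rbind, lapply(as.character(df$country_city_gender_age_name_state),
                                split_record))
out_df <- cbind(df, as.data.frame(fields, stringsAsFactors = FALSE))
out_df
```

Rows that break these rules (for example a numeric name, or a gender value other than `Male`/`Female`) would be misclassified, so the classification rules would need adjusting to the real data.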
Tags: r