【Posted】: 2019-08-27 23:05:36
【Question】:
I want to match the strings in one column against the comma-separated strings in another column in R.
I have two data frames in R:
General_df
Main_cat gen_cat
Fruits apple
Fruits mango
Fruits strawberry
Vegetable potato
Vegetable lettuce
Vegetable onion
Liquids water
Liquids milk
Liquids juice
Tech app
Object straw
My_dataframe
Days cat
Day 1 apple, potato, milk
Day 2 onion, water
Day 3 strawberry, potato
Day 4 straw, mango
I want to get the Main_cat for "My_dataframe", and so far I have managed to get this:
Days cat Match_string Main_cat
Day 1 apple, potato, milk apple Fruits
Day 1 apple, potato, milk potato Vegetable
Day 1 apple, potato, milk app Tech
Day 1 apple, potato, milk milk Liquids
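It also matches the substring "app", and many rows in my data frame have several substring matches like this.
However, I only want it to match whole strings exactly, that is, the complete items separated by "," in the "cat" column: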
Days cat Match_string Main_cat
Day 1 apple, potato, milk apple Fruits
Day 1 apple, potato, milk potato Vegetable
Day 1 apple, potato, milk milk Liquids
Is there a way to find exact string matches in this scenario? Thanks!
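To illustrate the difference between the substring match and the whole-item match described above, here is a minimal base R sketch. It assumes the items contain only word characters, so that the regex word boundary `\\b` lines up with the comma separators; the variable names are made up for this illustration.

```r
# One value of the "cat" column from My_dataframe
cat_value <- "apple, potato, milk"

# Plain substring search: "app" is found inside "apple"
substring_hit <- grepl("app", cat_value, fixed = TRUE)

# Word-boundary search: "app" must stand alone as a whole word
# (assumption: items are single words without hyphens or spaces)
whole_app   <- grepl("\\bapp\\b", cat_value, perl = TRUE)
whole_apple <- grepl("\\bapple\\b", cat_value, perl = TRUE)

print(c(substring_hit, whole_app, whole_apple))
```

Word boundaries would not be reliable for items that contain spaces or punctuation; in that case splitting on the comma and comparing whole items is probably safer.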
General_df <- read.table(text='
Main_cat gen_cat
Fruits apple
Fruits mango
Fruits strawberry
Vegetable potato
Vegetable lettuce
Vegetable onion
Liquids water
Liquids milk
Liquids juice
Tech app
Object straw', header=TRUE, stringsAsFactors = FALSE)
My_dataframe <- read.table(text='
Days; cat
Day 1; apple, potato, milk
Day 2; onion, water
Day 3; strawberry, potato
Day 4 ; straw, mango', sep=';', header=TRUE, stringsAsFactors = FALSE)
My_dataframe[] <- lapply(My_dataframe, trimws)
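One possible way to get exact matches with the data above is to split "cat" on the commas, put one item per row, and then join on equality. This is only a sketch in base R, not necessarily the accepted answer's approach; `tokens`, `long_df` and `result` are names invented here, and the data frames are rebuilt inline so the block runs on its own.

```r
General_df <- data.frame(
  Main_cat = c("Fruits", "Fruits", "Fruits", "Vegetable", "Vegetable",
               "Vegetable", "Liquids", "Liquids", "Liquids", "Tech", "Object"),
  gen_cat  = c("apple", "mango", "strawberry", "potato", "lettuce",
               "onion", "water", "milk", "juice", "app", "straw"),
  stringsAsFactors = FALSE
)
My_dataframe <- data.frame(
  Days = c("Day 1", "Day 2", "Day 3", "Day 4"),
  cat  = c("apple, potato, milk", "onion, water",
           "strawberry, potato", "straw, mango"),
  stringsAsFactors = FALSE
)

# Split every "cat" value on commas; each list element holds one row's items
tokens <- strsplit(My_dataframe$cat, ",", fixed = TRUE)
n <- lengths(tokens)

# Long format: one row per (Days, item), keeping the original "cat" string
long_df <- data.frame(
  Days         = rep(My_dataframe$Days, n),
  cat          = rep(My_dataframe$cat, n),
  Match_string = trimws(unlist(tokens)),
  stringsAsFactors = FALSE
)

# Equality join: "apple" can only match "apple", never "app"
result <- merge(long_df, General_df, by.x = "Match_string", by.y = "gen_cat")
result <- result[order(result$Days, result$Match_string),
                 c("Days", "cat", "Match_string", "Main_cat")]
rownames(result) <- NULL
print(result)
```

Because the join compares whole items, "app" and "straw" only match when they appear as complete entries, so "Day 1" no longer picks up "Tech" while "Day 4" still maps "straw" to "Object".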
【Comments】:
- fuzzyjoin::regex_inner_join would do this in one step, but it is less efficient than the accepted answer
Tags: r string dataframe match grepl