【Question title】: Match df1$col1 in df2 and output the value in df2$col1 into df1$col2
【Posted】: 2021-03-26 15:50:11
【Question】:

I have two data frames:

symbols <- c("Santa", "Elves", "Candy Cane", "Reindeers", "Cats",
             "Turkey", "Mashed Potatoes", "Cranberry Sauce", "Dogs",
             "Eggs", "Chocolates with cream", "Bunnies", "Flowers", "Donut")


df1 <- data.frame(symbols)

df1
                 symbols
1                  Santa
2                  Elves
3             Candy Cane
4              Reindeers
5                   Cats
6                 Turkey
7        Mashed Potatoes
8        Cranberry Sauce
9                   Dogs
10                  Eggs
11 Chocolates with cream
12               Bunnies
13               Flowers
14                 Donut

holiday <- c("Christmas", "Thanksgiving", "Easter")
v1 <- c("Santa", "Turkey", "Eggs")
v2 <- c("Elves", "Mashed Potatoes", "Chocolates with cream")
v3 <- c("Candy Canes", "Cranberry Sauce", "Bunnies")
v4 <- c("Reindeers", NA, "Flowers")

df2 <- data.frame(holiday, v1, v2, v3, v4)

df2
       holiday     v1                    v2              v3        v4
1    Christmas  Santa                 Elves     Candy Canes Reindeers
2 Thanksgiving Turkey       Mashed Potatoes Cranberry Sauce      <NA>
3       Easter   Eggs Chocolates with cream         Bunnies   Flowers

If a value in df1$symbols matches any value in df2 (df2$holiday, df2$v1, df2$v2, df2$v3, df2$v4), I want the corresponding df2$holiday value written to a new column in df1.
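A minimal base-R sketch of the lookup described above (not code from the question or the answer): it flattens every df2 column except holiday into one named vector that maps each symbol to its holiday, then indexes that vector with df1$symbols. It assumes character columns (forced here with stringsAsFactors = FALSE) and that each symbol belongs to at most one holiday; if a symbol appeared under two holidays, the first occurrence would win.

```r
# Example data from the question (stringsAsFactors = FALSE keeps the
# columns as character on R versions before 4.0 as well)
df1 <- data.frame(
  symbols = c("Santa", "Elves", "Candy Cane", "Reindeers", "Cats",
              "Turkey", "Mashed Potatoes", "Cranberry Sauce", "Dogs",
              "Eggs", "Chocolates with cream", "Bunnies", "Flowers", "Donut"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  holiday = c("Christmas", "Thanksgiving", "Easter"),
  v1 = c("Santa", "Turkey", "Eggs"),
  v2 = c("Elves", "Mashed Potatoes", "Chocolates with cream"),
  v3 = c("Candy Canes", "Cranberry Sauce", "Bunnies"),
  v4 = c("Reindeers", NA, "Flowers"),
  stringsAsFactors = FALSE
)

# Every non-holiday column of df2 as one character matrix (one row per holiday)
item_cols <- setdiff(names(df2), "holiday")
items <- as.matrix(df2[item_cols])

# Named lookup vector: names are symbols, values are holidays.
# as.vector() flattens the matrix column by column, which lines up with
# repeating the holiday vector once per item column.
lookup <- setNames(rep(df2$holiday, times = length(item_cols)),
                   as.vector(items))
lookup <- lookup[!is.na(names(lookup))]   # drop the empty (NA) cell in v4

# Index by name; symbols that do not occur in df2 come back as NA
df1$holiday <- unname(lookup[df1$symbols])
df1
```

With the question's data, "Candy Cane" comes back NA because df2$v3 spells it "Candy Canes"; only identical strings match.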

Ideally, df1 would end up looking like this:

df1
                 symbols      holiday
1                  Santa    Christmas
2                  Elves    Christmas
3             Candy Cane    Christmas
4              Reindeers    Christmas
5                   Cats         <NA>
6                 Turkey Thanksgiving
7        Mashed Potatoes Thanksgiving
8        Cranberry Sauce Thanksgiving
9                   Dogs         <NA>
10                  Eggs       Easter
11 Chocolates with cream       Easter
12               Bunnies       Easter
13               Flowers       Easter
14                 Donut         <NA>

One way I thought of is to split df2 apart and then do a left_join for each column:

df2_v1 <- data.frame(df2$holiday, df2$v1)
df2_v2 <- data.frame(df2$holiday, df2$v2)
df2_v3 <- data.frame(df2$holiday, df2$v3)
df2_v4 <- data.frame(df2$holiday, df2$v4)

Then I can left_join df1 with each df2_v# in turn. For example:

library(dplyr)
df1_x <- left_join(df1, df2_v1, by = c("symbols" = "df2.v1"))

Then I could combine the results or use some ifelse logic to get a clean df1$holiday column, but this becomes very time-consuming if df2 has more columns.
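The split-and-join-per-column idea can be automated so the number of df2 columns no longer matters. This is a sketch in base R only, assuming character columns: it loops over every df2 column except holiday, uses match() to find the df2 row for each symbol, and fills only the df1 rows that are still empty, which has the same effect as combining the separate left_join results with first-match-wins logic.

```r
# Example data from the question
df1 <- data.frame(
  symbols = c("Santa", "Elves", "Candy Cane", "Reindeers", "Cats",
              "Turkey", "Mashed Potatoes", "Cranberry Sauce", "Dogs",
              "Eggs", "Chocolates with cream", "Bunnies", "Flowers", "Donut"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  holiday = c("Christmas", "Thanksgiving", "Easter"),
  v1 = c("Santa", "Turkey", "Eggs"),
  v2 = c("Elves", "Mashed Potatoes", "Chocolates with cream"),
  v3 = c("Candy Canes", "Cranberry Sauce", "Bunnies"),
  v4 = c("Reindeers", NA, "Flowers"),
  stringsAsFactors = FALSE
)

# Start with no holiday assigned
df1$holiday <- NA_character_

for (col in setdiff(names(df2), "holiday")) {
  # Row of df2 whose value in this column equals the symbol (NA if none);
  # incomparables = NA keeps a missing symbol from matching an empty cell
  idx <- match(df1$symbols, df2[[col]], incomparables = NA)
  # Fill only rows that an earlier column has not already matched
  fill <- is.na(df1$holiday) & !is.na(idx)
  df1$holiday[fill] <- df2$holiday[idx[fill]]
}
df1
```

Adding more v# columns to df2 needs no code changes, since the loop picks up every column other than holiday.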

Is there a faster way?

【Comments】:

    Tags: r dataframe match


    【Solution 1】:
    library( data.table )
    setDT(df1);setDT(df2)
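    # Reshape df2 to long format (one row per holiday/value pair), then
    # update-join: each df1 row whose symbols equals a value gets that holiday
    df1[ melt(df2, id.vars = "holiday"), 
         holiday := i.holiday, on = .(symbols = value)]
    df1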
    
    #                  symbols      holiday
    # 1:                 Santa    Christmas
    # 2:                 Elves    Christmas
    # 3:            Candy Cane         <NA>
    # 4:             Reindeers    Christmas
    # 5:                  Cats         <NA>
    # 6:                Turkey Thanksgiving
    # 7:       Mashed Potatoes Thanksgiving
    # 8:       Cranberry Sauce Thanksgiving
    # 9:                  Dogs         <NA>
    #10:                  Eggs       Easter
    #11: Chocolates with cream       Easter
    #12:               Bunnies       Easter
    #13:               Flowers       Easter
    #14:                 Donut         <NA>
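For readers without data.table, here is a hedged base-R sketch of the same idea: build by hand the long table that melt(df2, id.vars = "holiday") would produce, then left-join df1 onto it with merge(all.x = TRUE). The row_id column is an illustrative helper (not part of the answer) used only to restore df1's order, because merge() sorts by the key. As in the output above, "Candy Cane" stays NA because df2 spells it "Candy Canes".

```r
# Example data from the question
df1 <- data.frame(
  symbols = c("Santa", "Elves", "Candy Cane", "Reindeers", "Cats",
              "Turkey", "Mashed Potatoes", "Cranberry Sauce", "Dogs",
              "Eggs", "Chocolates with cream", "Bunnies", "Flowers", "Donut"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  holiday = c("Christmas", "Thanksgiving", "Easter"),
  v1 = c("Santa", "Turkey", "Eggs"),
  v2 = c("Elves", "Mashed Potatoes", "Chocolates with cream"),
  v3 = c("Candy Canes", "Cranberry Sauce", "Bunnies"),
  v4 = c("Reindeers", NA, "Flowers"),
  stringsAsFactors = FALSE
)

# Long format, mirroring melt(df2, id.vars = "holiday"):
# one row per (holiday, column name, cell value)
item_cols <- setdiff(names(df2), "holiday")
long <- data.frame(
  holiday  = rep(df2$holiday, times = length(item_cols)),
  variable = rep(item_cols, each = nrow(df2)),
  value    = unlist(df2[item_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
# Drop empty cells and keep only the columns needed for the join
long <- long[!is.na(long$value), c("holiday", "value")]

# Left join: every df1 row is kept; unmatched symbols get NA
df1$row_id <- seq_len(nrow(df1))
out <- merge(df1, long, by.x = "symbols", by.y = "value", all.x = TRUE)

# merge() sorts by the key, so put rows back in df1's original order
out <- out[order(out$row_id), c("symbols", "holiday")]
rownames(out) <- NULL
out
```

One difference from the update-join above: if a symbol appeared under two holidays, merge() would return two rows for it instead of one.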
    
