【Question title】:Replacing multiple observations from one column with values from another column in R
【Posted】:2019-10-11 17:20:20
【Question】:

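I am trying to replace the values in two columns with the values from two other columns. This is a fairly basic question that has been asked by Python users, but I am using R.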

I have a df that looks like this (only on a much larger scale [>20,000 rows]):

squirrel_id    locx    locy    dist
6391           17.5    10.0    50.0
6391           17.5    10.0    20.0
6391           17.5    10.0    15.5
8443           20.5    1.0     800
6025           -5.0    -0.5    0.0

For 63 squirrels, I need to replace their locx and locy values.

I usually replace values with the following code:

library(dplyr)

# Use numeric values: quoting them ("12.5") makes ifelse() turn locx/locy into character columns
df <- df %>%
   mutate(locx = ifelse(squirrel_id == 6391, 12.5, locx),
          locy = ifelse(squirrel_id == 6391, 15.5, locy),
          locx = ifelse(squirrel_id == 8443, 2.5, locx),
          locy = ifelse(squirrel_id == 8443, 80, locy)) # etc. for 63 squirrels

This gives me:

squirrel_id    locx    locy    dist
6391           12.5    15.5    50.0
6391           12.5    15.5    20.0
6391           12.5    15.5    15.5
8443           2.5     80.0    800
6025           -5.0    -0.5    0.0

But this takes an extra 126 lines of code (two per squirrel), and I suspect there is a simpler way to do this.

I do have all the new locx and locy values in a separate df, but I don't know how to join the two data frames by squirrel_id without messing up the data.

The df with the values that need to replace those in the old df:

squirrel_id    new_locx    new_locy   
6391           12.5        15.5 
8443           2.5         80
6025           -55.0       0.0

How can I do this more efficiently?

【Question comments】:

    Tags: r join merge dplyr


    【Solution 1】:

    You can left_join the two data frames and then use if_else statements to pick the correct locx and locy. Try this:

    library(dplyr)
    df %>% left_join(df2, by = "squirrel_id") %>%
            mutate(locx = if_else(is.na(new_locx), locx, new_locx), # as suggested by @echasnovski, we can also use locx = coalesce(new_locx, locx)
                   locy = if_else(is.na(new_locy), locy, new_locy)) %>% # or locy = coalesce(new_locy, locy)
            select(-new_locx, -new_locy)
    # output
      squirrel_id  locx locy  dist
    1        6391  12.5 15.5  50.0
    2        6391  12.5 15.5  20.0
    3        6391  12.5 15.5  15.5
    4        8443   2.5 80.0 800.0
    5        6025 -55.0  0.0   0.0
    6        5000  18.5 12.5  10.0 # squirrel_id 5000 was added as an example of an id
    # present in df but not in df2
    

    Data

    df <- structure(list(squirrel_id = c(6391L, 6391L, 6391L, 8443L, 6025L, 
    5000L), locx = c(17.5, 17.5, 17.5, 20.5, -5, 18.5), locy = c(10, 
    10, 10, 1, -0.5, 12.5), dist = c(50, 20, 15.5, 800, 0, 10)), class = "data.frame", row.names = c(NA, 
    -6L))
    df2 <- structure(list(squirrel_id = c(6391L, 8443L, 6025L), new_locx = c(12.5, 
    2.5, -55), new_locy = c(15.5, 80, 0)), class = "data.frame", row.names = c(NA, 
    -3L))
    

    【Comments】:

    • Note that you can use coalesce(x, y) instead of if_else(is.na(x), y, x)
    • Thanks for pointing that out, @echasnovski. I will edit my post
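    To make the equivalence in the first comment concrete: for two vectors, coalesce(x, y) keeps x and falls back to y wherever x is NA, which is if_else(is.na(x), y, x). The sketch below uses a hypothetical base R helper, coalesce2(), that mimics only this two-argument case; it is not dplyr's implementation (dplyr::coalesce() also accepts more than two vectors and checks types).

```r
# Hypothetical two-argument helper mimicking dplyr::coalesce(x, y):
# take x where it is not NA, otherwise fall back to y
coalesce2 <- function(x, y) {
  ifelse(is.na(x), y, x)
}

# After a left join, new_locx is NA for ids that had no replacement value
new_locx <- c(12.5, 12.5, NA)
locx     <- c(17.5, 17.5, 18.5)

coalesce2(new_locx, locx)  # 12.5 12.5 18.5
```

    Argument order matters: the new column goes first so that it wins whenever it is present, as in coalesce(new_locx, locx) in the answer above.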
    【Solution 2】:

    Using @ANG's data, here is a data.table solution. It joins on squirrel_id and updates the original df by reference.

    library(data.table)
    
    setDT(df)
    setDT(df2)
    
    df[df2, on = c('squirrel_id'), `:=` (locx = new_locx, locy = new_locy) ]
    
    df
    
       squirrel_id  locx locy  dist
    1:        6391  12.5 15.5  50.0
    2:        6391  12.5 15.5  20.0
    3:        6391  12.5 15.5  15.5
    4:        8443   2.5 80.0 800.0
    5:        6025 -55.0  0.0   0.0
    6:        5000  18.5 12.5  10.0
    

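    For comparison, the same update join can be sketched in base R with match(), with no package dependency. This is a swapped-in alternative, not part of either answer; it reuses @ANG's df and df2 and, like the data.table version, only overwrites rows whose squirrel_id appears in df2.

```r
# @ANG's example data (same values as the "Data" section of Solution 1)
df <- data.frame(
  squirrel_id = c(6391L, 6391L, 6391L, 8443L, 6025L, 5000L),
  locx = c(17.5, 17.5, 17.5, 20.5, -5, 18.5),
  locy = c(10, 10, 10, 1, -0.5, 12.5),
  dist = c(50, 20, 15.5, 800, 0, 10)
)
df2 <- data.frame(
  squirrel_id = c(6391L, 8443L, 6025L),
  new_locx = c(12.5, 2.5, -55),
  new_locy = c(15.5, 80, 0)
)

# Position of each df row's squirrel_id in df2 (NA when there is no replacement)
idx <- match(df$squirrel_id, df2$squirrel_id)
hit <- !is.na(idx)

# Overwrite only the matched rows; row order and the other columns are untouched
df$locx[hit] <- df2$new_locx[idx[hit]]
df$locy[hit] <- df2$new_locy[idx[hit]]

df
```

    Because match() returns the first matching position, this assumes each squirrel_id occurs at most once in df2.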
    See also:

    how to use merge() to update a table in R

    Replace a subset of a data frame with dplyr join operations

    R: Updating a data frame with another data frame
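    The first linked question covers base R merge(). A minimal sketch of that route with the same df and df2, assuming you want to keep every row of df (all.x = TRUE). merge() sorts its result by the key column, so the sketch adds a hypothetical helper column, row_id, to restore the original row order.

```r
df <- data.frame(
  squirrel_id = c(6391L, 6391L, 6391L, 8443L, 6025L, 5000L),
  locx = c(17.5, 17.5, 17.5, 20.5, -5, 18.5),
  locy = c(10, 10, 10, 1, -0.5, 12.5),
  dist = c(50, 20, 15.5, 800, 0, 10)
)
df2 <- data.frame(
  squirrel_id = c(6391L, 8443L, 6025L),
  new_locx = c(12.5, 2.5, -55),
  new_locy = c(15.5, 80, 0)
)

# Remember the original row order, since merge() sorts by the key
df$row_id <- seq_len(nrow(df))

# Left join: all.x = TRUE keeps squirrels with no replacement (their new_* become NA)
m <- merge(df, df2, by = "squirrel_id", all.x = TRUE)
m <- m[order(m$row_id), ]

# Use the new value where present, otherwise keep the old one
m$locx <- ifelse(is.na(m$new_locx), m$locx, m$new_locx)
m$locy <- ifelse(is.na(m$new_locy), m$locy, m$new_locy)

# Drop the helper and new_* columns and reset row names
out <- m[, c("squirrel_id", "locx", "locy", "dist")]
rownames(out) <- NULL
out
```

    Unlike the data.table answer, this builds a new data frame rather than updating df in place.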

    【Comments】:
