【Title】: str_replace_all mistakenly assigning values
【Posted】: 2021-02-26 15:46:50
【Question】:

I am trying to replace the values in the "day" column of the following df.

VariantsGenomeQuails <- structure(list(Segment = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Position = c(1550L, 
1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 
1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 
1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 1550L, 
1550L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L, 100L, 100L, 2327L, 2327L, 2327L, 2327L, 2327L, 
2327L, 2327L, 2327L, 2327L, 2327L, 2327L, 2327L, 2327L, 2327L
), Quail = c(52L, 53L, 54L, 12L, 36L, 48L, 59L, 11L, 12L, 36L, 
48L, 59L, 52L, 53L, 54L, 52L, 53L, 54L, 11L, 12L, 48L, 59L, 59L, 
11L, 36L, 59L, 52L, 53L, 54L, 52L, 53L, 54L, 36L, 59L, 36L, 48L, 
59L, 52L, 53L, 54L, 52L, 53L, 54L, 36L, 48L, 59L, 36L, 59L, 36L, 
48L, 59L, 52L, 53L, 54L, 52L, 53L, 54L, 36L, 48L, 59L, 11L, 11L, 
12L, 36L, 48L, 59L, 36L, 59L), Freq = c(0.443883, 0.440835, 0.477273, 
0.761589, 0.186821, 0.072325, 0.748305, 0.986968, 0.99361, 0.664921, 
0.188847, 0.858921, 0.960804, 0.102041, 0.323194, 0.2, 0.449976, 
0.630868, 0.958506, 0.743932, 0.257758, 0.886377, 0.038241, 0.992894, 
0.633987, 0.564021, 0.054054, 0.068994, 0.200188, 0.091693, 0.256094, 
0.165732, 0.988798, 0.46675, 0.997898, 0.954168, 0.993462, 0.996931, 
0.932008, 0.998634, 0.957213, 0.858198, 0.22418, 0.910005, 0.045072, 
0.731313, 0.995946, 0.877519, 0.998066, 0.999401, 0.953812, 0.02749, 
0.043711, 0.065646, 0.032982, 0.025522, 0.023756, 0.02199, 0.020975, 
0.021915, 0.026906, 0.029056, 0.025562, 0.031411, 0.021782, 0.024584, 
0.033382, 0.026406), Group = structure(c(4L, 4L, 4L, 1L, 4L, 
2L, 3L, 1L, 1L, 4L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 2L, 
3L, 3L, 1L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 2L, 3L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 3L, 4L, 3L, 4L, 2L, 3L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 2L, 3L, 1L, 1L, 1L, 4L, 2L, 3L, 4L, 3L), .Label = c("var", 
"varL", "varLQ", "varQ"), class = "factor"), Expo = structure(c(2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("DC", "DI"), class = "factor"), day = c("3", 
"3", "3", "3", "3", "3", "3", "7", "7", "7", "7", "7", "7", "7", 
"7", "5", "5", "5", "5", "5", "5", "5", "1", "8", "8", "8", "1", 
"1", "1", "3", "3", "3", "3", "3", "7", "7", "7", "7", "7", "7", 
"5", "5", "5", "5", "5", "5", "1", "1", "8", "8", "8", "1", "1", 
"1", "3", "3", "3", "3", "3", "3", "7", "5", "5", "5", "5", "5", 
"1", "1")), row.names = c(NA, -68L), class = "data.frame")

For this, I made the following list (a named character vector):

p = c("1" = "3",
      "3" = "5",
      "5" = "7",
      "7" = "9",
      "8" = "10")

and tried the following replacement:

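library(dplyr)    # %>%, mutate(), case_when()
library(stringr)  # str_replace_all()
# Substitute day values, but only in rows where Expo is "DC"
VariantsGenomeQuails.sub <- 
  VariantsGenomeQuails %>% 
  mutate(day = case_when(Expo == "DC" ~ str_replace_all(day, p),
                         TRUE ~ as.character(day)))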

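When I do this, I only get 9 and 10 as replacement values; the other values are missing.

If I replace the numbers with letters instead of other numbers, it works as expected.

I have used this approach many times before and never ran into a problem.

Could you check what I am missing here?

Many thanks.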

【Comments】:

    Tags: r dplyr stringr


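    【Solution 1】:

    I tested the following two cases. In test case 1, the numbers in rows where Expo is "DC" become the letters a to e. In test case 2, every result in rows where Expo is "DC" becomes "e". This means that whenever a freshly replaced value matches a later pattern in the vector you supplied, it is replaced again, and this continues until no later pattern matches. That is why everything in your original code ends up as "9" or "10", and why everything in my test case 2 ends up as "e". The root cause is that str_replace_all, when given a named vector, applies the pattern/replacement pairs one after another, in order, each on the output of the previous one. case_when is not the culprit: it only selects which rows keep the replaced value.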

    library(dplyr)
    library(stringr)
    
    # Test case 1  
    p2 = c("1" = "a",
          "3" = "b",
          "5" = "c",
          "7" = "d",
          "8" = "e")
    
    VariantsGenomeQuails.sub2 <- 
      VariantsGenomeQuails %>% 
      mutate(day = case_when(Expo == "DC" ~ str_replace_all(day, p2),
                             TRUE ~ as.character(day)))
    
    # Test case 2    
    p3 = c("1" = "a",
           "a" = "3",
           "3" = "b",
           "b" = "5",
           "5" = "c",
           "c" = "7",
           "7" = "d",
           "d" = "8",
           "8" = "e")
    
    VariantsGenomeQuails.sub3 <- 
      VariantsGenomeQuails %>% 
      mutate(day = case_when(Expo == "DC" ~ str_replace_all(day, p3),
                             TRUE ~ as.character(day)))
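
To see the chaining in isolation, here is a minimal sketch on a plain character vector rather than the full data frame. The vector `p` is the one from the question; `days`, `chained`, and `manual` are names made up for this illustration. The loop spells out by hand what passing a named vector appears to do, so the two results should agree.

```r
library(stringr)

# Same named vector as in the question: names are patterns, values are replacements
p <- c("1" = "3", "3" = "5", "5" = "7", "7" = "9", "8" = "10")

# Hypothetical input: one element per distinct day value
days <- c("1", "3", "5", "7", "8")

# All pairs passed at once: each pair runs on the output of the previous pair
chained <- str_replace_all(days, p)

# The same thing written out by hand, one pair at a time
manual <- days
for (i in seq_along(p)) {
  manual <- str_replace_all(manual, names(p)[i], p[[i]])
}

print(chained)  # "9" "9" "9" "9" "10"
print(identical(chained, manual))
```

A "1" turns into "3", which the next pair turns into "5", then "7", then "9". With letters as replacements (test case 1) no replacement is itself a pattern, so nothing chains.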
    

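    Here is a fix for your code. Instead of str_replace_all, match each value exactly, so every row is tested against its original value only once. This works well.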

    VariantsGenomeQuails.sub4 <- 
      VariantsGenomeQuails %>% 
      mutate(day = case_when(
        Expo %in% "DC" & day %in% "1" ~ "3",
        Expo %in% "DC" & day %in% "3" ~ "5",
        Expo %in% "DC" & day %in% "5" ~ "7",
        Expo %in% "DC" & day %in% "7" ~ "9",
        Expo %in% "DC" & day %in% "8" ~ "10",
        TRUE ~ day
      ))
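
If the mapping grows, writing one case_when branch per value gets tedious. A possible alternative, sketched here under assumptions, is to index the named vector `p` by the day values, which looks every value up exactly once and therefore cannot chain. The data frame `toy` and the result `toy_sub` are hypothetical stand-ins, not the poster's data, and the sketch assumes `day` is a character column (a factor would be indexed by its integer codes instead).

```r
library(dplyr)

# Named vector from the question, used here as a lookup table
p <- c("1" = "3", "3" = "5", "5" = "7", "7" = "9", "8" = "10")

# Hypothetical toy data, not the poster's data frame
toy <- data.frame(
  Expo = c("DC", "DC", "DI", "DC", "DC", "DC"),
  day  = c("1", "3", "3", "5", "7", "8"),
  stringsAsFactors = FALSE
)

toy_sub <- toy %>%
  mutate(day = if_else(
    Expo == "DC",
    coalesce(unname(p[day]), day),  # lookup by name; unmatched days keep their value
    day                             # rows that are not "DC" are left alone
  ))

print(toy_sub$day)  # "3" "5" "3" "7" "9" "10"
```

The `coalesce` call is only a safety net: indexing with a name that is not in `p` returns NA, and the original value is kept in that case.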
    

    【Comments】:

      【Solution 2】:

      A simple workaround, based on the observation that the step between the "old" number and its replacement is always 2:

      df$day[df$Expo=="DC"] <- as.numeric(df$day[df$Expo=="DC"])+2
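
One detail worth checking with this approach is the resulting column type. The sketch below uses a hypothetical three-row data frame `toy` (not the poster's data) to show that assigning the numeric sums back into a subset of a character column leaves the column as character, because R coerces the assigned values to the type of the existing vector.

```r
# Hypothetical toy data with a character day column, as in the question
toy <- data.frame(
  Expo = c("DC", "DI", "DC"),
  day  = c("3", "3", "8"),
  stringsAsFactors = FALSE
)

is_dc <- toy$Expo == "DC"

# Add 2 to the DC rows only; the numeric result is coerced back to character
toy$day[is_dc] <- as.numeric(toy$day[is_dc]) + 2

print(class(toy$day))  # "character"
print(toy$day)         # "5" "3" "10"
```

If a numeric column is wanted instead, the whole column has to be converted, not only the "DC" rows.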
      

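      【Comments】:

      • Hi Chris, thanks a lot for your comment. The issue is really more about my own understanding than about solving the problem itself. There is something I am not seeing that someone with more R experience could probably point out right away.
      • I spent an hour or so on it but could not figure out where the problem is. Sorry, mate!
      • If it helps, use df %>% mutate(day = if_else(Expo == "DC", as.numeric(day) + 2, as.numeric(day)))
      • Thank you very much @Chris Ruehlemann! Much appreciated.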