R - Replace row variables within a data frame with variables from another row (answers)

【Question title】: R - Replace row variables within a data frame with variables from another row
【Posted】: 2018-10-11 09:12:09
【Question】:

I have a list of data frames similar to the example below, but with more than 100 columns:

# reproducible example
df <- data.frame(
  Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
  Date = c("2018-01-01", "2018-01-02"),
  Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
  Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
  Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)

# transform data frame into list
df <- split(df, df$Name)

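For each data frame in the list, I want to replace the values in the last row with the values from the previous row. For example, in each data frame I want to replace [2, 3:5] with [1, 3:5].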

> tail(df[["Name1"]], n = 2)
   Name       Date    Value1    Value2    Value3
1 Name1 2018-01-01 0.9184539 15.658510 29.219707
2 Name1 2018-01-02 3.8875463  3.628546  9.777399
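
A minimal base R sketch of the replacement described above, assuming every data frame in the list has its value columns in positions 3 to 5. The names replace_last_row and df_list are hypothetical, and the data are a simplified variant of the example in which each Name gets one row per date.

```r
# Hypothetical helper (not from the original post): copy the value columns
# of the second-to-last row into the last row of a single data frame.
replace_last_row <- function(d, cols = 3:5) {
  n <- nrow(d)
  if (n >= 2) {
    d[n, cols] <- d[n - 1, cols]
  }
  d  # data frames with fewer than two rows are returned unchanged
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(
  Name = rep(c("Name1", "Name2", "Name3", "Name4", "Name5"), times = 2),
  Date = rep(c("2018-01-01", "2018-01-02"), each = 5),
  Value1 = rnorm(10),
  Value2 = rnorm(10),
  Value3 = rnorm(10)
)

# Split into a list of data frames, then apply the helper to each element.
df_list <- split(df, df$Name)
df_list <- lapply(df_list, replace_last_row)

df_list[["Name1"]]
```

If a single data frame is needed again afterwards, do.call(rbind, df_list) should stack the pieces back together.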

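I am not sure that converting my data frame into a list is the best way to solve this, so any other suggestions are welcome. I tried to solve it as shown below, but my attempt only replaces the last row of the whole data frame with the second-to-last row.

My attempt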

# reproducible example
df <- data.frame(
  Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
  Date = c("2018-01-01", "2018-01-02"),
  Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
  Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
  Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)

# arrange by Name and Date (dplyr must be attached for %>% to be available)
library(dplyr)
df <- df %>% dplyr::arrange(Name, Date)

# attempt to replace 
df[length(df$Name), c(3:5)] <- df[length(df$Name)-1, c(3:5)]

# result
tail(df, n = 4)

> tail(df, n = 4)
    Name       Date    Value1    Value2    Value3
7  Name4 2018-01-01  3.242383 -11.44217 -1.215688
8  Name4 2018-01-02 -4.042093  18.18184  1.544271
9  Name5 2018-01-01 -1.930195  13.18662 18.889372
10 Name5 2018-01-02 -1.930195  13.18662 18.889372

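【Question comments】:

Tags: r dataframe replace

【Solution 1】:

A tidyverse solution. I don't think converting to a list is necessary. Here df is the data frame from your example. Within each Name group, we can replace the last row with NA and then use fill to carry the previous row's values down.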

library(tidyverse)

df2 <- df %>%
  group_by(Name) %>%
  mutate_at(vars(starts_with("Value")), 
            funs(ifelse(row_number() == max(row_number()), NA, .))) %>%
  fill(starts_with("Value")) %>%
  ungroup()
df2
# # A tibble: 10 x 5
#    Name  Date       Value1 Value2 Value3
#    <fct> <fct>       <dbl>  <dbl>  <dbl>
#  1 Name1 2018-01-01  1.35   14.5   34.2 
#  2 Name1 2018-01-02  1.35   14.5   34.2 
#  3 Name2 2018-01-02  2.42    4.43  19.5 
#  4 Name2 2018-01-01  2.42    4.43  19.5 
#  5 Name3 2018-01-01  4.60   14.1   15.8 
#  6 Name3 2018-01-02  4.60   14.1   15.8 
#  7 Name4 2018-01-02  6.36   11.4    9.40
#  8 Name4 2018-01-01  6.36   11.4    9.40
#  9 Name5 2018-01-01  0.214   8.34  33.8 
# 10 Name5 2018-01-02  0.214   8.34  33.8
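
To look at the NA-then-fill idea in isolation, here is a small sketch on toy data. The tibble toy and its values are made up for illustration, and the step is written with across() rather than mutate_at(), so it is a variation on the answer's code and not a copy of it.

```r
library(dplyr)
library(tidyr)

# Toy data (made up for illustration): two groups with two rows each.
toy <- tibble(
  Name   = c("A", "A", "B", "B"),
  Value1 = c(1, 2, 3, 4),
  Value2 = c(10, 20, 30, 40)
)

toy_filled <- toy %>%
  group_by(Name) %>%
  # Step 1: set the Value columns to NA in the last row of each group.
  mutate(across(starts_with("Value"), ~ replace(.x, length(.x), NA))) %>%
  # Step 2: carry the last non-missing value down within each group.
  fill(starts_with("Value"), .direction = "down") %>%
  ungroup()

toy_filled
```

One thing to keep in mind with this approach: fill() also overwrites any NA values that were already present in those columns, not only the ones introduced in step 1.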

The following may be better. It does not use the fill function and does not change the row order.

df2 <- df %>%
  group_by(Name) %>%
  mutate_at(vars(starts_with("Value")), 
            funs(ifelse(row_number() == max(row_number()), 
                        nth(., n = max(row_number()) - 1),
                        .))) %>%
  ungroup()
df2
# # A tibble: 10 x 5
#    Name  Date       Value1 Value2 Value3
#    <fct> <fct>       <dbl>  <dbl>  <dbl>
#  1 Name1 2018-01-01   4.40  13.5   28.0 
#  2 Name2 2018-01-02   1.82   8.23  20.9 
#  3 Name3 2018-01-01   1.07  16.9    7.50
#  4 Name4 2018-01-02   1.09   8.05  14.4 
#  5 Name5 2018-01-01   1.17  11.6   24.0 
#  6 Name1 2018-01-02   4.40  13.5   28.0 
#  7 Name2 2018-01-01   1.82   8.23  20.9 
#  8 Name3 2018-01-02   1.07  16.9    7.50
#  9 Name4 2018-01-01   1.09   8.05  14.4 
# 10 Name5 2018-01-02   1.17  11.6   24.0
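
In newer dplyr releases funs() is deprecated and mutate_at() has been superseded by across(), so the same idea can be sketched as follows. The helper copy_prev_into_last is a hypothetical name introduced here, and the seed is arbitrary; this is one possible rewrite under those assumptions, not the answer's original code.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(
  Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
  Date = c("2018-01-01", "2018-01-02"),
  Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
  Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
  Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)

# Hypothetical helper: overwrite the last element of a vector with the
# second-to-last one; vectors shorter than two elements are left alone.
copy_prev_into_last <- function(x) {
  n <- length(x)
  if (n >= 2) x[n] <- x[n - 1]
  x
}

df2 <- df %>%
  group_by(Name) %>%
  # Applied once per group and per selected column.
  mutate(across(starts_with("Value"), copy_prev_into_last)) %>%
  ungroup()

df2
```

Because mutate() works in place on grouped data, the rows stay in their original order, and groups with a single row pass through unchanged.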

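【Comments】:

Very nice solutions. Both work on the reprex data and on my actual data frame when I pass a single column prefix to starts_with(). However, passing several prefixes, as in starts_with(c("Value", "OtherValue", "OtherOtherValue")), produces the following error: Error in starts_with(c("Value", "OtherValue", "OtherOtherValue")) : is_string(match) is not TRUE.
@On_an_island Try vars(starts_with("Value"), starts_with("OtherValue"), starts_with("OtherOtherValue"))
I found that exact suggestion shortly after posting my reply to you. Thanks, your second solution works great and is much faster than using fill!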
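
A short sketch of selecting several prefixes at once, assuming a dplyr version that provides across(). The toy tibble and its column names are hypothetical and only echo the prefixes from the comments. Inside across(), the separate starts_with() calls are combined with c() instead of vars().

```r
library(dplyr)

# Toy data (hypothetical columns echoing the prefixes in the comments).
toy <- tibble(
  Name             = c("A", "A", "B", "B"),
  Value1           = c(1, 2, 3, 4),
  OtherValue1      = c(5, 6, 7, 8),
  OtherOtherValue1 = c(9, 10, 11, 12),
  Untouched        = c(100, 200, 300, 400)
)

result <- toy %>%
  group_by(Name) %>%
  mutate(across(
    # One starts_with() per prefix, combined with c().
    c(starts_with("Value"),
      starts_with("OtherValue"),
      starts_with("OtherOtherValue")),
    # Overwrite the last element of each group with the one before it;
    # for a one-row group the index falls back to 1, which is a no-op.
    ~ replace(.x, length(.x), .x[max(length(.x) - 1, 1)])
  )) %>%
  ungroup()

result
```

Recent tidyselect releases also accept a character vector of prefixes, such as starts_with(c("Value", "OtherValue")), so the error quoted above may not appear with an up-to-date installation.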