【Question title】: Difference from the mean (by column)
【Posted】: 2023-01-13 21:27:59
【Question description】:

I have this DF:

structure(list(Date = structure(c(18605, 18604, 18598, 18597, 
18590, 18584, 18583, 18578, 18570, 18569, 18563, 18562, 18557, 
18549, 18548, 18542, 18541, 18536, 18534, 18529, 18521, 18520, 
18515, 18508, 18500, 18499, 18493, 18492, 18486, 18485, 18479, 
18478, 18472, 18471, 18465, 18464, 18458, 18457, 18450, 18445, 
18444, 18437, 18436, 18430, 18429, 18424, 18416, 18415, 18410, 
18409, 18403, 18402, 18396, 18388, 18387, 18381, 18380, 18374, 
18373, 18368, 18367, 18360, 18359, 18354, 18340, 18338, 18331, 
18325, 18317, 18312, 18289, 18282, 18275, 18268), class = "Date"), 
    V1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0.3, 0, 0, 0, 0, 0.4, 0, 0, 0, 0, 0.2, 0, 0, 0, 0, 0.7, 0, 
    0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0.3, 0, 0, 0, 0, 0, 0.4, 0, 
    0, 0, 0.3, 0, 0, 0, 0, 0, 0, 0, 0, 0.6, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), V2 = c(0, 0, 0.1, 0, 0, 0.1, 0, 0.2, 0, 0.2, 
    0.1, 0, 0.2, 0.2, 0, 0.1, 0, 0, 0.1, 0, 0.2, 0, 0, 0.4, 0.2, 
    0, 0.3, 0, 0.2, 0, 0.3, 0, 0.6, 0, 0.4, 0, 0, 0.2, 0, 0.4, 
    0.6, 0, 0.3, 0, 0.2, 0.7, 0, 0.1, 0.3, 0, 0.2, 0, 0, 0, 0.3, 
    0, 0.1, 0.3, 0, 0, 0.3, 0.2, 0, 0, 0, 0, 0.6, 0, 0.4, 0, 
    0.2, 0, 0, 0.2), V3 = c(0, 0.3, 0, 0.3, 0.4, 0, 0.2, 0, 0.3, 
    0, 0, 0.2, 0, 0, 0.2, 0, 0.2, 0, 0, 0.1, 0, 0.2, 0, 0, 0, 
    0.3, 0, 0, 0, 0.4, 0, 0.3, 0, 0.7, 0, 0.2, 0.5, 0.4, 0, 0.4, 
    0, 0.8, 0.4, 0, 0.2, 0.6, 0.3, 0.2, 0, 0, 0, 0.4, 0.4, 0, 
    0.2, 0.3, 0, 0.2, 0.3, 0.4, 0, 0.7, 0, 0, 1.4, 0, 0, 1.4, 
    0, 1, 0, 0, 0.3, 0), V4 = c(0, 0.4, 0, 0.1, 0.1, 0, 0.1, 
    0, 0, 0.1, 0, 0.1, 0.2, 0, 0.2, 0, 0.2, 0.3, 0, 0, 0, 0.2, 
    0.3, 0.3, 0, 0, 0, 0.5, 0, 0.6, 0, 0.7, 0, 0, 0, 1.2, 1, 
    0, 0.3, 0, 1.1, 0, 0, 0.4, 0, 0, 0, 0, 0.2, 0.2, 0, 0, 0.2, 
    0, 0, 0.1, 0, 0, 0, 0.2, 0.3, 0, 0.2, 0.3, 0, 1.8, 0, 0, 
    0, 0, 0, 0.2, 0, 0)), row.names = c(NA, -74L), class = c("tbl_df", 
"tbl", "data.frame"))

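I want to mutate columns V1, V2, V3 and V4 so that, instead of the current values posted here, they show the difference from the mean of their respective column. For example, the mean of V4 is 0.1635135, so the 0.4 in V4 (row 2) should become 0.4 - 0.1635135 = 0.2364865.

I tried doing this piecemeal (one column at a time) with the following, but I keep getting a computation error: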

df <- df %>% mutate(across(2, x - mean())

Does anyone have suggestions for how I can accomplish this? Any help is much appreciated.

【Comments】:

  • df %>% mutate(across(V1:V4, ~ .x - mean(.x)))

Tags: r mean mutate


【Answer 1】:

Solution 1: use a purrr-style lambda inside across()

library(dplyr)

df %>%
  mutate(across(V1:V4, ~ .x - mean(.x)))

# # A tibble: 74 × 5
#    Date          V1      V2       V3      V4
#    <date>     <dbl>   <dbl>    <dbl>   <dbl>
#  1 2020-12-09 -0.05 -0.128  -0.204   -0.164 
#  2 2020-12-08 -0.05 -0.128   0.0959   0.236 
#  3 2020-12-02 -0.05 -0.0284 -0.204   -0.164 
#  4 2020-12-01 -0.05 -0.128   0.0959  -0.0635
#  5 2020-11-24 -0.05 -0.128   0.196   -0.0635
# ...
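
As a side note on why the attempt in the question errors out: among other things, `x - mean()` is not a function, and `mean()` is called with no argument, whereas across() expects a function or a lambda such as `~ .x - mean(.x)`. The following is only a minimal sketch on made-up data (the `toy` tibble and its values are hypothetical, not the question's df) showing the lambda form together with a sanity check that each centered column averages to zero:

```r
library(dplyr)

# Hypothetical toy data standing in for the question's df
toy <- tibble(
  Date = as.Date("2020-12-01") + 0:3,
  V1   = c(0, 0.3, 0, 0.5),
  V2   = c(0.1, 0, 0.2, 0.1)
)

# across() needs a function; ~ .x - mean(.x) is shorthand for
# function(x) x - mean(x), applied to each selected column in turn
centered <- toy %>%
  mutate(across(V1:V2, ~ .x - mean(.x)))

# Sanity check: each centered column should average to (numerically) zero
col_means <- centered %>%
  summarise(across(V1:V2, mean))
```

Selecting by name (`V1:V2` here, `V1:V4` in the answer) rather than by position keeps the Date column out of the calculation.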

Solution 2: use across() to select the variables and pass them to scale(x, scale = FALSE)

df %>%
  mutate(as_tibble(scale(across(V1:V4), scale = FALSE)))

# # A tibble: 74 × 5
#    Date          V1      V2       V3      V4
#    <date>     <dbl>   <dbl>    <dbl>   <dbl>
#  1 2020-12-09 -0.05 -0.128  -0.204   -0.164 
#  2 2020-12-08 -0.05 -0.128   0.0959   0.236 
#  3 2020-12-02 -0.05 -0.0284 -0.204   -0.164 
#  4 2020-12-01 -0.05 -0.128   0.0959  -0.0635
#  5 2020-11-24 -0.05 -0.128   0.196   -0.0635
# ...
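
scale() returns a matrix (with a `scaled:center` attribute holding the column means), which is why Solution 2 wraps the result in as_tibble(). For comparison, here is a rough base R sketch of the same column-wise centering without dplyr. The `toy` data frame and the `num_cols` vector are hypothetical names used only for illustration:

```r
# Hypothetical toy data, not the question's df
toy <- data.frame(
  Date = as.Date("2020-12-01") + 0:3,
  V1   = c(0, 0.3, 0, 0.5),
  V2   = c(0.1, 0, 0.2, 0.1)
)

num_cols <- c("V1", "V2")

# Subtract each column's mean from that column (MARGIN = 2 means columns;
# sweep() uses "-" as its default function)
centered <- toy
centered[num_cols] <- sweep(toy[num_cols], 2, colMeans(toy[num_cols]))

# scale(..., scale = FALSE) does the same centering but returns a matrix
via_scale <- scale(toy[num_cols], scale = FALSE)
```

Both approaches should give the same numbers. The sweep() version keeps the result as a data frame, so no conversion step is needed.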

【Comments】:
