【Question title】: How do I merge data in R based on row values and create a new variable with it?
【Posted】: 2020-11-17 14:05:22
【Question description】:

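I have a dataset in which each row represents the sales of a particular product in a particular geographic area (see Figure 1). I want to take the prices of the other products and add them as additional columns so that they can serve as additional variables in a regression (see Figure 2). How can I do this?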

My current data:

Desired output:

【Question discussion】:

Tags: r dataframe merge linear-regression data-cleaning


【Solution 1】:

Use gather(), unite() and spread() from the tidyr package (the %>% pipe is re-exported by both dplyr and tidyr):

library(dplyr)  # tibble() and the %>% pipe
library(tidyr)  # gather(), unite() and spread()

df <- tibble(
          Time = rep(c("Week1","Week2","Week3"), 4),
          Geography = c(rep("Dallas",6), rep("Houston",6)),
          Product = c(rep("Apple",3), rep("Orange",3), rep("Apple",3), rep("Orange",3)),
          Volume = c(1403, 3514, 3388, 2284, 3091, 3558, 3199, 2521, 3381, 2127, 2383, 2469),
          Price = c(4.01, 4.11, 4.10, 2.63, 2.98, 2.25, 3.67, 3.80, 3.29, 5.30, 5.02, 5.57))

> df
# A tibble: 12 x 5
   Time  Geography Product Volume Price
   <chr> <chr>     <chr>    <dbl> <dbl>
 1 Week1 Dallas    Apple     1403  4.01
 2 Week2 Dallas    Apple     3514  4.11
 3 Week3 Dallas    Apple     3388  4.1 
 4 Week1 Dallas    Orange    2284  2.63
 5 Week2 Dallas    Orange    3091  2.98
 6 Week3 Dallas    Orange    3558  2.25
 7 Week1 Houston   Apple     3199  3.67
 8 Week2 Houston   Apple     2521  3.8 
 9 Week3 Houston   Apple     3381  3.29
10 Week1 Houston   Orange    2127  5.3 
11 Week2 Houston   Orange    2383  5.02
12 Week3 Houston   Orange    2469  5.57

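# gather(): stack the Volume and Price columns into key/value pairs
# unite(): build column names such as "Apple_Price" from Product and key
# spread(): turn each of those names into its own column
df <- df %>%
  gather(key, value, -(Time:Product)) %>%
  unite(temp, Product, key) %>%
  spread(temp, value)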

> df
# A tibble: 6 x 6
  Time  Geography Apple_Price Apple_Volume Orange_Price Orange_Volume
  <chr> <chr>           <dbl>        <dbl>        <dbl>         <dbl>
1 Week1 Dallas           4.01         1403         2.63          2284
2 Week1 Houston          3.67         3199         5.3           2127
3 Week2 Dallas           4.11         3514         2.98          3091
4 Week2 Houston          3.8          2521         5.02          2383
5 Week3 Dallas           4.1          3388         2.25          3558
6 Week3 Houston          3.29         3381         5.57          2469
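Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below should produce the same wide table with a single pivot_wider() call; names_glue is used so the column names follow the Product_Variable pattern shown above. The row order may differ from the spread() output. The lm() call at the end is only a hypothetical illustration of how the new price columns could enter a regression; that model specification is an assumption, not something stated in the question.

```r
library(tibble)
library(tidyr)

df <- tibble(
  Time = rep(c("Week1", "Week2", "Week3"), 4),
  Geography = c(rep("Dallas", 6), rep("Houston", 6)),
  Product = c(rep("Apple", 3), rep("Orange", 3), rep("Apple", 3), rep("Orange", 3)),
  Volume = c(1403, 3514, 3388, 2284, 3091, 3558, 3199, 2521, 3381, 2127, 2383, 2469),
  Price = c(4.01, 4.11, 4.10, 2.63, 2.98, 2.25, 3.67, 3.80, 3.29, 5.30, 5.02, 5.57)
)

# One row per Time x Geography; each Product gets its own Volume and Price column.
# names_glue builds names like "Apple_Price" instead of the default "Price_Apple".
wide <- pivot_wider(
  df,
  names_from = Product,
  values_from = c(Volume, Price),
  names_glue = "{Product}_{.value}"
)
print(wide)

# Hypothetical model: Apple volume explained by its own price and the Orange price.
fit <- lm(Apple_Volume ~ Apple_Price + Orange_Price, data = wide)
print(coef(fit))
```

Because every product's price is now a column on the same row, any of them can be used as a predictor for any product's volume.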

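P.S.: Next time, please paste a sample of your data into the question as text (not as an image). That helps others reproduce the problem and solve it faster.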
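One common way to share data as text in R is dput(), which prints R code that recreates the object. A minimal sketch, using just the first two rows of the example data above as a reduced sample (the name sample_df is illustrative):

```r
library(tibble)

sample_df <- tibble(
  Time = c("Week1", "Week2"),
  Geography = c("Dallas", "Dallas"),
  Product = c("Apple", "Apple"),
  Volume = c(1403, 3514),
  Price = c(4.01, 4.11)
)

# Prints a structure(list(...), ...) expression that can be pasted into a question
dput(sample_df)

# Sanity check: evaluating the printed text rebuilds the same data
dput_text <- paste(capture.output(dput(sample_df)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
```

For a larger data frame, dput(head(df)) keeps the pasted sample short.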

【Discussion】:
