【Question title】: Pivot_wider with multiple (parallel) columns
【Posted】: 2021-11-01 20:27:16
【Question】:

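Suppose I have the following data (df), which contains per-region data for companies. df is in wide format: WC19600 and WC19610 describe the location, while WC19601 and WC19611 contain the sales figures. The second-to-last digit of these variable names indicates the segment level.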

# A tibble: 2 x 6
  NAME      ISIN         WC19600       WC19610           WC19601   WC19611
  <chr>     <chr>        <chr>         <chr>               <dbl>     <dbl>
1 APPLE     US0378331005 United States Other Foreign   109197000 125010000
2 MICROSOFT US5949181045 United States Other countries  83953000  84135000

My goal is to get the data into a longer format, for example:

# A tibble: 4 x 5
  NAME      ISIN          segm region            sales
  <chr>     <chr>        <dbl> <chr>             <dbl>
1 APPLE     US0378331005     0 United States 109197000
2 APPLE     US0378331005     1 Other Foreign 125010000
3 MICROSOFT US5949181045     0 United States  83953000
4 MICROSOFT US5949181045     1 Other countries 84135000

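What I tried

I have tried the following lines, but what I actually need is to pivot only once, with the output aligned on the segment level and with two value columns (region and sales).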

df %>% 
  tidyr::pivot_longer(
    c(WC19600, WC19610),
    names_pattern = "WC196(\\d)0", 
    names_to = "segm",
    values_to = "region"
  ) %>% 
  tidyr::pivot_longer(
    c(WC19601, WC19611),
    names_pattern = "WC196(\\d)1", 
    names_to = "segm",
    values_to = "sales", 
    names_repair = "minimal"
  )

# A tibble: 8 x 6
  NAME      ISIN         segm  region          segm      sales
  <chr>     <chr>        <chr> <chr>           <chr>     <dbl>
1 APPLE     US0378331005 0     United States   0     109197000
2 APPLE     US0378331005 0     United States   1     125010000
3 APPLE     US0378331005 1     Other Foreign   0     109197000
4 APPLE     US0378331005 1     Other Foreign   1     125010000
5 MICROSOFT US5949181045 0     United States   0      83953000
6 MICROSOFT US5949181045 0     United States   1      84135000
7 MICROSOFT US5949181045 1     Other countries 0      83953000
8 MICROSOFT US5949181045 1     Other countries 1      84135000
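The eight rows come from the second pivot pairing every region row with every sales column, a 2 x 2 cross product per company. As a minimal sketch of that reasoning (not a recommended approach), the two-step version can be made to work by keeping only the rows where both segment levels agree. The column name `segm_sales` is a hypothetical helper introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~NAME,       ~ISIN,          ~WC19600,        ~WC19610,          ~WC19601,  ~WC19611,
  "APPLE",     "US0378331005", "United States", "Other Foreign",   109197000, 125010000,
  "MICROSOFT", "US5949181045", "United States", "Other countries",  83953000,  84135000
)

two_step <- df %>%
  pivot_longer(
    c(WC19600, WC19610),
    names_pattern = "WC196(\\d)0",
    names_to = "segm",
    values_to = "region"
  ) %>%
  pivot_longer(
    c(WC19601, WC19611),
    names_pattern = "WC196(\\d)1",
    names_to = "segm_sales",   # hypothetical name, avoids the duplicated `segm`
    values_to = "sales"
  ) %>%
  # keep only the pairs whose region and sales share a segment level
  filter(segm == segm_sales) %>%
  select(-segm_sales)

two_step
```

This only shows why the duplication happens; a single pivot with `.value` avoids the intermediate cross product altogether.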

Data

# input data
df <- tibble::tribble(
  ~NAME,          ~ISIN,        ~WC19600,          ~WC19610,  ~WC19601,  ~WC19611,
  "APPLE", "US0378331005", "United States",   "Other Foreign", 109197000, 125010000,
  "MICROSOFT", "US5949181045", "United States", "Other countries",  83953000,  84135000
)
# expected result
expected <- tibble::tribble(
  ~NAME, ~ISIN, ~segm, ~region, ~sales,
  "APPLE", "US0378331005", 0, "United States", 109197000,
  "APPLE", "US0378331005", 1, "Other Foreign", 125010000,
  "MICROSOFT", "US5949181045", 0, "United States", 83953000,
  "MICROSOFT", "US5949181045", 1, "Other countries", 84135000
)

【Comments】:

  • You can take advantage of the awesome feature of building a spec data frame, in which you define which column goes where. See an example here: stackoverflow.com/a/61367970/2725773
  • df %>% pivot_longer(starts_with("WC"), names_to = c("segm", ".value"), names_pattern = "(\\d)(\\d$)")

Tags: r pivot reshape tidyr
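The spec approach from the first comment could look roughly like the sketch below. It assumes a hand-written spec tibble passed to `tidyr::pivot_longer_spec()`: `.name` lists the existing columns, `.value` names the output column each one feeds, and `segm` is the extra key column. The spec contents are written out for this example and are not taken from the linked answer.

```r
library(tidyr)

df <- tibble::tribble(
  ~NAME,       ~ISIN,          ~WC19600,        ~WC19610,          ~WC19601,  ~WC19611,
  "APPLE",     "US0378331005", "United States", "Other Foreign",   109197000, 125010000,
  "MICROSOFT", "US5949181045", "United States", "Other countries",  83953000,  84135000
)

# One row per input column: where its values go and which segment it belongs to
spec <- tibble::tribble(
  ~.name,    ~.value,  ~segm,
  "WC19600", "region", 0,
  "WC19610", "region", 1,
  "WC19601", "sales",  0,
  "WC19611", "sales",  1
)

res_spec <- pivot_longer_spec(df, spec)
res_spec
```

Writing the spec by hand is more verbose than a regular expression, but it makes the column mapping explicit and does not depend on the column names following a strict pattern.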


【Solution 1】:
library(dplyr)
library(tidyr)
pivot_longer(
  df, -c(NAME, ISIN),
  names_pattern = "(.*)([0-9])$", names_to = c("segm", ".value")
) %>%
  rename(region = "0", sales= "1")
# # A tibble: 4 x 5
#   NAME      ISIN         segm   region              sales
#   <chr>     <chr>        <chr>  <chr>               <dbl>
# 1 APPLE     US0378331005 WC1960 United States   109197000
# 2 APPLE     US0378331005 WC1961 Other Foreign   125010000
# 3 MICROSOFT US5949181045 WC1960 United States    83953000
# 4 MICROSOFT US5949181045 WC1961 Other countries  84135000

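(If needed or wanted, you can add %>% mutate(segm = gsub(".*(.)$", "\\1", segm)) to clean up segm.)

【Comments】:

  • If you only want to extract the last two digits as "segm" and ".value", you can use a names pattern such as names_pattern = "WC196(\\d)(\\d)". Of course, this only works if the real columns all start with exactly the same pattern as in the example, so it may be fragile. :)
  • Good point, that's true!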
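Put together, the pivot plus the optional cleanup might look like the sketch below. The `as.numeric()` conversion is an addition made here so that `segm` matches the numeric column in the desired output; it is not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~NAME,       ~ISIN,          ~WC19600,        ~WC19610,          ~WC19601,  ~WC19611,
  "APPLE",     "US0378331005", "United States", "Other Foreign",   109197000, 125010000,
  "MICROSOFT", "US5949181045", "United States", "Other countries",  83953000,  84135000
)

res <- pivot_longer(
  df, -c(NAME, ISIN),
  names_pattern = "(.*)([0-9])$", names_to = c("segm", ".value")
) %>%
  rename(region = "0", sales = "1") %>%
  # keep only the last character of e.g. "WC1961" and convert it to a number
  mutate(segm = as.numeric(gsub(".*(.)$", "\\1", segm)))

res
```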
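The stricter pattern from the comment above can be sketched as follows. The `names_transform` argument (available in tidyr 1.1.0 and later) is an addition for this example that converts `segm` to an integer during the pivot; as the comment notes, the pattern assumes every column to pivot starts with "WC196".

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~NAME,       ~ISIN,          ~WC19600,        ~WC19610,          ~WC19601,  ~WC19611,
  "APPLE",     "US0378331005", "United States", "Other Foreign",   109197000, 125010000,
  "MICROSOFT", "US5949181045", "United States", "Other countries",  83953000,  84135000
)

res_strict <- df %>%
  pivot_longer(
    starts_with("WC"),
    names_pattern = "WC196(\\d)(\\d)",          # first digit: segment, second: variable
    names_to = c("segm", ".value"),
    names_transform = list(segm = as.integer)   # "0"/"1" become integers
  ) %>%
  rename(region = `0`, sales = `1`)

res_strict
```

With this pattern no separate cleanup of `segm` is needed, at the cost of hard-coding the "WC196" prefix.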