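[Posted]: 2021-02-18 14:49:01
[Question]:
I have survey data with 5 main variable groups: f1_, f2_, f3_, f4_ and f5_. Each f*_ group has up to 10 sub-variables, for example f1_1, f1_2, f1_3, ... or f2_1, f2_2, ..., f2_10.
I want to perform a pivot_longer to reshape my data frame for analysis. I am an R user and did it as shown here; I would like to know how to achieve the same output in Python with pandas.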
library(tidyverse)
library(janitor)  # provides tabyl() and adorn_totals()

df %>%
# Reshape data - to long
pivot_longer(cols = all_of(ends_with(c("1","2","3", "4" ,"5"))), names_to = c("name", "check_id"), names_pattern = "(.*)(.)") %>%
# Reshape data - to wide
pivot_wider(names_from = name) %>%
#unnest data
unnest() %>%
# remove row if it has a NA value in both column
filter_at(.vars = vars(one_of(c("f1_", "f2_"))),~ !is.na(.)) %>%
# Crosstab 3 way
tabyl(check_id, f1_ ,f2_ ) %>%
# add total row and col
adorn_totals(c("row", "col" ))
This is the desired output:
$No
 check_id Person 1 Person 2 Person 3 Person 4 Total
        1        2        0        0        0     2
        2        0        1        0        0     1
        3        0        0        1        0     1
        4        1        0        0        1     2
    Total        3        1        1        1     6

$Yes
 check_id Person 1 Person 2 Person 3 Person 4 Total
        1        5        0        0        0     5
        2        0        5        0        0     5
        3        0        1        2        0     3
        4        0        0        0        1     1
    Total        5        6        2        1    14
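One detail worth checking before porting the pipeline: the names_pattern in the R call, "(.*)(.)", captures only the final character as check_id. That is fine for the sample data (suffixes 1 to 5), but with the full survey (up to f*_10) a two-digit suffix would be split in the wrong place. The sketch below uses Python's re module purely to illustrate the greedy split; the alternative pattern "(.*_)(\d+)" and the helper split_name are assumptions for illustration, not part of the original code.

```python
import re

# Pattern used in the R pipeline: everything, then exactly one final character.
LAST_CHAR = re.compile(r"(.*)(.)")
# Assumed alternative: everything up to the last underscore, then all trailing digits.
UNDERSCORE_DIGITS = re.compile(r"(.*_)(\d+)")


def split_name(pattern, column):
    """Return the (name, check_id) pair that `pattern` extracts from a column name."""
    match = pattern.fullmatch(column)
    return match.groups() if match else None


for column in ["f1_1", "f2_5", "f1_10"]:
    print(column, split_name(LAST_CHAR, column), split_name(UNDERSCORE_DIGITS, column))
```

For single-digit suffixes both patterns agree; for f1_10 the first pattern yields the group name "f1_1" with check_id "0", while the second keeps "f1_" and "10".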
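Python sample data
f1_ and f2_ each have 5 subgroups
import pandas as pd

# Note: missing values are stored here as the literal string "NA", not as NaN.
df = pd.DataFrame(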
{
"f1_1": ["Person 1","NA","Person 1","Person 1","Person 1","Person 1","NA","Person 1", "Person 1"],
"f1_2": ["Person 2","NA","Person 2","Person 2","Person 2","NA","NA","Person 2","Person 2"],
"f1_3": ["Person 3","NA","NA","Person 3","Person 2","NA","NA","Person 3","NA"],
"f1_4": ["Person 4","NA","NA","Person 4", "NA","NA","NA","Person 1","NA"],
"f1_5": ["NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"],
"f2_1": ["Yes", "NA", "Yes", "No", "Yes", "No", "NA", "Yes", "Yes"],
"f2_2": ["Yes", "NA", "Yes", "No", "Yes", "NA", "NA", "Yes", "Yes"],
"f2_3": ["Yes", "NA", "NA", "No", "Yes", "NA", "NA", "Yes", "NA"],
"f2_4": ["Yes", "NA", "NA", "No", "NA", "NA", "NA", "No", "NA"],
"f2_5": ["NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"]
}
)
R sample data
df <- tibble::tribble(
~f1_1, ~f1_2, ~f1_3, ~f1_4, ~f1_5, ~f2_1, ~f2_2, ~f2_3, ~f2_4, ~f2_5,
"Person 1", "Person 2", "Person 3", "Person 4", NA, "Yes", "Yes", "Yes", "Yes", NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
"Person 1", "Person 2", NA, NA, NA, "Yes", "Yes", NA, NA, NA,
"Person 1", "Person 2", "Person 3", "Person 4", NA, "No", "No", "No", "No", NA,
"Person 1", "Person 2", "Person 2", NA, NA, "Yes", "Yes", "Yes", NA, NA,
"Person 1", NA, NA, NA, NA, "No", NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
"Person 1", "Person 2", "Person 3", "Person 1", NA, "Yes", "Yes", "Yes", "No", NA,
"Person 1", "Person 2", NA, NA, NA, "Yes", "Yes", NA, NA, NA
)
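For reference, here is a minimal sketch of how the same steps (long, then wide, then drop missing, then a three-way crosstab with totals) might look in pandas. It assumes the Python sample data from this question, with "NA" strings treated as missing; the names long, wide, tables and the helper column "row" are illustrative choices, and this is one possible approach rather than a definitive port of the R pipeline.

```python
import pandas as pd

NA = "NA"
P1, P2, P3, P4 = "Person 1", "Person 2", "Person 3", "Person 4"
df = pd.DataFrame({
    "f1_1": [P1, NA, P1, P1, P1, P1, NA, P1, P1],
    "f1_2": [P2, NA, P2, P2, P2, NA, NA, P2, P2],
    "f1_3": [P3, NA, NA, P3, P2, NA, NA, P3, NA],
    "f1_4": [P4, NA, NA, P4, NA, NA, NA, P1, NA],
    "f1_5": [NA] * 9,
    "f2_1": ["Yes", NA, "Yes", "No", "Yes", "No", NA, "Yes", "Yes"],
    "f2_2": ["Yes", NA, "Yes", "No", "Yes", NA, NA, "Yes", "Yes"],
    "f2_3": ["Yes", NA, NA, "No", "Yes", NA, NA, "Yes", NA],
    "f2_4": ["Yes", NA, NA, "No", NA, NA, NA, "No", NA],
    "f2_5": [NA] * 9,
})

# Step 1 (pivot_longer): turn "NA" strings into real missing values, keep the
# original row number as an id, and melt every f*_ column into long form.
long = (
    df.mask(df.eq(NA))
      .rename_axis("row")
      .reset_index()
      .melt(id_vars="row", var_name="column")
)
# Split "f1_3" into name="f1_" and check_id="3".
long = long.join(long["column"].str.extract(r"^(?P<name>.*_)(?P<check_id>\d+)$"))

# Step 2 (pivot_wider + filter): one row per (row, check_id), one column per
# group name, dropping rows where f1_ or f2_ is missing.
wide = (
    long.pivot(index=["row", "check_id"], columns="name", values="value")
        .dropna(subset=["f1_", "f2_"])
        .reset_index()
)

# Step 3 (tabyl + adorn_totals): one check_id x f1_ table per f2_ answer,
# with a "Total" row and column.
tables = {
    answer: pd.crosstab(group["check_id"], group["f1_"],
                        margins=True, margins_name="Total")
    for answer, group in wide.groupby("f2_")
}

for answer, table in tables.items():
    print(f"${answer}")
    print(table, end="\n\n")
```

On the sample data this should give the same counts as the desired output above. Note that pd.crosstab only shows categories that occur within each f2_ group, so a table may need a reindex if some person or check_id is absent from one answer; passing a list to index in pivot also requires a reasonably recent pandas (1.1 or later).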
[Discussion]:
Tags: python r pandas melt pandas-melt