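[Posted]: 2021-08-02 20:23:48
[Question]:
I have two data frames: one holds the survey data (data.csv) and the other holds the label data (label.csv). Here is some sample data (my original data has about 150 variables):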
#sample data
df <- tibble::tribble(
~id, ~House_member, ~dob, ~age_quota, ~work, ~sex, ~pss,
1L, 4L, 1983L, 2L, 2L, 1, 1,
2L, 1L, 1940L, 7L, 2L, 1, 2,
3L, 2L, 1951L, 5L, 6L, 1, 1,
4L, 4L, 1965L, 2L, 2L, 1, 4,
5L, 3L, 1965L, 2L, 3L, 1, 1,
6L, 1L, 1951L, 3L, 1L, 1, 3,
7L, 1L, 1955L, 1L, 1L, 1, 3,
8L, 4L, 1982L, 2L, 2L, 2, 5,
9L, 2L, 1990L, 2L, 4L, 2, 3,
10L, 2L, 1953L, 3L, 2L, 2, 4
)
#sample label data
label <- tibble::tribble(
~variable, ~value, ~label,
"House_member", NA, "How many people live with you?",
"House_member", 1L, "1 person",
"House_member", 2L, "2 persons",
"House_member", 3L, "3 persons",
"House_member", 4L, "4 persons",
"House_member", 5L, "5 persons",
"House_member", 6L, "6 persons",
"House_member", 7L, "7 persons",
"House_member", 8L, "8 persons",
"House_member", 9L, "9 persons",
"House_member", 10L, "10 or more",
"dob", NA, "date of birth",
"age_quota", NA, "age_quota",
"age_quota", 1L, "10-14",
"age_quota", 2L, "15-19",
"age_quota", 3L, "20-29",
"age_quota", 4L, "30-39",
"age_quota", 5L, "40-49",
"age_quota", 6L, "50-70",
"age_quota", 7L, "70 +",
"work", NA, "what is your occupation?",
"work", 1L, "full time",
"work", 2L, "part time",
"work", 3L, "retired",
"work", 4L, "student",
"work", 5L, "housewife",
"work", 6L, "unemployed",
"work", 7L, "other",
"work", 8L, "kid under 15",
"sex", NA, "gender?",
"sex", 1L, "Man",
"sex", 2L, "Woman",
"pss", NA, "How often do you use PS?",
"pss", 1L, "Daily",
"pss", 2L, "several times per week",
"pss", 3L, "once per week",
"pss", 4L, "several times per month",
"pss", 5L, "Rarely"
)
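Is there a way to combine these two files into a single labelled data frame, in the style SPSS uses (the dbl+lbl format)? I know the labelled package can add value labels to an unlabelled vector, as in this example: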
v <- labelled::labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, maybe = 2, no = 3))
I am hoping there is a better and faster way than adding labels to each variable one by one.
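To make the goal concrete, here is a minimal sketch of the kind of loop I have in mind, not a definitive solution. It assumes that rows of the label table with `value = NA` carry the variable (question) label and that all other rows carry value labels. The helper name `apply_labels` and the reduced tables `df_small` and `label_small` are hypothetical and only stand in for the full data above.

```r
library(labelled)

# Hypothetical helper: attach value labels and variable labels from a
# long-format dictionary (columns: variable, value, label) to a data frame.
apply_labels <- function(data, dict) {
  for (v in intersect(names(data), unique(dict$variable))) {
    rows <- dict[dict$variable == v, ]

    # Rows with a non-missing value are treated as value labels.
    vals <- rows[!is.na(rows$value), ]
    if (nrow(vals) > 0) {
      labs <- stats::setNames(vals$value, vals$label)
      # Match the storage type of the column (integer vs. double).
      storage.mode(labs) <- storage.mode(data[[v]])
      data[[v]] <- labelled(data[[v]], labels = labs)
    }

    # Assumption: the row with value = NA holds the variable label.
    question <- rows$label[is.na(rows$value)]
    if (length(question) == 1) {
      var_label(data[[v]]) <- question
    }
  }
  data
}

# Reduced stand-ins for the sample data above.
df_small <- tibble::tribble(
  ~id, ~dob, ~sex, ~pss,
  1L, 1983L, 1, 1,
  8L, 1982L, 2, 3
)
label_small <- tibble::tribble(
  ~variable, ~value, ~label,
  "dob", NA, "date of birth",
  "sex", NA, "gender?",
  "sex", 1L, "Man",
  "sex", 2L, "Woman",
  "pss", NA, "How often do you use PS?",
  "pss", 1L, "Daily",
  "pss", 3L, "once per week"
)

out <- apply_labels(df_small, label_small)
```

Columns that do not appear in the label table (such as `id`) are left untouched, and a variable that has only a question label (such as `dob`) receives a variable label but stays an ordinary vector. With about 150 variables I would still prefer something more idiomatic than an explicit loop, if it exists.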
[Comments]:
Tags: r dataframe tidyverse purrr r-haven