【Posted】: 2022-01-20 13:54:29
【Question】:
ID | What color is this item? | What color is this item?_2 | What is the shape of this item? | What is the shape of this item?_2 | size
55 | red                      | blue                       | circle                          | triangle                          | small
83 | blue                     | yellow                     | circle                          | NA                                | large
78 | red                      | yellow                     | square                          | circle                            | large
43 | green                    | NA                         | square                          | circle                            | small
29 | yellow                   | green                      | circle                          | triangle                          | medium
I would like a frequency table like this:
Variable  Level     Freq  Percent
color     blue      2     22.22
          red       2     22.22
          yellow    3     33.33
          green     2     22.22
          total     9     100.00
shape     circle    5     50.0
          triangle  3     30.0
          square    2     20.0
          total     10    100.0
size      small     2     33.3
          medium    2     33.3
          large     2     33.3
          total     6     100.0
But when I try to pivot to long format, I can't match the column names, because they are long strings. From a previous question, I know I can do this:
library(tidyverse)  # dplyr, tidyr, stringr
library(janitor)    # adorn_totals()

options(digits = 3)
df1 <- df2 %>%
  pivot_longer(
    -ID,
    names_to = "Question",
    values_to = "Response"
  ) %>%
  # the pattern below is the part I cannot work out
  mutate(Question = str_extract(Question, '')) %>%
  group_by(Question, Response) %>%
  count(Response, name = "Freq") %>%
  na.omit() %>%
  group_by(Question) %>%
  mutate(Percent = Freq/sum(Freq)*100) %>%
  group_split() %>%
  adorn_totals() %>%
  bind_rows() %>%
  mutate(Response = ifelse(Response == last(Response), last(Question), Response)) %>%
  mutate(Question = ifelse(duplicated(Question) |
                             Question == "Total", NA, Question))
But I can't find the right regular expression to put in this line:
mutate(Question = str_extract(Question, '')) %>%
If anyone knows another way to do this, that would be great!
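【Comments】:
- It's not clear what you want to extract. Regarding "But I'm having trouble finding the right regular expression to put in the line:", do you want mutate(Question = str_extract(Question, "color|shape|size"))?
- Would you mind sharing your data with dput? Or at least putting quotes around the column names? The spaces make importing annoying.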
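
For illustration, here is a minimal sketch of how the pattern suggested in the first comment could slot into the pipeline. It assumes the sample data from the question, retyped by hand with tibble::tribble (backticks keep the column names that contain spaces). The names freq, totals and result are made up for this example, and the total rows are built with plain dplyr rather than the janitor::adorn_totals step used in the question, so treat it as one possible approach rather than a definitive answer.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Sample data retyped from the question (an assumption about the real df2).
df2 <- tibble::tribble(
  ~ID, ~`What color is this item?`, ~`What color is this item?_2`,
  ~`What is the shape of this item?`, ~`What is the shape of this item?_2`, ~size,
  55, "red",    "blue",   "circle", "triangle", "small",
  83, "blue",   "yellow", "circle", NA,         "large",
  78, "red",    "yellow", "square", "circle",   "large",
  43, "green",  NA,       "square", "circle",   "small",
  29, "yellow", "green",  "circle", "triangle", "medium"
)

# Long format, then reduce each long column name to the keyword it contains.
freq <- df2 %>%
  pivot_longer(-ID, names_to = "Question", values_to = "Response") %>%
  mutate(Question = str_extract(Question, "color|shape|size")) %>%
  filter(!is.na(Response)) %>%                 # drop missing answers
  count(Question, Response, name = "Freq") %>%
  group_by(Question) %>%
  mutate(Percent = Freq / sum(Freq) * 100) %>%
  ungroup()

# One "total" row per variable.
totals <- freq %>%
  group_by(Question) %>%
  summarise(Response = "total", Freq = sum(Freq), Percent = sum(Percent))

# Place each total row after the levels of its variable.
result <- bind_rows(freq, totals) %>%
  arrange(Question, Response == "total")

print(result)
```

The alternation "color|shape|size" works here because each column name contains exactly one of those three words. If the real survey has more question types, the pattern would need one alternative per type.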