How do I add a column to my data table that shows the sum of multiple other columns' values?

【Question title】: How do I add a column to my data table that shows the sum of multiple other columns' values?
【Posted】: 2019-10-13 00:12:19
【Question】:

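I have 8 age categories, each with its own column (i.e. residents_under_5, residents_6_to_12, etc.). Each column holds a value between 0 and 3: the number of people in that household who fall into that age category. What I want is a new column that I can use to plot the overall age distribution of the population as a histogram. So I'm thinking of a column containing 66 rows of residents_under_5, 32 rows of residents_6_to_12, and so on, matching the totals of those categories.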

My data looks like this:

What I want is a column that shows:

e
a
a
a
a
b
b
b
b
b
c
c
c
d
d
d

i.e. each category repeated as many times as it occurs in total across the other columns.

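I tried declaring the new column with sum(residents_under_5), but that gives me a single row containing 66 (the total for that category). I can't plot a histogram from a column like that. I hope someone can figure this out!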

Here is the dput() of the relevant columns:

residents_under_5 = c(0, 0, 0, 1, 1, 2),
residents_6_to_12 = c(0, 0, 0, 0, 0, 0),
residents_13_to_18 = c(0, 0, 0, 0, 0, 0),
residents_19_to_24 = c(0, 0, 0, 0, 0, 0),
residents_25_to_34 = c(0, 1, 2, 0, 1, 0),
residents_35_to_49 = c(0, 0, 0, 2, 1, 2),
residents_50_to_64 = c(0, 1, 0, 0, 0, 0),
residents_65_and_older = c(2, 0, 0, 0, 1, 0)

【Question discussion】:

Something like this: stackoverflow.com/questions/31461357/…?
I just provided some data that shows my problem.
Can you provide a dput() of your underlying dataset?
Provided, @RandallHelms

Tags: r

【Solution 1】:

You can compute each column's total with colSums and then use rep to repeat one letter per column that many times.

rep(letters[seq_len(ncol(df))], colSums(df))

Data

df <- data.frame(residents_under_5 = c(0, 0, 0, 1, 1, 2), 
                 residents_6_to_12 = c(0, 0, 0, 0, 0, 0), 
                 residents_13_to_18 = c(0, 0, 0, 0, 0, 0), 
                 residents_19_to_24 = c(0, 0, 0, 0, 0, 0), 
                 residents_25_to_34 = c(0, 1, 2, 0, 1, 0), 
                 residents_35_to_49 = c(0, 0, 0, 2, 1, 2), 
                 residents_50_to_64 = c(0, 1, 0, 0, 0, 0), 
                 residents_65_and_older = c(2, 0, 0, 0, 1, 0))
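A variant of the same idea, sketched under the assumption that the column names make better labels than `letters`: wrapping the result in `factor()` with `levels = names(df)` keeps the age groups in column order and retains groups whose total is 0. A plain character vector would lose both (zero-count groups never appear, and `table()` sorts the names alphabetically). The data below is just a subset of the question's columns.

```r
# A subset of the question's columns (6 households), including one all-zero group
df <- data.frame(residents_under_5      = c(0, 0, 0, 1, 1, 2),
                 residents_6_to_12      = c(0, 0, 0, 0, 0, 0),
                 residents_25_to_34     = c(0, 1, 2, 0, 1, 0),
                 residents_65_and_older = c(2, 0, 0, 0, 1, 0))

# colSums() gives one total per column; rep() repeats each column name that many times
ages <- factor(rep(names(df), colSums(df)), levels = names(df))

# Tabulating the factor recovers the totals, including the zero-count group
table(ages)
```

Because the levels are fixed, `table(ages)` round-trips back to the column totals, and a bar chart of `ages` keeps the empty 6-to-12 group visible in its natural position.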

【Discussion】:

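Does this also work for a data table? I have many more columns than these 4; how do I specify only the relevant ones? Do I just write "names", or what should I put there? Thanks!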
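@BenGill This takes all columns into account, whether there are 4 or 100. If you want to ignore some of them, subset those columns first. For example, to ignore the first column we might do rep(names(df)[-1], colSums(df[-1]))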
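Apparently it doesn't work because my dataset contains negative values (Error: invalid 'times' argument)
@BenGill I created a new data frame from the dput of your relevant columns and updated the answer. Can you check now?
It gives me 5000 rows of "a"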
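The comments above ask how to restrict the approach to the relevant columns of a wider data frame, and report an `invalid 'times' argument` error from negative values (`rep()` rejects negative or `NA` counts). A minimal sketch, assuming all the age columns share the `residents_` prefix, and assuming (a guess; check the dataset's codebook) that negative values are placeholder codes for missing data. The `wide` data frame and its `household_id` column are hypothetical.

```r
# Hypothetical wider data frame: one non-age column plus a negative code
wide <- data.frame(household_id      = 1:3,
                   residents_under_5 = c(1, 0, -1),
                   residents_6_to_12 = c(0, 2, 1))

# Keep only the age columns, selected by their shared name prefix
age_cols <- wide[, startsWith(names(wide), "residents_"), drop = FALSE]

# Assumption: negative values mean "missing", so recode them as NA
age_cols[age_cols < 0] <- NA

# na.rm = TRUE skips the NAs, so rep() only ever sees non-negative counts
totals <- colSums(age_cols, na.rm = TRUE)
ages <- factor(rep(names(age_cols), totals), levels = names(age_cols))
table(ages)
```

If the negative values turn out to mean something else, replace the recoding step accordingly; the column selection and the `colSums()`/`rep()` expansion stay the same.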

【Solution 2】:

One option in the tidyverse is to take the sum of every column with summarise_all, reshape the result into "long" format with gather, and then uncount on the "value" column:

library(tidyverse)
df1 %>% 
   summarise_all(sum) %>%
   gather %>% 
   uncount(value)
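`summarise_all()` and `gather()` still work but have since been superseded in dplyr and tidyr. A roughly equivalent sketch using their replacements, assuming dplyr >= 1.0 and tidyr >= 1.0 (for `across()` and `pivot_longer()`); `names_to = "key"` is chosen only to match the column name `gather()` produces.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(a = 0:3, b = c(3L, 3L, 0L, 1L),
                  c = c(2L, 2L, 2L, 0L), d = c(1L, 1L, 1L, 0L))

long <- df1 %>%
  summarise(across(everything(), sum)) %>%          # one row of column totals
  pivot_longer(everything(), names_to = "key") %>%  # columns "key" and "value"
  uncount(value)                                    # repeat each key `value` times

long
```

`uncount()` drops the weight column by default, so `long` ends up with a single `key` column of 22 rows.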

Data

df1 <- structure(list(a = 0:3, b = c(3L, 3L, 0L, 1L), c = c(2L, 2L, 
2L, 0L), d = c(1L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
  -4L))
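Since the question's end goal is a histogram of the categories, one further option (a sketch, not part of either answer) is to skip the expansion entirely and plot the column totals as a bar chart. For discrete categories this gives the same bar heights as `barplot(table(...))` on the expanded column.

```r
df1 <- data.frame(a = 0:3, b = c(3L, 3L, 0L, 1L),
                  c = c(2L, 2L, 2L, 0L), d = c(1L, 1L, 1L, 0L))

totals <- colSums(df1)  # a = 6, b = 7, c = 6, d = 3

# barplot() uses the vector's names as bar labels and returns the bar midpoints
mids <- barplot(totals, xlab = "Category", ylab = "Count")
```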

【Discussion】: