Reshape wide data in R: Converting two rows into columns

【Question title】: Reshape wide data in R: Converting two rows into columns
【Posted】: 2018-10-05 17:59:55
【Question】:

How do I transpose this dataset in R? See below.

I downloaded a dataset that looks like this (the years run from 2016 back to 1975):

           V1               V2               V3               V4               V5
1                         2016             2016             2016             2015
4     Country       Both-sexes             Male           Female       Both-sexes
5 Afghanistan 23.4 [22.0-24.8] 22.6 [20.1-25.1] 24.1 [23.0-25.3] 23.3 [21.9-24.6]
6     Albania 26.7 [25.8-27.5] 27.0 [25.8-28.2] 26.3 [25.0-27.6] 26.6 [25.8-27.4]
7     Algeria 25.5 [24.5-26.5] 24.7 [23.4-26.1] 26.4 [24.9-27.8] 25.5 [24.5-26.4]
8     Andorra 26.7 [24.6-28.7] 27.3 [24.8-29.8] 26.1 [22.8-29.5] 26.7 [24.7-28.7]

I need to turn the year and sex rows (currently numbered as rows 1 and 4) into columns. This is what I want:

      Country Year        Sex Rate
1 Afghanistan 2016 Both-sexes 23.4
2 Afghanistan 2016       Male 22.6
3 Afghanistan 2016     Female 24.1
4 Afghanistan 2015 Both-sexes 23.3

...with the rows continuing like this for every country in the dataset and for every year.

Here is what I have done so far:

library(reshape2)

cfile <- read.csv(file = "countries-BMI.csv", header = FALSE)

# remove rows 2 and 3, which hold unnecessary info
countries_data <- cfile[-c(2, 3), ]

molten_countries_data <- melt(countries_data, id.vars = "V1")

This is my result, head(molten_countries_data):

           V1 variable            value
1                   V2             2016
2     Country       V2       Both-sexes
3 Afghanistan       V2 23.4 [22.0-24.8]
4     Albania       V2 26.7 [25.8-27.5]
5     Algeria       V2 25.5 [24.5-26.5]
6     Andorra       V2 26.7 [24.6-28.7]

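Not what I want! Please help.

【Comments on the question】:

It looks like the problem is not reshaping the data but removing the text in brackets.
Also note that the year and sex are lost because they are not actually column names. It would be easier to help with a reproducible question.
If you use melt from data.table, the patterns() helper function may be useful, e.g. stackoverflow.com/q/12466493 (you will need to sort out skipping rows in read.csv so that row 4, I guess, becomes your header).
The key is to merge the first 2 rows together to create unique column names and then use tidyr's spread function. If you can post a sample of the data using the dput function, it will be easier to get help.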
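
The header-merging idea in the last comment can be tried without the original file. The following is only a minimal sketch under stated assumptions: the `lines` vector is hypothetical sample text standing in for countries-BMI.csv (year row first, then the Country/sex row), and it uses tidyr's pivot_longer (tidyr 1.0.0 or later) rather than the spread function named in the comment, because going from wide to long is the lengthening direction.

```r
library(tidyr)

# Hypothetical sample standing in for countries-BMI.csv:
# line 1 holds the years, line 2 holds "Country" plus the sex labels.
lines <- c(
  ",2016,2016,2016,2015",
  "Country,Both-sexes,Male,Female,Both-sexes",
  "Afghanistan,23.4 [22.0-24.8],22.6 [20.1-25.1],24.1 [23.0-25.3],23.3 [21.9-24.6]",
  "Albania,26.7 [25.8-27.5],27.0 [25.8-28.2],26.3 [25.0-27.6],26.6 [25.8-27.4]"
)

# Read the two header lines as text and paste each column's pair together,
# e.g. "2016" and "Male" become "2016_Male".
headers <- read.csv(text = lines[1:2], header = FALSE, colClasses = "character")
col_names <- unname(vapply(headers, paste, character(1), collapse = "_"))
col_names[1] <- "Country"  # the first column would otherwise be "_Country"

# Read the data lines and attach the combined names.
cdata <- read.csv(text = lines[-(1:2)], header = FALSE, stringsAsFactors = FALSE)
names(cdata) <- col_names

# Lengthen: every column except Country is split at "_" into Year and Sex.
long <- pivot_longer(
  cdata,
  cols = -Country,
  names_to = c("Year", "Sex"),
  names_sep = "_",
  values_to = "value"
)

print(long)
```

In this sketch Year stays a character column and value still carries the bracketed interval; both would need a further cleaning step.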

Tags: r dataframe reshape reshape2 melt

【Solution 1】:

Thanks to @Dave2e's tip, I merged the first 2 rows first. Here is what I ended up doing:

library(reshape2)
library(tidyr)

# load the data frame without the first two rows
cdata <- read.csv("countries-BMI.csv", skip = 2, header = FALSE)

# create the header by combining the top two rows
headers <- read.csv("countries-BMI.csv", nrows = 2, header = FALSE)
headers_names <- sapply(headers, paste, collapse = "_")

# add the new header to the data frame
names(cdata) <- headers_names

# reshape the "wide data" to make it tidy/long
longdata <- melt(cdata, id.vars = c("_Country"))

# separate the year and sex columns
countriesBMI2 <- separate(data = longdata, col = variable, into = c("Year", "Sex"), sep = "_")

My result, head(countriesBMI2):

             _Country Year        Sex            value
1         Afghanistan 2016 Both-sexes 23.4 [22.0-24.8]
2             Albania 2016 Both-sexes 26.7 [25.8-27.5]
3             Algeria 2016 Both-sexes 25.5 [24.5-26.5]
4             Andorra 2016 Both-sexes 26.7 [24.6-28.7]
5              Angola 2016 Both-sexes 23.3 [21.2-25.6]
6 Antigua and Barbuda 2016 Both-sexes 26.7 [24.6-28.8]
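
The value column above still contains the bracketed interval, whereas the question asked for a plain Rate. One way to finish the job, shown here only as a sketch in base R: the small `long` data frame is hypothetical and simply copies two rows of the result above (with the column named Country instead of _Country), and the regular expression assumes every cell has the form "estimate [low-high]".

```r
# Hypothetical rows copied from the result above.
long <- data.frame(
  Country = c("Afghanistan", "Albania"),
  Year    = c("2016", "2016"),
  Sex     = c("Both-sexes", "Both-sexes"),
  value   = c("23.4 [22.0-24.8]", "26.7 [25.8-27.5]"),
  stringsAsFactors = FALSE
)

# Remove the bracketed interval and any whitespace before it,
# then convert the remaining point estimate to a number.
long$Rate <- as.numeric(sub("\\s*\\[.*$", "", long$value))

# Store the year as an integer and drop the original text column.
long$Year <- as.integer(long$Year)
long$value <- NULL

print(long)
```

Cells that do not start with a number would come out as NA (with a warning) under this approach, so it is worth checking for NA values afterwards.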

【Comments】: