Convert Excel subtotal rows to columns in R data frames

【Question title】: Convert Excel subtotal rows to columns in R data frames
【Posted】: 2015-04-15 08:27:10
【Question】:

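I'm trying to read into R an Excel spreadsheet that contains time-entry rows grouped by employee. With the groups expanded it looks like this (commas are used here to separate the columns):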

Column A    Column B

Alice

2015-01-01  8
2015-01-02  7.5
2015-01-03  6

Bob

2015-01-02  6
2015-01-03  8

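I can read the spreadsheet into a data frame with the xlsx::read.xlsx2 function, but I can't figure out how to turn the subtotal rows into a column, so that the data frame looks like this: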

Alice   2015-01-01  8
Alice   2015-01-02  7.5
Alice   2015-01-03  6
Bob     2015-01-02  6
Bob     2015-01-03  8

I tried looking at reshape and dplyr, but I can't tell whether they can help here. Can anyone point me in the right direction?

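【Question comments】:

What do you mean by subtotal? The names?
Could you format the initial table with tabs so the columns can be told apart? I believe tidyr's gather is a good fit for this.
Does each of Alice, Bob, etc. have the same number of time entries?
The question "How to pivot/unpivot (cast/melt) data frame?" may be useful.
Are the dates in the same column as Alice? If not, look at na.locf in the zoo package.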
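
The last comment points at "last observation carried forward", which is what zoo's na.locf does. As a rough sketch of that idea in base R only (so it runs without extra packages), assuming the file has no header row, the first row is a name, and name rows have nothing in the second column; the inlined sample data and the names `name_or_na`, `locf` and `res` are made up for illustration:

```r
# Sample rows inlined so the sketch runs on its own (layout is an assumption).
d <- read.csv(text = c("Alice", "2015-01-01,8", "2015-01-02,7.5",
                       "2015-01-03,6", "Bob", "2015-01-02,6",
                       "2015-01-03,8"),
              header = FALSE, stringsAsFactors = FALSE)

# Name rows are the ones with nothing in the second column.
is_name <- is.na(d$V2)

# Keep the name on name rows, NA everywhere else.
name_or_na <- ifelse(is_name, d$V1, NA_character_)

# Carry the last seen name forward over the NA rows below it.
# zoo::na.locf(name_or_na) is the packaged way to do this step.
locf <- function(prev, cur) if (is.na(cur)) prev else cur
d$Name <- Reduce(locf, name_or_na, accumulate = TRUE)

# Drop the name rows and tidy up the column names.
res <- d[!is_name, c("Name", "V1", "V2")]
names(res) <- c("Name", "Date", "val")
rownames(res) <- NULL
res
```

If the first row is not a name row, the carried-forward column starts with NA, so it is worth checking the input before relying on this.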

Tags: r excel

【Solution 1】:

This may help:

library(dplyr)
library(tidyr)
#read the file using `readLines`
lines <- readLines('file.csv')
#remove the empty elements
lines1 <- lines[lines!='']
#create a grouping index based on the occurrence of non-numeric elements 
indx <- cumsum(grepl('^[A-Za-z]', lines1))
#create another index for finding the position of non-numeric element 
indx1 <- grep('^[A-Za-z]', lines1)
#split the lines based on the grouping index
lst <- setNames(split(lines1[-indx1], indx[-indx1]), lines1[indx1])
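#current tidyr no longer unnests a bare list, so first turn the named list
#into a data frame with tibble::enframe, then unnest the list column `x`
#and split `x` into two columns
tibble::enframe(lst, name='Name', value='x') %>% 
           unnest(x) %>% 
           extract(x, c('Date', 'val'), '(.*),(.*)', convert=TRUE)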
#   Name       Date val
#1 Alice 2015-01-01   8
#2 Alice 2015-01-02 7.5
#3 Alice 2015-01-03   6
#4   Bob 2015-01-02   6
#5   Bob 2015-01-03   8
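
To make the two indices in that answer easier to follow, here is a small base R sketch that prints them for the sample lines and then builds the same three columns without dplyr or tidyr. It assumes the same layout as above (a name line followed by `date,value` lines, no header line); `is_name`, `parts` and `res` are names introduced only for this illustration:

```r
# The non-empty lines of the sample file, inlined so the sketch runs alone.
lines1 <- c("Alice", "2015-01-01,8", "2015-01-02,7.5", "2015-01-03,6",
            "Bob", "2015-01-02,6", "2015-01-03,8")

is_name <- grepl("^[A-Za-z]", lines1)  # TRUE on the name lines
indx  <- cumsum(is_name)               # group id per line: 1 1 1 1 2 2 2
indx1 <- which(is_name)                # positions of the name lines: 1 5

# Drop the name lines, split the rest by group id, label groups by name.
lst <- setNames(split(lines1[-indx1], indx[-indx1]), lines1[indx1])

# Base R stand-in for the unnest/extract step: split each line on the comma.
parts <- strsplit(unlist(lst, use.names = FALSE), ",", fixed = TRUE)
res <- data.frame(Name = rep(names(lst), lengths(lst)),
                  Date = sapply(parts, `[`, 1),
                  val  = as.numeric(sapply(parts, `[`, 2)),
                  stringsAsFactors = FALSE)
res
```

Note that the grouping relies on every name having at least one data line; a name with no entries would shift the labels.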

Or you can use base R.

#read the data using `read.csv` or `read.xlsx2`.  Here `,` is the delimiter
d1 <- read.csv('file.csv', header=FALSE, stringsAsFactors=FALSE)
#second column `V2` will have `NAs` for corresponding words in `V1`
indx <- is.na(d1$V2)
#subset the dataset by removing the `NA` rows 
d2 <- d1[!indx,]
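#repeat each name over its whole group (the name row plus its data rows),
#then drop the first element of each group (the name row itself) with
#`tail(x, -1)` so that the names line up with the rows kept in `d2`
d2$names <-  unlist(tapply(rep(d1$V1[indx], tabulate(cumsum(indx))), 
             cumsum(indx), FUN=tail,-1), use.names=FALSE)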
d2
#         V1  V2 names
#2 2015-01-01 8.0 Alice
#3 2015-01-02 7.5 Alice
#4 2015-01-03 6.0 Alice
#6 2015-01-02 6.0   Bob
#7 2015-01-03 8.0   Bob
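
Depending on how the sheet is read, the empty cells on the name rows may come back as NA or as empty strings, in which case `is.na(d1$V2)` alone would miss them. A possible way to cover both cases is sketched below; `subtotals_to_column` is a hypothetical helper written for this post, not a function from xlsx or any other package, and it assumes the first row is a name row:

```r
# Hypothetical helper: treats a row as a name row when its value cell is
# NA or blank, then labels every data row with the name above it.
subtotals_to_column <- function(df, key = 1, value = 2) {
  v <- df[[value]]
  is_name <- is.na(v) | trimws(as.character(v)) == ""
  stopifnot(is_name[1])  # assumption: the data starts with a name row

  out <- df[!is_name, , drop = FALSE]
  # cumsum(is_name) numbers the groups; index the names with it.
  out$names <- as.character(df[[key]])[is_name][cumsum(is_name)[!is_name]]
  rownames(out) <- NULL
  out
}

# Made-up input where every cell was read as character.
d1 <- data.frame(V1 = c("Alice", "2015-01-01", "2015-01-02",
                        "Bob", "2015-01-02"),
                 V2 = c("", "8", "7.5", "", "6"),
                 stringsAsFactors = FALSE)
res <- subtotals_to_column(d1)
res
```

The value column stays character in this made-up input, so an `as.numeric()` on it would still be needed afterwards.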

【Comments】: