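[Posted]: 2019-10-03 03:06:13
[Question]:
I'm having trouble cleaning a dataset in R. The dataset has three variables (name, Day, Data). The third variable actually holds all of my data, but it needs to be parsed: I need to split this column into multiple columns based on the values it contains. For example, take the following data frame: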
x <- data.frame("name" = c("John","John","John","John","John","Sarah","Sarah","Sarah"), "Day" = c(1,1,1,1,1,2,2,2), "Data" = c("Map 28", 2,3,"Transfer","Time","Map 18",2,3))
which looks like:
name Day Data
1 John 1 Map 28
2 John 1 2
3 John 1 3
4 John 1 Transfer
5 John 1 Time
6 Sarah 2 Map 18
7 Sarah 2 2
8 Sarah 2 3
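I need to go through the Data column, find every row where the word "Map" appears, and then turn all the values below it into additional columns. Like this: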
name Day Data Val1 Val2 Val3 Val4
1 John 1 Map 28 2 3 Transfer Time
2 Sarah 2 Map 18 2 3 <NA> <NA>
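Any help with this would be greatly appreciated!
[Edit]
Sorry, I think my example was too simple... The problem is that each person has several "Map" values per day that need to be located, so the data looks more like this: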
x <- data.frame("name" = c("John","John","John","John","John","John","John","John","John","John","John","John","Sarah","Sarah","Sarah","Sarah","Sarah","Sarah"),
"Day" = c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
"Data" = c("Map 28", 2,3,"Transfer","Time","Map 15",2,3,"Text","Map3",2,4,"Map 18",2,3,"Map 22",2,3))
name Day Data
1 John 1 Map 28
2 John 1 2
3 John 1 3
4 John 1 Transfer
5 John 1 Time
6 John 1 Map 15
7 John 1 2
8 John 1 3
9 John 1 Text
10 John 1 Map3
11 John 1 2
12 John 1 4
13 Sarah 2 Map 18
14 Sarah 2 2
15 Sarah 2 3
16 Sarah 2 Map 22
17 Sarah 2 2
18 Sarah 2 3
The final output would then be...
y <- data.frame("name" = c("John","John","John","Sarah", "Sarah"),
"Day" =c(1,1,1,2,2),
"Data"= c("Map 28","Map 15","Map 3","Map 18","Map 22"),
"Val1" =c(2,2,2,2,2),
"Val2"=c(3,3,4,3,3),
"Val3"=c("Transfer","Text",NA,NA,NA),
"Val4"=c("Time",NA,NA,NA,NA))
name Day Data Val1 Val2 Val3 Val4
1 John 1 Map 28 2 3 Transfer Time
2 John 1 Map 15 2 3 Text <NA>
3 John 1 Map 3 2 4 <NA> <NA>
4 Sarah 2 Map 18 2 3 <NA> <NA>
5 Sarah 2 Map 22 2 3 <NA> <NA>
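One way the transformation described above could be sketched in base R is to treat every row whose Data starts with "Map" as the start of a new block, number the blocks with cumsum(), and pad each block's trailing values with NA. This is only a minimal sketch under a few assumptions that are not stated in the post: the first row of the data is a Map row, Data is stored as character (hence stringsAsFactors = FALSE), and every Map label begins with the literal text "Map". The helper names (is_map, block, vals, width, val_mat) are made up for illustration. Note that the sample data contains "Map3" while the expected output shows "Map 3"; the sketch leaves the label exactly as it appears in the data.

```r
# Rebuild the edited example; stringsAsFactors = FALSE keeps Data as character.
x <- data.frame(
  name = c(rep("John", 12), rep("Sarah", 6)),
  Day  = c(rep(1, 12), rep(2, 6)),
  Data = c("Map 28", 2, 3, "Transfer", "Time", "Map 15", 2, 3, "Text",
           "Map3", 2, 4, "Map 18", 2, 3, "Map 22", 2, 3),
  stringsAsFactors = FALSE
)

# Rows whose Data starts with "Map" open a new block (assumption: row 1 is one).
is_map <- grepl("^Map", x$Data)
block  <- cumsum(is_map)

# For each block, keep the values that follow its Map row.
vals <- lapply(unname(split(x$Data, block)), function(v) v[-1])

# Indexing past the end of a character vector yields NA, which pads
# shorter blocks up to the width of the longest one.
width   <- max(lengths(vals))
val_mat <- do.call(rbind, lapply(vals, function(v) v[seq_len(width)]))
colnames(val_mat) <- paste0("Val", seq_len(width))

# One output row per Map row, with its padded values alongside.
y <- data.frame(x[is_map, ], val_mat,
                row.names = NULL, stringsAsFactors = FALSE)
print(y)
```

Because Data is a character column, Val1 and Val2 come out as character as well; if numeric columns are wanted, something like type.convert(y, as.is = TRUE) could be applied afterwards.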
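[Comments]:
- Are you sure you need anything this fancy? This looks like a long-to-wide reshape, using name and Day as the identifiers for the rows in the output dataset.
- Thanks for the quick reply. I think I oversimplified the problem. Each person has multiple Map values per day, and the amount of data between Map values varies. I've added an edit to my original post. Sorry for the confusion, and thanks again.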
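The long-to-wide reshape suggested in the first comment can still be made to work for the edited data if a block identifier is added, since name and Day alone no longer identify an output row. The following is a rough sketch using base R's reshape(), not necessarily what the commenter had in mind. The columns block, Map and pos are hypothetical helpers introduced here, and the sketch assumes the first row is a Map row and that every Map row is followed by at least one value (a Map row with no values would be dropped).

```r
# Edited example data; Data is kept as character.
x <- data.frame(
  name = c(rep("John", 12), rep("Sarah", 6)),
  Day  = c(rep(1, 12), rep(2, 6)),
  Data = c("Map 28", 2, 3, "Transfer", "Time", "Map 15", 2, 3, "Text",
           "Map3", 2, 4, "Map 18", 2, 3, "Map 22", 2, 3),
  stringsAsFactors = FALSE
)

is_map  <- grepl("^Map", x$Data)
x$block <- cumsum(is_map)              # block id: increments at each Map row
x$Map   <- x$Data[is_map][x$block]     # Map label repeated down its block

# Long format: only the value rows, numbered 1, 2, ... within each block.
long     <- x[!is_map, ]
long$pos <- ave(seq_along(long$block), long$block, FUN = seq_along)

# Long-to-wide: one row per block, one Data<pos> column per position.
wide <- reshape(long, direction = "wide",
                idvar = c("name", "Day", "block", "Map"),
                timevar = "pos", v.names = "Data", sep = "")

# Tidy up: drop the helper id and rename columns to match the desired output.
val_cols <- grep("^Data[0-9]+$", names(wide), value = TRUE)
wide <- wide[, c("name", "Day", "Map", val_cols)]
names(wide) <- c("name", "Day", "Data", sub("^Data", "Val", val_cols))
rownames(wide) <- NULL
print(wide)
```

Positions that a block does not have are filled with NA by reshape(), which gives the ragged Val3 and Val4 columns shown in the desired output.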
Tags: r