【Question title】:Converting row values (duplicates and different data types) to columns in R [duplicate]
【Posted】:2018-01-25 10:42:43
【Question description】:

Updated dataset:

DateTime            Object.Name    Object.Value
6/22/2017 21:11     DaHum          Normal
6/22/2017 12:59     DaHum          Alarm
6/16/2017 18:48     DaHum          Normal
6/16/2017 14:33     DaHum          Alarm
6/15/2017 18:46     DaHum          Normal
7/28/2017 8:00      ZON-1          58.56
7/28/2017 8:00      MA-H           51.66
7/28/2017 8:00      ZON-2          72.00
7/28/2017 8:00      ZON-4          70.00
7/28/2017 8:00      ZON-3          72.00
7/28/2017 7:45      PH             0.00
7/28/2017 7:45      OA             79.50
7/28/2017 7:45      SP             50.00
7/28/2017 7:45      ZON-1          32.47
7/28/2017 7:45      ZON-3          70.00
7/28/2017 7:45      CC             55.81

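Hello, I have a data frame in the following format:

I need to convert all the values under Object_Name into column names. Object_Name contains duplicate values, i.e. the same name is repeated with different timestamps.

The data type of Object_Value is alphanumeric, so when it is read into R it becomes either a factor or a character.

So, based on the timestamp, I need to convert all Object_Name row values into column names.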

  Date         Time    Object_Name    Object_Value
  7/28/2017    08:00    A1            58.56
  7/28/2017    08:00    A2            51.66
  .
  .
  .
  7/28/2017    08:30    A1            60.2
  7/28/2017    08:30    A2            65.2
  .
  .
  7/30/2017    08:30    B1            On
  7/30/2017    09:30    B1            Off

I need the output to look like this:

  Date         Time     A1        A2     B1
  7/28/2017    08:00    58.5     51.6    -
  7/28/2017    08:30    60.2     65.2    -
  7/30/2017    08:30      -        -     On
  7/30/2017    09:30      -        -     Off

Code so far:

JCI <- read.csv("JCIS2.csv",header = T, stringsAsFactors=FALSE)

JCI$Object.Value <- as.numeric(JCI$Object.Value)

library(reshape2)
JCI_Reshape <- dcast(JCI_Unique, Date...Time ~ Object.Name, value.var = "Object.Value", fun.aggregate = mean)

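【Question discussion】:

  • Just use dcast from reshape2 or data.table, or spread from tidyr.
  • I have tried dcast. If I convert Object_Value to numeric, I get NA for the non-numeric values. If I keep it as a character variable, I get all NAs. And without an aggregation function (mean) I get an error. Is there an aggregation function I can use?

Tags: r dataframe rows reshape2 col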
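On the aggregation question above: mean returns NA for character input, which is consistent with the all-NA result described. One possible workaround, shown here only as a sketch, is to keep Object_Value as character and pass dcast an aggregator that returns a single value per cell. The helper first_value and the small data frame are illustrative assumptions, not code from the thread; the data copies the question's example table.

```r
library(reshape2)

# Stand-in for the question's example table; values are kept as character
JCI <- data.frame(
  Date = c("7/28/2017", "7/28/2017", "7/28/2017", "7/28/2017",
           "7/30/2017", "7/30/2017"),
  Time = c("08:00", "08:00", "08:30", "08:30", "08:30", "09:30"),
  Object_Name = c("A1", "A2", "A1", "A2", "B1", "B1"),
  Object_Value = c("58.56", "51.66", "60.2", "65.2", "On", "Off"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first value found for each cell.
# On an empty cell it returns NA, which dcast then uses as the fill value.
first_value <- function(x) x[1]

wide <- dcast(JCI, Date + Time ~ Object_Name,
              value.var = "Object_Value",
              fun.aggregate = first_value)
print(wide)
```

If a (Date, Time, Object_Name) combination occurs more than once, this silently keeps only the first value, so it is worth checking for duplicates first.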


【Solution 1】:
library(tidyr) 
spread(JCI, Object_Name,Object_Value)
        Date  Time    A1    A2  B1
1: 7/28/2017 08:00 58.56 51.66  NA
2: 7/28/2017 08:30  60.2  65.2  NA
3: 7/30/2017 08:30    NA    NA  On
4: 7/30/2017 09:30    NA    NA Off

Data:

 dput(JCI)
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("7/28/2017", 
"7/30/2017"), class = "factor"), Time = structure(c(1L, 1L, 2L, 
2L, 2L, 3L), .Label = c("08:00", "08:30", "09:30"), class = "factor"), 
    Object_Name = structure(c(1L, 2L, 1L, 2L, 3L, 3L), .Label = c("A1", 
    "A2", "B1"), class = "factor"), Object_Value = structure(c(2L, 
    1L, 3L, 4L, 6L, 5L), .Label = c("51.66", "58.56", "60.2", 
    "65.2", "Off", "On"), class = "factor")), .Names = c("Date", 
"Time", "Object_Name", "Object_Value"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L))

【Discussion】:

  • This throws an error: Error: Duplicate identifiers for rows (194894, 194895), (194268, 194269), (193878, 193879), (193642, 193643), (193524, 193525), (193446, 193447), (193406, 193407), (192936, 192937), (195908, 195909), (195558, 195559), (195246, 195247),.....
  • Could you produce a reproducible sample of the dataset as an update to your question?
  • Please see the updated dataset.
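
The duplicate-identifier error above means that at least one (DateTime, Object.Name) pair occurs more than once, so spread cannot decide which value goes in the cell. Two common ways around it are sketched below. The sample rows follow the shape of the updated dataset, but the repeated OA row is a hypothetical addition for illustration (the posted sample contains no duplicate pair), and the column name rep is likewise an assumption.

```r
library(tidyr)

# Rows in the shape of the updated dataset; the second OA row is a
# deliberately added (hypothetical) duplicate to trigger the problem
JCI <- data.frame(
  DateTime = c("7/28/2017 7:45", "7/28/2017 7:45",
               "7/28/2017 7:45", "6/22/2017 21:11"),
  Object.Name = c("OA", "OA", "SP", "DaHum"),
  Object.Value = c("79.50", "79.50", "50.00", "Normal"),
  stringsAsFactors = FALSE
)

# Option 1: drop exact duplicate rows, then spread
wide_unique <- spread(unique(JCI), Object.Name, Object.Value)

# Option 2: keep every row by numbering the repeats within each
# (DateTime, Object.Name) pair, so each output row is uniquely identified
JCI$rep <- ave(seq_len(nrow(JCI)), JCI$DateTime, JCI$Object.Name,
               FUN = seq_along)
wide_all <- spread(JCI, Object.Name, Object.Value)

print(wide_unique)
print(wide_all)
```

Option 1 only helps when the repeated rows are identical. If the same timestamp and name carry different values, Option 2 keeps both, at the cost of an extra rep column and more than one row per timestamp.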