【Question title】: R: Converting the structure of a data frame to match another data frame
【Posted】: 2020-03-17 07:30:14
【Question】:

I currently have two large data frames, each with more than 300,000 observations and over 100 variables, but for simplicity suppose I have df1:

> str(df1)
'data.frame':   3000 obs. of  3 variables:
 $ Name         : chr  "AAA" "BBB" "CCC" "DDD" ...
 $ DateTime     : POSIXct, format: "2014-01-01 00:00:00" "2014-01-01 00:10:00" "2014-01-01 00:20:00" ...
 $ Age          : num  27 25 27 30 ...

df2:

> str(df2)
'data.frame':   3000 obs. of  3 variables:
 $ HEX          : Factor w/ 500 levels "AAA","BBB",..: 100 100 100 100 ...
 $ DateTime     : Factor w/ 3000 levels "2014-01-01 00:00:00",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Age          : Factor w/ 500 levels "27","25",..: 100 100 100 100 ...

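Both data frames contain the same values and have the same number of rows and columns. The only difference is the structure: every column in df2 is a factor.

I would like to convert the structure of df2 so that it matches df1. Please advise, and thanks in advance.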

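【Question comments】:

    Tags: r string dataframe structure


    【Answer 1】:

    Assuming the columns of the two data frames are in exactly the same order, as described, you can use the class function together with Map. Map walks over the columns of df2 and the class vectors of df1 in parallel and coerces each df2 column to the matching class.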

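    # Coerce each df2 column to the class of the df1 column in the same position
    df2[] <- Map(function(x, y) {
      x <- as.character(x)           # factor labels, not the integer codes
      if ("POSIXct" %in% y)
        as.POSIXct(x, tz = "UTC")    # keeps the time of day; "UTC" is an assumption
      else if ("Date" %in% y)
        as.Date(x)
      else
        `class<-`(x, y)              # e.g. "character" or "numeric"
    }, df2, lapply(df1, class))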
    

    Demo

    Before

    lapply(df1, class)
    # $name
    # [1] "character"
    # 
    # $date
    # [1] "POSIXct" "POSIXt" 
    # 
    # $age
    # [1] "numeric"
    # 
    # $date2
    # [1] "Date"
    
    lapply(df2, class)
    # $HEX
    # [1] "factor"
    # 
    # $date
    # [1] "factor"
    # 
    # $age
    # [1] "factor"
    # 
    # $date2
    # [1] "factor"
    

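    Transformation

    # Coerce each df2 column to the class of the df1 column in the same position
    df2[] <- Map(function(x, y) {
      x <- as.character(x)           # factor labels, not the integer codes
      if ("POSIXct" %in% y)
        as.POSIXct(x, tz = "UTC")    # keeps the time of day; "UTC" is an assumption
      else if ("Date" %in% y)
        as.Date(x)
      else
        `class<-`(x, y)              # e.g. "character" or "numeric"
    }, df2, lapply(df1, class))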
    

    After

    lapply(df2, class)
    # $HEX
    # [1] "character"
    # 
    # $date
    # [1] "POSIXct" "POSIXt" 
    # 
    # $age
    # [1] "numeric"
    # 
    # $date2
    # [1] "Date"
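
One way to confirm the result is to compare the two data frames column by column. The sketch below uses small stand-in data frames (d1 and d2 are illustrative names, not the asker's data) and compares by position, because the column names differ (name vs. HEX).

```r
# Stand-in data: d2 is assumed to have been converted already
d1 <- data.frame(name = c("A", "B"), age = c(30, 27), stringsAsFactors = FALSE)
d2 <- data.frame(HEX = c("A", "B"), age = c(30, 27), stringsAsFactors = FALSE)

# Compare classes by position; unname() drops the differing column names
same_class <- identical(unname(lapply(d1, class)), unname(lapply(d2, class)))

# Compare values column by column
same_values <- all(mapply(function(a, b) isTRUE(all.equal(a, b)), d1, d2))

print(c(same_class = same_class, same_values = same_values))
```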
    

    Data

    df1 <- structure(list(name = c("A", "B", "C", "D", "E"), date = structure(c(1577836800, 
    1580515200, 1583020800, 1585699200, 1588291200), class = c("POSIXct", 
    "POSIXt")), age = c(30, 27, 25, 28, 23), date2 = structure(c(18262, 
    18293, 18322, 18353, 18383), class = "Date")), row.names = c(NA, 
    -5L), class = "data.frame")
    
    df2 <- structure(list(HEX = structure(1:5, .Label = c("A", "B", "C", 
    "D", "E"), class = "factor"), date = structure(1:5, .Label = c("2020-01-01 01:00:00", 
    "2020-02-01 01:00:00", "2020-03-01 01:00:00", "2020-04-01 02:00:00", 
    "2020-05-01 02:00:00"), class = "factor"), age = structure(c(5L, 
    3L, 2L, 4L, 1L), .Label = c("23", "25", "27", "28", "30"), class = "factor"), 
        date2 = structure(1:5, .Label = c("2020-01-01", "2020-02-01", 
        "2020-03-01", "2020-04-01", "2020-05-01"), class = "factor")), row.names = c(NA, 
    -5L), class = "data.frame")
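
The conversion in this answer dispatches on class names only. A possible variant, sketched here under the assumption that columns are still matched by position, passes the df1 column itself as a template so that the POSIXct time zone can be copied from it. The helper name match_type and the stand-in data frames d1 and d2 are hypothetical and only serve the illustration.

```r
# Hypothetical helper: coerce `x` to the type of `template`
match_type <- function(x, template) {
  x <- as.character(x)  # factor labels, not the underlying integer codes
  if (inherits(template, "POSIXct")) {
    tz <- attr(template, "tzone")
    if (is.null(tz)) tz <- ""          # "" means the current time zone
    as.POSIXct(x, tz = tz)
  } else if (inherits(template, "Date")) {
    as.Date(x)
  } else if (is.numeric(template)) {
    as.numeric(x)                      # note: integer templates become double
  } else {
    x                                  # leave as character
  }
}

# Small stand-in data modelled on the question (not the real data)
d1 <- data.frame(
  Name = c("AAA", "BBB"),
  DateTime = as.POSIXct(c("2014-01-01 00:00:00", "2014-01-01 00:10:00"), tz = "UTC"),
  Age = c(27, 25),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  HEX = factor(c("AAA", "BBB")),
  DateTime = factor(c("2014-01-01 00:00:00", "2014-01-01 00:10:00")),
  Age = factor(c("27", "25"))
)

d2[] <- Map(match_type, d2, d1)
str(d2)
```

Passing the template column rather than its class vector keeps the time of day and the time zone of the target column, at the cost of a slightly longer helper.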
    

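    【Comments】:

    • Thanks for the suggestion! However, the HEX and DateTime columns in df2 were changed to numbers (stored as chr according to str) and to 1970-01-01, respectively.
    • I see the problem. Are there more classes, or are chr, num and POSIXct the only classes in df1?
    • In my original data frame there is also a column with structure Date.
    • Then we probably need some exception handling. Please see whether the edit helps.
    • Awesome! However, the values in the 'Age' column of df2 have changed... (sorry for the trouble)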
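
The symptoms reported in these comments (names turning into digits, changed Age values) are consistent with a factor's integer codes being converted instead of its labels. A small illustration, using the same ages as the age column in the answer's data:

```r
age <- factor(c("30", "27", "25", "28", "23"))

# Converting the factor directly returns the integer codes of the sorted levels
codes <- as.numeric(age)
print(codes)    # 5 3 2 4 1

# Going through the labels first returns the actual ages
values <- as.numeric(as.character(age))
print(values)   # 30 27 25 28 23

# Equivalent alternative that converts each level only once
values2 <- as.numeric(levels(age))[age]
print(values2)  # 30 27 25 28 23
```

This is why the factor is converted with as.character before any other class is applied.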