【Question title】: Error in reading YAML files as data frame in R
【Posted】: 2016-07-27 08:53:02
【Question description】:

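I am trying to read the YAML-format data located here with the commands below, but neither command returns the data in the desired output format, i.e. like the CSV file located here. The data in the YAML file is described here; for a quick look, you can refer directly to the format given at the end.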

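I tried loading the data with these commands, without success. Can anyone guide me on how to load the data in the YAML file correctly as an R data frame, or convert it to CSV in the output format specified above?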

cric <- yaml.load_file("911047.yaml")
cric <- data.frame(yaml.load_file("211028.yaml"))

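Here is the high-level format of the data for quick reference (sorry, the original YAML formatting was lost when I pasted it here, and I could not find a way to paste it as-is or reformat it):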

meta:
  data_version: 0.6
  created: 2013-02-22
  revision: 1
info:
  city: Southampton
  dates:
    - 2005-06-13
  match_type: T20
  outcome:
    by:
      runs: 100
    winner: England
  overs: 20
  player_of_match:
    - KP Pietersen
  teams:
    - England
    - Australia
  toss:
    decision: bat
    winner: England
  umpires:
    - NJ Llong
    - JW Lloyds
  venue: The Rose Bowl
innings:
  - 1st innings:
      team: England
      deliveries:
        - 0.1:
            batsman: ME Trescothick
            bowler: B Lee
            non_striker: GO Jones
            runs:
              batsman: 0
              extras: 0
              total: 0

【Comments】:

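  • You won't be able to convert this to a data.frame quickly, because the data has no natural rectangular structure. You will have to write a custom parsing function that turns each record into a vector, and then put the results together with rbind().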
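
The approach suggested in the comment above could look roughly like the sketch below. It is only an illustration under assumptions: the helper names `delivery_to_row` and `innings_to_rows` are hypothetical, the inline sample reuses the layout from the question with a second delivery made up for the example, and only the fields shown in the question are handled (optional fields such as wickets would need extra code).

```r
library(yaml)

# Inline sample in the layout shown in the question.
# The second delivery (0.2) is invented purely for illustration.
yaml_text <- "
innings:
  - 1st innings:
      team: England
      deliveries:
        - 0.1:
            batsman: ME Trescothick
            bowler: B Lee
            non_striker: GO Jones
            runs:
              batsman: 0
              extras: 0
              total: 0
        - 0.2:
            batsman: ME Trescothick
            bowler: B Lee
            non_striker: GO Jones
            runs:
              batsman: 1
              extras: 0
              total: 1
"
match <- yaml.load(yaml_text)

# Hypothetical helper: turn one delivery (a one-element named list) into a one-row data frame.
delivery_to_row <- function(delivery, innings_name, team) {
  d <- delivery[[1]]
  data.frame(
    innings      = innings_name,
    team         = team,
    ball         = names(delivery)[1],
    batsman      = d[["batsman"]],
    bowler       = d[["bowler"]],
    non_striker  = d[["non_striker"]],
    runs_batsman = d[["runs"]][["batsman"]],
    runs_extras  = d[["runs"]][["extras"]],
    runs_total   = d[["runs"]][["total"]],
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: stack all deliveries of one innings with rbind().
innings_to_rows <- function(innings) {
  body <- innings[[1]]
  rows <- lapply(body[["deliveries"]], delivery_to_row,
                 innings_name = names(innings)[1], team = body[["team"]])
  do.call(rbind, rows)
}

deliveries <- do.call(rbind, lapply(match[["innings"]], innings_to_rows))
print(deliveries)
```

For a real file, `yaml.load_file()` would replace the inline `yaml.load()` call; the rest of the sketch stays the same as long as every delivery carries these fields.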

Tags: r csv dataframe yaml export-to-csv


【Solution 1】:

This can be solved with melt from the reshape2 package.

The code below should help.

library(yaml)
library(reshape2)
data = yaml.load_file("C:\\Users\\vsahu\\Downloads\\mdms\\911047.yaml")
# melt() flattens the nested list into long format: one row per leaf value,
# with the key at each nesting level stored in columns L1, L2, ...
x = melt(data)
y = data.frame(x)

# Match metadata: keep the 'meta' rows and drop columns that are entirely NA
meta = y[y$L1 == 'meta',]
meta = meta[, colSums(is.na(meta)) != nrow(meta)]
data_meta = reshape(meta,direction = 'wide',timevar = 'L2',idvar = 'L1')

# Match info: same filtering, then drop the constant L1 column
info = y[y$L1 == 'info',]
info = info[, colSums(is.na(info)) != nrow(info)]
info = subset(info, select=-c(L1))


# Ball-by-ball data: one row per delivery, one column per field
data_innings = y[(y$L1 == 'innings') & (y$L4 == 'deliveries'),]
data_innings$new = paste(data_innings$L7,data_innings$L8,sep="_")
data_innings = subset(data_innings, select=-c(L7,L8,L4,L1,L5))
data_innings = reshape(data_innings,idvar=c('L2','L3','L6'),direction = "wide",timevar = c('new'))
write.csv(data_innings,"data_innings.csv",row.names = F)
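
To see why the code above filters on columns such as L1 and L4, it may help to look at what melt does to a small nested list. The sketch below uses a hand-built list (an assumption, standing in for the output of yaml.load_file) with values kept as strings for simplicity.

```r
library(reshape2)

# Hand-built stand-in for a parsed YAML file (values kept as strings).
nested <- list(
  meta = list(revision = "1"),
  info = list(
    city = "Southampton",
    toss = list(decision = "bat", winner = "England")
  )
)

# One row per leaf value; L1 holds the top-level key, L2 the next level, and so on.
# Leaves that sit at a shallower depth get NA in the deeper level columns.
flat <- melt(nested)
print(flat)

# Selecting one top-level section works the same way as in the answer.
info_rows <- flat[flat$L1 == "info", ]
```

Because the level columns are positional, the deliveries in a real match file end up several levels deep, which is where the L4 to L8 columns used in the answer come from.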

【Comments】:

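    【Solution 2】:

    I edited Vaibhav's answer above into a function that reads all YAML files in a specified directory and converts them to CSV. It also handles the "multiple rows match" error caused by reshape.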

    aggr_fielder <- function(x) {
    paste0(x, collapse="/")
    }
    
    convertCricsheetData <- function(source = ".",destination = ""){
    require(yaml)
    require(reshape2)
    require(data.table)
    all.files <- list.files(path = source,
                            pattern = "\\.yaml$",
                            full.names = TRUE)
    
    for (i in seq_along(all.files)) {
        data = yaml.load_file(all.files[i])
        x = melt(data)
        y = data.table(x)
    
        meta = y[y$L1 == 'meta',]
        meta = meta[, colSums(is.na(meta)) != nrow(meta), with=FALSE]
        data_meta = reshape(meta,direction = 'wide',timevar = 'L2',idvar = 'L1')
    
        info = y[y$L1 == 'info',]
        info = info[, colSums(is.na(info)) != nrow(info), with=FALSE]
        info[, L1 := NULL]
        info[,match_no := i]
    
        data_innings = y[(y$L1 == 'innings') & (y$L4 == 'deliveries'),]
        data_innings[, new := paste(data_innings$L7,data_innings$L8,sep="_")]
        data_innings [, c("L7","L8","L4","L1","L5") := NULL]
        data_innings = dcast(data_innings, L2+L3+L6 ~ new, fun.aggregate = aggr_fielder,fill = NA)
        data_innings[,match_no := i]
        write.csv(data_innings,paste0(destination,paste(c(info[info$L2 == "dates",]$value,info[info$L2 == "teams",]$value), collapse = "-"),".csv"),row.names = F)
        write.csv(info,paste0(destination,paste(c("info",info[info$L2 == "dates",]$value,info[info$L2 == "teams",]$value), collapse = "-"),".csv"),row.names = F)
        }
    }
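
The reason an aggregation function helps with the multiple-row problem can be shown on a tiny long-format table. This is a minimal sketch with made-up data: the column names `ball`, `new`, and `value` and the fielder names are assumptions for illustration, not values from a real match file.

```r
library(data.table)

# Made-up long-format data: delivery 0.1 has two values for the same field,
# which is the situation that makes a plain wide reshape ambiguous.
long <- data.table(
  ball  = c("0.1", "0.1", "0.1", "0.2"),
  new   = c("wicket_fielders", "wicket_fielders", "runs_total", "runs_total"),
  value = c("Fielder A", "Fielder B", "0", "1")
)

# Same idea as aggr_fielder in the answer: collapse duplicates into one string.
collapse_values <- function(x) paste0(x, collapse = "/")

wide <- dcast(long, ball ~ new,
              value.var = "value",
              fun.aggregate = collapse_values,
              fill = NA_character_)
print(wide)
```

Each delivery then occupies exactly one row, with duplicated fields joined by "/" and absent fields left as NA.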
    

    【Comments】:
