【Question title】: Dividing rows of a dataframe into simple rows in PySpark
【Posted】: 2017-04-05 13:17:11
【Question】:

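I have this schema, and I want to split the fields inside result into columns, so that I get col1: EventCode, col2: Message, and so on. I'm using PySpark. I tried the explode function, but it doesn't seem to work on a StructType. Is there a way to do this in Spark?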

root
 |-- result: struct (nullable = true)
 |    |-- EventCode: string (nullable = true)
 |    |-- Message: string (nullable = true)
 |    |-- _bkt: string (nullable = true)
 |    |-- _cd: string (nullable = true)
 |    |-- _indextime: string (nullable = true)
 |    |-- _pre_msg: string (nullable = true)
 |    |-- _raw: string (nullable = true)
 |    |-- _serial: string (nullable = true)
 |    |-- _si: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- _sourcetype: string (nullable = true)
 |    |-- _time: string (nullable = true)
 |    |-- host: string (nullable = true)
 |    |-- index: string (nullable = true)
 |    |-- linecount: string (nullable = true)
 |    |-- source: string (nullable = true)
 |    |-- sourcetype: string (nullable = true)

【Discussion】:

    Tags: apache-spark pyspark apache-spark-sql spark-dataframe


    【Answer 1】:

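    Splitting a struct column into simple top-level columns is easy. explode only works on array and map columns, so instead select all fields of the result struct with "result.*" and assign the output to another dataframe, like this: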

    simpleDF = df.select("result.*")
    

    This turns the schema given above into the following one:

    simpleDF.printSchema()
    
    root
     |-- EventCode: string (nullable = true)
     |-- Message: string (nullable = true)
     |-- _bkt: string (nullable = true)
     |-- _cd: string (nullable = true)
     |-- _indextime: string (nullable = true)
     |-- _pre_msg: string (nullable = true)
     |-- _raw: string (nullable = true)
     |-- _serial: string (nullable = true)
     |-- _si: array (nullable = true)
     |    |-- element: string (containsNull = true)
     |-- _sourcetype: string (nullable = true)
     |-- _time: string (nullable = true)
     |-- host: string (nullable = true)
     |-- index: string (nullable = true)
     |-- linecount: string (nullable = true)
     |-- source: string (nullable = true)
     |-- sourcetype: string (nullable = true)
    
