【Title】: How do you combine data in a Scala dataframe and output it as JSON objects?
【Posted】: 2021-08-05 09:18:24
【Question】:

This is a bit hard for me to explain, so I'll do my best. Here is the given dataset:

Name  Car Brand  Car Model  Car Color  Year Bought
Tom   Toyota     Corolla    Black      2009
Tom   Hyundai    Kona       Blue       2010
Tom   Kia        Soul       Red        2011
Bob   Mazda      CX-30      Red        2008
Bob   BMW        X1         Blue       2014

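Using the given dataset, I want to condense it by name, put all of each person's cars into a list, and write the result to a file as JSON objects, one per line. For the dataset above, the output should look like this: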

{
    "name": "Tom",
    "Cars": [{
        "CarSpecifications": {
            "Brand": "Toyota",
            "Model": "Corolla",
            "Color": "Black"
        },
        "YearBought":2009
     }, 
     {
        "CarSpecifications": {
            "Brand": "Hyundai",
            "Model": "Kona",
            "Color": "Blue"
        },
        "YearBought":2010
     },
     {
        "CarSpecifications": {
            "Brand": "Kia",
            "Model": "Soul",
            "Color": "Red"
        },
        "YearBought":2011
    }]
}

{
    "name": "Bob",
    "Cars": [{
        "CarSpecifications": {
            "Brand": "Mazda",
            "Model": "CX-30",
            "Color": "Red"
        },
        "YearBought":2008
     }, 
     {
        "CarSpecifications": {
            "Brand": "BMW",
            "Model": "X1",
            "Color": "Blue"
        },
        "YearBought":2014
     }]
}

How can I accomplish these transformations using Scala and Scala DataFrames?

    Tags: scala dataframe apache-spark


    【Solution 1】:

    You can aggregate the dataset with groupBy and collect_list, then use toJSON to generate the JSON strings:

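    import org.apache.spark.sql.functions.{collect_list, struct}
    import spark.implicits._  // enables the $"..." column syntax; assumes a SparkSession named spark

    // One row per Name; each car becomes a nested struct inside the "Cars" array.
    df.groupBy("Name").agg(collect_list(
        struct(
          struct(
            $"Car Brand".as("Brand"),
            $"Car Model".as("Model"),
            $"Car Color".as("Color")
          ).as("CarSpecifications"),
          $"Year Bought".as("YearBought")
        )  // array elements are unnamed, so the outer struct needs no alias
      ).as("Cars"))
      .toJSON        // Dataset[String], one JSON document per row
      .show(false)   // print without truncating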
    
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |value                                                                                                                                                                                                                                                                                                    |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |{"Name":"Tom","Cars":[{"CarSpecifications":{"Brand":"Toyota","Model":"Corolla","Color":"Black"},"YearBought":"2009"},{"CarSpecifications":{"Brand":"Hyundai","Model":"Kona","Color":"Blue"},"YearBought":"2010"},{"CarSpecifications":{"Brand":"Kia","Model":"Soul","Color":"Red"},"YearBought":"2011"}]}|
    |{"Name":"Bob","Cars":[{"CarSpecifications":{"Brand":"Mazda","Model":"CX-30","Color":"Red"},"YearBought":"2008"},{"CarSpecifications":{"Brand":"BMW","Model":"X1","Color":"Blue"},"YearBought":"2014"}]}                                                                                                  |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
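
The output above prints to the console, keeps the years as strings, and uses "Name" rather than "name", while the question asks for a file with one JSON object per line. Below is a minimal sketch of one way to close that gap. It assumes Year Bought arrives as a string column (as the quoted years above suggest). The object name CarsToJson, the function carsByOwner, and the output path cars_by_owner are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object CarsToJson {
  // Groups rows by owner and nests each car as
  // {CarSpecifications: {Brand, Model, Color}, YearBought: int}.
  def carsByOwner(df: DataFrame): DataFrame =
    df.groupBy(col("Name").as("name"))               // lowercase key, as in the question
      .agg(collect_list(
        struct(
          struct(
            col("Car Brand").as("Brand"),
            col("Car Model").as("Model"),
            col("Car Color").as("Color")
          ).as("CarSpecifications"),
          col("Year Bought").cast("int").as("YearBought")  // numeric year in the JSON
        )
      ).as("Cars"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cars-to-json")
      .master("local[*]")                            // assumption: local run for the example
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question; years kept as strings on purpose.
    val df = Seq(
      ("Tom", "Toyota", "Corolla", "Black", "2009"),
      ("Tom", "Hyundai", "Kona", "Blue", "2010"),
      ("Tom", "Kia", "Soul", "Red", "2011"),
      ("Bob", "Mazda", "CX-30", "Red", "2008"),
      ("Bob", "BMW", "X1", "Blue", "2014")
    ).toDF("Name", "Car Brand", "Car Model", "Car Color", "Year Bought")

    // Writes a directory of part files; each line in them is one JSON object.
    carsByOwner(df).write.mode("overwrite").json("cars_by_owner")

    spark.stop()
  }
}
```

Note that write.json produces a directory of part files rather than a single file, and that collect_list does not guarantee the order of the cars within each array. If a single file or a fixed order matters, something like coalesce(1) before the write or sorting the array afterwards would be needed.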
    
