【Question title】: Changing a double (nested) array field to a single array in Hive or PySpark
【Posted】: 2021-12-27 18:32:53
【Question】:

I have a field interest_product_id, as shown below -

a.select('cust_id', 'interest_product_id').show(1,False)
+---------------+----------------------------------------------+
|cust_id        |interest_product_id                           |
+---------------+----------------------------------------------+
|4308c3w994     |[[73ndy0-885bns-ysrd, isgbf-6322-734f4-92j72]]|
+---------------+----------------------------------------------+

The schema is as follows -

root
 |-- cust_id: string (nullable = true)
 |-- interest_product_id: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)

Because interest_product_id is an array whose elements are themselves array(string), the field displays as [[**]]. How can I convert it to a plain array(string)?

Expected result -

+---------------+----------------------------------------------+
|cust_id        |interest_product_id                           |
+---------------+----------------------------------------------+
|4308c3w994     |[73ndy0-885bns-ysrd, isgbf-6322-734f4-92j72]  |
+---------------+----------------------------------------------+

Please suggest the best approach. Thanks!

【Comments】:

    Tags: python pyspark hive apache-spark-sql


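    【Solution 1】:

    Use flatten, which creates a flat array from a nested array. It is available in pyspark.sql.functions since Spark 2.4 and removes one level of nesting.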
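
To make the per-row effect concrete, here is a rough plain-Python model of what flatten does to a single value. The helper name flatten_one_level is hypothetical and only for illustration; it is not part of PySpark, and it models only the non-null case (Spark's null handling is not reproduced here).

```python
from itertools import chain


def flatten_one_level(nested):
    """Concatenate the inner lists, removing exactly one level of nesting."""
    return list(chain.from_iterable(nested))


# The value from the question: one outer array holding one inner array.
value = [["73ndy0-885bns-ysrd", "isgbf-6322-734f4-92j72"]]
print(flatten_one_level(value))
# ['73ndy0-885bns-ysrd', 'isgbf-6322-734f4-92j72']

# With three levels of nesting, only the outermost level is removed.
deeper = [[["a"], ["b"]], [["c"]]]
print(flatten_one_level(deeper))
# [['a'], ['b'], ['c']]
```

The PySpark call that follows applies the same idea to every row of the column.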

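    from pyspark.sql import functions as F

    # Sample row matching the question's schema (assumes an active SparkSession named spark)
    df = spark.createDataFrame(
        [("4308c3w994", [["73ndy0-885bns-ysrd", "isgbf-6322-734f4-92j72"]])],
        ("cust_id", "interest_product_id"),
    )

    # flatten removes one level of nesting: array<array<string>> becomes array<string>
    df.withColumn("interest_product_id", F.flatten(F.col("interest_product_id"))).show(truncate=False)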
    

    Output:

    +----------+--------------------------------------------+
    |cust_id   |interest_product_id                         |
    +----------+--------------------------------------------+
    |4308c3w994|[73ndy0-885bns-ysrd, isgbf-6322-734f4-92j72]|
    +----------+--------------------------------------------+
    
