[Question title]: Not able to explode and select in the same expression in Spark Scala
[Posted]: 2018-05-30 18:45:00
[Question description]:

This is my schema:

root
 |-- DataPartition: string (nullable = true)
 |-- TimeStamp: string (nullable = true)
 |-- TRFCoraxData_instrumentId: long (nullable = true)
 |-- TRFCoraxData_organizationId: long (nullable = true)
 |-- Dividends: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- cr:AnnouncementDate: string (nullable = true)
 |    |    |-- cr:CorporateActionAdjustedDividendGrossAmount: double (nullable = true)
 |    |    |-- cr:CorporateActionAdjustedDividendNetAmount: double (nullable = true)
 |    |    |-- cr:CurrencyId: long (nullable = true)
 |    |    |-- cr:DividendEventId: long (nullable = true)
 |    |    |-- cr:DividendGrossAmount: double (nullable = true)
 |    |    |-- cr:DividendNetAmount: double (nullable = true)
 |    |    |-- cr:DividendType: string (nullable = true)
 |    |    |-- cr:ExDate: string (nullable = true)
 |    |    |-- cr:PayDate: string (nullable = true)
 |    |    |-- cr:PeriodDuration: string (nullable = true)
 |    |    |-- cr:PeriodEndDate: string (nullable = true)
 |    |    |-- cr:RecordDate: string (nullable = true)
 |-- FFAction|!|: string (nullable = true)

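I want to explode the array and select all of its columns in the same expression, so that I do not have to write a withColumn or select call that names each column individually.

This is the code I use to explode: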

val temp2 = temp1.select(
  getDataPartition($"DataPartition").as("DataPartition"),
  $"TimeStamp".as("TimeStamp"),
  $"TRFCoraxData_instrumentId".as("TRFCoraxData_instrumentId"),
  $"TRFCoraxData_organizationId".as("TRFCoraxData_organizationId"),
  explode($"Dividends"),
  $"FFAction|!|".as("FFAction|!|"))
val temp = temp2.select(temp2.columns.map(x => col(x).as(x.replace("cr:", ""))): _*)

temp.show(false)

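This is the output I get. The exploded data comes back as a single struct column named col.

How can I get the individual column names in the same expression?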

+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|DataPartition    |TimeStamp                |TRFCoraxData_instrumentId|TRFCoraxData_organizationId|col                                                                                                                                                                                    |FFAction|!||
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2009-07-14T00:00:00+00:00,null,0.35,500110,73014469387,0.35,null,INTE,2009-08-13T00:00:00+00:00,2009-09-15T00:00:00+00:00,P3M,2009-09-30T00:00:00+00:00,2009-08-17T00:00:00+00:00]    |O|!|       |
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2008-02-05T00:00:00+00:00,null,0.3,500110,73015860528,0.3,null,INTE,2008-02-14T00:00:00+00:00,2008-03-17T00:00:00+00:00,P3M,2008-03-31T00:00:00+00:00,2008-02-19T00:00:00+00:00]      |O|!|       |
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2008-04-29T00:00:00+00:00,null,0.3,500110,73015864496,0.3,null,INTE,2008-05-14T00:00:00+00:00,2008-06-16T00:00:00+00:00,P3M,2008-06-30T00:00:00+00:00,2008-05-16T00:00:00+00:00]      |O|!|       |
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+

[Question comments]:

    Tags: scala apache-spark apache-spark-sql


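    [Solution 1]:

    How can I get the individual column names in the same expression?

    col is the default name that Spark itself gives to the column produced by explode. If you want a name other than col, you can use an alias, as you did for the other columns:

    explode($"Dividends").as("Dividends")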
    

    You can then use .* in a second select to expand the struct column into separate columns:

    temp2.select(col("Dividends.*"))
    

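    I want to explode the array and select all of its columns in the same expression, so that I do not have to write a withColumn or select call that names each column individually.

    Only one generator (such as explode) can be used per expression.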
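
The two steps described above can be sketched end to end as follows. This is a minimal illustration rather than the asker's actual pipeline: it assumes Spark 2.2 or later with a local SparkSession, uses a made-up single-row JSON sample with a trimmed-down version of the question's schema, leaves out the getDataPartition UDF (which is not shown in the question), and introduces the hypothetical names exploded and flat. The explode call is the only generator in its select, the struct is expanded in a second select, and the "cr:" prefix is stripped at the end with toDF.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample row; only two of the struct fields from the question are kept.
val json =
  """{"DataPartition":"ThirdPartyPrivate","TRFCoraxData_instrumentId":8590925624,"Dividends":[{"cr:DividendEventId":73014469387,"cr:DividendGrossAmount":0.35},{"cr:DividendEventId":73015860528,"cr:DividendGrossAmount":0.3}],"FFAction|!|":"O|!|"}"""
val temp1 = spark.read.json(Seq(json).toDS())

// Step 1: explode the array, aliasing the result so it is not called "col".
val exploded = temp1.select(
  $"DataPartition",
  $"TRFCoraxData_instrumentId",
  explode($"Dividends").as("Dividends"),
  $"FFAction|!|")

// Step 2: expand the struct into one column per field with ".*".
val flat = exploded.select(
  $"DataPartition",
  $"TRFCoraxData_instrumentId",
  col("Dividends.*"),
  $"FFAction|!|")

// Step 3: strip the "cr:" prefix from every column name in one pass.
val temp = flat.toDF(flat.columns.map(_.replace("cr:", "")): _*)

temp.show(false)
```

Renaming with toDF avoids passing names such as cr:DividendEventId back through col, so no column has to be listed by hand after the struct is expanded.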

    [Discussion]:

    • Thanks, that was the problem. Only one generator can be used in a single expression.