【Question title】: PySpark adding a column to an existing DataFrame - TypeError: Invalid argument, not a string or column
【Posted】: 2020-12-14 17:31:15
【Question】:

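I am trying to add three computed columns to my DataFrame.

The approach below does not work and raises this error:

TypeError: Invalid argument, not a string or column: DataFrame[TicketClosedDate: timestamp] of type <class 'pyspark.sql.dataframe.DataFrame'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.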

    TAT = df.select(datediff(col('zTicketSubmitDateUTC'), col('zTicketUpdateDateUTC')).alias('TAT'))
    
    TicketClosedDate = df.select(to_timestamp(
                    when(col('TicketStatusName')=='Closed',col('TicketUpdateDate'))
                    .when(col('TicketStatusName')=='Complete',col('TicketUpdateDate'))
                    .when(col('TicketStatusName')=='Done',col('TicketUpdateDate'))
                    .otherwise('Null')
                    ).alias('TicketClosedDate'))
    
    zTicketClosedDateUTC = df.select(to_timestamp(
                    when(col('TicketStatusName')=='Closed',col('zTicketUpdateDateUTC'))
                    .when(col('TicketStatusName')=='Complete',col('zTicketUpdateDateUTC'))
                    .when(col('TicketStatusName')=='Done',col('zTicketUpdateDateUTC'))
                    .otherwise('Null')
                    ).alias('zTicketClosedDateUTC'))
    
    
    df2 = df.select(
        col('ProjectID'),
        col('TicketID'),
        col('ChildTicketID'),
        col('TicketSubmitDate'),
        col('zTicketSubmitDateUTC'),
        col('TicketUpdateDate'),
        col('zTicketUpdateDateUTC'),
        TicketClosedDate,
        zTicketClosedDateUTC,
        col('TicketStatusName'),
        col('PtgName'),
        col('TicketCategory'),
        TAT)

【Comments】:

    Tags: apache-spark pyspark apache-spark-sql


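    【Solution 1】:

    Try the code below. You don't need to wrap the expressions in df.select() when assigning them to variables: df.select() returns a DataFrame, while the final select() expects Column expressions.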

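    from pyspark.sql.functions import col, datediff, to_timestamp, when

    # Each variable below is a Column expression, so it can be passed straight to df.select().
    TAT = datediff(col('zTicketSubmitDateUTC'), col('zTicketUpdateDateUTC')).alias('TAT')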
    
    TicketClosedDate = to_timestamp(
                    when(col('TicketStatusName')=='Closed',col('TicketUpdateDate'))
                    .when(col('TicketStatusName')=='Complete',col('TicketUpdateDate'))
                    .when(col('TicketStatusName')=='Done',col('TicketUpdateDate'))
                    .otherwise('Null')
                    ).alias('TicketClosedDate')
    
    zTicketClosedDateUTC = to_timestamp(
                    when(col('TicketStatusName')=='Closed',col('zTicketUpdateDateUTC'))
                    .when(col('TicketStatusName')=='Complete',col('zTicketUpdateDateUTC'))
                    .when(col('TicketStatusName')=='Done',col('zTicketUpdateDateUTC'))
                    .otherwise('Null')
                    ).alias('zTicketClosedDateUTC')
    
    
    df2 = df.select(
        col('ProjectID'),
        col('TicketID'),
        col('ChildTicketID'),
        col('TicketSubmitDate'),
        col('zTicketSubmitDateUTC'),
        col('TicketUpdateDate'),
        col('zTicketUpdateDateUTC'),
        TicketClosedDate,
        zTicketClosedDateUTC,
        col('TicketStatusName'),
        col('PtgName'),
        col('TicketCategory'),
        TAT)
    

    【Comments】:
