【问题标题】:spark dataframe pivoting is throwing AssertionError: assertion failed: unsafe symbol Unstable火花数据框旋转正在抛出 AssertionError: assertion failed: unsafe symbol Unstable
【发布时间】:2019-07-01 19:43:24
【问题描述】:

我有一个数据框,即 resultDf 如下

+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+
|model_family_id|classification_type|classification_value|benchmark_type_code|          data_date|data_item_code|data_item_value_numeric|data_item_value_string|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+
|              1|            COUNTRY|                 AGO|               MEAN|2018-03-31 00:00:00|   CREDITSCORE|                     15|                     b|
|              1|            COUNTRY|                 AGO|            OBS_CNT|2018-03-31 00:00:00|   CREDITSCORE|                      4|                     b|
|              1|            COUNTRY|                 AGO|         OBS_CNT_CA|2018-03-31 00:00:00|   CREDITSCORE|                      4|                  null|
|              1|            COUNTRY|                 AGO|       PERCENTILE_0|2018-03-31 00:00:00|   CREDITSCORE|                     15|                     b|
|              1|            COUNTRY|                 AGO|      PERCENTILE_10|2018-03-31 00:00:00|   CREDITSCORE|                     15|                     b|
|              1|            COUNTRY|                 AGO|     PERCENTILE_100|2018-03-31 00:00:00|   CREDITSCORE|                     15|                     b|
|              1|            COUNTRY|                 AGO|      PERCENTILE_25|2018-03-31 00:00:00|   CREDITSCORE|                     15|                     b|
|              1|            COUNTRY|                 AGO|      PERCENTILE_50|2018-03-31 00:00:00|   CREDITSCORE|                     15|                     b|
|              1|            COUNTRY|                 AGO|      PERCENTILE_75|2018-03-31 00:00:00|   CREDITSCORE|                     15|                     b|
|              1|            COUNTRY|                 AGO|      PERCENTILE_90|2018-03-31 00:00:00|   CREDITSCORE|                     15|                     b|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+

我根据“benchmark_type_code”列对表进行透视, 需要实现以下业务逻辑

如果(data_item_code)是“SCORE”或“PG_SCORE” ====> 选择 data_item_value_string 作为值 别的 ==> 选择 data_item_value_numeric 作为值

为此我在下面写了代码


   val pivot_resultDf =  resultDf.groupBy("model_family_id","classification_type","classification_value" ,"benchmark_type_code","data_date")
                .pivot("benchmark_type_code")
                .agg( first( 
                        when( col("data_item_code").===("SCORE"),  col("data_item_value_numeric"))
                             .otherwise(col("data_item_value_string"))
                    ) )

但是我在 agg 函数 @when 条件中遇到错误


java.lang.AssertionError: assertion failed: unsafe symbol Unstable (child of <none>) in runtime reflection universe
    at scala.reflect.internal.Symbols$Symbol.<init>(Symbols.scala:205)
    at scala.reflect.internal.Symbols$TypeSymbol.<init>(Symbols.scala:3030)
    at scala.reflect.internal.Symbols$Symbol.newStubSymbol(Symbols.scala:521)
    at scala.reflect.internal.pickling.UnPickler$Scan.readExtSymbol$1(UnPickler.scala:258)
    at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:286)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.unpickleClass(JavaMirrors.scala:619)
    at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete$1.apply$mcV$sp(SymbolLoaders.scala:28)
    at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete$1.apply(SymbolLoaders.scala:25)
    at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete$1.apply(SymbolLoaders.scala:25)
    at scala.reflect.internal.SymbolTable.slowButSafeEnteringPhaseNotLaterThan(SymbolTable.scala:263)
    at scala.reflect.runtime.SymbolLoaders$TopClassCompleter.complete(SymbolLoaders.scala:25)
    at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1535)
    at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:158)
    at org.apache.spark.sql.functions$.typedLit(functions.scala:113)
    at org.apache.spark.sql.functions$.lit(functions.scala:96)
    at org.apache.spark.sql.Column.$eq$eq$eq(Column.scala:262)

我在这里做错了什么?如何解决这个问题?

【问题讨论】:

    标签: scala apache-spark apache-spark-sql datastax


    【解决方案1】:

    我不确定您为什么会收到断言错误,但我能够成功获得结果。通常断言错误是句法错误。请检查行尾并尝试在 spark shell 上执行以查看真正的差距在哪里。 找到显示我能够获得所需结果的屏幕截图。

    【讨论】:

    • 我有几列像 A, B , C, D, E 。我是 groupBy("A","B").pivot("C).agg(first(...)) ..... 但是结果数据框没有原始 D & D 列数据对吗?在我的结果数据框中也得到这些?
    • 而不是“benchmark_type_code”列,它的值应该在透视列中对吗?我没有得到其他条件,我该怎么办?
    【解决方案2】:

    这是有效的

    .agg(首先( 当(col(“数据”).isin(“x”,“a”,“y”,“z”), when( col("code").isin("aa","bb") , col("numeric")).otherwise(col("string")) ) .otherwise(col("数字")) )

    【讨论】:

      猜你喜欢
      • 2020-09-29
      • 2021-04-21
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多