【发布时间】:2018-12-13 05:14:08
【问题描述】:
我试图在 scala 中编写一个 udf 函数并在我的 pyspark 工作中使用它。 我的数据框架构是
root
|-- vehicle_id: string
|-- driver_id: string
|-- StartDtLocal: timestamp
|-- EndDtLocal: timestamp
|-- trips: array
| |-- element: struct
| | |-- week_start_dt_local: timestamp
| | |-- week_end_dt_local: timestamp
| | |-- start_dt_local: timestamp
| | |-- end_dt_local: timestamp
| | |-- StartDtLocal: timestamp
| | |-- EndDtLocal: timestamp
| | |-- vehicle_id: string
| | |-- duration_sec: float
| | |-- distance_km: float
| | |-- speed_distance_ratio: float
| | |-- speed_duration_ratio: float
| | |-- speed_event_distance_km: float
| | |-- speed_event_duration_sec: float
|-- trip_details: array
| |-- element: struct
| | |-- event_start_dt_local: timestamp
| | |-- force: float
| | |-- speed: float
| | |-- sec_from_start: float
| | |-- sec_from_end: float
| | |-- StartDtLocal: timestamp
| | |-- EndDtLocal: timestamp
| | |-- vehicle_id: string
| | |-- trip_duration_sec: float
我正在尝试编写一个 udf 函数
def calculateVariables(row: Row):HashMap[String, Float] = {
case class myRow(week_start_dt_local: Timestamp, week_end_dt_local: Timestamp, start_dt_local: Timestamp, end_dt_local :Timestamp, StartDtLocal:Timestamp,EndDtLocal:Timestamp,vehicle_id:String,duration_sec:Int,distance_km:Int,speed_distance_ratio:Float,speed_duration_ratio:Float,speed_event_distance_km:Float,speed_event_duration_sec:Float)
val trips = row.getAs[WrappedArray[myRow]](4)
在此映射函数中,我试图将行转换为案例类,但无法。我收到此错误。
java.lang.ClassCastException:org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema 无法转换为 VariableCalculation.VariableCalculation$myRow$3
谁能帮我解决这个问题?
【问题讨论】:
标签: scala apache-spark