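【Posted】:2019-12-02 19:12:57
【Question】:
I have a PySpark DataFrame, df, with some columns as shown below. The hour column is in UTC, and I want to create a new column that holds the local time based on the time_zone column. How can I do this in PySpark?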
df
+-------------------------+------------+
|hour                     | time_zone  |
+-------------------------+------------+
|2019-10-16T20:00:00+0000 | US/Eastern |
|2019-10-15T23:00:00+0000 | US/Central |
+-------------------------+------------+
#What I want:
+-------------------------+------------+---------------------+
|hour                     | time_zone  | local_time          |
+-------------------------+------------+---------------------+
|2019-10-16T20:00:00+0000 | US/Eastern | 2019-10-16T15:00:00 |
|2019-10-15T23:00:00+0000 | US/Central | 2019-10-15T17:00:00 |
+-------------------------+------------+---------------------+
【Discussion】:
Tags: apache-spark pyspark apache-spark-sql