[Question title]: How to convert a Timestamp column to a milliseconds Long column in Spark SQL
[Posted]: 2019-06-18 12:40:01
[Question]:

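What is the shortest and most efficient way in Spark SQL to convert a Timestamp column into a Long column holding the timestamp in milliseconds?

Here is an example of a conversion from timestamp to milliseconds: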

scala> val ts = spark.sql("SELECT now() as ts")
ts: org.apache.spark.sql.DataFrame = [ts: timestamp]

scala> ts.show(false)
+-----------------------+
|ts                     |
+-----------------------+
|2019-06-18 12:32:02.41 |
+-----------------------+

scala> val tss = ts.selectExpr(
 |   "ts",
 |   "BIGINT(ts) as seconds_ts",
 |   "BIGINT(ts) * 1000 + BIGINT(date_format(ts, 'SSS')) as millis_ts"
 | )
tss: org.apache.spark.sql.DataFrame = [ts: timestamp, seconds_ts: bigint ... 1 more field]

scala> tss.show(false)
+----------------------+----------+-------------+
|ts                    |seconds_ts|millis_ts    |
+----------------------+----------+-------------+
|2019-06-18 12:32:02.41|1560861122|1560861122410|
+----------------------+----------+-------------+

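As you can see, the most straightforward way to get milliseconds from a timestamp does not work: casting to long returns seconds, even though the timestamp itself retains the millisecond information.

The only way I found to extract the millisecond part is the date_format function, which is nowhere near as simple as I would expect.

Does anybody know a simpler way than this to get UNIX time in milliseconds from a Timestamp column?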

    Tags: apache-spark apache-spark-sql


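    [Solution 1]:

    According to the code in Spark's DateTimeUtils:

    "Timestamps are exposed externally as java.sql.Timestamp and are stored internally as longs, which are capable of storing timestamps with microsecond precision."

    Therefore, if you define a UDF that takes a java.sql.Timestamp as input, you can simply call getTime to get a Long in milliseconds.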

    import org.apache.spark.sql.functions.udf

    // getTime returns milliseconds since the Unix epoch
    val tsConversionToLongUdf = udf((ts: java.sql.Timestamp) => ts.getTime)
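    One caveat: the UDF calls getTime on its argument directly, so a null timestamp would most likely throw a NullPointerException inside it. Below is a minimal sketch of a null-tolerant variant. The name toEpochMillis is hypothetical and not part of the original answer. It is plain Scala, so it can be tried without a Spark session; passing it to udf is expected to give a nullable long column, though that has not been verified here.

```scala
import java.sql.Timestamp

// Hypothetical helper (not from the original answer):
// epoch milliseconds, or None when the input is null.
def toEpochMillis(ts: Timestamp): Option[Long] =
  Option(ts).map(_.getTime)

// Build the sample from a known epoch value so the result does not
// depend on the JVM default time zone.
val sample = new Timestamp(1484733600111L)

println(toEpochMillis(sample)) // Some(1484733600111)
println(toEpochMillis(null))   // None
```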
    

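    Applying this to a variety of timestamps:

    import org.apache.spark.sql.functions.{col, to_timestamp}
    import org.apache.spark.sql.types.LongType
    import spark.implicits._ // needed for toDF; already in scope in spark-shell

    val df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.111", "2017-01-18 11:00:00.110", "2017-01-18 11:00:00.100")
      .toDF("timestampString")
      .withColumn("timestamp", to_timestamp(col("timestampString")))
      // UDF result: epoch milliseconds
      .withColumn("timestampConversionToLong", tsConversionToLongUdf(col("timestamp")))
      // plain cast: epoch seconds only
      .withColumn("timestampCastAsLong", col("timestamp").cast(LongType))
    
    df.printSchema()
    df.show(false)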
    
    // returns
    root
     |-- timestampString: string (nullable = true)
     |-- timestamp: timestamp (nullable = true)
     |-- timestampConversionToLong: long (nullable = false)
     |-- timestampCastAsLong: long (nullable = true)
    
    +-----------------------+-----------------------+-------------------------+-------------------+
    |timestampString        |timestamp              |timestampConversionToLong|timestampCastAsLong|
    +-----------------------+-----------------------+-------------------------+-------------------+
    |2017-01-18 11:00:00.000|2017-01-18 11:00:00    |1484733600000            |1484733600         |
    |2017-01-18 11:00:00.111|2017-01-18 11:00:00.111|1484733600111            |1484733600         |
    |2017-01-18 11:00:00.110|2017-01-18 11:00:00.11 |1484733600110            |1484733600         |
    |2017-01-18 11:00:00.100|2017-01-18 11:00:00.1  |1484733600100            |1484733600         |
    +-----------------------+-----------------------+-------------------------+-------------------+
    

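    Note that the "timestampCastAsLong" column is there only to show that casting directly to Long does not return the desired result in milliseconds, but in seconds only.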
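    The relationship between the two result columns can be checked in plain Scala, without Spark. The sketch below uses one of the sample values from the table above and assumes, as the output suggests, that the cast to Long yields whole seconds: dividing the getTime value by 1000 gives those seconds, and the remainder is the millisecond part that the question extracted with date_format.

```scala
import java.sql.Timestamp

// Epoch value taken from the "timestampConversionToLong" column above.
val ts = new Timestamp(1484733600111L)

val millis: Long = ts.getTime                       // milliseconds since the epoch
val seconds: Long = Math.floorDiv(millis, 1000L)    // the value shown in "timestampCastAsLong"
val fraction: Long = Math.floorMod(millis, 1000L)   // the millisecond part

// Same arithmetic as the question's "BIGINT(ts) * 1000 + SSS" expression.
val rebuilt: Long = seconds * 1000L + fraction

println(s"$seconds $fraction $rebuilt") // 1484733600 111 1484733600111
```

    On Spark 3.1 and later, the SQL function unix_millis(ts) is meant to return this millisecond value directly, which would avoid the UDF altogether; the original answer predates it.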
