【问题标题】:Spark: Wrong timestamp parsingSpark:错误的时间戳解析
【发布时间】:2020-09-16 21:52:09
【问题描述】:

我正在创建以下数据框

syncs.select($"event.timestamp",to_date($"event.timestamp".cast(TimestampType))).show

这包括以下行

    timestamp|to_date(CAST(`event.timestamp` AS TIMESTAMP))|
-------------+---------------------------------------------+
1589509800768|                                  52339-07-25|
1589509802730|                                  52339-07-25|
1589509809092|                                  52339-07-25|
1589509810402|                                  52339-07-25|
1589509812112|                                  52339-07-25|
1589509817489|                                  52339-07-25|
1589509818065|                                  52339-07-25|
1589509818902|                                  52339-07-25|
1589509819020|                                  52339-07-25|
1589509819425|                                  52339-07-25|
1589509819830|                                  52339-07-25|

基于 this 1589509800768 是 2020 年 5 月 15 日星期五 02:30:00。

我不明白为什么我会收到这些未来的日期。从时间戳到日期的转换是否也需要某种日期格式?

【问题讨论】:

    标签: scala date apache-spark timestamp


    【解决方案1】:

    Spark 需要以秒而不是毫秒为单位的纪元时间,因此您可以将其除以 1000。

    scala> val values = List(1589509800768L)
    values: List[Long] = List(1589509800768)
    
    scala> val df = values.toDF()
    df: org.apache.spark.sql.DataFrame = [value: bigint]
    
    scala> df.show(false)
    +-------------+
    |value        |
    +-------------+
    |1589509800768|
    +-------------+
    
    
    scala> df.select((col("value") / 1000 ).cast(TimestampType).as("current_time")).show(false)
    +-----------------------+
    |current_time           |
    +-----------------------+
    |2020-05-14 19:30:00.768|
    +-----------------------+
    
    scala> df.select((col("value") / 1000 ).cast(TimestampType).as("current_time")).withColumn("time_utc",
         |   expr("""to_utc_timestamp(current_time, "PST")""")
         | ).show(false)
    +-----------------------+-----------------------+
    |current_time           |time_utc               |
    +-----------------------+-----------------------+
    |2020-05-14 19:30:00.768|2020-05-15 02:30:00.768|
    +-----------------------+-----------------------+
    
    

    【讨论】:

      【解决方案2】:

      首先您应该将秒转换为毫秒,然后转换为时间戳或日期

      object ToTimestamp exteds App{
        val spark = SparkSession
          .builder()
          .appName("ToTimestamp")
          .master("local[*]")
          .config("spark.sql.shuffle.partitions","4") //Change to a more reasonable default number of partitions for our data
          .config("spark.app.id","ToTimestamp") // To silence Metrics warning
          .getOrCreate()
      
        val sc = spark.sparkContext
      
        import org.apache.spark.sql.functions._
        import spark.implicits._
      
        val data = sc.parallelize(List(1589509800768L,1589509802730L,1589509809092L,1589509810402L)).toDF("millis")
      
        val toTimestamp = data.withColumn("timestamp", from_unixtime(col("millis") / 1000))
        toTimestamp.show(truncate = false)
      /*
      +-------------+-------------------+
      |millis       |timestamp          |
      +-------------+-------------------+
      |1589509800768|2020-05-15 04:30:00|
      |1589509802730|2020-05-15 04:30:02|
      |1589509809092|2020-05-15 04:30:09|
      |1589509810402|2020-05-15 04:30:10|
      +-------------+-------------------+
      */
      
        val toDate = toTimestamp.selectExpr("millis", "timestamp").withColumn("data", to_date(col("timestamp")))
        toDate.show(truncate = false)
      
      /*
      +-------------+-------------------+----------+
      |millis       |timestamp          |data      |
      +-------------+-------------------+----------+
      |1589509800768|2020-05-15 04:30:00|2020-05-15|
      |1589509802730|2020-05-15 04:30:02|2020-05-15|
      |1589509809092|2020-05-15 04:30:09|2020-05-15|
      |1589509810402|2020-05-15 04:30:10|2020-05-15|
      +-------------+-------------------+----------+
      */
      }
      

      【讨论】:

        猜你喜欢
        • 2019-07-20
        • 1970-01-01
        • 1970-01-01
        • 2023-03-25
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        相关资源
        最近更新 更多