【Question Title】: To get Date Series from Start and end date in Spark SQL
【Posted】: 2017-06-16 06:39:01
【Question】:

I have to convert date tuples, each holding a start date and an end date, into date series.

+-----------------------------------------+
|dateRange                                |
+-----------------------------------------+
|[2017-04-06 00:00:00,2017-04-05 00:00:00]|
|[2017-04-05 00:00:00,2017-04-04 00:00:00]|
|[2017-04-04 00:00:00,2017-04-03 00:00:00]|
|[2017-04-03 00:00:00,2017-03-31 00:00:00]|
|[2017-03-31 00:00:00,2017-03-30 00:00:00]|
|[2017-03-30 00:00:00,2017-03-29 00:00:00]|
|[2017-03-29 00:00:00,2017-03-28 00:00:00]|
|[2017-03-28 00:00:00,2017-03-27 00:00:00]|
|[2017-04-06 00:00:00,2017-04-05 00:00:00]|
|[2017-04-05 00:00:00,2017-04-04 00:00:00]|
|[2017-04-04 00:00:00,2017-04-03 00:00:00]|
|[2017-04-03 00:00:00,2017-03-31 00:00:00]|
|[2017-03-31 00:00:00,2017-03-30 00:00:00]|
|[2017-03-30 00:00:00,2017-03-29 00:00:00]|
|[2017-03-29 00:00:00,2017-03-28 00:00:00]|
|[2017-03-28 00:00:00,2017-03-27 00:00:00]|
|[2017-04-06 00:00:00,2017-04-05 00:00:00]|
+-----------------------------------------+

How can I expand each tuple into a date series that covers every day between its two dates, endpoints included?

|[2017-04-03 00:00:00,2017-03-31 00:00:00]|  

should become, after conversion,

|[2017-04-03 00:00:00,2017-04-02 00:00:00,2017-04-01 00:00:00,2017-03-31 00:00:00]|  

【Comments】:

    Tags: apache-spark apache-spark-sql


    【Solution 1】:

    I tried the code snippet below and it worked for me.

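      import org.apache.spark.sql.functions._
      import org.joda.time.LocalDate

      // Yields each day from start up to, but not including, end.
      // Use takeWhile (!_.isAfter(end)) instead to include the end date.
      def dayIterator(start: LocalDate, end: LocalDate) = Iterator.iterate(start)(_ plusDays 1) takeWhile (_ isBefore end)
    
      def dateSeries(date1: String, date2: String): Array[String] = {
        // Keep only the date part of "yyyy-MM-dd HH:mm:ss.S".
        val fromDate = new LocalDate(date1.split(" ")(0))
        val toDate = new LocalDate(date2.split(" ")(0))
        // Re-attach the midnight time component to each day.
        dayIterator(fromDate, toDate).toArray.map(_.toString + " 00:00:00.0")
      }
    
      // Wrap the function as a UDF so it can be applied to DataFrame columns.
      val DateSeries = udf(dateSeries(_: String, _: String))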
    
    
    scala> dateSeries("2017-03-31 00:00:00.0","2017-04-03 00:00:00.0")
    res53: Array[String] = Array(2017-03-31, 2017-04-01, 2017-04-02)
    

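    No, even after appending "00:00:00.0" in the map operation of the dateSeries method, I could not figure it out. The array it returns does not contain the appended string.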
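For comparison, here is a minimal sketch of an inclusive expansion that also handles a start date later than the end date, as in the question's example. It is not the answer's code: it assumes `java.time` in place of Joda-Time, and the names `DateSeriesSketch` and `inclusiveDateSeries` are hypothetical.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object DateSeriesSketch {
  // Hypothetical helper: expands two "yyyy-MM-dd HH:mm:ss" strings into one
  // entry per day, including both endpoints, in either direction.
  def inclusiveDateSeries(from: String, to: String): Seq[String] = {
    // Keep only the date part, as the answer above does.
    val start = LocalDate.parse(from.split(" ")(0))
    val end   = LocalDate.parse(to.split(" ")(0))
    // Signed number of days from start to end; negative when end is earlier.
    val days = ChronoUnit.DAYS.between(start, end)
    val step = if (days < 0) -1L else 1L
    // Walk |days| steps from start toward end and re-attach the midnight time.
    (0L to math.abs(days)).map(i => start.plusDays(i * step).toString + " 00:00:00")
  }
}
```

Calling `DateSeriesSketch.inclusiveDateSeries("2017-04-03 00:00:00", "2017-03-31 00:00:00")` should give the four days from 2017-04-03 back to 2017-03-31. The function can be wrapped with `udf(...)` in the same way as `dateSeries` above.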

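    【Discussion】:

      【Solution 2】:

      Creating a UDF that computes the dates between fromDate and toDate solves the problem. I used the Joda Time API for simplicity. You need to add the dependency as follows.

      For SBT: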

      libraryDependencies += "joda-time" % "joda-time" % "2.8.1"
      

      Below is a sample for your problem:

      import org.apache.spark.sql.functions.udf
      import org.joda.time.format.DateTimeFormat
      import spark.implicits._
      
          val data = spark.sparkContext.parallelize(Seq(
            ("2017-04-03 00:00:00,2017-03-31 00:00:00"),
            ("2017-03-31 00:00:00,2017-03-30 00:00:00"),
            ("2017-03-30 00:00:00,2017-03-29 00:00:00"),
            ("2017-03-29 00:00:00,2017-03-28 00:00:00"),
            ("2017-03-28 00:00:00,2017-03-27 00:00:00"),
            ("2017-04-03 00:00:00,2017-03-31 00:00:00"),
            ("2017-04-06 00:00:00,2017-04-05 00:00:00")
          )).toDF("dateRanges")
      
          // Expands "from,to" into every day between the two dates, inclusive,
          // stepping toward `to` whichever side of `from` it lies on.
          val calculateDate = udf((date: String) => {
            val dtf = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
            val Array(fromStr, toStr) = date.split(",")
            val from = dtf.parseDateTime(fromStr)
            val to   = dtf.parseDateTime(toStr)
            val step = if (from.isAfter(to)) -1 else 1
            val dates = scala.collection.mutable.ListBuffer[String]()
            var current = from
            // Record the moving date, not `from`, on each iteration.
            while (current.getMillis != to.getMillis) {
              dates += current.toString(dtf)
              current = current.plusDays(step)
            }
            dates += to.toString(dtf)
            dates.toList
          })
      
          data.withColumn("newDate", calculateDate(data("dateRanges")))
      

      This works in both cases, whether your toDate is earlier or later than fromDate.
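As an alternative to a UDF, the same expansion can be sketched with Spark's built-in `sequence` function. This assumes Spark 2.4 or later, which is newer than the release current when this question was asked, and it returns an array of dates rather than timestamp strings. The names `SequenceSketch` and `expandRanges` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sequence, split, to_date}

object SequenceSketch {
  // Hypothetical helper: adds a "newDate" array column holding every day from
  // the first date in "dateRanges" to the second, both endpoints included.
  def expandRanges(df: DataFrame): DataFrame = {
    val parts = split(col("dateRanges"), ",")
    // With no explicit step, sequence moves one day at a time toward the stop date.
    df.withColumn("newDate", sequence(to_date(parts.getItem(0)), to_date(parts.getItem(1))))
  }

  def main(args: Array[String]): Unit = {
    // A local session, assumed here only so the sketch can run on its own.
    val spark = SparkSession.builder().master("local[1]").appName("sequence-sketch").getOrCreate()
    import spark.implicits._
    val data = Seq("2017-04-03 00:00:00,2017-03-31 00:00:00").toDF("dateRanges")
    expandRanges(data).show(false)
    spark.stop()
  }
}
```

Because the step defaults to one day in the direction of the stop date, this also covers ranges where toDate is earlier than fromDate.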

      Hope this helps!

      【Discussion】:
