【Question title】: How to change the date_format in PySpark?
【Posted】: 2022-02-10 13:22:33
【Question】:

I'm currently working with PySpark, and I have a CSV file (it has several columns; I'll only show the date columns) that looks like this when opened in Excel:

Date received   Date sent to company
11/13/2014  11/13/2014
11/13/2014  11/13/2014
11/13/2014  11/13/2014
11/13/2014  11/13/2014
12-11-2014  11/13/2014
12-11-2014  11/13/2014
12-11-2014  11/13/2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014
12-11-2014  11-12-2014

Here is a screenshot for a clearer picture:

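As you can see, I've loaded this CSV file into PySpark, but I really want the date columns to be displayed in one specific format: "dd-mm-yyyy".

Can someone help me?

Here is what I tried: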

from pyspark.sql.functions import col, to_date

df.select(col("Date_received"), to_date(col("Date_received"), "dd-MM-yyyy").alias("date")) \
  .show()

It gives the following output:

+-------------+----------+
|Date_received|      date|
+-------------+----------+
|   11/13/2014|      null|
|   11/13/2014|      null|
|   11/13/2014|      null|
|   11/13/2014|      null|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
|   12-11-2014|2014-11-12|
+-------------+----------+
only showing top 20 rows

Notice how the first 4 rows of the output are "null". Also, I passed "dd-mm-yyyy", so why is the output in "yyyy-mm-dd" format?

How can I fix this? I want to change the date_format here (to "dd-mm-yyyy").

【Question discussion】:

    Tags: python sql pyspark


    【Solution 1】:

    To handle the multiple date formats present in the data, you can parse each of them into a new column with to_date, and then use coalesce to take the first non-null value.

    You can find more information here: Parse Date Format

    Datetime patterns available in Spark: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

    A typical example follows:

    Data preparation

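    import pandas as pd
    from io import StringIO
    
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession
    
    # `sql` is the SparkSession used to build the Spark DataFrame below
    sql = SparkSession.builder.getOrCreate()
    
    df = pd.read_csv(StringIO("""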
    Date received,Date sent to company
    11/13/2014,11/13/2014
    11/13/2014,11/13/2014
    11/13/2014,11/13/2014
    11/13/2014,11/13/2014
    12-11-2014,11/13/2014
    12-11-2014,11/13/2014
    12-11-2014,11/13/2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    12-11-2014,11-12-2014
    """),delimiter=",")
    
    
    sparkDF = sql.createDataFrame(df)
    
    sparkDF.show()
    
    +-------------+--------------------+
    |Date received|Date sent to company|
    +-------------+--------------------+
    |   11/13/2014|          11/13/2014|
    |   11/13/2014|          11/13/2014|
    |   11/13/2014|          11/13/2014|
    |   11/13/2014|          11/13/2014|
    |   12-11-2014|          11/13/2014|
    |   12-11-2014|          11/13/2014|
    |   12-11-2014|          11/13/2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    |   12-11-2014|          11-12-2014|
    +-------------+--------------------+
    

    Parse each format with to_date

    sparkDF = sparkDF.withColumn('p1',F.to_date(F.col('Date received'),'MM/dd/yyyy'))\
                     .withColumn('p2',F.to_date(F.col('Date received'),'MM-dd-yyyy'))
    
    sparkDF.show()
    
    +-------------+--------------------+----------+----------+
    |Date received|Date sent to company|        p1|        p2|
    +-------------+--------------------+----------+----------+
    |   11/13/2014|          11/13/2014|2014-11-13|      null|
    |   11/13/2014|          11/13/2014|2014-11-13|      null|
    |   11/13/2014|          11/13/2014|2014-11-13|      null|
    |   11/13/2014|          11/13/2014|2014-11-13|      null|
    |   12-11-2014|          11/13/2014|      null|2014-12-11|
    |   12-11-2014|          11/13/2014|      null|2014-12-11|
    |   12-11-2014|          11/13/2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    |   12-11-2014|          11-12-2014|      null|2014-12-11|
    +-------------+--------------------+----------+----------+
    

    Coalesce

    sparkDF = sparkDF.withColumn('date_received_parsed',F.coalesce(F.col('p1'),F.col('p2')))\
                     .drop(*['p1','p2'])
    
    sparkDF.show()
    
    +-------------+--------------------+--------------------+
    |Date received|Date sent to company|date_received_parsed|
    +-------------+--------------------+--------------------+
    |   11/13/2014|          11/13/2014|          2014-11-13|
    |   11/13/2014|          11/13/2014|          2014-11-13|
    |   11/13/2014|          11/13/2014|          2014-11-13|
    |   11/13/2014|          11/13/2014|          2014-11-13|
    |   12-11-2014|          11/13/2014|          2014-12-11|
    |   12-11-2014|          11/13/2014|          2014-12-11|
    |   12-11-2014|          11/13/2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    |   12-11-2014|          11-12-2014|          2014-12-11|
    +-------------+--------------------+--------------------+
    
