【Posted】: 2020-08-21 07:05:39
【Question】:
I'm trying to get the week of the month from a date column in a PySpark DataFrame. I'm using the following expression to get the week: date_format(to_date("my_date_col","yyyy-MM-dd"), "W") from https://www.datasciencemadesimple.com/get-week-number-from-date-in-pyspark/#:~:text=In%20order%20to%20get%20Week,we%20use%20weekofmonth()%20function.
Strangely, this seems to work for every week except the first week of August 2020!
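from pyspark.sql.functions import col, date_format, month, to_date

# First week of August 2020: week-of-month ("W") comes back as 2
(base.filter(col("acct_cycle_cut_dt").between("2020-08-01", "2020-08-07"))
     .select("acct_cycle_cut_dt", month("acct_cycle_cut_dt"),
             date_format(to_date("acct_cycle_cut_dt", "yyyy-MM-dd"), "W"))
     .limit(4).show())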
+-----------------+------------------------+----------------------------------------------------------+
|acct_cycle_cut_dt|month(acct_cycle_cut_dt)|date_format(to_date(`acct_cycle_cut_dt`, 'yyyy-MM-dd'), W)|
+-----------------+------------------------+----------------------------------------------------------+
|       2020-08-02|                       8|                                                         2|
|       2020-08-07|                       8|                                                         2|
|       2020-08-07|                       8|                                                         2|
|       2020-08-07|                       8|                                                         2|
+-----------------+------------------------+----------------------------------------------------------+
# First week of July 2020, for comparison: week-of-month ("W") comes back as 1
(base.filter(col("acct_cycle_cut_dt").between("2020-07-01", "2020-07-07"))
     .select("acct_cycle_cut_dt", month("acct_cycle_cut_dt"),
             date_format(to_date("acct_cycle_cut_dt", "yyyy-MM-dd"), "W"))
     .limit(4).show())
+-----------------+------------------------+----------------------------------------------------------+
|acct_cycle_cut_dt|month(acct_cycle_cut_dt)|date_format(to_date(`acct_cycle_cut_dt`, 'yyyy-MM-dd'), W)|
+-----------------+------------------------+----------------------------------------------------------+
|       2020-07-03|                       7|                                                         1|
|       2020-07-03|                       7|                                                         1|
|       2020-07-02|                       7|                                                         1|
|       2020-07-02|                       7|                                                         1|
+-----------------+------------------------+----------------------------------------------------------+
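A possible explanation for the two outputs above, offered as a sketch rather than a confirmed diagnosis: if the "W" pattern follows the Java SimpleDateFormat week-of-month convention with weeks starting on Sunday (an assumption about the Spark version and locale in use), then week 1 of August 2020 holds only Saturday 2020-08-01, and 2020-08-02 through 2020-08-08 already fall in week 2. July 2020 starts on a Wednesday, so 2020-07-02 and 2020-07-03 are still in week 1. The plain-Python helper below, `week_of_month`, is hypothetical (it is not a Spark function) and only reproduces that arithmetic.

```python
from datetime import date


def week_of_month(d: date, first_weekday: int = 6) -> int:
    """Week of the month for d, counting a partial first week as week 1.

    first_weekday uses Python's weekday() numbering (Monday=0 ... Sunday=6).
    The default of Sunday is an assumption meant to mirror US-locale defaults.
    """
    first_of_month = d.replace(day=1)
    # Days between the start of the week containing the 1st and the 1st itself
    offset = (first_of_month.weekday() - first_weekday) % 7
    return (d.day - 1 + offset) // 7 + 1


# Dates taken from the two queries in the question
print(week_of_month(date(2020, 8, 1)))  # 1 (a Saturday, alone in week 1)
print(week_of_month(date(2020, 8, 2)))  # 2
print(week_of_month(date(2020, 8, 7)))  # 2
print(week_of_month(date(2020, 7, 2)))  # 1
```

Under these assumptions the value 2 for 2020-08-02 and 2020-08-07 is expected rather than a bug. Passing `first_weekday=0` (Monday-start weeks) would place 2020-08-02 in week 1 instead.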
【Comments】:
-
Your code and data are hard to read, and there is no result for '2020-08-01', no execution output, and no reproducible data.
-
Apologies for my poor formatting!
Tags: date debugging pyspark week-number