[Question title]: Parse error when trying to remove time and change date format?
[Posted]: 2021-10-07 08:10:47
[Question]:

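I am trying to remove the time and change the date format in my social media dataset so that it is compatible with my stock data when I merge the two datasets.

Here is a sample of my social media dataset: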

0       id      created_at
1       1       7:51 PM ET Fri, 17 July 2020
2       2       7:33 PM ET Fri, 17 July 2020
4       4       7:25 PM ET Fri, 17 July 2020
5       5       4:24 PM ET Fri, 17 July 2020
…       …       …
3076    3076    10:15 AM ET Tue, 26 Dec 2017
3077    3077    11:12 AM ET Thu, 20 Sept 2018
3078    3078    7:07 PM ET Fri, 22 Dec 2017
3079    3079    7:07 PM ET Fri, 22 Dec 2017
3080    3080    6:52 PM ET Fri, 22 Dec 2017

I am trying to make the dates look like the Date column of my stock data:

Date        Open    High
2017-12-22  2684.22 2685.35
2017-12-26  2679.09 2682.74
2017-12-27  2682.10 2685.64
2017-12-28  2686.10 2687.66
2017-12-29  2689.15 2692.12

This is what I tried, but it did not work:

pd.to_datetime(data['created_at'])

But I get this error:

 ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2053         try:
-> 2054             values, tz_parsed = conversion.datetime_to_datetime64(data)
   2055             # If tzaware, these values represent unix timestamps, so we

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ParserError                               Traceback (most recent call last)
<ipython-input-13-34e0ddb54ab0> in <module>
----> 1 pd.to_datetime(data['created_at'])

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    801             result = arg.map(cache_array)
    802         else:
--> 803             values = convert_listlike(arg._values, format)
    804             result = arg._constructor(values, index=arg.index, name=arg.name)
    805     elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    457         assert format is None or infer_datetime_format
    458         utc = tz == "utc"
--> 459         result, tz_parsed = objects_to_datetime64ns(
    460             arg,
    461             dayfirst=dayfirst,

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2057             return values.view("i8"), tz_parsed
   2058         except (ValueError, TypeError):
-> 2059             raise e
   2060 
   2061     if tz_parsed is not None:

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2042 
   2043     try:
-> 2044         result, tz_parsed = tslib.array_to_datetime(
   2045             data,
   2046             errors=errors,

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

~/opt/anaconda3/lib/python3.8/site-packages/dateutil/parser/_parser.py in parse(timestr, parserinfo, **kwargs)
   1366         return parser(parserinfo).parse(timestr, **kwargs)
   1367     else:
-> 1368         return DEFAULTPARSER.parse(timestr, **kwargs)
   1369 
   1370 

~/opt/anaconda3/lib/python3.8/site-packages/dateutil/parser/_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    641 
    642         if res is None:
--> 643             raise ParserError("Unknown string format: %s", timestr)
    644 
    645         if len(res) == 0:

ParserError: Unknown string format: created_at 

Thanks for your help :)

Edit: Sample of dataset

[Comments]:

    Tags: python pandas date parsing time


    [Answer 1]:

    Split on ', ', keep the second part (the date), and convert it to a datetime with pd.to_datetime:

    >>> pd.to_datetime(df['created_at'].str.split(', ').str[1])
    1      2020-07-17
    2      2020-07-17
    4      2020-07-17
    5      2020-07-17
    3076   2017-12-26
    3077   2018-09-20
    3078   2017-12-22
    3079   2017-12-22
    3080   2017-12-22
    Name: created_at, dtype: datetime64[ns]
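
Since the goal in the question is to merge with the stock data, the split step can be followed by a merge on the resulting date column. The following is a minimal sketch, not the answerer's code: the frames `social` and `stocks` and their values are small stand-ins copied from the samples in the question, and the choice of a left merge is an assumption. Each date string is parsed with dateutil element by element, which avoids depending on how a particular pandas version infers a single format for the mixed month spellings ("July", "Dec", "Sept").

```python
import pandas as pd
from dateutil import parser

# Stand-in for the social media dataset (values taken from the question).
social = pd.DataFrame({
    "id": [3076, 3077, 3078],
    "created_at": [
        "10:15 AM ET Tue, 26 Dec 2017",
        "11:12 AM ET Thu, 20 Sept 2018",
        "7:07 PM ET Fri, 22 Dec 2017",
    ],
})

# Stand-in for the stock dataset (values taken from the question).
stocks = pd.DataFrame({
    "Date": pd.to_datetime(["2017-12-22", "2017-12-26"]),
    "Open": [2684.22, 2679.09],
    "High": [2685.35, 2682.74],
})

# Keep only the text after ", " (for example "26 Dec 2017") and parse it.
date_part = social["created_at"].str.split(", ").str[1]
social["Date"] = pd.to_datetime(date_part.apply(parser.parse))

# Left merge: every post is kept; posts on days without stock data get NaN.
merged = social.merge(stocks, on="Date", how="left")
print(merged[["id", "Date", "Open", "High"]])
```

Posts dated on days with no stock row (weekends, holidays, or dates outside the stock sample) end up with missing Open and High values, so an inner merge may be preferable if those rows are not needed.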
    

    Old answer: you can use the dateutil package (it is installed together with pandas):

    from dateutil import parser
    
    >>> df['created_at'].apply(parser.parse, tzinfos={'ET': -4*3600})
    
    1      2020-07-17 19:51:00-04:00
    2      2020-07-17 19:33:00-04:00
    4      2020-07-17 19:25:00-04:00
    5      2020-07-17 16:24:00-04:00
    3076   2017-12-26 10:15:00-04:00
    3077   2018-09-20 11:12:00-04:00
    3078   2017-12-22 19:07:00-04:00
    3079   2017-12-22 19:07:00-04:00
    3080   2017-12-22 18:52:00-04:00
    Name: created_at, dtype: datetime64[ns, tzoffset('ET', -14400)]
    

    If needed, you can add other time zones to the tzinfos dictionary.
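
Note that a fixed offset of -4 hours labels the December rows as -04:00 too, although US Eastern time is at -05:00 in winter. If the exact offset matters, one option is to map 'ET' to a full time zone object instead of a number. This is a sketch under the assumption that 'ET' in the data means the America/New_York zone and that a time zone database is available to dateutil; the variable names are illustrative.

```python
from datetime import timedelta
from dateutil import parser, tz

# Assumption: "ET" stands for US Eastern time (America/New_York).
eastern = tz.gettz("America/New_York")
tzinfos = {"ET": eastern}

summer = parser.parse("7:51 PM ET Fri, 17 July 2020", tzinfos=tzinfos)
winter = parser.parse("10:15 AM ET Tue, 26 Dec 2017", tzinfos=tzinfos)

# The offset now follows daylight saving time instead of being fixed.
print(summer.isoformat(), summer.utcoffset())
print(winter.isoformat(), winter.utcoffset())
```

For the merge in the question only the calendar date is needed, so this matters mainly if the time of day is kept.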

    Update

    ParserError: Unknown string format: created_at

    This exception is raised because one of the values in the df['created_at'] column is the string 'created_at' itself. For example:

    >>> df
       id                    created_at
    0   0                         hello  # <- it's not a valid datetime
    1   1  7:51 PM ET Fri, 17 July 2020
    2   2  7:33 PM ET Fri, 17 July 2020
    
    >>> df['created_at'].apply(parser.parse, tzinfos={'ET': -4*3600})
    
    ---------------------------------------------------------------------------
    ParserError                               Traceback (most recent call last)
    
    ...
    
    ParserError: Unknown string format: hello  # 'hello' is not a valid datetime
    

    To find the offending rows, select every row whose value contains neither 'AM' nor 'PM':

    >>> df.loc[~df['created_at'].str.contains(r'(?:AM|PM)'), 'created_at']
    
    0    hello
    Name: created_at, dtype: object
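
Once the bad rows are located, they can be filtered out before parsing; renaming the column is not required for the parse to succeed. The sketch below assumes the stray value is a repeated header row (the string 'created_at' appearing as data), which is a guess based on the error message rather than something the question confirms. The frame contents are made up for illustration.

```python
import pandas as pd
from dateutil import parser

# Hypothetical data in which the header row was read in again as a data row.
df = pd.DataFrame({
    "id": ["id", "1", "3076"],
    "created_at": [
        "created_at",
        "7:51 PM ET Fri, 17 July 2020",
        "10:15 AM ET Tue, 26 Dec 2017",
    ],
})

# Keep only rows that look like timestamps (they contain AM or PM).
mask = df["created_at"].str.contains(r"(?:AM|PM)")
clean = df[mask].copy()

# Parse the date part of the remaining rows into a new "Date" column.
date_part = clean["created_at"].str.split(", ").str[1]
clean["Date"] = pd.to_datetime(date_part.apply(parser.parse))
print(clean)
```

Adding a separate Date column, rather than renaming created_at, keeps the original text available and gives the frame a key with the same name as the stock data.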
    

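    [Comments]:

    • Hi, thanks for your help. But I get NameError: name 'parser' is not defined when I run your code. Any ideas?
    • @user16561849 Don't forget to import the module: from dateutil import parser
    • Sorry, silly mistake. However, it still shows ParserError: Unknown string format: created_at.
    • One of the values in df['created_at'] is the string 'created_at', so the parser cannot decode it as a valid date.
    • So to fix this, I need to change 'created_at' to 'Date'?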