【问题标题】:How to fix the ValueError when converting to datetime?转换为日期时间时如何修复 ValueError?
【发布时间】:2020-10-03 11:13:58
【问题描述】:

时间数据与我的格式不匹配的错误。

这是数据示例:

import pandas as pd

data = pd.DataFrame({'TransactionTime': ['Sat Feb 02 12:50:00 IST 2019']})

这是我的代码:

data['TransactionTime'] = pd.to_datetime(data['TransactionTime'], format = '%a %b %d %H:%M:%S %Z %Y')

追溯

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
e:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    431             try:
--> 432                 values, tz = conversion.datetime_to_datetime64(arg)
    433                 return DatetimeIndex._simple_new(values, name=name, tz=tz)

pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-139-ed51a35d7ed3> in <module>
----> 1 data['TransactionTime'] = pd.to_datetime(data['TransactionTime'], format = '%a %b %d %H:%M:%S %Z %Y')

e:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    726             result = arg.map(cache_array)
    727         else:
--> 728             values = convert_listlike(arg._values, format)
    729             result = arg._constructor(values, index=arg.index, name=arg.name)
    730     elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):

e:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    433                 return DatetimeIndex._simple_new(values, name=name, tz=tz)
    434             except (ValueError, TypeError):
--> 435                 raise e
    436 
    437     if result is None:

e:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    398                 try:
    399                     result, timezones = array_strptime(
--> 400                         arg, format, exact=exact, errors=errors
    401                     )
    402                     if "%Z" in format or "%z" in format:

pandas\_libs\tslibs\strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()

ValueError: time data 'Sat Feb 02 12:50:00 IST 2019' does not match format '%a %b %d %H:%M:%S %Z %Y' (match)

【问题讨论】:

  • 这可能会帮助您了解data['TransactionTime'] 是什么。根据the documentation,它需要是“int,float,str,datetime,list,tuple,一维数组,Series DataFrame/dict-like”,如果你遇到这个错误,它可能不是。另一种可能性是您要求将其格式化为'%a %b %d %H:%M:%S %Z %Y',而给定数据中不存在这些字段中的一个或多个。 (见this page

标签: python pandas datetime


【解决方案1】:

如果您的数据有多个时区。

  • 将唯一时区映射到相应的 UTC 偏移量,然后进行本地化。
  • 创建DateTime 列后,可以删除其他列。
    • df.drop(columns=['a', 'b', 'd', 'time', 'tz', 'Y', 'TTime'], inplace=True)
import pandas as pd

# data and dataframe
df = pd.DataFrame({'TTime': ['Sat Feb 02 12:50:00 IST 2019', 'Sat Feb 02 12:50:00 EST 2019']})

                        TTime
 Sat Feb 02 12:50:00 IST 2019
 Sat Feb 02 12:50:00 EST 2019

# split the string into components; assumes all strings are formatted similarly
df[['a', 'b', 'd', 'time', 'tz', 'Y']] = df.TTime.str.split(expand=True)

                        TTime    a    b   d      time   tz     Y
 Sat Feb 02 12:50:00 IST 2019  Sat  Feb  02  12:50:00  IST  2019
 Sat Feb 02 12:50:00 EST 2019  Sat  Feb  02  12:50:00  EST  2019

# create list of unique time zones
uni_tzs = df.tz.unique().tolist()
print(uni_tzs)
>>> ['IST', 'EST']

# UTC offset for each timezone
tzs = ['+05:30', '-05:00']

# combine into a dict
maps = dict(zip(uni_tzs, tzs))

# map the different time zones to their UTC offsets
df.tz = df.tz.map(maps)

# create the DateTime column and convert to a time zone of your choice
df['DateTime'] = pd.to_datetime(pd.to_datetime(df.Y + df.b + df.d + df.time + df.tz, format='%Y%b%d%H:%M:%S%z'), utc=True).dt.tz_convert('Asia/Kolkata')

                        TTime    a    b   d      time      tz     Y                  DateTime
 Sat Feb 02 12:50:00 IST 2019  Sat  Feb  02  12:50:00  +05:30  2019 2019-02-02 12:50:00+05:30
 Sat Feb 02 12:50:00 EST 2019  Sat  Feb  02  12:50:00  -05:00  2019 2019-02-02 23:20:00+05:30

或者

  • 映射到时区名称而不是偏移量。
    • format 参数中使用%Z 而不是%z
df_tzs = df.tz.unique().tolist()
tzs = ['Asia/Kolkata', 'US/Eastern']
maps = dict(zip(df_tzs, tzs))

df.tz = df.tz.map(maps)

df['DateTime'] = pd.to_datetime(pd.to_datetime(df.Y + df.b + df.d + df.time + df.tz, format='%Y%b%d%H:%M:%S%Z'), utc=True).dt.tz_convert('Asia/Kolkata')

                        TTime    a    b   d      time            tz     Y                  DateTime
 Sat Feb 02 12:50:00 IST 2019  Sat  Feb  02  12:50:00  Asia/Kolkata  2019 2019-02-02 12:50:00+05:30
 Sat Feb 02 12:50:00 EST 2019  Sat  Feb  02  12:50:00    US/Eastern  2019 2019-02-02 23:20:00+05:30

【讨论】:

    【解决方案2】:

    该错误很可能源于%Z 无法将IST 解析到正确时区的问题。有多个时区可以缩写为“IST”,所以无论如何它都是模棱两可的。

    解析例如'IST' 到特定时区,您可以定义映射字典并将其提供给 dateutil 的 parser.parse:

    import pandas as pd
    import dateutil
    
    tzmap = {'IST': dateutil.tz.gettz('Asia/Kolkata')}
    
    data = pd.DataFrame({'TransactionTime': ['Sat Feb 02 12:50:00 IST 2019']})
    
    data['TransactionTime'] = data['TransactionTime'].apply(lambda t: dateutil.parser.parse(t, tzinfos=tzmap))
    
    # data['TransactionTime']
    # 0   2019-02-02 12:50:00+05:30
    # Name: TransactionTime, dtype: datetime64[ns, tzfile('Asia/Calcutta')]
    

    【讨论】:

      猜你喜欢
      • 2019-09-12
      • 2020-08-18
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2016-09-14
      • 1970-01-01
      • 2013-02-14
      • 2014-02-19
      相关资源
      最近更新 更多