【Posted】: 2022-01-05 21:40:08
【Question】:
I have a DataFrame, which you can create with the following code:
import numpy as np
import pandas as pd
from io import StringIO
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df4s = """
contract RB BeginDate
0 A00118 46 20120705
1 A00118 47 20121005
2 A00253 48 0
3 A00253 48 0
"""
# Raw string so that \s is passed to the regex engine unchanged
df4 = pd.read_csv(StringIO(df4s.strip()), sep=r'\s+',
                  dtype={"BeginDate": int})
The output is:
  contract  RB  BeginDate
0   A00118  46   20120705
1   A00118  47   20121005
2   A00253  48          0
3   A00253  48          0
Now I want to create a new column 'first_month' based on 'BeginDate'. The logic is simple: if BeginDate equals 0, first_month should be 0; otherwise it should be the month part of BeginDate. My code is:
df4['first_month'] = np.where(df4['BeginDate'] != 0,
                              df4['BeginDate'].astype(str).str[4:6:1].astype(int), 0)
The error is:
ValueError: invalid literal for int() with base 10: ''
The traceback is:
:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy)
707 # work around NumPy brokenness, #1987
708 if np.issubdtype(dtype.type, np.integer):
--> 709 return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
710
711 # if we have a datetime/timedelta array of objects
pandas\_libs\lib.pyx in pandas._libs.lib.astype_intsafe()
pandas/_libs/src\util.pxd in util.set_value_at_unsafe()
ValueError: invalid literal for int() with base 10: ''
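The traceback ends in the final `.astype(int)`, which suggests the conversion runs on every row, not only on the rows where `BeginDate != 0`. A minimal sketch of that reading, assuming the cause is that both value arguments of `np.where` are computed in full before the condition is applied (the names `begin`, `month_text` and `failed` are made up for this illustration):

```python
import pandas as pd

# Same values as the BeginDate column in the question.
begin = pd.Series([20120705, 20121005, 0, 0], name="BeginDate")

# The slice is applied to every row, including the rows holding 0.
# The string '0' has no characters at positions 4-5, so the slice is ''.
month_text = begin.astype(str).str[4:6]
print(month_text.tolist())  # ['07', '10', '', '']

# Converting the empty strings to int is the step that fails.
try:
    month_text.astype(int)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```

If this reading is right, the condition passed to `np.where` never gets a chance to protect the zero rows, because the error is raised while the second argument is still being built.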
The expected output is:
  contract  RB  BeginDate  first_month
0   A00118  46   20120705            7
1   A00118  47   20121005           10
2   A00253  48          0            0
3   A00253  48          0            0
Can anyone help?
【Discussion】:
Tags: python pandas dataframe numpy
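One possible way to get the expected column, sketched under the assumption that every non-zero BeginDate is an 8-digit YYYYMMDD integer: stay with integer arithmetic, so that no string conversion is involved and 0 maps to 0 on its own. The column name `first_month_alt` is hypothetical and only there to compare against a second, date-parsing variant.

```python
import pandas as pd

# The frame from the question, built directly for a self-contained example.
df4 = pd.DataFrame({
    "contract": ["A00118", "A00118", "A00253", "A00253"],
    "RB": [46, 47, 48, 48],
    "BeginDate": [20120705, 20121005, 0, 0],
})

# Integer division by 100 drops the two day digits (YYYYMMDD -> YYYYMM);
# the remainder modulo 100 keeps the two month digits. 0 stays 0.
df4["first_month"] = (df4["BeginDate"] // 100) % 100
print(df4["first_month"].tolist())  # [7, 10, 0, 0]

# Alternative: parse as dates. Values that do not match the format
# (such as '0') become NaT with errors="coerce", and are then filled with 0.
parsed = pd.to_datetime(df4["BeginDate"].astype(str),
                        format="%Y%m%d", errors="coerce")
df4["first_month_alt"] = parsed.dt.month.fillna(0).astype(int)
```

The arithmetic version is shorter and avoids the empty-string problem entirely; the date-parsing version is worth considering if malformed dates should also be caught rather than silently sliced.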