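【Posted】:2020-05-27 23:12:12
【Question】:
I have a dataframe with dates of birth in a "birthdate" column, and I want to match them to the age categories used in another dataframe. For example: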
  firstname   lastname            birthdate new_professionactuelle new_regiononame new_communeoname
0    Ferhat      Abbas  1899-08-24 00:00:00               Ecrivain           Jijel       Bouafourna
1     Ahmed  Ben Bella  1916-12-25 00:00:00             combattant          Oranie          Maghnia
And the age categories, from df['S02Q02_Age_rec'].unique(), are: array(['16-24', '25-34', '35-44', '55 or above', '45-54'], dtype=object)
I know how to compute these people's ages:
from datetime import datetime, date

def calculate_age(born):
    # Approximate age in whole years; a missing birthdate is passed through unchanged.
    days_in_year = 365.2425
    if born is not None:
        birth_date = datetime.strptime(born, '%Y-%m-%d %H:%M:%S').date()
        age = int((date.today() - birth_date).days / days_in_year)
        return age
    else:
        return born

df["age"] = df["birthdate"].apply(calculate_age)
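For reference, the same calculation can probably be done on the whole column at once instead of row by row. The following is only a sketch: the function name age_in_years and its today parameter are made up for illustration, and it keeps the 365.2425 days-per-year approximation from the code above.

```python
import pandas as pd

def age_in_years(birthdates, today=None):
    """Approximate whole-year ages for a Series of 'YYYY-MM-DD HH:MM:SS' strings."""
    born = pd.to_datetime(birthdates, format="%Y-%m-%d %H:%M:%S")
    # Hypothetical parameter: lets the caller pin the reference date.
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    days = (today - born).dt.days
    # Floor division keeps whole years; missing birthdates stay missing (<NA>).
    return (days // 365.2425).astype("Int64")

sample = pd.Series(["1899-08-24 00:00:00", "1916-12-25 00:00:00", None])
ages = age_in_years(sample, today="2020-05-27")
```

With a dataframe like the one above, this would be called as df["age"] = age_in_years(df["birthdate"]).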
But how do I match the age I get, which is an int, to a string category such as '25-34'?
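One way this kind of int-to-label mapping is commonly done in pandas is pd.cut. The sketch below assumes the categories are inclusive integer ranges, so the bin edges are written by hand rather than parsed from the labels; the sample ages are made up.

```python
import numpy as np
import pandas as pd

ages = pd.Series([15, 16, 24, 25, 34, 54, 55, 103])

# Assumed edges, left-closed: [16, 25) -> '16-24', ..., [55, inf) -> '55 or above'.
edges = [16, 25, 35, 45, 55, np.inf]
labels = ["16-24", "25-34", "35-44", "45-54", "55 or above"]

# right=False makes each bin include its lower edge and exclude its upper edge.
age_group = pd.cut(ages, bins=edges, labels=labels, right=False)
```

Ages that fall outside every bin (15 here) come back as missing values, and the result is a categorical column rather than plain strings.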
apply:
I tried:
import re

def age_classifier(age, intervals):
    for interval in intervals:
        lower = int(re.split("\s+", interval)[0])
        upper = int(re.split("\s+", interval)[1])
        if age in range(lower,upper):
            return interval
        else:
            return age

df["age"] = df["birthdate"].apply(age_classifier(intervals = age_intervals))
But it returns:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-301-95b25980deea> in <module>
----> 1 df["age"] = df["birthdate"].apply(age_classifier(intervals = age_intervals))
TypeError: age_classifier() missing 1 required positional argument: 'age'
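The TypeError arises because age_classifier(intervals = age_intervals) calls the function immediately, without an age, before apply ever runs. Series.apply expects the function object itself and forwards extra arguments to it. A small sketch with a hypothetical helper (in_any_range is not from the question):

```python
import pandas as pd

def in_any_range(age, ranges):
    """Hypothetical helper: True when age lies in one of the inclusive (low, high) pairs."""
    return any(low <= age <= high for low, high in ranges)

ages = pd.Series([20, 50])
ranges = [(16, 24), (25, 34)]

# Pass the function itself; extra keyword arguments are forwarded on every call.
by_keyword = ages.apply(in_any_range, ranges=ranges)
# Equivalent form: extra positional arguments go through args=.
by_args = ages.apply(in_any_range, args=(ranges,))
```

Applied to the question, that would read df["age"].apply(age_classifier, intervals=age_intervals). Note that it should presumably run on the computed "age" column, not on "birthdate".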
So I hard-coded the intervals inside the function:
def age_classifier(age, intervals = None):
    intervals = age_intervals
    for interval in intervals:
        lower = int(re.split("\s+", interval)[0])
        upper = int(re.split("\s+", interval)[1])
        if age in range(lower,upper):
            return interval
        else:
            return age
But then df["age"] = df["birthdate"].apply(age_classifier) gave me:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-318-757dd6509cc9> in <module>
----> 1 df["age"] = df["birthdate"].apply(age_classifier)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
3589 else:
3590 values = self.astype(object).values
-> 3591 mapped = lib.map_infer(values, f, convert=convert_dtype)
3592
3593 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-316-75658a636ff4> in age_classifier(age, intervals)
2 intervals = age_intervals
3 for interval in intervals:
----> 4 lower = int(re.split("\s+", interval)[0])
5 upper = int(re.split("\s+", interval)[1])
6 if age in range(lower,upper):
ValueError: invalid literal for int() with base 10: '16-24'
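The ValueError comes from splitting on whitespace: '16-24' contains no spaces, so re.split returns the whole string and int() cannot parse it. Three other things would likely bite afterwards: '55 or above' has no upper bound, range(lower, upper) excludes the upper value, and the else branch returns on the first interval instead of trying the rest. A possible rewrite, assuming the labels are inclusive ranges (parse_interval is a name introduced here for illustration):

```python
import re

def parse_interval(label):
    """Return inclusive (lower, upper) bounds; upper is None for open-ended labels."""
    numbers = [int(n) for n in re.findall(r"\d+", label)]
    if len(numbers) == 2:
        return numbers[0], numbers[1]
    return numbers[0], None  # e.g. '55 or above'

def age_classifier(age, intervals):
    """Return the first label whose bounds contain age, or None when nothing matches."""
    for label in intervals:
        lower, upper = parse_interval(label)
        if age >= lower and (upper is None or age <= upper):
            return label
    return None  # only reached after every interval has been tried

age_intervals = ["16-24", "25-34", "35-44", "55 or above", "45-54"]
```

Returning None for an unmatched age (under 16 here) is a design choice in this sketch; the original returned the age itself.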
np.vectorize:
I also tried:
np.vectorize(age_classifier)(df["birthdate"],age_intervals)
But the function seems to read both inputs element by element, i.e. it pairs df["birthdate"][0] with '16-24'.
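That pairing is how np.vectorize normally works: every argument is broadcast, so the list of intervals is walked in step with the birthdates. The excluded parameter tells it to pass an argument through whole. A sketch with a hypothetical helper and hand-written (label, low, high) tuples:

```python
import numpy as np

def first_matching_label(age, intervals):
    """Hypothetical helper: intervals is a list of (label, low, high) tuples."""
    for label, low, high in intervals:
        if low <= age <= high:
            return label
    return None

intervals = [("16-24", 16, 24), ("25-34", 25, 34), ("55 or above", 55, float("inf"))]

# excluded keeps 'intervals' out of broadcasting; otypes=[object] avoids
# truncating longer labels to the length of the first string returned.
classify = np.vectorize(first_matching_label, excluded={"intervals"}, otypes=[object])
result = classify(np.array([20, 30, 103]), intervals=intervals)
```

The excluded argument has to be passed by keyword here, because it is excluded by name. np.vectorize is still a Python-level loop, so it is a convenience rather than a speed-up.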
【Discussion】:
Tags: python-3.x date datetime