【Question title】: How to map a date of birth to an age category?
【Posted】: 2020-05-27 23:12:12
【Question】:

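I have a dataframe with dates of birth in a "birthdate" column, and I want to match them to the categories used in another dataframe. For example: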

    firstname   lastname    birthdate               new_professionactuelle  new_regiononame     new_communeoname
0   Ferhat      Abbas       1899-08-24 00:00:00     Ecrivain                Jijel               Bouafourna  
1   Ahmed       Ben Bella   1916-12-25 00:00:00     combattant              Oranie              Maghnia

The age categories come from df['S02Q02_Age_rec'].unique(), which gives: array(['16-24', '25-34', '35-44', '55 or above', '45-54'], dtype=object)

I know how to compute these people's ages:

from datetime import datetime, date 

def calculate_age(born):
    # Average length of a Gregorian year in days, so leap years are accounted for
    days_in_year = 365.2425
    if born is not None:
        birth_date = datetime.strptime(born, '%Y-%m-%d %H:%M:%S').date()
        return int((date.today() - birth_date).days / days_in_year)
    else:
        return born

df["age"] = df["birthdate"].apply(calculate_age)

But how do I match the age I get, which is an int, to a string category such as 25-34?

apply:

I tried:

import re

def age_classifier(age, intervals):
    for interval in intervals:
        lower = int(re.split("\s+", interval)[0])
        upper = int(re.split("\s+", interval)[1])
        if age in range(lower,upper):
            return interval
        else:
            return age

df["age"] = df["birthdate"].apply(age_classifier(intervals = age_intervals))

But it returns:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-301-95b25980deea> in <module>
----> 1 df["age"] = df["birthdate"].apply(age_classifier(intervals = age_intervals))

TypeError: age_classifier() missing 1 required positional argument: 'age'
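
The TypeError arises because age_classifier(intervals = age_intervals) calls the function straight away, without an age, instead of handing the function itself to apply. A minimal sketch of two ways to pass the extra argument, relying on Series.apply forwarding args and extra keyword arguments to the function. The classify function, the pre-parsed bounds dictionary and the sample ages are hypothetical, introduced only for illustration:

```python
import pandas as pd

# Hypothetical pre-parsed bounds: label -> (lowest age, highest age), both inclusive.
bounds = {"16-24": (16, 24), "25-34": (25, 34)}

def classify(age, bounds):
    """Return the label whose inclusive range contains age, or None."""
    for label, (low, high) in bounds.items():
        if low <= age <= high:
            return label
    return None

ages = pd.Series([18, 30])

# Pass the function object itself; apply supplies each age as the first argument.
by_keyword = ages.apply(classify, bounds=bounds)   # extra keyword argument
by_args = ages.apply(classify, args=(bounds,))     # extra positional argument
```

Both calls give the same result; which one to use is mostly a matter of taste.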

So I hard-coded the intervals inside the function:

def age_classifier(age, intervals = None):
    intervals = age_intervals
    for interval in intervals:
        lower = int(re.split("\s+", interval)[0])
        upper = int(re.split("\s+", interval)[1])
        if age in range(lower,upper):
            return interval
        else:
            return age

But calling df["age"] = df["birthdate"].apply(age_classifier) gives me:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-318-757dd6509cc9> in <module>
----> 1 df["age"] = df["birthdate"].apply(age_classifier)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3589             else:
   3590                 values = self.astype(object).values
-> 3591                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3592 
   3593         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-316-75658a636ff4> in age_classifier(age, intervals)
      2     intervals = age_intervals
      3     for interval in intervals:
----> 4         lower = int(re.split("\s+", interval)[0])
      5         upper = int(re.split("\s+", interval)[1])
      6         if age in range(lower,upper):

ValueError: invalid literal for int() with base 10: '16-24'
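
The ValueError comes from splitting on whitespace: '16-24' contains no whitespace, so re.split returns the whole label and int() cannot parse it. Two further problems would surface afterwards: range(lower, upper) excludes the upper bound, and the else branch returns inside the loop, so only the first interval is ever checked. A revised sketch, assuming every label is either 'low-high' or 'low or above' (parse_interval is a hypothetical helper, and the function expects an age, not a birthdate):

```python
import re

age_intervals = ["16-24", "25-34", "35-44", "45-54", "55 or above"]

def parse_interval(label):
    """Return (lower, upper) inclusive bounds; upper is None for open-ended labels."""
    numbers = [int(n) for n in re.findall(r"\d+", label)]
    lower = numbers[0]
    upper = numbers[1] if len(numbers) > 1 else None
    return lower, upper

def age_classifier(age, intervals):
    """Return the first label whose bounds contain age, or None if nothing matches."""
    for label in intervals:
        lower, upper = parse_interval(label)
        if age >= lower and (upper is None or age <= upper):
            return label
    return None  # reached only after every interval has been checked
```

It would be applied to the computed ages rather than to the raw birthdates, for example df["age"].apply(age_classifier, intervals=age_intervals).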

np.vectorize:

I also tried:

np.vectorize(age_classifier)(df["birthdate"],age_intervals)

But the function seems to read both arguments element by element, i.e. it pairs df["birthdate"][0] with 16-24.
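
That pairing is what np.vectorize does by default: it broadcasts every argument, so the list of intervals is iterated alongside the first argument. A minimal sketch of keeping the intervals whole with the excluded parameter of np.vectorize. The classify function and the sample ages are hypothetical, and the labels are assumed to be either 'low-high' or 'low or above':

```python
import numpy as np

age_intervals = ["16-24", "25-34", "35-44", "45-54", "55 or above"]

def classify(age, intervals):
    """Hypothetical classifier for 'low-high' and 'low or above' labels."""
    for label in intervals:
        if "-" in label:
            low, high = (int(part) for part in label.split("-"))
            if low <= age <= high:
                return label
        elif age >= int(label.split()[0]):
            return label
    return None

# excluded keeps `intervals` out of the broadcasting; otypes=[object] avoids
# inferring a fixed-width string dtype from the first result.
vectorized = np.vectorize(classify, excluded={"intervals"}, otypes=[object])
labels = vectorized(np.array([20, 40, 70]), intervals=age_intervals)
```

Note that np.vectorize is essentially a convenience loop, so it is not expected to be faster than Series.apply.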

【Question comments】:

    Tags: python-3.x date datetime


    【Solution 1】:
    import pandas as pd
    from datetime import datetime, date

    def calculate_age(born):
        days_in_year = 365.2425
        # Accept raw strings as well as parsed Timestamp/datetime values
        if isinstance(born, str):
            born = datetime.strptime(born, '%Y-%m-%d %H:%M:%S')
        return int((date.today() - born.date()).days / days_in_year)

    # Unparseable birthdates become NaT and are skipped by dropna()
    df['birthdate'] = pd.to_datetime(df['birthdate'], errors='coerce')
    df["age"] = df["birthdate"].dropna().apply(calculate_age)
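
The code above yields a numeric age but stops short of the category. One possible way to finish the mapping is pd.cut, sketched here under the assumption that the five labels from the question cover everyone aged 16 and over. The sample ages and the age_category column name are hypothetical:

```python
import pandas as pd

# Hypothetical ages standing in for the df["age"] column computed above.
df = pd.DataFrame({"age": [17, 24, 25, 54, 55, 104]})

# Bin edges are right-inclusive by default: (15, 24], (24, 34], ..., (54, inf]
bins = [15, 24, 34, 44, 54, float("inf")]
labels = ["16-24", "25-34", "35-44", "45-54", "55 or above"]

# Ages outside every bin (15 or younger, or missing) become NaN.
df["age_category"] = pd.cut(df["age"], bins=bins, labels=labels)
```

The result is a categorical column; calling .astype(str) on it gives plain strings if it needs to be compared with the object-dtype S02Q02_Age_rec column.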
    

    【Comments】:
