【Question title】: Categorize data from sentence column
【Posted】: 2021-11-29 16:35:14
【Question description】:

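I am trying to add a column of one-word categories by analyzing a column that contains one sentence per row.

I tried the following code, but it keeps raising errors!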

def loan_cat(row):
    rows = df[df.columns[0]].count()
    for i in rows: 
        data = df['purpose'][i]
        if 'house' in data:
            return 'house'
        elif 'education' | 'university' in data:
            return 'education'
        elif 'wedding' in data:
            return 'wedding'
        elif 'car' in data:
            return 'car'
        elif 'real' in data:
            return 'real estate'
        elif 'property'in data:
            return 'property'
        return 'undefined'
    
df['purpose_1'] = df.apply(loan_cat, axis=1)

Is there a better way to analyze and categorize the data?

【Discussion】:

    Tags: python-3.x pandas dataframe analysis


    【Solution 1】:

    Use a dictionary that maps each keyword to its category:

    import pandas
    
    data = pandas.Series(["purchase a house",
                          "purchase car",
                          "supplemental education",
                          "burger",
                          "attend university"])
    
    arr = {"house": "house",
           "education": "education",
           "university": "education",
           "car": "car"}
    
    
    def foo(s, d):
        for k, v in d.items():
            if k in s:
                return v
        return "NA"
    
    
    data.apply(lambda x: foo(x, arr))
    # 0        house
    # 1          car
    # 2    education
    # 3           NA
    # 4    education
    # dtype: object
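
The same keyword dictionary can also be applied without calling a Python function per row. The following is a minimal sketch, assuming pandas string methods are acceptable here; the names `keyword_to_category`, `pattern`, `matched`, and `categories` are illustrative and not part of the original answer.

```python
import re

import pandas as pd

data = pd.Series(["purchase a house",
                  "purchase car",
                  "supplemental education",
                  "burger",
                  "attend university"])

# Illustrative name; same mapping as the dictionary in the answer.
keyword_to_category = {"house": "house",
                       "education": "education",
                       "university": "education",
                       "car": "car"}

# Build one regex with a single capture group: (house|education|university|car)
pattern = "(" + "|".join(re.escape(k) for k in keyword_to_category) + ")"

# Extract the matched keyword per row (NaN when nothing matches),
# translate it to its category, and fill the misses with "NA".
matched = data.str.extract(pattern, expand=False)
categories = matched.map(keyword_to_category).fillna("NA")
print(categories.tolist())
```

One difference from the loop above: the regex returns the keyword that appears leftmost in the sentence, whereas the loop returns the first dictionary key that is found anywhere in it. The two agree on this sample data but can differ when a sentence contains several keywords.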
    

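    【Discussion】:

    • That could work, but I want to analyze the text in each row more thoroughly and then return the category as the result
    【Solution 2】: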
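
One way to analyze the text a little more strictly is to match whole words instead of substrings, since a plain `k in s` check also finds "car" inside "care". This is a sketch under that assumption only; `RULES`, `COMPILED`, and `categorize` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical keyword-to-category rules; the first matching rule wins.
RULES = [
    ("house", "house"),
    ("education", "education"),
    ("university", "education"),
    ("car", "car"),
]

# Pre-compile one whole-word, case-insensitive pattern per keyword.
COMPILED = [
    (re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE), category)
    for keyword, category in RULES
]


def categorize(sentence, default="NA"):
    """Return the category of the first rule whose keyword occurs as a whole word."""
    for pattern, category in COMPILED:
        if pattern.search(sentence):
            return category
    return default


samples = ["purchase a house", "purchase car", "child care costs", "attend University"]
result = [categorize(s) for s in samples]
print(result)
```

The trade-off is that whole-word matching no longer catches inflected forms such as "houses" or "housing"; those would need their own rules or a stemming step.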

    I figured out the answer:

    def loan_cat(value):
        """Map a free-text loan purpose to a one-word category."""
        # Stems such as 'hous' and 'educ' also match 'housing' and 'educated'.
        if 'hous' in value:
            return 'House'
        elif 'educ' in value or 'university' in value:
            return 'Education'
        elif 'wedding' in value:
            return 'Wedding'
        elif 'car' in value:
            return 'Car'
        elif 'real' in value:
            return 'Real Estate'
        elif 'property' in value:
            return 'Property'
        return 'undefined'

    # Apply the function to each value of the 'purpose' column, not to whole rows.
    df['purpose_cat'] = df['purpose'].apply(loan_cat)
    print(df['purpose_cat'].value_counts())
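
For readers wondering why the code in the question failed, two things stand out. First, `for i in rows` tries to iterate over the integer returned by `count()`, and with `axis=1` the function already receives a single row, so no loop is needed. Second, `'education' | 'university' in data` does not mean "either word is in data". The snippet below is a small sketch of that second point in plain Python; the sample string and variable names are illustrative.

```python
# `|` binds tighter than `in`, so the original condition is parsed as
# ('education' | 'university') in data, and `|` is not defined for two strings.
data = "to get a university degree"

try:
    'education' | 'university' in data
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Either spell out both membership tests or use any() over the keywords.
explicit = 'education' in data or 'university' in data
with_any = any(keyword in data for keyword in ('education', 'university'))

print(error_name, explicit, with_any)
```

Applying the function to `df['purpose']` directly, as this answer does, sidesteps the loop problem because each call receives one string.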
    

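    【Discussion】:

    • As it is currently written, your answer is unclear. Please edit it to add details that help others understand how it addresses the question asked. You can find more information on how to write good answers in the help center.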