PANDAS new column using regular expression map to match on a dict with list values: answers

【Question title】: PANDAS new column using regular expression map to match on a dict with list values
【Posted】: 2017-07-29 05:04:44
【Question description】:

Suppose you have a list of dicts, where each dict maps a single key to a list of values, like this:

foods = [ 
    {'apples' : ['sweet', 'round', 'red'] }, 
    {'liver' : ['juicy', 'flat', 'nasty'] }, 
    {'chocolate': ['tasty', 'block', 'dark' ] } 
]

Now imagine a simple dataframe with a column of names, like this:

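import pandas

menu = [
    {'Name' : ['Sweet caramel bananas', 'Juicy farm salad', 'Hog face dark ice-cream destruction'] },
    {'Price' : [20, 15, 32] }
]
# Merge the single-key dicts so that each key becomes one column
yum_yums = pandas.DataFrame({key: values for item in menu for key, values in item.items()})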

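Suppose you want to assign a food category to each menu item. For example, because 'Sweet caramel bananas' contains the word 'sweet', which is one of the values listed under 'apples', it should get the 'apples' key as its category.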

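What is the best way to use regular expressions to match the values in the dicts of the first list against the values in the Name column, and to create a new column holding the matching key as the assigned category?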

The end result would look like this:

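menu = [
    {'Name' : ['Sweet caramel bananas', 'Juicy farm salad', 'Hog face dark ice-cream destruction'] },
    {'Price' : [20, 15, 32] },
    {'Category' : ['apples', 'liver', 'chocolate'] }
]
# Merge the single-key dicts so that each key becomes one column
food_w_cat = pandas.DataFrame({key: values for item in menu for key, values in item.items()})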

【Question comments】:

I don't see what you want to do with a regular expression. I don't understand where the problem is at all.

Tags: python regex pandas dictionary dataframe

【Solution 1】:

Instead of using regular expressions, you can simply iterate over the menu names and the food categories, for example:

category = []
#iterate through menu names
for i_name in menu[0]['Name']:

    #transform menu name to lowercase for comparison
    i_name_lower = [i.lower() for i in i_name.split(' ')]

    #enable multiple categories of food per menu
    food_category = []

    #iterate through food categories
    for i_foods in foods:       

        key = list(i_foods.keys())[0]   

        if any([j in i_name_lower for j in i_foods[key]]):
            food_category.append(key)

    category.append(food_category)

menu.append({'category':category})

The output is then:

[{'Name': ['Sweet caramel bananas',
   'Juicy farm salad',
   'Hog face dark ice-cream destruction']},
 {'Price': [20, 15, 32]},
 {'category': [['apples'], ['liver'], ['chocolate']]}]
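
Since the question asked specifically for a regular-expression approach on a DataFrame column, here is a minimal sketch of one way that might look. It is not part of the original answer. It assumes the menu is held in a DataFrame with a Name column, that keywords should match whole words regardless of case, and that only the first keyword found in a name decides the category. The names keyword_to_category, pattern and first_hit are made up for illustration.

```python
import re

import pandas as pd

foods = [
    {'apples': ['sweet', 'round', 'red']},
    {'liver': ['juicy', 'flat', 'nasty']},
    {'chocolate': ['tasty', 'block', 'dark']},
]

yum_yums = pd.DataFrame({
    'Name': ['Sweet caramel bananas', 'Juicy farm salad',
             'Hog face dark ice-cream destruction'],
    'Price': [20, 15, 32],
})

# Invert the list of dicts into a keyword -> category lookup
# (hypothetical helper name, not from the original post).
keyword_to_category = {
    word: category
    for entry in foods
    for category, words in entry.items()
    for word in words
}

# A single alternation of all keywords, matched as whole words.
pattern = r'\b(' + '|'.join(re.escape(w) for w in keyword_to_category) + r')\b'

# Pull the first matching keyword out of each name (NaN if none matches),
# then translate that keyword into its category.
first_hit = yum_yums['Name'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
yum_yums['Category'] = first_hit.str.lower().map(keyword_to_category)

print(yum_yums)
```

Names that contain none of the keywords end up with NaN in the Category column. If a name contains keywords from several categories, this sketch keeps only the one that appears first in the name, whereas the loop above collects all of them.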

【Comments】: