Mapping a dictionary with multiple values to key using regular expressions

【Question title】: Mapping a dictionary with multiple values to key using regular expressions
【Posted】: 2019-04-03 00:02:18
【Question description】:

Example of the location column:

import pandas as pd

file = pd.DataFrame(columns=['location'])
file['location'] = ['India, city3', 'city3', 'city2', 'china']

Example of new_dict (it is a defaultdict):

new_dict = {'India':['India','city1', 'city2', 'city3'],'China':['China','city4','city5']}

Expected output:

India
India
India
China

Sample code:

for x in file['location']:
    for Country,Cities in new_dict.items():
        if re.findall('(?<![a-zA-Z])'+str(Cities).lower()+'(?![a-zA-Z])', str(x).lower()) != None:
            file['COUNTRY'] = Country

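I am currently trying to map cities to countries using a dictionary. I am trying to work some regex into it because the location column will not give exact matches. I get this error: bad character range i-d at position 1408. Please let me know how to fix this.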
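
A likely cause of the error: str(Cities) turns the whole list into its printed form, such as "['india', 'city1', ...]", and the regex engine reads the square brackets as a character class. A hyphen between two characters inside that class (the i-d in the message) is then parsed as a range, which fails when the range runs backwards. Two further points about the loop: re.findall returns a list and never None, so the != None test always passes, and file['COUNTRY'] = Country overwrites the whole column on every pass. Below is a minimal sketch of one way around this, assuming the names should be matched literally and case-insensitively. The helper names build_pattern and find_country are made up for illustration.

```python
import re

new_dict = {
    'India': ['India', 'city1', 'city2', 'city3'],
    'China': ['China', 'city4', 'city5'],
}

def build_pattern(names):
    # Escape each name so characters such as '-' or '[' are matched literally,
    # then join the names into a single alternation.
    alternation = '|'.join(re.escape(name) for name in names)
    # Same "no letter before, no letter after" lookarounds as in the question.
    return re.compile('(?<![a-zA-Z])(?:' + alternation + ')(?![a-zA-Z])',
                      re.IGNORECASE)

# One compiled pattern per country.
patterns = {country: build_pattern(cities) for country, cities in new_dict.items()}

def find_country(location):
    # Return the first country whose pattern matches, or None if none does.
    for country, pattern in patterns.items():
        if pattern.search(str(location)):
            return country
    return None

locations = ['India, city3', 'city3', 'city2', 'china']
countries = [find_country(loc) for loc in locations]
```

With the DataFrame from the question, something like file['COUNTRY'] = file['location'].apply(find_country) should then fill the column row by row instead of assigning one value to every row.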

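【Comments】:

You are asking more than one question: 1) how to map cities to countries using a dictionary, and 2) why you get the error "bad character range i-d at position 1408".
I can map cities to countries, but only on exact matches. It does not pick up anything else; for example, city1,India is not picked up. Only exact matches such as city2 or India work.

Tags: python regex python-3.x pandas dictionary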

【Solution 1】:

First, you need to use ChainMap to flatten your new_dict:

from collections import ChainMap
d = dict(ChainMap(*map(dict.fromkeys,new_dict.values() , new_dict.keys())))
d
Out[49]: 
{'China': 'China',
 'India': 'India',
 'city1': 'India',
 'city2': 'India',
 'city3': 'India',
 'city4': 'China',
 'city5': 'China'}
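
For readers who find the ChainMap one-liner dense, the same flattened dictionary can also be built with a plain dict comprehension. This is an equivalent sketch, not the answerer's code. It assumes no name is listed under two countries; if one were, the comprehension would keep the last country seen, whereas the ChainMap version keeps the first.

```python
new_dict = {
    'India': ['India', 'city1', 'city2', 'city3'],
    'China': ['China', 'city4', 'city5'],
}

# Invert the mapping: every name in a country's list points back to that country.
d = {name: country
     for country, names in new_dict.items()
     for name in names}
```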

Then we use replace and split to produce the result:

file.replace(d, regex=True).location.str.split(',').str[0]
Out[53]: 
0    India
1    India
2    India
3    china
Name: location, dtype: object

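【Comments】:

Hi, sorry, I did not make this clear. The code above works. However, it does not work for all rows, because not all rows are in the same format, e.g. 'shanghai,china', 'china,shanghai', 'shanghai, china, building three', etc.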
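
One possible way to handle rows like these is to stop relying on the position of the name and instead pull out the first known name wherever it appears, then look it up. The sketch below assumes the names in new_dict are the only ones that matter (shanghai is not in the dictionary, so the rows match on china) and that matching should ignore case. The lookup, pattern and df names are illustrative.

```python
import re
import pandas as pd

new_dict = {
    'India': ['India', 'city1', 'city2', 'city3'],
    'China': ['China', 'city4', 'city5'],
}

# Lowercased lookup: every name (country or city) points to its country.
lookup = {name.lower(): country
          for country, names in new_dict.items()
          for name in names}

# One alternation of all names, longest first so that a longer name is
# tried before any shorter name it starts with.
names = sorted(lookup, key=len, reverse=True)
pattern = (r'(?<![a-zA-Z])('
           + '|'.join(map(re.escape, names))
           + r')(?![a-zA-Z])')

df = pd.DataFrame({'location': ['shanghai,china',
                                'china,shanghai',
                                'shanghai, china, building three',
                                'India, city3']})

# Extract the first known name in each row, wherever it appears,
# then map its lowercased form to the country.
found = df['location'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
df['COUNTRY'] = found.str.lower().map(lookup)
```

Rows that contain none of the known names end up as NaN in COUNTRY, which makes them easy to find and review afterwards.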