【Question Title】: How to add a specific word to a new column when it is a value in a list within a column
【Posted】: 2020-09-19 11:20:23
【Question】:

Suppose my dataset is:

name what
A    apple[red]
B    cucumber[green]
C    dog
C    orange
D    banana
D    monkey
E    cat
F    carrot
.
.

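I want to define some lists, and if a value in the column contains one of the values in a list, I want to put the name of that list in a new column.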

List values:

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']

The result I want:

name what     class
A    apple    fruit
B    cucumber vegetable
C    dog      animal
C    orange   fruit
D    banana   fruit
D    monkey   animal
E    cat      animal
F    carrot   vegetable

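The list values and the column values do not match exactly; the column value only has to contain a list value.

Thanks for reading.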

【Question Comments】:

  • What have you tried so far?
  • @Anwarvic df1 = df['column name'].str.contains("|".join(listname)), but I can't specify multiple lists this way, and it doesn't tell me which list matched.
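The attempt in the comment above only yields one boolean mask. One way to extend it to several lists, sketched here with numpy.select (a swapped-in technique, not the approach used in the answer), is to build one str.contains mask per list and let np.select return the name of the first list that matches. The sample DataFrame and the "unknown" default label are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (some values carry a "[...]" suffix)
df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

fruit = ["apple", "banana", "orange"]
animal = ["dog", "monkey", "cat"]
vegetable = ["cucumber", "carrot"]
lists = {"fruit": fruit, "animal": animal, "vegetable": vegetable}

# One boolean mask per list, built the same way as the attempt in the comment
conditions = [df["what"].str.contains("|".join(v)) for v in lists.values()]
labels = list(lists.keys())

# np.select takes the label of the first True mask; rows matching nothing get the
# assumed default label "unknown"
df["class"] = np.select(conditions, labels, default="unknown")
print(df)
```

Because np.select checks conditions in order, a value that contains items from two lists gets the label of whichever list comes first in lists.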

Tags: python pandas dataframe contains


【Answer 1】:

Use Series.map with a dictionary built from the lists, swapping the keys with the flattened values:

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']

d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

A loop alternative to the dictionary comprehension:

d1 = {}
for oldk, oldv in d.items():
    for k in oldv:
        d1[k] = oldk
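A quick self-contained check that the dictionary comprehension and the explicit loop build the same inverted mapping (each list item becomes a key, the list name becomes its value). The names d_comp and d_loop are hypothetical, introduced only to compare the two results.

```python
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
d = {'fruit': fruit, 'animal': animal, 'vegetable': vegetable}

# Dictionary comprehension: invert "list name -> items" into "item -> list name"
d_comp = {k: oldk for oldk, oldv in d.items() for k in oldv}

# Equivalent explicit loop
d_loop = {}
for oldk, oldv in d.items():
    for k in oldv:
        d_loop[k] = oldk

print(d_comp == d_loop)   # True
print(d_comp['carrot'])   # vegetable
```

If the same item appears in more than one list, both versions keep the list that comes later in d, because later assignments overwrite earlier keys.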

Then:

df['class'] = df['what'].map(d1)
# if you need only the part before the first [
#df['class'] = df['what'].str.split('[').str[0].map(d1)
print (df)
  name      what      class
0    A     apple      fruit
1    B  cucumber  vegetable
2    C       dog     animal
3    C    orange      fruit
4    D    banana      fruit
5    D    monkey     animal
6    E       cat     animal
7    F    carrot  vegetable
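A self-contained version of the map approach, assuming the bracketed values from the question (apple[red], cucumber[green]). It uses the commented-out str.split('[') variant so the suffix is dropped before the dictionary lookup; values with no [ are passed through unchanged, and values missing from d1 would become NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'C', 'D', 'D', 'E', 'F'],
    'what': ['apple[red]', 'cucumber[green]', 'dog', 'orange',
             'banana', 'monkey', 'cat', 'carrot'],
})

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
d = {'fruit': fruit, 'animal': animal, 'vegetable': vegetable}

# item -> list name
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# Keep only the text before the first "[" so "apple[red]" is looked up as "apple"
df['class'] = df['what'].str.split('[').str[0].map(d1)
print(df)
```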

Edit: for substring matching, loop over the dictionary d, get a boolean mask of the matches with Series.str.contains, and set the new value:

d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}

for k, v in d.items():
    mask = df['what'].str.contains('|'.join(v))
    df.loc[mask, 'class'] = k
print (df)
  name             what      class
0    A       apple[red]      fruit
1    B  cucumber[green]  vegetable
2    C              dog     animal
3    C           orange      fruit
4    D           banana      fruit
5    D           monkey     animal
6    E              cat     animal
7    F           carrot  vegetable
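A runnable version of the substring loop above. The extra row 'stone' is hypothetical, added only to show what happens to a value that matches none of the lists.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'C', 'D', 'D', 'E', 'F', 'G'],
    'what': ['apple[red]', 'cucumber[green]', 'dog', 'orange',
             'banana', 'monkey', 'cat', 'carrot',
             'stone'],  # hypothetical row that matches no list
})

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
d = {'fruit': fruit, 'animal': animal, 'vegetable': vegetable}

for k, v in d.items():
    # True where the value contains any item of the current list as a substring
    mask = df['what'].str.contains('|'.join(v))
    # Creates the 'class' column on the first pass; unmatched rows stay NaN
    df.loc[mask, 'class'] = k

print(df)
```

Rows that match none of the lists keep NaN in class. A row containing items from several lists ends up with the label of the last matching list in d, since each pass overwrites class.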

If the values can contain multiple words, use word boundaries so that only whole words match:

for k, v in d.items():
    pat = '|'.join(r"\b{}\b".format(x) for x in v)
    df.loc[ df['what'].str.contains(pat), 'class'] = k
print (df)
  name             what      class
0    A       apple[red]      fruit
1    B  cucumber[green]  vegetable
2    C              dog     animal
3    C           orange      fruit
4    D           banana      fruit
5    D           monkey     animal
6    E              cat     animal
7    F           carrot  vegetable
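To see why the \b anchors matter, here is a small sketch on hypothetical values (not from the question): without boundaries, 'cat' also matches inside 'scatter' and 'dog' inside 'bulldog'. It additionally wraps each item in re.escape, which the answer does not do, so items containing regex metacharacters are matched literally.

```python
import re
import pandas as pd

# Hypothetical multi-word values
s = pd.Series(['scatter plot', 'hot dog stand', 'bulldog', 'cat'])
animal = ['dog', 'monkey', 'cat']

# Plain alternation: matches the items anywhere, even inside longer words
plain = '|'.join(animal)
# Word-bounded alternation: matches only whole words
bounded = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in animal)

plain_hits = s.str.contains(plain).tolist()
bounded_hits = s.str.contains(bounded).tolist()
print(plain_hits)    # [True, True, True, True]
print(bounded_hits)  # [False, True, False, True]
```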

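【Comments】:

  • I was typing the same answer, but I can't beat the AI that answers pandas questions.
  • @ybin - sure, it is used to iterate over the dictionary d; oldk and oldv mean the original key and the original value.
  • jezrael, I just made a change: the values in what don't exactly match the list values, and there are other values like apple[red] above. Can the list values be matched with "contains" instead of an exact match?
  • All of my actual data consists of multiple words. Sorry for the trouble...
  • @jezrael Oh, I solved it with list = [f"(?i){re.escape(k)}" for k in list], thank you very much!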
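A caveat on the last comment: if the (?i)-prefixed items are joined with '|' as in the answer's loop, the pattern contains inline flags in the middle, e.g. (?i)apple|(?i)banana. Python has deprecated that since 3.6, and from Python 3.11 the re module rejects it with re.error. A sketch that keeps re.escape but requests case-insensitive matching once, through the case=False argument of Series.str.contains; the mixed-case sample values are hypothetical.

```python
import re
import pandas as pd

# Hypothetical mixed-case values
df = pd.DataFrame({'what': ['Apple[red]', 'green CUCUMBER', 'Hot Dog', 'carrot cake']})

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
d = {'fruit': fruit, 'animal': animal, 'vegetable': vegetable}

for k, v in d.items():
    # re.escape makes regex metacharacters in the items match literally;
    # case=False replaces the per-item "(?i)" prefix with one IGNORECASE flag
    pat = '|'.join(re.escape(x) for x in v)
    df.loc[df['what'].str.contains(pat, case=False), 'class'] = k

print(df)
```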