How to get only the words from a string in Python?

【Question title】: How to get only the word from a string in python?
【Posted】: 2021-02-16 13:45:44
【Question description】:

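I'm new to pandas and I have a question about strings. I have a string s = "'hi'+'bikes'-'cars'>=20+'rangers'" and I want only the words from the string, not the symbols or integers. How can I do that?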

My input:

s = "'hi'+'bikes'-'cars'>=20+'rangers'"

Expected output:

s = "'hi','bikes','cars','rangers'"

【Question comments】:

Here's a hint: use Python's regex library (the re module).

Tags: python string split integer word

【Solution 1】:

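Try this, using a regular expression:

import re

s = "'hi'+'bikes'-'cars'>=20+'rangers'"
# [a-zA-Z], not [a-zA-z]: the range A-z also matches [ \ ] ^ _ and `
samp = re.compile('[a-zA-Z]+')
word = samp.findall(s)  # ['hi', 'bikes', 'cars', 'rangers']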
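
To get from the list of matches to the quoted, comma-separated format the question asks for, the words can be re-quoted and joined. The following is a minimal sketch rather than part of the original answer; the names `WORD_RE` and `extract_words` are hypothetical and introduced only for illustration.

```python
import re

# Hypothetical helper (not from the original answer): runs of ASCII letters only.
WORD_RE = re.compile(r"[A-Za-z]+")

def extract_words(text):
    """Return every run of ASCII letters in text, in order of appearance."""
    return WORD_RE.findall(text)

s = "'hi'+'bikes'-'cars'>=20+'rangers'"
words = extract_words(s)
print(words)  # ['hi', 'bikes', 'cars', 'rangers']

# Re-quote each word and join with commas to match the requested format.
print(",".join(f"'{w}'" for w in words))  # 'hi','bikes','cars','rangers'
```

Because the pattern only accepts ASCII letters, digits such as 20 and every symbol are dropped without any extra filtering.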

【Comments】:

【Solution 2】:

I'm not sure about pandas, but you can also use a regular expression. Here is a solution:

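import re


s = "'hi'+'bikes'-'cars'>=20+'rangers'"
# Non-greedy match of each single-quoted token, quotes included.
words = re.findall(r"'.+?'", s)
# Join the quoted tokens with commas.
output = ','.join(words)

print(output)  # 'hi','bikes','cars','rangers'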
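
If the surrounding quotes are not wanted, one possible variation (an assumption about what a reader might need, not something the original answer states) is to place a capture group inside the quotes, so that `re.findall` returns only the captured text.

```python
import re

s = "'hi'+'bikes'-'cars'>=20+'rangers'"

# The group sits inside the quotes, so findall returns only the captured text.
bare_words = re.findall(r"'([^']+)'", s)
print(bare_words)  # ['hi', 'bikes', 'cars', 'rangers']

# Without a group, findall returns the whole match, quotes included.
quoted_words = re.findall(r"'[^']+'", s)
print(",".join(quoted_words))  # 'hi','bikes','cars','rangers'
```

Note that this pattern keys on the quotes rather than on letters, so it would also return quoted digits or symbols if the input contained any.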

【Comments】:

【Solution 3】:

For pandas, I would first convert the column in the dataframe to the string dtype:

df
                                   a  b
0  'hi'+'bikes'-'cars'>=20+'rangers'  1
1      random_string 'with'+random,#  4
2             more,weird/stuff=wrong  6

df["a"] = df["a"].astype("string")

df["a"]
0    'hi'+'bikes'-'cars'>=20+'rangers'
1        random_string 'with'+random,#
2               more,weird/stuff=wrong
Name: a, dtype: string

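Now you can see that the dtype is string, which means you can apply string operations to it, including translate and split (see the pandas string methods). First, though, you have to build a translation table from the punctuation and digits constants imported from the string module (see the string docs):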

from string import digits, punctuation

Then make a dictionary that maps every digit and punctuation character to a space:

from itertools import chain
t = {k: " " for k in chain(punctuation, digits)}
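
To make the mapping concrete, the short sketch below prints the two constants from the standard `string` module and inspects the resulting dictionary. It only illustrates what the lines above build and adds nothing to the method itself.

```python
from itertools import chain
from string import digits, punctuation

print(digits)       # 0123456789
print(punctuation)  # the 32 ASCII punctuation characters

# Same dictionary as above: every punctuation character and digit maps to a space.
t = {k: " " for k in chain(punctuation, digits)}
print(len(t))                   # 42
print(t["+"] == " ", "a" in t)  # True False
```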

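Use str.maketrans to create the translation table (no import is needed in Python 3, where it is a static method of str; in Python 2 the equivalent was string.maketrans), then apply translate and split to the column, each through the "str" accessor: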

t = str.maketrans(t)

df["a"] = df["a"].str.translate(t).str.split()
df
                                a  b
0      [hi, bikes, cars, rangers]  1
1  [random, string, with, random]  4
2     [more, weird, stuff, wrong]  6

As you can see, you now have only the words.
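
The same translate-and-split idea can be tried on a single Python string without pandas. This is a minimal sketch under that assumption; the names `strip_chars`, `table` and `words_only` are hypothetical and do not appear in the answer.

```python
from string import digits, punctuation

# Map every punctuation character and digit to a space.
# The two-argument form of str.maketrans pairs characters by position.
strip_chars = punctuation + digits
table = str.maketrans(strip_chars, " " * len(strip_chars))

def words_only(text):
    """Hypothetical helper: replace symbols and digits with spaces, then split."""
    return text.translate(table).split()

print(words_only("'hi'+'bikes'-'cars'>=20+'rangers'"))
# ['hi', 'bikes', 'cars', 'rangers']
print(words_only("random_string 'with'+random,#"))
# ['random', 'string', 'with', 'random']
```

Replacing the unwanted characters with spaces, rather than deleting them, is what keeps adjacent words such as random_string from being glued together before the split.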

【Comments】: