Python: Detect word from string and also find its location

【Question title】: Python: Detect word from string and also find its location
【Posted】: 2021-08-30 20:00:53
【Question】:

I'm new to Python and want to make a simple program that prints your name, including any preposition, in James Bond style.

So if the name contains any preposition such as "Van", "Von", "De", or "Di", I want the program to print it as:

{Preposition} {LastName}, {FirstName} {Preposition} {LastName} *edited

For this, I understand we need the user's name and a list of prepositions.

a = [user input separated with the .split function]
b = [list of prepositions]

To check whether the name contains a preposition, I found that the code below can be used:

if any(x in a for x in b):

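However, I ran into a problem when trying to print the name, because the preposition could be any of those above (in list b). Without knowing which one it is and where it sits in the string, I can't find a way to print it. At first I thought I could use the .index function, but it seems to search for only one word, not several as needed here. The closest I could get is: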

name_split.index('preposition1') # works
name_split.index('preposition1', 'preposition2', etc.) # does not work

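So what I'm asking is whether there is a way to check if any word from list (b) is used in the input text, and to get the position of that word.

I hope I've explained this properly and that someone can offer some help. Thanks in advance.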

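【Comments】:

You can iterate over a and check whether each item is in b, and print or store the name if it is: for name in a: if name in b: print(name) ... This way you print or store on each iteration, without having to work out where each preposition occurs.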
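
The comment's suggestion can be extended to report the position as well, which is what the question asks for. The following is only a minimal sketch: the helper name find_preposition and the case-insensitive comparison are assumptions made for illustration, not part of the original comment.

```python
def find_preposition(words, prepositions):
    """Return (position, word) for the first word found in prepositions, else None."""
    for position, word in enumerate(words):
        # Compare in lowercase so "Van" and "van" are treated alike.
        if word.lower() in prepositions:
            return position, word
    return None


b = {"van", "von", "de", "di"}    # known prepositions, stored in lowercase
a = "Vincent van Gogh".split()    # ['Vincent', 'van', 'Gogh']

print(find_preposition(a, b))                     # (1, 'van')
print(find_preposition("James Bond".split(), b))  # None
```

Because enumerate yields the position together with each word, there is no need for a separate .index call per preposition.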

Tags: python indexing split

【Solution 1】:

I can't think of a better way than using a for loop:

pattern = "{1} {2}, {0} {1} {2}"
prepositions = ['van', 'von', 'de', 'di']

# (optional) 'lower' so that we don't have to consider cases like 'vAn'
name = "Vincent van Gogh".lower()
index = -1  # by default, we believe that we did not find anything
for preposition in prepositions:
    # 'find' is the same as 'index', but returns -1 if the substring is not found
    index = name.find(preposition)
    if index != -1:
        break  # found an entry

if index == -1:
    print("Not found")
else:
    print("The index is", index,
          "and the preposition is", preposition)
    print(pattern.format(*name.split()))

Output:

The index is 8 and the preposition is van
van gogh, vincent van gogh
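
One caveat with str.find is that it matches substrings, so a preposition such as "de" is also found inside a name like "Anderson". A possible way around this, sketched below under the assumption that prepositions should only count as whole words, is a regular expression with word boundaries. The helper name locate is hypothetical.

```python
import re

prepositions = ["van", "von", "de", "di"]

# \b anchors the match to word boundaries, so "de" inside "Anderson" is ignored.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, prepositions)) + r")\b",
    flags=re.IGNORECASE,
)


def locate(name):
    """Return (character index, matched preposition), or None without a whole-word match."""
    match = pattern.search(name)
    return (match.start(), match.group(0)) if match else None


print("pamela anderson".find("de"))  # 9: a substring hit inside "anderson"
print(locate("Pamela Anderson"))     # None
print(locate("Vincent van Gogh"))    # (8, 'van')
```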

If you want to loop over a list of names, you can do this:

pattern = ...
prepositions = ...
names = ...

for name in names:
    name = name.lower()
    ... # the rest is the same

New version with a second kind of preposition ("Jr.", "Sr."):

def check_prepositions(name, prepositions):
    index = -1

    for preposition in prepositions:
        index = name.find(preposition)
        if index != -1:
            break  # found an entry

    return index, preposition


patterns = [
    "{1} {2}, {0} {1} {2}",
    "{1}, {0} {1} {2}"
]

all_prepositions = [
    ['van', 'von', 'de', 'di'],
    ["Jr.", "Sr."]
]

names = ["Vincent van Gogh", "Robert Downey Jr.", "Steve"]

for name in names:
    for pattern, prepositions in zip(patterns, all_prepositions):
        index, preposition = check_prepositions(name, prepositions)

        if index != -1:
            print("The index is", index,
                  "and the preposition is", preposition)
            print(pattern.format(*name.split()))
            break

    if index == -1:
        print("Not found, name:", name)

Output:

The index is 8 and the preposition is van
van Gogh, Vincent van Gogh
The index is 14 and the preposition is Jr.
Downey, Robert Downey Jr.
Not found, name: Steve

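【Comments】:

@Daniel If this answer helped you or you liked it, please don't forget to vote up and mark the answer as a solution
@Daniel I have updated my answer based on your question
@Daniel I added a second kind of preposition based on your comment
I hope you, @Daniel, can accept my answer.

【Solution 2】:

Why does it matter which preposition you find in the name? You don't print it anywhere; all you really care about is the last name and the rest of the name. Instead of looking for prepositions, you can simply use rsplit() to split from the right, with a maxsplit of 1. For example: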

>>> "Vincent van Gogh".rsplit(" ", 1)
['Vincent van', 'Gogh']

>>> "James Bond".rsplit(" ", 1)
['James', 'Bond']

Then you can simply print the values however you see fit.

fname, lname = input_name.rsplit(" ", 1)
print(f"{lname}, {fname} {lname}")

With input_name = "Vincent van Gogh", this prints Gogh, Vincent van Gogh. With input_name = "James Bond", you get Bond, James Bond.

This has the added benefit that it also works if people enter a middle name or initial.

>>> fname, lname = "Samuel L. Jackson".rsplit(" ", 1)
>>> print(f"{lname}, {fname} {lname}")
Jackson, Samuel L. Jackson

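Note that there are many oddities in how people write names, so it's worth reading Falsehoods Programmers Believe About Names.

【Comments】:

Apologies. What I want to print is {Preposition} {LastName}, {FirstName} {Preposition} {LastName}, which is now edited into the original question. In other words, the printout would be "van Gogh, Vincent van Gogh". I also intend to use the answers I get here as inspiration to further improve the code to handle "Jr.", "Sr.", or any Roman numeral in the name, possibly like this: {LastName}, {FirstName} {LastName} {Jr.}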
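
The format described in this comment could be combined with the split-from-the-right idea roughly as follows. This is a sketch under stated assumptions, not code from the thread: the function name bond_style and the two lookup sets are hypothetical, the surname is assumed to start at the first preposition after the first word, and Roman numerals are not handled.

```python
PREPOSITIONS = {"van", "von", "de", "di"}
SUFFIXES = {"jr.", "sr."}


def bond_style(full_name):
    """Format a name as '<surname>, <full name>', keeping a preposition with the surname."""
    words = full_name.split()

    # A trailing suffix such as "Jr." is not treated as part of the surname.
    core = words[:-1] if words and words[-1].lower() in SUFFIXES else words
    if len(core) < 2:
        raise ValueError("First and last name required.")

    # By default the surname is the last word; if a preposition appears
    # after the first word, the surname starts there instead.
    start = len(core) - 1
    for position in range(1, len(core) - 1):
        if core[position].lower() in PREPOSITIONS:
            start = position
            break

    surname = " ".join(core[start:])
    return f"{surname}, {' '.join(words)}"


print(bond_style("Vincent van Gogh"))   # van Gogh, Vincent van Gogh
print(bond_style("Robert Downey Jr."))  # Downey, Robert Downey Jr.
print(bond_style("James Bond"))         # Bond, James Bond
```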

【Solution 3】:

A different approach, using regular expressions (I know).

import re

def process_input(string: str) -> str:
    string = string.strip()
    # Preset some values.
    ln, fn, prep = "", "", ""

    # if the string is blank, return it
    # Otherwise, continue.
    if len(string) > 0:

        # Search for possible delimiter.
        res = re.search(r"([^a-z0-9-'\. ]+)", string, flags = re.I)

        # If delimiter found...
        if res:
            delim = res.group(0)

            # Split names by delimiter and strip whitespace.
            # re.escape keeps delimiters such as "|" from being read as regex syntax.
            ln, fn, *err = [s.strip() for s in re.split(re.escape(delim), string)]
     
        else:
            # Split on whitespace
            names = [s.strip() for s in re.split(r"\s+", string)]

            # If first, preposition, last exist or first and last exist.
            # update variables.
            # Otherwise, raise ValueError.
            if len(names) == 3:
                fn, prep, ln = names
            elif len(names) == 2:
                fn, ln = names
            else:
                raise ValueError("First and last name required.")

        # Check for whitespace in last name variable.
        ws_res = re.search(r"\s+", ln)
        if ws_res:
            # Split last name if found.
            prep, ln, *err = re.split(r"\s+", ln)
        
        # Create array of known names.
        output = [f"{ln},", fn, ln]

        # Insert prep if it contains a value
        # This is simply a formatting thing.
        if len(prep) > 0:
            output.insert(2, prep)

        # Output formatted string.
        return " ".join(output)

    return string


if __name__ == "__main__":
    # Loop until q is entered or the maximum number of runs is reached.
    re_run = True
    max_runs = 10

    while re_run and max_runs > 0:
        print("Please enter your full name\nor press [q] to exit:")
        user_input = input()
        if user_input:
            if user_input.lower().strip() == "q":
                re_run = False
                break

            result = process_input(user_input)
            print("\n" + result + "\n\n")
            max_runs -= 1

【Comments】: