【Question title】:Python: Detect word from string and also find its location
【Posted】:2021-08-30 20:00:53
【Question】:

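I'm new to Python and want to write a simple program that prints your name, including any preposition, James Bond style.

So if the name contains a preposition such as "Van", "Von", "De", or "Di", I want the program to print it as: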

{Preposition} {LastName}, {FirstName} {Preposition} {LastName} *edited

For this, I know we need the user's name and a list of prepositions.

a = [user input separated with the .split function]
b = [list of prepositions]

To detect whether the name contains a preposition, I found that the following code works:

if any(x in a for x in b):

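However, I ran into a problem when trying to print the name, because the preposition could be any of those in list b. Without knowing which one it is and where it sits in the string, I can't find a way to print the name. At first I thought I could use the .index function, but it seems to search for only one word at a time rather than the several I need here. The closest I got is: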

name_split.index('preposition1') # works
name_split.index('preposition1', 'preposition2', etc.) # does not work

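So what I'm asking is whether there is a way to check if any word from list (b) is used in the input text, and to get the position of that word.

I hope I explained it clearly and that someone can offer some help. Thanks in advance.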

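【Comments】:

  • You can iterate over a and check whether each item is in b, and if so print or store the name: for name in a: if name in b: print(name) ... This way you print or store on each iteration, without having to work out separately where each preposition occurs.

Tags: python indexing split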
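The loop suggested in this comment could be sketched as follows. It is only an illustration: the function name `find_particles` and its parameters are assumptions, not part of the original thread, and the comparison is done on whole words, ignoring case. Using `enumerate` yields the position of each hit, which is what `.index` could not do for several words at once.

```python
def find_particles(words, particles):
    """Return (position, word) pairs for every word that is a known particle."""
    # Lowercase both sides so that "Van" and "van" compare equal.
    known = {p.lower() for p in particles}
    return [(i, w) for i, w in enumerate(words) if w.lower() in known]


a = "Vincent van Gogh".split()
b = ["Van", "Von", "De", "Di"]
print(find_particles(a, b))  # [(1, 'van')]
```

An empty list means no word from b occurs in a, so the same call also replaces the `any(...)` check.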


【Solution 1】:

I can't think of a better way than using a for loop:

pattern = "{1} {2}, {0} {1} {2}"
prepositions = ['van', 'von', 'de', 'di']

# (optional) 'lower' so that we don't have to consider cases like 'vAn'
name = "Vincent van Gogh".lower()
index = -1  # by default, we believe that we did not find anything
for preposition in prepositions:
    # 'find' is the same as 'index', but returns -1 if the substring is not found
    index = name.find(preposition)
    if index != -1:
        break  # found an entry

if index == -1:
    print("Not found")
else:
    print("The index is", index,
          "and the preposition is", preposition)
    print(pattern.format(*name.split()))

Output:

The index is 8 and the preposition is van
van gogh, vincent van gogh

If you want to loop over a list of names, you can do it like this:

pattern = ...
prepositions = ...
names = ...

for name in names:
    name = name.lower()
    ... # the rest is the same
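One caveat with calling `find` on the whole string: it matches substrings, so a name such as "Devon Smith" reports "de" at index 0. A possible variant, assuming the prepositions should only match as whole words, uses a regular expression with `\b` word boundaries and still returns a character index. The helper name `find_whole_word` is hypothetical.

```python
import re


def find_whole_word(name, prepositions):
    """Return (char_index, matched_text) for the leftmost whole-word hit, or (-1, None)."""
    # Escape each preposition so it is treated as literal text in the pattern.
    alternatives = "|".join(re.escape(p) for p in prepositions)
    match = re.search(rf"\b(?:{alternatives})\b", name, flags=re.IGNORECASE)
    if match is None:
        return -1, None
    return match.start(), match.group()


prepositions = ["van", "von", "de", "di"]
print(find_whole_word("Vincent van Gogh", prepositions))  # (8, 'van')
print(find_whole_word("Devon Smith", prepositions))       # (-1, None)
print("devon smith".find("de"))                           # 0 (substring match)
```

Note that this returns the leftmost match in the name rather than the first entry of the list that matches, and `\b` is only suitable for entries that start and end with a letter or digit.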

A new version with a second kind of preposition ("Jr.", "Sr."):

def check_prepositions(name, prepositions):
    index = -1

    for preposition in prepositions:
        index = name.find(preposition)
        if index != -1:
            break  # found an entry

    return index, preposition


patterns = [
    "{1} {2}, {0} {1} {2}",
    "{1}, {0} {1} {2}"
]

all_prepositions = [
    ['van', 'von', 'de', 'di'],
    ["Jr.", "Sr."]
]

names = ["Vincent van Gogh", "Robert Downey Jr.", "Steve"]

for name in names:
    for pattern, prepositions in zip(patterns, all_prepositions):
        index, preposition = check_prepositions(name, prepositions)

        if index != -1:
            print("The index is", index,
                  "and the preposition is", preposition)
            print(pattern.format(*name.split()))
            break

    if index == -1:
        print("Not found, name:", name)

Output:

The index is 8 and the preposition is van
van Gogh, Vincent van Gogh
The index is 14 and the preposition is Jr.
Downey, Robert Downey Jr.
Not found, name: Steve

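【Comments】:

【Solution 2】:

Why does it matter which preposition you find in the name? You don't print it anywhere; all you really care about is the last name and the rest of the name. Instead of looking for prepositions, you can simply use rsplit() to split from the right and ask for a maxsplit of 1. For example: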

>>> "Vincent van Gogh".rsplit(" ", 1)
['Vincent van', 'Gogh']

>>> "James Bond".rsplit(" ", 1)
['James', 'Bond']

Then you can simply print the values however you see fit.

fname, lname = input_name.rsplit(" ", 1)
print(f"{lname}, {fname} {lname}")

With input_name = "Vincent van Gogh", this prints Gogh, Vincent van Gogh. With input_name = "James Bond", you get Bond, James Bond.

This has the added benefit that it also works if people enter a middle name or initial.

>>> fname, lname = "Samuel L. Jackson".rsplit(" ", 1)
>>> print(f"{lname}, {fname} {lname}")
Jackson, Samuel L. Jackson

Note that there are many oddities in how people write their names, so it's worth taking a look at Falsehoods Programmers Believe About Names.
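If the preposition should stay attached to the surname, the rsplit() idea can be extended with one extra check. This is a minimal sketch rather than a complete name parser: `PARTICLES` and `bond_style` are hypothetical names, the list is the one from the question, and only a single preposition directly before the last word is handled.

```python
PARTICLES = {"van", "von", "de", "di"}  # assumed list, taken from the question


def bond_style(input_name):
    """Format a name as '<surname>, <first names> <surname>'."""
    # Like the original snippet, this raises ValueError for a one-word name.
    first, last = input_name.rsplit(" ", 1)
    # Look at the word just before the surname.
    head, _, maybe_particle = first.rpartition(" ")
    if head and maybe_particle.lower() in PARTICLES:
        # Move the preposition from the first names to the surname.
        first, last = head, f"{maybe_particle} {last}"
    return f"{last}, {first} {last}"


print(bond_style("Vincent van Gogh"))   # van Gogh, Vincent van Gogh
print(bond_style("James Bond"))         # Bond, James Bond
print(bond_style("Samuel L. Jackson"))  # Jackson, Samuel L. Jackson
```

Checking only the word next to the surname keeps the middle-name behaviour of the plain rsplit() version intact.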

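【Comments】:

  • Apologies. What I want to print is {Preposition} {LastName}, {FirstName} {Preposition} {LastName}, which is now edited into the original question. In other words, the output would be "van Gogh, Vincent van Gogh". I also plan to use the answers I get here as inspiration to improve the code further so that it handles "Jr.", "Sr.", or any Roman numerals in the name, which might look like this: {LastName}, {FirstName} {LastName} {Jr.}

【Solution 3】:

A different approach, using regular expressions (I know).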

import re

def process_input(string: str) -> str:
    string = string.strip()
    # Preset some values.
    ln, fn, prep = "", "", ""

    # if the string is blank, return it
    # Otherwise, continue.
    if len(string) > 0:

        # Search for possible delimiter.
        res = re.search(r"([^a-z0-9-'\. ]+)", string, flags = re.I)

        # If delimiter found...
        if res:
            delim = res.group(0)

            # Split names by delimiter and strip whitespace.
            # re.escape keeps delimiters such as "|" from being read as regex syntax.
            ln, fn, *err = [s.strip() for s in re.split(re.escape(delim), string)]
     
        else:
            # Split on whitespace
            names = [s.strip() for s in re.split(r"\s+", string)]

            # If first, preposition, last exist or first and last exist.
            # update variables.
            # Otherwise, raise ValueError.
            if len(names) == 3:
                fn, prep, ln = names
            elif len(names) == 2:
                fn, ln = names
            else:
                raise ValueError("First and last name required.")

        # Check for whitespace in last name variable.
        ws_res = re.search(r"\s+", ln)
        if ws_res:
            # Split last name if found.
            prep, ln, *err = re.split(r"\s+", ln)
        
        # Create array of known names.
        output = [f"{ln},", fn, ln]

        # Insert prep if it contains a value
        # This is simply a formatting thing.
        if len(prep) > 0:
            output.insert(2, prep)

        # Output formatted string.
        return " ".join(output)

    return string


if __name__ == "__main__":
    # Loop until q is entered or the maximum run amount is reached.
    re_run = True
    max_runs = 10

    # "and" (not "or") so that the loop really stops once max_runs hits 0.
    while re_run and max_runs > 0:
        print("Please enter your full name\nor press [q] to exit:")
        user_input = input()
        if user_input:
            if user_input.lower().strip() == "q":
                re_run = False
                break

            result = process_input(user_input)
            print("\n" + result + "\n\n")
            max_runs -= 1
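For comparison, the first name / preposition / last name split for space-separated input could also be captured by a single pattern with named groups. This is a sketch under assumptions, not a replacement for the function above: `PARTICLES`, `NAME_RE`, and `format_name` are hypothetical names, the preposition list comes from the question, and delimiter-separated input such as "Last, First" is not handled.

```python
import re

PARTICLES = ("van", "von", "de", "di")  # assumed list, taken from the question

# first: everything before the surname (lazy, so it stops as early as possible)
# prep:  optional known preposition directly before the last word
# last:  the final whitespace-free word
NAME_RE = re.compile(
    r"^(?P<first>.+?)\s+(?:(?P<prep>" + "|".join(PARTICLES) + r")\s+)?(?P<last>\S+)$",
    flags=re.IGNORECASE,
)


def format_name(full_name):
    """Return '<prep> <last>, <first> <prep> <last>', omitting prep if absent."""
    match = NAME_RE.match(full_name.strip())
    if match is None:
        raise ValueError("First and last name required.")
    # match["prep"] is None when the optional group did not take part.
    surname = " ".join(part for part in (match["prep"], match["last"]) if part)
    return f"{surname}, {match['first']} {surname}"


print(format_name("Vincent van Gogh"))   # van Gogh, Vincent van Gogh
print(format_name("James Bond"))         # Bond, James Bond
print(format_name("Samuel L. Jackson"))  # Jackson, Samuel L. Jackson
```

Unlike the three-word check above, the lazy `first` group lets middle names and initials through, while a one-word name still raises ValueError.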

【Comments】:
