【Question title】: Python - tokenizing, replacing words
【Posted】: 2013-05-16 09:30:01
【Question】:

I'm trying to build sentences that have randomized words dropped into them. To be specific, I'd have something like:

"The weather today is [weather_state]."

and I'd like to find all the tokens in [brackets] and then swap each one for a random counterpart from a dictionary or a list, leaving me with:

"The weather today is warm."
"The weather today is bad."

"The weather today is mildly suiting for my old bones."

Keep in mind that the [bracket] tokens won't always be in the same position, and a string may contain several bracketed tokens, for example:

"[person] is feeling really [how] today, so he's not going [where]."

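I really don't know where to start, or whether the tokenize or token modules are even the right tool for this. Any hints that point me in the right direction are greatly appreciated!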

EDIT: Just to clarify, I don't really need to use square brackets; any non-standard character will do.

【Comments】:

  • Possibly a silly suggestion, but have you looked at string formatting with {}s?

Tags: python dictionary tokenize string-parsing


【Solution 1】:

You are looking for re.sub with a callback function:

import random
import re

# Candidate replacements for each placeholder name.
words = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
    'where': ['away', 'out']
}

def random_str(m):
    # m.group(1) is the placeholder name captured between the brackets.
    return random.choice(words[m.group(1)])


text = "[person] is feeling really [how] today, so he's not going [where]."
print(re.sub(r'\[(.+?)\]', random_str, text))

# Example output (varies between runs):
# me is feeling really stupid today, so he's not going away.

Note that, unlike the format method, this allows more sophisticated processing of placeholders, e.g.

[person:upper] got $[amount if amount else 0] etc

Basically, you can build your own "template engine" on top of this.
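As a rough sketch of that idea, the callback below handles a `[name:modifier]` form such as `[person:upper]`. The `WORDS` and `MODIFIERS` tables, the `render` function, and the choice to leave unknown placeholders untouched are all assumptions made for illustration, not part of the original answer.

```python
import random
import re

# Hypothetical word lists and modifier table, for illustration only.
WORDS = {
    "person": ["you", "me"],
    "how": ["fine", "stupid"],
}
MODIFIERS = {
    "upper": str.upper,
    "title": str.title,
}

# Matches [name] or [name:modifier]; both parts are limited to word characters.
TOKEN = re.compile(r"\[(\w+)(?::(\w+))?\]")


def render(template, words=WORDS):
    def replace(match):
        name, modifier = match.group(1), match.group(2)
        if name not in words:
            # Leave unknown placeholders as they are instead of raising KeyError.
            return match.group(0)
        value = random.choice(words[name])
        # A missing or unrecognized modifier is silently ignored.
        if modifier in MODIFIERS:
            value = MODIFIERS[modifier](value)
        return value

    return TOKEN.sub(replace, template)


print(render("[person:upper] is feeling really [how] today, [unknown]."))
```

Keeping the modifiers in a dictionary means new ones can be added without touching the regular expression.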

【Comments】:

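  • Awesome, I like how clean and efficient this is. It works, and understanding it gives me a real head start as a Python beginner. :) The smart thing would be to write a dictionary file, save it on disk, and load it into the "words" dict here... Any pointers on what the syntax of that file should look like? Thanks a lot!
  • @bitworks: the simplest and most convenient option is json: docs.python.org/2/library/json.html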
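A minimal sketch of the JSON suggestion might look like the following. The JSON text is held in a string here so the example is self-contained; the file name `words.json` mentioned in the comment is a hypothetical choice.

```python
import json
import random
import re

# Hypothetical file contents; on disk this could live in, say, words.json.
WORDS_JSON = """
{
    "person": ["you", "me"],
    "how": ["fine", "stupid"],
    "where": ["away", "out"]
}
"""

# json.loads parses a string. For a real file, use json.load on a file object:
#     with open("words.json") as f:
#         words = json.load(f)
words = json.loads(WORDS_JSON)


def random_str(match):
    return random.choice(words[match.group(1)])


text = "[person] is feeling really [how] today, so he's not going [where]."
print(re.sub(r"\[(.+?)\]", random_str, text))
```

A JSON object whose values are arrays of strings maps directly onto the dict of lists that the callback expects.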
【Solution 2】:

You can use the format method.

>>> a = 'The weather today is {weather_state}.'
>>> a.format(weather_state = 'awesome')
'The weather today is awesome.'
>>>

And also:

>>> b = '{person} is feeling really {how} today, so he\'s not going {where}.'
>>> b.format(person = 'Alegen', how = 'wacky', where = 'to work')
"Alegen is feeling really wacky today, so he's not going to work."
>>>

Of course, this approach only works if you can switch from square brackets to curly braces.
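To get the random substitution the question asks for, the format approach can be combined with random.choice. The sketch below uses string.Formatter().parse to discover which field names a template contains; the `words` dictionary and the `fill_randomly` helper are hypothetical names, and the sketch assumes simple field names (no attribute or index access such as `{a.b}` or `{a[0]}`).

```python
import random
from string import Formatter

# Hypothetical word lists, for illustration only.
words = {
    "person": ["Alegen", "Buster"],
    "how": ["wacky", "sleepy"],
    "where": ["to work", "camping"],
}


def fill_randomly(template, words):
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    # tuples; field_name is None when a chunk has no replacement field.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    # Raises KeyError if the template names a field that is missing from words.
    return template.format(**{name: random.choice(words[name]) for name in fields})


b = "{person} is feeling really {how} today, so he's not going {where}."
print(fill_randomly(b, words))
```

Because the field names are read from the template itself, the same dictionary can serve templates that use only some of its keys.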

【Comments】:

    【Solution 3】:

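    If you use curly braces instead of square brackets, your string can be used as a string formatting template. You can then use itertools.product to fill it with every combination of substitutions: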

    import itertools as IT
    
    text = "{person} is feeling really {how} today, so he's not going {where}."
    persons = ['Buster', 'Arthur']
    hows = ['hungry', 'sleepy']
    wheres = ['camping', 'biking']
    
    for person, how, where in IT.product(persons, hows, wheres):
        print(text.format(person=person, how=how, where=where))
    

    yields

    Buster is feeling really hungry today, so he's not going camping.
    Buster is feeling really hungry today, so he's not going biking.
    Buster is feeling really sleepy today, so he's not going camping.
    Buster is feeling really sleepy today, so he's not going biking.
    Arthur is feeling really hungry today, so he's not going camping.
    Arthur is feeling really hungry today, so he's not going biking.
    Arthur is feeling really sleepy today, so he's not going camping.
    Arthur is feeling really sleepy today, so he's not going biking.
    

    To generate random sentences, you can use random.choice:

    import random

    # Reuses text, persons, hows and wheres from the example above.
    for i in range(5):
        person = random.choice(persons)
        how = random.choice(hows)
        where = random.choice(wheres)
        print(text.format(person=person, how=how, where=where))
    

    If you must use square brackets and your template contains no curly braces, you can replace the brackets with braces and then proceed as above:

    text = "[person] is feeling really [how] today, so he's not going [where]."
    text = text.replace('[','{').replace(']','}')
    

    【Comments】:

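    • This person=person, how=how, where=where business gets really silly if there are hundreds of them.
    • I decided to steer clear of format(**locals()) here because it does not make explicit how the substitutions are made. But if you really do have hundreds of variables, format(**locals()) would be the way to go.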
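One possible middle ground between spelling out every keyword and format(**locals()) is to keep the candidate words in a single dictionary and unpack it. This is only a sketch: the `options` dictionary and the `all_sentences` helper are hypothetical names, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
import itertools

text = "{person} is feeling really {how} today, so he's not going {where}."

# Hypothetical container replacing the separate persons/hows/wheres lists.
options = {
    "person": ["Buster", "Arthur"],
    "how": ["hungry", "sleepy"],
    "where": ["camping", "biking"],
}


def all_sentences(template, options):
    names = list(options)
    # product yields one tuple per combination, in the same order as names.
    for combo in itertools.product(*(options[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


sentences = list(all_sentences(text, options))
for sentence in sentences:
    print(sentence)
```

Adding another placeholder then only requires a new key in `options`; the format call stays the same, and the substitutions remain visible in one place.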