Python: "gibberish sentence generator", misbehaving in a weird manner (answers)

【Question title】: Python: "gibberish sentence generator", misbehaving in a weird manner
【Posted】: 2014-03-26 20:27:20
【Question description】:

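I'm trying to create a simple "gibberish generator" program in Python that prints a random string of gibberish made up of characters, spaces and a punctuation mark at the end (in other words, a full sentence). It mostly works already, but I've run into a strange problem that I can't figure out.

Even though my code explicitly caps words at 11 characters, the last "word" in my gibberish string is always longer than it should be. After going through the code God knows how many times, I still have no idea what might be causing this. Interestingly, it only becomes really noticeable with long strings, while short sentences (up to 50 characters) look mostly fine.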

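Here are two sample outputs I got when running it in Windows PowerShell:

The first, with 50 characters:

How many gibberish characters would you like to print out? 50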

Uxlouasieyt uoygigjas eayouiumza gyfejmu th egkyaulheeb.

The second, with 300 characters:

How many gibberish characters would you like to print out? 300

Yhiaztexj ekkexe iiuiyx itozlyui zao cegyeuyiml aofzyyreet cofi owzycwobla rreyblioca rla tpocnelavj ytpa x eefra gnyoe yfxyhnivme miert ywy ykhi ee gup eui ttuoi oeoyaf uenyecb apluo yli xmy uiyaoneewe jyxymxal y dzaiglu uo eqkiyeiz ke oxayuiayzf yyi iqoezu ekuioyotly viyslaybiiwvymitoeagrejvavihigpyoxawefunodgu!

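Notice how the last word in the sentence grows with the length of the string, while all the other words stay within 11 characters. It's as if the part of the code that inserts spaces into gibberish_list gets ignored at some point. But why?

Here's the full code: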

import random

def gibberishgen():
    alphabet_vowels = ['a','e','i','o','u','y',]
    alphabet_consonants = ['b','c','d','f','g','h','j','k','l','m','n','p','q','r','s','t','v','w','x','z']
    gibberish_list = []
    
    while True:
        gibberishamount = raw_input("How many gibberish characters would you like to print out? ")
        if gibberishamount.isdigit():
            break
        else:
            print "Please give me a number!"
    
    # fill the gibberish_list with characters
    lasttwochars = ['','']
    for char in range(1, int(gibberishamount)+1):
        nextcharvowel = random.choice(alphabet_vowels)
        nextcharconsonant = random.choice(alphabet_consonants)
        if lasttwochars[0] in alphabet_consonants and lasttwochars[1] in alphabet_consonants:   # because I don't want more than 2 consonants in a row
            nextchar = nextcharvowel
        else:
            roll = random.randint(1,10)
            if roll > 5:
                nextchar = nextcharvowel
            else:   
                nextchar = nextcharconsonant
        gibberish_list.append(nextchar)
        lasttwochars.append(nextchar)
        lasttwochars.pop(0)
    
    # insert spaces at randomized intervals to separate the "words" from each other
    last_whitespace = 0
    for index in range(0, len(gibberish_list)+1):
        randspace = random.randint(1,10)
        if index >= last_whitespace + 3 and randspace <= 2:     # make sure words don't get too short on average
            gibberish_list.insert(index, ' ')
            last_whitespace = index
        elif index > last_whitespace + 10:                      # ...or too long
            gibberish_list.insert(index, ' ')
            last_whitespace = index
    
    punctlist = ['.', '!', '?']
    
    gibberishstring = ''.join(gibberish_list)
    finalstring = gibberishstring.capitalize() + random.choice(punctlist)
    print "\n", finalstring, "\n"
    
gibberishgen()

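I'd really appreciate it if someone could explain what's going on here. I've only been learning Python for two months, so yes, it's quite likely I've missed something that should be obvious.

Also feel free to point out any bad syntax or practices you notice.

【Question discussion】:

Tags: python string

【Solution 1】: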

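As you insert spaces into gibberish_list the list keeps getting longer, but the range your loop iterates over is computed once, when the loop starts. The loop therefore stops at the index that corresponded to the last character of the original gibberish_list and never reaches the end of the grown list. Every inserted space pushes one more character past the point where the loop stops, so the more spaces are inserted (i.e. the longer the string), the more noticeable this becomes.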
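
To make this concrete, here is a minimal sketch rather than the original code. `spaces_in_place` reproduces the effect with a deterministic rule standing in for the random roll, and `spaces_rebuilt` shows one possible way around it: build a new string instead of inserting into the list being indexed. Both function names, the parameters `every`, `min_len`, `max_len` and `chance`, and the 20% break probability are assumptions picked to resemble the thresholds in the question.

```python
import random

def spaces_in_place(chars, every=4):
    """Mimic the question's approach: insert into the list while indexing it."""
    chars = list(chars)
    for index in range(len(chars) + 1):      # the range is fixed before any insert
        if index % every == every - 1:       # deterministic stand-in for the random roll
            chars.insert(index, ' ')         # the list grows, the range does not
    return ''.join(chars)

def spaces_rebuilt(chars, min_len=3, max_len=11, chance=0.2, rng=random):
    """Walk the input once and build a new string, so every character is visited."""
    result = []
    word_len = 0                             # letters in the current word so far
    for ch in chars:
        too_long = word_len >= max_len
        may_break = word_len >= min_len and rng.random() < chance
        if too_long or may_break:
            result.append(' ')               # close the current word
            word_len = 0
        result.append(ch)
        word_len += 1
    return ''.join(result)

# The in-place version leaves an overlong tail; the rebuilt version does not.
print(spaces_in_place('a' * 40))
print(spaces_rebuilt('a' * 40))
```

Because `spaces_rebuilt` never modifies the sequence it iterates over, no word can exceed `max_len`, and only the final word can be shorter than `min_len`.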

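【Discussion】:

Yes, that makes sense. I actually can't believe I didn't think of that, now that you've pointed it out. Thanks for clearing everything up.

【Solution 2】:

Here's a slightly expanded version:

It works on both Python 2.x and 3.x, and uses realistic letter and word-length frequencies.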

from itertools import islice
from random import choice, randint
import sys

if sys.hexversion < 0x3000000:
    inp = raw_input
    rng = xrange
else:
    inp = input
    rng = range


LETTERS = (    # relative character frequencies
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabb"
    "bbbbbbbbbbcccccccccccccccccccccdddddddddddddddddddddddddddddddde"
    "eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee"
    "eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeefffffffffffffffffgggggggggggggggg"
    "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhiiiiiiiiiiiiiiiiii"
    "iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiijjkkkkkklllllllllllllllllllll"
    "llllllllllmmmmmmmmmmmmmmmmmmmnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn"
    "nnnnnnnnnnnnnnnnoooooooooooooooooooooooooooooooooooooooooooooooo"
    "ooooooooopppppppppppppppqrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr"
    "rrrrrrsssssssssssssssssssssssssssssssssssssssssssssssstttttttttt"
    "ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttuuuuuu"
    "uuuuuuuuuuuuuuuvvvvvvvvwwwwwwwwwwwwwwwwwwxxxyyyyyyyyyyyyyyyzz"
)

CONSONANTS  = ''.join(ch for ch in LETTERS if ch not in "aeiouy")
VOWELS      = ''.join(ch for ch in LETTERS if ch     in "aeiouy")
PUNCTUATION = "....??!"

is_cons     = set(CONSONANTS).__contains__    # is_cons(x) == x in set(CONSONANTS)

WORDLEN = [     # relative word-length frequencies
    2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,
    2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3,  3,
    3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,
    3,  3,  3,  3,  4,  4,  4,  4,  4,  4,  4,  4,
    4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,
    5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
    5,  5,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,
    7,  7,  7,  7,  7,  7,  7,  7,  8,  8,  8,  8,
    8,  9,  9,  9, 10, 10, 10, 11, 11, 12
]

wordlen = lambda: choice(WORDLEN)

def get_int(prompt):
    while True:
        try:
            return int(inp(prompt))
        except ValueError:
            pass

def gibberish():
    """
    Generate an infinite sequence of random letters,
      allowing no more than two consecutive consonants
    """
    a = choice(LETTERS); yield a
    b = choice(LETTERS); yield b
    while True:
        c = choice(VOWELS if is_cons(a) and is_cons(b) else LETTERS)
        yield c
        a, b = b, c

def take_n(iterable, n):
    return list(islice(iterable, n))

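def add_spaces(iterable, make_word_length):
    iterable = iter(iterable)
    while True:
        for i in rng(make_word_length()):
            try:
                yield next(iterable)
            except StopIteration:
                # input exhausted; since Python 3.7 (PEP 479) a StopIteration
                # escaping a generator is turned into a RuntimeError
                return
        yield ' '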

def gibberish_sentence():
    length   = get_int("How many characters of gibberish would you like? ")
    chars    = take_n(gibberish(), length)              # make that many chars
    chars    = add_spaces(chars, wordlen)               # add spaces to make "words"
    sentence = ''.join(chars).rsplit(' ', 1)[0]         # crop at last space (don't leave a part-word at the end)
    return sentence.capitalize() + choice(PUNCTUATION)  # capitalize and add punctuation

def main():
    print(gibberish_sentence())

if __name__=="__main__":
    main()

Sample output:

How many characters of gibberish would you like? 180
Ahisent anoe tfon evaer an irpenn otjievt ecfiotuee ebaa wtah sav hii lti
ukt erd elrihe dewa st aosdeec zenle acju ld eeaotl entetom wisvos
aeatresl oixb atidb eekermo nteu darso hligseoanei vhaeoedse qyr sogudc.
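
The code above encodes relative frequencies by repeating each entry in LETTERS and WORDLEN, so a plain `choice()` behaves like a weighted draw. On Python 3.6 and later the same idea can be written with explicit weights via `random.choices`. The sketch below is only an alternative formulation, not part of the answer: the dictionary `WORDLEN_WEIGHTS` holds counts tallied from the WORDLEN table, and `weighted_word_length` is a hypothetical helper name. Note that this variant gives up the Python 2.x compatibility of the original.

```python
import random

# Counts tallied from the WORDLEN table: word length -> number of occurrences.
WORDLEN_WEIGHTS = {2: 17, 3: 23, 4: 20, 5: 14, 6: 10, 7: 8,
                   8: 5, 9: 3, 10: 3, 11: 2, 12: 1}

def weighted_word_length(rng=random):
    """Draw one word length with probability proportional to its count."""
    lengths = list(WORDLEN_WEIGHTS)
    weights = list(WORDLEN_WEIGHTS.values())
    # random.choices returns a list of k items; take the single draw
    return rng.choices(lengths, weights=weights, k=1)[0]

print([weighted_word_length() for _ in range(10)])
```

Under that assumption, `weighted_word_length` could be passed wherever the answer passes `wordlen`, since both are zero-argument callables returning an integer.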

【Discussion】: