Here is baby talk that is generated from word pronunciations and written using ARPAbet:
#!/usr/bin/env python3
from nltk.corpus import cmudict  # $ pip install nltk
# >>> import nltk; nltk.download('cmudict')


def baby_talk(word, repeat=3, phone_sep=u'\N{NO-BREAK SPACE}',
              pronunciations=cmudict.dict()):
    """Repeat the first syllable of `word`, written as ARPAbet phones."""
    for phones in pronunciations.get(word.casefold(), []):
        for i, ph in enumerate(phones):
            if ph[-1] in '012':  # found vowel sound (it carries a stress digit)
                return phone_sep.join((phones[:i] + [ph[:-1]]) * repeat)
    return naive_baby_talk(word, repeat, phone_sep)  # no pronunciations


def naive_baby_talk(word, repeat, phone_sep, vowels="aeiouAEIOU"):
    """Fallback: repeat the letters up to and including the first vowel letter."""
    i = None  # stays None for an empty word, so word[:i] is ''
    for i, char in enumerate(word, start=1):
        if char in vowels:
            break  # found vowel
    return phone_sep.join([word[:i]] * repeat)
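The vowel test above relies on the ARPAbet convention that vowel phones end in a stress digit (0, 1 or 2) while consonants do not. Here is a minimal sketch of just that step, which runs without downloading the corpus. The `SAMPLE` dictionary and the `first_syllable` helper are illustrative names, and the entries are hand-copied on the assumption that they match cmudict, so check them against the real corpus before relying on them.

```python
# Hand-copied ARPAbet entries (assumed to match cmudict; not loaded from it).
SAMPLE = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "hour": ["AW1", "ER0"],
    "unit": ["Y", "UW1", "N", "AH0", "T"],
}


def first_syllable(phones):
    """Return the phones up to and including the first vowel, stress digit stripped."""
    for i, ph in enumerate(phones):
        if ph[-1] in "012":  # only vowel phones carry a stress digit
            return phones[:i] + [ph[:-1]]
    # No vowel found: this sketch returns the phones unchanged,
    # whereas baby_talk() falls back to the spelling-based helper.
    return list(phones)


for word, phones in SAMPLE.items():
    print(word, first_syllable(phones))
```

With these entries, "hour" yields a bare vowel and "unit" yields a consonant followed by a vowel, regardless of how the words are spelled.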
Example:
import re

sentences = ["hello world",
             "Quiet European rhythms.",
             "My nth happy hour.",
             "Herb unit -- a dynasty heir."]
for sentence in sentences:
    sesese = " ".join(["".join(
        [w if i & 1 or not w else baby_talk(w)  # keep non-words as is
         for i, w in enumerate(re.split(r"(\W+)", non_whitespace))])
        for non_whitespace in sentence.split()])
    print(u'"{}" → "{}"'.format(sentence, sesese))
Output
"hello world" → "HH AH HH AH HH AH W ER W ER W ER"
"Quiet European rhythms." → "K W AY K W AY K W AY Y UH Y UH Y UH R IH R IH R IH."
"My nth happy hour." → "M AY M AY M AY EH EH EH HH AE HH AE HH AE AW AW AW."
"Herb unit -- a dynasty heir." → "ER ER ER Y UW Y UW Y UW -- AH AH AH D AY D AY D AY EH EH EH."
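The nested comprehension in the example depends on how `re.split` behaves when the pattern has a capturing group: the separators are kept in the result, so words land at even indices and punctuation at odd ones. The following sketch unrolls that logic into a loop. `map_words` is a hypothetical helper name, and `str.upper` stands in for `baby_talk` so the block runs without nltk.

```python
import re


def map_words(sentence, transform):
    """Apply `transform` to each word, leaving punctuation runs untouched."""
    out = []
    for chunk in sentence.split():
        # The capturing group makes re.split keep the separators:
        # even indices are words (possibly empty), odd indices are non-words.
        pieces = re.split(r"(\W+)", chunk)
        out.append("".join(
            piece if i % 2 or not piece else transform(piece)
            for i, piece in enumerate(pieces)))
    return " ".join(out)


print(re.split(r"(\W+)", "rhythms."))  # word, separator, trailing empty string
print(map_words("Quiet -- rhythms.", str.upper))
```

One consequence of splitting on `\W+` is that an apostrophe also counts as a separator, so a word such as "don't" is processed as two separate words.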
Note:
- nth, hour, herb, heir start with a vowel
- European, unit start with a consonant
- y in "rhythms", "dynasty" is a vowel
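For contrast, here is a small sketch of what a purely spelling-based rule does with the same words. It uses the same logic as the `naive_baby_talk` fallback, reduced to returning the prefix only. `naive_prefix` is an illustrative name, not part of the original code.

```python
def naive_prefix(word, vowels="aeiouAEIOU"):
    """Return the letters up to and including the first vowel letter."""
    i = None
    for i, char in enumerate(word, start=1):
        if char in vowels:
            break  # found a vowel letter
    return word[:i]  # whole word if no vowel letter was found


for word in ["hour", "unit", "nth", "rhythms", "dynasty"]:
    print(word, "->", naive_prefix(word))
```

The spelling rule treats the silent "h" in "hour" as a leading consonant, misses the consonant sound at the start of "unit", and does not recognise "y" as a vowel, so "rhythms" and "nth" come back whole. That is why the pronunciation dictionary is tried first and the spelling rule is only a fallback.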
See: