Baby talk function: answers

【Question title】: Baby talk function
【Posted】: 2016-02-14 11:57:03
【Question description】:

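I want to write a function whose input is an ordinary sentence and whose output is that sentence translated into "baby talk". Baby talk means that only the first syllable of each word is spoken, repeated 3 times. So "hello world" would become "hehehe wowowo".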

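My idea is to first split the sentence into a list of words. Then, for each word, keep a counter that starts at 0: a consonant adds 0 to the counter and a vowel adds 1. When the counter reaches 1, stop, return the consonants and the vowel, and move on to the next word. But I am having trouble "accessing" each word in the list. How can I put my idea into practice?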

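【Question comments】:

Please post what you are trying to do and some of the actual problems you ran into, so that we can help you better.
In other words: show your code.
Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. Minimal, complete, verifiable example applies here. We cannot help you effectively until you post your code and describe the problem accurately. StackOverflow is not a coding or tutorial service.

Tags: python python-3.x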

【Solution 1】:

Don't use a 0-1 counter; that is called a Boolean flag. Once you find a vowel, simply move on.

vowel_list = 'aeiou'
sentence = "hello world"

# split the sentence into a list of words.
word_list = sentence.split()
for word in word_list:
    # Find the minimal pronounceable prefix and print it 3 times
    # Fall back to the whole word if it contains no vowel letter
    syllable = word
    # Find the first vowel
    for i in range(len(word)):
        if word[i] in vowel_list:
            # Grab the consonants and vowel, and stop
            syllable = word[:i+1]
            break
    # Report the syllable in triplicate
    print(syllable * 3)

The output of this is

hehehe
wowowo

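This should get you past your current problem. You can still assemble it into a function the way you described, and then join the individual baby words into a baby sentence. I also leave it to you to handle problem cases such as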

each
school
rhythm

If this doesn't solve anything for you, please edit the question with a clearer description.
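The assembly step mentioned above could be sketched roughly as follows. The names `first_syllable` and `baby_talk` are made up for this illustration, and keeping a vowel-less word whole is an assumption rather than something the answer prescribes.

```python
VOWELS = "aeiou"


def first_syllable(word):
    """Return the prefix of word up to and including its first vowel letter."""
    for i, char in enumerate(word):
        if char in VOWELS:
            return word[:i + 1]
    # Assumption: a word without any vowel letter is kept whole.
    return word


def baby_talk(sentence, repeat=3):
    """Repeat each word's first syllable and rejoin the results with spaces."""
    return " ".join(first_syllable(word) * repeat for word in sentence.split())


print(baby_talk("hello world"))  # hehehe wowowo
```

The problem cases listed above still need real handling: this sketch turns "each" into "eee" and "rhythm" into "rhythmrhythmrhythm", neither of which is a true first syllable.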

【Comments】:

【Solution 2】:

Here is baby talk generated from word pronunciations and written in ARPAbet:

#!/usr/bin/env python3
from nltk.corpus import cmudict  # $ pip install nltk
# >>> nltk.download('cmudict')

def baby_talk(word, repeat=3, phone_sep=u'\N{NO-BREAK SPACE}',
              pronunciations=cmudict.dict()):
    for phones in pronunciations.get(word.casefold(), []):
        for i, ph in enumerate(phones):
            if ph[-1] in '012':  # found vowel sound
                return phone_sep.join((phones[:i] + [ph[:-1]]) * repeat)
    return naive_baby_talk(word, repeat, phone_sep)  # no pronunciations


def naive_baby_talk(word, repeat, phone_sep, vowels="aeiouAEIOU"):
    i = None
    for i, char in enumerate(word, start=1):
        if char in vowels:
            break  # found vowel
    return phone_sep.join([word[:i]] * repeat)

Example:

import re

sentences = ["hello world",
             "Quiet European rhythms.",
             "My nth happy hour.",
             "Herb unit -- a dynasty heir."]
for sentence in sentences:
    sesese = " ".join(["".join(
        [w if i & 1 or not w else baby_talk(w)  # keep non-words as is
         for i, w in enumerate(re.split(r"(\W+)", non_whitespace))])
        for non_whitespace in sentence.split()])
    print(u'"{}" → "{}"'.format(sentence, sesese))

Output

"hello world" → "HH AH HH AH HH AH W ER W ER W ER"
"Quiet European rhythms." → "K W AY K W AY K W AY Y UH Y UH Y UH R IH R IH R IH."
"My nth happy hour." → "M AY M AY M AY EH EH EH HH AE HH AE HH AE AW AW AW."
"Herb unit -- a dynasty heir." → "ER ER ER Y UW Y UW Y UW -- AH AH AH D AY D AY D AY EH EH EH."

Notes:

nth, hour, herb, heir start with a vowel sound
European, unit start with a consonant sound
y in "rhythm" and "dynasty" is a vowel
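These notes follow from how ARPAbet transcriptions mark vowels: a vowel phone ends in a stress digit (0, 1 or 2), which is what the `ph[-1] in '012'` test in the code relies on. Below is a minimal sketch that runs without NLTK. The dictionary is hand-written in the CMUdict style purely for illustration, and the helper names are hypothetical.

```python
# Hand-written entries in the CMUdict style (an assumption for illustration;
# a real run would look them up in nltk.corpus.cmudict instead).
SAMPLE_PRONUNCIATIONS = {
    "hour": ["AW1", "ER0"],
    "unit": ["Y", "UW1", "N", "AH0", "T"],
    "rhythm": ["R", "IH1", "DH", "AH0", "M"],
}


def is_vowel_phone(phone):
    """ARPAbet vowel phones carry a trailing stress digit (0, 1 or 2)."""
    return phone[-1] in "012"


def first_syllable_phones(phones):
    """Return the phones up to and including the first vowel, stress removed."""
    for i, phone in enumerate(phones):
        if is_vowel_phone(phone):
            return phones[:i] + [phone[:-1]]
    return list(phones)  # no vowel phone found


for word, phones in SAMPLE_PRONUNCIATIONS.items():
    print(word, "->", " ".join(first_syllable_phones(phones)))
```

With these sample entries, "hour" yields a bare vowel, "unit" keeps its leading consonant sound, and "rhythm" finds a vowel even though the spelling has no vowel letter.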


【Comments】:

【Solution 3】:

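def end_at_vowel(string):
    vowels = ["a", "e", "i", "o", "u"]  # A list of vowels
    letters = []
    for l in string:
        letters.append(l)  # Collect letters until the first vowel
        if l in vowels:
            break  # Stop once the first vowel has been collected
    return "".join(letters)


def bbt(string):
    words = string.split()  # Split the string into a list of words
    # Triple each word's first syllable and rejoin with spaces
    return " ".join([end_at_vowel(w) * 3 for w in words])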

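This should handle most of what you described. Take a look at the comments and the two functions, and see whether you can work out what is going on.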
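One detail to watch for: the vowel list holds only lowercase letters, so a capitalised word such as "Each" would run on to its second vowel. A possible case-insensitive variant is sketched below. The `_ci` names are hypothetical and not part of the answer.

```python
def end_at_vowel_ci(word, vowels="aeiou"):
    """Like end_at_vowel, but compares letters case-insensitively."""
    for i, letter in enumerate(word):
        if letter.lower() in vowels:
            return word[:i + 1]
    return word  # no vowel letter: keep the whole word


def bbt_ci(sentence):
    """Triple the first syllable of each word, preserving the original case."""
    return " ".join(end_at_vowel_ci(w) * 3 for w in sentence.split())


print(bbt_ci("Hello world"))  # HeHeHe wowowo
print(bbt_ci("Each UNIT"))    # EEE UUU
```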

【Comments】:

【Solution 4】:

This is one of the few times I would recommend a regular expression:

import re

FIRST_SYLLABLE = re.compile(r'.*?[aeiou]', re.IGNORECASE)

def baby_talk(sentence):
    words = []
    for word in sentence.split():
        match = FIRST_SYLLABLE.match(word)
        if match:
            words.append(match.group(0) * 3)
    return ' '.join(words)

print(baby_talk('hello world'))

Line by line:

import re

FIRST_SYLLABLE = re.compile(r'.*?[aeiou]', re.IGNORECASE)

This compiles a pattern that matches anything up to and including the first vowel.

def baby_talk(sentence):
    words = []
    for word in sentence.split():
        match = FIRST_SYLLABLE.match(word)

This tries to match the word against our compiled pattern.

        if match:
            words.append(match.group(0) * 3)

If it matches, match.group(0) contains the matched part. Given "hello", match.group(0) will be "he". Make three copies of it and append the result to the list of output words.

    return ' '.join(words)

Return the output words joined together with spaces.

print(baby_talk('hello world'))
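To see why the pattern uses the non-greedy `.*?`, it may help to compare it with a greedy version. The `GREEDY` pattern below is added only for contrast and is not part of the answer. The sketch also shows that a word with no vowel letter produces no match, which means `baby_talk` silently drops such words.

```python
import re

FIRST_SYLLABLE = re.compile(r'.*?[aeiou]', re.IGNORECASE)
# Greedy counterpart, shown only for comparison (not part of the answer).
GREEDY = re.compile(r'.*[aeiou]', re.IGNORECASE)

for word in ["hello", "school", "Each", "rhythm"]:
    lazy = FIRST_SYLLABLE.match(word)
    greedy = GREEDY.match(word)
    # match() returns None when the word contains no vowel letter
    print(word,
          "lazy:", lazy.group(0) if lazy else None,
          "greedy:", greedy.group(0) if greedy else None)
```

The lazy pattern stops at the first vowel ("he", "scho", "E"), while the greedy one runs to the last vowel ("hello", "schoo", "Ea").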

【Comments】: