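【Posted】: 2015-01-19 05:10:38
【Question】:
Hi, I'm new to Python programming. Please help me create a function that takes a text file as an argument and builds a list of words, removing all punctuation, with the list "split" on double spaces. What I mean is that a new sublist should start wherever a double space occurs in the text file.
Here is my function: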
import re

def tokenize(document):
    with open(document) as f:  # read the file passed in, not a hard-coded name
        text = f.read()
    print(re.findall(r'\w+', text))  # prints one flat list of all words
The input text file contains the following string:
What's did the little boy tell the game warden?  His dad was in the kitchen poaching eggs!
Note: there is a double space after "warden?" and before "His".
My function gives me this output:
['what','s','did','the','little','boy','tell','the','game','warden','His','dad','was','in','the','kitchen','poaching','eggs']
Desired output:
[['what','s','did','the','little','boy','tell','the','game','warden'],
['His','dad','was','in','the','kitchen','poaching','eggs']]
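For reference, here is a minimal sketch of one way the desired nesting could be produced: split the text on double spaces first, then pull the words out of each piece. The helper name `tokenize_text` is made up for illustration, and treating any run of two or more spaces as a separator is an assumption (the question only mentions double spaces). Case is left as it is in the input, so the first word comes out as 'What'.

```python
import re

def tokenize_text(text):
    """Return one list of words per chunk, where chunks are separated by 2+ spaces."""
    # Assumption: a run of two or more spaces marks a boundary between sublists.
    chunks = re.split(r' {2,}', text)
    # \w+ keeps letters, digits and underscores, so punctuation is dropped.
    groups = (re.findall(r'\w+', chunk) for chunk in chunks)
    # Skip chunks that contained no words at all (e.g. only punctuation).
    return [words for words in groups if words]

def tokenize(document):
    """Read the file at path `document` and tokenize its contents."""
    with open(document) as f:
        return tokenize_text(f.read())

sample = "What's did the little boy tell the game warden?  His dad was in the kitchen poaching eggs!"
result = tokenize_text(sample)
print(result)
```

If only exactly two spaces should count as a separator, `text.split('  ')` could probably replace the `re.split` call, though three spaces in a row would then behave differently.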
【Comments】:
Tags: python list split tokenize