【Posted】: 2014-06-17 13:56:23
【Question】:
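I need some help with Python code that counts how many words contain a given number of consonants. Consider the following sample input:
"There is no new thing under the sun."
The required output would then be: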
1 : 2
2 : 3
3 : 2
4 : 1
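That is because there are 2 words with 1 consonant, 3 words with 2 consonants, 2 words with 3 consonants, and 1 word with 4 consonants.
The following code does a similar job, but instead of counting consonants it counts how often each whole-word length occurs in two text files. I think only a small change is needed to make it look inside each word.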
import re

def freqCounter(file1, file2):
    freq_dict = {}
    dict_static = {2:0, 3:0, 5:0}  # not used below
    # get rid of punctuation
    punctuation = re.compile(r'[.?!,"\':;]')  # compile the pattern into a regex object
    try:
        with open(file1, "r") as infile, open(file2, "r") as infile2:  # open two files at once
            text1 = infile.read()  # read both files
            text2 = infile2.read()
        joined = " ".join((text1, text2))
        for word in joined.lower().split():
            # remove punctuation marks
            word = punctuation.sub("", word)
            #print word
            l = len(word)  # assign l to be the word's length
            # if this word length is not yet in the dict
            if l not in freq_dict:
                freq_dict[l] = 0  # start the count for this length at 0
            freq_dict[l] += 1  # then count the current word
    except IOError as e:  # error while opening or reading a file
        print('Operation failed: %s' % e.strerror)
    return freq_dict  # return the dictionary
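For reference, here is a minimal sketch of the behavior being asked for, working on a string rather than on files. The name `consonant_freq` is hypothetical, and two assumptions are made that the example sentence does not settle: "y" is treated as a consonant, and words with no consonants at all are skipped.

```python
import re
from collections import Counter

# Assumption: consonants are the ASCII letters other than a, e, i, o, u,
# so "y" counts as a consonant here.
CONSONANT = re.compile(r'[b-df-hj-np-tv-z]')

def consonant_freq(text):
    """Map 'number of consonants in a word' to 'number of such words'."""
    freq = Counter()
    for word in text.lower().split():
        # Only letters match the pattern, so punctuation is ignored for free.
        n = len(CONSONANT.findall(word))
        if n:  # assumption: skip tokens with no consonants, e.g. "a"
            freq[n] += 1
    return dict(freq)

sample = "There is no new thing under the sun."
for count, words in sorted(consonant_freq(sample).items()):
    print('%d : %d' % (count, words))
```

For the sample sentence this prints the four lines shown in the required output. In `freqCounter`, the equivalent change would be to replace `len(word)` with the number of consonant matches in `word`.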
Any help would be greatly appreciated!
【Comments】:
Tags: python python-2.7 amazon-web-services mapreduce