【Question title】: MapReduce to count the frequency of the number of consonants in words from a text file
【Posted】: 2014-06-17 13:56:23
【Question】:

I need some help with Python code that counts the frequency of the number of consonants in words. Consider the following example input:

"There is no new thing under the sun."

The desired output would then be:

1 : 2
2 : 3
3 : 2
4 : 1

because there are 2 words with 1 consonant, 3 words with 2 consonants, 2 words with 3 consonants, and 1 word with 4 consonants.
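
To make the expected output concrete, here is a minimal sketch that tallies the example sentence word by word. The helper name `consonant_count` is hypothetical, and the sketch assumes that `y` counts as a consonant and that punctuation and digits are ignored; neither is stated in the question.

```python
import re
from collections import Counter

VOWELS = set("aeiou")

def consonant_count(word):
    """Count the letters in `word` that are not vowels (assumption: y is a consonant)."""
    letters = re.findall(r"[a-z]", word.lower())  # keep letters only, drop punctuation
    return sum(1 for ch in letters if ch not in VOWELS)

sentence = "There is no new thing under the sun."
per_word = [(w, consonant_count(w)) for w in sentence.split()]

# Tally how many words share each consonant count
freq = Counter(n for _, n in per_word)

for n in sorted(freq):
    print("%d : %d" % (n, freq[n]))
```

Here "There" and "under" have 3 consonants each, "is" and "no" have 1, "new", "the" and "sun" have 2, and "thing" has 4.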

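The following code does a similar job, but instead of counting consonants it counts the frequency of whole-word lengths in the text files. I think it only needs a small change to look inside each word.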

import re

def freqCounter(file1, file2):
    freq_dict = {}
    # get rid of punctuation
    punctuation = re.compile(r'[.?!,"\':;]')  # compile the pattern into a reusable regex object
    try:
        with open(file1, "r") as infile, open(file2, "r") as infile2:  # open two files at once
            text1 = infile.read()   # read both files
            text2 = infile2.read()
            joined = " ".join((text1, text2))
            for word in joined.lower().split():
                # remove punctuation marks
                word = punctuation.sub("", word)
                l = len(word)  # the word's length
                # if this length has not been seen yet, start its count at 0
                if l not in freq_dict:
                    freq_dict[l] = 0
                freq_dict[l] += 1  # count one more word of this length
    except IOError as e:     # error while opening or reading a file
        print('Operation failed: %s' % e.strerror)
    return freq_dict  # maps word length -> number of words

Any help would be greatly appreciated!

【Comments】:

    Tags: python python-2.7 amazon-web-services mapreduce


    【Solution 1】:

    I would try a simpler approach:

    import re
    from collections import Counter

    words = 'There is no new thing under the sun.'
    # Strip vowels and punctuation so that only consonants remain in each word;
    # lower-casing first means upper-case vowels are removed as well
    words = re.sub(r'[aeiou.?!,"\':;]', '', words.lower())

    # Each remaining word's length is now its consonant count
    word_lengths = map(len, words.split(' '))
    freq_dict = dict(Counter(word_lengths))
    

    【Comments】:

      【Solution 2】:

      A simple solution:

      def freqCounter(_str):
          _txt = _str.split()
          freq_dict = {}
          for word in _txt:
              c = 0
              for letter in word.lower():  # lower-case so 'A', 'E', ... are skipped too
                  # count every character that is not a vowel or punctuation
                  if letter not in "aeiou.,:;!?[]\"`()'":
                      c += 1
              freq_dict[c] = freq_dict.get(c, 0) + 1
          return freq_dict

      txt = "There is no new thing under the sun."
      table = freqCounter(txt)
      for k in sorted(table):
          print("%s : %s" % (k, table[k]))
      

      【Comments】:

        【Solution 3】:

        How about this?

        with open('conts.txt', 'w') as fh:
            fh.write('oh my god becky look at her butt it is soooo big')

        consonants = "bcdfghjklmnpqrstvwxyz"
        def count_cons(_file):
            results = {}
            with open(_file, 'r') as fh:
                for line in fh:
                    for word in line.split():  # split on any whitespace, including the trailing newline
                        # number of consonants in this word (case-insensitive)
                        conts = sum(1 for letter in word.lower() if letter in consonants)
                        results[conts] = results.get(conts, 0) + 1
            return results

        print(count_cons('conts.txt'))
        

        I forgot to include the result:

        {1: 5, 2: 5, 3: 1, 4: 1}
        [Finished in 0.0s]
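
None of the snippets above is split into map and reduce phases, which is what the question title asks about. The following is a minimal sketch of that split, run locally. The names `mapper`, `reducer` and `run_local` are hypothetical, `sorted()` stands in for the shuffle/sort step a real framework would perform, and `y` is again assumed to be a consonant. On a real cluster (for example with Hadoop Streaming) the two phases would instead read from standard input and write key/value lines, which is not shown here.

```python
import re
from itertools import groupby
from operator import itemgetter

VOWELS = set("aeiou")

def mapper(lines):
    """Map phase: emit (consonant_count, 1) for every word."""
    for line in lines:
        for word in line.split():
            letters = re.findall(r"[a-z]", word.lower())
            if not letters:
                continue  # skip tokens that contain no letters at all
            yield sum(1 for ch in letters if ch not in VOWELS), 1

def reducer(pairs):
    """Reduce phase: sum the 1s per key; `pairs` must already be sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)

def run_local(lines):
    """Simulate the shuffle/sort between map and reduce with sorted()."""
    return dict(reducer(sorted(mapper(lines))))

result = run_local(["There is no new thing under the sun."])
for k in sorted(result):
    print("%d : %d" % (k, result[k]))
```

Because the mapper emits one pair per word and the reducer only adds counts for a single key, the same two functions would work unchanged if the input lines were spread across several workers.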
        

        【Comments】:
