【Question title】: How to find the number of times a string is found in a saved HTML file?
【Posted】: 2020-04-24 15:40:53
【Question description】:

I have a saved HTML file, and I am trying to count how many times a specific string occurs in it. For example:

string= 'Beautiful days'
text = "those beautiful days were unforgettable. I wish every day was a beautiful day"

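Expected output = 2 ("beautiful days", "beautiful day")

What I tried: I tried using spaCy but could not get it to work. Can anyone tell me the logic for this?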

【Comments】:

  • "beautiful days" and "beautiful day" are not the same.
  • @DirtyBit I am trying to find both the closest and the exact matches.
  • Cool. In that case you can use lower().
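
The lower() suggestion from the comments can be sketched roughly as follows. This is only an illustration, not the commenter's code: the helper name count_exact is made up here. It normalizes case only, so the plural "beautiful days" and the singular "beautiful day" are still treated as different strings.

```python
def count_exact(text: str, phrase: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of phrase in text."""
    return text.lower().count(phrase.lower())


text = "those beautiful days were unforgettable. I wish every day was a beautiful day"

print(count_exact(text, "Beautiful days"))  # 1: only the plural form matches
print(count_exact(text, "Beautiful day"))   # 2: "beautiful day" is also a prefix of "beautiful days"
```

Searching for the singular form happens to catch the plural here because it is a plain substring match; that would not hold for irregular plurals.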

Tags: python string nlp spacy


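【Solution 1】:

You can use a stemmer. This may be overkill, but it will also find the closest matches, because inflected forms such as "days" and "day" are reduced to the same stem.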

import nltk
nltk.download('punkt')  # tokenizer data for word_tokenize; newer NLTK releases may ask for 'punkt_tab' instead

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

# Lowercase and stem every token of the text, then rejoin with spaces
sentence = "those beautiful days were unforgettable. I wish every day was a beautiful day"
words = word_tokenize(sentence)
sentence = ""
for w in words:
    sentence += (ps.stem(w.lower()) + " ")

# Normalize the query the same way so both sides are comparable
query = 'Beautiful days'
words = word_tokenize(query)
query = ""
for w in words:
    query += (ps.stem(w.lower()) + " ")

print(sentence)
print(query)
print(sentence.count(query))  # substring count on the stemmed strings

Output:

those beauti day were unforgett . i wish everi day wa a beauti day 
beauti day 
2
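
Counting on the joined string is a substring match, so a stemmed query could in principle also match inside a longer token. A possible variation, sketched here under the assumption that the text and query are already stemmed, compares whole tokens with a sliding window instead. The name count_phrase is hypothetical, and the token lists are copied from the output above so that the sketch runs without NLTK.

```python
from typing import List


def count_phrase(tokens: List[str], phrase: List[str]) -> int:
    """Count positions where the token list `phrase` occurs contiguously in `tokens`."""
    n = len(phrase)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


# Stemmed strings copied from the output above, split on whitespace
stemmed_sentence = "those beauti day were unforgett . i wish everi day wa a beauti day".split()
stemmed_query = "beauti day".split()

print(count_phrase(stemmed_sentence, stemmed_query))  # 2
```

In a real script the two lists would come from applying ps.stem to the word_tokenize output rather than from hard-coded strings.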

【Comments】:

【Solution 2】:

You can also use a case-insensitive regular expression search on the file contents:

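import re

# Read the saved HTML file as text
with open("count_string_in_file.txt") as f:
    html = f.read()

to_match = "beautiful day"
# re.escape keeps the search string literal; re.IGNORECASE ignores case
matches = re.findall(re.escape(to_match), html, re.IGNORECASE)
print(len(matches), matches)
# 2 ['beautiful day', 'beautiful day']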
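
Because the input is a saved HTML file, markup can sit between the words of a phrase, and a search on the raw file will then miss it. The following is a rough sketch of one way to handle that, not part of the original answer: it extracts the text nodes with the standard-library html.parser before counting. The names TextExtractor and count_in_html and the sample markup are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_in_html(html: str, phrase: str) -> int:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes and collapse runs of whitespace to single spaces
    text = re.sub(r"\s+", " ", "".join(parser.chunks))
    return len(re.findall(re.escape(phrase), text, re.IGNORECASE))


sample = "<p>Those <b>beautiful</b> days were unforgettable.</p><p>I wish every day was a beautiful day</p>"

print(len(re.findall("beautiful day", sample, re.IGNORECASE)))  # 1: the <b> tag splits the first phrase
print(count_in_html(sample, "beautiful day"))                  # 2
```

Note that handle_data also receives the contents of script and style elements, so a page with inline scripts may need extra filtering.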
    


【Comments】:
