【Question title】: Search for multiple words in a list using Python
【Posted】: 2020-04-17 15:19:30
【Question description】:

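I am currently working on my first Python project. The goal is to summarize the information on a web page by searching for, and printing, the sentences that contain particular words from a word list I generated. For example, the following (large) list contains "key business terms" that I generated by running cewl on business websites: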

business_list = ['business', 'marketing', 'market', 'price', 'management', 'terms', 'product', 'research', 'organisation', 'external', 'operations', 'organisations', 'tools', 'people', 'sales', 'growth', 'quality', 'resources', 'revenue', 'account', 'value', 'process', 'level', 'stakeholders', 'structure', 'company', 'accounts', 'development', 'personal', 'corporate', 'functions', 'products', 'activity', 'demand', 'share', 'services', 'communication', 'period', 'example', 'total', 'decision', 'companies', 'service', 'working', 'businesses', 'amount', 'number', 'scale', 'means', 'needs', 'customers', 'competition', 'brand', 'image', 'strategies', 'consumer', 'based', 'policy', 'increase', 'could', 'industry', 'manufacture', 'assets', 'social', 'sector', 'strategy', 'markets', 'information', 'benefits', 'selling', 'decisions', 'performance', 'training', 'customer', 'purchase', 'person', 'rates', 'examples', 'strategic', 'determine', 'matrix', 'focus', 'goals', 'individual', 'potential', 'managers', 'important', 'achieve', 'influence', 'impact', 'definition', 'employees', 'knowledge', 'economies', 'skills', 'buying', 'competitive', 'specific', 'ability', 'provide', 'activities', 'improve', 'productivity', 'action', 'power', 'capital', 'related', 'target', 'critical', 'stage', 'opportunities', 'section', 'system', 'review', 'effective', 'stock', 'technology', 'relationship', 'plans', 'opportunity', 'leader', 'niche', 'success', 'stages', 'manager', 'venture', 'trends', 'media', 'state', 'negotiation', 'network', 'successful', 'teams', 'offer', 'generate', 'contract', 'systems', 'manage', 'relevant', 'published', 'criteria', 'sellers', 'offers', 'seller', 'campaigns', 'economy', 'buyers', 'everyone', 'medium', 'valuable', 'model', 'enterprise', 'partnerships', 'buyer', 'compensation', 'partners', 'leaders', 'build', 'commission', 'engage', 'clients', 'partner', 'quota', 'focused', 'modern', 'career', 'executive', 'qualified', 'tactics', 'supplier', 'investors', 
'entrepreneurs', 'financing', 'commercial', 'finances', 'entrepreneurial', 'entrepreneur', 'reports', 'interview', 'ansoff']

The program below copies all the text from a URL I specify and organizes it into a list whose elements are the individual sentences:

from bs4 import BeautifulSoup
import urllib.request as ul

url = input("Enter URL: ")
html = ul.urlopen(url).read()

soup = BeautifulSoup(html, 'lxml')
for script in soup(["script", "style"]):
    script.decompose()
strips = list(soup.stripped_strings)
# Joining list to form single text
text = " ".join(strips)
text = text.lower()
# Replacing substitutes of '.'
for ch in "?!:;":
    text = text.replace(ch, ".")
# Splitting text by sentences
sentences = text.split(".")
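The replace-then-split steps above can also be collapsed into a single regular expression. This is only a minimal sketch, assuming the same delimiters as the question (. ? ! : ;); the function name split_sentences is hypothetical, and unlike the original code it also strips surrounding whitespace and drops empty pieces.

```python
import re

def split_sentences(text):
    """Lowercase the text and split it on '.', '?', '!', ':' and ';'.

    Hypothetical helper: whitespace around each piece is stripped and
    empty pieces are discarded, which the original code does not do.
    """
    parts = re.split(r"[.?!:;]", text.lower())
    return [part.strip() for part in parts if part.strip()]
```

With this helper, sentences = split_sentences(" ".join(strips)) would replace the lowercasing, replacing, and splitting steps.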

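My current goal is for the program to print every sentence that contains one (or more) of the key terms above, but so far I have only managed to do this for one word at a time: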

# Word to search for in the text
word_search = input("Enter word: ")
word_search = word_search.lower()
sentences_with_word = []
for x in sentences:
    if x.count(word_search) > 0:
        sentences_with_word.append(x)
# Separating sentences into separate lines
sentence_text = "\n\n".join(sentences_with_word)
print(sentence_text)

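Could someone demonstrate how to do this for the whole list at once? Thank you.

Edit

As MachineLearner suggested, here is a sample of the output for a single word. If I use Wikipedia's page on marketing as the URL and enter 'marketing' as the input for 'word_search', this is part of the output that is produced (the full output is almost 600 lines long):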

marketing mix the marketing mix is a foundational tool used to guide decision making in marketing

 the marketing mix represents the basic tools which marketers can use to bring their products or services to market

 they are the foundation of managerial marketing and the marketing plan typically devotes a section to the marketing mix

 the 4ps [ edit ] the traditional marketing mix refers to four broad levels of marketing decision

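【Comments】:

  • Could you provide some sample input and the desired sample output?
  • I have included it above.
  • Note that you can replace if sentence.count(word) > 0 with if word in sentence, which states the intent more clearly and may be slightly faster.

Tags: python beautifulsoup urllib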
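The last comment can be illustrated with a short sketch of the single-word search rewritten around the in operator. The function name filter_by_word is hypothetical, and the sentences are assumed to be lowercase already, as they are in the question.

```python
def filter_by_word(sentences, word):
    """Return the sentences that contain `word` as a substring.

    Hypothetical helper; the search word is lowercased to match the
    lowercase sentences built in the question.
    """
    word = word.lower()
    return [sentence for sentence in sentences if word in sentence]
```

Like count, the in operator tests for a substring, so searching for 'market' would still return sentences that only contain 'marketing'.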


【Solution 1】:

Use a double loop to check for several words from the list:

words = business_list  # the list of key terms from the question
output = []
for sentence in sentences:
  for word in words:
    if sentence.count(word) > 0:
      output.append(sentence)
      # Do not forget to break out of the inner loop here;
      # otherwise a sentence that contains several of the
      # words is appended to the output list once per
      # matching word
      break
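The double loop above matches substrings, so a term such as 'state' also matches 'statement'. If whole-word matches are wanted (an assumption, since the question does not say so), one possible sketch compiles all terms into a single regular expression with word boundaries. The function name find_sentences is hypothetical.

```python
import re

def find_sentences(sentences, words):
    """Return each sentence that contains at least one whole word from `words`.

    Hypothetical helper. Assumes whole-word matching is desired, which
    differs from the substring matching of the double loop.
    """
    if not words:
        return []
    # re.escape guards against terms that contain regex metacharacters.
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    # search() stops at the first hit, so each sentence appears at most once.
    return [sentence for sentence in sentences if pattern.search(sentence)]
```

Calling find_sentences(sentences, business_list) would then give the filtered list, which can be joined with "\n\n" and printed as in the question.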
