【Question title】: Python3: writing article in own words
【Posted】: 2017-10-22 03:25:12
【Question】:

I am trying to extract a summary from a news article. Here is what I have tried so far:

>>> from newspaper import Article
>>> url = 'http://abcnews.go.com/International/wireStory/north-korea-ready-deploy-mass-produce-missile-47552675'
>>> article = Article(url)
>>> article.download()
>>> article.parse()
>>> article.nlp()
>>> article.keywords
['ready', 'north', 'test', 'missiles', 'deploy', 'tested', 'korea', 'missile', 'launch', 'nuclear', 'capable', 'media', 'massproduce']
>>> article.summary
'North Korea says it\'s ready to deploy and start mass-producing a new medium-range missile capable of reaching Japan and major U.S. military bases there following a test launch it claims confirmed the missile\'s combat readiness and is an "answer" to U.S. President Donald Trump\'s policies.\nPyongyang\'s often-stated goal is to perfect a nuclear warhead that it can put on a missile capable of hitting Washington or other U.S. cities.\nAt the request of diplomats from the U.S., Japan and South Korea, a United Nations\' Security Council consultation on the missile test will take place Tuesday.\nNorth Korea a week earlier had successfully tested a new midrange missile — the Hwasong 12 — that it said could carry a heavy nuclear warhead.\nExperts said that rocket flew higher and for a longer time than any other missile previously tested by North Korea and represents another big advance toward a viable ICBM.'

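The summary generated above is taken verbatim from the news article itself. What I want is a human-like summary: written in its own words, or with the content reworded in some way, as long as it stays relevant.

Please advise what I need to do so that my code does what I want.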

【Comments】:

  • This is a hard task. I doubt there is an off-the-shelf Python library that does what you want.
  • I agree. I flagged the question to be migrated to ai.stackexchange.com.

Tags: python-3.x nlp summarization


【Answer 1】:

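sumy does offer several methods for summarizing English text. Most (if not all) of these algorithms extract sentences from the input document. You can then post-process those sentences: split and/or merge them, and substitute synonyms.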
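
The extract-then-post-process idea can be sketched with the standard library alone. This is a minimal illustration, not sumy's algorithm: sentences are scored by average word frequency, and the `SYNONYMS` table, the `STOPWORDS` set, and all function names are hypothetical placeholders. Blind synonym swapping ignores word sense and grammar, so real output would need much more care.

```python
import re
from collections import Counter

# Hypothetical, hand-written synonym table for illustration only; a real
# system would need a proper lexical resource and word-sense handling.
SYNONYMS = {"ready": "prepared", "new": "novel", "big": "major"}

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "that", "for", "on"}


def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extract_summary(text, count=2):
    """Return the `count` highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s) if w not in STOPWORDS)

    def score(sentence):
        # Average frequency of the sentence's non-stopword tokens.
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]


def swap_synonyms(sentence):
    """Replace whole words found in SYNONYMS, preserving a leading capital."""
    def replace(match):
        word = match.group(0)
        new = SYNONYMS.get(word.lower())
        if new is None:
            return word
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", replace, sentence)


def rewrite_summary(text, count=2):
    """Extract sentences, then reword them with the synonym table."""
    return " ".join(swap_synonyms(s) for s in extract_summary(text, count))
```

Calling `rewrite_summary(article.text, 3)` on the parsed article from the question would apply the same steps to real input. The result is still extractive at heart, only lightly reworded.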

Beyond that, this topic is still more a matter of research than of engineering. Try AI StackExchange.

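【Comments】:

  • But it does not produce an abstractive summary or output in my own words. It is more of an extraction-based summarization library.
  • Sorry, I missed the "own words" part. I updated the answer.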