Python - scraping a headline from a URL, but the URL comes from user input - answers

【Question title】: Python - scraping a headline from a URL, but the URL comes from user input
【Posted】: 2017-08-04 15:13:10
【Question description】:

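I have some Python code that returns the headline and first paragraph of a BBC News story, but at the moment I have to supply the link in the code itself. The code is as follows: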

from lxml import html
import requests

response = requests.get('http://www.bbc.co.uk/news/business-40660355')

if response.status_code == 200:
    pagehtml = html.fromstring(response.text)

    # Each xpath() call returns a list of the matching text nodes.
    news1 = pagehtml.xpath('//h1[@class="story-body__h1"]/text()')
    news2 = pagehtml.xpath('//p[@class="story-body__introduction"]/text()')

    # Print inside the if block: news1/news2 only exist when the request succeeded.
    print("\n".join(news1) + " (BBC News)")
    print("\n".join(news2))

But this code relies on me hard-coding the URL into the requests.get('') call.

Here is my attempt at changing it to accept user input:

from lxml import html
import requests

response = input()

if (response.status_code == 200):

    pagehtml = html.fromstring(response.text)

    news1 = pagehtml.xpath('//h1[@class="story-body__h1"]/text()')
    news2 = pagehtml.xpath('//p[@class="story-body__introduction"]/text()')
print("\n".join(news1) + " (BBC News)")
print("\n".join(news2))

Unfortunately, it returns the following error:

http://www.bbc.co.uk/news/world-europe-40825668
Traceback (most recent call last):
  File "myscript2.py", line 5, in <module>
    response = input()
  File "<string>", line 1
    http://www.bbc.co.uk/news/world-europe-40825668
        ^
SyntaxError: invalid syntax

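Does anyone know the best way to make this code fetch the information from a URL supplied as input, rather than relying on the user editing the code?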

Thanks
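One way to take the URL from the user is to read it as a plain string and hand that string to requests.get(). Typed input often carries stray whitespace or omits the scheme, so a small cleaning step before the request can help. A minimal sketch, assuming Python 3 and the standard library's urllib.parse; normalize_url and its "default to http://" rule are hypothetical choices, not part of the question:

```python
from urllib.parse import urlsplit


def normalize_url(raw):
    """Hypothetical helper: tidy a URL typed by the user before fetching it.

    Strips surrounding whitespace, assumes http:// when no scheme was typed,
    and rejects anything that is not an http(s) URL with a host.
    """
    url = raw.strip()
    if "://" not in url:
        # Assumption: a bare "www.bbc.co.uk/..." is meant as an http URL.
        url = "http://" + url
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("not an http(s) URL: %r" % raw)
    return url
```

Usage under Python 3 would look like `response = requests.get(normalize_url(input("Enter a URL: ")))`, with the ValueError giving a clear message instead of a confusing failure inside requests.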

【Question discussion】:

Unless you're using Python 3, you want raw_input.
Also, I'd say you want something like: response = requests.get(input())
Hi @jordanm, thanks, I'm using Python 3.5
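The traceback above shows `response = input()` failing inside `File "<string>", line 1`. That is how Python 2's input() behaves: it evaluates the typed text as a Python expression (like eval(raw_input())), so a bare URL is a SyntaxError. This suggests the script was actually run by a Python 2 interpreter even though 3.5 is installed, which would also explain why raw_input works in Solution 1, since raw_input does not exist in Python 3. A minimal sketch of a line reader that works on both versions; the name read_line is hypothetical:

```python
# Pick a line-reading function that never evaluates what the user types.
try:
    read_line = raw_input  # Python 2: returns the raw text (input() would eval it)
except NameError:
    read_line = input      # Python 3: raw_input was removed; input() returns a str
```

Then `url = read_line("Enter a URL: ")` followed by `response = requests.get(url)` behaves the same under either interpreter. Note that in both versions the result is a string, so it still has to go through requests.get() before `.status_code` is available.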

Tags: python input screen-scraping scrape

【Solution 1】:

I don't know whether answering your own question is common, but I've solved it. I switched to raw_input, replacing my input() with:

my_url = raw_input()
response = requests.get(my_url)

Not sure whether anyone else will see this, but hopefully it helps!

【Discussion】:

【Solution 2】:

This is what you're looking for:

from lxml import html
import requests

# raw_input() is Python 2 only; under Python 3 use input() instead.
url = raw_input('Enter a URL: ')
response = requests.get(url)

if response.status_code == 200:
    pagehtml = html.fromstring(response.text)

    news1 = pagehtml.xpath('//h1[@class="story-body__h1"]/text()')
    news2 = pagehtml.xpath('//p[@class="story-body__introduction"]/text()')

    # Print inside the if block so a failed request doesn't raise a NameError.
    print("\n".join(news1) + " (BBC News)")
    print("\n".join(news2))
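If lxml is not installed, or you want to check the selectors against saved HTML without making a request, the two XPath lookups can be approximated with the standard library's html.parser. This swaps in a different technique from the lxml code above; StoryParser and VOID_TAGS are hypothetical names, and the class names are copied from the 2017 code and may no longer match current BBC pages. A minimal sketch, assuming Python 3:

```python
from html.parser import HTMLParser

# Elements that never get a closing tag in HTML, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StoryParser(HTMLParser):
    """Hypothetical stdlib stand-in for the two lxml XPath queries.

    Collects the direct text children of <h1 class="story-body__h1"> and
    <p class="story-body__introduction">, like '//h1[@class="..."]/text()'.
    """

    TARGETS = {("h1", "story-body__h1"): "headline",
               ("p", "story-body__introduction"): "intro"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self.headline = []
        self.intro = []
        self._stack = []  # open elements as (tag, field name or None)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # @class="..." in XPath is an exact string comparison, so do the same.
        field = self.TARGETS.get((tag, dict(attrs).get("class")))
        self._stack.append((tag, field))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Only text whose innermost open element is a target counts (direct text).
        if self._stack and self._stack[-1][1]:
            getattr(self, self._stack[-1][1]).append(data)
```

Usage: `parser = StoryParser()`, `parser.feed(response.text)`, `parser.close()`, then `"".join(parser.headline)`. Unlike lxml, html.parser does not build a repaired tree, so badly nested or unclosed markup can confuse the stack; lxml remains the more robust choice when it is available.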

To write the results to a .txt file, use the following:

with open('fileName.txt', 'a') as output:
    # news1 is a list of strings, so join it into one string before writing.
    output.write("\n".join(news1) + '\n')
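xpath(...)/text() returns a list of strings, while file.write() accepts a single string, so the pieces have to be joined before writing. A minimal Python 3 sketch that appends both the headline and the introduction of one story; append_story and the UTF-8 encoding choice are assumptions, not part of the answer:

```python
def append_story(path, headline_parts, intro_parts):
    """Hypothetical helper: append one story (headline + intro) to a text file.

    headline_parts / intro_parts are lists of strings, like the values
    returned by pagehtml.xpath('.../text()'); they are joined first
    because write() only accepts a single string.
    """
    # Explicit UTF-8 so non-ASCII headlines don't depend on the OS default encoding.
    with open(path, "a", encoding="utf-8") as output:
        output.write("\n".join(headline_parts) + " (BBC News)\n")
        output.write("\n".join(intro_parts) + "\n\n")  # blank line between stories
```

Calling `append_story('fileName.txt', news1, news2)` after each successful request builds up one file with a blank line separating the stories; mode 'a' keeps earlier entries instead of overwriting them.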

【Discussion】:

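Thanks Anoop - while playing around with it I had got almost exactly what you have, but I hadn't thought of putting the string in before, so I appreciate it. I upvoted you, but apparently it doesn't count because my rep is below 15!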