【Question Title】: Error in Scraping a Webpage in Python
【Posted】: 2014-07-13 23:38:46
【Question】:

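I am trying to scrape this page: http://photo.net/nikon-camera-forum/00aoms. I am using the Requests package in Python. The page itself is fine and loads when I enter the URL in a browser, but the output of requests.get(url).text is the error below, and I don't know what is wrong: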

"photo.net Temporarily Unavailable 
photo.net 
Sun Jul 13 19:26:33 EDT 2014 — photo.net is down temporarily for 
system maintenance. Please visit us again later."

【Comments】:

    Tags: python web-scraping python-requests


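    【Answer 1】:

    The site performs a simple check on the User-Agent header. By default Requests identifies itself as python-requests/<version>, which the site apparently rejects, so send a browser-like User-Agent instead: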

    >>> import requests
    >>> response = requests.get('http://photo.net/nikon-camera-forum/00aoms', headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4)'})
    >>> print(response.text)
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script>
    
    <title>D800 wifi options? - Photo.net Nikon Forum</title>
    ...
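
If several pages from the same site need to be fetched, the header can be set once on a session instead of being repeated on every call. The following is a minimal sketch under that assumption. The helper names `make_session` and `fetch_html` and the 10-second timeout are illustrative choices, not part of the original answer, and the User-Agent string is simply the one used in the snippet above.

```python
import requests

# Assumed value: any browser-like string should do; this one is copied
# from the interactive example above.
BROWSER_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4)"


def make_session(user_agent=BROWSER_USER_AGENT):
    """Return a requests.Session that sends `user_agent` on every request."""
    session = requests.Session()
    # Session-level headers are merged into each request made through the session.
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(url, session, timeout=10):
    """GET `url` through `session` and return the decoded response body."""
    response = session.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text
```

A call such as `fetch_html('http://photo.net/nikon-camera-forum/00aoms', make_session())` would then send the browser-like header. Whether the site still applies the same check today is not something this sketch can guarantee.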
    

    For reference, this is what you get when the header is not passed:

    >>> response = requests.get('http://photo.net/nikon-camera-forum/00aoms')
    >>> print(response.text)
    <html><head><title>photo.net Temporarily Unavailable</title></head>
    <center><h2>photo.net </h2>
    <p><i>Sun Jul 13 19:46:33 EDT 2014</i>&nbsp;&mdash; photo.net is down temporarily for 
    system maintenance.  Please visit us again later.
    </center>
    </body>
    </html>
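
Because the placeholder arrives as an ordinary HTML body, a scraper may want to detect it rather than parse it as if it were the real page. Below is a rough sketch that checks the `<title>` for the wording seen in the output above. The function names and the marker string are assumptions made for illustration, and the check is only as reliable as that wording staying the same.

```python
import re

# Assumed marker: taken from the <title> of the placeholder page shown above.
MAINTENANCE_MARKER = "Temporarily Unavailable"


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def looks_like_placeholder(html):
    """Return True if the page title contains the maintenance marker."""
    title = extract_title(html)
    return title is not None and MAINTENANCE_MARKER.lower() in title.lower()
```

Running `looks_like_placeholder(response.text)` right after the request gives an early signal that the User-Agent header was missing or rejected.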
    

    【Comments】:
