【Question title】: Why is the working code not giving any output anymore?
【Posted】: 2019-12-23 13:21:15
【Question】:

I took the code below from the answer to "How to use BeautifulSoup to parse google search results in Python".

It used to work on my Ubuntu 16.04 machine, where I have both Python 2 and 3 installed.

The code is as follows:

import urllib
from bs4 import BeautifulSoup
import requests
import webbrowser

text = 'My query goes here'
text = urllib.parse.quote_plus(text)

url = 'https://google.com/search?q=' + text

response = requests.get(url)

#with open('output.html', 'wb') as f: 
#    f.write(response.content)
#webbrowser.open('output.html')

soup = BeautifulSoup(response.text, 'lxml')
for g in soup.find_all(class_='g'):
    print(g.text)
    print('-----')

It runs but prints nothing. This looks really odd to me. Any help would be appreciated.

【Comments】:

    Tags: python python-3.x beautifulsoup


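    【Solution 1】:

    The problem is that Google serves different HTML when you don't specify a User-Agent in the headers. To send a custom header, pass a dict containing User-Agent to the headers= parameter of the request: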

    import urllib.parse  # import the submodule explicitly; a bare "import urllib" is not guaranteed to load it
    from bs4 import BeautifulSoup
    import requests
    
    text = 'My query goes here'
    text = urllib.parse.quote_plus(text)
    
    url = 'https://google.com/search?q=' + text
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
    }
    
    response = requests.get(url, headers=headers)
    
    soup = BeautifulSoup(response.text, 'lxml')
    for g in soup.find_all(class_='g'):
        print(g.text)
        print('-----')
    

    Prints:

    How to Write the Perfect Query Letter - Query Letter Examplehttps://www.writersdigest.com/.../how-to-write-the-perfect-qu...PuhverdatudTõlgi see leht21. märts 2016 - A literary agent shares a real-life novel pitch that ultimately led to a book deal—and shows you how to query your own work with success.
    -----
    Inimesed küsivad ka järgmistHow do you start a query letter?What should be included in a query letter?How do you end a query in an email?How long is a query letter?Tagasiside
    -----
    
    ...and so on.
    

    【Comments】:

    • Also, you can use requests.utils.default_headers() to get the full set of default headers.
    • Many thanks to @Andrej Kesely for the help. I also appreciate the other suggestions from @Joshua Nixon.
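The default headers mentioned in the first comment can be inspected without sending anything. The following is a minimal sketch, assuming the `requests` package is installed: it copies the defaults, overrides only the User-Agent (the string is the one from the answer above), and prepares a request without sending it so the outgoing headers can be checked.

```python
import requests

# Browser-like User-Agent string taken from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) "
    "Gecko/20100101 Firefox/68.0"
)

# Headers that requests attaches when none are supplied.
default_headers = requests.utils.default_headers()
default_ua = default_headers["User-Agent"]

# Start from a fresh copy of the defaults and override only the User-Agent.
custom_headers = requests.utils.default_headers()
custom_headers["User-Agent"] = BROWSER_UA

# Prepare (but do not send) a request to see which headers would go out.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request(
        "GET",
        "https://www.google.com/search",
        params={"q": "My query goes here"},
        headers=custom_headers,
    )
)

print("default User-Agent:", default_ua)
print("sent User-Agent:   ", prepared.headers["User-Agent"])
```

The default value identifies the client as `python-requests/<version>`, which is presumably why the server treats the unmodified request differently from a browser.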
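    【Solution 2】:

    Learn more about user-agent and request headers.

    Basically, the user-agent identifies the browser, its version number, and its host operating system. It represents a person (a browser) in a web context and lets servers and network peers decide whether the client is a bot.

    Check out the SelectorGadget Chrome extension, which lets you grab CSS selectors by clicking the desired element in your browser. See also the CSS selectors reference.

    To make the code look nicer, you can pass the URL params as a more readable dict() and requests will handle the encoding for you (the same goes for adding the user-agent to headers):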

    params = {
      "q": "My query goes here"
    }
    
    requests.get("YOUR_URL", params=params)
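To see what the params dict turns into, the query string can be built by hand with the standard library. This is only a sketch of the encoding step, not of what `requests` does internally; the assumption is that `requests` produces an equivalent URL for this simple case.

```python
from urllib.parse import quote_plus, urlencode

params = {"q": "My query goes here"}

# Manual approach from the question: quote the text and concatenate.
manual_url = "https://www.google.com/search?q=" + quote_plus(params["q"])

# Dict-based approach: urlencode quotes every key and value in one call.
encoded_url = "https://www.google.com/search?" + urlencode(params)

print(encoded_url)
```

Both approaches yield the same URL here; the dict form scales better once more parameters are added.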
    

    Code and full example in the online IDE:

    from bs4 import BeautifulSoup
    import requests
    
    headers = {
        'User-agent':
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    
    params = {
      "q": "My query goes here"
    }
    
    html = requests.get('https://www.google.com/search', headers=headers, params=params)
    soup = BeautifulSoup(html.text, 'lxml')
    
    for result in soup.select('.tF2Cxc'):
      title = result.select_one('.DKV0Md').text
      print(title)
    
    # Output:
    '''
    MySQL 8.0 Reference Manual :: 3.2 Entering Queries
    Google Sheets Query function: Learn the most powerful ...
    Understanding MySQL Queries with Explain - Exoscale
    An Introductory SQL Tutorial: How to Write Simple Queries
    Writing Subqueries in SQL | Advanced SQL - Mode
    Getting IO and time statistics for SQL Server queries
    How to store MySQL query results in another Table? - Stack ...
    More efficient SQL with query planning and optimization (article)
    Here are my Data Files. Here are my Queries. Where ... - CIDR
    Slow in the Application, Fast in SSMS? - Erland Sommarskog
    '''
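The loop above raises AttributeError if a result block has no matching title, and it prints nothing at all if the selectors stop matching, which is the symptom described in the question. Below is a defensive sketch, assuming `bs4` is installed; `SAMPLE_HTML` and `extract_titles` are hypothetical names, and the markup is hand-written to mimic the class names used above rather than real Google output.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written HTML that only mimics the class names used above;
# it is not real Google markup.
SAMPLE_HTML = """
<div class="tF2Cxc"><h3 class="DKV0Md">First title</h3></div>
<div class="tF2Cxc"><span>No title element here</span></div>
<div class="tF2Cxc"><h3 class="DKV0Md">Second title</h3></div>
"""


def extract_titles(html):
    """Return the text of every title found, skipping blocks without one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for result in soup.select(".tF2Cxc"):
        title = result.select_one(".DKV0Md")
        if title is not None:  # select_one returns None when nothing matches
            titles.append(title.get_text(strip=True))
    return titles


titles = extract_titles(SAMPLE_HTML)
if not titles:
    # An empty list is a hint that the markup or the selectors have changed.
    print("No results matched; check the selectors and the returned HTML.")
for title in titles:
    print(title)
```

Reporting the empty case explicitly makes a silent failure easier to notice than a loop that simply never runs.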
    

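    Alternatively, you can do the same thing with the Google Organic Results API from SerpApi. It is a paid API with a free plan.

    The difference in your case is that you only need to extract the data you want from a JSON string, rather than figuring out how to scrape Google, maintain the parser, or bypass blocks.

    Code to integrate: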

    import os
    from serpapi import GoogleSearch
    
    params = {
        "engine": "google",
        "q": "My query goes here",
        "hl": "en",
        "api_key": os.getenv("API_KEY"),
    }
    
    search = GoogleSearch(params)
    results = search.get_dict()
    
    for result in results["organic_results"]:
      print(result['title'])
    
    # Output:
    '''
    MySQL 8.0 Reference Manual :: 3.2 Entering Queries
    Google Sheets Query function: Learn the most powerful ...
    Understanding MySQL Queries with Explain - Exoscale
    An Introductory SQL Tutorial: How to Write Simple Queries
    Writing Subqueries in SQL | Advanced SQL - Mode
    Getting IO and time statistics for SQL Server queries
    How to store MySQL query results in another Table? - Stack ...
    More efficient SQL with query planning and optimization (article)
    Here are my Data Files. Here are my Queries. Where ... - CIDR
    Slow in the Application, Fast in SSMS? - Erland Sommarskog
    '''
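The "extract the data from a JSON string" step can be tried without an API key. This is a minimal sketch using only the standard library; the `organic_results` and `title` keys come from the code above, while `SAMPLE_JSON`, its values, and `titles_from_json` are made up for illustration.

```python
import json

# Hypothetical response body: the "organic_results" and "title" keys come from
# the code above, but the values are invented for illustration.
SAMPLE_JSON = """
{
  "organic_results": [
    {"title": "Example result one"},
    {"title": "Example result two"}
  ]
}
"""


def titles_from_json(raw):
    """Parse a JSON string and return the title of each organic result."""
    data = json.loads(raw)
    # .get() avoids a KeyError when a response carries no organic results.
    return [item["title"] for item in data.get("organic_results", [])]


print(titles_from_json(SAMPLE_JSON))
```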
    

    Disclaimer: I work for SerpApi.

    【Comments】:
