【Question title】: How can I get the contents of the "feedback" box from Google searches?
【Posted】: 2016-05-23 02:48:03
【Question】:

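When you ask a question in Google Search, or request the definition of a word, Google gives you a summary of the answer in a "feedback" box.

For example, when you search for define apple, you get this result:

To be clear, I don't need the whole page or the other results; I only need this box:

How can I use the Requests and Beautiful Soup modules in Python 3 to get the contents of this "feedback" box?

If that isn't possible, can I use the Google Search API to get the contents of the "feedback" box?

I found a similar question on SO, but the OP didn't specify a language, there are no answers, and I fear the two comments are outdated, since that question was asked almost 9 months ago.

Thank you in advance for your time and help.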

【Comments】:

    Tags: python-3.x beautifulsoup python-requests google-search google-search-api


    【Solution 1】:

    This is easy to do with requests and bs4; you just need to extract the text from the div with the class lr_dct_ent:

    import requests
    from bs4 import BeautifulSoup
    
    h = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
    r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
    soup = BeautifulSoup(r, "lxml")  # name the parser explicitly so bs4 does not guess one
    
    print("\n".join(soup.select_one("div.lr_dct_ent").text.split(";")))
    

    The definitions are in an ordered list, and the part of speech (here, noun) is in a div with the class lr_dct_sf_h:

    In [11]: r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
    In [12]: soup = BeautifulSoup(r,"lxml")    
    In [13]: div = soup.select_one("div.lr_dct_ent")    
    In [14]: n_v = div.select_one("div.lr_dct_sf_h").text   
    In [15]: expl = [li.text for li in div.select("ol.lr_dct_sf_sens li")]    
    In [16]: print(n_v)
    noun
    
    In [17]: print("\n".join(expl))
    1. the round fruit of a tree of the rose family, which typically has thin green or red skin and crisp flesh.used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
    used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
    2. the tree bearing apples, with hard pale timber that is used in carpentry and to smoke food.
    
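Both snippets above assume the box is present; when a search returns no dictionary box, select_one returns None and the .text access raises AttributeError. Below is a minimal sketch of a guard that returns None instead. It is an illustration, not the answerer's method: it uses the standard library's html.parser rather than Beautiful Soup so that it runs without third-party packages, the names DivTextExtractor and extract_box_text are hypothetical, and the sample markup is hand-written rather than real Google output.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first <div> that carries a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # nesting depth of <div> tags inside the target div
        self.found = False   # True once the target div has been seen
        self.done = False    # True once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth:
            self.depth += 1  # a nested div inside the target
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.found = True
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_box_text(page_html, css_class="lr_dct_ent"):
    """Return the box text with whitespace normalized, or None if the box is absent."""
    parser = DivTextExtractor(css_class)
    parser.feed(page_html)
    parser.close()
    if not parser.found:
        return None
    # Text pieces are joined with a space, so text split across tags gains a space.
    return " ".join(" ".join(parser.chunks).split())


# Hand-written sample markup, only loosely modelled on the class names above.
sample_html = (
    '<div class="lr_dct_ent"><div class="lr_dct_sf_h">noun</div>'
    "<ol><li>the round fruit</li></ol></div>"
)
print(extract_box_text(sample_html))
print(extract_box_text("<p>no dictionary box here</p>"))
```

With Beautiful Soup the same guard is a single check that the result of select_one is not None before reading .text.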

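    【Comments】:

    • Thank you very much for your answer. I'm testing it now. I've used up my daily vote limit, but I'll be back in about 2 hours to give you the upvote you deserve :D
    【Solution 2】:

    Good idea for a question.

    The program can be started with python3 defineterm.py apple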

    #! /usr/bin/env python3.5
    # defineterm.py
    
    import requests
    from bs4 import BeautifulSoup
    import sys
    import html
    import codecs
    
    searchterm = ' '.join(sys.argv[1:])
    
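    # let requests URL-encode the query, so multi-word terms and special characters are safe
    res = requests.get('https://www.google.com/search', params={'q': 'define ' + searchterm})
    try:
        res.raise_for_status()
    except requests.HTTPError as exc:
        # stop here: there is no usable page to parse
        sys.exit('error while loading page occurred: ' + str(exc))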
    
    text = html.unescape(res.text)
    soup = BeautifulSoup(text, 'lxml')
    prettytext = soup.prettify()
    
    # the next lines save the raw page for analysis; you can comment them out
    with open('rawpage.txt', 'w', encoding='utf-8') as frawpage:
        frawpage.write(prettytext)
    
    firsttag = soup.find('h3', class_="r")
    if firsttag is not None:
        print(firsttag.get_text())
        print()
    
    # The second tag may change, so check it if the result is not correct. The same may apply to all the tags searched for here.
    secondtag = soup.find('div', {'style': 'color:#666;padding:5px 0'})
    if secondtag is not None:
        print(secondtag.get_text())
        print()
    
    termtags = soup.find_all("li", {"style": "list-style-type:decimal"})
    
    # number the definitions starting from 1
    for count, tag in enumerate(termtags, start=1):
        print(str(count) + '. ' + tag.get_text())
        print()
    
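Search terms that contain spaces or characters such as & need to be percent-encoded before they go into the query string. A minimal sketch of building the search URL with the standard library; the helper name build_define_url is hypothetical, and the base URL is the one used in the script above.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH_URL = "https://www.google.com/search"


def build_define_url(term):
    """Return a search URL for 'define <term>' with the query percent-encoded."""
    query = urlencode({"q": "define " + term.strip()})
    return GOOGLE_SEARCH_URL + "?" + query


print(build_define_url("custard apple"))
# https://www.google.com/search?q=define+custard+apple
print(build_define_url("R&D"))
# https://www.google.com/search?q=define+R%26D
```

Passing the same dictionary to requests.get through its params argument has the same effect, since requests encodes the query itself.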

    Make the script executable.

    Then add this line to ~/.bashrc:

    alias defterm="/data/Scrape/google/defineterm.py "
    

    Set the correct path for your system.

    Then run:

    source ~/.bashrc
    

    The program can then be started with:

    defterm apple (or other term)
    

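    【Comments】:

    • Thank you very much for your answer, and for the defterm script. I'm testing it now. I've used up my daily vote limit, but I'll be back in about 2 hours to give you the upvote you deserve :D Also, I'd like to ask permission to modify the defterm script and put it up on GitHub. Thanks.
    • No problem with modifying the Python script and defterm and publishing them on GitHub.
    【Solution 3】:

    The easiest way is to use SelectorGadget to get the CSS selectors for this text.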

    from bs4 import BeautifulSoup
    import requests, lxml
    
    headers = {
        'User-agent':
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    
    html = requests.get('https://www.google.de/search?q=define apple', headers=headers)
    soup = BeautifulSoup(html.text, 'lxml')
    
    syllables = soup.select_one('.frCXef span').text
    phonetic = soup.select_one('.g30o5d span span').text
    noun = soup.select_one('.h3TRxf span').text
    print(f'{syllables}\n{phonetic}\n{noun}')
    
    # Output:
    '''
    ap·ple
    ˈapəl
    the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
    '''
    

    Alternatively, you can do the same thing with the Google Direct Answer Box API from SerpApi. It is a paid API with a free trial of 5,000 searches.

    Code to integrate:

    from serpapi import GoogleSearch
    
    params = {
      "api_key": "YOUR_API_KEY",
      "engine": "google",
      "q": "define apple",
      "google_domain": "google.com",
    }
    
    search = GoogleSearch(params)
    results = search.get_dict()
    
    syllables = results['answer_box']['syllables']
    phonetic = results['answer_box']['phonetic']
    noun = results['answer_box']['definitions'][0] # array output
    print(f'{syllables}\n{phonetic}\n{noun}')
    
    # Output:
    '''
    ap·ple
    ˈapəl
    the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
    '''
    
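The lookups above raise KeyError when a search has no answer box, or when the box lacks one of the fields. A minimal sketch of a more defensive read, assuming only that the response is a dictionary shaped like the one used above; the helper name summarize_answer_box is hypothetical, and the sample dictionary is hand-written, not a real API response.

```python
def summarize_answer_box(results):
    """Pick the fields used above out of a results dict, tolerating missing keys."""
    box = results.get("answer_box")
    if not box:
        return None  # this search returned no answer box
    definitions = box.get("definitions") or []
    return {
        "syllables": box.get("syllables"),
        "phonetic": box.get("phonetic"),
        "definition": definitions[0] if definitions else None,
    }


# Hand-written sample with the same keys as the code above.
sample_results = {
    "answer_box": {
        "syllables": "ap·ple",
        "phonetic": "ˈapəl",
        "definitions": ["the round fruit of a tree of the rose family"],
    }
}

print(summarize_answer_box(sample_results))
print(summarize_answer_box({}))
```

Returning None for a missing box lets the caller tell "no box for this query" apart from a box with empty fields.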

    Disclaimer: I work for SerpApi.

    【Comments】:
