【Question title】: How to crawl question and answer of Google People Also Ask with Selenium and Python?
【Posted】: 2021-12-31 01:02:23
【Question】:

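I use this code to scrape the questions and answers from Google's People Also Ask box. I want to use it to generate ideas for writers.

But I can't locate the elements reliably. This time I tried a full XPath, but for some keywords the full XPath changes and the lookup fails.

The variable order is the index of the question to scrape, and quality_question is the number of questions I need.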

def get_question(order):
    try:
        question = driver.find_element(By.XPATH, '/html/body/div[7]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/div[' + str(order + 1) + ']/div[2]/div/div[1]/div[2]/span').text
    except:
        question = ''
        print('Xpath question wrong')
    return question

def get_answer(order):
    try:
        answer = driver.find_element(By.XPATH, '/html/body/div[7]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/div[' + str(order + 1) + ']/div[2]').get_attribute('outerHTML')
    except:
        answer = ''
        print('Xpath answer wrong')
    return answer

Then I use a loop to fetch the questions and answers one by one, like this:

if __name__ == '__main__':
    quality_question = 5
    order = 0
    while order <= quality_question:
        question = get_question(order)
        answer = get_answer(order)
        order += 1  # advance to the next question; without this the loop never ends

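Any ideas on how to get the correct question and answer in all cases? When the structure of the results differs, the full XPath changes. Thanks for your support.

This is the Google URL I need to scrape. You can try it with the search query "How to make bakery".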

Link

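【Comments】:

  • Can you share the HTML for /html/body/div[7]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/div[' + str(order + 1) + ']/div[2]/div/div[1]/div[2]/span and /html/body/div[7]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/div[' + str(order + 1) + ']/div[2], or the direct page URL?
  • I wouldn't expect there to be a blank question and a blank answer. It's just that find_element() fails with NoSuchElementException (NSE). What are you trying to do?
  • @cruisepandey Hi! I have updated the post. The Google search query I tried is: How to make bakery
  • @DebanjanB I'm trying this because I need to scrape a lot of queries, so I want it to work with reliable element locators. I need help setting up the element lookup.
  • Direct page URL example: google.com/…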

Tags: python selenium xpath web-crawler google-crawlers


【Solution 1】:

You should first construct an XPath

//span[text()='People also ask']/../following-sibling::div/descendant::div[@data-hveid and @class and @jsname and @data-ved]

just to figure out how many questions are present.
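A minimal sketch of counting the matches for this expression, and of addressing a single match by position, is given here. It is not the answerer's code: `PAA_XPATH` copies the expression above, while `nth_match` and `count_questions` are hypothetical helper names, and `driver` is assumed to be any object that exposes Selenium's `find_elements(by, value)` method. The literal `"xpath"` is the string value of Selenium's `By.XPATH`.

```python
# XPath from the answer: anchored on the visible heading text rather than
# on absolute positions such as /html/body/div[7]/...
PAA_XPATH = (
    "//span[text()='People also ask']/../following-sibling::div"
    "/descendant::div[@data-hveid and @class and @jsname and @data-ved]"
)

XPATH = "xpath"  # same string value as selenium's By.XPATH


def nth_match(xpath: str, n: int) -> str:
    """Return an XPath that selects the n-th (1-based) node matched by `xpath`."""
    if n < 1:
        raise ValueError("XPath positions are 1-based")
    # The parentheses make [n] index the whole result set; without them,
    # [n] would apply to each parent's children separately.
    return f"({xpath})[{n}]"


def count_questions(driver) -> int:
    """Count the elements currently matched by PAA_XPATH."""
    return len(driver.find_elements(XPATH, PAA_XPATH))
```

With a real Selenium driver, `driver.find_element(XPATH, nth_match(PAA_XPATH, 1))` would look up the first match only.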

Once you have done that, the next task is to click each question and then retrieve its answer.
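The answer text may take a moment to appear after the click, so reading it needs some form of waiting. A minimal sketch of polling for the text instead of pausing for a fixed time follows. The names `poll` and `click_and_read` are hypothetical, and `question_element` is assumed to behave like a Selenium `WebElement` (`click()`, `find_elements(by, value)`, `get_attribute(name)`). Selenium's own `WebDriverWait(driver, timeout).until(...)` performs a similar kind of polling; this plain-Python version only illustrates the idea.

```python
import time

XPATH = "xpath"  # same string value as selenium's By.XPATH


def poll(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


def click_and_read(question_element, timeout=10.0, interval=0.5):
    """Click a question and return the innerText found via the answer's relative XPath."""
    question_element.click()

    def answer_text():
        # find_elements returns an empty list instead of raising when
        # nothing matches yet, so the poll simply tries again.
        panels = question_element.find_elements(XPATH, ".//../following-sibling::div")
        return panels[0].get_attribute("innerText") if panels else ""

    return poll(answer_text, timeout, interval)
```

Compared with a fixed `time.sleep`, this returns as soon as the text is available and raises `TimeoutError` if it never shows up.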

Code:

driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)

driver.get("https://www.google.com/search?q=How%20to%20make%20bakery%3F&source=hp&ei=j0aZYYjRAvja2roPrcWcyAU&iflsig=ALs-wAMAAAAAYZlUn4NMUPjfIpQmrXSmjIDnaWjJXWIJ&ved=0ahUKEwjI1JDn0Kf0AhV4rVYBHa0iB1kQ4dUDCAc&uact=5&oq=How%20to%20make%20bakery%3F&gs_lcp=Cgdnd3Mtd2l6EAMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBNQAFgAYJMDaABwAHgAgAF-iAF-kgEDMC4xmAEAoAECoAEB&sclient=gws-wiz")

all_questions = driver.find_elements(By.XPATH, "//span[text()='People also ask']/../following-sibling::div/descendant::div[@data-hveid and @class and @jsname and @data-ved]")
print(len(all_questions))

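j = 1
for question in all_questions:
    time.sleep(1)
    # Re-locate the element by position on each pass instead of reusing `question`.
    ele = driver.find_element(By.XPATH, f"(//span[text()='People also ask']/../following-sibling::div/descendant::div[@data-hveid and @class and @jsname and @data-ved])[{j}]")
    j = j + 2
    ele.click()  # expand the question
    time.sleep(1)  # give the answer time to render
    # Read the expanded text via a relative XPath from the clicked element.
    answer = ele.find_element(By.XPATH, ".//../following-sibling::div").get_attribute('innerText')
    print(answer)
    print('--------------')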

Imports:

import time

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Output:

6
The most profitable bakeries have a gross profit margin of 9%, while the average is much lower at 4%. The growth of profitable bakeries can be as high as 20% year over year. While a large number of bakeries never reach the break-even, a handful of them can even have a net profit margin as high as 12%.06-Jul-2020

How Much Do Bakery Owners Make? | Restaurant Accounting
https://restaurantaccounting.net › how-much-do-bakery-o...
Search for: Do bakeries make money?
--------------
Home Bakery Business - How to Start
Decide on the goods to bake. ...
Plan your kitchen space. ...
Get a permit. ...
Talk to a tax agent. ...
Set appropriate prices. ...
Start baking and selling.
16-Feb-2015

Home Bakery Business - How to Start | BakeCalc
http://www.bakecalc.com › blog › how-to-start-a-home-b...
Search for: How do I start a small baking business from home?
--------------
Follow the below-mentioned steps to open a successful bakery business in India in 2021:
Create A Bakery Business Plan. ...
Choose A Location For Your Bakery Business. ...
Get All Licenses Required To Open A Bakery Business In India. ...
Get Manpower Required To Open A Bakery. ...
Buy Equipment Needed To Start A Bakery Business.
More items...

A Detailed Guide On How To Start A Bakery Business In India
https://www.posist.com › Home › Resources
Search for: How do I start my own bakery?
--------------
Baking is a profitable business. ... And so long as you exercise good business practices and maintain the quality of your products, the bakery is sure to give you a good return. Like all business ventures, however, a bakery business requires that you prepare well for it.11-Nov-2015

6 key ingredients to start a bakery business
https://business.inquirer.net › 6-key-ingredients-to-start-a-...
Search for: Is bakery a good business?
--------------
Whatever your reason, investing in a small bakery can be a benefit for a community and a boon for your wallet. Bakeries are booming and if you can get in on the ground floor of a good one, the opportunity can be very profitable.

Investing in a Small Bakery Business
https://smallbusiness.chron.com › investing-small-bakery-...
Search for: Is a bakery a good investment?
--------------
When respondents were asked what are the top bakery items they produce, cookies rank first at 89 percent, followed by cakes at 79 percent, cupcakes 73 percent, muffins/scones 68 percent, cinnamon rolls 65 percent, and bread 57 percent.06-Sep-2017

The Most Profitable Products at Bakeries - Bake Magazine
https://www.bakemag.com › articles › 5668-the-most-prof...
Search for: What are the most popular bakery items?
--------------

Process finished with exit code 0
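Each block in the output above ends with a line of the form `Search for: <question>`, so the question text can be recovered from the same `innerText` that holds the answer. A minimal sketch of that split follows. The function name `split_panel_text` is hypothetical, and the `Search for: ` prefix is an assumption taken from this particular output; it depends on the page language and may change.

```python
def split_panel_text(text: str, marker: str = "Search for: "):
    """Split one answer's innerText into (question, answer_body).

    The question is taken from the last line that starts with `marker`;
    everything before that line is returned as the answer body.
    """
    lines = text.strip().splitlines()
    # Scan from the end, since the marker line closes each block.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].startswith(marker):
            question = lines[i][len(marker):].strip()
            body = "\n".join(lines[:i]).strip()
            return question, body
    # No marker found: return an empty question and the text unchanged.
    return "", text.strip()


sample = (
    "Baking is a profitable business.\n"
    "\n"
    "6 key ingredients to start a bakery business\n"
    "Search for: Is bakery a good business?"
)
question, body = split_panel_text(sample)
```

Here `question` holds the text after the marker, and `body` holds the answer together with the source title lines.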

【Comments】:
