【问题标题】:How to scrape Google Trends URL?如何抓取谷歌趋势 URL?
【发布时间】:2022-11-02 19:23:29
【问题描述】:
我在 Node JS 的帮助下抓取 Google Trends URL,但每次它都返回一个 429 错误代码,但在邮递员上工作正常,标题与我传入的代码相同。
这是我的代码:
const unirest = require("unirest")
const getData = async() => {
let url = "https://trends.google.com/trends/api/explore?tz=420&req=%7B%22comparisonItem%22%3A%5B%7B%22keyword%22%3A%22audi%22%2C%22geo%22%3A%22%22%2C%22time%22%3A%22today+12-m%22%7D%2C%7B%22keyword%22%3A%22mercedes%22%2C%22geo%22%3A%22%22%2C%22time%22%3A%22today+12-m%22%7D%2C%7B%22keyword%22%3A%22bmw%22%2C%22geo%22%3A%22%22%2C%22time%22%3A%22today+12-m%22%7D%5D%2C%22category%22%3A0%2C%22property%22%3A%22%22%7D"
const response = await unirest
.get(url)
.headers({
"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
})
console.log(response.body)
}
getData();
【问题讨论】:
标签:
node.js
web-scraping
unirest
【解决方案1】:
谷歌是一个很难抓取的平台。除了它的机器人预防系统外,它还经常使用 A/B 测试,这会改变布局并需要对网络爬虫进行额外调整。作为WebScrapingAPI 的工程师,我可以向您推荐我们的Google Trends Scraper。以下是它的工作原理:
const axios=require('axios');
const API_KEY = '<YOUR_API_KEY>'
const QUERY = 'test'
const SCRAPER = `https://api.searchdata.io/v1?engine=google_trends&api_key=${API_KEY}&q=${encodeURI(QUERY)}`
const scrape = async () => {
try {
let response = await axios.get(SCRAPER)
console.log(response.data)
} catch (e) {
console.log(e)
}
}
scrape()
或者,您可以使用 Puppeteer 呈现页面,但您可能会被阻止。这是一个脚本:
const puppeteer = require("puppeteer")
const cheerio=require('cheerio');
const main = async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: null,
acceptInsecureCerts: true,
})
const page = await browser.newPage()
await page.goto('https://trends.google.com/trends/api/explore?tz=420&req=%7B%22comparisonItem%22%3A%5B%7B%22keyword%22%3A%22audi%22%2C%22geo%22%3A%22%22%2C%22time%22%3A%22today+12-m%22%7D%2C%7B%22keyword%22%3A%22mercedes%22%2C%22geo%22%3A%22%22%2C%22time%22%3A%22today+12-m%22%7D%2C%7B%22keyword%22%3A%22bmw%22%2C%22geo%22%3A%22%22%2C%22time%22%3A%22today+12-m%22%7D%5D%2C%22category%22%3A0%2C%22property%22%3A%22%22%7D')
const html = await page.content();
console.log(html);
}
main()