【Posted】: 2022-06-18 18:05:51
【Question】:
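I am trying to use puppeteer to scrape some table tennis betting odds. However, I run into a problem when loading the Setka Cup table tennis matches.
This cup, along with a few other table tennis cups, does not load for me and instead shows a message (roughly translated): Sorry, this page is no longer available. Betting has closed or been suspended.
I have been able to load the odds for some other cups and for other sports (though not in headless mode). I don't think this is a location-based error, because the page loads in my regular Chrome browser and both browsers appear to send the same information (captured with the Network tab in Chrome DevTools).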
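A minimal sketch of how the "same information" comparison could be made less error-prone than comparing by eye. It assumes the request headers from each browser have been copied out of the Network tab into plain objects. The names `normalizeHeaders` and `diffHeaders` and the sample header values are hypothetical, and headers are only one of the things a site can inspect.

```javascript
// Header names are case-insensitive, so lower-case them before comparing.
function normalizeHeaders(headers) {
  const out = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name.toLowerCase()] = String(value).trim();
  }
  return out;
}

// Return one entry per header that is missing, extra, or different
// in the automated browser compared with the manual one.
function diffHeaders(manual, automated) {
  const a = normalizeHeaders(manual);
  const b = normalizeHeaders(automated);
  const names = [...new Set([...Object.keys(a), ...Object.keys(b)])].sort();
  const diff = [];
  for (const name of names) {
    if (!(name in b)) {
      diff.push({ name, kind: 'missing-in-automated', manual: a[name] });
    } else if (!(name in a)) {
      diff.push({ name, kind: 'extra-in-automated', automated: b[name] });
    } else if (a[name] !== b[name]) {
      diff.push({ name, kind: 'changed', manual: a[name], automated: b[name] });
    }
  }
  return diff;
}

// Hypothetical sample data, not captured from the real site.
const manualHeaders = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.115 Safari/537.36',
  'Accept-Language': 'en-GB,en;q=0.9',
  'sec-ch-ua-platform': '"Windows"',
};
const automatedHeaders = {
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
  'accept-language': 'en-GB,en;q=0.9',
};

console.log(diffHeaders(manualHeaders, automatedHeaders));
```

With the sample data this reports two differences: `sec-ch-ua-platform` is missing from the automated request and `user-agent` has a different value.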
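I have found and tried many other tools and tricks, but none of them fixed the problem.
Is there additional scraping/bot prevention specific to this sport or cup? I hope I'm not missing something obvious, as I'm just getting started with all of this. Thanks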
// puppeteer-extra wraps puppeteer so that plugins such as stealth can be attached.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const randomUA = require('modern-random-ua');
// Enable the stealth plugin, minus two of its evasions.
const stealth = StealthPlugin();
stealth.enabledEvasions.delete('chrome.runtime');
stealth.enabledEvasions.delete('iframe.contentWindow');
puppeteer.use(stealth);
const VIEWPORT = {width: 1200, height: 900};
const BET365 = 'https://www.bet365.com/#/AS/B92/';
// Resolve after `time` milliseconds.
function delay(time) {
  return new Promise(function(resolve) {
    setTimeout(resolve, time);
  });
}
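// Build an XPath string literal that stays valid when `str` contains single quotes,
// e.g. O'Brien becomes concat('O', "'", 'Brien', '').
const escapeXpathString = str => {
  const splitedQuotes = str.replace(/'/g, `', "'", '`);
  return `concat('${splitedQuotes}', '')`;
};
// Click the first <span> whose text contains `text`; throw if there is none.
const clickByText = async (page, text) => {
  const escapedText = escapeXpathString(text);
  const linkHandlers = await page.$x(`//span[contains(text(), ${escapedText})]`);
  if (linkHandlers.length > 0) {
    await linkHandlers[0].click();
  } else {
    throw new Error(`Link not found: ${text}`);
  }
};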
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    args: [
      "--disable-infobars",
      "--no-sandbox",
      "--disable-blink-features=AutomationControlled",
    ],
    ignoreDefaultArgs: ["--enable-automation"],
  });
  const page = (await browser.pages())[0];
  // Patch a few navigator properties before any page script runs.
  await page.evaluateOnNewDocument(() => {
    delete navigator.__proto__.webdriver;
    Object.defineProperty(navigator, 'maxTouchPoints', {
      get() {
        return 1;
      },
    });
    navigator.permissions.query = i => ({then: f => f({state: "prompt", onchange: null})});
  });
  // setViewport applies the size; page.viewport() only reads the current one.
  await page.setViewport(VIEWPORT);
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36');
  // await page.setUserAgent(randomUA.generate());
  // Start from an empty cookie jar.
  const client = await page.target().createCDPSession();
  await client.send('Network.clearBrowserCookies');
  await page.goto(BET365, { waitUntil: 'networkidle2' });
  // Fixed pauses use the local delay() helper, since page.waitForTimeout
  // is deprecated and absent from newer Puppeteer releases.
  await delay(5000);
  await clickByText(page, 'Setka Cup');
  await delay(2230);
  await page.screenshot({path: '1.png'});
  console.log("screenshot 1");
  await browser.close();
})();
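The script pauses for a fixed 5000 ms before looking for the "Setka Cup" link, so `clickByText` throws if the list renders later than that. A minimal sketch of a polling alternative follows. It assumes nothing about the site, `waitForCondition` is a hypothetical helper name, and Puppeteer's built-in `page.waitForSelector` serves a similar purpose for CSS selectors.

```javascript
// Resolve with the first truthy result of `check`, polling every `interval` ms,
// or reject once `timeout` ms have elapsed without a truthy result.
async function waitForCondition(check, { timeout = 10000, interval = 250 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeout} ms`);
    }
    await new Promise(resolve => setTimeout(resolve, interval));
  }
}

// How it might be combined with the script above (not executed here):
//   const handles = await waitForCondition(async () => {
//     const xpath = `//span[contains(text(), ${escapeXpathString('Setka Cup')})]`;
//     const found = await page.$x(xpath);
//     return found.length > 0 ? found : null;
//   });
//   await handles[0].click();

// Stand-alone demonstration with a fake check that succeeds on the third poll.
let calls = 0;
const demo = waitForCondition(
  () => (++calls >= 3 ? `ready after ${calls} polls` : null),
  { timeout: 2000, interval: 10 }
);
demo.then(message => console.log(message));
```

The demonstration prints `ready after 3 polls`. A check that never succeeds rejects with an error instead of silently continuing, which makes a missing link easier to tell apart from a slow page.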
【Discussion】:
Tags: javascript node.js puppeteer screen-scraping