Need full rendered text of a given URL in JavaScript React: answers

【Question title】: Need full rendered text of a given URL in JavaScript React
【Posted】: 2021-05-31 08:22:53
【Question description】:

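I have an unusual requirement. I am building a platform where artists create profiles and showcase their work. Most artists already have their earlier records on other platforms such as Songkick, or on their own websites. My client's requirement is that users provide a URL (or something similar) pointing to their past data, and the system goes through that site and extracts content based on certain fields. For example, event data would contain event/location/date.

I am currently using AWS Comprehend to analyze the data. The part I am stuck on is getting the full data/text of the website.

Say I have the URL https://www.something.com. I want to visit this site and get all the rendered text in it. If this is unethical, or if I should do it some other way, please advise.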

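This is what I am trying right now, and it fails:

fetch('https://www.something.com').then((response)=>console.log(response))

But this gives me a "fetch failed" TypeError.

I know the first thing that comes to mind is to use the API of the platform behind the given URL, but most of these sites do not have one.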

【Question discussion】:

Tags: javascript reactjs web-scraping web-crawler

【Solution 1】:

In case anyone else is wondering the same thing: I ended up using Node.js with the puppeteer library and the request library (deprecated).

Puppeteer

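const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://www.kaiakater.com/shows");
  // The original snippet is cut off here; returning innerText is an assumed completion.
  const example = await page.evaluate(() => document.body.innerText);
  console.log(example);
  await browser.close();
})();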
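
The Puppeteer approach above can be wrapped in a reusable function. The following is a minimal sketch, not the answerer's exact code: `getRenderedText` is a hypothetical name, the Puppeteer module is passed in as a parameter so the function can be exercised with a stub, and the `waitUntil: "networkidle2"` option is an assumption about when a page has probably finished rendering.

```javascript
// Sketch: return the rendered text of a page.
// `getRenderedText` and its parameters are hypothetical names for illustration.
async function getRenderedText(puppeteer, url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // "networkidle2": navigation counts as finished once there are no more
    // than 2 network connections for at least 500 ms.
    await page.goto(url, { waitUntil: "networkidle2" });
    // Runs inside the page; innerText approximates the text a user sees.
    return await page.evaluate(() => document.body.innerText);
  } finally {
    // Close the browser even if navigation or evaluation throws.
    await browser.close();
  }
}
```

With Puppeteer installed, it could be called as `getRenderedText(require("puppeteer"), "https://www.kaiakater.com/shows").then(console.log)`.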

request library

const request = require("request"); // note: this package is deprecated

const requestFunction = () => {
  request("https://www.instagram.com/", function (error, response, body) {
    // Print the error if one occurred
    console.error("error:", error);
    // Print the response status code if a response was received
    console.log("statusCode:", response && response.statusCode);
    // Print the HTML of the requested page
    console.log("body:", body);
  });
};
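
A plain HTTP request like the one above returns raw HTML rather than rendered text, and it will not include content that the page builds with client-side JavaScript. Since the request library is deprecated, here is a hedged sketch of the same idea using the `fetch` built into Node.js 18 and later, plus a crude tag stripper. `htmlToText` and `fetchPageText` are hypothetical helper names, and the regex-based stripping is an assumption that only suits simple pages; a real HTML parser would be more robust.

```javascript
// Crude conversion of raw HTML to plain text (sketch only, regex-based).
function htmlToText(html) {
  return html
    // Drop <script> and <style> blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Drop all remaining tags.
    .replace(/<[^>]+>/g, " ")
    // Decode two common entities; "&amp;" goes last to avoid double decoding.
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    // Collapse runs of whitespace.
    .replace(/\s+/g, " ")
    .trim();
}

// Fetch a page and return its text. `fetchImpl` is injectable so the
// function can be tried without network access.
async function fetchPageText(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return htmlToText(await response.text());
}
```

Running this on the server side (Node.js) rather than in the React app also sidesteps the browser's cross-origin restrictions, which are one possible cause of the TypeError described in the question.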

【Discussion】: