I suggest using WebScrapingAPI's extract_rules feature, which returns an array of the elements matched by a CSS selector. For example, I used [data-testid='comment'] as the CSS selector in the following GET request:
https://api.webscrapingapi.com/v1?api_key=<YOUR_API_KEY>&url=https://www.reddit.com/r/obama/comments/xgsxy7/donald_trump_and_barack_obama_are_among_the/&render_js=1&extract_rules={"comments":{"selector":"[data-testid='comment']", "output":"text"}}
I got:
{
"comments":[
"I wonder what's the most number of living ex-presidents there have been at one time?",
"The highest number is six—occurring in four different periods in history. The most recent period was 2017-2018 before GHW Bush died.",
"I don't understand what the first half of your title is doing there, other than to confuse and cause a person to have to read the whole title a couple of times to work out that all the living ex-presidents are invited to QEII's DC memorial service.",
"Agreed, OP is pretty awful at writing headlines.",
"Former disgraced president trump",
"No, he's still disgraced.",
"If the link is behind a paywall, or for an ad-free version:outline.comOr if you want to see the full original page:archive.org or archive.fo or 12ft.ioOr Google cache:https://www.google.com/search?q=site:https://www.townandcountrymag.com/society/politics/a41245384/donald-trump-barack-obama-george-bush-queen-elizabeth-memorial/I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns."
]
}
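
If you would rather build that request from a script than paste the URL by hand, here is a minimal Python sketch. It only uses the parameters shown in the request above (api_key, url, render_js, extract_rules). The helper names build_request_url and fetch_comments are my own, and I am assuming the endpoint accepts standard percent-encoded query parameters, so treat this as a starting point rather than the official client.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the GET request shown above.
API_ENDPOINT = "https://api.webscrapingapi.com/v1"


def build_request_url(api_key, target_url, selector, field="comments"):
    """Build the GET URL (hypothetical helper, not part of any SDK)."""
    # Same rule shape as in the request above: one named field,
    # a CSS selector, and plain-text output.
    extract_rules = {field: {"selector": selector, "output": "text"}}
    params = {
        "api_key": api_key,
        "url": target_url,
        "render_js": 1,
        # The rules travel as a JSON string; urlencode percent-encodes
        # the braces, quotes and brackets so the URL stays valid.
        "extract_rules": json.dumps(extract_rules),
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_comments(api_key, target_url, selector):
    """Send the request and return the list under "comments".

    Assumes the response body is JSON shaped like the output above.
    """
    request_url = build_request_url(api_key, target_url, selector)
    with urllib.request.urlopen(request_url) as response:
        payload = json.load(response)
    return payload.get("comments", [])


if __name__ == "__main__":
    # Prints the encoded URL only; no network call is made here.
    print(build_request_url(
        "<YOUR_API_KEY>",
        "https://www.reddit.com/r/obama/comments/xgsxy7/donald_trump_and_barack_obama_are_among_the/",
        "[data-testid='comment']",
    ))
```

Calling fetch_comments with your real key, the Reddit thread URL and the same selector should give you the list of comment strings, provided the response has the shape shown above.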