Problems in Bs4 and Python: question and answer

【Question title】: Problems in Bs4 and Python
【Posted】: 2018-05-29 18:44:25
【Question】:

import requests
from bs4 import BeautifulSoup

# The User-Agent string is split across adjacent literals to keep it on valid lines.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/66.0.3359.181 Safari/537.36'
}
url = 'https://edition.cnn.com/'
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
# Look for the headline <h3> elements and print each link's text and target.
al = soup.find_all("h3", attrs={'class': 'cd__headline'})
for divv in al:
    for links in divv.find_all('a'):
        print(links.text)
        print(links.get('href'))

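I am trying to extract the headlines from CNN. I am passing the correct HTML element and class to the soup, but the output is still empty, and I get no error or traceback.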

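【Comments】:

You need to do more debugging to pin down the problem. A) Try this on a different website. B) Try this on a static page you have saved to disk. C) Modularize the code so that errors are more apparent and you can test parts of it separately.
@tadman I tried the same code on a different website and it works fine.
Then you need to find out what is unique about edition.cnn.com.
@tadman When I use Selenium with bs4 on edition.cnn.com, the same tags work fine.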
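
The debugging steps suggested above (test on static HTML, modularize) can be sketched roughly as follows. The function name `extract_headlines` and both sample snippets are hypothetical stand-ins, not CNN's actual markup. The point is only that the parsing logic can be checked without the network, and that it returns nothing when the headlines exist only as data inside a script.

```python
from bs4 import BeautifulSoup


def extract_headlines(html):
    """Return (text, href) pairs for links inside h3.cd__headline elements."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for h3 in soup.find_all("h3", attrs={"class": "cd__headline"}):
        for link in h3.find_all("a"):
            results.append((link.get_text(strip=True), link.get("href")))
    return results


# Hypothetical static markup standing in for a page saved to disk.
STATIC_HTML = """
<html><body>
<h3 class="cd__headline"><a href="/2018/05/29/example.html">Example headline</a></h3>
</body></html>
"""

# Hypothetical markup where the headline exists only as data inside a script.
SCRIPT_ONLY_HTML = """
<html><body>
<script>var data = {"headline": "Example headline"};</script>
</body></html>
"""

print(extract_headlines(STATIC_HTML))       # the parsing logic finds the link
print(extract_headlines(SCRIPT_ONLY_HTML))  # nothing to find without rendering
```

If the function works on the static snippet but returns an empty list for the downloaded page, the selector is probably fine and the downloaded HTML is what differs from the browser's view.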

Tags: python-3.x beautifulsoup

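【Solution 1】:

The web page is generated dynamically from JSON held in a script element embedded in the HTML. Because requests does not execute JavaScript, the h3 elements you see in the browser are absent from the HTML it downloads, so find_all returns an empty list. You can either extract the JSON and parse it to get the data you need or, as you said in the comment above, use Selenium to render the JavaScript on the page. To extract the JSON: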

import requests
import json
from bs4 import BeautifulSoup

url = 'https://edition.cnn.com/'
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# Find the script element containing the JSON the web page is dynamically generated from.
anchor = "var CNN = CNN || {};CNN.isWebview = false;CNN.contentModel = "
s = soup.find(lambda tag: tag.name == "script" and anchor in tag.text)
# Extract the JSON: slice from the '{"' before "articleList" to just past the first "}]".
j = s.text[s.text.find("articleList") - 2:s.text.find("}]") + 4]
# Load the JSON.
d = json.loads(j)
# Read each headline from the JSON.
for article in d['articleList']:
    print(article['headline'])

Output:

Here's how the show's cast reacted to the rant
Wanda Sykes quit show before it was cancelled
ABC took a moral stand on Roseanne. Spoiler alert: Trump won't.
<strong>Your questions on the 'Spider-Man' photo, answered</strong>
Trump, without proof, says Mueller team will meddle in 2018 elections
Trump wins by demonizing Mueller
2 police officers, passerby killed in Belgium
MH370 search ends but mystery remains
Israel responds to Gaza fire with airstrikes
French Open: Serena, Sharapova win
Duterte will 'go to war' over South China Sea
Giuliani gets booed on his birthday
<strong>Childhood obesity highest in home of Mediterranean diet</strong>
Top North Korea official heading to US to revive Trump talks
Suspected serial killer ID'd, but cops 'can't arrest him'
Pre-monsoon storms kill 48 in India
Lava 'river' engulfs home in minutes
Mugabe warned: Be at hearing or face jail 
Why supersonic air travel could boom in Asia 
'Unbreakable:' How tennis star Jelena Dokic overcame 'years of abuse' 
This guy survived Vesuvius eruption -- but not for long
Best travel photos of 2018
Online dating 'lowers self-esteem and increases depression'
Who is North Korea's go-to diplomat?
The best cities for swimming
Vatican unveils radical chapels
Why this country has the best libraries
The architect that changed our cities
<strong>Jill Filipovic:</strong> French Spider-Man's act of bravery you don't know about
<strong>Silvia Marchetti:</strong> Italy's chaos is more dangerous than Brexit
<strong>Jesse Williams and Judith Browne Dianis:</strong> Starbucks' incident proves 'Whites Only' spaces still exist 
<strong>Perez and O'Leary Carmona:</strong> How Trump is dehumanizing Latinos
Moment man climbs building to save child
Flash floods ravage US town 
See North Korea's nuclear tunnels go up in smoke
Meghan laughs off Harry's bee encounter
Blue flames burn during Kilauea eruption
Footage of NBA player's arrest released
Why Dubai is hungry for food delivery apps
Paris in spring? Must be Rafa Nadal time 
Fore! Golfers ignore erupting volcano
Take a tour of the Russia World Cup stadiums
Rugby World Cup 2019 Japan venues
Gorgeous Vietnam: Take a photo tour
Breathtaking architecture found underwater
India's problem with rape: Do women feel safe? 
Afghan who risked life for UK: 'They are sending me to get killed'
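
The fixed offsets in the slice above (`-2`, `+4`) depend on the exact layout of the script text. As an alternative sketch, not the answer's method, `json.JSONDecoder.raw_decode` can parse the single JSON value that starts right after an anchor and stop where that value ends. The helper name `extract_content_model`, the shortened anchor and the sample script text are all assumptions made for illustration. It also assumes that the value after the anchor is valid JSON with `articleList` at the top level, which may not hold for the real page.

```python
import json

# Shortened, assumed anchor; the answer above uses a longer one.
ANCHOR = "CNN.contentModel = "


def extract_content_model(script_text, anchor=ANCHOR):
    """Parse the JSON value that immediately follows `anchor` in script_text."""
    start = script_text.find(anchor)
    if start == -1:
        raise ValueError("anchor not found in script text")
    start += len(anchor)
    # raw_decode needs the value to begin exactly at the index, so skip whitespace.
    while start < len(script_text) and script_text[start].isspace():
        start += 1
    # Parses one JSON value starting at `start` and ignores whatever follows it.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


# Hypothetical script text; the real page's structure is an assumption here.
sample = (
    'var CNN = CNN || {};CNN.isWebview = false;CNN.contentModel = '
    '{"articleList": [{"headline": "First"}, {"headline": "Second"}]};'
    'CNN.other = 1;'
)

model = extract_content_model(sample)
headlines = [article["headline"] for article in model["articleList"]]
print(headlines)
```

With the real page, `script_text` would be the `s.text` value from the answer's code. The trailing JavaScript after the object does not need to be trimmed by hand.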

【Comments】: