You probably need to display the text from the next span element. This can be done as follows:
import requests
from bs4 import BeautifulSoup


def beautiful_soup(url):
    '''Send a GET request to url and return the page parsed as a BeautifulSoup object.'''
    request = requests.get(url)
    soup = BeautifulSoup(request.text, "lxml")
    # print(soup.prettify())  # uncomment to inspect the parsed HTML
    return soup


soup = beautiful_soup('https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en')

# For each matching link, print the text of the first <span> that follows it.
for headlines in soup.find_all('a', {'class': 'VDXfz'}):
    print(headlines.find_next('span').text)
This gives you output that starts something like:
I Take Back My Comment, Says Ram Madhav After Omar Abdullah’s Dare to Prove Pakistan Charge
Ram Madhav Backpedals On "Instruction From Pak" After Omar Abdullah Dare
National Conference backed PDP to save J&K from uncertainty: Omar Abdullah
On Ram Madhav ‘instruction from Pak’ barb, Omar Abdullah’s stinging reply
Make public reports of horse-trading in govt formation in J-K: Omar Abdullah to Guv
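To see what find_next('span') is doing without hitting the live site, here is a minimal sketch run against a small inline snippet. The HTML below is a made-up stand-in for the page structure (only the class name VDXfz comes from the code above), and it uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an empty <a> link followed by a <span> holding the headline.
html = """
<article>
  <a class="VDXfz" href="./articles/1"></a>
  <h3><span>First headline</span></h3>
</article>
<article>
  <a class="VDXfz" href="./articles/2"></a>
  <h3><span>Second headline</span></h3>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# find_next('span') returns the first <span> that appears after each link
# in document order, even though it is not a child of the link itself.
headlines = [
    link.find_next("span").text
    for link in soup.find_all("a", {"class": "VDXfz"})
]
print(headlines)
```

Note that find_next returns None when no span follows the link, so on a real page you may want to check the result before reading .text.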
You can write the headlines to a CSV file using the following approach:
import requests
from bs4 import BeautifulSoup
import csv


def beautiful_soup(url):
    '''Send a GET request to url and return the page parsed as a BeautifulSoup object.'''
    request = requests.get(url)
    soup = BeautifulSoup(request.text, "lxml")
    return soup


soup = beautiful_soup('https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en')

# newline='' lets the csv module control line endings itself.
with open('output.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Headline'])  # header row

    for headlines in soup.find_all('a', {'class': 'VDXfz'}):
        headline = headlines.find_next('span').text
        print(headline)
        csv_output.writerow([headline])
Currently this produces only a single column, named Headline.
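If you want more columns, each call to writerow just needs a longer list. The sketch below shows the idea with a second, hypothetical Link column. The rows are made-up sample data, and io.StringIO stands in for the output file so nothing is written to disk. In the scraping loop above, the link could plausibly come from headlines.get('href'), but that is an assumption about the page, not something verified here.

```python
import csv
import io

# Hypothetical rows: (headline text, link) pairs collected during scraping.
rows = [
    ("First headline", "./articles/1"),
    ("Second headline", "./articles/2"),
]

# io.StringIO stands in for the output file so the sketch has no side effects.
buffer = io.StringIO()
csv_output = csv.writer(buffer)
csv_output.writerow(["Headline", "Link"])  # header row with two columns
csv_output.writerows(rows)                 # one CSV line per (headline, link) pair

csv_text = buffer.getvalue()
print(csv_text)
```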