[Posted]: 2021-10-26 22:18:41
[Question]:
We are scraping Billboard's Hot 100 chart at https://www.billboard.com/charts/hot-100/2021-10-30 and have some decent code, but we are struggling to finish it:
from bs4 import BeautifulSoup
import requests
import pandas as pd

def CleanBullet(bullet):
    # Read every field from the <li> element passed in as `bullet`.
    # Note: str.strip(chars) removes any of the given characters from both ends,
    # not a literal prefix or suffix, and it never touches characters in the middle.
    this_rank = bullet.find("span", class_="chart-element__rank").get_text().strip('\n').strip('\n').strip('Rising')
    this_song = bullet.find("span", class_="chart-element__information__song").get_text().strip('\n')
    this_artist = bullet.find("span", class_="chart-element__information__artist").get_text().strip('\n')
    this_last_week = bullet.find("span", class_="text--last").get_text().strip(' Last Week')
    this_peak = bullet.find("span", class_="text--peak").get_text().strip(' Peak Rank')
    this_weeks_on = bullet.find("span", class_="text--week").get_text().strip(' Weeks on Chart')
    data = {
        'rank': this_rank,
        'song': this_song,
        'artist': this_artist,
        'last_week': this_last_week,
        'peak': this_peak,
        'weeks_on': this_weeks_on
    }
    # Build a one-row frame directly; DataFrame.append was removed in pandas 2.0.
    return pd.DataFrame([data])

base_url = "https://www.billboard.com/charts/hot-100/2021-10-30"
response = requests.get(base_url)
web_page = response.text
soup = BeautifulSoup(web_page, "html.parser")
# One <li> per chart entry.
full_table = soup.find("ol", class_="chart-list__elements").find_all("li")
df1 = CleanBullet(full_table[0])
df1
How can we:
- Apply this function to each of the 100 elements in full_table, producing a DataFrame with 100 rows?
- Remove the \n from the rank column, since strip('\n') does not seem to work?
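For the first point, one possible approach is to have the per-entry function return a plain dict and build the DataFrame once from the list of dicts. The sketch below runs against a small inline HTML sample rather than the live page: SAMPLE_HTML, clean_text and parse_bullet are hypothetical names, the song and artist values are placeholders, and the class names are copied from the question on the assumption that the real markup matches them.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical two-entry sample that imitates the class names used in the
# question; the live Billboard markup may differ.
SAMPLE_HTML = """
<ol class="chart-list__elements">
  <li>
    <span class="chart-element__rank">
      1
    </span>
    <span class="chart-element__information__song">Song A</span>
    <span class="chart-element__information__artist">Artist A</span>
  </li>
  <li>
    <span class="chart-element__rank">
      2
    </span>
    <span class="chart-element__information__song">Song B</span>
    <span class="chart-element__information__artist">Artist B</span>
  </li>
</ol>
"""


def clean_text(tag):
    """Return the tag's text with every run of whitespace collapsed to one space."""
    return " ".join(tag.get_text().split())


def parse_bullet(bullet):
    """Turn one <li> chart entry into a dict holding one row of data."""
    return {
        "rank": clean_text(bullet.find("span", class_="chart-element__rank")),
        "song": clean_text(bullet.find("span", class_="chart-element__information__song")),
        "artist": clean_text(bullet.find("span", class_="chart-element__information__artist")),
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = soup.find("ol", class_="chart-list__elements").find_all("li")

# One dict per <li>, then a single DataFrame built from the whole list.
chart_df = pd.DataFrame([parse_bullet(li) for li in items])
print(chart_df)
```

Building the frame once from a list of dicts avoids creating and merging 100 one-row frames. If the function has to keep returning a DataFrame, pd.concat over a list of those frames with ignore_index=True should give the same result.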
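[Comments]:
- strip() with no arguments removes all leading and trailing whitespace (including newlines).
- strip() did not remove the \n newline in the example above the way I needed it to.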
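A likely explanation for the leftover newline, sketched with plain strings: strip() only trims the two ends, and strip('Rising') removes a set of characters rather than a literal word. The value of raw_rank below is an assumption about what get_text() returns for the rank span (a number, an inner newline, then a label); the real text may look different.

```python
import re

# Assumed shape of the rank text: the number, an inner newline, then a label.
raw_rank = "\n1\nRising\n"

# strip() only trims the two ends, and strip("Rising") treats its argument as a
# set of characters, so the inner newline becomes the new end and is left behind.
chained = raw_rank.strip("\n").strip("\n").strip("Rising")
print(repr(chained))  # '1\n'

# A character set can also remove more than intended.
print(repr("sing".strip("Rising")))  # ''

# Option 1: split on any whitespace and keep the first token.
first_token = raw_rank.split()[0]
print(repr(first_token))  # '1'

# Option 2: pull out the first run of digits.
match = re.search(r"\d+", raw_rank)
digits = match.group() if match else None
print(repr(digits))  # '1'
```

Either option could replace the chained strip calls for the rank field. The regular expression is the more defensive of the two if the label can appear before the number.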
Tags: python pandas beautifulsoup