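【Posted】: 2020-06-14 15:16:23
【Question】:
I've written a script that parses singers, their corresponding links, actors, and their corresponding links from different containers on a webpage. The script itself runs fine. What I can't do is write the results to a csv file accordingly.
Here is what I tried: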
import csv
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base_url = 'https://www.hindigeetmala.net'
link = 'https://www.hindigeetmala.net/movie/2_states.htm'

res = requests.get(link)
soup = BeautifulSoup(res.text,"lxml")

with open("hindigeetmala.csv","w",newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['singer_records','actor_records'])
    for item in soup.select("tr[itemprop='track']"):
        try:
            singers = [i.get_text(strip=True) for i in item.select("span[itemprop='byArtist']") if i.get_text(strip=True)]
        except Exception: singers = ""
        try:
            singer_links = [urljoin(base_url,i.get("href")) for i in item.select("a:has(> span[itemprop='byArtist'])") if i.get("href")]
        except Exception: singer_links = ""
        singer_records = [i for i in zip(singers,singer_links)]
        try:
            actors = [i.get_text(strip=True) for i in item.select("a[href^='/actor/']") if i.get("href")]
        except Exception: actors = ""
        try:
            actor_links = [urljoin(base_url,i.get("href")) for i in item.select("a[href^='/actor/']") if i.get("href")]
        except Exception: actor_links = ""
        actor_records = [i for i in zip(actors,actor_links)]
        song_name = item.select_one("span[itemprop='name']").get_text(strip=True)
        writer.writerow([singer_records,actor_records,song_name])
        print(singer_records,actor_records,song_name)
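If I run the script as is, I get this output.
When I try writer.writerow([*singer_records,*actor_records,song_name]) instead, I get this type of output. Only the first pair of tuples is written.
This is the output I expect.
How can I write the names and their links to the csv file as shown in the third image?
PS: For brevity, all of the output images show only the first column of the csv file.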
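To make the difference between the two writerow calls concrete, here is a minimal sketch that runs without the network. The sample names, the example.com URLs, and the helper row_to_csv are hypothetical stand-ins, not data from the site. It shows that csv.writer stringifies each tuple into a single cell, and that flattening the tuples first gives every name and link its own cell. Whether that flat layout matches the expected output in the third image is an assumption.

```python
import csv
import io
from itertools import chain

# Hypothetical sample data standing in for one scraped track row.
singer_records = [("Singer A", "https://example.com/singer/a"),
                  ("Singer B", "https://example.com/singer/b")]
actor_records = [("Actor C", "https://example.com/actor/c")]
song_name = "Some Song"


def row_to_csv(row):
    """Write a single row to an in-memory CSV and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


# Unpacking the lists still leaves whole tuples as cells; csv.writer calls
# str() on each tuple, so one cell holds "('name', 'link')".
unpacked = row_to_csv([*singer_records, *actor_records, song_name])

# Flattening the tuples first puts every name and every link in its own cell.
flat_row = list(chain.from_iterable(singer_records + actor_records))
flattened = row_to_csv(flat_row + [song_name])

print(unpacked)
print(flattened)
```

If each name should instead share a cell with its link, joining each pair into one string before writing (for example with " | ".join(pair)) would be another option under the same assumptions.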
【Discussion】:
Tags: python python-3.x web-scraping