【Question title】: Unable to write results to a CSV file in a customized manner
【Posted】: 2020-06-14 15:16:23
【Question】:

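I've created a script that parses singers and their corresponding links, and actors and their corresponding links, from different containers on a webpage. The script works fine. What I can't do is write the results to a CSV file the way I want.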

Webpage link

Here is what I've tried:

import csv
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base_url = 'https://www.hindigeetmala.net'
link = 'https://www.hindigeetmala.net/movie/2_states.htm'

res = requests.get(link)
soup = BeautifulSoup(res.text,"lxml")

with open("hindigeetmala.csv","w",newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['singer_records','actor_records','song_name'])

    for item in soup.select("tr[itemprop='track']"):
        try:
            singers = [i.get_text(strip=True) for i in item.select("span[itemprop='byArtist']") if i.get_text(strip=True)]
        except Exception: singers = ""

        try:
            singer_links = [urljoin(base_url,i.get("href")) for i in item.select("a:has(> span[itemprop='byArtist'])") if i.get("href")]
        except Exception: singer_links = ""
        singer_records = [i for i in zip(singers,singer_links)]

        try:
            actors = [i.get_text(strip=True) for i in item.select("a[href^='/actor/']") if i.get("href")]
        except Exception: actors = ""
        try:
            actor_links = [urljoin(base_url,i.get("href")) for i in item.select("a[href^='/actor/']") if i.get("href")]
        except Exception: actor_links = ""
        actor_records = [i for i in zip(actors,actor_links)]
        song_name = item.select_one("span[itemprop='name']").get_text(strip=True)
        writer.writerow([singer_records,actor_records,song_name])
        print(singer_records,actor_records,song_name)

If I run the script as is, this is the output I get.
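A likely reason the first attempt looks that way: `csv.writer` converts any non-string field with `str()`, so a whole list of `(name, link)` tuples lands in a single cell as its Python repr, brackets and quotes included. A minimal offline sketch, using made-up placeholder names and URLs rather than scraped values:

```python
import csv
import io

# Placeholder records shaped like singer_records / actor_records in the question.
singer_records = [("Singer A", "https://example.com/singer/a.php")]
actor_records = []

buf = io.StringIO()
writer = csv.writer(buf)
# Each list is passed as one field, so csv.writer stores str(list) in one cell.
writer.writerow([singer_records, actor_records, "Song"])

# Read the row back to inspect exactly what ended up in each cell.
row = next(csv.reader(io.StringIO(buf.getvalue())))
print(row)
```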

When I try writer.writerow([*singer_records,*actor_records,song_name]), I get this kind of output instead: only the first tuple pair is written.
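Unpacking with `*` only flattens the outer list: every tuple becomes its own field, so a track with three singers gets three singer columns and the first column holds only the first `(name, link)` pair, which is consistent with what is described above. Another offline sketch with placeholder data (names and URLs are illustrative only):

```python
import csv
import io

# Placeholder data: three singer pairs and one actor pair for a single track.
singer_records = [
    ("A", "https://example.com/a"),
    ("B", "https://example.com/b"),
    ("C", "https://example.com/c"),
]
actor_records = [("X", "https://example.com/x")]

buf = io.StringIO()
writer = csv.writer(buf)
# * spreads the tuples across columns; each tuple is stringified on its own.
writer.writerow([*singer_records, *actor_records, "Song"])

row = next(csv.reader(io.StringIO(buf.getvalue())))
print(row)
```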

This is the output I expect.

How can I write the names and their links to the CSV file as shown in the third image?

PS: For brevity, all of the output images show only the first column of the CSV file.
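One way to get all of a track's singers (or actors) into a single cell is to build one string per list before calling `writerow`. The third image is not reproduced here, so the `name, link` layout and the newline separator below are assumptions, and `format_records` is a hypothetical helper; adjust both to the layout you actually want:

```python
import csv
import io


def format_records(records, sep="\n"):
    """Hypothetical helper: join (name, link) tuples into one string.

    The "name, link" layout and the newline separator are assumptions
    about the desired output, not something taken from the question.
    """
    return sep.join(f"{name}, {link}" for name, link in records)


# Placeholder data standing in for scraped (name, link) pairs.
singer_records = [("A", "https://example.com/a"), ("B", "https://example.com/b")]
actor_records = [("X", "https://example.com/x")]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["singer_records", "actor_records", "song_name"])
# One string per list means one cell per list; csv.writer quotes fields
# containing commas or newlines, so each cell survives a round trip.
writer.writerow([format_records(singer_records), format_records(actor_records), "Song"])

rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows)
```

Inside the real loop, the same call would replace the original `writerow` line, passing `singer_records`, `actor_records` and `song_name` for each track.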

【Question discussion】:

    Tags: python python-3.x web-scraping


    【Solution 1】:

    Based on SIM's feedback, I think this is what you're looking for (I just added a function to format your lists):

    import csv
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin
    
    base_url = 'https://www.hindigeetmala.net'
    link = 'https://www.hindigeetmala.net/movie/2_states.htm'
    
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    
    
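    def merge_results(inpt):
        # Flatten [(name, link), ...] into one string such as "'name','link',...",
        # wrapped in a list so it can be concatenated into a CSV row as one cell.
        return [','.join("'" + tuple_item + "'" for item in inpt for tuple_item in item)]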
    
    
    with open("hindigeetmala.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(['singer_records', 'actor_records', 'song_name'])
    
        for item in soup.select("tr[itemprop='track']"):
            try:
                singers = [i.get_text(strip=True) for i in item.select(
                    "span[itemprop='byArtist']") if i.get_text(strip=True)]
            except Exception:
                singers = ""
    
            try:
                singer_links = [urljoin(base_url, i.get("href")) for i in item.select(
                    "a:has(> span[itemprop='byArtist'])") if i.get("href")]
            except Exception:
                singer_links = ""
            singer_records = [i for i in zip(singers, singer_links)]
    
            try:
                actors = [i.get_text(strip=True) for i in item.select(
                    "a[href^='/actor/']") if i.get("href")]
            except Exception:
                actors = ""
            try:
                actor_links = [urljoin(base_url, i.get("href")) for i in item.select(
                    "a[href^='/actor/']") if i.get("href")]
            except Exception:
                actor_links = ""
            actor_records = [i for i in zip(actors, actor_links)]
            song_name = item.select_one(
                "span[itemprop='name']").get_text(strip=True)
            writer.writerow(merge_results(singer_records) +
                            merge_results(actor_records)+[song_name])
            print(singer_records, actor_records, song_name)
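To see what `merge_results` actually produces, it can be run on placeholder data. The copy below is written to be equivalent to the answer's helper (flattened into a single generator expression); the sample records are made up. It returns a one-element list holding one string, so each list of records occupies exactly one cell, with every name and link wrapped in single quotes and separated by commas:

```python
def merge_results(inpt):
    # Equivalent to the answer's helper: wrap every tuple element in single
    # quotes, join everything with commas, and return it as a one-item list.
    return [','.join("'" + tuple_item + "'" for item in inpt for tuple_item in item)]


# Placeholder data standing in for scraped (name, link) pairs.
singer_records = [("A", "https://example.com/a"), ("B", "https://example.com/b")]
actor_records = []

# Built the same way as the answer's writerow argument: three cells per row.
row = merge_results(singer_records) + merge_results(actor_records) + ["Song"]
print(row)
```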
    

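    【Discussion】:

    • song_name doesn't appear to be a list.
    • I think the OP is trying to write all of the singer_records information into a single cell (and likewise for the other fields), but the way you suggest spreads it across columns.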