How to get web-scraped data into a tabular format using CSV or text files (without pandas)

【Question title】: How to get web-scraped data into tabular format by using csv files or text files (without using pandas)
【Posted】: 2020-12-29 04:53:29
【Question】:

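I have scraped data from a website. I need to write it to a csv file, then read that file back and display it.

Please do not use pandas to build a DataFrame and then convert that to a csv file.

I want a way to write the scraped data directly to a csv file, and then read the data from the csv file and display it in Python IDLE.

Here is the code: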

import requests
from bs4 import BeautifulSoup

start_url="https://www.indeed.co.in/jobs?q=teacher&l=India"
page_data=requests.get(start_url) #sending a http request to the site
soup=BeautifulSoup(page_data.content,"html.parser") #getting that requested data to store in an object

#lists in which the data is going to be appended
Title=[]
Company=[]
Summary=[]
Location=[]
Link_to_apply=[]
  

for job_tag in soup.find_all("div",class_="jobsearch-SerpJobCard unifiedRow row result"):  

    title=job_tag.find("h2",class_="title")
    company=job_tag.find("span",class_="company")
    location=job_tag.find(class_="location accessible-contrast-color-location").text.strip()
    summary=job_tag.find("div",class_="summary")
    link=job_tag.find("a",href=True)
    base_url="https://www.indeed.com"
    final_link=base_url+link["href"]

    Title.append(title.text.replace('\n'," ").strip())   # .text returns the tag's text content without the markup
    Company.append(company.text.replace('\n'," ").strip())  # replace() swaps each newline for a single space
    Summary.append(summary.text.replace('\n'," ").strip())  # strip() removes leading and trailing whitespace
    Location.append(location.replace('\n'," "))
    Link_to_apply.append(final_link)

Please note that only Python IDLE may be used.

【Comments】:

Tags: python-3.x list csv web-scraping beautifulsoup

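【Solution 1】:

You have asked two questions at once. The approach below should answer both: the first part writes the scraped data to a csv file, and the last part reads the data back from the newly created csv file. The csv file is written to the current working directory, so it is easiest to put the script in its own folder and run it from there; the csv file will then appear in that same folder.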

import csv
import requests
from bs4 import BeautifulSoup

base_url = "https://www.indeed.com"
start_url = "https://www.indeed.co.in/jobs?q=teacher&l=India"

page_data = requests.get(start_url)
soup = BeautifulSoup(page_data.content,"html.parser")

with open("output.csv","w",newline="",encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(['title','company','location','summary','final_link'])
    for job_tag in soup.find_all("div",class_="jobsearch-SerpJobCard"):  
        title = job_tag.find("h2",class_="title").get_text(strip=True)
        company = job_tag.find("span",class_="company").get_text(strip=True)
        location = job_tag.find(class_="location").get_text(strip=True)
        summary = job_tag.find("div",class_="summary").get_text(strip=True)
        link = job_tag.find("a",href=True)
        final_link = base_url + link["href"]
        writer.writerow([title,company,location,summary,final_link])

with open("output.csv","r",encoding="utf-8-sig") as r:
    reader = csv.DictReader(r)
    for item in reader:
        print(item['title'],item['company'])
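
The script above opens the file with encoding="utf-8-sig". As a minimal sketch of why that matters: on Windows the default file encoding is often cp1252, which has no mapping for the rupee sign (U+20B9), whereas utf-8-sig can encode any Unicode character and writes a byte order mark at the start. The sample string below is made up for illustration and is not taken from the scraped page.

```python
# Hypothetical sample text containing the rupee sign (U+20B9).
text = "\u20b9 15,000 a month"

# cp1252 has no mapping for U+20B9, so encoding raises UnicodeEncodeError.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# utf-8-sig encodes the text as UTF-8 and prepends a byte order mark (BOM).
encoded = text.encode("utf-8-sig")

print("cp1252 can encode it:", cp1252_ok)
print("utf-8-sig output starts with a BOM:", encoded.startswith(b"\xef\xbb\xbf"))
```

Reading the file back with the same encoding, as the script does, strips the BOM again, so the first column header stays exactly 'title'.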

【Discussion】:

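It gives an error, posted below:
Traceback (most recent call last):
  File "C:/Users/sajim/Pictures/New folder/stack overflow try.py", line 21, in <module>
    writer.writerow([title,company,location,summary,final_link])
  File "C:\Users\sajim\Desktop\python official\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 122: character maps to <undefined>
That is because of the encoding. I have modified the script above. Try it now.
I no longer get the error, but is there a way to display the data in table columns or any other tabular format? I ask because the output is a bit messy. Do you know of any such way?
Isn't this a tabular format? That is what the script should produce, unless you modify it manually.
I know it is in tabular format in the csv file. What I mean is: is there a way to display the table in IDLE without it being truncated?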
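
The last comment asks for an aligned table in IDLE that does not cut off long values. One possible approach, sketched here with the standard library only, is to wrap each cell to a fixed width with textwrap and pad the columns with str.ljust. The function name format_table, the max_width parameter, and the sample rows are hypothetical and not part of the original answer. The sketch assumes every row has the same number of columns and that the shell uses a monospaced font.

```python
import textwrap
from itertools import zip_longest

def format_table(rows, max_width=30):
    """Return rows (a list of lists of str) as aligned, wrapped text columns."""
    # Wrap every cell to at most max_width characters; each cell becomes a
    # list of lines. An empty cell becomes a single empty line.
    wrapped = [[textwrap.wrap(cell, max_width) or [""] for cell in row]
               for row in rows]
    # Width of a column = its longest wrapped line.
    n_cols = len(rows[0])
    widths = [max(len(line) for row in wrapped for line in row[i])
              for i in range(n_cols)]
    separator = "-+-".join("-" * w for w in widths)
    out = []
    for row in wrapped:
        # zip_longest pads shorter cells with blanks so columns stay aligned.
        for parts in zip_longest(*row, fillvalue=""):
            line = " | ".join(p.ljust(w) for p, w in zip(parts, widths))
            out.append(line.rstrip())
        out.append(separator)
    return "\n".join(out)

# Hypothetical sample rows. With the file written by the answer, the rows
# could instead be loaded like this:
#   import csv
#   with open("output.csv", newline="", encoding="utf-8-sig") as f:
#       rows = list(csv.reader(f))
rows = [
    ["title", "company"],
    ["Primary School Teacher", "Example School"],
]
print(format_table(rows, max_width=12))
```

Long values such as the summary or the link are continued on the next line inside their own column instead of being cut, and a smaller max_width keeps the whole table within the width of the IDLE window.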