Saving a BeautifulSoup web scrape to JSON

【Question title】: Saving a BeautifulSoup web scrape to JSON
【Posted】: 2020-02-24 19:05:39
【Question description】:

Python noob here. I have managed to scrape a list of companies from Wikipedia. How do I save the output as a JSON file?

import requests
from bs4 import BeautifulSoup
import JSON

url = "https://en.wikipedia.org/wiki/List_of_companies_traded_on_the_JSE"
responce = requests.get(url)
soup = BeautifulSoup(responce.text, 'html.parser')
tables = soup.findAll('table', {'class':"wikitable sortable"})

for table in soup.find_all('table', {'class':"wikitable sortable"}):
    print(table.text)

【Comments】:

You should use import json rather than import JSON. Python module names are case-sensitive.

Tags: python json python-3.x beautifulsoup

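【Solution 1】:

In table you have retrieved all the objects you are interested in. You could make the selection more granular so that you pull only the names out of the HTML. Then do: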

# A bs4 Tag is not JSON-serializable, so dump its text instead
with open('data.json', 'w') as outfile:
    json.dump(table.get_text(), outfile)
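
To illustrate the "more granular" selection mentioned above, here is a rough sketch that keeps only the first cell of each data row and dumps that list to JSON. SAMPLE_HTML is a made-up stand-in for the downloaded page, and the assumption that the company name sits in the first column is mine; the real Wikipedia table may be laid out differently.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for responce.text; the real page layout may differ.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Company</th><th>Symbol</th></tr>
  <tr><td><a href="/wiki/Alpha">Alpha Ltd</a></td><td>ALP</td></tr>
  <tr><td><a href="/wiki/Beta">Beta Holdings</a></td><td>BET</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

names = []
for table in soup.find_all("table", {"class": "wikitable sortable"}):
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Header rows contain only <th> cells, so they yield no <td> and are skipped.
        if cells:
            # Assumption: the company name is in the first column.
            names.append(cells[0].get_text(strip=True))

# A plain list of strings is JSON-serializable, unlike a bs4 Tag.
with open("data.json", "w", encoding="utf-8") as outfile:
    json.dump(names, outfile, ensure_ascii=False, indent=2)
```

With the real page you would build soup from responce.text instead of SAMPLE_HTML and check which column actually holds the name.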

【Discussion】:

【Solution 2】:

Use this:

import requests
from bs4 import BeautifulSoup
import json

url = "https://en.wikipedia.org/wiki/List_of_companies_traded_on_the_JSE"
responce = requests.get(url)
soup = BeautifulSoup(responce.text, 'html.parser')
# find_all is the current name for the legacy findAll alias
table = soup.find_all('table', {'class':"wikitable sortable"})
# Tag objects are not JSON-serializable, so keep only each table's text
tables = [x.text for x in table]
json_text = json.dumps(tables)

with open('companies.json', 'w') as json_file:
    json_file.write(json_text)

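That should do the trick. I'm not sure what you plan to do with it, though, since the result is just a list containing all the text from each table.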
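
If a flat list of table text is not very useful, one possible alternative is to turn each row into a dict keyed by the table's header cells. This is only a sketch: SAMPLE_HTML and the helper table_to_records are made up for illustration, and it assumes the first row of each table holds the column headers.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for responce.text; the real page layout may differ.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Company</th><th>Symbol</th></tr>
  <tr><td>Alpha Ltd</td><td>ALP</td></tr>
  <tr><td>Beta Holdings</td><td>BET</td></tr>
</table>
"""


def table_to_records(table):
    """Turn one <table> into a list of dicts keyed by its header cells."""
    rows = table.find_all("tr")
    if not rows:
        return []
    # Assumption: the first row carries the column headers as <th> cells.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        values = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip rows whose cell count does not line up with the headers.
        if len(values) == len(headers):
            records.append(dict(zip(headers, values)))
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

companies = []
for table in soup.find_all("table", {"class": "wikitable sortable"}):
    companies.extend(table_to_records(table))

with open("companies.json", "w", encoding="utf-8") as json_file:
    json.dump(companies, json_file, ensure_ascii=False, indent=2)
```

Each entry in companies.json is then one company, which is usually easier to work with downstream than one long string per table.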

【Discussion】:

Thank you very much! What I actually want to do is save the data in a CSV file with a header row. How would I do that? Thanks again.

filename = 'jse.csv'
with open(filename, 'wb') as f:
    w = csv.DictWriter(f, ['company', 'Symbol', 'url', 'lines '])
    w.writeheader()
    for table in tables:
        w.writerow(table)

Something like this?
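
For the CSV follow-up, a possible approach is sketched here. Two points to keep in mind: in Python 3 the csv module expects a file opened in text mode with newline='' (not 'wb'), and DictWriter.writerow needs a dict per row, not a string of table text. SAMPLE_HTML, the column order, and the field names are assumptions for illustration only.

```python
import csv
from bs4 import BeautifulSoup

# Hypothetical stand-in for responce.text; the real page layout may differ.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Company</th><th>Symbol</th></tr>
  <tr><td><a href="/wiki/Alpha">Alpha Ltd</a></td><td>ALP</td></tr>
  <tr><td>Beta Holdings</td><td>BET</td></tr>
</table>
"""

# Assumed column names for the CSV header row.
FIELDNAMES = ["company", "symbol", "url"]

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows_out = []
for table in soup.find_all("table", {"class": "wikitable sortable"}):
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # header row or malformed row
        link = cells[0].find("a")
        rows_out.append({
            "company": cells[0].get_text(strip=True),
            "symbol": cells[1].get_text(strip=True),
            # Not every company cell is guaranteed to contain a link.
            "url": link.get("href", "") if link else "",
        })

# Text mode with newline="" is what the csv module expects in Python 3.
with open("jse.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=FIELDNAMES)
    w.writeheader()
    w.writerows(rows_out)
```

The keys of every dict passed to the writer have to match the field names exactly, so a stray space in a name such as 'lines ' would need the same space in the row dicts.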