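【Posted】: 2019-06-18 07:13:25
【Problem description】:
I'm trying to automate some of my work. The website in question is training.gov.au, which nests tables under specific pages, for example https://training.gov.au/Training/Details/BSBWHS402. What I really want is to be able to specify which module I want (BSBWHS402 in this case), iterate over specific tables nested on that page, and then rework those tables into a .csv or, ideally, a pre-formatted document.
I have been able to get what I need out of the code by butchering other people's work, but I can't make the result look like the table on the website. I tried pasting it into a .csv and using delimiters, but that didn't work and obviously isn't really automated.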
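import requests
from io import StringIO
from bs4 import BeautifulSoup
import pandas as pd

# Download the unit page and parse the HTML.
html = requests.get('https://training.gov.au/Training/Details/BSBWHS402').text
soup = BeautifulSoup(html, 'lxml')
# Collect every <table> element on the page.
tables = soup.find_all('table')
# soup.find('Elements and Performance Criteria') searched for a tag with that
# name, so it returned None; the unused lookup has been dropped.
# Parse all tables into a list of DataFrames (StringIO avoids the
# literal-HTML deprecation warning in recent pandas versions).
dfs = pd.read_html(StringIO(str(tables)))
# Index 8 held the "Elements and Performance Criteria" table when I ran this.
results = dfs[8].to_json(orient='records')
print(results)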
I get the following single line:
[{"0":"ELEMENT","1":"PERFORMANCE CRITERIA"},{"0":"Elements describe the essential outcomes.","1":"Performance criteria describe the performance needed to demonstrate achievement of the element."},{"0":"1 Assist with determining the legal framework for WHS in the workplace","1":"1.1 Access current WHS legislation and related documentation relevant to the organisation\u2019s operations 1.2 Use knowledge of the relationship between WHS Acts, regulations, codes of practice, standards and guidance material to assist with determining legal requirements in the workplace 1.3 Assist with identifying and confirming the duties, rights and obligations of individuals and parties as specified in legislation 1.4 Assist with seeking advice from legal advisers where necessary"},{"0":"2 Assist with providing advice on WHS compliance","1":"2.1 Assist with providing advice to individuals and parties about their legal duties, rights and obligations, and the location of relevant information in WHS legislation 2.2 Assist with providing advice to individuals and parties about the functions and powers of the WHS regulator and how they are exercised, and the objectives and principles underpinning WHS"},{"0":"3 Assist with WHS legislation compliance measures","1":"3.1 Assist with assessing how the workplace complies with relevant WHS legislation 3.2 Assist with determining the WHS training needs of individuals and parties, and with providing training to meet legal and other requirements 3.3 Assist with developing and implementing changes to workplace policies, procedures, processes and systems that will achieve compliance"}]
I'm not sure exactly how to use this, but I can at least see that it has assigned each value to the column it belongs in.
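One possible way to turn that JSON into CSV is sketched here using only the standard library. The `results` string is a shortened, hypothetical stand-in for the real output, and `records_to_csv` is an illustrative helper name rather than anything from pandas or BeautifulSoup.

```python
import csv
import io
import json

# Shortened, hypothetical stand-in for the JSON string printed above.
results = json.dumps([
    {"0": "ELEMENT", "1": "PERFORMANCE CRITERIA"},
    {"0": "1 Assist with determining the legal framework",
     "1": "1.1 Access current WHS legislation 1.2 Use knowledge of WHS Acts"},
])


def records_to_csv(json_text):
    """Convert a JSON array of {"0": ..., "1": ...} records to CSV text."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        # Sort the keys numerically so columns stay in table order.
        writer.writerow([record[key] for key in sorted(record, key=int)])
    return buffer.getvalue()


csv_text = records_to_csv(results)
print(csv_text)
```

To write a file instead, the same rows could go to `open("BSBWHS402.csv", "w", newline="")`. Since the data is already in a DataFrame, calling `to_csv` on it instead of `to_json` would likely be the shorter route.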
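I'm very open to criticism and ideas on how to make this better. I will build a UI for entering the module name, but that is a problem for future me. Thanks in advance.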
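Ahead of any UI, the module name could be turned into a page URL with a small helper. This is only a sketch: `unit_url` and `BASE_URL` are hypothetical names, and the check that unit codes consist solely of letters and digits is an assumption based on the single example BSBWHS402.

```python
import re
from urllib.parse import quote

# URL prefix taken from the example page in the question.
BASE_URL = "https://training.gov.au/Training/Details/"


def unit_url(unit_code):
    """Build the details-page URL for a unit code such as 'BSBWHS402'."""
    code = unit_code.strip().upper()
    # Assumption: unit codes contain only letters and digits.
    if not re.fullmatch(r"[A-Z0-9]+", code):
        raise ValueError(f"unexpected unit code: {unit_code!r}")
    return BASE_URL + quote(code)


print(unit_url("bsbwhs402"))
```

The returned string can be passed to `requests.get` in place of the hard-coded URL.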
【Discussion】:
-
So what exactly is wrong with this output? It is in JSON format and contains an array of rows. Try pasting the string into any JSON viewer, for example jsoneditoronline.org
-
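I don't necessarily want JSON; it is just the approach I found that works. I don't know how to get from this JSON to a .csv. Also, looking at it more closely, it lumps some of the data together: 1.1, 1.2 and 1.3 all end up in the same cell, whereas on the website they are separate rows in the table.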
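The lumped-together criteria could be separated again by splitting each cell wherever a new number such as "1.2" begins. The sketch below assumes every criterion starts with a `digits.digits` label followed by a space, and that no such pattern occurs inside the criterion text itself; `split_criteria` and `expand_rows` are hypothetical helper names, and the sample row is shortened from the output above.

```python
import re

# Matches the whitespace just before a criterion label such as "1.2 ".
CRITERION_BOUNDARY = re.compile(r"\s+(?=\d+\.\d+\s)")


def split_criteria(cell):
    """Split '1.1 foo 1.2 bar' into ['1.1 foo', '1.2 bar']."""
    return CRITERION_BOUNDARY.split(cell.strip())


def expand_rows(rows):
    """Yield one (element, criterion) pair per numbered criterion."""
    for element, criteria in rows:
        for criterion in split_criteria(criteria):
            yield element, criterion


# Shortened sample row in the same shape as the scraped table.
rows = [
    ("2 Assist with providing advice on WHS compliance",
     "2.1 Assist with providing advice to individuals "
     "2.2 Assist with providing advice about the WHS regulator"),
]
expanded = list(expand_rows(rows))
for element, criterion in expanded:
    print(element, "|", criterion)
```

Cells without numbered labels, such as the header row, come back unchanged as a single item, so the same function can be applied to every row before writing the CSV.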
Tags: python-3.x csv web-scraping beautifulsoup