Posted: 2018-10-31 14:58:07
Question:
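I'm scraping some data from a dashboard and have been trying to get the data stored in multiple `div` elements (all with class `map-item`) into a pandas DataFrame. How should I convert something like this: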
[<div class="map-item" data-companyname="Apical Group" data-country="INDONESIA" data-district="Jakarta Utara" data-latitude="-6.099396000" data-longitude="106.951478000" data-millname="AAJ Marunda" data-province="Jakarta" data-report="http://naturalhealthytreat.com/sites/neste-daemeter.com/files/AAJ_Marunda.pdf" id="map_item_4645">AAJ Marunda</div>,
<div class="map-item" data-companyname="Apical Group" data-country="INDONESIA" data-district="Lubuk Gaung" data-latitude="1.754005000" data-longitude="101.363532000" data-millname="Sari Dumai Sejati" data-province="Riau" data-report="http://naturalhealthytreat.com/sites/neste-daemeter.com/files/Sari_Dumai_Sejati.pdf" id="map_item_4646">Sari Dumai Sejati</div>,
<div class="map-item" data-companyname="Kutai Refinery Nusantara " data-country="INDONESIA" data-district="Balikpapan" data-latitude="-1.179099000" data-longitude="116.788274000" data-millname="Kutai Refinery Nusantara " data-province="Penajam Paser Utara" data-report="http://naturalhealthytreat.com/sites/neste-daemeter.com/files/Kutai_Refinery_Nusantara_.pdf" id="map_item_4647">Kutai Refinery Nusantara </div>]
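BeautifulSoup exposes each element's HTML attributes as a dict on `Tag.attrs`, so the `data-*` values can be read directly instead of being parsed out of the markup. A minimal sketch using the first div above; the helper name `data_attributes` is hypothetical and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

# The first <div class="map-item"> from the question, as a string
html = (
    '<div class="map-item" data-companyname="Apical Group" data-country="INDONESIA" '
    'data-district="Jakarta Utara" data-latitude="-6.099396000" '
    'data-longitude="106.951478000" data-millname="AAJ Marunda" data-province="Jakarta" '
    'data-report="http://naturalhealthytreat.com/sites/neste-daemeter.com/files/AAJ_Marunda.pdf" '
    'id="map_item_4645">AAJ Marunda</div>'
)


def data_attributes(tag):
    """Hypothetical helper: return the tag's data-* attributes with the 'data-' prefix removed."""
    prefix = "data-"
    return {
        name[len(prefix):]: value
        for name, value in tag.attrs.items()
        if name.startswith(prefix)
    }


div = BeautifulSoup(html, "html.parser").find("div", class_="map-item")
row = data_attributes(div)
print(row["companyname"], row["latitude"])  # Apical Group -6.099396000
```

Note that `class` and `id` are skipped because they lack the `data-` prefix, and the element's text (`div.get_text()`) is separate from its attributes.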
into a DataFrame like this:
| no | companyname | country | district | latitude | longitude | millname | province | report |
|---|---|---|---|---|---|---|---|---|
| 1 | Apical Group | INDONESIA | Jakarta Utara | -6.099396 | 106.951478 | AAJ Marunda | Jakarta | http://naturalhealthytreat.com/sites/neste-daemeter.com/files/AAJ_Marunda.pdf |
| 2 | Apical Group | INDONESIA | Lubuk Gaung | 1.754005 | 101.363532 | Sari Dumai Sejati | Riau | http://naturalhealthytreat.com/sites/neste-daemeter.com/files/Sari_Dumai_Sejati.pdf |
| 3 | Kutai Refinery Nusantara | INDONESIA | Balikpapan | -1.179099 | 116.788274 | Kutai Refinery Nusantara | Penajam Paser Utara | http://naturalhealthytreat.com/sites/neste-daemeter.com/files/Kutai_Refinery_Nusantara_.pdf |
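Every scraped attribute value is a string, and some (e.g. `"Kutai Refinery Nusantara "`) carry trailing spaces, so a few cleanup steps are needed to reach the table above. A sketch assuming the attribute values have already been collected into a list of dicts named `records` (a hypothetical intermediate, one dict per div, prefix already stripped):

```python
import pandas as pd

# Attribute values as scraped: all strings, some with trailing spaces
records = [
    {"companyname": "Apical Group", "country": "INDONESIA", "district": "Jakarta Utara",
     "latitude": "-6.099396000", "longitude": "106.951478000", "millname": "AAJ Marunda",
     "province": "Jakarta",
     "report": "http://naturalhealthytreat.com/sites/neste-daemeter.com/files/AAJ_Marunda.pdf"},
    {"companyname": "Apical Group", "country": "INDONESIA", "district": "Lubuk Gaung",
     "latitude": "1.754005000", "longitude": "101.363532000", "millname": "Sari Dumai Sejati",
     "province": "Riau",
     "report": "http://naturalhealthytreat.com/sites/neste-daemeter.com/files/Sari_Dumai_Sejati.pdf"},
    {"companyname": "Kutai Refinery Nusantara ", "country": "INDONESIA", "district": "Balikpapan",
     "latitude": "-1.179099000", "longitude": "116.788274000",
     "millname": "Kutai Refinery Nusantara ", "province": "Penajam Paser Utara",
     "report": "http://naturalhealthytreat.com/sites/neste-daemeter.com/files/Kutai_Refinery_Nusantara_.pdf"},
]

df = pd.DataFrame(records)
# Trim stray whitespace from every (still all-text) column
df = df.apply(lambda col: col.str.strip())
# Coordinates arrive as text; convert them to floats
df[["latitude", "longitude"]] = df[["latitude", "longitude"]].apply(pd.to_numeric)
# Number the rows 1, 2, ... and name the index "no", as in the target table
df.index = pd.RangeIndex(start=1, stop=len(df) + 1, name="no")
print(df[["companyname", "latitude", "longitude"]])
```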
Here is my code so far, which collects the `map-item` divs from the web page:
from bs4 import BeautifulSoup
import requests

# Link of Neste dashboard
url = 'http://nestetraceabilitydashboard.com/nestes-pfad-traceability-dashboard'
page = requests.get(url).content
soup = BeautifulSoup(page, "html.parser")
# Collect every <div class="map-item"> element on the page
divList = soup.find_all('div', attrs={"class": "map-item"})
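One possible way to continue from `divList`: pass each div's `attrs` dict straight to `pd.DataFrame`, then keep only the `data-*` columns and drop their prefix. This is a sketch, not a definitive solution; `map_items_to_frame` is a hypothetical helper, and so that it runs without network access it is demonstrated on the first two divs from the question rather than the live page.

```python
import pandas as pd
from bs4 import BeautifulSoup


def map_items_to_frame(divs):
    """Hypothetical helper: one row per map-item div, one column per data-* attribute."""
    df = pd.DataFrame([div.attrs for div in divs])
    # Keep only the data-* columns (drops 'class' and 'id'), then strip the prefix
    df = df.filter(regex=r"^data-")
    df.columns = [name[len("data-"):] for name in df.columns]
    df[["latitude", "longitude"]] = df[["latitude", "longitude"]].astype(float)
    # 1-based row numbers in an index named "no"
    df.index = pd.RangeIndex(start=1, stop=len(df) + 1, name="no")
    return df


# Offline demo on two of the divs from the question
html = (
    '<div class="map-item" data-companyname="Apical Group" data-country="INDONESIA" '
    'data-district="Jakarta Utara" data-latitude="-6.099396000" '
    'data-longitude="106.951478000" data-millname="AAJ Marunda" data-province="Jakarta" '
    'data-report="http://naturalhealthytreat.com/sites/neste-daemeter.com/files/AAJ_Marunda.pdf" '
    'id="map_item_4645">AAJ Marunda</div>'
    '<div class="map-item" data-companyname="Apical Group" data-country="INDONESIA" '
    'data-district="Lubuk Gaung" data-latitude="1.754005000" '
    'data-longitude="101.363532000" data-millname="Sari Dumai Sejati" data-province="Riau" '
    'data-report="http://naturalhealthytreat.com/sites/neste-daemeter.com/files/Sari_Dumai_Sejati.pdf" '
    'id="map_item_4646">Sari Dumai Sejati</div>'
)
soup = BeautifulSoup(html, "html.parser")
df = map_items_to_frame(soup.find_all("div", class_="map-item"))
print(df)
```

Against the live dashboard, the call would be `df = map_items_to_frame(divList)`. Building the frame from the raw `attrs` dicts means that a div missing some `data-*` attribute gets `NaN` in that cell instead of raising an error; values with trailing spaces are left as-is here and could be trimmed with `str.strip()`.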
Comments:
Tags: python pandas beautifulsoup