Title: How do I fetch a particular item from HTML code using bs4? [closed]
Posted: 2020-02-23 13:39:16
Question:

I have the following HTML code, which I want to convert:

<div class="company_data__list">

 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">ABC Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Capital</div><div class="company_data__data">230000</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">103</div></div>

 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">XYZ Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">10</div></div>

 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">CAT Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Capital</div><div class="company_data__data">430000</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">10233</div></div>
 <div class="company_data__row"><div class="company_data__head">URL</div><div class="company_data__data">www.abc.com</div></div>

</div>



into a JSON file that looks like this:

[
  {
    "id": "1",
    "data": {
      "name": "ABC Company",
      "capital": "230000",
      "total": "103"
    }
  },
  {
    "id": "2",
    "data": {
      "name": "XYZ Company",
      "total": "10"
    }
  },
  {
    "id": "3",
    "data": {
      "name": "CAT Company",
      "capital": "430000",
      "total": "10233",
      "url": "www.abc.com"
    }
  }
]

I am using Python 3, bs4, and re (regular expressions).

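Comments:

  • If you are using bs4, why do you need regular expressions to retrieve that data?
  • I realized that! I've edited my question. Thanks.
  • What have you tried so far, and what isn't working? If you just need a function, you can try soup.find() or soup.find_all(). You can pass it "div" and even filter with something like attrs={"class": "company_data__data"}. This may help: crummy.com/software/BeautifulSoup/bs4/doc/#find-all
  • Thanks for the link. I think I'll read up a bit more.

Tags: python html json regex beautifulsoup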
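
A minimal sketch of the find()/find_all() suggestion from the comment above, run on an abbreviated copy of the question's markup (first company only). `class_=` and `attrs={"class": ...}` are two equivalent ways to filter on a CSS class; `get_text(" ", strip=True)` is used so the text on either side of `<br/>` is joined with a space.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup (first company only)
html = """<div class="company_data__list">
 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">ABC Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Capital</div><div class="company_data__data">230000</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">103</div></div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None); find_all() returns a list of all matches
container = soup.find("div", class_="company_data__list")
heads = container.find_all("div", attrs={"class": "company_data__head"})
values = container.find_all("div", attrs={"class": "company_data__data"})

# Pair each label with its value; the " " separator keeps "ABC Company" and "Subtitle" apart
pairs = [(h.get_text(strip=True), v.get_text(" ", strip=True)) for h, v in zip(heads, values)]
print(pairs)
```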


Answer 1:

Here is one approach. It writes each company's values to a CSV file, three values per row.

For example:

import csv
from bs4 import BeautifulSoup

html = """<div class="company_data__list">

 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">ABC Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Capital</div><div class="company_data__data">230000</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">103</div></div>

 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">XYZ Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Capital</div><div class="company_data__data">330000</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">10</div></div>

 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">CAT Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Capital</div><div class="company_data__data">430000</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">10233</div></div>

</div>"""

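soup = BeautifulSoup(html, "html.parser")
# Find the list container, then every value cell (the "company_data__data" divs)
content = soup.find("div", class_="company_data__list").find_all("div", class_="company_data__data")

filename = "companies.csv"                  # Output path (undefined in the original snippet)
with open(filename, "w", newline="") as csv_file:   # newline="" as the csv module recommends
    writer = csv.writer(csv_file)           # Create CSV writer
    # Assumes every company has exactly 3 values (Name, Capital, Total)
    for i in range(0, len(content), 3):
        temp = [j.text for j in content[i:i+3]]
        writer.writerow(temp)               # One row per company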
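
Taking the values three at a time only works when every company has the same fields; in the question's input XYZ Company has no Capital row and CAT Company has an extra URL row. A minimal sketch of the JSON output the question asks for, assuming every record begins with a "Name" row: it walks the `company_data__row` divs, uses the lowercased head text as the key, and keeps only the text before `<br/>`. The ids are simply 1-based positions, since the HTML itself contains no ids.

```python
import json

from bs4 import BeautifulSoup

# The question's markup, with the XYZ "Name" row properly closed
html = """<div class="company_data__list">
 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">ABC Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Capital</div><div class="company_data__data">230000</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">103</div></div>
 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">XYZ Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">10</div></div>
 <div class="company_data__row"><div class="company_data__head">Name</div><div class="company_data__data">CAT Company<br/>Subtitle</div></div>
 <div class="company_data__row"><div class="company_data__head">Capital</div><div class="company_data__data">430000</div></div>
 <div class="company_data__row"><div class="company_data__head">Total</div><div class="company_data__data">10233</div></div>
 <div class="company_data__row"><div class="company_data__head">URL</div><div class="company_data__data">www.abc.com</div></div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", class_="company_data__list")

companies = []  # one {"id": ..., "data": {...}} dict per company
for row in container.find_all("div", class_="company_data__row"):
    # The head text ("Name", "Capital", ...) becomes the lowercase JSON key
    key = row.find("div", class_="company_data__head").get_text(strip=True).lower()
    cell = row.find("div", class_="company_data__data")
    # stripped_strings yields "ABC Company", then "Subtitle"; keep only the first one
    value = next(cell.stripped_strings, "")
    if key == "name":
        # Assumption: a "Name" row always starts a new company record
        companies.append({"id": str(len(companies) + 1), "data": {}})
    if companies:  # ignore any rows that appear before the first "Name"
        companies[-1]["data"][key] = value

output = json.dumps(companies, indent=2)
print(output)
```

To write a file instead of printing, pass an open file object to `json.dump(companies, f, indent=2)`.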
