【Question title】: Extracting Data from HTML Span using Beautiful Soup
【Posted】: 2018-11-08 17:47:55
【Question】:

I want to extract "1.02 Crores" and "7864" from the HTML code below and save them in separate columns of a CSV file.

Code:

<div class="featuresvap _graybox clearfix"><h3><span><i class="icon-inr"></i>1.02 Crores</span><small> @ <i class="icon-inr"></i><b>7864/sq.ft</b> as per carpet area</small></h3>

【Comments】:

    Tags: python-3.x html web-scraping beautifulsoup


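    【Solution 1】:

    I'm not sure what the actual data looks like, but this is something I put together quickly. If you need it to fetch a live web page, use import requests: add url = 'yourwebpagehere' and page = requests.get(url), change the soup line to soup = BeautifulSoup(page.text, 'lxml'), and delete the html variable, since it is no longer needed.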

    from bs4 import BeautifulSoup
    import csv
    
    html = '<div class="featuresvap _graybox clearfix"><h3><span><i class="icon-inr"></i>1.02 Crores</span><small> @ <i class="icon-inr"></i><b>7864/sq.ft</b> as per carpet area</small></h3>'
    soup = BeautifulSoup(html, 'lxml')
    findSpan = soup.find('span')
    findB = soup.find('b')
    print([findSpan.text, findB.text.replace('/sq.ft', '')])
    
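    # newline='' stops the csv module from emitting blank lines on Windows
    with open('NAMEYOURFILE.csv', 'w', newline='') as writer:
        csv_writer = csv.writer(writer)
        csv_writer.writerow(["First Column Name", "Second Column Name"])
        # write the extracted text, not the Tag objects (which would dump raw HTML)
        csv_writer.writerow([findSpan.text, findB.text.replace('/sq.ft', '')])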
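
The requests-based variant described in this answer could look roughly like the sketch below. It is only an illustration under some assumptions: the function names `extract_price_fields` and `scrape_to_csv`, the column headers, and the 10-second timeout are made up for this example, and it uses the built-in 'html.parser' rather than 'lxml' so that no extra parser has to be installed.

```python
import csv

import requests
from bs4 import BeautifulSoup


def extract_price_fields(html_text):
    """Return (price, rate) taken from the first <span> and first <b> tag."""
    soup = BeautifulSoup(html_text, 'html.parser')
    price = soup.find('span').get_text(strip=True)
    rate = soup.find('b').get_text(strip=True).replace('/sq.ft', '')
    return price, rate


def scrape_to_csv(url, csv_path):
    """Download the page at `url` and write the two values as one CSV row."""
    page = requests.get(url, timeout=10)  # timeout value is an assumption
    page.raise_for_status()               # stop on HTTP errors such as 404
    price, rate = extract_price_fields(page.text)
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Price', 'Rate per sq.ft'])  # hypothetical headers
        writer.writerow([price, rate])
```

Calling `scrape_to_csv('yourwebpagehere', 'NAMEYOURFILE.csv')` with a real URL would then produce a two-column file. Keeping the parsing in its own function makes it easy to try on a saved HTML string before hitting the network.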
    

    【Comments】:

      【Solution 2】:

      The explanation is in the code comments.

      from bs4 import BeautifulSoup
      
      # data for first column
      firstCol = []
      # data for second column
      secondCol = []
      
      for url in listURL:  # listURL: your own list of page URLs, defined beforehand
          html = '.....' # downloaded html
          soup = BeautifulSoup(html, 'html.parser')
      
          # 'select_one' takes a CSS selector and returns only the first match
          fCol = soup.select_one('.featuresvap h3 span')
          # remove: <i class="icon-inr"></i>
          fCol.find("i").extract()
          sCol = soup.select_one('.featuresvap h3 b')
          firstCol.append(fCol.text)
          secondCol.append(sCol.text.replace('/sq.ft', ''))
      
      with open('results.csv', 'w') as fl:
          csvContent = ','.join(firstCol) + '\n' + ','.join(secondCol)
          fl.write(csvContent)
      
      ''' sample results
      1.02 Crores | 2.34 Crores
      7864        | 2475
      
      '''
      print('finish')
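
The code above writes each list as one row. Since the question asks for the two values in separate columns, one possible alternative is to pair the lists with `zip` and let the `csv` module handle quoting. This is a minimal sketch: the helper name `write_columns` and the header labels are assumptions, not part of the original answer.

```python
import csv


def write_columns(first_col, second_col, csv_path):
    """Write two equal-length lists side by side, one pair per CSV row."""
    with open(csv_path, 'w', newline='') as fl:
        writer = csv.writer(fl)
        writer.writerow(['Price', 'Rate per sq.ft'])  # hypothetical headers
        # zip pairs the i-th price with the i-th rate
        writer.writerows(zip(first_col, second_col))
```

With the sample values from the answer, `write_columns(['1.02 Crores', '2.34 Crores'], ['7864', '2475'], 'results.csv')` gives a header line followed by one line per listing.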
      

      【Comments】:
