【Question Title】: BeautifulSoup: find all instances when class name repeats
【Posted】: 2021-08-05 02:54:36
【Question Description】:

I have the following code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

s = requests.Session()
url2 = 'https://www.har.com/homedetail/6408-burgoyne-rd-157-houston-tx-77057/3380601'
r = s.get(url2)
soup = BeautifulSoup(r.text, 'html.parser')
# Each listing detail (one label plus one value) sits in a div with class dc_blocks_2c
z2 = soup.find_all("div", {"class": 'dc_blocks_2c'})

z2 returns a long list. How do I get all the variables and their values into a DataFrame? That is, how do I collect the dc_label / dc_value pairs?

Tags: python pandas beautifulsoup


【Solution 1】:

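When the data you want is in HTML tables, it is often easier to just use pandas' read_html(). If it does not capture everything you need, you can write BeautifulSoup code for the rest. It depends on what you need from the page.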

import pandas as pd

url = 'https://www.har.com/homedetail/6408-burgoyne-rd-157-houston-tx-77057/3380601'
# read_html returns a list of DataFrames, one per <table> it finds on the page
list_of_dataframes = pd.read_html(url)
for df in list_of_dataframes:
    print(df)

Or pick a single DataFrame by its position in the list, for example:

df = list_of_dataframes[2]
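Picking a table by position breaks if the page layout changes. `pd.read_html` also accepts a `match` argument (a string or regex) and returns only the tables whose text matches it. A minimal offline sketch, assuming a small made-up HTML snippet in place of the real page, and assuming a parser backend (lxml, or BeautifulSoup plus html5lib) is installed, which `read_html` needs anyway:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page: two small label/value tables.
# The real HAR markup is not reproduced here.
html = """
<table>
  <tr><td>Original List Price:</td><td>$249,890</td></tr>
  <tr><td>Current List Price:</td><td>$248,890</td></tr>
</table>
<table>
  <tr><td>Market Land Value:</td><td>$39,852</td></tr>
  <tr><td>Total Market Value:</td><td>$187,555</td></tr>
</table>
"""

# `match` keeps only tables whose text matches the string/regex, so the
# result does not depend on where the table sits on the page.
# StringIO is used because recent pandas versions deprecate passing
# literal HTML strings to read_html.
tables = pd.read_html(StringIO(html), match="Market Land Value")
market = tables[0]
print(market)
```

With the live page you would pass `url` instead of `StringIO(html)`.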

All the DataFrames it captured:

                      0           1
0  Original List Price:    $249,890
1        Price Reduced:     -$1,000
2   Current List Price:    $248,890
3    Last Reduction on:  05/14/2021
                      0           1
0  Original List Price:    $249,890
1        Price Reduced:     -$1,000
2   Current List Price:    $248,890
3    Last Reduction on:  05/14/2021
   Tax Year Cost/sqft Market Value  Change Tax Assessment Change.1
0      2020   $114.36     $187,555  -4.88%       $187,555   -4.88%
1      2019   $120.22     $197,168  -9.04%       $197,168   -9.04%
2      2018   $132.18     $216,768   0.00%       $216,768    0.00%
3      2017   $132.18     $216,768   5.74%       $216,768    9.48%
4      2016   $125.00     $205,000   2.19%       $198,000    6.90%
5      2015   $122.32     $200,612  18.71%       $185,219   10.00%
6      2014   $103.05     $169,000  10.40%       $168,381   10.00%
7      2013    $93.34     $153,074   0.00%       $153,074    0.00%
8      2012    $93.34     $153,074     NaN       $153,074      NaN
                           0         1
0         Market Land Value:   $39,852
1  Market Improvement Value:  $147,703
2        Total Market Value:  $187,555
                             0         1
0                 HOUSTON ISD:  1.1367 %
1               HARRIS COUNTY:  0.4071 %
2       HC FLOOD CONTROL DIST:  0.0279 %
3   PORT OF HOUSTON AUTHORITY:  0.0107 %
4            HC HOSPITAL DIST:  0.1659 %
5  HC DEPARTMENT OF EDUCATION:  0.0050 %
6   HOUSTON COMMUNITY COLLEGE:  0.1003 %
7             HOUSTON CITY OF:  0.5679 %
8              Total Tax Rate:  2.4216 %
                                                                          0            1
0  Estimated Monthly Principal & Interest  (Based on the calculation below)        $ 951
1            Estimated Monthly Property Tax  (Based on Tax Assessment 2020)        $ 378
2                                                     Home Owners Insurance  Get a Quote
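In the output above, the price-history table appears twice, presumably because the page renders it in two places. If the duplicates get in the way, one option is to keep only the first of any identical frames. A minimal sketch using `DataFrame.equals`; the helper name `drop_duplicate_frames` and the sample frames are hypothetical:

```python
import pandas as pd


def drop_duplicate_frames(frames):
    """Hypothetical helper: drop exact duplicate DataFrames, keeping the first."""
    unique = []
    for df in frames:
        # DataFrame.equals compares shape and elements (column dtypes must
        # match; NaNs in the same positions count as equal).
        if not any(df.equals(seen) for seen in unique):
            unique.append(df)
    return unique


# Made-up frames mirroring the two identical price tables above.
price = pd.DataFrame({0: ["Original List Price:", "Price Reduced:"],
                      1: ["$249,890", "-$1,000"]})
land = pd.DataFrame({0: ["Market Land Value:"], 1: ["$39,852"]})

frames = [price, price.copy(), land]
unique_frames = drop_duplicate_frames(frames)
print(len(unique_frames))  # 2
```

Applied to the real result, this would be `drop_duplicate_frames(list_of_dataframes)`.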


【Solution 2】:
# One row per dc_blocks_2c block: the matching label div and value div
pd.DataFrame([el.find_all('div', class_=['dc_label', 'dc_value']) for el in z2])

                                   0                                                  1
0                        [MLS#:]                                  [30509690 (HAR) ]
1               [Listing Price:]  [$ 248,890 ($151.76/sqft.) , [], [$Convert ], ...
2              [Listing Status:]  [[\n, [\n, <span class="status_icon_1" style="...
3                     [Address:]                          [6408 Burgoyne Road #157]
4                    [Unit No.:]                                              [157]
5                        [City:]                                        [[Houston]]
6                       [State:]                                               [TX]
7                    [Zip Code:]                                          [[77057]]
8                      [County:]                                  [[Harris County]]
9                 [Subdivision:]  [ , [Briarwest T/H Condo (View subdivision pri...
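The cells above still hold BeautifulSoup tags and nested lists rather than plain text. A minimal sketch that turns each block into a clean (variable, value) row, assuming every `div.dc_blocks_2c` contains one `div.dc_label` and one `div.dc_value` as the output suggests; the HTML snippet is a made-up stand-in for the real page, and the column names `variable`/`value` are arbitrary:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: assumes each dc_blocks_2c div holds exactly one
# dc_label div and one dc_value div (values may contain nested tags).
html = """
<div class="dc_blocks_2c">
  <div class="dc_label">MLS#:</div>
  <div class="dc_value">30509690 (HAR) </div>
</div>
<div class="dc_blocks_2c">
  <div class="dc_label">City:</div>
  <div class="dc_value"><a href="#">Houston</a></div>
</div>
<div class="dc_blocks_2c">
  <div class="dc_label">Zip Code:</div>
  <div class="dc_value"><a href="#">77057</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for block in soup.find_all("div", class_="dc_blocks_2c"):
    label = block.find("div", class_="dc_label")
    value = block.find("div", class_="dc_value")
    if label is None or value is None:
        continue  # skip blocks missing either half of the pair
    # get_text(" ", strip=True) flattens nested tags into plain text and
    # trims surrounding whitespace; rstrip(":") drops the trailing colon.
    rows.append({
        "variable": label.get_text(" ", strip=True).rstrip(":"),
        "value": value.get_text(" ", strip=True),
    })

df = pd.DataFrame(rows)
print(df)
```

With the real page, loop over `z2` from the question instead of the sample `soup`. For a single wide row with one column per variable, `df.set_index("variable").T` is one option.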
