【Question Title】: Trying to scrape one element nested within a tag
【Posted】: 2019-07-31 13:40:20
【Question】:

I'm trying to capture only the "Other" text, essentially stripping out the <strong> tag element.

<ul class="listing-row__meta">
                        <li>
                            <strong>Ext. Color:</strong>

                                Other
                        </li>
                    </ul>

My code so far:

import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209')

soup = BeautifulSoup(response.text, 'html.parser')

posts = soup.find_all(class_='shop-srp-listings__inner')

with open('posts.csv', 'w') as csv_file:
    csv_writer = writer(csv_file)
    headers = ['title', 'color', 'price']
    csv_writer.writerow(headers)

    for post in posts:
        title = post.find(class_="listing-row__title").get_text().replace('\n', '').strip()
        color = post.find("li").get_text().replace('\n', '').strip()
        colorremove = color.strong.extract()
        price = post.find("span", attrs={"class": "listing-row__price"}).get_text().replace('\n', '').strip()
        csv_writer.writerow([title, colorremove, price])

This particular script doesn't run. Before this I only had the color line, and that worked fine, but the output included the "Ext. Color:" label.

【Comments】:

    Tags: python python-3.x web-scraping beautifulsoup screen-scraping


    【Answer 1】:

    You can find the <strong> element and then take its next_sibling, which is the text node that follows it.

    from bs4 import BeautifulSoup
    
    markup = r"""
    <ul class="listing-row__meta">
                            <li>
                                <strong>Ext. Color:</strong>
    
                                    Other
                            </li>
                        </ul>
    """
    
    soup = BeautifulSoup(markup, "html.parser")
    print(soup.find("strong").next_sibling.strip())
    

    Result:

    Other
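
If a listing has several labelled rows, or a row is missing, calling next_sibling directly can return the wrong node or None. The following is a minimal sketch of one way to guard against that, not part of the original answer. The helper name value_after_label and the second "Int. Color:" row are hypothetical and only there for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup; the second <li> is a hypothetical extra row for illustration
markup = """
<ul class="listing-row__meta">
    <li>
        <strong>Ext. Color:</strong>

            Other
    </li>
    <li><strong>Int. Color:</strong> Black</li>
</ul>
"""


def value_after_label(node, label_text):
    """Return the text following the <strong> whose text equals label_text."""
    for strong in node.find_all("strong"):
        if strong.get_text(strip=True) == label_text:
            sibling = strong.next_sibling
            # next_sibling may be None or another tag, so accept only text nodes
            if isinstance(sibling, str):
                return sibling.strip()
    # No matching label, or nothing usable after it
    return ""


soup = BeautifulSoup(markup, "html.parser")
color = value_after_label(soup, "Ext. Color:")
print(color)
```

In the question's loop, the same helper could presumably be called as value_after_label(post, "Ext. Color:") in place of the get_text() and extract() lines, since get_text() returns a plain string that no longer has a .strong attribute.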
    

    【Comments】:

      【Answer 2】:

      You can use stripped_strings on the parent element.

      from bs4 import BeautifulSoup
      
      html = """
      <ul class="listing-row__meta">
                              <li>
                                  <strong>Ext. Color:</strong>
      
                                      Other
                              </li>
                          </ul>
      """
      
      soup = BeautifulSoup(html, "lxml")
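      first_item = soup.select_one('.listing-row__meta')
      # stripped_strings yields each text fragment with whitespace trimmed:
      # index 0 is the "Ext. Color:" label, index 1 is the value
      strings = list(first_item.stripped_strings)
      print(strings[1])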
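
Indexing with strings[1] depends on the colour row coming first. As a hedged alternative, the sketch below applies stripped_strings to each <li> separately and collects label/value pairs into a dict, so a value can be looked up by name. The "Transmission:" row is a hypothetical addition for illustration, and the standard-library html.parser is used here so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Sample markup; the "Transmission:" row is a hypothetical extra row
html = """
<ul class="listing-row__meta">
    <li>
        <strong>Ext. Color:</strong>

            Other
    </li>
    <li>
        <strong>Transmission:</strong>
            Automatic
    </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
meta = soup.select_one(".listing-row__meta")

details = {}
for li in meta.find_all("li"):
    parts = list(li.stripped_strings)
    # Expect exactly [label, value]; skip rows with any other shape
    if len(parts) == 2:
        label, value = parts
        details[label.rstrip(":")] = value

print(details.get("Ext. Color", ""))
```

Using details.get with a default keeps the scrape from raising when a listing has no colour row.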
      

      【Comments】:
