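【Question title】: Webscraping specific fields in Python
【Posted】: 2021-07-12 15:23:20
【Question】:

How can I extract the companies and their descriptions from here?

From my question yesterday I figured out how to extract the names, but when I applied the same logic to extract their descriptions, it backfired.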

request = requests.get("https://www.clstack.com", verify=False, headers=headers)
soup = bs4.BeautifulSoup(request.content, 'html.parser')
data = soup.find_all('td', {'class':'company'})

for i in data:
    print(i.find['tr'])

Output

company|description

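The description is inside a "td" tag, but when I call it from my code I get no output.

【Comments】:

  • There is no class associated with the description tag, which makes it even more confusing for me.
  • Please edit the question to include the full traceback of the error.
  • No output is the error. lol
  • @Byte Obviously there won't be any output. A td tag does not contain any tr tags; the td is inside the tr.
  • So how do I access both the description and the company name? This is my first time with HTML, so tutorials haven't really helped and I'm confused.

Tags: python python-3.x selenium web-scraping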


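【Solution 1】:

You will notice that each <td class="company"> tag is followed by another <td> tag that holds the description. So, as you iterate over the <td class="company"> elements, just use .find_next('td') to get the next tag, which contains the description: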

import requests
import bs4

# Browser-like User-Agent (this one is Chrome; you can set whatever browser you like)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}

# verify=False skips TLS certificate verification, as in the question's code
request = requests.get("https://www.cloudtango.org", verify=False, headers=headers)
soup = bs4.BeautifulSoup(request.content, 'html.parser')
data = soup.find_all('td', {'class': 'company'})

for each in data:
    company = each.find('img')['alt']          # company name is the alt text of the logo image
    description = each.find_next('td').text    # the next <td> in the document holds the description
    print(f'{company}: {description}\n\n')

Output:

Redcentric: Redcentric is a leading UK IT managed services provider that offers a range of IT and Cloud services designed to support organisations in their journey from traditional infrastructure to the Cloud …


Modern Networks: Established in 1999, Modern Networks is a leading provider of IT support, network services, business broadband and telecoms to the UK’s commercial property sector. Additionally, we work with around …


BlackPoint IT Services: BlackPoint’s comprehensive range of Managed IT Services is designed to help you improve IT quality, efficiency and reliability -and save you up to 50% on IT cost. Providing IT solutions for more …


AffinityMSP: AffinityMSP was created with one goal in mind: to help Australian businesses achieve success through high-performance technology. Our consultants take the time to get to know your business and …


centrexIT: Founded in 2002, centrexIT is San Diego's leader in IT management. Our locally-based technology professionals provide outsourced IT service, support, security and leadership for small and medium-…


Carbon60: Carbon60 specializes in delivering secure managed cloud solutions for public and private sector organizations with business-critical workloads. Businesses are at different stages in their cloud …


...
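
To try the same idea without hitting the network, here is a minimal offline sketch. The HTML snippet, the company names, and the helper extract_companies are all made up for illustration; they only assume the layout described above (a <td class="company"> followed by a plain <td> in the same <tr>). It also shows why searching for a <tr> inside a <td> finds nothing, and an alternative that walks the rows instead of using .find_next('td').

```python
import bs4

# Hypothetical markup modelled on the structure described in the answer:
# each <tr> holds a <td class="company"> followed by a plain <td> description.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="company"><img alt="Acme IT" src="acme.png"></td>
    <td>Managed services for small businesses.</td>
  </tr>
  <tr>
    <td class="company"><img alt="Globex Cloud" src="globex.png"></td>
    <td>Private cloud hosting.</td>
  </tr>
</table>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# A <td> sits inside a <tr>, so looking for a <tr> within it returns None.
first_cell = soup.find("td", {"class": "company"})
inner_row = first_cell.find("tr")

# find_next() searches forward in document order, so it reaches the sibling <td>.
next_text = first_cell.find_next("td").get_text(strip=True)


def extract_companies(soup):
    """Return (company, description) pairs by walking each table row."""
    results = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # not a company/description row
        img = cells[0].find("img")
        if img is None or not img.has_attr("alt"):
            continue  # no logo, so no company name to read
        results.append((img["alt"], cells[1].get_text(strip=True)))
    return results


pairs = extract_companies(soup)
for company, description in pairs:
    print(f"{company}: {description}")
```

Walking the rows keeps each name paired with the description from its own row, which may be a little more robust than .find_next('td') if a row ever lacks a description cell.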

