【Question title】: Problem extracting the href link from a soup find result
【Posted】: 2020-11-09 11:24:31
【Question】:

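I'm trying to get links from https://www.lianjia.com/city/, starting with the first province listed. For that province, I want the links of the cities that belong to it. I found all the li tags that contain the href links (print(t) shows them), but when I try to extract the link with t.get('href'), it returns nothing. What is wrong with the code below? Can anyone help?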

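import requests
from bs4 import BeautifulSoup

url1 = 'https://www.lianjia.com/city/'
req1 = requests.get(url1)
soup1 = BeautifulSoup(req1.content, 'html.parser')
part = soup1.findAll("div",{"class":"city_province"})
for t in part[0].find_all('li'):
    print(t)              # shows the <li> element, including the nested <a href="...">
    print(t.get('href'))  # prints None: the <li> itself has no href attribute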

【Question comments】:

    Tags: python web-scraping beautifulsoup scrapy


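    【Solution 1】:

    The li tags have no href attribute. The href belongs to the anchor (a) tag nested inside each li, so you have to collect the anchors to get the links.

    Try this: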

    import requests
    from bs4 import BeautifulSoup
    
    soup = BeautifulSoup(requests.get('https://www.lianjia.com/city/').content, 'html.parser')
    provinces = soup.find_all("div", {"class": "city_province"})
    anchors = [[a["href"] for a in p.find_all("a")] for p in provinces]
    
    for province_urls in anchors:
        print(province_urls)
    

    Output:

    ['https://aq.lianjia.com/', 'https://cz.fang.lianjia.com/', 'https://hf.lianjia.com/', 'https://mas.lianjia.com/', 'https://wuhu.lianjia.com/']
    ['https://bj.lianjia.com/']
    ['https://cq.lianjia.com/']
    ['https://fz.lianjia.com/', 'https://quanzhou.lianjia.com/', 'https://xm.lianjia.com/', 'https://zhangzhou.lianjia.com/']
    and so on...
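
To see the difference without hitting the network, here is a minimal sketch that runs against an inline HTML snippet. The markup is a hypothetical stand-in, loosely modelled on the structure described in the question, not a copy of the real page; only the `city_province` class name comes from the original code. It shows that `Tag.get('href')` on an `li` yields `None`, while reading `href` from the nested `a` gives the URL.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (assumption): one province block whose <li> items wrap <a> links.
html = """
<div class="city_province">
  <ul>
    <li><a href="https://aq.lianjia.com/">Anqing</a></li>
    <li><a href="https://hf.lianjia.com/">Hefei</a></li>
    <li>No link here</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
province = soup.find("div", {"class": "city_province"})

# The attribute lives on <a>, so asking each <li> for it yields None.
li_hrefs = [li.get("href") for li in province.find_all("li")]

# Descend to the anchors; href=True skips any <a> that lacks an href.
a_hrefs = [a["href"] for a in province.find_all("a", href=True)]

# The same lookup written as a CSS selector.
css_hrefs = [a["href"] for a in province.select("li a[href]")]

print(li_hrefs)
print(a_hrefs)
print(css_hrefs)
```

Filtering with `href=True` (or `a[href]`) is a defensive choice: `a["href"]` raises `KeyError` on an anchor without that attribute, so the filter keeps the comprehension from failing if the page contains such anchors.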
    

    【Comments】:
