【Question title】: Web Scraping using BeautifulSoup on Python
【Posted】: 2019-10-07 14:39:43
【Question】:

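I am trying to scrape the name and category of every API from https://www.programmableweb.com/apis/directory and print them in this format:

Name: Google Maps

Category: Mapping

For some reason, my code only prints the first row.


My code: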

from bs4 import BeautifulSoup as bs

import requests

url = 'https://www.programmableweb.com/apis/directory'

response = requests.get(url)

data = response.text

soup = bs(data, 'html.parser')  

info = soup.find_all('table',{'class':'views-table cols-4 table'})



for i in info:

    name = soup.find('td',{'class':'views-field views-field-title col-md-3'}).text
    category = soup.find('td',{'class':'views-field views-field-field-article-primary-category'}).text
    print('name:',name, '\nCategory:', category)

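If you can help me further, this is what I would like to do:

  1. Get the API name
  2. Get the API URL
  3. Get the API category
  4. Get the API description shown after clicking the link
  5. Scrape the next page until no pages remain
  6. Use pandas to build a DataFrame, then write it to a csv file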

【Comments】:

    Tags: python-3.x web-scraping beautifulsoup


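    【Solution 1】:

    You are not iterating over the rows of the table. You call find_all for the <table> tag (there is only one) and then try to iterate over that result, so the loop runs once. What you want is to find all the <tr> tags inside the <table> tag and iterate over those <tr> tags. You are also grabbing only the first element from the soup object, not from the info object.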
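As a minimal sketch of the row-by-row approach described above: the snippet below parses a small inline HTML sample instead of the live page. The class names are taken from the question, but the sample rows, the hrefs, and the helper name `parse_rows` are made up for illustration, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the directory table. Class names come from the
# question; the rows and hrefs are invented for this example.
SAMPLE_HTML = """
<table class="views-table cols-4 table">
  <thead>
    <tr><th>API Name</th><th>Description</th><th>Category</th></tr>
  </thead>
  <tbody>
    <tr>
      <td class="views-field views-field-title col-md-3"><a href="/api/google-maps">Google Maps</a></td>
      <td>Maps for the web</td>
      <td class="views-field views-field-field-article-primary-category"><a href="/category/mapping">Mapping</a></td>
    </tr>
    <tr>
      <td class="views-field views-field-title col-md-3"><a href="/api/twitter">Twitter</a></td>
      <td>Social updates</td>
      <td class="views-field views-field-field-article-primary-category"><a href="/category/social">Social</a></td>
    </tr>
  </tbody>
</table>
"""


def parse_rows(html):
    """Return one dict (name, url, category) per data row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="views-table")
    results = []
    if table is None:
        return results
    # Iterate over the rows, and search inside each row, not inside soup.
    for row in table.find_all("tr"):
        name_cell = row.find("td", class_="views-field-title")
        category_cell = row.find(
            "td", class_="views-field-field-article-primary-category"
        )
        if name_cell is None or category_cell is None:
            continue  # header row or a row without the expected cells
        link = name_cell.find("a")
        results.append({
            "name": name_cell.get_text(strip=True),
            "url": link.get("href") if link is not None else None,
            "category": category_cell.get_text(strip=True),
        })
    return results


for api in parse_rows(SAMPLE_HTML):
    print("Name:", api["name"], "\nCategory:", api["category"])
```

To run this against the live page, the text returned by `requests.get(url)` would be passed to `parse_rows` in place of `SAMPLE_HTML`. The relative `href` values would then need to be joined with the site root before following them to fetch the description.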

    A simpler solution: since it is a <table> tag you are after, use pandas to grab it (it can use BeautifulSoup under the hood), and it will do all the hard work for you:

    import pandas as pd
    
    url = 'https://www.programmableweb.com/apis/directory'
    table = pd.read_html(url)[0]
    

    Output:

    print (table.to_string())
                          API Name                                        Description              Category   Submitted
    0                  Google Maps  [This API is no longer available. Google Maps'...               Mapping  12.05.2005
    1                      Twitter  [This API is no longer available. It has been ...                Social  12.08.2006
    2                      YouTube  The Data API allows users to integrate their p...                 Video  02.08.2006
    3                       Flickr  The Flickr API can be used to retrieve photos ...                Photos  09.04.2005
    4                     Facebook  [This API is no longer available. Its function...                Social  08.16.2006
    5   Amazon Product Advertising  What was formerly the ECS - eCommerce Service ...             eCommerce  12.02.2005
    6                       Twilio  Twilio provides a simple hosted API and markup...             Telephony  01.09.2009
    7                      Last.fm  The Last.fm API gives users the ability to bui...                 Music  10.30.2005
    8                   Twilio SMS  Twilio provides a simple hosted API and markup...             Messaging  02.19.2010
    9          Microsoft Bing Maps  Bing Maps API and Interactive SDK features an ...               Mapping  12.02.2005
    10                 del.icio.us  From their site: del.icio.us is a social bookm...             Bookmarks  10.30.2005
    11           Google App Engine  [This API is no longer available. Its function...                 Tools  12.05.2008
    12                  Foursquare  The Foursquare Places API provides location ba...                Social  09.10.2009
    13             Google Homepage  From their site: The Google Gadgets API provid...               Widgets  12.14.2005
    14         DocuSign Enterprise  DocuSign is a Cloud based legally compliant eS...  Electronic Signature  03.29.2008
    15                   Amazon S3  Since 2006 Amazon Web Services has been offeri...               Storage  03.14.2006
    16              Google AdSense  The Google AdSense API is ideal for developers...           Advertising  06.01.2006
    17                    GeoNames  Geonames is a geographical database with web s...             Reference  01.12.2006
    18                   Wikipedia  The unofficial Wikipedia API. Because Wikipedi...             Reference  09.05.2008
    19                         Box  Box is a modern content management platform th...               Content  03.07.2006
    20                  Amazon EC2  The Amazon Elastic Compute Cloud (Amazon EC2) ...                 Cloud  08.25.2006
    21                        Bing  [The Bing API is now the Bing Web Search API. ...                Search  06.04.2009
    22                    LinkedIn  LinkedIn is the world's largest business socia...                Social  12.10.2007
    23             Instagram Graph  Instagram is a photo sharing iPhone app and se...                Photos  12.15.2010
    24                 Yelp Fusion  The Yelp Fusion APIs are RESTful APIs and user...       Recommendations  08.03.2007
    

    If you click through to the next page, you will see https://www.programmableweb.com/apis/directory?page=1, so just iterate in a for loop until the end and append to your DataFrame after each iteration.
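The pagination loop could be sketched roughly as follows. The function names `fetch_page` and `scrape_all`, the `max_pages` cap, and the stopping rule are assumptions, not part of the answer: the sketch stops when a page yields no table (`pd.read_html` raises `ValueError` in that case) or an empty one, and the site may signal its last page differently.

```python
import pandas as pd

BASE_URL = "https://www.programmableweb.com/apis/directory"


def fetch_page(page):
    """Return the first <table> on the given directory page as a DataFrame."""
    # Page 0 is the bare URL; later pages use ?page=1, ?page=2, ...
    url = BASE_URL if page == 0 else f"{BASE_URL}?page={page}"
    return pd.read_html(url)[0]


def scrape_all(fetch=fetch_page, max_pages=1000):
    """Collect pages until one has no table or no rows, then combine them.

    `fetch` is injectable so the loop can be tried without network access.
    `max_pages` is an assumed safety cap, not a known limit of the site.
    """
    frames = []
    for page in range(max_pages):
        try:
            frame = fetch(page)
        except ValueError:
            # pd.read_html raises ValueError when it finds no tables.
            break
        if frame.empty:
            break
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    # Concatenate once at the end instead of appending inside the loop.
    return pd.concat(frames, ignore_index=True)
```

Calling `scrape_all().to_csv("apis.csv", index=False)` would then write the combined table to a csv file (the file name is only an example). Collecting the frames in a list and concatenating once avoids copying the growing DataFrame on every iteration.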

    【Comments】:
