Scraping a table from a static website: answers

【Question title】: Scrape table from static web site
【Posted】: 2021-05-06 15:15:30
【Question description】:

I need to scrape the table of top-level domains from iana.org.

My code:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.iana.org/domains/root/db'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find(id='tld-table')

How can I turn it into a pandas DataFrame with the same structure as on the website (Domain, Type, TLD Manager)?

Tags: python python-3.x web-scraping beautifulsoup python-requests

【Solution 1】:

pandas already has a built-in way to read tables from HTML, so BeautifulSoup is not needed:

import pandas as pd

url = "https://www.iana.org/domains/root/db"
# This returns a list of DataFrames with all tables in the page. 
df = pd.read_html(url)[0]
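
Indexing with [0] relies on the wanted table being the first one on the page. As a minimal sketch, assuming the table keeps the id tld-table used in the question's code, pd.read_html can select it by attribute instead. The names read_table_by_id and SAMPLE_HTML are hypothetical, and the inline HTML only stands in for the downloaded page so the example runs offline. pd.read_html also needs an HTML parser such as lxml to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for the downloaded page; with the real
# site this string would be requests.get(url).text.
SAMPLE_HTML = """
<html><body>
<table id="other"><tr><th>x</th></tr><tr><td>1</td></tr></table>
<table id="tld-table">
  <thead><tr><th>Domain</th><th>Type</th><th>TLD Manager</th></tr></thead>
  <tbody>
    <tr><td>.aaa</td><td>generic</td><td>American Automobile Association, Inc.</td></tr>
    <tr><td>.aarp</td><td>generic</td><td>AARP</td></tr>
  </tbody>
</table>
</body></html>
"""


def read_table_by_id(html: str, table_id: str) -> pd.DataFrame:
    """Return the first table whose id attribute equals table_id."""
    # attrs filters the candidate <table> elements by HTML attribute;
    # read_html raises ValueError if no table matches.
    tables = pd.read_html(StringIO(html), attrs={"id": table_id})
    return tables[0]


df = read_table_by_id(SAMPLE_HTML, "tld-table")
print(df)
```

Wrapping the text in StringIO avoids passing a raw HTML string directly, which recent pandas versions warn about.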

【Solution 2】:

You can use pandas' pd.read_html:

import pandas as pd

URL = "https://www.iana.org/domains/root/db"

df = pd.read_html(URL)[0]

print(df.head())

Output:

    Domain     Type                            TLD Manager
0     .aaa  generic  American Automobile Association, Inc.
1    .aarp  generic                                   AARP
2  .abarth  generic         Fiat Chrysler Automobiles N.V.
3     .abb  generic                                ABB Ltd
4  .abbott  generic              Abbott Laboratories, Inc.
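
If you would rather keep the BeautifulSoup code from the question, the element returned by soup.find(id='tld-table') can be turned into a DataFrame by hand. The following is only a sketch under stated assumptions: table_to_dataframe and SAMPLE_HTML are hypothetical names, and the inline HTML imitates the table's layout (header cells in th, data cells in td) so the example runs without network access.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample imitating the page; with the real site this would be
# page.content from the question's requests.get(URL) call.
SAMPLE_HTML = """
<table id="tld-table">
  <thead><tr><th>Domain</th><th>Type</th><th>TLD Manager</th></tr></thead>
  <tbody>
    <tr>
      <td><span class="domain tld"><a href="/domains/root/db/aaa.html">.aaa</a></span></td>
      <td>generic</td>
      <td>American Automobile Association, Inc.</td>
    </tr>
    <tr>
      <td><span class="domain tld"><a href="/domains/root/db/aarp.html">.aarp</a></span></td>
      <td>generic</td>
      <td>AARP</td>
    </tr>
  </tbody>
</table>
"""


def table_to_dataframe(table) -> pd.DataFrame:
    """Convert a BeautifulSoup <table> element into a DataFrame."""
    # Header cells become the column names.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        if cells:  # skip the header row, which has no <td> cells
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find(id="tld-table")
df = table_to_dataframe(results)
print(df.head())
```

This is more code than pd.read_html, but it gives control over each cell, for example if you also wanted to collect the link targets from the Domain column.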
