【Posted】: 2021-02-22 09:22:47
【Question】:
I want to get a list of the branches and ATMs (only) along with their addresses.
I am trying to scrape https://www.ocbcnisp.com/en/hubungi-kami/lokasi-kami with the following code:
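```python
import re

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

URL = "https://www.ocbcnisp.com/en/hubungi-kami/lokasi-kami"

driver = webdriver.Chrome()
driver.get(URL)
try:
    # Wait until the JavaScript-rendered location cards are present
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div.ocbc-card--location"))
    )
except TimeoutException:
    print("Timed out waiting for the location cards to load")
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

Branch_list = []
Address_list = []
for card in soup.find_all('div', class_="ocbc-card ocbc-card--location"):
    # Search inside the current card, not the whole page (`soup`); otherwise
    # every card's title and address is appended once per card
    for j in card.find_all('p', class_="ocbc-card__title"):
        # Strip HTML tags with a regex, then trim surrounding whitespace
        Branch_list.append(re.sub(r'<(.*?)>', '', str(j)).strip())
    for k in card.find_all('p', class_="ocbc-card__desc"):
        Address_list.append(re.sub(r'<(.*?)>', '', str(k)).strip())

OCBC = pd.DataFrame()
OCBC['Branch_Name'] = Branch_list
OCBC['Address'] = Address_list
```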
This gives me the information I need for the first page, but I want to do the same for all pages. Can anyone suggest how?
【Comments】:
- Since you are using Selenium, you can simply click the Next button and then scrape the data on the next page. I also see no reason to use regex and BeautifulSoup.
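A minimal sketch of the approach suggested in the comment, not tested against the live site. It pages through the results with Selenium alone and reads each card through Selenium's `.text`, so neither BeautifulSoup nor the tag-stripping regex is needed. The card class names come from the question. `NEXT_SELECTOR` is a hypothetical placeholder for the pager's Next control (inspect the page to find the real selector). The sketch also assumes that this control stops matching on the last page and that changing pages replaces the card elements in the DOM. `read_cards`, `scrape_all_pages` and `main` are illustrative names.

```python
# Sketch only. Assumptions (not verified against the site):
#  - cards are div.ocbc-card--location with p.ocbc-card__title and
#    p.ocbc-card__desc children (class names taken from the question);
#  - NEXT_SELECTOR is a HYPOTHETICAL placeholder for the Next control,
#    assumed to stop matching once the last page is reached;
#  - moving to the next page replaces the card elements in the DOM.
CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR
CARD_SELECTOR = "div.ocbc-card--location"
NEXT_SELECTOR = "a.pagination__next"  # hypothetical: inspect the page


def read_cards(driver):
    """Return (branch, address) tuples for the cards on the current page."""
    rows = []
    for card in driver.find_elements(CSS, CARD_SELECTOR):
        title = card.find_element(CSS, "p.ocbc-card__title").text.strip()
        desc = card.find_element(CSS, "p.ocbc-card__desc").text.strip()
        rows.append((title, desc))
    return rows


def scrape_all_pages(driver, wait_for_page_change, max_pages=100):
    """Read every page, clicking Next until it can no longer be found.

    wait_for_page_change(old_card) must block until the next page is shown.
    """
    all_rows = []
    for _ in range(max_pages):  # guard against a pager that never ends
        all_rows.extend(read_cards(driver))
        next_buttons = driver.find_elements(CSS, NEXT_SELECTOR)
        cards = driver.find_elements(CSS, CARD_SELECTOR)
        if not next_buttons or not cards:
            break  # no Next control (or no cards): treat as the last page
        next_buttons[0].click()
        wait_for_page_change(cards[0])
    return all_rows


def main(url="https://www.ocbcnisp.com/en/hubungi-kami/lokasi-kami"):
    # Third-party imports live here so the helpers above work without them
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, 10)
        wait.until(EC.presence_of_element_located((CSS, CARD_SELECTOR)))

        def wait_for_page_change(old_card):
            # The old card goes stale once the page content is replaced...
            wait.until(EC.staleness_of(old_card))
            # ...then wait for the new page's cards to be rendered
            wait.until(EC.presence_of_element_located((CSS, CARD_SELECTOR)))

        rows = scrape_all_pages(driver, wait_for_page_change)
    finally:
        driver.quit()
    return pd.DataFrame(rows, columns=["Branch_Name", "Address"])
```

Calling `main()` opens Chrome, walks the pages, and returns a DataFrame with the same `Branch_Name` and `Address` columns as the code in the question. Passing the wait in as a callable keeps the paging loop independent of Selenium, so the loop can be checked against a fake driver before it is pointed at the real site.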
Tags: python selenium-webdriver web-scraping beautifulsoup