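【Posted】: 2020-08-08 22:16:51
【Question】:
I have just started learning web scraping. I am using Selenium for this and storing the data in an Excel sheet. The problem is that I cannot figure out how to make Selenium loop: click through to the next page and scrape its data, until there are no pages left. For context, my complete code is below.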
from selenium import webdriver
import pandas as pd
from openpyxl.workbook import Workbook
DRIVER_PATH = 'C:/Users/Neha/Downloads/chromedriver_win32/chromedriver'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get('https://www.fundoodata.com/citiesindustry/19/2/list-of-information-technology-(it)-companies-in-noida')
company_names = driver.find_elements_by_class_name('heading')
names_list = []
for name in company_names:
    text = name.text
    names_list.append(text)
    print(text)
driver.quit()
df = pd.DataFrame(names_list)
writer = pd.ExcelWriter('companies_names.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='List')
writer.save()
I want it to scrape the company names from every page. The HTML and XPath of the Next button are:
<li><a href="/citiesindustry/19/2/list-of-information-technology-(it)-companies-in-noida?&pageno=2&tot_rows=606&total_results=606&no_of_offices=0">Next</a></li>
XPath:
//*[@id="main-container"]/div[2]/div[4]/div[2]/div[1]/div/ul/li[7]/a
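A minimal sketch of the loop being asked for, under stated assumptions rather than a confirmed solution for this site. It reuses the `heading` class and the `Next` link text from the question, and it follows the link's `href` instead of relying on the positional XPath, since `li[7]` may shift from page to page. The function name `scrape_all_pages` and the `max_pages` safety cap are hypothetical, and the sketch assumes the last page no longer shows a `Next` link.

```python
def scrape_all_pages(driver, max_pages=100):
    """Collect the text of every 'heading' element, page by page.

    `driver` is an already opened Selenium WebDriver sitting on the first
    results page. `max_pages` is a hypothetical safety cap so the loop
    cannot run forever if the site keeps showing a 'Next' link.
    """
    names = []
    for _ in range(max_pages):
        # Same data as the single-page loop in the question.
        # "class name" and "link text" are the string values behind
        # selenium's By.CLASS_NAME and By.LINK_TEXT.
        for element in driver.find_elements("class name", "heading"):
            names.append(element.text)

        # Assumption: the last page has no link whose text is 'Next'.
        next_links = driver.find_elements("link text", "Next")
        if not next_links:
            break

        # Load the next page through the link's href rather than clicking,
        # which avoids reusing elements from the page being left.
        driver.get(next_links[0].get_attribute("href"))
    return names
```

With the `driver` from the question already on the first page, `names_list = scrape_all_pages(driver)` would stand in for the single-page `for` loop, and `driver.quit()` would be called afterwards. If pages load slowly, an explicit wait before reading the elements may be needed.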
【Discussion】:
Tags: python python-3.x pandas selenium web-scraping