【Posted】: 2019-11-05 05:00:36
【Question】:
Hi everyone, I am trying to scrape, but I get this error at item 59.
My xlsx file has 1089 items.
Error:
Traceback (most recent call last):
  File ".\seleniuminform.py", line 28, in <module>
    s.write(phone[i].text + "," + wevsite_link[i].text + "\n")
IndexError: list index out of range
Here is my Python code:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

with open("Sans Fransico.csv","r") as s:
    s.read()

df = pd.read_excel('myfile.xlsx') # Get all the urls from the excel
mylist = df['Urls'].tolist() #urls is the column name

driver = webdriver.Chrome()
for url in mylist:
    driver.get(url)
    wevsite_link = driver.find_elements_by_css_selector(".text--offscreen__373c0__1SeFX+ .link-size--default__373c0__1skgq")
    phone = driver.find_elements_by_css_selector(".text--offscreen__373c0__1SeFX+ .text-align--left__373c0__2pnx_")
    num_page_items = len(phone)
    with open("Sans Fransico.csv", 'a',encoding="utf-8") as s:
        for i in range(num_page_items):
            s.write(phone[i].text + "," + wevsite_link[i].text + "\n")

driver.close()
print ("Done")
Link:
https://www.yelp.com/biz/daeho-kalbijjim-and-beef-soup-san-francisco-9?osq=Restaurants
The error occurs here, for the website and phone:
【Comments】:
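-
Clearly, wevsite_link is not as long as phone. Can you explain why you expected them to be the same length? If you did not, have you considered what you want to happen in that case?
-
You should first find all the .text--offscreen__373c0__1SeFX+ elements, then use a for loop to search inside each .text--offscreen__373c0__1SeFX+ for the phone and the website to build (phone, website) pairs. If some .text--offscreen__373c0__1SeFX+ has no phone, you may sometimes get (None, website).
-
Could you please post that as an answer so I can understand it better?
-
You show an image with a website and a phone, but some items on the page may have no phone, so you may get fewer phones than websites.
-
@KarlKnechtel explained it well; that is basically what I wrote in my answer.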
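The comments come down to one point: phone and wevsite_link are collected independently, so on a page that shows one but not the other the two lists differ in length and indexing both with the same i fails. Below is a minimal sketch of one way to handle that, assuming each URL is a single business page and that a blank cell is acceptable when a value is missing. The helper names first_text, scrape_rows and save_rows and the output file name are hypothetical, and the selectors are copied from the question, so they may no longer match Yelp's generated class names. The locator strategy is passed as the plain string "css selector" (the value behind Selenium's By.CSS_SELECTOR), so the sketch relies only on find_elements(by, value).

```python
import csv

# Selectors copied from the question; treat them as placeholders because
# Yelp's generated class names change over time.
PHONE_SEL = ".text--offscreen__373c0__1SeFX+ .text-align--left__373c0__2pnx_"
SITE_SEL = ".text--offscreen__373c0__1SeFX+ .link-size--default__373c0__1skgq"


def first_text(root, selector, default=""):
    """Return the text of the first match under root, or default if none."""
    matches = root.find_elements("css selector", selector)
    return matches[0].text.strip() if matches else default


def scrape_rows(driver, urls):
    """Yield one (url, phone, website) row per page, blank where missing."""
    for url in urls:
        driver.get(url)
        yield url, first_text(driver, PHONE_SEL), first_text(driver, SITE_SEL)


def save_rows(rows, path):
    """Write rows with csv.writer so commas inside values are quoted."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "phone", "website"])
        writer.writerows(rows)


# Usage with the question's setup (not run here):
#   driver = webdriver.Chrome()
#   save_rows(scrape_rows(driver, mylist), "output.csv")
#   driver.quit()
```

Each lookup is handled on its own, so a missing phone or website produces an empty field instead of an IndexError, and every row keeps its URL so the gaps can be traced back to the page.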
Tags: python selenium loops for-loop web-scraping