【Question title】: Extract some basic data with Beautiful Soup
【Posted】: 2019-07-12 05:47:40
【Question】:

I recently started web scraping with Python, and I'm trying to extract some basic information from Instagram with Beautiful Soup.

I wrote a simple piece of code, shown below:

from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = 'http://instagram.com/umnpics/'
driver = webdriver.Firefox()
driver.get(url)

soup = BeautifulSoup(driver.page_source, 'html.parser')

for x in soup.find_all('li', {'class': 'photo'}):
    print(x)

But when I run it, I get the following exceptions:

Traceback (most recent call last):
  File "C:\Users\Mhdn\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\common\service.py", line 76, in start
    stdin=PIPE)
  File "C:\Program Files (x86)\Python37-32\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "C:\Program Files (x86)\Python37-32\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Mhdn\Desktop\test2.py", line 5, in <module>
    driver = webdriver.Firefox()
  File "C:\Users\Mhdn\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\firefox\webdriver.py", line 164, in __init__
    self.service.start()
  File "C:\Users\Mhdn\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\common\service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
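The first traceback is the root cause: Selenium starts geckodriver as a separate process, and `FileNotFoundError` means no executable named `geckodriver` could be found. A quick standard-library check, run before creating the driver, can confirm whether the name resolves on PATH. This is only a sketch; `find_driver` is a hypothetical helper, not part of Selenium.

```python
import shutil
import sys


def find_driver(name="geckodriver"):
    """Return the full path of `name` if it is found on PATH, else None.

    Hypothetical helper. On Windows, shutil.which also tries the
    extensions listed in PATHEXT, so "geckodriver" matches "geckodriver.exe".
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("geckodriver not found on PATH", file=sys.stderr)
    else:
        print("geckodriver found at", path)
```

If this prints "not found", `webdriver.Firefox()` with no arguments will fail with the same error shown above.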

【Comments】:

  • selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
  • Try passing the path explicitly: driver = webdriver.Firefox(executable_path='path/to/geckodriver')
  • Did my answer solve your problem?

Tags: selenium-webdriver web-scraping beautifulsoup instagram


【Solution 1】:
  • Download geckodriver from here to your local system.
  • In your code, pass the location of geckodriver through executable_path.
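If you would rather not hard-code the path in the `Firefox(...)` call, another option follows from the error message itself ("needs to be in PATH"): prepend the folder containing geckodriver to PATH for the current process before the driver is created. Processes started afterwards (including the geckodriver process Selenium launches) inherit this PATH. A minimal sketch; `add_to_path` and the example folder are hypothetical.

```python
import os


def add_to_path(directory):
    """Prepend `directory` to this process's PATH (hypothetical helper).

    Child processes started afterwards inherit the updated PATH.
    The directory is not added twice.
    """
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + parts)
    return os.environ["PATH"]


if __name__ == "__main__":
    # Hypothetical location: replace with the folder you extracted geckodriver into.
    add_to_path(r"C:\tools\geckodriver")
    # A webdriver.Firefox() created after this point should find geckodriver.
```

Adding the folder to the system PATH permanently (through the Windows environment variable settings) achieves the same effect for every script.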

Add executable_path to your code:

from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = 'http://instagram.com/umnpics/'
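driver = webdriver.Firefox(executable_path='path/to/geckodriver')  # <-- add the path to your geckodriver
# (executable_path is the Selenium 3 argument; Selenium 4 replaced it with a Service object)

# example (Linux): driver = webdriver.Firefox(executable_path='/home/user/Downloads/geckodriver')
# example (Windows): driver = webdriver.Firefox(executable_path=r'C:\path\to\geckodriver.exe')

driver.get(url)

soup = BeautifulSoup(driver.page_source, 'html.parser')

for x in soup.find_all('li', {'class': 'photo'}):
    print(x)

driver.quit()  # close the browser when finished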
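The loop at the end selects every `<li>` element whose class list includes `photo`. The sketch below runs the same kind of selection against a small hard-coded HTML string (hypothetical markup, not Instagram's real page) so it can be tried without a browser. It names the parser explicitly, which avoids Beautiful Soup's "no parser was explicitly specified" warning; `photo_sources` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source.
html = """
<ul>
  <li class="photo"><img src="a.jpg"></li>
  <li class="video"><img src="b.jpg"></li>
  <li class="photo large"><img src="c.jpg"></li>
</ul>
"""


def photo_sources(page_source):
    """Return the src of the <img> inside each <li class="photo">."""
    soup = BeautifulSoup(page_source, "html.parser")
    sources = []
    # class_="photo" matches any element whose classes include "photo",
    # so <li class="photo large"> is selected too.
    for li in soup.find_all("li", class_="photo"):
        img = li.find("img")
        if img is not None:
            sources.append(img.get("src"))
    return sources


print(photo_sources(html))  # ['a.jpg', 'c.jpg']
```

Instagram's live page is built with JavaScript and its markup may not contain `li` elements with a `photo` class at all, so if the loop prints nothing, printing `driver.page_source` is a reasonable first step to see what the browser actually received.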
