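【Question title】: How do I extract text information from an Angular website?
【Posted】: 2019-03-20 13:18:49
【Question description】:

I am trying to extract certain text fields from a website, but I am new to Angular. I am using Selenium to build this web scraper. I noticed that the exact text values are not stored in the HTML code. Could someone help or offer some hints for solving this? I tried using: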

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

but made no progress. Thank you :)

Here is one way I tried to extract the text:

def csc():
    alpah_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
    indexOfAlpha = 0
    indexOfSheet = 2
    for x in range(2,4):
        y = x + 2
        driver.implicitly_wait(20)
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div[2]/div/div['+ str(x) +']/div/div/div[6]/a').click()
        driver.implicitly_wait(20)
        worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a/span').click()
        ranSleep()
        indexOfSheet += 1

But I get this error in the terminal:

Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element(By.cssSelector("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']")))
AttributeError: type object 'By' has no attribute 'cssSelector'
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py 
Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']")))
TypeError: 'str' object is not callable
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py 
Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
TypeError: 'str' object is not callable

P.S. Sorry, I cannot share the website because it requires a private login.

<input class="edited_field ng-pristine ng-untouched ng-valid ng-not-empty" type="text" ng-model="tab.content.site.name" ng-disabled="!tab.content.updateBtnPermission" disabled="disabled">

Snippet of the text I want to extract, with the HTML and Angular code

Error from Qharr's suggestion

Here is the code I wrote based on Qharr's comment:

def csc():
    alpah_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
    indexOfAlpha = 0
    indexOfSheet = 2
    for x in range(2,4):
        y = x + 2
        driver.implicitly_wait(20)
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div[2]/div/div['+ str(x) +']/div/div/div[6]/a').click()
        driver.implicitly_wait(20)
        worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty'))
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a/span').click()
        ranSleep()
        indexOfSheet += 1
Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
TypeError: 'str' object is not callable
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py 
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 469, in _write
    f = float(token)
TypeError: float() argument must be a string or a number, not 'WebElement'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 67, in cell_wrapper
    return method(self, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 408, in write
    return self._write(row, col, *args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 474, in _write
    raise TypeError("Unsupported type %s in write()" % type(token))
TypeError: Unsupported type <class 'selenium.webdriver.remote.webelement.WebElement'> in write()

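【Discussion】:

  • Have you tried driver.execute_script("arguments[0].value", element)?
  • How would I use it?
  • Please read why a screenshot of HTML or code or error is a bad idea. Consider updating the question with the relevant HTML, code trials, and error stack trace as formatted text.
  • How you used these methods matters more than which methods you used. Post the complete code snippet along with the error message.
  • Thanks, updated now.

Tags: python html angularjs selenium web-scraping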
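
The `execute_script` call suggested in the first comment needs a `return` inside the script; without it the JavaScript result is discarded and Selenium hands back `None`. A minimal sketch of how it might be used, assuming `element` is an input that has already been located (`read_value_via_js` is a hypothetical helper name, not part of Selenium):

```python
def read_value_via_js(driver, element):
    """Return the live `value` property of an input element via JavaScript."""
    # `arguments[0]` is the element passed as the second argument below.
    # The `return` keyword is required for the value to reach Python.
    return driver.execute_script("return arguments[0].value;", element)
```

This reads the property the browser currently holds for the field, which is where Angular puts the bound model value, rather than anything in the static HTML source.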


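【Solution 1】:

The current error is complaining about compound class names. Try

driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty')

You may also need a wait condition, and you can probably shorten the selector to use fewer classes.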
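
The wait condition and the shorter selector could look roughly like the sketch below. It assumes the `ng-model` attribute shown in the question's HTML is enough to identify the field; `SITE_NAME_SELECTOR` and `read_site_name` are hypothetical names, and the 20 second timeout is only a placeholder.

```python
# Assumption: the ng-model attribute from the question's HTML identifies the field.
# Angular state classes such as ng-pristine or ng-not-empty change with the form
# state, so they are left out of the selector.
SITE_NAME_SELECTOR = "input[ng-model='tab.content.site.name']"


def read_site_name(driver, timeout=20):
    """Wait for the input to be present in the DOM, then return its value."""
    # Imported here so the selector constant can be reused without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # By.CSS_SELECTOR is a plain string constant, so it is passed next to the
    # selector instead of being called like a function.
    locator = (By.CSS_SELECTOR, SITE_NAME_SELECTOR)
    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
    # The text of an <input> lives in its value, not in its inner text.
    return element.get_attribute("value")
```

The field in the question is disabled, so the sketch waits for presence rather than for the element to be clickable.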

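【Discussion】:

  • This doesn't work even after I added driver.implicitly_wait(10)
  • You need to be more descriptive than "doesn't work".
  • This is the error I get in the terminal: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"input.edited_field.ng-pristine.ng-untouched.ng-valid ng-not-empty"}
  • Please edit the relevant HTML into your question. Use the snippet tool to insert it (not an image).
  • I see. These errors are not from my code but from your existing code. By the looks of it, there are several issues that need to be resolved.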
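
One of those issues is visible in the last traceback of the question: `worksheet.write()` was handed a `WebElement`, which xlsxwriter rejects as an unsupported type. A minimal sketch of writing the field's text instead, assuming `worksheet` is an xlsxwriter worksheet and `element` is an already located input (`cell_address` and `write_input_value` are hypothetical helper names):

```python
COLUMNS = "ABCDEFGHIJKLMNOP"  # same column letters as alpah_list in the question


def cell_address(column_index, row_number):
    """Build an A1-style address such as 'A2' from a column index and a row."""
    return f"{COLUMNS[column_index]}{row_number}"


def write_input_value(worksheet, element, column_index, row_number):
    """Write the input's current value (a string) into the given cell."""
    # Pass the element's value, not the WebElement object itself.
    value = element.get_attribute("value") or ""
    worksheet.write(cell_address(column_index, row_number), value)
    return value
```

Inside the loop from the question this would stand in for the failing `worksheet.write(...)` line, for example `write_input_value(worksheet, element, indexOfAlpha, indexOfSheet)`.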