【Question title】: How to get the text from the <p> tag using XPath, Selenium and Python
【Posted】: 2020-07-15 23:08:06
【Question description】:

I need to use XPath to capture one line of the text inside a <p>. I need to store the text Content-type: text/plain; charset=us-ascii in a Python variable, but I get the following error:

selenium.common.exceptions.WebDriverException: Message: TypeError: Expected an element or WindowProxy, got: [object Text] {}

This is the code I am trying:

import selenium.webdriver as webdriver

browser = webdriver.Firefox()
browser.get('https://www.w3.org/Protocols/rfc1341/7_1_Text.html')

foo = browser.find_element_by_xpath('/html/body/p[5]/text()')
print(foo)

<h1>7.1  The Text Content-Type</h1>
<p>
The text Content-Type is intended for sending material which
is  principally textual in form.  It is the default Content-
Type.  A "charset" parameter may be  used  to  indicate  the
character set of the body text.  The primary subtype of text
is "plain".  This indicates plain (unformatted)  text.   The
default  Content-Type  for  Internet  mail  is  "text/plain;
charset=us-ascii".
<p>
Beyond plain text, there are many formats  for  representing
what might be known as "extended text" -- text with embedded
formatting and  presentation  information.   An  interesting
characteristic of many such representations is that they are
to some extent  readable  even  without  the  software  that
interprets  them.   It is useful, then, to distinguish them,
at the highest level, from such unreadable data  as  images,
audio,  or  text  represented in an unreadable form.  In the
absence  of  appropriate  interpretation  software,  it   is
reasonable to show subtypes of text to the user, while it is
not reasonable to do so with most nontextual data.
<p>
Such formatted textual  data  should  be  represented  using
subtypes  of text.  Plausible subtypes of text are typically
given by the common name of the representation format, e.g.,
"text/richtext".
<p>
<h3>7.1.1     The charset parameter</h3>
<p>
A critical parameter that may be specified in  the  Content-
Type  field  for  text  data  is the character set.  This is
specified with a "charset" parameter, as in:
<p>
     Content-type: text/plain; charset=us-ascii
<p>
Unlike some  other  parameter  values,  the  values  of  the
charset  parameter  are  NOT  case  sensitive.   The default
character set, which must be assumed in  the  absence  of  a
charset parameter, is US-ASCII.

【Question discussion】:

    Tags: python selenium selenium-webdriver xpath getattribute


    【Solution 1】:

    To print the text Content-type: text/plain; charset=us-ascii you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following Locator Strategies:

    • Using XPATH and the text attribute:

      driver.get("https://www.w3.org/Protocols/rfc1341/7_1_Text.html")
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[contains(., 'The charset parameter')]//following-sibling::p[2]"))).text)
      
    • 使用XPATHget_attribute()

      driver.get("https://www.w3.org/Protocols/rfc1341/7_1_Text.html")
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[contains(., 'The charset parameter')]//following-sibling::p[2]"))).get_attribute("innerHTML"))
      
    • Console output:

      Content-type: text/plain; charset=us-ascii
      
    • Note: you have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
      

    【Discussion】:

    • Thanks for your answer. It works for this example, but now I'm trying to apply it to another example and I can't capture the "User" line: <br /> Hi User!,<br /> Your apply has been accepted!.<br /> <br /> Take your data:<br /> --------------------------<br /> ??? User: name<br /> ??? Pass: 12345678<br /> --------------------------<br /> Good Bye<br /> <br />
    • I get TypeError: 'str' object is not callable
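    Regarding the follow-up: TypeError: 'str' object is not callable usually means .text was called as a method (.text()); in Selenium's Python bindings .text is a property that already holds a str. In that string each <br /> is rendered as a line break, so one way to pick out the "User" value is to split the text into lines and look for the label. This is a plain-Python sketch: the helper name value_after_label is hypothetical, the sample string is copied from the comment above, and it assumes the element containing the message has already been located.

```python
from typing import Optional


def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the text after `label` on the first line that contains it.

    `text` is assumed to be the string from a WebElement's .text property,
    where each <br /> has been rendered as a newline.
    """
    for line in text.splitlines():
        # The label may be preceded by other characters (e.g. a bullet or icon),
        # so search inside the line instead of requiring it at the start.
        index = line.find(label)
        if index != -1:
            return line[index + len(label):].strip()
    return None


sample = (
    "Hi User!,\n"
    "Your apply has been accepted!.\n"
    "\n"
    "Take your data:\n"
    "--------------------------\n"
    "??? User: name\n"
    "??? Pass: 12345678\n"
    "--------------------------\n"
    "Good Bye"
)
print(value_after_label(sample, "User:"))  # → name
print(value_after_label(sample, "Pass:"))  # → 12345678
```

    With Selenium, pass the property without parentheses, e.g. value_after_label(element.text, "User:").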

    【Solution 2】:

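    The text() step in the XPath is the problem here: it selects a text node, but Selenium's find_element methods can only return elements, hence the "got: [object Text]" error. Select the <p> element itself and read its .text property instead: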

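    import selenium.webdriver as webdriver
    from selenium.webdriver.common.by import By
    
    browser = webdriver.Firefox()
    browser.get('https://www.w3.org/Protocols/rfc1341/7_1_Text.html')
    
    # Locate the <p> element itself (no /text() step): find_element only returns elements
    foo = browser.find_element(By.XPATH, '/html/body/p[5]')
    # .text is a property holding the element's rendered text
    print(foo.text)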
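    If you specifically want the result of an XPath that ends in text(), one option is to let the browser evaluate it through execute_script, since find_element cannot return text nodes. A minimal sketch, assuming driver is a Selenium WebDriver instance; execute_script and the DOM's document.evaluate with XPathResult.STRING_TYPE are standard, while the helper name xpath_string is hypothetical:

```python
def xpath_string(driver, xpath):
    """Return the XPath string value of the first node matched by `xpath`.

    `driver` is assumed to be a Selenium WebDriver. Unlike find_element,
    document.evaluate can match non-element nodes such as text nodes;
    XPathResult.STRING_TYPE converts the first match to a string.
    """
    script = (
        "return document.evaluate("
        "arguments[0], document, null, XPathResult.STRING_TYPE, null"
        ").stringValue;"
    )
    # Extra execute_script arguments are exposed to the page as arguments[0], ...
    return driver.execute_script(script, xpath)
```

    For example, xpath_string(browser, "//h3[contains(., 'The charset parameter')]/following-sibling::p[2]/text()").strip() should return the Content-type line; .strip() removes the newline and indentation that surround it in the page source.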
    

    【Discussion】:
