Selenium C#: scraping information from a webpage (answers)

【Question title】: Selenium c# scraping information from a webpage
【Posted】: 2021-07-03 07:41:48
【Question description】:

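I'm trying to run the code from the Selenium documentation at https://www.selenium.dev/documentation/en/. The only change I made is that I'm using the Chrome driver instead of the Firefox driver. The error I get is OpenQA.Selenium.NoSuchElementException: 'no such element: Unable to locate element: {"method":"css selector","selector":"h3>div"}'. I think this means the h3>div element can't be found, and the only reason I can think of is that I need to accept cookies before the element can be found.

I tried printing the page source to look for h3 or div tags, but the page source is too large to fit in my terminal.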

```
using (IWebDriver driver = new ChromeDriver())
{
    WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
    driver.Navigate().GoToUrl("https://www.google.com/ncr");
    driver.FindElement(By.Name("q")).SendKeys("cheese" + Keys.Enter);
    wait.Until(webDriver => webDriver.FindElement(By.CssSelector("h3>div")).Displayed);
    IWebElement firstResult = driver.FindElement(By.CssSelector("h3>div"));
    Console.WriteLine(firstResult.GetAttribute("textContent"));
}
```

The code crashes on line 6, the one starting with wait.Until, with the error message shown above.

Thanks for any help you can offer!

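【Question comments】:

Please edit your question to include a representative sample of the HTML. If the target element sits inside a frame or iframe, include that as well.

Tags: c# selenium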

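【Solution 1】:

The main problem is that the locator h3>div no longer exists on the search results page. That code is probably old, and google.com is probably not the best site to use for a code sample, since they redesign it often. You can replace the locator with h3 and the code should work.

Since you're new, let me make a few suggestions.

Test your locators in the browser. You can do this in the Chrome dev console, using $$() for CSS selectors and $x() for XPath. In your case, you would type $$("h3>div") into the console and see that it returns 0 elements. Now try $$("h3") and see that it returns 20 elements. Read more about the Chrome dev console.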
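The same locator check can also be run from C# rather than the console. The following is only a rough sketch: the class name `LocatorCheck` is hypothetical, and it assumes that navigating straight to `https://www.google.com/search?q=cheese` lands on a results page (a consent page or a redesign would change the counts). `FindElements` returns an empty collection instead of throwing when nothing matches, so it is safe for probing.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Hypothetical helper class, not part of the original answer.
class LocatorCheck
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            // Assumption: going directly to the results URL skips the search box
            // and lets the driver wait for the page load before we probe it.
            driver.Url = "https://www.google.com/search?q=cheese";

            foreach (string selector in new[] { "h3>div", "h3" })
            {
                // FindElements never throws NoSuchElementException;
                // a locator that matches nothing yields Count == 0.
                int count = driver.FindElements(By.CssSelector(selector)).Count;
                Console.WriteLine($"{selector}: {count} element(s)");
            }
        }
    }
}
```

A count of 0 for a selector tells you the locator is wrong for the current page before you build any waits around it.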
That code doesn't really demonstrate best practices: it follows some but falls short on others. If I were writing it, it would look like this:
```
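using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;
// Note: in Selenium 4, ExpectedConditions ships in the DotNetSeleniumExtras.WaitHelpers package (using SeleniumExtras.WaitHelpers;).
using (IWebDriver driver = new ChromeDriver())
{
    driver.Url = "https://www.google.com/ncr";
    driver.FindElement(By.Name("q")).SendKeys("cheese" + Keys.Enter);
    // Create the wait right where it is used.
    WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
    // ElementIsVisible returns the element once it is displayed, so no second lookup is needed.
    IWebElement firstResult = wait.Until(ExpectedConditions.ElementIsVisible(By.CssSelector("h3")));
    Console.WriteLine(firstResult.Text);
}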
```
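Here are the changes I made:
1. WebDriverWait should be defined where you use it, not at the top of the script.
2. Use ExpectedConditions so you don't have to write your own custom waits for common things. See the docs for more info.
3. .ElementIsVisible() returns the element it waited for, so you don't have to wait (hitting the page), then scrape the page (hitting the page again), and then print the text.
4. Use .Text instead of .GetAttribute("textContent"). .Text does (basically) the same thing and has worked in every case I've run into. If you misspell or miscapitalize "textContent", you won't find out until you run the test. It's simply faster, better, and easier to use .Text until you hit the rare case where it doesn't work.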
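
If ExpectedConditions is not available in your project (in Selenium 4 for .NET it is no longer part of Selenium.Support and lives in the separate DotNetSeleniumExtras.WaitHelpers package), the wait-and-return idea from point 3 can be written as a plain lambda. This is a sketch rather than the answerer's code; the class name `WaitSketch` is hypothetical, and the `h3` locator is only valid for as long as Google's results page keeps that markup.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

// Hypothetical class name, used only for this illustration.
class WaitSketch
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Url = "https://www.google.com/ncr";
            driver.FindElement(By.Name("q")).SendKeys("cheese" + Keys.Enter);

            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));

            // Until() keeps polling while the lambda returns null. WebDriverWait
            // ignores NotFoundException (the base of NoSuchElementException) by
            // default, so a missing element just triggers another poll.
            IWebElement firstResult = wait.Until(d =>
            {
                IWebElement candidate = d.FindElement(By.CssSelector("h3"));
                return candidate.Displayed ? candidate : null;
            });

            Console.WriteLine(firstResult.Text);
        }
    }
}
```

If the element never becomes visible within the 10 seconds, Until() throws WebDriverTimeoutException, which makes a bad locator surface as a timeout rather than as a raw NoSuchElementException.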

【Comments】: