Eliminating duplicate links on a webpage and avoiding the stale link error: answers

【Question title】: eliminating duplicate links on the webpage and avoid link is stale error
【Posted】: 2019-09-06 01:08:47
【Question】:

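I have a list of 20 links, some of which are duplicates. I click the first link, which takes me to the next page, and I download some files from that page.

Page 1

Link 1
Link 2
Link 3
Link 1
Link 3
Link 4
Link 2

Link 1 (click) --> (opens) Page 2

Page 2 (click the browser back button) --> (back to) Page 1

Now I click Link 2 and repeat the same thing.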

System.setProperty("webdriver.chrome.driver", "C:\\chromedriver.exe");
String fileDownloadPath = "C:\\Users\\Public\\Downloads";

// Set properties to suppress popups
Map<String, Object> prefsMap = new HashMap<String, Object>();
prefsMap.put("profile.default_content_settings.popups", 0);
prefsMap.put("download.default_directory", fileDownloadPath);
prefsMap.put("plugins.always_open_pdf_externally", true);
prefsMap.put("safebrowsing.enabled", "false");

// Assign driver properties
ChromeOptions option = new ChromeOptions();
option.setExperimentalOption("prefs", prefsMap);
option.addArguments("--test-type");
option.addArguments("--disable-extensions");
option.addArguments("--safebrowsing-disable-download-protection");
option.addArguments("--safebrowsing-disable-extension-blacklist");

WebDriver driver = new ChromeDriver(option);
driver.get("http://www.mywebpage.com/");

List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')]"));
Thread.sleep(500);

pageSize = listOfLinks.size();

System.out.println("The number of links in the page is: " + pageSize);

// Iterate through all the links on the page
for (int i = 0; i < pageSize; i++)
{
    System.out.println("Clicking on link: " + i);
    try
    {
        linkText = listOfLinks.get(i).getText();
        listOfLinks.get(i).click();
    }
    catch (org.openqa.selenium.StaleElementReferenceException ex)
    {
        // The old references went stale after navigating back, so look the links up again
        listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')]"));
        linkText = listOfLinks.get(i).getText();
        listOfLinks.get(i).click();
    }
    try
    {
        driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();
    }
    catch (org.openqa.selenium.NoSuchElementException ee)
    {
        // Nothing to download on this page, go back and try the next link
        driver.navigate().back();
        Thread.sleep(300);
        continue;
    }
    Thread.sleep(300);
    driver.navigate().back();
    Thread.sleep(100);
}

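The code works fine: it clicks all the links and downloads the files. Now I need to improve the logic to skip the duplicate links. I tried to filter out the duplicates in the list, but then I am not sure how I should handle the org.openqa.selenium.StaleElementReferenceException. The solution I am looking for is to click the first occurrence of a link and, if it appears again, avoid clicking it.

(This is part of a complex piece of logic that downloads multiple files from a portal I have no control over. So please don't come back with questions such as why there are duplicate links on the page in the first place.)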

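【Comments on the question】:

Hi, what if you add the links you have already visited to a separate variable and, before navigating, check whether the next link is already in the visited list?
Check my answer, which explains in detail how to get only the unique links and how to handle stale elements. Let me know if you have any questions.

Tags: java selenium selenium-webdriver xpath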

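【Solution 1】:

First, I don't recommend sending repeated requests to the WebDriver (findElements). You will see a lot of performance problems, especially if you have many links and pages.

Also, if you always do everything in the same tab, you have to wait for two page loads per link (the page with the links and the download page). If you open each link in a new tab instead, you only have to wait for the page you are downloading from.

My suggestion is to select only the distinct links, as @supputuri said, and open each one in a NEW tab. That way you don't need to deal with stale elements, you don't need to search the page for the links every time, and you don't need to wait for the page with the links to reload on every iteration.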

List<WebElement> uniqueLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));

for ( int i = 0; i < uniqueLinks.size(); i++)
{
    new Actions(driver)
         .keyDown(Keys.CONTROL)
         .click(uniqueLinks.get(i))
         .keyUp(Keys.CONTROL)
         .build()
         .perform();
    // if you want, you can create the list of window handles here on this line instead of creating it inside the call below.
    driver.switchTo().window(new ArrayList<>(driver.getWindowHandles()).get(1));
    //do your wait stuff.
    driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();
    //do your wait stuff.
    driver.close();
    driver.switchTo().window(new ArrayList<>(driver.getWindowHandles()).get(0));
}

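I can't test my code properly right now. If you run into any problem with it, just leave a comment and I will update the answer, but the idea is sound and quite simple.

【Comments】:

I like the idea of opening the links in a new tab, but we may have to think about how the href gets opened in a new tab. To do that, you have to get the href (rather than clicking the link), use JS to open a new window or tab with that href, and handle the new window for the download (which might give you some issues when running in IE). There are also the resources each window instance consumes (Selenium treats each tab as a separate window).
I don't know the application the OP is working with, but applications usually have caching, which should keep performance from suffering. It is up to the OP how to implement this, but if you go this route, those are the two things I think you should watch out for.
@Spencer Thanks for the great solution. driver.switchTo().window(driver.getWindowHandles().get(1)) gives a syntax error. Is get(1) the correct syntax?
Try driver.switchTo().window(new ArrayList<String> (driver.getWindowHandles()).get(1)); FYI, this might not work in FF and IE.
@supputuri It just opens the first link in the same window. I have 14 unique links now and expected it to open 14 new tabs. I am using Chrome.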
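
The syntax error discussed in these comments comes from getWindowHandles() returning a Set<String>, which has no get(int). Copying the set into a list and taking index 1 compiles, but it relies on the order of the handles. A possible alternative, sketched below with plain strings, is to take the set of handles before and after opening the tab and pick the one that is new. The class name, the method name and the handle strings are made up for illustration; in a real script the two sets would come from driver.getWindowHandles().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class NewTabHandle {
    /** Returns the single handle that is in 'after' but not in 'before'. */
    static String findNewHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        if (added.size() != 1) {
            // Also covers the case where the link opened in the same window and no tab was added
            throw new IllegalStateException("expected exactly one new window, found " + added.size());
        }
        return added.iterator().next();
    }

    public static void main(String[] args) {
        // Hypothetical handle values standing in for driver.getWindowHandles()
        Set<String> before = new HashSet<>(Arrays.asList("CDwindow-AAA"));
        Set<String> after = new HashSet<>(Arrays.asList("CDwindow-AAA", "CDwindow-BBB"));
        System.out.println(findNewHandle(before, after)); // prints CDwindow-BBB
    }
}
```

The exception branch makes the situation from the last comment visible: if the click did not open a new tab, the script stops instead of switching to the wrong window.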

【Solution 2】:

First, let's look at the xpath.

Sample HTML:

<!DOCTYPE html>
<html>
	<body>
	<div>
		<a href='https://google.com'>Google</a>
		<a href='https://yahoo.com'>Yahoo</a>
		<a href='https://google.com'>Google</a>
		<a href='https://msn.com'>MSN</a>
	</div>
	</body>
</html>

Now the xpath that gets the distinct links from the HTML above.

//a[not(@href = following::a/@href)]

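The logic in the xpath is to make sure a link's href does not match the href of any following link. If it does match, the link is considered a duplicate and the xpath does not return that element. In other words, for an href that appears several times, the element returned is its last occurrence.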
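
To see which elements the expression keeps, it can be evaluated outside the browser. The sketch below runs it against the sample HTML with the JDK's built-in XPath 1.0 engine, on the assumption that the markup is well-formed XML; that should match what Selenium's By.xpath does for this expression, but it is not Selenium itself. The second expression, which uses preceding:: instead of following::, is not part of the answer. It is shown as a possible variant because the question asks for the first occurrence of each link. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UniqueLinkXPathDemo {
    // The sample page from the answer, written as well-formed XML
    static final String SAMPLE =
        "<html><body><div>"
        + "<a href='https://google.com'>Google</a>"
        + "<a href='https://yahoo.com'>Yahoo</a>"
        + "<a href='https://google.com'>Google</a>"
        + "<a href='https://msn.com'>MSN</a>"
        + "</div></body></html>";

    /** Returns the text of every node the expression selects, in document order. */
    static List<String> select(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Keeps the LAST occurrence of each href: prints [Yahoo, Google, MSN]
        System.out.println(select("//a[not(@href = following::a/@href)]"));
        // Variant that keeps the FIRST occurrence: prints [Google, Yahoo, MSN]
        System.out.println(select("//a[not(@href = preceding::a/@href)]"));
    }
}
```

Both expressions return three links. They differ only in which of the two Google links survives, which matters if the order of the downloads is important.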

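Stale elements: Now it's time to handle the stale element issue in your code. The moment you click Link 1, all the references stored in listOfLinks become invalid, because Selenium assigns new references to the elements every time the page loads. When you try to access an element through an old reference, you get the stale element exception. Here is a code snippet that should give you an idea.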

List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));
Thread.sleep(500);
pageSize = listOfLinks.size();
System.out.println( "The number of links in the page is: " + pageSize);
//iterate through all the links on the page
for ( int i = 0; i < pageSize; i++)
{
    // ===> consider adding an explicit wait (WebDriverWait) until the link elements matching "//a[contains(@href,'Link')][not(@href = following::a/@href)]" are present
    // don't hard-code the sleep
    // ===> added this line
    WebElement link = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]")).get(i);
    System.out.println( "Clicking on link: " + i );
    // ===> updated next 2 lines
    linkText = link.getText();
    link.click();
    // ===> consider adding explicit wait using WebDriverWait to make sure the span exist before clicking. 
    driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();
    // ===> check this answer (https://stackoverflow.com/questions/34548041/selenium-give-file-name-when-downloading/56570364#56570364) to make sure the download has completed before clicking browser back, rather than sleeping for x seconds.
    driver.navigate().back();
    // ===>  removed hard coded wait time (sleep)
}

Edit 1:

If you want to open the links in a new window, use the logic below.

WebDriverWait wait = new WebDriverWait(driver, 20);
wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]")));
List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));
JavascriptExecutor js = (JavascriptExecutor) driver;
for (WebElement link : listOfLinks) {
    // get the href
    String href = link.getAttribute("href");
    // open the link in a new tab
    js.executeScript("window.open('" + href + "')");
    // switch to the new tab
    ArrayList<String> tabs = new ArrayList<String>(driver.getWindowHandles());
    driver.switchTo().window(tabs.get(1));
    // click on download (same locator as in the snippet above)
    driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();
    // close the new tab
    driver.close();
    // switch back to the parent window
    driver.switchTo().window(tabs.get(0));
}

【Comments】:

Added information on using the xpath to get the distinct link elements (excluding the duplicate links).

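【Solution 3】:

You can do it like this.

Save the index of each element of the list in a Hashtable
If the Hashtable already contains the element, skip it
Once done, the HT holds only the unique elements, i.e. the first occurrence of each

The values of the HT are the indexes from listOfLinks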

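// needs: import java.util.Hashtable;
// key = link text, value = index of its first occurrence in listOfLinks
Hashtable<String, Integer> hs1 = new Hashtable<>();
for (int i = 0; i < listOfLinks.size(); i++) {
    String text = listOfLinks.get(i).getText();
    // only the first occurrence of each text is stored
    if (!hs1.containsKey(text)) {
        hs1.put(text, i);
    }
}
// note: Hashtable does not guarantee that values() follows the page order
for (int i : hs1.values()) {
    listOfLinks.get(i).click();
}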
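
The stale references that the question runs into can possibly be avoided altogether by working with the href strings instead of the WebElement objects, because a string does not become stale when the page reloads. The sketch below shows only the de-duplication step, on plain strings. It assumes the links navigate through their href, so that in the real script the list could be filled with getAttribute("href") before the first click and each unique href opened with driver.get(href). The class name, the method name and the sample hrefs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class FirstOccurrenceLinks {
    /** Returns the hrefs in page order, keeping only the first occurrence of each. */
    static List<String> firstOccurrences(List<String> hrefs) {
        // LinkedHashSet drops repeated values and remembers insertion order
        return new ArrayList<>(new LinkedHashSet<>(hrefs));
    }

    public static void main(String[] args) {
        // Hypothetical hrefs mirroring Page 1 from the question
        List<String> hrefs = Arrays.asList(
            "Link1", "Link2", "Link3", "Link1", "Link3", "Link4", "Link2");
        for (String href : firstOccurrences(hrefs)) {
            // In the real script, driver.get(href) and the download click would go here
            System.out.println(href);
        }
    }
}
```

Unlike Hashtable, LinkedHashSet keeps the order in which the links appear on the page, so the downloads would run in page order.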

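【Comments】:

Hi @Arun, I tried it this way. The problem is the exception part. After I click the back button and return to Page 1, the list of links is already stale. There should be a better way to handle this that I can't figure out. catch(org.openqa.selenium.StaleElementReferenceException ex) { listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')]")); linkText = listOfLinks.get(i).getText(); listOfLinks.get(i).click(); }
@Prem FYI, when you click an element and the elements on the page are reloaded, Selenium refreshes the element references, so you can no longer use the old ones. Check my answer below and share your thoughts/comments, if any.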