Selenium isn't pulling the same HTML that is on the site

【Question title】: Selenium isn't pulling the same HTML that is on the site
【Posted】: 2015-03-02 00:50:29
【Question description】:

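I'm working on a project that connects to a grade viewer and pulls the HTML from the site. When it does so, however, it seems to lose some of the content. I connect to the page and print the page source with Selenium WebDriver, but the HTML it pulls is slightly different from the HTML I see on the page: small pieces are missing here and there. Here is my code: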

    // Switch into the iframe that holds the grades
    driver.switchTo().frame(driver.findElement(By.id("sg-legacy-iframe")));
    WebElement full = driver.findElement(By.id("btnView"));
    full.submit(); // trigger the "show full view" button

    // Print out the source
    PrintWriter pw = new PrintWriter(new FileWriter(new File("grades.txt")));
    pw.println(driver.getCurrentUrl()); // confirms the driver is on the correct page
    pw.println(driver.getPageSource()); // prints out the HTML
    pw.close();

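I suspect it may be some kind of cookie problem when switching between the page and the iframe, but I really don't know. I also have a copy of the correct HTML it should be fetching, as well as the actual output, but both are too large to fit in the post body. Below are links to the expected HTML and the output HTML; any confidential information has been changed. The main problem is that the "AssignmentClass" div cannot be found.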

Desired HTML Output(HTML of the site)

HTML being output by my program

If anyone can explain why this is happening or how to fix it, I will love you forever.

【Question discussion】:

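How are you comparing the HTML? And how did you get the HTML into the txt file? Is it the same version?
For the desired output, I just connected to the site and copy-pasted the HTML into a text file. The actual output was produced entirely by Java. They are very similar; I just scrolled through them side by side to compare. After the "…" line, the documents start to differ.
I'm afraid the two versions of the HTML you are comparing are not the same. As I understand it, the HTML from the page source and the HTML seen on the client side are not necessarily identical.
It is the same version of the HTML; both come from exactly the same page. My output is just a subsequence of the desired output.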
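
Rather than scrolling through the two dumps side by side, the first point of divergence can be located programmatically. The following is a minimal sketch using only the Java standard library; the class name `FirstDiff` and the method `firstDifferingLine` are hypothetical helpers, not part of Selenium, and the two file paths are assumed to be passed on the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class FirstDiff {
    /**
     * Returns the 1-based number of the first line that differs between the
     * two dumps, or -1 if they are identical (ignoring leading and trailing
     * whitespace on each line).
     */
    public static int firstDifferingLine(List<String> expected, List<String> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            // Trim so that indentation changes from copy-pasting do not count.
            if (!expected.get(i).trim().equals(actual.get(i).trim())) {
                return i + 1;
            }
        }
        // If one dump is a prefix of the other, the first extra line differs.
        return expected.size() == actual.size() ? -1 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java FirstDiff <expected.html> <actual.html>");
            return;
        }
        List<String> expected = Files.readAllLines(Paths.get(args[0]));
        List<String> actual = Files.readAllLines(Paths.get(args[1]));
        int line = firstDifferingLine(expected, actual);
        System.out.println(line < 0 ? "No differences" : "First difference at line " + line);
    }
}
```

Note that the question's code writes the current URL as the first line of grades.txt, so that line would need to be removed before comparing against a dump copied from the browser.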

Tags: java html selenium webdriver

【Solution 1】:

In one of my past projects I used getAttribute() to get the HTML source. Have you tried something like this?

    driver.switchTo().frame(driver.findElement(By.id("sg-legacy-iframe")));
    WebElement full = driver.findElement(By.id("btnView"));
    full.submit(); // trigger the "show full view" button
    WebElement body = driver.findElement(By.tagName("body"));

    // Print out the source
    PrintWriter pw = new PrintWriter(new FileWriter(new File("grades.txt")));
    pw.println(driver.getCurrentUrl()); // confirms the driver is on the correct page
    pw.println(body.getAttribute("innerHTML")); // prints out the HTML of the live DOM
    pw.close();
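
One possible reason for the missing pieces, which this thread does not confirm, is that the full view is still being rendered when the source is read right after submit(). Under that assumption, the sketch below polls a source supplier until a marker string such as "AssignmentClass" appears or a timeout expires. `SourcePoller` and `waitForMarker` are hypothetical names, and the supplier in main is a stand-in for a real call such as driver::getPageSource.

```java
import java.util.function.Supplier;

public class SourcePoller {
    /**
     * Repeatedly reads the source until it contains the marker or the
     * timeout elapses. Returns the matching source, or null on timeout
     * or if the thread is interrupted.
     */
    public static String waitForMarker(Supplier<String> sourceSupplier, String marker,
                                       long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            String source = sourceSupplier.get();
            if (source != null && source.contains(marker)) {
                return source;
            }
            if (System.currentTimeMillis() >= deadline) {
                return null;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return null;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for driver::getPageSource: the marker shows up on the third read.
        int[] reads = {0};
        Supplier<String> fakeSource = () -> ++reads[0] < 3
                ? "<body></body>"
                : "<body><div class=\"AssignmentClass\"></div></body>";

        String html = waitForMarker(fakeSource, "AssignmentClass", 2000, 50);
        System.out.println(html != null
                ? "Marker found after " + reads[0] + " reads"
                : "Timed out waiting for marker");
    }
}
```

With a real driver, the supplier could be driver::getPageSource or a lambda that re-reads the body's innerHTML. Selenium's own WebDriverWait combined with ExpectedConditions serves the same purpose and would normally be preferred over a hand-rolled loop.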

【Discussion】: