【Question title】: Webbrowser behaviour issues
【Posted】: 2013-09-05 12:40:46
【Question description】:

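I'm trying to automate a web browser with .NET C#. The problem is that the control, or I should say the IE browser, behaves strangely on different computers. For example, on the first computer I can click a link and fill in an Ajax popup form without any errors: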

private void btn_Start_Click(object sender, RoutedEventArgs e)
{
    webbrowserIE.Navigate("http://www.test.com/");
    webbrowserIE.DocumentCompleted += fillup_LoadCompleted; 
}

void fillup_LoadCompleted(object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e)
{
    System.Windows.Forms.HtmlElement ele = webbrowserIE.Document.GetElementById("login");
    if (ele != null)
        ele.InvokeMember("Click");

    if (this.webbrowserIE.ReadyState == System.Windows.Forms.WebBrowserReadyState.Complete)
    {
        webbrowserIE.Document.GetElementById("login").SetAttribute("value", myUserName);
        webbrowserIE.Document.GetElementById("password").SetAttribute("value", myPassword);

        foreach (System.Windows.Forms.HtmlElement el in webbrowserIE.Document.GetElementsByTagName("button"))
        {
            if (el.InnerText == "Login")
            {
                el.InvokeMember("click");
            }
        }

        webbrowserIE.DocumentCompleted -= fillup_LoadCompleted;
    }
}

However, the code above does not work on the second computer; the only way I can get the click to work there is like this:

private void btn_Start_Click(object sender, RoutedEventArgs e)
{
    webbrowserIE.DocumentCompleted += click_LoadCompleted;
    webbrowserIE.Navigate("http://www.test.com/"); 
}

void click_LoadCompleted(object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e)
{
    if (this.webbrowserIE.ReadyState == System.Windows.Forms.WebBrowserReadyState.Complete)
    {
        System.Windows.Forms.HtmlElement ele = webbrowserIE.Document.GetElementById("login");
        if (ele != null)
            ele.InvokeMember("Click");

        webbrowserIE.DocumentCompleted -= click_LoadCompleted;
        webbrowserIE.DocumentCompleted += fillup_LoadCompleted;
    }
}

void fillup_LoadCompleted(object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e)
{
    webbrowserIE.Document.GetElementById("login_login").SetAttribute("value", myUserName);
    webbrowserIE.Document.GetElementById("login_password").SetAttribute("value", myPassword);

    // Find the "Login" button by its text and click it
    foreach (System.Windows.Forms.HtmlElement el in webbrowserIE.Document.GetElementsByTagName("button"))
    {
        if (el.InnerText == "Login")
        {
            el.InvokeMember("click");
        }
    }

    webbrowserIE.DocumentCompleted -= fillup_LoadCompleted;
}

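So in the second solution I have to chain two DocumentCompleted handlers. Can anyone suggest how I should deal with this? Suggestions for a more robust approach would also be very helpful. Thanks in advance.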

【Question discussion】:

    Tags: c# browser web-scraping webbrowser-control screen-scraping


    【Solution 1】:

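    I can recommend two things:

    • Don't run your code on the DocumentCompleted event; run it on the DOM window.onload event instead.
    • To make sure your web page behaves in the WebBrowser control the same way it does in the full Internet Explorer browser, consider implementing Feature Control.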
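    Feature Control settings are per-executable registry values. The sketch below assumes the relevant one is FEATURE_BROWSER_EMULATION, which selects the IE document mode the WebBrowser control uses for your process (by default the control renders in IE7 mode), and that writing under HKEY_CURRENT_USER is acceptable. BrowserEmulation, Enable and ValueNameFor are hypothetical names; this is Windows-only and should run before the first WebBrowser control is created.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Microsoft.Win32;

// Hypothetical helper: opts the current executable into a newer IE
// document mode via the FEATURE_BROWSER_EMULATION feature control key.
public static class BrowserEmulation
{
    public const string KeyPath =
        @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION";

    // 11000 = IE11 standards mode for pages with a standards <!DOCTYPE>
    // (10000 = IE10, 9000 = IE9, 7000 = the control's IE7 default).
    public const int Ie11Mode = 11000;

    // The registry value name is the executable's file name, e.g. "MyApp.exe".
    public static string ValueNameFor(string exePath)
    {
        return Path.GetFileName(exePath);
    }

    // Writes the DWORD for the running executable. HKCU needs no admin rights.
    // Windows only; call before any WebBrowser control is created.
    public static void Enable(int mode)
    {
        string exePath = Process.GetCurrentProcess().MainModule.FileName;
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(KeyPath))
        {
            key.SetValue(ValueNameFor(exePath), mode, RegistryValueKind.DWord);
        }
    }
}
```

    Using the running module's file name means the value also matches when the app is started under a differently named host executable (for example the Visual Studio hosting process), rather than a hard-coded name.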

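    [EDITED] One more suggestion, based on the structure of your code. Apparently you perform a sequence of navigate / handle-DocumentCompleted operations. Using async/await for this may be more natural and easier. Here's an example of doing it, with and without async/await. It also illustrates how to handle onload: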

    async Task DoNavigationAsync()
    {
        bool documentComplete = false;
        TaskCompletionSource<bool> onloadTcs = null;
    
        WebBrowserDocumentCompletedEventHandler handler = delegate 
        {
            if (documentComplete)
                return; // attach to onload only once per each Document
            documentComplete = true;
    
            // now subscribe to DOM onload event
            this.wb.Document.Window.AttachEventHandler("onload", delegate
            {
                // each navigation has its own TaskCompletionSource
                if (onloadTcs.Task.IsCompleted)
                    return; // this should not be happening
    
                // signal the completion of the page loading
                onloadTcs.SetResult(true);
            });
        };
    
        // register DocumentCompleted handler
        this.wb.DocumentCompleted += handler;
    
        // Navigate to http://www.example.com?i=1
        documentComplete = false;
        onloadTcs = new TaskCompletionSource<bool>();
        this.wb.Navigate("http://www.example.com?i=1");
        await onloadTcs.Task;
        // the document has been fully loaded, you can access DOM here
        MessageBox.Show(this.wb.Document.Url.ToString());
    
        // Navigate to http://example.com?i=2
        // could do the click() simulation instead
    
        documentComplete = false;
        onloadTcs = new TaskCompletionSource<bool>(); // new task for new navigation
        this.wb.Navigate("http://example.com?i=2");
        await onloadTcs.Task;
        // the document has been fully loaded, you can access DOM here
        MessageBox.Show(this.wb.Document.Url.ToString());
    
        // no more navigation, de-register DocumentCompleted handler
        this.wb.DocumentCompleted -= handler;
    }
    

    Here's the same code without the async/await pattern (for .NET 4.0):

    Task DoNavigationAsync()
    {
        // save the correct continuation context for Task.ContinueWith
        var continueContext = TaskScheduler.FromCurrentSynchronizationContext(); 
    
        bool documentComplete = false;
        TaskCompletionSource<bool> onloadTcs = null;
    
        WebBrowserDocumentCompletedEventHandler handler = delegate 
        {
            if (documentComplete)
                return; // attach to onload only once per each Document
            documentComplete = true;
    
            // now subscribe to DOM onload event
            this.wb.Document.Window.AttachEventHandler("onload", delegate
            {
                // each navigation has its own TaskCompletionSource
                if (onloadTcs.Task.IsCompleted)
                    return; // this should not be happening
    
                // signal the completion of the page loading
                onloadTcs.SetResult(true);
            });
        };
    
        // register DocumentCompleted handler
        this.wb.DocumentCompleted += handler;
    
        // Navigate to http://www.example.com?i=1
        documentComplete = false;
        onloadTcs = new TaskCompletionSource<bool>();
        this.wb.Navigate("http://www.example.com?i=1");
    
        return onloadTcs.Task.ContinueWith(delegate 
        {
            // the document has been fully loaded, you can access DOM here
            MessageBox.Show(this.wb.Document.Url.ToString());
    
            // Navigate to http://example.com?i=2
            // could do the 'click()' simulation instead
    
            documentComplete = false;
            onloadTcs = new TaskCompletionSource<bool>(); // new task for new navigation
            this.wb.Navigate("http://example.com?i=2");
    
            // return the inner continuation; Unwrap() below turns the resulting
            // Task<Task> into a Task that completes only after the second page loads
            return onloadTcs.Task.ContinueWith(delegate 
            {
                // the document has been fully loaded, you can access DOM here
                MessageBox.Show(this.wb.Document.Url.ToString());
    
                // no more navigation, de-register DocumentCompleted handler
                this.wb.DocumentCompleted -= handler;
            }, continueContext);
    
        }, continueContext).Unwrap();
    }
    

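    Note that in both cases it is still a piece of asynchronous code that returns a Task object. Here's an example of how to handle the completion of such a task: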

    private void Form1_Load(object sender, EventArgs e)
    {
        DoNavigationAsync().ContinueWith(_ => {
            MessageBox.Show("Navigation complete!");
        }, TaskScheduler.FromCurrentSynchronizationContext());
    }
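    If the login popup in the question is rendered by script inside the same document (no navigation), neither DocumentCompleted nor onload fires when it appears, which may be why timing differs between machines. A common, more robust alternative (not from the original answer) is to poll the DOM until the element you need exists, with a timeout. A minimal sketch, assuming .NET 4.5+ for Task.Delay; DomWait and WaitUntilAsync are hypothetical names:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: polls a condition until it is true or a timeout elapses.
public static class DomWait
{
    // Returns true if `condition` became true within `timeout`, else false.
    // No ConfigureAwait(false): when awaited on the UI thread, every check
    // runs on the UI thread, so `condition` may safely touch WebBrowser.Document.
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (condition())
                return true;
            if (stopwatch.Elapsed >= timeout)
                return false;
            await Task.Delay(interval);
        }
    }
}

// Example use inside an async method on the UI thread
// (element ID taken from the question; loginLink is hypothetical):
//   loginLink.InvokeMember("click");
//   bool shown = await DomWait.WaitUntilAsync(
//       () => webbrowserIE.Document.GetElementById("login_login") != null,
//       TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(200));
```

    Polling trades a little latency for not depending on which load events a particular IE version raises; the timeout keeps a missing element from hanging the automation.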
    

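    The benefit of using the TAP pattern here is that DoNavigationAsync is a self-contained, independent method. It can be reused, and it doesn't interfere with the state of the parent object (in this case, the main form).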
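    The TaskCompletionSource technique DoNavigationAsync uses for onload can be factored into a small reusable helper that turns the next raising of an event into a Task and then detaches itself. This is a minimal sketch, not part of the original answer: EventTasks.NextEventAsync, FakePage and LoadedEventArgs are hypothetical names, and FakePage only stands in for a component like WebBrowser so the pattern runs without WinForms. It assumes an EventHandler<TArgs> event; WebBrowser.DocumentCompleted uses its own delegate type and would need an adapter.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical event args carrying the URL that finished loading.
public class LoadedEventArgs : EventArgs
{
    public LoadedEventArgs(string url) { Url = url; }
    public string Url { get; private set; }
}

// Hypothetical stand-in for a component that raises a "loaded" event.
public class FakePage
{
    public event EventHandler<LoadedEventArgs> Loaded;

    // Number of handlers currently attached (shows the helper detaches).
    public int SubscriberCount
    {
        get { return Loaded == null ? 0 : Loaded.GetInvocationList().Length; }
    }

    public void RaiseLoaded(string url)
    {
        EventHandler<LoadedEventArgs> handler = Loaded;
        if (handler != null)
            handler(this, new LoadedEventArgs(url));
    }
}

public static class EventTasks
{
    // Returns a Task that completes with the arguments of the next event,
    // after which the temporary handler unsubscribes itself (one-shot).
    public static Task<TArgs> NextEventAsync<TArgs>(
        Action<EventHandler<TArgs>> subscribe,
        Action<EventHandler<TArgs>> unsubscribe) where TArgs : EventArgs
    {
        var tcs = new TaskCompletionSource<TArgs>();
        EventHandler<TArgs> handler = null;
        handler = (sender, args) =>
        {
            unsubscribe(handler);   // detach first so it fires only once
            tcs.TrySetResult(args); // resumes any code awaiting the task
        };
        subscribe(handler);
        return tcs.Task;
    }
}
```

    Call NextEventAsync before triggering the action (Navigate, a simulated click) and await the returned task afterwards, so the event cannot fire before anyone is listening.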

    【Discussion】:

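    • Actually, I have already implemented Feature Control, since that was my earlier question :) I'm not sure about the DOM onload event, though: don't I need the web page to finish downloading before I can do the click? Or not?
    • Oh.. I should have checked, I did feel like I'd talked to you before, @Jim :] Feel free to try the code I linked. onload does get fired when the page has fully loaded (including any frames). This approach is a more reliable way to make sure the page has finished its own processing of the onload and onreadystatechange DOM events.
    • Could you explain why you register the DocumentCompleted event and then call AttachEventHandler inside it?
    • You're right, I perform a sequence of DocumentCompleted operations. I'm developing on .NET 4, so I'm not sure whether async/await is available. Also, strangely, the event-handling approach works on one PC but not on another. So, in your experience, would async/await be a more robust solution?
    • Feature Control worked for me. Thanks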