[Question title]: Multithreaded WebRequests being executed in succession
[Posted]: 2015-12-20 21:10:49
[Question]:

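I'm building a web scraper in C# that works with proxies and a large number of requests. Pages are loaded through a ConnectionManager class, which grabs a proxy and retries loading the page with random proxies until the page loads correctly.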
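For readers unfamiliar with the pattern, the retry-with-random-proxy behaviour described above might look roughly like the sketch below. The post does not show ConnectionManager, so everything here is an assumption: `RetryingLoaderSketch`, the `fetch` delegate, and the `maxAttempts` cap are hypothetical (the original is described as retrying until the page loads, with no stated limit).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the ConnectionManager described in the post.
public class RetryingLoaderSketch
{
    private readonly IList<string> _proxies;
    // Assumed fetch function: (url, proxy) -> page body; throws when the load fails.
    private readonly Func<string, string, string> _fetch;
    private readonly Random _random = new Random();
    private readonly object _randomLock = new object();

    public RetryingLoaderSketch(IList<string> proxies, Func<string, string, string> fetch)
    {
        _proxies = proxies;
        _fetch = fetch;
    }

    public string Load(string url, int maxAttempts)
    {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            string proxy;
            // System.Random is not thread-safe, so only the proxy pick is locked.
            lock (_randomLock)
            {
                proxy = _proxies[_random.Next(_proxies.Count)];
            }

            try
            {
                // The slow network call runs outside any lock.
                return _fetch(url, proxy);
            }
            catch (Exception ex)
            {
                last = ex; // remember the failure and try another proxy
            }
        }
        throw new InvalidOperationException("All attempts failed for " + url, last);
    }

    public static void Main()
    {
        int calls = 0;
        var loader = new RetryingLoaderSketch(
            new List<string> { "proxy-a", "proxy-b" },
            (url, proxy) =>
            {
                // Simulate two failed loads followed by a success.
                if (++calls < 3) throw new Exception("timeout via " + proxy);
                return "<html>ok</html>";
            });

        Console.WriteLine(loader.Load("http://www.example.com/reviews/1", 5));
        Console.WriteLine("fetch calls: " + calls);
    }
}
```

The point of the sketch is that an object like this can be shared between threads as long as its own shared state (here only the random number generator) is guarded, and the guard never wraps the network call.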

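On average, a single task takes 100 to 300 requests, so to speed things up I designed a method that uses multiple threads to download the pages concurrently.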

    public Review[] getReviewsMultithreaded(int reviewCount)
    {
        ArrayList reviewList = new ArrayList();
        int currentIndex = 0;
        int currentPage = 1;
        int totalPages = (reviewCount / 10) + 1;
        bool threadHasMoreWork = true;
        Object pageLock = new Object();
        Thread[] threads = new Thread[Program.maxScraperThreads];

        for(int i = 0; i < Program.maxScraperThreads; i++)
        {
            threads[i] = (new Thread(() => 
            {
                while (threadHasMoreWork)
                {
                    HtmlDocument doc;
                    lock(pageLock)
                    {
                        if (currentPage <= totalPages)
                        {
                            string builtString = "http://www.example.com/reviews/" + _ID + "?pageNumber=" + currentPage;
                            //Log.WriteLine(builtString);
                            currentPage++;
                            doc = Program.conManager.loadDocument(builtString);
                        }
                        else
                        {
                            threadHasMoreWork = false;
                            continue;
                        }
                    }

                    try
                    {
                        //Get info from page and add to list
                        reviewList.Add(cRev);
                        Log.WriteLine(_asin + " reviews scraped: " + reviewList.Count);
                    }
                    catch (Exception ex) { continue; }
                }
            }));
            threads[i].Start();
        }

        bool threadsAreRunning = true;
        while(threadsAreRunning) //this is in a separate thread itself, so as not to interrupt the GUI
        {
            threadsAreRunning = false;
            foreach (Thread t in threads)
                if (t.IsAlive)
                {
                    threadsAreRunning = true;
                    Thread.Sleep(2000);
                }
        }

        //flatten the arraylist to a primitive
        return reviewArray;
    }

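However, I'm noticing that the requests are still mostly being handled one at a time, so the method isn't much faster than before. Is the lock causing the problem? Or is it the fact that ConnectionManager is instantiated in one object and every thread calls loadDocument on that same object?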

[Comments]:

    Tags: c# multithreading


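    [Answer 1]:

    Ah, never mind. I noticed that the lock includes the call to the method that loads the page, so only one page is ever being loaded at a time.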
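A minimal sketch of that fix: hold the lock only long enough to claim the next page number, then do the slow load outside it. This assumes the page loader is safe to call from several threads at once, which the post does not confirm for ConnectionManager. `LoadAllPages`, the `loadPage` delegate (a stand-in for `Program.conManager.loadDocument`), and `resultLock` are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PageScraperSketch
{
    // loadPage is a hypothetical stand-in for Program.conManager.loadDocument.
    public static List<string> LoadAllPages(int totalPages, int threadCount, Func<int, string> loadPage)
    {
        var results = new List<string>();
        object pageLock = new object();   // guards currentPage
        object resultLock = new object(); // guards results (List<T> is not thread-safe)
        int currentPage = 1;
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                while (true)
                {
                    int page;
                    lock (pageLock)
                    {
                        // Only the cheap bookkeeping happens under the lock.
                        if (currentPage > totalPages) return;
                        page = currentPage++;
                    }

                    // The slow load runs outside the lock, so threads overlap here.
                    string content = loadPage(page);

                    lock (resultLock)
                    {
                        results.Add(content);
                    }
                }
            });
            threads[i].Start();
        }

        // Join blocks until each worker finishes; no polling loop is needed.
        foreach (Thread t in threads) t.Join();
        return results;
    }

    public static void Main()
    {
        List<string> pages = LoadAllPages(20, 4, page =>
        {
            Thread.Sleep(100); // simulate a slow request
            return "page " + page;
        });
        Console.WriteLine(pages.Count); // prints 20
    }
}
```

With four threads and a simulated 100 ms load, the 20 pages finish in roughly five rounds rather than twenty. Results arrive in completion order, not page order, so sort afterwards if order matters.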

    [Comments]:
