【Question Title】: HTML Agility Pack errors
【Posted】: 2011-01-04 22:08:21
【Question】:

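This is my first time trying HTML Agility Pack, and I am using the sample code to parse the URLs out of some HTML. I am getting an error and I am not sure why. Can someone point out what I am doing wrong?

Here is the source code (html is the incoming HTML string):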

 StringBuilder sb = new StringBuilder();

 HtmlDocument htmldoc = new HtmlDocument();
 htmldoc.LoadHtml(html);

 foreach (HtmlNode link in htmldoc.DocumentNode.SelectNodes("//a[@HREF]"))
 {
     HtmlAttribute att = link.Attributes["HREF"];
     sb.AppendLine(att.Value + "|");
 }
 return sb.ToString();

When I debug my application I get the following error (the debugger places it right after the "foreach"):

System.NullReferenceException was unhandled
  Message=Object reference not set to an instance of an object.
  Source=ScreenScraper
  StackTrace:
       at ScreenScraper.its.GetITSLoadID(String html) in C:\Web_Projects\ScreenScaper\ScreenScraper\its.cs:line 22
       at ScreenScraper.frm1.btnStartScraping_Click(Object sender, EventArgs e) in C:\Web_Projects\ScreenScaper\ScreenScraper\frm1.cs:line 43
       at System.Windows.Forms.Control.OnClick(EventArgs e)
       at System.Windows.Forms.Button.OnClick(EventArgs e)
       at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
       at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
       at System.Windows.Forms.Control.WndProc(Message& m)
       at System.Windows.Forms.ButtonBase.WndProc(Message& m)
       at System.Windows.Forms.Button.WndProc(Message& m)
       at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
       at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
       at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
       at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
       at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr dwComponentID, Int32 reason, Int32 pvLoopData)
       at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
       at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
       at System.Windows.Forms.Application.Run(Form mainForm)
       at ScreenScraper.Program.Main() in C:\Web_Projects\ScreenScaper\ScreenScraper\Program.cs:line 18
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at System.AppDomain.nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at System.Runtime.Hosting.ManifestRunner.Run(Boolean checkAptModel)
       at System.Runtime.Hosting.ManifestRunner.ExecuteAsAssembly()
       at System.Runtime.Hosting.ApplicationActivator.CreateInstance(ActivationContext activationContext, String[] activationCustomData)
       at System.Runtime.Hosting.ApplicationActivator.CreateInstance(ActivationContext activationContext)
       at System.Activator.CreateInstance(ActivationContext activationContext)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssemblyDebugInZone()
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: 

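【Comments】:

  • Maybe there is a problem creating the list. Just a suggestion, but try initializing the list of HtmlNodes before the foreach, then iterate over that list. At least then you will be able to see whether it is the loop or the list that is causing it.

Tags: screen-scraping html-agility-pack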


【Solution 1】:

Html Agility Pack has a "design bug": SelectNodes returns null rather than an empty collection when nothing matches. So you need to do this:

HtmlNodeCollection list = htmldoc.DocumentNode.SelectNodes("//a[@HREF]");
if (list != null)
{
  foreach (HtmlNode link in list)
  ...
}
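
As a minimal sketch of how the null check fits into the method from the question, the block below wraps it in a complete helper. The names LinkExtractor and ExtractLinks are hypothetical and chosen only for illustration, and the sketch assumes the HtmlAgilityPack package is referenced. It uses a lowercase attribute name in the XPath and reads the value with GetAttributeValue so that a missing attribute yields an empty string.

```csharp
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper class, not part of Html Agility Pack.
public static class LinkExtractor
{
    // Returns every href value followed by "|", one per line,
    // or an empty string when the document contains no links.
    public static string ExtractLinks(string html)
    {
        var sb = new StringBuilder();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return sb.ToString();
        }

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue falls back to the supplied default if the attribute is absent.
            sb.AppendLine(link.GetAttributeValue("href", string.Empty) + "|");
        }
        return sb.ToString();
    }
}
```

Returning early on null keeps the loop body unchanged, so the caller gets an empty string instead of an exception when a page has no links.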

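By the way, all element and attribute names specified in an XPATH expression must be lowercase, even if they are declared differently in the HTML text (HTML is case-insensitive, so the default Html Agility Pack XPATH convention is to use lowercase names). So you should write it like this: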

HtmlNodeCollection list = htmldoc.DocumentNode.SelectNodes("//a[@href]");
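
To see the effect of name casing for yourself, a small helper like the one below can compare two XPath expressions against the same markup. XPathCaseDemo and CountMatches are hypothetical names used only for this sketch, which again assumes the HtmlAgilityPack package is referenced.

```csharp
using HtmlAgilityPack;

// Hypothetical helper class, not part of Html Agility Pack.
public static class XPathCaseDemo
{
    // Counts the nodes matched by an XPath expression,
    // treating the null that SelectNodes returns for "no match" as zero.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }
}
```

Given markup such as <A HREF="x">x</A>, calling CountMatches with "//a[@href]" should report one match, while "//a[@HREF]" should report none, which is the situation that produced the NullReferenceException in the question.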

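【Comments】:

  • A follow-up question on this. What if I am looking for one specific node with "SelectNode" rather than "SelectNodes"? Would I still put it in a list and then evaluate it?
  • @Wildbill - SelectSingleNode (the single-node counterpart of SelectNodes) returns null if nothing is found (which is consistent with standard .NET conventions); otherwise it returns the selected node.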
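
For the single-node case discussed in the comments, a sketch along these lines may help. FirstLinkFinder and GetFirstHref are hypothetical names introduced for illustration, and the HtmlAgilityPack package is assumed to be referenced. No list is needed; the result of SelectSingleNode is simply checked for null.

```csharp
using HtmlAgilityPack;

// Hypothetical helper class, not part of Html Agility Pack.
public static class FirstLinkFinder
{
    // Returns the href of the first link in the document,
    // or null when the document contains no link with an href.
    public static string GetFirstHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@href]");
        if (link == null)
        {
            return null;
        }
        return link.GetAttributeValue("href", null);
    }
}
```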