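【Question title】: Scrape data from local HTML file
【Posted】: 2021-03-18 00:10:47
【Question】:

I'm trying to extract the DESIGN CAPACITY and FULL CHARGE CAPACITY values (in mWh) from the Windows battery-report.html. The HTML document stores these values in a table, but the cells have no attribute names I can easily target. I added AngleSharp but can't quite work out how to use it to get the data I need in this case; it may not be the right tool for the job.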

  </td>
</tr></thead>
<tr>
   <td><span class="label">NAME</span></td>
   <td>Blade</td>
</tr>
<tr>
   <td><span class="label">MANUFACTURER</span></td>
   <td>Razer</td>
</tr>
<tr>
   <td><span class="label">SERIAL NUMBER</span></td>
   <td>CNB1RC30-027097A00283-A05</td>
</tr>
<tr>
   <td><span class="label">CHEMISTRY</span></td>
   <td>Li-I</td>
</tr>
<tr>
   <td><span class="label">DESIGN CAPACITY</span></td>
   <td>65,003 mWh
   </td>
</tr>
<tr style="height:0.4em;"></tr>
<tr>
   <td><span class="label">FULL CHARGE CAPACITY</span></td>
   <td>72,395 mWh
   </td>
</tr>
<tr>
   <td><span class="label">CYCLE COUNT</span></td>
   <td>

I generate the battery report and pass it to GetBattery:

private void BatteryHealthBtn_Click(object sender, EventArgs e)
    {
        string designCap = null;
        string fullCap = null;
        ManagementObjectSearcher mybatteryObject = new ManagementObjectSearcher("select * from Win32_Battery");
        foreach (ManagementObject obj in mybatteryObject.Get())
        {
            // Both values must be present; with || a single null value would throw on ToString() below
            if (obj["DesignCapacity"] != null && obj["FullChargeCapacity"] != null)
            {
                designCapTxt.Text = obj["DesignCapacity"].ToString();
                fullCapTxt.Text = obj["FullChargeCapacity"].ToString();
            }
            else
            {
                MessageBox.Show("No WMI Data Found Running Manually", "Error No WMI",
             MessageBoxButtons.OK, MessageBoxIcon.Error);
                var saveLocation = System.AppDomain.CurrentDomain.BaseDirectory + "battery-report.html";
                if (saveLocation != null)
                {
                    System.Diagnostics.Process process = new System.Diagnostics.Process();
                    System.Diagnostics.ProcessStartInfo startInfo = new System.Diagnostics.ProcessStartInfo();
                    startInfo.FileName = "cmd.exe";
                    startInfo.Arguments = "/C powercfg /batteryreport /output " + '"' + saveLocation + '"';
                    process.StartInfo = startInfo;
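                    process.Start();
                    // Wait for powercfg to finish writing the report before opening or parsing it
                    process.WaitForExit();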
                    System.Diagnostics.Process.Start(saveLocation);

                    GetBattery(saveLocation);
                }
            }
        }
    }

Image of the HTML document

public async void GetBattery(string html)
    {
        var config = Configuration.Default.WithDefaultLoader();
        string address = html;

        IDocument document = await
        BrowsingContext.New(config).OpenAsync(address);

        var designCap = document.GetElementsByClassName("label");
        
        MessageBox.Show(designCap.ToString(), "a",
             MessageBoxButtons.OK, MessageBoxIcon.Error);
    }

I think I'm getting closer, but I still get a null reference on line 4:

var config = Configuration.Default.WithDefaultLoader();
        var address = html;
        var document = await BrowsingContext.New(config).OpenAsync(address);
        var cellSelector = "tr td:nth-child(2)";
        var cells = document.QuerySelectorAll(cellSelector);
        var designCap = cells.Select(m => m.TextContent);
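
For comparison, here is a minimal AngleSharp sketch that skips `OpenAsync` altogether and parses HTML text that has already been read from disk. It assumes the report keeps the structure shown in the question (a `span` with class `label` in the first cell, the value in the next cell). The class name `BatteryReportAngleSharp` and the method `FindValue` are hypothetical names for illustration, not part of AngleSharp. Whether the null reference above comes from passing a Windows file path where a URL is expected, or from reading the file before powercfg has finished writing it, is not confirmed here.

```csharp
using System;
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

public static class BatteryReportAngleSharp
{
    // Hypothetical helper: returns the trimmed text of the cell that follows
    // the label cell, or null when no matching label is found.
    public static string FindValue(string html, string label)
    {
        // Parse the markup directly from a string; no loader or URL is involved.
        var document = new HtmlParser().ParseDocument(html);

        // Find the <span class="label"> whose text matches the requested label.
        IElement labelSpan = document.QuerySelectorAll("span.label")
            .FirstOrDefault(s => s.TextContent.Trim() == label);

        // span -> enclosing <td> -> next <td> in the same row holds the value.
        return labelSpan?.ParentElement?.NextElementSibling?.TextContent.Trim();
    }
}
```

With the file from the question this could be called as `BatteryReportAngleSharp.FindValue(System.IO.File.ReadAllText(saveLocation), "DESIGN CAPACITY")`, which should return the cell text including the unit, for example `65,003 mWh`.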

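【Comments】:

  • Maybe I didn't ask my question properly. I've added my current C# code as well, but I just don't know what I need to search for, because every tag I've tried returns null.
  • I see a BrowsingContext class. What library/API is this? Does the solution have to use this particular library?
  • I'm using AngleSharp, but I'm not tied to it and am open to other solutions. These are the namespaces I tried: using AngleSharp.Html.Parser; using AngleSharp; using AngleSharp.Dom;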

Tags: c# html parsing web-scraping


【Solution 1】:

I had to switch to HtmlAgilityPack, but I figured it out:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
                    // There are various options, set as needed
                    htmlDoc.OptionFixNestedTags = true;
                    // saveLocation is the path to the generated battery-report.html
                    htmlDoc.Load(saveLocation);
                    foreach (HtmlNode table in htmlDoc.DocumentNode.SelectNodes("//table"))
                    {
                        foreach (HtmlNode row in table.SelectNodes("tr"))
                        {
                            if(row.InnerText.Contains("DESIGN CAPACITY"))
                            {
                                designCapTxt.Text = row.InnerText;
                            }
                            if (row.InnerText.Contains("FULL CHARGE CAPACITY"))
                            {
                                fullCapTxt.Text = row.InnerText;
                            }
                        }
                    }
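
The loop above puts the whole row text (label plus value) into the text boxes. If only the number is wanted, a variation along these lines might work. It is a sketch under assumptions: the label/value layout is the one shown in the question, the number uses a comma as thousands separator (this may differ on other Windows locales), and `BatteryReportHap` and `ReadCapacityMilliwattHours` are hypothetical names.

```csharp
using System;
using System.Globalization;
using HtmlAgilityPack;

public static class BatteryReportHap
{
    // Hypothetical helper: returns the capacity in mWh for the given label,
    // or null when the row is missing or the number cannot be parsed.
    public static int? ReadCapacityMilliwattHours(HtmlDocument doc, string label)
    {
        // Select the <td> that follows the <td> containing <span class="label">label</span>.
        HtmlNode cell = doc.DocumentNode.SelectSingleNode(
            "//td[span[@class='label' and normalize-space()='" + label + "']]/following-sibling::td[1]");
        if (cell == null)
        {
            return null;
        }

        // Cell text looks like "65,003 mWh" plus trailing whitespace; keep the first token.
        string[] parts = cell.InnerText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        string number = parts.Length > 0 ? parts[0] : null;

        // Assumption: comma thousands separators, so parse with the invariant culture.
        int value;
        return int.TryParse(number, NumberStyles.AllowThousands, CultureInfo.InvariantCulture, out value)
            ? value
            : (int?)null;
    }
}
```

After `htmlDoc.Load(saveLocation)`, a call such as `BatteryReportHap.ReadCapacityMilliwattHours(htmlDoc, "FULL CHARGE CAPACITY")` would then give an integer that can be compared against the design capacity.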

【Comments】:
