【Question title】: Regular Expressions slowing down the program
【Posted】: 2014-11-11 22:10:25
【Question】:

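I'm trying to create a program that parses data from a game's chat log. So far I've managed to get the program running and parsing the data I want, but my problem is that the program keeps getting slower.

Currently it takes 5 seconds to parse a 10 MB text file. I noticed that adding RegexOptions.Compiled to my regexes brings this down to 3 seconds.

I believe I've narrowed the problem down to my regex matching. With 5 regexes, each line is currently scanned 5 times, so the program will get even slower as I add more.

What should I do so that multiple regexes don't slow my program down? All suggestions for improving the code are appreciated!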

if (sender.Equals(ButtonParse))
        {
            var totalShots = 0f;
            var totalHits = 0f;
            var misses = 0;
            var crits = 0;

            var regDmg = new Regex(@"(?<=\bSystem\b.* You inflicted )\d+\.\d", RegexOptions.Compiled);
            var regMiss = new Regex(@"(?<=\bSystem\b.* Target evaded attack)", RegexOptions.Compiled);
            var regCrit = new Regex(@"(?<=\bSystem\b.* Critical hit - additional damage)", RegexOptions.Compiled);
            var regHeal = new Regex(@"(?<=\bSystem\b.* You healed yourself )\d+\.\d", RegexOptions.Compiled);
            var regDmgrec = new Regex(@"(?<=\bSystem\b.* You take )\d+\.\d", RegexOptions.Compiled);

            var dmgList = new List<float>(); //New list for damage values
            var healList = new List<float>(); //New list for heal values
            var dmgRecList = new List<float>(); //New list for damage received values

            using (var sr = new StreamReader(TextBox1.Text))
            {
                while (!sr.EndOfStream)
                {
                    var line = sr.ReadLine();

                    var match = regDmg.Match(line);
                    var match2 = regMiss.Match(line);
                    var match3 = regCrit.Match(line);
                    var match4 = regHeal.Match(line);
                    var match5 = regDmgrec.Match(line);

                    if (match.Success)
                    {
                        dmgList.Add(float.Parse(match.Value, CultureInfo.InvariantCulture));
                        totalShots++;
                        totalHits++;
                    }
                    if (match2.Success)
                    {
                        misses++;
                        totalShots++;
                    }
                    if (match3.Success)
                    {
                        crits++;
                    }
                    if (match4.Success)
                    {
                        healList.Add(float.Parse(match4.Value, CultureInfo.InvariantCulture));
                    }
                    if (match5.Success)
                    {
                        dmgRecList.Add(float.Parse(match5.Value, CultureInfo.InvariantCulture));
                    }
                }
                TextBlockTotalShots.Text = totalShots.ToString(); //Show total shots
                TextBlockTotalDmg.Text = dmgList.Sum().ToString("0.##"); //Show total damage inflicted

                TextBlockTotalHits.Text = totalHits.ToString(); //Show total hits
                var hitChance = totalHits / totalShots; //Calculate hit chance
                TextBlockHitChance.Text = hitChance.ToString("P"); //Show hit chance

                TextBlockTotalMiss.Text = misses.ToString(); //Show total misses
                var missChance = misses / totalShots; //Calculate miss chance
                TextBlockMissChance.Text = missChance.ToString("P"); //Show miss chance

                TextBlockTotalCrits.Text = crits.ToString(); //Show total crits
                var critChance = crits / totalShots; //Calculate crit chance
                TextBlockCritChance.Text = critChance.ToString("P"); //Show crit chance

                TextBlockDmgHealed.Text = healList.Sum().ToString("F1"); //Show damage healed

                TextBlockDmgReceived.Text = dmgRecList.Sum().ToString("F1"); //Show damage received

                var pedSpent = dmgList.Sum() / (float.Parse(TextBoxEco.Text, CultureInfo.InvariantCulture) * 100); //Calculate ped spent
                TextBlockPedSpent.Text = pedSpent.ToString("0.##") + " PED"; //Estimated ped spent
            }
        }

Here is some sample text:

2014-09-02 23:07:22 [System] [] You inflicted 45.2 points of damage.
2014-09-02 23:07:23 [System] [] You inflicted 45.4 points of damage.
2014-09-02 23:07:24 [System] [] Target evaded attack.
2014-09-02 23:07:25 [System] [] You inflicted 48.4 points of damage.
2014-09-02 23:07:26 [System] [] You inflicted 48.6 points of damage.
2014-10-15 12:39:55 [System] [] Target evaded attack.
2014-10-15 12:39:58 [System] [] You inflicted 56.0 points of damage.
2014-10-15 12:39:59 [System] [] You inflicted 74.6 points of damage.
2014-10-15 12:40:02 [System] [] You inflicted 78.6 points of damage.
2014-10-15 12:40:04 [System] [] Target evaded attack.
2014-10-15 12:40:06 [System] [] You inflicted 66.9 points of damage.
2014-10-15 12:40:08 [System] [] You inflicted 76.2 points of damage.
2014-10-15 12:40:12 [System] [] You take 18.4 points of damage.
2014-10-15 12:40:14 [System] [] You inflicted 76.1 points of damage.
2014-10-15 12:40:17 [System] [] You inflicted 88.5 points of damage.
2014-10-15 12:40:19 [System] [] You inflicted 69.0 points of damage.
2014-10-19 05:56:30 [System] [] Critical hit - additional damage! You inflict 275.4 points of damage.
2014-10-19 05:59:29 [System] [] You inflicted 92.8 points of damage.
2014-10-19 05:59:31 [System] [] Critical hit - additional damage! You inflict 251.5 points of damage.
2014-10-19 05:59:35 [System] [] You take 59.4 points of damage.
2014-10-19 05:59:39 [System] [] You healed yourself 84.0 points.

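【Question comments】:

  • Don't use lookarounds (variable-length lookbehinds, i.e. (?<=...)); use capture groups (...) instead to get the values you're interested in.
  • Thanks a lot, Qtax! I didn't know lookbehinds had such a performance impact, and now I understand groups. Processing the whole thing now takes less than a second.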
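Qtax's suggestion can be sketched as follows: the question's five patterns rewritten with a plain capture group instead of a variable-length lookbehind, so the engine no longer has to evaluate a backwards `\bSystem\b.*` check at each candidate position. This is a minimal sketch, not the asker's final code. It assumes every relevant line contains the literal `[System] [] ` immediately before the message, as in the sample; the helper `CaptureValue` is hypothetical. It also escapes the decimal point as `\.`, because an unescaped `.` matches any character.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Assumption: relevant lines contain "[System] [] " right before the message text.
// Group 1 captures only the number, so no lookbehind is needed.
var regDmg    = new Regex(@"\[System\] \[\] You inflicted (\d+\.\d)", RegexOptions.Compiled);
var regMiss   = new Regex(@"\[System\] \[\] Target evaded attack", RegexOptions.Compiled);
var regCrit   = new Regex(@"\[System\] \[\] Critical hit - additional damage", RegexOptions.Compiled);
var regHeal   = new Regex(@"\[System\] \[\] You healed yourself (\d+\.\d)", RegexOptions.Compiled);
var regDmgrec = new Regex(@"\[System\] \[\] You take (\d+\.\d)", RegexOptions.Compiled);

// Hypothetical helper: the value in capture group 1, or null when the line does not match.
static float? CaptureValue(Regex regex, string line)
{
    var m = regex.Match(line);
    return m.Success
        ? float.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture)
        : (float?)null;
}

var take = "2014-10-15 12:40:12 [System] [] You take 18.4 points of damage.";
var crit = "2014-10-19 05:59:31 [System] [] Critical hit - additional damage! You inflict 251.5 points of damage.";

Console.WriteLine(CaptureValue(regDmgrec, take)?.ToString(CultureInfo.InvariantCulture)); // 18.4
Console.WriteLine(CaptureValue(regDmg, take) is null);  // True: not a damage-dealt line
Console.WriteLine(regCrit.IsMatch(crit));                // True
Console.WriteLine(CaptureValue(regDmg, crit) is null);  // True: "You inflict" has no "ed"
Console.WriteLine(regMiss.IsMatch("2014-09-02 23:07:24 [System] [] Target evaded attack.")); // True
```

Each pattern now starts with a literal, so a line that lacks `[System] [] ` fails without any backwards scanning.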

Tags: c# regex parsing


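【Answer 1】:

Here are the problems I see:

  1. As suggested in the comments, for basic patterns like these there isn't much to the way the regex parser works; how the patterns are written matters more.
  2. Why parse the same text multiple times? Create one regex pattern that does all the work, scanning each line once.
  3. In WPF, don't tie up the GUI thread. Do the work in a background task and update a view model instead (you are using MVVM, right?), which propagates the information to the screen through INotifyPropertyChanged events.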
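Points 2 and 3 can be combined in a sketch: one pattern whose alternation covers every message type, so each line is scanned once, with the parsing run through `Task.Run` so that a WPF handler could await it instead of blocking the UI thread. Assumptions: the message formats are exactly those in the sample, and the group names and the `Parse`/`Num` helpers are hypothetical. It deliberately keeps the question's counting rules, so the "You inflict" damage on critical-hit lines is not added to the damage total.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// One pattern, one scan per line: the shared "[System] [] " prefix is matched once,
// then the alternation decides which message this is. Named groups record the branch.
var combined = new Regex(
    @"\[System\] \[\] (?:You inflicted (?<dmg>\d+\.\d)" +
    @"|(?<miss>Target evaded attack)" +
    @"|(?<crit>Critical hit - additional damage)" +
    @"|You healed yourself (?<heal>\d+\.\d)" +
    @"|You take (?<rec>\d+\.\d))",
    RegexOptions.Compiled);

(int Shots, int Hits, int Misses, int Crits, double Dmg, double Healed, double Received) Parse(IEnumerable<string> lines)
{
    int shots = 0, hits = 0, misses = 0, crits = 0;
    double dmg = 0, healed = 0, received = 0;
    foreach (var line in lines)
    {
        var m = combined.Match(line);
        if (!m.Success) continue; // chat lines and other messages are skipped here

        if (m.Groups["dmg"].Success) { dmg += Num(m, "dmg"); shots++; hits++; }
        else if (m.Groups["miss"].Success) { misses++; shots++; }
        else if (m.Groups["crit"].Success) { crits++; }
        else if (m.Groups["heal"].Success) { healed += Num(m, "heal"); }
        else if (m.Groups["rec"].Success) { received += Num(m, "rec"); }
    }
    return (shots, hits, misses, crits, dmg, healed, received);
}

// Parse with the invariant culture so "45.2" is read the same way on every machine.
static double Num(Match m, string group) =>
    double.Parse(m.Groups[group].Value, CultureInfo.InvariantCulture);

string[] sample =
{
    "2014-09-02 23:07:22 [System] [] You inflicted 45.2 points of damage.",
    "2014-09-02 23:07:24 [System] [] Target evaded attack.",
    "2014-10-19 05:59:31 [System] [] Critical hit - additional damage! You inflict 251.5 points of damage.",
    "2014-10-19 05:59:35 [System] [] You take 59.4 points of damage.",
    "2014-10-19 05:59:39 [System] [] You healed yourself 84.0 points.",
};

// Point 3: in the WPF click handler the same call would be awaited, keeping the UI thread free.
var stats = await Task.Run(() => Parse(sample));
Console.WriteLine($"shots={stats.Shots} hits={stats.Hits} misses={stats.Misses} crits={stats.Crits}");
// shots=2 hits=1 misses=1 crits=1
```

Adding a new message type then means adding one branch to the alternation rather than another full pass over every line.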

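Below is a regex pattern solution that works line by line. Its first task is to verify that [System] appears in the line; if it does not, the line is not matched. If the line does contain [System], the pattern looks for specific keywords and any accompanying value and places them into regex named match captures, giving key/value pairs.

Once that is done, it totals the found values with LINQ. Note that I have commented the pattern and told the regex parser to ignore the comments and whitespace (IgnorePatternWhitespace).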

string pattern = @"^       # Beginning of line to anchor it.
(?=.+\[System\])           # Within the line a literal '[System]' has to occur
(?=.+                      # Somewhere within that line search for these keywords:
  (?<Action>               # Named Match Capture Group 'Action' will hold a keyword.
          inflicte?d?      # if the line has inflict or inflicted put it into 'Action'
          |                # or
          evaded           # evaded
          | take           # or take
          | yourself       # or yourself (heal)
   )
  (\s(?<Value>[\d.]+))?)   # if a value of points exist place into 'Value'
.+                         # match one or more to complete it.
$                          #end of line to stop on";

 // IgnorePatternWhitespace only lets us comment and space out the pattern; it does not change what it matches.
var tokens =
   Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline)
        .OfType<Match>()
        .Select( mt => new {
                            Action = mt.Groups["Action"].Value,
                            // Invariant culture: '.' is always the decimal separator, whatever the machine's locale.
                            Value  = mt.Groups["Value"].Success ? double.Parse(mt.Groups["Value"].Value, System.Globalization.CultureInfo.InvariantCulture) : 0,
                            Count  = 1,
                           })
         .GroupBy ( itm => itm.Action,  // Each action is grouped under its name for summing
                    itm => itm,         // The element kept for each item of the group; summed below.
                    (action, values) => new
                            {
                                Action = action,
                                Count  = values.Sum (itm => itm.Count),
                                Total  = values.Sum(itm => itm.Value)
                             }
                         );

Result

The LINQ query returns one entity per token (action) that sums all of that action's values and also counts how many times the action occurred.

Data

string data=@"2014-09-02 23:07:22 [System] [] You inflicted 45.2 points of damage.
2014-09-02 23:07:23 [System] [] You inflicted 45.4 points of damage.
2014-09-02 23:07:24 [System] [] Target evaded attack.
2014-09-02 23:07:25 [System] [] You inflicted 48.4 points of damage.
2014-09-02 23:07:26 [System] [] You inflicted 48.6 points of damage.
2014-10-15 12:39:55 [System] [] Target evaded attack.
2014-10-15 12:39:58 [System] [] You inflicted 56.0 points of damage.
2014-10-15 12:39:59 [System] [] You inflicted 74.6 points of damage.
2014-10-15 12:40:02 [System] [] You inflicted 78.6 points of damage.
2014-10-15 12:40:04 [System] [] Target evaded attack.
2014-10-15 12:40:06 [System] [] You inflicted 66.9 points of damage.
2014-10-15 12:40:08 [System] [] You inflicted 76.2 points of damage.
2014-10-15 12:40:12 [System] [] You take 18.4 points of damage.
2014-10-15 12:40:14 [System] [] You inflicted 76.1 points of damage.
2014-10-15 12:40:17 [System] [] You inflicted 88.5 points of damage.
2014-10-15 12:40:19 [System] [] You inflicted 69.0 points of damage.
2014-10-19 05:56:30 [System] [] Critical hit - additional damage! You inflict 275.4 points of damage.
2014-10-19 05:59:29 [System] [] You inflicted 92.8 points of damage.
2014-10-19 05:59:31 [System] [] Critical hit - additional damage! You inflict 251.5 points of damage.
2014-10-19 05:59:35 [System] [] You take 59.4 points of damage.
2014-10-19 05:59:39 [System] [] You healed yourself 84.0 points.";
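Because the real log is around 10 MB, it may help to feed the answer's pattern one line at a time instead of building one large `data` string. A minimal sketch, assuming the pattern is written on a single line (equivalent to the commented version, so `IgnorePatternWhitespace` is not needed) and dropping `RegexOptions.Multiline` since every input is already one line. `Summarize` is a hypothetical name, and values are parsed with the invariant culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's pattern on one line; same semantics as the commented version.
var pattern = new Regex(
    @"^(?=.+\[System\])(?=.+(?<Action>inflicte?d?|evaded|take|yourself)(\s(?<Value>[\d.]+))?).+$",
    RegexOptions.Compiled);

// Hypothetical helper: File.ReadLines yields one line at a time, so the whole log is never
// held in a single string. Grouping and summing mirror the answer's LINQ query.
List<(string Action, int Count, double Total)> Summarize(string path) =>
    File.ReadLines(path)
        .Select(line => pattern.Match(line))
        .Where(m => m.Success)
        .Select(m => (Action: m.Groups["Action"].Value,
                      Value: m.Groups["Value"].Success
                          ? double.Parse(m.Groups["Value"].Value, CultureInfo.InvariantCulture)
                          : 0.0))
        .GroupBy(t => t.Action, t => t.Value)
        .Select(g => (Action: g.Key, Count: g.Count(), Total: g.Sum()))
        .ToList();

// Usage: write a few of the sample lines to a temporary file and summarize it.
var path = Path.GetTempFileName();
File.WriteAllLines(path, new[]
{
    "2014-09-02 23:07:22 [System] [] You inflicted 45.2 points of damage.",
    "2014-09-02 23:07:23 [System] [] You inflicted 45.4 points of damage.",
    "2014-09-02 23:07:24 [System] [] Target evaded attack.",
    "2014-10-19 05:56:30 [System] [] Critical hit - additional damage! You inflict 275.4 points of damage.",
    "2014-10-15 12:40:12 [System] [] You take 18.4 points of damage.",
});
var summary = Summarize(path); // ToList() has read the file completely, so it can be deleted
File.Delete(path);

foreach (var (action, count, total) in summary)
    Console.WriteLine($"{action}: {count} line(s), total {total.ToString(CultureInfo.InvariantCulture)}");
```

Note that, because the pattern keys on the keyword itself, critical hits show up as a separate `inflict` group rather than being merged into `inflicted`.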

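【Comments】:

  • Thanks for showing me the regex pattern. I tried to do something similar but lacked the skill; now I'm learning. I implemented this in my program, but I have two problems. First, it works on the sample data, but on the actual log it gives me the error "Input string was not in a correct format". Second, I got it working after feeding in only the evade and crit data, but the program now takes 10 seconds to process the data.
  • @nhahtdh I agree the pattern could be modified further to strip unnecessary parts and speed it up. That was my first attempt, so there may well be a better pattern. I have the start-of-line part (^) and match-but-don't-capture groups (?: ) so as not to eat the beginning of one of the real matches the user is looking for; that part anchors the match to each line. But I agree it could at some point be changed to look only for the specific word matches, without the start-of-line anchor.
  • I'd suggest dropping that whole part, since the keywords make a better anchor than the .+ part.
  • @nhahtdh I should point out that I need to check that the message comes from [System] and not somewhere else. Since the log also contains chat data, I would get inaccurate data if a player typed these words, because the message wouldn't come from [System]. Anyway, as Qtax commented on my question, the lookbehinds had a huge performance impact in my program, and removing them solved the problem, but if you know a better way to do this, feel free to post an answer.
  • @Iceyou90 I updated my regex to verify that it only picks up System lines, but changed the logic to put each item into key/value pairs that can be summed. As mentioned in the new post, since this is WPF, don't do any processing on the GUI thread; put the operation on a background thread/task so it doesn't interrupt the user.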
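On the "Input string was not in a correct format" error above: the thread doesn't identify the cause, but one plausible candidate (an assumption) is that `double.Parse` without a culture argument uses the current culture, so a machine whose decimal separator is a comma rejects "45.2". Another is `[\d.]+` capturing a lone ".". A minimal sketch of both cases, with a hypothetical `ParsePoints` helper that uses `double.TryParse` with the invariant culture and returns null instead of throwing.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: always reads '.' as the decimal point, and returns null instead of
// throwing when the text is not a number (for example a lone "." captured by [\d.]+).
static double? ParsePoints(string text) =>
    double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out var value)
        ? value
        : (double?)null;

// A number format using ',' as the decimal separator, as many European locales do.
// Built by hand so the example does not depend on which cultures are installed.
var commaDecimal = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
commaDecimal.NumberDecimalSeparator = ",";
commaDecimal.NumberGroupSeparator = " ";

bool parsedWithComma = double.TryParse("45.2", NumberStyles.Float, commaDecimal, out _);
Console.WriteLine(parsedWithComma);             // False: '.' is not a separator in this format
Console.WriteLine(ParsePoints("45.2") == 45.2); // True
Console.WriteLine(ParsePoints(".") is null);    // True
```

Skipping unparsable values with `TryParse` keeps one odd line from aborting the whole parse, at the cost of silently ignoring it; logging the skipped line may be worth considering.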