So, after seeing Adam Ralph's post, I suspected his solution would be faster than the Regex solution. I did find it to be faster, so I thought I'd share my test results.
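There are really two factors at play (ignoring system variables): the number of substrings extracted (determined by the number of delimiters) and the total string length. The very simple scenario plotted below uses "A" as the substring, delimited by two whitespace characters (a space followed by a tab). This accentuates the effect of the number of substrings extracted. I then did some multivariable testing to arrive at the following general equations for my OS.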
Regex()
t = (28.33*SSL + 572)(SSN/10^6)
Split().Where()
t = (6.23*SSL + 250)(SSN/10^6)
where t is the execution time in milliseconds, SSL is the average substring length, and SSN is the number of substrings delimited in the string.
These equations can also be written as
t = (28.33*SL + 572*SSN)/10^6
and
t = (6.23*SL + 250*SSN)/10^6
where SL is the total string length (SL = SSL * SSN).
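The step between the two forms is just distributing SSN and substituting SL = SSL * SSN. Written out for the Regex() case (the Split().Where() case is identical with 6.23 and 250 in place of 28.33 and 572):

```latex
\begin{aligned}
t &= (28.33\,\mathrm{SSL} + 572)\,\frac{\mathrm{SSN}}{10^{6}} \\
  &= \frac{28.33\,\mathrm{SSL}\cdot\mathrm{SSN} + 572\,\mathrm{SSN}}{10^{6}} \\
  &= \frac{28.33\,\mathrm{SL} + 572\,\mathrm{SSN}}{10^{6}}
\end{aligned}
```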
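Conclusion: the Split().Where() solution is faster than Regex(). The major factor is the number of substrings, while string length plays a minor role. Comparing corresponding coefficients, the gain is roughly 2x on the per-substring term (572 vs. 250) and roughly 5x on the per-character term (28.33 vs. 6.23).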
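To get a feel for what the fitted equations predict, here is a minimal sketch that simply evaluates them. It is separate from the test code, and the class and method names (TimingModel, RegexMs, SplitWhereMs) are made up for illustration. The coefficients are the ones fitted on my machine, so the absolute numbers will likely differ elsewhere.

```csharp
using System;

public static class TimingModel
{
    // Fitted model for Regex(): t in ms, ssl = average substring length,
    // ssn = number of substrings. Coefficients are machine-specific.
    public static double RegexMs(double ssl, double ssn)
    {
        return (28.33 * ssl + 572) * (ssn / 1e6);
    }

    // Fitted model for Split().Where(), same units and parameters.
    public static double SplitWhereMs(double ssl, double ssn)
    {
        return (6.23 * ssl + 250) * (ssn / 1e6);
    }

    public static void Main()
    {
        // Illustrative inputs only; these are not measured values.
        double ssl = 1;
        double ssn = 1000000;

        double regex = RegexMs(ssl, ssn);
        double split = SplitWhereMs(ssl, ssn);

        Console.WriteLine("Regex():         {0:F2} ms", regex);
        Console.WriteLine("Split().Where(): {0:F2} ms", split);
        Console.WriteLine("Ratio:           {0:F2}x", regex / split);
    }
}
```

With SSL = 1 and SSN = 10^6 the model gives 600.33 ms for Regex() against 256.23 ms for Split().Where(), about 2.3x. As SSL grows the ratio moves toward the per-character ratio of about 4.5x.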
Here's my test code (probably way more than necessary, but it's set up to gather the multivariable data I mentioned):
using System;
using System.Linq;
using System.Diagnostics;
using System.Text.RegularExpressions;
using System.Windows.Forms; // Clipboard; needs a reference to System.Windows.Forms

namespace ConsoleApplication1
{
    class Program
    {
        public enum TestMethods { regex, split };

        [STAThread] // Clipboard access requires a single-threaded apartment
        static void Main(string[] args)
        {
            // Compare TestMethod execution times and output result information
            // to the console at runtime and to the clipboard at program finish
            // (so that data is ready to paste into an analysis environment)

            #region Config_Variables
            // Choose test method from the TestMethods enumeration (regex or split)
            TestMethods TestMethod = TestMethods.split;
            // Configure RepetitionString
            String RepetitionString = string.Join(" \t", Enumerable.Repeat("A", 100));
            // Configure initial and maximum count of string repetitions (final count may not equal max)
            int RepCountInitial = 100;
            int RepCountMax = 1000 * 100;
            // Step increment to next RepCount (calculated as 20% increase from current value)
            Func<int, int> Step = x => (int)Math.Round(x / 5.0, 0);
            // Execution count used to determine average speed
            // (calculated to adjust down to 1 execution at long execution times)
            Func<double, int> ExecutionCount = x => (int)(1 + Math.Round(500.0 / (x + 1), 0));
            #endregion

            #region NonConfig_Variables
            string s;
            string Results = "";
            string ResultInfo;
            double ResultTime = 1;
            int Executions;
            #endregion

            for (int RepCount = RepCountInitial; RepCount < RepCountMax; RepCount += Step(RepCount))
            {
                s = string.Join("", Enumerable.Repeat(RepetitionString, RepCount));
                // Fix the execution count before the run so the logged value is the one actually used
                Executions = ExecutionCount(ResultTime);
                ResultTime = Test(s, Executions, TestMethod);
                // Columns: average time (ms), repetition count, execution count, method
                ResultInfo = ResultTime.ToString() + "\t" + RepCount.ToString() + "\t" + Executions.ToString() + "\t" + TestMethod.ToString();
                Console.WriteLine(ResultInfo);
                Results += ResultInfo + "\r\n";
            }
            Clipboard.SetText(Results);
        }

        // Returns the average execution time in milliseconds for the chosen method
        public static double Test(string s, int iMax, TestMethods Method)
        {
            switch (Method)
            {
                case TestMethods.regex:
                    return Math.Round(RegexRunTime(s, iMax), 2);
                case TestMethods.split:
                    return Math.Round(SplitRunTime(s, iMax), 2);
                default:
                    return -1;
            }
        }

        private static double RegexRunTime(string s, int iMax)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iMax; i++)
            {
                System.Collections.Generic.IEnumerable<string> ens = Regex.Split(s, @"\s+");
            }
            sw.Stop();
            return Math.Round(sw.ElapsedMilliseconds / (double)iMax, 2);
        }

        private static double SplitRunTime(string s, int iMax)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iMax; i++)
            {
                // Note: Where() is deferred, so the filter is never enumerated here;
                // append .ToArray() to include the filtering cost in the measurement.
                System.Collections.Generic.IEnumerable<string> ens = s.Split().Where(x => x != string.Empty);
            }
            sw.Stop();
            return Math.Round(sw.ElapsedMilliseconds / (double)iMax, 2);
        }
    }
}
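As a quick sanity check that the two timed approaches extract the same substrings from this kind of input, here is a small sketch. It is not part of the benchmark, and the names SplitComparison, ViaRegex and ViaSplitWhere are made up for illustration. It builds a three-substring version of RepetitionString and compares the results.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // Splits on runs of whitespace with a regular expression.
    public static string[] ViaRegex(string s)
    {
        return Regex.Split(s, @"\s+");
    }

    // Splits on every whitespace character, then drops the empty entries
    // that appear between adjacent delimiters. ToArray() forces the
    // deferred Where() filter to run.
    public static string[] ViaSplitWhere(string s)
    {
        return s.Split().Where(x => x != string.Empty).ToArray();
    }

    public static void Main()
    {
        // Same construction as RepetitionString, shortened to three substrings.
        string s = string.Join(" \t", Enumerable.Repeat("A", 3));

        // Split() alone yields an empty entry between each space and tab.
        Console.WriteLine("Split() entries:         " + s.Split().Length);
        Console.WriteLine("Split().Where() entries: " + ViaSplitWhere(s).Length);
        Console.WriteLine("Regex.Split entries:     " + ViaRegex(s).Length);
        Console.WriteLine("Same result:             " + ViaRegex(s).SequenceEqual(ViaSplitWhere(s)));
    }
}
```

The results only match when the input has no leading or trailing whitespace: Regex.Split keeps an empty string at each end in that case, while Split().Where() drops them.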