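【Posted】: 2015-06-18 21:50:28
【Question】:
I need to find every occurrence of a string in a document, using culture-sensitive (globalization) comparison rules. The pseudocode is: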
var searchText = "Hello, World";
var compareInfo = new CultureInfo("en-US").CompareInfo;
DocumentIterator start = null; // the start position if a match occurs
var sb = new StringBuilder();
// the document is not a string, but exposes an iterator to its content
for (var iter = doc.Start(); iter.IsValid(); ++iter)
{
    start = start ?? iter; // the start of the potential match
    var ch = iter.GetChar();
    sb.Append(ch);
    if (compareInfo.Compare(searchText, sb.ToString()) == 0) // exact match
    {
        Console.WriteLine($"match at {start}-{iter}");
        // not shown: continue to search for more occurrences.
    }
    // IsPrefix(source, prefix): is the text read so far still a prefix of searchText?
    else if (!compareInfo.IsPrefix(searchText, sb.ToString()))
    {
        // restart the search from the character immediately following start
        sb.Clear();
        iter = start; // this gets incremented immediately
        start = null;
    }
}
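This delegates the hard work of culture-sensitive string matching to CompareInfo.
However, the stream-like process this code implements has a performance problem: it calls StringBuilder.ToString() on every iteration, which defeats the performance benefit of using a StringBuilder.
Question: how can this search be done efficiently?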
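【Comments】:
-
Why can't you use compareInfo.IndexOf(searchText, sb), where sb is the full document?
-
@Oleg, I have edited the code to make it clearer that the document is not a string, but exposes an iterator over its character content.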
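The suggestion in the comments can still be applied to an iterator-only document if the characters are copied into a string once, up front, rather than on every iteration. The following is only a minimal sketch under stated assumptions, not a definitive answer: `CultureSearch` and `FindAll` are hypothetical names, the document is modelled as an `IEnumerable<char>` instead of the question's `DocumentIterator`, and the whole text is assumed to fit in memory. Note that `CompareInfo.IndexOf` takes the source first and the value to find second.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class CultureSearch
{
    // Hypothetical helper: copies the document's characters into one string
    // (a single ToString call), then lets CompareInfo.IndexOf do the scanning.
    // Returns the character offset of every match, overlapping ones included.
    public static List<int> FindAll(
        IEnumerable<char> documentChars,
        string searchText,
        CompareInfo compareInfo,
        CompareOptions options = CompareOptions.None)
    {
        var matches = new List<int>();
        if (string.IsNullOrEmpty(searchText))
            return matches; // an empty pattern would match at every offset

        var sb = new StringBuilder();
        foreach (var ch in documentChars)
            sb.Append(ch);
        string text = sb.ToString(); // materialized exactly once

        int pos = 0;
        while (pos < text.Length)
        {
            // IndexOf(source, value, startIndex, options)
            int index = compareInfo.IndexOf(text, searchText, pos, options);
            if (index < 0)
                break;
            matches.Add(index);
            pos = index + 1; // resume one character after the match start
        }
        return matches;
    }
}
```

The offsets returned are positions in the copied string. To report document positions as in the question, one option (again an assumption about the document API) is to store each iterator in a parallel list while copying, so that an offset can be mapped back to its iterator. Because a culture-sensitive match can span a different number of characters than `searchText`, this sketch reports only where each match starts.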
Tags: c# string search stringbuilder culture