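【Posted】: 2015-01-03 12:52:48
【Question】:
I am writing a Levenshtein distance function in C# to calculate the edit distance between two strings. The problem is that I want to call the method multiple times with different collations, but only one collation ever makes it across the SQL-to-CLR interface: the database's default collation.
Here is the code for the CLR function: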
[SqlFunction(IsDeterministic = true, Name = "LevenshteinDistance")]
public static SqlInt64 Distance(SqlString textA, SqlString textB)
{
    // get a collation-aware comparer so string/character comparisons
    // will match the inputs' specified collation
    var aCompareInfo = textA.CompareInfo;
    var compareOptions = ConvertCompareOptions(textA.SqlCompareOptions);
    var aLength = textA.Value.Length;
    var bLength = textB.Value.Length;

    // degenerate cases
    if (aCompareInfo.Compare(textA.Value, 0, aLength, textB.Value, 0, bLength, compareOptions) == 0) { return 0; }
    if (aLength == 0) { return bLength; }
    if (bLength == 0) { return aLength; }

    // create two work vectors of integer distances
    var previousDistances = new SqlInt64[Maximum(aLength, bLength) + 1];
    var currentDistances = new SqlInt64[Maximum(aLength, bLength) + 1];

    // initialize previousDistances (the previous row of distances)
    // this row is A[0][i]: edit distance for an empty textA
    // the distance is just the number of characters to delete from textB
    for (var i = 0; i < previousDistances.Length; i++)
    {
        previousDistances[i] = i;
    }

    for (var i = 0; i < aLength; i++)
    {
        // calculate currentDistances from the previous row previousDistances
        // first element of currentDistances is A[i+1][0]
        // edit distance is delete (i+1) chars from textA to match empty textB
        currentDistances[0] = i + 1;

        // use formula to fill in the rest of the row
        for (var j = 0; j < bLength; j++)
        {
            var cost = (aCompareInfo.Compare(textA.Value, i, 1, textB.Value, j, 1, compareOptions) == 0) ? 0 : 1;
            currentDistances[j + 1] = Minimum(currentDistances[j] + 1, previousDistances[j + 1] + 1, previousDistances[j] + cost);
        }

        // copy currentDistances to previousDistances for next iteration
        for (var j = 0; j < previousDistances.Length; j++)
        {
            previousDistances[j] = currentDistances[j];
        }
    }

    return currentDistances[bLength];
}
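The function above calls three helpers (ConvertCompareOptions, Maximum, and Minimum) that the post does not show. The following is only a guess at what they might look like, inferred from how they are called; the class name LevenshteinHelpers and all three bodies are assumptions, not the original author's code. The conversion relies on the framework's SqlString.CompareOptionsFromSqlCompareOptions method.

```csharp
using System;
using System.Data.SqlTypes;
using System.Globalization;

// Hypothetical helper class; the original post does not show these methods.
public static class LevenshteinHelpers
{
    // Assumed: translate SQL Server comparison flags (ignore case, ignore
    // accents, and so on) into the equivalent .NET CompareOptions flags.
    public static CompareOptions ConvertCompareOptions(SqlCompareOptions options)
    {
        return SqlString.CompareOptionsFromSqlCompareOptions(options);
    }

    // Assumed: larger of two string lengths.
    public static int Maximum(int a, int b)
    {
        return Math.Max(a, b);
    }

    // Assumed: smallest of three non-null distances. The long result is
    // converted back to SqlInt64 implicitly.
    public static SqlInt64 Minimum(SqlInt64 a, SqlInt64 b, SqlInt64 c)
    {
        return Math.Min(a.Value, Math.Min(b.Value, c.Value));
    }
}
```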
After deploying the CLR assembly to SQL Server (2008 R2), I call it like this:
print dbo.LevenshteinDistance('abc' collate Latin1_General_CI_AI, 'ABC' collate Latin1_General_CI_AI)
print dbo.LevenshteinDistance('abc' collate Latin1_General_CS_AS_KS_WS, N'ABC' collate Latin1_General_CS_AS_KS_WS)
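Both calls return zero (0). Because I specified a case-sensitive collation for the second call, I expected it to return three (3).
When using CLR functions in SQL Server, is it possible to specify a collation other than the database default and have the CLR function use it? If so, how?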
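The results expected from the two calls (0 for the case-insensitive comparison, 3 for the case-sensitive one) can be reproduced outside SQL Server. The following is a minimal standalone sketch, not the CLR function itself: the class name LevenshteinDemo is hypothetical, the comparer and options are passed in explicitly instead of being read from a SqlString, and the invariant culture stands in for the Latin1_General collations as an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical console demo; not part of the original CLR assembly.
public static class LevenshteinDemo
{
    // Two-row Levenshtein distance where the character comparison is
    // controlled by an explicit CompareInfo and CompareOptions.
    public static int Distance(string a, string b, CompareInfo comparer, CompareOptions options)
    {
        if (a.Length == 0) { return b.Length; }
        if (b.Length == 0) { return a.Length; }

        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Row for an empty prefix of a: j insertions to build b[0..j).
        for (var j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (var i = 0; i < a.Length; i++)
        {
            current[0] = i + 1;
            for (var j = 0; j < b.Length; j++)
            {
                // Substitution is free only when the two characters compare
                // as equal under the supplied options.
                var cost = comparer.Compare(a, i, 1, b, j, 1, options) == 0 ? 0 : 1;
                current[j + 1] = Math.Min(
                    Math.Min(current[j] + 1, previous[j + 1] + 1),
                    previous[j] + cost);
            }

            // Swap the rows instead of copying element by element.
            var swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    public static void Main()
    {
        var comparer = CultureInfo.InvariantCulture.CompareInfo;
        Console.WriteLine(Distance("abc", "ABC", comparer, CompareOptions.IgnoreCase)); // prints 0
        Console.WriteLine(Distance("abc", "ABC", comparer, CompareOptions.None));       // prints 3
    }
}
```

With the comparison options supplied by the caller, the algorithm distinguishes the two cases, which suggests the unexpected zero comes from the options carried by the SqlString arguments and not from the distance calculation.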
【Comments】:
Tags: c# sql-server sql-server-2008-r2 collation sqlclr