【Question title】: Javascript sorting to match SQL Server sorting
【Posted】: 2011-03-14 08:56:22
【Question】:

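Can anyone point me to a sorting algorithm in javascript that sorts the same way SQL Server does (for nvarchar/unicode columns)?

For reference, my previous question about this behavior can be found here: SQL Server 2008 - different sort orders on VARCHAR vs NVARCHAR values

Rather than trying to change the sorting behavior on the server side, is there a way I can match it on the client side? My previous question dealt specifically with dashes in the sort order, but I assume there is more to it than simply ignoring dashes as part of the sort.

I've added some additional use cases here to better demonstrate the problem.

Sample data as sorted by SQL Server (2008):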

?test  
^&$Grails Found  
bags of Garbage  
Brochures distributed  
Calls Received  
exhibit visitors  
Exhibit Visitors  
-Exhibit Visitors  
--Exhibit Visitors  
Ëxhibit Visitors  
Grails Found  

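How can I get javascript to sort the same values in the same way?

Please let me know if I can clarify further.

【Question comments】:

  • So, going by that question, you now want JavaScript to sort Unicode A before Unicode -A?
  • @Brock - Correct, but more specifically, I want a javascript sorting algorithm that matches the server side (I imagine there is more to consider than just the "-" character)

Tags: javascript sql-server unicode sorting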


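【Solution 1】:

First, what is your database collation? I'm going to assume it is SQL_Latin1_General_CP1_CS_AS or SQL_Latin1_General_CP1_CI_AS. If so, then the following should work (not fully tested yet).

It looks like writing a true Unicode sorter is a major undertaking. I've seen tax codes that were more straightforward than the specs. ;-) It always seems to involve lookup tables and at least a 3-level sort, with modifying characters and contractions to account for.

I've limited the following to the Latin 1, Latin Extended-A, and Latin Extended-B tables/collation. The algorithm should work fairly well on those sets, but I have not fully tested it, nor have I properly accounted for modifying characters (to save speed and complexity).

See it in action at jsbin.com.

The function: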

function bIgnoreForPrimarySort (iCharCode)
{
    /*--- A bunch of characters get ignored for the primary sort weight.
        The most important ones are the hyphen and apostrophe characters.
        A bunch of control characters and a couple of odds and ends, make up
        the rest.
    */
    if (iCharCode < 9)                                                  return true;

    if (iCharCode >= 14   &&  iCharCode <= 31)                          return true;

    if (iCharCode >= 127  &&  iCharCode <= 159)                         return true;

    if (iCharCode == 39   ||  iCharCode == 45  ||  iCharCode == 173)    return true;

    return false;
}


function SortByRoughSQL_Latin1_General_CP1_CS_AS (sA, sB)
{
    /*--- This Sorts Latin1 and extended Latin1 unicode with an approximation
        of SQL's SQL_Latin1_General_CP1_CS_AS collation.
        Certain modifying characters or contractions may be off (not tested), we trade-off
        perfect accuracy for speed and relative simplicity.

        True unicode sorting is devilishly complex and we're not getting paid enough to
        fully implement it in Javascript.  ;-)

        It looks like a definitive sort would require painstaking exegesis of documents
        such as: http://unicode.org/reports/tr10/
    */
    //--- This is the master lookup table for Latin1 code-points.  Here through the extended set \u02AF
    //--- Make this static?
    var aSortOrder  = [
                     -1,  151,  152,  153,  154,  155,  156,  157,  158,    2,    3,    4,    5,    6,  159,  160,  161,  162,  163,  164,
                    165,  166,  167,  168,  169,  170,  171,  172,  173,  174,  175,  176,    0,    7,    8,    9,   10,   11,   12,  210,
                     13,   14,   15,   41,   16,  211,   17,   18,   65,   69,   71,   74,   76,   77,   80,   81,   82,   83,   19,   20,
                     42,   43,   44,   21,   22,  214,  257,  266,  284,  308,  347,  352,  376,  387,  419,  427,  438,  459,  466,  486,
                    529,  534,  538,  559,  576,  595,  636,  641,  647,  650,  661,   23,   24,   25,   26,   27,   28,  213,  255,  265,
                    283,  307,  346,  350,  374,  385,  418,  426,  436,  458,  464,  485,  528,  533,  536,  558,  575,  594,  635,  640,
                    646,  648,  660,   29,   30,   31,   32,  177,  178,  179,  180,  181,  182,  183,  184,  185,  186,  187,  188,  189,
                    190,  191,  192,  193,  194,  195,  196,  197,  198,  199,  200,  201,  202,  203,  204,  205,  206,  207,  208,  209,
                      1,   33,   53,   54,   55,   56,   34,   57,   35,   58,  215,   46,   59,  212,   60,   36,   61,   45,   72,   75,
                     37,   62,   63,   64,   38,   70,  487,   47,   66,   67,   68,   39,  219,  217,  221,  231,  223,  233,  250,  276,
                    312,  310,  316,  318,  392,  390,  395,  397,  295,  472,  491,  489,  493,  503,  495,   48,  511,  599,  597,  601,
                    603,  652,  590,  573,  218,  216,  220,  230,  222,  232,  249,  275,  311,  309,  315,  317,  391,  389,  394,  396,
                    294,  471,  490,  488,  492,  502,  494,   49,  510,  598,  596,  600,  602,  651,  589,  655,  229,  228,  227,  226,
                    235,  234,  268,  267,  272,  271,  270,  269,  274,  273,  286,  285,  290,  287,  324,  323,  322,  321,  314,  313,
                    326,  325,  320,  319,  358,  357,  362,  361,  356,  355,  364,  363,  378,  377,  380,  379,  405,  404,  403,  402,
                    401,  400,  407,  406,  393,  388,  417,  416,  421,  420,  432,  431,  428,  440,  439,  447,  446,  444,  443,  442,
                    441,  450,  449,  468,  467,  474,  473,  470,  469,  477,  484,  483,  501,  500,  499,  498,  507,  506,  527,  526,
                    540,  539,  544,  543,  542,  541,  561,  560,  563,  562,  567,  566,  565,  564,  580,  579,  578,  577,  593,  592,
                    611,  610,  609,  608,  607,  606,  613,  612,  617,  616,  615,  614,  643,  642,  654,  653,  656,  663,  662,  665,
                    664,  667,  666,  574,  258,  260,  262,  261,  264,  263,  281,  278,  277,  304,  292,  289,  288,  297,  335,  337,
                    332,  348,  349,  369,  371,  382,  415,  409,  434,  433,  448,  451,  462,  476,  479,  509,  521,  520,  524,  523,
                    531,  530,  552,  572,  571,  569,  570,  583,  582,  581,  585,  632,  631,  634,  638,  658,  657,  669,  668,  673,
                    677,  676,  678,   73,   79,   78,  680,  644,   50,   51,   52,   40,  303,  302,  301,  457,  456,  455,  482,  481,
                    480,  225,  224,  399,  398,  497,  496,  605,  604,  626,  625,  620,  619,  624,  623,  622,  621,  334,  241,  240,
                    237,  236,  254,  253,  366,  365,  360,  359,  430,  429,  505,  504,  515,  514,  675,  674,  422,  300,  299,  298,
                    354,  353,   84,   85,   86,   87,  239,  238,  252,  251,  513,  512,  243,  242,  245,  244,  328,  327,  330,  329,
                    411,  410,  413,  412,  517,  516,  519,  518,  547,  546,  549,  548,  628,  627,  630,  629,   88,   89,   90,   91,
                     92,   93,   94,   95,   96,   97,   98,   99,  100,  101,  102,  103,  104,  105,  106,  107,  108,  109,  110,  111,
                    112,  113,  114,  115,  116,  117,  118,  119,  120,  121,  122,  123,  124,  125,  126,  127,  128,  129,  130,  131,
                    132,  133,  134,  135,  136,  137,  138,  139,  140,  141,  142,  143,  246,  247,  248,  259,  279,  280,  293,  291,
                    339,  336,  338,  331,  340,  341,  342,  423,  367,  373,  351,  370,  372,  383,  381,  384,  408,  414,  386,  445,
                    453,  452,  454,  461,  463,  460,  475,  478,  465,  508,  522,  525,  532,  550,  553,  554,  555,  545,  556,  557,
                    537,  551,  568,  333,  424,  343,  344,  586,  584,  618,  633,  637,  639,  645,  659,  649,  670,  671,  672,  679,
                    681,  682,  683,  282,  686,  256,  345,  368,  375,  425,  435,  437,  535,  684,  685,  305,  296,  306,  591,  587,
                    588,  144,  145,  146,  147,  148,  149,  150
                    ];

    var iLenA           = sA.length,    iLenB           = sB.length;
    var jA              = 0,            jB              = 0;
    var sIgnoreBuff_A   = [],           sIgnoreBuff_B   = [];


    function iSortIgnoreBuff ()
    {
        var iIgLenA = sIgnoreBuff_A.length, iIgLenB = sIgnoreBuff_B.length;
        var kA      = 0,                    kB      = 0;

        while (kA < iIgLenA  &&  kB < iIgLenB)
        {
            var igA = sIgnoreBuff_A [kA++],  igB = sIgnoreBuff_B [kB++];

            if (aSortOrder[igA]  >  aSortOrder[igB] )   return 1;
            if (aSortOrder[igA]  <  aSortOrder[igB] )   return -1;
        }
        //--- All else equal, longest string loses
        if (iIgLenA > iIgLenB)      return 1;
        if (iIgLenA < iIgLenB)      return -1;

        return 0;
    }


    while (jA < iLenA  &&  jB < iLenB)
    {
        var cA  = sA.charCodeAt (jA++);
        var cB  = sB.charCodeAt (jB++);

        if (cA == cB)
        {
            continue;
        }

        while (bIgnoreForPrimarySort (cA) )
        {
            sIgnoreBuff_A.push (cA);
            if (jA < iLenA)
                cA  = sA.charCodeAt (jA++);
            else
                break;
        }
        while (bIgnoreForPrimarySort (cB) )
        {
            sIgnoreBuff_B.push (cB);
            if (jB < iLenB)
                cB  = sB.charCodeAt (jB++);
            else
                break;
        }

        /*--- Have we reached the end of one or both strings, ending on an ignore char?
            The strings were equal, up to that point.
            If one of the strings is NOT an ignore char, while the other is, it wins.
        */
        if (bIgnoreForPrimarySort (cA) )
        {
            if (! bIgnoreForPrimarySort (cB))   return -1;
        }
        else if (bIgnoreForPrimarySort (cB) )
        {
            return 1;
        }
        else
        {
            if (aSortOrder[cA]  >  aSortOrder[cB] )
                return 1;

            if (aSortOrder[cA]  <  aSortOrder[cB] )
                return -1;

            //--- We are equal, so far, on the main chars.  Were there ignore chars?
            var iBuffSort   = iSortIgnoreBuff ();
            if (iBuffSort)  return iBuffSort;

            //--- Still here?  Reset the ignore arrays.
            sIgnoreBuff_A   = [];
            sIgnoreBuff_B   = [];
        }

    } //-- while (jA < iLenA  &&  jB < iLenB)

    /*--- We have gone through all of at least one string and they are still both
        equal barring ignore chars or unequal lengths.
    */
    var iBuffSort   = iSortIgnoreBuff ();
    if (iBuffSort)  return iBuffSort;

    //--- All else equal, longest string loses
    if (iLenA > iLenB)      return 1;
    if (iLenA < iLenB)      return -1;

    return 0;

} //-- function SortByRoughSQL_Latin1_General_CP1_CS_AS

Test:

var aPhrases    = [
                    'Grails Found',
                    '--Exhibit Visitors',
                    '-Exhibit Visitors',
                    'Exhibit Visitors',
                    'Calls Received',
                    'Ëxhibit Visitors',
                    'Brochures distributed',
                    'exhibit visitors',
                    'bags of Garbage',
                    '^&$Grails Found',
                    '?test'
                ];

aPhrases.sort (SortByRoughSQL_Latin1_General_CP1_CS_AS);

console.log (aPhrases.join ('\n') );

Results:

?test
^&$Grails Found
bags of Garbage
Brochures distributed
Calls Received
exhibit visitors
Exhibit Visitors
-Exhibit Visitors
--Exhibit Visitors
Ëxhibit Visitors
Grails Found

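【Comments】:

  • I've verified that the server collation is set to SQL_Latin1_General_CP1_CI_AS. I'll look into your approach and see how well it works. By the way, I think my bounty was a bit cheap... if this works, I'll let it expire before accepting your answer so that I can award you a higher one (seem fair/reasonable?)
  • I think this is as close to perfect as we could hope for. Thank you very much for your help with this (Stackoverflow needs a "buy a beer" feature!)
  • @icc97, null values should be removed or "guarded" before any sorting. Working around them inside the sort function is inefficient and unnecessary.
  • @icc97, ah, but they do. Consider this very common example: function cI (A, B) { return A.toLowerCase().localeCompare(B.toLowerCase()); } ... Nulls among JS sort values are not common, nor are they good practice.
  • @icc97, handling nulls is good and proper. But it should be done in a single pass through the array, before sorting.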
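
The "guard before sorting" advice in the comments above could be sketched roughly as follows. This is only an illustration: `sortIgnoringNulls` and `byCodeUnit` are hypothetical names that do not appear in the answer, and putting the empty entries first is an assumption (it mirrors how SQL Server places NULLs in an ascending ORDER BY).

```javascript
// Hypothetical helper: split off null/undefined entries in ONE pass,
// sort only the real strings, then put the empty entries in front.
function sortIgnoringNulls(values, compareFn) {
    var strings = [];
    var empties = [];

    values.forEach(function (v) {
        if (v === null || v === undefined) {
            empties.push(v);
        } else {
            strings.push(String(v));
        }
    });

    // The comparator never sees a null, so it needs no special cases.
    strings.sort(compareFn);
    return empties.concat(strings);
}

// Stand-in comparator so the sketch runs on its own; any (a, b)
// string comparator could be passed instead.
function byCodeUnit(a, b) {
    return a < b ? -1 : (a > b ? 1 : 0);
}

var sortedDemo = sortIgnoringNulls(['b', null, 'a', undefined, 'c'], byCodeUnit);
console.log(sortedDemo);
```

Passing `SortByRoughSQL_Latin1_General_CP1_CS_AS` as `compareFn` would keep that comparator free of null checks.
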
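【Solution 2】:

@BrockAdams' answer is great, but I had a few edge cases with hyphens in the middle of strings that didn't match SQL Server. I couldn't quite work out where it was going wrong, so I wrote a more functional version that just filters out the ignored characters and then compares the arrays based on the Latin code points.

It probably performs less well, but there is less code to understand, and it matches the SQL test cases I've added below.

I'm using a SQL Server database with Latin1_General_100_CI_AS, so it is case-insensitive, but I've kept the code here case-sensitive. Switching to a case-insensitive check is easy: just create a wrapper function that applies toLowerCase to the variables.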
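
A minimal sketch of such a wrapper is shown here. The names `caseInsensitive` and `plainCompare` are made up for illustration and are not part of the answer's code; `plainCompare` only stands in for a real comparator so the snippet runs on its own.

```javascript
// Hypothetical helper: wraps any (a, b) string comparator so that both
// arguments are lowercased before they are compared.
function caseInsensitive(compareFn) {
    return function (a, b) {
        return compareFn(a.toLowerCase(), b.toLowerCase());
    };
}

// Stand-in comparator (plain code-unit comparison).
function plainCompare(a, b) {
    if (a === b) {
        return 0;
    }
    return a > b ? 1 : -1;
}

var compareCI = caseInsensitive(plainCompare);

console.log(plainCompare('exhibit', 'Exhibit'));  // 1 (case matters)
console.log(compareCI('exhibit', 'Exhibit'));     // 0 (case ignored)
```

With the function below, the call would presumably be `items.sort(caseInsensitive(latinSqlSort))`. Strings that differ only in case then compare as equal, so their relative order is left to the sort implementation.
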

The two collations sorted my test cases identically.

/**
 * This is a modified version of sortByRoughSQL_Latin1_General_CP1_CS_AS
 * This has a more functional approach, it is more basic
 * It simply does a character filter and then sort
 * @link https://stackoverflow.com/a/3266430/327074
 *
 * @param   {String} a
 * @param   {String} b
 * @returns {Number}   -1,0,1
 */
function latinSqlSort(a, b) {
    'use strict';
    //--- This is the master lookup table for Latin1 code-points.
    //    Here through the extended set \u02AF
    var latinLookup = [
         -1,151,152,153,154,155,156,157,158,  2,  3,  4,  5,  6,159,160,161,162,163,164,
        165,166,167,168,169,170,171,172,173,174,175,176,  0,  7,  8,  9, 10, 11, 12,210,
         13, 14, 15, 41, 16,211, 17, 18, 65, 69, 71, 74, 76, 77, 80, 81, 82, 83, 19, 20,
         42, 43, 44, 21, 22,214,257,266,284,308,347,352,376,387,419,427,438,459,466,486,
        529,534,538,559,576,595,636,641,647,650,661, 23, 24, 25, 26, 27, 28,213,255,265,
        283,307,346,350,374,385,418,426,436,458,464,485,528,533,536,558,575,594,635,640,
        646,648,660, 29, 30, 31, 32,177,178,179,180,181,182,183,184,185,186,187,188,189,
        190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,
          1, 33, 53, 54, 55, 56, 34, 57, 35, 58,215, 46, 59,212, 60, 36, 61, 45, 72, 75,
         37, 62, 63, 64, 38, 70,487, 47, 66, 67, 68, 39,219,217,221,231,223,233,250,276,
        312,310,316,318,392,390,395,397,295,472,491,489,493,503,495, 48,511,599,597,601,
        603,652,590,573,218,216,220,230,222,232,249,275,311,309,315,317,391,389,394,396,
        294,471,490,488,492,502,494, 49,510,598,596,600,602,651,589,655,229,228,227,226,
        235,234,268,267,272,271,270,269,274,273,286,285,290,287,324,323,322,321,314,313,
        326,325,320,319,358,357,362,361,356,355,364,363,378,377,380,379,405,404,403,402,
        401,400,407,406,393,388,417,416,421,420,432,431,428,440,439,447,446,444,443,442,
        441,450,449,468,467,474,473,470,469,477,484,483,501,500,499,498,507,506,527,526,
        540,539,544,543,542,541,561,560,563,562,567,566,565,564,580,579,578,577,593,592,
        611,610,609,608,607,606,613,612,617,616,615,614,643,642,654,653,656,663,662,665,
        664,667,666,574,258,260,262,261,264,263,281,278,277,304,292,289,288,297,335,337,
        332,348,349,369,371,382,415,409,434,433,448,451,462,476,479,509,521,520,524,523,
        531,530,552,572,571,569,570,583,582,581,585,632,631,634,638,658,657,669,668,673,
        677,676,678, 73, 79, 78,680,644, 50, 51, 52, 40,303,302,301,457,456,455,482,481,
        480,225,224,399,398,497,496,605,604,626,625,620,619,624,623,622,621,334,241,240,
        237,236,254,253,366,365,360,359,430,429,505,504,515,514,675,674,422,300,299,298,
        354,353, 84, 85, 86, 87,239,238,252,251,513,512,243,242,245,244,328,327,330,329,
        411,410,413,412,517,516,519,518,547,546,549,548,628,627,630,629, 88, 89, 90, 91,
         92, 93, 94, 95, 96, 97, 98, 99,100,101,102,103,104,105,106,107,108,109,110,111,
        112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,
        132,133,134,135,136,137,138,139,140,141,142,143,246,247,248,259,279,280,293,291,
        339,336,338,331,340,341,342,423,367,373,351,370,372,383,381,384,408,414,386,445,
        453,452,454,461,463,460,475,478,465,508,522,525,532,550,553,554,555,545,556,557,
        537,551,568,333,424,343,344,586,584,618,633,637,639,645,659,649,670,671,672,679,
        681,682,683,282,686,256,345,368,375,425,435,437,535,684,685,305,296,306,591,587,
        588,144,145,146,147,148,149,150
    ];

    /**
     * A bunch of characters get ignored for the primary sort weight.
     * The most important ones are the hyphen and apostrophe characters.
     * A bunch of control characters and a couple of odds and ends, make up
     * the rest.
     *
     * @param   {Number} iCharCode
     * @returns {Boolean}
     * @link https://stackoverflow.com/a/3266430/327074
     */
    function ignoreForPrimarySort(iCharCode) {
        if (iCharCode < 9) {
            return true;
        }

        if (iCharCode >= 14 && iCharCode <= 31) {
            return true;
        }

        if (iCharCode >= 127 && iCharCode <= 159) {
            return true;
        }

        if (iCharCode == 39 || iCharCode == 45 || iCharCode == 173) {
            return true;
        }

        return false;
    }

    // normal sort
    function compare(a, b) {
        if (a === b) {
            return 0;
        }
        return a > b ? 1 : -1;
    }

    // compare two arrays return first compare difference
    function arrayCompare(a, b) {
        return a.reduce(function (acc, x, i) {
            return acc === 0 && i < b.length ? compare(x, b[i]) : acc;
        }, 0);
    }

    /**
     * convert a string to array of latin code point ordering
     * @param   {String} x
     * @returns {Array}    integer array
     */
    function toLatinOrder(x) {
        return x.split('')
            // convert to char codes
            .map(function(x){return x.charCodeAt(0);})
            // filter out ignored characters
            .filter(function(x){return !ignoreForPrimarySort(x);})
            // convert to latin order
            .map(function(x){return latinLookup[x];});
    }

    // convert inputs
    var charA = toLatinOrder(a),
        charB = toLatinOrder(b);

    // compare the arrays
    var charsCompare = arrayCompare(charA, charB);
    if (charsCompare !== 0) {
        return charsCompare;
    }

    // fallback to the filtered array length
    var charsLenCompare = compare(charA.length, charB.length);
    if (charsLenCompare !== 0) {
        return charsLenCompare;
    }

    // Final fallback to a basic length comparison
    return compare(a.length, b.length);
}

var tests = [
    'Grails Found',
    '--Exhibit Visitors',
    '-Exhibit Visitors',
    'Exhibit Visitors',
    'Calls Received',
    'Ëxhibit Visitors',
    'Brochures distributed',
    'exhibit visitors',
    'bags of Garbage',
    '^&$Grails Found',
    '?test',
    '612C-520',
    '612-C-122',
    '612C-122 I',
    '612-C-126 L',
    '612C-301 B',
    '612C-304 B',
    '612C-306',
    '612-C-306',
    '612-C-306 2',
    '612-C-403 H',
    '612C403 O',
    '612-C-403(V)',
    '612E-306A/B I',
    '612E-306A/B O',
    '612C-121 O',
    '612C-111 B',
    '- -612C-111 B'
].sort(latinSqlSort).join('<br>');

document.write(tests);

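I also made a SQL fiddle to double-check it. In case the link breaks, here is a screenshot of how it looks:

【Comments】:

  • Not sure the - -612C-111 B value sorts correctly, but overall this answer seems good (I don't feel like revisiting this question with the rigor it deserves right now).
  • @BrockAdams That is actually one of the cases that sent me down this rabbit hole. I checked it against SQL Server - here is a SQL Fiddle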
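【Solution 3】:

Sorry, JavaScript has no collation features. The only string comparison you get works directly on the UTF-16 code units in a String, as returned by charCodeAt().

For characters inside the Basic Multilingual Plane, this is the same as a binary collation, so if you need JS and SQL Server to agree (ignoring the astral planes anyway), I think that's the only way you're going to do it. (Short of building a string collator in JS and painstakingly copying SQL Server's collation rules, anyway. Not much fun there.)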
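
To make the code-unit comparison concrete, here is a small illustration (the sample values are chosen for this sketch and are not from the answer). The default `sort()` orders strings by their UTF-16 code units, so the hyphen and all uppercase ASCII letters come before lowercase ones, unlike the SQL Server output in the question.

```javascript
// Default comparison: strings are ordered by their UTF-16 code units.
var sample = ['b', 'B', 'a', '-a', '\u00CB'];   // \u00CB is "Ë"
var byDefault = sample.slice().sort();
console.log(byDefault);   // [ '-a', 'B', 'a', 'b', 'Ë' ]

// The same order, seen through charCodeAt() on the first character.
var firstUnits = byDefault.map(function (s) {
    return s.charCodeAt(0);
});
console.log(firstUnits);  // [ 45, 66, 97, 98, 203 ]
```
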

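(What's the use case; why do they need to match?)

【Comments】:

  • Thanks for the insight; the use case is quite simple: I send sorted data back from SQL Server and have client-side sorting in the table. When the two disagree, I run into problems with paging, etc.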