【Question title】: Alphanumeric Sort
【Posted】: 2015-06-22 23:03:31
【Question】:

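I need some quick help sorting data on the SQL side. I am using SQL Server 2012 (an answer that uses newer features would be welcome too).

I have already looked at some links, such as "Sorting in alphanumeric" and "Alphanumeric string Sorting in Sqlserver - Code project", but they did not give the desired result.

Here is what I have tried: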

CREATE TABLE dbo.Section
    (
           Section varchar(50) NULL
    )
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsit no.43')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsit no.41')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite No. 11')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite No. 1')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite No. 12')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite No. 2')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite No. 3')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite No. 4')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite No. 40')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite No. 41')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite no.20')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Campsite no.41')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Cabin')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Group Tent Campsite')
    INSERT INTO dbo.Section (Section.Section) VALUES ('Tent Campsite')
    INSERT INTO dbo.Section (Section.Section) VALUES ('test1')
    INSERT INTO dbo.Section (Section.Section) VALUES ('test2')
    INSERT INTO dbo.Section (Section.Section) VALUES ('test11')
    SELECT Section
    FROM dbo.Section
    --Show normal Sort
    SELECT Section
    FROM dbo.Section
    ORDER BY Section
    --Show AlphaNumberic Sort
    SELECT Section
    FROM dbo.Section
    ORDER BY LEFT(Section,PATINDEX('%[0-9]%',Section)), -- alphabetical sort
             CONVERT(varchar(50),SUBSTRING(Section,PATINDEX('%[0-9]%',Section),LEN(Section))) -- numerical sort
    --cleanup our work
    --DROP Table dbo.Section

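What I want is this: rows that share the same alphabetic part should be sorted by that text first and then by the number. If possible, take spaces into account as well; otherwise give a result that ignores spaces, so that, for example, Campsite no.41 and Campsite No. 41 sort in the same position.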

Actual Result          Expected Result
Campsit no.41          Campsit no.41
Campsit no.43          Campsit no.43
Campsite No. 1         Campsite No. 1
Campsite No. 11        Campsite No. 2
Campsite No. 12        Campsite No. 3
Campsite No. 2         Campsite No. 4
Campsite No. 21        Campsite No. 11
Campsite No. 3         Campsite No. 12
Campsite No. 4         Campsite No. 21
Campsite No. 40        Campsite No. 40
Campsite No. 41        Campsite No. 41
Campsite no.20         Campsite no.20 --ideally sorted here; if that is not possible, ignore the space and sort accordingly
Campsite no.41         Campsite no.41 --ideally sorted here; if that is not possible, ignore the space and sort accordingly
Group Tent Campsite    Group Tent Campsite
Tent Campsite          Tent Campsite
test1                  test1
test11                 test2
test2                  test11

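【Comments】:

  • Your rules are crazy. When the string ends with No. xxx you want to treat xxx as a number, but for testxxx you still want to treat xxx as a string (otherwise the order would be test1, test2, test11).
  • Sorry.. you understood it correctly.
  • I understand; I just don't think you will be able to write those rules, unless you can state in plain English what the rule is for treating the end of the string as a number.
  • I still don't understand: why would Campsite no.20 and no.41 come between 12 and 21?
  • Please see the updated answer.

Tags: sql-server tsql sorting sql-server-2012 natural-sort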


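【Solution 1】:

Here's a tip: whenever you have a sorting problem, add the ORDER BY items to your SELECT clause. That lets you see whether what you are sorting by is actually what you want to sort by: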

SELECT Section,
        CASE WHEN PATINDEX('%[0-9]%',Section) > 1 THEN
          LEFT(Section,PATINDEX('%[0-9]%',Section)-1)
        ELSE 
          Section
        END As alphabetical_sort, -- alphabetical sort
        CASE WHEN PATINDEX('%[0-9]%',Section) > 1 THEN
          CAST(SUBSTRING(Section,PATINDEX('%[0-9]%',Section),LEN(Section)) as float)
        ELSE
          NULL
        END As Numeric_Sort
FROM dbo.Section
ORDER BY alphabetical_sort, Numeric_Sort

Once the sort was correct, all I had to do was move the CASE expressions into the ORDER BY clause:

SELECT Section
FROM dbo.Section
ORDER BY 
    CASE WHEN PATINDEX('%[0-9]%',Section) > 1 THEN
        LEFT(Section,PATINDEX('%[0-9]%',Section)-1)
    ELSE 
        Section
    END , -- Alphabetical sort
    CASE WHEN PATINDEX('%[0-9]%',Section) > 1 THEN
        CAST(SUBSTRING(Section,PATINDEX('%[0-9]%',Section),LEN(Section)) as float)
    ELSE
        NULL
    END  -- Numeric sort

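Basically, you had 4 main problems:

  • Your alphabetical sort expression assumed that every row contains a number.
  • Your alphabetical sort expression contained digits as well as text.
  • Your numeric sort expression contained both numeric and alphabetic values.
  • Because of point 3, you could not cast the numeric sort expression to a numeric type, which is why you ended up with a string sort.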
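
To make the first two points concrete, here is a small sketch that evaluates the question's two ORDER BY expressions as ordinary columns. The inline VALUES list and the column aliases QuestionAlphaKey and QuestionNumericKey are made up for illustration; only the two expressions come from the question.

```sql
-- Illustration only: shows what the question's ORDER BY expressions evaluate to.
-- The sample rows and the aliases are hypothetical, not part of the original query.
SELECT  v.Section,
        -- LEFT(..., PATINDEX(...)) runs up to AND INCLUDING the first digit,
        -- so the "alphabetical" key still ends in a digit.
        LEFT(v.Section, PATINDEX('%[0-9]%', v.Section)) AS QuestionAlphaKey,
        -- This key is still a string, so '11' compares as less than '2'.
        SUBSTRING(v.Section, PATINDEX('%[0-9]%', v.Section), LEN(v.Section)) AS QuestionNumericKey
FROM (VALUES ('Campsite No. 1'),
             ('Campsite No. 11'),
             ('Campsite No. 2'),
             ('Cabin')               -- no digit at all: PATINDEX returns 0 here
     ) AS v(Section);
```

For the row without digits, PATINDEX returns 0, which is why both expressions need a guard such as the CASE used in the queries above.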

See this sql fiddle

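【Comments】:

  • Yes, I got stuck because of the mix of alphabetic and alphanumeric strings.
  • Did you check the sql fiddle?
  • Why doesn't SQL Server know how to sort alphanumeric text automatically? Why can't it follow what Windows does? Even MySQL knows how to handle it correctly.
  • @ParisQianSen Why do you think so? Sorting strings that happen to contain digits is still string sorting, even in Windows. The point is that the strings "1", "11", "2" are sorted correctly, whereas the numbers 1, 11, 2 are not.
  • @ZoharPeled, common sense, bro: we would never put Volume 10 next to Volume 1 instead of after Volume 9. It must be some performance consideration. What if I am willing to sacrifice some machine performance for human convenience? I can't do that. Very few people would consider your "1", "11", "2" use case correct.
【Solution 2】:

Try this. Note: Cabin is in your data but not in your expected result. Also, let me know if you want the handling of spaces changed.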

SELECT  Section,
        FormatSection
FROM dbo.Section
CROSS APPLY (SELECT CASE 
                        WHEN PATINDEX('%[0-9]%',Section) != 0 
                            THEN SUBSTRING(Section,0,PATINDEX('%[0-9]%',Section)) + FORMAT(CAST(SUBSTRING(Section,PATINDEX('%[0-9]%',Section),5) AS INT),'0#')
                        ELSE Section
                    END
            ) AS CA(FormatSection)
ORDER BY FormatSection

Result:

Section                                            FormatSection
-------------------------------------------------- ---------------------
Cabin                                              Cabin
Campsit no.41                                      Campsit no.41
Campsit no.43                                      Campsit no.43
Campsite No. 1                                     Campsite No. 01
Campsite No. 2                                     Campsite No. 02
Campsite No. 3                                     Campsite No. 03
Campsite No. 4                                     Campsite No. 04
Campsite No. 11                                    Campsite No. 11
Campsite No. 12                                    Campsite No. 12
Campsite No. 40                                    Campsite No. 40
Campsite No. 41                                    Campsite No. 41
Campsite no.20                                     Campsite no.20
Campsite no.41                                     Campsite no.41
Group Tent Campsite                                Group Tent Campsite
Tent Campsite                                      Tent Campsite
test1                                              test01
test2                                              test02
test11                                             test11
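
The query above pads the number to two digits with the '0#' format, so it relies on no number exceeding 99: 'Campsite No. 100' would compare as less than 'Campsite No. 11' as a string. The following is a hedged sketch of the same zero-padding idea with a wider, fixed width. The sample rows, the aliases p, FirstDigit and PaddedSection, and the width of 10 are all assumptions for illustration, and it assumes the digits run to the end of the string and are at most 10 characters long.

```sql
-- Sketch only: left-pad the trailing number to a fixed width of 10 characters
-- so that plain string ordering matches numeric ordering.
-- Assumes the digits run to the end of the string (no trailing text).
SELECT  v.Section,
        ca.PaddedSection
FROM (VALUES ('Campsite No. 100'),
             ('Campsite No. 11'),
             ('Campsite No. 2'),
             ('Cabin')) AS v(Section)                       -- hypothetical sample rows
CROSS APPLY (SELECT PATINDEX('%[0-9]%', v.Section)) AS p(FirstDigit)
CROSS APPLY (SELECT CASE
                        WHEN p.FirstDigit > 0
                            -- text before the first digit + zero-padded number
                            THEN LEFT(v.Section, p.FirstDigit - 1)
                                 + RIGHT('0000000000' + SUBSTRING(v.Section, p.FirstDigit, 10), 10)
                        ELSE v.Section                      -- no digits: sort by the text alone
                    END
            ) AS ca(PaddedSection)
ORDER BY ca.PaddedSection;
```

With this padding, 2 should sort before 11 and 11 before 100 within the same text prefix. RIGHT and string concatenation are used here instead of FORMAT only to keep the sketch independent of the SQL Server version.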

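【Comments】:

    【Solution 3】:

    The query below gives the result you are after, but I doubt it is foolproof. I think you will struggle to find a solution that is both robust and performs well.

    The first part gets the position of the first number (where it is preceded by a space, so that Campsite no.41 and Campsite no. 40 are treated differently). I have put this in an APPLY so that the result is easier to reuse.

    The next stage is to find the first non-numeric character after the first number, i.e. where the number ends, so that we can extract the complete number with SUBSTRING and finally use TRY_CONVERT(INT, ...) to turn the extracted text into a sortable type.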

    SELECT  s.Section, 
            TextPart = SUBSTRING(s.Section, 1, ISNULL(fn.FirstNumber, LEN(s.Section))),
            Number = CASE WHEN fn.FirstNumber IS NULL THEN NULL
                        -- LastNumber is the offset of the first non-digit, so the number is LastNumber - 1 characters long
                        ELSE TRY_CONVERT(INT, SUBSTRING(s.Section, fn.FirstNumber + 1, ISNULL(ln.LastNumber - 1, LEN(s.Section)))) 
                    END
    FROM    dbo.Section AS s
        -- GET FIRST NUMBER (WHERE PRECEDING CHARACTER IS A SPACE)
        CROSS APPLY (SELECT NULLIF(PATINDEX('% [0-9]%', s.Section), 0)) AS fn (FirstNumber)
    
        -- GET FIRST NON NUMERIC CHARACTER AFTER FIRST NUMBER
        CROSS APPLY (SELECT NULLIF(PATINDEX('%[^0-9]%', SUBSTRING(s.Section, fn.FirstNumber + 1, LEN(s.Section))), 0)) AS ln (LastNumber)
    ORDER BY TextPart, Number;
    

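    N.B. You would need to move the expressions from the SELECT into the ORDER BY rather than sorting by column aliases, but I have left it in this format to make it clearer what is going on.

    I have tried to comment the solution, but there is a lot going on, so a full explanation of every part would be quite difficult. Apologies if anything is unclear.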
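
    As a rough sketch of what the N.B. above describes, the same expressions can be placed directly in the ORDER BY so that only Section is returned. This is an illustration built from the first query, not a tested replacement; note that it takes LastNumber - 1 characters for the number, since LastNumber is the offset of the first non-digit.

```sql
-- Sketch: same logic as the first query, with the sort keys moved into ORDER BY.
SELECT  s.Section
FROM    dbo.Section AS s
    -- position of the space that precedes the first number (NULL if there is none)
    CROSS APPLY (SELECT NULLIF(PATINDEX('% [0-9]%', s.Section), 0)) AS fn (FirstNumber)
    -- offset of the first non-digit after that number (NULL if digits run to the end)
    CROSS APPLY (SELECT NULLIF(PATINDEX('%[^0-9]%', SUBSTRING(s.Section, fn.FirstNumber + 1, LEN(s.Section))), 0)) AS ln (LastNumber)
ORDER BY
    -- text part: everything up to and including the space before the number
    SUBSTRING(s.Section, 1, ISNULL(fn.FirstNumber, LEN(s.Section))),
    -- numeric part: the digits only, converted so that 2 sorts before 11
    CASE WHEN fn.FirstNumber IS NULL THEN NULL
         ELSE TRY_CONVERT(INT, SUBSTRING(s.Section, fn.FirstNumber + 1, ISNULL(ln.LastNumber - 1, LEN(s.Section))))
    END;
```

    Columns produced by an APPLY can be referenced in ORDER BY, which is what keeps the PATINDEX calls from being repeated here.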


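    EDIT

    Sorry, I missed the update that switched the requirement from (test1, test11, test2) to (test1, test2, test11). This only changes the logic for finding the first number: it is now the position where the preceding character is not a full stop (PATINDEX('%[^.][0-9]%', s.section)), rather than the position where the preceding character is a space (this ensures Campsite no.20 sorts after Campsite no. 40).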

    SELECT  s.Section, 
            TextPart = SUBSTRING(s.Section, 1, ISNULL(fn.FirstNumber, LEN(s.Section))),
            Number = CASE WHEN fn.FirstNumber IS NULL THEN NULL
                        -- LastNumber is the offset of the first non-digit, so the number is LastNumber - 1 characters long
                        ELSE TRY_CONVERT(INT, SUBSTRING(s.Section, fn.FirstNumber + 1, ISNULL(ln.LastNumber - 1, LEN(s.Section)))) 
                    END
    FROM    dbo.Section AS s
        -- GET FIRST NUMBER (WHERE PRECEDING CHARACTER IS NOT A FULL STOP)
        CROSS APPLY (SELECT NULLIF(PATINDEX('%[^.][0-9]%', s.Section), 0)) AS fn (FirstNumber)
    
        -- GET FIRST NON NUMERIC CHARACTER AFTER FIRST NUMBER
        CROSS APPLY (SELECT NULLIF(PATINDEX('%[^0-9]%', SUBSTRING(s.Section, fn.FirstNumber + 1, LEN(s.Section))), 0)) AS ln (LastNumber)
    ORDER BY TextPart, Number;
    

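    【Comments】:

    • He may want to specify in a WHERE clause that this kind of sort is only used when the string ends with a number. Otherwise I can see things getting really wacky.
    • Everything works, but the last 3 entries do not come out in the correct order.
    【Solution 4】:

    I found the following alternative.

    Create this function and use it in the ORDER BY of your query, e.g. "ORDER BY dbo.fnGetNumericFromString([columnNm], 'int')".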

    -- Returns either the non-digit characters ('STR') or the digits (any other value) of @InString.
    -- Parameters are VARCHAR(50) to match dbo.Section.Section and avoid truncation.
    CREATE FUNCTION dbo.fnGetNumericFromString (@InString VARCHAR(50), @OutStrType VARCHAR(3))
    RETURNS VARCHAR(50)
    AS
    BEGIN
        -- declare variables
        DECLARE @pos INT
        DECLARE @strLength INT
        DECLARE @NumericString VARCHAR(50)
        DECLARE @CharString VARCHAR(50)
        DECLARE @ReturnValue VARCHAR(50)
        -- set values
        SET @NumericString = ''
        SET @CharString = ''
        SET @pos = 1
        SET @strLength = LEN(@InString)
        SET @InString = UPPER(@InString)      -- makes the text part case-insensitive
        SET @OutStrType = UPPER(@OutStrType)
    
        -- start looping over every character
        WHILE @pos <= @strLength 
        BEGIN
            -- ASCII codes for the digits 0-9 are 48 to 57
            IF ASCII(SUBSTRING(@InString, @pos, 1)) BETWEEN 48 AND 57
                SET @NumericString = @NumericString + SUBSTRING(@InString, @pos, 1)
            ELSE
                SET @CharString = @CharString + SUBSTRING(@InString, @pos, 1)
    
            -- increment to next character
            SET @pos = @pos + 1
        END
    
        IF @OutStrType = 'STR'
            SET @ReturnValue = @CharString
        ELSE
            SET @ReturnValue = @NumericString
    
        RETURN @ReturnValue
    END
    GO
    
    -- Rows without digits yield '' for the numeric part, which CAST turns into 0
    SELECT Section FROM dbo.Section
    ORDER BY dbo.fnGetNumericFromString(Section, 'str'), CAST(dbo.fnGetNumericFromString(Section, 'int') AS INT)
    

    【Comments】:

    • We could also rewrite the example above using PATINDEX instead of SUBSTRING.
    【Solution 5】:
     SELECT *,
           ROW_NUMBER()OVER(ORDER BY CASE WHEN ISNUMERIC (ID)=1 THEN CONVERT(NUMERIC(20,2),SUBSTRING(Id, PATINDEX('%[0-9]%', Id), LEN(Id)))END DESC)Rn ---- numerical
    FROM
    (
    SELECT '1'Id UNION ALL
    SELECT '25.20' Id UNION ALL
    SELECT 'A115' Id UNION ALL
    SELECT '2541' Id UNION ALL
    SELECT '571.50' Id UNION ALL
    SELECT '67' Id UNION ALL
    SELECT 'B48' Id UNION ALL
    SELECT '500' Id UNION ALL
    SELECT '147.54' Id UNION ALL
    SELECT 'A-100' Id
    )A
    ORDER BY 
    CASE WHEN ISNUMERIC (ID)=0                                /* alphabetical sort */ 
         THEN CASE WHEN PATINDEX('%[0-9]%', Id)=0
                   THEN LEFT(Id,PATINDEX('%[0-9]%',Id))
                   ELSE LEFT(Id,PATINDEX('%[0-9]%',Id)-1)
              END
    END DESC
    

    【Comments】:
