【Question title】: Special Characters - SQL
【Posted】: 2019-06-07 13:01:47
【Question description】:

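How can I find special characters in a column in SQL Server?

I have a list of email addresses and need to find special characters like the ones in the examples below: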

**Email** 
JóhnSnow@gmail.com
Khãlessi@gmail.com 

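As you can see above, the special characters here are '~' and '´'. Other characters may appear as well, such as "..".

I am using SQL Server 2012.

Does anyone have a suggestion for a solution?

【Comments】: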

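  • Should the query return only ~/ã, or Khãlessi@gmail.com?
  • Yes @Sami, only the ~
  • I'm curious: why do you want to know this?
  • @martinBrown Haha, that's an interesting question. I run a DW environment and have customers consuming my data, and one of them asked whether it is possible to find content that does not match a pattern. As you know, some information such as emails and phone numbers is entered into my database manually. That is why I opened this question.
  • Thought it might be something like that. A word of caution: the rules for which characters are valid in an email address are somewhat complicated. For example, the rules for the host (the bit after the @) differ from the rules for the local part (the bit before the @). It also depends on which SMTP extensions you support (RFC 6531 in particular), and on whether hosts will be encoded as punycode or are expected to be in that form already.

Tags: sql sql-server special-characters


【Solution 1】: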

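To extract the special characters, you first need to split each string into rows, one row per character, so that every character can be examined individually. This can be done with a numbers table. If you don't have one, it is easy to generate on the fly: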

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT  Number
FROM    Numbers;

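This gives a list of the numbers from 1 to 10,000 (10 rows in N1, 10 × 10 = 100 in N2, 100 × 100 = 10,000 in N3). The original answer linked to further reading on this technique.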
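
If the same list is needed by many queries, one option is to materialise it once instead of rebuilding it every time. The following is a minimal sketch, assuming a permanent table is acceptable in your environment; the names dbo.Numbers and Tally are hypothetical and not part of the answer:

```sql
-- Hypothetical permanent numbers table (the name dbo.Numbers is an assumption).
CREATE TABLE dbo.Numbers (Number INT NOT NULL PRIMARY KEY);

-- Same stacked CTEs as above: 10 rows -> 100 rows -> 10,000 rows.
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Tally (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
INSERT dbo.Numbers (Number)
SELECT  Number
FROM    Tally;
```

Queries could then join to dbo.Numbers directly and drop the N1 to N3 CTEs.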

You can then join this to your data with the condition Number <= LEN(Email), so that one row is returned for every character in the email, and use SUBSTRING() to extract the character at position n:

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT  t.ID,
        t.Email,
        Character = SUBSTRING(t.Email, n.Number, 1)
FROM    @T AS t
        INNER JOIN Numbers n
            -- <= (not <) so the last character of each email is included
            ON n.Number <= LEN(t.Email)
ORDER BY t.ID;

This gives:

ID  Email                   Character
-------------------------------------
1   JóhnSnów@gmail.com      J
1   JóhnSnów@gmail.com      ó
1   JóhnSnów@gmail.com      h
1   JóhnSnów@gmail.com      n
1   JóhnSnów@gmail.com      S
1   JóhnSnów@gmail.com      n
1   JóhnSnów@gmail.com      ó
1   JóhnSnów@gmail.com      w
.....

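You can then pick out the special characters by converting each one to a non-Unicode CHAR(1) with the collation SQL_Latin1_General_Cp1251_CS_AS and comparing the result with the original character. A character that has no exact equivalent in that code page is changed by the conversion, so it no longer equals the original: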

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3),
AllCharacters AS
(   SELECT  t.ID,
            t.Email,
            Character = SUBSTRING(t.Email, n.Number, 1),
            Position = n.Number
    FROM    @T AS t
            INNER JOIN Numbers n
                -- <= (not <) so the last character of each email is included
                ON n.Number <= LEN(t.Email)
)
SELECT  ac.ID, ac.Email, ac.Character, ac.Position
FROM    AllCharacters AS ac
WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
ORDER BY ac.ID, ac.Position;

Result:

ID  Email                   Character   Position
----------------------------------------------------
1   JóhnSnów@gmail.com          ó           2
1   JóhnSnów@gmail.com          ó           7
2   Khãlessi@gmail.com          ã           3
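
To see what the conversion does to individual characters, the same check can be run against a handful of literals. This is only a sketch: the sample characters and the column aliases are chosen for illustration, and the converted values depend on how your server maps each character into code page 1251.

```sql
-- Compare each sample character with its CHAR(1) conversion
-- under the collation used in the answer.
SELECT  Original  = v.Ch,
        Converted = CONVERT(CHAR(1), v.Ch) COLLATE SQL_Latin1_General_Cp1251_CS_AS,
        IsSpecial = CASE WHEN CONVERT(CHAR(1), v.Ch) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> v.Ch
                         THEN 1 ELSE 0 END
FROM    (VALUES (N'o'), (N'ó'), (N'a'), (N'ã'), (N'.'), (N',')) AS v (Ch);
```

Plain ASCII characters such as '.' and ',' come back unchanged, so this test on its own does not flag them.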

Finally, if required, you can use SQL Server's XML extensions to concatenate these characters into a single column:

DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnów@gmail.com'), (N'Khãlessi@gmail.com'), ('NedStark@gmail.com');

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3),
AllCharacters as
(   SELECT  t.ID,  
            t.Email, 
            Character = SUBSTRING(t.Email, n.Number, 1), 
            Position = n.Number
    FROM    @T AS t
            INNER JOIN Numbers n    
                ON n.Number <= LEN(t.Email)
), SpecialCharacters AS
(   SELECT  ac.ID, ac.Character, ac.Position
    FROM    AllCharacters AS ac
    WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
)
SELECT  t.ID,
        t.Email,
        -- STUFF removes the leading ', ' from the concatenated list
        SpecialCharacters = ISNULL(STUFF(s.SpecialCharacterList.value('.', 'NVARCHAR(255)'), 1, 2, ''), '')
FROM    @T AS t
        CROSS APPLY
        (   SELECT  CONCAT(N', ', sc.Character, '(', sc.Position, ')')
            FROM    SpecialCharacters AS sc
            WHERE   sc.ID = t.ID
            ORDER BY sc.Position
            FOR XML PATH(''), TYPE
        ) s (SpecialCharacterList)
ORDER BY t.ID;

Result:

ID  Email                   SpecialCharacters
------------------------------------------------
1   JóhnSnów@gmail.com      ó(2), ó(7)
2   Khãlessi@gmail.com      ã(3)
3   NedStark@gmail.com  

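As an aside, storing the characters you consider special in a table may suit your needs better than relying on the code page of a particular collation. If you go that route, you only need to change this line:

WHERE   CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character

to: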

WHERE EXISTS (SELECT 1 FROM MySpecialCharacterTable AS sct WHERE sct.Character = ac.Character)
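
One possible shape for such a table is sketched below. MySpecialCharacterTable is the name used in the line above, but the column definition, the binary collation, and the sample rows are assumptions made for illustration:

```sql
-- Assumed definition. A binary collation makes the lookup exact,
-- so N'ó' never matches N'o' or N'Ó' whatever the database default is.
CREATE TABLE MySpecialCharacterTable
(   [Character] NCHAR(1) COLLATE Latin1_General_BIN2 NOT NULL PRIMARY KEY
);

-- Sample rows only; list whatever you consider special.
INSERT MySpecialCharacterTable ([Character])
VALUES (N'ó'), (N'ã'), (N'~'), (N',');
```

Because the column carries its own collation, the comparison may need an explicit COLLATE clause to avoid a collation-conflict error, for example sct.Character = ac.Character COLLATE Latin1_General_BIN2. A two-character sequence such as ".." cannot be stored in a one-character lookup; a separate predicate such as Email LIKE N'%..%' would be one way to catch it.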

【Comments】:

  • Thanks! You solved it for me! I had to make a few tweaks to your script to catch ".." and ","