【Question title】: Create clustered or nonclustered index on 1 million rows table for LIKE query?
【Posted】: 2017-09-28 09:00:28
【Question】:

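I have a database table containing postal code, city, longitude, latitude, and province.

One use of this table is to return suggested codes (plus city and province) for an autocomplete widget.

city + postalcode make up a unique record. I start querying at 3 characters.

The query is slow, which makes for a poor autocomplete experience. Given the information above, which kind of index would be most effective? I am using an Azure SQL data store, so I cannot run the query analyzer / tuning advisor.

I tried a nonclustered index on postalCode, and a 2-column clustered index on postalCode + city. Both gave the same result for this query: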

Select * 
From PostalCode 
Where code LIKE '%L6J 0%'

I am not updating or inserting into this table.

【Comments】:

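  • These look like UK postcodes, so you could probably drop the leading wildcard. In some database engines that makes a big difference in speed. Can you do that and still get the functionality you want? I'm not sure there is any value in matching the first few letters "anywhere in the postcode", since people always start typing them from the first letter.
  • If you use a leading % in the LIKE comparison, no index can help you. It will always result in a full table scan.
  • Really good points. I'm actually using EF/linq, so that is a start. The leading % probably won't be put there. I don't know why I was testing in SQL Server Manager, but you are 100% right. It is faster with the clustered index on code and city, and without the leading %.
  • If you use a leading % in the LIKE comparison, the database can use an index scan instead of a table scan. It still won't be pretty, but the wider the rows, the better the index scan looks.
  • @billyjean: you could use full-text search.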
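
The difference the comments describe can be sketched in T-SQL. This is a minimal illustration, not the asker's actual schema: the table and column names PostalCode and code come from the question, while the dbo schema, the city and province column names, and the index name are assumptions.

```sql
-- Sketch only: city, province and the index name are assumed, not taken from the question.
-- A nonclustered index keyed on code that also carries the columns the widget displays.
CREATE NONCLUSTERED INDEX IX_PostalCode_code
    ON dbo.PostalCode ( code )
    INCLUDE ( city, province );
GO

-- No leading wildcard: the predicate is a prefix match, so the optimizer can
-- typically seek straight to the 'L6J 0' range of the index.
SELECT TOP (20) code, city, province
FROM dbo.PostalCode
WHERE code LIKE 'L6J 0%'
ORDER BY code;
```

With '%L6J 0%' the same index can at best be scanned from end to end, which is the behaviour discussed above.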

Tags: sql tsql azure-sql-database clustered-index non-clustered-index


【Solution 1】:

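Because postcodes are short and have a known maximum length (8), they are a good candidate for chunking. Break each postcode down into all of its component substrings and store them along with their start position and length, so that a search becomes an index seek.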

For example, for a postcode such as 'OX1 1JZ', store all of the following strings:

start len postcodePart
1   2   OX
1   3   OX1
1   4   OX11
1   5   OX11J
1   6   OX11JZ
2   2   X1
2   3   X11
2   4   X11J
2   5   X11JZ
3   2   11
3   3   11J
3   4   11JZ
4   2   1J
4   3   1JZ
5   2   JZ
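
As a quick way to see where these rows come from, the following standalone batch lists the parts for a single postcode without creating any tables. It is only a sketch: the variable names and the column aliases xStart and xLen are chosen for illustration, and it assumes spaces are removed before chunking and that parts shorter than 2 characters are not stored, as in the table above.

```sql
-- Sketch only: lists every substring of length >= 2 for one postcode.
DECLARE @postcode VARCHAR(8) = 'OX1 1JZ';
DECLARE @clean    VARCHAR(8) = REPLACE( @postcode, ' ', '' );

WITH n AS
(
    -- Small numbers table covering the maximum postcode length.
    SELECT y FROM ( VALUES ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ) ) x(y)
)
SELECT
    s.y AS xStart,
    l.y AS xLen,
    SUBSTRING( @clean, s.y, l.y ) AS postcodePart
FROM n s
    CROSS JOIN n l
WHERE l.y >= 2                            -- ignore single characters
  AND s.y + l.y - 1 <= LEN( @clean )      -- the part must fit inside the postcode
ORDER BY s.y, l.y;
```

For 'OX1 1JZ' this returns the 15 rows shown above.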

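Here is a sample script that demonstrates the technique, using 100 sample postcodes and a trigger that shreds each postcode into its parts.

NB!! This is not production-ready code, just an example to show the technique.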

USE tempdb
GO

-- https://www.postcodelist.co.uk/
--uk-postcodes.csv

IF OBJECT_ID('dbo.postCodeParts') IS NOT NULL DROP TABLE dbo.postCodeParts
IF OBJECT_ID('dbo.postCodes') IS NOT NULL DROP TABLE dbo.postCodes
GO

CREATE TABLE dbo.postCodes (
    postcodeId              INT IDENTITY CONSTRAINT PK_postCodes PRIMARY KEY,
    postcode                VARCHAR(8) NOT NULL

    --... the rest of your columns

    )
GO


CREATE TABLE dbo.postCodeParts (
    postcodePartId          INT IDENTITY CONSTRAINT PK_postCodeParts PRIMARY KEY NONCLUSTERED,
    postcodeId              INT NOT NULL FOREIGN KEY REFERENCES dbo.postCodes ( postcodeId ),

    totalLen                TINYINT NOT NULL,
    xStart                  TINYINT NOT NULL,
    xLen                    TINYINT NOT NULL,
    postcodePart            VARCHAR(8) NOT NULL INDEX cdx_postCodeParts CLUSTERED

    )
GO


-- Add a trimmed copy of the postcode to the parts table, chunked up.
CREATE TRIGGER dbo.trg_postCodes
ON dbo.postcodes
FOR INSERT
AS
BEGIN

    ;WITH cte AS
    (
    SELECT *
    FROM (
        VALUES ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ) 
        ) x(y)
    )
    INSERT INTO dbo.postCodeParts ( postcodeId, totalLen, xStart, xLen, postcodePart )
    SELECT 
        p.postcodeId, 
        p.xTotalLen, 
        c1.y AS xstart, 
        c2.y AS xlen, 
        SUBSTRING( p.postCode, c1.y, c2.y ) AS xstring
    FROM ( 
        SELECT
            postcodeId,
            REPLACE( postcode, ' ', '' ) postCode, 
            LEN( REPLACE( postcode, ' ', '' ) ) AS xTotalLen 
        FROM inserted 
    ) p
        CROSS JOIN cte c1
            CROSS JOIN cte c2
    WHERE c2.y Between 2 And p.xTotalLen
      AND ( ( c2.y ) + ( c1.y - 1 ) ) <= p.xTotalLen

END
GO


INSERT INTO dbo.postcodes ( postcode )
VALUES
    ( 'OX1 1AA' ),( 'OX1 1AB' ),( 'OX1 1AD' ),( 'OX1 1AE' ),( 'OX1 1AF' ),( 'OX1 1AG' ),( 'OX1 1AN' ),( 'OX1 1AS' ),( 'OX1 1AW' ),( 'OX1 1AY' ),
    ( 'OX1 1AZ' ),( 'OX1 1BD' ),( 'OX1 1BE' ),( 'OX1 1BN' ),( 'OX1 1BP' ),( 'OX1 1BS' ),( 'OX1 1BT' ),( 'OX1 1BU' ),( 'OX1 1BX' ),( 'OX1 1BY' ),
    ( 'OX1 1BZ' ),( 'OX1 1DA' ),( 'OX1 1DB' ),( 'OX1 1DE' ),( 'OX1 1DF' ),( 'OX1 1DG' ),( 'OX1 1DJ' ),( 'OX1 1DL' ),( 'OX1 1DP' ),( 'OX1 1DQ' ),
    ( 'OX1 1DS' ),( 'OX1 1DW' ),( 'OX1 1DZ' ),( 'OX1 1EA' ),( 'OX1 1EF' ),( 'OX1 1EJ' ),( 'OX1 1EN' ),( 'OX1 1EP' ),( 'OX1 1EQ' ),( 'OX1 1ER' ),
    ( 'OX1 1ES' ),( 'OX1 1ET' ),( 'OX1 1EU' ),( 'OX1 1EW' ),( 'OX1 1EX' ),( 'OX1 1GA' ),( 'OX1 1GB' ),( 'OX1 1GD' ),( 'OX1 1GE' ),( 'OX1 1GF' ),
    ( 'OX1 1GH' ),( 'OX1 1GJ' ),( 'OX1 1GL' ),( 'OX1 1HB' ),( 'OX1 1HD' ),( 'OX1 1HF' ),( 'OX1 1HG' ),( 'OX1 1HH' ),( 'OX1 1HN' ),( 'OX1 1HP' ),
    ( 'OX1 1HQ' ),( 'OX1 1HR' ),( 'OX1 1HS' ),( 'OX1 1HT' ),( 'OX1 1HU' ),( 'OX1 1HW' ),( 'OX1 1HX' ),( 'OX1 1HY' ),( 'OX1 1HZ' ),( 'OX1 1JA' ),
    ( 'OX1 1JB' ),( 'OX1 1JD' ),( 'OX1 1JE' ),( 'OX1 1JF' ),( 'OX1 1JG' ),( 'OX1 1JH' ),( 'OX1 1JJ' ),( 'OX1 1JL' ),( 'OX1 1JP' ),( 'OX1 1JQ' ),
    ( 'OX1 1JR' ),( 'OX1 1JS' ),( 'OX1 1JT' ),( 'OX1 1JU' ),( 'OX1 1JW' ),( 'OX1 1JX' ),( 'OX1 1JY' ),( 'OX1 1JZ' ),( 'OX1 1LB' ),( 'OX1 1LD' ),
    ( 'OX1 1LE' ),( 'OX1 1LF' ),( 'OX1 1LG' ),( 'OX1 1LJ' ),( 'OX1 1LL' ),( 'OX1 1LQ' ),( 'OX1 1LT' ),( 'OX1 1LU' ),( 'OX1 1LY' ),( 'OX1 1ND' )
GO

SELECT * FROM dbo.postCodes
SELECT * FROM dbo.postCodeParts ORDER BY xStart, xLen

SELECT * 
FROM dbo.postCodes pc
    INNER JOIN dbo.postCodeParts pcp ON pc.postcodeId = pcp.postcodeId
WHERE postcodePart = '1J'
ORDER BY xStart, xLen
GO

IF OBJECT_ID('dbo.usp_searchPostCodes') IS NOT NULL DROP PROC dbo.usp_searchPostCodes
GO
CREATE PROC dbo.usp_searchPostCodes

    @searchString   VARCHAR(10)     -- up to 8 characters of postcode plus an optional leading and trailing %

AS

    SET NOCOUNT ON

    --!!TODO add error handling
    --!!TODO does not deal with middle wildcards or _ wildcard

    DECLARE @leadingWildCard BIT
    DECLARE @cleanSearchString VARCHAR(8)

    SELECT @leadingWildCard = CASE WHEN LEFT( @searchString, 1 ) = '%' THEN 1 ELSE 0 END
    SELECT @cleanSearchString = REPLACE( REPLACE( @searchString, ' ', '' ), '%', '' )

    -- Debugging
    --PRINT @leadingWildCard
    --PRINT @cleanSearchString

    IF @leadingWildCard = 0

        -- No leading wildcard, start at position 1
        SELECT pc.postcode
        FROM dbo.postCodes pc
            INNER JOIN dbo.postCodeParts pcp ON pc.postcodeId = pcp.postcodeId
        WHERE pcp.postcodePart = @cleanSearchString
          AND pcp.xstart = 1
        ORDER BY xStart, xLen

    ELSE

        -- Leading wildcard, return all positions
        SELECT pc.postcode
        FROM dbo.postCodes pc
            INNER JOIN dbo.postCodeParts pcp ON pc.postcodeId = pcp.postcodeId
        WHERE pcp.postcodePart = @cleanSearchString
        ORDER BY xStart, xLen

RETURN
GO


EXEC dbo.usp_searchPostCodes 'OX1 1J%'
EXEC dbo.usp_searchPostCodes '%X1 1J%'
GO


SELECT xStart, xLen, postcodePart
FROM dbo.postCodes pc
    INNER JOIN dbo.postCodeParts pcp ON pc.postcodeId = pcp.postcodeId
WHERE pc.postcode= 'OX1 1JZ'
ORDER BY xStart, xLen

【Comments】:

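  • Any thoughts on this?
  • Hi Bob, sorry, I only got back to this today. Yes, I'm still not getting the performance I wanted, so I started thinking about breaking the pcode apart. This goes well beyond what I had in mind, and it looks really interesting. So the trigger is there to populate the postCodeParts table? I already have the postal code table. And once this is implemented, how do you actually query it? Thanks Bob!
  • Work through my code sample, it shows how to query it.
  • How about adding 6 columns to the postalCode table: pcode1, pcode2, pcode3, pcode4, pcode5
  • Well, it's ultimately up to you. Call it as you see it. In my experience, taking shortcuts usually leads to problems sooner or later. Good luck!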
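
On the point about already having a populated postcode table: the trigger in the answer only fires for new inserts, so existing rows would need a one-off load. The batch below is a hedged sketch of one way to do that, reusing the answer's table and column names (dbo.postCodes, dbo.postCodeParts); the NOT EXISTS guard is an addition for illustration, not part of the original answer.

```sql
-- Sketch only: one-off population of dbo.postCodeParts for rows that already
-- exist in dbo.postCodes (the INSERT trigger does not fire for those).
WITH n AS
(
    SELECT y FROM ( VALUES ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ) ) x(y)
)
INSERT INTO dbo.postCodeParts ( postcodeId, totalLen, xStart, xLen, postcodePart )
SELECT
    p.postcodeId,
    p.xTotalLen,
    s.y,
    l.y,
    SUBSTRING( p.postcode, s.y, l.y )
FROM (
    SELECT
        postcodeId,
        REPLACE( postcode, ' ', '' ) AS postcode,
        LEN( REPLACE( postcode, ' ', '' ) ) AS xTotalLen
    FROM dbo.postCodes
) p
    CROSS JOIN n s
    CROSS JOIN n l
WHERE l.y >= 2
  AND s.y + l.y - 1 <= p.xTotalLen
  -- Skip postcodes that have already been shredded.
  AND NOT EXISTS ( SELECT 1 FROM dbo.postCodeParts pcp WHERE pcp.postcodeId = p.postcodeId );
```

Each 6-character code yields 15 parts, so a table of a million such codes would produce roughly 15 million rows in dbo.postCodeParts; it may be worth running the load in batches.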