[Posted]: 2014-06-15 14:35:01
[Question]:
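I have a table with 450,000 rows that includes a variable-length varchar column (between 6 and 13 characters, unevenly distributed). I need to join it to another table on a "starts with" criterion: the column in the target table begins with the value of the column in the first table.
In my current test sample, I know that every match is 6 characters long, so I join on t1.Digits = left(t2.Number, 6), which is very fast (a large query runs in just a few seconds). My test sample is 10,000 records, but in production the query will have to operate on hundreds of thousands of records.
I also know that the vast majority of records will always be 6-character matches, but I need to support longer matches, otherwise duplicate records are sometimes returned. The problem is that I have tried all of the following, and each one is far slower than my simple join on the left six characters. I never let any of them run longer than five minutes, but they showed no sign of finishing: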
1. t1.Digits = left(t2.Number, datalength(t1.Digits))
2. charindex(t1.Digits, t2.Number) = 1
3. Adding a precomputed DigitLength int column to t1, then using t1.Digits = left(t2.Number, t1.DigitLength)
4. t2.Number like t1.Digits + '%'
Each of the four solutions above does what I want in theory, but runs too slowly for my purposes.
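To make the four predicates concrete, here is a minimal sketch that runs each one against a tiny in-memory SQLite database through Python's sqlite3 module and checks that they return the same rows. This is an illustration only, not the asker's setup: SQLite's substr, length, instr and || stand in for SQL Server's left, datalength, charindex and +, the sample rows are invented, and nothing here says anything about SQL Server performance.

```python
import sqlite3

# In-memory stand-in for the two tables; the sample values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (Digits TEXT PRIMARY KEY, DigitLength INTEGER);
    CREATE TABLE t2 (Number TEXT);
    INSERT INTO t1 (Digits) VALUES ('012345'), ('0987654'), ('999999');
    UPDATE t1 SET DigitLength = length(Digits);  -- precomputed length (option 3)
    INSERT INTO t2 VALUES ('0123450000'), ('0987654321'), ('5550000000');
""")

# SQLite equivalents of the four "starts with" join predicates.
predicates = {
    "left": "t1.Digits = substr(t2.Number, 1, length(t1.Digits))",
    "charindex": "instr(t2.Number, t1.Digits) = 1",
    "precomputed": "t1.Digits = substr(t2.Number, 1, t1.DigitLength)",
    "like": "t2.Number LIKE t1.Digits || '%'",
}

def prefix_join(predicate):
    """Join t2 to t1 with the given predicate and return sorted rows."""
    sql = (
        "SELECT t2.Number, t1.Digits FROM t2 "
        f"JOIN t1 ON {predicate} ORDER BY t2.Number, t1.Digits"
    )
    return conn.execute(sql).fetchall()

results = {name: prefix_join(p) for name, p in predicates.items()}
for name, rows in results.items():
    print(name, rows)
```

All four variants should print the same two matches, with leading zeros preserved because the columns are text. The LIKE form additionally assumes that Digits never contains the wildcard characters % or _.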
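Even though the values in these columns are numeric, I still use varchar because leading zeros have to be preserved in many cases. In any event, there should be a fast solution even when the data is alphanumeric.
Does anyone know of a very fast "starts with" logic whose performance is comparable to my oversimplified join?
I have a clustered index on the t1.Digits column.
Here is the execution plan when running with method #4 above: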
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.0" Build="9.00.5000.00" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="10720" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="7471.7" StatementText="select c.FromNumber, c.ToNumber, d.Destination, d.Digits
from Converting c
--join CASH.CASH.dbo.DestinationLookup d on d.Digits = left(c.FromNumber, 6) 
join CASH.CASH.dbo.DestinationLookup d on c.FromNumber like d.Digits + '%' 
" StatementType="SELECT">
<StatementSetOptions ANSI_NULLS="false" ANSI_PADDING="false" ANSI_WARNINGS="false" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="false" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="false" />
<QueryPlan DegreeOfParallelism="1" MemoryGrant="114" CachedPlanSize="99" CompileTime="36" CompileCPU="35" CompileMemory="312">
<RelOp AvgRowSize="77" EstimateCPU="174.861" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Inner Join" NodeId="0" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="7471.7">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<NestedLoops Optimized="false">
<OuterReferences>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</OuterReferences>
<RelOp AvgRowSize="38" EstimateCPU="0.164714" EstimateIO="0.00281532" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Sort" NodeId="1" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.340338">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</OutputList>
<MemoryFractions Input="1" Output="1" />
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="1" ActualRewinds="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<Sort Distinct="false">
<OrderBy>
<OrderByColumn Ascending="true">
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</OrderByColumn>
</OrderBy>
<RelOp AvgRowSize="38" EstimateCPU="0.00296763" EstimateIO="0.126907" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Table Scan" NodeId="2" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.129875">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</DefinedValue>
</DefinedValues>
<Object Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" />
</TableScan>
</RelOp>
</Sort>
</RelOp>
<RelOp AvgRowSize="48" EstimateCPU="0.00290986" EstimateIO="0.01" EstimateRebinds="1390" EstimateRewinds="9329" EstimateRows="15609.2" LogicalOp="Lazy Spool" NodeId="3" Parallel="false" PhysicalOp="Table Spool" EstimatedTotalSubtreeCost="7296.5">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="1391" ActualRewinds="9329" ActualRows="10720" ActualEndOfScans="10720" ActualExecutions="10720" />
</RunTimeInformation>
<Spool>
<RelOp AvgRowSize="48" EstimateCPU="5.21308" EstimateIO="0" EstimateRebinds="1390" EstimateRewinds="0" EstimateRows="15609.2" LogicalOp="Compute Scalar" NodeId="4" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="7251.4">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ScalarOperator ScalarString="[CASH].[CASH].[dbo].[DestinationLookup].[Digits] as [d].[Digits]">
<Identifier>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
</Identifier>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
<ScalarOperator ScalarString="[CASH].[CASH].[dbo].[DestinationLookup].[Destination] as [d].[Destination]">
<Identifier>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</Identifier>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="48" EstimateCPU="5.21308" EstimateIO="0" EstimateRebinds="1390" EstimateRewinds="0" EstimateRows="15609.2" LogicalOp="Remote Query" NodeId="5" Parallel="false" PhysicalOp="Remote Query" EstimatedTotalSubtreeCost="7251.4">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="1391" ActualRewinds="0" ActualRows="1391" ActualEndOfScans="1391" ActualExecutions="1391" />
</RunTimeInformation>
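<RemoteQuery RemoteSource="CASH" RemoteQuery="SELECT &quot;Tbl1004&quot;.&quot;Digits&quot; &quot;Col1021&quot;,&quot;Tbl1004&quot;.&quot;Destination&quot; &quot;Col1022&quot; FROM &quot;CASH&quot;.&quot;dbo&quot;.&quot;DestinationLookup&quot; &quot;Tbl1004&quot; WHERE ? like &quot;Tbl1004&quot;.&quot;Digits&quot;+'%'" />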
</RelOp>
</ComputeScalar>
</RelOp>
</Spool>
</RelOp>
</NestedLoops>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
Here is the plan when using the simple left(t2.Number, 6) join:
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.0" Build="9.00.5000.00" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="10720" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="15.1845" StatementText="select c.FromNumber, c.ToNumber, d.Destination, d.Digits
from Converting c
join CASH.CASH.dbo.DestinationLookup d on d.Digits = left(c.FromNumber, 6) " StatementType="SELECT">
<StatementSetOptions ANSI_NULLS="false" ANSI_PADDING="false" ANSI_WARNINGS="false" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="false" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="false" />
<QueryPlan DegreeOfParallelism="1" CachedPlanSize="105" CompileTime="60" CompileCPU="58" CompileMemory="360">
<RelOp AvgRowSize="77" EstimateCPU="0.0448096" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Inner Join" NodeId="0" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="15.1845">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<NestedLoops Optimized="false">
<OuterReferences>
<ColumnReference Column="Expr1005" />
</OuterReferences>
<RelOp AvgRowSize="43" EstimateCPU="0.001072" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Compute Scalar" NodeId="1" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.13985">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
<ColumnReference Column="Expr1005" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="substring([CASH].[dbo].[Converting].[FromNumber] as [c].[FromNumber],(1),(6))">
<Intrinsic FunctionName="substring">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(1)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(6)" />
</ScalarOperator>
</Intrinsic>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="38" EstimateCPU="0.011949" EstimateIO="0.126829" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Table Scan" NodeId="2" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.138778">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</DefinedValue>
</DefinedValues>
<Object Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" />
</TableScan>
</RelOp>
</ComputeScalar>
</RelOp>
<RelOp AvgRowSize="48" EstimateCPU="0.000258212" EstimateIO="0.003125" EstimateRebinds="10580.9" EstimateRewinds="138.124" EstimateRows="1" LogicalOp="Lazy Spool" NodeId="6" Parallel="false" PhysicalOp="Index Spool" EstimatedTotalSubtreeCost="14.9998">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="830" ActualRewinds="9890" ActualRows="10720" ActualEndOfScans="0" ActualExecutions="10720" />
</RunTimeInformation>
<Spool>
<SeekPredicate>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Column="Expr1005" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="[Expr1005]">
<Identifier>
<ColumnReference Column="Expr1005" />
</Identifier>
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekPredicate>
<RelOp AvgRowSize="48" EstimateCPU="0.0103333" EstimateIO="0" EstimateRebinds="1180" EstimateRewinds="0" EstimateRows="1" LogicalOp="Compute Scalar" NodeId="7" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="12.2037">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ScalarOperator ScalarString="[CASH].[CASH].[dbo].[DestinationLookup].[Digits] as [d].[Digits]">
<Identifier>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
</Identifier>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
<ScalarOperator ScalarString="[CASH].[CASH].[dbo].[DestinationLookup].[Destination] as [d].[Destination]">
<Identifier>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</Identifier>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="48" EstimateCPU="0.0103333" EstimateIO="0" EstimateRebinds="1180" EstimateRewinds="0" EstimateRows="1" LogicalOp="Remote Query" NodeId="8" Parallel="false" PhysicalOp="Remote Query" EstimatedTotalSubtreeCost="12.2037">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="456" ActualRewinds="0" ActualRows="456" ActualEndOfScans="0" ActualExecutions="456" />
</RunTimeInformation>
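<RemoteQuery RemoteSource="CASH" RemoteQuery="SELECT &quot;Tbl1004&quot;.&quot;Digits&quot; &quot;Col1015&quot;,&quot;Tbl1004&quot;.&quot;Destination&quot; &quot;Col1016&quot; FROM &quot;CASH&quot;.&quot;dbo&quot;.&quot;DestinationLookup&quot; &quot;Tbl1004&quot; WHERE &quot;Tbl1004&quot;.&quot;Digits&quot;=?" />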
</RelOp>
</ComputeScalar>
</RelOp>
</Spool>
</RelOp>
</NestedLoops>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
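Update: I have not been able to find an ideal solution, but I did find a second-best one. It turns out that a very simple query using "like" against just these two tables completes in about five seconds. So instead of trying to cram that join into my monster query, where it never finishes, I use it to build a temporary lookup table, which the monster query then uses. All told, the big query now completes in 9 seconds, and my varchar join supports variable-length strings.
Another thing that helped speed this up was changing the fill factor for the column in t1 from 80 to 100. That fill factor suits the table well, because it is a static reference table that changes only once a year.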
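The two-step workaround described in the update can be sketched as follows. This is a hedged illustration, again using an in-memory SQLite database from Python rather than SQL Server: the temp table name PrefixLookup, its shape, the sample rows, and the stand-in for the "monster query" are all assumptions. Only the table and column names Converting, DestinationLookup, FromNumber, ToNumber, Digits and Destination come from the execution plans above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data; table and column names follow the posted plans.
    CREATE TABLE DestinationLookup (Digits TEXT PRIMARY KEY, Destination TEXT);
    CREATE TABLE Converting (FromNumber TEXT, ToNumber TEXT);
    INSERT INTO DestinationLookup VALUES ('012345', 'Alpha'), ('0987654', 'Beta');
    INSERT INTO Converting VALUES
        ('0123450000', '111'),
        ('0123451111', '222'),
        ('0987654321', '333'),
        ('5550000000', '444');

    -- Step 1: run the simple "starts with" join on its own and keep the
    -- result in a temporary lookup table (PrefixLookup is a hypothetical name).
    CREATE TEMP TABLE PrefixLookup AS
    SELECT DISTINCT c.FromNumber, d.Digits, d.Destination
    FROM Converting c
    JOIN DestinationLookup d ON c.FromNumber LIKE d.Digits || '%';
""")

# Step 2: the larger query (a small stand-in here) joins to the temp table
# with a plain equality instead of repeating the prefix match.
rows = conn.execute("""
    SELECT c.FromNumber, c.ToNumber, p.Destination, p.Digits
    FROM Converting c
    JOIN PrefixLookup p ON p.FromNumber = c.FromNumber
    ORDER BY c.FromNumber
""").fetchall()

for row in rows:
    print(row)
```

The point of the split is that the expensive prefix match is evaluated once in isolation, and the larger query only ever sees an equality join. Whether that helps on SQL Server depends on the actual plan, so treat this as a sketch of the idea rather than a tuned solution.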
[Comments]:
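-
Functions such as LIKE are not sargable, which means an ordinary index will be ignored. I really don't think there is a good way to do what you are attempting. Why can't you just store the appropriate value in the t2 table?
-
Who told you that? Only a leading wildcard kills index usage, not LIKE itself. For example, WHERE col LIKE 'x%' will use an index on col (if one exists).
-
Thanks for the suggestion. Unfortunately, my t2 table does not hold static content. It is used to process new incoming records, and there will be millions of them every month. Putting the appropriate value into it is essentially what I am trying to do with this join.
-
@dean, thanks for showing that "like" is the fastest of the four options. I still need something faster, but I'm glad to know it anyway.
-
Then show us the actual execution plan, and thanks for the support :)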
Tags: sql sql-server join query-optimization