Check if all keywords matched at least one column in SQL (answers)

【Question title】: Check if all keywords matched at least one column in SQL
【Posted】: 2017-05-17 23:21:08
【Question description】:

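I am trying to write a SQL query that checks whether all keywords in a search string are present across multiple columns. A keyword may contain wildcards such as '%' to stand for any string.

For example: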

first_name  | last_name | age   | height    | mother's name
-------------------------------------------------------
 mary       | jones     | 19    | 170       | sally jane     
 john       | doe       | 43    | 165       | sarah connor
 john       | connor    | 17    | 173       | sarah connor
 joe        | bloe      | 32    | 173       | sarah connor
 john       | connor    | 32    | 165       | sarah connor

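If I search for "jo% %connor%", I need to find every row where at least one column matches 'jo%' and at least one column matches '%connor%'. In other words, every keyword must match at least one column.

I cannot use full-text search on the table. I also don't think I can simply concatenate all the columns and check whether the result contains every word, because a wildcard in a search term may require that a word start with "jo".

Is there a good way to do this kind of search in SQL Server 2012 without changing table properties and the like?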

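【Question comments】:

What happens if first_name is 'mary' and last_name is 'joconnor'? Is that a match? It satisfies both 'jo%' and '%connor%', but within the same column.
No. Spaces separate the words.
Does it have to be a wildcard search? If not, you could use an OR, provided you have a limited number of columns.
@searchString like col1+col2+col3+Col4+Col5 OR @searchString like Col2+Col1+Col3+Col4+Col5 OR @searchString like Col3+Col1+Col2+Col4+Col5 OR ... each time moving just one column to the beginning
But how would I know that all the keywords were found? And yes, it has to support wildcard searches.

Tags: sql sql-server sql-server-2012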

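【Solution 1】:

Here is an option where you don't have to spell out every field to search. It returns only the records that hit on ALL keywords, while respecting each individual search pattern.

Now, I used my Parse Function, but this could easily be converted to an inline query.

Example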

Declare @YourTable Table ([first_name] varchar(50),[last_name] varchar(50),[age] int,[height] int,[mother_name] varchar(50))
Insert Into @YourTable Values
 ('mary','jones',19,170,'sally jane')
,('john','doe',43,165,'sarah connor')
,('john','connor',17,173,'sarah connor')
,('joe','bloe',32,173,'sarah connor')
,('john','connor',32,165,'sarah connor')


Declare @Search varchar(max) = 'jo% %connor%'

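-- Split the search string into one row per pattern; MaxHit = number of patterns
;with cte as (
                Select *,MaxHit=max(RetSeq) over () From [dbo].[udf-Str-Parse](@Search,' ')
             )
Select A.* 
 From @YourTable A
 -- Serialize the current row to XML so every column can be read without naming it
 Cross Apply ( Select XMLData=convert(xml,(Select A.* For XML RAW))) B
 Cross Apply (
                Select Hits=count(*)
                  From (
                        -- One row per column value (XML attribute) of the current record
                        Select Value  = attr.value('.','varchar(max)') 
                         From  B.XMLData.nodes('/row') as A(r)
                         Cross Apply A.r.nodes('./@*') AS B(attr)
                        ) C1
                 Join cte C2 on patindex(C2.RetVal,Value)>0
                 -- Keep the record only if every pattern matched at least one value
                 Having count(Distinct C2.RetSeq)>=max(C2.MaxHit)
             ) C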

Returns

first_name  last_name   age height  mother_name
john        doe         43  165     sarah connor
john        connor      17  173     sarah connor
joe         bloe        32  173     sarah connor
john        connor      32  165     sarah connor

The Parse Function, if interested

CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    Select RetSeq = Row_Number() over (Order By (Select null))
          ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
    From  (Select x = Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A 
    Cross Apply x.nodes('x') AS B(i)
);
--Thanks Shnugo for making this XML safe
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--Select * from [dbo].[udf-Str-Parse]('this,is,<test>,for,< & >',',')

EDIT - Option without XML (more performant than the XML version)

Declare @Search varchar(max) = 'jo% %connor%'

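-- Split the search string into one row per pattern; MaxHit = number of patterns
;with cte as (
                Select *,MaxHit=max(RetSeq) over () From [dbo].[udf-Str-Parse](@Search,' ')
             )
Select A.*,C.*
 From #Temp A   -- #Temp: a temp table with the same columns as @YourTable above
 Cross Apply (
                Select Hits=count(Distinct C2.RetSeq)
                  -- Unpivot the current row by hand: one Value per column
                  From ( values (A.[first_name])
                               ,(A.[last_name])
                               ,(concat('',A.[age]))      -- numbers converted to strings
                               ,(concat('',A.[height]))
                               ,(A.[mother_name])
                        ) C1 (Value)
                 Join cte C2 on patindex(C2.RetVal,Value)>0
                 -- Keep the record only if every pattern matched at least one value
                 Having count(Distinct C2.RetSeq)>=max(C2.MaxHit)
             ) C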

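Note: I set the delimiter back to [SPACE], but this rules out searches like '%sarah connor%'. Personally, I would prefer a delimiter such as a PIPE, but that is your choice. You can also search dates and/or numbers.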
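
As a rough sketch of the "inline" alternative mentioned in this answer, the patterns can be listed directly instead of being parsed from a delimited string. This sidesteps the delimiter question, so a pattern such as '%sarah connor%' may contain a space. The table variable `@Patterns` and its column `Pattern` are hypothetical names introduced here for illustration. The query also swaps the count/Having test for a double NOT EXISTS, which is a different formulation from the one in the answer, not the author's method.

```sql
-- Sample data copied from the example in this answer
Declare @YourTable Table ([first_name] varchar(50),[last_name] varchar(50),[age] int,[height] int,[mother_name] varchar(50))
Insert Into @YourTable Values
 ('mary','jones',19,170,'sally jane')
,('john','doe',43,165,'sarah connor')
,('john','connor',17,173,'sarah connor')
,('joe','bloe',32,173,'sarah connor')
,('john','connor',32,165,'sarah connor')

-- Hypothetical pattern list: one row per keyword, spaces allowed inside a pattern
Declare @Patterns Table (Pattern varchar(100) Not Null Primary Key)
Insert Into @Patterns Values ('jo%'),('%sarah connor%')

-- Keep a record when there is no pattern that fails to match every column
Select A.*
 From @YourTable A
 Where Not Exists (
         Select 1
           From @Patterns P
          Where Not Exists (
                  Select 1
                    From ( values (A.[first_name])
                                 ,(A.[last_name])
                                 ,(concat('',A.[age]))
                                 ,(concat('',A.[height]))
                                 ,(A.[mother_name])
                         ) V (Value)
                   Where V.Value Like P.Pattern
                )
       )
```

With the sample data, the 'mary jones' row should be filtered out, because 'jones' satisfies 'jo%' but no column satisfies '%sarah connor%'. The comparison is subject to the collation in effect, so case sensitivity may differ between servers.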

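【Comments】:

I think this works, but it is slow. It takes 20 seconds to get through 6,500 records. If nobody gives me a better query, I will accept this answer.
I will have to step through the query to understand exactly what is going on, since it is more complex than I am used to. But is the conversion to XML necessary? I believe that may be what slows it down.
@Asagohan The XML portion is certainly not optimized. It just dynamically unpivots your data (without having to specify every field name, and avoiding data type conflicts). An UNPIVOT would certainly be more performant. That said, I can't imagine where you are getting 20 seconds. Just for fun, I expanded your data to 20,000 rows and got results in 1.8 seconds (on my laptop).
@Asagohan The EDIT option returns results in 0.187 seconds.
@Asagohan You may also notice that I use concat() rather than cast() or convert() for the numbers. In my experience this is quicker, but I have not seen any definitive benchmarks.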
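
Apart from the speed claim, which the thread does not back with benchmarks, there is one behavioral difference between the two conversions that is worth knowing: concat() treats NULL as an empty string, while cast() passes NULL through. A minimal sketch, using a hypothetical variable `@n`:

```sql
-- @n is a hypothetical stand-in for a nullable numeric column such as [age]
Declare @n int = NULL

Select ViaConcat = concat('',@n)             -- empty string, never NULL
      ,ViaCast   = cast(@n as varchar(12))   -- NULL
```

In the unpivot above, either result leaves a NULL column unmatched by a pattern like 'jo%', so the choice mostly matters if the converted value is concatenated with other text afterwards.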

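【Solution 2】:

This is not trivial, but you can split the search string (assuming the patterns contain no spaces). Then, assuming you have a unique id for each row: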

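-- dbo.split is a user-defined string splitter (not shown) returning one row per pattern
with s(pattern) as (
      select *
      from dbo.split(@str, ' ')
     )
select t.*
from t cross apply
     -- cnt = number of patterns found somewhere in the '|'-joined row text
     (select count(*) as cnt
      from s
      where (first_name + '|' + last_name + '|' + cast(age as varchar(255)) + '|' +
             cast(height as varchar(255)) + '|' + mothername
            ) like concat('%', s.pattern, '%')
     ) s
-- keep the row only when every pattern was found
where s.cnt = (select count(*) from s);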
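
To make the inner WHERE clause easier to read: it joins the columns of a row into one string and tests each pattern against that string, with '%' added on both sides. The sketch below uses made-up values ('major', 'smith') to show what that wrapping implies. A pattern such as 'jo%' becomes a "contains" test over the whole row, so it no longer requires a column to start with "jo", which is the concern raised in the question.

```sql
Declare @Pattern varchar(50) = 'jo%'

Select WrappedPattern = concat('%', @Pattern, '%')   -- '%jo%%'
       -- 1: 'jo' occurs somewhere in the joined row text
      ,RowTextMatch   = Case When 'major|smith' Like concat('%', @Pattern, '%') Then 1 Else 0 End
       -- 0: no single column starts with 'jo'
      ,ColumnMatch    = Case When 'major' Like @Pattern Or 'smith' Like @Pattern Then 1 Else 0 End
```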

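【Comments】:

I am not sure what the "where" part is doing. Is it supposed to concatenate first_name and the other columns? It does not work in SQL Server in its current form.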