SqlServer Random Data Generation Observation: Answers

【Question title】: SqlServer Random Data Generation Observation
【Posted】: 2011-05-27 17:28:09
【Question description】:

I have a question about why the output of these two queries differs. I expected them to work the same way.

Query 1:

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue, obfuscateValue)
select [firstname], 
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
from Customer
where [firstname] is not null

select * from @cache;

Query 2:

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

declare @table1 table(c char(1));
declare @i1 int;
set @i1 = ASCII('0');

while @i1 <= ASCII('9')
begin
    insert into @table1 (c)
    select (CHAR(@i1))    

    set @i1 = @i1 +1;
end


insert into @cache (originalValue)
select [firstname]
from Customer
where [firstname] is not null

update c
set c.obfuscateValue = t.Value
from @cache c
join 
(
    select originalValue,
    (       
        (select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID()) 
    ) as Value
    from @cache
) t on t.originalValue = c.originalValue

select * from @cache;

They should do the same thing, but the first query returns the following result:

Jonathon    73
Everett 73
Janet   73
Andy    73
Shauna  73

And the second:

Jonathon    82
Everett 40
Janet   68
Andy    79
Shauna  29

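As you can see, the second column has a different value on each row in the second result, while in the first result it has the same value on every row.

It looks like in the first query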

(select top 1 c from @table1 order by NEWID()) + 
        (select top 1 c from @table1 order by NEWID())

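is evaluated only once.

Can anyone explain this mystery?

【Question discussion】:

This might explain some things: stackoverflow.com/questions/1468159/…
Are you just trying to generate a number between 00 and 99? Or something else?
No, this is just an example. It should work for any strategy, e.g. 000-999999.

Tags: sql-server tsql sql-server-2008

【Solution 1】:

I think the random values can be generated in another way.

Here is how to generate a string matching [a-zA-Z]{3,6}: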

declare @min int, @max int;
declare @alpha varchar(max)

set @min = 3;
set @max = 6;
set @alpha = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

declare @cache table(originalValue nvarchar(255), obfuscateValue nvarchar(255));

insert into @cache (originalValue, obfuscateValue)
select [firstname], LEFT(t.Value, case when t.maxLen < @min then @min else t.maxLen end)
from Customer 
join
(
    select ABS(CHECKSUM(NEWID()))%@max + 1 as maxLen,
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) +
            SUBSTRING(@alpha, ABS(CHECKSUM(NEWID()))%LEN(@alpha) + 1, 1) as Value
)t on t.Value is not null
where [firstname] is not null

select * from @cache;

【Discussion】:

【Solution 2】:

One line?

SELECT
     RIGHT( --number of zeros to match expected max length. Or use REPLICATE.
        '000000' + CAST(
          --The 2 newid() expression means we'll get a larger number
          --less chance of using leading static zeroes
          CAST(CHECKSUM(NEWID()) as bigint) * CAST(CHECKSUM(NEWID()) as bigint)
            as varchar(30))
        --The 3 gives us the desired mask. Currently 3 digits.
        , 3)
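
The one-liner above produces a single value. A minimal sketch of applying the same idea once per row follows, assuming a table variable `@customer` as a hypothetical stand-in for the `Customer` table from the question. Because `NEWID()` appears directly in the select list here, rather than inside an uncorrelated subquery, it is evaluated for each row.

```sql
-- Hypothetical stand-in for the Customer table from the question.
declare @customer table (firstname nvarchar(255));

insert into @customer (firstname)
values (N'Jonathon'), (N'Everett'), (N'Janet'), (N'Andy'), (N'Shauna');

-- Taking the modulo before ABS keeps the value in range even if
-- CHECKSUM returns the smallest possible int.
select firstname,
       RIGHT('000' + CAST(ABS(CHECKSUM(NEWID()) % 1000) as varchar(10)), 3) as obfuscateValue
from @customer
where firstname is not null;
```

The mask width (3 here) and the modulus (1000) are illustrative and would need to change together for a different number of digits.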

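【Discussion】:

Yes, but I think my solution works better for a set of known elements.
@denis_n: It's up to you, but that is different from what you asked in the question...

【Solution 3】: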

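Your assumption is correct: the first query runs the "select top" subqueries only once. This behaviour comes from the way the optimizer chose to optimize the query. Because the subqueries (the select top queries) are self-contained and not correlated with the outer select, it uses a Table Spool (Lazy Spool) operator in the execution plan. This causes the select top value to be stored in tempdb for reuse.

Because the optimizer chose a Nested Loops operator to bring all the data together, no rebinding is needed, so the spooled value is reused instead of the subqueries being re-executed for each outer input row.

In the second query the optimizer chose not to use a Table Spool operator (I believe this is because the input table comes from tempdb). As a result, the select top subqueries are re-applied for each input row from the temporary table.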
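
One way to check this on your own server is to look at the actual plan. The sketch below uses `SET STATISTICS PROFILE ON`, which returns the plan operators as an extra result set after each statement. The table variable `@customer` is a hypothetical stand-in for `Customer`, and whether a Table Spool appears may depend on the SQL Server version and the data.

```sql
-- Return the executed plan as an extra result set for each statement.
set statistics profile on;

declare @table1 table (c char(1));
insert into @table1 (c)
values ('0'), ('1'), ('2'), ('3'), ('4'), ('5'), ('6'), ('7'), ('8'), ('9');

-- Hypothetical stand-in for the Customer table from the question.
declare @customer table (firstname nvarchar(255));
insert into @customer (firstname)
values (N'Jonathon'), (N'Everett'), (N'Janet'), (N'Andy'), (N'Shauna');

-- Same shape as the select in query 1: two uncorrelated subqueries.
select firstname,
       (select top 1 c from @table1 order by NEWID()) +
       (select top 1 c from @table1 order by NEWID()) as obfuscateValue
from @customer
where firstname is not null;

set statistics profile off;
```

In the plan output, the StmtText column lists the operators, so a spool would show up there if the optimizer chose one.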

If necessary, you may be able to use a table/query hint to force the execution plan to behave the way you want.
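
The answer does not name a specific hint. As an alternative that avoids hints, a commonly suggested workaround is to make each subquery reference the outer row, so that it is no longer self-contained. The sketch below assumes a hypothetical table variable `@customer` in place of `Customer`, and the correlation through `CHECKSUM(NEWID(), cu.firstname)` is an illustrative choice, not something taken from the answers. The optimizer is still free to pick its own plan, so the result should be verified against the actual plan.

```sql
declare @table1 table (c char(1));
insert into @table1 (c)
values ('0'), ('1'), ('2'), ('3'), ('4'), ('5'), ('6'), ('7'), ('8'), ('9');

-- Hypothetical stand-in for the Customer table from the question.
declare @customer table (firstname nvarchar(255));
insert into @customer (firstname)
values (N'Jonathon'), (N'Everett'), (N'Janet'), (N'Andy'), (N'Shauna');

-- Referencing cu.firstname inside each subquery correlates it with the outer row.
select cu.firstname,
       (select top 1 t.c from @table1 t order by CHECKSUM(NEWID(), cu.firstname)) +
       (select top 1 t.c from @table1 t order by CHECKSUM(NEWID(), cu.firstname)) as obfuscateValue
from @customer cu
where cu.firstname is not null;
```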

【Discussion】:

I have never used table/query hints. Could you provide an example?