Answers: SQL query for selecting only first occurrences of rows with same data in the first column

【Question title】: SQL query for selecting only first occurrences of rows with same data in the first column
【Posted】: 2011-03-09 12:27:40
【Question】:

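Is there a neat SQL query that returns rows so that, among rows with the same data in the first column, only the first occurrence is returned? That is, if I have rows like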

blah something
blah somethingelse
foo blah
bar blah
foo hello

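the query should give me the first, third, and fourth rows (the first row is the first occurrence of "blah" in the first column, the third row is the first occurrence of "foo", and the fourth row is the first occurrence of "bar").

I'm using the H2 database engine, if that matters.

Update: sorry, the table definition was unclear; here is a better one. "blah", "foo", etc. denote the value of the first column in each row.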

blah [rest of columns of first row]
blah [rest of columns of second row]
foo  [-""- third row]
bar  [-""- fourth row]
foo  [-""- fifth row]

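【Comments】:

Does your table have a PK column?
When you say "first", do you mean "the first one I happen to come across", "the first alphabetically", or some other definition of "first"? :)
Adding to @Jonathon's question: what is the rule for picking blah something over blah somethingelse?
@everybody: I do have a PK column, and by "first" I mean first by the PK column (which I believe is the "happen to come across" order unless specified otherwise).
@Mark: The rule for picking the first "blah" is first by the PK column.

Tags: sql h2

【Solution 1】:

If you mean first alphabetically by column 2, here is some SQL to get those rows: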

create table #tmp (
    c1 char(20),
    c2 char(20)
)
insert #tmp values ('blah','something')
insert #tmp values ('blah','somethingelse')
insert #tmp values ('foo','ahhhh')
insert #tmp values ('foo','blah')
insert #tmp values ('bar','blah')
insert #tmp values ('foo','hello')

select c1, min(c2) c2 from #tmp
group by c1

【Discussion】:

【Solution 2】:

An analytic (window) query can do the trick.

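Select *
from (
    -- row_number() numbers the rows within each c1 group;
    -- "id" is a placeholder name for the table's PK column
    Select row_number() over (partition by c1 order by id) as myRank, t.*
    from myTable t ) ranked
where myRank = 1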

However, this is only listed as priority 2 on the H2 roadmap for version 1.3.x:

http://www.h2database.com/html/roadmap.html?highlight=RANK&search=rank#firstFound
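For engines that do support window functions, the idea can be tried out as follows. This is a minimal sketch, not the answerer's code: it uses Python's built-in sqlite3 module as a stand-in for H2 (window functions need SQLite 3.25 or newer), and the table name `my_table`, the `id` PK column, and the helper `first_rows_window` are assumptions made for illustration.

```python
import sqlite3


def first_rows_window(rows):
    """Return the first row (lowest id) per distinct c1 using row_number()."""
    conn = sqlite3.connect(":memory:")
    # id is an assumed auto-assigned primary key that records insertion order
    conn.execute("create table my_table (id integer primary key, c1 text, c2 text)")
    conn.executemany("insert into my_table (c1, c2) values (?, ?)", rows)
    result = conn.execute(
        """
        select id, c1, c2
        from (
            -- number the rows inside each c1 group, lowest id first
            select row_number() over (partition by c1 order by id) as rn, t.*
            from my_table t
        ) ranked
        where rn = 1
        order by id
        """
    ).fetchall()
    conn.close()
    return result


# The five sample rows from the question
sample = [
    ("blah", "something"),
    ("blah", "somethingelse"),
    ("foo", "blah"),
    ("bar", "blah"),
    ("foo", "hello"),
]
print(first_rows_window(sample))
```

With the question's sample data this should return the first, third, and fourth rows, which is what the asker wanted.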

【Discussion】:

【Solution 3】:

I think this does what you want, but I'm not 100% sure. (Also based on MS SQL Server.)

create table #t
(
PKCol int identity(1,1),
Col1 varchar(200)
)

Insert Into #t
Values ('blah something')
Insert Into #t
Values ('blah something else')
Insert Into #t
Values ('foo blah')
Insert Into #t
Values ('bar blah')
Insert Into #t
Values ('foo hello')


Select t.*
From #t t
Join (
     Select min(PKCol) as 'IDToSelect'
     From #t
     Group By Left(Col1, CharIndex(space(1), col1))
)q on t.PKCol = q.IDToSelect

drop table #t
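The same min(PK) join also works when the grouping value lives in its own column, as in the asker's updated table layout. The sketch below is an illustration under assumptions, not the answerer's code: it runs on SQLite through Python's sqlite3 module rather than on SQL Server or H2, and the names `t`, `id`, `c1`, `c2`, and `first_rows_by_pk` are hypothetical.

```python
import sqlite3


def first_rows_by_pk(rows):
    """Return, for each distinct c1 value, the row with the lowest id."""
    conn = sqlite3.connect(":memory:")
    # id is an assumed auto-assigned primary key that records insertion order
    conn.execute("create table t (id integer primary key, c1 text, c2 text)")
    conn.executemany("insert into t (c1, c2) values (?, ?)", rows)
    result = conn.execute(
        """
        select t.id, t.c1, t.c2
        from t
        join (
            -- one row per c1 value: the smallest primary key in that group
            select min(id) as first_id
            from t
            group by c1
        ) q on t.id = q.first_id
        order by t.id
        """
    ).fetchall()
    conn.close()
    return result


# The five sample rows from the question
sample = [
    ("blah", "something"),
    ("blah", "somethingelse"),
    ("foo", "blah"),
    ("bar", "blah"),
    ("foo", "hello"),
]
print(first_rows_by_pk(sample))
```

Because it uses only `min`, `group by`, and a join, this form does not depend on window-function support.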

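【Discussion】:

【Solution 4】:

If you are interested in the fastest query: having an index on the table's first column matters, because the query processor can then scan the values from that index. The fastest solution is then probably an "outer" query that gets the distinct c1 values, plus an "inner" (nested) query that gets one of the possible values of the second column: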

drop table if exists test;
create table test(c1 char(20), c2 char(20));
create index idx_c1 on test(c1);

-- insert some data (H2 specific)
insert into test select 'bl' || (x/1000), x from system_range(1, 100000); 

-- the fastest query (64 ms)
select c1, (select i.c2 from test i where i.c1=o.c1 limit 1) from test o group by c1;

-- the shortest query (385 ms)
select c1, min(c2) c2 from test group by c1;
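Note that `limit 1` without an `order by` returns whichever matching row the engine finds first, and `min(c2)` returns the alphabetically smallest value, so neither is guaranteed to be the first row by PK. The sketch below shows one way to make the nested query deterministic by ordering on the key. It is an assumption-laden illustration: it runs on SQLite via Python's sqlite3 module instead of H2, adds a hypothetical `id` PK column that the answer's `test` table does not have, and reorders the sample rows so the two queries disagree.

```python
import sqlite3


def compare_queries(rows):
    """Return (first-by-PK result, min(c2) result) for the same data."""
    conn = sqlite3.connect(":memory:")
    # id is an assumed primary key; the answer's table has only c1 and c2
    conn.execute("create table test (id integer primary key, c1 text, c2 text)")
    conn.execute("create index idx_c1 on test (c1)")
    conn.executemany("insert into test (c1, c2) values (?, ?)", rows)
    first_by_pk = conn.execute(
        """
        select o.c1,
               -- ordering by id makes "limit 1" pick the first inserted row
               (select i.c2 from test i
                where i.c1 = o.c1
                order by i.id
                limit 1) as c2
        from test o
        group by o.c1
        order by o.c1
        """
    ).fetchall()
    smallest = conn.execute(
        "select c1, min(c2) as c2 from test group by c1 order by c1"
    ).fetchall()
    conn.close()
    return first_by_pk, smallest


# Illustrative data: 'foo hello' is inserted before 'foo blah'
rows = [
    ("blah", "something"),
    ("blah", "somethingelse"),
    ("foo", "hello"),
    ("bar", "blah"),
    ("foo", "blah"),
]
first_by_pk, smallest = compare_queries(rows)
print(first_by_pk)
print(smallest)
```

For `foo`, the first query should report `hello` (the earliest row) while the `min(c2)` query reports `blah`, which is why the choice between them depends on what "first" means for your data.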

【Discussion】: