【Question title】: How to use unique column values as input into another select statement
【Posted】: 2012-06-01 12:27:38
【Question】:

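I have a table (MySQL) that has a column called binID. The values in this column range from 1 to 70.

What I want to do is select the unique values of that column (which should be the numbers 1 to 70), then iterate over them, using each one (let's call it theBinID) as a parameter to another SELECT statement, such as:

SELECT * FROM MyTable WHERE binID = theBinID ORDER BY createdDate DESC LIMIT 10

Basically, I want to get the 10 most recent rows for each binID.

I don't believe there is a way to do this with a basic SQL statement, although I'd love that to be the answer, so I wrote a stored procedure that runs a SELECT DISTINCT on the binID column, then iterates over the result and populates a temporary table.

My problem is that this is meant as an optimization. If I fetch all 100K rows, I get an average time of 1.7 seconds. Executing my stored procedure to fetch 700 rows (10 records for each of 70 bins) takes 1.4 seconds. I realize that 0.3 seconds could be considered a fair improvement, but I was hoping to get this to sub-second on 100K rows.

Is there a better way?

The full stored procedure is this: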

BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE binID INT;
DECLARE cur1 CURSOR FOR SELECT DISTINCT heatmapBinID from MEStressTest ORDER BY heatmapBinID ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

DROP TEMPORARY TABLE IF EXISTS TempResults;

CREATE TEMPORARY TABLE TempResults (
    `recordID` text NOT NULL,
    `queryTerm` text NOT NULL,
    `recordCreated` double(11,0) NOT NULL,
    `recordByID` text NOT NULL,
    `recordByName` text NOT NULL,
    `recordText` text NOT NULL,
    `recordSource` text NOT NULL,
    `rerecordCount` int(11) NOT NULL DEFAULT '0',
    `timecodeOffset` int(11) NOT NULL DEFAULT '-1',
    `recordByImageURL` text NOT NULL,
    `canDelete` int(11) NOT NULL DEFAULT '1',
    `heatmapBinID` int(11) DEFAULT NULL,
    `timelineBinID` int(11) DEFAULT NULL,
    PRIMARY KEY (`recordID`(20))
);

OPEN cur1;

read_loop: LOOP
    FETCH cur1 INTO binID;

    IF done THEN
        LEAVE read_loop;
    END IF;

    INSERT INTO TempResults (recordID, queryTerm, recordCreated, recordByID, recordByName, recordText, recordSource, rerecordCount, timecodeOffset, recordByImageURL, canDelete, heatmapBinID, timelineBinID)
    SELECT * FROM MEStressTest WHERE heatmapBinID = binID ORDER BY recordCreated DESC LIMIT numRecordsPerBin;
END LOOP;

CLOSE cur1;

SELECT * FROM TempResults ORDER BY heatmapBinID ASC, recordCreated DESC;

END

【Comments】:

    Tags: mysql stored-procedures query-optimization


    【Solution 1】:

    Try simulating ROW_NUMBER OVER PARTITION in MySQL: http://www.sqlfiddle.com/#!2/fd8b5/4

    Given this data:

    create table sentai(
      band varchar(50),
      member_name varchar(50),
      member_year int not null
    );
    
    insert into sentai(band, member_name, member_year) values
    ('BEATLES','JOHN',1960),
    ('BEATLES','PAUL',1961),
    ('BEATLES','GEORGE',1962),
    ('BEATLES','RINGO',1963),
    ('VOLTES V','STEVE',1970),
    ('VOLTES V','MARK',1971),
    ('VOLTES V','BIG BERT',1972),
    ('VOLTES V','LITTLE JOHN',1973),
    ('VOLTES V','JAMIE',1964),
    ('ERASERHEADS','ELY',1990),
    ('ERASERHEADS','RAYMUND',1991),
    ('ERASERHEADS','BUDDY',1992),
    ('ERASERHEADS','MARCUS',1993);
    

    Objective: find the three most recent members of each band.

    First, we put a row number on each member, ordered by year with the most recent first (descending):

    select *,
      @rn := @rn + 1 as rn
    from (sentai s, (select @rn := 0) as vars)
    order by s.band, s.member_year desc;
    

    Output:

    |        BAND | MEMBER_NAME | MEMBER_YEAR | @RN := 0 | RN |
    |-------------|-------------|-------------|----------|----|
    |     BEATLES |       RINGO |        1963 |        0 |  1 |
    |     BEATLES |      GEORGE |        1962 |        0 |  2 |
    |     BEATLES |        PAUL |        1961 |        0 |  3 |
    |     BEATLES |        JOHN |        1960 |        0 |  4 |
    | ERASERHEADS |      MARCUS |        1993 |        0 |  5 |
    | ERASERHEADS |       BUDDY |        1992 |        0 |  6 |
    | ERASERHEADS |     RAYMUND |        1991 |        0 |  7 |
    | ERASERHEADS |         ELY |        1990 |        0 |  8 |
    |    VOLTES V | LITTLE JOHN |        1973 |        0 |  9 |
    |    VOLTES V |    BIG BERT |        1972 |        0 | 10 |
    |    VOLTES V |        MARK |        1971 |        0 | 11 |
    |    VOLTES V |       STEVE |        1970 |        0 | 12 |
    |    VOLTES V |       JAMIE |        1964 |        0 | 13 |
    

    Then we reset the row number whenever a member belongs to a different band than the previous row:

    select *,
      @rn := IF(@pg = s.band, @rn + 1, 1) as rn,
      @pg := s.band
    from (sentai s, (select @pg := null, @rn := 0) as vars)
    order by s.band, s.member_year desc;
    

    Output:

    |        BAND | MEMBER_NAME | MEMBER_YEAR | @PG := NULL | @RN := 0 | RN | @PG := S.BAND |
    |-------------|-------------|-------------|-------------|----------|----|---------------|
    |     BEATLES |       RINGO |        1963 |      (null) |        0 |  1 |       BEATLES |
    |     BEATLES |      GEORGE |        1962 |      (null) |        0 |  2 |       BEATLES |
    |     BEATLES |        PAUL |        1961 |      (null) |        0 |  3 |       BEATLES |
    |     BEATLES |        JOHN |        1960 |      (null) |        0 |  4 |       BEATLES |
    | ERASERHEADS |      MARCUS |        1993 |      (null) |        0 |  1 |   ERASERHEADS |
    | ERASERHEADS |       BUDDY |        1992 |      (null) |        0 |  2 |   ERASERHEADS |
    | ERASERHEADS |     RAYMUND |        1991 |      (null) |        0 |  3 |   ERASERHEADS |
    | ERASERHEADS |         ELY |        1990 |      (null) |        0 |  4 |   ERASERHEADS |
    |    VOLTES V | LITTLE JOHN |        1973 |      (null) |        0 |  1 |      VOLTES V |
    |    VOLTES V |    BIG BERT |        1972 |      (null) |        0 |  2 |      VOLTES V |
    |    VOLTES V |        MARK |        1971 |      (null) |        0 |  3 |      VOLTES V |
    |    VOLTES V |       STEVE |        1970 |      (null) |        0 |  4 |      VOLTES V |
    |    VOLTES V |       JAMIE |        1964 |      (null) |        0 |  5 |      VOLTES V |
    

    Then we select only the three most recent members of each band:

    select x.band, x.member_name, x.member_year
    from
    (
      select *,
        @rn := IF(@pg = s.band, @rn + 1, 1) as rn,
        @pg := s.band
      from (sentai s, (select @pg := null, @rn := 0) as vars)
      order by s.band, s.member_year desc
    ) as x
    where x.rn <= 3
    order by x.band, x.member_year desc;
    

    Output:

    |        BAND | MEMBER_NAME | MEMBER_YEAR |
    |-------------|-------------|-------------|
    |     BEATLES |       RINGO |        1963 |
    |     BEATLES |      GEORGE |        1962 |
    |     BEATLES |        PAUL |        1961 |
    | ERASERHEADS |      MARCUS |        1993 |
    | ERASERHEADS |       BUDDY |        1992 |
    | ERASERHEADS |     RAYMUND |        1991 |
    |    VOLTES V | LITTLE JOHN |        1973 |
    |    VOLTES V |    BIG BERT |        1972 |
    |    VOLTES V |        MARK |        1971 |
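
    For readers who find the variable trick hard to follow, here is a minimal sketch of the same logic in plain Python rather than SQL. It is only an illustration: the function name `top_n_per_band` and the tuple layout are made up for this example, while the data is the `sentai` sample from above. `previous_band` plays the role of `@pg` and `rn` plays the role of `@rn`.

```python
# Illustrative re-implementation of the @rn / @pg logic; not part of the
# original answer. Rows are (band, member_name, member_year) tuples.
sentai = [
    ("BEATLES", "JOHN", 1960),
    ("BEATLES", "PAUL", 1961),
    ("BEATLES", "GEORGE", 1962),
    ("BEATLES", "RINGO", 1963),
    ("VOLTES V", "STEVE", 1970),
    ("VOLTES V", "MARK", 1971),
    ("VOLTES V", "BIG BERT", 1972),
    ("VOLTES V", "LITTLE JOHN", 1973),
    ("VOLTES V", "JAMIE", 1964),
    ("ERASERHEADS", "ELY", 1990),
    ("ERASERHEADS", "RAYMUND", 1991),
    ("ERASERHEADS", "BUDDY", 1992),
    ("ERASERHEADS", "MARCUS", 1993),
]


def top_n_per_band(rows, n):
    """Return the n most recent members of each band."""
    # Same ordering as: ORDER BY band, member_year DESC
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    previous_band = None  # plays the role of @pg
    rn = 0                # plays the role of @rn
    result = []
    for band, name, year in ordered:
        # @rn := IF(@pg = band, @rn + 1, 1)
        rn = rn + 1 if band == previous_band else 1
        # @pg := band
        previous_band = band
        # WHERE rn <= n
        if rn <= n:
            result.append((band, name, year))
    return result


for row in top_n_per_band(sentai, 3):
    print(row)
```

    The loop only works because the rows are visited in sorted order. The SQL version relies on the same thing: the variables must be evaluated row by row in the ORDER BY order.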
    

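    Although windowing functions (e.g. ROW_NUMBER OVER PARTITION) were not yet available on MySQL when this was written (they arrived later, in MySQL 8.0), you can simulate them with variables. Please let us know whether this is faster than the cursor approach.


    How it looks on an RDBMS that supports windowing: http://www.sqlfiddle.com/#!1/fd8b5/6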

    with member_recentness as
    (
      select row_number() over each_band as recent, *
      from sentai
      window each_band as (partition by band order by member_year desc)
    )
    select * 
    from member_recentness
    where recent <= 3;
    

    Output:

    | RECENT |        BAND | MEMBER_NAME | MEMBER_YEAR |
    |--------|-------------|-------------|-------------|
    |      1 |     BEATLES |       RINGO |        1963 |
    |      2 |     BEATLES |      GEORGE |        1962 |
    |      3 |     BEATLES |        PAUL |        1961 |
    |      1 | ERASERHEADS |      MARCUS |        1993 |
    |      2 | ERASERHEADS |       BUDDY |        1992 |
    |      3 | ERASERHEADS |     RAYMUND |        1991 |
    |      1 |    VOLTES V | LITTLE JOHN |        1973 |
    |      2 |    VOLTES V |    BIG BERT |        1972 |
    |      3 |    VOLTES V |        MARK |        1971 |
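
    To try the windowed form against a table shaped like the one in the question, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. Several things here are assumptions for illustration only: the sample rows are made up, only three of the question's columns are kept (`recordID`, `heatmapBinID`, `recordCreated`), the helper name `latest_per_bin` is hypothetical, and the bundled SQLite must be 3.25 or newer for window functions. MySQL 8.0 and later accept the same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` syntax.

```python
import sqlite3

# In-memory stand-in for the question's table, trimmed to three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MEStressTest ("
    " recordID TEXT PRIMARY KEY,"
    " heatmapBinID INTEGER,"
    " recordCreated INTEGER)"
)

# Made-up sample data: 3 bins with 5 records each.
rows = [
    (f"r{bin_id}-{i}", bin_id, 1000 * bin_id + i)
    for bin_id in (1, 2, 3)
    for i in range(5)
]
conn.executemany("INSERT INTO MEStressTest VALUES (?, ?, ?)", rows)


def latest_per_bin(connection, per_bin):
    """Return the per_bin most recent records of every bin."""
    query = """
        SELECT recordID, heatmapBinID, recordCreated
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY heatmapBinID
                       ORDER BY recordCreated DESC
                   ) AS rn
            FROM MEStressTest AS t
        ) AS numbered
        WHERE rn <= ?
        ORDER BY heatmapBinID ASC, recordCreated DESC
    """
    return connection.execute(query, (per_bin,)).fetchall()


for row in latest_per_bin(conn, 2):
    print(row)
```

    Whether this beats the cursor-based procedure on 100K rows would have to be measured on the real table.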
    

    【Comments】:

      【Solution 2】:

      If you inner join 2 tables without any join key, the result is the Cartesian product of the 2 tables, i.e.:

      SELECT * 
      FROM MyTable t 
          INNER JOIN (SELECT DISTINCT binId FROM MyTable) AS u 
      WHERE 
          t.binID = theBinID 
      ORDER BY t.createdDate DESC LIMIT 10
      

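      You can refer to this

      【Comments】:

      • Cheers, Raymomd. Not sure I understand this (where does theBinID come from?), but it seems to return the first 10 rows of the table rather than 10 per bin, just like the answer below.
      【Solution 3】: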
      SELECT * FROM MyTable WHERE binID IN (SELECT DISTINCT(bin_id) FROM mysql_table) ORDER BY createdDate DESC LIMIT 10;
      

      This is not tested, so never mind the syntax.

      Add indexes to improve performance.

      【Comments】:

      • Thanks for the reply, but this is the same as SELECT * FROM MyTable ORDER BY createdDate DESC LIMIT 10, which only gives me the first 10 rows in the table, not the first 10 rows in each bin.
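
      The point made in the comment above can be checked with a small experiment. The sketch below uses Python's `sqlite3` module and made-up data (3 bins, 4 rows each). The table and column names follow the question, and everything else is an assumption for illustration. It compares the `IN (SELECT DISTINCT ...)` query with a plain `ORDER BY ... LIMIT`, then shows one way to get a per-bin limit without window functions: a correlated subquery that counts newer rows in the same bin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    " id INTEGER PRIMARY KEY, binID INTEGER, createdDate INTEGER)"
)
# Made-up data: 3 bins, 4 rows each; larger createdDate means newer.
conn.executemany(
    "INSERT INTO MyTable (binID, createdDate) VALUES (?, ?)",
    [(bin_id, 10 * i + bin_id) for i in range(4) for bin_id in (1, 2, 3)],
)

PER_BIN = 2

# Solution 3's shape: the IN clause matches every row, so LIMIT applies
# to the whole table rather than to each bin.
with_in = conn.execute(
    "SELECT binID, createdDate FROM MyTable"
    " WHERE binID IN (SELECT DISTINCT binID FROM MyTable)"
    " ORDER BY createdDate DESC LIMIT ?",
    (PER_BIN,),
).fetchall()

plain = conn.execute(
    "SELECT binID, createdDate FROM MyTable"
    " ORDER BY createdDate DESC LIMIT ?",
    (PER_BIN,),
).fetchall()

# Per-bin limit: keep a row only if fewer than PER_BIN rows in the same
# bin are newer. Rows tied on createdDate can push a bin past PER_BIN.
per_bin = conn.execute(
    "SELECT t.binID, t.createdDate FROM MyTable AS t"
    " WHERE (SELECT COUNT(*) FROM MyTable AS newer"
    "        WHERE newer.binID = t.binID"
    "          AND newer.createdDate > t.createdDate) < ?"
    " ORDER BY t.binID ASC, t.createdDate DESC",
    (PER_BIN,),
).fetchall()

print("IN + LIMIT:", with_in)
print("plain LIMIT:", plain)
print("per bin:", per_bin)
```

      The correlated subquery is here only to make the difference visible. It rescans each bin for every row, so it is not offered as a faster alternative to the approaches above.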