【Question title】: MySQL: random, unique percentage split of data from a single-column table (no primary key)
【Posted】: 2020-08-02 17:34:45
【Question】:

I need to take the data from a table (100% of all rows) and split it into 3 columns.

Example: we have this data:

numbers
80174
91467
1105
12040
62224
46508
33149
61384
10811
84923

We need to get:

      | Random 60% of all | Random  40% of all
      | unique and not    | unique and not 
  All | contained in 40%  | contained in 60% 
      | of the column     | of the column
----------------------------------------------
80174 |      84923        |      33149
91467 |      91467        |      61384
1105  |       1105        |      10811
12040 |      62224        |      80174
62224 |      12040        |     
46508 |      46508        |     
33149 |                   |
61384 |                   |
10811 |                   |
84923 |                   |

【Comments】:

  • Do they need to be in a single result set?
  • Which version of MySQL are you using?

Tags: mysql sql random


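【Solution 1】:

This uses ORDER BY RAND() to randomize the rows. It uses the MySQL 8 window function ROW_NUMBER to first split the numbers 60/40 and then join the two groups back together.

You can emulate the ROW_NUMBER function in MySQL 5.x, but it isn't pretty.

Edit: following forpas's suggestion, the required number of rows is now calculated. Edit 2: after another comment from forpas, I replaced CEIL with ROUND.

I think there must be a cleaner solution that makes more use of MODULO.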

CREATE TABLE Table1
    (`numbers` int)
;
    
INSERT INTO Table1
    (`numbers`)
VALUES
    (80174),
    (91467),
    (1105),
    (12040),
    (62224),
    (46508),
    (33149),
    (61384),
    (10811),
    (84923),
    (80179),
    (91469),
    (1109),
    (12049),
    (62229)    
;
-- Number all rows in random order.
WITH rand_num AS (
    SELECT `numbers`, ROW_NUMBER() OVER (ORDER BY RAND()) AS rn
    FROM Table1
),
-- Size of the 60% group, rounded to a whole number of rows.
limitscal AS (SELECT ROUND((COUNT(*) * 6 / 10), 0) si_x FROM Table1),
countcal AS (SELECT COUNT(*) cnt FROM Table1),
-- rn MOD cnt takes each value 0 .. cnt-1 exactly once, so exactly si_x rows pass this filter.
60_num AS (
    SELECT `numbers`, ROW_NUMBER() OVER (ORDER BY RAND()) AS rn2
    FROM rand_num CROSS JOIN limitscal CROSS JOIN countcal
    WHERE rn MOD countcal.cnt < limitscal.si_x
),
-- All remaining rows form the 40% group.
40_num AS (
    SELECT `numbers`, ROW_NUMBER() OVER (ORDER BY RAND()) AS rn2
    FROM rand_num CROSS JOIN limitscal CROSS JOIN countcal
    WHERE rn MOD countcal.cnt >= limitscal.si_x
)
-- Put the two groups side by side; the shorter 40% column is padded with NULLs.
SELECT 6_n.`numbers`, 4_n.`numbers`
FROM 60_num 6_n LEFT JOIN 40_num 4_n ON 6_n.rn2 = 4_n.rn2;

numbers | numbers
--------|--------
10811   | 61384
80174   | 12049
12040   | 46508
91467   | 84923
80179   | 1109
91469   | 62224
33149   |
1105    |
62229   |

db<>fiddle here
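
The MOD filter in the query can be checked outside the database. The following is only a Python sketch of the arithmetic, not part of the answer: the helper name `rows_selected_by_mod` is made up for illustration, and ROUND is imitated with integer arithmetic on the assumption that halves round up.

```python
def rows_selected_by_mod(cnt, si_x):
    """Row numbers rn in 1..cnt that pass the filter `rn MOD cnt < si_x`."""
    return [rn for rn in range(1, cnt + 1) if rn % cnt < si_x]


cnt = 15                        # rows inserted into Table1 above
si_x = (cnt * 6 + 5) // 10      # stands in for ROUND(cnt * 6 / 10, 0)
picked = rows_selected_by_mod(cnt, si_x)

# rn = cnt maps to 0, so it joins the first si_x - 1 row numbers.
print(si_x, picked)
```

With 15 rows this selects 9 row numbers for the 60% column and leaves 6 for the 40% column, which matches the shape of the result shown above.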

【Comments】:

  • @forpas You're right, I changed it so that it works with any number of rows.
【Solution 2】:

You can assign the groups based on rand():

select t.*, (case when rand() < 0.6 then 1 else 2 end) as `grouping`
from t;

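Note that this is only approximately 60%/40%, because each row is assigned independently. If you want an exact split, you can use window functions: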

select t.*,
       (case when seqnum <= 0.6 * cnt then 1 else 2 end) as `grouping`
from (select t.*, count(*) over () as cnt, row_number() over (order by rand()) as seqnum
      from t
     ) t;
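
To see why the first query is only approximate while the second is exact, here is a minimal Python sketch of both ideas. It is an illustration only: the function names and the `percent` parameter are assumptions, and the comparison `seqnum <= 0.6 * cnt` is written with integers to avoid floating-point surprises.

```python
import random


def approximate_split(values, percent=60, rng=random):
    """Assign each value independently, like `rand() < 0.6` per row."""
    first, second = [], []
    for v in values:
        (first if rng.random() * 100 < percent else second).append(v)
    return first, second


def exact_split(values, percent=60, rng=random):
    """Shuffle, then cut by position, like `seqnum <= 0.6 * cnt`."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    cnt = len(shuffled)
    first = [v for seqnum, v in enumerate(shuffled, 1) if seqnum * 100 <= percent * cnt]
    second = [v for seqnum, v in enumerate(shuffled, 1) if seqnum * 100 > percent * cnt]
    return first, second


numbers = [80174, 91467, 1105, 12040, 62224, 46508, 33149, 61384, 10811, 84923]
g1, g2 = exact_split(numbers)
print(len(g1), len(g2))   # 6 4
a1, a2 = approximate_split(numbers)
print(len(a1), len(a2))   # varies from run to run
```

The exact variant always returns 6 and 4 values for these ten rows, while the approximate one can return any sizes that add up to ten.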

【Comments】:

  • If window functions are available, NTILE() is more convenient. For 60% it would be NTILE(5) <= 3
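
A caveat on the NTILE(5) <= 3 suggestion: it gives exactly 60% only when the row count is a multiple of 5. The Python sketch below assumes the usual NTILE behaviour, where bucket sizes differ by at most one row and the larger buckets come first; the helper names are hypothetical.

```python
def ntile_buckets(row_count, tiles):
    """Bucket number for each of row_count ordered rows, NTILE-style."""
    base, extra = divmod(row_count, tiles)
    buckets = []
    for tile in range(1, tiles + 1):
        size = base + (1 if tile <= extra else 0)   # first `extra` buckets get one more row
        buckets.extend([tile] * size)
    return buckets


def rows_in_first_three_of_five(row_count):
    """How many rows satisfy NTILE(5) <= 3."""
    return sum(1 for b in ntile_buckets(row_count, 5) if b <= 3)


for n in (10, 15, 11):
    print(n, rows_in_first_three_of_five(n))
```

Under that assumption, 10 rows give 6 and 15 rows give 9, but 11 rows give 7, whereas `seqnum <= 0.6 * cnt` would keep only 6 of 11.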
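【Solution 3】:

You want to number the rows randomly, and you want to do it twice: once for the random split into 60% and 40%, and once for the random order of the "All" column.

As of MySQL 8 you can use the window function ROW_NUMBER for this. However, applying row_number() over (order by rand()) twice yields the same random order both times, because MySQL sees that you are ordering by the same expression. So change the expression slightly, e.g. by adding two different constants.

What remains is two outer joins onto the 100% rows: one joining the 60%, one joining the remaining 40%.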

with prepared as
(
  select
    number,
    row_number() over (order by rand() + 0) as rn1,
    row_number() over (order by rand() + 1) as rn2,
    count(*) over () as cnt
  from numbers
)
, p100 as (select rn1 as rn, number from prepared)
, p60 as (select rn2 as rn, number from prepared where rn2 / cnt <= 0.6)
, p40 as (select cnt - rn2 + 1 as rn, number from prepared where rn2 / cnt > 0.6)
select
  p100.number as number1,
  p60.number as number2,
  p40.number as number3
from p100
left join p60 on p60.rn = p100.rn
left join p40 on p40.rn = p100.rn
order by p100.rn;

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=b14419fd15f8a7987c10f2ef25ced826
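
The shape of the result can also be sketched in Python. This is not the answer's method, just an illustration of the same steps under stated assumptions: two independent shuffles stand in for rn1 and rn2, the function name `three_columns` is made up, and missing cells are None where the SQL produces NULL.

```python
import random


def three_columns(values, percent=60, rng=random):
    """Build (all, 60%, 40%) rows the way the three CTEs and two left joins do."""
    order_all = list(values)
    rng.shuffle(order_all)            # plays the role of rn1
    order_split = list(values)
    rng.shuffle(order_split)          # plays the role of rn2
    cnt = len(values)
    # Number of positions with rn2 / cnt <= 0.6, in integer arithmetic.
    cut = sum(1 for rn2 in range(1, cnt + 1) if rn2 * 100 <= percent * cnt)
    p60 = order_split[:cut]
    p40 = order_split[cut:][::-1]     # rn = cnt - rn2 + 1 renumbers the tail from 1
    rows = []
    for i, value in enumerate(order_all):
        col2 = p60[i] if i < len(p60) else None
        col3 = p40[i] if i < len(p40) else None
        rows.append((value, col2, col3))
    return rows


numbers = [80174, 91467, 1105, 12040, 62224, 46508, 33149, 61384, 10811, 84923]
rows = three_columns(numbers)
for row in rows:
    print(row)
```

For the ten sample values this prints ten rows: the first column holds every value, the second holds six of them, and the third holds the other four.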
