PL/SQL rownum updates: answers

[Question title]: PL/SQL rownum updates
[Posted]: 2011-03-19 07:50:42
[Question]:

I'm working on a database with several tables. They are:

      districts table
      PK district_id

      student_data table
      PK study_id
      FK district_id

      ga_data table
      PK study_id
      district_id

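The ga_data table holds data that I added. Both student_data and ga_data have 1.3 million records. study_id is 1-to-1 between the two tables, but ga_data.district_id is NULL and needs to be updated. I'm having trouble with the following PL/SQL: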

update ga_data
set district_id = (select district_id from student_data
where student_data.study_id = ga_data.study_id)
where ga_data.district_id is null and rownum < 100;

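I need to do this incrementally, which is why I need rownum. But am I using it correctly? After running the query many times, it has only updated about 8,000 of the 1.3 million records (there should be about 1.1 million updates, since some of the district_ids in student_data are null). Thanks!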

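[Comments on the question]:

Not sure how rownum works on an UPDATE, but I think that each time you run the query it updates about 100 rows, doesn't it? So for a million updates you would have to run it 10,000 times. Unless I really knew how it works, I would rather partition by study_id than use rownum.

Tags: mysql oracle plsql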

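[Answer 1]:

ROWNUM simply cuts the query off after the first n rows. Some rows in STUDENT_DATA have a NULL DISTRICT_ID, so updating the matching QA_DATA rows leaves them NULL and they still satisfy the WHERE clause. After a number of runs your query is therefore liable to get stuck in a rut, returning the same 100 QA_DATA records, all of which match one of those pesky STUDENT_DATA rows.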
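One way to check whether this is what is happening is to count the rows that can never leave the NULL set. The query below is a minimal sketch that uses the table and column names from the question (ga_data, student_data, study_id) rather than this answer's test schema; the alias stuck_rows is only illustrative.

```sql
-- Rows in ga_data that are still NULL and whose matching student_data row
-- also has a NULL district_id. An UPDATE filtered only on
-- "ga_data.district_id is null" will keep selecting these rows.
select count(*) as stuck_rows
  from ga_data g
 where g.district_id is null
   and exists (select 1
                 from student_data s
                where s.study_id = g.study_id
                  and s.district_id is null);
```

If this count is at least as large as the batch size, a ROWNUM-limited update can keep picking these same rows and stop making progress.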

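So you need some mechanism to make sure that you work your way progressively through the QA_DATA table. A flag column would be one solution. Partitioning the query so that each run hits a different set of STUDENT_IDs is another.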
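The partitioning idea can be sketched as a loop over key ranges. This is only an illustration under stated assumptions: it uses the question's names (ga_data, study_id), assumes study_id is numeric, and commits after every batch, which is a design choice rather than a requirement. The names c_batch, v_lo and v_max are hypothetical.

```sql
declare
    c_batch constant pls_integer := 100;   -- width of each study_id range (assumed)
    v_lo    ga_data.study_id%type;
    v_max   ga_data.study_id%type;
begin
    -- Assumes study_id is numeric; on an empty table both values are NULL
    -- and the loop body never runs.
    select min(study_id), max(study_id) into v_lo, v_max from ga_data;

    while v_lo <= v_max
    loop
        -- Each pass touches a different key range, so rows that stay NULL
        -- after the update cannot block later batches.
        update ga_data g
           set g.district_id = (select s.district_id
                                  from student_data s
                                 where s.study_id = g.study_id)
         where g.study_id between v_lo and v_lo + c_batch - 1
           and g.district_id is null;

        commit;
        v_lo := v_lo + c_batch;
    end loop;
end;
/
```

Because progress is driven by the key range and not by the NULL test, the loop always terminates, at the cost of some empty batches when study_id values are sparse.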

It is not clear why you have to do this in batches of 100, but probably the simplest way to do it is with BULK PROCESSING (at least in Oracle: this PL/SQL syntax will not work in MySQL).

Here is some test data (note that this test schema names the table QA_DATA and keys it on STUDENT_ID):

SQL> select district_id, count(*)
  2  from student_data
  3  group by district_id
  4  /

DISTRICT_ID   COUNT(*)
----------- ----------
   7369        192
   7499        190
   7521        192
   7566        190
   7654        192
   7698        191
   7782        191
   7788        191
   7839        191
   7844        192
   7876        191
   7900        192
   7902        191
   7934        192
   8060        190
   8061        193
   8083        190
   8084        193
   8085        190
   8100        193
   8101        190
               183

22 rows selected.

SQL> select district_id, count(*)
  2  from qa_data
  3  group by district_id
  4  /

DISTRICT_ID   COUNT(*)
----------- ----------
                  4200

SQL>

This anonymous block uses the bulk-processing LIMIT clause to batch the result set into chunks of 100 rows.

declare
    -- Rows to fix: each QA_DATA key with the district from STUDENT_DATA.
    cursor c_qa is
        select qa.student_id
               , s.district_id
        from qa_data qa
                join student_data s
                    on (s.student_id = qa.student_id);

    -- Base the collection on the cursor's row shape, so the fetch works
    -- even if QA_DATA has columns other than these two.
    type qa_nt is table of c_qa%rowtype;
    qa_recs qa_nt;
begin
    open c_qa;

    loop
        -- Fetch at most 100 rows per round trip.
        fetch c_qa bulk collect into qa_recs limit 100;
        exit when qa_recs.count() = 0;

        for i in qa_recs.first()..qa_recs.last()
        loop
            update qa_data qt
                set qt.district_id = qa_recs(i).district_id
                where qt.student_id = qa_recs(i).student_id;
        end loop;

    end loop;

    close c_qa;
end;
/

PL/SQL procedure successfully completed.

SQL>

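Note that this construct allows us to do additional processing on the selected rows before issuing the update. That is handy if we need to apply complicated fixes programmatically.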
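If no per-row processing is needed, the inner loop can be swapped for a FORALL statement, which sends each batch of updates to the SQL engine in one call. The following is a sketch of that variation, not part of the tested session above; it reuses the same test schema, and the collection names (id_nt, dist_nt, ids, dists) are hypothetical. Two scalar collections are used so that the block does not depend on referencing record fields inside FORALL, which older Oracle releases reject.

```sql
declare
    cursor c_qa is
        select qa.student_id
               , s.district_id
        from qa_data qa
                join student_data s
                    on (s.student_id = qa.student_id);

    type id_nt   is table of qa_data.student_id%type;
    type dist_nt is table of student_data.district_id%type;
    ids   id_nt;
    dists dist_nt;
begin
    open c_qa;

    loop
        -- One scalar collection per selected column, 100 rows at a time.
        fetch c_qa bulk collect into ids, dists limit 100;
        exit when ids.count = 0;

        -- Bulk-bind the whole batch in a single call to the SQL engine.
        forall i in 1 .. ids.count
            update qa_data qt
                set qt.district_id = dists(i)
                where qt.student_id = ids(i);
    end loop;

    close c_qa;
end;
/
```

The trade-off is that FORALL accepts only a single DML statement, so any programmatic fix-up has to be applied to the collections before the FORALL runs.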

As you can see, the data in QA_DATA now matches the data in STUDENT_DATA:

SQL> select district_id, count(*)
  2  from qa_data
  3  group by district_id
  4  /

DISTRICT_ID   COUNT(*)
----------- ----------
   7369        192
   7499        190
   7521        192
   7566        190
   7654        192
   7698        191
   7782        191
   7788        191
   7839        191
   7844        192
   7876        191
   7900        192
   7902        191
   7934        192
   8060        190
   8061        193
   8083        190
   8084        193
   8085        190
   8100        193
   8101        190
               183

22 rows selected.

SQL>

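[Comments]:

In theory, each time the statement runs it should return a different 100 rows, because the rows from the previous run should no longer qualify. That assumes, of course, that the update statement actually changes rows that previously had a NULL district_id to a non-NULL district_id.
@JustinCave - thanks for forcing me to clarify that. It's early in the morning for me, and I haven't had my tea yet to get the old grey matter working.

[Answer 2]:

Updating only 100 rows at a time is a strange requirement. Why is that?

Anyway, since district_id in student_data can be null, you may be updating the same 100 rows over and over again.

If you extend the query to make sure that a non-null district_id exists, you might end up where you want to be: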

update ga_data
set district_id = (
  select district_id 
  from student_data
  where student_data.study_id = ga_data.study_id
)
where ga_data.district_id is null 
and exists (
  select 1
  from student_data
  where student_data.study_id = ga_data.study_id
  and district_id is not null
)
and rownum < 100;
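Rather than re-running the statement by hand, the same update can be wrapped in a loop that stops once a pass changes nothing. This is a minimal sketch under assumptions: it uses rownum <= 100 so that each pass covers a full 100 rows (rownum < 100 stops at 99), and it commits after each batch, which may or may not suit your transaction requirements.

```sql
begin
    loop
        update ga_data g
           set g.district_id = (select s.district_id
                                  from student_data s
                                 where s.study_id = g.study_id)
         where g.district_id is null
           and exists (select 1
                         from student_data s
                        where s.study_id = g.study_id
                          and s.district_id is not null)
           and rownum <= 100;

        -- Every updated row becomes non-NULL and drops out of the filter,
        -- so the number of qualifying rows shrinks until none are left.
        exit when sql%rowcount = 0;
        commit;
    end loop;
end;
/
```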

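[Answer 3]:

If this is a one-time conversion, you should consider a completely different approach: recreate the table as the join of your two tables. I promise you will laugh out loud when you realise how fast it is compared to any kind of funny 100-rows-at-a-time update.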

create table new_table as
   select study_id
         ,s.district_id
         ,g.the_remaining_columns_in_ga_data  -- placeholder: list ga_data's other columns here
     from student_data s
     join ga_data      g using(study_id);

-- create indexes, constraints etc. on new_table, then swap the tables
drop table ga_data;
alter table new_table rename to ga_data;

Or, if it isn't a one-time conversion, or you can't recreate/drop tables, or you just feel like spending a few extra hours on data loading:

merge
 into ga_data      g
using student_data s
   on (g.study_id  = s.study_id)
when matched then
   update
      set g.district_id = s.district_id;

The last statement can also be rewritten as an updatable view, but I personally never use them.
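For reference, the updatable-view form might look like the sketch below. It is not taken from the answer; it relies on study_id being the primary key of student_data (as the question states), so that ga_data is key-preserved in the join. Without a unique key on student_data.study_id, Oracle would reject the update with ORA-01779. The column aliases are illustrative.

```sql
-- Update ga_data through an inline join view. Each ga_data row joins to
-- at most one student_data row because student_data.study_id is a primary key.
update (select g.district_id as ga_district_id
              ,s.district_id as student_district_id
          from ga_data      g
              ,student_data s
         where s.study_id = g.study_id)
   set ga_district_id = student_district_id;
```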

Dropping or disabling the indexes and constraints on ga_data.district_id before running the merge, and recreating them afterwards, can improve performance.
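As a rough sketch of that sequence: the question does not say which constraints or indexes exist on ga_data.district_id, so the names ga_data_district_fk and ga_data_district_ix below are hypothetical, and the example assumes a non-unique index.

```sql
-- Hypothetical object names; substitute whatever actually exists
-- on ga_data.district_id.
alter table ga_data disable constraint ga_data_district_fk;
alter index ga_data_district_ix unusable;

merge
 into ga_data      g
using student_data s
   on (g.study_id  = s.study_id)
when matched then
   update
      set g.district_id = s.district_id;

commit;

-- Rebuild the index and re-validate the constraint once the load is done.
alter index ga_data_district_ix rebuild;
alter table ga_data enable constraint ga_data_district_fk;
```

Re-enabling the constraint validates every row, so that step fails if the merge introduced district_id values that are missing from districts.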
