Add missing data from previous month or year cumulatively: answers

【Question title】: Add missing data from previous month or year cumulatively
【Posted】: 2016-04-19 09:47:09
【Question description】:

Suppose I have the following data:

select 1 id, 'A' name, '2007' year, '04' month,  5 sales  from dual union all
select 2 id, 'A' name, '2007' year, '05' month,  2 sales  from dual union all
select 3 id, 'B' name, '2008' year, '12' month,  3 sales  from dual union all
select 4 id, 'B' name, '2009' year, '12' month, 56 sales  from dual union all
select 5 id, 'C' name, '2009' year, '08' month, 89 sales  from dual union all
select 13 id,'B' name, '2016' year, '01' month, 10 sales  from dual union all
select 14 id,'A' name, '2016' year, '02' month,  8 sales  from dual union all
select 15 id,'D' name, '2016' year, '03' month, 12 sales  from dual union all
select 16 id,'E' name, '2016' year, '04' month, 34 sales  from dual

I want to accumulate all sales across all years and their respective periods (months). The output should look like this:

name  year  month  sale  opening bal  closing bal
A     2007  04        5            0            5
A     2007  05        2            5            7
B     2008  12        3           12           15
A     2008  04        0            5            5   -- to be generated
A     2008  05        0            7            7   -- to be generated
B     2009  12       56           15           71
C     2009  08       89           71          160
A     2009  04        0            5            5   -- to be generated
A     2009  05        0            7            7   -- to be generated
B     2016  01       10          278          288
B     2016  12        0           71           71   -- to be generated
A     2016  02        8          288          296
A     2016  04        0            5            5   -- to be generated
A     2016  05        0            7            7   -- to be generated
D     2016  03       12          296          308
E     2016  04       34          308          342
C     2016  08        0          160          160   -- to be generated

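The opening balance is the previous month's closing balance, and when moving into the next year, the next year's opening balance is the previous year's closing balance. It should keep working like this for subsequent years. I already have this part working. However, I don't know how to handle keys that exist in 2008 but are missing in 2009. For example, the keys A,2008,04 and A,2008,05 do not exist in 2009, and the code should be able to add them to 2009 as shown above. The same applies to the other years and months.

I'm working on Oracle 12c.

Thanks in advance.

【Question discussion】:

Is there a typo in your sample data? Your output data refers to A, 2007, 04 and 05, but in your sample data you have A, 2007, 04 but B, 2007, 05. Also, what do you mean by "subsequent years"? Is that every year from 2007 to the present, or only the years present in the data (e.g. 2007, 2008, 2009 and 2016)? P.S. Thanks for providing sample data! It's rare to get data in such a convenient form! *{:-D
I don't have B, 2007, 05 there. I can't see it myself :S
select 2 id, 'B' name, '2007' year, '05' month, 2 sales from dual union all
I need to create a table whose contents look like the output above. That way, my analytic function for cumulatively adding the values will work perfectly. With the data as it stands in the union alls above, the analytic function doesn't work, because if a composite key that exists in 2008 is missing in 2009, it won't be picked up for 2009.
Why is the opening balance 12 for B/2008/12, 71 for C/2009/08, and so on, rather than zero? Where do those numbers come from? Are you assuming other data that you haven't shown?

Tags: sql oracle plsql

【Solution 1】:

A variation on @boneist's approach, starting with the sample data in a CTE: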

with t as (
  select 1 id, 'A' name, '2007' year, '04' month,  5 sales  from dual union all
  select 2 id, 'A' name, '2007' year, '05' month,  2 sales  from dual union all
  select 3 id, 'B' name, '2008' year, '12' month,  3 sales  from dual union all
  select 4 id, 'B' name, '2009' year, '12' month, 56 sales  from dual union all
  select 5 id, 'C' name, '2009' year, '08' month, 89 sales  from dual union all
  select 13 id,'B' name, '2016' year, '01' month, 10 sales  from dual union all
  select 14 id,'A' name, '2016' year, '02' month,  8 sales  from dual union all
  select 15 id,'D' name, '2016' year, '03' month, 12 sales  from dual union all
  select 16 id,'E' name, '2016' year, '04' month, 34 sales  from dual
),
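-- y: every distinct year in the data, ranked in order (2007 = 1 ... 2016 = 4)
y (year, rnk) as (
  select year, dense_rank() over (order by year)
  from (select distinct year from t)
),
-- r: the real rows, plus a zero-sales row carried into each following year
-- for any name/month that has no real row in that year
r (name, year, month, sales, rnk) as (
  select t.name, t.year, t.month, t.sales, y.rnk
  from t
  join y on y.year = t.year
  union all
  select r.name, y.year, r.month, 0, y.rnk
  from y
  join r on r.rnk = y.rnk - 1
  where not exists (
    select 1 from t where t.year = y.year and t.month = r.month and t.name = r.name
  )
)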
select name, year, month, sales,
  nvl(sum(sales) over (partition by name order by year, month
    rows between unbounded preceding and 1 preceding), 0) as opening_bal,
  nvl(sum(sales) over (partition by name order by year, month
    rows between unbounded preceding and current row), 0) as closing_bal
from r
order by year, month, name;

This gets the same result as well, although it also doesn't match the expected result in the question:

NAME YEAR MONTH      SALES OPENING_BAL CLOSING_BAL
---- ---- ----- ---------- ----------- -----------
A    2007 04             5           0           5
A    2007 05             2           5           7
A    2008 04             0           7           7
A    2008 05             0           7           7
B    2008 12             3           0           3
A    2009 04             0           7           7
A    2009 05             0           7           7
C    2009 08            89           0          89
B    2009 12            56           3          59
B    2016 01            10          59          69
A    2016 02             8           7          15
D    2016 03            12           0          12
A    2016 04             0          15          15
E    2016 04            34           0          34
A    2016 05             0          15          15
C    2016 08             0          89          89
B    2016 12             0          69          69

The y CTE (feel free to use a more meaningful name!) generates all the distinct years from your original data and adds a ranking, so 2007 is 1, 2008 is 2, 2009 is 3 and 2016 is 4.

The r recursive CTE combines your actual data with dummy zero-sales rows, based on the name/month data from previous years.
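For anyone who wants to check the gap-filling logic outside the database, here is a minimal plain-Python sketch of what the y and r CTEs work out between them. It is not Oracle code, and the function name `fill_missing` is made up for illustration; it only assumes the nine sample rows from the question.

```python
# Sample rows from the question as (name, year, month, sales).
rows = [
    ("A", "2007", "04", 5), ("A", "2007", "05", 2),
    ("B", "2008", "12", 3), ("B", "2009", "12", 56),
    ("C", "2009", "08", 89), ("B", "2016", "01", 10),
    ("A", "2016", "02", 8), ("D", "2016", "03", 12),
    ("E", "2016", "04", 34),
]


def fill_missing(rows):
    """Carry each (name, month) key into every later year as a zero-sales row."""
    # Like the y CTE: the distinct years, in rank order.
    # Four-digit year strings sort correctly as text.
    years = sorted({year for _, year, _, _ in rows})
    real = {(name, year, month) for name, year, month, _ in rows}

    filled = list(rows)
    carried = set()  # (name, month) keys seen in strictly earlier years
    for year in years:
        # Like the recursive branch of r: add a dummy row for any carried key
        # that has no real row in this year.
        for name, month in sorted(carried):
            if (name, year, month) not in real:
                filled.append((name, year, month, 0))
        # Keys that first appear this year are carried from the next year on.
        carried |= {(n, m) for n, y, m, _ in rows if y == year}
    return filled


if __name__ == "__main__":
    for row in sorted(fill_missing(rows), key=lambda r: (r[1], r[2], r[0])):
        print(row)
```

The sketch produces one row per name/month for every year from the key's first appearance onwards, which is the same set of 17 rows the query returns.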

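From what the recursive CTE produces, you can do an analytic cumulative sum to add the opening/closing balances. This uses windowing clauses to decide which sales values to include: essentially the opening and closing balances are both the sum of all values so far, but the opening balance doesn't include the current row.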
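The two windowing clauses can be mimicked with a running total per name. The following is only a rough Python sketch of that idea, assuming rows shaped like the query output; `add_balances` and `sample` are illustrative names, not part of the answer's SQL.

```python
from itertools import groupby

# A few rows taken from the result set above: (name, year, month, sales).
sample = [
    ("A", "2007", "04", 5), ("A", "2007", "05", 2), ("A", "2008", "04", 0),
    ("A", "2016", "02", 8), ("B", "2008", "12", 3), ("B", "2009", "12", 56),
    ("B", "2016", "01", 10),
]


def add_balances(rows):
    """Append opening and closing balances, accumulated separately per name."""
    # Equivalent of: partition by name order by year, month
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    out = []
    for name, group in groupby(ordered, key=lambda r: r[0]):
        running = 0
        for _, year, month, sales in group:
            opening = running   # unbounded preceding .. 1 preceding
            running += sales    # unbounded preceding .. current row
            out.append((name, year, month, sales, opening, running))
    return out


if __name__ == "__main__":
    for row in add_balances(sample):
        print(row)
```

Because the running total starts at zero for each name, the first row of every name gets an opening balance of 0, which is what the nvl(..., 0) in the query takes care of.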

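【Discussion】:

I think I might prefer the partitioned join to the recursive CTE, now that I've figured out what it's doing; I hadn't used it before. With a large amount of real data it could also be significantly more efficient, since it avoids the self-join (in the not-exists clause).

【Solution 2】:

This is the closest I could get to your results, although I realise it isn't an exact match. For example, your opening balances don't look right (where did the opening balance of 12 come from for the output row with id = 3?). Anyway, hopefully the following will let you amend it as appropriate: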

with sample_data as (select 1 id, 'A' name, '2007' year, '04' month,  5 sales  from dual union all
                     select 2 id, 'A' name, '2007' year, '05' month,  2 sales  from dual union all
                     select 3 id, 'B' name, '2008' year, '12' month,  3 sales  from dual union all
                     select 4 id, 'B' name, '2009' year, '12' month, 56 sales  from dual union all
                     select 5 id, 'C' name, '2009' year, '08' month, 89 sales  from dual union all
                     select 13 id, 'B' name, '2016' year, '01' month, 10 sales  from dual union all
                     select 14 id, 'A' name, '2016' year, '02' month,  8 sales  from dual union all
                     select 15 id, 'D' name, '2016' year, '03' month, 12 sales  from dual union all
                     select 16 id, 'E' name, '2016' year, '04' month, 34 sales  from dual),
             dts as (select distinct year
                     from   sample_data),
             res as (select sd.name,
                            dts.year,
                            sd.month,
                            nvl(sd.sales, 0) sales,
                            min(sd.year) over (partition by sd.name, sd.month) min_year_per_name_month,
                            sum(nvl(sd.sales, 0)) over (partition by name order by to_date(dts.year||'-'||sd.month, 'yyyy-mm')) - nvl(sd.sales, 0) as opening,
                            sum(nvl(sd.sales, 0)) over (partition by name order by to_date(dts.year||'-'||sd.month, 'yyyy-mm')) as closing
                     from   dts
                            left outer join sample_data sd partition by (sd.name, sd.month) on (sd.year = dts.year))
select name,
       year,
       month,
       sales,
       opening,
       closing
from   res
where  (opening != 0 or closing != 0)
and    year >= min_year_per_name_month
order by to_date(year||'-'||month, 'yyyy-mm'),
         name;

NAME YEAR MONTH      SALES    OPENING    CLOSING
---- ---- ----- ---------- ---------- ----------
A    2007 04             5          0          5
A    2007 05             2          5          7
A    2008 04             0          7          7
A    2008 05             0          7          7
B    2008 12             3          0          3
A    2009 04             0          7          7
A    2009 05             0          7          7
C    2009 08            89          0         89
B    2009 12            56          3         59
B    2016 01            10         59         69
A    2016 02             8          7         15
D    2016 03            12          0         12
A    2016 04             0         15         15
E    2016 04            34          0         34
A    2016 05             0         15         15
C    2016 08             0         89         89
B    2016 12             0         69         69

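I used a Partition Outer Join to link every month and name combination in the table (in my query, that's the sample_data subquery; you won't need that subquery, you'd just use your table!) to every year in the same table, and then worked out the opening/closing balances. I then discard any rows where both the opening and closing balances are 0.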
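As a rough illustration of the densification the partitioned outer join performs, here is a plain-Python sketch. It is not Oracle code, the names `densify` and `sales` are made up, and it leaves out the balance calculation: it shows only the name/month by year grid and the first-year filter (the `year >= min_year_per_name_month` condition).

```python
from itertools import product

# Sample data from the question, keyed by (name, year, month).
sales = {
    ("A", "2007", "04"): 5, ("A", "2007", "05"): 2,
    ("B", "2008", "12"): 3, ("B", "2009", "12"): 56,
    ("C", "2009", "08"): 89, ("B", "2016", "01"): 10,
    ("A", "2016", "02"): 8, ("D", "2016", "03"): 12,
    ("E", "2016", "04"): 34,
}


def densify(sales):
    """Pair every (name, month) with every year, filling gaps with 0 sales."""
    years = sorted({year for _, year, _ in sales})               # the dts subquery
    keys = sorted({(name, month) for name, _, month in sales})   # the join partitions

    # First year in which each (name, month) really appears.
    first_year = {}
    for name, year, month in sales:
        key = (name, month)
        first_year[key] = min(year, first_year.get(key, year))

    dense = []
    for (name, month), year in product(keys, years):
        # Drop rows from before the key first appears in the data.
        if year >= first_year[(name, month)]:
            dense.append((name, year, month, sales.get((name, year, month), 0)))
    return dense


if __name__ == "__main__":
    for row in sorted(densify(sales), key=lambda r: (r[1], r[2], r[0])):
        print(row)
```

With the sample data, the full grid would be 8 name/month keys times 4 years, so 32 rows; the first-year filter trims that to the 17 rows shown in the output above.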

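【Discussion】:

Hmm, thanks Boneist. Partitioning isn't enabled in my instance. I was thinking more along the lines of a Merge Into query: check whether keys such as name, year, month exist in 2008 but not in 2009, and then insert them into 2009 with the values above.
The partitioning used in the join is not the same as the partitioning used to partition tables. AFAIK, there are no licensing issues with using a particular type of join!
@bytebiscuit I've updated my answer to exclude months that have no values before the year in which they first appear in the table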