【Question title】: Grouped running total with Power Query M
【Posted】: 2021-04-21 10:34:21
【Question】:

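This is what my table looks like (1.7 million rows):

I am trying to build a running total of Sales per CustomerID, ordered by Date.

This is easy to express in DAX, but unfortunately my machine (16 GB RAM) does not have enough memory for it.

So I am trying to find an alternative in Power Query M, using buffered tables and the like, but it is too complicated for me.

Can anyone help? Thank you very much in advance!

Edit: after sorting by Date and CustomerID, adding an index, and adding a custom column: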

= Table.AddColumn(#"Added Index", "Personalizado", each (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]))

I get the following:

Edit 2: the full code:

let
    Origem = dataset,
    #"Linhas Agrupadas" = Table.Group(Origem, {"Date", "CustomerID"}, {{"Sales", each List.Sum([Sales]), type nullable number}}),
    #"Linhas Ordenadas" = Table.Sort(#"Linhas Agrupadas",{{"Date", Order.Ascending}, {"CustomerID", Order.Ascending}}),
    #"Linhas Filtradas" = Table.SelectRows(#"Linhas Ordenadas", each [Sales] <> 0),
    #"Added Index" = Table.AddIndexColumn(#"Linhas Filtradas", "Index", 0, 1, Int64.Type),
    #"Personalizado Adicionado" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number )
in
    #"Personalizado Adicionado"

【Question comments】:

    Tags: powerbi powerquery


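    【Solution 1】:

    Method 1

    First sort the data, probably on the CustomerID and Date columns. Whatever row order appears on screen is the order in which the totals accumulate.

    Add Column .. Index Column ...

    Add Column .. Custom Column, with the formula below. Note that the Custom Column dialog wraps whatever you type in each, which makes every cell show Function; delete that each in the formula bar so that the step matches the full code further down.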

    = (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales])
    

    Right-click the Index column and remove it.

    Wrapping the index step in Table.Buffer() may help speed things up.
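
The formula in the question's first edit starts with `each (i)=>`, and the asker reports seeing "Function" in every cell. A minimal sketch of the difference, using a small hypothetical sample table (the `Sample` data and the step names `Wrong` and `Right` are assumptions for illustration, not part of the original post):

```powerquery
let
    // Hypothetical sample data standing in for the real indexed table
    Sample = #table(
        type table [CustomerID = text, Index = Int64.Type, Sales = number],
        {{"A", 0, 10}, {"A", 1, 5}, {"B", 2, 7}}
    ),
    // `each` already creates a one-argument function, so wrapping (i) => ... in it
    // makes every cell hold a function value instead of a number
    Wrong = Table.AddColumn(Sample, "CumSum",
        each (i) => List.Sum(
            Table.SelectRows(Sample, each [CustomerID] = i[CustomerID] and [Index] <= i[Index])[Sales])),
    // Passing (i) => ... directly makes it the column generator; i is the current row
    Right = Table.AddColumn(Sample, "CumSum",
        (i) => List.Sum(
            Table.SelectRows(Sample, each [CustomerID] = i[CustomerID] and [Index] <= i[Index])[Sales]),
        type number)
in
    Right
```

With this sample, `Right` should give CumSum values 10, 15 and 7, while `Wrong` shows Function in each cell.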

    Full sample code:

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}}),
    #"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
    #"Added Custom" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number ),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"})
    in  #"Removed Columns"
    

    Method 2

    Create a function named fn_cum_total:

    (Input) =>
    let withindex = Table.AddIndexColumn(Input, "Index", 1, 1),
    cum = Table.AddColumn(withindex, "Total",each List.Sum(List.Range(withindex[Sales],0,[Index])))[Total]
    in cum
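
To see what the function returns, here is a small sketch that inlines the same logic and calls it on a hypothetical single-customer table (the `Sample` data is an assumption for illustration). Note that the result is a list of running totals, not a table, which is why the query in the next step has to attach it back as a column.

```powerquery
let
    // Same logic as fn_cum_total, inlined so the snippet runs on its own
    fn_cum_total = (Input as table) as list =>
        let
            withindex = Table.AddIndexColumn(Input, "Index", 1, 1),
            // For row k (1-based), sum the first k Sales values
            cum = Table.AddColumn(withindex, "Total",
                each List.Sum(List.Range(withindex[Sales], 0, [Index])))[Total]
        in
            cum,
    // Hypothetical sample: one customer, already sorted by date
    Sample = #table(
        type table [Date = date, Sales = number],
        {{#date(2021, 1, 1), 10}, {#date(2021, 1, 2), 5}, {#date(2021, 1, 3), 7}}
    ),
    Result = fn_cum_total(Sample)   // {10, 15, 22}
in
    Result
```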
    

    Create a query that groups by CustomerID and then uses the function to add a cumulative total of the Sales column:

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Sorted Rows" = Table.Buffer(Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}})),
    Running_Total = Table.Group(#"Sorted Rows",{"CustomerID"},{{"Data",
        (Input as table) as table =>  let  zz = fn_cum_total(Input),
         result = Table.FromColumns(Table.ToColumns(Input)&{zz}, Value.Type(Table.AddColumn(Input, "total", each null, type number))) in result, type table}} ),
    #"Expanded Data" = Table.ExpandTableColumn(Running_Total, "Data", {"Date", "Sales", "total"}, {"Date", "Sales", "total"})
    in #"Expanded Data"
    

    I can't take credit for Method 2; I borrowed it a long time ago but don't remember the source.
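
Both methods above re-read earlier rows for every row they process, which may become slow on a table with 1.7 million rows. As a possible alternative that is not part of the original answer, the sketch below computes each group's running total in a single pass with `List.Generate`. The sample data and the names `RunningTotals`, `Grouped` and `CumSum` are assumptions for illustration, and it assumes Sales contains no nulls.

```powerquery
let
    // Hypothetical sample data; replace with the real source step
    Source = #table(
        type table [CustomerID = text, Date = date, Sales = number],
        {
            {"A", #date(2021, 1, 1), 10},
            {"B", #date(2021, 1, 1), 7},
            {"A", #date(2021, 1, 2), 5},
            {"B", #date(2021, 1, 3), 3}
        }
    ),
    // One-pass running total over a list: carry the position and the sum so far
    RunningTotals = (values as list) as list =>
        let
            buffered = List.Buffer(values),
            n = List.Count(buffered)
        in
            List.Generate(
                () => [i = 0, total = buffered{0}],
                each [i] < n,
                // Record fields are lazy, so the lookup past the end is never evaluated
                each [i = [i] + 1, total = [total] + buffered{[i] + 1}],
                each [total]
            ),
    // Sort each customer's rows by date, then append the running totals as a new column
    Grouped = Table.Group(Source, {"CustomerID"}, {{"Data",
        (g as table) as table =>
            let
                s = Table.Sort(g, {{"Date", Order.Ascending}})
            in
                Table.FromColumns(
                    Table.ToColumns(s) & {RunningTotals(s[Sales])},
                    Table.ColumnNames(s) & {"CumSum"}
                ),
        type table}}),
    Expanded = Table.ExpandTableColumn(Grouped, "Data", {"Date", "Sales", "CumSum"})
in
    Expanded
```

With this sample, customer A should end up with CumSum 10 and 15, and customer B with 7 and 10. The expanded columns come back untyped, so a `Table.TransformColumnTypes` step may be needed afterwards.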

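    【Comments】:

    • Thanks. I get "Function" in every cell of the table; see my edited question above for more details.
    • I just re-tested the code above and it looks fine. Please post yours.
    • = Table.AddColumn(#"Added Index", "Personalizado", each (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]
    • Sorry, I meant please post the whole code block.
    • Stack Overflow won't let me paste the whole code here, so I will update my question above. Thanks.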