One possible solution is to loop through all the values and autogenerate the dates.
Example load script, with comments:
// Sample data
RawData:
Load * Inline [
Id, Cr Date , End Date
1 , 12/08/2019, 06/08/2020
2 , 05/07/2019, 16/11/2020
];
// temp table to keep distinct values
// of the concatenation of id, cr date and end date
// example record:
// 1^12/08/2019^06/08/2020
TempTable:
Load
    distinct
    Id & '^' & [Cr Date] & '^' & [End Date] as Id_Dates
Resident
    RawData
;
// for each distinct value in the Id_Dates field
for i = 1 to FieldValueCount('Id_Dates')
    // get the current iteration value
    let value = FieldValue('Id_Dates', $(i));
    // extract the Id
    let currentId = SubField('$(value)', '^', 1);
    // extract cr date
    let currentCrDate = Num(SubField('$(value)', '^', 2));
    // extract end date
    let currentEndDate = Num(SubField('$(value)', '^', 3));

    // autogenerate all dates between currentCrDate and currentEndDate (inclusive)
    // and add the current Id value (this will link to the RawData table)
    DueDates:
    LOAD
        '$(currentId)' as Id,
        date($(currentCrDate) + IterNo() - 1, 'DD/MM/YYYY') AS DueDate
    AUTOGENERATE (1)
    WHILE
        $(currentCrDate) + IterNo() - 1 <= $(currentEndDate)
    ;
next

// we don't need this table anymore
Drop Table TempTable;
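Once the script has finished, the data model will contain two tables, RawData and DueDates, linked on Id.
The DueDates table will contain one row per Id for every date from [Cr Date] to [End Date], both inclusive.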
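One way to sanity-check the result is to aggregate DueDates per Id at the end of the script. This is only a sketch: the table name DueDatesCheck and its field names are made up for illustration, and it assumes the inline dates were interpreted as DD/MM/YYYY. With the sample data above, counting both end dates as inclusive, it should show 361 rows for Id 1 and 501 rows for Id 2.

```qlik
// Hypothetical check table (not part of the original script).
// Id is renamed to CheckId so the table does not link to the data model.
DueDatesCheck:
Load
    Id as CheckId,
    Count(DueDate) as DateCount,
    Date(Min(DueDate), 'DD/MM/YYYY') as FirstDueDate,
    Date(Max(DueDate), 'DD/MM/YYYY') as LastDueDate
Resident
    DueDates
Group By
    Id
;
```

FirstDueDate and LastDueDate should match [Cr Date] and [End Date] in RawData for the same Id. Drop the check table once you are happy with the numbers.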
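P.S. If the source data has many distinct Id values, this solution may not perform well, because it runs one load statement per distinct value.
If that is the case, let me know and I'll think about another solution.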
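Update (10 June 2021)
Another way to avoid looping over the rows one by one is to create a cross join (Cartesian join) between the source data and a calendar table that contains all possible dates. Once we have that table, we can filter out the rows we don't need.
This approach will be faster, but it will most likely consume more RAM during the reload. Once the reload has finished, the resulting app should have the same memory footprint as the "loop through each row" approach.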
RawData:
Load * Inline [
Id, Cr Date , End Date
1 , 12/08/2019, 06/08/2020
2 , 05/07/2019, 16/11/2020
];
// Get min and max dates from [Cr Date] and [End Date] fields
TempTable1:
Load
    min([Cr Date]) as MinDate,
    max([Cr Date]) as MaxDate
Resident
    RawData
;
concatenate
Load
    min([End Date]) as MinDate,
    max([End Date]) as MaxDate
Resident
    RawData
;
// Get the overall min and max dates
NoConcatenate
TempTable2:
Load
    min(MinDate) as MinDate,
    max(MaxDate) as MaxDate
Resident
    TempTable1
;
Drop Table TempTable1;
// generate all possible dates between the min and max dates
// once the dates are generated, join the result table to RawData
// since there are no common fields between the two tables
// the result will be a many-to-many join (Cartesian join)
// so at this point RawData will be quite a large table:
// no. of rows in RawData (initially) * no. of rows in the Calendar table
// for example, if RawData has 10 rows and the calendar has 1000, the result table
// will have 10 000 rows
// We will reduce the number of rows in the next step
let vMinDate = peek('MinDate');
let vMaxDate = peek('MaxDate');
join (RawData)
Calendar:
Load
    Date($(vMinDate) + IterNo() - 1, 'DD/MM/YYYY') as DueDate
Autogenerate 1
While
    $(vMinDate) + IterNo() - 1 <= $(vMaxDate)
;
Drop Table TempTable2;
// Resident-load the modified RawData table and, while loading, create a new field
// This field will be used as a flag and we'll filter on it at the end
// The logic in the field is:
// if [Cr Date] <= DueDate <= [End Date] then set it to 1 else 0
// The final step is to keep only records with TempFlag = 1
NoConcatenate
RawData_Final:
Load
    Id,
    [Cr Date],
    [End Date],
    DueDate
Where
    TempFlag = 1
;
Load
    Id,
    [Cr Date],
    [End Date],
    DueDate,
    if(DueDate >= [Cr Date] and DueDate <= [End Date], 1, 0) as TempFlag
Resident
    RawData
;
Drop Table RawData;
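A third variation that may be worth trying, as a sketch only: a While clause can be applied directly to a resident load, so each RawData row is read repeatedly with an increasing IterNo() until the condition fails. That avoids both the for loop and the Cartesian join. The script below assumes that [Cr Date] and [End Date] are interpreted as dates (here forced with DateFormat set to DD/MM/YYYY) and that [Cr Date] is never later than [End Date]; a row where that does not hold produces no DueDate rows.

```qlik
// Assumption: the inline values should be read as day/month/year dates
SET DateFormat='DD/MM/YYYY';

// Same sample data as above
RawData:
Load * Inline [
Id, Cr Date, End Date
1, 12/08/2019, 06/08/2020
2, 05/07/2019, 16/11/2020
];

// For each RawData row, keep emitting records while the generated
// date is still on or before [End Date]; IterNo() starts at 1
DueDates:
Load
    Id,
    Date([Cr Date] + IterNo() - 1, 'DD/MM/YYYY') as DueDate
Resident
    RawData
While
    [Cr Date] + IterNo() - 1 <= [End Date]
;
```

The result has the same shape as in the first approach: RawData and DueDates linked on Id, with one DueDates row per Id per day in the range.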