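If you want to benefit from the parallel processing options, you cannot prevent the rollback of all partitions that were already processed when an error occurs.
One of the preferred ways to solve this kind of problem is to process the partitions in batches: instead of processing all partitions in one operation, you can automate processing every n partitions in parallel. (After several experiments, I found that setting the MaxParallel option to 10 was the best choice on my machine.)
Then, if an error occurs, only the current batch is rolled back.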
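As a rough illustration of the batching idea (it is not part of the SSIS package itself), the sketch below splits a list of partition names into groups of at most n items. The class name, the method name and the partition names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionBatching
{
    // Splits the items into consecutive batches of at most batchSize elements.
    public static List<List<string>> SplitIntoBatches(IEnumerable<string> items, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException("batchSize");

        return items
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index / batchSize)          // 0..n-1 -> batch 0, n..2n-1 -> batch 1, ...
            .Select(g => g.Select(x => x.item).ToList())
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical partition names, for illustration only.
        var partitions = Enumerable.Range(1, 25).Select(i => "Partition_" + i);

        foreach (var batch in SplitIntoBatches(partitions, 10))
        {
            // In the package, each batch becomes one XMLA <Batch> command,
            // so a failure only rolls back the partitions listed here.
            Console.WriteLine(batch.Count + " partitions: " + string.Join(", ", batch));
        }
    }
}
```

With 25 partitions and a batch size of 10, this yields three batches of 10, 10 and 5 partitions.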
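In this answer, I will try to provide a step-by-step guide to automating the batch processing of partitions using SSIS.
Package overview
- Process the dimensions in one batch
- Get the count of unprocessed partitions
- Loop over the partitions (each iteration reads 10 partitions)
- Process Data
- Process Indexes
Package details
Creating variables
First, we have to add some variables that the process needs:
- intCount, intCurrent: used in the For Loop container
- p_Cube: the Cube object ID
- p_Database: the Analysis Services database ID
- p_MaxParallel: the number of partitions to process in one batch
- p_MeasureGroup: the measure group object ID
- p_ServerName: the Analysis Services instance name, in the form
<Machine Name>\<Instance Name>
- strProcessData, strProcessDimensions and strProcessIndexes: used to store the XMLA queries for processing data, dimensions and indexes
All variables whose names start with p_ are required and can be added as parameters.
Adding a connection manager for Analysis Services
After adding the variables, we have to create a connection manager to connect to the SQL Server Analysis Services instance:
- First, configure the connection manager manually.
- Then assign expressions to the server name and initial catalog properties, as shown in the image below:
- Rename the connection manager to ssas.
Processing dimensions
First, add a Sequence Container to isolate the dimension processing within the package, then add a Script Task and an Analysis Services Processing Task to it.
Open the Script Task and select p_Database and p_MaxParallel as ReadOnlyVariables and strProcessDimensions as ReadWriteVariables.
Now open the script editor and use the following code.
The code prepares the XMLA command that processes the dimensions; this XMLA query is then used in the Analysis Services Processing Task.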
#region Namespaces
using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Dts.Runtime;
using System.Linq;
using System.Windows.Forms;
using Microsoft.AnalysisServices;
#endregion
namespace ST_00ad89f595124fa7bee9beb04b6ad3d9
{
[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
public void Main()
{
Server myServer = new Server();
string ConnStr = Dts.Connections["ssas"].ConnectionString;
myServer.Connect(ConnStr);
Database db = myServer.Databases.GetByName(Dts.Variables["p_Database"].Value.ToString());
int maxparallel = (int)Dts.Variables["p_MaxParallel"].Value;
var dimensions = db.Dimensions;
string strData;
strData = "<Batch xmlns=\"http://schemas.microsoft.com/analysisservices/2003/engine\"> \r\n <Parallel MaxParallel=\"" + maxparallel.ToString() + "\"> \r\n";
foreach (Dimension dim in dimensions)
{
strData +=
" <Process xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:ddl2=\"http://schemas.microsoft.com/analysisservices/2003/engine/2\" xmlns:ddl2_2=\"http://schemas.microsoft.com/analysisservices/2003/engine/2/2\" xmlns:ddl100_100=\"http://schemas.microsoft.com/analysisservices/2008/engine/100/100\" xmlns:ddl200=\"http://schemas.microsoft.com/analysisservices/2010/engine/200\" xmlns:ddl200_200=\"http://schemas.microsoft.com/analysisservices/2010/engine/200/200\" xmlns:ddl300=\"http://schemas.microsoft.com/analysisservices/2011/engine/300\" xmlns:ddl300_300=\"http://schemas.microsoft.com/analysisservices/2011/engine/300/300\" xmlns:ddl400=\"http://schemas.microsoft.com/analysisservices/2012/engine/400\" xmlns:ddl400_400=\"http://schemas.microsoft.com/analysisservices/2012/engine/400/400\"> \r\n" +
" <Object> \r\n" +
" <DatabaseID>" + db.ID + "</DatabaseID> \r\n" +
" <DimensionID>" + dim.ID + "</DimensionID> \r\n" +
" </Object> \r\n" +
" <Type>ProcessFull</Type> \r\n" +
" <WriteBackTableCreation>UseExisting</WriteBackTableCreation> \r\n" +
" </Process> \r\n";
}
strData += " </Parallel> \r\n</Batch>";
Dts.Variables["strProcessDimensions"].Value = strData;
// Close the AMO connection before the task finishes
myServer.Disconnect();
Dts.TaskResult = (int)ScriptResults.Success;
}
#region ScriptResults declaration
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
}
}
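The script above builds the XMLA by string concatenation, which works as long as the object IDs contain no characters that need XML escaping. As a possible alternative, here is a minimal sketch that builds an equivalent batch with System.Xml.Linq. XmlaBuilder and BuildDimensionBatch are hypothetical names, the IDs are invented, and the extra xmlns prefix declarations on the Process element are left out on the assumption that they are optional because none of those prefixes is actually used.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class XmlaBuilder
{
    // Namespace of the Batch element, as used in the scripts of this answer.
    private static readonly XNamespace Engine =
        "http://schemas.microsoft.com/analysisservices/2003/engine";

    // Builds a <Batch><Parallel>...</Parallel></Batch> command with one
    // ProcessFull <Process> element per dimension ID.
    public static string BuildDimensionBatch(string databaseId, IEnumerable<string> dimensionIds, int maxParallel)
    {
        var parallel = new XElement(Engine + "Parallel",
            new XAttribute("MaxParallel", maxParallel));

        foreach (string dimensionId in dimensionIds)
        {
            parallel.Add(
                new XElement(Engine + "Process",
                    new XElement(Engine + "Object",
                        new XElement(Engine + "DatabaseID", databaseId),
                        new XElement(Engine + "DimensionID", dimensionId)),
                    new XElement(Engine + "Type", "ProcessFull"),
                    new XElement(Engine + "WriteBackTableCreation", "UseExisting")));
        }

        // XElement escapes special characters in the values when serializing.
        return new XElement(Engine + "Batch", parallel).ToString();
    }

    public static void Main()
    {
        // Hypothetical IDs; "R&D" shows that special characters get escaped.
        Console.WriteLine(BuildDimensionBatch("SalesDB", new[] { "Dim Date", "Dim R&D" }, 10));
    }
}
```

In the Script Task, the returned string could be assigned to strProcessDimensions in the same way as strData is above.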
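Now open the Analysis Services Processing Task and define any task manually (this is only needed so that the task validates), then go to Expressions and assign the strProcessDimensions variable to the ProcessingCommands property.
Getting the unprocessed partition count
To loop over the partitions in chunks, we first have to get the count of unprocessed partitions. To do that, add a Script Task. Select p_Cube, p_Database, p_MeasureGroup and p_ServerName as ReadOnlyVariables and intCount as ReadWriteVariables.
In the script editor, write the following script: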
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using Microsoft.AnalysisServices;
using System.Linq;
#endregion
namespace ST_e3da217e491640eca297900d57f46a85
{
[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
public void Main()
{
// Connect to the SSAS instance through the "ssas" connection manager
Server myServer = new Server();
string ConnStr = Dts.Connections["ssas"].ConnectionString;
myServer.Connect(ConnStr);
Database db = myServer.Databases.GetByName(Dts.Variables["p_Database"].Value.ToString());
Cube objCube = db.Cubes.FindByName(Dts.Variables["p_Cube"].Value.ToString());
MeasureGroup objMeasureGroup = objCube.MeasureGroups[Dts.Variables["p_MeasureGroup"].Value.ToString()];
Dts.Variables["intCount"].Value = objMeasureGroup.Partitions.Cast<Partition>().Where(x => x.State != AnalysisState.Processed).Count();
// Close the AMO connection before the task finishes
myServer.Disconnect();
Dts.TaskResult = (int)ScriptResults.Success;
}
#region ScriptResults declaration
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
}
}
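Processing partitions in chunks
The last step is to create a For Loop container and configure it as shown in the image below:
- InitExpression: @intCurrent = 0
- EvalExpression: @intCurrent < @intCount
- AssignExpression: @intCurrent = @intCurrent + @p_MaxParallel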
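To see how many batches this loop produces, the following sketch mimics the container in plain C#, assuming the loop keeps running while intCurrent is smaller than intCount and advances by p_MaxParallel on each pass. ForLoopSketch and CountIterations are illustrative names only.

```csharp
using System;

public static class ForLoopSketch
{
    // Mirrors the For Loop container and returns how many batches would run.
    public static int CountIterations(int intCount, int maxParallel)
    {
        if (maxParallel <= 0)
            throw new ArgumentOutOfRangeException("maxParallel");

        int iterations = 0;
        // InitExpression; EvalExpression; AssignExpression
        for (int intCurrent = 0; intCurrent < intCount; intCurrent += maxParallel)
        {
            iterations++;
        }
        return iterations;
    }

    public static void Main()
    {
        Console.WriteLine(CountIterations(25, 10)); // intCurrent = 0, 10, 20 -> prints 3
        Console.WriteLine(CountIterations(10, 10)); // intCurrent = 0 -> prints 1
        Console.WriteLine(CountIterations(0, 10));  // nothing to process -> prints 0
    }
}
```

With 25 unprocessed partitions and p_MaxParallel set to 10, the loop runs three times, and the last pass only has 5 partitions left to process.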
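Inside the For Loop container, add a Script Task to prepare the XMLA queries and two Analysis Services Processing Tasks, as shown in the image below:
In the Script Task, select p_Cube, p_Database, p_MaxParallel and p_MeasureGroup as ReadOnlyVariables, and strProcessData and strProcessIndexes as ReadWriteVariables.
In the script editor, write the following script.
The script prepares the XMLA commands needed to process the partitions' data and indexes separately.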
#region Namespaces
using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Dts.Runtime;
using System.Linq;
using System.Windows.Forms;
using Microsoft.AnalysisServices;
#endregion
namespace ST_00ad89f595124fa7bee9beb04b6ad3d9
{
[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
public void Main()
{
Server myServer = new Server();
string ConnStr = Dts.Connections["ssas"].ConnectionString;
myServer.Connect(ConnStr);
Database db = myServer.Databases.GetByName(Dts.Variables["p_Database"].Value.ToString());
Cube objCube = db.Cubes.FindByName(Dts.Variables["p_Cube"].Value.ToString());
MeasureGroup objMeasureGroup = objCube.MeasureGroups[Dts.Variables["p_MeasureGroup"].Value.ToString()];
int maxparallel = (int)Dts.Variables["p_MaxParallel"].Value;
int intcount = objMeasureGroup.Partitions.Cast<Partition>().Where(x => x.State != AnalysisState.Processed).Count();
if (intcount > maxparallel)
{
intcount = maxparallel;
}
var partitions = objMeasureGroup.Partitions.Cast<Partition>().Where(x => x.State != AnalysisState.Processed).OrderBy(y => y.Name).Take(intcount);
string strData, strIndexes;
strData = "<Batch xmlns=\"http://schemas.microsoft.com/analysisservices/2003/engine\"> \r\n <Parallel MaxParallel=\"" + maxparallel.ToString() + "\"> \r\n";
strIndexes = "<Batch xmlns=\"http://schemas.microsoft.com/analysisservices/2003/engine\"> \r\n <Parallel MaxParallel=\"" + maxparallel.ToString() + "\"> \r\n";
foreach (Partition prt in partitions)
{
strData +=
" <Process xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:ddl2=\"http://schemas.microsoft.com/analysisservices/2003/engine/2\" xmlns:ddl2_2=\"http://schemas.microsoft.com/analysisservices/2003/engine/2/2\" xmlns:ddl100_100=\"http://schemas.microsoft.com/analysisservices/2008/engine/100/100\" xmlns:ddl200=\"http://schemas.microsoft.com/analysisservices/2010/engine/200\" xmlns:ddl200_200=\"http://schemas.microsoft.com/analysisservices/2010/engine/200/200\" xmlns:ddl300=\"http://schemas.microsoft.com/analysisservices/2011/engine/300\" xmlns:ddl300_300=\"http://schemas.microsoft.com/analysisservices/2011/engine/300/300\" xmlns:ddl400=\"http://schemas.microsoft.com/analysisservices/2012/engine/400\" xmlns:ddl400_400=\"http://schemas.microsoft.com/analysisservices/2012/engine/400/400\"> \r\n " +
" <Object> \r\n " +
" <DatabaseID>" + db.ID + "</DatabaseID> \r\n " +
" <CubeID>" + objCube.ID + "</CubeID> \r\n " +
" <MeasureGroupID>" + objMeasureGroup.ID + "</MeasureGroupID> \r\n " +
" <PartitionID>" + prt.ID + "</PartitionID> \r\n " +
" </Object> \r\n " +
" <Type>ProcessData</Type> \r\n " +
" <WriteBackTableCreation>UseExisting</WriteBackTableCreation> \r\n " +
" </Process> \r\n";
strIndexes +=
" <Process xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:ddl2=\"http://schemas.microsoft.com/analysisservices/2003/engine/2\" xmlns:ddl2_2=\"http://schemas.microsoft.com/analysisservices/2003/engine/2/2\" xmlns:ddl100_100=\"http://schemas.microsoft.com/analysisservices/2008/engine/100/100\" xmlns:ddl200=\"http://schemas.microsoft.com/analysisservices/2010/engine/200\" xmlns:ddl200_200=\"http://schemas.microsoft.com/analysisservices/2010/engine/200/200\" xmlns:ddl300=\"http://schemas.microsoft.com/analysisservices/2011/engine/300\" xmlns:ddl300_300=\"http://schemas.microsoft.com/analysisservices/2011/engine/300/300\" xmlns:ddl400=\"http://schemas.microsoft.com/analysisservices/2012/engine/400\" xmlns:ddl400_400=\"http://schemas.microsoft.com/analysisservices/2012/engine/400/400\"> \r\n " +
" <Object> \r\n " +
" <DatabaseID>" + db.ID + "</DatabaseID> \r\n " +
" <CubeID>" + objCube.ID + "</CubeID> \r\n " +
" <MeasureGroupID>" + objMeasureGroup.ID + "</MeasureGroupID> \r\n " +
" <PartitionID>" + prt.ID + "</PartitionID> \r\n " +
" </Object> \r\n " +
" <Type>ProcessIndexes</Type> \r\n " +
" <WriteBackTableCreation>UseExisting</WriteBackTableCreation> \r\n " +
" </Process> \r\n";
}
strData += " </Parallel> \r\n</Batch>";
strIndexes += " </Parallel> \r\n</Batch>";
Dts.Variables["strProcessData"].Value = strData;
Dts.Variables["strProcessIndexes"].Value = strIndexes;
// Close the AMO connection before the task finishes
myServer.Disconnect();
Dts.TaskResult = (int)ScriptResults.Success;
}
#region ScriptResults declaration
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
}
}
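Now open both Analysis Services Processing Tasks and define any task manually (just so that the tasks validate). Then go to Expressions and assign the strProcessData variable to the ProcessingCommands property of the first task, and the strProcessIndexes variable to the ProcessingCommands property of the second task.
Now you can execute the package; if an error occurs, only the current batch (10 partitions) is rolled back.
Possible improvements
You can add some logging tasks to track the package's progress, especially when processing a large number of partitions.
Since it contains helpful details, I posted this answer on my personal blog:
I also published a more detailed article on SQLShack: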