First of all, my solution is based on @DrHouseofSQL's and @Bhouse's answers, so read @DrHouseofSQL's answer first, then @BHouse's answer, and then continue with this one.
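Problem
Note: the sheet name will be dynamic and the column positions may change (e.g. the column "ABC" may be in the first row, or the second row, or ...)
This case is a little complicated, but it can be solved with the following workaround: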
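Solution overview
- Add a Script Task before the Data Flow Task that imports the data
- Use the Script Task to open the Excel file and get the worksheet name and the header row
- Build the query and store it in a variable
- In the Data Flow Task that follows, use the query stored above as the source (note that you must set the Delay Validation property to True)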
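The worksheet selection done by the Script Task can be illustrated outside SSIS. The following is a minimal Python sketch of the same filter, not SSIS code. It assumes the input is the list of TABLE_NAME values from the OLE DB schema table, in the order the provider returns them (alphabetical, not tab order). The function name pick_first_sheet is hypothetical.

```python
def pick_first_sheet(table_names):
    """Return the first usable worksheet name, or None if there is none.

    table_names: TABLE_NAME values as returned by the OLE DB schema table
    (assumed input, e.g. "Sheet1$", "'My Sheet$'", "Sheet1$_xlnm#_FilterDatabase").
    """
    for raw in table_names:
        # Names that contain spaces come back wrapped in single quotes
        name = raw.strip("'")
        # Real worksheets end with "$"; filter databases, named ranges and
        # deleted/temporary entries (ending with "_") do not
        if name.endswith("$"):
            return name
    return None
```

For example, pick_first_sheet(["Sheet1$_xlnm#_FilterDatabase", "Sheet1$"]) skips the filter entry and returns "Sheet1$".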
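Solution details
- First, create an SSIS variable of type String (e.g. @[User::strQuery])
- Add another variable that holds the Excel file path (e.g. @[User::ExcelFilePath])
- Add a Script Task and, in the Script Task editor, select @[User::strQuery] as a ReadWrite variable and @[User::ExcelFilePath] as a ReadOnly variable
- Set the script language to VB.Net and write the following script in the script editor window:
Note: you must import System.Data.OleDb.
In the code below, we search the first 15 rows of the Excel sheet for the header. If the header may appear after row 15, increase that number. I also assume the columns span A to I.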
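' ScriptMain.vb: these Imports go at the top of the file
Imports System.Data
Imports System.Data.OleDb

' --- Members of Public Class ScriptMain ---
Private m_strExcelPath As String = String.Empty
Private m_strExcelConnectionString As String = String.Empty
Private m_dtschemaTable As DataTable

Public Sub Main()
    m_strExcelPath = Dts.Variables.Item("ExcelFilePath").Value.ToString
    Dim strSheetname As String = String.Empty
    Dim intFirstRow As Integer = 0
    m_strExcelConnectionString = Me.BuildConnectionString()
    Try
        Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)
            If OleDBCon.State <> ConnectionState.Open Then
                OleDBCon.Open()
            End If
            ' Get all worksheets (OLE DB lists them alphabetically, not in tab order)
            m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
                                  New Object() {Nothing, Nothing, Nothing, "TABLE"})
            ' Loop over the worksheets to get the first real one
            ' (the workbook may contain temporary or deleted sheets)
            For Each schRow As DataRow In m_dtschemaTable.Rows
                ' Names containing spaces come back quoted, e.g. 'My Sheet$'
                strSheetname = schRow("TABLE_NAME").ToString.Trim("'"c)
                If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then
                    Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "A1:I15]", OleDBCon)
                        Dim dtTable As New DataTable("Table1")
                        cmd.CommandType = CommandType.Text
                        Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
                            daGetDataFromSheet.Fill(dtTable)
                            ' Scan only the rows actually returned (at most 15)
                            For intCount As Integer = 0 To dtTable.Rows.Count - 1
                                If Not String.IsNullOrEmpty(dtTable.Rows(intCount)(0).ToString) Then
                                    ' +1 because the DataTable is zero-based, +1 because the data
                                    ' starts on the row after the header (requires HDR=NO)
                                    intFirstRow = intCount + 2
                                    ' The first non-empty row is the header: stop here
                                    Exit For
                                End If
                            Next
                        End Using
                        If intFirstRow = 0 Then Throw New Exception("header not found")
                    End Using
                    ' When the first valid sheet is found there is no need to check the others
                    Exit For
                End If
            Next
            OleDBCon.Close()
        End Using
    Catch ex As Exception
        Throw New Exception(ex.Message, ex)
    End Try
    Dts.Variables.Item("strQuery").Value = "SELECT * FROM [" & strSheetname & "A" & intFirstRow.ToString & ":I]"
    Dts.TaskResult = ScriptResults.Success
End Sub

' Not shown in the original answer: an assumed implementation using the ACE provider for .xlsx.
' HDR=NO keeps the header as a data row; IMEX=1 reads mixed-type columns as text.
Private Function BuildConnectionString() As String
    Return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & m_strExcelPath &
           ";Extended Properties=""Excel 12.0 Xml;HDR=NO;IMEX=1"""
End Function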
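The header search described above amounts to one rule: find the first row whose column A is not empty, then start reading on the next row. Here is a minimal Python sketch of that rule. It assumes the scanned rows (A1:I15) are passed in as lists of cell values read without a header row (HDR=NO), so rows[0] is Excel row 1 and empty cells are None or "". The name find_first_data_row is hypothetical.

```python
def find_first_data_row(rows):
    """Return the 1-based Excel row where the data starts, or 0 if no header is found."""
    for index, row in enumerate(rows):
        first_cell = row[0] if row else None
        if first_cell not in (None, ""):
            # +1: zero-based index -> Excel row number; +1: data starts after the header
            return index + 2
    return 0
```

Returning at the first match matters. If the loop kept going, it would land on the last non-empty row among the scanned rows, which is a data row whenever data starts within the first 15 rows.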
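- Next, add an Excel Connection Manager and select the Excel file you want to import (a sample file is enough; it is only used to define the metadata)
- Assign the default value Select * from [Sheet1$A2:I] to the variable @[User::strQuery]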
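The default value follows the same pattern the Script Task writes at run time: the sheet name, then an open-ended range that starts at the first data row. The Python sketch below assumes columns A to I, as in the script, and uses a hypothetical helper named build_query. With the header in row 1 (data from row 2) it reproduces the default above.

```python
def build_query(sheet_name, first_data_row, last_column="I"):
    """Build the query stored in @[User::strQuery].

    The range is open-ended ("A<n>:I"), so every row from first_data_row down is read.
    """
    if first_data_row < 1:
        raise ValueError("first_data_row must be a 1-based Excel row number")
    return f"SELECT * FROM [{sheet_name}A{first_data_row}:{last_column}]"
```

For instance, build_query("Sheet1$", 2) gives SELECT * FROM [Sheet1$A2:I], the same range as the default value.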
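- Add an Excel Source to the Data Flow Task, set its data access mode to SQL command from variable, and select @[User::strQuery]
- Go to the Columns tab and name the columns as @BHouse suggested
(Image taken from @BHouse's answer)
- Set the Data Flow Task's Delay Validation property to True
- Add the other components to the Data Flow Task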
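Update 1:
From the OP's comments: "sometimes an Excel file with no data will come (i.e. it has only the header row, no data)... in that case the entire task fails"
Solution:
If your Excel file contains no data (only the header), follow these steps:
- Add an SSIS variable of type Boolean (e.g. @[User::ImportFile])
- Add @[User::ImportFile] to the Script Task's ReadWrite variables
- In the Script Task, check whether the file contains data rows
- If it does, set @[User::ImportFile] = True; otherwise set @[User::ImportFile] = False
- Double-click the arrow (precedence constraint) that connects the Script Task to the Data Flow Task
- Set its evaluation operation to Expression and Constraint
- Enter the following expression: @[User::ImportFile] == True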
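With Expression and Constraint, SSIS runs the downstream task only when both parts hold: the constraint value (assumed here to be Success) and the expression. The following is a minimal Python model of that rule, for illustration only. constraint_allows is a hypothetical name.

```python
def constraint_allows(task_succeeded, import_file, expected):
    """Model an 'Expression and Constraint' precedence constraint.

    task_succeeded: outcome of the Script Task (constraint value assumed to be Success)
    import_file:    current value of @[User::ImportFile]
    expected:       right-hand side of the expression, e.g. True in
                    @[User::ImportFile] == True
    """
    return task_succeeded and import_file == expected
```

If the Script Task itself fails, neither branch runs, whatever the variable holds.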
Note: the new Script Task code is:
' Uses the same ScriptMain members as the first script (m_strExcelPath,
' m_strExcelConnectionString, m_dtschemaTable and BuildConnectionString()); only Main changes.
Public Sub Main()
    m_strExcelPath = Dts.Variables.Item("ExcelFilePath").Value.ToString
    Dim strSheetname As String = String.Empty
    Dim intFirstRow As Integer = 0
    ' Declared outside the Using blocks so it is still in scope for the ImportFile check below
    Dim dtTable As New DataTable("Table1")
    m_strExcelConnectionString = Me.BuildConnectionString()
    Try
        Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)
            If OleDBCon.State <> ConnectionState.Open Then
                OleDBCon.Open()
            End If
            ' Get all worksheets (OLE DB lists them alphabetically, not in tab order)
            m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
                                  New Object() {Nothing, Nothing, Nothing, "TABLE"})
            ' Loop over the worksheets to get the first real one
            ' (the workbook may contain temporary or deleted sheets)
            For Each schRow As DataRow In m_dtschemaTable.Rows
                ' Names containing spaces come back quoted, e.g. 'My Sheet$'
                strSheetname = schRow("TABLE_NAME").ToString.Trim("'"c)
                If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then
                    Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "A1:I15]", OleDBCon)
                        cmd.CommandType = CommandType.Text
                        Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
                            daGetDataFromSheet.Fill(dtTable)
                            For intCount As Integer = 0 To dtTable.Rows.Count - 1
                                If Not String.IsNullOrEmpty(dtTable.Rows(intCount)(0).ToString) Then
                                    ' +1 because the DataTable is zero-based, +1 because the data
                                    ' starts on the row after the header
                                    intFirstRow = intCount + 2
                                    Exit For
                                End If
                            Next
                        End Using
                    End Using
                    ' When the first valid sheet is found there is no need to check the others
                    Exit For
                End If
            Next
            OleDBCon.Close()
        End Using
    Catch ex As Exception
        Throw New Exception(ex.Message, ex)
    End Try
    ' Import only if a header was found and the next row has a value in column A
    If intFirstRow = 0 OrElse
       intFirstRow > dtTable.Rows.Count OrElse
       String.IsNullOrEmpty(dtTable.Rows(intFirstRow - 1)(0).ToString) Then
        Dts.Variables.Item("ImportFile").Value = False
    Else
        Dts.Variables.Item("ImportFile").Value = True
    End If
    Dts.Variables.Item("strQuery").Value = "SELECT * FROM [" & strSheetname & "A" & intFirstRow.ToString & ":I]"
    Dts.TaskResult = ScriptResults.Success
End Sub
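The ImportFile decision can be expressed as a small pure function. This Python sketch assumes the same inputs as the script: the rows read from A1:I15 with HDR=NO, and the 1-based first data row (0 when no header was found). has_data is a hypothetical name. Like the script, it checks column A only.

```python
def has_data(rows, first_data_row):
    """Return True only when a non-empty row (column A) follows the header."""
    # No header found, or the header is the last row that was read
    if first_data_row == 0 or first_data_row > len(rows):
        return False
    row = rows[first_data_row - 1]  # Excel row first_data_row -> zero-based index
    first_cell = row[0] if row else None
    return first_cell not in (None, "")
```

Checking the cell as well as the row count guards against the provider returning empty rows inside the requested range.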
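Update 2:
From the OP's comments: "is there any other workaround available to process the data flow task without skipping the whole data flow task? Actually, one of the tasks logs the file name, data count, etc., which are missing here"
Solution:
- Add another Data Flow Task
- Connect this Data Flow Task to the Script Task with a second connector, using the expression @[User::ImportFile] == False (same steps as for the first connector)
- Add a Script Component as a source in the Data Flow Task
- Create the output columns you want to write to the log
- Create a row containing the information you need to log
- Add the log destination
Alternatively, instead of adding another Data Flow Task, you can add an Execute SQL Task that inserts a row into the log table.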
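For the skipped-file branch, the Script Component source only has to emit one row. The sketch below shows, in Python, the kind of record it might produce. The column names file_name, row_count and imported, as well as the LogRow and build_skip_log_row names, are hypothetical; match them to your own log table.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass
class LogRow:
    file_name: str   # file name only, without the folder
    row_count: int   # number of data rows imported
    imported: bool   # False for header-only files


def build_skip_log_row(excel_file_path):
    """Build the single log row emitted when @[User::ImportFile] is False."""
    # PureWindowsPath splits on backslashes even when this runs on Linux
    return LogRow(file_name=PureWindowsPath(excel_file_path).name,
                  row_count=0,
                  imported=False)
```

With the Execute SQL Task alternative, the same three values would become parameters of an INSERT statement instead.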