【Question title】: Import multiple delimited text files into a SQL Server database and automatically create tables
【Posted】: 2017-03-25 14:18:01
【Question】:

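I have multiple delimited text files (e.g. .csv files), each containing columns, rows and a header.

I want to import all of these input files into SQL Server as easily as possible. Specifically, I want the output tables into which the files are imported to be created on the fly.

Some of these input files need to be imported into one and the same output table, while others need to be imported into different tables. You can assume that all files imported into the same table have the same header.

SQL Server Management Studio has an Import Wizard which lets you import delimited text files (and other formats) and automatically creates the output table. However, it does not let you import multiple files at the same time. Moreover, it requires a lot of manual work and is not reproducible.

Many scripts that import multiple text files into a table can be found online. However, most of them require the output table to be created first, which also means extra work per table.

Is there a way to list all the relevant input files and their corresponding output tables, so that the tables are created automatically and the data is then imported?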

【Comments】:

    Tags: sql sql-server csv import create-table


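    【Solution 1】:

    This script lets you import multiple delimited text files into a SQL database. The tables into which the data is imported, including all required columns, are created automatically. The script includes some documentation.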

    /*
    **  This file was created by Laurens Bogaardt, Advisor Data Analytics at EY Amsterdam on 2016-11-03.
    **  This script allows you to import multiple delimited text files into a SQL database. The tables 
    **  into which the data is imported, including all required columns, are created automatically. This 
    **  script uses tab-delimited (tsv) files and SQL Server Management Studio. The script may need some 
    **  minor adjustments for other formats and tools. The script makes several assumptions which need 
    **  to be valid before it can run properly. First of all, it assumes none of the output tables exist 
    **  in the SQL tool before starting. Therefore, it may be necessary to clean the database and delete 
    **  all the existing tables. Secondly, the script assumes that, if multiple text files are imported 
    **  into the same output table, the number and order of the columns of these files is identical. If 
    **  this is not the case, some manual work may need to be done to the text files before importing.
    **  Finally, please note that this script only imports data as strings (to be precise, as NVARCHAR's
    **  of length 255). It does not allow you to specify the datatype per column. This would need to be 
    **  done using another script after importing the data as strings.
    */
    
    -- 1.   Import Multiple Delimited Text Files into a SQL Database
    
    -- 1.1  Define the path to the input and define the terminators
    
    /*
    **  In this section, some initial parameters are set. Obviously, the 'DatabaseName' refers to the 
    **  database in which you want to create new tables. The '@Path' parameter sets the folder in 
    **  which the text files are located which you want to import. Delimited files are defined by 
    **  two characters: one which separates columns and one which separates rows. Usually, the 
    **  row-terminator is the newline character CHAR(10), also given by '\n'. When files are created 
    **  in Windows, the row-terminator often includes a carriage return CHAR(13), also given by '\r\n'. 
    **  Often, a tab is used to separate each column. This is given by CHAR(9) or by the character '\t'. 
    **  Other useful characters include the comma CHAR(44), the semi-colon CHAR(59) and the pipe 
    **  CHAR(124).
    */
    
    USE [DatabaseName]
    DECLARE @Path NVARCHAR(255) = 'C:\PathToFiles\'
    DECLARE @RowTerminator NVARCHAR(5) = CHAR(13) + CHAR(10)
    DECLARE @ColumnTerminator NVARCHAR(5) = CHAR(9)
    
    -- 1.2  Define the list of input and output in a temporary table
    
    /*
    **  In this section, a temporary table is created which lists all the filenames of the delimited 
    **  files which need to be imported, as well as the names of the tables which are created and into 
    **  which the data is imported. Multiple files may be imported into the same output table. Each row 
    **  is prepended with an integer which increments up starting from 1. It is essential that this 
    **  number follows this logic. The temporary table is deleted at the end of this script.
    */
    
    IF OBJECT_ID('[dbo].[Files_Temporary]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Files_Temporary];
    CREATE TABLE [dbo].[Files_Temporary]
    (
        [ID] INT
        , [FileName] NVARCHAR(255)
        , [TableName] NVARCHAR(255)
    );
    
    INSERT INTO [dbo].[Files_Temporary] SELECT 1,   'MyFileA.txt',  'NewTable1'
    INSERT INTO [dbo].[Files_Temporary] SELECT 2,   'MyFileB.txt',  'NewTable2'
    INSERT INTO [dbo].[Files_Temporary] SELECT 3,   'MyFileC.tsv',  'NewTable2'
    INSERT INTO [dbo].[Files_Temporary] SELECT 4,   'MyFileD.csv',  'NewTable2'
    INSERT INTO [dbo].[Files_Temporary] SELECT 5,   'MyFileE.dat',  'NewTable2'
    INSERT INTO [dbo].[Files_Temporary] SELECT 6,   'MyFileF',      'NewTable3'
    INSERT INTO [dbo].[Files_Temporary] SELECT 7,   'MyFileG.text', 'NewTable4'
    INSERT INTO [dbo].[Files_Temporary] SELECT 8,   'MyFileH.txt',  'NewTable5'
    INSERT INTO [dbo].[Files_Temporary] SELECT 9,   'MyFileI.txt',  'NewTable5'
    INSERT INTO [dbo].[Files_Temporary] SELECT 10,  'MyFileJ.txt',  'NewTable5'
    INSERT INTO [dbo].[Files_Temporary] SELECT 11,  'MyFileK.txt',  'NewTable6'
    
    -- 1.3  Loop over the list of input and output and import each file to the correct table
    
    /*
    **  In this section, the 'WHILE' statement is used to loop over all input files. A counter is defined 
    **  which starts at '1' and increments with each iteration. The filename and tablename are retrieved 
    **  from the previously defined temporary table. The next step of the script is to check whether the 
    **  output table already exists or not.
    */
    
    DECLARE @Counter INT = 1
    
    WHILE @Counter <= (SELECT COUNT(*) FROM [dbo].[Files_Temporary])
    BEGIN
        PRINT 'Counter is ''' + CONVERT(NVARCHAR(5), @Counter) + '''.'
    
        DECLARE @FileName NVARCHAR(255)
        DECLARE @TableName NVARCHAR(255)
        DECLARE @Header NVARCHAR(MAX)
        DECLARE @SQL_Header NVARCHAR(MAX)
        DECLARE @CreateHeader NVARCHAR(MAX) = ''
        DECLARE @SQL_CreateHeader NVARCHAR(MAX)
    
        SELECT @FileName = [FileName], @TableName = [TableName] FROM [dbo].[Files_Temporary] WHERE [ID] = @Counter
    
        IF OBJECT_ID('[dbo].[' + @TableName + ']', 'U') IS NULL
        BEGIN
    /*
    **  If the output table does not yet exist, it needs to be created. This requires the list of all 
    **  columnnames for that table to be retrieved from the first line of the text file, which includes 
    **  the header. A piece of SQL code is generated and executed which imports the header of the text 
    **  file. A second temporary table is created which stores this header as a single string.
    */
            PRINT 'Creating new table with name ''' + @TableName + '''.'
    
            IF OBJECT_ID('[dbo].[Header_Temporary]', 'U') IS NOT NULL
            DROP TABLE [dbo].[Header_Temporary];
            CREATE TABLE [dbo].[Header_Temporary]
            (
                [Header] NVARCHAR(MAX)
            );
    
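            -- The row terminator is deliberately used as the field terminator as well,
            -- so that the whole header line lands in the single [Header] column.
            SET @SQL_Header = '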
                BULK INSERT [dbo].[Header_Temporary]
                FROM ''' + @Path + @FileName + '''
                WITH
                (
                    FIRSTROW = 1,
                    LASTROW = 1,
                    MAXERRORS = 0,
                    FIELDTERMINATOR = ''' + @RowTerminator + ''',
                    ROWTERMINATOR = ''' + @RowTerminator + '''
                )'
            EXEC(@SQL_Header)
    
            SET @Header = (SELECT TOP 1 [Header] FROM [dbo].[Header_Temporary])
            PRINT 'Extracted header ''' + @Header + ''' for table ''' + @TableName + '''.'
    /*
    **  The columnnames in the header are separated using the column-terminator. This can be used to loop 
    **  over each columnname. A new piece of SQL code is generated which will create the output table 
    **  with the correctly named columns.
    */
            WHILE CHARINDEX(@ColumnTerminator, @Header) > 0
            BEGIN          
                SET @CreateHeader = @CreateHeader + '[' + LTRIM(RTRIM(SUBSTRING(@Header, 1, CHARINDEX(@ColumnTerminator, @Header) - 1))) + '] NVARCHAR(255), '
                SET @Header = SUBSTRING(@Header, CHARINDEX(@ColumnTerminator, @Header) + 1, LEN(@Header)) 
            END
            SET @CreateHeader = @CreateHeader + '[' + @Header + '] NVARCHAR(255)'
    
            SET @SQL_CreateHeader = 'CREATE TABLE [dbo].[' + @TableName + '] (' + @CreateHeader + ')'
            EXEC(@SQL_CreateHeader)
        END
    
    /*
    **  Finally, the data from the text file is imported into the newly created table. The first line, 
    **  including the header information, is skipped. If multiple text files are imported into the same 
    **  output table, it is essential that the number and the order of the columns is identical, as the 
    **  table will only be created once, using the header information of the first text file.
    */
        PRINT 'Inserting data from ''' + @FileName + ''' to ''' + @TableName + '''.'
        DECLARE @SQL NVARCHAR(MAX)
        SET @SQL = '
            BULK INSERT [dbo].[' + @TableName + ']
            FROM ''' + @Path + @FileName + '''
            WITH
            (
                FIRSTROW = 2,
                MAXERRORS = 0,
                FIELDTERMINATOR = ''' + @ColumnTerminator + ''',
                ROWTERMINATOR = ''' + @RowTerminator + '''
            )'
        EXEC(@SQL)
    
        SET @Counter = @Counter + 1
    END;
    
    -- 1.4  Cleanup temporary tables
    
    /*
    **  In this section, the temporary tables which were created and used by this script are deleted. 
    **  Alternatively, the script could have used a 'real' temporary table (identified by the '#' character 
    **  in front of the name) or a table variable. These would have deleted themselves once they were no 
    **  longer in use. However, the end result is the same.
    */
    
    IF OBJECT_ID('[dbo].[Files_Temporary]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Files_Temporary];
    
    IF OBJECT_ID('[dbo].[Header_Temporary]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Header_Temporary];
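
The script above loads every column as NVARCHAR(255) and leaves typing to a follow-up step. A minimal sketch of what that step could look like is shown below. The table `[dbo].[NewTable1]` and the column `[Amount]` are hypothetical, and `TRY_CONVERT` assumes SQL Server 2012 or later.

```sql
-- Hypothetical example: [Amount] was imported as NVARCHAR(255)
-- and should become DECIMAL(18, 2).

-- 1. List values that would not survive the conversion.
SELECT [Amount]
FROM [dbo].[NewTable1]
WHERE [Amount] IS NOT NULL
  AND TRY_CONVERT(DECIMAL(18, 2), [Amount]) IS NULL;

-- 2. Once the query above returns no rows, change the column type in place.
ALTER TABLE [dbo].[NewTable1]
ALTER COLUMN [Amount] DECIMAL(18, 2) NULL;
```

Checking first matters because `ALTER COLUMN` fails as a whole if even one value cannot be converted. Values such as empty strings may show up in the first query and would need to be cleaned or set to NULL beforehand.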
    

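    【Comments】:

    • This is a very well-written answer. One small tweak I would suggest: instead of inserting the file and table names manually, the table can be populated automatically with the following statements. ------ insert file names ------ insert into Files_Temporary (filename) exec master..xp_cmdshell 'dir <path> /b /a-d' ------ update table names, eliminating the file extension ------ update Files_Temporary set [TableName] = SUBSTRING(filename, 0, CHARINDEX('.', filename))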
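
One caveat with the `SUBSTRING(filename, 0, CHARINDEX('.', filename))` expression: it cuts at the first dot and yields an empty name when a file has no extension (such as `MyFileF` in Solution 1). In addition, `xp_cmdshell` typically returns a trailing NULL row. A sketch of a more defensive variant follows, using a table variable with made-up file names in place of real `dir` output:

```sql
-- Stand-in for the output of xp_cmdshell 'dir ... /b /a-d' (file names are invented).
DECLARE @Files TABLE ([FileName] NVARCHAR(255));
INSERT INTO @Files ([FileName])
VALUES (N'MyFileA.txt'), (N'MyFileF'), (N'archive.2017.csv'), (NULL);

SELECT
    [FileName],
    CASE
        -- Strip only the last extension: find the last dot via REVERSE.
        WHEN CHARINDEX('.', [FileName]) > 0
            THEN LEFT([FileName], LEN([FileName]) - CHARINDEX('.', REVERSE([FileName])))
        -- No dot at all: use the file name itself as the table name.
        ELSE [FileName]
    END AS [TableName]
FROM @Files
WHERE [FileName] IS NOT NULL;   -- drop the trailing NULL row
```

Note that this strips the last extension rather than everything after the first dot, so `archive.2017.csv` maps to `archive.2017`. Whether that is the desired table name is a design choice.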
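    【Solution 2】:

    Inserting a whole folder of delimited .txt or .csv files into SQL Server

    Note: do not be put off by the length of the script below. Only three variables (plus the database name) need to be changed, and the whole script should then work as is.

    This solution is an upgrade of the accepted answer (@LBogaardt) and also implements @Chendur Mar's suggestion to pick up all files from a folder.

    My additions:

    • This solution works with UTF-8 files
    • Data is imported as NVARCHAR(MAX) instead of NVARCHAR(255); you can change this if you prefer
    • Error logging is implemented, so if an import breaks you can see on which rows it happened

    Step 1 - Enable xp_cmdshell

    See here for how to do this.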
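
Enabling `xp_cmdshell` is normally done through `sp_configure`. A minimal sketch, assuming your login holds the ALTER SETTINGS permission (sysadmin or serveradmin) and that your security policy allows it:

```sql
-- Make advanced options visible so that xp_cmdshell can be configured.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Turn xp_cmdshell on.
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
```

Running the second pair of statements with the value 0 instead of 1 turns the feature off again once the import is done.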

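    Step 2 - Grant permissions to Everyone on the import folder

    Keep in mind that the import folder is a folder on the server, not on your local machine. You therefore need to create the folder on the server and upload your files there. Set the permissions on that folder by following this.

    Step 3 - Edit the script parameters and run it

    You only need to change the first 4 lines:

    1. Line 1 - replace yourDatabase with the name of your database

    2. Line 2 - set the location of the import folder that holds the .txt or .csv files

    3. Line 3 - set the row terminator, which is most likely a new line (\n), so leave it as it is

    4. Line 4 - set the delimiter for your files. If you use commas, enter CHAR(44) or ','. CHAR(9) is the tab character.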
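
If the files are comma-delimited and contain quoted fields (for example `"Smith, John"`), splitting on the bare comma will break those rows. One possible alternative, assuming SQL Server 2017 or later, is the `FORMAT = 'CSV'` option of `BULK INSERT`. In the sketch below the table name and the file path are placeholders, and the target table is assumed to exist already with matching columns:

```sql
-- Placeholder table and path; adjust both before running.
BULK INSERT [dbo].[NewTable1]
FROM 'C:\PathToFiles\MyFileD.csv'
WITH
(
    FORMAT = 'CSV',            -- CSV-aware parsing (SQL Server 2017+)
    FIELDQUOTE = '"',          -- character that wraps fields containing commas
    FIRSTROW = 2,              -- skip the header line
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\r\n'     -- Windows line endings; use '0x0a' for LF-only files
);
```

The script in this answer does not use this option, so on older versions quoted fields would need to be handled before the import.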

    Script:

    USE yourDatabase
    DECLARE @Location NVARCHAR(MAX) = 'C:\Users\username\Desktop\Import\';
    DECLARE @RowTerminator NVARCHAR(5) = '\n';
    DECLARE @ColumnTerminator NVARCHAR(5) = CHAR(9);
    
    DECLARE @SQLINSERT NVARCHAR(MAX);
    
    
    -- 1.2  Define the list of input and output in a temporary table
    
    /*
    **  In this section, a temporary table is created which lists all the filenames of the delimited
    **  files which need to be imported, as well as the names of the tables which are created and into
    **  which the data is imported. Multiple files may be imported into the same output table. Each row
    **  is prepended with an integer which increments up starting from 1. It is essential that this
    **  number follows this logic. The temporary table is deleted at the end of this script.
    */
    
    IF OBJECT_ID('[dbo].[Files_Temporary]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Files_Temporary];
    CREATE TABLE [dbo].[Files_Temporary]
    (
        [ID] INT identity (1,1) primary key
        , [FileName] NVARCHAR(max)
        , [TableName] NVARCHAR(max)
    );
    
    --insert names into  [dbo].[Files_Temporary] 
    SET @SQLINSERT = 'INSERT INTO [dbo].[Files_Temporary] (filename) exec master.dbo.xp_cmdshell' + char(39) + ' dir ' + @Location + ' /b /a-d' + char(39)
    EXEC(@SQLINSERT)
    ------Update table names eliminating the file extension-------
    update [dbo].[Files_Temporary] set [TableName]= SUBSTRING(filename,0, CHARINDEX('.',filename))
    
    -- 1.3  Loop over the list of input and output and import each file to the correct table
    
    /*
    **  In this section, the 'WHILE' statement is used to loop over all input files. A counter is defined
    **  which starts at '1' and increments with each iteration. The filename and tablename are retrieved
    **  from the previously defined temporary table. The next step of the script is to check whether the
    **  output table already exists or not.
    */
    
    DECLARE @Counter INT = 1
    
    WHILE @Counter <= (SELECT COUNT(*) FROM [dbo].[Files_Temporary])
    BEGIN
        PRINT 'Counter is ''' + CONVERT(NVARCHAR(5), @Counter) + '''.'
    
        DECLARE @FileName NVARCHAR(MAX)
        DECLARE @TableName NVARCHAR(MAX)
        DECLARE @Header NVARCHAR(MAX)
        DECLARE @SQL_Header NVARCHAR(MAX)
        DECLARE @CreateHeader NVARCHAR(MAX) = ''
        DECLARE @SQL_CreateHeader NVARCHAR(MAX)
    
        SELECT @FileName = [FileName], @TableName = [TableName] FROM [dbo].[Files_Temporary] WHERE [ID] = @Counter
    
        IF OBJECT_ID('[dbo].[' + @TableName + ']', 'U') IS NULL
        BEGIN
    /*
    **  If the output table does not yet exist, it needs to be created. This requires the list of all
    **  columnnames for that table to be retrieved from the first line of the text file, which includes
    **  the header. A piece of SQL code is generated and executed which imports the header of the text
    **  file. A second temporary table is created which stores this header as a single string.
    */
            PRINT 'Creating new table with name ''' + @TableName + '''.'
    
            IF OBJECT_ID('[dbo].[Header_Temporary]', 'U') IS NOT NULL
            DROP TABLE [dbo].[Header_Temporary];
            CREATE TABLE [dbo].[Header_Temporary]
            (
                [Header] NVARCHAR(MAX)
            );
    
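            -- The row terminator is deliberately used as the field terminator as well,
            -- so that the whole header line lands in the single [Header] column.
            SET @SQL_Header = '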
                BULK INSERT [dbo].[Header_Temporary]
                FROM ''' + @Location + @FileName + '''
                WITH
                (
                    FIRSTROW = 1,
                    LASTROW = 1,
                    MAXERRORS = 0,
                    FIELDTERMINATOR = ''' + @RowTerminator + ''',
                    ROWTERMINATOR = ''' + @RowTerminator + '''
                )'
            EXEC(@SQL_Header)
    
            SET @Header = (SELECT TOP 1 [Header] FROM [dbo].[Header_Temporary])
            PRINT 'Extracted header ''' + @Header + ''' for table ''' + @TableName + '''.'
    /*
    **  The columnnames in the header are separated using the column-terminator. This can be used to loop
    **  over each columnname. A new piece of SQL code is generated which will create the output table
    **  with the correctly named columns.
    */
            WHILE CHARINDEX(@ColumnTerminator, @Header) > 0
            BEGIN
                SET @CreateHeader = @CreateHeader + '[' + LTRIM(RTRIM(SUBSTRING(@Header, 1, CHARINDEX(@ColumnTerminator, @Header) - 1))) + '] NVARCHAR(MAX), '
                SET @Header = SUBSTRING(@Header, CHARINDEX(@ColumnTerminator, @Header) + 1, LEN(@Header))
            END
            SET @CreateHeader = @CreateHeader + '[' + @Header + '] NVARCHAR(MAX)'
    
            SET @SQL_CreateHeader = 'CREATE TABLE [dbo].[' + @TableName + '] (' + @CreateHeader + ')'
            EXEC(@SQL_CreateHeader)
        END
    
    /*
    **  Finally, the data from the text file is imported into the newly created table. The first line,
    **  including the header information, is skipped. If multiple text files are imported into the same
    **  output table, it is essential that the number and the order of the columns is identical, as the
    **  table will only be created once, using the header information of the first text file.
    */
        --bulk insert
    
        PRINT 'Inserting data from ''' + @FileName + ''' to ''' + @TableName + '''.'
        DECLARE @SQL NVARCHAR(MAX)
        SET @SQL = '
            BULK INSERT [dbo].[' + @TableName + ']
            FROM ''' + @Location + @FileName + '''
            WITH
            (
                FIRSTROW = 2,
                MAXERRORS = 0,
                FIELDTERMINATOR = ''' + @ColumnTerminator + ''',
                ROWTERMINATOR = ''' + @RowTerminator + ''',
                CODEPAGE = ''65001'',
                DATAFILETYPE = ''Char'',
                ERRORFILE = ''' + @Location + 'ImportLog.log''
            )'
        EXEC(@SQL)
    
        SET @Counter = @Counter + 1
    END;
    
    -- 1.4  Cleanup temporary tables
    
    /*
    **  In this section, the temporary tables which were created and used by this script are deleted.
    **  Alternatively, the script could have used a 'real' temporary table (identified by the '#' character
    **  in front of the name) or a table variable. These would have deleted themselves once they were no
    **  longer in use. However, the end result is the same.
    */
    
    IF OBJECT_ID('[dbo].[Files_Temporary]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Files_Temporary];
    
    IF OBJECT_ID('[dbo].[Header_Temporary]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Header_Temporary];
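
Both scripts wrap each column name in brackets by hand, which breaks if a header name itself contains `]`. A possible variation, sketched here with a hard-coded header so that it runs on its own, uses the built-in `QUOTENAME` function instead. The header text and the table name `DemoTable` are made up for illustration, and like the original loop it assumes a single-character column terminator.

```sql
-- Invented header line: three tab-separated names, one containing ']'.
DECLARE @ColumnTerminator NVARCHAR(5) = CHAR(9);
DECLARE @Header NVARCHAR(MAX) = N'Id' + CHAR(9) + N'Name' + CHAR(9) + N'Weird]Col';
DECLARE @Columns NVARCHAR(MAX) = N'';
DECLARE @Pos INT = CHARINDEX(@ColumnTerminator, @Header);

WHILE @Pos > 0
BEGIN
    -- QUOTENAME adds the brackets and doubles any ']' inside the name.
    SET @Columns += QUOTENAME(LTRIM(RTRIM(LEFT(@Header, @Pos - 1)))) + N' NVARCHAR(MAX), ';
    SET @Header = SUBSTRING(@Header, @Pos + 1, LEN(@Header));
    SET @Pos = CHARINDEX(@ColumnTerminator, @Header);
END

-- The remainder after the last terminator is the final column.
SET @Columns += QUOTENAME(LTRIM(RTRIM(@Header))) + N' NVARCHAR(MAX)';

-- Print instead of EXEC so the generated statement can be inspected first.
PRINT N'CREATE TABLE ' + QUOTENAME(N'dbo') + N'.' + QUOTENAME(N'DemoTable')
    + N' (' + @Columns + N')';
```

`QUOTENAME` returns NULL for input longer than 128 characters, so unusually long header names would need separate handling.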
    

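    Finally, disable xp_cmdshell and delete the import folder.

    【Comments】:

      【Solution 3】:

      If I were you, I would write a small VBA script that converts all TXT files in a folder to XLS files, and then load each file into a SQL Server table as you described.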

      select * 
      into SQLServerTable FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 
          'Excel 8.0;Database=C:\your_path_here\test.xls;HDR=YES', 
          'SELECT * FROM [Sheet1$]')
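
`OPENROWSET` with an ad hoc connection string only works when the 'Ad Hoc Distributed Queries' server option is enabled, and the Jet 4.0 provider exists only as a 32-bit driver. A sketch of the setup, assuming a 64-bit instance with the Microsoft Access Database Engine (ACE) provider installed; the path, sheet name and target table are the same placeholders as in the query above:

```sql
-- Allow ad hoc OPENROWSET calls (requires the ALTER SETTINGS permission).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;

-- Same query as above, but through the ACE provider instead of Jet.
SELECT *
INTO SQLServerTable
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
    'Excel 8.0;Database=C:\your_path_here\test.xls;HDR=YES',
    'SELECT * FROM [Sheet1$]');
```

`SELECT ... INTO` creates `SQLServerTable` and fails if it already exists, so a second file for the same table would need `INSERT INTO ... SELECT` instead.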
      

      See here for details.

      http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=49926

      As for converting the TXT files to XLS files, try this.

      Private Declare Function SetCurrentDirectoryA Lib _
              "kernel32" (ByVal lpPathName As String) As Long
      
      Public Function ChDirNet(szPath As String) As Boolean
      'based on Rob Bovey's code
          Dim lReturn As Long
          lReturn = SetCurrentDirectoryA(szPath)
          ChDirNet = CBool(lReturn <> 0)
      End Function
      
      Sub Get_TXT_Files()
      'For Excel 2000 and higher
          Dim Fnum As Long
          Dim mysheet As Worksheet
          Dim basebook As Workbook
          Dim TxtFileNames As Variant
          Dim QTable As QueryTable
          Dim SaveDriveDir As String
          Dim ExistFolder As Boolean
      
          'Save the current dir
          SaveDriveDir = CurDir
      
          'You can change the start folder if you want for
          'GetOpenFilename,you can use a network or local folder.
          'For example ChDirNet("C:\Users\Ron\test")
          'It now use Excel's Default File Path
      
          ExistFolder = ChDirNet("C:\your_path_here\Text\")
          If ExistFolder = False Then
              MsgBox "Error changing folder"
              Exit Sub
          End If
      
          TxtFileNames = Application.GetOpenFilename _
          (filefilter:="TXT Files (*.txt), *.txt", MultiSelect:=True)
      
          If IsArray(TxtFileNames) Then
      
              On Error GoTo CleanUp
      
              With Application
                  .ScreenUpdating = False
                  .EnableEvents = False
              End With
      
              'Add workbook with one sheet
              Set basebook = Workbooks.Add(xlWBATWorksheet)
      
              'Loop through the array with txt files
              For Fnum = LBound(TxtFileNames) To UBound(TxtFileNames)
      
                  'Add a new worksheet for the name of the txt file
                  Set mysheet = Worksheets.Add(After:=basebook. _
                                      Sheets(basebook.Sheets.Count))
                  On Error Resume Next
                  mysheet.Name = Right(TxtFileNames(Fnum), Len(TxtFileNames(Fnum)) - _
                                          InStrRev(TxtFileNames(Fnum), "\", , 1))
                  On Error GoTo 0
      
                  With ActiveSheet.QueryTables.Add(Connection:= _
                              "TEXT;" & TxtFileNames(Fnum), Destination:=Range("A1"))
                      .TextFilePlatform = xlWindows
                      .TextFileStartRow = 1
      
                      'This example use xlDelimited
                      'See a example for xlFixedWidth below the macro
                      .TextFileParseType = xlDelimited
      
                      'Set your Delimiter to true
                      .TextFileTabDelimiter = True
                      .TextFileSemicolonDelimiter = False
                      .TextFileCommaDelimiter = False
                      .TextFileSpaceDelimiter = False
      
                      'Set the format for each column if you want (Default = General)
                      'For example Array(1, 9, 1) to skip the second column
                      .TextFileColumnDataTypes = Array(1, 9, 1)
      
                      'xlGeneralFormat  General          1
                      'xlTextFormat     Text             2
                      'xlMDYFormat      Month-Day-Year   3
                      'xlDMYFormat      Day-Month-Year   4
                      'xlYMDFormat      Year-Month-Day   5
                      'xlMYDFormat      Month-Year-Day   6
                      'xlDYMFormat      Day-Year-Month   7
                      'xlYDMFormat      Year-Day-Month   8
                      'xlSkipColumn     Skip             9
      
                      ' Get the data from the txt file
                      .Refresh BackgroundQuery:=False
                  End With
              ActiveSheet.QueryTables(1).Delete
              Next Fnum
      
              'Delete the first sheet of basebook
              On Error Resume Next
              Application.DisplayAlerts = False
              basebook.Worksheets(1).Delete
              Application.DisplayAlerts = True
              On Error GoTo 0
      
      CleanUp:
      
              ChDirNet SaveDriveDir
      
              With Application
                  .ScreenUpdating = True
                  .EnableEvents = True
              End With
          End If
      End Sub
      

      You can set up the Windows Task Scheduler to run the process automatically whenever you need it.

      【Comments】:
