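Import huge dataset into Access from Excel via VBA: answers

【Question title】: Import huge dataset into Access from Excel via VBA
【Posted】: 2016-09-28 13:47:37
【Question】:

I have a huge dataset (about 800k rows) that I need to import from Excel into Access. However, I can ignore rows with a particular column value, and those make up about 90% of the dataset. So in fact I only need about 10% of the rows imported.

In the past I have imported Excel files row by row in the following manner (pseudocode):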

For i = 1 To EOF
    sql = "Insert Into [Table] (Column1, Column2) VALUES ('" & _
    xlSheet.Cells(i, 1).Value & " ', '" & _
    xlSheet.Cells(i, 2).Value & "');"       
Next i
DoCmd.RunSQL sql

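With around 800k rows this takes far too long, because a query is built and run for every single row.

Considering that I can also ignore 90% of the rows, what is the fastest way to import the dataset from Excel into Access?

I was thinking of creating a temporary Excel file with a filter applied, and then importing only the filtered file.

But is there a better or faster approach than this? Also, what is the fastest way to import Excel data from Access VBA?

Thanks in advance.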

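【Question comments】:

VB.NET is not VBA. It says so right on the tag.
1.) Can you delete the 90% of rows to reduce the data you are working with? 2.) Is this a one-time operation, or will you be doing it repeatedly? That changes how robust your solution needs to be. Generally, an RDBMS lets you import Excel, CSV, XML, etc. into a database table. It is a manual step, but if this is a one-time need, it may be the fastest and easiest...
You can query the Excel sheet with ADO and do the whole thing with ADO and SQL.

Tags: excel vba ms-access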

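【Solution 1】:

Consider running a special Access query for the import. Add the SQL below to an Access query window, or run it as an SQL query over a DAO/ADO connection. Include any WHERE clause you need. A WHERE clause has to refer to columns by name, and the connection string below is currently set to HDR=No (no header row), in which case the provider names the columns F1, F2, and so on: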

INSERT INTO [Table] (Column1, Column2)
SELECT *
FROM [Excel 12.0 Xml;HDR=No;Database=C:\Path\To\Workbook.xlsx].[SHEET1$];

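Alternatively, run a make-table query if you need a temporary staging table (to remove the 90% of rows) before loading the final table, but note that this query replaces the table if it already exists: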

SELECT * INTO [NewTable]
FROM [Excel 12.0 Xml;HDR=No;Database=C:\Path\To\Workbook.xlsx].[SHEET1$];

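【Comments】:

Using DAO instead of the row-by-row approach sped things up massively, thanks a lot!
Great! Indeed, SQL processes the job in one go rather than iteratively. Glad I could help.
If this works well, I may be able to use it. Once a week I get an Excel file from a partner company. It has 6 columns and currently about 800k+ rows. Rows should never be deleted, but fields may change and rows are always added. I need a fast way to update the Access table from the Excel file and to produce a report/file/table of the changes. How fast does it run on 500k rows of data?

【Solution 2】:

A slight change to your code will do the filtering for you: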

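Dim i As Long
Dim strTest As String
Dim sql As String

' lastRow is assumed to hold the number of the last used row
' (written as EOF in the question's pseudocode)
For i = 1 To lastRow
    strTest = Nz(xlSheet.Cells(i, 1).Value, "")
    ' Skip rows whose first column is empty; substitute your own condition here
    If strTest <> "" Then
        sql = "Insert Into [Table] (Column1, Column2) VALUES ('" & _
              strTest & "', '" & _
              xlSheet.Cells(i, 2).Value & "');"
        DoCmd.RunSQL sql
    End If
Next i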

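I assume that having RunSQL outside the loop was just a mistake in your pseudocode. This tests whether the cell in the first column is empty, but you can substitute any condition that fits your situation.

【Comments】:

Thanks for your input, unfortunately this approach is still much slower than using DAO.

【Solution 3】:

I'm a little late to the party, but I stumbled on this while looking for information on a similar problem. I thought I would share my solution in case it helps others, or the OP if he/she is still working on it. Here is my problem and what I did:

I have an established Access database with roughly the same number of rows as the OP (6 columns, about 850k rows). About once a week we receive from a partner company an .xlsx file with a single sheet whose data has the same structure as the database.

This file contains the entire database plus updates (new records and changes to old records, no deletions). The first column contains a unique identifier for each row. When we receive the file, the Access db is updated through queries similar to those Parfait suggested, but because the file holds all 850k+ records, comparing and updating takes 10-15 minutes or longer, depending on what else we have going on.

Since loading only the changes into the current Access db would be faster, I needed to produce a delta file (preferably a .txt that can be opened in Excel and saved as .xlsx if needed). I assume this is similar to what the OP was looking for. To do this I wrote a small application in C++ that compares the previous week's file with the current week's file. The data itself is a mix of character and numerical data that I will simply call string1 through string6 here. It looks like this: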

Col1       Col2       Col3       Col4       Col5       Col6
string1    string2    string3    string4    string5    string6
.......
''''Through 850k rows''''

After saving both .xlsx files as tab-delimited .txt files, they look like this:

Col1\tCol2\tCol3\tCol4\tCol5\tCol6\n
string1\tstring2\tstring3\tstring4\tstring5\tstring6\n
....
//Through 850k rows//
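
To make the key/data split concrete, here is a minimal sketch of how one tab-delimited line can be divided at its first tab into the Col1 key and the remaining columns. The function name `splitRecord` is hypothetical and is not part of the answer's code. It also drops a trailing `\r`, on the assumption that the file may have been saved with Windows line endings.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not from the original answer): splits one tab-delimited
// line into {Col1 key, remaining columns}. A trailing '\r' is dropped, assuming
// the file may have been saved with Windows (CRLF) line endings.
std::pair<std::string, std::string> splitRecord(std::string line)
{
    if (!line.empty() && line.back() == '\r')
        line.pop_back();

    const std::string::size_type tab = line.find('\t');
    if (tab == std::string::npos)
        return {line, std::string()};            // no tab: whole line is the key

    return {line.substr(0, tab), line.substr(tab + 1)};
}
```

For the sample row above, the key would be `string1` and the rest would be `string2` through `string6`, still separated by tabs.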

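Now the fun part! I stored the old .txt file in a hash table (using the C++ unordered_map from the standard library). Then, reading the new .txt file through an input file stream, I used Col1 of each new row as the key into the hash table and wrote any differences to two separate files: one that a query can use to append new data to the db, and one that can be used to update data that has changed.

I have heard that it is possible to build a more efficient hash table than unordered_map, but for now this works well, so I'll stick with it. Here is my code.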

#include <iostream>     
#include <fstream>      
#include <string>       
#include <iterator>
#include <unordered_map>


int main()
{
    using namespace std;

    //variables
    const string myInFile1{"OldFile.txt"};
    const string myInFile2{"NewFile.txt"};
    string mappedData;
    string key;

    //hash table objects
    unordered_map<string, string> hashMap;
    unordered_map<string, string>::iterator cursor;

    //input files
    ifstream fin1;
    ifstream fin2;
    fin1.open(myInFile1);
    fin2.open(myInFile2);

    //output files
    ofstream fout1;
    ofstream fout2;
    fout1.open("For Updated.txt");  //updating old records 
    fout2.open("For Upload.txt");   //uploading new records

    //This loop takes the original input file (i.e.; what is in the database already)
    //and hashes the entire file using the Col1 data as a key. On my system this takes
    //approximately 2 seconds for 850k+ rows with 6 columns
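    //getline returns the stream, so the loop stops as soon as a read fails
    //(end of file) instead of storing one extra, empty-keyed record
    while(getline(fin1, key, '\t') &&        //get the first column
          getline(fin1, mappedData, '\n'))   //get the other 5 columns
    {
        hashMap[key] = mappedData;           //store the data in the hash table
    }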
    fin1.close();

    //output file headings
    fout1 << "Col1\t" << "Col2\t" << "Col3\t" << "Col4\t" << "Col5\t" << "Col6\n";
    fout2 << "Col1\t" << "Col2\t" << "Col3\t" << "Col4\t" << "Col5\t" << "Col6\n";

    //This loop takes the second input file and reads each line, first up to the
    //first tab delimiter and stores it as "key", then up to the new line character
    //storing it as "mappedData" and then uses the value of key to search the hash table
    //If the key is not found in the hash table, a new record is created in the upload
    //output file. If it is found, the mappedData from the file is compared to that of
    //the hash table and if different, the updated record is sent to the update output
    //file. The getline calls in the loop condition end the loop at end of file; the
    //empty-key check additionally stops at a record whose Col1 is empty.
    //YMMV on the time here depending on how many records are added or updated
    //(1000 records takes about another 5 seconds on my system)
    while(getline(fin2, key, '\t') &&        //get key from Col1 in the input file
          getline(fin2, mappedData, '\n'))   //get the mappedData (Col2-Col6)
    {
        if(key.empty())                     //exit the file read if key is empty
            break;
        cursor = hashMap.find(key);         //assign the iterator to the hash table at key

        if(cursor != hashMap.end())         //check to see if key in hash table
        {
            if(cursor->second != mappedData) //compare mappedData
            {
                //changed record: goes to the update file
                fout1 << key << "\t" << mappedData << "\n";
            }
        }
        else                                //key not in hash table: new record
        {
            //new record: goes to the upload file
            fout2 << key << "\t" << mappedData << "\n";
        }
    }


    fin2.close();
    fout1.close();
    fout2.close();
    return 0;
}

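I am working on a few things to turn this into an easy-to-use executable (for instance, reading the XML structure inside the Excel zip file directly, or using an ODBC connection), but for now I'm just testing it to make sure the outputs are correct. Of course, the output files then have to be loaded into the Access db with queries similar to the ones Parfait suggested. Also, I'm not sure whether Excel or Access VBA has a library for building hash tables, but it might be worth exploring further if it saves time when reading the Excel data. Any criticism or suggestions are welcome.

【Comments】: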