【问题标题】:How to transfer a large ADO recordset in Access 2013 to a CSV file?如何将 Access 2013 中的大型 ADO 记录集传输到 CSV 文件?
【发布时间】:2014-11-13 22:38:03
【问题描述】:

在将大型 ADO 记录集从 Access 2013 传输到新的 CSV 文件时,我遇到了一些性能问题。

我的要求是:
1) 文件必须包含列名
2) Recordset 可以包含超过 500,000 条记录
3) 文件必须用逗号分隔
4) 记录有字母数字字符串,有时包含逗号
5) 程序必须创建一个新的 CSV 文件(不仅仅是更新一个预先存在的文件)
注意:数据存储在记录集中,因为它是从 MS-SQL Server 查询的

我对 VBA 编程比较陌生,因此我们将不胜感激。我读到从文件开头到结尾运行的 GetRows 循环可以是将大型记录集导出到 CSV 的有效过程。谢谢

这是我当前的实现,显然违反了我的一些要求:

    'WRITE DATA TO TEXT FILE
     Dim f As ADODB.Field
     Dim myFileSystemObject As Object
     Dim txtfile As Object
     Set myFileSystemObject = CreateObject("Scripting.FileSystemObject")
     Set txtfile = myFileSystemObject.CreateTextFile(strPath & ".txt", True)
     With adRs
      For Each f In .Fields
         txtfile.Write (f.Name)
         txtfile.Write Chr(9)
      Next
      txtfile.WriteLine
      txtfile.Write adRs.GetString(adClipString)
      .Close
     End With
     txtfile.Close

【问题讨论】:

  • 你有一个很糟糕的问题:3) File must be comma delimited 4) Records have alphanumeric strings that sometimes contain commas 这很糟糕。如果您的数据中有逗号,那是您的分隔符,您将失去列边界。如果您有逗号,则需要使用不同的分隔符,例如 |。否则,处理该问题的唯一方法是使所有字段固定宽度,这意味着如果 A 列是 50 个字符,则需要为它没有的每个字符填充一个空格。但这让您知道数据中嵌入了逗号,而不是分隔符。
  • @StarPilot - 处理逗号字段的标准 CSV 方法是引用它们。这不是什么大问题。
  • @TimWilliams - 使用引号的问题是当数据包含引号时,如果数据中有用户通用字段,例如 cmets 或 notes 列,这很常见。然后你会被困在使用另一个字符作为除数、固定长度的输入字段,或者试图检测这些引号并转义或转换它们。避免头痛的最简单方法是使用分隔符(人眼)或很少使用字符作为分隔符的固定长度字段。固定长度可以很好地压缩,简单的修剪消除了空白填充。
  • @StarPilot - 我的意思是 CSV 有处理此类事情的标准方法 - 包括引号(加倍)。由于 OP 对 CSV 输出有要求,他们可以按照这些方法生成与 Excel 兼容的输出。
  • @TimWilliams - 我过去的经验是,许多人没有适当地实施标准,你最终会因此而受挫。因此,我形成了一种态度,即最好尽可能采取一些预防措施,并在您发现另一个不符合标准的情况时避免胃灼热。但我明白你的意思。

标签: sql-server vba csv ado


【解决方案1】:

你可以试试这个(虽然我不知道性能会受到影响;你的问题似乎是双重的 - 我如何得到我的结果,然后我怎样才能有效地做到这一点)。

'WRITE DATA TO TEXT FILE
Dim f As ADODB.Field
Dim myFileSystemObject As Object
Dim txtfile As Object
Dim str As String

Set myFileSystemObject = CreateObject("Scripting.FileSystemObject")
Set txtfile = myFileSystemObject.CreateTextFile(strPath & ".txt", True)

With adRs
    For Each f In .Fields
        txtfile.Write (f.Name)
        txtfile.Write Chr(9)
    Next

    txtfile.WriteLine

    'Add in leading double quote, double quote text qualifier throughout, and changing carriage returns to double quote/carriage return/double quote.
    str = Chr(34) & Replace(Replace(adRs.GetString, vbTab, Chr(34) & "," & Chr(34)), vbCr, Chr(34) & vbCr & Chr(34))

    'Write to file removing last extraneous double quote.
    txtfile.Write Mid(str, 1, Len(str) - 1)

    .Close
    End With

txtfile.Close

所有代码所做的只是在字符串的开头添加一个“,将所有以前的制表符更改为“,”并将所有回车更改为“[cr]”。Write 函数中的 Replace 应该删除最后一个无关的”。文本分隔符应保护任何嵌入的逗号。同样,不确定性能,但似乎您至少可以启动并运行符合您的规格的东西。

编辑:如果这不能让您到达您想要的位置,您也可以查看 SQL Server 的 BCP 实用程序。

【讨论】:

  • VBlades,非常感谢您的帮助!您提供的解决方案适用于较小的文件并解决了 CSV 双引号问题。对于更大的文件,我会采纳您的建议并探索服务器端解决方案。我发现这个有用的博客在这里介绍了使用 BCP 实用程序的一种方法:[link]drewgillson.com/blog/export-data-sql-server-csv
  • 酷,不用担心。如果您认为这是一个不错的答案,您可以将其标记为这样。这可能会对将来遇到此问题的人有所帮助。
  • 抱歉 VBLades 看来我的声誉还不够高。一旦我提高了我的代表,我会确保投票。谢谢
  • 不用担心,但您不需要有很高的代表来为您的问题选择解决方案。这与给它一个赞成票是不一样的。不过,这没什么大不了的,SO 为我节省了很多次,只是希望有人知道如果这个问题再次出现,有一个可行的解决方案。保重。
  • 感谢 VBlades 现在我明白了,对不起这样的菜鸟。
【解决方案2】:

这里的游戏迟到了 - 问题已经超过 3 年了! - 你已经得到了一个实现将文本字段封装在引号中的答案,这是一个合格答案的标记。

但是,解决这三个大问题的答案很少:

  1. 优化 VBA 笨拙的字符串处理;
  2. 字节顺序标记,VBA 嵌入到它保存到的每个字符串中 文件,确保 csv 文件的一些最常见的消费者 无法正确阅读;
  3. ...而且在编写文件之间从来没有任何中间立场 逐行,永远,并把它写成一大块,会抛出 内存不足错误。

VBA 初学者可能会发现字符串优化难以理解:本机 VBA 中最大的性能提升是避免字符串分配和连接(原因如下:http://www.aivosto.com/vbtips/stringopt2.html#huge) - 所以我改用 join、split 和 replace myString = MyString & MoreString

在结尾处调用RecordSet.GetRows() 的尾随循环将引起对结构化编程有强烈看法的编码人员的注意。

所以这里是:


 Public Function RecordsetToCSV(ByRef rst As ADODB.Recordset, _
                                ByRef OutputFile As String, _
                                Optional ByRef FieldList As Variant, _
                                Optional ByVal CoerceText As Boolean = True, _
                                Optional ByVal CleanupText As Boolean = True _
                                ) As Long

' Output a recordset to a csv file and returns the row count.

' If the output file is locked, or specified in an inaccessible location, the ' 'ByRef' OutputFile parameter becomes a file in the user's local temp folder

' You can supply your own field list. This isn't a substituted file header of ' aliased field names: it is a subset of the field names, which ADO will read ' selectively from the recordset. Each item in the list matches a named field

' CoerceText=TRUE will encapsulate all items, numeric or not, in quote marks. ' CleanupText=TRUE strips quotes and linefeeds from the data: FALSE is faster

' You should only set them FALSE if you're confident that the data is 'clean' ' with no quote marks, commas or line breaks in any unencapsulated text field

' This code handles unicode, and outputs a file that can be read by Microsoft ' ODBC and OLEDB database drivers by removing the Byte Order Marker.

On Error Resume Next

' Coding note: we're not doing any string-handling in VBA.Strings: allocating ' deallocating and (especially!) concatenating are SLOW. We are using the VBA ' Join and Split functions ONLY. Feel free to optimise further by declaring a ' faster set of string functions from the Kernel if you want to. ' ' Other optimisations: type pun. Byte Arrays are interchangeable with strings ' Some of our loops through these arrays have a 'step' of 2. This optimises a ' search-and-replace for ANSI chars in an array of 2-byte unicodes. Note that ' it's only used to remove known ANSI 'Latin' characters with a 'low' byte of ' zero: any other use of the two-byte 'step' will fail on non-Latin unicodes.

' ** THIS CODE IS IN THE PUBLIC DOMAIN ** ' Nigel Heffernan Excellerando.Blogspot.com

Const FETCH_ROWS As Long = 4096

Dim COMMA As String * 1 Dim BLANK As String * 4 Dim EOROW As String * 2

COMMA = ChrW$(44) BLANK = ChrW$(13) & ChrW$(10) & ChrW$(13) & ChrW$(10) EOROW = ChrW$(13) & ChrW$(10)

Dim FetchArray As Variant

Dim i As Long ' i for rows in the output file, records in the recordset Dim j As Long ' j for columns in the output file, fields in the recordset Dim k As Long ' k for all other loops: bytes in individual data items

Dim i_Offset As Long

Dim i_LBound As Long Dim i_UBound As Long Dim j_LBound As Long Dim j_UBound As Long Dim k_lBound As Long Dim k_uBound As Long

Dim hndFile As Long Dim varField As Variant

Dim iRowCount As Long Dim arrBytes() As Byte Dim arrTemp1() As String Dim arrTemp2() As String Dim arrTemp3(0 To 2) As String

Dim boolNumeric As Boolean

Dim strHeader As String Dim arrHeader() As Byte

Dim strFile As String Dim strPath As String Dim strExtn As String

strFile = FileName(OutputFile) strPath = FilePath(OutputFile) strExtn = FileExtension(strFile)

If rst Is Nothing Then Exit Function If rst.State <> 1 Then Exit Function

If strExtn = "" Then strExtn = ".csv" End If

With FSO

If strFile = "" Then
    strFile = .GetTempName
    strFile = Left(strFile, Len(strFile) - Len(".tmp"))
    strFile = strFile & strExtn
End If

If strPath = "" Then
    strPath = TempSQLFolder
End If

If Right(strPath, 1) <> "\" Then
    strPath = strPath & "\"
End If

strExtn = FileExtension(strFile)
If strExtn = "" Then
    strExtn = ".csv"
    strFile = strFile & strExtn
End If


OutputFile = strPath & strFile

End With

If FileName(OutputFile) <> "" Then If Len(VBA.FileSystem.Dir(OutputFile, vbNormal)) <> 0 Then

    Err.Clear
    VBA.FileSystem.Kill OutputFile  ' do it now, and reduce wait for deletion
    If Err.Number = 70 Then  ' permission denied: change the output file name
        OutputFile = FileStripExtension(OutputFile) & "_" & FileStripExtension(FSO.GetTempName) & FileExtension(OutputFile)
    End If

End If

End If

' ChrW$() gives a 2-byte 'Wide' char. This coerces all subsequent operations to UTF16

arrTemp3(0) = ChrW$(34) ' Encapsulating quote arrTemp3(1) = vbNullString ' The field value will go here arrTemp3(2) = ChrW$(34) ' Encapsulating quote

If rst.EOF And rst.BOF Then FetchArray = Empty ElseIf rst.EOF Then rst.MoveFirst End If

' An empty recordset must still write a header row of field names: we put this in the ' output buffer and write it to the file before we start looping through the records.

ReDim FetchArray(0 To rst.Fields.Count, 0 To 0)

i_LBound = 0 i_UBound = 0

If IsMissing(FieldList) Then

For j = LBound(FetchArray, 1) To UBound(FetchArray, 1) - 1 Step 1
    FetchArray(j, i_UBound) = rst.Fields(j).Name
Next j

Else

j = 0

For Each varField In FieldList
    j_UBound = j_UBound + 1
Next varField

ReDim arrTemp2(j_LBound To j_UBound)
For Each varField In FieldList
    FetchArray(j, i_UBound) = CStr(varField)
    j = j + 1
Next varField

End If

ReDim arrTemp1(i_LBound To i_UBound) ' arrTemp1 is the rowset we write to file ReDim arrTemp2(j_LBound To j_UBound) ' arrTemp2 represents a single record

Do Until IsEmpty(FetchArray)

i_LBound = LBound(FetchArray, 2)
i_UBound = UBound(FetchArray, 2)

j_LBound = LBound(FetchArray, 1)
j_UBound = UBound(FetchArray, 1)

If UBound(arrTemp1) <> i_UBound + 1 Then
    ReDim arrTemp1(i_LBound To i_UBound + 1)
    arrTemp1(i_UBound + 1) = vbNullString   ' The 'Join' operation will insert a trailing row
End If                                      ' delimiter here (Not required by the last chunk)

If UBound(arrTemp2) <> j_UBound Then
    ReDim arrTemp2(j_LBound To j_UBound)
End If


' Data body. This is heavily optimised to avoid VBA String functions with allocations

For i = i_LBound To i_UBound Step 1

    ' If this is confusing... Were you expecting FetchArray(i,j)? i for row, j for column?
    ' FetchArray comes from RecordSet.GetRows(), which returns a TRANSPOSED array: i and j
    ' are still the field and record ordinals, row(i) and column(j) in the output file.

    For j = j_LBound To j_UBound

        If IsNull(FetchArray(j, i)) Then
            arrTemp2(j) = ""
        Else
            arrTemp2(j) = FetchArray(j, i)  ' confused? see he note above
        End If

        If CleanupText Or (i_UBound = 0) Then  ' (i_UBound=0): always clean up field names

            arrBytes = arrTemp2(j) ' Integer arithmetic is faster than string-handling for
                                   ' this: all VBA string operations require an allocation

            For k = LBound(arrBytes) To UBound(arrBytes) Step 2

                Select Case arrBytes(k)
                Case 10, 13, 9, 160
                    If arrBytes(k + 1) = 0 Then
                        arrBytes(k) = 32 ' replaces CR, LF, Tab, and non-breaking
                    End If               ' spaces with the standard ANSI space
                Case 44
                    If Not CoerceText Then
                        If arrBytes(k + 1) = 0 Then
                            arrBytes(k) = 32 ' replace comma with the ANSI space
                        End If
                    End If
                Case 34
                    If arrBytes(k + 1) = 0 Then
                        arrBytes(k) = 39  ' replaces double-quote with single quote
                    End If
                End Select

            Next k

            arrTemp2(j) = arrTemp2(j)

        End If  ' cleanup


        If CoerceText Then  ' encapsulate all fields in quotes, numeric or not

           arrTemp3(1) = arrTemp2(j)
           arrTemp2(j) = Join$(arrTemp3, vbNullString)

        ElseIf (i = 0) And (i = i_UBound) Then ' always encapsulate field names

           arrTemp3(1) = arrTemp2(j)
           arrTemp2(j) = Join$(arrTemp3, vbNullString)

        Else ' selective encapsulation, leaving numeric fields unencapsulated:
             ' we *could* do this by reading the ADODB field types: but that's
             ' slower, and you may be 'caught out' by provider-specific types.

            arrBytes = arrTemp2(j)

            boolNumeric = True

            For k = LBound(arrBytes) To UBound(arrBytes) Step 2
                If arrBytes(k) < 43 Or arrBytes(k) > 57 Then

                    If arrBytes(k) <> 69 Then
                        boolNumeric = False
                        Exit For
                    Else
                        If k > UBound(arrBytes) - 5 Then
                            boolNumeric = False
                            Exit For
                        ElseIf arrBytes(k + 2) = 45 Then
                            ' detect "1.234E-05"
                        ElseIf arrBytes(k + 2) = 43 Then
                            ' detect "1.234E+05"
                        Else
                            boolNumeric = False
                            Exit For
                        End If
                    End If

                End If
            Next k

            If boolNumeric Then
               For k = 1 + LBound(arrBytes) To UBound(arrBytes) Step 2
                   If arrBytes(k) <> 0 Then
                       boolNumeric = False
                       Exit For
                   End If
               Next k
            End If

           arrBytes = vbNullString

           If Not boolNumeric Then ' text field, encapsulate it
               arrTemp3(1) = arrTemp2(j)
               arrTemp2(j) = Join(arrTemp3, vbNullString)
           End If

        End If ' CoerceText

    Next j

   arrTemp1(i) = Join(arrTemp2, COMMA)

Next i

iRowCount = iRowCount + i - 2


'   **** WHY WE 'PUT' A BYTE ARRAY INSTEAD OF A VBA STRING VARIABLE  **** ****
'
'       Put #hndFile, , StrConv(Join(arrTemp1, EOROW), vbUnicode)
'       Put #hndFile, , Join(arrTemp1, EOROW)
'
'   If you pass unicode, Wide or UTF-16 string variables to PUT, it prepends a
'   Unicode Byte Order Mark to the data which, when written to your file, will
'   render the field names illegible to Microsoft's JET ODBC and ACE-OLEDB SQL
'   drivers (which can actually read unicode field names, if the helpful label
'   isn't in the way). The primeval 'PUT' statement writes a Byte array as-is.
'
'   **** **** **** **** **** **** **** **** **** **** **** **** **** **** ****


arrBytes = Join$(arrTemp1, vbCrLf)


If hndFile = 0 Then

    i_Offset = 1
    If Len(Dir(OutputFile)) > 0 Then
        VBA.FileSystem.Kill OutputFile
    End If

    WaitForFileDeletion OutputFile

    hndFile = FreeFile
    Open OutputFile For Binary Access Write As #hndFile

End If


Put #hndFile, i_Offset, arrBytes
i_Offset = i_Offset + 1 + UBound(arrBytes)
Erase arrBytes


If rst.EOF Then
    Erase FetchArray
    FetchArray = Empty
Else
    If IsMissing(FieldList) Then
        FetchArray = rst.GetRows(FETCH_ROWS)
    Else
        FetchArray = rst.GetRows(FETCH_ROWS, , FieldList)
    End If
End If

Loop ' until isempty(FetchArray)

If iRowCount < 1 Then ' iRowCount = 0 ' Row Count excludes the header End If

RecordsetToCSV = iRowCount

ExitSub:

On Error Resume Next

If hndFile <> 0 Then
    Close #hndFile
End If

Erase arrBytes
Erase arrTemp1
Erase arrTemp2
Exit Function

ErrSub:

Resume ExitSub

End Function

Public Function FilePath(Path As String) As String ' Strip the filename from a path, leaving only the path to the folder ' The last char of this path will be the backslash

' This does not check for the existence or accessibility of the file: ' all we're doing here is string-handling

Dim strPath As String Dim arrPath() As String

Const BACKSLASH As String * 1 = "\"

strPath = Trim(Path)

If strPath = "" Then Exit Function If Right$(strPath, 1) = BACKSLASH Then Exit Function

arrPath = Split(strPath, BACKSLASH)

If UBound(arrPath) = 0 Then ' does not contain "\" FilePath = "" Else arrPath(UBound(arrPath)) = vbNullString FilePath = Join$(arrPath, BACKSLASH) End If

Erase arrPath

End Function

Public Function FileName(Path As String) As String ' Strip the folder and path from a file's path string, leaving only the file name

' This does not check for the existence or accessibility of the file: ' all we're doing here is string-handling

Dim strPath As String Dim arrPath() As String

Const BACKSLASH As String * 1 = "\"

strPath = Trim(Path)

If strPath = "" Then Exit Function If Right$(strPath, 1) = BACKSLASH Then Exit Function

arrPath = Split(strPath, BACKSLASH)

If UBound(arrPath) = 0 Then ' does not contain "\" FileName = Path Else FileName = arrPath(UBound(arrPath)) End If

Erase arrPath

End Function

Public Function FileExtension(Path As String) As String ' Return the extension of the file

' This is just string-handling: no file or path validation is attempted ' The file extension is deemed to be whatever comes after the final '.' ' The extension is returned with the dot, eg: ".txt" not "txt" ' If no extension is detected, FileExtension returns an empty string

Dim strFile As String Dim arrFile() As String Const DOT_EXT As String * 1 = "."

strFile = FileName(Path) strFile = Trim(strFile)

If strFile = "" Then Exit Function If Right$(strFile, 1) = DOT_EXT Then Exit Function

arrFile = Split(strFile, DOT_EXT)

If UBound(arrFile) = 0 Then ' does not contain "\" FileExtension = vbNullString Else FileExtension = arrFile(UBound(arrFile)) FileExtension = Trim(FileExtension) If Len(FileExtension) > 0 Then FileExtension = DOT_EXT & FileExtension End If End If

Erase arrFile

End Function

Public Function FileStripExtension(Path As String) As String ' Return the filename, with the extension removed

' This is just string-handling: no file validation is attempted ' The file extension is deemed to be whatever comes after the final '.' ' Both the dot and the extension are removed

Dim strFile As String Dim arrFile() As String Const DOT_EXT As String * 1 = "."

strFile = FileName(Path)

If strFile = "" Then Exit Function If Right$(strFile, 1) = DOT_EXT Then Exit Function

strFile = Trim(strFile)

arrFile = Split(strFile, DOT_EXT)

If UBound(arrFile) = 0 Then ' does not contain "\" FileStripExtension = vbNullString Else ReDim Preserve arrFile(LBound(arrFile) To UBound(arrFile) - 1) FileStripExtension = Join$(arrFile, DOT_EXT) End If

Erase arrFile

End Function

如果您还没有自己的版本,您还需要三个路径和文件名实用程序函数:

  • 文件名()
  • 文件路径()
  • FileStripExtension()

字符串封装逻辑还有改进的空间:正确的方法是查找记录集的字段类型并相应地应用引号,结果很可能比我笨重的要快字节计数方法。但是,我的方法完全是关于文件消费者以及他们期望看到的内容;这并不总是与应该接受的一致。

如果您成功编写了更快、更强大的版本,请告诉我。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 2012-11-22
    • 2015-04-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2021-11-08
    • 1970-01-01
    相关资源
    最近更新 更多