【Question Title】: Extracting files from an Attachment field in an Access database
【Posted】: 2014-11-09 22:14:37
【Question】:

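We are working on a project where we need to migrate data stored in an Access database to a cache database. The Access database contains columns with a data type of Attachment; some of the tuples contain multiple attachments. I am able to obtain the filenames of these files by using .FileName, but I'm unsure how to determine where one file ends and another starts in .FileData.

I am using the following to obtain this data: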

System.Data.OleDb.OleDbCommand command= new System.Data.OleDb.OleDbCommand();
command.CommandText = "select [Sheet1].[pdf].FileData,* from [Sheet1]";
command.Connection = conn;
System.Data.OleDb.OleDbDataReader rdr = command.ExecuteReader();

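【Comments】:

    Tags: .net database ms-access


    【Solution 1】:

    It took me a while to piece together the information needed to retrieve a file stored in an Attachment field, so I thought I'd share it.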

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Linq;
    using System.Text;
    using System.Windows.Forms;
    using System.Data.OleDb;
    using System.IO;
    using System.Diagnostics;
    
    namespace AttachCheck
    {
        public partial class Form1 : Form
        {
            DataSet Set1 = new DataSet();
            int ColId;
    
            public Form1()
            {
                InitializeComponent();
    
                OleDbConnection connect = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source='db/Adb.accdb'"); //set up connection
                //CL_ID is a fk so attachments can be linked to users
                OleDbCommand sql = new OleDbCommand("SELECT at_ID, [at_Name].[FileData], [at_Name].[FileName], [at_Name].[FileType] FROM Attachments WHERE at_ID =1;", connect);

                //pass the command to the adapter so it can be run
                OleDbDataAdapter OleDA = new OleDbDataAdapter(sql);
                //attempt to open the connection
                try { connect.Open(); }
                catch (Exception err) { System.Console.WriteLine(err); }

                OleDA.Fill(Set1); //create and fill dataset
                connect.Close();

                for (int i = 0; i < Set1.Tables[0].Rows.Count; i++)
                {
                    System.Console.WriteLine(Set1.Tables[0].Rows[i]["at_Name.FileName"].ToString() + "This is the file name");

                    //a DataGridView lets you list the attachments and pick one to open; the "Open" cell should be a button
                    dataGridView1.Rows.Add(new object[] { Set1.Tables[0].Rows[i]["at_ID"].ToString(), Set1.Tables[0].Rows[i]["at_Name.FileName"].ToString(), "Open" });
                }
            }
    
            private void dataGridView1_CellContentClick(object sender, DataGridViewCellEventArgs e)
            {
    
                //ignore clicks on the column headers or the row header
                if (e.RowIndex < 0 || e.ColumnIndex < 0) return;

                //the event args already carry the clicked row and column, so there is no need
                //to parse them out of ToString() output (which only worked for single-digit indexes)
                int RowId = e.RowIndex;
                System.Console.WriteLine(RowId + " This is Row");

                int ColId = e.ColumnIndex;
                System.Console.WriteLine(ColId + " This is Column");
    
                
                if (ColId == 2)
                {
                    string fileName = Set1.Tables[0].Rows[RowId]["at_Name.FileName"].ToString(); //assign the file to variable
    
                    //retrieving the file contents from the database as an array of bytes
                    byte[] fileContents = (byte[])Set1.Tables[0].Rows[RowId]["at_Name.FileData"];
    
    
                    fileContents = GetFileContents(fileContents); //strip the header that Access prepends to the stored file (this is not decryption)
    
                    string fileType = Set1.Tables[0].Rows[RowId]["at_Name.FileType"].ToString();
    
    
                    DisplayTempFile(fileName, fileContents, fileType); //forward the file type to display file contents   
                }
            }
    
            //byte offsets of the fields in the header that Access prepends to each stored file
            private const int CONTENT_START_INDEX_DATA_OFFSET = 0; //index at which the real file content starts
            private const int UNKNOWN_DATA_OFFSET = 4; //meaning unknown
            private const int EXTENSION_LENGTH_DATA_OFFSET = 8; //length of the file extension, in characters
            private const int EXTENSION_DATA_OFFSET = 12; //the file extension itself
    
    
            private byte[] GetFileContents(byte[] fileContents)
            {
    
                //the first four bytes hold the index at which the real file content starts
                int contentStartIndex = BitConverter.ToInt32(fileContents, CONTENT_START_INDEX_DATA_OFFSET);

                //the next four bytes hold a value whose meaning is unknown at this stage; it may be a Boolean flag indicating whether the data is compressed
                int unknown = BitConverter.ToInt32(fileContents, UNKNOWN_DATA_OFFSET);

                //the next four bytes contain the length, in characters, of the file extension
                int extensionLength = BitConverter.ToInt32(fileContents, EXTENSION_LENGTH_DATA_OFFSET);

                //the next field in the header is the file extension, not including a dot but including a null terminator;
                //characters are Unicode, so double the character count to get the byte count
                string extension = Encoding.Unicode.GetString(fileContents, EXTENSION_DATA_OFFSET, extensionLength * 2);

                //everything after the header is the file itself
                return fileContents.Skip(contentStartIndex).ToArray();
    
    
            }
    
    
            private void DisplayTempFile(string fileName, byte[] fileContents, string fileType)
            {
    
                // System.Console.WriteLine(fileName + "File Name");
                // System.Console.WriteLine(fileType + "File Type");
                // System.Console.WriteLine(fileContents + "File Contents");
                
                string tempFolderPath = Path.GetTempPath(); //temporary folder the file will be opened from
                //keep only the file name and extension (no directory part) so the file always lands in the temp folder
                string tempFilePath = Path.Combine(tempFolderPath, Path.GetFileName(fileName));
    
                //System.Console.WriteLine(tempFolderPath + " tempFolderPath");
                //System.Console.WriteLine(tempFilePath + " tempFilePath");
    
                //save the file: WriteAllBytes creates the file, writes the byte array to it, then closes it
                File.WriteAllBytes(tempFilePath, fileContents);

                //open the file with whatever program is registered for its type, if one is available on the computer
                System.Diagnostics.Process attachmentProcess = Process.Start(tempFilePath);
    
            }
        }
    }
    

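    Hope this helps someone.

    【Comments】:

      【Solution 2】:

      The following code loops through all records of a table in a Microsoft Access database, assigning each row to a recordset. It then loops through all attachments stored in the "Docs" field, extracts them, and saves them to disk. This code is an extension of the code presented by Gord Thompson in another answer. The only thing I did was write it for Visual Basic .NET.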

      Imports Microsoft.Office.Interop.Access.Dao
      

      Use the line above to import the DAO namespace.

      'Visual Basic.NET
      Private Sub ReadAttachmentFiles()
          'required COM reference: Microsoft Office 14.0 Access Database Engine Object Library
          'define a new database engine and a new database
          Dim dbe = New DBEngine
          Dim db As Database = dbe.OpenDatabase("C:\Users\Meisam\Documents\Databases\myDatabase.accdb")
          'define the main recordset object for each row
          Dim rstMain As Recordset = db.OpenRecordset( _
                  "SELECT * FROM Companies", _
                  RecordsetTypeEnum.dbOpenSnapshot)
          'evaluate whether the recordset is empty of records
          If Not (rstMain.BOF And rstMain.EOF) Then
              'if not empty, then move to the first record
              rstMain.MoveFirst()
              'loop until the end of the recordset is reached
              Do Until rstMain.EOF
                  Dim myID As Integer = -1
                  'ID is the name of the primary key field, which holds unique values
                  myID = CInt(rstMain.Fields("ID").Value)
                  'define the secondary recordset object for the attachment field "Docs"
                  Dim rstAttach As Recordset2 = rstMain.Fields("Docs").Value
                  'evaluate whether the recordset is empty of records
                  If Not (rstAttach.BOF And rstAttach.EOF) Then
                      'if not empty, then move to the first record
                      rstAttach.MoveFirst()
                      'loop until the end of the recordset is reached
                      Do Until rstAttach.EOF
                          'get the filename for each attachment in the field "Docs"
                          Dim fileName As String = rstAttach.Fields("FileName").Value
                          Dim fld As Field2 = rstAttach.Fields("FileData")
                          fld.SaveToFile("C:\Users\Meisam\Documents\test\" & myID & "_" & fileName)
                          rstAttach.MoveNext()
                      Loop
                  End If
                  rstMain.MoveNext()
              Loop
          End If
          'close the database
          db.Close()
      End Sub
      

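      【Comments】:

        【Solution 3】:

        (My original answer to this question was misleading. It worked okay for PDF files that were subsequently opened with Adobe Reader, but it did not always work properly for other types of files. The following is the corrected version.)

        Unfortunately we cannot directly retrieve the contents of a file in an Access Attachment field using OleDb. The Access Database Engine prepends some metadata to the binary contents of the file, and that metadata is included if we retrieve .FileData via OleDb.

        To illustrate, a document named "Document1.pdf" is saved to an Attachment field using the Access UI. The beginning of that PDF file looks like this: [image not preserved; the file begins with the "%PDF-1.4" signature]

        If we use the following code to try to extract the PDF file to disk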

        using (OleDbCommand cmd = new OleDbCommand())
        {
            cmd.Connection = con;
            cmd.CommandText = 
                    "SELECT Attachments.FileData " +
                    "FROM AttachTest " +
                    "WHERE Attachments.FileName='Document1.pdf'";
            using (OleDbDataReader rdr = cmd.ExecuteReader())
            {
                rdr.Read();
                byte[] fileData = (byte[])rdr[0];
                using (var fs = new FileStream(
                        @"C:\Users\Gord\Desktop\FromFileData.pdf", 
                        FileMode.Create, FileAccess.Write))
                {
                    fs.Write(fileData, 0, fileData.Length);
                    fs.Close();
                }
            }
        }
        

        then the resulting file will include the metadata at the beginning of the file (20 bytes in this case).

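        Adobe Reader is able to open this file because it is robust enough to ignore any "junk" that may appear in the file before the "%PDF-1.4" signature. Unfortunately, not all file formats and applications are so forgiving of extraneous bytes at the beginning of the file.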
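        To make the "extraneous bytes" concrete, here is a minimal sketch of stripping that prefix from a byte array. It assumes the layout used in Solution 1's code (the first four bytes give the index where the real content starts), which is undocumented and may not hold for every file; in particular, that answer notes the second field might flag compressed data, which this sketch does not handle. The `AttachmentHeader` class and the synthetic 20-byte header built in `Main` are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

static class AttachmentHeader
{
    // Hypothetical helper: returns the bytes that follow the Access header.
    // Assumption: bytes 0..3 hold the index where the real content starts.
    public static byte[] Strip(byte[] fileData)
    {
        if (fileData == null || fileData.Length < 4)
            throw new ArgumentException("Too short to contain a header.", "fileData");

        int contentStart = BitConverter.ToInt32(fileData, 0);
        if (contentStart < 4 || contentStart > fileData.Length)
            throw new InvalidOperationException("Unexpected header length: " + contentStart);

        byte[] content = new byte[fileData.Length - contentStart];
        Array.Copy(fileData, contentStart, content, 0, content.Length);
        return content;
    }

    static void Main()
    {
        // Synthetic header for a "pdf" attachment: 4 + 4 + 4 + 8 = 20 bytes.
        byte[] ext = Encoding.Unicode.GetBytes("pdf\0");   // extension plus null terminator
        byte[] header = BitConverter.GetBytes(20)           // index where the content starts
            .Concat(BitConverter.GetBytes(0))                // field of unknown meaning
            .Concat(BitConverter.GetBytes(4))                // extension length in characters
            .Concat(ext)
            .ToArray();
        byte[] payload = Encoding.ASCII.GetBytes("%PDF-1.4");
        byte[] stored = header.Concat(payload).ToArray();

        byte[] clean = Strip(stored);
        Console.WriteLine(Encoding.ASCII.GetString(clean)); // prints "%PDF-1.4"
    }
}
```

        This is only a workaround for cases where DAO is not available; it relies on reverse-engineered details rather than a supported API.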

        The only Official™ way to extract files from an Attachment field in Access is to use the .SaveToFile method of an ACE DAO Field2 object, like so:

        // required COM reference: Microsoft Office 14.0 Access Database Engine Object Library
        //
        // using Microsoft.Office.Interop.Access.Dao; ...
        var dbe = new DBEngine();
        Database db = dbe.OpenDatabase(@"C:\Users\Public\Database1.accdb");
        Recordset rstMain = db.OpenRecordset(
                "SELECT Attachments FROM AttachTest WHERE ID=1",
                RecordsetTypeEnum.dbOpenSnapshot);
        Recordset2 rstAttach = rstMain.Fields["Attachments"].Value;
        // test EOF first so .Value is never read when there is no current record
        while ((!rstAttach.EOF) && (!"Document1.pdf".Equals(rstAttach.Fields["FileName"].Value)))
        {
            rstAttach.MoveNext();
        }
        if (rstAttach.EOF)
        {
            Console.WriteLine("Not found.");
        }
        else
        {
            Field2 fld = (Field2)rstAttach.Fields["FileData"];
            fld.SaveToFile(@"C:\Users\Gord\Desktop\FromSaveToFile.pdf");
        }
        db.Close();
        

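        Note that if you try to use the .Value of the Field2 object you will still get the metadata at the beginning of the byte sequence; the .SaveToFile process is what strips it out.

        【Comments】: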

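        • Very interesting, I had been looking for something like this and couldn't find it.
        • How can I extract an array of files from an Attachment column?
        • @RuneJeppesen - Loop through the Recordset2, extract the individual files to disk, then add the file paths to your array. If you want the file contents in the array, read each file back from disk, add its contents (a byte array) to the array, then delete the disk file.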
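        The loop described in that last comment could be sketched roughly as follows. It reuses the DAO calls and the table and column names (`AttachTest`, `Attachments`, `ID=1`) from Solution 3; the `AttachmentReader` class, its `ReadAll` method, and the use of a random temp file name are assumptions added for illustration, and the code needs the same COM reference to compile.

```csharp
// required COM reference: Microsoft Office 14.0 Access Database Engine Object Library
using System.Collections.Generic;
using System.IO;
using Microsoft.Office.Interop.Access.Dao;

static class AttachmentReader
{
    // Hypothetical helper: returns the contents of every attachment in one row.
    public static List<byte[]> ReadAll(string dbPath)
    {
        var files = new List<byte[]>();
        var dbe = new DBEngine();
        Database db = dbe.OpenDatabase(dbPath);
        try
        {
            Recordset rstMain = db.OpenRecordset(
                    "SELECT Attachments FROM AttachTest WHERE ID=1",
                    RecordsetTypeEnum.dbOpenSnapshot);
            if (!rstMain.EOF)
            {
                Recordset2 rstAttach = (Recordset2)rstMain.Fields["Attachments"].Value;
                while (!rstAttach.EOF)
                {
                    // save to a temp file that does not exist yet, read it back, then clean up
                    string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
                    Field2 fld = (Field2)rstAttach.Fields["FileData"];
                    fld.SaveToFile(tempPath);
                    files.Add(File.ReadAllBytes(tempPath));
                    File.Delete(tempPath);
                    rstAttach.MoveNext();
                }
            }
        }
        finally
        {
            db.Close();
        }
        return files;
    }
}
```

        Going through disk looks roundabout, but it lets .SaveToFile remove the metadata instead of relying on the undocumented header layout.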