【Question title】:Quickest Way to Add New Rows to Datatable That Could Contain Duplicates
【Posted】:2013-11-11 16:53:40
【Question】:

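I have a table full of stock price data. Each row has a unique combination of ticker symbol and date. I load new data by fetching a CSV file that contains the price data for every stock for each day. I know the CSV files contain duplicates. I only want to add data that is not already in my table. What is the quickest way to do this?

Should I try to add every row and catch each exception? Or should I compare each row against my table by reading the table to see whether the row already exists? Or is there another option?

Additional information

This is what I have been doing. For each line in the CSV file, I query my table to see whether the row already exists.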

Dim strURL As String
    Dim strBuffer As String
    strURL = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    strBuffer = RequestWebData(strURL)
    Dim sReader As New StringReader(strBuffer)
    Dim List As New List(Of String)
    Do While sReader.Peek >= 0
        List.Add(sReader.ReadLine)
    Loop
    List.RemoveAt(0)
    Dim lines As String() = List.ToArray
    sReader.Close()
    For Each line In lines
        Dim checkDate = line.Split(",")(0).Trim()
        Dim dr As OleDbDataReader
        Dim cmd2 As New OleDb.OleDbCommand("SELECT * FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?", con)
        cmd2.Parameters.AddWithValue("?", tickerValue)
        cmd2.Parameters.AddWithValue("?", checkDate)
        dr = cmd2.ExecuteReader
        If dr.Read() = 0 Then
            Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
            cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
            cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = checkDate
            cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = line.Split(",")(1).Trim
            cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = line.Split(",")(2).Trim
            cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = line.Split(",")(3).Trim
            cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = line.Split(",")(4).Trim
            cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = line.Split(",")(5).Trim
            cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = line.Split(",")(6).Trim
            cmd3.ExecuteNonQuery()
        End If
        dr.Close()
    Next

This is what I have switched to, and it throws this exception: The changes you requested to the table were not successful because they would create duplicate values in the index, primary key, or relationship. Change the data in the field or fields that contain duplicate data, remove the index, or redefine the index to permit duplicate entries and try again. I could catch this exception every time and ignore it until I reach a new row.

Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Debug.WriteLine(strURL)
    Dim strBuffer As String = RequestWebData(strURL)
    Using streamReader = New StringReader(strBuffer)
        Using reader = New CsvReader(streamReader)
            reader.ReadHeaderRecord()
            While reader.HasMoreRecords
                Dim dataRecord As DataRecord = reader.ReadDataRecord()
                Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
                cmd3.ExecuteNonQuery()
            End While
        End Using
    End Using

I just want to use the most efficient method.

Update

Based on the answer below, this is the code I have so far:

 Dim strURL As String = "http://ichart.yahoo.com/table.csv?s=" & tickerValue
    Dim strBuffer As String = RequestWebData(strURL)
    Using streamReader = New StringReader(strBuffer)
        Using reader = New CsvReader(streamReader)
            ' the CSV file has a header record, so we read that first
            reader.ReadHeaderRecord()

            While reader.HasMoreRecords
                Dim dataRecord As DataRecord = reader.ReadDataRecord()
                Dim cmd3 As OleDbCommand = New OleDbCommand("INSERT INTO " & tblName & "(Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & "FROM DUAL " & "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ?)", con)
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.Parameters.Add("@[Open]", OleDbType.VarChar).Value = dataRecord.Item("Open")
                cmd3.Parameters.Add("@High", OleDbType.VarChar).Value = dataRecord.Item("High")
                cmd3.Parameters.Add("@Low", OleDbType.VarChar).Value = dataRecord.Item("Low")
                cmd3.Parameters.Add("@[Close]", OleDbType.VarChar).Value = dataRecord.Item("Close")
                cmd3.Parameters.Add("@Volume", OleDbType.VarChar).Value = dataRecord.Item("Volume")
                cmd3.Parameters.Add("@Adj_Close", OleDbType.VarChar).Value = dataRecord.Item("Adj Close")
                cmd3.Parameters.Add("@Ticker", OleDbType.VarChar).Value = tickerValue
                cmd3.Parameters.Add("@[Date]", OleDbType.VarChar).Value = dataRecord.Item("Date")
                cmd3.ExecuteNonQuery()
            End While
        End Using
    End Using

It gives me this error: Data type mismatch in criteria expression.

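【Comments】:

  • How do you load the table, and how do you identify duplicates? Why do you think adding a duplicate will raise an exception?
  • @TimSchmelter I have added additional information to the original post to answer your questions.
  • Which DBMS are you using?
  • @Fabian Microsoft Access

Tags: sql vb.net primary-key sql-insert database-table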


【Solution 1】:

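Most DBMSs support a (non-standard) clause on the INSERT command that ignores duplicates, for example:

MySQL: INSERT IGNORE INTO ...

SQLite: INSERT OR IGNORE INTO ...

This is the fastest approach in non-batch mode, because you do not have to read from the database before writing.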
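As a rough illustration of the SQLite variant, here is a minimal Python sketch using the standard sqlite3 module. The table name `prices`, its columns, and the sample rows are made up for the example; the only assumption that matters is that (ticker, date) is the primary key, since that is what OR IGNORE checks against.

```python
import sqlite3

# Hypothetical schema: (ticker, date) is the composite primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices ("
    " ticker TEXT NOT NULL,"
    " date TEXT NOT NULL,"
    " close REAL,"
    " PRIMARY KEY (ticker, date))"
)

# Made-up sample rows; the third repeats the key of the first.
rows = [
    ("ABC", "2013-11-08", 10.0),
    ("ABC", "2013-11-11", 11.0),
    ("ABC", "2013-11-08", 10.0),
]

# OR IGNORE skips rows that would violate the key instead of raising.
conn.executemany("INSERT OR IGNORE INTO prices VALUES (?, ?, ?)", rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(row_count)  # 2: the duplicate row was ignored
```

Without a primary key or unique index on the key columns there is nothing for the clause to conflict with, so the duplicate would be inserted.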

You can do the same thing with standard SQL:

INSERT INTO ... 
SELECT <your values> 
WHERE NOT EXISTS ( <query for your values by id> );

Or (when you explicitly need a FROM clause):

INSERT INTO ... 
SELECT <your values> 
FROM DUAL 
WHERE NOT EXISTS ( <query for your values by id> );
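The FROM-less form can be tried in SQLite, which accepts a SELECT with no FROM clause. The sketch below is only an illustration: the `prices` table, the `insert_if_missing` helper, and the sample values are invented, and the table deliberately has no primary key so that the WHERE NOT EXISTS guard is the only thing preventing duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key here, so only the WHERE NOT EXISTS guard prevents duplicates.
conn.execute("CREATE TABLE prices (ticker TEXT, date TEXT, close REAL)")

INSERT_IF_MISSING = (
    "INSERT INTO prices (ticker, date, close) "
    "SELECT ?, ?, ? "
    "WHERE NOT EXISTS (SELECT 1 FROM prices WHERE ticker = ? AND date = ?)"
)


def insert_if_missing(ticker, date, close):
    """Insert one row unless a row with the same (ticker, date) exists."""
    # Placeholders are positional: three values for the row, then the key again.
    conn.execute(INSERT_IF_MISSING, (ticker, date, close, ticker, date))


insert_if_missing("ABC", "2013-11-08", 10.0)
insert_if_missing("ABC", "2013-11-08", 10.0)  # same key: the SELECT yields no row
insert_if_missing("ABC", "2013-11-11", 11.0)

stored = conn.execute("SELECT ticker, date FROM prices ORDER BY date").fetchall()
print(stored)
```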

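Edit

MS Access has no built-in DUAL table (that is, a table that always contains exactly one row), but Access requires a FROM clause. So you have to create your own DUAL table: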

CREATE TABLE DUAL (DUMMY INTEGER);
INSERT INTO DUAL VALUES (1);

You only need to do this once. Then, in your code, you insert with something like

INSERT INTO MyTable (A,B,C,D)
SELECT 123, 456, 'Hello', 'World'
FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE A = 123 AND B = 456);
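To see the one-row helper table at work outside Access, the same statements can be run against SQLite from Python. This is a sketch under assumptions: SQLite stands in for Access, and the column types chosen for MyTable are guesses based on the literal values in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One-time setup: a helper table that holds exactly one row.
conn.execute("CREATE TABLE DUAL (DUMMY INTEGER)")
conn.execute("INSERT INTO DUAL VALUES (1)")

# Column types are assumed from the literal values in the example.
conn.execute("CREATE TABLE MyTable (A INTEGER, B INTEGER, C TEXT, D TEXT)")

sql = (
    "INSERT INTO MyTable (A, B, C, D) "
    "SELECT ?, ?, ?, ? FROM DUAL "
    "WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE A = ? AND B = ?)"
)

# Running the same insert three times still leaves a single row.
for _ in range(3):
    conn.execute(sql, (123, 456, "Hello", "World", 123, 456))
conn.commit()

rows = conn.execute("SELECT A, B, C, D FROM MyTable").fetchall()
print(rows)
```

The first execution finds no matching row, so the SELECT returns the one constant tuple and it is inserted; the later executions select nothing.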

So, for your example, use:

' tblName must be concatenated in both places; "AND ..." stands for any further key columns
Dim cmd3 As OleDbCommand = New OleDbCommand( _
    "INSERT INTO " & tblName & _
    " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) " & _
    "SELECT ?, ?, ?, ?, ?, ?, ?, ? " & _
    "FROM DUAL " & _
    "WHERE NOT EXISTS (SELECT 1 FROM " & tblName & " WHERE Ticker = ? AND [Date] = ? AND ...)", con)

(The WHERE clause depends on your key columns.)
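For comparison, here is a hedged end-to-end sketch of the same pattern in Python with SQLite standing in for Access. The CSV text and its values are invented, the `prices` table and the `load_csv` helper are hypothetical, and the header names follow the ones the question's code reads ("Date", "Open", ..., "Adj Close"). Note that the key values are supplied a second time at the end of the parameter tuple, because the `?` placeholders are bound by position.

```python
import csv
import io
import sqlite3

# Made-up CSV text using the header names the question's code reads.
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2013-11-11,11.0,11.5,10.8,11.2,1000,11.2
2013-11-08,10.0,10.4,9.9,10.1,900,10.1
2013-11-08,10.0,10.4,9.9,10.1,900,10.1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DUAL (DUMMY INTEGER)")
conn.execute("INSERT INTO DUAL VALUES (1)")
conn.execute(
    "CREATE TABLE prices (Ticker TEXT, [Date] TEXT, [Open] REAL, High REAL,"
    " Low REAL, [Close] REAL, Volume INTEGER, Adj_Close REAL)"
)

SQL = (
    "INSERT INTO prices"
    " (Ticker, [Date], [Open], High, Low, [Close], Volume, Adj_Close) "
    "SELECT ?, ?, ?, ?, ?, ?, ?, ? FROM DUAL "
    "WHERE NOT EXISTS (SELECT 1 FROM prices WHERE Ticker = ? AND [Date] = ?)"
)


def load_csv(ticker, text):
    """Insert every CSV record whose (ticker, date) is not stored yet."""
    for rec in csv.DictReader(io.StringIO(text)):
        params = (
            ticker, rec["Date"], float(rec["Open"]), float(rec["High"]),
            float(rec["Low"]), float(rec["Close"]), int(rec["Volume"]),
            float(rec["Adj Close"]),
            ticker, rec["Date"],  # key values again, for the NOT EXISTS subquery
        )
        conn.execute(SQL, params)
    conn.commit()  # one commit for the whole file


load_csv("ABC", CSV_TEXT)
load_csv("ABC", CSV_TEXT)  # reloading the same file adds nothing
count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(count)
```

Converting each field to a number before binding, rather than passing every value as text, keeps the bound types in line with the column types.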

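【Comments】:

  • I have edited the post for MS Access. Unfortunately, Access neither has a DUAL table nor allows SELECT without FROM. So you have to create your own DUAL table.
  • You only need to do it once. It always holds a single row (in other DBMSs it is a predefined "constant table"), and you can create it manually in Access. It just needs to be there for your code to work.
  • There are two kinds of INSERT statement in SQL: INSERT INTO ... VALUES and INSERT INTO ... SELECT. The latter inserts the result of a query into the table, which can be zero or any number of rows. The "trick" in my post (which is quite standard) is to select a constant tuple (the row you want to insert) from a table that always contains exactly one row (DUAL). With WHERE NOT EXISTS ( ... ), the SELECT FROM DUAL either gives you one row (when the record does not exist yet), so the INSERT inserts that row, or it gives you zero rows, so no record is inserted.
  • I have added a concrete example to my post (for a table MyTable).
  • You can create it like this: CREATE TABLE DUAL (DUMMY INT); INSERT INTO DUAL VALUES (1); Just make sure you do this only once: DUAL must contain exactly one record. So I suggest that if CREATE TABLE DUAL fails, you do not execute INSERT INTO DUAL, because you already have the table from a previous run.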
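The "create it only once" advice from the last comment can be sketched as a setup step that is safe to repeat. This is an illustration in Python against SQLite, not Access: `ensure_dual` is a hypothetical helper, and it relies on SQLite-specific syntax (CREATE TABLE IF NOT EXISTS and a FROM-less SELECT). In Access, the comment's approach of skipping the INSERT when CREATE TABLE fails plays the same role.

```python
import sqlite3


def ensure_dual(conn):
    """Create the one-row DUAL table if needed; safe to call repeatedly."""
    conn.execute("CREATE TABLE IF NOT EXISTS DUAL (DUMMY INTEGER)")
    # Add the single row only when the table is still empty.
    conn.execute(
        "INSERT INTO DUAL SELECT 1 WHERE NOT EXISTS (SELECT 1 FROM DUAL)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_dual(conn)
ensure_dual(conn)  # second call changes nothing

dual_rows = conn.execute("SELECT COUNT(*) FROM DUAL").fetchone()[0]
print(dual_rows)  # 1
```

If DUAL ever held two rows, every guarded INSERT ... SELECT FROM DUAL would produce two copies of the new row, which is why the row count matters.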