Exporting a pandas df to sqlite leads to duplicate datasets instead of one updated dataset

【Question title】: Exporting a pandas df to sqlite leads to duplicate datasets instead of one updated dataset
【Posted】: 2019-04-09 17:20:29
【Question description】:

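I am uploading a pandas DataFrame, read from a csv file, into a sqlite database via SQLAlchemy. The initial population works fine, but when I rerun the code below, the same data is exported again and the database ends up containing two identical datasets.

How can I change the code so that only new or changed data is uploaded into the database?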

import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Numeric, DateTime
from sqlalchemy.orm import sessionmaker
from datetime import datetime
import pandas as pd

# Set up the engine to connect to the SQLite database file
engine = create_engine('sqlite:///historical_data3.db')
conn = engine.connect()
Base = declarative_base()

# Declaration of the class in order to write into the database. This structure is standard and should align with SQLAlchemy's doc.
class Timeseries_Values(Base):
    __tablename__ = 'Timeseries_Values'

    #id = Column(Integer)
    Date = Column(DateTime, primary_key=True)
    ProductID = Column(Integer, primary_key=True)
    Value = Column(Numeric)

    # __repr__ must be a plain method; wrapping it in @property breaks repr()
    def __repr__(self):
        return "(Date='%s', ProductID='%s', Value='%s')" % (self.Date, self.ProductID, self.Value)



fileToRead = r'V:\PYTHON\ProjectDatabase\HistoricalDATA_V13.csv'
tableToWriteTo = 'Timeseries_Values'

# Read the CSV into a DataFrame, with ';' as separator and ',' as decimal mark.
df = pd.read_csv(fileToRead, sep=';', decimal=',', parse_dates=['Date'], dayfirst=True)
# orient='records' yields a list of dicts, one per row, which is the format a bulk insert expects.
listToWrite = df.to_dict(orient='records')

# Reflect the existing table definition from the database

metadata = sqlalchemy.schema.MetaData(bind=engine, reflect=True)
table = sqlalchemy.Table(tableToWriteTo, metadata, autoload=True)

# Open the session
Session = sessionmaker(bind=engine)
session = Session()

# Insert the dataframe into the database in one bulk
conn.execute(table.insert(), listToWrite)

# Commit the changes
session.commit()

# Close the session
session.close()

【Question discussion】:

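Did you create the table using the declarative model? If so, it should not allow duplicates to begin with, because { Date, ProductID } is a key.
Are you saying the problem lies in how I set up the database rather than in the "export code"?
Both. If your table has the primary key described in the declarative model, it will not allow inserting rows with duplicate values in the key columns. The export code has to account for the fact that duplicates may exist and decide how to handle them.
OK, thanks for the explanation. So I set up the table again using the declarative model. The initial export works fine, and when I try to export the same data into the table again, I get "sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed". That makes sense, but when I try to export a new dataset containing both new and old data, I get the same error. Do I have to change the "conn.execute(table.insert(), listToWrite)" code? I tried update() instead of insert(), but that did not help.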
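
To illustrate the behaviour described in these comments, here is a minimal sketch using only the standard-library sqlite3 module instead of SQLAlchemy. The in-memory database and the sample rows are made up for illustration and are not the asker's data. It shows that a plain bulk INSERT containing even one existing key is rejected as a whole, and that SQLite's `INSERT OR IGNORE` is one way to let the new rows through while skipping the existing ones.

```python
import sqlite3

# In-memory database with the same composite key as the declarative model.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Timeseries_Values (
        Date      TEXT    NOT NULL,
        ProductID INTEGER NOT NULL,
        Value     NUMERIC,
        PRIMARY KEY (Date, ProductID)
    )
    """
)

plain_insert = "INSERT INTO Timeseries_Values (Date, ProductID, Value) VALUES (?, ?, ?)"
ignore_insert = "INSERT OR IGNORE INTO Timeseries_Values (Date, ProductID, Value) VALUES (?, ?, ?)"

# Hypothetical sample rows: the first export succeeds.
first_export = [("2019-04-01", 1, 10.5), ("2019-04-01", 2, 20.25)]
with conn:
    conn.executemany(plain_insert, first_export)

# Second export: one new row followed by one row that is already stored.
second_export = [("2019-04-02", 1, 11.0), ("2019-04-01", 1, 10.5)]

# A plain INSERT raises on the duplicate key; "with conn" rolls the batch back,
# so the new row is not stored either.
try:
    with conn:
        conn.executemany(plain_insert, second_export)
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

count_after_failed_insert = conn.execute(
    "SELECT COUNT(*) FROM Timeseries_Values"
).fetchone()[0]

# INSERT OR IGNORE silently skips rows whose key already exists.
with conn:
    conn.executemany(ignore_insert, second_export)

count_after_ignore = conn.execute(
    "SELECT COUNT(*) FROM Timeseries_Values"
).fetchone()[0]
```

Note that `INSERT OR IGNORE` only skips existing keys; it does not update a row whose Value has changed.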

Tags: python pandas sqlite sqlalchemy dataset

【Solution 1】:

It works now. I have added the df.to_sql code:

import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Numeric, DateTime
from sqlalchemy.orm import sessionmaker
from datetime import datetime
import pandas as pd

# Set up the engine to connect to the SQLite database file
engine = create_engine('sqlite:///historical_data3.db')
conn = engine.connect()
Base = declarative_base()

# Declaration of the class in order to write into the database. This structure is standard and should align with SQLAlchemy's doc.
class Timeseries_Values(Base):
    __tablename__ = 'Timeseries_Values'

    #id = Column(Integer)
    Date = Column(DateTime, primary_key=True)
    ProductID = Column(Integer, primary_key=True)
    Value = Column(Numeric)


fileToRead = r'V:\PYTHON\ProjectDatabase\HistoricalDATA_V13.csv'
tableToWriteTo = 'Timeseries_Values'

# Read the CSV into a DataFrame, with ';' as separator and ',' as decimal mark.
df = pd.read_csv(fileToRead, sep=';', decimal=',', parse_dates=['Date'], dayfirst=True)
# orient='records' yields a list of dicts, one per row, which is the format a bulk insert expects.
listToWrite = df.to_dict(orient='records')

# Write the DataFrame to the table; if_exists='replace' drops and recreates the table first
df.to_sql(name='Timeseries_Values', con=conn, if_exists='replace')

metadata = sqlalchemy.schema.MetaData(bind=engine, reflect=True)
table = sqlalchemy.Table(tableToWriteTo, metadata, autoload=True)

# Open the session
Session = sessionmaker(bind=engine)
session = Session()

# Insert the dataframe into the database in one bulk
conn.execute(table.insert(), listToWrite)

# Commit the changes
session.commit()

# Close the session
session.close()
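
A caveat on the code above: `if_exists='replace'` makes pandas drop the table and recreate it from the DataFrame, so the table no longer carries the composite primary key from the declarative model, and the bulk insert that follows appears to write the same rows a second time. If the goal is to keep the key and upload only new or changed rows, an upsert is one alternative. The following is a rough sketch with the standard-library sqlite3 module, not the original poster's method. The function name `upsert_records` and the sample rows are hypothetical, and it assumes the dates have been converted to ISO strings before insertion (for example with `df['Date'].dt.strftime('%Y-%m-%d')`). `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.

```python
import sqlite3

# Upsert statement: insert the row, or overwrite Value if the key already exists.
UPSERT_SQL = """
INSERT INTO Timeseries_Values (Date, ProductID, Value)
VALUES (:Date, :ProductID, :Value)
ON CONFLICT (Date, ProductID) DO UPDATE SET Value = excluded.Value
"""


def upsert_records(conn, records):
    """Insert new rows and update Value for keys that are already stored.

    `records` is a list of dicts such as df.to_dict(orient='records') returns,
    with Date already converted to an ISO string (an assumption of this sketch).
    """
    with conn:  # commit on success, roll back on error
        conn.executemany(UPSERT_SQL, records)


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Timeseries_Values (
        Date      TEXT    NOT NULL,
        ProductID INTEGER NOT NULL,
        Value     NUMERIC,
        PRIMARY KEY (Date, ProductID)
    )
    """
)

# Hypothetical first export.
upsert_records(conn, [
    {"Date": "2019-04-01", "ProductID": 1, "Value": 10.5},
    {"Date": "2019-04-01", "ProductID": 2, "Value": 20.25},
])

# Hypothetical second export: one unchanged row, one changed row, one new row.
upsert_records(conn, [
    {"Date": "2019-04-01", "ProductID": 1, "Value": 10.5},
    {"Date": "2019-04-01", "ProductID": 2, "Value": 21.25},
    {"Date": "2019-04-02", "ProductID": 1, "Value": 11.0},
])

rows = conn.execute(
    "SELECT Date, ProductID, Value FROM Timeseries_Values ORDER BY Date, ProductID"
).fetchall()
```

Rerunning the same export with this approach leaves the table unchanged, because every key either matches an existing row or is new.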

【Discussion】: