Save dataframe in Postgresql Database with SERIAL Autogenerated ID (answers)

【Question title】: Save dataframe in Postgresql Database with SERIAL Autogenerated ID
【Posted】: 2019-12-03 00:07:05
【Question】:

I have a dataframe that looks like this:

     word classification  counter
0   house           noun        2
1     the        article        2
2   white      adjective        1
3  yellow      adjective        1

I want to store it in a Postgresql table with the following definition:

CREATE TABLE public.word_classification (
    id SERIAL,
    word character varying(100),
    classification character varying(10),
    counter integer,
    start_date date,
    end_date date
);
ALTER TABLE public.word_classification OWNER TO postgres;

My current basic setup is as follows:

from sqlalchemy import create_engine
import pandas as pd

# Postgres username, password, and database name
POSTGRES_ADDRESS = 'localhost' ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
POSTGRES_PORT = '5432'
POSTGRES_USERNAME = 'postgres' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
POSTGRES_PASSWORD = 'BVict31C' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD 
POSTGRES_DBNAME = 'local-sandbox-dev' ## CHANGE THIS TO YOUR DATABASE NAME
# A long string that contains the necessary Postgres login information
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'.format(username=POSTGRES_USERNAME,password=POSTGRES_PASSWORD,ipaddress=POSTGRES_ADDRESS,port=POSTGRES_PORT,dbname=POSTGRES_DBNAME))
# Create the connection
cnx = create_engine(postgres_str)

data=[['the','article',0],['house','noun',1],['yellow','adjective',2],
      ['the','article',4],['house','noun',5],['white','adjective',6]]
df = pd.DataFrame(data, columns=['word','classification','position'])
df_db = pd.DataFrame(columns=['word','classification','counter','start_date','end_date'])

count_series=df.groupby(['word','classification']).size()
new_df = count_series.to_frame(name = 'counter').reset_index()
df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000)

I want to insert into the table the same way I can with plain SQL:

insert into word_classification(word, classification, counter)values('hello','world',1);

Currently I get an error when inserting into the table, because the dataframe index is being passed as a column:

(psycopg2.errors.UndefinedColumn) column "index" of relation "word_classification" does not exist
LINE 1: INSERT INTO word_classification (index, word, classification...
                                         ^

[SQL: INSERT INTO word_classification (index, word, classification, counter) VALUES (%(index)s, %(word)s, %(classification)s, %(counter)s)]
[parameters: ({'index': 0, 'word': 'house', 'classification': 'noun', 'counter': 2}, {'index': 1, 'word': 'the', 'classification': 'article', 'counter': 2}, {'index': 2, 'word': 'white', 'classification': 'adjective', 'counter': 1}, {'index': 3, 'word': 'yellow', 'classification': 'adjective', 'counter': 1})]

I have been looking for a way to stop the index from being passed, with no luck.

Thanks for your help.

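【Question comments】:

What are you trying to achieve by passing "index" to the INSERT statement? INSERT adds a new row and sets the values of the specified columns. Your table has no column named "index".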

Tags: python pandas postgresql dataframe sql-insert

【Solution 1】:

Turn off the index when writing to the database by passing index=False to to_sql, as follows:

df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000, index=False)
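
To see the effect end to end, here is a minimal sketch that can be run without a Postgres server. It assumes an in-memory SQLite database as a stand-in for Postgresql, with `INTEGER PRIMARY KEY AUTOINCREMENT` playing the role of `SERIAL`; the table layout and the dataframe come from the question, while the `conn` and `rows` names are only illustrative. With `index=False`, only `word`, `classification` and `counter` are sent, and the database fills in `id` itself.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for Postgresql so the sketch is
# self-contained; INTEGER PRIMARY KEY AUTOINCREMENT plays the role of SERIAL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE word_classification (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        word VARCHAR(100),
        classification VARCHAR(10),
        counter INTEGER,
        start_date DATE,
        end_date DATE
    )
    """
)

# Same sample data and aggregation as in the question.
data = [['the', 'article', 0], ['house', 'noun', 1], ['yellow', 'adjective', 2],
        ['the', 'article', 4], ['house', 'noun', 5], ['white', 'adjective', 6]]
df = pd.DataFrame(data, columns=['word', 'classification', 'position'])
count_series = df.groupby(['word', 'classification']).size()
new_df = count_series.to_frame(name='counter').reset_index()

# index=False keeps the dataframe index out of the INSERT column list,
# so the id column is generated by the database.
new_df.to_sql('word_classification', conn, if_exists='append', index=False)

rows = conn.execute(
    "SELECT id, word, classification, counter "
    "FROM word_classification ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Against the real Postgresql table, the same `to_sql(..., index=False)` call with the SQLAlchemy engine `cnx` from the question should behave the same way; the `start_date` and `end_date` columns are simply left NULL because the dataframe does not supply them.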

【Comments】: