【Question title】: Save dataframe in PostgreSQL database with SERIAL autogenerated ID
【Posted】: 2019-12-03 00:07:05
【Question】:

I have a dataframe of the following form:

     word classification  counter
0   house           noun        2
1     the        article        2
2   white      adjective        1
3  yellow      adjective        1

I want to store it in a PostgreSQL table with the following definition:

CREATE TABLE public.word_classification (
    id SERIAL,
    word character varying(100),
    classification character varying(10),
    counter integer,
    start_date date,
    end_date date
);
ALTER TABLE public.word_classification OWNER TO postgres;

My current basic setup is as follows:

from sqlalchemy import create_engine
import pandas as pd

# Postgres username, password, and database name
POSTGRES_ADDRESS = 'localhost' ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
POSTGRES_PORT = '5432'
POSTGRES_USERNAME = 'postgres' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
POSTGRES_PASSWORD = 'BVict31C' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD 
POSTGRES_DBNAME = 'local-sandbox-dev' ## CHANGE THIS TO YOUR DATABASE NAME
# A long string that contains the necessary Postgres login information
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'.format(username=POSTGRES_USERNAME,password=POSTGRES_PASSWORD,ipaddress=POSTGRES_ADDRESS,port=POSTGRES_PORT,dbname=POSTGRES_DBNAME))
# Create the connection
cnx = create_engine(postgres_str)

data=[['the','article',0],['house','noun',1],['yellow','adjective',2],
      ['the','article',4],['house','noun',5],['white','adjective',6]]
df = pd.DataFrame(data, columns=['word','classification','position'])
df_db = pd.DataFrame(columns=['word','classification','counter','start_date','end_date'])

count_series=df.groupby(['word','classification']).size()
new_df = count_series.to_frame(name = 'counter').reset_index()
df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000)

I would like to insert into the table the same way I can with plain SQL:

insert into word_classification(word, classification, counter)values('hello','world',1);

Currently the insert fails with an error because the dataframe index is being passed as a column:

(psycopg2.errors.UndefinedColumn) column "index" of relation "word_classification" does not exist
LINE 1: INSERT INTO word_classification (index, word, classification...
                                         ^

[SQL: INSERT INTO word_classification (index, word, classification, counter) VALUES (%(index)s, %(word)s, %(classification)s, %(counter)s)]
[parameters: ({'index': 0, 'word': 'house', 'classification': 'noun', 'counter': 2}, {'index': 1, 'word': 'the', 'classification': 'article', 'counter': 2}, {'index': 2, 'word': 'white', 'classification': 'adjective', 'counter': 1}, {'index': 3, 'word': 'yellow', 'classification': 'adjective', 'counter': 1})]

I have been looking for a way to stop the index from being passed, with no luck.

Thanks for your help

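【Comments】:

  • What are you trying to achieve by passing "index" to the INSERT statement? INSERT adds a new row and sets the values of the specified columns. Your table has no column named "index".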
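
To illustrate where the "index" column in the error comes from: `DataFrame.to_sql` writes the dataframe index as a column by default (`index=True`), and an unnamed index is written under the name `index`. The sketch below is only an illustration, not the asker's setup: it uses an in-memory SQLite database so that it runs without a PostgreSQL server, and the table names `with_index` and `without_index` are made up for the example.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database, used only so the sketch runs without a server.
conn = sqlite3.connect(":memory:")

new_df = pd.DataFrame({
    "word": ["house", "the"],
    "classification": ["noun", "article"],
    "counter": [2, 2],
})

# Default behaviour: the dataframe index is written as an extra column.
new_df.to_sql("with_index", conn)
# With index=False only the dataframe's own columns are written.
new_df.to_sql("without_index", conn, index=False)

def column_names(table):
    """Return the column names of a table (hypothetical helper for this sketch)."""
    cursor = conn.execute(f"SELECT * FROM {table} LIMIT 0")
    return [description[0] for description in cursor.description]

cols_with = column_names("with_index")
cols_without = column_names("without_index")
print(cols_with)
print(cols_without)
```

When the target table already exists and has no `index` column, as in the question, the default behaviour produces exactly the `UndefinedColumn` error shown above.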

Tags: python pandas postgresql dataframe sql-insert


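【Solution 1】:

Turn off the index when writing to the database, so that only the dataframe's own columns appear in the INSERT: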

df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000, index=False)
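
A minimal end-to-end sketch of this fix is shown below. It makes one loud assumption: an in-memory SQLite database stands in for PostgreSQL so the example runs anywhere, with `INTEGER PRIMARY KEY AUTOINCREMENT` playing the role of `SERIAL`. Against the real database you would pass the SQLAlchemy engine `cnx` from the question instead of `conn`.

```python
import sqlite3
import pandas as pd

# Assumption: SQLite stands in for PostgreSQL; AUTOINCREMENT mimics SERIAL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE word_classification (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        word VARCHAR(100),
        classification VARCHAR(10),
        counter INTEGER,
        start_date DATE,
        end_date DATE
    )
    """
)

# Same sample data and aggregation as in the question.
data = [['the', 'article', 0], ['house', 'noun', 1], ['yellow', 'adjective', 2],
        ['the', 'article', 4], ['house', 'noun', 5], ['white', 'adjective', 6]]
df = pd.DataFrame(data, columns=['word', 'classification', 'position'])
count_series = df.groupby(['word', 'classification']).size()
new_df = count_series.to_frame(name='counter').reset_index()

# index=False keeps the dataframe index out of the INSERT, so the database
# generates the id values itself.
new_df.to_sql('word_classification', conn, if_exists='append', index=False)

stored = pd.read_sql_query(
    "SELECT id, word, classification, counter "
    "FROM word_classification ORDER BY id",
    conn,
)
print(stored)
```

The columns not present in the dataframe (`id`, `start_date`, `end_date`) are simply omitted from the INSERT, so `id` is filled in by the database and the two date columns are left NULL.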
