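What's the best way to copy a subset of a table's rows from one database to another in Postgres?

【Question title】: What's the best way to copy a subset of a table's rows from one database to another in Postgres?
【Posted】: 2010-09-29 17:08:58
【Question】:

I have a production database with, say, ten million rows. I'd like to extract the 10,000 or so rows from the past hour of production and copy them to my local box. How do I do that?

Let's say the query is: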

SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00';

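How do I take the output, export it to some sort of dump file, and then import that dump file into my local development copy of the database, as quickly and easily as possible?

【Question comments】:

Tags: sql postgresql

【Solution 1】:

In psql, you just use copy with the query you gave us, export the result as CSV (or whatever format), switch databases with \c, and import it.

See \h copy in psql.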
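
One way to script this without server-side file access is psql's \copy meta-command, which reads and writes the file on the client rather than on the server. The sketch below only builds and runs the two psql invocations. The database names, the file path, and the helper names (`export_cmd`, `import_cmd`, `copy_rows`) are assumptions for illustration; connection options such as host and user are left out.

```python
import subprocess


def export_cmd(dbname, query, path):
    """Build a psql command that writes the rows of `query` to a client-side CSV file."""
    # \copy takes the rest of the line as its arguments, so the query is passed as-is.
    return ["psql", "-d", dbname, "-c",
            "\\copy ({}) TO '{}' WITH CSV".format(query, path)]


def import_cmd(dbname, table, path):
    """Build a psql command that loads a client-side CSV file into `table`."""
    return ["psql", "-d", dbname, "-c",
            "\\copy {} FROM '{}' WITH CSV".format(table, path)]


def copy_rows(src_db, dst_db, table, query, path):
    """Export from the source database, then import into the destination."""
    # check=True raises CalledProcessError if psql exits with a non-zero status.
    subprocess.run(export_cmd(src_db, query, path), check=True)
    subprocess.run(import_cmd(dst_db, table, path), check=True)
```

A call might look like `copy_rows("proddb", "devdb", "mytable", "SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00'", "/tmp/mytable.csv")`. The path must not contain a single quote, and the destination table has to exist already.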

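【Comments】:

I get this: ERROR: must be superuser to COPY to or from a file ... Is there any way around it, assuming I'm not going to run arbitrary code as superuser?
You should edit the original question to add this constraint, to avoid answers like Michael Buen's.

【Solution 2】:

With the constraint you added (not being superuser), I can't find a pure-SQL solution. But doing it in your favorite language is quite simple: open one connection to the "old" database and another to the new one, SELECT from one and INSERT into the other. Here is a tested and working solution in Python.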

#!/usr/bin/env python3

"""
Copy a *part* of a database to another one. See
<http://stackoverflow.com/questions/414849/whats-the-best-way-to-copy-a-subset-of-a-tables-rows-from-one-database-to-anoth>

With PostgreSQL, the only pure-SQL solution is to use COPY, which is
not available to the ordinary user.

Stephane Bortzmeyer <bortzmeyer@nic.fr>
"""

import psycopg2

table_name = "Tests"
# List here the columns you want to copy. Yes, "*" would be simpler
# but also more brittle.
names = ["id", "uuid", "date", "domain", "broken", "spf"]
constraint = "date > '2009-01-01'"

old_db = psycopg2.connect("dbname=dnswitness-spf")
new_db = psycopg2.connect("dbname=essais")
old_cursor = old_db.cursor()
old_cursor.execute("SET TRANSACTION READ ONLY")  # Security
new_cursor = new_db.cursor()
old_cursor.execute("SELECT %s FROM %s WHERE %s" %
                   (",".join(names), table_name, constraint))
print("%i rows retrieved" % old_cursor.rowcount)

# Build the INSERT once, with one named placeholder per column,
# e.g. %(id)s, so psycopg2 does the value quoting.
placeholders = ["%%(%s)s" % name for name in names]
command = "INSERT INTO %s (%s) VALUES (%s)" % \
          (table_name, ",".join(names), ",".join(placeholders))
for row in old_cursor.fetchall():
    new_cursor.execute(command, dict(zip(names, row)))

# psycopg2 opens a transaction implicitly on the first execute(),
# so a single commit makes all the inserts visible at once.
new_db.commit()
old_cursor.close()
new_cursor.close()
old_db.close()
new_db.close()
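
As a possible alternative to row-by-row INSERTs, COPY ... TO STDOUT and COPY ... FROM STDIN stream data over the client connection instead of touching server-side files, so they should not need superuser rights. The sketch below uses psycopg2's `copy_expert` for both directions. The function name `copy_subset` and the in-memory buffer are assumptions for illustration; the table name and WHERE clause are interpolated as-is, so they must come from trusted input.

```python
import io


def copy_subset(src_cur, dst_cur, table, where):
    """Stream the rows of `table` matching `where` from one cursor to another.

    Both arguments are expected to be psycopg2 cursors. Returns the number
    of rows transferred.
    """
    buf = io.StringIO()
    # Export in COPY text format into an in-memory buffer.
    src_cur.copy_expert(
        "COPY (SELECT * FROM {} WHERE {}) TO STDOUT".format(table, where), buf)
    buf.seek(0)
    # Feed the same buffer to the destination table.
    dst_cur.copy_expert("COPY {} FROM STDIN".format(table), buf)
    # In text format, newlines inside values are escaped, so each row is one line.
    return buf.getvalue().count("\n")
```

With two connections opened through `psycopg2.connect` as in the script above, this would be called as `copy_subset(old_db.cursor(), new_db.cursor(), "mytable", "date > '2009-01-05 12:00:00'")`, followed by `new_db.commit()`. Holding everything in memory is reasonable for about 10,000 rows; a temporary file would be the safer choice for much larger extracts.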

【Comments】:

【Solution 3】:

Source server:

BEGIN;

CREATE TEMP TABLE mmm_your_table_here AS
    SELECT * FROM your_table_here WHERE your_condition_here;

COPY mmm_your_table_here TO 'u:\\source.copy';

ROLLBACK;

Your local box:

-- your_destination_table_here must be created first on your box

COPY your_destination_table_here FROM 'u:\\source.copy';

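Article: http://www.postgresql.org/docs/8.1/static/sql-copy.html

【Comments】:

The OP clearly said (in a comment on Keltia's answer) that he is not superuser, so COPY is not an option.
Since the asker did not edit the question to include the superuser constraint, this should be considered a good answer.

【Solution 4】:

Source: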

psql -c "COPY (SELECT * FROM mytable WHERE ...) TO STDOUT" > mytable.copy

Destination:

psql -c "COPY mytable FROM STDIN" < mytable.copy

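This assumes mytable has the same schema and column order in both the source and the destination. If that isn't the case, you could try STDOUT CSV HEADER and STDIN CSV HEADER instead of STDOUT and STDIN, but I haven't tried it.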
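
If the column order differs between the two tables, naming the columns explicitly on both sides is likely more dependable than relying on the header line: as far as I know, HEADER on import only makes COPY skip the first line and does not map columns by name. The sketch below builds the two COPY statements with a shared column list. The helper name `copy_statements` and the example columns are assumptions; identifiers and the WHERE clause are interpolated as-is and must be trusted.

```python
def copy_statements(table, columns, where):
    """Return (export_sql, import_sql) that use the same explicit column list."""
    cols = ", ".join(columns)
    export_sql = "COPY (SELECT {} FROM {} WHERE {}) TO STDOUT WITH CSV HEADER".format(
        cols, table, where)
    # The column list tells COPY which target column each CSV field belongs to,
    # regardless of the physical column order of the destination table.
    import_sql = "COPY {} ({}) FROM STDIN WITH CSV HEADER".format(table, cols)
    return export_sql, import_sql
```

Each string can then be passed to `psql -c` in place of the plain COPY commands above, with the same redirection to and from mytable.copy.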

If you have any custom triggers on mytable, you may need to disable them on import:

psql -c "ALTER TABLE mytable DISABLE TRIGGER USER; \
         COPY mytable FROM STDIN; \
         ALTER TABLE mytable ENABLE TRIGGER USER" < mytable.copy
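
The same sequence can be wrapped in a single transaction from Python, so that a failed import also rolls back the DISABLE TRIGGER change (ALTER TABLE is transactional in PostgreSQL). This is a minimal sketch, assuming a psycopg2 connection in its default non-autocommit mode and a role that owns the table; the function name `import_with_triggers_disabled` is hypothetical.

```python
def import_with_triggers_disabled(conn, table, data_file):
    """Load COPY-format data into `table` with user triggers disabled.

    `conn` is expected to be a psycopg2 connection and `data_file` an open
    file object holding the output of COPY ... TO STDOUT.
    """
    cur = conn.cursor()
    try:
        cur.execute("ALTER TABLE {} DISABLE TRIGGER USER".format(table))
        cur.copy_expert("COPY {} FROM STDIN".format(table), data_file)
        cur.execute("ALTER TABLE {} ENABLE TRIGGER USER".format(table))
        conn.commit()
    except Exception:
        # Rolling back undoes the inserts and restores the trigger state.
        conn.rollback()
        raise
    finally:
        cur.close()
```

The table name is interpolated as-is, so it must come from trusted input.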

【Comments】: