【问题标题】:Postgres query over ODBC a order of magnitude slower?Postgres 通过 ODBC 查询要慢一个数量级?
【发布时间】:2011-02-22 19:19:02
【问题描述】:

我们有一个应用程序,它通过 psqlodbc 09.00.0200 驱动程序以下列方式从 PostgreSQL 9.0.3 数据库中获取一些数据:

1) SQLExecDirect 与START TRANSACTION
2) 带有

的 SQLExecDirect
DECLARE foo SCROLL CURSOR FOR  
SELECT table.alotofcolumns  
FROM table  
ORDER BY name2, id LIMIT 10000

3) SQLPrepare with

SELECT table.alotofcolumns, l01.languagedescription
FROM fetchcur('foo', ?, ?) table (column definitions) 
LEFT OUTER JOIN languagetable l01 ON (l01.lang = 'EN'
                                  AND l01.type = 'some type'
                                  AND l01.grp  = 'some group'
                                  AND l01.key  = table.somecolumn)
[~20 more LEFT OUTER JOINS in the same style, but for an other column]

4) SQLExecute,param1 设置为 SQL_FETCH_RELATIVE,param2 设置为 1
5) SQLExecute,param1 设置为 SQL_FETCH_RELATIVE,param2 设置为 -1
6) SQLExecute,param1 设置为 SQL_FETCH_RELATIVE,param2 设置为 0
7) 全部解除分配,关闭游标,结束事务

函数 fetchcur 执行FETCH RELATIVE $3 IN $1 INTO rec,其中 rec 是一条记录并返回该记录。步骤 4-6 在用户请求下一次又一次地执行,同时在此事务中执行了更多查询。在发出另一个用户请求之前也可能需要相当长的时间。通常查询需要很长时间:

4) ~ 130 毫秒
5) ~ 115 毫秒
6) ~ 110 毫秒

这对于快速的用户体验来说通常太慢了。所以我在 psql 命令行中尝试了相同的语句,并打开了 \timing。对于步骤 3-6,我使用了该语句:

3)

PREPARE p_foo (INTEGER, INTEGER) AS 
SELECT table.alotofcolumns, l01.languagedescription
FROM fetchcur('foo', $1, $2) table (column definitions) 
LEFT OUTER JOIN languagetable l01 ON (l01.lang = 'EN'
                                  AND l01.type = 'some type'
                                  AND l01.grp  = 'some group'
                                  AND l01.key  = table.somecolumn)
[~20 more LEFT OUTER JOINS in the same style, but for an other column]

4-6)

EXPLAIN ANALYZE EXECUTE p_foo (6, x);

第一次 EXECUTE 花费了 89 毫秒,但随后降至约 7 毫秒。即使我在两次执行之间等待几分钟,每次查询的时间也保持在 10 毫秒以下。那么额外的 100 毫秒会在哪里消失呢?应用程序和数据库在同一个系统上,所以网络延迟应该不是问题。每个 LEFT OUTER JOIN 只返回一行,并且只将该结果的一列添加到结果集中。所以结果是一行有大约 200 列,主要是 VARCHAR 和 INTEGER 类型。但这不应该是在同一台机器上传输大约 100 毫秒的数据。所以任何提示都会有所帮助。

机器有 2 GB 的 RAM,参数设置为:

shared_buffers = 512MB
effective_cache_size = 256MB
work_mem = 16MB
maintenance_work_mem = 256MB
temp_buffers = 8MB
wal_buffers= 1MB


编辑:我刚刚发现了如何从 psqlodbc 创建一个 mylog,但我在其中找不到时间值。

EDIT2:也可以为每一行添加一个时间戳。这确实表明在 psqlodbc 驱动程序收到响应之前需要超过 100 毫秒。所以我再次尝试使用psql 并添加选项-h 127.0.0.1 以确保它也通过TCP/IP。 psql 的结果是

00:07:51.026 [3086550720][SQLExecute]
00:07:51.026 [3086550720]PGAPI_Execute: entering...1
00:07:51.026 [3086550720]PGAPI_Execute: clear errors...
00:07:51.026 [3086550720]prepareParameters was not called, prepare state:3
00:07:51.026 [3086550720]SC_recycle_statement: self= 0x943b1e8
00:07:51.026 [3086550720]PDATA_free_params:  ENTER, self=0x943b38c
00:07:51.026 [3086550720]PDATA_free_params:  EXIT
00:07:51.026 [3086550720]Exec_with_parameters_resolved: copying statement params: trans_status=6, len=10128, stmt='SELECT [..]'
00:07:51.026 [3086550720]ResolveOneParam: from(fcType)=-15, to(fSqlType)=4(23)
00:07:51.026 [3086550720]cvt_null_date_string=0 pgtype=23 buf=(nil)
00:07:51.026 [3086550720]ResolveOneParam: from(fcType)=4, to(fSqlType)=4(23)
00:07:51.026 [3086550720]cvt_null_date_string=0 pgtype=23 buf=(nil)
00:07:51.026 [3086550720]   stmt_with_params = 'SELECT [..]'
00:07:51.027 [3086550720]about to begin SC_execute
00:07:51.027 [3086550720]       Sending SELECT statement on stmt=0x943b1e8, cursor_name='SQL_CUR0x943b1e8' qflag=0,1
00:07:51.027 [3086550720]CC_send_query: conn=0x9424668, query='SELECT [..]'
00:07:51.027 [3086550720]CC_send_query: conn=0x9424668, query='SAVEPOINT _EXEC_SVP_0x943b1e8'
00:07:51.027 [3086550720]send_query: done sending query 35bytes flushed
00:07:51.027 [3086550720]in QR_Constructor
00:07:51.027 [3086550720]exit QR_Constructor
00:07:51.027 [3086550720]read 21, global_socket_buffersize=4096
00:07:51.027 [3086550720]send_query: got id = 'C'
00:07:51.027 [3086550720]send_query: ok - 'C' - SAVEPOINT
00:07:51.027 [3086550720]send_query: setting cmdbuffer = 'SAVEPOINT'
00:07:51.027 [3086550720]send_query: returning res = 0x8781c90
00:07:51.027 [3086550720]send_query: got id = 'Z'
00:07:51.027 [3086550720]QResult: enter DESTRUCTOR
00:07:51.027 [3086550720]QResult: in QR_close_result
00:07:51.027 [3086550720]QResult: free memory in, fcount=0
00:07:51.027 [3086550720]QResult: free memory out
00:07:51.027 [3086550720]QResult: enter DESTRUCTOR
00:07:51.027 [3086550720]QResult: exit close_result
00:07:51.027 [3086550720]QResult: exit DESTRUCTOR
00:07:51.027 [3086550720]send_query: done sending query 1942bytes flushed
00:07:51.027 [3086550720]in QR_Constructor
00:07:51.027 [3086550720]exit QR_Constructor
00:07:51.027 [3086550720]read -1, global_socket_buffersize=4096
00:07:51.027 [3086550720]Lasterror=11
00:07:51.133 [3086550720]!!!  poll ret=1 revents=1
00:07:51.133 [3086550720]read 4096, global_socket_buffersize=4096
00:07:51.133 [3086550720]send_query: got id = 'T'
00:07:51.133 [3086550720]QR_fetch_tuples: cursor = '', self->cursor=(nil)
00:07:51.133 [3086550720]num_fields = 166
00:07:51.133 [3086550720]READING ATTTYPMOD
00:07:51.133 [3086550720]CI_read_fields: fieldname='id', adtid=23, adtsize=4, atttypmod=-1 (rel,att)=(0,0)
[last two lines repeated for the other columns]
00:07:51.138 [3086550720]QR_fetch_tuples: past CI_read_fields: num_fields = 166
00:07:51.138 [3086550720]MALLOC: tuple_size = 100, size = 132800
00:07:51.138 [3086550720]QR_next_tuple: inTuples = true, falling through: fcount = 0, fetch_number = 0
00:07:51.139 [3086550720]qresult: len=3, buffer='282'
[last line repeated for the other columns]
00:07:51.140 [3086550720]end of tuple list -- setting inUse to false: this = 0x87807e8 SELECT 1
00:07:51.140 [3086550720]_QR_next_tuple: 'C' fetch_total = 1 & this_fetch = 1
00:07:51.140 [3086550720]QR_next_tuple: backend_rows < CACHE_SIZE: brows = 0, cache_size = 0
00:07:51.140 [3086550720]QR_next_tuple: reached eof now
00:07:51.140 [3086550720]send_query: got id = 'Z'
00:07:51.140 [3086550720]     done sending the query:
00:07:51.140 [3086550720]extend_column_bindings: entering ... self=0x943b270, bindings_allocated=166, num_columns=166
00:07:51.140 [3086550720]exit extend_column_bindings=0x9469500
00:07:51.140 [3086550720]SC_set_Result(943b1e8, 87807e8)
00:07:51.140 [3086550720]QResult: enter DESTRUCTOR
00:07:51.140 [3086550720]retval=0
00:07:51.140 [3086550720]CC_send_query: conn=0x9424668, query='RELEASE _EXEC_SVP_0x943b1e8'
00:07:51.140 [3086550720]send_query: done sending query 33bytes flushed
00:07:51.140 [3086550720]in QR_Constructor
00:07:51.140 [3086550720]exit QR_Constructor
00:07:51.140 [3086550720]read -1, global_socket_buffersize=4096
00:07:51.140 [3086550720]Lasterror=11
00:07:51.140 [3086550720]!!!  poll ret=1 revents=1
00:07:51.140 [3086550720]read 19, global_socket_buffersize=4096
00:07:51.140 [3086550720]send_query: got id = 'C'
00:07:51.140 [3086550720]send_query: ok - 'C' - RELEASE
00:07:51.140 [3086550720]send_query: setting cmdbuffer = 'RELEASE'
00:07:51.140 [3086550720]send_query: returning res = 0x877cd30
00:07:51.140 [3086550720]send_query: got id = 'Z'
00:07:51.140 [3086550720]QResult: enter DESTRUCTOR
00:07:51.140 [3086550720]QResult: in QR_close_result
00:07:51.140 [3086550720]QResult: free memory in, fcount=0
00:07:51.140 [3086550720]QResult: free memory out
00:07:51.140 [3086550720]QResult: enter DESTRUCTOR
00:07:51.140 [3086550720]QResult: exit close_result
00:07:51.140 [3086550720]QResult: exit DESTRUCTOR

EDIT3:我意识到我之前在psql 测试中没有使用来自mylog 的相同查询。似乎psqlodbc 没有将PREPARE 用于SQLPrepare 和SQLExecute。它添加参数值并发送查询。正如 araqnid 建议的那样,我将 log_duration 参数设置为 0,并将 postgresql 日志的结果与应用程序和 psql 的结果进行比较。结果如下:

                            psql/app     pglog
query executed from app:      110 ms     70 ms
psql with PREPARE/EXECUTE:     10 ms      5 ms
psql with full SELECT:         85 ms     70 ms

那么如何解释这些值呢?似乎大部分时间都花在将完整查询发送到数据库(10000 个字符)并生成执行计划上。如果确实如此,将对 SQLPrepare 和 SQLExecute 的调用更改为通过 SQLExecDirect 执行的显式 PREPARE/EXECUTE 语句可以解决问题。有异议吗?

【问题讨论】:

  • "00:07:51.133 [3086550720]!!! poll ret=1 revents=1" - 您是否添加了感叹号来突出显示这一行,或者它们是输出的一部分?就在上面,它发送了一个保存点命令并快速得到了响应,因此它似乎不是网络延迟。尽管它似乎为实际查询结果返回了更大的数据块,这并不奇怪......
  • 其他可以尝试的东西 - 为该数据库用户设置log_duration 并查看服务器端的 postgresql 是否记录了当您从服务器运行它时与直接使用 psql 运行它相比需要更长的时间。
  • 感叹号是输出的一部分,我没有添加它们。 psqlodbc 也可以通过使用select() 而不是poll() 的选项重新编译。试过没有任何区别。感谢log_duration 的提示。我会试试的。

标签: sql database performance postgresql odbc


【解决方案1】:

终于找到了问题所在,psqlodbc的SQLPrepare/SQLExecute默认不执行PREPARE/EXECUTE。驱动程序自己构建 SELECT 并发送它。

解决方案是将UseServerSidePrepare=1 添加到odbc.ini 或SQLDriverConnect 的ConnectionString。从应用程序测量的一个查询的总执行时间从 >100ms 下降到 5-10ms。

【讨论】:

    【解决方案2】:

    我认为 psql 和您的程序之间的时间没有可比性。

    也许我遗漏了一些东西,但在 psql 中,您只是准备语句,而从未真正获取数据。 EXPLAIN PLAN 也没有发送数据

    因此,时间差很可能是从服务器向客户端发送所有行所需的网络流量。

    减少此时间的唯一方法是获得更快的网络或选择更少的列。您真的需要 "alotofcolumns" 中包含的所有列吗?

    【讨论】:

    • ANALYZEEXPLAIN ANALYZE 部分确实执行了查询。我在没有EXPLAIN ANALYZE 的情况下再次尝试了查询来验证,除了第一个查询之外,它在 10 毫秒之前和以下都接近结果。在我们的例子中,客户端和服务器在同一台机器上。所以它们之间只有来自本地主机的网络堆栈。那不应该加起来那么多时间还是我错过了什么?我仍然认为如果检测到丢失的列,稍后执行另一个查询的成本会更高。而且它只有一排。我认为这些列中最多只有几 KB 的数据
    • 你是对的EXPLAIN ANALYZE 确实运行了查询,但它没有将结果发送给客户端。如果发送的数据量很大(行多,列数据大),那么这仍然会花费大量时间(需要在两边分配内存,数据需要通过各个软件层推送等等) .
    猜你喜欢
    • 1970-01-01
    • 2021-04-14
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多