[Posted]: 2021-05-20 14:21:28
[Question]:
Suppose I have a log file that looks like this:
'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' LOG: BEGIN;
'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' LOG: SET datestyle TO ISO;
'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' LOG: SET TRANSACTION READ ONLY;
'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' LOG: SET STATEMENT_TIMEOUT TO 300000;
'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' LOG: /* hash: f71f47211eca32d63469fba576bbbb19 */
SELECT TRIM(application_name) AS application_name
, MAX(recordtime) AS last_used
FROM stl_connection_log
WHERE dbname <> 'dev'
AND username <> 'rdsdb'
AND ( application_name LIKE 'RedshiftUserLastLogin-v%'
OR application_name LIKE 'RedshiftSystemTablePersistence-v%'
OR application_name LIKE 'AnalyzeVacuumUtility-v%'
OR application_name LIKE 'ColumnEncodingUtility-v%' )
GROUP BY application_name
LIMIT 50
;
'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' LOG: SELECT btrim( pg_catalog.stll_connection_log.application_name ) AS application_name, MAX(pg_catalog.stll_connection_log.recordtime) AS last_used FROM pg_catalog.stll_connection_log WHERE pg_catalog.stll_connection_log.dbname <> 'dev'::Char(3) AND pg_catalog.stll_connection_log.username <> 'rdsdb'::Char(5) AND (pg_catalog.stll_connection_log.application_name LIKE 'AnalyzeVacuumUtility-v%' OR pg_catalog.stll_connection_log.application_name LIKE 'ColumnEncodingUtility-v%' OR pg_catalog.stll_connection_log.application_name LIKE 'RedshiftSystemTablePersistence-v%' OR pg_cata
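I want to read each "row" that begins with a quote followed by a timestamp. Every row starts with a prefix like this: '2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' (let's call it the row delimiter). I then want to split each row into its columns (query, pid, user, db, and so on). What is the simplest way to do this?
The problem is that the row delimiter does not appear after every newline. As you can see, one "row" contains a query that spans several newlines, so when I read the text file in Python, some lines will not start with the delimiter. Does that mean that, when reading lines from the file in Python, I first have to check whether each line starts with the row delimiter and, if it does not, keep appending it to an in-memory buffer until I reach the next delimiter?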
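The buffering idea described above could be sketched roughly as follows. This is only a minimal sketch: the name `iter_records` and the `RECORD_START` pattern are hypothetical, and the pattern assumes every record begins with a quote followed by a timestamp in exactly the shape shown in the sample.

```python
import re

# Assumption: every record starts with a quote, an ISO-style timestamp,
# the literal " UTC " and an opening bracket, as in the sample log.
RECORD_START = re.compile(r"^'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z UTC \[")


def iter_records(lines):
    """Yield one string per logical record, joining continuation lines."""
    buffer = []
    for line in lines:
        line = line.rstrip("\n")
        # A new header closes the record collected so far.
        if RECORD_START.match(line) and buffer:
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    # Flush the final record once the input is exhausted.
    if buffer:
        yield "\n".join(buffer)
```

Because the function accepts any iterable of lines, it can be fed an open file object directly (for example `iter_records(open(path))`, where `path` is a placeholder), so the whole file never has to be held in memory at once.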
Ideally, the row:
'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' LOG: /* hash: f71f47211eca32d63469fba576bbbb19 */
SELECT TRIM(application_name) AS application_name
, MAX(recordtime) AS last_used
FROM stl_connection_log
WHERE dbname <> 'dev'
AND username <> 'rdsdb'
AND ( application_name LIKE 'RedshiftUserLastLogin-v%'
OR application_name LIKE 'RedshiftSystemTablePersistence-v%'
OR application_name LIKE 'AnalyzeVacuumUtility-v%'
OR application_name LIKE 'ColumnEncodingUtility-v%' )
GROUP BY application_name
LIMIT 50
;
would be broken into a single CSV row, where:
timestamp = 2021-05-18T14:01:13Z UTC
db = dev
user = rdsdb
pid = 11593
userid = 1
xid = 19771457
query = `SELECT TRIM(application_name) AS application_name, MAX(recordtime) AS last_used FROM stl_connection_log WHERE dbname <> 'dev' AND username <> 'rdsdb' AND ( application_name LIKE 'RedshiftUserLastLogin-v%' OR application_name LIKE 'RedshiftSystemTablePersistence-v%' OR application_name LIKE 'AnalyzeVacuumUtility-v%' OR application_name LIKE 'ColumnEncodingUtility-v%' ) GROUP BY application_name LIMIT 50;`
and the row:
'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 userid=1 xid=19771457 ]' LOG: BEGIN;
would become:
timestamp = 2021-05-18T14:01:13Z UTC
db = dev
user = rdsdb
pid = 11593
userid = 1
xid = 19771457
query = `LOG: BEGIN;`
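To make the target output concrete, here is a rough sketch of how one already-assembled record might be split into those columns and written out with the standard `csv` module. The names `RECORD`, `FIELDNAMES`, `parse_record` and `write_csv` are hypothetical. The sketch assumes the header always has the shape shown in the sample, that the `LOG: ` prefix and a leading `/* ... */` comment should be dropped from the query, and that whitespace inside the query can be collapsed to single spaces.

```python
import csv
import io
import re

# Assumption: header shape is "'<timestamp> UTC [ k=v k=v ... ]' LOG: <query>".
RECORD = re.compile(
    r"^'(?P<timestamp>\S+ UTC) \[ (?P<fields>[^\]]*)\]' LOG: (?P<query>.*)$",
    re.DOTALL,  # the query may span several lines
)
FIELDNAMES = ["timestamp", "db", "user", "pid", "userid", "xid", "query"]


def parse_record(record):
    """Split one logical record into a dict of column name -> value."""
    match = RECORD.match(record)
    if match is None:
        raise ValueError(f"unrecognised record: {record[:60]!r}")
    row = {"timestamp": match.group("timestamp")}
    # The bracketed part is a space-separated list of key=value pairs.
    row.update(pair.split("=", 1) for pair in match.group("fields").split())
    # Assumption: a leading /* ... */ comment is not part of the wanted query.
    query = re.sub(r"^/\*.*?\*/\s*", "", match.group("query"), flags=re.DOTALL)
    # Collapse newlines and repeated spaces into single spaces.
    row["query"] = " ".join(query.split())
    return row


def write_csv(records, out):
    """Write parsed records to the file-like object `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        writer.writerow(parse_record(record))


if __name__ == "__main__":
    demo = ("'2021-05-18T14:01:13Z UTC [ db=dev user=rdsdb pid=11593 "
            "userid=1 xid=19771457 ]' LOG: BEGIN;")
    buffer = io.StringIO()
    write_csv([demo], buffer)
    print(buffer.getvalue())
```

Collapsing whitespace keeps each query on one CSV line, but it would also alter whitespace inside SQL string literals, so it may not suit every log. Note too that `csv.DictWriter` raises `ValueError` by default if a record carries a key that is not listed in `FIELDNAMES`.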
[Discussion]: