【Question Title】: Using PowerShell to export big output from Oracle to CSV
【Posted】: 2019-09-24 09:47:24
【Question】:

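I need to export a fairly large CSV file from Oracle every week.

I have tried two approaches:

  1. Adapter.Fill(DataSet)
  2. Looping through the columns and rows and saving to the CSV file one row at a time.

The first runs out of memory (the server has only 4 GB of RAM), and the second takes about an hour because there are more than 4 million rows to export.

Here is code #1: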

#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT manycolumns FROM somequery"

#Oracle login credentials and other variables
$username = "username"
$password = "password"
$datasource = "database address"
$output = "\\NetworkLocation\Sales.csv"

#Creates a blank CSV file and makes sure it's in ASCII
Out-File $output -Force -Encoding ascii

#This here will look for "Oracle.ManagedDataAccess.dll" file inside "C:\Oracle" folder. We usually have two versions of Oracle installed so the adaptor can be in different locations. Needs changing if the Oracle is installed elsewhere.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force

#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query

#Creates a table in memory and fills it with results from the query. Then, export the virtual table into CSV.
$DataSet = New-Object System.Data.DataSet
$Adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter($command)
$Adapter.Fill($DataSet)
$DataSet.Tables[0] | Export-Csv $output -NoTypeInformation

$connection.Close()

Here is code #2:

#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT manycolumns FROM somequery"

#Oracle login credentials and other variables
$username = "username"
$password = "password"
$datasource = "database address"
$output = "\\NetworkLocation\Sales.csv"
$tempfile = $env:TEMP + "\Temp.csv"

#Creates a blank CSV file and makes sure it's in ASCII
Out-File $tempfile -Force -Encoding ascii

#This here will look for "Oracle.ManagedDataAccess.dll" file inside "C:\Oracle" folder. Needs changing if the Oracle is installed elsewhere.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force

#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query

#Reads results column by column. This way you don't have to specify how many columns it has.
$reader = $command.ExecuteReader()
  while($reader.Read()) {
       $props = [ordered]@{}   #ordered, so the CSV columns keep the order of the query's columns
       for($i = 0; $i -lt $reader.FieldCount; $i+=1) {
           $name = $reader.GetName($i)
           $value = $reader.item($i)
           $props.Add($name, $value)   
       }
       #Exports each line to CSV file. Works best when the file is on local drive as it saves it after each line.
       new-object PSObject -Property $props | Export-Csv $tempfile -NoTypeInformation -Append
  }

Move-Item $tempfile $output -Force

$connection.Close()

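Ideally I would like to use the first version, since it is much faster than the second, but without running out of memory.

Do you know whether there is some way to "fill" the first 1 million records, append them to the CSV, clear the DataSet table, then do the next 1 million records, and so on? The finished CSV weighs about 1.3 GB, but while the code is running even 8 GB of RAM is not enough (my laptop has 8 GB, but the server only has 4 GB, so it really struggles).

Any tips would be greatly appreciated.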

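【Comments】:

  • How about telling Oracle itself to create a CSV file? That would perform much better, because the database engine does all the heavy lifting locally.
  • Do you need "admin" rights on Oracle to do that? My team only has "read" access, because the database is owned and updated by a third-party company, and we pay a lot of money just to have simple changes made.
  • Questions about Oracle permissions would be a better fit on DBA.SE. Consider posting a brand-new question there about how to do the CSV export, and perhaps also about best practices for it.
  • I have no questions about permissions. I just want to export the results of a query to a 1.3 GB CSV file once a week using the Windows scheduler.

Tags: oracle powershell csv export-to-csv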


【Solution 1】:

In the *nix community we love one-liners!

In sqlplus (>= 12) you can set the markup to "csv on".

Create the query file:

cat > query.sql <<EOF
set head off
set feed off
set timing off
set trimspool on
set term off
spool output.csv
select 
  object_id, 
  owner, 
  object_name, 
  object_type, 
  status, 
  created, 
  last_ddl_time 
from dba_objects;
spool off
exit;
EOF

Spool output.csv like this:

sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" user/password@\"localhost:1521/<my_service>\" @query.sql

Another option is SQLcl (the SQL Developer CLI tool; its binary is named 'sql', which I renamed to 'sqlcl').

Create the query file (note the term on|off setting):

cat > query.sql <<EOF
set head off
set feed off
set timing off
set term off
set trimspool on
set sqlformat csv
spool output.csv
select 
  object_id, 
  owner, 
  object_name, 
  object_type, 
  status, 
  created, 
  last_ddl_time 
from dba_objects 
where rownum < 5;
spool off
exit;
EOF

Spool output.csv like this:

sqlcl -s system/oracle@\"localhost:1521/XEPDB1\" @query.sql

Voilà!

cat output.csv 
9,"SYS","I_FILE#_BLOCK#","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
38,"SYS","I_OBJ3","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
45,"SYS","I_TS1","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
51,"SYS","I_CON1","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04

And the winner with 77k rows is sqlplus! (with the rownum filter removed)

time sqlcl -s system/oracle@\"localhost:1521/XEPDB1\" @query.sql

real    0m23.776s
user    0m39.542s
sys     0m1.293s

time sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" system/oracle@localhost/XEPDB1 @query.sql

real    0m3.066s
user    0m0.700s
sys     0m0.265s

wc -l output.csv
77480 output.csv

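You can experiment with the output formats in SQL Developer by putting a format hint in the query:

select /*CSV|HTML|JSON|TEXT|<TONSOFOTHERFORMATS>*/ * from dba_objects;

If you need to load CSV into a database, this tool can do it: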

https://github.com/csv2db/csv2db

Good luck!

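【Comments】:

  • I tried the sqlplus approach, but it writes the file in a very messy format. The SQL Developer approach cannot be scheduled to run on a virtual machine, so it is not a solution...
  • @JarekSzczyg You probably need to "set" more configuration settings, such as linesize and pagesize. I suggest reading the SQL*Plus documentation for the Oracle version you are using.
  • @Abra is absolutely right. You need to add these directives to the query script: set pagesize 9999 and set linesize 200. They are not in the query script because I have configured sqlplus to load my login.sql on every start, and my favourite settings are already configured there.
  • I did try those settings, but I could not get "set sqlformat csv" to work; I had to use /*csv*/, which for some reason does not always work. "SET TRIMSPOOL" did not work either: it saved the file with lots of trailing spaces, so my 1.3 GB file would end up twice as large. I have never used SQL*Plus or scripts with Oracle before, so this all seems a bit complicated to me.
  • I tried running it the way you specified, and this is what it said: Error starting at line 1 in command: cat > query.sql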
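
The `cat > query.sql <<EOF` lines in the answer are shell commands that create the script file, not SQL, which is why pasting them into an Oracle tool fails. On Windows, the same file could be written from PowerShell instead. The following is only a sketch: it assumes a sqlplus client recent enough to accept the CSV markup option is on the PATH, it adds the pagesize and linesize settings mentioned in the comments, and the connect string, `$scriptPath` and the query are placeholders rather than values from this thread.

```powershell
# Sketch: write the SQL*Plus script from PowerShell instead of `cat <<EOF`.
# Assumptions: $query, $scriptPath and the connect string are placeholders.
$query      = 'SELECT manycolumns FROM somequery'
$scriptPath = Join-Path ([System.IO.Path]::GetTempPath()) 'query.sql'

# Double-quoted here-string: $query is expanded, everything else is literal.
$script = @"
set head off
set feed off
set timing off
set trimspool on
set term off
set pagesize 9999
set linesize 200
spool output.csv
$query;
spool off
exit;
"@

Set-Content -Path $scriptPath -Value $script -Encoding ASCII

# Run sqlplus only if it is installed. output.csv is spooled into the
# current directory. "@$scriptPath" is quoted because a bare @name would be
# parsed by PowerShell as splatting.
if (Get-Command sqlplus -ErrorAction SilentlyContinue) {
    & sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" 'user/password@//localhost:1521/my_service' "@$scriptPath"
}
```

A script like this can be started by the Windows Task Scheduler in the same way as the PowerShell scripts in the question.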
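
【Solution 2】:

Thanks everyone for the replies. I learned about Oracle scripts and SQL*Plus, which I never knew existed. I will probably use them in the future, but I think I would have to update my Oracle Developer package first.

I found a way to modify my code so that it works, using the documentation here: https://docs.oracle.com/database/121/ODPNT/OracleDataAdapterClass.htm#i1002865

It is not perfect, because every 1 million rows it pauses, saves the output, and then re-runs the query, which re-evaluates it (the query I am running takes about 1-2 minutes to evaluate).

This is essentially the same as running the code x times (where x is the number of rows in millions, rounded up), doing "fetch first 1'000'000 rows only", then "offset 1'000'000 rows fetch next 1'000'000 rows only", and so on, and appending each batch to the bottom of the CSV.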
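
For illustration, the paged statements described above could be generated as follows. This is a sketch, not the code actually used in this answer: it assumes the database supports the Oracle 12c row-limiting clause, and `Get-PagedQuery` and the `some_key` column are hypothetical names. The `ORDER BY` matters, because without a stable ordering the pages are not guaranteed to be disjoint.

```powershell
# Hypothetical helper: appends an Oracle 12c+ row-limiting clause to a query.
function Get-PagedQuery {
    param(
        [Parameter(Mandatory)] [string] $BaseQuery,
        [Parameter(Mandatory)] [long]   $Offset,
        [Parameter(Mandatory)] [long]   $PageSize
    )
    "$BaseQuery OFFSET $Offset ROWS FETCH NEXT $PageSize ROWS ONLY"
}

# Placeholder query; some_key stands for whatever column gives a stable order.
$baseQuery = 'SELECT manycolumns FROM somequery ORDER BY some_key'
$pageSize  = 1000000

# Build the statements for the first three pages (offsets 0, 1M and 2M).
$pagedQueries = 0..2 | ForEach-Object {
    Get-PagedQuery -BaseQuery $baseQuery -Offset ($_ * $pageSize) -PageSize $pageSize
}
$pagedQueries
```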

The code is as follows:

#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT
A lot of columns
FROM
a lot of tables joined together
WHERE
a lot of conditions
"

#Oracle login credentials and other variables
$username = "myusername"
$password = "mypassword"
$datasource = "TNSnameofmyDatasource"
$output = "$env:USERPROFILE\desktop\Sales.csv"

#Creates a blank CSV file and makes sure it's in ASCII, as that's what the output of my query is
Out-File $output -Force -Encoding ascii

#This here will look for "Oracle.ManagedDataAccess.dll" file inside "C:\Oracle" folder. Needs changing if the Oracle is installed elsewhere.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force

#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query

#Creates a table in memory to be filled up with results from the query using ODAC
$DataSet = New-Object System.Data.DataSet
$Adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter($command)

#Declaring variables for the loop
$fromrecord = 0
$numberofrecords = 1000000
$timesrun = 0

#Loop as long as the number of rows in the virtual table equals the specified $numberofrecords
while(($timesrun -eq 0) -or ($DataSet.Tables[0].Rows.Count -eq $numberofrecords))
{
    $DataSet.Clear()
    $Adapter.Fill($DataSet,$fromrecord,$numberofrecords,'*') | Out-Null #Suppresses writing to console the number of rows filled
    Write-Progress "Saved: $fromrecord Rows"
    $DataSet.Tables[0] | Export-Csv $output -Append -NoTypeInformation
    $fromrecord=$fromrecord+$numberofrecords
    $timesrun++
}

$connection.Close()
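
A different way to keep memory flat, which is not what the code above does, is to run the query once and stream the data reader straight into a single open file writer. That avoids both the in-memory DataSet and reopening the file for every row. The sketch below is hedged accordingly: `Export-ReaderToCsv` is a hypothetical helper, it quotes every field much as Export-Csv does, and the demo uses an in-memory DataTable so that it runs without an Oracle connection.

```powershell
# Hypothetical helper: streams any IDataReader to a CSV file row by row.
# Pass a full path; .NET resolves relative paths against the process directory.
function Export-ReaderToCsv {
    param(
        [Parameter(Mandatory)] [System.Data.IDataReader] $Reader,
        [Parameter(Mandatory)] [string] $Path
    )
    # One writer for the whole export, so the file is opened only once.
    $writer = New-Object System.IO.StreamWriter($Path, $false, [System.Text.Encoding]::ASCII)
    $rows = 0
    try {
        $count  = $Reader.FieldCount
        $fields = New-Object 'string[]' $count
        # Header line: quoted column names.
        for ($i = 0; $i -lt $count; $i++) {
            $fields[$i] = '"' + $Reader.GetName($i).Replace('"', '""') + '"'
        }
        $writer.WriteLine([string]::Join(',', $fields))
        # Data lines: only the current row is held in memory.
        while ($Reader.Read()) {
            for ($i = 0; $i -lt $count; $i++) {
                $text = if ($Reader.IsDBNull($i)) { '' } else { [string]$Reader.GetValue($i) }
                $fields[$i] = '"' + $text.Replace('"', '""') + '"'
            }
            $writer.WriteLine([string]::Join(',', $fields))
            $rows++
        }
    }
    finally {
        $writer.Dispose()
    }
    $rows   # number of data rows written
}

# Demo with an in-memory table standing in for the Oracle result set.
$table = New-Object System.Data.DataTable
[void]$table.Columns.Add('ID', [int])
[void]$table.Columns.Add('NAME', [string])
[void]$table.Rows.Add(1, 'plain')
[void]$table.Rows.Add(2, 'has "quotes"')

$demoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'reader-demo.csv'
$reader   = $table.CreateDataReader()
$written  = Export-ReaderToCsv -Reader $reader -Path $demoPath
$reader.Close()
```

With the Oracle connection from the script above, the call would presumably be `Export-ReaderToCsv -Reader $command.ExecuteReader() -Path $output`. Values are converted with a plain `[string]` cast, so date and number formatting may differ from what Export-Csv produces.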

【Comments】:
