Get-Content chunk of data: answers

【Question title】: Get-Content chunk of data
【Posted】: 2017-02-22 17:19:11
【Question description】:

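I have large files of about 3 GB each. Every file has an information section at the top and at the bottom, and the number of information lines varies from file to file, e.g.: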

infostart1
infostart2
START-OF-DATA
line1
line2
...
...
...
linen
END-OF-DATA
infoend1
infoend2

and so on. I am trying to create a dat file that contains only the lines between START-OF-DATA and END-OF-DATA.

$DataStartLineNumber = (Select-String $File -Pattern 'START-OF-DATA' | Select-Object -ExpandProperty 'LineNumber')[0]
$DataEndLineNumber = (Select-String $File -Pattern 'END-OF-DATA' | Select-Object -ExpandProperty 'LineNumber')[-1]

I tried:

Get-Content -Path $File | Select-Object -Index ($DataStartLineNumber..($DataEndLineNumber-2)) | Add-Content $Destination

but Get-Content fails because of its memory usage.

I also tried:

Get-Content -Path $File -ReadCount 10000 | Select-Object -Index ($DataStartLineNumber..$DataEndLineNumber) | Add-Content $Destination

However, this does not work as expected.
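A likely reason, shown here as a minimal sketch rather than a diagnosis of the exact files involved: with -ReadCount, Get-Content sends arrays of lines down the pipeline instead of single lines, so Select-Object -Index ends up counting chunks rather than lines. The temporary file name below is a hypothetical demo file.

```powershell
# Build a small 10-line demo file (hypothetical name) in the temp directory
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $demo -Value (1..10 | ForEach-Object { "line$_" })

# Without -ReadCount, every pipeline object is one line (a string)
$single = @(Get-Content -Path $demo)

# With -ReadCount 4, every pipeline object is an array of up to 4 lines
$chunks = @(Get-Content -Path $demo -ReadCount 4)

"Objects without -ReadCount: $($single.Count)"
"Objects with -ReadCount 4:  $($chunks.Count)"
"Lines in the first chunk:   $($chunks[0].Count)"
```

Because the index now refers to chunk positions, a range built from line numbers would select the wrong objects, or none at all.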

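I don't want to read the file line by line because it takes too long. Is there a way to read the file in chunks and apply a filter that drops everything before START-OF-DATA and after END-OF-DATA? Alternatively, is there an efficient way to copy the file as is and then delete everything before START-OF-DATA and after END-OF-DATA?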

【Comments】:

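stackoverflow.com/questions/4192072/… and stackoverflow.com/questions/32336756/alternative-to-get-content
Get-Content performs poorly on large files. A StreamReader is the way to go here. Keep a couple of flags/booleans so that you know when to start and stop processing lines in the file.
Thanks Matt, I'll look into it and hopefully find an efficient way.
Thank you, Matt, this helped a lot, and it is very fast.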
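The flag idea from the comment above can be sketched as a single loop. This is only an illustration under assumptions: it uses [System.IO.File]::ReadLines for lazy reading instead of an explicit StreamReader, the variable names ($inData, $kept) and the demo file are hypothetical, and the markers are assumed to sit alone on their lines.

```powershell
# Build a small demo file (hypothetical name) that mimics the layout in the question
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'flag-demo.txt'
Set-Content -Path $path -Value 'infostart1', 'START-OF-DATA', 'line1', 'line2', 'END-OF-DATA', 'infoend1'

$inData = $false   # becomes $true once the start marker has been seen
$kept = foreach ($line in [System.IO.File]::ReadLines($path)) {
    if ($inData -and $line -eq 'END-OF-DATA') { break }    # stop at the end marker
    if ($inData) { $line }                                  # inside the data section: emit the line
    elseif ($line -eq 'START-OF-DATA') { $inData = $true }  # start marker: switch the flag on
}
$kept
```

Breaking out at the end marker means the trailing information lines are never read at all.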

Tags: file powershell

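【Solution 1】:

As Matt mentions in the comments, you can read the file line by line yourself using a StreamReader.

I'd suggest starting with one loop that skips ahead, and then collecting the relevant lines with a second loop: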

$Reader = New-Object System.IO.StreamReader 'C:\Path\to\file.txt'
$StartBoundary = 'START-OF-DATA'
$EndBoundary = 'END-OF-DATA'

# Skip ahead to the starting boundary
while(-not($Reader.EndOfStream) -and ($line = $Reader.ReadLine()) -notmatch $StartBoundary){ <#nothing to be done here#> }

# Output all lines until we hit the end boundary
$lines = while(-not($Reader.EndOfStream) -and ($line = $Reader.ReadLine()) -notmatch $EndBoundary){ $line }

# $lines now contains the data

# Release the file handle once reading is done
$Reader.Dispose()
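For a 3 GB file, collecting every data line in $lines may still be heavy on memory. A possible variation, offered as a sketch and not as part of the original answer, writes each line straight to the destination with a StreamWriter. The function name Copy-DataSection, its parameters, and the demo file names are hypothetical; note that .NET resolves relative paths against the process working directory, so full paths are the safer choice.

```powershell
# Hypothetical helper: streams the lines between two marker lines into a new file
function Copy-DataSection {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        [string] $StartBoundary = 'START-OF-DATA',
        [string] $EndBoundary   = 'END-OF-DATA'
    )

    $reader = New-Object System.IO.StreamReader $Path
    try {
        $writer = New-Object System.IO.StreamWriter $Destination
        try {
            # Skip everything up to and including the start marker
            while ($null -ne ($line = $reader.ReadLine()) -and $line -notmatch $StartBoundary) { }

            # Copy lines until the end marker (or the end of the file) is reached
            while ($null -ne ($line = $reader.ReadLine()) -and $line -notmatch $EndBoundary) {
                $writer.WriteLine($line)
            }
        }
        finally { $writer.Dispose() }   # flush and close the destination
    }
    finally { $reader.Dispose() }       # release the source file handle
}

# Usage on a small demo file in the temp directory
$source = Join-Path ([System.IO.Path]::GetTempPath()) 'datasection-demo.txt'
$target = Join-Path ([System.IO.Path]::GetTempPath()) 'datasection-demo.dat'
Set-Content -Path $source -Value 'infostart1', 'START-OF-DATA', 'line1', 'line2', 'END-OF-DATA', 'infoend1'
Copy-DataSection -Path $source -Destination $target
Get-Content $target
```

Comparing ReadLine() against $null instead of checking EndOfStream keeps empty lines inside the data section from ending the loop early, and only one line is held in memory at a time.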

【Discussion】:

Thanks Mathias, I used your approach for both the start and end lines. It works :)

【Solution 2】:

I don't know whether this will solve your memory problem, but try this:

$template=@"
{Content*:START-OF-DATA
line1
END-OF-DATA}
{Content*:START-OF-DATA
line2
Line3
END-OF-DATA}
"@

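Get-ChildItem "C:\temp\test" -File | ForEach-Object {

  # Parse each file's content against the example-driven template
  $Data = Get-Content $_.FullName | ConvertFrom-String -TemplateContent $template

  # Put $null on the left so the comparison stays a boolean test even if $Data is an array
  if ($null -ne $Data)
  {
     [pscustomobject]@{FullName=$_.FullName; Content=$Data}
  }

} | Format-Table -Wrap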

【Discussion】:

I haven't had a chance to try this solution, but thank you, Esperento57.