[Question title]: Reading from the Pipeline Stream in PowerShell
[Posted]: 2015-11-08 01:31:23
[Question]:

Background

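I want to write code that uses Microsoft.VisualBasic.FileIO.TextFieldParser to parse some CSV data. The system I'm generating this data for doesn't understand quotes, so I can't escape the delimiter; I have to replace it instead. I've found solutions that use the text parser mentioned above, but I've only seen people use it with file input. I'd rather keep the data in memory and make use of the class's constructor that accepts a stream as input.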

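Ideally it would be fed directly from whatever in-memory stream the pipeline uses, but I don't know how to access that. In my current code I create my own memory stream, feed the pipeline data into it, and then try to read from it. Unfortunately, I'm missing something.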
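If the whole CSV text is already in memory as a string, one way to use the Stream constructor mentioned above is to encode the text into a byte array and wrap it in a MemoryStream. This is a minimal sketch; the sample data, the ';' delimiter, and joining the fields with '|' for display are assumptions for illustration only:

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical sample data: ';'-delimited, quoted, one field containing ';'.
$csvText = "`"A`";`"B`"`r`n`"1`";`"has;semi`"`r`n"

# Encode the text and wrap the bytes in a MemoryStream (the unary comma
# passes the byte[] as ONE constructor argument instead of unrolling it).
$bytes = [System.Text.Encoding]::UTF8.GetBytes($csvText)
$memStream = New-Object System.IO.MemoryStream -ArgumentList (,$bytes)

# TextFieldParser's Stream constructor reads straight from the in-memory data.
$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $memStream
$parser.SetDelimiters(';')
$parser.HasFieldsEnclosedInQuotes = $true

$records = @()
while (-not $parser.EndOfData) {
    $fields = $parser.ReadFields()      # string[] for one record
    $records += ($fields -join '|')     # joined with '|' only to make the result easy to inspect
}
$parser.Close()   # also closes the MemoryStream
$records
```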

Questions

  1. How do I read from / write to a memory stream in PowerShell?
  2. Is it possible to read directly from the stream that is piped into a function?
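For question 1, the basic write-then-read cycle on a MemoryStream looks roughly like this (a sketch with made-up sample lines). Two details matter: StreamWriter buffers text until it is flushed, and the writer and reader share the stream's single Position, so you have to rewind before reading what you just wrote.

```powershell
# Write two lines into an in-memory stream, then read them back.
$memStream = New-Object System.IO.MemoryStream
$writer = New-Object System.IO.StreamWriter -ArgumentList $memStream
$reader = New-Object System.IO.StreamReader -ArgumentList $memStream

$writer.WriteLine('first line')
$writer.WriteLine('second line')
$writer.Flush()   # StreamWriter buffers; push the text into the MemoryStream

# Writer and reader share the stream's single Position, which now sits at
# the end of the data, so rewind before reading.
$null = $memStream.Seek(0, [System.IO.SeekOrigin]::Begin)

$lines = @()
while (-not $reader.EndOfStream) {
    $lines += $reader.ReadLine()
}

# Disposing the writer also closes the underlying MemoryStream.
$writer.Dispose()
$reader.Dispose()
$lines
```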

Code

clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
#[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null

function Clean-CsvStream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
        [string]$Line
        ,
        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','
    )
    begin {
        [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
        [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
        [System.IO.StreamReader]$readStream = New-Object System.IO.StreamReader($memStream)
        #[Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
        #$Parser.SetDelimiters($Delimiter)
        #$Parser.HasFieldsEnclosedInQuotes = $true
        #$writeStream.AutoFlush = $true
    }
    process {
        $writeStream.WriteLine($_)
        #$writeStream.Flush() #maybe we need to flush it before the reader will see it?
        write-output $readStream.ReadLine()
        #("Line: {0:000}" -f $Parser.LineNumber)
        #write-output $Parser.ReadFields()
    }
    end {
        #close streams and dispose (dodgy catch all's in case object's disposed before we call Dispose)
        #try {$Parser.Close(); $Parser.Dispose()} catch{} 
        try {$readStream.Close(); $readStream.Dispose()} catch{} 
        try {$writeStream.Close(); $writeStream.Dispose()} catch{} 
        try {$memStream.Close(); $memStream.Dispose()} catch{} 
    }
}
1,2,3,4 | Clean-CsvStream -Delimiter ';' #nothing like the real data, but I'm not interested in actual CSV cleansing at this point

Workaround

In the meantime, my workaround is simply to do this replacement on the objects' properties rather than on the CSV rows:

$cols = $objectArray | Get-Member | ?{$_.MemberType -eq 'NoteProperty'} | select -ExpandProperty name
$objectArray | %{$csvRow =$_; ($cols | %{($csvRow.$_ -replace "[`n,]",':')}) -join ',' }
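The same workaround can be written with PSObject.Properties instead of Get-Member. This is a sketch, assuming $objectArray holds PSCustomObjects; the sample objects are made up, and I've added `r to the character class so Windows line breaks are caught too. One difference worth knowing: Get-Member lists properties alphabetically, while PSObject.Properties keeps their declared order.

```powershell
# Hypothetical sample objects standing in for $objectArray.
$objectArray = @(
    [PSCustomObject]@{ A = 'plain text'; B = "has a`nline break"; C = 'x,y' }
    [PSCustomObject]@{ A = 'second';     B = 'nothing odd';        C = 'z' }
)

# Replace line breaks and commas inside each value, then join the values
# into one comma-delimited row. PSObject.Properties keeps declaration order.
$rows = @($objectArray | ForEach-Object {
    ($_.PSObject.Properties | ForEach-Object { $_.Value -replace "[`r`n,]", ':' }) -join ','
})
$rows
```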

Update

I realised the missing piece was $memStream.Seek(0, [System.IO.SeekOrigin]::Begin) | out-null;

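However, this doesn't quite work as expected: the first line of my CSV is output twice, and the rest of the output comes out in the wrong order, so presumably I've misunderstood how to use Seek.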
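A likely explanation for the duplicated first line: the writer and reader share one Position, and the reader buffers whatever it has already pulled from the stream. Seeking to 0 means that each time the reader's buffer runs dry, it refills from the very first row again, while rows still sitting in its buffer are returned without touching the stream. The sketch below shows this with a plain StreamReader standing in for TextFieldParser (which, as far as I know, also reads through an internal buffer); Test-RowFeed is a hypothetical helper name. Rewinding to where the current row starts, rather than to 0, avoids the problem.

```powershell
# Hypothetical helper: feed rows through one MemoryStream shared by a writer
# and a reader, rewinding either to 0 or to where the current row starts.
function Test-RowFeed {
    param([string[]]$Rows, [switch]$RewindToStart)
    $memStream = New-Object System.IO.MemoryStream
    $writer = New-Object System.IO.StreamWriter -ArgumentList $memStream
    $reader = New-Object System.IO.StreamReader -ArgumentList $memStream
    foreach ($row in $Rows) {
        $rowStart = $memStream.Position          # remember where this row's bytes will be written
        $writer.WriteLine($row)
        $writer.Flush()
        if ($RewindToStart) {
            $null = $memStream.Seek(0, [System.IO.SeekOrigin]::Begin)          # as in the update above
        } else {
            $null = $memStream.Seek($rowStart, [System.IO.SeekOrigin]::Begin)  # only the new row is unread
        }
        # If the reader's buffer still holds unread text it returns that
        # without touching the stream; otherwise it refills from Position.
        $reader.ReadLine()
    }
    $writer.Dispose()
    $reader.Dispose()
}

$fromRowStart = Test-RowFeed -Rows 'r1', 'r2', 'r3'
$fromZero     = Test-RowFeed -Rows 'r1', 'r2', 'r3' -RewindToStart
```

With three rows, rewinding to the row start should return r1, r2, r3, whereas rewinding to 0 should return r1, r1, r2, which matches the symptom described above.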

clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null

function Clean-CsvStream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
        [string]$CsvRow
        ,
        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','
        ,
        [Parameter(Mandatory = $false)]
        [regex]$InvalidCharRegex 
        ,
        [Parameter(Mandatory = $false)]
        [string]$ReplacementString 

    )
    begin {
        [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
        [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
        [Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
        $Parser.SetDelimiters($Delimiter)
        $Parser.HasFieldsEnclosedInQuotes = $true
        $writeStream.AutoFlush = $true
    }
    process {
        if ($InvalidCharRegex) {
            $writeStream.WriteLine($CsvRow)
            #flush here if not auto
            $memStream.Seek(0, [System.IO.SeekOrigin]::Begin) | out-null;
            write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
        } else { #if we're not replacing anything, keep it simple
            $CsvRow
        }
    }
    end {
        "end {"
        try {$Parser.Close(); $Parser.Dispose()} catch{} 
        try {$writeStream.Close(); $writeStream.Dispose()} catch{} 
        try {$memStream.Close(); $memStream.Dispose()} catch{} 
        "} #end"
    }
}
$csv = @(
    (new-object -TypeName PSCustomObject -Property @{A="this is regular text";B="nothing to see here";C="all should be good"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text2";B="what the`nLine break!";C="all should be good2"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text4";B="I've got;a semi";C="all should be good4"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
) | convertto-csv -Delimiter ';' -NoTypeInformation
$csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':' 

[Discussion]:

    Tags: powershell streamwriter memorystream


    [Answer 1]:

    After a lot of experimenting, this seems to work:

    clear-host
    [Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
    [Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null
    
    function Clean-CsvStream {
        [CmdletBinding()]
        param (
            [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
            [string]$CsvRow
            ,
            [Parameter(Mandatory = $false)]
            [char]$Delimiter = ','
            ,
            [Parameter(Mandatory = $false)]
            [regex]$InvalidCharRegex 
            ,
            [Parameter(Mandatory = $false)]
            [string]$ReplacementString 
        )
        begin {
            [bool]$IsSimple = [string]::IsNullOrEmpty($InvalidCharRegex) 
            if(-not $IsSimple) {
                [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
                [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
                [Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
                $Parser.SetDelimiters($Delimiter)
                $Parser.HasFieldsEnclosedInQuotes = $true
            }
        }
        process {
            if ($IsSimple) { #if we're not replacing anything, keep it simple
                $CsvRow
            } else {
                [long]$seekStart = $memStream.Seek(0, [System.IO.SeekOrigin]::Current) 
                $writeStream.WriteLine($CsvRow)
                $writeStream.Flush()
                $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Begin) | out-null 
                write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
            }
        }
        end {
            if(-not $IsSimple) {
                try {$Parser.Close(); $Parser.Dispose()} catch{} 
                try {$writeStream.Close(); $writeStream.Dispose()} catch{} 
                try {$memStream.Close(); $memStream.Dispose()} catch{} 
            }
        }
    }
    $csv = @(
        (new-object -TypeName PSCustomObject -Property @{A="this is regular text";B="nothing to see here";C="all should be good"}) 
        ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text2";B="what the`nLine break!";C="all should be good2"}) 
        ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"}) 
        ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text4";B="I've got;a semi";C="all should be good4"}) 
        ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
    ) | convertto-csv -Delimiter ';' -NoTypeInformation
    $csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':' 
    

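    1. Get the current position before writing
    2. Then write
    3. Then flush (if AutoFlush isn't on)
    4. Then seek back to the start of the data just written
    5. Then read
    6. Repeat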

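    I'm not sure this is right, though: I couldn't find any good examples or documentation explaining it, so I just played around until something worked, which leaves it a bit vague.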

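    If anyone knows how to read directly from the pipeline stream I'd still be interested, i.e. to eliminate the overhead of the extra stream.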
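As far as I know, the PowerShell pipeline doesn't expose an underlying text stream at all: it passes the process block one .NET object at a time, and ConvertTo-Csv emits one string per record (quoted line breaks included). So there is nothing to attach TextFieldParser to directly. One way to drop the extra MemoryStream, and with it the Seek bookkeeping, is TextFieldParser's TextReader constructor with a fresh StringReader per row. This is a swapped-in alternative, not the approach in this answer; Clean-CsvRow is a hypothetical name:

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical variant of Clean-CsvStream: no shared MemoryStream, no Seek.
function Clean-CsvRow {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$CsvRow,
        [char]$Delimiter = ',',
        [regex]$InvalidCharRegex,
        [string]$ReplacementString = ''
    )
    process {
        if (-not $InvalidCharRegex) { return $CsvRow }   # nothing to replace: pass the row through

        # Each pipeline item is already a complete string, so give the parser
        # a TextReader over just this row.
        $stringReader = New-Object System.IO.StringReader -ArgumentList $CsvRow
        $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $stringReader
        try {
            $parser.SetDelimiters([string]$Delimiter)
            $parser.HasFieldsEnclosedInQuotes = $true
            $fields = $parser.ReadFields()
            ($fields | ForEach-Object { $_ -replace $InvalidCharRegex, $ReplacementString }) -join $Delimiter
        }
        finally {
            $parser.Close()   # release the parser and its reader
        }
    }
}
```

It is called the same way as before, e.g. $csv | Clean-CsvRow -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':'. Creating a parser per row costs a small allocation each time, but no state is shared between rows, so a malformed row should not affect the rows after it.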


    Re @M.R.'s comment

    Sorry for the late reply; in case it's useful to others:

    If the record delimiter is CrLf (\r\n) rather than just Cr (\r), it's easy to tell the end of a record/line apart from a line break inside a field:

    Get-Content -LiteralPath 'D:\test\file to clean.csv' -Delimiter "`r`n" | 
    %{$_.ToString().TrimEnd("`r`n")} | #the delimiter is left on the end of the string; remove it
    %{('"{0}"' -f $_) -replace '\|','"|"'} | #insert quotes at start and end of line, as well as around delimiters
    ConvertFrom-Csv -Delimiter '|' #treat the pipeline content as a valid pipe-delimited csv
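Here is a self-contained variant of that idea you can try without a file. It is a sketch, not the code above: it splits on CRLF, turns any remaining bare CR into a space (what @M.R. asked for), and passes the records to ConvertFrom-Csv. The sample text is made up; with a real file, something like [System.IO.File]::ReadAllText('D:\test\file to clean.csv') could supply $raw.

```powershell
# Hypothetical sample standing in for the file contents: records end in CRLF,
# and one field contains a bare CR.
$raw = "id|note`r`n1|first`rsecond`r`n2|plain`r`n"

$rows = $raw -split "`r`n" |
    Where-Object { $_ -ne '' } |               # drop the empty piece after the final CRLF
    ForEach-Object { $_ -replace "`r", ' ' } | # any CR left is inside a field: make it a space
    ConvertFrom-Csv -Delimiter '|'              # the first record is treated as the header
$rows
```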
    

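    If it isn't, however, you can't tell which Cr marks the end of a record and which is just a line break in the text. You can partly work around this by counting the pipes: e.g. if you have 5 columns, any CR before the fourth delimiter is a line break inside a field rather than the end of the record. But if there's another CR after that, you can't be sure whether it's a line break in the last column's data or the end of the row. You can get around this if you know that the first or last column (or both) never contains line breaks. For all of these more complex scenarios I suspect regular expressions are the best option, applied with something like Select-String. If you need that, post it here as a question with the exact requirements and what you've already tried, and others can help.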
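The pipe-counting idea can be sketched like this, assuming a '|' delimiter, no text qualifier, records ending in a bare CR, a known column count, and (the key assumption) that the last column never contains a CR. Join-CrRecords is a hypothetical helper; field-internal CRs are replaced with a space.

```powershell
# Hypothetical helper: rebuild records from text whose records end in a bare CR.
# Assumptions: '|' delimiter, no text qualifier, $ColumnCount columns, and the
# LAST column never contains a CR (otherwise the split is ambiguous).
function Join-CrRecords {
    param(
        [string]$Text,
        [int]$ColumnCount,
        [string]$Replacement = ' '
    )
    $pending = $null
    foreach ($piece in $Text -split "`r") {
        # A CR that didn't end a record is field-internal: join with $Replacement.
        if ($null -eq $pending) { $pending = $piece } else { $pending = $pending + $Replacement + $piece }

        # Once a record has ColumnCount - 1 delimiters, the CR that ended this
        # piece must be the end of the record.
        $delimiterCount = ($pending -split '\|').Count - 1
        if ($delimiterCount -ge $ColumnCount - 1) {
            $pending
            $pending = $null
        }
    }
    if ($pending) { $pending }   # emit any trailing partial record
}

$text = "id|note|flag`r1|first`rsecond|y`r2|plain|n`r"
$rows = Join-CrRecords -Text $text -ColumnCount 3 | ConvertFrom-Csv -Delimiter '|'
$rows
```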

    [Comments]:

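    • This may be exactly what I need, but as I'm not familiar with PowerShell scripting I don't know how to use the code. If I have a csv file at D:\test\file to clean.csv, how do I run the script? The delimiter is a pipe "|", there is no text qualifier, and each line ends with a CR. Some fields have a CR somewhere in the middle, which must be removed and replaced with a space...
    • @M.R. If you're still interested, see the update section at the end of this answer. Unfortunately I missed this comment when you originally posted it.
    • Thanks John, yes I'm still interested in the script, I'll give it a try!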