【Question Title】: PowerShell CSV row/column transpose and manipulation
【Posted】: 2015-11-25 11:02:51
【Question】:

I am new to PowerShell. I am trying to transpose rows into columns for a medium-sized CSV file of about 10,000 rows. The original CSV has three columns ("Time","Id","IOT"), as shown below:

"Time","Id","IOT" 
"00:03:56","23","26" 
"00:03:56","24","0" 
"00:03:56","25","0" 
"00:03:56","26","1" 
"00:03:56","27","0" 
"00:03:56","28","0" 
"00:03:56","29","0" 
"00:03:56","30","1953" 
"00:03:56","31","22" 
"00:03:56","32","39" 
"00:03:56","33","8" 
"00:03:56","34","5" 
"00:03:56","35","269" 
"00:03:56","36","5" 
"00:03:56","37","0" 
"00:03:56","38","0" 
"00:03:56","39","0" 
"00:03:56","40","1251" 
"00:03:56","41","103" 
"00:03:56","42","0" 
"00:03:56","43","0" 
"00:03:56","44","0" 
"00:03:56","45","0" 
"00:03:56","46","38" 
"00:03:56","47","14" 
"00:03:56","48","0" 
"00:03:56","49","0" 
"00:03:56","2013","0" 
"00:03:56","2378","0" 
"00:03:56","2380","32" 
"00:03:56","2758","0" 
"00:03:56","3127","0" 
"00:03:56","3128","0" 
"00:09:16","23","22" 
"00:09:16","24","0" 
"00:09:16","25","0" 
"00:09:16","26","2" 
"00:09:16","27","0" 
"00:09:16","28","0" 
"00:09:16","29","21" 
"00:09:16","30","48" 
"00:09:16","31","0" 
"00:09:16","32","4" 
"00:09:16","33","4" 
"00:09:16","34","7" 
"00:09:16","35","382" 
"00:09:16","36","12" 
"00:09:16","37","0" 
"00:09:16","38","0" 
"00:09:16","39","0" 
"00:09:16","40","1882" 
"00:09:16","41","42" 
"00:09:16","42","0" 
"00:09:16","43","3" 
"00:09:16","44","0" 
"00:09:16","45","0" 
"00:09:16","46","24" 
"00:09:16","47","22" 
"00:09:16","48","0" 
"00:09:16","49","0" 
"00:09:16","2013","0" 
"00:09:16","2378","0" 
"00:09:16","2380","19" 
"00:09:16","2758","0" 
"00:09:16","3127","0" 
"00:09:16","3128","0" 
... 
... 
... 

I tried to transpose it with code based on a PowerShell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be
Basically, my PowerShell code is as follows:

$b = @() 
    foreach ($Time in $a.Time | Select -Unique) { 
        $Props = [ordered]@{ Time = $time } 
        foreach ($Id in $a.Id | Select -Unique){ 
            $IOT = ($a.where({ $_.Id -eq $Id -and $_.time -eq $time })).IOT 
            $Props += @{ $Id = $IOT } 
        } 
        $b += New-Object -TypeName PSObject -Property $Props 
    } 
$b | FT -AutoSize 
$b | Out-GridView 

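The code above gives me the expected result: every "Id" value becomes a column header, every unique "Time" value becomes a row, and the "IOT" value sits at the intersection of "Id" x "Time", as shown below: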

"Time","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","2013","2378","2380","2758","3127","3128" 
"00:03:56","26","0","0","1","0","0","0","1953","22","39","8","5","269","5","0","0","0","1251","103","0","0","0","0","38","14","0","0","0","0","32","0","0","0" 
"00:09:16","22","0","0","2","0","0","21","48","0","4","4","7","382","12","0","0","0","1882","42","0","3","0","0","24","22","0","0","0","0","19","0","0","0" 

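With only a few hundred rows the result comes back quickly, as expected. The problem appears when processing the whole 10,000-row CSV file: the script keeps running for a very long time (hours) without finishing or producing any output. Could some PowerShell experts on Stack Overflow review the code above and perhaps suggest changes to speed it up?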

Thank you very much for your suggestions.

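【Comments】:

  • I would assume $a comes from Import-Csv?
  • And... every Time shares all the same IDs?
  • That's right, Mat. $a is the array from Import-Csv. In the original CSV file, every "Time" repeats/shares all the same "Id" values.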

Tags: powershell transpose


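【Solution 1】:

10,000 records is a lot, but I don't think it is enough to warrant recommending a StreamReader* and parsing the CSV by hand. The biggest thing working against you, though, is the following line: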

$b += New-Object -TypeName PSObject -Property $Props 

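What PowerShell does here is create a new array, copy the existing elements into it, and append the new element. That is a memory-intensive operation, and you are repeating it 1000s of times. A better approach in this case is to use the pipeline to your advantage.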

$data = Import-Csv -Path "D:\temp\data.csv"
$headers = $data.ID  | Sort-Object {[int]$_}  -Unique

$data | Group-Object Time | ForEach-Object{
    $props = [ordered]@{Time = $_.Name}
    foreach($header in $headers){
        $props."$header" = ($_.Group | Where-Object{$_.ID -eq $header}).IOT
    }
    [pscustomobject]$props
} |  export-csv d:\temp\testing.csv -NoTypeInformation

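$data holds your entire file in memory as a collection of objects. $headers collects all of the IDs that will become the column headers, sorted numerically.

The data is then grouped by Time. Within each Time group we fetch the value for every ID. If an ID does not exist for that Time, the entry will be empty.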
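
As a possible refinement of the grouping step described above (this is not part of the original answer), the `Where-Object` scan that runs once per header could be replaced by a hashtable lookup built once per Time group. The sketch below uses a few inline sample rows in place of the real file, and the names `$lookup` and `$result` are illustrative only.

```powershell
# Sketch only: same transpose as above, but each Time group is indexed by ID
# in a hashtable so a lookup replaces the Where-Object scan per header.
# The inline sample rows stand in for Import-Csv on the real file.
$data = @'
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:09:16","23","22"
"00:09:16","24","0"
'@ | ConvertFrom-Csv

# Unique IDs, sorted numerically, become the column headers.
$headers = $data.Id | Sort-Object { [int]$_ } -Unique

$result = $data | Group-Object Time | ForEach-Object {
    # Build an ID -> IOT lookup for this Time group in a single pass.
    $lookup = @{}
    foreach ($row in $_.Group) { $lookup[$row.Id] = $row.IOT }

    $props = [ordered]@{ Time = $_.Name }
    foreach ($header in $headers) {
        # An ID missing from this group yields $null, i.e. an empty cell.
        $props."$header" = $lookup[$header]
    }
    [pscustomobject]$props
}

$result | ConvertTo-Csv -NoTypeInformation
```

Whether this is noticeably faster depends on how many IDs each Time group contains; with only a few dozen IDs per group the difference may be small.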

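This is not the best approach, but it should be faster than yours. I processed 10,000 records in under a minute (51 seconds on average over 3 passes). I will post a benchmark if I can.

I ran your code just once against my own data and it took 13 minutes. I think it is safe to say that mine performs faster.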
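
Timings like the ones quoted above can be gathered with `Measure-Command`. The following is a minimal sketch, not the answerer's benchmark: it isolates only the array-growth cost of `+=` and compares it with letting PowerShell collect the loop's output. The item count of 5000 is an arbitrary assumption, and the size of the gap will vary with the PowerShell version and the machine.

```powershell
# Sketch only: compare growing an array with += against letting the loop's
# output be collected in one go. The item count is an arbitrary choice.
$n = 5000

$slow = Measure-Command {
    $b = @()
    foreach ($i in 1..$n) {
        # Each += allocates a new array and copies every existing element.
        $b += [pscustomobject]@{ Index = $i }
    }
}

$fast = Measure-Command {
    # Assigning the foreach statement's output collects all objects at once.
    $c = foreach ($i in 1..$n) {
        [pscustomobject]@{ Index = $i }
    }
}

'{0,-24} {1,10:N0} ms' -f 'Array += in a loop:', $slow.TotalMilliseconds
'{0,-24} {1,10:N0} ms' -f 'Collected loop output:', $fast.TotalMilliseconds
```

Note that this measures only one of the costs in the original script; the nested `.where()` filter over the whole of `$a` for every Time and Id pair is a separate cost that this sketch does not cover.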


FYI, the dummy data was generated with this logic:

1..100 | %{
 $time = get-date -Format "hh:mm:ss"
 sleep -Seconds 1
    1..100 | % {

        [pscustomobject][ordered]@{
            time = $time 
            id = $_
            iot = Get-Random -Minimum 0 -Maximum 7
        } 
    }
} | Export-Csv d:\temp\data.csv -notypeinformation

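* This is not a great example of a StreamReader for your case. I am just pointing out that it is the better way to read large files. You would only need to parse each line as a string.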
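
To make the line-by-line idea in the footnote concrete, here is a rough sketch of what a `System.IO.StreamReader` version might look like. It is not from the original answer, and it rests on several assumptions: every field is wrapped in double quotes with no embedded commas or quotes, all rows for one Time value are contiguous (as in the sample data), and PowerShell 5 or later is available for the `::new()` syntax. The function name `Convert-IotCsv` is hypothetical.

```powershell
# Sketch only. Assumptions (not from the original post): every field is
# wrapped in double quotes with no embedded commas or quotes, and all rows
# for one Time value are contiguous, as in the sample data.
function Convert-IotCsv {          # hypothetical helper name
    param([string]$Path)           # use a full path; .NET ignores the PS location

    $reader = [System.IO.StreamReader]::new($Path)
    try {
        $null = $reader.ReadLine()              # skip the header line
        $props = $null
        while ($null -ne ($line = $reader.ReadLine())) {
            if ([string]::IsNullOrWhiteSpace($line)) { continue }
            # Trim whitespace and the outer quotes, then split on "," separators.
            $time, $id, $iot = $line.Trim().Trim('"') -split '","'
            if ($null -eq $props -or $props.Time -ne $time) {
                # Time changed: emit the finished row and start a new one.
                if ($null -ne $props) { [pscustomobject]$props }
                $props = [ordered]@{ Time = $time }
            }
            $props."$id" = $iot
        }
        if ($null -ne $props) { [pscustomobject]$props }   # emit the last row
    }
    finally {
        $reader.Dispose()
    }
}
```

It could then be used as `Convert-IotCsv -Path 'D:\temp\data.csv' | Export-Csv 'D:\temp\testing.csv' -NoTypeInformation`. Because only one Time block is held in memory at a time, the columns come out in file order rather than sorted, and `Export-Csv` takes its column set from the first row, so this relies on every Time sharing the same IDs, as confirmed in the question comments.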

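【Comments】:

  • Thank you very much, Matt. The code above really is much faster. The CSV file I mentioned is actually just one of the daily files; I may need to do the same for monthly CSV files with hundreds of thousands of rows (about 20 MB each). Do you think a StreamReader would be needed in that case? If so, and if you don't mind, could you give an example using the StreamReader technique?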