【Question title】: Remove everything before a certain number of delimiter characters from a file
【Posted】: 2017-03-20 03:02:36
【Question】:

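I have a comma-delimited data file, but there are no newlines separating the header fields from the data fields, and this cannot be changed. Even after the header portion there are no line breaks such as CR/LF; the only consistency I can see is the delimiter. The data is essentially one big string on a single line, with only commas separating the fields.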

Sample header data

"success":true,"dev":"id":999999999,"name":"device name","tags":"id":99999,"name":"devicesname","dataType":"Int","description":"my description","alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,

Sample data with header and data

"success":true,"dev":"id":999999999,"name":"device name","tags":"id":99999,"name":"devicesname","dataType":"Int","description":"my description","alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,"history":"date":"2016-11-05T21:15:47Z","value":0.0,"date":"2016-11-05T21:15:48Z","value":1.0,"date":"2016-11-05T21:15:50Z","value":0.0,"date":"2016-11-05T21:15:53Z","value":0.0,"date":"2016-11-05T21:15:57Z","value":0.0,"date":"2016-11-05T21:16:00Z","value":1.0,"date":"2016-11-05T21:16:02Z","value":1.0,"date":"2016-11-05T21:16:04Z","value":1.0,"date":"2016-11-05T21:16:07Z"1.0

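I somehow have to take this data and strip out the entire header portion, e.g. remove everything before the 11th comma. Then I need to take the remainder and parse it so that only the "date" and "value" fields are kept, with a carriage return and line feed after each value.

The field/column name and the actual value of that field appear to be separated by a colon, and that is what is throwing me off.

I am on Windows and would prefer a PowerShell solution, even if it needs .NET calls or the like, but I am open to any Windows solution that can do this.

I would be eternally grateful to anyone who can help me with this; I have been stuck on it for many hours and just cannot figure out how to do it. Unfortunately, the data comes from a source where its format cannot be changed, though perhaps there is a way to do so that I have not found yet.

Desired output after reformatting/parsing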

"2016-11-05T21:15:47Z",0.0
"2016-11-05T21:15:48Z",1.0
"2016-11-05T21:15:50Z",0.0
"2016-11-05T21:15:53Z",0.0
"2016-11-05T21:15:57Z",0.0
"2016-11-05T21:16:00Z",1.0
"2016-11-05T21:16:02Z",1.0
"2016-11-05T21:16:04Z",1.0
"2016-11-05T21:16:07Z",1.0

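【Comments】:

  • Starting point: (Get-Content ".\40453833.csv") -split ",", then use a for loop to walk the array starting at the 11th element in steps of 2. After that, come back here, edit your question, and provide a minimal reproducible example to get further help.
  • The data looks like some kind of mangled JSON. Where does it come from?
  • See if you can get hold of it before it gets mangled, because unless you have a very stable data structure, un-mangling it will be painful.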
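
The split-and-step approach from the first comment could look roughly like the sketch below. It is not from the original thread. It assumes that no field value ever contains a comma (a free-text "description" could break this) and that the header always ends after exactly 11 commas, as in the sample. The variable names and the shortened sample string are illustrative only.

```powershell
# Shortened sample from the question (header plus two readings).
# To read a real file instead: $s = Get-Content -Raw '.\40453833.csv'
$s = '"success":true,"dev":"id":999999999,"name":"device name","tags":"id":99999,"name":"devicesname","dataType":"Int","description":"my description","alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,"history":"date":"2016-11-05T21:15:47Z","value":0.0,"date":"2016-11-05T21:15:48Z","value":1.0'

# Split on commas; elements 0..10 are the header, element 11 holds the first date.
$fields = $s.Trim() -split ','

$lines = for ($i = 11; ($i + 1) -lt $fields.Count; $i += 2) {
    # Drop everything up to and including the "date": label
    # (the first date field also carries the "history": prefix).
    $date  = $fields[$i] -replace '^.*"date":', ''
    # Drop the "value": label from the field that follows.
    $value = $fields[$i + 1] -replace '^"value":', ''
    "$date,$value"
}

# One date,value pair per line, separated by CR/LF.
$lines -join "`r`n"
```

Because this relies on a fixed element position, it breaks as soon as the header gains or loses a field; anchoring on the "date" label instead is less brittle.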

Tags: windows powershell csv parsing


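【Solution 1】:

Even though your data has comma-separated fields, it is not CSV data.

There is no header row followed by data rows; instead, there is just a series of name-value pairs on a single line, and the names are not unique.

The following regex-based solution works with your sample input: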

# Replace the literal with `Get-Content YourFile` to load data from a file.
$s='"success":true,"dev":"id":999999999,"name":"device name","tags":"id":99999,"name":"devicesname","dataType":"Int","description":"my description","alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,"history":"date":"2016-11-05T21:15:47Z","value":0.0,"date":"2016-11-05T21:15:48Z","value":1.0,"date":"2016-11-05T21:15:50Z","value":0.0,"date":"2016-11-05T21:15:53Z","value":0.0,"date":"2016-11-05T21:15:57Z","value":0.0,"date":"2016-11-05T21:16:00Z","value":1.0,"date":"2016-11-05T21:16:02Z","value":1.0,"date":"2016-11-05T21:16:04Z","value":1.0,"date":"2016-11-05T21:16:07Z","value":1.0'

# - Remove the part of the line before the first "date" entry.
# - Then extract the values from adjacent "date"-"value" pairs and output 
#   each value pair on a separate line.
$s -replace '^.+?("date":.+)', '$1' -replace '.+?:([^,]+),.+?:([^,]+)', ('$1,$2' + "`r`n")
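
As a variation that is not part of the original answer, the same adjacent "date"/"value" pairs can be captured with [regex]::Matches and named groups, which yields objects rather than one formatted string. This is a minimal sketch: the pattern, the property names, and the shortened sample string are assumptions for illustration. The header's own "value" entry is skipped because it does not directly follow a "date" entry.

```powershell
# Shortened sample (tail of the header plus two readings).
# To read a real file instead: $s = Get-Content -Raw YourFile
$s = '"alarmHint":"","value":0.0,"quality":"good","deviceTagId":99,"history":"date":"2016-11-05T21:15:47Z","value":0.0,"date":"2016-11-05T21:15:48Z","value":1.0'

# Match only a "date" entry immediately followed by a "value" entry.
$pattern = '"date":"(?<date>[^"]+)","value":(?<value>[^,\s]+)'

$rows = foreach ($m in [regex]::Matches($s, $pattern)) {
    [pscustomobject]@{
        date  = $m.Groups['date'].Value
        value = $m.Groups['value'].Value
    }
}

# Emit each pair in the format requested in the question.
$rows | ForEach-Object { '"{0}",{1}' -f $_.date, $_.value }
```

Keeping the pairs as objects makes later steps such as filtering or sorting by date easier; piping $rows to Export-Csv should also work if a header row in the output is acceptable.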
