【Question title】: Splitting columns into rows from a text file using unix shell script - Dynamically changing source file structure
【Posted】: 2020-07-15 05:43:20
【Question description】:

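I have a tab-delimited source file with the structure below. Only the first 9 columns, from ID to Line Item/Property, are fixed; the remaining columns vary in both number and layout.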

ID  Date/Time (UTC) User    Description Security Change Previous Value  New Value   Module/List Line Item/Property  Scenarios   Region EM2  Plan Item PB6   Market EM4  Plants - Master Plan Brand PB4  T/DI    GRS 6   GRS 7   Target User Import  Object  Target Role Export  Dashboard   Action  Time

Here is a sample record from that file:

2572561 3/24/2020 14:01 chiara.bettini@gmail.com            FALSE   TRUE    FILTER:  Brand P&L Report - Market  Plan Brands                     Polly Pocket                chiara.bettini@gmail.com    

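Using a Unix shell script, I need to convert it into a CSV file with the header and data format shown below. I want to keep the fixed columns (ID through Line Item/Property) and fold all the other, dynamically varying columns into an Attribute Name column and an Attribute Value column: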

ID,Date/Time (UTC),User,Description,Security Change,Previous Value,New Value,Module/List,Line Item/Property,Attribute Name,Attribute Value
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Scenarios,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Region EM2,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Plan Item PB6,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Market EM4,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Plants - Master,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Plan Brand PB4,Polly Pocket
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,T/DI,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,GRS 6,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,GRS 7,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Target User,chiara.bettini@gmail.com
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Import,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Object,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Target Role,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Export,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Dashboard,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Action,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Time,



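【Question comments】:

  • Please wrap the samples in your question in code tags, then let us know.
  • Please remove the email address and replace it with a dummy value such as foo@bar.
  • It is already a dummy email address.
  • Please explain how commas (,) in the original data records should be handled.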

Tags: shell csv unix sh


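【Solution 1】:

Note: the following will not work correctly if any field contains a comma (,), because the script turns every tab into a comma and then splits each line on commas.

Try this bash script (named process for the terminal session that follows):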

#!/bin/bash

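# Convert tabs to commas first: tab is IFS whitespace in bash, so splitting on
# it directly would collapse runs of empty fields.
tr '\t' ',' | {
    IFS=','  # field separator for read -a, and join character for "${array[*]}"

    # Read the heading line; print the 9 fixed column names plus the two new ones.
    read -r -a heading
    printf "%s\n" "${heading[*]:0:9},Attribute Name,Attribute Value"

    # For each data line, emit one output row per dynamic column (index 9 onward).
    while read -r -a data ; do
        for (( i=9; i<${#heading[*]}; ++i )) ; do
            printf "%s\n" "${data[*]:0:9},${heading[i]},${data[i]}"
        done
    done
}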
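
As an aside, the script relies on two bash behaviours that are easy to overlook: a quoted "${array[*]}" expansion joins its elements with the first character of IFS, and read -a keeps empty fields when the separator is a non-whitespace character such as a comma. The sketch below illustrates both with made-up values; join_first and count_fields are hypothetical helper names used only for this demonstration.

```shell
#!/bin/bash
# Demo values are made up; only the expansion behaviour matters here.

# Join the first N of the remaining arguments with commas.
join_first() {
    local n=$1; shift
    local IFS=','                     # "${arr[*]}" joins on the first character of IFS
    local -a fields=("$@")
    printf '%s\n' "${fields[*]:0:$n}" # slice of N elements, joined with commas
}

# Count the fields that read -a produces when splitting a line on commas.
count_fields() {
    local -a parts
    IFS=',' read -r -a parts <<< "$1" # non-whitespace separator: empty fields survive
    printf '%s\n' "${#parts[@]}"
}

join_first 3 a b "" d e     # prints: a,b,
count_fields "x,,z"         # prints: 3
```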

Terminal session:

$ cat data.in | tr '\t' ','
ID,Date/Time (UTC),User,Description,Security Change,Previous Value,New Value,Module/List,Line Item/Property,Scenarios,Region EM2,Plan Item PB6,Market EM4,Plants - Master,Plan Brand PB4,T/DI,GRS 6,GRS 7,Target User,Import,Object,Target Role,Export,Dashboard,Action,Time
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,,,,,,Polly Pocket,,,,chiara.bettini@gmail.com
$ ./process < data.in 
ID,Date/Time (UTC),User,Description,Security Change,Previous Value,New Value,Module/List,Line Item/Property,Attribute Name,Attribute Value
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Scenarios,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Region EM2,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Plan Item PB6,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Market EM4,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Plants - Master,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Plan Brand PB4,Polly Pocket
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,T/DI,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,GRS 6,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,GRS 7,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Target User,chiara.bettini@gmail.com
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Import,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Object,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Target Role,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Export,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Dashboard,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Action,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER:  Brand P&L Report - Market,Plan Brands,Time,
$ 
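
If fields may contain commas, one possible workaround is to split on a character that should never occur in the data and to quote fields on output. The following is only a sketch under stated assumptions, not part of the original answer: unpivot and csv_line are hypothetical names, the quoting follows the common RFC 4180 convention (wrap the field in double quotes and double any embedded quote), no field may contain the ASCII unit separator (octal 037) or an embedded newline, and the number of fixed columns is taken to be 9 as in the question.

```shell
#!/bin/bash
# Sketch only: unpivot and csv_line are illustrative names.
# Assumes no field contains the unit separator (octal 037) or a newline.

# Print the arguments as one CSV line, quoting fields that need it.
csv_line() {
    local q='"' out='' sep='' f
    for f in "$@"; do
        if [[ $f == *,* || $f == *"$q"* ]]; then
            f=${f//"$q"/"$q$q"}   # double any embedded double quotes
            f="$q$f$q"            # wrap the field in double quotes
        fi
        out+="$sep$f"
        sep=','
    done
    printf '%s\n' "$out"
}

# Read 037-separated lines on stdin; write the reshaped CSV to stdout.
unpivot() {
    local fixed=9 i
    local -a heading data row
    local IFS=$'\037'             # non-whitespace separator keeps empty fields
    read -r -a heading || return
    csv_line "${heading[@]:0:fixed}" "Attribute Name" "Attribute Value"
    # The extra test keeps a final line that lacks a trailing newline.
    while read -r -a data || (( ${#data[@]} )); do
        # Copy the fixed columns, padding missing ones with empty strings.
        row=()
        for (( i = 0; i < fixed; ++i )); do row+=("${data[i]}"); done
        # One output row per dynamic column.
        for (( i = fixed; i < ${#heading[@]}; ++i )); do
            csv_line "${row[@]}" "${heading[i]}" "${data[i]}"
        done
    done
}

# Usage, with the same input file as above:
#   tr '\t' '\037' < data.in | unpivot
```

A pure-bash loop like this is likely to be slow on large files; it is meant to show the quoting idea rather than to be a tuned implementation.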

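【Comments】:

  • I have to convert columns into rows. That is the major challenge here. On top of that, the source structure changes dynamically.
  • Your input and sample output don't show any column-to-row transposition... never mind, I get it now...
  • Yes, sorry, I have edited the question to add that.
  • Are you free to choose the language? I would not recommend bash. I can show you a solution in tcl.
  • No, Unix shell script.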