【Question title】: Format and replace timestamp column using awk
【Posted】: 2017-05-18 04:26:53
【Question】:

I have multiple columns in the following format:

D,"4/2/2017 2:45:56 PM",ee,"4/2/2017 2:45:56 PM"
D,"03/02/2017 03:47:16 PM",ee,"03/02/2017 03:47:16 PM"
D,"09/2/2017 6:05:54 AM",ee,"09/2/2017 6:05:54 AM"
D,"5/01/2017 8:29:46 PM",ee,"5/01/2017 8:29:46 PM"
D,"4/2/2017 02:3:26 AM",ee,"4/2/2017 02:3:26 AM"

I want to reformat them as follows:

D,"04/02/2017 02:45:56 PM",ee,"04/02/2017 02:45:56 PM"
D,"03/02/2017 03:47:16 PM",ee,"03/02/2017 03:47:16 PM"
D,"09/02/2017 06:05:54 AM",ee,"09/02/2017 06:05:54 AM"
D,"05/01/2017 08:29:46 PM",ee,"05/01/2017 08:29:46 PM"
D,"04/02/2017 02:03:26 AM",ee,"04/02/2017 02:03:26 AM"

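I tried splitting the columns with awk -F"[,/ :]" and then padding each piece according to its length,

but that becomes tedious when there are many columns.

Please suggest whether awk has any datetime or timestamp formatting option so that I can process the columns quickly.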

【Discussion】:

    Tags: bash shell unix awk


    【Solution 1】:
    $ cat tst.awk
    function fmt(t,    f) {
        split(t,f,/["\/ :]/)
        return sprintf("\"%02d/%02d/%04d %02d:%02d:%02d %s\"",f[2],f[3],f[4],f[5],f[6],f[7],f[8])
    }
    BEGIN { FS=OFS="," }
    { $2=fmt($2); $4=fmt($4); print }
    
    $ awk -f tst.awk file
    D,"04/02/2017 02:45:56 PM",ee,"04/02/2017 02:45:56 PM"
    D,"03/02/2017 03:47:16 PM",ee,"03/02/2017 03:47:16 PM"
    D,"09/02/2017 06:05:54 AM",ee,"09/02/2017 06:05:54 AM"
    D,"05/01/2017 08:29:46 PM",ee,"05/01/2017 08:29:46 PM"
    D,"04/02/2017 02:03:26 AM",ee,"04/02/2017 02:03:26 AM"
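
When there are many timestamp columns, listing each one by hand ($2, $4, ...) gets tedious. The following is a minimal sketch, not part of the original answer, that reuses the fmt() function above and applies it to every field that looks like a quoted timestamp. The wrapper name pad_timestamps and the regex used to recognize a timestamp are assumptions made for illustration, and the sketch assumes no field contains an embedded comma.

```shell
# pad_timestamps is a hypothetical wrapper name; it reads files or stdin.
pad_timestamps() {
    awk '
    # Same idea as fmt() in the answer: split on quote, slash, space, colon
    # and rebuild the timestamp with zero-padded numbers.
    function fmt(t,    f) {
        split(t, f, /["\/ :]/)
        return sprintf("\"%02d/%02d/%04d %02d:%02d:%02d %s\"", f[2], f[3], f[4], f[5], f[6], f[7], f[8])
    }
    BEGIN { FS = OFS = "," }
    {
        # Assumed shape of a timestamp field: "M/D/YYYY h:m:s AM|PM" in quotes.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^"[0-9]+\/[0-9]+\/[0-9]+ [0-9]+:[0-9]+:[0-9]+ [AP]M"$/)
                $i = fmt($i)
        print
    }' "$@"
}

printf '%s\n' 'D,"4/2/2017 2:45:56 PM",ee,"4/2/2017 2:45:56 PM"' | pad_timestamps
# prints: D,"04/02/2017 02:45:56 PM",ee,"04/02/2017 02:45:56 PM"
```

Fields that do not match the pattern (such as D and ee) are passed through unchanged, so the column positions do not need to be known in advance.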
    

    【Discussion】:

    • Best of both worlds.
    【Solution 2】:

    I suggest using awk and its printf to format the output:

    awk -F '["/ :]' '{printf "%s\"%.2d/%.2d/%d %.2d:%.2d:%.2d %s\"%s\"%.2d/%.2d/%d %.2d:%.2d:%.2d %s\"\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16}' file
    

    Output:

    D,"04/02/2017 02:45:56 PM",ee,"04/02/2017 02:45:56 PM"
    D,"03/02/2017 03:47:16 PM",ee,"03/02/2017 03:47:16 PM"
    D,"09/02/2017 06:05:54 AM",ee,"09/02/2017 06:05:54 AM"
    D,"05/01/2017 08:29:46 PM",ee,"05/01/2017 08:29:46 PM"
    D,"04/02/2017 02:03:26 AM",ee,"04/02/2017 02:03:26 AM"

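    【Discussion】:

    • I prefer this approach because it also works with mawk (the default awk on Debian-based distributions). I was trying to use gensub() but gave up because it is not available in mawk.
    • @EdMorton My version is 1.3.3-17, the one shipped with Ubuntu 16.04. It is the same version as in the recent Ubuntu 17.04 and the upcoming Debian 9.
    • On their web page they state: "As noted, mawk has been neglected by some packagers". I really don't understand why Debian chose mawk as the default awk and doesn't even bother to ship the latest version. I can understand choosing something other than /bin/bash for /bin/sh, but not this.
    【Solution 3】: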

    Using GNU awk (split with its fourth argument, seps). Code:

    function doit(str,    b, a, seps, n, j) {       # b, a, seps, n, j are locals
        gsub(/\"/,"",str);                          # remove quotes
        n=split(str,a,"[/ :]",seps);                # split on special chars
        for(j=1;j<=n;j++) {                         # loop all elements in a
            if(a[j]~/^[0-9]+$/)                     # process all number elements
                a[j]=sprintf("%02d", a[j]) seps[j]; # zeropad
            b=b a[j]                                # gather buffer
        }
        return "\"" b "\""                          # return quoted
    }
    BEGIN { FS=OFS="," }
    {
        for(i=2;i<=NF;i+=2)                         # loop the right ones
            $i=doit($i)                             # call the contractor
    }
    1
    

    Run it:

    $ awk -f program.awk file
    

    Output:

    D,"04/02/2017 02:45:56 PM",ee,"04/02/2017 02:45:56 PM"
    D,"03/02/2017 03:47:16 PM",ee,"03/02/2017 03:47:16 PM"
    D,"09/02/2017 06:05:54 AM",ee,"09/02/2017 06:05:54 AM"
    D,"05/01/2017 08:29:46 PM",ee,"05/01/2017 08:29:46 PM"
    D,"04/02/2017 02:03:26 AM",ee,"04/02/2017 02:03:26 AM"
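
The fourth argument to split() (seps) is a gawk extension, so the program above will not run on other awks such as mawk. As a rough portable alternative, the sketch below pads every run of digits in the even-numbered fields using only match(), substr() and sprintf(). The wrapper name pad_even_fields and the helper pad() are hypothetical names chosen for illustration, and, like the program above, it assumes the timestamps sit in fields 2, 4, 6, and so on.

```shell
# pad_even_fields is a hypothetical wrapper name; it reads files or stdin.
pad_even_fields() {
    awk '
    # Zero-pad every run of digits in s to at least two digits.
    function pad(s,    out) {
        out = ""
        while (match(s, /[0-9]+/)) {
            # Copy the text before the number, then the padded number.
            out = out substr(s, 1, RSTART - 1) sprintf("%02d", substr(s, RSTART, RLENGTH))
            s = substr(s, RSTART + RLENGTH)
        }
        return out s    # append whatever follows the last number
    }
    BEGIN { FS = OFS = "," }
    {
        for (i = 2; i <= NF; i += 2)    # same field selection as above
            $i = pad($i)
        print
    }' "$@"
}

printf '%s\n' 'D,"4/2/2017 02:3:26 AM",ee,"4/2/2017 02:3:26 AM"' | pad_even_fields
# prints: D,"04/02/2017 02:03:26 AM",ee,"04/02/2017 02:03:26 AM"
```

Because the quotes and separators are copied through as they are, there is no need to strip and re-add the quotes.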
    

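    【Discussion】:

    • Got an error: 4 is an invalid number of arguments for split.
    • This is for GNU awk, sir. I believe you mentioned mawk in the comments.
    【Solution 4】:

    You can also use sed to prefix every single digit that stands between word boundaries with a 0. However, this changes any lone digit in the data, even one that is not in a date column. So use it only when you want every standalone single digit in the file zero-padded.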

    sed 's|\b\([[:digit:]]\)\b|0\1|g'
    

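    If you want to change the file in place, use sed with -i.

    How it works.

    The regex \b\([[:digit:]]\)\b matches a single digit between word boundaries and captures it with the parentheses. In the replacement part of the sed command, the hard-coded 0 followed by the first captured group \1 yields the zero-padded digit.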
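
To make the caveat about unrelated digits concrete, here is a small sketch run on a made-up line (not from the question's data) whose first field is a lone digit. It assumes GNU sed, because \b is a GNU extension, and pad_single_digits is a hypothetical helper name that simply wraps the command above.

```shell
# pad_single_digits is a hypothetical helper name wrapping the sed command.
# Assumes GNU sed: \b (word boundary) is a GNU extension.
pad_single_digits() {
    sed 's|\b\([[:digit:]]\)\b|0\1|g'
}

# Made-up input: the leading 7 is not part of any date, yet it is padded too.
# The 2 in v2 is left alone because there is no word boundary between v and 2.
printf '%s\n' '7,"4/2/2017 2:45:56 PM",v2' | pad_single_digits
# prints: 07,"04/02/2017 02:45:56 PM",v2
```

If lone digits outside the timestamp columns must be preserved, a field-aware awk approach is the safer choice.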

    To see how this regex works, see the regex demo.

    Working example:

    bash-4.2$ cat file1
    D,"4/2/2017 2:45:56 PM",ee,"4/2/2017 2:45:56 PM"
    D,"03/02/2017 03:47:16 PM",ee,"03/02/2017 03:47:16 PM"
    D,"09/2/2017 6:05:54 AM",ee,"09/2/2017 6:05:54 AM"
    D,"5/01/2017 8:29:46 PM",ee,"5/01/2017 8:29:46 PM"
    D,"4/2/2017 02:3:26 AM",ee,"4/2/2017 02:3:26 AM"
    
    bash-4.2$ sed -i 's|\b\([[:digit:]]\)\b|0\1|g' file1
    
    bash-4.2$ cat file1
    D,"04/02/2017 02:45:56 PM",ee,"04/02/2017 02:45:56 PM"
    D,"03/02/2017 03:47:16 PM",ee,"03/02/2017 03:47:16 PM"
    D,"09/02/2017 06:05:54 AM",ee,"09/02/2017 06:05:54 AM"
    D,"05/01/2017 08:29:46 PM",ee,"05/01/2017 08:29:46 PM"
    D,"04/02/2017 02:03:26 AM",ee,"04/02/2017 02:03:26 AM"
    
