【Question title】: Add double quotes in .CSV comma delimited file using awk
【Posted】: 2016-07-03 16:05:11
【Question description】:

Hi, I need to process a large CSV file (20M rows), adding double quotes around every comma-separated field. The CSV file has 8 comma-separated fields, as follows:

'2016-03-12','12393659','134',,'35533605',189348,9798,gmail.com;live_com.com
'2016-03-12','12390103','138',,'35438006',5133,1897,google.com
'2016-03-12','45616164','139',,'01318800',10945593,596633,facebook.com;tumblr.com;t.co
'2016-03-12','45673436','38',,'86441702',4350985,150327,serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net

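As you can see, the first 3 fields are in single quotes, the 4th is empty, the 5th is in single quotes, and the 6th to 8th are only separated by commas. I would like to get the following result (the 4th field needs double quotes too, even though it is empty):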

"2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com"
"2016-03-12","12390103","138","","35438006","5133","1897","google.com"
"2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co"
"2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net"

I get a partial result with a mix of sed and awk:

sed -e s/\'//g input.csv > output.csv                       # eliminate quotes
awk '{gsub(/[^,]+/,"\"&\"")}1' output.csv > output1.csv     # add double quotes

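But the fourth field does not get double quotes, and I need to keep the processing time as low as possible. Any help with doing the whole job with better performance, and with double quotes on the fourth field, would be appreciated. Thanks a lot for your help. M.Tave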

【Question comments】:

    Tags: csv awk quotes comma


    【Solution 1】:

    If your data really is that simple, with no embedded quotes, newlines, or anything else, then all you need is:

    $ awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' file
    "2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com"
    "2016-03-12","12390103","138","","35438006","5133","1897","google.com"
    "2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co"
    "2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net"
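
To see why the one-liner above works, it can be split into its two steps. The following is only a sketch run on a single sample line from the question; the wrapper name `quote_fields` is made up for illustration and is not part of the original answer.

```shell
# quote_fields is a hypothetical wrapper name used only for this illustration.
quote_fields() {
  awk -F"'?,'?" -v OFS='","' '
  {
    # The field separator is a comma with an optional single quote on either
    # side, so splitting already drops the quotes between fields.
    $1 = $1               # assigning a field makes awk rebuild the record with OFS
    gsub(/^.|$/, "\"")    # swap the first character for a double quote and append one at the end
    print
  }' "$@"
}

printf '%s\n' "'2016-03-12','12393659','134',,'35533605',189348,9798,gmail.com;live_com.com" | quote_fields
```

Note that `^.` replaces whatever the first character of the line is. That is fine for the sample data, where the first field always starts with a single quote, but it would eat a real character if the first field were unquoted.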
    

    【Comments】:

    • Thanks, it works great once again... as usual. I will use this one, since the code is more compact.
    【Solution 2】:

    Try this awk one-liner:

     awk -F, -v OFS="," -v re="^'?|'?$" -v q='"' \
         '{for(i=1;i<=NF;i++)if($i)gsub(re,q,$i);else $i=q$i q}7' file
    

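    The idea is to use gsub() to add double quotes to the non-empty fields: the regex ^'?|'?$ matches an optional single quote at each end of a field, and gsub() puts a " there. For the empty fields, a " is simply added at the head and the tail. The replacement regex is defined as an awk variable outside the script to avoid escaping.

    It works on your input data: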

    kent$  awk -F, -v OFS="," -v re="^'?|'?$" -v q='"' '{for(i=1;i<=NF;i++)if($i)gsub(re,q,$i);else $i=q$i q}7' f
    "2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com"
    "2016-03-12","12390103","138","","35438006","5133","1897","google.com"
    "2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co"
    "2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net"
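
The same per-field idea can also be written out in long form. This is a sketch under the same assumption as above (no embedded commas, quotes, or newlines inside fields); the function name `quote_each_field` and the variable `sq` are hypothetical. Instead of branching on `if($i)`, it strips the single quotes first and then wraps every field, empty or not.

```shell
# quote_each_field and the sq variable are hypothetical names for this sketch.
quote_each_field() {
  awk -F, -v OFS=, -v q='"' -v sq="'" '
  {
    for (i = 1; i <= NF; i++) {
      gsub("^" sq "|" sq "$", "", $i)   # strip one single quote from each end, if present
      $i = q $i q                        # wrap the field in double quotes, even when it is empty
    }
    print
  }' "$@"
}

printf '%s\n' "'2016-03-12','12390103','138',,'35438006',5133,1897,google.com" | quote_each_field
```

Passing the quote characters in with `-v` keeps the awk program free of shell-level escaping, which is the same design choice the answer makes for its `re` and `q` variables.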
    

    【Comments】:
