【问题标题】:How to wrap the first 10 commas with double quotes?如何用双引号括起前 10 个逗号?
【发布时间】:2019-03-27 21:31:29
【问题描述】:

这是我的 input.csv 文件

dealerid,address,city,state,zip,vin,stocknumber,type,color,year,make,model,trim,bodystyle,fueltype,mileage,transmission,interiorcolor,interiorfabric,price,titlestatus,warranty,options_text,cylinders,engine,engineaspiration,enginetext,drivetrain,transmissiontext,mpgcity,mpghighway,features_text,vdc_url,images
TS06095298,999 wanna Road,Windsor,CT,06095,22HDT13S922218113,298,Used,Red,2002,OLDSMOBILE,BRAVADA,,,,136000,AUTOMATIC,,,2200,Clear,Available,"This vehicle is offered for sale by a verified private seller and features: FREE vehicle history & title report. Original window sticker available. Seller`s identity, email and phone verified. Secure cashless transactions. No cash needed. Pay securely by debit card or ACH. Bill of Sale and receipt issued for completed transactions. Vehicle financing options may be available.",,,,,,,,,,https://www.example.com/listings/298,"https://www.example.com/rails/00008.jpg,https://www.example.com/rails/AM00010.jpg"
TS06095298,999 wanna Road,Windsor,CT,06095,22HDT13S922123453,307,Used,Brown,2008,HONDA,599,,,,217538,AUTOMATIC,,,3500,Clear,Available,"This vehicle is offered for sale by a verified private seller and features: FREE vehicle history & title report. Original window sticker available. Seller`s identity, email and phone verified. Secure cashless transactions. No cash needed. Pay securely by debit card or ACH. Bill of Sale and receipt issued for completed transactions. Vehicle financing options may be available.",,,,,,,,,,https://www.example.com/listings/211,"https://www.example.com/rails/00008.jpg,https://www.example.com/rails/AM00010.jpg"  

我需要用双引号将所有列括起来,所以我最终会得到一个像这样的文件:

"dealerid","address","city","state","zip","vin","stocknumber","type","color","year","make","model","trim","bodystyle","fueltype","mileage","transmission","interiorcolor","interiorfabric","price","titlestatus","warranty","options_text","cylinders","engine","engineaspiration","enginetext","drivetrain","transmissiontext","mpgcity","mpghighway","features_text","vdc_url","images"
"TS06095298","999 wanna Road,Windsor","CT","06095","22HDT13S922218113","298","Used","Red","2002","OLDSMOBILE","BRAVADA","","","","136000,AUTOMATIC","","","2200","Clear","Available","This vehicle is offered for sale by a verified private seller and features: FREE vehicle history & title report. Original window sticker available. Seller`s identity, email and phone verified. Secure cashless transactions. No cash needed. Pay securely by debit card or ACH. Bill of Sale and receipt issued for completed transactions. Vehicle financing options may be available.","","","","","","","","","","https://www.example.com/listings/298","https://www.example.com/rails/00008.jpg,https://www.example.com/rails/AM00010.jpg"
"TS06095298","999 wanna Road,Windsor","CT","06095","22HDT13S922123453","307","Used","Brown","2008","HONDA","599","","","","217538","AUTOMATIC","","","3500","Clear","Available","This vehicle is offered for sale by a verified private seller and features: FREE vehicle history & title report. Original window sticker available. Seller`s identity, email and phone verified. Secure cashless transactions. No cash needed. Pay securely by debit card or ACH. Bill of Sale and receipt issued for completed transactions. Vehicle financing options may be available.","","","","","","","","","","https://www.example.com/listings/211","https://www.example.com/rails/00008.jpg,https://www.example.com/rails/AM00010.jpg"

该文件始终保持不变,某些列中缺少相同的数据。

images 列和 features 文本列已经包装好了。

看到始终缺少相同的信息,我决定在每行的开头添加双引号并开始用双引号替换逗号,但开始遇到一些问题。

这是我目前所拥有的。我知道代码效率不高,但这是一个开始。

#!/bin/bash

#- Temp Directories
tmp_dir="$(mktemp -d -t 'csv.XXXXX' || mktemp -d 2>/dev/null)"
tmp_input1="${tmp_dir}/temp_input1.csv"
tmp_input2="${tmp_dir}/temp_input2.csv"
tmp_input3="${tmp_dir}/temp_input3.csv"

#- Variables
client="00000"
wDir="$(pwd)"
ftpDir="${wDir}/.clientftp"
clientDir="${ftpDir}/${client}"
csvFile="${clientDir}/final.csv"
inputCsv="${wDir}/input.csv"

#  Lets Begin
cd "$wDir" || exit

      cp "$inputCsv" "$tmp_input1"
      dos2unix "$tmp_input1"

      #  place first line to a temp file , surrounding commas with double quotes , adding double quotes to the front and end of line
      head -1 "$tmp_input1" | sed -e 's/,/","/g;s/.*/"&"/' > "$tmp_input2"

      #  place remainding lines to a temp file
      sed 1,1d "$tmp_input1" | sed "s/^/\"/" > "$tmp_input3"
      sed -i 's/",,,,,,,,,,https/","","","","","","","","","","https/g' "$tmp_input3"
      sed -i 's/,Clear,Available,"/","Clear","Available","/g' "$tmp_input3"
      sed -i 's/,,,,/","","","","/g' "$tmp_input3"
      sed -i 's/,,,/","","","/g' "$tmp_input3"

      #  Create final file
      cat "$tmp_input2" > "$csvFile"
      cat "$tmp_input3" >> "$csvFile"

      rm -rf "$tmp_dir"

      { clear; echo ""; echo "";  echo "nano $csvFile"; echo ""; }

nano "$csvFile"

这个脚本产生:

"dealerid","address","city","state","zip","vin","stocknumber","type","color","year","make","model","trim","bodystyle","fueltype","mileage","transmission","interiorcolor","interiorfabric","price","titlestatus","warranty","options_text","cylinders","engine","engineaspiration","enginetext","drivetrain","transmissiontext","mpgcity","mpghighway","features_text","vdc_url","images"
"TS06095298,999 wanna Road,Windsor,CT,06095,22HDT13S922218113,298,Used,Red,2002,OLDSMOBILE,BRAVADA","","","","136000,AUTOMATIC","","","2200","Clear","Available","This vehicle is offered for sale by a verified private seller and features: FREE vehicle history & title report. Original window sticker available. Seller`s identity, email and phone verified. Secure cashless transactions. No cash needed. Pay securely by debit card or ACH. Bill of Sale and receipt issued for completed transactions. Vehicle financing options may be available.","","","","","","","","","","https://www.example.com/listings/298,"https://www.example.com/rails/00008.jpg,https://www.example.com/rails/AM00010.jpg"
"TS06095298,999 wanna Road,Windsor,CT,06095,22HDT13S922123453,307,Used,Brown,2008,HONDA,599","","","","217538,AUTOMATIC","","","3500","Clear","Available","This vehicle is offered for sale by a verified private seller and features: FREE vehicle history & title report. Original window sticker available. Seller`s identity, email and phone verified. Secure cashless transactions. No cash needed. Pay securely by debit card or ACH. Bill of Sale and receipt issued for completed transactions. Vehicle financing options may be available.","","","","","","","","","","https://www.example.com/listings/211,"https://www.example.com/rails/00008.jpg,https://www.example.com/rails/AM00010.jpg"

所以现在我有几个问题:
1- vdc_url 列没有右双引号
2-前10个逗号需要用双引号括起来

最后一列可以包含多于 3 张图片

任何帮助将不胜感激。

【问题讨论】:

  • 您的预期输出在第二行有"136000,AUTOMATIC",但应该是"136000","AUTOMATIC",对吧?
  • @BenjaminW。是的,先生,所有列都应该用双引号括起来。好收获!
  • 另外,"999 wanna Road,Windsor" 应该是 "999 wanna Road","Windsor"
  • @BenjaminW。你是对的。看看格伦的回答。它似乎成功了,而且非常有效。无论如何感谢您的帮助,我很感激。

标签: bash csv sed


【解决方案1】:

我喜欢 ruby​​ 用于快速 CVS 转换:

ruby -rcsv -e '
    out = CSV.instance($stdout, {force_quotes: true})
    CSV.foreach(ARGV.shift) {|row| out << row}
' input.csv

确保您在任何行上都没有任何尾随空格。

csvkit 也是一个不错的解决方案。

【讨论】:

  • 现在这是关于最酷的 4 行代码或什么......就在格伦!
【解决方案2】:

使用 GNU awk 进行 FPAT:

$ awk -v FPAT='[^,]*|"[^"]*"' -v OFS=',' '
    { for (i=1;i<=NF;i++) {gsub(/^"|"$/,"",$i); $i="\"" $i "\""} }
1' file
"dealerid","address","city","state","zip","vin","stocknumber","type","color","year","make","model","trim","bodystyle","fueltype","mileage","transmission","interiorcolor","interiorfabric","price","titlestatus","warranty","options_text","cylinders","engine","engineaspiration","enginetext","drivetrain","transmissiontext","mpgcity","mpghighway","features_text","vdc_url","images"
"TS06095298","999 wanna Road","Windsor","CT","06095","22HDT13S922218113","298","Used","Red","2002","OLDSMOBILE","BRAVADA","","","","136000","AUTOMATIC","","","2200","Clear","Available","This vehicle is offered for sale by a verified private seller and features: FREE vehicle history & title report. Original window sticker available. Seller`s identity, email and phone verified. Secure cashless transactions. No cash needed. Pay securely by debit card or ACH. Bill of Sale and receipt issued for completed transactions. Vehicle financing options may be available.","","","","","","","","","","https://www.example.com/listings/298","https://www.example.com/rails/00008.jpg,https://www.example.com/rails/AM00010.jpg"
"TS06095298","999 wanna Road","Windsor","CT","06095","22HDT13S922123453","307","Used","Brown","2008","HONDA","599","","","","217538","AUTOMATIC","","","3500","Clear","Available","This vehicle is offered for sale by a verified private seller and features: FREE vehicle history & title report. Original window sticker available. Seller`s identity, email and phone verified. Secure cashless transactions. No cash needed. Pay securely by debit card or ACH. Bill of Sale and receipt issued for completed transactions. Vehicle financing options may be available.","","","","","","","","","","https://www.example.com/listings/211","https://www.example.com/rails/00008.jpg,https://www.example.com/rails/AM00010.jpg"

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2013-06-22
    • 1970-01-01
    • 1970-01-01
    • 2022-01-26
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多