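[Posted]: 2019-10-07 06:51:54
[Question]:
Bash 4.4.0, Ubuntu 16.04
I have a CSV file in which several columns are entirely upper case and some are lower case. Some columns hold a single word, while others may hold 50 words. At the moment I convert the file column by column, using 2 commands per column, which is fairly heavy on the server when the file has 50k lines.
Example: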
#-- Place the header line in a temp file
head -n 1 "$tmp_input1" > "$tmp_input3"
#-- Remove the header line in original file
tail -n +2 "$tmp_input1" > "$tmp_input1-temp" && mv "$tmp_input1-temp" "$tmp_input1"
#-- Change the words in the 11th column to lower case then change the first letter to upper case
awk -F"," 'BEGIN{OFS=","} {$11 = tolower($11); print}' "$tmp_input4" > "$tmp_input5"
sed -i "s/\b\(.\)/\u\1/g" "$tmp_input5"
#-- Change the words in the 12th column to lower case then change the first letter to upper case
awk -F"," 'BEGIN{OFS=","} {$12 = tolower($12); print}' "$tmp_input5" > "$tmp_input6"
sed -i "s/\b\(.\)/\u\1/g" "$tmp_input6"
#-- Change the words in the 13th column to lower case then change the first letter to upper case
awk -F"," 'BEGIN{OFS=","} {$13 = tolower($13); print}' "$tmp_input6" > "$tmp_input7"
sed -i "s/\b\(.\)/\u\1/g" "$tmp_input7"
cat "$tmp_input7" >> "$tmp_input3"
Is it possible to process multiple columns in a single command?
Here is a sample of the CSV file:
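One possible single-pass approach is sketched below. It is not taken from the thread, and it rests on two assumptions: every field is wrapped in double quotes, and the three-character sequence `","` never occurs inside a field. Under those assumptions awk can split on `","` itself, so commas inside a quoted field do not shift the column numbers and the quotes survive untouched. The names `titlecase_columns` and `titlecase` are hypothetical. Unlike the `sed` step above, which rewrites the whole line, only the listed columns are changed.

```shell
# Hypothetical helper: title-case the given columns of a fully quoted CSV.
# Usage: titlecase_columns "11,12,13" input.csv > output.csv
titlecase_columns() {
    awk -v cols="$1" '
    # Lower-case s, then upper-case each letter that starts the field or
    # follows a character that is neither a letter nor a digit.
    function titlecase(s,    i, c, out, start) {
        s = tolower(s)
        out = ""
        start = 1
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c ~ /[a-z]/) {
                if (start) c = toupper(c)
                start = 0
            } else if (c ~ /[0-9]/) {
                start = 0
            } else {
                start = 1
            }
            out = out c
        }
        return out
    }
    BEGIN {
        FS = OFS = "\",\""          # assumption: fields are separated by ","
        n = split(cols, want, ",")  # list of column numbers to convert
    }
    NR == 1 { print; next }         # header line passes through unchanged
    {
        for (k = 1; k <= n; k++) {
            col = want[k] + 0
            if (col >= 1 && col <= NF) $col = titlecase($col)
        }
        print
    }' "$2"
}
```

Because the header is handled by the `NR == 1` rule, the separate `head`/`tail`/`cat` steps would not be needed. The sketch capitalizes after every non-alphanumeric character, so `BACK-UP` becomes `Back-Up` and an apostrophe also starts a new word; it does not strip decorations such as `..` or `======`.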
"dealer_id","vin","conditon","stocknumber","make","model","year","broken","trim","bodystyle","color","interiorcolor","interiorfabric","engine","enginedisplacement","engineaspiration","engineText","transmission","drivetrain","mpgcity","mpghighway","mileage","cylinders","fuelconditon","optiontext","description","titlestatus","warranty","price","specialprice","window_sticker_price","mirrorhangerprice","images","ModelCode","PackageCodes"
"JOHNVANC04A","2C4RC1N73JR290946","N","JR290946","Chrysler","Pacifica","2018","","Hybrid Limited FWD","Mini-van, Passenger","Brilliant BLACK Crystal PEARL Coat","","..LEATHER SEATS..","V6 Cylinder Engine","3.6L","","","AUTOMATIC","FWD","0","0","553","6","H","..1-SPEED A/T..,..AUTO-OFF HEADLIGHTS..,..BACK-UP CAMERA..,..COOLED DRIVER SEAT..,..CRUISE CONTROL..","======KEY FEATURES INCLUDE: . LEATHER SEATS. THIRD ROW SEAT. QUAD BUCKET SEATS. REAR AIR. HEATED DRIVER SEAT.","","0","41680","","48830","","http://i.autoupktech.com/c640/9c40231cbcfa4ef89425d108e4e3a410.jpg",http://i.autoupnktech.com/c640/9c40231cbcfa4ef89425d108e4e3a410.jpg","RUES53","AAX,AT2,DFQ,EH3,GWM,WPU"
Here is a snippet of how the columns above should look after conversion:
Column 11 should be - "Brilliant Black Crystal Pearl Coat"
Column 13 should be - "Leather Seats"
Column 16 should be - "Automatic"
Column 23 should be - "1-Speed A/T,Auto-Off Headlights,Back-up Camera"
Column 24 should be - "Key Features Include: Leather Seats,Third Row Seat"
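Keep in mind that the double quotes around the columns must not be removed. I only need to convert certain columns, not the whole file. Here is an example of columns 11, 13, 16, 23 and 24 after conversion: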
"Brilliant Black Crystal Pearl Coat","Leather Seats","Automatic","1-Speed A/T,Auto-Off Headlights,Back-up Camera","Key Features Include: Leather Seats,Third Row Seat"
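[Comments]:
-
Please post a sample input file and the expected output. I think GNU sed can do this: 1. lowercase everything; 2. then run sed 's/\W./\U&/g'
-
Hey, do you need this: "This Is A Field","This Is Another Field" or this: "This is a field","This is another field"?
-
I know this is off-topic, but Python's str.title() would make this a breeze.
-
I.e.: 'just,a,LONG,LIST of SOme,RaNdOm wORDS'.title() gives 'Just,A,Long,List Of Some,Random Words'.
-
OK... let me improve it to handle multiple words.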
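The GNU sed idea from the first comment might look roughly like the sketch below. As written in the comment, `s/\W./\U&/g` would leave a word at the very start of a line untouched, so this version also anchors on the line start. It relies on GNU sed extensions (`\L`, `\U`, `\w`, `\W`) and, like the commenter's suggestion, acts on the whole line rather than on selected columns. The function name `titlecase_line` is hypothetical.

```shell
# Hypothetical helper: title-case every word on each input line (GNU sed only).
titlecase_line() {
    # First expression:  \L& lower-cases the whole line.
    # Second expression: \U& upper-cases each word character that sits at the
    #                    line start or right after a non-word character.
    sed -E 's/.*/\L&/; s/(^|\W)\w/\U&/g'
}

printf '%s\n' '"Brilliant BLACK Crystal PEARL Coat","..LEATHER SEATS.."' | titlecase_line
```

Since every quoted field begins after a non-word character, the quotes are preserved and each field is capitalized word by word.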