【Posted】: 2013-05-22 06:57:21
【Question】:
Ubuntu 12.04 LTS
Ruby: ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]
Rails 3.2.9
Here are the contents of the CSV file I received:
"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total"
"Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80"
However, when I try to parse the CSV file I get an error:
1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
=> {:col_sep=>",", :quote_char=>"\""}
1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
from (irb):22
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'
Then I tried simplifying the data to:
"name","age","email"
"jignesh","30","jignesh@example.com"
But I still get the same error:
1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
from (irb):23
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'
I simplified the data again, this time dropping the quotes:
name,age,email
jignesh,30,jignesh@example.com
That works. See the output below:
1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
name
age
email
jignesh
30
jignesh@example.com
=> nil
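But the CSV files I will receive contain quoted data, so removing the quotes is not a solution for me. I cannot work out what causes the error CSV::MalformedCSVError: Illegal quoting in line 1. in my earlier examples.
I have verified that the CSV has no leading or trailing whitespace by enabling "Show whitespace characters" and "Show line endings" in my text editor. I also checked the encoding as follows: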
1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
=> #<Encoding:UTF-8>
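A note on the check above: String#encoding reports the encoding Ruby attached to the string when it was read (normally the default external encoding). It does not inspect the bytes, so it cannot reveal a byte order mark at the start of the file. The following is a minimal sketch of looking at the first three bytes directly. The helper name utf8_bom? and the temporary sample file are hypothetical, and String#b assumes Ruby 2.0 or later.

```ruby
require "tempfile"

# Returns true when the file at `path` starts with the three bytes of a
# UTF-8 byte order mark (EF BB BF). The file is opened in binary mode so
# that Ruby does not interpret or skip any bytes.
def utf8_bom?(path)
  File.open(path, "rb") { |f| f.read(3) } == "\xEF\xBB\xBF".b
end

# Hypothetical sample file: a BOM followed by a quoted CSV header row.
sample = Tempfile.new(["bom_sample", ".csv"])
sample.binmode
sample.write("\xEF\xBB\xBF".b + %("name","age","email"\n).b)
sample.close

puts utf8_bom?(sample.path)
```

Many text editors hide the BOM even with "show whitespace" enabled, which would be consistent with the file looking clean in the editor.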
Note: I also tried CSV.read, but it fails with the same error.
Can anyone help me get past this problem and understand what is going wrong?
======================
I just found the following post at http://www.ruby-forum.com/topic/448070 and tried this:
file_data = file.read
file_data.gsub!('"', "'")
arr_of_arrs = CSV.parse(file_data)
arr_of_arrs.each do |arr|
  Rails.logger.debug "=======#{arr}"
end
and got the following output:
=======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
=======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]
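The data is not read correctly: the default col_sep is the comma character, so the date field, which is no longer quoted, is split in two. I then tried the quote_char option like this: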
arr_of_arrs = CSV.parse(file_data, :quote_char => "'")
but ended up with the following error:
CSV::MalformedCSVError (Illegal quoting in line 1.):
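The split date field in the output above can be reproduced in isolation. This is a small sketch using only the first three fields of the data row from the question. It shows that the double quotes are what protect the embedded comma, and that swapping them for apostrophes leaves every comma acting as a separator under the default quote_char.

```ruby
require "csv"

# First three fields of the data row from the question.
line = %("Mar 1, 2013 12:03:54 AM PST","5481545091","Order")

# With the original double quotes, the comma inside the date is kept
# as part of the field.
quoted = CSV.parse_line(line)
p quoted

# After replacing the quotes with apostrophes nothing is quoted any more
# (the default quote_char is still '"'), so the date is cut at its comma
# and the apostrophes stay in the field values.
unquoted = CSV.parse_line(line.gsub('"', "'"))
p unquoted
```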
Thanks, Jignesh
【Comments】:
-
I used the sample data you provided and parsing worked fine. I did not get any CSV::MalformedCSVError: Illegal quoting in line 1 error.
-
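In the section I edited in, the output contains "\xEF\xBB\xBF'date/time'". Could that be causing the problem? I don't know what it represents. Thanks.
-
The Unicode character at the start of the file is a BOM (byte order mark). You can try sub!(/^\xEF\xBB\xBF/, '') or CSV.foreach("test.csv", encoding: "bom|utf-8")
-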
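Thanks Anand, I will try the encoding solution you suggested. Meanwhile, when I use header_converters with the temporary workaround from my edited section, e.g. arr_of_arrs = CSV.parse(file_data, { col_sep: ";", headers: true, header_converters: [ :symbol ] }), I get the following error: Encoding::UndefinedConversionError ("\xEF" from ASCII-8BIT to UTF-8). It mentions ASCII-8BIT as the encoding. Why does that encoding matter, and how did those BOM characters get in there? Errors like this should be reported clearly in the exception the library raises, rather than being stumbled upon in to_s output.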
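-
The following link, joelonsoftware.com/articles/Unicode.html, will help you understand why encoding matters. As for how the BOM characters got in, you need to check the source of the CSV file you received and how it was saved.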
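The two suggestions from the comments can be sketched as follows, assuming the file really does begin with a UTF-8 BOM as the \xEF\xBB\xBF in the debug output suggests. The sample file and its contents are hypothetical. For data already in memory, this sketch relabels the bytes as UTF-8 and uses /\A\uFEFF/ in place of the byte-escape pattern from the comment. ASCII-8BIT is Ruby's name for raw binary data, which is one plausible source of the Encoding::UndefinedConversionError mentioned above.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample: a UTF-8 BOM followed by fully quoted CSV rows.
bom  = "\xEF\xBB\xBF".b
body = %("name","age","email"\n"jignesh","30","jignesh@example.com"\n).b

sample = Tempfile.new(["bom_sample", ".csv"])
sample.binmode
sample.write(bom + body)
sample.close

# Option 1: let Ruby consume a leading BOM, if any, while opening the file.
rows = CSV.read(sample.path, encoding: "bom|utf-8")

# Option 2: the data is already in memory as binary (ASCII-8BIT) bytes.
# force_encoding only relabels the bytes as UTF-8, it does not transcode;
# the sub call then drops a leading BOM character (U+FEFF) if present.
raw   = File.binread(sample.path)
text  = raw.force_encoding(Encoding::UTF_8).sub(/\A\uFEFF/, "")
table = CSV.parse(text, headers: true, header_converters: :symbol)

p rows.first
p table.headers
```

Option 1 is usually the simpler choice when the CSV is read from a path. Option 2 fits cases such as an uploaded file whose contents arrive as a binary string.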