On a recent project, I had to implement a CSV parser that would gracefully handle malformed files. I’m talking about files with unescaped quotes, wacky UTF-8 chars, and various other abominations of nature.
I originally assumed FasterCSV would handle this automagically, but it turns out that the library’s most commonly used methods are pretty strict when it comes to handling CSV files.
For example, parsing a malformed file one line at a time will raise an exception before any rows are yielded to the block:
FasterCSV.foreach("malformed.csv") do |row|
  # use row here...
end
Not cool! I managed to get around this by manually looping over each row and rescuing FasterCSV::MalformedCSVError if it gets raised:
FasterCSV.open("malformed.csv", "rb") do |output|
  loop do
    begin
      break unless row = output.shift
      # use row here...
    rescue FasterCSV::MalformedCSVError => e
      # handle malformed row here...
    end
  end
end
Anyone have a better way to do this?
Matthew, I just wanted to thank you for your code bit. I ran into the same problem with FasterCSV today and this really helped.
Thanks for the idea. The foreach syntax seems so convenient until you actually need to deal with errors. Wish there was a simpler way!
Thanks for the solution. It's a cool way to get around it, but imo FasterCSV is just not cool.
But how do you actually capture the offending row so you can at least log it? You'd think something like output.gets would do it, but I can't figure it out. Not a big fan of FasterCSV at this point.
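One way to get at the raw text (a sketch, not anything FasterCSV documents for this): read the lines yourself and hand each one to parse_line, so the original string is still in hand when the parser chokes. This assumes Ruby 1.9+, where FasterCSV was merged into the standard library as CSV; the sample data is made up.

```ruby
require "csv"

# FasterCSV became Ruby 1.9's stdlib CSV, so CSV stands in for it here;
# the exception class carries over as CSV::MalformedCSVError.
FasterCSV = CSV unless defined?(FasterCSV)

# Hypothetical input: the second line has an unescaped quote.
lines = [
  "a,b,c\n",
  "\"bad \"quote\",x,y\n",
  "d,e,f\n"
]

good_rows = []
bad_lines = []

# Reading raw lines ourselves (here from an Array; File#each_line works
# the same way) means the offending text is available when parsing fails.
# Caveat: this one-line-at-a-time approach breaks on quoted fields that
# contain embedded newlines.
lines.each do |line|
  begin
    good_rows << FasterCSV.parse_line(line)
  rescue FasterCSV::MalformedCSVError
    bad_lines << line # log or inspect the raw malformed row here
  end
end
```

The trade-off versus the shift-based loop above is that you give up FasterCSV's handling of multi-line quoted fields in exchange for access to the raw text of each bad row.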
That really does look nasty. I’m wondering if it would be better to use #parse inside of a begin/rescue block and then just work with each row as an Array outside of FasterCSV.
Eric, what would that look like?
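For what it's worth, a minimal sketch of what Eric describes might look like this (again assuming Ruby 1.9+'s CSV as a stand-in for FasterCSV; the data string is hypothetical):

```ruby
require "csv"

# FasterCSV became Ruby 1.9's stdlib CSV, so CSV stands in for it here.
FasterCSV = CSV unless defined?(FasterCSV)

data = "a,b,c\nd,e,f\n" # hypothetical input

begin
  # Parse the whole document up front; rows come back as plain Arrays.
  rows = FasterCSV.parse(data)
rescue FasterCSV::MalformedCSVError => e
  # The catch: one bad row aborts the entire parse, so every row is
  # lost, not just the malformed one.
  rows = []
end

# From here on we work with ordinary Arrays, outside of FasterCSV.
rows.each do |row|
  # use row here...
end
```

Note that this only moves the begin/rescue outside the loop; it doesn't recover the good rows from a file that contains any malformed ones.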