Thursday, April 22, 2010

Reading huge csv or xls file and importing the data into database tables.

To implement this we need to gems fastercsv (gem install fastercsv) and ar-extensions ( gem install ar-extensions)

FasterCSV is intended as a replacement to Ruby‘s standard CSV library. It was designed to address concerns users of that library had and it has three primary goals:
  1. Be significantly faster than CSV while remaining a pure Ruby library.
  2. Use a smaller and easier to maintain code base. (FasterCSV is larger now, but considerably richer in features. The parsing core remains quite small.)
  3. Improve on the CSV interface. 
Advantages:

  • FasterCSV has a stricter parser and will throw MalformedCSVErrors on problematic data.
  • FasterCSV has a less liberal idea of a line ending than CSV. What you set as the :row_sep is law.
  • CSV returns empty lines as [nil]. FasterCSV calls them [].
  • FasterCSV has a much faster parser. 

ar-extensions

ActiveRecord::Extensions provides extensions to:
  • mass import data
  • convert ActiveRecord models and arrays to CSV format
  • work with temporary tables
  • control foreign keys
  • to include extensible features for better and custom finder support
Sample code to implement :
require 'rubygems'
require 'fastercsv'
require 'ar-extensions'
table_columns = [:attribute1,:attribute2, :attribute3, :attribute4, :attribute5 ]
table_data = Array.new


FasterCSV.foreach("#{RAILS_ROOT}/public/data/yourcsvfile.csv") do |row|
  row.delete_at(14)   #this can be used incase you want to delete any unnecessary column from any row of csv data

  table_data << row
end
options = { :validate => false }
CareHomeHq.import table_columns,table_data

Here insert of data will also be faster because we are not creating a new instance of CareHomeHq everytime a new row is fetched. Instead all the rows are saved in an array first and then we use ar-extensions to import the data.


PARSING FROM TEMP FILE
In case you want to parse from temp file then all you need to do is
  @parsed_file = FasterCSV.parse(params[:data_import][:file].read).each do |row|   
       puts row.inspect
      #your code here     
   end

No comments:

Post a Comment