Quick ETL
Wednesday, April 5th, 2006 | Nick Bugajski
If you have a large number of log files to process, you might consider writing a parsing script in Ruby.
A lot of server logs arrive as gzipped CSV files. Rather than unzipping and parsing files one at a time, or configuring an ETL tool to do it for you, it may be easier and faster to write a script that does everything you need in one pass. Consider the following Ruby snippet:
require 'zlib'
require 'csv'

# Stream rows straight from the gzipped file; nothing is unzipped to disk.
Zlib::GzipReader.open('data.csv.gz') do |gz|
  CSV.new(gz).each do |row|
    # do something interesting!
  end
end
Two lines of code and it is already time to add your business logic!
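As an illustration of what that business logic might look like, here is a minimal sketch that tallies rows by the value of one column. The helper name `count_by_column` and the `status` column are hypothetical, and the sketch assumes the file starts with a header row; adapt both to whatever your logs actually contain.

```ruby
require 'zlib'
require 'csv'

# Hypothetical helper: count rows per distinct value of one column
# in a gzipped CSV file. Assumes the first row is a header row and
# that `column` names one of those headers.
def count_by_column(path, column)
  counts = Hash.new(0)
  Zlib::GzipReader.open(path) do |gz|
    # CSV.new accepts an IO-like object, so rows are parsed as the
    # stream is decompressed instead of loading the file up front.
    CSV.new(gz, headers: true).each do |row|
      counts[row[column]] += 1
    end
  end
  counts
end

# Example call (file and column names are assumptions):
# count_by_column('data.csv.gz', 'status')
```

Returning a plain hash keeps the script easy to extend: the same loop could just as well write rows to a database or filter them into another file.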
Given that most ETL tools are complicated, expensive, or both, writing a simple script is often the path of least resistance, especially when you just need to load the data for some analysis rather than set up an ongoing processing system.