Ruby on Rails Introduction: The Hard Way
May 8, 2012 - 22:10
I was presented with a problem late last week: a website that was displaying database records on individual pages. This information was important for a client who didn't have backend access to the site. Having seen The Social Network a couple too many times, I knew it was possible to scrape the records straight off the pages. This would require new knowledge, so I set out on my quest.
I landed on Ruby as the language for my solution. I had already played with the basics (pretty much just helloworld.rb). I found a slightly dated but still great resource on scraping websites that had me running test scripts in minutes. At the guide's suggestion I installed the Nokogiri gem to handle page parsing.
There wasn't much of a hard part to the process at all: once I figured out where the information I needed sat within the DOM, I built my XPath queries and ran the script on a single record. The hardest part, honestly, was remembering the syntax for output. Like I said, I have almost no Ruby experience.
I ended up with something like this:
raw_data = Nokogiri::HTML(open('http://www.example_website.com/?record=1'))
first_name = raw_data.xpath('//td/div/span')[1].content
last_name = raw_data.xpath('//td/div/span[2]')[1].content
#etc...
puts "First Name: "+first_name
puts "Last Name: "+last_name
#etc...
The next step was introducing a variable to the URL.
active = 1
active_record = 'http://www.example_website.com/?record='+active.to_s()
The hangup here came from not converting active to a string first: you can't concatenate a String with an Integer using +. Ruby raises a TypeError instead of converting for you, which is why the to_s call is there.
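To check the concatenation point for yourself, here is a small sketch that triggers the error on purpose and then shows two ways around it. The variable names are just for illustration.

```ruby
active = 1
base = 'http://www.example_website.com/?record='

# String#+ does not convert an Integer for you; it raises TypeError.
error = begin
  base + active
  nil
rescue TypeError => e
  e
end

# Option 1: convert explicitly, as in the script.
with_to_s = base + active.to_s

# Option 2: string interpolation, which calls to_s on its own.
with_interpolation = "#{base}#{active}"

puts error.class
puts with_to_s
puts with_interpolation
```

Both options build the same URL, so which one to use is mostly a matter of taste.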
Now that there's a variable involved, we can wrap the whole thing in a for loop.
for i in 1..100 #whatever the endpoint is
  active = i
  active_record = "http://www.example_website.com/?record="+active.to_s()
  raw_data = Nokogiri::HTML(open(active_record))
  first_name = raw_data.xpath('//td/div/span')[1].content
  last_name = raw_data.xpath('//td/div/span[2]')[1].content
  #etc...
  puts i.to_s()+" of 2218....."
  puts "First Name: "+first_name
  puts "Last Name: "+last_name
  #etc...
end
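The same loop can also be written with a range and each-style iteration, which is the more common Ruby idiom. Below is a rough sketch that leaves out the network call so it runs anywhere. The record_url lambda and the 1..3 range are placeholders of mine, and the progress message takes its total from the range so the two numbers can't drift apart.

```ruby
base_url = 'http://www.example_website.com/?record='

# Hypothetical helper: builds the page URL for one record id.
record_url = ->(id) { "#{base_url}#{id}" }

# Assumed endpoint; swap in the real last record number.
record_ids = 1..3

progress_lines = record_ids.map do |id|
  # The fetch-and-parse step would go here, using record_url.call(id).
  "#{id} of #{record_ids.last}..... #{record_url.call(id)}"
end

puts progress_lines
```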
What's the point of running this scraper if we can't store the data? I found a pretty cool gem called FasterCSV that lets Ruby export to CSV. Open the .csv before the loop starts, then append a row to it on each iteration. Easy as pie.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'fastercsv'
FasterCSV.open("temp.csv","w") do |csv|
  for i in 1..100 #or whatever
    active = i
    active_record = "http://www.example_website.com/?record="+active.to_s()
    raw_data = Nokogiri::HTML(open(active_record))
    first_name = raw_data.xpath('//td/div/span')[1].content
    last_name = raw_data.xpath('//td/div/span[2]')[1].content
    #etc...
    puts i.to_s()+" of 2218....."
    puts "First Name: "+first_name
    puts "Last Name: "+last_name
    #etc...
    csv << [first_name, last_name] #etc...
  end
end
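On Ruby 1.9 and later, FasterCSV's code became the standard library's CSV class, so the same row-by-row pattern can be tried without installing the gem. Here is a minimal sketch with made-up rows in place of scraped data. The header row is my own addition, and CSV.generate builds a string instead of a file so nothing is written to disk.

```ruby
require 'csv'

# Hypothetical rows standing in for the scraped first and last names.
rows = [
  ['Ada', 'Lovelace'],
  ['Grace', 'Hopper']
]

csv_text = CSV.generate do |csv|
  csv << ['first_name', 'last_name'] # header row, not in the original script
  rows.each { |row| csv << row }     # one row appended per record
end

puts csv_text
```

To write to a file instead, CSV.open("temp.csv", "w") takes the same kind of block.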
And that's how I got my data. All in all, a pretty quick foray into some new (to me) functions of Ruby. As usual, by the time I got around to doing this the client had already changed their mind on the information.