Category: Ruby

  • Faster PDFs with wicked_pdf and delayed_job (part 3)

    In part 2 we coded our PDF generator as a background job. But the PDF is still being stored on the local file system. Let’s store it in S3 instead and give our users a URL so they can download it.

    First let’s add the AWS SDK gem to our Gemfile:

    gem "aws-sdk"
    

    Let’s define environment variables for our AWS credentials:

    AWS_ACCESS_KEY_ID=abc
    AWS_SECRET_ACCESS_KEY=secret
    
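    Version 1 of the aws-sdk gem will pick these variables up from the environment automatically, but it doesn't hurt to configure it explicitly in an initializer. A sketch (the file name is my own choice):

```ruby
# config/initializers/aws.rb
# Explicitly hand the v1 aws-sdk gem the credentials defined
# in the environment variables above.
AWS.config(
  access_key_id:     ENV["AWS_ACCESS_KEY_ID"],
  secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"]
)
```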

    Next we’ll modify our background job to connect to S3 and upload our PDF file instead of saving it to the local file system:

    class PdfJob < ActiveJob::Base
      def perform(html)
        pdf = WickedPdf.new.pdf_from_string(html)
        s3 = AWS::S3.new
        bucket = s3.buckets['my-bucket'] # replace with your bucket name
        bucket.objects['output.pdf'].write(pdf)
      end
    end
    

    Nice! But how do we enable our users to download the file? S3 has several options for this. One option would be to make the bucket publicly accessible. The downside to this approach is that it would allow anyone to download any PDFs stored in the bucket, regardless of who originally uploaded them. Depending on what kind of data is being included in the PDFs, this could be a bad idea.

    A better option is to generate a temporary URL. This URL can be given to a user so they can download the file, but the URL is only usable for the period of time we specify. This reduces the likelihood that the PDF will be exposed publicly. Here’s how it’s done:

    class PdfJob < ActiveJob::Base
      def perform(html)
        # ...
        obj = bucket.objects['output.pdf'].write(pdf)
        url = obj.url_for(:get, expires: 3.minutes.from_now).to_s
      end
    end
    

    Looks good. But how do we get this URL back to the user? The background job is asynchronous so it’s not like we can generate the PDF and return the string to the user all in the same HTTP request.

    A simple approach is to write the URL back into the database. Let’s pass the user to the job as a new parameter and update the user record with the URL (this assumes a pdf_url column exists on the users table):

    class PdfJob < ActiveJob::Base
      def perform(html, user)
        # ...
        url = obj.url_for(:get, expires: 3.minutes.from_now).to_s
        user.update_attribute(:pdf_url, url)
      end
    end
    

    Now that the URL is available in the database, we can display it on the user’s profile page.
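
    For example, the profile view might render the link only once the job has written it. A sketch, assuming a @user instance and the pdf_url column from above:

```erb
<% if @user.pdf_url.present? %>
  <%= link_to "Download your PDF", @user.pdf_url %>
<% else %>
  <p>Your PDF is still being generated. Check back in a moment.</p>
<% end %>
```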

    If we want to get even fancier we can write some JavaScript that’s executed immediately after the user requests a PDF. This script would periodically poll an Ajax endpoint in our app to determine if the URL has been written to the users table yet. When it detects the URL, it would redirect the user to the URL. This would make the PDF generation process seamless from the user’s perspective.

    An example in jQuery might look something like this:

    function poll() {
      $.get("http://www.our-app.com/users/123/pdf_url", function(data) {
        if (data.length > 0) {
          window.location = data;
        } else {
          setTimeout(poll, 2000);
        }
      });
    }
    

    Our controller action might look like this:

    class UsersController < ApplicationController
      def pdf_url
        user = User.find(params[:id])
        render text: user.pdf_url
      end
    end
    
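    This action also needs a route. Assuming the conventional Rails router, something like this would produce the /users/:id/pdf_url path the JavaScript polls:

```ruby
# config/routes.rb
resources :users, only: [] do
  member do
    get :pdf_url   # GET /users/:id/pdf_url => UsersController#pdf_url
  end
end
```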

    And there you have it. I hope this gave you a good idea of just how easy it can be to generate PDFs in a background job. If your site isn’t getting much traffic, it’s probably not worth going this route. But if it’s a popular site (or you expect it to be one day) it would be well worth investing the time to background this process. It’ll go a long way towards keeping your HTTP response times short, and your app will feel much snappier as a result.

  • Faster PDFs with wicked_pdf and delayed_job (part 2)

    In part 1 we learned why backgrounding is important. Now let’s dive into some code.

    First things first. Add wicked_pdf and delayed_job to your Gemfile:

    gem "wicked_pdf"
    gem "delayed_job"
    
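    One wiring detail: the job classes in this series inherit from ActiveJob::Base, so ActiveJob needs to be told to use delayed_job as its queue backend, and delayed_job needs its database table. A sketch, assuming Rails 4.2+ with the delayed_job_active_record gem (the OurApp module name is a placeholder):

```ruby
# config/application.rb
module OurApp # placeholder for your application's module
  class Application < Rails::Application
    # Route ActiveJob's perform_later calls through delayed_job
    config.active_job.queue_adapter = :delayed_job
  end
end

# Then create the delayed_jobs table:
#   rails generate delayed_job:active_record
#   rake db:migrate
```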

    Now we can generate a PDF from inside our Rails app with this simple command:

    html = "<strong>Hello world!</strong>"
    pdf = WickedPdf.new.pdf_from_string(html)
    IO.write("output.pdf", pdf)
    

    You’ll notice that the more complex the HTML, the longer it takes wicked_pdf to run. That’s exactly why it’s important to run this process as a background job instead of in a web server process. A complex PDF with embedded images can take several seconds to render. That translates into several seconds of unavailability for the web process handling that particular request.

    Let’s move this code into a background job:

    class PdfJob < ActiveJob::Base
      def perform
        html = "<strong>Hello world!</strong>"
        pdf = WickedPdf.new.pdf_from_string(html)
        IO.write("output.pdf", pdf)
      end
    end
    

    Now we can queue the background job from a Rails controller like this:

    class PdfController < ApplicationController
      def generate_pdf
        PdfJob.perform_later
      end
    end
    

    The only problem is, our job isn’t doing anything particularly interesting yet. The HTML is statically defined and we’re writing out to the same file each time the job runs. Let’s make this more dynamic.

    First, let’s consider the HTML we want to generate. In a Rails app, the controller is generally responsible for rendering HTML from a given ERB template using a specific layout. There are ways to render ERB templates outside controllers, but they tend to be messy and unwieldy. In this situation, it’s perfectly reasonable to render the HTML in the controller and pass it along when we queue a job:

    class PdfController < ApplicationController
      def generate_pdf
        html = render_to_string template: "my_pdf"
        PdfJob.perform_later(html)
      end
    end
    

    This assumes an ERB template named “my_pdf.erb” exists and contains the HTML we want to convert into a PDF. Our method definition within our background job then becomes:

    class PdfJob < ActiveJob::Base
      def perform(html)
        pdf = WickedPdf.new.pdf_from_string(html)
        IO.write("output.pdf", pdf)
      end
    end
    

    Because the job runs asynchronously, the HTML has to be stored somewhere in the meantime. delayed_job handles this for us by persisting the job’s arguments, including the HTML, in a database table so the job can retrieve them when it gets executed.

    So far, so good. The job will generate a PDF based on the HTML rendered in the controller. But how do we return this PDF back to the user when it’s ready? It turns out there are a variety of ways to do this. Saving the PDF to the file system in a publicly accessible folder is always an option. But why consume precious storage space on our own server when we can just upload to Amazon S3 instead for a few fractions of a cent?

    What’s nice about S3 is that it can be configured to automatically delete PDFs within a bucket after 24 hours. Furthermore, we can generate a temporary URL to allow a user to download a PDF directly from the S3 bucket. This temporary URL expires after a given period of time, greatly reducing the chance that a third party might access sensitive information.

    Next week I’ll demonstrate how to integrate S3 into our background job using the AWS SDK.

  • Faster PDFs with wicked_pdf and delayed_job (part 1)

    What do you get when you combine the slick PDF generation capabilities of wicked_pdf with the elegance and efficiency of delayed_job? A high performance way to convert HTML pages into beautiful PDF documents.

    I’ve been leveraging wicked_pdf to generate high school transcripts from my SaaS app, Teascript, since 2009. Prior to that I had been using Prawn which ultimately proved to lack the flexibility I needed to produce beautiful PDFs.

    wicked_pdf converts HTML pages into PDF documents by shelling out to wkhtmltopdf, a command line tool built on WebKit, the engine behind Apple’s Safari browser (among others). For the past few years, Teascript produced PDFs without any kind of backgrounding in place. This meant that if someone’s PDF took an unusually long time to generate, it tied up a web server process for that entire duration.

    If multiple users generated PDFs simultaneously, it might prevent other visitors from accessing the site. Not good. Furthermore, if the PDF generation process exceeded the web server’s default timeout, the user might not ever get the PDF, just an error page.

    Any time your web app integrates with a third party API or shells out to a system process, that work is a viable candidate for backgrounding. delayed_job to the rescue. By offloading the long-running processes onto background workers, we free our web server to do what it’s best at: serving static HTML and images.

    Backgrounding isn’t a silver bullet, though. It introduces added complexity into the app, making it more vulnerable to failures. This requires writing additional code to handle these failure scenarios gracefully. But at the cost of this added complexity, we can ensure our web server stays fast and lean while our users still get the pretty PDF they want.

    Next week we’ll dive into some actual code. I’ll demonstrate how to integrate wicked_pdf with delayed_job and hook the entire thing up to your Rails app. Don’t touch that remote.

  • Slides from my API talk

    Thanks to everyone who turned out for my API talk at the Triangle Ruby Brigade. I wasn’t expecting such a large crowd and the resulting Q&A was really good. It was interesting hearing how other developers are using APIs in their projects, and what problems they are encountering and solving. I’ve posted my slide deck for those who are interested. I also recorded audio from the talk and will be posting a link here when that’s online.

  • Building an external HTTP-based API in Rails

    If you’ll be in or near Raleigh the evening of March 12th, consider dropping by the Triangle Ruby Brigade. I’ll be presenting on how to build HTTP-based APIs in Rails, including:

    • Creating an API controller
    • Wiring up versioned routes for your API
    • Protecting your API with authentication
    • Choosing a transport encoding

    The question of which transport encoding to use is critical. If your API will be consumed by iOS devices, choosing binary property lists over XML or JSON can give you a 30% performance boost as well as an associated reduction in bandwidth consumption. Building an API that generates plists is straightforward with the help of a couple of Ruby gems.

    I’ll be sharing code examples from a recent project that surfaced a large, multi-faceted API to hundreds of iOS devices using binary plists. I’ll also have plenty of resources for those interested in learning more. It’s sure to be a great time! Hope to see you there.

  • RubyConf 2012 recap (part 2)

    Continuing from part 1 of the recap, here are the remaining six talks I attended during RubyConf in Denver:

    • Y Not – Adventures in Functional Programming by Jim Weirich
      Jim’s presentations never disappoint and this was no exception. Similar to his prior talk where he built Git from scratch, except this time he built the Y-combinator using nothing but stabby procs. Mind blowing.

    • Ruby vs. the World by Matt Aimonetti
      A fascinating look at how Ruby stacks up against three other languages: Go, Clojure, and Scala. Matt included plenty of code examples and shared his thoughts about the pros and cons of each language.

    • Real Time Salami by Aaron Patterson
      Building real-time monitoring systems in Ruby while enjoying delicious salami. What better combination could there be? Aaron even brought samples for everyone. I won’t call it bribery… [slides]

    • Inside RubyMotion by Rich Kilmer
      One of the most crowded talks of the conf, Rich demonstrated how to build iOS applications in pure Ruby. Impressive is an understatement. Does this project offer sweet escape from the dungeons of Objective-C? You be the judge.

    • The Insufficiency of Good Design by Sarah Mei
      A practical exploration of team dynamics, communication, code quality, and problem solving. [slides]

    • Simulating the World with Ruby by Bryan Liles
      The real world has millions of “objects” interacting in seemingly random ways. How would we go about modeling this in Ruby? Bryan demonstrated how and threw in a healthy dose of statistics for good measure. [slides]

    Attending RubyConf this year made me regret skipping last year. I’m looking forward to visiting Miami Beach for RubyConf 2013.

    If you’re interested in picking up new tricks and techniques for your own programming, or are just looking for a healthy dose of motivation, you should consider attending as well. My advice is to act fast once tickets are announced. They tend to sell out very quickly.

  • RubyConf 2012 recap

    After being unable to attend RubyConf last year, I was thrilled when I heard that this year’s conference would be held in Denver. Having lived in Boulder for several years, I’ve learned to love Colorado, the scenery, and the people. So it was almost a given that I would be attending.

    RubyConf 2012 was one of the most useful Ruby conferences I’ve been to. The variety and quality of the talks and the venue combined to create a memorable experience. My reading list is slam full of interesting things I picked up at the conf and want to keep learning about on my own. Another reason this was a great conf: the swag. I left the conf with no less than 8 T-shirts, all of which I’m reasonably sure I will actually wear (sometimes free shirts are rendered unwearable by being poorly made or just plain ugly). Some attendees even scored 9 or 10 shirts.

    But enough about shirts. Let’s go over some of the best talks I attended. (Which, by the way, will be posted online by Confreaks shortly, if they aren’t there already.)

    • My Name is MagLev by Jesse Cooke
      A Ruby implementation sitting on a Smalltalk VM, sporting a baked-in ORM that transparently persists your Ruby objects to the database. No more ActiveRecord wrangling!

    • Implementation Details of Ruby 2.0 VM by Koichi Sasada
      The 20th anniversary edition of Ruby was previewed at RubyConf and boy howdy does it have some nice features. Besides better method dispatch performance, the ability to prepend a module is very handy. The target release month for Ruby 2.0 is February 2013.

    • Ruby Monitoring State of the Union by Joseph Ruscio
      Joseph surveyed various options for monitoring your Ruby programs: New Relic, statsd-ruby, Librato, and various monolithic open source software packages.

    • Zero Downtime Deploys Made Easy by Matt Duncan
      The title was misleading since Matt opened by saying there is no silver bullet. But he did share some interesting tricks to avoid locking database tables during long migrations, and also outlined a way to migrate between API versions.

    • Refactoring from Good to Great by Ben Orenstein
      One of my favorite talks of the conf, Ben gave several examples of smelly code and then proceeded to live code his way through various refactorings. Highly recommended.

    Tomorrow I’ll recap the remaining six talks, including Jim Weirich’s keynote which involved stabby procs and succeeded in completely blowing my synapses. Stay tuned.

  • Ruby Hoedown 2012 recap

    This year’s Ruby Hoedown was held at the Scarritt Bennett Center in Nashville. As usual, Jeremy McAnally organized a top-notch, free, two-day regional Ruby conference that was a pleasure to attend. A lot of work goes into organizing this type of thing. I doff my proverbial hat to Jeremy for making the Hoedown a reality for six years straight.

    I wasn’t able to attend the Hoedown last year so this was my first experience at Scarritt Bennett. The gothic architecture was quite beautiful and made for some lovely ad-hoc photos taken with my iPhone. The presentation room was comfortable enough and just the right size for the nearly 250 developers who attended. Power was in short supply the first day, but the problem was quickly rectified (EE pun) by the appearance of a plethora of extension cords and power strips. By the end of the day there was enough power for everyone who wanted it. Wi-fi remained stable throughout the conference.

    Brad Winfrey gave the first talk of the day, titled “gem install erlang” [slides]. I’m not a functional language guy, but Brad’s talk made me want to look at Erlang again. What was most impressive to me was his demonstration of Erlang’s built-in pattern matching. I can see how someone could get addicted to that kind of power.

    Next was Phil Harvey with “REST & Hypermedia” [slides]. If you’ve ever wanted to change an API without breaking things for your existing users, Phil’s talk gave a solution in hypermedia. He demonstrated various ways to link together resources using calls that return link relations. The server essentially builds URLs dynamically for the user. He also made the point that if you aren’t using hypermedia, you aren’t really using REST.

    GitHub was well represented at the conference. Brandon Keepers, one of their developers, presented on “Why Our Code Smells” [slides]. I always appreciate suggestions on how to make my code better and Brandon did not disappoint. The biggest idea I took away from his talk was to strive for clean separation between the ORM and business models. In other words, reduce coupling to the framework (e.g. ActiveRecord).

    Jeffrey Baird gave a talk titled “Growing Your Own Developers: Hiring Programmers with Little to No Experience” [slides]. I really appreciated this talk since a big catalyst to pursuing programming as a career was an 8-month apprenticeship at RoleModel Software during my sophomore year of college. Jeffrey made the point that computer science majors are not predicted to meet labor demands through 2016. One way companies can find the talent they need is to hire motivated, passionate beginners and give them the tools and training they need to grow into experts.

    To conclude day 1, Dave Worth presented “Static Analysis in Ruby Applications with Brakeman” [slides] which I unfortunately missed.

    Jeremy Holland kicked off day 2 with “Using System V Shared Memory in Ruby Projects” [slides], a highly technical but very enjoyable mini-tutorial on how to use and manage shared memory with C. The problem he was trying to solve was to quickly search a massive binary tree. Ruby has no concept of shared memory, requiring C to be brought into the equation. It was nice hearing about another tool on the programmer tool-belt that can be used to solve problems like this.

    Will Farrington introduced us to “The Setup” [slides] which is GitHub’s answer to the problem of managing an army of developer laptops. The Setup uses the CLI and Puppet and has been in development for 6 months. It enables a developer to script a configuration for his laptop (e.g. Apache, Ruby, RVM, a text editor, custom Bash aliases, etc) and have that configuration automatically installed on a new MacBook.

    “Adhearsion: Telephony Through Ruby-colored Lenses” [slides] by Ben Klang was another presentation I unfortunately had to miss. But I’m sure he knocked it out of the park, to borrow the colloquialism.

    Lance Ball presented “Sleep Better with TorqueBox” [slides], an introduction to the Java-based JBoss 7 application server. He quickly pointed out that you don’t need to know Java to use it, and that no instrumentation is required for Rails apps. In fact, the server supports any Rack-based application and provides scheduled jobs, robust background processing, long-running daemons, caching, messaging, web sockets, and clustering.

    Lightning talks have been an important part of the conference each year and 2012 was no exception. Talks were given by Will Farrington, Brandon Valentine, Ernie Miller, Cameron Dukes, Yossef Mendelssohn, Frank Rietta, Chad Taylor, Loren Norman, Edward Anderson, Winston Hearn, and Jeremy McAnally.

    Anthony Eden gave the keynote which wrapped up day 2 and the conference itself. His presentation was a nice mix of nerdy technical content (Lisp, Clojure, Erlang) along with some plain old motivational talk. He encouraged us to keep building things, to expand our toolbox by learning new programming languages, to share our experiences with others, and to never stop having fun.

    This has been my first Ruby conference in over a year. It was great to reconnect with the community, make some new friends, and learn about various software projects people are working on. I left Nashville feeling recharged and ready to put into practice what I had learned.

    You should consider attending the Hoedown in 2013. Maybe I’ll see you there.

  • Learn about A/B testing at raleigh.rb

    I’ll be presenting on A/B testing at tonight’s raleigh.rb meetup. As developers, we use tools like RSpec and Cucumber to verify that our application is functional, but how can we verify that the layout of our home page is user-friendly? How can we determine the ideal size for our signup button? How can we maximize throughput to our signup form? A/B testing is an easy and compelling way to increase the effectiveness of our web applications. Join us tonight to learn how to leverage A/B testing in Ruby using several popular tools.

  • Drop seconds when querying MySQL timestamps

    One of the Rails apps I’ve been working on formats times as HH:MM in the view. No seconds are displayed. This is a pretty common way to format things. When doing timestamp comparisons in SQL, however, the seconds are taken into account. This is bad since it can cause discrepancies in the view.

    For example, say I have a table of records with created_at timestamps. My view displays all records with timestamps equal to or before the current time. Let’s assume the current time is 15:00:00 precisely and I happen to have a record with a timestamp of 15:00:00 in the database. The SQL comparison would work fine in this case.

    SELECT * FROM records WHERE created_at <= "2010-06-25 15:00:00"
    => 1 row in set
    

    What if the timestamp in the database is 15:00:03 though? Let’s run the query again.

    SELECT * FROM records WHERE created_at <= "2010-06-25 15:00:00"
    => Empty set
    

    Since 15:00:03 is greater than the current time of 15:00:00, the record doesn’t get returned. This would be fine if we were displaying seconds in the view, but we’re not. From the user’s perspective, the timestamp on the record is still 15:00 and should appear in the view since it’s equal to the current time. But it doesn’t.

    One way to fix this would be to handle the time comparisons in Ruby. This is certainly a legitimate option. For this particular project, though, performance was a big issue. (And we all know that Rails can’t scale.) I needed a way to continue letting the database handle the comparisons while disregarding seconds.
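
    For comparison, dropping the seconds in Ruby is simple enough. A sketch in plain Ruby (in a Rails app you could also use ActiveSupport’s created_at.change(sec: 0)):

```ruby
# Truncate a Time to the start of its minute by subtracting its seconds.
def truncate_seconds(time)
  Time.at(time.to_i - time.sec)
end

created_at   = Time.local(2010, 6, 25, 15, 0, 3)
current_time = Time.local(2010, 6, 25, 15, 0, 0)

created_at <= current_time                    # => false (3 seconds "late")
truncate_seconds(created_at) <= current_time  # => true  (seconds dropped)
```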

    The solution I ended up with isn’t ideal (it relies on a couple of functions built into MySQL) but it works fine and runs fast:

    SELECT * FROM records WHERE (created_at - INTERVAL SECOND(created_at) SECOND) <= "2010-06-25 15:00:00"
    => 1 row in set
    

    The number of seconds is extracted from the created_at timestamp and then subtracted from the timestamp. So if the timestamp was 15:00:03, MySQL subtracts 3 seconds to end up with 15:00:00.

    This solved the comparison problem for me and made my client very happy. Double win.