Category: Rails

  • Faster PDFs with wicked_pdf and delayed_job (part 3)

    In part 2 we coded our PDF generator as a background job. But the PDF is still being stored on the local file system. Let’s store it in S3 instead and give our users a URL so they can download it.

    First let’s add the AWS SDK gem to our Gemfile:

    gem "aws-sdk", "~> 1" # the AWS::S3 classes used below come from version 1 of the SDK
    

    Let’s define environment variables for our AWS credentials:

    AWS_ACCESS_KEY_ID=abc
    AWS_SECRET_ACCESS_KEY=secret
    

    Next we’ll modify our background job to connect to S3 and upload our PDF file instead of saving it to the local file system:

    class PdfJob < ActiveJob::Base
      def perform(html)
        pdf = WickedPdf.new.pdf_from_string(html)
        s3 = AWS::S3.new
        bucket = s3.buckets['my-bucket'] # replace with your bucket name
        bucket.objects['output.pdf'].write(pdf)
      end
    end
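
    One caveat with the job above: every run writes to the same 'output.pdf' key, so two users generating PDFs at about the same time would overwrite each other's files. Here's a minimal sketch of one way around that, assuming a hypothetical helper named unique_pdf_key (it is not part of the AWS SDK) that builds a fresh key for each job:

```ruby
require "securerandom"

# Hypothetical helper: builds a unique S3 object key for each generated PDF.
# The "pdfs/" prefix and the date folder are illustrative choices only.
def unique_pdf_key(now = Time.now.utc)
  "pdfs/#{now.strftime('%Y-%m-%d')}/#{SecureRandom.uuid}.pdf"
end

puts unique_pdf_key
```

    The job could then call bucket.objects[unique_pdf_key].write(pdf) instead of hard-coding the key.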
    

    Nice! But how do we enable our users to download the file? S3 has several options for this. One option would be to make the bucket publicly accessible. The downside to this approach is that it would allow anyone to download any PDFs stored in the bucket, regardless of who originally uploaded them. Depending on what kind of data is being included in the PDFs, this could be a bad idea.

    A better option is to generate a temporary URL. This URL can be given to a user so they can download the file, but the URL is only usable for the period of time we specify. This reduces the likelihood that the PDF will be exposed publicly. Here’s how it’s done:

    class PdfJob < ActiveJob::Base
      def perform(html)
        # ...
        obj = bucket.objects['output.pdf'].write(pdf)
        url = obj.url_for(:get, expires: 3.minutes.from_now).to_s
      end
    end
    

    Looks good. But how do we get this URL back to the user? The background job is asynchronous so it’s not like we can generate the PDF and return the string to the user all in the same HTTP request.

    A simple approach is to write the URL back into the database. Let’s introduce a new user param and update the user with the URL (this assumes a pdf_url column exists on the users table):

    class PdfJob < ActiveJob::Base
      def perform(html, user)
        # ...
        url = obj.url_for(:get, expires: 3.minutes.from_now).to_s
        user.update_attribute(:pdf_url, url)
      end
    end
    

    Now that the URL is available in the database, we can display it on the user’s profile page.
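
    One thing to watch: the temporary URL stops working once its expiry passes, so a link stored in the users table can go stale while it sits on the profile page. Here's a minimal sketch of one way to handle that, assuming we also store the expiry time alongside the URL (say, in a hypothetical pdf_url_expires_at column). The pdf_link_usable? helper is made up for illustration and is plain Ruby rather than a Rails model method:

```ruby
# Hypothetical helper: true only when a URL is present and its expiry
# (which we assume was saved when the URL was generated) is still in the future.
def pdf_link_usable?(url, expires_at, now = Time.now)
  return false if url.to_s.empty? || expires_at.nil?
  now < expires_at
end

expires_at = Time.now + 180 # three minutes from now, matching the job above
puts pdf_link_usable?("https://example.com/output.pdf", expires_at)
puts pdf_link_usable?("https://example.com/output.pdf", expires_at, Time.now + 600)
```

    The profile page could show the link only while the check passes, and offer to regenerate the PDF otherwise.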

    If we want to get even fancier we can write some JavaScript that’s executed immediately after the user requests a PDF. This script would periodically poll an Ajax endpoint in our app to determine if the URL has been written to the users table yet. When it detects the URL, it would redirect the user to the URL. This would make the PDF generation process seamless from the user’s perspective.

    An example in jQuery might look something like this:

    function poll(btn) {
      $.get("http://www.our-app.com/users/123/pdf_url", function(data) {
        if (data.length > 0) {
          window.location = data;
        } else {
          setTimeout(function() { poll(btn) }, 2000);
        }
      });
    }
    

    Our controller action might look like this:

    class UsersController < ApplicationController
      def pdf_url
        user = User.find(params[:id])
        # to_s so a missing URL is rendered as an empty body
        render plain: user.pdf_url.to_s
      end
    end
    

    And there you have it. I hope this gave you a good idea of just how easy it can be to generate PDFs in a background job. If your site isn’t getting much traffic, it’s probably not worth going this route. But if it’s a popular site (or you expect it to be one day) it would be well worth investing the time to background this process. It’ll go a long way towards keeping your HTTP response times short, and your app will feel much snappier as a result.

  • Faster PDFs with wicked_pdf and delayed_job (part 2)

    In part 1 we learned why backgrounding is important. Now let’s dive into some code.

    First things first. Add wicked_pdf and delayed_job to your Gemfile:

    gem "wicked_pdf"
    gem "delayed_job"
    

    Now we can generate a PDF from inside our Rails app with just a few lines of code:

    html = "<strong>Hello world!</strong>"
    pdf = WickedPdf.new.pdf_from_string(html)
    IO.write("output.pdf", pdf)
    

    You’ll notice that the more complex the HTML, the longer it takes wicked_pdf to run. That’s exactly why it’s important to run this process as a background job instead of in a web server process. A complex PDF with embedded images can take several seconds to render. That translates into several seconds of unavailability for the web process handling that particular request.

    Let’s move this code into a background job:

    class PdfJob < ActiveJob::Base
      def perform
        html = "<strong>Hello world!</strong>"
        pdf = WickedPdf.new.pdf_from_string(html)
        IO.write("output.pdf", pdf)
      end
    end
    

    Now we can queue the background job from a Rails controller like this:

    class PdfController < ApplicationController
      def generate_pdf
        PdfJob.perform_later
      end
    end
    

    The only problem is, our job isn’t doing anything particularly interesting yet. The HTML is statically defined and we’re writing out to the same file each time the job runs. Let’s make this more dynamic.

    First, let’s consider the HTML we want to generate. In a Rails app, the controller is generally responsible for rendering HTML from a given ERB template using a specific layout. There are ways to render ERB templates outside controllers, but they tend to be messy and unwieldy. In this situation, it’s perfectly reasonable to render the HTML in the controller and pass it along when we queue a job:

    class PdfController < ApplicationController
      def generate_pdf
        html = render_to_string template: "my_pdf"
        PdfJob.perform_later(html)
      end
    end
    

    This assumes an ERB template named “my_pdf.erb” exists and contains the HTML we want to convert into a PDF. Our method definition within our background job then becomes:

    class PdfJob < ActiveJob::Base
      def perform(html)
        pdf = WickedPdf.new.pdf_from_string(html)
        IO.write("output.pdf", pdf)
      end
    end
    

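    Since the job is executed asynchronously, the HTML has to be stored somewhere in the meantime. delayed_job persists the job’s arguments, including our HTML, in a database table so the worker can retrieve them when the job runs. (This assumes Active Job is configured to use delayed_job via config.active_job.queue_adapter = :delayed_job, with a backend gem such as delayed_job_active_record providing the table.)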
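
    To make the idea of persisting job arguments concrete, here's a toy sketch in plain Ruby. It is not how delayed_job is actually implemented: ToyQueue and EchoJob are made-up names, and an in-memory array stands in for the database table. The point is only that the arguments are serialized when the job is queued and deserialized later, when a worker picks the job up:

```ruby
require "json"

# Toy stand-in for a job queue backed by a database table.
class ToyQueue
  def initialize
    @rows = [] # each "row" is a JSON string, like a serialized job record
  end

  # Called from the web request: store the job class name and its arguments.
  def enqueue(job_class, *args)
    @rows << JSON.generate("job" => job_class.name, "args" => args)
  end

  # Called later by a worker: load each row, rebuild the job, and run it.
  def work_off
    results = []
    until @rows.empty?
      row = JSON.parse(@rows.shift)
      results << Object.const_get(row["job"]).new.perform(*row["args"])
    end
    results
  end

  # Number of jobs still waiting to run.
  def size
    @rows.size
  end
end

# Made-up job that just reports how much HTML it received.
class EchoJob
  def perform(html)
    "received #{html.length} characters of HTML"
  end
end

queue = ToyQueue.new
queue.enqueue(EchoJob, "<strong>Hello world!</strong>")
puts queue.work_off.first
```

    One practical consequence: a large rendered page means a large row in the jobs table, which is worth keeping in mind for very big documents.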

    So far, so good. The job will generate a PDF based on the HTML rendered in the controller. But how do we get this PDF back to the user when it’s ready? It turns out there are a variety of ways to do this. Saving the PDF to the file system in a publicly accessible folder is always an option. But why consume precious storage space on our own server when we can just upload to Amazon S3 instead for a few fractions of a cent?

    What’s nice about S3 is that it can be configured to automatically delete PDFs within a bucket after 24 hours. Furthermore, we can generate a temporary URL to allow a user to download a PDF directly from the S3 bucket. This temporary URL expires after a given period of time, greatly reducing the chance that a third party might access sensitive information.

    Next week I’ll demonstrate how to integrate S3 into our background job using the AWS SDK.

  • Faster PDFs with wicked_pdf and delayed_job (part 1)

    What do you get when you combine the slick PDF generation capabilities of wicked_pdf with the elegance and efficiency of delayed_job? A high performance way to convert HTML pages into beautiful PDF documents.

    I’ve been leveraging wicked_pdf to generate high school transcripts from my SaaS app, Teascript, since 2009. Prior to that I had been using Prawn, which ultimately proved to lack the flexibility I needed to produce beautiful PDFs.

    wicked_pdf converts HTML pages into PDF documents using WebKit, the engine behind Apple’s Safari browser (among others). For the past few years, Teascript produced PDFs without any kind of backgrounding in place. This meant that if someone’s PDF took an unusually long time to generate, they were tying up a web server process for that entire duration.

    If multiple users generated PDFs simultaneously, it might prevent other visitors from accessing the site. Not good. Furthermore, if the PDF generation process exceeded the web server’s default timeout, the user might not ever get the PDF, just an error page.

    Any time your web app integrates with a third party API or a system process, it’s a viable candidate for backgrounding. delayed_job to the rescue. By offloading the long-running processes onto background workers, we free our web server to do what it’s best at: serving static HTML and images.

    Backgrounding isn’t a silver bullet, though. It introduces added complexity into the app, making it more vulnerable to failures. This requires writing additional code to handle these failure scenarios gracefully. But at the cost of this added complexity, we can ensure our web server stays fast and lean while our users still get the pretty PDF they want.
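
    As a rough illustration of what that extra code can look like, here's a minimal retry wrapper in plain Ruby. The with_retries name and its parameters are hypothetical, and delayed_job has its own retry behavior for failed jobs, so treat this as a sketch of the idea rather than something to paste in:

```ruby
# Hypothetical helper: run the block, retrying up to max_attempts times
# and sleeping a little longer after each failure.
def with_retries(max_attempts: 3, base_delay: 0.5)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts >= max_attempts # out of attempts: let the error surface
    sleep(base_delay * attempts)
    retry
  end
end

# Simulated flaky task: fails on the first two attempts, then succeeds.
result = with_retries(base_delay: 0.01) do |attempt|
  raise "PDF rendering failed" if attempt < 3
  "rendered on attempt #{attempt}"
end
puts result
```

    Whatever the mechanism, the user-facing side needs a plan too: if every attempt fails, the app should tell the user rather than leave them waiting.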

    Next week we’ll dive into some actual code. I’ll demonstrate how to integrate wicked_pdf with delayed_job and hook the entire thing up to your Rails app. Don’t touch that remote.

  • Slides from my API talk

    Thanks to everyone who turned out for my API talk at the Triangle Ruby Brigade. I wasn’t expecting such a large crowd and the resulting Q&A was really good. It was interesting hearing how other developers are using APIs in their projects, and what problems they are encountering and solving. I’ve posted my slide deck for those who are interested. I also recorded audio from the talk and will be posting a link here when that’s online.

  • Building an external HTTP-based API in Rails

    If you’ll be in or near Raleigh the evening of March 12th, consider dropping by the Triangle Ruby Brigade. I’ll be presenting on how to build HTTP-based APIs in Rails, including:

    • Creating an API controller
    • Wiring up versioned routes for your API
    • Protecting your API with authentication
    • Choosing a transport encoding

    The question of which transport encoding to use is critical. If your API will be consumed by iOS devices, choosing binary property lists over XML or JSON can give you a 30% performance boost as well as an associated reduction in bandwidth consumption. Building an API that generates plists is straightforward with the help of a couple of Ruby gems.

    I’ll be sharing code examples from a recent project that surfaced a large, multi-faceted API to hundreds of iOS devices using binary plists. I’ll also have plenty of resources for those interested in learning more. It’s sure to be a great time! Hope to see you there.

  • Excluding iPads from the mobile-fu Rails plugin

    mobile-fu is a handy little Rails plugin that automatically detects when mobile devices are accessing your web application and adjusts the request format accordingly. For example:

    Mime::Type.register_alias "text/html", :mobile
    
    class ApplicationController < ActionController::Base
      has_mobile_fu
    
      def index
        if is_mobile_device?
          render :text => "You're using a mobile device!"
        else
          render :text => "You're not using a mobile device."
        end
      end
    end
    

    mobile-fu even loads any mobile stylesheets you’ve created, including device-specific stylesheets for iPhone, Android, Blackberry, etc. There is also a way to set the mobile flag manually. This comes in handy when you want to test your mobile layouts without actually picking up a device.

    I did find one problem, though: by default, mobile-fu sets the mobile request format for iPads. This probably isn’t the desired behavior for most sites. Mobile Safari does an acceptable job of rendering without the need for specific enhancements on the server side.

    I want my HTML pages to continue rendering on the iPad the same way they do in a desktop browser (at least until I have time to customize a layout specifically for the iPad).

    It’s easy enough to get mobile-fu to do the right thing, though. Simply override the #set_mobile_format method thusly:

    class ApplicationController < ActionController::Base
      has_mobile_fu
      
      # Continue rendering HTML for the iPad (no mobile views yet)
      def set_mobile_format
        is_device?("ipad") ? request.format = :html : super
      end
    end
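
    To see the decision in isolation, here's a plain-Ruby sketch of the logic the override ends up with. It assumes device detection boils down to a case-insensitive substring check on the user agent, which is roughly what is_device? does; the mobile_hints list and the format_for name are made up for illustration and are not part of mobile-fu:

```ruby
# Abbreviated, hypothetical list of user-agent hints (the plugin's real list is longer).
def mobile_hints
  %w[iphone ipod ipad android blackberry]
end

# Case-insensitive substring check against the user-agent string.
def device?(user_agent, type)
  user_agent.to_s.downcase.include?(type.to_s.downcase)
end

# Decide the request format: iPads get :html, other mobile devices get :mobile.
def format_for(user_agent)
  return :html if device?(user_agent, "ipad")
  mobile_hints.any? { |hint| device?(user_agent, hint) } ? :mobile : :html
end

ipad = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10"
iphone = "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9"
puts format_for(ipad)
puts format_for(iphone)
```

    Pulling the check out like this also makes it easy to test against real user-agent strings without a device in hand.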
    

    I also recommend adding the following to your CSS:

    -webkit-appearance: none;
    

    This prevents the iPad from rendering form controls natively. Native controls tend to break the layout of pages designed for desktop browsers. I’ve found it best to reserve native controls for when I actually build a dedicated iPad-only layout for the site.

  • Announcing youtube_tags extension for Radiant

    A recent project required the latest YouTube videos from a specific user to be listed inside a Radiant page. There wasn’t an existing extension that did this so I built my own. youtube_tags is based on the excellent twitter_tags extension. It enables inclusion of YouTube videos within Radiant pages using a series of Radius tags. It leverages the youtube-g gem to pull data directly from the GData API.

    For example, this is how you would display linked titles for the top 5 videos from my YouTube account (“pelargir”) using the Radius tags provided by the extension:

    <ul>
      <r:youtube user="pelargir">
        <r:videos count="5">
          <li><a href="<r:video:url />"><r:video:title /></a></li>
        </r:videos>
      </r:youtube>
    </ul>
    

    To install in your own project, visit the youtube_tags profile in the Radiant extension registry.

  • Spreedly extension for Radiant

    I built the original Spreedly extension for Radiant a couple of years ago. It’s a nice little package that makes it really easy to integrate Spreedly’s subscription payment system with your Radiant site. You can choose which pages to require a subscription to view, manage subscribers from the admin backend, and so on. It’s long overdue, but I finally got around to upgrading the extension to work flawlessly with Radiant 0.9.1. Check it out and have fun.

  • Drop seconds when querying MySQL timestamps

    One of the Rails apps I’ve been working on formats times as HH:MM in the view. No seconds are displayed. This is a pretty common way to format things. When doing timestamp comparisons in SQL, however, the seconds are taken into account. This is bad since it can cause discrepancies in the view.

    For example, say I have a table of records with created_at timestamps. My view displays all records with timestamps equal to or before the current time. Let’s assume the current time is 15:00:00 precisely and I happen to have a record with a timestamp of 15:00:00 in the database. The SQL comparison would work fine in this case.

    SELECT * FROM records WHERE created_at <= "2010-06-25 15:00:00"
    => 1 row in set
    

    What if the timestamp in the database is 15:00:03 though? Let’s run the query again.

    SELECT * FROM records WHERE created_at <= "2010-06-25 15:00:00"
    => Empty set
    

    Since 15:00:03 is greater than the current time of 15:00:00, the record doesn’t get returned. This would be fine if we were displaying seconds in the view, but we’re not. From the user’s perspective, the timestamp on the record is still 15:00 and should appear in the view since it’s equal to the current time. But it doesn’t.

    One way to fix this would be to handle the time comparisons in Ruby. This is certainly a legitimate option. For this particular project, though, performance was a big issue. (And we all know that Rails can’t scale.) I needed a way to continue letting the database handle the comparisons while disregarding seconds.

    The solution I ended up with isn’t ideal (it relies on a couple of functions built into MySQL) but it works fine and runs fast:

    SELECT * FROM records WHERE (created_at - INTERVAL SECOND(created_at) SECOND) <= "2010-06-25 15:00:00"
    => 1 row in set
    

    The number of seconds is extracted from the created_at timestamp and then subtracted from the timestamp. So if the timestamp was 15:00:03, MySQL subtracts 3 seconds to end up with 15:00:00.
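
    An alternative that leaves the column untouched is to round the cutoff instead: a record's minute is at or before the current minute exactly when its timestamp falls before the start of the next minute. Here's a minimal Ruby sketch of that equivalence. The helper names (floor_to_minute, visible?) are hypothetical, and this is an illustration of the idea rather than a benchmarked replacement for the query above:

```ruby
# Drop the seconds (and any fraction) from a Time, working in UTC.
def floor_to_minute(time)
  t = time.getutc
  Time.utc(t.year, t.month, t.day, t.hour, t.min)
end

# A record is visible when its minute is at or before the current minute,
# which is the same as being strictly before the start of the next minute.
def visible?(created_at, now)
  created_at < floor_to_minute(now) + 60
end

now = Time.utc(2010, 6, 25, 15, 0, 0)
puts visible?(Time.utc(2010, 6, 25, 15, 0, 3), now)

# The exclusive cutoff, formatted the way the SQL examples write timestamps.
cutoff = floor_to_minute(now) + 60
puts cutoff.strftime("%Y-%m-%d %H:%M:%S")
```

    The resulting cutoff could then be used in a plain created_at < comparison. Because the column is no longer wrapped in an expression, an index on created_at stays usable.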

    This solved the comparison problem for me and made my client very happy. Double win.

  • Rails 2.3.8 – an embarrassing trip

    November 30, 2009: Rails 2.3.5 has just been released. I upgrade my production Rails apps and rock on.

    February 17 of this year: RubyGems 1.3.6 is released and my apps begin suffering from deprecation warnings. They’re all over the place: when I run a test, when I launch script/console… when I sneeze.

    /Users/pelargir/Projects/teascript/config/../vendor/rails/railties/lib/rails/gem_dependency.rb:119:Warning: Gem::Dependency#version_requirements is deprecated and will be removed on or after August 2010.  Use #requirement
    

    Rumor has it the deprecation warning will go away with 2.3.6. And 2.3.6 is expected to drop any day. Yay! Problem will be solved soon… or so I thought. I begin waiting.

    May 23: 2.3.6 finally drops.
    May 24: 2.3.7 drops because of a bug in 2.3.6. What the heck?
    May 25: 2.3.8 drops because of a bug in 2.3.7. Okay, this is getting crazy.

    Was anyone else embarrassed about the 6-month delay for 2.3.6 followed by two more point releases over the span of three days? This is exactly the kind of anecdote an exec at a Fortune 500 would raise to prevent a move towards Rails and keep the company locked into Java or .NET for another decade. Ugh.

    We can do better than this.