Faster PDFs with wicked_pdf and delayed_job (part 2)

In part 1 we learned why backgrounding is important. Now let’s dive into some code.

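First things first. Add wicked_pdf and delayed_job to your Gemfile. delayed_job also needs a storage backend. The examples here assume Active Record, provided by the delayed_job_active_record gem, which pulls in delayed_job itself:

gem "wicked_pdf"
gem "delayed_job_active_record"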

Now we can generate a PDF from inside our Rails app with just a few lines of Ruby:

html = "<strong>Hello world!</strong>"
pdf = WickedPdf.new.pdf_from_string(html)
IO.write("output.pdf", pdf)

You’ll notice that the more complex the HTML, the longer it takes wicked_pdf to run. That’s exactly why it’s important to run this process as a background job instead of in a web server process. A complex PDF with embedded images can take several seconds to render. That translates into several seconds of unavailability for the web process handling that particular request.

Let’s move this code into a background job:

class PdfJob < ActiveJob::Base
  def perform
    html = "<strong>Hello world!</strong>"
    pdf = WickedPdf.new.pdf_from_string(html)
    IO.write("output.pdf", pdf)
  end
end
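
One detail worth spelling out: a job class that inherits from ActiveJob::Base only runs through delayed_job if Active Job is told to use it as its queue adapter. The excerpt below is a minimal sketch of that setting. The module name MyApp is a placeholder, and it assumes delayed_job's Active Record backend (the delayed_job_active_record gem) is installed.

```ruby
# config/application.rb (excerpt)
# Assumption: the application module is called MyApp; substitute your own.
module MyApp
  class Application < Rails::Application
    # Send jobs queued with perform_later to delayed_job
    # instead of Active Job's default adapter.
    config.active_job.queue_adapter = :delayed_job
  end
end
```

With the Active Record backend, running "rails generate delayed_job:active_record" followed by "rake db:migrate" creates the jobs table, and "rake jobs:work" starts a worker that picks up queued jobs.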

Now we can queue the background job from a Rails controller like this:

class PdfController < ApplicationController
  def generate_pdf
    PdfJob.perform_later
  end
end

The only problem is, our job isn’t doing anything particularly interesting yet. The HTML is statically defined and we’re writing out to the same file each time the job runs. Let’s make this more dynamic.

First, let’s consider the HTML we want to generate. In a Rails app, the controller is generally responsible for rendering HTML from a given ERB template using a specific layout. There are ways to render ERB templates outside controllers, but they tend to be messy and unwieldy. In this situation, it’s perfectly reasonable to render the HTML in the controller and pass it along when we queue a job:

class PdfController < ApplicationController
  def generate_pdf
    html = render_to_string template: "my_pdf"
    PdfJob.perform_later(html)
  end
end

This assumes an ERB template named “my_pdf.erb” exists and contains the HTML we want to convert into a PDF. Our method definition within our background job then becomes:

class PdfJob < ActiveJob::Base
  def perform(html)
    pdf = WickedPdf.new.pdf_from_string(html)
    IO.write("output.pdf", pdf)
  end
end
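
The job still writes every PDF to output.pdf, so two jobs running close together would overwrite each other’s output. One possible way around this, sketched below in plain Ruby, is to derive a unique file name for each run. The helper names unique_pdf_path and write_pdf, and the choice of the system temp directory, are assumptions made for illustration; they are not part of wicked_pdf or delayed_job.

```ruby
require "securerandom"
require "tmpdir"

# Hypothetical helper: builds a collision-resistant path for one PDF.
def unique_pdf_path(dir = Dir.tmpdir)
  File.join(dir, "pdf-#{SecureRandom.uuid}.pdf")
end

# Hypothetical helper: writes the PDF bytes in binary mode
# and returns the path so the caller can keep track of the file.
def write_pdf(pdf_bytes, dir = Dir.tmpdir)
  path = unique_pdf_path(dir)
  File.binwrite(path, pdf_bytes)
  path
end
```

Inside perform, the IO.write call could then become path = write_pdf(pdf). The returned path would still need to be recorded somewhere the rest of the app can find it.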

Because the job runs asynchronously, its arguments have to be stored somewhere until a worker picks it up. delayed_job serializes the HTML we pass to the job and persists it in a database table (delayed_jobs by default), then hands it back to the job when it runs.
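
To make that persistence step concrete, here is a deliberately simplified stand-in for what a queue backend does with job arguments. It is not delayed_job’s real storage format, and serialize_job and deserialize_job are hypothetical names. The point is only that the full HTML string travels through the stored payload, so larger documents mean larger rows.

```ruby
require "json"

# Simplified stand-in for a queue backend; NOT delayed_job's real format.
# Enqueueing turns the job class name and its arguments into a string
# that could be stored in a database row.
def serialize_job(job_class, *arguments)
  JSON.generate("job_class" => job_class, "arguments" => arguments)
end

# A worker later reads the row and recovers the class name and arguments.
def deserialize_job(payload)
  data = JSON.parse(payload)
  [data["job_class"], data["arguments"]]
end

html = "<strong>Hello world!</strong>"
payload = serialize_job("PdfJob", html)
job_class, arguments = deserialize_job(payload)

# The stored payload grows with the size of the HTML passed to the job.
puts payload.bytesize
```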

So far, so good. The job will generate a PDF based on the HTML rendered in the controller. But how do we get this PDF back to the user when it’s ready? There are several ways to do this. Saving the PDF to a publicly accessible folder on the file system is always an option. But why consume precious storage space on our own server when we can upload to Amazon S3 instead for a fraction of a cent?

What’s nice about S3 is that a lifecycle rule can automatically delete the PDFs in a bucket roughly a day after they are created. Furthermore, we can generate a temporary (presigned) URL that lets a user download a PDF directly from the S3 bucket. This URL expires after a period of time we choose, greatly reducing the chance that a third party might access sensitive information.

Next week I’ll demonstrate how to integrate S3 into our background job using the AWS SDK.