Faster PDFs with wicked_pdf and delayed_job (part 3)

In part 2 we coded our PDF generator as a background job. But the PDF is still being stored on the local file system. Let’s store it in S3 instead and give our users a URL so they can download it.

First let’s add the AWS SDK gem to our Gemfile:

gem "aws-sdk", "~> 1.0" # the AWS::S3 API used below belongs to version 1 of the SDK

Let’s define environment variables for our AWS credentials:

AWS_ACCESS_KEY_ID=abc
AWS_SECRET_ACCESS_KEY=secret
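
If one of these variables is missing, it helps to find out at boot time rather than in the middle of a job. Here’s a minimal sketch that assumes the two variable names above; aws_credentials is a hypothetical helper of our own, not part of the SDK:

```ruby
# Hypothetical helper: reads the AWS credentials from the environment and
# raises KeyError right away if either variable is missing.
def aws_credentials(env = ENV)
  {
    access_key_id:     env.fetch("AWS_ACCESS_KEY_ID"),
    secret_access_key: env.fetch("AWS_SECRET_ACCESS_KEY")
  }
end
```

In an initializer, the resulting hash could be handed to AWS.config (assuming version 1 of the SDK) so a misconfigured environment is caught before any job runs.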

Next we’ll modify our background job to connect to S3 and upload our PDF file instead of saving it to the local file system:

class PdfJob < ActiveJob::Base
  def perform(html)
    pdf = WickedPdf.new.pdf_from_string(html)
    s3 = AWS::S3.new
    bucket = s3.buckets['my-bucket'] # replace with your bucket name
    bucket.objects['output.pdf'].write(pdf)
  end
end
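
One thing to watch: the job above always writes to the same key, output.pdf, so each upload overwrites the previous one. A possible way around this is to build a distinct key per PDF. This is only a sketch; pdf_key_for and the pdfs/ prefix are assumptions for illustration:

```ruby
require "securerandom"

# Hypothetical helper: builds a distinct S3 object key for each generated PDF
# so that one upload never overwrites another.
def pdf_key_for(user_id)
  "pdfs/#{user_id}/#{SecureRandom.uuid}.pdf"
end
```

The job could then write to bucket.objects[pdf_key_for(some_user_id)] instead of the fixed key, provided it knows which user it is working for.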

Nice! But how do we enable our users to download the file? S3 has several options for this. One option would be to make the bucket publicly accessible. The downside to this approach is that it would allow anyone to download any PDFs stored in the bucket, regardless of who originally uploaded them. Depending on what kind of data is being included in the PDFs, this could be a bad idea.

A better option is to generate a temporary URL. This URL can be given to a user so they can download the file, but the URL is only usable for the period of time we specify. This reduces the likelihood that the PDF will be exposed publicly. Here’s how it’s done:

class PdfJob < ActiveJob::Base
  def perform(html)
    # ...
    obj = bucket.objects['output.pdf'].write(pdf)
    url = obj.url_for(:get, expires: 3.minutes.from_now).to_s
  end
end

Looks good. But how do we get this URL back to the user? The background job is asynchronous so it’s not like we can generate the PDF and return the string to the user all in the same HTTP request.

A simple approach is to write the URL back into the database. Let’s introduce a new user param and update the user with the URL (this assumes the column exists on the users table):

class PdfJob < ActiveJob::Base
  def perform(html, user)
    # ...
    url = obj.url_for(:get, expires: 3.minutes.from_now).to_s
    user.update_attribute(:pdf_url, url)
  end
end

Now that the URL is available in the database, we can display it on the user’s profile page.
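
Keep in mind that the URL stops working once it expires, so a link stored in the database can go stale. One way to handle this, sketched below, is to store the expiry time next to the URL and only show the link while it is still valid. Both current_pdf_url and the idea of a stored expires_at value are assumptions, not something from the earlier parts:

```ruby
# Hypothetical helper: returns the stored URL only while it is still valid.
# `expires_at` is assumed to be saved alongside the URL when the job runs.
def current_pdf_url(url, expires_at, now = Time.now)
  return nil if url.nil? || url.empty?
  return nil if expires_at.nil? || now >= expires_at
  url
end
```

The profile page could call this and fall back to a “generate again” link when it returns nil.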

If we want to get even fancier we can write some JavaScript that’s executed immediately after the user requests a PDF. This script would periodically poll an Ajax endpoint in our app to determine if the URL has been written to the users table yet. When it detects the URL, it would redirect the user to the URL. This would make the PDF generation process seamless from the user’s perspective.

An example in jQuery might look something like this:

function poll(btn) {
  $.get("http://www.our-app.com/users/123/pdf_url", function(data) {
    if (data.length > 0) {
      window.location = data;
    } else {
      setTimeout(function() { poll(btn); }, 2000);
    }
  });
}

Our controller action might look like this:

class UsersController < ApplicationController
  def pdf_url
    user = User.find(params[:id])
    render plain: user.pdf_url.to_s # empty body until the job has stored the URL
  end
end

And there you have it. I hope this gave you a good idea of just how easy it can be to generate PDFs in a background job. If your site isn’t getting much traffic, it’s probably not worth going this route. But if it’s a popular site (or you expect it to be one day) it would be well worth investing the time to background this process. It’ll go a long way towards keeping your HTTP response times short, and your app will feel much snappier as a result.