In part 1 we learned why backgrounding is important. Now let’s dive into some code.
First, add the wicked_pdf and delayed_job gems to the Gemfile:

    gem "wicked_pdf"
    gem "delayed_job"

Now we can generate a PDF from inside our Rails app with this simple command:

    html = "<strong>Hello world!</strong>"
    pdf = WickedPdf.new.pdf_from_string(html)
    IO.write("output.pdf", pdf)

You’ll notice that the more complex the HTML, the longer it takes wicked_pdf to run. That’s exactly why it’s important to run this process as a background job instead of in a web server process. A complex PDF with embedded images can take several seconds to render. That translates into several seconds of unavailability for the web process handling that particular request.
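To make the point concrete, here is a toy sketch of handing slow work to a background worker so the caller gets control back right away. It uses a plain Ruby thread and `Queue`, which is not how delayed_job works (delayed_job runs jobs in a separate worker process). `ToyBackgroundQueue`, `slow_render`, and the 0.2 second delay are all made up for illustration.

```ruby
# A toy in-process job queue: one worker thread drains jobs so the caller
# does not wait for slow work. Illustration only, not delayed_job's design.
class ToyBackgroundQueue
  def initialize
    @jobs = Queue.new
    @results = Queue.new
    @worker = Thread.new do
      # Queue#pop blocks until a job arrives; :stop ends the loop.
      while (job = @jobs.pop) != :stop
        @results << job.call
      end
    end
  end

  # Returns immediately; the block runs later on the worker thread.
  def enqueue(&job)
    @jobs << job
    nil
  end

  # Stops the worker once pending jobs finish and returns their results.
  def drain
    @jobs << :stop
    @worker.join
    Array.new(@results.size) { @results.pop }
  end
end

# Stand-in for a slow PDF render (the duration is hypothetical).
def slow_render(name)
  sleep 0.2
  "rendered #{name}"
end

queue = ToyBackgroundQueue.new
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
queue.enqueue { slow_render("invoice-1") }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format("enqueue returned after %.4f s", elapsed)
puts queue.drain.inspect
```

The call to `enqueue` returns well before the simulated render finishes, which is the same property we want from a web request that queues a PDF job.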
Let’s move this code into a background job:

    class PdfJob < ActiveJob::Base
      def perform
        html = "<strong>Hello world!</strong>"
        pdf = WickedPdf.new.pdf_from_string(html)
        IO.write("output.pdf", pdf)
      end
    end

Now we can queue the background job from a Rails controller like this:

    class PdfController < ApplicationController
      def generate_pdf
        PdfJob.perform_later
      end
    end

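One setup step is not shown above: for `perform_later` to hand jobs to delayed_job, Active Job has to be told to use that adapter. The fragment below is a sketch of that configuration, and `MyApp` is a placeholder for the real application module name.

```ruby
# config/application.rb (fragment; MyApp is a placeholder application name)
module MyApp
  class Application < Rails::Application
    # Send perform_later calls to delayed_job instead of Active Job's default adapter.
    config.active_job.queue_adapter = :delayed_job
  end
end
```

delayed_job also needs a storage backend (for example the delayed_job_active_record gem and its migration) and a running worker such as `rake jobs:work`; the exact setup depends on the app.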
The only problem is, our job isn’t doing anything particularly interesting yet. The HTML is statically defined and we’re writing out to the same file each time the job runs. Let’s make this more dynamic.
First, let’s consider the HTML we want to generate. In a Rails app, the controller is generally responsible for rendering HTML from a given ERB template using a specific layout. There are ways to render ERB templates outside controllers, but they tend to be messy and unwieldy. In this situation, it’s perfectly reasonable to render the HTML in the controller and pass it along when we queue a job:

    class PdfController < ApplicationController
      def generate_pdf
        html = render_to_string template: "my_pdf"
        PdfJob.perform_later(html)
      end
    end

This assumes an ERB template named “my_pdf.erb” exists and contains the HTML we want to convert into a PDF. Our method definition within our background job then becomes:

    class PdfJob < ActiveJob::Base
      def perform(html)
        pdf = WickedPdf.new.pdf_from_string(html)
        IO.write("output.pdf", pdf)
      end
    end

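The job still writes to `output.pdf` on every run, so two jobs would overwrite each other. One possible fix, sketched here with only the standard library, is to generate a unique path per run. The helper names `unique_pdf_path` and `write_pdf`, the `pdf-` prefix, and the use of the system temp directory are assumptions for illustration.

```ruby
require "securerandom"
require "tmpdir"

# Builds a distinct path per job run so concurrent jobs never overwrite
# each other's output. The directory and the "pdf-" prefix are arbitrary.
def unique_pdf_path(dir = Dir.tmpdir)
  File.join(dir, "pdf-#{SecureRandom.uuid}.pdf")
end

# Writes PDF bytes in binary mode and returns the path that was used.
def write_pdf(bytes, dir = Dir.tmpdir)
  path = unique_pdf_path(dir)
  File.binwrite(path, bytes)
  path
end
```

Inside `perform`, a call like `write_pdf(pdf)` could then stand in for `IO.write("output.pdf", pdf)`.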
delayed_job actually persists the HTML passed to the job in a database table so the job can retrieve the HTML when it gets executed. Since the job is executed asynchronously, the HTML has to be stored somewhere temporarily.
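As a rough picture of what that persistence involves, the sketch below turns a job's arguments into text and rebuilds them later. It is a simplified stand-in: the field names and the helpers `serialize_job` and `deserialize_job` are illustrative, not delayed_job's actual storage format.

```ruby
require "json"

# Simplified stand-in for what a queue backend does with job arguments:
# turn them into text for storage, then rebuild them when the job runs.
# Field names here are illustrative, not delayed_job's actual schema.
def serialize_job(job_class, *arguments)
  JSON.generate("job_class" => job_class, "arguments" => arguments)
end

def deserialize_job(payload)
  data = JSON.parse(payload)
  [data.fetch("job_class"), data.fetch("arguments")]
end

html = "<strong>Hello world!</strong>"
stored = serialize_job("PdfJob", html) # what would sit in the database row
job_class, arguments = deserialize_job(stored)
puts "#{job_class} would run with #{arguments.size} argument(s)"
puts "stored payload is #{stored.bytesize} bytes"
```

Because the whole HTML string travels with the job, the stored payload grows with the size of the rendered page.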
So far, so good. The job will generate a PDF based on the HTML rendered in the controller. But how do we get this PDF back to the user when it’s ready? It turns out there are a variety of ways to do this. Saving the PDF to the file system in a publicly accessible folder is always an option. But why consume precious storage space on our own server when we can upload to Amazon S3 instead for a fraction of a cent?
What’s nice about S3 is that it can be configured to automatically delete PDFs within a bucket after 24 hours. Furthermore, we can generate a temporary URL to allow a user to download a PDF directly from the S3 bucket. This temporary URL expires after a given period of time, greatly reducing the chance that a third party might access sensitive information.
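To show the idea behind an expiring link, here is a toy sketch that puts an expiry timestamp in the URL and protects it with an HMAC signature. This is not S3's algorithm (S3 presigned URLs use AWS Signature Version 4), and `SECRET`, `files.example.com`, and the helper names are all hypothetical.

```ruby
require "openssl"
require "uri"

# Toy expiring-link scheme. NOT S3's algorithm; it only shows the idea of
# an expiry timestamp protected by a signature. SECRET is a placeholder.
SECRET = "replace-with-a-real-secret"

def sign(path, expires_at, secret = SECRET)
  OpenSSL::HMAC.hexdigest("SHA256", secret, "#{path}:#{expires_at}")
end

# Returns a URL that carries its own expiry time and a signature over it.
def temporary_url(path, ttl_seconds, now: Time.now, secret: SECRET)
  expires_at = now.to_i + ttl_seconds
  query = URI.encode_www_form(expires: expires_at,
                              signature: sign(path, expires_at, secret))
  "https://files.example.com#{path}?#{query}"
end

# A link is valid only if the signature matches and the expiry has not passed.
def valid_link?(url, now: Time.now, secret: SECRET)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s).to_h
  expires_at = params["expires"].to_i
  expected = sign(uri.path, expires_at, secret)
  OpenSSL.secure_compare(params["signature"].to_s, expected) &&
    now.to_i < expires_at
end

url = temporary_url("/pdfs/report.pdf", 3600)
puts url
puts valid_link?(url)
```

Changing the path or the expiry in the URL invalidates the signature, so a leaked link stops working once its time is up.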
Next week I’ll demonstrate how to integrate S3 into our background job using the AWS SDK.