The Basics of Ruby Memoization

Are you familiar with memoization? Maybe you’ve heard of it but are uncertain how to use it in your code. In this post you’ll get an introductory look at memoization. You’ll learn what it is, how you can use it to speed up your code, and where memoization can bite you in the butt!

First off, a simple definition:

Memoization is the process of storing a computed value to avoid duplicated work by future calls.

Broken down into basics, memoization always fits the following pattern:

Perform some work
Store the work result
Use stored results in future calls
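To make the three steps concrete, here is a minimal sketch that spells each one out by hand. The Report class, its total method, and the work_count counter are hypothetical names invented for illustration; they are not part of any library.

```ruby
# Hypothetical class used only to illustrate the three steps.
class Report
  attr_reader :work_count

  def initialize
    @work_count = 0 # counts how many times the work is actually performed
  end

  def total
    # Step 3: use the stored result on future calls
    return @total unless @total.nil?

    # Step 1: perform some work
    @work_count += 1
    result = (1..1_000).sum

    # Step 2: store the work result
    @total = result
  end
end

report = Report.new
report.total
report.total
puts report.total      # prints 500500
puts report.work_count # prints 1
```

However many times total is called, the sum is only computed once per Report instance.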

In Ruby, the most common pattern for memoizing a call is using the conditional assignment operator: ||=. If you’re not familiar with this operator, I’d suggest reading Peter Cooper’s excellent explanation of it.
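As a rough sketch of what the operator does, a ||= b behaves much like a || (a = b): the right-hand side is only evaluated when the left-hand side is nil or false. The Settings class and its load_theme method below are hypothetical stand-ins for slow work.

```ruby
# Hypothetical class; load_theme stands in for any slow operation.
class Settings
  attr_reader :loads

  def initialize
    @loads = 0 # counts how often load_theme runs
  end

  # Memoized with the conditional assignment operator.
  def theme
    @theme ||= load_theme
  end

  # Roughly what the line above expands to.
  def theme_expanded
    @theme_expanded || (@theme_expanded = load_theme)
  end

  private

  def load_theme
    @loads += 1
    "dark"
  end
end

settings = Settings.new
3.times { settings.theme }
3.times { settings.theme_expanded }
puts settings.loads # prints 2: one load per memoized method
```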

Let’s look at a common piece of code that occurs in many Rails applications and apply memoization to it.

If you’ve ever worked with a user login system, you’re likely familiar with the pattern of loading the current_user from the application_controller:

def current_user
  User.find(session[:user_id])
end

Within any one request in a Rails app you’ll usually see multiple calls to current_user, which means User.find is run multiple times. Here’s an example console output from a request without memoization in place for current_user:

Console Output of Rails app not using Memoization

In this image you can see 1 call to current_user that performs the initial query, followed by 5 more calls (represented by CACHE). Those CACHE lines mean Rails is returning the cached result of the SQL query, but the reported time doesn’t include the cost of building the User object. And because Rails hides the cost of object creation, these queries cost more than the 0.0ms and 0.1ms reported!

As a rule of thumb, if you see CACHE (X.Xms) you should investigate for inefficiencies in your code base!

In our case, we know the problem is caused by multiple calls to current_user. Let’s fix this code by introducing memoization into the current_user method, storing the result of User.find in an instance variable using conditional assignment:

def current_user
  @current_user ||= User.find(session[:user_id])
end

It’s important to notice that the result of find is assigned to an instance variable instead of a local variable. If you were to use a local variable (a variable without the @ symbol), the user wouldn’t be stored between calls and the find query would occur on every call to current_user. Nothing would have improved.
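The difference is easy to check outside of Rails. In this sketch FakeUser is a hypothetical stand-in for the User model that only counts lookups, and the two controller classes are invented for the comparison.

```ruby
# FakeUser is a hypothetical stand-in for an ActiveRecord model.
class FakeUser
  class << self
    attr_accessor :find_count
  end
  self.find_count = 0

  def self.find(_id)
    self.find_count += 1 # record every lookup
    new
  end
end

class LocalVariableController
  def current_user
    # user is a fresh local variable on every call, so it always starts as nil
    user ||= FakeUser.find(1)
    user
  end
end

class InstanceVariableController
  def current_user
    # @current_user lives as long as the controller instance does
    @current_user ||= FakeUser.find(1)
  end
end

local = LocalVariableController.new
3.times { local.current_user }
puts FakeUser.find_count # prints 3

FakeUser.find_count = 0
memoized = InstanceVariableController.new
3.times { memoized.current_user }
puts FakeUser.find_count # prints 1
```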

Re-running the request with a memoized current_user, the console output looks like this:

Console Output of Rails app using Memoization

All the CACHE lines are gone, which means the User object is no longer rebuilt each time current_user is called! And if you look closely at the last line in each image, the total time drops by 50ms after introducing memoization!

Like all programming techniques and tools, memoization has its place. In our case we’re trading space (storage of the User object) for faster query time. The trick is knowing when to use it and when not to use it.

When should you memoize?

When you’ve got duplicated database calls (like current_user above)
When you’ve got expensive calculations
When you’ve got repeated calculations that don’t change
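As an example of the last case, here is a small sketch of a calculation whose inputs are fixed when the object is built, so the stored result can never go stale. The Order class and its calculations counter are hypothetical names used only for illustration.

```ruby
# Hypothetical class: the prices never change after construction.
class Order
  attr_reader :calculations

  def initialize(prices)
    @prices = prices.dup.freeze # copy and freeze so the inputs cannot change
    @calculations = 0           # counts how often the sum is computed
  end

  def total
    @total ||= begin
      @calculations += 1
      @prices.sum
    end
  end
end

order = Order.new([5, 10, 15])
order.total
order.total
puts order.total        # prints 30
puts order.calculations # prints 1
```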

When shouldn’t you memoize?

Memoization can introduce some very subtle bugs that are hard to track down. This simple form of memoization shouldn’t be used with methods that take parameters:

# incorrect memoization
def full_name(first_name, last_name)
  @full_name ||= "#{first_name} #{last_name}"
end

puts full_name('Billy', 'Bob')
# => "Billy Bob"

puts full_name('Sally', 'Sue')
# => "Billy Bob"

Or with methods that depend on instance variables that can change:

# incorrect memoization
def full_name
  @full_name ||= "#{@first_name} #{@last_name}"
end

@first_name = 'Billy'
@last_name = 'Bob'

puts full_name
# => "Billy Bob"

@first_name = 'Sally'
@last_name = 'Sue'

puts full_name
# => "Billy Bob"

In both cases full_name is memoized on the first call as 'Billy Bob', so the second call also produces 'Billy Bob' even though different values ought to be used. Don’t let these contrived examples fool you: hitting problems like this in production code is a total pain!

Be sure to check out the follow-up post on advanced memoization that shows you how to get around the pitfalls noted above!