Reducing db load with cross-table caching (denormalization) using JSON in Rails

I’m working on a side project that lets people share embeddable videos with a follower/following system a la Twitter. You can see a rough example of the setup I’m using in this old demo app. The basic idea is that each user has a feed of posts created by other users that they follow.

So far, I’ve found that having a feed_items table that stores posts on a per-user basis makes the most sense. If you drop an index in there, it seems speedy enough for my needs. Here’s a simplified example of the relevant portion of the schema:

http://gist.github.com/566272?file=gistfile1.rb

Now, if you’re going to show a user their “feed” you might do something like this:

http://gist.github.com/566272?file=gistfile2.rb

The only problem is that, even with those fancy includes, you’re still querying 3 different tables:

http://gist.github.com/566272?file=gistfile3.txt

So, I started experimenting with storing attributes of the post and associated user information directly in the feed_items table. I played around with Active Record’s serialize method (link), but it didn’t seem like the right fit.

Here’s what I came up with yesterday. I think it’s a pretty good start:

http://gist.github.com/566272?file=gistfile4.rb

Essentially, I’m just pushing a cached copy of the post (including the associated user) into the feed_items table, using JSON. This way, we can get away with a single query for the user’s feed_items:

http://gist.github.com/566272?file=gistfile5.txt

I’m curious to hear what other people think about this, and if there are smarter ways to accomplish the same goal.

Published by

Trevor Turk

A chess-playing machine of the late 18th century, promoted as an automaton but later proved a hoax.

5 thoughts on “Reducing db load with cross-table caching (denormalization) using JSON in Rails”

  1. This does seems like good old denormalization to me too, and it seems like you'd want a Post#after_save to refresh all the feed_items caches in the event of a Post being updated..

  2. Yeah, this is basically just denormalization. The difference from what I've seen in a "typical" Rails app before is that I'm denormalizing a big chunk of the associated record(s) instead of doing it column by column without using the serialize with YAML method that Rails provides.

    I'm not bothering with a Post#after_save right now because I'm not allowing Posts to be updated ;)

    As for the Mongo/Friendly-ness of it – yeah, it's a bit schema-less, but it's more about enabling denormalization with as little complexity as possible. For me, on this project, that means trying to stick with the one database if possible.

  3. On Weplay, we use an activities table iun MySQL, that has cached html in a column similar to what you're doing. Those activity IDs are distributed to other user's feeds, which are Redis lists. So serving a feed means a fast query to redis to grab the list of IDs to render, and then an ID-based lookup from the MySQL activities table, which brings back the HTML ready to render. The advantage over your approach (if I understand it correctly) is less duplicate data storage and only one place to refresh the cached representation of the activity/feed item.

    Cheers,

    Luke

Comments are closed.