Tag Archives: rails

Migrating from synchronous rails, to async node – part three of a three part guide

In the last post, I implemented an asynchronous function that wrapped a call to Redis, using an existing node library, node-redis.

The final implementation introduced nested asynchronous method calls, and the code ended up looking a bit like this, even after simplifying somewhat:

  User.prototype.findByDevice = function(device_mac, callback) {

    db.hgetall(device_mac, function (err, device) {
      if (err) { console.log(err) }
      else {
        if (device.hasOwnProperty('mac')) { 
          db.hgetall(device.owner, function (err, user) {
            if (err) { console.log(err)}
            else{
              callback(err , user)
            }
          });
        }
      }
    });
  }

The code took this form because the second asynchronous call depends on the result of the first. If you take a second to imagine how hard this would be to read at four, five or six levels of nesting, you'll quickly understand why so many developers are writing their own callback management libraries to make this easier to work with.

Introducing async.js

The one I've found most promising so far is async.js, a fairly comprehensive utility module that provides a number of different ways to ensure that asynchronous functions are either called in a specific order, or run in parallel, aggregating their results before allowing code to continue and so on.

In this case, I'll be focusing on waterfall, a function in the async module that takes an array of functions to call in order, passing the results of each one to the next, until a final callback hands the end result back to whatever code called the function that async.waterfall was used in.

  User.prototype.findByDevice = function(device_mac, callback) {

    async.waterfall([

        // fetch our device first
        function(cb){
          db.hgetall(device_mac, function (err, res) {
            cb(err, res);
          })
        },

        // now we have our device, fetch the user
        function(device, cb){
          db.hgetall(device.owner, function (err, res) {
            cb(err, res);
          })
        }

        // return our user object
      ], function (err, user) {
        callback(err, user)
      });

  }

In our case, we have our function findByDevice on User, and we have passed an array containing our two asynchronous functions as the first argument to async.waterfall, followed by a final anonymous function that returns our user object.

To be more specific, just like the code above, we take our mac address string as the first parameter to findByDevice, and the function to execute as our second parameter, callback.

We then make the asynchronous call to Redis to fetch a device object, passing in cb as our function to execute once Redis has given us our hash, to pass it to the next function in the array.

We then use the owner property of the device object passed into the second function, to make another call to fetch our user, again passing in cb, to execute once Redis has given us a user object, to pass to the final function.

Once we have the user object, we can pass it on to the code that called findByDevice with callback(err, user), completing the asynchronous callback chain.
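To make the hand-off between the steps a little less magical, here is a minimal sketch of what a waterfall helper has to do. It is a hand-rolled illustration rather than async.js's actual implementation, and the names miniWaterfall and fakeDb, along with the sample data, are made up for the example so that it runs without Redis.

```javascript
// A hand-rolled sketch of the waterfall idea (not async.js's real code).
// Each task receives the previous task's results plus a `next` callback.
function miniWaterfall(tasks, finalCallback) {
  var index = 0;

  function next(err) {
    // Any error short-circuits straight to the final callback.
    if (err) { return finalCallback(err); }

    // Everything after `err` is a result to hand to the next task.
    var results = Array.prototype.slice.call(arguments, 1);

    if (index === tasks.length) {
      return finalCallback.apply(null, [null].concat(results));
    }

    var task = tasks[index++];
    task.apply(null, results.concat(next));
  }

  next(null);
}

// Hypothetical in-memory stand-in for Redis, so the sketch runs on its own.
var fakeDb = {
  data: {
    '00:1e:c2:a4:d3:5e': { mac: '00:1e:c2:a4:d3:5e', owner: 'mrchrisadams' },
    'mrchrisadams': { username: 'mrchrisadams', name: 'Chris Adams' }
  },
  hgetall: function (key, cb) {
    var value = this.data[key];
    setImmediate(function () { cb(null, value); });
  }
};

miniWaterfall([
  // fetch the device first
  function (cb) { fakeDb.hgetall('00:1e:c2:a4:d3:5e', cb); },
  // then use the device's owner to fetch the user
  function (device, cb) { fakeDb.hgetall(device.owner, cb); }
], function (err, user) {
  console.log(err, user.username);
});
```

The important detail is that each step never returns a value. It hands its result to next, and the helper decides whether that means "call the following step" or "we're finished, call the final callback".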

More than just waterfalls

Of course, just because we now know how to execute asynchronous functions in a set order, one after the other like we're used to, doesn't mean we should always do so.

One of the advantages of node's asynchronous style is that it allows the parallel execution of code, so the same operation could be applied to every value in an array at the same time, getting around bottlenecks, with the results only passed on once all the operations have been completed.
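As a rough sketch of that "start everything, then wait for all of it" pattern, here is a hand-rolled helper. async.js offers helpers for this kind of thing, but this is not its implementation, and miniParallel and slowUpcase are names invented for the example.

```javascript
// A hand-rolled sketch of "run everything at once, report when all are done".
// This is an illustration only, not async.js's implementation.
function miniParallel(tasks, finalCallback) {
  var results = [];
  var remaining = tasks.length;
  var finished = false;

  if (remaining === 0) { return finalCallback(null, results); }

  tasks.forEach(function (task, i) {
    task(function (err, result) {
      if (finished) { return; }          // ignore callbacks after a failure
      if (err) {
        finished = true;
        return finalCallback(err);
      }
      results[i] = result;               // keep results in task order
      remaining -= 1;
      if (remaining === 0) {
        finished = true;
        finalCallback(null, results);
      }
    });
  });
}

// Hypothetical slow operation: upper-cases a string after a delay.
function slowUpcase(word, delayMs) {
  return function (cb) {
    setTimeout(function () { cb(null, word.toUpperCase()); }, delayMs);
  };
}

miniParallel([
  slowUpcase('redis', 30),
  slowUpcase('node', 10),
  slowUpcase('rails', 20)
], function (err, words) {
  // Results are stored by task position, not by finishing order.
  console.log(err, words);
});
```

All three timers are started before any of them finishes, so the total wait is roughly that of the slowest task rather than the sum of all three.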

Alternatively, this allows us to pop values onto queues, with a set number of workers to work through them, without needing a dedicated worker process like you would in Rails when using delayed_job or resque.
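Here is a minimal sketch of that queue idea, assuming an in-process queue with a fixed number of workers. Again this is hand-rolled for illustration rather than taken from async.js, and miniQueue, mailer and the email addresses are all hypothetical.

```javascript
// A hand-rolled sketch of an in-process work queue with a fixed number of
// workers. This is an illustration only, not async.js's implementation.
function miniQueue(worker, concurrency) {
  var waiting = [];
  var running = 0;

  function runNext() {
    while (running < concurrency && waiting.length > 0) {
      var job = waiting.shift();
      running += 1;
      startJob(job);
    }
  }

  function startJob(job) {
    worker(job.item, function (err, result) {
      running -= 1;
      job.callback(err, result);
      runNext();                 // a worker is free, so pick up more work
    });
  }

  return {
    push: function (item, callback) {
      waiting.push({ item: item, callback: callback });
      runNext();
    }
  };
}

// Hypothetical job: pretend to send a welcome email, taking 20ms each.
var mailer = miniQueue(function (address, done) {
  setTimeout(function () { done(null, 'sent to ' + address); }, 20);
}, 2);

['a@example.com', 'b@example.com', 'c@example.com'].forEach(function (address) {
  mailer.push(address, function (err, message) {
    console.log(err, message);
  });
});
```

With a concurrency of 2, the third address waits until one of the first two jobs has finished. One difference from delayed_job or resque worth keeping in mind is that the jobs in this sketch only live in memory, so they are lost if the node process exits.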

Doesn't all this seem like a lot of work though? The ruby you showed me first was much shorter, and easier to read

In a word, yes.

Node isn't a magic bullet, and although it's popular, if you're doing a basic CRUD app, there are often very good reasons to choose Rails or Django over Node and Express.

That being said, it pays to understand your options when choosing a particular technology to solve the problem facing you. Also, if you're a fan of behaviour driven development, it's good to know that such an approach is possible with this technology, and, once you've got your head around async programming, there's value in knowing that there are some well developed tools to help you apply these techniques to both server-side and client-side javascript.

If anything's not clear in this series, please let me know - I've sunk a good few hours into these posts now, to make it easier to understand async node development if you're used to sync ruby development, and I'd really like to know where I can improve these for future visitors.

Migrating from synchronous rails, to async node – part two of a three part guide

This is the second of the three part series covering how to migrate from developing synchronously with Rails and Rspec, to asynchronously with Node and Mocha. It picks up from the previous post, which introduced Mocha syntax and asynchronous testing.

The previous post covered how we'd implement a class with instance methods in Ruby.

Here's the simplified pseudo-code, for comparison to some coming javascript:

class User

  def initialize
    @db = Redis.new
  end

  def find_by_mac(mac)
    @db.hgetall(mac) 
  end

end

It looks a bit different when working with asynchronous javascript code.

What implementation in node looks like

Because javascript doesn't have a class system, if we want something that acts a bit like a class, the idiomatic approach is to use functions, and use a bit of boilerplate code to make it easier to identify the function in stacktraces or logging when developing.

In javascript, instead of defining class methods or instance methods like we do with Ruby, we'd use prototype to inject new methods into the User function, so they're available to all instances of the User function in the system.

This chunk of code below is roughly analogous to declaring a User class in Ruby, mixing in methods from an EventEmitter module and giving it a to_s method, so there's a readable string returned when you try to log the class, or print it:

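var events = require('events');
var util = require('util'); // called sys in older versions of node

function User() {
  // allow User() to be called without `new`
  if(false === (this instanceof User)) {
    return new User();
  }
  events.EventEmitter.call(this);
}
util.inherits(User, events.EventEmitter);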

User.prototype.toString = function() {
  return "User"
}

One thing - because we can rely on User.prototype.toString returning a value instantaneously, we can treat it as synchronous code, without thinking about callbacks, and using return the way we would in other languages.

Writing asynchronous functions

However, when we're working with asynchronous functions, things are different, and so far, we've only covered calling them, not defining them.

Here's a simplified version of an asynchronous function in use in the User function.

We have defined a function on the prototype of User, accepting two parameters:

  • device_mac - a String we use as our key when fetching a hash with Redis
  • callback - the function we pass into findByDevice, to be executed later, once Redis has given us our hash to work with.

In line with convention, our function callback itself takes two parameters, err and res. In our case, res is the hash given to us by Redis, if all is well, and err is what we get if something goes wrong with Redis when it's fetching our hash for us.

User.prototype.findByDevice = function(device_mac, callback) {
  db.hgetall(device_mac, function (err, res) {
    // hand any error, and the hash, straight on to the caller
    callback(err, res);
  })
}

Checking this against our test code

It might be helpful to show these side by side, to put the implemented function on User, next to the function we're passing with our test to see what it is we're passing into findByDevice:

Our implemented function:

User.prototype.findByDevice = function(device_mac, callback) {
  db.hgetall(device_mac, function (err, res) {
    callback(err, res);
  })
}

The test:

user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
  if (err) {
    // do something to recover
  } else {
    res.username.should.be.ok
    res.username.should.equal('mrchrisadams')
    done()
  }
})

When we have the value from Redis, callback(err, res) is executing the function below, with our res.username.should.be.ok type assertions.

When you need to call async functions from async functions

Once you've got your head around passing functions for asynchronous code, you'll often find yourself working with multiple asynchronous functions, that you need to control the order of, so that data is passed from one to the other, to give you the result you want.

Here's the first actual implementation of the findByDevice function I ended up using in the project I'm working on.

We still pass in the callback function as our final parameter, but in order to return the value we want, we end up nesting a second call to db.hgetall inside the anonymous function we pass into our first call of db.hgetall, then use callback(err, user) to execute the function passed into findByDevice, and pass the results along to the code initially calling user.findByDevice.

  User.prototype.findByDevice = function(device_mac, callback) {

    db.hgetall(device_mac, function (err, device) {
      if (err) { console.log(err) }
      else {
        if (device.hasOwnProperty('mac')) { 
          db.hgetall(device.owner, function (err, user) {
            if (err) { console.log(err)}
            else{
              callback(err , user)
            }
          });
        }
      }
    });
  }

Avoiding callback hell.

Even with just two asynchronous function calls, this isn't very readable.

Also, it seems that nearly every second developer on the planet playing with node has written their own callback handling library to make this easier to read and more maintainable.

In fact, there's a bewildering number of libraries out there that claim to make this problem much easier to understand.

In the next post, I'll introduce async.js, a well documented library I've found fairly straightforward to work with, to help mitigate callback hell.

Migrating from synchronous rails, to async node – part one of a three part guide

In my spare time, I've recently been working with a few codebases that are either written in nodejs, or use enough code written in it, that I'm keen to have some kind of testing framework in place, to give me the same kinds of safety nets that I'm used to working with on Chef, Sinatra or Rails projects.

After losing a couple of weekends looking into BDD style development with node, and getting my head around asynchronous coding concepts, I think I've settled on an approach that feels enough like Rspec to feel comfortable enough to use for future server-side js development.

It's way too much for a single post, so I'll be sharing the first of a three part series of posts here, to help other Ruby developers used to synchronous development with Rspec adjust to asynchronous development using Mocha, the closest thing I can find to it in node-land right now.

In this post I'll cover how Mocha syntax compares to Rspec, then in the next post I'll cover implementing the code to pass these Mocha tests. In the third post, I'll touch on how to keep asynchronous code halfway manageable in node, and avoid getting stuck in callback hell.

How we do it in Ruby

I'm going to use a side project I've been hacking on for a few months, to show how I'd add a new class to wrap calls around a persistence layer to provide a degree of encapsulation, abstracting away the database technology from the external interface of the class.

In this case, with my User class, I want to have a method that finds me a user, stored as a hash in Redis, keyed on the mac address of the laptop or iphone they're using.

The tests I'd write might look a bit like this pseudocode here:

describe 'User' do

  before(:each) do 
    redis = Redis.new
    # create an entry in redis with the key "99:aa:44:33:01:3r", 
    # and the hash below
    redis.hmset("99:aa:44:33:01:3r", {
      :username    => "mrchrisadams",
      :name        => "Chris",
      :email       => "wave@chrisadams.me.uk",
      :mac_address => "99:aa:44:33:01:3r"
    })
  end

  it 'fetches the user object' do 
    u = User.new
    c = u.find_by_mac('99:aa:44:33:01:3r')
    c['username'].should eq('mrchrisadams')
  end

end

I use the before block to store a hash inside redis, setting a few extra values on it, and then later on, in the it 'fetches the user object' block, I create an instance of my User class, and call the find_by_mac method to fetch me the hash I just stored in Redis.

The implementation code in Ruby might look like this:

class User

  def initialize
    @db = Redis.new
  end

  def find_by_mac(mac)
    @db.hgetall(mac) 
  end

end

So far so good. This is synchronous code, and feels comfortable, and is easy enough to work with.

Doing this in node

For future reference, the completed mocha test code is here on github, and the completed implementation code is here too.

Now, let's try to take the same approach in node, to see how different this looks, but also to see what we need to be aware of when learning to think in asynchronous terms.

So, let's be good developers and try to write our test code first, in Mocha. I'll paste the lot, then go through the interesting bits piece by piece.

describe('User', function() {
  describe('#findByDevice', function() {

    beforeEach(function(done) {

      db.hmset("mrchrisadams", {
        name: "Chris Adams",
        username: "mrchrisadams",
        devices: ["00:1e:c2:a4:d3:5e"],
        email_address: "wave@chrisadams.me.uk"
      }, done);

    })


    it('should fetch the user for that mac', function(done) {
      var user = new User();
      user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
        if (err) {
          done(err) // fail the test, rather than leaving it to time out
        } else {
          res.username.should.be.ok
          res.username.should.equal('mrchrisadams')
          done()
        }
      }); // find by device
    }) // should fetch the user for that mac
  })
})

So first of all, much of the syntax looks quite like Rspec. We have nested describe and it blocks, and even the assertion syntax is familiar, with nice, readable shoulds, bes and equals all around.

However, there are a few important additions here that we need, to allow for the asynchronous nature of node.

First of all, let's look at the beforeEach function:

beforeEach(function(done) {

  db.hmset("mrchrisadams", {
    name: "Chris Adams",
    // object vars edited out for brevity
  }, done);
})

In this case, we're making a call to db, an instance of node-redis, a popular redis client library for node that is completely asynchronous.

If we tried using the bog-standard beforeEach function when working with an asynchronous library (note the lack of done), it would look like the code below:

  beforeEach(function() {
    db.hmset("mrchrisadams", {
      name: "Chris Adams",
      // object vars edited out for brevity
    });
  })

Had we done this, node would have zipped to the beforeEach function, started it, then returned straight away, racing ahead trying to run the tests below it, without waiting for our Redis setup steps to be finished.

Now Redis is fast, but you can't rely on that to make sure your tests are set up before you run them, and this code would have given us at best unpredictable results, but more likely, failures across the board.
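To see the race for yourself without Redis or Mocha, here is a small sketch. fakeHmset and store are hypothetical stand-ins for an asynchronous client and its data, with a timer playing the part of the network round trip.

```javascript
// Hypothetical stand-in for an asynchronous Redis client: the write only
// lands after a short delay, and the optional callback fires once it has.
var store = {};
function fakeHmset(key, hash, callback) {
  setTimeout(function () {
    store[key] = hash;
    if (callback) { callback(null, 'OK'); }
  }, 10);
}

// Setup written without waiting: the function returns immediately...
function setupWithoutWaiting() {
  fakeHmset('mrchrisadams', { name: 'Chris Adams' });
}

setupWithoutWaiting();

// ...so code that runs straight afterwards sees no data yet.
console.log('immediately after setup:', store['mrchrisadams']);

// Only code that runs from the callback can rely on the data being there.
fakeHmset('mrchrisadams', { name: 'Chris Adams' }, function () {
  console.log('inside the callback:', store['mrchrisadams'].name);
});
```

The first console.log runs before the timer has fired, which is exactly the position a test would be in if it started before the setup had finished.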

done to the rescue

Here's how we do it when working asynchronously with Mocha.

beforeEach(function(done) {

  db.hmset("mrchrisadams", {
    name: "Chris Adams",
    // object vars edited out for brevity
  }, done);
})

The difference this time round is that we're passing in done, a callback that Mocha hands to us. Mocha holds off running the tests until done has been called, which here happens once our Redis setup steps are finished, and we're in a state for testing.

We have to take this approach too for the test itself, in our it function:

it('should fetch the user for that mac', function(done) {
  var user = new User();
  user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
    if (err) {
      // do something to recover
    } else {
      res.username.should.be.ok
      res.username.should.equal('mrchrisadams')
      done()
    }
  }); // close findByDevice
}) // close should fetch the user for that mac

Here, we're doing something very different to the ruby approach of storing returned values from methods in variables, then testing the value of those.

Our first real exposure to "continuation-passing style"

Look at this line in particular:

user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res)

Understanding this here for me was the key to getting my head around this initially very alien syntax. If you're having trouble with the shift from sync to async, the closest thing in typical sync ruby code might be something like this line, where you set the variable res then use it for testing assertions against:

res = user.findByDevice('00:1e:c2:a4:d3:5e')
res.should be_okay

Now there are two important things here to remember when working with node:

1) Because we're working asynchronously, we only want to run our assertions once we know we have the values back from the call we just made
2) We are working with javascript, where passing functions around to execute the code inside them at a later date is both common and necessary.

So, our solution to the asynchronous problem, is to pass in a function with the assertions we care about inside it, as a parameter to our findByDevice call on the user object.

So, what we're saying here is: "go fetch the results of findByDevice, with the parameter 00:1e:c2:a4:d3:5e, and here's the function I'd like you to execute when you're done, please":

  user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
    if (err) {
      // do something to recover
    } else {
      res.username.should.be.ok
      res.username.should.equal('mrchrisadams')
      done()
    }
  })

You might be confused by the two parameters err, and res.

There is a generally accepted convention when coding asynchronously in node, to make it possible to pass the result from one callback to another. Passing in a function which itself has the parameters error and result (or some variation on the name, like err and res) as the last argument going into a method call is idiomatic node javascript now, and is often referred to as the continuation-passing style.

It's crucial to understand it, because you won't get far without it.
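Here's a small sketch of the convention from both sides, the function that calls the callback and the code that supplies it. The users object and the findUser function are hypothetical, and only there so the example runs without a database.

```javascript
// A sketch of the error-first convention with a hypothetical in-memory store.
var users = { mrchrisadams: { username: 'mrchrisadams', name: 'Chris Adams' } };

// The callback is the last argument, and is always called as (err, result).
function findUser(username, callback) {
  // process.nextTick keeps the function asynchronous even for instant lookups.
  process.nextTick(function () {
    var user = users[username];
    if (!user) {
      return callback(new Error('no such user: ' + username));
    }
    callback(null, user);   // null in the error slot means "all went well"
  });
}

// The caller checks err first, and only then touches the result.
findUser('mrchrisadams', function (err, user) {
  if (err) { return console.error(err.message); }
  console.log(user.name);
});

findUser('nobody', function (err, user) {
  if (err) { return console.error(err.message); }
  console.log(user.name);
});
```

Because every function agrees on where the error goes, a callback written for one asynchronous function can usually be handed straight to another.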

Where does done() fit into this?

You might notice a call to done() on the last line of the function we're passing in, after the assertions.

When you pass done in to a testing block, Mocha has no way of knowing when the test has finished, so it will wait for done() to be called before deciding if that particular it block has failed or passed.
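If it helps to picture what a test runner is doing here, this is a toy sketch of the idea. It is not Mocha's code, and runTest and report are made-up names. As I understand it, Mocha tells the two kinds of test apart in a similar way, by checking whether the function you gave it declares a parameter.

```javascript
// A toy test runner, sketched to show the idea only. Mocha's real runner
// does far more (timeouts, hooks, reporters and so on).
function runTest(name, testFn, report) {
  // A test function that declares a parameter is asking for `done`.
  var wantsDone = testFn.length > 0;

  if (!wantsDone) {
    try {
      testFn();
      report(name, 'passed');
    } catch (e) {
      report(name, 'failed: ' + e.message);
    }
    return;
  }

  // Asynchronous test: nothing is reported until done() is called.
  testFn(function done(err) {
    report(name, err ? 'failed: ' + err.message : 'passed');
  });
}

function report(name, outcome) {
  console.log(name + ' ' + outcome);
}

runTest('sync test', function () {
  if (1 + 1 !== 2) { throw new Error('maths is broken'); }
}, report);

runTest('async test', function (done) {
  setTimeout(function () { done(); }, 10);
}, report);
```

In this sketch, passing an error to done marks the test as failed, which is handy when the thing under test hands you an err in its callback.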

Now onto the implementation in node

So we've run through how Mocha works now, and how it compares to Rspec, and we've seen how we rely on anonymous functions to run our assertions on the results of asynchronous functions.

(As an aside, anonymous functions are simply functions with no name - just the keyword, the parameters, and the code to execute. They look like function(err, res) { // do stuff }, instead of function hasSomeName(err, res) { // do stuff } ).

It's worth re-reading the above until you're really comfortable with the concepts, as the next section, implementation, is unfortunately pretty messy.

CodeSchool is really rather good

Since coming across CodeSchool's free Rails for Zombies course online a few months back, I've been working through a few other paid courses they offer, and I recently finished both Rails for Zombies 2, and Rails Best Practices.

I thought it might be useful to share my experiences with Codeschool learning tools.

While I've been dabbling in Rails on and off over the years, since joining AMEE a few months ago, at times I've felt a little behind all the cool kids who've been coding nothing but Ruby all the time I was swearing at Drupal at Headshift, especially after all the changes brought about in Rails 3 and 3.1, with the asset pipeline, new AREL based query syntax, and shiny new default exotica like Coffeescript and SCSS.

Sure, one of the key points of Rails 3 is that it's much more modular, so you're not explicitly forced to use all this new hotness, but if you know enough about all these newer parts of the framework, at least you can make more informed decisions about which parts you'll update, and which parts you'll keep when updating a series of apps from Rails 2.3 to 3.

I've found Codeschool's trio of Rails focussed courses to be immensely useful in this respect. I really like the way they deliver the courses, and if I could, I'd want to learn to use any moderately complex framework like this in future, instead of reading about it in books.

The problems I have with learning from computer books

I'm pretty sure I'm not alone in having a habit of buying interesting looking O'Reilly or Pragmatic Programmer books, and never actually getting round to working through them in a useful fashion for one of the following reasons:

  • I read them away from the computer and everything feels too abstract to be useful, or I forget what I've really learnt by the time I get a chance to use it
  • I sit down at a computer, with a book and an editor, and I end up wrestling with the setup, and spending time fighting with rubygems, or some compiler issue brought about by one C-extension needed by a gem, or I simply lose hours due to stray typos I make when typing out code from the page, or when copying from a pdf.

This doesn't mean books are useless to me. I've got an immense amount from the Rspec book lately, as well as Gregory Brown's Ruby Best Practices. But I frequently struggle with the two points above enough to make me think twice about buying books at all.

Why I prefer CodeSchool's approach

I really like the way the information in Codeschool's courses is structured - it feels like a much better fit for how I tend to learn things.

Each course is split into a logical series of chapters, focusing on a specific element of the Rails framework, before you get to use what you've learnt.

For example, you'll watch a 15-20 minute long video introducing how Activerecord works together with the new AREL based query builder, and have the option to get some downloadable slides and cheatsheets.

You'll then work through a series of increasingly difficult exercises where you have to apply what you've learned, completing code examples, which are executed on the Codeschool servers to let you see if you've got things right or wrong. You'll inevitably have forgotten some of the content in the video by the time you're completing all these code samples, which is why the cheatsheets and slides are so useful - using them and referring back to them helps reinforce the lessons you've just learnt.

If you're really stuck, there are increasingly explicit hints available on some difficult sections, but each hint costs points from your score as you progress through the course.

This was enough to make me refer back to the course content when I could have just taken the easy option, whilst still leaving me with options if I couldn't get past a section of the course on my own.

When I first started with CodeSchool, the model of paying more than $50 for what looked like screencasts with a few bells on top did seem a bit steep, especially when the fantastic Peepcode screencasts are a fraction of the price. But when you consider that a comparable computer book will often cost north of £30 anyway, and be obsolete much faster, it starts to feel much more reasonable. Right now, I struggle to find a better tool to help you get a good mental model of how the parts of a framework like Ruby on Rails work.

Becoming a subscriber

Partly out of curiosity, and partly because I'm already impressed by what CodeSchool offer, I tried enrolling with CodeSchool recently - you get access to all the courses for $25 per month, in addition to a steady stream of incredibly well made CodeSchool screencasts that are only available to members.

In terms of honing my skills as a developer, it feels like the best $25 I could spend each month - meetups are great for finding people to ask specific questions, and reference sites are great once you have a decent idea of what you're supposed to be looking for, but the high level knowledge Codeschool gives me, combined with the practice of working through problems that don't feel ridiculously contrived, is something I think I could only better by having someone dedicated to tutoring me in person and pairing with me daily.

I'm pretty sure that would cost more than 25 bones per month, so for now, I can safely say I'm a happy Codeschool user, and I'd happily recommend them to anyone interested in learning Rails, or any Rails developers who want to get a good grasp of the current best practices when writing Rails 3 apps.

Avoiding that Face Palm moment with Logrotate

A runaway log file recently brought down a Rails app I was involved in building, and the solution to the problem is so simple and easy to implement that a) I feel like a total dunce for not having this set up in the first place, and b) it's almost churlish not to list it here, for reference for someone else, in the hope that it saves them feeling this stupid themselves in the future (oh, and keeps the site they're working on up...).

If you're not using Rails' own log rotation for the log files it generates, it really is a good idea to make sure any log files it does make are being rotated, so you don't get caught out when that innocuous seeming development.log file from a few months back goes live, and ends up bringing down your site because it's since grown from a 6k file to a 7 gigabyte one, and eaten all the space on your server.

Making sure this doesn't happen is a pretty simple process:

  • Find where the offending logfiles are eating up all your disk space.
  • Create a new logrotate entry pointing to them.
  • Trigger the logrotate daemon to test it
  • Relax and get on with your life

Okay, let's run through this in more detail.

Find the offending logfiles

The first step here is to find where the logs are being created. This is the cause of the problem on a lot of boxes running Rails apps: if you're using Capistrano to deploy an app with a stock Passenger config, your logs will end up somewhere that the logrotate daemon, the program that dutifully goes around compressing and sorting logfiles on your server, won't know to look by default.

Normally, you can expect to find these quietly ballooning log files somewhere like /home/deploy/app/shared/log/, or /rails/deploy/appname.production/shared/log/.

Create the new logrotate entry

Now that we know where the logs are that keep eating space, we can tell logrotate about them, to make sure they won't grow too large. Create a text file named railsapp (or whatever makes the most sense to you) in /etc/logrotate.d/, the default place to leave instructions for logrotate to follow:

 
/home/deploy/app/shared/log/*.log {
  daily
  missingok
  rotate 30
  compress
  delaycompress
  sharedscripts
  postrotate
    touch /home/deploy/app/current/tmp/restart.txt
  endscript
}

Looking at that line by line:

  • daily means rotate the logs once a day
  • missingok means it's okay if we're missing some log files, we'll still work without stopping
  • rotate 30 means keep the last 30 days of logs
  • compress yup, compress the logs (using gzip by default)
  • delaycompress means "wait til the next day before compressing this file, just in case there are still processes writing to this logfile"
  • sharedscripts means only call the next postrotate script once for all the files that match the pattern above, instead of restarting for each file
  • postrotate ... endscript - this script here fires after a rotate, to restart the passenger server, so that future processes log to the fresh, empty logfile

If you want to learn more, this article by Jesse Andrews on how he uses it is an absolute gem.

Trigger the logrotate daemon

Now that we have a logrotate entry, let's check if it works now, rather than going to bed and finding out we had mistyped the path, by forcing a log rotate with this command (note the -f flag):

logrotate -f /etc/logrotate.d/railsapp

Get on with your life

If that's worked, then huzzah! That should be one less thing to worry about when looking after a webapp - though the usual "do some real homework before putting absolute faith in and deploying on a production system" disclaimers apply. As ever with Linux, be sure to read the man pages before use.

nb. While it's true that Rails is actually smart enough to rotate its own logs if you remember to configure it to behave this way, learning how to use logrotate like this means we can use this on other apps too, without being too tied to a particular framework. Handy when you need it for a Merb, Sinatra, Django, or even Node.js project.

Frustrating and Cryptic Ruby Idioms (#1 of a series)

I keep coming across these FACRI's (Frustrating and Cryptic Ruby Idioms) in my work, so I'm jotting them here in the hope that I'll remember them better in future.

Ruby idioms

Ruby is a wonderful, if somewhat slow and memory hungry language, with an incredibly flexible and expressive syntax. However this flexibility leads to the creation of idioms that initially look totally opaque, if you don't know what to look for.

Case in point: the object.map(&:method) idiom

If you have an array called names, and want to create a new array by running a text manipulation on every member of that array, a terse, but readable way to do this would be:

result = names.map { |name| name.upcase }

The intent is pretty clear here, and what happens programmatically is also very readable. Another way to do this though is to write it like:

result = names.map(&:upcase)

Something called type coercion is occurring here. Normally you pass the map method a block to execute, with a placeholder name for each item, and the code to run and return. Here, the & tells Ruby to use whatever follows it as the block, and because :upcase is a Symbol rather than a Proc object, Ruby tries to convert it on the fly into a Proc object by calling a method on it called to_proc:

class Symbol
  def to_proc
    proc { |obj, *args| obj.send(self, *args) }
  end
end

So in this case, each member of names is passed in as obj, and self is the symbol :upcase, so the proc ends up calling upcase on each name. I wasn't familiar with the send method here either, so the documentation for it from ruby core may help here:

class Klass
  def hello(*args)
    "Hello " + args.join(' ')
  end
end
k = Klass.new
k.send :hello, "gentle", "readers" #=> "Hello gentle readers"
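If you think more easily in javascript, the same trick can be sketched there too. This is only a rough analogue of what to_proc and send are doing, and toProc is a made-up helper name rather than anything built in.

```javascript
// A JavaScript analogue of Ruby's Symbol#to_proc, for illustration only.
// `toProc` is a made-up helper name, not a built-in.
function toProc(methodName) {
  return function (obj) {
    // Like obj.send(:method_name) in Ruby: look the method up by name.
    return obj[methodName]();
  };
}

var names = ['alice', 'bob'];

// Roughly equivalent to names.map { |name| name.upcase }
var explicit = names.map(function (name) { return name.toUpperCase(); });

// Roughly equivalent to names.map(&:upcase)
var terse = names.map(toProc('toUpperCase'));

console.log(explicit, terse);
```

Both versions build the same array. The terse one just hides the "call this method on each item" step inside a helper, which is the part that looks opaque until someone points it out.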

An expensive idiom, by rockstars, for rockstars.

The end result of all these examples is a saving of about 12 characters, at the expense of readability, and a huge performance hit as each member of names is passed around and type coerced like there's no tomorrow.

If you're a coding savant, the elegance of this will probably make you weep tears of syntactic joy, and the clever brevity of this isn't lost on me.

However, coming to this, without too much knowledge of the Ruby extensions project, or someone to talk you through what's happening is likely to be a frustrating experience.

Hopefully this will save time for someone else in future.