If you are following the Ruby mailing list, you will see we just made a new release of RubyGems. For the past week, most of my spare time has gone into making that happen. Now its time to get back into some other projects that I’m woefully behind on …
But before we do, it’s time to have a little fun with Ruby.
Patrick Logan wonders aloud why spread sheets haven’t made it into programming languages as first class objects. He then goes on to speculate what it might look like in Smalltalk. And along the way, he bumps into some interesting concepts about evaluating expressions.
Translating Patrick’s Smalltalk code into Ruby turns out to be quite straight forward …
ss = SpreadSheet.new ss[1,1].value = 5 ss[1,2].value = ss[1,1] * 6 ss[1,3].value = ss[1,2] * 7 puts ss[1,3].value # => 210
ss is our spread sheet object. Indexing it with row and column indicies yields cells, which can hold formulas. The first cell gets a five. The second cell (at [1,,2]) gets 6 times that (30), and the final cell gets 7 times that (210).
Straight forward arithmetic, right?
Wrong!
Remember that this is a spread sheet. Changing the values in cells should effect cells that derive from it. In other words, if we change the value of ss[1,1] and then ask for the value of ss[1,3], we should get a new value.
ss[1,1].value = 10 puts ss[1,3].value # Should now print 420!
The naive implementation of just storing values in arrays as we calculate them just isn’t going to hack it.
My first thought was that we would want to just wrap the expressions in a lambda. Putting them in a lambda has the effect of deferring the evaluation of the expression until we really want it. And we can reevaluate the expression as needed as its dependents change.
The result would look like this …
ss = SpreadSheet.new
ss[1,1].value = 5
ss[1,2].value = lambda { ss[1,1] * 6 }
ss[1,3].value = lambda { ss[1,2] * 7 }
puts ss[1,3].value # => 210
In fact, a commenter to Patrick’s blog suggested the same thing.
Patrick responds.
How would this work out in Ruby? In Ruby, all computation is accomplished by sending messages. To defer a computation, just capture its message and replay it later at your convenience. We will start with a formula object that turns any message sent to it into a deferred object.
class Formula < Builder::BlankSlate
def method_missing(sym, *args, &block)
Deferred.new(self, sym, args, block)
end
end
Since you generally want every message sent to a formula object to be recorded, I based Formula on the BlankSlate class I’ve mentioned earlier. Deferred is also fairly straight forward to write. The initialize method just records the receiver of a message, the message name, arguments and any blocks.
class Deferred < Formula
def initialize(target, operation, args, block)
@target = target
@operation = operation
@args = args
@block = block
end
# ...
We derive Deferred from Formula because we want operations against a deferred object to also be deferred, and Formula handles that perfectly.
Now, when we want the value of a deferred operation, we need to ask it nicely. We will use a method named formula_value for that purpose.
def formula_value
@target.formula_value.send(@operation, *eval_args, &@block)
end
# ...
To replay the deferred operation, we need to get the target (reciever) of the message. Since the target was a formula, we need to ask for its formula value. We then send it the original operation (message name) and a list of arguments. Just one subtle note about the arguments. They too might (or might not) be formula objects, in which case we need to evaluate them before passing them to send. eval_args just creates a new list of formula values from the original argument list.
def eval_args
@args.collect { |a| a.formula_value }
end
end # of class Deferred
Suppose I had a Formula object, and tried to add 1 to it. What would I get …
result = some_formula + 1
result is a new deferred object, capturing the sending of +1 to the formula. To get the real value, we ask the result for its formula_value:
result.formula_value
which recursively gets the formula value of some_formula and sends it a +1 message.
Great. We see how to get deferred evaluations if we have a formula object, but where does that original formula object come from?
There are several options. One is to create a Formula wrapper around a normal Ruby value.
class Const < Formula
attr_reader :formula_value
def initialize(value)
@formula_value = value
end
end
Now we can say:
result = Const.new(5) + 3 # Returns a deferred formula object
and later ask:
result.formula_value # Returns 8
Another question: I can see how formula+1 works. The + message is sent to a formula object and gets handled specially. But what if we write:
1 + formula
Won’t 1 get confused because it doesn’t know how to handle a formula object?
Well, yes and no. The + operation in Integer will get confused, but it has a backup procedure to follow when it doesn’t know how to add itself to an arbitrary object: It asks the object how to do it.
Integer + will call coerce on the Formula object and expect Formula to figure out how to do the addition. Coerce should return a two element list whose elements can be added together.
Here is coerce for Formula:
class Formula
def coerce(other)
[Const.new(other), self]
end
end
Formula just wraps the original number in a Formula wrapper and then lets the wrapper handle deferring the addition.
The Cells of the spreadsheet are Formula objects too. The only catch here is that they can hold different Formula objects over time. value should calculate the value of a cell (and will in turn call formula_value to do so). value=(val) should set the internal formula to whatever the user specifies.
Here is Cell in all its glory.
class Cell < Formula
def initialize
@formula = 0
end
def value
@formula.formula_value
end
def value=(new_value)
@formula = new_value
end
def formula_value
@formula.formula_value
end
end
Making a Cell object a formula is pretty key to the whole deferred calculation design. If a cell contains only normal Ruby objects, then its value can be calculated immediately. However, we want to defer calculations that depend upon other cells, and since adding a cell object to any expression will automatically defer it, we get the effect we want automatically.
The astute observer will note that I initialize @formula in Cell to 0, but later in the code send a formula_value message to @formula. Won’t this cause problem, sending formula message to a non-formula object?
Well, yes it would. But it simplifies the code greatly if we just pretend all objects are formulas and can respond to formula_value. Non-formula objects just respond by returning themselves.
This is easy to accomplish.
module Kernel
def formula_value
self
end
end
(And this is why I chose a rather awkward name for fetching a formula’s value. Since all objects will implement this method, I wanted it to avoid potential name conflicts).
The final piece of the puzzle is the actual spread sheet object. Its only real function is to act as a lookup container for the Cell objects. My implementation uses a cheesy "convert two integers to a string key" technique. It works ok, but it bothers me.
class SpreadSheet
def initialize
@hash = Hash.new
end
def [](r,c)
@hash["#{r}@#{c}"] ||= Cell.new
end
end
That’s it. The guts of a spread sheet in under 100 lines of code. You can see the complete listing on my wiki.
This technique of deferring calculations has some really interesting possibilities. Imagine that instead of replaying deferred calculations, we examined the deferred expression and performed some operation on them. Like what? Well, maybe optimizing the expression, maybe analyzing the variable references, maybe generating SQL from them. Remember the Criteria library … it uses this technique to translate ruby code into SQL conditions.
Be careful with this technique. Although quite powerful, there are a couple of places that can cause problems. In particular, the && and || operators in Ruby are difficult to capture in this way.