{ |one, step, back| } 176 to 185 of 193 articles Syndicate: full/short

Kata Two Worked (continued) -- Eliminating Tail Recursion   04 Jul 03
[ Kata print link all ]

Previous Katas

See the following writeups in this series

In KataTwoCps we talked about passing continuations into functions. It turns out that continuations were just functions (we used Proc objects) that encapsulated the rest of the calculation to be performed. Using continuations allowed us to turn normal recursive calls into tail recursion.

Variation #5: Eliminating Tail Recursion

One of the benefits of tail recursion is that it is easy to turn a tail recursive function into an iterative function. Lets try that transform on our CPS version from KataTwoCps.

Tail recursive calls can be transformed into gotos to the beginning of the function. Any parameters passed to the call are assigned to the existing parameters.

So, the following call …

  sub_chop_cps(target, list[0...mid], found, not_found)

would be rewritten as …

  target = target
  list = list[0...mid]
  found = found
  not_found = not_found
  goto BEGINNING_OF_FUNCTION

We can drop the redundant assignments (e.g. target = target). We will also cleverly use an infinite while loop instead of an explicit goto. (Ruby doesn’t have raw gotos, but if you are patient, I’ll show you how to implement gotos).

First Attempt

Here’s our first attempt at tail recursion removal.

  def chop_tail_recursive(target, list)
    sub_chop_tail_recursive(target, list, proc{|x| x}, proc{-1})
  end

  def sub_chop_tail_recursive(target, list, found, not_found)
    while (true)
      if list.size <= 1
        return (list[0] == target) ? found.call(0) : not_found.call
      else
        mid = list.size/2
        if list[mid] > target
          list = list[0...mid]
        else
          list = list[mid..-1]
          found = proc{|x| found.call(x+mid)}      # PROBLEM
        end
      end
    end
  end

Unfortunately, this code fails the unit tests.

At the line marked PROBLEM we create a Proc object that references found. The intent of this line is that sometime in the future this proc will be called and the current value of found will be used. Unfortunately, we immediately clobber the value of found, sabotaging that future call. (The mid value has a similar problem).

To fix this, we need to create a new context where the value represented by found (and mid) is not lost. If we replace the PROBLEM line with the following, our problems are solved.

  found = proc {|ret, m| proc{|x| ret.call(x+m)} }.call(found, mid)

Here we create our Proc object referencing ret and m instead of found and mid. Both ret and m are created in the scope of an enclosing block and initialized with the values of found and mid. The key is that ret and m are never reassigned, therefore they never become invalid.

Second Attempt

Our second attempt changes the PROBLEM line and works much better. Here is the complete source code.

  def chop_tail_recursive(target, list)
    sub_chop_tail_recursive(target, list, proc{|x| x}, proc{-1})
  end

  def sub_chop_tail_recursive(target, list, found, not_found)
    while (true)
      if list.size <= 1
        return (list[0] == target) ? found.call(0) : not_found.call
      else
        mid = list.size/2
        if list[mid] > target
          list = list[0...mid]
        else
          list = list[mid..-1]
          found = proc {|ret, m| proc{|x| ret.call(x+m)} }.call(found, mid)
        end
      end
    end
  end

  class TestChopTailRecursive < Test::Unit::TestCase
    alias chop chop_tail_recursive
    include ChopTests
  end

Problems Encountered

No problems were encounter other than the already discussed issue regarding the reassignment of found and mid.

References

Dan Sugalski is also talking about Continuations, CPS, TailRecursion, and more. Dan is writing the byte code engine that will eventually power Perl 6 (and perhaps Ruby and Python as well). Dan’s viewpoint is a bit more oriented toward implementation issues surrounding tail recursion and continuations, but make a nice balance to what you read here.

Next Time

We still haven’t talked about callcc. I promise we will get to it next time.


comments

Design by Contract and Unit Testing   29 Jun 03
[ print link all ]
Design by Contract and Unit Testing

Introduction

I wrote this article on Design by Contract and Unit Testing to go along with the EiffelUnit documentation. I'm no longer maintaining EiffelUnit (I recommend using Gobo Eiffel Test instead), but thought the article was worth publishing independently. Hence this blog entry.

The article was written three years ago, so I added a postscript to the conclusions that reflects my current experience.

Design by Contract and Unit Testing

Got some questions about Design by Contract and unit testing?

I've been using Eiffel for almost three years, and have been exploring some Extreme Programming ideas for the past 6 months. In particular, I've been using XP's "Test-First" desgin process in conjunction with Eiffel. Here are some of my thoughts on how unit testing and Design by Contract fit together. The presentation is in the form of a question/answer session. If you have comments on this topic, I would be glad to here them at jweirich@one.net.

The first part of this page covers Design by Contract for those who are new to the idea. Eiffel programmers may want to skip directly to the section on Unit testing. Those whose time is limited may want to jump directly to the conclusions.

What is Design by Contract? Isn't it just adding assertions to your code?

Although Design by Contract and assertions are very closely related, DbC is more than just slapping a few assertions into your code at strategic locations. It is about identifying the contract under which your code will execute and you expect all clients to adhere to. It is about clearly defining responsibilities between client software and supplier software.

In short, Design by Contract starts by specifying the conditions under which it is legal to call a method. It is the responsibility of the client software to ensure these conditions (called preconditions) are met.

Given that the preconditions are met, the method in the supplier software guarantees that certion other conditions will be true when the method returns. These are called postcondition, and are the responsibility of the supplier code in ensure.

What's the advantages of identifying contracts?

By identifying the contract, we are clearly identifying the responsibilities of each party in a collaboration. Knowing responsibilities is a big help when the time comes to get two (or more) modules to work together. Precondition violations mean that the client is in error. Postcondition violations mean that the supplier is incorrect.

How does this work with inheritance?

Very well.

Ok, seriously. Contracts are inherited by child classes. If a child class overrides a method in a parent class, it needs to make sure the method's new implementation conforms to the contract of the parent class. By conforming, we mean that the child class is not allowed to tighten the preconditions of the parent class, or is it allowed to loosen the postconditions. We remember this by using the slogan...

Require no more, promise no less.

By the way, Eiffel automatically handles preconditions and postconditions in child classes without any need to copy and paste them from the parent.

This sounds complicated. Is it really useful?

Absolutely! By writing child classes so that they conform to the contract of their parent, we ensure that our classes obey the Liskov Substitution Principle.

So where do the assertions come in?

Once we have identified the contract, we can programmatically check that everybody play by the rules by checking for preconditions at the start of a method and checking for postconditions at the exit.

Note that the checking (assertions) are merely a way of verifying that everyone is playing by the rules. We should be able to remove the assertions without effecting program correctness. In fact, Eiffel allows you to specify several different levels of assertion checking.

What is unit testing?

Unit testing (at least from the Extreme Programming standpoint) is a set of executable code that exercises the production code, with the intent of checking it for correctness. Unit tests are automated and run without human interaction.

Eiffel has Design by Contract. Why do we need unit testing?

There are several advantages to running unit tests, even in a Design by Contract environment.
  • Better code coverage. Unit tests provide more thorough code coverage than what a typical application would do. DbC assertions are useless if the code is never called.
  • Repeatable. Unit tests are automatic and repeatable. You don't have to rely on a user manually running a program.
  • Reportable. The results of the unit test are easily summarized for reporting (e.g. 2 out of 200 unit tests failed).

Ok, so if we have unit tests, why do we still need DbC?

Although very similar in many ways, there are still some fundamental differences between DbC assertion checking and unit testing. Unit tests generally focus on a single module (class) and don't exercise modules (classes) in concert together. In particular, unit test are a poor vehicle for checking preconditions.

Wait a minute! I can check preconditions in my unit test!

When someone says they are using unit tests to check preconditions, what they really mean is that the unit tests check to make sure code is well behaved in the presence of bad data. They check to make sure that an exception is thrown or that a error flag is set, or even that the bad data is appropriately ignored.

From a Design by Contract perspective, this is not testing preconditions. Preconditions define when it is legal to invoke a particular method. According to DbC, calling a method when its preconditions are not established results in undefined behavior. How can you test for undefined behavior?

What the unit tester has really done is changed the contract on the tested code. They replaced the existing precondition with True (the universal precondition ... methods with a precondition of True may be called at any time). They also changed the postcondition to a stronger, more complicated one.

Can you give me an example?

Ok, here is a version of the stack operation push with a typical DbC contract.

    push (value: INTEGER)
        require
            not is_full
        ensure
            count = old count + 1
            top = value

When you do robustness checking in a unit test, you generally transform the contract into something slightly different. This is often called "defensive programming".

    push (value: INTEGER)
        -- Note: no precondition
        ensure
            success: (not old is_full) implies
                         ((count = old count + 1) and
                          (top = value))
            failure: (old is_full) implies
                         ((count = old count) and
                          (top = old top))
            error_set: old is_full = error

In this version, we added an error flag. We could have also specified an exception should be thrown.

But isn't defensive programming good thing?

As you can see, the defensive programming version has a much more complicated postcondition. Some might defend this by saying that all we have done is pushed existing complexity into the push method, and that the client code can be simplified.

Unfortunately, that's not necessarily the case.

  -- Design by Contract version
  if not stack.is_full then
      stack.push (value)
  else
      -- handle full stack
  end
  -- Defensive Programming Version
  stack.push (value)
  if stack.error then
      -- handle full stack
  end
Defensive programming generally doesn't simplify the client, it just makes him check for errors after the fact.

Defensive programming can also hide problems in the code. By ignoring bad values, bad data can propagate further in a program.

But what does this have to do with precondition checking?

Unit tests focus on verifying that a single class functions properly. Problems between modules are not generally uncovered by unit tests. Extreme programming uses functional (or acceptance) tests to verify end to end correctness in their programs. But since functional tests only check the final result, they give little feedback about what is going on inside the program.

By enabling contract precondition checking, we can immediately detect interface irregularities whenever they happen, whether in function tests or unit tests.

Ok, checking contract preconditions sounds useful. But what about postconditions?

Like unit tests, DbC postcondition assertions check the results of invoking a method. In other words, they are check the validity of a single class, and not the interface as precondition checking will do.

Although they are checking the same kind of behavior, their focus is different. DbC postconditions are general and answer the question of how a method responds under all possible legal conditions. Unit tests focus on how a method responds under certain, specific situations specified in the test.

Because DbC postconditions are general, they are able to check every invocation of the method for the proper behavior. They can be run during unit testing, but they also are in effect during functional testing (and normal program execution, assuming they aren't disabled), checking each and every invocation of the method. In contrast, unit tests only check a limited number of specific invocations.

Because they are more general, DbC postconditions are more difficult to write. In fact, sometimes the postcondition of a method can be more complicated that the actual code of the body.

Can you give me an example of a complicated postcondition?

Ok. Here is the resize method from the ARRAY class in the ELKS 2000 library proposal. The post condition carefully describes the resulting state of the array after the invocation of the resize method. Note that it uses recursive calls in several places and that it carefully handles all combinations of new/old array bounds.

   resize (min_index, max_index: INTEGER)
      -- Resize to bounds `min_index' and `max_index'.
      -- Do not lose any item whose index is in both
      -- `lower..upper' and `min_index..max_index'.
   require
      valid_bounds: min_index <= max_index + 1
   ensure
      new_lower: lower = min_index
      new_upper: upper = max_index
      default_if_too_small:
         min_index < old lower implies subarray
          (min_index, max_index.min (old lower - 1)).all_default
      default_if_too_large:
         max_index > old upper implies subarray 
          (min_index.max (old upper + 1), max_index).all_default
      stable_in_intersection:
         subarray ((min_index.max (old lower)).min(old upper + 1), 
            (max_index.min (old upper)).max(old lower - 1)).same_items
               (old subarray ((min_index.max (lower)).min(upper + 1), 
               (max_index.min (upper)).max(lower - 1)))

Wow, what would the unit test code for the same thing look like?

Here is a unit test for resize, written in EiffelUnit. Note that assert_array(lo,hi,expected,actual) is a convenience method that asserts that the array `actual' has the given lower and upper bounds and contains the same elements as `expected'.

The number of lines in the test case is actually longer that the postcondition above. But each individual test is simple and direct, and easy to verify that it is correct.

    array: ARRAY[INTEGER]

    test_same_range is
        do
            array := <<1, 2, 3>>;
            array.resize (1,3)
            assert_array (1, 3, <<1, 2, 3>>, array)
        end

    test_included_range is
        do
            array := <<1, 2, 3>>;
            array.resize (1,1)
            assert_array (1, 1, <<1>>, array)
        end

    test_big_range is
        do
            array := <<1, 2, 3>>;
            array.resize (0, 5)
            assert_array (0, 5, <<0, 1, 2, 3, 0, 0>>, array)
        end

    test_high_overlap is
        do
            array := <<1, 2, 3>>;
            array.resize (2, 5)
            assert_array (2, 5, <<2, 3, 0, 0>>, array)
        end

    test_low_overlap is
        do
            array := <<1, 2, 3>>;
            array.resize (-1, 1)
            assert_array (-1, 1, <<0, 0, 1>>, array)
        end

    test_no_overlap is
        do
            array := <<1, 2, 3>>;
            array.resize (5, 7)
            assert_array (5, 7, <<0, 0, 0>>, array)
        end
Note for non-Eiffel programmers: The form <<1, 2>> is a manifest array of two elements.

Any other postcondition examples?

I'm glad you asked. Consider the push and pop methods on a STACK. push is fairly easy to specify using DbC...

    push (value: INTEGER)
        ensure
            count = old count + 1
            top = value

But pop is more difficult.

    pop
        ensure
            count = old count - 1
            -- What is the value of `top'

What is the value of the top of the stack after pop completes? Well, it depends upon the history of pushing and popping leading up to a given call to pop. The logic required to write a postcondition specifying the value of pop

Unit tests don't suffer the same problem. Since we are only testing a specific scenario, we know what the value of the top of the stack should be.

    test_push_and_pop is
        do
            stack.push(5)
            assert_equal ("push5", 5, stack.top)
            stack.push(123)
            assert_equal ("push123", 123, stack.top)
            stack.pop
            assert_equal ("pop5", 5, stack.top)
            stack.pop
            assert ("empty", stack.is_empty)
        end

Conclusion

We see that although unit tests and Design by Contract assertions have a lot of overlap, but each has a different focus and different strong points. DbC preconditions are useful for checking inter-module correctness. Unit testing and DbC postconditions validate individual module behavior, with unit testing focusing on specific situations an DbC postconditions focusing on general behavior.

So, what does this mean in practice? I find that when I am doing Eiffel, I tend to specify preconditions along with a good set of unit tests. If the postconditions are obvious and easy to write, I will include them (mainly because postcondition are included in the short form view of a class). If the postconditions are difficult to specify in general, I might include them in comment form for the short view.

When I'm using something Java, or some other language with weaker support for Design by Contract, I definitely would use unit tests for checking individual modules. Depending on the availability of tools (e.g. some kind of assert mechanism), I would still do some level of precondition specification.

Postscript

At the time of writing the original text (sometime in 2000), I was doing a lot of DbC and Eiffel programming and had just started using XP style unit tests. Today (sometime in 2003) I am no longer writing Eiffel code and doing very little "formal" DbC. I find that I still think of interfaces in terms of contracts with pre/post conditions, but instead using Eiffel-like assertions to define the pre/post-condition assertions, I now use unit tests to drive and define the shape of the software.

Although the unit tests have a different focus than assertions (as noted in the article), I find that they are quite suitable for my needs and rarely find it necessary to supplement the coce with any additional pre/post-condition assertions.


comments

Kata Two Worked (continued) -- Continuation Passing Style   19 Jun 03
[ Kata print link all ]

Where Were We?

KataTwoVariation1
We explored an iterative solution using loop invariants.
KataTwoRecursive
We looked at two recursive solutions, one that passed array slices and one that passed explicit limits.
In this posting, we are going to look at a few variations that play with something called Continuation Passing Style.

Variation #4: Continuation Passing Style

Let’s start by looking reviewing variation 2 in KataTwoRecursive. When we recurse, we need to carefully pass in the position of the subarray in the whole array so that the final result is correctly positioned in the whole array. The recursive calls looks like this …

    if list[mid] > target
      chop_recursive(target, list[0...mid], offset)
    else
      chop_recursive(target, list[mid..-1], offset+mid)
    end

Note that if we didn’t have to worry about the returning a -1 when the target value is not found, we could have written the recursive calls like this.

    if list[mid] > target
      chop_recursive(target, list[0...mid])         # [1]
    else
      chop_recursive(target, list[mid..-1]) + mid   # [2]
    end

As chop_recursive unwinds the stack, the correct offsets are added one by one until the final result correctly represents the correct value for the whole array.

Tail Recursion

There is an interesting difference between the recursive call at [1] and the recursive call at [2]. The call at [1] is tail recursive. This means that the result returned by the call at [1] is the result for the entire function. Once the call at [1] terminates, there is nothing to do but immediately return the result we just received.

A smart optimizing compiler can turn tail recursive calls into a jump to the beginning of the function (after reassigning the function parameters). In other words, tail recursion can be implemented using iteration without any stack growth.

There are a few languages where the tail recursion optimization is guaranteed by the language. Scheme is one such language. In Scheme, you never have to write a loop, the language will turn your tail recursive calls into loops automatically.

Unfortunately, the recursive call at [2] is not tail recursive. When [2] returns, it must continue to execute adding mid to result before returning. This little bit of work that must be done between the return of the call at [2], and the return of the main function call is called the continuation of [2]. It is where call [2] continues when it is done.

Continuations As Functions

Wouldn’t it be nice if we could turn the call at [2] into a tail recursive call? Then our entire function would be tail recursive and it could be implemented without stack growth.

This is actually not hard to do. Suppose we had a function that handled that little "+ mid" code for us. When we are ready to return a value, then we just need to let the contination handle any of the "adjustments" before returning the value. We could pass in the continuation function as an argument to the function and call it when we need to return.

Something like this …

  def chop_cps(target, list, continuation)
    if list.size <= 1
      (list[0] == target) ? continuation.call(0) : XXXXX
    ...

This says return the value +0+ if the list[0] matches the target value. The continuation function will apply any adjusts specified by the calling context. (The "XXXXX" represents how we handle the -1 return value. We will come back to this in a bit.)

Now we just need to create the continuations at the right points.

If we are recursing on the first half of the array (line [1] above), then there is no need to add anything. Our contination will just return the input value directly.

  chop_cps(target, list[0...mid], proc{|x| continuation.call(x) })     # [1a]

Remember, continuation is the continuation of our current function call. In other words it is the code to execute when our current fuction returns. So, in order to properly return a value, we need to call continuation, passing in our result.

If we need to adjust the returned result by adding mid (line [2] above), then the continuation we pass needs to look like this …

  chop_cps(target, list[mid..-1], proc{|x| continuation.call(x+mid) }) # [2a]

One final optimization. We notice that in [1a] we are just passing the return value without changing it. We can shorten that to just …

  chop_cps(target, list[0...mid], continuation)                        # [1b]

Ok, we are ready to write the CPS style chop function … except for one thing! We don’t know how to handle the case where the target value is not found!

Handling Errors

The spec for the chop function says that we will return a negative one whenever the target value is not found. Unfortunately, if we return a negative one, the continuations from the calling methods clobber our error return by (possibly) adding the valud of mid to it (remember [2a] above?).

What we need is an alternative continuation where no adjustment is done. No problem! Since we are just using functions for continuations, we can pass in an "error reporting" continuation to handle failure. Our error return continuation would look like this:

    proc { -1 }

Code

Here’s the code to the recursive version using CPS. Since we are passing two continuations into our recursive function, we will call the continuation for the normal return found and the continuation for the error return not_found.

  def chop_cps(target, list)
    sub_chop_cps(target, list, proc{|x| x}, proc{-1})
  end

  def sub_chop_cps(target, list, found, not_found)
    if list.size <= 1
      (list[0] == target) ? found.call(0) : not_found.call
    else
      mid = list.size/2
      if list[mid] > target
        sub_chop_cps(
          target,
          list[0...mid],
          found,
          not_found)
      else
        sub_chop_cps(
          target,
          list[mid..-1],
          proc{|x| found.call(x+mid)},
          not_found)
      end
    end
  end

  class TestChopCpsRecursive < Test::Unit::TestCase
    alias chop chop_cps
    include ChopTests
  end

Problems Encountered

This implementation turned out to be surprisingly easy. No problems were encountered during the implementation.

Notes

I find the introduction of the not_found continuation very interesting. Why does the chop function return a negative one as an error value? Why doesn’t it throw an exception?

With the not_found continuation, we can decide at runtime how errors are to be reported. For example, to switch to an exception throwing version of chop, all we have to do is …

  def chop_cps(target, list)
    sub_chop_cps(
       target,
       list,
       proc{|x| x},
       proc{ fail "Value #{target} not found." })
  end

This is cool!

Where to Next?

Some of you are no doubt surprised that this entire discussion of continuations never touched on Ruby’s callcc mechanism. We will save callcc for the next session, along with two or three more variations on the CPS variation introduced here.


comments

Scripting Languages for Building   19 Jun 03
[ print link all ]
A while back I ranting to a coworker about limitations of make and ant. My main complaint about ant was the choice of XML to represent the build script. I felt a reasonable scripting language would give a lot more power.

That discussion eventually led to the creation of Rake. But it seems others seem to have similar thoughts.

James Duncan Davidson, the original ant developer, writes

When I first chose XML as the configuration format for Ant, I figured it was a great way to replace the simple properties files that I was using before with a structure that could define hierarchical data. However, as time goes on the XML format becomes more and more burdensome to actually getting something done. […] For a long time, I’ve thought that the way to alleviate the burden for Ant would be to use some sort of scripting language to front end the Ant task object model under the covers.

Jonathon Simon is also talking about scripting ant, but using Jython access the ant object module. In an email to the ant-dev mailing list, he lays out some of his ideas. (He even mentions using JRuby.)

It will be interesting to watch where this goes. I suspect that the XML ant build script may have its days numbered.


comments

My New Pair Programming Partner   16 Jun 03
[ print link all ]
My New Pair Programming Partner

Ok, not really.

It is really my Father's Day gift from the kids. They say this is what I look like when playing guitar.

Maybe ... on a really bad hair day.

Actually, the guitar is nice. And who know, perhaps he will be fine at pairing after all.


comments

Kata Two Worked (continued) -- Recursive Binary Chop   16 Jun 03
[ Kata print link all ]
This is a continuation of the KataTwoVariation1 posting where we are trying different methods of implementing a binary search. In the first version we discussed an iterative approach and using loop invariants. In this second installment we cover the the recursive versions. (see Kata Two for background information)

Variation #2: Recursive

This is a standard recursive implementation of the algorithm we looked at in KataTwoVariation1. This version determines which half of the array contains our target value and recurses on the appropriate half of the array.

Code

Here’s the code …

  def chop_recursive(target, list, offset=0)
    if list.size <= 1
      (list[0] == target) ? offset : -1
    end
    mid = list.size/2
    if list[mid] > target
      chop_recursive(target, list[0...mid], offset)
    else
      chop_recursive(target, list[mid..-1], offset+mid)
    end
  end

  class TestChopRecursive < Test::Unit::TestCase
    alias chop chop_recursive
    include ChopTests
  end

Problems Encountered

In writing this version, I encountered the following issues …

  • I had a problem where I miscalculated the array fragment to recurse on. The miscalculation led to a recursive call where the size of the array wasn’t decreased, and led to infinite recursion.
  • Since a copy of the array is passed, we lose information on the position of the target value in the full array. I had to pass in the offset of the of start of the array fragment to correct for this.
  • I refactored the method to return nil (instead of -1) when the target value was not found. I decided that it didn’t help improve the logic in any way, so I put the -1 return value back in.
  • There was a performance bug where I searched both branches of the array (and then returned the one that matched).
  • The size==1 test really bugs me. I made several (unsuccessful) attempts to get rid if it. (But see the next item)
  • The original version of the recursive version started with …
      def chop_recursive(target, list, offset=0)
        return -1 if list.empty?
        return offset if list[0] == target
        return -1 if list.size == 1
        ...
    

    This version tests list[0] agains target at the entry to every recursive call. This is wrong. I should only test when the list is down to one element. This defect made it into the original posting and it wasn’t until I was reviewing the code for the CPS post that I discovered the error. It also might explain my discomfort with the list.size==1 test. It was actually the list.empty? test that was incorrect and the size check evolved to list.size <= 1.

Variation #3: Recursive with Explicit Limits

In variation 2, I decided to pass a copy of the sub-array in each recursive call. This is an expensive operation, but I was thinking that I could avoid passing around explicit limits this way. However, it turned out that I still needed to pass the offset, negating much of the advantage.

In variation 3 we will avoid makeing array copies by always passing the original array around. Instead of arrays, we will explicitly pass the hi/lo limits of the sub-array in the recursive call.

I think this version is not only more efficient than variation 2, it is also easier to read and understand.

Notice that the recursive call in this version takes 4 parameters, so I created a helper function (rchoplim) with the proper signature to do the recursion.

Code

Here’s the code …

  def chop_recursive_passing_limits(target, list)
    rchoplim(target, list, 0, list.size)
  end

  def rchoplim(target, list, lo, hi)
    if (hi-lo) <= 1
      list[lo] == target ? lo : -1
    else
      mid = (lo+hi) / 2
      if list[mid] > target
        rchoplim(target, list, lo, mid)
      else
        rchoplim(target, list, mid, hi)
      end
    end
  end

  class TestChopRecursiveWithLimits < Test::Unit::TestCase
    alias chop chop_recursive_passing_limits
    include ChopTests
  end

Problems Encountered

There were no problems encountered while implementing Variation 3. I think the simpler logic helped (plus I worked out a number of bugs in Variation 2).

Things to Come

In the next installment, we will explore a CPS (Continuation Passing Style) implementation.


comments

"Invitation to Ruby" Now Available   09 Jun 03
[ print link all ]
My "Invitation to Ruby" presentation is now available in the articles section. See onestepback.org/articles/invitationtoruby/cover.html.
comments

Kata Two Worked -- Binary Chop   09 Jun 03
[ Kata print link all ]
PragDave has put together some programming "practice" exericises that he calls Katas. In Kata number two he suggests writing a binary search function, not once, but five different ways. Now I’ve implemented a binary search several times, but writing it five times in five different ways was intriguing.

Here’s my attempt at Variations on a Binary Search Theme.

Unit Tests

Dave provided a set of unit tests for the binary search function. The tests below assume that the functions name is chop, and that it takes the target value and the array to search as arguments. Chop should return the index position in the array where the target value was found.

We put the tests into a separate module that will be included in the real test classes below. This is a useful technique when you write tests you wish to reuse in several test classes.

  require 'test/unit'

  module ChopTests
    def test_chop
      assert_equal(-1, chop(3, []))
      assert_equal(-1, chop(3, [1]))
      assert_equal(0,  chop(1, [1]))

      assert_equal(0,  chop(1, [1, 3, 5]))
      assert_equal(1,  chop(3, [1, 3, 5]))
      assert_equal(2,  chop(5, [1, 3, 5]))
      assert_equal(-1, chop(0, [1, 3, 5]))
      assert_equal(-1, chop(2, [1, 3, 5]))
      assert_equal(-1, chop(4, [1, 3, 5]))
      assert_equal(-1, chop(6, [1, 3, 5]))
      assert_equal(0,  chop(1, [1, 3, 5, 7]))
      assert_equal(1,  chop(3, [1, 3, 5, 7]))
      assert_equal(2,  chop(5, [1, 3, 5, 7]))
      assert_equal(3,  chop(7, [1, 3, 5, 7]))
      assert_equal(-1, chop(0, [1, 3, 5, 7]))
      assert_equal(-1, chop(2, [1, 3, 5, 7]))
      assert_equal(-1, chop(4, [1, 3, 5, 7]))
      assert_equal(-1, chop(6, [1, 3, 5, 7]))
      assert_equal(-1, chop(8, [1, 3, 5, 7]))
    end
  end

Variation #1: Iterative with Loop Invariants

I first read about using loop invariants on a binary search algorithm, so I thought it would be a good variation to start with.

If you have not used loop invariants in the past, they are a useful technique for understanding loops. You specify an condition that is true for every single iteration of the loop. This condition is called the loop invariant.

For the binary search, I used the following invariant.

INV :lo <= Result && Result < hi

Where lo and hi are indices into the array and result is the location in the array where our target value lies (i.e. the result of the binary search function). All this invariant states is that the final result index will be between position lo and hi. (Note that I’m assuming that the target value is in the array. We will deal with that assumption later).

The invariant must be true when we start the loop, but that is easy to handle. Just set lo to zero (the result index must be greater than or equal to zero), and set hi to one more than the largest index value (list.size works for this).

The second piece you need is a loop termination condition. Each time through the loop we will adjust hi and lo so that they move closer together (without violating the loop invariant). When the difference between the hi and lo values is 1, we can exit the loop.

Here is the termination condition.

TERM :(hi-lo) <= 1

When the loop exits, the condition TERM will be true. And the invariant INV will also be true (because it is true for every iteration). If TERM and INV are both true, then that implies that result must be equal to lo! Or in other words, when the loop exits, our search target value must be in the array indexed by lo (if it is in the array at all).

Using the above invariant and termination conditions, we can be certain that our loop will be correct (as long as they are faithfully implemented, which is not a minor problem).

Code

Here’s the code. Notice that INV will be true the first time we enter the loop. Inside the loop, we calculate the midpoint of the lo/hi indices, and compare the value at the midpoint to our target value.

If the value at the midpoint is larger than our target, then the range (lo..mid) still satisfies the invariant and we can adjust hi down to mid. Otherwise we can be sure that the range (mid..hi) satisfies the invariant, and we adjust lo up to the value of mid.

The "list[lo] == target" test at the end checks our assumption that the target value was in the array. If we found the value, we return the lo value. Otherwise we return -1.

  def chop_iterate(target, list)
    lo = 0
    hi = list.size
    while (hi-lo) > 1
      mid = (hi+lo) / 2
      if list[mid] > target
        hi = mid
      else
        lo = mid
      end
    end
    list[lo] == target ? lo : -1
  end

  class TestChopIterative < Test::Unit::TestCase
    alias chop chop_iterate
    include ChopTests
  end

Problems Encountered

So, did all that work figuring out a loop invariant help us? Perhaps. I did have one bug where a typo slipped into the calculation of the midpoint. I typed "(hi-lo)/2" by accident and had to scratch my head for a few minutes.

After fixing the typo, the rest of the algorithm worked without a hitch.

Where’s the Invariant?

An interesting thing happened to our invariant and termination conditions. The termination condition was inverted to "(hi-lo) > 1" instead of the "(hi-lo) <= 1" we originally used. That’s because the while statement expects a "stay-in-the-loop" test instead of a "get-out-of-the-loop-test".

But more importantly, our loop invariant has disappeared! It has no representation in the source code at all, except as an implicit constraint on the loop values. This is unfortunate because some important information is lost that should have been communicated to the reader.

The loop construct in the Eiffel language allows you to directly specify the loop invariants. The same loop in Eiffel would look something like this …

           from
               lo := list.lower
               hi := list.upper + 1
           invariant
               lower_limit: -- lo <= Result
               upper_limit: -- hi <  Result
           variant
               hi - lo
           until
               (hi-lo) <= 1
           loop
               mid := (hi+lo) // 2
               if list.at(mid) > target then
                   hi := mid
               else
                   lo := target
               end
           end

Although rather verbose, notice how every part of the loop has its proper place. The termination condition is expressed as an "until" condition, therefore it isn’t inverted.

The invariants are expressed directly. It turns out that our invariants cannot be checked at runtime (since we don’t know the actual value of Result until after the loop terminates), so we expressed them as comments. However, if they were executable, Eiffel would check them at runtime for us.

The "variant" section is interesting. The variant is an non-negative integer expression that must be smaller each time through the loop. Providing a valid variant proves that your loop eventually terminates.

Even the initialization of the loop variables (hi and lo) have their own section.

Next Time …

I was going to write up all 9 variations at once, but this is long enough now for a posting. I’ll hit the two recursive variations in the next installment.


comments

Mock Turtle Soup   07 Jun 03
[ print link all ]
Mock objects are testing objects that stand in for a real object in a unit test. They are used to test the interactions between objects.

For example, the other night I was testing some CGI code. To display a page of HTML, my object under test calls a print_template method on an object in the CGI framework I am using. It needs to pass in the template name and a hash table of template values to be substituted in the template. The test code might look something like this …

    def test_error_page
      web = CGI::Template.new        # Object from the CGI framework
      page = ErrorPage.new(web)
      page.error_message = "Oops!"
      page.render
      # test that "Oops!" was properly passed to the template object.
    end

The last line of the test wants to validate that the error message was properly passed to the web template. We could check this by parsing the HTML output from the CGI::Template, but that is a lot of work (not to mention error prone as well).

A much better way is to directly test that the parameters are passed to the CGI framework object correctly. We do that by replacing the CGI framework object with out own mock object.

Something like this (lines numbered for reference)…

  01:  def test_error_page
  02:    FlexMock.use { |web|
  03:      web.mock_handle(:print_template, 1) { |template_name, hash|
  04:        assert_equal 'Oops!', hash['error_message']
  05:      }
  06:      page = ErrorPage.new(web)
  07:      page.error_message = "Oops!"
  08:      page.render
  09:    }
  10:  end

In line 2, the FlexMock.use method creates a mock object and passes it to the given block. At the end of the block (line 9), FlexMock.use will verify that all expected calls were actually made to the mock object.

In line 3, we inform the mock web object what methods calls it should expect by passing the name of the expected method to mock_handle. Since we are only interested in the print_template call, that is all we define. The number 1 in the calling sequence tells our mock object to expect exactly one call.

The mock_handle method also gets a block (still on line 3). This block is executed whenever print_template is called in our test. The block object parameters (template_name and hash in this case) should match the parameters of the print_template method.

It is the hash argument to print_template that hold our interest. We need to make sure that the hash table passed to the template engine correctly defines our error message as "Oops!". We verify this in line 4.

The rest of the test method just creates an ErrorPage and feeds it the mock object primed for testing. In line 8 we invoke the render method that should eventually format our page and call the print_template method of the mock object.

By using the mock object, we avoid a lot of setup details that would be required for a real CGI framework object (which in turn might require more objects with more setup). Using a mock object allows us to concentrate on the details of what is being tested without constructing an elaborate framework of live objects merely to support a quick test.

On the other hand, be careful. It is easy to get carried away with mocking to the point where it is more work to create a web of mock objects than it is to just create a network of live objects in the first place. When you start creating mock object that return other mock objects, you should take one step back and reassess your code base.

There are other Mock objects implementations. The original ruby-mock object is a bit rigid for my tastes. You need to know exactly what methods are called and in what order, and that over constrains the solution to your test. The test-unit-mock object is much more flexible than ruby-mock (or even FlexMock), it is just complicated enough that I forget how to use it between the times I need it. FlexMock seems to fit in that middle of the position where, for me, it does what I need without extra stuff.

You can find FlexMock at onestepback.org/software/flexmock.


comments

Test Driven Development Demo now available   03 Jun 03
[ print link all ]
The writeup on the Test Driven Development workshop we held at the Jan 7, 2003 meeting of the Cincinnati XP Users Group is now online at onestepback.org/articles/tdddemo
comments

 

Formatted: 20-May-13 15:00
Feedback: jim@weirichhouse.org