02.27.08

unpack!

Posted in Uncategorized, ruby at 12:54 am by JohnB

 [Update: presentation from the 4/15/2008 Ruby Meetup is now available here.]

I like reading code. It’s like a novel and I want to read it cover-to-cover. Some, such as Why’s Camping framework, I struggle to comprehend. But most code that I read comes up slightly short. Like a novel with some misspellings, awkward phrasing or repeated analogies, I mentally mark it as “could be better”. And sometimes I really do sit down and write something better - maybe just for my own amusement but often for a useful purpose.

I recently had the experience of reading some code that parsed a variable-length binary data structure. This sort of thing comes up often when parsing a file format or communications protocol. Most of the code looks fairly similar because it does similar stuff: ignore one byte, read the next four as the length of the following junk, read two important bytes, ignore two more, read another four-byte length and skip past the following N bytes - ad nauseam.

I’ve written it in C, and it looks something like this (ignoring error conditions like getting to the end of the buffer):

ptr = &data;                  // start at the beginning of our data
ptr++;                        // skip junk we don't care about
UInt32 len = *(UInt32 *) ptr; // get the 4-byte length
len = ntohl(len);             // convert from network byte ordering
ptr += sizeof(UInt32);        // skip past the length we just read
ptr += len;                   // skip past the data we don't care about
UInt16 cost = *(UInt16 *)ptr; // read our important two bytes
cost = ntohs(cost);           // convert to the correct byte ordering

In Ruby, this tends to be shorter due to the handy String.unpack() routine, which takes a concise format string to define how many bytes to read and what to do with them. “a3” reads 3 bytes as a string, “N” reads 4 bytes in network order, “n” reads 2 bytes in network order, etc. The code above could be rewritten in Ruby like this:

array = data.unpack( "a1N")        # read the junk and the 4 length bytes
len = array[1]                     # only get the length value we care about
data = data[5..-1]                 # throw away the stuff we just read
array =  data.unpack( "a#{len}n" ) # define the length to read on the fly
cost = array[1]                    # get our data in its correct ordering
data = data[(len+2)..-1]           # again, throw away what we just read
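
As a quick check of the format codes, the sketch below builds a few sample bytes with Array#pack (which the snippet above does not use; the sample values are made up) and reads them back with a single unpack call.

```ruby
# Sample bytes invented for illustration: the string "abc", then 256 as a
# 4-byte network-order integer, then 2 as a 2-byte network-order integer.
sample = ["abc", 256, 2].pack("a3Nn")

# "a3" reads 3 bytes as a string, "N" reads 4 bytes, "n" reads 2 bytes.
name, big, small = sample.unpack("a3Nn")

puts name
puts big
puts small
```

The snippet above needs two passes rather than one only because its second format string depends on the length read in the first.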

This code works fine, but it’s not much more readable than the C code. A first step would be to define a string.unpack!() routine, where the ‘!’ clues us in that it modifies the object we’re working with. In this case, the modification is to eat (discard) the data we just read. This shortens the code to:

array = data.unpack!( "a1N")       # read the junk and the 4 length bytes
len = array[1]                     # only get the length value we care about
array =  data.unpack!("a#{len}n")  # define the length to read on the fly
cost = array[1]                    # get our data in its correct ordering

But again, this isn’t much more readable (in my opinion) than the C code. Additionally, it doesn’t help us understand the code much better in the case where our format string is “a3Nna5” and we need to remember which item in ‘array’ corresponds to the ‘n’ in the string (in this case, it is array[2]). After a test iteration or two, what I finally hit upon was to encapsulate the behavior we want in a separate Unpacker class that automatically eats the data it reads and stores the results in an internal Hash object, to map the name ‘len’ or ‘cost’ to the data. I also combined the format string and the resulting variable so we can clearly see the relationships. The result looks like this:

u = Unpacker.new(data)
u.u! "a1        => unused
      N         => len"
u.u! "a#{u.len} => unused
      n         => cost"

Now we can clearly see which values are ignored, which are given meaningful names, and how the format codes relate to the meaning of the data. Changing it to reflect a better understanding of the underlying data will be very easy. Note that the only reason it’s in two statements is to define a value for u.len before we use it - blocks of fixed-length data can be one statement.

The code to implement the Unpacker class is only about 30 lines of Ruby - including the string.unpack!() routine that can be reused separately.

class String
  # Like unpack, but also removes the bytes that were read from the string.
  def unpack! format
    array = self.unpack(format + "a*")  # "a*" captures whatever is left over
    self.replace array.pop              # keep only the unread remainder
    return array
  end
end

class Unpacker < Hash
  attr_reader :data

  def initialize string
    @data = string
    super()  # explicit parens: a bare super would pass string to Hash.new as the default value
  end

  # format string is expected to have whitespace between each
  # "unpackCode=>variableName" pairing (which can have whitespace
  # around the "=>").  u! was picked to be short so it would
  # look nice, and to connote a destructive "unpack!" operation.
  def u! format
    format.gsub(/\s*=>\s*/,'=>').strip.split(/\s+/).each do |segment|
      src,dst = segment.split(/=>/)
      self[dst] = @data.unpack!("#{src}")[0]
    end
  end

  # Hash_with_Attrs - For the simplicity of using either u.len or u['len'],
  # makes a hash appear to have members for each hash entry. Many thanks
  # to Why_ for collecting this handy routine on his RedHanded blog.
  # Note of Caution: 'len' is fine but 'length' would not be since u.length
  # would give the number of entries in the hash, not the just-parsed value.
  def method_missing(meth,*args)
    meth = meth.id2name
    if meth =~ /=$/
      self[meth[0..-2]] = (args.length<2 ? args[0] : args)
    else
      self[meth]
    end
  end
end

Update: An even cleaner and shorter way would be to implement a DSL as a module so the code above could look like this:

a 1,    :unused
N       :len
a :len, :unused
n       :cost

(and yes, this is valid Ruby code)
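
One way such a DSL might be put together is sketched below. The names UnpackDSL, Reader, parse and take are invented for this illustration, and it runs a block with instance_eval rather than mixing a module into the caller, so treat it as one possible shape and not the implementation hinted at above. The sample data is made up as well.

```ruby
# Hypothetical sketch: UnpackDSL, Reader, parse and take are illustrative
# names, not code from the original post.
module UnpackDSL
  # Runs the block with a, N and n available and returns a Hash that maps
  # each name to the value that was read for it.
  def self.parse(data, &block)
    reader = Reader.new(data)
    reader.instance_eval(&block)
    reader.values
  end

  class Reader
    attr_reader :values

    def initialize(data)
      @data = data.dup   # work on a copy so the caller's string is untouched
      @values = {}
    end

    # a COUNT, NAME - read COUNT bytes as a string. COUNT is either a
    # number or the name of a value that was read earlier.
    def a(count, name)
      count = @values.fetch(count) if count.is_a?(Symbol)
      take("a#{count}", count, name)
    end

    # N NAME - read a 4-byte unsigned integer in network byte order.
    def N(name)
      take("N", 4, name)
    end

    # n NAME - read a 2-byte unsigned integer in network byte order.
    def n(name)
      take("n", 2, name)
    end

    private

    # Unpack one value, store it under name, and drop the bytes it used.
    def take(code, size, name)
      @values[name] = @data.unpack(code)[0]
      @data = @data[size..-1] || ""
    end
  end
end

# Sample data: one junk byte, a length of 3, three junk bytes, a cost of 500.
data = ["x", 3, "abc", 500].pack("a1Na3n")

result = UnpackDSL.parse(data) do
  a 1,    :unused
  N       :len
  a :len, :unused
  n       :cost
end

puts result[:len]
puts result[:cost]
```

Passing :len as the count lets the second read refer to a value parsed a line earlier, which is what removes the need for string interpolation in the format.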

01.14.08

Xkcd Titles

Posted in ruby at 1:26 pm by JohnB

I’ve just noticed the geekily hilarious xkcd comic and one of the funniest aspects is that each comic has a ‘title’ attribute (the text that pops up when you hover your mouse over the image) that is often as funny as the comic itself. However, the length of the title often causes it to be truncated in my browser (Firefox 2.x, which probably has an obscure show-entire-title setting). Rather than arduously do a ‘view source’ on each one (or figure out the Firefox setting), I have Ruby do it for me. And for you if you want:

# xkcd.rb
# extract all the titles from xkcd comics since they
# tend to be too long to fully show in the browser

# USAGE: ruby -rubygems -rxkcd.rb -e 'Xkcd.new.show_all'

require 'open-uri'
require 'hpricot'

class Xkcd
  DOMAIN = 'http://xkcd.com'  # no trailing slash; show() adds the separators

  def show id = 343  # 343 is the NSA/RSA one
    begin
      @hp = Hpricot.parse( open( "%s/%d/" % [DOMAIN,id.to_i] ) )
      (@hp / :img).each do |el|
        puts "%4d: %s" % [id.to_i, el[:title]] if el[:title]
      end
    rescue
    end
  end

  def show_all
    0.upto(400) do |i|
      show i
    end
  end
end
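
For experimenting without the network or any gems, the title extraction can be approximated on a hand-written string. This is only a rough sketch: the markup is invented, and a regular expression is a much cruder tool than the Hpricot parsing used above.

```ruby
# Sample markup invented for illustration; the script above fetches the
# real thing from the site.
html = '<div><img src="/a.png" title="First tip" alt="A"><img src="/b.png"></div>'

# Collect the title attribute of every <img> tag that has one.
# [^>]* keeps the match inside a single tag.
titles = html.scan(/<img\b[^>]*\btitle="([^"]*)"/).flatten

titles.each { |t| puts t }
```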

10.11.07

Iterators - enough of a reason for Ruby

Posted in ruby at 3:09 pm by JohnB

A non-programmer friend recently asked me why I liked Ruby so much. I asked him for a simple task that I could write in Ruby and we came up with a pyramid - from a single “a” to 26 “z”s. So I showed him this one-liner:

"a".upto("z") { |c| puts c * (1 + c.ord - "a".ord) }

And then showed him the same program in C:

#include <stdio.h>
int main( int argc, char **argv )
{
  int loop = 0;
  for( loop = 0; loop < 26; loop++ )
  {
    int innerloop = 0;
    for( innerloop = 0; innerloop <= loop; innerloop++ )
    {
      printf( "%c", 'a' + loop );
    }
    printf("\n");
  }
  return 0;
}

Enough said.

a
bb
ccc
dddd
eeeee
ffffff
ggggggg
hhhhhhhh
iiiiiiiii
jjjjjjjjjj
kkkkkkkkkkk
llllllllllll
mmmmmmmmmmmmm
nnnnnnnnnnnnnn
ooooooooooooooo
pppppppppppppppp
qqqqqqqqqqqqqqqqq
rrrrrrrrrrrrrrrrrr
sssssssssssssssssss
tttttttttttttttttttt
uuuuuuuuuuuuuuuuuuuuu
vvvvvvvvvvvvvvvvvvvvvv
wwwwwwwwwwwwwwwwwwwwwww
xxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzzzz

2/6/2008 Update: it might be shorter and more clear like this

("a".."z").each_with_index { |c,i| puts (c * (i + 1)) }

09.29.07

d efine for twitter - step 1

Posted in ruby at 2:55 pm by JohnB

For those of you who use twitter, you’ll likely recognize the separation of ‘d’ from ‘efine’ as intentional: ‘d’ means direct a message to another twitter user and ‘efine’ is the user you’re sending it to. Together I hope they connote ‘define’ because that’s what they do. Sending

d efine ruby

to twitter should, if my twitter-bot works as intended, return a direct reply of

"A clear, deep, red, valued as a precious stone."

Which is a fairly accurate definition (even if it does leave out my favorite computer language).

So in this part I’ll describe the definition-grabbing piece, which queries wiktionary.org for the first definition. This first iteration is stupidly simple: read the entire page, parse its contents with the wondrous Hpricot tool, grab the first item from the first ordered list on the page and throw away any links. It sometimes gets odd or partial definitions so it will need improvement - but works great for the five minutes it took to write.

require 'open-uri'
require 'hpricot'
def efine word
  open("http://en.wiktionary.org/wiki/#{word}") do |f|
    (Hpricot(f.read) / "ol" / "li")[0].to_plain_text.gsub(/\s*\[.*\]/,'')
  end
end
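
One possible source of the partial definitions is the greedy .* in that final gsub. The sketch below assumes the plain text shows each link target in square brackets after the link text. The sample string is invented to match that assumption and is not real wiktionary output.

```ruby
# Sample text invented for illustration, shaped like plain text in which
# each link is followed by its target in square brackets (an assumption).
text = "A clear [/wiki/clear], deep [/wiki/deep], red."

greedy = text.gsub(/\s*\[.*\]/, '')   # .* runs from the first "[" to the last "]"
lazy   = text.gsub(/\s*\[.*?\]/, '')  # .*? stops at the nearest "]"

puts greedy
puts lazy
```

With more than one bracketed link in the item, the greedy form also removes the words between the links, while the lazy form removes only the brackets.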

That’s all. You’ll have to wait for the twitter-integration piece in my next post. I haven’t written it yet, but given the functionality in twitter4r, I doubt it will be much longer than the efine() method above. In fact, my usual peeve about Ruby is just that: it takes longer to describe the code than to write it!

09.21.07

RJS Error: TypeError: $(element) has no properties

Posted in ruby at 6:11 pm by JohnB

I received this error message late yesterday while testing out RJS templates and link_to_remote(). I did a google search and didn’t find anything useful - some questions that were asked and never answered; one that said “rebuild your entire app”. Finally, I opened the page in another browser and it worked fine. huh?

doh! Rails’ nearly-seamless simplicity strikes again!

I had just changed my view code (.rhtml file) to include a new div that I wanted to be updated in an AJAXy manner. So I clicked the link in the browser (remember: this is AJAX - no page refresh) and expected my new div to be replaced with the neat new content. Nope. I had to do a decidedly non-AJAX page refresh so my browser would now have the neat new div - only then could the div be replaced.

It’s so simple to swap between edit and test, edit and test, edit and test, that the few times you’re required to step out of the cycle seem like a huge hassle. But not when compared to every other development process I’ve used.

I had a similar experience with the routes.rb file. Unlike models and controllers and non-AJAX views, the routes.rb file only gets loaded when the web server starts. Stopping and starting the server fixed the problem - but I think I had to run into it multiple times before I realized what the issue was. A minor pot-hole on the smooth Rails path.

To mis-quote a bumper sticker: the worst day coding Ruby is better than the best day fighting C++.

08.07.07

irb tip

Posted in ruby at 3:04 pm by JohnB

Have you ever loaded a file into irb, only to find that it scrolls endlessly? It’s easy to do by accident:

log = File.open('bigger_file_than_you_expected.log') { |f| f.read }

But a simple trick can limit the output to a single useful line:

(log = File.open('bigger_file_than_you_expected.log') { |f| f.read }).length
=> 5066612

Now you can happily slice and dice your data without all the useless output.  And yes, it applies to any operation that would spit out more data than you really want to see.
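
A common variation on the same idea is to end the line with a cheap value such as nil, since irb echoes only the value of the last expression. The sketch below uses a temporary file as a stand-in for the big log, so the file name and contents are made up.

```ruby
require 'tempfile'

# Stand-in for a large log file, created only for this illustration.
file = Tempfile.new('big_log')
file.write("line\n" * 1000)
file.close

# The trick from above: irb would echo the length, not the contents.
size = (log = File.read(file.path)).length

# Variation: irb would echo nil, because nil is the last expression.
log = File.read(file.path); nil

puts size
file.unlink
```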

08.05.07

Rails Drop-In: tooltip.rb

Posted in ruby at 11:41 pm by JohnB

I recently came across Davey Shafik’s nice little Tooltip.js script - and I love it. I love it so much that I’m adding it as context-sensitive help all over the site I’m building. My needs are fairly simple: just some sort of pseudo-icon I can use next to any visual element to show that help or warning information is available. Every aspect can (of course!) be styled any way you want it.

To gain consistency and ease of use I wrote something that isn’t an actual Rails plug-in but is more of a… drop-in. Just drop in three files and you’re ready to roll. There are a few ways to add the tips to your views, but the easiest way is to just add

<%= Tooltip.help name %>

or

<%= Tooltip.alert name %>

to show a highlighted icon. The ones I currently use are html pseudo-icons (? for help and ! for warnings) but they can just as easily be image tags (and will be, as soon as I can find some appropriate icons).

Finally, to get the tip data to be formatted into hidden divs you need to add this to the bottom of your view (or, ideally, your layout):

<%= Tooltip.content %>

That’s nearly all there is to it. The only remaining task is to add these three files:

public/javascripts/Tooltip.js - Davey’s code.
/lib/tooltip.rb - My code.
/config/tooltips.yaml - Styles and the tip contents.

Eventually I’ll make it publicly available, but if you’d like a copy sooner just send me a note at my gmail address: john dot baylor.