02.27.08

unpack!

Posted in Uncategorized, ruby at 12:54 am by JohnB

[Update 10/11/2009: I just found a better tool, bindata, to do what I'm describing in this post. It also lists, at the bottom of the page, many links to yet other implementations of binary data packing/unpacking. Worth checking out if this is what you need.]

[Update: presentation from the 4/15/2008 Ruby Meetup is now available here.]

I like reading code. It’s like a novel and I want to read it cover-to-cover. Some, such as Why’s Camping framework, I struggle to comprehend. But most code that I read comes up slightly short. Like a novel with some misspellings, awkward phrasing or repeated analogies, I mentally mark it as “could be better”. And sometimes I really do sit down and write something better – maybe just for my own amusement but often for a useful purpose.

I recently had the experience of reading some code that parsed a variable-length binary data structure. This sort of thing comes up often when parsing a file format or communications protocol. Most of the code looks fairly similar because it does similar stuff: ignore one byte, read the next four as the length of the following junk, read two important bytes, ignore two more, read another four-byte length and skip past the following N bytes – ad nauseam.

I’ve written it in C, and it looks something like this (ignoring error conditions like getting to the end of the buffer):

ptr = &data;                  // start at the beginning of our data
ptr++;                        // skip junk we don't care about
UInt32 len = *(UInt32 *) ptr; // get the 4-byte length
len = ntohl(len);             // convert from network byte ordering
ptr += sizeof(UInt32);        // skip past the length we just read
ptr += len;                   // skip past the data we don't care about
UInt16 cost = *(UInt16 *)ptr; // read our important two bytes
cost = ntohs(cost);           // convert to the correct byte ordering

In Ruby, this tends to be shorter thanks to the handy String#unpack method, which takes a concise format string defining how many bytes to read and what to do with them. “a3” reads 3 bytes as a string, “N” reads 4 bytes as an unsigned integer in network order, “n” reads 2 bytes in network order, etc. The code above could be rewritten in Ruby like this:

array = data.unpack( "a1N")        # read the junk and the 4 length bytes
len = array[1]                     # only get the length value we care about
data = data[5..-1]                 # throw away the stuff we just read
array =  data.unpack( "a#{len}n" ) # define the length to read on the fly
cost = array[1]                    # get our data in its correct ordering
data = data[(len+2)..-1]           # again, throw away what we just read
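To make the format codes concrete, here is a minimal sketch that runs the same steps against a made-up buffer. The helper names `sample_buffer` and `read_cost` and the sample values are assumptions for illustration only; the buffer is built with Array#pack, which uses the same format codes in the other direction.

```ruby
# Build a made-up sample buffer: 1 junk byte, a 4-byte big-endian length (3),
# 3 bytes of junk, then a 2-byte big-endian "cost" of 1234.
def sample_buffer
  ["x", 3, "abc", 1234].pack("a1Na3n")
end

# Same steps as the listing above, wrapped in a (hypothetical) method.
def read_cost(data)
  _junk, len = data.unpack("a1N")           # skip 1 byte, read the length
  rest = data[5..-1]                        # drop the 5 bytes just read
  _skipped, cost = rest.unpack("a#{len}n")  # skip len bytes, read the cost
  cost
end

puts read_cost(sample_buffer)  # prints 1234
```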

This code works fine, but it’s not much more readable than the C code. A first step would be to define a String#unpack! method, where the ‘!’ clues us in that it modifies the object we’re working with. In this case, the modification is to eat (discard) the data we just read. This shortens the code to:

array = data.unpack!( "a1N")       # read the junk and the 4 length bytes
len = array[1]                     # only get the length value we care about
array =  data.unpack!("a#{len}n")  # define the length to read on the fly
cost = array[1]                    # get our data in its correct ordering

But again, this isn’t much more readable (in my opinion) than the C code. Additionally, it doesn’t help us understand the code much better in the case where our format string is “a3Nna5” and we need to remember which item in ‘array’ corresponds to the ‘n’ in the string (in this case, it is array[2]). After a test iteration or two, what I finally hit upon was to encapsulate the behavior we want in a separate Unpacker class that automatically eats the data it reads and stores the results in an internal Hash object, mapping the name ‘len’ or ‘cost’ to the data. I also combined the format string and the resulting variable so we can clearly see the relationships. The result looks like this:

u = Unpacker.new(data)
u.u! "a1        => unused
      N         => len"
u.u! "a#{u.len} => unused
      n         => cost"

Now we can clearly see which values are ignored, which are given meaningful names, and how the format codes relate to the meaning of the data. Changing it to reflect a better understanding of the underlying data will be very easy. Note that the only reason it’s in two statements is to define a value for u.len before we use it – blocks of fixed-length data can be one statement.

The code to implement the Unpacker class is only about 30 lines of Ruby – including the string.unpack!() routine that can be reused separately.

class String
  # Unpack according to format, then discard the bytes that were consumed
  # by replacing self with whatever is left over.
  def unpack! format
    array = self.unpack(format + "a*")
    self.replace array.pop
    return array
  end
end

class Unpacker < Hash
  attr_reader :data

  def initialize string
    @data = string
    super()  # explicit parens: a bare super would pass string to Hash.new as the default value
  end

  # format string is expected to have whitespace between each
  # "unpackCode=>variableName" pairing (which can have whitespace
  # around the "=>").  u! was picked to be short so it would
  # look nice, and to connote a destructive "unpack!" operation.
  def u! format
    format.gsub(/\s*=>\s*/,'=>').strip.split(/\s+/).each do |segment|
      src,dst = segment.split(/=>/)
      self[dst] = @data.unpack!(src)[0]
    end
  end

  # Hash_with_Attrs - For the simplicity of using either u.len or u['len'],
  # makes a hash appear to have members for each hash entry. Many thanks
  # to _why for collecting this handy routine on his RedHanded blog.
  # Note of Caution: 'len' is fine but 'length' would not be since u.length
  # would give the number of entries in the hash, not the just-parsed value.
  def method_missing(meth,*args)
    meth = meth.id2name
    if meth =~ /=$/
      self[meth[0..-2]] = (args.length<2 ? args[0] : args)
    else
      self[meth]
    end
  end
end
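To try the class out without a real file or protocol at hand, the sketch below feeds it a made-up buffer built with Array#pack. It repeats a condensed, read-only copy of the classes above so it can be pasted into a file and run on its own; the `parse_record` helper and the sample values are assumptions for illustration.

```ruby
# Condensed copy of the classes above, repeated so this file runs on its own.
class String
  def unpack!(format)
    array = unpack(format + "a*")
    replace(array.pop)          # keep only the bytes that were not consumed
    array
  end
end

class Unpacker < Hash
  def initialize(string)
    @data = string
    super()
  end

  def u!(format)
    format.gsub(/\s*=>\s*/, '=>').strip.split(/\s+/).each do |segment|
      src, dst = segment.split(/=>/)
      self[dst] = @data.unpack!(src)[0]
    end
  end

  # Read-only variant of the method_missing trick: u.len reads u['len'].
  def method_missing(meth, *args)
    key?(meth.to_s) ? self[meth.to_s] : super
  end

  def respond_to_missing?(meth, include_private = false)
    key?(meth.to_s) || super
  end
end

# Hypothetical helper that parses the record layout used in this post.
def parse_record(data)
  u = Unpacker.new(data)
  u.u! "a1        => unused
        N         => len"
  u.u! "a#{u.len} => unused
        n         => cost"
  u
end

# Made-up sample: junk byte, length 3, 3 junk bytes, cost 1234, trailing data.
data = ["x", 3, "abc", 1234, "tail"].pack("a1Na3na4")
u = parse_record(data)
puts u.len   # prints 3
puts u.cost  # prints 1234
puts data    # prints tail
```

Note the last line: because the Unpacker holds on to the very string it was given, the caller’s buffer is consumed as parsing proceeds. Pass in `data.dup` if the original bytes are still needed afterwards.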

Update: An even cleaner and shorter way would be to implement a DSL as a module so the code above could look like this:

a 1,    :unused
N       :len
a :len, :unused
n       :cost

(and yes, this is valid Ruby code)
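One way such a DSL might be put together is sketched below. This is only an illustration under stated assumptions, not a finished implementation: the `UnpackDSL` class, its `parse` method and the `parse_cost_record` helper are hypothetical names, it uses a small class with instance_eval rather than a module, and it covers just the three format codes used in this post. Each DSL method is named after an unpack code, and a symbol given as a count refers back to a field that was read earlier.

```ruby
# Hypothetical DSL sketch: each method name is an unpack format code.
class UnpackDSL
  attr_reader :fields

  def initialize(data)
    @data = data.dup   # work on a copy so the caller's string is untouched
    @fields = {}
  end

  # Evaluate the block with self set to a new UnpackDSL, so bare calls
  # such as `N :len` resolve to the methods below.
  def self.parse(data, &block)
    dsl = new(data)
    dsl.instance_eval(&block)
    dsl.fields
  end

  # "a" takes a byte count: a number, or the name of an earlier field.
  def a(count, name)
    count = @fields.fetch(count) if count.is_a?(Symbol)
    read("a#{count}", name)
  end

  def N(name)  # 4-byte unsigned integer, network (big-endian) order
    read("N", name)
  end

  def n(name)  # 2-byte unsigned integer, network (big-endian) order
    read("n", name)
  end

  private

  # Unpack one value, keep the remaining bytes, and store the value by name.
  def read(code, name)
    value, rest = @data.unpack(code + "a*")
    @data = rest
    @fields[name] = value
  end
end

def parse_cost_record(data)
  UnpackDSL.parse(data) do
    a 1,    :unused
    N       :len
    a :len, :unused
    n       :cost
  end
end

fields = parse_cost_record(["x", 3, "abc", 1234].pack("a1Na3n"))
puts fields[:len]   # prints 3
puts fields[:cost]  # prints 1234
```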

01.29.08

Book Recommendation: The Rails Way

Posted in Uncategorized at 12:53 pm by JohnB

The Rails Way is a Ruby on Rails reference book that I bought on Josh Susser’s recommendation.  I’ve actually, to my family’s dismay, been reading the darn thing instead of just referring to it like one would a, well, reference book.  A lot of Rails’isms that I had a vague idea about I now understand with much more clarity.  It will definitely come in handy soon.

01.25.08

Using Ubuntu

Posted in Uncategorized at 3:18 pm by JohnB

I’d heard that dual-booting Ubuntu Linux was easy, and it really is. I’m now running Ubuntu and it was as easy as various blog posts have said. The longest step in the process was defragmenting the drive before repartitioning with Ubuntu. There are a few issues remaining around using the data on the Windows partition from Linux, but on the whole I’m very happy with the switch.

[Update 1/29/2008: the network is inconsistent.  Upon a boot or un-hibernate it may be completely incapable of finding my router - but then later it is fine. I'll continue trying to track it down... using the Windows OS!] 

01.14.08

Xkcd Titles

Posted in ruby at 1:26 pm by JohnB

I’ve just noticed the geekily hilarious xkcd comic and one of the funniest aspects is that each comic has a ‘title’ attribute (the text that pops up when you hover your mouse over the image) that is often as funny as the comic itself. However, the length of the title often causes it to be truncated in my browser (Firefox 2.x, which probably has an obscure show-entire-title setting). Rather than arduously do a ‘view source’ on each one (or figure out the Firefox setting), I have Ruby do it for me. And for you if you want:

# xkcd.rb
# extract all the titles from xkcd comics since they
# tend to be too long to fully show in the browser

# USAGE: ruby -rubygems -rxkcd.rb -e 'Xkcd.new.show_all'

require 'open-uri'
require 'hpricot'

class Xkcd
  DOMAIN = 'http://xkcd.com'  # no trailing slash; the format string in show adds it

  def show id = 343  # 343 is the NSA/RSA one
    begin
      @hp = Hpricot.parse( open( "%s/%d/" % [DOMAIN,id.to_i] ) )
      (@hp / :img).each do |el|
        puts "%4d: %s" % [id.to_i, el[:title]] if el[:title]
      end
    rescue
      # skip ids that fail to download or parse
    end
  end

  def show_all
    0.upto(400) do |i|
      show i
    end
  end
end
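For anyone who wants to try the extraction step without Hpricot or a network connection, it can be roughly sketched with a regular expression over an HTML string. The `img_titles` helper and the sample markup are made up for illustration; a regex is fine for a throwaway script like this but is not a real HTML parser, and it does not decode HTML entities.

```ruby
# Hypothetical helper: collect the title="..." text of every <img> tag
# found in an HTML string. Tags without a title attribute are skipped.
def img_titles(html)
  html.scan(/<img\b[^>]*\btitle="([^"]*)"/i).flatten
end

# Made-up markup standing in for a downloaded comic page.
sample = '<p><img src="a.png" title="First hover text" alt="a">' \
         '<img src="b.png" alt="no title here"></p>'
puts img_titles(sample).inspect  # prints ["First hover text"]
```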

01.12.08

Jumping on the Bandwagon

Posted in musings at 3:06 pm by JohnB

I just have to wonder: who were the first two people to have “died in a blogging accident”?

12.17.07

Another High-Traffic Rails Site: catalogchoice.org

Posted in Uncategorized at 6:12 pm by JohnB

It’s getting a lot of traffic and seems pretty snappy:

http://www.catalogchoice.org/

So yes, Virginia, Ruby and Rails do scale.

11.28.07

The Perception of Scarcity in a Climate of Fear

Posted in musings at 1:56 pm by JohnB

I was playing Blokus today, where competition is driven by the scarcity of space on the game board, and realized that the perception of scarcity is often more prevalent than actual scarcity – and thus we needlessly hobble ourselves by limiting things that are abundant. Similarly, our fear that something might happen to us (crime, identity theft, terrorism, etc. – whatever monsters we see on the evening news) forces us to add locks and protections that mostly just result in making it hard for us to access our own belongings and data and websites.

The context for this discussion is a website (nameless, sorry) that I’m interested in working on. The startup site, yet another type of social network, holds the promise of allowing for some very interesting and powerful interactions – but unnecessarily limits its users as it guards scarce server resources and data security. Furthermore, and I’m going out on a limb here, I suspect that these mis-perceptions are one of the reasons this startup has had difficulty in raising much-needed funds. Some examples:

  • Users are automatically logged out after a few idle minutes, with no option of changing the time period before auto-logout (or choosing “Keep me logged in” for single-user computers). This seems a bit draconian given that there is nothing accessible on the site that couldn’t be gathered in other ways – no bank statements, social security number or mother’s maiden name.
  • A PDF document containing the public profile data for your social circle can be generated for off-line access, but only by a subset of the social circle and only for a short period of time. I think this is intended both for security and to guard scarce resources (such as server time and bandwidth). The former concern is misguided – anyone receiving the PDF can circumvent security by immediately sending it to bad people, which is unstoppable once you provide off-line access. The scarcity of server time or bandwidth can be overcome by delegating it to someone else, such as Amazon’s EC2 or S3 services.
  • New people can be invited to the social circle, but only by a small initial set of users – and those invitations expire relatively quickly. It’s unclear why this decision was made, but I suspect it was due to some perception of scarcity or security. All it appears to do is add yet another unnecessary barrier to entry.

In spite of these issues, and others, I’m still captivated by the underlying ideas that it represents and by what it could become in the future. Hopefully I can rapidly prototype my vision for an improved site and use it as a starting point to land a dream job.

Even More Rapid Development

Posted in Uncategorized at 1:41 pm by JohnB

The success of the Ruby on Rails web framework is somewhat based on its  ability to soothe the pain caused by the not-so-rapid development process of other, so-called “enterprise-ready” frameworks.  But Rails is not the only Ruby web framework, and not the fastest one for initial prototyping(*).  The faster (more rabid?) ones I’ve looked at:

  • Camping.  From the quirky mind of why the lucky stiff (no other name given), it inspires absurdly fast development (and absurdity!).
  • Sinatra.  Some people who have tried Camping have moved on to Sinatra – it has a clean syntax and a simple metaphor (Sinatra attends events) and is supported by a larger team.

It’s hard to imagine what faster development would look like – maybe a web interface for defining Camping or Sinatra event handlers?  Code the app directly from the browser!

(*) Footnote: Note that I use the word “prototype” because that is all I have done with them – I see no reason they couldn’t scale as well as Rails or any other web framework.

11.18.07

Faulty Sensory Awareness – Water Never Lies

Posted in health at 11:16 am by JohnB

I’m learning how to use my body, with the help of Amira, my Alexander teacher. The Alexander Technique has a number of techniques for reminding us that (a) we are usually working harder than we need to and (b) what we think we are doing is not always what we are actually doing. This latter aspect is known as “Faulty Sensory Awareness” because, although we may think we are standing up straight, any observer (or we ourselves, looking in a mirror) can tell that we’re leaning one way or the other.

This became very clear to me last week as I was swimming laps. When kicking a length on my back, I tend to believe I’m looking straight up at the sky. Although I don’t really need my goggles in this position I usually have them on anyway – I can see the beautiful sky better. A side benefit of wearing the goggles is that, if I splash, I don’t get water in my eye.  Last week I happened to be kicking on my back with my goggles off and noticed that my left eye was getting a bit of water in it, while my right eye was dry. Just to verify, I made sure that it happened in both directions down the lane – that it wasn’t due to my neighbor’s wake in the next lane. Nope. Unless the pool was tilted, it was me.

Although I thought my nose was pointed straight up, and it certainly felt “normal”, the water was telling me that I was ever so slightly tilted to the left!

So, if you’ve ever wondered about whether your body could work better than it does, give Amira (or any Alexander teacher) a try. As I always say: “She hasn’t killed me yet!”. (BTW: that is an inside joke with Amira – she wouldn’t allow that quote on her website so I had to come up with a different one)

10.11.07

Iterators – enough of a reason for Ruby

Posted in ruby at 3:09 pm by JohnB

A non-programmer friend recently asked me why I liked Ruby so much. I asked him for a simple task that I could write in Ruby and we came up with a pyramid – from a single “a” to 26 “z”s. So I showed him this one-liner:

"a".upto("z") { |c| puts c * (1 + c[0] - "a"[0]) }

And then showed him the same program in C:

#include <stdio.h>
int main( int argc, char **argv )
{
  int loop = 0;
  for( loop = 0; loop < 26; loop++ )
  {
    int innerloop = 0;
    for( innerloop = 0; innerloop <= loop; innerloop++ )
    {
      printf( "%c", 'a' + loop );
    }
    printf("\n");
  }
  return 0;
}

Enough said.

a
bb
ccc
dddd
eeeee
ffffff
ggggggg
hhhhhhhh
iiiiiiiii
jjjjjjjjjj
kkkkkkkkkkk
llllllllllll
mmmmmmmmmmmmm
nnnnnnnnnnnnnn
ooooooooooooooo
pppppppppppppppp
qqqqqqqqqqqqqqqqq
rrrrrrrrrrrrrrrrrr
sssssssssssssssssss
tttttttttttttttttttt
uuuuuuuuuuuuuuuuuuuuu
vvvvvvvvvvvvvvvvvvvvvv
wwwwwwwwwwwwwwwwwwwwwww
xxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzzzz

2/6/2008 Update: it might be shorter and clearer like this:

("a".."z").each_with_index { |c,i| puts (c * (i + 1)) }
