Living in the Future

Paul Graham’s latest essay - How to Get Startup Ideas - is a great read.

I was struck by the Buchheit/Pirsig conjecture:

“Live in the future, then build what’s missing.”

and the following paragraph regarding ideas that come out of folks’ experience at college.

pg encourages those readers who are still studying to take classes unrelated to their CS major, so that they may see more problems worth solving. What struck me about this paragraph, though, was how much college, for me, was like living in the future. In the late nineties, I lived in an environment where every single member of my social circle had an always-on 10 Mbit connection to the Internet and spent inordinate amounts of time communicating via email, IM and the like. It seems no coincidence that so many successful Internet companies were born out of students of that era. I doubt that today’s students encounter much of the future at all in their dorm rooms. Perhaps universities should be working harder to make campus living feel like living in the future, rather than setting up mobile app development courses, incubators and so on.

Parsing Huge XML Files With Go

I’ve recently been messing around with the XML dumps of Wikipedia. These are pretty huge XML files - for instance, the most recent revision is 36 GB when uncompressed. That’s a lot of XML! I’ve been experimenting with a few different languages and parsers for my task (which also happens to involve some non-trivial processing for each article) and found Go to be a great fit.

Go’s standard library includes a package for parsing XML (encoding/xml) which is very convenient to code against. However, the simple version of the API requires parsing the whole document in one go, which is not a viable strategy for 36 GB. The parser can also be used in a streaming mode, but I found the documentation terse and online examples non-existent, so here is my example code for parsing Wikipedia with encoding/xml, along with a little explanation!

(full example code at https://github.com/dps/go-xml-parse/blob/master/go-xml-parse.go)

Here’s a little snippet of an example wikipedia page in the doc:

<page>
    <title>Apollo 11</title>
    <redirect title="Foo bar" />
    ...
    <revision>
    ...
      <text xml:space="preserve">
      {{Infobox Space mission
      |mission_name=&lt;!--See above--&gt;
      |insignia=Apollo_11_insignia.png
    ...
      </text>
    </revision>
</page>

In our Go code, we define a struct to match the <page> element, its nested <redirect> element and grab a couple of fields we’re interested in (<text> and <title>).

type Redirect struct { 
    Title string `xml:"title,attr"` 
} 

type Page struct { 
    Title string `xml:"title"` 
    Redir Redirect `xml:"redirect"` 
    Text string `xml:"revision>text"` 
}

Now we would usually tell the parser that a wikipedia dump contains a bunch of <page>s and try to read the whole thing, but let’s see how we stream it instead.

It’s quite simple when you know how - iterate over tokens in the file until you encounter a StartElement with the name “page” and then use the magic decoder.DecodeElement API to unmarshal the whole following page into an object of the Page type defined above. Cool!

decoder := xml.NewDecoder(xmlFile) 

for { 
    // Read tokens from the XML document in a stream. 
    t, _ := decoder.Token() 
    if t == nil { 
        break 
    } 
    // Inspect the type of the token just read. 
    switch se := t.(type) { 
    case xml.StartElement: 
        // If we just read a StartElement token 
        // ...and its name is "page" 
        if se.Name.Local == "page" { 
            var p Page 
            // decode a whole chunk of the following XML into the
            // variable p, which is a Page (see above)
            decoder.DecodeElement(&p, &se) 
            // Do some stuff with the page. 
            p.Title = CanonicalizeTitle(p.Title)
            ...
        } 
...

I hope this saves you some time if you need to parse a huge XML file yourself.

Hands on With Raspberry Pi

I was extremely fortunate to get access to a Raspberry Pi alpha board for the past couple of weeks. For those of you who haven’t already heard about it, the Raspberry Pi project was started to provide a tiny computer for kids to learn to program. It’s a credit-card-sized computer with a 700 MHz ARM11 CPU, 256 MB RAM, USB ports to connect a keyboard and mouse, and HDMI out so you can plug it in to a TV or monitor - that’s enough power to run Linux, a web browser and more. What’s truly revolutionary is the price point: all of this comes for $25. At that price, the potential for a full-blown computer in lots of homebrew embedded electronics projects could be transformational, and the initial release of boards for pre-order sold out in a matter of hours.


So, what’s it like in practice? I had a chance to play with the Debian “squeeze” distribution - the official Fedora-based image was not yet available. Getting the image written onto an SD card (I recommend 4 GB minimum, as the default image leaves little empty space to install new software on a 2 GB card) was simple enough following these instructions. I decided that it would be fun to try to get my neural-network-controlled RC car working on Raspberry Pi. The Rasp Pi team are working on an add-on “Gertboard” for I/O, but since those aren’t available yet and the device already has USB ports, connecting an Arduino UNO board should work great, right? Well, yes, but the Debian image doesn’t come with kernel driver support or prebuilt modules for the USB/serial interface Arduino uses. It took quite a bit of digging to find all the info I needed to build these myself, but I’ve made prebuilt modules available at the end of this post if you’d like to repeat this yourself.

This also means that Rasp Pi can be a great development environment for anyone getting started with Arduino who doesn’t have an expensive PC to connect it to (e.g. at school).

The next step was to get the Rasp Pi driving the car. After installing the default Java JVM (OpenJDK), I got the camera streaming to the board - the screenshot you can see here is live video from an Android phone for the self-driving car… woo!

Unfortunately, OpenJDK does not do JIT (just-in-time compilation) on ARM, so this setup was never going to be fast enough to drive the car (it managed about 1 frame per second without the neural network running). This was just the inspiration I needed to re-implement the project in C++! So, after a few further evenings’ work I was able to claim what I think is the world’s first self-driving (RC) car powered by Raspberry Pi! The new C++ code can be found at github.com/dps/nnrccar/tree/master/cpp-driver.

Overall impressions? Rasp Pi is not going to let down the throng of enthusiasts awaiting delivery - it’s enchanting to have a self-contained, fully fledged Linux box this size, and I know I’d have saved my pocket money to buy one when I was a kid. The Debian image’s setup process is by no means friendly to non-techies, but Eben has been very clear that the software is expected to get much better now that the initial batch is being unleashed on an army of motivated geeks, and what I’ve read about the official Fedora Remix image so far sounds like it’s a big step in the right direction.

I hope some of you also have fun with Arduino on this platform:

Arduino on Rasp Pi

Install the Arduino software:

sudo apt-get install arduino

Download the pre-built Rasp Pi / Debian kernel modules.

Load the modules:

sudo insmod drivers/usb/class/cdc-acm.ko
sudo insmod drivers/usb/serial/usbserial.ko
sudo insmod drivers/usb/serial/ftdi_sio.ko

Plug in your Arduino UNO.

You should now have a USB serial port for the board on /dev/ttyACM0. Enjoy!

One Nanosecond Is to One Second as One Second Is to 31.7 Years


Peter Burns wrote a great post last week about timescales as they might be “perceived” by a computer’s CPU… “your CPU lives by the nanosecond” [and humans live by the second]. The post seems to be loosely based on this article.

I found that the comparison really resonated with me and could provide a useful way to get an intuitive handle on the tradeoffs we make when designing software systems…

A nanosecond is one billionth of a second.

Moderately fast modern CPUs can process a few instructions (e.g. comparing a couple of numbers) every nanosecond, much as humans can “process” a few basic facts every second (e.g. comparing a couple of numbers!). This might blow your mind: A nanosecond is to one second as one second is to 31.7 years!

Peter’s comparisons talked only about the timescales it takes to shuffle data backwards and forwards within one computer (CPU, main memory, disk). Many software systems nowadays consist of a collection of computers connected together by a fast network (within a datacenter) and often co-operating with services running on the other side of the globe to deliver the kinds of applications and services we’re used to using on the web. Therefore, I thought it quite interesting to extend the analogy and think about some of the Numbers Everyone Should Know (due to Jeff Dean) as if a nanosecond was a second.

L1 cache reference - 0.5 ns -> half a second.
Branch mispredict - 5 ns -> 5 seconds.
L2 cache reference - 7 ns -> 7 seconds.
Main memory reference - 100 ns -> 1 minute 40 seconds.

Now it gets interesting:

Send 2K bytes over 1 Gbps network - 20,000 ns -> 5 and a half hours.
Read 1 MB sequentially from memory - 250,000 ns -> nearly 3 days.
Round trip within same datacenter - 500,000 ns -> nearly 6 days.
Disk seek - 10,000,000 ns -> 4 months.
Read 1 MB sequentially from disk - 20,000,000 ns -> 8 months.
Send packet California->Europe->California - 150,000,000 ns -> 4.75 years.

The most significant (and perhaps initially unintuitive) of these is that it can be significantly faster to fetch data from RAM on another nearby machine over the network (a round trip of nearly 6 days) than to read it from local disk (a 4-month seek, then 8 months to read 1 MB).

I’ll throw one more in there: round trip across a 3G mobile network: 250,000,000 ns -> nearly 8 years!
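The conversion above is just “multiply by a billion”: a duration of n nanoseconds becomes n seconds at human scale. Here’s a small Go sketch of mine that renders those scaled-up durations in human units (the unit boundaries, e.g. 30-day months, are rough choices of my own):

```go
package main

import "fmt"

// describe renders a number of scaled seconds (1 ns stretched to 1 s)
// in human-friendly units, roughly matching the list above.
func describe(seconds float64) string {
	const (
		minute = 60.0
		hour   = 60 * minute
		day    = 24 * hour
		month  = 30 * day      // rough: 30-day months
		year   = 365.25 * day  // rough: Julian years
	)
	switch {
	case seconds < minute:
		return fmt.Sprintf("%.1f seconds", seconds)
	case seconds < hour:
		return fmt.Sprintf("%.1f minutes", seconds/minute)
	case seconds < day:
		return fmt.Sprintf("%.1f hours", seconds/hour)
	case seconds < month:
		return fmt.Sprintf("%.1f days", seconds/day)
	case seconds < year:
		return fmt.Sprintf("%.1f months", seconds/month)
	default:
		return fmt.Sprintf("%.1f years", seconds/year)
	}
}

func main() {
	fmt.Println(describe(1e9))         // one CPU-second at human scale: 31.7 years
	fmt.Println(describe(10000000))    // disk seek: 3.9 months
	fmt.Println(describe(150000000))   // CA -> Europe -> CA round trip: 4.8 years
}
```

Running it confirms the headline: 10^9 seconds really is 31.7 years.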

What’s the point of thinking like this? Well, by putting timescales into units that humans can more intuitively understand and reason about, I hope this might help me (and you) make better choices as we design new systems.

How I Built a Self-driving (RC) Car and You Can Too!

Recently, I have been refreshing my knowledge of machine learning by taking Andrew Ng’s excellent Stanford Machine Learning course online. The lecture module on Neural Networks ends with an intriguing motivating video of the ALVINN autonomous car driving itself along normal roads at CMU in the mid-90s.

I was inspired by this video to see what I could build myself over the course of a weekend.

Read the full article here

Why Mobile Apps Suck When You’re Mobile.

In 2011, smartphones are ubiquitous and everyone and his dog is writing mobile apps, but using apps when you’re not in range of a fixed wifi hotspot or standing still in an urban area is often extremely frustrating. How often have you tried to refresh and found yourself staring at an interminable spinner that makes you want to throw your phone at the wall? Here’s why (and a plea to app developers to do something about it!) Read More

Arduino Temperature Logging

This graph shows the result of my weekend project - it’s the temperature in my living room, logged every 30 s to an app running on Google App Engine via an Arduino UNO with an Ethernet shield. I’m building out this project to be a little more generic and will then share the details so others can do the same.

Arduino PS2 Mouse Controlled RC Car!

I spent the weekend learning how to hack hardware with Arduino -  I built this mouse controlled RC car.  Fun to build and fun to play with!

Read the detailed HOWTO

DODOcase Factory Tour

On my latest trip to San Francisco, I was privileged to be able to drop in to the DODOcase factory and got a guided tour from chief DODO - Patrick. DODOcase use traditional bookbinding techniques (at a long-established local book binder) to produce a really neat book-like case for iPad and Kindle 3.

Patrick gave me a tour of their production facilities. In addition to the local book binder, they work with a nearby workshop which employs disabled San Franciscans who build the DODOcase frames by hand and attach them to the outers made at the book binder. It was impressive to see how DODOcase is helping provide work for their local community while turning out a top-quality product.