A brief history of tpl
Development of tpl began in 2005. The author was developing a multi-process server. The server had several
database "loader" sub-processes that generated over half a gig of data, in the form of C structures. These needed to
be read into the parent process.
Rumination
There were boring and interesting ways to approach this problem. The boring way would be to write the C structs to disk and read
them back in the parent.
The interesting way
There is nothing wrong with writing C structures to disk, but it'd be far more useful to solve the binary data
interchange problem at a more general level.
XML?
Sure, the half gig of binary data (mostly bit vectors encoded as integer arrays) could be stored in XML.
But binary data interchange isn't XML's strong suit. XML is inconvenient to parse in C, and it's extremely bulky.
Why sacrifice the speed of a modern processor by converting all its native data representation to text and back?
Binary skeptics
Many criticize binary formats because they're often proprietary, hard to inspect, non-editable and non-extensible. But I had in mind
an open binary format that could be easily inspected (via an XML conversion utility), and read or written through a standard API.
Inspiration from D-BUS
A Linux Journal article by Robert Love, entitled "Get on the D-BUS", sustained my interest in binary protocols (D-BUS itself sends
typed binary messages). D-BUS also had a clever approach to type-safety. Message types are expressed as a format string, and stored
in the message itself. That idea was too good not to steal. A format string like A(i) tells the recipient, "This is an array of
integers." But it also tells the API how to parse the binary payload.
Just add API
I wanted to find an API that would be compact and innocuous, one that didn't look out of place in a C program. I really wanted to
avoid the kind of API that requires writing a call for each data item being read or written. ("Now read an int. Now read a double.
Now read a char[2].") Mapping the format string to C variables solved that problem. By saying "pack index 1", the API could
do all the work of finding all those C variables and copying their current values, in one fell swoop.
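To make the mapping idea concrete, here is a minimal sketch of how a format string can be bound to C variables once and then packed or unpacked in a single call. It is an illustration only, not tpl's actual API or wire format: the names sketch_map, sketch_pack and sketch_unpack, the three format characters, and the flat native-layout buffer are all assumptions made for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: these names and this layout are NOT tpl's real API. */
#define SKETCH_MAX_FIELDS 8

typedef struct {
    const char *fmt;                 /* e.g. "if": an int32_t, then a double */
    void *addr[SKETCH_MAX_FIELDS];   /* one caller variable per format character */
    size_t nfields;
} sketch_map_t;

/* Size of the C type a format character stands for (0 if unknown).
 * The character set 'i', 'f', 'c' is an assumption of this sketch. */
size_t sketch_field_size(char c) {
    switch (c) {
    case 'i': return sizeof(int32_t);
    case 'f': return sizeof(double);
    case 'c': return sizeof(char);
    default:  return 0;
    }
}

/* Remember the address of one variable per format character.
 * Callers pass each address as (void *)&var. Returns 0 on success, -1 on error. */
int sketch_map(sketch_map_t *m, const char *fmt, ...) {
    va_list ap;
    size_t n = strlen(fmt);
    if (n > SKETCH_MAX_FIELDS) return -1;
    for (size_t i = 0; i < n; i++)
        if (sketch_field_size(fmt[i]) == 0) return -1;
    m->fmt = fmt;
    m->nfields = n;
    va_start(ap, fmt);
    for (size_t i = 0; i < n; i++)
        m->addr[i] = va_arg(ap, void *);
    va_end(ap);
    return 0;
}

/* Copy the current value of every mapped variable into buf, in format order.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t sketch_pack(const sketch_map_t *m, unsigned char *buf, size_t cap) {
    size_t off = 0;
    for (size_t i = 0; i < m->nfields; i++) {
        size_t sz = sketch_field_size(m->fmt[i]);
        if (off + sz > cap) return 0;
        memcpy(buf + off, m->addr[i], sz);
        off += sz;
    }
    return off;
}

/* The reverse: copy bytes from buf back into the mapped variables.
 * Returns the number of bytes consumed, or 0 if buf is too short. */
size_t sketch_unpack(const sketch_map_t *m, const unsigned char *buf, size_t len) {
    size_t off = 0;
    for (size_t i = 0; i < m->nfields; i++) {
        size_t sz = sketch_field_size(m->fmt[i]);
        if (off + sz > len) return 0;
        memcpy(m->addr[i], buf + off, sz);
        off += sz;
    }
    return off;
}
```

With such a mapping, the caller binds its variables once, for example sketch_map(&m, "if", (void *)&id, (void *)&score), and every later sketch_pack call picks up whatever values those variables hold at that moment. No per-item read or write calls are needed.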
Quanta
I wanted the serialized form to constitute a discrete unit, or packet. A packet encapsulates an entire package of data. If several
packets are stored in a file back-to-back, or sent across a socket, I wanted the recipient to be able to discern the byte boundaries
of the individual packets. The key would be to have a fixed header section that encoded the overall packet length, along with other
things such as the byte order and the format string.
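The role of the length field can be sketched as follows. This is a simplified, hypothetical header (a 3-byte marker, one flags byte, and a 4-byte total length), not tpl's real layout. It leaves out the format string, and the names pkt_write and pkt_next are invented for the example. The point is only that a recipient can find each packet boundary in a back-to-back stream by reading the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header for illustration; tpl's real header differs. */
#define PKT_MAGIC      "pkt"  /* 3 identifying bytes */
#define PKT_HEADER_LEN 8      /* magic (3) + flags (1) + total length (4) */

/* Write header + payload into out. The total length (header included) is
 * stored as 4 bytes, most significant first, so it reads the same on any host.
 * Returns the total packet length, or 0 if it does not fit. */
size_t pkt_write(unsigned char *out, size_t cap,
                 const void *payload, uint32_t payload_len) {
    if (payload_len > UINT32_MAX - PKT_HEADER_LEN) return 0;
    uint32_t total = PKT_HEADER_LEN + payload_len;
    if (total > cap) return 0;
    memcpy(out, PKT_MAGIC, 3);
    out[3] = 0;  /* flags byte: could record payload byte order; unused here */
    out[4] = (unsigned char)(total >> 24);
    out[5] = (unsigned char)(total >> 16);
    out[6] = (unsigned char)(total >> 8);
    out[7] = (unsigned char)total;
    if (payload_len > 0) memcpy(out + PKT_HEADER_LEN, payload, payload_len);
    return total;
}

/* Inspect the packet at the start of buf. Returns its total length so the
 * caller can step to the next packet, or 0 if the header is incomplete, the
 * marker is wrong, or the stated length does not fit in the bytes available. */
size_t pkt_next(const unsigned char *buf, size_t len) {
    if (len < PKT_HEADER_LEN) return 0;
    if (memcmp(buf, PKT_MAGIC, 3) != 0) return 0;
    uint32_t total = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                     ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    if (total < PKT_HEADER_LEN || total > len) return 0;
    return total;
}
```

A reader walking a file or socket buffer would call pkt_next repeatedly, advancing by the returned length each time. A return of 0 means either "wait for more bytes" or "this is not a packet", which a fuller design would likely distinguish.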
IPC
These packets provide a convenient basis for inter-process messaging (IPC). Sending and receiving whole messages (which is how we
construe a packet when it's used for IPC) is much more convenient than using lower-level socket reads/writes. The API would have to
let us load or emit a whole packet at a time.
Validity
A strictly-defined binary packet format would also allow the recipient to validate its well-formedness programmatically. No DTD?
Great!
Late nights
Writing open-source projects is not for the weakly committed. All the ingredients were coming together, but it took a long time for
me to get the internals of the tpl data structure right. In particular, the first version of tpl (Sep 2005) didn't support nested
arrays.
One lesson I learned in developing tpl and earlier projects is that you'll find more bugs if you compile and test on two (or more)
distinct OS/compiler suites. I repeatedly found bugs under Solaris that were invisible under Linux. As for debuggers, gdb is great,
but Sun's dbx with its "check -access" is invaluable.
A time to be born
I managed to get tpl ready to release in September 2006. You never really finish a program; you just decide to stop fiddling with
it. At least, I have for now.