Monday, January 31, 2011

Pointer swizzling and Serialization

While reading up a bit on serialization I came across this nice little way of persisting a typical linked list node in C using pointer swizzling. 

A node like this:

struct node {
    int data;
    struct node *next;
};

cannot be persisted as-is, i.e. written to a file system or database with the raw value of the pointer, because the memory address stored in 'next' will be meaningless when the object is retrieved later. Instead, number the nodes in the list, store the number of the next node in place of the pointer, and regenerate the 'next' pointers from those numbers every time you re-create the list structure.

The persisted node will look like this:

struct node_saved {
    int data;
    int id_number;
    int id_number_of_next_node;
};
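
As a rough sketch of how the two structs relate, the conversion could look something like the code below. The function names list_unswizzle and list_swizzle and the NO_NEXT sentinel are my own assumptions for illustration, and the sketch assumes the nodes are numbered 0..n-1 in traversal order with the head as node 0. Writing the resulting array to a file (for example with fwrite) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

struct node_saved {
    int data;
    int id_number;
    int id_number_of_next_node;
};

/* Hypothetical sentinel meaning "no next node"; not part of the original post. */
#define NO_NEXT (-1)

/* Unswizzle: number the nodes in traversal order and replace each pointer
 * with the id of the node it points to. Writes at most cap records into
 * out and returns how many were written. */
size_t list_unswizzle(const struct node *head, struct node_saved *out, size_t cap)
{
    size_t n = 0;
    for (const struct node *p = head; p != NULL && n < cap; p = p->next, n++) {
        out[n].data = p->data;
        out[n].id_number = (int)n;
        /* The next node gets id n + 1, unless the list or the buffer ends here. */
        out[n].id_number_of_next_node =
            (p->next != NULL && n + 1 < cap) ? (int)(n + 1) : NO_NEXT;
    }
    return n;
}

/* Swizzle: rebuild real pointers from the saved ids. All nodes live in one
 * malloc'd array indexed by id, so the caller releases the whole list with
 * a single free() on the returned head (node 0). Returns NULL for an empty
 * list or on allocation failure. */
struct node *list_swizzle(const struct node_saved *in, size_t n)
{
    if (n == 0)
        return NULL;
    struct node *nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        int id = in[i].id_number;
        int next = in[i].id_number_of_next_node;
        /* Sanity checks only; real code should reject bad input gracefully. */
        assert(id >= 0 && (size_t)id < n);
        assert(next == NO_NEXT || (next >= 0 && (size_t)next < n));
        nodes[id].data = in[i].data;
        nodes[id].next = (next == NO_NEXT) ? NULL : &nodes[next];
    }
    return &nodes[0];
}
```

The ids here are just array positions, which keeps the lookup trivial. If the saved ids were arbitrary numbers, the load step would need a table mapping id to address instead.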

Also, one needs to take care of endianness when serializing across different machines: multi-byte types, including floating point values, can be stored in a different byte order on the two machines even when both implement the same IEEE standard, so the raw bytes written on one machine may be unreadable or misinterpreted on the other.
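
One common way around this is to pick a fixed byte order for the saved data and convert explicitly on both sides. Here is a minimal sketch that always writes big-endian; the helper names (put_u32_be and friends) are made up for this example, and it assumes float is a 32-bit IEEE 754 type on both machines.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Store v most significant byte first, whatever the host byte order is. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Rebuild the value from four big-endian bytes. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Copy the float's bit pattern into an integer, then store that integer
 * in the fixed byte order. memcpy avoids aliasing problems. */
void put_f32_be(unsigned char out[4], float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    put_u32_be(out, bits);
}

float get_f32_be(const unsigned char in[4])
{
    uint32_t bits = get_u32_be(in);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because the shifts work on values rather than on memory layout, the same code produces the same bytes on little-endian and big-endian hosts.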

The official Java API documentation for serialization gives a very good explanation of doing serialization with Java. One mainly needs to do the following to ensure proper serialization and de-serialization (in their words, maintaining consistency in reading/writing flattened objects):
  1. Implement the Serializable interface
  2. Not everything is serializable. It makes no sense to serialize threads or some streams (probably all of them). If you need to use them in your serializable class, ensure you mark them as transient.
  3. Define the private writeObject/readObject methods of the class to get more control over how the state is written and later restored.
  4. Keep a constant serialVersionUID field in your classes so that persisted objects can still be deserialized after compatible changes to the class.
  5. Reset or close streams to avoid caching problems. An ObjectOutputStream remembers the objects it has already written, so writing the same object a second time after changing it stores only a reference to the first copy, not the new state.
  6. There are performance issues: the default mechanism relies on reflection and writes class metadata into the stream.

Wikipedia also gives a comparison of different serialization formats for data (note the difference between data and programming logic, i.e. code).

Also a note on using threads in general: avoid threads when not required because they are difficult to manage. Preferably use event-driven programming for UI and distributed computing. Here is a popular presentation which explains in detail why. A simple set of examples showing the benefits of using threads in sorting, searching and matrix multiplication is given here. A more involved concurrent programming model in Java is explained here for a web service example.

I came across this new project under development by a couple of Google folks called Camlistore. It's entirely based on storing blobs (binary data chunks), indexing a pointer to each blob and signing it properly to ensure privacy control. It also has a protocol for mirroring the content. It has a fairly simple architecture but could be a useful standard for managing all of one's media (Dropbox + Greplin + Google Docs + all blog services). It does seem similar to an application built on top of a NoSQL implementation like Redis but has a nice standard SQL database feel to it.

 

Trivia

Seems like people are using open source in unmanned aerial vehicles too. Check out the OpenPilot project and the video below for some interesting stuff.

 

-- Pratik Mandrekar
