
Data Serialization in Embedded Systems

What is data serialization?

Serialization is the process of converting data structures into a format that is easy to store or transmit (over a network or another communication channel). The serialized data can later be reconstructed, in the same or a different environment, in the reverse process called deserialization. Many programming languages, mainly high-level ones, ship with their own solutions for this. In embedded systems, where C and a lightweight subset of C++ are used almost exclusively, there is no such standard mechanism.

Serialization in embedded systems vs. in high-level languages

In high-level programming languages, human-readable, text-based serialization formats such as XML or JSON are often used. That’s not the case with embedded systems, which are typically highly resource-constrained. The serialization code should have a small footprint and low memory usage, and the serialized data should be as small as possible. For those reasons, a compact binary format seems to be the right direction to go. The cost of binary serialization is that the data is not human-readable, but that is often less important.
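To make the idea of a compact binary format concrete, here is a minimal hand-written sketch. The `Sample` record, its 6-byte layout and the function names are assumptions made up for this illustration; they do not come from any of the libraries discussed later. The fields are written byte by byte in little-endian order, so the result does not depend on the CPU’s endianness or on struct padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application record, invented for this example. */
typedef struct {
    uint16_t sensor_id;
    int32_t  value;
} Sample;

#define SAMPLE_WIRE_SIZE 6u /* 2 bytes sensor_id + 4 bytes value */

/* Writes the record as little-endian bytes.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t sample_serialize(const Sample *s, uint8_t *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    if (len < SAMPLE_WIRE_SIZE) {
        return 0;
    }
    uint32_t v = (uint32_t)s->value; /* keep the two's complement bit pattern */
    buf[0] = (uint8_t)(s->sensor_id & 0xFFu);
    buf[1] = (uint8_t)(s->sensor_id >> 8);
    buf[2] = (uint8_t)(v & 0xFFu);
    buf[3] = (uint8_t)((v >> 8) & 0xFFu);
    buf[4] = (uint8_t)((v >> 16) & 0xFFu);
    buf[5] = (uint8_t)((v >> 24) & 0xFFu);
    return SAMPLE_WIRE_SIZE;
}

/* Rebuilds the record from the byte buffer.
 * Returns 1 on success, 0 if the buffer is too short. */
static int sample_deserialize(Sample *s, const uint8_t *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    if (len < SAMPLE_WIRE_SIZE) {
        return 0;
    }
    s->sensor_id = (uint16_t)((uint16_t)buf[0] | ((uint16_t)buf[1] << 8));
    uint32_t v = (uint32_t)buf[2]
               | ((uint32_t)buf[3] << 8)
               | ((uint32_t)buf[4] << 16)
               | ((uint32_t)buf[5] << 24);
    s->value = (int32_t)v; /* implementation-defined above INT32_MAX, two's complement on common compilers */
    return 1;
}
```

Writing this by hand is fine for one or two records, but it has to be repeated and kept in sync for every structure, which is the work the libraries below take over.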

There are two main reasons for doing serialization. Firstly, you may want to save the current state of the application to persistent storage in order to load it later. Secondly, you may want to transmit data between different systems or applications. In some cases these applications are written in completely different environments and programming languages. For example, you may need to connect an embedded device to a PC running a high-level Python application that it has to communicate with. Consider the purpose of data serialization first, before choosing a method.


Schema-required or schema-less ways of serializing data?

There are basically two ways of serializing data. The first, schema-based approach requires some way of defining the structure of the serialized data. The schema should fully describe the protocol, that is, the format of the messages. Often some kind of interface description language is used to specify this information, and a library-specific tool then generates the appropriate structures from that file. A great thing about this style of data serialization is that the schema can be reused (even on different platforms) by all the modules or applications that communicate with each other. The same schema can also be used to generate code in other languages, including high-level ones like Python or Java.

There is also a schemaless kind of data serialization. It will perfectly fit your needs if you already have the model to be serialized defined in your code, as there will be no overhead to either describe it redundantly in the schema or rewrite it to use the auto-generated structures. It will also be a perfect choice if the data serialization is not needed for communication and you don’t have to define the cross-platform protocol.

Google Protocol Buffers

Protocol Buffers is an open-source mechanism for serializing structured data, created by Google. The mechanism is language- and platform-independent and is based on a schema defined in a simple, custom language. You write a *.proto file with the definitions of the messages that represent the structure of the data to serialize. Based on that, the code generator creates language-specific code for working with structures that represent the messages. An example proto file (and an options file defining the size of arrays) could look like this:

// proto file
message Event {
    required int32 id = 1;
    required string name = 2;
    required EventData eData = 3;
}

message EventData {
    required bytes data = 1;
}

# options file
Event.name      max_size:32
EventData.data  max_size:64

Google provides official code generators for several languages: Java, C++, Python, Ruby, JavaScript, Objective-C and C#. They are, however, unsuited to embedded targets. That’s where the benefits of open source show: there are multiple implementations of Protocol Buffers for C, and two of them are actively maintained, Nanopb and protobuf-c. Given the code size (a few times bigger for protobuf-c) and the use of dynamic memory allocation (also in protobuf-c), Nanopb should be the right choice for highly constrained embedded systems. The authors of the library have prepared a benchmark that compares the compiled size, memory usage and execution time of the competing solutions.

Nanopb generates the following C structures from the example proto file presented above:

/* Struct definitions */
typedef PB_BYTES_ARRAY_T(64) EventData_data_t;
typedef struct _EventData {
	EventData_data_t data;
/* @@protoc_insertion_point(struct:EventData) */
} EventData;

typedef struct _Event {
	int32_t id;
	char name[32];
	EventData eData;
/* @@protoc_insertion_point(struct:Event) */
} Event;
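Part of the reason Protocol Buffers messages stay small is the wire format itself: integer fields are written as a one-byte key (for low field numbers) followed by a variable-length “varint”. The sketch below encodes a field such as `id` from the `Event` message by hand. It only illustrates the documented wire format; it is not Nanopb’s API, and the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes value as a base-128 varint: 7 payload bits per byte, with the
 * top bit set on every byte except the last one.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t varint_encode(uint64_t value, uint8_t *buf, size_t len)
{
    size_t n = 0;
    assert(buf != NULL);
    do {
        if (n >= len) {
            return 0;
        }
        uint8_t byte = (uint8_t)(value & 0x7Fu);
        value >>= 7;
        if (value != 0) {
            byte |= 0x80u; /* more bytes follow */
        }
        buf[n++] = byte;
    } while (value != 0);
    return n;
}

/* Encodes one int32 field: key = (field_number << 3) | wire type 0,
 * followed by the value as a varint.
 * Returns the total number of bytes written, or 0 on overflow. */
static size_t encode_int32_field(uint32_t field_number, int32_t value,
                                 uint8_t *buf, size_t len)
{
    size_t k = varint_encode((uint64_t)field_number << 3, buf, len);
    if (k == 0) {
        return 0;
    }
    /* Negative int32 values are sign-extended to 64 bits on the wire. */
    size_t v = varint_encode((uint64_t)(int64_t)value, buf + k, len - k);
    return (v == 0) ? 0 : k + v;
}
```

With this encoding, `id = 150` in field 1 takes three bytes (0x08 0x96 0x01), while a negative `int32` takes eleven, which is why the schema language also offers `sint32` for values that are often negative.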


FlatBuffers (with flatcc, the C language bindings) is another schema-based serialization library, originally created at Google. So, what’s the difference between FlatBuffers and Protocol Buffers? The distinctive feature of FlatBuffers is that the binary, serialized data can be accessed directly without unpacking the whole structure (or “message” in the Protobuf naming convention). Extremely fast serialization and deserialization come at the cost of a less compact encoding, which results in larger serialized messages. It may vary depending on the use case, but I’ve seen about 40% larger serialized data compared to Nanopb.
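The idea of reading serialized data in place can be sketched in a few lines of plain C. Note the assumptions: the fixed layout below (a 4-byte id, a 1-byte name length, then the name bytes) is invented for this illustration and is much simpler than the real FlatBuffers format, and the accessor names are hypothetical rather than flatcc’s generated API. The point is only that each accessor reads straight from the received buffer, with no intermediate structure and no copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real FlatBuffers format:
 *   bytes 0..3  id, little-endian uint32
 *   byte  4     name length
 *   bytes 5..   name characters (not NUL-terminated) */
enum { OFF_ID = 0, OFF_NAME_LEN = 4, OFF_NAME = 5 };

/* Reads the id directly from the serialized buffer. */
static uint32_t event_id(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint32_t)buf[OFF_ID]
         | ((uint32_t)buf[OFF_ID + 1] << 8)
         | ((uint32_t)buf[OFF_ID + 2] << 16)
         | ((uint32_t)buf[OFF_ID + 3] << 24);
}

/* Returns a pointer into the buffer itself (no copy) and stores the
 * name length in *len. */
static const char *event_name(const uint8_t *buf, size_t *len)
{
    assert(buf != NULL && len != NULL);
    *len = buf[OFF_NAME_LEN];
    return (const char *)&buf[OFF_NAME];
}
```

A reader that only needs `id` never touches the rest of the message. The price is a layout with fixed-width fields and offsets, which is one reason this style of encoding tends to be larger than a packed one.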

The library introduces its own schema description language in *.fbs files in which the previous example would look like this:

table Event {
    id : uint32;
    name : string;
    eData : EventData;
}

table EventData {
    Data : [uint8];
}
The two schemas look pretty similar, but there is a major difference between Protocol Buffers and FlatBuffers from the programmer’s perspective. The Protocol Buffers library handles serialization and deserialization automatically, with only one function call each. You just need to create instances of the structures defined by the code generator and call a function to serialize them. That’s all! Protocol Buffers takes responsibility for properly serializing even complicated nested structures.

That’s not the case with FlatBuffers. Instead of data structures, the library generates functions for serializing each field of your “message”, and you need to write the serialization code yourself using that API.



MessagePack, with its C implementation called MPack, is a schemaless data serialization library suitable for embedded systems. Its authors describe it as an extremely fast and lightweight solution. The lack of a schema can be an advantage or a disadvantage depending on your needs, but our tests showed that the library is not that lightweight when it comes to program size (about twice as large as with Nanopb and flatcc) or serialized data size (the largest among the tested libraries).

However, it still may be a good choice in some cases. It’s well documented and easy to use—you only need to include two files into your source code. In our case the serialization process was relatively fast. By the way, the authors of the library prepared a benchmark comparison analysis of the available schemaless data serialization libraries.
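One reason schemaless output tends to be larger is that the data has to describe itself: when a structure is stored as a map, the field names travel with every message. The sketch below encodes the one-entry map {"id": 7} by hand, following the MessagePack format specification. It is not MPack’s API; the function name and its restrictions (a key shorter than 32 bytes, a value from 0 to 127) are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encodes the map { key: value } using three MessagePack types:
 *   fixmap  (0x80 | number of entries)
 *   fixstr  (0xA0 | string length), followed by the raw bytes
 *   positive fixint (the value itself, 0..127)
 * Returns the number of bytes written, or 0 if the inputs do not fit. */
static size_t msgpack_encode_small_pair(const char *key, uint8_t value,
                                        uint8_t *buf, size_t len)
{
    assert(key != NULL && buf != NULL);
    size_t klen = strlen(key);
    if (klen > 31 || value > 0x7Fu || len < klen + 3) {
        return 0;
    }
    buf[0] = 0x81u;                    /* fixmap with one entry */
    buf[1] = (uint8_t)(0xA0u | klen);  /* fixstr header */
    memcpy(&buf[2], key, klen);        /* key bytes, no terminator */
    buf[2 + klen] = value;             /* positive fixint */
    return klen + 3;
}
```

For {"id": 7} this produces five bytes (0x81 0xA2 0x69 0x64 0x07), whereas the same field in the Protocol Buffers wire format takes two (0x08 0x07), because there the field name lives in the schema rather than in the message.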


There are plenty of libraries available for handling data serialization in embedded systems. Nanopb should be a very good choice if you want to rely strongly on a schema and both code size and serialized data size are key concerns for you. However, none of the libraries is perfect, which is why it’s important to prepare a project-specific evaluation before making a choice.


Damian, Senior Software Engineer