Data Serialization in Embedded Systems

What is data serialization? Serialization is the process of converting data structures into a format that is easy to store and transmit (over a network or another communication channel). The serialized data can then be reconstructed in the same or even a different environment, a process called deserialization. Many programming languages, mainly high-level ones, ship with built-in solutions for this. In embedded systems, where C and a lightweight subset of C++ are used almost exclusively, there is no such standard mechanism.

Serialization in embedded systems vs. in high-level languages

In high-level programming languages, human-readable, text-based serialization formats like XML or JSON are often used. That’s not the case with embedded systems, which are typically highly resource-constrained. The serialization code should have a small footprint and low memory usage, and the serialized data should be as small as possible. For those reasons, a custom binary format seems to be the right direction. The cost of binary serialization is that the data is not human-readable, but that is often less important.
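
To make the idea of a compact binary format concrete, here is a minimal sketch of a hand-rolled encoder and decoder. The `sensor_reading_t` structure, the function names and the 7-byte little-endian layout are assumptions made up for this illustration; they do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application data, invented for this example. */
typedef struct {
    uint16_t sensor_id;
    int32_t  temperature_mc;  /* millidegrees Celsius */
    uint8_t  flags;
} sensor_reading_t;

/* Fixed wire size: 2 + 4 + 1 bytes, independent of struct padding. */
#define SENSOR_READING_WIRE_SIZE 7u

/* Writes the reading field by field in little-endian order.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t sensor_reading_serialize(const sensor_reading_t *in, uint8_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    if (cap < SENSOR_READING_WIRE_SIZE) {
        return 0;
    }
    uint32_t t = (uint32_t)in->temperature_mc;
    out[0] = (uint8_t)(in->sensor_id & 0xFFu);
    out[1] = (uint8_t)(in->sensor_id >> 8);
    out[2] = (uint8_t)(t & 0xFFu);
    out[3] = (uint8_t)((t >> 8) & 0xFFu);
    out[4] = (uint8_t)((t >> 16) & 0xFFu);
    out[5] = (uint8_t)((t >> 24) & 0xFFu);
    out[6] = in->flags;
    return SENSOR_READING_WIRE_SIZE;
}

/* Reads a reading back. Returns 1 on success, 0 if the input is too short. */
int sensor_reading_deserialize(const uint8_t *in, size_t len, sensor_reading_t *out)
{
    assert(in != NULL && out != NULL);
    if (len < SENSOR_READING_WIRE_SIZE) {
        return 0;
    }
    out->sensor_id = (uint16_t)(in[0] | ((uint16_t)in[1] << 8));
    uint32_t t = (uint32_t)in[2] | ((uint32_t)in[3] << 8) |
                 ((uint32_t)in[4] << 16) | ((uint32_t)in[5] << 24);
    out->temperature_mc = (int32_t)t;  /* assumes two's-complement targets */
    out->flags = in[6];
    return 1;
}
```

Writing the bytes one by one, rather than copying the struct with `memcpy`, keeps the format independent of the compiler's padding and of the CPU's byte order, which matters as soon as the data leaves the device.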

There are two main reasons for serializing data. First, you may want to save the current state of the application to persistent storage in order to load it later. Second, you may want to transmit data between different systems or applications. In some cases these applications may be written in completely different environments and programming languages. For example, you may need to connect an embedded device to a PC running a high-level Python application. Consider the purpose of serialization first, before choosing a method.

Schema-required or schema-less ways of serializing data?

There are basically two ways of serializing data. The first, schema-based approach requires you to define the structure of the serialized data. The schema should fully describe the protocol, that is, the format of the messages. Often an interface description language is used to specify this information, and a library-specific tool then generates the appropriate structures from that file. A great thing about this style is that the schema can be reused (even on different platforms) in all the modules or applications that communicate with each other. The schema can also be used to generate code in other languages, including high-level ones like Python or Java.

There is also a schemaless kind of data serialization. It fits well if the model to be serialized is already defined in your code, because you neither have to describe it redundantly in a schema nor rewrite your code to use auto-generated structures. It is also a good choice if serialization is not needed for communication and you don’t have to define a cross-platform protocol.

Google Protocol Buffers

Protocol Buffers is an open-source mechanism, created by Google, for serializing structured data. It is language- and platform-independent and is based on a schema defined in a simple, custom language. You write a *.proto file defining the messages that represent the structure of the data to serialize. Based on that file, the code generator creates language-specific code for working with structures that represent the messages. An example proto file (and an options file defining the maximum sizes of arrays) could look like this:

// proto file
syntax = "proto2";  // "required" fields exist only in proto2

message Event {
    required int32 id = 1;
    required string name = 2;
    required EventData eData = 3;
}
message EventData {
    required bytes data = 1;
}

# options file
Event.name      max_size:32
EventData.data  max_size:64

Google provides multiple official code generators supporting various languages: Java, C++, Python, Ruby, JavaScript, Objective-C and C#. They are, however, unsuited for embedded solutions. That’s where the benefits of open source appear. There are multiple implementations of Protocol Buffers for C, two of which are actively maintained: Nanopb and protobuf-c. Given the code size (a few times bigger for protobuf-c) and the use of dynamic memory allocation (also in protobuf-c), Nanopb should be the right choice for highly constrained embedded systems. The authors of the library prepared a benchmark that tests the compiled size, memory usage and execution time of the competing solutions.

Nanopb generates the following C structures from the example proto file presented above.

/* Struct definitions */
typedef PB_BYTES_ARRAY_T(64) EventData_data_t;
typedef struct _EventData {
	EventData_data_t data;
/* @@protoc_insertion_point(struct:EventData) */
} EventData;

typedef struct _Event {
	int32_t id;
	char name[32];
	EventData eData;
/* @@protoc_insertion_point(struct:Event) */
} Event;
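
To show what ends up on the wire for such a message, the sketch below encodes the `Event` message by hand, following the Protocol Buffers encoding rules (varints, and a key built from the field number and wire type). It is not Nanopb's API or its generated code: the function names `put_varint`, `put_bytes_field` and `encode_event` are hypothetical, and the caller is assumed to supply a large enough buffer. In a real project the library's encoder does this work for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Appends a base-128 varint; returns bytes written (at most 10). */
size_t put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80u) {
        out[n++] = (uint8_t)(value | 0x80u);  /* low 7 bits + continuation bit */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Appends a length-delimited field (wire type 2): key, length, payload. */
size_t put_bytes_field(uint8_t *out, uint32_t field, const void *data, size_t len)
{
    size_t n = put_varint(out, ((uint64_t)field << 3) | 2u);
    n += put_varint(out + n, len);
    if (len > 0) {
        memcpy(out + n, data, len);
    }
    return n + len;
}

/* Encodes the Event message from the example schema.
 * No capacity tracking here: a production encoder would check
 * the remaining space before every write. */
size_t encode_event(uint8_t *out, int32_t id, const char *name,
                    const uint8_t *data, size_t data_len)
{
    assert(out != NULL && name != NULL && data_len <= 64);

    /* EventData submessage: 1-byte key + 1-byte length + up to 64 bytes. */
    uint8_t nested[2 + 64];
    size_t nested_len = put_bytes_field(nested, 1, data, data_len);

    size_t n = put_varint(out, (1u << 3) | 0u);            /* field 1, varint */
    n += put_varint(out + n, (uint64_t)(int64_t)id);       /* int32 is sign-extended */
    n += put_bytes_field(out + n, 2, name, strlen(name));  /* field 2, string */
    n += put_bytes_field(out + n, 3, nested, nested_len);  /* field 3, submessage */
    return n;
}
```

For `id = 150`, `name = "hi"` and two data bytes, this produces 13 bytes in total. Field names never appear in the output, only field numbers, which is a large part of why the format is compact.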

FlatBuffers

FlatBuffers (with flatcc, the C language bindings) is another schema-based serialization library, originally created at Google. So, what’s the difference between it and Protocol Buffers? The distinctive feature of FlatBuffers is that the binary, serialized data can be accessed directly, without unpacking the whole structure (or “message” in Protobuf naming convention). Extremely fast serialization and deserialization come at the cost of a less compact encoding, which results in larger serialized messages. The difference varies with the use case, but I’ve seen serialized data about 40% larger compared to Nanopb.

The library introduces its own schema description language in *.fbs files in which the previous example would look like this:

table Event {
    id : int32;
    name : string;
    eData : EventData;
}
table EventData {
    data : [uint8];
}

The two schemas look pretty similar, but there is a major difference between Protocol Buffers and FlatBuffers from the programmer’s perspective. A Protocol Buffers library automatically handles serialization and deserialization with a single function call. You just need to create instances of the structures defined by the code generator and call a function to serialize them. That’s all! Protocol Buffers takes responsibility for properly serializing even complicated nested structures.

That’s not the case with FlatBuffers, where instead of data structures the library generates functions to serialize each field of your “message”. You need to write the serialization code yourself using the API provided by the library.
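
The idea of reading fields directly from the serialized buffer, without unpacking it first, can be illustrated with a deliberately simplified sketch. The layout below (a small table of offsets followed by the payloads) is an assumption invented for this example. It is not the real FlatBuffers binary format, and the accessor names are not part of flatcc's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical layout (NOT the real FlatBuffers format):
 *   bytes 0..1  offset of field 0 (uint32 id),   little-endian
 *   bytes 2..3  offset of field 1 (uint8 flags), little-endian
 *   bytes 4..   field payloads
 * An offset of 0 means the field is absent. */
enum { FIELD_ID = 0, FIELD_FLAGS = 1, FIELD_COUNT = 2 };
#define HEADER_SIZE (2u * FIELD_COUNT)

static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns a pointer to the field's payload inside the buffer (no copy),
 * or NULL if the field is absent or the offset is out of range. */
const uint8_t *field_ptr(const uint8_t *buf, size_t len, unsigned field, size_t size)
{
    assert(buf != NULL);
    if (field >= FIELD_COUNT || len < HEADER_SIZE) {
        return NULL;
    }
    uint16_t off = read_u16(buf + 2u * field);
    if (off < HEADER_SIZE || (size_t)off + size > len) {
        return NULL;  /* covers off == 0, i.e. an absent field */
    }
    return buf + off;
}

/* Accessors read one field on demand and fall back to a default. */
uint32_t event_id(const uint8_t *buf, size_t len, uint32_t fallback)
{
    const uint8_t *p = field_ptr(buf, len, FIELD_ID, 4);
    return p ? read_u32(p) : fallback;
}

uint8_t event_flags(const uint8_t *buf, size_t len, uint8_t fallback)
{
    const uint8_t *p = field_ptr(buf, len, FIELD_FLAGS, 1);
    return p ? *p : fallback;
}
```

Reading `id` touches only the offset table and four payload bytes; nothing else in the buffer is decoded or copied. The price is the extra offset table and fixed-width fields, which is in line with the larger messages mentioned above.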

MessagePack

MessagePack, with its C implementation called MPack, is a schemaless data serialization library suitable for embedded systems. The authors say it’s an extremely fast and lightweight solution. The lack of a schema can be an advantage or a disadvantage depending on your needs, but our tests showed that the library is not that lightweight when it comes to program size (about twice as large as Nanopb and flatcc) or serialized data size (the largest among the tested libraries).

However, it may still be a good choice in some cases. It’s well documented and easy to use: you only need to include two files in your source code. In our case the serialization process was relatively fast. By the way, the authors of the library prepared a benchmark comparing the available schemaless data serialization libraries.
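
To give a feel for what a schemaless format looks like on the wire, the sketch below hand-encodes a small map using a few type tags from the MessagePack format (fixmap, fixstr, positive fixint and the unsigned integer types). It does not use MPack: the `mini_` names are hypothetical helpers written for this illustration, and only short strings and small maps are supported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounded writer; ok drops to 0 once the buffer overflows. */
typedef struct {
    uint8_t *buf;
    size_t   cap;
    size_t   len;
    int      ok;
} mini_writer_t;

static void mini_put(mini_writer_t *w, const void *src, size_t n)
{
    if (!w->ok || n > w->cap - w->len) {
        w->ok = 0;
        return;
    }
    memcpy(w->buf + w->len, src, n);
    w->len += n;
}

/* fixmap: 0x80 | number of key/value pairs (up to 15). */
void mini_map(mini_writer_t *w, uint8_t pairs)
{
    assert(pairs <= 15);
    uint8_t tag = (uint8_t)(0x80u | pairs);
    mini_put(w, &tag, 1);
}

/* fixstr: 0xA0 | length (up to 31 bytes), followed by the bytes. */
void mini_str(mini_writer_t *w, const char *s)
{
    size_t n = strlen(s);
    assert(n <= 31);
    uint8_t tag = (uint8_t)(0xA0u | n);
    mini_put(w, &tag, 1);
    mini_put(w, s, n);
}

/* Unsigned integer in the smallest encoding; multi-byte values are big-endian. */
void mini_uint(mini_writer_t *w, uint32_t v)
{
    uint8_t b[5];
    if (v <= 0x7Fu) {            /* positive fixint */
        b[0] = (uint8_t)v;
        mini_put(w, b, 1);
    } else if (v <= 0xFFu) {     /* uint 8 */
        b[0] = 0xCC; b[1] = (uint8_t)v;
        mini_put(w, b, 2);
    } else if (v <= 0xFFFFu) {   /* uint 16 */
        b[0] = 0xCD; b[1] = (uint8_t)(v >> 8); b[2] = (uint8_t)v;
        mini_put(w, b, 3);
    } else {                     /* uint 32 */
        b[0] = 0xCE; b[1] = (uint8_t)(v >> 24); b[2] = (uint8_t)(v >> 16);
        b[3] = (uint8_t)(v >> 8); b[4] = (uint8_t)v;
        mini_put(w, b, 5);
    }
}
```

Encoding the map `{"id": 7, "name": "boot"}` means calling `mini_map(&w, 2)` followed by the two key/value pairs, and yields 15 bytes. The keys `id` and `name` travel inside every message, which is one plausible reason why schemaless output tends to be larger than schema-based output.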

Conclusion

There are plenty of libraries available for data serialization in embedded systems. Nanopb should be a very good choice if you want to rely strongly on a schema and both code size and serialized data size are key for you. However, none of the libraries is perfect, which is why it’s important to prepare a project-specific evaluation before making a choice.

Damian, Senior Software Engineer
