Data Serialization in Automation - Everything you need to know

Endfinity · 21 May 2022 22:39

1. Introduction

This article covers the data serialization used by Automation and explains how to read, interpret and write serialized data. This knowledge was used to make Techpool Unleashed and DotCarDumper.

1.1 What is data serialization?

Serialization is the process of converting data into a format that can be stored or transmitted. In Automation's case that format is binary: instead of reading and writing text, we work directly with bytes. For a more detailed explanation of serialization, refer to the Wikipedia page on the topic.

1.2 Where does Automation apply serialization?

Automation uses serialized data in a few places. For example, .car files consist entirely of serialized data. Serialization also appears in some columns of the database, including, but not limited to, fixtures, paints and tech pool.

2. Reading and writing Automation serialization

As stated above, serialized data is stored in binary. There are several ways to read bytes from data buffers or binary files. One way is to open the data in a hex editor. Beware, however, that most of the data is not human-readable, and working with it manually is unlikely to be much fun.
A better alternative is to write software that reads and writes the data in a way Automation understands.
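Even with your own tools, it helps to look at the raw bytes while working out a file's layout. As a minimal sketch, the following Python function formats data the way a hex editor displays it (offset, hex bytes, printable characters). The function name `hex_dump` and the file name in the usage comment are placeholders, not part of any existing tool.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)

# Usage (the file name is a placeholder):
# with open("example.car", "rb") as f:
#     print(hex_dump(f.read(256)))
```

Identifier characters such as T, N and S show up in the text column, which makes them easy to spot.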

2.1 Data types

When creating your own software for this purpose, it is important to understand which data types Automation's serialization uses, since not every programming language supports all of them natively.
The following table lists all data types you may encounter when working with Automation's serialized data:

Type	Size	Description
Int32	4 bytes	A 32-bit signed integer. It is not used for stored values; it only appears alongside other data types, e.g. as a string length or table size
Double	8 bytes	A floating-point number used for all numeric values. The terms "Double" and "Numeric" are used interchangeably below
UTF-8 Char	1 byte*	A single character. Characters are used as data identifiers that define how the following data should be read. A sequence of characters forms a string, which may be of any length
Boolean	1 byte	A true/false value, stored as the byte 0x30 (ASCII "0") for false and 0x31 (ASCII "1") for true

* Characters outside the ASCII range take 2 to 4 bytes in UTF-8.
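A minimal Python sketch of reading and writing these primitive types with the standard `struct` module follows. The article does not state the byte order, so the sketch assumes little-endian (`<`); check this against real data before relying on it. All function names are hypothetical helpers, not taken from Techpool Unleashed or DotCarDumper.

```python
import struct

# Assumption: little-endian byte order (not stated in the article).
INT32 = struct.Struct("<i")
DOUBLE = struct.Struct("<d")

def read_int32(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a 4-byte signed integer; return (value, new position)."""
    (value,) = INT32.unpack_from(buf, pos)
    return value, pos + INT32.size

def read_double(buf: bytes, pos: int) -> tuple[float, int]:
    """Read an 8-byte IEEE 754 double; return (value, new position)."""
    (value,) = DOUBLE.unpack_from(buf, pos)
    return value, pos + DOUBLE.size

def read_bool(buf: bytes, pos: int) -> tuple[bool, int]:
    """Booleans are one byte: 0x30 ('0') is false, 0x31 ('1') is true."""
    byte = buf[pos]
    if byte not in (0x30, 0x31):
        raise ValueError(f"not a boolean byte: {byte:#04x}")
    return byte == 0x31, pos + 1

def write_int32(value: int) -> bytes:
    return INT32.pack(value)

def write_double(value: float) -> bytes:
    return DOUBLE.pack(value)

def write_bool(value: bool) -> bytes:
    return b"\x31" if value else b"\x30"

# Round trip: each reader returns the value and the offset just past it.
data = write_int32(3) + write_double(1.5) + write_bool(True)
count, pos = read_int32(data, 0)
number, pos = read_double(data, pos)
flag, pos = read_bool(data, pos)
```

Returning the new offset from every reader keeps the parsing position explicit, which matters because the format has no fixed layout.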

2.2 Data identifiers

Automation's serialization has no fixed structure or data order, so identifiers are needed to define how the data that follows should be read. Each identifier is a single character that specifies how to read the data immediately after it. The following table lists the data identifiers you may encounter in Automation.

Identifier	Data	Description
T (0x54)	Table	Tables are dictionaries that hold all of the serialized data. The identifier is followed by two Int32s that define how many items the table contains. Only one of the two ever seems to be non-zero, so the other can be ignored. Tables are covered in more detail below
N (0x4E)	Numeric	The identifier is directly followed by a Double value
S (0x53)	String	Strings are used for table keys and values. The identifier is followed by an Int32 that defines the length of the string, which is in turn followed by that many UTF-8 characters

Note that there is no identifier for boolean values; they have to be recognized by their possible values (0x30 and 0x31) instead.
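Putting the identifiers together, a value can be written as its identifier byte followed by its data, and read back by dispatching on that first byte. The Python sketch below handles numbers, strings and booleans only; tables are left out. It assumes little-endian byte order and that a string's Int32 length counts UTF-8 bytes, neither of which is confirmed above, and `encode_value`/`decode_value` are hypothetical names.

```python
import struct

# Assumptions: little-endian numbers; string length counts UTF-8 bytes.

def encode_value(value) -> bytes:
    """Encode a bool, number or string with its identifier."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        # Booleans have no identifier; the byte itself marks the value.
        return b"1" if value else b"0"
    if isinstance(value, (int, float)):
        # 'N' followed by an 8-byte double; all numbers are stored as doubles.
        return b"N" + struct.pack("<d", float(value))
    if isinstance(value, str):
        raw = value.encode("utf-8")
        # 'S' followed by an Int32 length and then the characters.
        return b"S" + struct.pack("<i", len(raw)) + raw
    raise TypeError(f"unsupported type: {type(value).__name__}")

def decode_value(buf: bytes, pos: int = 0):
    """Decode one non-table value at pos; return (value, new position)."""
    tag = buf[pos]
    pos += 1
    if tag == 0x30:  # '0' -> false
        return False, pos
    if tag == 0x31:  # '1' -> true
        return True, pos
    if tag == 0x4E:  # 'N' -> Double
        (number,) = struct.unpack_from("<d", buf, pos)
        return number, pos + 8
    if tag == 0x53:  # 'S' -> Int32 length + UTF-8 bytes
        (length,) = struct.unpack_from("<i", buf, pos)
        pos += 4
        return buf[pos:pos + length].decode("utf-8"), pos + length
    raise ValueError(f"unknown or unsupported identifier {tag:#04x} at {pos - 1}")
```

For example, `encode_value("Engine")` produces `b"S\x06\x00\x00\x00Engine"` under these assumptions.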

2.3 Tables

All serialized data is nested in tables. Tables are dictionaries, which contain pairs of keys and values. In Automation's serialization, both keys and values can be of any type. There are two kinds of tables you may find in Automation:

Top-level tables, which are meant to be treated separately
Nested tables, which are stored as a value of another table

Examples of top-level tables include .car files and some items such as fixtures, tech pool and paints. Top-level tables are not values themselves and therefore have no key of their own; instead, their identifier is preceded by a single 0x01 byte. (They do technically have a key when they are stored as string values.) In .car files, every such table other than the main container is stored as a string value.

Nested tables are values that have a key and are not stored as strings. Examples can be found in .car files, where the data is split into tables such as models, trims, families and variants.

The T identifier is followed by two Int32s: one of them is always 0 and the other defines the size of the table. Which of the two holds the size varies, but only one is ever non-zero.
The size counts the key/value pairs in the table (one pair counts as 1), not the size in bytes.
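The rules above are enough for a small recursive reader. The following Python sketch decodes a top-level table (0x01 followed by T) into nested dictionaries, taking whichever of the two Int32s is non-zero as the pair count. It assumes little-endian byte order and byte-counted string lengths. Because Python dictionary keys must be hashable, it cannot handle a table used as a key. As a design choice of this sketch, a string value that itself starts with 0x01 T is returned as raw bytes so it can be parsed separately. The function names are hypothetical.

```python
import struct

# Assumptions: little-endian numbers; string length counts UTF-8 bytes.

def _int32(buf: bytes, pos: int):
    return struct.unpack_from("<i", buf, pos)[0], pos + 4

def read_value(buf: bytes, pos: int):
    """Read one tagged value (table, number, string or boolean)."""
    tag = buf[pos]
    pos += 1
    if tag == 0x54:  # 'T': two Int32s, only one of which is non-zero
        first, pos = _int32(buf, pos)
        second, pos = _int32(buf, pos)
        pair_count = first or second  # counts key/value pairs, not bytes
        table = {}
        for _ in range(pair_count):
            key, pos = read_value(buf, pos)  # a table key would be unhashable
            value, pos = read_value(buf, pos)
            table[key] = value
        return table, pos
    if tag == 0x4E:  # 'N': Double
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    if tag == 0x53:  # 'S': Int32 length followed by the characters
        length, pos = _int32(buf, pos)
        raw = buf[pos:pos + length]
        # A top-level table stored as a string value is kept as raw bytes
        # so it can be passed to read_top_level() on its own.
        if raw.startswith(b"\x01T"):
            return raw, pos + length
        return raw.decode("utf-8"), pos + length
    if tag in (0x30, 0x31):  # booleans are identified by their value byte
        return tag == 0x31, pos
    raise ValueError(f"unknown identifier {tag:#04x} at offset {pos - 1}")

def read_top_level(buf: bytes) -> dict:
    """Top-level tables start with a 0x01 byte before the 'T' identifier."""
    if buf[:2] != b"\x01T":
        raise ValueError("not a top-level table")
    table, _ = read_value(buf, 1)
    return table
```

Since the size fields give pair counts, the reader never needs a table's byte length: it reads that many key/value pairs and returns the offset where the table ended.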

Conclusion

The information above covers the fundamentals of Automation's serialization and should be enough to start working with the data. All that is left is to choose a suitable programming language and write your own software to handle any of Automation's serialized data. Unless, of course, you are set on editing everything manually in a hex editor, in which case, bless your soul.

Endfinity · 16 October 2022 15:48

The article has now been updated with newer information, and some mistakes have been fixed.
A new example of how to handle the serialized data is now available in the form of DotCarDumper.