Serialization and deserialization (reposted from InfoQ)
http://www.infoq.com/cn/articles/serialization-and-deserialization
Introduction
The author works in Meituan's recommendation and personalization group, which is committed to providing Meituan users with high-quality personalized recommendation and ranking services at a scale of billions every day. The recommendation and re-ranking system needs many kinds of data services, ranging from terabyte-level user behavior data to gigabyte-level deal/POI data, and from users' real-time geographic location data, which must be served within milliseconds, to the data produced by regularly scheduled background jobs. Its clients include various internal services, the Meituan client, and the Meituan website. To provide high-quality data services and integrate well with upstream and downstream systems, the choice of serialization and deserialization protocol is often an important consideration in our system design.
This paper is organized as follows:
1、 Definitions and related concepts
The emergence of the Internet created a need for communication between machines, and the two communicating parties must follow an agreed-upon protocol. Serialization and deserialization are part of such a communication protocol. Communication protocols are usually built on a layered model, and different models define the function and granularity of each layer differently. For example, TCP/IP is a four-layer protocol suite, while OSI is a seven-layer model. In the OSI model, the main function of the presentation layer is to convert application-layer objects into a contiguous binary string or, conversely, to convert a binary string back into application-layer objects; these two functions are serialization and deserialization. Generally speaking, the application layer of TCP/IP corresponds to the application, presentation, and session layers of the OSI model, so in TCP/IP terms a serialization protocol is part of the application layer. The discussion of serialization protocols in this article is based mainly on the OSI seven-layer model.
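As a minimal sketch of these two directions, the Java program below uses the JDK's built-in object serialization (`ObjectOutputStream` and `ObjectInputStream`) to turn an application-layer object into a contiguous `byte[]` and back. The `PresentationLayerDemo` and `Point` names are hypothetical, and Java's built-in format is only a convenient illustration here, not a recommendation: it is Java-specific and therefore unsuitable for cross-language communication.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class PresentationLayerDemo {
    // Hypothetical application-layer object, used only for illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serialization: application-layer object -> contiguous binary string (byte[]).
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj); // closing the stream flushes everything into 'bytes'
        }
        return bytes.toByteArray();
    }

    // Deserialization: binary string (byte[]) -> application-layer object.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point original = new Point(3, 4);
        byte[] wire = serialize(original);
        Point copy = (Point) deserialize(wire);
        System.out.println(wire.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}
```

The printed byte count depends on the JDK's stream format, which records a stream header and a class descriptor (class name, field names and types) in addition to the field values, so it is considerably larger than the eight bytes of raw `int` data.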
Data structures, objects and binary strings
In different programming languages, data structures, objects, and binary strings are represented differently.
Data structures and objects: in a fully object-oriented language such as Java, everything an engineer manipulates is an object, created by instantiating a class. In Java, the concept closest to a data structure is the POJO (Plain Old Java Object) or JavaBean, that is, a class whose only methods are setters and getters.

Binary strings: the binary string produced by serialization is a block of data stored in memory. A C string can be used directly by the transport layer because it is essentially a binary string stored in memory and terminated by '\0'. In Java, the concept of a binary string is easily confused with String. In fact, String is a first-class citizen in Java and a special kind of object. For cross-language communication, serialized data cannot be a data type specific to one language. In Java, a binary string therefore means byte[], and byte is one of Java's eight primitive data types.
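To make the `byte[]` idea concrete, here is a sketch under our own assumptions: a hypothetical `User` POJO (private fields plus getters and setters) is encoded into a plain `byte[]` using a layout we define ourselves, namely a 4-byte big-endian id, a 4-byte big-endian length, and then the UTF-8 bytes of the name. Only bytes with an agreed layout cross the wire, never a Java `String` object, so a program in another language that knows the same layout could decode them. The class names and the layout are illustrative, not part of any real protocol.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryStringDemo {
    // Hypothetical POJO: private fields exposed only through getters and setters.
    public static class User {
        private int id;
        private String name;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Assumed layout: [id: 4 bytes][name length: 4 bytes][name: UTF-8 bytes], all big-endian.
    static byte[] encode(User u) {
        byte[] name = u.getName().getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(u.getId()).putInt(name.length).put(name);
        return buf.array();
    }

    // Reads the same layout back into a new User object.
    static User decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        User u = new User();
        u.setId(buf.getInt());
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        u.setName(new String(name, StandardCharsets.UTF_8));
        return u;
    }

    public static void main(String[] args) {
        User u = new User();
        u.setId(7);
        u.setName("caf\u00e9"); // "café": 4 chars in the String, 5 bytes in UTF-8
        byte[] wire = encode(u);
        System.out.println(u.getName().length() + " chars, " + wire.length + " bytes"); // 4 chars, 13 bytes
        System.out.println(Arrays.toString(Arrays.copyOfRange(wire, 0, 8))); // [0, 0, 0, 7, 0, 0, 0, 5]
        User back = decode(wire);
        System.out.println(back.getId() == 7 && back.getName().equals(u.getName())); // true
    }
}
```

We chose a length prefix rather than a C-style '\0' terminator so the decoder knows exactly how many bytes to read without scanning, and a name that happens to contain a zero byte is not cut short.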
2、 Serialization protocol properties
Each serialization protocol has its own strengths and weaknesses, and each was designed with particular application scenarios in mind. When designing a system, we need to consider every aspect of our serialization requirements, compare the characteristics of the candidate protocols comprehensively, and finally settle on a compromise.
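As a small illustration of why a compromise is usually needed, the sketch below (our own toy example, not any named protocol) encodes the same integers in two ways: as comma-separated decimal text and as fixed-width 4-byte binary. Neither encoding is smaller in every case, and only the text form is human-readable, which is the kind of trade-off such a comparison has to weigh. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.Collectors;

public class EncodingTradeoffDemo {
    // Text encoding: human-readable decimal numbers separated by commas, as UTF-8 bytes.
    static byte[] encodeText(int[] values) {
        String text = Arrays.stream(values)
                .mapToObj(v -> Integer.toString(v))
                .collect(Collectors.joining(","));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: every value takes exactly 4 bytes (ByteBuffer defaults to big-endian).
    static byte[] encodeBinary(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        int[] small = {1, 2, 3};
        int[] large = {1, 1000000, 2000000000};
        // Small values: "1,2,3" is 5 bytes, binary is 12 bytes.
        System.out.println(encodeText(small).length + " vs " + encodeBinary(small).length); // 5 vs 12
        // Large values: "1,1000000,2000000000" is 20 bytes, binary is still 12 bytes.
        System.out.println(encodeText(large).length + " vs " + encodeBinary(large).length); // 20 vs 12
    }
}
```

Encoded size is only one of the properties to weigh; readability, parsing cost, and cross-language support pull in different directions, which is why the final choice is a compromise rather than a single winner.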