Using and maintaining Zorba plan serialization
Each class to be serialized must implement a serialize() function and declare a few macros. This function receives an Archiver object, which you use to serialize the content of the class. The same serialize() function is called both for serializing out (saving) and for serializing in (loading), so you don't have to maintain two separate functions.
Inside the serialize() function you have to call operator & for each member of the class that you want serialized. The members must be loaded in the same order in which they were saved: the Archiver does not save the names of the objects, only their types, so the order is important.
Operator & is predefined for almost all simple types. If you need it for a new type, you can implement the operator yourself, following the models in the file class_serializer.cpp.
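To make the single-function idea concrete, here is a minimal, self-contained sketch. ToyArchiver, Point and the member names are hypothetical and far simpler than Zorba's Archiver (it records no types, versions or pointers). It shows one serialize() that both saves and loads, and an operator & written for a new type on top of the predefined int overload.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, much-simplified stand-in for Zorba's Archiver: it keeps
// ints in a flat buffer in the order operator& is called, with no names.
class ToyArchiver {
 public:
  explicit ToyArchiver(bool out) : out_(out) {}

  // "Predefined" operator& for a simple type: append when saving,
  // read the next value when loading.
  ToyArchiver& operator&(int& v) {
    if (out_) {
      buf_.push_back(v);
    } else {
      v = buf_.at(pos_++);
    }
    return *this;
  }

  std::vector<int> buf_;  // public only to keep the sketch short
  std::size_t pos_ = 0;

 private:
  bool out_;
};

// A new type without a predefined operator&: implement it on top of the
// existing overloads, always visiting the fields in the same order.
struct Point {
  int x = 0;
  int y = 0;
};

ToyArchiver& operator&(ToyArchiver& ar, Point& p) {
  ar & p.x;
  ar & p.y;
  return ar;
}

class Example {
 public:
  int m1 = 0;
  Point pos;

  // One function for both directions.
  void serialize(ToyArchiver& ar) {
    ar & m1;
    ar & pos;
  }
};
```

Saving writes the values 7, 3, 4 for m1 = 7 and pos = (3, 4); loading reads them back in the same sequence. Swapping the two ar & lines in serialize() would make old data load into the wrong members, which is why the order matters.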
All classes to be serialized must be derived from zorba::serialization::SerializeBaseClass. Note that the RCObject class is already derived from SerializeBaseClass.
You also have to invoke a set of macros in the header file and in the cpp file where the class is defined.
For example:
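// in the header file
#include "zorbaserialization/serialization_engine.h"

class Example : public ::zorba::serialization::SerializeBaseClass
{
  int m1;

public:
  SERIALIZABLE_CLASS(Example)
  SERIALIZABLE_CLASS_CONSTRUCTOR(Example)

  // Needed to create Example objects normally: declaring the Archiver
  // constructor above suppresses the implicit default constructor.
  Example() : m1(0) {}

  void serialize(::zorba::serialization::Archiver &ar)
  {
    ar & m1;
  }
};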
//in cpp file
SERIALIZABLE_CLASS_VERSIONS(Example)
END_SERIALIZABLE_CLASS_VERSIONS(Example)
The first macro used inside the class is SERIALIZABLE_CLASS(class_name). It defines the class factory and several functions related to versioning. The name of the macro depends on the kind of class:
Macro | Use |
SERIALIZABLE_CLASS(class_name) | For normal classes, which can be instantiated. |
SERIALIZABLE_ABSTRACT_CLASS(class_name) | For abstract classes, which cannot be instantiated. The class factory is registered only to provide dynamic pointer casting. |
SERIALIZABLE_TEMPLATE_CLASS(class_name) | For template classes, which can be instantiated after specialization. |
SERIALIZABLE_TEMPLATE_ABSTRACT_CLASS(class_name) | For abstract template classes. |
SERIALIZABLE_CLASS_NO_FACTORY(class_name) | Optional; for classes that don't need a class factory for creating objects or casting pointers. I used it for some base template classes, like Batcher, to avoid declaring the cpp macros for each of their specializations. |
The second macro, SERIALIZABLE_CLASS_CONSTRUCTOR(class_name), defines the special constructor used by serialization. This constructor has the form:
Example(::zorba::serialization::Archiver &ar) : ::zorba::serialization::SerializeBaseClass() {}
The Archiver calls this constructor when it creates objects during loading. The macro is only a helper; you can also declare the constructor by hand.
There are several variations of this macro:
Macro | Use |
SERIALIZABLE_CLASS_CONSTRUCTOR(class_name) | For classes derived directly from SerializeBaseClass. |
SERIALIZABLE_CLASS_CONSTRUCTOR2(class_name, base_class) | For classes derived from a single base class. |
SERIALIZABLE_CLASS_CONSTRUCTOR2T(class_name, base_class, templ2) | For classes derived from a template base class. |
SERIALIZABLE_CLASS_CONSTRUCTOR3T(class_name, base_class, templ2, templ3) | For classes derived from a template base class with two template parameters. |
SERIALIZABLE_CLASS_CONSTRUCTOR3(class_name, base_class1, base_class2) | For classes derived from two normal base classes. |
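The following sketch illustrates, under stated assumptions, what the class factory and the special constructor are for: when loading, only the saved class type is known, so a registered creator must build the object through its Archiver constructor. TOY_REGISTER_CLASS, registry() and create_by_name() are hypothetical names, not Zorba's API. The Circle constructor shows the base-class forwarding that SERIALIZABLE_CLASS_CONSTRUCTOR2 presumably generates; that expansion is an assumption, not something the text above confirms.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

class ToyArchiver {};  // hypothetical; carries no data in this sketch

// Hypothetical stand-in for SerializeBaseClass.
class ToyBase {
 public:
  virtual ~ToyBase() = default;
  virtual std::string class_name() const = 0;
};

using Creator = std::function<std::unique_ptr<ToyBase>(ToyArchiver&)>;

// Hypothetical class factory registry: class name -> creator.
std::map<std::string, Creator>& registry() {
  static std::map<std::string, Creator> r;
  return r;
}

// Registers, during static initialization, a creator that calls the
// class's Archiver constructor.
#define TOY_REGISTER_CLASS(cls)                     \
  static const bool cls##_registered = [] {         \
    registry()[#cls] = [](ToyArchiver& ar) {        \
      return std::unique_ptr<ToyBase>(new cls(ar)); \
    };                                              \
    return true;                                    \
  }();

class Shape : public ToyBase {
 public:
  Shape() = default;
  // Like SERIALIZABLE_CLASS_CONSTRUCTOR: the base is default-constructed.
  explicit Shape(ToyArchiver&) : ToyBase() {}
  std::string class_name() const override { return "Shape"; }
};

class Circle : public Shape {
 public:
  Circle() = default;
  // Assumed equivalent of SERIALIZABLE_CLASS_CONSTRUCTOR2(Circle, Shape):
  // forward the Archiver to the base class's Archiver constructor.
  explicit Circle(ToyArchiver& ar) : Shape(ar) {}
  std::string class_name() const override { return "Circle"; }
};

TOY_REGISTER_CLASS(Shape)
TOY_REGISTER_CLASS(Circle)

// While loading: create an object of the recorded class.
std::unique_ptr<ToyBase> create_by_name(const std::string& name,
                                        ToyArchiver& ar) {
  return registry().at(name)(ar);
}
```

After the object is created this way, its serialize() would fill in the members; an abstract class has nothing to create, which matches the note that SERIALIZABLE_ABSTRACT_CLASS registers its factory only for pointer casting.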
Next comes the serialize() function. Note that it is not a virtual function: the Archiver calls it from inside the operator & template specialized for this class.
Inside the serialize() function you can use the following functions and macros:
Function or macro | Description |
Operator &, for example: ar & class_member_1; ar & class_member_2; | You can call operator & for almost every member of the class; the same call is used for both serializing out and serializing in. There are three kinds of implementations of operator &: a template for serializing classes, implementations for simple types, and implementations for special Zorba types. Some Zorba types cannot be serialized in the normal way, so they have to be handled by a special operator &. |
::zorba::serialization::serialize_baseclass(ar, (base_class*)this); | Call this function to serialize the content of each base class. It is a global template function, specialized for every base class you need. It sets the Archiver in a special mode and then calls the serialize() function of the base class. |
SERIALIZE_ENUM(enum_type, obj) | Call this macro to serialize an enum object. The object is converted to int and then serialized. |
serialize_array(ar, unsigned char *obj, int len); | Call this function to serialize binary data. The data is first converted to text using base64 encoding. |
SERIALIZE_TYPEMANAGER(type, obj) | Call this macro to serialize a TypeManager* or a pointer to a TypeManager-derived object. You have to specify the exact type as the first parameter. The macro handles the special case of the root type manager, which is not serialized: when serializing back in, a pointer to the local root type manager is returned instead. |
SERIALIZE_TYPEMANAGER_RCHANDLE(type, obj) | Same as above, except that the object is an rchandle to a TypeManager or a derived class. |
ar.set_is_temp_field(true) / ar.set_is_temp_field(false) | Puts the Archiver in a special mode in which it does not register the pointers of the upcoming objects in its internal pointer hashmap. Sometimes you need to construct temporary variables inside serialize() and serialize them. You have to tell the Archiver not to memorize the pointer to such a temporary variable, otherwise the duplicate-pointer mechanism may be corrupted. In this mode all the members of the objects are also serialized as temporary objects. After serializing the temporary variable you have to set the temp mode back to false. |
ar.set_is_temp_field_one_level(bool is_temp, bool also_for_ptr = false); | Same as set_is_temp_field, but the members of the temporary objects are not serialized as temporary objects. For example, suppose you have to construct an rchandle<> on the stack and serialize it. The rchandle<> is a temporary object, but the pointer inside it may not be: it may point to an already existing RCObject. To handle this case, call set_is_temp_field_one_level(true) before serializing the rchandle<> and set_is_temp_field_one_level(false) after it. Be careful: set_is_temp_field and set_is_temp_field_one_level must always be called in pairs, once with true and once with false. The second parameter tells the Archiver to also treat pointers as temporary objects. The default is false, that is, only normal objects constructed on the stack are treated as temporary when calling set_is_temp_field_one_level. |
bool ar.is_serializing_out() | Returns true if the Archiver is serializing out, and false if it is serializing in. |
int ar.get_class_version() | Returns the version of the current class while it is being loaded. This function is meaningful only when serializing in. |
ar.set_xquery_with_eval() | Tells the Archiver that this XQuery contains eval expressions. The serialization engine tries to minimize the binary output by removing unused function definitions and other parts. These must not be removed if the XQuery contains an eval expression, because that expression may try to use one of those functions or variables. For now, this function is called in the serialize() of the EvalIterator. |
ar.set_serialize_only_for_eval(true) / ar.set_serialize_only_for_eval(false) | Marks objects as useful only for XQueries with eval. If the XQuery contains no eval expressions, those objects are removed from the serialization, unless they are referenced from objects that are not eval-only. |
ar.dont_allow_delay(ENUM_ALLOW_DELAY d = DONT_ALLOW_DELAY) | The Archiver has a mechanism for detecting duplicate pointers: it serializes the object only once and sets the other pointers as references to it. In some cases the object is loaded after its references have been serialized, so those references stay empty until the whole serialization process completes; that is, the serialization of that object is delayed. Sometimes this is not desired. One example is the serialization of node items: when loading the children of a node item, the loading order is very important, so the serialization must happen right there. Another example is using temporary pointers to set up the object in a custom way; you need to make sure the pointer is loaded right there and not delayed. Call dont_allow_delay() immediately before each such object serialization, for example: ar.dont_allow_delay(); ar & object_ptr; There are two possible parameter values: DONT_ALLOW_DELAY (the default) means the object loading cannot be delayed, although the object is not necessarily serialized right there, since it may already have been serialized earlier through another reference; SERIALIZE_NOW tells the Archiver to serialize the object now. Of course, you cannot have two references to the same object that are both set to SERIALIZE_NOW. Don't overuse this function, as it may lead to circular dependencies that are impossible to resolve. |
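A minimal sketch of how a derived class might combine several of the calls above, again with a hypothetical ToyArchiver rather than Zorba's: the base class is serialized first (standing in for serialize_baseclass, which in Zorba also switches the Archiver into a special mode), the enum travels as an int (in the spirit of SERIALIZE_ENUM), and is_serializing_out() decides whether the loaded int is written back into the enum. serialize_enum() is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical archiver (not Zorba's): ints stored in call order.
class ToyArchiver {
 public:
  explicit ToyArchiver(bool out) : out_(out) {}
  bool is_serializing_out() const { return out_; }
  ToyArchiver& operator&(int& v) {
    if (out_) {
      buf_.push_back(v);
    } else {
      v = buf_.at(pos_++);
    }
    return *this;
  }
  std::vector<int> buf_;
  std::size_t pos_ = 0;

 private:
  bool out_;
};

// Hypothetical analogue of SERIALIZE_ENUM: the enum travels as an int.
template <typename E>
void serialize_enum(ToyArchiver& ar, E& e) {
  int tmp = static_cast<int>(e);  // meaningful only when saving
  ar & tmp;
  if (!ar.is_serializing_out()) {
    e = static_cast<E>(tmp);  // write the loaded value back
  }
}

enum class Color { Red = 1, Green = 2 };

class Base {
 public:
  int id = 0;
  void serialize(ToyArchiver& ar) { ar & id; }  // non-virtual
};

class Derived : public Base {
 public:
  Color color = Color::Red;
  int count = 0;

  void serialize(ToyArchiver& ar) {
    Base::serialize(ar);  // stands in for serialize_baseclass(ar, (Base*)this)
    serialize_enum(ar, color);
    ar & count;
  }
};
```

Saving a Derived with id 5, Color::Green and count 3 produces the sequence 5, 2, 3: base content first, then the derived members in declaration order.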
In the cpp file you have to define the class versions:
SERIALIZABLE_CLASS_VERSIONS(Example)
CLASS_VERSION( 2, ZORBA_VERSION_0_9_5, BACKWARD_COMPATIBLE, "added a new member m2 in class")
CLASS_VERSION( 3, ZORBA_VERSION_0_9_6, !BACKWARD_COMPATIBLE, "changed type of m1 and m2 from int to MAPM")
END_SERIALIZABLE_CLASS_VERSIONS(Example)
This set of macros specifies that the latest version of the Example class is 3 and that the current code is not backward compatible with previous versions. That is, if the Archiver tries to load an Example object saved by a previous version, it will fail and suggest that the user use Zorba version 0.9.5 instead.
When you first declare a serializable class, you can use only the macros SERIALIZABLE_CLASS_VERSIONS and END_SERIALIZABLE_CLASS_VERSIONS. A version 1 is added by default and associated with the current Zorba version. After changing the members of the class, you have to add new versions and also new code to the serialize() function (supporting the old versions or not, depending on your choice).
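As an illustration of the last point, this hypothetical sketch shows a serialize() that stays backward compatible after a member m2 was added in version 2: when loading, it reads m2 only if get_class_version() reports 2 or later. ToyArchiver (which simply returns the version it was constructed with) and the default value -1 are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical archiver (not Zorba's). When loading, it reports the class
// version stored with the data, like ar.get_class_version().
class ToyArchiver {
 public:
  ToyArchiver(bool out, int version) : out_(out), version_(version) {}
  bool is_serializing_out() const { return out_; }
  int get_class_version() const { return version_; }
  ToyArchiver& operator&(int& v) {
    if (out_) {
      buf_.push_back(v);
    } else {
      v = buf_.at(pos_++);
    }
    return *this;
  }
  std::vector<int> buf_;
  std::size_t pos_ = 0;

 private:
  bool out_;
  int version_;
};

class Example {
 public:
  static constexpr int kVersion = 2;  // version 2 added m2
  int m1 = 0;
  int m2 = 0;

  void serialize(ToyArchiver& ar) {
    ar & m1;
    // Saving always writes m2; loading reads it only from version-2 data.
    if (ar.is_serializing_out() || ar.get_class_version() >= 2) {
      ar & m2;
    } else {
      m2 = -1;  // hypothetical default for data saved before m2 existed
    }
  }
};
```

If the change were not backward compatible (like version 3 above), the else branch would be dropped and old data rejected instead.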
Defining the versions for templates is more complicated. For example, let's consider the template serializable_hashmap<>:
SERIALIZABLE_TEMPLATE_VERSIONS(serializable_hashmap)
CLASS_VERSION(...)
END_SERIALIZABLE_TEMPLATE_VERSIONS(serializable_hashmap)
SERIALIZABLE_TEMPLATE_INSTANCE_VERSIONS(serializable_hashmap, serializable_hashmap<context::ctx_value_t>, 1)
SERIALIZABLE_TEMPLATE_INSTANCE_VERSIONS(serializable_hashmap, serializable_hashmap<xqp_string>, 2)
SERIALIZABLE_TEMPLATE_INSTANCE_VERSIONS(serializable_hashmap, serializable_hashmap<xqtref_t>, 3)
The CLASS_VERSION macros are defined only once per template. After that you have to declare SERIALIZABLE_TEMPLATE_INSTANCE_VERSIONS for each template specialization used in the code. The first parameter is the name of the template, the second is the name of the specialization, and the third is the index of the macro (1, 2 and 3 in the example above).
The SERIALIZABLE_TEMPLATE_INSTANCE_VERSIONS macro also declares the global class factory object.
If you don't need a class factory for this class, you can declare the macro SERIALIZABLE_CLASS_NO_FACTORY inside the template definition. Then you don't have to declare the macro SERIALIZABLE_TEMPLATE_INSTANCE_VERSIONS for each of the template specializations.
How to serialize
The Archiver class is abstract, so you have to create one of its subclasses: MemArchiver, XmlArchiver or BinArchiver.
MemArchiver is used for testing; it just serializes the data into the special Archiver tree.
XmlArchiver serializes data into a stream in XML format. The result is quite large but human readable.
BinArchiver serializes data into a stream in binary format. It is highly optimized but not human readable.
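Why the Archiver is abstract can be sketched as follows: serialization code is written once against the base interface, and each backend decides the encoding. ToyXmlArchiver and ToyBinArchiver are hypothetical, much-reduced analogues of XmlArchiver and BinArchiver (the write_int() interface is an assumption); they only write a single int, tagged by type rather than by name, matching the remark that only types are saved.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>

// Hypothetical abstract archiver: serialization code is written once
// against this interface; each backend chooses the encoding.
class ToyArchiver {
 public:
  virtual ~ToyArchiver() = default;
  virtual void write_int(std::int32_t v) = 0;
};

// Readable but verbose, in the spirit of XmlArchiver: values are tagged
// with their type, not with a member name.
class ToyXmlArchiver : public ToyArchiver {
 public:
  explicit ToyXmlArchiver(std::ostream& os) : os_(os) {}
  void write_int(std::int32_t v) override { os_ << "<int>" << v << "</int>"; }

 private:
  std::ostream& os_;
};

// Compact but not readable, in the spirit of BinArchiver: 4 raw bytes,
// least significant byte first.
class ToyBinArchiver : public ToyArchiver {
 public:
  explicit ToyBinArchiver(std::ostream& os) : os_(os) {}
  void write_int(std::int32_t v) override {
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i) {
      os_.put(static_cast<char>((u >> (8 * i)) & 0xFFu));
    }
  }

 private:
  std::ostream& os_;
};

struct Example {
  std::int32_t m1 = 0;
  // Written once, works with any backend.
  void save(ToyArchiver& ar) const { ar.write_int(m1); }
};
```

For m1 = 258 the XML-like backend writes <int>258</int> (14 characters), while the binary one writes 4 bytes.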
#include <fstream>

Example ex1;
std::ofstream ofs("example.bin", std::ios_base::binary);
BinArchiver bin_ar_save(&ofs); // create an archiver for saving
bin_ar_save & ex1;
bin_ar_save.serialize_out(); // only now is the data written to the file
ofs.close(); // flush before reading the file back

// ....... then loading back
Example ex2;
std::ifstream ifs("example.bin", std::ios_base::binary);
BinArchiver bin_ar_load(&ifs); // create an archiver for loading
bin_ar_load & ex2;
bin_ar_load.finalize_input_serialization(); // fix up all the duplicated pointers
// ...and that's it
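The pointer fix-up performed by finalize_input_serialization() can be pictured with this simplified, hypothetical sketch (Record, save() and Loader are illustrative names, not Zorba's encoding or API). On saving, each distinct pointer is written once and repeats become references to its id; on loading, references are collected and patched only at the end. For simplicity every reference is deferred here, whereas the Archiver described above defers only references whose object has not been loaded yet.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct Node {
  int value = 0;
};

// Hypothetical record: a full object, or a reference to an object id.
struct Record {
  bool is_ref;
  int id;
  int value;
};

// Save: each distinct pointer is written once; repeats become references.
std::vector<Record> save(const std::vector<Node*>& ptrs) {
  std::map<const Node*, int> ids;
  std::vector<Record> out;
  for (const Node* p : ptrs) {
    auto it = ids.find(p);
    if (it != ids.end()) {
      out.push_back({true, it->second, 0});
    } else {
      int id = static_cast<int>(ids.size());
      ids[p] = id;
      out.push_back({false, id, p->value});
    }
  }
  return out;
}

// Load: objects are created as they appear; references are recorded and
// patched in finalize(), like finalize_input_serialization().
class Loader {
 public:
  void load(const std::vector<Record>& recs, std::vector<Node*>& slots) {
    slots.assign(recs.size(), nullptr);
    for (std::size_t i = 0; i < recs.size(); ++i) {
      if (recs[i].is_ref) {
        pending_.push_back({&slots[i], recs[i].id});  // fix up later
      } else {
        auto node = std::make_unique<Node>();
        node->value = recs[i].value;
        by_id_[recs[i].id] = node.get();
        slots[i] = node.get();
        owned_.push_back(std::move(node));
      }
    }
  }

  void finalize() {
    for (auto& fix : pending_) *fix.first = by_id_.at(fix.second);
    pending_.clear();
  }

 private:
  std::vector<std::unique_ptr<Node>> owned_;
  std::map<int, Node*> by_id_;
  std::vector<std::pair<Node**, int>> pending_;
};
```

Until finalize() runs, a reference slot stays null, which is the "empty until the whole process of serialization completes" situation that dont_allow_delay() is meant to avoid.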