Charteris White Paper: Deserializing Individual Elements in XML Documents
Version 1.0, 20 May 2003
This paper discusses one of the limitations of common implementations of XML deserialization using the .NET Framework. It then discusses a possible solution, and highlights the differences between the standard solution and the proposed one. It is assumed that the reader has some familiarity with XML Serialization, when it is used, and how this is implemented within the .NET Framework.

This paper focuses on the current version of the .NET Framework at the time of writing – version 1.1. However, the recommendations should still be valid for version 1.0.
XML Serialization is the process of converting an object to a form that can be easily transported. For example, an object can be serialized and transported over HTTP. XML Deserialization is used on the receiving system to create an object tree from XML. There is not necessarily any correlation between the system doing the serialization, and the system doing the deserialization – they may be on different platforms and/or using different technologies to process the requests. The object that is created as a result of deserialization is not the same object that was serialized; it only has the same public properties.

Typically, when a .NET system is built to handle incoming XML, the system will have a number of classes that conform to the XML Schema definition language (XSD) schema for the incoming XML. These can either be generated by hand, or by using XSD.exe.

When incoming XML is received, the .NET Framework will create instances of the classes, and populate their public properties according to the XML received. By default, this is a monolithic process – the XmlSerializer will read the entire stream of XML and populate all the objects.
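As an illustration of the default, monolithic behaviour described above, the following C# fragment is a minimal sketch only. The Orders and Order classes, their fields, and the element names are hypothetical stand-ins for classes that would normally be written by hand or generated by XSD.exe from the real schema; they are not taken from any actual system.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical root class: stands in for a class matching the incoming schema.
[XmlRoot("Orders")]
public class Orders
{
    // Each <Order> child element becomes one entry in this array.
    [XmlElement("Order")]
    public Order[] Items;
}

// Hypothetical entity class with two public members.
public class Order
{
    public int Id;
    public string Customer;
}

public class MonolithicExample
{
    // Reads the whole XML stream and builds the complete object tree in one call.
    public static Orders Load(TextReader reader)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Orders));
        return (Orders)serializer.Deserialize(reader);
    }

    public static void Main()
    {
        string xml =
            "<Orders>" +
            "<Order><Id>1</Id><Customer>A</Customer></Order>" +
            "<Order><Id>2</Id><Customer>B</Customer></Order>" +
            "</Orders>";

        Orders orders = Load(new StringReader(xml));

        // Every Order already exists in memory here, whether or not it is used.
        Console.WriteLine(orders.Items.Length);
    }
}
```

Note that the single call to Deserialize returns only after every Order in the stream has been created, so the caller has no opportunity to stop part of the way through.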
LIMITATIONS OF MONOLITHIC DESERIALIZATION
Depending on the XML being received, deserializing the entire stream may not be appropriate. If the stream is large, the resulting in-memory representation may consume significant amounts of memory. The processing logic may also decide to stop processing the XML after handling only a relatively small number of the created entities. This results in suboptimal performance, as the system has had to create all the entities only to leave most of them unused. Also, as each object created is part of the entire object tree, none is eligible for garbage collection until the entire tree has been processed, even though many of the objects have already been dealt with.

It is proposed in this paper that, in some cases, it would be better to deserialize the entities only as they are required. If processing needs to stop, the rest of the XML has not been deserialized, so there are no unnecessary objects. Once an individual item has been