I was working with XML serialization of objects recently and was using the good ol’ DataContractSerializer again.
One thing that I bumped into almost immediately is that the XML that it spits out isn’t exactly the neatest, tidiest of XML possible, to say the least.
So I set out on a little odyssey to see exactly how nice and clean I could make it.
(EDIT: I’ve added more information about how the Name property of the Field object is being serialized twice, which is another big reason for customizing the serialization here, and for specialized dictionary serialization in general).
First, the objects to serialize. I’ve constructed a very rudimentary object hierarchy that still illustrates the problem well.
In this case, I have a List of Record objects, called a Records list. Each Record object is a dictionary of Field objects. And each Field object contains two properties, Name and Value. The code for these (and a little extra code to make populating them easy) is as follows.
    Public Class Records
        Inherits List(Of Record)

        Public Sub New()
            '---- default constructor
        End Sub
    End Class

    Public Class Record
        Inherits Dictionary(Of String, Field)

        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal ParamArray Fields() As Field)
            For Each f In Fields
                Me.Add(f.Name, f)
            Next
        End Sub
    End Class

    Public Class Field
        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal Name As String, ByVal Value As String)
            Me.Name = Name
            Me.Value = Value
        End Sub

        Public Property Name() As String
            Get
                Return _Name
            End Get
            Set(ByVal value As String)
                _Name = value
            End Set
        End Property
        Private _Name As String

        Public Property Value() As String
            Get
                Return _Value
            End Get
            Set(ByVal value As String)
                _Value = value
            End Set
        End Property
        Private _Value As String
    End Class
Yes, I realize there are DataTables, KeyValuePair objects, etc that could do this, but that’s not the point, so just bear with me<g>.
To populate a Records object, you might have code that looks like this:
    Dim Recs = New Records
    Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
    Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
    Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))
Ok, so far so good.
Now, let's serialize that with a simple serialization function using the DataContractSerializer:
    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim writer = System.Xml.XmlWriter.Create(stream)
        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()
        Return stream.ToString
    End Function
In the test application I put together, I dump the resulting XML to a text box. Yikes!
So, what’re the problems here? <g>
- You’ve got that “http://www.w3.org/2001/XMLSchema-instance” namespace attribute, among others
- Lots of random letters in the element names
- No indenting
- You can’t really tell from this shot, but the Record dictionary is serializing the Name property twice, because I’m using it as the Key for the dictionary, but it’s also a property of the objects in the dictionary.
All this noise might be fine for computer-to-computer communication, but it’s pretty tough on human eyes<g>.
Ok, first thing to do is indent:
    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim xmlsettings = New Xml.XmlWriterSettings
        xmlsettings.Indent = True
        Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)
        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()
        Return stream.ToString
    End Function
Notice that I added the use of the XmlWriterSettings object. This allows me to set the Indent property, and things are much more readable.
But that’s still a far cry from nice, simple, tidy XML. Notice all the “ArrayOfArrayOf blah blah” names, and the randomized letter sequences? Plus, it’s now much more obvious how the Name property is being serialized twice. Yuck! Surely, we can do better than this!
Cleaning Up the Single Entity Field Object
The DataContractSerializer certainly works easily enough to serialize the Field object, but unfortunately, it decorates the serialized elements with a load of really nasty looking and completely unnecessary cruft.
My first thought was to simply decorate the class with <DataContract> attributes:
    <DataContract(Name:="Field", Namespace:="")> _
    Public Class Field
        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal Name As String, ByVal Value As String)
            Me.Name = Name
            Me.Value = Value
        End Sub

        <DataMember()> _
        Public Property Name() As String
            Get
                Return _Name
            End Get
            Set(ByVal value As String)
                _Name = value
            End Set
        End Property
        Private _Name As String

        <DataMember()> _
        Public Property Value() As String
            Get
                Return _Value
            End Get
            Set(ByVal value As String)
                _Value = value
            End Set
        End Property
        Private _Value As String
    End Class
But this yields:
So we have several problems:
- Each field is rendered into a Value element of the Record’s field collection
- The Key of the Record collection duplicates the Name of the individual Field objects
- And we still have a noxious xmlns="" attribute being rendered.
Unfortunately, this is where the DataContractSerializer’s simplicity is its downfall. There’s just no way to customize this any further, using ONLY the DataContractSerializer.
However, we can implement IXmlSerializable on our Field object to customize its serialization. All I need to do is remove the DataContract and DataMember attributes, and add a simple implementation of IXmlSerializable to the class:
    Public Class Field
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal Name As String, ByVal Value As String)
            Me.Name = Name
            Me.Value = Value
        End Sub

        Public Property Name() As String
            Get
                Return _Name
            End Get
            Set(ByVal value As String)
                _Name = value
            End Set
        End Property
        Private _Name As String

        Public Property Value() As String
            Get
                Return _Value
            End Get
            Set(ByVal value As String)
                _Value = value
            End Set
        End Property
        Private _Value As String

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            writer.WriteElementString("Name", Me.Name)
            writer.WriteElementString("Value", Me.Value)
        End Sub
    End Class
And that yields a serialization of:
Definitely better, but still not great.
Cleaning up a Generic Dictionary’s Serialization
The problem now is with the Record dictionary.
    Public Class Record
        Inherits Dictionary(Of String, Field)
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal ParamArray Fields() As Field)
            For Each f In Fields
                Me.Add(f.Name, f)
            Next
        End Sub

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            For Each f In Me.Values
                DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            Next
        End Sub
    End Class
Adding an IXmlSerializable implementation to it as well yields the following XML:
Definitely much better! Especially notice that we’ve gotten rid of the duplicated “Name” key. It was duplicated before because we used the Name property of the Field object as the Key for the Record dictionary. This will play an important part in deserializing the Record’s dictionary of Field objects later.
Cleaning up the List of Records
Finally, the only thing really left to do is clean up how the generic list of Record objects is serialized.
But once again, the only way to alter the serialization is to implement IXmlSerializable on the class.
    <Xml.Serialization.XmlRoot(Namespace:="")> _
    Public Class Records
        Inherits List(Of Record)
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            For Each r In Me
                DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            Next
        End Sub
    End Class
Notice that I’ve implemented IXmlSerializable, but I also added the XmlRoot attribute with a blank Namespace parameter. This completely clears the Namespace declaration from the resulting output, which now looks like this:
And that is just about as clean as you’re going to get!
But That’s Not All There Is to It
Unfortunately, it’s not quite this simple. The thing is, you very well may want to serialize each object independently, not just serialize the Records collection. Doing that as we have things defined right now won’t work. The Start and End elements won’t be generated in the XML properly.
Instead, we need to add XmlRoot attributes to all three classes, and adjust where the WriteStartElement and WriteEndElement calls are made. So we end up with this:
    <Xml.Serialization.XmlRoot(Namespace:="")> _
    Public Class Records
        Inherits List(Of Record)
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            For Each r In Me
                writer.WriteStartElement("Record")
                DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
                writer.WriteEndElement()
            Next
        End Sub
    End Class

    <Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _
    Public Class Record
        Inherits Dictionary(Of String, Field)
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal ParamArray Fields() As Field)
            For Each f In Fields
                Me.Add(f.Name, f)
            Next
        End Sub

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            For Each f In Me.Values
                writer.WriteStartElement("Field")
                DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
                writer.WriteEndElement()
            Next
        End Sub
    End Class

    <Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _
    Public Class Field
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal Name As String, ByVal Value As String)
            Me.Name = Name
            Me.Value = Value
        End Sub

        Public Property Name() As String
            Get
                Return _Name
            End Get
            Set(ByVal value As String)
                _Name = value
            End Set
        End Property
        Private _Name As String

        Public Property Value() As String
            Get
                Return _Value
            End Get
            Set(ByVal value As String)
                _Value = value
            End Set
        End Property
        Private _Value As String

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            writer.WriteElementString("Name", Me.Name)
            writer.WriteElementString("Value", Me.Value)
        End Sub
    End Class
And Finally, Deserialization
Of course, all this would be for nought if we couldn’t actually deserialize the XML we’ve just spent all this effort to clean up.
Turns out that deserialization is pretty straightforward. I just needed to add code to the ReadXml member of the IXmlSerializable implementation on each class. The full code for my testing form is below. Be sure to add a reference to System.Runtime.Serialization, though, or you’ll get “type not defined” errors.
    Public Class frmSample
        Private Sub btnTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnTest.Click
            '---- populate the objects
            Dim Recs = New Records
            Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
            Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
            Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))

            Dim t As String

            '---- round-trip a single Field
            t = Serialize(Of Field)(Recs(0).Values(0))
            Dim fld = Deserialize(Of Field)(t)
            Debug.Print(fld.Name)
            Debug.Print(fld.Value)
            Debug.Print("--------------")

            '---- round-trip a single Record
            t = Serialize(Of Record)(Recs(0))
            Dim rec = Deserialize(Of Record)(t)
            Debug.Print(rec.Values.Count)
            Debug.Print("--------------")

            '---- round-trip the whole Records list
            t = Serialize(Of Records)(Recs)
            tbxOutput.Text = t
            Dim recs2 = Deserialize(Of Records)(t)
            Debug.Print(recs2.Count)
        End Sub
    End Class

    <Xml.Serialization.XmlRoot(Namespace:="")> _
    Public Class Records
        Inherits List(Of Record)
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
            reader.MoveToContent()
            reader.ReadStartElement("Records")
            reader.MoveToContent()
            '---- each child element is a Record; let it read itself
            Do While reader.NodeType <> Xml.XmlNodeType.EndElement
                Dim Rec = New Record
                DirectCast(Rec, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
                Me.Add(Rec)
                reader.MoveToContent()
            Loop
            reader.ReadEndElement()
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            For Each r In Me
                writer.WriteStartElement("Record")
                DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
                writer.WriteEndElement()
            Next
        End Sub
    End Class

    <Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _
    Public Class Record
        Inherits Dictionary(Of String, Field)
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal ParamArray Fields() As Field)
            For Each f In Fields
                Me.Add(f.Name, f)
            Next
        End Sub

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
            reader.MoveToContent()
            reader.ReadStartElement("Record")
            reader.MoveToContent()
            '---- each child element is a Field; its Name becomes the dictionary key again
            Do While reader.NodeType <> Xml.XmlNodeType.EndElement
                Dim fld = New Field
                DirectCast(fld, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
                Me.Add(fld.Name, fld)
                reader.MoveToContent()
            Loop
            reader.ReadEndElement()
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            For Each f In Me.Values
                writer.WriteStartElement("Field")
                DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
                writer.WriteEndElement()
            Next
        End Sub
    End Class

    <Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _
    Public Class Field
        Implements System.Xml.Serialization.IXmlSerializable

        Public Sub New()
            '---- default constructor
        End Sub

        Public Sub New(ByVal Name As String, ByVal Value As String)
            Me.Name = Name
            Me.Value = Value
        End Sub

        Public Property Name() As String
            Get
                Return _Name
            End Get
            Set(ByVal value As String)
                _Name = value
            End Set
        End Property
        Private _Name As String

        Public Property Value() As String
            Get
                Return _Value
            End Get
            Set(ByVal value As String)
                _Value = value
            End Set
        End Property
        Private _Value As String

        Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
            Return Nothing
        End Function

        Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
            reader.MoveToContent()
            reader.ReadStartElement("Field")
            reader.MoveToContent()
            If reader.Name = "Name" Then Me.Name = reader.ReadElementContentAsString
            reader.MoveToContent()
            If reader.Name = "Value" Then Me.Value = reader.ReadElementContentAsString
            reader.MoveToContent()
            reader.ReadEndElement()
        End Sub

        Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
            writer.WriteElementString("Name", Me.Name)
            writer.WriteElementString("Value", Me.Value)
        End Sub
    End Class

    Public Module Serialize
        ''' <summary>
        ''' Serializes the data contract to a string (XML)
        ''' </summary>
        Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
            Dim stream = New System.IO.StringWriter
            Dim xmlsettings = New Xml.XmlWriterSettings
            xmlsettings.Indent = True
            Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)
            Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
            serializer.WriteObject(writer, SerializeWhat)
            writer.Flush()
            Return stream.ToString
        End Function

        ''' <summary>
        ''' Deserializes the data contract from xml.
        ''' </summary>
        Public Function Deserialize(Of T As Class)(ByVal xml As String) As T
            '---- the StringWriter above produces UTF-16 XML, so read the bytes back as Unicode
            Using stream As New System.IO.MemoryStream(System.Text.Encoding.Unicode.GetBytes(xml))
                Return DeserializeFromStream(Of T)(stream)
            End Using
        End Function

        ''' <summary>
        ''' Deserializes the data contract from a stream.
        ''' </summary>
        Public Function DeserializeFromStream(Of T As Class)(ByVal stream As System.IO.Stream) As T
            Dim serializer As New System.Runtime.Serialization.DataContractSerializer(GetType(T))
            Return DirectCast(serializer.ReadObject(stream), T)
        End Function
    End Module
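Of particular note above is the ReadXml method of the Field object.
It checks the name of the current node before placing that node’s value into the matching property of the object, so a missing Name or Value element won’t end up in the wrong property. Be aware, though, that the two checks run one after the other, so as written the XML still needs Name to appear before Value; accepting the elements in any order would mean looping over the child elements until the end element is reached. Strict element ordering is a minor drawback of the DataContractSerializer that hand-written ReadXml code like this can alleviate.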
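To make ReadXml genuinely indifferent to element order, one option is to loop over the child elements and dispatch on each name. The following is only a minimal sketch of that idea, not the code from my test form: the class name OrderTolerantField and the Demo module are hypothetical, auto-implemented properties are used for brevity (so it assumes VB 10 or later), and unknown elements are simply skipped.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema
Imports System.Xml.Serialization

' Hypothetical variant of the Field class, for illustration only.
<XmlRoot(ElementName:="Field", Namespace:="")> _
Public Class OrderTolerantField
    Implements IXmlSerializable

    Public Property Name As String
    Public Property Value As String

    Public Function GetSchema() As XmlSchema Implements IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As XmlReader) Implements IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Field")
        reader.MoveToContent()
        ' Keep reading child elements until we hit </Field>,
        ' whatever order they arrive in.
        Do While reader.NodeType = XmlNodeType.Element
            Select Case reader.Name
                Case "Name"
                    Me.Name = reader.ReadElementContentAsString()
                Case "Value"
                    Me.Value = reader.ReadElementContentAsString()
                Case Else
                    ' Assumption: elements we don't know about are ignored.
                    reader.Skip()
            End Select
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As XmlWriter) Implements IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class

Public Module Demo
    Public Sub Main()
        ' Value deliberately comes before Name here.
        Dim xml = "<Field><Value>Darin</Value><Name>Name</Name></Field>"
        Dim fld As New OrderTolerantField
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            DirectCast(fld, IXmlSerializable).ReadXml(reader)
        End Using
        Console.WriteLine(fld.Name & "=" & fld.Value)   ' prints Name=Darin
    End Sub
End Module
```

The trade-off is a few more lines per class, but the same loop shape should also cope with an element being left out entirely.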
What’s Next?
The one unfortunate aspect of this is that it requires you to implement IXmlSerializable on each object that you want the XML cleaned up for.
Generally speaking, the DataContractSerializer will be perfectly fine for those cases where humans aren’t likely to ever have to see the XML you’re generating. And you get a performance boost for sacrificing that flexibility and “cleanliness”.
But for things like data file imports, custom configuration files, and the like, it may be desirable to implement custom serialization like this so that your xml files can be almost as easy to read as those old school INI files!
5 Comments
Hello,
Cool stuff. But why not using the DataContractSerializer constructor for which you can specify to preserve or not object references. If you give ‘false’, it simply removes all namspace reference and unwanted attributes.
Simple, no ?
Fred
Hi Fred
Thanks for the comment!
It’s been a bit since I worked with the DataContractSerializer, but I don’t believe you’ll end up with particularly clean XML even if you supply the argument on the constructor you mention.
Don’t get me wrong, I love the DCS. It’s simple to use and works for a ton of cases.
In fact, it’s very likely you could use it to READ the xml files I mention above, and just hand roll your own xml serializer (ie let DCS handle the deserialization). I haven’t tried that combo in particular.
I just didn’t have much luck getting really clean xml (the kind I’d show my grandmother, or a particularly tech-averse manager), out of the DCS.
I’ll take another look though. You’ve got me curious.
Hi Darin,
Yes, just try this constructor:
DataContractSerializer serializer = new DataContractSerializer(type, null, int.MaxValue, true, false, null);
The ‘false’ parameter specifies that you don’t want to preserve objects references and so that you will generate a standard XML.
With that, I’ve got the exact same XML than with former .NET XML serializer (no namespace and dummy attributes, simply a clean readable XML).
Hi again,
Just one precision, I forget: You need also to specify the same namespace (in the DataContract attribute) for all objects involved in the serialization process (except of course, if the object that you need to serialize only contains objects in the same namespace).
Otherwise, you still have namespace references.
Cheers,
Fred