I was working with XML serialization of objects recently and was using the good ol’ DataContractSerializer again.
One thing that I bumped into almost immediately is that the XML it spits out isn’t exactly the neatest, tidiest XML possible, to say the least.
So I set out on a little odyssey to see exactly how nice and clean I could make it.
(EDIT: I’ve added more information about how the Name property of the Field object is being serialized twice, which is another big reason for customizing the serialization here, and for specialized dictionary serialization in general).
First, the objects to serialize. I’ve constructed a very rudimentary object hierarchy that still illustrates the problem well.
In this case, I have a List of Record objects, called a Records list. Each Record object is a dictionary of Field objects. And each Field object contains two properties, Name and Value. The code for these (and a little extra code to make populating them easy) is as follows.
Public Class Records
    Inherits List(Of Record)

    Public Sub New()
        '---- default constructor
    End Sub
End Class

Public Class Record
    Inherits Dictionary(Of String, Field)

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub
End Class

Public Class Field
    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String
End Class
Yes, I realize there are DataTables, KeyValuePair objects, etc that could do this, but that’s not the point, so just bear with me<g>.
To populate a Records object, you might have code that looks like this:
Dim Recs = New Records
Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))
Ok, so far so good.
Now, let’s serialize that with a simple serialization function using the DataContractSerializer:
''' <summary>
''' Serializes the data contract to a string (XML)
''' </summary>
Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
    Dim stream = New System.IO.StringWriter
    Dim writer = System.Xml.XmlWriter.Create(stream)
    Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
    serializer.WriteObject(writer, SerializeWhat)
    writer.Flush()
    Return stream.ToString
End Function
In the test application I put together, I dump the resulting XML to a text box. Yikes!
So, what’re the problems here? <g>
- You’ve got that “http://www.w3.org/2001/XMLSchema-instance” namespace attribute, among others
- Lots of random-looking letters
- No indenting
- You can’t really tell from this shot, but the Record dictionary is serializing the Name property twice: once as the Key of the dictionary, and again as a property of the object stored in the dictionary.
All this noise might be fine for computer-to-computer communication, but it’s pretty tough on human eyes<g>.
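To see where that duplicated key comes from, here is a minimal sketch (a throwaway console module named DictionaryDemo, which is my own invention and not part of the test app) that pushes a plain Dictionary(Of String, String) through the DataContractSerializer. Each entry should come out wrapped in its own element with a Key child and a Value child, so when the value is an object that already carries its own Name, that name lands in the XML twice.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Runtime.Serialization
Imports System.Xml

Module DictionaryDemo
    Sub Main()
        ' A one-entry dictionary is enough to show the Key/Value wrapping.
        Dim d As New Dictionary(Of String, String)
        d.Add("City", "Dallas")

        Dim sw As New StringWriter()
        Using writer As XmlWriter = XmlWriter.Create(sw)
            Dim ser As New DataContractSerializer(GetType(Dictionary(Of String, String)))
            ser.WriteObject(writer, d)
        End Using

        ' Look for the <Key> and <Value> elements in what gets printed.
        Console.WriteLine(sw.ToString())
    End Sub
End Module
```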
Ok, first thing to do is indent:
''' <summary>
''' Serializes the data contract to a string (XML)
''' </summary>
Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
    Dim stream = New System.IO.StringWriter
    Dim xmlsettings = New Xml.XmlWriterSettings
    xmlsettings.Indent = True
    Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)
    Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
    serializer.WriteObject(writer, SerializeWhat)
    writer.Flush()
    Return stream.ToString
End Function
Notice that I added the use of the XmlWriterSettings object. This allows me to set the Indent property, and things are much more readable.
But that’s still a far cry from nice, simple, tidy XML. Notice all the “ArrayofArrayOf blah blah” names, and the randomized letter sequences? Plus, it’s now much more obvious how the Name property is being serialized twice. Yuck! Surely, we can do better than this!
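XmlWriterSettings has a couple of other knobs that help with readability. The following is a small standalone sketch, not code from the test app: the IndentDemo module and the hand-written Field element are just for illustration. It sets IndentChars and OmitXmlDeclaration alongside Indent, so the output should be nothing but the indented elements, with no XML declaration line on top.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module IndentDemo
    Sub Main()
        Dim settings As New XmlWriterSettings()
        settings.Indent = True               ' put each element on its own line
        settings.IndentChars = "  "          ' two spaces per nesting level
        settings.OmitXmlDeclaration = True   ' drop the <?xml ...?> header

        Dim sw As New StringWriter()
        Using writer As XmlWriter = XmlWriter.Create(sw, settings)
            writer.WriteStartElement("Field")
            writer.WriteElementString("Name", "City")
            writer.WriteElementString("Value", "Dallas")
            writer.WriteEndElement()
        End Using

        Console.WriteLine(sw.ToString())
    End Sub
End Module
```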
Cleaning Up the Single Entity Field Object
The DataContractSerializer certainly works easily enough to serialize the Field object, but unfortunately, it decorates the serialized elements with a load of really nasty looking and completely unnecessary cruft.
My first thought was to simply decorate the class with <DataContract> attributes:
<DataContract(Name:="Field", Namespace:="")> _
Public Class Field
    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    <DataMember()> _
    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    <DataMember()> _
    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String
End Class
But this yields:
So we have several problems:
- Each field is rendered into a Value element of the Record’s field collection
- The Key of the Record collection duplicates the Name of the individual Field objects
- and we still have a noxious xmlns=”” attribute being rendered.
Unfortunately, this is where the DataContractSerializer’s simplicity is its downfall. There’s just no way to customize this any further using ONLY the DataContractSerializer.
However, we can implement IXmlSerializable on our Field object to customize its serialization. All I need to do is remove the DataContract attribute, and add a simple implementation of IXmlSerializable to the class:
Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class
And that yields a serialization of:
Definitely better, but still not great.
Cleaning up a Generic Dictionary’s Serialization
The problem now is with the Record dictionary.
Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
        Next
    End Sub
End Class
Adding an IXmlSerializable implementation to it as well yields the following XML:
Definitely much better! Especially notice that we’ve gotten rid of the duplicated “Name” key. It was duplicated before because we used the Name property of the Field object as the Key for the Record dictionary. This will play an important part in deserializing the Record’s dictionary of Field objects later.
Cleaning up the List of Records
Finally, the only thing really left to do is clean up how the generic list of Record objects is serialized.
But once again, the only way to alter the serialization is to implement IXmlSerializable on the class.
<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
        Next
    End Sub
End Class
Notice that I’ve implemented IXmlSerializable, but I also added the XmlRoot attribute with a blank Namespace parameter. This completely clears the Namespace declaration from the resulting output, which now looks like this:
And that is just about as clean as you’re going to get!
But That’s Not All There Is to It
Unfortunately, it’s not quite this simple. The thing is, you very well may want to serialize each object independently, not just serialize the Records collection. Doing that as we have things defined right now won’t work. The Start and End elements won’t be generated in the XML properly.
Instead, we need to add XmlRoot attributes to all three classes, and adjust where the WriteStartElement and WriteEndElement calls are made. So we end up with this:
<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            writer.WriteStartElement("Record")
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class

<Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _
Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            writer.WriteStartElement("Field")
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class

<Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _
Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class
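The division of labor in these classes (the caller writes the enclosing element, WriteXml writes only what goes inside it) can be tried in isolation. This is a minimal sketch under my own assumptions: Item and RootDemo are hypothetical names, and Item stands in for any of the three classes. When Item is the top-level object, the serializer itself should supply the enclosing element, taking its name from the XmlRoot attribute.

```vb
Imports System
Imports System.IO
Imports System.Runtime.Serialization
Imports System.Xml
Imports System.Xml.Schema
Imports System.Xml.Serialization

' Hypothetical stand-in for Field, Record, or Records.
<XmlRoot(ElementName:="Item", Namespace:="")> _
Public Class Item
    Implements IXmlSerializable

    Public Text As String = "hello"

    Public Function GetSchema() As XmlSchema Implements IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As XmlReader) Implements IXmlSerializable.ReadXml
        ' Not needed for this write-only sketch.
    End Sub

    Public Sub WriteXml(ByVal writer As XmlWriter) Implements IXmlSerializable.WriteXml
        ' Only the content goes here; the enclosing element is written
        ' by whoever calls WriteXml (in this sketch, the serializer).
        writer.WriteElementString("Text", Me.Text)
    End Sub
End Class

Module RootDemo
    Sub Main()
        Dim sw As New StringWriter()
        Using writer As XmlWriter = XmlWriter.Create(sw)
            Dim ser As New DataContractSerializer(GetType(Item))
            ser.WriteObject(writer, New Item())
        End Using

        ' Expect a Text element nested inside an enclosing Item element.
        Console.WriteLine(sw.ToString())
    End Sub
End Module
```

That is why a parent such as Records has to call WriteStartElement and WriteEndElement around each child: nobody else will write the child's wrapper when the parent invokes WriteXml directly.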
And Finally, Deserialization
Of course, all this would be for nought if we couldn’t actually deserialize the XML we’ve just spent all this effort to clean up.
Turns out that deserialization is pretty straightforward. I just needed to add code to the ReadXml member of the implemented IXmlSerializable interface. The full code for my testing form is below. Be sure to add a reference to System.Runtime.Serialization, though, or you’ll have type not defined errors.
Public Class frmSample
    Private Sub btnTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnTest.Click
        '---- populate the objects
        Dim Recs = New Records
        Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
        Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
        Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))

        Dim t As String
        t = Serialize(Of Field)(Recs(0).Values(0))
        Dim fld = Deserialize(Of Field)(t)
        Debug.Print(fld.Name)
        Debug.Print(fld.Value)
        Debug.Print("--------------")

        t = Serialize(Of Record)(Recs(0))
        Dim rec = Deserialize(Of Record)(t)
        Debug.Print(rec.Values.Count)
        Debug.Print("--------------")

        t = Serialize(Of Records)(Recs)
        tbxOutput.Text = t
        Dim recs2 = Deserialize(Of Records)(t)
        Debug.Print(recs2.Count)
    End Sub
End Class
<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Records")
        reader.MoveToContent()
        Do While reader.NodeType <> Xml.XmlNodeType.EndElement
            Dim Rec = New Record
            DirectCast(Rec, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
            Me.Add(Rec)
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            writer.WriteStartElement("Record")
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class
<Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _
Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Record")
        reader.MoveToContent()
        Do While reader.NodeType <> Xml.XmlNodeType.EndElement
            Dim fld = New Field
            DirectCast(fld, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
            Me.Add(fld.Name, fld)
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            writer.WriteStartElement("Field")
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class
<Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _
Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable

    Public Sub New()
        '---- default constructor
    End Sub

    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String

    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Field")
        reader.MoveToContent()
        '---- read the child elements in whatever order they appear
        Do While reader.NodeType = Xml.XmlNodeType.Element
            Select Case reader.Name
                Case "Name"
                    Me.Name = reader.ReadElementContentAsString
                Case "Value"
                    Me.Value = reader.ReadElementContentAsString
                Case Else
                    '---- ignore anything we don't recognize
                    reader.Skip()
            End Select
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class
Public Module Serialize
    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim xmlsettings = New Xml.XmlWriterSettings
        xmlsettings.Indent = True
        Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)
        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()
        Return stream.ToString
    End Function

    ''' <summary>
    ''' Deserializes the data contract from xml.
    ''' </summary>
    Public Function Deserialize(Of T As Class)(ByVal xml As String) As T
        '---- type names are fully qualified so no extra Imports are needed
        Using stream As New System.IO.MemoryStream(System.Text.Encoding.Unicode.GetBytes(xml))
            Return DeserializeFromStream(Of T)(stream)
        End Using
    End Function

    ''' <summary>
    ''' Deserializes the data contract from a stream.
    ''' </summary>
    Public Function DeserializeFromStream(Of T As Class)(ByVal stream As System.IO.Stream) As T
        Dim serializer As New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        Return DirectCast(serializer.ReadObject(stream), T)
    End Function
End Module
Of particular note above is the ReadXml method of the Field object.
It checks the name of the node first and then places the value of the node into the appropriate property of that object. If I didn’t do that, the deserialization process would require the fields in the XML to be in a specific order. This is a minor drawback to the DataContractSerializer that this approach alleviates.
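The idea of dispatching on the element name can be tried on its own, away from the serializer. What follows is a minimal sketch under my own assumptions: OrderDemo and ReadField are hypothetical names, and the result is returned as a KeyValuePair simply to keep the example self-contained. It loops over the children of a Field element and assigns each one by name, so an input with Value ahead of Name should still read correctly.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module OrderDemo
    ' Hypothetical helper: reads the children of <Field> in any order.
    Function ReadField(ByVal xml As String) As KeyValuePair(Of String, String)
        Dim name As String = Nothing
        Dim value As String = Nothing

        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            reader.MoveToContent()
            reader.ReadStartElement("Field")
            reader.MoveToContent()

            ' Keep going until we run out of child elements.
            Do While reader.NodeType = XmlNodeType.Element
                Select Case reader.Name
                    Case "Name"
                        name = reader.ReadElementContentAsString()
                    Case "Value"
                        value = reader.ReadElementContentAsString()
                    Case Else
                        reader.Skip()   ' ignore unknown elements
                End Select
                reader.MoveToContent()
            Loop
        End Using

        Return New KeyValuePair(Of String, String)(name, value)
    End Function

    Sub Main()
        ' Value deliberately comes before Name here.
        Dim f As KeyValuePair(Of String, String) = _
            ReadField("<Field><Value>Dallas</Value><Name>City</Name></Field>")
        Console.WriteLine(f.Key & "=" & f.Value)
    End Sub
End Module
```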
What’s Next?
The one unfortunate aspect of this is that it requires you to implement IXmlSerializable on each object that you want the XML cleaned up for.
Generally speaking, the DataContractSerializer will be perfectly fine for those cases where humans aren’t likely to ever have to see the XML you’re generating. And you get a performance boost for sacrificing that flexibility and “cleanliness”.
But for things like data file imports, custom configuration files, and the like, it may be desirable to implement custom serialization like this so that your XML files can be almost as easy to read as those old-school INI files!