Cleaning up Messy DataContractSerializer XML

Filed under Code Garage, Software Architecture, VB Feng Shui, XML

I was working with XML serialization of objects recently and was using the good ol’ DataContractSerializer again.

One thing that I bumped into almost immediately is that the XML that it spits out isn’t exactly the neatest, tidiest of XML possible, to say the least.

So I set out on a little odyssey to see exactly how nice and clean I could make it.

(EDIT: I’ve added more information about how the Name property of the Field object is being serialized twice, which is another big reason for customizing the serialization here, and for specialized dictionary serialization in general).

First, the objects to serialize. I’ve constructed a very rudimentary object hierarchy that still illustrates the problem well.

In this case, I have a List of Record objects, called a Records list. Each Record object is a dictionary of Field objects. And each Field object contains two properties, Name and Value. The code for these (and a little extra code to make populating them easy) is as follows.

Public Class Records
    Inherits List(Of Record)


    Public Sub New()
        '---- default constructor
    End Sub

End Class


Public Class Record
    Inherits Dictionary(Of String, Field)


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub
End Class


Public Class Field

    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub


    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String



    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

End Class

Yes, I realize there are DataTables, KeyValuePair objects, etc that could do this, but that’s not the point, so just bear with me<g>.

To populate a Records object, you might have code that looks like this:

Dim Recs = New Records
Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))

Ok, so far so good.

Now, lets serialize that with a simple serialization function using the DataContractSerializer:

    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim writer = System.Xml.XmlWriter.Create(stream)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function

In the test application, I put together, I dump the resulting XML to a text box. Yikes!

image

So, what’re the problems here? <g>

  1. You’ve got that “http://www.w3.org/2001/XMLSchema-instance” namespace attribute amongst other
  2. lots of random letters
  3. no indenting
  4. You can’t really tell it from this shot, but the Record dictionary is serializing the name property twice, because I’m using it as the Key for the dictionary, but it’s also a property of the objects in the dictionary.

All this noise might be fine for computer to computer communication, but it’s pretty tough on human eyes<g>.

Ok, first thing to do is indent:

    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim xmlsettings = New Xml.XmlWriterSettings
        xmlsettings.Indent = True
        Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function

Notice that I added the use of the XMLWriterSettings object. This allows me to set the Indent property, and things are much more readable.

image

But that’s still a far cry from nice, simple, tidy XML. Notice all the “ArrayofArrayOf blah blah” names, and the randomized letter sequences? Plus, it’s much more obvious how the NAME jproperty is being serialized twice now. Yuck! Surely, we can do better than this!

Cleaning Up the Single Entity Field Object

The DataContractSerializer certainly works easily enough to serialize the Field object, but unfortunately, it decorates the serialized elements with a load of really nasty looking and completely unnecessary cruft.

My first thought was to simply decorate the class with <DataContract> attributes:

<DataContract(Name:="Field", Namespace:="")> _
Public Class Field

    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    <DataMember()> _
    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String



    <DataMember()> _
    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String

End Class

But this yields:

image

So we have several problems:

  • Each field is rendered into a Value element of the Record’s field collection
  • The Key of the Record collection duplicates the Name of the individual Field objects
  • and we still have a noxious xmlns=”” attribute being rendered.

Unfortunately, this is where the DataContractSerializer’s simplicity is it’s downfall. There’s just no way to customize this any further, using ONLY the DataContractSerializer.

However, we can implement IXMLSerializable on our Field object to customize its serialization. All I need to do is remove the DataContract attribute, and add a simple implementation of IXMLSerializable to the class:

Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String


    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String


    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function


    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub


    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class

And that yields a serialization of:

image

Definitely better, but still not great.

Cleaning up a Generic Dictionary’s Serialization

The problem now is with the Record dictionary.

Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
        Next
    End Sub
End Class

Adding an IXMLSerializable implementation to it as well yields the following XML:

image

Definitely much better! Especially notice that we’ve gotten rid of the duplicated “Name” key. It was duplicated before because we used the Name element of the Field object as the Key for the Record dictionary. This be play an important part in deserializing the Record’s dictionary of Field objects later.

Cleaning up the List of Records

Finally, the only thing really left to do is clean up how the generic list of Record objects is serialized.

But once again, the only way to alter the serialization is to implement IXMLSerializable on the class.

<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml

    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
        Next
    End Sub
End Class

Notice that I’ve implemented IXMLSerializable, but I also added the XmlRoot attribute with a blank Namespace parameter. This completely clears the Namespace declaration from the resulting output, which now looks like this:

image

And that is just about as clean as your going to get!

But That’s Not all there is To It

Unfortunately, it’s not quite this simple. The thing is, you very well may want to serialize each object independently, not just serialize the Records collection. Doing that as we have things defined right now won’t work. The Start and End elements won’t be generated in the XML properly.

Instead, we need to add XmlRoot attributes to all three classes, and adjust where the WriteStartElement and WriteEndElement calls are made. So we end up with this:

<Xml.Serialization.XmlRoot(Namespace:="")> _ Public Class Records Inherits List(Of Record) Implements System.Xml.Serialization.IXmlSerializable Public Sub New() '---- default constructor End Sub Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema Return Nothing End Function Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml End Sub Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml For Each r In Me writer.WriteStartElement("Record") DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer) writer.WriteEndElement() Next End Sub End Class <Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _ Public Class Record Inherits Dictionary(Of String, Field) Implements System.Xml.Serialization.IXmlSerializable Public Sub New() '---- default constructor End Sub Public Sub New(ByVal ParamArray Fields() As Field) For Each f In Fields Me.Add(f.Name, f) Next End Sub Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema Return Nothing End Function Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml End Sub Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml For Each f In Me.Values writer.WriteStartElement("Field") DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer) writer.WriteEndElement() Next End Sub End Class <Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _ Public Class Field Implements System.Xml.Serialization.IXmlSerializable Public Sub New() '---- default constructor End Sub Public Sub New(ByVal Name As String, ByVal Value As String) Me.Name = Name Me.Value = Value End Sub Public Property Name() As String Get Return _Name End Get Set(ByVal value As String) _Name = value End Set End Property Private _Name As String Public Property Value() As String Get Return _Value End Get Set(ByVal value As String) _Value = value End Set End Property Private _Value As String Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema Return Nothing End Function Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml End Sub Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml writer.WriteElementString("Name", Me.Name) writer.WriteElementString("Value", Me.Value) End Sub End Class

 

 

And Finally, Deserialization

Of course, all this would be for nought if we couldn’t actually deserialize the xml we’ve just spent all this effort to clean up.

Turns out that deserialization is pretty straightforward. I just needed to add code to the ReadXml member of the implemented IXMLSerializable interface. The full code for my testing form is below. Be sure to add a reference to System.Runtime.Serialization, though, or you’ll have type not defined errors.

Public Class frmSample

    Private Sub btnTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnTest.Click

        '---- populate the objects
        Dim Recs = New Records
        Recs.Add(New Record(New Field("Name", "Darin"), New Field("City", "Arlington")))
        Recs.Add(New Record(New Field("Name", "Gillian"), New Field("City", "Ft Worth")))
        Recs.Add(New Record(New Field("Name", "Laura"), New Field("City", "Dallas")))

        Dim t As String
        t = Serialize(Of Field)(Recs(0).Values(0))
        Dim fld = Deserialize(Of Field)(t)
        Debug.Print(fld.Name)
        Debug.Print(fld.Value)
        Debug.Print("--------------")

        t = Serialize(Of Record)(Recs(0))
        Dim rec = Deserialize(Of Record)(t)
        Debug.Print(rec.Values.Count)
        Debug.Print("--------------")

        t = Serialize(Of Records)(Recs)
        tbxOutput.Text = t

        Dim recs2 = Deserialize(Of Records)(t)
        Debug.Print(recs2.Count)
    End Sub
End Class


<Xml.Serialization.XmlRoot(Namespace:="")> _
Public Class Records
    Inherits List(Of Record)
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Records")
        reader.MoveToContent()
        Do While reader.NodeType <> Xml.XmlNodeType.EndElement
            Dim Rec = New Record
            DirectCast(Rec, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
            Me.Add(Rec)
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each r In Me
            writer.WriteStartElement("Record")
            DirectCast(r, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class


<Xml.Serialization.XmlRoot(ElementName:="Record", Namespace:="")> _
Public Class Record
    Inherits Dictionary(Of String, Field)
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal ParamArray Fields() As Field)
        For Each f In Fields
            Me.Add(f.Name, f)
        Next
    End Sub

    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function

    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Record")
        reader.MoveToContent()
        Do While reader.NodeType <> Xml.XmlNodeType.EndElement
            Dim fld = New Field
            DirectCast(fld, System.Xml.Serialization.IXmlSerializable).ReadXml(reader)
            Me.Add(fld.Name, fld)
            reader.MoveToContent()
        Loop
        reader.ReadEndElement()
    End Sub

    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        For Each f In Me.Values
            writer.WriteStartElement("Field")
            DirectCast(f, System.Xml.Serialization.IXmlSerializable).WriteXml(writer)
            writer.WriteEndElement()
        Next
    End Sub
End Class


<Xml.Serialization.XmlRoot(ElementName:="Field", Namespace:="")> _
Public Class Field
    Implements System.Xml.Serialization.IXmlSerializable


    Public Sub New()
        '---- default constructor
    End Sub


    Public Sub New(ByVal Name As String, ByVal Value As String)
        Me.Name = Name
        Me.Value = Value
    End Sub

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal value As String)
            _Name = value
        End Set
    End Property
    Private _Name As String


    Public Property Value() As String
        Get
            Return _Value
        End Get
        Set(ByVal value As String)
            _Value = value
        End Set
    End Property
    Private _Value As String


    Public Function GetSchema() As System.Xml.Schema.XmlSchema Implements System.Xml.Serialization.IXmlSerializable.GetSchema
        Return Nothing
    End Function


    Public Sub ReadXml(ByVal reader As System.Xml.XmlReader) Implements System.Xml.Serialization.IXmlSerializable.ReadXml
        reader.MoveToContent()
        reader.ReadStartElement("Field")
        reader.MoveToContent()
        If reader.Name = "Name" Then Me.Name = reader.ReadElementContentAsString
        reader.MoveToContent()
        If reader.Name = "Value" Then Me.Value = reader.ReadElementContentAsString
        reader.MoveToContent()
        reader.ReadEndElement()
    End Sub


    Public Sub WriteXml(ByVal writer As System.Xml.XmlWriter) Implements System.Xml.Serialization.IXmlSerializable.WriteXml
        writer.WriteElementString("Name", Me.Name)
        writer.WriteElementString("Value", Me.Value)
    End Sub
End Class



Public Module Serialize
    ''' <summary>
    ''' Serializes the data contract to a string (XML)
    ''' </summary>
    Public Function Serialize(Of T As Class)(ByVal SerializeWhat As T) As String
        Dim stream = New System.IO.StringWriter
        Dim xmlsettings = New Xml.XmlWriterSettings
        xmlsettings.Indent = True
        Dim writer = System.Xml.XmlWriter.Create(stream, xmlsettings)

        Dim serializer = New System.Runtime.Serialization.DataContractSerializer(GetType(T))
        serializer.WriteObject(writer, SerializeWhat)
        writer.Flush()

        Return stream.ToString
    End Function


    ''' <summary>
    ''' Deserializes the data contract from xml.
    ''' </summary>
    Public Function Deserialize(Of T As Class)(ByVal xml As String) As T
        Using stream As New MemoryStream(UnicodeEncoding.Unicode.GetBytes(xml))
            Return DeserializeFromStream(Of T)(stream)
        End Using
    End Function


    ''' <summary>
    ''' Deserializes the data contract from a stream.
    ''' </summary>
    Public Function DeserializeFromStream(Of T As Class)(ByVal stream As Stream) As T
        Dim serializer As New DataContractSerializer(GetType(T))
        Return DirectCast(serializer.ReadObject(stream), T)
    End Function
End Module

Of particular note above is the ReadXML function of the Field object.

It checks the name of the node first and then places the value of the node into the appropriate property of that object. If I didn’t do that, the deserialization process would require the fields in the XML to be in a specific order. This is a minor drawback to the DataContractSerializer that this approach alleviates.

What’s Next?

The one unfortunate aspect of this is that it requires you to implement IXMLSerializable on each object that you want the XML cleaned up for.

Generally speaking, The DataContractSerializer will be perfectly fine for those cases where humans aren’t likely to ever have to see the XML you’re generating. And you get a performance boost for sacrificing that flexibility and “cleanliness”.

But for things like data file imports, custom configuration files, and the like, it may be desirable to  implement custom serialization like this so that your xml files can be almost as easy to read as those old school INI files!

5 Comments

  1. Fred says:

    Hello,

    Cool stuff. But why not using the DataContractSerializer constructor for which you can specify to preserve or not object references. If you give ‘false’, it simply removes all namspace reference and unwanted attributes.

    Simple, no ?

    Fred

    • Darin says:

      Hi Fred

      Thanks for the comment!

      It’s been a bit since I worked with the DataContractSerializer, but I don’t believe you’ll end up with particularly clean XML even if you supply the argument on the constructor you mention.

      Don’t get me wrong, I love the DCS. It’s simple to use and works for a ton of cases.

      In fact, it’s very likely you could use it to READ the xml files I mention above, and just hand roll your own xml serializer (ie let DCS handle the deserialization). I haven’t tried that combo in particular.

      I just didn’t have much luck getting really clean xml (the kind I’d show my grandmother, or a particularly tech-averse manager), out of the DCS.

      I’ll take another look though. You’ve got me curious.

      • Fred says:

        Hi Darin,

        Yes, just try this constructor:

        DataContractSerializer serializer = new DataContractSerializer(type, null, int.MaxValue, true, false, null);

        The ‘false’ parameter specifies that you don’t want to preserve objects references and so that you will generate a standard XML.
        With that, I’ve got the exact same XML than with former .NET XML serializer (no namespace and dummy attributes, simply a clean readable XML).

      • Fred says:

        Hi again,

        Just one precision, I forget: You need also to specify the same namespace (in the DataContract attribute) for all objects involved in the serialization process (except of course, if the object that you need to serialize only contains objects in the same namespace).

        Otherwise, you still have namespace references.

        Cheers,

        Fred

  2. i have to post a link to here, from mine, if you dont mind, please reply, cya

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*