Parsing Key-Value Pairs Via Regular Expressions

Filed under Code Garage, Regular Expressions, VB Feng Shui

I’ve often found myself in need of a Key-Value pair parser. Simple stuff, really. Essentially, the idea is to be able to parse any of the following from a typical command buffer:

Key=Value (no whitespace in the key or value)

Key=”Value” (whitespace is ok in the value)

Key (when just the existence of the key signals something)

This sort of parser is fairly easy to write, but this time, I’d just finished playing with regular expressions for another parsing task, so I thought, why not give them a try here?

After a few minutes with the Rad Regular Expression Designer, I’d put together what appeared to be a pretty robust expression for this.

My version keys of the matches instead of the seperators. I did this mainly because I wanted the Key and Value parts to be returned as “cleanly” as possible. That means the Key should be just the Key, no whitespace or “=” and the value should never include the leading or trailing quote marks, if they’re there).

The end result is a function that takes a string buffer and returns a generic Dictionary of Key Value string pairs.

Imports System.Text.RegularExpressions


Module RegEx

    Public Function ParseKeyValuePairs(ByVal Buffer As String) As Dictionary(Of String, String)
        Dim Result = New Dictionary(Of String, String)

        '---- There are 3 sub patterns contained here, seperated at the | characters
        '     The first retrieves name="value", honoring doubled inner quotes
        '     The second retrieves name=value where value can't contain spaces
        '     The third retrieves name alone, where there is no "=value" part (ie a "flag" key
        '        where simply its existance has meaning
        Dim Pattern = "(?:(?<key>\w+)\s*\=\s*""(?<value>[^""]*(?:""""[^""]*)*)"") | " & _
                      "(?:(?<key>\w+)\s*\=\s*(?<value>[^""\s]*)) | " & _
                      "(?:(?<key>\w+)\s*)"
        Dim r = New System.Text.RegularExpressions.Regex(Pattern, RegexOptions.IgnorePatternWhitespace)

        '---- parse the matches
        Dim m As System.Text.RegularExpressions.MatchCollection = r.Matches(Buffer)

        '---- break the matches up into Key value pairs in the return dictionary
        For Each Match As System.Text.RegularExpressions.Match In m
            Result.Add(Match.Groups("key").Value, Match.Groups("value").Value)
        Next
        Return Result
    End Function


    Public Sub Test()
        Dim s = "Key1=Value Key2=""My Value here"" Key3=Test Key4 Key5"
        Dim r = ParseKeyValuePairs(s)
        For Each i In r
            Debug.Print(i.Key & "=" & i.Value)
        Next
    End Sub
End Module

I’ve included a simple test function to help validate it.

DISCLAIMER: I’m not RegEx guru, so there are likely much faster ways to assemble the Regex. If you have one, by all means please comment! And finally, it’s entirely possible that I’ve missed some examples of badly formed input that would cause weird parsing results.

For instance, in the above example, notice that “Key3=Test Key4 Key5” will return Key3 set to “Test” and Key4 and Key5 set to empty strings.

If the user meant for Key3’s value to be “Test Key4 Key5”, there would need to be quotes around the value.

But, parsing issues like that will be the norm in any kind of parsing logic for formats such as this, so I’m not terribly worried about it.

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*