Category Archives: Regular Expressions

Parsing Key-Value Pairs Via Regular Expressions

5
Filed under Code Garage, Regular Expressions, VB Feng Shui

I’ve often found myself in need of a Key-Value pair parser. Simple stuff, really. Essentially, the idea is to be able to parse any of the following from a typical command buffer:

Key=Value (no whitespace in the key or value)

Key=”Value” (whitespace is ok in the value)

Key (when just the existence of the key signals something)

This sort of parser is fairly easy to write, but this time, I’d just finished playing with regular expressions for another parsing task, so I thought, why not give them a try here?

After a few minutes with the Rad Regular Expression Designer, I’d put together what appeared to be a pretty robust expression for this.

My version keys of the matches instead of the seperators. I did this mainly because I wanted the Key and Value parts to be returned as “cleanly” as possible. That means the Key should be just the Key, no whitespace or “=” and the value should never include the leading or trailing quote marks, if they’re there).

The end result is a function that takes a string buffer and returns a generic Dictionary of Key Value string pairs.

Imports System.Text.RegularExpressions


Module RegEx

    Public Function ParseKeyValuePairs(ByVal Buffer As String) As Dictionary(Of String, String)
        Dim Result = New Dictionary(Of String, String)

        '---- There are 3 sub patterns contained here, seperated at the | characters
        '     The first retrieves name="value", honoring doubled inner quotes
        '     The second retrieves name=value where value can't contain spaces
        '     The third retrieves name alone, where there is no "=value" part (ie a "flag" key
        '        where simply its existance has meaning
        Dim Pattern = "(?:(?<key>\w+)\s*\=\s*""(?<value>[^""]*(?:""""[^""]*)*)"") | " & _
                      "(?:(?<key>\w+)\s*\=\s*(?<value>[^""\s]*)) | " & _
                      "(?:(?<key>\w+)\s*)"
        Dim r = New System.Text.RegularExpressions.Regex(Pattern, RegexOptions.IgnorePatternWhitespace)

        '---- parse the matches
        Dim m As System.Text.RegularExpressions.MatchCollection = r.Matches(Buffer)

        '---- break the matches up into Key value pairs in the return dictionary
        For Each Match As System.Text.RegularExpressions.Match In m
            Result.Add(Match.Groups("key").Value, Match.Groups("value").Value)
        Next
        Return Result
    End Function


    Public Sub Test()
        Dim s = "Key1=Value Key2=""My Value here"" Key3=Test Key4 Key5"
        Dim r = ParseKeyValuePairs(s)
        For Each i In r
            Debug.Print(i.Key & "=" & i.Value)
        Next
    End Sub
End Module

I’ve included a simple test function to help validate it.

DISCLAIMER: I’m not RegEx guru, so there are likely much faster ways to assemble the Regex. If you have one, by all means please comment! And finally, it’s entirely possible that I’ve missed some examples of badly formed input that would cause weird parsing results.

For instance, in the above example, notice that “Key3=Test Key4 Key5” will return Key3 set to “Test” and Key4 and Key5 set to empty strings.

If the user meant for Key3’s value to be “Test Key4 Key5”, there would need to be quotes around the value.

But, parsing issues like that will be the norm in any kind of parsing logic for formats such as this, so I’m not terribly worried about it.

A One Line CSV Parser

4
Filed under Code Garage, Regular Expressions, Utilities, VB Feng Shui

Parsing up Quote Comma delimited text is a pretty common thing to do, and it seems trivial enough till you realize all the little gotcha’s that come with the problem (like doubled quotes, commas in quotes, etc, etc). Then it becomes just another laborious exercise in boring coding.

I came across a regex some time ago that makes the process literally one line of code. I can no longer find the original author but the code I found (what little there was of it) was C# and actually split up into a few lines of code, so this is converted to the equivalent VB.net:

    Public Function QCSplit(ByVal Args As String) As String()

        Return (New System.Text.RegularExpressions.Regex(",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))")).Split(args)
    End Function

The regex used here doesn’t actually match on the contents, it matches on the commas that split things up, and it then uses the SPLIT function to actually split those things up.

I’ve used this a while now and it works great, but I’ve seen a few interesting alternatives since.

The most interesting thus far is this one by Daniel Einspanjer. Not interesting enough yet for me to switch to it, but the regexlib.com website is quite nice as a great repository of good regex recipes.

Regular Expression Tester

And speaking of Regular Expressions, the guys over at RAD Software, have a free Regular expression tester for .net style regular expressions that works fantastically. If you’re just starting out in regex’s (and seriously, who isn’t <G>), you owe it to yourself to pick up a decent regex test tool, and this one is as good as I’ve seen so far.

image

As far as I can tell, it can handle all the various options for regex’s, and dynamically shows match results, etc. Very handy for trying out expressions without actually running them in .net.

Add it to your External Tools menu in VS and it’ll be right there, good to go.