Category Archives: Office

Word Headers/Footers and the LinkToPrevious

Filed under Office, Troubleshooting, Word

imageIf you’ve had to work much with Word’s Object Model, either from VBA macro scripts or C#/ projects, you’ve probably dealt with Headers and Footers at one point or another. And Headers and Footers can be particularly convoluted things in Word.

I’m not going to try explaining the whole “three objects, but some exist and some don’t, and you have to jump through hoops to see them” issues of Headers and footers, but rather focus on one small element: LinkToPrevious.

As the name suggests, the LinkToPrevious property indicates whether the header or footer for a particular section is “Linked to” the same header/footer of the previous section. When linked, changing the header/footer of either section automatically changes the other, which makes sense.

Often, when scanning through the sections of a document, you’ll need to adjust headers and footers, to a common pattern in code is to check the LinkToPrevious property and if true, skip that header/footer. The reason is because, since it’s linked to the immediately previous section, and you’re scanning through all sections starting at the top, you’ve already processed it. Makes sense.

Except, that the LinkToPrevious property should NEVER be true for the header/footer of the FIRST section.

But it can be!

To prove it, create a document with two sections, set the header for the second section to “LinkToPrevious” true. Then delete the first section. If you do it right. You’ll end up with a single section document whose header is set LinkToPrevious True. Technically, this shouldn’t be possible and it certainly isn’t correct.

The end result is that if you’re scanning headers/footers like this, you have to check LinkToPrevious on all sections except the first.

Integrating with the New Office Backstage from a VSTO 3 Addin

Filed under .NET, Code Garage, Office, VSTO, Word

If you’re like me and want to drop straight to the code, you can download the sample project here.

When it comes to writing addins for Office applications, you really only have 2 choices.

  • Old School – Implementing the IExtensibility2 interface
  • VSTO

The problem is that VSTO addins tend to be one trick ponies. If you want to write a single addin that works in multiple Office applications, or if you want a single addin to work across multiple Office Application versions, you’ll likely run into walls with VSTO. Sure, you could create separate DLL’s for each Office app, and for each target version, but good lord, who wants to do that?

Still, VSTO makes dealing with Ribbons and TaskPanes much easier and can be handy when you need to target a single Office app, say, Word, and when you’re specifically concerned with a single version of the app (or maybe the latest few).

This was the case recently with an addin I was working on.

The Situation

The target application was Word, specifically Word 2010. However, virtually all of the addin was finished by the time VSTO 4 and VS2010 was released. Since we didn’t really want to run the risks of retooling the addin for VSTO 4 (seeing as it’s a brand new platform and we were close to the end of the dev cycle), the decision was made to stick with VSTO 3.

No big deal. VSTO 3 addins run perfectly fine under Word 2010.

The Snag

During the ramp up on Word 2010, however, we discovered the new Backstage view:


It’s a really nice extension to the traditional Office File menu. One of the few new elements of Office 2010 that really makes upgrading worthwhile, in my opinion, but that’s another story.

At any rate, it turns out that the Backstage view is extensible via XML, very much like the Ribbon.

Unfortunately, VSTO 3 has no support for the Backstage view.

Customizing the BackStage in the First Place

For me, though, the first question was exactly how do you customize the BackStage? Turns out, it’s fairly simple. If you’ve manually customized the Ribbon before, you’ll recognize the process instantly. John Durant wrote up a short and sweet article on the process here.

Unfortunately, he describes modifying the BackStage by altering the Custom XML package in a document. This makes the BackStage modification document specific. Not the way you want to do it for an addin.

I also found this great article, but it describes modifying the BackStage by implementing an addin using the older style IExtensibility2 model, not VSTO.

More articles, same shortcomings. This was starting to look as hopeless as the Cowboy’s season this year.

I knew that modifying the ribbon via an IExtensibility2 addin was as simple as creating a public function called GetCustomUI, and returning the XML for whatever custom buttons you want added to the Word Ribbon. From the articles mentioned above, it was obvious that the exact same process was used to alter the BackStage. The problem was, there wasn’t any obvious way to get at  the GetCustomUI function in a VSTO addin. All of that XML munging is automagically handled for you by VSTO.

Further, I wanted, if at all possible, to continue to leverage the VSTO Ribbon support. The Visual Studio Ribbon Designer is just too handy to have to go back to manually crafting up the XML for it.

However, I knew that no matter what happened, I was still going to have to manually build the XML for my BackStage customizations. That was ok, though.

The Wrong Way – Wrap RibbonManager

I initially thought I could just create a new object that wrapped RibbonManager.

The main VSTO Connect class (that itself inherits from the VSTO Addin class), exposes the overridable function CreateRibbonExtensibilityObject  that expects a IRibbonExtensibility object as a return value.

So, create a new object, RibbonManagerInterceptor, that implements IRibbonExtensibility, and just wraps the internally created RibbonManager object, and return that, like so:

Private _RibbonExt As Microsoft.Office.Core.IRibbonExtensibility
Protected Overrides Function CreateRibbonExtensibilityObject() As Microsoft.Office.Core.IRibbonExtensibility
    Dim RibbonManager = MyBase.CreateRibbonExtensibilityObject()
    _RibbonExt = New RibbonManagerInterceptor(RibbonManager)
    Return _RibbonExt
End Function

Unfortunately, it won’t work. The VSTO authors make the assumption that the object implementing IRibbonExtensibility is of base type RibbonManager. For instance:

Private ReadOnly Property RibbonExtensibility As IRibbonExtensibility
        If (Me.ribbonExtensibility Is Nothing) Then
            Me.ribbonExtensibility = Me.CreateRibbonExtensibilityObject
            Dim ribbonExtensibility As RibbonManager = TryCast(Me.ribbonExtensibility,RibbonManager)
            If (Not ribbonExtensibility Is Nothing) Then
                ribbonExtensibility.ServiceProvider = MyBase.HostContext
            End If
        End If
        Return Me.ribbonExtensibility
    End Get
End Property

This is from the Addin class in Microsoft.Office.Tools.Common.v9.0. You’ll notice that the property accepts an IRibbonExtensibility object, but then proceeds to cast it as RibbonManager and if that fails, the addin won’t register the service provider.

What’s worse is that RibbonManager has been marked NotInheritable, so you can’t create a subclass from it to pass this test. Full stop.

Intercepting GetCustomUI

Boiling the basic requirements of an Office addin down, I knew I had to implement IRibbonExtensibility and the GetCustomUI method.

    Public Function GetCustomUI(ByVal RibbonID As String) As String Implements IRibbonExtensibility.GetCustomUI
        Dim xml = _RibbonManager.GetCustomUI(RibbonID)

        If _Connect.Core.WordInstance.Version = "14.0" Then
            '---- only add in backstage support for version 14 (Office 2010)
            Dim bs = <backstage>
                         <tab id="bsSample" label="BackStageSample" insertAfterMso="TabInfo">
                                 <group id="bsSampleGroup" label="BackStage Sample Group">
                                         <button id="BackStageSample"
                                             label="BackStage Sample Button"
            xml = xml.Replace("</customUI>", bs.ToString & "</customUI>")
            xml = xml.Replace("", "")
        End If
        Return xml
    End Function

The first step is to use the underlying RibbonManager object to generate its version of the CustomUI XML. That allows me to continue to use the Ribbon Designer in Visual Studio and the RibbonManager for all the heavy lifting where the Ribbon is concerned.

Next, if I see the addin is hosted by Word v14 (2010), I use the handy inline XML feature of to create the BackStage custom XML and inject it into the Ribbon XML previously already generated.

And finally, I have to patch the schema used, so that Word knows I’m defining BackStage customizations in the XML as well as Ribbon customizations.

A Better Way

I knew that VSTO was somehow intercepting  all callbacks from Word as defined in the CustomUI XML, and then generating events for the various Ribbon controls, or interrogating control properties. Maybe there was a way to hook into that process.

So, I loaded up Reflector and started spelunking.

It didn’t take long to realize what was going on.

When you set up a custom UI for Word, you define callback functions in the XML that Word will then call when necessary. Those functions must be public functions on the object that implements IRibbonExtensibility and that is exposed as a COM object. Further, those functions are ALWAYS called via IDispatch. For instance, take the following custom UI:

    <tab id="bsSample" label="BackStageSample" insertAfterMso="TabInfo">
            <group id="bsSampleGroup" label="BackStage Sample Group">
                    <button id="BackStageSample"
                            label="BackStage Sample Button"

The onAction element above defines a function called bsSampleClicked, that Word will call on the IRibbonExtensibility object whenever that button is clicked.

Interestingly, it doesn’t matter that this is for a BackStage object (a button) and not a Ribbon. Word vectors everything through that IRibbonExtensibility object.

So, how does that process work then?

Well, as it turns out, it’s not terribly complicated, but it does require a little work. With .NET, you implement a latebound IDispatch type call by implementing the IReflect interface. That interface contains a number of members that need to be implemented, though most can be stubbed out.

However, you’ll definitely need to implement the GetMethods and InvokeMember functions.

GetMethods returns to the caller an array of MethodInfo objects that describe the all latebound methods that our IRibbonExtensibility object will be supporting. Since we want to continue to allow the RibbonManager to service all of the methods that it needs to handle, you first need to retrieve the RibbonManager’s list of supported methods and then add to it:

    Private Function IReflect_GetMethods(ByVal bindingAttr As BindingFlags) As MethodInfo() Implements IReflect.GetMethods
        Dim ir = DirectCast(_RibbonManager, IReflect)
        Dim r = ir.GetMethods(bindingAttr)
        ReDim Preserve r(UBound(r) + 1)
        Dim mi = BackstageMethodInfo.CheckBoxActionMethod(DirectCast(_RibbonManager, RibbonManager), "bsSampleClicked", Nothing)
        r(UBound(r)) = mi
        Return r
    End Function

Finally, to vector the method call properly, define the InvokeMethod method like so:

    Private Function IReflect_InvokeMember(ByVal name As String, ByVal invokeAttr As BindingFlags, ByVal binder As Binder, ByVal target As Object, ByVal args As Object(), ByVal modifiers As ParameterModifier(), ByVal culture As CultureInfo, ByVal namedParameters As String()) As Object Implements IReflect.InvokeMember
        Dim r As Object
        If name.StartsWith("bs") Then
            '---- it's a Backstage control, just intercept and pass through
            '     to the connect object
            Select Case name.ToLower
                Case "bssampleclicked"
                Case Else
            End Select
            Return Nothing
                Dim ir = DirectCast(_RibbonManager, IReflect)
                r = ir.InvokeMember(name, invokeAttr, binder, _RibbonManager, args, modifiers, culture, namedParameters)
            Catch ex As Exception
                Throw New TargetInvocationException(ex)
            End Try
            Return r
        End If
    End Function

Here, if the callback function Word is trying to call starts with “bs”, I assume it’s one of my “backstage” callbacks and I drop into the first Select Case.

If not, I forward the call on through to underlying RibbonManager object, so VSTO can continue to work it’s magic.

The actual function that handles the callback, I defined in my VSTO Connect object:

    '---- Testing function
    Friend Sub bsSampleClicked()
        MsgBox("BackStage Sample Button was Clicked")
    End Sub

A Few Side Notes

I put together a small sample addin for Word that illustrates this approach. Download it here.

If you download and unzip the sample project, one of the first things you’re likely to notice is the References folder. For addins like this, I like to copy any referenced dll locally to the project and reference it from there, rather than scattering referenced DLLs all over my system. It makes moving the project to other machines much easier, among other benefits.

You’ll also notice that the DLLs are for Office 2007, as that’s part of the point of this sample; to show that a VSTO 3 addin  can target both 2007 and 2010 and support Ribbons and the Backstage.

One downside to this approach is that VSTO 3 insists  that Office 2007 be installed, even if you’re actually only targeting 2010. You’ll get errors and very specific warnings to that effect if you try to run the project on a machine without Office 2007.

Just make sure you’ve installed Office 2007, then Office 2010, to be able to run the addin against both versions.

COM Visibility

Since addins, even VSTO addin’s, are, at their core, COM dlls, you’ll likely find yourself having to deal with COM interop to some degree. To make that easier, I usually turn OFF COM visibility for the WHOLE PROJECT in the Assembly.vb file:

' Setting ComVisible to false makes the types in this assembly not visible 
' to COM components.  If you need to access a type in this assembly from 
' COM, set the ComVisible attribute to true on that type.
<Assembly: ComVisible(False)> 

And then turn ON COM visibility for each class that actually needs to be visible via COM.

''' <summary>
''' Core object for this addin
''' </summary>
''' <remarks></remarks>
<ComVisible(True)> _
<Guid("8618E3FB-D57B-4875-ABE4-D204E1C1046A")> _
Public Class Core


To make the project easy to debug, I always set the start action to “Start External Program”, and point it at my WinWord.exe in the Office installation.


You may need to change the path to WinWord as applicable to your system.

Sample Project Requirements

To run the sample project, you’ll need the following:

  • Visual Studio 2008
  • Office 2007
  • To test the backstage support, you’ll also need Office 2010 installed (They can be installed side by side except for Outlook).
  • Visual Studio Tools for Office 3.0 (VSTO). I found the installer already on my system at

    c:\Program Files (x86)\Microsoft SDKs\Windows\v6.0A\Bootstrapper\Packages\VSTOR30\vstor30.exe
  • You’ll likely also want the service pack for VSTO. It was located here:

    c:\Program Files (x86)\Microsoft SDKs\Windows\v6.0A\Bootstrapper\Packages\VSTOR30\vstor30sp1-KB949258-x86.exe

The Wrapup

So there you have it. Support for the latest Backstage View in a VSTO 3 based addin. It required a fair bit of poking around under the covers, something that would simply not be possible without Lutz Roeder’s excellent Reflector tool. If you don’t have that in your toolbox, you’re missing out on a fantastic debugging, diagnostic, research, and discovery tool!

In the end, if you need your addin to support multiple Office applications or a wide variety of versions, your best bet will be to go with IExtensibility2 (or possible AddinExpress, though I’ve never used it, and have no idea whether it truly makes things easier for multi targeting or not).

And finally, the standard disclaimer. IWOMM (It works on my machine). If you find bugs or problems, please let me know. Heck, if you know a better way (short of converting to VSTO 4!), I’d love to hear about it! I’m fairly certain that this isn’t the only way to skin this particular cat.

Word’s Compatibility Options

Filed under .NET, Code Garage, Office, Word

One element you’ll eventually run up against when dealing with Word documents is "compatibility”.


You can see one small indicator of compatibility in the above screenshot. When you open any old format DOC file in Word 2007 or 2010, it’ll open in compatibility mode.

But what does that mean, really?

Compatibility in a Nutshell

Word’s Compatibility mode actually encompasses a fairly significant number of tweaks to how Word renders a document. You can see those options at the bottom of the Advanced tab on Word’s Options dialog.


a small sampling of compatibility options available

Word sets those options in a number of different combinations depending on the source of the original document. You can select the source document manually via the below dropdown, but most of the time, Word will choose an appropriate selection automatically when it opens the original  document.


The problem is, many of those options can cause Word to render a document in very strange, unpredictable ways. In fact, most Word experts advise to manually turn OFF all compatibility options (and so force Word to layout the document using it’s most recent rules, either those set for Word 2007 or for Word 2010).

Controlling Compatibility Programmatically

Manipulating those settings manually via the Options dialog is fine, but if you’ve got thousands of documents to deal with, that may not be your best approach.

Why not automate it with a bit of .net code?

    Private Sub ForceCompatibility(ByVal Doc As Word.Document)

        Doc.Application.Options.DisableFeaturesbyDefault = False
        For Each e As Word.WdCompatibility In [Enum].GetValues(GetType(Word.WdCompatibility))
            Dim res As Boolean = False
            Select Case e
                Case Word.WdCompatibility.wdDontULTrailSpace
                    res = True
                Case Word.WdCompatibility.wdDontAdjustLineHeightInTable
                    res = True
                Case Word.WdCompatibility.wdNoSpaceForUL
                    res = True
                Case Word.WdCompatibility.wdExpandShiftReturn
                    res = False
                Case Word.WdCompatibility.wdLeaveBackslashAlone
                    res = True
                Case Word.WdCompatibility.wdDontBalanceSingleByteDoubleByteWidth
                    res = True
            End Select
            Doc.Compatibility(e) = res
    End Sub

So, what’s going on here?

First, I pass in a Document variable to the function. This variable contains a reference to the Document object you need to fix compatibility on. You can easily obtain teh Active document object with the Application.ActiveDocument property.

The function first calls the Doc.Convert method. This converts the document to the latest format available to the version of Word you’re running and enables all new features. This conversion happens in memory. Nothing is saved to disk at this point. Also, you’d think that this would be enough to turn off all compatibility tweaks, but alas, Word leaves many of them still turned of after this conversion.

Next, it uses the DisableFeaturesbyDefault property to enable all available features.

Then, it enumerates all the wdCompability options to clear or set them as necessary. This handy bit of trickery can come in handy in a number of situations. Essentially, it uses a little reflection here:


to retrieve an array of all the possible values of the wdCompatibility enumeration.

Then, the For loop simply iterates through all the values in that array.

The SELECT CASE then matches up a few specific enumeration values that don’t work quite like you might expect, and handles them with special cases.

For instance, the wdNoSpaceForUL option is equivalent to the “Add Space for Underlines” option in the Word Options Dialog. However, the logic is reversed.

Leave it to a word processing program to make use of double negatives like this!

Odd Man Out

The wdExpandShiftReturn option is the one exception. When cleared, this option prevents Word from expanding spaces on lines that end with a soft Return (a Shift-Return). Many Word experts indicate that this is preferable to Word’s default behavior of expanding those spaces, so I’ve set it here accordingly. Your mileage may vary.

And finally, be sure the SAVE the document in xml format, using the Save method like this:

Doc.SaveAs FileName:="doc.xml", FileFormat:=wdFormatXML

So there you have it. A quick and easy way to force a document into the latest Word file format and turn off any errant compatibility options in the process.

Word and the Disappearing Cell

Filed under Office, Troubleshooting, VB Feng Shui


So I was off working on a new bit of code today when an email came through:

“The tool is blanking out a cell when it’s not supposed to!”


No big deal though. I fire up the project, load up the document in question and give it a run.

Sure enough, a cell that contains some text and a merge field (under my control) is being completely cleared out. Only the merge field should be being cleared.

I dig a little more and eventually narrow the culprit down to a single function.

Private Function MyFunc(Rng as Range)
End Function

When I hit that Rng.Delete line, the ENTIRE CELL was being cleared. Including text that had nothing to do with my merge field (other than the fact that it was also in the same table cell).

After a bit more single stepping through code, I discovered that the Range being deleted was actually empty. In other words, it’s Start position was the same as it’s End position.

Now, I’ve know from dealing with Word for far too many years that when you’re working with an empty range like that, there are a good number of things that don’t work quite right. Many will throw errors, but I was getting no error here.

Then it hit me. As a test, I added a few chars of text at the VERY END of the cell, after my merge field.

Sure enough, everything worked, even though the range being deleted was still empty.

The End-of-Cell Marker.

Word stores an End-of-Cell marker character at the end of each cell in a table. The marker is a Chr(7) if you retrieve it, but you wouldn’t normally ever even bother with it.

However, if you end up with an empty Range object that is situated right at the end of a cell, then that Range is effectively pointing at the End of Cell marker, and if you invoke the Delete method on the Range, you’ll end up invoking it on the End of Cell Marker.

And what happens if you try and delete an End of Cell Marker?

Well, duh. The cell is cleared.

Moral of the Story

The short version of all this is simple:

If you plan on deleting a Range, make sure that the Range.Start <> Range.End before you do:

If Range.Start <> Range.End Then
End If

Technically speaking, if start = end, then there’s nothing in the range anyway and so there shouldn’t be anything there to delete. But doing so can definitely have detrimental effects in some cases.

And, as usual, your mileage may vary.

VSTO 3 in Visual Studio 2008 under Office 2010

Filed under Office, Tweaks

Ok, bit of a weird combination. If you’re developing for Office 2010, you’re using VS2010, right? Uh huh…

Seriously, if you’re like a lot of folks, you might not be upgrading to VS2010 soon, so Microsoft can shake the bugs out in an SP1. But still, you have an Office VSTO addin in VS2008, that you’d like to be able to run in Office 2010, while debugging in the VS IDE.

If you’ve tried this without having Office 2007 installed ALSO along with the Office 2007 Primary Interops, you’ve most likely gotten a nasty message in the errors window telling you you “can’t compile this application because Office 2007 is not installed” or some such.

Well, fear not. With a little tweak to your addin’s project file, you should be good as gravy.

Here’s the untouched section from towards the bottom of a VSTO VBProj file. Notice the nasty line that starts with “VSTO_COMPATIBLEPRODUCTS”.

  <!-- This section defines VSTO properties that describe the host-changeable project properties. -->
      <FlavorProperties GUID="{BAA0C2D2-18E2-41B9-852F-F413020CAA33}">
        <ProjectProperties HostName="Word" HostPackage="{D2B20FF5-A6E5-47E1-90E8-463C6860CB05}" OfficeVersion="12.0" VstxVersion="3.0" ApplicationType="Word" Language="vb" TemplatesPath="" DebugInfoExeName="#Software\Microsoft\Office\12.0\Word\InstallRoot\Path#WINWORD.EXE" DebugInfoCommandLine="/w" AddItemTemplatesGuid="{2606E7C9-5071-4B63-9A83-C66A32B1669F}" />
        <Host Name="Word" IconIndex="0">
          <HostItem Name="MyAddin" Code="Connect.vb" CanonicalName="AddIn" CanActivate="false" IconIndex="1" Blueprint="Connect.Designer.xml" GeneratedCode="Connect.Designer.vb" />
          <VSTO_CompatibleProducts ErrorProduct="This project requires Microsoft Office Word 2007, but this application is not installed." ErrorPIA="This project references the primary interop assembly for Microsoft Office Word 2007, but this primary interop assembly is not installed.">
            <Product Code="{XX12XXXX-XXXX-XXXX-X000-X000000FF1CE}" Feature="WORDFiles" PIAFeature="WORD_PIA" />

The trick, it turns out, is to just comment that element out completely.

You’ll end up with this…

  <!-- This section defines VSTO properties that describe the host-changeable project properties. -->
      <FlavorProperties GUID="{BAA0C2D2-18E2-41B9-852F-F413020CAA33}">
        <ProjectProperties HostName="Word" HostPackage="{D2B20FF5-A6E5-47E1-90E8-463C6860CB05}" OfficeVersion="12.0" VstxVersion="3.0" ApplicationType="Word" Language="vb" TemplatesPath="" DebugInfoExeName="#Software\Microsoft\Office\12.0\Word\InstallRoot\Path#WINWORD.EXE" DebugInfoCommandLine="/w" AddItemTemplatesGuid="{2606E7C9-5071-4B63-9A83-C66A32B1669F}" />
        <Host Name="Word" IconIndex="0">
          <HostItem Name="MyAddin" Code="Connect.vb" CanonicalName="AddIn" CanActivate="false" IconIndex="1" Blueprint="Connect.Designer.xml" GeneratedCode="Connect.Designer.vb" />
          <!-- <VSTO_CompatibleProducts ErrorProduct="This project requires Microsoft Office Word 2007, but this application is not installed." ErrorPIA="This project references the primary interop assembly for Microsoft Office Word 2007, but this primary interop assembly is not installed.">
            <Product Code="{XX12XXXX-XXXX-XXXX-X000-X000000FF1CE}" Feature="WORDFiles" PIAFeature="WORD_PIA" />
          END OF COMMENTING -->

Happy VSTOing!

Word and the IsObjectValid Function

Filed under Office

One thing that can be maddening about programming against the Word Object Model is that objects have a habit of “disappearing” on you.

Say you retrieve a range, retrieve the fields in that range, then delete the range.

You still have references to those fields, but they no longer actually exist. Attempting to access any of the properties or methods on a field in this condition will cause Word to throw an exception with the message “That object has been deleted”.

Most code I’ve seen tends to handle these kinds of situations in one of two ways…

  1. Attempt to write the code such that if you’ve caused an object to become “deleted” that you simply don’t access it after that point. Sometimes this is easier said than done.
  2. Just wrap access to the object in a Try Catch, or On Error Resume Next. Test for any error condition and react accordingly. However, this has that nasty “the normal flow of code is through an exception” code smell. Not pretty.

Fortunately, there is third way.

The Word APPLICATION object has a function tucked away on it, called, appropriately enough, “IsObjectValid”.

Pass it a Word object model object and it’ll pass back true if the object is valid, or false if it’s been deleted.

Dim Rng = WordApp.Content
Dim Fld = Rng.Fields(1)
If WordApp.IsObjectValid(Fld) Then
   Debug.Print Fld.Code.Range.Text
   'the FLD object is no longer valid
End If

Yes, it’s been around for a good long while so this may not be news to everyone, but it’s a handy trick to have in your toolbox if you work with Word regularly.

Implementing Document.SaveCopyAs in Word

Filed under Office, VB Feng Shui

If you’ve used the Excel object model, youmay have discovered how incredibly handy the SaveCopyAs function is. Essentially, it allows you to save a currently loaded Excel spreadsheet into some arbitrary file, without altering the state of the loaded copy. In other words, if the user has altered the spreadsheet loaded into Excel, but hasn’t saved it, after a “SaveCopyAs”, that copy of the sheet is still considered dirty, and the saved file on disk does not have the user’s changes saved within it.

This is a very wonderful thing!

Basically, it means you can save off the spreadsheet as it currently exists in memory, complete with all the users edits up to this point, perform any operation on that saved copy that you want, and then either keep that copy or throw it away, and the copy that the user is currently working on is not affected in anyway whatsoever!

Good stuff.

But Word has never had it!

I’d rectified this long ago, in VB6, but recently had a need for this capability again, in this time.

Turns out, it’s incredibly simple to implement with, and much more intuitive to.

With .net 3.5, you can even create the function as an Extension Method, and directly add it to the Word Document object interface! Very cool!

The thing is, I knew I’d done this before, but couldn’t remember exactly how. After quite some time googling this, I came across a post where it was theorized it might be possible to implement SaveCopyAs by using the Compare function in Word.

Hmm, I’d never even tried the Compare, but it sounded interesting. After a few false starts, I ended up with this:

Imports System.Runtime.CompilerServices

<System.Runtime.CompilerServices.Extension()> _ Public Sub SaveCopyAs(ByVal Document As Word.Document, ByVal FileName As String) Document.Compare(Name:=Document.FullName, CompareTarget:=Microsoft.Office.Interop.Word.WdCompareTarget.wdCompareTargetNew) With Document.Application.ActiveDocument If .Revisions.Count > 0 Then .Revisions.RejectAll() .SaveAs(FileName, AddToRecentFiles:=False) .Close() End If End With Document.Activate() End Sub

Notice that I’m using the CompilerServices.Extension attribute to flag this as an extension method. This is what adds this method directly to the Word Document object and makes it so much more intuitive to use.

Essentially, the idea here is to:

  1. Compare the version of the document loaded in memory to the last saved version out on disk.
  2. Write the comparison result to a new document in memory (i.e. what becomes the active document)
  3. Check the revision count.
  4. If there are revisions, you know there’s been changes made to the document since it was last saved, so Reject all the revisions, save the active document and close it.
  5. If there were no revisions, the user hasn’t made any changes to the document since the last time they saved, so there’s really nothing else to do.
  6. Reactivate the user’s originally active document

The real trick here was step 4. Originally, I was accepting the revisions and then saving, and it wasn’t working at all. Eventually I traced the problem back to the comparison order. It turns out, when you perform a compare to another file from a loaded Document object in Word, the results of the comparison are the changes required to convert from the version of the document you currently have open in Word to the other document.

So, if the user has added the word “Nostradamus” in a paragraph, then the comparison would yield a revision that says, essentially, “remove the word Nostradamus from this paragraph”. This is exactly backward from the behavior you want. Turns out that all I had to do was “Reject all changes” instead.

Believe it or not, this works a dream.


I can’t imagine that performing a compare, especially on a large document, is going to be a particularly speedy process and I knew I’d done this before using a alternate interface that the Word Document object implemented, albeit undocumentedly so. I just couldn’t remember how…

Suffice it to say, eventually I stumbled across the long forgotten interface. That interface is the COM IPersistFile. It’s a basic interface COM typically uses when saving Structured Storage documents, which are what Word files usually are (though, fortunately, it works with Word 2007 for saving DOCX style files).

Turns out the Word Document object implements that interface. They just don’t make that fact common knowledge.

Using it is trivially easy, especially given that it’s a COM interface that the .net framework happens to define natively (though do note, you’ll need to Import the InteropServices.ComTypes Namespace as show below.

Imports System.Runtime.CompilerServices
Imports System.Runtime.InteropServices.ComTypes

<System.Runtime.CompilerServices.Extension()> _ Public Sub SaveCopyAs(ByVal Document As Word.Document, ByVal FileName As String) Dim pf = DirectCast(Document, IPersistFile)  
pf.Save(FileName, False) 'pf.SaveCompleted(FileName) End Sub

About that last commented line, that calls the SaveCompleted function. From the docs, it sounds like that is necessary, but I’ve never found a situation where it made a difference calling it or not.

Maybe someone can illuminate that aspect of the interface better than me?

But at any rate, the IPersistFile method is likely to be many times faster than the Compare method, especially for large files or files that the user has made many changes to without saving. Still, I thought the Compare method was interesting enough to warrant a mention.

So there you have it. SaveCopyAs for Word, implemented as an extension method that directly extends the Word Document object.

Yet more sweetness!

Mail Merge and Reports in Word

Filed under Office, VB Feng Shui

If you have a need to generate documents, and I mean lots of documents, you’re likely the eventually investigate the Word mail merge functionality.

It’s ok, but there’s lots of things that it can’t do, especially with respect to generating tables or dealing with table oriented data.

You next stop might be a report generator, like Crystal Reports. But the big problem with those kinds of packages is they require you to use a proprietary report template editor, usually band oriented (the typical “header/body with repeating lines/footer” type reporting system). Not terribly bad, but not great if you actually want to generate mail merged documents.

Word Documents as Templates

But there are alternatives out there. Two of the most intriguing I’ve seen are an open source package called FlexDoc and a commercial application called Windward Reports.

Both of them allow you to use standard Word documents as “report” templates”. But keep in mind, “report template” in this sense is a pretty open ended concept. You could generate everything from a mass mailing letter, to legal documents, to actual reports with tabular data, and even graphs and charts.

And both are lightning fast, since they work directly on the Word document file itself (or more precisely, the DOCX file format, older DOC format files aren’t supported).

The Commercial Product

Windward Reports is polished commercial product. It’s not cheap, but it’s incredibly flexible, supporting everything from XML to SQL data sources, charting, graphs, tabular report type elements, formatted content (ie insert a field into your Word document that consists of formatted HTML, for example), and a lot more.

Windward uses standard Word Mail Merge fields for its tags, specifically the AUTOTEXTLIST field. Those field types have been in Word since even before Word 2000, and it’s very unlikely they’ll change any time soon, so you’re pretty safe there.

One really nice element of Windward is that they have available a “tagging helper” addin for Word called AutoTag. It can really help speed up the design of your Word report templates. It’s optional. Technically, you don’t absolutely have to have it, but it’s a lot more painful to create reports without it.

The Open Source Project

Flexdoc is open source and not quite as polished as Windward. But it’s got much of the same functionality. It can connect to just about any data source, it’s fast, and it can do tabular data very well, but it can’t do charts or graphs, and it can’t handle formatted data (hmtl or otherwise) at all by default, though there are mods that can be made to improve that situation.

The biggest problem with Flexdoc is that it currently relies on the CustomXMLElements functionality of Word that has, as of Jan 10, 2010, been removed from the product because of the lawsuit with I4I.

The author of FlexDoc indicated that there were plans for a ContentControl-based version in the future, but that might be a while off. Still, it’s an open source project, so you could always throw in and help make those changes if you really needed them.

Using Word’s Find Object in Field Result Ranges

Filed under Office, VB Feng Shui

image If you’ve ever had to use the Word Object Model’s Find object, you probably know it can be a bit… well… finicky. It has guids that conflict with Excel 95 typelibs which causes problems, it uses a veritable forest of optional parameters, most of which are declared as OBJECT so intellisense doesn’t work well with it, it’s a real pain to use with C# because of C#’s lack of optional parameter handling, etc, etc.

And I have a new one to add to the list.

I was recently working on a kind of super advanced mail merge facility for Word documents. Word’s MailMerge capabilities are certainly good, but they couldn’t cut the mustard for this particular batch of requirements.

However, since the MAILMERGE field (and all the various other fields in Word) have a lot of capability, I wanted to leverage that as much as possible.

Essentially, the templates to be used would embed normal mail merge fields, but Word’s Merge ability would never get used. Instead, I’d have a VB.NET assembly that would navigate through each field in the document, analyze its code, and replace it with the appropriate result. The code looks something like this:

Dim Code = Field.Code.Text
'....parse the code for the field here
'     determine the final field contents, put it in Content

Field.Result.Text = Content

Pretty simply stuff, really.

But, the content of the field might have embedded style information. Sort of like a stripped down HTML code system. For instance, <b> might be “Turn bold on”, and </b> would turn bold off.

First thought

Initially, I thought I could just detect the existence of the opening tag, remove all the tags, and just format the entire field consistently. It’s simple and fast. But unfortunately, some field values had tags embedded within the field, meaning that only a portion of the field’s data should be formatted.

Strike 1.

RTF??? In 2010?

Then, I figured I could just swap out the formatting code with the equivalent RTF codes, and use PasteSpecial to paste the RTF formatted data directly in the field.

While this did work, it presented a bad side effect. Since the RTF text coming off the clipboard represented specific and complete formatting information, it overrode all default formatting already applied to the field. So, say the entire document was formatted in Calibri Font. When I pasted the RTF text “{\b My Text Here}”, the end result was a field formatted in bold Times New Roman, not Calibri. So even though I hadn’t specifically set the font for the text, the paste special was assuming a default of Times.

Strike 2

That ol’ Word FIND Object

I finally realized that I was going to HAVE to use the FIND object in Word to locate those formatting marks, and then apply the applicable styles to the found text.

Performing the Find is pretty straightforward, although there’s a few things to watch for.

The first is the aforementioned Excel 95 Guid clash. What it means is that you need to access the FIND object via late binding. So obtain a FIND object from the (Field.Result range) and store it in an OBJECT variable. That will force to make latebound calls to it.

The next problem is how to actually find the marks. You’ll need to use the Find object’s wildcard support for that. The code looks something like this:

Dim Find as Object = Field.Result.Find
With Find
   If .Execute(FindText:="\<b\>*\</b\>", MatchCase:= false, MatchWildcards:= True, Forward:= True) then
      'the text was found
      'The text was not found
   End If
End With

Again, pretty simple stuff.

But there was two problems.

The first was, the above would find the text alright, but I still needed to remove those embedded formatting marks. After a false starts, I realized that the REPLACE functionality was could do this very easily. Just mark the * (i.e. the found text that you want to keep) by enclosing it in “()”, then Replace with “\1” (the first marked substring).

Dim Find as Object = Field.Result.Find
With Find
   If .Execute(FindText:="\<b\>(*)\</b\>", MatchCase:= false, MatchWildcards:= True, Forward:= True, Replace:=wdReplace.wdReplaceAll, ReplaceWith:="\1") then
      'the text was found
      'The text was not found
   End If
End With

The second problem was much more insidious.

It worked perfectly on fields embedded in the body of the document, but completely failed to find anything in field embedded in cells in a table. Needless to say, I was scratching my head over that one for a while.

It turns out that Find just doesn’t work right if you attempt to use it in the Result range of a Field  object when that object has been embedded within a table.

The trick, then is to Unlink the field object before you attempt a find inside of it.

Dim Rng = Field.Result
Dim Find = Rng.Find
With Find
   '.... continue as before

With the range no longer within the field, the Find works just like it should, regardless of whether the field is in the body of the document or embedded within a table.

So there’s my terrifically arcane Office object model trivia for the day!

.NET Interoperability with COM Office Apps, a Hidden Gotcha

Filed under .NET, ActiveX, Office, Troubleshooting, VB Feng Shui

image I’ve been working with .NET and Office interoperability for some time now, and, other than a few minor hiccups here and there, things have generally been very smooth.

But a colleague recently ran into a problem that required a good deal of hunting to resolve. Long story short, even though .NET interoperability with Office objects is virtually foolproof, releasing your references has some hidden dangers lurking amongst the weeds.

As a general rule, it’s safe to let the .NET garbage collector (GC) do it’s thing while your application is running.

BUT, when you’re app shuts down, you need to take a few extra steps to make sure everything’s cleaned up before quitting.

The details are laid on in this article on MSDN. It’s from 2005, but, from what I can tell, it’s just as applicable now as it was then.

Essentially, the trick boils down to this:

When you’re preparing to shut your application down, you need to make SURE you’ve released all your references to any Office objects. This means setting all your various references to Nothing, or, more typically, letting the GC do its thing.

Unfortunately, the GC doesn’t necessarily do its thing during a shutdown, so you have to force the issue. And that requires code essentially like this:

WordApp = Nothing 

The article noted above doesn’t really go into why it’s necessary to call Collect and WaitForPendingFinalizers twice, but it does appear to be required. I’m guessing it’s due to the way the Finalizer sweeps object references; it could be possible for it to release objects in a certain order, causing it to skip some objects. But that’s just a guess.

And as a final note, if your .NET code uses any COM interoperability, it might be a good idea to do some final cleanup along these lines as well.