I’ve been working on getting MSBuild set up and configured to handle some Continuous Integration builds for our company. One task that came up was getting a large batch of files from our TFS server and pulling them down to the appropriate directories on a local machine.
But here’s the catch. I didn’t want to just pull everything from those folders down. Some folders contain very large files that didn’t really need to come down at all, because the intent was that the build process would be building them.
So, it would have been relatively easy to use the MSBuild Extension Pack to get all the files recursively in my project:
<Target Name="GetInstallFileSet">
  <MSBuild.ExtensionPack.VisualStudio.TfsSource
      TaskAction="Get"
      ItemPath="..\InstallFiles"
      WorkingDirectory="$(MSBuildProjectDirectory)"
      Recursive="true"
      Force="true" />
</Target>
The WorkingDirectory parameter indicates where TFS will base all relative file specs from. The ItemPath indicates the folder location (relative to the MSBuild proj file that contains this Target) that TFS should retrieve. The TaskAction of Get just retrieves all files in that folder, and the Recursive parameter tells TFS to get all files in that ItemPath and all its subfolders.
Pretty simple, but if there are large files anywhere in the path, TFS will obligingly retrieve them as well, which could suck up a lot of time and bandwidth.
So, first, how to specify only those files that I really want to retrieve? Easy, just use an ItemGroup.
<Target Name="GetInstallFileSet">
  <ItemGroup>
    <InstallFileSet Include="..\..\Install;
                             ..\..\Install\System;
                             ..\..\Install\OtherFiles;
                             ..\..\Install\DiskSet">
    </InstallFileSet>
  </ItemGroup>
</Target>
The InstallFileSet ItemGroup will end up with these specific folder names (all relative to the path to the proj file the target is defined in). Unfortunately, I don’t see any straightforward way to “leave out” specific files, because using any wildcard specs when defining this ItemGroup would be based on files that already exist on the local workstation, and hence we run the risk of NOT getting newly added files that exist in TFS but not locally.
But, if those files happen to live in specific subfolders, we CAN leave out those subfolders from the list in the INCLUDE attribute of the Item definition above.
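To see why wildcards don't help here, a minimal sketch like the one below can be run on its own. It assumes a hypothetical LocalInstallFiles item and a hypothetical *.iso exclusion, neither of which is part of my real build. The point is only that the wildcard is expanded against the local disk, so the list can never contain a file that exists in TFS but hasn't been retrieved yet.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
         DefaultTargets="ShowLocalMatches">
  <Target Name="ShowLocalMatches">
    <ItemGroup>
      <!-- Hypothetical wildcard spec: MSBuild expands it against files
           that already exist on the local disk, not against TFS -->
      <LocalInstallFiles Include="..\..\Install\**\*.*"
                         Exclude="..\..\Install\**\*.iso" />
    </ItemGroup>
    <!-- Prints only what was found locally; empty on a clean workstation -->
    <Message Importance="high" Text="Found locally: @(LocalInstallFiles)" />
  </Target>
</Project>
```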
Ah, but what about recursion? In the above case, I specifically DO NOT want to recurse down from the first path in the list (..\..\Install), but I DO want to recurse on all other paths.
That’s where the ‘metadata’ aspect of MSBuild comes into play. Modify the ItemGroup slightly.
<Target Name="GetInstallFileSet">
  <ItemGroup>
    <InstallFileSet Include="..\..\Install">
      <Recurse>false</Recurse>
    </InstallFileSet>
    <InstallFileSet Include="..\..\Install\System;
                             ..\..\Install\OtherFiles;
                             ..\..\Install\DiskSet">
      <Recurse>true</Recurse>
    </InstallFileSet>
  </ItemGroup>
</Target>
Notice that I’ve split the InstallFileSet item into two pieces, and added a Recurse metadata element to each piece.
Now, all the items are still in the single InstallFileSet ItemGroup, but one has its Recurse metadata set to false, and the others have it set to true.
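A quick way to check that the metadata landed where I expect is to print it. The following is a minimal sketch, not part of the real build: the ShowInstallFileSet target name is made up for illustration, and it uses an item transform so that every item is listed along with its Recurse value in a single message.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
         DefaultTargets="ShowInstallFileSet">
  <Target Name="ShowInstallFileSet">
    <ItemGroup>
      <InstallFileSet Include="..\..\Install">
        <Recurse>false</Recurse>
      </InstallFileSet>
      <InstallFileSet Include="..\..\Install\System;..\..\Install\OtherFiles;..\..\Install\DiskSet">
        <Recurse>true</Recurse>
      </InstallFileSet>
    </ItemGroup>
    <!-- The transform rewrites each item as "path [Recurse=value]";
         the results are joined with semicolons into one message -->
    <Message Importance="high"
             Text="@(InstallFileSet->'%(Identity) [Recurse=%(Recurse)]')" />
  </Target>
</Project>
```

Because the paths contain no wildcards, the items are created even if the folders don't exist locally yet, so this can be run on a clean workstation.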
Using the ItemGroup
All I have to do now is indicate how to use the ItemGroup that I’ve defined:
...
  <MSBuild.ExtensionPack.VisualStudio.TfsSource
      TaskAction="Get"
      ItemPath="@(InstallFileSet)"
      WorkingDirectory="$(MSBuildProjectDirectory)"
      Recursive="%(InstallFileSet.Recurse)"
      Force="false"
      All="true" />
</Target>
Adding this one TaskAction="Get" task will now perform the Get from TFS on all those items.
So I ran the build, and… Fail.
TFS gave me a “parsing” error on the ItemName argument. Apparently, it doesn’t like being passed a long list of semicolon-delimited path names.
Gah. This is where the UnBatching comes in.
Batching Explained
As a build engine, MSBuild will generally try to “batch” multiple items together into one processing “call”, so that as few invocations of that call as possible are made, under the assumption that fewer invocations will be faster.
Unfortunately, in some cases, you really don’t want the processing batched. In this case, batching would cause multiple paths to be supplied to one invocation of TF.exe (the TFS client app), which, as I mentioned, doesn’t know how to deal with that.
The key to understanding batching is that MSBuild will batch by default, but will separate out batches based on two things:
1) Any metadata attributes associated with the batch items
2) How you reference the items in the batch
1) is easy. In the above example, I’ve got two values of the metadata tag “Recurse”, and the task references %(InstallFileSet.Recurse), so MSBuild will execute the Get task twice: once for the item(s) with Recurse=true, and once for those with Recurse=false.
But 2) is harder to grok. As it is now, I’m using ItemPath="@(InstallFileSet)" to reference the ItemGroup. That allows MSBuild to batch the items, but then it splits the batch by the Recurse metadata.
However, if I reference the items by the built-in Identity metadata tag, MSBuild will consider each item in the group as having unique metadata associated with it, so it will execute the Get task once for each item. Do that by using %(InstallFileSet.Identity) instead, as in:
...
  <MSBuild.ExtensionPack.VisualStudio.TfsSource
      TaskAction="Get"
      ItemPath="%(InstallFileSet.Identity)"
      WorkingDirectory="$(MSBuildProjectDirectory)"
      Recursive="%(InstallFileSet.Recurse)"
      Force="false"
      All="true" />
</Target>
If you build now, you’ll notice that the TF.EXE process is launched once for each path in the InstallFileSet ItemGroup, and that the Recurse metadata is applied appropriately depending on which item is being processed.
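If you want to watch the two batching behaviors side by side without touching TFS at all, a dry run with the built-in Message task works. This is only a sketch: the ShowBatches target name is made up, and Message stands in for the TfsSource task.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
         DefaultTargets="ShowBatches">
  <Target Name="ShowBatches">
    <ItemGroup>
      <InstallFileSet Include="..\..\Install">
        <Recurse>false</Recurse>
      </InstallFileSet>
      <InstallFileSet Include="..\..\Install\System;..\..\Install\OtherFiles;..\..\Install\DiskSet">
        <Recurse>true</Recurse>
      </InstallFileSet>
    </ItemGroup>

    <!-- Batched by Recurse only: one Message per distinct Recurse value,
         each receiving every path that shares that value -->
    <Message Importance="high"
             Text="Recurse=%(InstallFileSet.Recurse): @(InstallFileSet)" />

    <!-- Batched by Identity as well: one Message per item -->
    <Message Importance="high"
             Text="Recurse=%(InstallFileSet.Recurse): %(InstallFileSet.Identity)" />
  </Target>
</Project>
```

The first Message should run twice, with the three Recurse=true paths lumped together in a semicolon-delimited list (the same shape of argument that made the Get fail). The second should run four times, once per path.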
Conclusion
MSBuild is definitely powerful. In some ways, it’s simpler to deal with than the MAKE/NMAKE systems of old, and the source proj files are definitely more flexible in how they can be used.
But, many of the more advanced functions (that you’ll end up needing quite quickly) can leave you scratching your head. And the available documentation and examples often don’t help much.
The key is patience and experimentation.