BigGit: Git for large and tall TFS Repositories by Matt Wrock

bIGgitLogoI work in Microsoft’s Developer Division (devdiv). We are the ones that make, among other things, Visual Studio. Our source control is in a TFS Repository (we make TFS too). A very large TFS repository and likely the largest in existence. I really don’t know how big exactly. I do know that the portion I work with is about 13GB and spread over dozens of workspace mappings. Before moving to this part of Microsoft, I had been a Git and Mercurial user for the past three years and learned to love them. My move to DVCS was more than an upgrade in tooling, it was a paradigm shift that transformed the way I work. I knew I would use TFS moving to devdiv but just figured I would use a bridge like git-tfs. After moving to my new TFS home and spending some time intentionally becoming familiar with TFS, I decided to break out git-tfs and get back my branching and merging goodness. It didn’t work. Then I became sad and then desperate. As I progressed through the various Kubler-Ross stages of grieving, I never did reach the final stage of Acceptance. This is the story of my journey that led me to create BigGit.

My hope for the reader

My hope for you, dear reader, is that this will be just a story and that you will have no personal need for BigGit. Unless you are a coworker, chances are you wont. But I do know there are others out there working in unwieldy TFS repos and wanting but unable to use git. Chances are if you are dealing with a repository this large, you are likely experiencing friction points that transcend just source control tooling.  You may come home each night feeling like your nipples have been rubbing against heavy grade sandpaper all day long. Unless you are one who likes that kind of thing, and I mean no disrespect if you do, hopefully you can apply some BigGit and soothe that chafing.

If you are interested in reading more about my personal experiences and impressions as I migrated from traditional centralized version control to DVCS, please read on. If you just want to quickly learn of a way you can possibly get your large TFS repo under git, then go to and you will find lots of documentation on how it works and how to use it. I hope it helps.

My Introduction to DVCS

About three years ago, I reluctantly entered the into the world of diversified source control. I had been using SVN and was very happy. It had only been a couple years since I had moved off of Visual Source Safe (VSS) so I was easily pleased. I was finally feeling competent with the basics of branching and merging which were just bastardized and deformed children in VSS but first class concepts in SVN. When my organization adopted Mercurial in 2011, it was clear to me that it was the right decision. It was not difficult to read the tea leaves even then and see that DVCS was clearly the future.

The first month of using Mercurial was painful. I not only had to learn new commands but I had to internalize a different perspective on source control. This was not SVN. Everyone had their own complete repo and not just a “working folder” and then you had to deal with three way merges that took a bit to groc. There were times when I wondered if it was worth it and wished I could have my familiar SVN back. However I got the sense that I was working with something very powerful and if I could master it, I’d have a lot to gain.

I was not alone. In a team room with four developers, I probably ranked #2 in satisfaction. During our first month of ramp up mistakes were made. The kind of mistakes most will make and this post can not keep you from. You will try to do something fancy because you can. Something like a partial roll back or cherry picking commits and you will make some mistake that either deletes history or puts your repository into some weird and esoteric state that takes you hours to dig out of. You will want to blame the new stupid source control tool, and perhaps you will, to make yourself feel better.

Switching to the Command Line

As a source safe, SVN and early Mercurial user, I primarily used GUI tools to manage my source code. As DVCS systems go, Mercurial has a decent GUI. However, as I learned more about mercurial and delved into troubleshooting adventures, I became more uncomfortable with the way that the GUI tools hid a lot of the operations from me. When I click the “sync” button what really happens? After all, there is no Mercurial “Sync” command. I roughly knew about the concepts of pull/fetch/update but the Gui hid the details from me. This was all well and good until something goes wrong and as you delve into the problem, you find that you really do need to understand these concepts if you are going to work in this system day in day out. Perhaps the more casual user can cope with a GUI, but working in an active team of developers on various branches, I really felt a craving to understand how the system worked.

So I opened up the command line, put the HG directory in my path and never looked back. I personally believe this was not only a practice that improved my Mercurial competence, but it was the gateway to command line competence leading me to powershell and making me a better and more productive developer. And when we become better and more productive, we become happier.

When people new to DVCS would come to me with problems, they were almost always using a GUI exclusively and had bumped into something that they can possibly solve with the GUI but will not understand the problem using the artificial and dumbed down abstractions that the GUI enforces. In order to properly articulate the problem and solution, one needs to understand the commands and switches the GUI is calling. I always tell people to put down the GUI and then they look at me with crazy eyes. I love the crazy eyes. After their next couple blunders that we all make, I see that command line on their screen.

Its not so bad the command line. Quickly you find that most days you use no more than a half dozen commands anyways and you find that the GUI does not save much time and as you become more competent, you do use the GUI where it shines and stick to the command line for the rest.

DVCS Bliss: Branching and merging and time traveling

So what made me become such a firm advocate of Mercurial and eventually Git? There are so many things and I’ll quickly list some here:

  • Most operations are faster because they are carried out locally. Commits, branch creation and switching, examining diffs and history – these are all local operations that do not need to make a server call.
  • Getting a repo up and running is trivial. When I worked in the MSDN org, the msdn/technet forums, profile pages, search code, gallery platform (Visual Studio Gallery, MSDN Samples, etc) and more were using Mercurial. The “central” repository that everyone pushed to and the build system pulled from was maintained by the development team. Being one of the primary administrators, I can safely say that I spent about 5 minutes a month doing some kind of administrative task and it never went down or suffered performance issues. This is due to its simplicity. Its just a file.
  • The local repository is portable. I can move it around or delete it and there is no “server” component that cares about or tracks its location.

These are nice benefits but the biggest value to me of DVCS is the ease and speed of creating branches.This is another area where git outshines Mercurial. While branches are just as fast and easy to create on both, you can easily delete them in git while they can only be “archived” in Mercurial. Both git and Mercurial support excellent compression algorithms and merge tracking strategies. Creating branches in either is incredibly light weight from a disk space perspective. Since all branches are local, I can switch from branch to branch without an expensive server call to reassemble the branch.Using Mercurial or git, having 2 1GB branches that have only a few files of differing code will probably take up not much more than 1 GB for both branches. While TFS requires me to keep a full copy of both.

The merging algorithms of Git and Mercurial are also better than either SVN or TFS. One of the things that delighted me almost immediately when I made the move from SVN to Mercurial was that Merge conflicts dramatically reduced. I have had similar experiences with TFS and git. In systems like TFS and SVN, you might think twice about branching because of the overhead and friction they can produce, while git/mercurial makes it so easy that branching is often central to the workflow in these systems. It is very empowering to know that at any moment I can create a branch and do a bunch of experimentation. Later I can discard the branch or merge the changes back to my original branch. It is also easy to share branches. In a single command I can “push” a branch to a central repository where my fellow contributors can pull it and view my changes. Heck, you don’t even need a central repo, you can push/pull peer to peer.

Git and TFS interoperability

Clearly git has become widely popular among the greater developer community. Many developers find themselves using TFS during the day and git in their off hour work. For those who work in organizations using TFS, there are tools like git-tfs or git-tf that allow one to work in a git repository and bridge changes back and forth between git and tfs. This allows one to target a TFS server path and clone it to a git repository. They can do their day to day work in the git repo and then “checkin” to TFS periodically. These “bridging” technologies typically work great and make the transitions between git and tfs fairly transparent. I work on a couple small and isolated pockets of our TFS repository at work and I use git-tf to clone those areas of tfs to git. Once the directory is cloned, I am in a normal git repo. There are a lot of developers who work like this without a problem.

Large or dispersed repositories

There are a couple scenarios where this interoperability does not work so nicely. One is if your TFS workspace has a lot of mappings. Both git-tfs and git-tf can only clone a single TFS server folder. This is fine if all of your mappings fall under a single root folder that is of a manageable size. However, these multi mapping workspaces often have so many mappings precisely because the root is too large to map on its own. In my case, my 500GB hard drive would run out of space before I could finish syncing my root folder.

This touches on the second scenario preventing git to play nice with large repositories. Now large is a relative and subjective term. Lets qualify large as greater than 5 GB. The raw size is not the key issue but rather the number of files. Many of Git’s key operations perform a file by file scan of the repository to determine the status of each file. If you have 100s of thousands of files, this can be a real bummer. Keep in mind that git was built for the purpose of providing source control for the Linux kernel which is roughly about a couple hundred MB. This provides great performance for the vast majority of source repositories in the world today, but if you work in a repository like mine the perf can be abysmal.

Repository Organization

If you look at some large enterprise TFS or P4 repositories, it is not uncommon to find everything placed under a single repository. That works just fine for these systems and it can be easier and simpler to administer them this way. Because these are CVS (centralized versioning systems) based systems, the server acts as the single authoritative keeper of the assets. This server copy is not a file system but a database. In a large enterprise model where teams of DBAs administer the corporate databases, it makes sense to keep everything in a single data store. 500GB is not a large database and putting everything in one repo simplifies a lot of the maintenance tasks. Most CVS systems provide a protocol where the user often specifies the set of files to be acted upon. By doing this, there is no need to do a file scan because you are telling the system which files will be affected.

Of course many git operations can have individual files specified and many TFS commands can act recursively on an entire repository. However key operations like git status, commit and blame need to determine the status of each file in the repo. And on the TFS side, users become trained NOT to do a TF GET . /r on the root a 50GB folder. Do it once and trust me, you won’t want to do it again.

Having worked under both models, I do think there is a lot more to be said for keeping a repository small than just improved source control performance. It limits the intellectual territory of concern. For someone like myself with severely limited intellectual resources, this really matters. Honestly, this has a tangible impact on our ability to groc source. Here are some scenarios to consider:

  • You need to find code related to a bug. I often find myself looking for a needle in a haystack unless I am very familiar with the nature of the bug and the related source. It is just too easy to end up with a less than tidy code structure with a single tree.
  • In this “shared code” folder, what do I care about? If a single repo is maintained under a large division, there might be one or a small number of TFS folders for shared assets like runtimes, dependent libraries, and build tools. Eventually these folders can become huge and often consume the bulk of space. Eventually no single person or team can keep track of everything in it. One might map to this folder but have no idea what they actually need or don’t. Eventually in TFS, one might create “cloaks” to hide particularly large subdirectories but still have a long tail of perhaps hundreds of small unused directories that consume gigabytes of space. Who wants to create hundreds of cloaks or a hundred individual mappings for just the directories I need?

Of course one can organize a TFS repository into logical subtrees and that’s great if you can. However, it is just too easy to cross intellectual boundaries and having several repositories is a nice forcing function to keep the assets organized. Even if this means some duplication, I think that is a perfectly acceptable compromise for being able to quickly pull the code I care about. Disk space is much cheaper than developer hours spent looking for code.

Some possible ways to work with large git repos that did not work for me

So months ago when I began my investigation into how to get my huge TFS workspace under git control, I came across a lot of advice to individuals in similar circumstances. This advice most commonly was one or a combination of the following.

Garbage Collect the Repo

Think of this as a defrag tool for git. Over the course of working in a git repository, objects will become orphaned as they are dereferenced. For example, lets say you want to discard the last several commits with a git reset –hard or if you delete a branch. These objects do not disappear immediately. It may be weeks before git reclaims unreferenced objects. So it is possible that you have a repository with a bunch of stuff that will never be used again and can be discarded. If you want to invoke the git garbage collector immediately and trim away al unused objects, then call:

git gc –aggressive

This may take several minutes or even hours on a very large repository. It is possible that this can improve performance but if you find performance is poor right after a clone, it is unlikely that a GC is going to yield much of an improvement.

Sparse Checkout

This is somewhat new as of git 1.7. It allows you to clone a repository and then specify a part of the tree that you want to keep and git will eliminate the rest from the index and working tree. This may potentially solve some scenarios but it did not address my problem because I needed all 13GB of source.


Submodules are separate git repositories that you can include in your repository.The intent here is for including libraries in your project that have their own git repository. You may be working on Project Foo that uses the Bar library. Bar has its own git repository that you want to contribute to and get changes from within your Foo repository. So you clone Bar into a subdirectory of Foo. This can work fine especially for the scenario it was intended for.

I didn’t want to have to maintain each second level directory as a separate repository. This can be especially awkward when branching. Each submodule has to be branched separately. I cant just branch the top level and expect everything below to be branched. Sub modules are really intended for working with logically separate libraries. Now if my large repository was structured in such a way where I did my own work in one collection of subdirectories and another team worked exclusively in their own dedicated subdirectory, Submodules may make more sense, but that was not the case for me.

Splitting the repository

Now we come to the solution that I finally settled on. After spending a couple months trying different things with Git-tf, git-tfs and an internal Microsoft tool, experimenting with various combinations of the above and not being satisfied with anything, I pretty much gave up trying to work in git. I found I was wasting too much time “fighting” the system and needed to move on and actually get some work done.Then about 2 months ago I had a flash of insight into an idea I thought might work and it did! This solution has one major assumption that must be true in order to work: a significant chunk of your TFS repository includes content that you do not regularly or directly contribute to and does not change frequently.

Well over 80% of my repository contains external libraries, build tools and native runtime bits. This is content that rarely changes and is not source that I contribute to or need to pull from regularly. Once I have the C runtime version X, I don’t need to sync that everyday. Part of this discovery emerged as I learned more and more about the nature and content of our repository.  As a new member to the org, staring at a repo with Millions of files not to mention trying to understand a new product, I really had no idea what I needed and did not need. I was pointed to a set of workspace mappings that I was told was what I needed and worked with that. Over time, I became more familiar with various parts but large swaths were still a mystery to me. I took a weekend and with the help of some sysinternals tools like process monitor and watching that as I ran several builds, I learned what I needed to know: what bits does my build actually use and how does it use them.

Roughly this is what I did:

  • I created one TFS workspace and added all mappings from my original workspace that contained files my team does not contribute to and rarely change. I call this the Nonvolatile workspace.
  • Create another TFS workspace for the remainder of the mappings that me and the larger team actually churn on and need to be synced regularly. I call this the Volatile workspace.
  • Ideally I’d like a git repository that tracked all files in the Volatile workspace but I have two problems: 1) Git-Tf and other bridging tools do not map to the mappings in a workspace but need to map to a single server path (like $/root/branch/path) and 2) I need the contents of both workspaces in one directory tree so that they can reference each other correctly. So the following steps take care of this.
  • I created a TFS “Partial” branch in my own feature area of my Volatile mappings.This is a way to create a branch that branches just a selective number of folders from the parent folder. This is documented by my coworker Chandru Ramakrishnan and you can find more examples on creating partial branches in the BigGit source which creates these for you. This solves the first problem above and allows me to map a git repository to a single TFS folder $/root/branches/mybranch and get just my Volatile content.
  • I used Git-Tf to clone my partial branch folder to a new Git repository. However I cant build from this since it is missing all the dependencies in my Nonvolatile workspace.
  • Next for each top level directory in my Nonvolatile workspace, I created a symbolic link in my git repository and I added each of these directories to my .git\info\exclusions file. This is the equivalent of the .gitignore file but it is not publicly versioned.and kept private to my own repo. This solves the second problem by putting all my content in one tree but git does not need to see the nonvolatile content.
  • Lastly I want a git remote where I can push/pull all my git branches. I don’t feel safe keeping it all on my local machine. Git-Tf only syncs the master branch with the TFS server. So I clone my git repository to a file server //someserver/share and I make this my Origin remote.

Here is a diagram to illustrate this configuration:


My volatile content that is now tracked by git is about 1.5 GB. Yes, a big repository but still very much manageable in git. This is great! Look at meeee! I am working in Git!! But things are still a bit awkward.

  1. That was a lot of steps to set this up. What if I want to do the same on another machine or help others with the same setup. I want to be able to repeat this without errors and having to look up how I did it.
  2. Syncing changes between my Git repo and the parent TFS repository is a lot of work. I have to sync using Git-Tf with my partial branch and then do a TFS merge from my partial branch to the parent folder. If I want to get the latest changes from TFS I need to do a GET in the parent workspace, merge that to my partial branch, check the merge in and then use Git-Tf to pull my partial branch changes into git. Wow. Now I need a nap.

Automating away complexity with BigGit

BigGit is a powershell module I wrote to solve the setup and syncing of this style of repository layout. By importing this module into your powershell session you can either use several of BigGit’s functions to craft your repository or more likely what you will want to do is use the convenience Install-BigGit function to setup your repo in one step. You just need to know which mappings to direct to your volatile and nonvolatile workspaces and then BigGit will create everything for you. Even if your parent workspace is relatively small but requires a bunch of mappings, you can omit the nonvolatile mappings and BigGit will just setup all mappings as volatile and solve the multi mapping git mapping problem. Here is an example of using BigGit to setup your repository:

$myWorkspace = "MyWorkspace;Matt Wrock"$serverRoot="/root/branch"
$tfsUrl = "http://tfsServer:8080/tfs"

$NonVolatileMappings = Invoke-TF workfold /collection:$tfsUrl /workspace:$myWorkspace | ?{ 
    $_.ToLower() -match "(

$VolatileMappings = Invoke-TF workfold /collection:$tfsUrl /workspace:$myWorkspace | ?{ 
    $_.ToLower() -notmatch "(

Install-BigGit -tfsUrl $tfsUrl `
                     -serverPathToBranch "`$$serverRoot" `
                     -partialBranchPath $/root/mypartialbranchlocation `
                     -localPath  c:\dev\BigGit `
                     -remoteOrigin \\myserver\gitshare `
                     -NonVolatileMappings $NonVolatileMappings 
                     -VolatileMappings $VolatileMappings

Here we compose our mappings into separate powershell variables and then use the Install-BigGit function to set everything up. Depending on the size of your original workspace, this could take hours to run but it should be a one time cost. Once this is done, You can sync your changes to TFS using one command that automates all the git-tf syncing and TFS merging discussed above:

Invoke-ReverseIntegration "some great changes here"

This DOES NOT do the final checkin to TFS. It stops at merging the changes into your parent TFS workspace. I thought it would be more desirable to let the user do the final TFS checkin and sanity check the files being checked in.

To pull in the latest changes from TFS use:


It is easy to install BigGit. While you are welcome to download or clone the source from codeplex (and contribute too!!), you may find it easier to download via Chocolatey:

iex ((new-object net.webclient).DownloadString(''))
."$env:systemdrive\Chocolatey\chocolateyinstall\chocolatey.ps1" install BigGit

This will download just the module and modify your profile so that BigGit functions are always accessible.

If you want to find out more about BigGit, view the source code or discover the functions it exposes and read their command line documentation, you can find all of that at the codeplex project site. I hope this might help others that find themselves in the same predicament I was in. If you think that BigGit could stand some enhancements or that any analysis in this post is incorrect or lacking, I welcome both comments and pull requests.

Related reading

Why Perforce is more scalable than Git by Steve Hanov

Facebook’s benchmark tests using Git

Is Git recommended for large (>250GB) content repositories on StackOverflow

What to do when you cant access Sql Server as Admin by Matt Wrock

This happens to me on a particular VM setup framework we use at work about every couple months and I always have to spend several minutes looking it up. Well no longer I say. I shall henceforth document these steps so I will never have to wander the internets again for this answer.

Stop-Service mssqlserver -Force& 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn\sqlservr.exe' -m

sqlcmd -S "(local)"
sp_addsrvrolemember 'REDMOND\mwrock', 'sysadmin'

Start-Service mssqlserve

This Stops sql server, then starts it in single user mode. You will then need to open a second shell to run the login creation script. When the user is created in the correct role, go back to the shell running the sql instance and exit via ctrl-C. Finally start sql server normally and you are good to go.

Unit Testing Powershell and Hello Pester by Matt Wrock

It has been a long time since I have blogged. Too long. I have changed roles at Microsoft moving to Cloud Developer Services and have taken on two new Open Source projects: Chocolatey and Pester, joining as a project committer to both. RequestReduce is still alive but has gone far under nourished. Fortunately it is pretty stable except for a few edge cases and I hope to get back to enhancements and fixes soon. All these commitments in addition to my Boxstarter project keep me pretty much working nearly 24/7. Its been a lot of fun but its not a sustainable life style I can recommend.

A year of Powershell

This has been my year of Powershell. I was introduced about three years ago and was immediately impressed with its power and also a bit daunted by the learning curve. Unless you work with it often, some things are hard to get used to and not obvious to the curmudgeon C# developer like myself. About a year ago, I decided that I was tired of complaining about my organization’s deployment practices and began to revamp the process from a 50 page document of manual steps to a few thousand lines of powershell. I learned a lot.

You can pretty much do anything in Powershell. There really is no excuse for manual deployments. If you must have manual steps, then there is something wrong with your process or tool set.

The Pain of a large Powershell Code Base

One interesting thing about powershell is that it lends itself extremely well to small but powerful functions and modules. Once you reach a certain level of competence, its almost addictive to automate everything and add lots of scripts to your profile. Over time, these small scripts can grow and I have now worked on about a half dozen powershell projects that have become full blown applications.

As a Powershell application becomes larger and more complex, the pain of no compiler (one might call this the first unit test) and lacking unit testing ecosystem begins to rear its ugly head. As a C# developer practicing TDD in my day job, I have grown to appreciate TDD (test driven development where you write tests before implementation code) as a design tool and as a productivity tool. Without it, it is easy for a growing code base to get sloppy and to get mired down in regression bugs as you add features. I like to think that V1 (or at least the first prototype) always ships faster without tests but with each release, the lack of strong Unit Test coverage slows feature work exponentially. After several years, where Unit tests are done later if at all, its easy to end up with a maintenance nightmare where teams are simply chasing bugs.

Powershell is not immune and needs and deserves a strong Unit Testing toolset.

Unit testing Powershell? Seems a bit overkill doesn’t it?

Maybe. I do think there is a place for the ability to rattle off ad hoc scripts without the compelling need for unit testing. What also makes Powershell unique is that it tends to work in the domain of “external dependencies” that are typically mocked out in traditional unit testing. Look at most powershell scripts and you will find that databases, file systems, the registry, large enterprise systems are central areas of concern. How do you unit test something that simply acts to glue these things together?

So maybe you don’t. I’m not going to advise that every line of powershell needs to be Unit Tested. However, once you start writing scripts with lots of conditionals and alternate paths and flows, Unit Testing is a must I think and its all too often that more time goes by without tests to make catching up painful. So if you think your powershell app is going to hold some weight, its best to start  unit testing right away. Of course in languages like C#, Java, and Ruby this is a no brainer. There is a rich set of tools and vibrant communities committed to polishing unit testing patterns. There are a countless number of unit testing frameworks, mocking frameworks, test runners and now “continuous testing” tools to choose from. Powershell?…Not so much. The tool is new, the domain is often difficult to test and a large part of the community lack a unit testing background.

How do you do it?

Its at this point where we should all bow our heads and offer up a minute of silence as we submit warm rays of gratitude toward Scott Muc (@scottmuc). Scott created a great tool called Pester that is a BDD style Unit Testing framework for powershell. I do know a bit about BDD (behavior driven development) but not enough to pontificate about it on the internets. I think that’s ok. Regardless, I’m going to demonstrate how you use Pester to test powershell scripts.

Chocolatey: A Pester Case Study

For the past few months, I have been an active contributor to Chocolatey – an awesome Windows application package management command line app originated by Rob Reynolds (@ferventcoder) who is one of the very early team members of what we now call Nuget back when it was called Nu and on Ruby Gems. In fact, chocolatey is largely developed on top of Nuget.

Chocolatey is written in 100% powershell. As many applications go,it is a relatively small app, but it is a fairly large Powershell application. Chocolatey deals a lot with communicating with Nuget.exe and also managing downloads from various package and download feeds. So there is a lot of “heavy machinery glue” tying together black boxes but there is also a lot of raw logic involved in deciding which black box to attach to what and which entry points to use to enter the black boxes. Do we call Nuget or WebPI or Ruby Gems? How do we deal with the package versioning APIs, are we raising appropriate exceptions when things go wrong? This is where unit testing helps us and where Pester helps us with unit testing. We only have about 125 tests right now. We got started a little late but we are catching up.

Lets Write a Test!

Describe "When calling Get-PackageFoldersForPackage against multiple versions and other packages" {  $packageName = 'sake'
  $packageVersion1 = '0.1.3'
  $packageVersion2 = ''
  $packageVersion3 = '0.1.4'
  Setup -File "chocolatey\lib\$packageName.$packageVersion1\sake.nuspec" ''
  Setup -File "chocolatey\lib\$packageName.$packageVersion2\sake.nuspec" ''
  Setup -File "chocolatey\lib\sumo.$packageVersion3\sake.nuspec" ''
  $returnValue = Get-PackageFoldersForPackage $packageName
  $expectedValue = "$packageName.$packageVersion1 $packageName.$packageVersion2"

  It "should return multiple package folders back" {

  It "should not return the package that is not the same name" {
    foreach ($item in $returnValue) {

Here we are asking Chocolatey to tell us what versions it has for a given package. If Chocolatey has three packages, two being separate versions of the one we are asking about and the third for an entirely different package, we would expect to get back the two version packages for the package we are querying. There are several more tests around this feature alone but we will focus on this one.

The first thing to call out is the Describe block. This often contains the “Given” and the “When” clauses of BDD’s Given-When-Then idiom. The Chocolatey team are not BDD purists.Otherwise the test might read:

Given a caller for package folders

When there are multiple versions of that package

Then it should return multiple packages

This might also be considered the “Arrange-Act” of the Arrange-Act-Assert” pattern. Here we are setting up the conditions of our test to match the scenario we want to replicate. Finally the “It” block performs our validation. It’s the “Then” of Given-When-Then or the “Assert” of Arrange-Act-Assert. If the validation does not resolve, then our test should fail.

Make it simple, use lots of functions

As I mentioned above, I see Unit testing first and foremost as a design tool. This is largely because it is down right painful to test code that follows poor design principles. One way to make your powershell scripts a bear to test is to have long running procedural scripts. As it so happens, it also makes the code a bear to maintain and add features to. So just like in other languages, use the power of encapsulation and the Single Responsibility Principle. The pattern that the Chocolatey code follows is to break the code up into a file per function and another file for all the tests of that function. The actual functions are dot-sourced into the test file.

At the top of most test functions, you will find:

$here = Split-Path -Parent $MyInvocation.MyCommand.Definition
$common = Join-Path (Split-Path -Parent $here)  '_Common.ps1'
. $common

This dot sources _common.ps1 which is responsible for dot sourcing all of the chocolatey functions and includes some other test setup logic.


I personally got involved with the Pester project when I wanted a true Mocking framework. We were stubbing out all of the Chocolatey functions and used flags to signal whether the real function or a stub should be invoked. This got very tedious to setup and maintain. The Chocolatey code base had well over a 1000 lines of this stubbing code. Given the dynamic nature of Powershell, it seemed to me that it would be straight forward to create a true Mocking framework. It also looked like a lot of fun (the main reason why I do anything) and a way to fill in some knowledge gaps I had with powershell. This all proved to be true (this does not often happen for me).

Essentially you can tell the mocking functions to endow any Powershell command (or custom function) with new behavior. You can also have it track or record what is being called and with what parameters. Here is a test that uses mocking:

Describe "Install-ChocolateyVsixPackage" {
  Context "When not Specifying a version and version 10 and 11 is installed" {
    Mock Get-ChildItem {@(@{Name="path\10.0";Property=@("InstallDir");PSPath="10"},@{Name="path\11.0";Property=@("InstallDir");PSPath="11"})}
    Mock get-itemproperty {@{InstallDir=$Path}}
    Mock Get-ChocolateyWebFile
    Mock Write-Debug
    Mock Write-ChocolateySuccess
    Mock Install-Vsix

    Install-ChocolateyVsixPackage "package" "url"
    It "should install for version 11" {
        Assert-MockCalled Write-Debug -ParameterFilter {$message -like "*11\VsixInstaller.exe" }

  Context "When not Specifying a version and only 10 is installed" {
    Mock Get-ChildItem {@{Name="path\10.0";Property=@("InstallDir");PSPath="10";Length=$false}}
    Mock get-itemproperty {@{InstallDir=$Path}}
    Mock Get-ChocolateyWebFile
    Mock Write-Debug
    Mock Write-ChocolateySuccess
    Mock Write-ChocolateyFailure
    Mock Install-Vsix

    Install-ChocolateyVsixPackage "package" "url"
    It "should install for version 10" {
        Assert-MockCalled Write-Debug -ParameterFilter {$message -like "*10\VsixInstaller.exe" }

These are tests of the Chocolatey VSIX installer. A VSIX is a Visual Studio extension. Before focusing on the mocking, lets look at a new form of syntactic sugar here to call out the “When” as a first class expression using the Context block. Its essentially a nested Describe but expresses the Given-When-Then pattern more naturally.

A user can specify a version of visual studio to have the extension installed in if that user has more than one version installed. If they do not specify a version, we want to install it in the most recent version. That is what the first context is testing. The way that one checks is to inspect the registry for the versions installed and their locations. That’s where mocking comes in. We do not want to have to install and uninstall multiple versions of Visual Studio on our dev boxes if we do not have them. We certainly do not want to do this on a CI server. We also do not want to be futzing with the Windows Registry and try to feed it fake data.

Instead we can Mock the Get-ChildItem Cmdlet that we use to query the registry. Using the Mock function, we state that when Get-ChildItem is called, it should return a Hashtable that “looks” just like the Registry Key collection that a real Get-ChildItem would give us when querying VS versions on a machine with multiple versions. Then you see several functions mocked but no alternate behavior is specified. Specifying no behavior is the same as specifying simply {} or do nothing. We do this because none of those functions are relevant to this test and we don’t want the real code to execute. For instance, we do not want Get-ChocolateyWebFile to make a network call for a file.

In our It block, you can see how we use the Assert-MockCalled to verify that our code called into a mocked function as expected. Here if things go as planned, a debug message will record the path of the VsixInstaller used. Since we previously declared Write-Debug as a mocked function, we allow the mocking framework to monitor its calls. Using Assert-MockCalled we state that Write-Debug should be called at least once. There is further functionality that allows us to specify an exact number of calls or 0. Using the ParameterFilter, we state that only calls where the message argument contains the VS 11 path should be counted.

If you are curious about the implementation details of the mocking in Pester, the code is only a couple hundred lines.

And now a few words on state vs. behavior based testing

A common cause for debate in testing discussions is the use of state based vs. behavior based testing. It is generally agreed that it is best to test the state of a function and its collaborators once the function has completed. Too often we instead test the behavior of a function. We are reaching deeper into the function and assuming we know more that we should about what the function is doing. We are testing the implementation details of the function rather that the outcome of its state. The behavior based tests open the testing framework up to added fragility since changes in implementation often requires changing the tests. Mocking can lead developers to overusing behavior based tests since it makes it so easy to test the functions interactions at various points of the function’s logic.

I very much agree with the need to test state. This can be more difficult I think given the nature and eco system of powershell. Powershell is largely built for connecting and controlling the interactions of “black boxes.” Often the behavior being tested is blurred with state. I think we can get better at this and I admit to being new to this kind of unit testing. I hope to evolve patterns more indicative of state based tests as I learn in this space.

Tying Powershell Test Results into your CI Builds

One great feature of Pester is its ability to output test results according to the NUnit XML schema. While I personally have difficulty using the words “great” and “XML schema” in the same sentence, here it is sincere. This is particularly cool because most continuous integration servers understand this schema and support tooling that surfaces these results for easy visibility. Both Pester and Chocolatey use the CodeBetter TeamCity server to run builds after commits to the master repo and leverage this feature. There is a wiki on the Pester wiki page explaining how to set this up.

Here is a shot of how this looks:


I can see what tests failed and how long the tests took.

Learning more

We just scratched the surface of pester and in no way toured all of its great features. The Pester github wiki has lots of detailed information on its API and usage. When you install the Pester Powershell module, you also have access to the command line help which is pretty thorough. I’d also encourage you to look at both Pester and Chocolatey for a suite of example tests that demonstrate how to use all of the functionality in Pester. Also both Pester and Chocolatey have a google groups discussion page for raising issues and questions.

I hope you have found this post informative and please do let us know what you think of Pester and how you would like to see it improved.

Also If you are attending the Powershell Summit this Spring, please catch my talk on this exact subject.

Released AutoWrockTestable: Making test class composition easier by Matt Wrock

birdLate last year I blogged about a unit testing pattern I have been using for the past couple years. It’s a pattern that I initially learned from Matt Manela (@mmanela). I adapted the pattern to use Automocking with Richard Cirerol’s wrapper. Over the last week I have been working to plug this in to Visual Studio as a template that can easily add test classes and make one’s unit testing work flow more efficient.

I could, should, maybe will, but probably won’t write a separate post dedicated to making the Visual Studio extension. Ironically, while I am a member of the Visual Studio Gallery team, this is my first public extension I have written. While it is a trivial extension as extensions go, there were some interesting learnings that made what I thought would be a couple night’s worth of work into a week of my spare time. Despite some frustrating incidents, it was a lot of fun.

Now lets dive into AutoWrockTestable!

Whats better than AutoMocking? Why of course, AutoWrocking!

Visual Studio integration makes composing tests a snap!

Here is how to effortlessly add test classes to your solution with all mockable dependencies mocked:

1. You can download The Visual Studio Extension from Codeplex or the Visual Studio Gallery.
The extension will also install Nuget if you do not already have it and will add Structuremap, Structuremap.Automocking and Moq to your Nuget repository.

2. Create a skeleton of your implementation class.

public class OAuthTokenService{    private readonly IWebClientWrapper webClientWrapper;    private readonly IRegistryWrapper registry;

    public OAuthTokenService(IWebClientWrapper webClientWrapper,        IRegistryWrapper registry)    {        this.webClientWrapper = webClientWrapper;        this.registry = registry;    }    public string GetAccessToken(string clientId, IOauthUrls oauthUrls)    {        return null;    }}

3. Click on the "Add Testable..." menu item in Solution Explorer's "Add" context menu.


4. Enter the name of the class you want to test. You can enter any name but the text box will auto complete using all class files open in the editor. The first class in the active file is initially selected.


5. AutoWrockTestable creates a new class file with the same name as your implementation class appending "Tests" to the name and containing this code:

using AutoWrockTestable;

namespace SkyCli.Facts{    class OAuthTokenServiceTests    {        class TestableOAuthTokenService : Testable<SkyCli.OAuth.OAuthTokenService>        {            public TestableOAuthTokenService()            {

            }        }    }}

Writing tests using Testable<ClassToTest>

The Testable class has its dependencies automatically mocked. Now you can start to write test methods using your Testable. I like to use nested classes (a class for every method I want to test) to organize my tests. Here is how a test might look:

class OAuthTokenServiceTests{    class TestableOAuthTokenService : Testable<SkyCli.OAuth.OAuthTokenService>    {        public TestableOAuthTokenService()        {

        }    }

    public class GetAccessToken    {        [Fact]        public void WillReturnTokenFromRegistryIfAFreshOneIsFoundThere()        {            var testable = new TestableOAuthTokenService();            var registryValues = new Dictionary<string, string>();            registryValues.Add("access_token", "token");            registryValues.Add("expires_in", "3600");            registryValues.Add("grant_time", DateTime.Now.Ticks.ToString());            testable.Mock<IRegistryWrapper>().Setup(x => x.GetValues("path"))                .Returns(registryValues);

            var result = testable.ClassUnderTest.GetAccessToken("clientId", null);

            Assert.Equal("token", result);        }    }}

See Using the Testable<T> Class for a complete explanation of the Testable<T> API.

For more information on the Testable pattern and Auto Mocking in general see The Testable Pattern and Auto Mocking Explained or see my previous blog post on the subject.

Turn off Internet Explorer Enhanced Security by Matt Wrock

If you enjoy lots of dialog boxes that require you to take action before you can review any unique URL, then you will not want to use this:

$AdminKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\`    {A509B1A7-37EF-4b3f-8CFC-4F3A74704073}"$UserKey = "HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\`    {A509B1A8-37EF-4b3f-8CFC-4F3A74704073}"Set-ItemProperty -Path $AdminKey -Name "IsInstalled" -Value 0Set-ItemProperty -Path $UserKey -Name "IsInstalled" -Value 0Stop-Process -Name Explorer -ForceWrite-Host "IE Enhanced Security Configuration (ESC) has been disabled." -ForegroundColor Green

If on the other hand, IE enhanced security makes you want to stick your fist in an activated blender, do your self a favor and step away from the blender and then invoke the above script.

If you walk the fence between these two, invoke it anyways and you’ll be glad you did.

You are welcome.