The SpecFlow Cook Book


Use Regular Expressions

You will never get the most out of SpecFlow without them

The Problem

REGEX: the Voldemort lurking in the IT cupboard. People often think regular expressions are shrouded in mystery and witchcraft, and as unpalatable as it might be for most, the honest and inescapable truth is that you will never get the most out of SpecFlow without them. To some that will seem like bad news, but the good news is that you can achieve 95% of what you will ever need with only about 10% of what they can do.

The Ingredients

  • Capture Groups and Non-Capturing Groups
  • Greedy vs Non-Greedy
  • Lookahead and Lookbehind
  • Character Classes

The Solution(s)

In this chapter we will be dealing with the following solutions:

  • How to extract words and values from your steps and pass those values to your step methods
  • How to allow for language variations in your steps (some people will say “when I get”, others will say “when I download”)
  • The dreaded “Ambiguous Step Bindings” that your regular expressions may introduce

Extract words and values from text in your steps

Arguably the biggest thing REGEX brings to our SpecFlow party is the ability to extract words and values from our scenario steps and pass them to our step methods as parameters. This is achieved with capture groups, which appear within your expressions as a series of rules enclosed in brackets.

If you have multiple values that you want to pass to your step method, the method must declare its parameters in the same order as the capture groups appear in your expression.

For example:

The expression “A (bird) in the (hand) is worth (one|two|three) in the (bush)” contains four capture groups, so from any matching step it will capture the word “bird”, followed by “hand”, followed by one of “one”, “two” or “three”, followed by “bush”.
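The capture order can be checked outside SpecFlow with nothing more than .NET's System.Text.RegularExpressions. This is a minimal sketch: the class and method names are hypothetical, and the ^ and $ anchors reflect my assumption that SpecFlow expects the expression to match the whole step text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // The expression from the text, anchored with ^ and $ so the whole step must match (assumption)
    public const string Pattern = @"^A (bird) in the (hand) is worth (one|two|three) in the (bush)$";

    // Returns the captured values in capture-group order, or an empty array when there is no match.
    // Group 0 is always the whole match, so the loop starts at group 1.
    public static string[] Capture(string stepText)
    {
        Match match = Regex.Match(stepText, Pattern);
        if (!match.Success)
            return new string[0];

        string[] values = new string[match.Groups.Count - 1];
        for (int i = 1; i < match.Groups.Count; i++)
            values[i - 1] = match.Groups[i].Value;
        return values;
    }

    public static void Main()
    {
        // Prints "bird, hand, two, bush": the order a step method's parameters must follow
        Console.WriteLine(string.Join(", ", Capture("A bird in the hand is worth two in the bush")));
    }
}
```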

Scenario: A well known phrase
Given A bird in the hand is worth two in the bush
Then .........

Scenario: A not so well known phrase
Given A frog in the pond is worth three in the tree
Then .........

Scenario: Now this is just silly
Given A toad in the hole is worth four in the oven
Then .........

The following single method would match the Given steps in the first two scenarios above, but not the third:

[Given(@"A (bird|frog) in the (hand|pond) is worth (one|two|three) in the (bush|tree)")]
public void GivenAPhrase(string animal, string location, string qty, string whereCouldItBe)
{   // e.g. "bird", "hand", "two", "bush" for the first scenario, in capture-group order
}
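The sketch below, again using only System.Text.RegularExpressions and hypothetical names, imitates what the binding does with the three scenario steps: each match's groups are handed to a stand-in GivenAPhrase in capture-group order, and the third step simply fails to match. It assumes SpecFlow matches the whole step text (hence ^ and $) and passes only the text after the Given keyword.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhraseBindingDemo
{
    // Same expression as the [Given] attribute, anchored to the whole step text (assumption)
    public const string Pattern = @"^A (bird|frog) in the (hand|pond) is worth (one|two|three) in the (bush|tree)$";

    // Stand-in for the step method: its parameters arrive in capture-group order
    public static string GivenAPhrase(string animal, string location, string qty, string whereCouldItBe)
        => animal + "/" + location + "/" + qty + "/" + whereCouldItBe;

    // Returns the stand-in method's result, or null when the step does not match the binding
    public static string TryBind(string stepText)
    {
        Match m = Regex.Match(stepText, Pattern);
        if (!m.Success)
            return null;
        return GivenAPhrase(m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value);
    }

    public static void Main()
    {
        string[] steps =
        {
            "A bird in the hand is worth two in the bush",
            "A frog in the pond is worth three in the tree",
            "A toad in the hole is worth four in the oven",
        };
        foreach (string step in steps)
        {
            string result = TryBind(step) ?? "no match";
            Console.WriteLine(step + " -> " + result);
        }
    }
}
```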

Supporting variations in sentence construction

REST, API, Response, HTTP. These are all words unlikely to make it into the written vocabulary of your average business analyst or product owner. As “techies” we dream in code; we like structure, conventions and facial hair. Given the task of writing a sentence that describes “retrieving an XML fragment from a RESTful pricing service”, it's quite unlikely that we would produce something like “Get the price list from the web site”.

This is obviously an extreme example, but which is correct? Well, the obvious answer is both. When sentences that mean the same thing are constructed this differently, arguably the best way to deal with them is simply to put multiple step bindings on the same method. When the variations are subtle, however, we can usually cope with them using a single, well-thought-out regular expression.

Consider the following:

  • Given I want to download the data from the vehicle lease web site
  • Given I intend to download the information from the leasing site
  • Given that I retrieve data using the download link on the car leasing screen
  • Given I have been told to download data using the url to the leasing page
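As one possible single expression for the four steps above (a hypothetical pattern to tune against your own feature files, not the only answer), non-capturing groups and lazy “.*?” fillers absorb the differences in wording. It assumes, as elsewhere, that the Given keyword is not part of the text being matched; the fifth step is a deliberate near miss.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VariationDemo
{
    // Hypothetical expression covering all four wordings:
    //   (?:that )?                      optional leading "that"
    //   .*?                             filler such as "want to" or "have been told to"
    //   (?:download|retrieve)           either verb
    //   (?:the )?(?:data|information)   either noun, with or without "the"
    //   leas(?:e|ing)                   "lease" or "leasing"
    //   (?:web )?(?:site|screen|page)   where the data lives
    public const string Pattern =
        @"(?i)^(?:that )?I .*?(?:download|retrieve) (?:the )?(?:data|information) .*?leas(?:e|ing) (?:web )?(?:site|screen|page)$";

    public static bool Matches(string stepText) => Regex.IsMatch(stepText, Pattern);

    public static void Main()
    {
        string[] steps =
        {
            "I want to download the data from the vehicle lease web site",
            "I intend to download the information from the leasing site",
            "that I retrieve data using the download link on the car leasing screen",
            "I have been told to download data using the url to the leasing page",
            "I want to upload the data to the leasing site", // near miss: no download/retrieve
        };
        foreach (string step in steps)
            Console.WriteLine(Matches(step) + ": " + step);
    }
}
```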

Developers like structure and convention, so we would probably phrase all four of these steps in the same overtly technical and logical way, using the same words each time.

In the bird-and-bush example you can clearly see that we define a list of expected words within the capture groups; for instance, “bush” or “tree” is expressed as “(bush|tree)” within the expression. The brackets that define the capture group also (conveniently) group the alternatives together in a way that we are all used to. But what if you want to group some words without capturing them and passing them to your step method?

For instance, suppose we want to support a slightly more flexible version of the bird sentences by allowing the step to say “in a tree” as well as “in the tree”. The answer is a “non-capturing group”, and the complete expression in this case would look something like this:

A (bird|frog) in the (hand|pond) is worth (one|two|three) in (?:the|a) (bush|tree)

This expression still captures the same four values as the one in the first example, but it will also match the step text “A frog in the pond is worth one in a tree”, not just “A frog in the pond is worth one in the tree”.
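A quick check of the non-capturing version, under the same assumptions as before (plain System.Text.RegularExpressions, hypothetical names, ^ and $ added): both wordings produce identical captures, and (?:the|a) is not counted as a group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // (?:the|a) groups the alternatives but is not numbered, so there are still 4 capture groups
    public const string Pattern =
        @"^A (bird|frog) in the (hand|pond) is worth (one|two|three) in (?:the|a) (bush|tree)$";

    // Returns the captured values joined with '/', or null if the step does not match
    public static string Describe(string stepText)
    {
        Match m = Regex.Match(stepText, Pattern);
        if (!m.Success)
            return null;

        // Groups.Count includes group 0 (the whole match), so 4 captures give a count of 5
        string[] values = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
            values[i - 1] = m.Groups[i].Value;
        return string.Join("/", values);
    }

    public static void Main()
    {
        Console.WriteLine(Describe("A frog in the pond is worth one in a tree"));   // frog/pond/one/tree
        Console.WriteLine(Describe("A frog in the pond is worth one in the tree")); // frog/pond/one/tree
    }
}
```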

Some basic options to turn on

You will often see things like “(?i)” or “(?ismx)” at the beginning of regular expressions (at least in C#). If you are more accustomed to JavaScript or Perl, you will usually see these options written after the closing “/” of an expression instead, e.g. /here is my expression/ix in Perl. But what does it all mean?

These single characters relate to options that you can toggle on or off to control how your expression is applied. You will generally see only four of them, and only two are particularly relevant to SpecFlow. These options are:

  • i – This turns case insensitivity on, so “a bird” will also match “A Bird”
  • x – This controls whether whitespace in your expression is ignored. Without this option the expression “A B” means “find an A, followed by a space, followed by a B”; with the option turned on the space is ignored and the expression means “AB”. Some people prefer to always turn this option on and use “\s” wherever they expect a space, so in this situation we might write “A\sB”.
  • m – Multiline mode. This changes “^” and “$” so that they match at the start and end of every line rather than only at the start and end of the whole text. It is not relevant to SpecFlow because a step only ever contains a single line of text.
  • s – Single-line mode. This makes “.” match every character, including the newline character that it otherwise skips. This is also not relevant to SpecFlow because, again, we only ever have a single line to deal with.

In just about every expression you write within SpecFlow you are likely to want to turn case insensitivity on by including “(?i)” at the beginning of your expression. It's then up to you whether you want to ignore spaces in your expressions (although I could never really see the point); if this is the option for you, simply include the letter “x” alongside the “i”: “(?ix)”.
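The effect of the inline options can be seen with a handful of Regex.IsMatch calls. The class name is hypothetical; RegexOptions.IgnoreCase is the standard .NET flag that does the same job as (?i) when you call Regex yourself, although the inline form is the one that fits inside a step attribute's string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineOptionsDemo
{
    // Each entry is the result of one IsMatch call; the comment gives the expected value and why
    public static bool[] Run() => new[]
    {
        Regex.IsMatch("Given A Bird", "a bird"),                  // False: case sensitive by default
        Regex.IsMatch("Given A Bird", "(?i)a bird"),              // True: (?i) turns case insensitivity on
        Regex.IsMatch("A B", "(?x)^A B$"),                        // False: (?x) ignores the space, so the pattern means "AB"
        Regex.IsMatch("AB", "(?x)^A B$"),                         // True
        Regex.IsMatch("A B", @"(?x)^A\sB$"),                      // True: \s spells out the expected space
        Regex.IsMatch("a b", @"(?ix)^A \s B$"),                   // True: both options at once
        Regex.IsMatch("a b", @"^A\sB$", RegexOptions.IgnoreCase), // True: same as (?i), passed as an argument
    };

    public static void Main()
    {
        foreach (bool result in Run())
            Console.WriteLine(result);
    }
}
```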

It is very unlikely that you will ever need to, but on the odd occasion where case *does* matter you can turn case insensitivity off and back on at any point in your expression. For instance, suppose it is important that the reference number in the following example is expressed using upper case letters:

“Your order reference is ABC123 and will be dispatched tomorrow”

You might write an expression that starts with case insensitivity turned on, but which enforces case sensitivity for the reference number like this:

“(?i)your order reference is (?-i)([A-Z0-9]+)(?i) and will be dispatched tomorrow”

“(?i)” – Make my expression case insensitive
“your order reference is” – look for the words “your order reference is”
“(?-i)” – Now make my expression case sensitive
“([A-Z0-9]+)” – Now capture one or more characters, each of which is an upper case letter A-Z or a digit 0-9
“(?i)” – Now restore case insensitivity
“and will be dispatched tomorrow” – and finish by looking for the words “and will be dispatched tomorrow”
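The order reference expression can be tried the same way. This sketch adds ^ and $ anchors (an assumption about whole-text matching) and a hypothetical class name; it shows that only the captured reference is case sensitive.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrderReferenceDemo
{
    // The expression from the text, anchored (assumption). Only the capture group is case sensitive.
    public const string Pattern =
        @"(?i)^your order reference is (?-i)([A-Z0-9]+)(?i) and will be dispatched tomorrow$";

    // Returns the captured reference, or null when the step does not match
    public static string Reference(string stepText)
    {
        Match m = Regex.Match(stepText, Pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // ABC123: the surrounding words are matched case-insensitively
        Console.WriteLine(Reference("Your order reference is ABC123 and will be dispatched tomorrow") ?? "no match");
        // ABC123: "DISPATCHED" is fine because (?i) was restored after the group
        Console.WriteLine(Reference("your order reference is ABC123 and will be DISPATCHED tomorrow") ?? "no match");
        // no match: the reference itself must be upper case
        Console.WriteLine(Reference("Your order reference is abc123 and will be dispatched tomorrow") ?? "no match");
    }
}
```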

As the breakdown above shows, each inline option stays in effect from the point where it appears until another option changes it.

If, however, all you want to do is group alternative words, then you need a non-capturing group. For instance:

The expression “A (bird) in the (?:hand|bush) is worth (one|two|three) in the (bush)” contains only three capture groups, which return the word “bird”, then “one”, “two” or “three”, and finally “bush”. Unlike the first example, it matches either “hand” or “bush” near the beginning of the sentence but does not pass that word to your step method, because the “?:” inside the brackets makes the group non-capturing.
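A sketch of the three-group expression, written with “is worth” as in the earlier bird examples and under the same assumptions (plain System.Text.RegularExpressions, hypothetical names, ^ and $ added), confirms that only three values would reach the step method.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThreeGroupDemo
{
    // (?:hand|bush) must match but is not captured, leaving three capture groups
    public static readonly Regex Step =
        new Regex(@"^A (bird) in the (?:hand|bush) is worth (one|two|three) in the (bush)$");

    // Returns the values a three-parameter step method would receive, or null on no match
    public static string[] Arguments(string stepText)
    {
        Match m = Step.Match(stepText);
        if (!m.Success)
            return null;
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    public static void Main()
    {
        // GetGroupNumbers() includes group 0 (the whole match), so this prints 3
        Console.WriteLine(Step.GetGroupNumbers().Length - 1);
        // Prints "bird, one, bush"
        Console.WriteLine(string.Join(", ", Arguments("A bird in the bush is worth one in the bush")));
    }
}
```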

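Stitching expressions together

Complex expressions are easiest to build a piece at a time: describe what you want in plain English, then translate each part. For example, to match any step that mentions an email: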

  • “Starting at the beginning of the string” – enter “^”
  • “Accept any character, but as few as possible” – enter “.*?”
  • “Look for the word email, emails, e-mail or e-mails” – enter “e-?mails?”
  • “Then accept any character, but as few as possible” – enter “.*?”
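Put together, the four pieces give “^.*?e-?mails?.*?”. The sketch below prefixes (?i), as recommended earlier, and uses a hypothetical class and helper name. On its own the final “.*?” matches zero characters, because nothing follows it; it only earns its place when the expression must cover the whole step text, which I understand SpecFlow requires.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StitchedDemo
{
    // Built from the four pieces in the list, with (?i) added for case insensitivity
    public const string Pattern = "(?i)" + "^" + ".*?" + "e-?mails?" + ".*?";

    public static bool MentionsEmail(string stepText) => Regex.IsMatch(stepText, Pattern);

    public static void Main()
    {
        Console.WriteLine(MentionsEmail("I send an e-mail to the customer")); // True
        Console.WriteLine(MentionsEmail("the customer has two Emails"));      // True
        Console.WriteLine(MentionsEmail("I check the post"));                 // False
    }
}
```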

Simon Parsons

SIPSoftware was established in 2005 and provides IT services to several major blue-chip companies. We specialize in Agile software development and help our customers achieve excellence through the use of Scrum, continuous deployment and test automation. I specialize in software development and test automation using .NET and have a particular interest in BDD using SpecFlow and Selenium WebDriver.

Use Regular Expressions December 16, 2013