Minimal Matching in Vim

Posted by: Brad Waite

One of the first things anyone who tackles regular expressions (regex) learns is the ".*" construct, which matches any character any number of times.  What is sometimes overlooked is that the * is greedy: it grabs as many characters as it can.  That's frequently useful, but there are times when you want it to grab as few as it can.  This is called "minimal matching".  For those familiar with Perl's minimal matching modifier, '?', it turns out Vim has an equivalent, too!  Read on to find out how to use it.

Say you're trying to strip HTML tags from a document in Vim.  You might have multiple HTML tags on a single line:

Here's the Foo!  It's the best!

If you used a replace regex like this:

:%s/<.*>//g

it would produce:

Here's !  It's the best!

That's obviously not what you want.  What we really want is for the regex engine to grab only as much as it needs to reach the first '>'.

Vim's minimal matching operator is '\{-}'.  If it looks odd, consider that it's actually a form of the '\{n,m}' count operator, which matches the previous item n to m times, as many as possible: 'ab\{1,3}' would match "ab", "abb" and "abbb".  When a '-' comes immediately after the '\{', Vim matches as few as it can instead, so 'ab\{-1,3}' will match "ab" in "abbb".  It turns out in this case that n and m are not necessary, and '\{-}' matches the previous item zero or more times, as few as possible.
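A quick way to see these counts in action is Vim's built-in matchstr() function, which returns the text a pattern matched.  The following is only a small sketch; the test strings are made up for illustration, and the comment under each command shows what it should echo.

```vim
" Greedy count: takes as many b's as allowed (at most 3).
echo matchstr('abbbbb', 'ab\{1,3}')
" echoes: abbb

" Minimal count: takes as few b's as allowed (at least 1).
echo matchstr('abbbbb', 'ab\{-1,3}')
" echoes: ab

" \{-} alone: zero or more b's, as few as possible, so none here.
echo matchstr('abbbbb', 'ab\{-}')
" echoes: a

" A match that starts earlier still wins over a shorter one.
echo matchstr('xaaab', 'a\{-}b')
" echoes: aaab
```

The last case is worth remembering: minimal matching shortens a match from its right-hand end only, and never skips ahead to find a later, shorter match.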

Going back to our HTML stripping example, we can use the following regex:

:%s/<.\{-}>//g

Using our previous example source text, that would produce the following text:

Here's the Foo!  It's the best!

Now that's more like it.
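To compare the two patterns side by side without editing a buffer, one option is Vim's substitute() function.  The sketch below assumes a sample line wrapped in a b tag; that tag name is an illustrative guess rather than something this post specifies.

```vim
" Hypothetical sample line; the <b> tag is an assumption.
let sample = "Here's <b>the Foo</b>!  It's the best!"

" Greedy: .* runs from the first '<' to the last '>' on the line.
echo substitute(sample, '<.*>', '', 'g')
" echoes: Here's !  It's the best!

" Minimal: .\{-} stops at the first '>' it can reach.
echo substitute(sample, '<.\{-}>', '', 'g')
" echoes: Here's the Foo!  It's the best!
```

Both patterns work only within a single line, since '.' does not match a line break in Vim.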

What if you want to remove all opening and closing tags, even if they have additional style information?

 

The following regex will do exactly what you want, all in one step:

:%s///g
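
As a rough sketch of such a one-step substitution, assume the tag in question is p (a stand-in chosen here for illustration, not taken from this post).  The pattern below makes the slash optional with '\=' and allows an optional attribute part, matched minimally with '\{-}'.  The sample text is likewise made up.

```vim
" Hypothetical input; the p tag and the style attribute are assumptions.
let sample = '<p style="color:red">Foo</p> <pre>Bar</pre>'

" /\=           optional slash, so both <p ...> and </p> match
" \(\s.\{-}\)\= optional whitespace plus attributes, as few characters as possible
echo substitute(sample, '</\=p\(\s.\{-}\)\=>', '', 'g')
" echoes: Foo <pre>Bar</pre>

" The same pattern as an ex command (the slash must be escaped there):
" :%s/<\/\=p\(\s.\{-}\)\=>//g
```

Requiring whitespace before the attribute part keeps the pattern from also matching longer tag names that merely start with the same letter, such as pre in the sample above.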