Groovy 

An agile dynamic language for the Java Platform

Regular Expressions

Groovy supports regular expressions natively through the ~"pattern" expression, which creates a compiled Java Pattern object from the given pattern string. Groovy also provides the =~ operator, which creates a Matcher, and the ==~ operator, which returns a boolean indicating whether the whole String matches the pattern.

When the pattern has no groups, matcher[index] is the matched String. When the pattern has groups, matcher[index] is a List of Strings: the whole match followed by the text of each group.

import java.util.regex.Matcher
import java.util.regex.Pattern


// ~ creates a Pattern from String
def pattern = ~/foo/
assert pattern instanceof Pattern
assert pattern.matcher("foo").matches()     // true
assert !pattern.matcher("foobar").matches() // false, because matches() must match the whole String

// =~ creates a Matcher, and in a boolean context, it's "true" if it has at least one match, "false" otherwise.
assert "cheesecheese" =~ "cheese"
assert "cheesecheese" =~ /cheese/
assert "cheese" == /cheese/   /*they are both string syntaxes*/
assert ! ("cheese" =~ /ham/)

// ==~ tests whether the whole String matches the pattern
assert "2009" ==~ /\d+/      // true
assert !("holla" ==~ /\d+/)  // false

// let's create a Matcher
def matcher = "cheesecheese" =~ /cheese/
assert matcher instanceof Matcher

// let's do some replacement
def cheese = ("cheesecheese" =~ /cheese/).replaceFirst("nice")
assert cheese == "nicecheese"
assert "color" == "colour".replaceFirst(/ou/, "o")

// cheese is already declared above, so reassign it instead of using def again
cheese = ("cheesecheese" =~ /cheese/).replaceAll("nice")
assert cheese == "nicenice"

// simple group demo
// You can also match a pattern that includes groups.  First create a matcher object,
// either using the Java API, or more simply with the =~ operator.  Then, you can index
// the matcher object to find the matches.  matcher[0] returns a List representing the
// first match of the regular expression in the string.  The first element is the string
// that matches the entire regular expression, and the remaining elements are the strings
// that match each group.
// Here's how it works:
def m = "foobarfoo" =~ /o(b.*r)f/
assert m[0] == ["obarf", "bar"]
assert m[0][1] == "bar"
 
// Although a Matcher isn't a list, it can be indexed like a list.  In Groovy 1.6
// this includes using a collection as an index:

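// NOTE: the original page does not show the value of `string`; the value below is
// an illustrative choice that satisfies the assertions that follow.
def string = "see the green lake"
matcher = string =~ "e+"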

assert "ee" == matcher[2]
assert ["ee", "e"] == matcher[2..3]
assert ["ee", "ee"] == matcher[0, 2]
assert ["ee", "e", "ee"] == matcher[0, 1..2]
 
matcher = "cheese please" =~ /([^e]+)e+/
assert ["se", "s"] == matcher[1]
assert [["se", "s"], [" ple", " pl"]] == matcher[1, 2]
assert [["se", "s"], [" ple", " pl"]] == matcher[1 .. 2]
assert [["chee", "ch"], [" ple", " pl"], ["ase", "as"]] == matcher[0, 2..3]
// Matcher defines an iterator() method, so it can be used, for example,
// with collect() and each():
matcher = "cheese please" =~ /([^e]+)e+/
matcher.each { println it }
matcher.reset()
assert matcher.collect { it }  ==
                  [["chee", "ch"], ["se", "s"], [" ple", " pl"], ["ase", "as"]]
// The semantics of the iterator were changed by Groovy 1.6.
// In 1.5, each iteration would always return a string of the entire match, ignoring groups.
// In 1.6, if the regex has any groups, it returns a list of Strings as shown above.

// there is also a regular-expression-aware filter, grep()
assert ["foo", "moo"] == ["foo", "bar", "moo"].grep(~/.*oo$/)
// which can also be written with the findAll() method
assert ["foo", "moo"] == ["foo", "bar", "moo"].findAll { it ==~ /.*oo/ }

Since a Matcher coerces to a boolean by calling its find method, the =~ operator is consistent with the simple use of Perl's =~ operator, when it appears as a predicate (in 'if', 'while', etc.). The "stricter-looking" ==~ operator requires an exact match of the whole subject string. It returns a Boolean, not a Matcher.
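The difference between the two operators can be sketched as follows. This is a minimal illustration, not part of the original page; the sample string and variable names are assumptions chosen for the example.

```groovy
import java.util.regex.Matcher

def text = "cheesecake"   // illustrative sample string

// =~ yields a Matcher; in a boolean context Groovy calls find() on it,
// so a partial match is enough.
def partial = (text =~ /cheese/)
assert partial instanceof Matcher
def found = false
if (text =~ /cheese/) {
    found = true
}
assert found

// ==~ yields a Boolean and requires the whole string to match.
def whole = (text ==~ /cheese/)
assert whole instanceof Boolean
assert !whole
assert text ==~ /cheese.*/
```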

Regular expression support is imported from Java. Java's regular expression language and API are documented in the Javadoc for java.util.regex.Pattern and java.util.regex.Matcher.
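Because the support comes from Java, the java.util.regex classes can also be called directly from Groovy. The sketch below is illustrative only; the sample text and variable names are assumptions, while Pattern.compile, Pattern.CASE_INSENSITIVE, find(), group(), start() and end() are standard Java API.

```groovy
import java.util.regex.Pattern

// Pattern.compile and its flags come straight from java.util.regex.
def p = Pattern.compile("groovy", Pattern.CASE_INSENSITIVE)
def javaMatcher = p.matcher("Groovy runs on the JVM")   // illustrative sample text

assert javaMatcher.find()               // standard java.util.regex.Matcher method
assert javaMatcher.group() == "Groovy"  // text of the match just found
assert javaMatcher.start() == 0
assert javaMatcher.end() == 6

// The same flag can be embedded in a Groovy pattern literal.
assert "GROOVY" ==~ /(?i)groovy/
```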

More Examples

Goal: Capitalize words at the beginning of each line:

def before='''
apple
orange
y
banana
'''

def expected='''
Apple
Orange
Y
Banana
'''

assert expected == before.replaceAll(/(?m)^\w+/,
    { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1] : '') })

Goal: Capitalize every word in a string:

assert "It Is A Beautiful Day!" ==
    ("it is a beautiful day!".replaceAll(/\w+/,
        { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1] : '') }))

Add .toLowerCase() to make the rest of each word lowercase:

assert "It Is A Very Beautiful Day!" ==
    ("it is a VERY beautiful day!".replaceAll(/\w+/,
        { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1].toLowerCase() : '') }))

Gotchas

How to use backreferences with String.replaceAll()

GStrings do not work as you'd expect:

def replaced = "abc".replaceAll(/(a)(b)(c)/, "$1$3")

Produces an error like the following:

[...] illegal string body character after dollar sign:
   solution: either escape a literal dollar sign "\$5" or bracket the value expression "${5}" @ line [...]

Solution:

Use ' or / to delimit the replacement string:

def replaced = "abc".replaceAll(/(a)(b)(c)/, '$1$3')
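For comparison, here is a short sketch of three ways to get the same result. The first repeats the solution above, the second uses the escaping that the error message itself suggests, and the third uses a closure. The variable names escaped and viaClosure are illustrative and not from the original page.

```groovy
// Single-quoted strings are never interpolated, so $1 and $3 reach
// the regex engine intact.
def replaced = "abc".replaceAll(/(a)(b)(c)/, '$1$3')
assert replaced == "ac"

// Escaping each dollar sign in a double-quoted string is equivalent.
def escaped = "abc".replaceAll(/(a)(b)(c)/, "\$1\$3")
assert escaped == "ac"

// A closure avoids replacement-string syntax altogether: it receives
// the full match followed by each group.
def viaClosure = "abc".replaceAll(/(a)(b)(c)/) { all, a, b, c -> a + c }
assert viaClosure == "ac"
```

The closure form is handy when the replacement needs logic beyond reordering groups, as in the capitalization examples earlier on this page.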