Regular Expressions

Groovy supports regular expressions natively through the ~"pattern" expression, which compiles the given pattern string into a Java Pattern object. Groovy also supports the =~ (create Matcher) and ==~ (match the whole string against the regex) operators.

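The type of matcher[index] depends on the pattern: if the pattern has no groups, it is the matched String; if the pattern has groups, it is a List of Strings holding the whole match followed by each matched group.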

import java.util.regex.Matcher
import java.util.regex.Pattern


// =~ creates a Matcher, and in a boolean context, it's "true" if it has at least one match, "false" otherwise.
assert "cheesecheese" =~ "cheese"
assert "cheesecheese" =~ /cheese/
assert "cheese" == /cheese/   /*they are both string syntaxes*/
assert ! ("cheese" =~ /ham/)

// let's create a regex Pattern
def pattern = ~/foo/
assert pattern instanceof Pattern
assert pattern.matcher("foo").matches()

// let's create a Matcher
def matcher = "cheesecheese" =~ /cheese/
assert matcher instanceof Matcher

// let's do some replacement
def cheese = ("cheesecheese" =~ /cheese/).replaceFirst("nice")
assert cheese == "nicecheese"

// simple group demo
// You can also match a pattern that includes groups.  First create a matcher object, either
// using the Java API, or more simply with the =~ operator.  Then, you can index the matcher
// object to find the matches.  matcher[0] returns a List representing the first match of the
// regular expression in the string.  The first element is the string that matches the entire
// regular expression, and the remaining elements are the strings that match each group.
// Here's how it works:
def m = "foobarfoo" =~ /o(b.*r)f/
assert m[0] == ["obarf", "bar"]
assert m[0][1] == "bar"
 
// Although a Matcher isn't a list, it can be indexed like a list.  In Groovy 1.6 this includes
// using a collection as an index:

// "e+" finds four matches in this sample string: "e", "ee", "ee", "e"
matcher = "eat green cheese" =~ "e+"

assert "ee" == matcher[2]
assert ["ee", "e"] == matcher[2..3]
assert ["e", "ee"] == matcher[0, 2]
assert ["e", "ee", "ee"] == matcher[0, 1..2]
 
matcher = "cheese please" =~ /([^e]+)e+/
assert ["se", "s"] == matcher[1]
assert [["se", "s"], [" ple", " pl"]] == matcher[1, 2]
assert [["se", "s"], [" ple", " pl"]] == matcher[1 .. 2]
assert [["chee", "ch"], [" ple", " pl"], ["ase", "as"]] == matcher[0, 2..3]
// Matcher defines an iterator() method, so it can be used, for example, with collect() and each():
matcher = "cheese please" =~ /([^e]+)e+/
matcher.each { println it }
matcher.reset()
assert matcher.collect { it } == [["chee", "ch"], ["se", "s"], [" ple", " pl"], ["ase", "as"]]
// The semantics of the iterator differ between Groovy versions up to 1.5 and versions 1.6 and later.
// In 1.5, each iteration always returned a String holding the entire match, ignoring groups.  In
// 1.6, if the regex has any groups, each iteration returns a List of Strings as shown above.
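
The iterator behaviour described above can be sketched as follows, assuming Groovy 1.6 or later. The variable names plain and grouped are illustrative only.

```groovy
// No groups in the pattern: each iteration yields the matched String.
def plain = "cheese please" =~ /e+/
assert plain.collect { it } == ["ee", "e", "e", "e"]

// One group in the pattern: each iteration yields a List [wholeMatch, group1].
def grouped = "cheese please" =~ /([^e]+)e+/
assert grouped.collect { it[1] } == ["ch", "s", " pl", "as"]
```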

Since a Matcher coerces to a boolean by calling its find method, the =~ operator is consistent with the simple use of Perl's =~ operator, when it appears as a predicate (in 'if', 'while', etc.). The "stricter-looking" ==~ operator requires an exact match of the whole subject string. It returns a Boolean, not a Matcher.
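
A minimal sketch of the difference between the two operators. The sample strings and the found list are made up for illustration.

```groovy
// =~ builds a Matcher; in a boolean context it is true if the pattern is found anywhere.
assert "cheesecake" =~ /cheese/

// ==~ returns a Boolean and is true only if the whole subject string matches.
assert !("cheesecake" ==~ /cheese/)
assert "cheesecake" ==~ /cheese\w+/
assert ("cheesecake" ==~ /cheese\w+/) instanceof Boolean

// =~ used as a predicate inside 'if', in the Perl style
def found = []
["ham", "cheese", "cream cheese"].each { food ->
    if (food =~ /cheese/) found << food
}
assert found == ["cheese", "cream cheese"]
```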

Regular expression support is imported from Java. Java's regular expression language and API are documented in the Javadoc for the java.util.regex package, in particular the Pattern class.
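
Because the operators build ordinary java.util.regex objects, each one can be read as shorthand for a Java API call. The sketch below shows the rough correspondence. It is not a description of how Groovy implements the operators, and the names text, viaOperator and viaApi are illustrative.

```groovy
import java.util.regex.Pattern

def text = "cheesecheese"

// ~/.../ corresponds to Pattern.compile(...)
assert (~/cheese/).pattern() == Pattern.compile("cheese").pattern()

// text =~ regex corresponds to Pattern.compile(regex).matcher(text)
def viaOperator = text =~ /cheese/
def viaApi = Pattern.compile("cheese").matcher(text)
assert viaOperator.find() == viaApi.find()

// text ==~ regex corresponds to calling matches() on such a matcher
assert (text ==~ /(cheese)+/) == Pattern.compile("(cheese)+").matcher(text).matches()
```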

More Examples

Goal: Capitalize words at the beginning of each line:

def before='''
apple
orange
y
banana
'''

def expected='''
Apple
Orange
Y
Banana
'''

assert expected == before.replaceAll(/(?m)^\w+/, { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1] : '') })

Goal: Capitalize every word in a string:

assert "It Is A Beautiful Day!" == ("it is a beautiful day!".replaceAll(/\w+/, { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1] : '') }))

Add .toLowerCase() to make the rest of each word lowercase:

assert "It Is A Very Beautiful Day!" == ("it is a VERY beautiful day!".replaceAll(/\w+/, { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1].toLowerCase() : '') }))
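
The patterns above have no groups, so the closure receives the matched String. When the pattern does have groups, the GDK replaceAll method passes the whole match and each group to the closure as separate parameters. A minimal sketch, using a made-up date string and illustrative parameter names:

```groovy
// Reorder a year-month-day date into day/month/year using the captured groups.
def swapped = "2009-03-14".replaceAll(/(\d+)-(\d+)-(\d+)/) { whole, year, month, day ->
    "$day/$month/$year"
}
assert swapped == "14/03/2009"
```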