Regex Capturing Group Numbering Tutorial

In this tutorial I will discuss capturing group numbering. Before I begin, I would like to recap some important capturing group concepts that I introduced in passing in other tutorials. In my Regex Capturing Groups Introduction Tutorial, I provided a simple overview of capturing groups.

"Lizard[sz]"
"(Lizard)[sz]"
"(Lizards)|(Lizardz)"
"(Lizard)s|(Lizard)z"
"((Lizard)[s])|((Lizard)[z])"
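
To make the differences between these five patterns concrete, here is a minimal sketch that runs each one against the string Lizardz and prints what selected groups capture. The class name LizardGroupsDemo and the capture helper are my own illustrative names, they do not come from the earlier tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LizardGroupsDemo {
    // Hypothetical helper: returns the text captured by the given group for the
    // first match, or null if nothing matches or the group did not take part.
    static String capture(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String input = "Lizardz";
        System.out.println(capture("Lizard[sz]", input, 0));                  // Lizardz (no groups, whole match only)
        System.out.println(capture("(Lizard)[sz]", input, 1));                // Lizard
        System.out.println(capture("(Lizards)|(Lizardz)", input, 1));         // null (first alternative failed)
        System.out.println(capture("(Lizards)|(Lizardz)", input, 2));         // Lizardz
        System.out.println(capture("(Lizard)s|(Lizard)z", input, 2));         // Lizard
        System.out.println(capture("((Lizard)[s])|((Lizard)[z])", input, 3)); // Lizardz
        System.out.println(capture("((Lizard)[s])|((Lizard)[z])", input, 4)); // Lizard
    }
}
```

All five patterns match the same text; what changes is which pieces of that text end up in a numbered group.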

When a quantifier is applied to a capturing group, the search pattern changes quite dramatically: the quantifier repeats the whole group instead of only the character in front of it.

"round{2}" will return true if roundd exists in the search string.
"(round){2}" will return true if roundround exists in the search string.
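
The two patterns above can be checked side by side. This is a small sketch that assumes "return true" refers to Matcher.find(); the class name QuantifierGroupDemo and the found helper are illustrative names of my own.

```java
import java.util.regex.Pattern;

class QuantifierGroupDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without a group, {2} repeats only the letter d.
        System.out.println(found("round{2}", "roundd"));       // true
        System.out.println(found("round{2}", "roundround"));   // false

        // With a group, {2} repeats the whole word.
        System.out.println(found("(round){2}", "roundround")); // true
        System.out.println(found("(round){2}", "roundd"));     // false
    }
}
```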

In my Pattern Class Tutorial, I demonstrated how a construct that looks like a capturing group can produce the same results as the overloaded version of the .compile(String regex, int flags) method.

Pattern p = Pattern.compile("the", Pattern.CASE_INSENSITIVE);
Pattern p = Pattern.compile("(?i)(the)");

In the example above, it is important to note that (?i) is not a capturing group; it is technically an inline modifier. Don't worry about that for now, I will discuss inline modifiers and non-capturing groups in another tutorial. The point to remember here is that inline modifiers and non-capturing groups are not counted in capture group numbering.
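
As a quick check of both claims, the sketch below counts matches for the flag version and the inline version, and then prints the group count for each. The sample sentence, the class name InlineFlagDemo, and the helper methods are assumptions of mine for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineFlagDemo {
    // Hypothetical helper: counts how many times the regex is found in the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Hypothetical helper: number of capturing groups in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String input = "The cat sat on the mat near THE door.";

        // Both versions find the same three words.
        System.out.println(countMatches("the", Pattern.CASE_INSENSITIVE, input)); // 3
        System.out.println(countMatches("(?i)(the)", 0, input));                  // 3

        // (?i) is not counted; only (the) is a capturing group.
        System.out.println(groupCount("the"));       // 0
        System.out.println(groupCount("(?i)(the)")); // 1
    }
}
```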

Numbering

Capturing groups are automatically numbered when the regex is compiled. The numbering begins at 0, which is automatically the entire match. The rest of the groups begin at number 1 and are numbered from left to right according to the order of their opening parentheses. We can invoke the .groupCount() method on a Matcher object to find out the number of capturing groups in the regex; group 0 is not included in that count.
Let's analyze this regex: "((lizard[^s])|(lizard[s]))"

"lizard[^s]|(lizard[s])" .groupCount() = 1
"(lizard[^s])|(lizard[s])" .groupCount() = 2
"((lizard[^s])|(lizard[s]))" .groupCount() = 3
"(?i)((lizard[^s])|(lizard[s]))" .groupCount() = 3
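
The left-to-right rule is easier to see when every group is printed for a single match. The following is a minimal sketch under my own naming (GroupOrderDemo and groups are illustrative, not from the tutorial series); it collects group 0 through group .groupCount() into an array.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOrderDemo {
    // Hypothetical helper: returns group 0..groupCount() for the first match,
    // or an empty array if the regex does not match the input.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Opening parentheses, left to right:
        //   1 = ((lizard[^s])|(lizard[s]))   the outer pair
        //   2 = (lizard[^s])
        //   3 = (lizard[s])
        String[] g = groups("((lizard[^s])|(lizard[s]))", "lizards");
        System.out.println(Arrays.toString(g)); // [lizards, lizards, null, lizards]
    }
}
```

Group 2 is null here because the input took the second alternative, so (lizard[^s]) never took part in the match.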

At this point you might be wondering what we can do with numbered capturing groups. In the Matcher class, the methods .start(int group), .end(int group), and .group(int group) all allow us to fine-tune our search results. We can also use numbered groups to find repeating patterns, which I will demonstrate in my backreferences tutorial. Let's take the string literal "Is Godzilla a lizard? Lizards are reptiles, but lizards are just a subclass of reptiles. I think the real question is who really cares?" and come up with a way of determining how many times lizard is used in the singular versus the plural.



Open the command prompt (CMD, see Getting Started) and type in the following commands.

C:\Windows\System32>cd \
C:\>md Java
C:\>cd Java
C:\Java>
C:\Java>md RegexGroupNumbering
C:\Java>cd RegexGroupNumbering
C:\Java\RegexGroupNumbering>Notepad RegexGroupNumbering.java

Copy and Paste, or type the following code into Notepad and be sure to save the file when you are done.


import java.util.regex.*;

class RegexGroupNumbering {
    public static void main(String[] args) {
        displayCount("(Lizard)|(lizard)");
        System.out.println();
        displayCount("(?i)(LIZARD)");
        System.out.println();

        displayCount("lizard[^s]|(lizard[s])");
        displayCount("(lizard[^s])|(lizard[s])");
        displayCount("((lizard[^s])|(lizard[s]))");
        displayCount("(?i)((lizard[^s])|(lizard[s]))");
        System.out.println();

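        // (?i) makes the search case-insensitive and is not a numbered group.
        // Group 1 is the outer pair, group 2 is (lizard[^s]), group 3 is (lizard[s]).
        // Note that [^s] must consume one character, so group 2 includes whatever follows "lizard".
        Matcher m = Pattern.compile("(?i)((lizard[^s])|(lizard[s]))").matcher("Is Godzilla a lizard? Lizards are reptiles, but lizards are just a subclass of reptiles. I think the real question is who really cares?");
        System.out.println(m.groupCount());

        int singular=0, plural=0;
        while(m.find()) {
            System.out.println("ordinary m.group() = " + m.group());
            System.out.println("m.group(2) = " + m.group(2));
            System.out.println("m.group(3) = " + m.group(3));
            System.out.println();
            // A group that did not take part in the current match returns null.
            if (m.group(2)!=null) { singular++; }
            if (m.group(3)!=null) { plural++; }
        }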
        System.out.println("\nSingular lizard usage: "+singular);
        System.out.println("Plural lizards usage: "+plural);
    }

    static void displayCount(String regex) {
        Matcher m = Pattern.compile(regex).matcher(" ");
        System.out.println(regex + "   m.groupCount() = " + m.groupCount());
    }
}

Now switch back to the command prompt (CMD) and type in javac RegexGroupNumbering.java and press Enter.
Now type in java RegexGroupNumbering and press Enter.


C:\Java\RegexGroupNumbering>javac RegexGroupNumbering.java
C:\Java\RegexGroupNumbering>java RegexGroupNumbering
(Lizard)|(lizard)   m.groupCount() = 2

(?i)(LIZARD)   m.groupCount() = 1

lizard[^s]|(lizard[s])   m.groupCount() = 1
(lizard[^s])|(lizard[s])   m.groupCount() = 2
((lizard[^s])|(lizard[s]))   m.groupCount() = 3
(?i)((lizard[^s])|(lizard[s]))   m.groupCount() = 3

3
ordinary m.group() = lizard?
m.group(2) = lizard?
m.group(3) = null

ordinary m.group() = Lizards
m.group(2) = null
m.group(3) = Lizards

ordinary m.group() = lizards
m.group(2) = null
m.group(3) = lizards

Singular lizard usage: 1
Plural lizards usage: 2
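
The .start(int group) and .end(int group) methods mentioned earlier follow the same numbering as .group(int group). Here is a short sketch using a shortened version of the sample sentence; GroupSpanDemo and span are hypothetical names I chose for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSpanDemo {
    // Hypothetical helper: returns {start, end} of the given group for the first match.
    // Both values are -1 when the group did not take part in the match;
    // null is returned when the regex does not match at all.
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String regex = "(?i)((lizard[^s])|(lizard[s]))";
        String input = "Is Godzilla a lizard? Lizards are reptiles.";

        int[] singular = span(regex, input, 2);
        int[] plural = span(regex, input, 3);

        System.out.println("group 2: " + singular[0] + " to " + singular[1]); // group 2: 14 to 21
        System.out.println("group 3: " + plural[0] + " to " + plural[1]);     // group 3: -1 to -1
        System.out.println(input.substring(singular[0], singular[1]));        // lizard?
    }
}
```

The end index is exclusive, which is why it can be passed straight to substring.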


Final thoughts

Stay tuned for my next tutorial where I will teach you how to name your capturing groups.

