Regex Metacharacters Tutorial

Metacharacters are characters that the regex engine interprets in a special way instead of matching them literally.
The following characters are all metacharacters, each with a brief description of what it commonly does:

.     the period is a wildcard for any character (letter, number, symbol, etc.). By default it does not match line terminators.
[]    square brackets are used in character classes.
()    parentheses are used in capturing groups.
{}    curly brackets are used in quantifiers.
\     the backslash is used to precede a metacharacter or a predefined character class. It is also used to indicate a Java escape sequence character.
^     the caret is used for negation inside a character class or for the beginning of a line.
-     the dash is used for ranges inside a character class.
&     the ampersand is used (doubled, as &&) for intersections inside a character class.
=     the equals sign is used in the lookahead (?=...) and lookbehind (?<=...) special constructs.
$     the dollar sign is a boundary matcher for the end of a line.
!     the exclamation mark is used in the negative lookahead (?!...) and negative lookbehind (?<!...) special constructs.
|     the pipe is used for or expressions.
?     the question mark is used in quantifiers and special constructs.
*     the asterisk is used in quantifiers.
+     the plus is used in quantifiers.
,     the comma is used in quantifiers.
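
The lookahead and lookbehind entries in the list are the hardest to picture, so here is a minimal sketch of how the = and ! characters show up inside those constructs. The class name LookaroundSketch, the findAll helper, and the sample string are all made up for illustration; they are not part of this tutorial's program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample string are illustrative only.
class LookaroundSketch {
    // Collects every match of regex in input, in the order they are found.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "cost: $15, weight: 20kg, count: 7";

        // (?=kg) positive lookahead: digits that are followed by "kg"
        System.out.println(findAll("\\d+(?=kg)", input));      // [20]

        // (?<=\$) positive lookbehind: digits that are preceded by a dollar sign
        System.out.println(findAll("(?<=\\$)\\d+", input));    // [15]

        // (?![\dk]) negative lookahead: digits NOT followed by another digit or a k
        System.out.println(findAll("\\d+(?![\\dk])", input));  // [15, 7]

        // (?<!\$) negative lookbehind: whole numbers NOT preceded by a dollar sign
        System.out.println(findAll("(?<!\\$)\\b\\d+", input)); // [20, 7]
    }
}
```

Notice that the text inside a lookahead or lookbehind is checked but never becomes part of the match, which is why "kg" and "$" do not appear in the results.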

How can we force one of these characters to become a regular character? There are two ways to make that happen:
(1) put a backslash \ in front of the character.
(2) put the metacharacter(s) inside of a \Q and a \E. An example of a regular period would be \Q.\E
The two rules above apply to regex in general, although \Q...\E is not supported by every regex flavor. Since we are using Java, we must also factor in escape sequences. If you are not familiar with escape sequences, I highly recommend that you watch my Escape Sequences Tutorial. In Java, the backslash \ character inside of a String literal indicates an escape sequence, and the escape sequence that produces a single backslash \ character is a double backslash \\. With that being said, the above regex rules translated for Java are:
(1) put a backslash escape sequence \\ in front of the character.
(2) put the metacharacter(s) inside of a \\Q and a \\E. An example of a regular period would be \\Q.\\E
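
To see both rules side by side, here is a small sketch. It also uses Pattern.quote from java.util.regex, which wraps a string in \Q and \E for you; that method is not used elsewhere in this tutorial, and the class name EscapeSketch and the test strings are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the test strings are illustrative only.
class EscapeSketch {
    // True when the regex matches the whole input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The String literal "\\." holds only two characters: a backslash and a period.
        System.out.println("\\.".length());                          // 2

        // Unescaped, the period is a wildcard, so 1x5 matches too.
        System.out.println(fullMatch("1.5", "1x5"));                 // true

        // Rule (1): a backslash in front of the period.
        System.out.println(fullMatch("1\\.5", "1x5"));               // false
        System.out.println(fullMatch("1\\.5", "1.5"));               // true

        // Rule (2): the period between \Q and \E.
        System.out.println(fullMatch("1\\Q.\\E5", "1x5"));           // false
        System.out.println(fullMatch("1\\Q.\\E5", "1.5"));           // true

        // Pattern.quote builds the \Q...\E form from a plain string.
        System.out.println(Pattern.quote("1.5"));                    // \Q1.5\E
        System.out.println(fullMatch(Pattern.quote("1.5"), "1x5"));  // false
    }
}
```

Pattern.quote is handy when the text to search for comes from user input, because you do not have to know ahead of time which metacharacters it contains.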

The list above is a brief overview of the most common uses. Regex operations are quite extensive, and certain characters above can perform even more specialized actions depending upon the context in which they are used. Conversely, some of the characters above are just plain old characters, again depending upon the context in which they are used. The key to understanding the purpose of a regex search pattern is to break down what each individual metacharacter is doing. It can be quite confusing at first, and the cryptic appearance of a complicated regular expression will make anyone want to run for the door.
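
Here is a rough sketch of what "depending upon the context" means in practice. The class name ContextSketch, its two helper methods, and the test strings are hypothetical and only serve as an illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the test strings are illustrative only.
class ContextSketch {
    // True when the regex matches the whole input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the regex matches anywhere inside the input string.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Inside square brackets the dash builds a range...
        System.out.println(fullMatch("[a-z]", "m"));        // true
        // ...but outside of them it is just a dash.
        System.out.println(fullMatch("a-z", "m"));          // false
        System.out.println(fullMatch("a-z", "a-z"));        // true

        // The caret negates at the start of a character class...
        System.out.println(fullMatch("[^a]", "b"));         // true
        // ...and anchors the match to the beginning of the input elsewhere.
        System.out.println(foundIn("^a", "banana"));        // false
        System.out.println(foundIn("^b", "banana"));        // true

        // Outside of a quantifier or special construct, the comma,
        // equals sign and exclamation mark are plain characters.
        System.out.println(fullMatch("x,y=z!", "x,y=z!"));  // true

        // The period loses its wildcard meaning inside a character class.
        System.out.println(fullMatch("[.]", "a"));          // false
        System.out.println(fullMatch("[.]", "."));          // true
    }
}
```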

Let's imagine that we have a program that prompts the user to input their IP Address so we can route data directly to their device. We will need a way to validate that the string they enter is a valid IP Address. A valid IP address consists of four numbers, each in the range 0-255, separated by a . (period or dot), so valid addresses run from 0.0.0.0 through 255.255.255.255. An example of a regex pattern to check for a valid IP Address would be: (([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.){3}([01]?\\d\\d?|2[0-4]\\d|25[0-5]) ... (running for the door).
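
If you want to try that pattern out, here is a minimal sketch that builds it from one repeated piece and checks a few strings against it. The class name IpAddressSketch, the OCTET and IPV4 constants, and the isValidIp method are hypothetical names chosen for this example; only the pattern itself comes from the paragraph above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; all names here are illustrative only.
class IpAddressSketch {
    // One number from 0 to 255: 0-199 (with optional leading zeros), 200-249, or 250-255.
    private static final String OCTET = "([01]?\\d\\d?|2[0-4]\\d|25[0-5])";

    // Three "number plus period" groups followed by a final number.
    private static final Pattern IPV4 = Pattern.compile("(" + OCTET + "\\.){3}" + OCTET);

    // matches() requires the whole string to fit the pattern, not just a part of it.
    static boolean isValidIp(String candidate) {
        return IPV4.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidIp("0.0.0.0"));          // true
        System.out.println(isValidIp("255.255.255.255"));  // true
        System.out.println(isValidIp("107.1.53.254"));     // true
        System.out.println(isValidIp("256.1.1.1"));        // false, 256 is out of range
        System.out.println(isValidIp("1.2.3"));            // false, only three numbers
        System.out.println(isValidIp("1.2.3.4.5"));        // false, five numbers
        System.out.println(isValidIp("1x2x3x4"));          // false, the period is escaped
    }
}
```

One thing to be aware of is that this pattern also accepts leading zeros, so a string such as 001.002.003.004 passes the check.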

I promise you that if you stick with learning regex step by step you will become a regex guru, and the above expression will look like child's play someday. Prior to this tutorial, I have made four other regex tutorials. If you have already watched my regex character class tutorials then you should already be familiar with the following characters: []^-&



Open the command prompt (CMD - see Getting Started) and type in the following commands.

C:\Windows\System32>cd \
C:\>md Java
C:\>cd Java
C:\Java>
C:\Java>md RegexMetacharacters
C:\Java>cd RegexMetacharacters
C:\Java\RegexMetacharacters>Notepad RegexMetacharacters.java

Copy and paste, or type, the following code into Notepad and be sure to save the file when you are done.


import java.util.regex.*;

class RegexMetacharacters {
    public static void main(String args[]) {
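        // An unescaped period is a wildcard, so it matches every character.
        displayFind(".","107.1.53.254");
        // Escaped with a backslash or quoted with \Q...\E, it only matches a literal period.
        displayFind("\\.","107.1.53.254");
        displayFind("\\Q.\\E","107.1.53.254");

        // The same three variations with three periods in a row.
        displayFind("...","Blah blah ... blah blah");
        displayFind("\\.\\.\\.","Blah blah ... blah blah");
        displayFind("\\Q...\\E","Blah blah ... blah blah");

        // A literal backslash is \\ in regex, which is \\\\ in a Java String literal.
        System.out.println("Searching for backslashes...");
        System.out.println("\\\\java\\\\");
        System.out.println("c:\\java\\");
        displayFind("\\\\java\\\\","c:\\java\\");

        // Square brackets form a character class unless they are escaped or quoted.
        displayFind("[41]","new int[14]");
        displayFind("\\Q[41]\\E","new int[14]");
        displayFind("\\Q[41]\\E","new int[41]");
        displayFind("\\[41\\]","new int[41]");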
    }

    // Prints every match of regex found in searchMe, or a message when there are none.
    static void displayFind(String regex, String searchMe) {
        boolean foundIt = false;
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(searchMe);
        // find() moves on to the next match each time it returns true.
        while (m.find()) {
            System.out.println("Matcher found " + m.group() + " at index " + m.start() + " for regex " + regex + " in string \"" + searchMe + "\"");
            foundIt = true;
        }
        if (!foundIt) {
            System.out.println("No matches found for " + regex + " in string \"" + searchMe + "\"");
        }
        System.out.println();
    }
}

Now switch back to the command prompt (CMD) and type in javac RegexMetacharacters.java and press Enter.
Now type in java RegexMetacharacters and press Enter.


C:\Java\RegexMetacharacters>javac RegexMetacharacters.java
C:\Java\RegexMetacharacters>java RegexMetacharacters
Matcher found 1 at index 0 for regex . in string "107.1.53.254"
Matcher found 0 at index 1 for regex . in string "107.1.53.254"
Matcher found 7 at index 2 for regex . in string "107.1.53.254"
Matcher found . at index 3 for regex . in string "107.1.53.254"
Matcher found 1 at index 4 for regex . in string "107.1.53.254"
Matcher found . at index 5 for regex . in string "107.1.53.254"
Matcher found 5 at index 6 for regex . in string "107.1.53.254"
Matcher found 3 at index 7 for regex . in string "107.1.53.254"
Matcher found . at index 8 for regex . in string "107.1.53.254"
Matcher found 2 at index 9 for regex . in string "107.1.53.254"
Matcher found 5 at index 10 for regex . in string "107.1.53.254"
Matcher found 4 at index 11 for regex . in string "107.1.53.254"

Matcher found . at index 3 for regex \. in string "107.1.53.254"
Matcher found . at index 5 for regex \. in string "107.1.53.254"
Matcher found . at index 8 for regex \. in string "107.1.53.254"

Matcher found . at index 3 for regex \Q.\E in string "107.1.53.254"
Matcher found . at index 5 for regex \Q.\E in string "107.1.53.254"
Matcher found . at index 8 for regex \Q.\E in string "107.1.53.254"

Matcher found Bla at index 0 for regex ... in string "Blah blah ... blah blah"
Matcher found h b at index 3 for regex ... in string "Blah blah ... blah blah"
Matcher found lah at index 6 for regex ... in string "Blah blah ... blah blah"
Matcher found  .. at index 9 for regex ... in string "Blah blah ... blah blah"
Matcher found . b at index 12 for regex ... in string "Blah blah ... blah blah"
Matcher found lah at index 15 for regex ... in string "Blah blah ... blah blah"
Matcher found  bl at index 18 for regex ... in string "Blah blah ... blah blah"

Matcher found ... at index 10 for regex \.\.\. in string "Blah blah ... blah blah"

Matcher found ... at index 10 for regex \Q...\E in string "Blah blah ... blah blah"

Searching for backslashes...
\\java\\
c:\java\
Matcher found \java\ at index 2 for regex \\java\\ in string "c:\java\"

Matcher found 1 at index 8 for regex [41] in string "new int[14]"
Matcher found 4 at index 9 for regex [41] in string "new int[14]"

No matches found for \Q[41]\E in string "new int[14]"

Matcher found [41] at index 7 for regex \Q[41]\E in string "new int[41]"

Matcher found [41] at index 7 for regex \[41\] in string "new int[41]"


Final thoughts

When creating regular expressions, it is important to understand how metacharacters affect the outcome of a search. Having a solid understanding of Java escape sequences will also help prevent unexpected behavior in your search. Stay tuned for my next tutorial on predefined character classes, where we will be using the special backslash character once again.

