Regex Predefined Character Classes Tutorial

Often you will find yourself using the same character class expression over and over again. A character class such as [0-9] matches digits only, specifically the characters 0 through 9. This expression is used so often that the regex architects created a shorthand metacharacter for it: \d. In Java, the regex string literal "[0-9]" is functionally identical to "\\d" (the backslash is doubled because it must be escaped inside a Java string literal). Predefined character classes are essentially just extensions of the standard metacharacters. Before continuing with this tutorial, I highly recommend that you watch my Regex Character Classes Part 1 Tutorial, Regex Character Classes Part 2 Tutorial, and Regex Metacharacters Tutorial.
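
As a quick check of that claim, here is a minimal sketch that strips the digits out of a sample string using both forms. The class name ShorthandDemo, the helper strip, and the sample text are made up for illustration; String.replaceAll is the standard JDK method.

```java
class ShorthandDemo {
    // Removes every character matched by the given regex from the text.
    static String strip(String regex, String text) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String sample = "Room 101, Floor 3";
        // Explicit character class range
        System.out.println(strip("[0-9]", sample));
        // Predefined shorthand; the backslash is doubled in Java source
        System.out.println(strip("\\d", sample));
    }
}
```

Both calls should print the same text, with every digit removed.
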
I made the following table of the predefined character classes (including escape sequence syntax) and what they represent:

Predefined          Character Class           Purpose
"\\d"               "[0-9]"                   digits 0 through 9
"\\D"               "[^0-9]"                  any character except the digits 0 through 9
"\\s"               "[ \\t\\n\\x0B\\f\\r]"    space, tab, newline, vertical tab, formfeed, and carriage return
"\\S"               "[^\\s]"                  any character except a space, tab, newline, vertical tab, formfeed, or carriage return
"\\w"               "[a-zA-Z0-9_]"            lowercase a-z, uppercase A-Z, digits 0 through 9, and the underscore
"\\W"               "[^\\w]"                  any character except lowercase a-z, uppercase A-Z, digits 0 through 9, or the underscore

Try not to use these next ones; I'll explain why in the video.
"\\h"               "[ \\t\\xA0\\u1680\\u180e\\u2000-\\u200a\\u202f\\u205f\\u3000]" (xA0=non-breaking space)
"\\H"               "[^\\h]"
"\\v"               "[\\n\\f\\r\\x85\\x0B\\u2028\\u2029]" (x0B=vertical tab, x85=next line, NEL)
"\\V"               "[^\\v]"

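The equivalences in the table can be checked mechanically. The sketch below is only an illustration (the class name ClassEquivalenceCheck and the helper sameSet are hypothetical): it compares a shorthand against an explicit class for every single-character string in the Basic Multilingual Plane. It assumes Java's default matching mode, with no Pattern flags set, and the \h and \v lines assume Java 8 or later.

```java
import java.util.regex.Pattern;

class ClassEquivalenceCheck {
    // Returns true when both regexes accept exactly the same
    // single-character strings across the Basic Multilingual Plane.
    static boolean sameSet(String shorthand, String explicit) {
        Pattern a = Pattern.compile(shorthand);
        Pattern b = Pattern.compile(explicit);
        for (int i = 0; i <= 0xFFFF; i++) {
            String s = String.valueOf((char) i);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameSet("\\d", "[0-9]"));
        System.out.println(sameSet("\\s", "[ \\t\\n\\x0B\\f\\r]"));
        System.out.println(sameSet("\\w", "[a-zA-Z0-9_]"));
        // A class without the vertical tab is close to, but not exactly, \s
        System.out.println(sameSet("\\s", "[ \\n\\r\\t\\f]"));
        // Horizontal and vertical whitespace (assumes Java 8 or later)
        System.out.println(sameSet("\\h", "[ \\t\\xA0\\u1680\\u180e\\u2000-\\u200a\\u202f\\u205f\\u3000]"));
        System.out.println(sameSet("\\v", "[\\n\\x0B\\f\\r\\x85\\u2028\\u2029]"));
    }
}
```

Only the fourth comparison should report a difference, because \s also covers the vertical tab (\x0B).
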


Open the command prompt (CMD; see Getting Started) and type in the following commands.

C:\Windows\System32>cd \
C:\>md Java
C:\>cd Java
C:\Java>
C:\Java>md RegexPredefined
C:\Java>cd RegexPredefined
C:\Java\RegexPredefined>Notepad RegexPredefined.java

Copy and paste, or type, the following code into Notepad, and be sure to save the file when you are done.


import java.util.regex.*;

class RegexPredefined {
    public static void main(String[] args) {
        displayFind("[0-9]","107.1.53.254");
        displayFind("\\d","107.1.53.254");
        displayFind("[^0-9]","107.1.53.254");
        displayFind("\\D","107.1.53.254");

        displayFind("[ \\n\\r\\t\\f]","Hello World\n");
        displayFind("\\s","Hello World\n");
        displayFind("[^ \\n\\r\\t\\f]","Hello World\n");
        displayFind("\\S","Hello World\n");

        displayFind("[a-zA-Z0-9_]","Page_no: 137");
        displayFind("\\w","Page_no: 137");
        displayFind("[^a-zA-Z0-9_]","Page_no: 137");
        displayFind("[\\W]","Page_no: 137");

        // from metacharacter tutorial
        displayFind("(([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.){3}([01]?\\d\\d?|2[0-4]\\d|25[0-5])", "107.1.53.255");
        displayFind("(([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.){3}([01]?\\d\\d?|2[0-4]\\d|25[0-5])", "0.0.0.0");
        displayFind("(([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.){3}([01]?\\d\\d?|2[0-4]\\d|25[0-5])", "255.255.255.255");
        displayFind("(([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.){3}([01]?\\d\\d?|2[0-4]\\d|25[0-5])", "499.34.82.1007");
    }

    // Prints every match of regex in searchMe, or a notice if there are none.
    static void displayFind(String regex, String searchMe) {
        boolean foundIt = false;
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(searchMe);
        // Each call to find() scans forward for the next match
        while (m.find()) {
            System.out.println("Matcher found " + m.group() + " at index " + m.start() + " for regex " + regex + " in string \"" + searchMe + "\"");
            foundIt = true;
        }
        if (!foundIt) {
            System.out.println("No matches found for " + regex + " in string \"" + searchMe + "\"");
        }
        System.out.println();
    }
}

Now switch back to the command prompt (CMD), type in javac RegexPredefined.java, and press Enter.
Then type in java RegexPredefined and press Enter.


C:\Java\RegexPredefined>javac RegexPredefined.java
C:\Java\RegexPredefined>java RegexPredefined
Matcher found 1 at index 0 for regex [0-9] in string "107.1.53.254"
Matcher found 0 at index 1 for regex [0-9] in string "107.1.53.254"
Matcher found 7 at index 2 for regex [0-9] in string "107.1.53.254"
Matcher found 1 at index 4 for regex [0-9] in string "107.1.53.254"
Matcher found 5 at index 6 for regex [0-9] in string "107.1.53.254"
Matcher found 3 at index 7 for regex [0-9] in string "107.1.53.254"
Matcher found 2 at index 9 for regex [0-9] in string "107.1.53.254"
Matcher found 5 at index 10 for regex [0-9] in string "107.1.53.254"
Matcher found 4 at index 11 for regex [0-9] in string "107.1.53.254"

Matcher found 1 at index 0 for regex \d in string "107.1.53.254"
Matcher found 0 at index 1 for regex \d in string "107.1.53.254"
Matcher found 7 at index 2 for regex \d in string "107.1.53.254"
Matcher found 1 at index 4 for regex \d in string "107.1.53.254"
Matcher found 5 at index 6 for regex \d in string "107.1.53.254"
Matcher found 3 at index 7 for regex \d in string "107.1.53.254"
Matcher found 2 at index 9 for regex \d in string "107.1.53.254"
Matcher found 5 at index 10 for regex \d in string "107.1.53.254"
Matcher found 4 at index 11 for regex \d in string "107.1.53.254"

Matcher found . at index 3 for regex [^0-9] in string "107.1.53.254"
Matcher found . at index 5 for regex [^0-9] in string "107.1.53.254"
Matcher found . at index 8 for regex [^0-9] in string "107.1.53.254"

Matcher found . at index 3 for regex \D in string "107.1.53.254"
Matcher found . at index 5 for regex \D in string "107.1.53.254"
Matcher found . at index 8 for regex \D in string "107.1.53.254"

Matcher found   at index 5 for regex [ \n\r\t\f] in string "Hello World
"
Matcher found 
 at index 11 for regex [ \n\r\t\f] in string "Hello World
"

Matcher found   at index 5 for regex \s in string "Hello World
"
Matcher found 
 at index 11 for regex \s in string "Hello World
"

Matcher found H at index 0 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found e at index 1 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found l at index 2 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found l at index 3 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found o at index 4 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found W at index 6 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found o at index 7 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found r at index 8 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found l at index 9 for regex [^ \n\r\t\f] in string "Hello World
"
Matcher found d at index 10 for regex [^ \n\r\t\f] in string "Hello World
"

Matcher found H at index 0 for regex \S in string "Hello World
"
Matcher found e at index 1 for regex \S in string "Hello World
"
Matcher found l at index 2 for regex \S in string "Hello World
"
Matcher found l at index 3 for regex \S in string "Hello World
"
Matcher found o at index 4 for regex \S in string "Hello World
"
Matcher found W at index 6 for regex \S in string "Hello World
"
Matcher found o at index 7 for regex \S in string "Hello World
"
Matcher found r at index 8 for regex \S in string "Hello World
"
Matcher found l at index 9 for regex \S in string "Hello World
"
Matcher found d at index 10 for regex \S in string "Hello World
"

Matcher found P at index 0 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found a at index 1 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found g at index 2 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found e at index 3 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found _ at index 4 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found n at index 5 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found o at index 6 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found 1 at index 9 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found 3 at index 10 for regex [a-zA-Z0-9_] in string "Page_no: 137"
Matcher found 7 at index 11 for regex [a-zA-Z0-9_] in string "Page_no: 137"

Matcher found P at index 0 for regex \w in string "Page_no: 137"
Matcher found a at index 1 for regex \w in string "Page_no: 137"
Matcher found g at index 2 for regex \w in string "Page_no: 137"
Matcher found e at index 3 for regex \w in string "Page_no: 137"
Matcher found _ at index 4 for regex \w in string "Page_no: 137"
Matcher found n at index 5 for regex \w in string "Page_no: 137"
Matcher found o at index 6 for regex \w in string "Page_no: 137"
Matcher found 1 at index 9 for regex \w in string "Page_no: 137"
Matcher found 3 at index 10 for regex \w in string "Page_no: 137"
Matcher found 7 at index 11 for regex \w in string "Page_no: 137"

Matcher found : at index 7 for regex [^a-zA-Z0-9_] in string "Page_no: 137"
Matcher found   at index 8 for regex [^a-zA-Z0-9_] in string "Page_no: 137"

Matcher found : at index 7 for regex [\W] in string "Page_no: 137"
Matcher found   at index 8 for regex [\W] in string "Page_no: 137"

Matcher found 107.1.53.25 at index 0 for regex (([01]?\d\d?|2[0-4]\d|25[0-5])\.){3}([01]?\d\d?|2[0-4]\d|25[0-5]) in string "107.1.53.255"

Matcher found 0.0.0.0 at index 0 for regex (([01]?\d\d?|2[0-4]\d|25[0-5])\.){3}([01]?\d\d?|2[0-4]\d|25[0-5]) in string "0.0.0.0"

Matcher found 255.255.255.25 at index 0 for regex (([01]?\d\d?|2[0-4]\d|25[0-5])\.){3}([01]?\d\d?|2[0-4]\d|25[0-5]) in string "255.255.255.255"

Matcher found 99.34.82.100 at index 1 for regex (([01]?\d\d?|2[0-4]\d|25[0-5])\.){3}([01]?\d\d?|2[0-4]\d|25[0-5]) in string "499.34.82.1007"
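
The last four results show that find() reports the first substring the pattern can match, not necessarily the whole input. The first alternative, [01]?\d\d?, is tried first and is satisfied by just "25", so the trailing 5 is left out, and "499.34.82.1007" still produces a partial hit starting at index 1. If the goal is to validate an entire string, one option is Matcher.matches(), which succeeds only when the pattern covers the whole input. The following is a minimal sketch; the class name WholeStringMatch and the helper isIpv4 are made up for illustration.

```java
import java.util.regex.Pattern;

class WholeStringMatch {
    // Same IPv4 pattern as in the listing above.
    static final Pattern IPV4 = Pattern.compile(
        "(([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.){3}([01]?\\d\\d?|2[0-4]\\d|25[0-5])");

    // matches() succeeds only if the pattern consumes the entire input.
    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("107.1.53.255"));    // true
        System.out.println(isIpv4("0.0.0.0"));         // true
        System.out.println(isIpv4("255.255.255.255")); // true
        System.out.println(isIpv4("499.34.82.1007"));  // false
    }
}
```
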




Final thoughts

Try to get into the habit of using the predefined character classes in place of character class ranges and negation whenever the opportunity presents itself.
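
As a small illustration of that habit, here is a sketch of two everyday tasks written with the shorthands instead of hand-built classes. The class and method names are hypothetical; String.split and String.replaceAll are standard JDK methods.

```java
import java.util.Arrays;

class PredefinedInPractice {
    // Splits text on runs of whitespace using \s instead of a hand-written class.
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    // Keeps only word characters (letters, digits, underscore) by deleting \W matches.
    static String wordCharsOnly(String text) {
        return text.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Hello \t World\n"))); // [Hello, World]
        System.out.println(wordCharsOnly("Page_no: 137"));              // Page_no137
    }
}
```
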

