Thursday 18 December 2014

Regular Expression in Java

         A Regular Expression defines a search pattern for strings. The search pattern can be a single character, a fixed string, or a complex expression containing special characters that describe the pattern. The pattern defined by the Regular Expression may match once, several times, or not at all for a given string.
         Regular expressions can be used to search, edit and manipulate text. A Regular Expression is also known as a regex or regexp.
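
As a rough illustration of the three uses mentioned above (search, edit, manipulate), here is a minimal sketch. The class name RegexUses and its helper methods are made up for this example; only Pattern, Matcher.find, String.replaceAll and String.split come from the JDK.

```java
import java.util.regex.Pattern;

public class RegexUses {
    // Hypothetical helper pattern: a single digit.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Search: true if the text contains at least one digit anywhere.
    static boolean containsDigit(String text) {
        return DIGIT.matcher(text).find();
    }

    // Edit: replace every run of digits with a single '#'.
    static String maskDigits(String text) {
        return text.replaceAll("\\d+", "#");
    }

    // Manipulate: split on commas, ignoring whitespace around them.
    static String[] splitOnCommas(String text) {
        return text.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(containsDigit("room 42"));                   // true
        System.out.println(maskDigits("room 42, floor 3"));             // room #, floor #
        System.out.println(String.join("|", splitOnCommas("a , b,c"))); // a|b|c
    }
}
```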

         The java.util.regex package was added in Java SE 1.4. If you are running a version of Java older than that, you should really consider upgrading.

         The Java Regular Expression classes live in the java.util.regex package, which contains three classes: Pattern, Matcher and PatternSyntaxException.


1) Pattern object is the compiled version of the regular expression. It doesn’t have any public constructor; we use its public static method compile to create the Pattern object by passing the regular expression as an argument.

2) Matcher is the regex engine object that matches the input String against the Pattern object created. This class doesn’t have any public constructor either; we get a Matcher object from the Pattern object’s matcher method, which takes the input String as an argument. We then use the matches method, which returns a boolean result based on whether the whole input String matches the regex pattern or not.

3) PatternSyntaxException is thrown if the regular expression syntax is not correct.
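
The way Pattern and Matcher fit together can be sketched as follows. The class name PatternMatcherDemo and the helpers isAllDigits and firstNumber are invented for this illustration; the sketch also uses Matcher.find, which looks for a match anywhere in the input, next to matches, which requires the whole input to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatcherDemo {
    // No public constructor: a Pattern is created through Pattern.compile.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches() succeeds only if the entire input matches the pattern.
    static boolean isAllDigits(String input) {
        Matcher matcher = DIGITS.matcher(input);
        return matcher.matches();
    }

    // find() scans the input for the next matching subsequence.
    // Returns the first run of digits, or null if there is none.
    static String firstNumber(String input) {
        Matcher matcher = DIGITS.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2014"));             // true
        System.out.println(isAllDigits("18 December"));      // false
        System.out.println(firstNumber("18 December 2014")); // 18
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the same expression on every call.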


Java Regular Expression Metacharacters:-

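          Regular expressions also have metacharacters, which act as short codes for common matching patterns.

^  indicates the beginning of a line.
$  indicates the end of a line.

A Regular Expression placed between ^ and $ has to match the whole line, not just a part of it. By default Java treats the whole input as one line; with the Pattern.MULTILINE flag, ^ and $ also match at the start and end of each line inside the input.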
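
To see what the two anchors change, here is a small sketch that runs the same digit pattern with and without them. The class AnchorDemo and its found helper are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // True if the regex matches somewhere in the input (Matcher.find).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without anchors, a match anywhere in the input is enough.
        System.out.println(found("\\d+", "98HT12"));   // true
        // With ^ and $, the digits must span the whole input.
        System.out.println(found("^\\d+$", "98HT12")); // false
        System.out.println(found("^\\d+$", "123"));    // true
    }
}
```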

Following is the list of metacharacters which can be used in Regular Expressions.

     \d     -   Any digit, short for [0-9].
     \D     -   A non-digit, short for [^0-9].
     \s     -   A whitespace character, short for [ \t\n\x0B\f\r].
     \S     -   A non-whitespace character, short for [^\s].
     \w     -   A word character, short for [a-zA-Z_0-9].
     \W     -   A non-word character, short for [^\w].
     \b     -   Matches a word boundary, where a word character is [a-zA-Z0-9_].
     [..]   -   Matches any single character in the brackets.
     [^..]  -   Matches any single character not in the brackets.
     \t     -   Matches a tab (U+0009).
     \v     -   Matches a vertical whitespace character, [\n\x0B\f\r\x85\u2028\u2029], in Java 8 and later. Use \x0B to match only the vertical tab (U+000B).
     +      -   Matches the preceding element 1 or more times. Equivalent to {1,}.
     *      -   Matches the preceding element 0 or more times. Equivalent to {0,}.
     ?      -   Matches the preceding element 0 or 1 time. Equivalent to {0,1}.
     .      -   (The dot) matches any single character except a line terminator, unless the Pattern.DOTALL flag is set.
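
A few of the metacharacters above can be tried out with a sketch like the one below. MetacharDemo, fullMatch and found are names made up for this example; fullMatch requires the whole input to match, while found accepts a match anywhere.

```java
import java.util.regex.Pattern;

public class MetacharDemo {
    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d\\d", "42"));           // true: two digits
        System.out.println(fullMatch("\\w+", "user_1"));         // true: underscore is a word character
        System.out.println(fullMatch("\\w+", "user-1"));         // false: hyphen is not a word character
        System.out.println(fullMatch("colou?r", "color"));       // true: the 'u' is optional
        System.out.println(fullMatch("[^0-9]*", "abc"));         // true: no digits at all
        System.out.println(found("\\bcat\\b", "a cat sat"));     // true: "cat" stands alone
        System.out.println(found("\\bcat\\b", "concatenate"));   // false: no word boundary around "cat"
        System.out.println(fullMatch("a.c", "a\nc"));            // false: '.' skips line terminators
    }
}
```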


Examples:-

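Valid Regular Expressions, written as Java string literals, are:

     ^\\d+$                              -   Numerics,
     ^\\w+$                              -   AlphaNumerics (underscore included),
     ^[a-z0-9_-]{3,16}$                  -   lowercase text with numerics, underscore or hyphen, and length between 3 and 16,
     ^\\d{5}$                            -   5 digit Numerics,
     ^(\\d{1,2})-(\\d{1,2})-(\\d{4})$    -   Date format dd-MM-yyyy (day and month may have one or two digits; only the format is checked, not whether the date exists).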
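
The parentheses in the date pattern are capturing groups, so the day, month and year can be read back after a match. The following is only a sketch: the class DatePatternDemo and its parse method are hypothetical, and the pattern checks the shape of the text, not whether the date is real.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DatePatternDemo {
    private static final Pattern DATE =
            Pattern.compile("^(\\d{1,2})-(\\d{1,2})-(\\d{4})$");

    // Returns {day, month, year}, or null if the text does not fit the pattern.
    static int[] parse(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.matches()) {
            return null;
        }
        // group(1), group(2) and group(3) are the three parenthesised parts.
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("18-12-2014"))); // [18, 12, 2014]
        System.out.println(Arrays.toString(parse("2014-12-18"))); // null
        // The shape is valid even though day 99 and month 99 do not exist.
        System.out.println(Arrays.toString(parse("99-99-2014"))); // [99, 99, 2014]
    }
}
```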


Regular Expression Example:-

           Regular expressions make it possible to find all instances of text that match a certain pattern, or to get a boolean result that says whether the pattern was found. This can be used to validate input such as phone numbers, social security numbers, email addresses, web form input data, decimals, Numerics and AlphaNumerics, and to scrub data. For example, if a String matches the Numerics pattern from start to end, then the String is a Numerics.

     import java.util.ArrayList;
     import java.util.List;

     public class ValidateDemo {
         public static void main(String[] args) {
             List<String> input = new ArrayList<String>();
             input.add("123");
             input.add("98HT12");
             input.add("345");
             for (String numeric : input) {
                 // String.matches tests the whole string, so ^ and $ are optional here.
                 if (numeric.matches("^\\d+$")) {
                     System.out.println("Numerics : " + numeric);
                 }
             }
         }
     }

Output:-
      Numerics : 123
      Numerics : 345
      

Syntax Error Validation in Java Using Pattern:-

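     import java.util.regex.Pattern;
     import java.util.regex.PatternSyntaxException;

     public class RegularExpValidation {
         public static void main(String[] args) {
             // The quantifier {1, is never closed, so this pattern is invalid.
             String valid = regExpValidation("^\\w{1,$");
             if (valid != null) {
                 System.out.println(valid);
             }
         }

         // Returns null if the pattern compiles, otherwise the syntax error description.
         public static String regExpValidation(String regPattern) {
             String errorMessage = null;
             try {
                 Pattern.compile(regPattern);
             } catch (PatternSyntaxException exception) {
                 errorMessage = exception.getDescription();
             }
             return errorMessage;
         }
     }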

Output:-
     Unclosed counted closure

Backslashes in Java with Regular Expressions:-

           In literal Java strings the backslash is an escape character. The literal string "\\" is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression \\ matches a single backslash. Written as a Java string, this regular expression becomes "\\\\". That's right: 4 backslashes to match a single one.

The regex \w matches a word character. As a Java string, this is written as "\\w".
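
The two layers of escaping can be checked with a short sketch. The class BackslashDemo and its constants are hypothetical names for this example; Pattern.quote is a JDK method that builds a literal pattern, which is one way to avoid counting backslashes by hand.

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // A Java string that holds one backslash character.
    static final String ONE_BACKSLASH = "\\";
    // A Java string that holds the two-character regex \\ .
    static final String BACKSLASH_REGEX = "\\\\";

    // True if the input is exactly one backslash.
    static boolean isSingleBackslash(String input) {
        return input.matches(BACKSLASH_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(ONE_BACKSLASH.length());                        // 1
        System.out.println(BACKSLASH_REGEX.length());                      // 2
        System.out.println(isSingleBackslash(ONE_BACKSLASH));              // true
        // Replace each backslash in C:\temp with a forward slash.
        System.out.println("C:\\temp".replaceAll("\\\\", "/"));            // C:/temp
        // Pattern.quote escapes the backslash for us.
        System.out.println(ONE_BACKSLASH.matches(Pattern.quote("\\")));    // true
    }
}
```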