Regular expressions in Oracle SQL & PL/SQL

Since Oracle10g new functions have been added to Oracle to allow the use of regular expressions in SQL and PL/SQL. These functions conform to the POSIX standard.
In Oracle10g the functions REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR and the condition REGEXP_LIKE have been added.
Then in Oracle11g the function REGEXP_COUNT has been added too. Let’s see how to use them.

REGEXP_INSTR works like the INSTR: it’s useful to find, into a given string, a substring that matches a given pattern. If the function doesn’t find any substring matching the pattern returns 0. The signature of the function is

REGEXP_INSTR (source_string, pattern, position, occurrence, return_option, match_parameter)

Where source_string is the string in which you’re searching;
pattern is the regular expression;
position is the position starting from you want to search;
occurrence is the requested occurrence of the searching string;
return_option can evaluate to 0 if you want the position of the first character of the substring found, 1 if you want the position of the first character after it;
match_parameter is a flag that can evaluate to:

  • i (case-insensitive search),
  • c (case-sensitive search),
  • n (the wildcard ‘.’ finds the carriage return character too),
  • m (the string is multi-line)
  • Only the first two are mandatory.

    Let’s see an example: let’s find into the string ‘This is a test string to show how regexp work in Oracle db’ the second two-characters-long word:

    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select str, REGEXP_INSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )', 1, 2)+1 pos
    FROM t;
    
    STR                                                               POS
    ---------------------------------------------------------- ----------
    This is a test string to show how regexp work in Oracle db         23
    
    

    Let’s see the REGEXP_REPLACE function, using it you can replace a string (found using a regular expression) with another one:

    REGEXP_REPLACE(source_string, pattern, replace_string, position, occurrence, match_parameter)
    

    source_string is the string in which to find and replace;
    pattern is the regular expression;
    replace_string is the replacement string;
    position is the position starting from you want to search;
    occurrence is the requested occurrence of the searching string;
    match_parameter works just like in the REGEXP_INSTR function.

    So, if we want to replace, in the string used in the previous example, the second two-characters-long word with a X we can write:

    
    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select REGEXP_REPLACE(str,'(^|\ )[[:alpha:]]{2}($|\ )', ' X ',1, 2) newstr
    FROM t;
    
    NEWSTR
    ---------------------------------------------------------
    This is a test string X show how regexp work in Oracle db
    

    The third function to show is REGEXP_SUBSTR that allows us to extract a substring:

    REGEXP_SUBSTR (source_string , pattern, position, occurrence, match_parameter)
    

    source_string is the string in which to find;
    pattern is the regular expression;
    position is the position starting from you want to search;
    occurrence is the requested occurrence of the searching string;
    match_parameter works just like in the REGEXP_INSTR function.

    So, if we want to extract the second two-characters-long word we do:

    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select str, trim(REGEXP_SUBSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )',1, 2)) substr
    FROM t;
    
    STR                                                        SUBSTR
    ---------------------------------------------------------- ------
    This is a test string to show how regexp work in Oracle db to
    
    

    REGEXP_COUNT allows us to count the occurrences in the source string that match the pattern:

    REGEXP_COUNT (source_char , pattern, position, match_param) 
    

    source_char is the string in which to find;
    pattern is the regular expression;
    position is the position starting from you want to search;
    return_option works just like in the REGEXP_INSTR function.

    So, if we want to count the two-characters-long words, we can do:

    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select str, REGEXP_COUNT(str,'(^|\ )[[:alpha:]]{2}($|\ )',1) count
    FROM t;
    
    STR                                                             COUNT
    ---------------------------------------------------------- ----------
    This is a test string to show how regexp work in Oracle db          4
    
    

    We can combine REGEXP_COUNT and REGEXP_SUBSTR (and a trick of the connect by clause) to extract in a single statement all the two-characters-long words:

    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select trim(REGEXP_SUBSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )',1, level)) substr
    FROM t connect by level<=REGEXP_COUNT(str,'(^|\ )[[:alpha:]]{2}($|\ )',1);
    
    SUBSTR
    ------
    is
    to
    in
    db
    
    

    Once seen all the functions let’s see REGEXP_LIKE. It’s a condition, that is to say it returns a boolean value and can be used in the WHERE or in the HAVING clause of a query statement:

    REGEXP_LIKE (source_string, pattern, match_parameter)
    

    source_string is the string in which to find;
    pattern is the regular expression;
    match_parameter works just like in the REGEXP_INSTR function.

    REGEXP_LIKE(str,ptrn,mp) is equivalent to REGEXP_INSTR(str,ptrn,1,1,mp)>0

    Let’s extract all the string that include at least one two-characters-long word:

    WITH T AS (
    SELECT 'String without 2chars words' str from dual union
    SELECT 'first string with a word made of two chars' from dual union
    SELECT 'second string with a word made of two chars' from dual union
    SELECT 'Another string without 2chars words' from dual
    )
    Select str
    from t
    where REGEXP_LIKE(str,'(^|\ )[[:alpha:]]{2}($|\ )');
    
    STR
    -------------------------------------------
    first string with a word made of two chars
    second string with a word made of two chars
    
    

    For the equivalence stated above, we could write:

    WITH T AS (
    SELECT 'String without 2chars words' str from dual union
    SELECT 'first string with a word made of two chars' from dual union
    SELECT 'second string with a word made of two chars' from dual union
    SELECT 'Another string without 2chars words' from dual
    )
    Select str
    from t
    where REGEXP_INSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )')>0;
    
    

    That’s all about regular expressions in Oracle SQL & PL/SQL!

    Massimo

    Advertisements

    Tags: , , , ,

    2 Responses to “Regular expressions in Oracle SQL & PL/SQL”

    1. SutoCom Says:

      Reblogged this on Sutoprise Avenue, A SutoCom Source.

    2. Hierarchical queries in Oracle – The CONNECT BY clause « Welcome To Oracle database by Massimo Ruocchio Says:

      […] we need to make complex computations. I’ve used this ability, for example, in the post about regular expressions when I showed how REGEXP_COUNT […]

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out /  Change )

    Google+ photo

    You are commenting using your Google+ account. Log Out /  Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out /  Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out /  Change )

    Connecting to %s


    %d bloggers like this: