Sending emails from Oracle using UTL_SMTP.

10/23/2012

In this post I want to show a simple way to send emails from an oracle database using the PL/SQL package UTL_SMTP.
Let’s start with a simple example:

CREATE OR REPLACE PROCEDURE SENDMAIL (dest in varchar2,msg in varchar2) is
    mailhost    VARCHAR2(10) := 'localhost';
    sender      VARCHAR2(20) := 'massimo@massimo.it';
    mail_conn   utl_smtp.connection;
    r           utl_smtp.replies;
BEGIN
    mail_conn := utl_smtp.open_connection(mailhost, 25);
    r:=utl_smtp.ehlo(mail_conn, mailhost);
    for i in r.first..r.last loop
       dbms_output.put_line('helo code='||r(i).code||' text='||r(i).text);
    end loop;
    utl_smtp.mail(mail_conn, sender);
    utl_smtp.rcpt(mail_conn, dest);
    utl_smtp.open_data(mail_conn);
    utl_smtp.write_data(mail_conn, 'From: '||sender||chr(13)|| CHR(10));
    utl_smtp.write_data(mail_conn, 'Subject: '||'messaggio di test'||chr(13)|| CHR(10));
    utl_smtp.write_data(mail_conn, 'To: '||dest||chr(13)|| CHR(10));
    utl_smtp.write_data(mail_conn, chr(13)|| CHR(10));
    utl_smtp.write_data(mail_conn, msg||chr(13)|| CHR(10));
    utl_smtp.close_data(mail_conn);
    utl_smtp.quit(mail_conn);
END;
/ 

The procedure gets as inputs the recipient’s address and the message to send.
Let’s analyze it step by step:

First of all, an smtp server to use is declared.
The sender’s address is hard-coded.
The first call opens the connection to the smtp server. To do this I’ve used the open_connection function that returns an handler to the connection. This handler will be always used for the whole program.
The next call is performed to initialize the connection starting the handshake phase.
There are two possible ehlo commands. The one I used in the most complete as it returns a PL/SQL table containing all the server’s abilities.
After receiving them I print them on the screen using DBMS_OUTPUT. The other option was the following procedure:

utl_smtp.helo(mail_conn, mailhost);

That’s identical to the previous one but has no output.

Once met the server we use the MAIL procedure to start the email sending process.
RCPT procedure selects the recipient’s mailbox. This procedure can be used more than once if we have more than one recipient.
The OPEN_DATA procedure opens the email’s body writing session.
After OPEN_DATA we can call as many WRITE_DATA as we need closed from a single call to COLSE_DATA.
In our example we use WRITE_DATA to write the following fields of the email: From, Subject, To.
The empty row closed the email’s header and starts the body.
In the body we could, once again, call WRITE_DATA as many times as we need. We get the text as an input parameter.
The send procedure is completed by the call to CLOSE_DATA, then we leave the server using QUIT.

Let’s run the procedure:

SQL> exec SENDMAIL('massimo.ruocchio@mymail.it','Dear friend, I''m writing...')
BEGIN SENDMAIL('massimo.ruocchio@mymail.it','Dear friend, I''m writing...'); END;

*
ERROR at line 1:
ORA-24247: network access denied by access control list (ACL)
ORA-06512: at "SYS.UTL_TCP", line 17
ORA-06512: at "SYS.UTL_TCP", line 267
ORA-06512: at "SYS.UTL_SMTP", line 161
ORA-06512: at "SYS.UTL_SMTP", line 197
ORA-06512: at "SCOTT.SENDMAIL", line 7
ORA-06512: at line 1

This error will be thrown only in Oracle11g, due to security configuration. Access Control Lists are lists that check whether a PL/SQL program can access critical external resources or not.
We must grant to the program the right to use resource.
Let’s create a specific ACL for the SMTP resource and let’s associate it to our server:


BEGIN
 DBMS_NETWORK_ACL_ADMIN.CREATE_ACL (
  acl          => 'SMTP_ACL.xml', 
  description  => 'ACL for SMTP',
  principal    => 'MAXR',
  is_grant     => TRUE, 
  privilege    => 'connect',
  start_date   => null,
  end_date     => null); 
END;
/

BEGIN
 DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL (
  acl         => 'SMTP_ACL.xml',
  host        => 'localhost', 
  lower_port  => 25,
  upper_port  => 25); 
END;
/

With the first statement I created the ACL that is stored in an xml document(SMTP_ACL.xml) then I associated it to the schema MAXR granting to him the right to connect to the resource.
With the second statement I assign the ACL to the resource specifying the range of ports.
Once made this security configuration I try again:

SQL> exec  SENDMAIL('massimo.ruocchio@mymail.it','Dear friend, I''m writing...')
helo code=250 text=Hello localhost
helo code=250 text=AUTH LOGIN
helo code=250 text=ENHANCEDSTATUSCODES
helo code=250 text=PIPELINING

The email has successful reached the destination.

SSL connection are supported since Oracle11gR2 using STARTTLS:

(for Oracle’s docs)
STARTTLS Function and Procedure
This subprogram sends the STARTTLS command to secure the SMTP connection using SSL/TLS. SSL/TLS requires an Oracle wallet which must be specified when the connection was opened by the OPEN_CONNECTION Functions.

Syntax

UTL_SMTP.STARTTLS (c IN OUT NOCOPY connection) RETURN reply;

UTL_SMTP.STARTTLS (c IN OUT NOCOPY connection);

Parameters
c SMTP connection

Return Values
reply SMTP reply

Usage Notes

The STARTTLS command must only be issued on an unencrypted connection and when the SMTP server indicates the support of the command in the reply of the EHLO command. The wallet to be used for encryption must have been specified when the initial SMTP connection was opened by the OPEN_CONNECTION function.

Examples

DECLARE
c utl_smtp.connection;
BEGIN
c := utl_smtp.open_connection(
host => ‘smtp.example.com’,
port => 25,
wallet_path => ‘file:/oracle/wallets/smtp_wallet’,
wallet_password => ‘password’,
secure_connection_before_smtp => FALSE);
utl_smtp.starttls(c);
END

Massimo

Advertisements

Transforming rows to columns in Oracle SQL – The PIVOT clause

10/18/2012

One of the most frequent problems in SQL is the need to represent as columns data contained in different rows of a database tables. This problem is usually resolved using the UNION operators, in Oracle11g we have a native tool that can help: PIVOT clause.

An example can explain better than 1000 words: the EMP table (old good SCOTT/TIGER schema) contains a record for each employee. Among others there are the following columns: ENAME (employee’s name), DEPTNO (department’s code) e SAL (employee’s salary). We need to extract a report with four columns: in the first one we want the employee’s name, in the second one the sum of salaries of employees that work in the department #10, in the third one the sum of salaries of employees that work in the department #20, in the forth one the sum of salaries of employees that work in the department #30. First of all let’s see how to resolve it with an UNION:

select ename, sal d10, null d20, null d30
  from emp where deptno=10
  union
  select ename, null d10, sal d20, null d30
  from emp where deptno=20
  union
  select ename, null d10, null d20, sal d30
  from emp where deptno=30
  ;

ENAME             D10        D20        D30
---------- ---------- ---------- ----------
ADAMS                       1100
ALLEN                                  1600
BLAKE                                  2850
CLARK            2450
FORD                        3000
JAMES                                   950
JONES                       2975
KING             5000
MARTIN                                 1250
MILLER           1300
SCOTT                       3000
SMITH                        800
TURNER                                 1500
WARD                                   1250

In Oracle11g we can use the PIVOT clause. This operator let us to define new columns specifying: the values of the filter column (DEPTNO in our example), the column to use for the calculation (SALARY) and the group function to apply (SUM):

select ename, d10,d20,d30 from emp
pivot (sum(sal) for deptno in 
         (10 as D10, 20 as d20, 30 as d30));

ENAME             D10        D20        D30
---------- ---------- ---------- ----------
MARTIN                                 1250
BLAKE                                  2850
MILLER           1300
WARD                                   1250
JONES                       2975
TURNER                                 1500
JAMES                                   950
KING             5000
ADAMS                       1100
FORD                        3000
CLARK            2450
SCOTT                       3000
SMITH                        800
ALLEN                                  1600

Once you get this first result we can write more complex statements to get more: the sum of salaries by department (with results in different columns):

select sum(d10),sum(d20),sum(d30) from emp
pivot (sum(sal) for deptno 
        in (10 as D10, 20 as d20, 30 as d30));


  SUM(D10)   SUM(D20)   SUM(D30)
---------- ---------- ----------
      8750      10875       9400

The sum of salaries by department (on columns) and job (on rows):

select job, sum(d10),sum(d20),sum(d30) from emp
pivot (sum(sal) for deptno 
         in (10 as D10, 20 as d20, 30 as d30))
group by job;


JOB         SUM(D10)   SUM(D20)   SUM(D30)
--------- ---------- ---------- ----------
SALESMAN                              5600
CLERK           1300       1900        950
PRESIDENT       5000
MANAGER         2450       2975       2850
ANALYST                    6000

Here’s the link to documentation for further details…

Massimo

Hierarchical queries in Oracle – The CONNECT BY clause

10/18/2012

When I met Oracle7, back in 1996, I was struck by a feature, hierarchical queries, that I found challenging but not really useful.
Except a couple of didactic examples I didn’t found for it any other practical application.
I was wrong, even more often I make use of the CONNECT BY clause to resolve problems that, at first glance, are not hierarchical at all, for example to anagram a word or to aggregate strings.
Let’s go step by step, I will start with the simplest examples and will show the most complex ones at the end of this post.

As usual we’ll work on the EMP table that is already furnished with a hierarchical self relation. On that table, indeed, there’s a column, MGR, that stores, for each employee, his boss’ EMPNO.
Data contained into the table is as follows:

SQL> select empno, ename, mgr
  2  from emp;

     EMPNO ENAME             MGR
---------- ---------- ----------
      7369 SMITH            7902
      7499 ALLEN            7698
      7521 WARD             7698
      7566 JONES            7839
      7654 MARTIN           7698
      7698 BLAKE            7839
      7782 CLARK            7839
      7788 SCOTT            7566
      7839 KING
      7844 TURNER           7698
      7876 ADAMS            7788
      7900 JAMES            7698
      7902 FORD             7566
      7934 MILLER           7782

KING is the root of the hierarchy since hi has no boss.

The first example of hierarchical query is the following:

SQL> select empno, ename, mgr, prior ename, level
  2  from emp
  3  connect by prior empno = mgr
  4  start with mgr is null;

     EMPNO ENAME             MGR PRIORENAME      LEVEL
---------- ---------- ---------- ---------- ----------
      7839 KING                                      1
      7566 JONES            7839 KING                2
      7788 SCOTT            7566 JONES               3
      7876 ADAMS            7788 SCOTT               4
      7902 FORD             7566 JONES               3
      7369 SMITH            7902 FORD                4
      7698 BLAKE            7839 KING                2
      7499 ALLEN            7698 BLAKE               3
      7521 WARD             7698 BLAKE               3
      7654 MARTIN           7698 BLAKE               3
      7844 TURNER           7698 BLAKE               3
      7900 JAMES            7698 BLAKE               3
      7782 CLARK            7839 KING                2
      7934 MILLER           7782 CLARK               3

Let’s analyze it: the CONNECT BY clause, mandatory to make a hierarchical query, is used to define how each record is connected to the hierarchical superior.
The father of the record having MGR=x has EMPNO=x.
On the other hand, given a record with EMPNO=x, all the records having MGR=x are his sons.
The unary operator PRIOR indicates “the father of”.
START WITH clause is used to from which records we want to start the hierarchy, in our example we want to start from the root of the hierarchy, the employee that has no manager.
The root of the hierarchy could be not unique. In this example it is.
The LEVEL pseudocolumn indicates at which level each record stays in the hierarchy, starting from the root that has level=1.

Once understood the quey, we can read our resulting tree: KING is the root and has level=1.
Under KING there are three employees at level 2(JONES,BLAKE e CLARK). Then the others.
How does Oracle make the hierarchy? First of all it reads the records.
Then it determines the roots applying the START WITH clause.
Then, starting from each root, it determines the first-level sons applying the CONNECT BY clause and so on…

To enhance the layout of the output we can use the following trick:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                                           1
      7566       JONES              7839 KING                2
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3

We just added some white-spaces on the left of the name depending on the level. Now it’s all more clear…

Records order defines the hierarchy, adding an ORDER BY clause we could completely lose the hierarchy.
We can decide how to sort siblings, that is to say records on the same level. For example JONES, BLAKE and CLARK:

SQL> select empno,lpad(' ',level*3,' ')||ename nome,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null
  6  order siblings by ename;

     EMPNO NOME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                                           1
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3
      7566       JONES              7839 KING                2
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4

With no impact on the hierarchy we’ve sorted siblings by name.

If you work on Oracle9i you can stop reading, what follows is available on Oracle10g and 11g…

In any hierarchy infinite loops can raise. For example what happens if KING has a manager himself?

SQL> update emp set mgr=7369 where ename='KING';

KING is one of the SMITH’s sons, SMITH is himself a KING’s great grandson…

What if we run the same query now?

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null
  6  order siblings by ename;

No rows selected

Of course, because there’s no record that has MGR null… So let’s change the START WITH as follows:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with empno=7839;
ERROR:
ORA-01436: CONNECT BY loop in user data

Here’s the loop, Oracle can’t create the hierarchy.
Oracle has managed it introducing the NOCYCLE clause.

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by nocycle prior empno = mgr
  5    start with empno=7839;

     EMPNO NAME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                  7369                     1
      7566       JONES              7839 KING                2
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3

That clause tells to Oracle to stop when a loop is raised, Oracle goes on on the other branchs of the tree.
After SMITH another instance of KING (and all his descendant) should be extracted, and so on, generating an infinite output. The NOCYCLE clause prevents that.

CONNECT_BY_ISCYCLE pseudocolumn tells us in which records a loop has been detected:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, connect_by_iscycle IFLOOP
  3    from emp
  4  connect by nocycle prior empno = mgr
  5    start with empno=7839;

     EMPNO NAME                      MGR PRIORENAME     IFLOOP
---------- ------------------ ---------- ---------- ----------
      7839    KING                  7369                     0
      7566       JONES              7839 KING                0
      7788          SCOTT           7566 JONES               0
      7876             ADAMS        7788 SCOTT               0
      7902          FORD            7566 JONES               0
      7369             SMITH        7902 FORD                1
      7698       BLAKE              7839 KING                0
      7499          ALLEN           7698 BLAKE               0
      7521          WARD            7698 BLAKE               0
      7654          MARTIN          7698 BLAKE               0
      7844          TURNER          7698 BLAKE               0
      7900          JAMES           7698 BLAKE               0
      7782       CLARK              7839 KING                0
      7934          MILLER          7782 CLARK               0

In addition to PRIOR another unary operator exists that’s really useful: CONNECT_BY_ROOT.
It allows us to display the root of a given record:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         connect_by_root ename boss
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME               BOSS
---------- ------------------ ----------
      7839    KING            KING
      7566       JONES        KING
      7788          SCOTT     KING
      7876             ADAMS  KING
      7902          FORD      KING
      7369             SMITH  KING
      7698       BLAKE        KING
      7499          ALLEN     KING
      7521          WARD      KING
      7654          MARTIN    KING
      7844          TURNER    KING
      7900          JAMES     KING
      7782       CLARK        KING
      7934          MILLER    KING

In our example there’s a single root, KING, but if we modify another record:

SQL> update emp set mgr=null where ename='BLAKE';

We get:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         connect_by_root ename boss
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME               BOSS
---------- ------------------ ----------
      7698    BLAKE           BLAKE
      7499       ALLEN        BLAKE
      7521       WARD         BLAKE
      7654       MARTIN       BLAKE
      7844       TURNER       BLAKE
      7900       JAMES        BLAKE
      7839    KING            KING
      7566       JONES        KING
      7788          SCOTT     KING
      7876             ADAMS  KING
      7902          FORD      KING
      7369             SMITH  KING
      7782       CLARK        KING
      7934          MILLER    KING

Where, for each record, we display who is the “big chief”…

Another really useful function is SYS_CONNECT_BY_PATH, it gets as input a field and a character and builds the full path from the root to the current record using the character as a separator:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         sys_connect_by_path(ename,'/') bosses
  3    from emp
  4  connect by prior empno = mgr
  5*   start with mgr is null

     EMPNO NAME               BOSSES
---------- ------------------ --------------------------------
      7839    KING            /KING
      7566       JONES        /KING/JONES
      7788          SCOTT     /KING/JONES/SCOTT
      7876             ADAMS  /KING/JONES/SCOTT/ADAMS
      7902          FORD      /KING/JONES/FORD
      7369             SMITH  /KING/JONES/FORD/SMITH
      7698       BLAKE        /KING/BLAKE
      7499          ALLEN     /KING/BLAKE/ALLEN
      7521          WARD      /KING/BLAKE/WARD
      7654          MARTIN    /KING/BLAKE/MARTIN
      7844          TURNER    /KING/BLAKE/TURNER
      7900          JAMES     /KING/BLAKE/JAMES
      7782       CLARK        /KING/CLARK
      7934          MILLER    /KING/CLARK/MILLER

There’s another useful pseudocolumn to show, CONNECT_BY_ISLEAF.

It tells us whether a record is a leaf of the tree or not:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2        connect_by_isleaf ifleaf
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME                   IFLEAF
---------- ------------------ ----------
      7839    KING                     0
      7566       JONES                 0
      7788          SCOTT              0
      7876             ADAMS           1
      7902          FORD               0
      7369             SMITH           1
      7698       BLAKE                 0
      7499          ALLEN              1
      7521          WARD               1
      7654          MARTIN             1
      7844          TURNER             1
      7900          JAMES              1
      7782       CLARK                 0
      7934          MILLER             1
      

Really interesting is the ability of CONNECT BY to generate more records from a table that contains only one row:

SQL> select level from dual
  2  connect by level<=10;

     LEVEL
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

We can extract from DUAL, that contains just one row, all the records we need to make complex computations.
I’ve used this ability, for example, in the post about regular expressions when I showed how REGEXP_COUNT works.

That’s, more or less, everything about theory, let’s see two practical examples.

The first one has been already showed in another post: About string concatenation across different records.


 SELECT deptno, ltrim(SYS_CONNECT_BY_PATH(ename, ','),',') enames
  FROM (select deptno, ename, rank() 
        over(partition by deptno order by rownum) num from emp)
where connect_by_isleaf=1
START WITH num=1
CONNECT BY PRIOR num+1 = num and prior deptno=deptno;

    DEPTNO ENAMES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

Using SYS_CONNECT_BY_PATH we was able to put in a single string, separated by comma, the names of the employees that work in the same department.
And this has absolutely nothing to do with the MGR column…

The other example is the solution that I proposed to solve a problem posted on the official Oracle SQL & PL/SQL forum. The question was how to determine whether a word is the anagram of another one or not.
Here’s the query I wrote to extract all the anagrams of a given word:

with t as 
(select 'bolt' name from dual)
select distinct replace(sys_connect_by_path(c,' '),' ',null) str 
  from (select substr(name,level,1) c, name, 
               level*1000+ascii(substr(name,level,1)) cod 
          from t
        connect by level<=length(name))
 where level=length(name)
connect by nocycle cod != prior cod

STR
-------------------------
oltb
otbl
lbot
tlbo
blto
lotb
ltob
otlb
lbto
tblo
bolt
btol
oblt
lobt
tobl
blot
obtl
tbol
tolb
olbt
ltbo
tlob
botl
btlo

24 rows selected.

The word to anagram was “bolt”.

First of all the query extracts all the letters from the word marking each with an unique code:

SQL> with t as
  2  (select 'bolt' name from dual)
  3  select substr(name,level,1) c, name,
  4                 level*1000+ascii(substr(name,level,1)) cod
  5            from t
  6          connect by level<=length(name);

C NAME        COD
- ---- ----------
b bolt       1098
o bolt       2111
l bolt       3108
t bolt       4116

Then I’ve used sys_connect_by_path to build all the possible combinations of such letters.
Interesting, isn’t it?

Massimo

PRAGMA directives in Oracle PL/SQL

10/16/2012

In this post we’re going to analyze all the five PRAGMA directives of Oracle’s PL/SQL. Four of them exist since Oracle8i while the last one has been introduced with Oracle11g.

Let’s begin with PRAGMA EXCEPTION_INIT.
This directive allows us to associate an ORA error code to an user-defined PL/SQL exception.
Once the association as been done we’ll be able to manage the exception in our code as it was a predefined exception (just like NO_DATA_FOUND or TOO_MANY_ROWS).
Let’s see an example.

We need a function that converts a string to a date using the ‘YYYY-MM-DD’ format:

SQL> create or replace function string2date (str in varchar2) return date is
  2  retDate date;
  3  begin
  4    retDate := to_date(str,'yyyy-mm-dd');
  5    return retDate;
  6  end;
  7  /


SQL> select string2date('2010-01-31')
  2  from dual;

STRING2DA
---------
31-JAN-10

SQL> select string2date('werrwer')
  2  from dual;
select string2date('werrwer')
       *
ERROR at line 1:
ORA-01841: (full) year must be between -4713 and +9999, and not be 0
ORA-06512: at "MAXR.STRING2DATE", line 4

As the example shows, if the input string does not conform to the format we get the ORA-1841 error.
We want to manage this error using the PRAGMA EXCEPTION_INIT directive:


SQL> create or replace function string2date (str in varchar2) return date is
  2  retDate date;
  3  not_valid_date exception;
  4  PRAGMA EXCEPTION_INIT(not_valid_date,-1841);
  5  begin
  6    retDate := to_date(str,'yyyy-mm-dd');
  7    return retDate;
  8  exception
  9    when not_valid_date then
 10     dbms_output.put_line('Error: the string '||str||' cannot be converted to a date!');
 11     return null;
 12  end;
 13  /

SQL> set serverout on
SQL> select string2date('werrwer')
  2  from dual;

STRING2DA
---------


Error: the string werrwer cannot be converted to a date!

We’re defining a new exception not_valid_date, but it will be never called if we don’t associate it to the ORA-1841 error using the PRAGMA.
Once we have made the association Oracle knows that, in case of the ORA-1841 error, the not_valid_date exception must be raised.

PRAGMA RESTRICT_REFERENCES allows us to explicitly declare that a PL/SQL program doesn’t read/write in db objects or in package variables.

In some situations, only functions that guarantee those restrictions can be used.
The following is a simple example:
Let’s define a package made of a single function that updates a db table and returns a number:

SQL> create or replace package pack is
  2  function a return number;
  3  end;
  4  /

SQL> create or replace package body pack is
  2  function a return number is
  3  begin
  4    update emp set empno=0 where 1=2;
  5    return 2;
  6  end;
  7  end;
  8  /

If we try to use the function pack.a in a query statement we’ll get an error:


SQL> select pack.a from dual;
select pack.a from dual
       *
ERROR at line 1:
ORA-14551: cannot perform a DML operation inside a query
ORA-06512: a "MAXR.PACK", line 4

PL/SQL functions can be used inside a query statement only if they don’t modify neither the db nor packages’ variables.

This error can be descovered only at runtime, when the select statement is executed.
How can we check for this errors at compile time? We can use PRAGMA RESTRICT_REFERENCES!
If we know that the function will be used in SQL we can define it as follows:

SQL> create or replace package pack is
  2  function a return number;
  3  pragma restrict_references(a,'WNDS');
  4  end;
  5  /

Declaring that the function A will not modify the database state (WNDS stands for WRITE NO DATABASE STATE).
Once we have made this declaration, if a programmer, not knowing that the function has to be used in a query statement, tries to write code for A that violates the PRAGMA:

SQL> create or replace package body pack is
  2  function a return number is
  3  begin
  4    update emp set empno=0 where 1=2;
  5    return 2;
  6  end;
  7  end;
  8  /

Warning: Package Body created with compilation errors.

SVIL>sho err
Errors for PACKAGE BODY PACK:

LINE/COL ERROR
-------- -----------------------------------------------------------------
2/1      PLS-00452: Subprogram 'A' violates its associated pragma

He(She)’ll get an error at compile time…

Pragma RESTRICT_REFERENCE is deprecated and could be removed from future versions of Oracle.

PRAGMA SERIALLY_REUSABLE tells to the compiler that the package’s variables are needed for a single use. After this single use Oracle can free the associated memory. It’s really useful to save memory when a packages uses large temporary space just once in the session.
Let’s see an example.

Let’s define a package with a single numeric variable “var” not initialized:

SQL> create or replace package pack is
  2  var number;
  3  end;
  4  /

If we assign a value to var, this will preserve that value for the whole session:

SQL> begin
  2  pack.var := 1;
  3  end;
  4  /

SQL> exec dbms_output.put_line('Var='||pack.var);
Var=1

If we use the PRAGMA SERIALLY_REUSABLE, var will preserve the value just inside the program that initializes it, but is null in the following calls:

SQL> create or replace package pack is
  2  PRAGMA SERIALLY_REUSABLE;
  3  var number;
  4  end;
  5  /

SQL> begin
  2  pack.var := 1;
  3  dbms_output.put_line('Var='||pack.var);
  4  end;
  5  /
Var=1

SQL> exec dbms_output.put_line('Var='||pack.var);
Var=

PRAGMA SERIALLY_REUSABLE is a way to change the default behavior of package variables that is as useful as heavy for memory.

PRAGMA AUTONOMOUS_TRANSACTION declare to the compiler that a given program has to run into a dedicated transaction, ignoring all uncommitted data changes made into the original transaction of the calling program.

The sum of salaries in EMP is:

SQL> select sum(sal) from emp;

  SUM(SAL)
----------
     29025

Let’s define two functions that do exactly the same thing, read and return the sum of salaries of EMP:

SQL> create or replace function getsal return number is
  2  s number;
  3  begin
  4    select sum(sal) into s from emp;
  5    return s;
  6  end;
  7  /

SQL> create or replace function getsal_AT return number is
  2  PRAGMA AUTONOMOUS_TRANSACTION;
  3  s number;
  4  begin
  5    select sum(sal) into s from emp;
  6    return s;
  7  end;
  8  /

SQL> select sum(sal), getsal, getsal_AT
  2  from emp;

  SUM(SAL)     GETSAL  GETSAL_AT
---------- ---------- ----------
     29025      29025      29025

The second one uses the PRAGMA AUTONOMOUS_TRANSACTION. Now let’s cut all the salaries:


SQL>  update emp set sal=10;

SQL> select sum(sal), getsal, getsal_AT
  2  from emp;

  SUM(SAL)     GETSAL  GETSAL_AT
---------- ---------- ----------
       140        140      29025

GETSAL is seeing uncommitted changed data while GETSAL_AT, defined using PRAGMA AUTONOMOUS_TRANSACTION, reads data as they where before the UPDATE statement…

The only PRAGMA recently added (in Oracle11g) is PRAGMA INLINE.

In Oracle11g has been added a new feature that optimizer can use to get better performances, it’s called Subprogram Inlining.
Optimizer can (autonomously or on demand) choose to replace a subprogram call with a local copy of the subprogram.

For example, assume the following code:

declare
total number;
begin
 total := calculate_nominal + calculate_interests;
end;

Where calculate_nominal and calculate_interests are two functions defined as follows:

function calculate_nominal return number is 
s number;
begin
  select sum(nominal)
    into s
    from deals;
    
  return s;
end;

function calculate_interests return number is 
s number;
begin
  select sum(interest)
    into s
    from deals;
    
  return s;
end;

Optimizer can change the code to something like this:

declare
total number;
v_calculate_nominal number;
v_calculate_interests number;
begin
  select sum(nominal)
    into v_calculate_nominal
    from deals;

  select sum(interest)
    into v_calculate_interests
    from deals;

 total := v_calculate_nominal + v_calculate_interests;
end;

Including a copy of the subprograms into the calling program.

PRAGMA INLINE is the tool that we own to drive this new feature.
If we don’t want such an optimization we can do:

declare
total number;
begin
 PRAGMA INLINE(calculate_nominal,'NO');
 PRAGMA INLINE(calculate_interests,'NO');
 total := calculate_nominal + calculate_interests;
end;

If we do want subprogram inlining on calculate_nominal we do:

declare
total number;
begin
 PRAGMA INLINE(calculate_nominal,'YES');
 total := calculate_nominal + calculate_interests;
end;

Subprogram inlining behave differently depending on the level of optimization defined through the db initialization variable PLSQL_OPTIMIZE_LEVEL.
If this variable is set to 2 (that’s the default value) optimizer never uses subprogram inlining unless the programmer requests it using PRAGMA INLINE YES.
If PLSQL_OPTIMIZE_LEVEL=3 optimizer can autonomously decide whether to use subprogram inlining or not. In this case PRAGMA INLINE YES does not force the optimizer, it’s just an hint.

That’s all about pragmas by now.

Massimo

Collecting statistics using DBMS_STATS

10/15/2012

When Oracle introduced cost based optimizer, the only statement to gather statistics was

SQL> Analyze table EMP compute statistics;

Starting with Oracle8 the DBMS_STATS package as been introduced to offer many useful functions to gather and manage statistics, in this article we’ll take a quick look.

Functions contained in DBMS_STATS allow us to

Gather Statistics

Following procedures allows us to gather stats:

GATHER_DATABASE_STATS
GATHER_DICTIONARY_STATS
GATHER_FIXED_OBJECTS_STATS
GATHER_INDEX_STATS
GATHER_SCHEMA_STATS
GATHER_SYSTEM_STATS
GATHER_TABLE_STATS

These procedures allow us to gather stats for the whole db, at schema level or for single objects.
The GATHER_SYSTEM_STATS is the only one that doesn’t gather object statistics but hardware (CPU and disks) ones .
Statistics can be written in the data dictionary (ed used by the optimizer), or in a database table (created using the CREATE_STAT_TABLE procedure).

Manage Statistics

The following procedures:

PREPARE_COLUMN_VALUES
PREPARE_COLUMN_VALUES_NVARCHAR2
PREPARE_COLUMN_VALUES_ROWID
SET_COLUMN_STATS
SET_INDEX_STATS
SET_SYSTEM_STATS
SET_TABLE_STATS
GET_COLUMN_STATS
GET_INDEX_STATS
GET_SYSTEM_STATS
GET_TABLE_STATS

Allow us to manage stats setting or getting specific values.
PREPARE* procedures are useful to convert values to oracle internal formats to set them using SET_COLUMN_VALUE.

Delete Statistics

Some procedures allow us to delete stats:

DELETE_COLUMN_STATS
DELETE_DATABASE_STATS
DELETE_DICTIONARY_STATS
DELETE_FIXED_OBJECTS_STATS
DELETE_INDEX_STATS
DELETE_SCHEMA_STATS
DELETE_SYSTEM_STATS
DELETE_TABLE_STATS

Export/Import Statistics

As already seen stats can be written either in the data dictionary or into a specific database table.
Stats collected into the table can’t be used by the optimizer.
To create a database table suited to contain stats we have to use the following procedure

CREATE_STAT_TABLE

That can be dropped using

DROP_STAT_TABLE

There are some procedures useful to export stats from the data dictionary to the table:

EXPORT_COLUMN_STATS
EXPORT_DATABASE_STATS
EXPORT_DICTIONARY_STATS
EXPORT_FIXED_OBJECTS_STATS
EXPORT_INDEX_STATS
EXPORT_SCHEMA_STATS
EXPORT_SYSTEM_STATS
EXPORT_TABLE_STATS

Or from the table to the data dictionary:

IMPORT_COLUMN_STATS
IMPORT_DATABASE_STATS
IMPORT_DICTIONARY_STATS
IMPORT_FIXED_OBJECTS_STATS
IMPORT_INDEX_STATS
IMPORT_SCHEMA_STATS
IMPORT_SYSTEM_STATS
IMPORT_TABLE_STATS

Lock and unlock statistics

If we need to freeze an object’s stats we can use
LOCK_SCHEMA_STATS
LOCK_TABLE_STATS

The unlock can be achieved using
UNLOCK_SCHEMA_STATS
UNLOCK_TABLE_STATS

This feature is really useful when we don’t want Oracle to gather statistics for a specific table.
We can achieve this deleting the stats and then calling LOCK_TABLE_STATS:


SQL> BEGIN
  2  DBMS_STATS.DELETE_TABLE_STATS('MAXR','EMP');
  3  DBMS_STATS.LOCK_TABLE_STATS('MAXR','EMP');
  4  END;
  5  /

After this the stats are and will remain empty.

Purge and Restore statistics

The PURGE_STATS procedure purges all the stats deleted before a given timestamp.

If, on the other hand, we need to restore statistics as they were at a given timestamp we can use:

RESTORE_DICTIONARY_STATS
RESTORE_FIXED_OBJECTS_STATS
RESTORE_SCHEMA_STATS
RESTORE_SYSTEM_STATS
RESTORE_TABLE_STATS

Pending Statistics

Sometimes we need to have a preview of the effect of gathered statistics before activating them.
This is why we can use “Pending Statistics”.
To mark gathered statistics as “pending”, hence temporarily inactive, we hate to set to FALSE the PUBLISH parameter using one of the following:

SET_DATABASE_PREFS
SET_GLOBAL_PREFS
SET_SCHEMA_PREFS
SET_TABLE_PREFS

For example, to mark as pending the stats on the EMP table:


SQL> begin
  2  DBMS_STATS.SET_TABLE_PREFS ('MAXR', 'EMP', 'PUBLISH', 'FALSE');
  3  end;
  4  /

Once gathered the stats and evaluated their impact on execution plans we can decide to delete them:
DELETE_PENDING_STATS

Export them in a table:
EXPORT_PENDING_STATS

Mark them as published (used by optimizer):
PUBLISH_PENDING_STATS

Comparing Statistics

We can obtain a comparative report between two version of the stats using
DIFF_TABLE_STATS_IN_HISTORY
DIFF_TABLE_STATS_IN_PENDING
DIFF_TABLE_STATS_IN_STATTAB

These are pipelined functions that can be used in the FROM clause of a query statement.
Let’s see an example.

First of all let’s create a test table:


SQL> create table testtab as select * from emp;

It’s like the EMP table, even as data content.
Let’s gather statistics:


SQL> exec DBMS_STATS.GATHER_TABLE_STATS('MAXR','TESTTAB')

SQL> select systimestamp  from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
28-FEB-10 21:44:41,859000 +01:00

Now let’s insert more records:


SQL> insert into testtab
  2  select * from testtab;

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> r
  1  insert into testtab
  2* select * from testtab

SQL> commit;

And gather stats again:


SQL> exec DBMS_STATS.GATHER_TABLE_STATS('MAXR','TESTTAB');

Now we are ready to compare:


SQL> set long 5000
SQL> select * from
  2* table(DBMS_STATS.DIFF_TABLE_STATS_IN_HISTORY(
           'MAXR',
           'TESTTAB',
           to_timestamp('20100228214442','yyyymmddhh24miss'),
           systimestamp,
           0)
           );

REPORT                                                                           MAXDIFFPCT
-------------------------------------------------------------------------------- ----------
###############################################################################      819100

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : TESTTAB
OWNER         : MAXR
SOURCE A      : Statistics as of 28-FEB-10 21:44:42,000000 +01:00
SOURCE B      : Statistics as of 28-FEB-10 21:50:16,359000 +01:00
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

REPORT                                                                           MAXDIFFPCT
-------------------------------------------------------------------------------- ----------

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

TESTTAB                     T   A   14         4          37         14
                                B   114688     751        37         114688
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


REPORT                                                                           MAXDIFFPCT
-------------------------------------------------------------------------------- ----------
COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

COMM            A   4       ,25        NO   10      2    80    C20F  4
                B   4       ,25        NO   81920   2    80    C20F  32768
MGR             A   6       ,166666666 NO   1       4    C24C4 C2500 13
                B   6       ,166666666 NO   8192    4    C24C4 C2500 106496
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

REPORT                                                                           MAXDIFFPCT
-------------------------------------------------------------------------------- ----------


NO DIFFERENCE IN INDEX / (SUB)PARTITION STATISTICS
###############################################################################

Extended statistics

DBMS_STATS allows us also to gather statistics on groups of columns or expressions. These are known as “extended statistics”.
Statistics on groups of columns are used by optimizer when, in a query statement, there are conditions on more than one column. A relation between columns (evaluated reading extended stats) can cause the optimizer to choose a different execution plan.

To create extended statistics we can use
CREATE_EXTENDED_STATS

to drop them:
DROP_EXTENDED_STATS

To see extensions’ names:
SHOW_EXTENDED_STATS_NAME

For example, to create an extended statistic on the correlation between EMPNO and ENAME:


SQL> exec dbms_output.put_line(DBMS_STATS.CREATE_EXTENDED_STATS('MAXR','TESTTAB','(EMPNO,ENAME)'))
SYS_STU7CJ_LJ#28ARDED17#AUWYJZ

I’ve printed the return value that is the name of the extension created.

If I lose this name I can always get it using SHOW_EXTENDED_STATS_NAME:

SQL> select dbms_stats.SHOW_EXTENDED_STATS_NAME('MAXR','TESTTAB','(ENAME, EMPNO)')
  2  from dual;

DBMS_STATS.SHOW_EXTENDED_STATS_NAME('MAXR','TESTTAB','(ENAME,EMPNO)')
---------------------------------------------------------------------------------------
SYS_STU7CJ_LJ#28ARDED17#AUWYJZ

SQL> select dbms_stats.SHOW_EXTENDED_STATS_NAME('MAXR','TESTTAB','(DEPTNO, ENAME)')
  2  from dual;
select dbms_stats.SHOW_EXTENDED_STATS_NAME('MAXR','TESTTAB','(DEPTNO, ENAME)')
       *
ERROR at line 1:
ORA-20000: extension "(DEPTNO, ENAME)" does not exist
ORA-06512: a "SYS.DBMS_STATS", line 25416

That, as you can see in the last example, gives error if we try to get the name of an undefined extension.

Another kind of extension that we can create is on an expression, for example:

SQL>  exec dbms_output.put_line(DBMS_STATS.CREATE_EXTENDED_STATS('MAXR','TESTTAB','(TO_CHAR(EMPNO))'))
SYS_STUM9HJB83_TSA6P91FJMZ84U9

That’s all, by now, about the DBMS_STATS package.

Massimo

About string concatenation across different records

10/15/2012

We all know how the GROUP BY clause works and how to use it. Group function (SUM, AVG, MIN, MAX, COUNT, …) are often used and well known too.
There’s a specific “grouping problem” that we often have to manage and that doesn’t find an answer in those functions: Using the EMP table (ENAME is the employee name, DEPTNO is the department number) we need to extract a record for each DEPTNO associated to the list of the names of the employees that work in that department.
Here are all the employees:

SQL> select deptno, ename
  2  from emp
  3* order by deptno

    DEPTNO ENAME
---------- ----------
        10 CLARK
        10 KING
        10 MILLER
        20 JONES
        20 FORD
        20 ADAMS
        20 SMITH
        20 SCOTT
        30 WARD
        30 TURNER
        30 ALLEN
        30 JAMES
        30 BLAKE
        30 MARTIN

And we want to get something similar to the following:

    DEPTNO ENAMES
---------- ----------
        10 CLARK,KING,MILLER
        20 JONES,FORD,ADAMS,SMITH,SCOTT
        30 WARD,TURNER,ALLEN,JAMES,BLAKE,MARTIN

The most natural approach is probably a PL/SQL function:

create or replace function getNames(p_deptno in number) return varchar2 is
retval varchar2(32000);
begin
  for r in (select ename from emp where deptno = p_deptno) loop
    retval := retval||r.ename||',';
  end loop;
  return rtrim(retval,',');
end;
/

SQL> select deptno,getNames(deptno) enames
  2  from emp
  3* group by deptno

    DEPTNO ENAMES
---------- -------------------------------------------
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        10 CLARK,KING,MILLER

The second approach I want to show is based on an undocumented group function: WM_CONCAT.
This function is defined in the WMSYS schema. That schema contains system object used by the “Workspace Manager” feature (more info here.
It’s an undocumented function, used by Oracle for internal purposes only. Hence you shouldn’t use it in production environments.
But it exists and works perfectly:

SQL> select deptno, wm_concat(ename) enames
  2  from emp
  3  group by deptno;

    DEPTNO ENAMES
---------- -------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,FORD,ADAMS,SCOTT,JONES
        30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD

The third approach I’m going to show uses the SYS_CONNECT_BY_PATH function.
This function can be used in hierarchical queries only (CONNECT BY) takes as input a column and a character and gives as output the full path of each element starting from the root of the hierarchy, using the given character as a separator.
An example will clarify:

 SELECT deptno, ltrim(SYS_CONNECT_BY_PATH(ename, ','),',') enames
  FROM (select deptno, ename, rank() over(partition by deptno order by rownum) num from emp)
where connect_by_isleaf=1
START WITH num=1
CONNECT BY PRIOR num+1 = num and prior deptno=deptno;

    DEPTNO ENAMES
---------- ---------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

Not clear yet? Ok, let’s see step by step:
First of all I started from a query that shows the list of the employees sorted using a number (num) incremented at each row and reset when the department changes.

SQL> select deptno, ename, rank() over(partition by deptno order by rownum) num
  2         from emp
  3  ;

    DEPTNO ENAME             NUM 
---------- ---------- ---------- 
        10 CLARK               1 
        10 KING                2 
        10 MILLER              3 
        20 SMITH               1 
        20 JONES               2 
        20 SCOTT               3 
        20 ADAMS               4 
        20 FORD                5 
        30 ALLEN               1 
        30 WARD                2 
        30 MARTIN              3 
        30 BLAKE               4 
        30 TURNER              5 
        30 JAMES               6 

To get this result I used analytic functions, available in Oracle since Oracle9i.

Second step: I introduced the hierarchical query: who has no experience with the CONNECT BY clause can see the docs.

I connect each ename to the other having the same deptno and the next num using the following clause

CONNECT BY PRIOR num+1 = num and prior deptno=deptno

I drop incomplete paths using the following

WHERE connect_by_isleaf=1

that allows to select only the leafs of the tree.

Finally I get only the paths that start from the first ename


START WITH num=1

Try to use this approach yourself, you will understand its power playing with it.

Forth approach: an XML query.

SQL> select deptno, rtrim(
  2                   xmlagg(
  3                     xmlelement("x",ename||',')
  4                          ).extract('//text()'),',') enames
  5    from emp
  6  group by deptno;

    DEPTNO ENAMES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,FORD,ADAMS,SCOTT,JONES
        30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD

For each department I get, using the XMLAGG function, an XML document that aggregate all the employees that work in that department. Then I extract only the element’s content using the EXTRACT function.

Fifth approach: the LISTAGG function.
It’s a new function available in Oracle11g release 2.
Here is documentation.


SQL >SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) enames
  2    FROM emp
  3  group by deptno;

    DEPTNO ENAMES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

Works like a charm!!

Massimo

Regular expressions in Oracle SQL & PL/SQL

10/15/2012

Since Oracle10g new functions have been added to Oracle to allow the use of regular expressions in SQL and PL/SQL. These functions conform to the POSIX standard.
In Oracle10g the functions REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR and the condition REGEXP_LIKE have been added.
Then in Oracle11g the function REGEXP_COUNT has been added too. Let’s see how to use them.

REGEXP_INSTR works like the INSTR: it’s useful to find, into a given string, a substring that matches a given pattern. If the function doesn’t find any substring matching the pattern returns 0. The signature of the function is

REGEXP_INSTR (source_string, pattern, position, occurrence, return_option, match_parameter)

Where source_string is the string in which you’re searching;
pattern is the regular expression;
position is the position starting from you want to search;
occurrence is the requested occurrence of the searching string;
return_option can evaluate to 0 if you want the position of the first character of the substring found, 1 if you want the position of the first character after it;
match_parameter is a flag that can evaluate to:

  • i (case-insensitive search),
  • c (case-sensitive search),
  • n (the wildcard ‘.’ finds the carriage return character too),
  • m (the string is multi-line)
  • Only the first two are mandatory.

    Let’s see an example: let’s find into the string ‘This is a test string to show how regexp work in Oracle db’ the second two-characters-long word:

    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select str, REGEXP_INSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )', 1, 2)+1 pos
    FROM t;
    
    STR                                                               POS
    ---------------------------------------------------------- ----------
    This is a test string to show how regexp work in Oracle db         23
    
    

    Let’s see the REGEXP_REPLACE function, using it you can replace a string (found using a regular expression) with another one:

    REGEXP_REPLACE(source_string, pattern, replace_string, position, occurrence, match_parameter)
    

    source_string is the string in which to find and replace;
    pattern is the regular expression;
    replace_string is the replacement string;
    position is the position starting from you want to search;
    occurrence is the requested occurrence of the searching string;
    match_parameter works just like in the REGEXP_INSTR function.

    So, if we want to replace, in the string used in the previous example, the second two-characters-long word with a X we can write:

    
    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select REGEXP_REPLACE(str,'(^|\ )[[:alpha:]]{2}($|\ )', ' X ',1, 2) newstr
    FROM t;
    
    NEWSTR
    ---------------------------------------------------------
    This is a test string X show how regexp work in Oracle db
    

    The third function to show is REGEXP_SUBSTR that allows us to extract a substring:

    REGEXP_SUBSTR (source_string , pattern, position, occurrence, match_parameter)
    

    source_string is the string in which to find;
    pattern is the regular expression;
    position is the position starting from you want to search;
    occurrence is the requested occurrence of the searching string;
    match_parameter works just like in the REGEXP_INSTR function.

    So, if we want to extract the second two-characters-long word we do:

    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select str, trim(REGEXP_SUBSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )',1, 2)) substr
    FROM t;
    
    STR                                                        SUBSTR
    ---------------------------------------------------------- ------
    This is a test string to show how regexp work in Oracle db to
    
    

    REGEXP_COUNT allows us to count the occurrences in the source string that match the pattern:

    REGEXP_COUNT (source_char , pattern, position, match_param) 
    

    source_char is the string in which to find;
    pattern is the regular expression;
    position is the position starting from you want to search;
    return_option works just like in the REGEXP_INSTR function.

    So, if we want to count the two-characters-long words, we can do:

    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select str, REGEXP_COUNT(str,'(^|\ )[[:alpha:]]{2}($|\ )',1) count
    FROM t;
    
    STR                                                             COUNT
    ---------------------------------------------------------- ----------
    This is a test string to show how regexp work in Oracle db          4
    
    

    We can combine REGEXP_COUNT and REGEXP_SUBSTR (and a trick of the connect by clause) to extract in a single statement all the two-characters-long words:

    WITH T AS (
    SELECT 'This is a test string to show how regexp work in Oracle db' str from dual)
    Select trim(REGEXP_SUBSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )',1, level)) substr
    FROM t connect by level<=REGEXP_COUNT(str,'(^|\ )[[:alpha:]]{2}($|\ )',1);
    
    SUBSTR
    ------
    is
    to
    in
    db
    
    

    Once seen all the functions let’s see REGEXP_LIKE. It’s a condition, that is to say it returns a boolean value and can be used in the WHERE or in the HAVING clause of a query statement:

    REGEXP_LIKE (source_string, pattern, match_parameter)
    

    source_string is the string in which to find;
    pattern is the regular expression;
    match_parameter works just like in the REGEXP_INSTR function.

    REGEXP_LIKE(str,ptrn,mp) is equivalent to REGEXP_INSTR(str,ptrn,1,1,mp)>0

    Let’s extract all the string that include at least one two-characters-long word:

    WITH T AS (
    SELECT 'String without 2chars words' str from dual union
    SELECT 'first string with a word made of two chars' from dual union
    SELECT 'second string with a word made of two chars' from dual union
    SELECT 'Another string without 2chars words' from dual
    )
    Select str
    from t
    where REGEXP_LIKE(str,'(^|\ )[[:alpha:]]{2}($|\ )');
    
    STR
    -------------------------------------------
    first string with a word made of two chars
    second string with a word made of two chars
    
    

    For the equivalence stated above, we could write:

    WITH T AS (
    SELECT 'String without 2chars words' str from dual union
    SELECT 'first string with a word made of two chars' from dual union
    SELECT 'second string with a word made of two chars' from dual union
    SELECT 'Another string without 2chars words' from dual
    )
    Select str
    from t
    where REGEXP_INSTR(str,'(^|\ )[[:alpha:]]{2}($|\ )')>0;
    
    

    That’s all about regular expressions in Oracle SQL & PL/SQL!

    Massimo

    Update a database table using MERGE.

    10/08/2012

    Sometimes we need to update periodically a table reading data from a source (another table, a text file …).
    Often happens that the source is “complete”, that is to say that it contains both records already loaded in the target table and new records. The already loaded records could need an update.
    The simplest approach is the following: periodically empty the target table and refill it reading all the records from the source.
    This is not always the best approach. What if we want to insert only new records and update those that are already loaded? We could use the MERGE statement.

    Let’s see an example. We have to update the employees list (MYEMP table) reading the EMP table.

    
    SQL> select * from emp;
    
         EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
    ---------- ---------- --------- ---------- --------- ---------- ---------- ----------
          7369 SMITH      CLERK           7902 17-DIC-80        800                    20
          7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
          7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
          7566 JONES      MANAGER         7839 02-APR-81       2975                    20
          7654 MARTIN     SALESMAN        7698 28-SET-81       1250       1400         30
          7698 BLAKE      MANAGER         7839 01-MAG-81       2850                    30
          7782 CLARK      MANAGER         7839 09-GIU-81       2450                    10
          7788 SCOTT      ANALYST         7566 09-DIC-82       3000                    20
          7839 KING       PRESIDENT            17-NOV-81       5000                    10
          7844 TURNER     SALESMAN        7698 08-SET-81       1500          0         30
          7876 ADAMS      CLERK           7788 12-GEN-83       1100                    20
          7900 JAMES      CLERK           7698 03-DIC-81        950                    30
          7902 FORD       ANALYST         7566 03-DIC-81       3000                    20
          7934 MILLER     CLERK           7782 23-GEN-82       1300                    10
    
    SQL> select * from myemp;
    
         EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
    ---------- ---------- --------- ---------- --------- ---------- ---------- ----------
          7566 JONES      MANAGER         7839 02-APR-81       1000                    20
          7654 MARTIN     SALESMAN        7698 28-SET-81       1000       1400         30
          7698 BLAKE      MANAGER         7839 01-MAG-81       1000                    30
          7782 CLARK      MANAGER         7839 09-GIU-81       1000                    10
          7788 SCOTT      ANALYST         7566 09-DIC-82       3000                    20
          7839 KING       PRESIDENT            17-NOV-81       5000                    10
          7844 TURNER     SALESMAN        7698 08-SET-81       1500          0         30
          7876 ADAMS      CLERK           7788 12-GEN-83       1100                    20
          7900 JAMES      CLERK           7698 03-DIC-81        950                    30
          7902 FORD       ANALYST         7566 03-DIC-81       3000                    20
          7934 MILLER     CLERK           7782 23-GEN-82       1300                    10
    
    

    As you can see, the MYEMP table misses a few records and some salaries are different from those of the EMP table.
    Let’s load data using MERGE:

    
    SQL> MERGE into MYEMP
      2  USING EMP
      3  ON (emp.empno=myemp.empno)
      4  WHEN MATCHED THEN
      5  UPDATE SET
      6    ename    = emp.ename,
      7    job      = emp.job,
      8    mgr      = emp.mgr,
      9    hiredate = emp.hiredate,
     10    sal      = emp.sal,
     11    comm     = emp.comm,
     12    deptno   = emp.deptno
     13  WHEN NOT MATCHED THEN
     14  INSERT (empno, ename, job, mgr, hiredate, sal, comm, deptno)
     15  VALUES (emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate,
     16          emp.sal, emp.comm, emp.deptno)
     17  ;
    
    SQL> select * from myemp;
    
         EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
    ---------- ---------- --------- ---------- --------- ---------- ---------- ----------
          7566 JONES      MANAGER         7839 02-APR-81       2975                    20
          7654 MARTIN     SALESMAN        7698 28-SET-81       1250       1400         30
          7698 BLAKE      MANAGER         7839 01-MAG-81       2850                    30
          7782 CLARK      MANAGER         7839 09-GIU-81       2450                    10
          7788 SCOTT      ANALYST         7566 09-DIC-82       3000                    20
          7839 KING       PRESIDENT            17-NOV-81       5000                    10
          7844 TURNER     SALESMAN        7698 08-SET-81       1500          0         30
          7876 ADAMS      CLERK           7788 12-GEN-83       1100                    20
          7900 JAMES      CLERK           7698 03-DIC-81        950                    30
          7902 FORD       ANALYST         7566 03-DIC-81       3000                    20
          7934 MILLER     CLERK           7782 23-GEN-82       1300                    10
          7369 SMITH      CLERK           7902 17-DIC-80        800                    20
          7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
          7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
    
    

    Loading succeeded and now the two tables have the same content.
    Let’s review the statement line by line:

  • MERGE into MYEMP
  • Indicates the table to update

  • USING EMP
  • Indicates the source table, what if we have to read a file? We always can see the file as a table using an external table.

  • ON (emp.empno=myemp.empno)
  • The condition to check, if this is true an UPDATE will be performed, otherwise an INSERT.

  • WHEN MATCHED THEN
  • When the above condition is true…

  • UPDATE SET etc…
  • …execute this UPDATE.

  • WHEN NOT MATCHED THEN
  • Else…

  • INSERT (empno, ename, job, mgr, hiredate, sal, comm, deptno) etc..
  • …execute this INSERT.

    There’s some clause not used in the example.

    You can add to the UPDATE clause a WHERE clause to update only some specific records. If, for example, we need to update the record only if the new salary is greater then the old one, we can do:

    
    MERGE into MYEMP                                              
    USING EMP                                                     
    ON (emp.empno=myemp.empno)                                    
    WHEN MATCHED THEN                                             
    UPDATE SET                                                    
      ename    = emp.ename,                                       
      job      = emp.job,                                         
      mgr      = emp.mgr,                                         
      hiredate = emp.hiredate,                                    
      sal      = emp.sal,                                         
      comm     = emp.comm,                                        
      deptno   = emp.deptno                                       
    WHERE sal < emp.sal
    WHEN NOT MATCHED THEN                                         
    INSERT (empno, ename, job, mgr, hiredate, sal, comm, deptno)  
    VALUES (emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, 
            emp.sal, emp.comm, emp.deptno)                        
    ;                                                             
                                                                  
    

    Furthermore you can specify a DELETE clause in the UPDATE.
    It’s useful to delete from the target table those records that, after the UPDATE, match the condition.
    For example we need to delete from MYEMP the employees that, after the update, have a salary smaller than 1000:

    
    SQL> MERGE into MYEMP
      2  USING EMP
      3  ON (emp.empno=myemp.empno)
      4  WHEN MATCHED THEN
      5  UPDATE SET
      6    ename    = emp.ename,
      7    job      = emp.job,
      8    mgr      = emp.mgr,
      9    hiredate = emp.hiredate,
     10    sal      = emp.sal,
     11    comm     = emp.comm,
     12    deptno   = emp.deptno
     13  DELETE WHERE sal < 1000
     14  WHEN NOT MATCHED THEN
     15  INSERT (empno, ename, job, mgr, hiredate, sal, comm, deptno)
     16  VALUES (emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate,
     17          emp.sal, emp.comm, emp.deptno)
     18  ;
    
    SQL>  select * from myemp;
    
         EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
    ---------- ---------- --------- ---------- --------- ---------- ---------- ----------
          7566 JONES      MANAGER         7839 02-APR-81       2975                    20
          7654 MARTIN     SALESMAN        7698 28-SET-81       1250       1400         30
          7698 BLAKE      MANAGER         7839 01-MAG-81       2850                    30
          7782 CLARK      MANAGER         7839 09-GIU-81       2450                    10
          7788 SCOTT      ANALYST         7566 09-DIC-82       3000                    20
          7839 KING       PRESIDENT            17-NOV-81       5000                    10
          7844 TURNER     SALESMAN        7698 08-SET-81       1500          0         30
          7876 ADAMS      CLERK           7788 12-GEN-83       1100                    20
          7902 FORD       ANALYST         7566 03-DIC-81       3000                    20
          7934 MILLER     CLERK           7782 23-GEN-82       1300                    10
          7369 SMITH      CLERK           7902 17-DIC-80        800                    20
          7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
          7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
    
    

    Oracle first of all updated 14 rows, then deleted the row of the employee JAMES (gone from a salary of 1000 to a salary of 950).
    Note that the SMITH’s row has not been deleted. It has a salary of 800 but it was not present in MYEMP before the MERGE so it came in whit the INSERT, not with the UPDATE.
    The DELETE clause applies only to the update records.

    The INSER clause too can be enriched with a WHERE clause to insert only the records that match a given condition.

    For example, to insert in MYEMP only those employees having a salary greater than 1000:

    
    SQL> MERGE into MYEMP
      2  USING EMP
      3  ON (emp.empno=myemp.empno)
      4  WHEN MATCHED THEN
      5  UPDATE SET
      6    ename    = emp.ename,
      7    job      = emp.job,
      8    mgr      = emp.mgr,
      9    hiredate = emp.hiredate,
     10    sal      = emp.sal,
     11    comm     = emp.comm,
     12    deptno   = emp.deptno
     13  WHEN NOT MATCHED THEN
     14  INSERT (empno, ename, job, mgr, hiredate, sal, comm, deptno)
     15  VALUES (emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate,
     16          emp.sal, emp.comm, emp.deptno)
     17  WHERE emp.sal>1000
     18  ;
    
    SQL>  select * from myemp;
    
         EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
    ---------- ---------- --------- ---------- --------- ---------- ---------- ----------
          7566 JONES      MANAGER         7839 02-APR-81       2975                    20
          7654 MARTIN     SALESMAN        7698 28-SET-81       1250       1400         30
          7698 BLAKE      MANAGER         7839 01-MAG-81       2850                    30
          7782 CLARK      MANAGER         7839 09-GIU-81       2450                    10
          7788 SCOTT      ANALYST         7566 09-DIC-82       3000                    20
          7839 KING       PRESIDENT            17-NOV-81       5000                    10
          7844 TURNER     SALESMAN        7698 08-SET-81       1500          0         30
          7876 ADAMS      CLERK           7788 12-GEN-83       1100                    20
          7900 JAMES      CLERK           7698 03-DIC-81        950                    30
          7902 FORD       ANALYST         7566 03-DIC-81       3000                    20
          7934 MILLER     CLERK           7782 23-GEN-82       1300                    10
          7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
          7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
    
    

    This time SMITH has been excluded because he has a salary of 800. Please note that in the WHERE clause of the INSERT you cannot reference columns of the updating table, just columns of the source one.

    To finish, you can handle errors using the LOG ERRORS clause.

    Massimo.