Hierarchical queries in Oracle – The CONNECT BY clause

When I met Oracle7, back in 1996, I was struck by a feature, hierarchical queries, that I found challenging but not really useful.
Except a couple of didactic examples I didn’t found for it any other practical application.
I was wrong, even more often I make use of the CONNECT BY clause to resolve problems that, at first glance, are not hierarchical at all, for example to anagram a word or to aggregate strings.
Let’s go step by step, I will start with the simplest examples and will show the most complex ones at the end of this post.

As usual we’ll work on the EMP table that is already furnished with a hierarchical self relation. On that table, indeed, there’s a column, MGR, that stores, for each employee, his boss’ EMPNO.
Data contained into the table is as follows:

SQL> select empno, ename, mgr
  2  from emp;

     EMPNO ENAME             MGR
---------- ---------- ----------
      7369 SMITH            7902
      7499 ALLEN            7698
      7521 WARD             7698
      7566 JONES            7839
      7654 MARTIN           7698
      7698 BLAKE            7839
      7782 CLARK            7839
      7788 SCOTT            7566
      7839 KING
      7844 TURNER           7698
      7876 ADAMS            7788
      7900 JAMES            7698
      7902 FORD             7566
      7934 MILLER           7782

KING is the root of the hierarchy since hi has no boss.

The first example of hierarchical query is the following:

SQL> select empno, ename, mgr, prior ename, level
  2  from emp
  3  connect by prior empno = mgr
  4  start with mgr is null;

     EMPNO ENAME             MGR PRIORENAME      LEVEL
---------- ---------- ---------- ---------- ----------
      7839 KING                                      1
      7566 JONES            7839 KING                2
      7788 SCOTT            7566 JONES               3
      7876 ADAMS            7788 SCOTT               4
      7902 FORD             7566 JONES               3
      7369 SMITH            7902 FORD                4
      7698 BLAKE            7839 KING                2
      7499 ALLEN            7698 BLAKE               3
      7521 WARD             7698 BLAKE               3
      7654 MARTIN           7698 BLAKE               3
      7844 TURNER           7698 BLAKE               3
      7900 JAMES            7698 BLAKE               3
      7782 CLARK            7839 KING                2
      7934 MILLER           7782 CLARK               3

Let’s analyze it: the CONNECT BY clause, mandatory to make a hierarchical query, is used to define how each record is connected to the hierarchical superior.
The father of the record having MGR=x has EMPNO=x.
On the other hand, given a record with EMPNO=x, all the records having MGR=x are his sons.
The unary operator PRIOR indicates “the father of”.
START WITH clause is used to from which records we want to start the hierarchy, in our example we want to start from the root of the hierarchy, the employee that has no manager.
The root of the hierarchy could be not unique. In this example it is.
The LEVEL pseudocolumn indicates at which level each record stays in the hierarchy, starting from the root that has level=1.

Once understood the quey, we can read our resulting tree: KING is the root and has level=1.
Under KING there are three employees at level 2(JONES,BLAKE e CLARK). Then the others.
How does Oracle make the hierarchy? First of all it reads the records.
Then it determines the roots applying the START WITH clause.
Then, starting from each root, it determines the first-level sons applying the CONNECT BY clause and so on…

To enhance the layout of the output we can use the following trick:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                                           1
      7566       JONES              7839 KING                2
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3

We just added some white-spaces on the left of the name depending on the level. Now it’s all more clear…

Records order defines the hierarchy, adding an ORDER BY clause we could completely lose the hierarchy.
We can decide how to sort siblings, that is to say records on the same level. For example JONES, BLAKE and CLARK:

SQL> select empno,lpad(' ',level*3,' ')||ename nome,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null
  6  order siblings by ename;

     EMPNO NOME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                                           1
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3
      7566       JONES              7839 KING                2
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4

With no impact on the hierarchy we’ve sorted siblings by name.

If you work on Oracle9i you can stop reading, what follows is available on Oracle10g and 11g…

In any hierarchy infinite loops can raise. For example what happens if KING has a manager himself?

SQL> update emp set mgr=7369 where ename='KING';

KING is one of the SMITH’s sons, SMITH is himself a KING’s great grandson…

What if we run the same query now?

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null
  6  order siblings by ename;

No rows selected

Of course, because there’s no record that has MGR null… So let’s change the START WITH as follows:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with empno=7839;
ERROR:
ORA-01436: CONNECT BY loop in user data

Here’s the loop, Oracle can’t create the hierarchy.
Oracle has managed it introducing the NOCYCLE clause.

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by nocycle prior empno = mgr
  5    start with empno=7839;

     EMPNO NAME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                  7369                     1
      7566       JONES              7839 KING                2
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3

That clause tells to Oracle to stop when a loop is raised, Oracle goes on on the other branchs of the tree.
After SMITH another instance of KING (and all his descendant) should be extracted, and so on, generating an infinite output. The NOCYCLE clause prevents that.

CONNECT_BY_ISCYCLE pseudocolumn tells us in which records a loop has been detected:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, connect_by_iscycle IFLOOP
  3    from emp
  4  connect by nocycle prior empno = mgr
  5    start with empno=7839;

     EMPNO NAME                      MGR PRIORENAME     IFLOOP
---------- ------------------ ---------- ---------- ----------
      7839    KING                  7369                     0
      7566       JONES              7839 KING                0
      7788          SCOTT           7566 JONES               0
      7876             ADAMS        7788 SCOTT               0
      7902          FORD            7566 JONES               0
      7369             SMITH        7902 FORD                1
      7698       BLAKE              7839 KING                0
      7499          ALLEN           7698 BLAKE               0
      7521          WARD            7698 BLAKE               0
      7654          MARTIN          7698 BLAKE               0
      7844          TURNER          7698 BLAKE               0
      7900          JAMES           7698 BLAKE               0
      7782       CLARK              7839 KING                0
      7934          MILLER          7782 CLARK               0

In addition to PRIOR another unary operator exists that’s really useful: CONNECT_BY_ROOT.
It allows us to display the root of a given record:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         connect_by_root ename boss
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME               BOSS
---------- ------------------ ----------
      7839    KING            KING
      7566       JONES        KING
      7788          SCOTT     KING
      7876             ADAMS  KING
      7902          FORD      KING
      7369             SMITH  KING
      7698       BLAKE        KING
      7499          ALLEN     KING
      7521          WARD      KING
      7654          MARTIN    KING
      7844          TURNER    KING
      7900          JAMES     KING
      7782       CLARK        KING
      7934          MILLER    KING

In our example there’s a single root, KING, but if we modify another record:

SQL> update emp set mgr=null where ename='BLAKE';

We get:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         connect_by_root ename boss
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME               BOSS
---------- ------------------ ----------
      7698    BLAKE           BLAKE
      7499       ALLEN        BLAKE
      7521       WARD         BLAKE
      7654       MARTIN       BLAKE
      7844       TURNER       BLAKE
      7900       JAMES        BLAKE
      7839    KING            KING
      7566       JONES        KING
      7788          SCOTT     KING
      7876             ADAMS  KING
      7902          FORD      KING
      7369             SMITH  KING
      7782       CLARK        KING
      7934          MILLER    KING

Where, for each record, we display who is the “big chief”…

Another really useful function is SYS_CONNECT_BY_PATH, it gets as input a field and a character and builds the full path from the root to the current record using the character as a separator:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         sys_connect_by_path(ename,'/') bosses
  3    from emp
  4  connect by prior empno = mgr
  5*   start with mgr is null

     EMPNO NAME               BOSSES
---------- ------------------ --------------------------------
      7839    KING            /KING
      7566       JONES        /KING/JONES
      7788          SCOTT     /KING/JONES/SCOTT
      7876             ADAMS  /KING/JONES/SCOTT/ADAMS
      7902          FORD      /KING/JONES/FORD
      7369             SMITH  /KING/JONES/FORD/SMITH
      7698       BLAKE        /KING/BLAKE
      7499          ALLEN     /KING/BLAKE/ALLEN
      7521          WARD      /KING/BLAKE/WARD
      7654          MARTIN    /KING/BLAKE/MARTIN
      7844          TURNER    /KING/BLAKE/TURNER
      7900          JAMES     /KING/BLAKE/JAMES
      7782       CLARK        /KING/CLARK
      7934          MILLER    /KING/CLARK/MILLER

There’s another useful pseudocolumn to show, CONNECT_BY_ISLEAF.

It tells us whether a record is a leaf of the tree or not:

SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2        connect_by_isleaf ifleaf
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME                   IFLEAF
---------- ------------------ ----------
      7839    KING                     0
      7566       JONES                 0
      7788          SCOTT              0
      7876             ADAMS           1
      7902          FORD               0
      7369             SMITH           1
      7698       BLAKE                 0
      7499          ALLEN              1
      7521          WARD               1
      7654          MARTIN             1
      7844          TURNER             1
      7900          JAMES              1
      7782       CLARK                 0
      7934          MILLER             1
      

Really interesting is the ability of CONNECT BY to generate more records from a table that contains only one row:

SQL> select level from dual
  2  connect by level<=10;

     LEVEL
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

We can extract from DUAL, that contains just one row, all the records we need to make complex computations.
I’ve used this ability, for example, in the post about regular expressions when I showed how REGEXP_COUNT works.

That’s, more or less, everything about theory, let’s see two practical examples.

The first one has been already showed in another post: About string concatenation across different records.


 SELECT deptno, ltrim(SYS_CONNECT_BY_PATH(ename, ','),',') enames
  FROM (select deptno, ename, rank() 
        over(partition by deptno order by rownum) num from emp)
where connect_by_isleaf=1
START WITH num=1
CONNECT BY PRIOR num+1 = num and prior deptno=deptno;

    DEPTNO ENAMES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

Using SYS_CONNECT_BY_PATH we was able to put in a single string, separated by comma, the names of the employees that work in the same department.
And this has absolutely nothing to do with the MGR column…

The other example is the solution that I proposed to solve a problem posted on the official Oracle SQL & PL/SQL forum. The question was how to determine whether a word is the anagram of another one or not.
Here’s the query I wrote to extract all the anagrams of a given word:

with t as 
(select 'bolt' name from dual)
select distinct replace(sys_connect_by_path(c,' '),' ',null) str 
  from (select substr(name,level,1) c, name, 
               level*1000+ascii(substr(name,level,1)) cod 
          from t
        connect by level<=length(name))
 where level=length(name)
connect by nocycle cod != prior cod

STR
-------------------------
oltb
otbl
lbot
tlbo
blto
lotb
ltob
otlb
lbto
tblo
bolt
btol
oblt
lobt
tobl
blot
obtl
tbol
tolb
olbt
ltbo
tlob
botl
btlo

24 rows selected.

The word to anagram was “bolt”.

First of all the query extracts all the letters from the word marking each with an unique code:

SQL> with t as
  2  (select 'bolt' name from dual)
  3  select substr(name,level,1) c, name,
  4                 level*1000+ascii(substr(name,level,1)) cod
  5            from t
  6          connect by level<=length(name);

C NAME        COD
- ---- ----------
b bolt       1098
o bolt       2111
l bolt       3108
t bolt       4116

Then I’ve used sys_connect_by_path to build all the possible combinations of such letters.
Interesting, isn’t it?

Massimo

Tags: , , ,

22 Responses to “Hierarchical queries in Oracle – The CONNECT BY clause”

  1. Query gerarchiche « Oracle Italia by Massimo Ruocchio Says:

    […] This article is also available in English here. […]

  2. suman Says:

    super articular about connect by

  3. tethysnz Says:

    Thank you so much – I have been struggling with hierarchical queries to extract Bill of Material data and this is the best explanation I have read – and gave me more ideas on how to make it work properly.

  4. samna Says:

    Nice once

  5. paru Says:

    Thanks for posting this article, very very helpful with hierarchical queries

  6. Eswar Says:

    nice article, big thanks!

  7. Gokulraj Says:

    Very good article to understand about CONNECT BY in oracle, Thanks

  8. vidya Says:

    It is very explonatary

  9. subh Says:

    very nice

  10. Dinesh Says:

    The article is very nice… I have one query if I need to get the record in reverse direction what can we do for that. e.g. The employee ID is given then his managers and sr.managers need to know then how to do that. in this case if scott is given then need SCOTT/JONES/KING 

    • massimoruocchio Says:

      Just invert the PRIOR clause and modify the START WITH:


      SQL >select SYS_CONNECT_BY_PATH(ename, '/') enames
      2 from emp
      3 where connect_by_isleaf=1
      4 CONNECT BY PRIOR mgr=empno
      5 start with ename='SMITH';

      ENAMES
      -----------------------------------------------------
      /SMITH/FORD/JONES/KING

      Massimo

  11. anita Says:

    connect_by_root without start with clause
    please give a example
    what happening
    in connect by clause without start with clause

  12. BG Says:

    Very interesting read. Thank you!

  13. Oracle hierarchinės užklausos su WITH ir CONNECT BY | Paulius Bautrėnas Says:

    […] veikia CONNECT BY yra šauniai aprašyta čia, todėl per daug neišsiplėsiu ir geriau parodysiu, kaip prieš tai aprašytą užklausą […]

  14. Viper Says:

    Useful Artical !!!!!!

  15. pm Says:

    Very Nice artical about hierarchical queries .. Thanks

  16. Neo Says:

    Great Article!

  17. Y.T Says:

    “EMP table that is already furnished with a hierarchical self relation.” Why is that? Use Employees table in oracle.

  18. Sanjeevv Seth Says:

    Super what I thought is very difficult you made it so easy thanks

  19. Gajendra Choudhary Says:

    Hi what happen if i write connect by 1=0, what actually means it?

    • Massimo Ruocchio Says:

      Hi, it has not a real meaning. If you use an always FALSE condition in the connect by clause, Oracle will create no connections between records, so each record from the table will be extract as it is.
      If the table TAB contains N records, the statement “SELECT * from TAB connect by 1=0” will extract N records.
      On the other hand, if you use an always TRUE condition in the connect by clause, Oracle will connect all the records to all the records, recursively, so you will get a memory error, even if the table contains just one record…
      (ORA-30009: Not enough memory for CONNECT BY operation)

      Massimo

  20. Pashan None Says:

    Great Article with sample database from the following link:
    Example EMP and DEPT tables. Classic Oracle tables with 4 departments and 14 employees. Includes a join query example.
    https://livesql.oracle.com/apex/livesql/file/content_O5AEB2HE08PYEPTGCFLZU9YCV.html
    from
    Mike Hichwa (Oracle)

Leave a reply to Y.T Cancel reply