When I met Oracle7, back in 1996, I was struck by a feature, hierarchical queries, that I found challenging but not really useful.
Except a couple of didactic examples I didn’t found for it any other practical application.
I was wrong, even more often I make use of the CONNECT BY clause to resolve problems that, at first glance, are not hierarchical at all, for example to anagram a word or to aggregate strings.
Let’s go step by step, I will start with the simplest examples and will show the most complex ones at the end of this post.
As usual we’ll work on the EMP table that is already furnished with a hierarchical self relation. On that table, indeed, there’s a column, MGR, that stores, for each employee, his boss’ EMPNO.
Data contained into the table is as follows:
SQL> select empno, ename, mgr 2 from emp; EMPNO ENAME MGR ---------- ---------- ---------- 7369 SMITH 7902 7499 ALLEN 7698 7521 WARD 7698 7566 JONES 7839 7654 MARTIN 7698 7698 BLAKE 7839 7782 CLARK 7839 7788 SCOTT 7566 7839 KING 7844 TURNER 7698 7876 ADAMS 7788 7900 JAMES 7698 7902 FORD 7566 7934 MILLER 7782
KING is the root of the hierarchy since hi has no boss.
The first example of hierarchical query is the following:
SQL> select empno, ename, mgr, prior ename, level 2 from emp 3 connect by prior empno = mgr 4 start with mgr is null; EMPNO ENAME MGR PRIORENAME LEVEL ---------- ---------- ---------- ---------- ---------- 7839 KING 1 7566 JONES 7839 KING 2 7788 SCOTT 7566 JONES 3 7876 ADAMS 7788 SCOTT 4 7902 FORD 7566 JONES 3 7369 SMITH 7902 FORD 4 7698 BLAKE 7839 KING 2 7499 ALLEN 7698 BLAKE 3 7521 WARD 7698 BLAKE 3 7654 MARTIN 7698 BLAKE 3 7844 TURNER 7698 BLAKE 3 7900 JAMES 7698 BLAKE 3 7782 CLARK 7839 KING 2 7934 MILLER 7782 CLARK 3
Let’s analyze it: the CONNECT BY clause, mandatory to make a hierarchical query, is used to define how each record is connected to the hierarchical superior.
The father of the record having MGR=x has EMPNO=x.
On the other hand, given a record with EMPNO=x, all the records having MGR=x are his sons.
The unary operator PRIOR indicates “the father of”.
START WITH clause is used to from which records we want to start the hierarchy, in our example we want to start from the root of the hierarchy, the employee that has no manager.
The root of the hierarchy could be not unique. In this example it is.
The LEVEL pseudocolumn indicates at which level each record stays in the hierarchy, starting from the root that has level=1.
Once understood the quey, we can read our resulting tree: KING is the root and has level=1.
Under KING there are three employees at level 2(JONES,BLAKE e CLARK). Then the others.
How does Oracle make the hierarchy? First of all it reads the records.
Then it determines the roots applying the START WITH clause.
Then, starting from each root, it determines the first-level sons applying the CONNECT BY clause and so on…
To enhance the layout of the output we can use the following trick:
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 mgr, prior ename, level 3 from emp 4 connect by prior empno = mgr 5 start with mgr is null; EMPNO NAME MGR PRIORENAME LEVEL ---------- ------------------ ---------- ---------- ---------- 7839 KING 1 7566 JONES 7839 KING 2 7788 SCOTT 7566 JONES 3 7876 ADAMS 7788 SCOTT 4 7902 FORD 7566 JONES 3 7369 SMITH 7902 FORD 4 7698 BLAKE 7839 KING 2 7499 ALLEN 7698 BLAKE 3 7521 WARD 7698 BLAKE 3 7654 MARTIN 7698 BLAKE 3 7844 TURNER 7698 BLAKE 3 7900 JAMES 7698 BLAKE 3 7782 CLARK 7839 KING 2 7934 MILLER 7782 CLARK 3
We just added some white-spaces on the left of the name depending on the level. Now it’s all more clear…
Records order defines the hierarchy, adding an ORDER BY clause we could completely lose the hierarchy.
We can decide how to sort siblings, that is to say records on the same level. For example JONES, BLAKE and CLARK:
SQL> select empno,lpad(' ',level*3,' ')||ename nome, 2 mgr, prior ename, level 3 from emp 4 connect by prior empno = mgr 5 start with mgr is null 6 order siblings by ename; EMPNO NOME MGR PRIORENAME LEVEL ---------- ------------------ ---------- ---------- ---------- 7839 KING 1 7698 BLAKE 7839 KING 2 7499 ALLEN 7698 BLAKE 3 7900 JAMES 7698 BLAKE 3 7654 MARTIN 7698 BLAKE 3 7844 TURNER 7698 BLAKE 3 7521 WARD 7698 BLAKE 3 7782 CLARK 7839 KING 2 7934 MILLER 7782 CLARK 3 7566 JONES 7839 KING 2 7902 FORD 7566 JONES 3 7369 SMITH 7902 FORD 4 7788 SCOTT 7566 JONES 3 7876 ADAMS 7788 SCOTT 4
With no impact on the hierarchy we’ve sorted siblings by name.
If you work on Oracle9i you can stop reading, what follows is available on Oracle10g and 11g…
In any hierarchy infinite loops can raise. For example what happens if KING has a manager himself?
SQL> update emp set mgr=7369 where ename='KING';
KING is one of the SMITH’s sons, SMITH is himself a KING’s great grandson…
What if we run the same query now?
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 mgr, prior ename, level 3 from emp 4 connect by prior empno = mgr 5 start with mgr is null 6 order siblings by ename; No rows selected
Of course, because there’s no record that has MGR null… So let’s change the START WITH as follows:
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 mgr, prior ename, level 3 from emp 4 connect by prior empno = mgr 5 start with empno=7839; ERROR: ORA-01436: CONNECT BY loop in user data
Here’s the loop, Oracle can’t create the hierarchy.
Oracle has managed it introducing the NOCYCLE clause.
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 mgr, prior ename, level 3 from emp 4 connect by nocycle prior empno = mgr 5 start with empno=7839; EMPNO NAME MGR PRIORENAME LEVEL ---------- ------------------ ---------- ---------- ---------- 7839 KING 7369 1 7566 JONES 7839 KING 2 7788 SCOTT 7566 JONES 3 7876 ADAMS 7788 SCOTT 4 7902 FORD 7566 JONES 3 7369 SMITH 7902 FORD 4 7698 BLAKE 7839 KING 2 7499 ALLEN 7698 BLAKE 3 7521 WARD 7698 BLAKE 3 7654 MARTIN 7698 BLAKE 3 7844 TURNER 7698 BLAKE 3 7900 JAMES 7698 BLAKE 3 7782 CLARK 7839 KING 2 7934 MILLER 7782 CLARK 3
That clause tells to Oracle to stop when a loop is raised, Oracle goes on on the other branchs of the tree.
After SMITH another instance of KING (and all his descendant) should be extracted, and so on, generating an infinite output. The NOCYCLE clause prevents that.
CONNECT_BY_ISCYCLE pseudocolumn tells us in which records a loop has been detected:
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 mgr, prior ename, connect_by_iscycle IFLOOP 3 from emp 4 connect by nocycle prior empno = mgr 5 start with empno=7839; EMPNO NAME MGR PRIORENAME IFLOOP ---------- ------------------ ---------- ---------- ---------- 7839 KING 7369 0 7566 JONES 7839 KING 0 7788 SCOTT 7566 JONES 0 7876 ADAMS 7788 SCOTT 0 7902 FORD 7566 JONES 0 7369 SMITH 7902 FORD 1 7698 BLAKE 7839 KING 0 7499 ALLEN 7698 BLAKE 0 7521 WARD 7698 BLAKE 0 7654 MARTIN 7698 BLAKE 0 7844 TURNER 7698 BLAKE 0 7900 JAMES 7698 BLAKE 0 7782 CLARK 7839 KING 0 7934 MILLER 7782 CLARK 0
In addition to PRIOR another unary operator exists that’s really useful: CONNECT_BY_ROOT.
It allows us to display the root of a given record:
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 connect_by_root ename boss 3 from emp 4 connect by prior empno = mgr 5 start with mgr is null; EMPNO NAME BOSS ---------- ------------------ ---------- 7839 KING KING 7566 JONES KING 7788 SCOTT KING 7876 ADAMS KING 7902 FORD KING 7369 SMITH KING 7698 BLAKE KING 7499 ALLEN KING 7521 WARD KING 7654 MARTIN KING 7844 TURNER KING 7900 JAMES KING 7782 CLARK KING 7934 MILLER KING
In our example there’s a single root, KING, but if we modify another record:
SQL> update emp set mgr=null where ename='BLAKE';
We get:
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 connect_by_root ename boss 3 from emp 4 connect by prior empno = mgr 5 start with mgr is null; EMPNO NAME BOSS ---------- ------------------ ---------- 7698 BLAKE BLAKE 7499 ALLEN BLAKE 7521 WARD BLAKE 7654 MARTIN BLAKE 7844 TURNER BLAKE 7900 JAMES BLAKE 7839 KING KING 7566 JONES KING 7788 SCOTT KING 7876 ADAMS KING 7902 FORD KING 7369 SMITH KING 7782 CLARK KING 7934 MILLER KING
Where, for each record, we display who is the “big chief”…
Another really useful function is SYS_CONNECT_BY_PATH, it gets as input a field and a character and builds the full path from the root to the current record using the character as a separator:
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 sys_connect_by_path(ename,'/') bosses 3 from emp 4 connect by prior empno = mgr 5* start with mgr is null EMPNO NAME BOSSES ---------- ------------------ -------------------------------- 7839 KING /KING 7566 JONES /KING/JONES 7788 SCOTT /KING/JONES/SCOTT 7876 ADAMS /KING/JONES/SCOTT/ADAMS 7902 FORD /KING/JONES/FORD 7369 SMITH /KING/JONES/FORD/SMITH 7698 BLAKE /KING/BLAKE 7499 ALLEN /KING/BLAKE/ALLEN 7521 WARD /KING/BLAKE/WARD 7654 MARTIN /KING/BLAKE/MARTIN 7844 TURNER /KING/BLAKE/TURNER 7900 JAMES /KING/BLAKE/JAMES 7782 CLARK /KING/CLARK 7934 MILLER /KING/CLARK/MILLER
There’s another useful pseudocolumn to show, CONNECT_BY_ISLEAF.
It tells us whether a record is a leaf of the tree or not:
SQL> select empno,lpad(' ',level*3,' ')||ename name, 2 connect_by_isleaf ifleaf 3 from emp 4 connect by prior empno = mgr 5 start with mgr is null; EMPNO NAME IFLEAF ---------- ------------------ ---------- 7839 KING 0 7566 JONES 0 7788 SCOTT 0 7876 ADAMS 1 7902 FORD 0 7369 SMITH 1 7698 BLAKE 0 7499 ALLEN 1 7521 WARD 1 7654 MARTIN 1 7844 TURNER 1 7900 JAMES 1 7782 CLARK 0 7934 MILLER 1
Really interesting is the ability of CONNECT BY to generate more records from a table that contains only one row:
SQL> select level from dual 2 connect by level<=10; LEVEL ---------- 1 2 3 4 5 6 7 8 9 10
We can extract from DUAL, that contains just one row, all the records we need to make complex computations.
I’ve used this ability, for example, in the post about regular expressions when I showed how REGEXP_COUNT works.
That’s, more or less, everything about theory, let’s see two practical examples.
The first one has been already showed in another post: About string concatenation across different records.
SELECT deptno, ltrim(SYS_CONNECT_BY_PATH(ename, ','),',') enames FROM (select deptno, ename, rank() over(partition by deptno order by rownum) num from emp) where connect_by_isleaf=1 START WITH num=1 CONNECT BY PRIOR num+1 = num and prior deptno=deptno; DEPTNO ENAMES ---------- ---------------------------------------- 10 CLARK,KING,MILLER 20 SMITH,JONES,SCOTT,ADAMS,FORD 30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
Using SYS_CONNECT_BY_PATH we was able to put in a single string, separated by comma, the names of the employees that work in the same department.
And this has absolutely nothing to do with the MGR column…
The other example is the solution that I proposed to solve a problem posted on the official Oracle SQL & PL/SQL forum. The question was how to determine whether a word is the anagram of another one or not.
Here’s the query I wrote to extract all the anagrams of a given word:
with t as (select 'bolt' name from dual) select distinct replace(sys_connect_by_path(c,' '),' ',null) str from (select substr(name,level,1) c, name, level*1000+ascii(substr(name,level,1)) cod from t connect by level<=length(name)) where level=length(name) connect by nocycle cod != prior cod STR ------------------------- oltb otbl lbot tlbo blto lotb ltob otlb lbto tblo bolt btol oblt lobt tobl blot obtl tbol tolb olbt ltbo tlob botl btlo 24 rows selected.
The word to anagram was “bolt”.
First of all the query extracts all the letters from the word marking each with an unique code:
SQL> with t as 2 (select 'bolt' name from dual) 3 select substr(name,level,1) c, name, 4 level*1000+ascii(substr(name,level,1)) cod 5 from t 6 connect by level<=length(name); C NAME COD - ---- ---------- b bolt 1098 o bolt 2111 l bolt 3108 t bolt 4116
Then I’ve used sys_connect_by_path to build all the possible combinations of such letters.
Interesting, isn’t it?
Massimo
Tags: CONNECT BY, Hierarchical, Oracle, SQL
10/19/2012 at 14:21 |
[…] This article is also available in English here. […]
07/16/2013 at 10:42 |
super articular about connect by
10/21/2013 at 22:03 |
Thank you so much – I have been struggling with hierarchical queries to extract Bill of Material data and this is the best explanation I have read – and gave me more ideas on how to make it work properly.
10/24/2013 at 11:11 |
Nice once
10/30/2013 at 04:25 |
Thanks for posting this article, very very helpful with hierarchical queries
12/31/2013 at 13:00 |
nice article, big thanks!
02/27/2014 at 15:24 |
Very good article to understand about CONNECT BY in oracle, Thanks
03/04/2014 at 14:52 |
It is very explonatary
03/12/2014 at 10:36 |
very nice
04/02/2014 at 11:52 |
The article is very nice… I have one query if I need to get the record in reverse direction what can we do for that. e.g. The employee ID is given then his managers and sr.managers need to know then how to do that. in this case if scott is given then need SCOTT/JONES/KING
04/03/2014 at 07:46 |
Just invert the PRIOR clause and modify the START WITH:
SQL >select SYS_CONNECT_BY_PATH(ename, '/') enames
2 from emp
3 where connect_by_isleaf=1
4 CONNECT BY PRIOR mgr=empno
5 start with ename='SMITH';
ENAMES
-----------------------------------------------------
/SMITH/FORD/JONES/KING
Massimo
04/14/2014 at 11:53 |
connect_by_root without start with clause
please give a example
what happening
in connect by clause without start with clause
05/09/2014 at 21:55 |
Very interesting read. Thank you!
08/26/2014 at 13:20 |
[…] veikia CONNECT BY yra šauniai aprašyta čia, todėl per daug neišsiplėsiu ir geriau parodysiu, kaip prieš tai aprašytą užklausą […]
02/07/2015 at 16:15 |
Useful Artical !!!!!!
04/07/2015 at 11:37 |
Very Nice artical about hierarchical queries .. Thanks
06/07/2015 at 09:08 |
Great Article!
06/29/2015 at 07:11 |
“EMP table that is already furnished with a hierarchical self relation.” Why is that? Use Employees table in oracle.
09/01/2016 at 17:30 |
Super what I thought is very difficult you made it so easy thanks
12/11/2016 at 12:45 |
Hi what happen if i write connect by 1=0, what actually means it?
12/12/2016 at 17:43 |
Hi, it has not a real meaning. If you use an always FALSE condition in the connect by clause, Oracle will create no connections between records, so each record from the table will be extract as it is.
If the table TAB contains N records, the statement “SELECT * from TAB connect by 1=0” will extract N records.
On the other hand, if you use an always TRUE condition in the connect by clause, Oracle will connect all the records to all the records, recursively, so you will get a memory error, even if the table contains just one record…
(ORA-30009: Not enough memory for CONNECT BY operation)
Massimo
04/07/2021 at 15:25 |
Great Article with sample database from the following link:
Example EMP and DEPT tables. Classic Oracle tables with 4 departments and 14 employees. Includes a join query example.
https://livesql.oracle.com/apex/livesql/file/content_O5AEB2HE08PYEPTGCFLZU9YCV.html
from
Mike Hichwa (Oracle)