All You Ever Need to Know About Recursive SQL

Lukas EderAugust 21st, 2014Last Updated: January 3rd, 2015

0 28 6 minutes read

Oracle SYNONYMs are a great feature. You can implement all sorts of backwards-compatibility tweaks simply by creating SYNONYMs in your database. Consider the following schema:

CREATE TABLE my_table (col NUMBER(7));

CREATE SYNONYM my_table_old FOR my_table;
CREATE SYNONYM my_table_bak FOR my_table_old;

Now you can query your same old table through three different names, it’ll all result in the same output:

SELECT * FROM my_table;

-- Same thing:
SELECT * FROM my_table_old;
SELECT * FROM my_table_bak;

The trouble is, when you see my_table_bak in code (or some even more obfuscated name), do you immediately know what it really is?

Use this query to find out

We can use the ALL_SYNONYMS table to figure this one out. This query will already give a simple overview:

SELECT *
FROM   ALL_SYNONYMS
WHERE  TABLE_OWNER = 'PLAYGROUND'

The output is:

OWNER       SYNONYM_NAME  TABLE_OWNER  TABLE_NAME
---------------------------------------------------
PLAYGROUND  MY_TABLE_BAK  PLAYGROUND   MY_TABLE_OLD
PLAYGROUND  MY_TABLE_OLD  PLAYGROUND   MY_TABLE

But as you can see, this is boring, because we have transitive synonyms in there and I don’t want to go through the complete table to figure out that MY_TABLE_BAK -> MY_TABLE_OLD -> MY_TABLE.

So let’s use CONNECT BY!

Oracle (as well as Informix and CUBRID) have this awesome CONNECT BY clause for hierarchical SQL. There is also the possibility to express hierarchical SQL using the more powerful common table expressions, if you dare.

But let’s see how we can transitively resolve our tables. Here’s how:

SELECT 
  s.OWNER,
  s.SYNONYM_NAME,

  -- Get to the root of the hierarchy
  CONNECT_BY_ROOT s.TABLE_OWNER TABLE_OWNER,
  CONNECT_BY_ROOT s.TABLE_NAME  TABLE_NAME
FROM       ALL_SYNONYMS s
WHERE      s.TABLE_OWNER = 'PLAYGROUND'

-- The magic CONNECT BY clause!
CONNECT BY s.TABLE_OWNER = PRIOR s.OWNER
AND        s.TABLE_NAME  = PRIOR s.SYNONYM_NAME

First off, there is CONNECT BY, which allows to “connect” hierarchies by their hierarchical predecessors. On each level of the hierarchy, we’ll connect the TABLE_NAME with its previous (“PRIOR”) SYNONYM_NAME. This will recurse as long as the chain doesn’t end (or if it runs into a cycle).

What’s also interesting is the CONNECT_BY_ROOT keyword, which, for each path through the hierarchy, displays the root of the path. In our case, that’s the target TABLE_NAME.

The output can be seen here:

OWNER       SYNONYM_NAME  TABLE_OWNER  TABLE_NAME
---------------------------------------------------
PLAYGROUND  MY_TABLE_OLD  PLAYGROUND   MY_TABLE
PLAYGROUND  MY_TABLE_BAK  PLAYGROUND   MY_TABLE
PLAYGROUND  MY_TABLE_BAK  PLAYGROUND   MY_TABLE_OLD <-- Useless

If you’re confused by the records that are displayed, just add the LEVEL pseudo-column to display the recursion level:

SELECT

  -- Add level here
  LEVEL,
  s.OWNER,
  s.SYNONYM_NAME,
  CONNECT_BY_ROOT s.TABLE_OWNER TABLE_OWNER,
  CONNECT_BY_ROOT s.TABLE_NAME  TABLE_NAME
FROM       ALL_SYNONYMS s
WHERE      s.TABLE_OWNER = 'PLAYGROUND'
CONNECT BY s.TABLE_OWNER = PRIOR s.OWNER
AND        s.TABLE_NAME  = PRIOR s.SYNONYM_NAME

LEVEL  OWNER       SYNONYM_NAME  TABLE_OWNER  TABLE_NAME
----------------------------------------------------------
1      PLAYGROUND  MY_TABLE_OLD  PLAYGROUND   MY_TABLE
2      PLAYGROUND  MY_TABLE_BAK  PLAYGROUND   MY_TABLE
1      PLAYGROUND  MY_TABLE_BAK  PLAYGROUND   MY_TABLE_OLD
^^^^^^
  Awesome!

Getting rid of “bad records” using START WITH

As you can see, some of the results are now synonyms pointing directly to the target table, whereas the last record still points to an intermediate element from the synonym path. This is because we’re recursing into the path hierarchies from every record in the table, also from the “intermediate” synonym references, whose TABLE_NAME is yet another synonym.

Let’s get rid of those as well, using the optional START WITH clause, which allows to limit tree traversals to those trees whose roots fulfil a given predicate:

SELECT 
  s.OWNER,
  s.SYNONYM_NAME,
  CONNECT_BY_ROOT s.TABLE_OWNER TABLE_OWNER,
  CONNECT_BY_ROOT s.TABLE_NAME  TABLE_NAME
FROM       ALL_SYNONYMS s
WHERE      s.TABLE_OWNER = 'PLAYGROUND'
CONNECT BY s.TABLE_OWNER = PRIOR s.OWNER
AND        s.TABLE_NAME  = PRIOR s.SYNONYM_NAME

-- Start recursing only from non-synonym objects
START WITH EXISTS (
  SELECT 1
  FROM   ALL_OBJECTS
  WHERE  s.TABLE_OWNER           = ALL_OBJECTS.OWNER
  AND    s.TABLE_NAME            = ALL_OBJECTS.OBJECT_NAME
  AND    ALL_OBJECTS.OWNER       = 'PLAYGROUND'
  AND    ALL_OBJECTS.OBJECT_TYPE <> 'SYNONYM'
)

So, essentially, we’re requiring the TABLE_NAME to be any object from ALL_OBJECTS that is in our schema, but not a SYNONYM. (yes, synonyms work for all objects, including procedures, packages, types, etc.)

Running the above query gets us the desired result:

OWNER       SYNONYM_NAME  TABLE_OWNER  TABLE_NAME
---------------------------------------------------
PLAYGROUND  MY_TABLE_OLD  PLAYGROUND   MY_TABLE
PLAYGROUND  MY_TABLE_BAK  PLAYGROUND   MY_TABLE

What about PUBLIC synonyms?

Most often, you will not use local synonyms, though, but PUBLIC ones. Oracle has this quirky PUBLIC pseudo-schema, in which you cannot create objects, but in which you can create synonyms. So, let’s create some more synonyms for backwards-compatibility purposes:

CREATE PUBLIC SYNONYM my_table_bak2 FOR my_table_bak;
CREATE SYNONYM bak_backup_old FOR my_table_bak2;

Unfortunately, this will break our chain, because for some reason only Oracle and the Oracle of Delphi knows, PUBLIC is well reported as a OWNER of the synonym, but not as the TABLE_OWNER. Let’s see some raw data with:

SELECT *
FROM   ALL_SYNONYMS
WHERE  TABLE_OWNER = 'PLAYGROUND'

… and thus:

OWNER       SYNONYM_NAME    TABLE_OWNER  TABLE_NAME
------------------------------------------------------
PLAYGROUND  MY_TABLE_OLD    PLAYGROUND   MY_TABLE
PLAYGROUND  MY_TABLE_BAK    PLAYGROUND   MY_TABLE_OLD
PUBLIC      MY_TABLE_BAK2   PLAYGROUND   MY_TABLE_BAK
PLAYGROUND  BAK_BACKUP_OLD  PLAYGROUND   MY_TABLE_BAK2 <-- Not PUBLIC

As you can see, the PUBLIC SYNONYM MY_TABLE_BAK2 is reported to be in the PLAYGROUND schema! This breaks recursion, of course. We’re missing a record:

OWNER       SYNONYM_NAME    TABLE_OWNER  TABLE_NAME
------------------------------------------------------
PLAYGROUND  MY_TABLE_OLD    PLAYGROUND   MY_TABLE
PLAYGROUND  MY_TABLE_BAK    PLAYGROUND   MY_TABLE
PUBLIC      MY_TABLE_BAK2   PLAYGROUND   MY_TABLE <-- Hmm?

In order to work around this issue, we’ll have to tweak our original data set. Any object reported as (TABLE_OWNER, TABLE_NAME) might in fact be a synonym called ('PUBLIC', TABLE_NAME). The trick is thus to simply duplicate all input data as such:

SELECT 
  s.OWNER,
  s.SYNONYM_NAME,
  CONNECT_BY_ROOT s.TABLE_OWNER TABLE_OWNER,
  CONNECT_BY_ROOT s.TABLE_NAME  TABLE_NAME

-- Tweaked data set
FROM (
  SELECT OWNER, SYNONYM_NAME, TABLE_OWNER, TABLE_NAME
  FROM ALL_SYNONYMS
  UNION ALL
  SELECT OWNER, SYNONYM_NAME, 'PUBLIC', TABLE_NAME
  FROM ALL_SYNONYMS
) s

-- Add the synthetic PUBLIC TABLE_OWNER as well
WHERE      s.TABLE_OWNER IN (
  'PLAYGROUND', 'PUBLIC'
)
CONNECT BY s.TABLE_OWNER = PRIOR s.OWNER
AND        s.TABLE_NAME  = PRIOR s.SYNONYM_NAME
START WITH EXISTS (
  SELECT 1
  FROM   ALL_OBJECTS
  WHERE  s.TABLE_OWNER           = ALL_OBJECTS.OWNER
  AND    s.TABLE_NAME            = ALL_OBJECTS.OBJECT_NAME
  AND    ALL_OBJECTS.OWNER       = 'PLAYGROUND'
  AND    ALL_OBJECTS.OBJECT_TYPE <> 'SYNONYM'
)

There it is, our missing record!

OWNER       SYNONYM_NAME    TABLE_OWNER  TABLE_NAME
---------------------------------------------------
PLAYGROUND  MY_TABLE_OLD    PLAYGROUND   MY_TABLE
PLAYGROUND  MY_TABLE_BAK    PLAYGROUND   MY_TABLE
PUBLIC      MY_TABLE_BAK2   PLAYGROUND   MY_TABLE
PLAYGROUND  BAK_BACKUP_OLD  PLAYGROUND   MY_TABLE <-- Yep!

Displaying the hierarchy

There is also a quirky function called SYS_CONNECT_BY_PATH, which can be used to actually display the whole hierarchy in a string form (VARCHAR2, with max 4000 characters!). Here’s how:

SELECT 

-- Magic function
  SUBSTR(
    sys_connect_by_path(
         s.TABLE_OWNER
      || '.'
      || s.TABLE_NAME, ' <- '
    ) || ' <- '
      || s.OWNER
      || '.'
      || s.SYNONYM_NAME, 5
  )
FROM (
  SELECT OWNER, SYNONYM_NAME, TABLE_OWNER, TABLE_NAME
  FROM ALL_SYNONYMS
  UNION ALL
  SELECT OWNER, SYNONYM_NAME, 'PUBLIC', TABLE_NAME
  FROM ALL_SYNONYMS
) s
WHERE      s.TABLE_OWNER IN (
  'PLAYGROUND', 'PUBLIC'
)
CONNECT BY s.TABLE_OWNER = PRIOR s.OWNER
AND        s.TABLE_NAME  = PRIOR s.SYNONYM_NAME
START WITH EXISTS (
  SELECT 1
  FROM   ALL_OBJECTS
  WHERE  s.TABLE_OWNER           = ALL_OBJECTS.OWNER
  AND    s.TABLE_NAME            = ALL_OBJECTS.OBJECT_NAME
  AND    ALL_OBJECTS.OWNER       = 'PLAYGROUND'
  AND    ALL_OBJECTS.OBJECT_TYPE <> 'SYNONYM'
)

The above query will now output the following records:

PLAYGROUND.MY_TABLE <- PLAYGROUND.MY_TABLE_OLD
PLAYGROUND.MY_TABLE <- PLAYGROUND.MY_TABLE_OLD <- PLAYGROUND.MY_TABLE_BAK
PLAYGROUND.MY_TABLE <- PLAYGROUND.MY_TABLE_OLD <- PLAYGROUND.MY_TABLE_BAK <- PUBLIC.MY_TABLE_BAK2
PLAYGROUND.MY_TABLE <- PLAYGROUND.MY_TABLE_OLD <- PLAYGROUND.MY_TABLE_BAK <- PUBLIC.MY_TABLE_BAK2 <- PLAYGROUND.BAK_BACKUP_OLD

Impressive, eh?

Remark: In case you have stale synonyms

If you have “stale” synonyms, i.e. synonyms that point to nowhere, Oracle may report them to be pointing to themselves. That’s unfortunate and creates a CYCLE in CONNECT BY. To prevent this from happening, simply add another predicate like so:

SELECT 
  SUBSTR(
    sys_connect_by_path(
         s.TABLE_OWNER
      || '.'
      || s.TABLE_NAME, ' <- '
    ) || ' <- '
      || s.OWNER
      || '.'
      || s.SYNONYM_NAME, 5
  )
FROM (
  SELECT * FROM (
    SELECT OWNER, SYNONYM_NAME, TABLE_OWNER, TABLE_NAME
    FROM ALL_SYNONYMS
    UNION ALL
    SELECT OWNER, SYNONYM_NAME, 'PUBLIC', TABLE_NAME
    FROM ALL_SYNONYMS
  ) s

  -- Add this predicate to prevent cycles
  WHERE (s.OWNER       , s.SYNONYM_NAME)
    != ((s.TABLE_OWNER , s.TABLE_NAME))
) s
CONNECT BY s.TABLE_OWNER = PRIOR s.OWNER
AND        s.TABLE_NAME  = PRIOR s.SYNONYM_NAME
START WITH EXISTS (
  SELECT 1
  FROM   ALL_OBJECTS
  WHERE  s.TABLE_OWNER           = ALL_OBJECTS.OWNER
  AND    s.TABLE_NAME            = ALL_OBJECTS.OBJECT_NAME
  AND    ALL_OBJECTS.OWNER       = 'PLAYGROUND'
  AND    ALL_OBJECTS.OBJECT_TYPE <> 'SYNONYM'
)

Can the above query be written in jOOQ?

Yes of course. In jOOQ, pretty much everything is possible, if you can write it in SQL. Here’s how we use a query similar to the above to resolve Oracle Synonmys in the jOOQ code generator:

// Some reusable variables
AllObjects o   = ALL_OBJECTS;
AllSynonyms s1 = ALL_SYNONYMS;
AllSynonyms s2 = ALL_SYNONYMS.as("s2");
AllSynonyms s3 = ALL_SYNONYMS.as("s3");

Field<String> dot = inline(".");
String arr = " <- ";

// The actual qeury
DSL
.using(configuration)
.select(
  s3.OWNER,
  s3.SYNONYM_NAME,
  connectByRoot(s3.TABLE_OWNER).as("TABLE_OWNER"),
  connectByRoot(s3.TABLE_NAME).as("TABLE_NAME"),
  substring(
    sysConnectByPath(
      s3.TABLE_OWNER.concat(dot)
                    .concat(s3.TABLE_NAME), 
      arr
    )
    .concat(arr)
    .concat(s3.OWNER)
    .concat(dot)
    .concat(s3.SYNONYM_NAME), 
    5
  ))
.from(
  select()
  .from(
    select(
      s1.OWNER, s1.SYNONYM_NAME, 
      s1.TABLE_OWNER, s1.TABLE_NAME)
    .from(s1)
    .union(
    select(
      s1.OWNER, s1.SYNONYM_NAME, 
      inline("PUBLIC"), s1.TABLE_NAME)
    .from(s1))
    .asTable("s2"))
  .where(row(s2.OWNER, s2.SYNONYM_NAME)
         .ne(s2.TABLE_OWNER, s2.TABLE_NAME))
  .asTable("s3"))
.connectBy(s3.TABLE_OWNER.eq(prior(s3.OWNER)))
.and(s3.TABLE_NAME.eq(prior(s3.SYNONYM_NAME)))
.startWith(exists(
  selectOne()
  .from(o)
  .where(s3.TABLE_OWNER.eq(o.OWNER))
  .and(s3.TABLE_NAME.eq(o.OBJECT_NAME))
  .and(o.OBJECT_TYPE.ne("SYNONYM"))
  .and(o.OWNER.in(getInputSchemata()))
))
.fetch();

Download jOOQ today and try it yourself!

Conclusion

If you have an intrinsically hierarchical data set, then you will be very unhappy with these simplistic hierarchical SQL features (also with commont table expressions). They don’t perform very well, and they’re very hard to express if hierarchies get more complex. So you may as well consider using an actual graph database like Neo4j.

But every now and then, a little hierarchy may sneak into your otherwise “standard” relational data model. When it does, be sure to have this useful CONNECT BY clause ready for action.

CONNECT BY is supported by (at least):