SQLMDX

Eventually it's all about data

Query transformations in Oracle: or-expansion

Posted by sqlmdx on December 8, 2013

As you may know there are two types of query transformations in Oracle: heuristic-based and cost-based.
Heuristic-based transformations are supposed to produce better plan all the time so they are applied without cost comparison while cost-based transformations not always lead to better plans so they are applied only if the cost of transformed query block is less than the cost of the original one.

I’m not sure whether full list of transformations is publicly available but some brief classification could be find in a white paper Query Optimization in Oracle Database 10g Release 2.

It’s obvious that some transformations such as “outer join to inner join conversion” always lead to better plan while others such as “join factorization” can either improve the plan or make it worse thus former is treated as heuristic-based transformation while latter as a cost-based.

Well, as white paper states, or-expansion is cost-based transformation so it should be applied only if it leads to lower cost.
Let’s check whether that’s always true.

SQL> create table t(id, value) as select rownum, rownum from dual connect by level <= 10000;

Table created.

SQL> create index ti on t(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'t');

PL/SQL procedure successfully completed.

SQL> explain plan for select * from t where id = nvl(:b,id);

Explained.

SQL> select * from table(dbms_xplan.display);
Plan hash value: 3577693071

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      | 10001 | 80008 |    10   (0)| 00:00:01 |
|   1 |  CONCATENATION                |      |       |       |            |          |
|*  2 |   FILTER                      |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL          | T    | 10000 | 80000 |     8   (0)| 00:00:01 |
|*  4 |   FILTER                      |      |       |       |            |          |
|   5 |    TABLE ACCESS BY INDEX ROWID| T    |     1 |     8 |     2   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | TI   |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:B IS NULL)
   3 - filter("ID" IS NOT NULL)
   4 - filter(:B IS NOT NULL)
   6 - access("ID"=:B)

21 rows selected.

SQL> explain plan for select * from t where id + 0 = nvl(:b,id);

Explained.

SQL> select * from table(dbms_xplan.display);
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     8 |     8   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |     8 |     8   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"+0=NVL(:B,"ID"))

13 rows selected.

As you can see adding “+ 0” prevents transformation from being applied.
But it’s a bit surprising that cost without transformation dropped from 10 to 8.
This a bit violates the rule that transformation is applied only when it results in lower cost.
On the other hand plan contains two branches and it’s clear that only one branch will be chosen during execution so cost of expected work is either 8 or 2.

Let’s check an another example.

create table t as
select trunc((rownum - 1) / 100) id1,
       mod(rownum, 100) id2,
       lpad(' ', 4000, ' ') padding
  from dual
connect by rownum <= 10000;
create index ti1 on t(id1);
create index ti2 on t(id2);
exec dbms_stats.gather_table_stats(user,'t');
SQL> set autot trace expl
SQL> select --+
  2   *
  3    from t
  4   where id1 = 1
  5      or id2 = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 915952206

-----------------------------------------------------------------------------------------
| Id  | Operation                        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |      |   199 |   778K|   322   (0)| 00:00:04 |
|   1 |  TABLE ACCESS BY INDEX ROWID     | T    |   199 |   778K|   322   (0)| 00:00:04 |
|   2 |   BITMAP CONVERSION TO ROWIDS    |      |       |       |            |          |
|   3 |    BITMAP OR                     |      |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS|      |       |       |            |          |
|*  5 |      INDEX RANGE SCAN            | TI1  |       |       |     1   (0)| 00:00:01 |
|   6 |     BITMAP CONVERSION FROM ROWIDS|      |       |       |            |          |
|*  7 |      INDEX RANGE SCAN            | TI2  |       |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("ID1"=1)
   7 - access("ID2"=1)

SQL> select --+ opt_param('_b_tree_bitmap_plans' 'false')
  2   *
  3    from t
  4   where id1 = 1
  5      or id2 = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1215555227

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |   199 |   778K|   202   (0)| 00:00:03 |
|   1 |  CONCATENATION               |      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |   100 |   391K|   101   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | TI2  |   100 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T    |    99 |   387K|   101   (0)| 00:00:02 |
|*  5 |    INDEX RANGE SCAN          | TI1  |   100 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID2"=1)
   4 - filter(LNNVL("ID2"=1))
   5 - access("ID1"=1)

It’s interesting here that optimizer choses bitmap index conversion nevertheless or-expansion is cheaper.

These two examples show that or-expansion is applied not in a true cost-based manner.

However it’s possible to demonstrate that cost plays some role in applying this transformation.
or-expansion was first introduced in 8.1.7 so let’s run a bit adjusted script on 8i.

create table t as
select trunc((rownum - 1) / 1000) id1,
       mod(rownum, 10) id2,
       lpad(' ', 400, ' ') padding
  from all_objects
 where rownum <= 10000;
create index ti1 on t(id1);
create index ti2 on t(id2);
exec dbms_stats.gather_table_stats(user,'t');
SQL> select *
  2    from t
  3   where id1 = 1
  4      or id2 = 1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=90 Card=1900 Bytes=7
          71400)

   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=90 Card=1900 Bytes=771400
          )




SQL> select --+ use_concat
  2   *
  3    from t
  4   where id1 = 1
  5      or id2 = 1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=166 Card=1051 Bytes=
          426706)

   1    0   CONCATENATION
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=83 Card=51 By
          tes=20706)

   3    2       INDEX (RANGE SCAN) OF 'TI2' (NON-UNIQUE) (Cost=3 Card=
          51)

   4    1     TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=83 Card=51 By
          tes=20706)

   5    4       INDEX (RANGE SCAN) OF 'TI1' (NON-UNIQUE) (Cost=3 Card=
          51)

Output shows that or-expansion was not applied because it increases cost. However with data distribution from above example it takes place.
Which means that cost was taken into consideration.
The similar behavior on 11.2.0.1.

So as it was demonstrated, or-expansion can be applied on 11gR2 not in a “real” cost-based manner while even on 8i it can behave like cost-based transformation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: