Domain indexes. Built-in vs DIY.

In this I’ll do a quick performance comparison of built-in Oracle Text index versus the one created using Extensible indexing interface (described in this post).

First of all, Oracle Text option must be installed to use built-in text indexes. This functionality is getting installed under CTXSYS schema.

Speaking about DIY text indexes, it’s mandatory to have privileges to create index types and operators or simply RESOURCE role.

Once all permissions are granted, below statement can be used to simulate functionality implemented in tp_instr using built-in capabilities.

begin 
    ctx_ddl.drop_preference('mywordlist');
    ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
    ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');    
end; 
/
create index idx_t_str_ctx on t(str) indextype is ctxsys.context parameters ('Wordlist mywordlist');

Similar to the approach used in tp_instr Oracle creates “ladder” table.

SQL> select * from dr$idx_t_str_ctx$p
  2   where pat_part1 || pat_part2 = upper('DRNXIrhioCpVQZtOrACL')
  3   order by length(pat_part2);

PAT_PART1                 PAT_PART2
------------------------- -------------------------
DRNXIRHIOCPVQZTOR         ACL
DRNXIRHIOCPVQZTO          RACL
DRNXIRHIOCPVQZT           ORACL
DRNXIRHIOCPVQZ            TORACL
DRNXIRHIOCPVQ             ZTORACL
DRNXIRHIOCPV              QZTORACL
DRNXIRHIOCP               VQZTORACL
DRNXIRHIOC                PVQZTORACL
DRNXIRHIO                 CPVQZTORACL
DRNXIRHI                  OCPVQZTORACL
DRNXIRH                   IOCPVQZTORACL
DRNXIR                    HIOCPVQZTORACL
DRNXI                     RHIOCPVQZTORACL
DRNX                      IRHIOCPVQZTORACL
DRN                       XIRHIOCPVQZTORACL
DR                        NXIRHIOCPVQZTORACL
D                         RNXIRHIOCPVQZTORACL

17 rows selected.

Performance test brings below timings

set timing on
set serveroutput on

declare
  x    int;
  ts   timestamp;
  n    int := 1000;
begin
  ts := systimestamp;

  for i in 1 .. n / 100
  loop
    select count (*)
      into x
      from t
     where upper (str) like '%ORAC%';
  end loop;

  dbms_output.put_line (systimestamp - ts);
  dbms_output.put_line (x);

  ts := systimestamp;

  for i in 1 .. n
  loop
    select count (*)
      into x
      from t
     where contains (str, '%ORAC%') > 0;
  end loop;

  dbms_output.put_line (systimestamp - ts);
  dbms_output.put_line (x);

  ts := systimestamp;

  for i in 1 .. n
  loop
    select count (*)
      into x
      from t
     where op_instr (str, lower ('ORAC')) = 1;
  end loop;

  dbms_output.put_line (systimestamp - ts);
  dbms_output.put_line (x);

  ts := systimestamp;

  for i in 1 .. n
  loop
    select count (*)
      into x
      from idx_t_str$
     where sub_str like lower ('ORAC%');
  end loop;

  dbms_output.put_line (systimestamp - ts);
  dbms_output.put_line (x);
end;
/
+000000000 00:00:02.739000000
38
+000000000 00:00:02.170000000
38
+000000000 00:00:00.380000000
38
+000000000 00:00:00.022000000
38

So built-in index is 100 times faster then approach with like (similar timings but 100 times more executions). DIY text index is about 5 times faster then built-in one.
And finally, the statement which queries data from auxiliary table is approximately 20 times faster then DIY index.

This clearly shows that irrespective whatever approach you choose for domain indexing it will introduce noticeable overhead. Further details can be found using dbms_hprof and SQL tracing.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: