In this post I’m not going to compare two different RDBMS by doing load testing and measuring TPC results. Nor am I going to dive into differences in features or architecture. Instead, the goal is to solve one specific SQL quiz and highlight some strengths and weaknesses of both RDBMS.
Let’s assume we have a table with transactions:
create table trans as
select t2.*, t1.*
from (select level transaction_id, abs(trunc(1e1 * sin(level * level))) quantity, abs(trunc(1e1 * sin(level) * (1 + level / 2e1))) price
from dual
connect by level <= 2e1) t1
,(select level product_id
from dual
connect by level <= 1) t2
order by 1, 2;
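For readers following along outside Oracle, the sample data can be reproduced with a short script. This is only a sketch of the generator formulas from the `create table` above (20 transactions, a single product); the names `rows` and `trunc` usage are illustrative.

```python
import math

# Reproduce the generator from the CREATE TABLE above:
# quantity = abs(trunc(10 * sin(i*i)))
# price    = abs(trunc(10 * sin(i) * (1 + i/20)))
rows = [(1,                                           # product_id
         i,                                           # transaction_id
         abs(math.trunc(10 * math.sin(i * i))),       # quantity
         abs(math.trunc(10 * math.sin(i) * (1 + i / 20))))  # price
        for i in range(1, 21)]

print(rows[0])  # (1, 1, 8, 8) – matches the first row of the query output
```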
Our task is to calculate, for each transaction, the total for the last N transactions as if their prices were those of the first N transactions. In particular, for N = 4:
select
  t.*,
  nth_value(price, 1) over (partition by product_id order by transaction_id
    rows between unbounded preceding and unbounded following)
  * lag(quantity, 3, 0) over (partition by product_id order by transaction_id) +
  nth_value(price, 2) over (partition by product_id order by transaction_id
    rows between unbounded preceding and unbounded following)
  * lag(quantity, 2, 0) over (partition by product_id order by transaction_id) +
  nth_value(price, 3) over (partition by product_id order by transaction_id
    rows between unbounded preceding and unbounded following)
  * lag(quantity, 1, 0) over (partition by product_id order by transaction_id) +
  nth_value(price, 4) over (partition by product_id order by transaction_id
    rows between unbounded preceding and unbounded following)
  * lag(quantity, 0, 0) over (partition by product_id order by transaction_id) result
from trans t;
PRODUCT_ID TRANSACTION_ID QUANTITY PRICE RESULT
---------- -------------- ---------- ---------- ----------
1 1 8 8 72
1 2 7 10 71
1 3 4 1 123
1 4 2 9 156
1 5 1 11 107
1 6 9 3 134
1 7 9 8 116
1 8 9 13 188
1 9 6 5 225
1 10 5 8 213
1 11 9 15 218
1 12 4 8 143
1 13 6 6 188
1 14 9 16 199
1 15 9 11 182
1 16 9 5 228
1 17 0 17 171
1 18 4 14 198
1 19 2 2 94
1 20 8 18 114
20 rows selected.
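To make the window expression concrete: for transaction k the result is Σ(j=1..N) price_j · quantity_(k−N+j), with out-of-range quantities treated as 0 (the `lag` default). A minimal Python sketch of the same calculation, rebuilding the sample data with the generator formulas from the `create table` above (`result` is an illustrative name):

```python
import math

N = 4
# Same formulas as the CREATE TABLE generator above (single product)
qty   = [abs(math.trunc(10 * math.sin(i * i))) for i in range(1, 21)]
price = [abs(math.trunc(10 * math.sin(i) * (1 + i / 20))) for i in range(1, 21)]

def result(k):  # k is the 1-based transaction_id
    # j-th price of the first N transactions times the quantity of the
    # transaction N - j steps back; lag's default 0 covers the leading rows
    return sum(price[j] * (qty[k - N + j] if k - N + j >= 0 else 0)
               for j in range(N))

print([result(k) for k in range(1, 21)])
```

The printed list matches the RESULT column of the query output above (72, 71, 123, 156, …, 114).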
The query above does the job with a single table scan and a single sort, but it works only for a fixed N. There are a few alternatives for arbitrary N, though.
Single join + group by with keep
with t as(select trans.*, row_number() over(partition by product_id order by transaction_id) rn from trans)
select sum(result) chk
from
(
select t1.product_id
,max(t1.transaction_id) transaction_id
,sum(t1.quantity * t2.price) as result
,max(t1.quantity) keep (dense_rank last order by t1.rn) quantity
,max(t1.price) keep (dense_rank last order by t1.rn) price
from t t1 join t t2 on t1.product_id = t2.product_id
where t2.rn <= 20
group by t1.product_id, t1.rn - t2.rn
having max(t2.rn) = 20
);
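The trick in this rewrite is grouping the self-join by `t1.rn - t2.rn`: all pairs on the same "diagonal" contribute price_j · quantity_(k−N+j) to the same result row, and `having max(t2.rn) = N` keeps only the diagonals that reach the N-th price. A Python sketch of that equivalence (N = 4, sample data rebuilt with the formulas from the `create table`; `diag` and `totals` are illustrative names):

```python
import math
from collections import defaultdict

N = 4
qty   = [abs(math.trunc(10 * math.sin(i * i))) for i in range(1, 21)]
price = [abs(math.trunc(10 * math.sin(i) * (1 + i / 20))) for i in range(1, 21)]

# Self-join: every transaction rn1 paired with the first N transactions rn2,
# grouped by the diagonal d = rn1 - rn2 (both rn are 1-based)
diag = defaultdict(list)
for rn1 in range(1, 21):
    for rn2 in range(1, N + 1):
        diag[rn1 - rn2].append((rn1, rn2))

totals = {}
for d, pairs in diag.items():
    if max(rn2 for _, rn2 in pairs) == N:      # having max(t2.rn) = N
        k = max(rn1 for rn1, _ in pairs)       # keep (dense_rank last order by rn)
        totals[k] = sum(qty[rn1 - 1] * price[rn2 - 1] for rn1, rn2 in pairs)

print(totals[20])  # 114 – same as the window-function query
```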
Single join + group by without keep
with t as(select trans.*, row_number() over(partition by product_id order by transaction_id) rn from trans)
select sum(result) chk
from
(
select t1.product_id
,max(t1.transaction_id) transaction_id
,sum(t1.quantity * t2.price) as result
,max(case when t2.rn=20 then t1.quantity end) quantity
,max(case when t2.rn=20 then t1.price end) price
from t t1 join t t2 on t1.product_id = t2.product_id
where t2.rn <= 20
group by t1.product_id, t1.rn - t2.rn
having max(t2.rn) = 20
);
Single join + window functions (sum and max) + filter
with t as(select trans.*, row_number() over(partition by product_id order by transaction_id) rn from trans)
select sum(result) chk
from (select t1.*
,sum(t1.quantity * t2.price) over (partition by t1.product_id, t1.rn - t2.rn) result
,max(t2.rn) over (partition by t1.product_id, t1.rn - t2.rn) mrn
,row_number() over(partition by t1.product_id, t1.rn - t2.rn order by t1.transaction_id desc) i
from t t1 join t t2 on t1.product_id = t2.product_id
where t2.rn <= 20) tt
where i = 1 and mrn = 20;
Single join + window function lag with dynamic shift + group by
select sum(result) chk
from
(
select product_id, transaction_id, quantity, price, sum(pq) result
from (select product_id
,t.transaction_id
,t.quantity
,t.price
,p.price * lag(quantity, (20 - rn) * 20) over(partition by product_id order by transaction_id, rn) pq
from trans t
join (select product_id, row_number() over(partition by product_id order by transaction_id) rn, price
from trans s
where transaction_id <= 20) p
using (product_id)) t0
group by product_id, transaction_id, quantity, price
) tt;
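The dynamic-shift variant relies on the join emitting exactly N rows per transaction: ordered by (transaction_id, rn), stepping back (N − rn) · N rows lands on the same rn slot of the transaction N − rn steps earlier, so each row pairs price_rn with the right quantity. A sketch of that indexing on a flattened row list (N = 4; `flat` and `result` are illustrative names, single product assumed):

```python
import math

N = 4
qty   = [abs(math.trunc(10 * math.sin(i * i))) for i in range(1, 21)]
price = [abs(math.trunc(10 * math.sin(i) * (1 + i / 20))) for i in range(1, 21)]

# Flattened join output ordered by (transaction_id, rn):
# one row per (transaction t, price slot rn), rn = 1..N
flat = [(t, rn) for t in range(1, 21) for rn in range(1, N + 1)]

def result(k):
    total = 0
    for pos, (t, rn) in enumerate(flat):
        if t != k:
            continue
        back = (N - rn) * N          # lag(quantity, (N - rn) * N)
        src = pos - back             # same rn slot, N - rn transactions back
        if src >= 0:                 # lag beyond the partition -> NULL, skipped by SUM
            total += price[rn - 1] * qty[flat[src][0] - 1]
    return total

print(result(20))  # 114 – same as the window-function query
```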
Two joins
with ss as
(
select row_number() over (partition by product_id order by transaction_id) i, s.*
from trans s
)
select sum(result) chk
from
(
select s1.product_id, s1.transaction_id, s1.quantity, s1.price, sum(s2.quantity * s3.price) result
from ss s1
join ss s3 on s1.product_id = s3.product_id and s3.i <= 20
join ss s2 on s1.product_id = s2.product_id and s1.i + s3.i - 20 = s2.i
group by s1.product_id, s1.transaction_id, s1.quantity, s1.price
) tt;
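The two-join version spells the pairing out directly: the join condition `s1.i + s3.i - 20 = s2.i` is exactly the quantity index from the window formulation, with out-of-range quantities dropped by the join instead of defaulted to 0. A minimal check (N = 4, sample data as above; `result` is an illustrative name):

```python
import math

N = 4
qty   = [abs(math.trunc(10 * math.sin(i * i))) for i in range(1, 21)]
price = [abs(math.trunc(10 * math.sin(i) * (1 + i / 20))) for i in range(1, 21)]

def result(i1):
    # s3 ranges over the first N rows (s3.i = j); s2.i = s1.i + j - N
    return sum(price[j - 1] * qty[i1 + j - N - 1]
               for j in range(1, N + 1)
               if 1 <= i1 + j - N <= 20)

print(result(4), result(20))  # 156 114
```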
Let’s generate 200,000 transactions for 5 products and compare elapsed times for:
- Oracle 12.2 on Windows
- PostgreSQL 15.1 on Windows
- Oracle 23.26 on OLS 8
- PostgreSQL 17.6 on Ubuntu 24
| Query | ORA Windows | ORA OLS | PG Windows | PG Ubuntu |
|---|---|---|---|---|
| window functions | 14 | 14 | 8 | 9 |
| single join + agg functions (KEEP) | 20 | 20 | – | – |
| single join + agg functions (NO KEEP) | 12 | 6 | 12 | 9 |
| single join + window functions (NO LAG) | 24 | 24 | 48 | 27 |
| single join + window functions (LAG) | – | – | 24 | 22 |
| two joins | 11 | 14 | 23 | 11 |
The comparison table above shows elapsed times for each query. Let’s briefly highlight some strengths and weaknesses.
- PostgreSQL does not support `keep` for aggregate functions.
- Oracle has an extremely inefficient implementation of the `lag` function when the second argument is not a constant (I’m not sure whether this is listed as a bug on Oracle Support). You can find a simplified case to reproduce here.
- The first option runs almost twice as fast on PostgreSQL, which suggests its implementation of window functions is more efficient.
- The key difference between the KEEP and NO KEEP versions with aggregate functions in Oracle is that they use different algorithms for `group by`: HASH GROUP BY vs SORT GROUP BY. We can disable hash aggregation with the hint `NO_USE_HASH_AGGREGATION`, and the runtime for both queries becomes the same.
- The version without `keep` runs twice as fast on the latest version of Oracle compared to 12.2; however, the plans are logically identical, so most likely the implementation of HASH GROUP BY has been improved.
- Timings for PostgreSQL on Ubuntu are slightly better than on Windows because JIT (Just-in-Time compilation) was used. With JIT disabled, the timings become almost the same.
Other technical details used for testing
- The workarea was set to 2 GB to minimize temp file usage and avoid spilling to disk.
- PostgreSQL chooses a suboptimal join method for the last query. To avoid that, I had to switch off `enable_mergejoin`/`enable_nestloop` before execution and reset them afterwards. I also had to specify `as not materialized` so that the predicate `i <= 20` is applied before the join.
Full scripts used for testing – floating_window_cmp_ora.sql and floating_window_cmp_pg.sql.
Overall verdict
- Execution times are quite similar for both RDBMS. PostgreSQL seems to have a better-optimized implementation of window functions, while Oracle shows better runtimes with aggregate functions on its latest version.
- Oracle has a more advanced optimizer and, especially, query transformation engine. It is both a great power (because it provides additional ways to execute the query, and developers need to worry a bit less about how exactly they express the logic in SQL) and a great headache (because execution with a specific query plan becomes less predictable).
- Also, Oracle has advanced SQL features like the `model` clause and `pattern matching`. They were not strictly necessary for this specific task, but you can check the solutions in Friday’s quiz – CALCULATE EARNINGS. That thread has other performance-related details.
So the two best options are the single join with aggregate functions (no `keep` keyword) and the two joins. But who loves an extra join unless it gives a noticeable improvement? 🙂