In this post I’m not going to compare two different RDBMS by doing load testing and measuring TPC results. Nor am I going to dive into differences in features or architecture. Instead, the goal is to solve one specific SQL quiz and highlight some strengths and weaknesses of both RDBMS.
Let’s assume we have a table with transactions:
create table trans as
select t2.*, t1.*
from (select level transaction_id, abs(trunc(1e1 * sin(level * level))) quantity, abs(trunc(1e1 * sin(level) * (1 + level / 2e1))) price
from dual
connect by level <= 2e1) t1
,(select level product_id
from dual
connect by level <= 1) t2
order by 1, 2;
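For readers following along outside Oracle, the sample data can be reproduced with a short script. This is only a sketch of the generator formulas from the `create table` above (20 transactions, a single product); the names `rows` and `trunc` usage are illustrative.

```python
import math

# Reproduce the generator from the CREATE TABLE above:
# quantity = abs(trunc(10 * sin(i*i)))
# price    = abs(trunc(10 * sin(i) * (1 + i/20)))
rows = [(1,                                           # product_id
         i,                                           # transaction_id
         abs(math.trunc(10 * math.sin(i * i))),       # quantity
         abs(math.trunc(10 * math.sin(i) * (1 + i / 20))))  # price
        for i in range(1, 21)]

print(rows[0])  # (1, 1, 8, 8) – matches the first row of the query output
```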
Our task is to calculate, for each transaction, the total for the last N transactions as if their prices were those of the first N transactions. In particular, for N = 4:
select
  t.*,
  nth_value(price, 1) over (partition by product_id order by transaction_id
    rows between unbounded preceding and unbounded following)
  * lag(quantity, 3, 0) over (partition by product_id order by transaction_id) +
  nth_value(price, 2) over (partition by product_id order by transaction_id
    rows between unbounded preceding and unbounded following)
  * lag(quantity, 2, 0) over (partition by product_id order by transaction_id) +
  nth_value(price, 3) over (partition by product_id order by transaction_id
    rows between unbounded preceding and unbounded following)
  * lag(quantity, 1, 0) over (partition by product_id order by transaction_id) +
  nth_value(price, 4) over (partition by product_id order by transaction_id
    rows between unbounded preceding and unbounded following)
  * lag(quantity, 0, 0) over (partition by product_id order by transaction_id) result
from trans t;
PRODUCT_ID TRANSACTION_ID QUANTITY PRICE RESULT
---------- -------------- ---------- ---------- ----------
1 1 8 8 72
1 2 7 10 71
1 3 4 1 123
1 4 2 9 156
1 5 1 11 107
1 6 9 3 134
1 7 9 8 116
1 8 9 13 188
1 9 6 5 225
1 10 5 8 213
1 11 9 15 218
1 12 4 8 143
1 13 6 6 188
1 14 9 16 199
1 15 9 11 182
1 16 9 5 228
1 17 0 17 171
1 18 4 14 198
1 19 2 2 94
1 20 8 18 114
20 rows selected.
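To make the window expression concrete: for transaction k the result is Σ(j=1..N) price_j · quantity_(k−N+j), with out-of-range quantities treated as 0 (the `lag` default). A minimal Python sketch of the same calculation, rebuilding the sample data with the generator formulas from the `create table` above (`result` is an illustrative name):

```python
import math

N = 4
# Same formulas as the CREATE TABLE generator above (single product)
qty   = [abs(math.trunc(10 * math.sin(i * i))) for i in range(1, 21)]
price = [abs(math.trunc(10 * math.sin(i) * (1 + i / 20))) for i in range(1, 21)]

def result(k):  # k is the 1-based transaction_id
    # j-th price of the first N transactions times the quantity of the
    # transaction N - j steps back; lag's default 0 covers the leading rows
    return sum(price[j] * (qty[k - N + j] if k - N + j >= 0 else 0)
               for j in range(N))

print([result(k) for k in range(1, 21)])
```

The printed list matches the RESULT column of the query output above (72, 71, 123, 156, …, 114).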
The query above does the job with a single table scan and a single sort, but it works only for a fixed N. There are a few alternatives for arbitrary N, though.
Single join + group by with keep
with t as(select trans.*, row_number() over(partition by product_id order by transaction_id) rn from trans)
select sum(result) chk
from
(
select t1.product_id
,max(t1.transaction_id) transaction_id
,sum(t1.quantity * t2.price) as result
,max(t1.quantity) keep (dense_rank last order by t1.rn) quantity
,max(t1.price) keep (dense_rank last order by t1.rn) price
from t t1 join t t2 on t1.product_id = t2.product_id
where t2.rn <= 20
group by t1.product_id, t1.rn - t2.rn
having max(t2.rn) = 20
);
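The trick in this rewrite is grouping the self-join by `t1.rn - t2.rn`: all pairs on the same "diagonal" contribute price_j · quantity_(k−N+j) to the same result row, and `having max(t2.rn) = N` keeps only the diagonals that reach the N-th price. A Python sketch of that equivalence (N = 4, sample data rebuilt with the formulas from the `create table`; `diag` and `totals` are illustrative names):

```python
import math
from collections import defaultdict

N = 4
qty   = [abs(math.trunc(10 * math.sin(i * i))) for i in range(1, 21)]
price = [abs(math.trunc(10 * math.sin(i) * (1 + i / 20))) for i in range(1, 21)]

# Self-join: every transaction rn1 paired with the first N transactions rn2,
# grouped by the diagonal d = rn1 - rn2 (both rn are 1-based)
diag = defaultdict(list)
for rn1 in range(1, 21):
    for rn2 in range(1, N + 1):
        diag[rn1 - rn2].append((rn1, rn2))

totals = {}
for d, pairs in diag.items():
    if max(rn2 for _, rn2 in pairs) == N:      # having max(t2.rn) = N
        k = max(rn1 for rn1, _ in pairs)       # keep (dense_rank last order by rn)
        totals[k] = sum(qty[rn1 - 1] * price[rn2 - 1] for rn1, rn2 in pairs)

print(totals[20])  # 114 – same as the window-function query
```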
Single join + group by without keep
with t as(select trans.*, row_number() over(partition by product_id order by transaction_id) rn from trans)
select sum(result) chk
from
(
select t1.product_id
,max(t1.transaction_id) transaction_id
,sum(t1.quantity * t2.price) as result
,max(case when t2.rn=20 then t1.quantity end) quantity
,max(case when t2.rn=20 then t1.price end) price
from t t1 join t t2 on t1.product_id = t2.product_id
where t2.rn <= 20
group by t1.product_id, t1.rn - t2.rn
having max(t2.rn) = 20
);
Single join + window functions (sum and max) + filter
with t as(select trans.*, row_number() over(partition by product_id order by transaction_id) rn from trans)
select sum(result) chk
from (select t1.*
,sum(t1.quantity * t2.price) over (partition by t1.product_id, t1.rn - t2.rn) result
,max(t2.rn) over (partition by t1.product_id, t1.rn - t2.rn) mrn
,row_number() over(partition by t1.product_id, t1.rn - t2.rn order by t1.transaction_id desc) i
from t t1 join t t2 on t1.product_id = t2.product_id
where t2.rn <= 20) tt
where i = 1 and mrn = 20;
Single join + window function lag with dynamic shift + group by
select sum(result) chk
from
(
select product_id, transaction_id, quantity, price, sum(pq) result
from (select product_id
,t.transaction_id
,t.quantity
,t.price
,p.price * lag(quantity, (20 - rn) * 20) over(partition by product_id order by transaction_id, rn) pq
from trans t
join (select product_id, row_number() over(partition by product_id order by transaction_id) rn, price
from trans s
where transaction_id <= 20) p
using (product_id)) t0
group by product_id, transaction_id, quantity, price
) tt;
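The dynamic-shift variant relies on the join emitting exactly N rows per transaction: ordered by (transaction_id, rn), stepping back (N − rn) · N rows lands on the same rn slot of the transaction N − rn steps earlier, so each row pairs price_rn with the right quantity. A sketch of that indexing on a flattened row list (N = 4; `flat` and `result` are illustrative names, single product assumed):

```python
import math

N = 4
qty   = [abs(math.trunc(10 * math.sin(i * i))) for i in range(1, 21)]
price = [abs(math.trunc(10 * math.sin(i) * (1 + i / 20))) for i in range(1, 21)]

# Flattened join output ordered by (transaction_id, rn):
# one row per (transaction t, price slot rn), rn = 1..N
flat = [(t, rn) for t in range(1, 21) for rn in range(1, N + 1)]

def result(k):
    total = 0
    for pos, (t, rn) in enumerate(flat):
        if t != k:
            continue
        back = (N - rn) * N          # lag(quantity, (N - rn) * N)
        src = pos - back             # same rn slot, N - rn transactions back
        if src >= 0:                 # lag beyond the partition -> NULL, skipped by SUM
            total += price[rn - 1] * qty[flat[src][0] - 1]
    return total

print(result(20))  # 114 – same as the window-function query
```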
Two joins
with ss as
(
select row_number() over (partition by product_id order by transaction_id) i, s.*
from trans s
)
select sum(result) chk
from
(
select s1.product_id, s1.transaction_id, s1.quantity, s1.price, sum(s2.quantity * s3.price) result
from ss s1
join ss s3 on s1.product_id = s3.product_id and s3.i <= 20
join ss s2 on s1.product_id = s2.product_id and s1.i + s3.i - 20 = s2.i
group by s1.product_id, s1.transaction_id, s1.quantity, s1.price
) tt;
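The two-join version spells the pairing out directly: the join condition `s1.i + s3.i - 20 = s2.i` is exactly the quantity index from the window formulation, with out-of-range quantities dropped by the join instead of defaulted to 0. A minimal check (N = 4, sample data as above; `result` is an illustrative name):

```python
import math

N = 4
qty   = [abs(math.trunc(10 * math.sin(i * i))) for i in range(1, 21)]
price = [abs(math.trunc(10 * math.sin(i) * (1 + i / 20))) for i in range(1, 21)]

def result(i1):
    # s3 ranges over the first N rows (s3.i = j); s2.i = s1.i + j - N
    return sum(price[j - 1] * qty[i1 + j - N - 1]
               for j in range(1, N + 1)
               if 1 <= i1 + j - N <= 20)

print(result(4), result(20))  # 156 114
```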
Let’s generate 200,000 transactions for 5 products and compare elapsed times for:
- Oracle 12.2 on Windows
- PostgreSQL 15.1 on Windows
- Oracle 23.26 on OLS 8
- PostgreSQL 17.6 on Ubuntu 24
| Query | ORA Windows | ORA OLS | PG Windows | PG Ubuntu |
|---|---|---|---|---|
| window functions | 14 | 14 | 8 | 9 |
| single join + agg functions (KEEP) | 20 | 20 | – | – |
| single join + agg functions (NO KEEP) | 12 | 6 | 12 | 9 |
| single join + window functions (NO LAG) | 24 | 24 | 48 | 27 |
| single join + window functions (LAG) | – | – | 24 | 22 |
| two joins | 11 | 14 | 23 | 11 |
The comparison table above shows elapsed times for each query. Let’s briefly highlight some strengths and weaknesses.
- PostgreSQL does not support `keep` for aggregate functions.
- Oracle has an extremely inefficient implementation of the `lag` function when the second argument is not a constant (I’m not sure whether this is listed as a bug on Oracle Support). You can find a simplified case to reproduce here.
- The first option runs almost twice as fast on PostgreSQL, which suggests its implementation of window functions is more efficient.
- The key difference between the KEEP and NO KEEP versions with aggregate functions in Oracle is that they use different algorithms for `group by`: HASH GROUP BY vs SORT GROUP BY. We can disable hash aggregation with the hint `NO_USE_HASH_AGGREGATION`, and the runtime for both queries becomes the same.
- The version without `keep` runs twice as fast on the latest version of Oracle compared to 12.2; however, the plans are logically identical, so most likely the implementation of HASH GROUP BY has been improved.
- Timings for PostgreSQL on Ubuntu are slightly better than on Windows because JIT (Just-in-Time compilation) was used. With JIT disabled, the timings become almost the same.
Other technical details used for testing
- The workarea was set to 2 GB to minimize temp file usage and avoid spilling to disk.
- PostgreSQL chooses a suboptimal join method for the last query. To avoid that, I had to switch off `enable_mergejoin`/`enable_nestloop` before execution and reset them afterwards. I also had to specify `as not materialized` so that the predicate `i <= 20` is applied before the join.
Full scripts used for testing – floating_window_cmp_ora.sql and floating_window_cmp_pg.sql.
Overall verdict
- Execution times are quite similar for both RDBMS. PostgreSQL seems to have a better-optimized implementation of window functions, while Oracle shows better runtimes with aggregate functions on its latest version.
- Oracle has a more advanced optimizer and, especially, query transformation engine. It is both a great power (because it provides additional ways to execute the query, and developers need to worry a bit less about how exactly they express the logic in SQL) and a great headache (because execution with a specific query plan becomes less predictable).
- Also, Oracle has advanced SQL features like the `model` clause and `pattern matching`. They were not strictly necessary for this specific task, but you can check the solutions in Friday’s quiz – CALCULATE EARNINGS. That thread has other performance-related details.
So the two best options are the single join with aggregate functions (no `keep` keyword) and the two joins. But who loves an extra join unless it gives a noticeable improvement? 🙂