SQL Interview Question by a FAANG company

Some time ago I stumbled into this video on youtube with an interesting SQL quiz (they claim it is interview question by FAANG company). Solution from the video contains a number of subqueries and joins however none of that is actually needed. Let me show you how this can be solved with a single table scan.

The challenge: for a table t(q int, d date, t int) which contains movements on a warehouse you need to group the items into the buckets depending on how long they have been on the warehouse as of current date. Buckets are 0<..<=90, 90<..<=180, 180<..<=270, 270<..<=360, 360<. Items are managed as FIFO i.e. first in – first out.
q - quantity, d - date, t - type of operation where 1 means "in" and -1 means "out"

So below is the actual data

with
    t (q, d, t) as
    (select 99, date '2020-05-25', -1 from dual
     union all
     select 31, date '2020-05-24', 1 from dual
     union all
     select 1, date '2020-05-24', -1 from dual
     union all
     select 1, date '2020-05-23', -1 from dual
     union all
     select 102, date '2020-04-25', 1 from dual
     union all
     select 43, date '2020-04-25', 1 from dual
     union all
     select 2, date '2020-02-25', -1 from dual
     union all
     select 129, date '2020-02-18', -1 from dual
     union all
     select 1, date '2020-02-18', -1 from dual
     union all
     select 27, date '2020-01-29', -1 from dual
     union all
     select 120, date '2019-12-31', 1 from dual
     union all
     select 8, date '2019-05-22', -1 from dual
     union all
     select 250, date '2019-05-20', 1 from dual)

Let’s start with a first bucket – i.e. items which have been on a warehouse less than 90 days. Clearly the upper bound for the number of items in the first bucket is those which arrived in last 90 days – Arrived.

However we also might need exclude some dispatched items if they among those which arrived in last 90 days – Dispatched.

So how do we know how many dispatched items in last 90 days among those which arrived in last 90 days? We need to substract number of items as of day 90 (Initial) from total number of items dispatched in last 90 days (Dispatched_Original). If result is positive then this is what we need to exclude from Arrived otherwise all Dispatched_Original times were takes from those which arrived before day 90 and we do not need to exclude anything from Arrived.

So the formula for specific bucket is

Arrived - Dispatched = Arrived - greatest(Dispacted_Original - Initial, 0)

Or expressed in SQL

   -- arrived
     nvl(sum(q * nullif(t, -1)) over(order by d desc range between current row and 90 following), 0)
   -- dispatched = greatest(dispatched_original - initial, 0)
   - greatest(
           -- dispatched_original
           -nvl(sum(q * nullif(t, 1)) over(order by d desc range between current row and 90 following), 0)
         -- initial
         - nvl(sum(q * t) over(order by d desc range between 90 following and unbounded following), 0)
        ,0) s_0_90
Click here to view full solution

with
    t (q, d, t) as
    (select 99, date '2020-05-25', -1 from dual
     union all
     select 31, date '2020-05-24', 1 from dual
     union all
     select 1, date '2020-05-24', -1 from dual
     union all
     select 1, date '2020-05-23', -1 from dual
     union all
     select 102, date '2020-04-25', 1 from dual
     union all
     select 43, date '2020-04-25', 1 from dual
     union all
     select 2, date '2020-02-25', -1 from dual
     union all
     select 129, date '2020-02-18', -1 from dual
     union all
     select 1, date '2020-02-18', -1 from dual
     union all
     select 27, date '2020-01-29', -1 from dual
     union all
     select 120, date '2019-12-31', 1 from dual
     union all
     select 8, date '2019-05-22', -1 from dual
     union all
     select 250, date '2019-05-20', 1 from dual),
    tt as
    (select t.*
           ,greatest(
                  nvl(sum(q * nullif(t, -1)) over(order by d desc range between current row and 90 following), 0)
                - greatest(
                        -nvl(sum(q * nullif(t, 1)) over(order by d desc range between current row and 90 following), 0)
                      - nvl(sum(q * t) over(order by d desc range between 90 following and unbounded following), 0)
                     ,0)
               ,0) s_0_90
           ,greatest(
                  nvl(sum(q * nullif(t, -1)) over(order by d desc range between 90 following and 180 following), 0)
                - greatest(
                        -nvl(sum(q * nullif(t, 1)) over(order by d desc range between current row and 180 following), 0)
                      - nvl(sum(q * t) over(order by d desc range between 180 following and unbounded following), 0)
                     ,0)
               ,0) s_90_180
           ,greatest(
                  nvl(sum(q * nullif(t, -1)) over(order by d desc range between 180 following and 270 following), 0)
                - greatest(
                        -nvl(sum(q * nullif(t, 1)) over(order by d desc range between current row and 270 following), 0)
                      - nvl(sum(q * t) over(order by d desc range between 270 following and unbounded following), 0)
                     ,0)
               ,0) s_180_270
           ,greatest(
                  nvl(sum(q * nullif(t, -1)) over(order by d desc range between 270 following and 360 following), 0)
                - greatest(
                        -nvl(sum(q * nullif(t, 1)) over(order by d desc range between current row and 360 following), 0)
                      - nvl(sum(q * t) over(order by d desc range between 360 following and unbounded following), 0)
                     ,0)
               ,0) s_270_360
           ,greatest(
                  nvl(sum(q * nullif(t, -1)) over(order by d desc range between 360 following and unbounded following), 0)
                - greatest(-nvl(sum(q * nullif(t, 1)) over(order by d desc range between current row and unbounded following), 0), 0)
               ,0) s_360
     from t)
select tt.*, s_0_90 + s_90_180 + s_180_270 + s_270_360 + s_360 - sum(q * t) over (order by d) chk
from tt
order by d desc;

         Q D                  T     S_0_90   S_90_180  S_180_270  S_270_360      S_360        CHK
---------- --------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
        99 25-MAY-20         -1        176        102          0          0          0          0
        31 24-MAY-20          1        176        120          0          0         81          0
         1 24-MAY-20         -1        176        120          0          0         81          0
         1 23-MAY-20         -1        145        120          0          0         82          0
        43 25-APR-20          1        145        120          0         83          0          0
       102 25-APR-20          1        145        120          0         83          0          0
         2 25-FEB-20         -1        120          0          0         83          0          0
         1 18-FEB-20         -1        120          0          0         85          0          0
       129 18-FEB-20         -1        120          0          0         85          0          0
        27 29-JAN-20         -1        120          0        215          0          0          0
       120 31-DEC-19          1        120          0        242          0          0          0
         8 22-MAY-19         -1        242          0          0          0          0          0
       250 20-MAY-19          1        250          0          0          0          0          0

13 rows selected.

There are a few things to note here

  • The range to calculate number of arrived items is from lower bound till upper bound of a bucket
  • The range to calculate initial amount is upper bound till infinity (or till the very first row in the past)
  • The range to calculate number of dispatched items is from current row till upper bound. So it starts from current row and not from lower bound because items for specific bucket (or period) may be taken out after that period is closed. For example we may dispatch N items today which are 100 days old. That clearly should affect value s_90_180.
  • Finally we use function greatest not only to calculate number of dispatched items but also in this expression greatest(Arrived – Dispatched). This is because number of dispatched items may be greater than initial plus arrived in specific period. That is because dispatched items actually arrived after given period was closed.

Leave a comment