Designing and Implementing get_stats in PL/SQL
The get_stats function in the util_numeric package provides a deterministic statistical engine for in-memory numeric datasets.
It does not operate directly on database tables, although data from tables can be passed to it by first being placed into an array.
Instead, it works with PL/SQL collections and produces a structured result containing:
- The Frequency Table
- A complete statistical summary
This blog post explains the design and implementation of that function.
Entry Points
The package supports the following ways to compute statistics:
If you already have data in an array, or have populated the array from a database table, call:
get_stats(p_array IN t_number_array)
Thin wrapper for array input, simply delegates to get_stats:
get_stats_array(p_array IN t_number_array)
If your data is a comma separated values string:
get_stats_list(p_list IN t_number_array)
This converts the string using function list_to_array and then calls get_stats_array.
All three entry points ultimately route to the same core function, get_stats.
The Core Design Principle
The key architectural decision is this:
Build the Frequency Table once, then derive all statistics from it.
Instead of repeatedly scanning the original array, get_stats first transforms the dataset into a canonical distribution.
Step 1 - Populate the Frequency Table
The first operation inside get_stats is:
v_frequency_table := populate_frequency_table(p_array);
populate_frequency_table performs the following:
- Validates input
- Rejects NULL values
- Sorts array into ascending order
- Compresses duplicates into (key, frequency) pairs
Example:
Input array: 1,1,2,2,2,4,5
| Frequency Table | |
|---|---|
| Key | Frequency |
| 1 | 2 |
| 2 | 3 |
| 4 | 1 |
| 5 | 1 |
From this point forward, all statistical calculations operate on this frequency table.
Step 2 - Calculate Summary Statistics
Once the frequency table has been populated, each statistic is computed independently using pure functions:
v_stats.sum_values := frequency_table_sum(v_frequency_table);
v_stats.n_total := frequency_table_count(v_frequency_table);
v_stats.distinct_n := v_frequency_table.COUNT;
v_stats.mean := frequency_table_mean(v_frequency_table);
v_stats.median := frequency_table_median(v_frequency_table);
v_stats.mode_values:= frequency_table_mode(v_frequency_table);
v_stats.lowest := frequency_table_lowest(v_frequency_table);
v_stats.highest := frequency_table_highest(v_frequency_table);
v_stats.range := frequency_table_range(v_frequency_table);
v_stats.variance_pop := variance_pop(v_frequency_table);
v_stats.variance_samp := variance_samp(v_frequency_table);
v_stats.stddev_pop := stddev_pop(v_frequency_table);
v_stats.stddev_samp := stddev_samp(v_frequency_table);
v_stats.iqr := iqr(v_frequency_table);
Each function:
- Accepts frequency table data of type t_frequency_table
- Performs a calculation
- Has no side-effects
- Does not re-build, re-sort, or change frequency table data
This makes the implementation predictable, and testable.
Step 3 - Populate the Result Structure
The function returns a composite structure:
-- Get_stats uses this composite record to store the calculated
-- statistics and frequency table from which they were generated.
TYPE t_stats_result IS RECORD (
stats t_stats_summary,
freq_tbl t_frequency_table
);
The underlying structures used for the frequency table, and statistics are:
TYPE t_frequency_row IS RECORD (
key PLS_INTEGER,
frequency PLS_INTEGER
);
TYPE t_frequency_table IS TABLE OF t_frequency_row;
TYPE t_int_table IS TABLE OF PLS_INTEGER;
-- Used by get_stats to return all statistics for frequency table
-- Note this is nested, supporting multiple mode values in t_int_table
TYPE t_stats_summary IS RECORD (
sum_values PLS_INTEGER,
n_total PLS_INTEGER,
distinct_n PLS_INTEGER,
mean NUMBER,
median NUMBER,
mode_values t_int_table,
lowest NUMBER,
highest NUMBER,
range NUMBER,
variance_pop NUMBER,
variance_samp NUMBER,
stddev_pop NUMBER,
stddev_samp NUMBER,
iqr NUMBER
);
Returning both the statistics and frequency table from which they were derived in t_stats_result ensures:
- The frequency table is available for display
- Percentiles can be calculated repeatedly without needing to re-compute the frequency table
- Additional derived calculations can reuse the frequency table
Why Percentiles are Separate
Percentile functions are intentionally not embedded inside get_stats. They are defined separately:
FUNCTION percentile_disc(
p_frequency_table IN t_frequency_table,
p_pct IN NUMBER
) RETURN PLS_INTEGER;
FUNCTION percentile_cont(
p_frequency_table IN t_frequency_table,
p_pct IN NUMBER
) RETURN NUMBER;
This keeps responsibilities clean:
- get_stats computes the core_summary.
- Percentiles operate on the returned frequency table.
- The summary record remains stable and compact.
Complexity
Let:
- N = total number of elements in source array
- D = number of distinct values (rows in frequency table)
Then:
- Frequency table build: O(N log N)
- Mean, median, mode, variance, standard deviation: O(D)
- Percentile: O(D)
Once the frequency table is built, all operations are linear in distinct values only.
Example
In this example, a comma separated list of integers will be passed to get_stats_list which
returns the result in a record of type t_stats_result.
Run the following code in Oracle SQL Developer:
SET SERVEROUTPUT ON
ACCEPT p_list PROMPT "Enter a list of numbers separated by commas"
ACCEPT p_percentile NUMBER PROMPT "Percentile (number > 0 and < 1)?"
DECLARE
v_stats_result util_numeric.t_stats_result;
v_percentile_cont NUMBER;
v_percentile_disc NUMBER;
BEGIN
util_admin.log_message('Input list is ' || '&p_list');
v_stats_result := util_numeric.get_stats_list('&p_list');
util_numeric.display_frequency_table(v_stats_result);
util_numeric.display_stats(v_stats_result);
v_percentile_disc := util_numeric.percentile_disc(v_stats_result.freq_tbl,&p_percentile);
util_admin.log_message('PCT_DISC ('||to_char(&p_percentile,'0.99')||')='||trim(to_char(v_percentile_disc,'9999999990.9999')));
v_percentile_cont := util_numeric.percentile_cont(v_stats_result.freq_tbl,&p_percentile);
util_admin.log_message('PCT_CONT ('||to_char(&p_percentile,'0.99')||')='||trim(to_char(v_percentile_cont,'9999999990.9999')));
EXCEPTION
WHEN OTHERS THEN
util_admin.log_message('See error messages above.');
END;
You will be prompted to enter a list of numbers, followed by a percentile.
Enter a list of numbers separated by commas : 1,3,1,2,1,5,9,1,7,3,2
Percentile (number > 0 and < 1)? : 0.66
The results are displayed below:
Input list is 1,3,1,2,1,5,9,1,7,3,2
----------------------------------------------------------
FREQUENCY TABLE
----------------------------------------------------------
KEY=1 Frequency=4
KEY=2 Frequency=2
KEY=3 Frequency=2
KEY=5 Frequency=1
KEY=7 Frequency=1
KEY=9 Frequency=1
----------------------------------------------------------
STATISTICS
Sum=35
N Total=11
Distinct N=6
Mean=3.1818
Median=2.0000
Mode 1 = 1
Lowest=1
Highest=9
Range=8
Variance Population=6.6942
Variance Sample=7.3636
Standard Deviation Population=2.5873
Standard Deviation Sample=2.7136
Interquartile Range=3.0000
PCT_DISC ( 0.66)=3.0000
PCT_CONT ( 0.66)=3.0000
Conclusion
get_stats defines a small, well-structured statistical application.
In our next blog we will develop an APEX application that uses these statistical functions.


