LucidDbAppLib PENULTIMATE VALUES

From Eigenpedia

Jump to: navigation, search

Contents

Syntax

select * from table(
    applib.penultimate_values(
        cursor( SQL_QUERY ),
        row( GROUPING_COLUMNS ),
        row( DESIGNATED_VALUE_AND_TIMESTAMP )
    )
);

Purpose

The Penultimate values UDX takes in a sorted input table which has a timestamp column, a designated column, and grouping column(s), among other columns, and returns a table with the same columns as the input table, plus an additional timestamp column. One row per group is outputted - it contains the next to last value change for each column other than the grouping column and the timestamp column. Since each column does not necessarily change per row, the output row for a grouping may not be an actual row in the input table. The timestamp column contains the timestamp value taken from the input row for which the next to last value change occurred for the designated column. The additional timestamp column is calculated by looking at the last value change for the designated column and taking the timestamp from that row in the input table.

Parameters

Input:

  • SQL_QUERY: rows to be aggregated, presorted on GROUPING_COLUMNS plus timestamp value from DESIGNATED_VALUE_AND_TIMESTAMP.
  • GROUPING_COLUMNS: - one or more columns of the input which are used to form the key of the output (input rows will be grouped accordingly)
  • DESIGNATED_VALUE_AND_TIMESTAMP: - exactly two columns from the input; the first (of any datatype) is used for determining which rows timestamps are taken from; the second selects the timestamp column

Output:

  • a table with the same columns as the input table, plus one additional UNTIL_TIMESTAMP column of type TIMESTAMP

Calculation:

The output contains one row per distinct key from GROUPING_COLUMNS, in the same order as the input.

  • For GROUPING_COLUMNS, the output values for a given group are the same as the input values (which are the same for the entire group).
  • For the timestamp column, the output value for a given group is determined as follows. First, find the penultimate value of the designated column. (If the group contains only one value for the designated column, then treat that as the penultimate value.) Next, find the last run of contiguous rows within the group which have that penultimate value for the designated column. Take the timestamp from the first row within that run.
  • For the UNTIL_TIMESTAMP column, the output value for a given group is determined by taking the timestamp value from the next row following the penultimate value run defined in the previous bullet. However, if the group contains only one distinct value for the designated column, then return NULL for UNTIL_TIMESTAMP instead.
  • For any other column, the output value for a given group is equal to the penultimate value in the input for that group according to the timestamp ordering. If the group contains only one value for that column, then the output value is that unique value.

Boundary cases:

  • Empty input will return an empty table with the additional UNTIL_TIMESTAMP column
  • There is no special handling for null values in columns other than the timestamp column
    • This means that if a penultimate value for a column is a null, it will be outputted as is.
    • The same goes for nulls in the grouping column - null is a valid group
  • all string fields are right trimmed prior to comparison for equality. This means that 'Field' is the same as 'Field ', and strings that consist of spaces are equivalent to the empty string.

Exceptions:

A fatal exception is thrown under any of the following conditions:

  • DESIGNATED_VALUE_AND_TIMESTAMP is not a row of exactly two columns
  • Second column of DESIGNATED_VALUE_AND_TIMESTAMP is not a TIMESTAMP
  • Input table not presorted correctly in ascending order of group keys plus timestamp (as defined by SQL:2003 ORDER BY semantics, with NULLS FIRST, and PAD SPACE collation characteristic for group keys of VARCHAR datatype)
    • for rules related to NULLS FIRST ordering, see SQL:2003 Part 2 Section 10.10 General Rule 1.e
    • for specifics of VARCHAR with PAD SPACE, see SQL:2003 Part 2 Section 4.2.2
  • If any of the columns in DESIGNATED_VALUE_AND_TIMESTAMP are in GROUPING_COLUMNS
  • If the timestamp input column (2nd column of DESIGNATED_VALUE_AND_TIMESTAMP) contains a NULL value

Example

select 
    "OpportunityId", 
    "OpportunityChangeTimestamp" as "PreviousStageStart",
    "Stage" as "PreviousStage", 
    "Amount" as "PreviousAmount",
    "ExpectedRevenue" as "PreviousExpectedRevenue" 
    until_timestamp as "PreviousStageEnd"
from table(
    penultimate_values(
        cursor(
            select * from opportunity_change_history
            order by "OpportunityId", "OpportunityChangeTimestamp"
        ), 
        row("OpportunityId"),
        row("Stage", "OpportunityChangeTimestamp")));
opportunity_change_history input table
OpportunityId OpportunityChangeTimestamp Stage Amount ExpectedRevenue
(null) 1800-01-01 00:00:00 (null) 0 (null)
(null) 1802-01-01 00:00:00 (null) (null) 0
A 2007-10-12 12:23:41 F 100 (null)
A 2007-10-14 00:00:00 F 150 100
A 2007-12-01 09:00:00 G 150 200
A 2007-12-01 15:00:00 F (null) 100
A 2007-12-29 11:45:00 H 200 100
B (null) (null) 10 (null)
B 2007-09-09 05:00:23 F 10 90
C 2008-01-01 07:27:00 G 900 2000
D 2007-12-04 16:56:09 F 10 15
D 2007-12-05 00:00:00 G 10 15
D 2007-12-05 01:00:00 H 15 15
D 2007-12-05 05:00:00 H 20 20


Output table
OpportunityId PreviousStageStart PreviousStage PreviousAmount PreviousExpectedRevenue PreviousStageEnd
(null) 1800-01-01 00:00:00 (null) 0 (null) (null)
A 2007-12-01 15:00:00 F (null) 200 2007-12-29 11:45:00
B (null) (null) 10 (null) 2007-09-09 05:00:23
C 2008-01-01 07:27:00 G 900 2000 (null)
D 2007-12-05 00:00:00 G 15 15 2007-12-05 01:00:00

Source Code

http://p4web.eigenbase.org/open/dev/luciddb/src/com/lucidera/luciddb/applib/cursor/PenultimateValuesUdx.java

Personal tools