Relation
- Visualization graph consists of
- nodes, we call them endpoints
- edges, we call them relations
- relation means, that by executing SQL code, data is
- copied or transformed from source endpoint to target endpoint, or:
- target endpoint is influenced by source endpoint, but the value is not directly derived from source
Relation types: direct, semidirect and indirect
Direct lineage example
Consider this SQL command:
INSERT INTO TARGET_TABLE SELECT COLUMN1, COLUMN2 FROM SOURCE_TABLE
- There are two direct relations:
- FROM
COLUMN1
of SOURCE_TABLE
to first column of TARGET_TABLE
- FROM
COLUMN2
of SOURCE_TABLE
to second column of TARGET_TABLE
Semidirect lineage example
INSERT INTO TARGET_TABLE SELECT CASE WHEN A = B AND C = 1 THEN D END FROM SOURCE_TABLE
-
There are three semidirect relations:
- FROM
A
of SOURCE_TABLE
to first column of TARGET_TABLE
- FROM
B
of SOURCE_TABLE
to first column of TARGET_TABLE
- FROM
C
of SOURCE_TABLE
to first column of TARGET_TABLE
-
And there is one direct relation:
- FROM
D
of SOURCE_TABLE
to first column of TARGET_TABLE
-
See Glossary for more info about semidirect relation
Indirect lineage example
Consider this SQL command:
INSERT INTO TARGET_TABLE SELECT COLUMN1 FROM SOURCE_TABLE WHERE COLUMN2 = 'VALUE'
- There is an indirect relation FROM
COLUMN2
of SOURCE_TABLE
to TARGET_TABLE
Note
- The direct relation connects column endpoint to column endpoint, whereas the indirect relation connects column endpoint to table endpoint
- Direct relation means, that data is copied or transformed from source column and saved into target column.
- Indirect relation means, that target doesn't neccessarily contain the source data, but its presence is somehow influenced by the source column - typically by filtering and aggregating by its value.