SkillAgentSearch skills...

Stairlight

A data lineage tool detects table dependencies from rendered SQL statements.

Install / Use

/learn @tosh2230/Stairlight

README

<div align="center"> <img src="https://raw.githubusercontent.com/tosh2230/stairlight/main/img/stairlight_white.png" width="400" alt="Stairlight"> </div>

Stairlight

PyPi Version PyPi License PyPi Python Versions Code style: black CI

Stairlight is a data lineage tool, detects table dependencies from rendered SQL statements.

<div align="left"> <img src="https://raw.githubusercontent.com/tosh2230/stairlight/main/img/drawio/concepts.drawio.png" width="1080" alt="concepts"> </div>

Supported Data Sources

| Data Source | Remarks | | --- | --- | | Local file system | Python Pathlib module | | Amazon S3 | Available for Amazon Managed Workflows for Apache Airflow (MWAA) | | Google Cloud Storage | Available for Google Cloud Composer | | dbt - Google BigQuery | Using dbt compile command internally | | Redash | |

Installation

This package is distributed on PyPI.

# The base package is for local file system only.
$ pip install stairlight

# Set extras when detecting from other data sources.
# e.g. Amazon S3 and Google Cloud Storage
$ pip install "stairlight[s3, gcs]"

| Data Source | TemplateSourceType | Extra | | --- | --- | --- | | Local file system | File | - | | Amazon S3 | S3 | s3 | | Google Cloud Storage | GCS | gcs | | dbt - Google Bigquery | dbt | dbt-bigquery | | Redash | Redash | redash |

Getting Started

There are 3 steps to use.

# 1: Initialize and set your data source settings
$ stairlight init

# 2: Map your SQL statements and tables
$ stairlight map

# 3: Get table dependencies
$ stairlight

Description

Input

  • SQL statements
  • Configuration YAML files
    • stairlight.yaml: SQL statements locations and include/exclude conditions.
    • mapping.yaml: For mapping SQL statements and tables.

Output

Stairlight outputs table dependencies as JSON format.

Top-level keys are table names, and values represents tables that are the data source for each key's table.

<details> <summary>Example</summary>
{
  "test_project.beam_streaming.taxirides_aggregation": {
    "test_project.beam_streaming.taxirides_realtime": {
      "TemplateSourceType": "File",
      "Key": "tests/sql/union_same_table.sql",
      "Uri": "/foo/bar/stairlight/tests/sql/union_same_table.sql",
      "Lines": [
        {
          "LineNumber": 6,
          "LineString": "    test_project.beam_streaming.taxirides_realtime"
        },
        {
          "LineNumber": 15,
          "LineString": "    test_project.beam_streaming.taxirides_realtime"
        }
      ]
    }
  },
  "PROJECT_a.DATASET_b.TABLE_c": {
    "PROJECT_A.DATASET_A.TABLE_A": {
      "TemplateSourceType": "GCS",
      "Key": "sql/one_line/one_line.sql",
      "Uri": "gs://stairlight/sql/one_line/one_line.sql",
      "Lines": [
        {
          "LineNumber": 1,
          "LineString": "SELECT * FROM PROJECT_A.DATASET_A.TABLE_A WHERE 1 = 1"
        }
      ],
      "BucketName": "stairlight",
      "Labels": {
        "Source": null,
        "Test": "a"
      }
    }
  },
  "AggregateSales": {
    "PROJECT_e.DATASET_e.TABLE_e": {
      "TemplateSourceType": "Redash",
      "Key": 5,
      "Uri": "AggregateSales",
      "Lines": [
        {
          "LineNumber": 1,
          "LineString": "SELECT service, SUM(total_amount) FROM PROJECT_e.DATASET_e.TABLE_e GROUP BY service"
        }
      ],
      "DataSourceName": "BigQuery",
      "Labels": {
        "Category": "Sales"
      }
    }
  },
  "dummy.dummy.example_b": {
    "PROJECT_t.DATASET_t.TABLE_t": {
      "TemplateSourceType": "dbt",
      "Key": "tests/dbt/project_01/target/compiled/project_01/models/b/example_b.sql",
      "Uri": "/foo/bar/stairlight/tests/dbt/project_01/target/compiled/project_01/models/b/example_b.sql",
      "Lines": [
        {
          "LineNumber": 1,
          "LineString": "select * from PROJECT_t.DATASET_t.TABLE_t where value_a = 0 and value_b = 0"
        }
      ]
    }
  },
  "PROJECT_as.DATASET_bs.TABLE_cs": {
    "PROJECT_A.DATASET_A.TABLE_A": {
      "TemplateSourceType": "S3",
      "Key": "sql/one_line/one_line.sql",
      "Uri": "s3://stairlight/sql/one_line/one_line.sql",
      "Lines": [
        {
          "LineNumber": 1,
          "LineString": "SELECT * FROM PROJECT_A.DATASET_A.TABLE_A WHERE 1 = 1"
        }
      ],
      "BucketName": "stairlight",
      "Labels": {
        "Source": null,
        "Test": "a"
      }
    }
  }
}
</details>

Collecting patterns

Centralization

<div align="left"> <img src="https://raw.githubusercontent.com/tosh2230/stairlight/main/img/drawio/centralization.drawio.png" width="800" alt="centralization"> </div>

Agents

<div align="left"> <img src="https://raw.githubusercontent.com/tosh2230/stairlight/main/img/drawio/agents.drawio.png" width="800" alt="agents"> </div>

Configuration

Examples can be found here, used for unit testing in CI.

stairlight.yaml

'stairlight.yaml' is for setting up Stairlight itself. It is responsible for specifying SQL statements to be read.

stairlight init creates a template of stairlight.yaml.

<details> <summary>Example</summary>
Include:
  - TemplateSourceType: File
    FileSystemPath: ./tests/sql
    Regex: .*/*\.sql$
    DefaultTablePrefix: "PROJECT_A"
  - TemplateSourceType: GCS
    ProjectId: null
    BucketName: stairlight
    Regex: ^sql/.*/*\.sql$
    DefaultTablePrefix: "PROJECT_A"
  - TemplateSourceType: Redash
    DatabaseUrlEnvironmentVariable: REDASH_DATABASE_URL
    DataSourceName: BigQuery
    QueryIds:
      - 1
      - 3
      - 5
  - TemplateSourceType: dbt
    ProjectDir: tests/dbt/project_01
    ProfilesDir: tests/dbt
    Vars:
      key_a: value_a
      key_b: value_b
  - TemplateSourceType: S3
    BucketName: stairlight
    Regex: ^sql/.*/*\.sql$
    DefaultTablePrefix: "PROJECT_A"
Exclude:
  - TemplateSourceType: File
    Regex: main/exclude\.sql$
Settings:
  MappingFilesRegex:
    - .*/mapping\_file\.yaml$
    - .*/mapping\_gcs\.yaml$
    - .*/mapping\_dbt\.yaml$
    - .*/mapping\_s3\.yaml$
  # Deprecated from v0.7.2
  MappingPrefix: "mapping"
</details>

mapping.yaml

'mapping.yaml' is used to define relationships between input SELECT statements and tables.

stairlight map creates a template of mapping.yaml and attempts to read from data sources specified in stairlight.yaml. If successfully read, it outputs settings that have not yet configured in an existing 'mapping.yaml' file.

<details> <summary>Example</summary>
Global:
  Parameters:
    DESTINATION_PROJECT: stairlight
    params:
      PROJECT: 1234567890
      DATASET: public
      TABLE: taxirides
Mapping:
  - TemplateSourceType: File
    FileSuffix: "tests/sql/union_same_table.sql"
    Tables:
      - TableName: "test_project.beam_streaming.taxirides_aggregation"
        Parameters:
          params:
            source_table: source
            destination_table: destination
        IgnoreParameters:
          - execution_date.add(days=1).isoformat()
  - TemplateSourceType: GCS
    Uri: "gs://stairlight/sql/one_line/one_line.sql"
    Tables:
      - TableName: "PROJECT_a.DATASET_b.TABLE_c"
  - TemplateSourceType: Redash
    QueryId: 5
    DataSourceName: metadata
    Tables:
      - TableName: New Query
        Parameters:
          table: dashboards
        Labels:
          Category: Redash test
  - TemplateSourceType: dbt
    ProjectName: project_01
    FileSuffix: tests/dbt/project_01/target/compiled/project_01/models/example/my_first_dbt_model.sql
    Tables:
      - TableName: dummy.dummy.my_first_dbt_model
  - TemplateSourceType: S3
    Uri: "s3://stairlight/sql/one_line/one_line.sql"
    Tables:
      - TableName: "PROJECT_as.DATASET_bs.TABLE_cs"
ExtraLabels:
  - TableName: "PROJECT_A.DATASET_A.TABLE_A"
    Labels:
      Source: Null
      Test: a
</details>

Global Section

This section is for global configurations.

Parameters is used to set common parameters. If conflicts has occurred with Parameters in mapping section, mapping section's parameters will be used in preference to global.

Mapping Section

Mapping section is used to define relationships between input SELECT statements and tables that created as a result of query execution.

Parameters allows you to reflect settings in jinja template variables embedded in statements. If multiple settings are applied to a statement using jinja template, the statement will be read as if there were the same number of queries as the number of settings.

In contrast, IgnoreParameters handles a list to ignore when rendering queries.

Extra labels Section

This section sets labels to tables that appears only in queries.

Arguments and Options

$ stairlight --help
usage: stairlight [-h] [-c CONFIG] [--save SAVE] [--load LOAD] {init,check,up,down} ...

An end-to-end data lineage tool, detects table dependencies by SQL SELECT statements.
Without positional arguments, return a table dependency map as JSON format.

positional a
View on GitHub
GitHub Stars30
CategoryData
Updated17d ago
Forks0

Languages

Python

Security Score

95/100

Audited on Mar 14, 2026

No findings