v0.1.111

Schema-as-Configuration for Data & Streaming

Fastest path from schema drift to safe change: define schemas in HCL, detect drift, generate SQL, and ship database + streaming updates with confidence.

Installation

Linux

curl -fsSL https://schemabounce.github.io/Kolumn-deploy/install.sh | bash

Supports AMD64 and ARM64. Installs to ~/.local/bin/kolumn.

macOS

curl -fsSL https://schemabounce.github.io/Kolumn-deploy/install.sh | bash

Supports Intel and Apple Silicon. Installs to ~/.local/bin/kolumn.

Windows

Option 1: Direct Download

Download the Windows binary from the releases page.

Option 2: PowerShell

irm https://schemabounce.github.io/Kolumn-deploy/install.ps1 | iex

Option 3: Manual

  1. Download from the releases page
  2. Rename to kolumn.exe and add to PATH
  3. Run kolumn version to verify
WSL Notes

  1. Install WSL 2: wsl --install
  2. Open your WSL terminal
  3. Run the Linux install command:

curl -fsSL https://schemabounce.github.io/Kolumn-deploy/install.sh | bash

Define schemas in simple HCL configuration

# main.kl - Define your database schema
provider "postgres" {
  host     = "localhost"
  port     = 5432
  database = "myapp"
  username = var.db_user
  password = var.db_password
}

create "postgres_table" "users" {
  name   = "users"
  schema = "public"

  columns = {
    id = {
      type        = "bigserial"
      primary_key = true
    }

    email = {
      type     = "varchar(255)"
      not_null = true
      unique   = true
    }

    created_at = {
      type    = "timestamptz"
      default = "now()"
    }
  }
}
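The provider block above references var.db_user and var.db_password instead of hard-coding credentials. Assuming Kolumn follows Terraform-style variable syntax (the block names, types, and the `sensitive` attribute here are illustrative assumptions, not confirmed Kolumn syntax), the declarations might look like:

```hcl
# variables.kl - hypothetical variable declarations (Terraform-style syntax assumed)
variable "db_user" {
  type    = string
  default = "postgres"
}

variable "db_password" {
  type      = string
  sensitive = true   # keep the credential out of plan output and logs
}
```

Values would then be supplied per environment rather than committed alongside the schema.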
What the Kolumn snippet does

From HCL to SQL with drift detection, risk scoring, controlled apply, and state.

1) Compare

kolumn plan diffs HCL vs live DB, finds drift/unmanaged objects.

2) SQL generation

Builds ordered DDL (CREATE/ALTER) plus rollback SQL for every step.

3) Outputs

Shows a plan summary: create table users, add unique index on email, etc.

Risk scoring

Each change is labeled low/medium/high (drops, rewrites, locking changes).

Guards

High-risk steps call out why: "Drop column email is destructive", "Rewrite table".

Approval cue

You decide to proceed, edit config, or split the change before applying.

Execution plan

Runs the generated SQL in order with transactional safety where supported.

Safety checks

Stops on errors, respects lock-sensitive operations, surfaces rollback SQL.

Outcome

Reports success or failure per statement and warns when an apply completes only partially.

State file

Kolumn records the post-apply state (tables, columns, checks, indexes).

Next run

Future plan runs diff against state + live DB to find new drift.

Audit

Plan/apply logs with timestamps and SQL are kept for review/compliance.

MSSQL → Snowflake streaming replication

Stream CDC from MSSQL into Snowflake with auto-schema creation, merge batching, and drift-safe routing filters.

Kolumn config
# main.kl - stream CDC from MSSQL into Snowflake (end-to-end)
provider "mssql" {
  host     = "mssql.internal"
  database = "orders"
  username = var.mssql_user
  password = var.mssql_password
}

provider "snowflake" {
  account   = var.snowflake_account
  database  = "ANALYTICS"
  schema    = "CDC"
  warehouse = "COMPUTE_WH"
  role      = "CDC_SYNC"
}

# Source table with PII column
create "mssql_table" "users" {
  name   = "users"
  schema = "dbo"

  columns = {
    id = {
      type        = "uniqueidentifier"
      default     = "NEWSEQUENTIALID()"
      primary_key = true
    }
    email = {
      type     = "varchar(255)"
      not_null = true
      unique   = true
    }
    created_at = {
      type    = "datetime2"
      default = "SYSUTCDATETIME()"
    }
  }
}

# Snowflake roles/users and masking for PII
create "snowflake_role" "cdc_consumer" {
  name = "CDC_CONSUMER"
}

create "snowflake_user" "cdc_sync_user" {
  name         = "CDC_SYNC_USER"
  default_role = snowflake_role.cdc_consumer.name
  password     = var.snowflake_password
}

create "snowflake_masking_policy" "mask_email" {
  name = "MASK_EMAIL"
  body = "CASE WHEN CURRENT_ROLE() = 'CDC_CONSUMER' THEN email ELSE REGEXP_REPLACE(email, '(.).+(@.*)', '\\\\1***\\\\2') END"
}

# Snowflake landing table with masking applied
create "snowflake_table" "users_replica" {
  database = "ANALYTICS"
  schema   = "CDC"
  name     = "USERS"

  columns = {
    id = {
      type        = "BINARY(16)"
      primary_key = true
    }
    email = {
      type           = "VARCHAR"
      masking_policy = snowflake_masking_policy.mask_email.name
    }
    created_at = {
      type = "TIMESTAMP_NTZ"
    }
  }
}

create "stream_sink" "snowflake_replica" {
  type = "snowflake"
  connection_info = {
    account   = var.snowflake_account
    user      = "CDC_SYNC_USER"
    password  = var.snowflake_password
    database  = "ANALYTICS"
    schema    = "CDC"
    warehouse = "COMPUTE_WH"
  }
  batch_mode    = "merge"
  buffer_size   = 1000
  create_tables = true
}

create "stream_route" "mssql_orders_to_snowflake" {
  source = {
    type        = "mssql_cdc"
    database    = "orders"
    slot_name   = "cdc_orders_slot"
    publication = "cdc_orders_publication"
  }

  sink = stream_sink.snowflake_replica.id

  filter {
    event_types = ["DML"]
    conditions  = ["schema NOT IN ('cdc', 'sys')"]
  }

  throughput_targets {
    target_tps   = 1000
    worker_count = 1
  }

  # Optional: allow DDL table creates from CDC (if desired)
  # filter {
  #   event_types = ["DDL", "DML"]
  #   conditions  = ["operation IN ('CREATE TABLE','ALTER TABLE','DROP TABLE')"]
  # }
}

PII columns (e.g., email) can be classified in Kolumn so Snowflake masking/role grants apply on arrival.
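For example, the source column could carry a classification tag that drives the masking policy and role grants downstream. The `classification` attribute below is a hypothetical sketch, not confirmed Kolumn syntax:

```hcl
# Hypothetical sketch (attribute name is an assumption): tagging the
# MSSQL source column as PII so MASK_EMAIL and the CDC_CONSUMER grant
# are applied when the column lands in Snowflake.
email = {
  type           = "varchar(255)"
  not_null       = true
  classification = "pii"
}
```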

Data flow: MSSQL CDC to Snowflake

How events travel from source tables to Snowflake with Kolumn + streaming.

1) MSSQL CDC

Source changes captured from orders via MSSQL CDC.

2) SchemaBounce

Events flow through the SchemaBounce streaming service, aligned to Kolumn config.

3) Snowflake

CDC rows land in ANALYTICS.CDC; downstream models consume curated views.

Classifications

Kolumn tags columns (PII/financial) and propagates policies to Snowflake.

Masking/roles

Snowflake roles, masking policies, and access filters generated from HCL.

Lineage

Data objects track source → warehouse lineage for audits and impact analysis.

Metrics

Per-step throughput, latency, retries, and queue depth exported to Grafana.

DLQ

Failed events land in a DLQ with filters and retry paths; events can be exported for forensics.

Audit/state

Plan/apply logs + Kolumn state track schema and migration history end-to-end.

Supported Providers

Create + discover + state/RPC wiring across databases, warehouses, lakehouses, and NoSQL/time-series engines.

  PostgreSQL (Database)
  MySQL (Database)
  SQLite (Database)
  CockroachDB (Database)
  MSSQL (Database)
  Snowflake (Warehouse)
  BigQuery (Warehouse)
  Databricks (Lakehouse)
  Redshift (Warehouse)
  DuckDB (OLAP)
  MongoDB (NoSQL)
  DynamoDB (NoSQL)
  InfluxDB (Time-series)

Why Kolumn?

Schema Drift Detection

Compare your config against live databases. Instantly see what changed, what's unmanaged, and what needs attention.

Safe SQL Generation

Review generated SQL before applying. Risk scoring highlights potentially dangerous changes so you can proceed with confidence.

Multi-Environment

Manage dev, staging, and production with the same config. Variables and environment files keep credentials separate.
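A minimal sketch of what per-environment variable files could contain. The file names, extension, and assignment syntax here are assumptions modeled on Terraform's tfvars convention; Kolumn's actual file naming may differ:

```hcl
# dev.klvars (hypothetical file name)
db_host = "localhost"
db_user = "dev_user"

# prod.klvars (hypothetical file name)
db_host = "db.prod.internal"
db_user = "deploy"
```

The same main.kl then applies unchanged across environments, with only the variable file swapped per target.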

Rollback Support

Plans include rollback notes: simple creates map to drops; complex changes (type rewrites) include guidance and may need backups for full recovery.

CI/CD Ready

Run plan in CI to catch drift early. Block risky PRs before they hit production. Integrate with GitHub Actions, GitLab CI, or any pipeline.

Audit Trail

Every compare and apply is logged. Know who changed what and when for compliance and debugging.

Transformation Migrations

Structured forward/backward migrations (no raw SQL): prechecks, provider-run steps, rollback paths, and optional quarantine/backup hints recorded in state.

Risk Scoring & Guardrails

Plans flag high-risk steps (drops/rewrites), enforce prechecks, and let you pause, approve, or split changes before apply.

Provider Coverage

13 engines supported with create + discover + state/RPC wiring: Postgres, MySQL, MSSQL, SQLite, CockroachDB, Snowflake, BigQuery, Databricks, Redshift, DuckDB, MongoDB, DynamoDB, InfluxDB.