v0.1.111

Schema-as-Configuration for Data & Streaming

Fastest path from schema drift to safe change: define schemas in HCL, detect drift, generate SQL, and ship database + streaming updates with confidence.

Installation

Linux

curl -fsSL https://schemabounce.github.io/Kolumn-deploy/install.sh | bash

Supports AMD64 and ARM64. Installs to ~/.local/bin/kolumn.

macOS

curl -fsSL https://schemabounce.github.io/Kolumn-deploy/install.sh | bash

Supports Intel and Apple Silicon. Installs to ~/.local/bin/kolumn.

Windows

Option 1: Direct Download

Download the Windows binary from the releases page.

Option 2: PowerShell

irm https://schemabounce.github.io/Kolumn-deploy/install.ps1 | iex

Option 3: Manual

  1. Download from the releases page
  2. Rename to kolumn.exe and add to PATH
  3. Run kolumn version to verify
WSL Notes

  1. Install WSL 2: wsl --install
  2. Open your WSL terminal
  3. Run the Linux install command:

curl -fsSL https://schemabounce.github.io/Kolumn-deploy/install.sh | bash

Define schemas in simple HCL configuration

# main.kl - Define your database schema
provider "postgres" {
  host     = "localhost"
  port     = 5432
  database = "myapp"
  username = var.db_user
  password = var.db_password
}

create "postgres_table" "users" {
  name   = "users"
  schema = "public"

  columns = {
    id = {
      type        = "bigserial"
      primary_key = true
    }

    email = {
      type     = "varchar(255)"
      not_null = true
      unique   = true
    }

    created_at = {
      type    = "timestamptz"
      default = "now()"
    }
  }
}
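The provider block above references var.db_user and var.db_password instead of hard-coding credentials. Assuming Kolumn follows Terraform-style variable syntax (the block names, types, and the `sensitive` attribute here are illustrative assumptions, not confirmed Kolumn syntax), the declarations might look like:

```hcl
# variables.kl - hypothetical variable declarations (Terraform-style syntax assumed)
variable "db_user" {
  type    = string
  default = "postgres"
}

variable "db_password" {
  type      = string
  sensitive = true   # keep the credential out of plan output and logs
}
```

Values would then be supplied per environment rather than committed alongside the schema.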
What the Kolumn snippet does

From HCL to SQL with drift detection, risk scoring, controlled apply, and state.

1) Compare

kolumn plan diffs HCL vs live DB, finds drift/unmanaged objects.

2) SQL generation

Builds ordered DDL (CREATE/ALTER) plus rollback SQL for every step.

3) Outputs

Shows a plan summary: create table users, add unique index on email, etc.

Risk scoring

Each change is labeled low/medium/high (drops, rewrites, locking changes).

Guards

High-risk steps call out why: "Drop column email is destructive", "Rewrite table".

Approval cue

You decide to proceed, edit config, or split the change before applying.

Execution plan

Runs the generated SQL in order with transactional safety where supported.

Safety checks

Stops on errors, respects lock-sensitive operations, surfaces rollback SQL.

Outcome

Reports success or failure per statement and warns when an apply completes only partially.

State file

Kolumn records the post-apply state (tables, columns, checks, indexes).

Next run

Future plan runs diff against state + live DB to find new drift.

Audit

Plan/apply logs with timestamps and SQL are kept for review/compliance.

MSSQL → Snowflake streaming replication

Stream CDC from MSSQL into Snowflake with auto-schema creation, merge batching, and drift-safe routing filters.

Kolumn config
# main.kl - stream CDC from MSSQL into Snowflake (end-to-end)
provider "mssql" {
  host     = "mssql.internal"
  database = "orders"
  username = var.mssql_user
  password = var.mssql_password
}

provider "snowflake" {
  account   = var.snowflake_account
  database  = "ANALYTICS"
  schema    = "CDC"
  warehouse = "COMPUTE_WH"
  role      = "CDC_SYNC"
}

# Source table with PII column
create "mssql_table" "users" {
  name   = "users"
  schema = "dbo"

  columns = {
    id = {
      type        = "uniqueidentifier"
      default     = "NEWSEQUENTIALID()"
      primary_key = true
    }
    email = {
      type     = "varchar(255)"
      not_null = true
      unique   = true
    }
    created_at = {
      type    = "datetime2"
      default = "SYSUTCDATETIME()"
    }
  }
}

# Snowflake roles/users and masking for PII
create "snowflake_role" "cdc_consumer" {
  name = "CDC_CONSUMER"
}

create "snowflake_user" "cdc_sync_user" {
  name         = "CDC_SYNC_USER"
  default_role = snowflake_role.cdc_consumer.name
  password     = var.snowflake_password
}

create "snowflake_masking_policy" "mask_email" {
  name = "MASK_EMAIL"
  body = "CASE WHEN CURRENT_ROLE() = 'CDC_CONSUMER' THEN email ELSE REGEXP_REPLACE(email, '(.).+(@.*)', '\\\\1***\\\\2') END"
}

# Snowflake landing table with masking applied
create "snowflake_table" "users_replica" {
  database = "ANALYTICS"
  schema   = "CDC"
  name     = "USERS"

  columns = {
    id = {
      type        = "BINARY(16)"
      primary_key = true
    }
    email = {
      type           = "VARCHAR"
      masking_policy = snowflake_masking_policy.mask_email.name
    }
    created_at = {
      type = "TIMESTAMP_NTZ"
    }
  }
}

create "stream_sink" "snowflake_replica" {
  type = "snowflake"
  connection_info = {
    account   = var.snowflake_account
    user      = "CDC_SYNC_USER"
    password  = var.snowflake_password
    database  = "ANALYTICS"
    schema    = "CDC"
    warehouse = "COMPUTE_WH"
  }
  batch_mode    = "merge"
  buffer_size   = 1000
  create_tables = true
}

create "stream_route" "mssql_orders_to_snowflake" {
  source = {
    type        = "mssql_cdc"
    database    = "orders"
    slot_name   = "cdc_orders_slot"
    publication = "cdc_orders_publication"
  }

  sink = stream_sink.snowflake_replica.id

  filter {
    event_types = ["DML"]
    conditions  = ["schema NOT IN ('cdc', 'sys')"]
  }

  throughput_targets {
    target_tps   = 1000
    worker_count = 1
  }

  # Optional: allow DDL table creates from CDC (if desired)
  # filter {
  #   event_types = ["DDL", "DML"]
  #   conditions  = ["operation IN ('CREATE TABLE','ALTER TABLE','DROP TABLE')"]
  # }
}

PII columns (e.g., email) can be classified in Kolumn so Snowflake masking/role grants apply on arrival.
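For example, the source column could carry a classification tag that drives the masking policy and role grants downstream. The `classification` attribute below is a hypothetical sketch, not confirmed Kolumn syntax:

```hcl
# Hypothetical sketch (attribute name is an assumption): tagging the
# MSSQL source column as PII so MASK_EMAIL and the CDC_CONSUMER grant
# are applied when the column lands in Snowflake.
email = {
  type           = "varchar(255)"
  not_null       = true
  classification = "pii"
}
```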

Data flow: MSSQL CDC to Snowflake

How events travel from source tables to Snowflake with Kolumn + streaming.

1) MSSQL CDC

Source changes captured from orders via MSSQL CDC.

2) SchemaBounce

Events flow through the SchemaBounce streaming service, aligned to Kolumn config.

3) Snowflake

CDC rows land in ANALYTICS.CDC; downstream models consume curated views.

Classifications

Kolumn tags columns (PII/financial) and propagates policies to Snowflake.

Masking/roles

Snowflake roles, masking policies, and access filters generated from HCL.

Lineage

Data objects track source → warehouse lineage for audits and impact analysis.

Metrics

Per-step throughput, latency, retries, and queue depth exported to Grafana.

DLQ

Failed events land in a DLQ with filters and retry paths; events can be exported for forensics.

Audit/state

Plan/apply logs + Kolumn state track schema and migration history end-to-end.

Supported Providers

Create + discover + state/RPC wiring across databases, warehouses, lakehouses, and NoSQL/time-series engines.

  PostgreSQL (Database)
  MySQL (Database)
  SQLite (Database)
  CockroachDB (Database)
  MSSQL (Database)
  Snowflake (Warehouse)
  BigQuery (Warehouse)
  Databricks (Lakehouse)
  Redshift (Warehouse)
  DuckDB (OLAP)
  MongoDB (NoSQL)
  DynamoDB (NoSQL)
  InfluxDB (Time-series)

Why Kolumn?

Schema Drift Detection

Compare your config against live databases. Instantly see what changed, what's unmanaged, and what needs attention.

Safe SQL Generation

Review generated SQL before applying. Risk scoring highlights potentially dangerous changes so you can proceed with confidence.

Multi-Environment

Manage dev, staging, and production with the same config. Variables and environment files keep credentials separate.
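A minimal sketch of what per-environment variable files could contain. The file names, extension, and assignment syntax here are assumptions modeled on Terraform's tfvars convention; Kolumn's actual file naming may differ:

```hcl
# dev.klvars (hypothetical file name)
db_host = "localhost"
db_user = "dev_user"

# prod.klvars (hypothetical file name)
db_host = "db.prod.internal"
db_user = "deploy"
```

The same main.kl then applies unchanged across environments, with only the variable file swapped per target.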

Rollback Support

Plans include rollback notes: simple creates map to drops; complex changes (type rewrites) include guidance and may need backups for full recovery.

CI/CD Ready

Run plan in CI to catch drift early. Block risky PRs before they hit production. Integrate with GitHub Actions, GitLab CI, or any pipeline.

Audit Trail

Every compare and apply is logged. Know who changed what and when for compliance and debugging.

Transformation Migrations

Structured forward/backward migrations (no raw SQL): prechecks, provider-run steps, rollback paths, and optional quarantine/backup hints recorded in state.

Risk Scoring & Guardrails

Plans flag high-risk steps (drops/rewrites), enforce prechecks, and let you pause, approve, or split changes before apply.

Provider Coverage

13 engines supported with create + discover + state/RPC wiring: Postgres, MySQL, MSSQL, SQLite, CockroachDB, Snowflake, BigQuery, Databricks, Redshift, DuckDB, MongoDB, DynamoDB, InfluxDB.