Building in public
Engineering deep-dives, product thinking, and the occasional rant about why data platforms are broken.
Why AI that compiles beats AI that suggests — every time
The fundamental problem with every AI data tool on the market today is that the AI generates text into a void. There's no type system validating column names, no schema resolver checking that catalog://sales.orders actually exists. The engineer is the compiler. We think that's backwards.
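To make the missing check concrete, here is a minimal Rust sketch of the kind of resolution step a text-only AI tool skips: look the reference up in a catalog before anything reaches the engineer. The `Catalog` type, `resolve` method, and `demo_catalog` data are illustrative inventions, not DataAstra's actual API.

```rust
use std::collections::{HashMap, HashSet};

// Toy catalog: table path -> set of column names.
struct Catalog {
    tables: HashMap<String, HashSet<String>>,
}

impl Catalog {
    // Fail fast if a reference points at a table or column that
    // doesn't exist -- the check a text-into-a-void tool never runs.
    fn resolve(&self, table: &str, column: &str) -> Result<(), String> {
        let cols = self
            .tables
            .get(table)
            .ok_or_else(|| format!("unknown table: {table}"))?;
        if cols.contains(column) {
            Ok(())
        } else {
            Err(format!("unknown column {column} on {table}"))
        }
    }
}

// Hypothetical demo data for the sketch.
fn demo_catalog() -> Catalog {
    let mut tables = HashMap::new();
    tables.insert(
        "catalog://sales.orders".to_string(),
        HashSet::from(["order_id".to_string(), "amount".to_string()]),
    );
    Catalog { tables }
}

fn main() {
    let catalog = demo_catalog();
    assert!(catalog.resolve("catalog://sales.orders", "order_id").is_ok());
    assert!(catalog.resolve("catalog://sales.orders", "revenue").is_err());
}
```

The point of the sketch: when resolution happens before output, a bad reference is an error object, not a pull-request comment.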
Building the PDL compiler: why we chose Rust
PDL is DataAstra's Pipeline Definition Language. It's statically typed, catalog-aware, and lineage-native. Here's why Rust was the only real choice for the compiler, and what Hindley-Milner type inference looks like applied to data pipelines.
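As a taste of what Hindley-Milner means for column types, here is a deliberately tiny unification sketch in Rust, assuming a simplified type universe (`Int`, `Text`, and unknowns). It is not PDL's real type checker, just the core move: an unknown column type gets pinned down by how it's used, and a conflicting later use becomes a compile error.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int,
    Text,
    Var(u32), // unknown, to be solved by unification
}

// Follow substitutions until we reach a concrete type or a free variable.
fn resolve(t: &Ty, subst: &HashMap<u32, Ty>) -> Ty {
    match t {
        Ty::Var(v) => match subst.get(v) {
            Some(bound) => resolve(bound, subst),
            None => t.clone(),
        },
        _ => t.clone(),
    }
}

// Hindley-Milner-style unification over column types.
fn unify(a: &Ty, b: &Ty, subst: &mut HashMap<u32, Ty>) -> Result<(), String> {
    match (resolve(a, subst), resolve(b, subst)) {
        (Ty::Var(v), t) | (t, Ty::Var(v)) => {
            if t != Ty::Var(v) {
                subst.insert(v, t); // bind the unknown to what we learned
            }
            Ok(())
        }
        (x, y) if x == y => Ok(()),
        (x, y) => Err(format!("type mismatch: {x:?} vs {y:?}")),
    }
}

fn main() {
    let mut subst = HashMap::new();
    // Imagine `coalesce(order_total, 0)`: order_total starts unknown,
    // and unifying with the integer literal pins it to Int.
    let order_total = Ty::Var(0);
    unify(&order_total, &Ty::Int, &mut subst).unwrap();
    assert_eq!(resolve(&order_total, &subst), Ty::Int);
    // A later use as Text fails at compile time, not in production.
    assert!(unify(&order_total, &Ty::Text, &mut subst).is_err());
}
```

Rust is a natural host for this style of checker: enums plus exhaustive pattern matching make the "did you handle every type case" question a compiler question.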
Grammar-constrained LLM generation: killing hallucinations at the token level
When you constrain an LLM to generate tokens that form valid PDL syntax, something remarkable happens: it can't hallucinate a column name. Here's the technical approach we use and why it cuts token usage by 40% versus free-form generation.
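The mechanism can be sketched in a few lines of Rust: at each decoding step, intersect the model's preferences with the set of tokens the grammar state allows, and the highest-scoring *legal* token wins. The names and scores below are invented for illustration; real constrained decoding operates on logits over a full vocabulary, not three strings.

```rust
// Toy constrained decoder: mask the model's scores down to the tokens
// the grammar allows, so an out-of-schema column name can never be emitted.
fn pick_token<'a>(
    scores: &[(&'a str, f64)], // model's raw preferences
    allowed: &[&str],          // tokens valid in the current grammar state
) -> Option<&'a str> {
    scores
        .iter()
        .filter(|(tok, _)| allowed.contains(tok))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(tok, _)| *tok)
}

fn main() {
    // The model "wants" a hallucinated column, but at this position the
    // grammar only admits real columns of sales.orders.
    let scores = [("revenue", 0.9), ("amount", 0.6), ("order_id", 0.3)];
    let allowed = ["order_id", "amount"];
    assert_eq!(pick_token(&scores, &allowed), Some("amount"));
}
```

Because the mask is applied before sampling, the hallucination isn't detected and rejected; it is unrepresentable.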
Data contracts as first-class language constructs
Great Expectations is a great tool. But quality checks should live in the pipeline, not beside it. Here's what inline quality contracts look like in PDL and why they change the data quality conversation entirely.
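To show the shape of "checks in the pipeline, not beside it," here is a hedged Rust sketch where a column carries its own contract and the runner enforces it as part of the transform. The `Check`, `Contract`, and `enforce` names are hypothetical, not PDL's actual constructs.

```rust
// Sketch: a quality contract that lives inside the pipeline definition.
enum Check {
    NotNull,
    NonNegative,
}

struct Contract {
    column: &'static str,
    checks: Vec<Check>,
}

// Evaluate every check against a single value; here a value is just
// an optional number, standing in for a real typed cell.
fn enforce(contract: &Contract, value: Option<f64>) -> Result<(), String> {
    for check in &contract.checks {
        match check {
            Check::NotNull if value.is_none() => {
                return Err(format!("{} violates not_null", contract.column));
            }
            Check::NonNegative if value.map_or(false, |v| v < 0.0) => {
                return Err(format!("{} violates non_negative", contract.column));
            }
            _ => {}
        }
    }
    Ok(())
}

fn main() {
    let contract = Contract {
        column: "order_total",
        checks: vec![Check::NotNull, Check::NonNegative],
    };
    assert!(enforce(&contract, Some(42.0)).is_ok());
    assert!(enforce(&contract, None).is_err());
    assert!(enforce(&contract, Some(-1.0)).is_err());
}
```

When the contract is part of the pipeline's own definition, a violated check fails the run at the step that produced the bad value, instead of surfacing in a separate tool's report hours later.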