Context
We are Sustaain, a data-for-sustainability startup building a sustainability intelligence platform for complex supply chains such as coffee and cocoa, with an initial focus on data management for European Union Deforestation Regulation (EUDR) compliance and risk management.
Over the next 6 months, we’re shipping production-grade compliance workflows: portal UX + APIs, traceability + geospatial services, auditability, and document ingestion & automation as a first-class module. This work sits at the heart of the unified compliance operations vision we’re building with large commodity traders.
Role Overview
We are looking for an AI Engineer (LLM-centered) to industrialize how we extract, reconcile, and validate compliance evidence from messy documents and semi-structured sources.
You will report to our Science Lead and work closely with the engineering and delivery teams to bring LLM capabilities into robust production systems: evaluation, monitoring, security, cost control, and tight integration with data models and workflows.
Responsibilities
- Build and ship LLM-powered pipelines for document ingestion & automation (think receipts, certificates, supporting evidence), including extraction, classification, normalization, and reconciliation with existing data.
- Design RAG, agentic, and hybrid systems (rules + ML + LLM) that are auditable, deterministic where needed, and resilient to edge cases.
- Own evaluation: gold datasets, offline benchmarks, regression tests, quality gates, and human-in-the-loop review workflows.
- Implement production guardrails: PII handling, redaction, prompt injection resistance, secure retrieval, and traceability of model outputs (who/what/when/why).
- Optimize for latency and cost (caching, chunking strategies, batching, model selection, fallbacks).
- Collaborate with product and data teams to turn compliance requirements into “shippable” workflows (review screens, evidence packs, audit trails).
Required Skills & Experience
- Strong software engineering fundamentals (Python), production mindset, and a track record of shipping.
- Hands-on experience with LLM systems in production (tool use, structured extraction, function calling, evals).
- Solid understanding of data engineering basics (schemas, validation, pipelines, observability).
- Comfort with ambiguity: you can define the problem, measure it, and iterate fast.
- You care about correctness and accountability (this is compliance, not content).
Nice-to-Have