Open-source workflow layer for Codex and Claude Code

Agentic ML without the mess.

Sennen turns AI coding agents into disciplined ML workflow agents: target definitions, leakage-safe splits, versioned data, tracked experiments, baselines, explanations, and reviewable artifacts.

Codex · Claude Code · Git-native · DVC-ready · MLflow-ready

AI agents can write ML code. That is not enough.

Serious ML work needs stable targets, explicit splits, comparable metrics, tracked experiments, and a record of why decisions were made.

Targets drift


Labels, horizons, cohorts, and exclusions can change across notebooks unless the task contract is explicit.

Leakage hides


Temporal, entity, and duplicate leakage often survive quick experiments and make model performance look better than it is.

Results scatter


Metrics, plots, configs, and explanations end up in separate files that are hard to compare and harder to review.

Sennen makes the workflow explicit.

It is not AutoML and not a hosted platform. Sennen is a workflow discipline packaged as agent skills, designed to work inside your repo.

Task contract

Data versioning

Leakage-safe split

Baseline first

Tracked experiments

Explanation and review

Example: clinical trial enrollment prediction

From noisy registry data to a reproducible predictor.

In the clinical-trials example, Sennen guides an agent through a real modeling workflow: predict whether a trial remains open to enrollment 12 months after launch using ClinicalTrials.gov data.

Step 1
$sen-data

Data

Ingest source data, inspect structure, and materialize versioned datasets.

Step 2
$sen-plan

Plan

Define the prediction target, cohort, time horizon, split strategy, and metric contract.

Step 3
$sen-split

Split

Create leakage-safe train/test splits and check whether the split can be trusted.

Step 4
$sen-experiment

Experiment

Run a baseline first, then compare stronger models against a stable reference.

Step 5
$sen-explain

Explain

Inspect feature importance, model behavior, and failure modes.

Step 6
$sen-review

Review

Critique the workflow, results, and next actions before moving on.
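The plan step above pins the modeling task down before any model is trained. As a rough illustration of the kinds of decisions such a task contract fixes, here is a plain Python sketch; the field names are hypothetical and do not reflect Sennen's actual artifact schema.

```python
# Hypothetical sketch of what a planning step might pin down for the
# clinical-trials example. Field names are illustrative only.
task_contract = {
    "target": "open_to_enrollment_at_12_months",   # binary outcome
    "cohort": "interventional trials from ClinicalTrials.gov",
    "horizon_months": 12,
    "split": {
        "strategy": "temporal",        # train on earlier launches only
        "group_key": "trial_id",       # no trial on both sides of the split
    },
    "metrics": {"primary": "auroc", "secondary": ["auprc", "brier"]},
}

# A stable contract is what makes later runs comparable: every
# experiment reads the same target, cohort, and metric definitions.
```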

What changes when the agent uses Sennen?

Without Sennen

  • Ad hoc notebooks and scripts
  • Implicit metric definitions
  • Unclear leakage checks
  • Scattered plots and summaries
  • Results that are hard to review later

With Sennen

  • Repo-local workflow artifacts
  • Explicit target and metric contracts
  • Leakage-aware split workflow
  • Git-friendly outputs and reviews
  • DVC and MLflow integration where useful

Commands for a disciplined ML loop

In Codex, Sennen skills are invoked as $sen-*; in Claude Code, the matching skills are available as /sen-*.

Start

$sen-doctor · $sen-plan

Prepare the repo and make the modeling task explicit.

Data

$sen-data · $sen-defects · $sen-visualize

Connect, inspect, version, and diagnose datasets.

Model

$sen-split · $sen-metrics · $sen-preprocess · $sen-experiment

Create splits, define metrics, engineer features, and run comparable experiments.

Trust

$sen-explain · $sen-review

Explain model behavior and review the workflow before treating results as evidence.

Remix

$sen-remix

Select prior experiments, merge useful mechanisms, and scaffold the next hybrid candidate.

Remix: stop guessing what to try next.

After a few completed experiments, the hard question is no longer whether the agent can write another model. It is which idea should be tried next. Sennen Remix turns prior runs into a structured next-candidate search.

Exploration and exploitation for ML experiments

Inspired by the tree-search tension between promising lines and underexplored ones, not by self-play or a fixed architecture search space.

01 Prior experiments
02 UCT selection
03 Primary parent
04 Random secondary parent
05 Agent merge
06 Hybrid scaffold
07 Run and record

Exploit what works


Use measured metrics to favor strong prior experiments without hard-coding the next idea.

Explore what is underused


Use visit counts so the search does not collapse into one familiar experiment lineage.

Record the merge


Capture parents, kept mechanisms, imported mechanisms, exclusions, hypothesis, and success criteria.

Remix does not guarantee improvement on a single run and is not a hyperparameter tuner. It is most useful once a project has several completed experiments with comparable metrics, configs, and recorded outcomes.
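As a rough sketch of the selection step, assuming each prior experiment records a normalized metric and a visit count, the UCB1-style balance between exploitation and exploration looks like this. The names, weights, and record fields are illustrative, not Sennen's implementation.

```python
import math
import random

def uct_score(mean_reward, visits, total_visits, c=1.4):
    """UCB1-style score: exploit high metrics, explore rarely-visited runs."""
    if visits == 0:
        return float("inf")  # untried experiments are picked first
    return mean_reward + c * math.sqrt(math.log(total_visits) / visits)

# Illustrative prior experiments: normalized metric plus visit count.
experiments = [
    {"name": "exp_baseline", "reward": 0.62, "visits": 5},
    {"name": "exp_gbm",      "reward": 0.71, "visits": 3},
    {"name": "exp_calib",    "reward": 0.68, "visits": 1},
]
total = sum(e["visits"] for e in experiments)

# Primary parent: best UCT score. Note the least-visited run can win
# even without the best metric. Secondary parent: random other run.
primary = max(experiments,
              key=lambda e: uct_score(e["reward"], e["visits"], total))
secondary = random.choice([e for e in experiments if e is not primary])

# The merge record captures what the hybrid keeps and why.
merge_record = {
    "parents": [primary["name"], secondary["name"]],
    "kept_mechanisms": [],       # filled in by the agent during the merge
    "imported_mechanisms": [],
    "exclusions": [],
    "hypothesis": "",
    "success_criteria": "beat primary parent on the primary metric",
}
```

With these numbers, `exp_calib` wins selection despite `exp_gbm`'s higher metric, because its single visit leaves the most uncertainty to resolve.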

Built for tools your repo already understands

Git


Keep configs, summaries, scripts, and reviews in a normal code review flow.

DVC


Track downloaded, processed, and split datasets without turning Git into data storage.

MLflow


Log metrics and compare runs with a consistent experiment backend.

uv


Use reproducible Python environments and project-local dependency management.

Install Sennen

Install Sennen into a data project, then ask Codex or Claude Code to use the Sennen workflow.

git clone https://github.com/20minds/sennen.git
cd sennen
./setup /your/data/folder

Frequently Asked Questions

Is Sennen AutoML?

No. Sennen is a workflow layer for AI coding agents. It helps the agent plan, run, track, explain, and review ML work inside your repo rather than hiding the process behind an AutoML system.

Does Sennen work with Codex?

Yes. In Codex, Sennen exposes skills such as $sen-data, $sen-plan, $sen-experiment, and $sen-review. Claude Code users get the matching /sen-* skills.

Do I need DVC and MLflow?

Not for every project, but they enable the strongest workflow. Sennen is designed around Git, DVC, MLflow, and uv, and the setup flow lets you choose which pieces to install.

How does Sennen reduce leakage risk?

Sennen makes split strategy part of the planning contract and gives the agent explicit split and review steps. It does not guarantee correctness, but it makes leakage checks part of the workflow instead of an afterthought.
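The kind of check the split step encourages can be sketched in plain Python: a temporal cutoff plus an entity-overlap assertion. The records and field names here are hypothetical, not Sennen's output.

```python
from datetime import date

# Hypothetical records: one row per trial with a launch date.
rows = [
    {"trial_id": "A", "launch": date(2019, 3, 1), "label": 1},
    {"trial_id": "B", "launch": date(2020, 7, 1), "label": 0},
    {"trial_id": "C", "launch": date(2022, 1, 15), "label": 1},
    {"trial_id": "D", "launch": date(2022, 9, 30), "label": 0},
]

# Temporal split: train strictly before the cutoff, test at or after it,
# so the model never trains on the future of its evaluation period.
cutoff = date(2021, 1, 1)
train = [r for r in rows if r["launch"] < cutoff]
test = [r for r in rows if r["launch"] >= cutoff]

# Entity-leakage check: no trial may appear on both sides of the split.
overlap = {r["trial_id"] for r in train} & {r["trial_id"] for r in test}
assert not overlap, f"entity leakage: {overlap}"
```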

Can I use Sennen outside biomedical ML?

Yes. The clinical-trials example is biomedical, but the core workflow applies to any supervised ML project where targets, splits, metrics, baselines, and reviewability matter.

Is Sennen Remix a hyperparameter tuner?

No. Remix does not tune a fixed search space. It selects prior experiments, combines useful mechanisms from their source files, and creates a new runnable scaffold. It works best after you have at least a few completed experiments with comparable metrics.

Want to use Sennen for a serious ML workflow?