MILIONI
← Blog
Technology

Why I built a "Jira" for AI: the story of claude-organizer

AI made the way I build software way too fast — and the new bottleneck became organizing what it does. I tried several plugins, kept hitting the same wall, and ended up building claude-organizer: a board the AI itself uses over MCP.

Why I built a "Jira" for AI: the story of claude-organizer

AI changed the way I build software

It’s been a while since I last wrote most of my code by hand. I describe what I want, I review it, I correct the course, and the AI executes. That changed everything: what used to take an afternoon of typing now takes minutes. Shipping stopped being a question of how many hours I can stay in the chair and became a question of how well I can steer the machine.

But every gain in speed exposes a new bottleneck. And mine showed up fast.

The bottleneck that took its place

A coding agent has no memory between sessions. When I close the terminal and open it again tomorrow, it has no idea what we decided yesterday, why we decided it, what was already done and what was missing. Every session starts from zero.

While the work was small, I could hold all of that in my head. But once the AI started shipping for real, I found myself with several fronts open at the same time — and the hardest part of my day was no longer writing code. It was remembering the state of things. What was decided, what depended on what, what had already been done and what had been left half-finished.

In other words: documentation stopped being a “later” luxury and became the thing that keeps the project moving. Without it, the AI hallucinates — it rewrites what already existed, ignores a decision we made last week, runs things in the wrong order. The quality of what it delivers came to depend directly on how well the work is organized and recorded.

I tried almost everything — and they all hit the same wall

I didn’t jump straight into reinventing the wheel. I spent months trying the plugins and skills the community had been building precisely to solve this: superpowers, GSD (get-shit-done), cavekit, among others.

They all have merit. They’re well-thought-out projects, full of good ideas about how to give process and discipline to an agent. I learned a lot from each one. But in the end, they all ran into the same underlying problem: the way they stored what the AI “knows” was a pile of spec files in Markdown committed to the repository.

And in practice, that hurts in two ways:

  • It pollutes git. Each plan turns into a handful of .md files that ship alongside the code. The history fills up with process noise, and the repo carries a layer of paperwork that isn’t, in fact, the product.
  • It goes stale fast. A spec is a snapshot of a moment. The next day reality has already moved on, but the file is still there, confidently asserting things that no longer hold. Then the AI reads an old plan as if it were true, and the hallucination only gets worse.

I wanted the opposite of a static file: I wanted living state the AI would query and update as the work moves — not yet another document aging alone in the corner of the repo.

What if the board belonged to the AI itself?

That’s when the idea hit me. Every software team has already solved the “what to do, why, and in what order” problem — it’s called a task board. Jira, Trello, Linear. Why wouldn’t the AI have its own?

Except with one important inversion: this board would be used mostly by the AI itself, over MCP (the protocol that lets the agent query and edit external tools). The agent reads the active sprint, picks the next card, records the decisions, attaches the commit, closes the task. And I — the human — step in through the interface mainly to follow along and review: see what’s being done in real time, drag cards, leave a comment it reads next session.

It’s not the AI using a human’s tool. It’s a tool designed for the AI, with a window for the human to look inside.

That’s how claude-organizer was born

claude-organizer is what came out of that idea. It’s not just a board: it’s an ecosystem around organizing the agent’s work. It has projects, sprints, stories and sub-tasks, blockers, tags, priorities — and a Nuxt interface that mirrors all of it in real time, updating over WebSocket while the AI works.

The claude-organizer board: a sprint with cards spread across To do / In progress / Review / Done
The claude-organizer board mirrors, in real time, what the AI is doing — the same columns as any team's board.

The whole flow runs through skills + MCP. There are five skills that trigger from what I say, without me calling any by hand: one to orient and operate the board, one to plan a new demand into sprint/stories/tasks, one to execute a card through its lifecycle, one for review (a mandatory gate, run by a fresh subagent, that checks the acceptance criteria and hunts for bugs), and one for autopilot to advance on its own through several ready cards. The durable documentation — architecture, decisions (ADRs), patterns — lives in the organizer’s own docs, which the AI reads before reinventing.

How it works in practice

The real turning point, for me, was this: because what was planned lives in the organizer and not in the session, I can execute each task in a completely clean session, without losing quality. The agent opens, reads the board, understands where it is, does the work. It doesn’t matter if the previous context has already evaporated — the plan wasn’t in its head, it was in the database.

That solves both ghosts at once: the execution order (what depends on what is explicit in the blockers) and the hallucination (it starts from what was decided, not from a vague memory). And it lets me run several fronts at the same time, sequentially or in parallel, without getting lost.

Detail of a card in claude-organizer: description, acceptance criteria, status, sprint, tags and the attached commit
Each card carries the description, the acceptance criteria and the commit that delivered it — captured outside the AI's context, without spending tokens reading the diff.

Decisions don’t get lost because they become comments on the card: the agent records why it did something a certain way, and I reply right there. Next session, it reads the comments it hadn’t seen yet and picks up from that point. It’s a decision log that survives the end of each conversation.

A card's comment thread and the attached commit diff, with a human comment not yet read by the AI
Comments are the decision log: the AI records the why, I reply, and it re-reads next session whatever it hadn't read yet.

The nicest side effect: the repository went back to being just the product. No spec folders fattening up git. The “what to do and why” lives in the organizer, reachable from any session; the code lives in the repo. Each thing in its place.

How to get started

It runs as a Claude Code plugin (the five skills + the MCP server), with the whole stack in Docker. To bring it up:

1. Bring up the stack (Postgres + migrations + API + UI + MCP, in one shot):

git clone https://github.com/fmilioni/claude-organizer.git
cd claude-organizer
cp .env.example .env
docker compose up -d --build

The UI lives at http://localhost:4401, the API at :4400 and the MCP at :4402/mcp. Migrations run automatically before the API starts.

2. Install the plugin — it delivers the skills and registers the MCP, with no need for claude mcp add:

/plugin marketplace add fmilioni/claude-organizer
/plugin install claude-organizer@claude-organizer

3. Talk to Claude. Just say it. “I want to plan such-and-such feature” triggers the planning skill; “let’s continue, what’s next?” makes the AI read the board and pick up where it left off. The skills trigger from what you say — you don’t need to memorize any command.

The project is open source (MIT). You can run it 100% locally with Docker or host it on a VPS and point Claude Code at it.

I want to know how you organize this

This is, deep down, the bridge that interests me most: bringing the process discipline engineering already has into the new way of building, with AI at the wheel. claude-organizer is my attempt to close that gap — and it was born from a real pain, from watching good work get lost between one session and the next.

So how have you been dealing with this? Dumping everything into a spec file? Trusting the model’s memory? Got a flow that works? Tell me in the comments — I’d really love to hear how other people are solving this same problem.

And if you want to try it, the code is all open: clone it, test it and contribute on GitHub. Feedback and PRs are very welcome.

Share

Related posts