We built a full Opsgenie replacement in 10 weeks using Claude Code
How a solo founder used Claude Code to ship 187+ pages, 126 API modules, 14 integrations, a deploy risk prediction engine, and a complete incident management platform — in 10 weeks of evenings and weekends.
Atlassian announced they're sunsetting Opsgenie on April 5, 2027. Every engineering team using Opsgenie for on-call alerting now has a ticking clock — and no obvious replacement that preserves their DORA metrics, escalation policies, and incident history in one place.
I decided to build that replacement. Not just an Opsgenie clone — a full engineering intelligence platform: DORA metrics, deploy risk prediction, CODEOWNERS governance, coverage correlation, and a complete incident management layer. The kind of platform that would normally take a team of 8–10 engineers 12–18 months to ship.
I shipped it in 10 weeks. Alone. Using Claude Code as my development partner.
This is the honest account of what that looked like — what worked, what didn't, and what it means for how software gets built.
What we shipped
By the numbers, here's what Koalr includes:
- 187+ dashboard pages — DORA, deploy risk, incidents, on-call, CODEOWNERS, coverage, service catalog, team health, forecasting, custom reports, and more
- 126 NestJS API modules — multi-tenant, auth-gated, with audit logging, rate limiting, and a public API
- 14 integrations — GitHub, Jira, Linear, PagerDuty, Opsgenie, Slack, Codecov, SonarCloud, Vercel, Railway, Netlify, Render, Fly.io, AWS CodeDeploy
- Full incident management — alert ingestion from Datadog/CloudWatch/Prometheus/Grafana/Sentry, on-call scheduling with rotation, escalation policies with Twilio voice/SMS, Slack-native workflow, postmortem generation, AI investigation on declaration
- Deploy risk prediction — 7-factor scoring model with logistic regression weight tuning on historical outcomes. The only pre-merge deploy risk tool on the market.
- LLM-powered AI chat — Claude with live context from your GitHub PRs, deployments, and incident history. Ask "which repos are most likely to cause an incident this week?" and get an answer backed by real data.
The workflow
I didn't use Claude Code as an autocomplete tool. I used it as a pair programmer — one that never gets tired, never loses context (with the right tooling), and can hold the entire codebase in mind while you describe the next feature.
The workflow that worked:
- Write the plan, not the code. I maintained a STRATEGIC_PLAN_2026.md file with every sprint defined — feature by feature, with acceptance criteria, database models, API endpoints, and frontend components specified. Claude Code reads this at the start of every session and knows exactly what to build next.
- Let the agent explore before writing. Before implementing anything non-trivial, Claude Code reads the existing codebase — the Prisma schema, the module structure, existing patterns. It builds a mental model of how things fit together before touching a file. This eliminated most of the "here's a change that works in isolation but breaks your existing conventions" failures.
- Parallel agents for parallel work. Claude Code can launch multiple subagents simultaneously. While one agent was building the Twilio telephony integration, another was writing the frontend status page components. The same feature that would take a senior engineer 3 days took 2–3 hours of wall-clock time.
- CI as the final arbiter. I have a pre-commit hook that runs Prettier. I have a CI pipeline that runs TypeScript checks, ESLint, and tests. Claude Code knows it needs to pass these. It treats the build as ground truth, not its own assessment of correctness. When CI fails, it investigates and fixes. This turned out to be the most important constraint I imposed.
- CLAUDE.md as the constitution. A single file in the repo root documents every critical rule — the client/server boundary between api-client.ts and client-api.ts, the Prisma import convention, the FastifyReply usage pattern, the conventional commit format. Claude Code reads this at session start. It eliminated 90% of the pattern violations that plagued early sessions.
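To make the "constitution" idea concrete, here is a sketch of what such a CLAUDE.md might contain. The rule wording is illustrative; only the file names and conventions mentioned above (api-client.ts, client-api.ts, Prisma, FastifyReply, conventional commits, STRATEGIC_PLAN_2026.md) come from the actual setup:

```markdown
# CLAUDE.md

## Critical rules
- Server-side fetches go through `api-client.ts`; browser fetches go through
  `client-api.ts`. Never mix the two.
- Import Prisma types from the generated client; never redeclare models by hand.
- Controllers reply via the `FastifyReply` pattern already used in existing modules.
- Commits follow the conventional commit format: `feat(scope): ...`, `fix(scope): ...`.

## Workflow
- Read STRATEGIC_PLAN_2026.md before starting any sprint task.
- CI (TypeScript checks, ESLint, tests) is ground truth; a feature is done
  only when the pipeline passes.
```

The point isn't the specific rules — it's that they get re-read at the start of every session, so conventions survive context resets.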
What Claude Code can't do (yet)
Being honest about the limitations:
- It doesn't know your business. Claude Code can build anything you describe. But describing what to build — the product decisions, the positioning, the pricing — that's still entirely human work. The plan I spent weeks writing (competitive analysis, feature prioritization, GTM strategy) was the real leverage. The code was almost the easy part.
- Context compression loses nuance. Long sessions eventually hit context limits. The auto-compression is good but occasionally loses a subtle constraint or decision. The CLAUDE.md file and the strategic plan as persistent context files were the fix — they get re-read every session.
- QA is still yours. Claude Code ships working code. But "working" means "compiles, passes tests, renders in browser." The actual product experience — is this the right UX? does this feel right in context? — needs a human. I built an end-to-end QA checklist into the workflow: every feature gets manual browser verification before it's committed.
- External integrations need real credentials. You can't test Twilio without a real Twilio account. You can't test Stripe webhooks without real events. Claude Code can build the integration perfectly — but you're still doing the integration testing yourself.
How we shipped incident management in 10 days
The incident management sprint is the best example of what this workflow enables. In 10 days, working evenings after a day job:
- Alert ingestion from 5 monitoring tools (Datadog, CloudWatch, Prometheus, Grafana, Sentry)
- Deduplication within 5-minute windows (burst of alerts = 1 incident)
- Auto-resolve when the monitoring tool sends its resolved webhook, with MTTR computed on close
- On-call scheduling — rotation layers (daily/weekly/custom), overrides, timezone support
- Escalation policies — multi-step, delay timers, target types (schedule/user/team), BullMQ-backed
- Twilio voice call + SMS alerts with acknowledgement (press 1 to ack)
- Slack-native workflow — slash commands, auto-created incident channels, live timeline bot
- Status pages with SSE live updates
- AI investigation — Claude queries recent deploys, CODEOWNERS, and postmortem history; generates probable cause within 60 seconds of declaration
- Postmortem generation — Claude drafts a full postmortem from the incident timeline
- Runbook library — AI generates runbooks for incidents without existing ones
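The 5-minute deduplication window above can be sketched in a few lines. This is a minimal in-memory illustration, not Koalr's actual implementation — the `fingerprint` key and field names are assumptions:

```typescript
type Alert = { fingerprint: string; receivedAt: number };
type Incident = { id: string; fingerprint: string; openedAt: number; alertCount: number };

const WINDOW_MS = 5 * 60 * 1000;

// Maps an alert fingerprint (e.g. monitor id + resource) to its open incident.
const openIncidents = new Map<string, Incident>();

function ingest(alert: Alert): Incident {
  const existing = openIncidents.get(alert.fingerprint);
  // A burst of identical alerts within the window folds into one incident.
  if (existing && alert.receivedAt - existing.openedAt < WINDOW_MS) {
    existing.alertCount += 1;
    return existing;
  }
  const incident: Incident = {
    id: `inc_${alert.receivedAt}`,
    fingerprint: alert.fingerprint,
    openedAt: alert.receivedAt,
    alertCount: 1,
  };
  openIncidents.set(alert.fingerprint, incident);
  return incident;
}
```

In production this state lives in the database with idempotent queue processing, but the windowing logic is the same shape.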
This would normally be a 6–9 month project for a team of 4 engineers. We shipped it in 10 evenings. The code quality — TypeScript strict mode, NestJS patterns, proper error handling, idempotent queue processing — is production-grade, not prototype-grade.
The feature no one else has: deploy risk prediction
While building incident management, I kept noticing the same pattern: most incidents trace back to a specific deploy. A deploy that, in hindsight, everyone knew was risky — large change, first-time author in a critical module, no review, late Friday push.
So I built the prediction layer. Every open PR gets a 0–100 risk score based on 7 signals: change size, file count, directory entropy (blast radius), author expertise in changed files, review coverage, DDL/migration detection, and description risk indicators.
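The scoring itself is a weighted combination of those signals. The factor names below come from the list above, but the weights and the [0, 1] normalization are illustrative assumptions, not the production values:

```typescript
// The seven signals, each normalized to [0, 1] before weighting.
type RiskFactors = {
  changeSize: number;       // lines changed, normalized against repo norms
  fileCount: number;        // files touched, normalized
  directoryEntropy: number; // blast radius: how scattered the change is
  authorExpertise: number;  // 1 = author has never touched these files
  reviewCoverage: number;   // 1 = no review on the changed files
  hasMigration: number;     // 1 if DDL/migration detected
  descriptionRisk: number;  // risk indicators in the PR description
};

// Illustrative weights; in practice these are tuned per org
// from historical deploy outcomes.
const WEIGHTS: RiskFactors = {
  changeSize: 0.22, fileCount: 0.1, directoryEntropy: 0.15,
  authorExpertise: 0.18, reviewCoverage: 0.15, hasMigration: 0.12,
  descriptionRisk: 0.08,
};

// Weighted sum, clamped and mapped to a 0–100 score.
function riskScore(f: RiskFactors): number {
  const keys = Object.keys(WEIGHTS) as (keyof RiskFactors)[];
  const raw = keys.reduce((sum, k) => sum + WEIGHTS[k] * f[k], 0);
  return Math.round(100 * Math.min(1, Math.max(0, raw)));
}
```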
The model self-tunes. After each deployment, we record the outcome (incident or not) and run logistic regression to adjust the factor weights to your team's specific patterns. If your team ships safely on Fridays, the timing weight goes to near-zero for your org.
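The self-tuning step is standard logistic regression: treat each past deploy as a labeled example (factors in, incident-or-not out) and nudge the weights down the gradient of the logistic loss. A minimal sketch, with an illustrative learning rate and no regularization:

```typescript
const sigmoid = (z: number) => 1 / (1 + Math.exp(-z));

// One pass of gradient descent over historical deploys.
// features[i] holds the normalized factor values for deploy i;
// outcomes[i] is 1 if that deploy caused an incident, else 0.
function tuneWeights(
  weights: number[],
  features: number[][],
  outcomes: number[],
  learningRate = 0.1,
): number[] {
  const w = [...weights];
  for (let i = 0; i < features.length; i++) {
    const z = features[i].reduce((s, x, j) => s + x * w[j], 0);
    const error = sigmoid(z) - outcomes[i]; // predicted risk minus actual outcome
    for (let j = 0; j < w.length; j++) {
      w[j] -= learningRate * error * features[i][j];
    }
  }
  return w;
}
```

This is where the Friday behavior falls out: if a timing factor keeps firing on deploys that never cause incidents, every update pushes its weight closer to zero.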
I checked every competitor. Nobody does this. DORA tools measure outcomes. LinearB flags code quality after the PR. No one predicts deployment risk before you merge.
Try it without signing up: koalr.com/tools/deploy-risk-calculator
What this means for software teams
The interesting question isn't "can AI write code?" — we're past that. The interesting question is: what does software development look like when a single person can ship a platform that used to require a team?
My answer, after 10 weeks of living in this workflow: the bottleneck shifts from implementation to specification. The hard part is knowing exactly what to build, why, and in what order. The product thinking, the competitive analysis, the architectural decisions — those are still 100% human. The implementation — writing correct TypeScript, wiring NestJS modules, structuring database queries — is increasingly something you describe and verify rather than write.
The teams that will win in the next 3 years are the ones who learn to work this way. Not by replacing engineers — by multiplying their leverage.
Try Koalr
Deploy risk prediction, incident management, DORA metrics, and LLM-powered engineering intelligence — in one platform. Built with Claude Code.