Skip to main content
AI/MLcoalesce-labs

feature-results

Post-launch analysis and results documentation. Document what shipped and what we learned.

Stars
12
Source
coalesce-labs/catalyst
Updated
2026-05-31
Slug
coalesce-labs--catalyst--feature-results
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/coalesce-labs/catalyst/HEAD/plugins/pm/skills/feature-results/SKILL.md -o .claude/skills/feature-results.md

Drops the SKILL.md into .claude/skills/feature-results.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

/feature-results - Post-Launch Analysis

Document feature results and capture learnings.

Quick Start

/feature-results

Then provide:

  1. Feature name and ship date
  2. Original hypothesis (or I'll pull it from your PRD)
  3. Actual results (metrics, or I'll query your analytics MCP)

I'll generate a results doc with: executive summary, results vs targets, root cause analysis (if missed), segment breakdown, estimate calibration, stakeholder communication guidance, and concrete next steps.

Output: Saved to thoughts/shared/pm/analyses/feature-results-[name]-[date].md Time: ~10 min with data ready, ~20 min if we need to dig


Pre-Launch Detection

Before generating a results analysis, verify the feature has actually shipped.

Check for launch signals:

  • Does the PRD in thoughts/shared/pm/prds/ show status "Shipped" or "Launched"?
  • Does the PM mention actual results data (not projections)?

If the feature hasn't launched yet:

It looks like [feature] hasn't shipped yet. Post-launch analysis works best with real data.

For pre-launch work, you might want:
- `/feature-metrics` - Define what you'll measure before launch
- `/impact-sizing` - Estimate expected impact
- `/experiment-decision` - Plan your A/B test or rollout
- `/launch-checklist` - Make sure you're ready to ship

Want to proceed anyway (e.g., analyzing a soft launch or pilot), or switch to one of these?

When to Use

  • After A/B test completes
  • After feature reaches full rollout
  • During quarterly reviews
  • For portfolio analysis

Quick Start Prompt

When PM types /feature-results, respond:

Let's document the results of your feature launch.

Tell me:
1. What feature shipped?
2. What were the original goals/metrics?
3. What actually happened?

I'll help you create a results doc that captures outcomes and learnings.

Results Doc Structure

1. Executive Summary

  • One paragraph: What shipped, did it work, what's next

2. Original Hypothesis

  • What we expected to happen
  • Why we believed it

3. Results

  • Primary metric: Target vs. Actual
  • Guardrail metrics: Any issues?
  • Unexpected findings

4. Analysis

  • Why did results occur?
  • What surprised us?
  • What didn't we anticipate?

5. Learnings

  • What would we do differently?
  • What should we apply elsewhere?

6. Next Steps

  • Keep / Iterate / Kill?
  • Follow-up experiments
  • Related opportunities

Output Template

# Feature Results: [Feature Name]

**Ship Date:** [Date]
**Analysis Date:** [Date]
**Owner:** [PM Name]

---

## Executive Summary

[1-2 sentences: What shipped, key result, recommendation]

**Verdict:** [Ship to 100% / Iterate / Kill]

---

## Original Hypothesis

**If we** [built X],
**then** [Y would happen],
**because** [Z assumption].

**Target:** [Primary metric] would improve by [X%]

---

## Results

### Primary Metric

| Metric | Baseline | Target | Actual | Verdict    |
| ------ | -------- | ------ | ------ | ---------- |
| [Name] | [X]      | [Y]    | [Z]    | [Hit/Miss] |

### Statistical Significance

- Sample size: [N]
- Confidence level: [X%]
- P-value: [X]

### Guardrail Metrics

| Metric   | Acceptable Range | Actual   | Status       |
| -------- | ---------------- | -------- | ------------ |
| [Metric] | [range]          | [actual] | [OK/Warning] |

### Unexpected Findings

- [Finding 1]
- [Finding 2]

---

## Analysis

### Why These Results?

[Explain the underlying reasons for the results]

### What Surprised Us?

[Things we didn't expect]

### Segment Breakdown

| Segment     | Result  | Notes     |
| ----------- | ------- | --------- |
| [Segment 1] | [+/-X%] | [insight] |
| [Segment 2] | [+/-X%] | [insight] |

---

## Learnings

### What Worked

- [Learning 1]
- [Learning 2]

### What Didn't Work

- [Learning 1]
- [Learning 2]

### Apply Elsewhere

- [Insight that applies to other features]

---

## Next Steps

**Recommendation:** [Ship / Iterate / Kill]

**If shipping:**

- [ ] Roll out to 100%
- [ ] Monitor for [X] weeks
- [ ] Update documentation

**If iterating:**

- [ ] [Specific change 1]
- [ ] [Specific change 2]
- [ ] Re-run experiment

**If killing:**

- [ ] Document why
- [ ] Rollback plan
- [ ] Communicate to stakeholders

---

## Appendix

- [Link to experiment dashboard]
- [Link to original PRD]
- [Link to detailed data]

Root Cause Analysis

When results miss targets, diagnose using this framework:

Discovery Problem (Users don't know it exists)

  • Check: Feature awareness rate, in-app visibility, announcement reach
  • Common causes: Buried in UI, no onboarding tooltip, weak announcement
  • Fix: Improve discovery through tooltips, email campaign, in-app prompts

Quality Problem (Users try it but it doesn't work well)

  • Check: Error rates, completion rates, time-on-task, feedback sentiment
  • Common causes: Bugs, poor UX, confusing workflow, slow performance
  • Fix: Fix bugs, simplify UX, improve performance

Relevance Problem (Users try it but it doesn't solve their problem)

  • Check: Use case analysis, segment breakdown, feature-problem fit
  • Common causes: Wrong target audience, edge cases not handled, misaligned with actual needs
  • Fix: Narrow targeting, add missing use cases, revisit problem definition

Frequency Problem (Users use it once but don't come back)

  • Check: Repeat usage rates, habit formation metrics, triggers for return
  • Common causes: Not integrated into daily workflow, one-time value, no triggers
  • Fix: Add recurring value hooks, notifications, integration with other workflows

Estimate vs Actual Calibration

Metric Original Estimate Actual Result Accuracy
[Primary metric] [From impact-sizing] [Actual] [Over/Under by X%]
[Secondary metric] [From impact-sizing] [Actual] [Over/Under by X%]
[Guardrail metric] [Expected range] [Actual] [Within/Outside range]

Calibration insight: [Are we consistently over/under-estimating? By how much? What should we adjust for next time?]

Historical calibration pattern:

  • Look at past 3-5 feature results. If we consistently over-estimate by 30-50%, apply a discount factor to future impact-sizing.
  • If we consistently under-estimate, we may be too conservative or missing amplification effects.
  • Track this over time to improve estimation accuracy.

Segment Breakdown Guidance

Always analyze results across these default segments. Segment analysis often reveals that an overall miss is actually a win for one segment and a miss for another.

Segment Dimension Why It Matters
New vs existing users New users may not find the feature; existing users may resist change
Plan tier (Free/Team/Business/Enterprise) Feature value often varies dramatically by tier
Company size Small teams adopt differently than large orgs
Power users vs casual users Power users may love it; casual users may be confused
Geographic region (if applicable) Cultural, language, and connectivity differences

Segment analysis template:

Segment N (sample) Primary Metric vs Target Insight
New users [N] [result] [+/-X%] [why]
Existing users [N] [result] [+/-X%] [why]
Free tier [N] [result] [+/-X%] [why]
Paid tier [N] [result] [+/-X%] [why]
Power users [N] [result] [+/-X%] [why]
Casual users [N] [result] [+/-X%] [why]

How to Communicate Results

If results beat targets: Lead with the win, credit the team, connect to strategy. Share in #product-team, include in weekly status update, highlight in next board deck.

If results miss targets: Lead with what you learned, not what went wrong. Frame as: "Here's what we hypothesized, here's what actually happened, here's what we're doing about it." Never hide bad results -- own them fast.

If results are ambiguous: Be honest about uncertainty. "We saw X but can't conclusively attribute it to Y because Z. Here's our plan to get clearer signal."

Audience-Specific Framing

Audience Lead With Include Avoid
CEO/Board Impact on key business metrics + strategic implications Revenue impact, user growth, competitive position Technical details, minor metrics
CPO/Manager Learnings + next steps + resource ask What worked, what didn't, proposed plan Blame, excuses
Engineering team What worked technically + what to improve Performance data, technical wins, tech debt notes Business jargon
Sales/CS Customer-facing impact + talking points User quotes, feature highlights, objection handlers Internal debates, negative framing
Design User behavior changes + UX insights Usage patterns, qualitative feedback, heatmaps Pure metric tables without context

Example Patterns

Pattern 1: Clear Win

Scenario: Feature shipped, primary metric exceeded target by 20%. Guardrails held. Root cause: N/A (success) Action: Scale to 100%, plan v2 based on segment insights, draft announcement via /catalyst-pm-ops:slack-message, update /catalyst-pm-ops:status-update with win. Communication: Lead with the number, credit the team, connect to quarterly goals.

Pattern 2: Mixed Results

Scenario: Primary metric hit target but guardrail metric degraded (e.g., page load time increased 15%). Root cause: Quality problem -- feature added weight to core flow. Action: Investigate quality regression, hold at current rollout %, fix guardrail before scaling. Create follow-up ticket via /catalyst-pm-ops:create-tickets. Communication: "Feature is working as intended for users, but we identified a performance regression we need to fix before full rollout."

Pattern 3: Clear Miss

Scenario: Primary metric at 60% of target after 4 weeks. Root cause: Discovery problem -- only 12% of eligible users found the feature. Among those who found it, conversion was above target. Action: Don't kill the feature -- improve discovery (add tooltip, email campaign, onboarding mention), re-measure in 4 weeks. Communication: "The feature works for users who find it, but we have a distribution problem. Plan: improve discovery and re-measure."

Pattern 4: Kill Decision

Scenario: Primary metric at 30% of target after 6 weeks. Root cause is relevance -- users tried it but it doesn't solve their actual problem. Root cause: Relevance problem -- hypothesis was wrong. Action: Document the learning, roll back, redirect engineering effort. Draft decision doc via /decision-doc. Communication: "We tested our hypothesis that X would solve Y. Data shows it doesn't. Here's what we learned and where we're redirecting effort."


Tips

  • Be honest about failures - Negative results are still valuable
  • Capture learnings immediately - Don't wait; context fades
  • Share widely - Others can learn from your experiments
  • Connect to strategy - How does this inform our roadmap?
  • Run root cause analysis - Don't just report the miss; diagnose why
  • Check segments - An overall miss may hide a segment win
  • Calibrate estimates - Track accuracy over time to improve future sizing

Context Routing Strategy

When the PM uses /feature-results, I automatically:

1. Pull Original PRD & Hypothesis

Source: thoughts/shared/pm/prds/, conversation history

  • What I look for: Original hypothesis, target metrics, success criteria
  • How I use it: Compare actual results against original targets
  • Example: Auto-populate "Original Hypothesis" section from the PRD you shared

2. Query Analytics MCPs

Source: PostHog, PostHog, Posthog, PostHog (if connected)

  • What I look for: Real-time metrics for the feature being analyzed
  • How I use it: Populate results tables with actual data
  • Fallback: If no MCP connected, I'll ask you to provide the metrics

3. Check Related Experiments

Source: thoughts/shared/pm/metrics/, past feature results

  • What I look for: Similar features' results, patterns in what works
  • How I use it: Add context like "This compares to our pricing redesign which saw 12% improvement"

4. Extract Guardrail Metrics

Source: thoughts/shared/pm/metrics/, PRD success metrics section

  • What I look for: Metrics that must not decline (retention, NPS, etc.)
  • How I use it: Pre-populate guardrail metrics table with relevant metrics

5. Identify Learnings for Portfolio

Source: Cross-reference with other feature results

  • What I look for: Patterns across multiple experiments
  • How I use it: Suggest "Apply Elsewhere" section based on similar features

6. Route Results for Sharing

Routing logic:

  • Ship to 100%: Draft announcement for /catalyst-pm-ops:slack-message
  • Iterate: Create follow-up experiment plan with /experiment-decision
  • Kill: Draft decision doc for /decision-doc explaining sunset
  • Executive update: Format for /catalyst-pm-ops:status-update with business impact

Output Quality Self-Check

Before delivering the feature results doc, verify:

  • Executive summary is 1-2 sentences max and includes a clear verdict (Ship/Iterate/Kill)
  • Original hypothesis is stated in If/Then/Because format
  • Primary metric has baseline, target, and actual with statistical significance noted
  • Guardrail metrics are checked and status is clear (OK/Warning/Breach)
  • Root cause analysis is included if results missed targets (Discovery/Quality/Relevance/Frequency)
  • Segment breakdown covers at least 3 dimensions (new vs existing, tier, user type)
  • Estimate vs actual calibration table is populated with accuracy percentages
  • Stakeholder communication section has audience-appropriate framing
  • Next steps are specific, actionable, and have owners
  • Learnings include at least one "apply elsewhere" insight
  • No corporate jargon -- results are written in plain, direct language
  • Linked to original PRD and any related experiments