/feature-results - Post-Launch Analysis

Document feature results and capture learnings.

Quick Start

/feature-results

Then provide:

Feature name and ship date
Original hypothesis (or I'll pull it from your PRD)
Actual results (metrics, or I'll query your analytics MCP)

I'll generate a results doc with: executive summary, results vs targets, root cause analysis (if missed), segment breakdown, estimate calibration, stakeholder communication guidance, and concrete next steps.

Output: Saved to thoughts/shared/pm/analyses/feature-results-[name]-[date].md
Time: ~10 min with data ready, ~20 min if we need to dig

Pre-Launch Detection

Before generating a results analysis, verify the feature has actually shipped.

Check for launch signals:

Does the PRD in thoughts/shared/pm/prds/ show status "Shipped" or "Launched"?
Does the PM mention actual results data (not projections)?

If the feature hasn't launched yet:

It looks like [feature] hasn't shipped yet. Post-launch analysis works best with real data.

For pre-launch work, you might want:
- `/feature-metrics` - Define what you'll measure before launch
- `/impact-sizing` - Estimate expected impact
- `/experiment-decision` - Plan your A/B test or rollout
- `/launch-checklist` - Make sure you're ready to ship

Want to proceed anyway (e.g., analyzing a soft launch or pilot), or switch to one of these?

When to Use

After A/B test completes
After feature reaches full rollout
During quarterly reviews
For portfolio analysis

Quick Start Prompt

When PM types /feature-results, respond:

Let's document the results of your feature launch.

Tell me:
1. What feature shipped?
2. What were the original goals/metrics?
3. What actually happened?

I'll help you create a results doc that captures outcomes and learnings.

Results Doc Structure

1. Executive Summary

1-2 sentences: what shipped, whether it worked, and what's next

2. Original Hypothesis

What we expected to happen
Why we believed it

3. Results

Primary metric: Target vs. Actual
Guardrail metrics: Any issues?
Unexpected findings

4. Analysis

Why did results occur?
What surprised us?
What didn't we anticipate?

5. Learnings

What would we do differently?
What should we apply elsewhere?

6. Next Steps

Keep / Iterate / Kill?
Follow-up experiments
Related opportunities

Output Template

# Feature Results: [Feature Name]

**Ship Date:** [Date]
**Analysis Date:** [Date]
**Owner:** [PM Name]

---

## Executive Summary

[1-2 sentences: What shipped, key result, recommendation]

**Verdict:** [Ship to 100% / Iterate / Kill]

---

## Original Hypothesis

**If we** [built X],
**then** [Y would happen],
**because** [Z assumption].

**Target:** [Primary metric] would improve by [X%]

---

## Results

### Primary Metric

| Metric | Baseline | Target | Actual | Verdict    |
| ------ | -------- | ------ | ------ | ---------- |
| [Name] | [X]      | [Y]    | [Z]    | [Hit/Miss] |

### Statistical Significance

- Sample size: [N]
- Confidence level: [X%]
- P-value: [X]

### Guardrail Metrics

| Metric   | Acceptable Range | Actual   | Status              |
| -------- | ---------------- | -------- | ------------------- |
| [Metric] | [range]          | [actual] | [OK/Warning/Breach] |

### Unexpected Findings

- [Finding 1]
- [Finding 2]

---

## Analysis

### Why These Results?

[Explain the underlying reasons for the results]

### What Surprised Us?

[Things we didn't expect]

### Segment Breakdown

| Segment     | Result  | Notes     |
| ----------- | ------- | --------- |
| [Segment 1] | [+/-X%] | [insight] |
| [Segment 2] | [+/-X%] | [insight] |

---

## Learnings

### What Worked

- [Learning 1]
- [Learning 2]

### What Didn't Work

- [Learning 1]
- [Learning 2]

### Apply Elsewhere

- [Insight that applies to other features]

---

## Next Steps

**Recommendation:** [Ship / Iterate / Kill]

**If shipping:**

- [ ] Roll out to 100%
- [ ] Monitor for [X] weeks
- [ ] Update documentation

**If iterating:**

- [ ] [Specific change 1]
- [ ] [Specific change 2]
- [ ] Re-run experiment

**If killing:**

- [ ] Document why
- [ ] Rollback plan
- [ ] Communicate to stakeholders

---

## Appendix

- [Link to experiment dashboard]
- [Link to original PRD]
- [Link to detailed data]

Root Cause Analysis

When results miss targets, diagnose using this framework:

Discovery Problem (Users don't know it exists)

Check: Feature awareness rate, in-app visibility, announcement reach
Common causes: Buried in UI, no onboarding tooltip, weak announcement
Fix: Improve discovery through tooltips, email campaign, in-app prompts

Quality Problem (Users try it but it doesn't work well)

Check: Error rates, completion rates, time-on-task, feedback sentiment
Common causes: Bugs, poor UX, confusing workflow, slow performance
Fix: Fix bugs, simplify UX, improve performance

Relevance Problem (Users try it but it doesn't solve their problem)

Check: Use case analysis, segment breakdown, feature-problem fit
Common causes: Wrong target audience, edge cases not handled, misaligned with actual needs
Fix: Narrow targeting, add missing use cases, revisit problem definition

Frequency Problem (Users use it once but don't come back)

Check: Repeat usage rates, habit formation metrics, triggers for return
Common causes: Not integrated into daily workflow, one-time value, no triggers
Fix: Add recurring value hooks, notifications, integration with other workflows
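
The four problem types above can be read as an ordered funnel check: awareness first, then quality, relevance, and repeat use. The Python sketch below illustrates that ordering. The `FunnelMetrics` fields, the `diagnose` helper, and every threshold value are hypothetical placeholders rather than calibrated benchmarks; substitute the rates your own analytics expose.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FunnelMetrics:
    """Hypothetical inputs; each value is a rate between 0 and 1."""
    awareness_rate: float   # eligible users who saw the feature
    completion_rate: float  # users who tried it and finished the task
    solved_rate: float      # completers who say it solved their problem
    repeat_rate: float      # first-time users who came back


# Illustrative floors only (assumption); tune them to your own product.
FLOORS = {
    "Discovery": 0.30,
    "Quality": 0.60,
    "Relevance": 0.50,
    "Frequency": 0.25,
}


def diagnose(m: FunnelMetrics) -> str:
    """Return the first funnel stage whose rate falls below its floor."""
    checks = [
        ("Discovery", m.awareness_rate),
        ("Quality", m.completion_rate),
        ("Relevance", m.solved_rate),
        ("Frequency", m.repeat_rate),
    ]
    for problem, rate in checks:
        if rate < FLOORS[problem]:
            return problem
    return "No clear root cause"


# Only 12% of eligible users found the feature, so discovery is flagged first.
print(diagnose(FunnelMetrics(0.12, 0.80, 0.70, 0.40)))  # Discovery
```

Checking the stages in funnel order matters: a low repeat rate means little if most users never saw the feature in the first place.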

Estimate vs Actual Calibration

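| Metric             | Original Estimate    | Actual Result | Accuracy               |
| ------------------ | -------------------- | ------------- | ---------------------- |
| [Primary metric]   | [From impact-sizing] | [Actual]      | [Over/Under by X%]     |
| [Secondary metric] | [From impact-sizing] | [Actual]      | [Over/Under by X%]     |
| [Guardrail metric] | [Expected range]     | [Actual]      | [Within/Outside range] |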

Calibration insight: [Are we consistently over/under-estimating? By how much? What should we adjust for next time?]

Historical calibration pattern:

Look at past 3-5 feature results. If we consistently over-estimate by 30-50%, apply a discount factor to future impact-sizing.
If we consistently under-estimate, we may be too conservative or missing amplification effects.
Track this over time to improve estimation accuracy.
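
One way to turn the calibration pattern above into numbers is sketched below in Python. The convention used here is an assumption: error is measured relative to the actual result, and the discount factor is the mean actual-to-estimate ratio. The function names and the sample history are hypothetical and made up for illustration.

```python
from __future__ import annotations

from statistics import mean


def estimate_error_pct(estimate: float, actual: float) -> float:
    """Signed estimate error as a percentage of the actual result.

    Positive means over-estimated, negative means under-estimated.
    """
    if actual == 0:
        raise ValueError("actual must be non-zero")
    return (estimate - actual) / actual * 100


def discount_factor(history: list[tuple[float, float]]) -> float:
    """Mean actual/estimate ratio over past (estimate, actual) pairs."""
    ratios = [actual / estimate for estimate, actual in history if estimate]
    if not ratios:
        raise ValueError("need at least one non-zero estimate")
    return mean(ratios)


# Made-up history of (estimate, actual) lifts, in percentage points.
past = [(10.0, 8.0), (20.0, 15.0), (5.0, 4.0)]
factor = discount_factor(past)

print(round(estimate_error_pct(10.0, 8.0), 1))  # 25.0 (over-estimated by 25%)
print(round(factor, 2))                         # 0.78
print(round(12.0 * factor, 1))                  # 9.4 (discounted future estimate)
```

With only 3-5 past features the factor is noisy, so treat it as a sanity check on new estimates and not as a precise correction.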

Segment Breakdown Guidance

Always analyze results across these default segments. Segment analysis often reveals that an overall miss is actually a win for one segment and a miss for another.

| Segment Dimension                         | Why It Matters                                                        |
| ----------------------------------------- | --------------------------------------------------------------------- |
| New vs existing users                     | New users may not find the feature; existing users may resist change  |
| Plan tier (Free/Team/Business/Enterprise) | Feature value often varies dramatically by tier                       |
| Company size                              | Small teams adopt differently than large orgs                         |
| Power users vs casual users               | Power users may love it; casual users may be confused                 |
| Geographic region (if applicable)         | Cultural, language, and connectivity differences                      |

Segment analysis template:

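| Segment        | N (sample) | Primary Metric | vs Target | Insight |
| -------------- | ---------- | -------------- | --------- | ------- |
| New users      | [N]        | [result]       | [+/-X%]   | [why]   |
| Existing users | [N]        | [result]       | [+/-X%]   | [why]   |
| Free tier      | [N]        | [result]       | [+/-X%]   | [why]   |
| Paid tier      | [N]        | [result]       | [+/-X%]   | [why]   |
| Power users    | [N]        | [result]       | [+/-X%]   | [why]   |
| Casual users   | [N]        | [result]       | [+/-X%]   | [why]   |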
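
The "vs Target" column in the template above is a percentage difference between each segment's result and the target. A small Python sketch of that arithmetic follows. It assumes a metric where higher is better; the helper names and the sample numbers are hypothetical.

```python
from __future__ import annotations


def vs_target_pct(actual: float, target: float) -> float:
    """Percentage difference between a segment's result and the target."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return (actual - target) / target * 100


def segment_report(segments: dict[str, float], target: float) -> dict[str, float]:
    """Map each segment name to its rounded difference from the target."""
    return {
        name: round(vs_target_pct(value, target), 1)
        for name, value in segments.items()
    }


def hidden_wins(segments: dict[str, float], target: float) -> list[str]:
    """Segments that met the target (assumes higher is better)."""
    return [name for name, value in segments.items() if value >= target]


# Made-up conversion rates in percent, against a 20% target.
segments = {"New users": 12.0, "Existing users": 25.0}

print(segment_report(segments, target=20.0))
# {'New users': -40.0, 'Existing users': 25.0}
print(hidden_wins(segments, target=20.0))
# ['Existing users']
```

Check the sample size column before drawing conclusions: a large swing in a segment with a small N may be noise.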

How to Communicate Results

If results beat targets: Lead with the win, credit the team, connect to strategy. Share in #product-team, include in weekly status update, highlight in next board deck.

If results miss targets: Lead with what you learned, not what went wrong. Frame as: "Here's what we hypothesized, here's what actually happened, here's what we're doing about it." Never hide bad results -- own them fast.

If results are ambiguous: Be honest about uncertainty. "We saw X but can't conclusively attribute it to Y because Z. Here's our plan to get clearer signal."

Audience-Specific Framing

| Audience | Lead With | Include | Avoid |
| -------- | --------- | ------- | ----- |
| CEO/Board | Impact on key business metrics + strategic implications | Revenue impact, user growth, competitive position | Technical details, minor metrics |
| CPO/Manager | Learnings + next steps + resource ask | What worked, what didn't, proposed plan | Blame, excuses |
| Engineering team | What worked technically + what to improve | Performance data, technical wins, tech debt notes | Business jargon |
| Sales/CS | Customer-facing impact + talking points | User quotes, feature highlights, objection handlers | Internal debates, negative framing |
| Design | User behavior changes + UX insights | Usage patterns, qualitative feedback, heatmaps | Pure metric tables without context |

Example Patterns

Pattern 1: Clear Win

Scenario: Feature shipped, primary metric exceeded target by 20%. Guardrails held.
Root cause: N/A (success)
Action: Scale to 100%, plan v2 based on segment insights, draft announcement via /catalyst-pm-ops:slack-message, update /catalyst-pm-ops:status-update with win.
Communication: Lead with the number, credit the team, connect to quarterly goals.

Pattern 2: Mixed Results

Scenario: Primary metric hit target but guardrail metric degraded (e.g., page load time increased 15%).
Root cause: Quality problem -- feature added weight to core flow.
Action: Investigate quality regression, hold at current rollout %, fix guardrail before scaling. Create follow-up ticket via /catalyst-pm-ops:create-tickets.
Communication: "Feature is working as intended for users, but we identified a performance regression we need to fix before full rollout."

Pattern 3: Clear Miss

Scenario: Primary metric at 60% of target after 4 weeks.
Root cause: Discovery problem -- only 12% of eligible users found the feature. Among those who found it, conversion was above target.
Action: Don't kill the feature -- improve discovery (add tooltip, email campaign, onboarding mention), re-measure in 4 weeks.
Communication: "The feature works for users who find it, but we have a distribution problem. Plan: improve discovery and re-measure."

Pattern 4: Kill Decision

Scenario: Primary metric at 30% of target after 6 weeks. Users tried the feature, but it doesn't solve their actual problem.
Root cause: Relevance problem -- hypothesis was wrong.
Action: Document the learning, roll back, redirect engineering effort. Draft decision doc via /decision-doc.
Communication: "We tested our hypothesis that X would solve Y. Data shows it doesn't. Here's what we learned and where we're redirecting effort."

Tips

Be honest about failures - Negative results are still valuable
Capture learnings immediately - Don't wait; context fades
Share widely - Others can learn from your experiments
Connect to strategy - How does this inform our roadmap?
Run root cause analysis - Don't just report the miss; diagnose why
Check segments - An overall miss may hide a segment win
Calibrate estimates - Track accuracy over time to improve future sizing

Context Routing Strategy

When the PM uses /feature-results, I automatically:

1. Pull Original PRD & Hypothesis

Source: thoughts/shared/pm/prds/, conversation history

What I look for: Original hypothesis, target metrics, success criteria
How I use it: Compare actual results against original targets
Example: Auto-populate "Original Hypothesis" section from the PRD you shared

2. Query Analytics MCPs

Source: PostHog (if connected)

What I look for: Real-time metrics for the feature being analyzed
How I use it: Populate results tables with actual data
Fallback: If no MCP connected, I'll ask you to provide the metrics

3. Check Related Experiments

Source: thoughts/shared/pm/metrics/, past feature results

What I look for: Similar features' results, patterns in what works
How I use it: Add context like "This compares to our pricing redesign which saw 12% improvement"

4. Extract Guardrail Metrics

Source: thoughts/shared/pm/metrics/, PRD success metrics section

What I look for: Metrics that must not decline (retention, NPS, etc.)
How I use it: Pre-populate guardrail metrics table with relevant metrics

5. Identify Learnings for Portfolio

Source: Cross-reference with other feature results

What I look for: Patterns across multiple experiments
How I use it: Suggest "Apply Elsewhere" section based on similar features

6. Route Results for Sharing

Routing logic:

Ship to 100%: Draft announcement for /catalyst-pm-ops:slack-message
Iterate: Create follow-up experiment plan with /experiment-decision
Kill: Draft decision doc for /decision-doc explaining sunset
Executive update: Format for /catalyst-pm-ops:status-update with business impact
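
The routing logic above amounts to a lookup from verdict to follow-up command. A minimal Python sketch is shown here. The command names come from the list above; the `route` helper, the `ROUTES` table, and the lowercase "ship" alias are hypothetical and exist only to illustrate the mapping.

```python
from __future__ import annotations

# Verdict -> follow-up command, taken from the routing list above.
ROUTES = {
    "ship to 100%": "/catalyst-pm-ops:slack-message",
    "ship": "/catalyst-pm-ops:slack-message",  # alias (assumption)
    "iterate": "/experiment-decision",
    "kill": "/decision-doc",
}
EXECUTIVE_UPDATE = "/catalyst-pm-ops:status-update"


def route(verdict: str, executive_update: bool = False) -> list[str]:
    """Return the follow-up commands for a verdict, in the order to run them."""
    key = verdict.strip().lower()
    if key not in ROUTES:
        raise ValueError(f"unknown verdict: {verdict!r}")
    commands = [ROUTES[key]]
    if executive_update:
        commands.append(EXECUTIVE_UPDATE)
    return commands


print(route("Kill"))
# ['/decision-doc']
print(route("Ship to 100%", executive_update=True))
# ['/catalyst-pm-ops:slack-message', '/catalyst-pm-ops:status-update']
```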

Output Quality Self-Check

Before delivering the feature results doc, verify:

Executive summary is 1-2 sentences max and includes a clear verdict (Ship/Iterate/Kill)
Original hypothesis is stated in If/Then/Because format
Primary metric has baseline, target, and actual with statistical significance noted
Guardrail metrics are checked and status is clear (OK/Warning/Breach)
Root cause analysis is included if results missed targets (Discovery/Quality/Relevance/Frequency)
Segment breakdown covers at least 3 dimensions (new vs existing, tier, user type)
Estimate vs actual calibration table is populated with accuracy percentages
Stakeholder communication section has audience-appropriate framing
Next steps are specific, actionable, and have owners
Learnings include at least one "apply elsewhere" insight
No corporate jargon -- results are written in plain, direct language
Linked to original PRD and any related experiments