What I Check Before Shipping Revenue-Critical Frontend Features
The difference between a feature that works in staging and one that's production-ready is a 47-point checklist I learned the hard way.
When you're shipping checkout flows, payment processing, or anything that touches money, "it works on my machine" isn't enough. A single edge case can cost six figures in refunds and support burden.
Context: The Real Problem
Most production incidents I've debugged weren't caused by complex bugs. They were caused by unhandled states nobody thought to test:
- User closes tab mid-payment
- Network drops after request sent but before response
- User refreshes page during processing
- Backend returns a success status with an error in the payload
- Session expires during multi-step flow
These scenarios don't show up in happy-path testing. They show up at 2 AM when users are calling support.
Why This Matters in Production
In a system processing $10M monthly transactions:
- 1% failure rate = $100K in refunds
- 5-minute outage = roughly $1,200 in lost revenue at average volume, and far more at peak
- One duplicate charge = trending on Twitter
The frontend isn't just UI. It's the last line of defense before user money moves.
My Pre-Ship Checklist
1. Edge Case Mapping
Before writing a single line of code, I enumerate every possible state:
// For a payment flow, map ALL states
type PaymentState =
  | { status: 'idle' }
  | { status: 'validating'; field: string }
  | { status: 'submitting' }
  | { status: 'processing'; transactionId: string }
  | { status: 'succeeded'; orderId: string }
  | { status: 'failed'; reason: string; retryable: boolean }
  | { status: 'timeout'; transactionId: string }
  | { status: 'unknown'; transactionId: string }
I verify:
- UI handles every state explicitly
- No undefined states exist
- State transitions are valid
- Loading states prevent duplicate submissions
Red flags:
- Boolean flags instead of enums
- Implicit state in multiple variables
- No handling for "unknown" state
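One way to let the compiler enforce "no undefined states" is an exhaustive switch plus an explicit transition table. The sketch below is a minimal illustration, not a prescribed design: `allowedTransitions`, `canTransition`, `canSubmit`, and `describeState` are hypothetical names, and the specific transitions allowed are assumptions made for the example.

```typescript
// Mirrors the PaymentState union from the checklist.
type PaymentState =
  | { status: 'idle' }
  | { status: 'validating'; field: string }
  | { status: 'submitting' }
  | { status: 'processing'; transactionId: string }
  | { status: 'succeeded'; orderId: string }
  | { status: 'failed'; reason: string; retryable: boolean }
  | { status: 'timeout'; transactionId: string }
  | { status: 'unknown'; transactionId: string }

type Status = PaymentState['status']

// Hypothetical transition table; the allowed moves are an assumption.
const allowedTransitions: Record<Status, readonly Status[]> = {
  idle: ['validating', 'submitting'],
  validating: ['idle', 'submitting'],
  submitting: ['processing', 'failed', 'timeout'],
  processing: ['succeeded', 'failed', 'timeout', 'unknown'],
  succeeded: [],
  failed: ['idle', 'submitting'],
  timeout: ['succeeded', 'failed', 'unknown'],
  unknown: ['succeeded', 'failed'],
}

function canTransition(from: Status, to: Status): boolean {
  return allowedTransitions[from].includes(to)
}

// Only idle and retryable-failed states accept a new submission,
// which is what blocks double clicks while a request is in flight.
function canSubmit(state: PaymentState): boolean {
  return state.status === 'idle' || (state.status === 'failed' && state.retryable)
}

// Exhaustive switch: adding a status without a case fails to compile.
function describeState(state: PaymentState): string {
  switch (state.status) {
    case 'idle': return 'Ready to pay'
    case 'validating': return `Checking ${state.field}`
    case 'submitting': return 'Sending payment'
    case 'processing': return `Processing ${state.transactionId}`
    case 'succeeded': return `Paid, order ${state.orderId}`
    case 'failed': return `Failed: ${state.reason}`
    case 'timeout': return `Still waiting on ${state.transactionId}`
    case 'unknown': return `Checking the status of ${state.transactionId}`
    default: {
      const unreachable: never = state
      throw new Error(`Unhandled state: ${JSON.stringify(unreachable)}`)
    }
  }
}
```

Because `canSubmit` is derived from the single state value, there is no separate `isLoading` flag that can drift out of sync with it.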
2. Failure State Enumeration
I categorize every failure mode:
Network failures:
- Request never sent (offline)
- Request sent, no response (timeout)
- Response received, can't parse (malformed)
- Response indicates retry (503, 429)
Business logic failures:
- Insufficient funds
- Invalid payment method
- Duplicate transaction detected
- Fraud check failed
System failures:
- Session expired
- Rate limited
- Service degraded
- Maintenance mode
For each failure, I define:
- User-facing message
- Recovery action (retry, contact support, try different method)
- Analytics event
- Support ticket data
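A typed catalog is one way to make sure no failure ships without all four of those definitions. This is a minimal sketch under stated assumptions: the failure codes, messages, event names, and the `specFor` helper are hypothetical examples, not a real API.

```typescript
type Recovery = 'retry' | 'wait' | 'contact_support' | 'try_different_method' | 'log_in_again'

interface FailureSpec {
  userMessage: string
  recovery: Recovery
  analyticsEvent: string
  supportFields: readonly string[] // data to attach to a support ticket
}

// Hypothetical subset of the failure modes listed above.
type FailureCode =
  | 'offline'
  | 'timeout'
  | 'insufficient_funds'
  | 'fraud_check_failed'
  | 'session_expired'
  | 'rate_limited'

// Record<FailureCode, ...> makes a missing entry or missing field a compile error.
const failureCatalog: Record<FailureCode, FailureSpec> = {
  offline: {
    userMessage: 'You appear to be offline. Nothing was charged.',
    recovery: 'retry',
    analyticsEvent: 'payment_failed_offline',
    supportFields: ['sessionId'],
  },
  timeout: {
    userMessage: 'We could not confirm your payment yet. Please do not pay again while we check.',
    recovery: 'wait',
    analyticsEvent: 'payment_failed_timeout',
    supportFields: ['sessionId', 'transactionId'],
  },
  insufficient_funds: {
    userMessage: 'Your payment method was declined for insufficient funds.',
    recovery: 'try_different_method',
    analyticsEvent: 'payment_failed_insufficient_funds',
    supportFields: ['sessionId', 'transactionId'],
  },
  fraud_check_failed: {
    userMessage: 'We could not process this payment. Please contact support.',
    recovery: 'contact_support',
    analyticsEvent: 'payment_failed_fraud_check',
    supportFields: ['sessionId', 'transactionId'],
  },
  session_expired: {
    userMessage: 'Your session expired. Log in again to continue.',
    recovery: 'log_in_again',
    analyticsEvent: 'payment_failed_session_expired',
    supportFields: ['sessionId'],
  },
  rate_limited: {
    userMessage: 'Too many attempts. Please wait a moment and try again.',
    recovery: 'wait',
    analyticsEvent: 'payment_failed_rate_limited',
    supportFields: ['sessionId'],
  },
}

// Codes the frontend has never seen fall back to a safe default.
const unknownFailure: FailureSpec = {
  userMessage: 'Something went wrong. Please contact support before trying again.',
  recovery: 'contact_support',
  analyticsEvent: 'payment_failed_unrecognized',
  supportFields: ['sessionId', 'transactionId', 'rawErrorCode'],
}

function specFor(code: string): FailureSpec {
  return Object.prototype.hasOwnProperty.call(failureCatalog, code)
    ? failureCatalog[code as FailureCode]
    : unknownFailure
}
```

The fallback matters as much as the catalog: the backend can add an error code at any time, and the UI should degrade to "contact support" rather than to a blank screen.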
3. Monitoring Hooks
Before shipping, I instrument everything:
import { useEffect } from 'react'

function PaymentFlow() {
  const { track } = useAnalytics()

  // Fire once on mount to record that the flow was viewed
  useEffect(() => {
    track('payment_flow_viewed', {
      flowVersion: 'v2.1',
      timestamp: Date.now(),
    })
  }, [])

  const handleSubmit = async (data: PaymentData) => {
    const startTime = Date.now()
    track('payment_submit_attempted', { amount: data.amount })
    try {
      const result = await processPayment(data)
      track('payment_submit_succeeded', {
        orderId: result.orderId,
        duration: Date.now() - startTime,
      })
      return result
    } catch (error) {
      // Caught values are `unknown` in strict TypeScript, so narrow before reading fields
      const details = error as { type?: string; message?: string; retryable?: boolean }
      track('payment_submit_failed', {
        errorType: details.type,
        errorMessage: details.message,
        duration: Date.now() - startTime,
        retryable: details.retryable,
      })
      throw error
    }
  }
  // ...
}
I track:
- Every state transition
- Time spent in each state
- Errors with full context
- User actions (clicks, inputs, navigation)
- Performance metrics (render time, API latency)
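Tracking state transitions and time spent in each state can live in one small helper rather than being scattered across handlers. The sketch below assumes a hypothetical `StateTimer` class with an injected clock and an `emit` callback standing in for whatever analytics client is in use.

```typescript
interface TransitionEvent {
  from: string
  to: string
  msInPreviousState: number
}

class StateTimer {
  private current: string
  private enteredAt: number
  private readonly emit: (event: TransitionEvent) => void
  private readonly now: () => number

  // `now` is injectable so tests can control time; it defaults to the wall clock.
  constructor(
    initial: string,
    emit: (event: TransitionEvent) => void,
    now: () => number = () => Date.now(),
  ) {
    this.current = initial
    this.emit = emit
    this.now = now
    this.enteredAt = now()
  }

  // Records how long the previous state lasted, then moves to the new state.
  transition(to: string): void {
    const timestamp = this.now()
    this.emit({
      from: this.current,
      to,
      msInPreviousState: timestamp - this.enteredAt,
    })
    this.current = to
    this.enteredAt = timestamp
  }
}
```

In a component, `emit` would typically forward to `track('payment_state_changed', event)`, where that event name is again only an example.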
4. Feature Flags with Circuit Breakers
All revenue-critical features ship behind flags:
// Call both hooks before any early return: React hooks must run
// in the same order on every render
const { isEnabled, config } = useFeatureFlag('new-payment-flow')
const { isHealthy } = useCircuitBreaker('payment-service')
if (!isEnabled) {
  return <LegacyPaymentFlow />
}
// Also check circuit breaker
if (!isHealthy) {
  return <PaymentDegradedState />
}
This allows:
- Instant rollback without deploy
- Gradual rollout (1% → 10% → 50% → 100%)
- A/B testing with revenue metrics
- Emergency kill switch
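Gradual rollout only works if a given user stays in the same bucket as the percentage grows. Flag services usually handle this for you; as an illustration of the idea, here is a minimal sketch using an FNV-1a-style hash over the string's UTF-16 code units. `isInRollout` and the `flag:user` key format are assumptions for the example, not any vendor's API.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    // Math.imul keeps the multiply in 32-bit space; >>> 0 makes it unsigned.
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Deterministic bucketing: the same user and flag always map to the same
// bucket (0-99), so raising the percentage only ever adds users.
function isInRollout(flagName: string, userId: string, percentage: number): boolean {
  if (percentage <= 0) return false
  if (percentage >= 100) return true
  const bucket = fnv1a(`${flagName}:${userId}`) % 100
  return bucket < percentage
}
```

Including the flag name in the hashed key means the same users are not always first in line for every experiment.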
5. Rollback Strategy
Before shipping, I document:
Rollback triggers:
- Error rate > 5% for 5 minutes
- P95 latency > 3 seconds
- Conversion rate drops > 10%
- Support tickets spike > 3x baseline
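Writing the triggers down as code removes the 2 AM debate about whether a threshold was crossed. The sketch below encodes the four triggers above; `RolloutMetrics` and `rollbackReasons` are hypothetical names, and it assumes the conversion drop is measured relative to baseline and that the monitoring system already reports how long the error rate has stayed above 5%.

```typescript
interface RolloutMetrics {
  minutesErrorRateAbove5Pct: number // how long the error rate has been over 5%
  p95LatencyMs: number
  conversionRate: number            // fraction, e.g. 0.04 for 4%
  baselineConversionRate: number
  supportTicketsPerHour: number
  baselineSupportTicketsPerHour: number
}

// Returns the name of every trigger that has fired; an empty array means healthy.
function rollbackReasons(m: RolloutMetrics): string[] {
  const reasons: string[] = []

  if (m.minutesErrorRateAbove5Pct >= 5) reasons.push('error_rate')
  if (m.p95LatencyMs > 3000) reasons.push('p95_latency')

  // Relative drop against baseline; guarded so a zero baseline never divides.
  const conversionDrop =
    m.baselineConversionRate > 0
      ? (m.baselineConversionRate - m.conversionRate) / m.baselineConversionRate
      : 0
  if (conversionDrop > 0.1) reasons.push('conversion_drop')

  if (m.supportTicketsPerHour > 3 * m.baselineSupportTicketsPerHour) {
    reasons.push('support_tickets')
  }
  return reasons
}
```

Returning the reasons rather than a boolean gives the stakeholder message its first line for free.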
Rollback process:
- Toggle feature flag off (instant)
- Verify metrics return to baseline
- Communicate to stakeholders
- Debug in staging
Data implications:
- Can users resume interrupted flows?
- Do we need to migrate incomplete transactions?
- How do we handle users mid-flow during rollback?
6. Load Testing Revenue Paths
I test under realistic load:
# Simulate 1000 concurrent checkout attempts
k6 run --vus 1000 --duration 30s checkout-load-test.js
I verify:
- No race conditions under concurrency
- Request deduplication works
- Rate limiting behaves correctly
- Database locks don't deadlock
- Error messages stay meaningful under load
Red flags:
- Performance degrades non-linearly
- Error rates increase with load
- Timeouts occur intermittently
- Memory usage grows unbounded
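The "request deduplication works" check presupposes something on the client to verify. As one possible shape for it, here is a minimal sketch of in-flight deduplication keyed by a caller-chosen string; `InFlightDeduper` is a hypothetical helper, and it complements rather than replaces server-side idempotency.

```typescript
class InFlightDeduper<T> {
  private inFlight = new Map<string, Promise<T>>()

  // While a request for `key` is pending, later callers get the same promise
  // instead of triggering a second network call.
  run(key: string, request: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key)
    if (existing) return existing

    const promise = request()
    this.inFlight.set(key, promise)

    // Forget the entry once the request settles, whether it succeeded or failed,
    // so a later attempt with the same key can go through.
    const forget = () => {
      this.inFlight.delete(key)
    }
    promise.then(forget, forget)

    return promise
  }
}
```

Under load, the thing to watch is that the number of backend calls matches the number of distinct keys, not the number of clicks.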
7. Idempotency Verification
For any mutation, I verify:
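// Every mutation needs an idempotency key that survives retries and refreshes
const IDEMPOTENCY_STORAGE_KEY = 'order-idempotency-key'

function getOrCreateIdempotencyKey(): string {
  // Reuse the stored key so a retry or refresh sends the same key
  let key = sessionStorage.getItem(IDEMPOTENCY_STORAGE_KEY)
  if (!key) {
    key = crypto.randomUUID()
    sessionStorage.setItem(IDEMPOTENCY_STORAGE_KEY, key)
  }
  return key
}

async function submitOrder(orderData: OrderData) {
  const result = await api.post('/orders', orderData, {
    headers: { 'Idempotency-Key': getOrCreateIdempotencyKey() },
  })
  // Clear the key only after a confirmed result so the next order gets a fresh one
  sessionStorage.removeItem(IDEMPOTENCY_STORAGE_KEY)
  return result
}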
I test:
- Duplicate submission with same key = same result
- Network retry doesn't create duplicates
- Browser refresh doesn't duplicate order
- Back button doesn't duplicate action
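To exercise those cases without a real backend, an in-memory fake that honors idempotency keys is often enough. This is a sketch of such a test double, assuming the server's contract is "same key, same result"; `FakeOrderApi` and its methods are hypothetical and say nothing about how a real payment provider implements it.

```typescript
interface OrderResult {
  orderId: string
  amount: number
}

// In-memory stand-in for an idempotent order endpoint, for use in tests.
class FakeOrderApi {
  private resultsByKey = new Map<string, OrderResult>()
  private nextId = 1

  // Replays the stored result when the key has been seen before;
  // otherwise creates a new order and remembers it under that key.
  createOrder(idempotencyKey: string, amount: number): OrderResult {
    const previous = this.resultsByKey.get(idempotencyKey)
    if (previous) return previous

    const result: OrderResult = { orderId: `order-${this.nextId}`, amount }
    this.nextId += 1
    this.resultsByKey.set(idempotencyKey, result)
    return result
  }

  orderCount(): number {
    return this.resultsByKey.size
  }
}
```

A refresh or back-button test then reduces to: drive the UI twice, assert `orderCount()` is still 1.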
8. Session & Auth Edge Cases
I verify:
- Session expiry during the flow redirects to login and preserves flow state
- Login redirect returns to correct flow step
- Token refresh happens transparently
- CSRF tokens regenerate correctly
- Multi-tab behavior is safe
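Preserving state across a login redirect mostly means saving which step the user was on, not the payment details themselves. Here is a minimal sketch under stated assumptions: the `FLOW_KEY` name, the 30-minute limit, and the `saveFlow` / `restoreFlow` helpers are all hypothetical, and `KeyValueStore` is shaped so that `sessionStorage` can be passed in directly.

```typescript
// Structural subset of the Web Storage API, so sessionStorage fits
// and tests can pass an in-memory stand-in.
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
  removeItem(key: string): void
}

interface SavedFlow {
  step: string
  returnTo: string
  savedAt: number
}

const FLOW_KEY = 'checkout-flow-state' // hypothetical storage key
const MAX_AGE_MS = 30 * 60 * 1000      // assumed limit: 30 minutes

// Only same-origin relative paths are accepted, to avoid open redirects.
function isSafeReturnPath(path: string): boolean {
  return path.startsWith('/') && !path.startsWith('//') && !path.startsWith('/\\')
}

// Save the step only; never put card data in storage.
function saveFlow(store: KeyValueStore, step: string, returnTo: string, now: number): void {
  const saved: SavedFlow = { step, returnTo, savedAt: now }
  store.setItem(FLOW_KEY, JSON.stringify(saved))
}

// One-shot restore: the entry is removed on read, and anything malformed,
// expired, or pointing off-site is discarded.
function restoreFlow(store: KeyValueStore, now: number): SavedFlow | null {
  const raw = store.getItem(FLOW_KEY)
  store.removeItem(FLOW_KEY)
  if (raw === null) return null
  try {
    const saved = JSON.parse(raw) as SavedFlow
    const valid =
      typeof saved.step === 'string' &&
      typeof saved.returnTo === 'string' &&
      typeof saved.savedAt === 'number' &&
      isSafeReturnPath(saved.returnTo) &&
      now - saved.savedAt <= MAX_AGE_MS
    return valid ? saved : null
  } catch {
    return null
  }
}
```

In the browser this would be called as `saveFlow(sessionStorage, 'shipping', location.pathname, Date.now())` just before redirecting to login.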
9. Browser Compatibility Matrix
For revenue-critical paths, I test:
- Chrome (current, current-1)
- Safari (current, iOS Safari)
- Firefox (current)
- Edge (current)
Specifically testing:
- Form validation
- Payment input masking
- Autofill behavior
- Back/forward navigation
- Refresh mid-flow
10. Observability Verification
Before shipping, I verify I can answer:
- How many users are in this flow right now?
- What's the current success rate?
- What's the P95 latency?
- What's the most common error?
- Can I filter by user ID to debug specific issues?
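A quick way to prove those questions are answerable is to compute the answers from a sample of the events the flow emits. The sketch below assumes a hypothetical `PaymentEvent` shape and uses the nearest-rank method for P95; in practice the analytics backend would run the equivalent query.

```typescript
interface PaymentEvent {
  userId: string
  outcome: 'succeeded' | 'failed'
  durationMs: number
  errorType?: string
}

interface FlowSummary {
  successRate: number      // fraction between 0 and 1
  p95LatencyMs: number
  topError: string | null  // most frequent errorType among failures
}

function summarize(events: readonly PaymentEvent[]): FlowSummary {
  if (events.length === 0) {
    return { successRate: 0, p95LatencyMs: 0, topError: null }
  }

  const succeeded = events.filter((e) => e.outcome === 'succeeded').length

  // Nearest-rank percentile: the value at position ceil(0.95 * n), 1-indexed.
  const sorted = events.map((e) => e.durationMs).sort((a, b) => a - b)
  const rank = Math.ceil((95 * sorted.length) / 100)
  const p95LatencyMs = sorted[rank - 1]

  // Count failures by error type and keep the most frequent one.
  const counts = new Map<string, number>()
  let topError: string | null = null
  let topCount = 0
  for (const e of events) {
    if (e.outcome !== 'failed' || e.errorType === undefined) continue
    const count = (counts.get(e.errorType) ?? 0) + 1
    counts.set(e.errorType, count)
    if (count > topCount) {
      topCount = count
      topError = e.errorType
    }
  }

  return { successRate: succeeded / events.length, p95LatencyMs, topError }
}

// Debugging a single user's session is a filter over the same events.
function eventsForUser(events: readonly PaymentEvent[], userId: string): PaymentEvent[] {
  return events.filter((e) => e.userId === userId)
}
```

If any field needed here is missing from the real events, that is the gap to close before shipping.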
Tradeoffs
Comprehensive testing:
- ✅ Prevents costly incidents
- ❌ Slower shipping velocity
- ❌ Requires more upfront planning
Heavy instrumentation:
- ✅ Fast debugging in production
- ❌ Performance overhead
- ❌ Analytics data costs
Feature flags:
- ✅ Safe rollout & instant rollback
- ❌ Code complexity
- ❌ Technical debt if not cleaned up
Pessimistic UI:
- ✅ Never shows success before the backend confirms it
- ❌ Feels slower to users
- ❌ Requires user education
What I'd Do Differently Today
If starting fresh:
- Build monitoring first. Ship instrumentation before features. You can't debug what you can't see.
- Write failure scenarios as tests. Use tools like Playwright to test network failures, timeouts, and race conditions.
- Default to pessimistic UI. Optimistic updates are for social features, not payment flows.
- Document state machines visually. A diagram prevents bugs better than code review.
- Practice rollbacks. Do it in staging monthly so you're confident in production.
The goal isn't zero bugs. The goal is zero costly bugs. Every check on this list prevents a production incident that wakes someone up at 2 AM.
The difference between a senior engineer and a staff engineer is knowing which corners you can cut and which ones will cost six figures.
Revenue-critical features? Don't cut corners.