Kill Bad Ideas Early
Abstract
This study evaluates whether LLM-driven persona simulations can predict post-launch outcomes and can therefore help teams kill weak ideas early. Using standardized inputs (rich personas, structured service scenarios) and fixed axes for value and convenience, three outputs are generated: a Persona–Service Preference Matrix, a Brand Perceptual Map (Value × Convenience), and a Churn–Elasticity Matrix. These simulated outputs are compared against ex-post logs using RMSE, MAE, and Spearman rank correlation. Findings show that LLMs track directional trends and rankings but systematically overestimate levels and compress variance. Consistency improves with multi-model/seed ensembles, tighter scenario specificity, outlier control, and post-hoc calibration against real logs. The paper proposes an operational loop (LLM draft → log-based calibration → operational forecasting) plus governance practices (quarterly refresh, holdouts, transferability checks). LLMs are positioned as an auxiliary inference layer that accelerates experiment design, not a substitute for human data.
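As a concrete reading of the comparison step, the sketch below scores one simulated map against its ex-post logs with the three reported metrics. It assumes per-segment simulated and observed values are already aligned into paired arrays; the function name and data layout are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

def map_fit_metrics(simulated, observed):
    """Score one simulated map against ex-post logs using RMSE, MAE,
    and Spearman rank correlation, the three metrics named in the abstract."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    rmse = float(np.sqrt(np.mean((sim - obs) ** 2)))  # penalizes level errors
    mae = float(np.mean(np.abs(sim - obs)))           # robust level error
    rho, _ = spearmanr(sim, obs)                      # ranking agreement
    return {"rmse": rmse, "mae": mae, "spearman": float(rho)}
```

On the paper's account, an idea can rank correctly (high Spearman) while still missing on levels (high RMSE/MAE), which is why both families of metrics are tracked together.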
Keywords
LLM persona simulation; prelaunch validation; Persona–Service Preference Matrix; perceptual map; churn–elasticity; log-based calibration; RMSE; MAE; Spearman rank correlation
Quick Q&A
How does this paper help teams kill bad ideas early?
It provides a fast, standardized prelaunch check that compares LLM-simulated demand, retention, and elasticity against real logs. Ideas that fail to meet calibrated thresholds on the positioning and churn–elasticity maps can be deprioritized before expensive launches.
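A minimal sketch of that deprioritization gate, assuming an upstream step has already produced the fit metrics above plus a simulated churn-risk score for the idea. The threshold values and the function name are placeholders for illustration, not figures from the paper; in practice the thresholds would be calibrated from historical logs.

```python
def should_deprioritize(metrics, simulated_churn_risk,
                        min_spearman=0.6, max_churn_risk=0.25):
    """Flag an idea for deprioritization when its rank agreement with
    comparable launched services is weak, or its simulated churn risk
    exceeds a ceiling. Both thresholds are hypothetical placeholders
    to be set from real log data."""
    return (metrics["spearman"] < min_spearman
            or simulated_churn_risk > max_churn_risk)
```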
Are LLM simulations accurate enough to replace human data?
No. They are useful for direction and prioritization, but they show systematic overestimation and compressed variance. Reliability reaches operating thresholds only after log-based calibration, ensembles, and routine validation against holdouts.
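One simple form the log-based calibration could take, assuming the level bias and variance compression are roughly linear. Fitting a single affine map per metric is an illustrative choice, not the paper's stated procedure.

```python
import numpy as np

def fit_calibration(sim_hist, obs_hist):
    """Fit obs ≈ a * sim + b on historical simulated/observed pairs.
    When raw LLM outputs overshoot levels and compress variance,
    the fitted slope typically stretches the spread back out and the
    intercept shifts the level toward the observed distribution."""
    a, b = np.polyfit(np.asarray(sim_hist, dtype=float),
                      np.asarray(obs_hist, dtype=float), 1)
    return a, b

def apply_calibration(sim_new, a, b):
    """Map fresh simulation outputs through the fitted correction."""
    return a * np.asarray(sim_new, dtype=float) + b
```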
What are the key governance practices recommended?
Adopt a quarterly refresh of maps, track RMSE/MAE/Spearman as KPIs, run service/segment holdouts, test temporal and segment transferability, and maintain a loop of LLM draft → log calibration → operational prediction.
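As an illustrative harness for the holdout practice, the sketch below fits the affine calibration from the previous answer on non-holdout segments and reports the three KPIs on the held-out segments as a transferability check. The names and split logic are assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import spearmanr

def holdout_kpis(segment_labels, sim, obs, holdout_segments):
    """Fit calibration on non-holdout segments, then report the
    RMSE/MAE/Spearman KPIs on held-out segments to test whether the
    calibrated predictions transfer across services/segments."""
    labels = np.asarray(segment_labels)
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    train = ~np.isin(labels, holdout_segments)
    a, b = np.polyfit(sim[train], obs[train], 1)  # log-based calibration
    pred = a * sim[~train] + b                    # calibrated forecasts
    err = pred - obs[~train]
    rho, _ = spearmanr(pred, obs[~train])
    return {"rmse": float(np.sqrt(np.mean(err ** 2))),
            "mae": float(np.mean(np.abs(err))),
            "spearman": float(rho)}
```

Running this check at each quarterly refresh, and again whenever a new service or segment enters the portfolio, keeps the KPI history comparable across cycles.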