The Skill That Determines Whether Your AI Project Succeeds Before It Starts
There is a particular kind of failure that haunts analytics, machine learning, and AI projects: quiet, expensive, and largely preventable. The model performs well on the training data. The dashboard looks impressive. The presentation lands confidently. Then six months later, nothing has changed in the business, and no one can quite explain why.
The answer is almost always upstream of the technology. The problem was never properly framed. The wrong things were being measured. Or success was defined so vaguely that any outcome could pass for progress.
Problem framing and measurement are the most underrated skills in data science. They don’t appear on job descriptions as prominently as Python or PyTorch. They don’t generate conference talks. But they determine whether a project delivers insight or just activity.
Before the Data, Before the Model
Consider what happens at the start of most analytics projects. Someone identifies a business challenge: customer churn is climbing, margins are compressing, fulfillment is slower than it should be. The natural impulse is to reach for the data. What do we have? What can we pull? What does a quick analysis show?
That impulse skips the hardest part.
A problem stated loosely remains loose all the way through analysis. “We have a customer churn problem” is an observation, not a problem statement. It tells you something is wrong but nothing about where to look, what to measure, or what a solution would actually require. You can train a churn model on that framing, but you’ll be answering a question that was never clearly asked.
The discipline of problem framing forces a different kind of thinking before any tool gets opened.
One useful first distinction: is this a puzzle or a mystery? A puzzle has a definitive answer waiting to be found. Given enough information, you can solve it completely. If your monthly revenue numbers don’t reconcile with your payment processor’s records, that’s a puzzle. There is a specific discrepancy with a specific cause, and when you find it, you’ll know.
A mystery has no single correct answer. It’s irreducibly uncertain, shaped by human behavior, market dynamics, and conditions that shift. Why are your highest-value customers churning at twice last year’s rate? That’s a mystery. You can investigate, identify contributing factors, build models that estimate risk. But there is no moment where the answer clicks into place and the case is closed. The underlying dynamics keep moving.
The test is straightforward: if you could have perfect information, would the problem be solved? If yes, it’s a puzzle. If perfect information would help but still leave genuine uncertainty about what happens next, it’s a mystery.
This distinction matters because puzzles and mysteries require different approaches. Treating a mystery like a puzzle leads to false precision, a model that outputs a confident prediction about something fundamentally unpredictable. Treating a puzzle like a mystery leads to endless hedging when a cleaner answer exists. Before choosing a method, know which kind of problem you’re actually working on.
Expand the Frame Before You Narrow It
Once a problem is named, the instinct is to go narrow immediately. Define scope, constrain variables, make it tractable. That narrowing is necessary, but premature narrowing cuts off insight before it can form.
A more productive move is to first expand the frame. Who are the entities affected by this problem? Not just customers, but which customers, under what conditions, in what context? How does this problem connect to adjacent systems: operations, finance, supply chain, the experience of frontline employees? Where does this specific problem sit within a larger pattern?
Expanding the frame before narrowing it often reveals that the stated problem is a symptom of a different, more fundamental one. A company asks why customer satisfaction scores are declining. The framing starts narrow: survey responses, support ticket volume, resolution time. Expand the frame and you find that satisfaction varies dramatically by product category, that certain product lines have higher return rates, that those returns correlate with a specific fulfillment center’s packaging error rate. The real problem was operational quality in one node of the supply chain.
This is the diagnostic equivalent of a doctor asking about sleep, stress, and diet before prescribing medication for a headache. The presenting symptom is real. The cause is often somewhere else entirely. Problem framing is the discipline of looking upstream before committing downstream.
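The drill-down described above can be sketched with nothing beyond the standard library. Everything here, the categories, the fulfillment centers, and the scores, is invented toy data; the point is that the same survey responses, sliced along an expanded frame, localize the problem.

```python
from collections import defaultdict
from statistics import mean

# (product_category, fulfillment_center, satisfaction_score on a 1-5 scale)
surveys = [
    ("electronics", "east", 4.6), ("electronics", "east", 4.4),
    ("electronics", "west", 3.1), ("electronics", "west", 2.9),
    ("apparel",     "east", 4.5), ("apparel",     "east", 4.7),
    ("apparel",     "west", 3.0), ("apparel",     "west", 3.2),
]

# The narrow frame: one company-wide number that explains nothing.
overall = mean(score for _, _, score in surveys)
print(f"overall satisfaction: {overall:.2f}")  # 3.80

# The expanded frame: the same responses, sliced by fulfillment center.
by_center = defaultdict(list)
for _, center, score in surveys:
    by_center[center].append(score)

for center, scores in sorted(by_center.items()):
    print(f"{center}: {mean(scores):.2f}")
# east: 4.55, west: 3.05 -- the "satisfaction problem" localizes to one
# node of the supply chain, which is where to investigate next.
```

The declining average was real, but it was an aggregate of one healthy population and one unhealthy one; expanding the frame is what made the second population visible.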
Breaking complex problems into components serves a similar purpose. A large, undifferentiated problem is hard to analyze because it’s hard to isolate variables. But broken into components, each piece becomes tractable. You can find data for it, define a metric that captures it, and make progress independently while understanding how it connects to the whole.
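A hedged sketch of that decomposition: factor one undifferentiated metric into components that can each be measured independently. The factorization here (customers × orders per customer × average order value) and all of the numbers are invented for illustration, not a prescribed model.

```python
import math

# Two months of the same metric, factored the same way. All figures invented.
last_month = {"customers": 1200, "orders_per_customer": 2.5, "avg_order_value": 40.0}
this_month = {"customers": 1180, "orders_per_customer": 2.5, "avg_order_value": 33.0}

print(f"revenue: {math.prod(last_month.values()):,.0f} -> "
      f"{math.prod(this_month.values()):,.0f}")

# "Revenue is down" is too big to analyze directly; component by component,
# the drop localizes to a specific, tractable question.
for key in last_month:
    change = this_month[key] / last_month[key] - 1
    print(f"{key}: {change:+.1%}")
```

The undifferentiated problem ("revenue fell 19%") becomes a specific one ("order value fell 17.5% while customer counts barely moved"), and each component now has its own data source and its own owner.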
One more technique that saves significant time: modify what you’re solving for when the original target is too hard to measure directly. Predicting exact revenue next quarter is extremely difficult. Predicting whether revenue will grow or decline is easier, and often sufficient for the decision being made. Adjusting the precision of the target, without abandoning its relevance, often transforms an intractable problem into a useful one.
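The adjustment can be as small as relabeling the target. A minimal sketch, using an invented quarterly revenue series: the exact next value is a hard regression target, but the direction of change is an easy classification target, and often it is all the decision requires.

```python
# Invented quarterly revenue figures, in $k; the series is illustrative only.
revenue = [410, 425, 418, 440, 455, 451]

# Hard target: predict the exact next value.
# Easier target with the same relevance: did revenue grow (1) or decline (0)?
direction = [1 if later > earlier else 0
             for earlier, later in zip(revenue, revenue[1:])]
print(direction)  # [1, 0, 1, 1, 0]
```

The relabeled series still supports the decision at hand (expand or hold back), while discarding the precision that made the original target intractable.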
Measurement Is Not Automatic
None of the above matters if the underlying measurement infrastructure doesn’t exist. And for many organizations, it doesn’t.
Measurement is the foundation that makes analytics possible. This sounds obvious, but its implications are easy to underestimate. Every business generates financial data: revenue, expenses, margin. That data tends to be captured reliably because accounting requires it. But operational data (how customers actually behave, where processes slow down, which products are gaining momentum and which are quietly degrading) is often sparse, inconsistent, or simply absent.
If a question matters to the business, there needs to be a system capturing the answer over time. That system doesn’t have to be sophisticated at the start. A spreadsheet, maintained consistently, produces genuinely useful data. The variable that tends to be underestimated is time. Data about one month tells you very little. Data about twelve months, thirty-six months, across multiple cycles and seasons, reveals patterns worth acting on. The cost of not starting to measure something today compounds quietly until the moment you realize you needed that data two years ago.
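A minimal version of such a system is a few lines of standard-library code run on a schedule. The file name and metric names below are assumptions for illustration; the consistency of the habit, not the sophistication of the tooling, is what produces usable data.

```python
import csv
from datetime import date

def record_metric(path, metric, value):
    """Append one timestamped observation; the file itself is the measurement system."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), metric, value])

# Hypothetical metrics, captured once a day, every day.
record_metric("metrics.csv", "support_tickets_open", 37)
record_metric("metrics.csv", "orders_shipped_late", 4)
```

Run daily for three years, a file like this answers questions that no amount of modeling can answer without it.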
Four categories of performance metrics apply to nearly every business, regardless of industry or size. Customer metrics reveal which customers drive disproportionate value and where behavior is shifting. Product metrics show which offerings are gaining traction and where profitability hides or disappears. Process metrics surface where operations accumulate friction, delay, and waste. Employee metrics capture productivity and collaboration patterns that affect output quality.
Most organizations track some of these. Few track all of them with the consistency and granularity needed to answer hard questions. The gaps tend to show up precisely when a business needs insight most: during a downturn, a competitive threat, or a strategic pivot, when there’s no historical data to reason from.
What This Has to Do With ML and AI
The failure modes of poorly framed analytics problems are amplified when machine learning or AI is involved. A predictive model trained on the wrong target variable learns to optimize for the wrong thing, efficiently and at scale. A model built without clarity on how it will be used produces predictions that never get acted on. A generative AI system deployed without measurement infrastructure can’t be evaluated, improved, or trusted.
There is a specific way this differs from traditional analytics failures. When a spreadsheet analysis gets the framing wrong, the error is usually visible. The numbers don’t add up, or the conclusion contradicts something obvious. When a machine learning model gets the framing wrong, the error can be invisible for months. The model produces confident outputs. The outputs look reasonable. Everyone assumes the system is working because it’s producing predictions on schedule. The failure surfaces only when someone finally asks whether those predictions changed any outcomes. Often, the answer is no. Generative AI adds another dimension: the outputs can sound authoritative and well-reasoned even when the underlying problem was never framed correctly. The eloquence of the output masks the emptiness of the question.
The pattern that precedes most failed ML and AI projects is recognizable in retrospect: the team moved quickly from problem identification to model development, skipping the work of defining precisely what success would look like, what data would be needed to demonstrate it, and whether that data existed and could be trusted. The model becomes a solution in search of a clearly defined problem.
Framing a problem correctly, understanding whether it’s a puzzle or a mystery, identifying the entities and systems involved, breaking it into solvable components, adjusting the target for tractability: this is the most consequential work in any analytics, ML, or AI project. It determines whether the sophistication that follows gets applied to something real.
Analytics serves three purposes in a business: equipping decision-makers with better information, improving what the business offers its customers, and making operations more efficient. All three require that the right questions be asked before the data is pulled, that the right things be measured before patterns can be found, and that success be defined precisely enough to know when it has been achieved.
The models are rarely the problem. The framing almost always is.
Start there.