"We Have Data" — The Biggest Lie in AI Projects

"We have data." I hear this sentence in the first meeting of almost every AI project. It's usually said by someone from the board or the IT director. They say it with conviction. And almost every time, it turns out to be untrue.

Not because they're lying. Because "we have data" in the organization's understanding means something completely different from "we have data ready to be used by AI."

Four Systems, Zero Consistency

The typical scenario I see in mid-to-large companies looks like this: customer data lives in the CRM, operational data in the ERP, financial data in a separate system, and reports are created in Excel based on exports from all three sources. Nobody has ever sat down to reconcile this data. Each system has its own logic, its own formats, its own field definitions.

When you ask to consolidate this data in one place, it turns out that:

The same customer has three different identifiers across three systems.
Addresses are stored in different formats and are out of date to varying degrees.
Some fields are populated, some are not — and nobody knows why.
Data structure documentation either doesn't exist or dates from 2019 and no longer reflects reality.

This is not the exception. This is the norm.
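
A first pass at these checks does not need special tooling. The sketch below is a minimal illustration, assuming two small hypothetical exports (`crm` and `erp`) where email is the only attribute the systems share. The field names and the helpers `fill_rate` and `match_on_email` are invented for this example, and real reconciliation usually needs more than one matching key.

```python
# Hypothetical exports: each system identifies the same customers differently.
crm = [
    {"crm_id": "C-001", "email": "anna@example.com", "city": "Warsaw"},
    {"crm_id": "C-002", "email": "jan@example.com", "city": ""},
    {"crm_id": "C-003", "email": None, "city": "Gdansk"},
]
erp = [
    {"erp_no": 9001, "email": "ANNA@example.com ", "city": "Warszawa"},
    {"erp_no": 9002, "email": "piotr@example.com", "city": "Krakow"},
]


def is_filled(value):
    """A value counts as filled if it is not None and not blank."""
    return value is not None and str(value).strip() != ""


def fill_rate(rows, field):
    """Share of rows in which `field` holds a non-empty value."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if is_filled(row.get(field))) / len(rows)


def normalize_email(value):
    """Trim whitespace and lowercase, so formatting noise does not block a match."""
    return (value or "").strip().lower()


def match_on_email(left, right):
    """Compare two exports on normalized email (an assumed shared key)."""
    left_keys = {normalize_email(r.get("email")) for r in left} - {""}
    right_keys = {normalize_email(r.get("email")) for r in right} - {""}
    return {
        "in_both": left_keys & right_keys,
        "only_left": left_keys - right_keys,
        "only_right": right_keys - left_keys,
    }


report = match_on_email(crm, erp)
crm_email_fill = fill_rate(crm, "email")
crm_city_fill = fill_rate(crm, "city")
```

On this toy data only one customer can be matched across both systems, and a third of the CRM rows have no email or no city. Numbers like these are worth having before anyone says "we have data."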

The Truth Emerges When a Human Sits Down With the Data

The moment when the "we have data" illusion shatters is always the same: when someone — a human, not a machine — sits down and tries to perform a specific task using that data. Not a report. Not a dashboard. A specific business task that was supposed to be automated by AI.

And then comes a series of questions: "Where do I get this value?" "Why is this field empty?" "Is this current or historical?" "Who owns this data?" Most of these questions go unanswered — because nobody ever asked them. The data existed, the systems worked, people somehow managed by creating workaround after workaround.

AI doesn't do workarounds. AI needs consistent, complete data with a clear structure. And that's where the problem begins.

"We Have Data" vs. "Data Is Ready"

These are two fundamentally different statements. "We have data" means: somewhere in the organization there are digital records related to our operations. "Data is ready" means: those records are consistent, complete, documented, accessible in one place, and fit for a specific purpose.

Between the two lies a chasm. And that chasm is expensive. I've seen projects where 70% of the "AI deployment" budget went toward cleaning up data. Not toward the model, integration, or UX. Just toward having something usable in the first place.

That's not a bad scenario. That's an honest scenario. The bad scenario is one where nobody checks the data, the team builds a proof of concept on demo data, and then everyone wonders why nothing works in production.

Why Companies Deceive Themselves

Because the truth is uncomfortable. Telling the board "we don't have data ready for AI" means saying that years of IT investment haven't built the foundation we need. That's a hard conversation. It's much easier to say "we have data" and move to the next slide with the deployment timeline.

But that ease ends the moment you try to actually use the data. Then the project either stops (costly but honest) or, worse, gets pushed forward regardless, producing results that look good but have no basis in reality.

What to Do Instead

Before you start any AI project, conduct a data readiness audit. Not an IT audit. Not a systems review. An audit that answers simple questions:

Where is the data needed for this specific task? Not "what data do we have" — that question leads nowhere. The question is: what data do we need for this one specific use case, and where is it?
Is this data consistent across systems? Is customer X in the CRM the same customer X in the ERP? Do the values match? Are the formats compatible?
Is there documentation? Not "general system documentation," but a description of what each field means, who fills it in, how often it's updated, and what values are permitted.
Who owns the data? Who is responsible for its quality? Who decides on changes? If the answer is "nobody" — that's the first thing to fix.
Can a human complete the task using this data? This is the test I wrote about in the previous article. If the expert can't, AI won't work miracles.
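
The answers are worth writing down per use case, not per system. One possible way to record them is sketched below with a hypothetical `DataReadinessAudit` structure. The field names are my shorthand for the five questions above, not a standard, and the example use case and its answers are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class DataReadinessAudit:
    """One record per use case; each flag answers one audit question."""
    use_case: str
    data_located: bool               # Do we know where the needed data is?
    consistent_across_systems: bool  # Do IDs, values, and formats match?
    fields_documented: bool          # Is each field's meaning written down?
    owner_named: bool                # Is someone responsible for quality?
    human_can_do_task: bool          # Can an expert do the task with this data?


def open_gaps(audit):
    """Return the names of all questions answered 'no', in question order."""
    gaps = []
    for field in fields(audit):
        value = getattr(audit, field.name)
        if isinstance(value, bool) and not value:
            gaps.append(field.name)
    return gaps


# Illustrative answers for an invented use case.
audit = DataReadinessAudit(
    use_case="invoice matching",
    data_located=True,
    consistent_across_systems=False,
    fields_documented=False,
    owner_named=True,
    human_can_do_task=False,
)

gaps = open_gaps(audit)
print(gaps)
# ['consistent_across_systems', 'fields_documented', 'human_can_do_task']
```

An empty list is the only result that justifies moving on to the model. Anything else is the work plan for the first month.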

The Honest Approach Is Cheaper

I know this isn't what the board wants to hear at an AI project kick-off. But an honest assessment of data readiness at the outset is many times cheaper than discovering the truth midway through the project. Projects where I spend the first month with the client on a thorough data diagnosis succeed far more often than those where we immediately start building a model "because we have data."

The AI industry loves success stories. Nobody talks about projects that died because the data turned out to be useless. And in my experience, those projects are the vast majority.

Next time someone in a meeting says "we have data" — ask: "Show me. What exactly will AI work on? Where will the data come from? Who is responsible for it?" If answers don't come quickly and concretely — that's a signal to start with the foundations, not the model.

If you want an honest assessment of whether your organization's data is ready for an AI project, I invite you to a conversation — Leszek Giza.