    The Great Spreadsheet Wars: Why Your Modern Data Stack Still Needs a Translator

    In this post, Shawn Johnson explains why analytics problems are really semantic problems. Different teams define the same metrics differently, dashboards drift, and trust erodes. He shows how a business-approved metrics layer brings shared meaning back to data, reduces governance friction, and gives AI the context it needs to produce reliable answers. The takeaway: speed without shared meaning is expensive chaos. Fix the context, not the pipes.

    Shawn Johnson
    Semantic Engineering Advisor
    December 29, 2025 · 6 min read

    If you have been in data as long as I have, you remember the "Before Times".

    I am talking about the days before the cloud made everything instantaneous. Back when I was cutting my teeth at Huron and OppenheimerFunds, "Big Data" mostly meant "Big Excel File that crashes your laptop if you look at it wrong".

    I have a distinct memory of sitting in a conference room, watching two VPs nearly come to blows over a quarterly report.

    The VP of Sales had a spreadsheet saying we made $10M.
    The VP of Finance had a printout saying we made $8.5M.

    Neither of them was lying. They just had different definitions of "Revenue". One was counting booked contracts, the other was counting recognized cash.

    And me? I was the poor twenty-something systems analyst in the corner, furiously trying to reverse-engineer two different SQL scripts to figure out whose number was "more right".


    Speed Did Not Fix the Argument

    We have spent the last decade building incredible technology to fix the movement of data.

    At Fivetran, I helped companies move mountains of information. We solved the pipeline problem. We can now spin up a Snowflake warehouse in the time it takes to brew a coffee, a task that used to take me six weeks of paperwork and begging for server rack space.

    But despite all this speed, we are still having that same argument in the conference room.

    We just have faster dashboards to argue over.

    The problem is not the plumbing anymore.
    It is the semantics.


    The "Tower of Babel" Problem

    Here is the dirty secret of the modern data stack: we made it too easy to hoard data without explaining what it means.

    When I was moving data at VaultSpeed or SqlDBM, we focused heavily on the structure: tables, keys, and columns. But the database schema is cold. It does not care about business logic.

    A column named Total_Amt looks innocent enough, but:

    • Does it include tax?
    • Does it include shipping?
    • Does it subtract returns?

    Without a semantic layer, every single analyst in your company has to decide on that definition for themselves.
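
    To make that concrete, here is a minimal sketch of three analysts reading the same column three different ways. The Order fields, the sample rows, and the function names are all hypothetical, invented for illustration; the only thing taken from the text above is the ambiguity itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    total_amt: float  # the ambiguous warehouse column
    tax: float
    shipping: float
    returns: float


# Hypothetical sample rows; the numbers are made up for illustration.
ORDERS = [
    Order(total_amt=120.0, tax=10.0, shipping=10.0, returns=0.0),
    Order(total_amt=240.0, tax=20.0, shipping=20.0, returns=50.0),
]


def revenue_as_stored(orders):
    """Analyst A: trust the column exactly as it is stored."""
    return sum(o.total_amt for o in orders)


def revenue_net_of_tax_and_shipping(orders):
    """Analyst B: assume the column includes tax and shipping, so strip them."""
    return sum(o.total_amt - o.tax - o.shipping for o in orders)


def revenue_net_of_returns(orders):
    """Analyst C: same as B, but also subtract returned merchandise."""
    return sum(o.total_amt - o.tax - o.shipping - o.returns for o in orders)


answers = {
    "as_stored": revenue_as_stored(ORDERS),                              # 360.0
    "net_of_tax_and_shipping": revenue_net_of_tax_and_shipping(ORDERS),  # 300.0
    "net_of_returns": revenue_net_of_returns(ORDERS),                    # 250.0
}
```

    Same rows, three defensible answers. None of these functions is wrong on its own; they just encode different assumptions that nobody wrote down.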

    I call this "The Great Interpreter Tax".

    Every time a data scientist writes a SQL query, they are translating raw data into business logic. If you have 50 analysts, you have 50 different translators.

    And just like a game of Telephone, by the time the data reaches the CEO, the message is garbled.


    A Rosetta Stone for Metrics

    This is where tools like Duodata are becoming the new heroes of the stack.

    It is not just another catalog. It is a business-approved metrics layer where teams define and govern metric definitions in one place, then project them into downstream platforms so everyone measures the business the same way.

    Not by dumping tables and joins on business users, but by making the business meaning explicit and portable.


    How Semantic Modeling Stops the Arguments

    Semantic modeling is essentially building a layer of Business English on top of your Technical Gibberish.

    In my experience, 90 percent of "Data Quality Issues" are not actually broken data.

    The pipeline did not fail.
    The API did not time out.
    The data is fine.

    The context is missing.

    I once spent three days debugging a "critical error" where a dashboard showed zero growth. It turned out the data engineer had filtered out "Test Accounts" using the flag is_test = 1, while the Marketing team was filtering them by email pattern, matching addresses like %@test.com. Each side was counting a different set of customers.

    The data was not broken.
    The definition of a "Real Customer" was.
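
    A small sketch of how those two filters drift apart. The account rows are hypothetical; only the is_test flag and the @test.com pattern come from the story above.

```python
# Hypothetical accounts, invented to show where the two filters disagree.
ACCOUNTS = [
    {"id": 1, "email": "ana@example.com", "is_test": 0},
    {"id": 2, "email": "qa@test.com", "is_test": 1},
    {"id": 3, "email": "dev@example.com", "is_test": 1},  # flagged, ordinary-looking email
    {"id": 4, "email": "bob@test.com", "is_test": 0},     # test domain, never flagged
]


def real_customers_by_flag(accounts):
    """Engineering's rule: anything with is_test = 1 is not a real customer."""
    return {a["id"] for a in accounts if a["is_test"] != 1}


def real_customers_by_email(accounts):
    """Marketing's rule: anything matching %@test.com is not a real customer."""
    return {a["id"] for a in accounts if not a["email"].endswith("@test.com")}


by_flag = real_customers_by_flag(ACCOUNTS)    # {1, 4}
by_email = real_customers_by_email(ACCOUNTS)  # {1, 3}

# Accounts that only one of the two teams counts as real.
disputed = by_flag ^ by_email                 # {3, 4}
```

    Both filters run without errors and both look reasonable in a code review. The disagreement only shows up when someone compares the resulting sets.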


    Define Once, Use Everywhere

    A tool like Duodata, together with the semantic layers it feeds, fixes this by centralizing metric definitions in one governed place.

    You define "Revenue" once in Duodata's conceptual metric layer, then project it into the semantic layer and downstream tools.

    • The BI tool can consume that definition.
    • The Python script can reference that same governed logic.
    • The AI agent can use it too, with guardrails.

    If you change the logic and promote it, it updates everywhere that consumes that semantic definition.

    No more hunting through 400 SQL scripts to find where someone hard-coded a WHERE clause.
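
    The "define once" idea can be sketched in a few lines of plain Python. To be clear, this is not Duodata's API or any real product's interface; MetricRegistry, promote, and evaluate are hypothetical names, and the single sample row just echoes the booked-versus-recognized argument from the opening story.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Mapping

Row = Mapping[str, float]


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    compute: Callable[[Iterable[Row]], float]


class MetricRegistry:
    """Hypothetical single home for approved metric definitions."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def promote(self, metric: Metric) -> None:
        """Publish (or replace) the approved definition for a metric name."""
        self._metrics[metric.name] = metric

    def evaluate(self, name: str, rows: Iterable[Row]) -> float:
        """Compute a metric using whatever definition is currently approved."""
        return self._metrics[name].compute(rows)


registry = MetricRegistry()
registry.promote(Metric(
    name="revenue",
    description="Recognized cash.",
    compute=lambda rows: sum(r["recognized"] for r in rows),
))

# Hypothetical data, in millions.
ROWS = [{"booked": 10.0, "recognized": 8.5}]


# Two independent consumers; neither hard-codes the revenue logic.
def bi_dashboard_tile(rows):
    return registry.evaluate("revenue", rows)


def python_report(rows):
    return registry.evaluate("revenue", rows)


before = (bi_dashboard_tile(ROWS), python_report(ROWS))  # (8.5, 8.5)

# The business changes its mind and promotes a new definition.
registry.promote(Metric(
    name="revenue",
    description="Booked contracts.",
    compute=lambda rows: sum(r["booked"] for r in rows),
))

after = (bi_dashboard_tile(ROWS), python_report(ROWS))   # (10.0, 10.0)
```

    The point of the design is that consumers ask for a metric by name and never see the formula, so a promoted change reaches both of them at once and they cannot disagree with each other.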


    Reducing the "Governance Tax" (And Saving Your Sanity)

    Let us be honest. Nobody likes "Data Governance".

    In the old days, governance meant a poor soul named "The Data Steward" walking around with a clipboard, yelling at people for not filling out metadata fields.

    It was a thankless job that cost companies millions in headcount and slowed everything down.

    During my time in sales engineering, I learned that the only way to sell governance is to make it invisible.

    Semantic modeling reduces governance costs because it automates the heavy lifting.

    • Documentation happens automatically
      Instead of writing a Word doc that nobody reads (and I have written hundreds of those), the semantic model is the documentation.

    • Lineage is instant
      When a number looks weird, you can trace it back through the metric definition, upstream metrics, and the declared source systems.
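
    As a rough illustration of that kind of trace, here is a sketch that walks a declared dependency graph. The metric names, source names, and the LINEAGE structure are hypothetical, and the sketch assumes the graph has no cycles.

```python
# Hypothetical lineage declarations: each metric lists the metrics it is
# built from and the source systems it reads directly.
LINEAGE = {
    "gross_margin": {"upstream": ["revenue", "cogs"], "sources": []},
    "revenue": {"upstream": [], "sources": ["billing.invoices"]},
    "cogs": {"upstream": [], "sources": ["erp.purchase_orders"]},
}


def trace(metric, lineage, depth=0):
    """Return indented lines describing everything a metric depends on."""
    node = lineage[metric]
    lines = ["  " * depth + f"metric: {metric}"]
    for source in node["sources"]:
        lines.append("  " * (depth + 1) + f"source: {source}")
    for parent in node["upstream"]:
        lines.extend(trace(parent, lineage, depth + 1))
    return lines


report = trace("gross_margin", LINEAGE)
print("\n".join(report))
# metric: gross_margin
#   metric: revenue
#     source: billing.invoices
#   metric: cogs
#     source: erp.purchase_orders
```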

    You stop paying high-salaried engineers to be data janitors and start paying them to actually build things.


    The AI Elephant in the Room

    There is another reason this matters right now, and it is looming large: artificial intelligence.

    Everyone wants to point an AI agent at their database and say:

    "Hey Computer, how can we save money next quarter?"

    But if you point an LLM at a raw data warehouse, you are asking for trouble.

    The AI does not know that the table legacy_sales_2019_do_not_use contains bad data. It just sees "Sales" and does the math.

    Without semantic modeling, your AI is far more likely to hallucinate. It will confidently tell you that your most profitable product is a test SKU created by a developer named Steve in 2021.

    Tools like Duodata provide the context and guardrails that AI needs to be useful. They help ensure the robot is reading from the same dictionary as the CEO.
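
    One simple form a guardrail can take is an allow-list: the agent may only ask for metrics by their approved names and never touches raw tables directly. The sketch below assumes exactly that; APPROVED_METRICS, resolve_metric, and the SQL string are hypothetical and do not describe how any particular product implements it.

```python
# Hypothetical catalog of approved metrics and the governed query behind each.
APPROVED_METRICS = {
    "revenue": "SELECT SUM(amount) FROM finance.recognized_revenue",
}


class UnknownMetricError(LookupError):
    """Raised when an agent asks for something outside the approved catalog."""


def resolve_metric(requested: str) -> str:
    """Map an agent's request to governed SQL, or refuse."""
    key = requested.strip().lower()
    if key not in APPROVED_METRICS:
        raise UnknownMetricError(
            f"'{requested}' is not an approved metric; "
            f"choose one of {sorted(APPROVED_METRICS)}"
        )
    return APPROVED_METRICS[key]


# An approved name resolves to the governed definition.
sql = resolve_metric(" Revenue ")

# A raw table name is refused instead of being queried blindly.
try:
    resolve_metric("legacy_sales_2019_do_not_use")
    refused = False
except UnknownMetricError:
    refused = True
```

    A refusal is a better outcome than a confident answer built on the wrong table. The agent gets a clear error and a list of names it is allowed to use.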


    The Bottom Line

    I have worn a lot of hats in this industry, from the guy writing scripts at 2 AM, to the architect designing cloud migrations, to the guy selling the vision.

    If there is one lesson I have learned, it is this:

    Speed without direction is just expensive chaos.

    We have the speed. The modern data stack is a Ferrari.

    But if you do not have hardened business concepts feeding a semantic model (a clear, agreed-upon map of what your data actually means), you are just driving that Ferrari in circles in a parking lot.

    It is time to stop fixing the pipes and start fixing the context.

    Your data team, and your sanity, will thank you.

    Shawn Johnson

    Modern data-stack sales engineer who scales adoption through demos, enablement, and best-practice content.
