
When Data Became Infrastructure

Governance used to live where data was stored. Increasingly, it has to live where data is delivered.

Kevin · Founder

In 2017, The Economist argued that the world’s most valuable resource was no longer oil, but data. That framing was directionally right. It captured the economic importance of data at a moment when companies were racing to collect, centralize, and monetize as much of it as they could.

But nearly a decade later, the more useful question is no longer how valuable data is. It is how data moves.

Data does not sit quietly in a warehouse waiting for a quarterly report anymore. It is requested in real time by products, partners, internal tools, compliance workflows, embedded services, and increasingly by software systems acting on behalf of people. The operational challenge is no longer just storing data safely. It is delivering the right data, to the right consumer, in the right form, with the right controls, every time.

That is why we think data governance is changing shape.

The old center of gravity

For a long time, governance lived close to storage.

If a team wanted to govern data, they usually started with the database, warehouse, lake, or export pipeline. Access was granted at the system level. Sensitive fields were handled in downstream ETL. Policies lived in documentation, review processes, and committee approvals. If a new consumer needed access, the common pattern was to create another copy, another extract, or another custom endpoint.

That model made sense in a world where most consumers were human analysts and most usage was relatively predictable.

A common version of this still looks familiar: a partner needs weekly access to customer activity, so a team spins up a CSV export, strips a few columns by hand, emails the spec around for approval, and moves on. Then the source schema changes, a new field appears, and the organization has to rediscover where governance was supposed to happen.

It makes much less sense now.

Today’s data environment is shaped by a very different set of realities:

  • the same source data is reused by many consumers with very different needs
  • those consumers increasingly expect live or near-live access
  • policy obligations vary by geography, partner, product surface, and use case
  • machine consumers do not wait for a manual review cycle every time they need a field
  • every new copy of data creates another governance problem instead of solving one

The result is that governance can no longer be mostly about where data rests. It has to be about how data is exposed.

The boundary has moved

We increasingly see the real boundary not at the database, but at the interface.

That interface might be an API endpoint, a private marketplace listing, a governed export, or an MCP server. What matters is that it is the moment where data crosses from internal system of record to external or semi-external use.

That is the moment where decisions have to be made:

  • Which fields should leave the source system at all?
  • Which identifiers should be tokenized instead of revealed?
  • Which consumer should get a purpose-limited view instead of the full record?
  • Which geographies are allowed?
  • Which usage should expire automatically?
  • Which delivery method should be available to which type of consumer?

These are governance questions, but they are no longer abstract governance questions. They are runtime questions.
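
To make "runtime question" concrete, here is a minimal sketch of what request-time enforcement can look like. It is generic Python, not DataHarbor syntax; the policy fields (allowed_fields, tokenized_fields, allowed_regions) and the toy tokenizer are placeholders for whatever a real enforcement layer would use:

from dataclasses import dataclass, field

# Hypothetical policy model, for illustration only: each consumer gets a
# purpose-limited view of the same source record, decided at request time.
@dataclass
class ViewPolicy:
    allowed_fields: set[str]                                  # fields permitted to leave the source at all
    tokenized_fields: set[str] = field(default_factory=set)   # identifiers replaced with tokens, not revealed
    allowed_regions: set[str] = field(default_factory=set)    # geographies allowed to receive this view

def serve(record: dict, policy: ViewPolicy, consumer_region: str) -> dict:
    """Apply the policy before anything is delivered."""
    if policy.allowed_regions and consumer_region not in policy.allowed_regions:
        raise PermissionError("consumer region not permitted for this view")
    view = {}
    for name, value in record.items():
        if name not in policy.allowed_fields:
            continue                                           # never leaves the source system
        if name in policy.tokenized_fields:
            value = f"tok_{hash(value) & 0xFFFFFFFF:08x}"      # stand-in for a real tokenizer
        view[name] = value
    return view

# A partner view: no SSN at all, email tokenized, EU delivery only.
partner_policy = ViewPolicy(
    allowed_fields={"customer_id", "email", "plan"},
    tokenized_fields={"email"},
    allowed_regions={"eu"},
)
record = {"customer_id": 42, "email": "a@example.com", "ssn": "123-45-6789", "plan": "pro"}
print(serve(record, partner_policy, consumer_region="eu"))

The details are invented, but the shape is the point: the decision about what leaves the system is made at the moment of the request, not in a policy document somewhere upstream.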

The new center of gravity looks more like this:

  • Then: system-level access to raw stores → Now: purpose-limited access to governed interfaces
  • Then: policy in documents and review workflows → Now: policy enforced directly in delivery pipelines
  • Then: a new request means a new extract or copy → Now: a new request means a new governed view
  • Then: human-first consumption → Now: human, software, partner, and automated consumption
  • Then: governance checked periodically → Now: governance enforced continuously

This is not just a tooling change. It is a shift in architecture.

Copy-first governance breaks down quickly

A lot of legacy data governance still assumes that when a new team or external party needs data, the answer is to prepare a separate copy for them. Sometimes that copy is a CSV export. Sometimes it is a partner feed. Sometimes it is a custom API assembled for one use case and forgotten six months later.

The short-term appeal is obvious. A copy feels simple. It feels contained. It feels like a clean handoff.

But once data starts moving across many consumers, copy-first governance becomes expensive and brittle:

  • every copy has to be re-governed
  • every schema change has to be tracked across multiple downstream artifacts
  • every new consumer introduces another one-off transformation path
  • ownership gets blurry because the source of truth and the served version diverge
  • policy enforcement becomes inconsistent because it is spread across scripts, people, and custom integrations

This is one of the reasons we think “data as asset” is now an incomplete framing. Assets are often managed as things you hold. Infrastructure is managed as something many parties depend on, often at once, under explicit operating rules.

That is much closer to how modern data behaves.

Governance has to become executable

If governance is happening at the point of delivery, then it cannot live only in slide decks, approval emails, or internal policy wikis. It has to become executable.

By executable, we mean that the rules travel with the interface itself. They are not suggestions for downstream teams to remember. They are applied directly, consistently, and automatically before data is delivered.

This is the pattern we keep coming back to:

  1. Start with a live source of truth.
  2. Define purpose-specific rules for how that data can be exposed.
  3. Enforce those rules at request time.
  4. Deliver different governed views to different consumers without cloning the source.

In DataHarbor, that shows up as a Virtual API with declarative Data Control rules. A simple example looks like this:

version: "0.3"
objects:
  customers:
    controls:
      - type: redact        # these fields never leave the source system
        fields: [ssn, passwordResetToken]
      - type: tokenize      # identifiers are replaced with tokens instead of being revealed
        fields: [email, phone]
      - type: mask          # values are obscured before delivery
        fields: [accountNumber]

With that configuration in place, enforcement happens in the delivery path rather than in a downstream copy process. Fetching the governed view is an ordinary request:

curl -H "dataHarbor-api-key: YOUR_API_KEY" \
  https://service.dataharbor.co/fetch/YOUR_VAPI_ID/customers
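
From application code, the call is just as direct. Here is a minimal sketch using Python's requests library, reusing the endpoint and header from the curl example above; YOUR_API_KEY and YOUR_VAPI_ID remain placeholders, and the JSON response shape is an assumption:

import requests

# Placeholders, exactly as in the curl example above.
API_KEY = "YOUR_API_KEY"
VAPI_ID = "YOUR_VAPI_ID"

# The governed view is fetched like any other HTTP resource; redaction,
# tokenization, and masking have already been applied in the delivery path.
response = requests.get(
    f"https://service.dataharbor.co/fetch/{VAPI_ID}/customers",
    headers={"dataHarbor-api-key": API_KEY},
    timeout=10,
)
response.raise_for_status()
customers = response.json()  # assumes the endpoint returns JSON
print(customers)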

The important part is not the syntax. The important part is the architectural shift behind it. Governance is no longer a memo attached to the data. Governance becomes part of the interface contract.

One source, many governed consumers

Once you view data as infrastructure, a different design pattern starts to make sense.

You do not create one raw feed and then hope every downstream consumer behaves. You publish multiple governed views from the same source, each one aligned to a purpose.

For example:

  • External partner: needs operational records but no direct PII → redact sensitive fields, tokenize identifiers, set expiration
  • Internal analytics team: needs consistent identifiers for trend analysis → tokenize identifiers with memory, preserve structure for correlation
  • Customer-facing product feature: needs low-latency live data with strict field limits → narrow object scope, explicit allowlist, usage controls
  • Automated assistant or agent workflow: needs structured access to task-relevant context → same governed view, delivered through API or MCP

Notice what changes here. The source does not multiply. The governance does not get reimplemented from scratch each time. What changes is the served view.
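
A rough sketch of that operating model, again in generic Python rather than DataHarbor's control syntax: the same source record backs several purpose-specific views simply by swapping the field policy, with no copy of the source ever made. Field and view names here are hypothetical, and a real setup would layer tokenization, masking, and expiration on top, as in the list above.

# One live source record; field names are hypothetical.
source = {
    "customer_id": 42,
    "email": "a@example.com",
    "account_number": "9876543210",
    "activity_score": 0.87,
}

# Purpose-specific views: each consumer sees a different projection of the
# same source, governed at delivery time rather than via copies.
ALLOWED_FIELDS = {
    "external_partner":   {"customer_id", "activity_score"},           # no direct PII
    "internal_analytics": {"customer_id", "email", "activity_score"},  # email would be tokenized before delivery
    "product_feature":    {"customer_id"},                             # narrow scope for a live product surface
}

def governed_view(record: dict, consumer: str) -> dict:
    allowed = ALLOWED_FIELDS[consumer]
    return {name: value for name, value in record.items() if name in allowed}

for consumer in ALLOWED_FIELDS:
    print(consumer, governed_view(source, consumer))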

That is a much stronger operating model than copying datasets around the company and asking everyone to remember what they are allowed to do with them.

Why this matters more now

The clearest sign that this shift is no longer theoretical is the rise of automated consumers, which need the contract to be clear before data arrives.

More workflows are now initiated by systems that query, summarize, route, classify, trigger, and act without a human manually reshaping the data first.

Human consumers can sometimes work around messy interfaces. They can open a spreadsheet, ignore a column, or ask someone in Slack whether a field is safe to use. Automated consumers are much less forgiving. They need the interface to be shaped in advance, and they need policy to be applied before the data arrives, not after.

This is one reason we have been spending so much time on delivery patterns like REST endpoints, private and global marketplace access, and MCP. The delivery channel matters, but the deeper point is that governance has to survive intact across all of them.

If the same data is safe in one channel and unsafe in another, the problem is not the channel. The problem is that governance is not attached closely enough to delivery.

A better question for the next decade

The old question was: who owns the data?

That still matters, of course. But the more operational question now is: who can consume which slice of the data, for which purpose, through which interface, under which controls?

That is a more demanding question, but it is also a more practical one. It acknowledges that modern organizations do not win by locking data away forever. They win by making data reusable without making it reckless.

The companies that adapt fastest will not be the ones with the largest pile of raw data. They will be the ones that can publish governed, purpose-limited interfaces quickly and confidently as new consumers show up.

That is the posture we think the market is moving toward: fewer unmanaged copies, more runtime enforcement, and more governed delivery from a live source of truth.

In other words, governance stops being a layer added after data architecture. It becomes part of the architecture itself.

We built DataHarbor around this idea from the start: one source of truth, many governed views. That perspective matters for partner APIs, internal reuse, marketplace distribution, and a world where more consumers are software systems operating at machine speed.

But the broader point is bigger than any one product category. Data governance is moving away from static ownership models and toward controlled interoperability. The systems that win will be the ones that make safe reuse easy.

That is the real shift we see behind the headlines.

Thanks for reading. If you have questions or want to see this in action, reach out at hello@dataharbor.co.

— Kevin, Founder
