A billion AI agents walk into a power grid — and Postgres may be part of the answer

Running AI inference closer to the data, inside the database itself, could cut the energy and sovereignty costs of centralised cloud AI. The case is plausible. The evidence is still thin.

Written by Lena Armitage · Bureau Tech · Jun 1, 2026

The claim worth interrogating

Somewhere between the hype cycle for large language models and the very real anxiety about datacenter power consumption, a quieter argument has been gaining traction: what if you ran AI inference *inside the database*, rather than shipping data out to a cloud API?

The Register flagged this framing in late May 2026, anchoring it to PostgreSQL — the open-source relational database that underpins a significant share of enterprise data infrastructure — and to two distinct pressures: energy costs and data sovereignty.

Both pressures are real. The inference-in-Postgres response to them is architecturally coherent. Whether it actually delivers at scale is a different question, and the honest answer right now is: we don't fully know yet.

What 'AI in Postgres' actually means

To be precise about the architecture: this is not about storing AI model weights in a database table. The proposal is to run inference — the process of passing input data through a trained model to get a prediction or generated output — as a native operation within the database engine, using extensions or embedded runtimes.

Postgres has a mature extension system. Projects like `pgvector` (for storing and querying vector embeddings, the numerical representations that underpin semantic search and retrieval-augmented generation) have already demonstrated that ML-adjacent workloads can live comfortably inside Postgres. The next step, running a small model's forward pass directly in the database process, is technically feasible with embeddable runtimes such as ONNX Runtime or llama.cpp.
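To make the vector-search idea concrete, here is a minimal pure-Python sketch of what a cosine-distance nearest-neighbour lookup computes. pgvector exposes cosine distance through its `<=>` operator; the sketch below only mirrors the arithmetic and is not how the extension is implemented. The document labels and three-dimensional embeddings are invented for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors.

    This sketch does not handle zero-length (all-zero) vectors.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, k=1):
    """Return the ids of the k rows closest to the query.

    Conceptually similar to: ORDER BY embedding <=> query LIMIT k
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(row[1], query))
    return [row_id for row_id, _ in ranked[:k]]


# Hypothetical rows: (id, embedding). Real embeddings have hundreds of dimensions.
DOCUMENTS = [
    ("invoice", [1.0, 0.0, 0.0]),
    ("contract", [0.0, 1.0, 0.0]),
    ("receipt", [0.9, 0.1, 0.0]),
]

if __name__ == "__main__":
    print(nearest([1.0, 0.05, 0.0], DOCUMENTS, k=2))
```

In a real deployment the ranking would run inside Postgres, typically backed by an approximate index rather than the full sort used here.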

The practical implication: a query that previously required an application to call an external API, wait for a response, and pipe the result back into a database transaction could instead execute entirely within the database server.
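The difference between the two request paths can be sketched as a simple hop count. This is a toy model, not a measurement: the component names (`app`, `database`, `inference_api`), the trust boundary, and the hop lists are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    source: str
    target: str
    carries_row_data: bool


# Hypothetical trust boundary: components the organisation operates itself.
INSIDE_BOUNDARY = {"app", "database"}

# Assumed path when the application calls an external inference API.
EXTERNAL_API_PATH = [
    Hop("app", "database", False),      # read request
    Hop("database", "app", True),       # row returned to the application
    Hop("app", "inference_api", True),  # row content sent to the external model
    Hop("inference_api", "app", True),  # prediction returned
    Hop("app", "database", True),       # prediction written back
    Hop("database", "app", False),      # write acknowledged
]

# Assumed path when one statement invokes the model inside the database server.
IN_DATABASE_PATH = [
    Hop("app", "database", False),  # single statement
    Hop("database", "app", True),   # result returned
]


def network_hops(path):
    """Count every network hop on the path."""
    return len(path)


def egress_hops(path):
    """Count hops that send row data to a component outside the boundary."""
    return sum(
        1 for hop in path
        if hop.carries_row_data and hop.target not in INSIDE_BOUNDARY
    )


if __name__ == "__main__":
    for name, path in [("external API", EXTERNAL_API_PATH),
                       ("in-database", IN_DATABASE_PATH)]:
        print(name, network_hops(path), egress_hops(path))
```

Under these assumptions the external path has six hops, one of which carries row data outside the boundary, while the in-database path has two hops and none that leave it. The model says nothing about how long each hop takes or how much energy it uses.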

The energy argument

The 'billion AI agents' framing in the source headline is doing rhetorical work, but it points at something real. Aggregate inference demand — across enterprise applications, consumer products, and the autonomous software agents now being deployed at scale — is a material contributor to datacenter power load.

Centralised inference clusters are energy-intensive in ways that go beyond the GPU compute itself: cooling, power conversion losses, and the network infrastructure required to serve millions of concurrent requests all add overhead. Distributed inference, run closer to where data already lives, theoretically reduces some of that overhead.

The word 'theoretically' is doing load-bearing work in that sentence. Published, independently verified energy comparisons between centralised cloud inference and Postgres-native inference for equivalent workloads are not, to my knowledge, in the public record as of this writing. The efficiency argument is plausible from first principles. It has not been rigorously quantified in this specific context.

The sovereignty argument

Data sovereignty — the legal principle that data is governed by the laws of the jurisdiction where it is stored or processed — is a more concrete and immediately actionable driver.

For organisations subject to GDPR in the EU, sector-specific regulations in financial services or healthcare, or national data-localisation requirements in markets like India or Brazil, sending data to a US-based cloud AI API is not always legally straightforward. Contractual safeguards exist, but they add compliance overhead and residual risk.

Running inference on-premises, or in a jurisdictionally appropriate cloud region, directly addresses this. If the model runs inside a Postgres instance that never leaves a compliant environment, the data-egress problem largely disappears. This is a genuine enterprise use case, not a theoretical one — procurement teams are already asking these questions.

What Postgres can and can't do here

Postgres is a credible platform for this kind of work. Its extension architecture is flexible, its community is large, and its deployment footprint is enormous. The `pgvector` extension's rapid adoption suggests the ecosystem can absorb ML-adjacent tooling quickly when the use case is clear.

The constraints are also real. Postgres is optimised for transactional and analytical query workloads, not for the memory bandwidth and parallelism that large model inference demands. Embedding a capable language model inside a database process raises questions about resource contention, operational complexity, and the size of models that are actually practical in this context.

Small, task-specific models — classifiers, embedding generators, lightweight generative models — are much more plausible candidates for in-database inference than frontier-scale models. The use cases that fit this architecture well are probably narrower than the broadest version of the pitch implies.
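A back-of-the-envelope calculation shows why model size is the deciding factor. The sketch below estimates the memory needed for model weights alone, as parameter count times bytes per parameter. It deliberately ignores activations, any key-value cache, and runtime overhead, so real footprints will be higher. The parameter counts are illustrative round numbers, not references to specific models.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params, precision):
    """Estimate memory for model weights only, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


def fits_in_budget(num_params, precision, budget_gib):
    """True if the weights alone fit in the memory budget set aside for inference."""
    return weight_memory_gib(num_params, precision) <= budget_gib


# Illustrative parameter counts (assumptions, not specific models).
EXAMPLES = [
    ("small embedding model", 110_000_000, "fp32"),
    ("compact generative model", 7_000_000_000, "fp16"),
    ("large generative model", 70_000_000_000, "fp16"),
]

if __name__ == "__main__":
    for label, params, precision in EXAMPLES:
        size = weight_memory_gib(params, precision)
        print(f"{label}: {size:.2f} GiB of weights at {precision}")
```

With these inputs the small embedding model needs roughly 0.4 GiB, the 7-billion-parameter model about 13 GiB, and the 70-billion-parameter model about 130 GiB. On a server whose memory is also serving Postgres's own buffers and query workloads, only the first is comfortable, which is the practical basis for the narrower set of use cases described above.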

The gap between the pitch and the evidence

The framing — Postgres as an answer to the datacenter energy crisis — is ambitious. The underlying ideas are worth taking seriously. But 'an answer' to a crisis implies demonstrated, measurable impact, and that bar has not been cleared in the public literature for this specific approach.

What is supportable: in-database inference is a legitimate architectural pattern that addresses real data-sovereignty requirements and could, in principle, reduce some categories of inference-related energy overhead. What is not yet supportable: that it constitutes a meaningful solution to grid-scale AI power demand.

The distinction matters. Organisations evaluating this architecture should do so on the sovereignty merits, which are solid, and on the latency benefit of avoiding an external API round trip, rather than on energy-savings claims that remain unquantified.

Key takeaways

Running AI inference inside PostgreSQL (rather than calling out to a cloud API) keeps data local, which matters for GDPR and similar data-sovereignty regulations.
The energy argument rests on eliminating network round-trips and centralised GPU cluster overhead — plausible, but not yet backed by published, independent benchmarks for Postgres-native inference specifically.
The 'billion AI agents' framing reflects a real infrastructure concern: aggregate inference demand is a material contributor to datacenter power load, and distributed architectures are one proposed mitigation.
Postgres is an open-source relational database with a large extension ecosystem, making it a credible host for embedded ML runtimes — but production-grade AI-in-Postgres tooling is still maturing.
Data sovereignty (the legal principle that data is subject to the laws of the country where it is stored or processed) is a growing enterprise procurement requirement, and on-premises or edge inference directly addresses it.

FAQ

What does it mean to run AI inference 'inside' PostgreSQL?

It means executing a machine learning model's prediction step as a native database operation — using a Postgres extension or embedded runtime — rather than calling an external API. The data never leaves the database server to get an AI-generated result.

Why does data sovereignty matter for AI inference specifically?

When an application sends data to a cloud AI API, that data is processed in the cloud provider's infrastructure, which may be in a different legal jurisdiction. Regulations like GDPR, and national data-localisation laws in various countries, can restrict or complicate this. Running inference locally keeps data within a controlled, compliant environment.

Is Postgres actually capable of running large language models?

For frontier-scale models (think GPT-4-class), no — the memory and compute requirements are incompatible with running inside a database process alongside normal workloads. For smaller, task-specific models (classifiers, embedding generators, compact generative models), it is technically feasible and increasingly supported by extensions.

What is pgvector, and how does it relate to this?

pgvector is a popular Postgres extension that adds support for storing and querying vector embeddings — the numerical representations used in semantic search and retrieval-augmented generation (RAG). It demonstrates that ML-adjacent workloads can run inside Postgres, and it is often a stepping stone toward more ambitious in-database AI capabilities.

Are the energy savings from in-database inference proven?

Not in the public record, at least not for Postgres-native inference specifically. The efficiency argument is plausible from first principles, since eliminating network round-trips and centralised cluster overhead should reduce some energy costs, but independently verified, quantified comparisons for this architecture have not been published as of mid-2026.

Citations

AI and data sovereignty in Postgres: An answer to the datacenter energy crisis. Frames Postgres-native AI inference as a response to both data sovereignty requirements and datacenter energy demand; introduces the 'billion AI agents' load framing.
The Register — AI and ML coverage. Bureau research source for AI infrastructure reporting.
pgvector: Open-source vector similarity search for Postgres. Demonstrates that ML-adjacent workloads, specifically vector embedding storage and similarity search, can run natively inside PostgreSQL via the extension system.
PostgreSQL: The world's most advanced open source relational database. Primary reference for PostgreSQL architecture, extension system, and deployment characteristics.