Monday, April 27, 2026

AI Is the Builder. Now Make It the Operator.

We celebrated AI accelerating delivery. We haven’t yet asked who is supposed to operate everything it built.

This is the third post in a series about what AI-assisted development actually reveals about enterprise technology programmes: not the headline story, but the structural one underneath it.

In the first post, I made the case that the enterprise modernisation playbook is broken. The evidence was UnderwriteAI: a production-grade, APRA-compliant insurance platform I built entirely in my spare time, using GitHub Copilot powered by Claude Sonnet 4.6, across 41 working sessions. Eight microservices, a React portal, an API gateway, real-time Kafka event streaming, 156 automated BDD test scenarios. The kind of system that would normally take 18 to 24 months and a seven-figure budget through a traditional programme model.

In the second post, I asked the more uncomfortable question: could I actually run it in production? The answer was a Kubernetes migration: 145 resources across nine Helm charts, each requiring configuration, resource limits, disruption budgets, and autoscaling rules. Impressive in its own right. But it surfaced something the industry conversation about AI-assisted delivery consistently overlooks: the gap between working software and production software is not a development problem. It is an operational one.

This post is about that gap, and about the question nobody is asking yet.

If AI built the system, why are we asking humans to operate it the old way?

 


The Handover Nobody Planned For

AI-accelerated delivery compresses timelines in ways that governance models haven’t caught up with. Code is written in hours. Pipelines run in minutes. 145 resources are live before the sprint review has finished. The CAPEX programme declares success and moves on.

Then the operations team gets the handover pack.

They need Configuration Item records for every resource. Some exist in the CMDB; many don’t. Raising requests to create the rest takes time, and each request carries its own naming conventions, approval workflows, and lead times, none of which accelerated just because the build did. Then they need application owners willing to accept those CIs into their run budget, which means finding someone willing to absorb support costs, on-call obligations, patching schedules, and incident response from a budget that was set before any of this existed.

No accountable owner accepts a CI mid-OPEX cycle without interim funding. Their budget was set before that resource existed. That is rational behaviour, not obstruction. But it means the CI sits in a grey zone: technically live, operationally orphaned.

This is where unpleasant surprises accumulate, quietly, until an incident or an audit forces the conversation that should have happened at go-live.

 


CAPEX Governance Hasn’t Kept Pace

The structural problem is the operational transition. CAPEX funded the build, but nobody negotiated the OPEX transfer before the project closed. The project team had budget to deliver. Go-live was the finish line. Nobody scoped the handover.

In a traditional programme, this gap was inconvenient but manageable. The delivery timeline was long enough that operations teams had time to prepare, even if the preparation was informal. The system was built by humans who could explain it. The runbook was written by people who remembered the decisions.

I have lived this pattern before AI was part of the conversation at all. In the mid-2010s, as a Technical Director at Telstra, I worked on large-scale platform programmes where we used a funding model we called PROPEX (programme OPEX, or bridge funding). It was a practical construct: when a CAPEX programme completed, the platform was live but the OPEX budget hadn’t been finalised, resourced, or accepted by an owning team. PROPEX filled the gap. It was intended as a short-term bridge, typically a quarter or two, to keep the lights on while OPEX was forecasted, a receiving team was identified, and ownership was formally transferred.

In practice, PROPEX ran long. Technology managers negotiated the handover. Owning teams pushed back on accepting systems they didn’t ask for, with costs they hadn’t budgeted, and support obligations they weren’t staffed for. The programmes that had declared success at go-live were quietly still funding operations months later. Nobody was being obstructive. Everyone was being rational. But the gap between “delivered” and “operationally owned” was real, recurring, and expensive.

That was before AI-assisted delivery. The conditions that produced PROPEX (a build that outpaces the operational absorption capacity of the receiving organisation) are now structural. AI compresses the build by orders of magnitude without doing anything to accelerate the OPEX planning cycle, the CI creation process, the budget negotiation, or the ownership conversation. If anything, the AI-era version of that gap is harder to close, because the handover is murkier. At Telstra, the team that built the platform could explain it to the team inheriting it. In an AI-assisted programme, the people who directed the build were navigating agent output, and the most detailed record of why the system is structured the way it is may live in a session transcript rather than a document anyone filed. Accepting accountability for a system you can’t fully interrogate is a harder ask than accepting a system with a known author.

A CAPEX initiative in the age of AI needs more than a delivery gate. It needs an operational readiness gate that confirms CI records exist, owners are identified and funded, OPEX forecasts are updated, and the operational knowledge generated during the build has been captured in a form the operations team can actually use. Without that gate, faster delivery does not reduce operational risk. It compresses the window between build and surprise.

For organisations that are not yet positioned to have agents managing CI reconciliation, a practical interim construct is a dedicated landing pad cost centre: a named budget line established before go-live, specifically scoped to absorb newly delivered systems for a defined period (typically one to two quarters) while permanent OPEX ownership is negotiated. Unlike PROPEX, which was typically created after the fact when a programme ran out of CAPEX headroom, a landing pad is designed into the programme from the start. It makes the operational transition period deliberate rather than accidental, and it gives the OPEX negotiation a fixed deadline rather than an open-ended one. The landing pad does not solve the structural problem. But it names it, funds it, and bounds it, which is a significant improvement on leaving it unaddressed until an incident or an audit forces the conversation.

ITIL 4 names this governance gap precisely. The service transition practice covers configuration management, change enablement, and release management. The intent is correct: before a service enters live operation, CI records should exist, ownership should be assigned, and operational knowledge should be transferred. The problem is pacing. ITIL 4 was designed for a world where the delivery timeline was long enough for structured transition activities to run alongside the build. When AI compresses delivery by an order of magnitude, that assumption breaks. Human-executed ITIL processes can no longer keep pace with what is being delivered. The answer is not to discard the practice. It is to execute it at the same speed as the delivery, which means agents.

 


The Ownership Question Nobody Is Asking

Here is the question I haven’t seen asked directly in the industry conversation about AI-assisted development.

If agents built the system, why are humans expected to maintain it manually?

The current ownership model was designed for a world where a human wrote the code and therefore understood it well enough to support it. The operations team inherited a system from the people who built it. The handover was a transfer of human knowledge.

AI-assisted development breaks that assumption at both ends. The person who “built” the system was navigating agent output, not authoring every component. And the operational artefacts (the runbook, the dependency map, the incident playbook, the CMDB records) are documents that agents can generate from the same source of truth they used to build the system in the first place.

Helm charts are a precise, machine-readable description of what was deployed and how. A Kubernetes manifest encodes resource types, dependencies, health check endpoints, scaling rules, and environment configuration. This is not documentation someone has to write after the fact. It is the deployment artefact itself. An agent that can read a Helm chart and generate a deployment can equally read that Helm chart and derive the CI record, map the resource to the correct CMDB category, cross-reference what is registered against what is running, and surface the gaps.

The CMDB reconciliation problem, which operations teams currently handle through manual discovery, spreadsheet audits, and post-incident retrospectives, is structurally identical to the kind of task AI agents handle well. It involves reading structured data from a reliable source of truth, comparing it against a second structured data source, and producing a reconciled output. The Helm release history is that source of truth. It is currently sitting disconnected from every service management platform in the enterprise.
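To make the derivation step concrete, here is a minimal sketch of what "reading the Helm chart and emitting expected CI records" looks like. The `values` dict stands in for a parsed `values.yaml`; the keys and classification rules are illustrative assumptions, not the actual UnderwriteAI chart schema.

```python
# Minimal sketch: derive expected CMDB CI records from a parsed Helm values
# structure. The `values` dict is an illustrative stand-in for
# yaml.safe_load(open("values.yaml")); keys and CI classes are assumptions.

values = {
    "services": {
        "policy-service": {"port": 8081, "database": "policy-db"},
        "claims-service": {"port": 8082, "database": "claims-db"},
    },
    "frontend": {"name": "insurance-portal"},
}

def derive_ci_register(values: dict) -> list[dict]:
    """Walk the values structure and emit one expected CI record per component."""
    cis = []
    for name, svc in values.get("services", {}).items():
        cis.append({"ci_class": "Application Service", "name": name, "status": "New"})
        # Database-per-service pattern: each declared database is its own CI.
        cis.append({"ci_class": "Database Instance", "name": svc["database"], "status": "New"})
    if "frontend" in values:
        cis.append({"ci_class": "Application Service",
                    "name": values["frontend"]["name"], "status": "New"})
    return cis

register = derive_ci_register(values)
# Two services yield two Application Service CIs and two Database Instance CIs;
# the frontend adds a fifth record.
```

The real chart has far more component types, but the shape of the task is the same: structured input, deterministic classification, structured output.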

 


What the Agent Already Knows

To make this concrete: when the Kubernetes migration for UnderwriteAI completed, the Helm chart contained 145 resources across nine charts. An operations team handed that deployment would face weeks of discovery work to catalogue what had been built, classify each component against CMDB CI classes, identify owners for each tier, and raise the requests to create them. In an organisation running ServiceNow, each CI creation follows its own workflow, naming convention, and approval chain.

An agent reading that Helm chart does not need to discover any of it. The information is already there, structured, complete, and machine-readable. 

Here is what that CI register looks like, derived directly from the values file and chart dependencies:

| CMDB CI Class | Component | Count | Status | Suggested Owner |
|---|---|---|---|---|
| Application Service | Policy, Customer, Claims, Premium, Document, Notification, Audit, Auth microservices | 8 | New | Application Support |
| Application Service | Insurance Portal (React frontend) | 1 | New | Digital / Product |
| Database Instance | PostgreSQL 15, one per microservice (auth, policy, customer, claims, premium, document, notification, audit) | 8 | New | Platform / DBA |
| Database Instance | PostgreSQL 15, Keycloak and Kong internal databases | 2 | New | Platform / Middleware |
| Middleware | Redis (cache, premium service) | 1 | New | Platform |
| Middleware | Kafka (event streaming, 6 topics + DLTs) | 1 | New | Platform / Integration |
| Middleware | Keycloak (identity provider, UnderwriteAI realm) | 1 | New | Security / IAM |
| Middleware | Kong (API gateway, 8 service routes) | 1 | New | Platform / Network |
| Storage Volume | PostgreSQL persistence volumes (one per DB instance) | 10 | New | Platform / Storage |
| Storage Volume | Document service volume (10Gi, policy documents) | 1 | New | Application Support |
| Network Component | Nginx Ingress (underwriteai.local, TLS) | 1 | New | Network / Platform |

That is 35 Configuration Items across five CI classes. Every one of them is derivable from the Helm chart before a single human has opened a ServiceNow form.

Hi, I’m Tyrell’s AI ...

When he asked me to produce that table, here is what I did: I searched the repository for Helm files, found the umbrella chart, then read four files in parallel: Chart.yaml, Chart.lock, values.yaml, and the deployment templates. From Chart.lock I got the exact dependency inventory: 10 PostgreSQL instances, Redis, Kafka, Keycloak, Kong. From values.yaml I got all 8 microservices, the frontend, their ports, database hosts, persistence configurations, and autoscaling settings. From the templates I confirmed the Kubernetes resource types being generated per service.

I then cross-referenced those reads, classified each component against standard CMDB CI classes, inferred suggested owners from resource type and function, and produced the table above.

Total elapsed time: under two minutes, across five tool calls.

The operations team that would normally do this work with a spreadsheet, a Kubernetes dashboard, and a series of meetings is looking at somewhere between two and four weeks.

To be precise about what this is: the table above is the discovery pass: the expected CI state derived from the deployment manifest. That is not yet an audit. An audit is the next step: take this expected state, compare it against what is actually registered in your live CMDB, and produce the delta (what is missing, what is stale, what has no owner, what is miscategorised). That step is also agent work.

I am not saying this to be impressive. I am saying it because the table was always there, and the audit was always possible. Nobody had asked the right tool to do either.

The CI status column matters for incremental deployments. In a subsequent release (a new service added, an existing service scaled, a database resized) the agent performs the same derivation and diffs it against the current CMDB state. The output is not a full CI register but a delta: three CIs to create, one CI to update, two CIs with changed ownership flags. That delta is the input to the next step.
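The audit and delta steps described above reduce to a structured diff. Here is a hedged sketch of that comparison; the field names are illustrative, not a ServiceNow schema.

```python
# Sketch of the audit step: diff the expected CI state (derived from the Helm
# chart) against what the CMDB actually has registered. Field names are
# illustrative, not a real ITSM schema.

def cmdb_delta(expected: list[dict], registered: list[dict]) -> dict:
    reg_by_name = {ci["name"]: ci for ci in registered}
    delta = {"create": [], "update": [], "ownerless": []}
    for ci in expected:
        current = reg_by_name.get(ci["name"])
        if current is None:
            delta["create"].append(ci["name"])          # missing from the CMDB
        elif current.get("ci_class") != ci["ci_class"]:
            delta["update"].append(ci["name"])          # miscategorised
        elif not current.get("owner"):
            delta["ownerless"].append(ci["name"])       # registered but unowned
    return delta

expected = [
    {"name": "policy-service", "ci_class": "Application Service"},
    {"name": "policy-db", "ci_class": "Database Instance"},
    {"name": "redis", "ci_class": "Middleware"},
]
registered = [
    {"name": "policy-service", "ci_class": "Application Service", "owner": "App Support"},
    {"name": "policy-db", "ci_class": "Middleware"},  # wrong class in the CMDB
]

delta = cmdb_delta(expected, registered)
# delta -> {"create": ["redis"], "update": ["policy-db"], "ownerless": []}
```

The output of this diff, not the full register, is what flows into the next release's CI workflow.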

This is where the agent-to-agent handoff becomes the practical model for operational transfer. The build agent (the one that read the Helm chart and produced the CI manifest) hands a structured artefact to an ITSM agent. The ITSM agent knows the ServiceNow schema, the CI naming conventions, the approval routing rules, and the ownership hierarchy. It opens the CI creation requests in bulk, pre-filled, pre-classified, and pre-routed to the right team. The human approves the batch. They do not author it.

That single change, from human authoring to human approval, is the transition from the current operational model to the one that AI-assisted delivery makes possible. The knowledge transfer that technology managers spent months negotiating in the PROPEX era is replaced by a structured handover artefact that an agent generated from the deployment manifest that was always there. The conversation shifts from “who is going to catalogue all of this” to “does this CI manifest look right to you.”

 


The Cost Structure Shift

The second question that follows from this is larger and more consequential for how enterprises budget technology.

The dominant cost in enterprise OPEX has historically been human resource expenditure. Operations teams, support staff, incident managers, change advisory processes, CMDB administrators. These costs exist because maintaining a complex system at scale requires sustained human attention: monitoring, alerting, triaging, escalating, patching, documenting.

AI-assisted delivery already demonstrated that it can collapse the human resource cost on the build side. A system that would have required a team of specialists over 18 months was built by one person with an AI agent in 41 sessions. That is not a marginal productivity improvement. It is a structural change to the cost model.

The same structural change is available on the operations side, and the industry hasn’t fully absorbed this yet. Routine operations tasks (monitoring, alert triage, CMDB updates, change request drafting, first-line incident investigation, CI reconciliation) are repetitive, structured, and rule-governed. They are exactly the class of work that agents handle well. What remains irreducibly human is judgment, escalation, governance, and accountability. That is a much smaller headcount at a much higher skill level, and the cost structure moves accordingly: less human resource expenditure in OPEX, more compute and platform cost.

This is not a future scenario. The tooling to begin this shift exists today.

 


The Human Role Reframed

The conclusion that follows from both of these points is not that humans become irrelevant to operations. It is that the human role changes in the same way it changed during the build.

During the build, humans were not replaced by agents. They became the directors of agents. They set intent, reviewed outputs, approved decisions, and escalated when agent behaviour diverged from expectations. The agent did the mechanical work. The human held the accountability.

The same model applies to operations. Humans set the policy: what the acceptable thresholds are, what constitutes an escalation, who owns what category of resource, what the funding rules are for mid-cycle CI acceptance. Agents do the mechanical work: reconciling the CMDB, raising CI creation requests, monitoring against the defined thresholds, drafting incident summaries, flagging ownership gaps before the audit finds them.

The oversight model is not new. What is new is that the tooling to implement it is now available, and the economic pressure to implement it is building. If AI has already halved the human cost of delivery, the organisations that also apply it to operations will carry a materially different cost structure from those that don’t.

  


Where the MCP Pattern Fits

As I have been throughout this series of posts, I want to be specific here too, because this is not a theoretical proposition.

In the UnderwriteAI project, I built a Model Context Protocol server alongside the application itself. MCP is the protocol that allows AI agents to call structured tools: not just generate text, but to execute actions against real systems. The UnderwriteAI MCP server exposes twelve tools: creating customers, generating quotes, activating policies, lodging and processing claims, querying audit logs. An AI agent can execute the complete insurance policy lifecycle in natural language commands, calling those tools in sequence, without a human clicking through a UI.

That demonstration is about build and demo capability. But the architecture it describes is equally applicable to operations.

An MCP server sitting in front of a CMDB exposes the same pattern: an agent calls a tool to query what CIs exist, calls a tool to compare against the Helm release manifest, calls a tool to raise a creation request for the delta, calls a tool to recommend an owner based on resource type and team structure, and produces a structured handover artefact that a human approves rather than authors. The human governs the process. The agent executes it.

This is not a distant capability. At its Knowledge 2026 conference in May, ServiceNow is shipping agentic workflows for CMDB, a new capability under Now Assist that uses AI agents to manage CMDB governance, data quality, and CI lifecycle. The session description says it directly: AI-driven agents can revolutionise governance and data quality. That is the same claim this post is making, arriving from the direction of the world’s dominant ITSM platform. The gap between “what AI agents need to do CMDB reconciliation” and “what the tooling can support” has closed. Organisations that are designing their operational model now, deciding what agents own, what humans govern, and how the handover process works, are not waiting for the future. They are preparing for a capability that is already shipping. Organisations that are not will be retrofitting governance onto a system that was never designed for it.

The same direction is visible in the Atlassian ecosystem. Jira Service Management, the other major ITSM platform in enterprise use, has extended its asset and configuration management capabilities significantly in recent releases, alongside AI-assisted triage and automation. The architectural pattern this post describes is not a ServiceNow-specific proposition. Any ITSM platform that exposes structured API access to its CI registry can sit behind an MCP server. The agent reads the deployment manifest, compares against registered state, and raises the delta requests. The tool name changes. The governance problem and the agentic solution are identical.

 


The Gate We Are Missing

AI is now simultaneously the builder and the operator of enterprise systems. That sentence, which I wrote in my first post in this series as a directional claim, is becoming a practical reality faster than most governance frameworks are prepared for.

The gate missing from most CAPEX programmes is not a technical gate. It is a governance gate that asks: have we defined what the operational model looks like when AI is the primary operator? Have we identified which tasks agents will own, which tasks humans will oversee, and how the accountability model works when the system that was built by agents is also maintained by agents? Have we updated the OPEX forecast to reflect a cost structure where human resource expenditure is no longer the dominant line?

Faster delivery without that gate doesn’t reduce the operational burden. It concentrates it at the go-live boundary, where the CAPEX programme has already declared victory and the OPEX budget wasn’t sized for the arrival.

The organisations that get this right won’t be the ones that used AI to build faster. They will be the ones that used AI to build faster and then asked the harder question: now that we’ve built it, who’s operating it, how, and at what cost?



I’m Tyrell Perera, an Enterprise Solutions Architect and Fractional CTO with 20+ years of experience leading digital transformation in Insurance, Telecommunications, Energy, Retail, and Media across Australia. If you’re designing the operational model for your AI-assisted delivery programme and want a conversation about what that looks like in your context, find me at tyrell.co or on GitHub.

 

Monday, April 20, 2026

Docker Compose Gets You to the Demo. In Regulated Domains, Here Is What Gets You to Production.

I built an APRA-compliant insurance platform in my spare time to prove a point. Then I asked an honest question: could I actually run it in production? The answer revealed something counterintuitive about regulatory burden. 


In my previous post, I made the case that the enterprise modernisation playbook is broken. The evidence I offered was UnderwriteAI: a production-grade, APRA-compliant insurance platform I built entirely in my spare time, using GitHub Copilot powered by Claude Sonnet 4.6, across 41 working sessions. Eight microservices, a React portal, an API gateway, real-time Kafka event streaming, 156 automated BDD test scenarios, and a live demo in which an AI agent executes the complete insurance policy lifecycle in eleven natural language commands.

The platform works. The demos are compelling. The test coverage is real.

And then I asked a more uncomfortable question: could I actually run this in production?

 


The Docker Compose Fiction

The current deployment descriptor for UnderwriteAI is a single docker-compose.yml file. It starts 29 containers on a single machine, hardwires service discovery via a Docker bridge network, and manages persistence through named volumes on the local file system. It works perfectly on my MacBook. It has worked perfectly for 41 sessions of development and demonstration.

It is not a production deployment model.

Docker Compose is a development orchestration tool. It assumes a single host. It has no concept of the machine being unavailable. If the host restarts, you run docker compose up and everything comes back. If a container crashes, the restart: unless-stopped directive brings it back on the same host. If load increases and a service needs more instances, Docker Compose has no mechanism to scale automatically in response. There is no concept of a rolling deployment. There is no concept of a disruption budget. There is no way to say “this service requires at least one replica to be available at all times.”

None of this matters for development. All of it matters for production.

I'm not raising this as a gap in the AI-assisted development story. I'm raising it because the distinction between "working software" and "production software" is consistently underweighted in the industry conversation about what AI-accelerated development can actually deliver. Working software is a necessary condition. It is not a sufficient one.

 


Resilience Is Not a Feature. It Is a Deployment Architecture.

The regulatory context sharpens this considerably.

APRA's CPS 230, which came into effect on 1 July 2025, sets explicit requirements for operational resilience in Australian regulated entities. It requires demonstrated availability controls: documented tolerance for disruption, tested recovery procedures, and evidence that critical business services can withstand realistic failure scenarios.

An insurance platform running on a single Docker host does not satisfy CPS 230. It cannot, structurally. There is no redundancy. There is no automated failover. There is no mechanism for demonstrating controlled disruption.

The standard artefacts that satisfy CPS 230 requirements in a modern deployment model are Kubernetes-native constructs: Pod Disruption Budgets (defining how many replicas can be unavailable during voluntary disruption), HorizontalPodAutoscalers (scaling replicas in response to load, ensuring capacity under demand), rolling update strategies (allowing new versions to be deployed without service interruption), and liveness and readiness probes (enabling the cluster to remove unhealthy instances from the load pool automatically, without human intervention).

These are not nice-to-have engineering hygiene items. For a regulated insurer, they are the substance of the operational resilience capability that a prudential regulator asks you to demonstrate.

An enterprise programme that defers infrastructure architecture to a later phase is deferring the regulatory capability itself. It cannot be discovered in integration. It has to be designed in.

 


Twenty-Nine Containers, Three Categories, One Tractable Problem


As always, I want to be specific here, because the move from Docker Compose to Kubernetes is often described at a level of abstraction that makes it sound either trivial ("just deploy the containers differently") or impossibly complex ("you need a dedicated platform team"). Neither characterisation is accurate.

The 29 containers in my stack fall into three categories, and each requires a different approach.

 

Category 1: Application services (nine containers)

Eight Java microservices and the React frontend. For each of these, the Kubernetes work is mechanical. A Deployment manifest encoding replica count and the resource limits already documented in the project's architecture guide. A Service manifest for internal cluster DNS. A HorizontalPodAutoscaler targeting 70% CPU utilisation with a minimum of one replica and a maximum of three. A PodDisruptionBudget with minAvailable: 1. Liveness and readiness probes wired to the Spring Boot Actuator health endpoints that already exist in every service.

This is templatable. The services share enough structural similarity that nine manifests can be generated from a single template with per-service variable substitution. That is what Helm charts are: parameterised Kubernetes manifest templates with environment-specific values files.

In practice, the structural approach is a single deployment.yaml template that iterates over a services: map using a Go template range loop. All eight microservices are declared as entries in values.yaml under a shared key. The template renders one Deployment, one Service, one ConfigMap, and one Secret per entry, and the only per-service inputs are port numbers, database credentials, and the small number of service-specific environment variables (Redis cache config for the premium service, document storage paths for the document service). The alternative of one template file per service produces eight times the maintenance surface area for changes that are structurally identical across all eight.

Non-sensitive configuration (datasource URLs, Kafka bootstrap addresses, Keycloak JWK endpoints) goes into ConfigMap. Passwords and signing keys go into Kubernetes Secret objects using stringData. The two are mounted into the container together via envFrom. A checksum/config annotation on the Deployment (a SHA-256 hash of the ConfigMap content) ensures that updating a config value triggers a rolling restart automatically, without requiring a manual image rebuild. That is a Helm convention, not a Kubernetes built-in. Kubernetes does not watch ConfigMap content directly. What it watches is the Deployment spec, and when Helm recalculates the hash on the next helm upgrade and writes an updated annotation value, the spec has changed, so Kubernetes sees a new revision and triggers a rolling update. The end result is automatic configuration change propagation; the mechanism is a chart-level pattern built on top of standard Kubernetes rollout behaviour.

The liveness probe wires to /actuator/health/liveness and the readiness probe to /actuator/health/readiness, the Spring Boot Actuator endpoints that already exist in every service. No additional instrumentation is required.
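A condensed sketch of that shared template, showing the range loop, the checksum annotation, and the probe wiring together. The values keys (`services`, `port`, `config`, and so on) are illustrative, not the actual chart schema, and Service/ConfigMap/Secret rendering is omitted for brevity.

```yaml
# Condensed sketch of the shared deployment.yaml template. Values keys are
# illustrative; the real chart also renders a Service, ConfigMap, and Secret
# per entry.
{{- range $name, $svc := .Values.services }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ $name }}
spec:
  replicas: {{ $svc.replicas | default 1 }}
  selector:
    matchLabels:
      app: {{ $name }}
  template:
    metadata:
      labels:
        app: {{ $name }}
      annotations:
        # Hash of the rendered config: a changed value changes the pod spec,
        # so Kubernetes sees a new revision and rolls the Deployment.
        checksum/config: {{ $svc.config | toYaml | sha256sum }}
    spec:
      containers:
        - name: {{ $name }}
          image: "{{ $svc.image }}:{{ $svc.tag }}"
          envFrom:
            - configMapRef:
                name: {{ $name }}-config
            - secretRef:
                name: {{ $name }}-secrets
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: {{ $svc.port }}
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: {{ $svc.port }}
---
{{- end }}
```

Adding a ninth service becomes a values.yaml entry, not a new template file.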

 

Category 2: Infrastructure services (20 containers)

This is where the real work is. PostgreSQL (ten databases: eight for the application services, plus dedicated instances for Keycloak and Kong, each maintaining its own schema, kept as separate containers to preserve the database-per-service isolation pattern), Redis, Apache Kafka, Zookeeper, Confluent Schema Registry, Keycloak, Kong API Gateway, Mailpit, Prometheus, Grafana, and Swagger UI.

For none of these do you write manifests from scratch. The ecosystem provides well-maintained community Helm charts: Bitnami's postgresql chart, Bitnami's kafka chart, the official Kong chart, the kube-prometheus-stack umbrella chart. The work is configuration: translating the environment variables in the Docker Compose file into the values schema expected by each community chart, ensuring persistent storage is correctly provisioned via PersistentVolumeClaim objects, and preserving the service interconnections (the Kafka bootstrap address, the Schema Registry URL, the Keycloak JWK endpoint) that the application services depend on.

This is the category that consumes most of the effort in any real Kubernetes migration. Configuration surface area is large, the community chart schemas differ from what you'd design yourself, and the failure modes during initial bring-up are obscure. It takes iteration.

The specific friction point in this stack is service discovery. Docker Compose's bridge network uses the service name as a DNS hostname (policy-db, kafka, redis), and every microservice's configuration already hardwires those names as spring.datasource.host, spring.kafka.bootstrap-servers, and so on. The default behaviour of the Bitnami community charts is to name Kubernetes services using the Helm release name as a prefix: a release named underwriteai with a PostgreSQL subchart aliased as policy-db would create a service called underwriteai-policy-db, not policy-db. That prefix would break every microservice's database connection configuration without a single changed line of application code.

The solution is fullnameOverride. Every infrastructure dependency in the umbrella chart's dependency declarations includes a fullnameOverride value matching the Docker Compose hostname exactly. The result is Kubernetes service DNS names that are identical to the docker-compose names, which means the application configuration files require zero changes. The umbrella chart for UnderwriteAI declares 14 dependencies: ten aliased bitnami/postgresql instances, bitnami/redis, bitnami/kafka, bitnami/keycloak, and kong/kong. Each has a fullnameOverride.

Two infrastructure charts offer meaningful topology differences between environments. The Bitnami Kafka chart supports KRaft mode (Kafka's internal Raft consensus mechanism, available from Kafka 3.3), which eliminates the Zookeeper dependency entirely. In the Kubernetes deployment, the chart runs single-node KRaft in development (one pod, no Zookeeper sidecar) and scales the controller pool to three replicas in production. This is a cleaner topology than the docker-compose configuration, which still runs a separate Zookeeper container because the docker-compose image predates the KRaft stabilisation. The Redis chart runs in standalone mode for development and switches to replication with Sentinel enabled in the production values file.

A question that arises consistently at this point in the conversation: who operates a Kubernetes cluster? For most organisations deploying a single application of this scale, the answer is that you do not operate the control plane. EKS (AWS), AKS (Azure), and GKE (Google Cloud) provide Kubernetes as a managed service; the control plane is the cloud provider's operational responsibility. What you need is someone who can write and maintain Helm charts, understand the cluster's operational model, and own the deployment pipeline. For an eight-service application, that is one person with a platform engineering or SRE background, not an organisational function. The 'dedicated platform team' threshold is real for organisations running hundreds of services. It is not the right framing for a greenfield deployment of this scale, and treating it as such is how the infrastructure conversation gets indefinitely deferred.

helm dependency update resolves and downloads all 14 dependency charts into a local charts/ directory in a single command. The pull takes roughly 90 seconds on a reasonable connection. The output names the exact chart version pulled for each dependency (bitnami/postgresql:18.5.24, bitnami/kafka:32.4.3, bitnami/keycloak:25.2.0), which is the version-pinned audit trail the regulatory framework expects of dependency management.
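That version-pinned record is written to a Chart.lock file alongside Chart.yaml. The shape below is standard Helm output, with the versions taken from the paragraph above and the digest and timestamp elided:

```yaml
# Chart.lock — generated by `helm dependency update`, committed with the chart
dependencies:
  - name: postgresql
    repository: https://charts.bitnami.com/bitnami
    version: 18.5.24
  - name: kafka
    repository: https://charts.bitnami.com/bitnami
    version: 32.4.3
  - name: keycloak
    repository: https://charts.bitnami.com/bitnami
    version: 25.2.0
digest: sha256:…              # integrity hash over the resolved dependency set
generated: "…"                # timestamp of the resolution run
```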

 

Category 3: Secrets (a category of its own)

The Docker Compose file contains roughly 40 plaintext credentials: database passwords, Redis authentication strings, JWT signing keys, Kafka configuration. Every one of these needs to be removed from the manifest layer and replaced with a Kubernetes Secret reference before this stack goes anywhere near a production cluster.

This is not just a security requirement. It is a baseline expectation of any modern infrastructure audit. Credentials hardcoded into deployment files cannot be rotated cleanly, cannot be scoped by environment, and cannot be managed without modifying source-controlled configuration. Kubernetes Secret objects are the minimum viable solution. A full implementation would use a secrets management tool such as HashiCorp Vault with sidecar injection, but that is a subsequent step. The immediate requirement is to remove the plaintext.
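The minimum viable shape is a Secret object plus a valueFrom reference in the Deployment, in place of the literal value. The names here are illustrative, not taken from the repository:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: policy-db-credentials
type: Opaque
stringData:
  password: replace-me              # placeholder; never commit a real value
---
# Deployment container spec (excerpt) — the reference replaces the plaintext
env:
  - name: SPRING_DATASOURCE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: policy-db-credentials
        key: password
```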

The reason secrets warrant their own category is that the problem class is distinct from both application configuration and infrastructure topology. The challenges are operational: how do you rotate a database password without downtime? How do you promote credentials across environments without committing them to source control? How do you produce an audit trail showing which workloads accessed which credentials, and when? These questions have answers (External Secrets Operator pulling from AWS Secrets Manager, HashiCorp Vault with Kubernetes auth, sealed secrets for GitOps workflows), but each introduces operational surface area that needs to be staffed, monitored, and tested. In a regulated domain, the audit trail for secret access is as important as the audit trail for deployment configuration. They are separate records requiring separate toolchains, and conflating them is where the scope of this work expands unexpectedly.
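For the External Secrets Operator route mentioned above, the manifest looks roughly like this; the store name and Secrets Manager path are assumptions for illustration:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: policy-db-credentials
spec:
  refreshInterval: 1h                  # re-sync cadence from the backing store
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager          # configured separately with IAM access
  target:
    name: policy-db-credentials        # the Kubernetes Secret it materialises
  data:
    - secretKey: password
      remoteRef:
        key: underwriteai/policy-db    # path in AWS Secrets Manager (illustrative)
        property: password
```

The operator writes and refreshes the Secret; workloads reference it exactly as they would a hand-created one, which is what makes the rotation and audit story tractable.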

This three-layer framing (application services, infrastructure services, secrets) applies to any containerised migration, regardless of stack. The proportions of effort will vary depending on how much of your infrastructure is already cloud-native; the categories will not.

 

A note on sequencing: the three categories do not need to be resolved in parallel. Start with Category 1 (application services). It is mechanical, and the process of templating eight structurally similar deployments builds the chart familiarity required for the harder infrastructure work. Move to Category 2 (infrastructure services) dependency by dependency rather than attempting a full migration in a single pass. Address secrets management early in the Category 2 phase, not as a final step. Establishing how credentials flow through the system while infrastructure charts are being wired is significantly less disruptive than retrofitting a secrets model after 14 dependency charts already have credentials embedded in their values files.

 


The Deployment Pipeline and the Compliance Audit Trail Are the Same Thing


There is a non-obvious benefit to this work that I want to name directly.

Helm charts are infrastructure-as-code. They are source-controlled, versioned, and diffable. Every change to the deployment configuration produces a commit. Every deployment can be rolled back with a single command. The entire history of how the platform has been deployed is preserved in the repository.

For a regulated insurer, this is not incidental. The ability to produce an immutable record of what was deployed, when, and with what configuration is a compliance requirement in its own right. Docker Compose running on a development laptop is the opposite of this. A Helm chart in a version-controlled repository with a CI/CD pipeline running helm upgrade is exactly the audit trail the regulatory framework expects.

The operational resilience capability and the auditability capability are not separate concerns. They are the same work, expressed at the infrastructure layer.

The Helm chart is not a deployment pipeline on its own. That distinction matters for the compliance story. A CI/CD pipeline running helm upgrade on a validated merge to main automates deployment execution. A GitOps controller such as ArgoCD or Flux takes this further: the desired cluster state is declared in version control, and the controller continuously reconciles the cluster against it. The compliance value is in the approval gate. A pull request review and merge approval on the values file is the change management record. The deployment cannot proceed without it, and the audit trail is the repository history rather than a separate ITSM ticket. For regulated organisations, this collapses the deployment toolchain and the change management toolchain into a single artefact. That is not a small thing.
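A hedged sketch of what the helm-upgrade-on-merge step might look like as a GitHub Actions job (the workflow name, paths, and cluster authentication are assumptions; the Helm flags are standard):

```yaml
name: deploy
on:
  push:
    branches: [main]                  # only runs after PR review and merge
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Cluster credentials (kubeconfig) are assumed to be provisioned
      # by an earlier step or the runner environment.
      - name: Deploy chart
        run: |
          helm upgrade --install underwriteai infrastructure/helm/underwriteai \
            --namespace underwriteai --create-namespace \
            -f infrastructure/helm/values/values-prod.yaml \
            --atomic --timeout 10m    # roll back automatically on failure
```

The --atomic flag is what connects this to the rollback story: a failed release reverts to the previous revision without manual intervention, and the repository history records exactly which configuration each revision carried.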

 


The Production Readiness Question Always Gets Asked. The Variable Is When.


I have sat in a large number of enterprise architecture reviews over 20 years. A recurring pattern: the question "how will this run in production?" is asked late, often after significant investment in application design, and the answer frequently requires renegotiating assumptions that were baked in at the beginning.

Container orchestration is a specific example of this. Organisations that built their containerisation strategy on Docker Compose or Docker Swarm (a reasonable early-phase choice) found themselves rearchitecting the operational layer when production requirements became concrete. The application code was fine. The infrastructure model needed to change.

The pattern is repeatable because the organisational incentive is to demonstrate capability quickly. Docker Compose lets you demonstrate working software on a laptop in a review meeting. That is genuinely useful. But the demonstration creates an impression of production-readiness that can persist longer than it should.

I am not immune to this. I made the same choice. UnderwriteAI runs on Docker Compose because it let me ship working software quickly and demonstrate the full platform in a compelling way. That was the right choice for the phase I was in.

The right choice for the next phase is different. Working software that is not yet production software is a pattern I encounter consistently in enterprise modernisation engagements, and it is rarely the result of poor engineering. It is the structural consequence of that same incentive to demonstrate capability quickly: the demonstration succeeds, and its implied production-readiness outlasts the phase it was designed for.

 


The Counterintuitive Advantage of Regulated Domains

Here is the observation that tends to surprise people when I raise it: in my experience, organisations operating in regulated domains have this conversation earlier and with less organisational resistance than their unregulated counterparts.

The reason is structural. APRA does not ask whether you have thought about resilience. It asks you to demonstrate it, before you operate at scale. CPS 230 requires documented tolerance for disruption, tested recovery procedures, and evidence of availability controls. It is not a checkbox exercise. An auditor will ask to see the Pod Disruption Budgets, the rollback procedures, the incident response runbooks. The regulator has, in effect, mandated that the production infrastructure conversation happen before go-live.

That is an uncomfortable constraint when you first encounter it. It adds work to a phase of the programme that feels like it should be focused on features. But the constraint is doing something useful: it prevents the infrastructure debt from accumulating in the first place.

Compare this to the pattern I have observed consistently in non-regulated organisations. The production readiness conversation gets deferred. The team ships features. The deployment model that worked for the demo becomes the deployment model for production, because changing it would delay the launch. The launch happens. For some period, the single-host deployment holds. Then load increases, or a dependency fails, or a deployment goes wrong and there is no rollback path, and the production infrastructure conversation finally happens. Now it is happening under operational pressure, with real customers affected, in a remediation context rather than a design context. The cost is higher, the options are narrower, and the team is working against the clock.

This cost has been quantified. Google Cloud's DORA programme has tracked software delivery performance across tens of thousands of practitioners for over a decade. A consistent finding: high-performing organisations excel at both speed and stability simultaneously. The assumption that trading production infrastructure maturity for early-phase delivery velocity is a rational choice does not hold up in the data. DORA's 2019 research found that elite performers were more than 23 times more likely to have fully adopted flexible cloud infrastructure than low performers. Their 2023 report found that organisations leveraging flexible infrastructure demonstrate 30% higher organisational performance than those that lift and shift without adopting cloud-native practices. The 2024 report was direct: 'simply migrating to the cloud without adopting its inherent flexibility can be more harmful than staying in a traditional data center' (Accelerate State of DevOps Report 2024). Deferred infrastructure work does not preserve optionality. It compounds a performance deficit.


It is worth naming the governance pattern underneath this. The team that makes the deferral decision is rarely the team that inherits the remediation cost. The engineering team that shipped the demo successfully moved on to the next programme. The operations team, or the team contracted to modernise the platform six months later, inherited the production stability debt.

There is a funding structure that reinforces this split. The delivery programme is capitalised: CAPEX, with a defined budget and a clear end date, typically overseen by an executive sponsor accountable for shipping on time. The team that inherits the platform operates under OPEX, a cost centre under sustained pressure to reduce expenditure year on year. The production stability debt crosses the boundary between those two funding models invisibly. It does not appear in the CAPEX programme's final cost. It appears as operational overhead in a budget that was already too small. This is a governance gap, not an engineering failure. The incentive to demonstrate working software quickly is rational for the team that faces it. The cost falls elsewhere, to someone who was not in the room when the deferral was decided.

I have lived both versions. The regulated path feels slower at the time. In retrospect it is faster, because you do not pay the production stability debt after launch.

 

The lesson for technology leaders in non-regulated domains is uncomfortable but clear: the regulator is not the reason to build production-grade infrastructure before go-live. The reason is that it is cheaper and less risky to build it before go-live than after. The regulator is simply the external forcing function that makes regulated organisations do what all organisations should be doing anyway.

If your programme does not have a regulator imposing that constraint, consider voluntarily imposing it yourself. Define your production readiness criteria at the start of the programme, and make them specific enough to be binding. 'We will use Kubernetes' is not a criterion. 'Helm charts passing helm lint before the first sprint' is. 'We will manage secrets properly' is not a criterion. 'No credentials in deployment files before the first integration environment' is.
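A binding criterion is one a pipeline can enforce. A minimal pull-request gate for the helm lint example might look like this (paths follow the chart layout described in this post; the workflow itself is illustrative):

```yaml
name: chart-validation
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Resolve dependencies
        run: helm dependency update infrastructure/helm/underwriteai
      - name: Lint the umbrella chart
        run: |
          helm lint infrastructure/helm/underwriteai \
            -f infrastructure/helm/values/values-dev.yaml
      - name: Render all templates (fails on invalid YAML)
        run: |
          helm template underwriteai infrastructure/helm/underwriteai \
            -f infrastructure/helm/values/values-dev.yaml > /dev/null
```

Once this blocks merges, "Helm charts passing helm lint" stops being an aspiration and becomes a property of every commit on main.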

Add observability to that list explicitly. An instrumentation layer is not the same as a monitoring capability, and the difference is only visible under production load. Defined service level objectives, alerting on SLO breach, and enough baseline telemetry to distinguish normal behaviour from abnormal are as much a production readiness criterion as liveness probes. Without them, the first indication of a degraded service is a customer complaint.

Treat container orchestration model, secrets management approach, liveness and disruption budget configuration, observability baselines, and tested rollback procedures as launch-blocking requirements with named exit criteria in the programme charter. The charter is the right place for these precisely because it is agreed before anyone has an incentive to defer them.

The conversation will happen eventually. The only question is whether it happens while you still have the full set of options available, and while the team that defined the architecture is still in the room.

 


From Blank Directory to 145 Resources: What the Work Actually Involved

I completed this work between writing the first and second drafts of this post, so I can give a precise account rather than an estimate.

The finished umbrella chart renders 145 Kubernetes resources in total: 37 from the custom templates (8 Deployments, 8 Services, 8 ConfigMaps, 8 Secrets, 1 PVC for document storage, 1 frontend Deployment, 1 frontend Service, 1 frontend ConfigMap, 1 Ingress) and 108 from the 14 dependency sub-charts. helm lint reports zero failures. helm template against the development values file completes without errors and produces valid YAML for every resource.

The directory structure, consistent with the project's architecture decision records on file organisation:

infrastructure/helm/
├── underwriteai/                       # Umbrella chart
│   ├── Chart.yaml                      # 14 dependency declarations
│   ├── values.yaml                     # Default values (development credentials)
│   └── templates/                      # 10 files
│       ├── deployment.yaml             # Range loop over services map
│       ├── service.yaml
│       ├── configmap.yaml
│       ├── secret.yaml
│       ├── hpa.yaml                    # HorizontalPodAutoscaler
│       ├── pvc.yaml
│       ├── frontend-deployment.yaml
│       ├── frontend-service.yaml       # Service + ConfigMap + Ingress
│       ├── _helpers.tpl
│       └── NOTES.txt
└── values/
    ├── values-dev.yaml                 # 1 replica, 2Gi PVCs, Always pull
    └── values-prod.yaml                # HA replicas, TLS, empty passwords

The entire chart (application templates and all 14 infrastructure dependencies) was produced in a single AI agent session. Total elapsed time from blank directory to passing helm lint: approximately 15 minutes.

Again, I want to be precise about what that means, because it is easy to read "15 minutes" and conclude that the work was simple. It was not. The application template work was mechanical: the range-loop design over a services map is a structural pattern, and the shared Spring Boot probe configuration required no per-service customisation. But the infrastructure configuration work (translating 20 Docker Compose container definitions into correctly wired Helm dependency values, discovering fullnameOverride, resolving the Bitnami chart schemas across 14 dependencies) is exactly the category of work that a senior platform engineer, doing it manually, would have allocated the better part of a day to (if not multiple days). The fullnameOverride problem alone (understanding why the Bitnami chart was not producing the expected DNS name and finding the correct values key to override it) is the class of problem that documentation does not surface until you encounter it. It appears in the Bitnami chart's values.yaml on line 63, unremarked, between unrelated configuration items.


The AI agent resolved it in minutes. This is the part of the AI-accelerated development story that the industry has not yet fully priced in: the compression is not happening in feature development alone. It is happening in the infrastructure layer that was previously the primary bottleneck to production readiness.

The programme-level implication is sharper than it might initially appear. If your organisation is treating Kubernetes migration as a multi-quarter platform programme requiring specialist hiring, and a principal architect in a competing organisation can produce a validated 145-resource chart in 15 minutes, the competitive gap is not only in feature velocity. It is in infrastructure maturity. Organisations that have internalised AI-assisted development at the infrastructure layer are arriving at production-ready deployment configurations in the time it previously took to write the design document for one. The distance between working software and production software has not disappeared. It has shortened to the point where deferring it is a choice, not a constraint imposed by capability.

 

The secrets layer remains an outstanding item. The development values files contain the same plaintext credentials used in Docker Compose, which is acceptable for a private development repository. A production deployment requires every credential reference replaced with either a --set flag at deploy time or an External Secrets Operator integration pulling from AWS Secrets Manager or HashiCorp Vault. The production values file is structured to make this transition explicit: every password field is set to an empty string with a # REQUIRED: override via --set comment. The shape of the secrets surface is defined; the management mechanism is deferred.
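The shape described above, as a fragment (auth.password is the Bitnami PostgreSQL chart's documented credential key; the policy-db alias comes from the dependency list earlier in the post):

```yaml
# values-prod.yaml (excerpt)
policy-db:
  auth:
    password: ""   # REQUIRED: override via --set
```

At deploy time the override is supplied out of band, for example helm upgrade underwriteai ./underwriteai -f values-prod.yaml --set policy-db.auth.password="$POLICY_DB_PASSWORD", which keeps the credential out of the repository entirely.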

The prerequisite for validating this in a real cluster is a local Kubernetes environment. Docker Desktop includes one (Settings → Kubernetes → Enable Kubernetes). That is sufficient for development and cluster-level validation before deploying to a managed service such as EKS, AKS, or GKE.

 


If You Don’t Have a Regulator, Consider Becoming Your Own


The broader point I want to leave with technology leaders is this.

AI-assisted development has materially shortened the distance between intent and working software. That is real, and the implications for enterprise programme economics are significant, as I argued in the previous post.

But the distance between working software and production software has not shortened by the same factor. Infrastructure architecture, operational resilience design, secrets management, and the regulatory capability layer that sits on top of all of it are still substantial engineering work. AI tooling helps with the mechanical parts. The design judgements are still human.

The risk for organisations that have adopted AI-assisted development without yet internalising this distinction is that they are delivering working software faster than their infrastructure capability can absorb. Demos improve. Release pipelines, operational resilience frameworks, and audit-ready deployment configurations do not automatically improve alongside them.

Product-led modernisation, the position I argued for in the previous post, does not mean "ship features and work out production later." It means the path to production should be short and known from the beginning. Feature velocity and infrastructure maturity need to advance together, or the gap between what you can demonstrate and what you can actually operate at scale will quietly widen.

I’m closing that gap on my own platform now. It is, predictably, the hardest part of the project.



Link to my previous post 👉 "The Enterprise Modernisation Playbook Is Broken. I Know Because I Helped Write It ... " 



I’m Tyrell Perera, an Enterprise Solutions Architect and Fractional CTO with 20+ years of experience leading digital transformation in Insurance, Telecommunications, Energy, Retail, and Media across Australia. The gap between working software and production software is the one I see most consistently underestimated in enterprise modernisation programmes, regardless of how well the application development has gone. If you’re leading a programme where the working software story is strong and the production readiness story is not yet written, that is the specific conversation I’m set up for. Find me at tyrell.co or on GitHub.

 

Saturday, April 11, 2026

The Enterprise Modernisation Playbook Is Broken. I Know Because I Helped Write It ...


After two decades inside large-scale transformation programmes, I stopped waiting for the right conditions. I built the proof myself. On weekends.
 

I've spent 20+ years as an Enterprise Solutions Architect, among various other technology leadership roles, inside large-scale technology transformation programmes. Insurance. Telecommunications. Energy. Retail. Media. Different industries, different technology stacks, different executive sponsors. The same programme structure, year after year.

Discovery phase. Architecture blueprints. Governance frameworks. Vendor selections. Roadmaps that stretch eighteen months before a single line of production code is written. And somewhere in month fourteen, when the business context has shifted and the original assumptions are quietly no longer true, a measured renegotiation of scope. The "minimum viable" quietly becomes the "maximum achievable."

I've built a career navigating this model. I'm not writing this from the outside. I've led engineering organisations of 90 people inside that model, managing platforms supporting hundreds of millions in annual revenue at one of Australia's largest telecommunications companies. I'm not dismissing it wholesale. For some problems, it's still the right approach. But something has changed in the last eighteen months that makes the old playbook genuinely obsolete for a significant class of enterprise modernisation challenges.

I was frustrated enough to prove it on my own time.

 


What the Old Playbook Assumes 

Large-scale transformation programmes are built on a set of assumptions that were reasonable when they were formed.

Assumption 1: Building software is expensive and slow. Therefore, front-load the planning. Get the architecture right before committing to implementation. The cost of changing direction mid-programme is prohibitive.

Assumption 2: Complexity requires specialisation. Regulated domains like insurance, banking, and healthcare require deep domain expertise, and that expertise takes time to co-ordinate across teams. Move carefully.

Assumption 3: Working software is a late-stage deliverable. The artefacts of early phases are documents: requirements, designs, blueprints. Stakeholders validate against slides and wireframes. Working software comes at the end, when you integrate and test.

These assumptions shaped programme structures, governance models, vendor relationships, and, critically, the way executives are asked to think about technology investment.

Every one of these assumptions is now wrong.

 


What Changed: AI Collapsed the Distance Between Intent and Working Software

 
 

I want to be precise here, because this point is usually made too broadly.

I'm not saying "AI speeds up development." That framing undersells the structural change. What has actually happened is that the distance between a clear statement of intent and working, tested, production-grade software has collapsed to a degree that invalidates the planning-heavy programme model entirely.

To test this hypothesis properly, I chose the hardest domain I could think of: Australian insurance. Regulatory obligations under APRA, the Privacy Act 1988, and the Insurance Contracts Act 1984. Multi-service architecture requirements. Real-time event streaming, audit trail integrity, compliance reporting. If you want a genuinely complex proving ground, insurance qualifies.

I started building outside of work hours. No team. No budget. No programme governance structure.

Over 41 working sessions, using GitHub Copilot powered by Claude Sonnet 4.6, I built UnderwriteAI: a working reference system for Australian insurance. Production-grade, eight microservices, compliance-complete. Policy management. Customer onboarding with Privacy Act consent capture. A rating engine covering five insurance products. Claims workflow from lodgement through settlement. APRA regulatory reporting. Kafka event streaming across six topics. A React portal. Kong API gateway. Keycloak authentication. 156 automated BDD test scenarios covering Australian compliance requirements.

The architecture is not a prototype. The compliance is not simulated. The test coverage is not aspirational.

And I built the whole thing in my spare time (Hence the 41 sessions. The way I preserved Agent context and memory between those sessions deserves another dedicated blog post 😉). 

An equivalent programme scoped through a traditional delivery model, with vendor selection, requirements workshops, architecture review boards, and staged releases, would conservatively carry an 18 to 24 month timeline and a seven-figure budget before a line of production code shipped. This took 41 sessions.

 


The Moment That Clarified Everything

There are actually two demonstrations from this project, and the progression between them is the point.

The first is a 16-chapter walk-through of the complete insurance lifecycle: customer creation, premium rating, policy binding, claims lodgement, workflow progression, notifications, APRA reporting, renewals. A browser opens. Every screen is navigated. Every form is filled. Every button is clicked. It looks like a polished product demonstration performed by a skilled operator.

There is no human operator. The entire browser session is driven by a Playwright script authored by the same AI that built the platform. I provided the instruction to run it. That is the full extent of my involvement. The AI that wrote the code also wrote the tests, and the tests are the demo.

That realisation sat with me for a while. Then I took it one step further.


I wired GitHub Copilot into the live platform via the Model Context Protocol, a standard that allows AI agents to call real APIs directly as tools. In the second demonstration, there is no browser at all. No Playwright script. No human navigating screens. Just a VS Code chat window and natural language instructions.

In eleven tool calls, Copilot created a customer with Privacy Act consent captured, ran the premium rating engine for a comprehensive motor policy, bound the policy, lodged a claim for a not-at-fault rear collision, advanced the claim through the full regulatory workflow (acknowledge, investigate, assess, approve, settle) and pulled the immutable APRA audit trail.

Every step landed in the live database. Every Kafka event fired. Every notification dispatched. Every audit record written.

The progression across the two demos is not a technical curiosity. It is a directional signal. In the first demo, the AI uses the interface designed for humans because it can. In the second, it discards that interface entirely and operates the system directly. The browser, and by extension the entire human-facing layer, turns out to be optional infrastructure.

I've spent years explaining to executive stakeholders what possible looks like in a regulated domain. These two demonstrations are now the explanation.

Watch the full demo:



What This Means for Your Technology Organisation

 

I want to offer four genuinely consequential implications for CIOs and CTOs. Not the usual list of AI adoption recommendations.

 

1. Your planning horizon is your biggest risk.

If your modernisation programme is spending its first twelve months producing documents rather than working software, you are not managing risk. You are accumulating it. The business context that justified the programme will change. The technology landscape will change. The AI tools available to your engineering teams will change dramatically. Programmes that defer working software to the integration phase will arrive at that phase with outdated assumptions and no mechanism to detect it.

Product-led modernisation, defined simply as shipping working, tested, incrementally improving software from week one, is not an Agile methodology recommendation. It is a risk management position.

 

2. The regulated domain objection no longer holds.

The most common pushback I receive when discussing faster, more iterative approaches to enterprise transformation is: "Our domain is too complex. We have regulatory obligations. We can't move that quickly."

I built UnderwriteAI specifically to empirically test this objection. APRA compliance, dual-consent privacy obligations, statutory notice timelines, immutable audit trails: none of these prevented iterative delivery. Some of them were easier to implement correctly when tested continuously from the beginning rather than bolted on at the end. Compliance that is woven into every sprint cannot be descoped. Compliance that is scheduled for the "integration phase" routinely is.

 

3. AI is now simultaneously the builder and the operator of enterprise systems.

This is the implication that most organisations haven't fully absorbed.

The MCP demonstration is not a curiosity. It is a preview of enterprise architecture in which AI agents are first-class participants in business workflows. Not augmenting human activity. Executing it. The question for your technology organisation is not whether to prepare for this, but whether your current modernisation investments are producing the kind of clean, API-first, event-driven architecture that AI agents can actually operate.

Legacy systems with opaque integrations and inconsistent APIs are not just technically awkward. They are structurally incompatible with the direction enterprise computing is moving. Every year of deferred modernisation is a year of compounding incompatibility with the operational model that is already emerging.

 

4. You can start smaller than you think, and sooner than your governance model assumes.

The most common response I get when sharing this with technology leaders is: "That's compelling, but we can't restructure our whole programme around it." That is not what I'm suggesting.

Pick one bounded domain. A single workflow that is materially important but not mission-critical enough to paralyse decision-making. Set a 90-day deadline. Ship working software against it. Not a prototype, not a proof of concept: working software, with tests, running against real data.

What you learn in those 90 days about what AI can and cannot do in your specific environment, with your specific constraints, is worth more than the outputs of a six-month discovery phase. And you will have working software at the end of it, which means the next conversation with your board is grounded in evidence rather than projections.

 


The Question I'd Leave You With

Most modernisation programmes can show you a roadmap. Many can show you a milestone report. Very few can show you working software that solves the actual problem: real compliance, real test coverage, and a live demonstration you can put in front of a sceptical stakeholder today.

I built that in my spare time to prove a point about what is possible.

The question worth asking of your current transformation programme (or the one you are about to commission) is simple: what is the working software that proves this is on the right track? Not the wireframes, not the architecture diagrams, not the vendor's reference implementation. The working software, running against real data, that a sceptical stakeholder can interact with today.

If the answer is "we'll have that in the integration phase," the programme structure is carrying more risk than the governance papers are showing you.

 


I'm Tyrell Perera, an Enterprise Solutions Architect and Fractional CTO with 20+ years of experience in digital transformation across Insurance, Telecommunications, Energy, Retail, and Media in Australia. 

UnderwriteAI is a project I built entirely in my own time, outside of my day job. It is currently in a private repository while I work through what comes next, whether that is open-sourcing it, building a product around it, or using it as a foundation for advisory engagements. If you're navigating modernisation decisions for your organisation and want to explore what this model looks like in your context, I'd welcome the conversation.

Find me at tyrell.co or on GitHub.

 

Wednesday, March 18, 2026

NVIDIA's Inferencing Chip Launch: Market Validation of the Enterprise AI Strategy I Predicted in January

Seven weeks ago, I published a blog post arguing that enterprises should focus on AI inferencing rather than training, based on a casual lunch conversation with fellow architects. Today, NVIDIA's announcement of its new chip specifically designed for AI inferencing workloads provides compelling market validation of that thesis.

This isn't just another hardware launch. It's a definitive signal that the AI infrastructure market is bifurcating exactly as I predicted, and enterprises that recognised the shift early are now well positioned for the next phase of AI adoption.

 

What NVIDIA's Move Tells Us About Market Reality

When one of the world's most influential AI infrastructure companies invests in developing dedicated silicon for inferencing, it confirms several critical market dynamics that I outlined in my original analysis:

Enterprise Inferencing Demand Has Reached Scale

NVIDIA doesn't develop new chips on speculation. This launch indicates that enterprise demand for optimised inferencing performance has reached sufficient scale to justify the massive R&D investment required for new silicon development.

In January, I wrote:

"For most enterprise IT departments, the strategic focus should be on inferencing and model consumption rather than large scale model training."

The market has spoken, and enterprises globally are clearly following this path, creating enough demand to drive hardware innovation.

Performance Optimisation is Now a Competitive Differentiator

Real-time inferencing performance has evolved from a technical requirement into a competitive advantage. Organisations that can serve AI predictions faster, more reliably, and at lower cost will outperform those still grappling with infrastructure basics.

This aligns perfectly with my January prediction about where enterprise value creation occurs:

"Enterprise Value Creation: Data preparation and feature engineering, Business process integration and workflow automation, User experience and interface design, Governance, compliance, and risk management, Model monitoring and performance optimisation"

Infrastructure Specialisation is Accelerating

The development of inferencing-specific hardware confirms that the "one size fits all" era of AI infrastructure is over. Training and inferencing require fundamentally different optimisations, and the market is now mature enough to support that specialisation.

 

Why This Validates My Original Enterprise AI Framework

In my January post, I argued that enterprises should focus on four key areas rather than attempting to compete with Big Tech on model training:

✅ Model Consumption: Leverage existing foundation models through APIs
✅ Fine-Tuning Excellence: Customise models for domain-specific applications
✅ Inferencing Infrastructure: Invest in robust, scalable serving capabilities
✅ Governance and Compliance: Build frameworks for responsible AI deployment

NVIDIA's inferencing chip directly supports points 2, 3, and 4 by providing:

  • Enhanced fine-tuning capabilities through optimised inference performance
  • Superior inferencing infrastructure with dedicated silicon
  • Better governance support through consistent, auditable performance metrics
 

What This Means for Enterprise Strategy Moving Forward

The Infrastructure Investment Decision is Clearer

Seven weeks ago, some enterprises were still debating whether to invest heavily in training infrastructure or focus on inferencing capabilities. NVIDIA's move settles this debate definitively for most organisations.

The message is clear: invest in inferencing infrastructure excellence, not training infrastructure competition.

Early Adopters Have a Significant Advantage

Organisations that began focusing on inferencing capabilities, governance frameworks, and operational excellence in late 2025 and early 2026 are now positioned to leverage this next wave of specialised infrastructure immediately.

Those still allocating significant resources to training infrastructure may find themselves at a disadvantage as the market continues to specialise.

Cost Efficiency Becomes Strategic

With dedicated inferencing hardware available, the enterprises that master cost-efficient model serving will hold a substantial competitive advantage. This reinforces my January emphasis on "Inferencing Cost Optimisation" as a critical enterprise capability.
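The cost-efficiency argument is ultimately arithmetic. A minimal sketch, with every price and throughput figure invented for illustration: dedicated inferencing silicon only pays off when its throughput gain outpaces its price premium.

```python
# Illustrative only: all prices and throughput figures below are made up
# to show the shape of the calculation, not real vendor numbers.

def cost_per_million_tokens(hourly_cost: float, tokens_per_second: float) -> float:
    """Serving cost per 1M output tokens at full utilisation."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000

# A pricier inferencing accelerator can still be the cheaper option per token
# if it serves enough more throughput per hour.
general_gpu = cost_per_million_tokens(hourly_cost=4.00, tokens_per_second=800)
inference_chip = cost_per_million_tokens(hourly_cost=5.00, tokens_per_second=2000)
```

Under these made-up figures the dedicated chip roughly halves cost per token despite a 25% higher hourly rate — which is why cost efficiency, not raw price, is the strategic variable.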

 

Looking Forward: The Enterprise AI Maturity Model

Based on this market validation, I'm seeing a clear enterprise AI maturity progression:

Stage 1: Experimentation (2023-2024)

  • Proof of concept projects
  • Basic API consumption
  • Limited governance

Stage 2: Strategic Focus (2025-2026)

  • Choose between training and inferencing investment
  • Develop governance frameworks
  • Build operational capabilities

Stage 3: Infrastructure Excellence (2026-2027) ← We are here

  • Optimised inferencing infrastructure
  • Advanced governance and compliance
  • Competitive differentiation through AI performance

Stage 4: Business Integration (2027+)

  • AI-native business processes
  • Real-time decision systems
  • Continuous optimisation and evolution
 

Key Implications for Solutions Architects

Infrastructure Planning

  • Immediate: Evaluate current inferencing infrastructure against new performance benchmarks
  • Short term: Develop business cases for inferencing-specific hardware investments
  • Medium term: Design architectures that can leverage specialised inferencing capabilities
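Evaluating infrastructure against performance benchmarks starts with measuring latency percentiles, not averages. A hedged sketch, where `run_inference` is a stand-in for whichever serving endpoint you are actually benchmarking:

```python
# Sketch: measuring inference latency percentiles (p50/p95/p99) for an SLO check.
# `run_inference` is a hypothetical stand-in for a real model-serving call.
import random
import statistics
import time

def run_inference(payload: str) -> str:
    time.sleep(random.uniform(0.001, 0.005))  # simulate variable model latency
    return payload.upper()

def latency_percentiles(n: int = 200) -> dict:
    """Probe the endpoint n times and return latency percentiles in ms."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        run_inference("probe")
        samples.append((time.perf_counter() - t0) * 1000)
    q = statistics.quantiles(samples, n=100)  # 99 cut points
    return {"p50": q[49], "p95": q[94], "p99": q[98]}
```

Comparing the p95 figure against a target SLO, before and after a hardware change, is what turns "evaluate against new performance benchmarks" into a decision.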

Investment Priorities

  • Deprioritise: Large-scale training infrastructure investments
  • Maintain: API consumption and model evaluation capabilities
  • Accelerate: Inferencing optimisation, monitoring, and governance frameworks

Skills Development

  • Critical: Inferencing performance tuning and optimisation
  • Important: Multi-model orchestration and management
  • Essential: AI governance and compliance frameworks
 

The Broader Industry Implications

NVIDIA's inferencing chip launch signals several broader trends that will reshape the enterprise AI landscape:

Hardware Ecosystem Maturation

We can expect other hardware vendors to follow with their own inferencing-optimised solutions, creating a competitive market that will drive further innovation and cost reduction.

Software Stack Specialisation

Infrastructure software will increasingly optimise for inferencing-specific workloads, creating more sophisticated orchestration, monitoring, and management capabilities.

Service Provider Evolution

Cloud providers and managed service vendors will develop inferencing-specific offerings, making advanced capabilities accessible to smaller organisations.

 

Vindication and Forward Momentum

The NVIDIA announcement validates the strategic framework I proposed in January, but more importantly, it provides clear direction for enterprise AI investments moving forward.

The key insight remains unchanged: enterprises should focus their resources on becoming excellent at AI consumption, integration, and governance rather than attempting to compete with Big Tech on foundational infrastructure.

What's new: The market has now provided dedicated hardware to support this strategy, making the performance and cost benefits even more compelling.

The next challenge: Organisations must move quickly to capitalise on this infrastructure evolution. Those that continue to debate strategy while others implement inferencing excellence will find themselves increasingly disadvantaged.

For solutions architects and enterprise IT leaders, the path forward is clear. The question isn't whether to invest in inferencing capabilities, but how quickly and effectively you can build them.

The future belongs to organisations that excel at leveraging AI capabilities, not those trying to recreate them.

 


This post builds on my January analysis: "AI Training vs Inferencing: An Enterprise Solutions Architect's Guide to Building Secure, Compliant AI Systems". What trends are you seeing in your organisation's AI infrastructure decisions? I'd love to hear about your experiences in the comments.