Azure Reference Architecture
Humanitec
[Architecture diagram: Developer Control Plane (Backstage developer portal, service/API catalog; GitHub for version control; IaC specs with Terraform; workload specs with Score); Integration & Delivery Plane (CI pipeline with GitHub Actions, CD pipeline with FluxCD, container registry, Platform Orchestrator); Resource Plane (compute on Azure Kubernetes Service, data on Azure SQL, networking with Azure DNS, services with Azure Service Bus); Observability (Azure Monitor); Secrets & Identity Management (Azure Key Vault)]
Introduction
Organizations need to be agile and innovative to stay competitive in today's software
development era, which has led to changes in how applications are built, deployed, and managed.
This necessitates the transformation of static CI/CD setups into modern Internal Developer
Platforms (IDPs) that provide developers with the tools needed to innovate and move quickly.
43% of DevOps professionals recognize this and, as a result, have built an IDP to improve
developer experience (DevEx) and enable developer self-service.
As an industry, we need to move beyond buzzwords and provide real-life examples of modern
IDPs. A blog post by McKinsey (soon to be released) makes a major contribution to this.
While every platform looks different, certain common patterns emerge. To help simplify things,
McKinsey consolidated the platform designs of hundreds of setups into standard patterns based
on real-world experiences, which have been proven to work effectively. By adopting these
patterns, organizations can create IDPs that keep them ahead of the competition and deliver
innovative applications faster than ever before.
This whitepaper is inspired by McKinsey’s blog post and provides an overview of one reference
architecture for a dynamic IDP using Azure Cloud, GitHub Actions, Backstage, Humanitec, Flux
CD, Terraform, and several other technologies. Note that this design uses the most common
combinations of these technologies, but it does not prescribe them; each of them is
interchangeable with alternatives.
[Architecture diagram: Developer Portal; Integration & Delivery Plane (GitHub Actions, FluxCD, container registry, Platform Orchestrator); Resource Plane (compute on Azure Kubernetes Service, data on Azure SQL, networking, services)]
There will be a replicable open-source version of this architecture soon, so you can set this up
yourself. In the meantime, please get in touch with a Humanitec solution architect
(info@humanitec.com), who will be happy to provide a test setup.
Table of contents
Problems this IDP design aims to solve
Design principles
Architectural components
Security Plane
The end-to-end architecture result
How platform engineers or Ops teams operate, build, and maintain a platform
How developers use such a platform
Golden path 4: A platform engineer updates the dev Postgres resource to the latest Postgres version
Conclusion
Appendix
Environment management
Deployment Management
Observability
Administration
Cost management
Integration
Problems this IDP design aims to solve
When developers spend their most productive hours dealing with tedious
infrastructure and configuration management tooling, creativity and velocity suffer.
A burned-out engineer does not write code as quickly or efficiently; inspiration
often dies when too many steps are needed to test and deploy an idea. On average,
developers with inefficient platform setups have longer waiting times as they are
stuck in a loop while other teams manually resolve things for them. Mediocre setups
also make onboarding difficult.
Ops teams and developers often find themselves waiting for each other to complete
tasks which can result in delays, frustration, and decreased productivity. Modern
IDPs can help solve this problem by providing code-, UI-, CLI-, or API-based
interfaces that allow developers to quickly and easily provision resources without
having to wait for Ops to do it for them.
In current static CI/CD setups, there are often dozens of ways to reach materially the
same goal, such as spinning up a new Postgres database, deploying to production, or
describing the state of a cluster. These scripts often vary only
slightly and sometimes just by environment. Their sheer number and unstructured
nature make them hard to maintain. Good platform design reduces the number of
variances through Dynamic Configuration Management (DCM) by up to 95%.
Design principles
According to McKinsey, there are eight proven design principles:
engineers must define how to vend resources and configuration. This ensures
every resource is built securely, compliant, and well-architected.
05 Implement Dynamic Configuration Management.
07 Keep code as the single source of truth. This ensures everyone is working
from the same version, reducing the risk of errors.
Architectural components
According to McKinsey's blog post, “plane levels” are different areas of the platform
architecture that cluster certain functionalities. Let’s zoom in on the plane levels we have to
take care of and see what technologies fulfil each function in all of those levels.
Terraform - an Infrastructure as Code (IaC) tool that uses HCL to describe the state of
infrastructure resources in a declarative way.
Backstage - a developer portal/service catalog to provide an interface to consolidate
documentation, structure service templates, and catalog existing services.
Security Plane
On the Security Plane, we’re managing secrets and identity to protect sensitive data. We’re
storing, managing, and securely retrieving API keys and passwords.
Observability
Monitoring & Logging Plane: Azure Monitor
[Architecture diagram: Developer Portal; Integration & Delivery Plane (GitHub Actions, FluxCD, container registry, Platform Orchestrator); Resource Plane (compute on Azure Kubernetes Service, data on Azure SQL, networking, services)]
The developer portal component adds an aggregator of information pulling data from the
Observability Plane, Resource Plane, Platform Orchestrator, CI pipeline and VCS. If used for
service creation and as a templating engine, the portal might call the templating API of
the VCS.
The end-to-end architecture result
The Continuous Integration (CI) pipeline is triggered by the git push from the terminal, which
pushes the latest changes to the Version Control System (VCS) and indicates the branch. The CI
pipeline will store its image in the registry and, at the last step of the build pipeline,
inform the Platform Orchestrator that a new image is available. It will also send the metadata
and eventual "orders" of new resources from the workload specification (we'll get to this in
more detail later). The Orchestrator will hand over the deployment-ready
app and infrastructure configs to the Continuous Delivery (CD) part. Note that the Orchestrator
is also taking over the CD functionality in our example, which doesn’t necessarily have to be the
case. The CD pipeline will go ahead and update the Resource Plane.
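As a sketch of this flow, assuming a GitHub Actions setup, a minimal CI workflow could build the image, push it to the registry, and then notify the Orchestrator. The registry name, image tag scheme, and the Orchestrator notification endpoint below are assumptions for illustration, not a fixed contract:

```yaml
# .github/workflows/ci.yaml -- illustrative sketch only
name: ci
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build the container image and push it to the registry
      # (registry login is omitted for brevity)
      - run: |
          docker build -t myregistry.azurecr.io/python-service:${{ github.sha }} .
          docker push myregistry.azurecr.io/python-service:${{ github.sha }}
      # Inform the Platform Orchestrator that a new image is available
      # (URL and payload are hypothetical placeholders)
      - run: |
          curl -X POST "$ORCHESTRATOR_URL/images" \
            -H "Authorization: Bearer ${{ secrets.ORCHESTRATOR_TOKEN }}" \
            -d '{"image": "myregistry.azurecr.io/python-service:${{ github.sha }}"}'
```

In a real setup, the notification step is typically handled by a purpose-built pipeline step or CLI provided by the Orchestrator rather than a raw HTTP call.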
All parts of this plane can output workflow performance data and other metrics to the
Developer Control Plane, such as CI build time, DORA metrics etc. The Orchestrator can
register new services and their dependent resources to the portal layer.
An important integration point for the Monitoring and Logging Plane also happens on the
Integration and Delivery Plane, where the Platform Orchestrator ensures the necessary
sidecars and agents are launched and running next to the cluster.
In the vast majority of cases teams already have an existing setup when building their IDP, so
it’s often about remodeling to ensure their setup matches the design principles. Here’s how:
01 Design the individual planes. Start with the Resource Plane because it
dictates other design decisions on other layers. We usually propose the
following order:
a. Resource Plane (you probably already have resources; in this case, decide
which ones are supported by your platform as a default).
b. Integration and Delivery Plane: pipeline design, configs of the Orchestrator, etc.
c. Security Plane.
d. Monitoring and Logging Plane.
e. Developer Control Plane: this heavily depends on the design choices of the other
planes and should always come last, after thorough testing by developers.
How platform engineers or Ops teams operate, build, and maintain a platform
02 Wire the individual components of the planes to each other as well as one
plane to the other and test the raw end-to-end flows.
03 Set baseline configs for app and infrastructure configs (more details below).
We’ve covered the planes and their design; let’s next zoom in on the baseline configs
and automations.
Before the platform is ready, the platform engineering team still needs to set a number of
defaults and baseline configs. The entire idea of the presented reference architecture is to
enable developer self-service, lower cognitive load, drive standardization, and reduce ticket
ops. This requires the use of Dynamic Configuration Management (DCM), which in turn requires
a Platform Orchestrator that functions as a rules engine. It matches the request from the
developers with the config defaults provided by the platform team.
This means the next “job to be done” for the platform engineering team is to set those app and
infrastructure config defaults.
For example, a baseline Resource Definition for a dev Postgres database could look like this (all values are placeholders):

id          = "db-dev"
name        = "db-dev"
type        = "postgres"
driver_type = "humanitec/postgres-cloudsql-static"
driver_inputs = {
  values = {
    "instance" = "test:test:test"
    "name"     = "db-dev"
    "host"     = "127.0.0.1"
    "port"     = "5432"
  }
  secrets = {
    "username" = "test"
    "password" = "test"
  }
}
criteria = [
  {
    app_id = "test-app"
  }
]
Now all our planes are in place, and we can technically already get from code to running
application. Fundamentally we would now be able to use our platform. However, as a developer,
you may not want to have to trigger the different elements of the process. For instance, if we
do a git-push, we expect the platform to deploy all the way through automatically. In more
advanced setups, we might even want to enable automated progression from one environment
to the other, theoretically to production or at least pre-production.
The RBAC of the Version Control System (VCS) and the Platform Orchestrator allows your
organization to define the right levels and control for your developers across roles. How these
RBACs are set up depends on the security posture of your organization. Are developers
allowed to deploy to production? Who's permitted to change the baseline templates?
Holding true to our design principle of leaving the interface choice open and opting for code first,
the answer is: It depends! The proposed architecture leaves that choice to a workload-by-
workload basis.
The primary interaction method (by far the most used) is the code-based one. Developers
prefer to stay in their usual workflow, in the version control system (VCS), and within their
integrated development environment in order to “indicate” what their workloads require, spin
up new services, add resources etc. This is where a workload specification like Score comes
into play. It provides a code-based “specification” to describe how the workload relates to
other workloads and their dependent resources. Adding a Resource Definition to the Score file
will tell the Orchestrator to automatically create a new resource or wire an existing one. We'll
look at an example:
score.yaml

apiVersion: score.dev/v1b1
metadata:
  name: python-service
containers:
  python-service:
    image: python
    variables:
      CONNECTION_STRING: postgresql://${resources.db.user}:${resources.db.password}@${resources.db.host}:${resources.db.port}/${resources.db.name}
resources:
  db:
    type: postgres
  storage:
    type: s3
  dns:
    type: dns
We can see that the developer requires a database type Postgres, a storage of type S3 and a
DNS of type DNS. For the vast majority of use cases, this code-based format should be entirely
sufficient. For specific situations (like running diffs, rolling back, spinning up new environments), they
might prefer to use the Orchestrator UI, CLI or API.
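To illustrate the code-based flow: adding one more dependency to the Score file above is all a developer needs to do; on the next git push, the Orchestrator matches it against the platform team's Resource Definitions and provisions or wires it. The `redis` resource type below is an assumed example, not something defined earlier in this document:

```yaml
# score.yaml (resources section) -- adding a hypothetical cache dependency
resources:
  db:
    type: postgres
  storage:
    type: s3
  dns:
    type: dns
  cache:
    type: redis   # new dependency: provisioned or wired automatically
```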
Portals and Service Catalogs are primarily used for consolidation and for product managers/
engineering managers, as well as for onboarding and orienting new developers.
How developers use such a platform
Here’s a list of some activities a developer performs using an Internal Developer Platform and
what interface they usually choose:
Deploy: Terminal/IDE
Now that we understand the integration points for the different planes, let's look at how a
developer deployment request would flow through the platform step by step. We call these
standard flows "golden paths". Below are four (of many) standard flow examples the IDP supports.
Let’s start with a very simple example. A developer has changed something on their workload
and now deploys to a dev environment. As we discussed earlier, the primary interaction
method for devs would be code. They would git-push their change, and the CI pipeline would
pick it up and run. It would then push the built image to the image registry. At this point we
have the service built, but we don’t have the configs yet (remember, we’re opting for Dynamic
Configuration Management).
The workload source code contains the workload specification (Score), which in this case looks like this:
score.yaml

apiVersion: score.dev/v1b1
metadata:
  name: python-service
containers:
  python-service:
    image: python
    variables:
      CONNECTION_STRING: postgresql://${resources.db.user}:${resources.db.password}@${resources.db.host}:${resources.db.port}/${resources.db.name}
resources:
  db:
    type: postgres
  storage:
    type: s3
  dns:
    type: dns
We can see that the developer requires a database of type Postgres, a storage of type S3, and a
DNS of type DNS.
Zooming in on a "golden path" to understand the interplay of all components
So after the CI build has finished, the Platform Orchestrator recognizes the context and looks
up which resources are matched against this context (in our case, perhaps the CI tag
"environment = development"). It checks whether the resources are already created (which is
likely in this case because it’s just a deployment to an existing dev environment) and reaches
out to the Azure API to retrieve the resource credentials. It then creates the application configs
in the form of manifests because our target compute in this architecture is Azure Kubernetes
Service (AKS). Once this is done, the Orchestrator deploys the configs and injects the secrets
at runtime into the container (utilizing Vault).
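As an illustrative sketch of what such generated configs could resemble, here is a minimal Deployment manifest for the workload above. The image tag, secret name, and namespace are hypothetical placeholders, not actual Orchestrator output:

```yaml
# Sketch of a generated Kubernetes Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-service
  namespace: development
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-service
  template:
    metadata:
      labels:
        app: python-service
    spec:
      containers:
        - name: python-service
          image: myregistry.azurecr.io/python-service:latest
          env:
            # the resolved connection string is injected as a secret at runtime
            - name: CONNECTION_STRING
              valueFrom:
                secretKeyRef:
                  name: python-service-secrets
                  key: CONNECTION_STRING
```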
[Diagram: the developer's request, the score.yaml shown above, and the context (env = development) flow into the Platform Orchestrator]
id          = "db-dev"
name        = "db-dev"
type        = "postgres"
driver_type = "humanitec/postgres-cloudsql-static"
driver_inputs = {
  values = {
    "instance" = "test:test:test"
    "name"     = "db-dev"
    "host"     = "127.0.0.1"
    "port"     = "5432"
  }
  secrets = {
    "username" = "test"
    "password" = "test"
  }
}
criteria = [
  {
    app_id = "test-app"
  }
]
03 We then need to find out which workloads currently depend on our Resource
Definition of "dev Postgres". The answer can be found in our "rules engine", the
Platform Orchestrator, simply because this is where the decision is made regarding
which resources to use to wire the workload up, and in what context. We can find
this by querying the Orchestrator API or looking at the "Usage" section of the
Resource Definition in the user interface.
Another benefit of IDPs is streamlined config management, which reduces cognitive load.
Developers can focus on writing code instead of worrying about infrastructure, which can be
a complex and time-consuming task. With an IDP, developers can simply select the resources
they need and configure them as required, freeing up more time for coding.
IDPs also offer new superpowers that can boost productivity. For example, developers
can use Score as a workload spec, which lets them declare their workload's configuration
and resource dependencies in one place. They can also spin up PR environments, which can be
used to test and debug code changes before merging them into the main codebase.
Furthermore, the diff functionality for debugging allows developers to quickly identify and
fix issues, while secure infrastructure self-service ensures that the entire development
process remains secure.
Benefits of this Architecture
In short, IDPs can have a significant impact on the productivity and efficiency of
application developers. By reducing dependencies and waiting times, streamlining config
management, and offering new superpowers, developers can focus on delivering
high-quality applications.
In addition to the automation benefits, IDPs also enable developer self-service, which reduces
waiting times and skyrockets productivity. This allows for faster innovation cycles and
enables organizations to stay ahead of the competition. Moreover, dynamic IDPs require fewer
full-time Ops employees per every application developer, which helps organizations
streamline overall operations and reduce costs.
Another important benefit of IDPs is cost control. By reducing cloud bills and optimizing
resource allocation, organizations can invest saved money in other business areas. This is
especially important in today's highly competitive landscape, where every dollar counts.
Overall, IDPs have the potential to revolutionize the way organizations develop and deploy
software. By leveraging automation, developer self-service, and other advanced technologies,
IDPs can help organizations to stay ahead of the curve and achieve their goals more quickly
and efficiently than ever before.
Conclusion
In conclusion, adopting modern Internal Developer Platforms (IDPs) built on Azure Cloud, SQL
Server, Backstage, Humanitec, GitHub, Azure Pipelines, Terraform, and several other
technologies can help organizations improve their developer experience, increase productivity
and innovation, and reduce cognitive load for developers. By implementing this architecture,
organizations can also deliver applications faster and more efficiently. However, it is important
to remember that the implementation of an IDP varies widely by organization, and our
reference architecture is just a starting point for building an effective dynamic platform.
Next steps: Ready to build your dynamic Internal Developer Platform?
Appendix
Capabilities of this architecture
Dynamic Configuration Management
Use the environment-agnostic workload specification to describe infrastructure
dependencies once and for all environments
Multiple workloads can depend on the same resource (e.g. a shared database or
DNS name)
A full history of all workload configuration, environment specific values and secrets
can be retrieved
Environment Management
New environments can be created on demand by cloning existing environments
Deployment Management
Deployments in an environment can be rolled back to a previous deployment
Webhooks for key events (creation/deletion of environments, deployments etc.)
are available
Deployments can be triggered based on criteria from source control, such as tag
format or branch name
Pipelines including additional pre- and post-deployment steps can be defined (in beta)
Promotion of workloads between environments based on criteria such as tests passing
or manual approval can be automated
Observability
Can be used to standardize the integration of APM products
Container logs are surfaced without the user needing access to the cluster
Monitor environment health via workload, pod and container statuses. Errors are
displayed in real-time
Services can be cataloged, and metadata can be aggregated
Administration
Run a self-hosted instance (still managed by Humanitec, but running in your network)
Cost management
Resource limits
Pausing of environments
Integration
Long-lived API tokens can be issued to support integration with 3rd party systems
IP Whitelisting
Bastion hosts
VPN (IPSec)
Humanitec GmbH
Wöhlertstraße 12-13, 10115 Berlin, Germany
Phone: +49 30 6293-8516
Humanitec Inc
228 East 45th Street, Suite 9E,
Humanitec Ltd
3rd Floor, 1 Ashley Road
United Kingdom
E-mail: info@humanitec.com
Website: https://www.humanitec.com
Responsible for the content of humanitec.com ref. § 55 II RStV: Kaspar von Grünberg