Azure Landing Zone - Policy at Scale

architectraghu posted Dec 2, 2025 10 min read

Azure Policy in Real Landing Zones

Onboard multiple Azure subscriptions into a real landing zone using Azure Policy, management groups, and policy-as-code for enterprise governance.

Introduction

On paper, Azure Landing Zones and Azure Policy look clean and deterministic. In reality, you're onboarding dozens or hundreds of subscriptions with existing workloads, legacy configurations, and multiple teams that all think they're special.

This article focuses on using Azure Policy to onboard multiple subscriptions into a real, enterprise-grade landing zone:

How to structure management groups and policy initiatives for scalable governance
How to onboard brownfield and greenfield subscriptions safely
How to use policy as code and CI/CD to operate this at scale
How to balance security, reliability, and delivery velocity when policies start denying real traffic

The scope is Azure-first, but the patterns map naturally to AWS Organizations + SCPs and GCP Organization policies.

Core Concepts

Azure Landing Zone in Governance Terms

Forget marketing. In governance terms, an Azure Landing Zone is:

A management group hierarchy
A set of policy initiatives (built-in and custom)
A baseline of guardrails for networking, identity, security, and operations
A repeatable way to onboard subscriptions and apps into that baseline

Key Azure building blocks:

Management groups (MGs) – hierarchical containers for subscriptions; policies applied here flow down
Azure Policy definitions – rules for what is allowed, denied, audited, or automatically deployed
Policy initiatives – logical bundles of policy definitions (e.g., CAF "Azure Landing Zone" initiatives)
Assignments – binding a definition/initiative to a scope (management group, subscription, resource group)
Effects – Deny, Audit, DeployIfNotExists, Modify, Append, etc.
Remediation tasks – jobs created to apply DeployIfNotExists/Modify to existing resources
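To make the relationship between definitions, parameters, and effects more concrete, here is a minimal sketch in Python that builds the body of a custom "allowed locations" definition. The display name and the parameter names (allowedLocations, effect) are illustrative assumptions, not taken from the article; the overall shape (mode, parameters, policyRule with if/then) follows the documented Azure Policy definition structure.

```python
import json


def build_allowed_locations_definition(default_effect: str = "Audit") -> dict:
    """Return the 'properties' body of a custom policy definition.

    Parameter names are illustrative assumptions for this sketch.
    """
    return {
        "displayName": "Allowed locations (sketch)",
        "mode": "Indexed",
        "parameters": {
            "allowedLocations": {"type": "Array"},
            # Exposing the effect as a parameter lets one definition be
            # assigned as Audit in one scope and Deny in another.
            "effect": {
                "type": "String",
                "allowedValues": ["Audit", "Deny", "Disabled"],
                "defaultValue": default_effect,
            },
        },
        "policyRule": {
            "if": {
                "not": {
                    "field": "location",
                    "in": "[parameters('allowedLocations')]",
                }
            },
            "then": {"effect": "[parameters('effect')]"},
        },
    }


definition = build_allowed_locations_definition()
print(json.dumps(definition, indent=2))
```

Parameterizing the effect is what makes the Audit-first rollout discussed later practical: the definition stays the same and only the assignment parameters change.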

Management Group Hierarchy for Real Landing Zones

A pragmatic, enterprise-friendly hierarchy often looks like:

Tenant Root ( / )
 ├─ Platform
 │   ├─ Identity
 │   ├─ Management
 │   └─ Connectivity
 ├─ LandingZones
 │   ├─ Corp
 │   │   ├─ Corp-Prod
 │   │   └─ Corp-NonProd
 │   └─ Online
 │       ├─ Online-Prod
 │       └─ Online-NonProd
 └─ Sandbox

Patterns:

Platform MGs host shared services: identity, networking, management
LandingZones MGs host application subscriptions grouped by business domain / criticality
Sandbox MG for low-governance experimental spaces

Policy is usually bound at Platform, LandingZones, and specific child MGs such as Corp-Prod.
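As a rough illustration of how assignments flow down this hierarchy, the sketch below walks from a management group up to the tenant root and collects every initiative assigned along the way. The hierarchy mirrors part of the tree above; the assignment names (including root-logging) are hypothetical.

```python
# Parent of each management group in the example hierarchy (None = top).
PARENTS = {
    "Tenant Root": None,
    "Platform": "Tenant Root",
    "LandingZones": "Tenant Root",
    "Sandbox": "Tenant Root",
    "Corp": "LandingZones",
    "Online": "LandingZones",
    "Corp-Prod": "Corp",
    "Corp-NonProd": "Corp",
}

# Hypothetical direct assignments per management group.
ASSIGNMENTS = {
    "Tenant Root": ["root-logging"],
    "LandingZones": ["alz-core", "alz-security"],
    "Corp-Prod": ["alz-networking"],
}


def effective_initiatives(mg: str) -> list[str]:
    """Collect initiatives assigned at `mg` and every ancestor, root first."""
    chain = []
    current = mg
    while current is not None:
        chain.append(current)
        current = PARENTS[current]
    result = []
    for scope in reversed(chain):
        result.extend(ASSIGNMENTS.get(scope, []))
    return result


print(effective_initiatives("Corp-Prod"))
```

A subscription placed in Corp-Prod picks up all four initiatives without any subscription-level assignment, while Sandbox only inherits what is bound at the root.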

Policy as Code and DevSecOps

Azure Policy without automation becomes unmaintainable once you manage more than a handful of subscriptions.

Treat Azure Policy as code:

Definitions and initiatives stored as JSON in Git (or Bicep/ARM modules, or Terraform)
Pipelines (Azure DevOps or GitHub Actions) that:
- Validate policies (linting, schema checks)
- Publish definitions to the management group scope
- Assign initiatives with parameter values per environment (dev/test/prod)
PR-based change control for all policy changes
Security and platform teams collaborate on policy sets and exemptions through Git, not the portal

High-level mapping of key services:

DevOps tooling: Azure DevOps Pipelines / GitHub Actions
Governance: Azure Policy, Management Groups, Azure Blueprints (phasing out), CAF landing zone initiatives
Security: Defender for Cloud, Key Vault, Managed Identities, Entra ID (Azure AD)
Observability: Azure Monitor, Log Analytics, Application Insights

Step-by-Step Guide

1. Prerequisites and Roles

Prerequisites:

Tenant-level setup
- Management group hierarchy defined and approved
- Tenant Root Group locked down to a small platform/governance team
Access
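- Platform team with Owner, or Contributor + Resource Policy Contributor, at key MGs (granting roles to remediation identities additionally requires Owner or User Access Administrator)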
- Security team with Security Admin + rights to define/approve policies
Tools
- Git repository for policy-as-code
- CI/CD pipelines (Azure DevOps or GitHub Actions)
- Landing zone reference (CAF-aligned) selected and tailored

Team ownership:

Platform team: management groups, policy initiatives, CI/CD
Security team: security controls, approvals, exceptions
App teams: subscription ownership, remediation work, exception requests

2. Design the Management Group Strategy

Define environments
- Separate Prod and NonProd at MG level where possible (Corp-Prod, Corp-NonProd)
- Critical workloads can have their own branch (Payments-Prod MG)
Align policies with MG scopes
- Tenant root: only minimal guardrails (e.g., logging, naming, maybe region restrictions)
- Platform MG: policies for shared services (diagnostics, DDoS plan requirements, etc.)
- LandingZones MG: baseline security, networking, and operational standards for all app subscriptions

Key rule: Do not assign policies directly to individual subscriptions unless there is a strong, documented reason. Use management groups as the default scope.
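One way to keep this rule honest is a small check over an exported list of assignments. The sketch below assumes each record carries a name and a scope string, and that documented exceptions are tracked by assignment name; both are assumptions of this example.

```python
import re

# Matches scopes that are exactly a management group.
MG_SCOPE = re.compile(
    r"^/providers/Microsoft\.Management/managementGroups/[^/]+$", re.IGNORECASE
)


def subscription_level_assignments(assignments, documented_exceptions=frozenset()):
    """Return names of assignments scoped below a management group
    that are not covered by a documented exception."""
    return sorted(
        a["name"]
        for a in assignments
        if not MG_SCOPE.match(a["scope"]) and a["name"] not in documented_exceptions
    )


sample = [
    {"name": "alz-core-corp-prod",
     "scope": "/providers/Microsoft.Management/managementGroups/corp-prod"},
    {"name": "one-off",
     "scope": "/subscriptions/00000000-0000-0000-0000-000000000000"},
]
print(subscription_level_assignments(sample))
```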

3. Define and Version Your Policy Library

Use a structure like:

policies/
  definitions/
    security/
      allowed-locations.json
      vm-require-managed-disks.json
    operations/
      deploy-diagnostics-to-log-analytics.json
    networking/
      require-private-endpoints.json
  initiatives/
    alz-core.json
    alz-security.json
    alz-networking.json
  assignments/
    Corp-NonProd/
      alz-core.json
      alz-security.json
    Corp-Prod/
      alz-core.json
      alz-security.json
      alz-networking.json
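A layout like this is easy to check mechanically. The following sketch assumes, as the tree above suggests, that each file under assignments/<MG>/ is named after the initiative it assigns; it lists the planned assignments per management group and flags any assignment that points at an initiative file that does not exist.

```python
from pathlib import Path


def planned_assignments(repo_root: Path) -> dict[str, list[str]]:
    """Map each management group folder to the initiatives assigned there."""
    plan = {}
    assignments_dir = repo_root / "policies" / "assignments"
    for mg_dir in sorted(p for p in assignments_dir.iterdir() if p.is_dir()):
        plan[mg_dir.name] = sorted(f.stem for f in mg_dir.glob("*.json"))
    return plan


def missing_initiatives(repo_root: Path) -> set[str]:
    """Initiatives referenced by an assignment file but absent from initiatives/."""
    initiatives_dir = repo_root / "policies" / "initiatives"
    defined = {f.stem for f in initiatives_dir.glob("*.json")}
    used = {
        name
        for names in planned_assignments(repo_root).values()
        for name in names
    }
    return used - defined
```

Running missing_initiatives(Path(".")) from the repository root in a pull request check would catch a renamed or deleted initiative before the deployment step does.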

Example Terraform snippet for an initiative (policy set) definition:

resource "azurerm_policy_set_definition" "alz_core" {
  name         = "alz-core"
  display_name = "ALZ Core Baseline"
  policy_type  = "Custom"
  # Valid in azurerm 3.x; check the provider docs for the version you pin
  management_group_id = data.azurerm_management_group.landingzones.id

  # Member policies are declared as policy_definition_reference blocks,
  # with parameters passed as a JSON string in parameter_values
  policy_definition_reference {
    policy_definition_id = data.azurerm_policy_definition.allowed_locations.id
    parameter_values     = jsonencode({ listOfAllowedLocations = { value = ["westeurope", "northeurope"] } })
  }

  policy_definition_reference {
    policy_definition_id = data.azurerm_policy_definition.deploy_diagnostics.id
    parameter_values     = jsonencode({ logAnalytics = { value = "/subscriptions/.../resourceGroups/rg-logs/providers/Microsoft.OperationalInsights/workspaces/lz-logs-weu" } })
  }
}

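Then assign that initiative to a specific management group:

# azurerm 3.x replaced the generic azurerm_policy_assignment with scope-specific resources
resource "azurerm_management_group_policy_assignment" "alz_core_corp_prod" {
  name                 = "alz-core-corp-prod" # MG-scoped assignment names are limited to 24 characters
  display_name         = "ALZ Core for Corp Prod"
  policy_definition_id = azurerm_policy_set_definition.alz_core.id
  management_group_id  = data.azurerm_management_group.corp_prod.id

  # Managed identity used by DeployIfNotExists / Modify remediation
  identity {
    type = "SystemAssigned"
  }

  # Required whenever an identity is configured
  location = "westeurope"
}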

4. Build CI/CD for Policies

Example: GitHub Actions workflow for policy deployment (simplified):

name: Deploy Azure Policies

on:
  push:
    branches: [ main ]
    paths:
      - 'policies/**'

jobs:
  deploy-policies:
    runs-on: ubuntu-latest
    env:
      AZURE_MG_ID: /providers/Microsoft.Management/managementGroups/landingzones
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Validate policy JSON
        run: |
          python scripts/validate_policies.py policies/definitions
          python scripts/validate_initiatives.py policies/initiatives

      - name: Deploy policy definitions
        run: |
          # --location is required for management group deployments (it sets where deployment metadata is stored)
          az deployment mg create \
            --management-group-id "${AZURE_MG_ID##*/}" \
            --location westeurope \
            --template-file infra/policies/definitions.bicep \
            --parameters definitionsPath=policies/definitions

      - name: Deploy policy initiatives and assignments
        run: |
          az deployment mg create \
            --management-group-id "${AZURE_MG_ID##*/}" \
            --location westeurope \
            --template-file infra/policies/initiatives.bicep \
            --parameters initiativesPath=policies/initiatives assignmentsPath=policies/assignments

Key points:

Use service principals with least privilege for policy deployment
Validate JSON / Bicep / Terraform as part of the pipeline
Use separate pipelines or stages for dev vs prod MGs
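The workflow above calls scripts/validate_policies.py, which the article does not show. A hypothetical minimal version might look like the sketch below: it only checks that each file parses as JSON, that a policyRule with if/then exists, and that a literal effect is one of a known set. The effect list is deliberately not exhaustive, and a real validator would likely also check the definition against a JSON schema.

```python
import json
from pathlib import Path

# Common effects only; extend as needed for your library.
KNOWN_EFFECTS = {
    "audit", "auditifnotexists", "deny", "denyaction",
    "deployifnotexists", "modify", "append", "manual", "disabled",
}


def validate_definition(doc: dict) -> list[str]:
    """Return a list of problems found in one policy definition document."""
    props = doc.get("properties", doc)  # accept wrapped or bare bodies
    rule = props.get("policyRule")
    if not isinstance(rule, dict) or "if" not in rule or "then" not in rule:
        return ["policyRule must contain 'if' and 'then'"]
    then = rule["then"]
    effect = str(then.get("effect", "")) if isinstance(then, dict) else ""
    # Parameterized effects such as "[parameters('effect')]" are not resolved here.
    if not effect.startswith("[") and effect.lower() not in KNOWN_EFFECTS:
        return [f"unknown effect: {effect!r}"]
    return []


def main(root: str) -> int:
    """Validate every JSON file under `root`; return a process exit code."""
    failed = False
    for path in sorted(Path(root).rglob("*.json")):
        try:
            problems = validate_definition(json.loads(path.read_text()))
        except json.JSONDecodeError as exc:
            problems = [f"invalid JSON: {exc}"]
        for problem in problems:
            failed = True
            print(f"{path}: {problem}")
    return 1 if failed else 0
```

To use it as a script, call sys.exit(main(sys.argv[1])) from a __main__ guard so that a non-zero exit code fails the pipeline step.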

5. Onboard Subscriptions: Brownfield vs Greenfield

5.1 Greenfield subscriptions

For new subscriptions:

Create subscription via automation (e.g., a subscription vending pipeline built on Azure DevOps, Terraform, Bicep, or a custom portal)
Ensure the new subscription is placed in the correct management group (Corp-NonProd, Corp-Prod, etc.)
Policies auto-apply from the MG
Validate:
- Subscription appears in management group
- Policy assignments list inherited initiatives
- First app team deployments comply with policy

5.2 Brownfield subscriptions (existing workloads)

Onboarding existing subscriptions is where Azure Policy meets reality.

Process:

Discovery
- Move subscription into a temporary MG (e.g., Brownfield-Quarantine) where only Audit policies are applied
- Run compliance scans and export non-compliance via Azure Resource Graph or the az policy state list command
- Classify findings: high-risk (public IPs, missing encryption, no backups), medium, low
Define migration strategy
- Agree with app owners on timelines and responsibility for remediation
- For some controls, use DeployIfNotExists or Modify to auto-remediate
- For hard-breaking changes, schedule maintenance windows
Gradual enforcement
- Phase 1: Audit-only policies at target landing zone MG
- Phase 2: Switch selected controls to Deny for new deployments, keep existing resources under Audit/DeployIfNotExists
- Phase 3: Fully enforce policies including existing resources (if realistic)
Move subscription into its final MG once:
- Critical non-compliances are addressed or excepted
- Auto-remediation jobs have run and are monitored
- App team acknowledges and accepts the landing zone guardrails
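The classification step in the discovery phase can be sketched as a small triage function over exported compliance records. The field names below are modeled on policy state records (complianceState, policyDefinitionReferenceId, subscriptionId); the severity mapping and the reference IDs in it are hypothetical and would come from your own risk model.

```python
from collections import Counter

# Hypothetical mapping from initiative member (reference ID) to severity.
SEVERITY = {
    "deny-public-ip": "high",
    "require-encryption": "high",
    "require-backup": "high",
    "require-tags": "low",
}


def triage(findings: list[dict]) -> dict[str, Counter]:
    """Count non-compliant resources per subscription and severity."""
    summary: dict[str, Counter] = {}
    for finding in findings:
        if finding["complianceState"] != "NonCompliant":
            continue
        # Controls without an explicit rating default to medium.
        severity = SEVERITY.get(finding["policyDefinitionReferenceId"], "medium")
        summary.setdefault(finding["subscriptionId"], Counter())[severity] += 1
    return summary
```

The per-subscription counts give app owners a concrete starting point for the remediation timelines agreed in the migration strategy.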

6. Operationalizing Policy: Remediation and Exceptions

6.1 Remediation

For DeployIfNotExists / Modify policies:

Enable system-assigned managed identity on policy assignments to perform changes
Trigger remediation tasks via portal or CLI:

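# The assignment lives at management group scope, so its ID starts with the MG path.
# --definition-reference-id is required because the assignment is an initiative.
az policy remediation create \
  --name fix-diag-settings \
  --policy-assignment "/providers/Microsoft.Management/managementGroups/<mgId>/providers/Microsoft.Authorization/policyAssignments/alz-core-corp-prod" \
  --definition-reference-id <definitionReferenceId> \
  --resource-group rg-app-prod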

Monitor remediation status via:
- Azure Policy compliance dashboard
- Azure Monitor alerts on failed remediation tasks
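Because a remediation task targets one policy inside an initiative, remediating a whole initiative usually means one task per member policy. The sketch below only builds the command strings (it does not run them); the task naming scheme and the example reference ID are assumptions of this example.

```python
import shlex


def remediation_commands(assignment_id: str, management_group: str,
                         reference_ids: list[str]) -> list[str]:
    """Build one `az policy remediation create` command per initiative member."""
    commands = []
    for ref in reference_ids:
        args = [
            "az", "policy", "remediation", "create",
            "--name", f"remediate-{ref}".lower(),
            "--policy-assignment", assignment_id,
            "--definition-reference-id", ref,
            "--management-group", management_group,
        ]
        commands.append(shlex.join(args))
    return commands


for cmd in remediation_commands(
    "/providers/Microsoft.Management/managementGroups/corp-prod"
    "/providers/Microsoft.Authorization/policyAssignments/alz-core-corp-prod",
    "corp-prod",
    ["deployDiagnostics"],  # hypothetical reference ID
):
    print(cmd)
```

Printing the commands first makes it easy to review them in a pull request or a dry-run stage before a pipeline executes them.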

6.2 Exceptions and Exemptions

Not everything can be compliant on day one.

Use Azure Policy exemptions instead of deleting or bypassing policies:

Scope exemptions to the smallest possible asset (resource group or resource)
Make them time-bound where possible (expiration date)
Track exemptions in Git (YAML/JSON manifest) to avoid "exception sprawl"
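A manifest in Git only helps if something checks it. The sketch below assumes a hypothetical manifest format in which each exemption record has name, owner, reason, scope, and expires_on (ISO date) fields, and reports entries that are incomplete or already expired.

```python
from datetime import date

REQUIRED_FIELDS = ("owner", "reason", "scope", "expires_on")


def review_exemptions(manifest: list[dict], today: date) -> list[str]:
    """Return human-readable problems for a list of exemption records."""
    problems = []
    for entry in manifest:
        name = entry.get("name", "<unnamed>")
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            problems.append(f"{name}: missing {', '.join(missing)}")
            continue
        if date.fromisoformat(entry["expires_on"]) < today:
            problems.append(f"{name}: expired on {entry['expires_on']}")
    return problems
```

Passing today explicitly keeps the check deterministic in tests; a scheduled pipeline would call it with date.today() and fail or alert when the list is non-empty.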

Architecture Overview

The architecture diagram (not reproduced here) shows:

Policy-as-code flowing through CI/CD into MG scopes
Subscriptions inheriting initiatives from their MGs
Clear ownership separation between platform, security, and app teams

Best Practices

Start with management groups, not subscriptions: Design and agree your MG hierarchy before onboarding more subscriptions.
Bundle policies into coherent initiatives: Organize by domain (alz-core, alz-security, alz-networking, alz-ops) rather than hundreds of standalone assignments.
Use Audit-first, then Deny: Introduce policies as Audit, analyze impact, then promote selected controls to Deny for new deployments.
Separate baselines for Prod vs NonProd: NonProd can have more permissive policies (e.g., broader locations, less strict SKUs). Prod should be tightly controlled.
Make policy-as-code mandatory: Disallow manual creation or modification of policy assignments outside pipelines (and detect them via drift monitoring).
Minimize tenant root scope policies: Place most policies at Landing Zone MG scope; keep tenant root focused on tenant-wide essentials (e.g., disallowed regions, high-level logging).
Automate subscription creation and placement: Never create subscriptions manually via portal for production scenarios. Use pipelines + APIs to enforce correct MG placement and tags.
Integrate policy results into observability: Export compliance and remediation metrics to Log Analytics, build dashboards and alerts (e.g., % compliance per landing zone, per team).
Align policy with security standards: Map initiatives to CIS, NIST, ISO controls; use Defender for Cloud and CAF recommendations to inform baseline.

Common Pitfalls

1. "Deny Everything" From Day One

Issue: Deploying a large set of Deny policies to all subscriptions at once.
Impact: Broken pipelines, blocked deployments, emergency bypasses, political fallout.
Detection: Sudden spike in policy evaluation failures and deployment errors (HTTP 403 with error code RequestDisallowedByPolicy) across subscriptions.
Fix:

Roll back to Audit for high-impact controls
Reintroduce Deny gradually with prior impact analysis and communication
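Gradual reintroduction can be modeled as a lookup from rollout phase to effect, with only explicitly promoted controls switching to Deny in the middle phase. The phase numbers follow the three-phase rollout described in the brownfield section; the control names are hypothetical.

```python
# Effect per rollout phase. "promoted" applies only to controls that have
# been through impact analysis and were explicitly approved for enforcement.
PHASE_EFFECTS = {
    1: {"default": "Audit"},
    2: {"default": "Audit", "promoted": "Deny"},
    3: {"default": "Deny"},
}


def effect_for(control: str, phase: int, promoted: set[str]) -> str:
    """Return the effect to pass as the assignment parameter for a control."""
    rules = PHASE_EFFECTS[phase]
    if control in promoted and "promoted" in rules:
        return rules["promoted"]
    return rules["default"]


promoted_controls = {"allowed-locations"}  # hypothetical control name
for phase in (1, 2, 3):
    print(phase, effect_for("allowed-locations", phase, promoted_controls),
          effect_for("require-private-endpoints", phase, promoted_controls))
```

Keeping the promoted set in Git makes each move from Audit to Deny a reviewed change rather than a portal toggle.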

2. No Separation Between Baseline and App-Specific Policies

Issue: App-specific requirements mixed into global initiatives.
Impact: Baseline becomes cluttered, unmanageable, and difficult to update.
Detection: Initiatives that reference specific resource names, SKUs, or app tags.
Fix:

Keep platform baseline initiatives generic
Manage app-specific policies at subscription or dedicated MG branches

3. Manual Policy Drift

Issue: Changes made via the portal that are not reflected in Git.
Impact: Unknown behaviors, inconsistent environments, broken expectations between teams.
Detection: Regular export and diff between current assignments and Git repo.
Fix:

Enforce policy-as-code
Introduce a "drift detection" pipeline that compares actual state to desired state and raises alerts
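At its core, a drift detection job is a three-way set comparison between the assignments declared in Git and the assignments exported from Azure. The sketch below assumes both sides have already been loaded into dictionaries keyed by assignment ID, with the values holding whatever settings you want to compare (for example parameters and enforcement mode).

```python
def detect_drift(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired (Git) and actual (Azure) assignments by ID."""
    return {
        # Declared in Git but not present in Azure.
        "missing": sorted(desired.keys() - actual.keys()),
        # Present in Azure but never declared in Git (likely portal changes).
        "unmanaged": sorted(actual.keys() - desired.keys()),
        # Present on both sides with different settings.
        "changed": sorted(
            key for key in desired.keys() & actual.keys()
            if desired[key] != actual[key]
        ),
    }
```

A scheduled pipeline could raise an alert whenever any of the three lists is non-empty.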

4. Overuse of Tenant Root Policies

Issue: Assigning almost everything at tenant root MG.
Impact: No flexibility for different landing zones, high blast radius for mistakes.
Detection: Tenant root has dozens of initiatives; lower MGs have almost none.
Fix:

Move most initiatives down to LandingZone / domain MGs
Keep root very thin

5. Ignoring Remediation Identity Permissions

Issue: Policy assignments use DeployIfNotExists but the managed identity doesn't have required permissions.
Impact: Remediation tasks fail silently or partially.
Detection: Errors in remediation jobs, non-compliant resources not fixed.
Fix:

Grant the policy assignment's managed identity the roles its definitions declare in roleDefinitionIds, at the target scope; prefer these specific roles over a blanket Contributor assignment
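DeployIfNotExists and Modify definitions list the roles they need under then.details.roleDefinitionIds, so the gap can be computed before a remediation task ever fails. In the sketch below, the set of roles already granted to the assignment identity is assumed to come from a separate export of role assignments.

```python
def required_role_ids(definition: dict) -> set[str]:
    """Role definition IDs declared by one policy definition, if any."""
    props = definition.get("properties", definition)
    details = props.get("policyRule", {}).get("then", {}).get("details", {})
    # Some effects use a list for 'details' and declare no roles.
    if not isinstance(details, dict):
        return set()
    return set(details.get("roleDefinitionIds", []))


def missing_roles(definitions: list[dict], granted: set[str]) -> set[str]:
    """Roles required by the definitions but not granted to the identity."""
    needed: set[str] = set()
    for definition in definitions:
        needed |= required_role_ids(definition)
    return needed - granted
```

An empty result means the identity holds every role the initiative's members ask for; anything else is a list of role assignments to create at the target scope.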

6. Poor Exception Governance

Issue: Exceptions created ad-hoc, without expiry or documentation.
Impact: Hidden risk, compliance gaps, and audit failures.
Detection: Large number of exemptions with no clear owner or reason.
Fix:

Implement a standardized exception workflow (ticket + PR + approval)
Require owner, reason, scope, and expiry for every exemption

FAQ

1. How does this compare to AWS and GCP?

AWS equivalent: AWS Organizations + SCPs + Config; Azure Policy is closer to Config rules + some SCP-like deny behavior
GCP equivalent: Organization policies + Constraints at org/folder/project scopes

The pattern—policy at hierarchy nodes, inheritance to accounts/subscriptions/projects—is the same.

2. How do I integrate security and compliance without blocking delivery?

Use a progressive rollout: start with Audit, expose non-compliance to teams, set SLAs for fixes, then enforce Deny only on well-understood controls. Add policy checks into CI (pre-merge) as well as at runtime.

3. How do I scale this across dozens of teams and subscriptions?

Standardize landing zones per domain
Centralize policy-as-code in a platform-owned repo
Provide self-service subscription provisioning that automatically places subscriptions into the right MG with the correct initiatives

4. What about multi-region and DR landing zones?

Ensure allowed locations and networking policies support your DR regions. Create separate MGs if DR regions have different constraints, or parameterize initiatives with region-specific settings (e.g., different Log Analytics workspaces).

5. How do I integrate with legacy stacks and on-prem?

Focus your policies on connectivity and security first: VPN/ExpressRoute, private endpoints, NSGs, and logging. Ensure that hybrid connectivity does not bypass landing zone controls (e.g., enforcing private-only paths for PaaS services).

6. How do I measure success of my landing zone governance?

Track KPIs such as:

% of subscriptions onboarded to landing zones
Policy compliance rate per MG / per team
Mean time to remediate non-compliant resources
Number of active exemptions and their trend
Deployment failure rates caused by policy (should decrease over time as baselines stabilize)
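As a small illustration of the compliance-rate KPI, the sketch below aggregates exported compliance records per management group. The managementGroup field is an assumption of this example: raw policy state records would first need to be joined with the subscription-to-MG mapping.

```python
from collections import defaultdict


def compliance_rate_by_mg(states: list[dict]) -> dict[str, float]:
    """Percentage of compliant resources per management group."""
    counts = defaultdict(lambda: [0, 0])  # mg -> [compliant, total]
    for state in states:
        bucket = counts[state["managementGroup"]]
        bucket[1] += 1
        if state["complianceState"] == "Compliant":
            bucket[0] += 1
    return {
        mg: round(100 * compliant / total, 1)
        for mg, (compliant, total) in counts.items()
    }
```

Tracking this number per MG over time, rather than as a single tenant-wide figure, shows which landing zones are converging and which are accumulating debt.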

7. How do I deal with app teams who need special policies?

Use branching MGs under landing zones or subscription-scoped initiatives for special cases, but keep them managed via the same policy-as-code pipeline and documented exceptions.

8. Do I need Azure Blueprints?

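Azure Blueprints is being deprecated; Microsoft recommends migrating to Template Specs and Deployment Stacks, which fit naturally with ARM/Bicep/Terraform and policy-as-code. Prefer those, unless you have a strong reason to keep existing Blueprint-based implementations.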

Conclusion

Onboarding multiple subscriptions into a real Azure Landing Zone is fundamentally a governance and operations problem, not a button in the portal. Azure Policy gives you the enforcement engine; management groups give you the structure; policy-as-code and CI/CD give you repeatability.

Key takeaways:

Design a management group hierarchy that reflects your organization
Use policy initiatives aligned to CAF domains (core, security, networking, ops)
Treat Azure Policy as code, integrated with pipelines and reviews
Phase in enforcement: audit, analyze, remediate, then deny
Manage exceptions and remediation with discipline, not one-off hacks

Next steps:

Draft your MG hierarchy and get cross-team buy-in
Stand up a policy-as-code repo and CI/CD pipeline
Start with a non-prod landing zone, onboard a few subscriptions, and refine
Gradually extend to prod and high-criticality workloads

2 Comments

Marco Marelli • Dec 3, 2025

Nice point here about easing into Deny instead of flipping everything at once, that part from the author really hits. Makes me wonder how teams decide when a control is mature enough to enforce for real.

architectraghu • Dec 5, 2025

@[Marco Marelli] Glad that part resonated with you. The “ease into Deny” is one of those things we usually learn after breaking a few pipelines.

For me, a control feels “ready to enforce” when:

it’s been in Audit for a while with low/no false positives,

the exceptions are clearly understood and documented, and

there’s a repeatable way to unblock people when it does trigger (runbook, exemption process, etc.).
