Azure Landing Zone - Policy at Scale

Azure Policy in Real Landing Zones

Onboard multiple Azure subscriptions into a real landing zone using Azure Policy, management groups, and policy-as-code for enterprise governance.

Introduction

On paper, Azure Landing Zones and Azure Policy look clean and deterministic. In reality, you're onboarding dozens or hundreds of subscriptions with existing workloads, legacy configurations, and multiple teams that all think they're special.

This article focuses on using Azure Policy to onboard multiple subscriptions into a real, enterprise-grade landing zone:

  • How to structure management groups and policy initiatives for scalable governance
  • How to onboard brownfield and greenfield subscriptions safely
  • How to use policy as code and CI/CD to operate this at scale
  • How to balance security, reliability, and delivery velocity when policies start denying real traffic

The scope is Azure-first, but the patterns map naturally to AWS Organizations + SCPs and GCP Organization policies.

Core Concepts

Azure Landing Zone in Governance Terms

Forget marketing. In governance terms, an Azure Landing Zone is:

  • A management group hierarchy
  • A set of policy initiatives (built-in and custom)
  • A baseline of guardrails for networking, identity, security, and operations
  • A repeatable way to onboard subscriptions and apps into that baseline

Key Azure building blocks:

  • Management groups (MGs) – hierarchical containers for subscriptions; policies applied here flow down
  • Azure Policy definitions – rules for what is allowed, denied, audited, or automatically deployed
  • Policy initiatives – logical bundles of policy definitions (e.g., CAF "Azure Landing Zone" initiatives)
  • Assignments – binding a definition/initiative to a scope (management group, subscription, resource group)
  • Effects – Deny, Audit, DeployIfNotExists, Modify, Append, etc.
  • Remediation tasks – jobs created to apply DeployIfNotExists/Modify to existing resources
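
To make definitions and effects concrete, here is a toy Python evaluator for a rule written in Azure Policy's if/then JSON shape. It is not the Azure Policy engine. It supports only the field, equals, in, not, allOf and anyOf operators, it compares strings exactly (Azure compares them case-insensitively), and the allowed-locations rule is a simplified, hypothetical example.

```python
from typing import Any, Optional

# Simplified, hypothetical "allowed locations" rule in Azure Policy's if/then JSON shape.
ALLOWED_LOCATIONS_RULE: dict[str, Any] = {
    "if": {"not": {"field": "location", "in": ["westeurope", "northeurope"]}},
    "then": {"effect": "deny"},
}


def evaluate_condition(cond: dict[str, Any], resource: dict[str, Any]) -> bool:
    """Evaluate a tiny subset of policy condition syntax against a flat resource dict."""
    if "allOf" in cond:
        return all(evaluate_condition(c, resource) for c in cond["allOf"])
    if "anyOf" in cond:
        return any(evaluate_condition(c, resource) for c in cond["anyOf"])
    if "not" in cond:
        return not evaluate_condition(cond["not"], resource)
    value = resource.get(cond["field"])
    # Real Azure Policy string comparisons are case-insensitive; this toy uses exact matches.
    if "equals" in cond:
        return value == cond["equals"]
    if "in" in cond:
        return value in cond["in"]
    raise ValueError(f"unsupported condition: {cond}")


def effect_for(rule: dict[str, Any], resource: dict[str, Any]) -> Optional[str]:
    """Return the rule's effect when its 'if' block matches; None means the resource complies."""
    return rule["then"]["effect"] if evaluate_condition(rule["if"], resource) else None


print(effect_for(ALLOWED_LOCATIONS_RULE, {"location": "eastus"}))      # deny
print(effect_for(ALLOWED_LOCATIONS_RULE, {"location": "westeurope"}))  # None
```

The effect only fires when the if block matches, which is why an allowed-locations rule is written as "not in the allowed list".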

Management Group Hierarchy for Real Landing Zones

A pragmatic, enterprise-friendly hierarchy often looks like:

Tenant Root ( / )
 ├─ Platform
 │   ├─ Identity
 │   ├─ Management
 │   └─ Connectivity
 ├─ LandingZones
 │   ├─ Corp
 │   │   ├─ Corp-Prod
 │   │   └─ Corp-NonProd
 │   └─ Online
 │       ├─ Online-Prod
 │       └─ Online-NonProd
 └─ Sandbox

Patterns:

  • Platform MGs host shared services: identity, networking, management
  • LandingZones MGs host application subscriptions grouped by business domain / criticality
  • Sandbox MG for low-governance experimental spaces

Policy is usually bound at Platform, LandingZones, and specific child MGs such as Corp-Prod.
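
Inheritance is what lets MG-level assignments scale: a subscription is subject to every assignment on its own MG and on all of its ancestors. A minimal sketch that walks a child-to-parent map of part of the hierarchy above; the initiative assigned at each MG is hypothetical.

```python
from typing import Optional

# Child -> parent map for part of the hierarchy above (None marks the tenant root).
PARENT: dict[str, Optional[str]] = {
    "TenantRoot": None,
    "Platform": "TenantRoot",
    "LandingZones": "TenantRoot",
    "Corp": "LandingZones",
    "Corp-Prod": "Corp",
    "Corp-NonProd": "Corp",
    "Sandbox": "TenantRoot",
}

# Hypothetical initiative assignments per management group.
ASSIGNMENTS: dict[str, list[str]] = {
    "TenantRoot": ["allowed-locations"],
    "LandingZones": ["alz-core"],
    "Corp": ["alz-networking"],
    "Corp-Prod": ["alz-security"],
}


def effective_assignments(mg: str) -> list[str]:
    """Collect assignments from the given MG up to the root, listed root-first."""
    chain: list[str] = []
    node: Optional[str] = mg
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    result: list[str] = []
    for scope in reversed(chain):
        result.extend(ASSIGNMENTS.get(scope, []))
    return result


# A subscription placed in Corp-Prod inherits everything above it.
print(effective_assignments("Corp-Prod"))
# ['allowed-locations', 'alz-core', 'alz-networking', 'alz-security']
```

Moving a subscription between MGs therefore changes its effective policy set without touching any assignment.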

Policy as Code and DevSecOps

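Azure Policy without automation becomes unmaintainable quickly: once you have more than a handful of subscriptions, manually created definitions, assignments, and exemptions drift apart and nobody can say which version is actually in force.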

Treat Azure Policy as code:

  • Definitions and initiatives stored as JSON in Git (or Bicep/ARM modules, or Terraform)
  • Pipelines (Azure DevOps or GitHub Actions) that:
    • Validate policies (linting, schema checks)
    • Publish definitions to the management group scope
    • Assign initiatives with parameter values per environment (dev/test/prod)
  • PR-based change control for all policy changes
  • Security and platform teams collaborate on policy sets and exemptions through Git, not the portal

High-level mapping of key services:

  • DevOps tooling: Azure DevOps Pipelines / GitHub Actions
  • Governance: Azure Policy, Management Groups, Azure Blueprints (deprecated), CAF landing zone initiatives
  • Security: Defender for Cloud, Key Vault, Managed Identities, Microsoft Entra ID (formerly Azure AD)
  • Observability: Azure Monitor, Log Analytics, Application Insights

Step-by-Step Guide

1. Prerequisites and Roles

Prerequisites:

  1. Tenant-level setup

    • Management group hierarchy defined and approved
    • Tenant Root Group locked down to a small platform/governance team
  2. Access

    • Platform team with Owner or Contributor + Resource Policy Contributor at key MGs
    • Security team with Security Admin + rights to define/approve policies
  3. Tools

    • Git repository for policy-as-code
    • CI/CD pipelines (Azure DevOps or GitHub Actions)
    • Landing zone reference (CAF-aligned) selected and tailored

Team ownership:

  • Platform team: management groups, policy initiatives, CI/CD
  • Security team: security controls, approvals, exceptions
  • App teams: subscription ownership, remediation work, exception requests

2. Design the Management Group Strategy

  1. Define environments

    • Separate Prod and NonProd at MG level where possible (Corp-Prod, Corp-NonProd)
    • Critical workloads can have their own branch (Payments-Prod MG)
  2. Align policies with MG scopes

    • Tenant root: only minimal guardrails (e.g., logging, naming, maybe region restrictions)
    • Platform MG: policies for shared services (diagnostics, DDoS plan requirements, etc.)
    • LandingZones MG: baseline security, networking, and operational standards for all app subscriptions

Key rule: You should not assign policies directly to individual subscriptions unless there is a strong, documented reason. Use management groups as default.
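
The key rule is easy to check in CI. A minimal sketch that classifies each assignment's scope by its ARM resource ID prefix and flags anything below MG scope that lacks a documented reason; the justification field is a hypothetical convention for your assignment files, not an Azure property.

```python
def scope_kind(scope: str) -> str:
    """Classify an ARM scope ID as managementGroup, subscription, or resourceGroup."""
    s = scope.lower()  # ARM IDs are case-insensitive
    if s.startswith("/providers/microsoft.management/managementgroups/"):
        return "managementGroup"
    if s.startswith("/subscriptions/"):
        return "resourceGroup" if "/resourcegroups/" in s else "subscription"
    raise ValueError(f"unrecognized scope: {scope}")


def lint_assignment(assignment: dict) -> list[str]:
    """Return violations for assignments bound below MG scope without a documented reason."""
    kind = scope_kind(assignment["scope"])
    if kind != "managementGroup" and not assignment.get("justification"):
        return [f"{assignment['name']}: assigned at {kind} scope without a justification"]
    return []


print(lint_assignment({"name": "one-off", "scope": "/subscriptions/0000/resourceGroups/rg-app"}))
# ['one-off: assigned at resourceGroup scope without a justification']
```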

3. Define and Version Your Policy Library

Use a structure like:

policies/
  definitions/
    security/
      allowed-locations.json
      vm-require-managed-disks.json
    operations/
      deploy-diagnostics-to-log-analytics.json
    networking/
      require-private-endpoints.json
  initiatives/
    alz-core.json
    alz-security.json
    alz-networking.json
  assignments/
    Corp-NonProd/
      alz-core.json
      alz-security.json
    Corp-Prod/
      alz-core.json
      alz-security.json
      alz-networking.json
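
A pipeline can turn this layout into a deployment plan by treating each folder under assignments/ as a management group and each file as an initiative to assign there. The sketch assumes exactly that convention (folder name = MG name, file stem = initiative name), which is this article's suggested layout rather than anything Azure requires.

```python
from pathlib import Path


def assignment_plan(root: Path) -> list[tuple[str, str]]:
    """Return sorted (management_group, initiative) pairs from <root>/assignments/<MG>/<initiative>.json."""
    plan: list[tuple[str, str]] = []
    for path in (root / "assignments").glob("*/*.json"):
        plan.append((path.parent.name, path.stem))
    return sorted(plan)


if __name__ == "__main__":
    # Prints nothing if ./policies does not exist.
    for mg, initiative in assignment_plan(Path("policies")):
        print(f"{initiative} -> {mg}")
```

Deriving the plan from the folder structure keeps "what is assigned where" reviewable in a normal PR diff.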

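Example Terraform snippet for an initiative (policy set) definition stored at the LandingZones management group (the data sources are assumed to be declared elsewhere):

# Recent azurerm versions use a dedicated resource for MG-scoped initiatives
resource "azurerm_management_group_policy_set_definition" "alz_core" {
  name                = "alz-core"
  display_name        = "ALZ Core Baseline"
  policy_type         = "Custom"
  management_group_id = data.azurerm_management_group.landingzones.id

  # One block per member policy; reference_id is what remediation tasks target
  policy_definition_reference {
    reference_id         = "allowedLocations"
    policy_definition_id = data.azurerm_policy_definition.allowed_locations.id
    parameter_values     = jsonencode({ listOfAllowedLocations = { value = ["westeurope", "northeurope"] } })
  }

  policy_definition_reference {
    reference_id         = "deployDiagnostics"
    policy_definition_id = data.azurerm_policy_definition.deploy_diagnostics.id
    parameter_values     = jsonencode({ logAnalytics = { value = "/subscriptions/.../resourceGroups/rg-logs/providers/Microsoft.OperationalInsights/workspaces/lz-logs-weu" } })
  }
}

Then assign that initiative to a specific management group at or below the one where it is defined:

resource "azurerm_management_group_policy_assignment" "alz_core_corp_prod" {
  name                 = "alz-core-corp-prod" # MG-scoped assignment names are limited to 24 characters
  display_name         = "ALZ Core for Corp Prod"
  policy_definition_id = azurerm_management_group_policy_set_definition.alz_core.id
  management_group_id  = data.azurerm_management_group.corp_prod.id

  # Managed identity used by DeployIfNotExists/Modify remediation; it needs its own role assignments
  identity {
    type = "SystemAssigned"
  }

  # Required when an identity is configured
  location = "westeurope"
}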

4. Build CI/CD for Policies

Example: GitHub Actions workflow for policy deployment (simplified):

name: Deploy Azure Policies

on:
  push:
    branches: [ main ]
    paths:
      - 'policies/**'

jobs:
  deploy-policies:
    runs-on: ubuntu-latest
    env:
      AZURE_MG_ID: /providers/Microsoft.Management/managementGroups/landingzones
    steps:
      - uses: actions/checkout@v4

      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Validate policy JSON
        run: |
          python scripts/validate_policies.py policies/definitions
          python scripts/validate_initiatives.py policies/initiatives

      - name: Deploy policy definitions
        run: |
          # --location is required for management-group-scoped deployments (it stores deployment metadata)
          az deployment mg create \
            --management-group-id ${AZURE_MG_ID##*/} \
            --location westeurope \
            --template-file infra/policies/definitions.bicep \
            --parameters definitionsPath=policies/definitions

      - name: Deploy policy initiatives and assignments
        run: |
          az deployment mg create \
            --management-group-id ${AZURE_MG_ID##*/} \
            --location westeurope \
            --template-file infra/policies/initiatives.bicep \
            --parameters initiativesPath=policies/initiatives assignmentsPath=policies/assignments

Key points:

  • Use service principals with least privilege for policy deployment
  • Validate JSON / Bicep / Terraform as part of the pipeline
  • Use separate pipelines or stages for dev vs prod MGs
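
The workflow calls scripts/validate_policies.py, which is not shown here. One possible shape for such a hypothetical validator is a structural smoke test: each file must parse as JSON and carry properties.mode, properties.policyRule.if and properties.policyRule.then.effect. It does not replace Azure's own validation at deployment time.

```python
import json
import sys
from pathlib import Path


def validate_definition(doc: dict) -> list[str]:
    """Return structural problems found in one policy definition document."""
    errors: list[str] = []
    props = doc.get("properties", {})
    if "mode" not in props:
        errors.append("missing properties.mode")
    rule = props.get("policyRule", {})
    if "if" not in rule:
        errors.append("missing properties.policyRule.if")
    if "effect" not in rule.get("then", {}):
        errors.append("missing properties.policyRule.then.effect")
    return errors


def main(folder: str) -> int:
    """Validate every *.json under folder; return a non-zero exit code on any problem."""
    failed = False
    for path in sorted(Path(folder).rglob("*.json")):
        try:
            problems = validate_definition(json.loads(path.read_text()))
        except json.JSONDecodeError as exc:
            problems = [f"invalid JSON: {exc}"]
        for problem in problems:
            print(f"{path}: {problem}")
            failed = True
    return 1 if failed else 0


# Usage: python scripts/validate_policies.py policies/definitions
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```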

5. Onboard Subscriptions: Brownfield vs Greenfield

5.1 Greenfield subscriptions

For new subscriptions:

  1. Create the subscription via automation (e.g., a subscription vending pipeline in Azure DevOps or GitHub Actions using Terraform or the Subscription Alias API, or a custom portal)
  2. Ensure the new subscription is placed in the correct management group (Corp-NonProd, Corp-Prod, etc.)
  3. Policies auto-apply from the MG
  4. Validate:
    • Subscription appears in management group
    • Policy assignments list inherited initiatives
    • First deployments of app team resources comply with policy

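
Step 2 is where automation pays off. A minimal sketch of the placement decision, assuming hypothetical archetype (corp/online) and environment (prod/nonprod) fields on the subscription request, and MG IDs that match the names in the hierarchy above. The resulting MG would then be handed to whatever performs the move, for example az account management-group subscription add.

```python
MG_PREFIX = "/providers/Microsoft.Management/managementGroups/"

# Hypothetical mapping from request fields to the MG names in the hierarchy above.
TARGET_MG = {
    ("corp", "prod"): "Corp-Prod",
    ("corp", "nonprod"): "Corp-NonProd",
    ("online", "prod"): "Online-Prod",
    ("online", "nonprod"): "Online-NonProd",
}


def target_management_group(request: dict) -> str:
    """Resolve the MG resource ID for a subscription request; fail fast on unknown combinations."""
    key = (request["archetype"].lower(), request["environment"].lower())
    if key not in TARGET_MG:
        raise ValueError(f"no landing zone for archetype/environment {key}")
    return MG_PREFIX + TARGET_MG[key]


print(target_management_group({"archetype": "Corp", "environment": "Prod"}))
# /providers/Microsoft.Management/managementGroups/Corp-Prod
```

Failing on unknown combinations is deliberate: a subscription that cannot be placed should not be created at all.
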
5.2 Brownfield subscriptions (existing workloads)

Onboarding existing subscriptions is where Azure Policy meets reality.

Process:

  1. Discovery

    • Move subscription into a temporary MG (e.g., Brownfield-Quarantine) where only Audit policies are applied
    • Trigger compliance scans (az policy state trigger-scan) and export non-compliance via Azure Resource Graph or az policy state list
    • Classify findings: high-risk (public IPs, missing encryption, no backups), medium, low
  2. Define migration strategy

    • Agree with app owners on timelines and responsibility for remediation
    • For some controls, use DeployIfNotExists or Modify to auto-remediate
    • For hard-breaking changes, schedule maintenance windows
  3. Gradual enforcement

    • Phase 1: Audit-only policies at target landing zone MG
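    • Phase 2: Switch selected controls to Deny for new deployments and keep existing resources under Audit/DeployIfNotExists. Deny does not remove existing non-compliant resources, but it does block updates that would leave them non-compliant, so warn app teams before promoting a control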
    • Phase 3: Fully enforce policies including existing resources (if realistic)
  4. Move subscription into its final MG once:

    • Critical non-compliances are addressed or excepted
    • Auto-remediation jobs have run and are monitored
    • App team acknowledges and accepts the landing zone guardrails
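
For the discovery step, exported findings can be bucketed automatically. The sketch assumes records shaped like the JSON output of az policy state list (with complianceState, policyDefinitionName and resourceId keys) and a hypothetical severity map keyed by custom definition name; built-in definitions are named by GUID, so adapt the keys to your export.

```python
from collections import defaultdict

# Hypothetical severity map keyed by policy definition name; anything unlisted is "low".
SEVERITY = {
    "deny-public-ip": "high",
    "require-storage-encryption": "high",
    "require-vm-backup": "high",
    "deploy-diagnostics-to-log-analytics": "medium",
}


def classify(records: list[dict]) -> dict[str, list[str]]:
    """Group non-compliant resource IDs into high/medium/low buckets."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        if rec.get("complianceState") != "NonCompliant":
            continue
        severity = SEVERITY.get(rec["policyDefinitionName"], "low")
        buckets[severity].append(rec["resourceId"])
    return dict(buckets)


sample = [
    {"complianceState": "NonCompliant", "policyDefinitionName": "deny-public-ip", "resourceId": "/subscriptions/0000/pip-1"},
    {"complianceState": "Compliant", "policyDefinitionName": "deny-public-ip", "resourceId": "/subscriptions/0000/pip-2"},
    {"complianceState": "NonCompliant", "policyDefinitionName": "require-tags", "resourceId": "/subscriptions/0000/vm-1"},
]
print(classify(sample))
# {'high': ['/subscriptions/0000/pip-1'], 'low': ['/subscriptions/0000/vm-1']}
```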

6. Operationalizing Policy: Remediation and Exceptions

6.1 Remediation

For DeployIfNotExists / Modify policies:

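  • Enable a system-assigned managed identity on policy assignments so they can perform changes, and grant that identity the roles the policies require
  • Trigger remediation tasks via the portal or CLI. For an initiative assignment, name the member policy to remediate with its definition reference ID:

    az policy remediation create \
      --name fix-diag-settings \
      --policy-assignment "/providers/Microsoft.Management/managementGroups/<mgId>/providers/Microsoft.Authorization/policyAssignments/alz-core-corp-prod" \
      --definition-reference-id <definitionReferenceId> \
      --resource-group rg-app-prod

  • Monitor remediation status via:
    • Azure Policy compliance dashboard
    • Azure Monitor alerts on failed remediation tasks
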
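
Because an initiative can bundle several DeployIfNotExists/Modify policies, remediation is created per member policy. A minimal sketch that builds one az policy remediation create argument list per definition reference ID, using only the flags from the command above; the reference IDs are hypothetical, and the commands are returned rather than executed.

```python
def remediation_commands(assignment_id: str, reference_ids: list[str], resource_group: str) -> list[list[str]]:
    """Build one 'az policy remediation create' argv per policy reference in an initiative assignment."""
    commands = []
    for ref in reference_ids:
        commands.append([
            "az", "policy", "remediation", "create",
            "--name", f"remediate-{ref.lower()}",
            "--policy-assignment", assignment_id,
            "--definition-reference-id", ref,
            "--resource-group", resource_group,
        ])
    return commands


# Each argv could be run from a pipeline step with subprocess.run(argv, check=True).
for argv in remediation_commands(
    "/providers/Microsoft.Management/managementGroups/<mgId>/providers/Microsoft.Authorization/policyAssignments/alz-core-corp-prod",
    ["deployDiagnostics"],
    "rg-app-prod",
):
    print(" ".join(argv))
```

Passing an argument list (not a shell string) avoids quoting problems with the long assignment IDs.
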
6.2 Exceptions and Exemptions

Not everything can be compliant on day one.

Use Azure Policy exemptions instead of deleting or bypassing policies:

  • Scope exemptions to the smallest possible asset (resource group or resource)
  • Make them time-bound where possible (expiration date)
  • Track exemptions in Git (YAML/JSON manifest) to avoid "exception sprawl"
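
A Git-tracked exemption manifest only helps if CI rejects incomplete entries. A minimal sketch, assuming a hypothetical manifest where each entry carries name, owner, reason, scope and an ISO-8601 expiresOn date (Azure policy exemptions have an optional expiresOn property, so the date carries over directly). The 90-day maximum lifetime is an example governance rule, not an Azure limit.

```python
from datetime import date, timedelta

REQUIRED = ("name", "owner", "reason", "scope", "expiresOn")
MAX_LIFETIME = timedelta(days=90)  # example governance rule, not an Azure limit


def check_exemption(entry: dict, today: date) -> list[str]:
    """Return problems with one exemption manifest entry."""
    problems = [f"missing {field}" for field in REQUIRED if not entry.get(field)]
    if entry.get("expiresOn"):
        expires = date.fromisoformat(entry["expiresOn"])
        if expires <= today:
            problems.append("already expired")
        elif expires - today > MAX_LIFETIME:
            problems.append("expiry more than 90 days out")
    return problems


example = {
    "name": "legacy-public-ip",
    "owner": "team-payments",
    "reason": "Public IP kept until migration to private endpoints",
    "scope": "/subscriptions/<subId>/resourceGroups/rg-legacy",
    "expiresOn": "2025-03-01",
}
print(check_exemption(example, today=date(2025, 1, 15)))  # []
```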

Architecture Overview

The overall architecture ties these pieces together:

  • Policy-as-code flowing through CI/CD into MG scopes
  • Subscriptions inheriting initiatives from their MGs
  • Clear ownership separation between platform, security, and app teams

Best Practices

  • Start with management groups, not subscriptions
    Design and agree on your MG hierarchy before onboarding more subscriptions

  • Bundle policies into coherent initiatives
    Organize by domain: alz-core, alz-security, alz-networking, alz-ops rather than hundreds of standalone assignments

  • Use Audit-first, then Deny
    Introduce policies as Audit, analyze impact, then promote selected controls to Deny for new deployments

  • Separate baselines for Prod vs NonProd
    NonProd can have more permissive policies (e.g., broader locations, less strict SKUs). Prod should be tightly controlled

  • Make policy-as-code mandatory
    Disallow manual creation or modification of policy assignments outside pipelines (and detect them via drift monitoring)

  • Minimize tenant root scope policies
    Place most policies at Landing Zone MG scope; keep tenant root focused on tenant-wide essentials (e.g., disallowed regions, high-level logging)

  • Automate subscription creation and placement
    Never create subscriptions manually via the portal for production scenarios. Use pipelines + APIs to enforce correct MG placement and tags

  • Integrate policy results into observability
    Export compliance and remediation metrics to Log Analytics, build dashboards and alerts (e.g., % compliance per landing zone, per team)

  • Align policy with security standards
    Map initiatives to CIS, NIST, ISO controls; use Defender for Cloud and CAF recommendations to inform baseline

Common Pitfalls

1. "Deny Everything" From Day One

Issue: Deploying a large set of Deny policies to all subscriptions at once.
Impact: Broken pipelines, blocked deployments, emergency bypasses, political fallout.
Detection: Sudden spike in deployment failures with the RequestDisallowedByPolicy error (HTTP 403) across subscriptions.
Fix:

  • Roll back to Audit for high-impact controls
  • Reintroduce Deny gradually with prior impact analysis and communication

2. No Separation Between Baseline and App-Specific Policies

Issue: App-specific requirements mixed into global initiatives.
Impact: Baseline becomes cluttered, unmanageable, and difficult to update.
Detection: Initiatives that reference specific resource names, SKUs, or app tags.
Fix:

  • Keep platform baseline initiatives generic
  • Manage app-specific policies at subscription or dedicated MG branches

3. Manual Policy Drift

Issue: Changes made via the portal that are not reflected in Git.
Impact: Unknown behaviors, inconsistent environments, broken expectations between teams.
Detection: Regular export and diff between current assignments and Git repo.
Fix:

  • Enforce policy-as-code
  • Introduce a "drift detection" pipeline that compares actual state to desired state and raises alerts
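
At its core, drift detection is a keyed diff between the assignments declared in Git and those exported from Azure (for example with az policy assignment list --scope <mgScope>). The sketch assumes both sides have already been normalized into dicts keyed by assignment name with comparable properties; it reports missing, unexpected and changed assignments.

```python
def diff_assignments(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired (Git) and actual (Azure) assignments keyed by assignment name."""
    missing = sorted(desired.keys() - actual.keys())
    unexpected = sorted(actual.keys() - desired.keys())
    changed = sorted(name for name in desired.keys() & actual.keys() if desired[name] != actual[name])
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


DESIRED = {
    "alz-core-corp-prod": {"enforcementMode": "Default"},
    "alz-sec-corp-prod": {"enforcementMode": "Default"},
}
ACTUAL = {
    "alz-core-corp-prod": {"enforcementMode": "DoNotEnforce"},  # switched off in the portal
    "portal-hotfix": {"enforcementMode": "Default"},            # created outside the pipeline
}
print(diff_assignments(DESIRED, ACTUAL))
# {'missing': ['alz-sec-corp-prod'], 'unexpected': ['portal-hotfix'], 'changed': ['alz-core-corp-prod']}
```

Any non-empty list is a candidate alert; "unexpected" entries are usually the portal changes this pitfall describes.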

4. Overuse of Tenant Root Policies

Issue: Assigning almost everything at tenant root MG.
Impact: No flexibility for different landing zones, high blast radius for mistakes.
Detection: Tenant root has dozens of initiatives; lower MGs have almost none.
Fix:

  • Move most initiatives down to LandingZones / domain MGs
  • Keep root very thin

5. Ignoring Remediation Identity Permissions

Issue: Policy assignments use DeployIfNotExists but the managed identity doesn't have required permissions.
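Impact: Remediation tasks fail or only partially succeed, often without anyone noticing.
Detection: Errors in remediation jobs, non-compliant resources not fixed.
Fix:

  • Grant each policy assignment's managed identity the roles listed in its definitions' roleDefinitionIds (often Contributor, sometimes narrower roles) on the target scopes; assignments deployed via IaC must create these role assignments explicitly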
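
DeployIfNotExists and Modify definitions declare the roles their managed identity needs in policyRule.then.details.roleDefinitionIds. A minimal sketch that collects those role IDs across an initiative's member definitions so a pipeline can grant exactly those roles; the sample definitions are trimmed and hypothetical, apart from the built-in Contributor role ID.

```python
CONTRIBUTOR = "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"


def required_role_ids(definitions: list[dict]) -> list[str]:
    """Collect the unique roleDefinitionIds declared by a set of policy definitions."""
    roles: set[str] = set()
    for definition in definitions:
        details = definition["properties"]["policyRule"]["then"].get("details", {})
        # DINE/Modify use an object here; Append uses a list, which declares no roles.
        if isinstance(details, dict):
            # ARM IDs are case-insensitive, so normalize before de-duplicating.
            roles.update(role.lower() for role in details.get("roleDefinitionIds", []))
    return sorted(roles)


# Trimmed, hypothetical member definitions of an initiative.
SAMPLE_DEFINITIONS = [
    {"properties": {"policyRule": {"if": {}, "then": {
        "effect": "[parameters('effect')]",
        "details": {"type": "Microsoft.Insights/diagnosticSettings", "roleDefinitionIds": [CONTRIBUTOR]},
    }}}},
    {"properties": {"policyRule": {"if": {}, "then": {"effect": "deny"}}}},
]

print(required_role_ids(SAMPLE_DEFINITIONS))
```

Each returned ID can then be granted to the assignment's principal at the assignment scope, for example with az role assignment create. Reading roleDefinitionIds rather than the effect also covers parameterized effects.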

6. Poor Exception Governance

Issue: Exceptions created ad-hoc, without expiry or documentation.
Impact: Hidden risk, compliance gaps, and audit failures.
Detection: Large number of exemptions with no clear owner or reason.
Fix:

  • Implement a standardized exception workflow (ticket + PR + approval)
  • Require owner, reason, scope, and expiry for every exemption

FAQ

1. How does this compare to AWS and GCP?

  • AWS equivalent: AWS Organizations + SCPs + Config; Azure Policy is closer to Config rules + some SCP-like deny behavior
  • GCP equivalent: Organization policies + Constraints at org/folder/project scopes

The pattern—policy at hierarchy nodes, inheritance to accounts/subscriptions/projects—is the same.

2. How do I integrate security and compliance without blocking delivery?

Use a progressive rollout: start with Audit, expose non-compliance to teams, set SLAs for fixes, then enforce Deny only on well-understood controls. Add policy checks into CI (pre-merge) as well as at runtime.
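
Progressive rollout is easiest when each control in an initiative exposes an effect parameter, so promoting Audit to Deny is a reviewed parameter change rather than a new definition. A minimal sketch of that promotion logic, following the phases from the brownfield process; the control names and the <control>Effect parameter naming are hypothetical.

```python
# Controls considered well understood enough to deny (hypothetical names).
DENY_READY = {"allowedLocations", "denyPublicIp"}


def effect_for_control(control: str, phase: int) -> str:
    """Phase 1 audits everything; from phase 2 on, only well-understood controls are denied."""
    if phase < 1:
        raise ValueError("phase numbering starts at 1")
    return "Deny" if phase >= 2 and control in DENY_READY else "Audit"


def effect_parameters(controls: list[str], phase: int) -> dict[str, dict[str, str]]:
    """Build an ARM-style parameter object, assuming each control exposes a '<control>Effect' parameter."""
    return {f"{c}Effect": {"value": effect_for_control(c, phase)} for c in controls}


print(effect_parameters(["allowedLocations", "requireTags"], phase=2))
# {'allowedLocationsEffect': {'value': 'Deny'}, 'requireTagsEffect': {'value': 'Audit'}}
```

Remediating existing resources (phase 3) is a separate step and is not modeled here.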

3. How do I scale this across dozens of teams and subscriptions?

  • Standardize landing zones per domain
  • Centralize policy-as-code in a platform-owned repo
  • Provide self-service subscription provisioning that automatically places subscriptions into the right MG with the correct initiatives

4. What about multi-region and DR landing zones?

Ensure allowed locations and networking policies support your DR regions. Create separate MGs if DR regions have different constraints, or parameterize initiatives with region-specific settings (e.g., different Log Analytics workspaces).

5. How do I integrate with legacy stacks and on-prem?

Focus your policies on connectivity and security first: VPN/ExpressRoute, private endpoints, NSGs, and logging. Ensure that hybrid connectivity does not bypass landing zone controls (e.g., enforcing private-only paths for PaaS services).

6. How do I measure success of my landing zone governance?

Track KPIs such as:

  • % of subscriptions onboarded to landing zones
  • Policy compliance rate per MG / per team
  • Mean time to remediate non-compliant resources
  • Number of active exemptions and their trend
  • Deployment failure rates caused by policy (should decrease over time as baselines stabilize)
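
Most of these KPIs are simple aggregations over exported compliance data. A minimal sketch for compliance rate per MG, assuming each per-resource record has already been tagged with a hypothetical managementGroup field and a complianceState of Compliant or NonCompliant.

```python
from collections import Counter


def compliance_rate_by_mg(records: list[dict]) -> dict[str, float]:
    """Percentage of evaluated resources that are compliant, per management group."""
    total: Counter = Counter()
    compliant: Counter = Counter()
    for rec in records:
        mg = rec["managementGroup"]
        total[mg] += 1
        if rec["complianceState"] == "Compliant":
            compliant[mg] += 1
    return {mg: round(100 * compliant[mg] / total[mg], 1) for mg in total}


records = [
    {"managementGroup": "Corp-Prod", "complianceState": "Compliant"},
    {"managementGroup": "Corp-Prod", "complianceState": "Compliant"},
    {"managementGroup": "Corp-Prod", "complianceState": "NonCompliant"},
    {"managementGroup": "Corp-NonProd", "complianceState": "NonCompliant"},
]
print(compliance_rate_by_mg(records))
# {'Corp-Prod': 66.7, 'Corp-NonProd': 0.0}
```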

7. How do I deal with app teams who need special policies?

Use branching MGs under landing zones or subscription-scoped initiatives for special cases, but keep them managed via the same policy-as-code pipeline and documented exceptions.

8. Do I need Azure Blueprints?

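Azure Blueprints is deprecated, and Microsoft recommends Template Specs and Deployment Stacks as its replacements; together with ARM/Bicep/Terraform and policy-as-code they cover the same ground. Prefer those, unless you have a strong reason to keep existing Blueprint-based implementations.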

Conclusion

Onboarding multiple subscriptions into a real Azure Landing Zone is fundamentally a governance and operations problem, not a button in the portal. Azure Policy gives you the enforcement engine; management groups give you the structure; policy-as-code and CI/CD give you repeatability.

Key takeaways:

  • Design a management group hierarchy that reflects your organization
  • Use policy initiatives aligned to CAF domains (core, security, networking, ops)
  • Treat Azure Policy as code, integrated with pipelines and reviews
  • Phase in enforcement: audit, analyze, remediate, then deny
  • Manage exceptions and remediation with discipline, not one-off hacks

Next steps:

  1. Draft your MG hierarchy and get cross-team buy-in
  2. Stand up a policy-as-code repo and CI/CD pipeline
  3. Start with a non-prod landing zone, onboard a few subscriptions, and refine
  4. Gradually extend to prod and high-criticality workloads
