Remco Eissing

Enabling Defender for Servers Vulnerability Management through Infrastructure as Code

I recently got the question of how to enable certain environment settings in Microsoft Defender for Cloud. This post targets enabling the vulnerability assessment for machines feature and assumes that Defender for Servers itself is already enabled. In Azure this is managed via the Microsoft.Security/serverVulnerabilityAssessmentsSettings resource, which enables the Vulnerability Management of Defender for Servers. This is a great way to gain insight into the vulnerabilities that apply to your servers, so that you can protect them against the latest threats.

When we look at the documentation of this resource, it describes a name, a kind and a properties object with the selected provider. The documentation lists MdeTvm as the required value for the selected provider, which makes it clear that this is the only value we can currently pick.

[Image: Before running the deployment]

Bicep example

If we would put this in a Bicep template, this would look like this:

targetScope = 'subscription'

resource serverVulnerabilityAssessmentsSettings 'Microsoft.Security/serverVulnerabilityAssessmentsSettings@2023-05-01' = {
  name: 'AzureServersSetting'
  kind: 'AzureServersSetting'
  properties: {
    selectedProvider: 'MdeTvm'
  }
}

There are a couple of things to note in this Bicep template. The first is the targetScope, which is set to subscription because the Vulnerability Management setting of Defender for Servers is a subscription-level resource. The second is that both the kind and the name are set to AzureServersSetting, the kind used for the Vulnerability Management of Defender for Servers. With this template in hand, all we have to do is run New-AzSubscriptionDeployment -Name 'va' -TemplateFile .\vulnerabilityAssessment.bicep -Location 'westeurope' to deploy it to our subscription.
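To make it a bit more tangible what this deployment boils down to, here is a minimal sketch of the ARM REST request that corresponds to the Bicep resource. It assumes the usual ARM URL shape for a subscription-level resource; the helper name build_va_settings_request and the all-zero subscription ID are made up for illustration, and the code only builds the URL and body without sending anything to Azure.

```python
import json

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2023-05-01"  # same API version as the Bicep resource


def build_va_settings_request(subscription_id: str) -> tuple[str, str]:
    """Return (url, json_body) for a PUT that mirrors the Bicep resource.

    Hypothetical helper for illustration only; it does not call Azure.
    """
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        "/providers/Microsoft.Security/serverVulnerabilityAssessmentsSettings"
        f"/AzureServersSetting?api-version={API_VERSION}"
    )
    body = {
        "kind": "AzureServersSetting",
        "properties": {"selectedProvider": "MdeTvm"},
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # Placeholder subscription ID, not a real one.
    url, body = build_va_settings_request("00000000-0000-0000-0000-000000000000")
    print("PUT", url)
    print(body)
```

The name segment of the URL and the kind in the body are the same value, which matches the name and kind both being AzureServersSetting in the template.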

[Image: After running the deployment]

Conclusion

So there we have it. Once we deploy this against our subscription, the Vulnerability Management of Defender for Servers is enabled. This is a great way to get insight into the vulnerabilities on all your servers, so that you can keep them protected against the latest threats.

Enabling Azure Policy for Kubernetes from Defender for Container with Infrastructure as Code

I recently got the question of how to enable certain environment settings in Microsoft Defender for Cloud. This post targets enabling the Azure Policy for Kubernetes feature and assumes that Defender for Containers itself is already enabled.

When we enable this in Azure, what happens under the hood is that two extra policies are assigned at the subscription level. The two policy definitions are ‘Deploy Azure Policy Add-on to Azure Kubernetes Service clusters’ and ‘Configure Azure Arc enabled Kubernetes clusters to install the Azure Policy extension’. As we work with code we typically use the definition IDs, which are a8eff44f-8c92-45c3-a3fb-9880802d67a7 and 0adc5395-9169-4b9b-8687-af838d69410a respectively.
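As a small illustration of how these GUIDs end up in a template, the sketch below turns each definition ID into the full resource ID of a built-in policy definition. The dictionary and the function name are my own for this example; the prefix is the one that built-in definitions use.

```python
# Prefix shared by all built-in policy definitions (no subscription or
# management group in the path).
BUILTIN_PREFIX = "/providers/Microsoft.Authorization/policyDefinitions/"

# Display names and definition GUIDs as listed in the text.
POLICY_DEFINITIONS = {
    "Deploy Azure Policy Add-on to Azure Kubernetes Service clusters":
        "a8eff44f-8c92-45c3-a3fb-9880802d67a7",
    "Configure Azure Arc enabled Kubernetes clusters to install the Azure Policy extension":
        "0adc5395-9169-4b9b-8687-af838d69410a",
}


def builtin_policy_definition_id(guid: str) -> str:
    """Return the resource ID of a built-in policy definition (illustrative helper)."""
    return BUILTIN_PREFIX + guid


if __name__ == "__main__":
    for display_name, guid in POLICY_DEFINITIONS.items():
        print(display_name)
        print("  ", builtin_policy_definition_id(guid))
```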

[Image: Before running the deployment]

Bicep example

Putting this into a Bicep template is just like any other Policy Assignment that we create. We keep the same names, and as these are DeployIfNotExists policies we also assign a system-assigned managed identity to each assignment; without it the policy cannot deploy anything. In a Bicep template this looks as follows:

targetScope = 'subscription'

resource azurePolicyKubernetesArc 'Microsoft.Authorization/policyAssignments@2024-04-01' = {
  name: 'Defender for Containers provisioning Policy extension for Arc-e'
  location: deployment().location
  properties: {
    displayName: 'Configure Azure Arc enabled Kubernetes clusters to install the Azure Policy extension'
    description: 'Deploy Azure Policy\'s extension for Azure Arc to provide at-scale enforcements and safeguard your Arc enabled Kubernetes clusters in a centralized, consistent manner. Learn more at https://aka.ms/akspolicydoc.'
    policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/0adc5395-9169-4b9b-8687-af838d69410a'
    parameters: {
      effect: {
        value: 'DeployIfNotExists'
      }
    }
  }
  identity: {
    type: 'SystemAssigned'
  }
}

resource azurePolicyKubernetes 'Microsoft.Authorization/policyAssignments@2024-04-01' = {
  name: 'Defender for Containers provisioning Azure Policy Addon for Kub'
  location: deployment().location
  properties: {
    displayName: 'Deploy Azure Policy Add-on to Azure Kubernetes Service clusters'
    description: 'Use Azure Policy Add-on to manage and report on the compliance state of your Azure Kubernetes Service (AKS) clusters. For more information, see https://aka.ms/akspolicydoc.'
    policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/a8eff44f-8c92-45c3-a3fb-9880802d67a7'
    parameters: {
      effect: {
        value: 'DeployIfNotExists'
      }
    }
  }
  identity: {
    type: 'SystemAssigned'
  }
}

[Image: After running the deployment]

Permissions

When we enable the Azure Policy add-on for Kubernetes via the portal, it does not only create plain Azure Policy assignments; it also grants the identity of each assignment a set of role-based permissions, identified by the three role definition IDs used below.

So if we want to deploy this properly, we should grant these roles as well. We can extend the Bicep file with two additional resources that each loop over an array of these three role definitions. A module would also be a possibility, but for the simplicity of the example let’s use two resources.

var roleDefinitions = [
  '8bb6f106-b146-4ee6-a3f9-b9c5a96e0ae5'
  '5e93ba01-8f92-4c7a-b12a-801e3df23824'
  '92aaf0da-9dab-42b6-94a3-d43ce8d16293'
]

resource arcRoleAssignments 'Microsoft.Authorization/roleAssignments@2022-04-01' = [for roleDefinition in roleDefinitions: {
  name: guid(roleDefinition, 'defender for containers arc')
  properties: {
    principalId: azurePolicyKubernetesArc.identity.principalId
    roleDefinitionId: '/providers/Microsoft.Authorization/roleDefinitions/${roleDefinition}'
    principalType: 'ServicePrincipal'
  }
}]

resource azurePolicyRoleAssignments 'Microsoft.Authorization/roleAssignments@2022-04-01' = [for roleDefinition in roleDefinitions: {
  name: guid(roleDefinition, 'defender for containers')
  properties: {
    principalId: azurePolicyKubernetes.identity.principalId
    roleDefinitionId: '/providers/Microsoft.Authorization/roleDefinitions/${roleDefinition}'
    principalType: 'ServicePrincipal'
  }
}]
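To show what the two loops expand to, here is a rough sketch that enumerates the resulting role assignments: two identities times three role definitions gives six assignments. Note that uuid5 is only a stand-in that I use to demonstrate deterministic naming; it is not the algorithm behind Bicep's guid() function, so the generated names will differ from a real deployment. The function names are hypothetical.

```python
import uuid

# Role definition IDs from the Bicep variable.
ROLE_DEFINITIONS = [
    "8bb6f106-b146-4ee6-a3f9-b9c5a96e0ae5",
    "5e93ba01-8f92-4c7a-b12a-801e3df23824",
    "92aaf0da-9dab-42b6-94a3-d43ce8d16293",
]

# Seed strings passed to guid() in the two Bicep resources.
ASSIGNMENT_SEEDS = ["defender for containers arc", "defender for containers"]


def stand_in_guid(*parts: str) -> str:
    """Deterministic GUID from strings.

    NOT the algorithm Bicep's guid() uses; uuid5 only illustrates that the
    same inputs always give the same name.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "|".join(parts)))


def expand_role_assignments() -> list[dict]:
    """List every role assignment the two Bicep loops would create."""
    assignments = []
    for seed in ASSIGNMENT_SEEDS:
        for role in ROLE_DEFINITIONS:
            assignments.append({
                "name": stand_in_guid(role, seed),
                "seed": seed,
                "roleDefinitionId": "/providers/Microsoft.Authorization/roleDefinitions/" + role,
            })
    return assignments


if __name__ == "__main__":
    for assignment in expand_role_assignments():
        print(assignment["seed"], "->", assignment["roleDefinitionId"])
```

Because the name is derived from the role and a fixed seed, redeploying the template addresses the same role assignments again instead of creating new ones.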

Conclusion

When we combine the policy assignments and role assignments together we have successfully enabled the Azure Policy for Kubernetes feature of Defender for Containers. With this it has become possible to report and enforce the compliance of our Kubernetes clusters.

The full template I’ve used:

targetScope = 'subscription'

var roleDefinitions = [
  '8bb6f106-b146-4ee6-a3f9-b9c5a96e0ae5'
  '5e93ba01-8f92-4c7a-b12a-801e3df23824'
  '92aaf0da-9dab-42b6-94a3-d43ce8d16293'
]

resource azurePolicyKubernetesArc 'Microsoft.Authorization/policyAssignments@2024-04-01' = {
  name: 'Defender for Containers provisioning Policy extension for Arc-e'
  location: deployment().location
  properties: {
    displayName: 'Configure Azure Arc enabled Kubernetes clusters to install the Azure Policy extension'
    description: 'Deploy Azure Policy\'s extension for Azure Arc to provide at-scale enforcements and safeguard your Arc enabled Kubernetes clusters in a centralized, consistent manner. Learn more at https://aka.ms/akspolicydoc.'
    policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/0adc5395-9169-4b9b-8687-af838d69410a'
    parameters: {
      effect: {
        value: 'DeployIfNotExists'
      }
    }
  }
  identity: {
    type: 'SystemAssigned'
  }
}

resource arcRoleAssignments 'Microsoft.Authorization/roleAssignments@2022-04-01' = [for roleDefinition in roleDefinitions: {
  name: guid(roleDefinition, 'defender for containers arc')
  properties: {
    principalId: azurePolicyKubernetesArc.identity.principalId
    roleDefinitionId: '/providers/Microsoft.Authorization/roleDefinitions/${roleDefinition}'
    principalType: 'ServicePrincipal'
  }
}]

resource azurePolicyKubernetes 'Microsoft.Authorization/policyAssignments@2024-04-01' = {
  name: 'Defender for Containers provisioning Azure Policy Addon for Kub'
  location: deployment().location
  properties: {
    displayName: 'Deploy Azure Policy Add-on to Azure Kubernetes Service clusters'
    description: 'Use Azure Policy Add-on to manage and report on the compliance state of your Azure Kubernetes Service (AKS) clusters. For more information, see https://aka.ms/akspolicydoc.'
    policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/a8eff44f-8c92-45c3-a3fb-9880802d67a7'
    parameters: {
      effect: {
        value: 'DeployIfNotExists'
      }
    }
  }
  identity: {
    type: 'SystemAssigned'
  }
}

resource azurePolicyRoleAssignments 'Microsoft.Authorization/roleAssignments@2022-04-01' = [for roleDefinition in roleDefinitions: {
  name: guid(roleDefinition, 'defender for containers')
  properties: {
    principalId: azurePolicyKubernetes.identity.principalId
    roleDefinitionId: '/providers/Microsoft.Authorization/roleDefinitions/${roleDefinition}'
    principalType: 'ServicePrincipal'
  }
}]

Remove Microsoft-Hosted Agents from an Azure DevOps Project

As part of Azure DevOps, Azure Pipelines allows you to run automated builds and deployments for your applications. You can use different types of agents to run your pipelines, such as Microsoft-hosted agents, self-hosted agents, or virtual machine scale set agents. Agents are grouped into agent pools, which define the configuration and capabilities of the agents. Agent pools are associated with agent pool queues, which are used to assign work to the agents in the pool.

Sometimes, you may need to remove an agent pool queue from an Azure DevOps project, for example, if you know that the project cannot execute builds and deployments on Microsoft-hosted agents. This typically occurs when there are security requirements and the network is locked down. In this post, I will show you how to manually remove an agent pool queue from an Azure DevOps project, as well as how to use the Azure DevOps REST API to automate the process. I will also show you how to restore an agent pool queue using the Azure DevOps REST API, in case you need to undo the removal.

How to manually remove an agent pool queue from an Azure DevOps project

The manual method to remove an agent pool queue from an Azure DevOps project involves deleting the agent pool that is associated with the queue. This will also delete the queue from the project, but not from the organization. To do this, follow these steps:

  1. Log in to your Azure DevOps portal and navigate to the project that contains the agent pool queue that you want to remove.
  2. Go to Project settings > Pipelines > Agent pools.
  3. Hover over the agent pool you would like to delete. A delete button appears on the right; click it.
  4. You are asked to confirm that you want to delete the queue from the project. Confirm if you are sure.
  5. Refresh the page and verify that the agent pool and the queue are no longer listed in the project.

Note that this method deletes the agent pool and the queue from the project only. If required, you can of course add them back again.

How to remove an agent pool queue from an Azure DevOps project using the Azure DevOps REST API

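The Azure DevOps REST API allows you to programmatically interact with Azure DevOps services, such as projects, pipelines, agents, and queues. You can use it to remove an agent pool queue from an Azure DevOps project without deleting the agent pool. To do this, you will need the two endpoints that the script further down calls: a GET on https://dev.azure.com/{organization}/{project}/_apis/distributedtask/queues to look up the queue by name, and a DELETE on https://dev.azure.com/{organization}/{project}/_apis/distributedtask/queues/{queueId} to remove it.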

To use the Azure DevOps REST API, you need to authenticate with an access token or a personal access token (PAT). For security reasons, I recommend using the Get-AzAccessToken cmdlet from the Az.Accounts module. This cmdlet uses the credentials of the current Azure session to generate an access token that can be used to call the Azure DevOps REST API, so you first need to connect to the Azure account that contains your Azure DevOps organization. Once connected, run the following command in PowerShell: $token = Get-AzAccessToken -ResourceUrl 499b84ac-1321-427f-aa17-267ca6975798, where 499b84ac-1321-427f-aa17-267ca6975798 is the well-known GUID of Azure DevOps. I do the same in the example script below.

In the script below, I first get the access token for Azure DevOps using the Get-AzAccessToken cmdlet. Then I use the access token to call the Azure DevOps REST API and get the agent pool queues of the project, filtered by name to find the queue that I want to remove. If the queue is present, I take its ID and call the DELETE endpoint to remove the queue from the project. Here is the complete script in PowerShell:

$organization = 'your-organization-name'
$project = 'your-project-name'
$queueName = 'Azure Pipelines'

# Get access token for Azure DevOps
$accessToken = Get-AzAccessToken -ResourceUrl '499b84ac-1321-427f-aa17-267ca6975798'

$headers = @{
    Authorization = "Bearer $($accessToken.Token)"
    Accept        = 'application/json'
}

$getQueuesParameters = @{
    Method  = 'Get'
    Uri     = "https://dev.azure.com/$organization/$project/_apis/distributedtask/queues?queueNames=$queueName&api-version=7.2-preview.1"
    Headers = $headers
}

$getQueuesResponse = Invoke-RestMethod @getQueuesParameters

if ($getQueuesResponse.count -eq 1) {
    $queueId = $getQueuesResponse.value[0].id

    $removeParameters = @{
        Method  = 'Delete'
        Uri     = "https://dev.azure.com/$organization/$project/_apis/distributedtask/queues/$queueId/?api-version=7.2-preview.1"
        Headers = $headers
    }

    $response = Invoke-RestMethod @removeParameters

    Write-Host 'Queue deleted'
} else {
    Write-Host 'Queue not present'
}
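The request-building part of the script above can be sketched in a few lines of Python, which makes it easy to see (and test) which URLs are called. This is an illustrative sketch only: the function names are mine, nothing is sent over the network, and the queue name is URL-encoded because 'Azure Pipelines' contains a space.

```python
from urllib.parse import quote

API_VERSION = "7.2-preview.1"  # same version as in the PowerShell script


def queues_lookup_url(organization: str, project: str, queue_name: str) -> str:
    """URL of the GET call that looks up a project queue by name."""
    return (
        f"https://dev.azure.com/{quote(organization)}/{quote(project)}"
        f"/_apis/distributedtask/queues?queueNames={quote(queue_name)}"
        f"&api-version={API_VERSION}"
    )


def queue_delete_url(organization: str, project: str, queue_id: int) -> str:
    """URL of the DELETE call that removes the queue from the project."""
    return (
        f"https://dev.azure.com/{quote(organization)}/{quote(project)}"
        f"/_apis/distributedtask/queues/{queue_id}?api-version={API_VERSION}"
    )


def pick_queue_id(response: dict):
    """Return the queue ID when exactly one queue matched, else None.

    Mirrors the if/else in the PowerShell script.
    """
    if response.get("count") == 1:
        return response["value"][0]["id"]
    return None


if __name__ == "__main__":
    print(queues_lookup_url("your-organization-name", "your-project-name", "Azure Pipelines"))
```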

How to restore an agent pool queue to an Azure DevOps project using the Azure DevOps REST API

If you accidentally deleted an agent pool queue from an Azure DevOps project, you can use the Azure DevOps REST API to restore it. To do this, you will need the two endpoints that the script below calls: a GET on https://dev.azure.com/{organization}/_apis/distributedtask/pools to look up the agent pool by name, and a POST on https://dev.azure.com/{organization}/{project}/_apis/distributedtask/queues to create the queue.

In the script below, I first get the access token for Azure DevOps using the Get-AzAccessToken cmdlet. Then I use the access token to call the Azure DevOps REST API and get the agent pools in the organization, filtered by name to find the pool that the queue should point to. If the pool is present, I take its ID and call the POST endpoint to create a new agent pool queue in the project that is associated with that pool. Here is the complete script in PowerShell:

$organization = 'your-organization-name'
$project = 'your-project-name'
$poolName = 'Azure Pipelines'

# Get access token for Azure DevOps
$accessToken = Get-AzAccessToken -ResourceUrl '499b84ac-1321-427f-aa17-267ca6975798'

$headers = @{
    Authorization = "Bearer $($accessToken.Token)"
    Accept        = 'application/json'
    "Content-Type"  = 'application/json'
}

$getPoolParameters = @{
    Method  = 'Get'
    Uri     = "https://dev.azure.com/$organization/_apis/distributedtask/pools?poolName=$poolName&api-version=7.2-preview.1"
    Headers = $headers
}

$getPoolResponse = Invoke-RestMethod @getPoolParameters

if ($getPoolResponse.count -eq 1) {
    $poolId = $getPoolResponse.value[0].id

    $body = @{
        Name = $poolName
        Pool = @{
            Id = $poolId
        }
    }

    $addParameters = @{
        Method  = 'Post'
        Uri     = "https://dev.azure.com/$organization/$project/_apis/distributedtask/queues?api-version=7.2-preview.1"
        Headers = $headers
        Body    = $body | ConvertTo-Json
    }

    $response = Invoke-RestMethod @addParameters

    Write-Host 'Queue added'
} else {
    Write-Host 'Pool not found'
}

You can verify that the agent pool queue is restored to the project by using the GET endpoint again or by going to Project settings > Pipelines > Agent pools in the Azure DevOps portal.
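For completeness, this is roughly what the restore script sends in its POST body, sketched in Python. The key casing follows the hashtable in the PowerShell script; the function names are hypothetical and the code only builds the JSON, it does not call Azure DevOps.

```python
import json


def pick_pool_id(response: dict):
    """Return the pool ID when exactly one pool matched the name, else None."""
    if response.get("count") == 1:
        return response["value"][0]["id"]
    return None


def build_queue_body(pool_name: str, pool_id: int) -> str:
    """JSON body for the POST that creates the project queue.

    The queue gets the same name as the pool and points at the pool by ID.
    """
    return json.dumps({"Name": pool_name, "Pool": {"Id": pool_id}})


if __name__ == "__main__":
    # Made-up response, shaped like the pools lookup used in the script.
    fake_response = {"count": 1, "value": [{"id": 9, "name": "Azure Pipelines"}]}
    pool_id = pick_pool_id(fake_response)
    if pool_id is not None:
        print(build_queue_body("Azure Pipelines", pool_id))
```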

Conclusion

In this post, I showed you how to remove and restore an agent pool queue from an Azure DevOps project using manual and automated methods. I hope that this guide will help you to manage your agent pool queues more easily and efficiently and above all to ensure that your project meets your compliance requirements.

Finding Azure DevOps full scoped PATs

Personal Access Tokens (PATs) are a way of authenticating and accessing Azure DevOps resources without using a username and password. They can be scoped to limit the access level and duration of the token, and they can be revoked at any time. However, PATs also pose a security risk if they are not managed properly, especially if they have full scope permissions.

Full scope PATs grant unrestricted access to all Azure DevOps organizations, projects, and resources that the user has access to. This means that anyone who obtains a full scope PAT can potentially perform any action on behalf of the user, such as deleting repositories, modifying pipelines, accessing secrets, and so on. Full scope PATs should only be used for specific scenarios where no other scope is sufficient, and they should be treated with extreme caution.

Azure DevOps offers the possibility to authenticate processes using a Managed Identity or Service Principal. This reduces the need for running automation processes with a PAT token.

How to prevent these from being created?

Even though using Managed Identity or Service Principal is a recommended practice, there might be scenarios where using a PAT token is necessary or more convenient. In that case, it is important to restrict the scope and duration of the PAT token as much as possible, to reduce the risk of unauthorized access or misuse.

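One way to restrict PATs is by applying policies in Azure DevOps / Microsoft Entra ID. Policies can help enforce certain rules or standards for creating and using PATs, such as restricting the creation of global PATs, restricting the creation of full-scoped PATs, and enforcing a maximum PAT lifespan.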

To apply policies for Personal Access Tokens, your organization needs to be connected to Microsoft Entra ID, and the user doing the configuration needs to have been assigned the Azure DevOps Administrator role in Microsoft Entra ID. If that’s all in place, you can go to the organization settings and navigate to Microsoft Entra.

Keep in mind that these policies will only apply to newly created PATs, so any existing PATs will remain unaffected.

How to list all full scope PATs?

One way to list all the full scope PATs in your organization is to use a PowerShell script that can query the Azure DevOps API. The script will require an administrator to run it. Luckily we don’t need to have a PAT to authenticate and access the data, instead we can use an Access Token.

This script fetches all users and queries, for each user, whether they have any PATs. If they do, it checks the scope of each token, and if the scope matches it writes an entry to the results. So all the results will be users with a full scope PAT. If there is no value for targetAccounts it’s even worse: this means that the PAT is valid for all organizations that the user belongs to.

param(
    [String]
    $OrganizationName,

    [String]
    $TenantId,

    [String]
    $Scope = 'app_token'
)

$AccessToken = Get-AzAccessToken -ResourceUrl "499b84ac-1321-427f-aa17-267ca6975798" -TenantId $TenantId
$Headers = @{
    Accept = "application/json"
    Authorization = "Bearer $($accessToken.Token)"
}

function GetUsers
{
    $UsersUrl = "https://vssps.dev.azure.com/$OrganizationName/_apis/graph/users?subjectTypes=aad&api-version=7.2-preview.1"
    $UsersResult = Invoke-RestMethod -Method:Get -Uri $UsersUrl -Headers $Headers

    return $UsersResult.value
}

$Users = GetUsers
$FilteredUsers = $Users | Where-Object {$_.domain -eq $TenantId}

$Results = @()

foreach($User in $FilteredUsers)
{
    $PatUrl = "https://vssps.dev.azure.com/$OrganizationName/_apis/tokenadmin/personalaccesstokens/$($User.descriptor)?api-version=7.1-preview.1"
    $PatResult = Invoke-RestMethod -Method:Get -Uri $PatUrl -Headers $Headers

    if ($null -eq $PatResult.value -or $PatResult.value.Length -le 0)
    {
        Write-Verbose "No PAT found for $($User.displayName)"
    }
    else
    {
        $PatResult.value | Where-Object {$_.scope -eq $Scope} | ForEach-Object {
            Write-Verbose "PAT found for $($User.displayName) with scope $($_.scope)"

            $Results += [PSCustomObject] @{
                User = $User.displayName
                Scope = $_.scope
                TargetAccounts = $_.targetAccounts
                ExpirationDate = $_.validTo
            }
        }
    }
}

$Results | Format-Table

The script will generate an output like this:

User  Scope     TargetAccounts ExpirationDate
----  -----     -------------- --------------
alice app_token {some guid}    1/14/2024 12:00:00 AM
alice app_token                1/14/2024 12:00:00 AM
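The filtering step is the heart of the script, so here is a small sketch of just that logic in Python. It assumes the token objects carry the scope, targetAccounts and validTo fields that the PowerShell script reads; the function name, the extra AllOrganizations flag and the sample data are made up for the example.

```python
def find_full_scope_pats(user_name: str, tokens: list, scope: str = "app_token") -> list:
    """Return one result row per token whose scope matches `scope`.

    A token without targetAccounts is flagged, because it is valid for all
    organizations the user belongs to.
    """
    results = []
    for token in tokens:
        if token.get("scope") != scope:
            continue
        target_accounts = token.get("targetAccounts") or []
        results.append({
            "User": user_name,
            "Scope": token["scope"],
            "TargetAccounts": target_accounts,
            "ExpirationDate": token.get("validTo"),
            "AllOrganizations": not target_accounts,
        })
    return results


if __name__ == "__main__":
    # Hypothetical token list for one user.
    sample_tokens = [
        {"scope": "app_token", "targetAccounts": ["some-guid"], "validTo": "2024-01-14T00:00:00Z"},
        {"scope": "app_token", "targetAccounts": [], "validTo": "2024-01-14T00:00:00Z"},
        {"scope": "vso.code", "targetAccounts": ["some-guid"], "validTo": "2024-01-14T00:00:00Z"},
    ]
    for row in find_full_scope_pats("alice", sample_tokens):
        print(row)
```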

Conclusion

In this document, I have shown you how to list all the full scope PATs in your organization using a PowerShell script and the Azure DevOps API. This can help you identify and revoke any unnecessary or risky tokens that could compromise your security. I have also taught you how to limit the scope of a PAT, and why this is crucial to protect your data and resources. I recommend that you review your PATs regularly and follow the best practices for creating and managing them. By doing so, you can enhance the security and compliance of your Azure DevOps environment.

Service Health alert at scale

When we run our solution in Azure, there will always come a time when there is an incident or maintenance event. As we are responsible for hosting our solutions, we should get updates on these events so that we can take the right actions. One way that Azure provides this kind of notification is through Service Health. I quite regularly run into workloads that haven’t set this up yet and then experience challenges because they miss the events that have been published. This can of course be configured manually, but that has to be done per subscription or resource group and is thus easy to miss.

How to deploy this at scale?

The obvious answer is typically Azure Policy. Policies stay around, and as new subscriptions get added to the environment they will configure Service Health alerts there as well. Luckily the Azure Community Policy repository already contains the policy Deploy Service Health Alerts and corresponding Action Group to notify of Service Health Incidents. When we enable this policy it creates an Action Group per subscription and sets up Service Health alerts attached to it.

How to set unique receivers per subscription?

One of the challenges with the current policy is that when you assign it at management group level it configures the same e-mail receivers for all subscriptions. In many organizations, different subscriptions have different technical owners or engineers who are responsible for them. Many of these organizations already record this by setting a tag on each subscription. So how can we leverage this? If we look at the current policy, it has a parameter emailAddress; within the deployment part of the policy definition it is used like this:

"emailAddress": {
    "value": "[parameters('emailAddress')]"
}

As we have set a tag TechnicalContact at the subscription level we could leverage this by altering this block to this:

"emailAddress": {
    "value": "[subscription().tags['TechnicalContact']]"
}

Once this policy gets remediated the service health alerts will be sent to the technical contact.
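If you maintain the policy definition as a JSON file, the change above can also be applied with a few lines of script. The sketch below walks a policy definition and swaps the parameter reference for the tag lookup wherever the emailAddress deployment parameter is passed in. It is only a sketch under assumptions: the helper names are mine, and it leaves the emailAddress parameter declaration itself untouched.

```python
import copy

OLD_VALUE = "[parameters('emailAddress')]"
NEW_VALUE = "[subscription().tags['TechnicalContact']]"


def _rewrite(node) -> None:
    """Recursively replace the emailAddress value in place."""
    if isinstance(node, dict):
        for key, value in node.items():
            if (key == "emailAddress" and isinstance(value, dict)
                    and value.get("value") == OLD_VALUE):
                value["value"] = NEW_VALUE
            else:
                _rewrite(value)
    elif isinstance(node, list):
        for item in node:
            _rewrite(item)


def use_tag_for_email(policy_definition: dict) -> dict:
    """Return a copy of the definition that reads the receiver from the tag."""
    updated = copy.deepcopy(policy_definition)
    _rewrite(updated)
    return updated


if __name__ == "__main__":
    # Heavily trimmed, hypothetical fragment of a policy definition.
    fragment = {"deployment": {"properties": {"parameters": {
        "emailAddress": {"value": OLD_VALUE}}}}}
    print(use_tag_for_email(fragment))
```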

Conclusion

Setting up Service Health alerts is important to ensure workload teams don’t miss out on important service issues, maintenance events and so on. Missing these could result in very little time to plan for these events, or in time spent troubleshooting an issue. There are plenty of reasons to make sure this gets configured at scale, which can easily be done with an Azure Policy as shown in this post.

Azure Policy the DevOps Way

Azure Policy is a great way to enforce or assess whether the configuration of Azure services within an Azure environment is compliant with the standards set by the organization. Policies can be utilized in many ways. There are the built-in policies that are available on the platform and can just be assigned, but these will not cover each and every scenario, so you often have to create your own custom policies as well.

What happens a lot is that an engineer creates a policy, tests it, fine-tunes it and then is done. Commonly this is first done in isolation, at the level of a single subscription or resource group. When the engineer is happy, the policy gets applied at a Management Group level for testing purposes, which usually results in discovering some bugs. After fixing those it gets applied to the production management group as well.

Most of the time there is some basic Infrastructure as Code and pieces of automation surrounding this. The article Design Azure Policy as Code workflows describes a nice process that covers the creation, testing and deployment of an Azure Policy. In this post I want to describe how such a process could look in practice by leveraging GitHub Actions for the automation. For the example we will create an Azure Policy that audits whether there are undesired role assignments of type Owner on a subscription or resource group that has a certain tag.

Creating a policy repo

Obviously I first need a GitHub repository; I usually start with a local git repo.

Creating a GitHub Actions workflow

When we create a GitHub Actions workflow we of course want to trigger it on a push into the main branch. And then we want to deploy the policies and apply the test assignments.

name: policy-deployment

on:
  push:
    branches:
      - main

jobs:
  apply-azure-policy:
    runs-on: ubuntu-latest
    name: Deploy policy definitions + test assignments
    steps:
    - name: Checkout
      uses: actions/checkout@v2
    - name: Login to Azure
      uses: azure/login@v1
      with:
        creds: $
        allow-no-subscriptions: true
    - name: Create or update Azure Policies
      uses: azure/manage-azure-policy@v0
      with:
        paths: |
          policies/rbac-owner/**
        assignments: |
          assign.test.*.json

The above workflow contains a single job that applies the policy definition and all the assignments that are in files following the assign.test.*.json pattern. So a file named assign.test.rbac-owner-disallowed.json will be applied, but a file named assign.prd.rbac-owner.json won’t be, as it doesn’t match the pattern.
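To get a feel for which assignment files a pattern selects, here is a quick sketch using Python's fnmatch module. This is a stand-in: I am assuming the action's matching behaves like a plain shell-style wildcard for these simple names, and the file names in the list are only examples.

```python
from fnmatch import fnmatchcase


def select_assignments(file_names: list, pattern: str) -> list:
    """Return the file names that match the shell-style pattern (case-sensitive)."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Example file names, following the naming convention from the post.
    files = [
        "assign.test.rbac-owner-disallowed.json",
        "assign.prd.rbac-owner.json",
        "policy.json",
    ]
    print("test:", select_assignments(files, "assign.test.*.json"))
    print("prd: ", select_assignments(files, "assign.prd.*.json"))
```

Keeping the environment in the file name means the same repository layout serves both the test and the production job; only the pattern differs.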

Testing the policy

Most of the DevOps teams that I have met so far are testing these policies manually and test them for the change they want to make. Over the last years I’ve seen many of them experience regressions, where a change in the policy had a bunch of unintended side effects.

If we take these previous thoughts into consideration a DevOps process would facilitate some way of automated testing. For a policy that would mean a few things;

For all of these we would have to follow a similar process:

For the rest of the example I will use an Audit policy. For the other policy types some of the test steps will be slightly different.

Deploying test resources

Deploying the test resources for me is usually a dedicated stage. This stage deploys a Bicep, ARM, Terraform template or any other mechanism that you like for creating these resources.

  deploy-test-resources:
    runs-on: ubuntu-latest
    name: Deploy test resources
    needs: apply-azure-policy
    steps:
    - name: Checkout
      uses: actions/checkout@v2
    - name: Login to Azure
      uses: azure/login@v1
      with:
        creds: $
    - name: Deploy
      uses: azure/arm-deploy@v1
      with:
        scope: subscription
        subscriptionId: $
        template: ./tests/deployment/rbac-owner.bicep
        region: westeurope
        failOnStdErr: false

Testing your policies

Now that we have test resources deployed and our policy assigned, we can take a look at testing our policy. Typically you would run a policy scan manually and then look at the output, but here we trigger the scan from the workflow using the Azure/policy-compliance-scan action in GitHub. This provides us with a CSV reporting the compliance of our resources, which is great input for running automated test cases against.

I could pick any of a number of testing frameworks out there; in this case I leverage Pester, so the tests are written in PowerShell. My reasoning is that most of the time policies are written and maintained by engineers who are already familiar with PowerShell.

Describe "Check disallowed resource group owners" {
    BeforeAll {
        $results = Import-Csv policy-compliance.csv
        $result = $results | Where-Object { $_.POLICY_DEF_ID.endswith('audit-rbac-owner') -and $_.POLICY_ASSG_ID.Contains('/resourcegroups/rg-disallowed-owner') }
    }

    It "Should have one result" {
        $result | Should -Not -BeNullOrEmpty
        $result.Count | Should -Be 1
    }

    It "Should be resource group rg-disallowed-owner" {
        $result.RESOURCE_ID.Contains('rg-disallowed-owners/providers/microsoft.authorization/') | Should -Be $true
    }

    It "Should be non compliant" {
        $result.COMPLIANCE_STATE | Should -Be 'NonCompliant'
    }
}

In the example above we run a validation on a single resource group in the describe. In the BeforeAll section we grab the results for this particular resource group, as we only want to validate that single resource group.

Then the different cases will be validating that our policy behaved as expected:

  1. We should only have a single entry for our policy for the single resource group. So we validate that the count of the results is 1.
  2. Validate that the resource id of the result is actually the resource that we expect.
  3. As the deployed resource should be non-compliant we validate the compliance state.
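The three checks above are not tied to Pester. As an illustration, the same filtering and assertions can be sketched in Python against the CSV. The column names are the ones used in the Pester test; the two CSV rows are made-up sample data, not real scan output.

```python
import csv
import io

# Hypothetical sample of a compliance report, using the columns from the Pester test.
SAMPLE_CSV = (
    "RESOURCE_ID,POLICY_ASSG_ID,POLICY_DEF_ID,COMPLIANCE_STATE\n"
    "/subscriptions/0000/resourcegroups/rg-disallowed-owners/providers/microsoft.authorization/roleassignments/1111,"
    "/subscriptions/0000/resourcegroups/rg-disallowed-owners/providers/microsoft.authorization/policyassignments/audit-rbac-owner,"
    "/subscriptions/0000/providers/microsoft.authorization/policydefinitions/audit-rbac-owner,"
    "NonCompliant\n"
    "/subscriptions/0000/resourcegroups/rg-other/providers/microsoft.storage/storageaccounts/demo,"
    "/subscriptions/0000/providers/microsoft.authorization/policyassignments/other,"
    "/subscriptions/0000/providers/microsoft.authorization/policydefinitions/other,"
    "Compliant\n"
)


def matching_rows(csv_text: str) -> list:
    """Rows for our policy definition, assigned on the test resource group."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in rows
        if row["POLICY_DEF_ID"].endswith("audit-rbac-owner")
        and "/resourcegroups/rg-disallowed-owner" in row["POLICY_ASSG_ID"]
    ]


def check_report(csv_text: str) -> None:
    """Run the same three checks as the Pester test; raises AssertionError on failure."""
    result = matching_rows(csv_text)
    assert len(result) == 1, "expected exactly one result"
    assert "rg-disallowed-owners/providers/microsoft.authorization/" in result[0]["RESOURCE_ID"]
    assert result[0]["COMPLIANCE_STATE"] == "NonCompliant"


if __name__ == "__main__":
    check_report(SAMPLE_CSV)
    print("all checks passed")
```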

Automating the tests

Of course we can run all these steps using GitHub Actions.

  test-policy:
    runs-on: ubuntu-latest
    name: Test policies on resources
    needs:
      - apply-azure-policy
      - deploy-test-resources
    steps:
    - name: Login to Azure
      uses: azure/login@v1
      with:
        creds: $
    - name: Azure Policy Compliance Scan
      uses: Azure/policy-compliance-scan@v0
      continue-on-error: true
      with:
        scopes: |
          /subscriptions/$/resourceGroups/rg-disallowed-owners
        policy-assignments-ignore: |
          /subscriptions/$/providers/microsoft.authorization/policyassignments/*
        wait: true
        report-name: policy-compliance
    - name: Checkout
      uses: actions/checkout@v2
    - name: Download policy compliance artifact
      uses: actions/download-artifact@v2
      with:
        name: policy-compliance.csv
    - name: Test policy compliance results file
      shell: pwsh
      run: Test-Path policy-compliance.csv | Should -Be $true
    - name: Analyze policy compliance results
      shell: pwsh
      run: |
        Invoke-Pester ./tests/PolicyCompliance.Tests.ps1 -Passthru

Promoting the policy

Once all the tests have completed successfully, our policy should be considered safe to be promoted to production. As we already have the definition there, it can be just a matter of assigning it to the next resources. In general I prefer to do this at the Management Group level.

For assigning them I leverage another GitHub Actions Job. This job will just look at the assignments that are described in assign.prd.*.json. An example of such an assignment would be:

{
    "sku": {
        "name": "A0",
        "tier": "Free"
    },
    "properties": {
        "displayName": "No unknown RBAC owners allowed on subscription and resource groups",
        "policyDefinitionId": "/subscriptions/[subscriptionid]/providers/Microsoft.Authorization/policyDefinitions/audit-rbac-owner",
        "scope": "/subscriptions/[subscriptionid]",
        "notScopes": [],
        "parameters": {},
        "metadata": {
            "assignedBy": "Remco Eissing"
        },
        "enforcementMode": "Default"
    },
    "id": "/subscriptions/[subscriptionid]/resourceGroups/rg-disallowed-owners/providers/Microsoft.Authorization/policyAssignments/audit-rbac-owner",
    "type": "Microsoft.Authorization/policyAssignments",
    "name": "audit-rbac-owner",
    "location": "westeurope"
}

The deployment job looks like the one below. After this job has run, the policies are assigned at the production scopes.

  assign-azure-policy-production:
    runs-on: ubuntu-latest
    name: Assign policy to production
    needs: test-policy
    steps:
    - name: Checkout
      uses: actions/checkout@v2
    - name: Login to Azure
      uses: azure/login@v1
      with:
        creds: $
        allow-no-subscriptions: true
    - name: Create or update Azure Policies
      uses: azure/manage-azure-policy@v0
      with:
        paths: |
          policies/**/**
        assignments: |
          assign.prd.*.json

Cleaning up

We’ve now seen how we can automatically test a policy and promote it to production. We of course want to be cost effective here as well; so we should clean up all our testing resources in an automated fashion. Using GitHub Actions this can be a simple extra job, that looks something like this:

  remove-test-resources:
    runs-on: ubuntu-latest
    name: Remove test resources
    needs: test-policy
    steps:
    - name: Checkout
      uses: actions/checkout@v2
    - name: Login to Azure
      uses: azure/login@v1
      with:
        creds: $
    - name: Remove
      uses: azure/CLI@v1
      with:
        azcliversion: 2.0.72
        inlineScript: |
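          # Let the CLI return the ids directly instead of parsing the JSON output with grep and awk
          resources="$(az resource list --resource-group rg-policy-test --query '[].id' --output tsv)"
          for id in $resources; do
              # A full resource id already identifies the resource, so --resource-group is not needed
              az resource delete --ids "$id" --verbose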
          done

Conclusion

There is a lot we can learn from regular development processes when working with Azure Policies, and there is plenty of room for improvement on what is shown here. This post is just meant as a starting point and some inspiration.

Remove unknown objects from Azure Role Based Access Control

When working within your Azure environment you leverage Role Based Access Control, either on your Management Group, subscription, resource group or even on a resource level. For a variety of reasons you will eventually see an identity with the name Identity not found. You now have a case where permissions are assigned to an identity that got removed; this typically happens when users are removed from Azure Active Directory or, for example, when a Managed Identity is removed.

Why should I care?

Because it’s just so much nicer to work in an environment that’s clean; you are simply less distracted. Role Based Access Control also has its limits, just like many other things in Azure, such as the maximum number of role assignments per subscription.

How can I find these identities?

You could of course look through all your subscriptions etc. within the Azure Portal to identify any assignments that are unknown and clean them up. But an easier job would be to leverage automation to identify these objects. My personal favorite for these tasks is leveraging the Az PowerShell cmdlets, and in this case Get-AzRoleAssignment specifically.

When we look at the documentation for this cmdlet it provides us with a nice clue on what we can leverage:

Please notice that this cmdlet will mark ObjectType as Unknown in output if the object of role assignment is not found or current account has insufficient privileges to get object type.

So with this new bit of knowledge we can now write a script to get all our unknown objects, which would typically look like:

Get-AzRoleAssignment | Where-Object {$_.ObjectType -eq 'Unknown'}

This will return all the assignments with an ObjectType of Unknown throughout the selected Azure subscription, including the assignments on resource groups and individual resources.

How can I remove them?

That’s the easy part, where Remove-AzRoleAssignment comes into play. As usual when working with PowerShell objects, we can just pipe the output of the get command to our remove command. You probably want to do this in a loop and write some logging information, so that you can read back what you actually removed and when.
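The loop-and-log idea can be sketched without touching Azure at all. The Python sketch below only illustrates the logic: the sample data is made up, the `remove` callback is a hypothetical stand-in for the actual removal call, and the field names simply mirror the properties I would expect on a role assignment object.

```python
import logging
from typing import Callable, Iterable, List

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("rbac-clean")


def remove_unknown(assignments: Iterable[dict], remove: Callable[[dict], None]) -> List[dict]:
    """Remove assignments whose ObjectType is 'Unknown' and log every removal."""
    removed = []
    for assignment in assignments:
        if assignment.get("ObjectType") != "Unknown":
            continue  # the identity still resolves, so leave the assignment alone
        remove(assignment)  # hypothetical stand-in for the actual removal call
        log.info(
            "Removed %s for object %s on scope %s",
            assignment["RoleDefinitionName"],
            assignment["ObjectId"],
            assignment["Scope"],
        )
        removed.append(assignment)
    return removed


# Made-up sample data, only for illustration.
sample = [
    {"ObjectId": "1111", "ObjectType": "User", "RoleDefinitionName": "Reader",
     "Scope": "/subscriptions/example"},
    {"ObjectId": "2222", "ObjectType": "Unknown", "RoleDefinitionName": "Owner",
     "Scope": "/subscriptions/example/resourceGroups/rg-demo"},
]

deleted = []  # records what the stand-in "removed"
removed = remove_unknown(sample, deleted.append)
print(f"{len(removed)} assignment(s) removed")
```

The timestamp in the log format is what lets you answer the "what did we remove and when" question afterwards.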

How can we make this scale?

My personal favorite is to do this in a serverless way, so in my case I would leverage Azure Functions for executing this. It offers high scalability and reliability, as well as many standard building blocks that I like to utilize. What our Function App needs:

To run this the Function App needs to have a Managed Identity. The Managed Identity needs to have User Access Administrator rights on the subscriptions that it needs to clean.

I’ve opted to leverage PowerShell Durable Functions. I’ve opted for PowerShell as this is the language I see most people use for these kinds of tasks, which enables them to maintain it as well. For the durable functions I chose a fan out / fan in pattern.

It starts with the trigger and orchestrator. The orchestrator is rather simple: it invokes the GetSubscriptions activity, invokes the RemoveUnknownAssignments activity for each subscription, and waits until all of them are completed.

GetSubscriptions connects to the Azure environment and will return all the subscriptions that the Managed Identity has access to.

RemoveUnknownAssignments selects the Azure subscription, fetches the unknown assignments and then removes them. This function also takes into account an Application Setting whatif; if this has a value of 1 then it will not process the actual removal but run in a log only mode.
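To make the fan out / fan in flow and the whatif switch a bit more tangible, here is a small Python sketch of the pattern. It is not the Durable Functions code from the repository: threads stand in for the activity invocations, the subscription names and counts are made up, and the function names merely mirror the activities described above.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Made-up data: number of unknown assignments per subscription.
FAKE_UNKNOWN: Dict[str, int] = {"sub-a": 2, "sub-b": 0, "sub-c": 5}


def get_subscriptions() -> List[str]:
    """Stand-in for the GetSubscriptions activity."""
    return list(FAKE_UNKNOWN)


def remove_unknown_assignments(subscription: str, what_if: bool) -> dict:
    """Stand-in for the RemoveUnknownAssignments activity."""
    found = FAKE_UNKNOWN[subscription]
    removed = 0 if what_if else found  # log-only mode removes nothing
    return {"subscription": subscription, "found": found, "removed": removed}


def orchestrate(what_if: bool) -> List[dict]:
    subscriptions = get_subscriptions()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Fan out: start one task per subscription.
        futures = [pool.submit(remove_unknown_assignments, s, what_if) for s in subscriptions]
        # Fan in: block until every task has finished and collect the results.
        return [future.result() for future in futures]


# Application settings surface as environment variables; "1" means log only.
what_if = os.environ.get("whatif") == "1"
for result in orchestrate(what_if):
    print(result)
```

Running with whatif set to 1 first gives you a report of what would be removed, which is a sensible way to start before letting it clean up for real.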

Configuring the Managed Identity with Graph permissions

I configure the Function App with a System Assigned Managed Identity so that this can be leveraged for authentication against other Azure services. By default the Managed Identity doesn’t have any permissions to interact with the Microsoft Graph API, the API used for managing Azure Active Directory, so we need to provide those.

# Connecting to the graph and requesting the scope to assign app roles.
Connect-MgGraph -Scopes Directory.ReadWrite.All, AppRoleAssignment.ReadWrite.All

# Get the Managed Identity from AAD
$MSI = Get-MgServicePrincipal -Filter "DisplayName eq 'rbaccleaner'"

# Get Microsoft Graph application
$graphApp = Get-MgServicePrincipal -Filter "appId eq '00000003-0000-0000-c000-000000000000'"

# Get graph permission
$permission = $graphApp.AppRoles | Where-Object { $_.Value -eq "Directory.Read.All" }

# Assign the permission to the Managed Identity
New-MgServicePrincipalAppRoleAssignment -AppRoleId $permission.Id -PrincipalId $MSI.Id -ServicePrincipalId $MSI.Id -ResourceId $graphApp.Id

In my case I assigned Directory.Read.All and Directory.User.Read. Keep in mind that if the right permissions are not set, all the role assignments will be marked as unknown and thus removed.

Repository

An example of the code can be found here: https://github.com/remcoeissing/azure-rbac-clean-ps

It currently doesn’t deploy any infrastructure components.

What’s next

In a future post I will dive into the possibilities that we have for doing this in an event-driven fashion. That will feature a more direct approach where we don’t rely on scheduling anymore, but it builds on the same basis.


App Service Plan cost optimization

App Services in Azure make it easy to quickly spin up an environment where you can host a webapp or api, without having to manage the webserver yourself.

When running solutions in the cloud the costs become very visible. This allows us to spend time optimizing the costs. One of the easiest ways to reduce costs is by reducing the amount of waste. In the cloud we can define waste as resources that we pay for but are not utilizing anymore.

The case of the empty App Service Plans

If we apply this principle to App Services then one might think we can just remove the App Service when we are done with it. But many times I found Azure environments where the App Service was removed but the App Service Plan stayed around. The actual resource that you are getting billed for is the App Service Plan, so not removing that will mean you still incur charges.

Luckily we can easily detect these kinds of App Service Plans. My favorite method is leveraging an Azure Resource Graph query. Using this we can fetch all the App Service Plans that are hosting 0 sites. The query below fetches all of these.

resources
| where type =~ 'Microsoft.Web/serverFarms'
| where properties.numberOfSites == 0
| project ResourceId = id, ResourceGroup = resourceGroup, Sku = tostring(sku.size), Instances = tostring(properties.currentNumberOfWorkers)

When we look at the returned data you will most likely see some App Service Plans with a SKU of F1. This refers to the Free tier and does not incur any charges. For other tiers, for example P1v2 or S1, you do have to pay. The actual amount also depends on the number of instances. If we stick with a P1v2 and look up the instance price on the App Service Pricing page, this is currently 130.954 euros a month per instance. So when we run 2 instances of this without any sites on them, we are spending slightly over 260 euros a month that we could have spent on something more useful.
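The arithmetic is simple enough to script once you have the query results. The Python sketch below is just an illustration: the rows are made up in the shape projected by the query above, and the price table only contains the two values mentioned in this post, so any other SKU would have to be added from the pricing page.

```python
from decimal import Decimal
from typing import Iterable

# Monthly price per instance in euros. F1 is free; the P1v2 price is the one
# quoted in this post. Other SKUs are deliberately left out.
PRICE_PER_INSTANCE = {"F1": Decimal("0"), "P1v2": Decimal("130.954")}


def monthly_waste(empty_plans: Iterable[dict]) -> Decimal:
    """Sum price * instances over plans that host no sites.

    An unknown SKU raises KeyError, so a gap in the price table is never
    silently counted as zero.
    """
    return sum(
        (PRICE_PER_INSTANCE[plan["Sku"]] * int(plan["Instances"]) for plan in empty_plans),
        Decimal("0"),
    )


# Made-up rows; Instances is a string because the query projects it with tostring().
rows = [
    {"ResourceId": "plan-free", "Sku": "F1", "Instances": "1"},
    {"ResourceId": "plan-premium", "Sku": "P1v2", "Instances": "2"},
]
print(monthly_waste(rows))  # prints 261.908
```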

The case of the cheaper App Service Plan tier

After eliminating waste on the App Service Plans that we use throughout the Azure environment, we can also try to optimize the spend on the App Service Plans that we are actually utilizing. The Premium tier comes in 3 different versions. Newer versions run on newer hardware generations and have a different pricing model. For example, a P2v2 plan gives us 2 cores and 7 GB of RAM for around 262 euros a month. If we switch to a P1v3 plan we get 2 cores and 8 GB of RAM for 222 euros. The Premium v3 tier also brings the possibility of reserving capacity; with a 3-year reservation the price goes down to 133 euros.
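To put those numbers next to each other, a quick Python sketch of the comparison. It only uses the monthly prices quoted above, which will differ per region and over time.

```python
# Monthly prices in euros as quoted in this post.
PRICES = {"P2v2": 262, "P1v3": 222, "P1v3 (3-year reservation)": 133}


def saving(current: str, target: str) -> tuple:
    """Return (euros saved per month, percentage saved) when moving from current to target."""
    diff = PRICES[current] - PRICES[target]
    return diff, round(100 * diff / PRICES[current], 1)


for target in ("P1v3", "P1v3 (3-year reservation)"):
    euros, percent = saving("P2v2", target)
    print(f"P2v2 -> {target}: {euros} euros per month ({percent}%)")
```

So the switch to P1v3 alone saves 40 euros per month per instance, roughly 15%, and with the 3-year reservation the saving grows to 129 euros, close to half of the original price.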

Changing from a Premium v2 tier to a Premium v3 tier depends on the availability. The steps for this migration can be found here: App Service Configure Premium tier

We can query these different types of App Service Plans using an Azure Resource Graph query and provide recommendations on which size to switch to.

resources
| where type =~ 'Microsoft.Web/serverFarms'
| where sku.size startswith 'P' and sku.size endswith 'v2'
| project Name = id, ResourceGroup = resourceGroup, Sku = tostring(sku.size), Instances = tostring(properties.currentNumberOfWorkers), RecommendedSku = iff(sku.size == 'P1v2', 'P1v3', iff(sku.size == 'P2v2', 'P1v3', 'P2v3'))

Monitoring for cost optimizations

Keeping your environment cost effective and optimizing its cost is an ongoing effort. One way to do this is by leveraging a workbook in Azure Monitor that tracks the empty plans and the plans where a different size would be more cost effective. In a perfect world the workbook would not display anything, showing us that we’ve optimized the entire estate.

Example of App Service Cost Optimization workbook

If you want to try out this workbook you can find it at App Service Cost Optimization GitHub Repo
