If you’re devopsing around (a variant of horsing around) several Azure subscriptions daily like I do, chances are you have a time slot in your calendar dedicated to housekeeping duties. At Compositional IT we have about 8 devs (including me) capable of deploying their work to our subscriptions, so resources can pop up and sometimes not get disposed of once they’ve outlived their purpose. Stray resources incur unnecessary costs, might present an increased attack surface, and - if you stare into the Azure Portal as much as I do – are also a form of visual smog. If, in addition to that, your attention to detail is that of a caffeinated squirrel, you might appreciate a way to run scripted queries that return a list of only what you want to see, rather than having to scan through a list of everything there is with your eyes and hope you haven't missed something. The idea behind this is that the query logic will be applied consistently, whereas my attention will not.
My solution to this problem space is to:
- Set up all subscriptions to emit Activity Logs to a Log Analytics Workspace,
- Develop log queries to extract the data I want,
- Use a PowerShell script to run those queries against the workspace anytime I want.
Part 1: Setting up your subscription(s)
Before we can query any logs, we need to start collecting said logs. To that end (and other ends) I happen to have a ‘maintenance’ resource group with a several maintenance resources. One of them is a maintenance Log Analytics Workspace. This workspace ingests Activity Logs from all the subscriptions I rake.
You can set this up in the Azure Portal: Find your subscription and on the Activity Log blade hit Export Activity Logs. Press +Add diagnostic setting, tick only the Administrative category and for Destination, tick Send to Log Analytics workspace and select the target workspace. Don’t forget to hit Save.
Part 2: The KQL queries
- I want to identify resources that were created and not deleted since the last time I ran this check.
The first query ($newResourcesRawQuery) looks at the last 7 days (because I run it once a week) and prints out all ResourceIds that were written to in that period and still exist (were not deleted after their last write event). I am not interested in ResourceIds that the devs deleted after they were done with them.
// How far back to look for recent activity
let lookback = (7d+1h);
// generic pre-filter to narrow down the data set
let prefiltered =
AzureActivity
| where TimeGenerated > ago(lookback)
and CategoryValue == 'Administrative'
and ActivityStatusValue == 'Success'
and OperationNameValue !startswith 'microsoft.storage/storageaccounts/blobservices/'
and OperationNameValue !startswith 'microsoft.storage/storageaccounts/fileservices/'
and OperationNameValue !startswith 'microsoft.resources/deployments'
and OperationNameValue !startswith 'microsoft.support/supporttickets';
// pick all creation events, summarise aka 'distinct' to keep only one row per _ResourceId (the one with the latest TimeGenerated)
let all_writes =
prefiltered
| where OperationNameValue endswith '/write'
| summarize max(TimeGenerated) by _ResourceId, OperationNameValue;
// pick all deletion events, summarise aka 'distinct' to keep only one row per _ResourceId (the one with the latest TimeGenerated)
let all_deletes =
prefiltered
| where OperationNameValue endswith '/delete'
| summarize max(TimeGenerated) by _ResourceId, OperationNameValue;
// join the two tables (creations and deletions) and filter down to creation events that either do not have a corresponding deletion event or have one that's not newer than the creation event
let existing_writes =
all_writes
| join kind=leftouter all_deletes on _ResourceId
| where isempty(max_TimeGenerated1) or (max_TimeGenerated > max_TimeGenerated1);
existing_writes
| project max_TimeGenerated, _ResourceId
| sort by max_TimeGenerated desc
- I also want to indentify existing resources that have not had any activity on them in the last 30 days. We need to bear in mind that this query will only be of any use once we have been collecting the Activity Logs for a while (over a month in my case).
The second query ($oldResourcesRawQuery) looks at the full log history and prints out ResourceIds where the last non-delete activity was more than 30 days ago.
// min days since last activity on resource
let ageThreshold = (30d);
// generic pre-filter to narrow down the data set
let prefilter =
AzureActivity
| where CategoryValue == 'Administrative'
// filters out deployment-related events
and ResourceProviderValue != 'MICROSOFT.RESOURCES'
// filters out subscription-related events
and OperationNameValue != 'MICROSOFT.RESOURCES/SUBSCRIPTIONS/ACTION';
let non_deletes =
prefilter
| where OperationNameValue !endswith '/delete'
| summarize max(TimeGenerated) by _ResourceId, OperationNameValue;
let deletes =
prefilter
| where OperationNameValue endswith '/delete'
| summarize max(TimeGenerated) by _ResourceId, OperationNameValue;
// join the two tables (non_deletes and deletes) and filter down to non-delete events that either do not have a corresponding deletion event or have one that's not newer than the latest non-delete event for the same resourceId
let existing_actives =
non_deletes
| join kind=leftouter deletes on _ResourceId
| where isempty(max_TimeGenerated1) or (max_TimeGenerated > max_TimeGenerated1);
// filter down to resources older than the defined threshold
let inactives =
existing_actives
| where max_TimeGenerated < ago(ageThreshold);
inactives
| project max_TimeGenerated, _ResourceId
| sort by _ResourceId asc
Part 3: The Shell of Power
The PowerShell script defines:
- The two queries described above as strings (mostly ommited in the sample below, apart from the first few lines updated for parametrisation),
- A function Get-FormattedQuery that does something nasty to the pretty-printed query strings before they can be passed to the AzureCLI command without throwing an error,
- hard-coded WorkspaceId to run the queries against.
It then executes the queries sequentially and dumps the output into the console, formatted as a table.
$lookback=$args[0]
$ageThreshold=$args[1]
$newResourcesRawQuery = "
// How far back to look for recent activity
let lookback = ($($lookback)d+1h);
...
"
$oldResourcesRawQuery = "
// min days since last activity on resource
let ageThreshold = ($($ageThreshold)d);
...
"
# the whole query string with newlines and // comments doesn't work with the azure-cli command, that's why it needs to be mangled by this function
function Get-FormattedQuery {
param ($rawQuery)
$lines = ($rawQuery -split '\r?\n')
$linesFiltered =
foreach ($line in $lines) {
$line = $line.Trim()
if ($line.startswith('//')) {''} else {$line}
}
$linesFiltered -join ''
}
$newResourcesFormattedQuery = Get-FormattedQuery($newResourcesRawQuery)
$oldResourcesFormattedQuery = Get-FormattedQuery($oldResourcesRawQuery)
$citMaintenanceWorkspaceId = 'ebfa7054-4ca7-431f-ae63-045ab0b8f2e4'
Write-Host -ForegroundColor DarkYellow "RESOURCES WITH ACTIVITY IN THE LAST $lookback DAYS:"
az monitor log-analytics query -w $citMaintenanceWorkspaceId -o table --analytics-query $newResourcesFormattedQuery
Write-Host -ForegroundColor DarkYellow "RESOURCES WITH NO ACTIVITY IN THE LAST $ageThreshold DAYS:"
az monitor log-analytics query -w $citMaintenanceWorkspaceId -o table --analytics-query $oldResourcesFormattedQuery
I typically run this once a week from commandline:
.\HouseKeeper.ps1 7 30
Time might show more balast to filter out, or other ways to fine-tune the queries. For now, though, this script that takes a second to run narrows down the set of resources I need to be concerned with for the housekeeping task and cuts down on the time spent on this task.
Let us know what you think and what you'd do differently.
- Proof the queries better - are mine leaky?
- Turn the queries into periodic Azure Alerts (if even possible)?