KeMeT Tech

Migrating Azure Sentinel off MMA: a field guide for production environments

May 22, 2026
Tags: azure sentinel, siem, log analytics, ama, migration

Microsoft retired the Log Analytics agent (MMA) in August 2024, and Azure Monitor Agent (AMA) is the only supported agent going forward. Custom connectors that POST to the HTTP Data Collector API move to the DCR-based Logs Ingestion API as part of the same migration. If you have custom Sentinel connectors still bound to the HTTP Data Collector API or the legacy MMA pipeline, you have probably already started seeing ingestion warnings and double-billed records.

This is the playbook we used on a recent fourteen-connector migration. Zero ingestion gaps during cutover, full audit trail, operating cost down ~52%.

What actually changes under the hood

The Data Collector API let you POST anything to a Log Analytics workspace and it would magic up a custom table. AMA and the Logs Ingestion API require you to define the table shape in a Data Collection Rule (DCR) and route ingestion through a Data Collection Endpoint (DCE); the DCR then checks every record against the declared schema before insertion. That schema enforcement is the silent footgun. Most legacy connectors send fields that drift over time, and the new pipeline drops data that doesn't match the declared shape instead of accepting whatever arrives.
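To make the schema contract concrete, here is a minimal producer-side pre-flight check in Python. It is a sketch, not the DCR's actual validation logic: the DECLARED_COLUMNS mapping and the schema_violations helper are hypothetical names, and the column list simply mirrors the sample stream declaration used later in this post.

```python
# Hypothetical mirror of the stream's declared columns: name -> accepted Python types.
# This approximates the contract on the producer side; it is not how the DCR
# itself is implemented.
DECLARED_COLUMNS = {
    "TimeGenerated": (str,),      # ISO 8601 string on the wire
    "EventId": (str,),
    "EventType": (str,),
    "Payload": (dict, list),      # maps to a dynamic column
}


def schema_violations(record, declared=DECLARED_COLUMNS):
    """Return a list of human-readable problems for one record."""
    problems = []
    for name, value in record.items():
        if name not in declared:
            problems.append(f"undeclared field: {name}")
        elif value is not None and not isinstance(value, declared[name]):
            expected = "/".join(t.__name__ for t in declared[name])
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return problems
```

Running a check like this in the producer's test suite surfaces drift as a failing test rather than as missing rows in the workspace.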

The pre-cutover audit

Before touching any connector, dump the last 30 days of ingestion shape for every custom table:

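CustomTable_CL
| where TimeGenerated > ago(30d)
// pack_all() builds a property bag of column name -> value for each row
| project Bag = pack_all()
| mv-apply Field = bag_keys(Bag) to typeof(string) on (
    project Field, Type = gettype(Bag[Field])
  )
| summarize TypeSet = make_set(Type), Count = count() by Field
| sort by Count desc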

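Look for fields whose TypeSet has more than one entry, such as [string, long] or [string, dictionary] (gettype() reports dynamic values by their underlying type, so you will not see a literal "dynamic"). Those are the fields that will break in the schema-enforced pipeline. Tables created through the Data Collector API can also show drift as sibling columns with different type suffixes, such as a _s and a _d variant of the same field. Fix the producer to emit a single consistent type, or coerce in the DCR transform.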
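If you fix the drift at the producer, the coercion can be small. The following Python sketch assumes the audit flagged EventId and EventType as mixed-type fields and that a string is the target type; STRING_FIELDS, coerce_to_string, and normalize are hypothetical names for illustration.

```python
import json

# Hypothetical: the fields the audit flagged as arriving with more than one type.
STRING_FIELDS = {"EventId", "EventType"}


def coerce_to_string(value):
    """Render a JSON-compatible value as a stable string (None stays None)."""
    if value is None or isinstance(value, str):
        return value
    if isinstance(value, bool):
        # Check bool before other types so True does not become "True".
        return "true" if value else "false"
    if isinstance(value, (dict, list)):
        # Sorted keys and compact separators keep the output deterministic.
        return json.dumps(value, sort_keys=True, separators=(",", ":"))
    return str(value)


def normalize(record):
    """Coerce only the flagged fields; leave everything else untouched."""
    return {
        key: coerce_to_string(value) if key in STRING_FIELDS else value
        for key, value in record.items()
    }
```

Coercing in the producer keeps the DCR transform simple; coercing in the transform avoids a producer release. Either works as long as only one side does it.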

Building the DCR + DCE

The cleanest pattern we landed on:

  1. One DCE per region per environment. Don't share DCEs across prod and non-prod. Cross-environment writes are the source of half the security incidents we have seen.
  2. One DCR per source application. Don't reuse DCRs across applications. The DCR is the schema contract.
  3. A transformKql on the DCR that handles legacy field renames and timestamp normalization.

Sample DCR snippet for a generic webhook collector:

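// Assumes `app` and `location` are parameters, and that `dce` and `workspace`
// are resources declared elsewhere in the same template.
resource dcr 'Microsoft.Insights/dataCollectionRules@2023-03-11' = {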
  name: 'dcr-webhook-${app}'
  location: location
  properties: {
    dataCollectionEndpointId: dce.id
    streamDeclarations: {
      'Custom-${app}_CL': {
        columns: [
          { name: 'TimeGenerated', type: 'datetime' }
          { name: 'EventId',       type: 'string' }
          { name: 'EventType',     type: 'string' }
          { name: 'Payload',       type: 'dynamic' }
        ]
      }
    }
    dataSources: {}
    destinations: {
      logAnalytics: [
        { name: 'la', workspaceResourceId: workspace.id }
      ]
    }
    dataFlows: [
      {
        streams: ['Custom-${app}_CL']
        destinations: ['la']
        transformKql: 'source | extend TimeGenerated = coalesce(TimeGenerated, now())'
        outputStream: 'Custom-${app}_CL'
      }
    ]
  }
}
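On the producer side, records are POSTed as a JSON array to the stream declared in the DCR. The sketch below, in Python with only the standard library, builds the request URL and splits records into size-bounded batches. It makes several assumptions: the endpoint host and DCR immutable ID in the usage note are placeholders, MAX_BODY_BYTES is a conservative stand-in for the per-call size limit (check the current limit before relying on it), and authentication (a bearer token for Azure Monitor) is left out entirely.

```python
import json
from urllib.parse import quote

API_VERSION = "2023-01-01"
MAX_BODY_BYTES = 1_000_000  # assumption: conservative stand-in for the per-call limit


def ingestion_url(dce_endpoint, dcr_immutable_id, stream_name):
    """Build the Logs Ingestion URL for one DCR stream."""
    return (
        f"{dce_endpoint.rstrip('/')}/dataCollectionRules/{quote(dcr_immutable_id)}"
        f"/streams/{quote(stream_name)}?api-version={API_VERSION}"
    )


def batch_records(records, max_bytes=MAX_BODY_BYTES):
    """Yield lists of records whose JSON array encoding fits within max_bytes.

    A single record larger than max_bytes is still yielded on its own;
    the caller decides whether to truncate or reject it.
    """
    batch, size = [], 2  # 2 bytes for the enclosing brackets
    for record in records:
        # +2 covers the ", " separator that json.dumps emits between items.
        record_size = len(json.dumps(record).encode("utf-8")) + 2
        if batch and size + record_size > max_bytes:
            yield batch
            batch, size = [], 2
        batch.append(record)
        size += record_size
    if batch:
        yield batch
```

For example, ingestion_url("https://example.ingest.monitor.azure.com", "dcr-abc123", "Custom-webhook_CL") produces the URL for a stream named as in the DCR snippet above. Each batch from batch_records becomes the body of one POST.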

Cutover without ingestion gaps

The naive approach is "stop MMA, start AMA, swap connector URLs." That gives you a real gap. The pattern that works:

  1. Stand up the AMA + DCR + DCE alongside MMA. Let both ingest in parallel for 72 hours.
  2. Run a KQL comparison every hour: record counts within ±0.5%, same field distributions, same query results. If they match, proceed. If not, fix the DCR transform and run another 72 hours.
  3. Cut producer traffic over with a feature flag, one connector at a time. Watch ingestion latency on the new pipeline for an hour before moving the next.
  4. After all producers cut over, keep MMA running but with the workspace destination removed. This gives you a one-week rollback window without paying double-ingest costs.
  5. Decommission MMA only after the rollback window closes.
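The hourly parity check from step 2 reduces to a tolerance comparison once you have per-hour counts from both pipelines. A minimal Python sketch, assuming the counts have already been exported from KQL into dictionaries keyed by hour; within_tolerance and failing_hours are hypothetical helper names:

```python
def within_tolerance(legacy_count, new_count, tolerance=0.005):
    """True if new_count is within +/- tolerance (as a fraction) of legacy_count."""
    if legacy_count == 0:
        # No legacy records: only an empty new pipeline counts as a match.
        return new_count == 0
    return abs(new_count - legacy_count) / legacy_count <= tolerance


def failing_hours(legacy_by_hour, new_by_hour, tolerance=0.005):
    """Return the sorted hours where the two pipelines disagree beyond tolerance.

    An hour present in only one pipeline is compared against a count of 0.
    """
    hours = sorted(set(legacy_by_hour) | set(new_by_hour))
    return [
        hour
        for hour in hours
        if not within_tolerance(
            legacy_by_hour.get(hour, 0), new_by_hour.get(hour, 0), tolerance
        )
    ]
```

An empty result across the full 72 hours is the signal to proceed. Counts alone do not cover field distributions or query results, so this is only the first of the three comparisons.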

Where the cost savings come from

The 52% operating-cost reduction was not magic. Three drivers:

  • AMA does not double-ingest. The legacy path was writing to both the workspace and the deprecated EventForwarding table for some connectors. That alone was ~20% of the bill.
  • DCR transforms let us drop noise fields before ingestion. Dropping RawRequestBody from auth logs cut ingestion volume by 30% on that table.
  • Logs Ingestion API has cheaper per-GB cost than the Data Collector API when you batch 50+ records per request.

The thing that bit us

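The pipeline we migrated enforced a 48-hour window on TimeGenerated (24h past, 24h future); confirm the exact bounds for your workspace before relying on them. If your producer emits historical data on cutover (e.g., backfilling a missed window), every record older than 24h is silently dropped, with no error returned to the producer. Note that the coalesce(TimeGenerated, now()) transform in the snippet above only fills in a missing timestamp; it does not rescue a record whose timestamp is present but stale. For backfills, the transform has to overwrite out-of-window values, for example with extend TimeGenerated = now(), or the producer has to clamp them before sending. That step is not optional. It is the difference between a clean migration and a four-hour debugging session.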
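If you would rather guard against this in the producer, a clamp is a few lines. The Python sketch below assumes the 24h-past, 24h-future window described above and timezone-aware datetimes; clamp_time_generated is a hypothetical helper, and returning the original timestamp lets you preserve it elsewhere (for instance inside Payload) instead of losing it.

```python
from datetime import datetime, timedelta, timezone


def clamp_time_generated(
    ts,
    now=None,
    max_past=timedelta(hours=24),
    max_future=timedelta(hours=24),
):
    """Return (time_generated, original_if_replaced).

    ts and now must be timezone-aware; comparing aware and naive datetimes
    raises TypeError. A missing or out-of-window ts is replaced by now.
    """
    now = now or datetime.now(timezone.utc)
    if ts is None or ts < now - max_past or ts > now + max_future:
        return now, ts
    return ts, None
```

Replacing the timestamp trades accuracy for delivery: backfilled records land at ingestion time, so keep the original value in the payload if analysts need the real event time.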

Next steps

If you are mid-migration and want a second set of eyes on the DCR design or the cutover plan, we do fixed-fee Sentinel architecture reviews. Most engagements close in two weeks with a written runbook and a tested rollback plan. Get in touch via the contact page.