Claude Code: Model Selection, CLAUDE.md Patterns, and Workflow Integration

The first time Claude Code actually saved a sprint for us, it was not writing application code. It was untangling a 400-line Bicep module that had drifted from its parameter file across three environments. The model read the whole tree, proposed a refactor, ran az bicep build to verify syntax, and flagged a conditional that would have caused a silent failure in production. Four minutes. The same task would have taken a senior engineer two hours on a good day.

The real value is not raw code generation speed. It is the ability to hold a large, messy codebase in context and reason about it without losing the thread.

What the Opus 4.x Line Actually Changed

Claude Code lets you select the underlying model per session. For most of 2025, Sonnet 3.7 was the effective default, and it handled scaffolding and smaller refactors well. When the Opus 4.x series arrived and iterated through 4.6, 4.7, and 4.8, the gap between tiers became hard to ignore on complex tasks.

Opus handles multi-file reasoning noticeably better. Give it a Terraform module spread across eight files with cross-references and conditional logic, and it tracks the dependency graph consistently. Sonnet sometimes loses an assumption it made in file three by the time it is editing file six. For detection engineering, where a KQL query touches multiple tables and correctness depends on understanding the full data model, Opus 4.x is more reliable by a meaningful margin.

The trade-off is cost and latency. Per token, Opus runs roughly 15x the price of Haiku and 3-4x the price of Sonnet (the exact multiplier has moved as pricing has changed, but the order of magnitude holds). We use Haiku for fast linting passes and doc generation, Sonnet for routine code tasks, and Opus for architecture decisions, security-sensitive logic, and data pipeline design. That tiered approach cuts our monthly API spend by about 60% compared to running everything on Opus.
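To make the tiering arithmetic concrete, here is a minimal sketch. The relative prices follow the rough multipliers quoted above, with Haiku as the unit and Sonnet at about a quarter of Opus. The task categories mirror the routing just described, but the token mix is a hypothetical illustration, not our billing data, and the function names are ours.

```python
# Relative per-token cost, with Haiku as the unit. These follow the rough
# multipliers quoted in the text and are NOT current list prices.
RELATIVE_COST = {"haiku": 1.0, "sonnet": 4.0, "opus": 15.0}

# Task category -> tier, mirroring the routing described in the text.
ROUTING = {
    "lint": "haiku",
    "docs": "haiku",
    "routine_code": "sonnet",
    "architecture": "opus",
    "security_logic": "opus",
    "pipeline_design": "opus",
}

# HYPOTHETICAL share of monthly tokens per tier (sums to 1.0).
TOKEN_MIX = {"haiku": 0.30, "sonnet": 0.45, "opus": 0.25}


def pick_model(task_category: str) -> str:
    """Return the tier for a task, defaulting to Sonnet for unknown work."""
    return ROUTING.get(task_category, "sonnet")


def savings_vs_all_opus(mix: dict[str, float]) -> float:
    """Fraction of spend saved by the tiered mix versus sending everything to Opus."""
    blended = sum(share * RELATIVE_COST[tier] for tier, share in mix.items())
    return 1.0 - blended / RELATIVE_COST["opus"]


print(pick_model("security_logic"))             # opus
print(f"{savings_vs_all_opus(TOKEN_MIX):.0%}")  # 61%
```

Under this assumed mix the blended cost is 5.85 units against 15 for all-Opus, which is where a figure near 60% comes from. A heavier Opus share shrinks the saving quickly, so the mix is worth measuring rather than guessing.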

CLAUDE.md Is Load-Bearing Infrastructure

CLAUDE.md is the project-level instruction file Claude Code reads at the start of every session. Most teams treat it as optional documentation. It is not. It is the difference between a model that helps and one that generates plausible-looking nonsense.

A CLAUDE.md that actually works includes architecture facts, naming conventions, known issues, and explicit guardrails:

# Project: Sentinel Detection Pipeline

## Architecture
- Ingestion: Azure Event Hubs -> Databricks Delta Live Tables
- Storage: ADLS Gen2, partitioned by tenant/date
- SIEM: Microsoft Sentinel, workspace ID in Key Vault (kv-sentinel-prod)
- Alerting: Logic Apps -> PagerDuty

## Conventions
- All KQL queries must be tested against the last 7 days of real data before commit
- Bicep modules live in /infra/modules; do not inline resources in main.bicep
- Never hardcode tenant IDs, subscription IDs, or workspace IDs
- CI runs: `az bicep build --file main.bicep` and `pylint ./src`

## Current Work
- Migrating parsers from CEF to ASIM schema (see /parsers/README.md)
- Known issue: SentinelOne connector drops duplicate events under high load (#4421)

## Do Not Touch
- /infra/legacy -- deprecated, removal scheduled Q3 2026
- /scripts/bootstrap.sh -- runs in prod provisioning; changes require PR review

## Testing
- Unit: `pytest tests/unit`
- Integration: requires AZURE_TEST_SUB env var

The "Do Not Touch" section matters more than it looks. Without it, the model will helpfully "clean up" files you cannot afford to have touched. The known issues section keeps the model from spending multiple tool calls investigating something you already understand.
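Instructions in CLAUDE.md are advisory, so it can be worth backing the "Do Not Touch" list with a mechanical check. The sketch below is one way to do that, assuming the section keeps the format shown above (a `## Do Not Touch` heading followed by `- /path -- reason` bullets). The function names are hypothetical, and the list of changed files would come from your own tooling, for example `git diff --name-only`.

```python
import re


def protected_paths(claude_md: str) -> list[str]:
    """Return the paths listed under the '## Do Not Touch' heading."""
    paths = []
    in_section = False
    for line in claude_md.splitlines():
        if line.startswith("## "):
            # Entering or leaving a section; only one heading is of interest.
            in_section = line.strip().lower() == "## do not touch"
            continue
        if in_section:
            match = re.match(r"-\s+(/\S+)", line)
            if match:
                paths.append(match.group(1))
    return paths


def violations(changed_files: list[str], protected: list[str]) -> list[str]:
    """Return changed files that are, or sit under, a protected path."""
    hits = []
    for changed in changed_files:
        normalized = "/" + changed.lstrip("/")
        for path in protected:
            prefix = path.rstrip("/") + "/"
            if normalized == path or normalized.startswith(prefix):
                hits.append(changed)
                break
    return hits


SAMPLE = """## Do Not Touch
- /infra/legacy -- deprecated, removal scheduled Q3 2026
- /scripts/bootstrap.sh -- runs in prod provisioning; changes require PR review

## Testing
- Unit: `pytest tests/unit`
"""

changed = ["infra/legacy/old.bicep", "src/app.py", "scripts/bootstrap.sh"]
print(violations(changed, protected_paths(SAMPLE)))
```

Run in CI or as a pre-commit step, a non-empty result is a reason to stop and look, regardless of what the model was told.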

The NPM Source Map Incident

Earlier this year, a source map file was accidentally shipped inside the Claude Code NPM package. Developers who examined the .map artifact got an unintended look at internal implementation details: how the tool router is structured, the order in which hooks fire, and how the skill system is composed at load time.

From an engineering standpoint, it was less a secret revealed and more a confirmation of architecture behavior already visible in day-to-day use. The hook system is genuinely modular. Skills are composed at session initialization. The agentic loop is interruptible at most steps, which explains why /stop is as reliable as it is.

The security lesson is familiar: anything shipped to a public registry should be treated as public, including debug artifacts. We now run npm pack --dry-run and inspect the file manifest before any internal tooling package goes to a registry. Takes thirty seconds, saves embarrassment.
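The manifest check is easy to automate. The following is a minimal sketch, assuming `npm pack --dry-run --json` prints a JSON array whose entries carry a `files` list with a `path` per file, which is how recent npm versions behave. The suffix list and function names are our own choices, and a package with lifecycle scripts that write to stdout may need extra handling before the output parses cleanly.

```python
import json
import subprocess

# File suffixes treated as debug artifacts. This list is an assumption;
# extend it to match whatever your build emits.
SUSPECT_SUFFIXES = (".map",)


def files_from_manifest(manifest: list[dict]) -> list[str]:
    """Flatten the parsed `npm pack --json` output into a list of file paths."""
    return [entry["path"] for pkg in manifest for entry in pkg.get("files", [])]


def suspect_files(paths: list[str]) -> list[str]:
    """Return the packed paths that look like debug artifacts."""
    return [path for path in paths if path.endswith(SUSPECT_SUFFIXES)]


def check_package(package_dir: str) -> list[str]:
    """Ask npm what it would publish from package_dir and flag suspect files."""
    result = subprocess.run(
        ["npm", "pack", "--dry-run", "--json"],
        cwd=package_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return suspect_files(files_from_manifest(json.loads(result.stdout)))


# Offline illustration with a hand-written manifest (hypothetical package).
example = [{"name": "internal-tool", "files": [
    {"path": "dist/cli.js"},
    {"path": "dist/cli.js.map"},
    {"path": "package.json"},
]}]
print(suspect_files(files_from_manifest(example)))
```

Calling `check_package(".")` in a release job and failing on a non-empty list turns the thirty-second manual inspection into a gate.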

Hooks and Settings Patterns That Hold Up

Claude Code's hook system wires shell commands to lifecycle events: before a tool call, after a tool call, when the agent stops. A few patterns have proven durable across client projects.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "scripts/pre-bash-lint.sh"
          }
        ]
      }
    ],
    "Stop": [
      {
        "matcher": ".*",
        "hooks": [
          {
            "type": "command",
            "command": "scripts/post-session-notify.sh"
          }
        ]
      }
    ]
  }
}

pre-bash-lint.sh reads the proposed command from the JSON payload Claude Code passes to the hook on stdin (the command sits under tool_input), extracts any shell script content, and runs shellcheck against it before execution. This alone has caught a dozen bugs, mostly unquoted variable expansions and missing set -e guards, that would have been unpleasant to debug after the fact.
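For readers who would rather not write the hook in shell, here is a Python sketch of the same idea. It is not the contents of our pre-bash-lint.sh. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields, that exit code 2 blocks the tool call and returns stderr to the model, and that shellcheck is installed and on PATH. Verify all three against the hooks documentation for your Claude Code version.

```python
import json
import subprocess
import sys


def extract_command(payload_text: str) -> str:
    """Pull the Bash command out of the hook payload (assumed JSON shape)."""
    payload = json.loads(payload_text)
    if payload.get("tool_name") != "Bash":
        return ""
    return payload.get("tool_input", {}).get("command", "")


def lint_command(command: str) -> tuple[int, str]:
    """Run shellcheck on the command text; return (exit code, report)."""
    result = subprocess.run(
        ["shellcheck", "--shell=bash", "-"],  # "-" tells shellcheck to read stdin
        input=command,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


def run_hook(payload_text: str) -> int:
    """Return the exit code the hook should finish with."""
    command = extract_command(payload_text)
    if not command:
        return 0  # nothing to lint, let the tool call proceed
    code, report = lint_command(command)
    if code != 0:
        print(report, file=sys.stderr)
        return 2  # assumed "block this tool call" exit code
    return 0
```

As a standalone script, the last line would be `sys.exit(run_hook(sys.stdin.read()))`. Keeping the parsing in `extract_command` means the payload handling can be unit tested without shellcheck or a live session.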

The stop hook pushes a summary of what changed to the team Slack channel. The rest of the team sees what the AI session did without reading a transcript. For detection engineering work, we add a pre-write hook on the /parsers directory that validates generated KQL against our ASIM schema linter. Claude will sometimes produce syntactically valid queries that violate internal field naming conventions. The hook catches that before anything gets committed.

Where It Breaks Down

Claude Code is not reliable for tasks requiring real-time state. If a deployment is in flight and you ask it to check status, it will guess or fabricate unless you explicitly wire it a tool that can query live systems.

Long sessions degrade. After roughly 90 minutes of back-and-forth in a large codebase, context compression starts dropping details. We have seen the model forget a constraint stated forty turns earlier and reintroduce a pattern it was explicitly told to avoid. The fix is scope discipline: one task, one session. Use /compact before context pressure builds, not after you notice the quality drop.

Multi-repo work is awkward. Claude Code is well-suited to a single working directory. When a task spans two repositories with a cross-repo dependency, you need a carefully constructed CLAUDE.md that describes the other repo's interface in enough detail that the model does not invent an API shape. That description has to stay current, which adds maintenance overhead.

Security-sensitive code paths should not be delegated entirely to the model. We use Claude Code to generate a first-pass review and surface candidates, then a human reviews those candidates against the actual threat model. The model is good at recognizing patterns; it does not always understand the trust boundary.

When to Call Us

If your team is integrating Claude Code into an existing CI/CD pipeline, detection engineering workflow, or multi-cloud IaC process and wants patterns that hold up past the proof-of-concept stage, reach out.