## TL;DR
OpenTelemetry is the observability standard for monitoring AI agents (spans, traces, metrics). OSOP is the process standard for defining and recording what AI agents do (workflow definitions + execution records). They operate at different layers and are designed to work together.
## The Two Layers

```
┌──────────────────────────────────────────────┐
│ OSOP Layer (Process)                         │
│   .osop = "Here's the workflow"              │
│   .osoplog = "Here's what actually happened" │
│   Human-readable, auditable, optimizable     │
├──────────────────────────────────────────────┤
│ OpenTelemetry Layer (Observability)          │
│   Spans   = "This function took 200ms"       │
│   Traces  = "Request A → Service B → DB C"   │
│   Metrics = "P99 latency is 500ms"           │
│   Machine-readable, dashboards, alerting     │
└──────────────────────────────────────────────┘
```
## Key Differences
| Dimension | OSOP | OpenTelemetry GenAI |
|---|---|---|
| Primary purpose | Define workflows + record execution evidence | Monitor performance + detect anomalies |
| Abstraction level | Process steps (human-readable) | Function calls, spans (machine-readable) |
| Format | YAML (.osop + .osoplog) | Protocol Buffers / OTLP |
| Audience | Process owners, auditors, AI engineers | SREs, platform engineers |
| Defines intent | Yes (.osop = what should happen) | No (only records what did happen) |
| Human oversight | First-class human nodes + approval gates | Not modeled |
| Self-optimization | Compare .osop vs .osoplog | Dashboard/alerting |
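The "Self-optimization" row deserves a concrete illustration: because `.osop` records intent and `.osoplog` records reality, a simple diff surfaces divergence between them. The sketch below uses plain dicts standing in for parsed `.osop`/`.osoplog` files; the field names (`nodes`, `node_records`, `node_id`, `status`) follow the examples later in this article and are illustrative, not a normative OSOP schema.

```python
# Sketch: compare a workflow definition (.osop) against an execution
# record (.osoplog) to find divergence. Field names are illustrative,
# based on this article's examples, not a normative OSOP schema.

def diff_intent_vs_reality(osop: dict, osoplog: dict) -> list[str]:
    """Return human-readable findings where execution diverged from intent."""
    findings = []
    planned = {n["node_id"] for n in osop.get("nodes", [])}
    executed = {r["node_id"]: r for r in osoplog.get("node_records", [])}

    for node_id in planned - executed.keys():
        findings.append(f"node '{node_id}' was defined but never ran")
    for node_id in executed.keys() - planned:
        findings.append(f"node '{node_id}' ran but was not in the workflow")
    for node_id, record in executed.items():
        if record.get("status") not in (None, "COMPLETED"):
            findings.append(f"node '{node_id}' ended with status {record['status']}")
    return findings

# Dicts standing in for a parsed .osop and .osoplog pair.
workflow = {"nodes": [{"node_id": "ai_diagnosis"}, {"node_id": "doctor_review"}]}
run = {"node_records": [{"node_id": "ai_diagnosis", "status": "FAILED"}]}
findings = diff_intent_vs_reality(workflow, run)
```

Here the diff reports that `doctor_review` never ran and that `ai_diagnosis` failed: exactly the process-level signal a dashboard of spans would not give you directly.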
## When to use which
- OSOP — Use OSOP when you need to define what the agent should do, record what it did in human-readable form, compare intent vs. reality, or provide audit evidence for compliance.
- OpenTelemetry — Use OpenTelemetry when you need system-level performance monitoring, distributed tracing across services, real-time dashboards, or alerting on latency/error rates.
- Both — Use both when you need the full picture — process-level understanding (OSOP) plus infrastructure-level observability (OTel). They are complementary, not competing.
## The Bridge: osop-interop
OSOP's osop-interop project includes a bidirectional OTel bridge:
- OSOP → OTel: Each .osoplog node record becomes an OTel span. Duration maps to span duration. Tool calls become child spans. AI metadata becomes span attributes.
- OTel → OSOP: OTel traces can be aggregated into .osoplog records for process-level analysis. Spans become node records. The trace ID maps to run_id.
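The OSOP → OTel direction of that mapping can be sketched as follows. A real bridge would emit spans through the OpenTelemetry SDK; to keep the mapping itself visible, this sketch renders the span as a plain dict in an OTLP-like shape. The OSOP-side field names (`node_id`, `duration_ms`, `ai_metadata`, `tool_calls`) follow this article's `.osoplog` examples and are illustrative.

```python
import uuid

# Sketch of the OSOP -> OTel direction of the osop-interop bridge.
# The span is a plain dict in an OTLP-like shape; a real bridge would
# use the OpenTelemetry SDK. OSOP field names are illustrative.

def node_record_to_span(record: dict, trace_id: str) -> dict:
    span = {
        "traceId": trace_id,
        "spanId": uuid.uuid4().hex[:16],
        "name": f"osop.node.{record['node_id']}",
        "duration": record.get("duration_ms", 0),  # duration maps to span duration
        "attributes": {},
    }
    # AI metadata becomes span attributes, namespaced in the style of
    # the OTel GenAI semantic conventions.
    for key, value in record.get("ai_metadata", {}).items():
        span["attributes"][f"gen_ai.{key}"] = value
    # Tool calls become child spans.
    span["childSpans"] = [
        {"traceId": trace_id, "name": f"tool.{t['name']}",
         "duration": t.get("duration_ms", 0)}
        for t in record.get("tool_calls", [])
    ]
    return span

record = {
    "node_id": "ai_diagnosis",
    "duration_ms": 8500,
    "ai_metadata": {"model": "med-llm-v3", "prompt_tokens": 15000},
}
span = node_record_to_span(record, trace_id="abc123")
```

The resulting span carries `gen_ai.model` and `gen_ai.prompt_tokens` attributes, so the same execution is queryable from both layers.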
## Concrete example: Medical diagnosis workflow
Consider an AI-assisted medical diagnosis workflow. Here is what each layer captures:
### OSOP layer (.osoplog)

```yaml
node_records:
  - node_id: "ai_diagnosis"
    status: "COMPLETED"
    duration_ms: 8500
    ai_metadata:
      model: "med-llm-v3"
      prompt_tokens: 15000
    reasoning:
      question: "What is the likely diagnosis?"
      selected: "Type 2 diabetes"
      alternatives:
        - { option: "Pre-diabetes", reason: "Borderline A1C" }
        - { option: "Type 2 diabetes", reason: "A1C 7.2%" }
      confidence: 0.89
  - node_id: "doctor_review"
    human_metadata:
      actor: "dr_smith"
      decision: "confirmed"
      notes: "Agreed. Order glucose tolerance test."
```

### OpenTelemetry layer
```json
{
  "traceId": "abc123",
  "spans": [{
    "name": "llm.chat.completions",
    "duration": 3200,
    "attributes": {
      "gen_ai.system": "med-llm",
      "gen_ai.usage.prompt_tokens": 15000,
      "gen_ai.response.model": "med-llm-v3"
    }
  }]
}
```

The OSOP layer tells you what was decided and why — the diagnosis, the alternatives considered, the doctor's confirmation. The OTel layer tells you how fast and how reliably — latency, token usage, error rates. Both are valuable. Neither replaces the other.
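The reverse direction of the bridge, OTel → OSOP, can be sketched against this same example: spans from one trace fold into an `.osoplog`-style run record. Field names follow the examples above and are illustrative, not a normative schema.

```python
# Sketch of the OTel -> OSOP direction: aggregate the spans of one
# trace into a process-level .osoplog-style record. Field names are
# illustrative, based on the examples in this article.

def trace_to_osoplog(trace: dict) -> dict:
    node_records = []
    for span in trace.get("spans", []):
        node_records.append({
            # The span name stands in for a node id in this sketch.
            "node_id": span["name"],
            "status": "FAILED" if span.get("error") else "COMPLETED",
            "duration_ms": span.get("duration", 0),
            # GenAI span attributes fold back into ai_metadata.
            "ai_metadata": {
                k.removeprefix("gen_ai."): v
                for k, v in span.get("attributes", {}).items()
                if k.startswith("gen_ai.")
            },
        })
    # The trace ID maps to run_id.
    return {"run_id": trace["traceId"], "node_records": node_records}

trace = {
    "traceId": "abc123",
    "spans": [{
        "name": "llm.chat.completions",
        "duration": 3200,
        "attributes": {"gen_ai.system": "med-llm",
                       "gen_ai.usage.prompt_tokens": 15000},
    }],
}
log = trace_to_osoplog(trace)
```

Running this over the trace above yields a run record with `run_id: abc123` and one node record carrying the model's token usage, ready for the intent-vs-reality comparison that OSOP enables.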
## Conclusion
OSOP and OpenTelemetry address different questions about AI agent behavior. OSOP asks "what did the agent do and why?" OpenTelemetry asks "how did the system perform?" The mature AI stack uses both — OSOP for process understanding and compliance, OpenTelemetry for operational monitoring.