## TL;DR
OpenTelemetry is the observability standard for monitoring AI agents (spans, traces, metrics). OSOP is the process standard for defining and recording what AI agents do (workflow definitions + execution records). They operate at different layers and are designed to work together.
## The Two Layers

```
┌──────────────────────────────────────────────┐
│ OSOP Layer (Process)                         │
│   .osop = "Here's the workflow"              │
│   .osoplog = "Here's what actually happened" │
│   Human-readable, auditable, optimizable     │
├──────────────────────────────────────────────┤
│ OpenTelemetry Layer (Observability)          │
│   Spans   = "This function took 200ms"       │
│   Traces  = "Request A → Service B → DB C"   │
│   Metrics = "P99 latency is 500ms"           │
│   Machine-readable, dashboards, alerting     │
└──────────────────────────────────────────────┘
```
## Key Differences
| Dimension | OSOP | OpenTelemetry GenAI |
|---|---|---|
| Primary purpose | Define workflows + record execution evidence | Monitor performance + detect anomalies |
| Abstraction level | Process steps (human-readable) | Function calls, spans (machine-readable) |
| Format | YAML (.osop + .osoplog) | Protocol Buffers / OTLP |
| Audience | Process owners, auditors, AI engineers | SREs, platform engineers |
| Defines intent | Yes (.osop = what should happen) | No (only records what did happen) |
| Human oversight | First-class human nodes + approval gates | Not modeled |
| Self-optimization | Compare .osop vs .osoplog | Dashboard/alerting |
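The "Self-optimization" row deserves a concrete illustration: because `.osop` records intent and `.osoplog` records reality, a simple diff surfaces divergence between them. The sketch below uses plain dicts standing in for parsed `.osop`/`.osoplog` files; the field names (`nodes`, `node_records`, `node_id`, `status`) follow the examples later in this article and are illustrative, not a normative OSOP schema.

```python
# Sketch: compare a workflow definition (.osop) against an execution
# record (.osoplog) to find divergence. Field names are illustrative,
# based on this article's examples, not a normative OSOP schema.

def diff_intent_vs_reality(osop: dict, osoplog: dict) -> list[str]:
    """Return human-readable findings where execution diverged from intent."""
    findings = []
    planned = {n["node_id"] for n in osop.get("nodes", [])}
    executed = {r["node_id"]: r for r in osoplog.get("node_records", [])}

    for node_id in planned - executed.keys():
        findings.append(f"node '{node_id}' was defined but never ran")
    for node_id in executed.keys() - planned:
        findings.append(f"node '{node_id}' ran but was not in the workflow")
    for node_id, record in executed.items():
        if record.get("status") not in (None, "COMPLETED"):
            findings.append(f"node '{node_id}' ended with status {record['status']}")
    return findings

# Dicts standing in for a parsed .osop and .osoplog pair.
workflow = {"nodes": [{"node_id": "ai_diagnosis"}, {"node_id": "doctor_review"}]}
run = {"node_records": [{"node_id": "ai_diagnosis", "status": "FAILED"}]}
findings = diff_intent_vs_reality(workflow, run)
```

Here the diff reports that `doctor_review` never ran and that `ai_diagnosis` failed: exactly the process-level signal a dashboard of spans would not give you directly.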
## When to use which
- OSOP — Use OSOP when you need to define what the agent should do, record what it did in human-readable form, compare intent vs. reality, or provide audit evidence for compliance.
- OpenTelemetry — Use OpenTelemetry when you need system-level performance monitoring, distributed tracing across services, real-time dashboards, or alerting on latency/error rates.
- Both — Use both when you need the full picture — process-level understanding (OSOP) plus infrastructure-level observability (OTel). They are complementary, not competing.
## The Bridge: osop-interop
OSOP's osop-interop project includes a bidirectional OTel bridge:
- OSOP → OTel: Each .osoplog node record becomes an OTel span. Duration maps to span duration. Tool calls become child spans. AI metadata becomes span attributes.
- OTel → OSOP: OTel traces can be aggregated into .osoplog records for process-level analysis. Spans become node records. The trace ID maps to run_id.
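The OSOP → OTel direction of that mapping can be sketched as follows. A real bridge would emit spans through the OpenTelemetry SDK; to keep the mapping itself visible, this sketch renders the span as a plain dict in an OTLP-like shape. The OSOP-side field names (`node_id`, `duration_ms`, `ai_metadata`, `tool_calls`) follow this article's `.osoplog` examples and are illustrative.

```python
import uuid

# Sketch of the OSOP -> OTel direction of the osop-interop bridge.
# The span is a plain dict in an OTLP-like shape; a real bridge would
# use the OpenTelemetry SDK. OSOP field names are illustrative.

def node_record_to_span(record: dict, trace_id: str) -> dict:
    span = {
        "traceId": trace_id,
        "spanId": uuid.uuid4().hex[:16],
        "name": f"osop.node.{record['node_id']}",
        "duration": record.get("duration_ms", 0),  # duration maps to span duration
        "attributes": {},
    }
    # AI metadata becomes span attributes, namespaced in the style of
    # the OTel GenAI semantic conventions.
    for key, value in record.get("ai_metadata", {}).items():
        span["attributes"][f"gen_ai.{key}"] = value
    # Tool calls become child spans.
    span["childSpans"] = [
        {"traceId": trace_id, "name": f"tool.{t['name']}",
         "duration": t.get("duration_ms", 0)}
        for t in record.get("tool_calls", [])
    ]
    return span

record = {
    "node_id": "ai_diagnosis",
    "duration_ms": 8500,
    "ai_metadata": {"model": "med-llm-v3", "prompt_tokens": 15000},
}
span = node_record_to_span(record, trace_id="abc123")
```

The resulting span carries `gen_ai.model` and `gen_ai.prompt_tokens` attributes, so the same execution is queryable from both layers.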
## Concrete example: Medical diagnosis workflow
Consider an AI-assisted medical diagnosis workflow. Here is what each layer captures:
### OSOP layer (.osoplog)

```yaml
node_records:
  - node_id: "ai_diagnosis"
    status: "COMPLETED"
    duration_ms: 8500
    ai_metadata:
      model: "med-llm-v3"
      prompt_tokens: 15000
    reasoning:
      question: "What is the likely diagnosis?"
      selected: "Type 2 diabetes"
      alternatives:
        - { option: "Pre-diabetes", reason: "Borderline A1C" }
        - { option: "Type 2 diabetes", reason: "A1C 7.2%" }
      confidence: 0.89
  - node_id: "doctor_review"
    human_metadata:
      actor: "dr_smith"
      decision: "confirmed"
      notes: "Agreed. Order glucose tolerance test."
```

### OpenTelemetry layer
```json
{
  "traceId": "abc123",
  "spans": [{
    "name": "llm.chat.completions",
    "duration": 3200,
    "attributes": {
      "gen_ai.system": "med-llm",
      "gen_ai.usage.prompt_tokens": 15000,
      "gen_ai.response.model": "med-llm-v3"
    }
  }]
}
```

The OSOP layer tells you what was decided and why — the diagnosis, the alternatives considered, the doctor's confirmation. The OTel layer tells you how fast and how reliably — latency, token usage, error rates. Both are valuable. Neither replaces the other.
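The reverse direction of the bridge, OTel → OSOP, can be sketched against this same example: spans from one trace fold into an `.osoplog`-style run record. Field names follow the examples above and are illustrative, not a normative schema.

```python
# Sketch of the OTel -> OSOP direction: aggregate the spans of one
# trace into a process-level .osoplog-style record. Field names are
# illustrative, based on the examples in this article.

def trace_to_osoplog(trace: dict) -> dict:
    node_records = []
    for span in trace.get("spans", []):
        node_records.append({
            # The span name stands in for a node id in this sketch.
            "node_id": span["name"],
            "status": "FAILED" if span.get("error") else "COMPLETED",
            "duration_ms": span.get("duration", 0),
            # GenAI span attributes fold back into ai_metadata.
            "ai_metadata": {
                k.removeprefix("gen_ai."): v
                for k, v in span.get("attributes", {}).items()
                if k.startswith("gen_ai.")
            },
        })
    # The trace ID maps to run_id.
    return {"run_id": trace["traceId"], "node_records": node_records}

trace = {
    "traceId": "abc123",
    "spans": [{
        "name": "llm.chat.completions",
        "duration": 3200,
        "attributes": {"gen_ai.system": "med-llm",
                       "gen_ai.usage.prompt_tokens": 15000},
    }],
}
log = trace_to_osoplog(trace)
```

Running this over the trace above yields a run record with `run_id: abc123` and one node record carrying the model's token usage, ready for the intent-vs-reality comparison that OSOP enables.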
## Conclusion
OSOP and OpenTelemetry address different questions about AI agent behavior. OSOP asks "what did the agent do and why?" OpenTelemetry asks "how did the system perform?" The mature AI stack uses both — OSOP for process understanding and compliance, OpenTelemetry for operational monitoring.