Production Incident Response

AI → Human

A structured incident-handling workflow that combines AI triage with human oversight.

5 nodes · 5 edges · devops

event · agent · human · cli

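Visualization

Alert Triggered (event)

A PagerDuty or Grafana alert fires.

↓ sequential → AI Triage

AI Triage (agent)

The AI analyzes logs, metrics, and recent deployment records.

↓ sequential → Engineer Investigation

↓ timeout → Engineer Investigation

Engineer Investigation (human)

The on-call engineer verifies the AI's triage result.

↓ conditional → Apply Mitigation

Apply Mitigation (cli)

Run a rollback, scale up, or apply a hotfix.

↓ sequential → Generate Postmortem Report

Generate Postmortem Report (agent)

The AI drafts the postmortem report from the .osoplog data.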
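The routing described by these five edges can be sketched in a few lines of Python. This is only an illustration, not how an OSOP runtime is implemented: the function name `route`, the `elapsed_sec` and `confirmed` parameters, and the reading of the timeout edge ("take it once triage has run for more than 300 seconds") are all assumptions. The source also defines no edge for an unconfirmed investigation, so the sketch simply stops there.

```python
from typing import Optional, Tuple

# Taken from the timeout edge triage -> investigate (timeout_sec: 300).
TRIAGE_TIMEOUT_SEC = 300


def route(
    current: str, *, elapsed_sec: float = 0.0, confirmed: bool = False
) -> Optional[Tuple[str, str]]:
    """Return (next_node_id, edge_mode), or None when no edge applies.

    Hypothetical helper: node ids match the workflow definition, but the
    evaluation rules are assumed for illustration.
    """
    if current == "alert":
        return ("triage", "sequential")
    if current == "triage":
        # Both outgoing edges lead to the engineer; only the edge mode differs.
        if elapsed_sec > TRIAGE_TIMEOUT_SEC:
            return ("investigate", "timeout")
        return ("investigate", "sequential")
    if current == "investigate":
        # Conditional edge: mitigation runs only after the engineer confirms.
        return ("mitigate", "conditional") if confirmed else None
    if current == "mitigate":
        return ("postmortem", "sequential")
    return None  # "postmortem" is terminal; unknown ids also end the walk


if __name__ == "__main__":
    print(route("triage", elapsed_sec=420))
    print(route("investigate", confirmed=True))
```

Note that either path out of triage ends at the on-call engineer, so the AI never reaches the mitigation step without a human in between.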


uc-incident-response.osop.yaml

osop_version: "1.0"
id: "incident-response"
name: "Production Incident Response"
description: "A structured incident-handling workflow that combines AI triage with human oversight."

nodes:
  - id: "alert"
    type: "event"
    name: "Alert Triggered"
    description: "A PagerDuty or Grafana alert fires."

  - id: "triage"
    type: "agent"
    subtype: "llm"
    name: "AI Triage"
    description: "The AI analyzes logs, metrics, and recent deployment records."
    security:
      risk_level: "medium"

  - id: "investigate"
    type: "human"
    subtype: "input"
    name: "Engineer Investigation"
    description: "The on-call engineer verifies the AI's triage result."

  - id: "mitigate"
    type: "cli"
    subtype: "script"
    name: "Apply Mitigation"
    description: "Run a rollback, scale up, or apply a hotfix."
    security:
      risk_level: "high"
      approval_gate: true

  - id: "postmortem"
    type: "agent"
    subtype: "llm"
    name: "Generate Postmortem Report"
    description: "The AI drafts the postmortem report from the .osoplog data."

edges:
  - from: "alert"
    to: "triage"
    mode: "sequential"
  - from: "triage"
    to: "investigate"
    mode: "sequential"
  - from: "investigate"
    to: "mitigate"
    mode: "conditional"
    when: "investigation.confirmed == true"
  - from: "mitigate"
    to: "postmortem"
    mode: "sequential"
  - from: "triage"
    to: "investigate"
    mode: "timeout"
    timeout_sec: 300
    label: "Escalate if >5min"
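
A definition like this can be sanity-checked before it is run. The sketch below mirrors the nodes and edges above as plain Python data, so it needs no YAML parser, and checks two things: that every edge endpoint is a declared node, and that any node with `risk_level: "high"` also sets `approval_gate`. The `validate` function and the second rule are assumptions made for illustration; the source does not say that OSOP enforces either check.

```python
from typing import Dict, List, Tuple

# Nodes copied from the definition above, reduced to the fields being checked.
NODES: Dict[str, dict] = {
    "alert": {"type": "event"},
    "triage": {"type": "agent", "risk_level": "medium"},
    "investigate": {"type": "human"},
    "mitigate": {"type": "cli", "risk_level": "high", "approval_gate": True},
    "postmortem": {"type": "agent"},
}

# Edges as (from, to, mode) tuples, in the order they appear above.
EDGES: List[Tuple[str, str, str]] = [
    ("alert", "triage", "sequential"),
    ("triage", "investigate", "sequential"),
    ("investigate", "mitigate", "conditional"),
    ("mitigate", "postmortem", "sequential"),
    ("triage", "investigate", "timeout"),
]


def validate(nodes: Dict[str, dict], edges: List[Tuple[str, str, str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []
    for src, dst, mode in edges:
        for endpoint in (src, dst):
            if endpoint not in nodes:
                problems.append(
                    f"{mode} edge {src} -> {dst} references unknown node {endpoint!r}"
                )
    for node_id, node in nodes.items():
        # Assumed policy: high-risk steps must sit behind an approval gate.
        if node.get("risk_level") == "high" and not node.get("approval_gate"):
            problems.append(f"high-risk node {node_id!r} has no approval_gate")
    return problems


if __name__ == "__main__":
    for problem in validate(NODES, EDGES) or ["no problems found"]:
        print(problem)
```

Run against the workflow as written, the check reports no problems: all five edges connect declared nodes, and `mitigate`, the only high-risk node, has its approval gate set.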