Multi-Model Essay Grading

AI ↔ AI

Three LLMs grade the essay in parallel; the scores are aggregated and checked for bias before the final grade is issued.

7 nodes · 9 connections
education, agent, system
Essay Intake (system)

Receives the submitted essay and grading rubric, and anonymizes the student's identity.

parallel → Grader A (GPT-4o)
parallel → Grader B (Claude)
parallel → Grader C (Gemini)
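
The anonymization step in the intake node can be illustrated with a minimal sketch. It assumes the student's name and ID are known at intake and appear verbatim in the essay text. The function name `anonymize`, the `salt` parameter, and the `[REDACTED]` marker are hypothetical and not part of the workflow definition.

```python
import hashlib
import re


def anonymize(essay_text: str, student_name: str, student_id: str,
              salt: str = "demo-salt") -> dict:
    """Replace identity fields with a marker and derive a stable pseudonym.

    Assumption: identity strings occur verbatim in the essay. Real essays
    may need fuzzier matching (nicknames, partial names, header blocks).
    """
    # Stable pseudonym so the final grade can be mapped back to the student later.
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    pseudonym = "student-" + digest[:8]

    redacted = essay_text
    for secret in (student_name, student_id):
        if secret:  # skip empty strings, which would match everywhere
            redacted = re.sub(re.escape(secret), "[REDACTED]", redacted,
                              flags=re.IGNORECASE)
    return {"pseudonym": pseudonym, "essay": redacted}
```

Only the pseudonym and the redacted text would then be passed on to the three graders.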
Grader A (GPT-4o) (agent)

Scores the essay on structure, argument quality, use of evidence, and clarity of writing.

parallel → Score Aggregation
Grader B (Claude) (agent)

Scores independently with the same rubric, blind to the other graders.

parallel → Score Aggregation
Grader C (Gemini) (agent)

Scores independently as the third grader to form a robust consensus.

parallel → Score Aggregation
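Score Aggregation (system)

Computes the weighted average score and flags any grader whose score deviates from the average by more than 15%.

sequential → Bias Detection Agent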
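
The aggregation rule can be sketched as follows. This is one possible reading, assuming that "15%" means 15% of the weighted average and that weights default to equal. The function name `aggregate` and its parameters are hypothetical.

```python
from typing import Dict, Optional


def aggregate(scores: Dict[str, float],
              weights: Optional[Dict[str, float]] = None,
              tolerance: float = 0.15) -> dict:
    """Return the weighted mean and the graders that deviate too far from it."""
    if weights is None:
        # Assumption: equal weights unless the caller supplies others.
        weights = {name: 1.0 for name in scores}

    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")

    mean = sum(scores[name] * weights[name] for name in scores) / total_weight

    # Flag a grader when |score - mean| exceeds tolerance * |mean|.
    flagged = sorted(
        name for name, score in scores.items()
        if abs(score - mean) > tolerance * abs(mean)
    )
    return {"mean": mean, "flagged": flagged}
```

For example, with scores of 80, 82, and 60 the average is 74, the allowed deviation is 11.1 points, and only the grader who gave 60 is flagged.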
Bias Detection Agent (agent)

Analyzes the grading patterns for demographic bias, topic bias, or length bias.

conditional → Final Grade & Feedback
fallback → Grader A (GPT-4o)
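
The bias detector in this workflow is an LLM agent, and its actual method is not specified. As an illustration of what a length-bias signal could look like, the sketch below swaps in a simple statistical check: the Pearson correlation between essay word count and score across a batch of essays. The function names and the 0.8 threshold are assumptions. A strong correlation is only a hint, since longer essays may simply be better developed.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def length_bias_suspected(word_counts: Sequence[float],
                          scores: Sequence[float],
                          threshold: float = 0.8) -> bool:
    """Flag a batch when score tracks essay length very closely."""
    if len(word_counts) != len(scores):
        raise ValueError("word_counts and scores must have the same length")
    if len(scores) < 3:
        return False  # too few essays to say anything
    return abs(pearson(word_counts, scores)) > threshold
```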
Final Grade & Feedback (agent)

Issues the final grade with consolidated feedback from all graders and suggestions for improvement.

uc-multi-model-grading.osop.yaml
osop_version: "1.0"
id: "multi-model-grading"
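name: "Multi-Model Essay Grading"
description: "Three LLMs grade the essay in parallel; the scores are aggregated and checked for bias before the final grade is issued."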

nodes:
  - id: "essay_intake"
    type: "system"
    name: "Essay Intake"
    description: "Receives the submitted essay and grading rubric, and anonymizes the student's identity."

  - id: "grader_1"
    type: "agent"
    subtype: "llm"
    name: "Grader A (GPT-4o)"
    description: "Scores the essay on structure, argument quality, use of evidence, and clarity of writing."

  - id: "grader_2"
    type: "agent"
    subtype: "llm"
    name: "Grader B (Claude)"
    description: "Scores independently with the same rubric, blind to the other graders."

  - id: "grader_3"
    type: "agent"
    subtype: "llm"
    name: "Grader C (Gemini)"
    description: "Scores independently as the third grader to form a robust consensus."

  - id: "aggregate"
    type: "system"
    name: "Score Aggregation"
    description: "Computes the weighted average score and flags any grader whose score deviates from the average by more than 15%."

  - id: "bias_check"
    type: "agent"
    subtype: "llm"
    name: "Bias Detection Agent"
    description: "Analyzes the grading patterns for demographic bias, topic bias, or length bias."

  - id: "final_grade"
    type: "agent"
    subtype: "llm"
    name: "Final Grade & Feedback"
    description: "Issues the final grade with consolidated feedback from all graders and suggestions for improvement."

edges:
  - from: "essay_intake"
    to: "grader_1"
    mode: "parallel"
  - from: "essay_intake"
    to: "grader_2"
    mode: "parallel"
  - from: "essay_intake"
    to: "grader_3"
    mode: "parallel"
  - from: "grader_1"
    to: "aggregate"
    mode: "parallel"
  - from: "grader_2"
    to: "aggregate"
    mode: "parallel"
  - from: "grader_3"
    to: "aggregate"
    mode: "parallel"
  - from: "aggregate"
    to: "bias_check"
    mode: "sequential"
  - from: "bias_check"
    to: "final_grade"
    mode: "conditional"
    when: "bias.detected == false"
  - from: "bias_check"
    to: "grader_1"
    mode: "fallback"
    label: "Bias detected, re-grade with adjusted prompts"
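
The two outgoing edges of `bias_check` can be read as a simple routing rule: take the conditional edge when its condition holds, otherwise take the fallback edge. The sketch below shows that reading in Python. It is an assumption about how a runner might interpret the `conditional` and `fallback` modes, not a description of an actual OSOP runtime, and the `when` expression is modeled as a plain Python callable rather than parsed from the string.

```python
# Hypothetical in-memory form of the two bias_check edges.
BIAS_CHECK_EDGES = [
    {
        "to": "final_grade",
        "mode": "conditional",
        # Stands in for: when: "bias.detected == false"
        "when": lambda ctx: ctx["bias"]["detected"] is False,
    },
    {"to": "grader_1", "mode": "fallback"},
]


def next_node(edges, ctx):
    """Pick the first conditional edge whose condition holds, else the fallback."""
    for edge in edges:
        if edge["mode"] == "conditional" and edge["when"](ctx):
            return edge["to"]
    for edge in edges:
        if edge["mode"] == "fallback":
            return edge["to"]
    return None  # no edge applies
```

Under this reading, a clean bias check routes to `final_grade`, while a detected bias sends the essay back to `grader_1` for re-grading.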