MACS: A Cognitive Diversity Multi-Agent Consensus Framework for Bias Mitigation in Automated Evaluation Systems
Institute of Electrical and Electronics Engineers (IEEE), Q3
Abstract
The reliance on single Large Language Models (LLMs) for automated academic assessment risks creating an algorithmic monoculture, in which inherent model biases are amplified at scale. This paper introduces the Multi-Agent Consensus System (MACS), a framework designed to mitigate this risk by simulating cognitive diversity. MACS orchestrates a heterogeneous ensemble of LLMs in a structured, adversarial peer-review workflow. The system comprises: (1) a VLM-driven multimodal extraction module for high-fidelity data retrieval from PDFs; (2) an initial review by a primary agent; (3) a critical challenge stage by secondary agents with diverse architectures; and (4) a final arbitration stage in which a concluding agent synthesizes conflicting evaluations to form a robust consensus. By formalizing this process of structured disagreement and resolution, our framework moves beyond simple ensemble averaging. We introduce the Disagreement-Resolution Ratio (DRR), a novel metric that quantifies the system's ability to identify and correct initial scoring biases. Our experiments show that MACS achieves 92.3% scoring accuracy and reduces single-model scoring variance by 63%, demonstrating superior robustness and fairness in automated academic evaluation.
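The four-stage workflow described in the abstract can be sketched as a minimal orchestration loop. The agent stubs and the averaging arbitration rule below are illustrative placeholders only; the paper's actual agents are heterogeneous LLMs, and its arbitration is itself LLM-driven rather than a fixed average:

```python
from statistics import mean

# Hypothetical stand-ins for the MACS agents; function names and the
# hard-coded scores are illustrative assumptions, not from the paper.
def primary_review(submission: str) -> float:
    """Stage 2: initial score from the primary review agent."""
    return 7.0

def challenge(submission: str, initial: float) -> list[float]:
    """Stage 3: secondary agents with diverse architectures
    critique the initial score and propose alternatives."""
    return [initial - 1.0, initial + 0.5]

def arbitrate(initial: float, challenges: list[float]) -> float:
    """Stage 4: a concluding agent synthesizes conflicting
    evaluations; a simple mean stands in for LLM arbitration."""
    return mean([initial, *challenges])

def macs_score(submission: str) -> float:
    """Run the review -> challenge -> arbitration pipeline
    (stage 1, PDF extraction, is assumed to have produced
    `submission` as text)."""
    initial = primary_review(submission)
    challenged = challenge(submission, initial)
    return arbitrate(initial, challenged)
```

In this toy run, the arbitrated score lies between the initial review and its challenges, illustrating how structured disagreement pulls a biased first score toward a consensus.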