Baumeister T, Abdennadher H, Fey D (2026)
Addressing the Feedback Bottleneck in Large Technical Courses: A Comparative Study of LLM-Assisted Assessment in Computer Architecture
Publication Language: English
Publication Type: Conference contribution
Publication year: 2026
Providing timely, personalized feedback is critical for learning in
technical higher education, yet remains a significant challenge in large
courses. With over 100 students submitting complex assignments in
domains like Computer Architecture, teaching staff struggle to deliver
detailed, consistent feedback before its pedagogical value diminishes.
This study investigates whether Large Language Models (LLMs) can help
address this feedback bottleneck. We comparatively evaluated four
state-of-the-art models (ChatGPT-4o, Gemini 2.5 Pro, Claude Sonnet 4,
and DeepSeek-V3) on 380 real, anonymized student submissions from an
undergraduate Computer Architecture course covering RISC-V assembly
programming, microprogramming, and pipeline analysis. Each submission
was processed using four distinct prompt configurations, ranging from a
basic structured rubric to a hybrid prompt combining reference solutions
with feedback exemplars. The resulting 5,960 AI-generated feedback
instances were then assessed through systematic human evaluation across
six pedagogical criteria: Correctness, Clarity, Depth, Consistency,
Usefulness, and Strictness.
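To make the study design concrete, the following is a minimal sketch of how such a comparative pipeline could be organized. All identifiers, prompt texts, and the generate_feedback helper are illustrative assumptions for exposition; they are not the authors' implementation or the exact prompts used in the paper.

```python
from dataclasses import dataclass, field
from itertools import product

# Illustrative prompt configurations, loosely following the four described in
# the abstract (basic rubric through hybrid); the study's actual prompts differ.
PROMPT_CONFIGS = {
    "basic_rubric": "Grade the submission against the structured rubric below.\n{rubric}",
    "rubric_plus_reference": "Grade against the rubric and this reference solution.\n{rubric}\n{reference}",
    "rubric_plus_exemplars": "Grade against the rubric; match the style of these feedback exemplars.\n{rubric}\n{exemplars}",
    "hybrid": "Grade against the rubric, reference solution, and feedback exemplars.\n{rubric}\n{reference}\n{exemplars}",
}

MODELS = ["chatgpt-4o", "gemini-2.5-pro", "claude-sonnet-4", "deepseek-v3"]

# The six pedagogical criteria rated by human evaluators.
CRITERIA = ("Correctness", "Clarity", "Depth", "Consistency", "Usefulness", "Strictness")


@dataclass
class FeedbackInstance:
    submission_id: str
    model: str
    prompt_config: str
    feedback_text: str
    # Human ratings (e.g. 1-10) filled in during the evaluation phase.
    ratings: dict = field(default_factory=lambda: {c: None for c in CRITERIA})


def generate_feedback(model: str, prompt: str, submission: str) -> str:
    """Placeholder for an actual API call to the named model (hypothetical)."""
    return f"[{model}] feedback for: {submission[:30]}..."


def run_study(submissions: dict[str, str]) -> list[FeedbackInstance]:
    """Produce one feedback instance per submission x model x prompt configuration."""
    instances = []
    for (sid, text), model, (cfg_name, template) in product(
        submissions.items(), MODELS, PROMPT_CONFIGS.items()
    ):
        prompt = template.format(rubric="<rubric>", reference="<reference>", exemplars="<exemplars>")
        feedback = generate_feedback(model, prompt, text)
        instances.append(FeedbackInstance(sid, model, cfg_name, feedback))
    return instances


if __name__ == "__main__":
    demo = {"s001": "addi x5, x0, 10  # RISC-V exercise ..."}
    for inst in run_study(demo):
        print(inst.model, inst.prompt_config, inst.feedback_text)
```

Under this sketch, each submission yields one feedback instance per model and prompt configuration, and the ratings dictionary is what the human evaluators would later fill in for the six criteria.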
Results demonstrate that current LLMs can produce technically accurate
and pedagogically valuable feedback when supported by well-designed
prompts. Gemini 2.5 Pro achieved the highest overall correctness
(mean 8.71/10), while ChatGPT-4o exhibited the most balanced performance
across all criteria. Claude Sonnet 4 demonstrated superior clarity and
pedagogical tone, and DeepSeek-V3 provided a cost-efficient open-weight
alternative. Critically, prompt engineering influenced feedback quality
comparably to model selection: enriched prompts improved correctness by
up to 17%, with the hybrid prompt consistently yielding superior results
across all models.
These findings suggest that LLM-assisted feedback systems offer
substantial potential for addressing scalability challenges in technical
education. Practical implications include deployment as draft feedback
generators for teaching assistants, enabling consistent, detailed
feedback for all students while preserving essential human oversight.
With careful prompt engineering and transparent implementation, LLMs can
help make personalized formative assessment achievable at scale.
APA:
Baumeister, T., Abdennadher, H., & Fey, D. (2026). Addressing the Feedback Bottleneck in Large Technical Courses: A Comparative Study of LLM-Assisted Assessment in Computer Architecture. In Proceedings of the 20th Annual International Technology, Education and Development Conference. Valencia, Spain.
MLA:
Baumeister, Tobias, Hazem Abdennadher, and Dietmar Fey. "Addressing the Feedback Bottleneck in Large Technical Courses: A Comparative Study of LLM-Assisted Assessment in Computer Architecture." Proceedings of the 20th Annual International Technology, Education and Development Conference, 2026, Valencia, Spain.