Authors: Dannath, Jesper; Deriyeva, Alina; Paaßen, Benjamin; Schulz, Sandra; Kiesler, Natalie
Date: 2024-09-03
URI: https://dl.gi.de/handle/20.500.12116/44545
Abstract: Intelligent Tutoring Systems require student modeling in order to make pedagogical decisions, such as individualized feedback or task selection. Typically, student modeling is based on the eventual correctness of tasks. However, for multi-step or iterative learning tasks, as in programming, the intermediate states on the way to a correct solution also carry crucial information about learner skill. We investigate how to detect learners who struggle on their path towards a correct solution of a task. Prior work addressed struggle detection in programming environments at different granularity levels, but has mostly focused on preventing course dropout. We conducted a pilot study of our programming learning environment and evaluated different approaches to struggle detection at the task level. To evaluate these measures, we use downstream Item Response Theory competency models. We find that detecting struggle based on large language model text embeddings outperforms the chosen baselines with regard to correlation with a programming competency proxy.
Language: en
Keywords: Intelligent Tutoring Systems; Item Response Theory; Struggle; Large Language Models
Title: Evaluating Task-Level Struggle Detection Methods in Intelligent Tutoring Systems for Programming
Type: Text/Conference paper
DOI: 10.18420/delfi2024_07