Robin Schmucker

8227 Gates Hillman Center

4902 Forbes Ave

Pittsburgh, PA 15213

Welcome! I am a postdoctoral researcher with joint affiliations at Carnegie Mellon University, under the guidance of Prof. Tom Mitchell, and the University of California, Berkeley, where I collaborate with Prof. Zachary Pardos.

My research focuses on machine learning and human-AI interaction, particularly in the context of large-scale online education. I develop data-driven algorithms and systems to promote more effective, scalable, and equitable learning experiences. Some questions I am actively pursuing:

What can we learn about student knowledge acquisition using modern machine learning and robust statistical methods? [1, 2, 3, 4]
How can reinforcement learning help us understand the effects of instructional materials and refine the abilities of learning systems? [1, 2]
How can generative AI facilitate structured conversational learning activities and foster new types of content authoring tools? [1, 2]

We are grateful to collaborate with the CK-12 Foundation, where our algorithms for student knowledge modeling and content selection benefit millions of learners worldwide.

Previously, I completed my PhD in the Machine Learning Department at CMU. Before that, I studied computer science at KIT in Germany and was a research assistant at TECO with Prof. Michael Beigl. Supported by the CLICS fellowship, I worked on human-robot interaction advised by Prof. Manuela Veloso. In industry, I was a research intern at AWS, where I designed new algorithms for multi-objective hyperparameter optimization and contributed to AutoGluon.

Research opportunities: I am happy to collaborate, discuss research and answer questions about CMU’s academic programs. If you are interested, please feel free to send me an email.

I am on the job market for positions in machine learning, human-AI interaction, education and related disciplines.

news

Dec 18, 2024	Our paper on AI Mentors for Student Projects got accepted as a spotlight at AAAI-iRaise.
Nov 05, 2024	Honored to give talks about LLM-based conversational tutoring at UPenn and WPI.
Oct 03, 2024	I successfully defended my PhD thesis on Sequence-Modeling for Assessments and Interventions in Intelligent Tutoring Systems. I am deeply thankful to the many friends and collaborators who contributed to my dissertation, both directly and indirectly.
Jul 28, 2024	Our project Artificial Mentors for Student-Driven Projects won a Tools Competition Catalyst award. We develop LLM-based technologies to support project-based learning activities.
Jul 06, 2024	Looking forward to two weeks of insightful discussions and presentations at AIED and L@S. We are honored to present several of our recent works [1,2,3].

selected publications

AIED

Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System

Robin Schmucker, Meng Xia, Amos Azaria, and Tom Mitchell

In Proceedings of the 25th International Conference on Artificial Intelligence in Education, 2024

Conversational tutoring systems (CTSs) offer learning experiences through interactions based on natural language. They are recognized for promoting cognitive engagement and improving learning outcomes, especially in reasoning tasks. Nonetheless, the cost associated with authoring CTS content is a major obstacle to widespread adoption and to research on effective instructional design. In this paper, we discuss and evaluate a novel type of CTS that leverages recent advances in large language models (LLMs) in two ways: First, the system enables AI-assisted content authoring by inducing an easily editable tutoring script automatically from a lesson text. Second, the system automates the script orchestration in a learning-by-teaching format via two LLM-based agents (Ruffle&Riley) acting as a student and a professor. The system allows for free-form conversations that follow the ITS-typical inner and outer loop structure. We evaluate Ruffle&Riley’s ability to support biology lessons in two between-subject online user studies (N = 200) comparing the system to simpler QA chatbots and reading activity. Analyzing system usage patterns, pre/post-test scores and user experience surveys, we find that Ruffle&Riley users report high levels of engagement, understanding and perceive the offered support as helpful. Even though Ruffle&Riley users require more time to complete the activity, we did not find significant differences in short-term learning gains over the reading activity. Our system architecture and user study provide various insights for designers of future CTSs. We further open-source our system to support ongoing research on effective instructional design of LLM-based learning technologies.
L@S

Gaining Insights into Group-Level Course Difficulty via Differential Course Functioning

Frederik Baucks^*, Robin Schmucker^*, Conrad Borchers, Zachary A. Pardos, and Laurenz Wiskott

In Proceedings of the 11th ACM Conference on Learning @ Scale, 2024

Curriculum Analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. One desirable property of courses within curricula is that they are not unexpectedly more difficult for students of different backgrounds. While prior work points to likely variations in course difficulty across student groups, robust methodologies for capturing such variations are scarce, and existing approaches do not adequately decouple course-specific difficulty from students’ general performance levels. The present study introduces Differential Course Functioning (DCF) as an Item Response Theory (IRT)-based CA methodology. DCF controls for student performance levels and examines whether significant differences exist in how distinct student groups succeed in a given course. Leveraging data from over 20,000 students at a large public university, we demonstrate DCF’s ability to detect inequities in undergraduate course difficulty across student groups described by grade achievement. We compare major pairs with high co-enrollment and transfer students to their non-transfer peers. For the former, our findings suggest a link between DCF effect sizes and the alignment of course content to student home department motivating interventions targeted towards improving course preparedness. For the latter, results suggest minor variations in course-specific difficulty between transfer and non-transfer students. While this is desirable, it also suggests that interventions targeted toward mitigating grade achievement gaps in transfer students should encompass comprehensive support beyond enhancing preparedness for individual courses. By providing more nuanced and equitable assessments of academic performance and difficulties experienced by diverse student populations, DCF could support policymakers, course articulation officers, and student advisors.
LAK

Gaining Insights into Course Difficulty Variations Using Item Response Theory

Frederik Baucks^*, Robin Schmucker^*, and Laurenz Wiskott

In Proceedings of the 14th Learning Analytics and Knowledge Conference, 2024

Curriculum analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. To gain statistical robustness, most existing CA techniques rely on the assumption of time-invariant course difficulty, preventing them from capturing variations that might occur over time. However, ensuring low temporal variation in course difficulty is crucial to warrant fairness in treating individual student cohorts and consistency in degree outcomes. We introduce item response theory (IRT) as a CA methodology that enables us to address the open problem of monitoring course difficulty variations over time. We use statistical criteria to quantify the degree to which course performance data meets IRT’s theoretical assumptions and verify validity and reliability of IRT-based course difficulty estimates. Using data from 664 Computer Science and 1,355 Mechanical Engineering undergraduate students, we show how IRT can yield valuable CA insights: First, by revealing temporal variations in course difficulty over several years, we find that course difficulty has systematically shifted downward during the COVID-19 pandemic. Second, time-dependent course difficulty and cohort performance variations confound conventional course pass rate measures. We introduce IRT-adjusted pass rates as an alternative to account for these factors. Our findings affect policymakers, student advisors, accreditation, and course articulation.
@inproceedings{Baucks2024:IRT,
  author = {Baucks, Frederik and Schmucker, Robin and Wiskott, Laurenz},
  title = {Gaining Insights into Course Difficulty Variations Using Item Response Theory},
  year = {2024},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3636555.3636902},
  booktitle = {Proceedings of the 14th Learning Analytics and Knowledge Conference},
  pages = {450--461},
  numpages = {12},
  series = {LAK '24},
}
ECTEL

Learning to Give Useful Hints: Assistance Action Evaluation and Policy Improvements

Robin Schmucker, Nimish Pachapurkar, Shanmuga Bala, Miral Shah, and Tom Mitchell

In Proceedings of the 18th European Conference on Technology Enhanced Learning, 2023

We describe a fielded online tutoring system that learns which of several candidate assistance actions (e.g., one of multiple hints) to provide to students when they answer a practice question incorrectly. The system learns, from large-scale data of prior students, which assistance action to give for each of thousands of questions, to maximize measures of student learning outcomes. Using data from over 190,000 students in an online Biology course, we quantify the impact of different assistance actions for each question on a variety of outcomes (e.g., response correctness, practice completion), framing the machine learning task as a multi-armed bandit problem. We study relationships among different measures of learning outcomes, leading us to design an algorithm that for each question decides on the most suitable assistance policy training objective to optimize central target measures. We evaluate the trained policy for providing assistance actions, comparing it to a randomized assistance policy in live use with over 20,000 students, showing significant improvements resulting from the system’s ability to learn to teach better based on data from earlier students in the course. We discuss our design process and challenges we faced when fielding data-driven technology, providing insights to designers of future learning systems.
@inproceedings{Schmucker2023:Learning,
  author = {Schmucker, Robin and Pachapurkar, Nimish and Bala, Shanmuga and Shah, Miral and Mitchell, Tom},
  title = {Learning to Give Useful Hints: Assistance Action Evaluation and Policy Improvements},
  booktitle = {Proceedings of the 18th European Conference on Technology Enhanced Learning},
  year = {2023},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {383--398},
}
ICCE

Transferable Student Performance Modeling for Intelligent Tutoring Systems

Robin Schmucker and Tom M. Mitchell

In Proceedings of the 30th International Conference on Computers in Education, 2022

Millions of students worldwide are now using intelligent tutoring systems (ITSs). At their core, ITSs rely on student performance models (SPMs) to trace each student’s changing ability level over time, in order to provide personalized feedback and instruction. Crucially, SPMs are trained using interaction sequence data of previous students to analyze data generated by future students. This induces a cold-start problem when a new course is introduced, because no students have yet taken the course and hence there is no data to train the SPM. Here, we consider transfer learning techniques to train accurate SPMs for new courses by leveraging log data from existing courses. We study two settings: (i) In the naive transfer setting, we first train SPMs on existing course data and then apply these SPMs to new courses without modification. (ii) In the inductive transfer setting, we fine tune these SPMs using a small amount of training data from the new course (e.g., collected during a pilot study). We evaluate the proposed techniques using student interaction sequence data from five different mathematics courses taken by over 47,000 students. The naive transfer models that use features provided by human domain experts (e.g., difficulty ratings for questions in the new course) but no student interaction training data for the new course, achieve prediction accuracy on par with standard BKT and PFA models that use training data from thousands of students in the new course. In the inductive setting our transfer approach yields more accurate predictions than conventional SPMs when only limited student interaction training data (<100 students) is available to both.
@inproceedings{Schmucker2022:Transferable,
  title = {Transferable Student Performance Modeling for Intelligent Tutoring Systems},
  author = {Schmucker, Robin and Mitchell, Tom M},
  booktitle = {Proceedings of the 30th International Conference on Computers in Education},
  publisher = {APSCE},
  year = {2022},
  pages = {13--23},
  address = {Kuala Lumpur, MY},
  series = {ICCE '22},
}
JEDM

Assessing the Knowledge State of Online Students-New Data, New Approaches, Improved Accuracy

Robin Schmucker, Jingbo Wang, Shijia Hu, Tom Mitchell, and others

Journal of Educational Data Mining, 2022

We consider the problem of assessing the changing performance levels of individual students as they go through online courses. This student performance modeling problem is a critical step for building adaptive online teaching systems. Specifically, we conduct a study of how to utilize various types and large amounts of log data from earlier students to train accurate machine learning models that predict the performance of future students. This study is the first to use four very large sets of student data made available recently from four distinct intelligent tutoring systems. Our results include a new machine learning approach that defines a new state of the art for logistic regression-based student performance modeling, improving over earlier methods in several ways: First, we achieve improved accuracy of student modeling by introducing new features that can be easily computed from conventional question-response logs (e.g., features such as the pattern in the student’s most recent answers). Second, we take advantage of features of the student history that go beyond question-response pairs (e.g., features such as which video segments the student watched, or skipped) as well as background information about prerequisite structure in the curriculum. Third, we train multiple specialized student performance models for different aspects of the curriculum (e.g., specializing in early versus later segments of the student history), then combine these specialized models to create a group prediction of the student performance. Taken together, these innovations yield an average AUC score across these four datasets of 0.808 compared to the previous best logistic regression approach score of 0.767, and also outperforming state-of-the-art deep neural net approaches. 
Importantly, we observe consistent improvements from each of our three methodological innovations, in each diverse dataset, suggesting that our methods are of general utility and likely to produce improvements for other online tutoring systems as well.
@article{Schmucker2022:Assessing,
  title = {Assessing the Knowledge State of Online Students-New Data, New Approaches, Improved Accuracy},
  author = {Schmucker, Robin and Wang, Jingbo and Hu, Shijia and Mitchell, Tom and others},
  journal = {Journal of Educational Data Mining},
  volume = {14},
  number = {1},
  pages = {1--45},
  year = {2022},
}