9 AI to Co-Create Grading Rubrics (Statistics & Data Science)
Author: Sally Ragsdale, Department of Statistics and Data Sciences, The University of Texas at Austin
Sally Ragsdale joined the Department of Statistics and Data Sciences in 2012 and is the course coordinator for SDS 320E Elements of Statistics as well as the undergraduate minor and certificate faculty advisor.
Cite this Chapter: Ragsdale, S. (2025). AI to Co-Create Grading Rubrics. In K. Procko, E. N. Smith, and K. D. Patterson (Eds.), Enhancing STEM Higher Education through Artificial Intelligence. The University of Texas at Austin. https://doi.org/10.15781/7dv5-g279
Description of resource(s):
This page summarizes my attempts to use both Microsoft Copilot and the free version of ChatGPT 4 to assist in creating grading keys for the homework assignments in my introductory applied statistics course. Included is a comparison of the two generative AI tools, examples of AI-generated solutions and grading rubrics, and feedback from my undergraduate course assistants (who use the keys).
Links to Resources:
Why I implemented this:
The goal of this project was to explore ways I could use generative AI tools to help create answer keys and grading rubrics for my introductory statistics class. In this course, undergraduate course assistants (UGCAs) grade the homework assignments and need robust, clearly defined grading schemes that can be applied fairly across students. Creating these keys is often a time-consuming and tedious task, so I wanted a more efficient way to produce homework solutions that include point breakdowns for each problem and instructions for how much partial credit to award for common errors.
I input a combination of quantitative and open-response homework problems into Copilot and ChatGPT and prompted them to 1) produce correct solutions, which I checked against those I had previously prepared, 2) include point breakdowns and specific deductions for common mistakes, and 3) output an organized document with clear, concise grading notes that would be easy for UGCAs to use when grading actual student work.
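My exact prompt went through several rounds of refinement (see attachment), but as a rough sketch, a request covering these three requirements might read: "You are the instructor for an introductory applied statistics course. Attached is a homework assignment in a .docx file. Create a grading key for undergraduate course assistants: give the full, correct solution to each problem, assign a point value to each part, and list specific deductions for common mistakes. Keep the grading notes brief and organize the key by problem number, matching the numbering in the assignment." The wording here is illustrative only, not the prompt I ultimately used.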
After refining my prompts to achieve the best possible output, I gave three of my UGCAs both a completely human-generated grading key and a hybrid key that contained my edits to the AI-generated content.
My main takeaways:
- AI Tool Comparison: ChatGPT was superior to Copilot, both in generating correct solutions and in outputting fully formatted keys.
- Pros: The keys ChatGPT produced often gave reasonable point breakdowns for multi-step problems and correct answers to calculation-based problems.
- Cons: The ChatGPT keys often contained errors, including:
  - Mis-numbered problems.
  - Only hypothetical answers to problems that required interpreting a graph, analyzing a dataset in R, or referencing an attached article.
  - Incomplete or vague answers (e.g., stating "use the median" without actually calculating the value).
  - Repetitive comments and overly wordy grading notes.
- UGCA Feedback: The UGCAs preferred the human-generated key overall, although they found some aspects of the AI-generated key helpful when grading, such as its clearer breakdown of point values and explanations.
What else should I consider?
Timing: Refining the prompt I fed into Copilot and ChatGPT took about an hour, although I have experience developing detailed prompt instructions for these programs. Once I had the right combination of instructions (see attachment), I could easily copy/paste the prompt and reuse it for any number of homework assignments. After getting the output key/rubric, I then spent quite a bit of time cross-checking the AI-generated answers against my own to catch mistakes, re-numbering the problems to match the assignment, and writing in any answers the AI was not able to produce.
Context: I think this resource could be helpful for instructors in a variety of disciplines and would work for assignments that contain multi-step calculations and/or open-ended interpretations of problems stated completely in words (no graphs or tables).
Adding Student Self-Reflection: Given the ease with which anyone can input even complex, open-ended assignment questions into these programs and get a fairly well-constructed and (most often) correct response, it is important to consider other ways of assessing students' understanding of course topics. Especially in our large service courses, where many students lack the motivation to actually learn the material, we routinely encounter students using generative AI unethically to complete assignments. In an online course I developed last summer, I experimented with restructuring homework assignments into two parts: a low-stakes quiz, in which students have multiple attempts to submit quantitative answers that are auto-graded in Canvas, followed by a written assignment in which they must explain the reasoning behind their answers, interpret their results, and discuss the challenges they faced while working on the problem. Not only did these assignments allow students to reflect on their own learning, they also required students to show that they had actually thought through how to solve the problem, which I hoped would discourage unethical use of generative AI tools.
Crafting prompts: It is important to be very specific with prompts given to ChatGPT, including information about who you are, who your audience is, and what format you want the output in. ChatGPT was easily able to read a .docx file of assignment questions and output a key in the same format. If the key is too repetitive, adding a sentence to the prompt instructing it to give more concise notes is an easy fix. With a little calibration, ChatGPT was able to output what I wanted.
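For example (illustrative wording only, not my exact instruction), appending a sentence such as "Limit each grading note to one short sentence and do not repeat the same deduction across problems" is the kind of small addition that can rein in repetitive or wordy grading notes.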