Allowing ChatGPT use on knowledge-based coursework improved short-term student performance in several classes, with more interactive use linked to modestly higher scores; students valued its speed and organizational support but warned of inaccuracies and overreliance, prompting calls for explicit AI-literacy instruction.
Abstract

This study investigates how university students engage with generative artificial intelligence (GenAI), specifically ChatGPT, when completing knowledge-based academic tasks across six courses and two institutions. By comparing performance and perceptions in engineering and non-engineering subjects, the study examines whether students can use GenAI effectively without prior training and to what extent such tools meaningfully support learning. The work also explores how these findings may inform future research on accessible and inclusive learning design. A multi-method design was employed with 254 undergraduate and graduate students assigned to experimental groups (allowed to use ChatGPT) or control groups (restricted to traditional, non-GenAI resources). Quantitative analyses included descriptive statistics, a general linear model, and non-parametric comparisons, complemented by a topic-based analysis of open-ended survey responses addressing students’ perceptions, usage patterns, and desired functionalities. Students in the experimental groups generally obtained higher scores, with significant improvements in several subjects (e.g., computer systems administration, informatics, childhood disorders). A weak but significant positive correlation emerged between iterative engagement with ChatGPT (edits) and academic performance. Qualitative analysis showed that students valued ChatGPT for fast information access, clarification of concepts, and organizational support, while also expressing concerns about inaccuracies, overreliance, and limitations of free versions. GenAI can enhance student performance when used actively and reflectively, although its effectiveness varies by disciplinary context. The findings highlight the need for explicit AI-literacy instruction to ensure critical and responsible use.
While the study does not directly address disability or accessibility outcomes, the qualitative patterns suggest potential intersections with inclusive and multimodal learning design, pointing to promising avenues for future research.
Summary
Main Finding
Students given unrestricted access to ChatGPT generally scored higher on short, knowledge-based academic tasks than students restricted to traditional resources; gains were significant in several courses (e.g., Computer Systems Administration, Informatics, Childhood Disorders). Iterative, active engagement with the model (measured by edits/prompts) showed a weak but significant positive correlation with performance. Qualitative responses indicate students value speed, clarification, and organization support from ChatGPT, but worry about inaccuracies, over-reliance, and limits of free versions. The paper concludes GenAI can boost learning outcomes when used reflectively, but effectiveness varies by discipline and requires explicit AI-literacy instruction.
Key Points
- Design: Multi-method, multi-institutional experimental study comparing experimental (allowed ChatGPT) vs control (no GenAI) conditions on identical knowledge questions.
- Sample: 254 students (133 experimental, 121 control) across six courses at two Spanish universities; predominantly engineering students (219), 24% female in engineering courses.
- No formal GenAI training was given; under 7% of participants reported prior instruction, so the study captures naturalistic, untrained use.
- Assessment: identical question sets and rubrics for both groups to isolate effect of GenAI access; additional logging of interaction metrics (prompts/edits).
- Quantitative analysis: descriptive stats, general linear model (GLM), and non-parametric tests. Main quantitative findings:
- Experimental group generally outperformed control.
- Significant improvements in specific subjects (e.g., CSA, INF, CD).
- Weak but statistically significant positive correlation between iterative engagement with ChatGPT and higher scores.
- Qualitative analysis: topic-based coding of open-ended responses revealed:
- Perceived benefits: fast access to information, concept clarification, help with organization and synthesis.
- Perceived drawbacks: factual errors, risk of over-reliance, limited domain specificity, constraints of free model versions.
- Practical notes: a few participants did not follow assigned conditions and were reclassified according to actual behavior; the activity was optional in some courses, creating group size imbalances.
- Limitations acknowledged: disciplinary variation in GenAI utility (less effective for high-order analytical engineering tasks), gender and sample composition imbalances, no direct evidence on outcomes for students with disabilities (though implications for inclusive design are discussed).
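The edit–score relationship described above can be illustrated with a small, self-contained sketch. The data below are hypothetical (not the study's), and a Spearman rank correlation stands in for whatever exact statistic the authors used; the point is only to show how a weak positive association between edit counts and scores is computed.

```python
# Illustrative sketch only: hypothetical data, not the study's, and the
# statistic (Spearman's rho) is an assumption about the analysis.

def ranks(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical per-student data: edits made to ChatGPT output, task score (0-10)
edits  = [0, 1, 1, 2, 3, 3, 4, 5, 6, 8]
scores = [6.0, 5.0, 7.0, 6.5, 4.5, 8.0, 6.0, 7.5, 5.5, 7.0]
rho = spearman(edits, scores)  # weak positive for this toy data (~0.19)
```

A rho of this magnitude mirrors the paper's "weak but significant" pattern: detectable in a sample of 254, but far from deterministic at the individual level.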
Data & Methods
- Participants: 254 undergraduate and graduate students from University of Salamanca and University of León across six courses (four engineering-related; two education/psychology-related).
- Assignment: systematic sampling into experimental (n=133) or control (n=121) groups; a few participants who did not follow their assigned condition were reclassified to reflect actual tool use.
- Intervention: Experimental group could use ChatGPT (and other GenAI); control group restricted to non-GenAI materials (notes, textbooks, web resources without GenAI).
- Tasks: Same set of short, knowledge-based questions per course; scoring rubric and conditions identical across arms.
- Quantitative analyses:
- Descriptive statistics of scores by course and group.
- General Linear Model to assess effect of condition controlling for covariates.
- Non-parametric comparisons where appropriate.
- Correlation analysis between interaction metrics (number of edits/prompts) and task scores.
- Qualitative analyses:
- Topic-based coding of open-ended survey items about perceived usefulness, reliability, usage patterns, and desired features.
- Key robustness/validity choices: identical tests to isolate tool effect; naturalistic setting (no training) to observe unaided appropriation.
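As a concrete illustration of the non-parametric group comparison listed above, here is a minimal stdlib-only sketch. The scores are hypothetical (not the study's data), the test shown (Mann-Whitney U with a normal approximation) is an assumption about which non-parametric test was used, and values are taken as distinct so no tie correction is needed.

```python
# Minimal sketch, not the authors' code: Mann-Whitney U comparison of
# hypothetical experimental (ChatGPT-allowed) vs control scores.
import math

def mann_whitney_u(a, b):
    """Return (U for group a, two-sided p via normal approximation).
    Assumes all values are distinct, so no tie correction is applied."""
    n1, n2 = len(a), len(b)
    rank = {v: i + 1 for i, v in enumerate(sorted(a + b))}
    r1 = sum(rank[v] for v in a)               # rank sum of group a
    u1 = r1 - n1 * (n1 + 1) / 2                # U statistic
    mu = n1 * n2 / 2                           # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided p = 2*(1 - Phi(|z|))
    return u1, p

# Hypothetical task scores on a 0-10 scale
experimental = [7.1, 8.2, 6.9, 7.8, 8.5, 7.4, 6.6, 8.0]  # ChatGPT allowed
control      = [6.1, 5.8, 7.0, 6.4, 5.5, 6.8, 6.2, 7.2]  # no GenAI
u, p = mann_whitney_u(experimental, control)
```

With real data of n=254 one would use an exact or tie-corrected implementation; the normal approximation here only keeps the sketch dependency-free.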
Implications for AI Economics
- Human capital and productivity:
- Short-run productivity gains: Access to GenAI (ChatGPT) improves performance on certain knowledge tasks, implying potential near-term gains in student productivity and learning efficiency.
- Complementarity vs substitution: Gains depended on active, iterative use—suggesting GenAI acts as a complement to student effort and skill (those who use it reflectively benefit most) rather than a pure substitute for learning.
- Heterogeneous returns: Effect sizes vary by discipline; GenAI may raise returns to skills emphasizing synthesis and conceptual recall more than high-order analytical/problem-solving skills in STEM—affecting how human capital investment returns differ across fields.
- Labor market and skill demand:
- Increased demand for AI-literacy: Educational institutions and employers will benefit from investing in AI-literacy training to realize GenAI complementarities; lack of instruction reduces potential gains and raises risks (misinformation, dependency).
- Credentialing and assessment redesign: If GenAI materially aids routine knowledge tasks, credentialing systems may need to shift toward assessments of higher-order, domain-specific reasoning and human-AI collaboration skills.
- Access, equity, and market implications:
- Access matters: Widespread, often free access to GenAI can reduce time and search costs for students, but differential effective use (due to prior skills, training, gendered participation rates, or device access) could exacerbate inequalities unless AI-literacy and supervised deployment are scaled equitably.
- Market for complementary services: Positive outcomes without training highlight a baseline utility of GenAI, but the demonstrated benefits of iterative, skillful use suggest commercial opportunities for education providers offering structured AI-usage curricula, scaffolding tools, or domain-tuned models.
- Public policy and institutional investment:
- Cost-effectiveness: GenAI can be a low-cost lever to improve learning outcomes in some domains, but net welfare gains depend on investment in instruction, oversight, and assessment redesign to mitigate misuse and inaccuracies.
- Regulation and standards: Findings strengthen the case for institutional policies on acceptable use, transparency of AI-assisted work, and standards for AI integration into graded activities.
- Research and macro implications:
- Aggregate productivity: If similar improvements generalize beyond this setting, adoption of GenAI in education could accelerate skill acquisition at scale, potentially affecting the future labor supply quality and the pace of technological diffusion.
- Need for field-specific evaluation: Economic models of AI’s impact should account for heterogeneity across domains and the role of user skill and training in converting access into productivity gains.
Limitations for economic interpretation: effects are task- and discipline-specific, sample is university students (mostly engineering in Spain), and the study captures short-run assessment outcomes rather than long-run learning or labor-market impacts. Future economic work should estimate long-term returns to AI-augmented education, distributional effects, and cost–benefit of training programs.
Assessment
Claims (13)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Allowing students to use ChatGPT on knowledge-based academic tasks led to generally higher scores compared with control groups restricted to non-GenAI resources. | Output Quality | positive | medium | student task/course scores (short-term performance on knowledge-based tasks) | n=254; statistically significant (overall); 0.36 |
| The improvement from allowing ChatGPT use was statistically significant in specific courses (examples named: computer systems administration, informatics, childhood disorders). | Output Quality | positive | medium | course/task scores within specified courses | n=254; statistically significant in some courses; 0.36 |
| There is a weak but statistically significant positive relationship between iterative engagement with ChatGPT (measured by number of edits to the tool's outputs) and better academic performance. | Output Quality | positive | medium | student task/course scores (correlated with number of edits) | n=254; weak but statistically significant positive correlation; 0.36 |
| Students reported that ChatGPT provided faster access to information, helped clarify concepts, and aided organization (e.g., outlining and summarizing). | Worker Satisfaction | positive | medium | student-reported perceived usefulness/benefits | n=254; 0.36 |
| Students raised concerns about ChatGPT producing factual errors, the risk of overreliance that could reduce independent thinking, and functional constraints of free ChatGPT versions. | AI Safety and Ethics | negative | medium | student-reported concerns and perceived risks | n=254; 0.36 |
| Effectiveness of ChatGPT varied by discipline; not all course contexts showed significant gains from allowing its use. | Output Quality | mixed | medium | course/task scores (heterogeneous effects across disciplines) | n=254; heterogeneous by discipline; 0.36 |
| The study focused on short-term, knowledge-based tasks and did not measure long-term learning or retention. | Skill Acquisition | null result | high | long-term learning/retention (not measured) | n=254; 0.6 |
| The study did not directly measure accessibility or impacts on students with disabilities, though qualitative results suggest possible intersections with inclusive and multimodal learning design. | Other | null result | high | accessibility/disability-related educational outcomes (not measured) | n=254; 0.6 |
| Based on findings and student-reported concerns, the authors recommend integrating explicit AI-literacy instruction to support critical and reflective use of GenAI tools in education. | Training Effectiveness | positive | medium | recommendation for AI-literacy instruction (policy/educational intervention) | 0.36 |
| The study employed a multi-method approach combining experimental quantitative analysis (descriptives, GLM, non-parametric robustness checks) with qualitative topic-based coding of open-ended survey responses. | Research Productivity | null result | high | study methodology (mixed-methods design) | n=254; 0.6 |
| Observed higher short-term performance and the positive correlation with iterative engagement imply that GenAI can augment short-term academic productivity and that benefits depend partly on active, skillful user interaction (complementarity). | Output Quality | positive | speculative | short-term academic productivity (inferred complementarity interpretation) | n=254; interpretive inference of complementarity; 0.06 |
| Differential access to higher-quality (paid) versus free GenAI tools and differing ability to engage with the tool could widen inequality among students and institutions. | Inequality | negative | speculative | equity/inequality in access and learning outcomes (not directly measured) | 0.06 |
| The study has potential selection and ecological-validity constraints because it was conducted at two institutions across six courses, limiting generalizability. | Research Productivity | null result | high | external validity/generalizability (limitation) | n=254; 0.6 |