Allowing ChatGPT on knowledge-based coursework improves short-term student performance in several classes, with more interactive use linked to modestly higher scores; students value speed and organization but warn of inaccuracies and overreliance, prompting calls for explicit AI-literacy instruction.
Abstract
This study investigates how university students engage with generative artificial intelligence (GenAI), specifically ChatGPT, when completing knowledge-based academic tasks across six courses and two institutions. By comparing performance and perceptions in engineering and non-engineering subjects, the study examines whether students can use GenAI effectively without prior training and to what extent such tools meaningfully support learning. The work also explores how these findings may inform future research on accessible and inclusive learning design. A multi-method design was employed with 254 undergraduate and graduate students assigned to experimental groups (allowed to use ChatGPT) or control groups (restricted to traditional, non-GenAI resources). Quantitative analyses included descriptive statistics, a general linear model, and non-parametric comparisons, complemented by a topic-based analysis of open-ended survey responses addressing students’ perceptions, usage patterns, and desired functionalities. Students in the experimental groups generally obtained higher scores, with significant improvements in several subjects (e.g., computer systems administration, informatics, childhood disorders). A weak but significant positive correlation emerged between iterative engagement with ChatGPT (edits) and academic performance. Qualitative analysis showed that students valued ChatGPT for fast information access, clarification of concepts, and organizational support, while also expressing concerns about inaccuracies, overreliance, and limitations of free versions. GenAI can enhance student performance when used actively and reflectively, although its effectiveness varies by disciplinary context. The findings highlight the need for explicit AI-literacy instruction to ensure critical and responsible use.
While the study does not directly address disability or accessibility outcomes, the qualitative patterns suggest potential intersections with inclusive and multimodal learning design, pointing to promising avenues for future research.
Summary
Main Finding
Allowing students to use ChatGPT on knowledge-based academic tasks led to generally higher scores—significantly so in some courses—and a weak but significant positive relationship between iterative engagement with the tool (number of edits) and performance. Students saw value in ChatGPT for quick information access, clarification, and organization, but raised concerns about inaccuracies, overreliance, and limitations of free versions. Effectiveness varied by discipline, and the authors recommend explicit AI-literacy instruction to support critical, reflective use.
Key Points
- Sample: 254 undergraduate and graduate students across six courses at two institutions (engineering and non-engineering subjects).
- Design: Multi-method randomized/experimental assignment to groups allowed to use ChatGPT vs. control groups restricted to traditional, non-GenAI resources.
- Quantitative methods: Descriptive statistics, general linear model (GLM), and non-parametric comparisons.
- Qualitative methods: Topic-based analysis of open-ended survey responses on perceptions, usage patterns, and desired features.
- Outcomes:
  - Experimental groups generally scored higher; significant improvements in specific subjects (e.g., computer systems administration, informatics, childhood disorders).
  - Weak but significant positive correlation between iterative engagement (making edits to ChatGPT outputs) and better academic performance.
  - Student-reported benefits: faster access to information, concept clarification, organizational aid (e.g., outlining, summarizing).
  - Student-reported concerns: factual errors, risk of overreliance (reducing independent thinking), and constraints in free ChatGPT versions.
- Accessibility/disability: Not directly measured, but qualitative results suggest possible intersections with inclusive and multimodal learning design.
- Recommendation: Integrate explicit AI literacy instruction to foster critical and reflective use of GenAI tools.
Data & Methods
- Participants: 254 students (mix of undergraduate/graduate), across six courses spanning engineering and non-engineering disciplines, at two institutions.
- Experimental manipulation: Assignment to either (a) allowed use of ChatGPT for completing tasks, or (b) control—restricted to traditional, non-GenAI resources.
- Quantitative analyses:
  - Descriptive statistics to compare group outcomes.
  - General linear model to control for covariates and estimate treatment effects.
  - Non-parametric tests to assess robustness of differences across groups/courses.
  - Correlational analysis between usage behavior (e.g., number of edits) and scores.
- Qualitative analyses:
  - Topic-based coding of open-ended survey responses capturing student perceptions, use cases, desired functionalities, and concerns.
- Limitations noted or implied:
  - Focus on short-term, knowledge-based tasks; no direct measurement of long-term learning or retention.
  - Heterogeneous effects by discipline; not all course contexts show significant gains.
  - No direct measurement of accessibility outcomes or impacts on students with disabilities.
  - Potential selection and ecological-validity constraints (two institutions, six courses).
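The quantitative pipeline above can be sketched end to end. This is a hypothetical illustration on simulated data, not the authors' code: the column names (`score`, `group`, `course`, `edits`), the simulated effect sizes, and the choice of statsmodels/scipy are all assumptions.

```python
# Hypothetical sketch of the study's quantitative pipeline on simulated data.
# Column names, effect sizes, and library choices are assumptions, not details
# taken from the paper.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 254  # sample size reported in the study
df = pd.DataFrame({
    "group": rng.choice(["chatgpt", "control"], n),            # treatment arm
    "course": rng.choice([f"course_{i}" for i in range(6)], n),
    "edits": rng.poisson(3, n),                                # iterative engagement
})
# Simulate scores with a small treatment effect and an edits effect
df["score"] = (70
               + 2.0 * (df["group"] == "chatgpt")
               + 0.5 * df["edits"]
               + rng.normal(0, 8, n)).clip(0, 100)

# Descriptive statistics by group
print(df.groupby("group")["score"].agg(["mean", "std", "count"]))

# General linear model: treatment effect controlling for course
glm = smf.ols("score ~ C(group) + C(course)", data=df).fit()
print(glm.params)

# Non-parametric robustness check of the group difference
u, p = stats.mannwhitneyu(df.loc[df.group == "chatgpt", "score"],
                          df.loc[df.group == "control", "score"])
print(f"Mann-Whitney U p-value: {p:.3f}")

# Correlation between iterative engagement (edits) and performance
rho, p_rho = stats.spearmanr(df["edits"], df["score"])
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```

A Spearman correlation is used for the edits-score relationship because edit counts are discrete and skewed; the study reports only a "weak but significant" positive association, so the specific test shown here is a plausible stand-in rather than the authors' exact procedure.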
Implications for AI Economics
- Productivity and human capital formation:
  - GenAI can raise short-term academic productivity (higher task performance), suggesting potential for AI to augment learning efficiency, an input to human capital accumulation.
  - The positive correlation with iterative engagement implies complementarities: gains depend on active user interaction rather than passive use, consistent with models in which AI complements skilled effort.
- Skill-biased technological change and heterogeneity:
  - Variable effectiveness across disciplines indicates heterogeneous returns to AI adoption across fields, implying uneven skill-augmenting effects and possible shifts in comparative advantage across specializations.
- Investment in AI literacy as economic policy:
  - AI literacy functions as a necessary complement for realizing AI benefits. Investments in training are analogous to investments in complementary capital that raise returns to AI, and are critical for equitable adoption and for avoiding lock-in to low-value usage patterns.
- Market for educational services and credentialing:
  - If GenAI improves performance on knowledge-based tasks, it may change the signaling value of assessments and credentials, prompting redesigns of evaluation, proctoring, and curricula, and affecting demand for tutoring, course-design services, and assessment technologies.
- Distributional concerns and access:
  - Differential access to high-quality GenAI (free vs. paid versions) and differing abilities to engage could widen inequality among students and institutions, mirroring digital-divide dynamics in labor markets.
- Labor market and task reallocation:
  - Improved speed of knowledge acquisition could alter labor-market entry and task allocation in jobs requiring domain knowledge, accelerating deskilling for routine tasks while increasing demand for higher-order skills (critical thinking, evaluation).
- Externalities and regulation:
  - Risks from inaccuracies and overreliance highlight negative externalities (misinformation, lowered skill development) that may motivate educational policy and standards for AI use, certification of AI literacy, or guidance on permissible use in assessments.
- Research and evaluation priorities for economics:
  - Cost-benefit analyses comparing gains from GenAI-enabled learning against costs of training, subscription access, and potential long-term learning impacts.
  - Longitudinal studies to quantify effects on human capital accumulation, labor-market outcomes, and inequality.
  - Structural work estimating complementarities/substitutability between AI tools and student effort or instructor inputs to inform optimal investment and policy design.
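The complementarity point above (benefits depend on active engagement rather than passive use) can be illustrated with a standard CES learning-production function. This is a textbook framing chosen for illustration, not a model estimated in the study:

```latex
S = \bigl(\alpha A^{\rho} + (1-\alpha)\,E^{\rho}\bigr)^{1/\rho}, \qquad \rho < 1,
```

where \(S\) is task performance, \(A\) the effective AI input, \(E\) student effort, and \(\sigma = 1/(1-\rho)\) the elasticity of substitution. For \(\rho < 1\) the cross-partial \(\partial^2 S / \partial A\,\partial E > 0\): additional effort raises the marginal return to the AI tool, which is one way to formalize the observed edit-score association.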
Suggested next empirical steps for researchers in AI economics: measure long-term retention and labor-market signaling changes, estimate heterogeneous treatment effects by discipline and socioeconomic status, evaluate paid vs. free tool differentials, and model optimal public/private investments in AI literacy and access.
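One of the suggested next steps, estimating heterogeneous treatment effects by discipline and socioeconomic status, can be sketched with an interaction-term regression. All data below are simulated and the variable names (`treated`, `discipline`, `ses`) are illustrative assumptions, not from the study.

```python
# Hypothetical sketch: heterogeneous treatment effects via interaction terms.
# Simulated data; variable names and effect sizes are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 600
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),  # 1 = ChatGPT use allowed
    "discipline": rng.choice(["engineering", "non_engineering"], n),
    "ses": rng.normal(0, 1, n),        # socioeconomic-status index (simulated)
})
# Simulate a treatment effect that differs by discipline
true_effect = np.where(df["discipline"] == "engineering", 1.0, 4.0)
df["score"] = (70 + true_effect * df["treated"]
               + 1.5 * df["ses"] + rng.normal(0, 8, n))

# The treated:discipline interaction captures effect heterogeneity by field;
# adding a treated:ses term would probe heterogeneity by socioeconomic status.
model = smf.ols("score ~ treated * C(discipline) + ses", data=df).fit()
print(model.params)
```

The coefficient on the `treated:C(discipline)` interaction estimates how much the treatment effect differs between disciplines, which is the quantity a heterogeneous-effects analysis of this study would target.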
Assessment
Claims (13)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Allowing students to use ChatGPT on knowledge-based academic tasks led to generally higher scores compared with control groups restricted to non-GenAI resources. | Output Quality | positive | medium | student task/course scores (short-term performance on knowledge-based tasks) | n=254; statistically significant (overall); 0.36 |
| The improvement from allowing ChatGPT use was statistically significant in specific courses (examples named: computer systems administration, informatics, childhood disorders). | Output Quality | positive | medium | course/task scores within specified courses | n=254; statistically significant in some courses; 0.36 |
| There is a weak but statistically significant positive relationship between iterative engagement with ChatGPT (measured by number of edits to the tool's outputs) and better academic performance. | Output Quality | positive | medium | student task/course scores (correlated with number of edits) | n=254; weak but statistically significant positive correlation; 0.36 |
| Students reported that ChatGPT provided faster access to information, helped clarify concepts, and aided organization (e.g., outlining and summarizing). | Worker Satisfaction | positive | medium | student-reported perceived usefulness/benefits | n=254; 0.36 |
| Students raised concerns about ChatGPT producing factual errors, the risk of overreliance that could reduce independent thinking, and functional constraints of free ChatGPT versions. | AI Safety and Ethics | negative | medium | student-reported concerns and perceived risks | n=254; 0.36 |
| Effectiveness of ChatGPT varied by discipline; not all course contexts showed significant gains from allowing its use. | Output Quality | mixed | medium | course/task scores (heterogeneous effects across disciplines) | n=254; heterogeneous by discipline; 0.36 |
| The study focused on short-term, knowledge-based tasks and did not measure long-term learning or retention. | Skill Acquisition | null_result | high | long-term learning/retention (not measured) | n=254; 0.6 |
| The study did not directly measure accessibility or impacts on students with disabilities, though qualitative results suggest possible intersections with inclusive and multimodal learning design. | Other | null_result | high | accessibility/disability-related educational outcomes (not measured) | n=254; 0.6 |
| Based on findings and student-reported concerns, the authors recommend integrating explicit AI-literacy instruction to support critical and reflective use of generative AI tools in education. | Training Effectiveness | positive | medium | recommendation for AI-literacy instruction (policy/educational intervention) | 0.36 |
| The study employed a multi-method approach combining experimental quantitative analysis (descriptives, GLM, non-parametric robustness checks) with qualitative topic-based coding of open-ended survey responses. | Research Productivity | null_result | high | study methodology (mixed-methods design) | n=254; 0.6 |
| Observed higher short-term performance and the positive correlation with iterative engagement imply that GenAI can augment short-term academic productivity and that benefits depend partly on active, skillful user interaction (complementarity). | Output Quality | positive | speculative | short-term academic productivity (inferred complementarity interpretation) | n=254; interpretive inference of complementarity; 0.06 |
| Differential access to higher-quality (paid) versus free GenAI tools and differing ability to engage with the tool could widen inequality among students and institutions. | Inequality | negative | speculative | equity/inequality in access and learning outcomes (not directly measured) | 0.06 |
| The study has potential selection and ecological-validity constraints because it was conducted at two institutions across six courses, limiting generalizability. | Research Productivity | null_result | high | external validity/generalizability (limitation) | n=254; 0.6 |