The Commonplace

Evidence (6491 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome                    Positive Negative    Mixed     Null    Total
Other                           758      199      100      900     2007
Governance & Regulation         826      400      191      122     1563
Organizational Efficiency       777      193      124       84     1189
Technology Adoption Rate        635      233      124       97     1098
Research Productivity           422      128       57      336      954
Output Quality                  476      179       59       47      761
Decision Quality                328      177       81       47      640
Firm Productivity               435       57       88       20      606
AI Safety & Ethics              218      277       65       33      599
Market Structure                180      170      123       24      502
Task Allocation                 213       64       72       33      387
Skill Acquisition               170       61       61       17      309
Innovation Output               203       27       43       18      292
Employment Level                105       54      107       13      281
Fiscal & Macroeconomic          131       69       43       26      276
Consumer Welfare                117       63       42       11      233
Firm Revenue                    153       48       26        3      230
Task Completion Time            173       31        8       12      225
Inequality Measures              44      122       49        6      221
Worker Satisfaction              89       65       22       12      188
Error Rate                       69       92       10        2      173
Regulatory Compliance            77       69       14        5      165
Automation Exposure              56       56       26       13      154
Training Effectiveness           94       21       13       19      149
Wages & Compensation             77       36       25        6      144
Team Performance                 86       17       27       10      141
Developer Productivity           95       17       14        6      133
Job Displacement                 12       80       20        1      113
Hiring & Recruitment             52        7        8        3       70
Creative Output                  31       18        8        3       61
Skill Obsolescence                5       46        6        1       58
Social Protection                27       16        8        2       53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
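
The matrix rows can be checked mechanically. The sketch below is an illustration only, not part of the site: `Row`, `parse_row`, `positive_share`, and `unlisted` are hypothetical names. It splits a row into its outcome label and five counts, computes the share of positive findings, and reports how far the four direction counts fall short of the Total. Several rows show such a gap (the "Other" row, for example), and the page does not say what accounts for it. Rows with fewer than five counts are rejected rather than guessed at.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One matrix row (hypothetical structure for illustration)."""
    outcome: str
    positive: int
    negative: int
    mixed: int
    null: int
    total: int


def parse_row(line: str) -> Row:
    """Split a row into its outcome label and its five trailing counts."""
    parts = line.split()
    counts = []
    # Peel numeric tokens off the end; the label may itself contain spaces.
    while parts and parts[-1].isdigit():
        counts.append(int(parts.pop()))
    counts.reverse()
    if len(counts) != 5 or not parts:
        # Rows with missing cells cannot be assigned to columns safely.
        raise ValueError(f"cannot parse row without guessing: {line!r}")
    return Row(" ".join(parts), *counts)


def positive_share(row: Row) -> float:
    """Fraction of a row's claims whose direction is Positive."""
    return row.positive / row.total


def unlisted(row: Row) -> int:
    """Claims counted in Total but not in any of the four direction columns."""
    return row.total - (row.positive + row.negative + row.mixed + row.null)


row = parse_row("Firm Productivity 435 57 88 20 606")
print(row.outcome, round(positive_share(row), 3), unlisted(row))
```

A row such as "Industry 1 1" raises `ValueError` here, because the column its single count belongs to cannot be recovered from the text.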
Filter: Human-AI Collaboration
A Neural Boosted Tree model with entity embeddings for textile attributes was constructed and achieved a mean R² of 0.921 in cross-validation, surpassing benchmark methods.
Model training and cross-validation reported in paper using the e-commerce dataset; comparison to benchmark methods reported (specific benchmarks not listed in abstract).
high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... forecasting accuracy (mean R²)
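
For readers unfamiliar with the metric, a mean cross-validated R² is the average of per-fold R² scores, where each score is 1 - SS_res / SS_tot. The sketch below shows only that arithmetic in plain Python. It assumes nothing about the paper's model or its folds; the function names and the fold data are made up for illustration.

```python
from typing import Sequence, Tuple

Fold = Tuple[Sequence[float], Sequence[float]]  # (observed values, predictions)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1 - ss_res / ss_tot


def mean_r2(folds: Sequence[Fold]) -> float:
    """Average the per-fold R^2 scores, as a cross-validation summary would."""
    scores = [r2_score(y_true, y_pred) for y_true, y_pred in folds]
    return sum(scores) / len(scores)


# Hypothetical held-out folds, not data from the paper.
toy_folds = [
    ([10.0, 20.0, 30.0], [12.0, 19.0, 29.0]),
    ([5.0, 15.0, 25.0], [6.0, 14.0, 27.0]),
]
print(round(mean_r2(toy_folds), 3))
```

Averaging per-fold scores is one common convention; pooling all held-out predictions before computing a single R² is another, and the abstract does not say which the authors used.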
The framework incorporates ethically compliant acquisition of consumer demand signals, semantic translation of unstructured market data into textile engineering attributes, machine-learning-based demand forecasting, and human-centric decision support.
Description of framework components and design choices presented in paper (methodological/architectural claim).
high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... presence of specified framework components (ethical data acquisition, semantic t...
This study develops and validates a customer-to-manufacturer (C2M) intelligence framework that enables data-driven production planning using publicly available e-commerce data.
Methodological development described in paper; validation based on ML modeling using e-commerce data and a 12-month field deployment at one Taiwanese dyeing SME.
high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... feasibility and validation of a C2M intelligence framework for production planni...
The paper introduces a novel posted-price procurement model with coverage objectives for studying platform procurement of human input.
Methodological contribution declared in the paper: presentation of a new formal model (posted-price procurement with coverage objectives).
high positive Stochastic wage suppression on gig platforms and how to orga... model formulation / methodological innovation
A small coalition of targeted low-cost workers who commit to a price floor forces the platform's total spending to change from logarithmic to linear in M.
Theoretical analysis within the model showing that when a targeted subset of low-cost workers commit to a minimum price, the asymptotic scaling of platform spending increases from logarithmic (in M) to linear (in M); proof-based, no empirical sample.
high positive Stochastic wage suppression on gig platforms and how to orga... platform's total spending / total payments to workers (scaling in M)
A research-degree-student survey showed high performance ratings across information reliability, theoretical depth and logical rigor, with pronounced ceiling effects on a 7-point scale, despite all participants already being frontier-model users.
Authors report results from a survey of research-degree students evaluating the scholar-bots on specified dimensions (information reliability, theoretical depth, logical rigor) using a 7-point scale and note ceiling effects; participants reportedly were experienced model users.
high positive The Relic Condition: When Published Scholarship Becomes Mate... student-rated performance on reliability, theoretical depth, logical rigor (7-po...
Recovered panel scores placed Scholar A between 7.9 and 8.9/10 and Scholar B between 8.5 and 8.9/10 under multi-turn debate conditions.
Paper reports numeric panel scores (ranges) for the two scholar-bots in multi-turn debate scenarios; scores are presented as recovered panel evaluations.
high positive The Relic Condition: When Published Scholarship Becomes Mate... panel evaluation scores (0-10 scale) under multi-turn debate
Appointment-level recommendations placed both bots at or above Senior Lecturer level in the Australian university system.
Authors state that appointment-level syntheses from assessors recommended both scholar-bots at or above the Senior Lecturer rank (Australian system); based on the experts' syntheses.
high positive The Relic Condition: When Published Scholarship Becomes Mate... appointment/rank recommendation
Across the preserved expert record, all review and supervision reports judged the outputs benchmark-attaining.
Authors report that the preserved set of expert review and supervision reports (from the three assessors) rated scholar-bot outputs as attaining the benchmark standards used for assessment.
high positive The Relic Condition: When Published Scholarship Becomes Mate... benchmark attainment in review and supervision reports
The scholar-bots were deployed across doctoral supervision, peer review, lecturing and panel-style academic exchange.
Authors report deployment of the generated scholar-bots in multiple academic task contexts (doctoral supervision, peer review, lecturing, panel debates); reported as part of evaluation protocol.
high positive The Relic Condition: When Published Scholarship Becomes Mate... ability to perform academic tasks (supervision, peer review, lecturing, panel ex...
We converted those systems into structured inference-time constraints for a large language model.
Authors describe a pipeline that transforms the extracted scholar reasoning artefacts into inference-time constraints applied to a LLM; presented as part of methods for the two scholar cases.
high positive The Relic Condition: When Published Scholarship Becomes Mate... conversion of extracted reasoning systems into inference-time constraints
We extracted the scholarly reasoning systems of two internationally prominent humanities and social science scholars from their published corpora alone.
Authors report an extraction procedure applied to the published corpora of two named scholars; claim is descriptive of dataset and method (n=2).
high positive The Relic Condition: When Published Scholarship Becomes Mate... successful extraction of reasoning systems from published corpora
From synthesis of results, we suggest three practices that focus on preserving agency in software engineering for coding, learning, and mentorship, especially as AI grows increasingly autonomous.
Authors' prescriptive recommendations derived from the paper's qualitative synthesis; presented as proposed practices rather than empirically tested interventions.
high positive From Junior to Senior: Allocating Agency and Navigating Prof... Recommended practices intended to preserve developer agency
Seniors leverage pre-AI foundational instincts to steer modern tools and possess valuable perspectives for mentoring juniors in their early AI-encouraged career development.
Qualitative accounts from senior participants in the Delphi/ACTA process and blind reviews showing seniors reference pre-AI practices and see mentoring value.
high positive From Junior to Senior: Allocating Agency and Navigating Prof... Seniors' ability to direct AI tools based on prior foundations and their perceiv...
Juniors enter as AI-natives, whereas seniors adapted mid-career.
Authors' synthesis from a three-phase mixed-methods study: ACTA combined with a Delphi process (5 seniors), an AI-assisted debugging task (10 juniors), and blind reviews of junior prompt histories by 5 additional seniors.
high positive From Junior to Senior: Allocating Agency and Navigating Prof... Whether developers began their careers with AI tools (AI-native status) versus a...
Prediction intervals are a more suitable evaluation format than point estimates for numerical forecasting because they require scale awareness, internal consistency across confidence levels, and calibration over a continuum of outcomes.
Conceptual/analytical argument presented in the paper explaining why prediction intervals better capture uncertainty and testability for continuous numerical forecasting (no empirical proof provided in the excerpt).
high positive QuantSightBench: Evaluating LLM Quantitative Forecasting wit... suitability of evaluation format (prediction intervals vs point estimates)
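
Two of the properties named in this claim, calibration and internal consistency across confidence levels, can be made concrete with a short sketch. This is not QuantSightBench's scoring code; the function names and the toy numbers are assumptions chosen for illustration. Calibration is approximated by empirical coverage (the fraction of outcomes that land inside their intervals), and consistency by checking that each higher-confidence interval contains every lower-confidence one.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def empirical_coverage(intervals: Sequence[Interval], outcomes: Sequence[float]) -> float:
    """Fraction of realized outcomes that fall inside their predicted interval."""
    if len(intervals) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, outcomes))
    return hits / len(outcomes)


def is_nested(intervals_by_level: Dict[float, Interval]) -> bool:
    """True if every higher-confidence interval contains the lower-confidence ones."""
    levels = sorted(intervals_by_level)
    for narrow, wide in zip(levels, levels[1:]):
        lo_n, hi_n = intervals_by_level[narrow]
        lo_w, hi_w = intervals_by_level[wide]
        if lo_w > lo_n or hi_w < hi_n:
            return False
    return True


# Hypothetical 80% intervals for three questions and their realized values.
intervals_80 = [(0.0, 10.0), (5.0, 6.0), (1.0, 2.0)]
realized = [3.0, 7.0, 1.5]
print(empirical_coverage(intervals_80, realized))  # well calibrated if close to 0.8

# One question answered at two confidence levels: the 90% interval should be wider.
print(is_nested({0.5: (4.0, 6.0), 0.9: (2.0, 8.0)}))
```

A point estimate offers nothing comparable to check: it carries no stated confidence level, so neither coverage nor nesting is defined for it.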
Technology-driven recruitment has emerged as a strategic imperative for organizations seeking competitive advantage in talent acquisition.
Argumentative/interpretive claim in the paper's introduction and discussion, supported by survey findings (N=150) indicating perceived strategic importance.
high positive A Study on the Effectiveness of Technology-Driven Recruitmen... perceived strategic importance / adoption intent
The paper proposes the Technology-Enabled Recruitment Optimization Framework (TEROF), a structured implementation model designed to guide organizations through the phased adoption of recruitment technology.
Paper synthesizes its empirical findings into a named framework (TEROF) described in the discussion/conclusions; based on combined survey (N=150) and case-study analysis (4 organizations).
high positive A Study on the Effectiveness of Technology-Driven Recruitmen... adoption guidance / implementation framework
Video interview platforms improved recruiter productivity by 41%.
Reported quantitative finding from the study's survey (N=150) and corroborating case study observations.
AI-powered resume screening reduced initial shortlisting time by 64%.
Reported quantitative result in the paper derived from the survey of HR professionals (N=150) and illustrated in case studies.
high positive A Study on the Effectiveness of Technology-Driven Recruitmen... initial shortlisting time
Integrated technology-driven recruitment produced a 52% reduction in cost-per-hire relative to traditional methods.
Reported quantitative finding from the study's survey (N=150) and supporting case studies (4 organizations).
Adoption of integrated recruitment technology yielded a 45% improvement in candidate quality as measured by first-year performance ratings.
Reported quantitative result from the survey (N=150) and case study evidence using first-year performance ratings as the quality metric.
high positive A Study on the Effectiveness of Technology-Driven Recruitmen... first-year employee performance (candidate quality)
Organizations adopting integrated technology-driven recruitment platforms experienced an average reduction in time-to-hire of 38%.
Reported quantitative finding based on the paper's mixed-methods analysis (survey of 150 HR professionals and corroborating qualitative case studies of 4 organizations).
These results suggest that LinuxArena has meaningful headroom for both attackers and defenders, making it a strong testbed for developing and evaluating future control protocols.
Authors synthesize results from sabotage evaluations, monitor evaluations, and the LaStraj human-attack dataset to conclude there is room for improvement on both attacker and defender sides; this is presented as an implication/recommendation rather than a strictly measured outcome.
high positive LinuxArena: A Control Setting for AI Agents in Live Producti... suitability/quality of LinuxArena as a testbed (headroom for attacker and defend...
LinuxArena contains 184 side tasks representing safety failures such as data exfiltration and backdooring.
Authors report the number of side tasks and describe their nature (safety failures) in the dataset/control setting documentation.
high positive LinuxArena: A Control Setting for AI Agents in Live Producti... number of side (safety-failure) tasks in LinuxArena
LinuxArena contains 1,671 main tasks representing legitimate software engineering work.
Authors report the number of main tasks when describing the contents of LinuxArena.
high positive LinuxArena: A Control Setting for AI Agents in Live Producti... number of main (legitimate) tasks in LinuxArena
LinuxArena contains 20 environments.
Authors report constructing LinuxArena and state the number of environments explicitly in the paper's description of the dataset/control setting.
high positive LinuxArena: A Control Setting for AI Agents in Live Producti... number of environments in the LinuxArena control setting
We introduce DELEGATE-52 to study the readiness of AI systems in delegated workflows; DELEGATE-52 simulates long delegated workflows that require in-depth document editing across 52 professional domains (e.g., coding, crystallography, and music notation).
Paper describes creation of a benchmark/dataset called DELEGATE-52 covering 52 professional domains and designed to simulate long delegated document-editing workflows.
high positive LLMs Corrupt Your Documents When You Delegate benchmark scope / domain coverage
Drawing on Moral Foundations Theory and a multi-stakeholder perspective, moral (mis)alignment matters for the meaningful integration of AI in sensitive contexts.
Paper's theoretical framing and normative claim (method: conceptual synthesis using Moral Foundations Theory and multi-stakeholder argumentation; no empirical sample or quantitative results reported in the supplied text).
high positive Smart But Not Moral? Moral Alignment In Human-AI Decision-Ma... meaningful integration/adoption of AI in sensitive/high-stakes contexts
Moral alignment is defined as the perceived congruence between the values embedded in an AI system's decision logic and the moral intuitions of stakeholders.
Explicit definitional statement in the paper (conceptual definition; no empirical measurement reported in the supplied text).
high positive Smart But Not Moral? Moral Alignment In Human-AI Decision-Ma... perceived congruence between AI values and stakeholder moral intuitions (definit...
Moral alignment may be a more fundamental dimension of human-AI decision-making than functional or behavioral alignment.
Paper's central argumentative claim (theoretical proposition building on conceptual reasoning and prior theory; no empirical evidence or sample size reported in the supplied text).
high positive Smart But Not Moral? Moral Alignment In Human-AI Decision-Ma... relative fundamental status of moral alignment in human-AI decision-making
In high-stakes AI-supported decisions, considerations are not purely technical but involve moral judgments about fairness, responsibility, and harm.
Stated as a conceptual assertion in the paper's framing/abstract; presented as an observation building on prior literature (no empirical method or sample size reported in the supplied text).
high positive Smart But Not Moral? Moral Alignment In Human-AI Decision-Ma... presence of moral judgments in decision-making
Our paper contributes to the emerging discourse on AI overreliance and provides an understanding of the appropriate degree of reliance as essential to developers making the most of these powerful technologies.
Authors' claimed contribution based on synthesis of themes from twenty-two interviews and presentation of the reliance-control framework.
high positive Towards an Appropriate Level of Reliance on AI: A Preliminar... developers' ability to effectively use AI tools (appropriate degree of reliance)
The reliance-control framework can be used to recommend future research to explore different control levels supported by current and emergent LLM-driven tools.
Paper explicitly uses the framework to motivate and recommend directions for future research; based on qualitative interview findings (n=22) and authors' synthesis.
high positive Towards an Appropriate Level of Reliance on AI: A Preliminar... research directions and scope (exploration of control levels)
We propose a preliminary reliance-control framework where the level of control can be used to identify AI overreliance and underreliance.
Authors present a conceptual/framework contribution derived from analysis of the twenty-two interviews; this is a proposed (theoretical) framework rather than an experimentally validated one.
high positive Towards an Appropriate Level of Reliance on AI: A Preliminar... ability to identify overreliance and underreliance (framework applicability)
Fairness should be evaluated at the system level (the interacting agents) rather than solely at the level of individual models, because fairness can be an emergent, procedural property of decentralized agent interaction.
Conceptual framing supported by the triage experiments showing emergent fairness properties from agent interaction that were not present at the single-agent level.
high positive Beyond Arrow's Impossibility: Fairness as an Emergent Proper... appropriateness of system-level versus model-level evaluation for fairness
Aligned agents partially moderate bias through contestation rather than override, acting as corrective patches that restore access for marginalized groups without fully converting a biased counterpart.
Behavioral observations from the triage negotiation trials where aligned agents contested allocations proposed by biased/unaligned agents and adjusted final allocations in ways that increased access for marginalized groups while not fully changing the adversarial agent's preferences.
high positive Beyond Arrow's Impossibility: Fairness as an Emergent Proper... change in allocations for marginalized groups due to contestation in multi-agent...
Neither agent's allocation is ethically adequate in isolation, yet their joint final allocation can satisfy fairness criteria that neither would have reached alone.
Comparative analysis of individual-agent allocations versus joint allocations after three rounds of negotiation in the hospital triage simulation; claim based on observed differences between solitary and joint outcomes.
high positive Beyond Arrow's Impossibility: Fairness as an Emergent Proper... ethical adequacy / fairness of allocations (individual vs joint)
Fairness in language models emerges through interaction and exchange among agents, rather than being solely a property of a single, centrally optimized model.
Controlled simulation using a hospital triage framework in which two agents negotiate over three structured debate rounds; one agent is aligned via retrieval-augmented generation (RAG) and the other is unaligned or adversarially prompted. Observed final allocations and negotiation dynamics used to support the claim.
high positive Beyond Arrow's Impossibility: Fairness as an Emergent Proper... emergent fairness of joint allocations produced by multi-agent interaction
By framing disclosure as epistemic infrastructure, this work outlines a conceptual roadmap for future empirical and design research on Human–AI collaboration.
High-level, forward-looking claim about the paper's contribution to research agenda (conceptual argument). No empirical validation in the abstract.
high positive Who Gets Credit? Operationalizing AI Disclosure as Epistemic... influence on future empirical and design research agendas
We contribute a research instrument that operationalizes these configurations in a collaborative chat setting and articulate testable design conjectures.
Paper contribution: a research instrument and set of conjectures described by the authors (design/methodological artifact). The abstract does not report empirical deployment or sample size.
high positive Who Gets Credit? Operationalizing AI Disclosure as Epistemic... operationalization of disclosure configurations in a collaborative chat research...
We introduce an AI Disclosure Design Space that conceptualizes disclosure as an epistemic coordination mechanism.
Paper contribution: conceptual artifact (design space) introduced by the authors; this is a descriptive/foundational claim about the paper's contents.
high positive Who Gets Credit? Operationalizing AI Disclosure as Epistemic... conceptualization of disclosure as an epistemic coordination mechanism
What matters in practice is the design of disclosure: how systems reveal, signal, or conceal AI assistance within collaboration.
Central theoretical argument of the paper (conceptual/design claim); no empirical validation reported in the abstract.
high positive Who Gets Credit? Operationalizing AI Disclosure as Epistemic... effects of AI disclosure design on collaboration
Our results suggest that grounding reward design in empirical analysis of information impact and user answerability improves clarification efficiency.
Conclusion drawn from the paper's empirical work: identification of task relevance and user answerability properties, operationalization via RL rewards, and the CLARITI evaluation showing fewer questions for matched resolution rate; abstract does not report experimental details or metrics beyond the 41% reduction.
high positive Asking What Matters: Reward-Driven Clarification for Softwar... clarification efficiency (fewer questions for similar resolution performance)
CLARITI is an 8B-parameter clarification module.
Model specification reported in the abstract; factual description of the trained model's scale (no further empirical detail provided in the abstract).
We operationalize these properties as multi-stage reinforcement learning rewards to train CLARITI, an 8B-parameter clarification module.
Methodological claim: the paper reports implementation of multi-stage RL rewards and training of a clarification model named CLARITI with 8 billion parameters (claim reported in abstract; no training dataset size reported).
high positive Asking What Matters: Reward-Driven Clarification for Softwar... ability to train a clarification module using the proposed reward design
Using Shapley attribution and distributional comparisons, we identify two key properties of effective clarification: task relevance (which information predicts success) and user answerability (what users can realistically provide).
Analytical methods reported in the paper: Shapley attribution and distributional comparisons applied to datasets of software engineering tasks and simulated user responses (abstract mentions these methods but gives no numeric sample size).
high positive Asking What Matters: Reward-Driven Clarification for Softwar... importance of information features for predicting task success and simulated-use...
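
As background on the attribution method named here, the sketch below computes exact Shapley values for a toy "value of information" function. It is a generic illustration under stated assumptions, not the authors' pipeline: the feature names (`repro_steps`, `expected_output`) and the success rates are invented, and exact enumeration over orderings is only practical for a handful of features.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        seen = set()
        for p in order:
            before = value(frozenset(seen))
            seen.add(p)
            totals[p] += value(frozenset(seen)) - before
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical task success rates given which pieces of information the user supplied.
toy_success = {
    frozenset(): 0.2,
    frozenset({"repro_steps"}): 0.5,
    frozenset({"expected_output"}): 0.3,
    frozenset({"repro_steps", "expected_output"}): 0.8,
}

attribution = shapley_values(["repro_steps", "expected_output"], toy_success.__getitem__)
print(attribution)
```

In this toy setup the attributions sum to the gain from supplying everything (0.8 - 0.2), which is the efficiency property that makes Shapley values convenient for ranking which information predicts success.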
Humans often specify tasks incompletely, so assistants must know when and how to ask clarifying questions.
Background claim stated in the paper's introduction/abstract; likely supported by literature on underspecified task specifications and/or the authors' motivating examples (no specific sample size or experiment reported in the abstract).
high positive Asking What Matters: Reward-Driven Clarification for Softwar... frequency/occurrence of incomplete task specifications (need for clarification)
The approach provides a practical path toward more transparent, controllable, and accountable AI use without requiring new model architectures.
Authors' asserted benefit of the proposed interaction-layer framework; no empirical demonstration that transparency, control, or accountability are achieved or that no architectural changes are required in practice.
high positive Governing Reflective Human-AI Collaboration: A Framework for... transparency, controllability, and accountability of AI use
The framework enables auditable reasoning traces and supports alignment with emerging governance standards, including the EU AI Act and ISO/IEC 42001.
Stated compliance/alignment claim linking the proposed interaction-layer approach to existing regulatory standards; no compliance testing or audit examples reported.
high positive Governing Reflective Human-AI Collaboration: A Framework for... auditable reasoning traces and regulatory alignment (EU AI Act, ISO/IEC 42001)