The Commonplace

Evidence (6491 claims)

Adoption (8570 claims)
Productivity (7631 claims)
Governance (6869 claims)
Human-AI Collaboration (6491 claims)
Org Design (4175 claims)
Innovation (4114 claims)
Labor Markets (3566 claims)
Skills & Training (2966 claims)
Inequality (2066 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
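The matrix rows can be read as shares of each outcome's total. The sketch below does this for three rows copied from the table. The function name `summarize` and the `unlisted` field are illustrative choices, not part of the site. `unlisted` is the gap between a row's Total and the sum of its four direction columns, which is non-zero for some rows.

```python
# Rows copied from the table above: (Positive, Negative, Mixed, Null, Total).
rows = {
    "Governance & Regulation": (826, 400, 191, 122, 1563),
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Job Displacement": (12, 80, 20, 1, 113),
}


def summarize(pos, neg, mixed, null, total):
    """Return direction shares of the row total (hypothetical helper)."""
    return {
        "positive_share": pos / total,
        "negative_share": neg / total,
        # Claims counted in Total but not in any of the four listed columns.
        "unlisted": total - (pos + neg + mixed + null),
    }


summary = {name: summarize(*cells) for name, cells in rows.items()}

for name, stats in summary.items():
    print(f"{name}: {stats['positive_share']:.1%} positive, "
          f"{stats['negative_share']:.1%} negative, {stats['unlisted']} unlisted")
```

Using Total as the denominator keeps the shares comparable across rows even where the four direction columns do not add up to it.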
Filter: Human-AI Collaboration
This tension reveals a pattern we call 'bounded delegation': developers wanted AI to absorb the assembly work surrounding their craft, never the craft itself.
Interpretive result from the paper's qualitative thematic analysis of survey responses (n=860), labeled by the authors as the 'bounded delegation' pattern.
high positive To Copilot and Beyond: 22 AI Systems Developers Want Built preferred boundary of automation / delegation
Developers wanted systems enforcing explicit authority scoping, provenance, uncertainty signaling, and least-privilege access throughout.
Reported constraints and desiderata from the thematic analysis of survey responses (n=860).
high positive To Copilot and Beyond: 22 AI Systems Developers Want Built desired governance/security features for AI tools (authority scoping, provenance...
Developers wanted systems that embed quality signals earlier in their workflow to keep pace with accelerating code generation.
Thematic findings from the paper's human-in-the-loop, multi-model council-based analysis of survey responses (n=860).
high positive To Copilot and Beyond: 22 AI Systems Developers Want Built requested placement/timing of quality signals in developer workflow
Using a human-in-the-loop, multi-model council-based thematic analysis, we identify 22 AI systems that developers want built across five task categories.
Qualitative analysis method described in the paper applied to the survey responses (n=860); result reported as identification of 22 desired AI systems organized into five categories.
high positive To Copilot and Beyond: 22 AI Systems Developers Want Built catalog of desired AI systems and task categories
BTB enables automated evaluation of any LLM or agent, scoring deliverables against 100+ rubric criteria defined by veteran investment bankers to capture stakeholder utility.
Design claim in abstract describing the benchmark's automated scoring system and rubric size (100+ criteria) defined by expert bankers.
high positive BankerToolBench: Evaluating AI Agents in End-to-End Investme... number of rubric criteria for automated evaluation
For reproducibility all our data and code are provided at https://github.com/scaleapi/scipredict
Explicit reproducibility statement and URL provided in the paper.
high positive SciPredict: Can LLMs Predict the Outcomes of Scientific Expe... data_and_code_availability
SciPredict addresses two critical questions: (a) can LLMs predict the outcome of scientific experiments with sufficient accuracy? and (b) can such predictions be reliably used in the scientific research process?
Statement of research goals and scope in the paper introducing the SciPredict benchmark and accompanying evaluations.
high positive SciPredict: Can LLMs Predict the Outcomes of Scientific Expe... research_questions_addressed
Human experts demonstrate strong calibration: their accuracy increases from ≈5% to ≈80% as they deem outcomes more predictable without conducting the experiment.
Reported stratified accuracy of human experts on SciPredict tasks by self-reported predictability judgments; accuracy rises from ≈5% (when judged not predictable) to ≈80% (when judged predictable).
high positive SciPredict: Can LLMs Predict the Outcomes of Scientific Expe... calibration_of_human_confidence_vs_accuracy
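Stratified accuracy of the kind described above can be computed by grouping predictions by the expert's own predictability rating. The sketch below is a minimal illustration, not the paper's analysis code: the three rating labels and all records are made up, and only the grouping logic is the point.

```python
from collections import defaultdict

# Made-up records for illustration only:
# (self-rated predictability, whether the prediction was correct).
toy_records = [
    ("low", False), ("low", False), ("low", False), ("low", True),
    ("medium", True), ("medium", False),
    ("high", True), ("high", True), ("high", True), ("high", False),
]


def accuracy_by_rating(records):
    """Group predictions by predictability rating and return accuracy per group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rating, correct in records:
        totals[rating] += 1
        hits[rating] += int(correct)
    return {rating: hits[rating] / totals[rating] for rating in totals}


stratified = accuracy_by_rating(toy_records)
print(stratified)
```

A well-calibrated forecaster shows accuracy rising across the rating groups, which is the pattern the claim reports for human experts.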
We introduce SciPredict, a benchmark comprising 405 tasks derived from recent empirical studies in 33 specialized sub-fields of physics, biology, and chemistry.
Construction of the SciPredict benchmark described in the paper; explicitly reports 405 tasks and 33 sub-fields.
The future of Nagpur's industrial belt depends not on resisting automation, but on an aggressive reskilling strategy to bridge the gap between current workforce capabilities and future technological requirements.
Normative policy conclusion in the paper recommending reskilling as the primary response; based on the paper's analysis of task changes and projected role shifts; no program evaluation or empirical evidence of reskilling effectiveness reported in the excerpt.
high positive PREDICTING THE FUTURE OF JOBS IN NAGPUR DISTRICT MIDC: THE R... need for reskilling / workforce skill acquisition
There is a projected surge in demand for 'AI-collaborative' roles such as machine maintenance, data supervision, and process optimization.
Projection in the paper based on analysis of task complementarities between humans and AI, listing specific roles expected to grow; no quantitative demand estimates or sample sizes provided in the excerpt.
high positive PREDICTING THE FUTURE OF JOBS IN NAGPUR DISTRICT MIDC: THE R... projected demand for AI-collaborative roles (machine maintenance, data supervisi...
A configuration-driven domain model means deploying a new institutional decision domain requires YAML configuration, not engineering capacity.
Design/implementation claim in paper describing deployment approach using YAML configuration rather than engineering work.
high positive Governed Reasoning for Institutional AI deployment effort required to support a new institutional decision domain
We introduce governability — how reliably a system knows when it should not act autonomously — as a primary evaluation axis for institutional AI alongside accuracy.
Conceptual contribution/metric proposed by authors in paper; no empirical validation reported in the excerpt.
high positive Governed Reasoning for Institutional AI governability (system's ability to know when not to act autonomously)
Cognitive Core produced zero silent errors while both baselines produced 5-6 silent errors on the evaluation set.
Empirical benchmark reported in paper on the 11-case evaluation set; counts of silent errors given for Cognitive Core and baselines.
high positive Governed Reasoning for Institutional AI count of silent errors (incorrect determinations that executed without human-rev...
Cognitive Core achieves 91% accuracy on the 11-case prior authorization appeal set, versus 55% for ReAct and 45% for Plan-and-Solve.
Empirical benchmark reported in paper on the 11-case evaluation set; accuracies explicitly stated for three systems.
high positive Governed Reasoning for Institutional AI accuracy on prior authorization appeal cases
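On an 11-case set, each reported percentage corresponds to a whole number of correct cases. The sketch below recovers the counts consistent with the stated figures. It assumes the percentages were rounded to the nearest whole number, which the excerpt does not confirm.

```python
N_CASES = 11  # evaluation set size stated in the claim

# Accuracies as reported in the claim, in percent.
reported_pct = {"Cognitive Core": 91, "ReAct": 55, "Plan-and-Solve": 45}


def counts_matching(pct, n):
    """Return every k in 0..n whose accuracy k/n rounds to the reported percentage."""
    return [k for k in range(n + 1) if round(100 * k / n) == pct]


implied = {name: counts_matching(pct, N_CASES) for name, pct in reported_pct.items()}
print(implied)
# → {'Cognitive Core': [10], 'ReAct': [6], 'Plan-and-Solve': [5]}
```

With 11 cases, one case is worth about 9 percentage points, so the gaps between systems amount to four or five cases.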
We propose Cognitive Core: a governed decision substrate built from nine typed cognitive primitives (retrieve, classify, investigate, verify, challenge, reflect, deliberate, govern, generate), a four-tier governance model where human review is a condition of execution rather than a post-hoc check, a tamper-evident SHA-256 hash-chain audit ledger endogenous to computation, and a demand-driven delegation architecture supporting both declared and autonomously reasoned epistemic sequences.
Design/proposal described in paper (architectural specification); no empirical evaluation reported for the architecture itself in the excerpt.
high positive Governed Reasoning for Institutional AI system governability and auditability as properties of the decision substrate
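The tamper-evident property of a SHA-256 hash chain can be shown in a few lines. This is a generic sketch of the technique, not Cognitive Core's ledger: the record fields, the all-zero genesis value, and the function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical starting value for the chain


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger, payload):
    """Append a record that commits to the hash of the record before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(ledger):
    """Recompute every hash in order; any edited or reordered record breaks the chain."""
    prev = GENESIS_HASH
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
for step in ("retrieve", "verify", "govern"):  # primitive names taken from the claim
    append(ledger, {"step": step, "note": "hypothetical record"})

chain_ok_before = verify(ledger)                       # True
ledger[1]["payload"]["note"] = "edited after the fact"
chain_ok_after = verify(ledger)                        # False
```

Because each record's hash covers the previous hash, changing one record invalidates it and every record after it unless all later hashes are recomputed.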
Organisations should invest in customisation capabilities for AI recruitment tools, implement comprehensive change management strategies, and maintain robust post-hire evaluation procedures.
Authors' recommendations derived from thematic findings and participant perspectives across two firms (qualitative synthesis of n = 22 interviews).
high positive The augmented recruiter: examining AI integration and decisi... recommended_organisational_practices_for_AI_recruitment
AI functioned optimally as an augmentative technology rather than as a replacement for human decision-makers in recruitment.
Findings: participants across the two case firms described AI being most effective when augmenting human judgment rather than replacing it (interviews n = 22).
high positive The augmented recruiter: examining AI integration and decisi... role_of_AI (augmentation vs replacement)
AI significantly enhanced efficiency through process standardisation and automation.
Findings based on participant accounts in thematic analysis (interviews n = 22) describing process optimisation and automation benefits.
high positive The augmented recruiter: examining AI integration and decisi... efficiency (process standardisation and automation)
Participants in the treatment conditions showed greater positive belief change about the AI across the session.
Pre/post measures of participant beliefs collected during the field experiment (N=388) showing larger positive shifts among those assigned to treatment conditions versus controls.
high positive Scaffolding Human-AI Collaboration: A Field Experiment on Be... change in participant beliefs about AI (pre/post)
A cognitive scaffolding intervention (partnership training that reframed AI as a thought partner) was associated with higher individual document quality at the top of the distribution.
Field experiment with 388 employees comparing cognitive scaffolding to other conditions; reported improvements concentrated at the top of the individual document-quality distribution.
high positive Scaffolding Human-AI Collaboration: A Field Experiment on Be... individual document quality (top of the distribution)
LLMs coordinate extremely well on similar actions.
Empirical observation from the experiment showing high coordination performance by LLMs when alignment on similar actions is the equilibrium; qualitative description in the abstract without reported quantitative metrics.
high positive Strategic Algorithmic Monoculture: Experimental Evidence from... coordination success when similar actions are favored
Like humans, [LLMs] regulate [action similarity] in response to coordination incentives (strategic monoculture).
Empirical claim based on experimental results comparing how humans and LLMs change similarity when incentives for coordination/divergence are manipulated. No numerical details in excerpt.
high positive Strategic Algorithmic Monoculture: Experimental Evidence from... change in action similarity in response to incentives
LLMs exhibit high levels of baseline similarity (primary monoculture).
Empirical observation from the experiment comparing baseline action similarity across LLM subjects (relative level described qualitatively in paper). Specific sample sizes and quantitative metrics not provided in the excerpt.
high positive Strategic Algorithmic Monoculture: Experimental Evidence from... action similarity (baseline)
We implement a simple experimental design that cleanly separates these forces, and deploy it on human and large language model (LLM) subjects.
Methodological claim: authors report implementing an experiment that separates baseline similarity from strategic adjustments and applying it to human participants and LLM agents. No sample sizes or procedural details provided in the excerpt.
high positive Strategic Algorithmic Monoculture: Experimental Evidence from... experimental implementation (ability to separate primary vs strategic monocultur...
We distinguish primary algorithmic monoculture -- baseline action similarity -- from strategic algorithmic monoculture, whereby agents adjust similarity in response to incentives.
Conceptual/theoretical distinction proposed in the paper (definition and taxonomy introduced by the authors). No empirical sample size reported for this conceptual claim in the provided text.
high positive Strategic Algorithmic Monoculture: Experimental Evidence from... definition/separation of two forms of algorithmic monoculture (primary vs strate...
In a test of eight behavioural persuasion strategies, all outperformed the most effective attitudinal persuasion strategy, but differences among the eight were small.
Experimental comparison within the preregistered studies of eight behavioural persuasion strategies versus the best attitudinal persuasion strategy; results reported in paper showing each behavioural strategy exceeded the attitudinal strategy and that variation among the eight behavioural strategies was small.
high positive Artificial intelligence can persuade people to take politica... behavioural persuasion effectiveness (various behavioural outcomes such as petit...
We replicated prior findings that information provision drove effects on attitudes.
Experimentally manipulating information provision within the preregistered studies and observing effects on attitudinal outcomes, consistent with prior literature (sample reported in paper).
high positive Artificial intelligence can persuade people to take politica... attitudinal change (attitudes)
We found sizable AI persuasion effects on these behavioural outcomes (e.g. +19.7 percentage points on petition signing).
Experimental results from the two preregistered studies reported in the paper; example effect explicitly reported as +19.7 percentage points increase in petition signing. Overall sample reported as N=17,950 responses.
high positive Artificial intelligence can persuade people to take politica... petition signing (real petition signing behavior)
Organizations that strategically invest in blended, context-rich, and partnership-based development programs position themselves for sustainable competitive advantage in an increasingly automated marketplace.
Normative recommendation supported by the paper's synthesis of theory and practice (organizational development, adult learning, workforce development); no empirical effect sizes or sample-size-based evaluation provided.
high positive The Future of Education in an AI-Driven World: Preparing Org... positioning for sustainable competitive advantage (organizational performance ad...
Forward-thinking organizations are redesigning learning architectures to cultivate irreplaceable human capabilities that complement rather than compete with AI systems.
Synthesis of literature from organizational psychology, adult learning theory, and workforce development practice cited in the paper; presented as descriptive statement about current organizational practice rather than based on a reported empirical study with sample size.
high positive The Future of Education in an AI-Driven World: Preparing Org... redesign of learning architectures to cultivate human capabilities (critical thi...
Corporate and academic learning ecosystems will, of necessity, converge.
Conceptual synthesis and argumentation in the paper referencing workforce development practice and organizational development research; no quantitative measures or sample size reported.
high positive The Future of Education in an AI-Driven World: Preparing Org... convergence/integration between corporate and academic learning ecosystems
Human skills (critical thinking, adaptive decision-making, interpersonal acumen) will be elevated to core competency status as AI automates technical tasks once considered core competencies.
Argument and synthesis presented in the paper drawing on organizational psychology, adult learning theory, and workforce development practice; no empirical sample size or statistical tests reported (conceptual/literature-based claim).
high positive The Future of Education in an AI-Driven World: Preparing Org... elevation of human skills to core competencies (critical thinking, adaptive deci...
A machine-learning research agenda is needed centered on team-level evaluation, privacy-preserving memory layers, scaffolded AI for learning, carbon-aware routing, and pro-agency workflow design.
Prescriptive recommendation in the position paper proposing specific research priorities; no empirical evaluation of these approaches is presented within the paper itself.
high positive Remote-Capable Knowledge Work Should Default to AI-Enabled F... prioritized ML research directions and interventions (team-level evaluation, pri...
Rather than eliminating the office, this shift supports selective co-presence, reserving in-person time for tasks with high tacitness, high coupling, or high relational stakes (including apprenticeship, conflict repair, trust formation, and early-stage synthesis).
Theoretical/qualitative argument about task types best suited for in-person interaction; illustrated by examples (apprenticeship, conflict repair, trust formation, early-stage synthesis); no empirical task-level allocation study presented.
high positive Remote-Capable Knowledge Work Should Default to AI-Enabled F... allocation of in-person vs. remote time for specific task types
Capabilities that are already widely deployed—transcription, summarization, retrieval, translation, drafting, and code assistance—are the basis for this shift (with bounded agents as an amplifying but not necessary extension).
Descriptive claim citing the prevalence of specific AI capabilities in current deployments; presented as observation in the position paper rather than as a quantified adoption study.
high positive Remote-Capable Knowledge Work Should Default to AI-Enabled F... deployment/adoption of specific AI capabilities (transcription, summarization, r...
The organizational significance of these systems is not generic automation but the accumulation of artifact capital: durable, queryable, reusable traces such as transcripts, summaries, decisions, tickets, code comments, and retrieval layers.
Argumentative claim in the paper describing a conceptual mechanism ('artifact capital') by which foundation-model features create reusable organizational artifacts; no empirical measurement of artifact capital provided.
high positive Remote-Capable Knowledge Work Should Default to AI-Enabled F... accumulation and reuse of organizational knowledge artifacts ('artifact capital'...
The foundation-model stack (NL interaction, multimodal capture, long context, retrieval, transcription, translation, bounded tool use) changes the coordination economics that previously favored daily in-person co-presence.
Conceptual claim supported by descriptions of foundation-model capabilities and their potential to create durable, queryable artifacts; no empirical test or measured coordination-costs reported.
high positive Remote-Capable Knowledge Work Should Default to AI-Enabled F... coordination economics (costs/benefits of co-presence vs. remote work)
Remote-capable knowledge work should default to AI-enabled flexibility because the workflow-integrated foundation-model stack changes the coordination economics that once favored daily co-presence.
Normative argument in the position paper based on conceptual analysis of coordination economics and the claimed effects of foundation-model features; no empirical sample or quantitative study reported.
high positive Remote-Capable Knowledge Work Should Default to AI-Enabled F... defaulting remote-capable knowledge work to AI-enabled flexible arrangements (i....
Preliminary corroboration is provided by a companion production automation system with eleven operating lanes and 2,132 classified tickets.
Reported companion system operational statistics in the paper (11 lanes, 2,132 tickets).
high positive Context Engineering: A Practitioner Methodology for Structur... companion system scale and classified tickets
When iteration was permitted, the final success rate for the structured interactions reached 91.5% (183 of 200).
Reported final success counts/rate in the paper for structured interactions (183 of 200).
high positive Context Engineering: A Practitioner Methodology for Structur... final success rate after iteration
Among structured interactions, 110 of 200 were accepted on first pass.
Reported counts in the paper for the structured-interaction group (110 accepted of 200 structured interactions).
high positive Context Engineering: A Practitioner Methodology for Structur... first-pass acceptances (count and rate)
Structured context assembly was associated with an improvement in first-pass acceptance from 32% to 55%.
Observational comparison reported in the paper (baseline vs. structured first-pass acceptance rates are given as 32% and 55%).
high positive Context Engineering: A Practitioner Methodology for Structur... first-pass acceptance rate
Structured context assembly was associated with a reduction from 3.8 to 2.0 average iteration cycles per task.
Observational comparison reported in the paper (structured vs. baseline interactions); the paper states the 3.8 to 2.0 cycle figures.
high positive Context Engineering: A Practitioner Methodology for Structur... average iteration cycles per task
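The counts and rates reported for the structured interactions can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the claims above. The two derived quantities (percentage-point gain and relative cycle reduction) are computed here and are not figures quoted from the paper.

```python
STRUCTURED_TOTAL = 200          # structured interactions
first_pass_accepted = 110       # accepted on first pass
final_accepted = 183            # accepted once iteration was permitted
baseline_first_pass_pct = 32.0  # baseline first-pass acceptance, in percent
baseline_cycles, structured_cycles = 3.8, 2.0  # average iteration cycles per task

first_pass_pct = 100 * first_pass_accepted / STRUCTURED_TOTAL   # 55.0
final_pct = 100 * final_accepted / STRUCTURED_TOTAL             # 91.5

# Derived here, not reported in the excerpt:
first_pass_gain_pp = first_pass_pct - baseline_first_pass_pct   # 23.0 percentage points
cycle_reduction_pct = 100 * (baseline_cycles - structured_cycles) / baseline_cycles

print(f"first pass: {first_pass_pct:.1f}%, final: {final_pct:.1f}%")
print(f"gain: {first_pass_gain_pp:.1f} pp, cycle reduction: {cycle_reduction_pct:.1f}%")
```

The reported 55% and 91.5% match the stated counts, and the drop from 3.8 to 2.0 cycles works out to a relative reduction of roughly 47%. These are observational comparisons, so the arithmetic says nothing about causation.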
The paper applies formal models from reliability engineering and information theory as post hoc interpretive lenses on context quality.
Paper text claiming the application of these formal models for interpretation.
high positive Context Engineering: A Practitioner Methodology for Structur... use of formal theoretical models
Context Engineering applies a staged four-phase pipeline (Reviewer to Design to Builder to Auditor).
Methodological description in the paper listing the four pipeline phases.
Context Engineering defines a five-role context package structure (Authority, Exemplar, Constraint, Rubric, Metadata).
Explicit specification in the paper of the five-role package components.
high positive Context Engineering: A Practitioner Methodology for Structur... structure/components of context package
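One way to picture the five-role package and the four-phase pipeline is as a small data structure. The sketch below is an illustration under assumptions: the role and phase names come from the claims above, but the class, the `missing_roles` helper, and every field value are hypothetical, since the excerpt names the roles without defining their contents.

```python
from dataclasses import dataclass, fields

# Phase names as listed in the claim; the ordering is the one given there.
PIPELINE_PHASES = ("Reviewer", "Design", "Builder", "Auditor")


@dataclass
class ContextPackage:
    """Hypothetical container with one text field per named role."""
    authority: str
    exemplar: str
    constraint: str
    rubric: str
    metadata: str

    def missing_roles(self):
        """Return the names of roles left empty, so gaps are declared before prompting."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Placeholder values only; these are guesses at what each role might hold.
package = ContextPackage(
    authority="(hypothetical) source the tool must defer to",
    exemplar="(hypothetical) sample of acceptable output",
    constraint="(hypothetical) limits the output must respect",
    rubric="",
    metadata="(hypothetical) version, owner, date",
)

missing = package.missing_roles()
print(missing)  # → ['rubric']
```

Making each role an explicit field means an incomplete package is detectable before it reaches the tool, which fits the methodology's emphasis on declaring the full payload.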
This paper introduces Context Engineering, a structured methodology for assembling, declaring, and sequencing the complete informational payload that accompanies a prompt to an AI tool.
Methodological description in the paper (definition and presentation of the Context Engineering approach).
high positive Context Engineering: A Practitioner Methodology for Structur... existence/definition of a structured prompting methodology
The review integrates fragmented literature into a cohesive framework and offers implications for managers and policymakers to pursue more balanced, inclusive, and context-sensitive AI adoption strategies.
Author-stated contribution of the review based on synthesis of the 40 included studies; normative recommendations derived from the review.
high positive Generative AI in the Workplace: A Systematic Review of Produ... guidance for managerial and policy decision-making regarding AI adoption
Generative AI adoption is associated with mixed employee perceptions: some studies report increased efficiency and higher job satisfaction.
Aggregate finding from included studies in the review that report positive employee-reported outcomes (efficiency, satisfaction).
high positive Generative AI in the Workplace: A Systematic Review of Produ... reported efficiency gains and job satisfaction