The Commonplace

Evidence (11677 claims)

Adoption: 7395 claims
Productivity: 6507 claims
Governance: 5921 claims
Human-AI Collaboration: 5192 claims
Org Design: 3497 claims
Innovation: 3492 claims
Labor Markets: 3231 claims
Skills & Training: 2608 claims
Inequality: 1842 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 609 159 77 738 1617
Governance & Regulation 671 334 160 99 1285
Organizational Efficiency 626 147 105 70 955
Technology Adoption Rate 502 176 98 78 861
Research Productivity 349 109 48 322 838
Output Quality 391 121 45 40 597
Firm Productivity 385 46 85 17 539
Decision Quality 277 145 63 34 526
AI Safety & Ethics 189 244 59 30 526
Market Structure 152 154 109 20 440
Task Allocation 158 50 56 26 295
Innovation Output 178 23 38 17 257
Skill Acquisition 137 52 50 13 252
Fiscal & Macroeconomic 120 64 38 23 252
Employment Level 93 46 96 12 249
Firm Revenue 130 43 26 3 202
Consumer Welfare 99 51 40 11 201
Inequality Measures 36 106 40 6 188
Task Completion Time 134 18 6 5 163
Worker Satisfaction 79 54 16 11 160
Error Rate 64 79 8 1 152
Regulatory Compliance 69 66 14 3 152
Training Effectiveness 82 16 13 18 131
Wages & Compensation 70 25 22 6 123
Team Performance 74 16 21 9 121
Automation Exposure 41 48 19 9 120
Job Displacement 11 71 16 1 99
Developer Productivity 71 14 9 3 98
Hiring & Recruitment 49 7 8 3 67
Social Protection 26 14 8 2 50
Creative Output 26 14 6 2 49
Skill Obsolescence 5 37 5 1 48
Labor Share of Income 12 13 12 37
Worker Turnover 11 12 3 26
Industry 1 1
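The directional skew in the matrix can be read off as a simple ratio. Below is a minimal sketch that computes the positive share per outcome from a few rows transcribed from the table above (columns: positive, negative, mixed, null, total). Note that for some outcomes the stated total exceeds the sum of the four listed directions, so the share is computed against the total as given, not a recomputed sum.

```python
# Positive share of findings per outcome, from rows of the Evidence Matrix.
# Tuples are (positive, negative, mixed, null, total) as listed in the table.
rows = {
    "Firm Productivity": (385, 46, 85, 17, 539),
    "Job Displacement": (11, 71, 16, 1, 99),
    "Error Rate": (64, 79, 8, 1, 152),
}

def positive_share(row):
    # Divide by the stated total rather than the sum of the four directions,
    # since the two disagree for some outcomes in the source table.
    positive, _negative, _mixed, _null, total = row
    return round(positive / total, 2)

for outcome, row in rows.items():
    print(outcome, positive_share(row))
```

Even this rough cut shows the spread: roughly 0.71 positive for Firm Productivity versus 0.11 for Job Displacement.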
Advances in AI agent capabilities have outpaced users' ability to meaningfully oversee their execution.
Author assertion / literature-level observation presented in the paper (no empirical sample reported for this claim).
high negative | Auditing and Controlling AI Agent Actions in Spreadsheets | user oversight ability
A threat model taxonomy mapping misuse vectors to hardware, software, institutional, and liability layers illustrates why no single governance mechanism suffices.
Threat model taxonomy developed in the paper (conceptual taxonomy; illustrative mapping rather than empirical testing).
high negative | The Open-Weight Paradox: Why Restricting Access to AI Models... | completeness/adequacy of single governance mechanisms
Restricting access to open-weight models deepens asymmetries while driving proliferation into unsupervised settings.
Argumentation and threat-model reasoning in the paper describing likely consequences of restrictions (theoretical analysis; no empirical sample cited).
high negative | The Open-Weight Paradox: Why Restricting Access to AI Models... | geopolitical asymmetries and proliferation into unsupervised settings
Access restrictions, without governed alternatives, may displace risks rather than reduce them.
Theoretical argument and threat-model analysis in the paper showing possible risk displacement (conceptual reasoning; no empirical sample reported).
high negative | The Open-Weight Paradox: Why Restricting Access to AI Models... | risk displacement vs risk reduction from access restrictions
Selective forgetting remains underexplored compared to retention in LLM agent memory research.
Authors' literature survey / position statement in paper (assertion made in abstract).
high negative | FSFM: A Biologically-Inspired Framework for Selective Forget... | extent of research coverage on forgetting vs retention
Beyond technical barriers, there are organizational ones: a persistent AI literacy gap, cultural heterogeneity, and governance structures that have not yet caught up with agentic capabilities.
Interview data (over 30 interviews) reporting organizational challenges, including limited AI literacy, diverse cultural attitudes across organizations, and governance lagging behind agentic AI capabilities.
high negative | Agentic AI in Engineering and Manufacturing: Industry Perspe... | organizational readiness factors (AI literacy, culture, governance alignment)
Adoption is constrained less by model capability than by fragmented and machine-unfriendly data, stringent security and regulatory requirements, and limited API-accessible legacy toolchains.
Stakeholder interviews (over 30) reporting barriers to deployment; qualitative synthesis identifies data fragmentation, security/regulatory requirements, and legacy toolchain access as primary constraints.
high negative | Agentic AI in Engineering and Manufacturing: Industry Perspe... | barriers to AI adoption in engineering/manufacturing
Providing agents feedback about past performance makes them worse at information aggregation and reduces their profits.
Experimental condition where agents received feedback about past performance; compared aggregation (log error of last price) and profits with and without feedback and found worse aggregation and lower profits when feedback was given.
high negative | Information Aggregation with AI Agents | information aggregation (log error of the last price) and profits
Increasing the complexity of the information structure has a significant and negative impact on information aggregation, suggesting AI agents may suffer from the same limitations as humans when reasoning about others.
Experimental manipulation of information-structure complexity in the controlled trading experiment; measured change in aggregation performance (log error of last price) as complexity increases.
high negative | Information Aggregation with AI Agents | information aggregation (log error of the last price)
The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems.
Author's literature-based observation and critique in the paper's introduction (conceptual argument; no empirical sample reported).
high negative | Relative Principals, Pluralistic Alignment, and the Structur... | framing_of_problem_in_literature
Users push back against agent outputs -- through corrections, failure reports, and interruptions -- in 44% of all turns.
Turn-level coding of user behavior in the SWE-chat dataset: proportion of conversational turns containing correction/complaint/interrupt signals, computed across >63,000 user prompts and sessions.
high negative | SWE-chat: Coding Agent Interactions From Real Users in the W... | rate of user pushback per interaction turn
Agent-written code introduces more security vulnerabilities than code authored by humans.
Comparative analysis of security vulnerabilities attributed to agent-authored code versus human-authored code within the SWE-chat dataset (method details not specified in excerpt).
high negative | SWE-chat: Coding Agent Interactions From Real Users in the W... | security vulnerabilities introduced by agent-written code versus human-written c...
Just 44% of all agent-produced code survives into user commits.
Empirical measurement of code provenance and survival within the SWE-chat dataset: proportion of agent-produced code that becomes part of subsequent user commits across sessions.
high negative | SWE-chat: Coding Agent Interactions From Real Users in the W... | survival/usefulness of agent-produced code (proportion incorporated into commits...
Despite rapidly improving capabilities, coding agents remain inefficient in natural settings.
Authors' summary claim supported by dataset-derived metrics such as agent code survival rate (44%) and user pushback (44% of turns); observational analysis of SWE-chat.
high negative | SWE-chat: Coding Agent Interactions From Real Users in the W... | overall agent efficiency in natural developer workflows (qualitative synthesis)
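The SWE-chat pushback figure above is a turn-level proportion: the share of conversational turns carrying at least one pushback signal. A minimal sketch of that computation, using hypothetical field names (correction, failure_report, interrupt) for the coded signals, which are assumptions rather than the paper's actual schema:

```python
# Toy turn records: each turn is coded for three pushback signals.
# Field names are illustrative, not taken from the SWE-chat dataset.
turns = [
    {"correction": True,  "failure_report": False, "interrupt": False},
    {"correction": False, "failure_report": False, "interrupt": False},
    {"correction": False, "failure_report": True,  "interrupt": True},
    {"correction": False, "failure_report": False, "interrupt": False},
]

def pushback_rate(turns):
    # A turn counts as pushback if any coded signal is present.
    hits = sum(1 for t in turns if any(t.values()))
    return hits / len(turns)

print(pushback_rate(turns))  # 0.5 for this toy sample
```

Applied to the full dataset of >63,000 prompts, the same proportion comes out at 0.44, the 44% reported above.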
Regulated deployment imposes four load-bearing systems properties — deterministic replay, auditable rationale, multi-tenant isolation, statelessness for horizontal scale — and stateful architectures violate them by construction.
Conceptual/architectural argument presented in the paper (theoretical analysis), not an empirical measurement in the abstract.
high negative | Stateless Decision Memory for Enterprise AI Agents | compatibility of stateful architectures with regulatory/system properties
Evaluation of four leading AI platforms shows that standard RAG-based approaches achieve an average of only 15% accuracy when information is insufficient.
Empirical evaluation described in paper: four AI platforms tested on benchmark; reported average accuracy of 15% for RAG-based approaches on cases with insufficient information.
high negative | Learning When Not to Decide: A Framework for Overcoming Fact... | accuracy on cases where information is insufficient (inconclusive cases)
Unemployment insurance adjudication has seen rapid integration of AI systems, and the question of additional fact-finding poses the most significant bottleneck for a system that affects millions of applicants annually.
Contextual/introductory claim in paper; references to domain-scale impact and bottleneck; no specific numeric study sample provided in excerpt.
high negative | Learning When Not to Decide: A Framework for Overcoming Fact... | scale of impact (number of applicants affected) and fact-finding bottleneck in a...
A well-known limitation of AI systems is presumptuousness: the tendency to provide confident answers when information may be lacking.
Statement in paper framing the problem; general literature/contextual claim (no specific experiment cited in the excerpt).
high negative | Learning When Not to Decide: A Framework for Overcoming Fact... | tendency to provide confident answers when information is lacking (presumptuousn...
Brevity, semantic isolation and rhetorical register independently predict representational outcome (i.e., which submissions are included/excluded in summaries).
Statistical/semantic analysis (presumably regression or causal inference) reported in the paper linking textual features—brevity, semantic isolation, rhetorical register—to representational outcomes.
high negative | Participatory provenance as representational auditing for AI... | predictive relationship between textual features and representational outcome (c...
Exclusion concentrates in clusters expressing dissent, scepticism and critique of AI, with exclusion rates of 33%–88% in such clusters.
Cluster/semantic analysis reported in the paper showing higher exclusion rates for clusters labeled as dissent/scepticism/critique.
high negative | Participatory provenance as representational auditing for AI... | cluster-level exclusion rate for dissenting/sceptical/critical clusters
In topic B, 15.3% of participants are effectively excluded by the official summary.
Empirical measurement reported in the paper quantifying participants 'effectively excluded' when comparing source submissions to official summary coverage.
high negative | Participatory provenance as representational auditing for AI... | participant exclusion rate
In topic A, 16.9% of participants are effectively excluded by the official summary.
Empirical measurement reported in the paper quantifying participants 'effectively excluded' when comparing source submissions to official summary coverage.
high negative | Participatory provenance as representational auditing for AI... | participant exclusion rate
Both official government summaries underperform a random-participant baseline; for topic B, coverage degradation is -8.0%.
Empirical comparison in the paper between official government summary and a random-participant baseline using the n=5,253 consultation responses.
high negative | Participatory provenance as representational auditing for AI... | coverage (coverage degradation relative to random baseline)
Both official government summaries underperform a random-participant baseline; for topic A, coverage degradation is -9.1%.
Empirical comparison in the paper between official government summary and a random-participant baseline using the n=5,253 consultation responses.
high negative | Participatory provenance as representational auditing for AI... | coverage (coverage degradation relative to random baseline)
No single policy instrument is sufficient to produce high regional science and technology industrial competitiveness.
Result of fuzzy-set qualitative comparative analysis (fsQCA) on AI policy instruments issued by provincial-level governments in China, reported in the study; fsQCA finds no individual condition is sufficient.
high negative | How Can Artificial Intelligence Policies Promote the Sustain... | regional science and technology industrial competitiveness
LLMs endorsed fraudulent investments in 0% of cases across all models tested.
Preregistered experiment across seven leading LLMs producing 3,360 AI advisory conversations; reported 0% endorsement of objectively fraudulent opportunities.
high negative | Large Language Models Outperform Humans in Fraud Detection a... | endorsement rate of fraudulent investments by LLMs
Endorsement reversal occurred in fewer than 3 in 1,000 observations.
Observed incidence reported from the preregistered experiment (3,360 AI advisory conversations); statement in paper reporting incidence <3/1,000.
high negative | Large Language Models Outperform Humans in Fraud Detection a... | rate of endorsement reversal (AI shifting from warning to endorsing fraudulent o...
Critical gaps persist in explainability, regulatory alignment, ethical governance, and context-specific validation.
Authors' synthesis and Conclusion listing persistent shortcomings identified across the reviewed literature.
high negative | AI-Driven Financial Risk Management and Decision Intelligenc... | presence of gaps in explainability, regulation, ethics, and validation
Integration of decision intelligence principles into AI applications for financial risk management in emerging markets is nascent.
Authors' synthesis noting limited presence of decision intelligence frameworks or hybrid human-AI decision processes across the reviewed literature.
high negative | AI-Driven Financial Risk Management and Decision Intelligenc... | degree of decision intelligence integration
There is limited empirical validation of AI approaches in emerging market settings.
Review finding described in Results and Conclusion: comparatively few studies provide robust, context-specific empirical validation for emerging markets despite general claims of effectiveness.
high negative | AI-Driven Financial Risk Management and Decision Intelligenc... | extent of empirical validation in emerging markets
Disparities emerge and compound across stages of the ML pipeline (training data, model predictions, and post-processing).
Pipeline-level analysis reported in paper showing sources of disparity at multiple stages and how effects accumulate from training data through prediction to post-processing.
high negative | Fairness Audits of Institutional Risk Models in Deployed ML ... | cumulative disparity across pipeline stages
Post-processing amplifies these disparities by collapsing heterogeneous probabilities into percentile-based risk tiers.
Analysis of the pipeline showing that converting model probabilities into percentile-based risk tiers (post-processing step) increases observed disparities across demographic groups.
high negative | Fairness Audits of Institutional Risk Models in Deployed ML ... | change in disparity magnitude after post-processing (probability → percentile ri...
Older and female students with comparable dropout risk are under-identified by the EWS.
Audit comparison showing lower identification/flagging rates for older and female students who have comparable modeled or observed dropout risk to other groups; reported as part of the pipeline disparities analysis.
high negative | Fairness Audits of Institutional Risk Models in Deployed ML ... | identification/flagging rate for support relative to comparable dropout risk
Younger, male, and international students are disproportionately flagged for support by the EWS, even when many ultimately succeed.
Empirical results from the replica-based audit comparing model predictions and post-processing flags against eventual student outcomes; disparities reported by demographic groups (age, gender, residency). Exact sample size and numerical metrics not provided in the abstract.
high negative | Fairness Audits of Institutional Risk Models in Deployed ML ... | rate of being flagged for support (EWS risk flag) versus eventual success/dropou...
Recent policy and academic discourse has increasingly acknowledged the infeasibility of fullstack AI sovereignty, but has not yet provided an integrating theoretical architecture for governing dependence under these conditions.
Literature/policy-discourse claim made in the paper (review/interpretation). No empirical sampling or quantitative evidence reported in the provided text.
high negative | Digital Sovereignty in the Global Cognitive-Informational Or... | feasibility of full technological autonomy (fullstack AI sovereignty) and the pr...
The concentration of AI-related infrastructures is coalescing into distinct geocognitive power poles whose competing infrastructural ecosystems generate structural asymmetries that position small and medium-sized states within regimes of cognitive-informational dependence.
Theoretical/geopolitical argument introduced in the paper (conceptual framing). No empirical sample size or quantitative measurement provided in the excerpt.
high negative | Digital Sovereignty in the Global Cognitive-Informational Or... | structural asymmetries and dependence of small and medium-sized states on domina...
There is a growing concentration of computational capacity, data ecosystems, and advanced model architectures within a limited number of technological actors. This signals the emergence of a cognitive-informational order in which influence is exercised through the architectures that shape how knowledge is generated, interpreted, and operationalized.
Theoretical/observational assertion in the paper (conceptual synthesis). No empirical details, sample sizes, or quantitative analyses provided in the supplied text.
high negative | Digital Sovereignty in the Global Cognitive-Informational Or... | concentration of technological capabilities and resulting influence over knowled...
The policy and research challenge posed by platform-mediated automation is not merely job quantity (technological unemployment) but institutional continuity — how societies reproduce practical competence when platforms optimize for efficiency rather than formation.
Normative and conceptual claim developed through literature synthesis (institutional economics, platform governance, workforce development); presented as an analytical reframing rather than an empirically tested hypothesis.
high negative | When Platforms Replace the Pipeline: AI, Labor Erosion, and ... | institutional continuity and human capital reproduction (quality of workforce fo...
Entry-level roles have historically functioned as apprenticeships in which workers acquire tacit knowledge and critical judgment; if platforms curtail these formative occupational layers, organizations may lack future workers capable of exercising contextual reasoning required to manage complex systems.
Institutional economics and workforce development literature cited in the paper; conceptual synthesis without original empirical measurement reported.
high negative | When Platforms Replace the Pipeline: AI, Labor Erosion, and ... | human capital formation (tacit knowledge acquisition and contextual reasoning ca...
Platform-mediated automation risks hollowing out labor structures from both directions: eroding repetitive, junior roles from below and automating supervisory coordination functions from above.
Theoretical argument synthesizing institutional economics and platform literature; articulated as a conceptual risk rather than demonstrated with original empirical data.
high negative | When Platforms Replace the Pipeline: AI, Labor Erosion, and ... | structural change in occupational layers (hollowing out of junior and supervisor...
Algorithmic systems are displacing routine tasks across both low-wage entry-level work and middle-management functions.
Stated in paper's argumentation; supported by a literature-based review drawing on platform governance literature and recent research on AI-enhanced automation (no original empirical sample or quantitative study reported).
high negative | When Platforms Replace the Pipeline: AI, Labor Erosion, and ... | displacement of routine tasks (across entry-level and middle-management roles)
The observed negative OPM effect is consistent with short-term 'J-curve' transition costs (process redesign and capability buildup) during early AI adoption.
Interpretation of empirical patterns (short-term decline in OPM concurrent with no ROA change) offered by the authors as an explanatory mechanism; not presented as separately estimated or experimentally tested.
high negative | The Dynamic Causal Effects of Corporate AI Adoption on Profi... | operating profit margin dynamics / transition costs interpretation
AI adoption had a significantly negative impact on the operating profit margin (OPM).
Causal analysis of KOSDAQ-listed companies (2018–2025) with AI-adoption timing identified via multi-step, contextually validated text analysis of DART business reports; endogeneity addressed using two-way fixed effects (TWFE) and Propensity Score Matching (PSM).
high negative | The Dynamic Causal Effects of Corporate AI Adoption on Profi... | operating profit margin (OPM)
For agentic systems, there are three structural breaks: decision diffusion, evidence fragmentation, and responsibility ambiguity.
Analytical identification and labeling of three specific structural problems for agentic AI within the paper's argumentation.
high negative | Governed Auditable Decisioning Under Uncertainty: Synthesis ... | types of structural governance failures in agentic AI
The paper introduces the 'cascade of uncertainty', showing how governance failures propagate through serial dependencies between framework layers.
Conceptual/theoretical model introduced and analyzed in the paper (cascade model linking framework layers and failure propagation).
high negative | Governed Auditable Decisioning Under Uncertainty: Synthesis ... | propagation of governance failure/uncertainty across framework layers
Agentic AI systems encounter structural breaks that prevent normal framework fillability.
The paper's analytic assessment reports that agentic AI systems introduce structural breaks that undermine the framework's ability to fill DES-properties.
high negative | Governed Auditable Decisioning Under Uncertainty: Synthesis ... | framework fillability / governance evidence coverage in agentic systems
Classical ML systems achieve only minimal DES-property fillability.
Analytic comparison in the paper classifies classical ML systems as providing minimal governance evidence fillability.
When automated decision systems fail, organizations frequently discover that formally compliant governance infrastructure cannot reconstruct what happened or why.
Asserted by the paper as an observed problem motivating the study; presented as a general empirical/experiential claim (literature/examples synthesis) rather than a controlled empirical estimate.
high negative | Governed Auditable Decisioning Under Uncertainty: Synthesis ... | ability of governance infrastructure to reconstruct decisions (post-hoc explaina...
Artificial intelligence introduces systemic risks through unprovenanced AI-derived metadata.
Cautionary claim made by the authors; stated as a systemic risk linked to provenance issues of AI-generated metadata, without empirical incident data in the excerpt.
high negative | Market Dynamics, Governance and Open Research Metadata in th... | systemic risk from unprovenanced AI-derived metadata (e.g., reduced trust, relia...
The debate about scholarly knowledge infrastructure has long been framed as a contest between openness and commercial enclosure, and this framing distorts both policy and practice.
Conceptual/persuasive claim made in the paper's opening paragraph; no empirical data or sample reported in the excerpt.
high negative | Market Dynamics, Governance and Open Research Metadata in th... | policy and practice framing (openness vs commercial enclosure)