Evidence (11,677 claims)
Claims by category:
- Adoption: 7,395 claims
- Productivity: 6,507 claims
- Governance: 5,921 claims
- Human-AI Collaboration: 5,192 claims
- Org Design: 3,497 claims
- Innovation: 3,492 claims
- Labor Markets: 3,231 claims
- Skills & Training: 2,608 claims
- Inequality: 1,842 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 738 | 1617 |
| Governance & Regulation | 671 | 334 | 160 | 99 | 1285 |
| Organizational Efficiency | 626 | 147 | 105 | 70 | 955 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 349 | 109 | 48 | 322 | 838 |
| Output Quality | 391 | 121 | 45 | 40 | 597 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 277 | 145 | 63 | 34 | 526 |
| AI Safety & Ethics | 189 | 244 | 59 | 30 | 526 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 106 | 40 | 6 | 188 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 79 | 8 | 1 | 152 |
| Regulatory Compliance | 69 | 66 | 14 | 3 | 152 |
| Training Effectiveness | 82 | 16 | 13 | 18 | 131 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
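To compare outcomes at a glance, here is a minimal sketch of deriving direction shares from rows of the matrix above. Values are copied from the table; directional counts do not always sum to the printed total, so the Total column is used as the denominator.

```python
# Minimal sketch: direction shares per outcome, rows copied from the
# evidence matrix above.
rows = {
    # outcome: (positive, negative, mixed, null, total)
    "Firm Productivity":   (385,  46, 85, 17, 539),
    "Inequality Measures": ( 36, 106, 40,  6, 188),
    "Job Displacement":    ( 11,  71, 16,  1,  99),
}

for outcome, (pos, neg, mixed, null, total) in rows.items():
    print(f"{outcome}: {pos/total:.0%} positive, {neg/total:.0%} negative")
```

On these rows the asymmetry in the table is visible directly: firm-productivity claims skew positive (71% vs 9%), while inequality (19% vs 56%) and job-displacement (11% vs 72%) claims skew negative.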
Advances in AI agent capabilities have outpaced users' ability to meaningfully oversee their execution.
Author assertion / literature-level observation presented in the paper (no empirical sample reported for this claim).
A threat model taxonomy mapping misuse vectors to hardware, software, institutional, and liability layers illustrates why no single governance mechanism suffices.
Threat model taxonomy developed in the paper (conceptual taxonomy; illustrative mapping rather than empirical testing).
Restricting access to open-weight models deepens asymmetries while driving proliferation into unsupervised settings.
Argumentation and threat-model reasoning in the paper describing likely consequences of restrictions (theoretical analysis; no empirical sample cited).
Access restrictions, without governed alternatives, may displace risks rather than reduce them.
Theoretical argument and threat-model analysis in the paper showing possible risk displacement (conceptual reasoning; no empirical sample reported).
Selective forgetting remains underexplored compared to retention in LLM agent memory research.
Authors' literature survey / position statement in paper (assertion made in abstract).
Beyond technical barriers, there are organizational ones: a persistent AI literacy gap, cultural heterogeneity, and governance structures that have not yet caught up with agentic capabilities.
Interview data (over 30 interviews) reporting organizational challenges, including limited AI literacy, diverse cultural attitudes across organizations, and governance lagging behind agentic AI capabilities.
Adoption is constrained less by model capability than by fragmented, machine-unfriendly data; stringent security and regulatory requirements; and legacy toolchains with limited API accessibility.
Stakeholder interviews (over 30) reporting barriers to deployment; qualitative synthesis identifies data fragmentation, security/regulatory requirements, and legacy toolchain access as primary constraints.
Providing agents feedback about past performance makes them worse at information aggregation and reduces their profits.
Experimental condition in which agents received feedback about past performance; compared aggregation (log error of last price) and profits with and without feedback, finding worse aggregation and lower profits when feedback was given.
Increasing the complexity of the information structure has a significant and negative impact on information aggregation, suggesting AI agents may suffer from the same limitations as humans when reasoning about others.
Experimental manipulation of information-structure complexity in the controlled trading experiment; measured change in aggregation performance (log error of last price) as complexity increases.
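The excerpt does not define the aggregation metric beyond "log error of last price"; here is a minimal sketch under the assumption that it is the absolute log-distance between the closing price and the asset's realized value.

```python
import math

def log_error(last_price: float, true_value: float) -> float:
    # Assumed metric: |ln(last price) - ln(true value)|; smaller values
    # mean the market price aggregated information better.
    return abs(math.log(last_price) - math.log(true_value))

print(log_error(92, 100))  # ~0.083: close to the realized value
print(log_error(60, 100))  # ~0.511: poor aggregation
```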
The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems.
Author's literature-based observation and critique in the paper's introduction (conceptual argument; no empirical sample reported).
Users push back against agent outputs (through corrections, failure reports, and interruptions) in 44% of all turns.
Turn-level coding of user behavior in the SWE-chat dataset: proportion of conversational turns containing correction/complaint/interrupt signals, computed across >63,000 user prompts and sessions.
Agent-written code introduces more security vulnerabilities than code authored by humans.
Comparative analysis of security vulnerabilities attributed to agent-authored code versus human-authored code within the SWE-chat dataset (method details not specified in excerpt).
Just 44% of all agent-produced code survives into user commits.
Empirical measurement of code provenance and survival within the SWE-chat dataset: proportion of agent-produced code that becomes part of subsequent user commits across sessions.
Despite rapidly improving capabilities, coding agents remain inefficient in natural settings.
Authors' summary claim supported by dataset-derived metrics such as agent code survival rate (44%) and user pushback (44% of turns); observational analysis of SWE-chat.
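Both headline figures are simple proportions over the dataset. A sketch of how they could be computed follows, using hypothetical record structures since the SWE-chat schema is not given in the excerpt.

```python
PUSHBACK_LABELS = {"correction", "failure_report", "interrupt"}

def pushback_rate(turns: list[dict]) -> float:
    # Share of user turns coded with a pushback signal (reported as 44%).
    return sum(t["label"] in PUSHBACK_LABELS for t in turns) / len(turns)

def survival_rate(agent_lines: set[str], committed_lines: set[str]) -> float:
    # Share of agent-written lines reappearing in user commits (reported as 44%).
    return len(agent_lines & committed_lines) / len(agent_lines)

print(survival_rate({"a", "b", "c", "d"}, {"a", "c", "x"}))  # 0.5
```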
Regulated deployment imposes four load-bearing system properties (deterministic replay, auditable rationale, multi-tenant isolation, and statelessness for horizontal scale), and stateful architectures violate them by construction.
Conceptual/architectural argument presented in the paper (theoretical analysis), not an empirical measurement in the abstract.
Evaluation of four leading AI platforms shows that standard RAG-based approaches achieve an average of only 15% accuracy when information is insufficient.
Empirical evaluation described in paper: four AI platforms tested on benchmark; reported average accuracy of 15% for RAG-based approaches on cases with insufficient information.
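The 15% figure implies a grading rule in which, on insufficient-information cases, the correct behavior is to abstain or flag the gap rather than answer. A sketch of that scoring logic, with a hypothetical abstention heuristic standing in for the paper's rubric (not given in the excerpt):

```python
def insufficiency_accuracy(responses: list[str]) -> float:
    def abstains(r: str) -> bool:
        # Hypothetical heuristic; the paper's grading rubric is not given here.
        r = r.lower()
        return "insufficient" in r or "cannot determine" in r
    # Accuracy on insufficient-information cases = share of abstentions.
    return sum(abstains(r) for r in responses) / len(responses)

print(insufficiency_accuracy([
    "The claimant qualifies under section 4.",          # presumptuous answer
    "Insufficient information: wage records missing.",  # correct abstention
]))  # 0.5
```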
Unemployment insurance adjudication has seen rapid integration of AI systems, and the question of additional fact-finding poses the most significant bottleneck for a system that affects millions of applicants annually.
Contextual/introductory claim in paper; references to domain-scale impact and bottleneck; no specific numeric study sample provided in excerpt.
A well-known limitation of AI systems is presumptuousness: the tendency to provide confident answers even when information may be lacking.
Statement in paper framing the problem; general literature/contextual claim (no specific experiment cited in the excerpt).
Brevity, semantic isolation, and rhetorical register each independently predict representational outcomes (i.e., which submissions are included in or excluded from summaries).
Statistical/semantic analysis (presumably regression or causal inference) reported in the paper linking textual features (brevity, semantic isolation, rhetorical register) to representational outcomes.
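A sketch of the kind of analysis the note describes: a logistic regression testing whether each textual feature predicts inclusion while holding the others fixed. The data below are synthetic and the feature constructions illustrative; nothing here reproduces the paper's estimates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
# Illustrative standardized features per submission: length (brevity),
# mean embedding distance to other submissions (semantic isolation),
# and a rhetorical-register score.
X = rng.normal(size=(n, 3))
# Synthetic inclusion outcome with an independent effect per feature.
p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 1] - 0.5 * X[:, 2])))
y = rng.random(n) < p

model = LogisticRegression().fit(X, y)
print(model.coef_)  # one independent coefficient per textual feature
```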
Exclusion concentrates in clusters expressing dissent, scepticism and critique of AI, with exclusion rates of 33%–88% in such clusters.
Cluster/semantic analysis reported in the paper showing higher exclusion rates for clusters labeled as dissent/scepticism/critique.
In topic B, 15.3% of participants are effectively excluded by the official summary.
Empirical measurement reported in the paper quantifying participants 'effectively excluded' when comparing source submissions to official summary coverage.
In topic A, 16.9% of participants are effectively excluded by the official summary.
Empirical measurement reported in the paper quantifying participants 'effectively excluded' when comparing source submissions to official summary coverage.
The official government summary for topic B underperforms a random-participant baseline (coverage degradation of -8.0%).
Empirical comparison in the paper between official government summary and a random-participant baseline using the n=5,253 consultation responses.
The official government summary for topic A underperforms a random-participant baseline (coverage degradation of -9.1%).
Empirical comparison in the paper between official government summary and a random-participant baseline using the n=5,253 consultation responses.
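The coverage-degradation comparison can be read as: build many summaries from randomly chosen participants, measure how many participants each covers, and subtract that baseline from the official summary's coverage. A sketch under the assumption that coverage means topical overlap with each submission (the paper's exact measure is not given in the excerpt):

```python
import random

def coverage(summary_topics: set[str], submissions: list[set[str]]) -> float:
    # Share of submissions sharing at least one topic with the summary.
    return sum(bool(s & summary_topics) for s in submissions) / len(submissions)

def degradation(official: set[str], submissions: list[set[str]],
                k: int, trials: int = 1000) -> float:
    # Official coverage minus mean coverage of k-random-participant summaries;
    # negative values (like -8.0% and -9.1% above) mean the official summary
    # underperforms the random baseline.
    baseline = sum(
        coverage(set().union(*random.sample(submissions, k)), submissions)
        for _ in range(trials)
    ) / trials
    return coverage(official, submissions) - baseline
```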
No single policy instrument is sufficient to produce high regional science and technology industrial competitiveness.
Result of fuzzy-set qualitative comparative analysis (fsQCA) on AI policy instruments issued by provincial-level governments in China, reported in the study; fsQCA finds no individual condition is sufficient.
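In fsQCA terms, "no single instrument is sufficient" means no instrument's fuzzy membership clears the standard sufficiency-consistency threshold on its own. A minimal sketch of that consistency measure (the standard formula; the case values below are invented):

```python
def sufficiency_consistency(x: list[float], y: list[float]) -> float:
    # Standard fsQCA consistency for "X is sufficient for Y":
    # sum(min(x_i, y_i)) / sum(x_i) over cases' fuzzy memberships.
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)

# Toy memberships: one policy instrument (x) vs. regional competitiveness (y).
print(sufficiency_consistency([0.9, 0.7, 0.8], [0.6, 0.9, 0.3]))  # ~0.67
```

Values near 1 would support sufficiency; the study's finding is that no individual instrument reaches such a threshold.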
LLMs endorsed fraudulent investments at a 0% rate across all models tested.
Preregistered experiment across seven leading LLMs producing 3,360 AI advisory conversations; reported 0% endorsement of objectively fraudulent opportunities.
Endorsement reversal occurred in fewer than 3 in 1,000 observations.
Observed incidence reported from the preregistered experiment (3,360 AI advisory conversations); statement in paper reporting incidence <3/1,000.
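A hedged arithmetic check on what these two rates bound, assuming the conversations can be treated as independent trials: the rule of three gives an approximate 95% upper confidence limit for an event never observed in n trials, and the reversal rate translates into a handful of conversations.

```python
n = 3360  # advisory conversations in the preregistered experiment

# Rule of three: ~95% upper bound on a rate when 0 events are observed.
print(f"fraud endorsement: <= ~{3 / n:.2%}")  # ~0.09%

# '<3 in 1,000' reversals corresponds to at most ~10 conversations.
print(f"reversals: < {0.003 * n:.0f} conversations")
```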
Critical gaps persist in explainability, regulatory alignment, ethical governance, and context-specific validation.
Authors' synthesis and Conclusion listing persistent shortcomings identified across the reviewed literature.
Integration of decision intelligence principles into AI applications for financial risk management in emerging markets is nascent.
Authors' synthesis noting limited presence of decision intelligence frameworks or hybrid human-AI decision processes across the reviewed literature.
There is limited empirical validation of AI approaches in emerging market settings.
Review finding described in Results and Conclusion: comparatively few studies provide robust, context-specific empirical validation for emerging markets despite general claims of effectiveness.
Disparities emerge and compound across stages of the ML pipeline (training data, model predictions, and post-processing).
Pipeline-level analysis reported in paper showing sources of disparity at multiple stages and how effects accumulate from training data through prediction to post-processing.
Post-processing amplifies these disparities by collapsing heterogeneous probabilities into percentile-based risk tiers.
Analysis of the pipeline showing that converting model probabilities into percentile-based risk tiers (post-processing step) increases observed disparities across demographic groups.
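A minimal sketch of the collapse this describes, with invented probabilities: percentile-based tiers can place nearly identical risks on opposite sides of a cut point while merging very different risks into one tier.

```python
import numpy as np

probs = np.array([0.12, 0.30, 0.31, 0.33, 0.58, 0.91])  # invented risks
ranks = np.argsort(np.argsort(probs))                   # percentile ranks
tiers = np.digitize(ranks / (len(probs) - 1), bins=[1/3, 2/3])
print(list(zip(probs.tolist(), tiers.tolist())))
# 0.30 and 0.31 land in different tiers; 0.58 and 0.91 share a tier.
```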
Older and female students with comparable dropout risk are under-identified by the early warning system (EWS).
Audit comparison showing lower identification/flagging rates for older and female students who have comparable modeled or observed dropout risk to other groups; reported as part of the pipeline disparities analysis.
Younger, male, and international students are disproportionately flagged for support by the EWS, even when many ultimately succeed.
Empirical results from the replica-based audit comparing model predictions and post-processing flags against eventual student outcomes; disparities reported by demographic groups (age, gender, residency). Exact sample size and numerical metrics not provided in the abstract.
Recent policy and academic discourse has increasingly acknowledged the infeasibility of full-stack AI sovereignty, but has not yet provided an integrating theoretical architecture for governing dependence under these conditions.
Literature/policy-discourse claim made in the paper (review/interpretation). No empirical sampling or quantitative evidence reported in the provided text.
The concentration of AI-related infrastructures is coalescing into distinct geocognitive power poles whose competing infrastructural ecosystems generate structural asymmetries that position small and medium-sized states within regimes of cognitive-informational dependence.
Theoretical/geopolitical argument introduced in the paper (conceptual framing). No empirical sample size or quantitative measurement provided in the excerpt.
There is a growing concentration of computational capacity, data ecosystems, and advanced model architectures within a limited number of technological actors. This concentration signals the emergence of a cognitive-informational order in which influence is exercised through the architectures that shape how knowledge is generated, interpreted, and operationalized.
Theoretical/observational assertion in the paper (conceptual synthesis). No empirical details, sample sizes, or quantitative analyses provided in the supplied text.
The policy and research challenge posed by platform-mediated automation is not merely job quantity (technological unemployment) but institutional continuity — how societies reproduce practical competence when platforms optimize for efficiency rather than formation.
Normative and conceptual claim developed through literature synthesis (institutional economics, platform governance, workforce development); presented as an analytical reframing rather than an empirically tested hypothesis.
Entry-level roles have historically functioned as apprenticeships in which workers acquire tacit knowledge and critical judgment; if platforms curtail these formative occupational layers, organizations may lack future workers capable of exercising contextual reasoning required to manage complex systems.
Institutional economics and workforce development literature cited in the paper; conceptual synthesis without original empirical measurement reported.
Platform-mediated automation risks hollowing out labor structures from both directions: eroding repetitive, junior roles from below and automating supervisory coordination functions from above.
Theoretical argument synthesizing institutional economics and platform literature; articulated as a conceptual risk rather than demonstrated with original empirical data.
Algorithmic systems are displacing routine tasks across both low-wage entry-level work and middle-management functions.
Stated in paper's argumentation; supported by a literature-based review drawing on platform governance literature and recent research on AI-enhanced automation (no original empirical sample or quantitative study reported).
The observed negative OPM effect is consistent with short-term 'J-curve' transition costs (process redesign and capability buildup) during early AI adoption.
Interpretation of empirical patterns (short-term decline in OPM concurrent with no ROA change) offered by the authors as an explanatory mechanism; not presented as separately estimated or experimentally tested.
AI adoption had a significantly negative impact on the operating profit margin (OPM).
Causal analysis of KOSDAQ-listed companies (2018–2025) with AI-adoption timing identified via multi-step, contextually validated text analysis of DART business reports; endogeneity addressed using two-way fixed effects (TWFE) and Propensity Score Matching (PSM).
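For readers unfamiliar with the design, a canonical two-way fixed effects specification of the kind the note describes (variable names here are illustrative, not the paper's):

$$\mathrm{OPM}_{it} = \beta\,\mathrm{AI}_{it} + \alpha_i + \lambda_t + \varepsilon_{it}$$

where $\alpha_i$ absorbs firm-specific levels, $\lambda_t$ absorbs year shocks, $\mathrm{AI}_{it}$ indicates post-adoption firm-years identified from DART reports, and the reported negative OPM effect corresponds to $\hat{\beta} < 0$; PSM pre-balances adopters and non-adopters on observables before estimation.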
For agentic systems, there are three structural breaks: decision diffusion, evidence fragmentation, and responsibility ambiguity.
Analytical identification and labeling of three specific structural problems for agentic AI within the paper's argumentation.
The paper introduces the 'cascade of uncertainty', showing how governance failures propagate through serial dependencies between framework layers.
Conceptual/theoretical model introduced and analyzed in the paper (cascade model linking framework layers and failure propagation).
Agentic AI systems encounter structural breaks that prevent normal framework fillability.
Paper's analytic assessment reports that agentic AI systems cause structural breaks undermining the framework's ability to fill DES-properties.
Classical ML systems achieve only minimal DES-property fillability.
Analytic comparison in the paper classifies classical ML systems as providing minimal governance evidence fillability.
When automated decision systems fail, organizations frequently discover that formally compliant governance infrastructure cannot reconstruct what happened or why.
Asserted by the paper as an observed problem motivating the study; presented as a general empirical/experiential claim (literature/examples synthesis) rather than a controlled empirical estimate.
Artificial intelligence introduces systemic risks through unprovenanced AI-derived metadata.
Cautionary claim made by the authors; stated as a systemic risk linked to provenance issues of AI-generated metadata, without empirical incident data in the excerpt.
The debate about scholarly knowledge infrastructure has long been framed as a contest between openness and commercial enclosure, and this framing distorts both policy and practice.
Conceptual/persuasive claim made in the paper's opening paragraph; no empirical data or sample reported in the excerpt.