Evidence (4189 claims)
| Topic | Claims |
|---|---|
| Adoption | 8625 |
| Productivity | 7686 |
| Governance | 6917 |
| Human-AI Collaboration | 6574 |
| Org Design | 4189 |
| Innovation | 4131 |
| Labor Markets | 3588 |
| Skills & Training | 2985 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Org Design
Programmatic state abstraction delivers the largest returns per token spent (RPTS), improving mean return by up to 76% over raw observations.
Controlled empirical study in the CybORG CAGE-2 POMDP environment comparing context representations (raw observations vs. deterministic state-tracking layer with compressed history) across five model families, six models, and twelve configurations with token-level cost accounting (3,475 episodes).
The study's findings offer actionable insights for managers and policymakers to leverage AI for sustainable organizational growth while safeguarding employee well-being.
Authors' concluding statement based on survey findings and analytical results.
Successful human–AI collaboration requires a human-centric approach that balances technological advancement with workforce development, ethical governance, and organizational support.
Study conclusion/recommendation based on survey findings (perceptions of opportunities and challenges) and analytical results (correlation/regression).
Human–AI collaboration reduces employees' routine workload.
Respondent perceptions collected via the structured questionnaire and analyzed with descriptive statistics and regression in SPSS.
AI-based systems support better decision-making by providing data-driven insights, allowing employees to focus on higher-level cognitive and strategic activities.
Survey responses (structured questionnaire) analyzed with SPSS (correlation and regression analyses) reporting perceived support for decision-making.
Human–AI collaboration significantly enhances workplace efficiency and productivity by reducing routine workload and improving accuracy and speed in task execution.
Primary data from employees in AI-enabled organizations collected via a structured questionnaire (5-point Likert); analyzed with SPSS using descriptive statistics and regression analysis.
Design principles that promote disagreement and decentralization—contextual grounding, community customization, continual adaptation, and polycentric governance—should be used so oversight is distributed across many legitimate centers rather than centralized in one institutional or moral chokepoint.
Normative design recommendations and governance proposals provided in the paper (argumentative; no empirical governance evaluation reported).
A range of technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) are relevant for supporting positive alignment across different phases of the LLM and agent lifecycle.
Prescriptive technical recommendations and research directions described by the authors (conceptual proposals, not reported empirical tests).
Several existing failures of alignment (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, poor error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing.
Theoretical argument and illustrative examples presented in the paper (no experimental or observational results reported).
Positive Alignment is a distinct and necessary agenda within AI alignment research.
Normative argumentation in the paper advocating for a separate research agenda (no empirical validation presented).
Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative.
Paper's definitional proposal / conceptual framing (normative definition rather than empirical evidence).
Policy frameworks are necessary to govern verifiable machine intelligence in modern socio-technical infrastructures.
Normative recommendation and policy discussion in the paper; no empirical policy evaluation or legislative case studies are presented in the supplied text.
Process-based supervision has broader implications for algorithmic fairness and can reduce black-box opacity.
High-level discussion in the paper linking process-verifiability to fairness and reduced opacity; no empirical fairness audits or quantitative fairness metrics reported in the provided text.
Integrating reinforcement learning with process-oriented feedback can foster a more transparent AI ecosystem where the path to a conclusion is as scrutinized as the conclusion itself.
Conceptual claim and proposed benefit in the paper; presented as an argument rather than supported by empirical transparency or interpretability studies in the supplied text.
Process-based supervision significantly improves the reliability of models in high-stakes domains such as law, medicine, and engineering.
Asserted by the authors as an advantage of PRMs for high-stakes applications; presented as argumentation rather than backed by reported empirical trials or case-study sample sizes in the provided text.
Optimizing PRMs through reinforcement learning enhances the verifiability and robustness of multi-step reasoning in large-scale model architectures.
Central argumentative claim of the paper (theoretical proposal and conceptual analysis); no experimental results or quantitative evaluation provided in the text supplied.
Process-Based Reward Models (PRMs) assign value to each distinct stage of a reasoning chain, providing a more granular signal for training than outcome-only approaches.
Methodological description and conceptual argument in the paper; described as a design/approach rather than empirically validated with data.
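The contrast between step-level and outcome-only signals can be illustrated with a minimal sketch. It is not the paper's method: `toy_step_scorer` is a hypothetical stand-in for a learned PRM, and the three-step chain is invented for illustration.

```python
from typing import Callable, List


def outcome_rewards(steps: List[str], final_correct: bool) -> List[float]:
    """Outcome-only signal: only the last step carries the final result."""
    rewards = [0.0] * len(steps)
    if steps:
        rewards[-1] = 1.0 if final_correct else 0.0
    return rewards


def process_rewards(steps: List[str], score_step: Callable[[str], float]) -> List[float]:
    """Process-based signal: every step in the chain gets its own score."""
    return [score_step(step) for step in steps]


def toy_step_scorer(step: str) -> float:
    # Hypothetical scorer: a real PRM would be a trained model.
    # Here a step is penalized only if it contains a known arithmetic slip.
    return 0.0 if "2 + 2 = 5" in step else 1.0


chain = ["Let x = 2 + 2 = 5", "Then 2x = 10", "Answer: 10"]
outcome = outcome_rewards(chain, final_correct=False)
process = process_rewards(chain, toy_step_scorer)
print(outcome)  # [0.0, 0.0, 0.0]
print(process)  # [0.0, 1.0, 1.0]
```

In this toy case the outcome-only signal says the chain failed but not where, while the per-step signal points at the first step.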
Overall, the study provides a cross-sectoral empirical foundation for understanding how budget flexibility, governance, and technology interact to support resilient financial systems in uncertain economic environments.
Synthesis statement based on the paper's cross-sectoral comparative analysis combining firm 10-K data (four firms), Open Budget Survey, OECD database, GAO reports, and the Flexibility Index.
In the public sector, systems characterized by strong transparency frameworks and Medium-Term Expenditure Frameworks demonstrate higher alignment between planned and actual expenditures.
Cross-sectional analysis using Open Budget Survey 2023, OECD Budget Practices Database, and U.S. GAO oversight reports linking transparency and MTEFs to alignment between planned and actual expenditures.
Firms with decentralized budgeting structures and embedded predictive analytics exhibit lower forecast deviations and faster resource reallocation.
Comparative empirical analysis of four large firms using Form 10-K data (2019–2023) and the Flexibility Index to relate decentralization and AI integration to forecast deviations and reallocation speed.
The framework contributes to improving understanding of enterprise coordination and governance under constrained legal conditions and offers a basis for future analytical and empirical research.
Author-stated contribution of the paper based on the developed theoretical framework; positioned as foundation for future work.
The analysis identifies theoretical conditions under which such governance may support verifiable integrity, adaptive compliance, and access to formal markets.
Theoretical conditions derived from the review and theory synthesis (no empirical testing reported in this paper).
The study develops a theory-based framework explaining how RegTech-supported governance may, under specified conditions, enable sanctions-safe enterprise ecosystems during post-conflict reconstruction.
Primary contribution of the paper: theory synthesis built from integrative review of five literature streams (RegTech, sanctions compliance, institutional voids, supply-chain governance, algorithmic accountability).
Post-conflict reconstruction relies heavily on private enterprises to bring back employment, rebuild supply networks, and reconnect damaged economies.
Statement grounded in literature cited in the review (paper positions this as a general premise from post-conflict reconstruction literature); no primary data reported.
There is a positive spillover effect on AI-ineligible chats: treated workers adapted their multitasking workflow to devote greater attention to these chats.
Experiment-level comparison of worker behavior on AI-ineligible chats between treatment and control groups; treated workers changed their multitasking workflow and reallocated attention and effort toward those chats.
Early intervention is essential for sustaining high post-escalation intervention effort.
Temporal analysis of intervention timing within the randomized experiment showing an association between earlier human intervention after escalation and higher subsequent intervention effort.
Human intervention preserves service quality in algorithm-triggered technical escalations (unresolved customer issues beyond the AI's capability).
Experimental subgroup analysis of escalations categorized as algorithm-triggered technical escalations; post-escalation human interventions were observed to maintain service quality in these cases.
By reframing reskilling as a shared, supported, and bounded process, AI-driven change can foster long-term career resilience, professional identity renewal, and sustainable human–AI integration.
Conceptual conclusion/implication drawn by the authors from the proposed model and recommendations; no empirical validation included in the paper.
The paper advances a set of sustainable, collective strategies—such as role-linked learning, protected learning time, skill prioritization, and phased AI adoption—to interrupt the reskilling loop and redistribute adaptive demands across organizations.
Prescriptive/theoretical recommendations proposed by the authors; no empirical evaluation or trial evidence presented.
The appropriate design response to Metis tasks is centaur architectures in which humans lead and AI supports, rather than pursuing further automation.
Prescriptive recommendation based on the conceptual analysis and normative reasoning in the paper; not supported by empirical evaluation or quantified comparisons of architectures.
The study offers actionable insights for leaders seeking to balance innovation, capability development and ethical governance in AI-enabled workplaces while sustaining human interpretive authority, accountability and responsibility over time.
Implications and recommendations derived from the study's qualitative findings (28 interviews) and interpretive synthesis.
AI reshapes contemporary work by augmenting, rather than substituting, human roles.
Qualitative semistructured interviews with 28 managers and professionals from 12 organizations across technology, finance and knowledge-intensive services in Europe and Asia; thematic and interpretive analysis supported by organizational document review.
The study demonstrates that recent archival case evidence can be used rigorously to analyze an emerging strategic phenomenon without reducing the study to a purely descriptive literature review.
Methodological claim supported by the paper's demonstration of within-case coding and cross-case pattern matching applied to recent archival documents for the four firms.
The paper develops a process view of AIECI (AI-enabled competitive intelligence) built on sensing, interpretation, and orchestration as the sequence through which AI inputs are transformed into competitive intelligence capability, intelligence-informed decisions, and economic outcomes.
Theoretical contribution synthesized from cross-case analysis and conceptual development within the paper.
Competitive intelligence (the process of sensing, interpreting, and orchestrating responses), rather than AI as a standalone automation tool, is the strategic mechanism through which value is created.
Theoretical argument supported by within-case coding and cross-case synthesis of archival materials from four firms demonstrating how AI functions as part of an intelligence infrastructure rather than as isolated automation.
Across the four cases, AIECI delivered strategic speed under uncertainty (faster, better-timed decisions in uncertain environments).
Archival case evidence (public disclosures and corporate materials) showing firms using AI-enabled intelligence to accelerate decision cycles and respond more quickly to market signals.
Across the four cases, AIECI improved allocation quality (better targeting and resource allocation decisions).
Within- and cross-case coding of corporate materials from the four sampled firms reporting improvements in campaign targeting, budget allocation, and resource deployment linked to AI-driven intelligence.
Across the four cases, AIECI produced efficiency gains and cost relief for firms.
Cross-case evidence from archival corporate disclosures and reports for Walmart, Unilever, Sprinklr, and DoubleVerify showing operational/marketing efficiencies and cost savings linked to AI-enabled competitive intelligence.
Across the four cases, AIECI generated value through revenue acceleration.
Cross-case findings from a qualitative comparative multiple-case design using public archival evidence (annual reports, 10-Ks, earnings releases, corporate materials) for four firms (Walmart, Unilever, Sprinklr, DoubleVerify).
We outline a research program for the runtime systems that foundation-model software agents will require.
Paper claims to present a forward-looking research agenda or program (stated in abstract); this is a conceptual contribution rather than an empirical finding.
Applied to a controlled validation task, the framework yields episode packages whose evidence structure varies systematically with harness level: lower levels produce only a final patch, while higher levels produce reproduction logs, failure attributions, deterministic requirement checks, and structured verification reports.
Empirical application described in the abstract: framework applied to a controlled validation task showing systematic variation in episode-package evidence structure across harness levels. The abstract does not report sample size or statistical measures.
We propose a trace-based evaluation protocol that converts each agent run into an auditable episode package.
Methodological proposal described in the abstract proposing a trace-based protocol and an auditable episode package format; no quantitative evaluation details provided in the abstract.
We operationalize the harness through a four-level ladder (H0–H3) that progressively exposes runtime support to the agent.
Design contribution described in the paper (abstract) introducing a four-level ladder (H0–H3) as an operationalization of the harness concept.
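One way to picture the H0–H3 ladder and the episode package is the sketch below. The field names and the `evidence_kinds` helper are hypothetical, and the abstract does not say which evidence type first appears at which level, so the example values are assumptions rather than the paper's mapping.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class HarnessLevel(IntEnum):
    """Four-level ladder; higher levels expose more runtime support."""
    H0 = 0
    H1 = 1
    H2 = 2
    H3 = 3


@dataclass
class EpisodePackage:
    """Hypothetical container for the auditable evidence of one agent run."""
    level: HarnessLevel
    final_patch: str
    reproduction_log: Optional[str] = None
    failure_attribution: Optional[str] = None
    requirement_checks: List[bool] = field(default_factory=list)
    verification_report: Optional[dict] = None

    def evidence_kinds(self) -> List[str]:
        """List which kinds of evidence this package actually contains."""
        kinds = ["final_patch"]
        if self.reproduction_log is not None:
            kinds.append("reproduction_log")
        if self.failure_attribution is not None:
            kinds.append("failure_attribution")
        if self.requirement_checks:
            kinds.append("requirement_checks")
        if self.verification_report is not None:
            kinds.append("verification_report")
        return kinds


# Assumed examples: a bare run at the lowest level and a fully instrumented one.
low = EpisodePackage(level=HarnessLevel.H0, final_patch="diff --git ...")
high = EpisodePackage(
    level=HarnessLevel.H3,
    final_patch="diff --git ...",
    reproduction_log="test_parse fails before patch",
    failure_attribution="off-by-one in tokenizer",
    requirement_checks=[True, True],
    verification_report={"passed": 2, "failed": 0},
)
print(low.evidence_kinds())
print(high.evidence_kinds())
```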
Foundation models have transformed automated code generation.
Statement in paper's abstract referring to broad impact of foundation models on automated code generation; likely supported by citations and literature overview within the paper (no sample size or quantitative study reported in the abstract).
The Agent-First paradigm is orthogonal and complementary to transport-layer standards such as MCP, operating as the semantic application layer above existing tool discovery and invocation protocols.
Conceptual argument and mapping presented in the paper asserting interoperability/orthogonality with transport-layer standards (e.g., MCP).
Agent-First APIs improve autonomous error recovery by 5.8x (compared to optimized CRUD baselines).
Reported comparative experiments on 50 real operational tasks measuring autonomous error recovery capability.
Agent-First APIs reduce required human interventions by 72.7% (compared to optimized CRUD baselines).
Same set of comparative experiments on 50 real operational tasks reported in the paper.
Comparative experiments on 50 real operational tasks demonstrate that Agent-First APIs achieve an 88% end-to-end task success rate versus 64% for optimized CRUD baselines (a 37.5% relative improvement, or 24 percentage points).
Empirical comparative experiments reported in the paper on 50 real operational tasks, comparing Agent-First APIs to optimized CRUD baselines.
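The +37.5% figure is a relative improvement over the baseline, not a difference in percentage points. A short check of the arithmetic, assuming each condition was scored once on the same 50 tasks (so 88% and 64% correspond to 44 and 32 successes):

```python
# Assumption: both conditions were evaluated once on the same 50 tasks.
TASKS = 50
agent_first_successes = 44  # 88% of 50
crud_successes = 32         # 64% of 50

difference = agent_first_successes - crud_successes

# Absolute gain, in percentage points of task success.
absolute_gain_points = 100 * difference / TASKS

# Relative gain over the CRUD baseline, as a percentage.
relative_gain_percent = 100 * difference / crud_successes

print(f"absolute: +{absolute_gain_points:.1f} points")  # absolute: +24.0 points
print(f"relative: +{relative_gain_percent:.1f}%")       # relative: +37.5%
```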
The paradigm is implemented and validated in a production multi-tenant SaaS platform serving 85 registered tools across 6 business domains.
Reported production implementation and deployment statistics (platform with 85 registered tools spanning 6 business domains).
We propose the Agent-First Tool API paradigm, comprising three integrated mechanisms: (1) a Six-Verb Semantic Protocol that decomposes tool interactions into search, resolve, preview, execute, verify, and recover phases; (2) a Normalized Tool Contract (NTC) providing structured decision-support metadata including confidence scores, evidence chains, and suggested next actions; and (3) a dual-layer governance pipeline combining static capability policies with dynamic risk escalation.
Design and specification presented in the paper (proposed architecture and components).
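The six verbs and the decision-support metadata can be sketched as plain data types. This is an illustration only: the class and field names, the `choose_next` policy, and the 0.7 confidence threshold are assumptions, not the paper's specification of the Normalized Tool Contract.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Verb(Enum):
    """The six phases named in the Six-Verb Semantic Protocol."""
    SEARCH = "search"
    RESOLVE = "resolve"
    PREVIEW = "preview"
    EXECUTE = "execute"
    VERIFY = "verify"
    RECOVER = "recover"


@dataclass
class ToolResponse:
    """Hypothetical response shape carrying NTC-style metadata."""
    verb: Verb
    ok: bool
    confidence: float
    evidence_chain: List[str] = field(default_factory=list)
    suggested_next_actions: List[Verb] = field(default_factory=list)


def choose_next(response: ToolResponse, min_confidence: float = 0.7) -> Optional[Verb]:
    """Assumed agent policy: recover on failure or low confidence,
    otherwise follow the first suggested action, if any."""
    if not response.ok or response.confidence < min_confidence:
        return Verb.RECOVER
    if response.suggested_next_actions:
        return response.suggested_next_actions[0]
    return None


preview = ToolResponse(
    verb=Verb.PREVIEW,
    ok=True,
    confidence=0.92,
    evidence_chain=["matched invoice by id", "amount within policy limit"],
    suggested_next_actions=[Verb.EXECUTE],
)
failed = ToolResponse(verb=Verb.EXECUTE, ok=False, confidence=0.95)

print(choose_next(preview))  # Verb.EXECUTE
print(choose_next(failed))   # Verb.RECOVER
```

Putting the suggested next action in the response itself is what lets the agent continue or recover without a human deciding the next step.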