Evidence (4175 claims)
| Topic | Claims |
|---|---|
| Adoption | 8570 |
| Productivity | 7631 |
| Governance | 6869 |
| Human-AI Collaboration | 6491 |
| Org Design | 4175 |
| Innovation | 4114 |
| Labor Markets | 3566 |
| Skills & Training | 2966 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
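
The matrix is easier to compare across outcomes when the counts are read as shares of each row. The sketch below is illustrative only: it copies three rows from the table and divides each direction count by the listed Total. The function name `direction_shares` is an assumption, not part of the source. In some rows the listed Total is larger than the sum of the four direction columns (Firm Productivity: 435 + 57 + 88 + 20 = 600 against a Total of 606), so the shares need not sum to 1.

```python
# Rows copied from the evidence matrix: (positive, negative, mixed, null, total).
rows = {
    "Firm Productivity": (435, 57, 88, 20, 606),
    "Job Displacement": (12, 80, 20, 1, 113),
    "Inequality Measures": (44, 122, 49, 6, 221),
}


def direction_shares(counts):
    """Return each direction's share of the row's listed Total (hypothetical helper)."""
    positive, negative, mixed, null, total = counts
    named = {"positive": positive, "negative": negative, "mixed": mixed, "null": null}
    return {name: value / total for name, value in named.items()}


for outcome, counts in rows.items():
    shares = direction_shares(counts)
    summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{outcome}: {summary}")
```

Using the listed Total as the denominator keeps the shares consistent with the table as published; dividing by the sum of the four columns instead would be an equally defensible choice.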
Org Design
The authors constructed a contamination-free dataset of 22 real-world smart-contract security incidents that postdate every evaluated model's release.
Curation procedure described in the methods: 22 incidents selected to occur after all model release dates to prevent leakage.
This study expanded the evaluation matrix to 26 agent configurations spanning four model families and three scaffolding approaches.
Methods reported in this study specifying 26 agent configurations, four model families, and three scaffolds.
EVMbench (OpenAI, Paradigm, OtterSec) reported agents detecting up to 45.6% of vulnerabilities and achieving exploitation on 72.2% of a curated subset.
Reported metrics from the original EVMbench paper/benchmark (as summarized in this study).
Integrating AI (notably ML and NLP) meaningfully automates routine software engineering tasks across requirements management, code generation, testing, and maintenance.
Systematic literature review of prior AI-for-SE work combined with an empirical survey of software engineering professionals reporting usage and examples of tool-supported automation; sample size for the survey not specified in the summary.
Pseudo-relevance feedback (PRF) design decomposes into two independent dimensions: feedback source (where feedback text comes from) and feedback model (how that feedback is used to refine the query).
Paper's conceptual framing and controlled experiments that isolate and vary these two factors independently.
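The two-dimension decomposition can be pictured as two interchangeable callables. The sketch below is a toy illustration, not the paper's method: the corpus-overlap source, the term-expansion model, and every name in it (`make_corpus_source`, `term_expansion_model`, `prf`) are assumptions chosen to show that either dimension can be swapped without touching the other.

```python
from collections import Counter
from typing import Callable, List

# Dimension 1: where the feedback text comes from (query -> feedback texts).
FeedbackSource = Callable[[str], List[str]]
# Dimension 2: how feedback is used to refine the query.
FeedbackModel = Callable[[str, List[str]], str]


def make_corpus_source(corpus: List[str], k: int = 2) -> FeedbackSource:
    """Hypothetical source: the k documents sharing the most terms with the query."""
    def source(query: str) -> List[str]:
        q_terms = set(query.lower().split())
        ranked = sorted(
            corpus,
            key=lambda doc: len(q_terms & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:k]
    return source


def term_expansion_model(n_terms: int = 2) -> FeedbackModel:
    """Hypothetical model: append the most frequent feedback terms not in the query."""
    def model(query: str, feedback: List[str]) -> str:
        q_terms = query.lower().split()
        counts = Counter(
            term
            for text in feedback
            for term in text.lower().split()
            if term not in q_terms
        )
        extra = [term for term, _ in counts.most_common(n_terms)]
        return " ".join(q_terms + extra)
    return model


def prf(query: str, source: FeedbackSource, model: FeedbackModel) -> str:
    """Compose the two dimensions; each can be varied independently."""
    return model(query, source(query))
```

Holding `model` fixed while replacing `source` (or the reverse) mirrors the kind of controlled comparison the evidence line describes.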
The paper proposes specific operational and market recommendations: firms should invest in middleware and co-design partnerships; policymakers should fund shared quantum-centric supercomputing (QCSC) infrastructure and workforce programs; researchers should prioritize interoperable middleware, scheduling models, and economic experiments on access-pricing.
Explicit recommendations section synthesizing prior architectural and economic analysis; prescriptive assertions based on conceptual arguments rather than experimental validation.
Middleware standardization and interoperable APIs reduce switching costs and foster competition; lack of standards risks vendor lock-in and higher long-run costs.
Economic and systems-design argument drawing on well-understood effects of standardization in software ecosystems; no empirical QCSC-standardization case studies provided.
QCSC reference architecture elements — e.g., QPU integration patterns, low-latency interconnects, orchestration and scheduling middleware, unified programming environments, data staging strategies — are required components to address current friction.
System decomposition and interface requirements derived from use-case analysis; proposed architecture components listed and motivated; no experimental validation.
Policy recommendations include subsidizing complementary investments (data governance, training) rather than technology-only incentives; encouraging standards and interoperability; and funding evaluation studies to measure distributional effects and long-run productivity impacts.
Authors' policy section proposing these interventions based on case findings and broader policy implications.
The authors propose a conceptual optimisation framework emphasizing three pillars: digital integration (tech stack & data), collaboration (processes & governance), and continuous improvement (metrics, feedback loops).
Paper presents a conceptual framework derived from cross-case findings; theoretical/conceptual contribution rather than empirical estimation.
Explanations must be tailored to stakeholders (clinicians, regulators, customers) and integrated into decision processes to be useful (human-centered design principle).
Thematic coding of design and HCI literature within the review; draws on empirical studies and design guidance recommending stakeholder-specific explanation formats and integration into decision workflows.
The forecasting model was deployed with a human-in-the-loop mechanism that triggers on critical forecast deviations.
Pilot description in the paper documenting integration of human-in-the-loop rules for critical deviations during pilot deployment (single-case deployment evidence).
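A deviation-triggered human-in-the-loop rule of this kind might look like the sketch below. The source does not report how a "critical" deviation is defined, so the relative-deviation test, the 25% threshold, and the names `Forecast` and `needs_human_review` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    period: str
    predicted: float
    baseline: float  # reference value, e.g. plan or prior-period actual (assumed)


def needs_human_review(forecast: Forecast, threshold: float = 0.25) -> bool:
    """Flag a forecast for review when it deviates from baseline by more than threshold.

    The threshold value is hypothetical; the paper does not specify one.
    """
    if forecast.baseline == 0:
        # A relative deviation is undefined, so escalate to a human by default.
        return True
    deviation = abs(forecast.predicted - forecast.baseline) / abs(forecast.baseline)
    return deviation > threshold


forecasts = [
    Forecast("2024-Q1", predicted=110.0, baseline=100.0),
    Forecast("2024-Q2", predicted=130.0, baseline=100.0),
]
for f in forecasts:
    route = "human review" if needs_human_review(f) else "automatic"
    print(f.period, route)
```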
The framework explicitly targets SME-specific risks (data scarcity, limited skills/budgets, and change resistance) and proposes mitigations such as staged pilots, human-in-the-loop designs, and clear governance.
Design rationale and operational recommendations within the paper addressing SME constraints (conceptual; no large-N testing).
An MLOps layer is included to provide continuous integration/deployment, monitoring, retraining, and governance for sustainable model maintenance.
Framework/component specification in the paper describing an MLOps layer and its responsibilities (conceptual design).
The approach operationalizes AI adoption into seven sequential stages, each with specified deliverables, assigned roles, and gate/exit criteria.
Framework description in the paper enumerating seven sequential stages and documenting deliverables, role allocation, and gate criteria (conceptual / design artifact).
The paper proposes a practice-oriented, end-to-end algorithm for integrating AI into SME managerial decision loops grounded in CRISP-DM and extended with AI Canvas, an organizational digital-readiness assessment, and an MLOps layer.
Conceptual/framework development presented in the paper; synthesis of CRISP-DM, AI Canvas, a digital-readiness assessment, and an MLOps layer (no empirical sample required).
Standards and governance frameworks (for model auditability, security, and alignment) will become economic infrastructure influencing adoption costs and market trust.
Conceptual argument linking governance to adoption and trust, drawing on normative risk analysis; no empirical governance impact studies included.
Increasing AI autonomy magnifies ethical, safety, and value‑alignment concerns; robust human oversight and institutional governance are required.
Normative and risk analysis based on projected increases in system autonomy and illustrative failure modes; no formal safety audits included.
Models and systems must include robust governance: transparency, explainability, provenance logging, versioning, and compliance checks to maintain trust and satisfy auditors/regulators.
Normative claim supported by recommended governance and evaluation practices described in the paper; no regulatory testing or audit case studies reported.
Cloud and distributed compute (data lakes, distributed training, streaming pipelines) provide the scalability needed to handle growing data and model complexity in financial analytics.
Technical claim supported by proposed infrastructure components in the paper; no benchmarking or capacity measurements provided.
Such frameworks—designed to be modular, scalable, and interoperable—enable pluggable AI modules (scenario analysis, cash‑flow forecasting, dynamic pricing) and easier integration with ERP/BI systems.
Architectural claim supported by system design principles listed in the paper (modular model repositories, model-serving layers, feature stores, API integration); presented as design best-practices rather than empirical validation.
A systematic RM process—risk identification → analysis/assessment → evaluation/response → control implementation → monitoring and reporting—is a core component of effective practice.
Convergence of process descriptions across ISO 31000, COSO ERM, and multiple reviewed publications identified via thematic analysis.
Integration of risk management with strategy-setting and operational processes is essential to realize RM benefits.
Thematic findings from the literature review and recommendations in established frameworks (ISO 31000, COSO ERM); synthesized across peer-reviewed and practitioner literature.
An embedded risk culture and clear accountability across the organization are necessary enablers for effective risk management.
Repeatedly reported across reviewed literature and standards (e.g., ISO/COSO) in the thematic synthesis; supported by multiple secondary sources in the ten-year scope.
Leadership and governance commitment (board and senior management buy-in) is a core component required for effective risk management implementation.
Consistent identification of leadership/governance as an enabling factor across multiple peer-reviewed articles, books, and risk frameworks synthesized in the review; thematic analysis of literature over the last ten years.
Actionable takeaway: organizations should measure inter-model similarity and response diversity as part of ROI and procurement analyses and factor in governance and role-redesign costs when estimating net returns to LLM deployment.
Explicit recommendation in the paper grounded in empirical analyses of output similarity and diversity metrics; presented as operational guidance rather than tested via field ROI studies.
The paper provides practical diagnostic tools and metrics (e.g., inter-model similarity, response entropy) for detecting and tracking AI homogenization in workflows.
Methodological section describing diagnostic framework and example metrics used in the empirical analyses (semantic similarity measures, entropy, distinct-n), intended for operational use.
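The metrics named here can be approximated in a few lines. This is a rough sketch rather than the paper's implementation: distinct-n and entropy follow their usual definitions, while the paper's semantic similarity measure is replaced by a simple lexical Jaccard overlap as a stand-in. All function names are assumptions.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List


def distinct_n(responses: List[str], n: int = 2) -> float:
    """Share of n-grams across all responses that are unique (higher = more diverse)."""
    ngrams = []
    for response in responses:
        tokens = response.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def response_entropy(responses: List[str]) -> float:
    """Shannon entropy (bits) over the distribution of distinct normalized responses."""
    counts = Counter(response.strip().lower() for response in responses)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def mean_pairwise_jaccard(responses: List[str]) -> float:
    """Mean token-set overlap between response pairs.

    Lexical proxy only; the paper reports semantic similarity measures.
    """
    token_sets = [set(response.lower().split()) for response in responses]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0
    scores = [len(a & b) / len(a | b) if (a | b) else 1.0 for a, b in pairs]
    return sum(scores) / len(scores)
```

Given one response per model for the same query, low entropy and distinct-n together with high pairwise similarity would point toward homogenized output.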
Organizational responses to homogenization include leadership communication strategies, work redesign (contrarian roles, ensemble workflows, mandated diversity checks), and governance frameworks (auditing, procurement policies avoiding monoculture).
Prescriptive recommendations in the paper synthesizing empirical results with organizational-design principles; proposed interventions are not evaluated empirically in the paper but are presented as actionable responses.
The analysis dataset comprises approximately 26,000 real-world user queries paired with outputs from over 70 distinct language models spanning different providers, architectures, and scales.
Explicit data description in the paper: ≈26,000 queries and outputs from 70+ models (paper lists model sets and sampling procedures in methods section).
The task frontier expands: new tasks become profitable and are created endogenously as coordination costs decline.
Analytical derivation in the model (proposition about task frontier) and simulation exercises that permit endogenous task entry.
Aggregate output increases when coordination costs fall because reduced frictions and endogenous task creation raise productive capacity.
Analytical result (one of the five propositions) showing comparative statics of output with respect to coordination compression; supported by calibrated numerical simulations.
Lower coordination costs expand managers’ spans of control (managers can supervise more subordinates).
Analytical comparative statics derived in the model (one of the five propositions) and corroborating numerical simulations with heterogeneous agents.
The paper proposes a research agenda prioritizing interoperable, ethical‑by‑design platforms; metrics to measure social equity impacts; and adaptation of global standards to local institutional capacities.
Explicit list of three prioritized research directions provided in the paper, derived from the systematic synthesis of the 103 items.
High‑income examples (e.g., Estonia, Singapore) demonstrate mature integration of digital/AI systems in e‑government, urban mobility, and e‑health.
Empirical case examples drawn from the reviewed literature and institutional reports cited in the review; specific country examples (Estonia, Singapore) repeatedly referenced as mature adopters.
Vendor support, warranties, and service-level agreements (SLAs) are important for clinical adoption and liability management.
Policy and implementation literature, industry reports, and stakeholder feedback synthesized in the paper highlighting the role of vendor contractual commitments in adoption decisions.
Proprietary systems lead on reliability, maintenance, and validated integrations with clinical systems.
Literature synthesis including vendor case studies, deployment reports, and stakeholder surveys indicating more mature productization and validated integrations for proprietary offerings.
Open-source deployment options (e.g., on-premises) reduce data-sharing exposure and improve privacy.
Aggregated evidence from deployment reports and technical papers describing on-premises and local inference architectures; industry analyses of data governance tradeoffs.
Open-source models provide greater transparency and inspectability, enabling better auditability and explainability.
Systematic literature synthesis of peer-reviewed studies, industry reports, and case studies comparing open-source and proprietary systems; comparative analysis highlights inspectability of open-source code/models. No new primary experiments reported.
Recommended policy levers include data-governance rules, provenance and watermarking standards, liability frameworks, copyright clarifications, competition policy, and taxes/subsidies to internalize externalities.
Policy recommendations synthesized from legal, regulatory, and economic literatures within the review; presented as qualitative guidance rather than tested policy interventions.
A structured three-stage framework (input/process/output) clarifies where different risks and regulatory rules apply to generative audiovisual systems.
Framework presented in the paper as a conceptual synthesis of reviewed literatures; supported by cross-references to legal, technical, and ethical sources within the review.
Cognitive interlocks include concrete mechanisms such as policy-enforced gates, automated verification thresholds, role-based checks, and mandatory rebuttal workflows to force verification before outputs are trusted or deployed.
Design details and enumerated mechanisms within the Overton Framework as presented in the paper; no implementation case studies reported.
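As a rough picture of how such interlocks could combine, the sketch below checks an artifact against a verification threshold, a role-based approval set, and open rebuttals before allowing deployment. The paper reports no implementation, so the data model, the 0.9 threshold, the "reviewer" role, and the names `Artifact`, `interlock_blockers`, and `can_deploy` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class Artifact:
    name: str
    verification_score: float  # e.g. fraction of automated checks passed (assumed)
    approver_roles: Set[str] = field(default_factory=set)
    open_rebuttals: int = 0


def interlock_blockers(
    artifact: Artifact,
    min_score: float = 0.9,
    required_roles: FrozenSet[str] = frozenset({"reviewer"}),
) -> List[str]:
    """Return the reasons an artifact is blocked; an empty list means every gate passed."""
    blockers = []
    if artifact.verification_score < min_score:
        blockers.append("verification below threshold")
    missing = required_roles - artifact.approver_roles
    if missing:
        blockers.append("missing approval from: " + ", ".join(sorted(missing)))
    if artifact.open_rebuttals > 0:
        blockers.append("unresolved rebuttals")
    return blockers


def can_deploy(artifact: Artifact, **policy) -> bool:
    """Policy-enforced gate: deployment is allowed only when nothing blocks it."""
    return not interlock_blockers(artifact, **policy)
```

Returning the list of blockers rather than a bare boolean keeps the gate auditable, which fits the framework's emphasis on forcing verification before outputs are trusted.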
The Overton Framework is an architectural remedy that embeds 'cognitive interlocks' into development environments to enforce verification boundaries and restore system integrity.
Prescriptive architectural proposal described in the paper (design specification and principles); presented conceptually without empirical validation.
The paper proposes specific metrics and empirical follow-ups (e.g., generation-to-verification throughput ratios, defect accumulation rates, time-to-acceptance for machine-generated artifacts, incident rates attributable to unverified AI outputs) to validate the model.
Explicit recommendations and measurement proposals listed in the paper; no empirical implementation provided.
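Two of the proposed metrics lend themselves to a direct calculation. The paper lists them without formulas, so the definitions below are one plausible reading, and the function names are assumptions.

```python
from datetime import datetime
from typing import List, Tuple


def throughput_ratio(generated: int, verified: int) -> float:
    """Artifacts generated per artifact verified over the same window (assumed definition)."""
    if verified == 0:
        return float("inf")  # nothing verified: the backlog grows without bound
    return generated / verified


def mean_time_to_acceptance_hours(events: List[Tuple[datetime, datetime]]) -> float:
    """Mean hours between creation and acceptance of machine-generated artifacts."""
    if not events:
        raise ValueError("no accepted artifacts to average over")
    hours = [(accepted - created).total_seconds() / 3600 for created, accepted in events]
    return sum(hours) / len(hours)
```

Under this reading, a ratio persistently above 1 would mean generation is outpacing verification.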
Team Situation Awareness (shared perception, comprehension, projection) remains a useful analytic anchor for HAT even with agentic AI.
Conceptual analysis mapping Team SA components onto agentic AI interactions; literature review of Team SA utility in HAT contexts.
DAR produces ten falsifiable propositions explicitly mapped to measurement constructs, making the framework empirically testable.
Derivation and listing of ten testable propositions in the paper, each linked to observable measures and prioritized by feasibility. Theoretical derivation; no empirical tests provided.
Common uses of AI among practitioners include generating code snippets, suggesting fixes, accelerating routine tasks, surfacing design patterns or documentation, and scaffolding prototypes.
Practice-focused qualitative data from interviews and workflow analysis at Netlight; authors list these use-cases as commonly reported by practitioners; frequency counts not provided.
Practitioners use AI primarily as a practical assistant (coding, debugging, prototyping, knowledge retrieval) rather than as a fully autonomous developer.
Reported practitioner accounts and observations from the Netlight field study (interviews/observations); examples of tasks AI is used for were documented in the paper; sample limited to experienced consultants at one firm.
Experienced IT professionals at Netlight are already integrating AI tools into everyday development work.
Qualitative field study conducted at Netlight Consulting GmbH using interviews, observations, and analysis of practitioner workflows; single-firm sample (Netlight); exact number of participants not reported.
Enablers of value realization are high-quality, integrated data; explicit data governance and metadata; process standardization; clear KPIs; user training and change management; and executive sponsorship.
Consistent findings across standards-based guidance, practitioner reports, and case studies from the 2020–2025 review highlighting these enablers as prerequisites or facilitators of success.
Value pathways enabled by ERP-integrated AI include improved visibility and real-time decisioning, automation of routine tasks, better forecasts and risk detection, and faster exception handling.
Thematic analysis across the reviewed literature (case studies and conceptual papers) identifying recurring mechanisms by which AI produced value in ERP contexts.