Evidence (4793 claims)
Adoption: 5539 claims
Productivity: 4793 claims
Governance: 4333 claims
Human-AI Collaboration: 3326 claims
Labor Markets: 2657 claims
Innovation: 2510 claims
Org Design: 2469 claims
Skills & Training: 2017 claims
Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
Productivity
Actionable takeaway: organizations should measure inter-model similarity and response diversity as part of ROI and procurement analyses and factor in governance and role-redesign costs when estimating net returns to LLM deployment.
Explicit recommendation in the paper grounded in empirical analyses of output similarity and diversity metrics; presented as operational guidance rather than tested via field ROI studies.
The paper provides practical diagnostic tools and metrics (e.g., inter-model similarity, response entropy) for detecting and tracking AI homogenization in workflows.
Methodological section describing diagnostic framework and example metrics used in the empirical analyses (semantic similarity measures, entropy, distinct-n), intended for operational use.
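The metrics named above can be illustrated with a minimal sketch. It assumes whitespace tokenization and uses token-set Jaccard overlap as a simple stand-in for the semantic similarity measures the paper mentions; the function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def distinct_n(responses, n=2):
    """Share of unique n-grams among all n-grams across the responses."""
    ngrams = []
    for text in responses:
        tokens = text.lower().split()  # assumption: whitespace tokenization
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def response_entropy(responses):
    """Shannon entropy (bits) of the empirical distribution over exact responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_pairwise_jaccard(responses):
    """Average token-set Jaccard overlap across all response pairs.

    Lexical stand-in for embedding-based semantic similarity (an assumption).
    """
    sets = [set(r.lower().split()) for r in responses]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) if a | b else 1.0 for a, b in pairs) / len(pairs)
```

Lower distinct-n and entropy together with higher pairwise similarity across models answering the same query would point toward homogenization. An operational version would likely replace the Jaccard overlap with an embedding-based similarity.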
Organizational responses to homogenization include leadership communication strategies, work redesign (contrarian roles, ensemble workflows, mandated diversity checks), and governance frameworks (auditing, procurement policies avoiding monoculture).
Prescriptive recommendations in the paper synthesizing empirical results with organizational-design principles; proposed interventions are not evaluated empirically in the paper but are presented as actionable responses.
The analysis dataset comprises approximately 26,000 real-world user queries paired with outputs from over 70 distinct language models spanning different providers, architectures, and scales.
Explicit data description in the paper: ≈26,000 queries and outputs from 70+ models (paper lists model sets and sampling procedures in methods section).
The task frontier expands: new tasks become profitable and are created endogenously as coordination costs decline.
Analytical derivation in the model (proposition about task frontier) and simulation exercises that permit endogenous task entry.
Aggregate output increases when coordination costs fall because reduced frictions and endogenous task creation raise productive capacity.
Analytical result (one of the five propositions) showing comparative statics of output with respect to coordination compression; supported by calibrated numerical simulations.
Lower coordination costs expand managers’ spans of control (managers can supervise more subordinates).
Analytical comparative statics derived in the model (one of the five propositions) and corroborating numerical simulations with heterogeneous agents.
Overinvestment increases inequality (greater tail concentration of income).
Model computations showing that exponential returns amplify income at the top; comparative statics indicate inequality measures rise with greater investment/technology under lognormal wage assumption.
Overinvestment increases measured GDP (output).
Comparative statics in the theoretical model linking higher private investment/technology adoption to higher aggregate output; model shows positive effect on measured GDP despite welfare loss possibilities.
The exponential returns to skill and technology create strong private incentives for agents to escalate skill (education) investment toward the high tail of the distribution (an educational arms race).
Equilibrium analysis and comparative statics in the theoretical model showing that marginal returns to additional investment are increasing toward the distribution tail, producing higher optimal private investment at the top relative to social optimum.
When wages follow a lognormal distribution, technological progress makes wages increase exponentially in both skill and technology.
Analytical derivation in the paper's economic model that assumes a lognormal wage distribution and specifies wages as an exponential function of skill and a technology parameter; result follows from model algebra (no empirical data).
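One simple functional form consistent with this description is sketched below. The symbols $w$ (wage), $s$ (skill), and $A$ (technology parameter), and the assumption that skill is normally distributed, are illustrative; the paper's exact specification is not reproduced in this summary.

```latex
% Assumed specification: wages exponential in skill s and technology A
w = e^{A s}, \qquad s \sim \mathcal{N}(\mu, \sigma^{2})
\;\;\Longrightarrow\;\;
\ln w = A s \sim \mathcal{N}\!\left(A\mu,\; A^{2}\sigma^{2}\right)

% Marginal returns to skill and to technology both scale with the wage level
\frac{\partial w}{\partial s} = A\, e^{A s} = A\, w,
\qquad
\frac{\partial w}{\partial A} = s\, e^{A s} = s\, w
```

Under these assumptions wages are lognormal, a higher $A$ widens the log-wage variance $A^{2}\sigma^{2}$, and the marginal return to skill rises with the wage itself. That matches the increasing returns toward the top of the distribution described in the related claims.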
The paper proposes a research agenda prioritizing interoperable, ethical‑by‑design platforms; metrics to measure social equity impacts; and adaptation of global standards to local institutional capacities.
Explicit list of three prioritized research directions provided in the paper, derived from the systematic synthesis of the 103 items.
High‑income examples (e.g., Estonia, Singapore) demonstrate mature integration of digital/AI systems in e‑government, urban mobility, and e‑health.
Empirical case examples drawn from the reviewed literature and institutional reports cited in the review; specific country examples (Estonia, Singapore) repeatedly referenced as mature adopters.
Research priorities include developing robust measures of AI adoption and using causal methods (difference-in-differences, synthetic controls, RDD, IV) to estimate effects of AI and regulation on productivity, employment, and inequality.
Methodological recommendations in the report based on identified evidence gaps and normative evaluation of empirical priorities.
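As an illustration of the simplest of the listed causal methods, the following is a minimal sketch of a two-group, two-period difference-in-differences estimate. The function name and the example numbers are hypothetical, and the estimate is only interpretable as a causal effect under the parallel-trends assumption.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: treated change minus control change."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical productivity readings for AI-adopting and non-adopting firms
effect = did_estimate(
    treated_pre=[10, 12], treated_post=[15, 17],
    control_pre=[9, 11], control_post=[10, 12],
)
print(effect)
```

Applied work would typically use a regression with unit and time fixed effects and clustered standard errors rather than raw group means.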
The American Artificial Intelligence Initiative emphasizes R&D and innovation leadership, standards development, workforce readiness, and fostering 'trustworthy AI' (transparency, fairness, accountability).
Primary source policy documents from the U.S. American Artificial Intelligence Initiative reviewed in the report.
Vendor support, warranties, and service-level agreements (SLAs) are important for clinical adoption and liability management.
Policy and implementation literature, industry reports, and stakeholder feedback synthesized in the paper highlighting the role of vendor contractual commitments in adoption decisions.
Proprietary systems lead on reliability, maintenance, and validated integrations with clinical systems.
Literature synthesis including vendor case studies, deployment reports, and stakeholder surveys indicating more mature productization and validated integrations for proprietary offerings.
Open-source deployment options (e.g., on-premises) reduce data-sharing exposure and improve privacy.
Aggregated evidence from deployment reports and technical papers describing on-premises and local inference architectures; industry analyses of data governance tradeoffs.
Open-source models provide greater transparency and inspectability, enabling better auditability and explainability.
Systematic literature synthesis of peer-reviewed studies, industry reports, and case studies comparing open-source and proprietary systems; comparative analysis highlights inspectability of open-source code/models. No new primary experiments reported.
Coordinated policy reform, targeted infrastructure investment, workforce training, and equity-focused implementation are strategic priorities to realize AI’s potential in Indonesian healthcare.
Consensus recommendations drawn from the narrative synthesis, thematic analysis, and Delphi consensus studies included among the 42 supplementary documents and the broader 2020–2025 body of literature.
Recommended research priorities for economists include measuring how adoption changes task mixes and wages, quantifying verification/remediation costs, estimating productivity gains net of security/IP costs, and studying market dynamics from centralized model providers.
Author recommendations based on identified gaps in the empirical literature synthesized by the paper.
Recommended policy levers include data-governance rules, provenance and watermarking standards, liability frameworks, copyright clarifications, competition policy, and taxes/subsidies to internalize externalities.
Policy recommendations synthesized from legal, regulatory, and economic literatures within the review; presented as qualitative guidance rather than tested policy interventions.
A structured three-stage framework (input/process/output) clarifies where different risks and regulatory rules apply to generative audiovisual systems.
Framework presented in the paper as a conceptual synthesis of reviewed literatures; supported by cross-references to legal, technical, and ethical sources within the review.
The paper introduces IJOPM’s Africa Initiative (AfIn) to support Africa-based OSCM research, outlining motivation, objectives, review process, and researcher support mechanisms.
Descriptive account within the paper (administrative/initiative description rather than empirical evidence).
Cognitive interlocks include concrete mechanisms such as policy-enforced gates, automated verification thresholds, role-based checks, and mandatory rebuttal workflows to force verification before outputs are trusted or deployed.
Design details and enumerated mechanisms within the Overton Framework as presented in the paper; no implementation case studies reported.
The Overton Framework is an architectural remedy that embeds 'cognitive interlocks' into development environments to enforce verification boundaries and restore system integrity.
Prescriptive architectural proposal described in the paper (design specification and principles); presented conceptually without empirical validation.
High‑frequency sensor and satellite data, processed with AI, improve precision in measuring yields, input use, and environmental externalities, enhancing the quality of economic impact evaluations and policy targeting.
Methodological and validation studies using high‑resolution satellite imagery and field sensors that show improved measurement accuracy versus traditional survey methods; referenced empirical demonstrations in the literature.
The paper proposes specific metrics and empirical follow-ups (e.g., generation-to-verification throughput ratios, defect accumulation rates, time-to-acceptance for machine-generated artifacts, incident rates attributable to unverified AI outputs) to validate the model.
Explicit recommendations and measurement proposals listed in the paper; no empirical implementation provided.
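Three of the proposed metrics could be computed from a log of machine-generated artifacts roughly as follows. The `Artifact` record, its field names, and the exact metric definitions are assumptions made for illustration; the paper lists the metrics but this summary does not give formal definitions. Defect accumulation rates are omitted because they would need a defect log over time.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Artifact:
    """Hypothetical log record for one machine-generated artifact."""
    generated_hour: float           # hours since the start of the observation window
    verified: bool                  # whether any human or automated check reviewed it
    accepted_hour: Optional[float]  # None if the artifact was never accepted
    caused_incident: bool = False


def generation_to_verification_ratio(artifacts):
    """Artifacts generated per artifact verified (infinite if none were verified)."""
    verified = sum(a.verified for a in artifacts)
    return len(artifacts) / verified if verified else float("inf")


def median_time_to_acceptance(artifacts):
    """Median hours from generation to acceptance, over accepted artifacts only."""
    waits = [a.accepted_hour - a.generated_hour
             for a in artifacts if a.accepted_hour is not None]
    return median(waits) if waits else None


def unverified_incident_rate(artifacts):
    """Share of unverified artifacts that were linked to an incident."""
    unverified = [a for a in artifacts if not a.verified]
    if not unverified:
        return 0.0
    return sum(a.caused_incident for a in unverified) / len(unverified)
```

A ratio well above 1 would indicate that generation is outpacing verification, which is the condition the proposed follow-up studies are meant to detect.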
The paper’s own drafting began via casual AI conversation, presented as an illustrative case supporting the model.
Author-reported anecdote (N=1; the paper's drafting process).
Enhanced gross‑flows estimation using longitudinal microdata can better track transitions (job-to-job, upskilling, unemployment spells) and measure occupational churn and reallocation.
Established econometric practice cited in paper; recommendation to use panel/admin microdata (CPS longitudinal supplements, LEHD/LODES, UI records); no new empirical results but aligns with standard methods.
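The core of a gross-flows calculation is counting state-to-state transitions for the same person across adjacent periods. The sketch below assumes a simple record layout of `(person_id, period, state)` with integer periods; the layout and function names are hypothetical and do not correspond to the actual CPS, LEHD, or UI file formats.

```python
from collections import Counter, defaultdict


def gross_flows(records):
    """Count state-to-state transitions between adjacent periods for each person."""
    by_person = defaultdict(list)
    for person_id, period, state in records:
        by_person[person_id].append((period, state))
    flows = Counter()
    for history in by_person.values():
        history.sort()
        for (t0, s0), (t1, s1) in zip(history, history[1:]):
            if t1 == t0 + 1:  # skip gaps: only consecutive periods count as a flow
                flows[(s0, s1)] += 1
    return flows


def transition_rates(flows):
    """Row-normalize flow counts into transition shares by origin state."""
    origin_totals = Counter()
    for (origin, _), count in flows.items():
        origin_totals[origin] += count
    return {pair: count / origin_totals[pair[0]] for pair, count in flows.items()}
```

States could be labor-force statuses or occupation codes, so the same counting logic covers unemployment spells, job-to-job moves, and occupational churn. Production estimates would also apply survey weights and adjust for attrition, which this sketch ignores.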
Team Situation Awareness (shared perception, comprehension, projection) remains a useful analytic anchor for HAT even with agentic AI.
Conceptual analysis mapping Team SA components onto agentic AI interactions; literature review of Team SA utility in HAT contexts.
DAR produces ten falsifiable propositions explicitly mapped to measurement constructs, making the framework empirically testable.
Derivation and listing of ten testable propositions in the paper, each linked to observable measures and prioritized by feasibility. The derivation is theoretical; no empirical tests are provided.
Common uses of AI among practitioners include generating code snippets, suggesting fixes, accelerating routine tasks, surfacing design patterns or documentation, and scaffolding prototypes.
Practice-focused qualitative data from interviews and workflow analysis at Netlight; authors list these use-cases as commonly reported by practitioners; frequency counts not provided.
Practitioners use AI primarily as a practical assistant (coding, debugging, prototyping, knowledge retrieval) rather than as a fully autonomous developer.
Reported practitioner accounts and observations from the Netlight field study (interviews/observations); examples of tasks AI is used for were documented in the paper; sample limited to experienced consultants at one firm.
Experienced IT professionals at Netlight are already integrating AI tools into everyday development work.
Qualitative field study conducted at Netlight Consulting GmbH using interviews, observations, and analysis of practitioner workflows; single-firm sample (Netlight); exact number of participants not reported.
BERT-family encoders provide superior contextual understanding for sentiment analysis, intent detection, behavioural segmentation, and feature extraction from user signals compared to simpler feature pipelines.
Use of BERT encoders for classification tasks with offline metrics reported such as classification accuracy for intent/sentiment and user embedding quality for segmentation. (Specific datasets and sample sizes are not provided.)
Enablers of value realization are high-quality, integrated data; explicit data governance and metadata; process standardization; clear KPIs; user training and change management; and executive sponsorship.
Consistent findings across standards-based guidance, practitioner reports, and case studies from the 2020–2025 review highlighting these enablers as prerequisites or facilitators of success.
Value pathways enabled by ERP-integrated AI include improved visibility and real-time decisioning, automation of routine tasks, better forecasts and risk detection, and faster exception handling.
Thematic analysis across the reviewed literature (case studies and conceptual papers) identifying recurring mechanisms by which AI produced value in ERP contexts.
Observed AI techniques used in ERP contexts include supervised and unsupervised machine learning, predictive forecasting, anomaly/fraud detection, optimization, and explainable AI.
Systematic review of peer-reviewed articles, technical evaluations, and practitioner reports (2020–2025) documenting the methods applied in ERP and enterprise planning/control systems.
Durable benefits require the co‑evolution of technology, people, and process capabilities rather than technology deployment alone.
Interpretive framing and synthesis of multiple empirical case studies and best-practice guidance included in the 2020–2025 literature review; recurring theme across studies.
Continuous monitoring and observability for performance, compliance, and drift are essential to maintain operational stability and detect model or process degradation.
Prescriptive claim grounded in engineering practice and comparative analysis of failure modes; supported by illustrative deployments; no quantitative evaluation of monitoring impact reported.
Core governance components should include policy enforcement integrated into development and deployment pipelines, risk controls for data/model behavior/automated actions, explicit human-in-the-loop and human-on-the-loop oversight, continuous monitoring/logging/incident-response, and role-based governance structures linking legal, compliance, IT, and business units.
Prescriptive design based on literature synthesis and practitioner experience; described as core components in the proposed reference pattern (conceptual, case-illustrated).
Research needs include empirically measuring prevalence and average loss from prompt fraud incidents, evaluating effectiveness and cost-effectiveness of technical mitigations (watermarking, provenance), and modeling firm-level investment decisions under varying regulatory/insurance regimes.
Authors' recommended agenda for further research based on identified gaps in the paper's qualitative analysis.
The United States manages the openness–security trade-off through decentralized, rights‑based coordination that emphasizes procedural transparency and public accountability.
Qualitative content analysis of national‑level policy texts: 18 U.S. policy documents coded across the same four analytical dimensions.
Systems biology, constraint‑based metabolic modeling (e.g., FBA), kinetic modeling, and hybrid models are effective tools to predict fluxes and identify metabolic bottlenecks.
Discussion and aggregation of modeling studies using COBRA/OptFlux frameworks, FBA simulations, and kinetic/dynamic modeling applied to engineered strains to predict flux changes and suggest genetic interventions; validated in multiple reported DBTL cycles.
Engineered microorganisms are maturing into modular, programmable “microbial factories” capable of producing complex chemicals, specialty compounds, and next‑generation biofuels.
Synthesis of multiple experimental case studies reported in the literature (bench and pilot scale fermentations) demonstrating microbial production of natural products, specialty chemicals, and biofuel molecules using engineered strains and heterologous pathways; methods include pathway assembly, enzyme engineering, and fermentation optimization.
Cluster-level interpretation can be performed via LLM-based semantic decoding to generate concise human-readable labels and descriptions for discovered themes.
Pipeline step implemented: use of an LLM to decode cluster content and produce labels/descriptions; reported in experimental workflow on ICML and ACL abstracts.
Normalized representations can be embedded into a continuous vector space and then clustered using density-based clustering to identify latent themes without pre-specifying the number of topics.
Methodological pipeline: embedding model applied to normalized representations followed by density-based clustering (algorithmic property: density-based methods do not require pre-specified cluster count). Demonstrated in experiments on ICML and ACL 2025 abstracts.
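The property that the number of clusters need not be specified in advance can be seen in a minimal DBSCAN sketch. DBSCAN is used here only as a familiar example of density-based clustering; the summary does not name the algorithm or the embedding model the paper used. The parameters `eps` and `min_pts` and the toy 2-D points are illustrative stand-ins for real embedding vectors.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means the point has not been visited yet

    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


toy_embeddings = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
print(dbscan(toy_embeddings, eps=0.5, min_pts=2))
```

The number of clusters emerges from the density structure, and isolated points are labeled as noise rather than forced into a theme. The brute-force neighbor search is quadratic, so a real pipeline would rely on an indexed library implementation.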
Training improved exam scores by 0.27 grade points relative to optional access without training (p = 0.027).
Intent-to-treat comparison between the optional-access-with-training arm and the optional-access-without-training arm in the randomized trial (n = 164); reported effect size = +0.27 grade points and p-value = 0.027.
A brief, targeted training increased voluntary LLM use from 26% (optional access without training) to 41% (optional access with training).
Randomized experiment with 164 law students assigned to three arms (no access, optional access, optional access + ~10-minute training). Observed adoption rates in the two optional-access arms were 26% (untrained) vs. 41% (trained).