Evidence (4175 claims)
Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
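Rows of very different size are easier to compare as shares of each row's Total. The sketch below does this for a few rows copied from the matrix above. In several rows the four direction columns sum to less than Total (for example, Other: 758 + 199 + 100 + 900 = 1,957 against a Total of 2,007), so the remainder is reported separately as "unallocated" instead of being assigned to any direction. The row selection and the `direction_shares` helper are illustrative and not part of the source.

```python
# Rows copied from the evidence matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Other": (758, 199, 100, 900, 2007),
    "Governance & Regulation": (826, 400, 191, 122, 1563),
    "Firm Productivity": (435, 57, 88, 20, 606),
    "AI Safety & Ethics": (218, 277, 65, 33, 599),
    "Inequality Measures": (44, 122, 49, 6, 221),
    "Job Displacement": (12, 80, 20, 1, 113),
}

def direction_shares(row):
    """Express each direction as a share of the listed Total.

    'unallocated' is the part of Total not covered by the four direction columns.
    """
    pos, neg, mixed, null, total = row
    coded = pos + neg + mixed + null
    return {
        "positive": pos / total,
        "negative": neg / total,
        "net_positive": (pos - neg) / total,
        "unallocated": (total - coded) / total,
    }

for name, row in ROWS.items():
    s = direction_shares(row)
    print(f"{name:26s} pos={s['positive']:.0%} neg={s['negative']:.0%} "
          f"net={s['net_positive']:+.0%} unallocated={s['unallocated']:.1%}")
```

Net-positive share is a blunt summary: it ignores mixed and null findings, which dominate some rows (Other, Employment Level).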
Org Design
Firms need complementary investments (data pipelines, monitoring tools, feedback loops, human oversight systems) that materially affect the economics of adoption.
Industry case studies and practitioner reports synthesized in the review describing necessary complementary investments; no quantified investment sample or ROI analysis provided here.
Regulatory attention is likely to focus on transparency, liability for factual errors, data privacy, and nondiscrimination; compliance and auditing will add to adoption costs.
Policy and regulatory analyses aggregated in the review and references to ongoing regulatory discussions; no primary regulatory impact study conducted in this paper.
Generative AI currently lacks the genuine empathy and relational capabilities necessary for high-stakes or sensitive interactions.
Conceptual analyses and practitioner case examples aggregated in the review; limited direct quantitative measurement cited in this brief review.
Generative models exhibit contextual misunderstandings and cannot reliably infer nuanced customer intent in all cases.
Synthesis of empirical studies and practitioner observations documenting misinterpretation and intent-detection failures; no new testing reported in this review.
There is a substitution risk: routine ideation and drafting tasks may be automated, altering task-level labor demand and wage structures.
Task-automation literature and empirical studies of LLMs performing routine drafting/ideation tasks summarized in the review; no long-run labor-market causality established in the paper.
Generative AI lacks reliable situational judgment on ambiguous problems and on ethical trade-offs, making it insufficient for autonomous decision-making in such contexts.
Case examples and experimental studies cited in the synthesis showing inconsistent or inappropriate responses to ambiguous/ethical scenarios; no large-scale causal evidence provided.
LLMs are prone to bias, mediocrity, and factual or logical errors when domain-specific context or experiential knowledge is absent.
Review of empirical evaluations documenting biased outputs, superficial or mediocre suggestions, and factual errors in open-ended tasks and domain-specific prompts; evidence comes from multiple short-term studies and applied examples.
LLMs are predominantly recombinative — they tend to rework and recombine existing material rather than produce deeply novel insights.
Analytical synthesis of output analyses and creativity assessments from multiple empirical studies demonstrating frequent recombination of existing concepts and lower rates of highly original ideas; studies and measures vary.
Proliferation of low-quality or biased AI-generated ideas creates externalities: increased filtering and reputational costs for firms and risks of poor product designs, ethical lapses, or regulatory violations if evaluation is insufficient.
Case studies and qualitative reports documenting filtering burdens and instances of biased/misleading outputs; theoretical reasoning about reputational and regulatory risks; direct quantification of these externalities is limited.
Standard productivity metrics (e.g., TFP) may undercount the value of ideation and creative augmentation provided by generative AI, making attribution between human and AI contributions difficult.
Methodological discussion in the review supported by heterogeneity in outcome measures across studies and challenges in measuring implemented idea quality and long-run impacts.
Generative models exhibit recombination bias: they tend to remix existing patterns rather than produce deeply original, paradigm-shifting insights.
Synthesis of output analyses across studies showing frequent recombination of known patterns and limited evidence of wholly novel, paradigm-changing ideas; claim based on qualitative and comparative analyses in reviewed literature.
AI illiteracy (lack of understanding of AI capabilities/limits) impedes adoption and appropriate use of AI tools in finance.
Survey and interview data reporting lower adoption/intended use among respondents with limited self-reported AI understanding; supplemented by qualitative explanations; sample described as finance professionals across multinational institutions (size unspecified).
Excessive reliance on algorithmic suggestions can erode human judgment and create systemic risks.
Interview reports and, where available, operational/risk metrics indicating overreliance patterns; authors note systemic-risk implications based on combined qualitative and quantitative observations (no causal identification reported).
Cognitive biases and inappropriate trust (both overtrust and distrust) distort decision outcomes and limit the benefits of AI-assisted decision-making.
Qualitative interview evidence describing instances of cognitive bias and misplaced trust; some quantitative indicators of decision distortion and risk where operational performance/risk metrics were available; sample: finance professionals across multinational institutions (detailed metrics not specified).
Increased monitoring and algorithmic management raise concerns about worker autonomy and privacy and will prompt regulatory responses (data protection, algorithmic transparency) that shape adoption costs and trajectories.
Recurring concerns reported across included studies and the review's policy implication section; grounded in qualitative and normative discussions within the literature.
Model risk, bias, and privacy concerns impose negative externalities (e.g., systemic risk in supply chains, discrimination), motivating governance standards, auditing, and possibly regulation.
Documentation in standards, practitioner reports, and conceptual literature within the 2020–2025 review describing incidents, risks, and calls for governance/regulation.
Firms will need to invest in new control technologies, governance structures, and personnel (AI auditors, red teams), increasing the total cost of GenAI adoption.
Economic reasoning and implications section; no empirical cost estimates or survey data; projection based on anticipated control needs.
Malicious insiders, external actors (vendors, consultants, customers), shadow AI (unsanctioned consumer-grade GenAI use), and supply-chain/third-party prompt templates are plausible attack vectors for prompt fraud.
Threat taxonomy and scenario mapping with case-style examples; conceptual identification of actors rather than documented incident attribution.
Poor logging, weak prompt governance, and over-reliance on machine-generated artifacts increase organizational vulnerability to prompt fraud.
Control gap analysis and prescriptive argumentation; examples of weak controls used to illustrate exploitability; no empirical measurement of effect sizes.
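The claim names poor logging as a vulnerability but does not prescribe a logging design. One common tamper-evidence technique, swapped in here purely for illustration, is a hash chain: each log entry stores the hash of the previous entry, so after-the-fact edits to a prompt or output break verification. The `PromptLog` class and its fields are hypothetical, not from the source.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(record: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()

@dataclass
class PromptLog:
    """Hypothetical append-only prompt log; each entry commits to the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, user: str, prompt: str, output: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"user": user, "prompt": prompt, "output": output, "prev": prev}
        record["hash"] = _digest(record)  # computed before the "hash" key exists
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute the chain.

        Editing any entry, or deleting one from the middle, makes this return False.
        Silently truncating the tail is NOT detected, so the latest hash should also
        be stored somewhere outside the log.
        """
        prev = GENESIS
        for rec in self.entries:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["prev"] != prev or rec["hash"] != _digest(body):
                return False
            prev = rec["hash"]
        return True
```

For example, after two `append` calls `verify()` returns `True`; changing `log.entries[0]["prompt"]` afterwards makes it return `False`. This addresses only the integrity of the record; it does not detect a fraudulent prompt at the moment it is submitted.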
Because prompt fraud operates at the linguistic/procedural surface rather than the network/technical surface, existing control frameworks are ill-prepared to address this new attack surface.
Control gap analysis comparing conventional internal controls to the linguistic attack surface; conceptual rather than empirical evaluation.
Upfront governance costs (policy, tooling, staff) become a key part of adoption cost and affect ROI calculations and payback periods for automation investments.
Economic reasoning and implications discussed in the paper; no empirical cost data provided—recommendation based on practitioner experience and theoretical cost accounting.
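The effect of governance spending on payback can be seen with a simple undiscounted payback calculation. All figures below are hypothetical and chosen only to show the mechanics; the paper provides no cost data.

```python
def payback_period(upfront: float, annual_net_benefit: float) -> float:
    """Years until cumulative net benefit covers the upfront outlay (undiscounted)."""
    if annual_net_benefit <= 0:
        return float("inf")  # never pays back
    return upfront / annual_net_benefit

# Hypothetical figures for illustration only; not taken from the paper.
automation_capex = 400_000     # licences, integration
governance_upfront = 150_000   # policy work, monitoring tooling, reviewer onboarding
gross_annual_savings = 300_000
governance_running = 50_000    # recurring oversight and audit cost per year

without_gov = payback_period(automation_capex, gross_annual_savings)
with_gov = payback_period(automation_capex + governance_upfront,
                          gross_annual_savings - governance_running)
print(f"payback without governance costs: {without_gov:.2f} years")  # 1.33
print(f"payback with governance costs:    {with_gov:.2f} years")     # 2.20
```

Governance enters twice: once in the upfront outlay and once as a recurring drag on annual benefit, which is why the payback period here grows by more than the upfront share alone would suggest.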
Traditional automation governance is often ad hoc, underestimates security and compliance risks, and does not scale safely for mission-critical enterprise systems.
Synthesis of industry best practices and practitioner-sourced lessons (qualitative observations and case illustrations). No systematic survey or quantitative incidence rates provided.
Divergent governance regimes increase the risk of data localization, interoperability frictions, and regulatory fragmentation — raising costs for multinational AI development and limiting global model generalizability.
Policy‑level comparative inference from contrasting national approaches identified in the document analysis and related literature on cross‑border data governance; no direct measurement of costs or model generalizability in the paper.
State‑led coordination can rapidly mobilize resources and scale national champions, altering competitive dynamics and potentially creating winner‑take‑most outcomes.
Theoretical inference from document evidence of state mobilization and developmentalist goals in Chinese texts, combined with literature on state coordination and industrial scaling (no empirical competition measures in the paper).
Resource-rich labs and firms are likely to adopt LLM orchestration faster, which could widen gaps in research capacity between institutions and countries unless mitigated by policy choices.
Equity and diffusion argument based on resource requirements (compute, data, validation); no adoption-rate data or cross-institution comparisons provided.
There is potential for 'winner-take-most' market outcomes if a few players combine superior models, instrument control software, and exclusive datasets.
Economic reasoning about network effects and data concentration; no empirical market concentration metrics specific to microscopy provided.
Upfront investments required for compute, data labeling, validation, and safety testing may raise entry costs and favor incumbents.
Economic logic about fixed costs and scale advantages; no measured entry-cost or firm-dynamics data provided.
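The fixed-cost logic can be written as a short worked equation. Assume, for illustration only, that an incumbent (output $q_I$) and an entrant (output $q_E$) face the same upfront cost $F$ (compute, labeling, validation, safety testing) and the same marginal cost $c$:

```latex
AC(q) = \frac{F}{q} + c

AC(q_I) - AC(q_E) = F\left(\frac{1}{q_I} - \frac{1}{q_E}\right) < 0
\quad \text{whenever } q_I > q_E
```

With hypothetical values $F = 10$, $c = 1$, $q_I = 100$, $q_E = 10$, the incumbent's average cost is $1.1$ and the entrant's is $2$, so the same fixed outlay weighs almost ten times as heavily per unit on the smaller firm.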
There is a risk of deskilling for some technical roles, creating implications for training and workforce development.
Theoretical reasoning about automation-induced deskilling; no empirical study or measured skill changes provided.
Human-in-the-loop controls formalize supervisory labor and create persistent oversight costs even after automation scales.
Pattern design and governance lifecycle recommendations highlighting human checkpoints; qualitative reasoning without measurement of oversight hours or costs.
Distributed training introduces novel incentive issues (free-riding, poisoning incentives, misreporting of local metrics) that require contractual and cryptographic solutions and may create demand for trusted intermediaries or certification markets.
Mechanism/incentive analysis within the paper; threat modeling and proposed governance solutions. No experimental evaluation of incentive mechanisms or market responses.
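As one minimal illustration of a cryptographic mechanism against misreporting (not the paper's design), a commit-reveal scheme stops a participant from changing a reported local metric after seeing other participants' reports. It does not prove the metric is truthful; that still needs verification or attestation. The `commit` and `reveal_ok` helpers are hypothetical.

```python
import hashlib
import hmac
import secrets

def commit(value: str) -> tuple[str, str]:
    """Return (commitment, nonce). Publish the commitment now; keep the nonce until reveal."""
    nonce = secrets.token_hex(16)  # random salt so commitments to the same value differ
    digest = hashlib.sha256(f"{nonce}:{value}".encode("utf-8")).hexdigest()
    return digest, nonce

def reveal_ok(commitment: str, value: str, nonce: str) -> bool:
    """Check a revealed (value, nonce) pair against the earlier commitment."""
    expected = hashlib.sha256(f"{nonce}:{value}".encode("utf-8")).hexdigest()
    return hmac.compare_digest(commitment, expected)
```

Usage: each site calls `commit("val_accuracy=0.912")` before any results are shared, publishes the commitment, and later reveals the value and nonce; a site that tries to reveal a different value fails `reveal_ok`.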
Federated infrastructures redistribute informational power — moving custody away from centralized platforms reduces their exclusive access to behavioral data and can lower their data-based market power.
Economic and institutional analysis (conceptual), discussion of informational rents and bargaining positions. This is a theoretical economic claim without empirical market measurement in the paper.
Fairness constraints (e.g., disparate ad delivery) and monitoring become more challenging to enforce and audit without centralized raw data, requiring new governance and measurement mechanisms.
Policy and governance analysis describing limitations of decentralized data for fairness monitoring; proposed policy-aware governance layer and attestation/audit mechanisms. No empirical validation of governance effectiveness provided.
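Some disparity measures can still be computed when sites share only aggregated counts rather than raw user records. The sketch below pools hypothetical per-group eligible and delivered counts from two sites and computes a min/max delivery-rate ratio. It assumes the reported counts are honest, which is exactly the gap the paper's proposed attestation and audit layer would need to close; the data and helper names are invented for illustration.

```python
from collections import Counter

# Hypothetical per-site summaries: each site shares only counts, never raw records.
site_reports = [
    {"eligible": {"group_a": 1000, "group_b": 800},
     "delivered": {"group_a": 300, "group_b": 160}},
    {"eligible": {"group_a": 500, "group_b": 700},
     "delivered": {"group_a": 140, "group_b": 150}},
]

def pooled_delivery_rates(reports):
    """Sum counts across sites, then compute delivered / eligible per group."""
    eligible, delivered = Counter(), Counter()
    for r in reports:
        eligible.update(r["eligible"])    # Counter.update adds counts
        delivered.update(r["delivered"])
    return {g: delivered[g] / eligible[g] for g in eligible}

def disparity_ratio(rates):
    """Lowest group rate divided by highest; 1.0 means equal delivery rates."""
    return min(rates.values()) / max(rates.values())

rates = pooled_delivery_rates(site_reports)
print(rates, round(disparity_ratio(rates), 3))  # ratio 0.705
```

Pooling counts before dividing matters: averaging each site's rates would weight a small site as heavily as a large one.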
The authors identify concrete training gaps in current models: delegation, scoped execution, and mode switching are absent from current training data, which limits the ability to split models into manager and worker roles.
Authors' diagnosis based on experimental outcomes and qualitative reasoning about model training distributions; recommendation for future training focus.
Interpretive, ad-hoc human-centered evaluation practices (e.g., “vibe checks”, team sense-making) are rational adaptations to LLM behavior rather than merely sloppy or inferior methodological choices.
Authors' interpretive argument based on interview evidence where practitioners explained why such practices persist and how they serve sense-making for unpredictable model behavior.
AI changes the nature of capital (digital/algorithmic assets) and complicates productivity accounting; researchers should decompose firm-level productivity gains into AI technology, complementary organizational capital, and human capital effects.
Theoretical proposal grounded in productivity accounting literature and conceptual discussion; no single decomposition empirical result presented.
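One way to operationalize the proposed decomposition is standard growth accounting. The Cobb-Douglas form and symbols below are assumptions for illustration, not taken from the paper: $Y$ is output, $K_{AI}$ AI-related (digital/algorithmic) capital, $K_N$ other capital, $O$ organizational capital, $H$ quality-adjusted labor (human capital), and $A$ residual TFP. Taking logs and first differences gives:

```latex
Y_t = A_t \, K_{AI,t}^{\alpha} \, K_{N,t}^{\beta} \, O_t^{\gamma} \, H_t^{\eta}

\Delta \ln Y_t =
  \underbrace{\alpha \, \Delta \ln K_{AI,t}}_{\text{AI technology}}
+ \underbrace{\gamma \, \Delta \ln O_t}_{\text{organizational capital}}
+ \underbrace{\eta \, \Delta \ln H_t}_{\text{human capital}}
+ \beta \, \Delta \ln K_{N,t}
+ \Delta \ln A_t
```

If $O$ is not measured, the term $\gamma \, \Delta \ln O_t$ is absorbed into the residual $\Delta \ln A_t$, so gains from complementary organizational investment are misattributed to TFP.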
Policy and governance issues become salient: liability, IP, security, and certification of AI-generated code require new standards for provenance, testing, and accountability.
Argument based on practitioner-raised concerns about security, IP, and provenance in the Netlight study; authors recommend policy attention; no legal/regulatory analysis or empirical policy evaluation provided.
Pakistan’s IT sector employs around 600,000 people and generates billions in exports, with several cities (Karachi, Lahore, Islamabad) acting as software/AI/digital services hubs.
Background statistics reported in the paper (sector-level descriptive figures).
Much of the growth in solo entrepreneurial entry reflects low-commitment, experimental entry and does not translate into greater representation among the highest-quality outcomes.
Analysis of engagement/commitment indicators and ranking outcomes on Product Hunt showing that many new solo launches post-release are low-commitment/experimental and that these entrants are underrepresented among top-ranked/high-quality launches.
In aggregate, the strongest open-weight model matches GPT-5 on our benchmark while being substantially cheaper and faster to run.
Aggregate score comparisons between the top-performing open-weight model and GPT-5 on AgentFloor, together with reported cost and latency measurements (as described in the evaluation section).
AI is increasingly deployed in multi-agent systems; however, most research considers only the behavior of individual models.
Background/literature claim stated in the paper (asserted by authors as motivation).
Five interaction mechanisms were identified, with the majority propagating across the subsystem boundary.
Authors' thematic analysis and STS mapping identifying five cross- or within-subsystem interaction mechanisms; qualitative assessment that most propagate across subsystem boundary.
Mobile penetration reaches 84% (in the context of low-income countries), a statistic used to motivate RSI's potential reach.
Single numeric statistic reported in the paper as background context; source or empirical basis for the statistic not provided within the supplied text.
Output quality saturates at approximately seven governed memories per entity.
Empirical analysis reported in the controlled experiments showing output quality vs. number of governed memories per entity, with saturation near seven memories.
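If quality saturates near seven memories per entity, a store can cap memories at that level. The sketch below keeps at most `cap` memories per entity and evicts the lowest-scored one; the scoring and eviction policy are assumptions, the cap should be treated as tunable, and the "governance" of memories described in the paper is not modeled. `BoundedMemoryStore` is hypothetical.

```python
import heapq
from collections import defaultdict

MAX_MEMORIES_PER_ENTITY = 7  # saturation point reported in the experiments; tunable

class BoundedMemoryStore:
    """Keep at most `cap` memories per entity, evicting the lowest-scored one."""

    def __init__(self, cap: int = MAX_MEMORIES_PER_ENTITY):
        self.cap = cap
        self._heaps = defaultdict(list)  # entity -> min-heap of (score, seq, text)
        self._seq = 0  # tie-breaker so the heap never has to compare texts

    def add(self, entity: str, text: str, score: float) -> None:
        heap = self._heaps[entity]
        item = (score, self._seq, text)
        self._seq += 1
        if len(heap) < self.cap:
            heapq.heappush(heap, item)
        elif score > heap[0][0]:
            heapq.heapreplace(heap, item)  # drop the current lowest-scored memory

    def recall(self, entity: str) -> list:
        """Return the entity's memories, highest score first."""
        return [text for _, _, text in sorted(self._heaps[entity], reverse=True)]
```

Adding ten memories scored 0 through 9 for one entity leaves the seven highest (scores 9 down to 3).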
A Sankey diagram of thematic evolution shows lexical convergence over time and indicates that a small set of authors has disproportionate influence in structuring the discourse.
Thematic evolution analysis visualized with a Sankey diagram; author influence inferred from performance trends (citations/publication counts) in the bibliometric data.
This paper is one of the first systematic reviews focused specifically on NLP in bank marketing, organizing findings along the customer journey and the marketing mix to provide a practical taxonomy.
Authors' stated novelty claim based on the scoped literature search (2014–2024) and topical focus; novelty inferred from the small number of prior papers identified at the intersection.
Productivity gains from AI may be under- or mis-measured if national accounts and tax systems do not adjust for AI-driven quality changes in services.
Analytic observation in the paper's measurement and externalities discussion; not empirically tested within the study.
The paper documents production failure vignettes and operational lessons drawn from a real enterprise deployment integrated with a major cloud provider's MCP servers (client redacted).
Paper states empirical context is field lessons from an enterprise agent platform; failure vignettes are enumerated as deliverables.
ToM alignment matters less (i.e., misalignment has a smaller effect) in settings with explicit coordination protocols, strong signaling, or standardized conventions.
Analyses and experiments described in the paper showing smaller performance differences between matched and mismatched ToM orders when explicit conventions or reliable signals are available; reported as part of robustness/conditional analyses.
Measuring AI's contribution to productivity and coordination effects will be challenging; new metrics (e.g., coordination time per task, error/rework rates attributable to communication lapses) are required.
Conceptual argument and recommended measurement agenda in the paper; no empirical testing of proposed metrics provided.
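The proposed metrics are named but not operationally defined in the paper. A minimal sketch, assuming each task record already carries coordination time and rework counts, with the communication-lapse attribution coded by reviewers beforehand (the hardest step in practice). The record fields and log data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    task_id: str
    coordination_minutes: float        # hand-offs, clarifications, meetings
    rework_events: int                 # all error/rework events on the task
    rework_from_miscommunication: int  # subset attributed to communication lapses

def coordination_metrics(tasks):
    """Per-task coordination time, rework rate, and share of rework from miscommunication."""
    n = len(tasks)
    total_rework = sum(t.rework_events for t in tasks)
    comm_rework = sum(t.rework_from_miscommunication for t in tasks)
    return {
        "coordination_minutes_per_task": sum(t.coordination_minutes for t in tasks) / n,
        "rework_per_task": total_rework / n,
        "share_rework_from_communication": comm_rework / total_rework if total_rework else 0.0,
    }

# Hypothetical log for one team over one period.
log = [
    TaskRecord("T1", 30, 2, 1),
    TaskRecord("T2", 45, 0, 0),
    TaskRecord("T3", 15, 3, 2),
    TaskRecord("T4", 30, 1, 0),
]
print(coordination_metrics(log))
```

Comparing these metrics before and after an AI deployment, on comparable task mixes, is one way to isolate a coordination effect; the metrics alone do not establish causality.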
Many early-stage AI advances have not translated into higher Phase II/III success rates.
Synthesis of reported outcomes and failures from industry experience; no new systematic statistical analysis provided.