Evidence (7448 claims)
| Topic | Claims |
|---|---|
| Adoption | 5267 |
| Productivity | 4560 |
| Governance | 4137 |
| Human-AI Collaboration | 3103 |
| Labor Markets | 2506 |
| Innovation | 2354 |
| Org Design | 2340 |
| Skills & Training | 1945 |
| Inequality | 1322 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
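As a reading aid, each matrix row can be reduced to the share of its claims that report a positive direction. The sketch below is illustrative only: it copies four rows from the table above, enters the blank Null cell for Job Displacement as 0, and divides by the reported Total column, which in some other rows is larger than the sum of the four direction columns. The names `ROWS` and `positive_share` are hypothetical.

```python
# Illustrative sketch: share of positive claims per outcome category.
# Rows are copied from the Evidence Matrix; blank cells are entered as 0.
ROWS = {
    # outcome: (positive, negative, mixed, null, total)
    "Task Completion Time": (78, 5, 4, 2, 89),
    "Developer Productivity": (29, 3, 3, 1, 36),
    "Inequality Measures": (24, 68, 31, 4, 127),
    "Job Displacement": (5, 31, 12, 0, 48),
}


def positive_share(row):
    """Return the fraction of an outcome's claims with a positive direction."""
    positive, _negative, _mixed, _null, total = row
    return positive / total


# Print outcomes from the most to the least positive evidence base.
for outcome, row in sorted(ROWS.items(), key=lambda kv: positive_share(kv[1]), reverse=True):
    print(f"{outcome:<24} {positive_share(row):.1%}")
```

A share near 1 means the collected claims mostly report gains for that outcome; it says nothing about effect sizes or study quality.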
To measure and monitor these effects, researchers should track firm-level adoption of AI features, fulfillment automation intensity, platform-mediated market entry, and task-level labor shifts.
Author recommendations based on gaps identified in the case-based and multi-modal empirical work and the sensitivity of results to adoption measures; not an empirical finding but a methodological claim.
Policy priorities should differ by national Skill Imbalance: countries with strong demand for new skills should prioritize education and reskilling, while countries with strong supply should prioritize firm absorption (innovation, financing, technology adoption).
Interpretation of cross-country Skill Imbalance Index and its implications; prescriptive recommendation based on the observed demand–supply patterns rather than causal testing of policies.
The threshold for taxing AI may be crossed once AI becomes sufficiently capable in substituting humans across cognitive tasks.
Model-based comparative-static/threshold analysis showing that higher AI substitutability for cognitive tasks increases the likelihood that cognitive workers will consider switching to manual jobs, thereby meeting the model's tax-initiation condition.
The results indicate the need to build digital infrastructure, develop human capital, and support open data.
Policy recommendation provided in the paper based on the empirical findings linking cognitive tools to market opportunities (specific cost–benefit or implementation analyses not provided in the excerpt).
Developing domain-specific vernacular NLP and speech models (health, agriculture, education) would help replicate pragmatic features (proverbs, registers) that enable epistemic appropriation.
Policy/research recommendation based on qualitative findings that proverbs and registers confer legitimacy and facilitate knowledge transfer; no experimental NLP work reported in study.
Local-language (vernacular) inclusion improves economic returns to development interventions by increasing comprehension and adoption, thereby improving program cost-effectiveness.
Logical extrapolation from observed higher comprehension and adoption rates in the field sample (N = 45); no direct economic cost–benefit analysis is reported in the study, and the claim is framed as an implication for AI economics.
Economic and organizational benefits (e.g., cost-effective retention, preserved human capital for environmental innovation) are plausible outcomes of applying the approach, but require further causal and cost analyses.
Paper discusses implications and hypothesizes ROI from reduced turnover (less recruiting/onboarding/productivity loss) and preservation of green capabilities; no empirical cost or productivity data provided in the presented summary.
Findings support regulatory focus on transparency, auditability, and consumer protections because low trust would slow adoption and reduce welfare gains from AI marketing.
Policy implication derived from empirical association between trust and adoption/loyalty in the study; regulatory effects were not empirically tested in the paper.
Investments in trustworthy AI systems (privacy, transparency, fairness) can increase retention and customer lifetime value because trust raises loyalty directly and via adoption.
Managerial implication inferred from observed positive direct and indirect effects of Trust on Brand Loyalty in the SEM results; CLV and retention were not directly measured.
Firms investing in human–AI co‑creation infrastructure may gain a resilience premium; policymakers and standards bodies should consider governance frameworks for adaptive algorithmic systems balancing responsiveness with oversight.
Policy and investment implication inferred from empirical results on resilience and detection performance; direct evidence of market valuation or policy outcomes is not reported.
Greater reliance on algorithmic co‑creation shifts labor demand toward roles skilled in model oversight, interpretive judgment, and human‑machine interaction rather than purely manual segmentation tasks.
Inference from the operationalization of human–AI co‑creation via the Canvas and observed changes in practitioner workflows during 6‑month ethnography (n = 23); workforce composition effects are not empirically measured at scale in the study.
A ~90% reduction in strategic planning cycle time indicates lower managerial coordination costs and faster reallocation of marketing and R&D budgets.
Inference from measured reduction in planning cycle length (~90%) observed in the study (see ethnography/system logs); direct measures of coordination costs and budget reallocation outcomes are not reported in the summary.
Algorithmic Canvas–enabled autopoietic STP increases firms' ability to adapt endogenously to shocks, implying higher realized productivity in volatile markets and lower deadweight losses from mis‑targeting.
Inference drawn from empirical findings on resilience and detection performance (44% greater resilience, improved signal detection) and theoretical reasoning about dynamic capabilities; productivity and deadweight loss are not directly measured in the reported empirical results.
Economic evaluations of AI adoption should include psychological and human-capital externalities (effects on self-efficacy, skill depreciation, job satisfaction) to fully account for welfare and productivity dynamics.
Argument grounded in experimental and survey findings showing psychological impacts of AI-use mode; general recommendation for research and evaluation rather than an empirical finding.
Building and maintaining an open-access disclosure repository would enable comparability, aggregation, and public appraisal of environmental pressures.
Policy recommendation derived from conceptual analysis; no implemented repository or empirical evaluation reported.
Sustainability science can and should be used to identify a prioritized set of mandatory environmental disclosures focused on the most decision-relevant metrics that capture cumulative effects.
Policy proposal based on conceptual argument and suggested methodological steps; no pilot implementation or empirical validation provided.
A research agenda for AI economists should include building multimodal detection models for greenwashing and earnings management using text, financials, satellite imagery, and supply‑chain data.
Prescriptive research agenda item in the paper; no empirical implementation or benchmark results presented here.
AI and NLP methods can be used to scale verification of ESG disclosures by cross‑checking them with regulatory filings, news, supply‑chain data, satellite imagery, and alternative data to flag inconsistencies.
Proposed methodological solution in the paper's implications and research agenda; suggestion is prescriptive and not validated by new experiments in this review.
Realizing net societal gains from AI requires human-centered design, regulatory and control measures, and integration of sustainability indicators into technological development.
Normative conclusion drawn from the narrative review of interdisciplinary evidence and policy recommendations; not an empirically validated claim within this paper.
If banks operationalize NLP for personalization and acquisition at scale, this could increase differentiation, raise switching costs, and potentially affect market concentration, which would warrant antitrust monitoring.
Theoretical implication extrapolated from identified capability gaps and economic reasoning about differentiation, switching costs, and scaling advantages; not empirically tested in the reviewed papers.
Limited applied research on NLP for acquisition and personalization implies unrealized value in banking: NLP could enable more efficient, targeted customer acquisition and cross‑sell, potentially lowering customer‑acquisition cost (CAC) and increasing lifetime value (LTV).
Inference drawn from observed topical gaps (low article counts on acquisition/personalization) and standard marketing economics linking targeting/personalization to CAC and LTV; no direct causal evidence provided in the reviewed literature.
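For readers less familiar with the two metrics named in this claim, the sketch below shows one common textbook way to compute them. It assumes a simple form in which expected customer lifetime is 1 / churn rate, so LTV is the margin per period divided by churn. The function names and every figure are hypothetical; none come from the reviewed literature.

```python
def cac(acquisition_spend, customers_acquired):
    """Customer-acquisition cost: spend divided by customers won."""
    if customers_acquired <= 0:
        raise ValueError("customers_acquired must be positive")
    return acquisition_spend / customers_acquired


def ltv(margin_per_period, churn_rate):
    """Lifetime value under a constant churn rate (lifetime = 1 / churn)."""
    if not 0 < churn_rate <= 1:
        raise ValueError("churn_rate must be in (0, 1]")
    return margin_per_period / churn_rate


# Hypothetical scenario: same budget, but better targeting wins more
# customers and personalization lowers churn. Numbers are made up.
baseline = ltv(50, 0.25) / cac(100_000, 500)
targeted = ltv(50, 0.20) / cac(100_000, 625)
print(f"LTV:CAC baseline = {baseline:.2f}, with targeting = {targeted:.2f}")
```

The ratio rises through both channels the claim names: a lower CAC in the denominator and a higher LTV in the numerator.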
Multilateral coordination is needed to set baseline principles (data flows, privacy, AI safety, competition rules) to reduce regulatory fragmentation.
Scenario-based reasoning and policy prescription grounded in theoretical analysis of fragmentation costs; normative recommendation rather than empirical proof.
Research and funding priorities should reweight toward symbolic/structured knowledge, verification, curricula design, and orchestration algorithms rather than exclusive emphasis on model scale.
Prescriptive recommendation based on the conceptual advantages claimed for DSS; not supported by empirical policy or funding analysis within the paper.
Smaller, verifiable DSS agents are easier to audit and align per domain, potentially reducing systemic risks associated with large opaque generalist models.
Argumentative claim about auditability and verifiability of compact, domain-specific systems versus large generalists; no empirical auditability studies are provided.
DSS reduces environmental externalities (e.g., emissions, water use) relative to continued monolithic scaling and may reduce regulatory pressure tied to those externalities.
Theoretical claim tying reduced inference energy and decentralized deployment to lower environmental impacts; the paper suggests measuring emissions and water use but supplies no empirical measurements.
Specialization enables many niche DSS providers rather than a small number of dominant monolithic providers, thereby lowering entry barriers for vertical experts.
Market-structure argument based on modularization and domain-focused offerings; no empirical market analysis or simulation is provided.
Shifting to DSS changes the cost structure of AI: it lowers recurring OPEX per user by reducing inference energy and enabling local/device processing instead of centralized, inference-heavy cloud services.
Economic reasoning and proposed modeling approaches (capex/opex comparisons) described conceptually; no empirical economic model outputs or market data are included.
DSS societies can achieve much lower inference energy per task and enable easier on-device/edge deployment compared to monolithic LLM deployments.
Argument that smaller, domain-focused models require fewer compute resources and thus lower energy and are better suited to edge hardware; empirical measurements to support this claim are proposed but not supplied.
Architecturally, replacing single giant generalists with 'societies' of small, specialized DSS models routed by orchestration agents yields operational benefits (routing to experts, modular upgrades, specialization).
Conceptual architectural proposal describing specialized back-ends and orchestration/routing agents; the paper outlines recommended experiments but reports no empirical orchestration benchmarks.
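The routing idea in this claim can be illustrated with a toy orchestrator. This is a minimal sketch under stated assumptions, not the paper's design: keyword matching stands in for whatever routing method the authors envisage, the specialists are plain functions rather than models, and all names (`Orchestrator`, `register`, `route`, `ask`) are hypothetical.

```python
import re
from typing import Callable, Dict, Set

Agent = Callable[[str], str]


class Orchestrator:
    """Routes each query to the registered specialist with the most keyword hits."""

    def __init__(self, fallback: Agent) -> None:
        self._agents: Dict[str, Agent] = {"general": fallback}
        self._keywords: Dict[str, Set[str]] = {}

    def register(self, domain: str, keywords: Set[str], agent: Agent) -> None:
        # Re-registering a domain swaps in a new back-end (a modular upgrade)
        # without touching the other specialists.
        self._agents[domain] = agent
        self._keywords[domain] = {k.lower() for k in keywords}

    def route(self, query: str) -> str:
        tokens = set(re.findall(r"[a-z]+", query.lower()))
        best_domain, best_hits = "general", 0
        for domain, keywords in self._keywords.items():
            hits = len(tokens & keywords)
            if hits > best_hits:
                best_domain, best_hits = domain, hits
        return best_domain

    def ask(self, query: str) -> str:
        return self._agents[self.route(query)](query)


# Hypothetical stand-ins for compact domain models.
orchestrator = Orchestrator(fallback=lambda q: f"[general] {q}")
orchestrator.register("health", {"dose", "symptom", "vaccine"}, lambda q: f"[health] {q}")
orchestrator.register("agriculture", {"crop", "soil", "harvest"}, lambda q: f"[agriculture] {q}")

print(orchestrator.ask("Which soil suits this crop?"))
```

Queries that match no specialist fall back to the general agent, which keeps the routing layer separate from any individual back-end.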
A more sustainable and effective trajectory is to build domain-specific superintelligences (DSS) grounded in explicit symbolic abstractions (knowledge graphs, ontologies, formal logic) and trained via synthetic curricula so compact models can learn robust, domain-level reasoning.
Prescriptive proposal based on theoretical arguments about the benefits of symbolic abstractions, compact model training, and synthetic curricula; no experimental validation or empirical comparison is provided in the paper.
Standardizing these infra-level primitives could lower integration costs across ecosystems and accelerate enterprise adoption of agent-hosted services.
Policy/economic argument presented in the paper's implications and research directions; no empirical standardization impact study provided.
Missing infraprotocol primitives in MCP create opportunities for platform differentiation: providers implementing CABP/ATBA/SERF-like extensions can capture value by offering more production-ready agent tooling.
Strategic/economic reasoning stated in the implications section; not supported by empirical market-share data in the summary.
A concrete empirical test recommended by the paper is to run controlled comparisons of distribution-shift generalization between negative-only, preference-only, and hybrid-trained models across safety and usefulness metrics.
Methodological recommendation given in the paper; it is not an empirical result but an explicitly proposed verifiable experiment for future work.
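One way to tabulate such a comparison is to report, for each training regime and metric, the drop from in-distribution to shifted evaluation. The sketch below assumes the scores have already been measured and only organizes them. Every number is an arbitrary placeholder chosen to exercise the code, not a result from the paper, and the names `EvalResult` and `rank_by_gap` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalResult:
    regime: str             # "negative-only", "preference-only", or "hybrid"
    metric: str             # "safety" or "usefulness"
    in_distribution: float  # score on held-out data from the training distribution
    shifted: float          # score on the distribution-shifted test set

    @property
    def gap(self) -> float:
        """Generalization gap: how much the score drops under shift."""
        return self.in_distribution - self.shifted


def rank_by_gap(results: List[EvalResult], metric: str) -> List[str]:
    """Return regimes ordered from smallest to largest gap on one metric."""
    selected = [r for r in results if r.metric == metric]
    return [r.regime for r in sorted(selected, key=lambda r: r.gap)]


# ARBITRARY PLACEHOLDER SCORES, for illustration only.
PLACEHOLDER_RESULTS = [
    EvalResult("negative-only", "safety", 90.0, 84.0),
    EvalResult("preference-only", "safety", 88.0, 76.0),
    EvalResult("hybrid", "safety", 91.0, 86.0),
    EvalResult("negative-only", "usefulness", 70.0, 61.0),
    EvalResult("preference-only", "usefulness", 82.0, 78.0),
    EvalResult("hybrid", "usefulness", 80.0, 74.0),
]

for metric in ("safety", "usefulness"):
    print(metric, rank_by_gap(PLACEHOLDER_RESULTS, metric))
```

Reporting the gap per metric keeps safety and usefulness separate, so a regime that generalizes well on one is not credited for the other.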
Regulators could feasibly focus on certifying constraint datasets and testing model adherence to explicit prohibitions, since constraint compliance is empirically testable and verifiable.
Policy recommendation derived from the paper's epistemic argument about constraints being verifiable; presented as a plausible regulatory strategy rather than one already validated by policy experiments.
There is a commercial opportunity for startups and vendors to specialize in 'constraint datasets' and constitutional-rule libraries as tradable assets.
Market/economic inference made from the technical claim that constraints are verifiable and reusable; no empirical industry survey data are provided, so this is a forward-looking implication.
If negative/safety-focused signals are more sample- and compute-efficient for certain alignment goals, firms may reallocate labeling budgets away from costly preference elicitation toward collecting high-quality negative examples and rule sets.
Economic implication extrapolated from the paper's sample-efficiency claim; the paper reasons from technical sample-efficiency arguments and cited empirical parity but does not present market-level empirical data.
Improved alignment can reduce harms from misinterpretation (incorrect decisions, misinformation), lowering downstream liability and reputational risk for vendors and customers.
Paper's safety and externalities discussion argues this as a likely consequence; the claim is theoretical and not supported by empirical incident data in the paper.
Providers may charge a premium for alignment-enabled API tiers or incorporate C.A.P. into enterprise plans because of additional compute per interaction, affecting pricing and unit economics.
Paper's pricing and costs discussion predicts potential monetization strategies and pricing experiments (A/B pricing, willingness-to-pay studies) but does not report market data.
C.A.P. has potential economic effects: it can reduce time lost to misinterpretation, thereby increasing effective throughput and productivity, though net gains depend on trade-offs with pre-processing overhead.
Economic implications section provides conceptual cost–benefit arguments and recommends pilot measurements (time saved, reduced human review cost) but provides no empirical economic measurement.
C.A.P. shifts interactions from one-way command-execution to two-way, partnership-style collaboration, increasing perceived partnerliness.
Theoretical argument drawing on cognitive science and Common Ground theory and proposed human-evaluation measures (satisfaction, perceived collaboration); no empirical human-subject results reported.
C.A.P. improves long-term and dynamic dialogue alignment and reduces off-topic or mechanically incorrect responses.
Main argument of the paper based on the combined functions (expansion, weighted retrieval, alignment verification, clarification); the paper provides conceptual/theoretical justification but does not report large-scale empirical results.
Public archives of prompts and commits accelerate diffusion by lowering search/learning costs and enabling replication, thereby increasing adoption speed and lowering entry barriers.
Paper's asserted implication based on the existence of public artifacts and general reasoning about knowledge diffusion; this is an interpretive claim rather than an experimentally validated finding (argumentative, extrapolative).
Developing economic metrics linked to architecture (interoperability indices, expected upgrade cost, observability coverage, market concentration measures, systemic‑risk indicators) is recommended to guide policy and investment.
Policy recommendation grounded in the paper's normative analysis; no pilot metric development or empirical validation presented.
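Of the metrics listed, market concentration already has a standard measure: the Herfindahl-Hirschman Index (HHI), the sum of squared market shares. The sketch below computes it with shares expressed in percent, so the index ranges up to 10,000 for a monopoly. The paper does not say which concentration measure it has in mind, so the choice of HHI here is an assumption, and the example shares are made up.

```python
from typing import Sequence


def hhi(shares_percent: Sequence[float]) -> float:
    """Herfindahl-Hirschman Index from market shares given in percent."""
    if any(s < 0 for s in shares_percent):
        raise ValueError("shares must be non-negative")
    if abs(sum(shares_percent) - 100.0) > 1e-6:
        raise ValueError("shares must sum to 100 percent")
    return sum(s ** 2 for s in shares_percent)


# Made-up markets: three uneven providers versus four equal ones.
print(hhi([50, 30, 20]))      # 2500 + 900 + 400
print(hhi([25, 25, 25, 25]))  # 4 * 625
```

Because shares are squared, the index weights large providers heavily, which is why it is sensitive to the dominance of a few firms.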
The benchmark provides a testbed useful for studying strategic behavior, coordination failures, and market-like interactions among agents, which can inform economic research and policy.
Paper claims the benchmark's multi-agent, strategic tasks can be used as experimental environments for economic and policy research; this is a normative claim supported by the benchmark's design rather than by empirical studies in the paper.
Open-source orchestration lowers entry barriers, broadening participation and potentially compressing rents that would otherwise accrue to well-resourced incumbents.
Paper's discussion section argues that releasing orchestration and evaluation tools publicly reduces the technical overhead for entrants; this is a theoretical/observational claim rather than empirically measured in the paper.
The clear performance gaps indicate high returns to specialized efforts (RL, domain-specific engineering) relative to generalist LLM-only approaches, shaping where teams invest labor and compute.
Paper links benchmarking results (performance gaps between baselines and humans) to economic implications, arguing specialization yields higher returns; this is an interpretive claim based on reported performance differentials.
Benchmarks like PokeAgent will reallocate researcher and industry attention toward multi-agent, partial-observability, and long-horizon planning problems, likely increasing funding and compute investment in RL and hybrid LLM+RL methods.
Paper offers an economic/implication analysis arguing that introducing such a benchmark changes incentives and investment patterns; this is a reasoned projection rather than an empirical observation.
Public investment in open environments, robotics testbeds, and safety research can reduce concentration risks and externalities and democratize access to embodied AI research.
Policy recommendation based on anticipated strategic importance of shared infrastructure; not empirically validated here.
Value in the AI ecosystem may shift from passive text/image corpora toward rich interaction datasets and simulated/real environments; ownership and control of simulation platforms and testbeds could become strategically important assets.
Economic and strategic inference from the proposed technical emphasis on embodied/interaction learning; no supporting market data in the paper.
Increased sample efficiency and transfer will reduce compute and data costs, lowering barriers to entry for firms and broadening feasible AI applications.
Economic argument connecting technical metrics to cost and market effects; not empirically demonstrated in the paper.