Evidence (7953 claims)
Claim counts by topic:

- Adoption: 5539 claims
- Productivity: 4793 claims
- Governance: 4333 claims
- Human-AI Collaboration: 3326 claims
- Labor Markets: 2657 claims
- Innovation: 2510 claims
- Org Design: 2469 claims
- Skills & Training: 2017 claims
- Inequality: 1378 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 402 | 112 | 67 | 480 | 1076 |
| Governance & Regulation | 402 | 192 | 122 | 62 | 790 |
| Research Productivity | 249 | 98 | 34 | 311 | 697 |
| Organizational Efficiency | 395 | 95 | 70 | 40 | 603 |
| Technology Adoption Rate | 321 | 126 | 73 | 39 | 564 |
| Firm Productivity | 306 | 39 | 70 | 12 | 432 |
| Output Quality | 256 | 66 | 25 | 28 | 375 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Market Structure | 107 | 128 | 85 | 14 | 339 |
| Decision Quality | 177 | 76 | 38 | 20 | 315 |
| Fiscal & Macroeconomic | 89 | 58 | 33 | 22 | 209 |
| Employment Level | 77 | 34 | 80 | 9 | 202 |
| Skill Acquisition | 92 | 33 | 40 | 9 | 174 |
| Innovation Output | 120 | 12 | 23 | 12 | 168 |
| Firm Revenue | 98 | 34 | 22 | — | 154 |
| Consumer Welfare | 73 | 31 | 37 | 7 | 148 |
| Task Allocation | 84 | 16 | 33 | 7 | 140 |
| Inequality Measures | 25 | 77 | 32 | 5 | 139 |
| Regulatory Compliance | 54 | 63 | 13 | 3 | 133 |
| Error Rate | 44 | 51 | 6 | — | 101 |
| Task Completion Time | 88 | 5 | 4 | 3 | 100 |
| Training Effectiveness | 58 | 12 | 12 | 16 | 99 |
| Worker Satisfaction | 47 | 32 | 11 | 7 | 97 |
| Wages & Compensation | 53 | 15 | 20 | 5 | 93 |
| Team Performance | 47 | 12 | 15 | 7 | 82 |
| Automation Exposure | 24 | 22 | 9 | 6 | 62 |
| Job Displacement | 6 | 38 | 13 | — | 57 |
| Hiring & Recruitment | 41 | 4 | 6 | 3 | 54 |
| Developer Productivity | 34 | 4 | 3 | 1 | 42 |
| Social Protection | 22 | 10 | 6 | 2 | 40 |
| Creative Output | 16 | 7 | 5 | 1 | 29 |
| Labor Share of Income | 12 | 5 | 9 | — | 26 |
| Skill Obsolescence | 3 | 20 | 2 | — | 25 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
JobMatchAI optimizes utility across skill fit, experience, location, salary, and company preferences.
Paper claims the system's objective/utility function includes these factors and that the reranking/optimization accounts for them. No optimization algorithm details, weighting, or empirical utility gains are given in the excerpt.
JobMatchAI is production-ready.
Paper explicitly describes JobMatchAI as "production-ready" and also claims a hosted website and installable package (artifacts consistent with deployment readiness). No formal certification, deployment metrics, or uptime/performance SLAs are provided in the excerpt.
For AI agent tool design, surfacing contextual information outperforms prescribing procedural workflows.
Authors' conclusion drawn from the suite of experiments (GraphRAG vs TDD prompting vs auto-improvement) showing better regression reduction and/or resolution when contextual information is surfaced.
An autonomous auto-improvement loop raised resolution from 12% to 60% on a 10-instance subset with 0% regression.
Reported experiment on a 10-instance subset where an auto-improvement loop was applied (numbers provided in the excerpt).
Smaller models benefit more from contextual information (which tests to verify) than from procedural instructions (how to do TDD).
Inferred from comparative results across models (Qwen3-Coder 30B vs Qwen3.5-35B-A3B) and interventions (contextual test-surfacing vs TDD prompting) reported in the paper.
When deployed as an agent skill, GraphRAG improved resolution from 24% to 32%.
Empirical comparison reported in the evaluation on SWE-bench Verified (same experimental context as above).
TDAD's GraphRAG workflow reduced test-level regressions by 70% (from 6.08% to 1.82%).
Empirical result reported from the SWE-bench Verified evaluation using the GraphRAG workflow (sample details: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances as reported).
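The excerpt reports the auto-improvement loop's numbers but not its algorithm. As a hedged sketch of the general pattern only, the loop below attempts a patch, runs the surfaced tests, and feeds failures back as context for the next attempt; `attempt_fix` and `run_tests` are hypothetical stand-ins, not the paper's implementation.

```python
# Hypothetical sketch of an auto-improvement loop. The stand-ins below
# simulate success once at least one round of test feedback is available;
# a real harness would call a model and execute the repo's test suite.

def run_tests(patch):
    # Stand-in: pretend to execute the surfaced tests against the patch.
    return {"passed": patch.get("fixes_bug", False), "failures": []}

def attempt_fix(issue, feedback):
    # Stand-in: a real agent would prompt a model with issue + feedback.
    return {"fixes_bug": len(feedback) >= 1}

def auto_improve(issue, max_rounds=5):
    """Iterate patch -> test -> feedback until tests pass or budget runs out."""
    feedback = []
    for round_num in range(max_rounds):
        patch = attempt_fix(issue, feedback)
        result = run_tests(patch)
        if result["passed"]:
            return patch, round_num + 1
        feedback.append(result["failures"])
    return None, max_rounds

patch, rounds = auto_improve({"id": 1})
```

With these toy stand-ins the loop converges on the second round; the design point is that each failed round enriches the context available to the next attempt.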
Partial validation against observed AIS vessel behavior shows PIER is consistent with the fastest real transits while exhibiting 23.1× lower variance.
Comparison between PIER trajectories and observed fastest transits in AIS data (details in paper); reported relative variance reduction of 23.1×.
PIER sharply reduces catastrophic fuel waste: great-circle routing produces extreme fuel consumption (>1.5× the median) in 4.8% of voyages, while PIER cuts this to 0.5% (a roughly 9-fold reduction).
Analysis on the same 2023 AIS validation dataset across seven Gulf of Mexico routes (840 episodes per method) comparing distribution tails of voyage fuel consumption; reported incidence rates (4.8% vs 0.5%).
PIER reduces mean CO2 emissions by 10% relative to great-circle routing.
Offline evaluation using physics‑calibrated environments grounded in historical AIS data and ocean reanalysis products; validation on one full year (2023) of AIS across seven Gulf of Mexico routes with 840 episodes per method; reported mean reduction of 10% and bootstrap 95% CI for mean savings [2.9%, 15.7%].
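The bootstrap interval cited above can be illustrated with a plain percentile bootstrap over per-voyage savings. The data below are synthetic stand-ins, not the paper's 840-episode sample.

```python
import random

def bootstrap_mean_ci(samples, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic per-voyage CO2-savings fractions, centered near 10% (illustrative).
savings = [0.10 + 0.02 * ((i % 7) - 3) for i in range(100)]
lo, hi = bootstrap_mean_ci(savings)
```

A percentile bootstrap like this is one standard way to obtain an interval such as the reported [2.9%, 15.7%] without distributional assumptions; the paper's exact bootstrap scheme is not given in the excerpt.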
The system is in production at Personize.ai.
Deployment statement in the paper asserting production use at Personize.ai.
The LoCoMo result confirms that governance and schema enforcement impose no retrieval quality penalty.
Interpretation in the paper linking LoCoMo benchmark accuracy (74.8%) to the conclusion that governance/schema enforcement did not degrade retrieval quality.
Governed Memory implements a closed-loop schema lifecycle with AI-assisted authoring and automated per-property refinement.
Design description in the paper describing the closed-loop schema lifecycle and AI-assisted authoring/refinement.
Governed Memory uses reflection-bounded retrieval with entity-scoped isolation.
Design description in the paper specifying reflection-bounded retrieval and entity-scoped isolation.
Governed Memory uses tiered governance routing with progressive context delivery.
Design description in the paper listing tiered governance routing and progressive delivery as mechanisms.
Governed Memory implements a dual memory model combining open-set atomic facts with schema-enforced typed properties.
Design specification within the paper describing the dual memory model (architectural mechanism).
The paper presents Governed Memory, a shared memory and governance layer addressing the memory governance gap.
System architecture and design description in the paper (proposal of a shared memory and governance layer).
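As a hedged illustration of the dual memory model (open-set atomic facts alongside schema-enforced typed properties), the sketch below uses hypothetical field names and validation rules, not Governed Memory's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class EntityMemory:
    """Illustrative dual store: free-text atomic facts plus typed
    properties validated against a governing schema. All names here
    are hypothetical, not Governed Memory's interface."""
    schema: dict                                     # property -> expected type
    facts: list = field(default_factory=list)        # open-set atomic facts
    properties: dict = field(default_factory=dict)   # schema-enforced values

    def add_fact(self, text: str):
        # Atomic facts are unconstrained free text.
        self.facts.append(text)

    def set_property(self, name: str, value):
        # Typed properties must exist in, and type-check against, the schema.
        if name not in self.schema:
            raise KeyError(f"property {name!r} not in schema")
        if not isinstance(value, self.schema[name]):
            raise TypeError(f"{name!r} expects {self.schema[name].__name__}")
        self.properties[name] = value

mem = EntityMemory(schema={"age": int, "city": str})
mem.add_fact("prefers morning meetings")
mem.set_property("age", 34)
```

The design point this illustrates: unstructured recall stays open-ended, while anything written into the typed store must pass governance checks first.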
The results confirm the positive impact of cognitive technologies on the development of entrepreneurial opportunities and innovative activity.
Conclusion drawn from the positive estimated association (0.33 coefficient) and the observed increases in the indices between 2020 and 2024 reported in the paper.
The Cognitive Tools Index and the Market Opportunity Index were -0.42 and -0.35 in 2020 and 0.94 and 0.92 in 2024, respectively.
Reported observed/computed index values for the years 2020 and 2024 in the study (data source and aggregation method not detailed in the excerpt).
The empirical study for 2020–2024 showed that a one-standard-deviation increase in the Cognitive Tools Index is associated with an average increase of 0.33 in the Market Opportunity Index.
Estimated coefficient reported from the panel econometric model over 2020–2024 (model included lags and used instrumental approach; sample size and standard errors not provided in the excerpt).
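The 0.33 coefficient comes from a panel model; a minimal sketch of a within (fixed-effects) estimator on simulated data can show how such a slope is recovered. The true slope is set to 0.33 for illustration only; this is not the study's data, instruments, or lag structure.

```python
import random

# Simulate a small panel: unit fixed effects plus a true slope of 0.33.
rng = random.Random(42)
n_units, n_years, slope = 60, 5, 0.33

panels = []
for _ in range(n_units):
    fe = rng.gauss(0, 1)                               # unit fixed effect
    xs = [rng.gauss(0, 1) for _ in range(n_years)]     # index of cognitive tools
    ys = [slope * x + fe + rng.gauss(0, 0.05) for x in xs]
    panels.append((xs, ys))

# Demean within each unit to sweep out fixed effects, then pooled OLS.
num = den = 0.0
for xs, ys in panels:
    xbar, ybar = sum(xs) / n_years, sum(ys) / n_years
    for x, y in zip(xs, ys):
        num += (x - xbar) * (y - ybar)
        den += (x - xbar) ** 2
beta = num / den
```

The within transform removes time-invariant unit heterogeneity, so `beta` estimates the association between the two indices net of fixed effects, in the spirit of the reported 0.33.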
Pidgin significantly outperformed standard English on measures of knowledge transfer across agriculture, education, and health domains.
Aggregate analysis of questionnaire comprehension items (44-item instrument) across domain-specific modules administered to 45 participants; comparative language-performance results reported in study.
Volunteers who used proverbs and vernacular registers were incorporated into local kinship structures, granted traditional titles, and perceived as legitimate development actors rather than outsiders.
Qualitative evidence from participant observation and discourse samples collected during fieldwork; interview and questionnaire items on perceptions of volunteer legitimacy and social integration.
Agricultural techniques taught in Pidgin were nearly universally adopted by recipients.
Self-reported adoption/behavior-change items in the 44-item questionnaire and corroborating qualitative observation of agricultural practice among participants in the sample (N = 45).
Pidgin-mediated interventions achieved large comprehension gains on health messaging, exceeding 30 percentage points compared with standard English.
Quantitative comparison derived from the 44-item field questionnaire (comprehension items) administered to the 45-participant sample; reported percentage-point difference (>30 pp) in health-message comprehension by language of instruction.
Using Cameroon Pidgin English as the primary medium for Peace Corps development work produced substantially better knowledge transfer, uptake, and social legitimacy than standard English.
Mixed-methods field study of Peace Corps interventions in Cameroon's Northwest: 44-item questionnaire administered to 45 participants across agriculture, education, and health; quantitative measures of comprehension and self-reported adoption; supplemented by qualitative observation and discourse samples.
A hybrid strategic–computational framework, supported by governance mechanisms (human-in-the-loop checkpoints, escalation paths, accountability structures), is proposed to manage tensions and ensure responsible decision-making in AI-rich managerial contexts.
Synthesis-driven prescriptive framework produced by cross-framework analysis; conceptual recommendation rather than implementation evidence.
Roles oriented to information processing, optimisation, and operational precision (monitor, disseminator, resource allocator) are substantially enhanced by computational thinking (automation, optimisation, algorithmic decision-support).
Theoretical mapping of computational capabilities onto Mintzberg’s information-processing roles; conceptual reasoning without empirical validation.
AI adoption will shift fact-checking tasks (more monitoring, less rote verification), creating a need for reskilling and new roles (AI tool operators, analysts); donor and public investments should fund capacity building for local organizations.
Workforce implications inferred from interview reports about changing task mixes and the study's interpretive recommendations.
Investments should prioritize hybrid models where automation provides scale and humans handle contextual, adversarial, and legally sensitive judgments.
Recommendation based on interview findings about AI benefits and limitations and the study's interpretive synthesis.
The study distills context-sensitive best practices for fact-checking in restrictive environments, including safety protocols, local partnerships, and hybrid verification workflows.
Synthesis of findings from document analysis and interviews producing a set of recommended practices documented in the study's outputs.
AI can lower verification costs and scale reach by automating tasks such as classification, clustering, alerting, and translation.
Interview reports from platform staff and interpretive analysis identifying AI-assisted use cases for prioritization, monitoring, and translation.
Community reporting and audience-focused formats are used to improve engagement.
Platform outputs and staff interviews describing deployment of community-reporting mechanisms and tailored audience formats.
Platforms form partnerships with media outlets, academic institutions, and civil-society actors to amplify reach and secure data.
Interview accounts and organizational documents describing cross-sector partnerships and collaboration arrangements.
Transparent workflows and clear labeling are used to build credibility with audiences.
Document analysis of platform outputs and guidelines showing explicit workflow transparency and labeling practices, supported by interview statements.
Platforms emphasize local-language expertise and culturally grounded sourcing as a strategy to improve verification and credibility.
Observed practices and platform guidelines derived from document analysis and staff interviews describing the use of local-language expertise and sourcing.
Practical policy recommendation: require transparent documentation and third‑party auditing for high‑impact LLM deployments and subsidize public‑interest evaluation infrastructure.
Policy prescription supported by the paper's normative and economic analysis; no pilot implementation or empirical evaluation of the recommendation is provided.
Policy levers that can address alignment externalities include disclosure requirements (data provenance, evaluation practices), mandatory participatory evaluation for high‑impact systems, standards for auditing, procurement rules favoring participatory transparency, and liability/certification regimes.
Policy recommendation based on economic and governance reasoning and synthesis of prior regulatory proposals; no policy pilot data or impact evaluation is reported.
Economics research should develop multi‑dimensional metrics capturing welfare, distributional impacts, and autonomy rather than relying on single aggregate accuracy or safety scores.
Prescriptive recommendation grounded in critique of current benchmarking practices and theoretical desiderata; no new metric is empirically validated in the paper.
Dynamic constraints (continuous monitoring, feedback loops, and configurable safety settings that adapt post‑deployment) are preferable to static pre‑deployment-only safety fixes.
Conceptual argument and synthesis of deployment experience and monitoring literature; suggestions for operational tooling and monitoring rather than empirical evaluation.
Participatory governance, which includes varied stakeholders such as users, affected communities, domain experts, and regulators in design, evaluation, and deployment decisions, will improve alignment outcomes and legitimacy.
Theoretical and normative argument citing participatory design literature and ethical governance scholarship; paper offers procedural recommendations but no empirical trial of governance models.
Alignment should shift from static, post‑training constraints (one‑off fixes like safety filters or RLHF alone) to dynamic, participatory systems that explicitly protect pluralism, autonomy, and justice.
Normative argument and conceptual synthesis drawing on literature in AI safety, value alignment, and participatory design; prescriptive reasoning rather than original empirical results.
Investment choices in collaboration AI and digital infrastructure become central strategic decisions affecting firms' comparative advantage.
Management literature synthesis and illustrative multinational cases; argument is conceptual without firm‑level comparative empirical data presented in the paper.
AI collaboration tools (virtual assistants, meeting summarizers, asynchronous platforms) complement hybrid work by reducing coordination costs and supporting dispersed teamwork.
Conceptual integration of technology and organizational literature; supported by illustrative case examples of multinational organizations but not by new quantitative causal evidence.
Hybrid and remote work increase employee autonomy and work–life integration.
Conceptual synthesis of sociological and management literatures; supported by secondary data and illustrative case studies from multinational organizations. No primary quantitative analysis or sample size reported—based on comparative case illustrations and theoretical integration.
Tariff reductions and expanded supply channels following CAFTA contributed as secondary channels to increased third‑country agricultural imports.
Paper documents tariff changes and supply‑channel expansion as part of mechanism analysis; DID and mediator tests link tariff reductions and expanded channels to import outcomes.
CAFTA improved logistics and service frictions (e.g., storage, logistics performance) relevant to agricultural imports.
Secondary channel analysis using logistics/storage indicators and related service frictions available in the data; assessed as mediators in the DID framework.
CAFTA widened China's trading‑partner and product diversity in agricultural imports, increasing both partner and product variety from third countries.
DID estimates on partner and product diversity metrics constructed from customs import records (2000–2014); reported changes in diversity as outcomes in the paper.
A complementary‑products linkage effect is a key mechanism: expanded channels and product complementarities make sourcing non‑ASEAN goods easier and more attractive.
Mechanism analysis using product‑level and partner‑level import data (China Customs) showing increased imports of complementary products and linkages consistent with this channel in DID estimates.
The primary spillover mechanism is a 'low‑cost import experience' effect: cheaper/consistent regional sourcing lowers firms' marginal cost of engaging additional foreign suppliers, encouraging imports from third countries.
Mechanism tests using mediator variables (cost/procurement indicators) within the DID framework and firm‑level data; reported as the main channel in the paper's analysis.
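The DID framework behind these mechanism tests reduces, in the textbook 2×2 case, to differencing the treated group's change against the control group's change. The numbers below are illustrative, not the paper's customs-data estimates.

```python
# Textbook 2x2 difference-in-differences estimator.
def did(treated_pre, treated_post, control_pre, control_post):
    """DID effect = (treated change) - (control change)."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical mean log imports before/after CAFTA for treated vs. control.
effect = did(treated_pre=4.0, treated_post=4.9,
             control_pre=3.8, control_post=4.1)
```

Here the treated group rises by 0.9 log points and the control by 0.3, so the DID effect is 0.6; the paper's richer specification adds mediators (cost and procurement indicators) on top of this comparison to probe mechanisms.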
A new market will emerge for controls, certification, attestations, secure toolchains, and audited model deployments; compliance costs will shape comparative advantages among firms and countries.
Policy-market synthesis and analogies to certification markets in other regulated tech domains (qualitative).