Evidence (3029 claims; filtered to Human-AI Collaboration)
Claim counts by category:

| Category | Claims |
|---|---|
| Adoption | 5200 |
| Productivity | 4485 |
| Governance | 4082 |
| Human-AI Collaboration | 3029 |
| Labor Markets | 2450 |
| Org Design | 2305 |
| Innovation | 2290 |
| Skills & Training | 1920 |
| Inequality | 1299 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 373 | 105 | 59 | 439 | 984 |
| Governance & Regulation | 366 | 172 | 114 | 55 | 717 |
| Research Productivity | 237 | 95 | 34 | 294 | 664 |
| Organizational Efficiency | 364 | 82 | 62 | 34 | 545 |
| Technology Adoption Rate | 292 | 115 | 66 | 27 | 504 |
| Firm Productivity | 274 | 33 | 68 | 10 | 390 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Output Quality | 231 | 61 | 23 | 25 | 340 |
| Market Structure | 107 | 121 | 85 | 14 | 332 |
| Decision Quality | 158 | 68 | 33 | 17 | 279 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 88 | 31 | 38 | 9 | 166 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 105 | 12 | 21 | 11 | 150 |
| Consumer Welfare | 67 | 29 | 35 | 7 | 138 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 67 | 31 | 4 | 126 |
| Task Allocation | 70 | 9 | 29 | 6 | 114 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 11 | 16 | 94 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 76 | 5 | 4 | 2 | 87 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 15 | 9 | 5 | 47 |
| Job Displacement | 5 | 29 | 12 | — | 46 |
| Developer Productivity | 27 | 2 | 3 | 1 | 33 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 8 | 4 | 9 | — | 21 |
Human-AI Collaboration
Claim: This preference-learning approach enables models to internalize and transfer latent consumer-preference patterns, mitigating the data-sparsity issues prevalent in individual categories.
Evidence: Based on the paper's reported approach (cross-category post-training and transfer of latent preferences); supported by experiments, with the paper stating that data sparsity is mitigated.
Claim: Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and in 94% of autonomous cases.
Evidence: Intervention experiments in Study 2, in which metadata redaction and explicit instructions were applied to interactive assistants (e.g., GitHub Copilot) and autonomous agents (e.g., Claude Code); restoration was full for interactive cases and 94% for autonomous cases.
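As a concrete illustration of this intervention, a minimal sketch of metadata redaction plus an explicit detection instruction follows; the metadata field patterns, instruction wording, and function name are assumptions for exposition, not taken from the study.

```python
import re

# Hypothetical sketch of the debiasing intervention described above:
# strip identity metadata from the context an assistant sees, then
# prepend an explicit instruction to evaluate content on its merits.
# Field names and instruction wording are illustrative.

METADATA_PATTERNS = [
    re.compile(r"^(Author|Committer|Email|Organization):.*$", re.MULTILINE),
    re.compile(r"^Signed-off-by:.*$", re.MULTILINE),
]

EXPLICIT_INSTRUCTION = (
    "Evaluate the following code strictly on its content. "
    "Report any defect or suspicious change you detect."
)

def debias_context(raw_context: str) -> str:
    """Redact identity metadata and prepend an explicit detection instruction."""
    redacted = raw_context
    for pattern in METADATA_PATTERNS:
        redacted = pattern.sub("[REDACTED]", redacted)
    return f"{EXPLICIT_INSTRUCTION}\n\n{redacted}"
```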
Claim: The model implies testable governance diagnostics linking latent fragility to observable patterns: recorded dissent (gaps between anonymous and formal voting), scenario-set diversity, pipeline and method concentration, and anchor lag.
Evidence: Theoretical mapping from model primitives and observable quantities to the proposed diagnostics; the paper enumerates observable patterns that should correlate with model-implied fragility. This is a theoretical implication rather than an empirically validated claim.
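Two of the proposed diagnostics lend themselves to simple operationalizations. The sketch below assumes the dissent gap is the difference in dissent shares between anonymous and formal votes, and that method concentration is a Herfindahl index; neither formula is given in the paper.

```python
from collections import Counter

def dissent_gap(anonymous_votes: list[bool], formal_votes: list[bool]) -> float:
    """Share dissenting anonymously minus share dissenting in formal votes.

    A large positive gap suggests suppressed dissent, one of the
    fragility signals the model points to.
    """
    anon = sum(anonymous_votes) / len(anonymous_votes)
    formal = sum(formal_votes) / len(formal_votes)
    return anon - formal

def method_concentration(method_per_decision: list[str]) -> float:
    """Herfindahl index over methods; 1.0 means every decision uses one method."""
    counts = Counter(method_per_decision)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())
```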
Claim: The clearest added value of AI over structured self-reflection lies in increasing felt accountability.
Evidence: RCT comparisons showed no significant AI advantage over the written-reflection questionnaire on overall goal progress, but higher perceived social accountability in the AI condition and significant mediation of the AI effect on progress via perceived accountability (indirect effect = 0.15, 95% CI [0.04, 0.31]).
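The indirect effect and its confidence interval come from a mediation analysis. Below is a minimal percentile-bootstrap sketch of the a·b indirect effect under simple OLS paths; the variable names and estimator are assumptions for illustration, not the study's exact procedure.

```python
import numpy as np

def indirect_effect(x, m, y):
    """a*b indirect effect for x -> m -> y (1-D numpy arrays)."""
    a = np.polyfit(x, m, 1)[0]                      # path a: condition -> mediator
    A = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(A, y, rcond=None)[0][2]     # path b: mediator -> outcome, controlling x
    return a * b

def bootstrap_ci(x, m, y, n_boot=5000, seed=0):
    """Percentile bootstrap 95% CI for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                 # resample cases with replacement
        draws.append(indirect_effect(x[idx], m[idx], y[idx]))
    return np.percentile(draws, [2.5, 97.5])
```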
Claim: AI-assisted goal setting can improve short-term (two-week) goal progress.
Evidence: Aggregate interpretation of the RCT finding that the AI condition outperformed the no-support control on two-week goal progress (d = 0.33, p = .016); the two-week follow-up window is specified in the study.

Claim: The AI increased perceived social accountability relative to the written-reflection questionnaire.
Evidence: Reported RCT comparison showing higher perceived social accountability in the AI condition than in the written-reflection condition, measured via self-report scales at follow-up (exact scale and statistics reported in the paper).
Claim: JobMatchAI provides factor-wise explanations through resume-driven search workflows.
Evidence: The paper states that the system gives factor-wise explanations and ties them to resume-driven workflows; the excerpt references interpretable reranking and demo artifacts but includes no user study or explanation-faithfulness metrics.

Claim: JobMatchAI optimizes utility across skill fit, experience, location, salary, and company preferences.
Evidence: The paper claims that the system's objective/utility function includes these factors and that the reranking/optimization accounts for them. The excerpt gives no optimization-algorithm details, weightings, or empirical utility gains.
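Since the excerpt omits the objective, here is one plausible shape for a factor-weighted reranking utility over the listed factors. The weights, the [0, 1] factor scores, and the linear form are assumptions, not JobMatchAI's actual objective.

```python
from dataclasses import dataclass

@dataclass
class JobScores:
    skill_fit: float       # all factor scores assumed normalized to [0, 1]
    experience: float
    location: float
    salary: float
    company_pref: float

# Hypothetical weights; the paper does not report any.
WEIGHTS = {"skill_fit": 0.35, "experience": 0.25, "location": 0.15,
           "salary": 0.15, "company_pref": 0.10}

def utility(job: JobScores) -> float:
    return sum(WEIGHTS[k] * getattr(job, k) for k in WEIGHTS)

def rerank(jobs: list[JobScores]) -> list[JobScores]:
    return sorted(jobs, key=utility, reverse=True)

def explain(job: JobScores) -> dict[str, float]:
    """Per-factor contribution (weight x score): one natural way to
    produce the 'factor-wise explanations' the paper mentions."""
    return {k: WEIGHTS[k] * getattr(job, k) for k in WEIGHTS}
```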
Claim: JobMatchAI is production-ready.
Evidence: The paper explicitly describes JobMatchAI as "production-ready" and also claims a hosted website and an installable package (artifacts consistent with deployment readiness). The excerpt provides no formal certification, deployment metrics, or uptime/performance SLAs.
Claim: For AI agent tool design, surfacing contextual information outperforms prescribing procedural workflows.
Evidence: The authors' conclusion from the suite of experiments (GraphRAG vs. TDD prompting vs. auto-improvement) showing better regression reduction and/or resolution when contextual information is surfaced.

Claim: An autonomous auto-improvement loop raised resolution from 12% to 60% on a 10-instance subset, with 0% regression.
Evidence: Reported experiment on a 10-instance subset to which an auto-improvement loop was applied (numbers provided in the excerpt).
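Schematically, a loop of this kind retries failed instances, feeds test failures back as context, and accepts a patch only when nothing regresses. The sketch below uses placeholder function names and result attributes; it is not the paper's implementation.

```python
def auto_improve(instance, generate_patch, run_tests, max_rounds=5):
    """Retry until the issue is resolved with zero regressions.

    generate_patch and run_tests are hypothetical callables standing in
    for the agent and the test harness; `result` is assumed to expose
    .resolved, .regressions, and .failure_report.
    """
    feedback = ""
    for _ in range(max_rounds):
        patch = generate_patch(instance, feedback)
        result = run_tests(instance, patch)
        if result.resolved and not result.regressions:
            return patch                      # accept: fixes the issue, breaks nothing
        feedback = result.failure_report      # surface failures to the next attempt
    return None                               # give up after max_rounds
```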
Claim: Smaller models benefit more from contextual information (which tests to verify) than from procedural instructions (how to do TDD).
Evidence: Inferred from comparative results across models (Qwen3-Coder 30B vs. Qwen3.5-35B-A3B) and interventions (contextual test-surfacing vs. TDD prompting) reported in the paper.

Claim: When deployed as an agent skill, GraphRAG improved resolution from 24% to 32%.
Evidence: Empirical comparison reported in the SWE-bench Verified evaluation (same experimental context as above).

Claim: TDAD's GraphRAG workflow reduced test-level regressions by 70% (from 6.08% to 1.82%).
Evidence: Empirical result from the SWE-bench Verified evaluation using the GraphRAG workflow (sample details as reported: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances).
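Note the 70% figure is a relative reduction: (6.08 − 1.82) / 6.08 ≈ 0.70. The toy sketch below contrasts the two intervention styles compared above, surfacing context (which tests must keep passing) versus prescribing procedure (how to do TDD); the prompt wording is illustrative, not the paper's.

```python
def contextual_prompt(issue: str, relevant_tests: list[str]) -> str:
    """Surface context: tell the agent which tests guard the edited code."""
    tests = "\n".join(f"- {t}" for t in relevant_tests)
    return (f"Fix this issue:\n{issue}\n\n"
            f"These tests exercise the affected code and must keep passing:\n{tests}")

def procedural_prompt(issue: str) -> str:
    """Prescribe procedure: tell the agent how to work, not what to check."""
    return (f"Fix this issue:\n{issue}\n\n"
            "Follow TDD: first write a failing test, then implement the fix, "
            "then refactor while keeping all tests green.")
```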
Claim: The system is in production at Personize.ai.
Evidence: Deployment statement in the paper asserting production use at Personize.ai.

Claim: The LoCoMo result confirms that governance and schema enforcement impose no retrieval-quality penalty.
Evidence: The paper's interpretation linking LoCoMo benchmark accuracy (74.8%) to the conclusion that governance/schema enforcement did not degrade retrieval quality.

Claim: Governed Memory implements a closed-loop schema lifecycle with AI-assisted authoring and automated per-property refinement.
Evidence: Design description in the paper of the closed-loop schema lifecycle and AI-assisted authoring/refinement.

Claim: Governed Memory uses reflection-bounded retrieval with entity-scoped isolation.
Evidence: Design description in the paper specifying reflection-bounded retrieval and entity-scoped isolation.

Claim: Governed Memory uses tiered governance routing with progressive context delivery.
Evidence: Design description in the paper listing tiered governance routing and progressive delivery as mechanisms.

Claim: Governed Memory implements a dual memory model combining open-set atomic facts with schema-enforced typed properties.
Evidence: Design specification in the paper describing the dual memory model (an architectural mechanism).
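To make the dual memory model concrete, here is one way such a store could be shaped: free-form atomic facts alongside typed properties whose writes are checked against a schema. Class names, fields, and the example schema are assumptions, not Governed Memory's API.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class AtomicFact:
    entity_id: str
    text: str                # free-form, open-set statement
    source: str

@dataclass
class TypedProperty:
    entity_id: str
    name: str                # must exist in the governing schema
    value: Any

# Hypothetical schema; a real system would author and refine this
# per property, as the closed-loop lifecycle above describes.
SCHEMA = {"preferred_language": str, "account_tier": str, "age": int}

def write_property(store: list[TypedProperty], prop: TypedProperty) -> None:
    """Reject writes that violate the schema (the governance step)."""
    expected = SCHEMA.get(prop.name)
    if expected is None or not isinstance(prop.value, expected):
        raise ValueError(f"schema violation for property {prop.name!r}")
    store.append(prop)
```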
Claim: The paper presents Governed Memory, a shared memory and governance layer addressing the memory-governance gap.
Evidence: System architecture and design description in the paper (the proposal of a shared memory and governance layer).
Claim: A hybrid strategic–computational framework, supported by governance mechanisms (human-in-the-loop checkpoints, escalation paths, accountability structures), is motivated to manage tensions and ensure responsible decision-making in AI-rich managerial contexts.
Evidence: Synthesis-driven prescriptive framework produced by cross-framework analysis; a conceptual recommendation rather than implementation evidence.

Claim: Roles oriented to information processing, optimisation, and operational precision (monitor, disseminator, resource allocator) are substantially enhanced by computational thinking (automation, optimisation, algorithmic decision support).
Evidence: Theoretical mapping of computational capabilities onto Mintzberg's information-processing roles; conceptual reasoning without empirical validation.

Claim: AI adoption will shift fact-checking tasks (more monitoring, less rote verification), creating a need for reskilling and new roles (AI tool operators, analysts); donor and public investments should fund capacity building for local organizations.
Evidence: Workforce implications inferred from interview reports about changing task mixes and from the study's interpretive recommendations.

Claim: Investments should prioritize hybrid models in which automation provides scale and humans handle contextual, adversarial, and legally sensitive judgments.
Evidence: Recommendation based on interview findings about AI benefits and limitations and on the study's interpretive synthesis.

Claim: The study distills context-sensitive best practices for fact-checking in restrictive environments, including safety protocols, local partnerships, and hybrid verification workflows.
Evidence: Synthesis of findings from document analysis and interviews, producing a set of recommended practices documented in the study's outputs.

Claim: AI can lower verification costs and scale reach by automating tasks such as classification, clustering, alerting, and translation.
Evidence: Interview reports from platform staff and interpretive analysis identifying AI-assisted use cases for prioritization, monitoring, and translation.
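A toy triage pipeline of the kind the interviews describe might classify incoming items by clustering near-duplicates and alerting on unusually large clusters. The model choice (TF-IDF plus k-means) and thresholds below are placeholders for illustration, not anything reported by the platforms.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def triage(items: list[str], n_clusters: int = 10, alert_size: int = 25):
    """Cluster incoming claims and flag large clusters for prioritization."""
    vecs = TfidfVectorizer(max_features=5000).fit_transform(items)
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(vecs)
    clusters: dict[int, list[str]] = {}
    for label, item in zip(labels, items):
        clusters.setdefault(int(label), []).append(item)
    # Alert on unusually large clusters (a possible coordinated narrative),
    # so human fact-checkers can focus contextual judgment where it matters.
    alerts = {c: members for c, members in clusters.items()
              if len(members) >= alert_size}
    return clusters, alerts
```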
Claim: Community reporting and audience-focused formats are used to improve engagement.
Evidence: Platform outputs and staff interviews describing the deployment of community-reporting mechanisms and tailored audience formats.

Claim: Platforms form partnerships with media outlets, academic institutions, and civil-society actors to amplify reach and secure data.
Evidence: Interview accounts and organizational documents describing cross-sector partnerships and collaboration arrangements.

Claim: Transparent workflows and clear labeling are used to build credibility with audiences.
Evidence: Document analysis of platform outputs and guidelines showing explicit workflow transparency and labeling practices, supported by interview statements.

Claim: Platforms emphasize local-language expertise and culturally grounded sourcing as a strategy to improve verification and credibility.
Evidence: Observed practices and platform guidelines derived from document analysis and staff interviews describing the use of local-language expertise and sourcing.

Claim: Investment choices in collaboration AI and digital infrastructure become central strategic decisions affecting firms' comparative advantage.
Evidence: Management-literature synthesis and illustrative multinational cases; the argument is conceptual, without firm-level comparative empirical data presented in the paper.

Claim: AI collaboration tools (virtual assistants, meeting summarizers, asynchronous platforms) complement hybrid work by reducing coordination costs and supporting dispersed teamwork.
Evidence: Conceptual integration of the technology and organizational literatures, supported by illustrative case examples from multinational organizations but not by new quantitative causal evidence.

Claim: Hybrid and remote work increase employee autonomy and work–life integration.
Evidence: Conceptual synthesis of the sociological and management literatures, supported by secondary data and illustrative case studies from multinational organizations. No primary quantitative analysis or sample size is reported; the claim rests on comparative case illustrations and theoretical integration.

Claim: Generative AI functions as a socio-technical intermediary that facilitates interpretation, coordination, and decision support rather than merely automating discrete tasks.
Evidence: Thematic analysis and co-word linkage, within the corpus, between terms related to interpretative work, coordination, and decision support and technical GenAI terms.

Claim: The literature indicates a managerial shift away from hierarchical command-and-control toward guide-and-collaborate paradigms, in which managers curate, guide, and coordinate AI-augmented teams rather than micro-manage tasks.
Evidence: Synthesis of themes from the 212-paper corpus (co-word and thematic analyses) showing recurrent managerial/behavioural concepts such as autonomy, coordination, and decision support tied to GenAI discussions.

Claim: Economic models of firm behavior and market microstructure should incorporate endogenous, adaptive segmentation processes and the faster feedback loops enabled by human–AI systems; agent-based simulation (ABS) and large-scale interaction data can be used to calibrate such models.
Evidence: Methodological recommendation grounded in the study's mixed-methods findings (ABS experiments and a 150-million-interaction dataset) and in observed differences between autopoietic and traditional STP regimes.
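To give a flavor of what "endogenous, adaptive segmentation" could mean in model form, the sketch below lets segment centers adapt online to a stream of interaction signals rather than being fixed ex ante. The dimensionality, learning rate, and online k-means update are assumptions for exposition, not the study's model; a real version would be calibrated against interaction data as recommended above.

```python
import numpy as np

def run_abs(interactions: np.ndarray, n_segments: int = 4, lr: float = 0.05,
            seed: int = 0) -> np.ndarray:
    """Endogenous segmentation: centers drift toward observed interactions.

    `interactions` is an (n_events, n_features) array of interaction signals.
    """
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_segments, interactions.shape[1]))
    for x in interactions:                                    # one interaction signal at a time
        nearest = np.argmin(((centers - x) ** 2).sum(axis=1))
        centers[nearest] += lr * (x - centers[nearest])       # the matched segment adapts
    return centers
```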
Claim: Canvas Design Principles mitigate algorithmic myopia (overfitting to historical patterns) and improve adaptability and resource efficiency.
Evidence: A set of design principles proposed in the paper and evaluated through agent-based simulation scenarios and analyses of the large behavioral dataset. Specific experimental details and quantitative effect sizes for these principles are not detailed in the summary.

Claim: Reconceptualizing STP as an autopoietic (self-organizing) system enables continuous human–AI co-creation and yields better outcomes in unstable markets than traditional, process-based STP.
Evidence: Conceptual argument grounded in a six-month lab ethnography (n = 23), the design and deployment of the Algorithmic Canvas in that lab context, and validation via large behavioral-dataset analyses and agent-based simulations.

Claim: Algorithmic co-creation methods detect substantial market fluctuations about 5.8× better than traditional approaches.
Evidence: Computational analysis of a large behavioral dataset (150 million customer interactions) and comparative performance evaluation in empirically grounded agent-based simulations. The detection metric and statistical-significance details are not provided in the summary.

Claim: The autopoietic model shortens the strategic-planning cycle by approximately 90%.
Evidence: Observed/recorded time-to-update or strategy-revision metrics gathered via Algorithmic Canvas usage and lab ethnography (six-month lab ethnography inside a Fortune 500 company, n = 23). The exact measurement protocol, and whether the reduction was measured in live firms, simulations, or system logs, is not fully detailed in the summary.

Claim: Design and policy interventions that encourage active human contributions (e.g., draft-first workflows, co-creation interfaces, training) can help preserve worker agency and mitigate psychological costs.
Evidence: Recommendation based on experimental evidence that active collaboration preserved psychological outcomes relative to passive use; presented as a policy/design prescription rather than an intervention directly tested at scale.

Claim: A complementary real-world survey (N = 270) across diverse tasks reproduced the experimental pattern, suggesting external validity beyond the lab writing tasks.
Evidence: Cross-sectional survey of N = 270 respondents reporting on their AI use across multiple task types; the reported patterns were consistent with the experiment (passive use was associated with lower efficacy, ownership, and meaningfulness; active collaborative use was not).

Claim: Effective teams tend to evolve from ad-hoc interpretive methods toward systematic evaluation by (a) formalizing prompts/tests, (b) instrumenting outputs, (c) mapping failure modes to remediation paths, and (d) creating organizational decision rules.
Evidence: Pattern observed in the qualitative coding of interviews in which participants described the trajectories or steps their teams took to formalize evaluation.
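Steps (a) and (b) above have a natural concrete form: a pinned prompt, a small regression suite, and logged outputs. The sketch below is illustrative; the test cases, scoring rule, and log format are invented, and `model_call` stands in for whatever model client a team uses.

```python
import json
import time

PROMPT = "Summarize the following support ticket in one sentence:\n{ticket}"

# A formalized, version-controlled test suite (step a).
TEST_CASES = [
    {"ticket": "App crashes on login since v2.3", "must_include": "crash"},
    {"ticket": "Refund not received after 10 days", "must_include": "refund"},
]

def evaluate(model_call, log_path="eval_log.jsonl") -> float:
    """Run the suite and instrument every output (step b); return pass rate."""
    passed = 0
    with open(log_path, "a") as log:
        for case in TEST_CASES:
            output = model_call(PROMPT.format(ticket=case["ticket"]))
            ok = case["must_include"] in output.lower()
            passed += ok
            log.write(json.dumps({"ts": time.time(), "case": case["ticket"],
                                  "output": output, "pass": ok}) + "\n")
    return passed / len(TEST_CASES)
```

A pass-rate threshold on this suite is then the kind of organizational decision rule (step d) that turns evaluation signals into go/no-go product decisions.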
Claim: Successful teams close the results-actionability gap by systematizing interpretive practices and creating clearer pathways from evaluation signals to product changes.
Evidence: Interview accounts and cross-case analysis showing some teams adopting formalization steps (e.g., standardized prompts/tests, instrumentation, remediation mappings) that participants described as enabling action.

Claim: Prioritizing asymmetrical responsibility may justify constraints on certain AI deployments (e.g., in care), shifting welfare analyses to incorporate dignity, vulnerability, and non-quantifiable harms.
Evidence: Policy and normative recommendation grounded in Levinasian ethics and illustrative domain examples; the paper contains no formal welfare model or empirical policy evaluation.

Claim: Emmanuel Levinas's notion of an infinite, asymmetrical responsibility to the Other provides a more incisive framework than pluralist balancing for diagnosing and responding to responsibility gaps in hybrid human–robot assemblages.
Evidence: Normative-philosophical argumentation and interdisciplinary synthesis, illustrated with qualitative vignettes/case studies from healthcare robotics, autonomous vehicles, and algorithmic governance. No quantitative data or formal empirical test.

Claim: Adoption of AI feedback could lower the marginal cost of delivering high-quality feedback and change the fixed-versus-variable cost structure of instruction delivery.
Evidence: Economic implication discussed by workshop participants (50 scholars) as a theoretical possibility; the report contains no quantitative cost estimates.
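A back-of-envelope illustration of the cost-structure shift: human feedback is almost purely a variable cost, while AI feedback trades a fixed setup cost for a near-zero marginal cost. All numbers below are invented for exposition; the report gives no cost estimates.

```python
def human_cost(n_students: int, minutes_per_review=15, wage_per_hour=40) -> float:
    """Pure variable cost: grader time scales linearly with students."""
    return n_students * (minutes_per_review / 60) * wage_per_hour

def ai_cost(n_students: int, setup=2000, per_review=0.05) -> float:
    """High fixed cost (setup, integration), near-zero marginal cost."""
    return setup + n_students * per_review

for n in (100, 1000, 10000):
    print(f"{n:>6} students: human ${human_cost(n):,.0f} vs AI ${ai_cost(n):,.0f}")
```

With these illustrative numbers the AI option breaks even at roughly 200 students and dominates at scale, which is exactly the kind of shift in fixed-versus-variable structure the workshop participants flagged.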
Claim: Generative AI can enable new feedback modalities (text, hints, worked examples, formative prompts) adaptable to content and learner needs.
Evidence: Thematic conclusions from the interdisciplinary meeting of 50 scholars describing the modality-generation capabilities of current generative models; no empirical modality-comparison data are provided.

Claim: Immediate AI-generated feedback may sustain learner momentum and improve formative assessment cycles (timeliness and engagement).
Evidence: Expert-opinion synthesis from the structured workshop (50 scholars) identifying timely feedback as a potential pedagogical benefit; no empirical trials are reported.