Evidence (3029 claims; filtered to Human-AI Collaboration)
Claim counts by category:

| Category | Claims |
|---|---|
| Adoption | 5200 |
| Productivity | 4485 |
| Governance | 4082 |
| Human-AI Collaboration | 3029 |
| Labor Markets | 2450 |
| Org Design | 2305 |
| Innovation | 2290 |
| Skills & Training | 1920 |
| Inequality | 1299 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 373 | 105 | 59 | 439 | 984 |
| Governance & Regulation | 366 | 172 | 114 | 55 | 717 |
| Research Productivity | 237 | 95 | 34 | 294 | 664 |
| Organizational Efficiency | 364 | 82 | 62 | 34 | 545 |
| Technology Adoption Rate | 292 | 115 | 66 | 27 | 504 |
| Firm Productivity | 274 | 33 | 68 | 10 | 390 |
| AI Safety & Ethics | 116 | 177 | 44 | 24 | 363 |
| Output Quality | 231 | 61 | 23 | 25 | 340 |
| Market Structure | 107 | 121 | 85 | 14 | 332 |
| Decision Quality | 158 | 68 | 33 | 17 | 279 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Fiscal & Macroeconomic | 74 | 52 | 32 | 21 | 183 |
| Skill Acquisition | 88 | 31 | 38 | 9 | 166 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 105 | 12 | 21 | 11 | 150 |
| Consumer Welfare | 67 | 29 | 35 | 7 | 138 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 67 | 31 | 4 | 126 |
| Task Allocation | 70 | 9 | 29 | 6 | 114 |
| Error Rate | 42 | 47 | 6 | — | 95 |
| Training Effectiveness | 55 | 12 | 11 | 16 | 94 |
| Worker Satisfaction | 42 | 32 | 11 | 6 | 91 |
| Task Completion Time | 76 | 5 | 4 | 2 | 87 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Wages & Compensation | 38 | 13 | 19 | 4 | 74 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 15 | 9 | 5 | 47 |
| Job Displacement | 5 | 29 | 12 | — | 46 |
| Developer Productivity | 27 | 2 | 3 | 1 | 33 |
| Social Protection | 18 | 8 | 6 | 1 | 33 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Skill Obsolescence | 3 | 18 | 2 | — | 23 |
| Labor Share of Income | 8 | 4 | 9 | — | 21 |
Human-AI Collaboration
Claim: This preference-learning approach enables models to internalize and transfer latent consumer-preference patterns, mitigating the data-sparsity issues prevalent in individual categories.
Evidence: Based on the paper's reported approach (cross-category post-training and transfer of latent preferences); supported by experiments, with the paper stating that data sparsity is mitigated.
Claim: Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and in 94% of autonomous cases.
Evidence: Intervention experiments in Study 2, in which metadata redaction and explicit instructions were applied to interactive assistants (e.g., GitHub Copilot) and autonomous agents (e.g., Claude Code); restoration was full for interactive cases and 94% for autonomous cases.
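As a concrete illustration of this intervention, a minimal sketch of metadata redaction plus an explicit detection instruction follows; the metadata field patterns, instruction wording, and function name are assumptions for exposition, not taken from the study.

```python
import re

# Hypothetical sketch of the debiasing intervention described above:
# strip identity metadata from the context an assistant sees, then
# prepend an explicit instruction to evaluate content on its merits.
# Field names and instruction wording are illustrative.

METADATA_PATTERNS = [
    re.compile(r"^(Author|Committer|Email|Organization):.*$", re.MULTILINE),
    re.compile(r"^Signed-off-by:.*$", re.MULTILINE),
]

EXPLICIT_INSTRUCTION = (
    "Evaluate the following code strictly on its content. "
    "Report any defect or suspicious change you detect."
)

def debias_context(raw_context: str) -> str:
    """Redact identity metadata and prepend an explicit detection instruction."""
    redacted = raw_context
    for pattern in METADATA_PATTERNS:
        redacted = pattern.sub("[REDACTED]", redacted)
    return f"{EXPLICIT_INSTRUCTION}\n\n{redacted}"
```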
Claim: The model implies testable governance diagnostics linking latent fragility to observable patterns: recorded dissent (gaps between anonymous and formal voting), scenario-set diversity, pipeline and method concentration, and anchor lag.
Evidence: Theoretical mapping from model primitives and observable quantities to the proposed diagnostics; the paper enumerates observable patterns that should correlate with model-implied fragility. This is a theoretical implication rather than an empirically validated claim.
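Two of the proposed diagnostics lend themselves to simple operationalizations. The sketch below assumes the dissent gap is the difference in dissent shares between anonymous and formal votes, and that method concentration is a Herfindahl index; neither formula is given in the paper.

```python
from collections import Counter

def dissent_gap(anonymous_votes: list[bool], formal_votes: list[bool]) -> float:
    """Share dissenting anonymously minus share dissenting in formal votes.

    A large positive gap suggests suppressed dissent, one of the
    fragility signals the model points to.
    """
    anon = sum(anonymous_votes) / len(anonymous_votes)
    formal = sum(formal_votes) / len(formal_votes)
    return anon - formal

def method_concentration(method_per_decision: list[str]) -> float:
    """Herfindahl index over methods; 1.0 means every decision uses one method."""
    counts = Counter(method_per_decision)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())
```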
Claim: The clearest added value of AI over structured self-reflection lies in increasing felt accountability.
Evidence: RCT comparisons showed no significant AI advantage over the written-reflection questionnaire on overall goal progress, but higher perceived social accountability in the AI condition and significant mediation of the AI effect on progress via perceived accountability (indirect effect = 0.15, 95% CI [0.04, 0.31]).
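The indirect effect and its confidence interval come from a mediation analysis. Below is a minimal percentile-bootstrap sketch of the a·b indirect effect under simple OLS paths; the variable names and estimator are assumptions for illustration, not the study's exact procedure.

```python
import numpy as np

def indirect_effect(x, m, y):
    """a*b indirect effect for x -> m -> y (1-D numpy arrays)."""
    a = np.polyfit(x, m, 1)[0]                      # path a: condition -> mediator
    A = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(A, y, rcond=None)[0][2]     # path b: mediator -> outcome, controlling x
    return a * b

def bootstrap_ci(x, m, y, n_boot=5000, seed=0):
    """Percentile bootstrap 95% CI for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                 # resample cases with replacement
        draws.append(indirect_effect(x[idx], m[idx], y[idx]))
    return np.percentile(draws, [2.5, 97.5])
```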
Claim: AI-assisted goal setting can improve short-term (two-week) goal progress.
Evidence: Aggregate interpretation of the RCT finding that the AI condition outperformed the no-support control on two-week goal progress (d = 0.33, p = .016); the two-week follow-up window is specified in the study.

Claim: The AI increased perceived social accountability relative to the written-reflection questionnaire.
Evidence: Reported RCT comparison showing higher perceived social accountability in the AI condition than in the written-reflection condition, measured via self-report scales at follow-up (exact scale and statistics reported in the paper).
Claim: JobMatchAI provides factor-wise explanations through resume-driven search workflows.
Evidence: The paper states that the system gives factor-wise explanations and ties them to resume-driven workflows; the excerpt references interpretable reranking and demo artifacts but includes no user study or explanation-faithfulness metrics.

Claim: JobMatchAI optimizes utility across skill fit, experience, location, salary, and company preferences.
Evidence: The paper claims that the system's objective/utility function includes these factors and that the reranking/optimization accounts for them. The excerpt gives no optimization-algorithm details, weightings, or empirical utility gains.
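Since the excerpt omits the objective, here is one plausible shape for a factor-weighted reranking utility over the listed factors. The weights, the [0, 1] factor scores, and the linear form are assumptions, not JobMatchAI's actual objective.

```python
from dataclasses import dataclass

@dataclass
class JobScores:
    skill_fit: float       # all factor scores assumed normalized to [0, 1]
    experience: float
    location: float
    salary: float
    company_pref: float

# Hypothetical weights; the paper does not report any.
WEIGHTS = {"skill_fit": 0.35, "experience": 0.25, "location": 0.15,
           "salary": 0.15, "company_pref": 0.10}

def utility(job: JobScores) -> float:
    return sum(WEIGHTS[k] * getattr(job, k) for k in WEIGHTS)

def rerank(jobs: list[JobScores]) -> list[JobScores]:
    return sorted(jobs, key=utility, reverse=True)

def explain(job: JobScores) -> dict[str, float]:
    """Per-factor contribution (weight x score): one natural way to
    produce the 'factor-wise explanations' the paper mentions."""
    return {k: WEIGHTS[k] * getattr(job, k) for k in WEIGHTS}
```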
Claim: JobMatchAI is production-ready.
Evidence: The paper explicitly describes JobMatchAI as "production-ready" and also claims a hosted website and an installable package (artifacts consistent with deployment readiness). The excerpt provides no formal certification, deployment metrics, or uptime/performance SLAs.
Claim: For AI agent tool design, surfacing contextual information outperforms prescribing procedural workflows.
Evidence: The authors' conclusion from the suite of experiments (GraphRAG vs. TDD prompting vs. auto-improvement) showing better regression reduction and/or resolution when contextual information is surfaced.

Claim: An autonomous auto-improvement loop raised resolution from 12% to 60% on a 10-instance subset, with 0% regression.
Evidence: Reported experiment on a 10-instance subset to which an auto-improvement loop was applied (numbers provided in the excerpt).
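Schematically, a loop of this kind retries failed instances, feeds test failures back as context, and accepts a patch only when nothing regresses. The sketch below uses placeholder function names and result attributes; it is not the paper's implementation.

```python
def auto_improve(instance, generate_patch, run_tests, max_rounds=5):
    """Retry until the issue is resolved with zero regressions.

    generate_patch and run_tests are hypothetical callables standing in
    for the agent and the test harness; `result` is assumed to expose
    .resolved, .regressions, and .failure_report.
    """
    feedback = ""
    for _ in range(max_rounds):
        patch = generate_patch(instance, feedback)
        result = run_tests(instance, patch)
        if result.resolved and not result.regressions:
            return patch                      # accept: fixes the issue, breaks nothing
        feedback = result.failure_report      # surface failures to the next attempt
    return None                               # give up after max_rounds
```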
Claim: Smaller models benefit more from contextual information (which tests to verify) than from procedural instructions (how to do TDD).
Evidence: Inferred from comparative results across models (Qwen3-Coder 30B vs. Qwen3.5-35B-A3B) and interventions (contextual test-surfacing vs. TDD prompting) reported in the paper.

Claim: When deployed as an agent skill, GraphRAG improved resolution from 24% to 32%.
Evidence: Empirical comparison reported in the SWE-bench Verified evaluation (same experimental context as above).

Claim: TDAD's GraphRAG workflow reduced test-level regressions by 70% (from 6.08% to 1.82%).
Evidence: Empirical result from the SWE-bench Verified evaluation using the GraphRAG workflow (sample details as reported: Qwen3-Coder 30B on 100 instances and Qwen3.5-35B-A3B on 25 instances).
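Note the 70% figure is a relative reduction: (6.08 − 1.82) / 6.08 ≈ 0.70. The toy sketch below contrasts the two intervention styles compared above, surfacing context (which tests must keep passing) versus prescribing procedure (how to do TDD); the prompt wording is illustrative, not the paper's.

```python
def contextual_prompt(issue: str, relevant_tests: list[str]) -> str:
    """Surface context: tell the agent which tests guard the edited code."""
    tests = "\n".join(f"- {t}" for t in relevant_tests)
    return (f"Fix this issue:\n{issue}\n\n"
            f"These tests exercise the affected code and must keep passing:\n{tests}")

def procedural_prompt(issue: str) -> str:
    """Prescribe procedure: tell the agent how to work, not what to check."""
    return (f"Fix this issue:\n{issue}\n\n"
            "Follow TDD: first write a failing test, then implement the fix, "
            "then refactor while keeping all tests green.")
```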
Claim: The system is in production at Personize.ai.
Evidence: Deployment statement in the paper asserting production use at Personize.ai.

Claim: The LoCoMo result confirms that governance and schema enforcement impose no retrieval-quality penalty.
Evidence: The paper's interpretation linking LoCoMo benchmark accuracy (74.8%) to the conclusion that governance/schema enforcement did not degrade retrieval quality.

Claim: Governed Memory implements a closed-loop schema lifecycle with AI-assisted authoring and automated per-property refinement.
Evidence: Design description in the paper of the closed-loop schema lifecycle and AI-assisted authoring/refinement.

Claim: Governed Memory uses reflection-bounded retrieval with entity-scoped isolation.
Evidence: Design description in the paper specifying reflection-bounded retrieval and entity-scoped isolation.

Claim: Governed Memory uses tiered governance routing with progressive context delivery.
Evidence: Design description in the paper listing tiered governance routing and progressive delivery as mechanisms.

Claim: Governed Memory implements a dual memory model combining open-set atomic facts with schema-enforced typed properties.
Evidence: Design specification in the paper describing the dual memory model (an architectural mechanism).
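To make the dual memory model concrete, here is one way such a store could be shaped: free-form atomic facts alongside typed properties whose writes are checked against a schema. Class names, fields, and the example schema are assumptions, not Governed Memory's API.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class AtomicFact:
    entity_id: str
    text: str                # free-form, open-set statement
    source: str

@dataclass
class TypedProperty:
    entity_id: str
    name: str                # must exist in the governing schema
    value: Any

# Hypothetical schema; a real system would author and refine this
# per property, as the closed-loop lifecycle above describes.
SCHEMA = {"preferred_language": str, "account_tier": str, "age": int}

def write_property(store: list[TypedProperty], prop: TypedProperty) -> None:
    """Reject writes that violate the schema (the governance step)."""
    expected = SCHEMA.get(prop.name)
    if expected is None or not isinstance(prop.value, expected):
        raise ValueError(f"schema violation for property {prop.name!r}")
    store.append(prop)
```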
Claim: The paper presents Governed Memory, a shared memory and governance layer addressing the memory-governance gap.
Evidence: System architecture and design description in the paper (the proposal of a shared memory and governance layer).
Claim: A hybrid strategic–computational framework, supported by governance mechanisms (human-in-the-loop checkpoints, escalation paths, accountability structures), is motivated to manage tensions and ensure responsible decision-making in AI-rich managerial contexts.
Evidence: Synthesis-driven prescriptive framework produced by cross-framework analysis; a conceptual recommendation rather than implementation evidence.

Claim: Roles oriented to information processing, optimisation, and operational precision (monitor, disseminator, resource allocator) are substantially enhanced by computational thinking (automation, optimisation, algorithmic decision support).
Evidence: Theoretical mapping of computational capabilities onto Mintzberg's information-processing roles; conceptual reasoning without empirical validation.

Claim: AI adoption will shift fact-checking tasks (more monitoring, less rote verification), creating a need for reskilling and new roles (AI tool operators, analysts); donor and public investments should fund capacity building for local organizations.
Evidence: Workforce implications inferred from interview reports about changing task mixes and from the study's interpretive recommendations.

Claim: Investments should prioritize hybrid models in which automation provides scale and humans handle contextual, adversarial, and legally sensitive judgments.
Evidence: Recommendation based on interview findings about AI benefits and limitations and on the study's interpretive synthesis.

Claim: The study distills context-sensitive best practices for fact-checking in restrictive environments, including safety protocols, local partnerships, and hybrid verification workflows.
Evidence: Synthesis of findings from document analysis and interviews, producing a set of recommended practices documented in the study's outputs.

Claim: AI can lower verification costs and scale reach by automating tasks such as classification, clustering, alerting, and translation.
Evidence: Interview reports from platform staff and interpretive analysis identifying AI-assisted use cases for prioritization, monitoring, and translation.
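A toy triage pipeline of the kind the interviews describe might classify incoming items by clustering near-duplicates and alerting on unusually large clusters. The model choice (TF-IDF plus k-means) and thresholds below are placeholders for illustration, not anything reported by the platforms.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def triage(items: list[str], n_clusters: int = 10, alert_size: int = 25):
    """Cluster incoming claims and flag large clusters for prioritization."""
    vecs = TfidfVectorizer(max_features=5000).fit_transform(items)
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(vecs)
    clusters: dict[int, list[str]] = {}
    for label, item in zip(labels, items):
        clusters.setdefault(int(label), []).append(item)
    # Alert on unusually large clusters (a possible coordinated narrative),
    # so human fact-checkers can focus contextual judgment where it matters.
    alerts = {c: members for c, members in clusters.items()
              if len(members) >= alert_size}
    return clusters, alerts
```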
Claim: Community reporting and audience-focused formats are used to improve engagement.
Evidence: Platform outputs and staff interviews describing the deployment of community-reporting mechanisms and tailored audience formats.

Claim: Platforms form partnerships with media outlets, academic institutions, and civil-society actors to amplify reach and secure data.
Evidence: Interview accounts and organizational documents describing cross-sector partnerships and collaboration arrangements.

Claim: Transparent workflows and clear labeling are used to build credibility with audiences.
Evidence: Document analysis of platform outputs and guidelines showing explicit workflow transparency and labeling practices, supported by interview statements.

Claim: Platforms emphasize local-language expertise and culturally grounded sourcing as a strategy to improve verification and credibility.
Evidence: Observed practices and platform guidelines derived from document analysis and staff interviews describing the use of local-language expertise and sourcing.

Claim: Investment choices in collaboration AI and digital infrastructure become central strategic decisions affecting firms' comparative advantage.
Evidence: Management-literature synthesis and illustrative multinational cases; the argument is conceptual, without firm-level comparative empirical data presented in the paper.

Claim: AI collaboration tools (virtual assistants, meeting summarizers, asynchronous platforms) complement hybrid work by reducing coordination costs and supporting dispersed teamwork.
Evidence: Conceptual integration of the technology and organizational literatures, supported by illustrative case examples from multinational organizations but not by new quantitative causal evidence.

Claim: Hybrid and remote work increase employee autonomy and work–life integration.
Evidence: Conceptual synthesis of the sociological and management literatures, supported by secondary data and illustrative case studies from multinational organizations. No primary quantitative analysis or sample size is reported; the claim rests on comparative case illustrations and theoretical integration.

Claim: Generative AI functions as a socio-technical intermediary that facilitates interpretation, coordination, and decision support rather than merely automating discrete tasks.
Evidence: Thematic analysis and co-word linkage, within the corpus, between terms related to interpretative work, coordination, and decision support and technical GenAI terms.

Claim: The literature indicates a managerial shift away from hierarchical command-and-control toward guide-and-collaborate paradigms, in which managers curate, guide, and coordinate AI-augmented teams rather than micro-manage tasks.
Evidence: Synthesis of themes from the 212-paper corpus (co-word and thematic analyses) showing recurrent managerial/behavioural concepts such as autonomy, coordination, and decision support tied to GenAI discussions.

Claim: Economic models of firm behavior and market microstructure should incorporate endogenous, adaptive segmentation processes and the faster feedback loops enabled by human–AI systems; agent-based simulation (ABS) and large-scale interaction data can be used to calibrate such models.
Evidence: Methodological recommendation grounded in the study's mixed-methods findings (ABS experiments and a 150-million-interaction dataset) and in observed differences between autopoietic and traditional STP regimes.
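To give a flavor of what "endogenous, adaptive segmentation" could mean in model form, the sketch below lets segment centers adapt online to a stream of interaction signals rather than being fixed ex ante. The dimensionality, learning rate, and online k-means update are assumptions for exposition, not the study's model; a real version would be calibrated against interaction data as recommended above.

```python
import numpy as np

def run_abs(interactions: np.ndarray, n_segments: int = 4, lr: float = 0.05,
            seed: int = 0) -> np.ndarray:
    """Endogenous segmentation: centers drift toward observed interactions.

    `interactions` is an (n_events, n_features) array of interaction signals.
    """
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_segments, interactions.shape[1]))
    for x in interactions:                                    # one interaction signal at a time
        nearest = np.argmin(((centers - x) ** 2).sum(axis=1))
        centers[nearest] += lr * (x - centers[nearest])       # the matched segment adapts
    return centers
```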
Claim: Canvas Design Principles mitigate algorithmic myopia (overfitting to historical patterns) and improve adaptability and resource efficiency.
Evidence: A set of design principles proposed in the paper and evaluated through agent-based simulation scenarios and analyses of the large behavioral dataset. Specific experimental details and quantitative effect sizes for these principles are not detailed in the summary.

Claim: Reconceptualizing STP as an autopoietic (self-organizing) system enables continuous human–AI co-creation and yields better outcomes in unstable markets than traditional, process-based STP.
Evidence: Conceptual argument grounded in a six-month lab ethnography (n = 23), the design and deployment of the Algorithmic Canvas in that lab context, and validation via large behavioral-dataset analyses and agent-based simulations.

Claim: Algorithmic co-creation methods detect substantial market fluctuations about 5.8× better than traditional approaches.
Evidence: Computational analysis of a large behavioral dataset (150 million customer interactions) and comparative performance evaluation in empirically grounded agent-based simulations. The detection metric and statistical-significance details are not provided in the summary.

Claim: The autopoietic model shortens the strategic-planning cycle by approximately 90%.
Evidence: Observed/recorded time-to-update or strategy-revision metrics gathered via Algorithmic Canvas usage and lab ethnography (six-month lab ethnography inside a Fortune 500 company, n = 23). The exact measurement protocol, and whether the reduction was measured in live firms, simulations, or system logs, is not fully detailed in the summary.

Claim: Design and policy interventions that encourage active human contributions (e.g., draft-first workflows, co-creation interfaces, training) can help preserve worker agency and mitigate psychological costs.
Evidence: Recommendation based on experimental evidence that active collaboration preserved psychological outcomes relative to passive use; presented as a policy/design prescription rather than an intervention directly tested at scale.

Claim: A complementary real-world survey (N = 270) across diverse tasks reproduced the experimental pattern, suggesting external validity beyond the lab writing tasks.
Evidence: Cross-sectional survey of N = 270 respondents reporting on their AI use across multiple task types; the reported patterns were consistent with the experiment (passive use was associated with lower efficacy, ownership, and meaningfulness; active collaborative use was not).

Claim: Effective teams tend to evolve from ad-hoc interpretive methods toward systematic evaluation by (a) formalizing prompts/tests, (b) instrumenting outputs, (c) mapping failure modes to remediation paths, and (d) creating organizational decision rules.
Evidence: Pattern observed in the qualitative coding of interviews in which participants described the trajectories or steps their teams took to formalize evaluation.
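Steps (a) and (b) above have a natural concrete form: a pinned prompt, a small regression suite, and logged outputs. The sketch below is illustrative; the test cases, scoring rule, and log format are invented, and `model_call` stands in for whatever model client a team uses.

```python
import json
import time

PROMPT = "Summarize the following support ticket in one sentence:\n{ticket}"

# A formalized, version-controlled test suite (step a).
TEST_CASES = [
    {"ticket": "App crashes on login since v2.3", "must_include": "crash"},
    {"ticket": "Refund not received after 10 days", "must_include": "refund"},
]

def evaluate(model_call, log_path="eval_log.jsonl") -> float:
    """Run the suite and instrument every output (step b); return pass rate."""
    passed = 0
    with open(log_path, "a") as log:
        for case in TEST_CASES:
            output = model_call(PROMPT.format(ticket=case["ticket"]))
            ok = case["must_include"] in output.lower()
            passed += ok
            log.write(json.dumps({"ts": time.time(), "case": case["ticket"],
                                  "output": output, "pass": ok}) + "\n")
    return passed / len(TEST_CASES)
```

A pass-rate threshold on this suite is then the kind of organizational decision rule (step d) that turns evaluation signals into go/no-go product decisions.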
Claim: Successful teams close the results-actionability gap by systematizing interpretive practices and creating clearer pathways from evaluation signals to product changes.
Evidence: Interview accounts and cross-case analysis showing some teams adopting formalization steps (e.g., standardized prompts/tests, instrumentation, remediation mappings) that participants described as enabling action.

Claim: Prioritizing asymmetrical responsibility may justify constraints on certain AI deployments (e.g., in care), shifting welfare analyses to incorporate dignity, vulnerability, and non-quantifiable harms.
Evidence: Policy and normative recommendation grounded in Levinasian ethics and illustrative domain examples; the paper contains no formal welfare model or empirical policy evaluation.

Claim: Emmanuel Levinas's notion of an infinite, asymmetrical responsibility to the Other provides a more incisive framework than pluralist balancing for diagnosing and responding to responsibility gaps in hybrid human–robot assemblages.
Evidence: Normative-philosophical argumentation and interdisciplinary synthesis, illustrated with qualitative vignettes/case studies from healthcare robotics, autonomous vehicles, and algorithmic governance. No quantitative data or formal empirical test.

Claim: Adoption of AI feedback could lower the marginal cost of delivering high-quality feedback and change the fixed-versus-variable cost structure of instruction delivery.
Evidence: Economic implication discussed by workshop participants (50 scholars) as a theoretical possibility; the report contains no quantitative cost estimates.
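A back-of-envelope illustration of the cost-structure shift: human feedback is almost purely a variable cost, while AI feedback trades a fixed setup cost for a near-zero marginal cost. All numbers below are invented for exposition; the report gives no cost estimates.

```python
def human_cost(n_students: int, minutes_per_review=15, wage_per_hour=40) -> float:
    """Pure variable cost: grader time scales linearly with students."""
    return n_students * (minutes_per_review / 60) * wage_per_hour

def ai_cost(n_students: int, setup=2000, per_review=0.05) -> float:
    """High fixed cost (setup, integration), near-zero marginal cost."""
    return setup + n_students * per_review

for n in (100, 1000, 10000):
    print(f"{n:>6} students: human ${human_cost(n):,.0f} vs AI ${ai_cost(n):,.0f}")
```

With these illustrative numbers the AI option breaks even at roughly 200 students and dominates at scale, which is exactly the kind of shift in fixed-versus-variable structure the workshop participants flagged.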
Claim: Generative AI can enable new feedback modalities (text, hints, worked examples, formative prompts) adaptable to content and learner needs.
Evidence: Thematic conclusions from the interdisciplinary meeting of 50 scholars describing the modality-generation capabilities of current generative models; no empirical modality-comparison data are provided.

Claim: Immediate AI-generated feedback may sustain learner momentum and improve formative assessment cycles (timeliness and engagement).
Evidence: Expert-opinion synthesis from the structured workshop (50 scholars) identifying timely feedback as a potential pedagogical benefit; no empirical trials are reported.