Evidence (6491 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Human-AI Collaboration

Across 21 scientific problems spanning six domains, SimpleTES discovers state-of-the-art solutions using gpt-oss models.

Empirical experiments reported in the paper across 21 problems in six domains, using gpt-oss models.

high positive Evaluation-driven Scaling for Scientific Discovery ability to discover state-of-the-art solutions (solution quality / discovery suc...

We introduce Simple Test-time Evaluation-driven Scaling (SimpleTES), a general framework that strategically combines parallel exploration, feedback-driven refinement, and local selection.

Methodological contribution described in the paper (framework design and algorithmic description).

high positive Evaluation-driven Scaling for Scientific Discovery framework design combining parallel exploration, feedback-driven refinement, and...
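To make the three named ingredients concrete, here is a minimal sketch of a search loop that combines parallel exploration, feedback-driven refinement, and local selection. It is not the SimpleTES algorithm as published: the toy objective, the function names (`evaluate`, `refine`, `search`), and every parameter are assumptions chosen for illustration only.

```python
import random

def evaluate(x):
    # Hypothetical evaluator: higher is better, optimum at x = 3.
    return -(x - 3.0) ** 2

def refine(x, score, rng):
    # Feedback-driven refinement (assumed form): the evaluator's score
    # sets the step size, so poor candidates move further.
    step = 0.1 + abs(score) ** 0.5
    return x + rng.uniform(-step, step)

def search(n_chains=4, n_rounds=30, n_candidates=5, seed=0):
    rng = random.Random(seed)
    # Parallel exploration: several independent starting points.
    chains = [rng.uniform(-10.0, 10.0) for _ in range(n_chains)]
    for _ in range(n_rounds):
        for i, current in enumerate(chains):
            score = evaluate(current)
            candidates = [refine(current, score, rng) for _ in range(n_candidates)]
            # Local selection: each chain keeps its own best candidate
            # (or its incumbent) rather than competing globally.
            chains[i] = max(candidates + [current], key=evaluate)
    return max(chains, key=evaluate)

best = search()
```

Keeping the incumbent in each selection step means a chain's score never decreases, which is one simple way to make extra evaluation budget monotonically useful.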

We propose seven interface primitives operationalizing verification-centered HCI.

Design contribution: specification of seven interface primitives within the paper (conceptual/design proposal); no user-study or empirical validation reported.

high positive The Instrumental Dissolution of Typing: Why AI Challenges th... existence and specification of interface primitives for verification-centered HC...

We map synthetic literacy -- oral input generating literate output -- as the defining feature of this transition.

Conceptual mapping and theoretical framing within the paper; supported by examples from technology trends but no empirical evaluation reported.

high positive The Instrumental Dissolution of Typing: Why AI Challenges th... emergence of synthetic literacy (oral-to-literate workflows)

Knowledge workers become adversarial auditors rather than keystroke-producers.

Projected role-shift based on the verification-bottleneck thesis and interdisciplinary supporting arguments; no empirical longitudinal workforce study reported.

high positive The Instrumental Dissolution of Typing: Why AI Challenges th... dominant work tasks/roles of knowledge workers (generation vs. auditing)

The central contribution identifies the verification bottleneck: as AI collapses production friction, the primary constraint shifts from generation to evaluation.

Theoretical argument supported by literature synthesis across multiple fields; no direct experimental quantification provided.

high positive The Instrumental Dissolution of Typing: Why AI Challenges th... relative constraint: generation vs. evaluation (verification) in knowledge work

We contribute design guidelines for specialized AI and articulate a vision for 'ecosystem-aware' Humble AI.

Paper's stated contributions (design guidelines and conceptual vision) described in the abstract.

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... design guidance / conceptual framework

Qualitatively, participants used AVA as a specialized 'evidence engine'; reasoned abstention clarified scope boundaries, and trust was calibrated through institutional provenance and page-anchored citations.

Qualitative findings from surveys and 20 interviews reported in the paper (participant quotations and thematic analysis implied in abstract).

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... user behavior and trust calibration (use as evidence engine; role of abstention ...

Difference-in-Differences estimates associate sustained engagement with 2.4-3.9 hours saved weekly.

Quantitative claim reported in the paper based on Difference-in-Differences analysis of usage/engagement data from the evaluation (implicit sample drawn from the >2,200 participants).

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... time saved per week
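For readers unfamiliar with the method, a basic two-group, two-period Difference-in-Differences estimate subtracts the comparison group's change from the engaged group's change. The sketch below shows only that arithmetic. The paper's actual specification is not given in this summary, and the numbers are invented for illustration, not AVA data.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: treated change minus control change."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical weekly hours spent on a research task (not from the paper).
engaged_pre = [10.0, 12.0, 11.0]
engaged_post = [7.0, 8.0, 6.0]
other_pre = [10.0, 11.0, 12.0]
other_post = [10.0, 10.0, 10.0]

effect = did_estimate(engaged_pre, engaged_post, other_pre, other_post)
print(effect)  # -3.0, i.e. three hours saved per week in this toy example
```

The estimate is only interpretable as an effect under the parallel-trends assumption, namely that both groups would have changed by the same amount without the intervention.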

AVA operationalizes epistemic humility through two mechanisms: citation verifiability (tracing claims to sources) and reasoned abstention (declining unsupported queries with justification and redirection).

Design claim describing implemented mechanisms in the platform; described in the paper as operational features.

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... epistemic humility operationalization (citation verifiability and reasoned abste...

AVA's multi-agent pipeline enables users to query and receive evidence-based syntheses.

System design and capability claim in the paper (description of multi-agent pipeline producing evidence-based syntheses).

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... output: evidence-based syntheses

AVA is a GenAI platform built on a curated library of over 4,000 World Bank Reports with multilingual capabilities.

System description provided in the paper; statement of dataset size and functionality (library count and multilingual support).

high positive Learning from AVA: Early Lessons from a Curated and Trustwor... system corpus size / multilingual capability

Code-generating Artificial Intelligence has gained popularity within both professional and educational programming settings over the past several years.

Background statement in the paper's introduction (observational claim about recent trends in AI adoption).

high positive Fast and Forgettable: A Controlled Study of Novices' Perform... adoption/popularity of code-generating AI

The emotional effect of the human teammate was significantly more positive and arousing compared to working with Copilot.

Subjective emotion measures (valence/arousal) collected in the study; reported significant differences favoring human teammate on positivity and arousal (n=22).

high positive Fast and Forgettable: A Controlled Study of Novices' Perform... emotional valence and arousal during task

Several dimensions of participants' workload were significantly reduced when using GitHub Copilot.

Subjective workload measures collected during the experiment; multiple workload dimensions reported as significantly lower in the Copilot condition (n=22).

high positive Fast and Forgettable: A Controlled Study of Novices' Perform... subjective workload (multiple dimensions)

Participants performed significantly better with GitHub Copilot than with their human teammate.

Experimental comparison of task performance between Copilot-assisted individual condition and human pair condition; statistical significance reported in results (sample size n=22).

high positive Fast and Forgettable: A Controlled Study of Novices' Perform... programming performance on timed Python tasks

Evaluation demonstrates speed improvements of 6-7 minutes over traditional methods.

Reported empirical timing result in paper abstract: 6-7 minutes (presumably time to validate a change) compared to traditional methods (no further detail or sample size in abstract).

high positive Aether: Network Validation Using Agentic AI and Digital Twin validation time (speed)

Evaluation demonstrates diagnostic coverage of 92-96%.

Reported empirical range in paper abstract (92-96% diagnostic coverage over evaluated cases; specific n not provided in abstract).

high positive Aether: Network Validation Using Agentic AI and Digital Twin diagnostic coverage

Evaluation demonstrates promising results in error detection (100%).

Reported empirical result in paper abstract: 100% error detection over evaluated scenarios (no sample size given in abstract).

high positive Aether: Network Validation Using Agentic AI and Digital Twin error detection rate

By orchestrating agent collaboration atop this digital twin, Aether enables automated, rapid network change validation while reducing manual effort, minimizing errors, and improving operational agility and cost-effectiveness.

High-level claim supported by system design and subsequent empirical evaluation reported in paper (evaluation details referenced in abstract).

high positive Aether: Network Validation Using Agentic AI and Digital Twin automation, manual effort, error rates, operational agility, cost-effectiveness

Aether agents use a unified Network Digital Twin integrating modeling, simulation, and emulation to maintain a consistent, up-to-date network view for verification and testing.

Design claim describing the digital twin's capabilities (modeling, simulation, emulation) as part of the system; presented in paper text.

high positive Aether: Network Validation Using Agentic AI and Digital Twin consistency and freshness of network view for verification/testing

Aether features an agentic architecture with five specialized Network Operations AI agents that collaboratively handle the change validation lifecycle from intent analysis to network verification and testing.

System architecture claim in paper describing five specialized agents (design specification; no empirical sample size).

high positive Aether: Network Validation Using Agentic AI and Digital Twin architectural decomposition into five agents

Aether integrates Generative Agentic AI with a multi-functional Network Digital Twin to automate and streamline network change validation workflows.

Paper describes Aether system design and architecture combining agentic AI and a digital twin (design-level claim; architectural description).

high positive Aether: Network Validation Using Agentic AI and Digital Twin automation/streamlining of change validation workflows

A common response to these worries stresses that the goods derived from work can be found elsewhere, often in better activities, suggesting that the proliferation of AI-powered automation does not threaten the meaningfulness of people’s lives.

Description of a commonly offered counterargument in the literature and popular debate (conceptual/literature-summary; no empirical data or sample reported).

high positive Is artificial intelligence a threat to meaningful work and l... argument that non-work activities can replace meaning from work (impact on meani...

The study uses a combination of cognitive systems theory, diplomatic negotiation models, and empirical Human-in-the-Loop experiments as its methodological basis.

Methods description in the paper listing theoretical foundations and empirical HITL experiments as components of the study design.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... methodological approach (integration of theory and HITL experiments)

The paper outlines recommendations for international norm development, capacity building, and the creation of interoperable, transparent AI systems for diplomacy.

Policy recommendation section of the paper proposing international norms, capacity-building measures, and interoperable transparent system design.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... policy recommendations proposed (norm development, capacity building, interopera...

Experimental HITL data indicate a 17% reduction in cognitive bias for hybrid human-AI teams.

Human-in-the-Loop (HITL) experiments reported in the paper; comparison of cognitive bias measures between hybrid teams and baseline (sample size not provided in summary).

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... cognitive bias (reduction)

Experimental HITL data indicate that hybrid human-AI teams achieved 23% faster consensus-building.

Human-in-the-Loop (HITL) experiments reported in the paper; experimental comparison between hybrid human-AI teams and baseline (details on sample size not reported in summary).

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... time to consensus (consensus-building speed)

The framework is validated through real-world and simulated case studies, including UN ceasefire mediation, EU sentiment-monitoring for conflict diplomacy, and African Union peacekeeping planning.

Validation reported via a set of real-world and simulated case studies described in the paper (case study methodology; specific cases named).

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... case-study-based validation of framework applicability

Each layer augments a core dimension of diplomatic reasoning, enabling interpretable AI contributions, foresight analysis, culturally sensitive framing, and legally compliant outputs.

Conceptual mapping of each proposed layer to functional capabilities described in the paper; claimed alignment with interpretability, foresight, cultural framing, and legal compliance.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... interpretability, foresight analysis, culturally sensitive framing, legal compli...

The study proposes a five-layer Human-AI collaboration architecture tailored to multilateral diplomacy consisting of: (1) Context Modeling, (2) Scenario Generation, (3) Cognitive Interfacing, (4) Decision Support, and (5) Ethical-Normative Governance.

Architectural proposal in the paper based on synthesis of literature and design choices; claimed as the output of the conceptual framework.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... definition of five-layer architecture (components enumerated)

This paper develops the concept of Artificial Diplomacy as a structured interface between human strategic cognition and machine-supported reasoning.

Theoretical development drawing on cognitive systems theory and diplomatic negotiation models; described design and conceptual argumentation in the paper.

high positive Strategic Cognition and Artificial Diplomacy: Designing Huma... conceptualization of 'Artificial Diplomacy' (design of an interface)

These divergences (between simulation and human data and across scenarios) provide crucial insights for the future design of human-centered AI agents.

Paper conclusion in abstract indicating practical implications and discussion of how divergences vary across contexts and what that implies for design.

high positive Imperfectly Cooperative Human-AI Interactions: Comparing the... design_implications

With actual human subjects, AI attributes—particularly transparency—were much more impactful than personality traits.

Abstract reporting results from the human-subjects experiment (N=290) indicating AI attributes, especially chain-of-thought transparency, had greater impact.

high positive Imperfectly Cooperative Human-AI Interactions: Comparing the... relative_influence_on_outcomes (AI_attributes_vs_personality)

In simulation experiments, personality traits and AI attributes were comparably influential on outcomes.

Abstract claim summarizing simulation experiment results (based on the 2,000 simulated runs) that personality and AI attributes were influential.

high positive Imperfectly Cooperative Human-AI Interactions: Comparing the... influence_on_interaction_outcomes

Policymakers can reinforce these conditions by shifting from technology-neutral principles to auditable process standards that couple AI investment with reskilling and data-quality obligations.

Policy recommendation based on the study's findings and synthesis; presented as a normative implication rather than empirically tested within the study. (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... policy effectiveness in reinforcing safe, equitable AI adoption

Leaders should fund training coverage and design (not just headline hours), equip non-specialists to interpret model outputs, pair performance artefacts with participatory routines, and treat explainability as a usability requirement to achieve durable, auditable value in safety-critical energy contexts.

Prescriptive recommendation based on a 'field-tested playbook' synthesised from the multi-case qualitative study (interviews, surveys, documents). The claim is drawn from authors' interpretation of cross-case patterns rather than causal inference. (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... durable, auditable value / legitimacy and sustained use

Structured upskilling and precise recourse mechanisms are associated with higher confidence, productivity, and clearer sustainability pathways.

Observed association in multi-case qualitative data: interviews, staff/manager surveys, and policy documents; triangulated through thematic coding and cross-case synthesis. (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... worker confidence and productivity; clarity of sustainability pathways

A tight workflow fit that minimises cognitive overhead at the decision point accelerates legitimate use and strengthens links to emissions monitoring and predictive-maintenance outcomes.

Synthesised from interviews, Likert-scale surveys of technical staff and managers, and internal workflow/policy documents across multiple cases in the energy sector. (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... rate of legitimate use (adoption) and effectiveness of emissions monitoring and ...

Communicative governance — e.g. model cards, bias tests, validation reports, and explicit appeal rights — earns trust, curbs shadow workarounds, and improves safety culture.

Reported from thematic coding of interviews, surveys of staff and managers, and documentary evidence across multiple cases; triangulation claimed. (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... trust, incidence of shadow workarounds, and safety culture

Broad-based capability building beyond specialist teams prevents benefits from concentrating in expert enclaves and reduces brittle scale.

Derived from cross-case thematic synthesis of interviews, Likert surveys of mid-level managers and technical staff, and internal policy/strategy document analysis (multi-case qualitative evidence). (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... distribution of benefits across organisation and scalability of AI use

Three reinforcing levers shape adoption outcomes: (1) broad-based capability building beyond specialist teams, (2) communicative governance that couples transparency with contestability, and (3) a tight workflow fit that minimises cognitive overhead at the decision point.

Qualitative, multi-case design triangulating a semi-structured interview with a senior manager, Likert-scale surveys of mid-level managers and technical staff, and analysis of internal policies and strategy documents; thematic coding with intercoder reliability and cross-case synthesis. (Sample size not reported.)

high positive Overcoming Resistance to Change: Artificial Intelligence in ... adoption outcomes / legitimate use

The framework demonstrates how digital intelligence can enhance supply chain resilience while supporting, rather than replacing, human decision-making (human-centric/planner-centered decision support).

Framework design emphasizes human-centric decision support; field deployment reported to be planner-centered (paper claims support rather than replacement of human decision-making).

high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... human-centric support vs. automation replacing planners

The results indicate that upstream textile SMEs can leverage publicly visible e-commerce signals to enhance production planning responsiveness, minimize inventory exposure and dye-lot disruptions, and strengthen resilience to demand uncertainty through planner-centered digital decision support.

Synthesis claim based on model results, validation of comment volume as sales proxy, Monte Carlo-based production guidance, decision dashboard design, and the 12-month field study outcomes.

high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... production planning responsiveness, inventory exposure, dye-lot disruptions, res...

This research extends the C2M paradigm from downstream retail contexts to upstream textile SMEs and proposes an integrated and operationally feasible intelligence framework for resource-constrained manufacturers.

Conceptual claim supported by the methodological development, large-scale e-commerce data modeling, and a field deployment at one SME reported in paper.

high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... extension and operational feasibility of C2M paradigm for upstream textile SMEs

In the same 12-month field study, implementation resulted in a 16% increase in capacity utilization.

Field deployment measurements reported in paper for one Taiwanese dyeing SME over 12 months.

high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... capacity utilization

In the same 12-month field study, implementation resulted in a 31% decrease in dye lot changeovers.

Field deployment measurements reported in paper for one Taiwanese dyeing SME over 12 months.

high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... number of dye lot changeovers

In a 12-month field study at a Taiwanese dyeing SME, implementation resulted in a 28% reduction in inventory value.

Field deployment and before-after (or intervention) measurement reported in paper over 12 months at one Taiwanese dyeing SME.

high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... inventory value

Forecasts were translated into production guidance using Monte Carlo simulation and a decision dashboard.

Description of operationalization methods in paper: Monte Carlo simulation and a planner-facing decision dashboard used to convert forecasts into production guidance.

high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... operational production guidance derived from forecasts (method implementation)
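As a rough illustration of how a Monte Carlo step can turn a point forecast into production guidance, the sketch below samples demand around a forecast and reports quantiles that a planner could read as conservative, balanced, or aggressive production levels. The normal demand model, the function name `simulate_production_quantiles`, and the quantile choices are all assumptions; the summary does not describe the paper's actual simulation design.

```python
import random

def simulate_production_quantiles(forecast_mean, forecast_sd, n_draws=10_000, seed=0):
    """Sample demand around a forecast and return selected quantiles.

    Assumes normally distributed demand truncated at zero (an
    illustrative choice, not the paper's stated model).
    """
    rng = random.Random(seed)
    draws = sorted(
        max(0.0, rng.gauss(forecast_mean, forecast_sd)) for _ in range(n_draws)
    )

    def quantile(q):
        # Simple order-statistic quantile over the sorted draws.
        return draws[min(n_draws - 1, int(q * n_draws))]

    return {"p50": quantile(0.50), "p80": quantile(0.80), "p95": quantile(0.95)}

# Hypothetical forecast: 1,000 units with a standard deviation of 200.
guidance = simulate_production_quantiles(1000.0, 200.0)
```

Presenting a small set of quantiles rather than a single number fits the planner-centered framing: the planner chooses how much overproduction risk to accept.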

Consumer comment volume was validated as a proxy for sales activity, facilitating demand estimation.

Validation analysis reported in paper linking consumer comment volume to sales activity (methodological validation; specific statistical details not provided in abstract).

high positive Enhancing Supply Chain Resilience in Textile SMEs: A Human-C... validity of consumer comment volume as proxy for sales activity
