Evidence (3103 claims)

Claims by category:

- Adoption: 5267 claims
- Productivity: 4560 claims
- Governance: 4137 claims
- Human-AI Collaboration: 3103 claims
- Labor Markets: 2506 claims
- Innovation: 2354 claims
- Org Design: 2340 claims
- Skills & Training: 1945 claims
- Inequality: 1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Human-AI Collaboration
Large language models (LLMs) and agentic systems have shown promise for automated software development.
Statement in paper referencing prior successes of LLMs and agentic systems for automated software development (no empirical data reported in this excerpt).
Trained participants assigned tasks to the agent by defining strategies more often than participants who did not receive teamwork training.
Behavioral measure in experiment (frequency of assigning tasks using defined strategies) comparing trained vs. untrained participants in the KeyWe game with a scripted agent.
Participants who received the training delegated a higher percentage of tasks to the agent than participants who did not receive teamwork training.
Between-subjects comparison in KeyWe testbed with a scripted agent; measured percentage of tasks delegated by participants in trained vs. untrained groups.
A HAT training intervention that took less than 30 minutes was developed to train humans on seven teamwork competencies.
Study description: developed a training intervention under 30 minutes targeting seven teamwork competencies; implemented as part of the experiment.
The largest gains appear when AI is embedded in an orchestrated workflow rather than deployed as an isolated coding assistant.
Central thesis supported by comparisons across five delivery configurations (traditional baseline and V1–V4) in a retrospective longitudinal field study of the Chiron platform applied to three real software modernization programs; authors observe greater portfolio-level improvements when AI is integrated into coordinated workflows.
V3 and V4 add acceptance-criteria validation, repository-native review, and hybrid human-agent execution, simultaneously improving speed and coverage while reducing validation issue load.
Observed differences across the five delivery configurations (baseline, V1–V4) in the field study of three modernization programs; authors link feature additions in V3/V4 to measured improvements in stage durations, coverage, and validation issues.
First-release coverage rises from 77.0% to 90.5% across the portfolio as platform versions progress.
Observed first-release coverage measured in the retrospective longitudinal field study of three real modernization programs, reported as percentages across delivery configurations.
Validation-stage issue load falls from 8.03 to 2.09 issues per 100 tasks across the portfolio as platform versions progress.
Observed outcomes from the retrospective field study on three programs; validation-stage issues counted and normalized per 100 tasks across delivery configurations.
Modeled senior-equivalent effort falls from 1080.0 to 139.5 SEE-days under the platform configurations studied.
Modeled senior-equivalent effort computed from the study's staffing scenarios and observed outputs across the three real programs.
Modeled raw effort falls from 1080.0 to 232.5 person-days under the platform configurations studied (baseline → V4 aggregate).
Modeled outcomes computed from observed task volumes and explicit staffing scenarios in the retrospective longitudinal field study covering three real programs.
Portfolio totals move from 36.0 to 9.3 summed project-weeks under baseline staffing assumptions (across the three studied programs and five delivery configurations).
Retrospective longitudinal field study of the Chiron platform applied to three real software modernization programs (COBOL banking migration ~30k LOC, accounting modernization ~400k LOC, .NET/Angular mortgage modernization ~30k LOC); observed and modeled outcomes were aggregated to produce portfolio totals under explicit staffing scenarios.
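The relative changes implied by the Chiron figures quoted above can be checked in a few lines; the values come from the claims themselves, while the function name and dictionary layout are ours:

```python
# Relative reductions implied by the quoted baseline vs. V4-aggregate figures;
# a quick arithmetic check, not data taken from the underlying paper.

def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

figures = {
    "validation issues (per 100 tasks)": (8.03, 2.09),
    "senior-equivalent effort (SEE-days)": (1080.0, 139.5),
    "raw effort (person-days)": (1080.0, 232.5),
    "portfolio duration (project-weeks)": (36.0, 9.3),
}

for name, (before, after) in figures.items():
    print(f"{name}: -{pct_reduction(before, after):.1f}%")

# First-release coverage moves the other way: 77.0% -> 90.5%, a 13.5-point gain.
```

The four modeled-effort and duration figures all work out to reductions in the 74-87% range, which is why the authors describe the gains as portfolio-level rather than tied to any one program.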
Because instructional signals are usable only when the learner has acquired the prerequisites needed to parse them, the effective communication channel depends on the learner's current state of knowledge and becomes more informative as learning progresses.
Theoretical consequence derived from the model's prerequisite-structure assumption and sequential teaching formalization (as described in the abstract).
Generative AI has transformed the economics of information production, making explanations, proofs, examples, and analyses available at very low cost.
Statement in paper (intro/abstract) asserting an empirical/observational fact about generative AI; no empirical sample or data reported in the abstract.
These results highlight the importance of trustworthy AI mediation tools in contexts where not only truth, but also trust and confidence matter.
Policy/recommendation based on experimental findings that AI mediation lowers perceived trust and confidence even when accuracy is unchanged.
Reinforcement learning (post-training) on our corpus improves downstream embodied manipulation performance.
Downstream evaluation described in the paper showing improved performance on embodied manipulation tasks after RL post-training on MultihopSpatial-Train.
Reinforcement learning (post-training) on our MultihopSpatial-Train corpus enhances intrinsic VLM spatial reasoning.
Experimental intervention: RL-based post-training on the authors' training corpus followed by evaluation on intrinsic spatial reasoning benchmarks (described in the paper).
We provide MultihopSpatial-Train, a dedicated large-scale training corpus intended to foster spatial intelligence in VLMs.
Dataset/resource contribution described in the paper (existence and intended use of MultihopSpatial-Train).
We propose Acc@50IoU, a complementary metric that simultaneously evaluates reasoning and visual grounding by requiring both answer selection and precise bounding box prediction.
Methodological contribution in the paper defining the Acc@50IoU metric and its intended use: a prediction counts as correct only when the answer selection is right and the predicted bounding box achieves IoU >= 0.5 against the ground truth.
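A metric of this joint form can be sketched as follows; the box format (x1, y1, x2, y2) and the function names are our assumptions, not the paper's implementation:

```python
# Sketch of an Acc@50IoU-style metric: a prediction scores only if the chosen
# answer matches AND the predicted box overlaps ground truth with IoU >= 0.5.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def acc_at_50iou(preds, golds, thresh=0.5):
    """preds/golds: lists of (answer, box); accuracy over both criteria jointly."""
    hits = sum(
        p_ans == g_ans and iou(p_box, g_box) >= thresh
        for (p_ans, p_box), (g_ans, g_box) in zip(preds, golds)
    )
    return hits / len(golds)
```

Requiring both criteria at once is the point of the metric: a model can no longer score by guessing the answer while grounding on the wrong region, or vice versa.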
We introduce MultihopSpatial, a comprehensive benchmark designed for multi-hop and compositional spatial reasoning, featuring 1- to 3-hop complex queries across diverse spatial perspectives.
Dataset/benchmark construction described in the paper (design and scope of MultihopSpatial).
Spatial reasoning is foundational for Vision-Language Models (VLMs), particularly when deployed as Vision-Language-Action (VLA) agents in physical environments.
Conceptual/introductory statement in the paper motivating the work (literature-based argument about VLMs and VLA agents).
An approach is needed that focuses on emerging and future interdependencies between professionals and generative machine learning, which implies extending, but also reimagining, theoretical perspectives on expertise, work, and organizations.
Paper's central argument based on theoretical reasoning and literature synthesis about generative ML characteristics and their implications for professionals; method: conceptual/theoretical development; no empirical sample.
Existing theories need to be extended whilst also responding to the distinctive characteristics of generative machine learning and the implications for how we theorize change.
Argumentative/theoretical claim in the paper based on comparison of features of generative ML with prior digital/algorithmic technologies; method: conceptual analysis and literature engagement; no empirical sample.
We develop an approach using insights from existing literature on digital, algorithmic and artificial intelligence technologies.
Paper's stated contribution: theoretical development based on synthesis of existing literature (digital, algorithmic, AI). Method: conceptual synthesis; no empirical testing or sample reported.
There is a need for an approach to theorizing professional work and professional service firms in the generative machine learning age.
Conceptual argument presented in the paper (literature-based rationale); method is theoretical/literature review and argumentation; no empirical sample reported.
The technology particularly benefits less experienced practitioners by providing comprehensive starting points for legal research, while experienced attorneys can use it for quality control and initial drafts.
Authors' interpretation of AI outputs from the experiment and reasoning about how those outputs map onto different practitioner needs (qualitative judgment).
The analysis reveals AI’s potential to transform law firm economics by dramatically reducing research time while maintaining analytical quality, though careful attorney oversight remains essential.
Inference from the experimental finding that four AI systems produced substantive analysis comparable to junior-associate work on one transcript and the stated observation about traditional research time (8–40 hours); authors' qualitative judgment about economic implications and need for oversight.
Statutory and regulatory citations proved generally accurate and useful.
Authors' examination of statutory and regulatory references produced by the four AI engines in the experiment, judged to be generally correct and helpful.
All four engines successfully spotted legal issues, assessed claim strengths and weaknesses, and suggested follow-up investigation—tasks that traditionally required eight to forty hours of junior attorney research time.
Observed outputs from the four AI engines on the single transcript showing issue-spotting, strengths/weaknesses assessment, and suggested follow-ups; comparison to typical junior attorney research time (stated as 8–40 hours).
Contemporary generative AI performs sophisticated legal analysis comparable to experienced associates, correctly identifying major employment law claims including ADA violations, Title VII discrimination, OSHA retaliation, FMLA interference, and workers’ compensation retaliation.
Qualitative assessment of outputs from the four AI engines applied to the single hypothetical transcript; comparison against expected legal claims (authors' judgment that outputs matched those an experienced associate would produce).
Four major generative AI engines—DeepSeek, Claude, ChatGPT, and Grok—are useful legal analysis tools for employment law practitioners.
Experimental evaluation in which a single hypothetical client interview transcript was submitted to each of the four AI systems and their outputs were assessed by the authors.
Organizational support and continuous learning are important to maximize the benefits of AI integration in startup environments.
Conclusions drawn from thematic analysis of interviews with 12 startup employees emphasizing need for organizational support and ongoing learning.
AI functions as a workforce augmentation tool that enhances human capabilities rather than replacing employees.
Reported perceptions from 12 startup employees in semi-structured interviews; thematic coding indicated view of AI as augmentation rather than replacement.
Most employees demonstrated progressive adjustment and competency improvement over time after initial adaptation.
Interview data from 12 startup employees with thematic analysis indicating progressive adjustment and competency gains over time.
AI improves employee performance by supporting more accurate decision-making and increasing work effectiveness and output quality.
Findings from semi-structured interviews of 12 startup employees, analyzed via thematic coding and frequency scoring, reporting improved decision accuracy and output quality with AI support.
AI integration contributes to competency development, particularly in digital literacy, analytical thinking, and adaptive learning.
Qualitative semi-structured interviews with 12 startup employees; thematic coding highlighted competencies (digital literacy, analytical thinking, adaptive learning).
AI significantly enhances employee productivity by accelerating task completion, reducing manual workload, and improving workflow efficiency.
Qualitative study using semi-structured interviews with 12 startup employees; data analyzed with thematic coding, frequency scoring, and visualized analysis.
Structured intent representations (PPS) can improve alignment and usability in human–AI interaction, especially in tasks where user intent is inherently ambiguous.
Synthesis of experimental findings (rendered PPS better on goal_alignment overall, task-dependent gains concentrated in high-ambiguity business tasks) and the preliminary user survey.
A preliminary retrospective survey (N = 20) suggests a 66.1% reduction in follow-up prompts required, from 3.33 to 1.13 rounds, when using PPS.
Authors report a small retrospective survey of N = 20 respondents comparing number of follow-up prompt rounds required before vs after adopting PPS (self-reported).
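The quoted 66.1% figure follows directly from the two reported means:

```python
# Checking the reported reduction in follow-up prompt rounds (N = 20 survey):
# from 3.33 rounds without PPS to 1.13 rounds with PPS.
before, after = 3.33, 1.13
reduction = (before - after) / before * 100
print(f"{reduction:.1f}% fewer follow-up rounds")  # -> 66.1%
```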
We introduce goal_alignment, a user-intent-centered evaluation dimension, and find that natural-language-rendered PPS outperforms both simple prompts and raw PPS JSON on this metric.
Experimental comparison across the three prompt conditions using the goal_alignment evaluation dimension applied to the collected outputs (540 outputs across 60 tasks and 3 models), as judged by an LLM judge.
A mixed-methods empirical research agenda is presented, proposing a future PLS-SEM approach to test the mediating role of the cognitive flywheel and the moderating effect of fractal governance on organizational resilience.
Methodological proposal described in the paper (research design and proposed analytic approach); no executed empirical study or sample reported.
Fractal governance architecture is proposed to mitigate systemic vulnerabilities such as automation bias.
Conceptual proposal of a governance design in the paper; no empirical test or sample provided.
The cognitive flywheel is the central mechanism of this dynamic capability, and the paper operationalizes it.
Theoretical operationalization within the paper (concept definition and proposed operational measures); no empirical measurement or sample reported.
The co-evolutionary dynamic is formalized using coupled non-linear differential equations and time decay integrals.
Mathematical formalization reported in the paper (modeling methods described); no empirical parameter estimation or sample provided.
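The excerpt does not reproduce the equations themselves. A generic form consistent with the description (coupled non-linear ODEs with time-decay memory integrals) might look like the following, where $H$ and $M$ are illustrative state variables for human semantic intent and machine syntactic processing, $f, g$ are non-linear coupling functions, and $\alpha, \beta, \lambda$ are assumed parameters; this is a hypothetical sketch, not the paper's model:

```latex
\begin{aligned}
\frac{dH}{dt} &= f\!\left(H(t),\, M(t)\right)
  + \alpha \int_{0}^{t} e^{-\lambda (t-s)}\, M(s)\, ds,\\
\frac{dM}{dt} &= g\!\left(M(t),\, H(t)\right)
  + \beta \int_{0}^{t} e^{-\lambda (t-s)}\, H(s)\, ds.
\end{aligned}
```

The exponential kernels give each variable a fading memory of the other's history, which is one standard way to formalize a recursive, historically conditioned coupling.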
Dynamic cognitive advantage arises from the historical, recursive, structural coupling of human semantic intent and machine syntactic processing (a co-evolutionary dynamic).
Conceptual theory introduced and argued in the paper (mechanism-level proposition); formalization provided but no empirical validation.
Conceptualizing the enterprise as a complex adaptive system operating far from thermodynamic equilibrium provides a more appropriate framing for organizations integrating AI and enables the theory of dynamic cognitive advantage.
Theoretical development and conceptual argumentation within the paper; formal framing rather than empirical test; no sample reported.
We propose a multi-agent discussion framework wherein specialized agents collaboratively process extensive product information, distributing cognitive load to alleviate single-agent attention bottlenecks and capturing critical decision factors through structured dialogue.
Method description: multi-agent discussion architecture described and implemented; claimed to distribute cognitive load and reduce single-agent attention bottlenecks (design + reported behavior).
To enhance simulation stability, we implement a mean-field mechanism designed to model the dynamic interactions between the product environment and customer populations, effectively stabilizing sampling processes within high-dimensional decision spaces.
Method description: implementation of a mean-field mechanism within the simulator; paper asserts this design stabilizes sampling in high-dimensional decision spaces (method + reported simulation behavior).
We introduce a preference learning paradigm in which LLMs are economically aligned via post-training on extensive, heterogeneous transaction records across diverse product categories.
Method description: post-training LLMs on heterogeneous transaction records across product categories to align preferences (methodological / training procedure described).
This paper introduces a Multi-Agent Large Language Model-based Economic Sandbox (MALLES) as a unified simulation framework applicable to cross-domain and cross-category scenarios.
Paper description: design and implementation of MALLES, presented as a unified framework leveraging large-scale LLM generalization for cross-domain/cross-category simulation (methodological contribution).
Leaders' AI symbolization lessens AI's negative impact on employees' emotional exhaustion.
Moderation analysis in the four-stage longitudinal study of 285 finance professionals; leader AI symbolization tested as moderator of AI usage -> emotional exhaustion path.