Evidence (6491 claims)
| Topic | Claims |
|---|---|
| Adoption | 8570 |
| Productivity | 7631 |
| Governance | 6869 |
| Human-AI Collaboration | 6491 |
| Org Design | 4175 |
| Innovation | 4114 |
| Labor Markets | 3566 |
| Skills & Training | 2966 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 758 | 199 | 100 | 900 | 2007 |
| Governance & Regulation | 826 | 400 | 191 | 122 | 1563 |
| Organizational Efficiency | 777 | 193 | 124 | 84 | 1189 |
| Technology Adoption Rate | 635 | 233 | 124 | 97 | 1098 |
| Research Productivity | 422 | 128 | 57 | 336 | 954 |
| Output Quality | 476 | 179 | 59 | 47 | 761 |
| Decision Quality | 328 | 177 | 81 | 47 | 640 |
| Firm Productivity | 435 | 57 | 88 | 20 | 606 |
| AI Safety & Ethics | 218 | 277 | 65 | 33 | 599 |
| Market Structure | 180 | 170 | 123 | 24 | 502 |
| Task Allocation | 213 | 64 | 72 | 33 | 387 |
| Skill Acquisition | 170 | 61 | 61 | 17 | 309 |
| Innovation Output | 203 | 27 | 43 | 18 | 292 |
| Employment Level | 105 | 54 | 107 | 13 | 281 |
| Fiscal & Macroeconomic | 131 | 69 | 43 | 26 | 276 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 153 | 48 | 26 | 3 | 230 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 122 | 49 | 6 | 221 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 69 | 92 | 10 | 2 | 173 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 56 | 56 | 26 | 13 | 154 |
| Training Effectiveness | 94 | 21 | 13 | 19 | 149 |
| Wages & Compensation | 77 | 36 | 25 | 6 | 144 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 80 | 20 | 1 | 113 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 31 | 18 | 8 | 3 | 61 |
| Skill Obsolescence | 5 | 46 | 6 | 1 | 58 |
| Social Protection | 27 | 16 | 8 | 2 | 53 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
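As a rough illustration of how the matrix can be read, the sketch below computes each direction's share for three rows copied from the table. The `rows` dictionary and the `direction_shares` helper are hypothetical names introduced here. Shares are taken over the sum of the four listed columns rather than the Total column, because in some rows the Total exceeds that sum; this normalisation is an assumption, not something the page specifies.

```python
# Rows copied from the matrix above: (positive, negative, mixed, null).
rows = {
    "Firm Productivity": (435, 57, 88, 20),
    "Job Displacement": (12, 80, 20, 1),
    "Error Rate": (69, 92, 10, 2),
}

DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the listed claims for one row."""
    listed = sum(counts)
    return {name: count / listed for name, count in zip(DIRECTIONS, counts)}


for outcome, counts in rows.items():
    shares = direction_shares(counts)
    dominant = max(shares, key=shares.get)
    print(f"{outcome}: mostly {dominant} ({shares[dominant]:.1%})")
```

Shares only describe how many claims point in each direction; they say nothing about the strength of the underlying evidence.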
Human-AI Collaboration
We introduce unbounded cognitive fusion (UCF) as a new theoretical framework explaining coordination through cognitive synthesis rather than price signals or authority structures.
Theoretical proposal and framing within the paper; conceptual development rather than empirical validation.
Generative artificial intelligence (GenAI) fundamentally alters [traditional organizational coordination] assumptions by augmenting human cognitive capabilities across organizational boundaries.
Position paper argumentation and conceptual reasoning presented in the abstract; no empirical data or sample reported.
New tendencies in managerial AI research and practice include explainable AI, human–AI collaboration, knowledge management, enterprise analytics, and algorithmic management.
Descriptive finding from the paper's literature synthesis (topics emphasized in the review); no quantitative prevalence or counts provided in the abstract.
Machine Learning and Deep Learning enhance employee productivity, business intelligence, process mining, and data-driven decision-making by enabling prediction, perception, and adaptive learning solutions.
Claim synthesized in the review from multiple studies identified via PRISMA screening; abstract does not list the number or identity of underlying empirical studies.
AI-based technologies can greatly enhance managerial efficiency by automating repetitive activities, improving resource allocation, enabling intelligent scheduling, and supporting predictive modelling and strategic planning.
Summary conclusion from the paper's literature review (PRISMA methodology referenced); no quantitative meta-analytic effect sizes provided in abstract.
Machine Learning, Artificial Intelligence, and Deep Learning are tools that can optimize managerial decisions, enable intelligent automation, streamline workflows, and improve organizational performance.
Synthesis claim from the paper's PRISMA-based literature review (no numeric sample size reported in the abstract).
Findings underscore the importance of robust evaluation frameworks for deploying VLMs in visually rich and safety-critical environments.
Synthesis/recommendation based on experimental results showing that visual inputs (images and colors) can influence VLM decisions and that mitigation effectiveness varies by model.
Policy frameworks, reskilling initiatives, and institutional adaptations are required to ensure inclusive technological progress.
Prescriptive conclusion presented in abstract based on the review and synthesis; no empirical validation or sample sizes provided in abstract.
AI simultaneously generates demand for higher-order problem solving, emotional intelligence, and human-AI collaboration skills.
Explicit finding reported in abstract from the review of interdisciplinary literature; no quantified effect sizes or sample sizes provided in abstract.
For memory workloads requiring stable facts and stateful computation, architecture matters more than retrieval scale or model strength alone.
Conclusion drawn by the authors based on comparative experimental results reported in the paper (xmemory vs retrieval/model-strength baselines); excerpt provides aggregate benchmark comparisons but not full experimental details.
On the application-level task, xmemory reaches 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses.
Empirical evaluation on an application-level task reported in the paper showing 95.2% accuracy for xmemory and claiming it outperforms several classes of alternative systems; excerpt lacks details on the task, dataset size, or baseline numeric results.
On the end-to-end memory benchmark, xmemory reaches 97.10% F1, compared with 80.16%-87.24% across the third-party baselines.
Empirical evaluation on the paper's end-to-end memory benchmark reporting F1 scores for xmemory and a range for third-party baselines; the excerpt does not provide dataset size or statistical significance details.
On the structured extraction benchmark (judge-in-the-loop configuration) the system reaches 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines.
Empirical evaluation on the paper's structured extraction benchmark in the judge-in-the-loop configuration; the excerpt reports the numeric accuracies and states they exceed tested frontier structured-output baselines. The excerpt does not specify dataset size or number of runs.
This iterative, schema-aware write-path design shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose.
Conceptual claim about how the proposed architecture affects system behavior; supported by the architectural description in the paper rather than explicit quantitative evidence in the excerpt.
We present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control.
Description of the proposed method/architecture in the paper (methodological contribution); no numeric evaluation attached to the description in the excerpt.
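The write path described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `SCHEMA` layout, the `Record` type, the keyword-based detectors, the retry budget, and the toy `extract_value` function (a stand-in for a model call) are all hypothetical. The sketch shows the three stages, a validation gate per field, and retries that stay local to the failing field; stateful prompt control is not modelled.

```python
import re
from dataclasses import dataclass, field

# Hypothetical schema: object type -> {field name: validation regex}.
SCHEMA = {
    "contact": {
        "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "phone": r"\+?\d[\d -]{6,}\d",
    },
}


@dataclass
class Record:
    object_type: str
    values: dict = field(default_factory=dict)
    rejected: list = field(default_factory=list)


def detect_objects(text):
    """Stage 1 (toy): decide which schema objects the text mentions."""
    return [name for name in SCHEMA if name in text.lower()]


def detect_fields(text, object_type):
    """Stage 2 (toy): decide which fields of that object are present."""
    return [name for name in SCHEMA[object_type] if name in text.lower()]


def extract_value(text, field_name, attempt):
    """Stage 3 (toy stand-in for a model call): token after the field name.

    A real system would vary the prompt per attempt; this toy ignores it.
    """
    match = re.search(rf"{field_name}\s+(\S+)", text, flags=re.IGNORECASE)
    return match.group(1).strip(",.;") if match else None


def write_path(text, max_retries=2):
    records = []
    for object_type in detect_objects(text):
        record = Record(object_type)
        for name in detect_fields(text, object_type):
            pattern = SCHEMA[object_type][name]
            for attempt in range(max_retries + 1):  # local retry: this field only
                candidate = extract_value(text, name, attempt)
                # Validation gate: store only values that match the schema.
                if candidate and re.fullmatch(pattern, candidate):
                    record.values[name] = candidate
                    break
            else:
                record.rejected.append(name)  # leave the field empty, never guess
        records.append(record)
    return records


print(write_path("New contact: email ana@example.com, phone unknown"))
```

In this toy run the email passes the gate and is stored, while the phone candidate ("unknown") fails validation on every attempt and is recorded as rejected rather than filled in.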
Reliable external AI memory must be schema-grounded (schemas define what must be remembered, what may be ignored, and which values must never be inferred).
Normative assertion supported by the paper's proposed design and subsequent experimental results (the paper introduces a schema-grounded approach and evaluates it against benchmarks), though the excerpt does not give full methodological details or sample sizes for this claim alone.
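One way to picture the three roles a schema plays in this claim is a per-field policy. The sketch below is an assumption-laden illustration, not the paper's schema format: the `Policy` enum, the example field names, and the `admit` helper are all hypothetical.

```python
from enum import Enum


class Policy(Enum):
    MUST_REMEMBER = "must_remember"  # a record is incomplete without it
    MAY_IGNORE = "may_ignore"        # stored if available, never required
    NEVER_INFER = "never_infer"      # stored only when explicitly stated


# Hypothetical schema for a customer-profile memory.
SCHEMA = {
    "name": Policy.MUST_REMEMBER,
    "favourite_colour": Policy.MAY_IGNORE,
    "date_of_birth": Policy.NEVER_INFER,
}


def admit(candidates):
    """Filter candidate facts against the schema.

    `candidates` maps field name -> (value, was_explicitly_stated).
    Returns (accepted facts, required fields that are still missing).
    """
    accepted = {}
    for name, (value, stated) in candidates.items():
        policy = SCHEMA.get(name)
        if policy is None:
            continue  # outside the schema: not remembered
        if policy is Policy.NEVER_INFER and not stated:
            continue  # an inferred value for a never-infer field is dropped
        accepted[name] = value
    missing = [
        name for name, policy in SCHEMA.items()
        if policy is Policy.MUST_REMEMBER and name not in accepted
    ]
    return accepted, missing


accepted, missing = admit({
    "date_of_birth": ("1990-01-01", False),  # guessed, so it is dropped
    "favourite_colour": ("green", False),
    "shoe_size": ("42", True),               # not in the schema
})
print(accepted, missing)
```

Here only `favourite_colour` is stored, and `name` is reported as a required field that is still missing.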
To manage AI legibility, creators perform four recurring forms of invisible authenticity labor: epistemic verification, linguistic naturalization, narrative restructuring, and performative embodiment.
Authors identify and name four recurrent practices from coding and analysis of 16 in-depth interviews with creators on Xiaohongshu and Douyin describing specific downstream repair and performance work.
Creators engage in 'AI passing': strategic efforts to conceal and humanize AI-assisted drafts so that outputs plausibly appear human-authored.
Concept introduced based on analysis of 16 in-depth interviews with creators on Xiaohongshu and Douyin describing tactics to hide AI involvement and present content as human-authored.
Effective governance requires coordinated action across technical, organizational, and regulatory domains (e.g., system-level audits, vendor guidelines, continuous monitoring, documentation across dependency chains) to establish meaningful accountability in distributed development environments.
Policy and technical recommendations derived from literature review, regulatory analysis, and the paper's conceptual findings (recommendation, not empirically validated).
Claw-Eval-Live suggests that workflow-agent evaluation should be grounded twice, in fresh external demand and in verifiable agent action.
Conclusion/recommendation drawn from the benchmark design and experimental findings; conceptual claim advocating evaluation grounded in external demand signals and verifiable actions.
The release contains 105 tasks spanning controlled business services and local workspace repair, and evaluates 13 frontier models under a shared public pass rule.
Benchmark release statistics reported in the paper: explicit counts of tasks and evaluated models (105 tasks; 13 models).
For grading, Claw-Eval-Live records execution traces, audit logs, service state, and post-run workspace artifacts, using deterministic checks when evidence is sufficient and structured LLM judging only for semantic dimensions.
Grading methodology described in the paper: instrumentation and hybrid deterministic/LLM-judging approach documented by authors (procedural description).
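The hybrid grading approach can be sketched as a per-dimension router. This is a minimal illustration under assumptions, not the benchmark's actual grader: the `Evidence` and `Dimension` types, the example dimensions, and `stub_judge` (which stands in for a structured LLM judge and calls no model) are hypothetical, as is the rule that a run passes only if every dimension passes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    """Hypothetical bundle of artifacts recorded for one run."""
    service_state: dict
    workspace_files: dict


@dataclass
class Dimension:
    name: str
    semantic: bool                     # True -> routed to the judge
    check: Callable[[Evidence], bool]  # deterministic check or judge wrapper


def stub_judge(evidence: Evidence) -> bool:
    """Placeholder for a structured LLM judge; no model is called here."""
    summary = evidence.workspace_files.get("summary.md", "")
    return "refund" in summary.lower()


def grade(evidence: Evidence, dimensions: list):
    """Grade each dimension and record which kind of grader produced it."""
    detail = {}
    for dim in dimensions:
        detail[dim.name] = {
            "passed": dim.check(evidence),
            "grader": "llm_judge" if dim.semantic else "deterministic",
        }
    overall = all(result["passed"] for result in detail.values())
    return overall, detail


evidence = Evidence(
    service_state={"ticket_42": "closed"},
    workspace_files={"summary.md": "Refund issued and ticket closed."},
)
dimensions = [
    Dimension("ticket_closed", False,
              lambda ev: ev.service_state.get("ticket_42") == "closed"),
    Dimension("summary_is_accurate", True, stub_judge),
]
overall, detail = grade(evidence, dimensions)
print(overall, detail)
```

Recording the grader type alongside each verdict keeps it visible which parts of a score rest on deterministic evidence and which rest on a judge.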
Each release is constructed from public workflow-demand signals, with ClawHub Top-500 skills used in the current release, and materialized as controlled tasks with fixed fixtures, services, workspaces, and graders.
Description of release construction in the methods: uses public workflow-demand data and ClawHub Top-500 skills; tasks are materialized with controlled fixtures and graders (procedural detail from the paper).
We introduce Claw-Eval-Live, a live benchmark for workflow agents that separates a refreshable signal layer, updated across releases from public workflow-demand signals, from a reproducible, time-stamped release snapshot.
Methodological contribution described in the paper; design and architecture of the benchmark are presented by the authors (design description, no external sample needed).
LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces.
Framing/background statement in the paper describing expected capabilities of workflow agents; no empirical sample size reported for this expectation.
The proposed framework balances AI-driven productivity with the epistemic sovereignty necessary to manage increasingly opaque software ecosystems.
Normative/architectural claim about the proposed framework; presented conceptually in the paper without reported empirical testing in the excerpt.
To preserve long-term resilience, engineering leaders must move beyond prompt-based development to implement rigorous human-in-the-loop pedagogical standards.
Prescriptive recommendation based on the paper's conceptual analysis; no randomized trials or empirical validation of this intervention reported in the excerpt.
The findings offer practical insights for construction firms to enhance innovation performance through effective AI integration and help engineers better leverage AI tools in design and project management workflows.
Authors' stated practical implications based on their empirical findings (survey results linking AI capability, decision-making quality, and innovation performance).
Algorithmic transparency positively moderates the relationship between AI capability and decision-making quality.
Moderation analysis reported on questionnaire data (Credamo, time-lagged) with n=435; authors state a positive moderating effect of algorithmic transparency.
Decision-making quality mediates the relationship between AI capability and innovation performance.
Mediation analysis reported on the same survey dataset (time-lagged Credamo survey) with n=435 using established measurement scales; stated in results.
AI capability is positively associated with innovation performance.
Authors report statistical analysis of questionnaire data collected via the Credamo platform (time-lagged design) using established scales; sample size n=435; result stated in findings.
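The three findings from this survey fit a moderated mediation pattern. The excerpt does not give the authors' exact model, so the equations below are only a common textbook formulation, with all symbols introduced here as assumptions: $X$ is AI capability, $M$ decision-making quality, $Y$ innovation performance, and $W$ algorithmic transparency.

```latex
\begin{align}
M &= i_M + a\,X + g\,W + d\,(X \times W) + e_M \\
Y &= i_Y + c'\,X + b\,M + e_Y \\
\text{indirect effect of } X \text{ on } Y \text{ at a given } W &= (a + d\,W)\,b
\end{align}
```

In this formulation, a positive moderating effect corresponds to $d > 0$, and mediation corresponds to a non-zero indirect effect.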
Commonly reported gains include the automation of trivial and repetitive tasks.
Multiple studies in the review report that LLM assistants automate mundane programming tasks.
Commonly reported gains include less time spent searching for code, thanks to LLM assistance.
Synthesis of study findings noting reductions in developer time spent searching for code or answers.
Commonly reported gains from LLM assistants include accelerated development (faster task completion).
Multiple included studies report faster development workflows and reduced time-to-complete tasks, as synthesized in the review.
The majority of reviewed studies report considerable benefits from LLM assistants.
Synthesis of findings across the 39 included peer-reviewed studies as reported in the review.
Capital-managing agents should be evaluated across the full path from user mandate to prompt, validated action, and settlement.
Recommendation based on the authors' empirical deployment and analysis of failure modes and mitigation effectiveness across the end-to-end pipeline.
Targeted harness changes increased capital deployment from 42.9% to 78.0% in an affected test population.
A/B or pre/post testing in an affected test population measuring percentage of capital deployed before and after harness changes.
Targeted harness changes reduced fee-led observations from 32.5% to below 10% in an affected test population.
A/B or pre/post testing in an affected test population measuring incidence of fee-led observations before and after harness changes.
Targeted harness changes reduced fabricated sell rules from 57% to 3% in an affected test population.
A/B or pre/post testing in an affected test population measuring incidence of fabricated sell-rule observations before and after harness changes (percentage rates reported).
Policy-valid submitted transactions settled with 99.9% success.
Settlement logs comparing policy-valid submitted transactions to successful onchain settlements.
We release the full code base and a richly annotated dataset to support reproducible research on adaptive VCAs.
Paper statement announcing release of code and dataset.
The recommender achieved high relevance (MRR@1=0.75).
Recommender evaluation (offline or online) reported in the paper using the Mean Reciprocal Rank at 1 (MRR@1) metric; presumably computed over recommendations in the study (711 conversations).
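For readers unfamiliar with the metric, the sketch below uses the standard definition of Mean Reciprocal Rank at cutoff k: each query scores 1/rank of its first relevant item if that rank is at most k, and 0 otherwise. At k = 1 this reduces to the fraction of queries whose top-ranked item is relevant. The function name and the toy data are hypothetical; the data is constructed to land on 0.75 and is not the paper's.

```python
def mrr_at_k(ranked_lists, relevant_sets, k):
    """Mean reciprocal rank of the first relevant item within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(ranked_lists)


# Toy data: four conversations, each with a ranked list of recommendations.
ranked = [
    ["vpn", "mfa"],
    ["backup", "vpn"],
    ["mfa", "backup"],
    ["vpn", "mfa"],
]
relevant = [{"vpn"}, {"backup"}, {"mfa"}, {"mfa"}]

print(mrr_at_k(ranked, relevant, k=1))  # 3 of 4 top-1 hits -> 0.75
print(mrr_at_k(ranked, relevant, k=2))  # last query now scores 1/2 -> 0.875
```

Read this way, MRR@1 = 0.75 means the top recommendation was relevant in about three out of four evaluated cases.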
Step-by-step guidance improved pleasantness and reduced user burden.
User-reported measures collected in the controlled study (likely subjective ratings across participants/conversations).
Device-level evidence increased correct resolutions from about 50% with an LLM-only baseline to over 90%.
Controlled study comparing SecMate with device-level diagnostic evidence to an LLM-only baseline; reported results across 144 participants / 711 conversations.
Service specificity is achieved through a proactive, context-aware recommender.
System description and recommender component evaluation in the paper.
User specificity relies on implicit proficiency inference and profile-aware troubleshooting.
System design and algorithmic description in the paper explaining user-proficiency inference and profile-aware components.
Device specificity is provided by a lightweight local diagnostic utility.
System design and implementation details reported in the paper describing the diagnostic utility component.
We present SecMate, a multi-agent VCA for cybersecurity troubleshooting that integrates device, user, and service specificity from conversational and device-level signals.
System description and architecture presented in the paper (design and implementation of SecMate).
The framework produces a list of testable empirical questions that we leave as open problems.
Statement in the paper that it derives testable empirical questions from the theoretical framework; no empirical tests are executed in the paper itself.
The framework operationalizes aspects of earlier qualitative work on supervisory control (Sheridan, 1992), common ground (Clark & Brennan, 1991), and mixed-initiative interaction (Horvitz, 1999) within a single normative ratio.
Conceptual synthesis and mapping of prior qualitative literature into the new per-task leverage formalism presented in the paper; this is a theoretical linkage rather than empirical validation.