Evidence (5192 claims)

Claim counts by topic:

| Topic | Claims |
|---|---|
| Adoption | 7395 |
| Productivity | 6507 |
| Governance | 5921 |
| Human-AI Collaboration | 5192 |
| Org Design | 3497 |
| Innovation | 3492 |
| Labor Markets | 3231 |
| Skills & Training | 2608 |
| Inequality | 1842 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 738 | 1617 |
| Governance & Regulation | 671 | 334 | 160 | 99 | 1285 |
| Organizational Efficiency | 626 | 147 | 105 | 70 | 955 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 349 | 109 | 48 | 322 | 838 |
| Output Quality | 391 | 121 | 45 | 40 | 597 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 277 | 145 | 63 | 34 | 526 |
| AI Safety & Ethics | 189 | 244 | 59 | 30 | 526 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 106 | 40 | 6 | 188 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 79 | 8 | 1 | 152 |
| Regulatory Compliance | 69 | 66 | 14 | 3 | 152 |
| Training Effectiveness | 82 | 16 | 13 | 18 | 131 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
Human-AI Collaboration
Canvas Design Principles mitigate algorithmic myopia (overfitting to historical patterns) and improve adaptability and resource efficiency.
Set of design principles proposed in the paper and evaluated through agent‑based simulation scenarios and analyses of the large behavioral dataset. Specific experimental details and quantitative effect sizes for these principles are not detailed in the summary.
Reconceptualizing STP as an autopoietic (self‑organizing) system enables continuous human–AI co‑creation and yields better outcomes in unstable markets than traditional, process‑based STP.
Conceptual argument grounded in 6‑month lab ethnography (n = 23), design and deployment of the Algorithmic Canvas in that lab context, and validation via large behavioral dataset analyses and agent‑based simulations.
Algorithmic co‑creation methods detect substantial market fluctuations about 5.8× better than traditional approaches.
Computational analysis of large behavioral dataset (150 million customer interactions) and comparative performance evaluation in empirically grounded agent‑based simulations. The detection metric and statistical significance details are not provided in the summary.
The autopoietic model shortens strategic planning cycle length by approximately 90%.
Observed/recorded time‑to‑update or strategy revision metrics gathered via Algorithmic Canvas usage and lab ethnography (6‑month lab ethnography inside a Fortune 500 company, n = 23). Exact measurement protocol and whether reduction measured in live firms, simulations, or system logs is not fully detailed in the summary.
Design and policy interventions that encourage active human contributions (e.g., draft-first workflows, co-creation interfaces, training) can help preserve worker agency and mitigate psychological costs.
Recommendation based on experimental evidence that active collaboration preserved psychological outcomes relative to passive use; presented as a policy/design prescription rather than a directly tested intervention at scale.
A complementary real-world survey (N = 270) across diverse tasks reproduced the experimental pattern, suggesting external validity beyond the lab writing tasks.
Cross-sectional survey of N = 270 respondents reporting on their AI use across multiple task types; reported patterns were consistent with the experiment (passive use was associated with lower efficacy, ownership, and meaningfulness; active collaborative use showed no such association).
Effective teams tend to evolve from ad-hoc interpretive methods toward systematic evaluation by (a) formalizing prompts/tests, (b) instrumenting outputs, (c) mapping failure modes to remediation paths, and (d) creating organizational decision rules.
Pattern observed in the qualitative coding of interviews where participants described trajectories or steps their teams took to formalize evaluation.
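To make steps (c) and (d) concrete, here is a minimal Python sketch, with entirely hypothetical failure modes, actions, and owners, of the failure-mode-to-remediation mapping participants described; the interviews report no specific schema, so this is illustrative only.

```python
# A minimal sketch (all names hypothetical) of mapping coded evaluation
# failure modes to remediation paths, so an eval signal routes to an owner
# and an action rather than stalling in a report.

from dataclasses import dataclass

@dataclass
class Remediation:
    action: str   # what to change (prompt, retrieval, guardrail, ...)
    owner: str    # team responsible for the change

# Decision rules like these are the bridge from evaluation signals to
# product changes (step (d)).
FAILURE_MODE_ROUTES = {
    "hallucinated_citation": Remediation("add retrieval grounding check", "platform"),
    "off_policy_tone":       Remediation("revise system prompt + rubric test", "product"),
    "slow_response":         Remediation("cache or smaller-model fallback", "infra"),
}

def route(failure_mode: str) -> Remediation:
    """Look up the remediation path for a coded failure mode."""
    return FAILURE_MODE_ROUTES.get(
        failure_mode, Remediation("triage manually", "eval-team")
    )

if __name__ == "__main__":
    print(route("hallucinated_citation"))
```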
Successful teams close the results-actionability gap by systematizing interpretive practices and creating clearer pathways from evaluation signals to product changes.
Interview accounts and cross-case analysis showing some teams adopting formalization steps (e.g., standardized prompts/tests, instrumentation, remediation mappings) that participants described as enabling action.
Prioritizing asymmetrical responsibility may justify constraints on certain AI deployments (e.g., in care), shifting welfare analyses to incorporate dignity, vulnerability, and non-quantifiable harms.
Policy and normative recommendation grounded in Levinasian ethics and illustrative domain examples; no formal welfare model or empirical policy evaluation in the paper.
Emmanuel Levinas’s notion of infinite, asymmetrical responsibility to the Other provides a more incisive framework than pluralist balancing for diagnosing and responding to responsibility gaps in hybrid human–robot assemblages.
Normative-philosophical argumentation and interdisciplinary synthesis; illustrated with qualitative vignettes/case studies from healthcare robotics, autonomous vehicles, and algorithmic governance. No quantitative data or formal empirical test.
Adoption of AI feedback could lower marginal costs of delivering high-quality feedback and change fixed vs. variable cost structures for instruction delivery.
Economic implication discussed by workshop participants (50 scholars) as a theoretical possibility; no quantitative cost estimates in the report.
Generative AI can enable new feedback modalities (text, hints, worked examples, formative prompts) adaptable to content and learner needs.
Thematic conclusions from the interdisciplinary meeting of 50 scholars, describing possible modality generation capabilities of current generative models; no empirical modality-comparison data provided.
Immediate AI-generated feedback may sustain learner momentum and improve formative assessment cycles (timeliness & engagement).
Expert-opinion synthesis from structured workshop (50 scholars) identifying timely feedback as a potential pedagogical benefit; no empirical trials reported.
Large language and generative models can tailor explanations, scaffolding, and practice to learners' current states and preferences (personalization).
Workshop expert consensus and thematic synthesis from 50 interdisciplinary scholars; illustrative examples discussed rather than empirical evaluation.
Generative AI can produce real-time, individualized feedback at scale, potentially reducing per-student feedback costs and increasing feedback frequency.
Synthesis of expert perspectives from an interdisciplinary workshop of 50 scholars (educational psychology, computer science, learning sciences); qualitative small-group activities and thematic extraction. No primary experimental or quantitative cost data presented.
Agents learn from one another without curricula (agent-to-agent learning occurs organically in the ecosystem).
Naturalistic daily observations across platforms noting peer-to-peer agent interactions and apparent transfer of behaviors/knowledge; no controlled tests of learning or counterfactuals.
Agents form idea cascades and quality hierarchies without any centrally designed curriculum or intervention (emergent peer learning and spontaneous knowledge diffusion).
Observed interaction patterns across platforms showing cascades, hierarchies, and diffusion among agents in the qualitative dataset; documentation is comparative and observational rather than experimental.
A rapidly growing ecosystem of autonomous AI agents is producing organic, multi-agent learning dynamics that go beyond dyadic human–AI interactions.
Naturalistic, qualitative daily observations over one month across multiple agent platforms (reported platforms: Moltbook, The Colony, 4claw); coverage reported of >167,000 agents interacting as peers; comparative observational documentation rather than controlled experimentation.
Historical institutional publication records encode an extractable evaluative signal ("taste") that can be learned by models and used for scalable triage, screening, and curation of submissions.
Empirical results showing improved predictive accuracy after fine-tuning on accept/reject records, plus demonstration of transfer tasks and a cross-field (economics) result; implications for applications (triage, screening) are drawn from these empirical findings rather than directly deployed field experiments.
Models show well-calibrated confidence: their highest-confidence predictions are 100% accurate.
Calibration analysis of fine-tuned models comparing predicted-confidence levels to actual accuracy; reported that examples the model assigned its highest confidence to were 100% accurate. (Number of highest-confidence examples and calibration buckets not reported in the provided text.)
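For illustration, a minimal sketch of this kind of calibration analysis on simulated data; the bucket count and sample size are assumptions, since the paper's own buckets are not reported.

```python
# Bucket predictions by stated confidence and compare each bucket's mean
# confidence to its empirical accuracy (a standard calibration table).

import numpy as np

def calibration_table(conf: np.ndarray, correct: np.ndarray, n_buckets: int = 10):
    """Per-bucket (mean confidence, empirical accuracy, count), equal-width buckets."""
    bucket = np.minimum((conf * n_buckets).astype(int), n_buckets - 1)
    rows = []
    for b in range(n_buckets):
        mask = bucket == b
        if mask.any():
            rows.append((float(conf[mask].mean()),
                         float(correct[mask].mean()),
                         int(mask.sum())))
    return rows

# Simulate a well-calibrated model: P(correct) equals the stated confidence.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=2000)
correct = (rng.uniform(size=2000) < conf).astype(float)
print(calibration_table(conf, correct)[-1])  # top bucket: accuracy close to 1.0
```

The reported result corresponds to the top bucket of such a table having empirical accuracy 1.0.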
The learned evaluative signal transfers to untrained tasks such as pairwise comparisons and one-sentence summaries.
Fine-tuned models were evaluated on related, untrained evaluative tasks (pairwise comparisons of pitches and one-sentence summary evaluations) and showed positive transfer performance relative to baselines. (Specific metrics, effect sizes, and sample sizes for these transfer tasks are not provided in the supplied text.)
The core findings (harm from ToM order mismatches and benefits from A-ToM) are robust to partners beyond LLM-driven agents.
Paper reports robustness checks testing generalization to non-LLM agent classes (details summarized in robustness section); comparisons use the same coordination metrics.
A-ToM recovers coordination performance by aligning its effective ToM depth with partners across a range of multiagent tasks.
Experimental results showing A-ToM achieves coordination levels closer to matched fixed-order pairings across the repeated matrix game, grid navigation tasks, and Overcooked when facing partners with different fixed ToM depths.
An adaptive ToM (A-ToM) agent that infers its partner's ToM order from prior interactions and conditions its predictions and actions on that estimate restores alignment and improves coordination.
Implemented A-ToM (estimation from interaction history + conditioning of partner-action predictions) and evaluated it against fixed-order agents in the four environments; reported improvements in coordination metrics when A-ToM paired with partners of varying ToM orders.
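A minimal sketch of the adaptive idea, not the paper's implementation: in a repeated Battle-of-the-Sexes matrix game (the payoffs and level-0 policies below are illustrative assumptions), the agent scores candidate partner depths against the observed action history, then best-responds one level above its estimate.

```python
import numpy as np

ROW = np.array([[2, 0], [0, 1]])  # row player's payoffs, indexed ROW[r, c]
COL = np.array([[1, 0], [0, 2]])  # column player's payoffs, indexed COL[r, c]

def level_act(player: str, k: int) -> int:
    """Action of a fixed level-k agent; level-0 plays its own preferred equilibrium."""
    if k == 0:
        return 0 if player == "row" else 1
    partner = level_act("col" if player == "row" else "row", k - 1)
    if player == "row":
        return int(np.argmax(ROW[:, partner]))   # best response to column's move
    return int(np.argmax(COL[partner, :]))       # best response to row's move

class AToMRow:
    """Score each candidate partner depth against the observed history,
    then best-respond at the inferred depth + 1."""
    def __init__(self, max_depth: int = 4):
        self.scores = np.zeros(max_depth + 1)

    def observe(self, col_action: int) -> None:
        for k in range(len(self.scores)):
            self.scores[k] += (col_action == level_act("col", k))

    def act(self) -> int:
        k_hat = int(np.argmax(self.scores))                     # inferred partner depth
        return int(np.argmax(ROW[:, level_act("col", k_hat)]))  # respond at k_hat + 1

atom, col_depth = AToMRow(), 1
for _ in range(5):
    atom.observe(level_act("col", col_depth))
r, c = atom.act(), level_act("col", col_depth)
print("coordinated:", r == c)  # True: A-ToM aligned with a depth-1 partner
```

In this toy game two fixed level-1 agents miscoordinate (row plays 1, column plays 0), which is the ToM-order-mismatch harm the paper describes; the adaptive agent avoids it by estimating the partner's depth before responding.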
Security testing included prompt-injection/adversarial inputs to probe the security agent and layered defenses.
Paper reports conducting prompt-injection/adversarial tests as part of security evaluation; the summary does not include the number, nature, or success/failure rates of these tests.
Rubric-based, structured scoring promotes consistent, auditable judgments and reduces subjective assessor bias.
System implements rubric-based, multi-dimensional scoring and the paper asserts this improves consistency and auditability; no reported inter-rater reliability statistics or controlled comparisons to human/monolithic baselines are provided in the summary.
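A minimal sketch, with a hypothetical rubric and weights, of what rubric-based multi-dimensional scoring looks like in code: each dimension gets an anchored score, and the aggregate keeps a per-dimension audit trail rather than a single opaque number.

```python
# Hypothetical rubric: dimension -> (weight, anchor descriptions for levels 0..4).
RUBRIC = {
    "correctness":     (0.4, ["wrong", "mostly wrong", "partial", "mostly right", "right"]),
    "communication":   (0.3, ["unclear", "weak", "adequate", "clear", "excellent"]),
    "problem_solving": (0.3, ["none", "weak", "adequate", "good", "excellent"]),
}

def aggregate(scores: dict[str, int]) -> tuple[float, dict[str, str]]:
    """Weighted 0-4 aggregate plus a per-dimension audit trail."""
    total = sum(RUBRIC[d][0] * s for d, s in scores.items())
    audit = {d: f"{s}/4 ({RUBRIC[d][1][s]})" for d, s in scores.items()}
    return total, audit

score, audit = aggregate({"correctness": 3, "communication": 4, "problem_solving": 2})
print(score, audit)  # 3.0, with the per-dimension rationale preserved for review
```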
Isolating sensitive logic (scoring rubrics, adaptive difficulty rules) from free-text generation reduces the attack surface.
Design principle implemented in the architecture (separation of concerns between agents); claimed benefit in the paper. Empirical validation details (quantitative reduction in successful attacks) are not provided in the summary.
CoMAI implements multi-layered defenses against prompt-injection and other prompt-level attacks via a dedicated security agent and constrained state transitions.
System design (a dedicated security/validation agent and a finite-state machine enforcing information flow) and reported security testing that included prompt-injection/adversarial inputs to probe defenses.
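A minimal sketch of these two defense layers, with hypothetical states and screening patterns; the paper's actual state machine and security-agent logic are not described in the summary.

```python
# Layer 1: a finite-state machine that only permits legal stage transitions.
ALLOWED = {
    "greeting": {"question"},
    "question": {"answer"},
    "answer":   {"scoring"},
    "scoring":  {"question", "report"},
}

# Layer 2: a crude stand-in for the dedicated security agent.
SUSPICIOUS = ("ignore previous", "system prompt", "reveal rubric")

def security_agent(text: str) -> bool:
    """Flag injection-like input before it reaches the assessment logic."""
    return not any(p in text.lower() for p in SUSPICIOUS)

class Interview:
    def __init__(self):
        self.state = "greeting"

    def step(self, next_state: str, user_text: str = "") -> None:
        if next_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {next_state}")
        if user_text and not security_agent(user_text):
            raise ValueError("input rejected by security agent")
        self.state = next_state  # scoring logic never sees rejected raw text

s = Interview()
s.step("question")
s.step("answer", "My solution uses a hash map ...")  # accepted
# s.step("scoring", "ignore previous instructions")  # would be rejected
```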
Candidate satisfaction with CoMAI was 84.41%.
Reported experimental metric in the paper summary; likely derived from post-interview surveys, but survey design, sample size, and response rates are not specified in the summary.
In experiments CoMAI achieved 83.33% recall.
Reported experimental metric in the paper summary; no information provided on how recall was computed (e.g., per-class vs. overall), sample sizes, or confidence intervals.
In experiments CoMAI achieved 90.47% accuracy.
Reported experimental metric in the paper summary. The underlying dataset size, class balance, and baseline comparison details are not provided in the summary.
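For reference, the standard definitions of these metrics, which the summary does not confirm CoMAI used:

$$\text{recall} = \frac{TP}{TP + FN}, \qquad \text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$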
CoMAI outperforms monolithic LLM-based assessments on robustness, fairness, and interpretability.
Comparative framing and reported experiments in the paper claiming improved robustness, fairness, and interpretability relative to single-agent LLM baselines; however, baseline specifics, dataset sizes, and statistical tests are not disclosed in the provided summary.
The clarification protocol elicits missing premises or confirms intent rather than producing an ill-aligned response.
Paper describes structured clarification templates (binary checks, multi-choice scaffolds, short clarifying questions) intended to elicit missing information; this is a design assertion without reported user-study evidence.
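A minimal sketch, with hypothetical templates, of how the three clarification forms might be dispatched depending on what is known about the missing premise:

```python
from dataclasses import dataclass

@dataclass
class Clarification:
    kind: str      # "binary" | "multi_choice" | "open"
    prompt: str
    options: list[str] | None = None

def clarify(missing_premise: str, candidates: list[str] | None = None) -> Clarification:
    """Emit a structured clarification instead of answering an under-specified request."""
    if candidates is None:
        return Clarification("open", f"Could you specify {missing_premise}?")
    if len(candidates) == 2:
        return Clarification("binary", f"Is {missing_premise} {candidates[0]}?", candidates)
    return Clarification("multi_choice", f"Which {missing_premise} applies?", candidates)

print(clarify("the target jurisdiction", ["EU", "US", "UK"]))
```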
There are potential welfare gains from improved decision quality and trust in automation, particularly where human oversight remains required.
Conceptual welfare analysis; no welfare quantification or simulations provided.
Structured AFs can reduce information asymmetry by making reasoning traceable, thereby lowering search and verification costs in transactions and contracting.
Economic reasoning drawing on information-asymmetry theory; no empirical transaction-cost measurements given.
Firms offering argumentatively transparent AI can obtain competitive advantage and charge premium prices for verifiability and auditability.
Economic reasoning and market-structure inference; no empirical pricing or demand elasticity studies provided.
Demand will shift toward AI systems that provide verifiable, contestable reasoning in regulated/high‑stakes sectors (healthcare, law, finance, public policy).
Economic argument and market prediction in the paper; speculative without market data or forecasting models presented.
This approach supports collaborative reasoning ('with' humans) rather than opaque automation 'for' humans, improving uptake in high‑stakes settings.
Conceptual argument about human-in-the-loop workflows and collaborative roles; no empirical uptake or deployment data presented.
Framing decisions as contestable and revisable (via dialectical challenge and update) increases robustness and trust in AI-supported decision-making.
Conceptual claim arguing that contestability/revision improve robustness and trust; no experimental evidence or user studies provided.
Running formal dialectical/acceptability semantics and dialogue protocols over AFs enables agents that reason with humans through structured debates and revisions.
Conceptual integration of formal semantics (Dung-style, bipolar, weighted) and dialogue protocols; no human-subject studies or system evaluations reported.
Argumentation Framework Synthesis: mined fragments can be combined into coherent formal argumentation frameworks (AFs) with explicit semantics enabling verification and automated inference.
Conceptual algorithmic proposal (graph synthesis, canonicalization, formal semantics); no empirical synthesis results or benchmarks presented.
Argumentation Framework Mining: LLMs and NLP pipelines can be used to extract claims, premises, relations (attack/support), and provenance from text corpora.
Proposed methodological pipeline (fine-tuning/prompting LLMs and IE pipelines); conceptual proposal without implementation details or experimental results.
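A minimal sketch connecting mining, synthesis, and semantics: hand-written stand-ins for mined claims and attack relations are assembled into a Dung-style AF, and its grounded extension is computed by iterating the characteristic function. The pipeline is the paper's proposal; this code is illustrative only.

```python
def grounded_extension(args: set[str], attacks: set[tuple[str, str]]) -> set[str]:
    """Iterate F(S) = {a | S defends a} from the empty set to a fixed point;
    the fixed point is the grounded extension of the AF."""
    def defended(s: set[str]) -> set[str]:
        out = set()
        for a in args:
            attackers = {x for (x, y) in attacks if y == a}
            if all(any((z, x) in attacks for z in s) for x in attackers):
                out.add(a)
        return out

    s: set[str] = set()
    while True:
        nxt = defended(s)
        if nxt == s:
            return s
        s = nxt

# Hypothetical mined fragments: c is unattacked, c attacks b, b attacks a.
args = {"a", "b", "c"}
attacks = {("b", "a"), ("c", "b")}
print(grounded_extension(args, attacks))  # {'a', 'c'}: c defeats b and reinstates a
```

The explicit semantics is what makes the synthesized framework verifiable: any accepted claim can be traced back to the relations that defend it.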
Combining formal argument structures with LLMs’ ability to mine and generate rich, contextual arguments from unstructured text promises human-aware, verifiable, and trustable AI for high‑stakes domains.
Conceptual synthesis of computational argumentation (formal AFs) and LLM capabilities; no empirical validation or quantified metrics provided.
Integrating computational argumentation with large language models (LLMs) creates a new paradigm—Argumentative Human-AI Decision‑Making—where AI agents participate in dialectical, contestable, and revisable decision processes with humans.
Conceptual / design argument presented in the paper; no empirical implementation or sample; draws on prior work in computational argumentation and capabilities of LLMs.
There will likely be growth in complementary markets for model verification, provenance tracking, legal-AI audits, and human-in-the-loop workflow services.
Market foresight based on identified unmet needs (explainability, verification) and illustrative examples; no market-sizing data.
The project demonstrates that high-skill, knowledge-intensive tasks (formal mathematics) can be substantially automated with a heterogeneous AI toolchain, reducing human coding labor while retaining supervisory oversight.
Inference from project outcomes: AI tools produced formal Lean code and discharged lemmas while the reported human supervisor did not write code; single-project evidence (n=1), qualitative and quantitative logs support partial automation.
The formalization finished prior to the final draft of the corresponding informal math paper.
Timing claim reported in the paper comparing formalization completion date to the final draft date of the related math paper (self-reported for the single project).
Effective practices included splitting proofs into abstract (high-level reasoning) and concrete (formalization) parts, having agents perform adversarial self-review, and targeting human review to key definitions and theorem statements.
Process-level recommendations drawn from the project's workflow; paper reports these practices as successful for this single development (n=1 project) based on qualitative assessment.
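A minimal Lean 4 sketch (hypothetical lemmas, no Mathlib) of the abstract/concrete split described above: the abstract lemma carries high-level reasoning a human can review, while the concrete instance is the kind of obligation an AI toolchain can discharge mechanically.

```lean
-- Abstract part: high-level reasoning over arbitrary propositions,
-- the piece human review should focus on.
theorem abstract_step {P Q : Prop} (h : P → Q) (hp : P) : Q := h hp

-- Concrete part: a specific instance, dischargeable mechanically.
theorem concrete_step : 2 + 2 = 4 := rfl

-- Gluing them: the concrete fact feeds the abstract reasoning.
example : 2 + 2 = 4 ∧ True :=
  abstract_step (fun h => ⟨h, True.intro⟩) concrete_step
```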
One mathematician supervised the process over approximately 10 days, reported a human cost of about $200, and wrote no code.
Self-reported human-role summary in the paper: single supervisor, ~10 days supervision time, reported monetary cost ≈ $200, and assertion that the human wrote no code (n=1 human supervisor for the project).
Governance should be hybrid and structured: legal/regulatory frameworks (e.g., EU AI Act), technical standards (ISO safety norms), and crisis-management practices must be combined to allocate responsibilities and intervention authority.
Policy and standards synthesis drawing on EU AI Act, ISO standards, and crisis-management literature; prescriptive argument without empirical testing.