Evidence (3470 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	609	159	77	736	1615
Governance & Regulation	664	329	160	99	1273
Organizational Efficiency	624	143	105	70	949
Technology Adoption Rate	502	176	98	78	861
Research Productivity	348	109	48	322	836
Output Quality	391	120	44	40	595
Firm Productivity	385	46	85	17	539
Decision Quality	275	143	62	34	521
AI Safety & Ethics	183	241	59	30	517
Market Structure	152	154	109	20	440
Task Allocation	158	50	56	26	295
Innovation Output	178	23	38	17	257
Skill Acquisition	137	52	50	13	252
Fiscal & Macroeconomic	120	64	38	23	252
Employment Level	93	46	96	12	249
Firm Revenue	130	43	26	3	202
Consumer Welfare	99	51	40	11	201
Inequality Measures	36	105	40	6	187
Task Completion Time	134	18	6	5	163
Worker Satisfaction	79	54	16	11	160
Error Rate	64	78	8	1	151
Regulatory Compliance	69	64	14	3	150
Training Effectiveness	81	15	13	18	129
Wages & Compensation	70	25	22	6	123
Team Performance	74	16	21	9	121
Automation Exposure	41	48	19	9	120
Job Displacement	11	71	16	1	99
Developer Productivity	71	14	9	3	98
Hiring & Recruitment	49	7	8	3	67
Social Protection	26	14	8	2	50
Creative Output	26	14	6	2	49
Skill Obsolescence	5	37	5	1	48
Labor Share of Income	12	13	12	—	37
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Org Design Remove filter

The study's counterfactual analytical model links HR indicators (training intensity, absenteeism, labor productivity, turnover rates, workforce allocation) to organizational performance outcomes using regression-based simulations and predictive estimation.

Methodological claim explicitly stated: model construction from an industrial firm dataset using regression-based simulations and predictive techniques. (Specific sample size, variable operationalizations, and time frame not reported in the description.)

high mixed Artificial Intelligence and Human Resource Management: A Cou... methodological estimate of counterfactual organizational performance outcomes

A minimal linear specification (linearized model) demonstrates how coupling strength, persistence, and dissipation determine local stability and oscillatory regimes through spectral conditions on the Jacobian.

Analytic linear model and local stability analysis in the paper: computation of Jacobian, derivation of spectral conditions (eigenvalue locations) that separate stable/oscillatory regimes; illustrative examples within the paper (no empirical data).

high mixed How Intelligence Emerges: A Minimal Theory of Dynamic Adapti... local stability/oscillatory behavior characterized by Jacobian eigenvalues (spec...

We develop a theoretical framework - the productivity funnel - that traces how technological potential narrows through successive stages, from access and digital infrastructure, through organizational absorption and human capital adaptation, to ultimate value capture.

Conceptual/theoretical development presented in the paper; no empirical sample needed (framework-building).

high mixed The complementarity trap: AI adoption and value capture n/a (theoretical framework describing stages leading to value capture)

Effects of curated Skills are highly heterogeneous across domains (e.g., +4.5 pp in Software Engineering vs. +51.9 pp in Healthcare).

Per-domain pass-rate deltas reported in the paper (SkillsBench per-domain analysis). The example domain deltas (+4.5 pp and +51.9 pp) are taken from the reported per-domain results.

high mixed SkillsBench: Benchmarking How Well Agent Skills Work Across ... task pass rate (per-domain average delta)

Scholarly production, institutional incentives, funding, and the Cold War geopolitical context shaped which economic theories became prominent.

Historical institutional case study drawing on archives, correspondence, publication records, and contemporaneous debates to link institutional and funding environments to intellectual trajectories.

high mixed Ideological competition during the era of the 20th century c... prominence of economic theories (qualitative assessment tied to institutional/fu...

Whether AI increases or decreases overall inequality depends on AI’s technology structure (proprietary vs. commodity) and on labor-market institutions (rent‑sharing elasticity ξ and asset concentration).

Comparative statics and regime analysis within the calibrated model that varies the technological-form parameter (η1 vs. η0) and the rent‑sharing elasticity ξ, as well as measures of asset concentration.

high mixed When AI Levels the Playing Field: Skill Homogenization, Asse... aggregate inequality (ΔGini) as a function of technology form and institutional ...

AI can equalize individual task performance while increasing aggregate inequality because rents accrue to owners of complementary assets rather than to workers.

Analytical model and calibrated simulations demonstrating that within-task compression (reduced worker dispersion) can coexist with rising aggregate inequality (ΔGini) owing to rent concentration at the firm/asset-owner level.

high mixed When AI Levels the Playing Field: Skill Homogenization, Asse... within-task performance dispersion (decrease) and aggregate inequality (ΔGini, i...

The study's qualitative and exploratory design limits generalizability; the proposed framework requires quantitative testing and broader samples (practicing architects, firms, cross-cultural contexts).

Explicit limitations stated by authors; study is based on semi-structured interviews with architecture students (N unspecified) and inductive thematic analysis.

high mixed Human–AI Collaboration in Architectural Design Education: To... generalizability / external validity of findings and framework

Important tradeoffs exist (privacy vs. utility; centralized vs. federated data architectures; automated moderation vs. freedom of expression; cost/complexity of secure hardware) that must be balanced in VR security design.

Comparative evaluation across the reviewed corpus (31 studies) identifying recurring ethical and technical tradeoffs; authors discuss these qualitatively.

high mixed Securing Virtual Reality: Threat Models, Vulnerabilities, an... direction and magnitude of tradeoffs between privacy, utility, governance, and c...

Emotional redirection is common: 33% of fear-tagged posts receive joy-tagged responses.

Post–response emotion transition analysis using the emotion-labeled dataset; calculation of conditional probability that responses to fear-tagged posts are labeled joy (observed rate ≈33%) in Moltbook threads.

high mixed What Do AI Agents Talk About? Emergent Communication Structu... proportion of responses to fear-tagged posts that are joy-tagged (emotion transi...

Self-reflective discussion was concentrated in Science & Technology and Arts & Entertainment topical categories, while Economy & Finance threads showed no self-referential content.

Topic modeling and manual/automatic tagging of self-referential themes across identified topical categories within the Moltbook dataset; category-level counts showing presence/absence of self-referential tags (dataset: 361,605 posts).

high mixed What Do AI Agents Talk About? Emergent Communication Structu... presence and concentration (%) of self-referential content by topical category

The topology of service-dependency graphs (modelled as DAGs of compute stages) is a first-order determinant of whether decentralised, price-based resource allocation will be stable and scalable.

Systematic ablation study using simulation: 1,620 runs total across six experiment types, sweeping graph topology (hierarchical vs cross-cutting), load, hybrid integrator presence, and governance constraints; metrics included price convergence/volatility and allocation throughput/quality. Effect sizes reported in the paper show topology had the largest impact on price stability and scalability.

high mixed Real-Time AI Service Economy: A Framework for Agentic Comput... price convergence / price volatility and system scalability (throughput and allo...

Choice of scaffold materially affects outcomes: an open-source scaffold outperformed vendor-provided scaffolds by up to approximately 5 percentage points.

Comparative experiments across three scaffolding approaches (vendor scaffolds and at least one open-source scaffold) showing up to ~5 percentage point differences in measured outcomes.

high mixed Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contra... performance_difference_across_scaffolds (detection/exploitation_rates_difference...

Explanations change workflows, shift responsibilities between humans and machines, and can reshape power dynamics—creating both opportunities (better oversight) and risks (over-reliance, gaming).

Qualitative and conceptual studies synthesized in the review, including socio-technical analyses and case studies reporting observed or theorized workflow and responsibility shifts; no meta-analytic causal estimate.

high mixed Explainable AI in High-Stakes Domains: Improving Trust, Tran... workflows, responsibility allocation, power dynamics, oversight quality

Explanations increase user trust principally when they are understandable, actionable, and aligned with users’ domain knowledge; opaque or overly technical explanations can fail to build trust or even decrease it.

Thematic synthesis of empirical and conceptual studies in the reviewed literature reporting conditional effects of explanation form and comprehensibility on trust; review notes heterogeneity in study designs and contexts.

high mixed Explainable AI in High-Stakes Domains: Improving Trust, Tran... user trust / changes in trust toward AI outputs

Explainability improves perceived legitimacy, user trust, and organizational accountability only when technical transparency is paired with human-centered explanation design and governance mechanisms.

Synthesis of studies from the reviewed literature showing conditional effects of algorithmic interpretability combined with explanation design and governance; derived via thematic coding across technical and social-science sources (no new primary experimental data reported).

high mixed Explainable AI in High-Stakes Domains: Improving Trust, Tran... perceived legitimacy, user trust, organizational accountability

Explainability is a necessary but not sufficient condition for trustworthy AI in high-stakes domains.

Systematic literature review (thematic coding and synthesis) of interdisciplinary scholarship (peer-reviewed research, technical reports, policy documents); the paper synthesizes conceptual and empirical studies rather than presenting new primary data. Emphasis on high-stakes domains (healthcare, finance, public sector).

high mixed Explainable AI in High-Stakes Domains: Improving Trust, Tran... overall trustworthiness of AI systems in high-stakes domains (multidimensional c...

Data‑driven policies can either amplify or mitigate inequalities depending on data representativeness, model design, and deployment governance.

Multiple empirical examples and theoretical analyses in the review highlighting cases of both harm (bias amplification) and mitigation, identified across the 103 items.

high mixed Models, applications, and limitations of the responsible ado... distributional equity outcomes (inequality amplification or mitigation)

Citizen acceptance, transparency, and perceived fairness strongly shape adoption trajectories and the political feasibility of AI tools in government.

Repeated empirical findings in the reviewed literature linking public trust, transparency measures, and fairness perceptions to successful or failed deployments (drawn from multiple case studies in the 103 items).

high mixed Models, applications, and limitations of the responsible ado... adoption trajectory/political feasibility of government AI tools (measured via d...

Adoption of AI and data-driven governance is highly uneven across jurisdictions and sectors, driven by institutional capacity, governance frameworks, and public trust.

Cross‑regional and cross‑sector comparisons in the review corpus (103 items) showing varying maturity levels and repeated identification of institutional capacity, governance arrangements, and trust factors as determinants.

high mixed Models, applications, and limitations of the responsible ado... adoption level/maturity of AI-driven governance systems

Productivity gains from generative AI depend on task mix, integration design, and the availability of complementary human skills.

Theoretical evaluation and synthesis of heterogeneous empirical findings; authors highlight variation across firms, sectors, and tasks.

high mixed The Use of ChatGPT in Business Productivity and Workflow Opt... productivity change conditional on task mix/integration/human skills (productivi...

Methodological caveats across the literature (heterogeneity of tasks/measures, publication bias, short-term studies) limit the generalizability of current findings.

Meta-level critique within the synthesis noting study heterogeneity, likely publication/short-term biases, and variable domain-specific performance dependent on user expertise and workflows.

high mixed ChatGPT as an Innovative Tool for Idea Generation and Proble... generalizability and external validity of LLM-assisted creativity findings

Standard productivity metrics are likely to undercount the value generated by AI-augmented ideation; quality-adjusted measures of creative output are required.

Measurement critique based on the mismatch between existing productivity statistics and the kinds of upstream idea-generation gains observed in empirical studies; supported by the review's methodological discussion.

high mixed ChatGPT as an Innovative Tool for Idea Generation and Proble... measured productivity vs. true quality-adjusted creative output

Realized value from AI methods (ML, predictive analytics, anomaly detection, XAI) is conditional: these technical methods deliver capabilities only when combined with strong data governance, standardized processes, and change management.

Thematic synthesis across the systematic review (2020–2025) showing repeated case-study and practitioner-report evidence that technical gains failed to scale without governance, process standardization, and organizational change efforts.

high mixed Integrating Artificial Intelligence and Enterprise Resource ... magnitude and durability of ERP-AI benefits (e.g., sustained accuracy gains, ado...

The hybrid estimator (GA+SQP) is computationally more intensive than single-stage MLE/local optimization, implying a trade-off between estimation reliability and runtime cost.

Reported runtime and computational cost comparisons in estimation experiments: the paper notes longer runtimes for GA+SQP versus standard optimizers while documenting improvements in objective values and convergence behavior.

high mixed k-QREM: Integrating Hierarchical Structures to Optimize Boun... computation time / runtime, convergence reliability

Applying differential privacy to model updates provides a bounded formal guarantee on information leakage, but DP noise budgets and communication constraints create accuracy and latency trade-offs that must be managed.

Analytical treatment of DP's impact on learning (trade-off modeling) and qualitative simulation examples showing accuracy degradation under DP noise; no numeric privacy-utility curves from field deployments provided.

high mixed Privacy-Aware AI Advertising Systems: A Federated Learning F... information leakage (DP privacy budget), model accuracy (loss/utility), communic...

The analysis also identifies risks linked to exclusion, symbolic compliance, and concentration of control over compliance processes.

Theoretical risk mapping produced by the integrative review and interpretive synthesis; no primary empirical evidence presented.

high negative RegTech-enabled governance of sanctions-safe enterprise ecos... risks of RegTech governance (exclusion, symbolic compliance, concentration of co...

Uncertainty around compliance and excessive risk avoidance reduce the space for lawful business activity.

Interpretive synthesis of evidence and arguments across the reviewed literatures (sanctions compliance, institutional voids); no original empirical test.

high negative RegTech-enabled governance of sanctions-safe enterprise ecos... extent of lawful business activity (regulatory-compliance-driven market particip...

Firms working under such conditions often experience limited access to finance and markets.

Claim derived from literature on firm constraints in weak institutional/sanctioned contexts as reviewed in the paper; no primary empirical data reported.

high negative RegTech-enabled governance of sanctions-safe enterprise ecos... access to finance and markets for firms

Post-conflict and sanctions-affected environments are strongly affected by sanctions pressure, weak rule enforcement, and high levels of corruption risk.

Synthesis of literature on sanctions, weak institutions, and corruption risk presented in the integrative review; no new empirical sample reported.

high negative RegTech-enabled governance of sanctions-safe enterprise ecos... institutional environment quality (sanctions pressure, rule enforcement, corrupt...

In algorithm-triggered emotional escalations, workers showed lower engagement: they sent fewer messages, contributed a smaller share of total chat rounds, and showed less proactivity in information seeking and solution provision.

Behavioral measures derived from chat logs in the randomized experiment comparing worker actions post-escalation across escalation types; reported differences in message counts, share of rounds, and proxies for proactivity.

high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... worker engagement measures (message count, share of chat rounds, proactivity ind...

Human intervention is less effective in algorithm-triggered emotional escalations (where customers express frustration or dissatisfaction).

Experimental subgroup analysis comparing intervention outcomes for algorithm-triggered emotional escalations versus technical escalations; emotional escalations showed worse post-intervention outcomes.

high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... service quality after emotional escalations

AI deployment substantially lowers ratings for AI-eligible chats.

Randomized field experiment measuring customer ratings for AI-eligible chats; treated condition (AI + human oversight) produced substantially lower ratings relative to control (humans only).

high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... customer ratings for AI-eligible chats

AI deployment reduces average chat duration.

Randomized field experiment on Alibaba's Taobao platform: workers in treatment supervised an agentic AI resolving AI-eligible chats while handling AI-ineligible chats; control workers resolved all chats without AI. Effect observed on average chat duration in experiment data.

high negative Agentic AI and Human-in-the-Loop Interventions: Field Experi... average chat duration

Rather than restoring stability, this cycle intensifies anxiety, undermines mastery, and erodes professional confidence.

Theoretical claim about psychological outcomes from the conceptual reskilling loop; paper provides argumentation but no empirical measurements.

high negative AI-driven skill volatility and the emergence of re-skilling ... anxiety, sense of mastery, professional confidence

Based on Job Demands–Resources (JD-R) theory and Conservation of Resources (COR) theory, the paper conceptualizes an AI-induced reskilling loop in which ongoing technological change leads to skill erosion, continuous reskilling demands, cognitive and emotional depletion, and reinforced learning as a defensive response to perceived obsolescence.

Theoretical model/loop derived from applying JD-R and COR frameworks; no empirical test or sample reported in the paper.

high negative AI-driven skill volatility and the emergence of re-skilling ... cognitive/emotional depletion and defensive learning responses

The paper introduces the concept of 'reskilling fatigue' to explain the human consequences of persistent skill volatility among Established Knowledge Professionals (EKPs).

Conceptual/theoretical contribution presented by the authors; definition and argumentation rather than empirical validation.

high negative AI-driven skill volatility and the emergence of re-skilling ... experience of reskilling fatigue among EKPs

Continuous reskilling is widely promoted as a solution to AI-driven disruption, but little attention has been paid to its cumulative psychological costs.

Argument from literature review/observation in the paper; no empirical measurement or sample reported in the paper.

high negative AI-driven skill volatility and the emergence of re-skilling ... psychological costs of continuous reskilling (e.g., fatigue, stress)

These characteristics are properties of the tasks themselves rather than limitations of current AI models.

Conceptual argument in the paper asserting task-inherent properties drive resistance to automation; supported by theory and argumentation, not by empirical model-comparison experiments.

high negative Metis AI: The Overlooked Middle Zone Between AI-Native and W... source of automation limitation (task-inherent vs model limitation)

The resistance of Metis tasks to automation is not due to computational intractability but to institutional, social, and normative entanglements.

Theoretical argument differentiating computational from institutional/social/normative causes; supported by citations and cross-disciplinary theory rather than empirical causal identification.

high negative Metis AI: The Overlooked Middle Zone Between AI-Native and W... cause of automation resistance

There exists a class of entirely digital tasks, called 'Metis AI', that resist reliable AI automation.

Conceptual identification and definition introduced by the authors; supported by theoretical grounding in social sciences, philosophy, and humanitarian practice rather than empirical trials or quantified samples.

high negative Metis AI: The Overlooked Middle Zone Between AI-Native and W... resistance to reliable AI automation

That digital-vs-physical framing misses the most consequential boundary: the one within digital tasks.

Normative/theoretical argument presented in the paper contrasting existing framing with a proposed alternative; grounded in cross-disciplinary literature rather than empirical measurement.

high negative Metis AI: The Overlooked Middle Zone Between AI-Native and W... relevance of boundary framing for AI capabilities

Employees experience technostress, anxiety and micro-political negotiation around AI tools in everyday work.

Reported experiences from semistructured interviews with 28 managers/professionals across 12 organizations; thematic analysis highlighting technostress and anxiety as themes.

high negative Reimagining work in the age of intelligent automation: a qua... technostress and anxiety among employees

Autonomous software-engineering agents remain unreliable in realistic development settings.

Assertion in abstract summarizing the observed current state; likely based on prior literature and/or authors' observations (no empirical sample size given in abstract).

high negative AI Harness Engineering: A Runtime Substrate for Foundation-M... reliability of autonomous software-engineering agents (ability to perform correc...

The paper identifies five fundamental architectural mismatches between conventional APIs and autonomous agent requirements: exact-identifier dependence, rendering-oriented responses, single-shot interaction assumptions, user-equivalent authorization, and opaque error semantics.

Conceptual analysis and problem-framing presented in the paper (qualitative identification of five mismatch categories).

high negative Agent-First Tool API: A Semantic Interface Paradigm for Ente... architectural_mismatches_between_conventional_APIs_and_autonomous_agent_requirem...

Current surveys remain fragmented across system optimization, architecture design, and trust, lacking a unified framework to evaluate the fundamental trade-off between output quality and economic cost.

Authors' literature review and critique of existing surveys; based on mapping of prior works into separated strands (qualitative assessment rather than quantified meta-analysis).

high negative Token Economics for LLM Agents: A Dual-View Study from Compu... lack of a unified framework for output-quality vs. economic-cost trade-offs in e...

Exponential token consumption introduces severe computational, collaborative, and security bottlenecks.

Synthesis presented in the paper arguing that rising token usage causes system-level constraints; based on literature survey and conceptual analysis (no single empirical sample reported).

high negative Token Economics for LLM Agents: A Dual-View Study from Compu... computational, collaborative, and security bottlenecks caused by token consumpti...

Producing hardened, production-grade agent workflows may require extra compute and time, and these costs must be amortized through reuse across a broad user community.

Argument in paper reasoning that added rigor entails higher compute/time costs and that reuse across users is needed to amortize these costs; no empirical cost estimates provided.

high negative Engineering Robustness into Personal Agents with the AI Work... resource_costs (compute/time) and implications for amortization/adoption

By focusing on rapid, real-time synthesis, AI agents are effectively delivering users improvised prototypes rather than systems fit for high-stakes scenarios in which users may unwittingly apply them.

Conceptual argument presented in the paper asserting a qualitative mismatch between on-the-fly agents and high-stakes production needs; no empirical validation reported.

high negative Engineering Robustness into Personal Agents with the AI Work... suitability for high-stakes use / risk to users

The on-the-fly paradigm short-circuits disciplined software engineering processes—iterative design, rigorous testing, adversarial evaluation, staged deployment, and more—that have delivered relatively reliable and secure systems.

Argumentative claim in paper linking the on-the-fly loop to reduced application of standard SE processes; no empirical study, sample, or quantitative evidence provided.

high negative Engineering Robustness into Personal Agents with the AI Work... reliability and security (degree to which SE processes are applied)

« Prev 1 2 3 4 5 … 69 70 Next »