Evidence (8066 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	417	113	67	480	1091
Governance & Regulation	419	202	124	64	823
Research Productivity	261	100	34	303	703
Organizational Efficiency	406	96	71	40	616
Technology Adoption Rate	323	128	74	38	568
Firm Productivity	307	38	70	12	432
Output Quality	260	71	27	29	387
AI Safety & Ethics	118	179	45	24	368
Market Structure	107	128	85	14	339
Decision Quality	177	75	37	19	312
Fiscal & Macroeconomic	89	58	33	22	209
Employment Level	74	34	78	9	197
Skill Acquisition	98	36	40	9	183
Innovation Output	121	12	24	13	171
Firm Revenue	98	35	24	—	157
Consumer Welfare	73	31	37	7	148
Task Allocation	87	16	34	7	144
Inequality Measures	25	76	32	5	138
Regulatory Compliance	54	61	13	3	131
Task Completion Time	89	7	4	3	103
Error Rate	44	51	6	—	101
Training Effectiveness	58	12	12	16	99
Worker Satisfaction	47	33	11	7	98
Wages & Compensation	54	15	20	5	94
Team Performance	47	12	15	7	82
Automation Exposure	27	26	10	6	72
Job Displacement	6	39	13	—	58
Hiring & Recruitment	40	4	6	3	53
Developer Productivity	34	4	3	1	42
Social Protection	22	11	6	2	41
Creative Output	16	7	5	1	29
Labor Share of Income	12	6	9	—	27
Skill Obsolescence	3	20	2	—	25
Worker Turnover	10	12	—	3	25

There exists an optimal level of data (big data) sharing that achieves the best balance between economic development and privacy, thereby maximizing individuals' welfare.

Analytical optimization within the theoretical macro model: the model yields an interior optimum for data-sharing intensity that trades off economic gains against privacy costs (an analytical derivation; no empirical test).

high positive Study on the impact of big data sharing on individuals’ welf... individuals' welfare maximization via optimal data-sharing level

Structured intent representations (PPS) can improve alignment and usability in human–AI interaction, especially in tasks where user intent is inherently ambiguous.

Synthesis of the experimental findings (rendered PPS scored higher on goal_alignment overall, with task-dependent gains concentrated in high-ambiguity business tasks) and the preliminary user survey.

high positive Evaluating 5W3H Structured Prompting for Intent Alignment in... alignment_and_usability

A preliminary retrospective survey (N = 20) suggests a 66.1% reduction in follow-up prompts required, from 3.33 to 1.13 rounds, when using PPS.

Authors report a small retrospective survey of N = 20 respondents comparing number of follow-up prompt rounds required before vs after adopting PPS (self-reported).

high positive Evaluating 5W3H Structured Prompting for Intent Alignment in... number_of_follow-up_prompt_rounds_required
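
The 66.1% figure in the claim above can be reproduced from the two reported means. The short check below is only an arithmetic sketch; the variable and function names are illustrative and do not come from the paper.

```python
# Means quoted in the claim above (self-reported survey, N = 20).
rounds_before = 3.33  # mean follow-up rounds before adopting PPS
rounds_after = 1.13   # mean follow-up rounds after adopting PPS


def relative_reduction(before: float, after: float) -> float:
    """Return the fractional drop from `before` to `after`."""
    return (before - after) / before


# Express the drop as a percentage, rounded to one decimal place.
reduction_pct = round(100 * relative_reduction(rounds_before, rounds_after), 1)
print(reduction_pct)  # 66.1
```

The result matches the reported reduction, so the percentage is a relative change in the mean number of rounds rather than a share of respondents.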

We introduce goal_alignment, a user-intent-centered evaluation dimension, and find that natural-language-rendered PPS outperforms both simple prompts and raw PPS JSON on this metric.

Experimental comparison across the three prompt conditions using the goal_alignment evaluation dimension applied to the collected outputs (540 outputs across 60 tasks and 3 models), as judged by an LLM judge.

high positive Evaluating 5W3H Structured Prompting for Intent Alignment in... goal_alignment
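
The count of 540 outputs is consistent with a fully crossed design, assuming exactly one output per combination of task, model, and prompt condition. That one-output-per-cell assumption is ours and is not stated in the excerpt; the condition labels below paraphrase the three conditions named in the claim.

```python
from itertools import product

n_tasks = 60   # tasks, as reported
n_models = 3   # models, as reported
# The three prompt conditions named in the claim (labels paraphrased).
conditions = ["simple prompt", "raw PPS JSON", "natural-language-rendered PPS"]

# Assumption: one output per (task, model, condition) cell.
runs = list(product(range(n_tasks), range(n_models), conditions))
total_outputs = len(runs)
print(total_outputs)  # 540
```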

The Institutional Scaling Law predicts that the next phase transition will be driven not by larger models but by better-orchestrated systems of domain-specific models adapted to specific institutional niches.

Predictive conclusion derived from the Institutional Scaling Law and theoretical analysis in the paper. No empirical validation or sample size reported in the excerpt.

high positive The Institutional Scaling Law: Non-Monotonic Fitness, Capabi... drivers of the next phase transition in AI (orchestration of domain-specific sys...

A Symbiogenetic Scaling correction demonstrates that orchestrated systems of domain-specific models can outperform frontier generalists in their native deployment environments.

Theoretical correction/derivation and comparative analysis within the paper (no empirical sample or quantitative benchmark reported in the excerpt).

high positive The Institutional Scaling Law: Non-Monotonic Fitness, Capabi... performance of orchestrated domain-specific model systems versus frontier genera...

A mixed-methods empirical research agenda is presented, proposing a future PLS-SEM approach to test the mediating role of the cognitive flywheel and the moderating effect of fractal governance on organizational resilience.

Methodological proposal described in the paper (research design and proposed analytic approach); no executed empirical study or sample reported.

high positive Governing Human–AI Co-Evolution: Intelligentization Capabili... organizational_resilience (as mediator/moderator relationships to be tested)

Fractal governance architecture is proposed to mitigate systemic vulnerabilities such as automation bias.

Conceptual proposal of a governance design in the paper; no empirical test or sample provided.

high positive Governing Human–AI Co-Evolution: Intelligentization Capabili... reduction_in_automation_bias / improvement_in_decision_quality

The cognitive flywheel is the central mechanism of this dynamic capability, and the paper operationalizes it.

Theoretical operationalization within the paper (concept definition and proposed operational measures); no empirical measurement or sample reported.

high positive Governing Human–AI Co-Evolution: Intelligentization Capabili... mechanism_operationalization (cognitive_flywheel)

The co-evolutionary dynamic is formalized using coupled non-linear differential equations and time decay integrals.

Mathematical formalization reported in the paper (modeling methods described); no empirical parameter estimation or sample provided.

high positive Governing Human–AI Co-Evolution: Intelligentization Capabili... existence_of_mathematical_model/formal_framework

Dynamic cognitive advantage arises from the historical, recursive, structural coupling of human semantic intent and machine syntactic processing (a co-evolutionary dynamic).

Conceptual theory introduced and argued in the paper (mechanism-level proposition); formalization provided but no empirical validation.

high positive Governing Human–AI Co-Evolution: Intelligentization Capabili... competitive_differentiation/innovation_output

Conceptualizing the enterprise as a complex adaptive system operating far from thermodynamic equilibrium provides a more appropriate framing for organizations integrating AI and enables the theory of dynamic cognitive advantage.

Theoretical development and conceptual argumentation within the paper; formal framing rather than empirical test; no sample reported.

high positive Governing Human–AI Co-Evolution: Intelligentization Capabili... competitive_differentiation/innovation_output

We propose a multi-agent discussion framework wherein specialized agents collaboratively process extensive product information, distributing cognitive load to alleviate single-agent attention bottlenecks and capturing critical decision factors through structured dialogue.

Method description: multi-agent discussion architecture described and implemented; claimed to distribute cognitive load and reduce single-agent attention bottlenecks (design + reported behavior).

high positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... reduction of single-agent attention bottlenecks / distributed processing of prod...

To enhance simulation stability, we implement a mean-field mechanism designed to model the dynamic interactions between the product environment and customer populations, effectively stabilizing sampling processes within high-dimensional decision spaces.

Method description: implementation of a mean-field mechanism within the simulator; paper asserts this design stabilizes sampling in high-dimensional decision spaces (method + reported simulation behavior).

high positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... simulation stability / stabilized sampling processes

We introduce a preference learning paradigm in which LLMs are economically aligned via post-training on extensive, heterogeneous transaction records across diverse product categories.

Method description: post-training LLMs on heterogeneous transaction records across product categories to align preferences (methodological / training procedure described).

high positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... ability of models to internalize consumer preferences via post-training

This paper introduces a Multi-Agent Large Language Model-based Economic Sandbox (MALLES) as a unified simulation framework applicable to cross-domain and cross-category scenarios.

Paper description: design and implementation of MALLES, presented as a unified framework leveraging large-scale LLM generalization for cross-domain/cross-category simulation (methodological contribution).

high positive MALLES: A Multi-agent LLMs-based Economic Sandbox with Consu... existence and applicability of MALLES as a unified simulation framework

Leaders' AI symbolization lessens AI's negative impact on employees' emotional exhaustion.

Moderation analysis in the four-stage longitudinal study of 285 finance professionals; leader AI symbolization tested as moderator of AI usage -> emotional exhaustion path.

high positive Autonomous enhancement or emotional depletion? The dual-path... emotional exhaustion (moderated by leaders' AI symbolization)

Leaders' AI symbolization strengthens AI's positive effect on employees' sense of self-determination.

Moderation analysis within the same four-stage longitudinal survey of 285 finance professionals; leader AI symbolization tested as moderator of AI usage -> sense of self-determination path.

high positive Autonomous enhancement or emotional depletion? The dual-path... sense of self-determination (moderated by leaders' AI symbolization)

AI usage can boost innovative work behavior by enhancing employees' sense of self-determination.

Four-stage longitudinal study (survey) of finance professionals (N=285); mediation analysis testing AI usage -> sense of self-determination -> innovative work behavior, grounded in SOR theory.

high positive Autonomous enhancement or emotional depletion? The dual-path... innovative work behavior (mediated by sense of self-determination)

Retrieval substantially improves reasoning over textual fundamentals.

Result reported from the experiments comparing zero-shot prompting to retrieval-augmented settings on fundamentals-focused questions; the paper asserts that retrieval provided substantial improvement for textual fundamentals reasoning.

high positive FinTradeBench: A Financial Reasoning Benchmark for LLMs improvement in reasoning/performance on fundamentals-focused questions with retr...

Human-AI systems should be designed under a cognitive sustainability constraint so that gains in hybrid performance do not come at the cost of degradation in human expertise.

Normative recommendation in the paper based on the conceptual/mathematical framework and the identified trade-off; presented as an argument rather than empirically validated policy outcome in the excerpt.

high positive Cognitive Amplification vs Cognitive Delegation in Human-AI ... preservation of human expertise under human-AI design choices

Together, these quantities provide a low-dimensional metric space for evaluating whether human-AI systems achieve genuine synergistic performance and whether such performance is cognitively sustainable for the human component over time.

Claim about the utility of the defined metrics, supported within the paper by the conceptual/mathematical framework and the proposed metric definitions (theoretical demonstration rather than reported empirical validation in the excerpt).

high positive Cognitive Amplification vs Cognitive Delegation in Human-AI ... hybrid human-AI performance and cognitive sustainability

The paper defines a set of operational metrics: the Cognitive Amplification Index (CAI*), the Dependency Ratio (D), the Human Reliance Index (HRI), and the Human Cognitive Drift Rate (HCDR).

Explicit listing of newly proposed operational metrics in the paper; this is a descriptive claim about the paper's content (theoretical definitions), no sample size or empirical estimation provided in the excerpt.

high positive Cognitive Amplification vs Cognitive Delegation in Human-AI ... operational metrics for human-AI cognitive interaction (CAI*, D, HRI, HCDR)

The paper introduces a conceptual and mathematical framework to distinguish cognitive amplification (AI improves hybrid human-AI performance while preserving human expertise) from cognitive delegation (reasoning is progressively outsourced to AI).

Explicit contribution claim in the paper (description of a conceptual and mathematical framework); evidence consists of the model and formal definitions presented in the paper (no external empirical validation reported in the excerpt).

high positive Cognitive Amplification vs Cognitive Delegation in Human-AI ... mode of human-AI interaction (amplification vs delegation)

Artificial intelligence generates positive spatial spillovers for UCEE (positive effects on neighboring regions).

Spatial Durbin model reported in the abstract indicating positive spillover coefficients for artificial intelligence.

high positive How artificial intelligence and environmental regulation inf... UCEE index (spatial spillover effect of AI)

The Global Malmquist–Luenberger (GML) index and its efficiency change (EC) and technological change (TC) components stay above 1, indicating sustained efficiency gains dominated by technological progress.

GML index and decomposition results reported in the abstract based on the panel data and GML computation.

high positive How artificial intelligence and environmental regulation inf... GML index and its EC and TC components (measures of productivity/efficiency chan...
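
For readers unfamiliar with the decomposition, Malmquist-type productivity indices are conventionally written as the product of the efficiency change and technological change components, GML = EC × TC, with values above 1 indicating improvement. The sketch below illustrates that reading only. The input values are hypothetical placeholders, not figures from the paper, and the "dominant component" rule is a simple illustrative comparison.

```python
def gml_index(ec: float, tc: float) -> float:
    """Combine efficiency change (EC) and technological change (TC)."""
    return ec * tc


def dominant_component(ec: float, tc: float) -> str:
    """Name the component that contributes the larger gain."""
    return "TC" if tc > ec else "EC"


# Hypothetical values chosen only to illustrate the pattern in the claim:
# both components above 1, with technological change the larger of the two.
ec, tc = 1.01, 1.04
gml = gml_index(ec, tc)
print(gml > 1, dominant_component(ec, tc))
```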

Nationally, the average UCEE index rises from about 0.3 to above 0.7 over the sample period.

Computed UCEE index results from the Super-SBM model applied to the panel of 30 provinces (2013–2022) as reported in the abstract.

high positive How artificial intelligence and environmental regulation inf... UCEE index (average, national)

Recent advances in large language models, tool-using agents, and financial machine learning are shifting financial automation from isolated prediction tasks to integrated decision systems that can perceive information, reason over objectives, and generate or execute actions.

Literature synthesis and conceptual statement in the paper's introduction describing recent technological advances and their effects on financial automation; no empirical sample size reported.

high positive AI Agents in Financial Markets: Architecture, Applications, ... shift in type of financial automation (from isolated prediction to integrated de...

SOL-ExecBench reframes GPU kernel benchmarking from beating a mutable software baseline to closing the remaining gap to hardware Speed-of-Light.

Conceptual/positioning claim made by the authors about the intended shift in benchmarking perspective enabled by SOL-ExecBench.

high positive SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GP... benchmarking_objective_shift_toward_hardware_efficiency

To support robust evaluation of agentic optimizers, we provide a sandboxed harness with GPU clock locking, L2 cache clearing, isolated subprocess execution, and static analysis-based checks against common reward-hacking strategies.

Method/tool claim in paper describing the provided evaluation harness and its engineered controls (list of features included).

high positive SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GP... evaluation_robustness_and_integrity_of_benchmarking

We report a SOL Score that quantifies how much of the gap between a release-defined scoring baseline and the hardware SOL bound a candidate kernel closes.

Paper defines the SOL Score metric and states its interpretive meaning (fraction of gap closed between baseline and hardware SOL bound).

high positive SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GP... fraction_of_gap_closed_to_hardware_bound
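
One plausible way to read "fraction of gap closed" is a linear ratio in runtime. The excerpt does not give the paper's exact formula, so the function below is an assumption for illustration only, and the timings are hypothetical.

```python
def gap_closed(t_baseline: float, t_candidate: float, t_sol: float) -> float:
    """Fraction of the baseline-to-SOL runtime gap closed by a candidate.

    Assumed form (not taken from the paper):
        (t_baseline - t_candidate) / (t_baseline - t_sol)
    0.0 means no better than the baseline; 1.0 means the SOL bound is reached.
    """
    if t_baseline <= t_sol:
        raise ValueError("baseline must be slower than the SOL bound")
    return (t_baseline - t_candidate) / (t_baseline - t_sol)


# Hypothetical timings in milliseconds.
score = gap_closed(t_baseline=10.0, t_candidate=6.0, t_sol=2.0)
print(score)  # 0.5
```

Under this reading the target stays fixed as long as the baseline and the bound are fixed, which fits the stated aim of measuring against the hardware limit rather than a changing software baseline.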

SOL-ExecBench measures performance against analytically derived Speed-of-Light (SOL) bounds computed by SOLAR, our pipeline for deriving hardware-grounded SOL bounds, yielding a fixed target for hardware-efficient optimization.

Methodological claim: introduction of SOLAR pipeline to compute analytic hardware-grounded SOL bounds and use of those bounds as benchmark targets, as described in the paper.

high positive SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GP... proximity_to_hardware_speed_of_light_bounds

The benchmark covers forward and backward workloads across BF16, FP8, and NVFP4, including kernels whose best performance is expected to rely on Blackwell-specific capabilities.

Paper description of benchmark coverage (workload direction and data types; inclusion of kernels tied to Blackwell hardware features).

high positive SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GP... coverage_of_workloads_and_datatypes

We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs.

Paper reports construction of the benchmark with counts: 235 CUDA kernel problems and 124 source models; descriptive dataset claim in the manuscript.

high positive SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GP... benchmark_problem_count_and_coverage

Given these findings, policymakers should favor 'strategic forbearance'—apply existing laws rather than create new regulations that could stifle innovation and diffusion of AI.

Authors' normative policy recommendation based on their interpretation of the reviewed empirical literature (risk–benefit assessment); this is a prescriptive conclusion rather than an empirical finding, so no sample size applies.

high positive AI, Productivity, and Labor Markets: A Review of the Empiric... regulatory approach to AI governance (strategy of forbearance vs. new regulation...

Generative AI lowers entry costs for startups, facilitating new firm entry and product development.

Cited empirical and descriptive evidence in the literature review indicating reduced development costs and faster product prototyping enabled by AI tools; the brief does not provide a pooled sample size or a single quantitative estimate.

high positive AI, Productivity, and Labor Markets: A Review of the Empiric... barriers to entry / startup costs and rate of new product development

Generative AI significantly boosts productivity in specific tasks like coding, writing, and customer service—often by 15% to 50%.

Synthesis/review of empirical literature through 2025 (multiple empirical studies of task-level impacts, including field and lab studies and observational analyses); the brief reports aggregate reported effect ranges but does not list a single pooled sample size.

high positive AI, Productivity, and Labor Markets: A Review of the Empiric... task-level productivity in coding, writing, and customer service

The study contributes to theory by empirically integrating technological, human, and institutional dimensions within a single architectural framework, moving beyond isolated analyses of digital credit.

Author-stated contribution based on combining measures of algorithmic credit systems, human capability, and institutional design and testing interactions in the same regression models.

high positive Architecting financial well-being in algorithmic credit syst... theoretical contribution / integrative framework

Moderation analysis reveals that higher levels of human capability and stronger institutional design amplify the positive effects of algorithmic credit systems and mitigate their adverse effects (i.e., they strengthen repayment and resilience effects and reduce financial stress).

Reported moderation analyses using interaction terms in the regression models on the 400-user cross-sectional sample; results described as significant moderation by human capability and institutional design.

high positive Architecting financial well-being in algorithmic credit syst... conditional effects on repayment behavior, financial resilience, and financial s...

Algorithmic credit systems are positively associated with financial resilience.

Regression analyses reported show a positive relationship between algorithmic credit system use and measures of financial resilience in the sample of 400 users.

high positive Architecting financial well-being in algorithmic credit syst... financial resilience

Algorithmic credit systems are positively associated with repayment behavior.

Multiple regression results reported in the study indicate a positive association between use of algorithmic credit systems and repayment behavior based on cross-sectional survey of 400 users.

high positive Architecting financial well-being in algorithmic credit syst... repayment behavior

Measurement reliability and validity were established through Cronbach's alpha and principal component analysis.

Paper states that Cronbach’s alpha and principal component analysis (PCA) were used to establish measurement reliability and validity.

high positive Architecting financial well-being in algorithmic credit syst... measurement reliability/validity
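
As background on the reliability half of this claim, Cronbach's alpha for k items is k/(k-1) × (1 - sum of item variances / variance of the total score). The sketch below implements that standard formula with sample variances. The response matrix is hypothetical and is not data from the study.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*rows)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-respondent, 3-item Likert responses.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Alpha measures internal consistency only; construct validity is a separate question, and the paper reports addressing it with principal component analysis.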

The study used a quantitative, explanatory, cross-sectional design and employed multiple regression and moderation analyses to assess relationships among algorithmic credit systems, human capability, institutional design, and financial-wellbeing outcomes.

Methods described explicitly: quantitative explanatory cross-sectional design; analytical methods named as multiple regression and moderation analyses.

high positive Architecting financial well-being in algorithmic credit syst... research design / analytic methods

Data were collected from 400 users of algorithmic and digitally mediated credit platforms.

Study reports a quantitative, explanatory, cross-sectional survey of users; sample size explicitly stated as 400.

high positive Architecting financial well-being in algorithmic credit syst... sample_size / data source

Institutional design (enforceable rules, auditable logs, human oversight on high-impact actions) is a precondition for safe delegation of real authority to LLM agents; systems should be stress-tested under governance-like constraints before assignment of real authority.

Policy recommendation derived from simulation findings that governance structure strongly influences corruption-related outcomes and that safeguards alone are not consistently sufficient; grounded in experiments and rubric-assessed outcomes across 28,112 transcript segments.

high positive I Can't Believe It's Corrupt: Evaluating Corruption in Multi... safety of delegation to LLM agents (compliance with rules, avoidance of abuse)

Among models operating below saturation, governance structure is a stronger driver of corruption-related outcomes than model identity.

Comparative analysis within the multi-agent governance simulations across different authority structures and model identities; outcomes aggregated and compared across regimes (based on the 28,112 transcript segments scored).

high positive I Can't Believe It's Corrupt: Evaluating Corruption in Multi... corruption-related outcomes / rule-breaking

Integrity in institutional AI should be treated as a pre-deployment requirement rather than a post-deployment assumption.

Argument and recommendation based on results from multi-agent governance simulations evaluating rule-breaking and abuse; conclusions drawn from aggregate outcomes across simulated regimes and interventions (see study of 28,112 transcript segments).

high positive I Can't Believe It's Corrupt: Evaluating Corruption in Multi... institutional integrity / safety of delegation to LLM agents

The AgentDS benchmark datasets are open-sourced and available at https://huggingface.co/datasets/lainmn/AgentDS.

Paper includes link to the open-source datasets and the AgentDS website.

high positive AgentDS Technical Report: Benchmarking the Future of Human-A... availability of datasets

The strongest solutions arise from human-AI collaboration.

Analysis of competition results showing top-performing submissions employed human-AI collaborative approaches rather than AI-only baselines (results from 29 teams / 80 participants).

high positive AgentDS Technical Report: Benchmarking the Future of Human-A... performance of human-AI collaborative solutions

We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science.

Paper describes the creation of the AgentDS benchmark and an associated competition as the study's primary methodological contribution.

high positive AgentDS Technical Report: Benchmarking the Future of Human-A... benchmark for evaluating AI agents and human-AI collaboration
