Evidence (14055 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Market concentration and network effects create platform power that may squeeze smaller providers, raise costs, or lock users into ecosystems.

Platform economics literature and case examples reviewed in the paper; conceptual and theoretical support with illustrative empirical instances from secondary sources.

high negative Financial Inclusion in the Age of FinTech Platforms: Opportu... market concentration measures; prices/costs to users; switching costs/lock-in

Infrastructure gaps (connectivity, electricity, identity systems) limit who benefits from digital finance.

Cross-country and development literature synthesized in the paper highlighting correlations between infrastructure availability and digital finance uptake; no primary empirical analysis in the paper.

high negative Financial Inclusion in the Age of FinTech Platforms: Opportu... uptake/usage of digital financial services conditional on infrastructure availab...

Measurement issues (task-based output measurement, attributing output changes to AI) and selection into early adoption bias estimated productivity gains upward.

Methodological robustness checks reported in the paper: task-based measures, bounding exercises, placebo tests, and analysis of pre-trends; discussions of selection on unobservables and potential upward bias.

high negative S-TCO: A Sustainable Teacher Context Ontology for Educationa... validity/bias of estimated productivity effects

Implementing the governed hyperautomation pattern raises upfront costs (governance tooling, monitoring, validation, compliance processes).

Economic and cost-structure discussion in the paper, based on qualitative reasoning and industry experience; no quantified cost estimates or sample-based cost analysis provided.

high negative Governed Hyperautomation for CRM and ERP: A Reference Patter... upfront implementation costs (governance tooling, validation, compliance overhea...

Use of standardized (non-adaptive) dialogues limits ecological validity relative to live adaptive chatbots.

Limitations section acknowledges that standardized (non-adaptive) experimental dialogues reduce ecological validity compared with live/adaptive chatbot interactions.

high negative AI Chatbots as Informatics-Enabled Marketing Service Systems... ecological validity

Platform KPIs (e.g., eCPM) can diverge from social welfare metrics (consumer surplus, privacy harms), creating metric misalignment.

Conceptual critique with examples of common platform metrics versus welfare economics; not accompanied by a quantitative comparison dataset.

high negative Artificial Intelligence for Personalized Digital Advertising... alignment between platform KPIs and social welfare measures

Privacy constraints reduce observability and necessitate privacy-preserving study designs that complicate estimation.

Methodological analysis referencing differential privacy, federated learning and their effects on statistical power/observability; no experimental power analyses with sample sizes presented here.

high negative Artificial Intelligence for Personalized Digital Advertising... observability and estimation precision under privacy constraints

Data access asymmetries (platforms holding proprietary logs) limit external auditability and replication of advertising research.

Empirical and institutional observation about industry data practices; supported by calls for privacy-preserving shared datasets in the paper; no quantified survey sample included.

high negative Artificial Intelligence for Personalized Digital Advertising... external auditability and ability to replicate studies

Attribution complexity — multi-touch, cross-device, and delayed conversions — confounds causal inference in advertising measurement.

Methodological discussion referencing causal inference challenges and standard problems in attribution; widely-documented in the literature though not re-measured in this paper.

high negative Artificial Intelligence for Personalized Digital Advertising... accuracy of causal attribution for ad effects

Complex automated systems make attribution and responsibility harder when harms occur (Automation vs accountability trade-off).

Qualitative institutional analysis and case-study reasoning about multi-agent automated pipelines and opaque model decisions; no single empirical incident dataset provided.

high negative Artificial Intelligence for Personalized Digital Advertising... clarity of attribution and accountability in case of harms

Richer personalization depends on granular data and cross-device identity, creating privacy externalities and compliance risks (Personalization vs privacy trade-off).

Data source inventory and privacy literature review; supported by observational industry trends (move to first-party identity) rather than a quantified sample in the paper.

high negative Artificial Intelligence for Personalized Digital Advertising... degree of personalization versus exposure to privacy risks/compliance failures

Federated infrastructures introduce adversarial risks (model/data poisoning, inference attacks on updates) that require robust aggregation, anomaly detection, and other defenses.

Threat modeling and taxonomy of adversarial/privacy threats with mapped mitigations (robust aggregation, anomaly detection, DP). Evidence is conceptual and based on standard threat frameworks; no empirical attack/defense experiments reported at scale.

high negative Privacy-Aware AI Advertising Systems: A Federated Learning F... vulnerability to poisoning/inference (attack success rate), effectiveness of def...
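
As a rough illustration of the "robust aggregation" mitigation named in this entry, the sketch below contrasts a plain mean of client updates with a coordinate-wise median. The coordinate-wise median is one well-known robust aggregator chosen here for illustration; the entry does not say which aggregator the paper uses, and the client updates and function names are hypothetical.

```python
import statistics
from typing import List

def mean_aggregate(updates: List[List[float]]) -> List[float]:
    """Plain average of client updates, coordinate by coordinate."""
    return [statistics.fmean(coordinate) for coordinate in zip(*updates)]

def median_aggregate(updates: List[List[float]]) -> List[float]:
    """Coordinate-wise median: a single extreme client cannot move it far."""
    return [statistics.median(coordinate) for coordinate in zip(*updates)]

# Hypothetical model updates: four honest clients and one poisoned client.
client_updates = [
    [0.10, -0.20],
    [0.12, -0.18],
    [0.09, -0.21],
    [0.11, -0.19],
    [10.0, 10.0],   # poisoned update (illustrative values)
]

naive = mean_aggregate(client_updates)     # pulled toward the poisoned update
robust = median_aggregate(client_updates)  # stays near the honest clients
```

With one poisoned client out of five, the mean is dragged far from the honest updates while the median stays among them. The median tolerates fewer than half of the clients being adversarial, which is why the entry pairs aggregation with anomaly detection and other defenses.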

Delayed and sparse feedback (clicks/conversions) in advertising complicates credit assignment and timely model updates, degrading learning unless specific methods for delayed/sparse signals are used.

Analytical discussion of learning dynamics with delayed/sparse labels; conceptual solutions suggested (credit assignment methods). No large-scale empirical evaluation presented.

high negative Privacy-Aware AI Advertising Systems: A Federated Learning F... learning efficacy under delayed/sparse feedback (convergence, time-to-adapt), at...

Non-IID and heterogeneous data distributions across devices and publishers impair convergence and degrade personalization unless addressed with algorithmic adaptations.

Analytical modeling of convergence under non-IID conditions; threat/robustness discussion; prototype/simulation illustrations. This claim is supported by established literature and the paper's analytic treatment.

high negative Privacy-Aware AI Advertising Systems: A Federated Learning F... convergence behavior (rate, stability), personalization performance (accuracy on...

The cost of formalizing informal labor (CFIL) implies formalizing a worker costs on average 88% more than the informal wage in 2023.

New CFIL metric calculated for 19 countries (2023 baseline) by estimating the additional employer cost of hiring and formalizing an informal worker and reporting it relative to the informal wage, using compiled statutory obligations and informal wage benchmarks.

high negative Salaried Labor Costs in Latin America and the Caribbean: A T... CFIL (additional cost of formalizing) as % above informal wage
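
The arithmetic behind a figure such as "88% more than the informal wage" can be sketched as below. This assumes CFIL is the employer's additional cost of formalizing divided by the informal wage, as the evidence summary describes; the paper's exact definition may include components not shown here, and the wage figures are hypothetical.

```python
def cfil(formal_cost: float, informal_wage: float) -> float:
    """Additional cost of formalizing a worker, as a fraction of the informal wage."""
    if informal_wage <= 0:
        raise ValueError("informal_wage must be positive")
    return (formal_cost - informal_wage) / informal_wage

# Hypothetical figures: an informal wage of 100 and a formal cost of 188
# reproduce the reported 88% average.
example_cfil = cfil(formal_cost=188.0, informal_wage=100.0)
```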

VIS inherits the limitations of input–output assumptions (fixed coefficients, no price feedbacks); AI-driven structural change may violate those assumptions, so dynamic extensions or calibration are needed.

Paper explicitly cautions about input–output model limitations and the need for dynamic extensions/calibration under structural/technological change.

high negative Measuring labor productivity dynamics in U.S. industrial and... validity/applicability of VIS estimates under structural/AI-driven change

There is sizable attrition in the pipeline from applicant admission through to direct employment of AI graduates, indicating leakages at multiple stages (application → admission → graduation → employment).

Quantification of human-resource losses across pipeline stages using the monitoring dataset for the 191 institutions; descriptive counts/percentages of entrants, admitted students, graduates, and those directly employed in AI roles (pipeline loss metrics reported in paper).

high negative Employment of Graduates of Educational Programs in the Field... Attrition rates / absolute losses at sequential pipeline stages (applicants → ad...

Graduates from Russian universities running AI-related educational programs together with alternative training routes (self-education and professional retraining) satisfy 43.9% of estimated national AI personnel demand.

Monitoring dataset of 191 Russian universities implementing AI-related programs; aggregated counts of university graduates plus estimated contributions from self-education and professional retraining compared to an estimated national AI personnel demand (coverage reported as 43.9%).

high negative Employment of Graduates of Educational Programs in the Field... Share (%) of estimated national AI personnel demand satisfied by combined univer...

AI automates routine and some mid-skill tasks, reducing employment in those occupations.

Empirical task-based exposure measures mapping AI capabilities to occupational task content, microdata analyses of employment by occupation using household/employer/administrative datasets, and panel regressions/decompositions that document within-occupation declines and between-occupation shifts.

high negative Intelligence and Labor Market Transformation: A Critical Ana... employment levels in routine and mid-skill occupations

Relying on secondary literature limits the paper's ability to make causal inferences and constrains empirical generalizability to all sectors or countries.

Stated limitations in the paper's Data & Methods section acknowledging scope and inferential constraints.

high negative Who Loses to Automation? AI-Driven Labour Displacement and t... causal inference strength and generalizability of conclusions

Increases in K_T reduce employment levels in affected firms and industries even when aggregate productivity rises.

Panel econometric estimates at firm and industry levels relating K_T intensity to employment outcomes, controlling for demand, input prices, and firm characteristics; difference-in-differences specifications and instrumental-variable robustness checks; corroborated by sectoral case studies.

high negative The Macroeconomic Transition of Technological Capital in the... employment (firm- and industry-level employment counts or employment growth)

Rising technological capital (K_T) — proxied by robot/automation density, software and intangible capital accumulation, AI adoption surveys, and AI-related patenting — leads to a decline in labor’s share of output.

Firm- and industry-level panel regressions linking constructed K_T intensity measures to labor shares, supported by macro growth-accounting decompositions; robustness checks include difference-in-differences and instrumenting adoption with plausibly exogenous shocks (e.g., cross-border technology diffusion, trade shocks); validated with cross-country comparisons and case studies.

high negative The Macroeconomic Transition of Technological Capital in the... labor share of income (share of output paid to labor)

Fuel subsidy reform imposed an enormous fiscal burden that peaked at 2.8% of GDP in 2022, limiting the macroeconomic leverage of AI-driven efficiency gains.

Reported fiscal statistic in the paper (2.8% of GDP in 2022) and its role in analysis of why AI savings do not translate into large macro gains.

high negative (constraint) AI-Based Technological Transformation as a Driver for Develo... fiscal burden of fuel subsidies (% of GDP) and its moderating effect on GDP/trad...

The oil and gas trade balance remained in deficit at -1.55 billion USD in May 2025 and -1.58 billion USD in July 2025 despite an overall national trade surplus.

Reported trade-balance figures in the paper (monthly trade statistics for May and July 2025).

high negative (deficit persists) AI-Based Technological Transformation as a Driver for Develo... oil & gas trade balance (USD, monthly values)

We discuss design tradeoffs, failure modes, and lessons learned from operating autonomous AI agents at scale.

Paper statement indicating inclusion of discussion sections on tradeoffs, failure modes, and operational lessons; descriptive/meta claim about paper content.

high neutral Autonomous Incident Resolution at Hyperscale: An Agentic AI ... discussion of design tradeoffs, failure modes, and lessons learned

The core problem is not the absence of explanation but the absence of structured reasoning in the first place.

Conceptual argument/proposed reframing presented in the paper; no empirical test reported.

high neutral Beyond Post-hoc Explanation: Toward Glassbox AI via Probabil... presence of structured reasoning vs. post-hoc explanation

We ran a controlled three-arm ablation on a production valuation agent: A = plain web-only LLM analyst; B = adds public structured tools + a 14-dimension valuation playbook, verifier, objectivity policy and red-team; C = adds the proprietary Noah AI corpus of curated pipeline, trial and deal intelligence.

Description of experimental arms and setup used in the study (methodological statement).

high neutral AI Scientists Are Only as Good as Their Evidence: A Stratifi... experimental treatment definitions (method)

The evidence base was concentrated in system-facing applications that detect or shape inequities within recruitment, evaluation and exposure systems.

Synthesis result from the scoping review indicating thematic concentration across included studies (as reported in abstract).

high neutral Artificial intelligence applications supporting women’s care... focus of existing empirical studies (system-facing vs individual-facing applicat...

ALE is organized around a task taxonomy with 55 subfields grouped into 13 industry clusters covering 1K+ tasks.

Author-provided counts describing the benchmark taxonomy and task pool.

high neutral Agents' Last Exam taxonomy breadth (subfields, clusters, number of tasks)

ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy).

Design specification described in the paper referencing O*NET / SOC 2018.

high neutral Agents' Last Exam scope of industries covered by the benchmark

Agentic AI is best characterized as a continuum of autonomy and delegated authority, distinct from purely informational outputs and including systems capable of independently generating insured events through external actions.

Conceptual taxonomy and definitional argument presented in the paper distinguishing informational models from agentic systems with delegated authority; theoretical reasoning and classification.

high neutral Insurance of Agentic AI characterization of agentic AI along autonomy/delegation continuum

We evaluated seven models (including Gemini, Claude, and GPT families) by comparing their zero-shot estimates against self-reported skill ratings from 27 participants.

Method description: evaluation of seven LLMs comparing zero-shot model estimates to self-reported skill ratings; 27 participants provided self-reports.

high neutral Can AI Guess What You Know? Performance Comparison of Large ... comparison between model zero-shot skill estimates and self-reported skill ratin...

At inference time, BRANE selects the configuration that maximizes predicted correctness penalized by cost, exposing a tunable cost-quality tradeoff without retraining.

Method description and algorithmic claim in the paper (selection rule maximizing predicted correctness with cost penalty). No empirical sample size required for algorithmic description.

high neutral Natural Language Query to Configuration for Retrieval Agents cost-quality tradeoff exposed by selection strategy
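
The selection rule described in this entry can be sketched as follows. The claim says only that predicted correctness is "penalized by cost"; the linear penalty `correctness - cost_weight * cost`, the function name, and the example configurations are assumptions made for illustration, not the paper's implementation.

```python
from typing import Dict

def select_configuration(predicted_correctness: Dict[str, float],
                         cost: Dict[str, float],
                         cost_weight: float) -> str:
    """Return the configuration with the highest cost-penalized score.

    Assumed score: predicted correctness minus cost_weight times cost.
    """
    return max(
        predicted_correctness,
        key=lambda name: predicted_correctness[name] - cost_weight * cost[name],
    )

# Hypothetical per-configuration predictions for one query and relative costs.
predicted = {"small": 0.62, "medium": 0.80, "large": 0.88}
costs = {"small": 1.0, "medium": 3.0, "large": 10.0}

# Changing cost_weight moves along the cost-quality tradeoff without
# retraining the predictors.
choice_quality = select_configuration(predicted, costs, cost_weight=0.0)
choice_balanced = select_configuration(predicted, costs, cost_weight=0.02)
choice_frugal = select_configuration(predicted, costs, cost_weight=0.2)
```

In this sketch a weight of 0 picks the most accurate configuration, and larger weights shift the choice toward cheaper ones, which is the tunable tradeoff the claim refers to.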

We propose BRANE, which uses an LLM to convert each query into workload-specific characteristics, then trains a lightweight per-configuration predictor that estimates whether the pipeline will answer the query correctly.

Method description in the paper: BRANE architecture and training procedure (LLM-based feature extraction + per-configuration correctness predictor). No numeric sample size reported for method description.

high neutral Natural Language Query to Configuration for Retrieval Agents method (feature extraction and predictor training)

AI deployment should be evaluated not only by average task speed, but by its overall effects on congestion, rework, and the robustness of human oversight under load.

Policy/recommendation based on the paper's theoretical results and derived implications from the queueing model (conceptual/prescriptive conclusion; no empirical testing reported).

high neutral Queue & AI: When Faster Tasks Slow Down the Workflow organizational_efficiency

The divergence between mean task speed and system-level delay caused by AI assistance is labeled the 'variance wedge'.

Definition/terminology introduced in the paper as part of its conceptual framing; supported by the analytic model description.

high neutral Queue & AI: When Faster Tasks Slow Down the Workflow task_completion_time
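
One way to see how a faster mean task time can coexist with longer system-level delay is the textbook M/G/1 queue, where the Pollaczek-Khinchine formula makes mean waiting time depend on service-time variance as well as the mean. The sketch below is only an illustration under that assumption; the entry does not specify the paper's queueing model, and all numbers are hypothetical.

```python
def mg1_mean_wait(arrival_rate: float, mean_service: float, var_service: float) -> float:
    """Mean time spent waiting in queue for an M/G/1 system (Pollaczek-Khinchine)."""
    utilization = arrival_rate * mean_service
    if utilization >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    second_moment = var_service + mean_service ** 2
    return arrival_rate * second_moment / (2.0 * (1.0 - utilization))

ARRIVAL_RATE = 0.8  # tasks per unit time (hypothetical)

# Without AI: slower on average, but task times are fairly consistent.
baseline_mean, baseline_var = 1.0, 0.25
# With AI: faster on average, but occasional long rework raises the variance.
with_ai_mean, with_ai_var = 0.9, 2.0

baseline_wait = mg1_mean_wait(ARRIVAL_RATE, baseline_mean, baseline_var)  # 2.5
with_ai_wait = mg1_mean_wait(ARRIVAL_RATE, with_ai_mean, with_ai_var)     # about 4.0

# Total time in system = waiting time + service time.
baseline_delay = baseline_wait + baseline_mean
with_ai_delay = with_ai_wait + with_ai_mean
```

Here the mean task time falls from 1.0 to 0.9, yet the mean time in system rises because the second moment of service time grows. That gap between task-level speed and system-level delay is the kind of divergence the entry calls the variance wedge.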

The benchmark probes 18 mainstream LLMs across four prompting strategies.

Benchmark experiments described in the paper evaluate 18 mainstream LLMs using four different prompting strategies applied to the collected dataset.

high neutral Benchmarking LLMs for Community Governance Simulation with L... coverage of models and prompting strategies in benchmark (number of LLMs and pro...

Structured illustrations across document processing, legal services, audit, clinical decision support, and procurement discipline the boundary logic developed in the theory.

Methodological statement that the paper uses structured cross-domain illustrations to ground and discipline the theoretical claims; no empirical sample reported.

high neutral Redrawing the AI Map: A Theory of Accountability Boundaries ... theoretical grounding via domain illustrations

There are three accountability-boundary strategies in agentic ecosystems: component, integrated, and dual-track.

Theoretical categorization introduced by the authors as part of the capability-level theory; illustrated with cross-domain examples rather than empirical testing.

high neutral Redrawing the AI Map: A Theory of Accountability Boundaries ... classification of boundary strategy

GENSTRAT generates a distribution of two-player zero-sum imperfect-information card games.

Design specification in paper; reported generated pool size of 2,000 games (abstract).

high neutral GENSTRAT: Toward a Science of Strategic Reasoning in Large L... game distribution (two-player zero-sum imperfect-information card games)

The study used standard scientific methods, employing a comparative approach and inductive and deductive methods to identify patterns of interaction between legal regulation and technological development.

Methodology section of the paper explicitly states the use of comparative, inductive and deductive methods and theoretical synthesis.

high neutral ECONOMIC SYSTEMS IN THE CONTEXT OF DIGITALISATION AND AI: TH... methodological approach used in the study

The paper develops a theoretical and legal model that treats law as an integral part of the economic system influencing income distribution, labour relations, market structure and productivity dynamics.

Model construction through synthesis of theoretical perspectives using inductive and deductive methods and comparative legal analysis (methodology described in the paper).

high neutral ECONOMIC SYSTEMS IN THE CONTEXT OF DIGITALISATION AND AI: TH... role of legal frameworks in shaping economic institutional conditions (income di...

The paper provides a taxonomy of minimum input artifacts for agentic software, firmware, and hardware work; a conversation-to-contract gate; risk-adaptive workflows; and an evidence-bundle acceptance model for agent-generated artifacts.

Declared contributions in the paper (deliverables/artefacts produced by the research; no empirical validation provided in the abstract).

high neutral Agentic Agile-V: From Vibe Coding to Verified Engineering in... availability of process artifacts and workflow models for agentic engineering

The central problem for agentic engineering is no longer prompt engineering; it is engineering process control.

Argument and synthesis presented by the paper (conceptual claim based on reviewed evidence).

high neutral Agentic Agile-V: From Vibe Coding to Verified Engineering in... primary bottleneck affecting agentic engineering effectiveness (process control ...

The results define three operating regimes.

Summary claim in results/conclusions indicating categorization of outcomes into three regimes.

high neutral Cross-domain benchmarks reveal when coordinated AI agents im... classification into operating regimes

Few benchmarks achieve widespread use (examples given include GPQA Diamond, LiveCodeBench, AIME 2025).

Empirical observation from the dataset showing that only a small number of benchmarks are highlighted across multiple builders/releases; specific named benchmarks are cited as relatively widely used.

high neutral Unsteady Metrics and Benchmarking Cultures of AI Model Build... frequency of benchmark highlighting across builders/releases

We performed a large-scale evaluation spanning 15,000 messages with cross-model validation across six LLMs from three families (OpenAI, Anthropic, Google), totaling 1,440 queries.

Study design and reported sample sizes and model counts provided in the paper.

high neutral Episodic-Semantic Memory Architecture for Long-Horizon Scien... evaluation sample size and cross-model coverage

Experiments are run with and without access to Causely under two scenarios: an active incident and a healthy baseline.

Methodological description in the paper describing the two experimental conditions (with/without Causely) and two scenarios (active incident, healthy baseline).

high neutral Causely: A Causal Intelligence Layer for Enterprise AI A Ben... experimental condition (Causely vs. no Causely) across two scenarios

Experiments compare four agent configurations (Claude Code, OpenAI Codex, HolmesGPT with Sonnet and Gemini backends).

Methodological description listing the four agent configurations used in experiments.

high neutral Causely: A Causal Intelligence Layer for Enterprise AI A Ben... agent configuration comparisons

We evaluate this value proposition through a benchmark study conducted in a controlled setting with injected faults in a 24-microservice OpenTelemetry demo application.

Methodological description in the paper specifying a controlled benchmark with an OpenTelemetry demo application composed of 24 microservices.

high neutral Causely: A Causal Intelligence Layer for Enterprise AI A Ben... benchmark evaluation setup (24-microservice demo with injected faults)
