Evidence (13827 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	749	195	97	889	1979
Governance & Regulation	815	391	188	121	1539
Organizational Efficiency	771	189	124	83	1177
Technology Adoption Rate	624	233	123	96	1084
Research Productivity	410	121	56	331	929
Output Quality	466	177	59	47	749
Decision Quality	320	174	75	42	618
Firm Productivity	435	55	88	20	604
AI Safety & Ethics	214	276	65	33	593
Market Structure	178	166	122	24	495
Task Allocation	206	64	70	31	376
Skill Acquisition	165	57	60	17	299
Innovation Output	201	27	41	18	288
Employment Level	105	51	107	13	278
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	116	63	42	11	232
Firm Revenue	149	46	26	3	224
Inequality Measures	44	122	49	6	221
Task Completion Time	169	29	8	12	219
Worker Satisfaction	89	61	20	12	182
Error Rate	69	91	10	2	172
Regulatory Compliance	76	68	14	5	163
Training Effectiveness	92	19	13	19	145
Wages & Compensation	77	36	25	6	144
Automation Exposure	51	54	22	12	142
Team Performance	86	17	27	9	140
Developer Productivity	94	17	14	6	132
Job Displacement	12	80	20	1	113
Hiring & Recruitment	51	7	8	3	69
Skill Obsolescence	5	45	6	1	57
Creative Output	31	16	7	2	57
Social Protection	27	16	8	2	53
Labor Share of Income	17	17	17	—	51
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1
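
The matrix can also be read programmatically. The following is a minimal sketch, not part of the source page: it assumes whitespace-separated rows as shown above and treats a dash cell as zero (an assumption about what the dash means). The helper names `parse_row` and `summarize` are hypothetical. It computes the positive share for three sample rows and shows that the Total column is not always the sum of the four direction columns.

```python
# Three rows copied from the matrix above; "—" marks an empty cell.
ROWS = """\
Other 749 195 97 889 1979
Firm Revenue 149 46 26 3 224
Labor Share of Income 17 17 17 — 51
"""

DIRECTIONS = ("Positive", "Negative", "Mixed", "Null")


def parse_row(line):
    """Split one matrix row into (outcome, counts-by-direction, total)."""
    # The outcome name may contain spaces, so split the five count
    # fields off the right-hand side of the line.
    name, *cells = line.rsplit(None, 5)
    # Assumption: a dash cell means zero claims.
    values = [0 if c == "\u2014" else int(c) for c in cells]
    return name, dict(zip(DIRECTIONS, values[:4])), values[4]


def summarize(line):
    """Positive share and claims not covered by the four directions."""
    name, counts, total = parse_row(line)
    return {
        "outcome": name,
        "positive_share": counts["Positive"] / total,
        "unassigned": total - sum(counts.values()),
    }


summaries = [summarize(line) for line in ROWS.splitlines()]
for s in summaries:
    print(f'{s["outcome"]}: {s["positive_share"]:.1%} positive, '
          f'{s["unassigned"]} outside the four directions')
```

For the "Other" row the four direction counts sum to 1930 against a listed total of 1979, so some claims appear to carry a direction not shown in the matrix; the page does not say which.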

Privacy-by-design architectures, secure data interoperability, and compliance automation contribute to trust, institutional legitimacy, and long-term adoption of digital health solutions.

Synthesis of literature on privacy engineering, interoperability standards, and compliance technologies presented in the review (literature review; inferred causal linkages discussed).

high positive Conceptual framework for AI governance, data privacy complia... trust / institutional legitimacy / long-term adoption rate of digital health sol...

The framework gives particular attention to algorithmic transparency, risk management, regulatory alignment, and lifecycle oversight of AI-enabled health systems operating under evolving privacy regulations (e.g., data protection laws and cross-border data governance standards).

Descriptive emphasis within the proposed framework, based on cited literatures in regulatory alignment and algorithmic governance (literature synthesis / conceptual emphasis).

high positive Conceptual framework for AI governance, data privacy complia... regulatory alignment and lifecycle oversight

This review develops a comprehensive conceptual framework that integrates AI governance principles, data privacy compliance mechanisms, and financially sustainable operational models within digital health ecosystems.

The paper's primary contribution is a proposed conceptual framework derived from synthesizing interdisciplinary literatures (conceptual framework produced by authors based on literature review).

high positive Conceptual framework for AI governance, data privacy complia... existence of an integrated governance/privacy/finance framework

The rapid expansion of digital health technologies driven by artificial intelligence has transformed healthcare delivery, clinical decision-making, and health data management.

Narrative synthesis in the review paper drawing on interdisciplinary literature in health informatics, clinical AI studies, and health data management (literature review / conceptual synthesis).

high positive Conceptual framework for AI governance, data privacy complia... clinical decision-making quality / healthcare delivery and data management

We present a simulation study analyzing the social benefits of applying ARS to agentic transactions.

Simulation study reported in the paper (study exists; abstract does not report simulation parameters, sample size, or quantitative results).

high positive Quantifying Trust: Financial Risk Management for Trustworthy... social benefits of applying ARS as assessed by simulation

This shifts trust from an implicit expectation about model behavior to an explicit, measurable, and enforceable product guarantee.

Conceptual claim about the expected effect of adopting ARS (argument presented by authors; no empirical substantiation in the abstract).

high positive Quantifying Trust: Financial Risk Management for Trustworthy... nature of trust (implicit expectation vs explicit/enforceable guarantee) in agen...

Under ARS, users receive predefined and contractually enforceable compensation in cases of execution failure, misalignment, or unintended outcomes.

Functional guarantee described as part of ARS design (contractual/payment mechanism described; no empirical testing detailed in the abstract).

high positive Quantifying Trust: Financial Risk Management for Trustworthy... predefined, contractually enforceable compensation for users upon execution fail...

ARS integrates risk assessment, underwriting, and compensation into a single transaction framework that protects users when interacting with agents.

Design description of ARS in the paper (architectural/design claim; no empirical validation reported in the abstract).

high positive Quantifying Trust: Financial Risk Management for Trustworthy... user protection in agent interactions via integrated risk assessment, underwriti...

We propose a complementary framework based on risk management: the Agentic Risk Standard (ARS), a payment settlement standard for AI-mediated transactions.

Framework proposal described in the paper (design/proposal; implementation referenced).

high positive Quantifying Trust: Financial Risk Management for Trustworthy... existence of the ARS framework (payment settlement standard integrating risk man...

Security evaluation across 135 test cases demonstrates 87.5% accuracy on static code safety analysis with zero false positives.

Security evaluation reported in paper across 135 test cases with reported accuracy and false positive rate.

high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... static code safety analysis accuracy and false positive rate

Security evaluation across 135 test cases demonstrates 96.7% accuracy on prompt injection detection.

Security evaluation reported in paper across 135 test cases with reported accuracy metric.

high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... prompt injection detection accuracy

On document intelligence (DocILE), Code Factory achieves the highest line item recognition accuracy (LIR: 80.4%).

Empirical evaluation reported on DocILE dataset of 5,680 invoices; LIR metric reported at 80.4% and described as the highest among compared variants.

high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... line item recognition accuracy (LIR)

Compiled AI reduces token consumption by 57x at 1,000 transactions.

Empirical token-consumption comparison reported in paper (scaling example at 1,000 transactions).

high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... token consumption

Compiled AI breaks even with runtime inference at approximately 17 transactions.

Cost/efficiency comparison reported in evaluation (function-calling context); break-even point stated in paper.

high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... cost trade-off / break-even transaction count
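
The 57x figure and the break-even point of roughly 17 transactions can be checked against each other with a simple amortization reading. This is only a sketch under assumptions that the excerpt does not confirm: the compiled approach pays a one-time generation cost and nothing per transaction, runtime inference pays a constant amount per transaction, and the token ratio tracks the cost ratio. The function names are hypothetical and the paper may define both quantities differently.

```python
def runtime_to_compiled_ratio(transactions, break_even):
    """Ratio of cumulative runtime-inference cost to compiled cost.

    Assumed model (not taken from the paper): the compiled approach pays a
    one-time generation cost equal to `break_even` runtime transactions and
    nothing per transaction afterwards; runtime inference pays one unit per
    transaction.
    """
    return transactions / break_even


def implied_break_even(transactions, ratio):
    """Break-even count implied by an observed ratio under the same model."""
    return transactions / ratio


ratio_at_1000 = runtime_to_compiled_ratio(1000, 17)
break_even_from_57x = implied_break_even(1000, 57)
print(round(ratio_at_1000, 1), round(break_even_from_57x, 1))
```

Under these assumptions a break-even of 17 implies a ratio of about 58.8 at 1,000 transactions, and a 57x ratio implies a break-even of about 17.5, so the two reported figures are at least roughly consistent.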

On function-calling, compiled AI achieves 96% task completion with zero execution tokens.

Empirical evaluation on the BFCL function-calling tasks (reported n=400).

high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... task completion rate

We introduce a system architecture for constrained LLM-based code generation, a four-stage generation-and-validation pipeline that converts probabilistic model output into production-ready code artifacts, and an evaluation framework measuring operational metrics including token amortization, determinism, reliability, security, and cost.

Paper states these three contributions as part of the authors' work (descriptive claim about methods and artifacts presented).

high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... availability of system architecture, pipeline, and evaluation framework (methodo...

By constraining generation to narrow business-logic functions embedded in validated templates, compiled AI trades runtime flexibility for predictability, auditability, cost efficiency, and reduced security exposure.

Conceptual/systems claim made in paper describing design trade-offs of the compiled AI paradigm (no single empirical test cited in the excerpt).

high positive Compiled AI: Deterministic Code Generation for LLM-Based Wor... predictability, auditability, cost efficiency, security exposure (design trade-o...

Experimental evidence confirms that AI tools raise worker productivity.

Statement in paper referencing experimental studies (no specific study, method, or sample size reported in the excerpt).

high positive The Augmentation Trap: AI Productivity and the Cost of Cogni... worker productivity

A lightweight interception layer captures and blocks only the final submission request, ensuring safe evaluation without real-world side effects.

Paper describes an interception layer in the evaluation infrastructure that prevents actual final submissions on production sites.

high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? evaluation_safety (prevention of real-world side effects)

Unlike existing benchmarks that evaluate agents in offline sandboxes with static pages, ClawBench operates on production websites, preserving the full complexity, dynamic nature, and challenges of real-world web interaction.

Methodological description in the paper: evaluation occurs on live (production) websites rather than offline static sandboxes; supported by reported coverage of 144 live platforms.

high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? evaluation_realism / fidelity to real-world interactions

The tasks in ClawBench require demanding capabilities beyond existing benchmarks, such as extracting relevant information from user-provided documents, navigating multi-step workflows across diverse platforms, and completing write-heavy operations like filling many detailed forms correctly.

Paper description of task types and the capabilities they require; based on the design and composition of the 153 tasks.

high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? task_complexity / capability_requirements

ClawBench spans 144 live platforms across 15 categories.

Paper explicitly reports coverage across 144 production websites and 15 task categories (dataset description).

high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? benchmark_scope (platforms and categories)

ClawBench is an evaluation framework of 153 simple tasks that people need to accomplish regularly in their lives and work.

Paper states the benchmark comprises 153 tasks (dataset description).

high positive ClawBench: Can AI Agents Complete Everyday Online Tasks? benchmark_scope (number of tasks)

When used appropriately, LLMs are powerful tools that can expand the frontier of empirical economics.

Normative conclusion in the abstract based on the paper's proposed framework and discussion; presented as an overall benefit but not supported by empirical outcomes or quantified gains in the excerpt.

high positive Large Language Models: An Applied Econometric Framework expansion of empirical economics research capabilities

For estimation problems—automating the measurement of economic concepts for downstream analysis—valid downstream inference requires combining LLM outputs with a small validation sample to deliver consistent and precise estimates.

Methodological claim in the abstract advocating use of a small validation sample together with LLM outputs to achieve consistent/precise estimates; no empirical demonstration or sample-size specification provided in the excerpt.

high positive Large Language Models: An Applied Econometric Framework consistency and precision of downstream estimates derived from LLM-measured vari...
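
One way to combine LLM outputs with a small validation sample is a bias-corrected mean: average the LLM measurements over the full sample, then add the average gap between trusted and LLM measurements on the validation subset. The sketch below illustrates that general construction. It is not necessarily the estimator the paper proposes, it assumes the validation subset is a random draw from the full sample, and the function name and toy data are hypothetical.

```python
from statistics import mean


def bias_corrected_mean(llm_all, llm_val, human_val):
    """Estimate a population mean from LLM labels plus a validation sample.

    llm_all   -- LLM-generated measurements for the full sample
    llm_val   -- LLM measurements for the validation subset
    human_val -- trusted measurements for the same validation subset
    """
    if len(llm_val) != len(human_val):
        raise ValueError("validation lists must be paired")
    # Average LLM error, estimated only where trusted labels exist.
    correction = mean(h - m for h, m in zip(human_val, llm_val))
    return mean(llm_all) + correction


# Toy data: the LLM over-labels one validation item as positive.
llm_all = [1, 0, 1, 1, 0, 1, 0, 1]
llm_val = [1, 0, 1, 1]
human_val = [1, 0, 0, 1]
estimate = bias_corrected_mean(llm_all, llm_val, human_val)
print(estimate)
```

The correction term is what lets a small trusted sample offset systematic LLM error; without it, the estimate would simply inherit whatever bias the LLM labels carry.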

The paper provides an econometric framework for realizing the potential of LLMs in two empirical uses: prediction problems and estimation problems.

Claim of contribution in the abstract describing a methodological framework (the excerpt reports the existence of the framework but does not detail empirical validation or sample sizes).

high positive Large Language Models: An Applied Econometric Framework methodological framework for empirical use of LLMs

Researchers can now revisit old questions and tackle novel ones with rich data using LLMs.

Asserted in the paper's abstract as a consequence of LLM-enabled large-scale text analysis; no empirical demonstration or quantified case described in the excerpt.

high positive Large Language Models: An Applied Econometric Framework ability to (re)address research questions using textual data

Large language models (LLMs) enable researchers to analyze text at unprecedented scale and minimal cost.

Stated as an assertion in the paper's abstract/summary; based on the authors' framing of LLM capabilities (no empirical sample, experiment, or quantified result provided in the excerpt).

high positive Large Language Models: An Applied Econometric Framework ability to analyze text at scale and cost

There is an urgent need for targeted workforce planning, investment in human capital, and collaboration between industry, government, and educational institutions to manage AI-driven labour market transformations.

Policy conclusion drawn from the paper's theoretical framing (SBTC, Human Capital Theory) and the empirical patterns identified in secondary data and official reports (2020–2024).

high positive Artificial Intelligence and labour market polarisation in In... policy interventions for workforce planning and reskilling

Comparative insights from the United Kingdom show that more systematic AI adoption and structured training programs mitigate workforce displacement.

Cross-country comparison using secondary data and official reports (2020–2024) highlighting the UK's more systematic AI adoption and structured training, which the paper presents as reducing displacement risk.

high positive Artificial Intelligence and labour market polarisation in In... mitigation of workforce displacement via structured training/AI adoption strateg...

AI adoption is increasing demand for new competencies.

Secondary sources and official reports (2020–2024) cited in the paper document emerging skill requirements and employer demand for new competencies.

high positive Artificial Intelligence and labour market polarisation in In... demand for new skills/competencies

AI adoption is driving growth in high-wage occupations.

Analysis of secondary data and official reports (2020–2024) reporting expansion of high-wage occupational categories in India.

high positive Artificial Intelligence and labour market polarisation in In... occupational growth in high-wage jobs

AI adoption disproportionately benefits high-skilled workers.

The paper cites theoretical frameworks (Skill Biased Technological Change and Human Capital Theory) and analyses of secondary data and official reports from 2020–2024 showing relative gains for high-skill occupations.

high positive Artificial Intelligence and labour market polarisation in In... wages and employment of high-skilled workers

All data, code, and model responses are open-sourced.

Statement in the paper asserting that data, code, and model outputs are publicly released.

high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... availability of study materials (data, code, responses)

78.7% of observed AI interactions are augmentation, not automation.

Empirical classification of AI interactions (from cross-referenced Anthropic Economic Index interactions/tasks) reported as a percentage in the paper.

high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... share of AI interactions classified as augmentation vs automation

The study cross-references the SAFI benchmark with real-world AI adoption data from the Anthropic Economic Index covering 756 occupations and 17,998 tasks.

Data linkage described in the paper: use of Anthropic Economic Index as real-world AI adoption dataset (numbers reported in text).

high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... occupations and tasks coverage in cross-reference dataset

The benchmark covers 263 text-based tasks spanning all 35 skills in the U.S. Department of Labor's O*NET taxonomy.

Reported dataset construction in the paper: 263 tasks mapped to 35 O*NET skills.

high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... coverage of O*NET skills by benchmark tasks

We present the Skill Automation Feasibility Index (SAFI), benchmarking four frontier LLMs -- LLaMA 3.3 70B, Mistral Large, Qwen 2.5 72B, and Gemini 2.5 Flash -- across 263 text-based tasks spanning all 35 skills in the U.S. Department of Labor's O*NET taxonomy (1,052 total model calls, 0% failure rate).

Empirical benchmark executed by the authors: 263 text-based tasks mapped to 35 O*NET skills, 4 LLMs, 1,052 total model calls reported, and reported 0% failure rate.

high positive The AI Skills Shift: Mapping Skill Obsolescence, Emergence, ... benchmark coverage and execution success (model calls and failure rate)
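
The reported call count follows directly from the benchmark dimensions. A quick check, assuming each model is called exactly once per task (the excerpt does not state this explicitly):

```python
# Model names and task count as reported in the claim above.
models = ["LLaMA 3.3 70B", "Mistral Large", "Qwen 2.5 72B", "Gemini 2.5 Flash"]
tasks = 263

# Assumption: one call per (task, model) pair.
total_calls = tasks * len(models)
print(total_calls)
```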

The paper argues for a fundamental decoupling of semantic intent from human-readable representation.

Conceptual/design claim made by the authors as a recommended shift in representation strategy for agentic consumers; presented as argumentation rather than empirically tested in abstract.

high positive Beyond Human-Readable: Rethinking Software Engineering Conve... alignment between semantic intent encoding and human-readable formats

We extend the semantic density principle to propose rehabilitation of classical anti-patterns and introduce the program skeleton concept for agentic code navigation.

Design/position claims and proposed constructs presented in the paper (program skeleton concept and re-evaluation of anti-patterns) without empirical validation reported in abstract.

high positive Beyond Human-Readable: Rethinking Software Engineering Conve... suitability of classical anti-patterns and program skeletons for agentic navigat...

Aggressive compression reduced input tokens by 17%.

Reported numeric result from the controlled experiment comparing compressed logs to other conditions; sample size not specified in abstract.

high positive Beyond Human-Readable: Rethinking Software Engineering Conve... input token count

We propose a key design principle: semantic density optimization, eliminating tokens that carry zero information while preserving tokens that carry high semantic value.

Proposal/design principle presented in the paper; theoretical justification provided and (per paper) subsequently validated by experiment.

high positive Beyond Human-Readable: Rethinking Software Engineering Conve... information/content efficiency of token representations for agentic consumers

These empirical findings provide reference for global governments to optimise artificial intelligence policies for low-carbon urban development.

Paper conclusion interpreting results as policy-relevant and generalisable lessons for governments; based on observed positive association between NAIDPZ and urban GEE.

high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)

The impact of the NAIDPZ policy on urban GEE is positively moderated by government attention and public environmental attention.

Reported moderation analysis showing interaction effects between the treatment indicator and measures of government attention and public environmental attention within the DiD framework.

high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)

The composite NAIDPZ policy effect increases GEE mainly through promoting green technological innovation and optimising industrial structure.

Mechanism analysis reported in the paper (channel/mediation tests) showing that indicators of green technological innovation and industrial structure optimisation account for much of the policy effect on GEE.

high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)

The policy effect on GEE is stronger in inland cities, central-region cities, and non-resource-based cities.

Reported heterogeneity/subgroup analysis within the staggered DiD framework comparing effects across geographic regions (inland vs. others, central vs. others) and city types (non-resource-based vs. resource-based) in the 267-city sample.

high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)

The NAIDPZ policy significantly improves urban green economic efficiency (GEE).

Estimated treatment effect from staggered DiD on the 267-city panel (2007–2023) with reported statistical significance and multiple robustness checks mentioned.

high positive Unlocking Green Growth: How Artificial Intelligence Policies... green economic efficiency (GEE)

ImplicitMemBench reframes evaluation from 'what agents recall' to 'what they automatically enact'.

Paper framing statement positioning the benchmark's conceptual contribution as shifting evaluation focus to implicit, automatic behavior rather than explicit recall.

high positive ImplicitMemBench: Measuring Unconscious Behavioral Adaptatio... evaluation framing / measurement focus

Top performers were DeepSeek-R1 (65.3%), Qwen3-32B (64.1%), and GPT-5 (63.0%).

Paper lists top model names with reported overall percentage scores from the benchmark evaluation.

high positive ImplicitMemBench: Measuring Unconscious Behavioral Adaptatio... overall accuracy on the implicit memory benchmark

The benchmark's 300-item suite employs a unified Learning/Priming-Interfere-Test protocol with first-attempt scoring.

Paper states the suite size (300 items) and describes a unified Learning/Priming-Interfere-Test protocol and that scoring is done on first attempts.

high positive ImplicitMemBench: Measuring Unconscious Behavioral Adaptatio... other
