Evidence (4560 claims)
- Adoption: 5267 claims
- Productivity: 4560 claims
- Governance: 4137 claims
- Human-AI Collaboration: 3103 claims
- Labor Markets: 2506 claims
- Innovation: 2354 claims
- Org Design: 2340 claims
- Skills & Training: 1945 claims
- Inequality: 1322 claims
Evidence Matrix
Claim counts by outcome category and direction of finding; a dash (—) indicates zero claims in that cell.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 378 | 106 | 59 | 455 | 1007 |
| Governance & Regulation | 379 | 176 | 116 | 58 | 739 |
| Research Productivity | 240 | 96 | 34 | 294 | 668 |
| Organizational Efficiency | 370 | 82 | 63 | 35 | 553 |
| Technology Adoption Rate | 296 | 118 | 66 | 29 | 513 |
| Firm Productivity | 277 | 34 | 68 | 10 | 394 |
| AI Safety & Ethics | 117 | 177 | 44 | 24 | 364 |
| Output Quality | 244 | 61 | 23 | 26 | 354 |
| Market Structure | 107 | 123 | 85 | 14 | 334 |
| Decision Quality | 168 | 74 | 37 | 19 | 301 |
| Fiscal & Macroeconomic | 75 | 52 | 32 | 21 | 187 |
| Employment Level | 70 | 32 | 74 | 8 | 186 |
| Skill Acquisition | 89 | 32 | 39 | 9 | 169 |
| Firm Revenue | 96 | 34 | 22 | — | 152 |
| Innovation Output | 106 | 12 | 21 | 11 | 151 |
| Consumer Welfare | 70 | 30 | 37 | 7 | 144 |
| Regulatory Compliance | 52 | 61 | 13 | 3 | 129 |
| Inequality Measures | 24 | 68 | 31 | 4 | 127 |
| Task Allocation | 75 | 11 | 29 | 6 | 121 |
| Training Effectiveness | 55 | 12 | 12 | 16 | 96 |
| Error Rate | 42 | 48 | 6 | — | 96 |
| Worker Satisfaction | 45 | 32 | 11 | 6 | 94 |
| Task Completion Time | 78 | 5 | 4 | 2 | 89 |
| Wages & Compensation | 46 | 13 | 19 | 5 | 83 |
| Team Performance | 44 | 9 | 15 | 7 | 76 |
| Hiring & Recruitment | 39 | 4 | 6 | 3 | 52 |
| Automation Exposure | 18 | 17 | 9 | 5 | 50 |
| Job Displacement | 5 | 31 | 12 | — | 48 |
| Social Protection | 21 | 10 | 6 | 2 | 39 |
| Developer Productivity | 29 | 3 | 3 | 1 | 36 |
| Worker Turnover | 10 | 12 | — | 3 | 25 |
| Skill Obsolescence | 3 | 19 | 2 | — | 24 |
| Creative Output | 15 | 5 | 3 | 1 | 24 |
| Labor Share of Income | 10 | 4 | 9 | — | 23 |
Active filter: Productivity
We propose a novel Co-Regulation Design Agentic Loop (CRDAL), in which a Metacognitive Co-Regulation Agent assists the Design Agent in metacognition to mitigate design fixation.
Methodological contribution presented in the paper (proposed system architecture). No empirical sample size reported for the proposal itself.
We propose a novel Self-Regulation Loop (SRL), in which the Design Agent self-regulates and explicitly monitors its own metacognition.
Methodological contribution presented in the paper (proposed system architecture). No empirical sample size reported for the proposal itself.
Policy efficacy varies significantly across corporate profiles, with the strongest effects observed in non-state-owned enterprises, high-tech firms, and firms located in eastern regions.
Heterogeneity analyses reported in the study (subgroup analysis by ownership, technology intensity, and geographic region).
The estimated positive effect of the pilot zones on corporate NQPF is robust to a comprehensive battery of robustness and endogeneity tests.
Paper reports multiple robustness and endogeneity checks (details not provided in abstract) that reportedly do not overturn main findings.
Mechanism analysis identifies three systemic transmission pathways for the policy: optimizing factor allocation, deepening digital technology empowerment, and promoting green innovation and sustainability.
Mechanism analysis reported in the study (methods not detailed in abstract) attributing the policy effect to three pathways.
The pilot zones create an optimized 'digital environment' that underlies the positive impact on corporate NQPF.
Empirical analysis in the paper attributes improved corporate NQPF to an optimized digital environment created by the policy intervention; mechanism analysis referenced.
The DML approach flexibly controls for high-dimensional confounding variables and functional form misspecification, enabling more rigorous causal inference than traditional linear models.
Methodological claim based on use of Double Machine Learning in the study (described as addressing high-dimensional confounders and misspecification).
Establishment of China’s National Digital Economy Innovation and Development Pilot Zones significantly enhances corporate New Quality Productive Forces (NQPF).
Quasi-natural experiment using Double Machine Learning (DML) framework applied to A-share listed companies over 2015–2023; empirical results reported as statistically significant.
AlphaFold represents an 'oracle' breakthrough in AI for scientific discovery.
Cited as an example of an algorithmic breakthrough that changed a specific scientific subtask (protein structure prediction). The paper frames AlphaFold as a milestone in the history reviewed; no new experimental data presented.
Phase Three employs AI for comprehensive sensitivity analysis while humans provide strategic interpretation.
Descriptive claim about the third phase of the framework and its use in the paper's applied test; presented as the intended role split between AI (computational sensitivity tasks) and humans (interpretation).
Phase One leverages AI for rapid market research aggregation and preliminary pro forma generation.
Descriptive claim about the first phase of the proposed three-phase framework as presented in the paper; conceptual rather than a separate empirical finding.
The framework achieved a 71–90% reduction in completion time while maintaining analytical quality comparable to traditional methods.
Empirical result reported from the controlled ChatGPT-4 test on the single 150-unit scenario comparing time to complete underwriting tasks versus traditional methods.
This research develops and empirically validates a three-phase framework for AI-augmented multifamily underwriting through controlled testing with ChatGPT-4 using a standardized 150-unit development scenario in Seattle's Greenwood neighborhood.
Controlled testing described in paper: use of ChatGPT-4 on a single standardized 150-unit development scenario in Seattle's Greenwood neighborhood to evaluate the proposed three-phase framework.
Generative artificial intelligence demonstrates significant promise for efficiency gains across financial services.
Introductory assertion in paper; general statement about the potential of generative AI, not directly derived from the paper's controlled test.
Empirical findings demonstrate that digitalization significantly boosts efficiency and competitiveness of industrial production.
Correlation and regression analyses reported in the study linking digitalization measures to indicators of efficiency and competitiveness across levels of analysis.
Digital technologies (automation, IIoT, ERP systems, AI applications) reduce nonproductive costs, increase per-worker output, and improve the cost-efficiency of production in Kazakhstani enterprises.
Case studies and real examples from named enterprises (Asia Auto, Karaganda Foundry and Engineering Plant, Eurasian Resources Group) presented in the article.
The number of employees and working time have a positive but limited effect on labor productivity.
Results from the study's correlation and regression analysis comparing labor input measures (employee count and working time) with productivity outcomes.
Digitalization is the key driver of labor productivity growth in Kazakhstan.
Empirical correlation and regression analysis reported in the study across enterprise, industry, and national economy levels.
Opportunities arising from cyborg workflows include hyper-personalized narratives, democratized production, and ethical augmentation of underrepresented voices.
Forward-looking/interpretive claim in the paper describing potential benefits and opportunities; conceptual rather than empirically demonstrated in the excerpt.
Scalability is addressed via edge computing to support cyborg workflows.
Design/architectural claim in the paper mentioning edge computing as a scalability mechanism; no deployment-scale measurements reported in the excerpt.
The proposed workflows include robust bias mitigation strategies.
Paper asserts bias mitigation approaches are included and demonstrated in case studies; no quantitative fairness metrics or evaluation details provided in the excerpt.
Cyborg workflows produce enhanced creative output via iterative human–AI refinement.
Qualitative claim supported by case studies and examples presented in the paper (no quantitative creativity metrics or sample sizes reported in the excerpt).
Empirical evaluations validate 25-60% improvements in key metrics.
Paper states empirical evaluation results with a 25–60% improvement range; specific metrics, methods, and sample sizes are not provided in the excerpt.
Case studies in content generation, news curation, and immersive production demonstrate efficiency gains of up to 3x in throughput.
Reported results from unspecified case studies described in the paper; numeric claim provided but case study sample sizes and methodological details are not reported in the excerpt.
The paper proposes a comprehensive framework encompassing modular architectures, hybrid protocols, and real-time collaboration interfaces informed by cognitive science, AI engineering, and media studies.
Architectural and methodological proposal described in the paper (the claim is descriptive of the proposed system; no quantitative evaluation of the framework components provided).
Cyborg workflows fuse human judgment with agentic AI autonomous systems capable of goal-directed planning and execution.
Conceptual description and framework proposed in the paper (no empirical sample or trial details reported).
RL-based AVs improve average fuel efficiency by about 1.86% at lower speeds (below 50 km/h) compared to the IDM.
Macroscopic-level fuel efficiency comparison between RL-based AV model and IDM in simulation, stratified by speed (<50 km/h). Number of simulation runs not stated.
RL-based AVs improve average fuel efficiency by about 28.98% at higher speeds (above 50 km/h) compared to the IDM.
Macroscopic-level fuel efficiency comparison between RL-based AV model and IDM in simulation, stratified by speed (>50 km/h). Number of simulation runs not stated.
Transitioning from fully human-driven to fully RL-controlled traffic can increase road capacity by approximately 7.52%.
Macroscopic simulation experiments producing Fundamental Diagrams comparing fully human-driven traffic to fully RL-controlled traffic. Exact number of simulation scenarios or replicates not provided in the claim text.
This study implements a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to control AVs and trains it using the NGSIM highway dataset to enable realistic interaction with human-driven vehicles.
Methodological description in the paper: implementation of TD3 and training on the NGSIM dataset. Dataset referenced but no numeric sample size reported in the claim text.
The result is evidence-based triggers that replace calendar schedules and make governance auditable.
Claimed outcome of applying the decision-theoretic framework in the paper (argumentative; no empirical deployment or case-study evidence reported in the summary).
The paper provides a decision-theoretic framework for retraining policies.
Explicit claim about the paper's contribution; the article presents a framework (conceptual/methodological exposition).
The retraining decision is a cost-minimization problem whose trigger threshold follows directly from the loss function.
Decision-theoretic derivation presented in the paper (analytical/theoretical reasoning; no empirical validation reported).
Retraining can be better understood as approximate Bayesian inference under computational constraints.
Theoretical argument and decision-theoretic framing presented in the paper (conceptual/mathematical derivation rather than empirical testing).
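The cost-minimization framing above can be sketched as a minimal threshold rule. All names and numbers here are hypothetical illustrations, not the paper's formulation: retrain once the expected cumulative loss from serving the drifted model over a planning horizon exceeds the one-off cost of retraining.

```python
def should_retrain(drift_loss_per_step, horizon_steps, retrain_cost):
    """Retrain when the expected cumulative loss from serving the stale
    model over the planning horizon exceeds the one-off retraining cost.
    A stand-in threshold rule; the paper's derivation is more general."""
    expected_stale_loss = drift_loss_per_step * horizon_steps
    return expected_stale_loss > retrain_cost

# Illustrative numbers: monitored drift adds 0.02 loss per step,
# amortized over a 1000-step horizon against a retraining cost of 15.
trigger = should_retrain(0.02, 1000, retrain_cost=15.0)
```

In this toy form, the "evidence-based trigger" is simply the comparison of a monitored drift estimate against a cost-derived threshold, rather than a fixed calendar schedule.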
The analysis was pre-registered and code and data are publicly available.
Authors' statement in the abstract/paper declaring pre-registration and public release of code and data.
The meta-d' framework reveals which models 'know what they don't know' versus which merely appear well-calibrated due to criterion placement — a distinction with direct implications for model selection, deployment, and human-AI collaboration.
Interpretation and implications drawn from empirical results showing dissociations between calibration metrics and metacognitive measures (meta-d', M-ratio, criterion shifts); argument that this distinction informs practical decisions about model use.
We applied this framework to four LLMs (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base, Gemma-2-9B-Instruct) across 224,000 factual QA trials.
Experimental methods reported in the paper listing the four model variants and total trial count (224,000 factual QA trials).
We introduce an evaluation framework based on Type-2 Signal Detection Theory that decomposes these capacities using meta-d' and the metacognitive efficiency ratio M-ratio.
Methodological contribution described in the paper: specification of a Type-2 SDT framework and use of meta-d' and M-ratio as measurement constructs.
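The measurement constructs above can be illustrated with a minimal sketch. Estimating meta-d' itself requires fitting a Type-2 SDT model and is not shown; the code below computes only Type-1 sensitivity (d') from hit and false-alarm rates and the M-ratio given an already-fitted meta-d'. All numeric inputs are illustrative, not from the paper.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # probit transform (inverse standard-normal CDF)

def d_prime(hit_rate, fa_rate):
    """Type-1 sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    return z(hit_rate) - z(fa_rate)

def m_ratio(meta_d, d):
    """Metacognitive efficiency: meta-d' relative to Type-1 d'.
    An M-ratio near 1 means confidence ratings exploit the available
    first-order evidence fully; values well below 1 indicate loss."""
    return meta_d / d

d = d_prime(0.85, 0.20)   # illustrative hit / false-alarm rates
eff = m_ratio(1.2, d)     # 1.2 is a hypothetical fitted meta-d'
```

The dissociation the paper highlights is visible here: a model can place its confidence criterion so that calibration looks good while its M-ratio stays low.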
The best designs often do not originate from top-ranked ILP candidates, indicating that global optimization exposes improvements missed by sub-kernel search.
Analysis comparing origins of the best final designs vs. their ILP ranking, reported across the 12-benchmark evaluation set.
Larger gains on harder benchmarks: streamcluster exceeds 20× and kmeans reaches approximately 10×.
Per-benchmark empirical results reported for streamcluster and kmeans in the evaluation.
Scaling from 1 to 10 agents yields a mean 8.27× speedup over baseline.
Empirical evaluation across the reported benchmark set comparing performance with 1 agent versus 10 agents; mean speedup stated in the results.
We evaluate the approach on 12 kernels from HLS-Eval and Rodinia-HLS using Claude Code (Opus 4.5/4.6) with AMD Vitis HLS.
Experimental setup described in the paper reporting evaluation on 12 kernels drawn from HLS-Eval and Rodinia-HLS, using Claude Code (Opus 4.5/4.6) and AMD Vitis HLS.
In Stage 2, the pipeline launches N expert agents over the top ILP solutions, each exploring cross-function optimizations such as pragma recombination, loop fusion, and memory restructuring that are not captured by sub-kernel decomposition.
Method section describing Stage 2 which runs multiple expert agents exploring cross-function optimizations on top ILP solutions.
In Stage 1, the pipeline decomposes a design into sub-kernels, independently optimizes each using pragma and code-level transformations, and formulates an Integer Linear Program (ILP) to assemble globally promising configurations under an area constraint.
Method section describing Stage 1 decomposition, per-sub-kernel optimization and ILP assembly under an area constraint.
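Stage 1's assembly step can be illustrated with a toy version of the selection problem: pick one optimized variant per sub-kernel to minimize total latency under an area budget. The sub-kernel names, latencies, and areas below are invented for illustration, and a brute-force search stands in for the paper's ILP solver.

```python
from itertools import product

# Hypothetical sub-kernel variants: (latency_cycles, area_units) per
# candidate configuration. Names and numbers are illustrative only.
variants = {
    "load":    [(100, 5), (60, 12), (40, 25)],
    "compute": [(500, 10), (300, 30), (180, 60)],
    "store":   [(80, 4), (50, 9)],
}
AREA_BUDGET = 80

def assemble(variants, budget):
    """Exhaustively pick one variant per sub-kernel, minimizing total
    latency subject to the area budget (a stand-in for the ILP)."""
    best = None
    names = list(variants)
    for combo in product(*(variants[n] for n in names)):
        area = sum(a for _, a in combo)
        if area > budget:
            continue
        latency = sum(l for l, _ in combo)
        if best is None or latency < best[0]:
            best = (latency, area, dict(zip(names, combo)))
    return best

latency, area, choice = assemble(variants, AREA_BUDGET)
```

Note that the area constraint can force a globally best assembly that uses no locally fastest variant, which is why Stage 2's cross-function exploration over the top ILP solutions can still find further improvements.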
We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents.
Method description in the paper describing the design and implementation of the two-stage 'agent factory' pipeline.
Deployment validation across 43 classrooms demonstrated an 18x efficiency gain in the assessment workflow.
Field deployment described in the paper: system was validated across 43 classrooms and an efficiency gain of 18x in the assessment workflow is reported.
Interaction2Eval achieves up to 88% agreement with human expert judgments.
Reported evaluation results comparing Interaction2Eval outputs to human expert annotations (rubric-based judgments) on the dataset.
Interaction2Eval, an LLM-based framework, addresses domain-specific challenges (child speech recognition, Mandarin homophone disambiguation, rubric-based reasoning).
Methodological description in the paper: a specialized LLM-based pipeline designed to handle listed domain challenges; presented as the approach used to extract structured quality indicators.
TEPE-TCI-370h is the first large-scale dataset of naturalistic teacher-child interactions in Chinese preschools (370 hours, 105 classrooms) with standardized ECQRS-EC and SSTEW annotations.
Authors' dataset construction and description: 370 hours of recorded interactions from 105 classrooms, annotated with ECQRS-EC and SSTEW rubrics as reported in the paper.
All data and models are publicly released.
Statement in abstract asserting public release of datasets and models.