Evidence (4114 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	758	199	100	900	2007
Governance & Regulation	826	400	191	122	1563
Organizational Efficiency	777	193	124	84	1189
Technology Adoption Rate	635	233	124	97	1098
Research Productivity	422	128	57	336	954
Output Quality	476	179	59	47	761
Decision Quality	328	177	81	47	640
Firm Productivity	435	57	88	20	606
AI Safety & Ethics	218	277	65	33	599
Market Structure	180	170	123	24	502
Task Allocation	213	64	72	33	387
Skill Acquisition	170	61	61	17	309
Innovation Output	203	27	43	18	292
Employment Level	105	54	107	13	281
Fiscal & Macroeconomic	131	69	43	26	276
Consumer Welfare	117	63	42	11	233
Firm Revenue	153	48	26	3	230
Task Completion Time	173	31	8	12	225
Inequality Measures	44	122	49	6	221
Worker Satisfaction	89	65	22	12	188
Error Rate	69	92	10	2	173
Regulatory Compliance	77	69	14	5	165
Automation Exposure	56	56	26	13	154
Training Effectiveness	94	21	13	19	149
Wages & Compensation	77	36	25	6	144
Team Performance	86	17	27	10	141
Developer Productivity	95	17	14	6	133
Job Displacement	12	80	20	1	113
Hiring & Recruitment	52	7	8	3	70
Creative Output	31	18	8	3	61
Skill Obsolescence	5	46	6	1	58
Social Protection	27	16	8	2	53
Labor Share of Income	17	19	17	—	53
Worker Turnover	11	12	—	3	26
Industry	—	—	—	1	1

Filter: Innovation

The study implies policy actions to promote high-quality development based on the finding that innovation and the digital economy now play larger roles in growth.

Authors' discussion/conclusion drawing policy implications from empirical findings (declining capital elasticity, rising TFP and digital economy contribution).

high positive Analysis of China's Economic Growth Drivers: An Empirical St... policy implication for promoting high-quality development

Overall, China's growth model shifted over 2010–2022 from being investment-driven to being innovation-driven.

Synthesis of results: declining capital elasticity, rising TFP contribution, substantial share of digital economy in TFP, and regional patterns reported by the study.

high positive Analysis of China's Economic Growth Drivers: An Empirical St... structural shift in the growth model (investment-driven → innovation-driven)

The study's method is novel because it uses both migrant worker monitoring data and digital-economy proxy indicators, giving a more accurate picture of how labor quality and technological progress affect each other.

Author-reported methodological description: extended Cobb–Douglas approach combined with quality-adjusted labor measures derived from migrant worker monitoring data and proxy indicators for the digital economy.

high positive Analysis of China's Economic Growth Drivers: An Empirical St... measurement accuracy of labor quality and technology interaction (methodological...

Regional analysis shows that growth in coastal regions has been innovation-driven, with an estimated innovation coefficient of approximately 0.31.

Regional decomposition/estimation reported in the paper's analysis of coastal vs inland regions using the extended production function and digital/labour-quality measures.

high positive Analysis of China's Economic Growth Drivers: An Empirical St... innovation-related elasticity/coefficient in coastal regions (≈0.31)

The digital economy accounted for 40% of the TFP contribution to growth.

Attribution within the growth decomposition from the extended production function, where digital economy indicators are included and their contribution to TFP is estimated.

high positive Analysis of China's Economic Growth Drivers: An Empirical St... share of TFP contribution attributable to the digital economy

The contribution rate of total factor productivity (TFP) rose from 18% to 26% between the earlier and later periods.

Decomposition of growth using the extended Cobb–Douglas production function for China over 2010–2022, reporting TFP contribution rates for the two periods.

high positive Analysis of China's Economic Growth Drivers: An Empirical St... TFP contribution rate to economic growth
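The contribution rates reported in the entries above come from a growth decomposition. The sketch below illustrates the general idea with textbook growth accounting, assuming a plain Cobb-Douglas form Y = A * K^alpha * L^beta. It is not the paper's extended specification (which adds quality-adjusted labor and digital-economy indicators), and every number in it is hypothetical.

```python
def growth_contributions(g_y, g_k, g_l, alpha, beta):
    """Split output growth into capital, labor, and TFP (residual) shares.

    Assumes Y = A * K**alpha * L**beta, so in growth rates
    g_y = g_a + alpha * g_k + beta * g_l.
    """
    g_a = g_y - alpha * g_k - beta * g_l  # Solow residual (TFP growth)
    return {
        "capital": alpha * g_k / g_y,
        "labor": beta * g_l / g_y,
        "tfp": g_a / g_y,
    }

# Hypothetical growth rates and elasticities, chosen only for illustration.
shares = growth_contributions(g_y=0.06, g_k=0.08, g_l=0.01, alpha=0.5, beta=0.5)
print({name: round(value, 3) for name, value in shares.items()})
```

By construction the three shares sum to one, so a falling capital elasticity mechanically raises the share left to the residual when growth rates are held fixed.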

The paper proposes design principles for effective, accountable, and adaptive sandboxes to contribute to debates on experimentalism in AI governance.

Stated contribution of the paper (descriptive claim about content; the abstract does not list the principles or report empirical testing).

high positive Experimentalism beyond ex ante regulation: A law and economi... existence and articulation of design principles for RSs

Regulatory sandboxes (RSs) have emerged as a potential solution to AI regulatory challenges.

Descriptive observation and normative framing within the paper; contextual reference to the EU AI Act's treatment of sandboxes (no empirical sample reported in the abstract).

high positive Experimentalism beyond ex ante regulation: A law and economi... adoption/emergence of RSs as a governance mechanism for AI

PIER is an offline reinforcement learning framework that learns fuel‑efficient, safety‑aware routing policies from physics‑calibrated environments grounded in historical vessel tracking data and ocean reanalysis products, requiring no online simulator.

Methodological description of PIER in the paper: offline RL trained on environments constructed from AIS and reanalysis data; no online simulator used for policy learning (implementation details provided).

high positive Physics-informed offline reinforcement learning eliminates c... requirement for online simulator (method characteristic)

Bootstrap 95% confidence interval for PIER mean CO2 savings relative to great-circle routing is [2.9%, 15.7%].

Bootstrap analysis applied to the 2023 AIS validation results (840 episodes per method) producing the stated 95% CI for mean percent savings.

high positive Physics-informed offline reinforcement learning eliminates c... 95% bootstrap confidence interval for mean percent CO2 savings
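A bootstrap confidence interval of the kind reported above can be sketched as follows. This is a minimal percentile bootstrap for a mean, assuming independent per-episode savings; the paper's exact resampling scheme is not given in this listing, and the data below are synthetic, not the AIS validation results.

```python
import random
import statistics

def bootstrap_ci_mean(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the mean, and sort the results.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Synthetic per-episode percent savings (hypothetical, 840 episodes).
data_rng = random.Random(42)
savings = [data_rng.gauss(9.0, 30.0) for _ in range(840)]
lo, hi = bootstrap_ci_mean(savings)
print(f"mean={statistics.fmean(savings):.2f}%  95% CI=[{lo:.2f}%, {hi:.2f}%]")
```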

PIER reduces per‑voyage fuel consumption variance by a factor of 3.5 (p < 0.001).

Statistical comparison of per-voyage fuel variance between PIER and baseline routing on 840 episodes per method from 2023 AIS data; significance reported with p < 0.001.

high positive Physics-informed offline reinforcement learning eliminates c... variance of per-voyage fuel consumption

The main results are robust to inclusion of firm, industry, and year fixed effects, DID identification using the 2018 SCD pilot, and multiple robustness checks addressing potential confounders and endogeneity.

Authors report baseline regressions with firm/industry/year fixed effects, DID specifications exploiting the 2018 Supply Chain Innovation and Application Pilot Program as a quasi-natural experiment, and a battery of robustness tests (alternative specifications, controls, and checks).

high positive Supply Chain Digitalization and its Impact on Green Innovati... robustness of estimated SCD effects on corporate green innovation

The positive effect of SCD on green innovation is stronger for substantive green innovation (actual environmentally beneficial R&D and technologies) than for strategic green innovation (symbolic/labeling or reputation‑oriented activities).

Heterogeneous outcome analysis splitting green innovation into 'substantive' (e.g., green patents, technological R&D outputs) versus 'strategic' (signaling/compliance indicators); regression and DID estimates show larger and statistically significant coefficients for substantive measures compared to smaller or weaker effects on strategic measures.

high positive Supply Chain Digitalization and its Impact on Green Innovati... substantive green innovation (green patents, concrete environmental R&D outputs)...

Supply chain digitalization (SCD) significantly increases corporate green innovation among Chinese A-share listed firms (2012–2022).

Panel analysis of Chinese A-share listed firms over 2012–2022 using regression models with firm, industry, and year fixed effects; difference-in-differences (DID) identification exploiting the 2018 Supply Chain Innovation and Application Pilot Program as an exogenous shock to SCD; firm-level controls included; multiple robustness checks reported.

high positive Supply Chain Digitalization and its Impact on Green Innovati... corporate green innovation (aggregate measures of green innovation such as green...
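The difference-in-differences logic behind this identification can be illustrated with a two-group, two-period comparison of means. This is a simplified sketch: the paper estimates regressions with firm, industry, and year fixed effects and firm-level controls, which this example omits, and the outcome values are hypothetical.

```python
from statistics import fmean

def did_estimate(rows):
    """2x2 difference-in-differences on group means.

    Each row is (treated: bool, post: bool, outcome: float).
    """
    def cell_mean(treated, post):
        return fmean(y for t, p, y in rows if t == treated and p == post)

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change

# Hypothetical firm-year outcomes (e.g. green patent counts), not real data.
rows = [
    (True, False, 2.0), (True, False, 4.0),    # pilot firms, before 2018
    (True, True, 6.0), (True, True, 8.0),      # pilot firms, after 2018
    (False, False, 1.0), (False, False, 3.0),  # other firms, before 2018
    (False, True, 2.0), (False, True, 4.0),    # other firms, after 2018
]
print(did_estimate(rows))
```

The estimate is the change among pilot firms minus the change among other firms, which removes any shock common to both groups under the parallel-trends assumption.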

Research priorities include empirically quantifying AI's effects on productivity, wages, inequality, and environmental costs; developing standardized sustainability and governance metrics; and evaluating regulatory impacts on innovation and welfare.

Stated research agenda based on gaps identified in the narrative review; identifies directions for future empirical work rather than presenting new empirical findings.

high positive The Evolution and Societal Impact of Artificial Intelligence... empirical evidence and standardized metrics for AI impacts (productivity, labor-...

AI has progressed from symbolic systems to data-driven, generative architectures and large-scale computational infrastructures, becoming a foundational technology across sectors.

Narrative synthesis of historical and technical literature across AI research and innovation studies; qualitative tracing of architectural shifts (symbolic → statistical → deep learning/generative models) and increased deployment across industries. No original empirical measurement or sample size reported in this paper.

high positive The Evolution and Societal Impact of Artificial Intelligence... technological evolution and cross-sector adoption (foundational-technology statu...

Mechanisms linking digital services to export performance include reduced transaction and search costs, platform network and scale effects, data as an input improving service quality and customization, and task‑level specialization changing comparative advantage.

Conceptual/theoretical synthesis drawing on multiple strands of literature and illustrative case studies presented in the review (no new causal identification).

high positive Analysis of Digital Services Trade and Export Competitivenes... export performance of digital services (via transaction costs, service quality, ...

Digital services trade is shifting from traditional cross‑border delivery toward online, platform‑based models, with cross‑border data flows a core input and determinant of competitiveness.

Integrative literature and policy review synthesizing domestic and international studies; theoretical/conceptual synthesis and cited case examples (no new econometric analysis or primary microdata).

high positive Analysis of Digital Services Trade and Export Competitivenes... mode of digital services delivery and export competitiveness (role of platforms ...

An asynchronous sliding-window engine treats the GPU as a sliding compute window and overlaps GPU computation with CPU-side parameter updates and multi-tier I/O to hide data movement and synchronization overheads.

System design and implementation described in the paper: an asynchronous runtime that coordinates GPU kernels, CPU updates, and multi-tier I/O. This is a design/implementation claim rather than a measured outcome; the summary links the design to performance improvements.

high positive An Efficient Heterogeneous Co-Design for Fine-Tuning on a Si... system behavior (overlap of compute and I/O / synchronization)
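The overlap of computation with data movement described above can be sketched with a one-block prefetch. This is only an illustration of the pattern, not the paper's engine: `load_block` and `compute` are hypothetical stand-ins for multi-tier I/O and GPU kernels, and the CPU-side parameter updates are omitted.

```python
from concurrent.futures import ThreadPoolExecutor

def load_block(i):
    """Stand-in for fetching block i's parameters from CPU memory or disk."""
    return [float(i)] * 4

def compute(block, x):
    """Stand-in for the GPU kernel applied to the resident block."""
    return x + sum(block)

def run_sliding_window(n_blocks, x=0.0):
    """Process blocks in order while prefetching the next one in the background."""
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(load_block, 0)
        for i in range(n_blocks):
            block = pending.result()  # wait until block i is resident
            if i + 1 < n_blocks:
                # Start loading block i + 1 before computing on block i.
                pending = io_pool.submit(load_block, i + 1)
            x = compute(block, x)     # runs while the next load proceeds
    return x

print(run_sliding_window(3))
```

Only one block is resident while another is in flight, which is the sense in which the compute device acts as a window sliding over a model too large to hold at once.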

Evaluation metrics for the benchmark include task-specific metrics such as win-rate for battling and completion time for speedruns, as well as strategic robustness measures.

Paper's evaluation section lists metrics used: win-rate, completion time, strategic robustness; describes how they are computed and used to compare agents.

high positive The PokeAgent Challenge: Competitive and Long-Context Learni... evaluation metrics used (win-rate, completion time, strategic robustness)

Speedrunning Track includes an open-source multi-agent orchestration system and standardized evaluation scenarios for reproducible multi-agent comparisons.

Paper describes and releases an open-source harness for orchestrating LLMs/agents and provides standardized scenarios and evaluation tools meant for reproducibility.

high positive The PokeAgent Challenge: Competitive and Long-Context Learni... availability of open-source orchestration code and standardized evaluation scena...

Community interest in the benchmark was validated by a NeurIPS 2025 competition with 100+ teams and published analyses of winning submissions.

Paper reports organization/validation via a NeurIPS 2025 competition, states participation of 100+ teams, and includes documentation/analyses of top submissions.

high positive The PokeAgent Challenge: Competitive and Long-Context Learni... number of competing teams (100+), availability of competition analyses/winning s...

The project is a living benchmark: the Battling Track has a live leaderboard and the Speedrunning Track uses self-contained evaluation to ensure reproducibility.

Paper/documentation notes a live leaderboard for Battling and provides self-contained evaluation pipelines/orchestration for Speedrunning intended to support reproducible runs.

high positive The PokeAgent Challenge: Competitive and Long-Context Learni... presence of live leaderboard and self-contained evaluation pipelines

Baselines include heuristic rule-based agents, reinforcement-learning (RL) agents trained for specialist play, and LLM-based agents/harnesses for generalist approaches.

Paper presents baseline implementations and experiments spanning heuristic, RL, and LLM-based agents and describes training procedures and architectures used for each baseline category.

high positive The PokeAgent Challenge: Competitive and Long-Context Learni... presence and types of baseline agents (heuristic, RL, LLM)

The benchmark is split into two complementary tracks: a Battling Track (competitive, partial-observability battles) and a Speedrunning Track (long-horizon RPG tasks with a multi-agent orchestration harness).

Paper structure and dataset descriptions specify two tracks, their scopes, and the inclusion of a multi-agent orchestration system for the Speedrunning Track.

high positive The PokeAgent Challenge: Competitive and Long-Context Learni... benchmark partitioning (presence of Battling and Speedrunning tracks)

The Battling Track dataset contains more than 20 million recorded battle trajectories.

Paper reports a Battling Track dataset of >20M recorded battle trajectories collected from simulated/match play; size reported explicitly in dataset and methods section.

high positive The PokeAgent Challenge: Competitive and Long-Context Learni... number of recorded battle trajectories (>20,000,000)

PokeAgent Challenge is a large, realistic multi-agent benchmark built on Pokémon that stresses partial observability, game-theoretic reasoning, and long-horizon planning simultaneously.

Paper describes the design and motivation of the benchmark, detailing two tracks (Battling and Speedrunning) intended to capture partial observability, adversarial/game-theoretic interactions, and long-horizon sequential planning; the benchmark implementation is built on a Pokémon simulator, and task specifications are described.

high positive The PokeAgent Challenge: Competitive and Long-Context Learni... benchmark task characteristics (partial observability, game-theoretic complexity...

iDaVIE's modular architecture supports extensibility (planned features include subcube loading, advanced render modes, video scripting, and collaborative VR sessions).

Paper describes modular architecture and lists planned/possible future features; this is a software design claim rather than an empirical result.

high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... software extensibility and planned feature set

Because iDaVIE is open-source and extensible, software licensing costs are low and marginal adoption costs fall over time.

Paper states iDaVIE is open-source and designed for community-driven enhancements; economic claim based on general properties of open-source software rather than empirical cost accounting.

high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... licensing cost implication and marginal adoption costs

iDaVIE includes interaction features such as selection, cropping/subcube tools, catalogue overlays, and export back to existing pipelines.

Feature list in paper describing selection, cropping, overlays, in-VR metrics and export functionality; demonstrated integration to export edited masks/subcubes.

high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... availability and functionality of in-VR interaction and export tools

Streaming and downsampling pipelines implemented as Unity plug-ins make large volumes interactively viewable in VR while preserving needed detail for inspection.

Technical description of custom Unity plug-ins for streaming/downsampling and on-the-fly statistics; tested on HI cubes (telescopes listed) per the paper.

high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... interactive rendering performance and retention of inspection-relevant detail

iDaVIE (v1.0) is a working VR software suite that lets astronomers import, render, inspect, and interactively edit very large 3D data cubes in real time.

Described implementation of iDaVIE v1.0 built on Unity/SteamVR with custom plug-ins for parsing/downsampling and real-time rendering; tested on large 3D spectral (HI) cubes from radio telescopes (MeerKAT, ASKAP, APERTIF) as reported in the paper.

high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... ability to import/render/inspect/edit large 3D data cubes in real time (interact...

HindSight reveals a large, real difference between systems that is missed by LLM-based judging (i.e., HindSight detects the retrieval-augmentation advantage while LLM-judged metrics do not).

Combined empirical results: HindSight shows a 2.5× advantage (p < 0.001) for retrieval augmentation while LLM-as-Judge reports no significant difference (p = 0.584).

high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... Detection of performance difference between retrieval-augmented and vanilla gene...

Experiments in the paper cover 10 AI/ML research topics and use a 30-month forward evaluation window.

Experimental setup reported in the paper: scope explicitly stated as 10 AI/ML topics and a 30-month forward window after cutoff T.

high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... Scope parameters (number of topics = 10; forward window length = 30 months)

Generated ideas can be algorithmically compared to future publications and matched items can be assigned scores reflecting downstream impact (citation counts and venue acceptance).

Method section: description of algorithmic matching procedure and scoring rules that use citation counts and venue acceptance as impact proxies.

high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... Match indicators and downstream-impact scores (citations, venue acceptance) for ...

A retrieval-augmented idea generator produces 2.5× higher-scoring ideas than a vanilla generator according to HindSight (p < 0.001).

Empirical comparison reported in the paper across the specified experiments (10 AI/ML topics, time-split at T, 30-month forward window); statistical test reporting a 2.5× difference with p < 0.001.

high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... HindSight score (downstream-impact-based score for generated ideas)

HindSight is a time-split, retrospective evaluation that (1) restricts idea generation to pre-cutoff literature (time T), (2) compares generated ideas to papers published in the following 30 months, and (3) scores matches by downstream impact (citation counts and venue acceptance).

Method described in paper: time-split protocol with a temporal cutoff T, a 30-month forward window, algorithmic matching of generated ideas to later publications, and scoring based on downstream impact metrics (citations and venue acceptance).

high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... HindSight match score computed from matches to later publications weighted by ci...
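The three-step protocol above can be sketched as a small scoring function. Everything specific here is an assumption made for illustration: the token-overlap `jaccard` match, the 0.5 threshold, and the impact weighting (log citations plus a venue bonus) are hypothetical stand-ins, since this listing does not give the paper's matching algorithm or scoring rule.

```python
import math
from datetime import date

def add_months(d, months):
    """Return the first day of the month that is `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

def jaccard(a, b):
    """Token-overlap similarity; a stand-in for the paper's matching step."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def hindsight_like_score(idea, papers, cutoff, window_months=30, threshold=0.5):
    """Score an idea by its best-matching paper inside the forward window."""
    end = add_months(cutoff, window_months)
    best = 0.0
    for paper in papers:
        if not (cutoff < paper["published"] <= end):
            continue  # only papers published after the cutoff count
        if jaccard(idea, paper["title"]) < threshold:
            continue
        # Hypothetical impact weighting: log citations plus a venue bonus.
        impact = math.log1p(paper["citations"]) + (1.0 if paper["accepted"] else 0.0)
        best = max(best, impact)
    return best

# Hypothetical corpus: the second paper predates the cutoff and is ignored.
papers = [
    {"title": "retrieval augmented idea generation",
     "published": date(2023, 3, 1), "citations": 40, "accepted": True},
    {"title": "retrieval augmented idea generation",
     "published": date(2021, 5, 1), "citations": 900, "accepted": True},
]
idea = "retrieval augmented idea generation for research"
score = hindsight_like_score(idea, papers, cutoff=date(2022, 1, 1))
print(round(score, 3))
```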

The paper introduces a Multi-Object Decoder (MOD) that extends SAM 3D to jointly reconstruct multiple objects from a single image, targeting physically plausible, non-penetrating object configurations and realistic contacts.

Method section: MOD is described as an extension of the single-object SAM 3D architecture to jointly decode multiple object shapes and poses from a monocular image; the method explicitly aims to reduce inter-object penetration and model contacts.

high positive MessyKitchens: Contact-rich object-level 3D scene reconstruc... methodological capability: joint multi-object monocular 3D reconstruction, objec...

Managing captures, traces, and replay sessions from a unified single design database ensures consistency across replay targets and sessions.

Method description emphasizes a single design database coordinating captures and replays across simulation and emulation for the demonstrator system. (Operational claim demonstrated in the implementation; no metrics on error reduction provided.)

high positive ODIN-Based CPU-GPU Architecture with Replay-Driven Simulatio... consistency of trace/replay data and configuration across targets

The captured traces can be deterministically replayed across different execution targets (software/hardware simulation and hardware emulation), reducing cross-platform setup complexity and discrepancies.

The same captured waveforms/traces were replayed on both simulation and emulation environments for the ODIN demonstrator; cross-target replay was part of the described method. (Demonstrated on the single reported system; no broad cross-toolchain study provided.)

high positive ODIN-Based CPU-GPU Architecture with Replay-Driven Simulatio... consistency of reproduced behavior across simulator and emulator targets

The paper provides concrete, regulation-inspired policy examples (e.g., content prohibition, sensitive data exfiltration) showing how they map into the Policy function.

Worked, illustrative examples included in the paper mapping regulatory constraints to the Policy(agent_id, partial_path, proposed_action, org_state) formalism.

high positive Runtime Governance for AI Agents: Policies on Paths representability of regulation-inspired policies in the formalism (yes/no; examp...

Runtime policy evaluation can intercept, score, log, allow/modify/block actions, and update organizational state as part of an agent's execution loop (reference implementation architecture).

Reference implementation design described in the paper (runtime policy evaluator hooks, logging, enforcement actions); architectural reasoning and pseudo-workflows provided; no production deployment data.

high positive Runtime Governance for AI Agents: Policies on Paths feasibility of integrating runtime policy evaluator into agent loops (architectu...

Policies can be formalized as deterministic functions p_violation = Policy(agent_id, partial_path, proposed_action, org_state) that return a probability or score of violation for a proposed next action.

Formal definition and mapping in the paper; worked examples showing how regulatory-style constraints map into this function; no large-scale empirical validation.

high positive Runtime Governance for AI Agents: Policies on Paths expressiveness of policy formalism (ability to represent targeted constraints)
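A path-dependent policy with the signature Policy(agent_id, partial_path, proposed_action, org_state) might look like the sketch below. The step and action dictionaries, the `OrgState` fields, the scores, and the 0.5 threshold are all hypothetical; the example only shows how the same proposed action can be allowed or blocked depending on the path that preceded it.

```python
from dataclasses import dataclass, field

@dataclass
class OrgState:
    """Hypothetical organizational state shared across agents."""
    blocked_domains: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

def exfiltration_policy(agent_id, partial_path, proposed_action, org_state):
    """Return a violation score in [0, 1] for the proposed next action.

    Path-dependent: sending data out is only suspicious if the agent
    read sensitive data earlier on the same path.
    """
    read_sensitive = any(step["type"] == "read_sensitive" for step in partial_path)
    is_send = proposed_action["type"] == "send_external"
    if is_send and proposed_action.get("domain") in org_state.blocked_domains:
        return 1.0
    if is_send and read_sensitive:
        return 0.9
    return 0.0

def gate(agent_id, partial_path, proposed_action, org_state, threshold=0.5):
    """Score, log, then allow or block the action (modification is omitted)."""
    score = exfiltration_policy(agent_id, partial_path, proposed_action, org_state)
    decision = "block" if score >= threshold else "allow"
    org_state.audit_log.append((agent_id, proposed_action["type"], score, decision))
    return decision

state = OrgState(blocked_domains={"paste.example"})
send = {"type": "send_external", "domain": "partner.example"}
print(gate("agent-1", [], send, state))                            # allow
print(gate("agent-1", [{"type": "read_sensitive"}], send, state))  # block
```

Because the function is deterministic in its four arguments, the logged score can be reproduced later from the recorded path and state.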

Effective governance for agentic LLM systems requires treating the execution path as the central object and performing runtime evaluation of proposed next actions given the partial path.

Theoretical argument and formal proposal of runtime policy evaluator that takes (agent_id, partial_path, proposed_action, org_state) and returns a violation probability; reference architecture described; illustrative examples.

high positive Runtime Governance for AI Agents: Policies on Paths governance effectiveness for path-dependent policies (qualitative/coverage)

The surrogate-driven inverse-design pipeline transfers to physical hardware — designs produced by the CNN+GA pipeline were realized and validated experimentally.

Two fabricated prototypes implemented the optimized pixelated combiners and GaN HEMT Doherty PAs; measured performance metrics correspond to the designs, demonstrating transfer from surrogate-driven design to hardware.

high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... consistency between surrogate-driven design outputs and measured prototype perfo...

Under a 20 MHz 5G-NR-like waveform (9 dB PAPR) with digital predistortion (DPD), each prototype reached average PAE greater than 51% while meeting ACLR ≤ −60.8 dBc.

Realistic waveform testing described: a 20 MHz 5G‑NR-like signal with 9 dB PAPR was applied to the prototypes, DPD was used, and measurements reported average PAE > 51% and ACLR ≤ −60.8 dBc for each prototype.

high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... average power-added efficiency (PAE %) and adjacent channel leakage ratio (ACLR,...

Each prototype demonstrated drain efficiency greater than 52% at 9 dB back-off.

Back-off efficiency measurements reported for the fabricated prototypes showing drain efficiency > 52% at 9 dB back-off.

high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... drain efficiency at 9 dB back-off (%)

Each prototype produced output power exceeding 44.1 dBm at 2.75 GHz.

Measured output power reported from RF characterization of the two fabricated prototypes; reported value > 44.1 dBm at the test frequency.

high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... output power (dBm)

Each fabricated prototype achieved peak drain efficiency greater than 74%.

Measured RF characterization reported for the two prototypes showing peak drain efficiency > 74%; measurements conducted on fabricated hardware at 2.75 GHz.

high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... peak drain efficiency (%)
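The figures of merit in the measurement entries above follow standard RF definitions, sketched here for reference. Only the 44.1 dBm threshold comes from the listing; the DC supply power used in the last line is a made-up value for illustration.

```python
def dbm_to_watts(p_dbm):
    """Convert power in dBm to watts: P[W] = 10**(P[dBm] / 10) / 1000."""
    return 10 ** (p_dbm / 10) / 1000

def drain_efficiency(p_out_w, p_dc_w):
    """Drain efficiency = RF output power / DC supply power."""
    return p_out_w / p_dc_w

def power_added_efficiency(p_out_w, p_in_w, p_dc_w):
    """PAE = (RF output power - RF input power) / DC supply power."""
    return (p_out_w - p_in_w) / p_dc_w

p_out = dbm_to_watts(44.1)  # output-power threshold reported in the claims
print(f"44.1 dBm = {p_out:.1f} W")  # prints "44.1 dBm = 25.7 W"

# Hypothetical DC supply power, used only to exercise the formula.
eta = drain_efficiency(p_out, p_dc_w=34.0)
```

PAE is always at or below drain efficiency for the same operating point, because it subtracts the RF drive power from the numerator.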

A genetic-algorithm (GA) blackbox optimizer paired with the CNN surrogate can effectively search the discrete multi-port pixel layout space to synthesize output combiners for Doherty amplifiers.

Method description: CNN surrogate embedded in a blackbox Doherty framework and used within a GA to select pixelated combiner layouts; successful designs were produced and taken to fabrication.

high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... ability of optimization stack to find feasible combiner layouts that meet system...
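The search over discrete pixel layouts can be sketched as an elitist genetic algorithm scored by a surrogate. In this illustration `surrogate_score` is a toy stand-in (agreement with a hypothetical target layout) rather than a trained CNN, and the population size, mutation rate, and 64-pixel layout are arbitrary assumptions, not the paper's settings.

```python
import random

def surrogate_score(layout, target):
    """Toy stand-in for the CNN surrogate: fraction of pixels matching a target.

    A real surrogate would predict the electrical response of the layout.
    """
    return sum(a == b for a, b in zip(layout, target)) / len(target)

def genetic_search(n_pixels, fitness, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Elitist GA over binary pixel layouts; returns (best_layout, history)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_pixels)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]        # truncation selection
        children = [pop[0][:]]                # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_pixels)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

target = [i % 2 for i in range(64)]  # hypothetical "ideal" layout
best, history = genetic_search(64, lambda layout: surrogate_score(layout, target))
print(f"best surrogate score: {history[-1]:.2f}")
```

Because the best layout is copied unchanged into each new generation, the best score never decreases; the surrogate is what makes each fitness call cheap enough for this many evaluations.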
