The Commonplace

Evidence (2320 claims)

Adoption: 5227 claims
Productivity: 4503 claims
Governance: 4100 claims
Human-AI Collaboration: 3062 claims
Labor Markets: 2480 claims
Innovation: 2320 claims
Org Design: 2305 claims
Skills & Training: 1920 claims
Inequality: 1311 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 373 105 59 439 984
Governance & Regulation 366 172 115 55 718
Research Productivity 237 95 34 294 664
Organizational Efficiency 364 82 62 34 545
Technology Adoption Rate 293 118 66 30 511
Firm Productivity 274 33 68 10 390
AI Safety & Ethics 117 178 44 24 365
Output Quality 231 61 23 25 340
Market Structure 107 123 85 14 334
Decision Quality 158 68 33 17 279
Fiscal & Macroeconomic 75 52 32 21 187
Employment Level 70 32 74 8 186
Skill Acquisition 88 31 38 9 166
Firm Revenue 96 34 22 152
Innovation Output 105 12 21 11 150
Consumer Welfare 68 29 35 7 139
Regulatory Compliance 52 61 13 3 129
Inequality Measures 24 68 31 4 127
Task Allocation 71 10 29 6 116
Worker Satisfaction 46 38 12 9 105
Error Rate 42 47 6 95
Training Effectiveness 55 12 11 16 94
Task Completion Time 76 5 4 2 87
Wages & Compensation 46 13 19 5 83
Team Performance 44 9 15 7 76
Hiring & Recruitment 39 4 6 3 52
Automation Exposure 18 16 9 5 48
Job Displacement 5 29 12 46
Social Protection 19 8 6 1 34
Developer Productivity 27 2 3 1 33
Worker Turnover 10 12 3 25
Creative Output 15 5 3 1 24
Skill Obsolescence 3 18 2 23
Labor Share of Income 8 4 9 21
Active filter: Innovation
PIER is an offline reinforcement learning framework that learns fuel‑efficient, safety‑aware routing policies from physics‑calibrated environments grounded in historical vessel tracking data and ocean reanalysis products, requiring no online simulator.
Methodological description of PIER in the paper: offline RL trained on environments constructed from AIS and reanalysis data; no online simulator used for policy learning (implementation details provided).
high positive Physics-informed offline reinforcement learning eliminates c... requirement for online simulator (method characteristic)
Bootstrap 95% confidence interval for PIER mean CO2 savings relative to great-circle routing is [2.9%, 15.7%].
Bootstrap analysis applied to the 2023 AIS validation results (840 episodes per method) producing the stated 95% CI for mean percent savings.
high positive Physics-informed offline reinforcement learning eliminates c... 95% bootstrap confidence interval for mean percent CO2 savings
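The bootstrap interval above can be illustrated with a minimal percentile-bootstrap sketch. This is not the paper's code: the per-episode savings below are synthetic stand-ins for the 840 validation episodes, and the mean and spread are assumed for illustration.

```python
import random
import statistics

def bootstrap_ci_mean(samples, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap 95% CI for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    boot_means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_boot)
    )
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic per-episode percent CO2 savings (stand-in for the 840 episodes).
rng = random.Random(1)
savings = [rng.gauss(9.3, 6.0) for _ in range(840)]
lo, hi = bootstrap_ci_mean(savings)
print(f"95% CI for mean savings: [{lo:.1f}%, {hi:.1f}%]")
```

Resampling episodes with replacement and taking percentiles of the resampled means is the standard nonparametric route to an interval like [2.9%, 15.7%].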
PIER reduces per‑voyage fuel consumption variance by a factor of 3.5 (p < 0.001).
Statistical comparison of per-voyage fuel variance between PIER and baseline routing on 840 episodes per method from 2023 AIS data; significance reported with p < 0.001.
high positive Physics-informed offline reinforcement learning eliminates c... variance of per-voyage fuel consumption
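A variance comparison like the 3.5x reduction above can be tested without distributional assumptions via a permutation test. This sketch uses synthetic data and a hand-rolled variance; the paper does not specify its test, so treat the procedure as illustrative.

```python
import random

def svar(xs):
    """Sample variance (denominator n-1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def perm_test_variance_ratio(a, b, n_perm=1000, seed=0):
    """Two-sided permutation test for Var(b)/Var(a)."""
    rng = random.Random(seed)
    observed = svar(b) / svar(a)
    pooled = a + b
    n_a = len(a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        r = svar(pooled[n_a:]) / svar(pooled[:n_a])
        if max(r, 1 / r) >= max(observed, 1 / observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Synthetic fuel-consumption draws: baseline has ~3.5x the variance.
rng = random.Random(2)
pier = [rng.gauss(100, 2.0) for _ in range(200)]
base = [rng.gauss(100, 2.0 * 3.5 ** 0.5) for _ in range(200)]
ratio, p = perm_test_variance_ratio(pier, base)
print(f"variance ratio ~ {ratio:.2f}, p ~ {p:.4f}")
```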
The main results are robust to inclusion of firm, industry, and year fixed effects, DID identification using the 2018 SCD pilot, and multiple robustness checks addressing potential confounders and endogeneity.
Authors report baseline regressions with firm/industry/year fixed effects, DID specifications exploiting the 2018 Supply Chain Innovation and Application Pilot Program as a quasi-natural experiment, and a battery of robustness tests (alternative specifications, controls, and checks).
high positive Supply Chain Digitalization and its Impact on Green Innovati... robustness of estimated SCD effects on corporate green innovation
The positive effect of SCD on green innovation is stronger for substantive green innovation (actual environmentally beneficial R&D and technologies) than for strategic green innovation (symbolic/labeling or reputation‑oriented activities).
Heterogeneous outcome analysis splitting green innovation into 'substantive' (e.g., green patents, technological R&D outputs) versus 'strategic' (signaling/compliance indicators); regression and DID estimates show larger and statistically significant coefficients for substantive measures compared to smaller or weaker effects on strategic measures.
high positive Supply Chain Digitalization and its Impact on Green Innovati... substantive green innovation (green patents, concrete environmental R&D outputs)...
Supply chain digitalization (SCD) significantly increases corporate green innovation among Chinese A-share listed firms (2012–2022).
Panel analysis of Chinese A-share listed firms over 2012–2022 using regression models with firm, industry, and year fixed effects; difference-in-differences (DID) identification exploiting the 2018 Supply Chain Innovation and Application Pilot Program as an exogenous shock to SCD; firm-level controls included; multiple robustness checks reported.
high positive Supply Chain Digitalization and its Impact on Green Innovati... corporate green innovation (aggregate measures of green innovation such as green...
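The DID identification described above reduces, in its simplest two-group two-period form, to a difference of differences in cell means. The sketch below uses synthetic firm-level data with an effect of +0.5 built in; the variable names and magnitudes are illustrative, not from the paper.

```python
import random
import statistics

def did_estimate(rows):
    """rows: (treated, post, outcome) tuples. Returns the 2x2 diff-in-diff."""
    def cell_mean(t, p):
        return statistics.fmean(y for tr, po, y in rows if tr == t and po == p)
    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Synthetic panel: treated firms gain +0.5 green-innovation units post-2018 pilot.
rng = random.Random(3)
rows = []
for firm in range(400):
    treated = firm < 200
    for post in (0, 1):
        y = 1.0 + 0.3 * post + 0.2 * treated + 0.5 * treated * post + rng.gauss(0, 0.3)
        rows.append((int(treated), post, y))
print(f"DID estimate: {did_estimate(rows):.2f}")  # ~0.5 by construction
```

The time trend (0.3) and treatment-group level difference (0.2) are differenced away, leaving only the interaction term.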
Research priorities include empirically quantifying AI's effects on productivity, wages, inequality, and environmental costs; developing standardized sustainability and governance metrics; and evaluating regulatory impacts on innovation and welfare.
Stated research agenda based on gaps identified in the narrative review; identifies directions for future empirical work rather than presenting new empirical findings.
high positive The Evolution and Societal Impact of Artificial Intelligence... empirical evidence and standardized metrics for AI impacts (productivity, labor-...
AI has progressed from symbolic systems to data-driven, generative architectures and large-scale computational infrastructures, becoming a foundational technology across sectors.
Narrative synthesis of historical and technical literature across AI research and innovation studies; qualitative tracing of architectural shifts (symbolic → statistical → deep learning/generative models) and increased deployment across industries. No original empirical measurement or sample size reported in this paper.
high positive The Evolution and Societal Impact of Artificial Intelligence... technological evolution and cross-sector adoption (foundational-technology statu...
Mechanisms linking digital services to export performance include reduced transaction and search costs, platform network and scale effects, data as an input improving service quality and customization, and task‑level specialization changing comparative advantage.
Conceptual/theoretical synthesis drawing on multiple strands of literature and illustrative case studies presented in the review (no new causal identification).
high positive Analysis of Digital Services Trade and Export Competitivenes... export performance of digital services (via transaction costs, service quality, ...
Digital services trade is shifting from traditional cross‑border delivery toward online, platform‑based models, with cross‑border data flows a core input and determinant of competitiveness.
Integrative literature and policy review synthesizing domestic and international studies; theoretical/conceptual synthesis and cited case examples (no new econometric analysis or primary microdata).
high positive Analysis of Digital Services Trade and Export Competitivenes... mode of digital services delivery and export competitiveness (role of platforms ...
An asynchronous sliding-window engine treats the GPU as a sliding compute window and overlaps GPU computation with CPU-side parameter updates and multi-tier I/O to hide data movement and synchronization overheads.
System design and implementation described in the paper: an asynchronous runtime that coordinates GPU kernels, CPU updates, and multi-tier I/O. This is a design/implementation claim rather than a measured outcome; the summary links the design to performance improvements.
high positive An Efficient Heterogeneous Co-Design for Fine-Tuning on a Si... system behavior (overlap of compute and I/O / synchronization)
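The overlap idea above can be sketched as a bounded producer-consumer pipeline: a background thread stands in for multi-tier I/O and parameter staging, while the main loop stands in for GPU compute on the currently resident window. This is a toy analogy, not the paper's runtime.

```python
import queue
import threading
import time

def pipeline(n_windows=4, compute_s=0.05, io_s=0.05):
    """Overlap 'compute' with 'I/O' staging via a two-stage pipeline.
    (Illustrative stand-in for the sliding-window engine.)"""
    q = queue.Queue(maxsize=2)  # bounded: models limited GPU-resident windows

    def io_worker():  # stage parameters/tiers in the background
        for i in range(n_windows):
            time.sleep(io_s)
            q.put(i)
        q.put(None)  # sentinel: no more windows

    threading.Thread(target=io_worker, daemon=True).start()
    done = []
    while (item := q.get()) is not None:  # 'GPU' consumes prepared windows
        time.sleep(compute_s)
        done.append(item)
    return done

print(pipeline())  # per-window cost ~max(compute, io), not their sum
```

With full overlap, total time approaches one I/O latency plus the compute time, instead of the serial sum of both per window.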
Evaluation metrics for the benchmark include task-specific metrics such as win-rate for battling and completion time for speedruns, as well as strategic robustness measures.
Paper's evaluation section lists metrics used: win-rate, completion time, strategic robustness; describes how they are computed and used to compare agents.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... evaluation metrics used (win-rate, completion time, strategic robustness)
Speedrunning Track includes an open-source multi-agent orchestration system and standardized evaluation scenarios for reproducible multi-agent comparisons.
Paper describes and releases an open-source orchestration harness for orchestrating LLMs/agents and provides standardized scenarios and evaluation tools meant for reproducibility.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... availability of open-source orchestration code and standardized evaluation scena...
Community interest in the benchmark was validated by a NeurIPS 2025 competition with 100+ teams and published analyses of winning submissions.
Paper reports organization/validation via a NeurIPS 2025 competition, states participation of 100+ teams, and includes documentation/analyses of top submissions.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... number of competing teams (100+), availability of competition analyses/winning s...
The project is a living benchmark: the Battling Track has a live leaderboard and the Speedrunning Track uses self-contained evaluation to ensure reproducibility.
Paper/documentation notes a live leaderboard for Battling and provides self-contained evaluation pipelines/orchestration for Speedrunning intended to support reproducible runs.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... presence of live leaderboard and self-contained evaluation pipelines
Baselines include heuristic rule-based agents, reinforcement-learning (RL) agents trained for specialist play, and LLM-based agents/harnesses for generalist approaches.
Paper presents baseline implementations and experiments spanning heuristic, RL, and LLM-based agents and describes training procedures and architectures used for each baseline category.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... presence and types of baseline agents (heuristic, RL, LLM)
The benchmark is split into two complementary tracks: a Battling Track (competitive, partial-observability battles) and a Speedrunning Track (long-horizon RPG tasks with a multi-agent orchestration harness).
Paper structure and dataset descriptions specify two tracks, their scopes, and the inclusion of a multi-agent orchestration system for the Speedrunning Track.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... benchmark partitioning (presence of Battling and Speedrunning tracks)
The Battling Track dataset contains more than 20 million recorded battle trajectories.
Paper reports a Battling Track dataset of >20M recorded battle trajectories collected from simulated/match play; size reported explicitly in dataset and methods section.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... number of recorded battle trajectories (>20,000,000)
PokeAgent Challenge is a large, realistic multi-agent benchmark built on Pokemon that stresses partial observability, game-theoretic reasoning, and long-horizon planning simultaneously.
Paper describes design and motivation of the benchmark, detailing two tracks (Battling and Speedrunning) intended to capture partial observability, adversarial/game-theoretic interactions, and long-horizon sequential planning; benchmark implementation built on Pokemon simulator and described task specifications.
high positive The PokeAgent Challenge: Competitive and Long-Context Learni... benchmark task characteristics (partial observability, game-theoretic complexity...
iDaVIE's modular architecture supports extensibility (planned features include subcube loading, advanced render modes, video scripting, and collaborative VR sessions).
Paper describes modular architecture and lists planned/possible future features; this is a software design claim rather than an empirical result.
high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... software extensibility and planned feature set
Because iDaVIE is open-source and extensible, software licensing costs are low and marginal adoption costs fall over time.
Paper states iDaVIE is open-source and designed for community-driven enhancements; economic claim based on general properties of open-source software rather than empirical cost accounting.
high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... licensing cost implication and marginal adoption costs
iDaVIE includes interaction features such as selection, cropping/subcube tools, catalogue overlays, and export back to existing pipelines.
Feature list in paper describing selection, cropping, overlays, in-VR metrics and export functionality; demonstrated integration to export edited masks/subcubes.
high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... availability and functionality of in-VR interaction and export tools
Streaming and downsampling pipelines implemented as Unity plug-ins make large volumes interactively viewable in VR while preserving needed detail for inspection.
Technical description of custom Unity plug-ins for streaming/downsampling and on-the-fly statistics; tested on HI cubes (telescopes listed) per the paper.
high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... interactive rendering performance and retention of inspection-relevant detail
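The downsampling half of that pipeline amounts to block-averaging a 3D cube so it fits interactive rendering budgets. A minimal sketch (assuming even dimensions and a factor-2 reduction; iDaVIE's actual Unity plug-ins are native code):

```python
def downsample2(cube):
    """Block-mean downsample a 3D cube by 2 along each axis."""
    Z, Y, X = len(cube), len(cube[0]), len(cube[0][0])
    out = [[[0.0] * (X // 2) for _ in range(Y // 2)] for _ in range(Z // 2)]
    for z in range(0, Z, 2):
        for y in range(0, Y, 2):
            for x in range(0, X, 2):
                s = sum(cube[z + dz][y + dy][x + dx]
                        for dz in (0, 1) for dy in (0, 1) for dx in (0, 1))
                out[z // 2][y // 2][x // 2] = s / 8.0  # mean of the 2x2x2 block
    return out

cube = [[[float(z + y + x) for x in range(4)] for y in range(4)] for z in range(4)]
small = downsample2(cube)
print(len(small), len(small[0]), len(small[0][0]))  # 2 2 2
```

Block means preserve integrated flux-like quantities per voxel, which is why downsampled views remain useful for inspection while streaming pulls in full resolution on demand.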
iDaVIE (v1.0) is a working VR software suite that lets astronomers import, render, inspect, and interactively edit very large 3D data cubes in real time.
Described implementation of iDaVIE v1.0 built on Unity/SteamVR with custom plug-ins for parsing/downsampling and real-time rendering; tested on large 3D spectral (HI) cubes from radio telescopes (MeerKAT, ASKAP, APERTIF) as reported in the paper.
high positive iDaVIE v1.0: A virtual reality tool for interactive analysis... ability to import/render/inspect/edit large 3D data cubes in real time (interact...
HindSight reveals a large, real difference between systems that is missed by LLM-based judging (i.e., HindSight detects the retrieval-augmentation advantage while LLM-judged metrics do not).
Combined empirical results: HindSight shows a 2.5× advantage (p < 0.001) for retrieval augmentation while LLM-as-Judge reports no significant difference (p = 0.584).
high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... Detection of performance difference between retrieval-augmented and vanilla gene...
Experiments in the paper cover 10 AI/ML research topics and use a 30-month forward evaluation window.
Experimental setup reported in the paper: scope explicitly stated as 10 AI/ML topics and a 30-month forward window after cutoff T.
high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... Scope parameters (number of topics = 10; forward window length = 30 months)
Generated ideas can be algorithmically compared to future publications and matched items can be assigned scores reflecting downstream impact (citation counts and venue acceptance).
Method section: description of algorithmic matching procedure and scoring rules that use citation counts and venue acceptance as impact proxies.
high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... Match indicators and downstream-impact scores (citations, venue acceptance) for ...
A retrieval-augmented idea generator produces 2.5× higher-scoring ideas than a vanilla generator according to HindSight (p < 0.001).
Empirical comparison reported in the paper across the specified experiments (10 AI/ML topics, time-split at T, 30-month forward window); statistical test reporting a 2.5× difference with p < 0.001.
high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... HindSight score (downstream-impact-based score for generated ideas)
HindSight is a time-split, retrospective evaluation that (1) restricts idea generation to pre-cutoff literature (time T), (2) compares generated ideas to papers published in the following 30 months, and (3) scores matches by downstream impact (citation counts and venue acceptance).
Method described in paper: time-split protocol with a temporal cutoff T, a 30-month forward window, algorithmic matching of generated ideas to later publications, and scoring based on downstream impact metrics (citations and venue acceptance).
high positive HindSight: Evaluating LLM-Generated Research Ideas via Futur... HindSight match score computed from matches to later publications weighted by ci...
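The matching-and-scoring step of that protocol can be sketched with a toy similarity measure. The paper's actual matching algorithm is not reproduced here; Jaccard token overlap, the threshold, and the citation-only score are stand-ins, and the titles and counts are invented.

```python
def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def hindsight_score(ideas, future_papers, threshold=0.3):
    """Match each pre-cutoff idea to its best post-cutoff paper;
    credit matched ideas with that paper's downstream impact."""
    total = 0.0
    for idea in ideas:
        best = max(future_papers, key=lambda p: jaccard(idea, p["title"]))
        if jaccard(idea, best["title"]) >= threshold:
            total += best["citations"]
    return total

future = [
    {"title": "retrieval augmented generation for scientific ideation", "citations": 120},
    {"title": "sparse attention for long context transformers", "citations": 45},
]
ideas = ["retrieval augmented generation for ideation", "quantum error correction codes"]
print(hindsight_score(ideas, future))  # -> 120.0: first idea matches, second does not
```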
The paper introduces a Multi-Object Decoder (MOD) that extends SAM 3D to jointly reconstruct multiple objects from a single image, targeting physically plausible, non-penetrating object configurations and realistic contacts.
Method section: MOD is described as an extension of the single-object SAM 3D architecture to jointly decode multiple object shapes and poses from a monocular image; the method explicitly aims to reduce inter-object penetration and model contacts.
high positive MessyKitchens: Contact-rich object-level 3D scene reconstruc... methodological capability: joint multi-object monocular 3D reconstruction, objec...
Managing captures, traces, and replay sessions from a unified single design database ensures consistency across replay targets and sessions.
Method description emphasizes a single design database coordinating captures and replays across simulation and emulation for the demonstrator system. (Operational claim demonstrated in the implementation; no metrics on error reduction provided.)
high positive ODIN-Based CPU-GPU Architecture with Replay-Driven Simulatio... consistency of trace/replay data and configuration across targets
The captured traces can be deterministically replayed across different execution targets (software/hardware simulation and hardware emulation), reducing cross-platform setup complexity and discrepancies.
The same captured waveforms/traces were replayed on both simulation and emulation environments for the ODIN demonstrator; cross-target replay was part of the described method. (Demonstrated on the single reported system; no broad cross-toolchain study provided.)
high positive ODIN-Based CPU-GPU Architecture with Replay-Driven Simulatio... consistency of reproduced behavior across simulator and emulator targets
The paper provides concrete, regulation-inspired policy examples (e.g., content prohibition, sensitive data exfiltration) showing how they map into the Policy function.
Worked, illustrative examples included in the paper mapping regulatory constraints to the Policy(agent_id, partial_path, proposed_action, org_state) formalism.
high positive Runtime Governance for AI Agents: Policies on Paths representability of regulation-inspired policies in the formalism (yes/no; examp...
Runtime policy evaluation can intercept, score, log, allow/modify/block actions, and update organizational state as part of an agent's execution loop (reference implementation architecture).
Reference implementation design described in the paper (runtime policy evaluator hooks, logging, enforcement actions); architectural reasoning and pseudo-workflows provided; no production deployment data.
high positive Runtime Governance for AI Agents: Policies on Paths feasibility of integrating runtime policy evaluator into agent loops (architectu...
Policies can be formalized as deterministic functions p_violation = Policy(agent_id, partial_path, proposed_action, org_state) that return a probability or score of violation for a proposed next action.
Formal definition and mapping in the paper; worked examples showing how regulatory-style constraints map into this function; no large-scale empirical validation.
high positive Runtime Governance for AI Agents: Policies on Paths expressiveness of policy formalism (ability to represent targeted constraints)
Effective governance for agentic LLM systems requires treating the execution path as the central object and performing runtime evaluation of proposed next actions given the partial path.
Theoretical argument and formal proposal of runtime policy evaluator that takes (agent_id, partial_path, proposed_action, org_state) and returns a violation probability; reference architecture described; illustrative examples.
high positive Runtime Governance for AI Agents: Policies on Paths governance effectiveness for path-dependent policies (qualitative/coverage)
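The Policy(agent_id, partial_path, proposed_action, org_state) formalism and the intercept/score/enforce loop can be sketched directly. The data-exfiltration policy below is a hypothetical example in the paper's spirit; the action schema and threshold are assumptions.

```python
def policy_no_exfiltration(agent_id, partial_path, proposed_action, org_state):
    """Toy Policy(): violation score for exfiltrating sensitive fields."""
    sensitive = org_state.get("sensitive_fields", set())
    if proposed_action["type"] == "send_external":
        touched = set(proposed_action.get("fields", [])) & sensitive
        return 1.0 if touched else 0.1
    return 0.0

def enforce(policy, agent_id, path, action, org_state, block_at=0.8, log=None):
    """One runtime step: score the proposed action, log it, allow or block."""
    p = policy(agent_id, path, action, org_state)
    (log if log is not None else []).append((agent_id, action["type"], p))
    return "block" if p >= block_at else "allow"

org = {"sensitive_fields": {"ssn", "salary"}}
audit = []
a1 = {"type": "send_external", "fields": ["ssn"]}
a2 = {"type": "read_internal", "fields": ["ssn"]}
print(enforce(policy_no_exfiltration, "agent-7", [], a1, org, log=audit))    # block
print(enforce(policy_no_exfiltration, "agent-7", [a1], a2, org, log=audit))  # allow
```

Passing the partial path into the policy is what lets it express path-dependent constraints (e.g., "reading sensitive data earlier in the path raises the risk of a later external send").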
The surrogate-driven inverse-design pipeline transfers to physical hardware — designs produced by the CNN+GA pipeline were realized and validated experimentally.
Two fabricated prototypes implemented the optimized pixelated combiners and GaN HEMT Doherty PAs; measured performance metrics correspond to the designs, demonstrating transfer from surrogate-driven design to hardware.
high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... consistency between surrogate-driven design outputs and measured prototype perfo...
Under a 20 MHz 5G-NR-like waveform (9 dB PAPR) with digital predistortion (DPD), each prototype reached average PAE greater than 51% while meeting ACLR ≤ −60.8 dBc.
Realistic waveform testing described: a 20 MHz 5G‑NR-like signal with 9 dB PAPR was applied to the prototypes, DPD was used, and measurements reported average PAE > 51% and ACLR ≤ −60.8 dBc for each prototype.
high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... average power-added efficiency (PAE %) and adjacent channel leakage ratio (ACLR,...
Each prototype demonstrated drain efficiency greater than 52% at 9 dB back-off.
Back-off efficiency measurements reported for the fabricated prototypes showing drain efficiency > 52% at 9 dB back-off.
high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... drain efficiency at 9 dB back-off (%)
Each prototype produced output power exceeding 44.1 dBm at 2.75 GHz.
Measured output power reported from RF characterization of the two fabricated prototypes; reported value > 44.1 dBm at the test frequency.
Each fabricated prototype achieved peak drain efficiency greater than 74%.
Measured RF characterization reported for the two prototypes showing peak drain efficiency > 74%; measurements conducted on fabricated hardware at 2.75 GHz.
high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... peak drain efficiency (%)
A genetic-algorithm (GA) blackbox optimizer paired with the CNN surrogate can effectively search the discrete multi-port pixel layout space to synthesize output combiners for Doherty amplifiers.
Method description: CNN surrogate embedded in a blackbox Doherty framework and used within a GA to select pixelated combiner layouts; successful designs were produced and taken to fabrication.
high positive Deep Learning-Driven Black-Box Doherty Power Amplifier with ... ability of optimization stack to find feasible combiner layouts that meet system...
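The GA-over-surrogate loop above can be sketched on a binary pixel vector. Everything here is illustrative: the surrogate is a toy pattern-matching score standing in for the CNN, and population sizes and operators are arbitrary choices, not the paper's settings.

```python
import random

def ga_search(surrogate, n_pixels=64, pop=30, gens=40, seed=0):
    """Toy elitist GA over binary pixel layouts, scored by a surrogate."""
    rng = random.Random(seed)
    popn = [[rng.randint(0, 1) for _ in range(n_pixels)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=surrogate, reverse=True)
        elite = popn[: pop // 2]                 # keep best half
        children = []
        while len(elite) + len(children) < pop:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_pixels)
            child = p1[:cut] + p2[cut:]          # one-point crossover
            child[rng.randrange(n_pixels)] ^= 1  # mutation: flip one pixel
            children.append(child)
        popn = elite + children
    return max(popn, key=surrogate)

# Hypothetical surrogate: rewards layouts matching a target pixel pattern.
target = [i % 2 for i in range(64)]
surrogate = lambda layout: sum(a == b for a, b in zip(layout, target))
best = ga_search(surrogate)
print(surrogate(best), "/ 64 pixels match the target")
```

The key property exploited is that the surrogate makes each fitness call cheap, so the GA can afford many evaluations over the discrete layout space before any EM simulation or fabrication.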
Explicit enforcement of signal constraints gives DeePC a safety and operational advantage over many pure learning approaches, which do not enforce hard constraints.
Algorithmic formulation includes constraints in the optimization; paper contrasts this with unconstrained learning-based controllers and demonstrates constrained, feasible actuation in simulation.
high positive Data-driven generalized perimeter control: Zürich case study explicit constraint satisfaction and operational safety of signal timings
DeePC can compute traffic-light actuation sequences that respect hard operational and safety constraints (e.g., phasing, minimum/maximum green times).
Formulation of DeePC as a constrained optimization problem in the paper with explicit constraint terms for signal phasing and safety; implemented in simulation experiments where constraints are enforced in the controller optimization.
high positive Data-driven generalized perimeter control: Zürich case study constraint satisfaction / feasibility of computed actuation sequences
Reframing urban traffic dynamics with behavioral systems theory allows system evolution to be learned and predicted directly from measured input–output data (no explicit model identification).
Theoretical exposition in the paper showing that traffic trajectories can be represented as linear combinations of past measured trajectories via Hankel/data matrices; used as the basis for predictive control (DeePC).
high positive Data-driven generalized perimeter control: Zürich case study predictive capability from measured I/O trajectories (ability to forecast future...
Applying DeePC yields measurable improvements in system-level outcomes (reduced total travel time and CO2 emissions) in a very large, high-fidelity microscopic simulation of Zürich.
Simulation experiments in a city-scale, high-fidelity microscopic closed-loop simulator of Zürich comparing DeePC-controlled signals against baseline controllers (e.g., fixed-time or standard adaptive schemes); reported reductions in aggregated metrics (total travel time and CO2 emissions).
high positive Data-driven generalized perimeter control: Zürich case study total travel time; CO2 emissions
A model-free traffic control approach (DeePC) can steer urban traffic via dynamic traffic-light control without building explicit traffic models.
Algorithmic/theoretical development (behavioral systems theory + DeePC) and controller-in-loop experiments in a high-fidelity microscopic closed-loop simulator of Zürich demonstrating closed-loop control using only input–output trajectory data (Hankel matrices) rather than parametric model identification.
high positive Data-driven generalized perimeter control: Zürich case study ability to generate feasible control (traffic-light) actuation sequences and clo...
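The Hankel-matrix representation underlying DeePC can be sketched in a few lines: measured trajectory windows become columns, and (per behavioral systems theory) new system trajectories lie in their span. The toy first-order system below is illustrative, not the Zürich setup.

```python
def hankel_columns(w, L):
    """Columns of the depth-L Hankel matrix of a measured trajectory w.
    Each column is one length-L window of the data."""
    return [w[i : i + L] for i in range(len(w) - L + 1)]

# Measured scalar trajectory from a toy linear system x[t+1] = 0.5*x[t] + u[t].
u = [1, 0, 0, 1, 0, 1, 0, 0]
x, traj = 0.0, []
for ut in u:
    traj.append(x)
    x = 0.5 * x + ut

cols = hankel_columns(traj, L=3)
print(len(cols), "windows of length 3")  # 6 windows from 8 samples
# A DeePC controller would pick a combination g of these columns whose implied
# future portion tracks a reference while satisfying signal-timing constraints.
```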
The model weights will be open (open-weight release) to support European sovereignty and adoption.
Authors state intent to publish open weights and position the model as an open-weight European alternative; the summary reports this as a declared objective. The paper likely includes a licensing/availability statement.
high positive EngGPT2: Sovereign, Efficient and Open Intelligence planned availability / licensing status of model weights
Traditional machine-learning baselines were included for comparison in the benchmarks.
Paper explicitly states that traditional ML baselines were used alongside TSFMs in benchmarking experiments. The summary does not list which baselines or their quantitative results.
high positive Bridging the High-Frequency Data Gap: A Millisecond-Resoluti... inclusion of traditional ML baseline models in comparative evaluation
The dataset sampling resolution is at the millisecond level, enabling forecasting horizons from 1 step (100 ms) up to 96 steps (9.6 s).
Paper states sampling resolution is millisecond-level and defines forecasting tasks spanning 1 to 96 steps (100 ms to 9.6 s). This is a methodological description rather than an experimental metric.
high positive Bridging the High-Frequency Data Gap: A Millisecond-Resoluti... supported forecast horizons (temporal prediction horizon: 100 ms–9.6 s)
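The stated horizon range follows directly from the 100 ms step size; a tiny sketch of the step-to-time mapping (the step length is the only figure taken from the description above):

```python
STEP_MS = 100  # millisecond-level sampling: one forecasting step = 100 ms

def horizon_seconds(steps):
    return steps * STEP_MS / 1000

for steps in (1, 24, 48, 96):
    print(steps, "steps ->", horizon_seconds(steps), "s")
# 1 step -> 0.1 s, ..., 96 steps -> 9.6 s, matching the stated task range
```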