The Commonplace

Evidence (4114 claims)

Adoption: 8570 claims
Productivity: 7631 claims
Governance: 6869 claims
Human-AI Collaboration: 6491 claims
Org Design: 4175 claims
Innovation: 4114 claims
Labor Markets: 3566 claims
Skills & Training: 2966 claims
Inequality: 2066 claims

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome Positive Negative Mixed Null Total
Other 758 199 100 900 2007
Governance & Regulation 826 400 191 122 1563
Organizational Efficiency 777 193 124 84 1189
Technology Adoption Rate 635 233 124 97 1098
Research Productivity 422 128 57 336 954
Output Quality 476 179 59 47 761
Decision Quality 328 177 81 47 640
Firm Productivity 435 57 88 20 606
AI Safety & Ethics 218 277 65 33 599
Market Structure 180 170 123 24 502
Task Allocation 213 64 72 33 387
Skill Acquisition 170 61 61 17 309
Innovation Output 203 27 43 18 292
Employment Level 105 54 107 13 281
Fiscal & Macroeconomic 131 69 43 26 276
Consumer Welfare 117 63 42 11 233
Firm Revenue 153 48 26 3 230
Task Completion Time 173 31 8 12 225
Inequality Measures 44 122 49 6 221
Worker Satisfaction 89 65 22 12 188
Error Rate 69 92 10 2 173
Regulatory Compliance 77 69 14 5 165
Automation Exposure 56 56 26 13 154
Training Effectiveness 94 21 13 19 149
Wages & Compensation 77 36 25 6 144
Team Performance 86 17 27 10 141
Developer Productivity 95 17 14 6 133
Job Displacement 12 80 20 1 113
Hiring & Recruitment 52 7 8 3 70
Creative Output 31 18 8 3 61
Skill Obsolescence 5 46 6 1 58
Social Protection 27 16 8 2 53
Labor Share of Income 17 19 17 53
Worker Turnover 11 12 3 26
Industry 1 1
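As a reading aid, the share of claims tagged positive can be computed per outcome from the rows above. This is a minimal sketch, assuming the Total column is the appropriate denominator (in several rows it exceeds the sum of the four listed directions). The `ROWS` sample and the `positive_share` helper are illustrative names, not part of the site.

```python
# Sample rows copied from the Evidence Matrix:
# (outcome, positive, negative, mixed, null, total)
ROWS = [
    ("Firm Productivity", 435, 57, 88, 20, 606),
    ("Job Displacement", 12, 80, 20, 1, 113),
    ("Skill Obsolescence", 5, 46, 6, 1, 58),
]


def positive_share(row):
    """Return the fraction of an outcome's claims that are tagged positive."""
    _, positive, _, _, _, total = row
    return positive / total


for row in ROWS:
    # Print each outcome with its positive share as a percentage.
    print(f"{row[0]}: {positive_share(row):.1%} positive")
```

Using the Total column rather than the sum of the four directions keeps the share consistent with the counts the page itself reports.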
Filter: Innovation
Networked IoT sensors placed in factories, trucks, storage sites, and upstream suppliers supplied real-time data that were paired with machine-learning routines to schedule preventive maintenance, forecast orders, and guide blockchain tracking, routing adjustments, and automated decisions balancing green goals with everyday performance.
Paper description of system design and interventions: placement of sensors across supply chain nodes and pairing with ML routines for maintenance, forecasting, blockchain tracking, routing, and automated decisions.
high positive Green Supply Chain Optimization: AI and IoT for Ethical Reso... implementation of AI-IoT system functions (preventive maintenance scheduling, or...
TechToken improves general representation quality, outperforming state-of-the-art models across different patent-related tasks.
Benchmarking experiments across multiple patent-centered downstream tasks comparing TechToken to state-of-the-art models, with reported outperformance (statement given in abstract). Exact tasks, metrics, and sample sizes are not specified in the abstract.
high positive Anticipating Innovation Using Large Language Models downstream task performance / representation quality on patent-related tasks
Context similarity between code embeddings, defined as a measure of linguistic convergence, accurately predicts first technological combinations.
Operationalization of 'context similarity' between IPC-code embeddings and empirical validation showing it predicts first-time combinations of technologies (as claimed in abstract); implies out-of-sample prediction or event-detection experiments on patent combination events.
high positive Anticipating Innovation Using Large Language Models accuracy of predicting first joint occurrence (combination) of IPC codes / techn...
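The idea of comparing IPC-code embeddings can be illustrated with a small sketch. It assumes that "context similarity" behaves like cosine similarity between embedding vectors, which the abstract does not confirm. The vectors below are made-up three-dimensional placeholders, not TechToken outputs, and `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings for two IPC subclasses (values are illustrative only).
embeddings = {
    "H04L": [0.20, 0.70, 0.10],
    "G06N": [0.25, 0.60, 0.20],
}

score = cosine_similarity(embeddings["H04L"], embeddings["G06N"])
print(f"similarity(H04L, G06N) = {score:.3f}")
```

Under this assumption, a pair of codes whose similarity rises over time while they have never co-occurred on a patent would be a candidate for a future first combination.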
We introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification (IPC) codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning.
Methodological contribution described in the paper: architecture is a transformer model with IPC codes tokenized and embedded during fine-tuning on patent data (method statement from abstract).
high positive Anticipating Innovation Using Large Language Models ability to represent IPC-coded technologies as embeddings (model representation ...
Forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance.
Temporal analysis of patent-language embeddings showing predictive signals preceding first occurrences of technological combinations; described as detectable 'even decades in advance' across the patent corpus (statement in paper abstract). Specific methods likely include embedding IPC codes and measuring changes in context similarity over time; sample described qualitatively as spanning many patents (abstract mentions 'thousands of patents').
high positive Anticipating Innovation Using Large Language Models prediction of first occurrence of new technological combinations
For government policy, it is necessary to establish precise dynamic intervention and orderly exit mechanisms to effectively govern the computing power innovation ecosystem.
Policy implication drawn from the model's analysis of equilibria and regime transitions, and numerical experiments indicating path-dependent/regime-dependent outcomes under different regulatory strategies (method: theoretical model implications + simulation).
high positive Evolutionary Dynamics of Openness, Dependence, and Regulatio... need for/efficacy of dynamic regulatory intervention and exit mechanisms
A leading computing power incumbent could strengthen its ecological niche and maintain its role as an industry cornerstone by opening its underlying interfaces and software stacks while remaining integrated.
Implication derived from the model's strategic equilibrium analysis and simulations regarding incumbents' strategies for preserving niche/market position (method: evolutionary game analysis + simulations).
high positive Evolutionary Dynamics of Openness, Dependence, and Regulatio... incumbent ecological niche strength / market dominance
Downstream AI firms may benefit from advancing vertical integration, achieving hardware–software co-optimization through self-developed domain-specific architectures.
Result of the theoretical model (tripartite evolutionary game) and numerical simulation experiments showing advantages to downstream innovators when pursuing vertical integration and co-optimization (method: theoretical model + simulation).
high positive Evolutionary Dynamics of Openness, Dependence, and Regulatio... benefit to downstream firms (product/innovation co-optimization, competitive pos...
Mechanism tests reveal efficiency gains via automation are a key pathway by which AI increases productivity in constrained firms.
Mechanism analysis reported in the paper (tests linking AI adoption to automation-related efficiency improvements in constrained firm clusters).
high positive The Heterogeneous Effects of Artificial Intelligence on Ente... Efficiency gains / productivity via automation
Firms constrained by limited intangibles, outdated hardware, or weak human capital benefit most from AI adoption when AI mitigates bottlenecks (i.e., larger positive TFP effects for resource-constrained firms).
Subgroup/cluster-specific estimates from panel analysis showing larger productivity gains in clusters characterized by limited intangibles, outdated hardware, or weak human capital.
high positive The Heterogeneous Effects of Artificial Intelligence on Ente... Total Factor Productivity (TFP) / productivity gains
The frontier for AI-augmented science is not acceleration; it is the redesign of the certifying infrastructure around these new scarcities.
Prescriptive conclusion in the paper arguing priority of institutional redesign over mere speed gains; presented without empirical testing in the excerpt.
high positive AI-Augmented Science and the New Institutional Scarcities prioritization of redesigning certifying infrastructure versus accelerating scie...
Competent-looking judgment, including selecting, ranking, attributing, and certifying, is now produced at scale at marginal cost approaching zero, inverting the dominant economics-of-AI reading that treats judgment as the scarce complement to cheap prediction.
Argumentative/theoretical claim in the paper; no empirical sample, experiment, or quantitative data reported in the excerpt (implicit basis: observation of scalable AI outputs).
high positive AI-Augmented Science and the New Institutional Scarcities production of competent-looking judgment (selecting, ranking, attributing, certi...
Policy recommendations: invest in digital infrastructure, human capital development, and inclusive technology diffusion strategies to ensure more equitable distribution of AI-driven economic value.
Policy implications drawn from study findings (heterogeneous effects and mediation by structural conditions).
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... equitable distribution of AI-driven economic value (policy interventions)
The magnitude of AI's growth effects varies across economic contexts: developed economies experience substantially stronger growth impacts (approximately 0.33) than emerging economies (approximately 0.15).
Heterogeneity analysis / subgroup comparisons (developed vs emerging economies) using the panel data regressions and/or quantile regressions on the 2015–2024 dataset; exact sample sizes per subgroup not reported.
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... economic growth (heterogeneous treatment effects by country group)
AI adoption has a comparatively weaker direct effect on economic growth (direct effect β = 0.09).
Mediation/structural decomposition from the paper showing direct (non-mediated) coefficient from AI adoption to growth.
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... economic growth (direct effect)
Agentic AI influences economic growth primarily through a productivity channel (mediated effect β = 0.35, p < 0.01).
Mediation analysis (panel data) estimating indirect effect of AI adoption on GDP growth via measured productivity channel; data sources: World Bank and OECD indicators, 2015–2024.
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... economic growth (mediated via productivity)
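The direct and mediated coefficients above can be combined into an implied total effect. This is a rough sketch, assuming a linear mediation model in which the total effect is the sum of the direct and indirect effects, and assuming both reported coefficients are on the same scale. Neither assumption is confirmed by the excerpt, and the variable names are illustrative.

```python
# Coefficients as reported in the claims above.
direct_effect = 0.09    # AI adoption -> growth, not via productivity
mediated_effect = 0.35  # AI adoption -> productivity -> growth

# Assumption: in a linear mediation model, total = direct + indirect.
total_effect = direct_effect + mediated_effect
share_mediated = mediated_effect / total_effect

print(f"implied total effect: {total_effect:.2f}")
print(f"share mediated by productivity: {share_mediated:.1%}")
```

If those assumptions hold, roughly four fifths of the implied total effect would run through the productivity channel, consistent with the claim that the channel is the primary pathway.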
AI adoption significantly improves firm-level productivity (β = 0.18, p < 0.01).
Fixed-effects panel regression using an AI Adoption Index as predictor on firm-level productivity; data drawn from World Bank (World Development Indicators and Enterprise Surveys) and OECD AI indicators for 2015–2024 (sample size not reported in text).
Agentic AI has strong potential to boost productivity and growth.
Statement in paper motivated by literature review and the study's empirical results linking AI adoption to productivity and growth.
high positive The Economic Value of Agentic AI: A Comparative Analysis of ... productivity and economic growth (general)
The field's near-term research agenda should explicitly include collecting and using triadic data.
Normative recommendation in the paper; presented as the authors' advised research priority rather than empirically justified within the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... inclusion of triadic data collection/use in near-term research agendas in the SW...
This data is the empirical key to four open questions in agent training.
Argumentative claim in the paper asserting that triadic data is central to addressing four open research questions, which the excerpt does not specify; no empirical demonstration included in the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... resolvability of four open questions in agent training using triadic data
This triadic data is capturable in 12-18 months with methods already mature in adjacent fields.
Claim in the paper based on authors' assessment of methodological maturity in adjacent fields; no empirical project timeline or pilot data is provided in the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... time required to collect a triadic dataset using existing methods
Any such corpus -- triadic or otherwise -- must justify its quality to a fine-tuning researcher through a four-tier evidence framework: mechanical verification, statistical corpus characterization, probe experiments, and pre-registered blind evaluation.
Methodological proposal in the paper outlining a four-tier evidence framework; presented as normative guidance rather than validated by application to a corpus in the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... quality and trustworthiness of fine-tuning corpora as judged by the four-tier fr...
The canonical instantiation of triadic data is two complementary products: long-horizon expert trajectories captured under stimulated-recall protocols, and simulated cross-functional companies -- instrumented teams of senior engineers, product managers, designers, and data scientists working through ambiguous deliverables on shared infrastructure.
Prescriptive specification in the paper proposing two concrete dataset types as canonical instantiations; presented as design/recommendation rather than empirically tested.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... availability and suitability of dataset modalities (stimulated-recall expert tra...
The substrate for the next generation of software-engineering (SWE) agents is neither larger GitHub scrapes nor more solo-agent trajectories nor -- sufficient by itself -- open human-AI dialogue logs; it is triadic data: synchronized capture of the human-human conversations where engineering context is formed, the human-AI sessions where that context is partially consumed, and the multi-week cross-functional work that surrounds both.
Argument and conceptual proposal in the paper; no empirical validation or comparative experiments are provided in the excerpt.
high positive The Conversations Beneath the Code: Triadic Data for Long-Ho... effectiveness of training data substrates for improving agent performance on lon...
KOs transform verification economics: what was previously too costly to verify becomes feasible, enabling accumulated human validation to improve reliability over time.
Theoretical claim about economic and cumulative effects of adopting KOs; no cost-benefit analysis, pilot results, or quantitative evidence reported in the paper.
high positive Reliable AI Needs to Externalize Implicit Knowledge: A Human... cost-effectiveness of verification and cumulative improvement in AI reliability
We propose Knowledge Objects (KOs) — structured artifacts that externalize implicit knowledge into forms humans can inspect, verify, and endorse.
Proposed solution described in the paper; conceptual design and intended properties presented, without reported deployments, trials, or empirical evaluation.
high positive Reliable AI Needs to Externalize Implicit Knowledge: A Human... externalization and human verifiability of implicit knowledge via KOs
We release the benchmark, harness, sweep configurations, and full run corpus.
Statement of artifact release in the paper; verifiable by checking the project's repository or supplementary materials.
high positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... availability of released materials (benchmark and run corpus)
These findings suggest a practical design principle for agentic systems: use smaller open-weight models for the broad base of routine actions, and reserve large frontier models for the narrower class of tasks that truly demand deeper planning and control.
Synthesis/recommendation drawn from the empirical results on AgentFloor showing where small/mid models suffice and where frontier models have advantage; prescriptive claim rather than a direct empirical measurement.
high positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... recommended task routing strategy for agentic systems (model assignment to task ...
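The routing principle can be sketched as a simple tier-based dispatcher. The six-tier ladder comes from the benchmark description. The numbering of tiers from 1 to 6, the cutoff `FRONTIER_TIER_THRESHOLD`, and the two model labels are assumptions made for illustration, since the excerpt does not say where the boundary lies.

```python
# Hypothetical cutoff: tiers at or above this value go to a frontier model.
FRONTIER_TIER_THRESHOLD = 5
NUM_TIERS = 6  # six-tier capability ladder, per the benchmark description


def route_model(task_tier: int) -> str:
    """Pick a model class for a task, given its tier on the ladder (1..6)."""
    if not 1 <= task_tier <= NUM_TIERS:
        raise ValueError(f"tier must be between 1 and {NUM_TIERS}")
    if task_tier >= FRONTIER_TIER_THRESHOLD:
        return "frontier"          # long-horizon planning, constraint tracking
    return "small-open-weight"     # routine instruction following and tool use


for tier in range(1, NUM_TIERS + 1):
    print(tier, route_model(tier))
```

In practice the cutoff would be set from measured per-tier reliability rather than fixed in advance.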
The gap appears most clearly on long-horizon planning tasks that require sustained coordination and reliable constraint tracking over many steps, where frontier models still hold an advantage, though neither side reaches strong reliability.
Performance breakdown by capability tier on AgentFloor showing frontier (GPT-5) advantage on long-horizon planning/constraint-tracking tasks; both model groups have low absolute reliability on these tasks according to reported results.
high positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... performance on long-horizon planning tasks (ability to sustain coordination and ...
We evaluate 16 open-weight models, from 0.27B to 32B parameters, alongside GPT-5 across 16,542 scored runs.
Empirical evaluation reported in the paper: 16 open-weight models spanning specified parameter sizes, inclusion of GPT-5, and a total of 16,542 scored runs (reported counts).
high positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... evaluation runs (model-by-task performance across 16,542 scored runs)
We introduce AgentFloor, a deterministic 30-task benchmark organized as a six-tier capability ladder, spanning instruction following, tool use, multi-step coordination, and long-horizon planning under persistent constraints.
Paper describes the design of the benchmark: deterministic, 30 tasks, organized into six tiers covering specified capabilities. This is a descriptive claim about the artifact introduced in the work.
high positive AgentFloor: How Far Up the tool use Ladder Can Small Open-We... benchmark construction (30 tasks, six-tier capability ladder)
The paper proposes five forms of online and offline issuance of RSDM, providing a prototype for creating a globally recognized modern honest money.
Authors' stated contribution in the paper (enumeration of five issuance forms and provision of a prototype); the excerpt explicitly refers to 'five forms'.
high positive RSDM: The Consensus Honest Money in the AI Era number_of_issuance_forms_proposed_and_provision_of_a_prototype
RSDM is an innovative version of Jiaozi (a deposit receipt for base metal coin that emerged in Sichuan, China, about a thousand years ago).
Comparative/analogical claim by the authors linking the proposed design to a historical instrument; no empirical analysis provided in the excerpt.
high positive RSDM: The Consensus Honest Money in the AI Era similarity_between_RSDM_and_historical_Jiaozi
Redeemable Self-Decaying/Devaluing Money (RSDM) is a tokenized commodity money whose essential innovation is to cover the storage fee of metal coins through the self-devaluing of the metal weight recorded on the coins' deposit certificate (warehouse receipt).
Design/specification proposed in the paper (conceptual mechanism); no empirical evaluation or sample size reported in the excerpt.
high positive RSDM: The Consensus Honest Money in the AI Era design_feature_RSDM_self-devaluation_to_cover_storage_fee
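The self-devaluing mechanism can be illustrated with a toy calculation. The excerpt does not give the devaluation schedule, so the geometric annual decay, the 1% rate, and the `redeemable_weight` helper below are all assumptions chosen only to show the idea of the recorded weight shrinking to pay for storage.

```python
def redeemable_weight(initial_weight: float, annual_rate: float, years: int) -> float:
    """Metal weight still recorded on the receipt after `years` of decay.

    Assumes the recorded weight shrinks by a fixed fraction each year;
    this schedule is hypothetical, not taken from the paper.
    """
    if not 0.0 <= annual_rate < 1.0:
        raise ValueError("annual_rate must be in [0, 1)")
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial_weight * (1.0 - annual_rate) ** years


# Example: a receipt for 100 g with a hypothetical 1% annual storage charge.
for year in (0, 1, 10):
    print(year, round(redeemable_weight(100.0, 0.01, year), 2))
```

The difference between the initial and the recorded weight is what would fund the custodian's storage cost under this reading.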
When AI acts as an agent for cross-border capital pools and cross-cyclical asset allocation, it needs a sound money that can resist the depreciation of fiat currency and store long-term value.
Theoretical argument in the paper about functional requirements of AI agents managing cross-border capital; no empirical sample reported in the excerpt.
high positive RSDM: The Consensus Honest Money in the AI Era need_for_sound_money_by_AI_agents_in_cross-border_capital_allocation
In the AI world, however, the medium of exchange tends to be a globally recognized currency.
Author's theoretical assertion / forward-looking claim in the paper; no empirical data or sample provided in the excerpt.
high positive RSDM: The Consensus Honest Money in the AI Era likelihood_of_global_currency_becoming_medium_of_exchange_for_AI
Qiushi Engine performed thousands of LLM-mediated reasoning, measurement and revision actions during its investigations (e.g., 3,242 LLM calls, 1,242 tool calls).
Operational logs and activity counts reported in the paper: 145.9 million tokens, 3,242 LLM calls, 1,242 tool calls, 163 research notes, 44 scripts.
high positive End-to-end autonomous scientific discovery on a real optical... scale of automated research activity (counts of LLM calls, tool calls, notes, sc...
Qiushi Engine combines nonlinear research phases, Meta-Trace memory and a dual-layer architecture to maintain adaptive and stable research trajectories across long-horizon investigations.
System architecture and methods section describing nonlinear research phases, Meta-Trace memory, and dual-layer architecture; demonstrated operation across long-horizon tasks in experiments (thousands of LLM and tool calls).
high positive End-to-end autonomous scientific discovery on a real optical... ability to maintain adaptive and stable research trajectories over long-horizon ...
The AI-discovered optical bilinear mechanism suggests a route towards high-speed, energy-efficient optical hardware for pairwise computation.
Interpretive claim based on the structural analogy between the discovered optical bilinear interaction and Transformer attention; conceptual argument provided in the paper rather than measured hardware speed or energy benchmarks.
high positive End-to-end autonomous scientific discovery on a real optical... potential for high-speed, energy-efficient optical hardware (conceptual implicat...
In an open-ended study (145.9 million tokens, 3,242 LLM calls, 1,242 tool calls, 163 research notes and 44 scripts), Qiushi Engine proposes and experimentally validates an optical bilinear interaction, a physical mechanism structurally analogous to a core operation in Transformer attention.
Open-ended experimental study reported in the paper with the listed activity metrics (145.9M tokens, 3,242 LLM calls, etc.); experimental investigation and measurements presented claiming validation of optical bilinear interaction and drawing structural analogy to Transformer attention's pairwise operation.
high positive End-to-end autonomous scientific discovery on a real optical... experimental validation of an optical bilinear interaction mechanism
Qiushi Engine autonomously reproduces a published transmission-matrix experiment on a non-original platform.
Experimental reproduction reported in the paper: the Qiushi Engine executes the published transmission-matrix experiment on a different (non-original) optical platform, and the paper presents measured results compared with the published experiment.
high positive End-to-end autonomous scientific discovery on a real optical... successful reproduction of a published transmission-matrix experiment (experimen...
Qiushi Discovery Engine is an LLM-based agentic system for end-to-end autonomous scientific discovery on a real optical platform.
Description and implementation of the Qiushi Engine combining LLM-based agentic control with an optical experimental platform; system design and end-to-end experiments reported in the paper (no randomized trial; system demonstration).
high positive End-to-end autonomous scientific discovery on a real optical... existence and operation of an end-to-end autonomous LLM-driven discovery system ...
The paper formalizes these limitations, addresses four alternative views, and proposes a co-existence solution plus a call to action for system builders, benchmark designers, and the memory community.
Meta-claim about the paper's content: formalization, rebuttals, and recommendations stated in the abstract; no empirical sample reported in abstract.
high positive Contextual Agentic Memory is a Memo, Not True Memory proposed research and design agenda (co-existence of lookup and weight-based mem...
Complementary Learning Systems (CLS) theory shows biological intelligence solved this problem by pairing fast hippocampal exemplar storage with slow neocortical weight consolidation.
Appeal to established neuroscience theory (CLS); the paper draws on CLS literature to justify the two-system solution in biology; no new empirical sample reported in abstract.
high positive Contextual Agentic Memory is a Memo, Not True Memory memory architecture in biological intelligence (hippocampus + neocortex)
Scalable synthetic computer creation, together with at-scale simulations, is highly promising as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios.
Authors' conclusion/argument based on the methods and preliminary experimental results presented in the paper (interpretive claim rather than a quantified empirical result).
high positive Synthetic Computers at Scale for Long-Horizon Productivity S... suitability as a substrate for agent self-improvement and agentic RL
Given that personas are abundant at billion scale, this methodology can in principle scale to millions or even billions of synthetic user worlds with sufficient compute, enabling broader coverage of diverse professions, roles, contexts, environments, and productivity needs.
Argumentative/theoretical scalability claim based on the abundance of personas and the scalable design of the methodology (no empirical demonstration at millions/billions scale reported).
high positive Synthetic Computers at Scale for Long-Horizon Productivity S... scalability potential (number of synthetic user worlds producible)
Each run requires over 8 hours of agent runtime and spans more than 2,000 turns on average.
Reported runtime and turn-count metrics from the preliminary experiments (per-run runtime >8 hours; per-run average >2,000 turns).
high positive Synthetic Computers at Scale for Long-Horizon Productivity S... agent runtime per simulation run; number of turns per run
In preliminary experiments, we create 1,000 synthetic computers and run long-horizon simulations on them.
Reported preliminary experiment count in the paper (explicit statement: 1,000 synthetic computers were created and simulated).
high positive Synthetic Computers at Scale for Long-Horizon Productivity S... number of synthetic computers created and simulated
Conditioned on each synthetic computer, we run long-horizon simulations: one agent creates productivity objectives that are specific to the computer's user and require multiple professional deliverables and about a month of human work; another agent then acts as that user and keeps working across the computer ... until these objectives are completed.
Description of the two-agent simulation procedure in the paper (simulation design: objective-creating agent and user-acting agent executing tasks across the synthetic computer).
high positive Synthetic Computers at Scale for Long-Horizon Productivity S... ability to simulate long-horizon, user-conditioned productivity workflows
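The two-agent procedure can be sketched as a loop in which one callable proposes objectives and another works until none remain. This is a minimal sketch with stub agents. The function names, the `max_turns` safety cap, and the stub behaviour are hypothetical and stand in for the LLM agents described in the paper.

```python
def run_simulation(computer, create_objectives, work_one_turn, max_turns=5000):
    """Run the user-acting agent until all objectives are done or the cap is hit."""
    pending = set(create_objectives(computer))  # objective-creating agent
    turns = 0
    while pending and turns < max_turns:
        # The user-acting agent returns the objectives it finished this turn.
        pending -= work_one_turn(computer, pending)
        turns += 1
    return turns, pending


# Stub agents for illustration only.
def stub_create_objectives(computer):
    return ["quarterly report", "budget spreadsheet", "client presentation"]


def stub_work_one_turn(computer, pending):
    # Pretend exactly one objective is completed per turn.
    return {sorted(pending)[0]}


turns, unfinished = run_simulation({"user": "analyst"}, stub_create_objectives, stub_work_one_turn)
print(turns, unfinished)
```

A real run would take far more turns per objective; the cap simply guards against a loop that never completes.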
We introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.g., documents, spreadsheets, and presentations).
Methodological description and implementation presented in the paper (design and procedures for generating synthetic computers and artifact types).
high positive Synthetic Computers at Scale for Long-Horizon Productivity S... creation of synthetic computer environments with realistic folder hierarchies an...