Evidence (7198 claims)
Search and filter individual claims pulled from the papers. Looking for a specific finding ("what's the effect on wages?"), you're in the right place. Want to compare whole outcome categories against each other instead? Use the Evidence Explorer.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. Click one to filter the claims below.
Adoption
8921 claims
Filter claims →
Productivity
8002 claims
Filter claims →
Governance
7198 claims
Filtered →
Human-AI Collaboration
6864 claims
Filter claims →
Org Design
4398 claims
Filter claims →
Innovation
4286 claims
Filter claims →
Labor Markets
3629 claims
Filter claims →
Skills & Training
3001 claims
Filter claims →
Inequality
2141 claims
Filter claims →
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 790 | 208 | 103 | 950 | 2117 |
| Governance & Regulation | 869 | 411 | 195 | 126 | 1630 |
| Organizational Efficiency | 817 | 202 | 126 | 87 | 1243 |
| Technology Adoption Rate | 675 | 258 | 128 | 106 | 1178 |
| Research Productivity | 462 | 138 | 64 | 347 | 1023 |
| Output Quality | 501 | 193 | 61 | 52 | 807 |
| Decision Quality | 346 | 180 | 84 | 51 | 668 |
| AI Safety & Ethics | 235 | 285 | 70 | 34 | 630 |
| Firm Productivity | 452 | 58 | 91 | 20 | 627 |
| Market Structure | 184 | 171 | 123 | 24 | 507 |
| Task Allocation | 221 | 65 | 76 | 34 | 401 |
| Skill Acquisition | 176 | 62 | 62 | 17 | 317 |
| Innovation Output | 207 | 28 | 48 | 18 | 303 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Employment Level | 105 | 56 | 108 | 13 | 284 |
| Consumer Welfare | 121 | 67 | 45 | 11 | 244 |
| Firm Revenue | 160 | 50 | 28 | 4 | 242 |
| Task Completion Time | 182 | 33 | 10 | 13 | 239 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 94 | 73 | 23 | 12 | 202 |
| Error Rate | 76 | 98 | 11 | 4 | 189 |
| Regulatory Compliance | 81 | 73 | 17 | 7 | 178 |
| Automation Exposure | 61 | 59 | 26 | 14 | 163 |
| Training Effectiveness | 97 | 21 | 14 | 19 | 153 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 21 | 1 | 117 |
| Hiring & Recruitment | 52 | 8 | 8 | 3 | 71 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 49 | 6 | 1 | 61 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 15 | 14 | — | 3 | 32 |
| Industry | — | — | — | 1 | 1 |
Governance
Remove filter
We hypothesize the emergent necessity of a 'Compliance Premium,' indicating wage resilience increasingly tied to risk-absorption capacity.
Hypothesis proposed by authors based on observed institutional/business risk differentials from HITL validation and OAI patterns; framed as a forward-looking interpretation rather than demonstrated empirical result.
Non-routine cognitive roles highly dependent on symbolic manipulation (e.g., Data Scientists) face unprecedented exposure, with OAI ≈ 0.70.
Reported OAI value for example occupation(s) (Data Scientists) derived from the algorithmic aggregation across DWAs; claim presented as a key empirical finding.
We utilize a multi-agent LLM ensemble to score both technical feasibility and business risk for DWAs.
Method description: deployment of a multi-agent LLM ensemble to produce scores on technical feasibility and business risk per DWA. Specific ensemble composition and hyperparameters not provided in the excerpt.
We introduce a Tech-Risk Dual-Factor Model that jointly scores technical feasibility and business risk to re-evaluate occupational exposure to LLMs.
Methodological contribution described in the paper (model specification). Implementation details described elsewhere in paper (see multi-agent scoring and aggregation), but claim itself is the introduction of the model.
The proposal outlines a phased implementation roadmap from a voluntary pilot to mandatory certification within five years.
Proposal states a phased implementation timeline moving from voluntary pilot projects to mandatory certification within a five-year period; presented as a planned roadmap rather than a demonstrated outcome.
The governance structure for IASCA will be treaty-based and include anti-capture provisions.
Proposal explicitly proposes a treaty-based governance structure and states inclusion of anti-capture provisions; this is a design/policy prescription in the document rather than evidence-based finding.
IASCA employs a zero-knowledge testing architecture that evaluates model safety through behavioural probing without accessing proprietary weights, training data, or architecture.
Proposal describes a technical design: zero-knowledge testing via behavioural probes that does not require access to model weights, training data, or architecture; presented as a design feature without empirical validation or test results in the excerpt.
The International AI Safety Certification Authority (IASCA) is an independent, internationally governed body for mandatory pre-deployment safety certification of frontier AI models.
Explicit statement in the proposal describing IASCA as an independent, internationally governed authority and its role in mandatory pre-deployment certification; conceptual design, no empirical testing or implementation reported.
The taxonomy, feasibility classification, and mechanism-to-scenario mapping provide a technical foundation for policymakers and identify the R&D investments required before hardware-level governance can support verifiable international agreements.
Authors' synthesis and policy-focused conclusions based on the taxonomy, feasibility ratings, mapping, and threat analyses presented in the paper (conceptual/prescriptive).
We present an adversary-tiered threat analysis distinguishing commercial, non-state, and nation-state actors, arguing the appropriate security standard is tamper-evident assurance analogous to IAEA verification rather than absolute tamper-proofing.
Authors' adversary-model classification and normative argument recommending tamper-evident assurance (comparative reasoning with IAEA-style verification). Qualitative policy recommendation; no empirical experiment.
We map the taxonomy onto four governance scenarios: domestic regulation, bilateral agreements, multilateral treaty verification, and industry self-regulation.
Authors' scenario mapping exercise described in the paper (conceptual mapping of mechanisms to four named governance scenarios).
For each mechanism, we provide a technical description, a feasibility rating, and an identification of adversarial vulnerabilities.
Paper's stated content and structure: per-mechanism entries including technical descriptions, feasibility ratings, and adversarial vulnerability discussion (qualitative documentation).
This paper proposes a taxonomy of 20 hardware-level governance mechanisms, organised by function (monitoring, verification, enforcement) and assessed for technical feasibility on a four-point scale from currently deployable to speculative.
Authors' methodological contribution: a constructed taxonomy enumerating 20 mechanisms and an assigned four-point feasibility rating (documentation in the paper). No external sample size; based on authors' engineering analysis.
Multimodal GeoAI studies fuse multiple geospatial data modalities to tackle urban mobility tasks including accessibility mapping, demand forecasting, and origin–destination flow prediction.
Categorization of tasks addressed by the included multimodal GeoAI studies (synthesis from the surveyed papers, n=18).
To address these challenges, the paper proposes a structured research roadmap including equity-aware loss functions, adaptive multimodal fusion pipelines, participatory and human-in-the-loop workflows, and urban data trusts.
Authors' proposed agenda and recommendations presented in the discussion/conclusion of the paper (proposal, not empirically evaluated).
The paper examines emerging techniques such as knowledge graphs, federated learning, and explainable AI that support equity-relevant insights across diverse urban contexts.
Discussion and synthesis of methodological developments in the surveyed literature (reported within the review).
The review highlights the growing use of deep learning architectures in multimodal GeoAI for urban mobility.
Observed trend reported by the authors based on the systematic review of included studies (n=18).
The integration of artificial intelligence with geographic information science, combined with multimodal geospatial data fusion, provides powerful tools to diagnose and address mobility disparities by integrating heterogeneous data sources (satellite imagery, GPS trajectories, transit records, volunteered geographic information, social sensing).
Theoretical/methodological claim supported by examples and synthesis from the surveyed literature (the paper reviews multimodal GeoAI studies that fuse such data sources).
The risk of evolution selecting for deception could be mitigated if reproduction is based on purely objective criteria, rather than human judgment.
Prescriptive implication derived from the model analysis: argument that replacing human-judged fitness with objective criteria would reduce selection for deception (theoretical reasoning, not empirical test).
Assuming bounded fitness and a fixed probability that any AI reproduces a 'locked' copy of itself, fitness concentrates on the maximum reachable value.
Formal theorem/proof within the mathematical model under the stated assumptions (bounded fitness and fixed probability of locked self-reproduction).
As artificial intelligence systems (AIs) become increasingly produced by recursive self-improvement, a form of evolution may emerge, in which the traits of AI systems are shaped by the success of earlier AIs in designing and propagating their descendants.
Conceptual argument and motivation in the paper; development of a mathematical model of self-designing AIs to formalize this idea (theoretical, no empirical data or sample).
Prompts can be treated as decision policies that allocate discretion between researcher and system, governing what is executed and when iteration stops.
Methodological framing advanced by the authors describing prompts as decision policies; conceptual claim based on the paper's analytic framework rather than empirical measurement.
Operational constraints and decision rule prompts deliver large and stable footprint reductions while preserving decision equivalent topic outputs.
Experimental comparisons of prompt strategies in the benchmarked workflow showing reductions in runtime/CO2e and evaluated topic outputs' decision-equivalence (asserted in abstract; no numeric reductions or sample sizes provided).
We benchmark a modern economic survey workflow, an LDA-based literature mapping implemented with GenAI assisted coding and executed in a fixed cloud notebook, measuring runtime and estimated CO2e with CodeCarbon.
Experimental benchmark described in the paper: single implemented workflow (LDA-based literature mapping) executed in a fixed cloud notebook with runtime and CO2e measured using CodeCarbon (methodological claim).
Training footprint is the largest cluster in the mapped Green AI literature.
Result from the paper's literature mapping / clustering (statement in abstract; no numeric cluster sizes given).
We map the recent Green AI literature into seven themes: training footprint is the largest cluster, while inference efficiency and system level optimisation are growing rapidly, alongside measurement protocols, green algorithms, governance, and security and efficiency trade-offs.
Bibliometric / thematic mapping of recent Green AI literature described in the paper (method: literature mapping; exact number of papers or mapping procedure not specified in abstract).
Average ratings [for same-caste matches were] up to 25% higher (on a 10-point scale) than inter-caste matches.
Quantitative result reported in the analysis comparing average ratings (10-point scale) between same-caste and inter-caste matches; statement specifies magnitude 'up to 25%'.
Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably.
Reported results across evaluated LLMs showing consistent patterns where same-caste profile pairings received higher ratings than inter-caste pairings.
A representative incident (ISS-004) demonstrated boundary-based containment with 10-minute detection latency, zero user exposure, and 80-minute resolution.
Incident ISS-004 report in the paper giving specific timings for detection latency (10 minutes), user exposure (zero), and resolution (80 minutes).
The multi-agent approach improved reliability: audited handoffs detected and blocked a coordinate transformation error affecting all 2,452 stations before publication.
Incident detection reported in the SF2Bench deployment where audited handoffs prevented publication of a coordinate transformation error that would have affected all 2,452 stations.
The multi-agent approach improved efficiency — the SF2Bench deployment was completed by a single operator in two days with repeated artifact reuse across deployments.
Operational report from the production deployment: single operator completion time of two days and reuse of artifacts across deployments as stated in the paper.
SF2Bench, a compound flooding benchmark comprising 2,452 monitoring stations and 8,557 published files spanning 39 years, validates the multi-agent workflow.
Reported dataset composition and use in the paper: SF2Bench with stated counts and temporal span used to validate the multi-agent workflow.
EnviSmart treats reliability as an architectural property through two mechanisms: (1) a three-track knowledge architecture that externalizes behaviors (governance constraints), domain knowledge (retrievable context), and skills (tool-using procedures) as persistent, interlocking artifacts; and (2) a role-separated multi-agent design where deterministic validators and audited handoffs restore fail-stop semantics at trust boundaries before irreversible steps.
System architecture and design description in the paper; presented as the core reliability mechanisms implemented in EnviSmart.
We introduce EnviSmart, a production data management system deployed on campus-wide storage infrastructure for environmental research.
System description and statement of deployment in the paper; presented as a production deployment (no randomized evaluation reported).
Embedding LLM-driven agents into environmental FAIR data management can externalize operational knowledge and scale curation across heterogeneous data and evolving conventions.
Conceptual / argumentative claim made in the paper as a motivation for the system; no quantitative experiment tied to this statement in the excerpt.
The agentic-specificity classification helps organizations distinguish challenges that require novel approaches from those that are addressable with established practices.
Authors' proposed classification (agentic-specific vs. carried-over/amplified) intended as a practical decision aid; derived from the coding and comparative analysis.
The taxonomy provides a diagnostic framework for identifying priority barrier dimensions and understanding cross-dimensional amplification mechanisms.
Authors present a taxonomy derived from the review and claim it can be used diagnostically by organizations; supported by the coded barrier classification and STS mapping.
Organizations and policymakers that treat work-time policy as foundational economic planning will better position their economies to harness AI's benefits while mitigating systemic instability.
Policy-prescriptive conclusion based on cross-disciplinary analysis; no empirical trial or quantification offered in the summary.
Work-time reduction can distribute productivity gains more equitably.
Argument supported by examination of historical work-time transitions and pilot programs referenced in the article; no empirical effect sizes or sample details in the summary.
Coordinated reduction in working hours helps maintain aggregate demand.
The paper's synthesis of historical transitions and pilot programs and argument about distribution of productivity gains; no quantitative evidence or sample sizes provided in the summary.
Gradual, policy-led reduction in standard working hours can preserve employment.
Claim based on examination of historical work-time transitions, contemporary pilot programs, and cross-sector implementation strategies referenced in the paper; no specific studies or sample sizes cited in the summary.
Competition law assessments of a dominant undertaking’s conduct must consider not only the product market but also the labor market, particularly in cases of significant market structure changes.
Conclusion stated in abstract summarizing the paper’s findings; supported by the paper's legal analysis and referenced case law (no empirical sample provided in abstract).
Poaching employees is an inherent aspect of competition for highly qualified talent and is particularly pronounced among tech giants.
Statement in abstract; general observation supported by literature/case-law references implied in paper (no specific empirical sample or quantitative method reported in abstract).
The paper proposes five architectural requirements for genuine human oversight systems.
Stated methodological/prescriptive contribution of the paper (a proposal rather than an empirical finding); no sample size or empirical validation reported in the provided excerpt.
The proposed framework outlines a pathway toward large-scale cooperative intelligence and offers a constructive perspective on the coevolution of human and artificial agents in the informational ecosystems of the future.
Claim about the paper's contribution; based on conceptual synthesis and theoretical framing rather than empirical validation.
A voluntary ecosystem of free rational agents, human and artificial, who cooperate through transparent and fair exchange of information maximizes their adaptive capacity and long-term well-being.
Normative proposition in the paper derived from theoretical principles (information theory, collective intelligence); presented as a proposed ideal rather than an empirically tested policy.
Emerging opportunities exist for stabilizing these ecosystems through new forms of informational verification and monitoring made possible by advanced artificial agents.
Forward-looking claim grounded in conceptual analysis of capabilities of advanced agents; proposed as an opportunity in the paper rather than demonstrated empirically.
Systems that preserve diversity of exploration while minimizing barriers to information exchange exhibit superior capacity for discovery and adaptation in complex environments.
Theoretical claim supported by the paper's appeal to principles from information theory, adaptive systems, and collective intelligence; presented as an argument rather than as empirically validated result.
Increasing the strictness of algorithmic control paradoxically increases the evolutionary fitness of coordinated resistance (e.g., coordinated log-offs).
Results from the EGT model and simulations showing fitness/payoff changes for coordinated resistance strategies as platform surveillance strictness parameter increases; model-only (no empirical N reported).
The primary contribution is a controlled agent-payment infrastructure and reference architecture that demonstrates how agentic access monetization can be adapted to fiat systems without discarding security and policy guarantees.
Summary of the paper's claimed contribution (architectural demonstration and reference implementation).