Evidence (7278 claims)
Search and filter individual claims pulled from the papers. If you are looking for a specific finding ("what's the effect on wages?"), you're in the right place. To compare whole outcome categories against each other, use the Evidence Explorer instead.
The board below groups claims two ways: by broad theme (nine paper-level topics) and by outcome category (the 34 claim-level outcomes that the Explorer and Syntheses also use).
Browse by theme
Nine broad, paper-level topics. The claims listed below are currently filtered to Governance.
| Theme | Claims |
|---|---|
| Adoption | 9047 |
| Productivity | 8066 |
| Governance (active filter) | 7278 |
| Human-AI Collaboration | 6912 |
| Org Design | 4439 |
| Innovation | 4359 |
| Labor Markets | 3652 |
| Skills & Training | 3018 |
| Inequality | 2160 |
Claims by outcome category
Counts by direction of finding. These are the same 34 outcome categories the Explorer compares and the Syntheses are written for. A linked row has a published synthesis.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 795 | 210 | 105 | 955 | 2131 |
| Governance & Regulation | 886 | 414 | 197 | 126 | 1654 |
| Organizational Efficiency | 826 | 204 | 129 | 87 | 1257 |
| Technology Adoption Rate | 681 | 259 | 128 | 110 | 1189 |
| Research Productivity | 464 | 138 | 65 | 349 | 1028 |
| Output Quality | 503 | 196 | 61 | 53 | 813 |
| Decision Quality | 351 | 180 | 84 | 51 | 673 |
| AI Safety & Ethics | 238 | 288 | 71 | 34 | 637 |
| Firm Productivity | 455 | 58 | 92 | 20 | 631 |
| Market Structure | 186 | 172 | 123 | 25 | 511 |
| Task Allocation | 222 | 70 | 76 | 34 | 407 |
| Innovation Output | 238 | 28 | 48 | 18 | 334 |
| Skill Acquisition | 177 | 62 | 62 | 17 | 318 |
| Employment Level | 107 | 57 | 108 | 13 | 287 |
| Fiscal & Macroeconomic | 135 | 72 | 44 | 26 | 284 |
| Firm Revenue | 172 | 50 | 28 | 5 | 256 |
| Consumer Welfare | 121 | 68 | 45 | 12 | 246 |
| Task Completion Time | 183 | 33 | 10 | 13 | 240 |
| Inequality Measures | 45 | 126 | 50 | 6 | 227 |
| Worker Satisfaction | 95 | 74 | 23 | 12 | 204 |
| Error Rate | 77 | 98 | 11 | 4 | 190 |
| Regulatory Compliance | 84 | 73 | 17 | 7 | 181 |
| Automation Exposure | 61 | 61 | 27 | 14 | 166 |
| Training Effectiveness | 98 | 21 | 14 | 19 | 154 |
| Wages & Compensation | 78 | 37 | 25 | 6 | 146 |
| Developer Productivity | 105 | 18 | 14 | 6 | 144 |
| Team Performance | 87 | 17 | 28 | 10 | 143 |
| Job Displacement | 12 | 83 | 23 | 1 | 119 |
| Hiring & Recruitment | 53 | 8 | 8 | 3 | 72 |
| Social Protection | 39 | 17 | 8 | 2 | 66 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 50 | 6 | 1 | 62 |
| Labor Share of Income | 17 | 20 | 17 | — | 54 |
| Worker Turnover | 15 | 15 | — | 3 | 33 |
| Industry | — | — | — | 1 | 1 |
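The direction counts above can be turned into shares for quick comparison. The following is a minimal Python sketch using three rows copied from the table; the function name `direction_shares` and the "unclassified" label are assumptions made for illustration and are not part of the site. In some rows the four direction counts sum to less than the Total (Firm Productivity: 625 of 631), so the sketch divides by Total and reports the remainder separately.

```python
# Rows copied from the table above: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (455, 58, 92, 20, 631),
    "Wages & Compensation": (78, 37, 25, 6, 146),
    "Job Displacement": (12, 83, 23, 1, 119),
}


def direction_shares(positive, negative, mixed, null, total):
    """Return each direction's share of Total, plus any remainder.

    "unclassified" is a hypothetical label for Total minus the four columns.
    """
    counts = {"positive": positive, "negative": negative, "mixed": mixed, "null": null}
    counts["unclassified"] = total - sum(counts.values())
    return {name: count / total for name, count in counts.items()}


for outcome, row in ROWS.items():
    shares = direction_shares(*row)
    summary = ", ".join(f"{name} {share:.0%}" for name, share in shares.items())
    print(f"{outcome}: {summary}")
```

Dividing by Total rather than by the sum of the four columns keeps the shares comparable with the Total column as published; either denominator is a defensible choice.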
Governance
A planted-bias test confirms the instrument detects disparity when present.
Validation experiment described in paper where synthetic/controlled bias was introduced and the benchmark detected it (methodological validation).
A live leaderboard with a held-out private split and a contamination canary admits external models by submission.
Paper states existence of live leaderboard, held-out private split and contamination canary; described as part of released infrastructure.
The benchmark evaluates agents under four agent scaffolds of increasing agency: direct, chain-of-thought, multi-agent deliberation, and tool-augmented.
Experimental design specifies four scaffolds and applies them in evaluations across the benchmark tasks.
Synthetic, demographic-neutral profiles are evaluated in counterfactual matched sets that vary only a name-coded race × gender signal (in the Bertrand and Mullainathan tradition).
Method: synthetic neutral profiles and counterfactual matched sets differing only by a name-coded race × gender signal; described in benchmark construction and experimental protocol.
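The matched-set idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark's actual harness: the `Profile` fields, the `NAME_BY_SIGNAL` mapping, and the placeholder names are all hypothetical. The point it shows is that every profile in a set is identical except for the name carrying the race × gender signal.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Profile:
    """A synthetic applicant profile; field names are hypothetical."""
    name: str
    years_experience: int
    degree: str


# Hypothetical placeholders. A real study would use validated name lists.
NAME_BY_SIGNAL = {
    ("race_a", "gender_a"): "NAME_RACE_A_GENDER_A",
    ("race_a", "gender_b"): "NAME_RACE_A_GENDER_B",
    ("race_b", "gender_a"): "NAME_RACE_B_GENDER_A",
    ("race_b", "gender_b"): "NAME_RACE_B_GENDER_B",
}


def matched_set(base: Profile) -> dict:
    """Return one copy of `base` per signal, changing only the name field."""
    return {
        signal: replace(base, name=name)
        for signal, name in NAME_BY_SIGNAL.items()
    }


base = Profile(name="", years_experience=5, degree="BSc")
variants = matched_set(base)
for signal, profile in variants.items():
    print(signal, profile)
```

Because only `name` varies within a set, any difference in an agent's decisions across the set can be attributed to the name signal rather than to qualifications.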
AgentFairBench spans three regulator-anchored domains: hiring, lending, and medical triage.
Dataset and benchmark design describe tasks in hiring, lending, and medical triage domains (methodological description).
AgentFairBench is grounded in a companion framework, the Bias Conduction Framework (BCF), restated here.
Paper restates and uses the Bias Conduction Framework as the conceptual grounding for the benchmark (methodological description).
We introduce AgentFairBench, a cheap, reproducible, multi-domain benchmark for demographic disparity in the actions of LLM agents.
Paper describes the design and release of AgentFairBench as a benchmark instrument; includes implementation, harness, and live leaderboard. (Methodological description within the paper.)
The paper forms an agnostic model (combining theoretical constructs, empirical evidence, and practical applications) that gives organizations the accountability and the operational and human oversight needed to adopt responsible AI-enabled automation of enterprise systems and processes.
Paper proposes an agnostic model based on synthesis of theory, empirical evidence, and practice; the excerpt describes the model conceptually without presenting evaluation metrics or sample-based validation.
Artificial intelligence integration requires established governance frameworks, human-in-the-loop verification, and explainable artificial intelligence to ensure compliance with the organization's values and legislation.
Prescriptive recommendation in the paper combining theoretical constructs and practical application guidance; no empirical validation provided in the excerpt.
Systematic AI integration can produce meaningful productivity gains across engineering design, content generation, multimedia creation, and scientific experimentation.
Paper combines theoretical, empirical, and practical examples to claim productivity gains; excerpt does not provide study methodology or sample sizes.
Systematic AI integration can produce meaningful improvements in configuration accuracy.
Asserted by paper based on examples from engineering, content, multimedia, and scientific workflows; excerpt contains no measurement details or sample sizes.
Across modern engineering design, content generation, multimedia creation, and scientific experimentation, organizations can realize meaningful savings in development time by systematically integrating artificial intelligence technologies into existing quality management systems.
Paper synthesizes theoretical constructs, empirical evidence, and practical applications to assert time-savings across multiple domains; no specific study design, sample size, or quantified effect provided in the excerpt.
Measurable system reliability improvements have been achieved using large language model capabilities with structured enterprise integration platforms.
Paper reports improvements in system reliability associated with LLM integration; excerpt lacks details on measurement approach, sample, or magnitude.
Measurable reductions in defects have been achieved using large language model capabilities with structured enterprise integration platforms.
Paper claims empirical reductions in defects linked to LLM integration; no methodological details or sample size provided in the excerpt.
Measurable productivity improvements have been achieved using large language model capabilities with structured enterprise integration platforms.
Paper asserts empirical/measurable productivity improvements attributable to LLMs integrated with enterprise platforms; the excerpt provides no details on study design, measurement method, or sample size.
Organizations should consider governance and quality management when introducing generative AI.
Normative recommendation by the paper, presented as best-practice guidance; based on theoretical constructs and practical applications described, not an empirical test in the excerpt.
Automated orchestration, code generation, and generative creativity have also been introduced.
Descriptive claim in paper indicating the introduction/adoption of specific generative-AI capabilities; no empirical details or sample size given in the excerpt.
Generative AI systems have been incorporated into innovation and process optimization in organizations.
Stated in paper as an observed trend / descriptive claim; no specific study design, sample size, or empirical method reported in the excerpt.
The authors propose four commitments as a constructive alternative to pluralistic-alignment as the main directive.
Descriptive claim about the content of the paper (proposal of four commitments); this is a statement about what the paper contains rather than an empirical finding.
Pluralism belongs at the surface (language, register, conventions, missing-context defaults) and across legitimate value tradeoffs that respect the floor, but pluralism should not be applied to values that violate the non-negotiable floor.
Prescriptive/architectural claim about how pluralism should be incorporated into AI behavior; presented as part of the authors' proposed framework in the abstract.
AI should be trained to a non-negotiable floor of objective alignment goals — competence, bounded by the constraints of factual accuracy, honesty, and lawfulness.
Prescriptive recommendation proposed by the authors as their alternative alignment strategy; stated in the abstract without empirical evaluation.
With current technology, one can train AIs to share the values of a Silicon Valley techno-optimist, a degrowth environmentalist, a national-conservative culture warrior, a single-party state cadre, or a devout religious traditionalist.
Asserted capability claim referencing 'current technology' (implied based on contemporary AI training/fine-tuning techniques); no empirical study or sample size reported in the abstract.
Moving beyond experimental phases requires high-impact use cases and decentralized governance.
The paper argues that scaling AI past experiments depends on choosing high-impact use cases and adopting decentralized governance; these are presented as recommendations rather than empirically validated findings in the summary.
The research offers guidance for bridging the gap between technical success and business impact through operational mitigation strategies.
Paper provides proposed operational strategies and guidance (prescriptive content); no evidence of empirical testing given in the summary.
AI-assisted software development has moved from line-level autocomplete to agents that can plan changes, edit files, and submit pull requests with limited human supervision.
Statement in paper abstract describing technological progression; no empirical dataset or quantified measurements provided in the abstract.
Reframing AI ethics as a political-economic challenge rather than a technical issue contributes a contextually grounded framework for understanding and addressing ethical AI in postcolonial societies.
Author-claimed contribution of the paper based on the conceptual synthesis of the literature (the paper's stated conclusion).
By integrating the Institute of Electrical and Electronics Engineers International Roadmap for Devices and Systems (IEEE IRDS) sustainability considerations for semiconductor facilities, the study proposes a metabolic circuit framework that centers "Values and Needs" within production and consumption relationship loops.
Conceptual integration of IEEE IRDS sustainability considerations into a proposed 'metabolic circuit' framework described in the paper; described as a design/architecture contribution rather than empirically validated.
This study proposes a Regenerative Socio-Technical roadmap that repurposes the Sustainable Production and Consumption system map to reframe artificial intelligence infrastructure as a system-of-systems governed ultimately by planetary limits.
Paper's methodological contribution: proposal of a conceptual roadmap and reframing using an existing systems map; no empirical testing reported.
There is an urgent necessity for cohesive policy interventions to accelerate the inclusive adoption of digital agriculture in developing economies.
Policy recommendation drawn from the review's synthesis of technological potential and socioeconomic limitations; presented as a conclusion/recommendation rather than a quantified empirical finding.
By transitioning from traditional, intuition-based practices to precision-driven, data-centric methodologies, smart farming facilitates the precise management of crucial inputs such as water, fertilizers, and pesticides, thereby enhancing the yields of staple crops like Zea mays and Glycine max.
Synthesis of agronomic literature presented in the review claiming input-optimization and yield improvements for specific staple crops (maize and soybean); no numeric trial/sample details provided in the abstract.
Emerging technological innovations—including the Internet of Things (IoT), artificial intelligence (AI), unmanned aerial vehicles (UAVs), and blockchain—have a profound impact on optimizing agricultural productivity.
Review article synthesizing literature on multiple technologies (IoT, AI, UAVs, blockchain) and their roles in agriculture; no specific experimental sample sizes provided in the abstract.
The integration of digital agriculture and smart farming technologies represents a transformative evolution in modern agronomy, offering unprecedented solutions to the intertwined crises of global food security, climate change, and resource depletion.
Statement in the review synthesizing existing literature on digital agriculture and its potential impacts; no primary empirical sample or quantitative meta-analysis reported in the abstract.
We demonstrate the framework on public opinion surveys with silicon samples and AI evaluation with autoraters.
Empirical demonstrations/case studies reported in the paper (as stated in the abstract). The excerpt does not report sample sizes, datasets, metrics, or quantitative results for these demonstrations.
We develop methods for valid inference under task exchangeability, together with extensions that provide guarantees even beyond exchangeability.
Methodological claim in the paper indicating development of inference procedures and theoretical extensions; implies proofs of validity under stated conditions. No empirical performance metrics or sample sizes given in the excerpt.
We introduce a new technical condition called task exchangeability: the researcher can identify historical tasks with real data such that the current task is exchangeable with the historical tasks in an appropriate mathematical sense.
Paper's key conceptual/theoretical contribution described in the abstract; presented as a definitional/technical condition underpinning their methods.
We propose statistical principles for using synthetic data in scientific research with provable validity guarantees.
Stated methodological contribution of the paper; implies theoretical development and proofs (provable guarantees) rather than only empirical results. No details or sizes provided in the excerpt.
Synthetic data may help researchers ask more questions, run more studies, and accelerate discovery.
Argument and motivation presented in the paper citing examples across social science (silicon samples), AI evaluation (LLM-as-judge), and proteomics (generative models producing protein structures); no sample size or quantitative experiment reported in the provided text.
The agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools, without any target-specific prior knowledge.
Method description of agent architecture and toolset; explicit statement that no target-specific prior knowledge was provided.
On the target-server side we design two levels of target environments based on the number of secure services deployed alongside a vulnerable service: Tier 1 (one secure service) and Tier 2 (three secure services), resulting in a total of 300 target servers.
Method description specifying tier definitions and total number of target servers (300) used in experiments.
We construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding.
Method description in paper (design and implementation of new evaluation framework).
The framework outlines illustrative proxies and a system-level observability lens, enabling the distinction between embodied configurations and embedded integrations and supporting future empirical research on adaptive, AI-enabled financial systems.
Paper provides proposed proxies and an observability perspective as methodological guidance for future research; this is a conceptual/methodological contribution without empirical demonstration.
Value is conceptualized as identity-based informational persistence arising from uncertainty reduction in the embodied finance framework.
Conceptual proposition in the paper linking informational identity and uncertainty reduction to the emergence of value; no empirical validation provided.
Agency is conceptualized as distributed enactment within the embodied finance framework.
Theoretical definition provided as part of the machine–platform–crowd triangle framework; no empirical measurement or sample.
This paper introduces 'embodied finance' as a relational–informational configuration in which services take form through interactions among humans, machines, and platforms.
Theoretical contribution: a newly proposed conceptual framework constructed from cross-disciplinary theory (IS, cognitive science, platform economics); no empirical validation presented.
The diffusion of artificial intelligence (AI) and platform architectures is transforming financial services beyond the mere technical integration of banking functionalities.
Conceptual argument in the paper synthesizing literature from information systems, cognitive science, and platform economics; no empirical sample or quantitative analysis reported.
projectmem is evaluated through a two-month self-study across 10 projects comprising 207 logged events.
Evaluation description provided in the abstract specifying duration, number of projects, and number of logged events.
projectmem ships as a three-dependency Python package (14 MCP tools, 19 CLI commands, 37 automated tests).
Package composition and counts stated in the abstract; presumably verifiable in the project's repository.
The system runs fully offline with no telemetry; its immutable log also serves as a provenance trail for reproducible, auditable AI-assisted development.
Design and privacy claim made in the abstract (no detailed audit/reproducibility study in the abstract).
We frame this as Memory-as-Governance: memory that does not merely answer the agent but acts on its next action.
Conceptual framing presented in the abstract.
projectmem adds a deterministic pre-action gate that warns an agent before it repeats a previously failed fix or edits a known-fragile file.
Design claim in the abstract describing a pre-action gate feature (no empirical results in abstract).
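A deterministic pre-action gate of the kind described above can be sketched as a pure lookup against logged history. The following Python is an illustration only and does not reproduce projectmem's API: `ProjectMemory`, `pre_action_gate`, and the way a fix is identified by a string are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProjectMemory:
    """Hypothetical store of what the log has recorded so far."""
    failed_fixes: set[tuple[str, str]] = field(default_factory=set)  # (file, fix id)
    fragile_files: set[str] = field(default_factory=set)


def pre_action_gate(memory: ProjectMemory, file_path: str, fix_id: str) -> list[str]:
    """Return warnings for a proposed edit; an empty list means no objection.

    The check is plain set membership, so the same memory and the same
    proposed action always produce the same warnings.
    """
    warnings = []
    if (file_path, fix_id) in memory.failed_fixes:
        warnings.append(f"fix '{fix_id}' previously failed on {file_path}")
    if file_path in memory.fragile_files:
        warnings.append(f"{file_path} is marked as fragile")
    return warnings


memory = ProjectMemory(
    failed_fixes={("app/db.py", "bump-pool-size")},
    fragile_files={"app/db.py"},
)
for warning in pre_action_gate(memory, "app/db.py", "bump-pool-size"):
    print("WARNING:", warning)
```

The gate only warns and leaves the decision to the caller, which matches the abstract's wording ("warns an agent"); whether the real tool can also block an action is not stated in the source.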