Evidence (6574 claims)
Adoption
8625 claims
Productivity
7686 claims
Governance
6917 claims
Human-AI Collaboration
6574 claims
Org Design
4189 claims
Innovation
4131 claims
Labor Markets
3588 claims
Skills & Training
2985 claims
Inequality
2066 claims
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
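As a reading aid, the share of findings in each direction can be computed from a matrix row. The sketch below copies four rows from the table above; the function names and the choice to divide by the listed Total are assumptions made for illustration. In some rows the four direction columns sum to less than the Total (for Firm Productivity, 439 + 57 + 88 + 20 = 604 against 610), so shares computed this way need not sum to 1.

```python
# Rows copied from the Evidence Matrix: (positive, negative, mixed, null, total).
ROWS = {
    "Firm Productivity": (439, 57, 88, 20, 610),
    "AI Safety & Ethics": (218, 279, 66, 33, 602),
    "Inequality Measures": (44, 123, 50, 6, 223),
    "Job Displacement": (12, 81, 21, 1, 115),
}

def direction_shares(row):
    """Share of each direction of finding, relative to the listed Total."""
    positive, negative, mixed, null, total = row
    return {
        "positive": positive / total,
        "negative": negative / total,
        "mixed": mixed / total,
        "null": null / total,
    }

def unclassified(row):
    """Claims counted in Total but in none of the four direction columns."""
    return row[4] - sum(row[:4])

if __name__ == "__main__":
    for outcome, row in ROWS.items():
        shares = direction_shares(row)
        print(outcome, {k: round(v, 3) for k, v in shares.items()}, unclassified(row))
```

Dividing by the listed Total keeps the shares comparable with the table as published; dividing by the sum of the four columns instead would be an equally defensible choice.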
Human-AI Collaboration
Startups integrate GenAI not as a peripheral tool but as a structural collaborator.
Interpretive finding from interviews and authors' theorization based on the dataset (17 interviews).
Generative AI (GenAI) is influencing how startups form, operate, and create value.
Statement in paper's introduction/abstract; supported by the study's framing and qualitative interview data (17 expert interviews).
Large-scale validation in a production code completion environment shows Echo increased the acceptance rate from 25.7% to 35.7%.
Reported result from 'large-scale validation' in a production code completion environment as stated in the paper's abstract; no sample size, statistical tests, or additional experimental details provided in the excerpt.
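For scale, the reported change can be read either as an absolute gain in percentage points or as a relative gain over the starting rate. The sketch below uses only the two figures quoted above; the helper name `lift` is an assumption for illustration.

```python
def lift(before, after):
    """Return (absolute change, relative change) between two rates."""
    absolute = after - before
    return absolute, absolute / before

# Acceptance rates reported for Echo, in percent.
absolute_pp, relative = lift(25.7, 35.7)
print(f"{absolute_pp:.1f} percentage points, {relative:.1%} relative increase")
```

The two readings answer different questions: the absolute change is about 10 percentage points, while the relative increase is roughly 39% of the original acceptance rate.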
User-driven refinement sequences distill agents' flawed proposals into high-quality training signals.
Conceptual/empirical claim in the paper that user refinements produce verified solutions which serve as high-quality signals; supported by the paper's later validation claim but no separate sample size or statistical detail provided in the excerpt.
Echo is a generalized framework that operationalizes the transition from raw experience to learnable knowledge by echoing environmental feedback into the training loop for model optimization.
Methodological contribution described by the authors (framework description); no implementation details or quantitative validation given in the excerpt besides later mention of validation.
Widespread deployment of AI agents provides low-cost access to massive streams of real-world experience data.
Stated observation in the paper; no quantitative deployment statistics or sample sizes provided in the excerpt.
Continuous learning from 'experience data' (interactions between agents and their environments) promises to transcend the scalability and knowledge limitations of static human data.
Conceptual claim in the paper proposing continuous learning from experience data as a solution; no empirical details provided in the excerpt.
There is a session-level carryover effect: a participant's prior AI use leads to further AI adoption and entrenches their miscalibration about time savings.
Observed analyses across sessions in the three pre-registered user studies (combined N = 2691) showing that prior within-session AI use predicts subsequent AI adoption and stronger miscalibration.
People display 'efficiency-gain illusions': they overestimate how much time and effort savings AI use provides.
Same three pre-registered user studies (combined N = 2691) that measured participants' perceived time/effort savings from AI versus actual measured time/effort.
People frequently choose to use AI even when doing so is inefficient (i.e., provides no meaningful time or effort savings).
Three pre-registered user studies reported in the paper (combined N = 2691) measuring participants' choices to use AI on cognitively simple tasks and comparing those choices to measured time/effort savings.
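The two quantities behind these claims, miscalibration and inefficient use, can be made concrete with a small sketch. The record layout, field names, and threshold below are hypothetical; the excerpt does not say how the studies operationalized either measure.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    used_ai: bool
    perceived_saving_s: float  # self-reported seconds saved by using AI (hypothetical field)
    actual_saving_s: float     # measured seconds saved versus a no-AI baseline (hypothetical field)

def miscalibration(trial):
    """Positive values mean the participant overestimated the saving."""
    return trial.perceived_saving_s - trial.actual_saving_s

def inefficient_use(trial, threshold_s=0.0):
    """True when AI was used although the measured saving did not exceed the threshold."""
    return trial.used_ai and trial.actual_saving_s <= threshold_s

if __name__ == "__main__":
    # Made-up trials, for illustration only.
    trials = [Trial(True, 30.0, -5.0), Trial(True, 20.0, 18.0), Trial(False, 0.0, 0.0)]
    for t in trials:
        print(miscalibration(t), inefficient_use(t))
```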
The result is a shared vocabulary for practitioners building hybrid systems, an analytical lens for researchers studying combination patterns, and a starting point for evaluators interested in the full quality of human-AI decision-making rather than accuracy alone.
Authors' stated contributions/anticipated utility of their framework (conceptual claim about the expected usefulness of their mapping).
Closing the synergy gap requires explicit engagement with a wider design space.
Prescriptive conclusion from the authors advocating broader design engagement (conceptual recommendation based on their framework).
Meta-analyses show that AI assistance tends to improve human performance compared to working alone.
Reference to existing meta-analyses in the literature reported by the authors (meta-analytic evidence aggregated across studies; no specific meta-analysis names, sample sizes, or quantitative pooled effects provided in the excerpt).
AI is now embedded in healthcare, finance, policy, and many other domains.
Statement in the paper's introduction/abstract summarizing the current deployment of AI across domains (literature observation, no specific empirical study or sample size cited).
These results position PRISM-Coach as a practical blueprint for privacy-by-design adaptive learning systems in everyday wellness.
Authors' interpretation in paper based on the implemented system and evaluation results (telemetry + survey + matched comparison).
92% report increased privacy confidence after transparency disclosures.
In-app needs assessment survey reported in paper; percentage stated (92%). Sample size for survey not given in abstract.
Survey results show that 82% report positive perceived benefit.
In-app needs assessment survey reported in paper; percentage stated (82%). Sample size for survey not given in abstract.
In the matched comparison, the AI-enabled workflow yields higher average weight loss: 5.2 kg versus 3.1 kg.
Matched 19-week comparison window reported in paper; average weight loss numbers provided (5.2 kg vs 3.1 kg); sample size not stated in abstract.
In a matched 19-week comparison window, the AI-enabled workflow achieves adherence of 0.74 versus 0.48 under static grouping.
Matched 19-week comparison window reported in paper; comparison of AI-enabled workflow vs static grouping; sample size for comparison not stated in abstract.
At the population level, daily check-in adherence increases from 0.35 to 0.68.
Three years of telemetry from ~2,800 users reported in paper (population-level metrics).
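The three comparisons reported above can be put on a common footing by computing the absolute and relative difference for each. The sketch uses only the quoted figures; the labels and helper name are assumptions. The population-level adherence figure is a before/after change, whereas the other two are matched comparisons, so the rows are not directly comparable as effect estimates.

```python
# (label, baseline, AI-enabled or later value), as quoted in the claims above.
COMPARISONS = [
    ("population daily check-in adherence", 0.35, 0.68),
    ("matched-window adherence", 0.48, 0.74),
    ("matched-window average weight loss (kg)", 3.1, 5.2),
]

def differences(baseline, value):
    """Return (absolute difference, relative difference versus baseline)."""
    absolute = value - baseline
    return absolute, absolute / baseline

if __name__ == "__main__":
    for label, baseline, value in COMPARISONS:
        absolute, relative = differences(baseline, value)
        print(f"{label}: +{absolute:.2f} absolute, +{relative:.0%} relative")
```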
PRISM-Coach was instantiated in a commercially deployed lifestyle coaching platform and evaluated using three years of telemetry from approximately 2,800 users and an in-app needs assessment survey.
Reported deployment and evaluation details in paper; telemetry period = 3 years; approximate user count = 2,800; survey described.
A human-in-the-loop coaching assistant generates de-identified summaries and draft messages without sending raw PII or PHI to external AI services.
System design and implementation described; claimed as part of instantiated PRISM-Coach deployment.
The system uses a privacy-constrained contextual bandit to assign users to eligible peer groups under coach-capacity and stability constraints.
Algorithmic method described in paper; implemented in the deployed system.
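To illustrate how a bandit can respect eligibility, capacity, and stability constraints at assignment time, here is a minimal epsilon-greedy sketch. It is not the algorithm from the paper, whose details are not given in the excerpt: the class name, the `switch_margin` stability rule, and the use of a coarse de-identified context bucket as the only input are all assumptions.

```python
import random
from collections import defaultdict

class ConstrainedBandit:
    """Epsilon-greedy contextual bandit over peer groups (illustrative only)."""

    def __init__(self, capacity, epsilon=0.1, switch_margin=0.05, seed=0):
        self.capacity = dict(capacity)        # group -> maximum members (coach capacity)
        self.load = {g: 0 for g in capacity}  # group -> current members
        self.epsilon = epsilon
        self.switch_margin = switch_margin    # stability: minimum estimated gain to move a user
        self.rng = random.Random(seed)
        self.sums = defaultdict(float)        # (context, group) -> summed reward
        self.counts = defaultdict(int)        # (context, group) -> number of observations
        self.assignment = {}                  # user -> group

    def estimate(self, context, group):
        n = self.counts[(context, group)]
        return self.sums[(context, group)] / n if n else 0.0

    def assign(self, user, context, eligible):
        current = self.assignment.get(user)
        # Keep eligible groups with a free seat; the user's own seat counts as free.
        open_groups = [g for g in eligible
                       if g == current or self.load[g] < self.capacity[g]]
        if not open_groups:
            return current  # nothing feasible: leave the user where they are
        if self.rng.random() < self.epsilon:
            choice = self.rng.choice(open_groups)  # exploration ignores the stability rule
        else:
            choice = max(open_groups, key=lambda g: self.estimate(context, g))
            gain = self.estimate(context, choice) - self.estimate(context, current)
            if current in open_groups and gain < self.switch_margin:
                choice = current  # not enough estimated gain to justify a move
        if choice != current:
            if current is not None:
                self.load[current] -= 1
            self.load[choice] += 1
            self.assignment[user] = choice
        return choice

    def update(self, context, group, reward):
        self.sums[(context, group)] += reward
        self.counts[(context, group)] += 1
```

A caller would pass a context such as an activity bucket rather than any identifying field, call `assign` to place a user, and later call `update` with an observed reward such as an adherence score.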
The system uses vault-based controlled identity restoration.
Method/architecture description in paper; implemented as part of instantiated platform.
PRISM-Coach separates each user into four bounded views: Identity, Operational, Learning, and Coaching, each with distinct access controls and risk profiles.
System architecture described in paper; implemented design (instantiated) reported.
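One way to picture the four bounded views together with vault-based identity restoration is a role-to-view policy plus a vault that maps pseudonymous tokens back to identities only for authorized roles. This is a sketch under stated assumptions: the role names, the policy table, and the audit-log format are hypothetical, and the excerpt does not describe the actual access rules.

```python
import secrets
from enum import Enum

class View(Enum):
    IDENTITY = "identity"
    OPERATIONAL = "operational"
    LEARNING = "learning"
    COACHING = "coaching"

# Hypothetical role-to-view policy, for illustration only.
POLICY = {
    "privacy_officer": {View.IDENTITY},
    "ops_service": {View.OPERATIONAL},
    "learner_model": {View.LEARNING},
    "coach": {View.COACHING},
}

def can_read(role, view):
    return view in POLICY.get(role, set())

class IdentityVault:
    """Holds identity records behind random tokens and logs every restore attempt."""

    def __init__(self):
        self._by_token = {}
        self.audit_log = []

    def pseudonymize(self, identity):
        token = secrets.token_hex(8)
        self._by_token[token] = dict(identity)
        return token

    def restore(self, token, role, reason):
        allowed = can_read(role, View.IDENTITY)
        self.audit_log.append((token, role, reason, allowed))
        if not allowed:
            raise PermissionError(f"role {role!r} may not read the identity view")
        return dict(self._by_token[token])
```

Downstream components would then carry only the token, so that a learning or coaching component never holds the identity record itself.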
Agentic AI does not eliminate engineering discipline; it increases the value of requirements, constraints, traceability, independent verification, and human approval.
Conclusion drawn from synthesis of evidence across multiple domains and argumentation in the paper.
Agentic Agile-V and the task-level SCOPE-V loop (Specify, Constrain, Orchestrate, Prove, Evolve, Verify) convert conversational intent into structured engineering artifacts and acceptance evidence.
The paper proposes this process framework (the claim is the proposed function of the framework; no empirical evaluation given in the abstract).
Controlled studies report productivity gains in some enterprise tasks.
Controlled experimental studies referenced by the paper (specific trials/stats not provided in abstract).
These capabilities make software and hardware development faster in some settings.
Aggregated evidence cited in the paper including controlled studies and adoption studies (details not specified in abstract).
Agentic AI coding systems can inspect repositories, plan implementation steps, edit files, call tools, run tests, and submit pull requests.
Descriptive synthesis of existing agentic systems and demonstrations referenced in the paper (literature/examples); no single study or sample size given in the abstract.
ScienceClaw x Infinite provides the auditable artifact and provenance layer for this evaluation.
Paper statement that ScienceClaw x Infinite was used to supply auditable artifacts and provenance for the benchmark.
When one signal dominates, as in paradigm-shift detection, coordination mainly improves interpretation and traceability.
Reported result for the historical paradigm-shift detection task indicating limited predictive gains but improved interpretability and provenance when using coordinated agents.
Cross-channel composites improve over single-channel baselines: exoplanet vetting reaches AUROC 0.955.
Reported performance metric (AUROC=0.955) for the exoplanet vetting task comparing cross-channel composite to single-channel baselines.
When different disciplines each capture only part of the phenomenon, cross-channel composites improve over single-channel baselines: climate-vector emergence reaches AUROC 0.944.
Reported performance metric (AUROC=0.944) for the climate-vector emergence task comparing cross-channel composite to single-channel baselines.
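AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition and compares two single channels with a composite. The scores are made-up toy data, and averaging the channels is only one possible way to build a composite; the excerpt does not say how the paper combines channels.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_composite(*channels):
    """Average per-item scores across channels (assumes a shared scale)."""
    return [sum(values) / len(values) for values in zip(*channels)]

# Toy data: three positives followed by three negatives.
labels = [1, 1, 1, 0, 0, 0]
channel_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.4]
channel_b = [0.3, 0.9, 0.6, 0.5, 0.4, 0.1]
composite = mean_composite(channel_a, channel_b)

if __name__ == "__main__":
    for name, scores in [("a", channel_a), ("b", channel_b), ("composite", composite)]:
        print(name, round(auroc(scores, labels), 3))
```

In this toy example each channel misranks a different positive case, so the average separates the classes better than either channel alone. That is the pattern the claims describe for tasks where each channel captures only part of the phenomenon.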
Each case uses a frozen evaluation panel, predefined scoring protocols, explicit baselines, ablations or null controls, and stated limitations.
Methods claim describing evaluation protocol components reported in the paper.
We evaluate this question with a cross-domain benchmark spanning four scientific tasks: mapping molecular structure into musical representations, detecting historical paradigm shifts in science, identifying vector-borne disease emergence, and vetting transiting-exoplanet candidates.
Stated design of the study: description of benchmark tasks in the paper's methods/abstract.
Research on automation should be reoriented away from a primary focus on job loss toward understanding the organizational and technological transformations produced by digital work.
Normative and methodological recommendation derived from the paper's critical review of literature and the mappings of production/work networks; argued on conceptual and interpretive grounds rather than new empirical estimation.
The global HR technology market is expected to expand from USD 43.7 billion in 2025 to over USD 81 billion by 2032.
Forecast figure stated in paper (likely sourced from a market research / industry report, not specified in the excerpt).
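The forecast implies a compound annual growth rate that can be derived from the two endpoints. The sketch assumes seven compounding years (2025 to 2032) and treats USD 81 billion as the end value, so the result is a lower bound given that the forecast says "over" 81 billion; the function name is an assumption.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1

cagr = implied_cagr(43.7, 81.0, 2032 - 2025)
print(f"implied CAGR of at least {cagr:.1%}")
```

Under these assumptions the implied growth rate is a little above 9% per year.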
Artificial Intelligence (AI) is increasingly marketed as a neutral arbiter capable of eliminating unconscious bias from human resource processes.
Statement in paper (assertion about industry marketing and positioning); no empirical data or citation provided in the excerpt.
Scholarly and empirical research should prioritize multilevel analysis, algorithmic governance, and ethical considerations to study the AI-infused strategic landscape.
Paper's concluding research agenda based on gaps identified in the conceptual analysis; prescriptive recommendation rather than empirical finding.
The Claude family leads the benchmark and produces the most professional-looking outputs in our qualitative review.
Empirical result reported from the paper's benchmark and qualitative review of agent outputs (specific metrics, number of agents/tasks, and quantitative scores not provided in the excerpt).
We develop an evaluation taxonomy comprising three dimensions: Accuracy, Formula, and Format, each comprising fine-grained criteria that reflect professional standards.
Methodological contribution stated in paper; described taxonomy elements (Accuracy, Formula, Format) as part of the evaluation design.
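A three-dimension rubric of this kind can be represented as per-dimension pass/fail checks that roll up into pass rates. The sketch below is illustrative only: the three dimension names come from the claim above, while the individual criteria and the equal weighting are hypothetical, since the excerpt does not list the paper's fine-grained criteria.

```python
DIMENSIONS = ("Accuracy", "Formula", "Format")

def score(checks):
    """checks maps each dimension to {criterion: passed}; returns per-dimension pass rates."""
    missing = [d for d in DIMENSIONS if d not in checks]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return {d: sum(checks[d].values()) / len(checks[d]) for d in DIMENSIONS}

# Hypothetical criteria for one spreadsheet produced by an agent.
example = {
    "Accuracy": {"totals_match_source": True, "scenario_outputs_correct": False},
    "Formula": {"no_hardcoded_values": True, "references_resolve": True},
    "Format": {"consistent_number_format": True, "labelled_headers": True},
}

if __name__ == "__main__":
    print(score(example))
```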
We provide one of the first evaluations of agents on end-to-end spreadsheet tasks, focusing on economically critical financial workflows such as modeling and scenario analysis.
Claim of contribution in the paper; refers to the authors' own evaluation study (details like number of tasks/agents not provided in the excerpt).
Frontier AI labs have developed agents that can construct entire spreadsheets from scratch.
Asserted in paper as background/context; no specific models, numbers, or experimental details provided in the excerpt.
LLM agents are increasingly expected to carry out end-to-end workflows, producing complete artifacts from high-level user instructions.
Framing statement in paper; no empirical data or sample size reported to support the trend claim within the excerpt.
Adoption under higher communicative standards and institutional norms can mitigate suboptimal collective equilibria by imposing social commitments on individual users.
Theoretical argument and model-based analysis proposing communicative and institutional interventions as mitigating mechanisms (conceptual and formal reasoning).
Individually stable strategies can be scaled to collective equilibria using three extrapolation principles: (a) non-communicative aggregation, (b) local social signaling, and (c) institutional norm-setting.
Theoretical extrapolation/principled modeling presented in the paper (conceptual and formal extension from individual to collective level).
Canonical decision-theoretic strategies that account for adaptive user trajectories can be mapped so that agents transition between strategies based on interaction feedback to reach stable equilibria.
Analytical results from the decision-theoretic modeling in the paper showing adaptive trajectories and stable equilibria (theoretical model derivation).
The paper develops a decision- and game-theoretic approach to the human-AI delegation-verification dilemma.
Methodological contribution: construction of decision- and game-theoretic models described in the paper (modeling/theoretical development).
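The delegation-verification dilemma can be illustrated with a toy expected-cost comparison between doing a task oneself, delegating without checking, and delegating with verification. This is not the paper's model: the cost structure, the parameter names, and the assumption that verification always catches an error (after which the user redoes the task) are simplifications introduced here.

```python
def expected_costs(c_self, c_delegate, c_verify, p_error, loss):
    """Expected cost of each option under the toy assumptions stated above."""
    return {
        "do_it_yourself": c_self,
        # An unverified error goes unnoticed and incurs the full loss.
        "delegate_unverified": c_delegate + p_error * loss,
        # Verification is assumed perfect; a caught error means redoing the task.
        "delegate_and_verify": c_delegate + c_verify + p_error * c_self,
    }

def best_option(**params):
    costs = expected_costs(**params)
    return min(costs, key=costs.get)

if __name__ == "__main__":
    for p in (0.001, 0.1, 0.9):
        print(p, best_option(c_self=10, c_delegate=1, c_verify=2, p_error=p, loss=100))
```

With these made-up numbers the preferred option shifts from unverified delegation to verification and then to doing the task oneself as the error probability rises. This shows why the choice depends on perceived reliability and not only on effort saved.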
Emerging models of human-AI interaction predominantly advance the complementarity thesis, variously dubbed human-AI collaboration or human-AI hybrid intelligence.
Literature characterization / conceptual review reported in the paper (no empirical sample or quantitative analysis cited).