Evidence (11677 claims)
Claim counts by topic:

| Topic | Claims |
|---|---|
| Adoption | 7395 |
| Productivity | 6507 |
| Governance | 5921 |
| Human-AI Collaboration | 5192 |
| Org Design | 3497 |
| Innovation | 3492 |
| Labor Markets | 3231 |
| Skills & Training | 2608 |
| Inequality | 1842 |
Evidence Matrix
Claim counts by outcome category and direction of finding. For some categories the row total exceeds the sum of the four directions shown, implying additional direction labels not displayed here.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 609 | 159 | 77 | 738 | 1617 |
| Governance & Regulation | 671 | 334 | 160 | 99 | 1285 |
| Organizational Efficiency | 626 | 147 | 105 | 70 | 955 |
| Technology Adoption Rate | 502 | 176 | 98 | 78 | 861 |
| Research Productivity | 349 | 109 | 48 | 322 | 838 |
| Output Quality | 391 | 121 | 45 | 40 | 597 |
| Firm Productivity | 385 | 46 | 85 | 17 | 539 |
| Decision Quality | 277 | 145 | 63 | 34 | 526 |
| AI Safety & Ethics | 189 | 244 | 59 | 30 | 526 |
| Market Structure | 152 | 154 | 109 | 20 | 440 |
| Task Allocation | 158 | 50 | 56 | 26 | 295 |
| Innovation Output | 178 | 23 | 38 | 17 | 257 |
| Skill Acquisition | 137 | 52 | 50 | 13 | 252 |
| Fiscal & Macroeconomic | 120 | 64 | 38 | 23 | 252 |
| Employment Level | 93 | 46 | 96 | 12 | 249 |
| Firm Revenue | 130 | 43 | 26 | 3 | 202 |
| Consumer Welfare | 99 | 51 | 40 | 11 | 201 |
| Inequality Measures | 36 | 106 | 40 | 6 | 188 |
| Task Completion Time | 134 | 18 | 6 | 5 | 163 |
| Worker Satisfaction | 79 | 54 | 16 | 11 | 160 |
| Error Rate | 64 | 79 | 8 | 1 | 152 |
| Regulatory Compliance | 69 | 66 | 14 | 3 | 152 |
| Training Effectiveness | 82 | 16 | 13 | 18 | 131 |
| Wages & Compensation | 70 | 25 | 22 | 6 | 123 |
| Team Performance | 74 | 16 | 21 | 9 | 121 |
| Automation Exposure | 41 | 48 | 19 | 9 | 120 |
| Job Displacement | 11 | 71 | 16 | 1 | 99 |
| Developer Productivity | 71 | 14 | 9 | 3 | 98 |
| Hiring & Recruitment | 49 | 7 | 8 | 3 | 67 |
| Social Protection | 26 | 14 | 8 | 2 | 50 |
| Creative Output | 26 | 14 | 6 | 2 | 49 |
| Skill Obsolescence | 5 | 37 | 5 | 1 | 48 |
| Labor Share of Income | 12 | 13 | 12 | — | 37 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
AI tools assist across the full research lifecycle: idea generation, study design, literature review and synthesis, data management and analysis, writing/editing, publishing, communication, and compliance.
Key point asserted in the paper. Implied support comes from aggregated studies of tool functionality and from user reports (literature review, surveys, case studies). No specific sample or usage statistics are provided in the abstract.
AI is becoming an integrated research productivity layer in universities, speeding and improving the entire scholarly workflow (from idea generation through analysis to dissemination) by lowering cognitive and technical burdens, thereby raising research quality and institutional research performance.
Statement presented as the paper's main finding. Abstract summarizes "recent evidence" but does not specify original data or methods; likely based on literature synthesis (empirical studies, survey/interview work, case reports) rather than a single original dataset. No sample size, measurement definitions, or identification strategy provided in the abstract.
First‑mover adoption and superior governance can create persistent competitive advantages for firms deploying generative AI effectively.
Theoretical reasoning and case examples from industry reports included in the synthesis; absence of broad causal evidence noted.
Scale and data advantages associated with generative AI adoption may reinforce winner‑take‑all dynamics, favoring large firms that can exploit data and integration economies.
Conceptual argument and industry observations synthesized in the review; no comprehensive market concentration empirical analysis presented.
Realizing sustainable economic value from generative AI requires robust governance, AI literacy, and human‑centric augmentation strategies (AI as assistant, not replacement).
Normative conclusion based on conceptual synthesis of empirical patterns and theoretical arguments in the review.
Generative AI has potential to improve the quality of information processing and the speed of decision‑making.
Conceptual arguments plus early case examples and small empirical studies reported in the literature synthesis; no broad causal estimates provided.
Short‑term deployments of generative AI produce efficiency gains such as time savings and faster turnaround.
Early empirical studies and industry reports summarized in the review; reported case examples of tool deployments (no unified sample size reported).
Generative AI produces measurable gains in operational efficiency and strategic insight.
Synthesized findings and illustrative case examples from early empirical studies and industry reports; authors note lack of large-scale causal evidence.
Generative AI enables scalable personalized communication with customers, employees, and partners.
Aggregation of industry use cases and early empirical reports discussed in the conceptual synthesis (no large-scale causal studies reported).
Generative AI enhances decision support by synthesizing information, surfacing options, and generating explanations for decision‑makers.
Critical literature synthesis and early case examples from industry reports and small studies cited in the review; theoretical evaluation of decision workflows.
Generative AI automates routine administrative workflows and parts of analytical pipelines.
Narrative review / conceptual synthesis aggregating early empirical studies, industry reports, and case examples; no original primary dataset reported.
Short-run: measurable productivity gains for many coding tasks imply higher effective output per developer.
Controlled experiments and benchmark tasks that report time savings and/or increased task throughput with LLM assistance; studies often in lab/microtask settings with varying sample sizes.
Organizations will need to build processes and tools (automated testing, static analysis, code review augmented for AI outputs) to realize net benefits safely.
Qualitative case studies and practitioner reports documenting emerging organizational practices and recommendations; derived from observed failure modes and security/IP risks.
The highest value arises when human developers verify, adapt, and integrate AI suggestions—human–AI complementarity.
User studies and controlled experiments showing improved outcomes when humans validate and edit AI outputs; qualitative interviews and case studies reporting effective human-in-the-loop workflows.
These tools lower initial barriers for novices by giving example code, explanations, and templates, potentially accelerating onboarding.
User studies, observational analyses, and qualitative interviews reporting that novices use LLM outputs as examples and templates; evidence primarily short-term and context-dependent.
LLMs are most effective when used interactively as assistants rather than as autonomous code authors.
User studies, observational analyses, and controlled comparisons showing better outcomes for interactive, iterative prompting and verification versus one-shot autonomous code generation; heterogeneous study designs (mostly short-term lab or microtask settings).
LLMs can speed up many programming tasks (boilerplate, code completion, documentation, simple debugging) and change how developers iterate.
Synthesis of controlled experiments and benchmark tasks comparing developer speed/accuracy with and without LLM assistance, supplemented by user studies and observational analyses; sample sizes and tasks vary across studies (typically lab/microtask settings, often tens to low hundreds of participants).
Token taxes incentivize more efficient model designs (fewer tokens per task) and may shift competition toward lightweight models or on-device solutions.
Mechanism-based economic reasoning about price incentives included in the paper; no empirical or simulation evidence provided.
Agent-based models (ABMs) are needed to simulate micro-to-macro dynamics of token taxes because standard representative-agent or DSGE models may miss heterogeneity, network effects, and path dependence.
Methodological argument in the paper advocating ABMs; no ABM results included (proposal only).
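To make the ABM argument concrete, the sketch below simulates heterogeneous firms choosing token consumption under a per-token tax. The isoelastic demand form, parameter ranges, and firm count are illustrative assumptions, not results from the paper, which presents no ABM of its own.

```python
# Minimal ABM sketch (illustrative only): heterogeneous firms choose token
# consumption under a per-token tax; all functional forms are assumptions.
import random

random.seed(0)

class Firm:
    def __init__(self):
        self.value_per_token = random.uniform(0.5, 2.0)   # heterogeneous productivity
        self.elasticity = random.uniform(0.5, 1.5)        # heterogeneous price sensitivity
        self.baseline_demand = random.uniform(1e5, 1e6)   # tokens per period at unit price

    def demand(self, price):
        # Isoelastic demand: consumption falls as the taxed price rises.
        return self.baseline_demand * (self.value_per_token / price) ** self.elasticity

firms = [Firm() for _ in range(1_000)]
for tax in (0.0, 0.1, 0.5, 1.0):
    price = 1.0 + tax                                     # base price plus per-token tax
    tokens = sum(f.demand(price) for f in firms)
    print(f"tax={tax:.1f}  tokens={tokens:,.0f}  revenue={tax * tokens:,.0f}")
```

Even this toy version shows why representative-agent shortcuts can mislead: aggregate consumption and revenue depend on the joint distribution of elasticities and baseline demand, not just their means.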
Black-box token verification (tamper-evident consumption tokens or receipts tied to API calls) can prove taxable consumption without full model inspection.
Technical proposal for cryptographic/ledgered receipts described in the paper; no prototype, security analysis, or empirical tests provided.
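The paper describes the receipt idea only at the level of a proposal, so the following is one possible construction (a hash-chained, HMAC-signed receipt log) sketched purely to illustrate how tamper-evident token consumption could be proven without model inspection. The schema, key handling, and genesis convention are assumptions.

```python
# Illustrative sketch of tamper-evident consumption receipts; not the paper's
# scheme. Each receipt commits to the previous one (hash chain) and carries an
# HMAC, so deleting, editing, or reordering receipts breaks verification.
import hashlib, hmac, json

PROVIDER_KEY = b"provider-secret"  # assumed shared with the tax auditor

def issue_receipt(prev_digest, call_id, tokens):
    body = {"prev": prev_digest, "call": call_id, "tokens": tokens}
    payload = json.dumps(body, sort_keys=True).encode()
    body["mac"] = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    return body, hashlib.sha256(payload).hexdigest()

def verify_chain(receipts):
    """Return total taxable tokens if the chain verifies; raise otherwise."""
    prev, total = "genesis", 0
    for r in receipts:
        payload = json.dumps({k: r[k] for k in ("prev", "call", "tokens")},
                             sort_keys=True).encode()
        mac = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
        if r["prev"] != prev or not hmac.compare_digest(mac, r["mac"]):
            raise ValueError("tampered, missing, or reordered receipt")
        prev = hashlib.sha256(payload).hexdigest()
        total += r["tokens"]
    return total

chain, prev = [], "genesis"
for i, toks in enumerate((120, 340, 95)):
    receipt, prev = issue_receipt(prev, f"call-{i}", toks)
    chain.append(receipt)
print(verify_chain(chain))  # 555
```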
A staged audit pipeline—black-box token verification, norm-based tax rates, then white-box audits—provides a feasible path to design and evaluate token taxes.
Proposed enforcement architecture described in the paper (conceptual design); no deployment or simulation results presented.
Token taxes can be enforced using existing compute-governance and commercial billing infrastructure (API billing, cloud metering, hardware telemetry, attestation).
Technical architecture discussion proposing use of existing billing and telemetry systems; no implementation or pilot data provided.
Compared with robot- or FLOP-based taxes, token taxes better capture where AI-generated value is realized.
Analytic comparison in the paper arguing tokens map to user-facing consumption while FLOP/robot taxes map to inputs; conceptual reasoning rather than empirical test.
The framework enables scenario testing for policies and shocks (e.g., lockdowns, targeted interventions, information campaigns) where human judgment and adaptation matter.
Paper reports experiments across policy regimes and discusses use cases for testing timing, targeting, and communication strategies; however, concrete policy evaluation examples and quantitative policy results are not detailed in the summary.
Experiments run with multiple LLM backends (proprietary and open-source) show qualitatively consistent dynamics, indicating that the framework is robust to the choice of backend model.
Cross-backend comparisons and robustness checks reported in the paper; several LLMs used though the exact models and counts are not specified in the summary.
Behavioral changes in the simulation emerge endogenously from cognitive reasoning rather than from parameterized switches, producing context-sensitive, heterogeneous responses.
Description of agent heterogeneity (differences in perceptions, priorities, and local conditions) and use of chain-of-thought (CoT) reasoning per agent; emergent, diverse responses reported in experiments. (Degree of heterogeneity and quantitative heterogeneity metrics not provided in summary.)
LLM-driven agents embedded in a Perception–Deliberation–Action (PDA) loop produce endogenous, human-like adaptive behaviors via Chain-of-Thought reasoning.
Multi-agent simulation where each agent is implemented as an LLM-driven cognitive unit running the PDA loop each timestep; agents use Chain-of-Thought (CoT) prompts/internal reasoning to make decisions. (Exact simulation sample size / population not specified in summary.)
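As a concrete illustration of the loop's shape (not the authors' implementation), the sketch below runs one Perception–Deliberation–Action timestep for a single agent; `call_llm` is a stub for whichever backend is used, and the persona, prompt wording, and two-action space are assumptions.

```python
# One Perception-Deliberation-Action timestep for a single agent (sketch).
# `call_llm` is a stub for the LLM backend; persona, prompt wording, and the
# action set are illustrative assumptions, not the paper's design.
def call_llm(prompt: str) -> str:
    # Replace with a real LLM API call (proprietary or open-source backend).
    return "Step 1: local infections are high. Step 2: risk outweighs need. Action: stay_home"

class Agent:
    def __init__(self, persona: str):
        self.persona = persona
        self.memory = []  # past actions feed back into later deliberation

    def perceive(self, world: dict) -> str:
        return f"Local infections: {world['infections']}; policy: {world['policy']}"

    def deliberate(self, observation: str) -> str:
        # Chain-of-Thought style prompt: reason step by step, then pick an action.
        prompt = (f"You are {self.persona}. Recent actions: {self.memory[-3:]}.\n"
                  f"Observation: {observation}\n"
                  "Think step by step, then choose one action: go_out | stay_home.")
        return call_llm(prompt)

    def act(self, reasoning: str) -> str:
        action = "stay_home" if "stay_home" in reasoning else "go_out"
        self.memory.append(action)
        return action

agent = Agent("a risk-averse shop owner")
obs = agent.perceive({"infections": 42, "policy": "voluntary distancing"})
print(agent.act(agent.deliberate(obs)))  # -> stay_home
```

Behavior change here comes out of the (stubbed) reasoning step rather than a parameterized switch, which is the design point the claim makes.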
Task‑based, dynamic exposure measures and real‑time data enable earlier detection of displacement risks and reallocation needs than static, occupation‑level extrapolations.
Conceptual argument and proposed architecture; no empirical timing comparison or lead-time statistics provided.
LLMs can be used to score task automation/augmentation plausibility and to detect emergent tasks.
Methodological proposal describing use of LLMs for semantic mapping/scoring of tasks; no empirical validation or accuracy metrics for LLM task scoring provided in the paper.
Modeling nonlinearity (threshold adoption, network spillovers, complementarities) and path dependence in adoption dynamics is necessary rather than relying on linear extrapolation.
Theoretical argument and model suggestions (S‑curve diffusion, agent-based models) in the paper; no empirical comparison demonstrating superior performance provided.
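A minimal sketch of why linearity fails, under assumed graph and threshold distributions: in the simulation below, a node adopts only once enough of its neighbours have, producing S-shaped, path-dependent diffusion that a linear extrapolation from early periods would not capture.

```python
# Threshold-adoption sketch on a random network (all parameters assumed).
# Nodes adopt once the adopting share of their neighbours crosses a
# node-specific threshold, giving nonlinear, path-dependent diffusion.
import random

random.seed(1)
N = 500
neighbors = [random.sample(range(N), 8) for _ in range(N)]
threshold = [random.uniform(0.1, 0.5) for _ in range(N)]
adopted = [i < 10 for i in range(N)]  # small seed of early adopters

for t in range(31):
    if t % 5 == 0:
        print(f"t={t:2d}  adopters={sum(adopted)}")
    new = adopted[:]
    for i in range(N):
        if not adopted[i]:
            share = sum(adopted[j] for j in neighbors[i]) / 8
            new[i] = share >= threshold[i]
    adopted = new
```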
Applying causal inference methods (difference‑in‑differences, synthetic controls, instrumental variables, structural counterfactuals) can distinguish automation (task substitution) from augmentation (productivity/role change) and estimate net employment effects.
Methodological recommendation with examples of applicable identification strategies; no specific empirical applications or results reported in the paper.
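For readers unfamiliar with the identification logic, here is a toy difference-in-differences calculation on simulated data; the effect size, trend, and group labels are fabricated solely to show how the double difference nets out group-level and time-level confounds.

```python
# Toy difference-in-differences on simulated data; effect size, trend, and
# group labels are fabricated to show how the double difference nets out
# group and time confounds.
import random

random.seed(2)

def outcome(treated: bool, post: bool) -> float:
    effect = 0.08 if (treated and post) else 0.0   # true adoption effect (assumed)
    trend  = 0.03 if post else 0.0                 # common time trend
    base   = 0.50 + (0.05 if treated else 0.0)     # fixed level gap between groups
    return base + trend + effect + random.gauss(0, 0.02)

def cell_mean(treated: bool, post: bool, n: int = 2000) -> float:
    return sum(outcome(treated, post) for _ in range(n)) / n

did = ((cell_mean(True, True) - cell_mean(True, False))
       - (cell_mean(False, True) - cell_mean(False, False)))
print(f"DiD estimate: {did:.3f}")  # recovers ~0.08; trend and level gap cancel
```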
Integrating multiple data streams (CPS, LEHD/LODES, UI wage records, administrative microdata, job ads, occupational manuals, enterprise adoption surveys) yields richer gross‑flows and skills measurement than using single data sources.
Proposed data-integration strategy and references to candidate datasets; no empirical demonstration or quantified improvement in measurement presented.
A dynamic Occupational AI Exposure Score (OAIES) can quantify exposure at the task level using LLMs, job‑task matrices (e.g., O*NET), and real‑time job ad / workplace data to capture evolving capability of AI systems.
Methodological description of OAIES construction (mapping tasks to occupations, LLM scoring, weighting by time use/criticality); no empirical implementation or validation data presented in the paper.
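Since the paper reports no implementation, the sketch below shows only the final aggregation step of an OAIES-style score: LLM-assigned task exposure scores weighted by time use. The task names, scores, and weights are hypothetical.

```python
# Final aggregation step of an OAIES-style score (sketch). Task names,
# LLM-assigned exposure scores, and time-use weights are hypothetical.
def oaies(task_scores: dict, time_weights: dict) -> float:
    """Time-weighted mean of task exposure scores (0 = none, 1 = fully exposed)."""
    total = sum(time_weights.values())
    return sum(task_scores[t] * w for t, w in time_weights.items()) / total

scores  = {"draft routine reports": 0.9, "negotiate with clients": 0.2, "enter data": 0.8}
weights = {"draft routine reports": 0.3, "negotiate with clients": 0.5, "enter data": 0.2}
print(f"OAIES = {oaies(scores, weights):.2f}")  # 0.9*0.3 + 0.2*0.5 + 0.8*0.2 = 0.53
```

Re-running the LLM scoring as model capabilities change, then re-aggregating, is what makes the index dynamic rather than a static snapshot.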
Measurement and forecasting should move away from occupation-level forecasts toward task-level, continuously updated indicators linked to real-world adoption measures (firm purchases, API usage, procurement).
Recommendation in the paper motivated by rapid changes in AI capabilities and limitations of static indices; evidence basis is methodological argument and examples of richer adoption measures rather than a quantified evaluation of forecast improvements.
Policy should prioritise flexible reskilling and retraining programs targeted at high-risk tasks and low-skilled workers, informed by task-level exposure maps.
Policy implication recommended by the paper drawing on distributional findings (higher displacement risk for low-skilled tasks) and the availability of task-level exposure indices; evidence basis combines empirical pattern synthesis and normative recommendation rather than an RCT or program evaluation.
Think tanks and international organisations are emphasising scenario planning under differing initial adoption conditions to inform reskilling and labour-market policy.
References to scenario and policy work by organisations named in the paper, including the TBI, IPPR, and IMF (TBI 2024; IPPR 2024; Korinek 2023); evidence basis is published scenario reports and policy papers rather than experimental data.
Practical measures (task selection, oversight, verification, governance) enable responsible deployment of GenAI that balances firm-level goals with individual consultants' skill development.
Recommendations synthesized from interviews with practitioners and the TGAIF framework; presented as practice guidance rather than experimentally tested interventions.
The Task–GenAI Fit (TGAIF) framework maps task characteristics to GenAI capabilities to guide decisions about when and how to use GenAI effectively in consulting processes.
Framework inductively derived from interview data in the study; authors present mapping logic based on task features and reported GenAI capabilities. Evidence is conceptual and qualitative rather than empirically validated.
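The published framework is qualitative, so the rule set below is not the authors' mapping; it is a hypothetical sketch of how a TGAIF-style task-to-capability fit judgment could be encoded, with invented feature names and thresholds.

```python
# Hypothetical encoding of a TGAIF-style fit judgment; the feature names and
# rules are invented for illustration and are not the authors' mapping.
def tgaif_fit(task: dict) -> str:
    if task["stakes"] == "high" and not task["verifiable"]:
        return "avoid: errors are costly and outputs cannot be checked"
    if task["structured"] and task["verifiable"]:
        return "delegate with review: GenAI drafts, consultant verifies"
    return "augment only: GenAI for ideation, human produces the deliverable"

print(tgaif_fit({"stakes": "high", "verifiable": False, "structured": False}))
print(tgaif_fit({"stakes": "low", "verifiable": True, "structured": True}))
```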
Generative AI offers efficiency and scaling opportunities in consulting.
Reported repeatedly in practitioner interviews summarized by the authors; qualitative impressions rather than measured productivity gains. No quantitative sample-size or effect-size reported.
A closed interaction loop—MLLM ingesting multimodal inputs (visual, machine feedback, user actions) and outputting structured commands and AR overlays—reduces user cognitive load during machine operation.
System architecture described in the paper plus empirical finding of reduced subjective workload in the CMM case study; supports the claim that the interaction loop contributes to cognitive-load reduction. (Causal attribution to loop structure is inferred rather than directly isolated experimentally.)
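To show the shape of that loop (the paper's actual message formats are not given in the summary), the sketch below stubs one cycle: multimodal context in, a structured command plus overlay text out. Both `query_mllm` and the JSON schema are assumptions.

```python
# One cycle of the closed interaction loop (sketch): multimodal context in,
# structured command plus AR overlay text out. `query_mllm` and the JSON
# schema are assumptions; the paper's interfaces are not specified here.
import json

def query_mllm(frame_desc: str, machine_state: dict, user_action: str) -> str:
    # Stub: a real system would send the camera frame, machine feedback, and
    # the user-action log to a multimodal LLM and get structured output back.
    return json.dumps({
        "recognized_step": "align probe to datum A",
        "command": {"type": "highlight", "target": "datum_A"},
        "overlay_text": "Move the probe to the highlighted datum, then press Confirm.",
    })

def loop_step(frame_desc: str, machine_state: dict, user_action: str):
    out = json.loads(query_mllm(frame_desc, machine_state, user_action))
    # The structured command drives the AR renderer; the text becomes the hint.
    overlay = f"[AR] {out['overlay_text']} (highlight: {out['command']['target']})"
    return out["recognized_step"], overlay

step, overlay = loop_step("camera frame #1024", {"probe": "idle"}, "opened part program")
print(step)
print(overlay)
```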
An iterative, scenario-refined prompt engineering structure enables the LLM (ChatGPT in this study) to generate task-specific, contextualized guidance that aligns with real-time user actions and machine state.
System design and methods: authors describe developing and refining a prompt structure across multiple machine-operation scenarios and using ChatGPT as the generative engine to produce stepwise instructions and contextual overlay content. Evidence is methodological and qualitative within the paper's development process.
Participants reported lower perceived workload and improved usability when using the AR-MLLM system.
Subjective workload/usability questionnaires were administered in the CMM case study; authors report reduced reported workload under AR-MLLM guidance. (Questionnaire instrument, scales, and sample size not specified in the summary.)
Participants completed assigned CMM tasks faster when using the AR-MLLM system compared to baseline/traditional training.
Task execution time was recorded in the CMM case study; authors report statistically meaningful reductions in completion time with AR-MLLM guidance versus baseline. (Summary does not give numerical effect sizes or sample size.)
The AR-MLLM system achieved high measurement/feature-activity accuracy (participants performed correct measurements under AR-MLLM guidance).
Measurement/feature activity correctness was measured in the CMM case study; authors report high measurement accuracy under the AR-MLLM condition. (Exact rates and sample size not provided in the summary.)
The AR-MLLM system achieved high task-recognition accuracy (the system correctly identified the current task/step).
Measured task recognition accuracy in the CMM case study; authors report 'high' recognition accuracy for the system. (Exact numeric accuracy and sample size not specified in the summary.)
An AR + multimodal LLM (AR-MLLM) training system can substantially improve training and execution in complex machine operations (demonstrated on a Coordinate Measuring Machine).
Case-study experiment in the paper where human participants performed CMM measurement tasks both with and without the AR-MLLM system; metrics collected included task recognition accuracy, measurement activity correctness, task completion time, and subjective workload/usability. (Participant sample size not specified in the provided summary.)
AI methods such as transfer learning, active learning, and Bayesian approaches improve data efficiency and uncertainty quantification in drug discovery and preclinical modeling.
Methodological literature and exemplar studies summarized in the review describing these approaches; heterogeneous examples, no quantitative synthesis.
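As one concrete instance of the data-efficiency methods named above, here is a toy uncertainty-sampling active-learning loop; the one-dimensional "compounds", nearest-neighbour scorer, and oracle stand in for real assay data and a real predictive model.

```python
# Toy active-learning loop (uncertainty sampling); all data and the scorer are
# stand-ins for real assay measurements and a real predictive model.
import random

random.seed(3)
pool = [random.uniform(0, 1) for _ in range(200)]   # unlabeled "compounds" (1 feature)
oracle = lambda x: x > 0.6                          # running the assay (toy ground truth)
labeled = [(x, oracle(x)) for x in random.sample(pool, 5)]

def predict_proba(x):
    # Toy model: activity probability from distances to the nearest labeled
    # positive and negative examples.
    d_pos = min((abs(xl - x) for xl, y in labeled if y), default=1.0)
    d_neg = min((abs(xl - x) for xl, y in labeled if not y), default=1.0)
    return d_neg / (d_pos + d_neg + 1e-9)

for _ in range(10):
    # Query the pool point the model is least certain about (prob nearest 0.5),
    # so labels concentrate near the decision boundary and data efficiency rises.
    x_star = min(pool, key=lambda x: abs(predict_proba(x) - 0.5))
    pool.remove(x_star)
    labeled.append((x_star, oracle(x_star)))

print(sorted(round(x, 2) for x, _ in labeled[5:]))  # queries cluster near the 0.6 boundary
```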
Clear regulatory alignment (e.g., preparation of credibility plans and qualified digital endpoints) reduces regulatory uncertainty, de-risks investment, and raises adoption rates of AI tools.
Policy and regulatory framework analysis in the review; references to regulatory guidance and qualification processes (narrative, forward-looking).
Economic value from AI adoption concentrates with data-rich firms and platforms that own large, high-quality datasets and validation pipelines.
Economic analysis and theoretical arguments in the paper (narrative), supported by observed market patterns cited in the literature; no formal empirical valuation provided.
Adopting equity-by-design (including diverse, non‑European datasets and subgroup evaluation) reduces model bias and improves global generalizability of AI models.
Recommendations and examples in the review; draws on literature documenting subgroup performance differences and bias remediation strategies (narrative evidence).
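A minimal sketch of the subgroup-evaluation step behind that recommendation follows; the group labels, outcomes, and predictions are placeholders invented to show the disaggregation, not data from any study.

```python
# Placeholder data purely to show the disaggregation step of subgroup
# evaluation; no real study data are used here.
from collections import defaultdict

records = [  # (subgroup, y_true, y_pred)
    ("group_A", 1, 1), ("group_A", 0, 0), ("group_A", 1, 1), ("group_A", 0, 1),
    ("group_B", 1, 0), ("group_B", 0, 0), ("group_B", 1, 0), ("group_B", 1, 1),
]

by_group = defaultdict(list)
for g, y, p in records:
    by_group[g].append((y, p))

for g, pairs in sorted(by_group.items()):
    acc = sum(y == p for y, p in pairs) / len(pairs)
    tpr = (sum(y == 1 and p == 1 for y, p in pairs)
           / max(1, sum(y == 1 for y, _ in pairs)))  # per-subgroup sensitivity
    print(f"{g}: accuracy={acc:.2f} sensitivity={tpr:.2f}")
```

Aggregate accuracy alone would hide the gap the per-group sensitivity makes visible, which is the point of the equity-by-design recommendation.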