Evidence (8625 claims)
| Topic | Claims |
|---|---|
| Adoption | 8625 |
| Productivity | 7686 |
| Governance | 6917 |
| Human-AI Collaboration | 6574 |
| Org Design | 4189 |
| Innovation | 4131 |
| Labor Markets | 3588 |
| Skills & Training | 2985 |
| Inequality | 2066 |
Evidence Matrix
Claim counts by outcome category and direction of finding.
| Outcome | Positive | Negative | Mixed | Null | Total |
|---|---|---|---|---|---|
| Other | 761 | 200 | 101 | 904 | 2020 |
| Governance & Regulation | 829 | 400 | 191 | 122 | 1566 |
| Organizational Efficiency | 784 | 193 | 125 | 84 | 1197 |
| Technology Adoption Rate | 637 | 236 | 124 | 97 | 1103 |
| Research Productivity | 431 | 131 | 58 | 340 | 972 |
| Output Quality | 481 | 183 | 59 | 47 | 770 |
| Decision Quality | 332 | 177 | 82 | 49 | 647 |
| Firm Productivity | 439 | 57 | 88 | 20 | 610 |
| AI Safety & Ethics | 218 | 279 | 66 | 33 | 602 |
| Market Structure | 181 | 170 | 123 | 24 | 503 |
| Task Allocation | 214 | 64 | 72 | 33 | 388 |
| Skill Acquisition | 174 | 62 | 62 | 17 | 315 |
| Innovation Output | 204 | 27 | 45 | 18 | 295 |
| Employment Level | 105 | 54 | 108 | 13 | 282 |
| Fiscal & Macroeconomic | 132 | 69 | 43 | 26 | 277 |
| Consumer Welfare | 117 | 63 | 42 | 11 | 233 |
| Firm Revenue | 154 | 48 | 26 | 3 | 231 |
| Task Completion Time | 173 | 31 | 8 | 12 | 225 |
| Inequality Measures | 44 | 123 | 50 | 6 | 223 |
| Worker Satisfaction | 89 | 65 | 22 | 12 | 188 |
| Error Rate | 71 | 92 | 10 | 2 | 175 |
| Regulatory Compliance | 77 | 69 | 14 | 5 | 165 |
| Automation Exposure | 58 | 56 | 26 | 13 | 156 |
| Training Effectiveness | 96 | 21 | 14 | 19 | 152 |
| Wages & Compensation | 77 | 37 | 25 | 6 | 145 |
| Team Performance | 86 | 17 | 27 | 10 | 141 |
| Developer Productivity | 95 | 17 | 14 | 6 | 133 |
| Job Displacement | 12 | 81 | 21 | 1 | 115 |
| Hiring & Recruitment | 52 | 7 | 8 | 3 | 70 |
| Creative Output | 32 | 20 | 8 | 3 | 64 |
| Skill Obsolescence | 5 | 47 | 6 | 1 | 59 |
| Social Protection | 28 | 16 | 8 | 2 | 54 |
| Labor Share of Income | 17 | 19 | 17 | — | 53 |
| Worker Turnover | 11 | 12 | — | 3 | 26 |
| Industry | — | — | — | 1 | 1 |
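As a reading aid, each row of the matrix can be turned into direction shares. The sketch below is a minimal Python illustration using three rows copied from the table; the function name `direction_shares` is hypothetical. It assumes shares are taken over the sum of the four displayed direction columns rather than the Total column, because the listed Total can exceed that sum (for example, Other: 761 + 200 + 101 + 904 = 1966 against a listed total of 2020).

```python
# Three rows copied from the evidence matrix: (positive, negative, mixed, null).
ROWS = {
    "Firm Productivity": (439, 57, 88, 20),
    "Inequality Measures": (44, 123, 50, 6),
    "Job Displacement": (12, 81, 21, 1),
}
DIRECTIONS = ("positive", "negative", "mixed", "null")


def direction_shares(counts):
    """Return each direction's share of the four displayed counts (hypothetical helper)."""
    total = sum(counts)
    return {name: count / total for name, count in zip(DIRECTIONS, counts)}


for outcome, counts in ROWS.items():
    shares = direction_shares(counts)
    print(outcome, {name: round(share, 3) for name, share in shares.items()})
```

Shares make rows of very different size comparable, for instance Job Displacement (115 claims) against Firm Productivity (610 claims).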
Adoption
We conduct a large-scale empirical evaluation across four risk datasets and a realistic fraud-detection environment involving professional analysts and 3,735 case reviews.
Experimental methods reported in the paper: evaluation across four risk datasets and a fraud-detection environment with professional analysts; stated sample of 3,735 case reviews.
A central issue is how humans interpret the algorithm's choice of features, which affects the design and evaluation of highlighting policies.
Framing and motivation in the paper: conceptual claim motivating the formal models and analysis (theoretical/argumentative).
We illustrate our framework in a calibrated empirical exercise based on the American Housing Survey.
Calibrated empirical exercise reported in the paper, using data from the American Housing Survey; it illustrates the framework empirically (data-based demonstration).
Humans may interpret the algorithm's choice of features in different ways: a sophisticated agent correctly conditions on the selection rule, while a naive agent updates only on revealed feature values and treats the selection event as exogenous.
Conceptual/behavioral modeling in the paper that defines two agent-types (sophisticated vs naive) and analyzes their distinct inference processes (theoretical/modeling).
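The difference between the two agent types can be made concrete with a toy example. The Python sketch below is not the paper's model; it assumes two features drawn independently and uniformly from {0, 1, 2}, a state equal to their sum, and a hypothetical selection rule that highlights the larger feature (ties go to the first). The sophisticated agent keeps only the feature pairs that would have produced the observed highlight, while the naive agent fills in the hidden feature with its prior mean.

```python
from fractions import Fraction
from itertools import product

VALUES = (0, 1, 2)  # assumed support of each feature (illustrative)


def reveal(x):
    """Hypothetical selection rule: highlight the larger feature, ties to the first."""
    i = 0 if x[0] >= x[1] else 1
    return i, x[i]


def sophisticated_estimate(signal):
    """Expected sum, conditioning on the selection rule that produced the signal."""
    consistent = [x for x in product(VALUES, repeat=2) if reveal(x) == signal]
    return Fraction(sum(x[0] + x[1] for x in consistent), len(consistent))


def naive_estimate(signal):
    """Expected sum, treating the hidden feature as an unconditioned prior draw."""
    _, shown = signal
    prior_mean = Fraction(sum(VALUES), len(VALUES))
    return shown + prior_mean


# Compare the two estimates for every signal the rule can generate.
for signal in sorted({reveal(x) for x in product(VALUES, repeat=2)}):
    print(signal, sophisticated_estimate(signal), naive_estimate(signal))
```

In this toy setting the naive estimate is never below the sophisticated one, because the naive agent ignores that the hidden feature was not chosen for highlighting and is therefore no larger than the revealed one.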
Highlighting can be modeled as a constrained information policy that selects a small number of features to reveal.
Modeling framework developed in the paper: formal definition of highlighting as an information policy with a feature-selection constraint (theoretical/modeling).
The study uses a mixed-methods design combining a quantitative survey of 312 senior managers/strategy professionals and 28 semi-structured interviews across four sectors in Zimbabwe.
Methods reported in the paper: quantitative survey n = 312; qualitative 28 interviews across manufacturing, financial services, telecommunications, and retail.
The paper combines information processing theory, the resource-based view, and the dynamic capabilities perspective into an integrated framework linking digital technology adoption, visibility, and resilience.
Theoretical framing described in the paper (explicit mention of the three theories and their integration).
The study employs hierarchical regression, structural equation modeling (SEM), and rigorous endogeneity controls including instrumental variables and propensity score matching.
Methods section summary reported in the paper; explicit listing of regression, SEM, IV, and propensity score matching.
The study draws on survey data from 742 manufacturing and logistics firms across 23 countries.
Reported sample description in the paper: survey of 742 firms across 23 countries (manufacturing and logistics).
This review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Methodological statement in the paper's abstract indicating PRISMA adherence; no further protocol details or study counts provided in the abstract.
The staggered expansion of Turkey's national natural gas pipeline network provides plausibly exogenous variation in connectivity because pipeline routing is determined by energy distribution priorities rather than digital demand.
Identification strategy described by the authors: using pipeline expansion as an instrument/conduit for fiber-optic deployment; argument rests on institutional routing rules and timing.
We evaluate structural validity, semantic alignment, reproducibility, and refinement effort to characterize authoring scalability.
Reported evaluation dimensions in the paper; implies empirical assessments were performed along these axes (details not provided in the abstract).
The paper foregrounds industrial firms' own digital agency as a less understood aspect in the literature on digitalization and governance.
Authors' positioning of their contribution and literature review claim in the paper (qualitative/theoretical claim).
The analysis is limited to OECD economies and monthly aggregate data, which constrains generalizability.
Study design: monthly panel of 38 OECD economies, 2000–2024, as stated in the paper; author-reported limitation.
Digital trade alone has no statistically significant effect on CO2 emissions (β = −0.030).
Fixed-effects econometric specification on the monthly panel of 38 OECD economies (2000–2024); the coefficient is reported but is not statistically significant.
The governance of open-weight artificial intelligence (AI) models has been framed as a binary choice: openness as risk, restriction as safety.
Literature and policy framing review presented in the paper (conceptual/argumentative analysis).
This is an exploratory and qualitative state-of-practice study grounded in over 30 interviews across four stakeholder groups (large enterprises, small/medium firms, AI developers, and CAD/CAM/CAE vendors).
Methodological statement in the paper describing study design and sample composition.
Key breakthroughs needed include integration with traditional engineering tools and data types, robust verification frameworks, and improved spatial and physical reasoning.
Interviewee-identified requirements compiled from over 30 interviews; stakeholders repeatedly pinpoint integration, verification, and spatial/physical reasoning as priority technical advances.
Five major themes emerged from the review: (1) Machine Learning for Credit Risk Assessment and Financial Inclusion; (2) Deep Learning and Neural Networks for Market Prediction and Volatility Forecasting; (3) Natural Language Processing and Sentiment Analysis for Decision Support; (4) AI-Based Fraud Detection and Operational Risk Management; and (5) Explainable AI, Regulatory Technology, and Governance Frameworks.
Thematic synthesis of the 64 retained studies reported in results; explicit listing of five themes in the paper's Results section.
We conducted a scoping review across four major databases (three are named: SciSpace, Google Scholar, and ArXiv) covering publications from 2019 to 2025 and retained 64 unique studies after deduplication and screening.
Methods section: Arksey and O'Malley framework (enhanced by Levac et al.), explicit database search (SciSpace, Google Scholar, ArXiv), timeframe stated (2019–2025), and reported final sample of 64 studies after deduplication and screening.
As AI reduces the costs of ideation, synthesis, and search, the central bottlenecks of science increasingly shift toward coordination, adjudication, validation, and adaptive steering.
Argumentative/trend claim presented in the paper as motivation for PIM; no empirical time-series or quantitative analysis provided in the paper itself.
The paper formalises crowdsourced R&D and hackathon-type architectures as operational search forms and links these to Causal Problem Modelling (CPM) and the Causal Theoretical Twin Architecture (CTTA).
Conceptual mapping and theoretical linkage between existing crowdsourcing/hackathon models and CPM/CTTA within the PIM framework (theoretical exposition; no empirical mapping or measurement reported).
PIM proceeds through causal problem decomposition, distributed search, real-time evidential updating, contribution traceability, staged validation, and dynamic reprioritisation of candidate solution pathways.
Procedural description of the PIM methodology and its constituent stages in the paper (methodological/theoretical exposition; no experimental implementation reported).
PIM is designed for problem spaces characterised by causal heterogeneity, partial observability, nonlinear interaction, long feedback delays, and distributed expertise.
Methodological design specification within the paper describing the target problem-space features for which PIM is intended (conceptual specification; no empirical testing).
This paper formalises extensions of crowdsourced R&D and hackathon-based research into a general methodology called Probabilistic Innovation Methodology (PIM).
The paper presents a conceptual/theoretical formalisation and names the resulting methodology PIM (no empirical study or sample reported).
The empirical analysis covers MENA economies over the period 2010–2023.
Paper explicitly states the temporal and geographic scope: MENA economies, 2010–2023.
The study employs a dynamic panel data approach using the System Generalized Method of Moments (System GMM) estimator to address endogeneity, unobserved heterogeneity, and persistence effects.
Methods statement in the paper describing the use of System GMM for panel data covering MENA economies over 2010–2023.
Endogeneity in estimating AI's effects was controlled using a two-way fixed effects (TWFE) model and Propensity Score Matching (PSM).
Methodological claim reported in the study about the identification strategy used to estimate causal effects of AI adoption.
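For readers unfamiliar with the two-way fixed effects estimator named above, the following Python sketch shows the within transformation on a balanced panel. It is an illustration under stated assumptions, not the study's specification: the panel is synthetic and noise-free, the treatment effect is constant across units and periods, and the names `twfe_slope`, `unit_fx`, `time_fx`, and `BETA` are hypothetical. Propensity score matching is not covered here.

```python
def twfe_slope(y, d):
    """Slope of y on d after removing unit and period means (balanced panel).

    y and d are lists of rows (units), each row a list of period values.
    """
    def demean(m):
        n, t = len(m), len(m[0])
        row_mean = [sum(r) / t for r in m]
        col_mean = [sum(m[i][j] for i in range(n)) / n for j in range(t)]
        grand = sum(row_mean) / n
        return [[m[i][j] - row_mean[i] - col_mean[j] + grand for j in range(t)]
                for i in range(n)]

    y_dm, d_dm = demean(y), demean(d)
    num = sum(a * b for ry, rd in zip(y_dm, d_dm) for a, b in zip(ry, rd))
    den = sum(b * b for rd in d_dm for b in rd)
    return num / den


# Synthetic panel (assumption): 3 units, 4 periods, staggered adoption.
unit_fx = [1.0, 2.0, 4.0]       # unit fixed effects
time_fx = [0.0, 0.5, 1.0, 1.5]  # period fixed effects
d = [[0, 0, 0, 0],              # never adopts
     [0, 0, 1, 1],              # adopts in period 2
     [0, 1, 1, 1]]              # adopts in period 1
BETA = 0.25                     # constant treatment effect
y = [[unit_fx[i] + time_fx[j] + BETA * d[i][j] for j in range(4)]
     for i in range(3)]

print(twfe_slope(y, d))
```

Because the synthetic outcome is exactly additive in unit effects, period effects, and a constant treatment effect, the estimator recovers `BETA` up to floating-point error.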
The timing of AI adoption was identified through a multi-step, contextually validated text analysis of DART business reports.
Descriptive/methodological statement in the study describing how adoption dates were extracted from firms' regulatory/business reports (DART) via a validated text-analytic procedure.
The average effect of AI adoption on market value (Tobin's Q) was not statistically significant across all firms.
TWFE and PSM estimates on KOSDAQ-listed firms (2018–2025) reporting firm-level Tobin's Q before and after identified AI-adoption timing.
No statistically significant change was observed in return on assets (ROA) following AI adoption.
Same empirical setting as above (KOSDAQ firms 2018–2025) using TWFE and PSM to estimate causal effects of AI adoption on ROA.
Four propositions formalize the gradient, cascade compounding, delegation-depth effects, and extension sufficiency, establishing boundary conditions for the framework's valid operating envelope.
Theoretical/formal propositions presented in the paper that articulate limits and conditions for the framework's applicability.
The framework is analytically assessed for transferability across four decision system architectures.
Paper reports an analytic (cross-architecture) assessment comparing framework applicability across four named decision system architectures.
A shift-share design finds no detectable effect of early adoption on worker-reported technology-related task restructuring.
Causal-style shift-share analysis using the 2024 EWCS exposure measures to estimate effects of early generative AI adoption on worker-reported changes in technology-related task content; sample of more than 36,600 workers; result reported as no detectable effect.
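A shift-share exposure measure weights common shocks (shifts) by each group's pre-existing composition (shares). The Python sketch below shows only that generic construction; the groups, occupation categories, and numbers are hypothetical and are not taken from the EWCS or from the paper.

```python
def shift_share_exposure(shares, shifts):
    """Exposure per group: sum over categories of share times shift."""
    return {
        group: sum(weight * shifts[cat] for cat, weight in group_shares.items())
        for group, group_shares in shares.items()
    }


# Hypothetical employment shares by occupation category (each row sums to 1).
shares = {
    "group_a": {"clerical": 0.50, "manual": 0.50},
    "group_b": {"clerical": 0.25, "manual": 0.75},
}
# Hypothetical category-level adoption shifts common to all groups.
shifts = {"clerical": 0.4, "manual": 0.1}

exposure = shift_share_exposure(shares, shifts)
print(exposure)
```

In a full design, an exposure measure of this kind would then enter a regression on the outcome of interest; that step is omitted here.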
We compare multiple state-of-the-art agents (e.g., GPT-4o, Llama 3, Qwen2) on metrics assessing tool selection accuracy, faithfulness, and hallucination.
Paper lists evaluated models (GPT-4o, Llama 3, Qwen2) and reports evaluation on metrics including tool selection accuracy, faithfulness, and hallucination across the benchmark.
Our benchmark consists of 100 financial questions.
Paper explicitly states the benchmark contains 100 financial questions.
Under three scenarios (optimistic: 2028-2035; base: 2035-2045; pessimistic: 2045-2060), we specify disconfirmation criteria that would weaken the thesis if observed.
Scenario analysis and specification of disconfirmation criteria by the authors; methodological claim about forecasting structure rather than empirical result.
Converging evidence from history, philosophy, neuroscience, technology, organizational studies, and cultural analysis supports this thesis.
Authors' multidisciplinary literature review and synthesis across the named fields (method: qualitative review); no single empirical dataset or sample size given.
We introduce 'instrumental dissolution' -- loss of institutional-default status while persisting in specialist niches.
Conceptual/theoretical contribution defined by the authors and illustrated via cross-disciplinary examples; no empirical validation sample reported.
Typing's dominance was instrumental, not cognitively necessary.
Argumentative/historical analysis presented in the paper; synthesis of historical and philosophical literature (no empirical sample or experiment reported).
We conducted an in-the-wild evaluation with over 2,200 individuals from heterogeneous organisations and roles in 116 countries, via log analysis, surveys, and 20 interviews.
Reported evaluation methods and sample in the paper's abstract: log analysis, surveys, and 20 interviews with over 2,200 participants across 116 countries.
Buildings account for approximately 40% of global energy consumption.
Statement in paper (background/contextual fact); likely based on cited external data though no sample size reported in excerpt.
Algorithmic accuracy alone does not determine value; legitimacy and uptake hinge on the readiness of people and processes.
Thematic conclusion drawn from interviews, Likert surveys, and document analysis across cases indicating non-technical factors strongly influence uptake despite algorithmic performance metrics. (Sample size not reported.)
The study's measurement model is supported by Composite Reliability (CR), Average Variance Extracted (AVE), and several model-fit indicators.
Paper explicitly states CR, AVE, and model-fit indices were used and supported the construct measurements and SEM.
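Composite Reliability and Average Variance Extracted are conventionally computed from standardized factor loadings. The Python sketch below uses the standard formulas, with each item's error variance taken as one minus its squared loading; the loadings are hypothetical and are not values from the paper.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)  # assumes standardized loadings
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical standardized loadings for one construct.
loadings = [0.8, 0.7, 0.6]
print(round(composite_reliability(loadings), 3))
print(round(average_variance_extracted(loadings), 3))
```

Commonly cited rules of thumb treat a CR of at least 0.7 and an AVE of at least 0.5 as acceptable, although the thresholds applied in the paper are not stated in this summary.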
Principal Component Analysis (PCA) identified the main constructs related to adoption of FinTech and perceived algorithmic trust.
Paper reports using PCA to identify constructs underlying adoption and perceived algorithmic trust prior to CFA/SEM.
Structured questionnaires were administered to 400 respondents in both urban and rural areas of developing countries.
Method section statement specifying a quantitative research design and that structured questionnaires were sent to 400 respondents.
The study combines theoretical analysis with quantitative empirical research using survey data from Bosnia and Herzegovina analyzed by regression.
Paper summary states the methodological approach: theoretical analysis plus a quantitative empirical study based on survey data from Bosnia and Herzegovina, analyzed with regression methods. No further methodological details or sample size provided in the summary.
The long-term dynamic effects of AI on resilience remain unverified and require longer-term data.
Authors explicitly state the need for longer time-series data to validate long-term dynamics.
Enterprise-level indicators used in the study do not directly capture supply chain network structure and node dependencies.
Explicit limitation noted by the authors about measurement and scope.
The study's sample is limited to listed manufacturing companies, so conclusions should be applied cautiously to small and medium-sized enterprises (SMEs).
Explicit limitation stated by the authors in the paper.