Evidence (7953 claims)

Evidence Matrix

Claim counts by outcome category and direction of finding.

Outcome	Positive	Negative	Mixed	Null	Total
Other	402	112	67	480	1076
Governance & Regulation	402	192	122	62	790
Research Productivity	249	98	34	311	697
Organizational Efficiency	395	95	70	40	603
Technology Adoption Rate	321	126	73	39	564
Firm Productivity	306	39	70	12	432
Output Quality	256	66	25	28	375
AI Safety & Ethics	116	177	44	24	363
Market Structure	107	128	85	14	339
Decision Quality	177	76	38	20	315
Fiscal & Macroeconomic	89	58	33	22	209
Employment Level	77	34	80	9	202
Skill Acquisition	92	33	40	9	174
Innovation Output	120	12	23	12	168
Firm Revenue	98	34	22	—	154
Consumer Welfare	73	31	37	7	148
Task Allocation	84	16	33	7	140
Inequality Measures	25	77	32	5	139
Regulatory Compliance	54	63	13	3	133
Error Rate	44	51	6	—	101
Task Completion Time	88	5	4	3	100
Training Effectiveness	58	12	12	16	99
Worker Satisfaction	47	32	11	7	97
Wages & Compensation	53	15	20	5	93
Team Performance	47	12	15	7	82
Automation Exposure	24	22	9	6	62
Job Displacement	6	38	13	—	57
Hiring & Recruitment	41	4	6	3	54
Developer Productivity	34	4	3	1	42
Social Protection	22	10	6	2	40
Creative Output	16	7	5	1	29
Labor Share of Income	12	5	9	—	26
Skill Obsolescence	3	20	2	—	25
Worker Turnover	10	12	—	3	25
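
As a rough illustration of how a matrix row can be read, the sketch below converts direction counts into shares of the listed total. The rows are copied from the matrix; the function names are hypothetical. In some rows the four direction columns sum to less than the Total column (for Other, 402 + 112 + 67 + 480 = 1061 against a listed 1076). The sketch therefore divides by the listed total and reports the remainder separately, which is an assumption about how the total should be interpreted.

```python
LABELS = ("positive", "negative", "mixed", "null")

# Rows copied from the matrix above: (positive, negative, mixed, null, total).
# None stands for a dash cell, treated here as zero (an assumption).
ROWS = {
    "Task Completion Time": (88, 5, 4, 3, 100),
    "Job Displacement": (6, 38, 13, None, 57),
    "Other": (402, 112, 67, 480, 1076),
}


def direction_shares(row):
    """Share of the listed total that falls in each direction column."""
    *counts, total = row
    return {label: (count or 0) / total for label, count in zip(LABELS, counts)}


def unlabelled(row):
    """Claims counted in the total but not in any of the four direction columns."""
    *counts, total = row
    return total - sum(count or 0 for count in counts)


shares = direction_shares(ROWS["Task Completion Time"])
gap = unlabelled(ROWS["Other"])
print(shares["positive"], gap)
```

A nonzero remainder for a row suggests that the total includes claims with a direction not shown in the four columns.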

The paper identifies three core mechanisms underlying calibrated trust and complementarity: (1) calibrated trust balancing reliance and oversight, (2) complementarity–trust interaction for optimal performance, and (3) dynamic feedback loops producing reinforcing learning cycles.

Explicit identification of mechanisms claimed in the paper's synthesis; this is a descriptive claim about the paper's content rather than an empirical finding—no sample or empirical test reported in the abstract.

high null result Optimising Human–AI Decision Performance: A Trust and Cap... n/a (identification of theoretical mechanisms)

AI-adopting firms do not increase capital expenditures following adoption.

Firm-level capex analysis showing no significant change in capital expenditures for adopters versus nonadopters post-adoption in the paper's empirical framework.

high null result AI and Productivity: The Role of Innovation capital expenditures (capex)

It remains unclear how developers' general programming and security-specific experience, and the type of AI tool used (free vs. paid), affect the security of the resulting software — motivating this study.

Paper's stated research gap/motivation: the authors identify uncertainty in the literature regarding interactions between developer experience, AI tool tier (free vs. paid), and resulting code security.

high null result The Impact of AI-Assisted Development on Software Security: ... the combined effect of developer experience and AI tool type on code security (i...

Participants were assigned a security-related programming task using either no AI tools, the free version, or the paid version of Gemini.

Experimental design described in the paper: random/conditional assignment of participants into three groups (no AI, free Gemini, paid Gemini) performing the same security-related programming task.

high null result The Impact of AI-Assisted Development on Software Security: ... experimental condition (tool used) as it relates to subsequent code security out...

We conducted a quantitative programming study with software developers (n = 159) exploring the impact of Google's AI tool Gemini on code security.

Explicit methodological statement in the paper: a quantitative study with 159 participating software developers assigned to experimental conditions to evaluate Gemini's impact on security-related programming tasks.

high null result The Impact of AI-Assisted Development on Software Security: ... impact of Gemini on code security (security of code produced in the study)

The authors surveyed workers and developers on a representative sample of 171 tasks and used language models (LMs) to scale ratings to 10,131 computer-assisted tasks across all U.S. occupations.

Study methodology reported in the paper: surveys of 'workers and developers' on 171 tasks, plus LM-based scaling to 10,131 tasks (coverage claims across U.S. occupations).

high null result Are We Automating the Joy Out of Work? Designing AI to Augme... coverage and scaling of task-level ratings (number of tasks surveyed and number ...

SWE-Skills-Bench is available at https://github.com/GeniusHTX/SWE-Skills-Bench.

Repository URL provided in the paper for the benchmark's code/data.

high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... public availability (URL) of the benchmark

SWE-Skills-Bench provides a testbed for evaluating the design, selection, and deployment of skills in software engineering agents.

Benchmark design pairs skills, repositories, and deterministic verification tests; intended use stated by authors as a testbed for evaluation of skills.

high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... availability of a benchmarking testbed for evaluating agent skills

39 of 49 skills yield zero pass-rate improvement.

Empirical evaluation over 49 skills and ~565 task instances reporting that 39 skills produced no improvement in test pass rate when injected.

high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... change in task acceptance-test pass rate (zero improvement)

The authors introduce a deterministic verification framework that maps each task's acceptance criteria to execution-based tests, enabling controlled paired evaluation with and without the skill.

Method: creation of a deterministic verification framework that converts acceptance criteria into executable tests; used to perform paired evaluations (with skill vs. without skill).

high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... ability to deterministically verify task acceptance criteria via execution-based...
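
The paired evaluation described above can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the `PairedResult` type, the function name, and the task outcomes are all hypothetical. It assumes each task is run twice against the same execution-based tests, once with the skill and once without.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedResult:
    """Outcome of one task run under both conditions (hypothetical structure)."""
    task_id: str
    passed_without_skill: bool
    passed_with_skill: bool


def pass_rate_delta(results):
    """Pass rate with the skill minus pass rate without it, over the same tasks."""
    if not results:
        raise ValueError("no paired results")
    n = len(results)
    with_skill = sum(r.passed_with_skill for r in results) / n
    without_skill = sum(r.passed_without_skill for r in results) / n
    return with_skill - without_skill


# Hypothetical outcomes for four tasks; not data from the paper.
results = [
    PairedResult("task-1", False, True),
    PairedResult("task-2", True, True),
    PairedResult("task-3", False, False),
    PairedResult("task-4", True, True),
]
delta = pass_rate_delta(results)
print(delta)
```

Because both runs share the same tasks and tests, a delta of zero for a skill corresponds to the "zero pass-rate improvement" outcome reported for 39 of the 49 skills.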

SWE-Skills-Bench pairs 49 public SWE skills with authentic GitHub repositories pinned at fixed commits and requirement documents with explicit acceptance criteria, yielding approximately 565 task instances across six SWE subdomains.

Benchmark construction: 49 public skills, repositories pinned to fixed commits, requirement documents with acceptance criteria, producing ~565 task instances spanning six SWE subdomains (as reported by the paper).

high null result SWE-Skills-Bench: Do Agent Skills Actually Help in Real-Worl... number of skill-repo-task instances (~565) and coverage across six subdomains

The article introduces a novel Bayesian Item Response Theory framework that quantifies human–AI synergy by separately estimating individual ability, collaborative ability, and AI model capability while controlling for task difficulty.

Methodological contribution described in the paper: development and application of a Bayesian Item Response Theory model that includes separate parameters for individual ability, collaborative ability, AI model capability, and task difficulty (method section of the paper).

high null result Quantifying and Optimizing Human-AI Synergy: Evidence-Based ... estimated parameters for individual ability, collaborative ability, AI model cap...

The Planner is trained via Supervised Fine-Tuning (SFT) to internalize diagnostic capabilities and then aligned with business outcomes (conversion rate) via Reinforcement Learning (RL).

Method description in the paper specifying SFT initialization followed by RL alignment targeting conversion rate (UCVR) as reward signal.

high null result Probe-then-Plan: Environment-Aware Planning for Industrial E... Planner diagnostic behavior and policy alignment with conversion rate (model tra...

EASP's Offline Data Synthesis stage: a Teacher Agent synthesizes diverse, execution-validated plans by diagnosing the probed environment.

Method description in the paper detailing the Teacher Agent's role in synthesizing execution-validated plans during offline data synthesis.

high null result Probe-then-Plan: Environment-Aware Planning for Industrial E... synthesized execution-validated search plans (data generation outcome)

The Probe-then-Plan mechanism uses a lightweight Retrieval Probe to expose the retrieval snapshot, enabling the Planner to diagnose execution gaps and generate grounded search plans.

Methodological description in the paper: design and implementation of Retrieval Probe and Planner; validated through synthesized data and downstream evaluations (offline and online).

high null result Probe-then-Plan: Environment-Aware Planning for Industrial E... retrieval snapshot exposure and Planner diagnostic output (implementation/functi...

Descriptive statistics, reliability tests, regression analysis, and structural equation modelling (SEM) were employed to analyse the relationships between AI adoption and entrepreneurial outcomes.

Methods section reporting use of descriptive statistics, reliability tests, regression analysis, and SEM to evaluate relationships between AI adoption and measured outcomes.

high null result Entrepreneurship in the Era of Artificial Intelligence: Rede... not applicable (methodological detail)

The study used a quantitative research design and collected data from 350 entrepreneurs and managers of small and medium-sized enterprises (SMEs) who had adopted AI in their business operations.

Methods section of the paper specifying a quantitative design and a sample size of 350 AI-adopting SME entrepreneurs/managers.

high null result Entrepreneurship in the Era of Artificial Intelligence: Rede... not applicable (methodological detail)

The study used portfolio-level analysis to compare the financial outcomes of portfolios constructed using AI-driven ESG indicators with those based on conventional ESG ratings.

Methodological statement in the paper: portfolio-level analysis and comparative design. The summary does not specify the number of portfolios, asset universes, time frame, or construction rules.

high null result Green Intelligence in Finance: Artificial Intelligence-Drive... Study methodology (portfolio-level comparative analysis)

A quantitative methodology was employed, utilizing a structured questionnaire administered to 400 small business owners.

Explicit methodological statement in the paper: structured questionnaire survey with sample size N=400 small business owners.

high null result The role of artificial intelligence in enhancing financial l... method / sample (use of structured questionnaire; sample size = 400)

The study uses a game-theoretic model involving a foundation model provider and two competing downstream firms to analyze how policy interventions affect consumer surplus in the AI supply chain.

Methodological description in the paper: a formal game-theoretic model with one upstream provider and two downstream competing firms; equilibrium analysis and comparative statics are performed on model outcomes (prices, qualities, profits, consumer surplus).

high null result The Economics of AI Supply Chain Regulation model equilibrium outcomes (prices, qualities, provider profit, downstream profi...

An SCF-oriented organizational ethnography was conducted, using a structured protocol and triangulation of evidence.

Qualitative method reported in the abstract: organizational ethnography with a protocol and triangulation; the abstract does not give the number of organizations, the duration, or the sampling.

high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... qualitative evidence of the existence and manifestation of psychoanthropological friction...

A psychometric instrument (the SCF-30 scale) was constructed and validated, and a 0–100 index was calculated, using structural equation modelling (SEM) and reliability/validity tests.

Explicit methodological description in the abstract: construction and validation of the SCF-30 scale, use of SEM, and reliability and validity tests. The abstract does not detail statistics, sample, or numerical results.

high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... SCF score (0–100 index) and psychometric properties of the SCF-30 scale (conf...

The SCF is operationalized through three core vectors: Perceived Complexity (PC), Institutional Risk Aversion (AR), and Cultural Inertia (IC).

Conceptual and operational structure presented in the article; explicit specification of the three vectors as components of the SCF construct.

high null result A FRICÇÃO PSICOANTROPOLÓGICA (SCF - Symbolic-Cognitive Frict... constituent components of the SCF construct (PC, AR, IC)

This research conducts a critical analysis of the ethical implications of artificial intelligence in terms of job displacement during the fifth industrial revolution.

Author-declared methodology: a literature-based critical analysis drawing on novel studies and the existing body of literature; no further methodological details (e.g., inclusion criteria, databases searched) provided in the excerpt.

high null result A Study on Work-Life Balance of Women Employees in the IT Se... ethical implications of AI-related job displacement

This study uses panel data on agricultural firms listed on the Shanghai and Shenzhen A-share markets from 2007 to 2023 and applies a multidimensional fixed-effects model to estimate the impact of AI on firms’ total factor productivity (TFP).

Methodological statement in the paper: dataset = panel of listed agricultural firms (Shanghai and Shenzhen A-share markets), time period 2007–2023; empirical approach = multidimensional fixed-effects model.

high null result Artificial intelligence and the sustainable development of a... study design / estimation of AI impact on total factor productivity (TFP)

Degree, betweenness, and eigenvector centrality metrics were used to identify structural vulnerabilities and leverage points in the construction supply chain network.

Paper reports calculation of degree, betweenness, and eigenvector centrality to outline vulnerabilities; specific metrics and interpretations are reported (e.g., degree centrality value for brokers).

high null result Social-Network Analytics of Construction Supply Chain network centrality measures (degree, betweenness, eigenvector) as indicators of ...
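
To make the centrality measures above concrete, the sketch below computes normalized degree centrality, the simplest of the three, for a small undirected network. The node names other than the broker and all edges are hypothetical and are not taken from the paper. Betweenness and eigenvector centrality need more machinery and would normally come from a graph library.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality: neighbours of a node divided by (n - 1)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-loops are ignored in this sketch
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Hypothetical supply chain relationships; not data from the paper.
edges = [
    ("broker", "supplier"),
    ("broker", "contractor"),
    ("broker", "client"),
    ("contractor", "client"),
]
centrality = degree_centrality(edges)
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")
```

In this toy network the broker is tied to every other actor, so it receives the maximum score. A highly central node of this kind can be read as both a leverage point and a single point of failure.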

Thematic coding translated reported interactions into nodes and edges of a complex network and grouped challenges into thematic categories.

Methods described: thematic coding applied to interview data to create network structure and to generate challenge categories (six main categories, 16 open codes reported).

high null result Social-Network Analytics of Construction Supply Chain conversion of qualitative interactions into network structure and thematic categ...

This study combines empirical, semi-structured interviews with social network analytics to map construction supply chain relationships and vulnerabilities.

Methods reported in the paper: use of semi-structured interviews plus social network analysis (thematic coding to create nodes/edges, calculation of network metrics). Sample size not specified in the abstract.

high null result Social-Network Analytics of Construction Supply Chain research method integration (interviews + social network analytics)

Distinguishing between base models and fine-tuned systems is important for researchers using LLMs to study cultural patterns, because fine-tuning and alignment can change the behaviors relevant to behavioral research.

Analytical distinction and methodological guidance in the paper; claim grounded in conceptual reasoning about model development workflows rather than a specific experimental demonstration in the excerpt.

high null result The Third Ambition: Artificial Intelligence and the Science ... impact of model provenance (base vs fine-tuned) on suitability for behavioral/cu...

Contemporary artificial intelligence research has been organized around two dominant ambitions: productivity (treating AI systems as tools for accelerating work and economic output) and alignment (ensuring increasingly capable systems behave safely and in accordance with human values).

Literature synthesis and conceptual framing within the paper (review of prevailing research agendas and priorities in AI literature). No original empirical sample or experiment reported for this claim in the provided text.

high null result The Third Ambition: Artificial Intelligence and the Science ... categorization of dominant research ambitions in contemporary AI (productivity v...

This study analyzes comments and statements from party members in OECD countries from 2016 to 2025 through content analysis, examining media interviews, speeches, and debates.

Description of the study's data and method: content analysis of party member comments and statements drawn from media interviews, speeches, and debates across OECD countries over the 2016–2025 period (sample size and selection details not reported in the excerpt).

high null result Political Ideology, Artificial Intelligence (AI), and Labor ... dataset composition and methodological approach (sources and timeframe of analyz...

The study contributes to the literature by integrating evidence across higher education, vocational training, and lifelong learning to emphasize the need for balanced policy approaches to skill formation.

Stated contribution in the paper: cross-pathway synthesis of existing empirical evidence and secondary data (methods described as comparative synthesis; no primary empirical contribution reported in the summary).

high null result Balancing Higher Education, Vocational Training, and Lifelon... scholarly contribution / integrative synthesis

The study uses secondary data and comparative evidence from prior empirical studies to analyze relationships between higher education, vocational education, and lifelong learning.

Stated methodology in the paper: analysis of secondary data and synthesis of prior empirical/comparative studies (no primary data collection; no sample sizes reported).

high null result Balancing Higher Education, Vocational Training, and Lifelon... methodological approach / data sources

This study analyzed survey data from 466 Chinese food delivery riders using structural equation modeling and bootstrapping procedures, modeling work pressure as a mediator and perceived autonomy as a moderator.

Statement in abstract describing sample size (466 Chinese food delivery riders) and analytic approach (SEM and bootstrapping) and modeled variables (work pressure mediator, perceived autonomy moderator).

high null result Not all algorithmic controls are equal: the double-edged imp... methodology / analysis approach

Leadership theory, emotional intelligence research, and AI ethics inform the proposed framework.

Methodological/design statement in the paper describing its intellectual grounding; indicates literature-based synthesis rather than primary data collection.

high null result Deconstructing success: why being human still matters sources informing the framework (theoretical influences)

The study uses topic modeling on a corpus of over 4,600 academic papers to identify the dominant themes in the economics of AI literature.

Unsupervised topic modeling applied to a compiled corpus of >4,600 papers (authors' described methodology and sample size).

high null result Mapping the Landscape of the Economics of AI Literature: Gap... identified topics / dominant themes (topic prevalence across the corpus)

The paper explores risk frameworks, ethical constraints, and policy imperatives related to AI.

Descriptive claim about the paper's analytic content (thematic/policy analysis); no empirical details or measurement approach are given in the abstract.

high null result AI for Good: Societal Impact and Public Policy analysis of risk frameworks, ethical constraints, and policy imperatives

This paper investigates societal applications of AI across domains such as healthcare, education, accessibility, environmental management, emergency response, and civic administration.

Descriptive statement of the paper's scope and methods (literature review / cross-domain analysis implied); the abstract lists the domains but does not specify empirical procedures or sample sizes.

high null result AI for Good: Societal Impact and Public Policy coverage of AI applications in specified domains (healthcare, education, accessi...

Chatbot suggestions were artificially varied in aggregate accuracy across treatment conditions from low (53%) to high (100%).

Paper describes experimental manipulation of chatbot suggestion accuracy with aggregate accuracies ranging from 53% to 100%; manipulation method (how suggestions were generated or sampled) described in methods (not fully detailed in excerpt).

high null result LLMs in social services: How does chatbot accuracy affect hu... manipulated chatbot suggestion accuracy (range 53%–100%)

Caseworkers in the control condition (no chatbot suggestions) had a mean accuracy of 49%.

Reported experimental outcome: mean accuracy for control group = 49%; based on the randomized experiment using the 770-question benchmark.

high null result LLMs in social services: How does chatbot accuracy affect hu... caseworker accuracy (mean percent correct in control condition = 49%)

We conducted a randomized experiment with caseworkers recruited from nonprofit outreach organizations in Los Angeles.

Paper describes a randomized experiment recruiting caseworkers from nonprofit outreach organizations in Los Angeles; sample size and recruitment details not given in the excerpt.

high null result LLMs in social services: How does chatbot accuracy affect hu... execution of a randomized experiment with nonprofit caseworker participants (loc...

The benchmark questions have corresponding expert-verified answers.

Paper states benchmark questions have expert-verified answers; verification method and number/credentials of experts not specified in the excerpt.

high null result LLMs in social services: How does chatbot accuracy affect hu... availability of expert-verified reference answers for benchmark questions

We created a 770-question multiple-choice benchmark dataset of difficult, but realistic questions that a caseworker might receive.

Paper reports creation of a benchmark dataset containing 770 multiple-choice questions described as difficult and realistic; questions and dataset construction described in methods (no sample-of-questions or external validation details provided in the excerpt).

high null result LLMs in social services: How does chatbot accuracy affect hu... benchmark dataset size and content (770 multiple-choice questions)

The study's conclusions draw on three complementary evidence bases: (a) task-level evidence on what generative AI can already do in practice; (b) occupational exposure and complementarity analysis using Philippine labor force data; and (c) firm- and worker-level evidence on AI adoption.

Description of methods and data sources in the paper: task-level capability testing/assessment, analysis of national labor force/occupation data for exposure/complementarity, and firm/worker surveys or qualitative adoption evidence.

high null result Labor Futures Under Artificial Intelligence: Scenarios for t... methodological integration of evidence bases (description of data/methods rather...

There is a need for more longitudinal and cross-country studies to better understand the long-term value creation of ERM in MSMEs.

Authors' conclusion and identified research gaps based on the scope and limitations of the existing literature reviewed (i.e., predominance of cross-sectional or single-country studies).

high null result A Literature Review: Effect of Enterprise Risk Management (E... long-term value creation of ERM (research evidence availability)

Extensive experiments were conducted using both synthetic and real hospital datasets to evaluate the framework.

Statement in the paper indicating experiments on synthetic and real datasets; exact sizes, sources, and composition of these datasets are not provided in the excerpt.

high null result Enhancing hospital workforce planning, scheduling, and perfo... breadth of experimental evaluation (use of synthetic and real datasets)

The paper explains the main legal frameworks that currently regulate AI in India, as well as proposals for future legislation.

Author's legal and policy analysis / document review of existing statutes and proposed laws (qualitative review). No quantitative sample size; based on review of legal texts and policy proposals cited in the article.

high null result Regulation and governance of artificial intelligence in Indi... existence and content of legal/regulatory frameworks and proposed legislation go...

DDDM was quantified using AI language models, specifically BERT and ChatGLM2-6B.

Methodological description in the paper stating that BERT and ChatGLM2-6B were leveraged to quantify the extent of DDDM (implementation details, training/data specifics, and sample not provided in the excerpt).

high null result The data-driven decision-making, sustainable value creation,... degree of data-driven decision-making (DDDM) (measurement variable)

A “macro approach” that (1) directly models equilibrium behavior of large employers, (2) combines macro data with empirical estimates of employers’ responses (from the micro approach) to estimate the model, and (3) uses the model to compute aggregate costs of monopsony and optimal policies, is the appropriate methodological response.

Methodological proposal set out by the paper; this is a description of the authors' recommended empirical/theoretical strategy rather than an empirical finding. The excerpt contains no implementation details, datasets, or estimation results.

high null result Labor Market Power: From Micro Evidence to Macro Consequence... aggregate costs of monopsony and optimal policy prescriptions

The traditional theoretical and empirical “micro approach” to studying labor market power requires that firms are small and atomistic.

Conceptual/theoretical characterization of the micro approach stated by the paper; no empirical sample, dataset, or formal model provided in the excerpt.

high null result Labor Market Power: From Micro Evidence to Macro Consequence... assumption about firm size/atomistic nature in micro monopsony models
