Generative AI overviews now appear for roughly half of web queries and reshape which sites users see, favoring Google-owned content and sidelining publishers that block Google's AI crawler; the AI summaries are also less stable and less robust to small query changes than traditional search results, raising concerns about publisher visibility and revenue.

How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews

Riley Grossman, Songjiang Liu, Michael K. Chen, Mike Smith, Cristian Borcea, Yi Chen · April 30, 2026

arXiv · descriptive · medium evidence · relevance 7/10

Using an 11,500-query benchmark, the paper shows that AI Overviews appear for 51.5% of representative real-user queries (65.6% of the full benchmark), retrieve substantially different, low-overlap sources (favoring Google-owned content and underrepresenting sites that block Google's AI crawler), and are less consistent across repeated runs and minor query edits than traditional organic results.

Generative AI is being increasingly integrated into web search for the convenience it provides users. In this work, we aim to understand how generative AI disrupts web search by retrieving and presenting the information and sources differently from traditional search engines. We introduce a public benchmark dataset of 11,500 user queries to support our study and future research of generative search. We compare the search results returned by Google's search engine, the accompanying AI Overview (AIO), and Gemini Flash 2.5 for each query. We have made several key findings. First, we find that for 51.5% of representative, real-user queries, AIOs are generated, and are displayed above the organic search results. Controversial questions frequently result in an AIO. Second, we show that the retrieved sources are substantially different for each search engine (<0.2 average Jaccard similarity). Traditional Google search is significantly more likely to retrieve information from popular or institutional websites in government or education, while generative search engines are significantly more likely to retrieve Google-owned content. Third, we observe that websites that block Google's AI crawler are significantly less likely to be retrieved by AIOs, despite having access to the content. Finally, AIOs are less consistent when processing two runs of the same query, and are less robust to minor query edits. Our findings have important implications for understanding how generative search impacts website visibility, the effectiveness of generative engine optimization techniques, and the information users receive. We call for revenue frameworks to foster a sustainable and mutually beneficial ecosystem for publishers and generative search providers.

Summary

Main Finding

Generative-search summaries (Google AI Overviews, AIOs) are now common and materially change which web sources users see. AIOs appear for a large fraction of queries (65.6% of the authors' 11.5k benchmark; ~51.5% on the representative ORCAS subset), they cite a different set of sources than traditional SERPs (average Jaccard similarity ≈ 0.11–0.18 across engine pairs), they favor Google-owned content and disfavor popular/institutional domains (and sites that block Google’s AI crawler), and they are less consistent/robust than traditional search. These shifts have direct economic consequences for publishers, the SEO/GEO market, and the incentives governing web content access and compensation.

Key Points

AIO prevalence
- Overall AIO generation: 65.6% of the 11,500 benchmark queries.
- Representative real-user (ORCAS) queries: 51.5% produced an AIO.
- Very high for long, informational, question-formatted queries (e.g., ELI5: 94.6%; NQ questions ≈ 86.2%); low for product keyword queries (Amazon Retail: 17.4%).
- AIOs are rare for trending queries (≈ 8.1%) but very common for sensitive/political queries (≈ 93.8% of political queries).
Divergent source sets
- Low overlap between engines: average Jaccard similarity of sources = 0.18 for AIO vs SERP, 0.11 for AIO vs Gemini, 0.16 for SERP vs Gemini.
- Rank-sensitive agreement (RBO) also low: AIO vs SERP ≈ 0.23; AIO vs Gemini ≈ 0.15; SERP vs Gemini ≈ 0.21.
- Each engine returns ≈ 8–10 sources on average (SERP 8.75, AIO 9.24, Gemini 9.68), so low overlap reflects different retrieval/prioritization methods, not list length.
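The two overlap metrics can be made concrete with a minimal Python sketch, assuming each engine's sources for a query are an ordered list of URLs. The URL normalization (drop scheme and fragment, lowercase the host, strip a trailing slash) and the use of extrapolated RBO (Webber et al., 2010) truncated to the shorter list's depth are our assumptions; the authors' own implementation is in their repository and may differ.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Assumed normalization: drop scheme and fragment, lowercase host, strip trailing '/'."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.netloc.lower()}{path}{query}"


def jaccard(a: list[str], b: list[str]) -> float:
    """Set overlap |A ∩ B| / |A ∪ B| at the (normalized) URL level."""
    sa = {normalize_url(u) for u in a}
    sb = {normalize_url(u) for u in b}
    union = sa | sb
    if not union:
        return 1.0  # convention: two empty lists count as identical
    return len(sa & sb) / len(union)


def rbo_ext(s: list[str], t: list[str], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap with persistence p."""
    # Normalize and de-duplicate while preserving rank order.
    s = list(dict.fromkeys(normalize_url(u) for u in s))
    t = list(dict.fromkeys(normalize_url(u) for u in t))
    k = min(len(s), len(t))  # assumption: evaluate only to the shorter depth
    if k == 0:
        return 0.0
    seen_s: set[str] = set()
    seen_t: set[str] = set()
    overlap = 0  # X_d = |S[:d] ∩ T[:d]|, updated incrementally
    weighted = 0.0  # running sum of (X_d / d) * p**d
    for d in range(1, k + 1):
        x, y = s[d - 1], t[d - 1]
        if x == y:
            overlap += 1
        else:
            overlap += int(x in seen_t) + int(y in seen_s)
        seen_s.add(x)
        seen_t.add(y)
        weighted += (overlap / d) * p**d
    return (overlap / k) * p**k + ((1 - p) / p) * weighted
```

Averaging `jaccard` and `rbo_ext` over queries for each engine pair gives figures comparable in spirit to those above, and the same `jaccard` call on two runs of one query measures run-to-run consistency. List length alone cannot explain the low values: for a single query, Jaccard is bounded above by min(|A|, |B|) / max(|A|, |B|), which is 0.9 for lists of 9 and 10 URLs.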
Source characteristics
- Traditional SERP favors popular domains and institutional (.gov, .edu) sites.
- Generative outputs (AIO/Gemini) cite proportionally more Google-owned content and fewer popular/institutional sites.
- Websites that block Google’s AI crawler are significantly less likely to be cited in AIOs even though AIOs can technically access the content.
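The category comparison above can be approximated with a short Python sketch that buckets each cited URL by its host and reports shares per engine. The Google-owned domain set and the TLD-based notion of "institutional" are illustrative assumptions, not the paper's definitions, and popularity (which would need an external ranking list) is not modeled.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical, non-exhaustive list for illustration; the paper's own list may differ.
GOOGLE_OWNED = {"google.com", "youtube.com", "blogger.com", "blogspot.com"}
# Assumption: "institutional" approximated by the .gov/.edu top-level domains only.
INSTITUTIONAL_TLDS = (".gov", ".edu")
CATEGORIES = ("google_owned", "institutional", "other")


def host_of(url: str) -> str:
    """Lowercased hostname with a leading 'www.' removed."""
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")


def category(url: str) -> str:
    host = host_of(url)
    # Match the domain itself or any subdomain (e.g. support.google.com).
    if any(host == d or host.endswith("." + d) for d in GOOGLE_OWNED):
        return "google_owned"
    if host.endswith(INSTITUTIONAL_TLDS):
        return "institutional"
    return "other"


def category_shares(urls: list[str]) -> dict[str, float]:
    """Share of cited URLs falling into each category (empty input -> empty dict)."""
    counts = Counter(category(u) for u in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: counts[c] / total for c in CATEGORIES}
```

Pooling each engine's cited URLs across queries and comparing `category_shares` for SERP, AIO, and Gemini shows the direction of the shift described above.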
Consistency and robustness
- AIOs are less consistent across repeated runs and more sensitive to minor query edits or device/location changes than traditional SERP.
High-stakes behavior
- For contentious or political queries, AIOs are frequently produced and often take a stance in the generated text (AIOs: ~33.4% exhibited an expressed stance; Gemini: ~5.6%).
- Generative summaries have well-documented risks (hallucination, cherry-picking citations, attribution errors), which are amplified when they replace direct links to institutional sources.

Data & Methods

Benchmark dataset
- 11,500 queries spanning 9 types: ORCAS (5,000 real-user queries labeled by intent), Amazon Retail (500), Retail-Comp (500), Retail-Q (500), Debate (1,000), ELI5 (1,000), Localized (1,000), NQ (1,000), NQ Keywords (1,000).
- Additional time-sensitive subsets used in some analyses.
Collection procedures
- SERP and AIO results collected via SerpAPI simulating a mobile device from Newark, NJ to keep device/location controlled.
- Gemini 2.5 Flash responses collected via the Gemini API with Google Search grounding enabled; no custom system prompts.
- Collection date for the main benchmark: Dec 7–8, 2025.
- For comparability, the analysis focuses on first-page SERP results and only on queries where all three systems returned sources (7,439 queries used for many comparisons).
Metrics and analyses
- Set overlap: Jaccard similarity at URL level.
- Rank-aware overlap: Rank‑Biased Overlap (RBO) with persistence parameter p = 0.9.
- Additional analyses: per-domain changes, domain categories (popularity, institution type), effect of robots/AI-crawler blocking, consistency across repeated runs and minor query edits, stance detection on generated summaries.
- Statistical testing (e.g., chi-square for categorical differences) and robustness checks reported; dataset and code available: https://github.com/rag24/AIO
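For categorical differences such as "cited in AIOs or not" for crawler-blocking versus non-blocking sites, a Pearson chi-square test on a 2x2 table is the kind of test described above. This standard-library sketch omits Yates' continuity correction, and the example counts are hypothetical, not the paper's data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the 2x2 table

        [[a, b],   # e.g. crawler-blocking sites: cited in AIO, not cited
         [c, d]]   # e.g. non-blocking sites:     cited in AIO, not cited

    Requires all row and column totals to be positive.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With hypothetical counts of 20 of 100 crawler-blocking sites cited versus 40 of 100 non-blocking sites, `chi_square_2x2(20, 80, 40, 60)` gives a statistic of about 9.52 (p ≈ 0.002).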

Implications for AI Economics

Revenue and traffic redistribution
- By providing synthesized answers up front, generative summaries can reduce direct clicks to publishers, threatening ad-driven and subscription revenue that relies on pageviews.
- Publishers face a trade-off: blocking AI crawlers may protect raw content but further reduces visibility in AIOs; allowing crawlers risks reuse/excerpting without commensurate compensation.
- The paper calls for revenue frameworks (licensing, revenue-sharing, micropayments) to align publisher and generative-search incentives and avoid a race-to-the-bottom for content access.
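Mechanically, blocking Google's AI crawler is a robots.txt rule. A minimal sketch using Python's standard `urllib.robotparser`, assuming that "blocking" means disallowing the `Google-Extended` robots.txt token (how the paper operationalized this may differ). It illustrates the trade-off above: this hypothetical robots.txt opts out of Google-Extended while leaving Googlebot, the search crawler, free to fetch the same page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical publisher robots.txt: opt out of Google-Extended, allow everyone else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user-agent token, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(
    ROBOTS_TXT,
    "https://publisher.example/news/story",
    ["Google-Extended", "Googlebot"],
)
```

Here `access` evaluates to `{'Google-Extended': False, 'Googlebot': True}`.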
Market power and vertical integration risks
- The finding that generative outputs disproportionately cite Google-owned content raises concerns about preferential treatment and self-preferencing, with potential antitrust and competition policy implications.
- If dominant search/generative providers amplify their own content, network effects could accelerate concentration and lock-in, reducing diversity of information sources and bargaining power of independent publishers.
Impacts on SEO / GEO markets and service providers
- Traditional SEO tactics may become less effective because generative systems retrieve and rank sources differently, and the efficacy of the techniques sold by the nascent Generative Engine Optimization (GEO) industry is uncertain.
- Publishers and GEO vendors need new measurement tools to evaluate visibility in AIOs and to monetize generative citations (if any).
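One simple visibility measure a publisher could track is the fraction of queries in which its domain is cited, computed separately for organic results and AIOs. The following is a hypothetical Python sketch; the input schema (query to engine name to cited URLs) and the function names are assumptions for illustration.

```python
from urllib.parse import urlsplit


def domain(url: str) -> str:
    """Lowercased hostname with a leading 'www.' removed."""
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")


def visibility(results: dict[str, dict[str, list[str]]], target: str) -> dict[str, float]:
    """Fraction of queries in which `target` is cited at least once, per engine.

    `results` maps query -> engine name -> list of cited URLs (hypothetical schema).
    """
    hits: dict[str, int] = {}
    for per_engine in results.values():
        for engine, urls in per_engine.items():
            cited = any(domain(u) == target for u in urls)
            hits[engine] = hits.get(engine, 0) + int(cited)
    n = len(results)
    return {engine: count / n for engine, count in hits.items()}
```

A large gap between a domain's "serp" and "aio" values would indicate visibility it loses when users read the AIO instead of the organic results.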
Externalities and public-good concerns
- Generative search may rely less on institutional (.gov/.edu) and otherwise high-credence sources, especially for politically sensitive queries—this has social-welfare implications (misinformation risks, lower-quality information in civic domains).
- There is a potential mismatch between private incentives (minimize cost/complexity of grounding) and public interest (accurate, transparent sourcing).
Policy and marketplace remedies
- Short- to medium-term: transparency requirements (source provenance, citation links), audits of grounding behavior, and standards for citation display could mitigate information-quality externalities.
- Medium- to long-term: negotiated compensation models (licensing deals, per-use payments, aggregator revenue shares), API-based content use markets, or regulatory interventions to prevent anti-competitive self-preferencing.
- Antitrust and privacy regulators may need to consider how crawler access, data extraction, and preferential citation affect competition and content markets.
Research & measurement needs (for economists and policymakers)
- Quantify traffic and revenue impacts on publishers from AIO-style displays (click-through vs. summary consumption).
- Model platform-publisher bargaining under alternative licensing/revenue-sharing arrangements.
- Evaluate welfare trade-offs between user convenience (one-shot answers) and information quality/diversity.
- Standardized metrics for AIO transparency, robustness, and citation fidelity; routine independent audits and public datasets (the paper’s dataset/code are an example).

Short summary takeaway: generative search has already altered which sources users see and is likely to affect how much traffic publishers receive. That disruption creates economic pressure on publishers and raises competition, compensation, and public-good issues that call for new measurement, business models, and policy interventions to align incentives between generative-search providers and content creators.

Repository / data: https://github.com/rag24/AIO (authors’ processed datasets and code).

Assessment

Paper Type: descriptive
Evidence Strength: medium. A large, purpose-built dataset and direct measurements of search outputs provide credible descriptive evidence that generative search changes which sources are surfaced. However, the study is observational and limited to particular engines, versions, time windows, and regions; it cannot identify causal mechanisms for publisher outcomes (e.g., traffic or revenue effects), and it may be affected by personalization and dynamic indexing.
Methods Rigor: medium. The authors use systematic, reproducible metrics (prevalence counts, Jaccard similarity, RBO, domain categorization) and robustness checks (repeated runs, minor query edits), and they release the benchmark publicly. However, engine internals are opaque, the sampling frame and representativeness of the query set are not fully detailed here, potential confounders (personalization, geolocation, temporal changes) are not randomized or fully controlled, and statistical inference is only partly described in this summary.
Sample: A public benchmark of 11,500 web search queries, of which 5,000 are representative real-user ORCAS queries. For each query the authors collected results from Google organic search, Google's AI Overview (AIO), and Gemini 2.5 Flash, and analyzed AIO presence, source overlap (Jaccard, RBO), domain types, the effect of sites blocking Google's AI crawler, and stability across repeated runs and small query edits.
Themes: adoption, governance
Identification: Comparative benchmarking on the public 11,500-query dataset. For each query the authors collect Google organic results, Google's AI Overview (AIO), and Gemini 2.5 Flash outputs; they measure AIO prevalence, compute the similarity of retrieved source sets, categorize domains (e.g., gov/edu, Google-owned), test retrieval differences for sites that block Google's AI crawler, and evaluate consistency across repeated runs and minor query edits.
Generalizability:
- Limited to the specific engines and versions tested (Google AIO and Gemini 2.5 Flash) and to the data-collection period.
- Potentially affected by geolocation, personalization, and device/context; data were collected from a single simulated mobile device in Newark, NJ, so results may differ in other settings.
- Representativeness depends on how the queries were sampled (only the ORCAS subset consists of real-user queries); results may not generalize to other query mixes or languages.
- Search and indexing are dynamic, so findings may not hold as engines evolve.
- Descriptive retrieval differences do not directly translate into measured economic outcomes (traffic, clicks, publisher revenue).

Claims (10)

Claim	Category	Direction	Confidence	Outcome	Details
We introduce a public benchmark dataset of 11,500 user queries to support our study and future research of generative search.	Other	null_result	high	dataset size (number of queries)	n=11500; 11,500 user queries; 0.3
For 51.5% of representative, real-user queries, AI Overviews (AIOs) are generated and are displayed above the organic search results.	Adoption Rate	positive	high	presence and placement of AI Overview (AIO)	n=5000 (ORCAS subset); 51.5%; 0.18
Controversial questions frequently result in an AIO.	Adoption Rate	positive	medium	likelihood of AIO generation for controversial queries	n=11500; 0.11
The retrieved sources are substantially different for each search engine (average pairwise Jaccard similarity < 0.2).	Adoption Rate	mixed	high	overlap (URL-level Jaccard similarity) of retrieved sources across engines	n=11500; <0.2 average Jaccard similarity; 0.18
Traditional Google search is significantly more likely to retrieve information from popular or institutional websites in government or education.	Adoption Rate	positive	high	proportion of results from government/education/institutional websites	n=11500; 0.18
Generative search engines are significantly more likely to retrieve Google-owned content.	Adoption Rate	positive	high	proportion of results that are Google-owned content	n=11500; 0.18
Websites that block Google's AI crawler are significantly less likely to be retrieved by AIOs, despite having access to the content.	Adoption Rate	negative	high	likelihood/frequency of being retrieved in AIOs for crawler-blocking vs non-blocking sites	n=11500; 0.18
AIOs are less consistent when processing two runs of the same query.	Output Quality	negative	high	run-to-run consistency/variability of AIO outputs	0.18
AIOs are less robust to minor query edits.	Output Quality	negative	high	robustness of results to minor query edits	0.18
These findings have important implications for website visibility, the effectiveness of generative engine optimization techniques, and the information users receive; we call for revenue frameworks to foster a sustainable and mutually beneficial ecosystem for publishers and generative search providers.	Governance And Regulation	positive	high	policy recommendation for revenue frameworks / publisher sustainability	0.03