Researchers use LLM research assistants more like collaborators than search engines, submitting longer, more complex queries and delegating tasks such as drafting and gap identification; experienced users become more targeted and engage citations more deeply, though simple keyword queries persist.
AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use these systems in real-world settings. We present and analyze the Asta Interaction Dataset, a large-scale resource comprising over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform. Using this dataset, we characterize query patterns, engagement behaviors, and how usage evolves with experience. We find that users submit longer and more complex queries than in traditional search, and treat the system as a collaborative research partner, delegating tasks such as drafting content and identifying research gaps. Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways. With experience, users issue more targeted queries and engage more deeply with supporting citations, although keyword-style queries persist even among experienced users. We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation.
Summary
Main Finding
Researchers use deployed LLM-powered research tools not as simple search engines but as collaborative partners: they submit longer, more complex queries, delegate substantive research tasks (e.g., drafting, gap-finding), and treat generated outputs and cited evidence as persistent artifacts. Usage patterns evolve with experience—becoming more targeted and citation-focused—yet simpler, keyword-style queries persist. The authors release the Asta Interaction Dataset (≈200K anonymized queries and logs) and a new query-intent taxonomy to support realistic evaluation and design.
Key Points
- Dataset: ≈200,000 user queries and interaction logs from two deployed tools (literature discovery interface; scientific question-answering interface) inside a retrieval-augmented generation (RAG) platform.
- Query characteristics: users issue longer and more complex queries than typical web-search queries.
- Role of the system: treated as a collaborative research partner; users delegate tasks such as drafting content and identifying research gaps.
- Artifact persistence: generated responses and cited evidence are revisited and navigated in non-linear ways; users treat outputs as persistent artifacts in their workflows.
- Engagement and learning: with experience, users tend to (a) issue more targeted queries and (b) engage more deeply with supporting citations; however, simple keyword-style queries remain common even among experienced users.
- Contribution: anonymized dataset release + a new query-intent taxonomy to guide future designs and to enable more realistic evaluation of research assistants.
Data & Methods
- Data source: interaction logs and textual queries collected from two production research-assistant tools within an LLM-based RAG platform.
- Scale: roughly 200,000 queries across many users and sessions.
- Analyses: descriptive and exploratory characterization of query patterns (length, complexity), temporal/experience-based changes in behavior, and interaction behaviors (navigation among outputs and citations, revisitation patterns).
- Taxonomy: qualitative or mixed-method coding produced a new query-intent taxonomy to classify real-world research queries and inform evaluation design.
- Release: the dataset and accompanying analysis (anonymized) are published to enable replication and downstream work.
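As an illustration of the descriptive analyses above (not the paper's actual pipeline — the record schema, field names, and the keyword heuristic below are all hypothetical), a minimal sketch in Python:

```python
from statistics import mean

# Hypothetical interaction-log records; field names are illustrative
# and do NOT reflect the released dataset's actual schema.
logs = [
    {"user": "u1", "session": 1, "citation_clicks": 0,
     "query": "transformer scaling laws survey"},
    {"user": "u1", "session": 5, "citation_clicks": 3,
     "query": "What open problems remain in evaluating RAG systems for literature review?"},
    {"user": "u2", "session": 2, "citation_clicks": 1,
     "query": "protein folding"},
]

def is_keyword_style(query: str) -> bool:
    """Crude proxy for keyword-style queries: short and not phrased as a question."""
    return len(query.split()) <= 4 and "?" not in query

# Query length in words, a basic complexity proxy.
lengths = [len(r["query"].split()) for r in logs]
print(f"mean query length: {mean(lengths):.1f} words")

# Compare behavior across a simple experience proxy (session count),
# mirroring the experience-based comparisons described above.
novice = [r for r in logs if r["session"] <= 2]
expert = [r for r in logs if r["session"] > 2]
for label, group in [("novice", novice), ("expert", expert)]:
    keyword_share = mean(is_keyword_style(r["query"]) for r in group)
    clicks = mean(r["citation_clicks"] for r in group)
    print(f"{label}: keyword-style share={keyword_share:.2f}, citation clicks/query={clicks:.1f}")
```

On real logs, the same shape of analysis would run over hundreds of thousands of records, with a validated query-intent classifier in place of the toy heuristic.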
Implications for AI Economics

Productivity and task-shifting
- Tools enable delegation of substantive tasks (drafting, gap identification), implying potential productivity gains and a shift in researchers’ time allocation from routine search to higher-level oversight and synthesis.
- Economic impact depends on how quality scales: if outputs reduce time-to-result without harming quality, research throughput and returns to research investment may rise.
Complementarity vs. substitution
- The collaborative mode of use suggests complementarity between tools and skilled researchers (tools augment cognitive tasks). However, continued automation of routine work could substitute for some junior-level or administrative research tasks.
- Heterogeneous effects likely across roles and seniority: experienced users appear to extract more value, signaling skill-biased complementarities.
Learning, human capital, and adoption dynamics
- Usage evolves with experience (more targeted queries, deeper citation engagement), indicating on-the-job learning and increasing returns to experience. Adoption models should incorporate learning curves and heterogeneity in use intensity.
- Persistent prevalence of keyword-style queries suggests limits to learning or variation in worker incentives/time constraints.
Market design and pricing
- Value derives not just from raw model output but from features that support persistence, citation engagement, and non-linear navigation. Product and pricing strategies should reflect these workflow integrations (e.g., subscription tiers for collaboration/history features).
- Metrics for product success should go beyond query-response latency and BLEU-like metrics to include measures of citation engagement, revision cycles, and downstream research outputs.
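The engagement-oriented metrics suggested above could be computed directly from event logs. A minimal sketch (the event types and log format here are assumptions for illustration, not an actual API):

```python
from collections import Counter

# Hypothetical event stream; event type names are illustrative.
events = [
    {"user": "u1", "type": "query"},
    {"user": "u1", "type": "citation_click"},
    {"user": "u1", "type": "revisit"},
    {"user": "u2", "type": "query"},
    {"user": "u2", "type": "query"},
    {"user": "u2", "type": "citation_click"},
]

counts = Counter(e["type"] for e in events)
queries = counts["query"]

# Per-query engagement metrics, as alternatives to latency- or
# overlap-based measures of product success.
citation_engagement = counts["citation_click"] / queries
revisit_rate = counts["revisit"] / queries
print(f"citation clicks per query: {citation_engagement:.2f}")
print(f"revisits per query: {revisit_rate:.2f}")
```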
Evaluation and policy
- Standard benchmarking (isolated question answering) may misrepresent real-world value; the released taxonomy and dataset enable evaluations that reflect actual intents and workflows, informing procurement and regulatory assessment.
- Persistent artifacts and citation behaviors raise questions about provenance, reproducibility, and attribution—implications for academic norms, incentives, and IP policy.
Research and public-good considerations
- Release of an anonymized interaction dataset lowers barriers for independent study of economic impacts and tool design, enabling better calibration of models of adoption, productivity, and labor-market effects.
- Future empirical work should quantify effects on publication output, time-to-discovery, research quality, and labor demand within research organizations.
Suggested next empirical steps for economists
- Link interaction logs to researcher outputs (papers, grants) to estimate causal effects on productivity and quality.
- Quantify heterogeneous returns by experience, field, and role to assess distributional impacts.
- Measure substitution vs. complementarity by tracking task allocation and staffing needs over time.
Assessment
Claims (7)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| The Asta Interaction Dataset comprises over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform. | Other | null_result | high | size and composition of dataset (number of queries, tools included) | n=200000; 0.3 |
| Users submit longer and more complex queries than in traditional search. | Research Productivity | positive | medium | query length and complexity | n=200000; 0.11 |
| Users treat the system as a collaborative research partner, delegating tasks such as drafting content and identifying research gaps. | Research Productivity | positive | medium | frequency of delegation behaviors (drafting content, gap identification) in user interactions | n=200000; 0.11 |
| Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways. | Research Productivity | positive | medium | revisit and navigation behavior (frequency of revisits, non-linear navigation patterns) | n=200000; 0.11 |
| With experience, users issue more targeted queries and engage more deeply with supporting citations. | Research Productivity | positive | medium | targeted query frequency and citation engagement over user experience/time | n=200000; 0.11 |
| Keyword-style queries persist even among experienced users. | Research Productivity | mixed | medium | prevalence of keyword-style queries by user experience level | n=200000; 0.11 |
| We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation. | Other | null_result | high | data and taxonomy release | 0.3 |