Getting Started

This guide walks you through installing ScholarScout on your computer, step by step. No programming experience needed. If you can copy-paste text, you can do this.

What is ScholarScout?

ScholarScout is a free tool that reads academic papers from 8 databases and uses AI to generate research ideas, product concepts, or feature suggestions. It runs on your own computer (not in the cloud), so your data stays private.

What you need before starting

A computer running Windows, Mac, or Linux
Internet connection (to download papers and talk to the AI)
About 10 minutes for the initial setup

You do NOT need to know how to code. Just follow the steps below.

Step 1: Install Python

Python is the programming language ScholarScout is built with. You need it installed on your computer.

Windows

Go to python.org/downloads
Click the big yellow "Download Python" button
Run the downloaded file
IMPORTANT: Check the box that says "Add Python to PATH" at the bottom of the installer window
Click "Install Now"
Wait for it to finish, then close the installer

Mac

Go to python.org/downloads
Click the big yellow "Download Python" button
Open the downloaded .pkg file
Follow the installer steps (just keep clicking "Continue" and "Agree")
Click "Install"

Linux (Ubuntu/Debian)

Open a terminal and type:

sudo apt update
sudo apt install python3 python3-pip python3-venv

Check if Python is installed correctly

Open a terminal (or Command Prompt on Windows) and type:

python --version

You should see something like Python 3.12.4. Any number 3.10 or higher is fine. If you see an error on Mac/Linux, try python3 --version instead.

Step 2: Download ScholarScout

You have two options:

Option A: Download as ZIP (easiest)

Go to github.com/neej4/ScholarScout
Click the green "Code" button
Click "Download ZIP"
Extract the ZIP file to a folder you can find easily (like your Desktop or Documents)

Option B: Use Git (if you have it installed)

git clone https://github.com/neej4/ScholarScout.git

Step 3: Install dependencies

ScholarScout needs some extra Python packages to work. Here is how to install them:

Windows

Open File Explorer and navigate to the ScholarScout folder
Click on the address bar at the top (where it shows the folder path)
Type cmd and press Enter. This opens a command window in that folder.
Type the following and press Enter:

pip install -r requirements.txt

Mac / Linux

Open Terminal
Navigate to the ScholarScout folder. For example, if it is on your Desktop:

cd ~/Desktop/ScholarScout
pip3 install -r requirements.txt

Wait for all packages to download and install. This may take 1-2 minutes.

Step 4: Get a free AI API key

ScholarScout needs an AI service to generate ideas. The easiest free option is Google Gemini:

Go to aistudio.google.com/app/apikey
Sign in with your Google account
Click "Create API Key"
Copy the key (it looks like a long string of random letters and numbers)
Keep this key somewhere safe. You will paste it into ScholarScout in the next step.

Tip: Gemini gives you 15 free requests per minute. That is more than enough for normal use. If you want an alternative, Groq is also free — see the LLM Providers section.

Step 5: Start ScholarScout

In the same terminal/command window from Step 3:

python preview_server.py

(On Mac/Linux, use python3 preview_server.py if python does not work.)

You should see a message saying the server is running. Now open your web browser and go to:

http://localhost:5050

Step 6: First-time setup wizard

When you open ScholarScout for the first time, a setup wizard appears. Follow these steps:

Choose a provider: Select "Gemini" (or whichever provider you got a key for)
Paste your API key: Paste the key you copied in Step 4
Test connection: Click the "Test" button. You should see a green checkmark.
Pick categories: Choose 2-3 research fields you are interested in (e.g., "Machine Learning", "Medicine", "Agriculture")
Done! Click finish to close the wizard

Step 7: Generate your first ideas

You are now ready to use ScholarScout. You have two buttons:

Quick — Generates ideas instantly (~10 seconds) using AI knowledge. Great for a first test.
Run — Fetches fresh papers from academic databases, analyzes trends, then generates ideas (2-5 minutes). More thorough and grounded in real papers.

Try clicking Quick first to see how it works. Then try Run for the full experience.

Troubleshooting

Problem	Solution
`python` is not recognized	On Windows: reinstall Python and make sure "Add to PATH" is checked. On Mac/Linux: try `python3` instead.
`pip install` fails	Try `pip3 install -r requirements.txt` or `python -m pip install -r requirements.txt`
Port 5050 already in use	Another program is using that port. Close it, or edit `preview_server.py` and change the port number.
"LLM unreachable" error	Your API key might be wrong. Go to Settings tab and re-paste your key. Click "Test Connection".
0 papers fetched	You hit a rate limit. Wait 5 minutes and try again, or use Quick mode (which does not fetch papers).
Page is blank / nothing loads	Make sure the terminal still shows the server running. If it crashed, run `python preview_server.py` again.

Four Modes

ScholarScout can generate four different types of output from the same papers. Think of it like asking four different experts to read the same research and give you different kinds of advice.

Academic Mode

Who is this for? Students (undergraduate, masters, PhD), researchers, anyone writing a thesis or paper.

What it produces:

Research topic suggestions with clear research questions
Suggested methodology (how to actually do the research)
Key papers you should read
Novelty check (is this idea actually new?)
Quality score based on feasibility and originality

Available goals:

Any — General exploration, no constraints
Thesis — Scoped for a thesis project (6-12 months)
Publication — Aimed at publishable research
Grant Proposal — Framed for funding applications

Example: A medical student selects categories "Medicine" and "Machine Learning", sets goal to "Thesis". ScholarScout reads recent papers and suggests: "Predicting antibiotic resistance patterns using transformer models on hospital lab data — gap: no existing study combines temporal lab sequences with resistance metadata."

Product Mode

Who is this for? Entrepreneurs, developers, hackathon participants, anyone who wants to build something.

What it produces:

Product name and one-line description
MVP features (minimum viable product — what to build first)
Suggested tech stack
Revenue model ideas
Existing competitors and how your idea differs

Available goals:

Hackathon — Buildable in 24-48 hours
Side Project — Weekend/hobby scope
AI Tool — AI-powered product ideas
Industry R&D — Enterprise-scale research and development

Example: A developer selects "Natural Language Processing" category, sets goal to "Hackathon". ScholarScout suggests: "PaperBrief — a browser extension that summarizes any arXiv paper into a 3-bullet TL;DR using the retrieval-augmented approach from [recent paper]."

Develop Mode

Who is this for? Developers who already have a project and want to improve it using ideas from recent research.

What it produces:

Feature suggestions directly applicable to YOUR project
Integration opportunities (connect your project with new techniques)
Performance optimizations based on recent papers

Available goals:

Feature — New functionality to add
Integration — Connect with external systems or techniques
Optimization — Make existing features faster or better
Extension — Expand scope of your project
Pivot — Explore new directions for your project

Important: In Develop mode, you MUST describe your project in the "Context" field. Every generated idea will be specifically about improving your project. If you leave the context empty, the results will be generic.

Example: You describe your project as "A mobile app for tracking plant growth using phone camera photos." ScholarScout reads computer vision papers and suggests: "Add disease detection using the few-shot learning approach from [paper] — requires only 5 example images per disease type."

Review Mode

Who is this for? Anyone writing a literature review, systematic review, or survey paper. Also useful for getting a quick overview of a new field.

What it produces:

Papers automatically clustered into thematic groups
Per-cluster synthesis: common methodologies, key findings, research gaps
Cross-cutting analysis: field timeline, active debates, open questions
Curated reading list ranked by citation impact

How it works differently: Review mode does NOT generate new ideas. Instead, it organizes and synthesizes existing papers. The pipeline has 6 phases: Validate → Fetch → Cluster → Synthesize → Cross-cutting → Save.

Example: You select "cs.AI" and "cs.CL" categories, set mode to Review with goal "Literature Synthesis". ScholarScout fetches 40+ papers, clusters them into 5 themes (e.g., "Retrieval-Augmented Generation", "Instruction Tuning", "Multi-Agent Systems"), and produces a synthesis per cluster with gaps and open questions.

How to choose the right mode

Your situation	Mode	Goal
I need a thesis topic	Academic	Thesis
I want to explore what is new in my field	Academic	Any
I have a hackathon this weekend	Product	Hackathon
I want to build a startup idea	Product	Side Project or AI Tool
I have an app and want new features	Develop	Feature
My project is slow, I want to optimize it	Develop	Optimization
I need to write a literature review	Review	Literature Synthesis
I want to understand a new field quickly	Review	Literature Synthesis

Data Sources

ScholarScout fetches papers from 8 academic databases. You do not need to configure anything — the system automatically picks the best 3-4 databases based on your chosen research categories.

The 8 databases

Source	Size	API Key Needed?	What it covers
arXiv	2.4M+ preprints	No	Computer Science, Physics, Mathematics. Full-text open access preprints (papers before peer review).
OpenAlex	250M+ works	No	All academic fields. The largest open catalog of scholarly works. Very reliable metadata.
Semantic Scholar	200M+ papers	Optional	All fields. Especially good for citation data (who cites whom). Works without a key but has rate limits.
PubMed	36M+ articles	No	Biomedical and life sciences. The go-to database for medicine, nursing, pharmacy, biology.
Crossref	150M+ DOI records	No	All fields. Covers most published journal articles. Good for finding DOIs and publication metadata.
DOAJ	9M+ open access	No	Social sciences, agriculture, education, regional journals. Only open-access articles.
Scopus	90M+ records	Yes (free for academics)	Engineering, chemistry, materials science. Strong citation metrics. Requires a free API key from Elsevier.
DBLP	6M+ CS papers	No	Computer Science conference papers (NeurIPS, ICML, ACL, CVPR, etc). Very focused on CS.

Smart source routing

When you select research categories, ScholarScout automatically picks the best databases for your field. Here is how it decides:

Your field	Databases used
Computer Science, Statistics, Electrical Engineering	arXiv + Semantic Scholar + OpenAlex + DBLP
Medicine	PubMed + Semantic Scholar + Crossref + Scopus
Biology	PubMed + OpenAlex + Crossref + DOAJ
Physics	arXiv + Semantic Scholar + OpenAlex + Crossref
Engineering	Crossref + OpenAlex + Semantic Scholar + Scopus
Chemistry	Crossref + OpenAlex + Semantic Scholar + PubMed
Mathematics	arXiv + Semantic Scholar + OpenAlex + Crossref
Social Sciences	Crossref + OpenAlex + DOAJ + Semantic Scholar
Earth Sciences, Agriculture	Crossref + OpenAlex + DOAJ + PubMed

This means if you are a medical researcher, ScholarScout will NOT waste time searching arXiv (which mostly has CS/Physics papers). It goes straight to PubMed and other medical databases.

Do I need API keys for the databases?

Short answer: No. Six out of eight databases work without any key. ScholarScout works out of the box.

The two optional keys are:

Semantic Scholar (S2_API_KEY) — Without a key, you get 100 requests per 5 minutes. With a free key, you get 10,000 requests per 5 minutes. Only matters if you run many searches quickly.
Scopus (SCOPUS_API_KEY) — Required to use Scopus at all. Free for academic/research use. If you do not set this key, Scopus is simply skipped (the other 7 databases still work).

See the Optional API Keys section for setup instructions.

LLM Providers

ScholarScout uses an AI language model (LLM) to analyze papers and generate ideas. You need to connect it to at least one AI provider. There are free options available.

You only need ONE provider. Pick one and follow the steps below.

Gemini (recommended for beginners)

Google's AI model. Free, fast, and the easiest to set up.

Cost: Free (15 requests per minute)
Speed: Fast
Best for: Most users. The free tier is generous enough for daily use.

How to get your free Gemini API key

Go to https://aistudio.google.com/app/apikey
Sign in with your Google account (any Gmail account works)
Click "Create API Key"
If asked to select a project, click "Create API key in new project"
Your key will appear. It looks something like: AIzaSyB... (a long string of letters and numbers)
Click the copy button next to the key
Go back to ScholarScout, open Settings, select "Gemini" as provider, and paste your key
Click "Test Connection" to verify it works

Note: Gemini's free tier gives you 15 requests per minute and 1,500 per day. A single ScholarScout "Run" uses about 3-5 requests. You can comfortably run it many times per day.

Groq (fast alternative)

Groq runs AI models on specialized hardware, making it very fast. Also has a free tier.

Cost: Free tier available
Speed: Very fast (often faster than Gemini)
Best for: Users who want speed, or as a backup when Gemini is rate-limited.

How to get your free Groq API key

Go to https://console.groq.com/keys
Create an account (you can sign up with Google, GitHub, or email)
Once logged in, you will see the API Keys page
Click "Create API Key"
Give it a name (anything, like "ScholarScout")
Copy the key that appears (you will only see it once!)
Go back to ScholarScout, open Settings, select "Groq" as provider, and paste your key
Click "Test Connection" to verify it works

Ollama (fully local, no internet needed)

Ollama runs AI models directly on your computer. No data leaves your machine. Good for privacy-sensitive research.

Cost: Free (uses your computer's processing power)
Speed: Depends on your computer's GPU. Slow on older machines.
Best for: Users who need complete privacy or do not have reliable internet.

How to set up Ollama

Go to https://ollama.com/download
Download and install Ollama for your operating system
Open a terminal and run: ollama pull llama3.2 (this downloads the AI model, about 2-4 GB)
Keep Ollama running in the background
In ScholarScout Settings, select "Ollama" as provider. No API key needed.

Hardware note: Ollama works best with a dedicated GPU (NVIDIA with 8GB+ VRAM). On a laptop without a GPU, responses will be slow (30-60 seconds per request instead of 2-3 seconds).

OpenRouter (access to 100+ models)

A gateway that lets you use many different AI models through one API key. Pay-per-use.

Cost: Pay per token (varies by model, some are very cheap)
Speed: Varies by model
Get key: https://openrouter.ai/keys
Best for: Power users who want to try different models.

OpenAI (GPT models)

The company behind ChatGPT. High quality but costs money.

Cost: Pay per token (about $0.01-0.03 per ScholarScout run)
Speed: Fast
Get key: https://platform.openai.com/api-keys
Best for: Users who already have an OpenAI account and want high-quality output.

Custom endpoint

Any AI service that uses the OpenAI-compatible API format. This includes LM Studio, vLLM, and other local AI servers.

Base URL: Your server address (e.g., http://localhost:1234/v1)
Model: Whatever model your server is running
API key: Optional (most local servers do not need one)

Which provider should I choose?

Priority	Choose
I want free and easy	Gemini
I want free and fast	Groq
I need complete privacy	Ollama
I want the best quality	OpenAI (paid)
I want to experiment with models	OpenRouter

Understanding Results

After ScholarScout finishes running, you will see a list of idea cards. This section explains what everything means.

Idea cards

Each card represents one generated idea. Here is what you will see on each card:

Title — A short name for the idea
Description — A 2-3 sentence summary of what the idea is about
Source paper(s) — The academic paper(s) that inspired this idea
Category — Which research field this belongs to
Quality score — A number indicating how promising the idea is

In Academic mode, cards also show methodology suggestions and research questions. In Product mode, cards show MVP features and tech stack. In Develop mode, cards show how the idea applies to your specific project.

Quality score

Each idea gets a quality score from 1 to 10. This is the AI's estimate of how good the idea is, based on:

Novelty — Is this idea actually new? Or has it been done before?
Feasibility — Can this realistically be done with available resources?
Impact — If successful, would this matter to the field?
Clarity — Is the idea well-defined enough to act on?

A score of 7+ is generally a strong idea worth exploring further. Scores of 4-6 are decent but may need refinement. Below 4 means the idea is vague or already well-explored.

Remember: The quality score is an AI estimate, not a guarantee. A score of 9 does not mean the idea will definitely work. Always apply your own judgment.

Novelty check

ScholarScout checks whether your generated idea already exists in the literature. It does this by:

Comparing the idea's text against titles and abstracts of papers in the database
Using both semantic similarity (meaning) and keyword overlap (exact words)
Flagging ideas that are too similar to existing work

If an idea is flagged as "low novelty," it means similar research already exists. This does not mean the idea is bad — it means you should read those existing papers first and find what makes your angle different.

Deep Dive

Click on any idea card to open the Deep Dive view. This gives you a detailed breakdown:

Research outline — Step-by-step plan for how to pursue this idea
Methodology — Specific methods, tools, or approaches to use
Expected challenges — What might go wrong and how to handle it
Related work — Other papers and projects in this space
Timeline estimate — How long this might take

Deep Dive is generated on-demand (when you click the card), so it takes a few seconds to load.

Grounding badges

In the Deep Dive view, you may see colored badges next to each section. These are "grounding indicators" that tell you how closely the AI's output matches the source paper:

Badge color	Meaning	What to do
Green (Source-aligned)	This section closely reflects what the source paper actually says	Good. The AI is sticking to the facts from the paper.
Yellow (Partially aligned)	Some claims are supported by the paper, but some may be inferred or extrapolated	Reasonable, but double-check specific claims against the original paper.
Red (Low alignment)	This section may not be directly supported by the source paper	The AI may be generating from general knowledge rather than the paper. Verify manually.

Important: Grounding badges measure topical similarity between the Deep Dive text and the source paper's abstract. They do NOT measure factual accuracy. A green badge means the AI is talking about the same topic as the paper — not that every statement is correct. Always read the original paper.

Bookmarks

Click the bookmark icon on any idea card to save it for later. Bookmarks are stored in your browser (not on a server), so they persist between sessions but are specific to your browser.

Tips & Tricks

These are best practices from experienced users to help you get better results from ScholarScout.

Start with Quick mode

Before running a full pipeline (which takes 2-5 minutes), try Quick mode first. It generates ideas in about 10 seconds using the AI's existing knowledge. This helps you:

Verify your API key works
See what kind of output to expect
Refine your category selection before committing to a full run

Pick 2-3 categories maximum

It is tempting to select many categories, but fewer is better:

More categories = more API requests = higher chance of hitting rate limits
More categories = longer processing time
Focused searches produce more relevant ideas

If you want to explore broadly, do multiple runs with different category combinations rather than selecting everything at once.

Use the context field

The context field (in your Profile settings) dramatically improves results. Tell ScholarScout about:

Your research background ("I am a 2nd year PhD student in computational biology")
Your constraints ("I have access to hospital EHR data but no wet lab")
Your interests ("I am interested in the intersection of NLP and clinical notes")
Your existing project (especially important for Develop mode)

The more specific your context, the more tailored the ideas will be.

Upload a document for better context

You can upload a PDF, text file, or markdown file as additional context. Good candidates:

Your thesis proposal draft
A paper you want to build upon
Your project's README file
A grant call description

The uploaded document is read by the AI and used to make ideas more relevant to your specific situation.

Run at different times of day

Free API tiers (Gemini, Groq, Semantic Scholar) have rate limits. If you get errors or 0 papers fetched:

Wait 5-10 minutes and try again
Try running during off-peak hours (early morning or late evening in US time zones)
Use Quick mode as a fallback (it does not fetch papers, so no rate limits from databases)

Combine modes for the same topic

Try running the same categories in different modes:

First, run in Academic mode to understand the research landscape
Then, run in Product mode to see what could be built from those papers
If you have a project, run in Develop mode to find applicable techniques

Each mode reads the same papers but asks different questions, giving you three different perspectives.

Use Deep Dive selectively

Deep Dive generates a detailed analysis for one idea. It costs one additional AI request per idea. Tips:

Only Deep Dive ideas with a quality score of 6 or higher
Read the short description first — if it does not interest you, skip the Deep Dive
Use Deep Dive results as a starting point, not a final plan

Check the source papers

Every idea links back to the paper(s) that inspired it. Always:

Read the abstract of the source paper
Check if the paper is from a reputable venue
Verify that the AI's interpretation matches what the paper actually says

ScholarScout is a brainstorming tool, not a fact-checker. The AI can misinterpret papers or make connections that do not hold up under scrutiny.

Save good ideas with bookmarks

Use the bookmark feature to save promising ideas. Then come back later with fresh eyes to evaluate them. Ideas that still seem good after a day or two are worth pursuing.

Frequently Asked Questions

General

What is ScholarScout?
ScholarScout is a free, open-source tool that reads academic papers from 8 databases and uses AI to generate research ideas, product concepts, or feature suggestions for your existing projects. It runs entirely on your own computer.

Is it really free?
Yes. ScholarScout itself is free and open source (MIT license). You need an AI API key to use it — both Google Gemini and Groq offer free tiers that are more than enough for regular use. You never need to pay anything.

Does it send my data to the cloud?
Your research queries and context are sent to the AI provider you choose (Gemini, Groq, etc.) for processing. Nothing else leaves your computer. There is no telemetry, no tracking, no analytics, no cloud storage. All generated ideas, cached papers, and session history stay on your machine.

What languages does it support?
The interface and output are in English. Papers are fetched in whatever language they are available in on the source databases (mostly English, but OpenAlex and Crossref include papers in many languages).

Do I need to know how to code?
No. You need to be able to copy-paste commands into a terminal (the Getting Started guide walks you through this step by step), but you do not need to write any code or understand programming.

Can I use it offline?
Partially. If you use Ollama (local AI) and only use Quick mode with cached papers, you can work offline. But fetching new papers from databases requires internet, and cloud AI providers (Gemini, Groq, OpenAI) require internet.

Usage questions

What is the difference between Quick and Run?

Run — Fetches fresh papers from academic databases (auto-selects best 3-4 per category), analyzes trends, then generates ideas. Takes 2-5 minutes. Results are grounded in real, recent papers.
Quick — Generates ideas instantly (~10 seconds) using the AI's knowledge and any cached papers from previous runs. Faster but less grounded.

Use Quick for fast brainstorming and testing. Use Run when you want thorough, paper-backed results.

Why did I get 0 papers or 0 ideas?
Usually this means you hit a rate limit. The most common cause is Semantic Scholar's free tier (100 requests per 5 minutes). Solutions:

Wait 5 minutes and try again
Use Quick mode (does not fetch papers)
Select fewer categories (each category triggers multiple API calls)
Get a free Semantic Scholar API key for higher limits (see Optional API Keys section)

Can I use my own local AI model?
Yes. Two options: (1) Ollama — download and run models locally with no internet needed after setup. (2) Custom endpoint — point ScholarScout at any OpenAI-compatible API server (LM Studio, vLLM, etc).

How do I change my AI provider or API key?
Click the Settings tab in the dashboard. You can change your provider, model, and API key at any time. Click "Test Connection" after making changes to verify everything works.

Where are my results saved?
Results are saved in the data/ folder inside the ScholarScout directory:

session_history.json — All your past sessions
snapshot_*.json — Detailed results from each run
papers_cache.json — Cached papers (used by Quick mode)
scholarscout_ideas_*.csv — Ideas exported as spreadsheet-compatible CSV

Can I export my ideas?
Yes. Each run automatically creates a CSV file in the data/ folder. You can open CSV files in Excel, Google Sheets, or any spreadsheet program.

How do I update ScholarScout?
If you used Git to download it:

cd ScholarScout
git pull
pip install -r requirements.txt

If you downloaded the ZIP: download the latest ZIP from GitHub, extract it, and replace your old files (keep your data/ folder and config.yaml to preserve your settings and history).

The setup wizard does not appear. How do I reconfigure?
The wizard only shows on first launch. To access settings later, use the Settings tab in the dashboard. If you want to force the wizard to appear again, clear your browser's localStorage for localhost:5050.

Quality and accuracy

Are the generated ideas guaranteed to be novel?
No. ScholarScout includes a novelty check that compares ideas against existing papers, but this is not exhaustive. Always do your own literature review before committing to an idea. Think of ScholarScout as a brainstorming partner, not a novelty guarantee.

Can I trust the Deep Dive analysis?
Deep Dive is AI-generated and should be treated as a starting point, not a final plan. Check the grounding badges — green means the content closely matches the source paper. Yellow and red sections may include AI extrapolation. Always verify claims against the original papers.

Why do some ideas seem generic or obvious?
This can happen when: (1) your context field is empty or too vague, (2) you selected very broad categories, or (3) the AI model is not powerful enough. Try adding more specific context, narrowing your categories, or switching to a stronger model.

Activity Center

The Activity Center is a popup that appears when you click Run. It shows you exactly what the pipeline is doing in real time.

Layout

The popup has three areas:

Left: Owl Chase game — A pixel art owl runs through a forest and catches paper dots as they are fetched. Each dot represents a real paper from one of the 8 databases. You can click or press Space to make the owl jump.
Right: Pipeline status — Shows the current phase, console log, and an LLM Chat tab that narrates what the AI is doing in casual language.
Bottom: Live graph — Paper dots grouped by category (or by cluster in Review mode). Dots animate into position as papers are fetched and clustered.

Pipeline phases

The phase list adapts to the mode you selected:

Default mode (5 phases)	Review mode (6 phases)
Validate LLM	Validate LLM
Fetch papers	Fetch papers
Analyze trends	Cluster papers
Generate ideas	Synthesize clusters
Save results	Cross-cutting analysis
	Save results

Mini notification bar

If you close the Activity Center while the pipeline is still running, a small notification bar appears at the bottom of the screen. Click it to reopen the Activity Center without losing progress.

The owl game

The owl game is not just decoration. Each paper dot that spawns corresponds to a real paper being fetched from a database. The dot colors indicate the source:

White = arXiv
Gray = OpenAlex
Blue = Semantic Scholar
Green = PubMed
Purple = Crossref
Teal = DOAJ
Orange = Scopus / DBLP

When the pipeline finishes, the owl does a celebration animation. Your score (papers caught / total) is shown in the top-right corner.

Custom Skills

Skills are text files that tell the AI what kind of output to produce. They act as "personality profiles" for the idea generator. ScholarScout comes with built-in skills, but you can create your own.

How skills work

When you select a goal (like "Thesis" or "Hackathon"), ScholarScout loads a corresponding skill file and adds it to the AI prompt. The skill file contains instructions like:

What constraints to follow (timeline, budget, scope)
What the output should look like
What to avoid (anti-patterns)
What makes a good result for this specific goal

For example, the "Hackathon" skill tells the AI: "Ideas must be buildable in 24-48 hours by a small team, use freely available tools, and have a clear demo-able output."

Built-in skills

skills/
├── ACADEMIC/                  (research-oriented goals)
│   ├── UNDERGRADUATE/SKILL.md
│   ├── MASTERS/SKILL.md
│   ├── PHD/SKILL.md
│   ├── THESIS/SKILL.md
│   ├── PUBLICATION/SKILL.md
│   ├── GRANT_PROPOSAL/SKILL.md
│   ├── LAB_SCIENTIST/SKILL.md
│   ├── CLINICAL_RESEARCHER/SKILL.md
│   └── DATA_SCIENTIST/SKILL.md
├── PRODUCT/                   (build something new)
│   ├── HACKATHON/SKILL.md
│   ├── SIDE_PROJECT/SKILL.md
│   ├── AI_TOOL/SKILL.md
│   └── INDUSTRY_RND/SKILL.md
└── DEVELOP/                   (improve existing project)
    ├── FEATURE/SKILL.md
    ├── INTEGRATION/SKILL.md
    ├── OPTIMIZATION/SKILL.md
    ├── EXTENSION/SKILL.md
    └── PIVOT/SKILL.md

Creating your own skill

You can create a custom skill for your specific needs. For example, if you are writing a specific type of grant proposal, you can create a skill that tailors the output to that format.

Step-by-step

Decide which mode your skill belongs to:
- skills/ACADEMIC/ — for research goals
- skills/PRODUCT/ — for building products
- skills/DEVELOP/ — for improving existing projects
Create a new folder with your skill name (use UPPERCASE and underscores):
```
skills/ACADEMIC/MY_CUSTOM_SKILL/
```
Create a file called SKILL.md inside that folder
Write your skill instructions in that file (see template below)
Restart ScholarScout — your skill will automatically appear as a goal option

Skill file template

# [Your Skill Name]

## Profile
- Duration: [how long the project should take]
- Resources: [what the user has access to]
- Budget: [financial constraints]
- Scope: [how big/small the project should be]

## Constraints
- Ideas MUST [requirement 1]
- Ideas MUST [requirement 2]
- Ideas MUST NOT [anti-pattern 1]

## Output expectations
- Each idea should include [specific elements]
- A good idea for this goal looks like [description]

## Anti-patterns (avoid these)
- [Common mistake 1]
- [Common mistake 2]

Tips for writing good skills

Keep it under 2000 characters (longer files get truncated)
Be specific about constraints — vague instructions produce vague output
Include examples of what a good result looks like
List anti-patterns (what NOT to do) — this helps the AI avoid common mistakes

Sharing skills

If you create a useful skill (e.g., "How to write an NIH R01 grant proposal" or "Agricultural extension project for developing countries"), consider sharing it with the community by submitting a pull request on GitHub. Skills are the easiest way to contribute to ScholarScout.

Architecture

This section explains how ScholarScout works internally. Useful if you want to contribute code or understand what happens when you click "Run".

Pipeline flow

When you click "Run", here is what happens behind the scenes:

User clicks Run
  → Browser sends request to /api/run
  → Server spawns a background process (run_pipeline.py)
  → Orchestrator takes over:
      → Phase 1: Fetch papers
          (queries 3-4 databases based on your categories)
      → Phase 2: Analyze trends
          (AI reads papers, identifies keywords, gaps, saturation)
      → Phase 3: Generate ideas
          (AI generates ideas in academic/product/develop mode)
      → Phase 4: Write output
          (saves CSV + JSON snapshot + updates session history)
  → Progress is streamed back to the browser in real-time (SSE)

Key modules

Module	What it does
`preview_server.py`	Entry point. Starts the web server. Very small file (~40 lines).
`src/core/orchestrator.py`	Pipeline controller. Coordinates all four phases in sequence.
`src/core/analyzer.py`	Trend analysis. Sends papers to the AI and asks for keywords, gaps, and saturation levels.
`src/core/generator.py`	Idea generation. Four modes (academic, product, develop, review). Handles chunked generation for large outputs.
`src/core/deep_dive.py`	Detailed analysis per idea. Also handles grounding verification (comparing output to source paper).
`src/core/novelty_checker.py`	Checks if an idea already exists. Uses semantic similarity and keyword overlap.
`src/core/llm.py`	Multi-provider AI client. Handles Gemini, Groq, OpenAI, Ollama, OpenRouter, and custom endpoints.
`src/core/config.py`	Configuration management. Reads config.yaml, handles feature flags and paths.
`src/core/fetchers/`	Paper fetching. One file per database (arxiv_fetcher.py, pubmed_fetcher.py, etc). All inherit from BaseFetcher.

Web routes

Route file	Endpoints
`src/web/routes/pipeline.py`	`/api/run`, `/api/progress` — Start pipeline, stream progress
`src/web/routes/ideas.py`	`/api/quick` — Quick mode idea generation
`src/web/routes/analysis.py`	`/api/deepdive`, `/api/novelty` — Deep Dive and novelty check
`src/web/routes/settings.py`	`/api/settings` — Read/write config, test connection
`src/web/routes/sessions.py`	`/api/sessions` — Session history
`src/web/routes/upload.py`	`/api/upload` — File upload for context

Data files

File	Purpose
`data/papers_cache.json`	Cached papers from previous runs (used by Quick mode)
`data/session_history.json`	Record of all past sessions
`data/pipeline_progress.jsonl`	Real-time progress log (read by SSE stream)
`data/snapshot_*.json`	Full results from each run
`config.yaml`	User settings (provider, API key, feature flags)

Feature flags

Advanced features can be toggled on/off in config.yaml under the features: section, or via environment variables:

FEATURE_REFINE / SCOUT_REFINE=1 — Self-distillation (AI refines its own output)
FEATURE_SENSITIVITY / SCOUT_SENSITIVITY=1 — Prompt sensitivity check
FEATURE_GROUNDING / SCOUT_GROUNDING=1 — Deep Dive grounding verification
CACHE_EXPIRY_DAYS — How many days before cached papers expire

Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run unit tests only
pytest tests/ -m "not integration"

# Run all tests (needs Flask)
pytest tests/

# Run JavaScript tests
npm test

Optional API Keys

ScholarScout works without any database API keys. However, two optional keys can improve your experience by unlocking additional databases or higher rate limits.

Semantic Scholar API Key (S2_API_KEY)

What it does: Increases your Semantic Scholar rate limit from 100 requests per 5 minutes to 10,000 requests per 5 minutes. Without this key, Semantic Scholar still works — you just might hit rate limits if you run many searches quickly.

Cost: Free

How to get it

Go to https://www.semanticscholar.org/product/api
Click "Get API Key" or "Request API Key"
Fill in the form (name, email, what you are using it for — just say "academic research tool")
You will receive your API key by email (usually within a few minutes)

How to set it up

Windows (Command Prompt):

set S2_API_KEY=your-key-here
python preview_server.py

Windows (PowerShell):

$env:S2_API_KEY="your-key-here"
python preview_server.py

Mac / Linux:

export S2_API_KEY="your-key-here"
python3 preview_server.py

To make it permanent (so you do not have to type it every time):

Windows: Search for "Environment Variables" in the Start menu, click "Edit the system environment variables", click "Environment Variables", then add a new User variable with name S2_API_KEY and your key as the value.
Mac/Linux: Add export S2_API_KEY="your-key-here" to your ~/.bashrc or ~/.zshrc file.

Scopus API Key (SCOPUS_API_KEY)

What it does: Enables the Scopus database (90M+ records covering engineering, chemistry, materials science, and more). Without this key, Scopus is simply skipped — the other 7 databases still work fine.

Cost: Free for academic and research use

How to get it

Go to https://dev.elsevier.com/
Click "Register" to create an account
After registering, go to "My API Key" section
Click "Create API Key"
Fill in the details:
- Label: anything (e.g., "ScholarScout")
- Website: can be left blank or use your institution's website
Your API key will be displayed. Copy it.

Note: Elsevier's free API access is intended for academic and research purposes. Commercial use requires a separate agreement. If you are a student or researcher at a university, you qualify for free access.

How to set it up

Windows (Command Prompt):

set SCOPUS_API_KEY=your-key-here
python preview_server.py

Windows (PowerShell):

$env:SCOPUS_API_KEY="your-key-here"
python preview_server.py

Mac / Linux:

export SCOPUS_API_KEY="your-key-here"
python3 preview_server.py

To make it permanent, follow the same steps as described for S2_API_KEY above.

Do I really need these keys?

For most users: No. The 6 free databases (arXiv, OpenAlex, PubMed, Crossref, DOAJ, DBLP) plus the free tier of Semantic Scholar cover the vast majority of academic fields. You only need these optional keys if:

You are running many searches in quick succession and hitting Semantic Scholar rate limits
You specifically need Scopus data (engineering, chemistry, materials science with citation metrics)

Contributing

ScholarScout is open source and welcomes contributions. Here are the main ways you can help.

Ways to contribute

Contribution type	Difficulty	Description
Write a skill file	Easy	Create a SKILL.md for your domain expertise (no coding needed)
Report a bug	Easy	Open an issue on GitHub describing what went wrong
Improve prompts	Medium	Make the AI generate better output by refining prompt templates
Add a fetcher	Medium	Add support for a new academic database
Fix a bug	Medium	Pick an open issue and submit a fix
Add a feature	Hard	Implement a new capability

Adding a new fetcher (new database)

If you know of an academic database that ScholarScout does not support yet, you can add it:

Create a new file: src/core/fetchers/my_fetcher.py
Make it inherit from BaseFetcher (see src/core/fetchers/base.py for the interface)
Implement the fetch_papers(category, max_results) method that returns a list of Paper objects
Add a category-to-keyword mapping (so the fetcher knows what to search for each category)
Register your fetcher in src/core/orchestrator.py:
- Add it to self.all_fetchers dictionary
- Add it to relevant entries in self._source_routes
Make sure your fetcher returns [] (empty list) gracefully if the API key is missing or the category is not supported

Improving prompts

The AI prompts are in these files:

src/core/analyzer.py — Trend analysis prompts
src/core/generator.py — Idea generation prompts (one per mode)
src/core/deep_dive.py — Deep Dive analysis prompts

When improving prompts:

Test with at least 2 different providers (Gemini + Groq recommended)
Make sure the output format does not change (other code parses the AI's response)
Keep prompts concise — longer prompts cost more tokens and can confuse smaller models
Document what you changed and why in your pull request

Development setup

# Clone the repository
git clone https://github.com/neej4/ScholarScout.git
cd ScholarScout

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/ -m "not integration"

# Start the server
python preview_server.py

Pull request guidelines

Create a new branch for your changes (do not commit directly to main)
Keep changes focused — one feature or fix per pull request
Add tests for new functionality
Make sure existing tests still pass
Write a clear description of what you changed and why

Reporting bugs

Open an issue at github.com/neej4/ScholarScout/issues with:

What you expected to happen
What actually happened
Steps to reproduce the problem
Your operating system and Python version
Which AI provider you are using
Any error messages from the terminal