Getting Started

This guide walks you through installing ScholarScout on your computer, step by step. No programming experience needed. If you can copy-paste text, you can do this.

What is ScholarScout?

ScholarScout is a free tool that reads academic papers from 8 databases and uses AI to generate research ideas, product concepts, or feature suggestions. It runs on your own computer (not in the cloud), so your data stays private.

What you need before starting

  • A computer running Windows, Mac, or Linux
  • Internet connection (to download papers and talk to the AI)
  • About 10 minutes for the initial setup

You do NOT need to know how to code. Just follow the steps below.

Step 1: Install Python

Python is the programming language ScholarScout is built with. You need it installed on your computer.

Windows

  1. Go to python.org/downloads
  2. Click the big yellow "Download Python" button
  3. Run the downloaded file
  4. IMPORTANT: Check the box that says "Add Python to PATH" at the bottom of the installer window
  5. Click "Install Now"
  6. Wait for it to finish, then close the installer

Mac

  1. Go to python.org/downloads
  2. Click the big yellow "Download Python" button
  3. Open the downloaded .pkg file
  4. Follow the installer steps (just keep clicking "Continue" and "Agree")
  5. Click "Install"

Linux (Ubuntu/Debian)

Open a terminal and type:

sudo apt update
sudo apt install python3 python3-pip python3-venv

Check if Python is installed correctly

Open a terminal (or Command Prompt on Windows) and type:

python --version

You should see something like Python 3.12.4. Any number 3.10 or higher is fine. If you see an error on Mac/Linux, try python3 --version instead.

Step 2: Download ScholarScout

You have two options:

Option A: Download as ZIP (easiest)

  1. Go to github.com/neej4/ScholarScout
  2. Click the green "Code" button
  3. Click "Download ZIP"
  4. Extract the ZIP file to a folder you can find easily (like your Desktop or Documents)

Option B: Use Git (if you have it installed)

git clone https://github.com/neej4/ScholarScout.git

Step 3: Install dependencies

ScholarScout needs some extra Python packages to work. Here is how to install them:

Windows

  1. Open File Explorer and navigate to the ScholarScout folder
  2. Click on the address bar at the top (where it shows the folder path)
  3. Type cmd and press Enter. This opens a command window in that folder.
  4. Type the following and press Enter:
pip install -r requirements.txt

Mac / Linux

  1. Open Terminal
  2. Navigate to the ScholarScout folder. For example, if it is on your Desktop:
cd ~/Desktop/ScholarScout
pip3 install -r requirements.txt

Wait for all packages to download and install. This may take 1-2 minutes.

Step 4: Get a free AI API key

ScholarScout needs an AI service to generate ideas. The easiest free option is Google Gemini:

  1. Go to aistudio.google.com/app/apikey
  2. Sign in with your Google account
  3. Click "Create API Key"
  4. Copy the key (it looks like a long string of random letters and numbers)
  5. Keep this key somewhere safe. You will paste it into ScholarScout in the next step.
Tip: Gemini gives you 15 free requests per minute. That is more than enough for normal use. If you want an alternative, Groq is also free — see the LLM Providers section.

Step 5: Start ScholarScout

In the same terminal/command window from Step 3:

python preview_server.py

(On Mac/Linux, use python3 preview_server.py if python does not work.)

You should see a message saying the server is running. Now open your web browser and go to:

http://localhost:5050

Step 6: First-time setup wizard

When you open ScholarScout for the first time, a setup wizard appears. Follow these steps:

  1. Choose a provider: Select "Gemini" (or whichever provider you got a key for)
  2. Paste your API key: Paste the key you copied in Step 4
  3. Test connection: Click the "Test" button. You should see a green checkmark.
  4. Pick categories: Choose 2-3 research fields you are interested in (e.g., "Machine Learning", "Medicine", "Agriculture")
  5. Done! Click finish to close the wizard

Step 7: Generate your first ideas

You are now ready to use ScholarScout. You have two buttons:

  • Quick — Generates ideas instantly (~10 seconds) using AI knowledge. Great for a first test.
  • Run — Fetches fresh papers from academic databases, analyzes trends, then generates ideas (2-5 minutes). More thorough and grounded in real papers.

Try clicking Quick first to see how it works. Then try Run for the full experience.

Troubleshooting

ProblemSolution
python is not recognized On Windows: reinstall Python and make sure "Add to PATH" is checked. On Mac/Linux: try python3 instead.
pip install fails Try pip3 install -r requirements.txt or python -m pip install -r requirements.txt
Port 5050 already in use Another program is using that port. Close it, or edit preview_server.py and change the port number.
"LLM unreachable" error Your API key might be wrong. Go to Settings tab and re-paste your key. Click "Test Connection".
0 papers fetched You hit a rate limit. Wait 5 minutes and try again, or use Quick mode (which does not fetch papers).
Page is blank / nothing loads Make sure the terminal still shows the server running. If it crashed, run python preview_server.py again.

Three Modes

ScholarScout can generate three different types of output from the same papers. Think of it like asking three different experts to read the same research and give you different kinds of advice.

Academic Mode

Who is this for? Students (undergraduate, masters, PhD), researchers, anyone writing a thesis or paper.

What it produces:

  • Research topic suggestions with clear research questions
  • Suggested methodology (how to actually do the research)
  • Key papers you should read
  • Novelty check (is this idea actually new?)
  • Quality score based on feasibility and originality

Available goals:

  • Any — General exploration, no constraints
  • Thesis — Scoped for a thesis project (6-12 months)
  • Publication — Aimed at publishable research
  • Grant Proposal — Framed for funding applications

Example: A medical student selects categories "Medicine" and "Machine Learning", sets goal to "Thesis". ScholarScout reads recent papers and suggests: "Predicting antibiotic resistance patterns using transformer models on hospital lab data — gap: no existing study combines temporal lab sequences with resistance metadata."

Product Mode

Who is this for? Entrepreneurs, developers, hackathon participants, anyone who wants to build something.

What it produces:

  • Product name and one-line description
  • MVP features (minimum viable product — what to build first)
  • Suggested tech stack
  • Revenue model ideas
  • Existing competitors and how your idea differs

Available goals:

  • Hackathon — Buildable in 24-48 hours
  • Side Project — Weekend/hobby scope
  • AI Tool — AI-powered product ideas
  • Industry R&D — Enterprise-scale research and development

Example: A developer selects "Natural Language Processing" category, sets goal to "Hackathon". ScholarScout suggests: "PaperBrief — a browser extension that summarizes any arXiv paper into a 3-bullet TL;DR using the retrieval-augmented approach from [recent paper]."

Develop Mode

Who is this for? Developers who already have a project and want to improve it using ideas from recent research.

What it produces:

  • Feature suggestions directly applicable to YOUR project
  • Integration opportunities (connect your project with new techniques)
  • Performance optimizations based on recent papers

Available goals:

  • Feature — New functionality to add
  • Integration — Connect with external systems or techniques
  • Optimization — Make existing features faster or better
  • Extension — Expand scope of your project
  • Pivot — Explore new directions for your project

Important: In Develop mode, you MUST describe your project in the "Context" field. Every generated idea will be specifically about improving your project. If you leave the context empty, the results will be generic.

Example: You describe your project as "A mobile app for tracking plant growth using phone camera photos." ScholarScout reads computer vision papers and suggests: "Add disease detection using the few-shot learning approach from [paper] — requires only 5 example images per disease type."

How to choose the right mode

Your situationModeGoal
I need a thesis topicAcademicThesis
I want to explore what is new in my fieldAcademicAny
I have a hackathon this weekendProductHackathon
I want to build a startup ideaProductSide Project or AI Tool
I have an app and want new featuresDevelopFeature
My project is slow, I want to optimize itDevelopOptimization

Data Sources

ScholarScout fetches papers from 8 academic databases. You do not need to configure anything — the system automatically picks the best 3-4 databases based on your chosen research categories.

The 8 databases

SourceSizeAPI Key Needed?What it covers
arXiv 2.4M+ preprints No Computer Science, Physics, Mathematics. Full-text open access preprints (papers before peer review).
OpenAlex 250M+ works No All academic fields. The largest open catalog of scholarly works. Very reliable metadata.
Semantic Scholar 200M+ papers Optional All fields. Especially good for citation data (who cites whom). Works without a key but has rate limits.
PubMed 36M+ articles No Biomedical and life sciences. The go-to database for medicine, nursing, pharmacy, biology.
Crossref 150M+ DOI records No All fields. Covers most published journal articles. Good for finding DOIs and publication metadata.
DOAJ 9M+ open access No Social sciences, agriculture, education, regional journals. Only open-access articles.
Scopus 90M+ records Yes (free for academics) Engineering, chemistry, materials science. Strong citation metrics. Requires a free API key from Elsevier.
DBLP 6M+ CS papers No Computer Science conference papers (NeurIPS, ICML, ACL, CVPR, etc). Very focused on CS.

Smart source routing

When you select research categories, ScholarScout automatically picks the best databases for your field. Here is how it decides:

Your fieldDatabases used
Computer Science, Statistics, Electrical EngineeringarXiv + Semantic Scholar + OpenAlex + DBLP
MedicinePubMed + Semantic Scholar + Crossref + Scopus
BiologyPubMed + OpenAlex + Crossref + DOAJ
PhysicsarXiv + Semantic Scholar + OpenAlex + Crossref
EngineeringCrossref + OpenAlex + Semantic Scholar + Scopus
ChemistryCrossref + OpenAlex + Semantic Scholar + PubMed
MathematicsarXiv + Semantic Scholar + OpenAlex + Crossref
Social SciencesCrossref + OpenAlex + DOAJ + Semantic Scholar
Earth Sciences, AgricultureCrossref + OpenAlex + DOAJ + PubMed

This means if you are a medical researcher, ScholarScout will NOT waste time searching arXiv (which mostly has CS/Physics papers). It goes straight to PubMed and other medical databases.

Do I need API keys for the databases?

Short answer: No. Six out of eight databases work without any key. ScholarScout works out of the box.

The two optional keys are:

  • Semantic Scholar (S2_API_KEY) — Without a key, you get 100 requests per 5 minutes. With a free key, you get 10,000 requests per 5 minutes. Only matters if you run many searches quickly.
  • Scopus (SCOPUS_API_KEY) — Required to use Scopus at all. Free for academic/research use. If you do not set this key, Scopus is simply skipped (the other 7 databases still work).

See the Optional API Keys section for setup instructions.

LLM Providers

ScholarScout uses an AI language model (LLM) to analyze papers and generate ideas. You need to connect it to at least one AI provider. There are free options available.

You only need ONE provider. Pick one and follow the steps below.

Gemini (recommended for beginners)

Google's AI model. Free, fast, and the easiest to set up.

  • Cost: Free (15 requests per minute)
  • Speed: Fast
  • Best for: Most users. The free tier is generous enough for daily use.

How to get your free Gemini API key

  1. Go to https://aistudio.google.com/app/apikey
  2. Sign in with your Google account (any Gmail account works)
  3. Click "Create API Key"
  4. If asked to select a project, click "Create API key in new project"
  5. Your key will appear. It looks something like: AIzaSyB... (a long string of letters and numbers)
  6. Click the copy button next to the key
  7. Go back to ScholarScout, open Settings, select "Gemini" as provider, and paste your key
  8. Click "Test Connection" to verify it works
Note: Gemini's free tier gives you 15 requests per minute and 1,500 per day. A single ScholarScout "Run" uses about 3-5 requests. You can comfortably run it many times per day.

Groq (fast alternative)

Groq runs AI models on specialized hardware, making it very fast. Also has a free tier.

  • Cost: Free tier available
  • Speed: Very fast (often faster than Gemini)
  • Best for: Users who want speed, or as a backup when Gemini is rate-limited.

How to get your free Groq API key

  1. Go to https://console.groq.com/keys
  2. Create an account (you can sign up with Google, GitHub, or email)
  3. Once logged in, you will see the API Keys page
  4. Click "Create API Key"
  5. Give it a name (anything, like "ScholarScout")
  6. Copy the key that appears (you will only see it once!)
  7. Go back to ScholarScout, open Settings, select "Groq" as provider, and paste your key
  8. Click "Test Connection" to verify it works

Ollama (fully local, no internet needed)

Ollama runs AI models directly on your computer. No data leaves your machine. Good for privacy-sensitive research.

  • Cost: Free (uses your computer's processing power)
  • Speed: Depends on your computer's GPU. Slow on older machines.
  • Best for: Users who need complete privacy or do not have reliable internet.

How to set up Ollama

  1. Go to https://ollama.com/download
  2. Download and install Ollama for your operating system
  3. Open a terminal and run: ollama pull llama3.2 (this downloads the AI model, about 2-4 GB)
  4. Keep Ollama running in the background
  5. In ScholarScout Settings, select "Ollama" as provider. No API key needed.
Hardware note: Ollama works best with a dedicated GPU (NVIDIA with 8GB+ VRAM). On a laptop without a GPU, responses will be slow (30-60 seconds per request instead of 2-3 seconds).

OpenRouter (access to 100+ models)

A gateway that lets you use many different AI models through one API key. Pay-per-use.

  • Cost: Pay per token (varies by model, some are very cheap)
  • Speed: Varies by model
  • Get key: https://openrouter.ai/keys
  • Best for: Power users who want to try different models.

OpenAI (GPT models)

The company behind ChatGPT. High quality but costs money.

  • Cost: Pay per token (about $0.01-0.03 per ScholarScout run)
  • Speed: Fast
  • Get key: https://platform.openai.com/api-keys
  • Best for: Users who already have an OpenAI account and want high-quality output.

Custom endpoint

Any AI service that uses the OpenAI-compatible API format. This includes LM Studio, vLLM, and other local AI servers.

  • Base URL: Your server address (e.g., http://localhost:1234/v1)
  • Model: Whatever model your server is running
  • API key: Optional (most local servers do not need one)

Which provider should I choose?

PriorityChoose
I want free and easyGemini
I want free and fastGroq
I need complete privacyOllama
I want the best qualityOpenAI (paid)
I want to experiment with modelsOpenRouter

Understanding Results

After ScholarScout finishes running, you will see a list of idea cards. This section explains what everything means.

Idea cards

Each card represents one generated idea. Here is what you will see on each card:

  • Title — A short name for the idea
  • Description — A 2-3 sentence summary of what the idea is about
  • Source paper(s) — The academic paper(s) that inspired this idea
  • Category — Which research field this belongs to
  • Quality score — A number indicating how promising the idea is

In Academic mode, cards also show methodology suggestions and research questions. In Product mode, cards show MVP features and tech stack. In Develop mode, cards show how the idea applies to your specific project.

Quality score

Each idea gets a quality score from 1 to 10. This is the AI's estimate of how good the idea is, based on:

  • Novelty — Is this idea actually new? Or has it been done before?
  • Feasibility — Can this realistically be done with available resources?
  • Impact — If successful, would this matter to the field?
  • Clarity — Is the idea well-defined enough to act on?

A score of 7+ is generally a strong idea worth exploring further. Scores of 4-6 are decent but may need refinement. Below 4 means the idea is vague or already well-explored.

Remember: The quality score is an AI estimate, not a guarantee. A score of 9 does not mean the idea will definitely work. Always apply your own judgment.

Novelty check

ScholarScout checks whether your generated idea already exists in the literature. It does this by:

  1. Comparing the idea's text against titles and abstracts of papers in the database
  2. Using both semantic similarity (meaning) and keyword overlap (exact words)
  3. Flagging ideas that are too similar to existing work

If an idea is flagged as "low novelty," it means similar research already exists. This does not mean the idea is bad — it means you should read those existing papers first and find what makes your angle different.

Deep Dive

Click on any idea card to open the Deep Dive view. This gives you a detailed breakdown:

  • Research outline — Step-by-step plan for how to pursue this idea
  • Methodology — Specific methods, tools, or approaches to use
  • Expected challenges — What might go wrong and how to handle it
  • Related work — Other papers and projects in this space
  • Timeline estimate — How long this might take

Deep Dive is generated on-demand (when you click the card), so it takes a few seconds to load.

Grounding badges

In the Deep Dive view, you may see colored badges next to each section. These are "grounding indicators" that tell you how closely the AI's output matches the source paper:

Badge colorMeaningWhat to do
Green (Source-aligned) This section closely reflects what the source paper actually says Good. The AI is sticking to the facts from the paper.
Yellow (Partially aligned) Some claims are supported by the paper, but some may be inferred or extrapolated Reasonable, but double-check specific claims against the original paper.
Red (Low alignment) This section may not be directly supported by the source paper The AI may be generating from general knowledge rather than the paper. Verify manually.
Important: Grounding badges measure topical similarity between the Deep Dive text and the source paper's abstract. They do NOT measure factual accuracy. A green badge means the AI is talking about the same topic as the paper — not that every statement is correct. Always read the original paper.

Bookmarks

Click the bookmark icon on any idea card to save it for later. Bookmarks are stored in your browser (not on a server), so they persist between sessions but are specific to your browser.

Tips & Tricks

These are best practices from experienced users to help you get better results from ScholarScout.

Start with Quick mode

Before running a full pipeline (which takes 2-5 minutes), try Quick mode first. It generates ideas in about 10 seconds using the AI's existing knowledge. This helps you:

  • Verify your API key works
  • See what kind of output to expect
  • Refine your category selection before committing to a full run

Pick 2-3 categories maximum

It is tempting to select many categories, but fewer is better:

  • More categories = more API requests = higher chance of hitting rate limits
  • More categories = longer processing time
  • Focused searches produce more relevant ideas

If you want to explore broadly, do multiple runs with different category combinations rather than selecting everything at once.

Use the context field

The context field (in your Profile settings) dramatically improves results. Tell ScholarScout about:

  • Your research background ("I am a 2nd year PhD student in computational biology")
  • Your constraints ("I have access to hospital EHR data but no wet lab")
  • Your interests ("I am interested in the intersection of NLP and clinical notes")
  • Your existing project (especially important for Develop mode)

The more specific your context, the more tailored the ideas will be.

Upload a document for better context

You can upload a PDF, text file, or markdown file as additional context. Good candidates:

  • Your thesis proposal draft
  • A paper you want to build upon
  • Your project's README file
  • A grant call description

The uploaded document is read by the AI and used to make ideas more relevant to your specific situation.

Run at different times of day

Free API tiers (Gemini, Groq, Semantic Scholar) have rate limits. If you get errors or 0 papers fetched:

  • Wait 5-10 minutes and try again
  • Try running during off-peak hours (early morning or late evening in US time zones)
  • Use Quick mode as a fallback (it does not fetch papers, so no rate limits from databases)

Combine modes for the same topic

Try running the same categories in different modes:

  1. First, run in Academic mode to understand the research landscape
  2. Then, run in Product mode to see what could be built from those papers
  3. If you have a project, run in Develop mode to find applicable techniques

Each mode reads the same papers but asks different questions, giving you three different perspectives.

Use Deep Dive selectively

Deep Dive generates a detailed analysis for one idea. It costs one additional AI request per idea. Tips:

  • Only Deep Dive ideas with a quality score of 6 or higher
  • Read the short description first — if it does not interest you, skip the Deep Dive
  • Use Deep Dive results as a starting point, not a final plan

Check the source papers

Every idea links back to the paper(s) that inspired it. Always:

  • Read the abstract of the source paper
  • Check if the paper is from a reputable venue
  • Verify that the AI's interpretation matches what the paper actually says

ScholarScout is a brainstorming tool, not a fact-checker. The AI can misinterpret papers or make connections that do not hold up under scrutiny.

Save good ideas with bookmarks

Use the bookmark feature to save promising ideas. Then come back later with fresh eyes to evaluate them. Ideas that still seem good after a day or two are worth pursuing.

Frequently Asked Questions

General

What is ScholarScout?
ScholarScout is a free, open-source tool that reads academic papers from 8 databases and uses AI to generate research ideas, product concepts, or feature suggestions for your existing projects. It runs entirely on your own computer.

Is it really free?
Yes. ScholarScout itself is free and open source (MIT license). You need an AI API key to use it — both Google Gemini and Groq offer free tiers that are more than enough for regular use. You never need to pay anything.

Does it send my data to the cloud?
Your research queries and context are sent to the AI provider you choose (Gemini, Groq, etc.) for processing. Nothing else leaves your computer. There is no telemetry, no tracking, no analytics, no cloud storage. All generated ideas, cached papers, and session history stay on your machine.

What languages does it support?
The interface and output are in English. Papers are fetched in whatever language they are available in on the source databases (mostly English, but OpenAlex and Crossref include papers in many languages).

Do I need to know how to code?
No. You need to be able to copy-paste commands into a terminal (the Getting Started guide walks you through this step by step), but you do not need to write any code or understand programming.

Can I use it offline?
Partially. If you use Ollama (local AI) and only use Quick mode with cached papers, you can work offline. But fetching new papers from databases requires internet, and cloud AI providers (Gemini, Groq, OpenAI) require internet.

Usage questions

What is the difference between Quick and Run?

  • Run — Fetches fresh papers from academic databases (auto-selects best 3-4 per category), analyzes trends, then generates ideas. Takes 2-5 minutes. Results are grounded in real, recent papers.
  • Quick — Generates ideas instantly (~10 seconds) using the AI's knowledge and any cached papers from previous runs. Faster but less grounded.

Use Quick for fast brainstorming and testing. Use Run when you want thorough, paper-backed results.

Why did I get 0 papers or 0 ideas?
Usually this means you hit a rate limit. The most common cause is Semantic Scholar's free tier (100 requests per 5 minutes). Solutions:

  • Wait 5 minutes and try again
  • Use Quick mode (does not fetch papers)
  • Select fewer categories (each category triggers multiple API calls)
  • Get a free Semantic Scholar API key for higher limits (see Optional API Keys section)

Can I use my own local AI model?
Yes. Two options: (1) Ollama — download and run models locally with no internet needed after setup. (2) Custom endpoint — point ScholarScout at any OpenAI-compatible API server (LM Studio, vLLM, etc).

How do I change my AI provider or API key?
Click the Settings tab in the dashboard. You can change your provider, model, and API key at any time. Click "Test Connection" after making changes to verify everything works.

Where are my results saved?
Results are saved in the data/ folder inside the ScholarScout directory:

  • session_history.json — All your past sessions
  • snapshot_*.json — Detailed results from each run
  • papers_cache.json — Cached papers (used by Quick mode)
  • scholarscout_ideas_*.csv — Ideas exported as spreadsheet-compatible CSV

Can I export my ideas?
Yes. Each run automatically creates a CSV file in the data/ folder. You can open CSV files in Excel, Google Sheets, or any spreadsheet program.

How do I update ScholarScout?
If you used Git to download it:

cd ScholarScout
git pull
pip install -r requirements.txt

If you downloaded the ZIP: download the latest ZIP from GitHub, extract it, and replace your old files (keep your data/ folder and config.yaml to preserve your settings and history).

The setup wizard does not appear. How do I reconfigure?
The wizard only shows on first launch. To access settings later, use the Settings tab in the dashboard. If you want to force the wizard to appear again, clear your browser's localStorage for localhost:5050.

Quality and accuracy

Are the generated ideas guaranteed to be novel?
No. ScholarScout includes a novelty check that compares ideas against existing papers, but this is not exhaustive. Always do your own literature review before committing to an idea. Think of ScholarScout as a brainstorming partner, not a novelty guarantee.

Can I trust the Deep Dive analysis?
Deep Dive is AI-generated and should be treated as a starting point, not a final plan. Check the grounding badges — green means the content closely matches the source paper. Yellow and red sections may include AI extrapolation. Always verify claims against the original papers.

Why do some ideas seem generic or obvious?
This can happen when: (1) your context field is empty or too vague, (2) you selected very broad categories, or (3) the AI model is not powerful enough. Try adding more specific context, narrowing your categories, or switching to a stronger model.

Custom Skills

Skills are text files that tell the AI what kind of output to produce. They act as "personality profiles" for the idea generator. ScholarScout comes with built-in skills, but you can create your own.

How skills work

When you select a goal (like "Thesis" or "Hackathon"), ScholarScout loads a corresponding skill file and adds it to the AI prompt. The skill file contains instructions like:

  • What constraints to follow (timeline, budget, scope)
  • What the output should look like
  • What to avoid (anti-patterns)
  • What makes a good result for this specific goal

For example, the "Hackathon" skill tells the AI: "Ideas must be buildable in 24-48 hours by a small team, use freely available tools, and have a clear demo-able output."

Built-in skills

skills/
├── ACADEMIC/                  (research-oriented goals)
│   ├── UNDERGRADUATE/SKILL.md
│   ├── MASTERS/SKILL.md
│   ├── PHD/SKILL.md
│   ├── THESIS/SKILL.md
│   ├── PUBLICATION/SKILL.md
│   ├── GRANT_PROPOSAL/SKILL.md
│   ├── LAB_SCIENTIST/SKILL.md
│   ├── CLINICAL_RESEARCHER/SKILL.md
│   └── DATA_SCIENTIST/SKILL.md
├── PRODUCT/                   (build something new)
│   ├── HACKATHON/SKILL.md
│   ├── SIDE_PROJECT/SKILL.md
│   ├── AI_TOOL/SKILL.md
│   └── INDUSTRY_RND/SKILL.md
└── DEVELOP/                   (improve existing project)
    ├── FEATURE/SKILL.md
    ├── INTEGRATION/SKILL.md
    ├── OPTIMIZATION/SKILL.md
    ├── EXTENSION/SKILL.md
    └── PIVOT/SKILL.md

Creating your own skill

You can create a custom skill for your specific needs. For example, if you are writing a specific type of grant proposal, you can create a skill that tailors the output to that format.

Step-by-step

  1. Decide which mode your skill belongs to:
    • skills/ACADEMIC/ — for research goals
    • skills/PRODUCT/ — for building products
    • skills/DEVELOP/ — for improving existing projects
  2. Create a new folder with your skill name (use UPPERCASE and underscores):
    skills/ACADEMIC/MY_CUSTOM_SKILL/
  3. Create a file called SKILL.md inside that folder
  4. Write your skill instructions in that file (see template below)
  5. Restart ScholarScout — your skill will automatically appear as a goal option

Skill file template

# [Your Skill Name]

## Profile
- Duration: [how long the project should take]
- Resources: [what the user has access to]
- Budget: [financial constraints]
- Scope: [how big/small the project should be]

## Constraints
- Ideas MUST [requirement 1]
- Ideas MUST [requirement 2]
- Ideas MUST NOT [anti-pattern 1]

## Output expectations
- Each idea should include [specific elements]
- A good idea for this goal looks like [description]

## Anti-patterns (avoid these)
- [Common mistake 1]
- [Common mistake 2]

Tips for writing good skills

  • Keep it under 2000 characters (longer files get truncated)
  • Be specific about constraints — vague instructions produce vague output
  • Include examples of what a good result looks like
  • List anti-patterns (what NOT to do) — this helps the AI avoid common mistakes

Sharing skills

If you create a useful skill (e.g., "How to write an NIH R01 grant proposal" or "Agricultural extension project for developing countries"), consider sharing it with the community by submitting a pull request on GitHub. Skills are the easiest way to contribute to ScholarScout.

Architecture

This section explains how ScholarScout works internally. Useful if you want to contribute code or understand what happens when you click "Run".

Pipeline flow

When you click "Run", here is what happens behind the scenes:

User clicks Run
  → Browser sends request to /api/run
  → Server spawns a background process (run_pipeline.py)
  → Orchestrator takes over:
      → Phase 1: Fetch papers
          (queries 3-4 databases based on your categories)
      → Phase 2: Analyze trends
          (AI reads papers, identifies keywords, gaps, saturation)
      → Phase 3: Generate ideas
          (AI generates ideas in academic/product/develop mode)
      → Phase 4: Write output
          (saves CSV + JSON snapshot + updates session history)
  → Progress is streamed back to the browser in real-time (SSE)

Key modules

ModuleWhat it does
preview_server.py Entry point. Starts the web server. Very small file (~40 lines).
src/core/orchestrator.py Pipeline controller. Coordinates all four phases in sequence.
src/core/analyzer.py Trend analysis. Sends papers to the AI and asks for keywords, gaps, and saturation levels.
src/core/generator.py Idea generation. Three modes (academic, product, develop). Handles chunked generation for large outputs.
src/core/deep_dive.py Detailed analysis per idea. Also handles grounding verification (comparing output to source paper).
src/core/novelty_checker.py Checks if an idea already exists. Uses semantic similarity and keyword overlap.
src/core/llm.py Multi-provider AI client. Handles Gemini, Groq, OpenAI, Ollama, OpenRouter, and custom endpoints.
src/core/config.py Configuration management. Reads config.yaml, handles feature flags and paths.
src/core/fetchers/ Paper fetching. One file per database (arxiv_fetcher.py, pubmed_fetcher.py, etc). All inherit from BaseFetcher.

Web routes

Route fileEndpoints
src/web/routes/pipeline.py/api/run, /api/progress — Start pipeline, stream progress
src/web/routes/ideas.py/api/quick — Quick mode idea generation
src/web/routes/analysis.py/api/deepdive, /api/novelty — Deep Dive and novelty check
src/web/routes/settings.py/api/settings — Read/write config, test connection
src/web/routes/sessions.py/api/sessions — Session history
src/web/routes/upload.py/api/upload — File upload for context

Data files

FilePurpose
data/papers_cache.jsonCached papers from previous runs (used by Quick mode)
data/session_history.jsonRecord of all past sessions
data/pipeline_progress.jsonlReal-time progress log (read by SSE stream)
data/snapshot_*.jsonFull results from each run
config.yamlUser settings (provider, API key, feature flags)

Feature flags

Advanced features can be toggled on/off in config.yaml under the features: section, or via environment variables:

  • FEATURE_REFINE / SCOUT_REFINE=1 — Self-distillation (AI refines its own output)
  • FEATURE_SENSITIVITY / SCOUT_SENSITIVITY=1 — Prompt sensitivity check
  • FEATURE_GROUNDING / SCOUT_GROUNDING=1 — Deep Dive grounding verification
  • CACHE_EXPIRY_DAYS — How many days before cached papers expire

Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run unit tests only
pytest tests/ -m "not integration"

# Run all tests (needs Flask)
pytest tests/

# Run JavaScript tests
npm test

Optional API Keys

ScholarScout works without any database API keys. However, two optional keys can improve your experience by unlocking additional databases or higher rate limits.

Semantic Scholar API Key (S2_API_KEY)

What it does: Increases your Semantic Scholar rate limit from 100 requests per 5 minutes to 10,000 requests per 5 minutes. Without this key, Semantic Scholar still works — you just might hit rate limits if you run many searches quickly.

Cost: Free

How to get it

  1. Go to https://www.semanticscholar.org/product/api
  2. Click "Get API Key" or "Request API Key"
  3. Fill in the form (name, email, what you are using it for — just say "academic research tool")
  4. You will receive your API key by email (usually within a few minutes)

How to set it up

Windows (Command Prompt):

set S2_API_KEY=your-key-here
python preview_server.py

Windows (PowerShell):

$env:S2_API_KEY="your-key-here"
python preview_server.py

Mac / Linux:

export S2_API_KEY="your-key-here"
python3 preview_server.py

To make it permanent (so you do not have to type it every time):

  • Windows: Search for "Environment Variables" in the Start menu, click "Edit the system environment variables", click "Environment Variables", then add a new User variable with name S2_API_KEY and your key as the value.
  • Mac/Linux: Add export S2_API_KEY="your-key-here" to your ~/.bashrc or ~/.zshrc file.

Scopus API Key (SCOPUS_API_KEY)

What it does: Enables the Scopus database (90M+ records covering engineering, chemistry, materials science, and more). Without this key, Scopus is simply skipped — the other 7 databases still work fine.

Cost: Free for academic and research use

How to get it

  1. Go to https://dev.elsevier.com/
  2. Click "Register" to create an account
  3. After registering, go to "My API Key" section
  4. Click "Create API Key"
  5. Fill in the details:
    • Label: anything (e.g., "ScholarScout")
    • Website: can be left blank or use your institution's website
  6. Your API key will be displayed. Copy it.
Note: Elsevier's free API access is intended for academic and research purposes. Commercial use requires a separate agreement. If you are a student or researcher at a university, you qualify for free access.

How to set it up

Windows (Command Prompt):

set SCOPUS_API_KEY=your-key-here
python preview_server.py

Windows (PowerShell):

$env:SCOPUS_API_KEY="your-key-here"
python preview_server.py

Mac / Linux:

export SCOPUS_API_KEY="your-key-here"
python3 preview_server.py

To make it permanent, follow the same steps as described for S2_API_KEY above.

Do I really need these keys?

For most users: No. The 6 free databases (arXiv, OpenAlex, PubMed, Crossref, DOAJ, DBLP) plus the free tier of Semantic Scholar cover the vast majority of academic fields. You only need these optional keys if:

  • You are running many searches in quick succession and hitting Semantic Scholar rate limits
  • You specifically need Scopus data (engineering, chemistry, materials science with citation metrics)

Contributing

ScholarScout is open source and welcomes contributions. Here are the main ways you can help.

Ways to contribute

Contribution typeDifficultyDescription
Write a skill fileEasyCreate a SKILL.md for your domain expertise (no coding needed)
Report a bugEasyOpen an issue on GitHub describing what went wrong
Improve promptsMediumMake the AI generate better output by refining prompt templates
Add a fetcherMediumAdd support for a new academic database
Fix a bugMediumPick an open issue and submit a fix
Add a featureHardImplement a new capability

Adding a new fetcher (new database)

If you know of an academic database that ScholarScout does not support yet, you can add it:

  1. Create a new file: src/core/fetchers/my_fetcher.py
  2. Make it inherit from BaseFetcher (see src/core/fetchers/base.py for the interface)
  3. Implement the fetch_papers(category, max_results) method that returns a list of Paper objects
  4. Add a category-to-keyword mapping (so the fetcher knows what to search for each category)
  5. Register your fetcher in src/core/orchestrator.py:
    • Add it to self.all_fetchers dictionary
    • Add it to relevant entries in self._source_routes
  6. Make sure your fetcher returns [] (empty list) gracefully if the API key is missing or the category is not supported

Improving prompts

The AI prompts are in these files:

  • src/core/analyzer.py — Trend analysis prompts
  • src/core/generator.py — Idea generation prompts (one per mode)
  • src/core/deep_dive.py — Deep Dive analysis prompts

When improving prompts:

  • Test with at least 2 different providers (Gemini + Groq recommended)
  • Make sure the output format does not change (other code parses the AI's response)
  • Keep prompts concise — longer prompts cost more tokens and can confuse smaller models
  • Document what you changed and why in your pull request

Development setup

# Clone the repository
git clone https://github.com/neej4/ScholarScout.git
cd ScholarScout

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/ -m "not integration"

# Start the server
python preview_server.py

Pull request guidelines

  • Create a new branch for your changes (do not commit directly to main)
  • Keep changes focused — one feature or fix per pull request
  • Add tests for new functionality
  • Make sure existing tests still pass
  • Write a clear description of what you changed and why

Reporting bugs

Open an issue at github.com/neej4/ScholarScout/issues with:

  • What you expected to happen
  • What actually happened
  • Steps to reproduce the problem
  • Your operating system and Python version
  • Which AI provider you are using
  • Any error messages from the terminal