Getting Started
This guide walks you through installing ScholarScout on your computer, step by step. No programming experience needed. If you can copy-paste text, you can do this.
What is ScholarScout?
ScholarScout is a free tool that reads academic papers from 8 databases and uses AI to generate research ideas, product concepts, or feature suggestions. It runs on your own computer (not in the cloud), so your data stays private.
What you need before starting
- A computer running Windows, Mac, or Linux
- Internet connection (to download papers and talk to the AI)
- About 10 minutes for the initial setup
You do NOT need to know how to code. Just follow the steps below.
Step 1: Install Python
Python is the programming language ScholarScout is built with. You need it installed on your computer.
Windows
- Go to python.org/downloads
- Click the big yellow "Download Python" button
- Run the downloaded file
- IMPORTANT: Check the box that says "Add Python to PATH" at the bottom of the installer window
- Click "Install Now"
- Wait for it to finish, then close the installer
Mac
- Go to python.org/downloads
- Click the big yellow "Download Python" button
- Open the downloaded .pkg file
- Follow the installer steps (just keep clicking "Continue" and "Agree")
- Click "Install"
Linux (Ubuntu/Debian)
Open a terminal and type:
sudo apt update
sudo apt install python3 python3-pip python3-venv
Check if Python is installed correctly
Open a terminal (or Command Prompt on Windows) and type:
python --version
You should see something like Python 3.12.4. Any number 3.10 or higher is fine. If you see an error on Mac/Linux, try python3 --version instead.
Step 2: Download ScholarScout
You have two options:
Option A: Download as ZIP (easiest)
- Go to github.com/neej4/ScholarScout
- Click the green "Code" button
- Click "Download ZIP"
- Extract the ZIP file to a folder you can find easily (like your Desktop or Documents)
Option B: Use Git (if you have it installed)
git clone https://github.com/neej4/ScholarScout.git
Step 3: Install dependencies
ScholarScout needs some extra Python packages to work. Here is how to install them:
Windows
- Open File Explorer and navigate to the ScholarScout folder
- Click on the address bar at the top (where it shows the folder path)
- Type
cmdand press Enter. This opens a command window in that folder. - Type the following and press Enter:
pip install -r requirements.txt
Mac / Linux
- Open Terminal
- Navigate to the ScholarScout folder. For example, if it is on your Desktop:
cd ~/Desktop/ScholarScout
pip3 install -r requirements.txt
Wait for all packages to download and install. This may take 1-2 minutes.
Step 4: Get a free AI API key
ScholarScout needs an AI service to generate ideas. The easiest free option is Google Gemini:
- Go to aistudio.google.com/app/apikey
- Sign in with your Google account
- Click "Create API Key"
- Copy the key (it looks like a long string of random letters and numbers)
- Keep this key somewhere safe. You will paste it into ScholarScout in the next step.
Step 5: Start ScholarScout
In the same terminal/command window from Step 3:
python preview_server.py
(On Mac/Linux, use python3 preview_server.py if python does not work.)
You should see a message saying the server is running. Now open your web browser and go to:
http://localhost:5050
Step 6: First-time setup wizard
When you open ScholarScout for the first time, a setup wizard appears. Follow these steps:
- Choose a provider: Select "Gemini" (or whichever provider you got a key for)
- Paste your API key: Paste the key you copied in Step 4
- Test connection: Click the "Test" button. You should see a green checkmark.
- Pick categories: Choose 2-3 research fields you are interested in (e.g., "Machine Learning", "Medicine", "Agriculture")
- Done! Click finish to close the wizard
Step 7: Generate your first ideas
You are now ready to use ScholarScout. You have two buttons:
- Quick — Generates ideas instantly (~10 seconds) using AI knowledge. Great for a first test.
- Run — Fetches fresh papers from academic databases, analyzes trends, then generates ideas (2-5 minutes). More thorough and grounded in real papers.
Try clicking Quick first to see how it works. Then try Run for the full experience.
Troubleshooting
| Problem | Solution |
|---|---|
python is not recognized |
On Windows: reinstall Python and make sure "Add to PATH" is checked. On Mac/Linux: try python3 instead. |
pip install fails |
Try pip3 install -r requirements.txt or python -m pip install -r requirements.txt |
| Port 5050 already in use | Another program is using that port. Close it, or edit preview_server.py and change the port number. |
| "LLM unreachable" error | Your API key might be wrong. Go to Settings tab and re-paste your key. Click "Test Connection". |
| 0 papers fetched | You hit a rate limit. Wait 5 minutes and try again, or use Quick mode (which does not fetch papers). |
| Page is blank / nothing loads | Make sure the terminal still shows the server running. If it crashed, run python preview_server.py again. |
Three Modes
ScholarScout can generate three different types of output from the same papers. Think of it like asking three different experts to read the same research and give you different kinds of advice.
Academic Mode
Who is this for? Students (undergraduate, masters, PhD), researchers, anyone writing a thesis or paper.
What it produces:
- Research topic suggestions with clear research questions
- Suggested methodology (how to actually do the research)
- Key papers you should read
- Novelty check (is this idea actually new?)
- Quality score based on feasibility and originality
Available goals:
- Any — General exploration, no constraints
- Thesis — Scoped for a thesis project (6-12 months)
- Publication — Aimed at publishable research
- Grant Proposal — Framed for funding applications
Example: A medical student selects categories "Medicine" and "Machine Learning", sets goal to "Thesis". ScholarScout reads recent papers and suggests: "Predicting antibiotic resistance patterns using transformer models on hospital lab data — gap: no existing study combines temporal lab sequences with resistance metadata."
Product Mode
Who is this for? Entrepreneurs, developers, hackathon participants, anyone who wants to build something.
What it produces:
- Product name and one-line description
- MVP features (minimum viable product — what to build first)
- Suggested tech stack
- Revenue model ideas
- Existing competitors and how your idea differs
Available goals:
- Hackathon — Buildable in 24-48 hours
- Side Project — Weekend/hobby scope
- AI Tool — AI-powered product ideas
- Industry R&D — Enterprise-scale research and development
Example: A developer selects "Natural Language Processing" category, sets goal to "Hackathon". ScholarScout suggests: "PaperBrief — a browser extension that summarizes any arXiv paper into a 3-bullet TL;DR using the retrieval-augmented approach from [recent paper]."
Develop Mode
Who is this for? Developers who already have a project and want to improve it using ideas from recent research.
What it produces:
- Feature suggestions directly applicable to YOUR project
- Integration opportunities (connect your project with new techniques)
- Performance optimizations based on recent papers
Available goals:
- Feature — New functionality to add
- Integration — Connect with external systems or techniques
- Optimization — Make existing features faster or better
- Extension — Expand scope of your project
- Pivot — Explore new directions for your project
Important: In Develop mode, you MUST describe your project in the "Context" field. Every generated idea will be specifically about improving your project. If you leave the context empty, the results will be generic.
Example: You describe your project as "A mobile app for tracking plant growth using phone camera photos." ScholarScout reads computer vision papers and suggests: "Add disease detection using the few-shot learning approach from [paper] — requires only 5 example images per disease type."
How to choose the right mode
| Your situation | Mode | Goal |
|---|---|---|
| I need a thesis topic | Academic | Thesis |
| I want to explore what is new in my field | Academic | Any |
| I have a hackathon this weekend | Product | Hackathon |
| I want to build a startup idea | Product | Side Project or AI Tool |
| I have an app and want new features | Develop | Feature |
| My project is slow, I want to optimize it | Develop | Optimization |
Data Sources
ScholarScout fetches papers from 8 academic databases. You do not need to configure anything — the system automatically picks the best 3-4 databases based on your chosen research categories.
The 8 databases
| Source | Size | API Key Needed? | What it covers |
|---|---|---|---|
| arXiv | 2.4M+ preprints | No | Computer Science, Physics, Mathematics. Full-text open access preprints (papers before peer review). |
| OpenAlex | 250M+ works | No | All academic fields. The largest open catalog of scholarly works. Very reliable metadata. |
| Semantic Scholar | 200M+ papers | Optional | All fields. Especially good for citation data (who cites whom). Works without a key but has rate limits. |
| PubMed | 36M+ articles | No | Biomedical and life sciences. The go-to database for medicine, nursing, pharmacy, biology. |
| Crossref | 150M+ DOI records | No | All fields. Covers most published journal articles. Good for finding DOIs and publication metadata. |
| DOAJ | 9M+ open access | No | Social sciences, agriculture, education, regional journals. Only open-access articles. |
| Scopus | 90M+ records | Yes (free for academics) | Engineering, chemistry, materials science. Strong citation metrics. Requires a free API key from Elsevier. |
| DBLP | 6M+ CS papers | No | Computer Science conference papers (NeurIPS, ICML, ACL, CVPR, etc). Very focused on CS. |
Smart source routing
When you select research categories, ScholarScout automatically picks the best databases for your field. Here is how it decides:
| Your field | Databases used |
|---|---|
| Computer Science, Statistics, Electrical Engineering | arXiv + Semantic Scholar + OpenAlex + DBLP |
| Medicine | PubMed + Semantic Scholar + Crossref + Scopus |
| Biology | PubMed + OpenAlex + Crossref + DOAJ |
| Physics | arXiv + Semantic Scholar + OpenAlex + Crossref |
| Engineering | Crossref + OpenAlex + Semantic Scholar + Scopus |
| Chemistry | Crossref + OpenAlex + Semantic Scholar + PubMed |
| Mathematics | arXiv + Semantic Scholar + OpenAlex + Crossref |
| Social Sciences | Crossref + OpenAlex + DOAJ + Semantic Scholar |
| Earth Sciences, Agriculture | Crossref + OpenAlex + DOAJ + PubMed |
This means if you are a medical researcher, ScholarScout will NOT waste time searching arXiv (which mostly has CS/Physics papers). It goes straight to PubMed and other medical databases.
Do I need API keys for the databases?
Short answer: No. Six out of eight databases work without any key. ScholarScout works out of the box.
The two optional keys are:
- Semantic Scholar (S2_API_KEY) — Without a key, you get 100 requests per 5 minutes. With a free key, you get 10,000 requests per 5 minutes. Only matters if you run many searches quickly.
- Scopus (SCOPUS_API_KEY) — Required to use Scopus at all. Free for academic/research use. If you do not set this key, Scopus is simply skipped (the other 7 databases still work).
See the section for setup instructions.
LLM Providers
ScholarScout uses an AI language model (LLM) to analyze papers and generate ideas. You need to connect it to at least one AI provider. There are free options available.
You only need ONE provider. Pick one and follow the steps below.
Gemini (recommended for beginners)
Google's AI model. Free, fast, and the easiest to set up.
- Cost: Free (15 requests per minute)
- Speed: Fast
- Best for: Most users. The free tier is generous enough for daily use.
How to get your free Gemini API key
- Go to https://aistudio.google.com/app/apikey
- Sign in with your Google account (any Gmail account works)
- Click "Create API Key"
- If asked to select a project, click "Create API key in new project"
- Your key will appear. It looks something like:
AIzaSyB...(a long string of letters and numbers) - Click the copy button next to the key
- Go back to ScholarScout, open Settings, select "Gemini" as provider, and paste your key
- Click "Test Connection" to verify it works
Groq (fast alternative)
Groq runs AI models on specialized hardware, making it very fast. Also has a free tier.
- Cost: Free tier available
- Speed: Very fast (often faster than Gemini)
- Best for: Users who want speed, or as a backup when Gemini is rate-limited.
How to get your free Groq API key
- Go to https://console.groq.com/keys
- Create an account (you can sign up with Google, GitHub, or email)
- Once logged in, you will see the API Keys page
- Click "Create API Key"
- Give it a name (anything, like "ScholarScout")
- Copy the key that appears (you will only see it once!)
- Go back to ScholarScout, open Settings, select "Groq" as provider, and paste your key
- Click "Test Connection" to verify it works
Ollama (fully local, no internet needed)
Ollama runs AI models directly on your computer. No data leaves your machine. Good for privacy-sensitive research.
- Cost: Free (uses your computer's processing power)
- Speed: Depends on your computer's GPU. Slow on older machines.
- Best for: Users who need complete privacy or do not have reliable internet.
How to set up Ollama
- Go to https://ollama.com/download
- Download and install Ollama for your operating system
- Open a terminal and run:
ollama pull llama3.2(this downloads the AI model, about 2-4 GB) - Keep Ollama running in the background
- In ScholarScout Settings, select "Ollama" as provider. No API key needed.
OpenRouter (access to 100+ models)
A gateway that lets you use many different AI models through one API key. Pay-per-use.
- Cost: Pay per token (varies by model, some are very cheap)
- Speed: Varies by model
- Get key: https://openrouter.ai/keys
- Best for: Power users who want to try different models.
OpenAI (GPT models)
The company behind ChatGPT. High quality but costs money.
- Cost: Pay per token (about $0.01-0.03 per ScholarScout run)
- Speed: Fast
- Get key: https://platform.openai.com/api-keys
- Best for: Users who already have an OpenAI account and want high-quality output.
Custom endpoint
Any AI service that uses the OpenAI-compatible API format. This includes LM Studio, vLLM, and other local AI servers.
- Base URL: Your server address (e.g.,
http://localhost:1234/v1) - Model: Whatever model your server is running
- API key: Optional (most local servers do not need one)
Which provider should I choose?
| Priority | Choose |
|---|---|
| I want free and easy | Gemini |
| I want free and fast | Groq |
| I need complete privacy | Ollama |
| I want the best quality | OpenAI (paid) |
| I want to experiment with models | OpenRouter |
Understanding Results
After ScholarScout finishes running, you will see a list of idea cards. This section explains what everything means.
Idea cards
Each card represents one generated idea. Here is what you will see on each card:
- Title — A short name for the idea
- Description — A 2-3 sentence summary of what the idea is about
- Source paper(s) — The academic paper(s) that inspired this idea
- Category — Which research field this belongs to
- Quality score — A number indicating how promising the idea is
In Academic mode, cards also show methodology suggestions and research questions. In Product mode, cards show MVP features and tech stack. In Develop mode, cards show how the idea applies to your specific project.
Quality score
Each idea gets a quality score from 1 to 10. This is the AI's estimate of how good the idea is, based on:
- Novelty — Is this idea actually new? Or has it been done before?
- Feasibility — Can this realistically be done with available resources?
- Impact — If successful, would this matter to the field?
- Clarity — Is the idea well-defined enough to act on?
A score of 7+ is generally a strong idea worth exploring further. Scores of 4-6 are decent but may need refinement. Below 4 means the idea is vague or already well-explored.
Novelty check
ScholarScout checks whether your generated idea already exists in the literature. It does this by:
- Comparing the idea's text against titles and abstracts of papers in the database
- Using both semantic similarity (meaning) and keyword overlap (exact words)
- Flagging ideas that are too similar to existing work
If an idea is flagged as "low novelty," it means similar research already exists. This does not mean the idea is bad — it means you should read those existing papers first and find what makes your angle different.
Deep Dive
Click on any idea card to open the Deep Dive view. This gives you a detailed breakdown:
- Research outline — Step-by-step plan for how to pursue this idea
- Methodology — Specific methods, tools, or approaches to use
- Expected challenges — What might go wrong and how to handle it
- Related work — Other papers and projects in this space
- Timeline estimate — How long this might take
Deep Dive is generated on-demand (when you click the card), so it takes a few seconds to load.
Grounding badges
In the Deep Dive view, you may see colored badges next to each section. These are "grounding indicators" that tell you how closely the AI's output matches the source paper:
| Badge color | Meaning | What to do |
|---|---|---|
| Green (Source-aligned) | This section closely reflects what the source paper actually says | Good. The AI is sticking to the facts from the paper. |
| Yellow (Partially aligned) | Some claims are supported by the paper, but some may be inferred or extrapolated | Reasonable, but double-check specific claims against the original paper. |
| Red (Low alignment) | This section may not be directly supported by the source paper | The AI may be generating from general knowledge rather than the paper. Verify manually. |
Bookmarks
Click the bookmark icon on any idea card to save it for later. Bookmarks are stored in your browser (not on a server), so they persist between sessions but are specific to your browser.
Tips & Tricks
These are best practices from experienced users to help you get better results from ScholarScout.
Start with Quick mode
Before running a full pipeline (which takes 2-5 minutes), try Quick mode first. It generates ideas in about 10 seconds using the AI's existing knowledge. This helps you:
- Verify your API key works
- See what kind of output to expect
- Refine your category selection before committing to a full run
Pick 2-3 categories maximum
It is tempting to select many categories, but fewer is better:
- More categories = more API requests = higher chance of hitting rate limits
- More categories = longer processing time
- Focused searches produce more relevant ideas
If you want to explore broadly, do multiple runs with different category combinations rather than selecting everything at once.
Use the context field
The context field (in your Profile settings) dramatically improves results. Tell ScholarScout about:
- Your research background ("I am a 2nd year PhD student in computational biology")
- Your constraints ("I have access to hospital EHR data but no wet lab")
- Your interests ("I am interested in the intersection of NLP and clinical notes")
- Your existing project (especially important for Develop mode)
The more specific your context, the more tailored the ideas will be.
Upload a document for better context
You can upload a PDF, text file, or markdown file as additional context. Good candidates:
- Your thesis proposal draft
- A paper you want to build upon
- Your project's README file
- A grant call description
The uploaded document is read by the AI and used to make ideas more relevant to your specific situation.
Run at different times of day
Free API tiers (Gemini, Groq, Semantic Scholar) have rate limits. If you get errors or 0 papers fetched:
- Wait 5-10 minutes and try again
- Try running during off-peak hours (early morning or late evening in US time zones)
- Use Quick mode as a fallback (it does not fetch papers, so no rate limits from databases)
Combine modes for the same topic
Try running the same categories in different modes:
- First, run in Academic mode to understand the research landscape
- Then, run in Product mode to see what could be built from those papers
- If you have a project, run in Develop mode to find applicable techniques
Each mode reads the same papers but asks different questions, giving you three different perspectives.
Use Deep Dive selectively
Deep Dive generates a detailed analysis for one idea. It costs one additional AI request per idea. Tips:
- Only Deep Dive ideas with a quality score of 6 or higher
- Read the short description first — if it does not interest you, skip the Deep Dive
- Use Deep Dive results as a starting point, not a final plan
Check the source papers
Every idea links back to the paper(s) that inspired it. Always:
- Read the abstract of the source paper
- Check if the paper is from a reputable venue
- Verify that the AI's interpretation matches what the paper actually says
ScholarScout is a brainstorming tool, not a fact-checker. The AI can misinterpret papers or make connections that do not hold up under scrutiny.
Save good ideas with bookmarks
Use the bookmark feature to save promising ideas. Then come back later with fresh eyes to evaluate them. Ideas that still seem good after a day or two are worth pursuing.
Frequently Asked Questions
General
What is ScholarScout?
ScholarScout is a free, open-source tool that reads academic papers from 8 databases and uses AI to generate research ideas, product concepts, or feature suggestions for your existing projects. It runs entirely on your own computer.
Is it really free?
Yes. ScholarScout itself is free and open source (MIT license). You need an AI API key to use it — both Google Gemini and Groq offer free tiers that are more than enough for regular use. You never need to pay anything.
Does it send my data to the cloud?
Your research queries and context are sent to the AI provider you choose (Gemini, Groq, etc.) for processing. Nothing else leaves your computer. There is no telemetry, no tracking, no analytics, no cloud storage. All generated ideas, cached papers, and session history stay on your machine.
What languages does it support?
The interface and output are in English. Papers are fetched in whatever language they are available in on the source databases (mostly English, but OpenAlex and Crossref include papers in many languages).
Do I need to know how to code?
No. You need to be able to copy-paste commands into a terminal (the Getting Started guide walks you through this step by step), but you do not need to write any code or understand programming.
Can I use it offline?
Partially. If you use Ollama (local AI) and only use Quick mode with cached papers, you can work offline. But fetching new papers from databases requires internet, and cloud AI providers (Gemini, Groq, OpenAI) require internet.
Usage questions
What is the difference between Quick and Run?
- Run — Fetches fresh papers from academic databases (auto-selects best 3-4 per category), analyzes trends, then generates ideas. Takes 2-5 minutes. Results are grounded in real, recent papers.
- Quick — Generates ideas instantly (~10 seconds) using the AI's knowledge and any cached papers from previous runs. Faster but less grounded.
Use Quick for fast brainstorming and testing. Use Run when you want thorough, paper-backed results.
Why did I get 0 papers or 0 ideas?
Usually this means you hit a rate limit. The most common cause is Semantic Scholar's free tier (100 requests per 5 minutes). Solutions:
- Wait 5 minutes and try again
- Use Quick mode (does not fetch papers)
- Select fewer categories (each category triggers multiple API calls)
- Get a free Semantic Scholar API key for higher limits (see Optional API Keys section)
Can I use my own local AI model?
Yes. Two options: (1) Ollama — download and run models locally with no internet needed after setup. (2) Custom endpoint — point ScholarScout at any OpenAI-compatible API server (LM Studio, vLLM, etc).
How do I change my AI provider or API key?
Click the Settings tab in the dashboard. You can change your provider, model, and API key at any time. Click "Test Connection" after making changes to verify everything works.
Where are my results saved?
Results are saved in the data/ folder inside the ScholarScout directory:
session_history.json— All your past sessionssnapshot_*.json— Detailed results from each runpapers_cache.json— Cached papers (used by Quick mode)scholarscout_ideas_*.csv— Ideas exported as spreadsheet-compatible CSV
Can I export my ideas?
Yes. Each run automatically creates a CSV file in the data/ folder. You can open CSV files in Excel, Google Sheets, or any spreadsheet program.
How do I update ScholarScout?
If you used Git to download it:
cd ScholarScout
git pull
pip install -r requirements.txt
If you downloaded the ZIP: download the latest ZIP from GitHub, extract it, and replace your old files (keep your data/ folder and config.yaml to preserve your settings and history).
The setup wizard does not appear. How do I reconfigure?
The wizard only shows on first launch. To access settings later, use the Settings tab in the dashboard. If you want to force the wizard to appear again, clear your browser's localStorage for localhost:5050.
Quality and accuracy
Are the generated ideas guaranteed to be novel?
No. ScholarScout includes a novelty check that compares ideas against existing papers, but this is not exhaustive. Always do your own literature review before committing to an idea. Think of ScholarScout as a brainstorming partner, not a novelty guarantee.
Can I trust the Deep Dive analysis?
Deep Dive is AI-generated and should be treated as a starting point, not a final plan. Check the grounding badges — green means the content closely matches the source paper. Yellow and red sections may include AI extrapolation. Always verify claims against the original papers.
Why do some ideas seem generic or obvious?
This can happen when: (1) your context field is empty or too vague, (2) you selected very broad categories, or (3) the AI model is not powerful enough. Try adding more specific context, narrowing your categories, or switching to a stronger model.
Custom Skills
Skills are text files that tell the AI what kind of output to produce. They act as "personality profiles" for the idea generator. ScholarScout comes with built-in skills, but you can create your own.
How skills work
When you select a goal (like "Thesis" or "Hackathon"), ScholarScout loads a corresponding skill file and adds it to the AI prompt. The skill file contains instructions like:
- What constraints to follow (timeline, budget, scope)
- What the output should look like
- What to avoid (anti-patterns)
- What makes a good result for this specific goal
For example, the "Hackathon" skill tells the AI: "Ideas must be buildable in 24-48 hours by a small team, use freely available tools, and have a clear demo-able output."
Built-in skills
skills/
├── ACADEMIC/ (research-oriented goals)
│ ├── UNDERGRADUATE/SKILL.md
│ ├── MASTERS/SKILL.md
│ ├── PHD/SKILL.md
│ ├── THESIS/SKILL.md
│ ├── PUBLICATION/SKILL.md
│ ├── GRANT_PROPOSAL/SKILL.md
│ ├── LAB_SCIENTIST/SKILL.md
│ ├── CLINICAL_RESEARCHER/SKILL.md
│ └── DATA_SCIENTIST/SKILL.md
├── PRODUCT/ (build something new)
│ ├── HACKATHON/SKILL.md
│ ├── SIDE_PROJECT/SKILL.md
│ ├── AI_TOOL/SKILL.md
│ └── INDUSTRY_RND/SKILL.md
└── DEVELOP/ (improve existing project)
├── FEATURE/SKILL.md
├── INTEGRATION/SKILL.md
├── OPTIMIZATION/SKILL.md
├── EXTENSION/SKILL.md
└── PIVOT/SKILL.md
Creating your own skill
You can create a custom skill for your specific needs. For example, if you are writing a specific type of grant proposal, you can create a skill that tailors the output to that format.
Step-by-step
- Decide which mode your skill belongs to:
skills/ACADEMIC/— for research goalsskills/PRODUCT/— for building productsskills/DEVELOP/— for improving existing projects
- Create a new folder with your skill name (use UPPERCASE and underscores):
skills/ACADEMIC/MY_CUSTOM_SKILL/ - Create a file called
SKILL.mdinside that folder - Write your skill instructions in that file (see template below)
- Restart ScholarScout — your skill will automatically appear as a goal option
Skill file template
# [Your Skill Name]
## Profile
- Duration: [how long the project should take]
- Resources: [what the user has access to]
- Budget: [financial constraints]
- Scope: [how big/small the project should be]
## Constraints
- Ideas MUST [requirement 1]
- Ideas MUST [requirement 2]
- Ideas MUST NOT [anti-pattern 1]
## Output expectations
- Each idea should include [specific elements]
- A good idea for this goal looks like [description]
## Anti-patterns (avoid these)
- [Common mistake 1]
- [Common mistake 2]
Tips for writing good skills
- Keep it under 2000 characters (longer files get truncated)
- Be specific about constraints — vague instructions produce vague output
- Include examples of what a good result looks like
- List anti-patterns (what NOT to do) — this helps the AI avoid common mistakes
Sharing skills
If you create a useful skill (e.g., "How to write an NIH R01 grant proposal" or "Agricultural extension project for developing countries"), consider sharing it with the community by submitting a pull request on GitHub. Skills are the easiest way to contribute to ScholarScout.
Architecture
This section explains how ScholarScout works internally. Useful if you want to contribute code or understand what happens when you click "Run".
Pipeline flow
When you click "Run", here is what happens behind the scenes:
User clicks Run
→ Browser sends request to /api/run
→ Server spawns a background process (run_pipeline.py)
→ Orchestrator takes over:
→ Phase 1: Fetch papers
(queries 3-4 databases based on your categories)
→ Phase 2: Analyze trends
(AI reads papers, identifies keywords, gaps, saturation)
→ Phase 3: Generate ideas
(AI generates ideas in academic/product/develop mode)
→ Phase 4: Write output
(saves CSV + JSON snapshot + updates session history)
→ Progress is streamed back to the browser in real-time (SSE)
Key modules
| Module | What it does |
|---|---|
preview_server.py |
Entry point. Starts the web server. Very small file (~40 lines). |
src/core/orchestrator.py |
Pipeline controller. Coordinates all four phases in sequence. |
src/core/analyzer.py |
Trend analysis. Sends papers to the AI and asks for keywords, gaps, and saturation levels. |
src/core/generator.py |
Idea generation. Three modes (academic, product, develop). Handles chunked generation for large outputs. |
src/core/deep_dive.py |
Detailed analysis per idea. Also handles grounding verification (comparing output to source paper). |
src/core/novelty_checker.py |
Checks if an idea already exists. Uses semantic similarity and keyword overlap. |
src/core/llm.py |
Multi-provider AI client. Handles Gemini, Groq, OpenAI, Ollama, OpenRouter, and custom endpoints. |
src/core/config.py |
Configuration management. Reads config.yaml, handles feature flags and paths. |
src/core/fetchers/ |
Paper fetching. One file per database (arxiv_fetcher.py, pubmed_fetcher.py, etc). All inherit from BaseFetcher. |
Web routes
| Route file | Endpoints |
|---|---|
src/web/routes/pipeline.py | /api/run, /api/progress — Start pipeline, stream progress |
src/web/routes/ideas.py | /api/quick — Quick mode idea generation |
src/web/routes/analysis.py | /api/deepdive, /api/novelty — Deep Dive and novelty check |
src/web/routes/settings.py | /api/settings — Read/write config, test connection |
src/web/routes/sessions.py | /api/sessions — Session history |
src/web/routes/upload.py | /api/upload — File upload for context |
Data files
| File | Purpose |
|---|---|
data/papers_cache.json | Cached papers from previous runs (used by Quick mode) |
data/session_history.json | Record of all past sessions |
data/pipeline_progress.jsonl | Real-time progress log (read by SSE stream) |
data/snapshot_*.json | Full results from each run |
config.yaml | User settings (provider, API key, feature flags) |
Feature flags
Advanced features can be toggled on/off in config.yaml under the features: section, or via environment variables:
FEATURE_REFINE/SCOUT_REFINE=1— Self-distillation (AI refines its own output)FEATURE_SENSITIVITY/SCOUT_SENSITIVITY=1— Prompt sensitivity checkFEATURE_GROUNDING/SCOUT_GROUNDING=1— Deep Dive grounding verificationCACHE_EXPIRY_DAYS— How many days before cached papers expire
Testing
# Install dev dependencies
pip install -e ".[dev]"
# Run unit tests only
pytest tests/ -m "not integration"
# Run all tests (needs Flask)
pytest tests/
# Run JavaScript tests
npm test
Optional API Keys
ScholarScout works without any database API keys. However, two optional keys can improve your experience by unlocking additional databases or higher rate limits.
Semantic Scholar API Key (S2_API_KEY)
What it does: Increases your Semantic Scholar rate limit from 100 requests per 5 minutes to 10,000 requests per 5 minutes. Without this key, Semantic Scholar still works — you just might hit rate limits if you run many searches quickly.
Cost: Free
How to get it
- Go to https://www.semanticscholar.org/product/api
- Click "Get API Key" or "Request API Key"
- Fill in the form (name, email, what you are using it for — just say "academic research tool")
- You will receive your API key by email (usually within a few minutes)
How to set it up
Windows (Command Prompt):
set S2_API_KEY=your-key-here
python preview_server.py
Windows (PowerShell):
$env:S2_API_KEY="your-key-here"
python preview_server.py
Mac / Linux:
export S2_API_KEY="your-key-here"
python3 preview_server.py
To make it permanent (so you do not have to type it every time):
- Windows: Search for "Environment Variables" in the Start menu, click "Edit the system environment variables", click "Environment Variables", then add a new User variable with name
S2_API_KEYand your key as the value. - Mac/Linux: Add
export S2_API_KEY="your-key-here"to your~/.bashrcor~/.zshrcfile.
Scopus API Key (SCOPUS_API_KEY)
What it does: Enables the Scopus database (90M+ records covering engineering, chemistry, materials science, and more). Without this key, Scopus is simply skipped — the other 7 databases still work fine.
Cost: Free for academic and research use
How to get it
- Go to https://dev.elsevier.com/
- Click "Register" to create an account
- After registering, go to "My API Key" section
- Click "Create API Key"
- Fill in the details:
- Label: anything (e.g., "ScholarScout")
- Website: can be left blank or use your institution's website
- Your API key will be displayed. Copy it.
How to set it up
Windows (Command Prompt):
set SCOPUS_API_KEY=your-key-here
python preview_server.py
Windows (PowerShell):
$env:SCOPUS_API_KEY="your-key-here"
python preview_server.py
Mac / Linux:
export SCOPUS_API_KEY="your-key-here"
python3 preview_server.py
To make it permanent, follow the same steps as described for S2_API_KEY above.
Do I really need these keys?
For most users: No. The 6 free databases (arXiv, OpenAlex, PubMed, Crossref, DOAJ, DBLP) plus the free tier of Semantic Scholar cover the vast majority of academic fields. You only need these optional keys if:
- You are running many searches in quick succession and hitting Semantic Scholar rate limits
- You specifically need Scopus data (engineering, chemistry, materials science with citation metrics)
Contributing
ScholarScout is open source and welcomes contributions. Here are the main ways you can help.
Ways to contribute
| Contribution type | Difficulty | Description |
|---|---|---|
| Write a skill file | Easy | Create a SKILL.md for your domain expertise (no coding needed) |
| Report a bug | Easy | Open an issue on GitHub describing what went wrong |
| Improve prompts | Medium | Make the AI generate better output by refining prompt templates |
| Add a fetcher | Medium | Add support for a new academic database |
| Fix a bug | Medium | Pick an open issue and submit a fix |
| Add a feature | Hard | Implement a new capability |
Adding a new fetcher (new database)
If you know of an academic database that ScholarScout does not support yet, you can add it:
- Create a new file:
src/core/fetchers/my_fetcher.py - Make it inherit from
BaseFetcher(seesrc/core/fetchers/base.pyfor the interface) - Implement the
fetch_papers(category, max_results)method that returns a list of Paper objects - Add a category-to-keyword mapping (so the fetcher knows what to search for each category)
- Register your fetcher in
src/core/orchestrator.py:- Add it to
self.all_fetchersdictionary - Add it to relevant entries in
self._source_routes
- Add it to
- Make sure your fetcher returns
[](empty list) gracefully if the API key is missing or the category is not supported
Improving prompts
The AI prompts are in these files:
src/core/analyzer.py— Trend analysis promptssrc/core/generator.py— Idea generation prompts (one per mode)src/core/deep_dive.py— Deep Dive analysis prompts
When improving prompts:
- Test with at least 2 different providers (Gemini + Groq recommended)
- Make sure the output format does not change (other code parses the AI's response)
- Keep prompts concise — longer prompts cost more tokens and can confuse smaller models
- Document what you changed and why in your pull request
Development setup
# Clone the repository
git clone https://github.com/neej4/ScholarScout.git
cd ScholarScout
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest tests/ -m "not integration"
# Start the server
python preview_server.py
Pull request guidelines
- Create a new branch for your changes (do not commit directly to main)
- Keep changes focused — one feature or fix per pull request
- Add tests for new functionality
- Make sure existing tests still pass
- Write a clear description of what you changed and why
Reporting bugs
Open an issue at github.com/neej4/ScholarScout/issues with:
- What you expected to happen
- What actually happened
- Steps to reproduce the problem
- Your operating system and Python version
- Which AI provider you are using
- Any error messages from the terminal