STIMSMITH


Technical intelligence for CPU/microprocessor stimulus generation. Continuously compounds public-web knowledge about verification tools, techniques, papers, organizations, and standards.

Entities 2729
Last ingestion 2h ago

HOW IT WORKS

A daily Prefect pipeline discovers high-signal sources from arXiv, GitHub, Exa, FireCrawl, and RSS feeds, then ranks and gates the candidates with Gemini and Claude. Each accepted source is fetched, archived to R2, and parsed with LiteParse, falling back to Mistral OCR for PDFs. Chunks are embedded with Voyage AI and stored in pgvector for semantic search. Claude Sonnet 4.6 extracts entities and relationships, and MiniMax M3 synthesizes cited wiki articles from the accumulated evidence, escalating to GPT-5.5 for high-importance entities. Entities and edges are then merged into a Neo4j knowledge graph. The frontend reads directly from Supabase and R2, with no backend API in the hot path.
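
The semantic-search step can be illustrated with a minimal sketch. It ranks chunks by cosine similarity in plain Python; in the real system this ranking would presumably be done inside Postgres by pgvector, which provides a cosine distance operator (`<=>`). The chunk ids, the 3-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are all hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk_id: cosine_similarity(query, chunks[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings; real embedding vectors are far longer.
CHUNKS = {
    "riscv-dv-readme": [0.9, 0.1, 0.0],
    "uvm-overview": [0.1, 0.9, 0.0],
    "unrelated-blog": [0.0, 0.1, 0.9],
}

QUERY = [0.8, 0.2, 0.0]  # stands in for an embedded user query
RESULT = top_k(QUERY, CHUNKS, k=2)
print(RESULT)
```

The query vector points mostly along the first axis, so the chunk whose embedding does the same ranks first.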

ARCHITECTURE

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Discovery   │───▶│  Fetch +     │───▶│  Chunk +     │
│  arXiv/GH/   │    │  Extract     │    │  Embed       │
│  Exa/RSS     │    │  (LiteParse/ │    │  (Voyage 3   │
│              │    │   Mistral)   │    │   Large/Code)│
└──────────────┘    └──────────────┘    └──────┬───────┘
                                               │
┌──────────────┐    ┌──────────────┐    ┌──────▼───────┐
│  Neo4j Graph │◀───│  Wiki Gen    │◀───│  Entity      │
│  + Digest    │    │  (MiniMax M3)│    │  Extraction  │
│              │    │              │    │  (Claude 4.6)│
└──────┬───────┘    └──────────────┘    └──────────────┘
       │
┌──────▼───────┐    ┌──────────────┐    ┌──────────────┐
│  Supabase    │───▶│  R2 Archive  │───▶│  SvelteKit   │
│  (PG +       │    │  (sources +  │    │  on Workers  │
│   pgvector)  │    │   snapshots) │    │  (frontend)  │
└──────────────┘    └──────────────┘    └──────────────┘
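
The first two rows of the diagram can be read as an ordered chain of stages. The sketch below is illustrative only: the stage names are taken from the diagram boxes, while `make_stage`, `run_pipeline`, the `trace` field, and the example URL are hypothetical and do not represent the project's actual Prefect flow.

```python
from typing import Callable

Record = dict
Stage = Callable[[Record], Record]

# Stage names mirror the diagram boxes, in arrow order.
STAGE_NAMES = [
    "discovery",
    "fetch_extract",
    "chunk_embed",
    "entity_extraction",
    "wiki_gen",
    "graph_merge",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran (hypothetical)."""

    def stage(record: Record) -> Record:
        trace = list(record.get("trace", []))
        trace.append(name)
        # Return a new record instead of mutating the input.
        return {**record, "trace": trace}

    return stage


def run_pipeline(record: Record, stages: list[Stage]) -> Record:
    """Thread one record through every stage in order."""
    for stage in stages:
        record = stage(record)
    return record


STAGES = [make_stage(name) for name in STAGE_NAMES]
RESULT = run_pipeline({"url": "https://example.org/paper.pdf"}, STAGES)
print(RESULT["trace"])
```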

TECHNOLOGY STACK

Backend
Runtime — Python 3.13 + FastAPI + uv
Pipeline — Prefect flows + LangGraph agents
Parsing — LiteParse, Mistral OCR (PDF fallback)
Storage
Primary DB — Supabase (Postgres + pgvector)
Graph DB — Neo4j (entity relationships)
Archive — Cloudflare R2 (sources + graph snapshots)
AI Models
Extraction — Claude Sonnet 4.6 (entities + relationships) + GPT-5 (code artifacts), via OpenRouter
Wiki Gen — MiniMax M3 (GPT-5.5 for high-importance entities + weak-citation fallback, GPT-5.4-mini for simple refreshes)
Discovery — Gemini 2.5 Flash (triage) + Gemini 3 Flash Preview (search planning) + GPT-5.5 (strategy review)
Ranking — Claude Sonnet 4.6 (candidate judge + acceptance gate)
Embeddings — Voyage 3 Large (prose) + Voyage Code 3 (code)
OCR — Mistral OCR (PDF fallback)
Frontend
Framework — SvelteKit + Svelte 5 runes + Tailwind CSS 4
Graph Viz — Sigma.js v3 + graphology (WebGL)
Deploy — Cloudflare Workers (edge)
Discovery Sources
Feeds — arXiv, GitHub, Exa (neural search), FireCrawl (web crawl), RSS
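
The Wiki Gen entry above describes a tiered choice of model. A minimal sketch of that routing follows, assuming the escalation cases take precedence over the cheap-refresh case; that ordering, the `WikiJob` fields, and the `pick_wiki_model` function are assumptions made for illustration, not the project's code. The model names are returned as plain labels rather than real API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiJob:
    """Hypothetical description of one wiki-generation request."""

    entity: str
    high_importance: bool = False
    simple_refresh: bool = False
    weak_citations: bool = False  # e.g. a first draft came back poorly cited


def pick_wiki_model(job: WikiJob) -> str:
    """Choose a model label for a job (precedence order is an assumption)."""
    if job.high_importance or job.weak_citations:
        return "GPT-5.5"  # escalation path
    if job.simple_refresh:
        return "GPT-5.4-mini"  # low-cost path for simple refreshes
    return "MiniMax M3"  # default synthesizer


print(pick_wiki_model(WikiJob("riscv-dv", high_importance=True)))
print(pick_wiki_model(WikiJob("some-minor-concept")))
```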

ENTITY KINDS

tool (341) — Verification tools and code generators (riscv-dv, Verilator, cocotb, etc.)
paper (86) — Research papers, conference publications, and technical reports
technique (365) — Verification methodologies — constrained random, formal, coverage-driven
org (101) — Companies, labs, and standards bodies (RISC-V Intl, Synopsys, etc.)
person (220) — Researchers, engineers, and open-source contributors
isa (29) — Instruction set architectures (RISC-V, ARM, x86, MIPS, etc.)
concept (1408) — Verification concepts, standards, and specifications (UVM, PSS, etc.)
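
The seven kinds above map naturally onto an enumeration, and merging repeated mentions of the same entity can be sketched as a dictionary keyed by kind and normalized name. This is a hedged illustration: the `Entity` fields, the normalization rule (lowercase, collapsed whitespace), and the source ids are hypothetical, and the real merge into Neo4j may work quite differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityKind(str, Enum):
    TOOL = "tool"
    PAPER = "paper"
    TECHNIQUE = "technique"
    ORG = "org"
    PERSON = "person"
    ISA = "isa"
    CONCEPT = "concept"


@dataclass
class Entity:
    kind: EntityKind
    name: str
    sources: set[str] = field(default_factory=set)


def merge_key(kind: EntityKind, name: str) -> tuple[str, str]:
    """Hypothetical identity rule: same kind, case- and whitespace-insensitive name."""
    return (kind.value, " ".join(name.lower().split()))


def merge_entities(mentions: list[Entity]) -> dict[tuple[str, str], Entity]:
    """Collapse duplicate mentions, keeping the first spelling and all sources."""
    merged: dict[tuple[str, str], Entity] = {}
    for mention in mentions:
        key = merge_key(mention.kind, mention.name)
        if key in merged:
            merged[key].sources |= mention.sources
        else:
            merged[key] = Entity(mention.kind, mention.name, set(mention.sources))
    return merged


MENTIONS = [
    Entity(EntityKind.TOOL, "Verilator", {"src-1"}),
    Entity(EntityKind.TOOL, "  verilator ", {"src-2"}),
    Entity(EntityKind.ISA, "RISC-V", {"src-1"}),
]
MERGED = merge_entities(MENTIONS)
print(sorted(MERGED))
```

Keying on the kind as well as the name keeps, for example, a tool and a concept that share a name as separate entities.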

CONTACT

Questions, feedback, or collaboration inquiries:

hello@stimsmith.io