Research program
AI for Knowledge Creation — discovery, synthesis, replication
Discovery, synthesis, and replication tooling that keeps signal above the noise. Provenance-first graphs, verification gates, and conformal publish/hold decisions keep outputs auditable.
Data vintage: Oct 2025
Scope at a glance
- Stance: Publish proof. Hold when uncertainty exceeds conformal thresholds.
- Focus: Provenance graphs, novelty scoring, contradiction detection, replication packs, uncertainty gates.
- Deliverable: PageMind & Emplo ship provenance-backed synthesis with replication packs and abstention logs.
Abstract
Open indexes now surface hundreds of millions of research objects. Without provenance and verification, LLM-assisted reading amplifies both findable science and slop. We integrate provenance graphs, citation-intent gates, contradiction sentinels, and replication packs so outputs ship only when conformal thresholds bound residual risk; otherwise we hold and escalate.
State of the field (2024–Oct 2025)
1.1 Provenance & citation integrity
- Provenance graphs spanning Works, Authors, Venues, References, Claims, Datasets, and Code nodes, enriched with citation-intent labels.
- Citation integrity debt tracked via DOI resolvability, title/year/venue matching, and scite role frequencies (minimal check sketched below).
- Citation-intent models (FINECITE) outperform zero-shot LLMs; we gate references and label intent mixes before drafting.
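A minimal sketch of that integrity check, assuming the public Crossref REST API; the similarity threshold and response handling are illustrative:

```python
# Minimal citation-integrity check: DOI resolvability via Crossref plus
# title/year matching. Threshold and field handling are illustrative.
import difflib
import requests

def check_citation_integrity(doi: str, cited_title: str, cited_year: int,
                             title_threshold: float = 0.9) -> dict:
    """Resolve a DOI via Crossref and compare the registered metadata
    against what the citing document claims."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:
        # Unresolvable DOI: immediate integrity debt, block the reference.
        return {"resolves": False, "title_match": False, "year_match": False}
    record = resp.json()["message"]
    registered_title = (record.get("title") or [""])[0]
    date_parts = record.get("issued", {}).get("date-parts", [[None]])
    title_similarity = difflib.SequenceMatcher(
        None, cited_title.lower(), registered_title.lower()).ratio()
    return {
        "resolves": True,
        "title_match": title_similarity >= title_threshold,
        "year_match": date_parts[0][0] == cited_year,
    }
```

A reference failing any field is logged as integrity debt and blocked before drafting.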
1.2 Novelty estimation
- SchNovel introduces RAG-Novelty, improving over embedding-only baselines on 15k paper pairs.
- GraphMind builds hierarchical contextual graphs and constrains novelty scoring with structural neighbours.
- Bibliographic coupling and co-citation remain first-principles baselines we log alongside neural scores.
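Both baselines reduce to set operations over reference lists; a toy sketch (the citation map and IDs are illustrative):

```python
# First-principles novelty baselines over a citation map:
# citing paper ID -> set of referenced paper IDs. Data is illustrative.

def bibliographic_coupling(refs_a: set[str], refs_b: set[str]) -> int:
    """Shared references between two papers; high coupling = topically close."""
    return len(refs_a & refs_b)

def co_citation(target_a: str, target_b: str,
                citation_map: dict[str, set[str]]) -> int:
    """Count citing papers whose reference lists include both targets."""
    return sum(1 for refs in citation_map.values()
               if target_a in refs and target_b in refs)

citation_map = {"w1": {"a", "b", "c"}, "w2": {"b", "c", "d"}, "w3": {"a", "d"}}
print(bibliographic_coupling(citation_map["w1"], citation_map["w2"]))  # 2
print(co_citation("b", "c", citation_map))  # 2 (w1 and w2 cite both)
```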
1.3 Contradiction detection & verification
- SPOT (83 papers) reveals paper-level verification remains difficult (precision 6.1%, recall 21.1%).
- PRISMM-Bench (262 multimodal inconsistencies) requires identify, remedy, and pair-match tasks—best models 26–54% accuracy.
- CliniFact provides 1,970 clinical claim/evidence pairs for domain-specific contradiction mining.
1.4 Replication workflows & packaging
- Follow PRISMA 2020 for review documentation and flow diagrams; attach Datasheets and Model Cards to each pack.
- Target ≤ 50 MB replication packs with deterministic scripts, hashes, and environment locks; log time-to-reproduce (packaging sketch after this list).
- EuroSys/SIGMOD artifact programs show ~72% reproducible, 41% reusable artefacts—our baseline for success.
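A packaging sketch under those constraints; the directory layout and file names are hypothetical:

```python
# Sketch of replication-pack packaging: hash every artefact, enforce the
# 50 MB cap, and emit a manifest. Paths and layout are illustrative.
import hashlib
import json
from pathlib import Path

PACK_LIMIT_BYTES = 50 * 1024 * 1024  # the ≤ 50 MB cap above

def build_manifest(pack_dir: Path) -> dict:
    """Hash every artefact in the pack and enforce the size cap."""
    files, total = {}, 0
    for path in sorted(pack_dir.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            total += len(data)
            files[str(path.relative_to(pack_dir))] = hashlib.sha256(data).hexdigest()
    if total > PACK_LIMIT_BYTES:
        raise ValueError(f"pack is {total} bytes; cap is {PACK_LIMIT_BYTES}")
    return {"total_bytes": total, "sha256": files}

# Usage (assumes a hypothetical ./replication_pack directory exists);
# the manifest is written beside the pack so hashes stay stable.
pack = Path("replication_pack")
pack.with_suffix(".manifest.json").write_text(
    json.dumps(build_manifest(pack), indent=2))
```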
1.5 Human-time saved (with risk controls)
- LLM ensembles for abstract screening (JAMIA 2025) deliver 41.8% workload reduction at 100% sensitivity.
- Review-update workflows and weakly supervised active learning report WSS@95 gains and high recall with pseudo-labelling; WSS@R is computed as sketched below.
- Conformal risk control supplies miscoverage bounds on long-form outputs; thresholds decide publish vs hold.
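For reference, WSS@R (work saved over sampling at recall level R) is the standard screening-efficiency metric behind these numbers; a minimal computation with illustrative counts:

```python
# Work Saved over Sampling at recall level R (standard definition):
# WSS@R = (TN + FN) / N - (1 - R), evaluated at the threshold where
# recall first reaches R (here R = 0.95).

def wss_at_r(tn: int, fn: int, n_total: int, recall_level: float = 0.95) -> float:
    return (tn + fn) / n_total - (1.0 - recall_level)

# Example: screening 10,000 records, the model lets reviewers skip 4,600
# irrelevant ones (TN) while missing 40 relevant ones (FN) at 95% recall.
print(wss_at_r(tn=4600, fn=40, n_total=10_000))  # ≈ 0.414, ~41% work saved
```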
1.6 Uncertainty-aware publish/hold
Conformal Risk Control (CRC) and conformal tail-risk control provide miscoverage-bounded abstention for text outputs. We calibrate on held-out splits, log every abstention, and escalate when citation integrity scores fall below threshold.
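A simplified split-conformal gate rather than the full CRC machinery: calibrate a threshold on held-out risk scores, then publish only below it. Scores and α here are illustrative:

```python
# Minimal split-conformal abstention gate (a simplification of full CRC):
# calibrate a score threshold on held-out examples so that, with
# probability >= 1 - alpha, a publishable output's risk stays below it.
import math

def conformal_threshold(calibration_scores: list[float], alpha: float) -> float:
    """Finite-sample (1 - alpha) quantile with the conformal +1 correction."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return sorted(calibration_scores)[min(rank, n) - 1]

# Held-out risk scores from the calibration split (illustrative values).
scores = [0.02, 0.05, 0.08, 0.11, 0.14, 0.20, 0.31, 0.44, 0.52, 0.70]
tau = conformal_threshold(scores, alpha=0.2)

def decide(risk_score: float) -> str:
    """Publish only when the output's risk score clears the threshold."""
    return "publish" if risk_score <= tau else "hold"

print(tau, decide(0.12), decide(0.60))  # tau = 0.52; publish, then hold
```

Every "hold" decision is logged as an abstention with its score and threshold, which is what makes the gate auditable.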
1.7 Spam pressure & integrity threats
Paper mills and AI-generated survey floods raise the baseline for filtering. Provenance-first pipelines with integrity scoring and contradiction sentinels are the defence. We treat unresolved integrity debt as a blocker.
Visual evidence
Verification frontiers (Oct 2025)
- SPOT precision: 6.1% (paper-level precision on confirmed errors)
- SPOT recall: 21.1% (paper-level recall on confirmed errors)
- PRISMM identify: 54% (reviewer-flagged multimodal inconsistencies, identify task)
- PRISMM remediate: 41% (reviewer-flagged inconsistencies, remedy task)
Replication ecology — pack outcomes
- ≤ 10 MB: 100%
- 10–25 MB: 100%
- 25–50 MB: 100%
Method map (verification-first)
- 1. Data ingestion & provenance graph. Ingest OpenAlex, Crossref, arXiv, OpenReview, Semantic Scholar, scite. Create Work, Venue, Author, Reference, Citance, Claim, Dataset, and Code nodes with resolver status and hashes; a minimal node schema is sketched after this list.
- 2. Conflict & claim layer. Extract claims from text, tables, and figures; align citations; mine contradictions using SPOT/PRISMM and domain suites like CliniFact.
- 3. Constraint-aware synthesis. Draft only from integrity-clean sources; enforce glossary terms; require neighbourhood reading (bibliographic coupling, co-citation, embedding neighbours).
- 4. Verification & risk control. Run contradiction sentinels and conformal risk control (CRC/CTR) to produce publish / abstain / escalate decisions with logged rationales.
- 5. Replication packs. Ship ≤ 50 MB packs with redacted data, env lockfile, deterministic script, Datasheet, and Model Card. Log reproduction success and time.
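A minimal sketch of the step-1 node schema referenced above; node types mirror the method map, while field names and types are assumptions:

```python
# Minimal sketch of the step-1 provenance node schema. Node types mirror
# the method map; the exact fields and types are assumptions.
import hashlib
from dataclasses import dataclass, field

@dataclass
class ProvenanceNode:
    node_id: str
    node_type: str        # Work | Venue | Author | Reference | Citance | Claim | Dataset | Code
    source: str           # e.g. "openalex", "crossref", "arxiv"
    resolver_status: str  # "resolved" | "unresolved" | "mismatch"
    content_hash: str = ""
    edges: list[str] = field(default_factory=list)  # IDs of linked nodes

def ingest_work(work_id: str, source: str, raw_metadata: bytes) -> ProvenanceNode:
    """Hash raw metadata at ingestion so later stages can audit drift."""
    return ProvenanceNode(
        node_id=work_id,
        node_type="Work",
        source=source,
        resolver_status="resolved",
        content_hash=hashlib.sha256(raw_metadata).hexdigest(),
    )
```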
Publish-or-hold policy
Outputs ship only if citation integrity scores clear thresholds, contradictions are absent, and conformal risk stays below the user-set α. Otherwise, we hold, escalate, and log rationale with timestamped reviewer approvals.
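The policy reduces to a three-gate decision function; the integrity floor and field names below are illustrative, not production configuration:

```python
# The publish-or-hold policy as a pure function. Threshold values and
# field names are illustrative, not the production configuration.
from dataclasses import dataclass

@dataclass
class GateInputs:
    integrity_score: float  # citation-integrity score in [0, 1]
    contradictions: int     # open contradiction-sentinel hits
    conformal_risk: float   # calibrated risk estimate
    alpha: float            # user-set risk budget

def publish_or_hold(g: GateInputs, integrity_floor: float = 0.95) -> str:
    if g.integrity_score < integrity_floor:
        return "escalate"   # unresolved integrity debt blocks publishing
    if g.contradictions > 0:
        return "hold"       # contradictions must be cleared first
    if g.conformal_risk > g.alpha:
        return "hold"       # risk budget exceeded: abstain and log
    return "publish"

print(publish_or_hold(GateInputs(0.99, 0, 0.03, alpha=0.05)))  # publish
print(publish_or_hold(GateInputs(0.99, 0, 0.08, alpha=0.05)))  # hold
```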
Benchmark table (selected, 2024–Oct 2025)
| Task | Novelty proxy | Contradiction P/R | Replication success | Corpus size | Human-time saved | Source |
|---|---|---|---|---|---|---|
| Paper-error detection (SPOT) | — | 0.061 / 0.211 | — | 83 papers, 91 confirmed errors | — | May 2025, preprint |
| Reviewer-flag inconsistencies (PRISMM) | — | Task scores 0.26–0.54 | — | 262 inconsistencies, 242 papers | — | Oct 2025, preprint |
| Novelty (GraphMind) | Graph-aware novelty acc 0.50–0.69 | — | — | 3,063 papers | — | May 2025, preprint |
| Novelty (SchNovel) | Pairwise novelty accuracy vs embedding-only baselines | — | — | 15k paper pairs | — | Jul 2025, peer-reviewed |
| Citation intent (FINECITE) | — | — | — | 4 public datasets | — | Jul 2025, peer-reviewed |
| SR screening (JAMIA ensembles) | — | — | — | 119,695 records | 41.8% at 100% sensitivity; up to 99.1% at lower sensitivity | May 2025, peer-reviewed |
| SR pipeline (TrialMind) | — | — | — | 100 systematic reviews, 2,220 studies | +71.4% recall; −44.2% screening time | Aug 2025, peer-reviewed |
| Replication norms (EuroSys AE) | — | — | 75 "Results Reproduced"; ~58% participation | Multi-year AE records | — | Aug 2025, whitepaper |
Integration with inAi products
5.1 PageMind (research & QA)
- Build provenance graphs with contradiction edges and intent labels; filter navigation via glossary enforcement.
- Draft only from integrity-clean sources; embed citance snippets; enforce intent mix quotas (background ≤ 40%; quota check sketched after this list).
- Run sentence- and paper-level contradiction checks (SPOT) plus PRISMM before publishing figure-heavy sections.
- Ship ≤ 50 MB replication packs with hashes, deterministic scripts, Datasheet/Model Card, and time-to-reproduce logs.
- Gate publish-or-hold via CRC/CTR thresholds; escalate when two models disagree or intent consistency breaks.
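A sketch of the intent-mix quota check from the list above; the label vocabulary is an assumption:

```python
# Intent-mix quota check for PageMind drafts: reject a reference set whose
# "background" share exceeds the 40% quota. Label names are illustrative.
from collections import Counter

def intent_mix_ok(intent_labels: list[str], background_cap: float = 0.40) -> bool:
    counts = Counter(intent_labels)
    background_share = counts["background"] / max(len(intent_labels), 1)
    return background_share <= background_cap

print(intent_mix_ok(["background", "method", "result",
                     "background", "method"]))            # True (40%)
print(intent_mix_ok(["background", "background", "method"]))  # False (~67%)
```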
5.2 Emplo (evidence-grounded business packets)
- Require source trace for claims about roles, companies, or revenues.
- Enforce citation integrity checks (DOI/URL resolve, metadata match); flag unsupported claims under CRC.
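A sketch of the Emplo claim gate under these rules; the `Claim` fields are hypothetical:

```python
# Emplo claim gate sketch: every business claim must carry a source that
# passed the integrity checks; otherwise it is flagged for CRC abstention.
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source_url: str | None  # None = no source trace at all
    integrity_ok: bool      # result of the DOI/URL + metadata checks

def flag_unsupported(claims: list[Claim]) -> list[Claim]:
    """Return claims that lack a trace or failed integrity checks."""
    return [c for c in claims if c.source_url is None or not c.integrity_ok]

claims = [
    Claim("ACME revenue grew 12% in FY2024", "https://example.com/filing", True),
    Claim("Jane Doe is ACME's CTO", None, False),
]
print([c.text for c in flag_unsupported(claims)])  # only the untraced claim
```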
Figures & tables to include
- F1: Provenance-aware literature map (above).
- F2: Verification bars (above) with 95% CIs where reported.
- F3: Replication ecology stacked bars (above) showing reproduce/reuse/fail buckets.
- T1: Benchmark table (above) with novelty, contradiction, replication, time saved, corpus size.
- T2: Method map diagram (described in section 2).
Open problems (KCR-1 invites)
- Reliable novelty at small scale—compare GraphMind, SchNovel, neighbourhood density, and human scoring for n < 50 candidates.
- Contradiction mining at paper scale—push SPOT/PRISMM recall > 50% without collapsing precision; add multimodal sentinels.
- Minimal replication ecology—determine smallest pack that keeps ≥ 80% reproduction across labs using ARI/AE baselines.
- Conformal thresholds for publish/hold—study CRC vs tail-risk control for long-form synthesis and abstention budgets.
- Auditable citation chains through transformations—preserve intent and provenance across translation/summarisation.
- Spam filtration at scale—detect paper-mill and AI-survey floods without suppressing legitimate rapid reviews.
Integration notes
- Adopt: GraphMind features as constraints; SchNovel for evaluation only.
- Adopt: Citation-intent encoders for gating references and labelling intent mix; Crossref and scite for integrity scoring.
- Adopt: Verification gates using SPOT and PRISMM; log all abstentions with CRC/CTR thresholds.
- Adopt: Perfect-sensitivity screening with WSS@95 reporting; replication norms from ARI/EuroSys.
- Revise: Weakly-supervised screening limited to curator-approved domains.
- Reject: Auto-publish without PRISMM checks or with unresolved citation integrity debt.
Limits & failure controls
- Hallucination traps (unanswerable prompts, synthetic DOIs) plus CRC/CTR gating; a DOI-syntax trap check is sketched after this list.
- Citation integrity enforcement via Crossref resolve + title/year/venue matching; block mismatches.
- Contradiction sentinels on sentence and paper level; run SPOT/PRISMM before publish.
- Red-team prompts to force unsupported claims; require dual-model agreement + human spot checks.
- Packaging caps (≤ 50 MB, hashed artefacts, deterministic logs); audit trails by email only.
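A DOI-syntax trap check for the red-team suite above; the regex is the commonly cited DOI pattern, and the example DOIs are synthetic:

```python
# Red-team trap sketch: a syntactically plausible but unregistered DOI
# must fail resolution and force a hold. Regex is the common DOI pattern.
import re

DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def is_plausible_doi(doi: str) -> bool:
    """Syntax check only; resolvability still requires the Crossref lookup."""
    return bool(DOI_PATTERN.match(doi))

print(is_plausible_doi("10.1000/fake.2025.001"))  # True: plausible, may not resolve
print(is_plausible_doi("not-a-doi"))              # False: blocked before lookup
```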
Data vintage: Oct 2025 · Last updated 01 Oct 2025
