Active Research Program

Research

Where orchestration meets AGI, and where agents learn to run the business.

We study how intelligence scales through information management as much as through raw model capacity, and we turn those findings into products. Research is a core engine at inAi, not an accessory.

Limits of Intelligence — orchestration vs raw capacity

Brute scale lifts capability, but €/task, s/task, and seed-variance rise with it. Orchestration—decomposition, routing, verification, and information management—lets smaller or mid-sized models match or beat larger monoliths at equal quality, pushing the cost–latency Pareto frontier outward while stabilizing outcomes.

Why inAi cares. We engineer information-management stacks that feed PageMind (retail catalog content) and Emplo (automated applications). Our stance on AGI/SAI/ASI is explicit: agents run parts of a business under measured constraints. We publish tests, audits, and numbers; no timelines; contact by email only.

What we study

  • Depth vs capability: find the sweet-spot decomposition depth d before errors accumulate.
  • Error propagation & repairability: verifier-guided decoding, constrained formats, and rollback to prevent cascades.
  • Dynamic routing / mixture-of-agents: uncertainty-aware routing and selective retrieval to minimise tokens and spend at matched quality.
  • Frontiers: quantify €/task, s/task, variance, and repair rate at fixed win-rate.
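
To make the frontier concrete, here is a minimal sketch of extracting a cost–latency Pareto frontier from evaluation runs at a fixed win-rate; the `Run` record, its fields, and the example numbers are illustrative assumptions, not figures from our harness.

```python
from dataclasses import dataclass

@dataclass
class Run:
    config: str          # e.g. "mid-size model + verifier routing"
    eur_per_task: float  # €/task
    sec_per_task: float  # s/task
    win_rate: float      # quality vs. a fixed reference, in [0, 1]

def pareto_frontier(runs: list[Run], min_win_rate: float = 0.50) -> list[Run]:
    """Keep runs that meet the quality bar and are not dominated on both cost and latency."""
    eligible = [r for r in runs if r.win_rate >= min_win_rate]
    frontier = []
    for r in eligible:
        dominated = any(
            o.eur_per_task <= r.eur_per_task and o.sec_per_task <= r.sec_per_task
            and (o.eur_per_task < r.eur_per_task or o.sec_per_task < r.sec_per_task)
            for o in eligible
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.eur_per_task)

runs = [
    Run("large monolith", 0.042, 9.1, 0.52),
    Run("mid model + orchestration", 0.018, 6.3, 0.53),
    Run("small model, no verifier", 0.009, 4.0, 0.41),  # dropped: below the quality bar
]
print([r.config for r in pareto_frontier(runs)])  # -> ['mid model + orchestration']
```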

Methods we disclose

  • Multi-sampling / self-consistency and parallel cross-analysis (see the sketch after this list).
  • Verifier stacks (process/outcome) for verifier-guided decoding.
  • Plan–act–verify with rollback and traceable exports.
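
A minimal sketch of multi-sampling with verifier-guided selection; the `generate` and `verifier_score` callables are hypothetical placeholders for a sampler and a verifier model, not our production stack.

```python
import collections
from typing import Callable, Optional

def self_consistent_answer(
    prompt: str,
    generate: Callable[[str], str],               # hypothetical sampler: one candidate per call
    verifier_score: Callable[[str, str], float],  # hypothetical verifier: higher is better
    k: int = 8,
    min_score: float = 0.7,
) -> Optional[str]:
    """Sample k candidates, keep those the verifier accepts, then majority-vote."""
    accepted = []
    for _ in range(k):
        candidate = generate(prompt)
        if verifier_score(prompt, candidate) >= min_score:
            accepted.append(candidate)
    if not accepted:
        return None  # caller spends repair budget or escalates
    return collections.Counter(accepted).most_common(1)[0][0]
```

Returning None instead of the best rejected candidate is what lets the caller trigger repair or escalation rather than ship an unverified output.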

Product impact

  • PageMind: lower €/SKU, spec-clean, source-traceable content.
  • Emplo: fewer oversight minutes, stable latency at budget caps, escalations only for hard cases.

What we publish vs keep private

  • Publish win-rate curves, ablations, robustness/variance audits, failure taxonomies, protocols, small redacted datasets.
  • Keep code and full diagrams private.

Pareto frontier: orchestration wins on cost–latency at equal quality. Data vintage: Oct 2025.

Agentic Decision Systems

An agentic decision system is a business executor with a goal-conditioned world model, a planner that proposes minimal plans, an actor that uses tools/APIs, a verifier that checks outcomes against constraints, and memory that preserves state and precedents. It plans → acts → verifies → repairs under measurable gates, leaving per-decision logs so agents can run parts of a business under audit.
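
A minimal sketch of that loop, assuming hypothetical planner, actor, verifier, repair, and rollback callables; the per-decision trace stands in for the audit log described above.

```python
from typing import Any, Callable

def run_decision(
    goal: str,
    plan: Callable[[str], list[str]],        # proposes a minimal ordered list of steps
    act: Callable[[str], Any],               # executes one step via tools/APIs
    verify: Callable[[str, Any], bool],      # checks the outcome against constraints
    repair: Callable[[str, Any], Any],       # one bounded repair attempt
    rollback: Callable[[list[dict]], None],  # undoes applied steps on unrepaired failure
    max_repairs: int = 1,
) -> tuple[bool, list[dict]]:
    """Plan, act, verify, repair; keep a per-decision trace; roll back if repair fails."""
    trace: list[dict] = []
    for step in plan(goal):
        outcome = act(step)
        ok = verify(step, outcome)
        repairs = 0
        while not ok and repairs < max_repairs:
            outcome = repair(step, outcome)
            ok = verify(step, outcome)
            repairs += 1
        trace.append({"step": step, "outcome": outcome, "verified": ok, "repairs": repairs})
        if not ok:
            rollback(trace)  # return the system to a known-good state
            return False, trace
    return True, trace
```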

Why this matters. We pursue measurable autonomy that reduces cost and latency without losing control. Autonomy is earned via numbers—decision accuracy, repair rate, oversight minutes per 100 tasks, escalation rate, unit cost—and maintained with auditability and rollback.

What we study

  • Autonomy gates: Assist → Approve → Auto-with-review → Auto. Example gates: Approve ≥ 70% accuracy and ≤ 50 oversight min/100 tasks; Auto-with-review ≥ 85% accuracy, ≥ 50% repair rate, ≤ 20 oversight min; Auto ≥ 95% accuracy, ≤ 5% escalations, ≤ 5 oversight min, 0 Sev-A per 10k tasks (see the sketch after this list).
  • Oversight minutes/100 tasks as the primary field KPI.
  • Recovery from bad states: repair loops, rollback, bounded retries.
  • Audit trails: per-decision traces of inputs, tool calls, checks, outcomes.
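
A minimal sketch of checking measured KPIs against the example gates above; the field names are illustrative and the thresholds simply mirror the examples in this list.

```python
from dataclasses import dataclass

@dataclass
class Kpis:
    accuracy: float               # decision accuracy, in [0, 1]
    repair_rate: float            # share of failures repaired automatically
    oversight_min_per_100: float  # oversight minutes per 100 tasks
    escalation_rate: float        # share of tasks escalated to a human
    sev_a_per_10k: float          # severe incidents per 10k tasks

def autonomy_level(k: Kpis) -> str:
    """Return the highest autonomy level whose example gate the measured KPIs satisfy."""
    if (k.accuracy >= 0.95 and k.escalation_rate <= 0.05
            and k.oversight_min_per_100 <= 5 and k.sev_a_per_10k == 0):
        return "Auto"
    if k.accuracy >= 0.85 and k.repair_rate >= 0.50 and k.oversight_min_per_100 <= 20:
        return "Auto-with-review"
    if k.accuracy >= 0.70 and k.oversight_min_per_100 <= 50:
        return "Approve"
    return "Assist"

print(autonomy_level(Kpis(0.91, 0.62, 14, 0.08, 0.0)))  # -> "Auto-with-review"
```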

Methods we disclose

  • Mixture-of-agents routing by task class with role-scoped permissions (see the sketch after this list).
  • Verifier stacks (policy/temporal checks, post-action assertions, sampled audits).
  • Exception channels with transactional rollback and escalation gates.
  • Human-in-the-loop approvals calibrated to risk and drift.
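
A minimal sketch of routing by task class with role-scoped tool permissions; the task classes, roles, and tool names are illustrative assumptions, not our internal routing policy (which stays private).

```python
# Hypothetical routing table: task class -> (agent role, tools that role may call).
ROUTES: dict[str, tuple[str, set[str]]] = {
    "attribute_extraction": ("catalog_agent", {"read_catalog", "write_draft"}),
    "candidate_screening": ("screening_agent", {"read_profile", "score_candidate"}),
    "support_triage": ("triage_agent", {"read_ticket", "set_priority"}),
}

def route(task_class: str, requested_tool: str) -> str:
    """Pick the agent role for a task class and enforce its tool scope."""
    if task_class not in ROUTES:
        return "escalate_to_human"  # unknown class: exception channel
    role, allowed_tools = ROUTES[task_class]
    if requested_tool not in allowed_tools:
        raise PermissionError(f"{role} may not call {requested_tool}")
    return role

print(route("support_triage", "set_priority"))  # -> "triage_agent"
```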

Where this shows up

  • PageMind — Catalog ops: Auto-with-review on attribute extraction/category mapping with F1 gates, drift alarms, full trace.
  • Emplo — Candidate ops: screening & scheduling at Assist → Auto-with-review; screening cost −25–60% with bias audits.
  • Support ops: triage Auto, FAQ Auto-with-review; 3–4× throughput/agent at fixed budget; severe violations ≤ 1/1k.

What we publish vs keep private

  • Publish: autonomy ladders, unit-economics deltas, drift monitoring methods.
  • Keep private: internal routing policies, vendor/model cost curves, prompts, and deployment diagrams.

Open problems we’re tackling

  • Stable gates across domains/languages without retuning.
  • Cost envelopes that hold under volume spikes with safe model downgrades.
  • Automated rollback heuristics that localise faults and limit blast radius.

Autonomy ladder with numeric gates from Assist to Auto. Data vintage: Oct 2025.


AI for Knowledge Creation

What’s the problem

The research firehose risks collapsing signal into noise: crucial results are lost, claims are hard to verify, and near-duplicates dilute attention. Tooling must preserve provenance (who said what, backed by which evidence) and enforce truthfulness with contradiction checks and uncertainty-aware publish-or-hold gates.

Why inAi cares

inAi’s mission is to raise the signal of discovery, synthesis, and replication for researchers and partners. PageMind retains a glossary memory for consistent terminology, enforces citation integrity (DOI resolvability, metadata match, citation intent), and provides source traces down to claim spans. We focus on methods current as of October 2025 and adopt foundational ideas only when they directly explain what works now.

What we study

  • Provenance-aware mapping: claim extraction, citation matching, citation intent, support vs. contradiction edges.
  • Verification-first synthesis: constraint-aware generation that only states claims grounded in retrieved evidence, with automatic contradiction flags.
  • Novelty proxies: bibliographic coupling, co-citation structure, and embedding-distance baselines calibrated to expert judgement.
  • Replication packs: small, meaningful bundles (hash-checked code, subset data, env lockfile, data/model cards) that reproduce key tables/figures.
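
As one illustration of hash-checked replication packs, here is a minimal sketch that builds a manifest of file hashes for a pack directory; the directory layout and manifest shape are assumptions, not a published format.

```python
import hashlib
import json
from pathlib import Path

def build_manifest(pack_dir: str) -> dict:
    """Hash every file in a replication pack so reviewers can verify nothing drifted."""
    root = Path(pack_dir)
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            files[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {"pack": root.name, "files": files}

# Usage: ship manifest.json alongside the code, subset data, and env lockfile.
manifest = build_manifest("replication_pack")  # hypothetical pack directory
Path("manifest.json").write_text(json.dumps(manifest, indent=2))
```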

Methods we disclose

  • Contradiction flags at sentence/section/paper level via retrieval-augmented checks.
  • Conformal thresholds for selective generation and publish-or-hold decisions (see the sketch after this list).
  • Small redacted datasets with deterministic logs for external review.
  • Fixed-corpus harnesses with preregistered prompts, seeds, and metrics.
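
A minimal sketch of a split-conformal cutoff for publish-or-hold decisions, assuming a nonconformity score per claim (lower = better supported) and a calibration set of verified claims; the scores and the 10% target error rate are illustrative.

```python
import math

def conformal_threshold(calibration_scores: list[float], alpha: float = 0.1) -> float:
    """Split-conformal cutoff: the ceil((n+1)(1-alpha))-th smallest nonconformity score."""
    n = len(calibration_scores)
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)  # guard for tiny calibration sets
    return sorted(calibration_scores)[rank - 1]

def publish_or_hold(claim_score: float, threshold: float) -> str:
    """Publish a claim only if its nonconformity score stays under the calibrated cutoff."""
    return "publish" if claim_score <= threshold else "hold"

# Illustrative calibration scores (lower = better supported by retrieved evidence).
threshold = conformal_threshold([0.05, 0.11, 0.08, 0.21, 0.14, 0.09, 0.30, 0.12, 0.07, 0.18])
print(publish_or_hold(0.10, threshold), publish_or_hold(0.35, threshold))  # -> publish hold
```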

Where this shows up in products

  • PageMind literature maps with support/contradict edges and hover-to-source spans.
  • Redacted replication bundles attached to reports (code, subset data, manifest, env).
  • Audit-ready exports (JSON/CSV/GraphML) of claims, evidence, citation intent, and uncertainty.

What we publish vs keep private

  • Publish: protocols (KCR-1), provenance maps, redacted samples, negative results, leaderboards.
  • Private: full internal datasets, proprietary weights, and partner-specific prompts/logs.

Open problems we’re tackling (KCR-1)

  • Robust novelty metrics that track expert judgements across venues.
  • De-dup of synthetic noise without suppressing legitimate convergence.
  • Cross-lab reproducibility with the smallest still-effective replication pack.
  • Paper-level contradiction mining that scales recall without flooding users.

Schematic provenance map shows citation and contradiction edges across clustered topics. Data vintage: Oct 2025.

AI and Business Operations — where work actually moves

Tasks migrate from assist → supervise → run only when the economics are proven and auditable: lower € per accepted item at the same or better quality/latency, with traceable decisions and a safe rollback.

Why inAi cares. We map and measure this transition in production workloads—retail catalog ops, candidate ops, and support triage—so leaders can raise throughput at a fixed budget without raising risk. Our EU-ready posture (regional processing, no training on customer data without permission) and source-trace ledger tie each output to inputs, checks, and approvals.

What we study

  • Unit cost per accepted item: Assist −20–40%; Supervise −50–70%; Run −70–95% (domain-weighted, Oct 2025; see the sketch after this list).
  • Constraint-violations/1k outputs: severe ≤ 1 to stay in “run”; automatic revert on breach.
  • Oversight minutes/100 tasks: Assist 30–60; Supervise 8–15; Run ≤ 3 via confidence routing + sampling.
  • Drift & recovery: control charts and acceptance sampling; revert-repair-re-promote with holdout monitors.

Methods we disclose

  • Verifier stacks (grounding, policy, schema, domain constraints).
  • Mixture-of-agents routing by confidence/cost envelope.
  • Repair budgets (bounded retries, structured edits).
  • Monitored acceptance gates (SPC limits, sampling plans, holdback tests).
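
A minimal sketch of a monitored acceptance gate that compares a sampled defect rate against control limits derived from recent history; the history, sample, and 3-sigma limit are illustrative.

```python
from statistics import mean, pstdev

def acceptance_gate(defect_history: list[float], recent_sample: list[int], sigma: float = 3.0) -> str:
    """Compare a sampled defect rate against control limits from recent history;
    a breach reverts the workload to the previous autonomy level."""
    center = mean(defect_history)                           # historical defect rate per batch
    upper = center + sigma * pstdev(defect_history)         # upper control limit
    sampled_rate = sum(recent_sample) / len(recent_sample)  # 1 = defective, 0 = accepted
    return "hold_level" if sampled_rate <= upper else "revert_and_repair"

history = [0.012, 0.009, 0.015, 0.011, 0.013, 0.010]  # illustrative defect rates
print(acceptance_gate(history, [0] * 97 + [1] * 3))   # 3% defects -> "revert_and_repair"
```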

Where this shows up

  • PageMind (retail): attributes & compliance run at Auto/Auto-with-review; € per accepted item −50–90% by attribute; sub-minute publish SLAs.
  • Emplo (candidate ops): screening & scheduling at Assist → Auto-with-review; screening cost −25–60% with bias audits.
  • Support ops: triage Auto, FAQ Auto-with-review; 3–4× throughput/agent at fixed budget; severe violations ≤ 1/1k.

What we publish vs keep private

  • Publish: autonomy ladders, unit-economics deltas, drift monitoring methods.
  • Keep private: internal routing policies, vendor/model cost curves, prompts, and deployment diagrams.

Open problems we’re tackling

  • Stable gates across domains/languages without retuning.
  • Cost envelopes that hold under volume spikes with safe model downgrades.
  • Automated rollback heuristics that localise faults and limit blast radius.

Unit-economics waterfall — €/accepted item vs baseline across autonomy levels. Data vintage: Oct 2025.


Our platform and methods (disclosed classes)

  • Model orchestration. Multi-sampling, self-consistency, debate/cross-analysis, verifier models, dynamic routing across providers, mixture-of-agents for heterogeneous skills.
  • Data shapes. PDFs, images, audio, tables, and private chats/emails with explicit permission or synthetic/redacted surrogates. Outputs include source traceability to enable audits.
  • Information layering. Cross-archive mapping, memory modes (glossary, traceability, vector), deterministic transformations at boundaries, format/constraint controllers (see the sketch after this list).
  • Agent loop. Goal → plan → act → verify → revise, with rollback, recovery, and exception channels.
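
A minimal sketch of a format/constraint controller that gates a structured record before export; the fields and constraints are illustrative, not PageMind's actual schema.

```python
def check_record(record: dict) -> list[str]:
    """Return constraint violations for one catalog-style record; an empty list means exportable."""
    violations = []
    for field in ("sku", "title", "price_eur", "source"):
        if field not in record:
            violations.append(f"missing field: {field}")
    if "price_eur" in record and not isinstance(record["price_eur"], (int, float)):
        violations.append("price_eur must be numeric")
    if "title" in record and len(str(record["title"])) > 150:
        violations.append("title exceeds 150 characters")
    if "source" in record and not str(record["source"]).startswith("https://"):
        violations.append("source must be an https URL")
    return violations

print(check_record({"sku": "A-1", "title": "Blue kettle", "price_eur": "19.9",
                    "source": "https://example.com/p/1"}))  # -> ['price_eur must be numeric']
```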

What already exists (and feeds the products)

Unstructured → structured pipelines and cross-archive mapping form PageMind's operational core (ready-to-use spreadsheets with per-row traceability).

Multi-model dialogue and verification strengthen match quality in Emplo and support PageMind's export checks.

Decision scaffolds lay the groundwork for future agentic modes in both products (bounded autonomy, audit trail, human takeover).

Access explorations feed the design of native keyboards, wearables, and messaging surfaces.

Collaborations

We work with universities, labs, and pilot sites. Contact by email only.

Universities and labs

Joint studies on orchestration vs capacity, agentic decision audits, and evaluation harnesses. Co-authored technical notes and clearly scoped student projects with publishable deliverables.

Grants

Applied research on multilingual operations, employability, and agentic decision-making. Work with redacted/synthetic data or permissioned datasets under strict scopes.

Pilots

Real data, predefined acceptance metrics, a short written report. We publish protocols and results when ready.

Publications and sharing policy

We publish problem statements, evaluation protocols, benchmarks, audits, small redacted datasets, literature maps, and periodic technical notes.

We keep proprietary code, full system diagrams, and internal datasets private. In-depth material is shared selectively under agreement. Demos are arranged by email when relevant.

Explorations

  • Native keyboard agents. Privacy-preserving context windows, per-app consent, low-latency n-best completions with constraint-aware decoding.
  • Wearables. Micro-instructions, instant critiques, on-device retrieval for privacy; short-horizon planners with haptic or visual feedback.
  • Messaging surfaces. Agent threads with structured memory, tool calls triggered by message prefixes, multi-step forms compressed into conversational procedures with verifiers.

These explorations guide design; they are not product announcements.

Evaluation protocols

OVC-1 Orchestration vs Capacity

Tasks: attribute extraction, constrained generation, glossary-backed translation, multi-document synthesis.

Metrics: exact match/soft match, citation accuracy, constraint violations, cost, latency, variance across seeds.

Procedure: k-sample self-consistency, verifier thresholds, repair budget, Pareto fronts.

ADA-1 Agentic Decision Audit

Tasks: three classes drawn from the agentic business core.

Metrics: decision accuracy, plan minimality, recovery rate, oversight minutes per 100 tasks, exception escalation rate.

Procedure: blinded reviewers, acceptance thresholds declared in advance, stratified sampling for audits.

KCR-1 Knowledge Creation and Replication

Tasks: produce a literature review with contradiction flagging; generate replication packs.

Metrics: novelty proxy, precision/recall of contradiction detection (sketched below), replication success rate.

Procedure: fixed corpora with injected conflicts; both positive and negative results published.
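
As a minimal illustration of the contradiction-detection metric above, here is how precision and recall over flagged claim pairs can be computed; the paper identifiers and labels are illustrative, not results.

```python
def precision_recall(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> tuple[float, float]:
    """Precision and recall of flagged contradiction pairs against annotated gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

gold = {("paper_A", "paper_B"), ("paper_A", "paper_C"), ("paper_D", "paper_E")}
predicted = {("paper_A", "paper_B"), ("paper_D", "paper_E"), ("paper_B", "paper_C")}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # -> precision=0.67 recall=0.67
```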