Research
Where orchestration meets AGI—and agents learn to run a business.
We study how intelligence scales through the management of information as much as through raw model capacity, and we turn those findings into products. Research is a first-class engine at inAi, not an accessory.
Four research themes
Limits of Intelligence
Decomposition, routing, and verification stacks that beat raw scale on cost, latency, and stability.
Agentic Decision Systems
Decision systems that plan, act, and repair so agents can run parts of a business under audit.
AI for Knowledge Creation
Discovery, synthesis, and replication tooling that keeps signal above the noise and publishes proof.
AI and Business Operations
A map of the shift from assist to run, with the economics that justify autonomy across real operations.
Limits of Intelligence — orchestration vs raw capacity
Brute scale lifts capability, but €/task, s/task, and seed-variance rise with it. Orchestration—decomposition, routing, verification, and information management—lets smaller or mid-sized models match or beat larger monoliths at equal quality, pushing the cost–latency Pareto frontier outward while stabilizing outcomes.
Why inAi cares. We engineer the information-management stacks that feed PageMind (retail catalog content) and Emplo (automated applications). Our stance on AGI/SAI/ASI is explicit: agents run parts of a business under measured constraints. We publish tests, audits, and numbers; we give no timelines; contact is by email only.
What we study
- Depth vs capability: find the sweet-spot decomposition depth d before errors accumulate.
- Error propagation & repairability: verifier-guided decoding, constrained formats, and rollback to prevent cascades.
- Dynamic routing / mixture-of-agents: uncertainty-aware routing and selective retrieval to minimise tokens and spend at matched quality (see the sketch after this list).
- Frontiers: quantify €/task, s/task, variance, and repair rate at fixed win-rate.
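To make the routing idea concrete, here is a minimal sketch assuming hypothetical `call_small`, `call_large`, and `verifier_score` callables; it is an illustration, not our production router.

```python
def route(task, call_small, call_large, verifier_score, tau=0.75):
    """Uncertainty-aware routing sketch: draft with the small model and
    escalate to the large model only when the verifier score is low.

    call_small, call_large, verifier_score are hypothetical callables."""
    draft = call_small(task)
    score = verifier_score(task, draft)  # verifier confidence in [0, 1]
    if score >= tau:
        return {"answer": draft, "model": "small", "score": score}
    # Low-confidence drafts escalate: costlier per task, but quality holds.
    return {"answer": call_large(task), "model": "large", "score": score}
```

At matched quality, the share of tasks that never leave the small model is what moves €/task and s/task.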
Methods we disclose
- Multi-sampling / self-consistency and parallel cross-analysis (sketched below).
- Verifier stacks (process/outcome) for verifier-guided decoding.
- Plan–act–verify with rollback and traceable exports.
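A minimal sketch of multi-sampling with a verifier gate, assuming hypothetical `generate` and `verify` callables; this illustrates the disclosed method class, not the full stack.

```python
from collections import Counter

def self_consistent(task, generate, verify, k=7):
    """Sample k candidates, drop those failing process/outcome checks,
    and return the majority survivor (None signals repair or escalation)."""
    candidates = [generate(task) for _ in range(k)]
    accepted = [c for c in candidates if verify(task, c)]
    if not accepted:
        return None  # nothing verified: trigger repair loop or escalate
    return Counter(accepted).most_common(1)[0][0]
```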
Product impact
- PageMind: lower €/SKU, spec-clean, source-traceable content.
- Emplo: fewer oversight minutes, stable latency at budget caps, escalations only for hard cases.
What we publish vs keep private
- Publish win-rate curves, ablations, robustness/variance audits, failure taxonomies, protocols, small redacted datasets.
- Keep code and full diagrams private.
Pareto frontier: orchestration wins on cost–latency at equal quality. Data vintage: Oct 2025.
Agentic Decision Systems
An agentic decision system is a business executor with a goal-conditioned world model, a planner that proposes minimal plans, an actor that uses tools/APIs, a verifier that checks outcomes against constraints, and memory that preserves state and precedents. It plans → acts → verifies → repairs under measurable gates, leaving per-decision logs so agents can run parts of a business under audit.
Why this matters. We pursue measurable autonomy that reduces cost and latency without losing control. Autonomy is earned via numbers—decision accuracy, repair rate, oversight minutes per 100 tasks, escalation rate, unit cost—and maintained with auditability and rollback.
What we study
- Autonomy gates: Assist → Approve → Auto-with-review → Auto, with numeric thresholds (see the sketch after this list). Example gates: Approve at ≥ 70% accuracy and ≤ 50 oversight min/100 tasks; Auto-with-review at ≥ 85% accuracy, ≥ 50% repair, and ≤ 20 oversight min/100 tasks; Auto at ≥ 95% accuracy, ≤ 5% escalations, ≤ 5 oversight min/100 tasks, and 0 Sev-A incidents per 10k tasks.
- Oversight minutes/100 tasks as the primary field KPI.
- Recovery from bad states: repair loops, rollback, bounded retries.
- Audit trails: per-decision traces of inputs, tool calls, checks, outcomes.
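The example gates reduce to a simple, auditable check. A minimal sketch using only the thresholds listed above; production gates add drift, severity, and domain conditions.

```python
def autonomy_level(accuracy, repair_rate, oversight_min_per_100,
                   escalation_rate, sev_a_per_10k):
    """Map measured KPIs to the highest autonomy level whose gate they pass.
    Thresholds mirror the example gates listed above."""
    if (accuracy >= 0.95 and escalation_rate <= 0.05
            and oversight_min_per_100 <= 5 and sev_a_per_10k == 0):
        return "Auto"
    if accuracy >= 0.85 and repair_rate >= 0.50 and oversight_min_per_100 <= 20:
        return "Auto-with-review"
    if accuracy >= 0.70 and oversight_min_per_100 <= 50:
        return "Approve"
    return "Assist"
```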
Methods we disclose
- Mixture-of-agents routing by task class with role-scoped permissions (example below).
- Verifier stacks (policy/temporal checks, post-action assertions, sampled audits).
- Exception channels with transactional rollback and escalation gates.
- Human-in-the-loop approvals calibrated to risk and drift.
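A minimal sketch of task-class routing with role-scoped permissions; the task classes, agent roles, and tool names are hypothetical, and real routing policies stay private.

```python
# Hypothetical task classes, agent roles, and tool scopes for illustration.
ROUTES = {
    "attribute_extraction": {"agent": "extractor", "tools": {"catalog_read"}},
    "category_mapping":     {"agent": "mapper",    "tools": {"catalog_read", "taxonomy_read"}},
    "price_update":         {"agent": "pricer",    "tools": {"catalog_read", "catalog_write"}},
}

def dispatch(task_class, requested_tools):
    """Route a task to its agent and reject tool calls outside the role's scope."""
    route = ROUTES.get(task_class)
    if route is None:
        raise ValueError(f"unknown task class: {task_class}")  # exception channel
    illegal = set(requested_tools) - route["tools"]
    if illegal:
        raise PermissionError(f"tools outside role scope: {sorted(illegal)}")
    return route["agent"]
```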
Where this shows up
- PageMind — Catalog ops: Auto-with-review on attribute extraction/category mapping with F1 gates, drift alarms, full trace.
- Emplo — Candidate ops: screening & scheduling at Assist → Auto-with-review; screening cost −25–60% with bias audits.
- Support ops: triage Auto, FAQ Auto-with-review; 3–4× throughput/agent at fixed budget; severe violations ≤ 1/1k.
What we publish vs keep private
- Publish: autonomy ladders, unit-economics deltas, drift monitoring methods.
- Keep private: internal routing policies, vendor/model cost curves, prompts, and deployment diagrams.
Open problems we’re tackling
- Stable gates across domains/languages without retuning.
- Cost envelopes that hold under volume spikes with safe model downgrades.
- Automated rollback heuristics that localise faults and limit blast radius.
Autonomy ladder with numeric gates from Assist to Auto. Data vintage: Oct 2025.
AI for Knowledge Creation
What’s the problem
The research firehose risks collapsing signal into noise: crucial results are lost, claims are hard to verify, and near-duplicates dilute attention. Tooling must preserve provenance (who said what, backed by which evidence) and enforce truthfulness with contradiction checks and uncertainty-aware publish-or-hold gates.
Why inAi cares
inAi’s mission is to raise discovery, synthesis, and replication signal for researchers and partners. PageMind retains a glossary memory for consistent terminology, enforces citation integrity (DOI resolvability, metadata match, citation intent), and provides source trace down to claim spans. We focus on methods current as of October 2025 and adopt foundational ideas only when they directly explain what works now.
What we study
- Provenance-aware mapping: claim extraction, citation matching, citation intent, support vs. contradiction edges.
- Verification-first synthesis: constraint-aware generation that only states claims grounded in retrieved evidence, with automatic contradiction flags.
- Novelty proxies: bibliographic coupling, co-citation structure, and embedding-distance baselines calibrated to expert judgement.
- Replication packs: small, meaningful bundles (hash-checked code, subset data, env lockfile, data/model cards) that reproduce key tables/figures.
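A minimal sketch of the hash-checked part of a replication pack, assuming a flat JSON manifest of SHA-256 digests; the actual pack layout may differ.

```python
import hashlib
import json
from pathlib import Path

def build_manifest(pack_dir):
    """Record a SHA-256 digest for every file in the replication pack."""
    root = Path(pack_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def verify_manifest(pack_dir, manifest_path):
    """Return the files whose digests no longer match the stored manifest."""
    expected = json.loads(Path(manifest_path).read_text())
    actual = build_manifest(pack_dir)
    return [name for name, digest in expected.items() if actual.get(name) != digest]
```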
Methods we disclose
- Contradiction flags at sentence/section/paper level via retrieval-augmented checks.
- Conformal thresholds for selective generation and publish-or-hold decisions (sketched after this list).
- Small redacted datasets with deterministic logs for external review.
- Fixed-corpus harnesses with preregistered prompts, seeds, and metrics.
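A minimal sketch of a split-conformal publish-or-hold gate, assuming one scalar nonconformity score per claim (higher means weaker evidential support); calibration and scoring details are simplified.

```python
import math

def conformal_threshold(calibration_scores, alpha=0.1):
    """Split-conformal quantile: claims scoring at or below the threshold are
    published with roughly (1 - alpha) coverage; the rest are held."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))              # conservative quantile index
    return sorted(calibration_scores)[min(rank, n) - 1]  # capped for tiny n

def publish_or_hold(score, threshold):
    return "publish" if score <= threshold else "hold"
```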
Where this shows up in products
- PageMind literature maps with support/contradict edges and hover-to-source spans.
- Redacted replication bundles attached to reports (code, subset data, manifest, env).
- Audit-ready exports (JSON/CSV/GraphML) of claims, evidence, citation intent, and uncertainty.
What we publish vs keep private
- Publish: protocols (KCR-1), provenance maps, redacted samples, negative results, leaderboards.
- Private: full internal datasets, proprietary weights, and partner-specific prompts/logs.
Open problems we’re tackling (KCR-1)
- Robust novelty metrics that track expert judgements across venues.
- De-dup of synthetic noise without suppressing legitimate convergence.
- Cross-lab reproducibility with the smallest still-effective replication pack.
- Paper-level contradiction mining that scales recall without flooding users.
Schematic provenance map shows citation and contradiction edges across clustered topics. Data vintage: Oct 2025.
AI and Business Operations — where work actually moves
Tasks migrate from assist → supervise → run only when the economics are proven and auditable: lower € per accepted item at the same or better quality/latency, with traceable decisions and a safe rollback.
Why inAi cares. We map and measure this transition in production workloads—retail catalog ops, candidate ops, and support triage—so leaders can raise throughput at a fixed budget without raising risk. Our EU-ready posture (regional processing, no training on customer data without permission) and source-trace ledger tie each output to inputs, checks, and approvals.
What we study
- Unit cost per accepted item: Assist −20–40%; Supervise −50–70%; Run −70–95% (domain-weighted, Oct 2025).
- Constraint-violations/1k outputs: severe ≤ 1 to stay in “run”; automatic revert on breach.
- Oversight minutes/100 tasks: Assist 30–60; Supervise 8–15; Run ≤ 3 via confidence routing + sampling.
- Drift & recovery: control charts and acceptance sampling; revert-repair-re-promote with holdout monitors.
Methods we disclose
- Verifier stacks (grounding, policy, schema, domain constraints).
- Mixture-of-agents routing by confidence/cost envelope.
- Repair budgets (bounded retries, structured edits).
- Monitored acceptance gates (SPC limits, sampling plans, holdback tests).
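A minimal sketch of a monitored acceptance gate: a p-chart-style upper control limit on the per-batch violation rate, with automatic revert on breach. Limits and batch sizes here are illustrative.

```python
def upper_control_limit(p_bar, n, z=3.0):
    """3-sigma upper limit for a violation proportion (p-chart style)."""
    return p_bar + z * (p_bar * (1 - p_bar) / n) ** 0.5

def acceptance_gate(violations, batch_size, p_bar):
    """Keep a batch in 'run' or trigger revert-repair-re-promote on breach."""
    rate = violations / batch_size
    if rate > upper_control_limit(p_bar, batch_size):
        return "revert"  # breach: drop autonomy, repair, re-promote later
    return "run"
```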
Where this shows up
- PageMind (retail): attributes & compliance run at Auto/Auto-with-review; € per accepted item −50–90% by attribute; sub-minute publish SLAs.
- Emplo (candidate ops): screening & scheduling at Assist → Auto-with-review; screening cost −25–60% with bias audits.
- Support ops: triage Auto, FAQ Auto-with-review; 3–4× throughput/agent at fixed budget; severe violations ≤ 1/1k.
What we publish vs keep private
- Publish: autonomy ladders, unit-economics deltas, drift monitoring methods.
- Keep private: internal routing policies, vendor/model cost curves, prompts, and deployment diagrams.
Open problems we’re tackling
- Stable gates across domains/languages without retuning.
- Cost envelopes that hold under volume spikes with safe model downgrades.
- Automated rollback heuristics that localise faults and limit blast radius.
Unit-economics waterfall — €/accepted item vs baseline across autonomy levels. Data vintage: Oct 2025.
Our platform and methods (disclosed classes)
- Model orchestration. Multi-sampling, self-consistency, debate/cross-examination, verifier models, dynamic routing across providers, mixture-of-agents for heterogeneous skills.
- Data forms. PDFs, images, audio, tables, and private chats/emails with explicit permission or synthetic/redacted surrogates. Outputs carry source trace to enable audits.
- Information layering. Cross-archive mapping, memory modes (glossary, trace, vector), deterministic transforms at boundaries, format/constraint checkers.
- Agent loop. Goal → plan → act → verify → revise, with rollback, resume, and exception channels.
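A minimal sketch of that loop, assuming hypothetical plan/act/verify callables and snapshot-based rollback; real deployments add persistence, resume, and full trace exports.

```python
def run_goal(goal, plan, act, verify, snapshot, restore, max_revisions=2):
    """Plan, execute step by step, verify each outcome, roll back failed steps,
    and escalate once the revision budget is spent."""
    trace = []
    for step in plan(goal):
        for attempt in range(max_revisions + 1):
            state = snapshot()                 # capture state before acting
            result = act(step)
            ok, report = verify(goal, step, result)
            trace.append({"step": step, "attempt": attempt, "ok": ok, "report": report})
            if ok:
                break
            restore(state)                     # roll back the failed action
        else:
            return {"status": "escalated", "trace": trace}  # exception channel
    return {"status": "done", "trace": trace}
```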
What already exists (feeds the products)
Unstructured→structured pipelines and cross-archive mapping form the operational core of PageMind (catalog-ready spreadsheets with per-row trace).
Multi-model dialogue and verification raise match quality in Emplo and underpin PageMind export checks.
Decision scaffolds provide the base for future agentic modes in both products (bounded autonomy, audit trail, human override).
Access explorations inform design for native keyboards, wearables, and messaging surfaces.
Collaborations
We work with universities, labs, and pilot sites. Email only.
Universities and labs
Joint studies on orchestration vs capacity, agentic decision audits, and evaluation harnesses. Co-authored technical notes and student projects with clear scopes and publishable outputs.
Grants
Applied research in multilingual content operations, employability, and agentic decision-making. We work with redacted/synthetic data or permissioned datasets under strict scopes.
Pilots
Real data, predefined acceptance metrics, short written report. We publish protocols and results when ready.
Publications and sharing policy
We publish problem statements, evaluation protocols, benchmarks, audits, small redacted datasets, literature maps, and periodic technical notes.
We keep private proprietary code, full system diagrams, and internal datasets. Deeper materials are shared selectively under agreement. Demos are arranged by email when relevant.
Explorations
- Native keyboard agents. Context windows that respect privacy, per-app consent, low-latency n-best completions with constraint-aware decoding.
- Wearables. Micro-prompts, glanceable critiques, on-device retrieval for privacy; short-horizon planners with vibration or visual cues.
- Messaging surfaces. Agent threads with structured memory, tool-calling by message prefix, multi-step forms collapsed into chat procedures with verifiers.
These explorations inform design; they are not product announcements.
Evaluation protocols
OVC-1 Orchestration vs Capacity
Tasks: Attribute extraction, constrained generation, translation-with-glossary, multi-doc synthesis.
Metrics: Exact-match/soft-match, citation correctness, constraint violations, cost, latency, variance across seeds.
Procedure: K-sample self-consistency, verifier thresholds, repair allowance budget, Pareto fronts.
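A minimal sketch of the Pareto-front step: among configurations at or above the target win-rate, keep those not dominated on cost and latency. Field names are illustrative.

```python
def pareto_front(configs, min_win_rate=0.5):
    """configs: dicts with 'win_rate', 'cost_per_task', 'latency_s'.
    Returns non-dominated configurations above the win-rate floor."""
    eligible = [c for c in configs if c["win_rate"] >= min_win_rate]
    front = []
    for c in eligible:
        dominated = any(
            o["cost_per_task"] <= c["cost_per_task"]
            and o["latency_s"] <= c["latency_s"]
            and (o["cost_per_task"] < c["cost_per_task"]
                 or o["latency_s"] < c["latency_s"])
            for o in eligible
        )
        if not dominated:
            front.append(c)
    return front
```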
ADA-1 Agentic Decision Audit
Tasks: Three classes from the agentic firm kernel.
Metrics: Decision accuracy, plan minimality, recovery rate, oversight minutes per 100 tasks, exception escalation rate.
Procedure: Blinded reviewers, pre-declared acceptance thresholds, stratified sampling for audits.
KCR-1 Knowledge Creation and Replication
Tasks: Produce a literature review with contradiction flags; generate replication packs.
Metrics: Novelty proxy, contradiction detection precision/recall, replication success rate.
Procedure: Fixed corpora with seeded conflicts; publish positive and negative findings.
