Infrastructure for
Science that Compounds
in the Age of AI Agents

François Lanusse

CNRS

Liam Parker

UC Berkeley

lightconeresearch.org

2025 — the year AI agents entered science

Denario / CMBAgent

Multi-agent system that generates full papers across astrophysics, biology, chemistry, and more.

DeepMind AI Co-Scientist

Generates novel hypotheses, reviews literature, and designs experiments in a closed-loop self-improving cycle.

Sakana AI “The AI Scientist”

End-to-end system: ideation → coding → experiments → writing → peer review, all automated.

Edison Scientific “Kosmos”

12-hour autonomous runs combining data analysis and literature search. Proposed a novel mechanism for Type 2 diabetes risk from public genetics data.

Sakana AI — The AI Scientist pipeline

Sakana AI

Google DeepMind AI Co-Scientist system design

DeepMind

Fully autonomous AI science produces… noise (for now)

Edison Scientific — Kosmos runs 42,000 lines of code

Edison Scientific, “Announcing Kosmos” (Nov 2025)

Tens of thousands of lines of generated code. No one reads it. No one audits it. How do you trust the results?

Denario — AI-generated papers across six disciplines

The outputs are hard to trust. Too much material, impossible to audit, no way to tell what’s real.

…but with a human in the loop, the hints are already striking

Matthew Schwartz × Claude — Vibe Physics

“Claude proved fast, indefatigable, and eager to please. It also, on occasion, faked results — hoping I wouldn’t notice.”

— Matthew Schwartz, Vibe Physics (Anthropic, 2026)

And on a personal note...

NeurIPS 2025

Weak Lensing Uncertainty Challenge

Open competition on weak-gravitational-lensing measurement — one of the hardest inference problems in cosmology.

I’ve worked on this problem for 7 years. You could say I’m somewhat of an expert…

Winning teams — NeurIPS 2025

AI is changing fast — don’t bet on “now”

METR Time Horizon — task length doubling every ~7 months, with Claude Mythos Preview at ≥16 h

METR, Task-Completion Time Horizons (metr.org/time-horizons, May 2026 snapshot, CC-BY)

Exponential improvement

AI task horizons are doubling every ~89 days (~17×/year, METR TH 1.1, post-2024 window). Today’s “noisy” or “mildly useful” outputs won’t stay that way. Build for where models will be in a year, not where they are today.

AI co-scientist systems become obsolete really fast

Denario, Kosmos, Sakana — all tightly coupled to yesterday's models. Denario was already obsolete six months after release. As models improve, these AI co-scientist systems get replaced wholesale.

The question

What’s the right thing
to build right now?

Science that Compounds: The Need for a New Substrate for Research in the Age of AI

Lanusse & Parker · May 2026

AI will empower scientists to pursue more complex and ambitious research questions — and, multiplied across a field, drive a step change in the rate at which results enter circulation.

So the question we focus on

How can we establish that a result can be trusted far more efficiently than today, to keep up with the growth of the literature?

Three properties make a result vettable — and AI finally makes them cheap

A structural answer — the form a result must take so that its soundness can be re-established by a human or a machine, efficiently, at every step of its lifecycle.

Provenance‑certified

Every plot, number, and claim ties back to the data, code, and decisions that produced it — eliminating fabricated results without requiring re-execution.

Fully observable

Code and artifacts, but also every consequential decision — estimator, prior, cutoff, dataset — and the reasoning behind it are inspectable.

Scientifically legible

Organized around the claims, decisions, and insights that matter — with direct paths down into the evidence and code behind any point.

None of this is new. Snakemake, Nextflow, REANA, and others: the community has been pushing in this direction for a decade.
The reason these principles haven't become ubiquitous is simple: they have been too costly for a typical research team to follow.

AI can fix the problem it creates

Agentic AI flips that calculus on its head. When the work itself is AI-assisted, the provenance trace, the decision log, and the scientific-level summary come along for free — built in by construction, not negotiated against the scientist’s time.

And so —

we’re starting Lightcone Research — to build the tooling to produce science that compounds in the age of AI.

Introducing

Lightcone Research

An open-source initiative to build tooling
for robust scientific research in the age of AI.

UC Berkeley · CNRS

Team & roadmap

An international, open-source initiative — based at UC Berkeley and CNRS, philanthropically backed.

Core team

François Lanusse

CNRS · AIM

Liam Parker

UC Berkeley

Alexandre Boucaud

CNRS · APC

Cail Daley

CosmoStat · AIM

Nolan Koblischke

U. of Toronto

Kangning Diao

UC Berkeley

Advisors

Uroš Seljak

UC Berkeley · BCCP/BIDS

Fernando Pérez

UC Berkeley · BIDS

Kyle Cranmer

U. Wisconsin–Madison

Associated centers

Berkeley Institute for Data Science · AISSAI

Milestones

Mid‑Jan 2026
Project inception

May 2026 · today
Project launch

July 28–31, 2026
Agentic AI for Science Developer Summit · Berkeley

September 2026
First stable version

A new layer for scientific knowledge

Our bet: invest in how scientific knowledge is captured and shared in the age of AI — not at the level of code, not at the level of papers, but something in between.

Code

Executable but opaque.
Buried assumptions, no intent.

Lightcone

Decisions, assumptions,
evidence, provenance

Paper

Readable but lossy.
Can’t regenerate the analysis.

From a Lightcone spec you can regenerate the code with any model, or generate the paper — because the scientific intent is preserved.

Inspectable

Every result traces back to the decisions and evidence that produced it.

Composable

Swap an assumption, extend the analysis, compare alternatives — without starting over.

Reusable

Other projects can build on your work — growing a shared body of knowledge over time.

A layered ecosystem

FUTURE

Platform — Hosting & sharing infrastructure

COMING SOON

UI Layer — Visual interface for analyses

ALPHA — TECH PREVIEW

Agent Layer — Claude plugin for AI-assisted research

ALPHA — TECH PREVIEW

CLI & Tooling — Validation, execution, workflows, HPC

ALPHA — CORE

ASTRA — Agentic Schema for Transparent Research Analysis — Core specification format

Everything builds on ASTRA — the declarative spec that captures the scientific intent of an analysis. The layers above read from and write to this single source of truth.

The spec

Full specification, examples, and contribution guide:

astra‑spec.org

Open source

BSD 3-Clause · co-developed in the open with the scientific community.

github.com/LightconeResearch

Start with a shared research record.

A durable record of each project’s scientific structure — question, inputs, outputs, and choices.

Lives alongside the code, so the analysis stays legible as it evolves.

terminal
$ lc init my-analysis
✓ created astra.yaml
✓ initialized project

astra.yaml · ASTRA
inputs:
decisions:
evidence:
recipes:
outputs:
insights:

Make scientific choices explicit.

Every consequential choice — data, preprocessing, model, priors, systematics — recorded as a first-class object.

Each choice carries its alternatives and the evidence or rationale behind it.

DECISION: Prior on optical depth τ
  • Planck low-ℓ EE
  • Free (uninformative)
  • Fixed τ = 0.054
EVIDENCE: quote verified

“The low-ℓ EE polarization likelihood provides the tightest CMB-only constraint on the reionization optical depth, τ = 0.054 ± 0.007.”

Planck Collaboration, 2020 · A&A 641, A6
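As a sketch, the card above could be written in astra.yaml like this (option ids, the insight id, and the evidence wiring are illustrative, following the schema shown in the deep dive):

# Sketch: how the τ-prior card might look in astra.yaml.
# Option ids and evidence wiring are illustrative, not canonical.
decisions:
  tau_prior:
    label: "Prior on optical depth τ"
    default: planck_lowE
    options:
      planck_lowE:
        label: "Planck low-ℓ EE"
        insights: [tau_lowE]        # option cites the prior insight below
      free:
        label: "Free (uninformative)"
      fixed:
        label: "Fixed τ = 0.054"

prior_insights:
  tau_lowE:
    claim: >-
      The low-ℓ EE polarization likelihood provides the tightest
      CMB-only constraint on τ: 0.054 ± 0.007.
    evidence:
      - id: ev_planck
        doi: "10.1051/0004-6361/201833910"   # Planck 2020, A&A 641, A6
        quote:
          exact: "τ = 0.054 ± 0.007"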

Run the workflow from the record.

Inputs, choices, and expected outputs are read directly from the record — not just documented next to it.

Results are produced against the same structure that describes the analysis.

[Diagram: data.csv feeds astra.yaml (inputs: data · decisions: preprocess = standard, optimizer = adam, metric = rmse · outputs: figure via plot.py), which drives load.py → preprocess.py → train.py → evaluate.py → plot.py, producing figure.png]
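In record form, the diagram corresponds to a minimal astra.yaml along these lines (decision blocks abbreviated and recipe wiring assumed; only the names come from the diagram):

# Sketch: the record behind the pipeline above. Decision blocks are
# abbreviated and the recipe wiring is assumed, not canonical.
inputs:
  - id: data
    type: data
    source: "data.csv"

decisions:
  preprocess: { default: standard }
  optimizer:  { default: adam }
  metric:     { default: rmse }

outputs:
  - id: figure
    type: figure
    recipe:
      command: python plot.py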

Trace every result back through the analysis.

Every figure, table, metric, or dataset carries its full provenance trace — not a dead end.

What produced it, what it used, which choices shaped it, and which snapshot of the record it came from.

INSPECTOR — figure.png
Produced by: plot.py
Decisions that impact it: preprocess = standard, model = linear
Supported by: verified evidence
Comes from: data
Provenance trace: matches

Technical deep dive

ASTRA

Agentic Schema for Transparent Research Analysis

v0.0.10 · early alpha

Our open specification for structuring computational research — making analyses inspectable, reproducible, and legible to humans and agents alike.

A quick walkthrough of the schema.

Inputs · Outputs · Decisions

Every spec declares what it needs, what it produces, and what choices it makes.

# astra.yaml — Iris classification, trimmed
id: iris_classification
name: "Iris Classification Study"

inputs:
  - id: iris_data
    type: data
    source: "sklearn.datasets.load_iris"

outputs:
  - id: accuracy
    type: metric
    recipe:
      command: python src/evaluate.py
  - id: confusion_matrix
    type: figure

decisions:
  scaling:
    label: "Feature Scaling"
    default: standard
    options:
      none:     { label: "No Scaling" }
      standard: { label: "StandardScaler" }
      minmax:   { label: "MinMaxScaler" }

  model:
    label: "Classification Model"
    default: random_forest
    options:
      svm:           { label: "Support Vector Machine" }
      random_forest: { label: "Random Forest" }
      logistic:      { label: "Logistic Regression" }

Inputs

Data sources or references to other ASTRA analyses (type: analysis). This is how projects compose into chains (see the sketch after these cards).

Outputs

Five kinds — metric, figure, table, data, report. Each carries an optional recipe (build rule).

Decisions

Named choice points with options, a default, and a rationale. Each option can carry its own description and links to supporting evidence.

Picking one option per decision gives you a universe — a single, executable configuration.
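A sketch of composition through an analysis-type input; only type: analysis comes from the Inputs card above, while the source field and the paths are assumptions:

# Sketch: building on another project's ASTRA analysis.
# Only `type: analysis` is from the spec summary; `source` and
# the paths are hypothetical.
inputs:
  - id: upstream_catalog
    type: analysis
    source: "github.com/example/lensing-catalog"   # hypothetical
  - id: survey_data
    type: data
    source: "data/survey.fits"                     # hypothetical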

Prior insights & findings

Every claim backed by evidence — either a quote from the literature, or an artifact produced by the analysis itself.

# Claims in, claims out — same shape, different direction

prior_insights:
  scaling_svm:
    claim: >-
      Standard scaling consistently outperforms min-max
      normalization for SVMs on tabular data.
    created_at: "2026-03-12T09:00:00Z"
    evidence:
      - id: ev_paper
        doi: "10.48550/arXiv.1706.03762"
        quote:
          exact: "Z-score normalization yielded higher accuracy."
        location: { page: 8 }

findings:
  best_model:
    claim: Random Forest reaches 96.2% with standard scaling.
    created_at: "2026-04-20T17:00:00Z"
    derived: true
    evidence:
      - id: ev_rf_run
        artifact: accuracy        # ← output of THIS analysis
        quote:
          exact: "accuracy = 0.962"

decisions:
  scaling:
    options:
      standard:
        insights: [scaling_svm]   # ← option cites prior insight

Prior insights

Knowledge brought IN from literature. Evidence = doi + verbatim quote + page anchor. These inform your decisions.

Findings

Knowledge taken OUT of the analysis. Evidence = artifact (an output of this analysis) + quote. What the run produced.

Shared model

Both are the same Insight object — claim + evidence[]. Placement (prior vs. finding) sets the direction.

astra validate --verify-evidence fetches DOIs and checks quotes are real text. No fabricated citations.

Decisions, universes, multiverse

How ASTRA turns methodological choices into an explorable analysis space.

Decisions

Each choice has named options with rationale and evidence.

Universe

One complete set of selections — a single path through decision space.

Multiverse

The full space of decision combinations — for testing robustness to analysis choices.

# universes/baseline.yaml
id: baseline
description: "Default configuration"

decisions:
  scaling: standard
  model: random_forest
  test_size: small

A universe is just a YAML file — one option selected per decision. Running it produces all declared outputs.

The multiverse is the full space of decision combinations. Run alternative paths to test whether your conclusions are robust to the choices you made.

Purpose: robustness. Do your results hold when you swap a scaling method, change a model, or shift a prior? The multiverse tells you.
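For example, a second universe that swaps two choices against the baseline could look like this (file name and ids illustrative; the options come from the decision example above):

# universes/alt_scaling.yaml (hypothetical robustness check):
# same analysis, two decisions swapped relative to baseline.
id: alt_scaling
description: "Robustness check: MinMax scaling + SVM"

decisions:
  scaling: minmax
  model: svm
  test_size: small

If the headline finding survives the swap, that is evidence of robustness; if not, the decision that broke it is named in the diff between the two universe files.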

Building on Steegen, Tuerlinckx, Gelman & Vanpaemel, “Increasing Transparency Through a Multiverse Analysis,” Perspectives on Psychological Science 11(5), 702–712 (2016), doi:10.1177/1745691616658637; and Yu & Barter, Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making (MIT Press, 2024) — PCS (Predictability, Computability, Stability) framework, vdsbook.com.

Compute & containers

Recipes carry the environment and the resource budget — so the same spec runs on a laptop, a cluster, or NERSC.

# Container + resources travel with the recipe

outputs:
  - id: trained_model
    type: data
    recipe:
      command: python src/train.py
      container: ghcr.io/lightcone/astro-ml:v2.3
      resources:
        cpus: 16
        memory: "128GB"
        gpus: 2
        time_limit: "4h"

  - id: accuracy
    type: metric
    recipe:
      command: python src/evaluate.py
      inputs: [trained_model]
      container: ghcr.io/lightcone/astro-ml:v2.3

Container

An image reference (registry/img:tag) or a path to a Containerfile to build from source — declared on each recipe.

Resources

cpus · memory · gpus · time_limit. A budget declaration — schedulers (SLURM, local, NERSC) translate it.

Portable

Docker on a laptop, Shifter/Podman at NERSC, Apptainer on SLURM — same spec, different runtime.

Technical deep dive

Lightcone‑CLI

The execution layer & agent skills around ASTRA.

lc init · lc run · lc status · lc verify

Turns an astra.yaml into enforced, reproducible execution — and gives Claude Code a substrate where it cannot fabricate results.

From spec to results — without fabrication

The agent describes the analysis. Lightcone-CLI runs it — so every figure, metric, and table you see is one the engine actually produced.

# The daily loop

$ lc init my-analysis
  ✓ scaffolds astra.yaml, recipes, universes
  ✓ installs Claude Code skills + hooks
  ✓ sets container runtime (auto-detect)

$ lc run                          # materialize ALL outputs
$ lc run accuracy                 # a single output
$ lc run --universe baseline      # one universe
  → Snakemake DAG · Dask dispatch · container per recipe

$ lc status                       # offline; reads manifests only
  accuracy             ok
  confusion_matrix     stale     # recipe / decisions drifted
  trained_model        missing

$ lc verify                       # walk provenance chain
  → recompute sha256 of outputs and inputs
  → flag tampered_data / broken_chain / missing_manifest

Containers per recipe

Each recipe runs inside its declared image — Docker, Podman, or podman-hpc. lc build pre-builds lc-<proj>-<hash> tags; runtime is auto-detected or pinned in ~/.lightcone/config.yaml.
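Pinning the runtime might look something like this; the file path comes from this slide, but the key names are guesses:

# ~/.lightcone/config.yaml (sketch; key names are assumptions)
container:
  runtime: podman-hpc   # or docker / podman; auto-detected when unset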

Dask — laptop to HPC

Snakemake builds the DAG; jobs always dispatch through a Dask scheduler. Local LocalCluster on a workstation, srun-launched workers under SLURM, or any external scheduler (k8s, jobqueue) via DASK_SCHEDULER_ADDRESS.

No fabricated results

Every output is materialized through lc run. A per-output .lightcone-manifest.json records code_version, data_version, input hashes, git SHA, host. The agent can’t write a number into a figure that the engine didn’t produce.
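For illustration, a per-output manifest could record something like the following (field names from this slide; structure and values are assumed, and the real file is JSON):

# .lightcone-manifest.json, rendered as YAML for readability (sketch).
output: accuracy
code_version: "0.4.2"        # assumed value
data_version: "dr1-v1.1"     # assumed value
git_sha: "abc1234"           # assumed value
host: "login-node-01"        # assumed value
inputs:
  trained_model: "sha256:3f1a…"   # input hash; assumed value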

lc export wrroc — one command produces a Workflow Run RO-Crate bundle (JSON-LD) for Zenodo / WorkflowHub.

Skills that ship with lc init

Every project bootstraps with a bundle of Claude Code skills copied into .claude/skills/. You drive the agent with /lc-new, /lc-from-code, /lc-from-paper; the rest are siblings the agent invokes as needed.

Entry points — pick by what you have

/lc-new — from a research question

Interactive scoping: surfaces decisions, searches literature, extracts verified quotes as prior insights, drafts universes. No YAML written by hand.

/lc-from-code — from an existing codebase

Scans the repo, drafts astra.yaml, parameterizes scripts so decisions can vary — existing logic untouched.

/lc-from-paper — reproduce a paper

ORIENT → ralph-loop reproduction: extracts the paper, interviews you, clones reference code, then iterates ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE under a per-paper constitution.

/lc-feedback — report a bug

Files a GitHub issue with version & session context auto-attached.

Claude Code

Support for more harnesses coming very soon.

Lightcone in action on DESI DR1 BAO analysis

arXiv:2404.03000

DESI 2024 III: Baryon Acoustic Oscillations from Galaxies and Quasars.

Hubble diagram

Lightcone: Hubble diagram reproduced by Lightcone

DESI 2024 III: Figure 15, Hubble diagram
Analysis DAG

From raw catalogs to the Hubble diagram.

[Analysis DAG: DESI DR1 LSS catalogs (data + 18 randoms), a fiducial cosmology (tabulated z → r(z)), and RascalC covariances (post-recon) feed run_reconstruction.py (× 4 parents) → compute_xi.py (post-recon ξ, × 8) → fit_bao_post.py (× 8, MCMC chains) → make_distance_table.py (D_M/r_d, D_H/r_d, D_V/r_d) → plot_hubble_diagram.py → Fig 15 reproduced. Decisions along the path: smoothing_radius, smoothing_radius_qso, recon_method · s-binning, ells · broadband, damping_prior, damping_centers, fit_range, template_cosmology, fitting_method · systematic_error_treatment. Step snapshots: 840b954, c7bd74a, cdc334a, 8069d11, 1bb61f2. Workflow run verified by Snakemake: sha256:b19558d0d64e2333… · reproduced 2026-05-12]

Get involved

Lightcone Research

Help us build the open substrate for scientific research
in the age of AI.


Applications open

Developer Summit

Open to researchers, engineers, and contributors from any institution. July 28–31, 2026 · Berkeley.

lightconeresearch.org/developer‑summit

Hiring

Full‑time positions

If you know anyone who would be a fit, please send them our way.

lightconeresearch.org

lightconeresearch.org  ·  github.com/LightconeResearch