Infrastructure for
Science that Compounds
in the Age of AI Agents

François Lanusse

CNRS

Liam Parker

UC Berkeley

lightconeresearch.org

2025 — the year AI agents entered science

Denario / CMBAgent

Multi-agent system that generates full papers across astrophysics, biology, chemistry, and more.

DeepMind AI Co-Scientist

Generates novel hypotheses, reviews literature, and designs experiments in a closed-loop self-improving cycle.

Sakana AI “The AI Scientist”

End-to-end system: ideation → coding → experiments → writing → peer review, all automated.

Edison Scientific “Kosmos”

12-hour autonomous runs combining data analysis and literature search. Proposed a novel mechanism for Type 2 diabetes risk from public genetics data.

Sakana AI — The AI Scientist pipeline

Sakana AI

Google DeepMind AI Co-Scientist system design

DeepMind

Fully autonomous AI science produces… noise (for now)

Edison Scientific — Kosmos runs 42,000 lines of code

Edison Scientific, “Announcing Kosmos” (Nov 2025)

Tens of thousands of lines of generated code. No one reads it. No one audits it. How do you trust the results?

Denario — AI-generated papers across six disciplines

The outputs are hard to trust. Too much material, impossible to audit, no way to tell what’s real.

…but with a human in the loop, the hints are already striking

Matthew Schwartz × Claude — Vibe Physics

“Claude proved fast, indefatigable, and eager to please. It also, on occasion, faked results — hoping I wouldn’t notice.”

— Matthew Schwartz, Vibe Physics (Anthropic, 2026)

And on a personal note...

NeurIPS 2025

Weak Lensing Uncertainty Challenge

Open competition on weak-gravitational-lensing measurement — one of the hardest inference problems in cosmology.

I’ve worked on this problem for 7 years. You could say I’m somewhat of an expert…

Winning teams — NeurIPS 2025

AI is changing fast — don’t bet on “now”

METR Time Horizon — task length doubling every ~7 months, with Claude Mythos Preview at ≥16 h

METR, Task-Completion Time Horizons (metr.org/time-horizons, May 2026 snapshot, CC-BY)

Exponential improvement

AI task horizons are doubling every ~89 days (~17×/year, METR TH 1.1, post-2024 window). Today’s “noisy” or “mildly useful” outputs won’t stay that way. Build for where models will be in a year, not where they are today.

AI co-scientist systems become obsolete really fast

Denario, Kosmos, Sakana — all tightly coupled to yesterday's models. Denario was already obsolete six months after release. As models improve, these AI co-scientist systems get replaced wholesale.

The question

What’s the right thing
to build right now?

Science that Compounds: The Need for a New Substrate for Research in the Age of AI

Lanusse & Parker · May 2026

AI will empower scientists to pursue more complex and ambitious research questions — and, multiplied across a field, drive a step change in the rate at which results enter circulation.

So the question we focus on

How can we establish that a result can be trusted far more efficiently than today, to keep up with the growth of the literature?

Three properties make a result vettable — and AI finally makes them cheap

A structural answer — the form a result must take so that its soundness can be re-established by a human or a machine, efficiently, at every step of its lifecycle.

Provenance‑certified

Every plot, number, and claim ties back to the data, code, and decisions that produced it — eliminating fabricated results without requiring re-execution.

Fully observable

Code and artifacts, but also every consequential decision — estimator, prior, cutoff, dataset — and the reasoning behind it are inspectable.

Scientifically legible

Organized around the claims, decisions, and insights that matter — with direct paths down into the evidence and code behind any point.

None of this is new. Snakemake, Nextflow, REANA, and others: the community has been pushing in this direction for a decade.
The reason these principles haven't become ubiquitous is simple: they have been too costly for a typical research team to follow.

AI can fix the problem it creates

Agentic AI flips that calculus on its head. When the work itself is AI-assisted, the provenance trace, the decision log, and the scientific-level summary come along for free — built in by construction, not negotiated against the scientist’s time.

And so —

we’re starting Lightcone Research — to build the tooling to produce science that compounds in the age of AI.

Introducing

Lightcone Research

An open-source initiative to build tooling
for robust scientific research in the age of AI.

UC Berkeley · CNRS

Team & roadmap

An international, open-source initiative — based at UC Berkeley and CNRS, philanthropically backed.

Core team

François Lanusse

CNRS · AIM

Liam Parker

UC Berkeley

Alexandre Boucaud

CNRS · APC

Cail Daley

CosmoStat · AIM

Nolan Koblischke

U. of Toronto

Kangning Diao

UC Berkeley

Advisors

Uroš Seljak

UC Berkeley · BCCP/BIDS

Fernando Pérez

UC Berkeley · BIDS

Kyle Cranmer

U. Wisconsin–Madison

Associated centers

Berkeley Institute for Data Science · AISSAI

Milestones

Mid‑Jan 2026
Project inception

May 2026 · today
Project launch

July 28–31, 2026
Agentic AI for Science Developer Summit · Berkeley

September 2026
First stable version

A new layer for scientific knowledge

Our bet: invest in how scientific knowledge is captured and shared in the age of AI — not at the level of code, not at the level of papers, but something in between.

Code

Executable but opaque.
Buried assumptions, no intent.

Lightcone

Decisions, assumptions,
evidence, provenance

Paper

Readable but lossy.
Can’t regenerate the analysis.

From a Lightcone spec you can regenerate the code with any model, or generate the paper — because the scientific intent is preserved.

Inspectable

Every result traces back to the decisions and evidence that produced it.

Composable

Swap an assumption, extend the analysis, compare alternatives — without starting over.

Reusable

Other projects can build on your work — growing a shared body of knowledge over time.

A layered ecosystem

FUTURE

Platform — Hosting & sharing infrastructure

COMING SOON

UI Layer — Visual interface for analyses

ALPHA — TECH PREVIEW

Agent Layer — Claude plugin for AI-assisted research

ALPHA — TECH PREVIEW

CLI & Tooling — Validation, execution, workflows, HPC

ALPHA — CORE

ASTRA — Agentic Schema for Transparent Research Analysis — Core specification format

Everything builds on ASTRA — the declarative spec that captures the scientific intent of an analysis. The layers above read from and write to this single source of truth.

The spec

Full specification, examples, and contribution guide:

astra‑spec.org

Open source

BSD 3-Clause · co-developed in the open with the scientific community.

github.com/LightconeResearch

Start with a shared research record.

A durable record of each project’s scientific structure — question, inputs, outputs, and choices.

Lives alongside the code, so the analysis stays legible as it evolves.

terminal
$ lc init my-analysis
✓ created astra.yaml
✓ initialized project

astra.yaml · ASTRA
inputs:
decisions:
evidence:
recipes:
outputs:
insights:

Make scientific choices explicit.

Every consequential choice — data, preprocessing, model, priors, systematics — recorded as a first-class object.

Each choice carries its alternatives and the evidence or rationale behind it.

DECISION: Prior on optical depth τ
  • Planck low-ℓ EE
  • Free (uninformative)
  • Fixed τ = 0.054
EVIDENCE: quote verified

“The low-ℓ EE polarization likelihood provides the tightest CMB-only constraint on the reionization optical depth, τ = 0.054 ± 0.007.”

Planck Collaboration, 2020 · A&A 641, A6
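As a sketch, the card above could be written in astra.yaml like this (option ids, the insight id, and the evidence wiring are illustrative, following the schema shown in the deep dive):

# Sketch: how the τ-prior card might look in astra.yaml.
# Option ids and evidence wiring are illustrative, not canonical.
decisions:
  tau_prior:
    label: "Prior on optical depth τ"
    default: planck_lowE
    options:
      planck_lowE:
        label: "Planck low-ℓ EE"
        insights: [tau_lowE]        # option cites the prior insight below
      free:
        label: "Free (uninformative)"
      fixed:
        label: "Fixed τ = 0.054"

prior_insights:
  tau_lowE:
    claim: >-
      The low-ℓ EE polarization likelihood provides the tightest
      CMB-only constraint on τ: 0.054 ± 0.007.
    evidence:
      - id: ev_planck
        doi: "10.1051/0004-6361/201833910"   # Planck 2020, A&A 641, A6
        quote:
          exact: "τ = 0.054 ± 0.007"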

Run the workflow from the record.

Inputs, choices, and expected outputs are read directly from the record — not just documented next to it.

Results are produced against the same structure that describes the analysis.

[Diagram: data.csv feeds astra.yaml (inputs: data · decisions: preprocess = standard, optimizer = adam, metric = rmse · outputs: figure via plot.py), which drives load.py → preprocess.py → train.py → evaluate.py → plot.py, producing figure.png]
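In record form, the diagram corresponds to a minimal astra.yaml along these lines (decision blocks abbreviated and recipe wiring assumed; only the names come from the diagram):

# Sketch: the record behind the pipeline above. Decision blocks are
# abbreviated and the recipe wiring is assumed, not canonical.
inputs:
  - id: data
    type: data
    source: "data.csv"

decisions:
  preprocess: { default: standard }
  optimizer:  { default: adam }
  metric:     { default: rmse }

outputs:
  - id: figure
    type: figure
    recipe:
      command: python plot.py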

Trace every result back through the analysis.

Every figure, table, metric, or dataset carries its full provenance trace — not a dead end.

What produced it, what it used, which choices shaped it, and which snapshot of the record it came from.

INSPECTOR — figure.png
Produced by: plot.py
Decisions that impact it: preprocess = standard, model = linear
Supported by: verified evidence
Comes from: data
Provenance trace: matches

Technical deep dive

ASTRA

Agentic Schema for Transparent Research Analysis

v0.0.10 · early alpha

Our open specification for structuring computational research — making analyses inspectable, reproducible, and legible to humans and agents alike.

A quick walkthrough of the schema.

Inputs · Outputs · Decisions

Every spec declares what it needs, what it produces, and what choices it makes.

# astra.yaml — Iris classification, trimmed
id: iris_classification
name: "Iris Classification Study"

inputs:
  - id: iris_data
    type: data
    source: "sklearn.datasets.load_iris"

outputs:
  - id: accuracy
    type: metric
    recipe:
      command: python src/evaluate.py
  - id: confusion_matrix
    type: figure

decisions:
  scaling:
    label: "Feature Scaling"
    default: standard
    options:
      none:     { label: "No Scaling" }
      standard: { label: "StandardScaler" }
      minmax:   { label: "MinMaxScaler" }

  model:
    label: "Classification Model"
    default: random_forest
    options:
      svm:           { label: "Support Vector Machine" }
      random_forest: { label: "Random Forest" }
      logistic:      { label: "Logistic Regression" }

Inputs

Data sources or references to other ASTRA analyses (type: analysis). This is how projects compose into chains (see the sketch after these cards).

Outputs

Five kinds — metric, figure, table, data, report. Each carries an optional recipe (build rule).

Decisions

Named choice points with options, a default, and a rationale. Each option can carry its own description and links to supporting evidence.

Picking one option per decision gives you a universe — a single, executable configuration.
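A sketch of composition through an analysis-type input; only type: analysis comes from the Inputs card above, while the source field and the paths are assumptions:

# Sketch: building on another project's ASTRA analysis.
# Only `type: analysis` is from the spec summary; `source` and
# the paths are hypothetical.
inputs:
  - id: upstream_catalog
    type: analysis
    source: "github.com/example/lensing-catalog"   # hypothetical
  - id: survey_data
    type: data
    source: "data/survey.fits"                     # hypothetical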

Prior insights & findings

Every claim backed by evidence — either a quote from the literature, or an artifact produced by the analysis itself.

# Claims in, claims out — same shape, different direction

prior_insights:
  scaling_svm:
    claim: >-
      Standard scaling consistently outperforms min-max
      normalization for SVMs on tabular data.
    created_at: "2026-03-12T09:00:00Z"
    evidence:
      - id: ev_paper
        doi: "10.48550/arXiv.1706.03762"
        quote:
          exact: "Z-score normalization yielded higher accuracy."
        location: { page: 8 }

findings:
  best_model:
    claim: Random Forest reaches 96.2% with standard scaling.
    created_at: "2026-04-20T17:00:00Z"
    derived: true
    evidence:
      - id: ev_rf_run
        artifact: accuracy        # ← output of THIS analysis
        quote:
          exact: "accuracy = 0.962"

decisions:
  scaling:
    options:
      standard:
        insights: [scaling_svm]   # ← option cites prior insight

Prior insights

Knowledge brought IN from literature. Evidence = doi + verbatim quote + page anchor. These inform your decisions.

Findings

Knowledge taken OUT of the analysis. Evidence = artifact (an output of this analysis) + quote. What the run produced.

Shared model

Both are the same Insight object — claim + evidence[]. Placement (prior vs. finding) sets the direction.

astra validate --verify-evidence fetches DOIs and checks quotes are real text. No fabricated citations.

Decisions, universes, multiverse

How ASTRA turns methodological choices into an explorable analysis space.

Decisions

Each choice has named options with rationale and evidence.

Universe

One complete set of selections — a single path through decision space.

Multiverse

The full space of decision combinations — for testing robustness to analysis choices.

# universes/baseline.yaml
id: baseline
description: "Default configuration"

decisions:
  scaling: standard
  model: random_forest
  test_size: small

A universe is just a YAML file — one option selected per decision. Running it produces all declared outputs.

The multiverse is the full space of decision combinations. Run alternative paths to test whether your conclusions are robust to the choices you made.

Purpose: robustness. Do your results hold when you swap a scaling method, change a model, or shift a prior? The multiverse tells you.
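For example, a second universe that swaps two choices against the baseline could look like this (file name and ids illustrative; the options come from the decision example above):

# universes/alt_scaling.yaml (hypothetical robustness check):
# same analysis, two decisions swapped relative to baseline.
id: alt_scaling
description: "Robustness check: MinMax scaling + SVM"

decisions:
  scaling: minmax
  model: svm
  test_size: small

If the headline finding survives the swap, that is evidence of robustness; if not, the decision that broke it is named in the diff between the two universe files.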

Building on Steegen, Tuerlinckx, Gelman & Vanpaemel, “Increasing Transparency Through a Multiverse Analysis,” Perspectives on Psychological Science 11(5), 702–712 (2016), doi:10.1177/1745691616658637; and Yu & Barter, Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making (MIT Press, 2024) — PCS (Predictability, Computability, Stability) framework, vdsbook.com.

Compute & containers

Recipes carry the environment and the resource budget — so the same spec runs on a laptop, a cluster, or NERSC.

# Container + resources travel with the recipe

outputs:
  - id: trained_model
    type: data
    recipe:
      command: python src/train.py
      container: ghcr.io/lightcone/astro-ml:v2.3
      resources:
        cpus: 16
        memory: "128GB"
        gpus: 2
        time_limit: "4h"

  - id: accuracy
    type: metric
    recipe:
      command: python src/evaluate.py
      inputs: [trained_model]
      container: ghcr.io/lightcone/astro-ml:v2.3

Container

An image reference (registry/img:tag) or a path to a Containerfile to build from source — declared on each recipe.

Resources

cpus · memory · gpus · time_limit. A budget declaration — schedulers (SLURM, local, NERSC) translate it.

Portable

Docker on a laptop, Shifter/Podman at NERSC, Apptainer on SLURM — same spec, different runtime.

Technical deep dive

Lightcone‑CLI

The execution layer & agent skills around ASTRA.

lc init · lc run · lc status · lc verify

Turns an astra.yaml into enforced, reproducible execution — and gives Claude Code a substrate where it cannot fabricate results.

From spec to results — without fabrication

The agent describes the analysis. Lightcone-CLI runs it — so every figure, metric, and table you see is one the engine actually produced.

# The daily loop

$ lc init my-analysis
  ✓ scaffolds astra.yaml, recipes, universes
  ✓ installs Claude Code skills + hooks
  ✓ sets container runtime (auto-detect)

$ lc run                          # materialize ALL outputs
$ lc run accuracy                 # a single output
$ lc run --universe baseline      # one universe
  → Snakemake DAG · Dask dispatch · container per recipe

$ lc status                       # offline; reads manifests only
  accuracy             ok
  confusion_matrix     stale     # recipe / decisions drifted
  trained_model        missing

$ lc verify                       # walk provenance chain
  → recompute sha256 of outputs and inputs
  → flag tampered_data / broken_chain / missing_manifest

Containers per recipe

Each recipe runs inside its declared image — Docker, Podman, or podman-hpc. lc build pre-builds lc-<proj>-<hash> tags; runtime is auto-detected or pinned in ~/.lightcone/config.yaml.
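Pinning the runtime might look something like this; the file path comes from this slide, but the key names are guesses:

# ~/.lightcone/config.yaml (sketch; key names are assumptions)
container:
  runtime: podman-hpc   # or docker / podman; auto-detected when unset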

Dask — laptop to HPC

Snakemake builds the DAG; jobs always dispatch through a Dask scheduler. Local LocalCluster on a workstation, srun-launched workers under SLURM, or any external scheduler (k8s, jobqueue) via DASK_SCHEDULER_ADDRESS.

No fabricated results

Every output is materialized through lc run. A per-output .lightcone-manifest.json records code_version, data_version, input hashes, git SHA, host. The agent can’t write a number into a figure that the engine didn’t produce.
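For illustration, a per-output manifest could record something like the following (field names from this slide; structure and values are assumed, and the real file is JSON):

# .lightcone-manifest.json, rendered as YAML for readability (sketch).
output: accuracy
code_version: "0.4.2"        # assumed value
data_version: "dr1-v1.1"     # assumed value
git_sha: "abc1234"           # assumed value
host: "login-node-01"        # assumed value
inputs:
  trained_model: "sha256:3f1a…"   # input hash; assumed value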

lc export wrroc — one command produces a Workflow Run RO-Crate bundle (JSON-LD) for Zenodo / WorkflowHub.

Skills that ship with lc init

Every project bootstraps with a bundle of Claude Code skills copied into .claude/skills/. You drive the agent with /lc-new, /lc-from-code, /lc-from-paper; the rest are siblings the agent invokes as needed.

Entry points — pick by what you have

/lc-new — from a research question

Interactive scoping: surfaces decisions, searches literature, extracts verified quotes as prior insights, drafts universes. No YAML written by hand.

/lc-from-code — from an existing codebase

Scans the repo, drafts astra.yaml, parameterizes scripts so decisions can vary — existing logic untouched.

/lc-from-paper — reproduce a paper

ORIENT → ralph-loop reproduction: extracts the paper, interviews you, clones reference code, then iterates ARCHITECT → SPECIFY → LITERATURE → IMPLEMENT → RUN → COMPARE under a per-paper constitution.

/lc-feedback — report a bug

Files a GitHub issue with version & session context auto-attached.

Claude Code

Support for more harnesses coming very soon.

Lightcone in action on DESI DR1 BAO analysis

arXiv:2404.03000

DESI 2024 III: Baryon Acoustic Oscillations from Galaxies and Quasars.

Hubble diagram

Lightcone: Hubble diagram reproduced by Lightcone

DESI 2024 III: Figure 15, Hubble diagram
Analysis DAG

From raw catalogs to the Hubble diagram.

[Analysis DAG: DESI DR1 LSS catalogs (data + 18 randoms), a fiducial cosmology (tabulated z → r(z)), and RascalC covariances (post-recon) feed run_reconstruction.py (× 4 parents) → compute_xi.py (post-recon ξ, × 8) → fit_bao_post.py (× 8, MCMC chains) → make_distance_table.py (D_M/r_d, D_H/r_d, D_V/r_d) → plot_hubble_diagram.py → Fig 15 reproduced. Decisions along the path: smoothing_radius, smoothing_radius_qso, recon_method · s-binning, ells · broadband, damping_prior, damping_centers, fit_range, template_cosmology, fitting_method · systematic_error_treatment. Step snapshots: 840b954, c7bd74a, cdc334a, 8069d11, 1bb61f2. Workflow run verified by Snakemake: sha256:b19558d0d64e2333… · reproduced 2026-05-12]

Get involved

Lightcone Research

Help us build the open substrate for scientific research
in the age of AI.


Applications open

Developer Summit

Open to researchers, engineers, and contributors from any institution. July 28–31, 2026 · Berkeley.

lightconeresearch.org/developer‑summit

Hiring

Full‑time positions

If you know anyone who would be a fit, please send them our way.

lightconeresearch.org

lightconeresearch.org  ·  github.com/LightconeResearch