Declarative Geospatial Infrastructure

for Reproducible Environmental Analysis

Nate Hearns — 2026

The Architecture

Five layers. Each solves a distinct problem.

┌─────────────────────────────────────────────────────┐
│  1. DATA SOURCES — concepts, crosswalks, identity   │
├─────────────────────────────────────────────────────┤
│  2. TRANSFORMS — operations, domains, caching       │
├─────────────────────────────────────────────────────┤
│  3. COMPUTE — local-first, coarse-to-fine           │
├─────────────────────────────────────────────────────┤
│  4. DEPLOYMENT — cloud-native, shareable, citable   │
├─────────────────────────────────────────────────────┤
│  5. AGENTS — epistemic reasoning, provenance        │
└─────────────────────────────────────────────────────┘

Let's walk through each.

1. Data Sources

Concepts, crosswalks, and dataset identity

The Data Problem

The same DEM is referenced 15 different ways across 15 pipelines.
When the source updates, 14 of them break.

Two datasets call the same variable different names, use different units, classify land cover with different schemas.

How do you build cross-dataset analysis without a translation layer?

Concepts, Not URLs

Instead of hardcoding a dataset, reference a concept:

elevation:
  ref: "#terrain/dem"

The platform resolves it to the best available dataset for your area of interest:

#terrain/dem in Utah     → USGS 3DEP 1/3 arc-second (10m)
#terrain/dem in Ghana    → Copernicus GLO-30 (30m)
#terrain/dem in the Alps → EU-DEM v1.1 (25m)

One spec. Multiple geographies. No hardcoded URIs.

The concept registry maps to external vocabularies (CF conventions, GCMD, INSPIRE) so the same concept is discoverable across systems.
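The resolution logic can be sketched as a registry lookup plus a coverage filter. This is a minimal illustration, not the platform's API: the `CANDIDATES` structure, the bounding-box predicates, and the `resolve` helper are all assumptions.

```python
# Hypothetical concept registry: each candidate dataset declares a
# coverage predicate and a ground resolution; resolution prefers finer data.
CANDIDATES = {
    "#terrain/dem": [
        {"id": "@usgs/3dep/10m", "res_m": 10,
         # crude CONUS bounding box, illustrative only
         "covers": lambda lon, lat: -125 < lon < -66 and 24 < lat < 50},
        {"id": "@copernicus/glo-30", "res_m": 30,
         "covers": lambda lon, lat: True},  # global fallback
    ]
}

def resolve(concept, lon, lat):
    """Return the finest-resolution dataset covering the point."""
    matches = [d for d in CANDIDATES[concept] if d["covers"](lon, lat)]
    return min(matches, key=lambda d: d["res_m"])["id"]
```

A point in Utah resolves to 3DEP; a point in Ghana falls through to the global Copernicus layer, exactly as in the table above.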

Crosswalks

Datasets disagree on names, units, and classification systems.

ERA5 calls it t2m. CF conventions call it air_temperature. CMIP6 calls it tas.
Same variable. Three names. Different units.

Crosswalks are mapping files that translate between vocabularies:

# crosswalks/era5-to-cf.yaml
source: era5
target: cf-conventions
mappings:
  - source_field: t2m
    target: air_temperature
    predicate: exact          # SKOS: exactMatch
    confidence: 0.95

  - source_field: tp
    target: precipitation_amount
    predicate: close          # SKOS: closeMatch
    confidence: 0.8
    requires_unit_conversion: "m → kg m⁻²"

Not binary match/no-match. Graded confidence — some mappings are uncertain, and the system says so. Crosswalks feed operations like harmonize and reclassify — they're configuration, not code.
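Applying a crosswalk amounts to renaming fields, converting units, and skipping mappings below a confidence floor. A sketch under assumptions: the in-memory crosswalk shape mirrors the YAML above, and the m-to-kg m⁻² conversion (multiply by 1000, water density) stands in for real unit tooling.

```python
# In-memory form of crosswalks/era5-to-cf.yaml (illustrative).
CROSSWALK = [
    {"source_field": "t2m", "target": "air_temperature",
     "predicate": "exact", "confidence": 0.95},
    {"source_field": "tp", "target": "precipitation_amount",
     "predicate": "close", "confidence": 0.8,
     "convert": lambda m: m * 1000.0},  # m of water -> kg m^-2
]

def harmonize(record, crosswalk, min_confidence=0.7):
    """Rename fields and convert units; drop low-confidence mappings."""
    out = {}
    for m in crosswalk:
        if m["source_field"] in record and m["confidence"] >= min_confidence:
            value = record[m["source_field"]]
            out[m["target"]] = m.get("convert", lambda v: v)(value)
    return out
```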

Dataset Identity

Data should be namespaced by who produced it, not who hosts it:

@esa/sentinel-2/L2A          ← ESA produced this
@usgs/3dep/10m                ← USGS produced this
@jsmith/flood-risk-malawi     ← a researcher's output

Aggregators (Earth Search, Planetary Computer) are access endpoints,
not identity. One dataset, multiple mirrors.

# The catalog record
id: "@esa/sentinel-2/L2A"
access_endpoints:
  - url: "https://earth-search.aws.element84.com/v1"
    provider: element84
  - url: "https://planetarycomputer.microsoft.com/api/stac/v1"
    provider: microsoft

Origin-based naming means provenance starts at the source.
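Separating identity from access can be sketched as a failover loop over mirrors. The `probe` predicate is injected so the sketch stays offline; in practice it would be an HTTP health check. `endpoint_for` is a hypothetical helper, not platform API.

```python
# One dataset id, several mirrors; identity never changes when a mirror does.
CATALOG = {
    "@esa/sentinel-2/L2A": [
        "https://earth-search.aws.element84.com/v1",
        "https://planetarycomputer.microsoft.com/api/stac/v1",
    ]
}

def endpoint_for(dataset_id, probe):
    """Return the first reachable mirror for a dataset id."""
    for url in CATALOG[dataset_id]:
        if probe(url):
            return url
    raise LookupError(f"no reachable mirror for {dataset_id}")
```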

2. Transforms & Analyses

Operations, domains, caching, and the multiverse

Operations as a Vocabulary

An operation is an abstract interface. Backends are implementations.

# operation definition: slope
name: slope
domain: geo/terrain
inputs:
  elevation: { type: raster }
outputs:
  slope: { type: raster }
params:
  algorithm:
    type: select
    options: [horn, zevenbergen]
    default: horn
  units:
    type: select
    options: [degrees, percent]
    default: degrees

Same op: slope runs on GDAL, Google Earth Engine, or a GPU backend.
The operation is the vocabulary. The backend is the implementation.
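The op/backend split can be sketched as a dispatch registry keyed by (operation, backend). Everything here is illustrative: the registry, the backend name, and the single-cell slope formula (arctangent of the gradient magnitude) are stand-ins, not the real implementations.

```python
import math

REGISTRY = {}  # (op name, backend name) -> callable

def backend(op, name):
    """Decorator registering a backend implementation for an operation."""
    def register(fn):
        REGISTRY[(op, name)] = fn
        return fn
    return register

@backend("slope", "reference")
def slope_single_cell(dz_dx, dz_dy, units="degrees"):
    # Slope from gradient components for one cell (toy stand-in for GDAL/GEE).
    rise = math.hypot(dz_dx, dz_dy)
    return math.degrees(math.atan(rise)) if units == "degrees" else rise * 100

def run(op, backend_name, **kwargs):
    return REGISTRY[(op, backend_name)](**kwargs)
```

Swapping GDAL for a GPU backend means registering a second callable under the same op name; callers never change.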

Domains

Operations are organized by domain — pluggable capability sets:

Domain         Examples
geo/terrain    slope, aspect, hillshade, curvature, viewshed, TRI
geo/raster     calc, clip, mosaic, reproject, resample, reclassify, normalize
geo/imagery    radiometric indices (NDVI, BSI), calibration, pansharpening
geo/analysis   zonal stats, point sample, change detection, weighted overlay
geo/hydrology  catchment network, tiered network, upstream trace
tabular        filter, select, sort, join, union, aggregate
temporal       resample, rolling, align, interpolate, period stats

Users extend the registry. New operations = new vocabulary.
The community grows the language, not just consumes it.

Layers as a DAG

Every layer declares what it needs, not how to get it:

layers:
  canopy:
    ref: "#forest/canopy-cover"

  fire-fuel-load:
    compute:
      op: reclassify
      inputs:
        raster: { layer: canopy }        # ← dependency
      params:
        breaks: [0, 15, 40, 70, 100]
        labels: [1, 2, 3, 4]

  fire-risk:
    compute:
      op: weighted_overlay
      inputs:
        fuel: { layer: fire-fuel-load }  # ← dependency
        spread: { layer: spread-model }  # ← dependency
      params:
        weights: [0.5, 0.5]

The workspace spec is the DAG. Dependencies are explicit. No hidden state.

Caching Across Transformations

If two layers reference the same input with the same operation and the same parameters, the result is computed once.

canopy (source)
  ├──→ fire-fuel-load (op: reclassify, breaks=[0,15,40,70,100])
  │       ↓
  │    fire-risk (op: weighted_overlay, weights=[0.5,0.5])
  │
  └──→ deforestation-alerts (op: change_detection, threshold=-0.3)

canopy is fetched and processed once. Both branches consume the cached result.

When a source updates, the system knows exactly which downstream layers are invalidated — because the DAG is explicit.
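One way to get both behaviors (dedup and precise invalidation) is to derive the cache key from the operation, its parameters, and the keys of its inputs. A minimal sketch; the key shape and truncation are assumptions, not the actual cache design.

```python
import hashlib
import json

def cache_key(op, params, input_keys):
    """Deterministic key: same op + params + inputs -> same key.

    Because input keys feed the hash, a new source version changes
    every downstream key, which is exactly the invalidation set.
    """
    payload = json.dumps(
        {"op": op, "params": params, "inputs": sorted(input_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```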

Multi-Step Pipelines

Chain operations. Mix registered ops with inline SQL or Python:

flood-exposure:
  compute:
    steps:
      - op: hydrology_catchment_network
        inputs: { dem: { layer: elevation } }
        params: { threshold: 500 }
        as: flow_accumulation

      - op: raster_calc
        inputs: { slope: { layer: slope } }
        params:
          expression: "log2(flow_accumulation + 1) * (1 - slope / 90.0)"
        as: raw_exposure

      - op: raster_normalize
        input: raw_exposure

      - engine: sql
        query: |
          SELECT b1 * (population / max(population)) as weighted_exposure
          FROM :prev
          JOIN :population ON spatial_match
        inputs:
          population: { layer: population }

Every step is named, inspectable, and cacheable independently.

Parameters as First-Class Citizens

What if every analytical decision were a parameter?

vulnerability:
  compute:
    op: weighted_overlay
    inputs:
      capacity: { layer: capacity-index }
      sensitivity: { layer: sensitivity-index }
      exposure: { layer: exposure-index }
    params:
      weights:
        type: array
        default: [0.40, 0.20, 0.40]     # Malcomb's original
      normalization:
        type: select
        options: [min_max, z_score, rank, percentile]
        default: min_max
      aggregation:
        type: select
        options: [additive, multiplicative, geometric_mean]
        default: additive

Demo

data.folia.sh/@kedron/malcomb-vulnerability

Malcomb et al. (2014) — reproduced, parameterized, inspectable

  • Adjust indicator weights with sliders
  • Toggle normalization method
  • Switch spatial aggregation unit
  • See the vulnerability map update in real time
  • "View the spec" → full YAML behind every result

What the original study reported as one map is actually a space of 200+ maps.

3. Compute

Local-first. Coarse-to-fine.

Local-First Computation

Not everything needs a cloud cluster.

┌──────────────────────────────────────────────────────┐
│              BROWSER (instant, free)                 │
│  DuckDB-WASM: SQL on Parquet/GeoParquet, <100MB      │
│  Client-side rendering: PMTiles, COG range requests  │
│  Lightweight raster ops: NDVI, reclassify, normalize │
├──────────────────────────────────────────────────────┤
│              LOCAL (seconds, free)                   │
│  DuckDB native: SQL on larger datasets, <10GB        │
│  GDAL/rasterio: terrain ops, reprojection, mosaics   │
│  Full Python: custom scripts, ML inference           │
├──────────────────────────────────────────────────────┤
│              CLOUD (minutes, metered)                │
│  K8s batch: continental-scale, fan-out/reduce        │
│  Multi-GB imagery: temporal composites, ML training  │
│  Long-running: change detection over archives        │
└──────────────────────────────────────────────────────┘

Same spec. The system picks the tier based on data size, operation complexity, and engine type.

The Tabular Stack

Parquet + DuckDB = the modern analytical backbone.

cocoa-yield:
  compute:
    engine: sql
    query: |
      SELECT
        district,
        COUNT(*) as cocoa_pixels,
        ROUND(COUNT(*) * 0.09, 1) as area_ha,   -- 30m pixel = 0.09 ha
        ROUND(AVG(ndvi), 3) as mean_ndvi,
        CASE
          WHEN AVG(ndvi) > 0.6 THEN 'healthy'
          WHEN AVG(ndvi) > 0.4 THEN 'moderate'
          ELSE 'stressed'
        END as health_status
      FROM crop_classification
      JOIN districts ON ST_Contains(districts.geom, point)
      WHERE crop_type = 'cocoa'
      GROUP BY district
    inputs:
      crop_classification: { layer: crop-type }
      districts: { layer: admin/districts }

<50MB? Runs in your browser via DuckDB-WASM. No server round-trip.

The Geospatial Stack

Cloud-native formats enable range-request access — read only what you need:

Format                         Type        Access Pattern
COG (Cloud-Optimized GeoTIFF)  raster      HTTP range requests for tiles/overviews
GeoParquet                     vector      Column pruning, row-group filtering
PMTiles                        tiles       Single-file tile archive, offset-based
Zarr                           n-d arrays  Chunk-addressable, S3-native

A 2GB raster: the browser reads only the tiles visible at your zoom level.
The user sees instant response. The network transfers kilobytes, not gigabytes.

This is what makes local-first geospatial possible.
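The tile-selection arithmetic behind that claim is simple. A sketch assuming fixed 256-pixel tiles and a pixel-space viewport; real COG/PMTiles readers also walk overview levels and internal offset tables.

```python
def visible_tiles(x_min, y_min, x_max, y_max, tile_size=256):
    """(col, row) indices of tiles intersecting a pixel-space viewport.

    Only these tiles are fetched via range requests; the rest of the
    raster never crosses the network.
    """
    cols = range(x_min // tile_size, x_max // tile_size + 1)
    rows = range(y_min // tile_size, y_max // tile_size + 1)
    return [(c, r) for r in rows for c in cols]
```

A 512x512 viewport touches four tiles, roughly a few hundred kilobytes, regardless of how large the underlying raster is.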

Coarse-to-Fine Analysis

Don't process 10 billion pixels to answer a question about 16 regions.

Start coarse. Dig deeper where it matters.

Step 1: H3 resolution 4 (~1,770 km² per cell)
        → 45 cells cover Ghana
        → Answer in <1 second
        → "Western Region has highest fire risk"

Step 2: H3 resolution 7 (~5 km² per cell)
        → 500 cells in Western Region only
        → Answer in ~5 seconds
        → "Sefwi Wiawso district is the hotspot"

Step 3: Full resolution (30m pixels)
        → Only for Sefwi Wiawso district
        → Answer in ~30 seconds
        → Parcel-level risk assessment

Spatial partitioning (H3, quadkey, slippy tiles) makes this mechanical, not manual.
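The three-step drill-down above is one loop: aggregate on a coarse grid, keep the highest-risk cell, recurse at a finer cell size inside it. The grid math below is schematic (integer-division binning instead of H3), purely to show the control flow.

```python
def drill_down(points, levels):
    """points: [(x, y, risk)]; levels: coarse-to-fine cell sizes.

    At each level, bin the surviving points into grid cells, pick the
    cell with the highest mean risk, and zoom into it.
    """
    region = points
    trail = []
    for cell in levels:
        buckets = {}
        for x, y, risk in region:
            buckets.setdefault((x // cell, y // cell), []).append((x, y, risk))
        hot = max(buckets,
                  key=lambda k: sum(p[2] for p in buckets[k]) / len(buckets[k]))
        region = buckets[hot]
        trail.append((cell, hot, len(region)))
    return trail
```

Each level only ever processes the points inside the previous level's hotspot, which is why the full-resolution pass touches one district, not the whole country.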

Fan-Out / Reduce

When you do need full resolution at scale — partition and parallelize:

crop-classification:
  compute:
    op: classify
    params:
      model: crop_classifier_v3
      classes: [cocoa, palm, rubber, food_crop, bare, water]
    each:
      source:
        type: temporal_windows
        scheme: monthly
        range: [2025-06, 2026-02]       # 9 months
      key: window_start
    reduce:
      mode: aggregate
      engine: sql
      query: |
        SELECT pixel_x, pixel_y,
          MODE(predicted_class) as crop_type
        FROM all_monthly_results
        GROUP BY pixel_x, pixel_y

9 parallel tasks (one per month). DuckDB assembles the result.
The spec reads like a description, not a distributed systems tutorial.
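The `each`/`reduce` shape can be sketched in a few lines: map a classifier over the windows in parallel, then reduce with a per-pixel majority vote. `classify_month` is a stand-in for the real model; the reduce mirrors the `MODE(predicted_class)` query above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def fan_out_reduce(months, classify_month):
    """Fan out one task per month, reduce by per-pixel majority vote."""
    with ThreadPoolExecutor() as pool:
        monthly = list(pool.map(classify_month, months))  # parallel fan-out
    votes = {}
    for result in monthly:            # each result: {(x, y): class}
        for xy, cls in result.items():
            votes.setdefault(xy, []).append(cls)
    # Majority vote per pixel, the MODE() step of the reduce.
    return {xy: Counter(v).most_common(1)[0][0] for xy, v in votes.items()}
```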

4. Deployment

Cloud-native formats, shareable URLs, citable research

Publishing Research Outputs

Analysis results should be as accessible as the source data.

data.folia.sh/@kedron/malcomb-vulnerability.parquet   # GeoParquet
data.folia.sh/@kedron/malcomb-vulnerability.pmtiles   # vector tiles
data.folia.sh/@kedron/malcomb-vulnerability           # landing page

Same artifact. Extension determines format. No API keys, no auth, no SDKs.

Version-pinned for citability:

data.folia.sh/@kedron/malcomb-vulnerability@v3.parquet   # immutable

@v3 is content-addressed — cached forever. The URL is the citation.
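Content addressing is what makes that promise safe to keep. A sketch: the address is a hash of the bytes, so identical content always yields the identical address, and any change yields a new one. The `sha256:` prefix is an illustrative convention, not the platform's actual scheme.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an immutable address from content; same bytes, same address."""
    return "sha256:" + hashlib.sha256(data).hexdigest()
```

Because the address can never point at different bytes, caches and citations downstream never need revalidation.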

Cloud-Native All the Way Down

Every published artifact includes a manifest with multiple representations:

{
  "artifact": "@kedron/malcomb-vulnerability",
  "version": 3,
  "representations": [
    { "format": "geoparquet", "role": "source", "size": "45MB" },
    { "format": "pmtiles", "role": "tiles", "size": "32MB" },
    { "format": "h3_geoparquet", "role": "index", "size": "8MB" }
  ],
  "provenance": {
    "operation": "weighted_overlay",
    "params": { "weights": [0.40, 0.20, 0.40], "normalization": "min_max" },
    "inputs": ["capacity-index@v2", "sensitivity-index@v1", "exposure-index@v3"]
  }
}

One publish. Three formats. Full provenance. No manual conversion.

What This Enables

A researcher in Accra publishes their Ghana cocoa analysis:

data.folia.sh/@cersgis/ghana-cocoa-eudr@v1.parquet

A compliance officer in Brussels:

  1. Opens the landing page — sees the methodology, parameters, provenance
  2. Downloads the GeoParquet — loads directly into DuckDB or QGIS
  3. Views the tiles — browses the map in any PMTiles-compatible viewer
  4. Clones the spec — runs the exact same analysis on updated data

No email. No data request forms. No "supplementary materials upon request."

The URL is the output. The spec is the methodology. The provenance is the proof.

5. Agentic Development

Epistemic reasoning and the machine's judgment

Why Does the Machine Do What It Does?

AI can compose analytical pipelines. The question is: should we trust the composition?

User: "Show me cocoa districts with high fire risk
       and post-2020 forest loss"

Agent:
  → searches catalog for forest cover, fire risk inputs
  → selects Hansen GFC lossyear (30m, 2001-2024)
  → selects Copernicus GLO-30 for elevation
  → composes weighted overlay → zonal stats → SQL join
  → returns result

But why those datasets? Why that overlay weighting? Why Hansen and not GLAD?

An agent that can't explain its choices is a black box.
A black box that produces vulnerability maps is dangerous.

Epistemic Justification

Every agent action should include a justification — not just a result:

{
  "result": { "districts": [...] },
  "risk_level": "UNCERTAIN",
  "justification": "Hansen GFC chosen over GLAD due to annual
    temporal resolution (lossyear) required for EUDR Dec 2020
    cutoff. Copernicus GLO-30 selected for Ghana (no 3DEP
    coverage). Weights [0.35, 0.40, 0.25] follow Malcomb et al.
    but sensitivity to weighting is high — consider multiverse.",
  "inspect": {
    "spec": "full YAML specification",
    "lineage": "provenance chain from pixel to claim"
  }
}

The machine says what it did, why, and what it's uncertain about.

Risk Classification

Not all operations need the same level of scrutiny:

Risk Level  Trigger                                      Response
INSTANT     Cached artifact, simple lookup               Return result
SAFE        Small AOI, known operation                   Cost estimate
UNCERTAIN   Multi-dataset fusion, method choice matters  Epistemic justification required
EXPENSIVE   Continental-scale, hours of compute          Cost + impact disclosure

A slope calculation is SAFE — there's one right answer.
A vulnerability index is UNCERTAIN — the answer depends on choices.

The system should tell you which category you're in before you run.
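The triage above could be a handful of rules evaluated before anything runs. All thresholds here are invented for illustration; a real system would tune them per operation and backend.

```python
def classify_risk(cached, aoi_km2, n_datasets, method_sensitive):
    """Pre-flight triage: how much scrutiny does this request need?"""
    if cached:
        return "INSTANT"                 # artifact already exists
    if aoi_km2 > 1_000_000:
        return "EXPENSIVE"               # continental-scale compute
    if n_datasets > 1 or method_sensitive:
        return "UNCERTAIN"               # justification required
    return "SAFE"                        # one right answer
```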

Agents + Multiverse Analysis

The real power: agents can systematically explore the decision space.

User: "Run a multiverse analysis on the Malcomb vulnerability
       index. Vary weights, normalization, and aggregation."

Agent:
  → identifies 3 parameterizable decisions
  → generates 36 specification combinations
  → fans out: 36 parallel compute tasks
  → reduces: specification curve visualization
  → reports: "12 of 16 districts are consistently vulnerable
     across all specifications. 4 districts are sensitive
     to normalization choice — recommend reporting these
     with explicit uncertainty bounds."

The agent doesn't hide the forking paths. It maps the entire garden.
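Generating the 36 specifications is a cross-product over the parameter options. The normalization and aggregation options come from the spec above; the three candidate weight sets are illustrative stand-ins for whatever the agent chooses to vary.

```python
from itertools import product

WEIGHTS = [[0.40, 0.20, 0.40], [0.33, 0.33, 0.34], [0.35, 0.40, 0.25]]
NORMALIZATIONS = ["min_max", "z_score", "rank", "percentile"]
AGGREGATIONS = ["additive", "multiplicative", "geometric_mean"]

def multiverse_specs():
    """Every combination of analytical choices: 3 x 4 x 3 = 36 specs."""
    return [{"weights": w, "normalization": n, "aggregation": a}
            for w, n, a in product(WEIGHTS, NORMALIZATIONS, AGGREGATIONS)]
```

Each spec then fans out as one compute task, and the reduce step assembles the specification curve.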

Putting It Together

Data Sources        Concepts (#terrain/dem) resolve by geography
     ↓              Crosswalks harmonize across naming systems
Transforms          Registered operations with explicit parameters
     ↓              Caching across shared intermediaries
Compute             Local-first: browser → local → cloud
     ↓              Coarse-to-fine: H3 drill-down, fan-out/reduce
Deployment          Cloud-native formats, URL = citation
     ↓              One publish → GeoParquet + PMTiles + H3 index
Agents              Epistemic justification on every decision
                    Multiverse exploration across the parameter space

The Declarative Spec

The spec is the connective tissue between all five layers:

# The spec IS the methodology
name: malcomb-vulnerability
description: "Reproduction of Malcomb et al. 2014, parameterized"

layers:
  # 1. DATA SOURCES — concepts, crosswalks
  elevation:
    ref: "#terrain/dem"

  # 2. TRANSFORMS — operations, parameters
  vulnerability:
    compute:
      op: weighted_overlay
      params:
        weights: [0.40, 0.20, 0.40]

  # 3. COMPUTE — runs locally or cloud
  # (determined by data size, not by the spec)

  # 4. DEPLOYMENT — publish with full provenance
  # data.folia.sh/@kedron/malcomb-vulnerability@v3

  # 5. AGENTS — can read, modify, and explain the spec
  # "View the spec" on every result

Principles

  1. Technology should be a medium for users' intentions, not a delivery mechanism for someone else's

  2. Every abstraction can be inspected — "View the spec" on every result

  3. Augment human reasoning, not replace it — AI proposes, humans refine

  4. Extensible by its users — the catalog is grown by users, not just consumed

  5. Multiple viewpoints are first-class citizens — the same terrain looks different to a forecaster, a farmer, a biologist

Thank You

Demo: data.folia.sh/@kedron/malcomb-vulnerability

Adjust weights. Change normalization. See the multiverse.

References:

  • Steegen et al. (2016) "Increasing Transparency Through a Multiverse Analysis"
  • Simonsohn et al. (2020) "Specification Curve Analysis" — Nature Human Behaviour
  • Gelman & Loken (2013) "The Garden of Forking Paths"
  • Kedron et al. (2024) "A Framework for Moving Beyond Computational Reproducibility"
  • Malcomb et al. (2014) "Vulnerability Modeling for sub-Saharan Africa"
  • HEGSRR Malcomb reproduction: github.com/HEGSRR/RPr-Malcomb-2014

Nate Hearns — nate@folia.sh