A dataset is a named collection of records within a tenant. You ingest data into a dataset, and you query a dataset by its dataset_id. Search runs over one or more datasets at a time.

Listing datasets

GET /v1/datasets returns the datasets in the selected tenant. Each entry includes the dataset's current state and its identifier; the identifier is the dataset_id you pass to search and export.
curl https://api.withsplendor.com/v1/datasets \
  -H "Authorization: Bearer $SPLENDOR_TOKEN" \
  -H "X-Splendor-Tenant-Id: $SPLENDOR_TENANT_ID"

Schema

GET /v1/datasets/{dataset_id}/schema describes the fields Splendor has discovered in a dataset. Splendor infers the schema from the records it ingests, so the fields you can filter, aggregate, and project in SQL search reflect what is actually present in your data. Use the schema to learn which fields exist before building filters or SQL projections against them.
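A minimal sketch of fetching a schema with curl, assuming the schema endpoint accepts the same Authorization and X-Splendor-Tenant-Id headers as the listing example. The SPLENDOR_API variable, the DATASET_ID variable, and the helper names are illustrative and not part of the API. The response body is printed as returned, since its exact shape is not described here.

```shell
# Base URL from the listing example; SPLENDOR_API is a hypothetical override.
SPLENDOR_API="${SPLENDOR_API:-https://api.withsplendor.com}"

# Build the schema URL for a dataset ID (helper name is illustrative).
schema_url() {
  printf '%s/v1/datasets/%s/schema\n' "$SPLENDOR_API" "$1"
}

# Fetch the schema and print the raw response body.
# ASSUMPTION: same auth and tenant headers as GET /v1/datasets.
get_schema() {
  curl -sS --fail "$(schema_url "$1")" \
    -H "Authorization: Bearer $SPLENDOR_TOKEN" \
    -H "X-Splendor-Tenant-Id: $SPLENDOR_TENANT_ID"
}

# Usage (calls the live API, so it is left commented out):
# get_schema "$DATASET_ID"
```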

Readiness

Ingestion and indexing happen asynchronously, so a dataset becomes queryable in stages. Readiness reports which capabilities are available so far:
  • Schema — the dataset’s fields have been inferred.
  • Text search — records are indexed and answer text and sql queries.
  • Semantic search — vector embeddings are built and answer semantic queries.
Check one dataset with GET /v1/datasets/{dataset_id}/readiness, or check many at once with POST /v1/datasets/readiness.
Poll readiness after ingesting new data. Semantic search becomes available after text search, because embeddings are built once records are indexed.
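One way to poll is a small retry loop around the single-dataset readiness endpoint. The sketch below assumes the same headers as the listing example. The helper names, SPLENDOR_API, DATASET_ID, and READY_PATTERN are all hypothetical. The structure of the readiness response is not described here, so the ready check is left as a placeholder you adapt to the actual body.

```shell
# Base URL from the listing example; SPLENDOR_API is a hypothetical override.
SPLENDOR_API="${SPLENDOR_API:-https://api.withsplendor.com}"

# Fetch the readiness report for one dataset and print the raw body.
# ASSUMPTION: same auth and tenant headers as GET /v1/datasets.
get_readiness() {
  curl -sS --fail "$SPLENDOR_API/v1/datasets/$1/readiness" \
    -H "Authorization: Bearer $SPLENDOR_TOKEN" \
    -H "X-Splendor-Tenant-Id: $SPLENDOR_TENANT_ID"
}

# Run a check command until it exits 0 or the attempts run out.
# Usage: poll_until <max_attempts> <sleep_seconds> <command> [args...]
poll_until() {
  max_attempts=$1
  sleep_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Wait between attempts, but not after the final one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$sleep_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example wiring (calls the live API, so it is left commented out).
# READY_PATTERN is a placeholder: set it to whatever marks the capability
# you need as ready in the actual readiness response.
# check_ready() { get_readiness "$1" | grep -q "$READY_PATTERN"; }
# poll_until 30 10 check_ready "$DATASET_ID"
```

Because semantic search becomes ready after text search, write the check for the capability your queries actually need rather than for the first one to appear.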

Tracking ingestion

Each ingest produces a run you can follow. GET /v1/ingest/runs lists runs across your sources and datasets, and GET /v1/ingest/runs/{ingest_run_id} reports the progress and readiness of a single run. Use these to confirm that a load finished before you query it. For how data gets into a dataset in the first place, see Ingest a dataset, Stream logs, and Connect a source.
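The two run endpoints can be called the same way as the dataset endpoints. This is a sketch under the assumption that they take the same Authorization and X-Splendor-Tenant-Id headers as the listing example. SPLENDOR_API, INGEST_RUN_ID, and the helper names are illustrative. The body is printed as returned, without assuming any particular fields.

```shell
# Base URL from the listing example; SPLENDOR_API is a hypothetical override.
SPLENDOR_API="${SPLENDOR_API:-https://api.withsplendor.com}"

# Build the URL for the run list (no argument) or one run (run ID argument).
ingest_run_url() {
  if [ -n "${1:-}" ]; then
    printf '%s/v1/ingest/runs/%s\n' "$SPLENDOR_API" "$1"
  else
    printf '%s/v1/ingest/runs\n' "$SPLENDOR_API"
  fi
}

# List all runs, or fetch a single run when given an ingest_run_id.
# ASSUMPTION: same auth and tenant headers as GET /v1/datasets.
get_ingest_runs() {
  curl -sS --fail "$(ingest_run_url "${1:-}")" \
    -H "Authorization: Bearer $SPLENDOR_TOKEN" \
    -H "X-Splendor-Tenant-Id: $SPLENDOR_TENANT_ID"
}

# Usage (calls the live API, so it is left commented out):
# get_ingest_runs                    # all runs
# get_ingest_runs "$INGEST_RUN_ID"   # one run
```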