A source is a region of object storage that feeds a dataset. You don’t tell a source what kind of data it holds — Splendor detects each object’s type as it ingests it and routes it accordingly. One source can hold a mix of record files and images, and a record can point at an image so the two are searchable as one thing.

A source is a region of bytes

A source is where data lands, not what shape it is. You no longer declare a format when you create one:
curl https://api.withsplendor.com/v1/admin/hosted-sources \
  -H "Authorization: Bearer $SPLENDOR_TOKEN" \
  -H "X-Splendor-Tenant-Id: $SPLENDOR_TENANT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "source_key": "catalog",
    "name": "Product catalog",
    "dataset_id": "products",
    "source_type": "hosted"
  }'
Each object is routed individually by its type: a .jsonl file becomes records, a .csv becomes records, and a .png is treated as an image. Hosted uploads carry a filename, so the type is unambiguous; for objects already in your own bucket, the key’s extension (and, for CloudTrail, the payload shape) decides.
Because routing is per-object, a single source can hold your JSON records and the images those records reference. You no longer need a second source — or a second upload pipeline — just to mix modalities.
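Per-object routing by extension can be pictured with a minimal sketch. This is not Splendor's implementation: the function name `route_object` and the two extension sets are assumptions made for illustration, and only the extensions named above are included.

```python
from pathlib import PurePosixPath

# Hypothetical extension tables; the text above only names .jsonl, .csv, and .png.
RECORD_EXTENSIONS = {".jsonl", ".csv"}
IMAGE_EXTENSIONS = {".png"}


def route_object(key: str) -> str:
    """Return "records", "image", or "unknown" for an object key or filename."""
    suffix = PurePosixPath(key).suffix.lower()
    if suffix in RECORD_EXTENSIONS:
        return "records"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return "unknown"


# One source, mixed modalities: each object is routed on its own.
objects = ["products/2024.jsonl", "products/extra.csv", "images/a-12.png"]
routes = {key: route_object(key) for key in objects}
```

Because the decision is made per key, adding images to a source that already holds record files needs no change to the source itself.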

Three roles an object can play

| Object | Role | How it’s searched |
| --- | --- | --- |
| A record file (.jsonl, .csv, …) | Record | Text, SQL, and semantic search over its fields |
| An image referenced by a record field | Asset | The image is embedded onto the record, so the record is found by visual similarity |
| An image with no record referencing it | Standalone image | Found directly by visual similarity (an image library) |
The same image bytes can be an asset of a record in one source and a standalone image in another — that’s your choice, driven entirely by config.

Images referenced by records

The north-star pattern: log a JSON record that points at an image, and have the record be findable by both its metadata and the image’s visual content — with no duplicate image document. Declare which fields hold image locators under semantic.image_embeddings:
{
  "source_key": "catalog",
  "name": "Product catalog",
  "dataset_id": "products",
  "source_type": "hosted",
  "semantic": {
    "image_embeddings": { "fields": ["photo_url"] }
  }
}
Now a record like:
{ "sku": "A-12", "name": "Walnut chair", "color": "brown", "photo_url": "images/a-12.png" }
is ingested as one searchable record. At ingest, Splendor resolves photo_url, fetches the image, embeds it, and attaches that embedding to the record’s row. A search with content_filter: "images" for “a brown wooden chair” returns the product record, which you can then filter by color = "brown" like any other field: one record, found by both metadata filter and visual similarity.
1. Parse the record. The .jsonl object becomes a record; its fields are indexed for text and SQL search.
2. Resolve the locator. The declared field’s value (images/a-12.png) is resolved to a concrete object in one of your sources (see below).
3. Fetch & embed. Splendor reads the image bytes and embeds them with SigLIP.
4. Attach to the record. The image embedding is written onto the record’s row, carrying the record’s metadata, so a visual match resolves back to the record.
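The four ingest steps can be sketched as a single function. This is an illustrative outline, not Splendor's code: `ingest_record`, the row layout, and the `resolve`, `fetch`, and `embed` callables are all hypothetical, and the stand-in embedder below simply returns the byte length rather than running an image model such as SigLIP.

```python
import json
from typing import Callable


def ingest_record(
    line: str,
    image_fields: list[str],
    resolve: Callable[[str], str],
    fetch: Callable[[str], bytes],
    embed: Callable[[bytes], list[float]],
) -> dict:
    """Turn one .jsonl line into a row carrying its fields and image embeddings."""
    record = json.loads(line)                      # 1. parse the record
    row = {"fields": record, "image_embeddings": {}}
    for field in image_fields:
        locator = record.get(field)
        if not locator:
            continue                               # no image in this field
        object_key = resolve(locator)              # 2. resolve the locator
        vector = embed(fetch(object_key))          # 3. fetch and embed
        row["image_embeddings"][field] = vector    # 4. attach to the record's row
    return row


# Stand-ins for illustration only (hypothetical storage and embedder).
storage = {"catalog/images/a-12.png": b"\x89PNG..."}


def fake_embed(data: bytes) -> list[float]:
    return [float(len(data))]


row = ingest_record(
    '{"sku": "A-12", "color": "brown", "photo_url": "images/a-12.png"}',
    image_fields=["photo_url"],
    resolve=lambda locator: "catalog/" + locator,
    fetch=storage.__getitem__,
    embed=fake_embed,
)
```

The row keeps the record's own fields next to the embedding, which is what lets a visual match be filtered by `color` afterwards.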

Locating the image

The value of an image-reference field can be written three ways, all resolving to an object inside one of your sources (a value can never reach storage you don’t own):
| Locator | Resolves to | Use when |
| --- | --- | --- |
| images/a-12.png | the record’s own source, under its prefix | the image sits alongside the records (most common) |
| splendor://<source>/<key> | the named source | the image lives in a different source you own |
| s3://<bucket>/<key> | the source whose bucket + prefix covers that key | your records already carry absolute S3 URIs (typical for BYOC) |
For images in your own cloud (BYOC), Splendor assumes the source’s role to read them; for hosted images it uses managed credentials. Either way, the same image is fetched and embedded only once even if many records reference it.
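A rough sketch of how the three locator forms might resolve, and how keying a cache on the resolved object gives fetch-and-embed-once behavior. Everything here is assumed for illustration: the `Source` fields, `resolve_locator`, `embed_once`, the bucket names, and the choice to treat the key in a splendor:// locator as relative to the named source's prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    key: str      # source_key, e.g. "catalog"
    bucket: str   # hypothetical: bucket backing this source
    prefix: str   # hypothetical: key prefix inside that bucket


def resolve_locator(value: str, own: Source, sources: dict[str, Source]) -> tuple[str, str]:
    """Map a locator to (source_key, object_key), or raise ValueError."""
    if value.startswith("splendor://"):
        name, _, key = value[len("splendor://"):].partition("/")
        if name not in sources or not key:
            raise ValueError(f"unknown source or empty key: {value}")
        return name, sources[name].prefix + key
    if value.startswith("s3://"):
        bucket, _, key = value[len("s3://"):].partition("/")
        for src in sources.values():
            if key and src.bucket == bucket and key.startswith(src.prefix):
                return src.key, key
        # Nothing you own covers this URI, so it is never fetched.
        raise ValueError(f"no source covers {value}")
    # Relative locator: the record's own source, under its prefix.
    return own.key, own.prefix + value


_embedding_cache: dict[tuple[str, str], list[float]] = {}


def embed_once(resolved: tuple[str, str], fetch_and_embed) -> list[float]:
    """Embed each resolved object at most once, however many records reference it."""
    if resolved not in _embedding_cache:
        _embedding_cache[resolved] = fetch_and_embed(resolved)
    return _embedding_cache[resolved]


catalog = Source("catalog", "acme-data", "catalog/")
media = Source("media", "acme-media", "")
sources = {s.key: s for s in (catalog, media)}
```

Under these assumptions, a relative locator and an absolute S3 URI that point at the same object resolve to the same pair, so they share one cached embedding.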

Passive assets vs. standalone images

Whether a bare image object becomes its own searchable document is inferred from your config:
  • No image-reference fields declared → images in the source are standalone image documents (an image library you search directly).
  • Image-reference fields declared → images are passive assets: stored and referenceable, but not indexed on their own. They appear in search only through the records that reference them — so you don’t get a duplicate hit for the image and the record.
Need both in one source — a searchable image library and records that reference some of those images? Set the escape hatch:
{ "semantic": { "image_embeddings": { "fields": ["photo_url"], "index_standalone": true } } }
A field is either a record (text) field or an image-reference field, never both — declaring the same path under both document_embeddings and image_embeddings is rejected.
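The inference and the overlap rule can be summarized in a short sketch. It is hypothetical throughout: `image_mode` is an invented name, and the shape of document_embeddings (a "fields" list mirroring image_embeddings) is an assumption, since only the key name appears above.

```python
def image_mode(semantic: dict) -> dict:
    """Decide how bare images in a source are treated, from its semantic config."""
    image_cfg = semantic.get("image_embeddings", {})
    image_fields = set(image_cfg.get("fields", []))
    # Assumed shape: document_embeddings also carries a "fields" list.
    text_fields = set(semantic.get("document_embeddings", {}).get("fields", []))

    overlap = image_fields & text_fields
    if overlap:
        # A field is either text or an image reference, never both.
        raise ValueError(f"fields declared as both text and image: {sorted(overlap)}")

    # No image-reference fields: images are standalone documents.
    # Fields declared: images are passive assets unless index_standalone is set.
    index_standalone = not image_fields or bool(image_cfg.get("index_standalone", False))
    return {"image_fields": sorted(image_fields), "index_standalone": index_standalone}


library = image_mode({})
assets_only = image_mode({"image_embeddings": {"fields": ["photo_url"]}})
both = image_mode({"image_embeddings": {"fields": ["photo_url"], "index_standalone": True}})
```

In this sketch `library` and `both` index bare images on their own, while `assets_only` surfaces images only through the records that reference them.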

Where this shows up

  • Ingest a dataset — create a source and upload records (no format needed).
  • Semantic search — use content_filter: "images" to retrieve by visual similarity.
  • Datasets & schema — image-reference fields appear in the dataset like any other field you can filter on.