A source is a region of bytes
A source is where data lands, not what shape it is. You no longer declare a format when you create one: a .jsonl file becomes records, a .csv becomes records, a .png is treated as an image. Hosted uploads carry a filename so the type is unambiguous; for objects already in your own bucket, the key’s extension (and, for CloudTrail, the payload shape) decides.
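To make the extension rule concrete, here is a minimal Python sketch of type inference from a filename or object key. The function name `infer_kind` and the `"unknown"` fallback are illustrative assumptions, and only the three extensions named above are mapped; the CloudTrail payload-shape check is not modeled.

```python
import os

# Only the extensions named in the text are listed here. The full set of
# recognized extensions is not specified, so treat this table as an assumption.
KIND_BY_EXTENSION = {
    ".jsonl": "records",
    ".csv": "records",
    ".png": "image",
}


def infer_kind(key: str) -> str:
    """Guess how an object is treated from its filename or key extension."""
    _, ext = os.path.splitext(key)
    # Hypothetical fallback: the text does not say what happens to other extensions.
    return KIND_BY_EXTENSION.get(ext.lower(), "unknown")


print(infer_kind("catalog/products.jsonl"))  # records
print(infer_kind("images/a-12.png"))         # image
```

The point of the sketch is that the decision depends only on the key, never on a format declared when the source was created.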
Three roles an object can play
| Object | Role | How it’s searched |
|---|---|---|
| A record file (.jsonl, .csv, …) | Record | Text, SQL, and semantic search over its fields |
| An image referenced by a record field | Asset | The image is embedded onto the record, so the record is found by visual similarity |
| An image with no record referencing it | Standalone image | Found directly by visual similarity (an image library) |
Images referenced by records
The north-star pattern: log a JSON record that points at an image, and have the record be findable by both its metadata and the image’s visual content, with no duplicate image document. Declare which fields hold image locators under semantic.image_embeddings.
At ingest, the value of a declared field such as photo_url is read, the image is fetched and embedded, and that embedding is attached to the record’s row. Searching with content_filter: "images" for “a brown wooden chair” returns the product record, which you can then filter by color = "brown" like any other field. One record, found by metadata filter and visual similarity.
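The pattern can be pictured as plain data. The sketch below is an assumption-heavy illustration, not the actual config syntax: only the path semantic.image_embeddings, the field name photo_url, the locator images/a-12.png, and the color value come from this page. The dict layout, the other record fields, and the helper `image_locators` are hypothetical.

```python
# Hypothetical config shape: a list of field names under semantic.image_embeddings.
config = {"semantic": {"image_embeddings": ["photo_url"]}}

# Hypothetical product record, as it might appear on one line of a .jsonl file.
record = {
    "id": "a-12",
    "name": "wooden chair",
    "color": "brown",
    "photo_url": "images/a-12.png",
}


def image_locators(record: dict, config: dict) -> dict:
    """Return {field: locator} for each declared image-reference field on the record.

    Only top-level field names are handled; nested paths are out of scope here.
    """
    declared = config.get("semantic", {}).get("image_embeddings", [])
    return {field: record[field] for field in declared if field in record}


print(image_locators(record, config))  # {'photo_url': 'images/a-12.png'}
```

Everything else on the record (here, color) stays an ordinary field, which is why the same row can be both filtered by metadata and matched by visual similarity.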
Parse the record
The .jsonl object becomes a record; its fields are indexed for text and SQL search.
Resolve the locator
The declared field’s value (images/a-12.png) is resolved to a concrete object in one of your sources (see below).
Locating the image
The value of an image-reference field can be written three ways, all resolving to an object inside one of your sources (a value can never reach storage you don’t own):

| Locator | Resolves to | Use when |
|---|---|---|
| images/a-12.png | the record’s own source, under its prefix | the image sits alongside the records (most common) |
| splendor://<source>/<key> | the named source | the image lives in a different source you own |
| s3://<bucket>/<key> | the source whose bucket + prefix covers that key | your records already carry absolute S3 URIs (typical for BYOC) |
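The three locator forms can be sketched as a small resolver. This is an illustration under stated assumptions, not the service's implementation: the `Source` type and `resolve` function are hypothetical, keys in splendor:// locators are assumed to be relative to the named source's prefix, and when several sources cover an S3 key the longest prefix is assumed to win.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    bucket: str
    prefix: str  # "" means the bucket root; otherwise ends with "/"


def resolve(locator: str, record_source: Source, sources: list[Source]) -> tuple[str, str]:
    """Return (source name, object key in the bucket), or raise ValueError.

    A locator that matches none of the caller's sources is rejected, which
    mirrors the rule that a value can never reach storage you don't own.
    """
    if locator.startswith("splendor://"):
        name, _, key = locator[len("splendor://"):].partition("/")
        for source in sources:
            if source.name == name:
                # Assumption: the key is relative to the named source's prefix.
                return source.name, source.prefix + key
        raise ValueError(f"no source named {name!r}")
    if locator.startswith("s3://"):
        bucket, _, key = locator[len("s3://"):].partition("/")
        covering = [s for s in sources if s.bucket == bucket and key.startswith(s.prefix)]
        if not covering:
            raise ValueError(f"no source covers {locator!r}")
        # Assumption: the most specific (longest) prefix wins.
        best = max(covering, key=lambda s: len(s.prefix))
        return best.name, key
    # Relative locator: the record's own source, under its prefix.
    return record_source.name, record_source.prefix + locator


catalog = Source("catalog", "acme-data", "catalog/")
media = Source("media", "acme-media", "")
print(resolve("images/a-12.png", catalog, [catalog, media]))
# ('catalog', 'catalog/images/a-12.png')
```

The bucket and source names in the usage lines are made up for the example.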
Passive assets vs. standalone images
Whether a bare image object becomes its own searchable document is inferred from your config:
- No image-reference fields declared → images in the source are standalone image documents (an image library you search directly).
- Image-reference fields declared → images are passive assets: stored and referenceable, but not indexed on their own. They appear in search only through the records that reference them — so you don’t get a duplicate hit for the image and the record.
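The inference in the two bullets above reduces to a single check. The sketch below assumes the same hypothetical dict-shaped config with a list under semantic.image_embeddings; the function name and the returned labels are illustrative, not product terms.

```python
def bare_image_role(config: dict) -> str:
    """Decide how image objects in a source are treated, based on the config alone."""
    declared = config.get("semantic", {}).get("image_embeddings", [])
    # Any declared image-reference field makes images passive assets;
    # otherwise each image is indexed as a standalone document.
    return "passive_asset" if declared else "standalone_image"


print(bare_image_role({}))  # standalone_image
print(bare_image_role({"semantic": {"image_embeddings": ["photo_url"]}}))  # passive_asset
```

Deciding from the config rather than per image is what prevents a duplicate hit for both the image and the record that references it.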
A field is either a record (text) field or an image-reference field, never both: declaring the same path under both document_embeddings and image_embeddings is rejected.
Where this shows up
- Ingest a dataset: create a source and upload records (no format needed).
- Semantic search: use content_filter: "images" to retrieve by visual similarity.
- Datasets & schema: image-reference fields appear in the dataset like any other field you can filter on.