Catalog

rerun.catalog

IndexValuesLike: TypeAlias = npt.NDArray[np.int_] | npt.NDArray[np.datetime64] | pa.Int64Array module-attribute

A type alias for index values.

This can be a numpy array of integers or datetime64 values, or a pyarrow.Int64Array.
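
Example (a sketch of values accepted wherever IndexValuesLike is expected; the reader() call in the final comment is described under DatasetEntry below):

import numpy as np
import pyarrow as pa

# Integer index values, e.g. frame numbers or ticks:
frames = np.arange(0, 100, dtype=np.int64)

# Timestamp index values for a temporal timeline:
times = np.array(["2024-01-01T00:00:00", "2024-01-01T00:00:01"], dtype="datetime64[ns]")

# The equivalent pyarrow form:
frames_pa = pa.array(range(100), type=pa.int64())

# Any of these can then be passed as, for example:
# dataset.reader(index="frame", using_index_values=frames, fill_latest_at=True)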

VectorDistanceMetricLike: TypeAlias = VectorDistanceMetric | Literal['L2', 'Cosine', 'Dot', 'Hamming'] module-attribute

A type alias for vector distance metrics.

class Schema

The schema representing a set of available columns for a dataset.

A schema contains both index columns (timelines) and component columns (entity/component data).

def __eq__(other)

Check equality with another Schema.

def __init__(inner)

Create a new Schema wrapper.

PARAMETER DESCRIPTION
inner

The internal schema object from the bindings.

TYPE: SchemaInternal

def __iter__()

Iterate over all column descriptors in the schema (index columns first, then component columns).

def __repr__()

Return a string representation of the schema.

def column_for(entity_path, component)

Look up the column descriptor for a specific entity path and component.

PARAMETER DESCRIPTION
entity_path

The entity path to look up.

TYPE: str

component

The component to look up. Example: Points3D:positions.

TYPE: str

RETURNS DESCRIPTION
ComponentColumnDescriptor | None

The column descriptor, if it exists.
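
Example (a sketch; the entity path and component name are illustrative, and `dataset` is assumed to be an existing DatasetEntry):

schema = dataset.schema()

desc = schema.column_for("/points", "Points3D:positions")
if desc is not None:
    print(desc.name, desc.entity_path, desc.is_static)
else:
    print("column not found")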

def column_for_selector(selector)

Look up the column descriptor for a specific selector.

PARAMETER DESCRIPTION
selector

The selector to look up.

String arguments are expected to follow the format: "<entity_path>:<component_type>"

TYPE: str | ComponentColumnSelector | ComponentColumnDescriptor

RETURNS DESCRIPTION
ComponentColumnDescriptor

The column descriptor.

RAISES DESCRIPTION
LookupError

If the column is not found.

ValueError

If the string selector format is invalid or the input type is unsupported.

Note: if the input is already a `ComponentColumnDescriptor`, it is
returned directly without checking for existence.

def column_names()

Return a list of all column names in the schema.

RETURNS DESCRIPTION
The names of all columns (index columns first, then component columns).

def component_columns()

Return a list of all the component columns in the schema.

Component columns contain the data for a specific component of an entity.

def index_columns()

Return a list of all the index columns in the schema.

Index columns contain the index values for when the data was updated. They generally correspond to Rerun timelines.
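
Example (a sketch of inspecting a schema, assuming `dataset` is an existing DatasetEntry):

schema = dataset.schema()

for index_col in schema.index_columns():
    print("index:", index_col.name)

for component_col in schema.component_columns():
    print("component:", component_col.entity_path, component_col.component)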

class ComponentColumnDescriptor

The descriptor of a component column.

Component columns contain the data for a specific component of an entity.

Column descriptors are used to describe the columns in a Schema. They are read-only. To select a component column, use ComponentColumnSelector.

archetype: str property

The archetype name, if any.

This property is read-only.

component: str property

The component.

This property is read-only.

component_type: str | None property

The component type, if any.

This property is read-only.

entity_path: str property

The entity path.

This property is read-only.

is_static: bool property

Whether the column is static.

This property is read-only.

name: str property

The name of this column.

This property is read-only.

class ComponentColumnSelector

A selector for a component column.

Component columns contain the data for a specific component of an entity.

component: str property

The component.

This property is read-only.

entity_path: str property

The entity path.

This property is read-only.

def __init__(entity_path, component)

Create a new ComponentColumnSelector.

PARAMETER DESCRIPTION
entity_path

The entity path to select.

TYPE: str

component

The component to select. Example: Points3D:positions.

TYPE: str

class IndexColumnDescriptor

The descriptor of an index column.

Index columns contain the index values for when the data was updated. They generally correspond to Rerun timelines.

Column descriptors are used to describe the columns in a Schema. They are read-only. To select an index column, use IndexColumnSelector.

is_static: bool property

Part of the generic ColumnDescriptor interface; always False for index columns.

name: str property

The name of the index.

This property is read-only.

class IndexColumnSelector

A selector for an index column.

Index columns contain the index values for when the data was updated. They generally correspond to Rerun timelines.

name: str property

The name of the index.

This property is read-only.

def __init__(index)

Create a new IndexColumnSelector.

PARAMETER DESCRIPTION
index

The name of the index to select. Usually the name of a timeline.

TYPE: str

class AlreadyExistsError

Bases: Exception

Raised when trying to create a resource that already exists.

class CatalogClient

Client for a remote Rerun catalog server.

Note: the datafusion package is required to use this client. Initialization will fail with an error if the package is not installed.

ctx: datafusion.SessionContext property

Returns a DataFusion session context for querying the catalog.

url: str property

Returns the catalog URL.
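
Example (a minimal connection sketch; the constructor argument and server address shown here are assumptions, substitute your own server URL):

from rerun.catalog import CatalogClient

client = CatalogClient("rerun+http://localhost:51234")  # hypothetical address

print(client.url)
for entry in client.entries():
    print(entry.name, entry.kind)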

def all_entries() deprecated
Deprecated

Use entries() instead

Returns a list of all entries in the catalog.

def append_to_table(table_name, batches=None, **named_params) deprecated
Deprecated

Use TableEntry.append() instead

Append record batches to an existing table.

PARAMETER DESCRIPTION
table_name

The name of the table entry to write to. This table must already exist.

TYPE: str

batches

One or more record batches to write into the table.

TYPE: RecordBatchReader | RecordBatch | Sequence[RecordBatch] | Sequence[Sequence[RecordBatch]] | None DEFAULT: None

**named_params

Named parameters to write to the table as columns.

TYPE: Any DEFAULT: {}

def create_dataset(name)

Creates a new dataset with the given name.

def create_table(name, schema, url)

Create and register a new table.

PARAMETER DESCRIPTION
name

The name of the table entry to create. It must be unique within all entries in the catalog. An exception will be raised if an entry with the same name already exists.

TYPE: str

schema

The schema of the table to create.

TYPE: Schema

url

The URL of the directory for where to store the Lance table.

TYPE: str
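
Example (a sketch; the table name, columns, and storage URL are illustrative, and the schema is assumed here to be expressible as a pyarrow schema):

import pyarrow as pa

schema = pa.schema(
    [
        ("rerun_segment_id", pa.string()),
        ("success", pa.bool_()),
    ]
)

table = client.create_table(
    name="episode_metadata",
    schema=schema,
    url="s3://my-bucket/tables/episode_metadata",  # hypothetical location
)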

def create_table_entry(name, schema, url) deprecated
Deprecated

Use create_table() instead

Create and register a new table.

def dataset_entries() deprecated
Deprecated

Use datasets() instead

Returns a list of all dataset entries in the catalog.

def dataset_names(*, include_hidden=False)

Returns a list of all dataset names in the catalog.

PARAMETER DESCRIPTION
include_hidden

If True, include blueprint datasets.

TYPE: bool DEFAULT: False

def datasets(*, include_hidden=False)

Returns a list of all dataset entries in the catalog.

PARAMETER DESCRIPTION
include_hidden

If True, include blueprint datasets.

TYPE: bool DEFAULT: False

def do_global_maintenance()

Perform maintenance tasks on the whole system.

def entries(*, include_hidden=False)

Returns a list of all entries in the catalog.

PARAMETER DESCRIPTION
include_hidden

If True, include hidden entries (blueprint datasets and system tables like __entries).

TYPE: bool DEFAULT: False

def entry_names(*, include_hidden=False)

Returns a list of all entry names in the catalog.

PARAMETER DESCRIPTION
include_hidden

If True, include hidden entries (blueprint datasets and system tables like __entries).

TYPE: bool DEFAULT: False

def get_dataset(name=None, *, id=None)

Returns a dataset by its ID or name.

Exactly one of id or name must be provided.

PARAMETER DESCRIPTION
name

The name of the dataset.

TYPE: str | None DEFAULT: None

id

The unique identifier of the dataset. Can be an EntryId object or its string representation.

TYPE: EntryId | str | None DEFAULT: None
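
Example (a sketch; the dataset name and ID are illustrative):

dataset = client.get_dataset(name="droid")

# Or equivalently by ID (an EntryId object or its string representation):
dataset = client.get_dataset(id="1234567890abcdef")  # hypothetical ID
print(dataset.name, dataset.id)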

def get_dataset_entry(*, id=None, name=None) deprecated
Deprecated

Use get_dataset() instead

Returns a dataset by its ID or name.

def get_table(name=None, *, id=None)

Returns a table by its ID or name.

Exactly one of id or name must be provided.

PARAMETER DESCRIPTION
name

The name of the table.

TYPE: str | None DEFAULT: None

id

The unique identifier of the table. Can be an EntryId object or its string representation.

TYPE: EntryId | str | None DEFAULT: None

def get_table_entry(*, id=None, name=None) deprecated
Deprecated

Use get_table() instead

Returns a table by its ID or name.

def register_table(name, url)

Registers a foreign Lance table (identified by its URL) as a new table entry with the given name.

PARAMETER DESCRIPTION
name

The name of the table entry to create. It must be unique within all entries in the catalog. An exception will be raised if an entry with the same name already exists.

TYPE: str

url

The URL of the Lance table to register.

TYPE: str

def table_entries() deprecated
Deprecated

Use tables() instead

Returns a list of all table entries in the catalog.

def table_names(*, include_hidden=False)

Returns a list of all table names in the catalog.

PARAMETER DESCRIPTION
include_hidden

If True, include system tables (e.g., __entries).

TYPE: bool DEFAULT: False

def tables(*, include_hidden=False)

Returns a list of all table entries in the catalog.

PARAMETER DESCRIPTION
include_hidden

If True, include system tables (e.g., __entries).

TYPE: bool DEFAULT: False

def update_table(table_name, batches=None, **named_params) deprecated
Deprecated

Use TableEntry.upsert() instead

Upsert record batches to an existing table.

PARAMETER DESCRIPTION
table_name

The name of the table entry to write to. This table must already exist.

TYPE: str

batches

One or more record batches to write into the table.

TYPE: RecordBatchReader | RecordBatch | Sequence[RecordBatch] | Sequence[Sequence[RecordBatch]] | None DEFAULT: None

**named_params

Named parameters to write to the table as columns.

TYPE: Any DEFAULT: {}

def write_table(name, batches, insert_mode) deprecated
Deprecated

Use TableEntry.append(), overwrite(), or upsert() instead

Writes record batches into an existing table.

PARAMETER DESCRIPTION
name

The name of the table entry to write to. This table must already exist.

TYPE: str

batches

One or more record batches to write into the table. For convenience, you can pass a single record batch, a list of record batches, a list of lists of record batches, or a pyarrow.RecordBatchReader.

TYPE: RecordBatchReader | RecordBatch | Sequence[RecordBatch] | Sequence[Sequence[RecordBatch]]

insert_mode

Determines how rows should be added to the existing table.

TYPE: TableInsertMode

class DatasetEntry

Bases: Entry[DatasetEntryInternal]

A dataset entry in the catalog.

catalog: CatalogClient property

The catalog client that this entry belongs to.

created_at: datetime property

The entry's creation date and time.

id: EntryId property

The entry's id.

kind: EntryKind property

The entry's kind.

manifest_url: str property

Return the dataset manifest URL.

name: str property

The entry's name.

updated_at: datetime property

The entry's last updated date and time.

def __eq__(other)

Compare this entry to another object.

Supports comparison with str and EntryId to enable the following patterns:

"entry_name" in client.entries()
entry_id in client.entries()

def arrow_schema()

Return the Arrow schema of the data contained in the dataset.

def blueprint_dataset()

The associated blueprint dataset, if any.

def blueprints()

Lists all blueprints currently registered with this dataset.

def create_fts_index(*, column, time_index, store_position=False, base_tokenizer='simple') deprecated
Deprecated

Use create_fts_search_index() instead

def create_fts_search_index(*, column, time_index, store_position=False, base_tokenizer='simple')

Create a full-text search index on the given column.

def create_vector_index(*, column, time_index, target_partition_num_rows=None, num_sub_vectors=16, distance_metric='Cosine') deprecated
Deprecated

Use create_vector_search_index() instead

def create_vector_search_index(*, column, time_index, target_partition_num_rows=None, num_sub_vectors=16, distance_metric='Cosine')

Create a vector index on the given column.

This will enable indexing and build the vector index over all existing values in the specified component column.

Results can be retrieved using the search_vector API, which will include the time-point on the indexed timeline.

Only one index can be created per component column; executing this a second time for the same component column will replace the existing index.

PARAMETER DESCRIPTION
column

The component column to create the index on.

TYPE: str | ComponentColumnSelector | ComponentColumnDescriptor

time_index

Which timeline this index will map to.

TYPE: IndexColumnSelector

target_partition_num_rows

The target size (in number of rows) for each partition. The underlying indexer (Lance) picks a default when no value is specified (currently 8192) and independently caps the maximum number of partitions (currently 4096).

TYPE: int | None DEFAULT: None

num_sub_vectors

The number of sub-vectors to use when building the index.

TYPE: int DEFAULT: 16

distance_metric

The distance metric to use for the index. ("L2", "Cosine", "Dot", "Hamming")

TYPE: VectorDistanceMetric | str DEFAULT: 'Cosine'
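
Example (a sketch; the embedding column and timeline names are illustrative):

from rerun.catalog import IndexColumnSelector

dataset.create_vector_search_index(
    column="/embeddings:Tensor:data",  # hypothetical component column
    time_index=IndexColumnSelector("real_time"),
    num_sub_vectors=16,
    distance_metric="Cosine",
)

# Results can later be retrieved with search_vector, e.g.:
# dataset.search_vector(query=query_embedding, column="/embeddings:Tensor:data", top_k=10)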

def default_blueprint()

Return the name of the currently set default blueprint.

def default_blueprint_partition_id() deprecated
Deprecated

Use default_blueprint_segment_id() instead

The default blueprint partition ID for this dataset, if any.

def delete()

Delete this entry from the catalog.

def delete_indexes(column) deprecated
Deprecated

Use delete_search_indexes() instead

def delete_search_indexes(column)

Deletes all user-defined indexes for the specified column.

def do_maintenance(optimize_indexes=False, retrain_indexes=False, compact_fragments=False, cleanup_before=None, unsafe_allow_recent_cleanup=False)

Perform maintenance tasks on the dataset.

def download_partition(partition_id) deprecated
Deprecated

Use download_segment() instead

Download a partition from the dataset.

def download_segment(segment_id)

Download a segment from the dataset.

def filter_contents(exprs)

Return a new DatasetView filtered to the given entity paths.

Entity path expressions support wildcards:

- "/points/**" matches all entities under /points
- "-/text/**" excludes all entities under /text

PARAMETER DESCRIPTION
exprs

Entity path expression or list of entity path expressions.

TYPE: str | Sequence[str]

RETURNS DESCRIPTION
DatasetView

A new view filtered to the matching entity paths.

Examples:

# Filter to a single entity path
view = dataset.filter_contents("/points/**")

# Filter to specific entity paths
view = dataset.filter_contents(["/points/**"])

# Exclude certain paths
view = dataset.filter_contents(["/points/**", "-/text/**"])

# Chain with segment filters
view = dataset.filter_segments(["recording_0"]).filter_contents("/points/**")

def filter_segments(segment_ids)

Return a new DatasetView filtered to the given segment IDs.

PARAMETER DESCRIPTION
segment_ids

A segment ID string, a list of segment ID strings, or a DataFusion DataFrame with a column named 'rerun_segment_id'. When passing a DataFrame, if there are additional columns, they will be ignored.

TYPE: str | Sequence[str] | DataFrame

RETURNS DESCRIPTION
DatasetView

A new view filtered to the given segments.

Examples:

# Filter to a single segment
view = dataset.filter_segments("recording_0")

# Filter to specific segments
view = dataset.filter_segments(["recording_0", "recording_1"])

# Filter using a DataFrame
good_segments = segment_table.filter(col("success"))
view = dataset.filter_segments(good_segments)

# Read data from the filtered view
df = view.reader(index="timeline")

def list_indexes() deprecated
Deprecated

Use list_search_indexes() instead

def list_search_indexes()

List all user-defined indexes in this dataset.

def manifest()

Return the dataset manifest as a DataFusion DataFrame.

def partition_ids() deprecated
Deprecated

Use segment_ids() instead

Returns a list of partition IDs for the dataset.

def partition_url(partition_id, timeline=None, start=None, end=None) deprecated
Deprecated

Use segment_url() instead

Return the URL for the given partition.

def reader(index, *, include_semantically_empty_columns=False, include_tombstone_columns=False, fill_latest_at=False, using_index_values=None)

Create a reader over this dataset.

Returns a DataFusion DataFrame.

Server-side filters

The returned DataFrame supports server-side filtering on both the rerun_segment_id column and the index (timeline) column, which can greatly improve performance. For example, the following filters are pushed down to and handled by the Rerun server.

dataset.reader(index="real_time").filter(col("rerun_segment_id") == "aabbccddee")
dataset.reader(index="real_time").filter(col("real_time") == "1234567890")
dataset.reader(index="real_time").filter(
    (col("rerun_segment_id") == "aabbccddee") & (col("real_time") == "1234567890")
)
PARAMETER DESCRIPTION
index

The index (timeline) to use for the view. Pass None to read only static data.

TYPE: str | None

include_semantically_empty_columns

Whether to include columns that are semantically empty.

TYPE: bool DEFAULT: False

include_tombstone_columns

Whether to include tombstone columns.

TYPE: bool DEFAULT: False

fill_latest_at

Whether to fill null values with the latest valid data.

TYPE: bool DEFAULT: False

using_index_values

If provided, specifies the exact index values to sample for all segments. Can be a numpy array (datetime64[ns] or int64), a pyarrow Array, or a sequence. Use with fill_latest_at=True to populate rows with the most recent data.

TYPE: IndexValuesLike | None DEFAULT: None

RETURNS DESCRIPTION
DataFrame

A DataFusion DataFrame.
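
Example (a sketch, assuming the datafusion package is installed and the dataset has a timeline named "real_time"):

from datafusion import col

df = dataset.reader(index="real_time", fill_latest_at=True)

# This filter is pushed down to the server (see "Server-side filters" above):
df = df.filter(col("rerun_segment_id") == "recording_0")
print(df.to_pandas())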

def register(recording_uri, *, layer_name='base')

Register RRD URIs to the dataset and return a handle to track progress.

This method initiates the registration of recordings to the dataset, and returns a handle that can be used to wait for completion or iterate over results.

PARAMETER DESCRIPTION
recording_uri

The URI(s) of the RRD(s) to register. Can be a single URI string or a sequence of URIs.

TYPE: str | Sequence[str]

layer_name

The layer(s) to which the recordings will be registered. Can be a single layer name (applied to all recordings) or a sequence of layer names (must match the length of recording_uri). Defaults to "base".

TYPE: str | Sequence[str] DEFAULT: 'base'

RETURNS DESCRIPTION
RegistrationHandle

A handle to track and wait on the registration tasks.
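
Example (a sketch; the recording URIs are illustrative):

handle = dataset.register(
    [
        "s3://my-bucket/recordings/episode_0.rrd",
        "s3://my-bucket/recordings/episode_1.rrd",
    ],
    layer_name="base",
)

# Block until both registrations complete (see RegistrationHandle below):
result = handle.wait(timeout_secs=600)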

def register_blueprint(uri, set_default=True)

Register an existing .rbl blueprint file that is visible to the server.

By default, the blueprint is also set as this dataset's default.

def register_prefix(recordings_prefix, layer_name=None)

Register all RRDs under a given prefix to the dataset and return a handle to track progress.

A prefix is a directory-like path in an object store (e.g. an S3 bucket or ABS container). All RRDs that are recursively found under the given prefix will be registered to the dataset.

This method initiates the registration of the recordings to the dataset, and returns a handle that can be used to wait for completion or iterate over results.

PARAMETER DESCRIPTION
recordings_prefix

The prefix under which to register all RRDs.

TYPE: str

layer_name

The layer to which the recordings will be registered. If None, this defaults to "base".

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
A handle to track and wait on the registration tasks.

def schema()

Return the schema of the data contained in the dataset.

def search_fts(query, column)

Search the dataset using a full-text search query.

def search_vector(query, column, top_k)

Search the dataset using a vector search query.

def segment_ids()

Returns a list of segment IDs for the dataset.

def segment_table(join_meta=None, join_key='rerun_segment_id')

Return the segment table as a DataFusion DataFrame.

The segment table contains metadata about each segment in the dataset, including segment IDs, layer names, storage URLs, and size information.

PARAMETER DESCRIPTION
join_meta

Optional metadata table or DataFrame to join with the segment table. If a TableEntry is provided, it will be converted to a DataFrame using reader().

TYPE: TableEntry | DataFrame | None DEFAULT: None

join_key

The column name to use for joining, defaults to "rerun_segment_id". Both the segment table and join_meta must contain this column.

TYPE: str DEFAULT: 'rerun_segment_id'

RETURNS DESCRIPTION
DataFrame

The segment metadata table, optionally joined with join_meta.
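
Example (a sketch, assuming a metadata table keyed by "rerun_segment_id" exists in the catalog):

from datafusion import col

meta = client.get_table(name="episode_metadata")  # hypothetical table

# Segment metadata joined with the per-episode metadata:
df = dataset.segment_table(join_meta=meta)

# Filter on a joined column and reuse the result as a segment filter:
good = df.filter(col("success"))
view = dataset.filter_segments(good)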

def segment_url(segment_id, timeline=None, start=None, end=None)

Return the URL for the given segment.

PARAMETER DESCRIPTION
segment_id

The ID of the segment to get the URL for.

TYPE: str

timeline

The name of the timeline to display.

TYPE: str | None DEFAULT: None

start

The selected start time for the segment. Integer for ticks, or datetime/nanoseconds for timestamps.

TYPE: datetime | int | None DEFAULT: None

end

The selected end time for the segment. Integer for ticks, or datetime/nanoseconds for timestamps.

TYPE: datetime | int | None DEFAULT: None

Examples:

With ticks
>>> start_tick, end_tick = 0, 10
>>> dataset.segment_url("some_id", "log_tick", start_tick, end_tick)

With timestamps
>>> start_time, end_time = datetime.now() - timedelta(seconds=4), datetime.now()
>>> dataset.segment_url("some_id", "real_time", start_time, end_time)

RETURNS DESCRIPTION
str

The URL for the given segment.

def set_default_blueprint(blueprint_name)

Set an already-registered blueprint as default for this dataset.

def set_default_blueprint_partition_id(partition_id) deprecated
Deprecated

Use set_default_blueprint_segment_id() instead

Set the default blueprint partition ID for this dataset.

def update(*, name=None)

Update this entry's properties.

PARAMETER DESCRIPTION
name

New name for the entry

TYPE: str | None DEFAULT: None

class DatasetView

A filtered view over a dataset in the catalog.

A DatasetView provides lazy filtering over a dataset's segments and entity paths. Filters are composed lazily and only applied when data is actually read.

Create a DatasetView by calling filter_segments() or filter_contents() on a DatasetEntry.

Examples:

# Filter to specific segments
view = dataset.filter_segments(["recording_0", "recording_1"])

# Filter to specific entity paths
view = dataset.filter_contents(["/points/**"])

# Chain filters
view = dataset.filter_segments(["recording_0"]).filter_contents(["/points/**"])

# Read data
df = view.reader(index="timeline")

def __init__(internal)

Create a new DatasetView wrapper.

PARAMETER DESCRIPTION
internal

The internal Rust-side DatasetView object.

TYPE: DatasetViewInternal

def __repr__()

Return a string representation of the DatasetView.

def arrow_schema()

Return the filtered Arrow schema for this view.

RETURNS DESCRIPTION
Schema

The filtered Arrow schema.

def download_segment(segment_id) deprecated
Deprecated

This method is deprecated and will be removed in a future release

Download a specific segment from the dataset.

PARAMETER DESCRIPTION
segment_id

The ID of the segment to download.

TYPE: str

RETURNS DESCRIPTION
Recording

The downloaded recording.

def filter_contents(exprs)

Return a new DatasetView filtered to the given entity paths.

Entity path expressions support wildcards:

- "/points/**" matches all entities under /points
- "-/text/**" excludes all entities under /text

PARAMETER DESCRIPTION
exprs

Entity path expression or list of entity path expressions.

TYPE: str | Sequence[str]

RETURNS DESCRIPTION
DatasetView

A new view filtered to the matching entity paths.

def filter_segments(segment_ids)

Return a new DatasetView filtered to the given segment IDs.

Filters are composed: if this view already has a segment filter, the result is the intersection of both filters.

PARAMETER DESCRIPTION
segment_ids

A segment ID string, a list of segment ID strings, or a DataFusion DataFrame with a column named 'rerun_segment_id'.

TYPE: str | Sequence[str] | DataFrame

RETURNS DESCRIPTION
DatasetView

A new view filtered to the given segments.

def reader(index, *, include_semantically_empty_columns=False, include_tombstone_columns=False, fill_latest_at=False, using_index_values=None)

Create a reader over this DatasetView.

Returns a DataFusion DataFrame.

Server-side filters

The returned DataFrame supports server-side filtering on both the rerun_segment_id column and the index (timeline) column, which can greatly improve performance. For example, the following filters are pushed down to and handled by the Rerun server.

view.reader(index="real_time").filter(col("rerun_segment_id") == "aabbccddee")
view.reader(index="real_time").filter(col("real_time") == "1234567890")
view.reader(index="real_time").filter(
    (col("rerun_segment_id") == "aabbccddee") & (col("real_time") == "1234567890")
)
PARAMETER DESCRIPTION
index

The index (timeline) to use for the view. Pass None to read only static data.

TYPE: str | None

include_semantically_empty_columns

Whether to include columns that are semantically empty.

TYPE: bool DEFAULT: False

include_tombstone_columns

Whether to include tombstone columns.

TYPE: bool DEFAULT: False

fill_latest_at

Whether to fill null values with the latest valid data.

TYPE: bool DEFAULT: False

using_index_values

If provided, specifies the exact index values to sample for all segments. Can be a numpy array (datetime64[ns] or int64), a pyarrow Array, or a sequence. Use with fill_latest_at=True to populate rows with the most recent data.

TYPE: IndexValuesLike | None DEFAULT: None

RETURNS DESCRIPTION
A DataFusion DataFrame.

def schema()

Return the filtered schema for this view.

The schema reflects any content filters applied to the view.

RETURNS DESCRIPTION
Schema

The filtered schema.

def segment_ids()

Return the segment IDs for this view.

If segment filters have been applied, only matching segments are returned.

RETURNS DESCRIPTION
list[str]

The list of segment IDs.

def segment_table(join_meta=None, join_key='rerun_segment_id')

Return the segment table as a DataFusion DataFrame.

The segment table contains metadata about each segment in the dataset, including segment IDs, layer names, storage URLs, and size information.

Only segments matching this view's filters are included.

PARAMETER DESCRIPTION
join_meta

Optional metadata table or DataFrame to join with the segment table. If a TableEntry is provided, it will be converted to a DataFrame using reader().

TYPE: TableEntry | DataFrame | None DEFAULT: None

join_key

The column name to use for joining, defaults to "rerun_segment_id". Both the segment table and join_meta must contain this column.

TYPE: str DEFAULT: 'rerun_segment_id'

RETURNS DESCRIPTION
DataFrame

The segment metadata table, optionally joined with join_meta.

class Entry

Bases: ABC, Generic[InternalEntryT]

An entry in the catalog.

catalog: CatalogClient property

The catalog client that this entry belongs to.

created_at: datetime property

The entry's creation date and time.

id: EntryId property

The entry's id.

kind: EntryKind property

The entry's kind.

name: str property

The entry's name.

updated_at: datetime property

The entry's last updated date and time.

def __eq__(other)

Compare this entry to another object.

Supports comparison with str and EntryId to enable the following patterns:

"entry_name" in client.entries()
entry_id in client.entries()

def delete()

Delete this entry from the catalog.

def update(*, name=None)

Update this entry's properties.

PARAMETER DESCRIPTION
name

New name for the entry

TYPE: str | None DEFAULT: None

class EntryId

A unique identifier for an entry in the catalog.

def __init__(id)

Create a new EntryId from a string.

def __str__()

Return str(self).

class EntryKind

The kinds of entries that can be stored in the catalog.

def __int__()

Return int(self).

def __str__()

Return str(self).

class NotFoundError

Bases: Exception

Raised when the requested resource is not found.

class RegistrationHandle

Handle to track and wait on segment registration tasks.

def iter_results(timeout_secs=None)

Stream completed registrations as they finish.

Uses the server's streaming API to yield results as tasks complete. Each result is yielded exactly once when its task completes (success or error).

PARAMETER DESCRIPTION
timeout_secs

Timeout in seconds, or None to block indefinitely. Note that None does not guarantee that a TimeoutError will never be raised for long-running tasks. Setting a timeout and polling is recommended when monitoring very large registration batches.

TYPE: int | None DEFAULT: None

YIELDS DESCRIPTION
SegmentRegistrationResult

The result of each completed registration.

RAISES DESCRIPTION
TimeoutError

If the timeout is reached before all tasks complete.

def wait(timeout_secs=None)

Block until all registrations complete.

PARAMETER DESCRIPTION
timeout_secs

Timeout in seconds, or None to block indefinitely. Note that None does not guarantee that a TimeoutError will never be raised for long-running tasks. Setting a timeout and polling is recommended when monitoring very large registration batches.

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
RegistrationResult

The result containing the list of segment IDs in registration order.

RAISES DESCRIPTION
ValueError

If any registration fails.

TimeoutError

If the timeout is reached before all tasks complete.
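
Example (a sketch of streaming results as they complete; `uris` is assumed to be a list of RRD URIs):

handle = dataset.register(uris)

for result in handle.iter_results(timeout_secs=60):
    if result.is_success:
        print("registered", result.uri, "->", result.segment_id)
    else:
        print("failed", result.uri, ":", result.error)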

class SegmentRegistrationResult dataclass

Result of a completed segment registration.

error: str | None instance-attribute

Error message if registration failed, or None if successful.

is_error: bool property

Returns True if the registration failed.

is_success: bool property

Returns True if the registration was successful.

segment_id: str | None instance-attribute

The resulting segment ID. May be None if registration failed.

uri: str instance-attribute

The source URI that was registered.

class TableEntry

Bases: Entry[TableEntryInternal]

A table entry in the catalog.

Note: this object acts as a table provider for DataFusion.

catalog: CatalogClient property

The catalog client that this entry belongs to.

created_at: datetime property

The entry's creation date and time.

id: EntryId property

The entry's id.

kind: EntryKind property

The entry's kind.

name: str property

The entry's name.

storage_url: str property

The table's storage URL.

updated_at: datetime property

The entry's last updated date and time.

def __datafusion_table_provider__()

Returns a DataFusion table provider capsule.

def __eq__(other)

Compare this entry to another object.

Supports comparison with str and EntryId to enable the following patterns:

"entry_name" in client.entries()
entry_id in client.entries()

def append(batches=None, **named_params)

Append to the Table.

PARAMETER DESCRIPTION
batches

Arrow data to append to the table. Can be a RecordBatchReader, a single RecordBatch, a list of RecordBatches, or a list of lists of RecordBatches (as returned by datafusion.DataFrame.collect()).

TYPE: _BatchesType | None DEFAULT: None

**named_params

Each named parameter corresponds to a column in the table.

TYPE: Any DEFAULT: {}
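
Example (a sketch; the column names are illustrative):

import pyarrow as pa

# Append an explicit record batch:
batch = pa.RecordBatch.from_pydict({"rerun_segment_id": ["recording_2"], "success": [True]})
table.append(batch)

# Or append a row via named parameters, one per column:
table.append(rerun_segment_id=["recording_3"], success=[False])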

def arrow_schema()

Returns the Arrow schema of the table.

def delete()

Delete this entry from the catalog.

def overwrite(batches=None, **named_params)

Overwrite the Table with new data.

PARAMETER DESCRIPTION
batches

Arrow data to overwrite the table with. Can be a RecordBatchReader, a single RecordBatch, a list of RecordBatches, or a list of lists of RecordBatches (as returned by datafusion.DataFrame.collect()).

TYPE: _BatchesType | None DEFAULT: None

**named_params

Each named parameter corresponds to a column in the table.

TYPE: Any DEFAULT: {}

def reader()

Registers the table with the DataFusion context and returns a DataFrame.

def to_arrow_reader()

Convert this table to a pyarrow.RecordBatchReader.

def update(*, name=None)

Update this entry's properties.

PARAMETER DESCRIPTION
name

New name for the entry

TYPE: str | None DEFAULT: None

def upsert(batches=None, **named_params)

Upsert data into the Table.

To use upsert, the table must contain a column with the metadata:

    {"rerun:is_table_index" = "true"}

Any row with a matching index value will have the new data inserted. Any row without a matching index value will be appended as a new row.

PARAMETER DESCRIPTION
batches

Arrow data to upsert into the table. Can be a RecordBatchReader, a single RecordBatch, a list of RecordBatches, or a list of lists of RecordBatches (as returned by datafusion.DataFrame.collect()).

TYPE: _BatchesType | None DEFAULT: None

**named_params

Each named parameter corresponds to a column in the table

TYPE: Any DEFAULT: {}
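
Example (a sketch, assuming "rerun_segment_id" is the table's index column, i.e. it carries the rerun:is_table_index metadata):

import pyarrow as pa

batch = pa.RecordBatch.from_pydict(
    {
        "rerun_segment_id": ["recording_0", "recording_9"],
        "success": [False, True],
    }
)

# "recording_0" is updated in place; "recording_9" is appended as a new row.
table.upsert(batch)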

class VectorDistanceMetric

Bases: Enum

Which distance metric to use for the vector index.