Catalog
rerun.catalog
IndexValuesLike: TypeAlias = npt.NDArray[np.int_] | npt.NDArray[np.datetime64] | pa.Int64Array
module-attribute
A type alias for index values.
This can be a numpy array of integers or datetime64 values, or a pyarrow.Int64Array.
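For illustration, each of the following is a valid IndexValuesLike value (a minimal sketch using numpy and pyarrow):

```python
import numpy as np
import pyarrow as pa

# Integer index values, e.g. ticks on a sequence timeline.
tick_values = np.arange(0, 100, dtype=np.int64)

# Timestamp index values as nanosecond-precision datetime64.
time_values = np.array(
    ["2024-01-01T00:00:00", "2024-01-01T00:00:01"], dtype="datetime64[ns]"
)

# The equivalent pyarrow representation.
arrow_values = pa.array([0, 1, 2], type=pa.int64())
```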
VectorDistanceMetricLike: TypeAlias = VectorDistanceMetric | Literal['L2', 'Cosine', 'Dot', 'Hamming']
module-attribute
A type alias for vector distance metrics.
class Schema
The schema representing a set of available columns for a dataset.
A schema contains both index columns (timelines) and component columns (entity/component data).
def __eq__(other)
Check equality with another Schema.
def __init__(inner)
Create a new Schema wrapper.
| PARAMETER | DESCRIPTION |
|---|---|
| `inner` | The internal schema object from the bindings. |
def __iter__()
Iterate over all column descriptors in the schema (index columns first, then component columns).
def __repr__()
Return a string representation of the schema.
def column_for(entity_path, component)
Look up the column descriptor for a specific entity path and component.
| PARAMETER | DESCRIPTION |
|---|---|
| `entity_path` | The entity path to look up. |
| `component` | The component to look up. |

| RETURNS | DESCRIPTION |
|---|---|
| `ComponentColumnDescriptor \| None` | The column descriptor, if it exists. |
def column_for_selector(selector)
Look up the column descriptor for a specific selector.
| PARAMETER | DESCRIPTION |
|---|---|
| `selector` | The selector to look up. String arguments are expected to follow the documented selector format. |

| RETURNS | DESCRIPTION |
|---|---|
| `ComponentColumnDescriptor` | The column descriptor. |

| RAISES | DESCRIPTION |
|---|---|
| `LookupError` | If the column is not found. |
| `ValueError` | If the string selector format is invalid or the input type is unsupported. |

Note: if the input is already a `ComponentColumnDescriptor`, it is returned directly without checking for existence.
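A sketch of the lookup styles described above. The entity path, component name, and selector string are illustrative placeholders, not guaranteed to match your data:

```python
schema = dataset.schema()

# Iterate all columns: index columns first, then component columns.
for column in schema:
    print(column.name)

# Structured lookup: returns None if the column does not exist.
descriptor = schema.column_for("/points", "Points3D:positions")  # placeholder names

# Selector-based lookup: raises LookupError if the column is not found.
descriptor = schema.column_for_selector("/points:Points3D:positions")  # placeholder selector
```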
def column_names()
Return a list of all column names in the schema.
| RETURNS | DESCRIPTION |
|---|---|
| `list[str]` | The names of all columns (index columns first, then component columns). |
def component_columns()
Return a list of all the component columns in the schema.
Component columns contain the data for a specific component of an entity.
def index_columns()
Return a list of all the index columns in the schema.
Index columns contain the index values for when the data was updated. They generally correspond to Rerun timelines.
class ComponentColumnDescriptor
The descriptor of a component column.
Component columns contain the data for a specific component of an entity.
Column descriptors are used to describe the columns in a
Schema. They are read-only. To select a component
column, use ComponentColumnSelector.
archetype: str
property
The archetype name, if any.
This property is read-only.
component: str
property
The component.
This property is read-only.
component_type: str | None
property
The component type, if any.
This property is read-only.
entity_path: str
property
The entity path.
This property is read-only.
is_static: bool
property
Whether the column is static.
This property is read-only.
name: str
property
The name of this column.
This property is read-only.
class ComponentColumnSelector
A selector for a component column.
Component columns contain the data for a specific component of an entity.
component: str
property
The component.
This property is read-only.
entity_path: str
property
The entity path.
This property is read-only.
class IndexColumnDescriptor
The descriptor of an index column.
Index columns contain the index values for when the data was updated. They generally correspond to Rerun timelines.
Column descriptors are used to describe the columns in a
Schema. They are read-only. To select an index
column, use IndexColumnSelector.
is_static: bool
property
Part of the generic ColumnDescriptor interface; always False for index columns.
name: str
property
The name of the index.
This property is read-only.
class IndexColumnSelector
A selector for an index column.
Index columns contain the index values for when the data was updated. They generally correspond to Rerun timelines.
name: str
property
The name of the index.
This property is read-only.
def __init__(index)
Create a new IndexColumnSelector.
| PARAMETER | DESCRIPTION |
|---|---|
| `index` | The name of the index to select. Usually the name of a timeline. |
class AlreadyExistsError
Bases: Exception
Raised when trying to create a resource that already exists.
class CatalogClient
Client for a remote Rerun catalog server.
Note: the datafusion package is required to use this client. Initialization will fail with an error if the package
is not installed.
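A minimal connection sketch. The URL is a placeholder, and constructing the client directly from a server URL is an assumption here:

```python
import rerun as rr

# Connect to a catalog server (placeholder URL; assumed constructor).
client = rr.catalog.CatalogClient("rerun+http://localhost:51234")

print(client.url)
print(client.entry_names())

# The DataFusion session context can be used to run SQL queries
# against entries registered in the catalog.
ctx = client.ctx
```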
ctx: datafusion.SessionContext
property
Returns a DataFusion session context for querying the catalog.
url: str
property
Returns the catalog URL.
def all_entries()
deprecated
Deprecated
Use entries() instead
Returns a list of all entries in the catalog.
def append_to_table(table_name, batches=None, **named_params)
deprecated
Deprecated
Use TableEntry.append() instead
Append record batches to an existing table.
| PARAMETER | DESCRIPTION |
|---|---|
| `table_name` | The name of the table entry to write to. This table must already exist. |
| `batches` | One or more record batches to write into the table. |
| `**named_params` | Named parameters to write to the table as columns. |
def create_dataset(name)
Creates a new dataset with the given name.
def create_table(name, schema, url)
Create and register a new table.
| PARAMETER | DESCRIPTION |
|---|---|
| `name` | The name of the table entry to create. It must be unique within all entries in the catalog. An exception will be raised if an entry with the same name already exists. |
| `schema` | The schema of the table to create. |
| `url` | The URL of the directory where the Lance table will be stored. |
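A sketch of creating a table, with a hypothetical schema and storage URL:

```python
import pyarrow as pa

# Hypothetical table schema and Lance storage location.
schema = pa.schema(
    [
        pa.field("rerun_segment_id", pa.string()),
        pa.field("label", pa.string()),
    ]
)

table = client.create_table(
    name="episode_labels",  # must be unique within the catalog
    schema=schema,
    url="s3://my-bucket/tables/episode_labels",  # placeholder URL
)
```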
def create_table_entry(name, schema, url)
deprecated
Deprecated
Use create_table() instead
Create and register a new table.
def dataset_entries()
deprecated
Deprecated
Use datasets() instead
Returns a list of all dataset entries in the catalog.
def dataset_names(*, include_hidden=False)
Returns a list of all dataset names in the catalog.
| PARAMETER | DESCRIPTION |
|---|---|
| `include_hidden` | If True, include blueprint datasets. |
def datasets(*, include_hidden=False)
Returns a list of all dataset entries in the catalog.
| PARAMETER | DESCRIPTION |
|---|---|
| `include_hidden` | If True, include blueprint datasets. |
def do_global_maintenance()
Perform maintenance tasks on the whole system.
def entries(*, include_hidden=False)
Returns a list of all entries in the catalog.
| PARAMETER | DESCRIPTION |
|---|---|
| `include_hidden` | If True, include hidden entries (blueprint datasets and system tables). |
def entry_names(*, include_hidden=False)
Returns a list of all entry names in the catalog.
| PARAMETER | DESCRIPTION |
|---|---|
| `include_hidden` | If True, include hidden entries (blueprint datasets and system tables). |
def get_dataset(name=None, *, id=None)
Returns a dataset by its ID or name.
Exactly one of id or name must be provided.
| PARAMETER | DESCRIPTION |
|---|---|
| `name` | The name of the dataset. |
| `id` | The unique identifier of the dataset. Can be an `EntryId` or a string. |
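A get-or-create sketch; that a missing name raises NotFoundError is an assumption based on the exception documented below:

```python
from rerun.catalog import NotFoundError

try:
    dataset = client.get_dataset("my_dataset")
except NotFoundError:  # assumed behavior for a missing name
    dataset = client.create_dataset("my_dataset")
```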
def get_dataset_entry(*, id=None, name=None)
deprecated
Deprecated
Use get_dataset() instead
Returns a dataset by its ID or name.
def get_table(name=None, *, id=None)
Returns a table by its ID or name.
Exactly one of id or name must be provided.
def get_table_entry(*, id=None, name=None)
deprecated
Deprecated
Use get_table() instead
Returns a table by its ID or name.
def register_table(name, url)
Registers a foreign Lance table (identified by its URL) as a new table entry with the given name.
| PARAMETER | DESCRIPTION |
|---|---|
| `name` | The name of the table entry to create. It must be unique within all entries in the catalog. An exception will be raised if an entry with the same name already exists. |
| `url` | The URL of the Lance table to register. |
def table_entries()
deprecated
Deprecated
Use tables() instead
Returns a list of all table entries in the catalog.
def table_names(*, include_hidden=False)
Returns a list of all table names in the catalog.
| PARAMETER | DESCRIPTION |
|---|---|
| `include_hidden` | If True, include system tables. |
def tables(*, include_hidden=False)
Returns a list of all table entries in the catalog.
| PARAMETER | DESCRIPTION |
|---|---|
| `include_hidden` | If True, include system tables. |
def update_table(table_name, batches=None, **named_params)
deprecated
Deprecated
Use TableEntry.upsert() instead
Upsert record batches to an existing table.
| PARAMETER | DESCRIPTION |
|---|---|
| `table_name` | The name of the table entry to write to. This table must already exist. |
| `batches` | One or more record batches to write into the table. |
| `**named_params` | Named parameters to write to the table as columns. |
def write_table(name, batches, insert_mode)
deprecated
Deprecated
Use TableEntry.append(), overwrite(), or upsert() instead
Writes record batches into an existing table.
| PARAMETER | DESCRIPTION |
|---|---|
| `name` | The name of the table entry to write to. This table must already exist. |
| `batches` | One or more record batches to write into the table. For convenience, you can pass in a single record batch, a list of record batches, a list of lists of record batches, or a `RecordBatchReader`. |
| `insert_mode` | Determines how rows should be added to the existing table. |
class DatasetEntry
Bases: Entry[DatasetEntryInternal]
A dataset entry in the catalog.
catalog: CatalogClient
property
The catalog client that this entry belongs to.
created_at: datetime
property
The entry's creation date and time.
id: EntryId
property
The entry's id.
kind: EntryKind
property
The entry's kind.
manifest_url: str
property
Return the dataset manifest URL.
name: str
property
The entry's name.
updated_at: datetime
property
The entry's last updated date and time.
def __eq__(other)
Compare this entry to another object.
Supports comparison with str and EntryId to enable the following patterns:
"entry_name" in client.entries()
entry_id in client.entries()
def arrow_schema()
Return the Arrow schema of the data contained in the dataset.
def blueprint_dataset()
The associated blueprint dataset, if any.
def blueprints()
Lists all blueprints currently registered with this dataset.
def create_fts_index(*, column, time_index, store_position=False, base_tokenizer='simple')
deprecated
Deprecated
Use create_fts_search_index() instead
def create_fts_search_index(*, column, time_index, store_position=False, base_tokenizer='simple')
Create a full-text search index on the given column.
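A sketch, with an illustrative component column name:

```python
# Index a text column for full-text search (the "column" value is a placeholder).
dataset.create_fts_search_index(
    column="/descriptions:Text",  # placeholder component column
    time_index="log_time",
    store_position=False,
    base_tokenizer="simple",
)
```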
def create_vector_index(*, column, time_index, target_partition_num_rows=None, num_sub_vectors=16, distance_metric='Cosine')
deprecated
Deprecated
Use create_vector_search_index() instead
def create_vector_search_index(*, column, time_index, target_partition_num_rows=None, num_sub_vectors=16, distance_metric='Cosine')
Create a vector index on the given column.
This will enable indexing and build the vector index over all existing values in the specified component column.
Results can be retrieved using the search_vector API, which will include
the time-point on the indexed timeline.
Only one index can be created per component column -- executing this a second time for the same component column will replace the existing index.
| PARAMETER | DESCRIPTION |
|---|---|
| `column` | The component column to create the index on. |
| `time_index` | Which timeline this index will map to. |
| `target_partition_num_rows` | The target size (in number of rows) for each partition. The underlying indexer (Lance) will pick a default when no value is specified (currently 8192), and will cap the maximum number of partitions independently of this setting (currently 4096). |
| `num_sub_vectors` | The number of sub-vectors to use when building the index. |
| `distance_metric` | The distance metric to use for the index ("L2", "Cosine", "Dot", or "Hamming"). |
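A sketch, with an illustrative embedding column:

```python
# Build a vector index over an embedding column (names are placeholders).
dataset.create_vector_search_index(
    column="/camera:embedding",  # placeholder component column
    time_index="log_time",
    num_sub_vectors=16,
    distance_metric="Cosine",
)
```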
def default_blueprint()
Return the name of the currently set default blueprint.
def default_blueprint_partition_id()
deprecated
Deprecated
Use default_blueprint_segment_id() instead
The default blueprint partition ID for this dataset, if any.
def delete()
Delete this entry from the catalog.
def delete_indexes(column)
deprecated
Deprecated
Use delete_search_indexes() instead
def delete_search_indexes(column)
Deletes all user-defined indexes for the specified column.
def do_maintenance(optimize_indexes=False, retrain_indexes=False, compact_fragments=False, cleanup_before=None, unsafe_allow_recent_cleanup=False)
Perform maintenance tasks on this dataset.
def download_partition(partition_id)
deprecated
Deprecated
Use download_segment() instead
Download a partition from the dataset.
def download_segment(segment_id)
Download a segment from the dataset.
def filter_contents(exprs)
Return a new DatasetView filtered to the given entity paths.
Entity path expressions support wildcards:
- "/points/**" matches all entities under /points
- "-/text/**" excludes all entities under /text
| PARAMETER | DESCRIPTION |
|---|---|
| `exprs` | Entity path expression or list of entity path expressions. |

| RETURNS | DESCRIPTION |
|---|---|
| `DatasetView` | A new view filtered to the matching entity paths. |
Examples:
# Filter to a single entity path
view = dataset.filter_contents("/points/**")
# Filter to specific entity paths
view = dataset.filter_contents(["/points/**"])
# Exclude certain paths
view = dataset.filter_contents(["/points/**", "-/text/**"])
# Chain with segment filters
view = dataset.filter_segments(["recording_0"]).filter_contents("/points/**")
def filter_segments(segment_ids)
Return a new DatasetView filtered to the given segment IDs.
| PARAMETER | DESCRIPTION |
|---|---|
| `segment_ids` | A segment ID string, a list of segment ID strings, or a DataFusion DataFrame with a column named 'rerun_segment_id'. When passing a DataFrame, any additional columns are ignored. |

| RETURNS | DESCRIPTION |
|---|---|
| `DatasetView` | A new view filtered to the given segments. |
Examples:
# Filter to a single segment
view = dataset.filter_segments("recording_0")
# Filter to specific segments
view = dataset.filter_segments(["recording_0", "recording_1"])
# Filter using a DataFrame
good_segments = segment_table.filter(col("success"))
view = dataset.filter_segments(good_segments)
# Read data from the filtered view
df = view.reader(index="timeline")
def list_indexes()
deprecated
Deprecated
Use list_search_indexes() instead
def list_search_indexes()
List all user-defined indexes in this dataset.
def manifest()
Return the dataset manifest as a DataFusion DataFrame.
def partition_ids()
deprecated
Deprecated
Use segment_ids() instead
Returns a list of partition IDs for the dataset.
def partition_url(partition_id, timeline=None, start=None, end=None)
deprecated
Deprecated
Use segment_url() instead
Return the URL for the given partition.
def reader(index, *, include_semantically_empty_columns=False, include_tombstone_columns=False, fill_latest_at=False, using_index_values=None)
Create a reader over this dataset.
Returns a DataFusion DataFrame.
Server-side filters
The returned DataFrame supports server-side filtering on both the rerun_segment_id column and the index (timeline) column, which can greatly improve performance. For example, the following filters will be handled by the Rerun server.
dataset.reader(index="real_time").filter(col("rerun_segment_id") == "aabbccddee")
dataset.reader(index="real_time").filter(col("real_time") == "1234567890")
dataset.reader(index="real_time").filter(
(col("rerun_segment_id") == "aabbccddee") & (col("real_time") == "1234567890")
)
| PARAMETER | DESCRIPTION |
|---|---|
| `index` | The index (timeline) to use for the view. |
| `include_semantically_empty_columns` | Whether to include columns that are semantically empty. |
| `include_tombstone_columns` | Whether to include tombstone columns. |
| `fill_latest_at` | Whether to fill null values with the latest valid data. |
| `using_index_values` | If provided, specifies the exact index values to sample for all segments. Can be a numpy array (datetime64[ns] or int64), a pyarrow Array, or a sequence. |
| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | A DataFusion DataFrame. |
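A usage sketch; the timeline name and segment ID are placeholders:

```python
import numpy as np
from datafusion import col

# Filters on the segment ID and index columns are pushed to the server.
df = dataset.reader(index="log_time").filter(
    col("rerun_segment_id") == "aabbccddee"  # placeholder segment ID
)

# Sample explicit index values, filling gaps with the latest valid data.
sample_times = np.array(["2024-01-01T00:00:00"], dtype="datetime64[ns]")
df = dataset.reader(
    index="log_time",
    using_index_values=sample_times,
    fill_latest_at=True,
)
```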
def register(recording_uri, *, layer_name='base')
Register RRD URIs to the dataset and return a handle to track progress.
This method initiates the registration of recordings to the dataset, and returns a handle that can be used to wait for completion or iterate over results.
| PARAMETER | DESCRIPTION |
|---|---|
| `recording_uri` | The URI(s) of the RRD(s) to register. Can be a single URI string or a sequence of URIs. |
| `layer_name` | The layer(s) to which the recordings will be registered. Can be a single layer name (applied to all recordings) or a sequence of layer names (which must match the length of `recording_uri`). |

| RETURNS | DESCRIPTION |
|---|---|
| `RegistrationHandle` | A handle to track and wait on the registration tasks. |
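A sketch; the URIs are placeholders:

```python
# Kick off registration of two recordings (placeholder URIs).
handle = dataset.register(
    [
        "s3://my-bucket/recordings/run_0.rrd",
        "s3://my-bucket/recordings/run_1.rrd",
    ]
)

# Block until both registrations complete; raises ValueError on failure.
result = handle.wait(timeout_secs=600)
```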
def register_blueprint(uri, set_default=True)
Register an existing .rbl file that is visible to the server.
By default, the blueprint is also set as this dataset's default.
def register_prefix(recordings_prefix, layer_name=None)
Register all RRDs under a given prefix to the dataset and return a handle to track progress.
A prefix is a directory-like path in an object store (e.g. an S3 bucket or ABS container). All RRDs that are recursively found under the given prefix will be registered to the dataset.
This method initiates the registration of the recordings to the dataset, and returns a handle that can be used to wait for completion or iterate over results.
| PARAMETER | DESCRIPTION |
|---|---|
| `recordings_prefix` | The prefix under which to register all RRDs. |
| `layer_name` | The layer to which the recordings will be registered. If `None`, the default layer is used. |

| RETURNS | DESCRIPTION |
|---|---|
| `RegistrationHandle` | A handle to track and wait on the registration tasks. |
def schema()
Return the schema of the data contained in the dataset.
def search_fts(query, column)
Search the dataset using a full-text search query.
def search_vector(query, column, top_k)
Search the dataset using a vector search query.
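A sketch of both search entry points; the column names are placeholders and assume the corresponding indexes were created beforehand:

```python
import numpy as np

# Full-text search over an FTS-indexed column (placeholder name).
hits = dataset.search_fts("red traffic light", column="/descriptions:Text")

# Vector search over a vector-indexed column (placeholder name and dimension).
query = np.random.rand(128).astype(np.float32)
hits = dataset.search_vector(query, column="/camera:embedding", top_k=10)
```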
def segment_ids()
Returns a list of segment IDs for the dataset.
def segment_table(join_meta=None, join_key='rerun_segment_id')
Return the segment table as a DataFusion DataFrame.
The segment table contains metadata about each segment in the dataset, including segment IDs, layer names, storage URLs, and size information.
| PARAMETER | DESCRIPTION |
|---|---|
| `join_meta` | Optional metadata table or DataFrame to join with the segment table. |
| `join_key` | The column name to use for joining; defaults to "rerun_segment_id". Both the segment table and `join_meta` must contain this column. |

| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | The segment metadata table, optionally joined with `join_meta`. |
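A sketch; `episode_meta` is a hypothetical DataFusion DataFrame holding per-segment metadata:

```python
# Per-segment metadata (IDs, layers, storage URLs, sizes).
segments = dataset.segment_table()

# Join against an external metadata table on the shared segment ID column.
joined = dataset.segment_table(join_meta=episode_meta, join_key="rerun_segment_id")
```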
def segment_url(segment_id, timeline=None, start=None, end=None)
Return the URL for the given segment.
| PARAMETER | DESCRIPTION |
|---|---|
| `segment_id` | The ID of the segment to get the URL for. |
| `timeline` | The name of the timeline to display. |
| `start` | The start selected time for the segment. Integer for ticks, or datetime/nanoseconds for timestamps. |
| `end` | The end selected time for the segment. Integer for ticks, or datetime/nanoseconds for timestamps. |

Examples:
With ticks
>>> start_tick, end_tick = 0, 10
>>> dataset.segment_url("some_id", "log_tick", start_tick, end_tick)
With timestamps
>>> start_time, end_time = datetime.now() - timedelta(seconds=4), datetime.now()
>>> dataset.segment_url("some_id", "real_time", start_time, end_time)

| RETURNS | DESCRIPTION |
|---|---|
| `str` | The URL for the given segment. |
def set_default_blueprint(blueprint_name)
Set an already-registered blueprint as default for this dataset.
def set_default_blueprint_partition_id(partition_id)
deprecated
Deprecated
Use set_default_blueprint_segment_id() instead
Set the default blueprint partition ID for this dataset.
def update(*, name=None)
Update this entry's properties.
| PARAMETER | DESCRIPTION |
|---|---|
| `name` | New name for the entry. |
class DatasetView
A filtered view over a dataset in the catalog.
A DatasetView provides lazy filtering over a dataset's segments and entity paths.
Filters are composed lazily and only applied when data is actually read.
Create a DatasetView by calling filter_segments() or filter_contents() on a
DatasetEntry.
Examples:
# Filter to specific segments
view = dataset.filter_segments(["recording_0", "recording_1"])
# Filter to specific entity paths
view = dataset.filter_contents(["/points/**"])
# Chain filters
view = dataset.filter_segments(["recording_0"]).filter_contents(["/points/**"])
# Read data
df = view.reader(index="timeline")
def __init__(internal)
Create a new DatasetView wrapper.
| PARAMETER | DESCRIPTION |
|---|---|
| `internal` | The internal Rust-side DatasetView object. |
def __repr__()
Return a string representation of the DatasetView.
def arrow_schema()
Return the filtered Arrow schema for this view.
| RETURNS | DESCRIPTION |
|---|---|
| `Schema` | The filtered Arrow schema. |
def download_segment(segment_id)
deprecated
def filter_contents(exprs)
Return a new DatasetView filtered to the given entity paths.
Entity path expressions support wildcards:
- "/points/**" matches all entities under /points
- "-/text/**" excludes all entities under /text
| PARAMETER | DESCRIPTION |
|---|---|
| `exprs` | Entity path expression or list of entity path expressions. |

| RETURNS | DESCRIPTION |
|---|---|
| `DatasetView` | A new view filtered to the matching entity paths. |
def filter_segments(segment_ids)
Return a new DatasetView filtered to the given segment IDs.
Filters are composed: if this view already has a segment filter, the result is the intersection of both filters.
| PARAMETER | DESCRIPTION |
|---|---|
| `segment_ids` | A segment ID string, a list of segment ID strings, or a DataFusion DataFrame with a column named 'rerun_segment_id'. |

| RETURNS | DESCRIPTION |
|---|---|
| `DatasetView` | A new view filtered to the given segments. |
def reader(index, *, include_semantically_empty_columns=False, include_tombstone_columns=False, fill_latest_at=False, using_index_values=None)
Create a reader over this DatasetView.
Returns a DataFusion DataFrame.
Server-side filters
The returned DataFrame supports server-side filtering on both the rerun_segment_id column and the index (timeline) column, which can greatly improve performance. For example, the following filters will be handled by the Rerun server.
view.reader(index="real_time").filter(col("rerun_segment_id") == "aabbccddee")
view.reader(index="real_time").filter(col("real_time") == "1234567890")
view.reader(index="real_time").filter(
(col("rerun_segment_id") == "aabbccddee") & (col("real_time") == "1234567890")
)
| PARAMETER | DESCRIPTION |
|---|---|
| `index` | The index (timeline) to use for the view. |
| `include_semantically_empty_columns` | Whether to include columns that are semantically empty. |
| `include_tombstone_columns` | Whether to include tombstone columns. |
| `fill_latest_at` | Whether to fill null values with the latest valid data. |
| `using_index_values` | If provided, specifies the exact index values to sample for all segments. Can be a numpy array (datetime64[ns] or int64), a pyarrow Array, or a sequence. |
| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | A DataFusion DataFrame. |
def schema()
Return the filtered schema for this view.
The schema reflects any content filters applied to the view.
| RETURNS | DESCRIPTION |
|---|---|
| `Schema` | The filtered schema. |
def segment_ids()
Returns a list of segment IDs matching this view's filters.
def segment_table(join_meta=None, join_key='rerun_segment_id')
Return the segment table as a DataFusion DataFrame.
The segment table contains metadata about each segment in the dataset, including segment IDs, layer names, storage URLs, and size information.
Only segments matching this view's filters are included.
| PARAMETER | DESCRIPTION |
|---|---|
| `join_meta` | Optional metadata table or DataFrame to join with the segment table. |
| `join_key` | The column name to use for joining; defaults to "rerun_segment_id". Both the segment table and `join_meta` must contain this column. |

| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | The segment metadata table, optionally joined with `join_meta`. |
class Entry
Bases: ABC, Generic[InternalEntryT]
An entry in the catalog.
catalog: CatalogClient
property
The catalog client that this entry belongs to.
created_at: datetime
property
The entry's creation date and time.
id: EntryId
property
The entry's id.
kind: EntryKind
property
The entry's kind.
name: str
property
The entry's name.
updated_at: datetime
property
The entry's last updated date and time.
def __eq__(other)
Compare this entry to another object.
Supports comparison with str and EntryId to enable the following patterns:
"entry_name" in client.entries()
entry_id in client.entries()
def delete()
Delete this entry from the catalog.
def update(*, name=None)
Update this entry's properties.
| PARAMETER | DESCRIPTION |
|---|---|
| `name` | New name for the entry. |
class EntryId
A unique identifier for an entry in the catalog.
def __init__(id)
Create a new EntryId from a string.
def __str__()
Return str(self).
class EntryKind
The kinds of entries that can be stored in the catalog.
def __int__()
Return int(self).
def __str__()
Return str(self).
class NotFoundError
Bases: Exception
Raised when the requested resource is not found.
class RegistrationHandle
Handle to track and wait on segment registration tasks.
def iter_results(timeout_secs=None)
Stream completed registrations as they finish.
Uses the server's streaming API to yield results as tasks complete. Each result is yielded exactly once when its task completes (success or error).
| PARAMETER | DESCRIPTION |
|---|---|
| `timeout_secs` | Timeout in seconds, or None to block. Note that using None doesn't guarantee that a TimeoutError will never eventually be raised for long-running tasks. Setting a timeout and polling is recommended for monitoring very large registration batches. |

| YIELDS | DESCRIPTION |
|---|---|
| `SegmentRegistrationResult` | The result of each completed registration. |

| RAISES | DESCRIPTION |
|---|---|
| `TimeoutError` | If the timeout is reached before all tasks complete. |
def wait(timeout_secs=None)
Block until all registrations complete.
| PARAMETER | DESCRIPTION |
|---|---|
| `timeout_secs` | Timeout in seconds, or None to block. Note that using None doesn't guarantee that a TimeoutError will never eventually be raised for long-running tasks. Setting a timeout and polling is recommended for monitoring very large registration batches. |

| RETURNS | DESCRIPTION |
|---|---|
| `RegistrationResult` | The result containing the list of segment IDs in registration order. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If any registration fails. |
| `TimeoutError` | If the timeout is reached before all tasks complete. |
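A streaming sketch, assuming `handle` was returned by register() or register_prefix():

```python
# Consume registration results as they complete.
for result in handle.iter_results(timeout_secs=60):
    if result.is_success:
        print(f"registered {result.uri} as segment {result.segment_id}")
    else:
        print(f"failed to register {result.uri}: {result.error}")
```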
class SegmentRegistrationResult
dataclass
Result of a completed segment registration.
error: str | None
instance-attribute
Error message if registration failed, or None if successful.
is_error: bool
property
Returns True if the registration failed.
is_success: bool
property
Returns True if the registration was successful.
segment_id: str | None
instance-attribute
The resulting segment ID. May be None if registration failed.
uri: str
instance-attribute
The source URI that was registered.
class TableEntry
Bases: Entry[TableEntryInternal]
A table entry in the catalog.
Note: this object acts as a table provider for DataFusion.
catalog: CatalogClient
property
The catalog client that this entry belongs to.
created_at: datetime
property
The entry's creation date and time.
id: EntryId
property
The entry's id.
kind: EntryKind
property
The entry's kind.
name: str
property
The entry's name.
storage_url: str
property
The table's storage URL.
updated_at: datetime
property
The entry's last updated date and time.
def __datafusion_table_provider__()
Returns a DataFusion table provider capsule.
def __eq__(other)
Compare this entry to another object.
Supports comparison with str and EntryId to enable the following patterns:
"entry_name" in client.entries()
entry_id in client.entries()
def append(batches=None, **named_params)
Append to the Table.
| PARAMETER | DESCRIPTION |
|---|---|
| `batches` | Arrow data to append to the table. Can be a RecordBatchReader, a single RecordBatch, a list of RecordBatches, or a list of lists of RecordBatches. |
| `**named_params` | Each named parameter corresponds to a column in the table. |
def arrow_schema()
Returns the Arrow schema of the table.
def delete()
Delete this entry from the catalog.
def overwrite(batches=None, **named_params)
Overwrite the Table with new data.
| PARAMETER | DESCRIPTION |
|---|---|
| `batches` | Arrow data to overwrite the table with. Can be a RecordBatchReader, a single RecordBatch, a list of RecordBatches, or a list of lists of RecordBatches. |
| `**named_params` | Each named parameter corresponds to a column in the table. |
def reader()
Registers the table with the DataFusion context and returns a DataFrame.
def to_arrow_reader()
Convert this table to a pyarrow.RecordBatchReader.
def update(*, name=None)
Update this entry's properties.
| PARAMETER | DESCRIPTION |
|---|---|
| `name` | New name for the entry. |
def upsert(batches=None, **named_params)
Upsert data into the Table.
To use upsert, the table must contain a column with the metadata:
{"rerun:is_table_index" = "true"}
Any row with a matching index value will have the new data inserted. Any row without a matching index value will be appended as a new row.
| PARAMETER | DESCRIPTION |
|---|---|
| `batches` | Arrow data to upsert into the table. Can be a RecordBatchReader, a single RecordBatch, a list of RecordBatches, or a list of lists of RecordBatches. |
| `**named_params` | Each named parameter corresponds to a column in the table. |
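A write sketch against a hypothetical table with a `rerun_segment_id` index column and a `label` column:

```python
import pyarrow as pa

table = client.get_table("episode_labels")  # hypothetical table

# Append a record batch...
batch = pa.RecordBatch.from_pydict(
    {"rerun_segment_id": ["recording_0"], "label": ["good"]}
)
table.append(batch)

# ...or append columns via named parameters.
table.append(rerun_segment_id=["recording_1"], label=["bad"])

# Upsert overwrites rows whose index value matches, and appends the rest.
table.upsert(rerun_segment_id=["recording_0"], label=["revised"])
```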