Utilities

rerun.utilities

datafusion

DataFusion utilities.

collect
def collect_to_string_list(df, col, remove_nulls=True)

Collect a single column of a DataFrame into a Python string list.

This is a convenience function. Collecting a DataFusion DataFrame returns a stream of record batches. Sometimes it is preferable to extract a single column from all of these batches and convert its values into one list of strings.

PARAMETER DESCRIPTION
df

The input DataFusion DataFrame

TYPE: DataFrame

col

The column to collect. You can provide either a string column name or a DataFusion expression.

TYPE: str | Expr

remove_nulls

If true, null values are removed from the result. If false, they are kept and converted into None.

TYPE: bool DEFAULT: True
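
The effect of remove_nulls can be illustrated without DataFusion. The following is a minimal pure-Python sketch, not the library's implementation: `flatten_column` is a hypothetical helper, and each "batch" is modelled as a plain list of values standing in for one column of a record batch.

```python
from typing import Iterable, List, Optional


def flatten_column(
    batches: Iterable[Iterable[object]], remove_nulls: bool = True
) -> List[Optional[str]]:
    """Flatten per-batch column values into a single list of strings.

    Hypothetical stand-in for the behaviour described above; the real
    function operates on a DataFusion DataFrame, not on Python lists.
    """
    result: List[Optional[str]] = []
    for batch in batches:
        for value in batch:
            if value is None:
                # Nulls are either dropped or kept as None.
                if not remove_nulls:
                    result.append(None)
                continue
            result.append(str(value))
    return result


# Three "batches" of one column, with nulls scattered across them.
batches = [["a", None], ["b"], [None, "c"]]

print(flatten_column(batches))                      # ['a', 'b', 'c']
print(flatten_column(batches, remove_nulls=False))  # ['a', None, 'b', None, 'c']
```

With remove_nulls=True the result may be shorter than the number of rows in the DataFrame, so it should not be zipped against another column collected separately.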

functions
url_generation
def partition_url(dataset, *, partition_id_col=None, timestamp_col=None, timeline_name=None)

Compute the URL for a partition within a dataset.

This is a Rerun-focused DataFusion function that creates a DataFusion expression for the partition URL.

To manually invoke the underlying UDF, see partition_url_udf or partition_url_with_timeref_udf.

PARAMETER DESCRIPTION
dataset

The input Rerun Dataset.

TYPE: DatasetEntry

partition_id_col

The column containing the partition ID. If not provided, it will assume a default value of rerun_partition_id. You may pass either a DataFusion expression or a string column name.

TYPE: str | Expr | None DEFAULT: None

timestamp_col

If this parameter is passed in, generate a URL that will jump to a specific timestamp within the partition.

TYPE: str | Expr | None DEFAULT: None

timeline_name

When used in combination with timestamp_col, this specifies which timeline to seek along. By default this will use the same string as timestamp_col.

TYPE: str | None DEFAULT: None
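
The parameter defaults described above can be summarised in a small sketch. This is an illustration only, under stated assumptions: `resolve_partition_url_args` and `ResolvedArgs` are hypothetical names, only string column names are handled (not DataFusion expressions), and the mapping from arguments to the underlying UDF is inferred from the descriptions on this page rather than taken from the library's source.

```python
from typing import NamedTuple, Optional

# Default column name stated in the partition_id_col description.
DEFAULT_PARTITION_ID_COL = "rerun_partition_id"


class ResolvedArgs(NamedTuple):
    partition_id_col: str
    timestamp_col: Optional[str]
    timeline_name: Optional[str]
    udf: str  # name of the UDF assumed to be used (an inference, not confirmed)


def resolve_partition_url_args(
    partition_id_col: Optional[str] = None,
    timestamp_col: Optional[str] = None,
    timeline_name: Optional[str] = None,
) -> ResolvedArgs:
    """Apply the documented defaults to the arguments (hypothetical helper)."""
    pid = partition_id_col if partition_id_col is not None else DEFAULT_PARTITION_ID_COL
    if timestamp_col is None:
        # Assumption: without a timestamp column, timeline_name has no effect.
        return ResolvedArgs(pid, None, None, "partition_url_udf")
    # By default the timeline is named after the timestamp column.
    timeline = timeline_name if timeline_name is not None else timestamp_col
    return ResolvedArgs(pid, timestamp_col, timeline, "partition_url_with_timeref_udf")


print(resolve_partition_url_args())
print(resolve_partition_url_args(timestamp_col="log_time"))
print(resolve_partition_url_args(timestamp_col="ts", timeline_name="log_time"))
```

The third call shows the case where the column holding the timestamps and the timeline to seek along have different names, which is when timeline_name needs to be passed explicitly.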

def partition_url_udf(dataset)

Create a UDF that computes the URL for a partition within a Dataset.

This function will generate a UDF that expects one column of input, a string containing the Partition ID.

def partition_url_with_timeref_udf(dataset, timeline_name)

Create a UDF that computes the URL for a partition within a Dataset, including a timestamp.

This function will generate a UDF that expects two columns of input: a string column containing the Partition ID, and a column containing the timestamp in nanoseconds.