Utilities
rerun.utilities
datafusion
DataFusion utilities.
collect
def collect_to_string_list(df, col, remove_nulls=True)
Collect a single column of a DataFrame into a Python string list.
This is a convenience function. Collecting a DataFusion DataFrame returns a stream of record batches; this function extracts a single column from all of those batches and converts its values to strings.
| PARAMETER | DESCRIPTION |
|---|---|
| df | The input DataFusion DataFrame |
| col | The column to collect. You can provide either a string column name or a DataFusion expression. |
| remove_nulls | If true, any |
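
The batch-flattening idea described above can be illustrated without DataFusion. The following is a minimal sketch, not the Rerun implementation: it models each record batch as a plain dict of column name to values, and the helper name `collect_column_as_strings` is hypothetical. It assumes nulls are dropped when `remove_nulls` is true and rendered with `str()` otherwise; the behaviour of the real function in the false case is not stated in this section.

```python
from typing import Iterable, Optional

# Stand-in for a record batch: column name -> list of values.
# This is NOT a DataFusion type; it only models the shape of the data.
Batch = dict[str, list[Optional[object]]]


def collect_column_as_strings(
    batches: Iterable[Batch], col: str, remove_nulls: bool = True
) -> list[str]:
    """Flatten one column across all batches into a list of strings."""
    out: list[str] = []
    for batch in batches:
        for value in batch[col]:
            if value is None and remove_nulls:
                continue  # skip null entries
            # Assumption: any remaining value (including None) goes through str().
            out.append(str(value))
    return out


# Two batches, as a collected stream might deliver them.
batches = [
    {"name": ["a", None], "n": [1, 2]},
    {"name": ["b"], "n": [3]},
]
names = collect_column_as_strings(batches, "name")
```

With the real function, the equivalent call would take a DataFusion DataFrame in place of `batches`, as in the signature `collect_to_string_list(df, col, remove_nulls=True)`.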
functions
url_generation
def partition_url(dataset, *, partition_id_col=None, timestamp_col=None, timeline_name=None)
Compute the URL for a partition within a dataset.
This is a Rerun-focused DataFusion function that creates a DataFusion expression for the partition URL.
To manually invoke the underlying UDF, see partition_url_udf or partition_url_with_timeref_udf.
| PARAMETER | DESCRIPTION |
|---|---|
| dataset | The input Rerun Dataset. |
| partition_id_col | The column containing the partition ID. If not provided, it will assume a default value of |
| timestamp_col | If this parameter is passed in, generate a URL that will jump to a specific timestamp within the partition. |
| timeline_name | When used in combination with |
def partition_url_udf(dataset)
Create a UDF that computes the URL for a partition within a Dataset.
The generated UDF expects one input column: a string containing the partition ID.
def partition_url_with_timeref_udf(dataset, timeline_name)
Create a UDF that computes the URL for a partition within a Dataset, including a timestamp.
The generated UDF expects two input columns: a string containing the partition ID, and the timestamp in nanoseconds.
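
Both UDF constructors follow a factory pattern: they take the dataset (and optionally a timeline name) once, and return a function that is applied column-wise. The sketch below models that pattern in plain Python only. The helper names and the `example://` URL layout are hypothetical placeholders invented for illustration; the real Rerun URL format is not shown in this section, and the real UDFs operate on DataFusion columns rather than Python lists.

```python
from typing import Callable


def make_partition_url_fn(dataset_base: str) -> Callable[[list[str]], list[str]]:
    """Return a function mapping a column of partition IDs to URLs."""

    def partition_urls(partition_ids: list[str]) -> list[str]:
        # Hypothetical URL layout, for illustration only.
        return [f"{dataset_base}/partition/{pid}" for pid in partition_ids]

    return partition_urls


def make_partition_url_with_time_fn(
    dataset_base: str, timeline_name: str
) -> Callable[[list[str], list[int]], list[str]]:
    """Return a function taking partition IDs and timestamps (nanoseconds)."""

    def partition_urls(partition_ids: list[str], timestamps_ns: list[int]) -> list[str]:
        # Hypothetical URL layout, for illustration only.
        return [
            f"{dataset_base}/partition/{pid}?{timeline_name}={ts}"
            for pid, ts in zip(partition_ids, timestamps_ns)
        ]

    return partition_urls


# One input column: partition IDs.
url_fn = make_partition_url_fn("example://my_dataset")
urls = url_fn(["p0", "p1"])

# Two input columns: partition IDs and timestamps in nanoseconds.
timed_fn = make_partition_url_with_time_fn("example://my_dataset", "log_time")
timed_urls = timed_fn(["p0"], [1_000_000_000])
```

Binding the dataset at construction time means the returned function only needs the per-row inputs, which matches the one-column and two-column shapes described for the two UDFs.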