pub struct CpuWriteGpuReadBelt {
chunk_size: u64,
active_chunks: Vec<Chunk>,
closed_chunks: Vec<Chunk>,
free_chunks: Vec<Chunk>,
sender: Sender<Chunk>,
receiver: Receiver<Chunk>,
}
Efficiently performs many buffer writes by sharing and reusing temporary buffers.
Internally it uses a ring-buffer of staging buffers that are sub-allocated.
Based on wgpu::util::StagingBelt.
However, there are some important differences:
- can create buffers without yet knowing the target copy location
- lifetime of returned buffers is independent of the CpuWriteGpuReadBelt (allows working with several in parallel!)
- use of re_renderer’s resource pool
- handles alignment in a defined manner (see the wgpu issue on alignment guarantees for mapped buffers, open as of writing)
Fields
chunk_size: u64
Minimum size for new buffers.
active_chunks: Vec<Chunk>
Chunks that are currently mapped for CPU writes.
closed_chunks: Vec<Chunk>
Chunks that are currently being read by the GPU.
I.e. they already have scheduled transfers: they are unmapped, and one or more command encoders hold one or more copy_buffer_to_buffer commands with them as source.
free_chunks: Vec<Chunk>
Chunks that are back from the GPU and ready to be mapped for write and put into active_chunks.
sender: Sender<Chunk>
When closed chunks are mapped again, the map callback sends them here.
Note that we shouldn’t use SyncSender, since sending on it can block when the channel’s buffer is full, which means that in a single-threaded situation (Web!) we might deadlock.
receiver: Receiver<Chunk>
Free chunks are received here to be put on self.free_chunks.
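The note on SyncSender can be illustrated with the standard library's channels alone. The following is a minimal sketch, not the belt's actual code: `Chunk` is a hypothetical stand-in type, and `unbounded_roundtrip` and `bounded_channel_fills_up` are names invented for this example. It shows that an unbounded channel accepts any number of sends before anything is received, while a bounded one refuses (or, with a blocking send, would wait on) one send too many.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Hypothetical stand-in for the belt's chunk type.
#[derive(Debug, PartialEq)]
struct Chunk(u32);

/// Sends `n` chunks through an unbounded channel without receiving in between,
/// then drains them. `Sender::send` never blocks, so this is safe on one thread.
fn unbounded_roundtrip(n: u32) -> Vec<Chunk> {
    let (sender, receiver) = channel();
    for i in 0..n {
        sender.send(Chunk(i)).expect("receiver is still alive");
    }
    // `try_iter` yields what is currently queued and stops instead of blocking.
    receiver.try_iter().collect()
}

/// Returns true if a bounded channel of the given capacity refuses one send too many.
/// A blocking `SyncSender::send` would wait forever at that point on a single thread.
fn bounded_channel_fills_up(capacity: usize) -> bool {
    let (sender, _receiver) = sync_channel(capacity);
    for i in 0..capacity {
        sender.try_send(Chunk(i as u32)).expect("channel has room");
    }
    matches!(sender.try_send(Chunk(u32::MAX)), Err(TrySendError::Full(_)))
}
```

On the web there is no second thread that could drain the channel while the sender waits, which is why the non-blocking variant matters there.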
Implementations
impl CpuWriteGpuReadBelt
const MIN_OFFSET_ALIGNMENT: u64 = 16u64
All allocations of this allocator will be aligned to at least this size.
Requiring a minimum alignment means we need to pad less often. It can also make memcpy operations faster.
Needs to be greater than or equal to wgpu::MAP_ALIGNMENT, wgpu::COPY_BUFFER_ALIGNMENT, and the largest possible texel block footprint (since offsets for texture copies require this).
For alignment requirements in WebGPU in general, refer to the specification on alignment-class limitations.
Note that this does NOT mean that the CPU memory has any particular alignment. See this issue about lack of CPU memory alignment in wgpu/WebGPU.
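The alignment rule boils down to a small piece of arithmetic. The sketch below is illustrative only: `align_up` and `allocation_alignment` are hypothetical helper names, and the constant is copied from the value documented above rather than taken from the crate.

```rust
/// Mirrors the documented value of `CpuWriteGpuReadBelt::MIN_OFFSET_ALIGNMENT`.
const MIN_OFFSET_ALIGNMENT: u64 = 16;

/// Rounds `offset` up to the next multiple of `alignment`.
/// `alignment` must be a power of two.
fn align_up(offset: u64, alignment: u64) -> u64 {
    debug_assert!(alignment.is_power_of_two());
    (offset + alignment - 1) & !(alignment - 1)
}

/// Alignment used for a buffer of `T`: the type's own alignment,
/// but never less than the minimum offset alignment.
fn allocation_alignment<T>() -> u64 {
    (std::mem::align_of::<T>() as u64).max(MIN_OFFSET_ALIGNMENT)
}
```

For example, a write that would start at byte offset 17 is moved to offset 32, so at most 15 bytes of padding are inserted per allocation for types whose own alignment does not exceed 16.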
pub fn new(chunk_size: BufferSize) -> Self
Create a cpu-write & gpu-read staging belt.
The chunk_size is the unit of internal buffer allocation; writes will be sub-allocated within each chunk. Therefore, for optimal use of memory, the chunk size should be:
- larger than the largest single CpuWriteGpuReadBelt::allocate operation;
- 1-4 times less than the total amount of data uploaded per submission (per CpuWriteGpuReadBelt::before_queue_submit()); and
- bigger is better, within these bounds.
TODO(andreas): Adaptive chunk sizes
TODO(andreas): Shrinking after usage spikes?
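The sizing guidelines above can be turned into a quick sanity check. This is a sketch under assumptions: `chunks_needed` and `follows_guidelines` are hypothetical helpers, not part of the crate, and "1-4 times less than the total" is read here as "the total per submission is between one and four times the chunk size".

```rust
/// Number of chunks needed to hold `total_bytes`, ignoring alignment padding.
fn chunks_needed(total_bytes: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    (total_bytes + chunk_size - 1) / chunk_size
}

/// Checks a candidate chunk size against the first two guidelines.
fn follows_guidelines(
    chunk_size: u64,
    largest_allocation: u64,
    total_per_submission: u64,
) -> bool {
    // Guideline 1: every single allocation fits into one chunk.
    let large_enough = chunk_size > largest_allocation;
    // Guideline 2 (assumed reading): total upload is 1x to 4x the chunk size.
    let in_range =
        chunk_size <= total_per_submission && total_per_submission <= 4 * chunk_size;
    large_enough && in_range
}
```

With 10 MiB uploaded per submission and a largest single allocation of 1 MiB, a 4 MiB chunk size passes the check and results in three chunks per submission.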
pub fn allocate<T: Pod + Send + Sync>(
    &mut self,
    device: &Device,
    buffer_pool: &GpuBufferPool,
    num_elements: usize,
) -> Result<CpuWriteGpuReadBuffer<T>, CpuWriteGpuReadError>
Allocates a cpu writable buffer for num_elements instances of type T.
The buffer will be aligned to T’s alignment, but no less than Self::MIN_OFFSET_ALIGNMENT.
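How sub-allocation within chunks might work can be sketched without any GPU involvement. The model below is an assumption-laden illustration, not the crate's implementation: `Chunk`, `BeltModel`, and the first-fit search are hypothetical, and the policy of growing a new chunk to fit an oversized request is inferred from chunk_size being documented as a minimum size.

```rust
const MIN_OFFSET_ALIGNMENT: u64 = 16;

/// Hypothetical CPU-only model of a chunk: a size and a bump pointer.
struct Chunk {
    size: u64,
    unused_offset: u64,
}

/// Hypothetical model of the belt's bookkeeping (no GPU buffers involved).
struct BeltModel {
    chunk_size: u64,
    active_chunks: Vec<Chunk>,
}

impl BeltModel {
    /// Returns `(chunk index, byte offset)` for `num_elements` values of `T`.
    fn allocate<T>(&mut self, num_elements: usize) -> (usize, u64) {
        let alignment = (std::mem::align_of::<T>() as u64).max(MIN_OFFSET_ALIGNMENT);
        let size = (std::mem::size_of::<T>() * num_elements) as u64;

        // First fit: use the first active chunk with enough room after alignment.
        for (index, chunk) in self.active_chunks.iter_mut().enumerate() {
            let start = (chunk.unused_offset + alignment - 1) / alignment * alignment;
            if start + size <= chunk.size {
                chunk.unused_offset = start + size;
                return (index, start);
            }
        }

        // Nothing fits: open a new chunk, at least `chunk_size` bytes large.
        self.active_chunks.push(Chunk {
            size: self.chunk_size.max(size),
            unused_offset: size,
        });
        (self.active_chunks.len() - 1, 0)
    }
}
```

Because each returned range only refers to a chunk and an offset, several allocations can be filled independently of one another, which matches the documented independence of buffer lifetimes from the belt.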
pub fn before_queue_submit(&mut self)
Prepare currently mapped buffers for use in a submission.
This must be called before the command encoder(s) used in CpuWriteGpuReadBuffer copy operations are submitted.
At this point, all the partially used staging buffers are closed (cannot be used for further writes) until after CpuWriteGpuReadBelt::after_queue_submit is called and the GPU is done copying the data from them.
pub fn after_queue_submit(&mut self)
Recall all of the closed buffers so they can be reused.
This must only be called after the command encoder(s) used in CpuWriteGpuReadBuffer copy operations are submitted. Additional calls are harmless.
Not calling this as soon as possible may result in increased buffer memory usage.
fn receive_chunks(&mut self)
Move all chunks that the GPU is done with (and are now mapped again) from self.receiver to self.free_chunks.
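The chunk lifecycle described by before_queue_submit, after_queue_submit, and receive_chunks can be modeled on the CPU alone. This is a simplified sketch, not the real implementation: `Chunk` and `BeltModel` are hypothetical, no buffers are mapped or unmapped, and the asynchronous map callback is assumed to complete immediately.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a staging chunk; the real one owns a GPU buffer.
#[derive(Debug, PartialEq)]
struct Chunk(u32);

/// CPU-only model of the belt's chunk bookkeeping.
struct BeltModel {
    active_chunks: Vec<Chunk>,
    closed_chunks: Vec<Chunk>,
    free_chunks: Vec<Chunk>,
    sender: Sender<Chunk>,
    receiver: Receiver<Chunk>,
}

impl BeltModel {
    fn new(active_chunks: Vec<Chunk>) -> Self {
        let (sender, receiver) = channel();
        Self {
            active_chunks,
            closed_chunks: Vec::new(),
            free_chunks: Vec::new(),
            sender,
            receiver,
        }
    }

    /// Closes all active chunks; the real belt would also unmap their buffers.
    fn before_queue_submit(&mut self) {
        self.closed_chunks.append(&mut self.active_chunks);
    }

    /// Requests re-mapping of the closed chunks. In this model the map
    /// "callback" fires right away and sends each chunk into the channel.
    fn after_queue_submit(&mut self) {
        for chunk in self.closed_chunks.drain(..) {
            self.sender.send(chunk).expect("receiver lives in the same struct");
        }
    }

    /// Moves everything that arrived on the channel into the free list
    /// without ever blocking.
    fn receive_chunks(&mut self) {
        while let Ok(chunk) = self.receiver.try_recv() {
            self.free_chunks.push(chunk);
        }
    }
}
```

In this model a second call to `after_queue_submit` finds no closed chunks and does nothing, which mirrors the statement that additional calls are harmless.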
Trait Implementations
Auto Trait Implementations
impl Freeze for CpuWriteGpuReadBelt
impl !RefUnwindSafe for CpuWriteGpuReadBelt
impl Send for CpuWriteGpuReadBelt
impl !Sync for CpuWriteGpuReadBelt
impl Unpin for CpuWriteGpuReadBelt
impl !UnwindSafe for CpuWriteGpuReadBelt
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Downcast for T where
T: Any,
fn into_any(self: Box<T>) -> Box<dyn Any>
Convert Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>. Box<dyn Any> can then be further downcast into Box<ConcreteType> where ConcreteType implements Trait.
fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>
Convert Rc<Trait> (where Trait: Downcast) to Rc<Any>. Rc<Any> can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.
fn as_any(&self) -> &(dyn Any + 'static)
Convert &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.
fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)
Convert &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.