[Author's Note: Look ma! Posted it on time…well, almost…it's 20 minutes to midnight.]
Today’s Friday Feature is a bit of an addendum to last week’s. I left off on how the engine has an interesting interop mechanism focused on working with unmanaged memory and generics (RawBuffer). That is a boon for the new effects framework and other graphics buffer resources. However, it also highlighted a bit of a design flaw with the engine’s venerable DataBuffer<T> class.
First, some history. The first engine version (based on the original Spark design) had a DataBuffer / DataBuffer<T> implementation to deal with a managed array as a stream of bytes. The typed data buffer class has relative and absolute get/set methods for operating on the generic data, while the parent abstract class has relative and absolute get/set methods for operating on the data as a bunch of bytes. The primary use case for the data buffer was meshes – a way to organize individual vertex streams (positions, tangents, normals, texture coordinates, etc.) in an easy-to-use, generic fashion. It provided a nice way of getting the bytes for every vertex to build interleaved buffers, but if needed, I could use the managed array directly.
Another plus of the data buffer design is the relative get/set, which makes building up triangle streams a lot easier during mesh generation. That last bit is the important piece to take from this. With a move to unmanaged buffers, we potentially wouldn’t have that managed array. So with all the interop goodness, I introduced some inconsistency into the overall engine design. Some parts of the API would be “half baked” in a way…we’d have these unmanaged data buffers, but then have cases where managed arrays are the input (textures…buffers…). So I spent some time re-designing aspects of the data buffer idea to make the engine API more consistent and the overall data buffer concept more pleasant to work with, while preserving the original intent (treating data as bytes, generically).
So, the high level changes are as follows:
- We needed different types of data buffers: one that uses a RawBuffer, and another that, like the old one, uses a managed array. The data buffer design is therefore now split into the IDataBuffer and IDataBuffer<T> interfaces. The standard implementations are DataBuffer<T> and DataBufferArray<T>.
- Every API that takes a T array now takes an IDataBuffer<T>.
- For increased usability, data buffers are now enumerable. The typed interface also has an indexer.
- There is a struct wrapper that implements IDataBuffer<int> to wrap short/int indices, for use with mesh generation as well as computing bounding volumes.
- All of these changes are to increase usability and performance (e.g. enumerable and unmanaged buffers), while at the same time decrease any unnecessary data replication. It’s all data in the end!
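To make the consistency goal concrete, here's a rough usage sketch. The buffer type names come from this post, but Vector3, the constructor shapes, and vertexBuffer.Set are assumptions for illustration only:

```csharp
// Hypothetical sketch – either implementation satisfies IDataBuffer<T>.
Vector3[] existing = LoadPositions(); // stand-in for data you already have

// Unmanaged-backed buffer (RawBuffer under the hood):
IDataBuffer<Vector3> unmanaged = new DataBuffer<Vector3>(existing.Length);

// Managed-array-backed buffer; binds the array directly, no copy:
IDataBuffer<Vector3> managed = new DataBufferArray<Vector3>(existing);

// Any engine API that used to take a T[] now accepts either one:
vertexBuffer.Set(managed);
vertexBuffer.Set(unmanaged);
```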
The interfaces are quite large, so I won’t post them in their entirety, but the general idea of functionality is like so:
- Metadata – current position, length, size (in bytes), element type, element size (in bytes), and whether there is a next element. Clearly the idea is to work with typed elements, but this interface exists so we can work with the data without generics (e.g. when generating an interleaved vertex buffer for meshes, we know each attribute’s size and how to deal with it).
- Relative GetBytes(..) returns an array of bytes or puts them in an existing byte array. This starts at the current position and advances it past the elements read.
- Relative SetBytes(..) does more or less the same thing in reverse, writing from a byte array. If you notice, these relative methods are geared toward operating on whole elements rather than parts of them (due to the nature of the current position pointer).
- Absolute GetBytes(..) has the same functionality as the relative one, but you supply a starting element index (not a byte index).
- Absolute SetBytes(..) is again the same thing, just putting byte data into the buffer. Like the relative methods, these absolute methods work with whole element data. Of course, a subset of an element’s data can be retrieved/set relative to some index, but it cannot be offset from that index (as in a vertex buffer, for example…there is no notion of a “vertex stride”).
- Implements IDisposable, IEnumerable, and ISavable
- Also has options for direct raw access (pointer to unmanaged memory) – IF the buffer supports it. In the case of managed arrays, obviously the buffer does not support this.
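Pulling those bullets together, the byte-level surface looks roughly like this. To be clear, this is a sketch assembled from the description above, not the engine's verbatim declaration, and ISavable is stubbed just so the sketch stands alone:

```csharp
using System;
using System.Collections;

public interface ISavable { /* engine serialization contract – stub for the sketch */ }

// Rough sketch of IDataBuffer assembled from the bullets above.
public interface IDataBuffer : IDisposable, IEnumerable, ISavable
{
    // Metadata
    int Position { get; set; }
    int Length { get; }
    int SizeInBytes { get; }
    Type ElementType { get; }
    int ElementSizeInBytes { get; }
    bool HasNext { get; }

    // Relative byte access – starts at Position, advances by whole elements
    byte[] GetBytes(int elementCount);
    void SetBytes(byte[] data);

    // Absolute byte access – the index is an element index, not a byte offset
    byte[] GetBytes(int elementIndex, int elementCount);
    void SetBytes(int elementIndex, byte[] data);

    // Optional raw access (pointer to unmanaged memory), only if the
    // backing store supports it – a managed array does not.
    bool CanUseRawAccess { get; }
    IntPtr GetRawPointer();
}
```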
- Has a get/set indexer.
- Relative Get(..) methods for getting a single generic element at the current position, with the position pointer incremented appropriately. And a relative Set(..) doing the opposite.
- Relative GetRange(…), which operates on a generic T array – either returning a new one or filling an existing one. Same idea: start at the current position and advance by the number of elements read. And a relative SetRange(…) doing the opposite.
- Absolute Set(…) methods for setting a single generic element at any index (basically method-based indexer). And also absolute Get(…), doing the opposite.
- Absolute SetRange(…) methods which operates on a generic T array, by now you should know the drill! And also absolute GetRange(…), doing the opposite.
- Implements IEnumerable<T>. The old implementation didn’t have this; it was mostly an oversight.
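And the typed surface, again as a sketch from the bullets. Member shapes are my best guess rather than the engine's exact signatures, and the base interface is stubbed so the sketch stands alone:

```csharp
using System;
using System.Collections.Generic;

public interface IDataBuffer : IDisposable { /* stub – see the non-generic sketch */ }

// Rough sketch of IDataBuffer<T> assembled from the bullets above.
public interface IDataBuffer<T> : IDataBuffer, IEnumerable<T> where T : struct
{
    // Get/set indexer (absolute)
    T this[int index] { get; set; }

    // Relative single-element access – operates at Position, then advances it
    T Get();
    void Set(T value);

    // Relative range access – reads or writes a T[] starting at Position
    T[] GetRange(int count);
    void GetRange(T[] store, int count);
    void SetRange(T[] values);

    // Absolute single-element access (a method-based indexer)
    T Get(int index);
    void Set(int index, T value);

    // Absolute range access
    T[] GetRange(int index, int count);
    void SetRange(int index, T[] values);
}
```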
So that’s what the API looks like in a nutshell. There are three primary implementations:
DataBuffer<T> is the old data buffer in name only, as it now works with a RawBuffer as its backing store. You may recall from the last post that a RawBuffer does minimal error checking; those checks are done at the data buffer level.
DataBufferArray<T> is the old data buffer in regards to functionality, albeit with a twist. This data buffer uses a managed T array as the backing store and, just like the original implementation, exposes that data as a property. The twist is that if the DataBufferArray<T> is created with an input T array, the data is not copied. Instead, the array is directly “bound” to the data buffer. You can also “bind” different T arrays to the data buffer.
So in some cases this data buffer acts as a bridge between existing data stored in a managed array and any Tesla API method that takes in a data buffer. You can create a pool of these objects and keep re-using them, rather than copying data unnecessarily or creating new garbage.
Originally this class was called “DataBufferView<T>” because it acted as a “view” on the data. It has since changed to Array, since you can also create the data buffer without passing a T array in. You can also serialize this object via the ISavable serialization process, so it’s a bit more than just a “view”, since it actually stores stuff.
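The no-copy binding can be sketched like this. Bind returning the old array matches the reply in the comments below; everything else, the helper names especially, is illustrative:

```csharp
// Hypothetical usage sketch – DataBufferArray<T> and Bind are from the post;
// Vector3, GetPositions, GetNormals, and ProcessBuffer are stand-ins.
Vector3[] positions = GetPositions();
Vector3[] normals = GetNormals();

var db = new DataBufferArray<Vector3>(positions); // binds the array, no copy
ProcessBuffer(db);                                // any API taking IDataBuffer<Vector3>

Vector3[] oldStore = db.Bind(normals);            // rebind; returns the old backing store
ProcessBuffer(db);

// Pooling idea: keep a few DataBufferArray<T> instances around and rebind
// them as needed, instead of copying data or allocating new garbage.
```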
Now this one, IndexData, is a very special case. It’s a struct and it does not implement ISavable; it’s meant to be used entirely as a wrapper. It doesn’t actually store anything. Instead, it holds onto either an IDataBuffer<int> or an IDataBuffer<short> and forwards method calls appropriately. This was always a sticking point with the old mesh class in Tesla – dealing with either 32-bit or 16-bit indices (Tesla supports both).
IndexData implements the data buffer interface for ints, of course, but the idea again is to not force the developer to copy data or resort to some hack to get short data to work with a method call that only takes an int array, or vice versa. You’ll see its usage mostly with the ComputeFromIndexedPoints method in the BoundingVolume class. I want to support either index format – it makes sense to – but it doesn’t make sense to do more work than you need to!
The struct also implements several implicit operators to automatically wrap up int/short data buffers (either the unmanaged or managed array versions). So its usage is fairly automatic and transparent.
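The forwarding idea is simple enough to sketch in a few lines. This is a minimal illustration, not the engine's actual IndexData (which implements the full IDataBuffer<int> interface); the buffer types referenced are the engine types described above:

```csharp
// Minimal sketch of the IndexData idea: a struct that fronts either a
// short-index or int-index buffer and presents everything as ints.
public struct IndexData
{
    private readonly IDataBuffer<int> _intBuffer;     // exactly one of these
    private readonly IDataBuffer<short> _shortBuffer; // two will be non-null

    public IndexData(IDataBuffer<int> buffer)   { _intBuffer = buffer; _shortBuffer = null; }
    public IndexData(IDataBuffer<short> buffer) { _shortBuffer = buffer; _intBuffer = null; }

    public int Length
    {
        get { return (_intBuffer != null) ? _intBuffer.Length : _shortBuffer.Length; }
    }

    // The indexer forwards to whichever buffer is bound, widening shorts to ints.
    public int this[int index]
    {
        get { return (_intBuffer != null) ? _intBuffer[index] : _shortBuffer[index]; }
        set
        {
            if (_intBuffer != null) _intBuffer[index] = value;
            else _shortBuffer[index] = (short)value;
        }
    }

    // Implicit operators make wrapping transparent at call sites. C# forbids
    // user-defined conversions from interface types, so the operators take the
    // concrete buffer classes (the unmanaged and managed-array versions).
    public static implicit operator IndexData(DataBuffer<int> b)        { return new IndexData((IDataBuffer<int>)b); }
    public static implicit operator IndexData(DataBuffer<short> b)      { return new IndexData((IDataBuffer<short>)b); }
    public static implicit operator IndexData(DataBufferArray<int> b)   { return new IndexData((IDataBuffer<int>)b); }
    public static implicit operator IndexData(DataBufferArray<short> b) { return new IndexData((IDataBuffer<short>)b); }
}
```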
And that’s about it, actually. The data buffer interfaces/implementations serve as the backbone for all data management in the engine, so it’s a pretty important chunk of functionality that got some re-design time. It’s even more important since every API in the engine now takes the interface too. To re-iterate, that means everything…textures, vertex buffers, etc. So a vertex buffer no longer has a Set<T>(T[]); that method call is now Set<T>(IDataBuffer<T>).
As we’ve seen, this is no problem if you store your data inside a data buffer. If you don’t, there are mechanisms to easily and effortlessly (without impacting performance) make your data work with the engine objects. But of course, data buffers have other niceties, as I mentioned, and their usage is highly encouraged, since you get more functionality and options than working with just a managed T array.
Abe Hamade says
Interesting read, thanks for the info! A couple of questions, for clarification mostly: the DataBuffer implementation has an unmanaged memory backing store, and DataBufferArray a managed memory backing store? And whereas the managed memory implementation supports direct binding to the original data, the unmanaged version does not, due to its originating source being managed memory?
Yes, the standard DataBuffer implementation is unmanaged because it’s backed by a RawBuffer. The memory is initialized via AllocHGlobal. Aside from the interop stuff, I’ve come more in contact with the Large Object Heap (LOH) in the last few years, and it made sense to minimize garbage that gets allocated to that heap, since fragmentation can become an issue. That’s not so much of an issue with moving to .NET 4.5, though, due to enhancements to the GC.
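For reference, the allocation mentioned here is the standard Marshal pattern. A simplified sketch of a RawBuffer-style backing store (the class name and shape are mine, not the engine's; safety checks omitted):

```csharp
using System;
using System.Runtime.InteropServices;

// Simplified sketch of an unmanaged backing store in the RawBuffer style.
public sealed class RawMemory : IDisposable
{
    public IntPtr Pointer { get; private set; }
    public int SizeInBytes { get; private set; }

    public RawMemory(int sizeInBytes)
    {
        // Allocates from the native heap – invisible to the GC and the LOH.
        Pointer = Marshal.AllocHGlobal(sizeInBytes);
        SizeInBytes = sizeInBytes;
    }

    public void Dispose()
    {
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero;
        }
    }
}
```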
You can create a DataBufferArray as such:
DataBufferArray<Vector3> db = new DataBufferArray<Vector3>(myPositions);
//Do work with positions
db.Bind(myNormals);
//Do work with normals
The constructor in this case does not copy the data from that Vector3 array; rather, it uses it as the backing store. The Bind method does the same (and returns the old backing store). I assume that in 99% of cases you’re going to be using the standard implementation with the unmanaged backing store, but there are always cases where that may not be appropriate (the best one that comes to mind: say you’re using AssimpNet or another model importer, and you already have arrays of data). Not a problem at compile time, but maybe it’s coming from some source at run time. It’s just another (potentially) useful way of working with your data without having to jump through any hoops with Tesla due to the method call changes, since in effect this data buffer stuff has essentially “replaced” the engine’s usage of managed arrays.
Maybe it’s not as big of a deal as I’m making it out to be (actually, it’s not), but it’s a useful area that got attention, and I find it rather neat how it came out.