Basic structure of a tensor
- `ne` contains the number of elements in each dimension. ggml uses row-major order, e.g. for a 2x4 matrix A (2 rows, 4 columns), each row has 4 elements, thus `ne[0] = 4` and `ne[1] = 2` (the number of elements in a column).
- This means that the fastest-moving dimension is the "first" one, which is the opposite of the PyTorch notation, where the last dimension is the fastest moving.
- `nb` contains the strides: the number of bytes between consecutive elements in each dimension.
  - Assuming float32, `nb[0] = sizeof(float) = 4`.
  - Then, `nb[1] = ne[0]*nb[0]` (the length of the first dimension in bytes).
  - In general, we can define the strides bottom-up: `nb[i] = ne[i-1]*nb[i-1]`.
- Example: a (4,3,2) tensor, assuming float32.
Tensor operations and views
- `op` may be any supported operation between tensors. Setting it to `GGML_OP_NONE` marks that the tensor holds data; any other value marks an operation. This is because `ggml_tensor`s are part of a computation graph.
- `src` is an array of pointers to the tensors between which the operation is to be taken. For example, if `op == GGML_OP_MUL_MAT`, then `src` will contain pointers to the two tensors to be multiplied. If `op == GGML_OP_NONE`, then `src` will be empty.
- `data` points to the actual tensor's data, or `NULL` if this tensor is an operation. It may also point to another tensor's data, in which case the tensor is known as a view, e.g. the result of `ggml_transpose()`.
Matmul example (`result` does not contain any data; it just represents the theoretical result of multiplying `a` and `b`).
Computing tensors
- The `ggml_mul_mat()` function above, like any other tensor operation, does not calculate anything; it just prepares the tensors for the operation.
- A different way to look at it is that it builds up a computation graph, where each tensor operation is a node and the operation's sources are the node's children.
- In the matrix multiplication scenario, the graph has a parent node with operation `GGML_OP_MUL_MAT`, along with two children.
In order to actually compute the result tensor, the following steps are taken:

- Data is loaded into each leaf tensor's `data` pointer. In the example, the leaf tensors are `A` and `B`.
- The output tensor (`AB`) is converted to a computation graph using `ggml_build_forward()`. This function is relatively straightforward and orders the nodes in a depth-first order.
- The computation graph is run using `ggml_graph_compute()`, which runs `ggml_compute_forward()` on each node in a depth-first order. `ggml_compute_forward()` does the heavy lifting of the calculations: it performs the mathematical operation and fills the tensor's `data` pointer with the result.
- At the end of this process, the output tensor's `data` pointer points to the final result.