Basic structure of a tensor

`ne` contains the number of elements in each dimension. ggml uses row-major order, e.g. for a 2x4 matrix A (2 rows, 4 columns) each row has 4 elements, thus `ne[0] = 4` and `ne[1] = 2` (the number of elements in a column). This means that the fastest-moving dimension is the "first" one, which is the opposite of the PyTorch convention, where the last dimension is the fastest moving.

`nb` contains the strides: the number of bytes between consecutive elements in each dimension. Assuming float32, `nb[0] = sizeof(float) = 4`. Then `nb[1] = ne[0]*nb[0]` (the length of the first dimension in bytes). We can then define the strides bottom-up: `nb[i] = ne[i-1]*nb[i-1]`.
(Figure: strides of a (4,3,2) tensor, assuming float32.)
Tensor operations and views

`op` may be any supported operation between tensors. Setting it to `GGML_OP_NONE` marks that the tensor holds data; other values mark an operation. This is because `ggml_tensor`s are part of a computation graph.
`src` is an array of pointers to the tensors between which the operation is to be taken. For example, if `op == GGML_OP_MUL_MAT`, then `src` will contain pointers to the two tensors to be multiplied. If `op == GGML_OP_NONE`, then `src` will be empty.
`data` points to the actual tensor's data, or `NULL` if this tensor is an operation. It may also point to another tensor's data, in which case it's known as a view, e.g. the result of `ggml_transpose()`.
Matmul example (`result` does not contain any data; it just represents the theoretical result of multiplying `a` and `b`)
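In code, the example might look roughly like the following sketch (assuming ggml's C API; `ctx` is a previously initialized `ggml_context`, and the shapes follow the 2x4 example above):

```c
// a: 2 rows x 4 columns -> ne = {4, 2}
struct ggml_tensor *a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
// b: 3 rows x 4 columns -> ne = {4, 3}; must share ne[0] with a
struct ggml_tensor *b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);

// result->op == GGML_OP_MUL_MAT and result->src points at {a, b},
// but result->data has not been computed yet.
struct ggml_tensor *result = ggml_mul_mat(ctx, a, b);
```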
Computing tensors
The `ggml_mul_mat()` function above, or any other tensor operation, does not calculate anything; it just prepares the tensors for the operation. A different way to look at it is that it builds up a computation graph in which each tensor operation is a node, and the operation's sources are the node's children. In the matrix multiplication scenario, the graph has a parent node with operation `GGML_OP_MUL_MAT`, along with two children.
In order to actually compute the result tensor, the following steps are taken:

1. Data is loaded into each leaf tensor's `data` pointer. In the example, the leaf tensors are `A` and `B`.
2. The output tensor (`AB`) is converted to a computation graph using `ggml_build_forward()`. This function is relatively straightforward and orders the nodes in a depth-first order.¹
3. The computation graph is run using `ggml_graph_compute()`, which runs `ggml_compute_forward()` on each node in a depth-first order. `ggml_compute_forward()` does the heavy lifting of calculations: it performs the mathematical operation and fills the tensor's `data` pointer with the result.

At the end of this process, the output tensor's `data` pointer points to the final result.
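The whole flow can be sketched end to end. This assumes the legacy ggml graph API referenced in this post (`ggml_build_forward()` returning a `ggml_cgraph` by value, and a context-based `ggml_graph_compute()`); newer ggml versions have replaced these with `ggml_build_forward_expand()` and a `ggml_cplan`-based compute call:

```c
#include "ggml.h"

int main(void) {
    // Scratch memory for tensor metadata + data (size chosen arbitrarily).
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context *ctx = ggml_init(params);

    // Leaves: both inputs must share ne[0] (the row length).
    struct ggml_tensor *A  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    struct ggml_tensor *B  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);
    struct ggml_tensor *AB = ggml_mul_mat(ctx, A, B); // no data yet

    // 1. load data into the leaves
    //    ... fill A->data and B->data with floats ...

    // 2. build the computation graph from the output tensor
    struct ggml_cgraph gf = ggml_build_forward(AB);

    // 3. run the graph; afterwards AB->data holds the numeric result
    ggml_graph_compute(ctx, &gf);

    ggml_free(ctx);
    return 0;
}
```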