- This simulates a systolic array matrix multiplier, as might be found in a Tensor Processing Unit.
- This code multiplies two input matrices (mk times kn) and adds the product to a third input matrix (mn).
- Row-major: mk has m rows and k columns, kn has k rows and n columns, mn has m rows and n columns.
- The systolic array is the same shape as matrix mn. Each element of mn has a data processing unit (dpu).
- All k elements of column j of matrix kn flow south (unmodified) along column j of the systolic array.
- All k elements of row i of matrix mk flow east (unmodified) along row i of the systolic array.
- The dpu at row i, column j computes the dot product of row i of matrix mk and column j of matrix kn, and adds it to element (i, j) of matrix mn.
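
The data flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's actual code. The function name `systolic_matmul_add`, the per-dpu registers `a_reg` and `b_reg`, and the skewed input timing (row i enters i cycles late, column j enters j cycles late, so matching operands meet at the same dpu) are all assumptions introduced here. A sequential sweep stands in for hardware that would update every dpu in parallel.

```python
def systolic_matmul_add(mk, kn, mn):
    """Return mk @ kn + mn by stepping an m-by-n grid of dpus cycle by cycle."""
    m, k = len(mk), len(mk[0])
    n = len(kn[0])
    assert len(kn) == k and len(mn) == m and all(len(row) == n for row in mn)

    acc = [row[:] for row in mn]            # each dpu's accumulator starts at mn[i][j]
    a_reg = [[0] * n for _ in range(m)]     # mk value each dpu forwards east
    b_reg = [[0] * n for _ in range(m)]     # kn value each dpu forwards south

    # The last operand pair reaches the south-east dpu at cycle m + n + k - 3.
    for t in range(m + n + k - 2):
        # Sweep from south-east to north-west so every dpu reads the value its
        # west/north neighbour held on the previous cycle, not the current one.
        for i in reversed(range(m)):
            for j in reversed(range(n)):
                if j == 0:
                    s = t - i               # assumed skew: row i is delayed i cycles
                    a = mk[i][s] if 0 <= s < k else 0
                else:
                    a = a_reg[i][j - 1]
                if i == 0:
                    s = t - j               # assumed skew: column j is delayed j cycles
                    b = kn[s][j] if 0 <= s < k else 0
                else:
                    b = b_reg[i - 1][j]
                acc[i][j] += a * b          # one multiply-accumulate per dpu per cycle
                a_reg[i][j] = a             # pass the mk element east, unmodified
                b_reg[i][j] = b             # pass the kn element south, unmodified
    return acc


mk = [[1, 2, 3], [4, 5, 6]]                 # m = 2, k = 3
kn = [[7, 8], [9, 10], [11, 12]]            # k = 3, n = 2
mn = [[1, 1], [1, 1]]                       # m = 2, n = 2
result = systolic_matmul_add(mk, kn, mn)
print(result)
```

In this sketch the accumulators stay in place while the operands move, which matches the description of mk flowing east and kn flowing south. Padding the edges with zeros outside the valid time window keeps the early and late cycles from changing any accumulator.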

