- This simulates a systolic array matrix multiplier, as might be found in a Tensor Processing Unit.
- This code multiplies two input matrices (mk times kn) and adds the result to a third input matrix (mn).
- Row-major: mk has m rows and k columns, kn has k rows and n columns, mn has m rows and n columns.
- The systolic array is the same shape as matrix kn (k rows, n columns). Each element of kn is held by its own data processing unit (dpu).
- All m elements of column i of matrix mk flow east (unmodified) along row i of the systolic array.
- Each dpu receives cumulative sums from the north and elements of matrix mk from the west.
- Each dpu multiplies its element of kn by the element of mk it receives, adds the product to the received sum, and sends the new sum south.
- The m sums to be added to column i of mn flow south from the dpu in column i of the last row of the array.
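
The data flow described above can be sketched as a cycle-by-cycle simulation. This is a minimal illustration and not the code under repair. The function name `systolic_matmul_add`, the register arrays, and the input skew (row i of the array starts receiving its mk elements i cycles late so that partial sums and operands meet in the right dpu) are assumptions made for the sketch.

```python
def systolic_matmul_add(mk, kn, mn):
    """Return mn + mk @ kn, computed by simulating a k-by-n systolic array.

    Hypothetical sketch: names and the skewed input schedule are assumptions.
    """
    m, k, n = len(mk), len(kn), len(kn[0])
    # a_reg[r][c]: mk element latched by dpu (r, c) on the previous cycle.
    # s_reg[r][c]: cumulative sum emitted by dpu (r, c) on the previous cycle.
    a_reg = [[0] * n for _ in range(k)]
    s_reg = [[0] * n for _ in range(k)]
    out = [row[:] for row in mn]

    # The last sum leaves the south-east corner on cycle m + k + n - 3.
    for t in range(m + k + n - 2):
        new_a = [[0] * n for _ in range(k)]
        new_s = [[0] * n for _ in range(k)]
        for r in range(k):
            for c in range(n):
                if c == 0:
                    # West edge: column r of mk enters row r, delayed by r cycles.
                    i = t - r
                    a_in = mk[i][r] if 0 <= i < m else 0
                else:
                    # Interior: the mk element flows east unmodified.
                    a_in = a_reg[r][c - 1]
                # Cumulative sum arrives from the north (zero on the top edge).
                s_in = s_reg[r - 1][c] if r > 0 else 0
                new_a[r][c] = a_in
                new_s[r][c] = s_in + a_in * kn[r][c]
        a_reg, s_reg = new_a, new_s

        # Sums leaving the last row belong to row i of mn, column c.
        for c in range(n):
            i = t - (k - 1) - c
            if 0 <= i < m:
                out[i][c] += s_reg[k - 1][c]
    return out


result = systolic_matmul_add([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]])
print(result)  # [[20, 23], [44, 51]]
```

With this schedule the dpu at row r, column c works on row t - r - c of mk during cycle t, so a full pass takes m + k + n - 2 cycles.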

To receive a hint, submit unfixed code.