There are a few things to keep in mind to ensure smooth integration between MatrixLib and BLAS/LAPACK. The first is that MatrixLib expects to have full control over the arrays that contain a matrix's data; if user code makes unexpected changes to these buffers, errors can occur. MatrixLib provides access to the internal data array for a matrix, but the user is advised to avoid manipulating this array beyond what is strictly necessary: resizing, element access, and modification should all be done via the standard MatrixLib function calls.
The raw data array and the major stride can be retrieved via the ml_matrix::raw_data() and ml_matrix::data_stride() functions. They are provided solely for the purpose of allowing BLAS/LAPACK calls; again, the user is advised not to modify the raw data array directly.
Often BLAS and LAPACK calls require an input matrix and an output matrix. The most efficient way to handle this is to create the output matrix, at the appropriate size, before the function call; raw_data() and data_stride() can then be called on the new output matrix to obtain the values to pass into the BLAS or LAPACK function. Be sure, however, that the output matrix is constructed at the correct size: many BLAS and LAPACK functions infer the output size from the input sizes rather than taking explicit row/column counts for the output (other than the major stride). If the output data array passed in is smaller than the function requires, memory corruption will occur.
First, let's assume that we have two matrices, A and B. We'll assume for the sake of brevity that the sizes of the matrices match up such that A*B is a valid operation. We're interested in writing a function to compute A*B and return the result, C. We can use the following as the signature for our function:
ml_matrix matrix_multiply(const ml_matrix &A,const ml_matrix &B);
To do the calculation, we'll use the BLAS function cblas_dgemm. This function actually does more work than we need, computing C = x*A*B + y*C, but by passing x = 1 and y = 0 we get the plain product A*B. The function takes a large number of parameters:
| Parameter | Type | Description |
|---|---|---|
| 1 | (constant) | Storage order of the matrices. Use the ML_BLAS_STORAGE macro to pass the correct value. |
| 2-3 | (constant) | Whether the A and B matrices (respectively) should be transposed before multiplication. Pass CblasNoTrans to skip the transpose. |
| 4 | integer | The number of rows in the A matrix. |
| 5 | integer | The number of columns in the B matrix. |
| 6 | integer | The number of columns in the A matrix. |
| 7 | double | The scalar (x) to multiply A*B by during computation. |
| 8 | double * | The data array for the A matrix. |
| 9 | integer | The major stride for the A matrix. |
| 10 | double * | The data array for the B matrix. |
| 11 | integer | The major stride for the B matrix. |
| 12 | double | The scalar (y) to multiply C by during the computation. |
| 13 | double * | The data array for the C matrix (output). |
| 14 | integer | The major stride for the C matrix. |
Looking at the above list, then, we see that despite the large number of parameters the function call will actually be reasonably straightforward. First, we create an output matrix, C, of the appropriate size:
ml_matrix C(A.rows(),B.cols());

Now that we have that, we can call cblas_dgemm to do the work:

cblas_dgemm(ML_BLAS_STORAGE,CblasNoTrans,CblasNoTrans,
            A.rows(),B.cols(),A.cols(),
            1.0,A.raw_data(),A.data_stride(),
            B.raw_data(),B.data_stride(),
            0.0,C.raw_data(),C.data_stride());

Putting everything together, we can produce the whole function:
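A sketch of the assembled function, assuming the appropriate MatrixLib and CBLAS headers have been included (their exact names depend on your installation):

```cpp
ml_matrix matrix_multiply(const ml_matrix &A, const ml_matrix &B)
{
    // Size the output before the call: cblas_dgemm infers C's size
    // from A and B, and a too-small buffer means memory corruption.
    ml_matrix C(A.rows(), B.cols());
    cblas_dgemm(ML_BLAS_STORAGE, CblasNoTrans, CblasNoTrans,
                A.rows(), B.cols(), A.cols(),
                1.0, A.raw_data(), A.data_stride(),
                B.raw_data(), B.data_stride(),
                0.0, C.raw_data(), C.data_stride());
    return C;
}
```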