MatrixLib provides integration with BLAS and LAPACK in two ways: via built-in functions that use one or the other to do the actual calculations, and via functions that let a user extract the necessary data from MatrixLib to make arbitrary BLAS/LAPACK function calls in their own code. Making such calls is quite straightforward, and allows the user to take advantage of BLAS/LAPACK functionality not yet built into MatrixLib itself.

There are a few things that must be kept in mind to ensure smooth integration between MatrixLib and BLAS/LAPACK. The first is that MatrixLib expects to have control over the arrays that contain the data for a matrix, and if user code makes unexpected changes to these buffers, errors can occur. MatrixLib provides access to the internal data array for a matrix, but the user is advised to avoid manipulation of this array beyond what is strictly necessary: resizing, element access, and modification should all be done via the standard MatrixLib function calls.


BLAS and LAPACK functions require four pieces of information about a matrix in order to operate on it properly. The first two are simple: the number of rows and the number of columns in the matrix. This information is readily available to the user via the ml_matrix::rows() and ml_matrix::cols() function calls. The third and fourth pieces of information are details usually masked by the MatrixLib class: a pointer to the data array for the matrix, and the "major stride" of the matrix. MatrixLib internally stores matrices in memory that is sometimes slightly larger than strictly necessary, in order to optimize resizes and provide a workspace for certain operations. The major stride (the "leading dimension," in BLAS/LAPACK terminology) tells BLAS/LAPACK how large the matrix actually is in terms of the storage array, rather than the logical size presented to the user.

These two pieces of information can be retrieved via the ml_matrix::raw_data() and ml_matrix::data_stride() functions. They are provided entirely for the purpose of allowing BLAS/LAPACK calls; again, the user is advised not to modify the raw data array directly.

Often BLAS and LAPACK calls will require an input matrix and an output matrix. The most efficient way to handle this is to create the output matrix, of the appropriate size, before the function call; raw_data() and data_stride() can then be called on this new output matrix to acquire the necessary information for passing into the BLAS or LAPACK function call. When doing this, however, be sure that the output matrix is the correct size when it is constructed; many BLAS and LAPACK function calls infer the output matrix size from the input matrix sizes rather than taking explicit row/column counts for it (other than the major stride). If the output matrix data array passed in is smaller than what the function needs, memory corruption will occur.


One of the simplest examples of using BLAS to perform calculations comes straight from MatrixLib itself: the * operator for matrix multiplication. Implementing this as a user function via BLAS is quite simple.

First, let's assume that we have two matrices, A and B. We'll assume for the sake of brevity that the sizes of the matrices match up such that A*B is a valid operation. We're interested in writing a function to compute A*B and return the result, C. We can use the following as the signature for our function:

ml_matrix matrix_multiply(const ml_matrix &A,const ml_matrix &B);

To do the calculation, we'll use the BLAS function cblas_dgemm. This function actually does more work than we need, calculating C = x*A*B + y*C, but by passing in the appropriate scalar values for x and y (1 and 0, respectively) we can accomplish our goal. The function takes a large number of parameters:

cblas_dgemm Parameters

   1    (constant)  Storage order of the matrices. Use the ML_BLAS_STORAGE macro to pass the correct value.
   2-3  (constant)  Whether or not the A and B matrices (respectively) should be transposed before multiplication. Pass CblasNoTrans to avoid a transpose.
   4    integer     The number of rows in the A matrix.
   5    integer     The number of columns in the B matrix.
   6    integer     The number of columns in the A matrix.
   7    double      The scalar x, multiplied into A*B during the computation.
   8    double *    The data array for the A matrix.
   9    integer     The major stride for the A matrix.
  10    double *    The data array for the B matrix.
  11    integer     The major stride for the B matrix.
  12    double      The scalar y, multiplied into C during the computation.
  13    double *    The data array for the C matrix (output).
  14    integer     The major stride for the C matrix.

Looking at the above list, then, we see that despite the large number of parameters the function call will actually be reasonably straightforward. First, we create an output matrix, C, of the appropriate size:

ml_matrix C(A.rows(),B.cols());
Now that we have that, we can call cblas_dgemm to do the work.
cblas_dgemm(ML_BLAS_STORAGE, CblasNoTrans, CblasNoTrans,
            A.rows(), B.cols(), A.cols(),
            1.0, A.raw_data(), A.data_stride(),
            B.raw_data(), B.data_stride(),
            0.0, C.raw_data(), C.data_stride());
Note that the sixth parameter is A.cols() -- the shared inner dimension -- matching the parameter list above.
Putting everything together, we can produce the whole function:
ml_matrix matrix_multiply(const ml_matrix &A, const ml_matrix &B)
{
    ml_matrix C(A.rows(), B.cols());
    cblas_dgemm(ML_BLAS_STORAGE, CblasNoTrans, CblasNoTrans,
                A.rows(), B.cols(), A.cols(),
                1.0, A.raw_data(), A.data_stride(),
                B.raw_data(), B.data_stride(),
                0.0, C.raw_data(), C.data_stride());
    return C;
}
The above function does no error checking, of course, but when its inputs are valid it will correctly return the result A*B by way of a BLAS matrix multiplication.