Intel® Math Kernel Library

Getting Started Tutorial: Using the Intel® Math Kernel Library for Matrix
Multiplication

Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number/

Intel, Intel Atom, and Intel Core are trademarks of Intel Corporation in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

Copyright © 2012, Intel Corporation. All rights reserved.
Portions Copyright © 2001, Hewlett-Packard Development Company, L.P.


Overview

Discover how to incorporate core math functions from the Intel® Math Kernel Library (Intel® MKL) to improve the performance of your application.

About This Tutorial
This tutorial demonstrates how to use Intel MKL in your applications:
• Multiplying matrices using Intel MKL routines
• Measuring performance of matrix multiplication
• Controlling threading

Estimated Duration
10-20 minutes.

Learning Objectives
After you complete this tutorial, you should be able to:
• Use Intel MKL routines for linear algebra
• Compile and link your code
• Measure performance using support functions
• Understand the impact of threading on Intel MKL performance
• Control threading for Intel MKL functions

More Resources
This tutorial uses the Fortran language, but the concepts and procedures in this tutorial apply regardless of programming language. A similar tutorial using a sample application in another programming language may be available at http://software.intel.com/en-us/articles/intel-software-product-tutorials/. This site also offers a printable version (PDF) of tutorials.
In addition, you can find more resources at http://software.intel.com/en-us/articles/intel-mkl/.


Introduction to the Intel® Math Kernel Library

Use the Intel Math Kernel Library (Intel MKL) when you need to perform computations with high performance. Intel MKL offers highly-optimized and extensively threaded routines which implement many types of operations.

Linear Algebra
• BLAS
• LAPACK/ScaLAPACK
• PARDISO*
• Iterative sparse solvers

Fast Fourier Transforms
• Multidimensional (up to 7D) FFTs
• FFTW interfaces
• Cluster FFT

Summary Statistics
• Kurtosis
• Variation coefficient
• Quantiles, order statistics
• Min/max
• Variance/covariance
• ...

Data Fitting
• Splines
• Interpolation
• Cell search

Other Components
• Vector Math
    • Trigonometric
    • Hyperbolic
    • Exponential, Logarithmic
    • Power/Root
    • Rounding
• Vector Random Number Generators
    • Congruential
    • Recursive
    • Wichmann-Hill
    • Mersenne Twister
    • Sobol
    • Niederreiter
    • RDRAND-based
• Poisson Solvers
• Optimization Solvers

Exploring Basic Linear Algebra Subprograms (BLAS)
One key area is the Basic Linear Algebra Subprograms (BLAS), which perform a variety of vector and matrix operations. This tutorial uses the dgemm routine to demonstrate how to perform matrix multiplication as efficiently as possible.

Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804


Multiplying Matrices Using dgemm

Intel MKL provides several routines for multiplying matrices. The most widely used is the dgemm routine, which calculates the product of double precision matrices:

    C = alpha*A*B + beta*C

The dgemm routine can perform several calculations. For example, you can perform this operation with the transpose or conjugate transpose of A and B. The complete details of capabilities of the dgemm routine and all of its arguments can be found in the ?gemm topic in the Intel Math Kernel Library Reference Manual.

Use dgemm to Multiply Matrices
This exercise demonstrates declaring variables, storing matrix values in the arrays, and calling dgemm to compute the product of the matrices. The arrays are used to store these matrices: A (M rows by P columns), B (P rows by N columns), and C (M rows by N columns).

The one-dimensional arrays in the exercises store the matrices by placing the elements of each column in successive cells of the arrays.

NOTE The Fortran source code for the exercises in this tutorial is found in mkl_install_dir\doc\samples\mkl_mmx_f (Windows* OS), or mkl_install_dir/doc/samples/mkl_mmx_f (Linux* OS/OS X*).
Although Intel MKL supports Fortran 90 and later, the exercises in this tutorial use FORTRAN 77 for compatibility with as many versions of Fortran as possible.

* Fortran source code is found in dgemm_example.f

      PROGRAM MAIN

      IMPLICIT NONE

      DOUBLE PRECISION ALPHA, BETA
      INTEGER          M, P, N, I, J
      PARAMETER        (M=2000, P=200, N=1000)
      DOUBLE PRECISION A(M,P), B(P,N), C(M,N)

      PRINT *, "This example computes real matrix C=alpha*A*B+beta*C"
      PRINT *, "using Intel(R) MKL function dgemm, where A, B, and C"
      PRINT *, "are matrices and alpha and beta are double precision "
      PRINT *, "scalars"
      PRINT *, ""

      PRINT *, "Initializing data for matrix multiplication C=A*B for "
      PRINT 10, " matrix A(",M," x",P, ") and matrix B(", P," x", N, ")"
 10   FORMAT(a,I5,a,I5,a,I5,a,I5,a)
      PRINT *, ""
      ALPHA = 1.0
      BETA = 0.0

      PRINT *, "Initializing matrix data"
      PRINT *, ""
      DO I = 1, M
        DO J = 1, P
          A(I,J) = (I-1) * P + J
        END DO
      END DO

      DO I = 1, P
        DO J = 1, N
          B(I,J) = -((I-1) * N + J)
        END DO
      END DO

      DO I = 1, M
        DO J = 1, N
          C(I,J) = 0.0
        END DO
      END DO

      PRINT *, "Computing matrix product using Intel(R) MKL DGEMM "
      PRINT *, "subroutine"
      CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)
      PRINT *, "Computations completed."
      PRINT *, ""

      PRINT *, "Top left corner of matrix A:"
      PRINT 20, ((A(I,J), J = 1,MIN(P,6)), I = 1,MIN(M,6))
      PRINT *, ""

      PRINT *, "Top left corner of matrix B:"
      PRINT 20, ((B(I,J),J = 1,MIN(N,6)), I = 1,MIN(P,6))
      PRINT *, ""

 20   FORMAT(6(F12.0,1x))

      PRINT *, "Top left corner of matrix C:"
      PRINT 30, ((C(I,J), J = 1,MIN(N,6)), I = 1,MIN(M,6))
      PRINT *, ""

 30   FORMAT(6(ES12.4,1x))

      PRINT *, "Example completed."
      STOP

      END

NOTE This exercise illustrates how to call the dgemm routine. An actual application would make use of the result of the matrix multiplication.

This call to the dgemm routine multiplies the matrices:

      CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)

The arguments provide options for how Intel MKL performs the operation. In this case:

'N'       Character indicating that the matrices A and B should not be transposed or conjugate transposed before multiplication.
M, N, P   Integers indicating the size of the matrices:
          • A: M rows by P columns
          • B: P rows by N columns
          • C: M rows by N columns

ALPHA     Real value used to scale the product of matrices A and B.
A         Array used to store matrix A.
M         Leading dimension of array A, or the number of elements between successive columns (for column major storage) in memory. In the case of this exercise the leading dimension is the same as the number of rows.
B         Array used to store matrix B.
P         Leading dimension of array B, or the number of elements between successive columns (for column major storage) in memory. In the case of this exercise the leading dimension is the same as the number of rows.
BETA      Real value used to scale matrix C.
C         Array used to store matrix C.
M         Leading dimension of array C, or the number of elements between successive columns (for column major storage) in memory. In the case of this exercise the leading dimension is the same as the number of rows.

Compile and Link Your Code
Intel MKL provides many options for creating code for multiple processors and operating systems, compatible with different compilers and third-party libraries, and with different interfaces. To compile and link the exercises in this tutorial with Intel® Composer XE, type
• Windows* OS: ifort /Qmkl dgemm_example.f
• Linux* OS, OS X*: ifort -mkl dgemm_example.f

For other compilers, use the Intel MKL Link Line Advisor to generate a command line to compile and link the exercises in this tutorial: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/.

After compiling and linking, execute the resulting executable file, named dgemm_example.exe on Windows* OS or a.out on Linux* OS and OS X*.

NOTE This assumes that you have installed Intel MKL and set environment variables as described in http://software.intel.com/en-us/articles/intel-mkl-110-getting-started/.

See Also
Intel MKL Documentation
Intel Math Kernel Library Knowledge Base
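The leading-dimension arguments in the dgemm argument list rely on the column-major storage described earlier in this section. The following sketch is not part of the tutorial's sample code; it only illustrates how element (I,J) of a matrix maps to a cell of a one-dimensional array. The program name COLMAJ, the 3 x 2 size, and the FLAT and LDA names are hypothetical choices made for illustration, and the program calls no Intel MKL routines.

```fortran
      PROGRAM COLMAJ
      IMPLICIT NONE
C     Hypothetical small sizes chosen only for illustration.
      INTEGER          M, P, I, J, LDA
      PARAMETER        (M=3, P=2)
C     As in the exercise, the leading dimension equals the row count.
      PARAMETER        (LDA=M)
      DOUBLE PRECISION A(M,P), FLAT(M*P)
C     FLAT shares storage with A, exposing the one-dimensional layout.
      EQUIVALENCE      (A, FLAT)

C     Fill A the same way the tutorial's exercise does.
      DO I = 1, M
        DO J = 1, P
          A(I,J) = (I-1) * P + J
        END DO
      END DO

C     Element (I,J) sits at position I + (J-1)*LDA of the flat array,
C     so consecutive columns start LDA elements apart.
      DO J = 1, P
        DO I = 1, M
          PRINT 10, I, J, I + (J-1)*LDA, FLAT(I + (J-1)*LDA)
        END DO
      END DO
 10   FORMAT('A(',I1,',',I1,') -> cell ',I2,' value ',F4.0)
      END
```

Walking the cells in order visits the values 1, 3, 5, 2, 4, 6: the first column of A is stored before the second. If A were a sub-block of a larger array, LDA would be the row count of the enclosing array rather than M.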

Measuring Performance with Intel® MKL Support Functions

Intel MKL provides functions to measure performance. This provides a way of quantifying the performance improvement resulting from using Intel MKL routines in this tutorial.

Measure Performance of dgemm
Use the dsecnd routine to return the elapsed CPU time in seconds.

NOTE The quick execution of the dgemm routine makes it difficult to measure its speed, even for an operation on a large matrix. For this reason, the exercises perform the multiplication multiple times. You should set the value of the LOOP_COUNT constant so that the total execution time is about one second.

* Fortran source code is found in dgemm_with_timing.f

      PRINT *, "Making the first run of matrix product using "
      PRINT *, "Intel(R) MKL DGEMM subroutine to get stable "
      PRINT *, "run time measurements"
      PRINT *, ""
      CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)

      PRINT *, "Measuring performance of matrix product using "
      PRINT *, "Intel(R) MKL DGEMM subroutine"
      PRINT *, ""
      S_INITIAL = DSECND()
      DO R = 1, LOOP_COUNT
        CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)
      END DO
      S_ELAPSED = (DSECND() - S_INITIAL) / LOOP_COUNT

      PRINT *, "== Matrix multiplication using Intel(R) MKL DGEMM =="
      PRINT 50, " == completed at ",S_ELAPSED*1000," milliseconds =="
 50   FORMAT(A,F12.5,A)
      PRINT *, ""

Measure Performance Without Using dgemm
In order to show the improvement resulting from using dgemm, perform the same measurement, but use a triply-nested loop to multiply the matrices.

* Fortran source code is found in matrix_multiplication.f

      PRINT *, "Making the first run of matrix product using "
      PRINT *, "triple nested loop to get stable run time"
      PRINT *, "measurements"
      PRINT *, ""
      DO I = 1, M
        DO J = 1, N
          TEMP = 0.0
          DO K = 1, P
            TEMP = TEMP + A(I,K) * B(K,J)
          END DO
          C(I,J) = TEMP
        END DO
      END DO

      PRINT *, "Measuring performance of matrix product using "
      PRINT *, "triple nested loop"
      PRINT *, ""
      S_INITIAL = DSECND()
      DO R = 1, LOOP_COUNT
        DO I = 1, M
          DO J = 1, N
            TEMP = 0.0
            DO K = 1, P
              TEMP = TEMP + A(I,K) * B(K,J)
            END DO
            C(I,J) = TEMP
          END DO
        END DO
      END DO
      S_ELAPSED = (DSECND() - S_INITIAL) / LOOP_COUNT

      PRINT *, "== Matrix multiplication using triple nested loop =="
      PRINT 50, " == completed at ",S_ELAPSED*1000," milliseconds =="
 50   FORMAT(A,F12.5,A)
      PRINT *, ""

Compare the results in the first exercise using dgemm to the results of the second exercise without using dgemm.

You can find more information about measuring Intel MKL performance from the article "A simple example to measure the performance of an Intel MKL function" in the Intel Math Kernel Library Knowledge Base.

See Also
Intel MKL Documentation
Intel Math Kernel Library Knowledge Base

Measuring Effect of Threading on dgemm

By default, Intel MKL uses n threads, where n is the number of physical cores on the system. By restricting the number of threads and measuring the change in performance of dgemm, this exercise shows how threading impacts performance.

Limit the Number of Cores Used for dgemm
This exercise uses the mkl_set_num_threads routine to override the default number of threads, and mkl_get_max_threads to determine the maximum number of threads.

* Fortran source code is found in dgemm_threading_effect_example.f

      PRINT *, "Finding max number of threads Intel(R) MKL can use for"
      PRINT *, "parallel runs"
      PRINT *, ""
      MAX_THREADS = MKL_GET_MAX_THREADS()

      PRINT 20," Running Intel(R) MKL from 1 to ",MAX_THREADS," threads"
 20   FORMAT(A,I2,A)
      PRINT *, ""

      DO K = 1, MAX_THREADS
        DO I = 1, M
          DO J = 1, N
            C(I,J) = 0.0
          ENDDO
        ENDDO

        PRINT 30, " Requesting Intel(R) MKL to use ",K," thread(s)"
 30     FORMAT(A,I2,A)
        CALL MKL_SET_NUM_THREADS(K)

        PRINT *, "Making the first run of matrix product using "
        PRINT *, "Intel(R) MKL DGEMM subroutine to get stable "
        PRINT *, "run time measurements"
        PRINT *, ""
        CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)

        PRINT *, "Measuring performance of matrix product using "
        PRINT 40, " Intel(R) MKL DGEMM subroutine on ",K," thread(s)"
 40     FORMAT(A,I2,A)
        PRINT *, ""
        S_INITIAL = DSECND()
        DO R = 1, LOOP_COUNT
          CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)
        END DO
        S_ELAPSED = (DSECND() - S_INITIAL) / LOOP_COUNT

        PRINT *, "== Matrix multiplication using Intel(R) MKL DGEMM =="
        PRINT 50, " == completed at ",S_ELAPSED*1000," milliseconds =="
 50     FORMAT(A,F12.5,A)
        PRINT 60, " == using ",K," thread(s) =="
 60     FORMAT(A,I2,A)
        PRINT *, ""
      END DO

Examine the results shown and notice that time to multiply the matrices decreases as the number of threads increases. If you try to run this exercise with more than the number of threads returned by mkl_get_max_threads, you might see performance degrade when you use more threads than physical cores.

NOTE You can see specific performance results for dgemm at the Details tab at http://software.intel.com/en-us/articles/intel-mkl.

See Also
Intel MKL Documentation
Intel Math Kernel Library Knowledge Base
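The timing and threading excerpts in this tutorial leave out their declarations, which a program compiled with IMPLICIT NONE needs. A minimal self-contained sketch of the same measurement might look like the following. It is not the tutorial's dgemm_threading_effect_example.f: the program name THRSPD, the matrix sizes, LOOP_COUNT=10, and the S_ONE and GFLOPS variables are hypothetical. The rate uses the conventional count of 2*M*N*P floating-point operations for a matrix product, which this tutorial does not state.

```fortran
      PROGRAM THRSPD
      IMPLICIT NONE
C     Hypothetical sizes and LOOP_COUNT; adjust them for your system.
      INTEGER          M, P, N, I, J, K, R, LOOP_COUNT, MAX_THREADS
      PARAMETER        (M=1000, P=200, N=1000, LOOP_COUNT=10)
      DOUBLE PRECISION A(M,P), B(P,N), C(M,N)
      DOUBLE PRECISION ALPHA, BETA, S_INITIAL, S_ELAPSED
      DOUBLE PRECISION S_ONE, GFLOPS
C     Function return types that the excerpts leave undeclared.
      DOUBLE PRECISION DSECND
      INTEGER          MKL_GET_MAX_THREADS
      EXTERNAL         DSECND, MKL_GET_MAX_THREADS
      EXTERNAL         DGEMM, MKL_SET_NUM_THREADS

      ALPHA = 1.0D0
      BETA  = 0.0D0
      DO J = 1, P
        DO I = 1, M
          A(I,J) = 1.0D0
        END DO
      END DO
      DO J = 1, N
        DO I = 1, P
          B(I,J) = 1.0D0
        END DO
      END DO

      MAX_THREADS = MKL_GET_MAX_THREADS()
      S_ONE = 0.0D0
      DO K = 1, MAX_THREADS
        CALL MKL_SET_NUM_THREADS(K)
C       Warm-up call, then the timed loop, as in the exercises.
        CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)
        S_INITIAL = DSECND()
        DO R = 1, LOOP_COUNT
          CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)
        END DO
        S_ELAPSED = (DSECND() - S_INITIAL) / LOOP_COUNT
C       Keep the one-thread time as the baseline for speedup.
        IF (K .EQ. 1) S_ONE = S_ELAPSED
        IF (S_ELAPSED .GT. 0.0D0) THEN
C         Each of the M*N results takes P multiplies and P adds.
          GFLOPS = 2.0D0 * M * N * P / S_ELAPSED / 1.0D9
          PRINT 10, K, S_ELAPSED*1000, GFLOPS, S_ONE/S_ELAPSED
        END IF
      END DO
 10   FORMAT(I3,' thread(s):',F10.3,' ms,',F8.2,' GFLOP/s,',
     &       ' speedup',F6.2)
      END
```

It can be built the same way as the exercises, for example with ifort -mkl on Linux* OS. Reporting a rate and a speedup next to the raw time makes runs with different thread counts or matrix sizes easier to compare.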

Other Areas to Explore

The exercises so far have given the basic ideas needed to get started with Intel MKL, but there are plenty of other areas to explore. The following are some controls, interfaces, and topics which you might find worth investigating further.

Support functions
The second exercise shows how to use the timing functions and the third exercise shows the use of threading control functions. Acquaint yourself with other support functions by referring to the "Support functions" chapter of the Intel MKL Reference Manual:
• Support functions for Conditional Numerical Reproducibility (CNR)
  These functions provide the means to balance reproducibility with performance in certain conditions.
• Memory functions
  These functions provide support for allocating and freeing memory. The allocation functions allow proper alignment of memory to ensure reproducibility when used together with CBWR functions.
• Error handling functions
  The xerbla function is used by BLAS, LAPACK, VML, and VSL to report errors.

Linking and interfaces
• The ILP64 interface
  Most users call the interface of Intel MKL that takes 32-bit integers for size parameters, but larger memory sizes and some legacy code require 64-bit integers. Read more about the ILP64 interface and the libraries and functions supporting it in the Intel MKL User's Guide.
• Single Dynamic Library (SDL) linking model
  Intel MKL has two ways to link to dynamic libraries. The newest of these models is the best option for those calling Intel MKL from managed runtime libraries and is easy to link, but requires some function calls to use nondefault interfaces (for example, ILP64). See the Intel MKL User's Guide for more information on Intel MKL linking models.

Miscellaneous
• Environment variables
  Many controls in Intel MKL have both environment variables and functional versions. In all cases the function overrides the behavior of the environment variable. If you do not want the behavior to change based on an environment variable in a particular case, use the function call to ensure the desired setting. See the Intel MKL User's Guide for descriptions of the environment variables used by Intel MKL.

See Also
Intel MKL Documentation
Intel Math Kernel Library Knowledge Base
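The precedence of a function call over an environment variable can be observed with the threading routines used in the third exercise. The sketch below is an illustration only. It assumes the MKL_NUM_THREADS environment variable, which is described in the Intel MKL User's Guide but not named in this tutorial, and the program name THRCTL and the NT_BEFORE and NT_AFTER variables are hypothetical.

```fortran
      PROGRAM THRCTL
      IMPLICIT NONE
      INTEGER          NT_BEFORE, NT_AFTER
C     The function's return type must be declared under IMPLICIT NONE.
      INTEGER          MKL_GET_MAX_THREADS
      EXTERNAL         MKL_GET_MAX_THREADS, MKL_SET_NUM_THREADS

C     Reflects the environment (for example MKL_NUM_THREADS) if set.
      NT_BEFORE = MKL_GET_MAX_THREADS()
C     The function call takes precedence from this point on.
      CALL MKL_SET_NUM_THREADS(1)
      NT_AFTER = MKL_GET_MAX_THREADS()

      PRINT 10, ' Threads before the call: ', NT_BEFORE
      PRINT 10, ' Threads after the call:  ', NT_AFTER
 10   FORMAT(A,I3)
      END
```

Running it with different values of the environment variable should change only the first number. With a sequential (non-threaded) Intel MKL build, both numbers are expected to be 1.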


