Scientific Libraries Reference Manual, Volume 2
Version 3.3, July 1999
004–2081–002
"! # $&%' ( ) # *+), $#-./! !102 3)4*50)67*867 9267:1-;1) *)<=( $>?(!?(4*@ )6AB67C<=( /$)DFE6
B6G?BD: >?#67H
: $I($=C4D <J>$)! 6A*K*?67"<="4+6A:LEDM# $4(#DNEDPOQ"446G$R?67 <M *8*K"$SDC "! # $M%5( #*4?, $#-
.Z>?4D4(* q $)32?eD ?? (K?( R./:( )( 2 DC4 ?( wv2g]U p )? ( Dg8 02, $)C]"</)02,]ix7yjz{B|}` 92( _r8~2 T Em# U pp .Z?B6G$4 #6mmX2
a/p W102 Ta 8;W10j a Y[,]kZ8~2g4U p W.Z?( $): a Y[,]?klmi< q (B6rC467:167 (! ! MB6AD *8467B67:H4 (:6G<M( q *'( $):H167#( >?*86$lO[D q *K4(4 $I *5( $
*8! ( $:?, ??1UQ;D?eDo ??e ;D2e ;1f2e ; $)1>?G6G$)Us(A $)46G$($#6;D! *42?kl8( w.Z$ <=(4 $h;1)6A(A46A )? ( w. pp )( H?o
( HAoX)? ( MQ</? !" $)RK2*846G</)? (2X5D#]?? (HW T )( R8o ( Po*K62? ( T $ q ( PYQZ82( 2i02W1W T ! E("( $F
( H8g4U p )( MmmX)g4;o ( M8tr 2( P;o ? (w;1uX))( P;1uW ( D;m>?4D 2( R~2g4U p ?? (w~)UV8?? ( Dg4fm, UQ1t;D
XZ67! 926A $/ )6?OQ67`-F-j- ?X5%5( >?*8*4)X5D#]92 6O@)W UVXZ8?%5 j(02 $)3)_rW ~2./0j),]kZ8FYQXm67 67*Y[6A O[D q XZ * q ./ (3
Y[6A O[D q >?6G>? $PW $F9) B$<=6G$ Y[67 O[D q >?6G>? $)b;1D!"*+?k T Y[W ;D02l8)mW1% T XZ0j8U ./0m;W a/p W0 T , YQ}1
8*846G<UV( $46G$)( $#6l( $):H026<=D+6?;67*84 $PW $F92"B$F<M6G$) F; >?*8467: a Y[,]kZ8($: a YQ,4kl5UQ.Z~(AG6L4 (:6<=( q *@DC( M026A*K6A(AG#]F
T - T - - 2('Or)D! ! M Or$67:[*4>E*8 :1 ( =DC)m !" # $M%5( )"#*4?, $)# -
m%5,N *`(4(:16G<=( q D Cm ! "# $M%5( ) # *4), $#-,]02, ~\( $):Qm ! # $M%5 ( #*@(AG6dB67j *8467B67:[4 (:6<=( q *@( $):Q )6d "! # $M%5( #*5! DDw *`(
4 (:6G<M( q DCm ! "# $I5
% ( #*4), $)# -
X5I"*'(4(:16G<=( q C2? $)4BD!XZ(4(b8*846G<M*+?, $)# -)XZW1 a/T ;02, ~2Ft.Z~2?( $):LtUs(B6r4(:16G<=( q *@DCX5 j 4(!W1 >? <M6G$)
"? (A4"$-2W102oL *5(d4(:16G<M(A q DCW UQ.P 8, $)# -)W ;D."*'(4(:16G<=( q C2W;D.K2*846G<=*4?, $#-,]U"*'(4(:16G<=( q C2, $467 $(4 $(!
>?*8 $)6A*K*NUs(#]) $)67*'"? (A4"$-UV, p *5(d4(:16G<M(A q DCDUs, p ? </>?467@K2*846G<=*8- a Y[, ~\ *5(LB67j *8467B67:H4(:16G<M(A q $I )6 a $) 46A:
m4(467*@($:[D )67`# >$)4"67*4N! "#6G$*K6A: 67 #! >*K 9267! B>?3[~5i1k)?6G$S? </?( $F T <="467:1-~5ik?6G$M"*`(dB67D"*K46AB67:H4(:16G<=( q Cj~5ik?6G$
<P?($ T 4:1-F~&V $):1OK2*846G<( $:H )6?~\:6G9) #6b(B6d4 (:6<=( q *5DCD;16dk?6G$S%5B>?-
;1)6 a Y[,]?klw?67 (4 $)P*4*846G< *5:67 9267: C4B < a YQ, ~?h82*K46<t?-7;1)6 a Y[,]?klr ?67(4 $)h*4*846G< *5(! *8lE(*867:Q $h?(A 2 $S )6
ej>? =167 q 67! 6GSmC] O[(B6dXZ *84 ED>?4 $V41mX5F>$:167?! # 6G$)*86rC]B <;1)6b0267D6$)4*5C2 )6 a $) 9267*8 HDC?(! "C4D $)"(-
New Features
No user interface changes were made for this release.
Record of Revision
Version     Description

5.0         March 1989
            Documentation supporting the UNICOS 5.0 release running on Cray Research computer systems.

6.0         January 1991
            Reprint with revision supporting the UNICOS 6.0 release running on Cray Research computer systems.

7.0         August 1992
            Reprint with revision supporting the UNICOS 7.0 release running on Cray Research computer systems.

8.0         August 1993
            Reprint with revision supporting the CrayLibs 1.0 release (asynchronous) that runs on Cray Research systems. In this revision of the documentation, the math library is no longer documented in the same manual as the scientific library. Instead, it is documented in the Math Library Reference Manual, publication SR–2138. The sort and search routines, which were in the UNICOS 7.0 version of the scientific library, were moved to the UNICOS library.

8.1         June 1994
            Rewritten to support the CrayLibs 1.1 release (asynchronous) that runs on Cray Research systems. This revision incorporates support for the CRAY MPP hardware platform.

2.0         December 1995
            Rewritten to support the CrayLibs 2.0 release that runs on Cray Research systems. This revision incorporates support for Scalable LAPACK (ScaLAPACK) and documented support for 32-bit FFT routines. Additional routines were added to FFT and BLAS.

3.0         June 1997
            Rewritten to support the CrayLibs 3.0 release that runs on Cray Research systems. This revision removed support for the Basic Linear Algebra Subprograms …

3.1         August 1998
            Updated to reflect changes in the Programming Environment 3.1 release. The printed text of this manual was made available in PostScript (.ps) format only for this release.

3.3         July 1999
            Updated to reflect changes in the Programming Environment 3.3 release. The printed text of this manual was made available in PostScript (.ps) format only for this release.
ii 004–2081–002
About This Guide
Documentation Organization
ç5¢`&¼'B` ¦ËÆmB)`H)r ¢`SÁ¡jB`KBÓ'¡MÍ5BÎAAÂ¥\^¼'£Q¼'¼'1[BÇÊ&Æ?)§B5¥\
`¦3I£3'¼@¦¡D¡D)3¦5B`£Ç \ ¼'B¡D¨5ÁS ¢` ÏKÌÁÐd¥\1˼'£SK)
¦5KB§BHÎ'lK¢`&¡D` ` d¡D¢ÇÆ?§B'¥\?¨ INTRO_LIBSCI
Each topic section listed here has an introductory man page, which explains the contents of the section and provides other information about the usage of those routines. The following introductory man pages are available:
INTRO_BLACS(3S)
INTRO_BLAS1(3S)
INTRO_BLAS2(3S)
INTRO_BLAS3(3S)
INTRO_CORE(3S)
INTRO_FFT(3S)
INTRO_LAPACK(3S)
INTRO_MACH(3S)
INTRO_SCALAPACK(3S)
INTRO_SPARSE(3S)
INTRO_SPEC(3S)
INTRO_SUPERSEDED(3S)
Related Publications
ç5¢`&8)§B§B)¤VB`£¥\`'1§B ¦5)¡j'¥\`d ¢`V¿/1Â?Í5BÎ[¼'3)¦5'¡DK¨ZÉH§B§b¥\^¼'£QB
¢`mI¥\`'§B ¡DÇ1§BÎ&Æ?B¤V¦)`§BB`IÎÂ'B`£ ¢` ¡D)¥\¥\`¦w¨
man
ùûú â?×KÝ7ÚÜâ?ü7ÚÜã[ý5Ý þDã àAÿäÝ àAüRßrà áDà7Ý3àAâã àbÔÖÕDâ?äÕDå
å]ÚÜãÕj×KÚÜþDâý5Ý þ`ÝAÕ HàAÝ ühÙ?Ú]ÛFÝAÕDÝAÞVßrà áDà7Ý3àAâã àbÔÖÕDâ?äÕDå
ù
ù ã ÚÜà7â?×KÚ ãQÙÚÜÛFÝAÕDÝAÚ]àAüPßLàAÕDÿÞIßLà áDàAÝ àAâã à
ù å]ÚÜãÕj×KÚÜþDâý5Ý þ`ÝAÕ HàAÝ ühÙ?Ú]ÛFÝAÕDÝAÞVßràAÕDÿÞsßrà áDàAÝ àAâã à
The following manuals describe the products in the Programming Environment. These publications describe the operating systems, input/output (I/O), and other related topics.

• Segment Loader (SEGLDR) and ld Reference Manual

• UNICOS User Commands Reference Manual

• UNICOS User Commands Ready Reference

• Guide to Parallel Vector Applications

• Application Programmer's I/O Guide
In addition to these documents, several documents are available that describe the compiler systems available on UNICOS and UNICOS/mk systems. Some of these manuals are:

• CF90 Ready Reference

• CF90 Commands and Directives Reference Manual

• Fortran Language Reference Manual, Volume 1

• Fortran Language Reference Manual, Volume 2

• Fortran Language Reference Manual, Volume 3

• Cray C/C++ Reference Manual
Obtaining Publications

Silicon Graphics maintains publications information at the following URL:

http://techpubs.sgi.com/library

This Web site contains information that allows you to browse documents online, order documents, and send feedback to SGI. You can also order a printed SGI document by calling 1 800 627 9307.
The User Publications Catalog describes the availability and content of all Cray hardware and software documents that are available to customers. Customers who subscribe to the Cray Inform (CRInform) program can access this information on the CRInform system.
Silicon Graphics maintains information on publicly available Cray documents at the following URL:

http://www.cray.com/swpubs/

This Web site contains information that allows you to browse documents online and send feedback to SGI. To order a printed Cray document, either call 1 651 683 5907 or send a facsimile of your request to fax number 1 651 683 3840. SGI employees may also order printed Cray documents by sending their orders via electronic mail to orderdsk.
Customers outside of the United States and Canada should contact their local service organization for ordering and documentation information.
Conventions
ç5¢`&8)§B§B)¤VB`£¡D)`Æ`KB)`H13&'m¦¸ ¢`3'£¢`' ¢`BQ¦5)¡j'¥\` ø
Convention      Meaning

command         This fixed-space font denotes literal items such as commands, files, routines, path names, signals, messages, and programming language structures.

variable        Italic typeface denotes variable entries and words or concepts being defined.

user input      This bold, fixed-space font denotes literal items that the user enters in interactive sessions. Output is shown in nonbold, fixed-space font.
In addition to these formatting conventions, several naming conventions are used throughout the documentation. "Cray PVP systems" denotes all configurations of Cray parallel vector processing (PVP) systems that run the UNICOS operating system. "Cray MPP systems" denotes all configurations of the Cray T3E series that run the UNICOS/mk operating system. "IRIX systems" denotes SGI platforms that run the IRIX operating system.

The default shell in the UNICOS and UNICOS/mk operating systems, referred to as the standard shell, is a version of the Korn shell that conforms to the following standards:
• Institute of Electrical and Electronics Engineers (IEEE) Portable Operating System Interface (POSIX) Standard 1003.2–1992

• X/Open Portability Guide, Issue 4 (XPG4)
ç5¢`S½=&¾8¿PÀ=Ás`¦Ç½=V¾K¿PÀ=Á5ö¥\õ\)¼'A B`£\mÂ?m ¥\[1§Bm'¼'¼')lK¢`S)¼' B`§''m
)K¢`S¿ m¢`§B§3¨
Section         Description

NAME            Specifies the name of the entry and briefly states its function.

SYNOPSIS        Presents the syntax of the entry.

IMPLEMENTATION  Identifies the systems to which the entry applies.
Reader Comments
If you have comments about the technical accuracy, content, or organization of this document, please tell us. Be sure to include the title and part number of the document with your comments.
You can contact us in any of the following ways:

• Send e-mail to the following address:

techpubs@sgi.com

• Send a fax to the attention of "Technical Publications" at fax number +1 650 932 0801.
½=mMK¢`&è¦5ΡDõ\)¼'KB)Ǹ ¢`SçN¡j¢``B¡D§Zê''Î?§BB¡D B)`=Í'BÎ?AAÂT
C )§B¦UC B¦5
ù
Cμ@£?ø
ù
http://techpubs.sgi.com
• Call the Technical Publications Group, through the Technical Assistance Center, using one of the following numbers:

For SGI IRIX based operating systems: 1 800 800 4SGI

For UNICOS or UNICOS/mk based operating systems or Cray Origin 2000 systems: 1 800 950 2729 (toll free from the United States and Canada) or +1 651 683 5600
• Send mail to the following address:

Technical Publications
SGI
1600 Amphitheatre Pkwy.
Mountain View, California 94043–1351
CÇ&Æ?§B'IÂ)'h¡D¥\¥\`K[`¦¤VB§B§l3¼')`¦KK¢`¥ó¼@3¥\¼' §BÂ2¨
viii 004–2081–002
CONTENTS
intro_lapack, INTRO_LAPACK .......................... Introduction to LAPACK solvers for dense linear systems ...................... 333
eispack, EISPACK .................................................. Introduction to Eigensystem computation for dense linear systems ......... 349
linpack, LINPACK .................................................. Single-precision real and complex LINPACK routines ............................ 355
Scalable LAPACK
intro_scalapack, INTRO_SCALAPACK ............ Introduction to the ScaLAPACK routines for distributed matrix
computations .............................................................................................. 359
descinit, DESCINIT ............................................. Initializes a descriptor vector of a distributed two-dimensional array ...... 362
indxg2p, INDXG2P .................................................. Computes the coordinate of the processing element (PE) that
possesses the entry of a distributed matrix ............................................... 364
numroc, NUMROC ...................................................... Computes the number of rows or columns of a distributed matrix
owned locally ............................................................................................ 365
pcheevx, PCHEEVX .................................................. Computes selected eigenvalues and eigenvectors of a Hermitian-
definite eigenproblem ................................................................................ 366
pchegvx, PCHEGVX .................................................. Computes selected eigenvalues and eigenvectors of a Hermitian-
definite generalized eigenproblem ............................................................. 374
psgebrd, PSGEBRD, PCGEBRD ............................... Reduces a real or complex distributed matrix to bidiagonal form ........... 382
psgelqf, PSGELQF, PCGELQF ............................... Computes an LQ factorization of a real or complex distributed matrix ... 387
psgeqlf, PSGEQLF, PCGEQLF ............................... Computes a QL factorization of a real or complex distributed matrix ..... 390
psgeqpf, PSGEQPF, PCGEQPF ............................... Computes a QR factorization with column pivoting of a real or
complex distributed matrix ........................................................................ 393
psgeqrf, PSGEQRF, PCGEQRF ............................... Computes a QR factorization of a real or complex distributed matrix ..... 396
psgerqf, PSGERQF, PCGERQF ............................... Computes an RQ factorization of a real or complex distributed matrix ..... 399
psgesv, PSGESV, PCGESV ...................................... Computes the solution to a real or complex system of linear
equations .................................................................................................... 402
psgetrf, PSGETRF, PCGETRF ............................... Computes an LU factorization of a real or complex distributed matrix ... 405
psgetri, PSGETRI, PCGETRI ............................... Computes the inverse of a real or complex distributed matrix ................. 408
psgetrs, PSGETRS, PCGETRS ............................... Solves a real or complex distributed system of linear equations .............. 411
psposv, PSPOSV, PCPOSV ...................................... Solves a real symmetric or complex Hermitian system of linear
equations .................................................................................................... 414
pspotrf, PSPOTRF, PCPOTRF ............................... Computes the Cholesky factorization of a real symmetric or complex
Hermitian positive definite distributed matrix ........................................... 418
pspotri, PSPOTRI, PCPOTRI ............................... Computes the inverse of a real symmetric or complex Hermitian
positive definite distributed matrix ............................................................ 421
pspotrs, PSPOTRS, PCPOTRS ............................... Solves a real symmetric positive definite or complex Hermitian
positive definite system of linear equations .............................................. 424
pssyevx, PSSYEVX .................................................. Computes selected eigenvalues and eigenvectors of a real symmetric
matrix ......................................................................................................... 427
pssygvx, PSSYGVX .................................................. Computes selected eigenvalues and eigenvectors of a real
symmetric-definite generalized eigenproblem ........................................... 434
pssytrd, PSSYTRD, PCHETRD ............................... Reduces a real symmetric or complex Hermitian distributed matrix to
tridiagonal form ......................................................................................... 442
pstrtri, PSTRTRI, PCTRTRI ............................... Computes the inverse of a real or complex upper or lower triangular
distributed matrix ....................................................................................... 446
pstrtrs, PSTRTRS, PCTRTRS ............................... Solves a real or complex distributed triangular system ............................ 449
BLACS routines
intro_blacs, INTRO_BLACS ............................... Introduction to Basic Linear Algebra Communication Subprograms ...... 535
blacs_barrier, BLACS_BARRIER ..................... Stops execution until all specified processes have called a routine ........... 539
blacs_exit, BLACS_EXIT ................................... Frees all existing grids .............................................................................. 540
blacs_gridexit, BLACS_GRIDEXIT ................. Frees a grid ................................................................................................ 541
blacs_gridinfo, BLACS_GRIDINFO ................. Returns information about the two-dimensional processor grid ............... 542
blacs_gridinit, BLACS_GRIDINIT ................. Initializes counters, variables, and so on, for the BLACS routines .......... 543
blacs_gridmap, BLACS_GRIDMAP ..................... Maps processes to a grid of processors ................................................... 544
blacs_pcoord, BLACS_PCOORD .......................... Computes coordinates in two-dimensional grids ....................................... 545
blacs_pnum, BLACS_PNUM ................................... Returns the processor element number for specified coordinates in
two-dimensional grids ............................................................................... 546
gridinfo3d, GRIDINFO3D ................................... Returns information about the three-dimensional processor grid ............. 547
gridinit3d, GRIDINIT3D ................................... Initializes variables for a three-dimensional (3D) grid partition of
processor set .............................................................................................. 548
igamn2d, IGAMN2D, SGAMN2D, CGAMN2D ............ Determines minimum absolute values of rectangular matrices ................. 550
Out-of-core routines
intro_core, INTRO_CORE ................................... Introduction to the Cray Research Scientific Library out-of-core
routines for linear algebra ......................................................................... 575
scopy2rv, SCOPY2RV, CCOPY2RV ........................ Copies a submatrix of a real or complex matrix in memory into a
virtual matrix ............................................................................................. 590
scopy2vr, SCOPY2VR, CCOPY2VR ........................ Copies a submatrix of a virtual matrix to a real or complex (in
memory) matrix ......................................................................................... 593
vbegin, VBEGIN ...................................................... Initializes the out-of-core routine data structures ...................................... 595
vend, VEND ................................................................ Handles terminal processing for the out-of-core routines ......................... 598
vsgemm, VSGEMM, VCGEMM ...................................... Multiplies a virtual real or complex general matrix by a virtual real
or complex general matrix ........................................................................ 600
vsgetrf, VSGETRF, VCGETRF ............................... Computes an LU factorization of a virtual general matrix with real or
complex elements, using partial pivoting with row interchanges ............. 604
vsgetrs, VSGETRS, VCGETRS ............................... Solves a virtual system of linear equations, using the LU factorization
computed by VSGETRF(3S) or VCGETRF(3S) ......................................... 608
vspotrf, VSPOTRF .................................................. Computes the Cholesky factorization of a real symmetric positive
definite virtual matrix ................................................................................ 610
vspotrs, VSPOTRS .................................................. Solves a virtual system of linear equations with a symmetric positive
definite matrix whose Cholesky factorization has been computed by
VSPOTRF(3S) ............................................................................................ 612
vssyrk, VSSYRK ...................................................... Performs symmetric rank k update of a real or complex symmetric
virtual matrix ............................................................................................. 614
vstorage, VSTORAGE ............................................. Declares packed storage mode for a triangular, symmetric, or
Hermitian (complex only) virtual matrix .................................................. 616
vstrsm, VSTRSM, VCTRSM ...................................... Solves a virtual real or virtual complex triangular system of equations
with multiple right-hand sides ................................................................... 619
Superseded routines
intro_superseded, INTRO_SUPERSEDED ....... Introduction to superseded Scientific Library routines ............................. 631
gather, GATHER ...................................................... Gathers a vector from a source vector ...................................................... 633
minv, MINV ................................................................ Solves systems of linear equations by inverting a square matrix ............. 634
mxm, MXM .................................................... Computes matrix-times-matrix product (unit increments) ........................ 637
mxma, MXMA ................................................................ Computes matrix-times-matrix product (arbitrary increments) ................. 639
mxv, MXV .................................................... Computes matrix-times-vector product (unit increments) ......................... 642
mxva, MXVA ................................................................ Computes matrix-times-vector product (arbitrary increments) ................. 644
scatter, SCATTER .................................................. Scatters a vector into another vector ......................................................... 646
smxpy, SMXPY ........................................................... Multiplies a column vector by a matrix and adds the result to another
column vector ............................................................................................ 647
sxmpy, SXMPY ........................................................... Multiplies a row vector by a matrix and adds the result to another
row vector .................................................................................................. 649
trid, TRID ................................................................ Solves a tridiagonal system ....................................................................... 651
NAME
INTRO_LAPACK – Introduction to LAPACK solvers for dense linear systems
IMPLEMENTATION
See individual man pages for implementation details
DESCRIPTION
The preferred solvers for dense linear systems are those parts of the LAPACK package included in the
current version of the Scientific Library. The LAPACK routines in the Scientific Library supersede the older
LINPACK routines (see LINPACK(3S) for more information).
LAPACK Routines
LAPACK is a public domain library of subroutines for solving dense linear algebra problems, including the
following:
• Systems of linear equations
• Linear least squares problems
• Eigenvalue problems
• Singular value decomposition (SVD) problems
For details about which routines are supported, see LAPACK Routines Contained in the Scientific Library,
which follows.
The LAPACK package is designed to be the successor to the older LINPACK and EISPACK packages. It
uses today’s high-performance computers more efficiently than the older packages. It also extends the
functionality of these packages by including equilibration, iterative refinement, error bounds, and driver
routines for linear systems, routines for computing and reordering the Schur factorization, and condition
estimation routines for eigenvalue problems.
Performance issues are addressed by implementing the most computationally intensive algorithms by using
the Level 2 and 3 Basic Linear Algebra Subprograms (BLAS). Because most of the BLAS were optimized
in single- and multiple-processor environments for UNICOS and UNICOS/mk systems, these algorithms give
near optimal performance.
The original Fortran programs are described in the LAPACK User’s Guide by E. Anderson, Z. Bai,
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen, published by the Society for Industrial and Applied Mathematics (SIAM),
Philadelphia, 1992. You can order the LAPACK User’s Guide, publication TPD–0003.
LAPACK Routines Contained in the Scientific Library
Most of the single-precision (64-bit) real and complex routines from LAPACK 2.0 are supported in the
Scientific Library. This includes driver routines and computational routines for solving linear systems, least
squares problems, and eigenvalue and singular value problems. Selected auxiliary routines for generating
and manipulating elementary orthogonal transformations are also supported.
The Scientific Library does not include the LAPACK driver routines for certain generalized eigenvalue and
singular value computations and the divide-and-conquer routines for computing eigenvalues, which were new
for LAPACK 2.0. These may be added in a future release. Also, most of the auxiliary routines used only
internally by LAPACK have been renamed to avoid conflicts with user-defined subroutine names.
The LAPACK routines in the Scientific Library are described online in man pages. For example, to see a
description of the arguments to the expert driver routine for solving a general system of equations, enter the
following command:
% man sgesvx
The user interface to all LAPACK routines is exactly the same as the standard LAPACK interface, except
for the CPTSV(3L) and CPTSVX(3L) driver routines. An optional character argument was added to CPTSV
and CPTSVX to afford upward compatibility with the storage format in LINPACK’s CPTSL. However,
because the argument is optional, the LAPACK calling sequence is also accepted.
Several enhancements were made to the public-domain LAPACK software to improve performance for
UNICOS and UNICOS/mk systems. In particular, the solve routines were redesigned to give better
performance for one or a small number of right-hand sides, and to make better use of parallelism when the
number of right-hand sides is large.
Tuning parameters for the block algorithms provided in the Scientific Library are set within the LAPACK
routine ILAENV(3L). ILAENV(3L) is an integer function subprogram that accepts information about the
problem type and dimensions, and it returns one integer parameter, such as the optimal block size, the
minimum block size for which a block algorithm should be used, or the crossover point (the problem size at
which it becomes more efficient to switch to an unblocked algorithm). The setting of tuning parameters
occurs without user intervention, but users may call ILAENV(3L) directly to discover the values that will be
used (for example, to determine how much workspace to provide).
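For example, a program can ask ILAENV which block size the library would choose before allocating workspace. The following sketch assumes the standard LAPACK ILAENV argument list (ISPEC, NAME, OPTS, N1, N2, N3, N4); the problem dimensions shown are arbitrary:

      PROGRAM QBLOCK
*     Sketch: query the tuning parameters ILAENV reports for an
*     SGETRF factorization of a 1000 x 1000 matrix.  ISPEC = 1
*     requests the optimal block size, ISPEC = 2 the minimum block
*     size; unused dimensions are passed as -1.
      INTEGER ILAENV, NB, NBMIN
      EXTERNAL ILAENV
      NB    = ILAENV(1, 'SGETRF', ' ', 1000, 1000, -1, -1)
      NBMIN = ILAENV(2, 'SGETRF', ' ', 1000, 1000, -1, -1)
      PRINT *, 'Optimal block size:        ', NB
      PRINT *, 'Minimum viable block size: ', NBMIN
      END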
Naming Scheme
The name of each LAPACK routine is a coded specification of its function (within the limits of standard
FORTRAN 77 six-character names).
All driver and computational routines have five- or six-character names of the form XYYZZ or XYYZZZ.
The first letter in each name, X, indicates the data type, as follows:
S REAL (single precision)
C COMPLEX
The next two letters, YY, indicate the type of matrix (or the most-significant matrix). Most of these
two-letter codes apply to both real and complex matrices, but a few apply specifically to only one or the
other. The matrix types are as follows:
BD BiDiagonal
GB General Band
GE GEneral (nonsymmetric)
GG General matrices, Generalized problem
GT General Tridiagonal
HB Hermitian Band (complex only)
HE HErmitian (possibly indefinite) (complex only)
HG Hessenberg matrix, Generalized problem
HP Hermitian Packed (possibly indefinite) (complex only)
HS upper HeSsenberg
OP Orthogonal Packed (real only)
OR ORthogonal (real only)
PB Positive definite Band (symmetric or Hermitian)
PO POsitive definite (symmetric or Hermitian)
PP Positive definite Packed (symmetric or Hermitian)
PT Positive definite Tridiagonal (symmetric or Hermitian)
SB Symmetric Band (real only)
SP Symmetric Packed (possibly indefinite)
ST Symmetric Tridiagonal
SY SYmmetric (possibly indefinite)
TB Triangular Band
TG Triangular matrices, Generalized problem
TP Triangular Packed
TR TRiangular
TZ TrapeZoidal
UN UNitary (complex only)
UP Unitary Packed (complex only)
Some LAPACK auxiliary routines also have man pages on UNICOS and UNICOS/mk systems. These
routines use the special YY designation:
LA LAPACK Auxiliary routine
For example, ILAENV(3L) is the auxiliary routine that determines the block size for a particular algorithm and
problem size.
The last two or three letters, ZZ or ZZZ, indicate the computation performed. For example, SGETRF
performs a TRiangular Factorization of a Single-precision (real) GEneral matrix; CGETRF performs the
factorization of a Complex GEneral matrix.
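As an illustration of the naming scheme, the following sketch factors a Single-precision GEneral matrix with SGETRF and then solves AX = B with the companion solve routine SGETRS. The argument lists are the standard LAPACK ones; the diagonally dominant 3 x 3 system is arbitrary:

      PROGRAM NAMEEX
*     Sketch: SGETRF = Single precision, GEneral matrix, TRiangular
*     Factorization; SGETRS = the corresponding TRiangular Solve.
      INTEGER N, NRHS, LDA, LDB
      PARAMETER (N = 3, NRHS = 1, LDA = N, LDB = N)
      REAL A(LDA,N), B(LDB,NRHS)
      INTEGER IPIV(N), INFO, I, J
*     Build an arbitrary diagonally dominant matrix and a right-hand
*     side of all ones.
      DO 20 J = 1, N
         DO 10 I = 1, N
            A(I,J) = 1.0
   10    CONTINUE
         A(J,J) = 4.0
         B(J,1) = 1.0
   20 CONTINUE
*     Factor A = P*L*U, then solve A*X = B using the factors.
      CALL SGETRF(N, N, A, LDA, IPIV, INFO)
      IF (INFO .EQ. 0) THEN
         CALL SGETRS('N', N, NRHS, A, LDA, IPIV, B, LDB, INFO)
      END IF
      PRINT *, 'INFO = ', INFO, '  X = ', (B(I,1), I = 1, N)
      END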
Name Purpose
SSYSVX Solves a real or complex symmetric indefinite system of linear equations AX = B and
CSYSVX provides an estimate of the condition number and error bounds on the solution.
Computational Routines
These computational routines are listed in alphabetical order, with real matrix routines and complex matrix
routines grouped together as appropriate.
Name Purpose
CHECON Estimates the reciprocal of the condition number of a complex Hermitian indefinite matrix,
using the factorization computed by CHETRF.
CHERFS Improves the computed solution to a complex Hermitian indefinite system of linear
equations AX = B and provides error bounds for the solution.
CHETRF Computes the factorization of a complex Hermitian indefinite matrix, using the diagonal
pivoting method.
CHETRI Computes the inverse of a complex Hermitian indefinite matrix, using the factorization
computed by CHETRF.
CHETRS Solves a complex Hermitian indefinite system of linear equations AX = B, using the
factorization computed by CHETRF.
CHPCON Estimates the reciprocal of the condition number of a complex Hermitian indefinite matrix
in packed storage, using the factorization computed by CHPTRF.
CHPRFS Improves the computed solution to a complex Hermitian indefinite system of linear
equations AX = B (A is held in packed storage) and provides error bounds for the solution.
CHPTRF Computes the factorization of a complex Hermitian indefinite matrix in packed storage,
using the diagonal pivoting method.
CHPTRI Computes the inverse of a complex Hermitian indefinite matrix in packed storage, using the
factorization computed by CHPTRF.
CHPTRS Solves a complex Hermitian indefinite system of linear equations AX = B (A is held in
packed storage) using the factorization computed by CHPTRF.
ILAENV Determines tuning parameters (such as the block size).
SBDSQR Compute the singular value decomposition of a general matrix reduced to bidiagonal form
CBDSQR
SGBCON Estimates the reciprocal of the condition number of a general band matrix, in either the 1-
CGBCON norm or the infinity-norm, using the LU factorization computed by SGBTRF or CGBTRF.
SGBEQU Computes row and column scalings to equilibrate a general band matrix and reduce its
CGBEQU condition number. Does not multiprocess or call any multiprocessing routines.
SGBRFS Improves the computed solution to any of the following general banded systems of linear
CGBRFS equations and provides error bounds for the solution.
AX = B, A^T X = B, or A^H X = B
SGBTRF Computes an LU factorization of a general band matrix, using partial pivoting with row
CGBTRF interchanges.
SGBTRS Solves any of the following general banded systems of linear equations using the LU
CGBTRS factorization computed by SGBTRF or CGBTRF.
AX = B, A^T X = B, or A^H X = B
SGEBAK Back transform the eigenvectors of a matrix transformed by SGEBAL/CGEBAL.
CGEBAK
SGEBAL Balances a general matrix A.
CGEBAL
SGEBRD Reduces a general matrix to upper or lower bidiagonal form by an orthogonal/unitary
CGEBRD transformation.
SGECON Estimates the reciprocal of the condition number of a general matrix, in either the 1-norm or
CGECON the infinity-norm, using the LU factorization computed by SGETRF or CGETRF.
SGEEQU Computes row and column scalings to equilibrate a general rectangular matrix and to reduce
CGEEQU its condition number.
SGEHRD Reduces a general matrix to upper Hessenberg form by an orthogonal/unitary transformation.
CGEHRD
SGELQF Computes an LQ factorization of a general rectangular matrix.
CGELQF
SGEQLF Computes a QL factorization of a general rectangular matrix.
CGEQLF
SGEQPF Computes a QR factorization with column pivoting of a general rectangular matrix.
CGEQPF
SGTTRF Computes an LU factorization of a general tridiagonal matrix, using partial pivoting with
CGTTRF row interchanges.
SGTTRS Solves a general tridiagonal system of linear equations using the LU factorization computed
CGTTRS by SGTTRF or CGTTRF: AX = B, A^T X = B, or A^H X = B.
SHGEQZ Compute the eigenvalues of a matrix pair (A,B) in generalized upper Hessenberg form using
CHGEQZ the QZ method
SHSEIN Compute eigenvectors of an upper Hessenberg matrix by inverse iteration
CHSEIN
SHSEQR Compute eigenvalues, Schur form, and Schur vectors of an upper Hessenberg matrix
CHSEQR
SLAMCH Computes machine-specific constants.
SLARF Applies an elementary reflector.
CLARF
SLARFB Applies a block reflector.
CLARFB
SLARFG Generates an elementary reflector.
CLARFG
SLARFT Forms the triangular factor of a block reflector.
CLARFT
SLARGV Generate a vector of real or complex plane rotations
CLARGV
SLARNV Generates a vector of random numbers.
CLARNV
SLARTG Generates a plane rotation.
CLARTG
SLARTV Apply a vector of real or complex plane rotations to two vectors
CLARTV
SLASR Apply a sequence of real plane rotations to a matrix
CLASR
SOPGTR Generates the orthogonal/unitary matrix Q from SSPTRD/CHPTRD.
CUPGTR
SPBRFS Improves the computed solution to a symmetric or Hermitian positive definite banded
CPBRFS system of linear equations AX = B and provides error bounds for the solution.
SPBSTF Compute a split Cholesky factorization of a symmetric or Hermitian positive definite band
CPBSTF matrix.
SPBTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite band
CPBTRF matrix.
SPBTRS Solves a symmetric or Hermitian positive definite banded system of linear equations AX =
CPBTRS B, using the Cholesky factorization computed by SPBTRF or CPBTRF.
SPOCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
CPOCON definite matrix, using the Cholesky factorization computed by SPOTRF or CPOTRF.
SPOEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
CPOEQU matrix and reduces its condition number.
SPORFS Improves the computed solution to a symmetric or Hermitian positive definite system of
CPORFS linear equations AX = B and provides error bounds for the solution.
SPOTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix.
CPOTRF
SPOTRI Computes the inverse of a symmetric or Hermitian positive definite matrix, using the
CPOTRI Cholesky factorization computed by SPOTRF or CPOTRF.
SPOTRS Solves a symmetric or Hermitian positive definite system of linear equations AX = B, using
CPOTRS the Cholesky factorization computed by SPOTRF or CPOTRF.
SPPCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
CPPCON definite matrix in packed storage, using the Cholesky factorization computed by SPPTRF or
CPPTRF.
SPPEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
CPPEQU matrix in packed storage and reduces its condition number.
SPPRFS Improves the computed solution to a symmetric or Hermitian positive definite system of
CPPRFS linear equations AX = B (A is held in packed storage) and provides error bounds for the
solution.
SPPTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix in
CPPTRF packed storage.
SPPTRI Computes the inverse of a symmetric or Hermitian positive definite matrix in packed
CPPTRI storage, using the Cholesky factorization computed by SPPTRF or CPPTRF.
SPPTRS Solves a symmetric or Hermitian positive definite system of linear equations AX = B (A is
CPPTRS held in packed storage) using the Cholesky factorization computed by SPPTRF or CPPTRF.
SPTCON Uses the LDL^H factorization computed by SPTTRF or CPTTRF to compute the reciprocal
CPTCON of the condition number of a symmetric or Hermitian positive definite tridiagonal matrix.
SPTEQR Compute eigenvalues and eigenvectors of a symmetric or Hermitian positive definite
CPTEQR tridiagonal matrix.
SPTRFS Improves the computed solution to a symmetric or Hermitian positive definite tridiagonal
CPTRFS system of linear equations AX = B and provides error bounds for the solution.
SPTTRF Computes the LDL^H factorization of a symmetric or Hermitian positive definite tridiagonal
CPTTRF matrix.
SPTTRS Uses the LDL^H factorization computed by SPTTRF or CPTTRF to solve a symmetric or
CPTTRS Hermitian positive definite tridiagonal system of linear equations.
SSBGST Reduce a symmetric or Hermitian definite banded generalized eigenproblem to standard
CHBGST form.
SSBTRD Reduce a symmetric or Hermitian band matrix to real symmetric tridiagonal form by an
CHBTRD orthogonal/unitary transformation.
SSPCON Estimates the reciprocal of the condition number of a real or complex symmetric indefinite
CSPCON matrix in packed storage, using the factorization computed by SSPTRF or CSPTRF.
SSPGST Reduce a symmetric or Hermitian definite generalized eigenproblem to standard form, using
CHPGST packed storage.
SSPRFS Improves the computed solution to a real or complex symmetric indefinite system of linear
CSPRFS equations AX = B (A is held in packed storage) and provides error bounds for the solution.
SSPTRD Reduces a symmetric/Hermitian packed matrix A to real symmetric tridiagonal form by an
CHPTRD orthogonal/unitary transformation.
SSPTRF Computes the factorization of a real or complex symmetric indefinite matrix in packed
CSPTRF storage, using the diagonal pivoting method.
SSPTRI Computes the inverse of a real or complex symmetric indefinite matrix in packed storage,
CSPTRI using the factorization computed by SSPTRF or CSPTRF.
SSPTRS Solves a real or complex symmetric indefinite system of linear equations AX = B (A is held
CSPTRS in packed storage) using the factorization computed by SSPTRF or CSPTRF.
SSTEBZ Compute eigenvalues of a symmetric tridiagonal matrix by bisection.
SSTEIN Compute eigenvectors of a real symmetric tridiagonal matrix by inverse iteration.
CSTEIN
SSTEQR Compute eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using the
CSTEQR implicit QL or QR method.
SSTERF Compute all eigenvalues of a symmetric tridiagonal matrix using the root-free variant of the
QL or QR algorithm.
SSYCON Estimates the reciprocal of the condition number of a real or complex symmetric indefinite
CSYCON matrix, using the factorization computed by SSYTRF or CSYTRF.
SSYGST Reduce a symmetric or Hermitian definite generalized eigenproblem to standard form.
CHEGST
SSYRFS Improves the computed solution to a real or complex symmetric indefinite system of linear
CSYRFS equations AX = B and provides error bounds for the solution.
SSYTRD Reduces a symmetric/Hermitian matrix A to real symmetric tridiagonal form by an
CHETRD orthogonal/unitary transformation.
SSYTRF Computes the factorization of a real or complex symmetric indefinite matrix, using the
CSYTRF diagonal pivoting method.
SSYTRI Computes the inverse of a real or complex symmetric indefinite matrix, using the
CSYTRI factorization computed by SSYTRF or CSYTRF.
SSYTRS Solves a real or complex symmetric indefinite system of linear equations AX = B, using the
CSYTRS factorization computed by SSYTRF or CSYTRF.
STBCON Estimates the reciprocal of the condition number of a triangular band matrix, in either the
CTBCON 1-norm or the infinity-norm.
STBRFS Provides error bounds for the solution of any of the following triangular banded systems of
CTBRFS linear equations:
AX = B, A^T X = B, or A^H X = B
STBTRS Solves any of the following triangular banded systems of linear equations:
CTBTRS AX = B, A^T X = B, or A^H X = B
STGEVC Compute eigenvectors of a pair of matrices (A,B) in generalized Schur form.
CTGEVC
STPCON Estimates the reciprocal of the condition number of a triangular matrix in packed storage, in
CTPCON either the 1-norm or the infinity-norm.
STPRFS Provides error bounds for the solution of any of the following triangular systems of linear
CTPRFS equations where A is held in packed storage.
AX = B, A^T X = B, or A^H X = B
STPTRI Computes the inverse of a triangular matrix in packed storage.
CTPTRI
STPTRS Solves any of the following triangular systems of linear equations where A is held in packed
CTPTRS storage.
AX = B, A^T X = B, or A^H X = B
STRCON Estimates the reciprocal of the condition number of a triangular matrix, in either the 1-norm
CTRCON or the infinity-norm.
STREVC Compute eigenvectors of a real upper quasi-triangular matrix.
CTREVC Compute eigenvectors of a complex triangular matrix.
STREXC Exchange diagonal blocks in the real Schur factorization of a real matrix.
CTREXC Exchange diagonal elements in the Schur factorization of a complex matrix.
STRRFS Provides error bounds for the solution of any of the following triangular systems of linear
CTRRFS equations:
AX = B, A^T X = B, or A^H X = B
STRSEN Compute condition numbers to measure the sensitivity of a cluster of eigenvalues and its
CTRSEN corresponding invariant subspace.
STRSNA Compute condition numbers for specified eigenvalues and eigenvectors of a real upper
quasi-triangular matrix.
CTRSNA Compute condition numbers for specified eigenvalues and eigenvectors of a complex upper
triangular matrix.
STRSYL Solve the Sylvester matrix equation
CTRSYL
SEE ALSO
LINPACK(3S) which lists the names of the LINPACK routines that are superseded by the linear system
solvers in LAPACK
LAPACK User’s Guide, CRI publication TPD–0003
NAME
EISPACK – Introduction to Eigensystem computation for dense linear systems
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
EISPACK is a package of Fortran routines for solving the eigenvalue problem and for computing and using
the singular-value decomposition.
The original Fortran versions are described in the Matrix Eigensystem Routines – EISPACK Guide, second
edition, by B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler,
published by Springer-Verlag, New York, 1976, Library of Congress catalog card number 76–2662. The
original Fortran versions also are documented in the Matrix Eigensystem Routines - EISPACK Guide
Extensions (Lecture Notes in Computer Science, Vol. 51) by B. S. Garbow, J. M. Boyle, J. J. Dongarra, and
C. B. Moler, published by Springer-Verlag, New York, 1977, Library of Congress catalog card number
77–2802.
Most EISPACK routines are superseded by routines from the more recent public domain package, LAPACK,
described in the LAPACK User’s Guide (see INTRO_LAPACK(3S) for a complete reference). Of particular
interest to EISPACK users who want to switch to LAPACK is Appendix D, "Converting from LINPACK
and EISPACK," of the LAPACK User’s Guide. This appendix contains a table that shows the name of the
LAPACK routines that are functionally equivalent to each EISPACK routine.
Each Scientific Library version of the EISPACK routines has the same name, algorithm, and calling
sequence as the original version. Optimization of each routine includes the following:
• Use of the Level 1 BLAS routines when applicable, and use of the Level 2 and 3 BLAS in TRED1,
TRED2, TRBAK, and REDUC.
• Removal of Fortran IF statements when the result of either branch is the same.
• Unrolling complicated Fortran DO loops to improve vectorization.
• Use of Fortran compiler directives to aid vector optimization.
These modifications increase vectorization and use optimized library routines; therefore, they reduce
execution time. Only the order of computations within a loop is changed; the modified versions produce the
same answers as the original versions, unless the problem is sensitive to small changes in the data.
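Because the names and calling sequences are unchanged, a legacy call such as the one below runs against the Scientific Library as-is. This sketch uses the RS driver (all eigenvalues and, optionally, eigenvectors of a real symmetric matrix) with the argument list from the original EISPACK distribution; the 2 x 2 matrix is arbitrary:

      PROGRAM RSEX
*     Sketch: the Scientific Library version of EISPACK's RS driver
*     is called exactly like the original (same name, same argument
*     list, same algorithm).
      INTEGER NM, N, IERR
      PARAMETER (NM = 2, N = 2)
      REAL A(NM,N), W(N), Z(NM,N), FV1(N), FV2(N)
      DATA A / 2.0, 1.0, 1.0, 2.0 /
*     MATZ = 1 requests eigenvectors as well as eigenvalues.
      CALL RS(NM, N, A, W, 1, Z, FV1, FV2, IERR)
      PRINT *, 'IERR =', IERR, '  eigenvalues:', W(1), W(2)
      END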
The following table lists the name, matrix or decomposition, and purpose for each routine.
Name     Matrix or decomposition        Purpose
CORTH    Complex general                Reduces matrix to upper Hessenberg form by using unitary similarity transformations
ELMBAK   Real general                   Forms eigenvectors by back transforming those of the corresponding matrices determined by ELMHES
ELMHES   Real general                   Reduces matrix to upper Hessenberg form by using elementary similarity transformations
ELTRAN   Real general                   Accumulates transformations used in the reduction to upper Hessenberg form done by ELMHES
FIGI     Real nonsymmetric tridiagonal  Reduces to symmetric tridiagonal matrix that has the same eigenvalues
FIGI2    Real nonsymmetric tridiagonal  Reduces to symmetric tridiagonal matrix that has the same eigenvalues, retaining the diagonal similarity transformations
HQR      Real upper Hessenberg          Finds eigenvalues by QR method
HQR2     Real upper Hessenberg          Finds eigenvalues and eigenvectors by QR method
HTRIBK   Complex Hermitian              Finds eigenvectors given the eigenvectors of the real symmetric tridiagonal matrix calculated by HTRIDI (including eigenvectors calculated by TQL2 or IMTQL2)
HTRIB3   Complex Hermitian (packed)     Finds eigenvectors given the eigenvectors of the real symmetric tridiagonal matrix calculated by HTRID3 (eigenvectors calculated by TQL2 or IMTQL2, among others)
HTRIDI   Complex Hermitian              Reduces to real symmetric tridiagonal form by using unitary similarity transformations
HTRID3   Complex Hermitian (packed)     Reduces to real symmetric tridiagonal form by using unitary similarity transformations
IMTQLV   Real symmetric tridiagonal     Finds eigenvalues by using implicit QL method, and associates them with their corresponding submatrix indices
IMTQL1   Real symmetric tridiagonal     Finds eigenvalues by implicit QL method
IMTQL2   Real symmetric tridiagonal     Finds eigenvalues and eigenvectors by implicit QL method
INVIT    Real upper Hessenberg          Finds eigenvectors that correspond to specified eigenvalues by using inverse iteration
MINFIT   Real rectangular               Determines the singular-value decomposition A = USV^T, forming U^T B rather than U by using Householder bidiagonalization and a variant of the QR algorithm
TRIDIB   Real symmetric tridiagonal     Finds the eigenvalues that lie between specified indices by using bisection
TSTURM   Real symmetric tridiagonal     Finds the eigenvalues that lie in a specified interval and each corresponding eigenvector by using bisection and inverse iteration
SEE ALSO
LAPACK User’s Guide, CRI publication TPD–0003
NAME
LINPACK – Single-precision real and complex LINPACK routines
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
LINPACK is a public domain package of Fortran routines that solves systems of linear equations and
computes the QR, Cholesky, and singular value decompositions. The original Fortran programs are
described in the LINPACK User’s Guide by J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart,
published by the Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1979, Library of
Congress catalog card number 78–78206.
Most LINPACK routines are superseded by routines from the more recent public domain package, LAPACK,
described in the LAPACK User’s Guide (see INTRO_LAPACK(3S) for a complete reference). Of particular
interest to LINPACK users who want to switch to LAPACK is Appendix D, "Converting from LINPACK
and EISPACK," of the LAPACK User’s Guide. This appendix contains a table that shows the name of the
LAPACK routines that are functionally equivalent to each LINPACK routine.
Each single-precision Scientific Library version of the LINPACK routines has the same name, algorithm, and
calling sequence as the original version. Optimization of each routine includes the following:
• Replacement of calls to the BLAS routines SSCAL, SCOPY, SSWAP, SAXPY, and SROT with inline
Fortran code vectorized by the Cray Research Fortran compilers. (SROTG is still called by LINPACK.)
• Removal of Fortran IF statements in which the result of either branch is the same.
• Replacement of SDOT to solve triangular systems of linear equations in SPOSL, STRSL, and SCHDD
with more vectorizable code.
These optimizations affect only the execution order of floating-point operations in DO loops. See the
LINPACK User’s Guide for further descriptions. The complex routines have been added without much
optimization.
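As an illustration, the classic LINPACK factor/solve pair keeps its original interface in the Scientific Library. The SGEFA and SGESL argument lists below follow the LINPACK User's Guide; the small diagonally dominant system is arbitrary:

      PROGRAM LPEX
*     Sketch: factor A with SGEFA, then solve A*x = b with SGESL,
*     using the original LINPACK calling sequences.
      INTEGER LDA, N
      PARAMETER (LDA = 3, N = 3)
      REAL A(LDA,N), B(N)
      INTEGER IPVT(N), INFO, I, J
      DO 20 J = 1, N
         DO 10 I = 1, N
            A(I,J) = 1.0
   10    CONTINUE
         A(J,J) = 4.0
         B(J) = 1.0
   20 CONTINUE
      CALL SGEFA(A, LDA, N, IPVT, INFO)
*     JOB = 0 solves A*x = b (a nonzero JOB solves trans(A)*x = b).
      IF (INFO .EQ. 0) CALL SGESL(A, LDA, N, IPVT, B, 0)
      PRINT *, 'INFO =', INFO, '  x =', (B(I), I = 1, N)
      END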
As mentioned previously, LAPACK does not completely supersede LINPACK. In the following table, an
asterisk (*) marks LINPACK routines that are not superseded in public domain LAPACK. This table lists
the name, matrix or decomposition, and purpose for each routine.
SEE ALSO
INTRO_LAPACK(3S) for information and references about the LAPACK routines that supersede LINPACK
LAPACK User’s Guide, CRI publication TPD–0003
Dongarra, J. J., C. B. Moler, J. R. Bunch, and G. W. Stewart, LINPACK User’s Guide. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, 1979.
NAME
INTRO_SCALAPACK – Introduction to the ScaLAPACK routines for distributed matrix computations
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
The ScaLAPACK library contains routines for solving real or complex general, triangular, or positive definite
distributed systems. It also contains routines for reducing distributed matrices to condensed form and an
eigenvalue problem solver for real symmetric distributed matrices. Finally, it also includes a set of routines
that perform basic operations involving distributed matrices and vectors, the PBLAS.
Individual man pages exist for all routines except the PBLAS. You can find more information on the
PBLAS on the World Wide Web at the following URL: http://www.netlib.org/.
Changes from Public Domain Version
The ScaLAPACK development team is directed by Jack Dongarra and consists of groups at UT Knoxville
and UC Berkeley. A version of the package is available in the public domain on the World Wide Web at
the following URL: http://www.netlib.org/.
In the UNICOS/mk version, the calling sequences to all ScaLAPACK routines remain unchanged.
Initialization
Some of the ScaLAPACK routines require the Basic Linear Algebra Communication Subprograms (BLACS)
to be initialized. This can be done through a call to BLACS_GRIDINIT(3S). Finally, each distributed array
that is passed as an argument to a ScaLAPACK routine requires a descriptor, which is set through a call to
DESCINIT(3S). If a call is required, it is documented on the man page for the routine.
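A minimal sketch of this initialization sequence follows, assuming a 2 x 2 process grid and a 1000 x 1000 matrix distributed in 64 x 64 blocks; the BLACS_PINFO, BLACS_GET, BLACS_GRIDINFO, BLACS_GRIDEXIT, and BLACS_EXIT calls and the NUMROC-based local leading dimension follow the standard BLACS/ScaLAPACK conventions:

      PROGRAM GRIDEX
*     Sketch: initialize the BLACS, create a 2 x 2 process grid, and
*     build a descriptor for a 1000 x 1000 matrix in 64 x 64 blocks.
      INTEGER IAM, NPROCS, ICTXT, NPROW, NPCOL, MYROW, MYCOL
      INTEGER DESCA(9), LLD, INFO
      INTEGER NUMROC
      EXTERNAL NUMROC
      CALL BLACS_PINFO(IAM, NPROCS)
      CALL BLACS_GET(0, 0, ICTXT)
      CALL BLACS_GRIDINIT(ICTXT, 'R', 2, 2)
      CALL BLACS_GRIDINFO(ICTXT, NPROW, NPCOL, MYROW, MYCOL)
*     Local leading dimension: rows of the matrix owned by this
*     process row (first block held by process row 0).
      LLD = MAX(1, NUMROC(1000, 64, MYROW, 0, NPROW))
      CALL DESCINIT(DESCA, 1000, 1000, 64, 64, 0, 0, ICTXT, LLD, INFO)
*     ... call ScaLAPACK routines with DESCA here ...
      CALL BLACS_GRIDEXIT(ICTXT)
      CALL BLACS_EXIT(0)
      END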
Available Routines
The following routines are available:
Linear Solvers
PSGETRF, PCGETRF LU factorization and solution of general
PSGETRS, PCGETRS distributed systems of linear equations.
PSTRTRS, PCTRTRS
PSGESV, PCGESV
PSPOTRF, PCPOTRF Cholesky factorization and solution of real symmetric
PSPOTRS, PCPOTRS or complex Hermitian distributed systems of linear
PSPOSV, PCPOSV equations.
PSGEQRF, PCGEQRF QR, RQ, QL, LQ, and QR with column pivoting for general
PSGERQF, PCGERQF distributed matrices.
PSGEQLF, PCGEQLF
PSGELQF, PCGELQF
PSGEQPF, PCGEQPF
Level 1
PSAMAX PCAMAX
PSASUM PSCASUM
PSAXPY PCAXPY
PSNRM2 PSCNRM2
PSCOPY PCCOPY
PSDOT PCDOTC PCDOTU
PSSCAL PCSCAL PCSSCAL
PSSWAP PCSWAP
Level 2
Level 3
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S)
Choi, J., J. Dongarra, R. Pozo, and D. Walker, ‘‘Scalapack: A scalable linear algebra library for distributed
memory concurrent computers,’’ in Proceedings of the Fourth Symposium on the Frontiers of Massively
Parallel Computation, IEEE Comput. Soc. Press, 1992.
NAME
DESCINIT – Initializes a descriptor vector of a distributed two-dimensional array
SYNOPSIS
CALL DESCINIT (desc, m, n, mb, nb, irsrc, icsrc, icntxt, lld, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
DESCINIT associates a descriptor vector with a two-dimensional (2D) block, or block-cyclically distributed
array. The vector stores information required by the parallel 2D FFT and ScaLAPACK routines to establish
the mapping between an entry in the distributed 2D array and the processor that owns it.
The DESCINIT routine accepts the following arguments.
desc Integer array of dimension 9. (output)
Array descriptor.
m Integer. (input)
Number of global rows in the distributed matrix whose descriptor is being created.
n Integer. (input)
Number of global columns in the distributed matrix whose descriptor is being created.
mb Integer. (input)
Blocking size used to distribute the rows of the distributed matrix.
nb Integer. (input)
Blocking size used to distribute the columns of the distributed matrix.
irsrc Integer. (input)
Processor row that owns the first row of the distributed matrix.
icsrc Integer. (input)
Processor column that owns the first column of the distributed matrix.
icntxt Integer. (input)
Context handle that identifies the grid of processors over which the distributed matrix is
distributed as returned by a call to BLACS_GRIDINIT(3S).
lld Integer. (input)
The leading dimension of the local array that stores the local blocks of the distributed matrix.
info Integer. (output)
info = 0: Successful exit.
info < 0: If info = – i, the ith argument had an illegal value.
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), BLACS_PCOORD(3S), INTRO_BLACS(3S)
NAME
INDXG2P – Computes the coordinate of the processing element (PE) that possesses the entry of a
distributed matrix
SYNOPSIS
my_home=INDXG2P(indxglob, nb, iproc, isrcproc, nproc)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
INDXG2P computes the coordinate of the processing element (PE) that possesses the entry of a distributed
matrix specified by a global index indxglob. The formula for my_home is the following:
my_home = MOD(isrcproc+(indxglob-1)/nb, nprocs)
This routine accepts the following arguments:
indxglob Integer. (global input)
The global index of the element.
nb Integer. (global input)
Block size, size of the blocks the distributed matrix is split into.
iproc Integer. (local dummy)
Dummy argument; used to unify the calling sequence of the tool routines.
isrcproc Integer. (global input)
The coordinate of the process that possesses the first row/column of the distributed matrix.
nproc Integer. (global input)
Total number of processes over which the matrix is distributed.
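For example, with 64-element blocks dealt over 4 PEs starting at PE 0, global index 130 falls in block (130-1)/64 = 2 and therefore lives on PE MOD(0+2, 4) = 2. A minimal sketch of the call (the values are arbitrary):

      PROGRAM OWNER
*     Sketch: which PE owns global row 130 when rows are distributed
*     in blocks of 64 over 4 PEs, starting at PE 0?
*     my_home = MOD(0 + (130-1)/64, 4) = MOD(2, 4) = 2
      INTEGER INDXG2P, PE
      EXTERNAL INDXG2P
      PE = INDXG2P(130, 64, 0, 0, 4)
      PRINT *, 'Global row 130 lives on PE ', PE
      END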
NAME
NUMROC – Computes the number of rows or columns of a distributed matrix owned locally
SYNOPSIS
nrows_or_cols=NUMROC(n, nb, iproc, isrcproc, nprocs)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
NUMROC computes the number of rows or columns of a distributed matrix owned locally by the processor
indicated by iproc. If only a close upper bound on the value is needed (for example, to determine how
much to allocate for a workspace), you can use the following formula to approximate the value returned by
this function:
nrows_or_cols ~= ((n/nb)/nprocs)*nb + nb
This routine accepts the following arguments:
n Integer. (global input)
The number of rows/columns in the distributed matrix.
nb Integer. (global input)
Block size; the size of the blocks that the distributed matrix is split into.
iproc Integer. (local input)
The coordinate of the processor with the local array row or column to be determined.
isrcproc Integer. (global input)
The coordinate of the processor that possesses the first row or column of the distributed matrix.
nprocs Integer. (global input)
The total number of processors over which the matrix is distributed.
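For sizing local storage, the following sketch compares the exact NUMROC count with the upper-bound formula given above; the matrix, block, and grid values are arbitrary:

      PROGRAM LOCSIZ
*     Sketch: exact local row count from NUMROC versus the close
*     upper bound ((n/nb)/nprocs)*nb + nb used for workspace sizing.
      INTEGER NUMROC
      EXTERNAL NUMROC
      INTEGER N, NB, IPROC, ISRC, NPROCS, NP, NPMAX
      PARAMETER (N = 1000, NB = 64, IPROC = 1, ISRC = 0, NPROCS = 4)
      NP    = NUMROC(N, NB, IPROC, ISRC, NPROCS)
      NPMAX = ((N/NB)/NPROCS)*NB + NB
      PRINT *, 'Rows owned locally:', NP, '  upper bound:', NPMAX
      END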
NAME
PCHEEVX – Computes selected eigenvalues and eigenvectors of a Hermitian-definite eigenproblem
SYNOPSIS
CALL PCHEEVX (jobZ, range, uplo, n, A, iA, jA, descA, vl, vu, il, iu, abstol, m, nZ, w,
orfac, Z, iZ, jZ, descZ, work, lwork, rwork, lrwork, iwork, ifail, iclustr, gap, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PCHEEVX computes all the eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A by
calling the recommended sequence of ScaLAPACK routines. Eigenvalues/vectors can be selected by
specifying a range of values or a range of indices for the desired eigenvalues.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
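For illustration, the following hedged Fortran sketch shows one common way a calling program obtains
these quantities and initializes a descriptor (the 2-by-2 grid, matrix size, and block size are example values,
not requirements):
      INTEGER ICTXT, NPROW, NPCOL, MYROW, MYCOL
      INTEGER M, N, NB, MLOC, NLOC, LLDA, INFO, DESCA(9)
      INTEGER NUMROC
      EXTERNAL NUMROC
      M = 1000
      N = 1000
      NB = 32
      CALL BLACS_GET( 0, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'R', 2, 2 )    ! example 2-by-2 grid
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
      MLOC = NUMROC( M, NB, MYROW, 0, NPROW )    ! LOCp(M_A)
      NLOC = NUMROC( N, NB, MYCOL, 0, NPCOL )    ! LOCq(N_A)
      LLDA = MAX( 1, MLOC )                      ! LLD_A >= MAX(1,LOCp(M_A))
      CALL DESCINIT( DESCA, M, N, NB, NB, 0, 0, ICTXT, LLDA, INFO )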
On exit, the lower triangle (if uplo = ’L’) or the upper triangle (if uplo = ’U’) of A, including the
diagonal, is destroyed.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated
on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension dlen_. (global input)
The array descriptor for the distributed matrix A. If descA(CTXT_ ) is incorrect, this routine
cannot guarantee correct error reporting.
vl Real. (global input)
If range=’V’, the lower bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
vu Real. (global input)
If range =’V’, the upper bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
il Integer. (global input)
If range =’I’, the index (from smallest to largest) of the smallest eigenvalue to be returned. il ≥ 1.
If range=’A’ or ’V’, it is not referenced.
iu Integer. (global input)
If range =’I’, the index (from smallest to largest) of the largest eigenvalue to be returned.
min(il,n) ≤ iu ≤ n. If range =’A’ or ’V’, it is not referenced.
abstol Real. (global input)
If jobZ=’V’, setting abstol to PSLAMCH(CONTEXT,’U’) yields the most orthogonal
eigenvectors.
This is the absolute error tolerance for the eigenvalues. An approximate eigenvalue is accepted as
converged when it is determined to lie in an interval [a,b] of width less than or equal to the
following:
abstol + eps * MAX(|a|,|b|)
eps is the machine precision. If abstol is ≤ 0, eps * norm(T) will be used in its place, where
norm(T) is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when abstol is set to twice the underflow threshold
2*PSLAMCH(’S’), not zero. If this routine returns with ((MOD(INFO,2).NE.0).OR.
(MOD(INFO/8,2).NE.0)), indicating that some eigenvalues or eigenvectors did not converge,
try setting abstol to 2*PSLAMCH(’S’).
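For illustration, a short Fortran sketch of the settings described above; it assumes PSLAMCH takes the
BLACS context as its first argument, as in the PSLAMCH(CONTEXT,’U’) form shown earlier:
      INTEGER ICTXT
      REAL ABSTOL, PSLAMCH
      EXTERNAL PSLAMCH
      ! ICTXT is the BLACS context handle (see BLACS_GRIDINIT(3S)).
      ! Most accurate eigenvalues: twice the underflow threshold.
      ABSTOL = 2.0*PSLAMCH( ICTXT, 'S' )
      ! Most orthogonal eigenvectors when jobZ = 'V':
      ! ABSTOL = PSLAMCH( ICTXT, 'U' )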
If info ≥ 0
if jobZ = ’N’, work(1) equals the minimal and optimal amount of workspace;
if jobZ = ’V’, work(1) equals the minimal amount of workspace required to guarantee
orthogonal eigenvectors on the given input matrix with the given orfac. In version 1.0,
work(1) equals the minimal workspace required to compute eigenvalues.
If info<0, then
if jobZ=’N’, work(1) equals the minimal and optimal amount of workspace
if jobZ=’V’
if range=’A’ or range=’I’, then work(1) equals the minimal workspace required
to compute all eigenvectors (no guarantee on orthogonality).
if range=’V’, then work(1) equals the minimal workspace required to compute
N_Z=DESCZ(N_) eigenvectors (no guarantee on orthogonality). In version 1.0,
work(1) equals the minimal workspace required to compute eigenvalues.
lwork Integer. (local input)
Size of work array. If only eigenvalues are requested, lwork ≥ N + MAX(NB*(NP0+1),3). If
eigenvectors are requested, lwork ≥ N + (NP0 + MQ0 + NB) * NB.
rwork Real array, dimension (lrwork). (local workspace/output)
lrwork Integer. (local input) The following variable definitions are used to define lrwork:
NN = MAX( N, NB, 2 )
NEIG = number of eigenvectors requested
NB = descA( MB_ ) = descA( NB_ ) = descZ( MB_ ) = descZ( NB_ )
descA( RSRC_ ) = descA( CSRC_ ) = descZ( RSRC_ ) = descZ( CSRC_ ) = 0
NP0 = NUMROC( NN, NB, 0, 0, NPROW )
MQ0 = NUMROC( MAX( NEIG, NB, 2 ), NB, 0, 0, NPCOL )
ICEIL( X, Y ) is a ScaLAPACK function returning ceiling(X/Y)
A cluster of eigenvalues is defined as the set
{W(K),...,W(K+CLUSTERSIZE-1) | W(J+1) ≤ W(J)+orfac*norm(A)}
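A hedged Fortran sketch of the workspace quantities defined above (the subroutine name WORKSP is
hypothetical; NEIG must be supplied by the caller, and for range = ’A’ it equals n):
      SUBROUTINE WORKSP( N, NB, NEIG, NPROW, NPCOL, NP0, MQ0 )
      ! Evaluates the NP0 and MQ0 quantities defined above.
      INTEGER N, NB, NEIG, NPROW, NPCOL, NP0, MQ0, NN
      INTEGER NUMROC
      EXTERNAL NUMROC
      NN  = MAX( N, NB, 2 )
      NP0 = NUMROC( NN, NB, 0, 0, NPROW )
      MQ0 = NUMROC( MAX( NEIG, NB, 2 ), NB, 0, 0, NPCOL )
      END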
If lrwork is too small to guarantee orthogonality, PCHEEVX attempts to maintain orthogonality in
the clusters with the smallest spacing between the eigenvalues. If lrwork is too small to compute
all of the eigenvectors requested, no computation is performed and info = -25 is returned. Note
that when range = ’V’, PCHEEVX does not know how many eigenvectors are requested until the
eigenvalues are computed. Therefore, when range = ’V’ and as long as lrwork is large enough to
allow PCHEEVX to compute the eigenvalues, PCHEEVX will compute the eigenvalues and as many
eigenvectors as it can.
Relationship between workspace, orthogonality, and performance:
If CLUSTERSIZE ≥ N/SQRT(NPROW*NPCOL), providing enough space to compute all the
eigenvectors orthogonally will cause serious degradation in performance. In the limit (i.e.
CLUSTERSIZE = N-1), PCSTEIN will perform no better than CSTEIN on one processor. For
CLUSTERSIZE = N/SQRT(NPROW*NPCOL) reorthogonalizing all eigenvectors will increase the
total execution time by a factor of 2 or more.
For CLUSTERSIZE > N/SQRT(NPROW*NPCOL), execution time will grow as the square of the
cluster size, all other factors remaining equal and assuming enough workspace. Less workspace
means less reorthogonalization but faster execution.
iwork Integer array. (local workspace)
On return, iwork(1) contains the amount of integer workspace required. If the input parameters are
incorrect, iwork(1) may also be incorrect.
liwork Integer. (local input)
Size of iwork. liwork ≥ 6*NNP
where NNP = MAX( N, NPROW*NPCOL + 1, 4 ).
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)
NAME
PCHEGVX – Computes selected eigenvalues and eigenvectors of a Hermitian-definite generalized
eigenproblem
SYNOPSIS
CALL PCHEGVX (ibtype, jobZ, range, uplo, n, A, iA, jA, descA, B, iB, jB, descB, vl, vu,
il, iu, abstol, m, nZ, w, orfac, Z, iZ, jZ, descZ, work, lwork, rwork, lrwork, iwork, ifail,
iclustr, gap, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PCHEGVX computes all the eigenvalues and, optionally, eigenvectors of a complex generalized Hermitian-
definite eigenproblem, of the form:
sub(A)*x = (lambda)*sub(B)*x, sub(A)*sub(B)*x = (lambda)*x
or
sub(B)*sub(A)*x = (lambda)*x
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)
NAME
PSGEBRD, PCGEBRD – Reduces a real or complex distributed matrix to bidiagonal form
SYNOPSIS
CALL PSGEBRD (m, n, A, iA, jA, descA, D, E, tauQ, tauP, work, liwork, info)
CALL PCGEBRD (m, n, A, iA, jA, descA, D, E, tauQ, tauP, work, liwork, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGEBRD and PCGEBRD reduce a real or complex general m-by-n distributed matrix of the following form:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)
to upper or lower bidiagonal form B by the following orthogonal transformation:
Q’ * sub(A) * P = B
If m ≥ n, B is upper bidiagonal; if m < n, B is lower bidiagonal.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGEBRD, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the number of rows of the distributed submatrix sub(A)). m must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, this array contains the local pieces of the general distributed matrix sub(A).
On exit, if m ≥ n, the diagonal and the first superdiagonal of sub(A) are overwritten with the upper
bidiagonal matrix B; the elements below the diagonal, with the array tauQ, represent the orthogonal
matrix Q as a product of elementary reflectors, and the elements above the first superdiagonal, with
the array tauP, represent the orthogonal matrix P as a product of elementary reflectors.
If m < n, the diagonal and the first subdiagonal are overwritten with the lower bidiagonal matrix B;
the elements below the first subdiagonal, with the array tauQ, represent the orthogonal matrix Q as
a product of elementary reflectors, and the elements above the diagonal, with the array tauP,
represent the orthogonal matrix P as a product of elementary reflectors. See the Further Details
subsection for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
Alignment Requirements
The distributed submatrix sub(A) must verify some alignment properties, namely the following expressions
should be true:
(MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA)
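For illustration, a hedged Fortran sketch of this check; IROFFA and ICOFFA are not defined in this excerpt,
so the standard ScaLAPACK block-offset definitions are assumed, along with the standard descriptor layout
(DESCA(5) = MB_A, DESCA(6) = NB_A), and the function name ALIGNOK is hypothetical:
      LOGICAL FUNCTION ALIGNOK( IA, JA, DESCA )
      ! Evaluates the alignment condition quoted above, assuming
      ! IROFFA = MOD(iA-1,MB_A) and ICOFFA = MOD(jA-1,NB_A).
      INTEGER IA, JA, DESCA(9), IROFFA, ICOFFA
      IROFFA = MOD( IA-1, DESCA(5) )
      ICOFFA = MOD( JA-1, DESCA(6) )
      ALIGNOK = ( DESCA(5).EQ.DESCA(6) .AND. IROFFA.EQ.ICOFFA )
      END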
Further Details
The matrices Q and P are represented as products of elementary reflectors (if m ≥ n):
Q = H(1) H(2) ... H(n) and P = G(1) G(2) ... G(n-1)
Each H(i) and G(i) has the following form:
H(i) = I - tauQ * v * v’ and G(i) = I - tauP * u * u’
where tauQ and tauP are real scalars, and v and u are real vectors; v(1:i-1)=0, v(i)=1, and
v(i+1:m) is stored on exit in A(iA+i:iA+m-1,jA+i-1); u(1:i)=0, u(i+1)=1, and u(i+2:n) is
stored on exit in A(iA+i-1,jA+i+1:jA+n-1); tauQ is stored in tauQ(jA+i-1), and tauP is stored
in tauP(iA+i-1).
If m < n,
Q = H(1) H(2) ... H(m-1) and P = G(1) G(2) ... G(m)
where tauQ and tauP are real scalars, and v and u are real vectors; v(1:i)=0, v(i+1)=1, and
v(i+2:m) is stored on exit in A(iA+i+1:iA+m-1,jA+i-1); u(1:i-1)=0, u(i)=1, and u(i+1:n)
is stored on exit in A(iA+i-1,jA+i:jA+n-1); tauQ is stored in tauQ(jA+i-1) and tauP is stored
in tauP(iA+i-1)
The following examples illustrate the contents of sub(A) on exit:
(m > n) (m < n)
m = 6 and n = 5 m = 5 and n = 6
( d e u1 u1 u1 ) ( d u1 u1 u1 u1 u1 )
( v1 d e u2 u2 ) ( e d u2 u2 u2 u2 )
( v1 v2 d e u3 ) ( v1 e d u3 u3 u3 )
( v1 v2 v3 d e ) ( v1 v2 e d u4 u4 )
( v1 v2 v3 v4 d ) ( v1 v2 v3 e d u5 )
( v1 v2 v3 v4 v5 )
where d and e denote diagonal and off-diagonal elements of B, v1 denotes an element of the vector defining
H(i), and u1 an element of the vector defining G(i).
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)
NAME
PSGELQF, PCGELQF – Computes an LQ factorization of a real or complex distributed matrix
SYNOPSIS
CALL PSGELQF (m, n, A, iA, jA, descA, tau, work, lwork, info)
CALL PCGELQF (m, n, A, iA, jA, descA, tau, work, lwork, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGELQF and PCGELQF compute an LQ factorization of a real or complex distributed m-by-n matrix:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)=L*Q
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGELQF, the following arguments must be complex:
m Integer. (global input)
The number of rows to be operated on; that is, the order of the distributed submatrix sub(A). m
must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on; that is, the number of columns of the distributed
submatrix sub(A). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, the elements on and below the diagonal of sub(A) contain the m-by-MIN(m,n) lower
trapezoidal matrix L (L is lower triangular if m ≤ n); the elements above the diagonal, with the array
tau, represent the orthogonal matrix Q as a product of elementary reflectors. See the Further Details
subsection for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
tau Real array, dimension LOCp(iA+MIN(m,n)-1). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.
lwork Integer. (local input)
The dimension of the array work.
and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(iA+k-1) H(iA+k-2) ... H(iA)
where k=MIN(m,n).
Each H(i) has the following form:
H = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(1:i-1)=0 and v(i)=1; v(i+1:n) is stored on
exit in A(iA+i-1,jA+i:jA+n-1) and tau is stored in tau(iA+i-1).
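As an illustration of how one such reflector acts, the following self-contained Fortran sketch (the name
APPLYH is hypothetical; it is not part of the library) applies H = I - tau * v * v’ to a vector x in place:
      SUBROUTINE APPLYH( N, TAU, V, X )
      ! Applies H = I - tau*v*v' to x, that is, x := x - tau * v * (v'*x).
      INTEGER N, I
      REAL TAU, V(N), X(N), S
      S = 0.0
      DO I = 1, N
         S = S + V(I)*X(I)
      END DO
      DO I = 1, N
         X(I) = X(I) - TAU*V(I)*S
      END DO
      END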
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)
NAME
PSGEQLF, PCGEQLF – Computes a QL factorization of a real or complex distributed matrix
SYNOPSIS
CALL PSGEQLF (m, n, A, iA, jA, descA, tau, work, lwork, info)
CALL PCGEQLF (m, n, A, iA, jA, descA, tau, work, lwork, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGEQLF and PCGEQLF compute a QL factorization of a real or complex distributed m-by-n matrix:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)=Q*L
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGEQLF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the order of the distributed submatrix sub(A)). m must be ≥
0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, if m ≥ n, the lower triangle of the distributed submatrix sub(A) contains the n-by-n lower
triangular matrix L. If m ≤ n, the elements on and below the (n-m)th superdiagonal contain the
m-by-n lower trapezoidal matrix L. The remaining elements, with the array tau, represent the
orthogonal matrix Q as a product of elementary reflectors. See the Further Details subsection for
more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
tau Real array, dimension LOCq(N_A). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.
and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, then info = -i.
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(jA+k-1) ... H(jA+1) H(jA)
where k = MIN(m,n).
Each H(i) has the following form:
H = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(m-k+i+1:m)=0 and v(m-k+i)=1;
v(1:m-k+i-1) is stored on exit in A(iA:iA+m-k+i-2,jA+n-k+i-1), and tau is stored in
tau(jA+n-k+i-1).
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)
NAME
PSGEQPF, PCGEQPF – Computes a QR factorization with column pivoting of a real or complex distributed
matrix
SYNOPSIS
CALL PSGEQPF (m, n, A, iA, jA, descA, ipiv, tau, work, lwork, info)
CALL PCGEQPF (m, n, A, iA, jA, descA, ipiv, tau, work, lwork, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGEQPF and PCGEQPF compute a QR factorization with column pivoting of a real or complex m-by-n
distributed matrix:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)
sub(A)*P = Q*R
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGEQPF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the order of the distributed submatrix sub(A)). m must be ≥
0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, the elements on and above the diagonal of sub(A) contain the (MIN(m,n)-by-n) upper
trapezoidal matrix R (R is upper triangular if m ≥ n); the elements below the diagonal, with the
array tau, represent the orthogonal matrix Q as a product of elementary reflectors. See the Further
Details subsection for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension (LOCq(jA+n-1)). (local output)
On exit, if ipiv(i) = k, the local ith column of A(iA:iA+n-1,jA:jA+n-1)*P was the global kth
column of A(iA:iA+n-1,jA:jA+n-1). ipiv is tied to the distributed matrix A.
and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(1) H(2) ... H(n)
Each H(i) has the following form:
H = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is
stored on exit in A(iA+i-1:iA+m-1,jA+i-1).
The matrix P is represented in ipiv as follows: if ipiv(j) = i, the jth column of P is the ith canonical
unit vector.
SEE ALSO
BLACS_GRIDINFO(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)
NAME
PSGEQRF, PCGEQRF – Computes a QR factorization of a real or complex distributed matrix
SYNOPSIS
CALL PSGEQRF (m, n, A, iA, jA, descA, tau, work, lwork, info)
CALL PCGEQRF (m, n, A, iA, jA, descA, tau, work, lwork, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGEQRF and PCGEQRF compute a QR factorization of a real or complex distributed m-by-n matrix of the
form:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)=Q*R
These routines require square block decomposition (MB_A = NB_A, as defined in the comments which
follow).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGEQRF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the order of the distributed submatrix sub(A)). m must be ≥
0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, the elements on and above the diagonal of sub(A) contain the (MIN(m,n)-by-n) upper
trapezoidal matrix R (R is upper triangular if m ≥ n); the elements below the diagonal, with the
array tau, represent the orthogonal matrix Q as a product of elementary reflectors. See the Further
Details subsection for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
tau Real array, dimension LOCq(jA+MIN(m,n)-1). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.
and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, then info = -i.
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(jA) H(jA+1) ... H(jA+k-1)
where k = min(m,n).
Each H(i) has the following form:
H = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(1:i-1)=0 and v(i)=1; v(i+1:m) is stored on
exit in A(iA+i-1:iA+m-1,jA+i-1) and tau is stored in TAU(jA+i-1).
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)
NAME
PSGERQF, PCGERQF – Computes an RQ factorization of a real or complex distributed matrix
SYNOPSIS
CALL PSGERQF (m, n, A, iA, jA, descA, tau, work, lwork, info)
CALL PCGERQF (m, n, A, iA, jA, descA, tau, work, lwork, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGERQF and PCGERQF compute an RQ factorization of a real or complex distributed m-by-n matrix:
sub(A) = A(iA:iA+m-1,jA:jA+n-1) = R * Q
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGERQF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the order of the distributed submatrix sub(A)). m must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, if m ≤ n, the upper triangle of sub(A) contains the m-by-m upper triangular matrix R. If m
≥ n, the elements on and above the (m-n)th subdiagonal contain the m-by-n upper trapezoidal
matrix R; the remaining elements, with the array tau, represent the orthogonal matrix Q as a product
of elementary reflectors (see the Further Details subsection).
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (global and local input)
The array descriptor for the distributed matrix A.
tau Real array, dimension LOCp(M_A). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.
lwork Integer. (local input)
lwork ≥ MB_A * (Mp0 + Nq0 + MB_A)
where
and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(iA) H(iA+1) ... H(iA+k-1)
where k = MIN(m,n).
Each H(i) has the following form:
H = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(n-k+i+1:n)=0 and v(n-k+i)=1;
v(1:n-k+i-1) is stored on exit in A(iA+m-k+i-1,jA:jA+n-k+i-2) and tau is stored in
TAU(iA+m-k+i-1).
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)
NAME
PSGESV, PCGESV – Computes the solution to a real or complex system of linear equations
SYNOPSIS
CALL PSGESV (n, nrhs, A, iA, jA, descA, ipiv, B, iB, jB, descB, info)
CALL PCGESV (n, nrhs, A, iA, jA, descA, ipiv, B, iB, jB, descB, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGESV and PCGESV compute the solution to a real or complex system of linear equations:
sub(A) X = sub(B)
where sub(A)=A(iA:iA+n-1,jA:jA+n-1) is an n-by-n distributed matrix and X and
sub(B)=B(iB:iB+n-1,jB:jB+nrhs-1) are n-by-nrhs distributed matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) as sub(A) = P *
L * U, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular. L and U are
stored in sub(A). The factored form of sub(A) is then used to solve the system of equations sub(A)X=sub(B).
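For illustration, a hedged outline of a typical calling sequence (grid, problem size, and block size are
example values; error checking and data generation are omitted):
      INTEGER, PARAMETER :: N = 1000, NRHS = 1, NB = 32
      INTEGER :: ICTXT, NPROW, NPCOL, MYROW, MYCOL, INFO
      INTEGER :: MLOC, NLOC, NRHSLOC, DESCA(9), DESCB(9)
      INTEGER, EXTERNAL :: NUMROC
      REAL, ALLOCATABLE :: A(:,:), B(:,:)
      INTEGER, ALLOCATABLE :: IPIV(:)
      CALL BLACS_GET( 0, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'R', 2, 2 )       ! example 2-by-2 grid
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
      MLOC    = NUMROC( N, NB, MYROW, 0, NPROW )
      NLOC    = NUMROC( N, NB, MYCOL, 0, NPCOL )
      NRHSLOC = NUMROC( NRHS, NB, MYCOL, 0, NPCOL )
      ALLOCATE( A(MLOC,NLOC), B(MLOC,NRHSLOC), IPIV(MLOC+NB) )
      CALL DESCINIT( DESCA, N, N, NB, NB, 0, 0, ICTXT, MAX(1,MLOC), INFO )
      CALL DESCINIT( DESCB, N, NRHS, NB, NB, 0, 0, ICTXT, MAX(1,MLOC), INFO )
      ! ... fill the local pieces of sub(A) and sub(B) here ...
      CALL PSGESV( N, NRHS, A, 1, 1, DESCA, IPIV, B, 1, 1, DESCB, INFO )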
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGESV, the following real arguments must be
complex:
n Integer. (global input)
The number of rows and columns to be operated on (the order of the distributed submatrix sub(A)).
n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed submatrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the n-by-n distributed matrix sub(A) to be factored.
On exit, this array contains the local pieces of the factors L and U from the factorization sub(A) =
P*L*U; the unit diagonal elements of L are not stored.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension ( LOCp(M_A)+MB_A ). (local output)
This array contains the pivoting information ipiv(i), which is the global row that local row i was
swapped with. This array is tied to the distributed matrix A.
B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB+nrhs-1)). (local
input/local output)
On entry, the right hand side distributed matrix sub(B).
On exit, if info=0, sub(B) is overwritten by the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be operated
on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If info = K, U(iA+K-1,jA+K-1) is exactly 0. The factorization has been completed,
but the factor U is exactly singular, so the solution could not be computed.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)
NAME
PSGETRF, PCGETRF – Computes an LU factorization of a real or complex distributed matrix
SYNOPSIS
CALL PSGETRF (m, n, A, iA, jA, descA, ipiv, info)
CALL PCGETRF (m, n, A, iA, jA, descA, ipiv, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGETRF and PCGETRF compute an LU factorization of a real or complex general m-by-n distributed
matrix of the form:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)
by using partial pivoting with row interchanges.
The factorization has the following form:
sub(A) = P * L * U
P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and
U is upper triangular (upper trapezoidal if m < n). L and U are stored in sub(A).
This is the right-looking Parallel Level 3 BLAS version of the algorithm.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGETRF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the order of the distributed submatrix sub(A)). m must be ≥
0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, this array contains the local pieces of the factors L and U from the factorization sub(A) =
P*L*U; the unit diagonal elements of L are not stored.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension ( LOCp(M_A)+MB_A). (local output)
This array contains the pivoting information. ipiv(i) is the global row that local row i was swapped
with. This array is tied to the distributed matrix A.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S)
NAME
PSGETRI, PCGETRI – Computes the inverse of a real or complex distributed matrix
SYNOPSIS
CALL PSGETRI (n, A, iA, jA, descA, ipiv, work, lwork, iwork, liwork, info)
CALL PCGETRI (n, A, iA, jA, descA, ipiv, work, lwork, iwork, liwork, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGETRI and PCGETRI compute the inverse of a real or complex distributed matrix by using the LU
factorization computed by PSGETRF(3S) or PCGETRF(3S). This method inverts U and then computes the
inverse of
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
denoted InvA, by solving the system InvA*L = inv(U) for InvA.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGETRI, the following real arguments must be
complex:
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the factors L and U obtained by the factorization sub(A)=P*L*U
computed by PSGETRF(3S) or PCGETRF(3S).
On exit, if info = 0, sub(A) contains the inverse of the original distributed matrix sub(A).
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension ( LOCp(M_A)+MB_A ). (local output)
This array keeps track of the pivoting information. ipiv(i) is the global row index that the local
row i was swapped with. This array is tied to the distributed matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, if info = 0, work(1) returns the minimal and optimal lwork.
lwork Integer. (local input)
lwork=LOCp(n+MOD(iA-1,MB_A))*NB_A. lwork is used to keep a copy of (at maximum) an
entire column block of sub(A).
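A hedged Fortran sketch of this computation (the function name GETRILW is hypothetical; the standard
descriptor layout DESCA(5) = MB_A, DESCA(6) = NB_A, DESCA(7) = RSRC_A is assumed):
      INTEGER FUNCTION GETRILW( N, IA, DESCA, MYROW, NPROW )
      ! Evaluates lwork = LOCp(n + MOD(iA-1,MB_A)) * NB_A via NUMROC(3S).
      INTEGER N, IA, DESCA(9), MYROW, NPROW
      INTEGER NUMROC
      EXTERNAL NUMROC
      GETRILW = NUMROC( N + MOD(IA-1,DESCA(5)), DESCA(5), MYROW, DESCA(7), NPROW ) * DESCA(6)
      END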
iwork Integer array, dimension (liwork). (local workspace)
On exit, if info = 0, iwork(1) returns the minimal and optimal liwork.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), PCGETRF(3S), PSGETRF(3S)
NAME
PSGETRS, PCGETRS – Solves a real or complex distributed system of linear equations
SYNOPSIS
CALL PSGETRS (trans, n, nrhs, A, iA, jA, descA, ipiv, B, iB, jB, descB, info)
CALL PCGETRS (trans, n, nrhs, A, iA, jA, descA, ipiv, B, iB, jB, descB, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSGETRS and PCGETRS solve a system of real or complex distributed linear equations
op (sub(A)) * X = sub(B)
with a general n-by-n distributed matrix sub(A) by using the LU factorization computed by PSGETRF(3S).
sub(A) denotes
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
and op(A) = A or A**T, and sub(B) denotes B(iB:iB+n-1,jB:jB+nrhs-1).
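For illustration, a hedged outline of the factor-then-solve sequence (local arrays and descriptors set up as in
the PSGESV(3S) sketch earlier in this manual; error checking omitted):
      ! Factor sub(A) = P*L*U once, then solve with the stored factors.
      CALL PSGETRF( N, N, A, 1, 1, DESCA, IPIV, INFO )
      CALL PSGETRS( 'N', N, NRHS, A, 1, 1, DESCA, IPIV, B, 1, 1, DESCB, INFO )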
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
These routines accept the following arguments. For PCGETRS, the following real arguments must be
complex:
trans Character. (global input)
Specifies the form of the system of equations:
trans = ’N’: sub(A) * X = sub(B) (No transpose)
trans = ’T’: sub(A)**T * X = sub(B) (Transpose)
trans = ’C’: sub(A)**H * X = sub(B) (Conjugate transpose)
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed submatrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the factors L and U from the factorization sub(A)=P*L*U; the
unit diagonal elements of L are not stored.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension ( LOCp(M_A)+MB_A ). (local input)
This array contains the pivoting information, ipiv(i), which is the global row that local row i was
swapped with. This array is tied to the distributed matrix A.
B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB+nrhs-1)). (local
input/local output)
On entry, the right-hand side of distributed matrix sub(B).
On exit, sub(B) is overwritten by the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be operated
on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, then info = – i.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S), PSGETRF(3S)
NAME
PSPOSV, PCPOSV – Solves a real symmetric positive definite or complex Hermitian positive definite system of linear equations
SYNOPSIS
CALL PSPOSV (uplo, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
CALL PCPOSV (uplo, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSPOSV computes the solution to a real symmetric positive definite system of linear equations, as in the
following:
sub(A) X = sub(B)
where sub(A) denotes the following:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
sub(A) is an n-by-n symmetric distributed positive definite matrix and X and sub(B), which denotes the
following, are n-by-nrhs distributed matrices:
B(iB:iB+n-1,jB:jB+nrhs-1)
In the case of PCPOSV, the matrix must be Hermitian positive definite.
The Cholesky decomposition is used to factor sub(A) in the following way:
sub(A)=U’ * U if uplo = ’U’
sub(A)=L * L’ if uplo = ’L’
U is an upper triangular matrix, and L is a lower triangular matrix. The factored form of sub(A) is then used
to solve the system of equations.
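As an illustrative sketch only (the BLACS grid, the block size NB, and the local arrays A and B holding the
distributed pieces are assumed to exist already; none of this is prescribed by this man page), a typical
calling sequence is:
      INTEGER DESCA( 9 ), DESCB( 9 )
      INTEGER ICTXT, N, NRHS, NB, LLDA, LLDB, INFO
c..... build the descriptors with square blocking (MB = NB), as required
      CALL DESCINIT( DESCA, N, N, NB, NB, 0, 0, ICTXT, LLDA, INFO )
      CALL DESCINIT( DESCB, N, NRHS, NB, NB, 0, 0, ICTXT, LLDB, INFO )
c..... factor sub(A) and solve; on exit, B holds the solution X
      CALL PSPOSV( 'U', N, NRHS, A, 1, 1, DESCA,
     &             B, 1, 1, DESCB, INFO )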
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
These routines accept the following arguments. For PCPOSV, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed submatrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the n-by-n symmetric distributed matrix sub(A) to be factored.
If uplo = ’U’, the leading n-by-n upper triangular part of sub(A) contains the upper triangular part of
the matrix, and its strictly lower triangular part is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of sub(A) contains the lower triangular part of
the distributed matrix, and its strictly upper triangular part is not referenced.
On exit, if info = 0, this array contains the local pieces of the factor U or L from the Cholesky
factorization sub(A) = U’ * U or L * L’.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB+nrhs-1)). (local
input/local output)
On entry, the right-hand side distributed matrix sub(B).
On exit, if info = 0, sub(B) is overwritten by the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be operated
on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = – i.
info > 0 If info = K, the leading minor of order K, A(iA:iA+K-1,jA:jA+K-1), is not positive
definite. The factorization could not be completed, and the solution could not be
computed.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)
NAME
PSPOTRF, PCPOTRF – Computes the Cholesky factorization of a real symmetric or complex Hermitian
positive definite distributed matrix
SYNOPSIS
CALL PSPOTRF (uplo, n, A, iA, jA, descA, info)
CALL PCPOTRF (uplo, n, A, iA, jA, descA, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSPOTRF computes the Cholesky factorization of an n-by-n real symmetric positive definite distributed
matrix of the form:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
PCPOTRF computes the Cholesky factorization of a Hermitian positive definite distributed matrix.
The factorization has the following form; U is an upper triangular matrix, and L is a lower triangular matrix.
sub(A)=U’ * U if uplo=’U’
sub(A)=L * L’ if uplo=’L’
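For illustration only (the descriptors and local arrays are assumed to be set up as described below), a
factorization computed by PSPOTRF can be reused by PSPOTRS(3S) to solve for several right-hand sides:
c..... factor sub(A) = U' * U once ...
      CALL PSPOTRF( 'U', N, A, 1, 1, DESCA, INFO )
c..... ... then solve for as many right-hand sides as needed
      IF ( INFO .EQ. 0 ) THEN
         CALL PSPOTRS( 'U', N, NRHS, A, 1, 1, DESCA,
     &                 B, 1, 1, DESCB, INFO )
      END IF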
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
These routines accept the following arguments. For PCPOTRF, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the n-by-n symmetric distributed matrix sub(A) to be factored.
If uplo = ’U’, the leading n-by-n upper triangular part of the matrix sub(A) contains the upper
triangular matrix, and its strictly lower triangular part of sub(A) is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of the matrix sub(A) contains the lower
triangular matrix, and the strictly upper triangular part of sub(A) is not referenced.
On exit, if uplo = ’U’, the upper triangular part of the distributed matrix contains the Cholesky
factor U; if uplo = ’L’, the lower triangular part of the distributed matrix contains the Cholesky
factor L.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)
NAME
PSPOTRI, PCPOTRI – Computes the inverse of a real symmetric or complex Hermitian positive definite
distributed matrix
SYNOPSIS
CALL PSPOTRI (uplo, n, A, iA, jA, descA, info)
CALL PCPOTRI (uplo, n, A, iA, jA, descA, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSPOTRI computes the inverse of a real symmetric positive definite distributed matrix of the form:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
by using the Cholesky factorization sub(A)=U’ * U or L * L’ computed by PSPOTRF(3S).
PCPOTRI computes the inverse of a complex Hermitian positive definite matrix using the output from
PCPOTRF(3S).
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
These routines accept the following arguments. For PCPOTRI, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the triangular factor U or L from the Cholesky factorization of the
distributed matrix sub(A)=U’ * U or L * L’, as computed by PSPOTRF(3S).
On exit, the local pieces of the upper or lower triangle of the (symmetric) inverse of sub(A),
overwriting the input factor U or L.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = – i.
info > 0 If info = i, the (i,i) element of the factor U or L is 0, and the inverse could not be
computed.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S), PSPOTRF(3S)
NAME
PSPOTRS, PCPOTRS – Solves a real symmetric positive definite or complex Hermitian positive definite
system of linear equations
SYNOPSIS
CALL PSPOTRS (uplo, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
CALL PCPOTRS (uplo, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSPOTRS solves a real symmetric positive definite system of linear equations of the form:
sub(A) * X = sub(B)
where sub(A) denotes the following:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
sub(A) is an n-by-n symmetric positive definite distributed matrix whose Cholesky factorization,
as computed by PSPOTRF(3S), has the following form:
sub(A)=U’ * U
or
sub(A)=L * L’
sub(B) denotes the following distributed matrix B:
sub(B)=B(iB:iB+n-1,jB:jB+nrhs-1)
PCPOTRS requires a Hermitian positive definite matrix.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
These routines accept the following arguments. For PCPOTRS, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed submatrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, this array contains the factor L or U from the Cholesky factorization sub(A)=L * L’ or
U’ * U, as computed by PSPOTRF(3S).
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S), PSPOTRF(3S)
NAME
PSSYEVX – Computes selected eigenvalues and eigenvectors of a real symmetric matrix
SYNOPSIS
CALL PSSYEVX (jobZ, range, uplo, n, A, iA, jA, descA, vl, vu, il, iu, abstol, m, nZ, w,
orfac, Z, iZ, jZ, descZ, work, lwork, iwork, liwork, ifail, iclustr, gap, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSSYEVX computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A by
calling the recommended sequence of ScaLAPACK routines. Eigenvalues/vectors can be selected by
specifying a range of values or a range of indices for the desired eigenvalues.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
In describing the following arguments, NP, the number of rows local to a given processor, and NQ, the
number of columns local to a given processor, are used.
These routines accept the following arguments:
jobZ Character*1. (global input)
Specifies whether to compute the eigenvectors:
jobZ =’N’: Compute only eigenvalues.
jobZ =’V’: Compute eigenvalues and eigenvectors.
range Character*1. (global input)
range =’A’: All eigenvalues will be found.
range =’V’: All eigenvalues in the half-open interval (vl,vu] will be found.
range =’I’: The il-th through iu-th eigenvalues will be found.
uplo Character. (global input)
Specifies whether the upper or lower triangular part of the symmetric matrix A is stored:
uplo =’U’: Upper triangle of sub(A) is stored.
uplo =’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Block cyclic real array. (local input/workspace)
Global dimension (n,n), local dimension (descA(9), NQ)
On entry, the symmetric matrix A.
If uplo=’U’, only the upper triangular part of A is used to define the elements of the symmetric
matrix.
If uplo=’L’, only the lower triangular part of A is used to define the elements of the symmetric
matrix.
On exit, the lower triangle (if uplo=’L’) or the upper triangle (if uplo=’U’) of A, including the
diagonal, is destroyed.
NN = MAX( N, NB, 2 )
NEIG = number of eigenvectors requested
NB = descA( 3 ) = descA( 4 ) = descZ( 3 ) = descZ( 4 )
descA( 5 ) = descA( 6 ) = descZ( 5 ) = descZ( 6 ) = 0
IA = JA = IZ = JZ = 1
NP = NUMROC( N, NB, MYROW, 0, NPROW )
NP0 = NUMROC( NN, NB, 0, 0, NPROW )
NQ0 = MAX( NUMROC( NEIG, NB, 0, 0, NPCOL ), NB )
ICEIL( X, Y ) is a ScaLAPACK function returning ceiling(X/Y)
If no eigenvectors are requested (jobZ = ’N’), lwork ≥ 5*N + MAX( 5*NN, NB*(NP+1) ).
If eigenvectors are requested (jobZ = ’V’), the amount of workspace required to guarantee that all
eigenvectors are computed is the following:
lwork ≥ 5*N + MAX( 5*NN, NP0*NQ0 ) + ICEIL( NEIG, NPROW*NPCOL )*NN + 2*NB*NB
The computed eigenvectors may not be orthogonal if the minimal workspace is supplied and orfac
is too small. If you want to guarantee orthogonality (at the cost of potentially poor performance)
you should add the following to lwork:
(CLUSTERSIZE-1)*N
CLUSTERSIZE is the number of eigenvalues in the largest cluster, where a cluster is defined as a
set of close eigenvalues:
{ W(K), ..., W(K+CLUSTERSIZE-1) | W(J+1) ≤ W(J) + orfac*norm(A) }
If lwork is too small to guarantee orthogonality, PSSYEVX attempts to maintain orthogonality in
the clusters with the smallest spacing between the eigenvalues. If lwork is too small to compute all
of the eigenvectors requested, no computation is performed and info = – 23 is returned. Note that
when range = ’V’, PSSYEVX does not know how many eigenvectors are requested until the
eigenvalues are computed. Therefore, when range = ’V’ and as long as lwork is large enough to
allow PSSYEVX to compute the eigenvalues, PSSYEVX will compute the eigenvalues and as many
eigenvectors as it can.
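For illustration only (N, NB, NEIG, NPROW, and NPCOL are assumed to be known already), the workspace
bound above for computing all requested eigenvectors can be evaluated directly:
      INTEGER NN, NP0, NQ0, LWORK
      INTEGER NUMROC, ICEIL
      EXTERNAL NUMROC, ICEIL
      NN  = MAX( N, NB, 2 )
      NP0 = NUMROC( NN, NB, 0, 0, NPROW )
      NQ0 = MAX( NUMROC( NEIG, NB, 0, 0, NPCOL ), NB )
c..... lwork that guarantees all requested eigenvectors are computed
      LWORK = 5*N + MAX( 5*NN, NP0*NQ0 ) +
     &        ICEIL( NEIG, NPROW*NPCOL )*NN + 2*NB*NB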
Relationship between workspace, orthogonality, and performance:
If CLUSTERSIZE ≥ N/SQRT(NPROW*NPCOL), providing enough space to compute all the
eigenvectors orthogonally will cause serious degradation in performance. In the limit (that is,
CLUSTERSIZE = N-1), PSSTEIN will perform no better than SSTEIN on one processor. For
CLUSTERSIZE = N/SQRT(NPROW*NPCOL) reorthogonalizing all eigenvectors will increase the
total execution time by a factor of 2 or more.
For CLUSTERSIZE > N/SQRT(NPROW*NPCOL), execution time will grow as the square of the
cluster size, all other factors remaining equal and assuming enough workspace. Less workspace
means less reorthogonalization but faster execution.
NAME
PSSYGVX – Computes selected eigenvalues and eigenvectors of a real symmetric-definite generalized
eigenproblem
SYNOPSIS
CALL PSSYGVX (ibtype, jobZ, range, uplo, n, A, iA, jA, descA, B, iB, jB, descB, vl, vu,
il, iu, abstol, m, nZ, w, orfac, Z, iZ, jZ, descZ, work, lwork, iwork, liwork, ifail, iclustr,
gap, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSSYGVX computes all the eigenvalues and, optionally, eigenvectors of a real generalized
symmetric-definite eigenproblem, of the form:
sub(A)*x = (lambda)*sub(B)*x,  sub(A)*sub(B)*x = (lambda)*x
or
sub(B)*sub(A)*x = (lambda)*x
Here sub(A), denoting A(IA:IA+N-1, JA:JA+N-1), is assumed to be symmetric, and sub(B), denoting
B(IB:IB+N-1, JB:JB+N-1), is assumed to be symmetric positive definite.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
If no eigenvectors are requested (jobZ = ’N’), lwork ≥ 5*N + MAX( 5*NN, NB*(NP+1) ).
If eigenvectors are requested (jobZ = ’V’), the amount of workspace required to guarantee that all
eigenvectors are computed is the following:
lwork ≥ 5*N + MAX( 5*NN, NP0*MQ0 + 2*NB*NB ) + ICEIL( NEIG, NPROW*NPCOL )*NN
The computed eigenvectors may not be orthogonal if the minimal workspace is supplied and orfac
is too small. If you want to guarantee orthogonality (at the cost of potentially poor performance)
you should add the following to lwork:
(CLUSTERSIZE-1)*N
CLUSTERSIZE is the number of eigenvalues in the largest cluster, where a cluster is defined as a
set of close eigenvalues:
{ W(K), ..., W(K+CLUSTERSIZE-1) | W(J+1) ≤ W(J) + orfac*norm(A) }
If lwork is too small to guarantee orthogonality, PSSYGVX attempts to maintain orthogonality in
the clusters with the smallest spacing between the eigenvalues. If lwork is too small to compute all
of the eigenvectors requested, no computation is performed and info = – 23 is returned. Note that
when range = ’V’, PSSYGVX does not know how many eigenvectors are requested until the
eigenvalues are computed. Therefore, when range = ’V’ and as long as lwork is large enough to
allow PSSYGVX to compute the eigenvalues, PSSYGVX will compute the eigenvalues and as many
eigenvectors as it can.
Relationship between workspace, orthogonality, and performance:
If CLUSTERSIZE ≥ N/SQRT(NPROW*NPCOL), providing enough space to compute all the
eigenvectors orthogonally will cause serious degradation in performance. In the limit (that is,
CLUSTERSIZE = N-1), PSSTEIN will perform no better than SSTEIN on one processor. For
CLUSTERSIZE = N/SQRT(NPROW*NPCOL) reorthogonalizing all eigenvectors will increase the
total execution time by a factor of 2 or more.
For CLUSTERSIZE > N/SQRT(NPROW*NPCOL), execution time will grow as the square of the
cluster size, all other factors remaining equal and assuming enough workspace. Less workspace
means less reorthogonalization but faster execution.
iwork Integer array. (local workspace)
On return, iwork(1) contains the amount of integer workspace required. If the input parameters are
incorrect, iwork(1) may also be incorrect.
liwork Integer. (local input)
Size of iwork. liwork ≥ 6*NNP
where: NNP = MAX( N, NPROW*NPCOL + 1, 4 )
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)
NAME
PSSYTRD, PCHETRD – Reduces a real symmetric or complex Hermitian distributed matrix to tridiagonal
form
SYNOPSIS
CALL PSSYTRD (uplo, n, A, iA, jA, descA, D, E, tau, work, lwork, info)
CALL PCHETRD (uplo, n, A, iA, jA, descA, D, E, tau, work, lwork, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSSYTRD reduces a real symmetric matrix sub(A) to symmetric tridiagonal form T by an orthogonal
similarity transformation:
Q’ * sub(A) * Q = T
where sub(A)= A(iA:iA+n– 1, jA:jA+n– 1).
PCHETRD requires a complex Hermitian matrix.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
These routines accept the following arguments. For PCHETRD, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the symmetric distributed matrix sub(A) to be factored.
If uplo = ’U’, the leading n-by-n upper triangular part of sub(A) contains the upper triangular part of
the matrix, and its strictly lower triangular part is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of sub(A) contains the lower triangular part of
the distributed matrix, and its strictly upper triangular part is not referenced.
On exit, if uplo = ’U’, the diagonal and first superdiagonal of sub(A) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements above the first superdiagonal,
with the array tau, represent the orthogonal matrix Q as a product of elementary reflectors; if uplo =
’L’, the diagonal and first subdiagonal of sub(A) are overwritten by the corresponding elements of
the tridiagonal matrix T, and the elements below the first subdiagonal, with the array tau, represent
the orthogonal matrix Q as a product of elementary reflectors. See the Further Details subsection
for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = – i.
Alignment Requirements
The distributed submatrix sub(A) must verify some alignment properties, namely the following expression
should be true:
( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA .AND. IROFFA.EQ.0 )
Further Details
If uplo = ’U’, the matrix Q is represented as a product of elementary reflectors
Q = H(n-1) ... H(2) H(1)
Each H(i) has the form H(i) = I - tau * v * v’, where tau is a real scalar, and v is a real vector
with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in A(iA+i-1:iA+m-1,jA+i-1), and
tau is stored in tau(jA+i-1).
If uplo = ’L’, the matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) ... H(n-1)
Each H(i) has the form H(i) = I - tau * v * v’, where tau is a real scalar, and v is a real vector
with v(1:i) = 0 and v(i+1) = 1; v(i+1:n) is stored on exit in A(iA+i-1:iA+m-1,jA+i-1), and
tau is stored in tau(jA+i-1).
The contents of sub(A) on exit are illustrated by the following examples with n = 5:
if uplo = ’U’: if uplo = ’L’:
( d e v2 v3 v4 ) ( d )
( d e v3 v4 ) ( e d )
( d e v4 ) ( v1 e d )
( d e ) ( v1 v2 e d )
( d ) ( v1 v2 v3 e d )
In this example, d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the
vector defining H(i).
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), INDXG2P(3S), NUMROC(3S)
NAME
PSTRTRI, PCTRTRI – Computes the inverse of a real or complex upper or lower triangular distributed
matrix
SYNOPSIS
CALL PSTRTRI (uplo, diag, n, A, iA, jA, descA, info)
CALL PCTRTRI (uplo, diag, n, A, iA, jA, descA, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSTRTRI and PCTRTRI compute the inverse of a real or complex upper or lower triangular distributed
matrix of the form:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
These routines accept the following arguments. For PCTRTRI, the following real arguments must be
complex:
uplo Character. (global input)
Specifies whether the distributed matrix sub(A) is upper or lower triangular:
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
diag Character. (global input)
Specifies whether the distributed matrix sub(A) is unit triangular:
diag = ’N’: Non-unit triangular.
diag = ’U’: Unit triangular.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the triangular matrix sub(A).
If uplo = ’U’, the leading n-by-n upper triangular part of the matrix sub(A) contains the upper
triangular matrix to be inverted, and the strictly lower triangular part of sub(A) is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of the matrix sub(A) contains the lower
triangular matrix to be inverted, and the strictly upper triangular part of sub(A) is not referenced.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)
NAME
PSTRTRS, PCTRTRS – Solves a real or complex distributed triangular system
SYNOPSIS
CALL PSTRTRS (uplo, trans, diag, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
CALL PCTRTRS (uplo, trans, diag, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSTRTRS and PCTRTRS solve a real or complex triangular system of the form
sub(A) * X = sub(B)
or
sub(A)’ * X = sub(B)
where sub(A) denotes the following:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
and sub(A) is a triangular distributed matrix of order n. sub(B) denotes the following n-by-nrhs
distributed matrix:
sub(B)=B(iB:iB+n-1,jB:jB+nrhs-1)
A check is made to verify that sub(A) is nonsingular.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
These routines accept the following arguments. For PCTRTRS, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: sub(A)=A(iA:iA+n-1,jA:jA+n-1) is upper triangular.
uplo = ’L’: sub(A)=A(iA:iA+n-1,jA:jA+n-1) is lower triangular.
trans Character. (global input)
Specifies the form of the system of equations:
trans = ’N’: sub(A) * X = sub(B) (No transpose).
trans = ’T’: sub(A)’ * X = sub(B) (Transpose).
trans = ’C’: sub(A)’ * X = sub(B) (Conjugate transpose).
diag Character. (global input)
diag = ’N’: sub(A) is non-unit triangular
diag = ’U’: sub(A) is unit triangular
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed matrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
If uplo = ’U’, the leading n-by-n upper triangular part of the matrix sub(A) contains the upper
triangular matrix, and the strictly lower triangular part of sub(A) is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of the matrix sub(A) contains the lower
triangular matrix, and the strictly upper triangular part of sub(A) is not referenced.
If diag = ’U’, the diagonal elements of sub(A) are also not referenced and are assumed to be 1.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated
on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be
operated on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB+nrhs-1)). (local
input/local output)
On entry, the right-hand side distributed matrix sub(B).
On exit, if info = 0, sub(B) is overwritten by the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated
on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be
operated on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value,
info = -(i*100+j); if the ith argument is a scalar and had an illegal value, info = – i.
info > 0 If info = i, the ith diagonal element of sub(A) is 0, which indicates that the
submatrix is singular and the solutions have not been computed.
NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)
NAME
INTRO_SPARSE – Introduction to solvers for sparse linear systems
IMPLEMENTATION
UNICOS systems
DESCRIPTION
The sparse linear system routines described in this section include the direct solver routines SSTSTRF(3S)
and SSTSTRS(3S) and the iterative solver SITRSOL(3S); DFAULTS(3S) assigns default parameter values
for SITRSOL.
A sparse matrix is a matrix that has relatively few nonzero values. This type of matrix occurs frequently in
key computational steps of a variety of engineering and scientific applications. Most sparse matrix software
takes advantage of this "sparseness" to reduce the amount of storage and arithmetic required by keeping track
of only the nonzero entries in the matrix.
Storage Formats
Suppose that the n-by-n input matrix A has nza nonzero entries. The data structure used to represent A is a
column-oriented format, which is referred to as the sparse column format, in which the entries are grouped
by columns. In this format, the row indices of the nonzero elements in the first column are stored
contiguously in ascending order in an array irowind; then the row indices are stored for the second column,
and so on. The corresponding values are stored in an array values. A pointer array, icolptr, points to the
first entry in each column of A in irowind and values. icolptr(n+1) is set to nza+1. irowind and values are
arrays of length nza, and icolptr is of length n+1. Hence, 2*nza+n+1 words of storage are required to
represent A, rather than the usual n**2 words in the corresponding dense matrix format. Moreover, in the case
when A is symmetric, there is an even more compact symmetric column pointer format, in which only the
lower triangular part of A is stored.
Suppose A is a 5-by-5 matrix with 13 nonzero elements defined as follows:
11 0 0 41 0
0 22 32 0 52
A = 0 32 33 43 0
41 0 43 44 0
0 52 0 0 55
The full sparse column format representation of A is as follows:
values = (11 41 22 32 52 32 33 43 41 43 44 52 55 )
irowind = ( 1 4 2 3 5 2 3 4 1 3 4 2 5 )
icolptr = ( 1 3 6 9 12 14 )
Because A is symmetric, the following symmetric sparse column format representation of A also is valid:
values = (11 41 22 32 52 33 43 44 55 )
irowind = ( 1 4 2 3 5 3 4 4 5 )
icolptr = ( 1 3 6 8 9 10 )
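For illustration only (the variable names are assumptions), the following fragment scatters a matrix held in
full sparse column format back into a dense array, which makes the role of each of the three arrays concrete:
c..... expand (icolptr, irowind, values) into a dense N-by-N array
c      ADENSE, which is assumed to be zeroed beforehand
      DO 20 J = 1, N
         DO 10 K = ICOLPTR(J), ICOLPTR(J+1) - 1
            ADENSE( IROWIND(K), J ) = VALUES(K)
   10    CONTINUE
   20 CONTINUE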
Direct Versus Iterative Solution
Techniques for the solution of sparse linear systems can be divided into two broad classes: direct and
iterative.
Direct solution
An explicit factorization of the matrix is computed, and it is used to solve for a solution of the linear system
given a right-hand side. The solution obtained by direct methods is certain to be as accurate as the problem
definition allows.
Iterative solution
A sequence of approximations is generated iteratively, which should converge to a solution of the linear
system. Unlike direct methods, iterative methods tend to be more special-purpose, and it is well known that
no general, effective iterative algorithms exist for an arbitrary sparse linear system. However, for certain
classes of problems, the use of an appropriate iterative method can yield an approximate solution
significantly faster than direct methods. Also, iterative methods typically require less memory than direct
methods, making iterative methods the only feasible approach for some large problems. In an attempt to
compensate for the lack of robustness of any single iterative method and preconditioner, this package
provides a variety of methods and preconditioners. All are preconditioned conjugate gradient-type methods.
You can find a reference to a good introduction to these methods in the SEE ALSO section.
Analyze Phase for the Direct Sparse Solver
In the direct solution of sparse linear systems, the structure of the input matrix usually is preprocessed prior
to the numerical factorization and the numerical solution phase. This is often referred to as the Analyze
phase. Only the structure of the matrix (that is, icolptr and irowind) is required at this stage. As
implemented in the package, the Analyze phase is further divided into the following:
• Fill-reduction reordering phase
• Symbolic factorization phase
• Execution sequence and memory management phase
Fill-reduction reordering phase
For a given sparse symmetric matrix A, the lower triangular matrix L from the LDL’ factorization of A is
generally much more dense than A because of the fill-in generated at locations in which A(i,j) = 0. To reduce
this amount of fill-in, the routine applies an appropriate symmetric row and column permutation P to A
before carrying out the numerical factorization on PAP’. The system to be solved is then
PAP’ y = Pb, x = P’ y.
The reordering heuristic used in the package is based on the multiple minimum degree algorithm (see the
SEE ALSO section for a reference), which has proven to be a very effective practical method for reducing
the amount of fill-in created during the factorization. Moreover, in most problems, some of the columns of
the resultant factor L naturally have identical sparsity structure. These columns are grouped into what is
commonly referred to as a supernode, and they are processed together in subsequent stages. This results in
significant performance improvement over previous sparse matrix solvers. The supernode concept can be
relaxed further by allowing additional fill-ins in L, so that more columns can be grouped together, resulting
in fewer and larger supernodes.
Experience shows that more often than not this trade-off of additional fill-ins (and therefore, more
operations) for fewer but larger supernodes reduces the execution time overall.
Symbolic factorization phase
Given the structure of the input matrix and a permutation matrix P as determined from the fill-reduction
reordering phase, the symbolic factorization phase builds the data structure for the nonzero entries of L.
EXAMPLES
The following examples show the use of the iterative and direct sparse solver routines.
      PROGRAM EX1
      PARAMETER (NMAX = 5, NZAS = 9, NZAU = 13)
      PARAMETER (LIWORK = 350, LWORK = LIWORK)
      INTEGER NEQNS, NZA, IPATH, IERR, ROWU(NZAU), COLU(NZAU),
     &        ROWS(NZAS), COLS(NMAX+1), IWORK(LIWORK), IPARAM(40)
      REAL AMATU(NZAU), AMATS(NZAS), RPARAM(30), X(NMAX), B(NMAX),
     &     BGE(NMAX), BPO(NMAX), BTS(NMAX), SOLN(NMAX), WORK(LWORK)
      CHARACTER*3 METHOD
c
c     --------------------------------
c     Define matrix, solution and RHS
c     --------------------------------
c
c..... Full column pointer format
      DATA COLU / 1, 4, 7, 9, 12, 14/
      DATA ROWU / 1, 2, 4, 1, 2, 3, 2, 3, 1, 4, 5, 4, 5/
      DATA AMATU / 4.,-1.,-1.,-1., 4.,-1.,-1., 4.,-1., 4.,-1.,-1., 4./
c
c..... Symmetric column pointer format
      DATA COLS / 1, 4, 6, 7, 9, 10/
      DATA ROWS / 1, 2, 4, 2, 3, 3, 4, 5, 5/
      DATA AMATS / 4.,-1.,-1., 4.,-1., 4., 4.,-1., 4./
c
      DATA SOLN / 1., 1., 1., 1., 1. /
      DATA B    / 2., 2., 3., 2., 3. /
      DATA BGE  / 2., 2., 3., 2., 3. /
      DATA BPO  / 2., 2., 3., 2., 3. /
      DATA BTS  / 2., 2., 3., 2., 3. /
c
      NEQNS = 5
c
c     ----------------------------
c     Solve problem using SITRSOL
c     -----------------------------------------
c     Solve same problem using SSTSTRF/SSTSTRS
c     -----------------------------------------
c
c..... use all default values
      IPARAM(1) = 0
c..... do all 4 phases of factorization
      IDO = 14
c
c..... compute factorization using SSTSTRF
      CALL SSTSTRF ( IDO, NEQNS, COLU, ROWU, AMATU, LWORK,
     &               WORK, IPARAM, IERR )
c
c..... compute solution using SSTSTRS
c
c..... solve standard way
      IDO = 1
c..... solve for 1 RHS with leading dim = neqns
      NRHS = 1
      LDB = NEQNS
c
      CALL SSTSTRS ( IDO, LWORK, WORK, NRHS, BTS, LDB,
     &               IPARAM, IERR )
c
c     -----------------------------------
c     Compare solutions to exact solution
c     -----------------------------------
c
c..... Compute two-norm of the difference between exact and computed
c      for all solution techniques (SSxxTRS solution is in Bxx)
c
c..... compute differences
      CALL SAXPY ( NEQNS, -1., SOLN, 1, X, 1 )
      CALL SAXPY ( NEQNS, -1., SOLN, 1, BGE, 1 )
      CALL SAXPY ( NEQNS, -1., SOLN, 1, BPO, 1 )
      CALL SAXPY ( NEQNS, -1., SOLN, 1, BTS, 1 )
c
c..... compute norms
      ERRI  = SNRM2( NEQNS, X, 1 )
      ERRGE = SNRM2( NEQNS, BGE, 1 )
      ERRPO = SNRM2( NEQNS, BPO, 1 )
      ERRTS = SNRM2( NEQNS, BTS, 1 )
c
c..... print results
SEE ALSO
Golub, G. H. and C. F. Van Loan, Matrix Computations, second edition. Baltimore, MD: Johns Hopkins
University Press, 1989.
Liu, J. W., "Modification of the Minimum Degree Algorithm by Multiple Elimination," ACM Transactions
on Math Software, 11, (1985): pp. 141– 153.
NAME
DFAULTS – Assigns default values to the parameter arguments for SITRSOL(3S)
SYNOPSIS
CALL DFAULTS (iparam, rparam)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
Without DFAULTS, users of SITRSOL would have to define each required parameter in the iparam and
rparam array arguments explicitly. DFAULTS lets you easily assign default values to the parameters in
iparam and rparam. After you set the default values by using DFAULTS, you can then change any of the
parameter values explicitly, as needed.
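For example (the parameter choices shown are illustrative only, not recommendations):
      INTEGER IPARAM( 40 )
      REAL    RPARAM( 30 )
c..... load the documented defaults ...
      CALL DFAULTS( IPARAM, RPARAM )
c..... ... then override only what must differ: allow more
c      iterations (maxiter) and tighten the tolerance (tol)
      IPARAM(3) = 1000
      RPARAM(1) = 1.0E-8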
This routine has the following arguments:
iparam Integer array of dimension 40. (output)
Array of integer parameters required by SITRSOL.
rparam Real array of dimension 30. (output)
Array of real parameters required by SITRSOL.
To see the complete range of valid values for these arguments, see the SITRSOL(3S) man page.
Many of these parameters are set on exit from SITRSOL. Calling DFAULTS after a call to SITRSOL
overwrites (destroys) the values that SITRSOL returned in these parameters.
The default values for iparam and rparam (output of DFAULTS) are as follows:
iparam
iparam(1): isym Full or symmetric format flag.
=1 Matrix is in symmetric column pointer format.
iparam(2): itest Stopping criterion.
=0 Use ’natural’ (cheapest) stopping criterion for the chosen iterative
method.
iparam(3): maxiter Maximum number of iterations allowed.
= 500
iparam(4): niter On exit, SITRSOL sets this to the number of iterations actually performed.
=0
iparam(5): msglvl Flag to control the level of messages output.
=2 Warning and fatal messages only.
iparam(17): nvorth Number of previous Krylov basis vectors to which each new basis vector is made
orthogonal. (GMRES method only.)
= 10
iparam(18): nrstrt Number of iterations between restart in OMN[k].
= 20
iparam(19): irestrt Save-and-restart control flag.
=0 No save-and-restart.
iparam(20): iosave Save-and-restart unit number of the unformatted file, which is assumed to have
been opened by the user.
=0 This is not a valid unit number for the save-and-restart operation. If
you change the value of irestrt to enable save-and-restart, you also must
change the value of iosave.
iparam(21): mvformat Desired format for computation of matrix-vector products.
=1 Use jagged diagonal form. This requires more storage, but it offers
faster performance.
iparam(22): nicfmax Maximum number of times to try IC[k] factorization by using shifted IC
factorization. (See rparam(15) and rparam(16).)
= 11
iparam(23): nicfacs On exit, SITRSOL sets this to the actual number of shifted IC[k] factorizations
tried. (See rparam(15) and rparam(16).)
=0
iparam(24) — iparam(40)
Presently unused. These parameters are reserved for future use.
rparam
rparam(1): tol Stopping criterion tolerance.
= 1.0E-6
rparam(2): err On exit, SITRSOL sets this to the computed error estimate at each iteration.
= 0.0
rparam(3): alpha Absolute value of the estimate of the smallest eigenvalue of A. Currently, this
parameter is unused and is assumed to be 0.
= 0.0
rparam(4): beta Absolute value of the estimate of the largest eigenvalue of A. This is needed only
by the least-squares polynomial preconditioner (ipretyp=5).
= 0.0 SITRSOL computes an estimate of the spectral radius.
rparam(5): timscal On exit, SITRSOL sets this to the accumulated time (in seconds) to scale and
unscale the user matrix. (See the NOTES section.)
= 0.0
rparam(6): timsets On exit, SITRSOL sets this to the accumulated time (in seconds) to compute the
symbolic incomplete factorization. (See the NOTES section.)
= 0.0
rparam(7): timsetn On exit, SITRSOL sets this to the accumulated time (in seconds) to compute the
numerical incomplete factorization. (See the NOTES section.)
= 0.0
rparam(8): timset On exit, SITRSOL sets this to the accumulated total time (in seconds) to perform
the preconditioner setup. If incomplete factorization is used, this includes both
timsets and timsetn. (See the NOTES section.)
= 0.0
rparam(9): timpre On exit, SITRSOL sets this to the accumulated total time (in seconds) to apply the
preconditioner in the iteration phase of the solution process. (See the NOTES
section.)
= 0.0
rparam(10): timmvs On exit, SITRSOL sets this to the accumulated time (in seconds) to convert from
column pointer to jagged diagonal format. If parallel processing is used, this also
includes the setup time to perform the parallel matrix vector operations. (See the
NOTES section.)
= 0.0
rparam(11): timmv On exit, SITRSOL sets this to the accumulated time (in seconds) to perform the
matrix vector product (not including those performed in applying the polynomial
preconditioners). (See the NOTES section.)
= 0.0
rparam(12): timmtv On exit, SITRSOL sets this to the accumulated time (in seconds) to perform the
transpose matrix vector product (not including those in applying the polynomial
preconditioners). (See the NOTES section.)
= 0.0
rparam(13): timit On exit, SITRSOL sets this to the accumulated time (in seconds) spent in the
iterative routine (not including the time spent computing matrix vector products or
applying the preconditioners). (See the NOTES section.)
= 0.0
rparam(14): timtot On exit, SITRSOL sets this to the accumulated total time (in seconds) for this call
to SITRSOL, plus that of previous calls if not reset. (See the NOTES section.)
= 0.0
rparam(15): gammin Minimum value for shift factor γ. For some problems, IC[k] preconditioning fails in
the factorization. In many cases, "shifting" the diagonal elements allows the
factorization to be computed for this modified matrix.
= 0.0
rparam(16): gammax Maximum value for shift factor γ. If the IC[k] factorization fails, SITRSOL
increments γ and tries again. γ may take on nicfmax different values between
gammin and gammax.
On exit from SITRSOL (if nicfacs > 1), gammax contains the actual value of
gamma used to compute the factorization.
= 0.3
rparam(17) – rparam(30)
Presently unused. These parameters are reserved for future use.
NOTES
If the timing parameters, rparam(5) through rparam(14), are not reset to 0.0 (for example, by DFAULTS),
timing information for subsequent calls to SITRSOL will be added to existing timing information.
If multiple CPUs are used, rparam(5) through rparam(14) report the cumulative time for all CPUs.
SEE ALSO
INTRO_SPARSE(3S) for an example of a Fortran program that uses DFAULTS and SITRSOL
SITRSOL(3S) for a more complete description of iparam and rparam
Scientific Libraries User’s Guide
NAME
SITRSOL – Solves a real general sparse system, using a preconditioned conjugate gradient-like method
SYNOPSIS
CALL SITRSOL (method, ipath, neqns, nvars, x, b, icolptr, irowind, value, liwork, iwork,
lrwork, rwork, iparam, rparam, ierr)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SITRSOL uses any of several iterative techniques to solve a real general sparse system of equations.
Because no single robust iterative algorithm for solving sparse linear systems exists, SITRSOL lets users
select from a wide variety of iterative techniques, preconditioning schemes, and many tuning parameters.
You can initialize the iparam and rparam tuning parameter arguments by using a call to DFAULTS. Then
you must change only selected parameter values, rather than setting up the entire arrays of parameters
manually.
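For example, a minimal sketch of that pattern follows; the DFAULTS argument list is assumed here to be
the two parameter arrays (see DFAULTS(3S) for the exact calling sequence), and the array and scalar
names are placeholders for your own declarations:
      INTEGER IPARAM(40)
      REAL    RPARAM(30)
C.....Load the documented defaults (calling sequence assumed;
C     see DFAULTS(3S) for the exact argument list).
      CALL DFAULTS(IPARAM, RPARAM)
C.....Override only the iteration limit (iparam(3): maxiter).
      IPARAM(3) = 1000
C.....Solve with the preconditioned conjugate gradient method.
      CALL SITRSOL('PCG', 2, NEQNS, NVARS, X, B, ICOLPTR,
     &             IROWIND, VALUE, LIWORK, IWORK, LRWORK,
     &             RWORK, IPARAM, RPARAM, IERR)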
This routine has the following arguments:
method Character*3. (input)
Name used to select the iterative method.
= 'BCG' Biconjugate gradient method
= 'CGN' Conjugate gradient method applied to the equations:
        AA^T y = b,  x = A^T y  (Craig's method)
= 'CGS' Conjugate gradient squared method
= 'GMR' Generalized minimum residual (GMRES) method
= 'GMN' Orthomin or generalized conjugate residual (GCR) method
= 'PCG' Preconditioned conjugate gradient method
ipath Integer. (input)
Value used to control the execution path in the solver. This argument is useful when the driver is
used to solve similar problems or a large problem in pieces.
= 1 Processes only the structure of the matrix. No solution is computed.
= 2 Processes both the structure and values of the matrix. The solution is computed.
= 3 Processes only the values of the matrix. The solution is computed. It is assumed that
SITRSOL has been called with ipath equal to 1 or 2 and that the structure previously set up
is used.
= 4 Solves the same linear system with different right-hand side.
= 5 Restarts from a previously saved run.
neqns Integer. (input)
Number of equations (rows) in the system.
rparam(7): timsetn Accumulated time (in seconds) to compute the numerical incomplete factorization.
(input and output)
DFAULTS returns timsetn = 0.0.
rparam(8): timset Accumulated total time (in seconds) to perform the preconditioner setup. (input and
output)
If incomplete factorization is used, this includes both timsets and timsetn.
DFAULTS returns timset = 0.0.
rparam(9): timpre Accumulated total time (in seconds) to apply the preconditioner in the iteration phase
of the solution process. (input and output)
DFAULTS returns timpre = 0.0.
rparam(10): timmvs Accumulated time (in seconds) to convert from column pointer to jagged diagonal
format. (input and output)
If you use parallel processing, this also includes the setup time to perform the parallel
matrix vector operations.
DFAULTS returns timmvs = 0.0.
rparam(11): timmv Accumulated time (in seconds) to perform the matrix vector product. (input and
output)
This does not include the products that apply the polynomial preconditioners.
DFAULTS returns timmv = 0.0.
rparam(12): timmtv Accumulated time (in seconds) to perform the transpose matrix vector product. (input
and output)
This does not include the products that apply the polynomial preconditioners.
DFAULTS returns timmtv = 0.0.
rparam(13): timit Accumulated time (in seconds) spent in the iterative routine. (input and output)
This does not include the time spent computing matrix vector products or applying
the preconditioners.
DFAULTS returns timit = 0.0.
rparam(14): timtot Accumulated total time (in seconds) for this call to SITRSOL. (input and output)
DFAULTS returns timtot = 0.0.
rparam(15): gammin Minimum value for shift factor γ. (input)
For some problems, IC[k] preconditioning fails in the factorization. In many cases,
"shifting" the diagonal elements (that is, letting a(i,i) = (1 + γ) . a(i,i)) allows the
factorization to be computed for this modified matrix.
DFAULTS returns gammin=0.0.
Workspace
The following are three methods for estimating your workspace needs:
Rough estimate Fastest (only one multiply), but least accurate.
SITRSOL estimate Much slower, but also a lot more accurate.
Hand-coded estimate If you already know certain information about the size of the final factorization, a
hand-coded SITRSOL estimation algorithm with your information will be more
accurate than SITRSOL’s calculations.
Rough estimate
You can make a very rough estimate of your workspace needs by setting
liwork = lrwork = 6 . nza
where nza is the number of nonzero elements in matrix A (= icolptr(neqns+1)– 1). This estimate is usually
sufficient.
If you are not using certain memory-intensive preconditioning or matrix formatting options, you can refine
this estimate further:
A. If you are not using IC[k] or ILU[k] preconditioning, subtract 2 . nza from your previous estimate for
liwork and lrwork.
B. If you are not using jagged diagonal format, subtract 2 . nza from your previous estimate for liwork and
lrwork.
One, both, or neither of the preceding conditions might be true; therefore, the estimate could end up being
any of the following:
4 . nza If only one or the other of A and B were true
2 . nza If both A and B were true
6 . nza If neither A nor B were true
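As a sketch, this arithmetic can be coded directly; USEIC and USEJAD are hypothetical logicals standing
for conditions A and B:
      NZA = ICOLPTR(NEQNS+1) - 1
      MULT = 6
C.....Condition A: no IC[k] or ILU[k] preconditioning.
      IF (.NOT. USEIC) MULT = MULT - 2
C.....Condition B: no jagged diagonal format.
      IF (.NOT. USEJAD) MULT = MULT - 2
      LIWORK = MULT * NZA
      LRWORK = MULT * NZA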
SITRSOL estimate
You can get a more accurate estimate by calling SITRSOL with liwork or lrwork set to 0. This causes
SITRSOL to generate an error flag (– 10004 or – 10005) and to return an estimate of workspace requirements
in iwork(1) for lrwork and iwork(2) for liwork. You can then use these estimates in another call to
SITRSOL. In computing this estimate, SITRSOL uses the precise formulas that follow in the Algorithm for
Accurate Workspace Estimate subsection.
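A minimal sketch of that two-call pattern follows (argument names as in the SYNOPSIS; iwork is assumed
to be dimensioned at least 2 so that the estimates can be returned):
      LIWORK = 0
      LRWORK = 0
      CALL SITRSOL(METHOD, IPATH, NEQNS, NVARS, X, B, ICOLPTR,
     &             IROWIND, VALUE, LIWORK, IWORK, LRWORK, RWORK,
     &             IPARAM, RPARAM, IERR)
      IF (IERR .EQ. -10004 .OR. IERR .EQ. -10005) THEN
C........SITRSOL returned its workspace estimates.
         LRWORK = IWORK(1)
         LIWORK = IWORK(2)
C........Check the estimates against the declared sizes of
C        iwork and rwork, then repeat the call.
      END IF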
Hand-coded estimate
If you are using IC[k] or ILU[k] preconditioning and you already know the number of nonzero elements in
the IC[k] factor matrix L or in the ILU[k] factor matrices L and U, you can get the most accurate estimate
by hand-coding the high-precision algorithm used by SITRSOL.
Then you can use your exact numbers in the algorithm in formulas for which SITRSOL has only estimates.
This means your final result will be more accurate than SITRSOL’s, even though the algorithm is the same.
NOTES
This section discusses parallel processing and workspace considerations.
Using Parallel Processing in SITRSOL
SITRSOL is designed to exploit the parallel processing capabilities of Cray Y-MP systems. In particular,
the preconditioners and matrix-vector operations are designed to achieve significant speedup on multiple
CPUs; however, the parallelism is designed to be effective only for large problems. Small problems will not
benefit, and performance probably will be degraded. What constitutes a "small" problem or a "large"
problem is difficult to define. Also, a large gray area exists in which using fewer than all CPUs gives better
performance than using all CPUs. Experimentation is the best way to decide on the optimal number of
CPUs.
To select the number of CPUs, define the NCPUS environment variable to the desired value. SITRSOL will
then obtain this value and try to use that number of CPUs. In a batch environment, it is unusual to get all of
the physical CPUs on the machine. In this case, it is better to request a smaller number of CPUs than is
physically available. If you do not define the NCPUS variable, the default value for NCPUS will be the total
number of physical processors on the machine.
Timing and parallel processing
SITRSOL uses the system timing function SECOND(3F), which is a real-valued function that returns the
accumulated CPU time for all processors. If you use parallel processing and you want wall-clock timing
information, replace the SECOND function with the following function, which uses IRTC(3I) to do the
timing:
      REAL FUNCTION SECOND()
C.....CRAY Y-MP C90 clock period
      PARAMETER ( CP = 4.2E-9 )
C.....CRAY Y-MP clock period
C     PARAMETER ( CP = 6.0E-9 )
C
      SECOND = FLOAT(IRTC()) * CP
      RETURN
      END
Based on your system, use the appropriate parameter CP. You should replace the SECOND system function
because it returns the accumulated CPU time for all CPUs; if you do not replace SECOND, the multiple-CPU
timings will always be worse than the single-CPU timings.
A drawback exists to using the IRTC function. It returns the real system time and does not subtract time
spent being swapped out. Thus, in a batch environment, timing information typically will not be consistent
between two identical runs.
If ( mvformat = 0 ) then
Iuse(1)= Iret(1) = neqns
Ruse(1)= Rret(1) = nscale
Else
Iuse(1)= nzpap + 4*neqns + maxnz + nsegs + 2
Iret(1)= nzpap + 3*neqns + maxnz + nsegs + 2
Ruse(1)= nzpap + neqns + nscale
Rret(1)= nzpap + nscale
End If
where
If ( iscale > 0 ) then
nscale = neqns
Else If ( iscale = 0 ) then
nscale = 0
End If
where
• nzlE = Estimated number of nonzero elements in L as defined by maxlfil on input
• nzlA = Actual number of nonzero elements in L as defined by maxlfil on exit from the preconditioner
setup phase
If ( ncpus = 1 ) Then
nparU = nparR = 0
Else If ( ncpus > 1 ) Then
nparU = 6*neqns + 4
nparR = 4*neqns + 4
End If
In the hand-coded version, if you already know the value of nzlA, you can improve your workspace estimate
by setting nzlE = nzlA.
For incomplete LU preconditioning:
Iuse(2) = max( (4*neqns + 2*nzlE + nzuE + 3), (2*neqns + nzlA + nzuA + 2 + nparU) )
Iret(2) = nzlA + nzuA + 2*neqns + 2 + nparR
If ( ncpus = 1 ) then
   Ruse(2) = nzlA + nzuA + neqns
Else If ( ncpus > 1 ) then
   Ruse(2) = nzlA + nzuA + max( ncpus*neqns, nzlA, nzuA )
End If
Rret(2) = nzlA + nzuA
where
• nzlE = Estimated number of nonzero elements in L as defined by maxlfil on input
• nzuE = Estimated number of nonzero elements in U as defined by maxufil on input
• nzlA = Actual number of nonzero elements in L as defined by maxlfil on exit from the preconditioner
setup phase
• nzuA = Actual number of nonzero elements in U as defined by maxufil on exit from the preconditioner
setup phase
If ( ncpus = 1 ) Then
nparU = nparR = 0
Else If ( ncpus > 1 ) Then
nparU = max(nzlA,nzuA) + 5*neqns + 4
nparR = 4*neqns + 4
End If
In the hand-coded version, if you already know the values of nzlA and nzuA, you can improve your
workspace estimate by setting nzlE = nzlA and nzuE = nzuA.
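As a sketch, the incomplete LU numbers above can be evaluated directly; IUSE2 and IRET2 are
hypothetical variables holding Iuse(2) and Iret(2):
      IF (NCPUS .EQ. 1) THEN
         NPARU = 0
         NPARR = 0
      ELSE
         NPARU = MAX(NZLA, NZUA) + 5*NEQNS + 4
         NPARR = 4*NEQNS + 4
      END IF
C.....Hand-coded refinement: when nzlA and nzuA are known,
C     use them for the estimates as well.
      NZLE = NZLA
      NZUE = NZUA
      IUSE2 = MAX(4*NEQNS + 2*NZLE + NZUE + 3,
     &            2*NEQNS + NZLA + NZUA + 2 + NPARU)
      IRET2 = NZLA + NZUA + 2*NEQNS + 2 + NPARR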
SEE ALSO
DFAULTS(3S)
INTRO_SPARSE(3S) for an example of using this routine and the other sparse matrix routines
NAME
SSGETRF – Factors a real sparse general matrix with threshold pivoting implemented
SYNOPSIS
CALL SSGETRF (ido, neqns, icolptr, irowind, value, lwork, work, iparam, thresh, ierror)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
Given a real sparse general matrix A, SSGETRF computes the LU factorization of PA(transpose of P), in
which P is an internally computed permutation matrix. Threshold pivoting is implemented for stability.
This routine has the following arguments:
ido Integer. (input)
Controls the execution path through the routine. ido is a two-digit integer whose digits are
represented on this man page as i and j. i indicates the starting phase of execution, and j
indicates the ending phase. For SSGETRF, there are four phases of execution, as follows:
Phase 1: Fill reduction reordering
Phase 2: Symbolic factorization
Phase 3: Determination of the node execution sequence and the storage requirement for the
frontal matrices
Phase 4: Numerical factorization
If a previous call to the routine has computed information from previous phases, execution can
start at any phase.
ido = 10i + j 1≤i≤j≤4
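For example, the encoding gives the following values:
C.....Run all four phases in a single call (i = 1, j = 4):
      IDO = 14
C.....Rerun only the numerical factorization (phase 4),
C     reusing the results of phases 1-3 from an earlier call:
      IDO = 44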
neqns Integer. (input)
Number of equations (or unknowns, rows, or columns).
icolptr Integer array of dimension neqns + 1. (input)
Column pointer array for the sparse matrix A. The first and last elements of the array must be
set as follows:
icolptr(1) = 1 icolptr(neqns+1) = nza + 1
where nza is the number of nonzero elements in the sparse matrix A.
irowind Integer array of dimension nza (see icolptr). (input)
Row indices array for the sparse matrix A.
value Real array of dimension nza (see icolptr). (input)
Array of nonzero values for the sparse matrix A. The icolptr, irowind, and value arguments
taken together contain the input matrix in sparse column format. See the introduction to the
sparse solvers (INTRO_SPARSE(3S)) for a full description of the sparse column format.
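As an illustration of this format, a hypothetical 3 x 3 matrix with nza = 5 could be passed as follows:
C.....      | 1.0  0.0  4.0 |
C.....  A = | 2.0  3.0  0.0 |     nza = 5
C.....      | 0.0  0.0  5.0 |
      INTEGER ICOLPTR(4), IROWIND(5)
      REAL    VALUE(5)
      DATA ICOLPTR / 1, 3, 4, 6 /
      DATA IROWIND / 1, 2, 2, 1, 3 /
      DATA VALUE   / 1.0, 2.0, 3.0, 4.0, 5.0 /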
lwork Integer. (input)
Length of the work array work. Workspace requirements vary from phase to phase. If lwork is
not sufficient to execute a particular phase successfully, the routine will return with an indication
of how much workspace is required to continue. See the Workspace subsection.
work Real array of dimension lwork. (input and output)
Work array used to hold the results of each phase that are needed to process the next phase.
Between calls to SSGETRF to compute subsequent phases, the user must not modify this array.
iparam Integer array of dimension 13. (input)
List of user control parameters. The value of iparam(1) controls the use of the parameter array:
0 Uses default values for all parameters.
1 Overrides default values by using iparam.
For a full description, see the Parameters subsection.
thresh Real. (input)
The thresh variable determines whether pivoting occurs. 0 ≤ thresh ≤ 1.
ierror Integer. (output)
Error code to report any error condition detected.
0 Normal completion.
–1 ido is not a valid path for a fresh start.
–2 ido is not a valid path for a restart run.
– 10000 Input matrix structure is incorrect.
– k0001 Insufficient storage allocated for phase k. (1 ≤ k ≤ 4)
– 20002 Fatal error from the symbolic factorization. Either the input structure is incorrect or the
active part of array work was changed between successive calls to SSGETRF.
– 40002 Input matrix structure is not consistent with the structure of the lower triangular factor.
The active part of array work may have been changed between successive calls to
SSGETRF.
– 40301 Fatal error from the numerical factorization. The input matrix is numerically singular.
Parameters
The following is a list of user control parameters and their default values to be used by SSGETRF and
SSGETRS routines. To use the default values, pass a constant 0 as the iparam argument, as follows:
      CALL SSGETRF(IDO, NEQ, ICOL, IROW, VAL, LWK, WORK, 0, THR, IER)
Phase 1:
Use(1) = 150 + 12*neqns + 4*nza + 4
Ret(1) = 150 + 5*neqns + 3*nsup + nnzsym + 3
Phase 2:
I1 = Ret(1) + ngssubs + 4*neqns + nsup + 2
I2 = Ret(1) + 2*ngssubs + 10*nsup + 2*neqns + 4
If adjacency structure is saved
Use(2) = max ( I1, I2 )
Ret(2) = I2
Otherwise
Use(2) = max ( I1, I2 - nnzsym - neqns - 1 )
Ret(2) = I2(1)
Phase 3:
Use(3) = Ret(2) + 3*nsup
Ret(3) = Ret(2) + nsup
Phase 4:
If the sort information is saved
Use(4) ≥ Ret(3) + 5*neqns + nfctnzs + 3*nza + 4
Ret(4) = Ret(3) + nza + lusize
Otherwise
Use(4) ≥ Ret(3) + 6*neqns + nfctnzs + 2*nza + 4
Ret(4) = Ret(3) + lusize
SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSGETRS(3S) to solve one or more right-hand sides by using the factorization computed by SSGETRF
NAME
SSGETRS – Solves a real sparse general system, using the factorization computed in SSGETRF(3S)
SYNOPSIS
CALL SSGETRS (ido, lwork, work, nrhs, rhs, ldrhs, iparam, ierror)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
Given the LU factorization computed from SSGETRF and a (set of) right-hand side(s), SSGETRS solves the
linear systems.
This routine has the following arguments:
ido Integer. (input)
Variable used to control the execution path in SSGETRS.
= 1 Solve AX = B
= 2 Forward solve
= 3 Backward solve
Calling SSGETRS with ido = 2 and again with ido = 3 gives the same result as calling SSGETRS
once with ido = 1.
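A minimal sketch of that equivalence (arguments as in the SYNOPSIS):
C.....One-call solve of AX = B:
      CALL SSGETRS(1, LWORK, WORK, NRHS, RHS, LDRHS, IPARAM, IERROR)
C.....Equivalent two-step solve: forward, then backward:
      CALL SSGETRS(2, LWORK, WORK, NRHS, RHS, LDRHS, IPARAM, IERROR)
      CALL SSGETRS(3, LWORK, WORK, NRHS, RHS, LDRHS, IPARAM, IERROR)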
lwork Integer. (input)
Length of the work array work as in SSGETRF.
work Real array of dimension lwork. (input and output)
Work array exactly as output from SSGETRF. The user must not have modified this array because
it contains information about the LU factorization.
nrhs Integer. (input)
Number of right-hand sides.
rhs Real array of dimension (ldrhs,nrhs). (input and output)
On entry, rhs contains the nrhs vectors. If ido = 1 or 2, the vectors are the right-hand side vectors
b from the system of equations Ax = b. If ido = 3, the right-hand sides should be the intermediate
result z obtained by calling SSGETRS with ido = 2.
On exit, rhs contains the nrhs corresponding solution vectors.
ldrhs Integer. (input)
Leading dimension of array rhs exactly as specified in the calling program.
iparam Integer array of dimension 13. (input)
List of user control options as in SSGETRF. Only four elements, iparam(1), iparam(2), iparam(3),
and iparam(5), are required for the solution phase.
SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSGETRF(3S) to compute the factorization used by SSGETRS
NAME
SSPOTRF – Factors a real sparse symmetric definite matrix
SYNOPSIS
CALL SSPOTRF (ido, neqns, icolptr, irowind, value, lwork, work, iparam, ierror)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
Given a real sparse symmetric definite matrix A, SSPOTRF computes the LD(transpose of L) factorization
of PA(transpose of P); P is an internally computed permutation matrix.
This routine has the following arguments:
ido Integer. (input)
Controls the execution path through the routine. ido is a two-digit integer whose digits are
represented on this man page as i and j. i indicates the starting phase of execution, and j indicates
the ending phase. For SSPOTRF, there are four phases of execution, as follows:
Phase 1: Fill reduction reordering
Phase 2: Symbolic factorization
Phase 3: Determination of the node execution sequence and the storage requirement for the
frontal matrices
Phase 4: Numerical factorization
If a previous call to the routine has computed information from previous phases, execution can
start at any phase.
ido = 10i + j 1≤i≤j≤4
neqns Integer. (input)
Number of equations (or unknowns, rows, or columns).
icolptr Integer array of dimension neqns + 1. (input)
Column pointer array for the sparse matrix A. The first and last elements of the array must be set
as follows:
icolptr(1) = 1 icolptr(neqns+1) = nza + 1
where nza is the number of nonzero elements in the sparse matrix A.
irowind Integer array of dimension nza (see icolptr). (input)
Row indices array for the sparse matrix A.
Workspace
You can determine the amount of workspace needed to execute phase k (denoted Use(k)) and the amount of
workspace retained after the execution of phase k (denoted Ret(k)) by using the following notation:
ncpus = Number of CPUs.
neqns = Number of unknowns or equations.
nsup = Number of supernodes.
This can be obtained from work(32) after phase 1.
nza = Number of nonzero elements in A (=icolptr(neqns+1)– 1).
nadj = 2*(nza – neqns), size of the adjacency structure of A.
nfctnzs = Number of nonzero elements in L.
This can be obtained from work(11) after phase 1.
ngssubs = Number of row subscripts required to represent L.
This can be obtained from work(14) after phase 1.
maxrow = Maximum number of nonzero elements in a row of L.
This can be obtained from work(20) after phase 1.
maxsup = Maximum size of a supernode.
This can be obtained from work(21) after phase 1.
minstk = Minimum amount of workspace required for the temporary frontal matrices.
This can be obtained from work(22) after phase 3.
Phase 1:
Use(1) = 150 + 2*nadj + 11*neqns + 4
Ret(1) = 150 + 4*neqns + 3*nsup + nadj + 3
Phase 2:
I1 = Ret(1) + ngssubs + 3*neqns + nsup + 1
I2 = Ret(1) + 2*ngssubs + 10*nsup + 3
If saving the adjacency structure
Use(2) = max ( I1, I2-(2*nza+1) )
Ret(2) = 150 + 3*neqns + 2*ngssubs + 13*nsup + 5
Otherwise
Use(2) = max ( I1, I2 )
Ret(2) = Ret(1) + 2*ngssubs + 10*nsup + 3
Phase 3:
For single processing (ncpus = 1)
Use(3) = Ret(2) + 2*nsup
Ret(3) = Ret(2)
For multiple processing (ncpus > 1)
Use(3) = Ret(2) + 12*nsup + 2
Ret(3) = Ret(2) + 8*nsup + 1
Phase 4:
For single processing (ncpus = 1)
Use(4) = Ret(3) + neqns + nfctnzs + 2*(maxsup+maxrow) +
nsup + minstk
For multiple processing (ncpus > 1)
Use(4) = Ret(3) + neqns + nfctnzs + ncpus + maxsup + 3*(nsup) +
(ncpus+1)*(maxsup+2*maxrow) + minstk
Ret(4) = Ret(3) + neqns + nfctnzs
SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSPOTRS(3S) to solve one or more right-hand sides, using the factorization computed by SSPOTRF
NAME
SSPOTRS – Solves a real sparse symmetric definite system, using the factorization computed in
SSPOTRF(3S)
SYNOPSIS
CALL SSPOTRS (ido, lwork, work, nrhs, rhs, ldrhs, iparam, ierror)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
Given the LDL^T factorization of PAP^T computed from SSPOTRF and a (set of) right-hand side(s),
SSPOTRS solves the following linear system for the solution of the system Ax = b. P is an internally
computed permutation matrix.
(PAP^T)y = Pb,  x = P^T y
This routine has the following arguments:
ido Integer. (input)
Variable used to control the execution path in SSPOTRS.
ido = 1 Solves P^T(LDL^T)(Px) = P^T(P(b)) (that is, Ax = b)
ido = 2 Solves L(Px) = P(rhs)
ido = 3 Solves Dx = rhs
ido = 4 Solves P^T L^T x = P^T(rhs)
ido = 5 Solves (LD^(1/2))(Px) = P(rhs)
ido = 6 Solves P^T(LD^(1/2))^T x = P^T(rhs)
lwork Integer. (input)
Length of the work array work as in SSPOTRF.
work Real array of dimension lwork. (input and output)
Work array exactly as output from SSPOTRF. The user must not have modified this array because
it contains information about the LD(transpose of L) factorization.
nrhs Integer. (input)
Number of right-hand sides.
rhs Real array of dimension (ldrhs,nrhs). (input and output)
On entry, rhs contains the nrhs right-hand side b for which to solve. On exit, rhs contains the nrhs
corresponding solutions.
ldrhs Integer. (input)
Leading dimension of array rhs exactly as specified in the calling program.
SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSPOTRF(3S) to compute the factorization used by SSPOTRS
NAME
SSTSTRF – Factors a real sparse general matrix with a symmetric nonzero pattern (no form of pivoting is
implemented)
SYNOPSIS
CALL SSTSTRF (ido, neqns, icolptr, irowind, value, lwork, work, iparam, ierror)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
Given a real sparse general matrix A with a symmetric nonzero pattern, SSTSTRF computes the LU
factorization of PA(transpose of P). P is an internally computed permutation matrix. No form of pivoting is
implemented.
This routine has the following arguments:
ido Integer. (input)
Controls the execution path through the routine. ido is a two-digit integer whose digits are
represented on this man page as i and j. i indicates the starting phase of execution, and j
indicates the ending phase. For SSTSTRF, there are four phases of execution, as follows:
Phase 1: Fill reduction reordering
Phase 2: Symbolic factorization
Phase 3: Determination of the node execution sequence and the storage requirement for the
frontal matrices
Phase 4: Numerical factorization
If a previous call to the routine has computed information from previous phases, execution may
start at any phase.
ido = 10i + j 1≤i≤j≤4
neqns Integer. (input)
Number of equations (or unknowns, rows, or columns).
icolptr Integer array of dimension neqns + 1 . (input)
Column pointer array for the sparse matrix A. The first and last elements of the array must be
set as follows:
icolptr(1) = 1 icolptr(neqns+1) = nza + 1
where nza is the number of nonzero elements in the sparse matrix A.
irowind Integer array of dimension nza (see icolptr). (input)
Row indices array for the sparse matrix A.
Parameters
The following is a list of user control parameters and their default values to be used by the SSTSTRF and
SSTSTRS routines. To use the default values, set iparam(1) = 0, or pass a constant 0 as the iparam
argument, as follows:
      CALL SSTSTRF(IDO, NEQ, ICOL, IROW, VAL, LWK, WORK, 0, IER)
iparam(11) Size of the fixed block to accommodate the grouping of temporary frontal matrices. This is
needed only when you want to exploit the parallelism in the elimination of independent
supernodes; in this case, workspace for temporary frontal and update matrices of the
independent supernodes are allocated using a fixed-block scheme. When in use, iparam(11)
must be greater than or equal to iparam(10).
Default is 0.
iparam(12) 0 Check for valid input structure.
1 Do not check input structure.
Default is 0.
Workspace
You can determine the amount of workspace needed to execute phase k (denoted Use(k)) and the amount of
workspace retained after the execution of phase k (denoted Ret(k)) by using the following notation:
ncpus = Number of CPUs.
neqns = Number of unknowns or equations.
nsup = Number of supernodes. This can be obtained from work(32) after phase 1.
nza = Number of nonzero elements in A (=icolptr(neqns+1)– 1).
nadj = (nza – neqns), size of the adjacency structure of A.
nfctnzs = Number of nonzero elements in L. This can be obtained from work(11) after phase 1.
ngssubs = Number of row subscripts required to represent L. This can be obtained from work(14) after
phase 1.
minstk = Minimum amount of workspace required for the temporary frontal matrices. This can be
obtained from work(22) after phase 3.
Phase 1:
Use(1) = 150 + 2*nadj + 11*neqns + 4
Ret(1) = 150 + 4*neqns + 3*nsup + nadj + 3
Phase 2:
Use(2) = Ret(1) + 2*ngssubs + 10*nsup + neqns + 5
Phase 3:
For single processing (ncpus = 1)
Use(3) = Ret(2) + 2*nsup
Ret(3) = Ret(2)
Phase 4:
For single processing (ncpus = 1)
Use(4) = Ret(3) + neqns + nfctnzs + nsup + minstk
SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSTSTRS(3S) to solve one or more right-hand sides, using the factorization computed by SSTSTRF
NAME
SSTSTRS – Solves a real sparse general system with a symmetric nonzero pattern, using the factorization
computed in SSTSTRF(3S)
SYNOPSIS
CALL SSTSTRS (ido, lwork, work, nrhs, rhs, ldrhs, iparam, ierror)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
Given the LU factorization of PAP^T computed from SSTSTRF(3S) and a (set of) right-hand side(s),
SSTSTRS solves the following linear system for the solution of the system Ax = b.
P is an internally computed permutation matrix. P^T is the transpose of P.
(PAP^T)y = Pb,  x = P^T y
This routine has the following arguments:
ido Integer. (input)
Variable used to control the execution path in SSTSTRS.
ido = 1 Solves P^T(LU)Px = b (that is, Ax = b) for x
ido = 2 Solves P^T Lz = b for z
ido = 3 Solves UPx = z for x
Calling SSTSTRS with ido = 2 and again with ido = 3 has the same result as calling SSTSTRS
once with ido = 1.
lwork Integer. (input)
Length of the work array work as in SSTSTRF.
work Real array of dimension lwork. (input and output)
Work array exactly as output from SSTSTRF. The user must not have modified this array
because it contains information about the LU factorization.
nrhs Integer. (input)
Number of right-hand sides.
rhs Real array of dimension (ldrhs, nrhs). (input and output)
On entry, rhs contains the nrhs right-hand side vectors. If ido = 1 or 2, the right-hand side
vectors should be b from the system of equations Ax = b. If ido = 3, the right-hand sides
should be the intermediate result z obtained by calling SSTSTRS with ido = 2.
On exit, rhs contains the nrhs solution vectors.
ldrhs Integer. (input)
Leading dimension of array rhs exactly as specified in the calling program.
SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSTSTRF(3S) to compute the factorization used by SSTSTRS
NAME
INTRO_SPEC – Introduction to solvers for special linear systems
IMPLEMENTATION
UNICOS systems
DESCRIPTION
All solvers for special linear systems run only on Cray PVP systems.
The following table lists the solvers for special linear systems. The first name in each block of the table is
the name of the man page that documents all of the routines listed in that block.
NAME
FOLR, FOLRP – Solves first-order linear recurrences
SYNOPSIS
CALL FOLR (n, x, incx, a, inca)
CALL FOLRP (n, x, incx, a, inca)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
FOLR solves first-order linear recurrences, as follows:
a(1) = a(1)
a(i) = a(i) - x(i)*a(i-1)   for i = 2, 3, ..., n
FOLRP solves first-order linear recurrences, as follows:
a(1) = a(1)
a(i) = a(i) + x(i)*a(i-1)   for i = 2, 3, ..., n
These routines have the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 1, neither routine performs any computation.
x Real array of dimension 1+(n – 1) . incx . (input)
Contains multiplier vector. The first element of x in the recurrence is arbitrary.
incx Integer. (input)
Increment between elements of x.
a Real array of dimension 1+(n – 1) . inca . (input and output)
Contains operand vector. On input, a contains the initial values for the recurrence relation. On
output, a receives the result of the linear recurrence.
inca Integer. (input)
Increment between recurrence elements of a.
NOTES
When working backward (incx < 0 or inca < 0), each routine starts at the end of the vector and moves
backward, as follows:
x (1−incx . (n −1)), x (1−incx . (n −2)),. . ., x (1)
a (1−inca . (n −1)), a (1−inca . (n −2)),. . ., a (1)
CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.
EXAMPLES
The following examples illustrate the use of these routines with positive and negative increments. (The first
three executable statements of each example are Fortran 90 array syntax.)
Example 1: FOLR with positive increments
      PROGRAM EX1
      PARAMETER (NMAX = 100)
      REAL X(NMAX), A(NMAX), A1(NMAX)
C
C.....Load vectors with random numbers, initialize N.
      X = RANF()
      A = RANF()
      A1 = A
      N = NMAX
C
C.....Call to FOLR
      CALL FOLR(N,X,1,A1,1)
C
C.....Equivalent FORTRAN code
      A(1) = A(1)
      DO 10 I = 2, N
         A(I) = A(I) - X(I)*A(I-1)
   10 CONTINUE
C
C.....Verify results
      A = A - A1
      PRINT *, 'Difference = ', SNRM2(N,A,1)
      END
SEE ALSO
FOLR2(3S) and FOLR2P(3S) to solve the same recurrences as solved by FOLR and FOLRP, without
overwriting the a operand
FOLRC(3S) to solve a first-order linear recurrence by using scalar multiplier
FOLRN(3S) and FOLRNP(3S) to solve for only the last term of the same recurrences as solved by FOLR and
FOLRP
SOLR(3S), SOLR3(3S), SOLRN(3S) to solve various forms of second-order linear recurrence
NAME
FOLR2, FOLR2P – Solves first-order linear recurrences without overwriting the operand vector
SYNOPSIS
CALL FOLR2 (n, x, incx, a, inca, b, incb)
CALL FOLR2P (n, x, incx, a, inca, b, incb)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
FOLR2 and FOLR2P solve the same first-order linear recurrences as FOLR(3S) and FOLRP(3S), but they
store the result in a separate vector b instead of overwriting the operand vector a.
The following is the Fortran equivalent of FOLR2 (given for case incx = inca = incb = 1):
      B(1)=A(1)
      DO 10 I=2,N
         B(I)=A(I)-X(I)*B(I-1)
   10 CONTINUE
The following is the Fortran equivalent of FOLR2P (given for case incx = inca = incb = 1):
      B(1)=A(1)
      DO 10 I=2,N
         B(I)=A(I)+X(I)*B(I-1)
   10 CONTINUE
NOTES
When working backward (incx < 0, inca < 0 or incb < 0), each routine starts at the end of the vector and
moves backward, as follows:
x (1−incx . (n −1)), x (1−incx . (n −2)),. . ., x (1)
a (1−inca . (n −1)), a (1−inca . (n −2)),. . ., a (1)
b (1−incb . (n −1)), b (1−incb . (n −2)),. . ., b (1)
CAUTIONS
Do not specify inca or incb as 0, because unpredictable results may occur.
SEE ALSO
FOLR(3S), FOLRP(3S) to solve the same recurrences as solved by FOLR2 and FOLR2P, but they overwrite
the a operand rather than producing a separate result vector
FOLRC(3S) to solve a first-order linear recurrence by using scalar multiplier
FOLRN(3S), FOLRNP(3S) to solve for only the last term of the same recurrences as solved by FOLR and
FOLRP
SOLR(3S), SOLR3(3S), SOLRN(3S) to solve various forms of second-order linear recurrence
NAME
FOLRC – Solves a first-order linear recurrence with a scalar multiplier
SYNOPSIS
CALL FOLRC (n, b, incb, a, inca, alpha)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
FOLRC solves first-order linear recurrences, as follows:
b(1) = a(1)
b(i) = a(i) + α . b(i-1)   for i = 2, 3, ..., n
This routine has the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 0, FOLRC returns without any computation.
b Real array of dimension 1+(n – 1) . incb . (output)
Contains result vector.
incb Integer. (input)
Increment between recurrence elements of b.
a Real array of dimension 1+(n – 1) . inca . (input)
Contains operand vector.
inca Integer. (input)
Increment between recurrence elements of a.
alpha Real. (input)
Scalar multiplier α.
The following is the Fortran equivalent of FOLRC (given for case inca = incb = 1):
      B(1)=A(1)
      DO 10 I=2,N
         B(I)=A(I)+ALPHA*B(I-1)
   10 CONTINUE
NOTES
When working backward (inca < 0 or incb < 0), this routine starts at the end of the vector and moves
backward, as follows:
a (1−inca . (n −1)), a (1−inca . (n −2)),. . ., a (1)
b (1−incb . (n −1)), b (1−incb . (n −2)),. . ., b (1)
CAUTIONS
Do not specify incb as 0, because unpredictable results may occur.
SEE ALSO
FOLR(3S), FOLRP(3S) to solve recurrences similar to that solved by FOLRC, but they require a vector of
multipliers rather than one scalar multiplier
FOLR2(3S), FOLR2P(3S) to solve the same recurrences as solved by FOLR and FOLRP, without overwriting
the a operand
FOLRN(3S), FOLRNP(3S) to solve for only the last term of the same recurrences as solved by FOLR and
FOLRP
RECPS(3S) to perform a partial summation operation (same as FOLRC with α = 1.0)
SOLR(3S), SOLR3(3S), SOLRN(3S) to solve various forms of second-order linear recurrence
NAME
FOLRN, FOLRNP – Solves for the last term of first-order linear recurrence
SYNOPSIS
r = FOLRN (n, x, incx, a, inca)
r = FOLRNP (n, x, incx, a, inca)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
FOLRN solves for r, the last term of a first-order linear recurrence, as follows:
r ← a(1)
r ← a(i) - x(i)*r   for i = 2, 3, ..., n
FOLRNP solves for r, the last term of a first-order linear recurrence, as follows:
r ← a(1)
r ← a(i) + x(i)*r   for i = 2, 3, ..., n
These functions have the following arguments:
r Real. (output)
Value of the last term of the linear recurrence.
n Integer. (input)
Length of linear recurrence. If n ≤ 0, neither routine performs any computation.
x Real array of dimension 1+(n – 1) . incx . (input)
Contains multiplier vector. The first element of x in the recurrence is arbitrary.
incx Integer. (input)
Increment between recurrence elements of x.
a Real array of dimension 1+(n – 1) . inca . (input)
Contains operand vector.
inca Integer. (input)
Increment between recurrence elements of a.
The following is the Fortran equivalent of FOLRN (given for case incx = inca = 1):
      R=A(1)
      DO 10 I=2,N
         R=A(I)-X(I)*R
   10 CONTINUE
The following is the Fortran equivalent of FOLRNP (given for case incx = inca = 1):
      R=A(1)
      DO 10 I=2,N
         R=A(I)+X(I)*R
   10 CONTINUE
NOTES
When working backward (incx < 0 or inca < 0), each routine starts at the end of the vector and moves
backward, as follows:
x (1−incx . (n −1)), x (1−incx . (n −2)),. . ., x (1)
a (1−inca . (n −1)), a (1−inca . (n −2)),. . ., a (1)
CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.
EXAMPLES
You can use FOLRNP to perform Horner’s rule, an efficient method for evaluation of polynomials.
Let p(x) = a(0)*x^m + a(1)*x^(m-1) + ... + a(m), a polynomial of degree m.
Thus, the following is the Fortran equivalent to Horner’s rule for evaluating p(x):
      REAL A(0:M), PX, X
      . . .
      PX = A(0)
      DO 10 I = 1, M
         PX = PX * X + A(I)
   10 CONTINUE
This is the same as the Fortran equivalent to FOLRNP, when x is a scalar (incx = 0); that is, the following is
also an equivalent to Horner’s rule for evaluating p(x):
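A sketch of that call, assuming A is declared A(0:M) as above and the scalar X is passed with incx = 0:
C.....Horner's rule via FOLRNP: n = M+1 terms, scalar multiplier X.
      PX = FOLRNP(M+1, X, 0, A, 1)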
SEE ALSO
FOLR(3S), FOLRP(3S) to solve for all terms (not just the last term) in the same recurrences as solved by
FOLRN and FOLRNP, overwriting the a operand with the results
FOLR2(3S), FOLR2P(3S) to solve for all terms in the same recurrences as solved by FOLRN and FOLRNP,
without overwriting the a operand
FOLRC(3S) to solve for all terms in a first-order linear recurrence by using scalar multiplier
SOLR(3S), SOLR3(3S), SOLRN(3S) to solve various forms of second-order linear recurrence
NAME
RECPP, RECPS – Solves a partial product or partial summation problem
SYNOPSIS
CALL RECPP (n, y, incy, x, incx)
CALL RECPS (n, y, incy, x, incx)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
RECPP solves a partial product problem, as follows:
y(1) ← x(1)
y(i) ← x(i) . y(i-1)   for i = 2, 3, ..., n
RECPS solves a partial summation problem, as follows:
y(1) ← x(1)
y(i) ← x(i) + y(i-1)   for i = 2, 3, ..., n
These routines have the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 0, neither routine performs any computation.
y Real array of dimension 1+(n – 1) . incy . (output)
Contains recurrent operand vector. Array y receives the result.
incy Integer. (input)
Increment between recurrence elements of y.
x Real array of dimension 1+(n – 1) . incx . (input)
Contains nonrecurrent operand vector.
incx Integer. (input)
Increment between recurrence elements of x.
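This page gives no Fortran equivalents; the following sketch (for the case incx = incy = 1) follows the
pattern of the other recurrence pages:
C.....RECPP (partial product):
      Y(1) = X(1)
      DO 10 I = 2, N
         Y(I) = X(I) * Y(I-1)
   10 CONTINUE
C.....RECPS (partial summation):
      Y(1) = X(1)
      DO 20 I = 2, N
         Y(I) = X(I) + Y(I-1)
   20 CONTINUE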
NOTES
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x (1−incx . (n −1)), x (1−incx . (n −2)),. . ., x (1)
y (1−incy . (n −1)), y (1−incy . (n −2)),. . ., y (1)
CAUTIONS
Do not specify incy as 0, because unpredictable results may occur.
NAME
SDTSOL, CDTSOL – Solves a real-valued or complex-valued tridiagonal system with one right-hand side
SYNOPSIS
CALL SDTSOL (n, c, d, e, inct, b, incb)
CALL CDTSOL (n, c, d, e, inct, b, incb)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SDTSOL solves a real-valued tridiagonal system with one right-hand side by combination of
burn-at-both-ends and 3:1 cyclic reduction.
CDTSOL solves a complex-valued tridiagonal system with one right-hand side by combination of
burn-at-both-ends and 3:1 cyclic reduction.
These routines have the following arguments:
n Integer. (input)
Dimension of the tridiagonal matrix. If n < 1, these routines return without any computation.
c SDTSOL: Real array of dimension (1+(n – 1) . inct ). (input and output)
Lower off-diagonal of the real-valued tridiagonal matrix with c(1) = 0.0.
CDTSOL: Complex array of dimension (1+(n – 1) . inct ). (input and output)
Lower off-diagonal of the complex-valued tridiagonal matrix with c(1) = (0.0,0.0).
d SDTSOL: Real array of dimension (1+(n – 1) . inct ). (input and output)
Main diagonal of the real-valued tridiagonal matrix.
CDTSOL: Complex array of dimension (1+(n – 1) . inct ). (input and output)
Main diagonal of the complex-valued tridiagonal matrix.
e SDTSOL: Real array of dimension (1+(n – 1) . inct ). (input and output)
Upper off-diagonal of the real-valued tridiagonal matrix with e (1+(n – 1) . inct ) = 0.0.
CDTSOL: Complex array of dimension (1+(n – 1) . inct ). (input and output)
Upper off-diagonal of the complex-valued tridiagonal matrix with e (1+(n – 1) . inct )=(0.0,0.0).
inct Integer. (input)
Increment between elements in each of the input vectors c, d, and e. inct must be positive.
Typically inct = 1, in which case, the elements of c are contiguous in memory, as are the elements
of d and e.
b SDTSOL: Real array of dimension (1+(n – 1) . incb ). (input and output)
CDTSOL: Complex array of dimension (1+(n – 1) . incb ). (input and output)
On entry, b contains the right-hand side vector; on exit, b is overwritten with the solution vector.
NOTES
A 3:1 cyclic reduction is used until the size of the system is reduced to 40. Then the reduced system is
solved directly using a burn-at-both-ends algorithm. The remaining values are obtained by backfilling.
When calling these routines, the elements of c(1) and e (1+(n – 1) . inct ) must be allocated and set equal to
0.0. See the EXAMPLES section.
These routines are appropriate only for tridiagonal matrices that require no pivoting.
EXAMPLES
The following example shows how to set up the arguments c, d, and e, given the tridiagonal matrix T.
Let T be the tridiagonal matrix:
        | 11  12   0   0   0 |
        | 21  22  23   0   0 |
    T = |  0  32  33  34   0 |
        |  0   0  43  44  45 |
        |  0   0   0  54  55 |
Then to pass T to SDTSOL or CDTSOL (with inct = 1), set the following:
        |  0 |        | 11 |        | 12 |
        | 21 |        | 22 |        | 23 |
    c = | 32 |    d = | 33 |    e = | 34 |
        | 43 |        | 44 |        | 45 |
        | 54 |        | 55 |        |  0 |
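A minimal sketch of the corresponding call (hypothetical declarations, with inct = incb = 1; b is assumed
to hold the right-hand side on entry and the solution on exit):
      REAL C(5), D(5), E(5), B(5)
      DATA C / 0.0, 21.0, 32.0, 43.0, 54.0 /
      DATA D / 11.0, 22.0, 33.0, 44.0, 55.0 /
      DATA E / 12.0, 23.0, 34.0, 45.0, 0.0 /
C.....B is assumed to contain the right-hand side.
      CALL SDTSOL(5, C, D, E, 1, B, 1)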
NAME
SDTTRF, CDTTRF – Factors a real-valued or complex-valued tridiagonal system
SYNOPSIS
CALL SDTTRF (n, c, d, e, inct, work, lwork, info)
CALL CDTTRF (n, c, d, e, inct, work, lwork, info)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SDTTRF factors a real-valued tridiagonal system by combination of burn-at-both-ends and 3:1 cyclic
reduction.
CDTTRF factors a complex-valued tridiagonal system by combination of burn-at-both-ends and 3:1 cyclic
reduction.
These routines have the following arguments:
n Integer. (input)
Dimension of the tridiagonal matrix. If n < 1, these routines return without any computation.
c SDTTRF: Real array of dimension (1+(n – 1) . inct ). (input and output)
Lower off-diagonal of the real-valued tridiagonal matrix with c(1) = 0.0.
CDTTRF: Complex array of dimension (1+(n – 1) . inct ). (input and output)
Lower off-diagonal of the complex-valued tridiagonal matrix with c(1) = (0.0,0.0).
d SDTTRF: Real array of dimension (1+(n – 1) . inct ). (input and output)
Main diagonal of the real-valued tridiagonal matrix.
CDTTRF: Complex array of dimension (1+(n – 1) . inct ). (input and output)
Main diagonal of the complex-valued tridiagonal matrix.
e SDTTRF: Real array of dimension (1+(n – 1) . inct ). (input and output)
Upper off-diagonal of the real-valued tridiagonal matrix with e (1+(n – 1) . inct ) = 0.0.
CDTTRF: Complex array of dimension (1+(n – 1) . inct ). (input and output)
Upper off-diagonal of the complex-valued tridiagonal matrix with e (1+(n – 1) . inct )=(0.0,0.0).
inct Integer. (input)
Increment between elements in each of the input vectors c, d, and e. inct must be positive.
Typically, inct = 1, in which case, the elements of c are contiguous in memory as are the
elements of d and e.
work SDTTRF: Real array of dimension (lwork). (output)
Storage for intermediate results needed for subsequent calls to SDTTRS. This space must not be
modified between calls to this routine and SDTTRS.
NOTES
A 3:1 cyclic reduction is used until the size of the system is reduced to 40. Then the reduced system is
factored directly using a burn-at-both-ends algorithm. You should use these routines with SDTTRS or
CDTTRS, either of which solves for one right-hand side given the factorization computed in SDTTRF or
CDTTRF, respectively.
When calling these routines, the elements of c(1) and e (1+(n – 1) . inct ) must be allocated and set equal to 0.
See the EXAMPLES section.
These routines are appropriate only for tridiagonal matrices that require no pivoting.
CDTTRF only: Because this routine is for complex data, the amount of memory needed is 4n words, which
is 2n complex elements.
EXAMPLES
The following example shows how to set up the arguments c, d, and e, given the tridiagonal matrix T.
Let T be the tridiagonal matrix:
        | 11  12   0   0   0 |
        | 21  22  23   0   0 |
    T = |  0  32  33  34   0 |
        |  0   0  43  44  45 |
        |  0   0   0  54  55 |
Then to pass T to SDTTRF or CDTTRF (with inct = 1), set the following:
        |  0 |        | 11 |        | 12 |
        | 21 |        | 22 |        | 23 |
    c = | 32 |    d = | 33 |    e = | 34 |
        | 43 |        | 44 |        | 45 |
        | 54 |        | 55 |        |  0 |
SEE ALSO
SDTSOL(3S) for a description of SDTSOL and CDTSOL, which factor and solve tridiagonal systems
SDTTRS(3S) for a description of SDTTRS(3S) and CDTTRS(3S), which solve tridiagonal systems based on
the factorization computed by SDTTRF or CDTTRF, respectively
NAME
SDTTRS, CDTTRS – Solves a real-valued or complex-valued tridiagonal system with one right-hand side,
using its factorization as computed by SDTTRF(3S) or CDTTRF(3S)
SYNOPSIS
CALL SDTTRS (n, c, d, e, inct, b, incb, work, lwork, info)
CALL CDTTRS (n, c, d, e, inct, b, incb, work, lwork, info)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SDTTRS solves a real-valued tridiagonal system with one right-hand-side by combination of
burn-at-both-ends and 3:1 cyclic reduction. SDTTRF(3S) must be called first to factor the matrix.
CDTTRS solves a complex-valued tridiagonal system with one right-hand-side by combination of
burn-at-both-ends and 3:1 cyclic reduction. CDTTRF(3S) must be called first to factor the matrix.
These routines have the following arguments:
n Integer. (input)
Dimension of the tridiagonal matrix. If n < 1, these routines return without any computation.
c SDTTRS: Real array of dimension (1+(n – 1) . inct ). (input)
Factored lower off-diagonal of the real-valued tridiagonal matrix as computed by SDTTRF.
CDTTRS: Complex array of dimension (1+(n – 1) . inct ). (input)
Factored lower off-diagonal of the complex-valued tridiagonal matrix as computed by CDTTRF.
d SDTTRS: Real array of dimension (1+(n – 1) . inct ). (input)
Factored main diagonal of the real-valued tridiagonal matrix as computed by SDTTRF.
CDTTRS: Complex array of dimension (1+(n – 1) . inct ). (input)
Factored main diagonal of the complex-valued tridiagonal matrix as computed by CDTTRF.
e SDTTRS: Real array of dimension (1+(n – 1) . inct ). (input)
Factored upper off-diagonal of the real-valued tridiagonal matrix as computed by SDTTRF.
CDTTRS: Complex array of dimension (1+(n – 1) . inct ). (input)
Factored upper off-diagonal of the complex-valued tridiagonal matrix as computed by CDTTRF.
inct Integer. (input)
Increment between elements in each of the input vectors c, d, and e. inct must be positive.
Typically, inct = 1, in which case, the elements of c are contiguous in memory as are the
elements of d and e.
NOTES
A 3:1 cyclic reduction is used until the size of the system is reduced to 40. Then the reduced system is
solved directly using a burn-at-both-ends algorithm. You should use these routines after factoring the
tridiagonal matrix with SDTTRF or CDTTRF.
CDTTRS only: Because this routine is for complex data, the amount of memory needed is 4n words, which
is 2n complex elements.
EXAMPLES
The following example shows how to set up the arguments c, d, and e, given the tridiagonal matrix T.
Let T be the tridiagonal matrix:
        | 11  12   0   0   0 |
        | 21  22  23   0   0 |
    T = |  0  32  33  34   0 |
        |  0   0  43  44  45 |
        |  0   0   0  54  55 |
SEE ALSO
SDTSOL(3S) for a description of SDTSOL and CDTSOL, which factor and solve tridiagonal systems
SDTTRF(3S) for a description of SDTTRF(3S) and CDTTRF(3S), which compute the factorization used by
SDTTRS or CDTTRS, respectively
NAME
SOLR – Solves a second-order linear recurrence
SYNOPSIS
CALL SOLR (n, x, incx, y, incy, a, inca)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
SOLR solves second-order linear recurrences, as in the following equation:
a(i) ← x(i-2)*a(i-1) + y(i-2)*a(i-2)   for i = 3, ..., n
a(1) and a(2) are input to this routine, and a(3), a(4), ..., a(n) are output.
This routine has the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 2, SOLR returns without any computation.
x Real array of dimension 1+(n – 1) . incx . (input)
Contains vector of multipliers for the first-order term of the recurrence.
If incx > 0, x (incx . (n – 2)+1) and x (incx . (n – 1)+1) are arbitrary.
If incx < 0, x(1) and x(1– incx) are arbitrary.
If incx = 0, x is a scalar multiplier.
incx Integer. (input)
Increment between elements of x.
y Real array of dimension 1+(n – 1) . incy . (input)
Contains vector of multipliers for the second-order term of the recurrence.
If incy > 0, y (incy . (n – 2)+1) and y (incy . (n – 1)+1) are arbitrary.
If incy < 0, y(1) and y(1– incy) are arbitrary.
If incy = 0, y is a scalar multiplier.
incy Integer. (input)
Increment between elements of y.
a Real array of dimension 1+(n – 1) . inca . (input and output)
Contains result vector.
inca Integer. (input)
Increment between elements of a.
The following is the Fortran equivalent of SOLR (given for case incx = incy = inca = 1):
      DO 10 I=3,N
         A(I)=X(I-2)*A(I-1)+Y(I-2)*A(I-2)
   10 CONTINUE
NOTES
When working backward (incx < 0, incy < 0, or inca < 0), each routine starts at the end of the vector and
moves backward, as follows:
x (1−incx . (n −1)), x (1−incx . (n −2)),. . ., x (1−2 . incx)
y (1−incy . (n −1)), y (1−incy . (n −2)),. . ., y (1−2 . incy)
a (1−inca . (n −1)), a (1−inca . (n −2)),. . ., a (1)
CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.
SEE ALSO
FOLR(3S), FOLR2(3S), FOLR2P(3S), FOLRC(3S), FOLRN(3S), FOLRNP(3S), FOLRP(3S) to solve various
forms of first-order linear recurrence
SOLR3(3S) to solve a three-term, second-order linear recurrence
SOLRN(3S) to solve the same recurrence as SOLR, but SOLRN calculates only the last term
NAME
SOLR3 – Solves a second-order linear recurrence for three terms
SYNOPSIS
CALL SOLR3 (n, x, incx, y, incy, a, inca)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
SOLR3 solves second-order linear recurrences of three terms, as in the following equation:
a(i) ← a(i) + x(i-2)*a(i-1) + y(i-2)*a(i-2)   for i = 3, ..., n
All values of a are input to this routine, and a(3), a(4), ..., a(n) are output.
This routine has the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 2, SOLR3 returns without any computation.
x Real array of dimension 1+(n – 1) . incx . (input)
Contains vector of multipliers for the first-order term of the recurrence.
If incx > 0, x (incx . (n – 2)+1) and x (incx . (n – 1)+1) are arbitrary.
If incx < 0, x(1) and x(1– incx) are arbitrary.
If incx = 0, x is a scalar multiplier.
incx Integer. (input)
Increment between elements of x.
y Real array of dimension 1+(n – 1) . incy . (input)
Contains vector of multipliers for the second-order term of the recurrence.
If incy > 0, y (incy . (n – 2)+1) and y (incy . (n – 1)+1) are arbitrary.
If incy < 0, y(1) and y(1– incy) are arbitrary.
If incy = 0, y is a scalar multiplier.
incy Integer. (input)
Increment between elements of y.
a Real array of dimension 1+(n – 1) . inca . (input and output)
Contains result vector.
inca Integer. (input)
Increment between elements of a.
The following is the Fortran equivalent of SOLR3 (given for case incx = incy = inca = 1):
      DO 10 I=3,N
         A(I)=A(I)+X(I-2)*A(I-1)+Y(I-2)*A(I-2)
   10 CONTINUE
NOTES
When working backward (incx < 0, incy < 0, or inca < 0), each routine starts at the end of the vector and
moves backward, as follows:
x (1−incx . (n −1)), x (1−incx . (n −2)),. . ., x (1−incx . 2)
y (1−incy . (n −1)), y (1−incy . (n −2)),. . ., y (1−incy . 2)
a (1−inca . (n −1)), a (1−inca . (n −2)),. . ., a (1)
If incx = 0 or incy = 0, x or y (respectively) is a scalar multiplier.
CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.
EXAMPLES
You can use SOLR3 to solve a lower triangular two-subdiagonal system of linear equations La = b. That is,
because
     | 1     0     0     0     .  .  .  0 |  |a(1)|   |b(1)|
     | e(1)  1     0     0     .  .  .  0 |  |a(2)|   |b(2)|
     | f(1)  e(2)  1     0     .  .  .  0 |  |a(3)|   |b(3)|
     | 0     f(2)  e(3)  1     0  .  .  0 |  |a(4)|   |b(4)|
La = | 0     0     f(3)  e(4)  1  0  .  0 |  | .  | = | .  | = b
     | .     .     .     .     .  .  .  . |  | .  |   | .  |
     | 0     0     0  . . .  f(n-2)  e(n-1)  1 |  |a(n)|   |b(n)|
negating the multipliers and adjusting b(2) reduces this system to the SOLR3 recurrence, and the
following code computes the solution in place in B:
      DO 10 I=1,N-1
   10 E(I)=-E(I)
      DO 20 I=1,N-2
   20 F(I)=-F(I)
      B(2)=B(2)+E(1)*B(1)
      CALL SOLR3(N,E(2),1,F(1),1,B(1),1)
SEE ALSO
FOLR(3S), FOLR2(3S), FOLR2P(3S), FOLRC(3S), FOLRN(3S), FOLRNP(3S), FOLRP(3S) to solve various
forms of first-order linear recurrence
SOLR(3S) to solve a two-term second-order linear recurrence
SOLRN(3S) to solve the same recurrence as SOLR, but SOLRN calculates only the last term
NAME
SOLRN – Solves a second-order linear recurrence for only the last term
SYNOPSIS
r = SOLRN (n, x, incx, y, incy, a, inca)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
SOLRN solves for r, the last term in the following second-order linear recurrence:
a(i) ← x(i-2)*a(i-1) + y(i-2)*a(i-2)   for i = 3, 4, ..., n
r ← a(n)
Only a(1) and a(2) are used as input. The remaining elements of a are workspace that is overwritten on output.
This function has the following arguments:
r Real. (output)
Value of the last term of the linear recurrence.
If n ≤ 0, r is set to 0.
If n = 1, r is set to the first element of a.
If n = 2, r is set to the second element of a.
n Integer. (input)
Length of linear recurrence.
x Real array of dimension 1+(n – 1) . incx . (input)
Contains vector of multipliers for the first-order term of the recurrence.
If incx > 0, x (incx . (n – 2)+1) and x (incx . (n – 1)+1) are arbitrary.
If incx < 0, x(1) and x(1– incx) are arbitrary.
If incx = 0, x is a scalar multiplier.
incx Integer. (input)
Increment between elements of x.
y Real array of dimension 1+(n – 1) . incy . (input)
Contains vector of multipliers for the second-order term of the recurrence.
If incy > 0, y (incy . (n – 2)+1) and y (incy . (n – 1)+1) are arbitrary.
If incy < 0, y(1) and y(1– incy) are arbitrary.
If incy = 0, y is a scalar multiplier.
incy Integer. (input)
Increment between elements of y.
a Real array of dimension 1+(n – 1) . inca . (input and output)
Contains the operand vector; a(1) and a(2) are the input terms of the recurrence, and the remaining
elements are used as workspace.
inca Integer. (input)
Increment between elements of a.
For SOLRN, even though only the last term is computed, array a (A in the Fortran code in the
EXAMPLES section) is used to hold intermediate results and, therefore, it is overwritten.
NOTES
When working backward (incx < 0, incy < 0, or inca < 0), each routine starts at the end of the vector and
moves backward, as follows:
x (1−incx . (n −1)), x (1−incx . (n −2)),. . ., x (1−2 . incx)
y (1−incy . (n −1)), y (1−incy . (n −2)),. . ., y (1−2 . incy)
a (1−inca . (n −1)), a (1−inca . (n −2)),. . ., a (1)
If incx = 0 or incy = 0, x or y (respectively) is a scalar multiplier.
CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.
EXAMPLES
SOLRN might be used to find r(2) of the calculation
    | x(1)  y(1) | | x(2)  y(2) |       | x(n-2)  y(n-2) | | a(2) |   | r(2) |
    | 1     0    | | 1     0    | . . . | 1       0      | | a(1) | = | r(1) |
with the following call:
      R2 = SOLRN(N,X,1,Y,1,A,1)
This call is equivalent to the following scalar Fortran code:
      R1=A(1)
      R2=A(2)
      DO 10 I=1,N-2
         TEMP=R2
         R2=X(I)*R2+Y(I)*R1
         R1=TEMP
   10 CONTINUE
SEE ALSO
FOLR(3S), FOLR2(3S), FOLR2P(3S), FOLRC(3S), FOLRN(3S), FOLRNP(3S), FOLRP(3S) to solve various
forms of first-order linear recurrence
SOLR(3S) to solve the same recurrence as SOLRN, but it calculates all terms, not just the last term
SOLR3(3S) to solve a three-term second-order linear recurrence
NAME
INTRO_BLACS – Introduction to Basic Linear Algebra Communication Subprograms
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
The Basic Linear Algebra Communication Subprograms (BLACS) is a package of routines for UNICOS/mk
systems that provides the same functionality for message-passing linear algebra communication as the Basic
Linear Algebra Subprograms (BLAS) provide for linear algebra computation. With these two packages,
software for dense linear algebra on UNICOS/mk systems can use calls to the BLAS for computation and
calls to the BLACS for communication.
The BLACS consist of communication primitive routines and global reduction routines, along with
several support routines.
The current version of the BLACS is compatible with the version last released by the ScaLAPACK group at
the University of Tennessee. Arrays passed to the BLACS routines must not be dynamically allocated from
the heap.
Communication Primitives
The communication primitives send a matrix to another processor or receive a matrix from another
processor; if a processor has data to be broadcast to all or a subset of processors, a broadcast communication
primitive must be used to send or receive the data. Any processor involved in a send or receive operation
must have the same amount of available matrix space.
The communication primitives can work on matrices (as indicated by the m, n, and lda arguments to the
routines) of data types of integer, real, or complex. The user can specify that only a portion of the matrix (a
trapezoidal matrix) be referenced in the operation. The uplo argument specifies whether the upper or lower
trapezoid should be used; the diag argument specifies whether the matrix is a unit trapezoidal matrix or a
non-unit trapezoidal matrix.
When using the scope argument for the BLACS routines, operations can be expressed in terms of all
processors, a row of processors, or a column of processors. All processors indicated by the scope argument
will be involved in the operation being performed, even if the processor does not have data to contribute or
does not need the data being communicated.
When broadcast operations are involved, a communication pattern must be selected. The top argument
denotes the communication topology for a communication primitive or global operation.
The following table describes the available communication primitives, the routine names, and the man page
where the primitive is described:
Description                                          Routine names                 Man page

Sends a general rectangular matrix                   IGESD2D, SGESD2D, CGESD2D     IGESD2D(3S)
Receives a general rectangular matrix                IGERV2D, SGERV2D, CGERV2D     IGERV2D(3S)
Broadcasts a general rectangular matrix              IGEBS2D, SGEBS2D, CGEBS2D     IGEBS2D(3S)
Receives a broadcast general rectangular matrix      IGEBR2D, SGEBR2D, CGEBR2D     IGEBR2D(3S)
Sends a trapezoidal matrix                           ITRSD2D, STRSD2D, CTRSD2D     ITRSD2D(3S)
Receives a trapezoidal matrix                        ITRRV2D, STRRV2D, CTRRV2D     ITRRV2D(3S)
Broadcasts a trapezoidal matrix                      ITRBS2D, STRBS2D, CTRBS2D     ITRBS2D(3S)
Receives a broadcast trapezoidal matrix              ITRBR2D, STRBR2D, CTRBR2D     ITRBR2D(3S)
The following table describes the available global reduction routines, the routine names, and the man page
name where the primitive is described:
Description                                          Routine names                 Man page

Determines maximum absolute values                   IGAMX2D, SGAMX2D, CGAMX2D     IGAMX2D(3S)
Determines minimum absolute values                   IGAMN2D, SGAMN2D, CGAMN2D     IGAMN2D(3S)
Performs elementwise summation                       IGSUM2D, SGSUM2D, CGSUM2D     IGSUM2D(3S)
Topologies
Different communication topologies can be used to optimize performance. Several factors determine the
best topology. For example, a ring topology is often preferred if one processor’s time is more valuable than
the others’; a minimum spanning tree can be used if all processors need the information as quickly as
possible. The following topologies are supported on UNICOS/mk systems:
• Unidirectional ring. Using the unidirectional ring topology, the source processor issues one broadcast,
and each processor then receives and forwards the message. There are two types of unidirectional rings:
the increasing ring topology and the decreasing ring topology. These are "quiet" topologies (only one
processor is communicating at a time).
• Hypercube or minimum spanning tree. Hypercube broadcasts follow the physical connection of the
system; these are most useful when distributing information to all processors is more important than
saving processor time. In addition, hypercube broadcasts are noisier, because several processors are
sending data simultaneously.
Support Routines
The BLACS package contains several routines that are not directly related to linear algebra processing. These
routines are used to compute grid coordinates, to initialize routines, and to return information about
processors.
The following table describes the available support routines, the routine names, and the man page name
where the routine is described:

Description                                          Routine name      Man page

Stops execution until all processes call a routine   BLACS_BARRIER     BLACS_BARRIER(3S)
Frees all existing grids                             BLACS_EXIT        BLACS_EXIT(3S)
Frees a grid                                         BLACS_GRIDEXIT    BLACS_GRIDEXIT(3S)
Returns information about the 2D processor grid      BLACS_GRIDINFO    BLACS_GRIDINFO(3S)
Initializes a grid of processors                     BLACS_GRIDINIT    BLACS_GRIDINIT(3S)
Initializes a grid from a processor map              BLACS_GRIDMAP     BLACS_GRIDMAP(3S)
Computes coordinates in a 2D grid                    BLACS_PCOORD      BLACS_PCOORD(3S)
Returns the processor number for 2D coordinates      BLACS_PNUM        BLACS_PNUM(3S)
Returns the calling processor’s assigned number      MYNODE            MYNODE(3S)
Context Argument
A new feature in this release of the BLACS is the capability for the BLACS routines to communicate over
any of several coexisting grids, or contexts. Each grid (context) is identified by an integer called
a context handle, which is output by BLACS_GRIDINIT upon the creation of the grid.
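As a minimal sketch (the 2-by-2 grid shape and the variable names are illustrative only), a typical life
cycle of a context handle looks like the following:

      INTEGER ICNTXT, NPROW, NPCOL, MYROW, MYCOL
*     Create a 2-by-2 grid in row-major order; ICNTXT receives the
*     context handle that identifies this grid.
      CALL BLACS_GRIDINIT (ICNTXT, 'R', 2, 2)
*     Query the grid shape and this processor's coordinates.
      CALL BLACS_GRIDINFO (ICNTXT, NPROW, NPCOL, MYROW, MYCOL)
*     ... BLACS communication over the grid named by ICNTXT ...
*     Release this grid, then free all remaining grids and buffers.
      CALL BLACS_GRIDEXIT (ICNTXT)
      CALL BLACS_EXIT ()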
SEE ALSO
Dongarra, Jack J. and Robert A. van de Geijn, "Two Dimensional Basic Linear Algebra Communication
Subprograms," Technical Report CS-91-138, University of Tennessee, October 1991.
NAME
BLACS_BARRIER – Stops execution until all specified processes have called a routine
SYNOPSIS
CALL BLACS_BARRIER (icntxt, scope)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
BLACS_BARRIER stops execution until all specified processes have called a routine.
This routine has the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT.
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
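For example, the following sketch (the variable name ICNTXT is illustrative) synchronizes all processors
in the grid:

      CALL BLACS_BARRIER (ICNTXT, 'A')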
SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)
NAME
BLACS_EXIT – Frees all existing grids
SYNOPSIS
CALL BLACS_EXIT()
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
BLACS_EXIT frees all the grids that have been created in the course of a user’s program. The call frees
internal buffer space that was allocated when the different grids were created.
SEE ALSO
BLACS_GRIDEXIT(3S), BLACS_GRIDINIT(3S), INTRO_BLACS(3S)
NAME
BLACS_GRIDEXIT – Frees a grid
SYNOPSIS
CALL BLACS_GRIDEXIT(icntxt)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
BLACS_GRIDEXIT frees a grid that has been created by a call to BLACS_GRIDINIT(3S). The call frees
internal buffer space that has been allocated upon the creation of the grid.
This routine has the following argument:
icntxt Integer. (input)
The context handle identifying the grid returned by BLACS_GRIDINIT(3S) upon the creation
of the grid.
NOTES
If a call to a BLACS routine is made after a call to BLACS_GRIDEXIT with the same context handle, the
program will abort.
SEE ALSO
BLACS_EXIT(3S), BLACS_GRIDINIT(3S), INTRO_BLACS(3S)
NAME
BLACS_GRIDINFO – Returns information about the two-dimensional processor grid
SYNOPSIS
CALL BLACS_GRIDINFO (icntxt, nprow, npcol, myrow, mycol)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
BLACS_GRIDINFO returns information about the processor grid, such as: the number of processor rows,
the number of processor columns, and the grid coordinates of the calling processor.
This routine has the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S). This argument must be passed
but is currently ignored internally.
nprow Integer. (output)
The number of processor rows.
npcol Integer. (output)
The number of processor columns.
myrow Integer. (output)
Row coordinate of processor.
mycol Integer. (output)
Column coordinate of processor.
SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)
NAME
BLACS_GRIDINIT – Initializes counters, variables, and so on, for the BLACS routines
SYNOPSIS
CALL BLACS_GRIDINIT (icntxt, order, nprow, npcol)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
BLACS_GRIDINIT initializes an nprow-by-npcol grid of processors in a row-major or column-major fashion.
The BLACS_GRIDINIT routine assigns grid coordinates to each processor. Users must call this routine
before calling any other BLACS routines, ScaLAPACK routines, BLAS_S routines, or the parallel
two-dimensional FFT routines. The arguments should be the same on all nodes.
This routine has the following arguments:
icntxt Integer. (output)
Context handle identifying the grid being initialized.
order Character*1. (input)
Specifies whether the grid of processors will be initialized in row-major or col-major order. If
the grid is to match the distribution of a SHARED array, the order should be c.
order = R or r: row-major order
order = C or c: col-major order
nprow Integer. (input)
Indicates the number of processor rows for the processor grid.
npcol Integer. (input)
Indicates the number of processor columns for the processor grid.
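For example, the following sketch (the 4-by-2 shape is arbitrary) initializes a 4-by-2 grid in row-major
order and returns its context handle in ICNTXT:

      INTEGER ICNTXT
      CALL BLACS_GRIDINIT (ICNTXT, 'R', 4, 2)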
SEE ALSO
INTRO_BLACS(3S)
NAME
BLACS_GRIDMAP – Initializes a grid of processors according to a user-specified processor map
SYNOPSIS
CALL BLACS_GRIDMAP (icntxt, gridmap, ld, nprow, npcol)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
BLACS_GRIDMAP initializes an nprow-by-npcol grid of processors in the image of the (input) array gridmap.
This routine can be used as an alternative to BLACS_GRIDINIT in cases where the user’s application
requires a mapping of the processors to the grid that is different from those implemented in
BLACS_GRIDINIT.
This routine has the following arguments:
icntxt Integer. (output)
The context handle identifying the grid being initialized.
gridmap Integer array of dimension (ld, npcol). (input)
Array specifying the map of the processors to the grid. The processor numbered gridmap(i, j)
is placed at the (i-1)th row and (j-1)th column of the grid (array indexing starts from 1;
grid coordinates start from 0).
ld Integer. (input)
Specifies the first dimension of array gridmap as declared in the calling program.
nprow Integer. (input)
Indicates the number of processor rows for the processor grid.
npcol Integer. (input)
Indicates the number of processor columns for the processor grid.
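As a sketch (the processor numbers and the 2-by-2 shape are illustrative), the following places processor 3
at grid coordinates (0,0), processor 2 at (1,0), processor 1 at (0,1), and processor 0 at (1,1):

      INTEGER ICNTXT, GMAP(2,2)
*     GMAP(i,j) names the processor placed at grid coordinates
*     (i-1,j-1); the DATA statement fills GMAP in column-major order.
      DATA GMAP / 3, 2, 1, 0 /
      CALL BLACS_GRIDMAP (ICNTXT, GMAP, 2, 2, 2)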
SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)
NAME
BLACS_PCOORD – Computes coordinates in two-dimensional grids
SYNOPSIS
CALL BLACS_PCOORD (icntxt, pe_num, prow, pcol)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
BLACS_PCOORD computes processor grid coordinates prow and pcol by using pe_num.
This routine has the following arguments:
icntxt Integer. (input)
The context handle returned by a call to BLACS_GRIDINIT(3S). This argument must be
passed but is currently ignored internally.
pe_num Integer. (input)
Processing element.
prow Integer. (output)
Row coordinate for processor.
pcol Integer. (output)
Column coordinate for processor.
SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)
NAME
BLACS_PNUM – Returns the processor element number for specified coordinates in two-dimensional grids
SYNOPSIS
PE_number = BLACS_PNUM (icntxt, prow, pcol)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
BLACS_PNUM returns the processor element number at grid coordinate prow, pcol.
This routine has the following arguments:
icntxt Integer. (input)
The context handle returned by a call to BLACS_GRIDINIT(3S). This argument must be
passed but is currently ignored internally.
prow Integer. (input)
Row coordinate of processor.
pcol Integer. (input)
Column coordinate of processor.
SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)
NAME
GRIDINFO3D – Returns information about the three-dimensional processor grid
SYNOPSIS
CALL GRIDINFO3D (ictxt, npx, npy, npz, mypex, mypey, mypez)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
GRIDINFO3D returns information about the processor grid, such as the number of processors assigned to the
X, Y, and Z dimensions and the grid coordinates of the calling processor.
The following arguments are available with this routine:
ictxt Integer. (input)
Handle that describes the grid initialized by GRIDINIT3D(3S).
npx Integer. (output)
Number of processors assigned to the X dimension.
npy Integer. (output)
Number of processors assigned to the Y dimension.
npz Integer. (output)
Number of processors assigned to the Z dimension.
mypex Integer. (output)
X coordinate of processor.
mypey Integer. (output)
Y coordinate of processor.
mypez Integer. (output)
Z coordinate of processor.
NOTES
The GRIDINIT3D(3S) routine must be called somewhere in the program before the first call to
GRIDINFO3D.
SEE ALSO
DESCINIT3D(3S), GRIDINIT3D(3S), PCOORD3D(3S), PNUM3D(3S)
NAME
GRIDINIT3D – Initializes variables for a three-dimensional (3D) grid partition of processor set
SYNOPSIS
CALL GRIDINIT3D (ictxt, npx, npy, npz)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
GRIDINIT3D initializes an npx-by-npy-by-npz grid of processors in a column-major fashion. The
GRIDINIT3D routine assigns grid coordinates to each processor. Users must call this routine before calling
any routine that uses information about the 3D grid of processors. The arguments should be the same on all
nodes.
The GRIDINIT3D routine accepts the following arguments:
ictxt Integer. (output)
Handle that describes the 3D grid.
npx Integer. (input)
Number of processors assigned to the X dimension of the processor grid. This argument must
be a power of 2.
npy Integer. (input)
Number of processors assigned to the Y dimension of the processor grid. This argument must
be a power of 2.
npz Integer. (input)
Number of processors assigned to the Z dimension of the processor grid. This argument must be
a power of 2.
As an example, consider a partition of 16 processors (N$PES = 16) that will be initialized as a 3D grid of
size 2-by-4-by-2 (that is, 2 processors assigned to the X dimension, 4 to the Y dimension and 2 to the Z
dimension). GRIDINIT3D assigns the following coordinates to the processors:
Z = 0
        Y    0    1    2    3
     X     |----|----|----|----|
     0     |  0 |  2 |  4 |  6 |
           |----|----|----|----|
     1     |  1 |  3 |  5 |  7 |
           |----|----|----|----|

Z = 1
        Y    0    1    2    3
     X     |----|----|----|----|
     0     |  8 | 10 | 12 | 14 |
           |----|----|----|----|
     1     |  9 | 11 | 13 | 15 |
           |----|----|----|----|
In this case processor 2 would have coordinates (0,1,0) and processor 13 would have coordinates (1,2,1).
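The corresponding calls for this 2-by-4-by-2 partition might look like the following sketch (the variable
names are illustrative):

      INTEGER ICTXT, NPX, NPY, NPZ, MYPEX, MYPEY, MYPEZ
*     Partition the 16 processors as a 2-by-4-by-2 3D grid.
      CALL GRIDINIT3D (ICTXT, 2, 4, 2)
*     Recover the grid shape and this processor's 3D coordinates.
      CALL GRIDINFO3D (ICTXT, NPX, NPY, NPZ, MYPEX, MYPEY, MYPEZ)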
SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), PCOORD3D(3S), PNUM3D(3S)
NAME
IGAMN2D, SGAMN2D, CGAMN2D – Determines minimum absolute values of rectangular matrices
SYNOPSIS
CALL IGAMN2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
CALL SGAMN2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
CALL CGAMN2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
IGAMN2D determines minimum absolute values of rectangular matrices.
IGAMN2D communicates integer data. SGAMN2D communicates real data. CGAMN2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Network topology. Only the h topology (minimum spanning tree) is currently supported.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGAMN2D: Integer array, dimension (lda,n). (input/output)
SGAMN2D: Real array, dimension (lda,n). (input/output)
CGAMN2D: Complex array, dimension (lda,n). (input/output)
On entry, a is an m-by-n matrix of values. On exit, a is such that a(i, j) is the element of
minimum absolute value among the (i, j) entries of all the input arrays.
NOTES
The m, n, and lda arguments determine the matrix shape. For an operation to proceed, all processors
indicated by the scope argument must call the given routine. The result is left on all processors indicated by
the scope argument.
These routines were named IGMIN2D, SGMIN2D, and CGMIN2D in a previous release.
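As a sketch of a reduction over all processors (the 4-by-4 shape, the zero destination coordinates, and the
use of RA and CA as 4-by-4 integer work arrays are illustrative assumptions):

      INTEGER ICNTXT, RA(4,4), CA(4,4)
      REAL A(4,4)
*     Combine A elementwise across all processors; on return, each
*     A(i,j) holds the value of minimum absolute value.
      CALL SGAMN2D (ICNTXT, 'A', 'h', 4, 4, A, 4, RA, CA, 4, 0, 0)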
SEE ALSO
BLACS_GRIDINIT(3S), IGAMX2D(3S), IGSUM2D(3S), INTRO_BLACS(3S)
NAME
IGAMX2D, SGAMX2D, CGAMX2D – Determines maximum absolute values of rectangular matrices
SYNOPSIS
CALL IGAMX2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
CALL SGAMX2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
CALL CGAMX2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
IGAMX2D determines maximum absolute values of rectangular matrices.
IGAMX2D communicates integer data. SGAMX2D communicates real data. CGAMX2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Network topology. Only the h topology (minimum spanning tree) is currently supported.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGAMX2D: Integer array, dimension (lda,n). (input/output)
SGAMX2D: Real array, dimension (lda,n). (input/output)
CGAMX2D: Complex array, dimension (lda,n). (input/output)
On entry, a is an m-by-n matrix of values. On exit, a is such that a(i, j) is the element of
maximum absolute value from the (i, j) entry of all the input arrays.
NOTES
The m, n, and lda arguments determine the matrix shape. For an operation to proceed, all processors
indicated by the scope argument must call the given routine. The result is left on all processors indicated by
the scope argument.
These routines were named IGMAX2D, SGMAX2D, and CGMAX2D in a previous release.
SEE ALSO
BLACS_GRIDINIT(3S), IGAMN2D(3S), IGSUM2D(3S), INTRO_BLACS(3S)
NAME
IGEBR2D, SGEBR2D, CGEBR2D – Receives a broadcast general rectangular matrix from all or a subset of
processors
SYNOPSIS
CALL IGEBR2D (icntxt, scope, top, m, n, a, lda, rsrc, csrc)
CALL SGEBR2D (icntxt, scope, top, m, n, a, lda, rsrc, csrc)
CALL CGEBR2D (icntxt, scope, top, m, n, a, lda, rsrc, csrc)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
IGEBR2D receives a broadcast general rectangular matrix from all or a subset of processors. The source of
the broadcast uses the IGEBS2D(3S) routine to send the matrix. Execution does not resume until the data
arrives.
IGEBR2D communicates integer data. SGEBR2D communicates real data. CGEBR2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Specifies the network topology used by the broadcast.
top = I or i: increasing ring
top = D or d: decreasing ring
top = H or h: hypercube
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
For an operation to proceed, all processors indicated by the scope argument must call the routine.
These routines will default to the h topology if called with any of the other values of top that are supported
by the standard version of the BLACS from the University of Tennessee (except on Cray T3D systems).
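As a sketch of a typical broadcast pairing (the 100-by-100 shape is illustrative, and the source is assumed
to be the processor at grid coordinates (0,0)):

      INTEGER ICNTXT, NPROW, NPCOL, MYROW, MYCOL
      REAL A(100,100)
      CALL BLACS_GRIDINFO (ICNTXT, NPROW, NPCOL, MYROW, MYCOL)
      IF (MYROW.EQ.0 .AND. MYCOL.EQ.0) THEN
*        Source: broadcast A to all processors over a hypercube.
         CALL SGEBS2D (ICNTXT, 'A', 'h', 100, 100, A, 100)
      ELSE
*        All other processors receive from the processor at (0,0).
         CALL SGEBR2D (ICNTXT, 'A', 'h', 100, 100, A, 100, 0, 0)
      END IF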
SEE ALSO
BLACS_GRIDINIT(3S), IGEBS2D(3S), INTRO_BLACS(3S)
NAME
IGEBS2D, SGEBS2D, CGEBS2D – Broadcasts a general rectangular matrix to all or a subset of processors
SYNOPSIS
CALL IGEBS2D (icntxt, scope, top, m, n, a, lda)
CALL SGEBS2D (icntxt, scope, top, m, n, a, lda)
CALL CGEBS2D (icntxt, scope, top, m, n, a, lda)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
IGEBS2D broadcasts a general rectangular matrix to all or a subset of processors. The other processors
use the IGEBR2D(3S) routine to receive the broadcast matrix. Execution does not resume until the data
arrives.
IGEBS2D communicates integer data. SGEBS2D communicates real data. CGEBS2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Specifies the network topology used by the broadcast.
top = I or i: increasing ring
top = D or d: decreasing ring
top = H or h: hypercube
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
For an operation to proceed, all processors indicated by the scope argument must call the routine.
These routines will default to the h topology if called with any of the other values of top that are supported
by the standard version of the BLACS from the University of Tennessee (except on Cray T3D systems).
SEE ALSO
BLACS_GRIDINIT(3S), IGEBR2D(3S), IGERV2D(3S), INTRO_BLACS(3S)
NAME
IGERV2D, SGERV2D, CGERV2D – Receives a general rectangular matrix from another processor
SYNOPSIS
CALL IGERV2D (icntxt, m, n, a, lda, rsrc, csrc)
CALL SGERV2D (icntxt, m, n, a, lda, rsrc, csrc)
CALL CGERV2D (icntxt, m, n, a, lda, rsrc, csrc)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
IGERV2D receives a general rectangular matrix from another processor. The other processor uses the
IGESD2D(3S) routine to send the matrix. Execution does not resume until the data arrives.
IGERV2D communicates integer data. SGERV2D communicates real data. CGERV2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGERV2D: Integer array, dimension (lda,n). (output)
SGERV2D: Real array, dimension (lda,n). (output)
CGERV2D: Complex array, dimension (lda,n). (output)
The m-by-n array at which the message is to be received.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
rsrc Integer. (input)
Row index of source processor.
csrc Integer. (input)
Column index of source processor.
NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
SEE ALSO
BLACS_GRIDINIT(3S), IGESD2D(3S), INTRO_BLACS(3S)
NAME
IGESD2D, SGESD2D, CGESD2D – Sends a general rectangular matrix to another processor
SYNOPSIS
CALL IGESD2D (icntxt, m, n, a, lda, rdest, cdest)
CALL SGESD2D (icntxt, m, n, a, lda, rdest, cdest)
CALL CGESD2D (icntxt, m, n, a, lda, rdest, cdest)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
IGESD2D sends a general rectangular matrix to another processor. The other processor uses the
IGERV2D(3S) routine to receive the matrix. Execution does not resume until the data arrives.
IGESD2D communicates integer data. SGESD2D communicates real data. CGESD2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGESD2D: Integer array, dimension (lda,n). (input)
SGESD2D: Real array, dimension (lda,n). (input)
CGESD2D: Complex array, dimension (lda,n). (input)
The m-by-n array to be sent.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
rdest Integer. (input)
Row index of destination processor.
cdest Integer. (input)
Column index of destination processor.
NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
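As a sketch of the matching send/receive pairing (the 50-by-50 shape and the two processor coordinates
are illustrative):

      INTEGER ICNTXT, NPROW, NPCOL, MYROW, MYCOL
      REAL A(50,50)
      CALL BLACS_GRIDINFO (ICNTXT, NPROW, NPCOL, MYROW, MYCOL)
      IF (MYROW.EQ.0 .AND. MYCOL.EQ.0) THEN
*        Send A to the processor at grid coordinates (1,0).
         CALL SGESD2D (ICNTXT, 50, 50, A, 50, 1, 0)
      ELSE IF (MYROW.EQ.1 .AND. MYCOL.EQ.0) THEN
*        Receive A from the processor at (0,0); m and n must match.
         CALL SGERV2D (ICNTXT, 50, 50, A, 50, 0, 0)
      END IF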
SEE ALSO
BLACS_GRIDINIT(3S), IGERV2D(3S), INTRO_BLACS(3S)
NAME
IGSUM2D, SGSUM2D, CGSUM2D – Performs element summation operations on rectangular matrices
SYNOPSIS
CALL IGSUM2D (icntxt, scope, top, m, n, a, lda, rdest, cdest)
CALL SGSUM2D (icntxt, scope, top, m, n, a, lda, rdest, cdest)
CALL CGSUM2D (icntxt, scope, top, m, n, a, lda, rdest, cdest)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
IGSUM2D performs element summation operations on rectangular matrices.
IGSUM2D communicates integer data. SGSUM2D communicates real data. CGSUM2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Network topology. Only the h topology (minimum spanning tree) is currently supported.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGSUM2D: Integer array, dimension (lda,n). (input/output)
SGSUM2D: Real array, dimension (lda,n). (input/output)
CGSUM2D: Complex array, dimension (lda,n). (input/output)
On exit, a is such that a(i, j) is the sum of all (i, j) entries in the input arrays.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
rdest Ignored.
cdest Ignored.
NOTES
The m, n, and lda arguments determine the matrix shape. For an operation to proceed, all processors
indicated by the scope argument must call the given routine. The result is left on all processors indicated by
the scope argument.
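For example, the following sketch sums a 100-element vector elementwise across a row of processors (the
length is illustrative; rdest and cdest are ignored, so zeros are passed):

      INTEGER ICNTXT
      REAL X(100)
*     On return, each X(i) is the sum of the X(i) values held by
*     all processors in this processor row.
      CALL SGSUM2D (ICNTXT, 'R', 'h', 100, 1, X, 100, 0, 0)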
SEE ALSO
BLACS_GRIDINIT(3S), IGAMX2D(3S), IGAMN2D(3S), INTRO_BLACS(3S)
NAME
ITRBR2D, STRBR2D, CTRBR2D – Receives a broadcast trapezoidal rectangular matrix from all or a subset
of processors
SYNOPSIS
CALL ITRBR2D (icntxt, scope, top, uplo, diag, m, n, a, lda, rsrc, csrc)
CALL STRBR2D (icntxt, scope, top, uplo, diag, m, n, a, lda, rsrc, csrc)
CALL CTRBR2D (icntxt, scope, top, uplo, diag, m, n, a, lda, rsrc, csrc)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
ITRBR2D receives a broadcast trapezoidal matrix from all or a subset of processors. The source of the
broadcast uses the ITRBS2D(3S) routine to send the matrix. Execution does not resume until the data
arrives.
ITRBR2D communicates integer data. STRBR2D communicates real data. CTRBR2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Specifies the network topology used by the broadcast.
top = I or i: increasing ring
top = D or d: decreasing ring
top = H or h: hypercube
uplo Character*1. (input)
Specifies whether the trapezoid is in the upper or lower triangular part of the matrix a, as
follows:
If uplo = ’U’ or ’u’, the trapezoid is in the upper triangular part of the matrix.
If uplo = ’L’ or ’l’, the trapezoid is in the lower triangular part of the matrix.
NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
For an operation to proceed, all processors indicated by the scope argument must call the routine.
These routines will default to the h topology if called with any of the other values of top that are supported
by the standard version of the BLACS from the University of Tennessee (except on Cray T3D systems).
SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S), ITRBS2D(3S)
NAME
ITRBS2D, STRBS2D, CTRBS2D – Broadcasts a trapezoidal rectangular matrix to all or a subset of
processors
SYNOPSIS
CALL ITRBS2D (icntxt, scope, top, uplo, diag, m, n, a, lda)
CALL STRBS2D (icntxt, scope, top, uplo, diag, m, n, a, lda)
CALL CTRBS2D (icntxt, scope, top, uplo, diag, m, n, a, lda)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
ITRBS2D broadcasts a trapezoidal rectangular matrix to all or a subset of processors. The other processors
use the ITRBR2D(3S) routine to receive the broadcast matrix. Execution does not resume until the data
arrives.
ITRBS2D communicates integer data. STRBS2D communicates real data. CTRBS2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Specifies the network topology used by the broadcast.
top = I or i: increasing ring
top = D or d: decreasing ring
top = H or h: hypercube
uplo Character*1. (input)
Specifies whether the trapezoid is in the upper or lower triangular part of the matrix a, as
follows:
If uplo = ’U’ or ’u’, the trapezoid is in the upper triangular part of the matrix.
If uplo = ’L’ or ’l’, the trapezoid is in the lower triangular part of the matrix.
NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
For an operation to proceed, all processors indicated by the scope argument must call the routine.
These routines will default to the h topology if called with any of the other values of top that are supported
by the standard version of the BLACS from the University of Tennessee (except on Cray T3D systems).
SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S), ITRBR2D(3S)
NAME
ITRRV2D, STRRV2D, CTRRV2D – Receives a trapezoidal rectangular matrix from another processor
SYNOPSIS
CALL ITRRV2D (icntxt, uplo, diag, m, n, a, lda, rsrc, csrc)
CALL STRRV2D (icntxt, uplo, diag, m, n, a, lda, rsrc, csrc)
CALL CTRRV2D (icntxt, uplo, diag, m, n, a, lda, rsrc, csrc)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
ITRRV2D receives a trapezoidal matrix from another processor. The other processor uses the ITRSD2D(3S)
routine to send the matrix. Execution does not resume until the data arrives.
ITRRV2D communicates integer data. STRRV2D communicates real data. CTRRV2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
uplo Character*1. (input)
Specifies whether the trapezoid is in the upper or lower triangular part of the matrix a, as
follows:
If uplo = ’U’ or ’u’, the trapezoid is in the upper triangular part of the matrix.
If uplo = ’L’ or ’l’, the trapezoid is in the lower triangular part of the matrix.
diag Character*1. (input)
Specifies whether the matrix a has ones on the diagonal, as follows:
If diag = ’U’ or ’u’, specifies a unit trapezoidal matrix.
If diag = ’N’ or ’n’, specifies a non-unit trapezoidal matrix.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a ITRRV2D: Integer array, dimension (lda,n). (output)
STRRV2D: Real array, dimension (lda,n). (output)
CTRRV2D: Complex array, dimension (lda,n). (output)
The m-by-n matrix containing the trapezoidal matrix to be received.
NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
SEE ALSO
INTRO_BLACS(3S), ITRSD2D(3S)
NAME
ITRSD2D, STRSD2D, CTRSD2D – Sends a trapezoidal rectangular matrix to another processor
SYNOPSIS
CALL ITRSD2D (icntxt, uplo, diag, m, n, a, lda, rdest, cdest)
CALL STRSD2D (icntxt, uplo, diag, m, n, a, lda, rdest, cdest)
CALL CTRSD2D (icntxt, uplo, diag, m, n, a, lda, rdest, cdest)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
ITRSD2D sends a trapezoidal matrix to another processor. The other processor uses the ITRRV2D(3S)
routine to receive the matrix. Execution does not resume until the data arrives.
ITRSD2D communicates integer data. STRSD2D communicates real data. CTRSD2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
uplo Character*1. (input)
Specifies whether the trapezoid is in the upper or lower triangular part of the matrix a, as
follows:
If uplo = ’U’ or ’u’, the trapezoid is in the upper triangular part of the matrix.
If uplo = ’L’ or ’l’, the trapezoid is in the lower triangular part of the matrix.
diag Character*1. (input)
Specifies whether the matrix a has ones on the diagonal, as follows:
If diag = ’U’ or ’u’, specifies a unit trapezoidal matrix.
If diag = ’N’ or ’n’, specifies a non-unit trapezoidal matrix.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a ITRSD2D: Integer array, dimension (lda,n). (input)
STRSD2D: Real array, dimension (lda,n). (input)
CTRSD2D: Complex array, dimension (lda,n). (input)
The m-by-n matrix containing the trapezoidal matrix to be sent.
NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S), ITRRV2D(3S)
NAME
MYNODE – Returns the calling processor’s assigned number
SYNOPSIS
MY_NUMBER = MYNODE()
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
MYNODE returns a number between 0 and NPES-1, where NPES is the number of processors in the
mainframe partition on which the program is executing.
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_PCOORD(3S), BLACS_PNUM(3S), INTRO_BLACS(3S)
NAME
PCOORD3D – Computes three-dimensional (3D) processor grid coordinates
SYNOPSIS
CALL PCOORD3D (ictxt, pe_num, pex, pey, pez)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PCOORD3D computes processor grid coordinates pex, pey, and pez by using pe_num.
This routine accepts the following arguments:
ictxt Integer. (input)
Handle that describes the grid initialized by GRIDINIT3D(3S).
pe_num Integer. (input)
Processing element.
pex Integer. (output)
X coordinate for processor.
pey Integer. (output)
Y coordinate for processor.
pez Integer. (output)
Z coordinate for processor.
NOTES
The GRIDINIT3D(3S) routine must be called somewhere in the program before the first call to PCOORD3D.
SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), GRIDINIT3D(3S), PNUM3D(3S)
NAME
PNUM3D – Returns the processor element number for specified three-dimensional (3D) coordinates
SYNOPSIS
PE_number = PNUM3D (ictxt, pex, pey, pez)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PNUM3D returns the processor element number at grid coordinate pex, pey, and pez.
This routine accepts the following arguments:
ictxt Integer. (input)
Handle that describes the grid initialized by GRIDINIT3D(3S).
pex Integer. (input)
X coordinate of processor.
pey Integer. (input)
Y coordinate of processor.
pez Integer. (input)
Z coordinate of processor.
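For the 2-by-4-by-2 grid shown on the GRIDINIT3D(3S) man page, the following sketch would set PE to 13,
the processor at 3D coordinates (1,2,1) (ICTXT is assumed to have been returned by a prior GRIDINIT3D
call):

      INTEGER ICTXT, PE
      PE = PNUM3D (ICTXT, 1, 2, 1)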
NOTES
The routine GRIDINIT3D(3S) must be called somewhere in the program before the first call to PNUM3D.
SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), GRIDINIT3D(3S), PCOORD3D(3S)
NAME
INTRO_CORE – Introduction to the Scientific Library out-of-core routines for linear algebra
IMPLEMENTATION
UNICOS systems
DESCRIPTION
The Scientific Library out-of-core routines for linear algebra let you solve problems in which it is not
possible, or not convenient, to store all of the data in main memory during program execution. The central
concept on which these routines are based is the idea of the virtual matrix, which is stored outside main
memory (perhaps on disk or on SSD), and referenced through a Fortran I/O unit number.
The following list describes the purpose and name of each out-of-core routine. The first name listed is the
name of the man page that documents the routines.
Virtual Matrix Initialization and Termination Routines
• VBEGIN: Initializes out-of-core routine data structures.
• VEND: Handles terminal processing for the out-of-core routines.
• VSTORAGE: Declares packed storage mode for a triangular, symmetric, or Hermitian virtual matrix.
Virtual Matrix Copy Routines
• SCOPY2RV, CCOPY2RV: Copies a submatrix of a real (in memory) matrix to a virtual matrix.
• SCOPY2VR, CCOPY2VR: Copies a submatrix of a virtual matrix to a real (in memory) matrix.
Virtual Linear Algebra Package Routines
• VSGETRF, VCGETRF: Computes an LU factorization of a virtual general matrix, using partial pivoting
with row interchanges.
• VSGETRS, VCGETRS: Solves a system of linear equations AX = B; A is a virtual general matrix whose
LU factorization has been computed by VSGETRF(3S).
• VSPOTRF: Computes the Cholesky factorization of a virtual real symmetric positive definite matrix.
• VSPOTRS: Solves a system of linear equations AX = B; A is a virtual real symmetric positive definite
matrix whose Cholesky factorization has been computed by VSPOTRF(3S).
Virtual Level 3 Basic Linear Algebra
• VSGEMM, VCGEMM: Multiplies a virtual general matrix by a virtual general matrix.
• VSTRSM, VCTRSM: Solves a virtual triangular system of equations with multiple right-hand sides.
• VSSYRK: Performs a symmetric rank-k update of a virtual symmetric matrix.
General Introduction
Some problems are so large that it is not possible, or at least not convenient, to store all of the data in main
memory during program execution. For such problems, you can use an out-of-core technique. This term is
an anachronism, referring as it does to magnetic core memory, but the name is still used to refer to
algorithms that combine input and output with computation to solve problems in which the data resides on
disk or some other secondary random-access storage device.
Consider the problem of solving a system of simultaneous linear equations. If the system contains n
equations with n unknowns, the amount of data required to represent the problem is n^2 floating-point
numbers. The amount of computation required to compute a solution is approximately 2n^3/3 floating-point
operations. For example, if n = 30,000, the amount of memory required to store the matrix is 900 Mwords.
If the effective computational rate were 2.0 GFLOPS, the amount of time required to solve the problem
would be 9,000 seconds (2.5 hours).
This amount of computation is large compared to the amount of input and output required, so this problem is
computationally intensive. Therefore, it is an excellent candidate for solution by an out-of-core technique
(especially because solving large problems of this type is of great practical importance in many areas of
application).
Out-of-core Linear Algebra Software
The Scientific Library contains a unified set of routines for out-of-core solution of problems in dense linear
algebra. These routines are designed to be easy to use and highly efficient. The design of the out-of-core
routines is parallel to the design of library software that solves similar problems "in-core" (in memory),
namely LAPACK (Linear Algebra PACKage) and BLAS (Basic Linear Algebra Subprograms).
The LAPACK library is a state-of-the-art package for solving problems in dense linear algebra (see the
INTRO_LAPACK(3S) man page for more information on the Cray implementation of LAPACK). For out-
of-core problems in dense linear algebra, the Scientific Library has followed a software design which, from a
user perspective, is very similar to the LAPACK routines, and which uses similar or identical algorithms.
These routines are called the Virtual LAPACK, or VLAPACK, routines.
The LAPACK routines perform much of their computational work through calls to the Level 3 Basic Linear
Algebra Subprograms (Level 3 BLAS), which are designed to perform very efficiently on parallel vector
computers (see the INTRO_BLAS3(3S) man page for more information on the Level 3 BLAS routines).
Likewise, the Virtual LAPACK routines are based on a set of Virtual BLAS, called VBLAS.
Some features of these out-of-core library routines include the following:
• The routines are based on state-of-the-art algorithms for numerical linear algebra.
• Highly-efficient computational kernels perform at peak attainable speed on the hardware.
• Highly-efficient input and output is done automatically by the software; therefore, users do not have to be
involved in the details of the I/O routines.
• Virtual matrices are easy to create and use.
• Detailed performance measurement capabilities are built in. Performance statistics can be printed
automatically to give users complete information on software and hardware performance.
• Users can easily change certain tuning parameters to optimize the software for each specific problem and
computing environment.
Virtual Matrices
An important concept in the out-of-core routines is that of a virtual matrix. You can think of a virtual
matrix as a mathematical matrix, the elements of which are accessed in a certain way, using subroutine calls.
In some ways, a virtual matrix is like a two-dimensional Fortran array. Like a Fortran array, a virtual matrix
has elements that are real numbers. Like a Fortran array, a virtual matrix has subscripts that are integers
between 1 and some positive number n, and has a certain "leading dimension," which the user defines when
creating the virtual matrix.
Unlike a Fortran array, a virtual matrix is not accessed directly from a Fortran (or C) program. Instead, you
access a virtual matrix by using calls to the out-of-core routines.
These subroutines provide the only mechanism for manipulating a virtual matrix. In particular, a user never
has to do any explicit input or output to read or write a virtual matrix. Even though a virtual matrix is
actually stored as a file, users do not have to be concerned with the actual I/O. The library software handles
the I/O details automatically and efficiently, leaving users free to concentrate on the mathematical solution to
the problem at hand, and for the most part, to ignore the fact that out-of-core techniques are in use.
The next subsection, "Subroutine Types," briefly describes these routines. After that, the NOTES section
provides more specific information about virtual matrices.
Subroutine Types
The Scientific Library out-of-core software user interface comprises four types of subroutines:
• Initialization and termination routines
• Virtual matrix copy (VCOPY) routines
• Virtual LAPACK (VLAPACK) routines
• Virtual Level 3 BLAS (VBLAS) routines
The subsections that follow describe each type of subroutine.
Initialization and termination routines
You must initialize the underlying library routines by a call to the VBEGIN(3S) routine. This routine has
several optional arguments, all of which relate to tuning performance of the package. The most important
argument is an integer that specifies how many words to use for buffer space. VBEGIN(3S) automatically
allocates the requested amount of memory, using a call to the operating system. Likewise, you must call the
VEND(3S) routine when you are done with virtual linear algebra. VEND(3S) closes any open files that are
being used for virtual matrices and deallocates the memory that was allocated by VBEGIN(3S).
VSTORAGE(3S) declares that an existing virtual matrix (initialized with VBEGIN(3S)) is stored and
referenced in packed form. See the NOTES section for more on packed storage.
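As a minimal sketch of this initialization/termination bracket (the buffer size is an assumed value, and the
full optional argument lists are documented on the VBEGIN(3S) and VEND(3S) man pages):

      INTEGER NWORK
*     Request about 2 Mwords of page-buffer space (illustrative; see
*     the Performance Tuning discussion in the NOTES section).
      NWORK = 2000000
      CALL VBEGIN (NWORK)
*     ... out-of-core computations on virtual matrices ...
*     Close the virtual matrix files and free the buffer space; a
*     zero argument suppresses the performance statistics report.
      CALL VEND (0)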
For applications in which the matrices contain complex numbers, rather than real numbers, you can use the
Virtual LAPACK routines VCGETRF(3S) and VCGETRS(3S).
Virtual Level 3 BLAS routines
The LAPACK routines perform much of their actual computation by calling Level 3 BLAS routines, which
are designed for speed and efficiency on parallel-vector computers (see the INTRO_BLAS3(3S) man page).
Likewise, the Virtual LAPACK routines are based on a set of Virtual Level 3 BLAS (VBLAS).
For example, the BLAS routine to perform a matrix multiply is called SGEMM(3S) (single-precision real) or
CGEMM(3S) (complex). The corresponding out-of-core routine for virtual matrices is VSGEMM(3S) or
VCGEMM(3S), respectively.
The calling sequences of the Virtual LAPACK and Virtual BLAS routines are similar to those of the
corresponding LAPACK and BLAS routines, but when an in-memory routine requires a matrix argument, the
corresponding virtual routine requires one or more arguments that specify a virtual matrix.
NOTES
This section describes further aspects of virtual matrices and the out-of-core routines that operate on them.
Unit Numbers
The name of a virtual matrix is an integer between 1 and 99, inclusive. The name identifies the
Fortran unit number of the file in which the virtual matrix is stored. By default, unit number 1 is
associated with file fort.1, unit 2 with file fort.2, and so on.
Do not use any unit number that your program is using for another purpose.
Also, do not use any of the following units: 0, 5, 6, 100, 101, or 102, because these unit numbers are, by
default, associated with the following special files:
stdin 5 and 100
stdout 6 and 101
stderr 0 and 102
You may close and reopen units 0, 5, and 6 as virtual matrices, but not units 100, 101, and 102.
You can associate a particular file with a particular unit number by using the "assign by unit" option of the
assign(1) command (see the assign(1) man page for more information about the assign command).
As an example, suppose you want to store your virtual matrix on a file in directory /tmp/xxx and call the
file mydata. If you choose to use Fortran unit number 3 for the file, prior to executing the program that
calls the out-of-core software, you could issue the following command line:
assign -a /tmp/xxx/mydata u:3
Within the out-of-core subroutines, you would use the number 3 as the value of the argument for the virtual
matrix name.
File Format
A virtual matrix is actually stored as a file, in a special format that is useful only for the virtual linear
algebra routines. But outside of the program, at the operating system level, such a file can be copied,
moved, archived, compressed, and so on, just like any other binary file. The assign(1) command
determines the actual characteristics of the file, including the device to which it is assigned (that is, disk or
SSD).
Technically, a virtual matrix is a binary unblocked file. You do not have to specify the -s u option on the
assign command; you cannot use other formats or conversions in conjunction with the out-of-core
routines. If you try to use other formats or conversions, your program will abort with an Asynchronous
Queued I/O (AQIO) error message. Actually, the virtual matrix file is blocked into "pages," but this
blocking is done by the Scientific Library out-of-core routines, not by the system I/O routines; therefore, for
the assign command, the virtual matrix file is considered to be an unblocked file.
The actual input and output is done internally using a feature called Asynchronous Queued I/O (AQIO).
This feature allows highly-efficient, random-access I/O without using any unnecessary intermediate buffering
of data.
If you want to use a file of data that was created by some means other than using Virtual LAPACK or
Virtual BLAS routines, you should write a program that reads the file, using the usual Fortran I/O facilities,
and copies the file, one section at a time, to a virtual matrix, by using the virtual copy routines. Likewise, if
you want to use a virtual matrix as input to some other program, you should write a program that uses the
virtual copy routines to get data from the virtual matrix, and then write it out using the usual Fortran I/O
facilities. If only the virtual linear algebra routines use the data, it is most convenient to just work with the
virtual matrix files themselves, using the subroutines provided.
Leading Virtual Dimension
A virtual matrix has a certain "leading dimension" (that is, the first dimension) just like a Fortran
two-dimensional array. For instance, if the virtual matrix is 1000 by 2000 elements, the first (leading)
dimension is 1000. You should supply the value 1000 for the leading dimension argument in the
subroutines.
You can use any value for the leading dimension, but after it is defined, you cannot change it. If you
originally created the virtual matrix with 1000 for the leading dimension, you must always use the same
value in subsequent subroutine calls.
Definition and Redefinition of Elements
When accessing elements of a virtual matrix, the value of the first subscript must be in the range
1 ≤ i ≤ lvd
where i is the subscript, and lvd is the leading virtual dimension, as defined in the subroutine call. The
second subscript must be a positive integer. There is no fixed upper limit on the value of the second subscript.
When you first create a virtual matrix, you must explicitly define every element of the matrix before you use
it in a computation. You can consider that any element you have not explicitly defined is undefined, and it
should not be referenced. For example, if you want to create an identity matrix of size 2000 by 2000, you
could zero out all 4,000,000 elements, then set the 2000 diagonal elements to 1, using the virtual copy
routines. You should not just set the diagonal elements to 1 and assume that the off-diagonal elements are 0.
After the elements of a virtual matrix are defined, their values remain defined unless you explicitly change
them or remove the file.
File Size
The size (in words) of a virtual matrix file is slightly larger than the total number of elements it contains.
Thus, a virtual matrix of size 5000 by 5000 would contain slightly more than 25 million words, or 200
Mbytes of data. The reason that it is not exactly 25 million words has to do with the way that the software
organizes data internally into pages.
When you define the value of a virtual matrix element, you are implicitly creating file space for all elements
up to the one you define. For example, if you declare that a virtual matrix has a leading dimension of 5000,
and you define a value for element (1, 1000), the software will create a virtual matrix file large enough to
contain elements (i, j) for 1 ≤ i ≤ 5000, 1 ≤ j ≤ 1000, which is 5 Mwords, or 40 Mbytes of file space.
Page Size
At the internal level, the software organizes virtual matrices into "pages." The size of a page is, by default,
256 by 256 words, or 65,536 words. I/O transfers, internally, are done in minimum units of one page. For
both disk and SSD, this size gives excellent performance.
You may redefine the page size that the out-of-core routines use, although it is not recommended unless
special performance tuning considerations are involved. The internal file structure of a virtual matrix
depends on the page size. Thus, a virtual matrix created with a certain page size could not later be read or
written using a different page size; but instead, it would have to be re-created.
Lower-level Routines
The user-level out-of-core routines are built on lower-level routines that manage work request queues, active
page queues, and other tasks. These routines in turn, depend on the AQIO routines and the operating system
routines.
Strassen’s Algorithm
Strassen’s algorithm for matrix multiplication is a recursive algorithm that is slightly faster than the ordinary
(inner product) algorithm. This additional speed is purchased at the expense of requiring some additional
memory for intermediate workspace. Because the Virtual LAPACK and Virtual BLAS routines are
managing their own memory anyway, and performing their work on individual page size blocks, it is an easy
matter to use Strassen’s algorithm everywhere that a matrix multiplication is required.
Strassen’s algorithm performs the floating-point operations for matrix multiplication in an order that is very
different from that of the usual vector method. In some cases, this could cause differences in round-off, possibly
leading to numerical differences in the result.
You may choose whether to use Strassen’s algorithm when calling VBEGIN(3S), either by passing an
argument to VBEGIN(3S), or by setting the VBLAS_STRASSEN environment variable before run time. For
C shell, use the following command:
setenv VBLAS_STRASSEN
If the user selects Strassen’s algorithm, VBEGIN(3S) automatically allocates the necessary workspace. In
subsequent virtual matrix computations, Strassen’s algorithm is then automatically used for all matrix
multiplications, including matrix multiplications done as part of the VSGEMM(3S) and VSTRSM(3S) routines.
Multitasking
Like most of the Scientific Library routines, the Virtual LAPACK and Virtual BLAS routines perform
multitasking automatically. To control the use of multitasking, set the value of the NCPUS environment
variable before run time to an integer number that indicates the number of processors you want to use. For
example, to use only one CPU (which effectively turns off multitasking), use one of the following
commands. For the C shell, enter the following command:
setenv NCPUS 1
If, instead, you set NCPUS to 4, the software will try to use four CPUs. The actual number of CPUs used depends on the availability of
resources (see the INTRO_LIBSCI(3S) man page for more information on multitasking in the Scientific
Library).
Complex Routines
Most of the out-of-core software described previously deals with matrices of real numbers. There are also
counterparts to these routines that work with matrices of complex numbers (numbers that have a real and
imaginary part). For example, the complex two-dimensional counterpart of the virtual copy routine
SCOPY2RV(3S) is routine CCOPY2RV(3S). Likewise, routine VCGETRF(3S) factors a general complex
virtual matrix. In the naming conventions for all routines, the letter "S" denotes real (that is, "single-
precision") data; the letter "C" denotes complex data.
Packed Storage
Packed storage of a triangular or symmetric matrix means that only half of the matrix is actually stored on
disk or SSD. If a real matrix is declared to be lower triangular, only the lower triangle is stored; if upper
triangular, only the upper triangle is stored. If the matrix is symmetric, either the lower or upper triangular
part may be stored.
Likewise, a complex matrix may be lower or upper triangular, or may be symmetric, with only the lower or
upper triangle being stored. Additionally, a complex matrix may be Hermitian (equal to the conjugate of its
transpose), with either the lower or upper triangle being stored.
For the purpose of storing a matrix, the out-of-core routines do not have to distinguish between a triangular,
symmetric, or Hermitian matrix; they must know only which part of the matrix is being stored (that is, the
full matrix, the lower triangle, or the upper triangle).
In the Level 2 BLAS routines, packed storage implies a linearized storage scheme. For the out-of-core
routines, packed storage is similar, but more complicated. Because it is the page structure of the virtual
matrix binary file that is linearized, pages that correspond to the upper (or lower) part of a triangular matrix
are omitted.
Three storage modes are possible:
• FULL — The full matrix is stored
• LOWER — Only the lower triangle is stored
• UPPER — Only the upper triangle is stored
To define this storage mode, call the VSTORAGE(3S) routine, which has the calling sequence:
CALL VSTORAGE (nunit, mode)
The nunit argument is an integer that gives the unit number of the virtual matrix, and mode is a character
string giving the storage mode. See VSTORAGE(3S) for further information.
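For example, to declare that the virtual matrix on unit 3 stores only its lower triangle (a sketch using the
storage modes listed above):

      CALL VSTORAGE (3, 'LOWER')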
Performance Measurement
The out-of-core software has a built-in feature for performance measurement, and it will collect various
performance statistics automatically. The user can print these statistics when calling the VEND(3S) routine,
either by providing a nonzero argument to the VEND(3S) routine (within the program) or by setting the
VBLAS_STATISTICS environment variable (before run time). To set this environment variable in the C
shell, enter the following:
setenv VBLAS_STATISTICS
You may use this feature in addition to the usual performance tools (for example, see the procstat(1)
man page).
Error Reporting
When the out-of-core software diagnoses an error, it writes an error diagnostic to stderr and aborts. If the
error is diagnosed by the out-of-core routines themselves, the error message should be complete and self-
explanatory. For instance, a common error is to provide an insufficient amount of memory for workspace.
In this case, the error diagnostic will indicate how much memory was needed.
Example:
*** Error in routine: VBEGIN
*** Insufficient memory was given;
    minimum required (decimal words) = 198144
If the error was diagnosed by a lower-level system or library routine, the diagnostic will include the error
code. Usually, you can use the explain(1) command to get more information about the error by entering
one of the following commands:
explain sys-xxx
explain lib-xxx
The character string xxx represents the error code listed in the diagnostic. Use explain sys for error
status codes less than 100, and explain lib for higher-numbered codes (see the explain(1) man page
in the UNICOS User Commands Reference Manual, for more information).
For example, suppose that unit 1 was assigned to file /tmp/xxx/yyy/zzz, using the command:
assign -a /tmp/xxx/yyy/zzz u:1
But suppose that the /tmp/xxx/yyy directory has not been created. When the out-of-core routine tries to
create the file, it cannot, and aborts after printing the following message:
*** Error in routine: page_request
*** Error status on AQOPEN for unit number: 1
*** Error status on AQOPEN = -2
Because AQIO routines are used internally for input and output, the error is usually detected by
AQOPEN(3F), AQREAD(3F), or AQWRITE(3F). In this case, it was AQOPEN. Of more concern to users,
however, is the specific error status. The diagnostic denotes that the error occurred on unit number 1, and
that the error status code was -2. You can enter the following command, which prints a further
description explaining that one of the directories in the path name does not exist:
explain sys-2
Performance Tuning
The most important tuning parameter for the out-of-core routines is the value of nwork, the amount of buffer
space. This value is set either as an explicit argument to VBEGIN(3S) or by setting the
VBLAS_WORKSPACE environment variable before run time. If the virtual matrix is disk resident, larger
buffer space means faster I/O performance, within certain limits.
CPU time is essentially unaffected by this parameter; only I/O wait time, and hence, total wall-clock time,
are affected.
As always with out-of-core techniques, a trade-off exists between performance and size. If you use more
memory, performance will be better, but the program size increases. It is difficult to give firm rules for how
much memory you should use, but the following are some guidelines:
• The absolute minimum amount of out-of-core routine page-buffer space must be enough to hold three
pages.
• If the virtual matrix is disk resident, larger buffer space means better I/O performance, within certain
limits.
• If the virtual matrix is SSD resident, much less buffer space is needed to obtain good performance.
• If running in a dedicated environment, you should use as much memory as is available.
• If running in a batch environment, it may be desirable to use less memory, so that the job can be
scheduled and run at the same time that other user jobs are running; that is, the turnaround time of a
smaller job might be much less than for a large job, even if the I/O wait time for the smaller job is larger.
• Use enough buffer space for one "column" of pages; that is, n*np words, where np is the number of
columns per page, and n is the leading dimension of the matrix (rounded up to a multiple of np). If you
use twice this much memory, performance will improve (see the sketch after this list).
• The use of Strassen’s algorithm almost always speeds the computation for a small increase in memory.
The VBLAS statistics report the amount of memory used by Strassen’s algorithm.
• Packed storage mode should be used when appropriate, because it will save disk space with no penalty in
CPU time.
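For example, the following sketch requests one column of pages for a matrix with leading dimension 5000 and the default page size of 256 (the values are illustrative):
      INTEGER NWORK
      PARAMETER (NWORK = 5000*256)
      CALL VBEGIN (NWORK)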
For solution of a general matrix (with VSGETRF and VSGETRS, or with VCGETRF and VCGETRS), a
special memory requirement exists. These routines need enough buffer space to contain one "column" of
pages; that is, if the matrix is 5000 by 5000, the buffer space must be 5000 by 256 (assuming 256 is the
page size). This requirement arises from the nature of Gaussian elimination with partial pivoting; if less
memory were used, performance while doing the pivots could be extremely poor.
ENVIRONMENT VARIABLES
These environment variables change the default behavior of either the VBEGIN or the VEND routine. You
can override the effect of any of these settings by the corresponding argument of the affected routine.
VBLAS_PAGESIZE
Numeric value of the default page size, np. VBEGIN uses this variable to set up in-memory pages
for virtual matrices. Each page acts as an np-by-np submatrix of a virtual matrix. If unspecified,
VBEGIN defaults to np = 256.
VBLAS_STATISTICS
Flag to determine whether to print performance statistics after using the out-of-core routines.
VEND uses this variable to determine whether it should print statistics to stdout after terminating
out-of-core processing. If this variable is set (even if it has no value), the default behavior of
VEND is to print the statistics. If unspecified, VEND prints no statistics by default.
VBLAS_STRASSEN
Flag to determine whether to use Strassen’s algorithm for matrix multiplication. VBEGIN uses this
variable to determine whether it should set up data structures for Strassen’s algorithm. If the data
structures are set up, all virtual matrix multiplies use Strassen’s algorithm. If this variable is set
(even if it has no value), the default behavior of VBEGIN is to set up for Strassen’s algorithm. If
unspecified, VBEGIN defaults to the regular (inner product) matrix multiply algorithm.
VBLAS_WORKSPACE
Numeric value of nwork, the number of words of memory to set aside for I/O buffering (pages). If
unspecified, VBEGIN defaults to nwork = 6*np**2 (the number of words of memory required for six
pages).
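For example, to select Strassen multiplication and a 1,280,000-word buffer from the C shell (the values are illustrative), enter the following:
setenv VBLAS_STRASSEN
setenv VBLAS_WORKSPACE 1280000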
EXAMPLES
Some short examples illustrate how you can use these subroutines to manipulate virtual matrices. For an
explanation of the specific arguments to the subroutines, see the man pages for the individual routines.
Example 1: This example shows how you can use routine SCOPY2RV(3S) to create a virtual matrix. This
program creates a virtual matrix on unit number 1 (which, by default, is on file fort.1). Within the
program, this matrix is referred to as V, which corresponds to IV, an integer parameter set equal to 1.
The first step is to call routine VBEGIN(3S) for initialization. Next, create a vector, X, of random numbers,
and copy it to one column of the virtual matrix, using routine SCOPY2RV(3S). This procedure is repeated
for each column J of the virtual matrix. The program is as follows:
* Create a virtual matrix of random numbers of size n by n.
      INTEGER N
      PARAMETER (N = 2000)
      INTEGER IV          ! unit number of the virtual matrix V
      PARAMETER (IV = 1)
      REAL X(N)           ! vector for storing a column of V
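*     Sketch of the steps described in the text above; RANF is
*     assumed to be the Cray random number generator.
      CALL VBEGIN
      DO, J = 1, N
        DO, I = 1, N
          X(I) = RANF()
        END DO
*       copy X into column J of the virtual matrix V
        CALL SCOPY2RV (N, 1, X, N, IV, 1, J, N)
      END DO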
      CALL VEND
END
Example 2: This example illustrates the Virtual BLAS routine VSGEMM(3S) and multiplying a virtual matrix
by itself. The example assumes that the virtual matrix on unit 1 was already created, possibly by the
program in example 1. The example multiplies this virtual matrix by itself, resulting in a new virtual matrix,
called W, that corresponds to unit number 2 (integer IW). The following program copies the first column of
the result matrix W into array X, and then it prints out the first element of X, which is the value of virtual
matrix element W(1,1):
      INTEGER N
      PARAMETER (N = 2000)
      INTEGER IV, IW      ! unit numbers of the virtual matrices
      PARAMETER (IV = 1, IW = 2)
      REAL X(N)           ! vector for storing a column of W
      CALL VBEGIN
* Multiply virtual matrix V by itself, creating virtual
* matrix W = V*V
      CALL VSGEMM ('NOTRANSPOSE', 'NOTRANSPOSE', N, N, N, 1.0,
     &             IV, 1, 1, N, IV, 1, 1, N, 0.0, IW, 1, 1, N)
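*     Sketch of the remaining steps described in the text above:
*     copy the first column of W into X, then print W(1,1).
      CALL SCOPY2VR (N, 1, IW, 1, 1, N, X, 1)
      PRINT *, 'W(1,1) = ', X(1)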
      CALL VEND
END
Example 3: This example shows sample usage protocol for solving systems of equations. A program to
solve a large system of equations by using the Virtual LAPACK routines might be organized according to
the following general outline. This sample outline assumes that the user can generate the original matrix one
row at a time (by computing it, reading it, or whatever).
1. Call VBEGIN(3S) to initialize the virtual matrix routines.
2. For each row of the matrix, call a virtual copy routine to store the row in a virtual matrix. Likewise,
create a virtual matrix of right-hand sides.
3. Call routine VSGETRF(3S) to factor the general matrix.
4. Call routine VSGETRS(3S) to solve the right-hand sides.
5. For each column of the solution matrix, call a virtual copy routine to fetch the solution vector and
process it.
6. Call the VEND(3S) routine to terminate the virtual matrix routines and to close the files.
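A skeleton of this outline follows (a sketch only; the dimensions and unit numbers are illustrative, and the row and column copies are elided):
      INTEGER N, VA, VB, INFO
      PARAMETER (N = 1000, VA = 1, VB = 2)
      INTEGER IPIV(N)
      CALL VBEGIN
*     ... store the rows of A on unit VA and the right-hand sides
*     on unit VB, using SCOPY2RV ...
      CALL VSGETRF (N, N, VA, N, IPIV, INFO)
      CALL VSGETRS ('N', N, 1, VA, N, IPIV, VB, N, INFO)
*     ... fetch the solution columns with SCOPY2VR and process them ...
      CALL VEND
      END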
NAME
SCOPY2RV, CCOPY2RV – Copies a submatrix of a real or complex matrix in memory into a virtual matrix
SYNOPSIS
CALL SCOPY2RV (m, n, a, lda, nunit, iv, jv, ldv)
CALL CCOPY2RV (m, n, a, lda, nunit, iv, jv, ldv)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SCOPY2RV copies a matrix contained in a two-dimensional array of type REAL located in central memory
into a submatrix of a virtual matrix.
CCOPY2RV copies a matrix contained in a two-dimensional array of type COMPLEX located in central
memory into a submatrix of a virtual matrix.
These routines have the following arguments:
m Integer. (input)
Number of rows.
n Integer. (input)
Number of columns.
a SCOPY2RV: Real array of dimension (lda, *). (input)
CCOPY2RV: Complex array of dimension (lda, *). (input)
Contains the real (SCOPY2RV) or complex (CCOPY2RV) input matrix in memory.
lda Integer. (input)
Leading (first) dimension of a.
nunit Integer. (input)
Unit number of the virtual matrix.
This routine changes the contents of the virtual matrix. The virtual matrix itself contains either
real (SCOPY2RV) or complex (CCOPY2RV) elements.
iv Integer. (input)
Starting row index of the virtual matrix (1 to ldv).
jv Integer. (input)
Starting column of the virtual matrix (1 to n).
ldv Integer. (input)
Leading (first) dimension of the virtual matrix.
NOTES
These routines are two-dimensional analogues of the Level 1 BLAS routines SCOPY and CCOPY (see
SCOPY(3S)). The initial S in SCOPY2RV means "single-precision real," the initial C in CCOPY2RV means
"complex," the 2 means "two-dimensional," and the RV means "real (in memory) to virtual (on disk or
SSD)."
These routines provide the only available method for reading data directly from memory into a virtual
matrix. Companion routines SCOPY2VR and CCOPY2VR go in the opposite direction: virtual to real.
EXAMPLES
The following examples show how to copy various types of matrices from central memory into a virtual
matrix.
Example 1: Copy vector X, of N real elements, to row I of the virtual matrix on unit number 3. Suppose
that the virtual matrix is of size N by N, so that the leading dimension is N. Because X is a vector, the
leading dimension of X is irrelevant, and you can use the constant 1 for the lda argument.
      CALL SCOPY2RV (1, N, X, 1, 3, I, 1, N)
Example 2: Copy vector X, of N complex elements, to column J of the virtual matrix on unit number 3,
with the same assumptions as in example 1.
      CALL CCOPY2RV (N, 1, X, 1, 3, 1, J, N)
Example 3: Copy the 100-by-100 matrix A to the 100-by-100 submatrix of the virtual matrix on unit
NUNIT, beginning at virtual matrix subscript location (I, J). Assume that the virtual matrix has leading
dimension 3000.
      CALL SCOPY2RV (100, 100, A, 100, NUNIT, I, J, 3000)
Example 4: Copy the single element X(I, J) of a complex matrix X to element (IV, JV) of the virtual
matrix on unit NUNITX. Assume that the leading dimension of the virtual matrix is LDXV. Because this
subroutine call copies one element, the leading dimension of X is irrelevant, and you can use the constant 1
for the lda argument.
      CALL CCOPY2RV (1, 1, X(I, J), 1, NUNITX, IV, JV, LDXV)
Example 5: Copy the lower triangular part (the part below the main diagonal) of the first 100 rows and
columns of matrix A to the lower triangular part of the virtual matrix NVA, of leading dimension 1000,
starting at virtual array element NVA(101, 101).
      CALL VSTORAGE (NVA, 'LOWER')
      DO, I = 1, 100
        CALL SCOPY2RV (1, I, A(I, 1), 100, NVA, 100+I, 101, 1000)
      END DO
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
SCOPY2VR(3S) for a description of SCOPY2VR and CCOPY2VR, each of which copies a submatrix of a real
or complex virtual matrix into a real or complex matrix in central memory (the copy routines for the
opposite direction are SCOPY2RV and CCOPY2RV)
NAME
SCOPY2VR, CCOPY2VR – Copies a submatrix of a virtual matrix to a real or complex (in memory) matrix
SYNOPSIS
CALL SCOPY2VR (m, n, nunit, iv, jv, ldv, a, lda)
CALL CCOPY2VR (m, n, nunit, iv, jv, ldv, a, lda)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SCOPY2VR copies a submatrix of a virtual matrix into a two-dimensional array of type REAL located in
central memory.
CCOPY2VR copies a submatrix of a virtual matrix into a two-dimensional array of type COMPLEX located in
central memory.
These routines have the following arguments:
m Integer. (input)
Number of rows.
n Integer. (input)
Number of columns.
nunit Integer. (input)
Unit number of the virtual matrix. The virtual matrix itself contains either real (SCOPY2VR) or
complex (CCOPY2VR) elements.
iv Integer. (input)
Starting row index of the virtual matrix (1 to ldv).
jv Integer. (input)
Starting column of the virtual matrix (1 to n).
ldv Integer. (input)
Leading (first) dimension of the virtual matrix.
a SCOPY2VR: Real array of dimension (lda, *). (output)
CCOPY2VR: Complex array of dimension (lda, *). (output)
Contains the real (SCOPY2VR) or complex (CCOPY2VR) output matrix in memory.
lda Integer. (input)
Leading (first) dimension of a.
NOTES
These routines are two-dimensional analogues of the Level 1 BLAS routines SCOPY and CCOPY (see
SCOPY(3S)). The initial S in SCOPY2VR means "single-precision real," the initial C in CCOPY2VR means
"complex," the 2 means "two-dimensional," and the VR means "virtual (on disk or SSD) to real (in
memory)."
These routines provide the only available method for reading data from a virtual matrix into memory.
Companion routines SCOPY2RV and CCOPY2RV go in the opposite direction: real to virtual.
EXAMPLES
The following examples show how to copy various types of matrices from a virtual matrix into central
memory.
Example 1: Copy row I of the virtual matrix on unit number 3 to the real vector X. Suppose that the
virtual matrix is of size N by N, so that the leading dimension is N. Because X is a vector, the leading
dimension of X is irrelevant, and you can use the constant 1 for the lda argument.
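The call might look like the following (a sketch based on the argument list above):
      CALL SCOPY2VR (1, N, 3, I, 1, N, X, 1)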
Example 2: Copy column J of the complex virtual matrix on unit number 3 to the complex vector X, with
the same assumptions as in example 1.
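The corresponding call (a sketch based on the argument list above) is:
      CALL CCOPY2VR (N, 1, 3, 1, J, N, X, 1)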
Example 3: Copy the 100-by-100 submatrix of the virtual matrix on unit NUNIT, beginning at virtual
matrix subscript location (I, J), to the 100-by-100 matrix A. Assume that the virtual matrix has leading
dimension 3000.
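The corresponding call (a sketch based on the argument list above) is:
      CALL SCOPY2VR (100, 100, NUNIT, I, J, 3000, A, 100)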
Example 4: Copy the single element of the complex virtual matrix on unit NUNITX, at subscript position
(IV, JV), to the single element X(I, J) of the complex matrix X. Assume that the leading dimension
of the virtual matrix is LDXV. Because this subroutine call copies one element, the leading dimension of X
is irrelevant, and you can use the constant 1 for the lda argument.
      CALL CCOPY2VR (1, 1, NUNITX, IV, JV, LDXV, X(I, J), 1)
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
SCOPY2RV(3S), which documents SCOPY2RV and CCOPY2RV, each of which copies a submatrix of a real
or complex matrix in central memory into a virtual matrix (the copy routines for the opposite direction are
SCOPY2VR and CCOPY2VR)
NAME
VBEGIN – Initializes the out-of-core routine data structures
SYNOPSIS
CALL VBEGIN
CALL VBEGIN [(nwork)]
CALL VBEGIN [(nwork, nstrassen)]
CALL VBEGIN [(nwork, nstrassen, np)]
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VBEGIN initializes the data structures in memory that are used to handle virtual matrices. A program must
call VBEGIN once before beginning virtual matrix work and must call the companion routine, VEND(3S),
after virtual matrix work is complete. A program can have more than one "code block" of virtual matrix
work, but each block must begin with a call to VBEGIN and end with a call to VEND.
This routine takes as its first argument the minimum size of the I/O buffer space the user wants to allocate.
The routine allocates this much memory for buffers (using a malloc(3C) library call). The routine also
allocates a small additional amount of memory for its own data structures. When the VEND(3S) routine is
called to handle terminal processing, all allocated memory is freed. Other arguments determine some of the
inner working of the out-of-core routines.
All of these arguments are optional; you can also specify them by using environment variables. When both
are given, settings are resolved in the following order of precedence:
1. Explicit argument; if the argument is given explicitly in the VBEGIN call, any conflicting settings are
ignored.
2. Environment variable; if no explicit argument is given, but there is an environment variable setting, that
setting is used.
3. If neither the argument nor the environment variable is set, there is an internal default (shown in the
argument list that follows as "DEFAULT: ...").
This routine has the following optional arguments:
nwork Integer. (input)
Minimum number of words to use for buffer space for I/O. The minimum number of words is
3*np**2, enough space for three "pages" (see the np argument description).
The corresponding environment variable is VBLAS_WORKSPACE, which you should set to a
numeric value that gives the number of words to use.
DEFAULT: nwork = 6*np**2, enough space for six "pages."
NOTES
The most important tuning parameter for the out-of-core routines is the value of nwork, the minimum
amount of buffer space.
If the virtual matrix is disk resident, larger buffer space means faster I/O performance, within certain limits.
CPU time is essentially unaffected by the amount of buffer space; only I/O wait time, and hence, total wall-
clock time, are affected.
As always with out-of-core techniques, a trade-off exists between performance and size. If you use more
memory, performance will be better, but the program size increases. It is difficult to give firm guidelines as
to how much memory you should use. If running in a multiuser environment, it may be desirable to use less
memory, so that the job can be scheduled and run at the same time that other user jobs are running. This
means that the turnaround time of a smaller job might be much less than for a large job, even if the I/O wait
time for the smaller job is larger. If running in a dedicated environment, it would make sense to use as
much available memory as possible.
One rule of thumb for good performance is to use enough buffer space for one column of pages. For
example, if the page size is np and the largest of the leading dimensions of your virtual matrices (rounded up
to the next multiple of np) is n, set the minimum buffer size to nwork = n*np. If you use twice this
much memory (nwork = 2*n*np), performance will improve.
If the virtual matrix resides in SSD, much less buffer space is needed to obtain good performance.
For solving a general virtual matrix (with VSGETRF(3S) and VSGETRS(3S)), the preceding rule of thumb
becomes a minimum memory requirement. These routines need enough buffer space to contain one column
of pages; that is, if you are factoring a virtual matrix of size 5000 by 4000 and the page size is np = 256, the
minimum buffer space setting must be nwork = 5000*256 = 1,280,000 words. This size restriction is needed
because of the nature of Gaussian elimination with partial pivoting. If less memory is used, the performance
when doing pivots might be extremely poor.
ENVIRONMENT VARIABLES
These environment variables change the default behavior of the VBEGIN routine. To override the effect of
any of these settings, use the corresponding argument of VBEGIN.
VBLAS_PAGESIZE
Numeric value of the default page size, np. VBEGIN uses this variable to set up in-memory pages
for virtual matrices. Each page acts as an np-by-np submatrix of a virtual matrix. If unspecified,
VBEGIN defaults to np = 256.
VBLAS_STRASSEN
Flag to determine whether to use Strassen’s algorithm for matrix multiplication. VBEGIN uses this
variable to determine whether it should set up data structures for Strassen’s algorithm. If the data
structures are set up, all virtual matrix multiplies use Strassen’s algorithm. If this variable is set
(even if it has no value), the default behavior of VBEGIN is to set up for Strassen’s algorithm. If
unspecified, VBEGIN defaults to the regular (inner product) matrix multiply algorithm.
VBLAS_WORKSPACE
Numeric value of nwork, the number of words of memory to set aside for I/O buffering (pages). If
unspecified, VBEGIN defaults to nwork = 6*np**2 (the number of words of memory required for
six pages).
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VEND(3S), VSGETRF(3S), VSGETRS(3S)
malloc(3C) in the UNICOS System Libraries Reference Manual
NAME
VEND – Handles terminal processing for the out-of-core routines
SYNOPSIS
CALL VEND [(info)]
IMPLEMENTATION
UNICOS systems
DESCRIPTION
The VEND routine does termination processing for the out-of-core routines. You must call VEND as the last
step in out-of-core processing. This routine ensures that any output in progress from the out-of-core routines
is completed, and then it deallocates all of the storage space that VBEGIN(3S) allocated.
After calling VEND, you can call the VBEGIN(3S) routine again, if you desire, to reinitialize the out-of-core
routines (including their performance statistics).
This routine has the following optional argument:
info Integer. (input)
Flag to request out-of-core routine performance statistics output.
If you supply this argument with a nonzero value, a set of performance statistics about the out-of-
core routines is printed on stdout. If you omit the argument, the statistics are printed if and only
if the VBLAS_STATISTICS environment variable is set. Statistics reported include the following:
• Total elapsed time
• Total CPU time
• Total I/O wait time
• Total workspace used
• Number of words read and written
• A distribution of wait times
You can use this performance statistics feature in addition to the usual performance tools.
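For example, to force the statistics report regardless of the environment, call VEND with a nonzero argument:
      CALL VEND (1)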
NOTES
If a program terminates abnormally before VEND is called, you should assume that any virtual matrices
created or changed by the program were destroyed, because the integrity of their data cannot be guaranteed.
Virtual matrices used only for input will remain valid.
You can use the optional statistics report that VEND prints to judge whether using more or less memory for
buffer space would significantly affect performance.
Generally, if the total I/O wait time is a small percentage of wall-clock time, the program is compute-bound
and no more memory is needed.
Other useful statistics are the virtual read and write rates. These statistics measure the amount of data
transferred divided by the time when the out-of-core routines were idle because they were waiting for I/O. If
the virtual read and write rates are much faster than the physical speed of the device being used, ample
memory was used for buffer space.
ENVIRONMENT VARIABLES
The following environment variable changes the default behavior of the VEND routine. To override the
effect of this setting use the info argument.
VBLAS_STATISTICS
Flag to determine whether to print performance statistics after using the out-of-core routines.
VEND uses this variable to determine whether it should print statistics to stdout after terminating
out-of-core processing. If this variable is set (even if it has no value), the default behavior of
VEND is to print the statistics. If unspecified, VEND prints no statistics by default.
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VBEGIN(3S)
NAME
VSGEMM, VCGEMM – Multiplies a virtual real or complex general matrix by a virtual real or complex general
matrix
SYNOPSIS
CALL VSGEMM (transa, transb, m, n, l, alpha, nunita, ia1, ja1, lda, nunitb, ib1, jb1, ldb,
beta, nunitc, ic1, jc1, ldc)
CALL VCGEMM (transa, transb, m, n, l, alpha, nunita, ia1, ja1, lda, nunitb, ib1, jb1, ldb,
beta, nunitc, ic1, jc1, ldc)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VSGEMM and VCGEMM each perform one of the following matrix-matrix operations:
C ← α op(A) op(B) + β C
where α and β are scalars; A, B, and C are virtual matrices or submatrices; op(A) is an m-by-l matrix; op(B)
is an l-by-n matrix; C is an m-by-n matrix; and op(X) is one of the following:
op(X) = X
op(X) = X**T (the transpose of X)
op(X) = X**H (the conjugate transpose of X; VCGEMM only)
These routines have the following arguments:
transa Character*1. (input)
Specifies the form of op(A) to be used in the matrix multiplication, as follows:
transa = 'N' or 'n': op(A) = A
transa = 'T' or 't': op(A) = A**T
transa = 'C' or 'c': op(A) = A**T (VSGEMM), or op(A) = A**H (VCGEMM)
This argument can be any length. Only the first character is significant (for example, ’t’ and
’transpose’ have the same effect).
transb Character*1. (input)
Specifies the form of op(B) to be used in the matrix multiplication, as follows:
transb = 'N' or 'n': op(B) = B
transb = 'T' or 't': op(B) = B**T
transb = 'C' or 'c': op(B) = B**T (VSGEMM), or op(B) = B**H (VCGEMM)
This argument can be any length. Only the first character is significant (for example, ’t’ and
’transpose’ have the same effect).
m Integer. (input)
Number of rows of output matrix.
n Integer. (input)
Number of columns of output matrix.
l Integer. (input)
Number of columns of A = number of rows of B.
alpha VSGEMM: Real. (input)
VCGEMM: Complex. (input)
Scalar factor α.
nunita Integer. (input)
Fortran unit number of virtual matrix A. The virtual matrix itself is composed of real numbers
(VSGEMM) or complex numbers (VCGEMM).
ia1 Integer. (input)
Row subscript of first element of virtual matrix A.
ja1 Integer. (input)
Column subscript of first element of A.
lda Integer. (input)
Leading virtual dimension of virtual matrix A.
nunitb Integer. (input)
Fortran unit number of virtual matrix B. The virtual matrix itself is composed of real numbers
(VSGEMM) or complex numbers (VCGEMM).
ib1 Integer. (input)
Row subscript of first element of virtual matrix B.
jb1 Integer. (input)
Column subscript of first element of B.
ldb Integer. (input)
Leading virtual dimension of virtual matrix B.
beta VSGEMM: Real. (input)
VCGEMM: Complex. (input)
Scalar factor β.
nunitc Integer. (input)
Fortran unit number of virtual matrix C. The virtual matrix itself is composed of real numbers
(VSGEMM) or complex numbers (VCGEMM), and it is changed by this routine.
ic1 Integer. (input)
Row subscript of first element of virtual matrix C.
jc1 Integer. (input)
Column subscript of first element of C.
NOTES
This routine is the virtual counterpart of the Level 3 BLAS routine SGEMM. The calling sequence is similar
to that of SGEMM. The difference is that in SGEMM a matrix operand (A, for example) would be defined by
the following two arguments:
a(i,j) The starting position of the matrix A within array a
lda The leading dimension of the array a (distance between adjacent elements of a row of A)
In VSGEMM, however, a matrix operand (A) would be defined by the following four arguments:
nunita The Fortran unit number of the file that contains the virtual matrix
ia1 Row subscript within the virtual matrix file at which the submatrix A begins
ja1 Column subscript within the virtual matrix file at which the submatrix A begins
lda Leading virtual dimension of the virtual matrix file (virtual subscript distance between adjacent
elements of a row of the submatrix A)
EXAMPLES
Two examples of virtual matrix multiplication follow.
Example 1: Multiply the complex virtual matrix V, of dimension N-by-N, by itself, creating complex virtual
matrix W = V * V. Assume the virtual matrix V was already created, on unit 1, and that this operation will
create the virtual matrix W on unit 2.
      INTEGER N
*     the dimension N is illustrative; any positive value works
      PARAMETER (N = 2000)
      INTEGER V, W
      PARAMETER (V = 1, W = 2)
      CALL VBEGIN
      CALL VCGEMM ('NOTRANSPOSE', 'NOTRANSPOSE', N, N, N,
     &             (1.0,0.0), V, 1, 1, N, V, 1, 1, N,
     &             (0.0,0.0), W, 1, 1, N)
      CALL VEND
      END
Example 2: Let X be an in-memory vector of length N, consisting of random numbers. Copy X to the
virtual matrix on unit 1, which is defined as the parameter VX, that has dimension N-by-1. Multiply this
virtual matrix by its transpose, giving an N-by-N virtual matrix on unit 2, which is defined as the VY
parameter.
      INTEGER N
      PARAMETER (N = 1000)
      REAL X(N)
      INTEGER VX, VY
      PARAMETER (VX = 1, VY = 2)
C     initialize
      CALL VBEGIN
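C     Sketch of the steps described in the text above; RANF is
C     assumed to be the Cray random number generator.
      DO, I = 1, N
        X(I) = RANF()
      END DO
C     copy X to the N-by-1 virtual matrix VX
      CALL SCOPY2RV (N, 1, X, N, VX, 1, 1, N)
C     multiply VX by its transpose: VY = VX * VX**T (N by N)
      CALL VSGEMM ('NOTRANSPOSE', 'TRANSPOSE', N, N, 1, 1.0,
     &             VX, 1, 1, N, VX, 1, 1, N, 0.0, VY, 1, 1, N)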
      CALL VEND
END
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
SGEMM(3S) for a description of SGEMM(3S) and CGEMM(3S), the in-memory equivalents of the out-of-core
routines VSGEMM and VCGEMM
NAME
VSGETRF, VCGETRF – Computes an LU factorization of a virtual general matrix with real or complex
elements, using partial pivoting with row interchanges
SYNOPSIS
CALL VSGETRF (m, n, nunita, lda, ipiv, info)
CALL VCGETRF (m, n, nunita, lda, ipiv, info)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VSGETRF and VCGETRF are the out-of-core versions of SGETRF and CGETRF (see SGETRF(3L)). Each
computes an LU factorization of a real (VSGETRF) or complex (VCGETRF) general m-by-n matrix A, using
partial pivoting with row interchanges.
The factorization has the form:
A = P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if
m > n), and U is upper triangular (upper trapezoidal if m < n).
These routines have the following arguments.
m Integer. (input)
The number of rows of the virtual matrix A. m ≥ 0.
n Integer. (input)
The number of columns of the virtual matrix A. n ≥ 0.
nunita Integer. (input)
VSGETRF: Unit number of the virtual real matrix of dimension (lda, n).
VCGETRF: Unit number of the virtual complex matrix of dimension (lda, n).
The virtual matrix itself is used for both input and output.
On entry, A, the m-by-n matrix to be factored, is stored in the virtual matrix file, starting at
subscript (1,1). On exit, the virtual matrix A is replaced by the triangular matrix factors L and
U; the unit diagonal elements of L are not stored.
lda Integer. (input)
The leading (first) dimension of the virtual matrix A. lda ≥ MAX(1,m).
ipiv Integer array of dimension (MIN(m,n)). (output)
The pivot indices. Row i of the matrix was interchanged with row ipiv(i).
info Integer. (output)
=0 Successful exit.
<0 If info = -k, the kth argument had an illegal value.
>0 If info = k, U(k,k) is exactly 0. The factorization has been completed, but the factor U is
exactly singular, and division by 0 will occur if it is used to solve a system of equations or
to compute the inverse of A.
NOTES
This routine requires workspace of two types:
• If m is the first argument (number of rows) and np is the page size established by VBEGIN(3S), the
amount of buffer space that was allocated in the VBEGIN(3S) routine must be at least m*np words.
This minimum size of workspace is necessary to prevent a huge amount of page thrashing (excessive I/O
to reread data) when doing the partial pivoting operations.
• In addition to the buffer space allocated by VBEGIN(3S), this routine also allocates m*np words of
workspace for its own use. Thus, the total memory requirement is a minimum of 2*np*m words (plus the
much smaller workspace for data structures that is also allocated by VBEGIN(3S)).
If insufficient memory is available, the routine exits with an error message, which indicates how much
memory was needed.
EXAMPLES
This example illustrates using both VSGETRF and VSGETRS to solve a set of 1000 linear equations in 1000
unknowns. It is assumed that a virtual matrix of size 1000 by 1000 was created on unit 1 (defined as
parameter VA), representing the equations, and that a virtual matrix of size 1000 by 1 was created on unit 2
(defined as parameter VB), representing the right-hand side. The square matrix is factored and that
factorization is used, along with the right-hand side matrix, to compute a solution matrix.
C Compute the LU factorization of the virtual matrix A on unit 1,
C which is assumed to have been created previously and have
C dimension 1000 by 1000.
C Solve the equation A*X = B,
C where B is a virtual matrix on unit 2, assumed to have dimension
C 1000 by 1.
C
      INTEGER M, LD, VA, VB
      PARAMETER (M = 1000, LD = M, VA = 1, VB = 2)
      INTEGER IPIV(LD)
      REAL X(LD)
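C     Sketch of the omitted steps: initialize, factor, solve, and
C     fetch the solution vector (INFO is implicitly an integer).
      CALL VBEGIN
      CALL VSGETRF (M, M, VA, LD, IPIV, INFO)
      CALL VSGETRS ('N', M, 1, VA, LD, IPIV, VB, LD, INFO)
C     copy the solution (column 1 of B) into X
      CALL SCOPY2VR (M, 1, VB, 1, 1, LD, X, 1)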
      DO, I = 1, 5
        PRINT *, 'X(', I, ') = ', X(I)
      END DO
      CALL VEND
      PRINT *, 'done'
END
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VBEGIN(3S)
VSGETRS(3S) (which documents VSGETRS(3S) and VCGETRS(3S)) to solve linear systems based on the
factorization computed by VSGETRF or VCGETRF, respectively
SGETRF(3L) (available only online) for a description of SGETRF(3L) and CGETRF(3L), which are the in-
memory equivalents of the out-of-core routines VSGETRF and VCGETRF
NAME
VSGETRS, VCGETRS – Solves a virtual system of linear equations, using the LU factorization computed by
VSGETRF(3S) or VCGETRF(3S)
SYNOPSIS
CALL VSGETRS (trans, n, nrhs, nunita, lda, ipiv, nunitb, ldb, info)
CALL VCGETRS (trans, n, nrhs, nunita, lda, ipiv, nunitb, ldb, info)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VSGETRS and VCGETRS are virtual matrix versions of the LAPACK routines SGETRS and CGETRS (see
SGETRS(3L)). VSGETRS or VCGETRS uses the LU factorization of matrix A as computed by
VSGETRF(3S) or VCGETRF(3S), respectively.
VSGETRS and VCGETRS solve one of the following linear systems:
AX = B
A**T X = B
A**H X = B (VCGETRS only)
where A**T is the transpose of A, A**H is the conjugate transpose of A, A is an n-by-n matrix, and X and
B are n-by-nrhs matrices.
These routines have the following arguments:
trans Character*1. (input)
Specifies the solution, X, to be computed as follows:
trans = 'N' or 'n' (no transpose): AX = B
trans = 'T' or 't' (transpose): A**T X = B
trans = 'C' or 'c' (conjugate transpose): A**H X = B (VCGETRS), or A**T X = B (VSGETRS)
This argument can be any length. Only the first character is significant (for example, ’t’ and
’transpose’ have the same effect).
n Integer. (input)
Number of rows and columns of the matrix A. n ≥ 0.
nrhs Integer. (input)
Number of right-hand sides. The number of columns of the matrix B. nrhs ≥ 0.
nunita Integer. (input)
VSGETRS: Unit number of the real virtual matrix of dimension (lda, n).
VCGETRS: Unit number of the complex virtual matrix of dimension (lda, n).
nunita is itself a virtual matrix file used only for input. The matrix contains the LU factorization
of the matrix A, which must be computed by VSGETRF or VCGETRF before calling VSGETRS
or VCGETRS, respectively.
lda Integer. (input)
Leading (first) dimension of the virtual matrix A. lda ≥ MAX(1, n).
ipiv Integer array of dimension n. (input)
Array of pivot indices as determined by VSGETRF or VCGETRF.
nunitb Integer. (input)
VSGETRS: Unit number of the real virtual matrix B of dimension (ldb, n).
VCGETRS: Unit number of the complex virtual matrix B of dimension (ldb, n).
nunitb is itself a virtual matrix file used for input and output. On entry, B contains the right-
hand side vectors for the systems of linear equations. On exit, the solution vectors, columns of
the matrix X, replace the right-hand side vectors.
ldb Integer. (input)
Leading (first) dimension of the virtual matrix B. ldb ≥ MAX(1, n).
info Integer. (output)
= 0 Normal return (successful exit).
< 0 If info = -k, the kth argument has an illegal value.
NOTES
This routine requires workspace of two types:
• The amount of buffer space that was allocated in the VBEGIN routine must be at least n*np words; n
is the order of the matrix, and np is the page size established by VBEGIN(3S). This
minimum size of workspace is necessary to prevent a huge amount of page thrashing (excessive I/O to
reread data) when doing the partial pivoting operations.
• In addition to the buffer space allocated by VBEGIN, this routine also allocates n*np words of workspace
for its own use. Thus, the total memory requirement is a minimum of 2*np*n words (plus the much smaller
workspace for data structures that is also allocated by VBEGIN).
If insufficient memory is available, the routine exits with an error message, which indicates how much
memory is needed.
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VSGETRF(3S) for an example of using VSGETRS in conjunction with VSGETRF
VBEGIN(3S), VCGETRF(3S)
SGETRS(3L) (available only online) for a description of SGETRS and CGETRS, which are the LAPACK
routines on which VSGETRS and VCGETRS are based
NAME
VSPOTRF – Computes the Cholesky factorization of a real symmetric positive definite virtual matrix
SYNOPSIS
CALL VSPOTRF (uplo, n, nunita, lda, info)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VSPOTRF is the virtual (out-of-core) implementation of the LAPACK routine SPOTRF(3L) (documented
online). It computes the Cholesky factorization of the real symmetric positive definite virtual matrix A,
which is accessed through the I/O unit number nunita. It uses an out-of-core technique based on the Virtual
Level 3 Basic Linear Algebra Subroutines (Virtual Level 3 BLAS or VBLAS).
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of the symmetric matrix A is stored.
uplo = ’U’ or ’u’: the upper triangle of matrix A is stored.
uplo = ’L’ or ’l’: the lower triangle of matrix A is stored.
n Integer. (input)
Number of columns in virtual matrix A. n ≥ 0.
nunita Integer. (input)
Unit number of the file that contains the virtual matrix A.
The virtual matrix A is a real array of dimension (lda, n). The virtual matrix file nunita is used
for both input and output. On entry, the virtual matrix contains the symmetric positive definite
matrix to be factored.
If uplo = 'U' or 'u', the leading n-by-n upper triangular part of the virtual matrix contains the upper
triangular part of matrix A, and the strictly lower triangular part is not referenced.
If uplo = 'L' or 'l', the leading n-by-n lower triangular part of the virtual matrix contains the lower
triangular part of matrix A, and the strictly upper triangular part is not referenced.
On exit, the triangular factor L or U from the Cholesky factorization of matrix A overwrites
virtual matrix A. Given L or U, you can write the factorization as one of the following:
A = U**T U, where U is an upper triangular matrix, and U**T is the transpose of U.
A = L L**T, where L is a lower triangular matrix, and L**T is the transpose of L.
1 ≤ nunita ≤ 99
You may use packed storage mode for the virtual matrix. See the NOTES section.
NOTES
This routine allocates workspace of size np*np words; np is the page size used by the virtual matrix
routines.
See INTRO_CORE(3S) for general information about Virtual BLAS and Virtual LAPACK out-of-core
software.
You can use packed storage mode to store virtual matrix A, reducing the required amount of disk space by
about 50%. The uplo argument itself does not necessarily imply that packed storage mode will be used;
however, if packed storage is used, the storage mode must agree with the value of uplo (that is, both ’U’ or
both ’L’).
To specify packed storage mode, you must use a prior call to the VSTORAGE(3S) routine.
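For example, to declare upper triangular packed storage for the virtual matrix on unit NUNITA before factoring it:
      CALL VSTORAGE (NUNITA, 'UPPER')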
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VSPOTRS(3S) to solve linear systems by using the factorization computed by VSPOTRF
VSTORAGE(3S) for a definition of the packed storage mode for the virtual matrix used by VSPOTRF
SPOTRF(3L) (available only online), which is the in-memory equivalent of the out-of-core routine VSPOTRF
NAME
VSPOTRS – Solves a virtual system of linear equations with a symmetric positive definite matrix whose
Cholesky factorization has been computed by VSPOTRF(3S)
SYNOPSIS
CALL VSPOTRS (uplo, n, nrhs, nunita, lda, nunitb, ldb, info)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VSPOTRS is the virtual (out-of-core) implementation of the LAPACK routine SPOTRS(3L). It solves a
system of linear equations AX = B; A is a symmetric positive definite virtual matrix (stored on I/O unit
nunita), whose Cholesky factorization has been computed by VSPOTRF(3S).
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the Cholesky factor stored in virtual matrix A is upper triangular or lower
triangular.
uplo = ’U’ or ’u’: the factor is upper triangular.
uplo = ’L’ or ’l’: the factor is lower triangular.
n Integer. (input)
Number of rows (or columns) of virtual matrix A. n ≥ 0.
nrhs Integer. (input)
Number of right-hand sides. (The number of columns of matrix B.) nrhs ≥ 0.
nunita Integer. (input)
Unit number of the virtual matrix of dimension (lda, n).
The virtual matrix contains a Cholesky factor of A. The virtual matrix file nunita is used only
for input. On entry, the virtual matrix must contain the upper or lower triangular Cholesky
factor of the original virtual matrix A, as computed by VSPOTRF(3S). 1 ≤ nunita ≤ 99.
lda Integer. (input)
Leading dimension of the virtual matrix A. lda ≥ MAX(1, n).
nunitb Integer. (input)
Unit number of the virtual matrix B of dimension (ldb, nrhs). The virtual matrix file nunitb is
used for both input and output. On entry, the virtual matrix stores the right-hand-side vectors
(columns of B) for the system of linear equations. On exit, the virtual matrix contains the
solution vectors (columns of X). 1 ≤ nunitb ≤ 99.
NOTES
See INTRO_CORE(3S) for general information about the Virtual BLAS and Virtual LAPACK out-of-core
software.
You can use packed storage mode to store virtual matrix A, reducing the required amount of disk space by
about 50%. The uplo argument itself does not necessarily imply that packed storage mode will be used;
however, if packed storage is used, the storage mode must agree with the value of uplo (that is, both ’U’ or
both ’L’).
To specify packed storage mode, you must use a prior call to the VSTORAGE(3S) routine.
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VSPOTRF(3S) to compute the factorization used by VSPOTRS
VSTORAGE(3S) for a definition of the packed storage mode for the virtual matrix used by VSPOTRS
SPOTRS(3L) (available only online), which is the in-memory equivalent of the out-of-core routine VSPOTRS
NAME
VSSYRK – Performs a symmetric rank k update of a real symmetric virtual matrix
SYNOPSIS
CALL VSSYRK (uplo, trans, n, k, alpha, nunita, ia1, ja1, lda, beta, nunitc, ic1, jc1, ldc)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VSSYRK is the virtual matrix equivalent of the Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS)
routine SSYRK(3S). VSSYRK performs a symmetric rank k update of a real symmetric virtual matrix.
VSSYRK performs one of the following symmetric rank k operations:
C ← α A A**T + β C
C ← α A**T A + β C
where A**T is the transpose of A, α and β are scalars, C is an n-by-n symmetric virtual matrix, and A is an
n-by-k virtual matrix in the first operation listed previously, or a k-by-n virtual matrix in the second.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of virtual matrix C is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = 'N' or 'n': C ← α A A**T + β C
trans = 'T' or 't': C ← α A**T A + β C
n Integer. (input)
Specifies the order of virtual matrix C (the number of rows or columns in C). n ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’: k specifies the number of columns of matrix A.
On entry with trans = ’T’ or ’t’: k specifies the number of rows of matrix A.
k ≥ 0.
alpha VSSYRK: Real. (input)
Scalar factor α.
NOTES
Each calling sequence is similar to that of the equivalent Level 3 BLAS routine, except that a real (in-
memory) matrix is specified by the following:
• Location (for example, A(I,J))
• Leading dimension (LDA)
A virtual (out-of-core) matrix is specified by the following:
• Unit number (NUNITA)
• Location within the file (IA1, JA1)
• Leading dimension (LDA)
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
SSYRK(3S) for the in-memory equivalent of the out-of-core routine VSSYRK
NAME
VSTORAGE – Declares packed storage mode for a triangular, symmetric, or Hermitian (complex only)
virtual matrix
SYNOPSIS
CALL VSTORAGE (nunit, mode)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VSTORAGE declares the mode of packed storage, mode, on the I/O unit nunit that contains a triangular,
symmetric, or Hermitian virtual matrix. These packed storage modes are for use with out-of-core routines in
the Level 3 Virtual Basic Linear Algebra Subprograms (Virtual BLAS or VBLAS) and the Virtual LAPACK
routines.
Packed storage of a triangular or symmetric matrix means that only half of the matrix is actually stored. If a
real virtual matrix is declared to be lower triangular, only the lower triangle is stored; if upper triangular,
only the upper triangle is stored. If the matrix is symmetric, either the lower or upper triangular part is
stored.
Likewise, a complex matrix may be lower or upper triangular, or it may be symmetric, with only the lower
or upper triangle being stored. In addition, a complex matrix may be Hermitian (equal to the conjugate of
its transpose), with only the lower or upper triangle being stored.
When reading from or writing to a virtual matrix, the out-of-core routines do not have to distinguish between
a triangular, symmetric, or Hermitian matrix. They must know only which part of the matrix is being stored:
the full matrix, the lower triangle, or the upper triangle. This defines the three modes of storage for a virtual
matrix:
• Full. The full matrix is stored.
• Lower. Only the lower triangle is stored.
• Upper. Only the upper triangle is stored.
VSTORAGE associates one of these modes of storage with the unit number of a virtual matrix. Then,
whenever an out-of-core routine has that unit number as a virtual matrix argument, it handles the virtual
matrix according to the mode of storage declared in the VSTORAGE call (see the NOTES section).
This routine has the following arguments:
nunit Integer. (input)
Fortran unit number of a virtual matrix.
mode Character*1. (input)
The storage mode of the virtual matrix stored in nunit: 'FULL', 'LOWER', or 'UPPER'.
NOTES
For a given virtual matrix, if VSTORAGE is used to set mode = ’L’ or ’l’, any out-of-core routine that
accesses that virtual matrix must not refer to any elements above the main diagonal. Similarly, if
VSTORAGE is used to set mode = ’U’ or ’u’, any out-of-core routine that accesses the virtual matrix must
not refer to any elements below the main diagonal. If the program tries to access any such elements, it will
terminate with the following error message:
Tried to access the upper part of a lower triangular
matrix (or vice versa), unit number nunit.
Use one call to VSTORAGE for each virtual matrix, unless the virtual matrix uses full storage mode; in that
case, calling VSTORAGE is not necessary, because full matrix storage is the default.
The call (or calls) to VSTORAGE should occur right after the call to the VBEGIN(3S) routine. For a given
virtual matrix, any call to VSTORAGE must occur before the first reference to the virtual matrix. After the
mode is defined for a given virtual matrix, that matrix cannot change modes; the same mode applies to all
subsequent references to the matrix, up until the call to VEND(3S).
In the LAPACK and Level 3 BLAS routines, "packed storage" implies a linearized storage scheme. For the
Virtual LAPACK and VBLAS routines, "packed storage" is similar, but more complicated, because it is the
page structure of the virtual matrix binary file that is linearized; therefore, pages that correspond to the upper
(or lower) part of a triangular matrix are omitted.
EXAMPLES
The following program defines a virtual matrix on unit 1 to be stored in upper triangular packed mode, and
it sets all elements on or above the main diagonal to 1.
      INTEGER I, J, N
      PARAMETER (N = 500)
      REAL X(N)
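*     Sketch of the steps described in the text above; the value
*     of N is illustrative.
      CALL VBEGIN
      CALL VSTORAGE (1, 'UPPER')
      DO, J = 1, N
        DO, I = 1, J
          X(I) = 1.0
        END DO
*       copy rows 1 through J of column J (the upper triangle)
        CALL SCOPY2RV (J, 1, X, N, 1, 1, J, N)
      END DO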
      CALL VEND
END
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VBEGIN(3S), VEND(3S)
NAME
VSTRSM, VCTRSM – Solves a virtual real or virtual complex triangular system of equations with multiple
right-hand sides
SYNOPSIS
CALL VSTRSM (side, uplo, transa, diag, m, n, alpha, nunita, ia1, ja1, lda, nunitb, ib1,
jb1, ldb)
CALL VCTRSM (side, uplo, transa, diag, m, n, alpha, nunita, ia1, ja1, lda, nunitb, ib1,
jb1, ldb)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
VSTRSM solves a virtual real triangular system of equations with multiple right-hand sides. VCTRSM solves
a virtual complex triangular system of equations with multiple right-hand sides. VSTRSM and VCTRSM are
out-of-core versions of STRSM(3S) and CTRSM(3S), which are Level 3 Basic Linear Algebra Subprograms
(Level 3 BLAS).
VSTRSM and VCTRSM each solve one of the following matrix equations, using the operation associated with
each:
Equation              Operation
op(A) X = α B         B ← α op(A**(-1)) B
X op(A) = α B         B ← α B op(A**(-1))
where A**(-1) is the inverse of A, α is a scalar, X and B are m-by-n matrices, A is either a unit or nonunit upper
or lower triangular matrix, and op(A) is one of the following:
op(A) = A
op(A) = A**T (the transpose of A)
op(A) = A**H (the conjugate transpose of A; VCTRSM only)
SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines
STRSM(3S) for a description of STRSM(3S) and CTRSM(3S), which are the in-memory equivalents of the
out-of-core routines VSTRSM and VCTRSM
NAME
INTRO_MACH – Introduction to machine constant functions
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
These functions return machine constants for UNICOS systems.
The SLAMCH routine runs on UNICOS and UNICOS/mk systems. The R1MACH(3S) and SMACH(3S)
routines run only on UNICOS systems.
The following table contains the purpose and name of each machine constant function. The first routine
named in each entry is the name of the man page that documents all of the routines listed in that entry.
• R1MACH: Returns machine constants
• SLAMCH: LAPACK routine (see INTRO_LAPACK(3S)) which returns a wide variety of machine
constants
• SMACH, CMACH: Returns machine epsilon, numerically safe small and large normalized numbers
NAME
R1MACH – Returns UNICOS machine constants
SYNOPSIS
r = R1MACH (i)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
The R1MACH function returns UNICOS machine constants.
This function has the following arguments:
r Real. (output)
Machine constant returned.
i Integer. (input)
Indicates the machine constant to be returned.
Must be an integer from 1 to 5; any other value prints an error message on standard output and
executes a Fortran STOP (thus aborting the program). The following lists the machine constant
returned for each valid value of i:
Value   Machine Constant Returned
1       B**(EMIN-1), the smallest positive magnitude
2       B**EMAX * (1 - B**(-T)), the largest magnitude
3       B**(-T), the smallest relative spacing
4       B**(1-T), the largest relative spacing
5       LOG10(B); B is the base or radix of the machine
where
B = Base of the machine
T = Number of base-B digits in the mantissa
EMIN = Minimum exponent before underflow
EMAX = Largest exponent before overflow
The constants that define the model of rounded floating-point arithmetic on Cray Research systems are as
follows:
B = 2     EMIN = -8189
T = 47    EMAX = 8190
Because of the characteristics of Cray Research floating-point hardware, the constant used for R1MACH(1)
is one bit larger than the smallest magnitude defined by the model. The constants that R1MACH returns, in
both decimal and the internal representation, are as follows:
R1MACH(1) = 0.3667207735109720E-2465   0200034000000000000001
R1MACH(2) = 0.2726870339048520E+2466   0577767777777777777776
R1MACH(3) = 0.7105427357601002E-14     0377224000000000000000
R1MACH(4) = 0.1421085471520200E-13     0377234000000000000000
R1MACH(5) = 0.3010299956639813E+00     0377774642023241175720
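For example, to obtain the smallest relative spacing (the machine epsilon):
      EPS = R1MACH(3)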
SEE ALSO
SMACH(3S)
NAME
SLAMCH – Determines single-precision machine parameters
SYNOPSIS
s = SLAMCH(cmach)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
SLAMCH determines single-precision machine parameters.
This routine accepts the following arguments:
cmach Specifies the value to be returned by SLAMCH.
CHARACTER*1. (input)
’B’ or ’b’ = base (base of the machine)
'E' or 'e' = eps (epsilon, relative machine precision, base**(1-t))
'L' or 'l' = emax (largest exponent before overflow)
'M' or 'm' = emin (minimum exponent before (gradual) underflow)
'N' or 'n' = t (number of (base) digits in the mantissa)
'O' or 'o' = rmax (overflow threshold, base**emax * (1-eps))
'P' or 'p' = prec (precision)
'R' or 'r' = rnd (1.0 if rounding occurs in addition; otherwise, 0.0)
'S' or 's' = sfmin (safe minimum such that 1/sfmin does not overflow)
'U' or 'u' = rmin (underflow threshold, base**(emin-1))
NOTES
The constants used to define the model of rounded floating-point arithmetic on UNICOS systems are as
follows:
base = 2
t = 47
emin = -8189
emax = 8190
Two exceptions are the values used for sfmin and rmin. They are taken to be 1 bit larger than the smallest
magnitude defined by the model to ensure that reciprocal operations on these values do not become smaller
than sfmin or larger than rmax. The values returned by SLAMCH on UNICOS systems are as follows:
The constants used to define the model of rounded floating-point arithmetic on UNICOS/mk systems are as
follows:
base = 2
t = 53
emin = -1021
emax = 1023
The values returned by SLAMCH on UNICOS/mk systems are as follows:
SLAMCH('B') = 2
SLAMCH('E') = 0.222044604925031308E-15
SLAMCH('L') = 1023
SLAMCH('M') = -1021
SLAMCH('N') = 53
SLAMCH('O') = 0.898846567431157754E+308
SLAMCH('P') = 0.444089209850062616E-15
SLAMCH('R') = 1
SLAMCH('S') = 0.222507385850720138E-307
SLAMCH('U') = 0.222507385850720138E-307
NAME
SMACH, CMACH – Returns machine epsilon, small or large normalized numbers
SYNOPSIS
result = SMACH (int)
result = CMACH (int)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
The SMACH and CMACH routines return machine epsilon, small or large normalized numbers.
These routines have the following arguments:
result Real. (output)
Machine constant returned.
int Integer. (input)
Selects machine constant to be returned.
1 ≤ int ≤ 3. Any other value returns an error message.
For SMACH, int indicates that one of the following machine constants is returned as result:
int result
1 0.7105427357601002E-14
The machine epsilon (the smallest positive machine number ε for which 1.0 ± ε ≠ 1.0).
2 0.1290284014791423E-2449
A "numerically safe" number close to the smallest normalized, representable number.
3 0.7750231643082450E+2450
A "numerically safe" number close to the largest normalized, representable number.
For CMACH, int indicates that one of the following machine constants is returned as result:
int result
1 0.7105427357601002E-14
The machine epsilon (the smallest positive machine number ε for which 1.0 ± ε ≠ 1.0).
2 0.1347558278913286E-1216
A "numerically safe" number close to the square root of the smallest normalized, representable
number.
3 0.7420829329967288E+1217
A "numerically safe" number close to the square root of the largest normalized, representable number.
You can use CMACH(2) and CMACH(3) to prevent overflow during complex arithmetic.
SEE ALSO
Lawson, C. L., Hanson, R. J., Kincaid, D. R., and Krogh, F. T., "Basic Linear Algebra Subprograms for
Fortran Usage – An Extended Report," Sandia Technical Report SAND 77-0898, Sandia Laboratories,
Albuquerque, NM, 1977.
NAME
INTRO_SUPERSEDED – Introduction to superseded Scientific Library routines
IMPLEMENTATION
UNICOS systems
DESCRIPTION
The routines in this section are superseded by newer routines. Many routines and one software package
(LINPACK) are almost, but not totally, superseded. Each of these superseded routines or packages is
documented in another section, according to its purpose.
Each of these routines, whether fully, mostly, or partially superseded, is minimally supported to maintain
continuity.
These routines are not available on Cray T90 systems that support IEEE arithmetic.
Fully Superseded Routines
The following table contains the purpose and name of each superseded Scientific Library routine. Column 3
contains a reference to the preferred replacement for each superseded routine. Each superseded routine has
its own man page.
Purpose                                                    Superseded routine   Preferred routine
Gathers a vector from a source vector                      GATHER               None needed (see GATHER(3S))
Solves a system of linear equations by inverting a
  square matrix                                            MINV                 SGESV (see INTRO_LAPACK(3S))
Multiplies a matrix by a vector (unit increments)          MXV                  SGEMV(3S)
Multiplies a matrix by a vector (arbitrary increments)     MXVA                 SGEMV(3S)
Multiplies a matrix by a matrix (unit increments)          MXM                  SGEMM(3S)
Multiplies a matrix by a matrix (arbitrary increments)     MXMA                 SGEMM(3S)
Multiplies a matrix by a column vector and adds the
  result to another column vector                          SMXPY                SGEMV(3S)
Multiplies a matrix by a row vector and adds the result
  to another row vector                                    SXMPY                SGEMV(3S)
Scatters a vector into another vector                      SCATTER              None needed (see SCATTER(3S))
Solves a tridiagonal system                                TRID                 SGTSV (see INTRO_LAPACK(3S))
NAME
GATHER – Gathers a vector from a source vector
SYNOPSIS
CALL GATHER (n, a, b, index)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
GATHER is defined as follows:
a(i) = b(index(i)), where i = 1, ..., n
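For example, the following sketch gathers n elements of b, selected by index, into a (the dimensions are illustrative):
      INTEGER N
      PARAMETER (N = 100)
      REAL A(N), B(1000)
      INTEGER INDEX(N)
*     INDEX(1) through INDEX(N) must hold values between 1 and 1000
      CALL GATHER (N, A, B, INDEX)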
CAUTIONS
You should not use this routine on systems that have Compress-Index Gather-Scatter (CIGS) hardware,
because it will degrade performance.
SEE ALSO
SCATTER(3S)
NAME
MINV – Solves systems of linear equations by inverting a square matrix
SYNOPSIS
CALL MINV (ab, n, ldab, scratch, det, tol, m, mode)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
MINV computes the determinant of a matrix A, subject to the restriction imposed by tol (see the CAUTIONS
section. You may also use it to solve systems of linear equations (if m > 0) or to compute the inverse of a
square matrix (if mode ≠ 0).
If m>0, MINV solves the following matrix equation:
AX = B
where B represents an n-by-m matrix of known values, and X represents an n-by-m matrix of unknowns for
which to solve.
You may consider each column of B to be the right-hand side values of a system of linear equations, and
each corresponding column of X to be the unknowns for the system of linear equations defined by A and the
corresponding column of B. On output, the solution matrix X overwrites the right-hand side matrix B.
If mode ≠ 0, MINV calculates A −1, which overwrites A. If mode = 0, A is still overwritten, but not by A −1.
This routine has the following arguments:
ab Real array of dimension (ldab,n+m). (input and output)
On input, ab contains the augmented matrix A:B. A is the square matrix to be inverted (if
mode ≠ 0), and B is the matrix whose columns are the right-hand sides for the systems of linear
equations to be solved.
On output, ab contains the augmented matrix Z:X. Z is either A −1, the inverse of A (if mode is
nonzero), or some other n-by-n matrix replacing A (if mode = 0). X is the matrix, each column of
which is the solution vector for the system of linear equations defined by the corresponding
column of B.
n Integer. (input)
Order of matrix A; that is, the number of rows in A (same as number of columns).
ldab Integer. (input)
Leading dimension of array ab.
ldab ≥ n .
NOTES
MINV solves linear equations by using a partial pivot search (one unused row) and Gauss-Jordan reduction.
MINV is superseded by the LAPACK routines SGETRF(3L) and SGETRI(3L) (which together can calculate
the determinant and inverse of a general square matrix), or by the LAPACK routine SGESV(3L) (which
solves the matrix equation AX = B). LAPACK routines are preferred because they are the emerging de facto
standard linear systems interface. Using LAPACK routines will enhance your program’s portability, and also
should enhance its performance portability.
Man pages for the LAPACK routines SGETRF(3L), SGETRI(3L), SGESV(3L) are available only online,
using the man(1) command.
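For illustration, a minimal sketch of the replacement. Here A is the n-by-n matrix and B holds the m
right-hand sides in separate arrays, rather than MINV's augmented array; IPIV and INFO are the standard
LAPACK pivot and status arguments:

INTEGER IPIV(N), INFO
C Solve A*X = B; on exit, B holds the solution X and A holds the LU
C factors of the original matrix.
CALL SGESV (N, M, A, LDA, IPIV, B, LDB, INFO)

A nonzero INFO on return indicates that the factorization failed.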
CAUTIONS
At each reduction step, MINV computes the partial product of pivot elements. If this product’s magnitude is
less than or equal to tol, MINV aborts computation. Therefore, if the value returned in det is less than or equal in
magnitude to the value input as tol, MINV did not invert A or solve for X (although A:B may have been
overwritten); in this case, the value returned in det may not be the determinant of A.
SEE ALSO
INTRO_LAPACK(3S) for more information and further references regarding the preferred routines
SGETRF(3L), SGETRI(3L), SGESV(3L) (available only online)
man(1) in the UNICOS User Commands Reference Manual
Partial Pivoting Linear Equation Solver (MINV), publication SN–0215 (1980), which contains more
information on the algorithm used by MINV
Knuth, D.E., The Art of Computer Programming, Volume 1 (Fundamental Algorithms), Reading, MA:
Addison-Wesley, 1973; pp. 301–302
NAME
MXM – Computes matrix-times-matrix product (unit increments)
SYNOPSIS
CALL MXM (a, nra, b, nca, c, ncb)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
MXM computes the nra-by-ncb matrix product C = AB of the nra-by-nca matrix A and the nca-by-ncb matrix
B.
This routine has the following arguments:
a Real array of dimension (nra,nca). (input)
Matrix A, the first factor.
nra Integer. (input)
Number of rows in A (same as number of rows in C).
b Real array of dimension (nca,ncb). (input)
Matrix B, the second factor.
nca Integer. (input)
Number of columns in A (same as number of rows in B).
c Real array of dimension (nra,ncb). (output)
Matrix C, the product AB.
ncb Integer. (input)
Number of columns in B (same as number of columns in C).
NOTES
You should use the Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS) SGEMM(3S) rather than MXM.
BLAS routines are preferred because they are the de facto standard linear algebra interface. Using Level 3
BLAS routines will enhance your program’s portability, and also should enhance its performance portability.
For example,
CALL MXM (A, NRA, B, NCA, C, NCB)
is equivalent to,
CALL SGEMM ('N', 'N', NRA, NCB, NCA, 1.0,
$ A, NRA, B, NCA, 0.0, C, NRA)
MXM is restricted to multiplying matrices that have elements stored by columns in successive memory
locations. MXMA(3S) is a general subroutine for multiplying matrices that can be used to multiply matrices
that do not satisfy the requirements of MXM (although SGEMM also supersedes MXMA). If B and C have only
one column, MXV(3S) or MXVA(3S) (both superseded by Level 2 BLAS routine SGEMV, see SGEMV(3S)) are
similar subroutines, each of which computes the product of a matrix and a vector.
CAUTIONS
The product must not overwrite either factor. For example, the following call will not (in general) assign the
product AB to A:
CALL MXM (A, NRA, B, NCA, A, NCA)
SEE ALSO
MXMA(3S) to multiply less strictly declared matrices
MXV(3S), MXVA(3S) to perform a matrix-vector multiply
SGEMM(3S), which supersedes MXM and MXMA
SGEMV(3S), which supersedes MXV and MXVA
NAME
MXMA – Computes matrix-times-matrix product (arbitrary increments)
SYNOPSIS
CALL MXMA (sa, iac, iar, sb, ibc, ibr, sc, icc, icr, nrp, m, ncp)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
MXMA calculates the following nrp-by-ncp matrix product, where A is an nrp-by-m matrix and B is an m-by-ncp
matrix:
C = AB
NOTES
You should use the Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS) SGEMM (see SGEMM(3S))
rather than MXMA, because they are the de facto standard linear algebra interface. Using Level 3 BLAS
routines will enhance your program’s portability, and also should enhance its performance portability.
MXMA is a general subroutine for multiplying matrices. It can be used to compute a product of matrices in
which one or more of the operands or the product must be transposed. You can use MXMA to multiply any
matrices whose elements are not stored by columns in successive memory locations, provided only that the
elements of rows and columns are spaced by increments constant for each matrix. (The preferred routine,
SGEMM, also can do these operations.)
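For example, a sketch (the array names and leading dimensions are illustrative) of computing the
transpose of A times B with SGEMM, where the array A holds an m-by-nrp matrix stored by columns:

C C = transpose(A)*B; A is M-by-NRP, B is M-by-NCP, C is NRP-by-NCP.
CALL SGEMM ('T', 'N', NRP, NCP, M, 1.0,
$ A, LDA, B, LDB, 0.0, C, LDC)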
If B and C have only one column, MXVA(3S) (superseded by Level 2 BLAS routine SGEMV, see SGEMV(3S))
is a similarly general subroutine that computes the product of a matrix and a vector.
The product of matrices whose elements are stored by columns in successive memory locations can be
computed slightly faster using MXM(3S) (superseded by SGEMM) for matrices of more than one column or
MXV(3S) (superseded by SGEMV) for matrices B and C which have only one column.
The following subroutine calls are equivalent:
CALL MXM (SA, NRP, SB, M, SC, NCP)
CALL MXMA (SA, 1, NRP, SB, 1, M, SC, 1, NCP, NRP, M, NCP)
(The product elements computed by MXM are also stored by columns in successive memory locations.)
CAUTIONS
To be computed correctly, the product must not overwrite either operand. Thus, if ALPHA is a
one-dimensional array,
CALL MXMA (ALPHA, 3, 9, BETA, 1, 2, ALPHA(2), 1, 3, 3, 2, 2)
correctly computes the product of the matrices defined in ALPHA and BETA, whereas the following does not
(in general):
CALL MXMA (ALPHA, 3, 9, BETA, 1, 2, ALPHA, 1, 3, 3, 2, 2)
SEE ALSO
MXM(3S) to multiply more strictly declared matrices
MXV(3S), MXVA(3S) to perform a matrix-vector multiply
SGEMM(3S), which supersedes MXM and MXMA
SGEMV(3S), which supersedes MXV and MXVA
NAME
MXV – Computes matrix-times-vector product (unit increments)
SYNOPSIS
CALL MXV (a, nra, b, nca, c)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
MXV computes the nra vector product c = Ab of the nra-by-nca matrix A and the nca vector b.
This routine has the following arguments:
a Real array of dimension (nra,nca). (input)
Matrix factor.
nra Integer. (input)
Number of rows in the matrix.
b Real array of dimension nca. (input)
Vector factor.
nca Integer. (input)
Number of columns in the matrix.
c Real array of dimension nra. (output)
Vector product.
NOTES
You should use the Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS) SGEMV (see SGEMV(3S))
rather than MXV, because they are the de facto standard linear algebra interface. Using Level 2 BLAS
routines will enhance your program’s portability, and also should enhance its performance portability. For
example,
CALL MXV (A, NRA, B, NCA, C)
is equivalent to,
CALL SGEMV ('N', NRA, NCA, 1.0, A, NRA, B, 1, 0.0, C, 1)
MXV is restricted to using matrix and vector arguments that have elements stored by columns in successive
memory locations. MXVA(3S) is a general matrix-vector multiply subroutine that can use matrix and vector
arguments that do not satisfy the requirements of MXV (although SGEMV also supersedes MXVA).
CAUTIONS
MXV is restricted to multiplying a vector that occupies successive memory locations (in order) by a matrix
whose elements are stored by columns in successive memory locations. MXVA is a general subroutine for
multiplying a matrix and a vector, which can be used to multiply a vector by a matrix stored with arbitrary
column and row increments.
SEE ALSO
MXM(3S), MXMA(3S) to perform a matrix-matrix multiply
MXVA(3S) to multiply with less strictly declared matrix and vector arguments
SGEMM(3S), which supersedes MXM and MXMA
SGEMV(3S), which supersedes MXV and MXVA
NAME
MXVA – Computes matrix-times-vector product (arbitrary increments)
SYNOPSIS
CALL MXVA (sa, iac, iar, sb, ib, sc, ic, nra, nca)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
MXVA calculates the following nra matrix-vector product:
c = Ab
For example,
CALL MXVA (SA, IAC, LDSA, SB, IB, SC, IC, NCA, NCA)
multiplies a square submatrix A of sa times a vector b from sb, storing the product c in sc, while
CALL MXVA (SA, LDSA, IAC, SB, IB, SC, IC, NCA, NCA)
multiplies the transpose of A times b (exchanging the column and row increments transposes the matrix).
NOTES
You should use the Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS) SGEMV(3S) rather than
MXVA, because they are the de facto standard linear algebra interface. Using Level 2 BLAS routines will
enhance your program’s portability, and also should enhance its performance portability.
MXVA is a general matrix-vector multiply subroutine. As demonstrated earlier, you can use MXVA with a
matrix or its transpose. You can use MXVA to multiply any vector or matrix arguments whose elements are
not stored by columns in successive memory locations, provided only that the elements of rows and columns
are spaced by increments constant for each matrix. (The preferred routine, SGEMV, also can do these
operations.)
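For instance, when the matrix in sa is stored by columns (iac = 1) with leading dimension LDSA (an
illustrative name), a roughly equivalent call is the following sketch; SGEMV accepts the vector increments
directly but, unlike MXVA, requires column storage of the matrix:

C c = A*b; IB and IC are the vector increments from the MXVA call.
CALL SGEMV ('N', NRA, NCA, 1.0, SA, LDSA, SB, IB, 0.0, SC, IC)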
A matrix-vector product whose operands are stored by columns in successive memory locations can be
computed slightly faster by using MXV(3S) (superseded by SGEMV).
The following two subroutine calls have the same result:
CALL MXV (SA, NRA, SB, NCA, SC)
CALL MXVA (SA, 1, NRA, SB, 1, SC, 1, NRA, NCA)
(The product elements computed by MXV are also stored in successive memory locations.)
CAUTIONS
To be computed correctly, the product must not overwrite either operand. Thus, for example, the following
call will not (in general) compute correctly the product of the matrix in sa and the vector in sb:
CALL MXVA (SA, IAC, IAR, SB, IB, SB, IB, NRA, NCA)
SEE ALSO
MXM(3S), MXMA(3S) to perform a matrix-matrix multiply
MXV(3S) to multiply with more strictly declared matrix and vector arguments
SGEMM(3S), which supersedes MXM and MXMA
SGEMV(3S), which supersedes MXV and MXVA
NAME
SCATTER – Scatters a vector into another vector
SYNOPSIS
CALL SCATTER (n, a, index, b)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
SCATTER is defined as follows:
a_index(i) = b_i
where i = 1, . . ., n
This routine has the following arguments:
n Integer. (input)
Number of elements in arrays index and b (not in a).
a Real or integer array of dimension max(index(i): i=1,. . .,n). (output)
Contains the result vector.
b Real or integer array of dimension n. (input)
Contains the source vector.
index Integer array of dimension n. (input)
Contains the vector of indices.
The Fortran equivalent of this routine is as follows:
DO 100 I=1,N
A(INDEX(I))=B(I)
100 CONTINUE
CAUTIONS
You should not use this routine on systems that have Compress-Index Gather-Scatter (CIGS) hardware,
because it will degrade performance.
SEE ALSO
GATHER(3S)
NAME
SMXPY – Multiplies a column vector by a matrix and adds the result to another column vector
SYNOPSIS
CALL SMXPY (n1, y, n2, ldam, x, am)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
SMXPY performs the matrix-vector operation:
y ← y + Mx
where y is a vector of length n1, M is an n1-by-n2 matrix, and x is a vector of length n2.
This routine has the following arguments:
n1 Integer. (input)
Number of elements in y (same as number of rows in M).
y Real array of dimension n1. (input and output)
On input, y is the vector to be added to the product of M and x. On output, the result vector
overwrites y.
n2 Integer. (input)
Number of elements in x (same as number of columns in M).
ldam Integer. (input)
Leading dimension of array am, which contains the matrix M.
n1 ≤ ldam.
x Real array of dimension n2. (input)
Vector used in the matrix-vector product.
am Real array of dimension (ldam, n2). (input)
Contains the n1-by-n2 matrix M used in the matrix-vector product.
NOTES
You should use the Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS) SGEMV (see SGEMV(3S))
rather than SMXPY, because they are the de facto standard linear algebra interface. Using Level 2 BLAS
routines will enhance your program’s portability, and also should enhance its performance portability.
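For example, using the argument names from the SYNOPSIS above, the call
CALL SMXPY (N1, Y, N2, LDAM, X, AM) should be equivalent to the following sketch:

C y <- 1.0*M*x + 1.0*y, with M in AM (leading dimension LDAM)
CALL SGEMV ('N', N1, N2, 1.0, AM, LDAM, X, 1, 1.0, Y, 1)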
SEE ALSO
SGEMV(3S), which supersedes SMXPY
SXMPY(3S) (also superseded by SGEMV) to multiply a row vector by a matrix and add the result to another
row vector
NAME
SXMPY – Multiplies a row vector by a matrix and adds the result to another row vector
SYNOPSIS
CALL SXMPY (n1, ldy, sy, n2, ldx, sx, ldam, am)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
SXMPY performs the matrix-vector operation:
y ← y + xM
where y is a row vector of length n1, x is a vector of length n2, and M is an n2-by-n1 matrix.
These "row vectors" would normally be written as transposes in the more conventional "column vector"
notation; however, SXMPY assumes that these vectors are actual rows from matrices Y and X, not merely lists
of elements considered to be a row for algebraic purposes. For some numbers l and m, the elements of y
and x are as follows:
y_i = Y(l,i) for i = 1, . . ., n1, and x_j = X(m,j) for j = 1, . . ., n2
This routine has the following arguments:
n1 Integer. (input)
Number of columns in Y (same as number of elements in y, same as number of columns in M).
ldy Integer. (input)
Leading dimension of Y (same as increment between elements of y).
sy Real element from array of dimension (ldy, n1). (input and output)
sy locates the first element of the vector y; that is, Y(l,1).
On input, y is the vector to be added to the product of x and M. On output, the result vector
overwrites y.
n2 Integer. (input)
Number of columns in X (same as number of elements in x, same as number of rows in M).
ldx Integer. (input)
Leading dimension of X (same as increment between elements of x).
sx Real element from array of dimension (ldx, n2). (input)
sx locates the first element of the vector x; that is, X(m,1).
x is the row vector used in the vector-matrix product.
ldam Integer. (input)
Leading dimension of array am.
n2 ≤ ldam.
NOTES
Cray Research recommends that you use the Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS)
SGEMV (see SGEMV(3S)) rather than SXMPY, because they are the de facto standard linear algebra interface.
Using Level 2 BLAS routines will enhance your program’s portability, and also should enhance its
performance portability.
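For example, using the argument names from the SYNOPSIS above, the call
CALL SXMPY (N1, LDY, SY, N2, LDX, SX, LDAM, AM) should correspond to the following sketch, which uses
the transpose form of SGEMV and passes the row strides as vector increments:

C y <- y + x*M, computed as y <- transpose(M)*x + y; the elements of
C x and y are spaced LDX and LDY apart.
CALL SGEMV ('T', N2, N1, 1.0, AM, LDAM, SX, LDX, 1.0, SY, LDY)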
SEE ALSO
SGEMV(3S), which supersedes SXMPY
SMXPY(3S) (also superseded by SGEMV), to multiply a matrix by a column vector and add the result to
another column vector
NAME
TRID – Solves a tridiagonal system
SYNOPSIS
CALL TRID (tl, tc, tr, inct, n, s, incs)
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
TRID solves a tridiagonal system for a single right-hand side by a combination of burn-at-both-ends and 3:1
cyclic reduction. 3:1 cyclic reduction is used until the size of the system is reduced to 40. Then the reduced
system is solved directly using a burn-at-both-ends algorithm. The remaining values are obtained by
backfilling. No type of pivoting is done.
This routine has the following arguments:
tl Real array of dimension (n-1)*inct+1. (input)
Contains the lower off-diagonal of the tridiagonal matrix with tl(1) = 0.0.
tc Real array of dimension (n-1)*inct+1. (input)
Contains the main diagonal of the tridiagonal matrix.
tr Real array of dimension (n-1)*inct+1. (input)
Contains the upper off-diagonal of the tridiagonal matrix with tr(1+(n-1)*inct) = 0.0.
inct Integer. (input)
Increment between elements of tl, tc, and tr.
Typically, inct = 1.
n Integer. (input)
Contains the dimension of the matrix system being solved.
s Real array of dimension (n-1)*incs+1. (input and output)
On input, s contains the right-hand side values of the matrix system. On output, s contains the
solution of the matrix system.
incs Integer. (input)
Increment between elements of s.
Typically, incs = 1.
NOTES
To perform this operation using the same algorithm, CRI recommends that you use the newer routine
SDTSOL(3S) rather than TRID. SDTSOL(3S) uses the same algorithm as TRID, but SDTSOL(3S) is part of
a larger package of tridiagonal system routines, including SDTTRF(3S) to factor the tridiagonal matrix, and
SDTTRS(3S) to solve systems based on that factorization. There are also complex versions of these
routines: CDTSOL(3S), CDTTRF(3S), and CDTTRS(3S).
To perform this operation for ill-conditioned systems, CRI recommends the LAPACK routine SGTSV(3L),
which uses partial pivoting for better numerical stability.
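As a sketch of that alternative (the arrays DL, D, and DU are illustrative; LAPACK expects the n-1
subdiagonal elements, the n diagonal elements, and the n-1 superdiagonal elements, without TRID's zero
padding, and overwrites them during factorization):

C Solve T*x = s for one right-hand side; S is overwritten with x.
CALL SGTSV (N, 1, DL, D, DU, S, N, INFO)

A nonzero INFO on return indicates that the factorization failed.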
When calling TRID, the elements tl(1) and tr(1+(n-1)*inct) must be allocated and set equal to 0.
EXAMPLES
The following examples show how to set up arguments tl, tc, and tr, given the tridiagonal matrix T. Let T
be the tridiagonal matrix:
    | 11  12   0   0   0 |
    | 21  22  23   0   0 |
T = |  0  32  33  34   0 |
    |  0   0  43  44  45 |
    |  0   0   0  54  55 |
Then to pass T to TRID (with inct = 1), set
     |  0 |        | 11 |        | 12 |
     | 21 |        | 22 |        | 23 |
tl = | 32 |   tc = | 33 |   tr = | 34 |
     | 43 |        | 44 |        | 45 |
     | 54 |        | 55 |        |  0 |
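With these arrays and n = 5, the corresponding call (with unit increments, and with the five right-hand
side values in S on entry) would be:

CALL TRID (TL, TC, TR, 1, 5, S, 1)

On return, S contains the solution of the system.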
SEE ALSO
SDTSOL(3S), SDTTRF(3S), SDTTRS(3S) to factor and solve tridiagonal systems by using the same
algorithm as TRID
SGTSV(3L) (available only online) to solve a tridiagonal system by using Gaussian elimination with partial
pivoting