Scientific Libraries Reference Manual, Volume 1 (004–2081–002), Version 3.3, July 1999
New Features
No interface changes were made for this release.
Record of Revision
Version Description
5.0 March 1989
Documentation supporting the UNICOS 5.0 release running on Cray Research computer systems.
6.0 January 1991
Reprint with revision supporting the UNICOS 6.0 release running on Cray Research computer systems.
7.0 August 1992
Reprint with revision supporting the UNICOS 7.0 release running on Cray Research computer systems.
8.0 August 1993
Reprint with revision supporting the Cray LibSci 1.0 release (asynchronous), which runs on Cray Research systems. In this revision of the documentation, the math library is no longer documented in the same manual as the scientific libraries. Instead, it is documented in the Math Library Reference Manual, publication SR–2138. The search and sort routines, which were in the UNICOS 7.0 revision of the scientific libraries, were moved to the UNICOS Fortran library.
8.1 June 1994
Rewrite to support the Cray LibSci 1.1 release (asynchronous), which runs on Cray Research systems. This revision incorporates support for the Cray MPP hardware platform.
2.0 December 1995
Rewrite to support the Cray LibSci 2.0 release that runs on Cray Research systems. This revision incorporates support for Scalable LAPACK (ScaLAPACK) and documented support for 32-bit FFT routines. Additional routines were added to FFT and BLAS.
3.0 June 1997
Rewrite to support the Cray LibSci 3.0 release that runs on Cray Research systems. This revision removes support for the Basic Linear Algebra Subprograms to shared …
3.1 August 1998
Updated to reflect changes in the Programming Environment 3.1 release. The printed text of this manual was made available in PostScript (.ps) format only with this release.
3.3 July 1999
Updated to reflect changes in the Programming Environment 3.3 release. The printed text of this manual was made available in PostScript (.ps) format only with this release.
About This Guide
Documentation Organization
The printed version of the Scientific Libraries man pages appears in 2 volumes and is grouped according to topic. See the INTRO_LIBSCI(3S) man page for details about the contents of each volume.
Each topic section also has an introductory man page, which explains the contents of the section and provides other information about the usage of those routines. The following introductory man pages are available:
• INTRO_BLACS(3S)
• INTRO_BLAS1(3S)
• INTRO_BLAS2(3S)
• INTRO_BLAS3(3S)
• INTRO_CORE(3S)
• INTRO_FFT(3S)
• INTRO_LAPACK(3S)
• INTRO_MACH(3S)
• INTRO_SCALAPACK(3S)
• INTRO_SPARSE(3S)
• INTRO_SPEC(3S)
• INTRO_SUPERSEDED(3S)
Related Publications
The following manuals document the Cray LibSci product. All man pages in these manuals can also be viewed online by using the man command.
• Intrinsic Procedures Reference Manual
• Application Programmer's Library Reference Manual
• Scientific Libraries Ready Reference
• Application Programmer's Library Ready Reference
The following manuals describe the products in the Programming Environment. These publications describe the operating system, input/output (I/O), and other related topics.
• Segment Loader (SEGLDR) and ld Reference Manual
• UNICOS User Commands Reference Manual
• UNICOS User Commands Ready Reference
• Guide to Parallel Vector Applications
• Application Programmer's I/O Guide
In addition to these documents, several documents are available that describe the compiler systems available on UNICOS and UNICOS/mk. Some of these manuals are:
• CF90 Ready Reference
• CF90 Commands and Directives Reference Manual
• Fortran Language Reference Manual, Volume 1
• Fortran Language Reference Manual, Volume 2
• Fortran Language Reference Manual, Volume 3
• Cray C/C++ Reference Manual
http://techpubs.sgi.com/library
This Web site contains information that allows you to browse documents online, order documents, and send feedback to SGI. You can also order a printed SGI document by calling 1 800 627 9307.
The User Publications Catalog describes the availability and content of all Cray hardware and software documents that are available to customers. Customers who subscribe to the Cray Inform (CRInform) program can access this information on the CRInform system.
Information on publicly available Cray documents is at the following URL:
http://www.cray.com/swpubs/
This Web site contains information that allows you to browse documents online and send feedback to SGI. To order a printed Cray document, either call 1 651 683 5907 or send a facsimile of your request to fax number 1 651 683 3840. SGI employees may also order printed Cray documents by sending their orders via electronic mail to orderdsk.
Customers outside of the United States and Canada should contact their local service organization for ordering and documentation information.
Conventions
The following conventions are used throughout this document:
Convention Meaning
command This fixed-space font denotes literal items such as commands, files, routines, path names, signals, messages, and programming language structures.
variable Italic typeface denotes variable entries and words or concepts being defined.
user input This bold, fixed-space font denotes literal items that the user enters in interactive sessions. Output is shown in nonbold, fixed-space font.
In addition to these formatting conventions, several naming conventions are used throughout the documentation. "Cray PVP systems" denotes all configurations of Cray parallel vector processing (PVP) systems that run the UNICOS operating system. "Cray MPP systems" denotes all configurations of the Cray T3E series that run the UNICOS/mk operating system. "IRIX systems" denotes SGI platforms which run the IRIX operating system.
The default shell in the UNICOS and UNICOS/mk operating systems, referred to as the standard shell, is a version of the Korn shell that conforms to the following standards:
• Institute of Electrical and Electronics Engineers (IEEE) Portable Operating System Interface (POSIX) Standard 1003.2–1992
• X/Open Portability Guide, Issue 4 (XPG4)
The UNICOS and UNICOS/mk operating systems also support the optional use of the C shell.
Section heading Description
NAME Specifies the name of the entry and briefly states its function.
SYNOPSIS Presents the syntax of the entry.
IMPLEMENTATION Identifies the systems to which the entry applies.
Reader Comments
If you have comments about the technical accuracy, content, or organization of this document, please tell us. Be sure to include the title and part number of the document with your comments.
You can contact us in any of the following ways:
• Send e-mail to the following address:
techpubs@sgi.com
• Send a fax to the attention of "Technical Publications" at fax number 1 650 932 0801.
• Use the Feedback option on the Technical Publications Library World Wide Web page:
http://techpubs.sgi.com
• Call the Technical Publications Group, through the Technical Assistance Center, using one of the following numbers:
For SGI IRIX based operating systems: 1 800 800 4SGI
For UNICOS or UNICOS/mk based operating systems or Cray Origin 2000 systems: 1 800 950 2729 (toll free from the United States and Canada) or 1 651 683 5600
• Send mail to the following address:
Technical Publications
SGI
1600 Amphitheatre Pkwy.
Mountain View, California 94043–1351
We value your comments and will respond to them promptly.
NAME
INTRO_LIBSCI – Introduction to Scientific Library routines
IMPLEMENTATION
See individual man pages for implementation details
DESCRIPTION
The printed versions of the Scientific Library routines appear in 3 volumes and are grouped according to
topics. Not all man pages are available on all hardware types; see the individual man pages for details about
supported hardware types.
Volume 1 contains the following topic sections:
• Solvers for dense linear systems and eigensystems (see INTRO_LAPACK(3S) introductory man page)
• Vector-vector linear algebra subprograms (see INTRO_BLAS1(3S) introductory man page)
• Matrix-vector linear algebra subprograms (see INTRO_BLAS2(3S) introductory man page)
• Matrix-matrix linear algebra subprograms (see INTRO_BLAS3(3S) introductory man page)
• Signal processing routines (see INTRO_FFT(3S) introductory man page)
Volume 2 contains the following topic sections:
• Solvers for dense linear systems and eigensystems (see INTRO_LAPACK(3S) introductory man page)
• Scalable LAPACK subprograms for UNICOS/mk systems (see INTRO_SCALAPACK(3S) introductory
man page)
• Solvers for sparse linear systems (not available on UNICOS/mk systems) (see INTRO_SPARSE(3S)
introductory man page)
• Solvers for special linear systems (not available on UNICOS/mk systems) (see INTRO_SPEC(3S)
introductory man page)
• Basic Linear Algebra Communication Subprograms (BLACS) routines (see INTRO_BLACS(3S)
introductory man page)
• Out-of-core routines (not available on UNICOS/mk systems) (see INTRO_CORE(3S) introductory man
page)
• Machine constant functions (see INTRO_MACH(3S) introductory man page)
• Superseded routines (not available on UNICOS/mk systems) (see INTRO_SUPERSEDED(3S)
introductory man page)
NOTES
Default kinds
When using the CF90 compiler or MIPSpro 7 Fortran 90 compiler on UNICOS, UNICOS/mk, or IRIX, all
arguments must be of default kind unless documented otherwise. On UNICOS and UNICOS/mk, default
kind is KIND=8 for integer, real, complex, and logical arguments; on IRIX, the default kind is KIND=4.
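To illustrate why the default kind matters, the following Python sketch stores the same value in a 32-bit and a 64-bit real and compares the results. It is only an analogy: the IEEE formats used here are an assumption made for illustration (some UNICOS systems use a different floating-point format), and the helper name round_trip is hypothetical.

```python
import struct


def round_trip(value, fmt):
    """Store value in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1
as_kind4 = round_trip(x, "<f")  # 32-bit real, comparable in size to KIND=4
as_kind8 = round_trip(x, "<d")  # 64-bit real, comparable in size to KIND=8

# Storage sizes in bytes for the two formats.
print(struct.calcsize("<f"), struct.calcsize("<d"))
# The 32-bit copy loses precision; the 64-bit copy is exact here.
print(abs(as_kind4 - x) > abs(as_kind8 - x))
```

An argument of the wrong kind therefore differs from what the routine expects in both storage size and precision, which is why the arguments must match the default kind of the target system.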
Multitasking
Many of the Scientific Library routines are multitasked. This means that a program that calls a multitasked
Scientific Library routine will run in parallel mode and take advantage of multiple processors whenever
possible, even if the program has not specifically requested multitasking. If a significant percentage of time
is spent in the Scientific Library routine, this feature can significantly reduce wall-clock time.
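As a rough way to reason about "a significant percentage of time", the sketch below applies Amdahl's law. This is an idealized model chosen for illustration, not a formula from this manual: it assumes the library portion scales perfectly across processors and everything else stays serial. The function name estimated_speedup is hypothetical.

```python
def estimated_speedup(library_fraction, ncpus):
    """Amdahl's-law estimate when only the library fraction runs in parallel."""
    if not 0.0 <= library_fraction <= 1.0:
        raise ValueError("library_fraction must be between 0 and 1")
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    serial_fraction = 1.0 - library_fraction
    return 1.0 / (serial_fraction + library_fraction / ncpus)


# Compare programs that spend 10%, 50%, and 90% of their time in the library.
for fraction in (0.10, 0.50, 0.90):
    print(fraction, round(estimated_speedup(fraction, 8), 2))
```

Under this model, a program that spends only a small fraction of its time in multitasked routines gains little from additional processors, however many are available.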
The NCPUS environment variable determines the maximum number of (logical) central processors that a
multitasked Scientific Library routine uses. If you do not define this variable, the default value is the
number of central processors on the system. To change the number of CPUs used, you can set the value of
NCPUS before your program is executed. If you do not want your program to run in multitasked mode, set
the value of NCPUS equal to 1.
To set the number of logical CPUs used by multitasked Scientific Library routines equal to n, use one of the
following commands.
Under the POSIX shell (sh) or Korn shell (ksh):
NCPUS=n
export NCPUS
Under the C shell (csh):
setenv NCPUS n
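When a program is launched from a script rather than from an interactive shell, the same effect can be had by setting NCPUS in the child's environment. The following is a minimal sketch using Python's standard subprocess module; the helper name run_with_ncpus is hypothetical, and the child process is a stand-in that only reports the value it sees.

```python
import os
import subprocess
import sys


def run_with_ncpus(command, ncpus):
    """Run command with NCPUS set in the child's environment only."""
    env = dict(os.environ)
    env["NCPUS"] = str(ncpus)
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in for a real program: print the NCPUS value the child receives.
child = [sys.executable, "-c", "import os; print(os.environ['NCPUS'])"]
result = run_with_ncpus(child, 1)
print(result.stdout.strip())
```

Because the variable is set only for the child, other programs started from the same session keep their own NCPUS setting.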
The multitasked routines are listed below, grouped by the section of the manual in which they appear, according to the list given
previously in the DESCRIPTION section of this man page. In many cases, a real (single-precision)
routine is paired with its complex equivalent.
LAPACK routines are not listed. Most LAPACK routines do not perform multiprocessing, but almost all
LAPACK routines call Level 2 BLAS and Level 3 BLAS that do multiprocessing.
The following are the multitasked Level 2 BLAS routines:
• SGBMV, CGBMV
• SGEMV, CGEMV
• SGER
• CGERC
• CGERU
• CHBMV
• SSBMV
• STRSV, CTRSV
• CHEMV
• CHER
• CHER2
• SSPR
• SSPR2
• SSYMV, CSYMV
• SSYR, CSYR
• SSYR2
• STBMV, CTBMV
• STBSV, CTBSV
• STRMV, CTRMV
The following are the multitasked Level 3 BLAS routines:
• SCOPY2, CCOPY2
• SGEMMS, CGEMMS
• SGEMM, CGEMM
• CHEMM
• CHER2K
• CHERK
• SSYMM, CSYMM
• SSYR2K, CSYR2K
• SSYRK, CSYRK
• STRMM, CTRMM
• STRSM, CTRSM
The following are the multitasked LINPACK routines:
• SCHDD, CCHDD
• SCHEX, CCHEX
• SCHUD, CCHUD
• SGBFA, CGBFA
• SGEDI
• SGEFA
• SPODI, CPODI
• SSVDC, CSVDC
• STRDI, CTRDI
The following are the multitasked Out-of-core routines:
• SCOPY2RV, CCOPY2RV
• SCOPY2VR, CCOPY2VR
• VSGEMM, VCGEMM
• VSGETRF, VCGETRF
• VSGETRS, VCGETRS
• VSPOTRF, VSPOTRS
• VSTRSM, VCTRSM
The following are the multitasked EISPACK routines:
• BAKVEC
• BALBAK
• BANDR
• CBABK2
• COMBAK
• COMLR2
• COMQR2
• CORTB
• CORTH
• FIGI2
• HQR2
• HTRIB3
• HTRIDI
• IMTQLV
• MINFIT
• QZIT
• REBAKB
• REDUC
• REDUC2
• SVD
• TRED2
The following are the multitasked Sparse routines:
• SITRSOL
• SSGETRF
• SSGETRS
• SSPOTRF
• SSPOTRS
• SSTSTRF
• SSTSTRS
The following are the multitasked Signal processing routines:
• CCNVL
• CCNVLF
• CCFFT
• CCFFTM
• CCFFT2D
• CCFFT3D
SEE ALSO
INTRO_BLACS(3S), INTRO_BLAS1(3S), INTRO_BLAS2(3S), INTRO_BLAS3(3S), INTRO_FFT(3S),
INTRO_LAPACK(3S), INTRO_MACH(3S), INTRO_SCALAPACK(3S)
The following man pages are not available on UNICOS/mk systems:
INTRO_CORE(3S), INTRO_SPARSE(3S), INTRO_SPEC(3S), INTRO_SUPERSEDED(3S)
NAME
INTRO_LAPACK – Introduction to LAPACK solvers for dense linear systems
IMPLEMENTATION
See individual man pages for implementation details
DESCRIPTION
The preferred solvers for dense linear systems are those parts of the LAPACK package included in the
current version of the Scientific Library. The LAPACK routines in the Scientific Library supersede the older
LINPACK routines (see LINPACK(3S) for more information).
LAPACK Routines
LAPACK is a public domain library of subroutines for solving dense linear algebra problems, including the
following:
• Systems of linear equations
• Linear least squares problems
• Eigenvalue problems
• Singular value decomposition (SVD) problems
For details about which routines are supported, see LAPACK Routines Contained in the Scientific Library,
which follows.
The LAPACK package is designed to be the successor to the older LINPACK and EISPACK packages. It
uses today’s high-performance computers more efficiently than the older packages. It also extends the
functionality of these packages by including equilibration, iterative refinement, error bounds, and driver
routines for linear systems, routines for computing and reordering the Schur factorization, and condition
estimation routines for eigenvalue problems.
Performance issues are addressed by implementing the most computationally intensive algorithms with
the Level 2 and Level 3 Basic Linear Algebra Subprograms (BLAS). Because most of the BLAS were optimized
in single- and multiple-processor environments for UNICOS and UNICOS/mk systems, these algorithms give
near-optimal performance.
The original Fortran programs are described in the LAPACK User’s Guide by E. Anderson, Z. Bai,
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen, published by the Society for Industrial and Applied Mathematics (SIAM),
Philadelphia, 1992. You can order the LAPACK User’s Guide, publication TPD–0003.
LAPACK Routines Contained in the Scientific Library
Most of the single-precision (64-bit) real and complex routines from LAPACK 2.0 are supported in the
Scientific Library. This includes driver routines and computational routines for solving linear systems, least
squares problems, and eigenvalue and singular value problems. Selected auxiliary routines for generating
and manipulating elementary orthogonal transformations are also supported.
The Scientific Library does not include the LAPACK driver routines for certain generalized eigenvalue and
singular value computations and the divide-and-conquer routines for computing eigenvalues, which were new
for LAPACK 2.0. These routines may be added in a future release. Also, most of the auxiliary routines used only
internally by LAPACK have been renamed to avoid conflicts with user-defined subroutine names.
The LAPACK routines in the Scientific Library are described online in man pages. For example, to see a
description of the arguments to the expert driver routine for solving a general system of equations, enter the
following command:
% man sgesvx
The user interface to all LAPACK routines is exactly the same as the standard LAPACK interface, except
for the CPTSV(3L) and CPTSVX(3L) driver routines. An optional character argument was added to CPTSV
and CPTSVX to afford upward compatibility with the storage format in LINPACK’s CPTSL. However,
because the argument is optional, the LAPACK calling sequence is also accepted.
Several enhancements were made to the public-domain LAPACK software to improve performance for
UNICOS and UNICOS/mk systems. In particular, the solve routines were redesigned to give better
performance for one or a small number of right-hand sides, and to make better use of parallelism when the
number of right-hand sides is large.
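The distinction between factoring a matrix and solving for right-hand sides can be sketched in a few lines of pure Python. This is an illustration of the general technique (LU factorization with partial pivoting, followed by triangular solves), not the Scientific Library implementation; the names lu_factor and lu_solve are hypothetical. The matrix is factored once, and each right-hand side then costs only two triangular solves.

```python
def lu_factor(a):
    """Factor a square matrix with partial pivoting; return (lu, piv).

    lu holds the unit lower factor below the diagonal and the upper factor
    on and above it; piv[i] is the original index of the row now in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    x = [b[piv[i]] for i in range(n)]  # apply the row interchanges to b
    for i in range(n):  # forward substitution with the unit lower factor
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):  # back substitution with the upper factor
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


a = [[0.0, 2.0], [1.0, 1.0]]
rhs = [[2.0, 2.0], [4.0, 3.0]]
lu, piv = lu_factor(a)                        # factor once
solutions = [lu_solve(lu, piv, b) for b in rhs]  # reuse for every right-hand side
print(solutions)
```

This loop handles one right-hand side at a time; a tuned solve routine can instead process many right-hand sides together, which is where the parallelism mentioned above becomes useful.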
Tuning parameters for the block algorithms provided in the Scientific Library are set within the LAPACK
routine ILAENV(3L). ILAENV(3L) is an integer function subprogram that accepts information about the
problem type and dimensions, and it returns one integer parameter, such as the optimal block size, the
minimum block size for which a block algorithm should be used, or the crossover point (the problem size at
which it becomes more efficient to switch to an unblocked algorithm). The setting of tuning parameters
occurs without user intervention, but users may call ILAENV(3L) directly to discover the values that will be
used (for example, to determine how much workspace to provide).
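The roles of the three tuning parameters can be sketched as follows. The decision rule mirrors the test commonly found in blocked LAPACK routines, and the workspace estimate follows the N*NB pattern that several LAPACK routines document for optimal performance. The function names and the numeric values are hypothetical; real values come from ILAENV and vary by routine and system.

```python
def use_blocked_code(k, nb, nbmin, nx):
    """Decide whether the blocked algorithm is worthwhile for problem size k.

    nb    -- optimal block size
    nbmin -- minimum block size for which a block algorithm should be used
    nx    -- crossover point below which unblocked code is more efficient
    """
    return nbmin <= nb < k and nx < k


def blocked_workspace(n, nb):
    """Workspace estimate following the common N*NB pattern."""
    return max(1, n * nb)


# Hypothetical tuning values, for illustration only.
nb, nbmin, nx = 32, 2, 128
for k in (100, 1000):
    print(k, use_blocked_code(k, nb, nbmin, nx), blocked_workspace(k, nb))
```

In this sketch a problem of size 100 falls below the crossover point and uses unblocked code, while a problem of size 1000 uses the blocked algorithm and needs the larger workspace.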
Naming Scheme
The name of each LAPACK routine is a coded specification of its function (within the limits of standard
FORTRAN 77 six-character names).
All driver and computational routines have five- or six-character names of the form XYYZZ or XYYZZZ.
The first letter in each name, X, indicates the data type, as follows:
S REAL (single precision)
C COMPLEX
The next two letters, YY, indicate the type of matrix (or the most-significant matrix). Most of these
two-letter codes apply to both real and complex matrices, but a few apply specifically to only one or the
other. The matrix types are as follows:
BD BiDiagonal
GB General Band
GE GEneral (nonsymmetric)
GG General matrices, Generalized problem
GT General Tridiagonal
HB Hermitian Band (complex only)
HE HErmitian (possibly indefinite) (complex only)
HG Hessenberg matrix, Generalized problem
HP Hermitian Packed (possibly indefinite) (complex only)
HS upper HeSsenberg
OP Orthogonal Packed (real only)
OR ORthogonal (real only)
PB Positive definite Band (symmetric or Hermitian)
PO POsitive definite (symmetric or Hermitian)
PP Positive definite Packed (symmetric or Hermitian)
PT Positive definite Tridiagonal (symmetric or Hermitian)
SB Symmetric Band (real only)
SP Symmetric Packed (possibly indefinite)
ST Symmetric Tridiagonal
SY SYmmetric (possibly indefinite)
TB Triangular Band
TG Triangular matrices, Generalized problem
TP Triangular Packed
TR TRiangular
TZ TrapeZoidal
UN UNitary (complex only)
UP Unitary Packed (complex only)
Some LAPACK auxiliary routines also have man pages on UNICOS and UNICOS/mk systems. These
routines use the special YY designation:
LA LAPACK Auxiliary routine
For example, ILAENV(3L) is the auxiliary routine that determines the block size for a particular algorithm and
problem size.
The last two or three letters, ZZ or ZZZ, indicate the computation performed. For example, SGETRF
performs a TRiangular Factorization of a Single-precision (real) GEneral matrix; CGETRF performs the
factorization of a Complex GEneral matrix.
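The naming scheme can be applied mechanically. The following sketch splits a driver or computational routine name into its X, YY, and ZZ (or ZZZ) parts using the codes listed above; the function name decode and the table layout are hypothetical, and the computation code is returned undecoded. Names that do not follow the XYYZZ pattern, such as ILAENV, are rejected.

```python
DATA_TYPES = {"S": "REAL (single precision)", "C": "COMPLEX"}

MATRIX_TYPES = {
    "BD": "bidiagonal", "GB": "general band", "GE": "general (nonsymmetric)",
    "GG": "general matrices, generalized problem", "GT": "general tridiagonal",
    "HB": "Hermitian band", "HE": "Hermitian",
    "HG": "Hessenberg matrix, generalized problem", "HP": "Hermitian packed",
    "HS": "upper Hessenberg", "OP": "orthogonal packed", "OR": "orthogonal",
    "PB": "positive definite band", "PO": "positive definite",
    "PP": "positive definite packed", "PT": "positive definite tridiagonal",
    "SB": "symmetric band", "SP": "symmetric packed",
    "ST": "symmetric tridiagonal", "SY": "symmetric", "TB": "triangular band",
    "TG": "triangular matrices, generalized problem",
    "TP": "triangular packed", "TR": "triangular", "TZ": "trapezoidal",
    "UN": "unitary", "UP": "unitary packed", "LA": "LAPACK auxiliary routine",
}


def decode(name):
    """Split a routine name into (data type, matrix type, computation code)."""
    name = name.upper()
    if len(name) not in (5, 6):
        raise ValueError("expected a five- or six-character name")
    x, yy, zz = name[0], name[1:3], name[3:]
    if x not in DATA_TYPES or yy not in MATRIX_TYPES:
        raise ValueError("name does not follow the XYYZZ scheme: " + name)
    return DATA_TYPES[x], MATRIX_TYPES[yy], zz


print(decode("SGETRF"))
print(decode("CGETRF"))
```

Replacing the leading S with C in a real routine name gives the name of its complex counterpart when one exists, which is why many routines in the tables that follow are listed in pairs.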
Name Purpose
SSYSVX, CSYSVX Solves a real or complex symmetric indefinite system of linear equations AX = B and
provides an estimate of the condition number and error bounds on the solution.
Computational Routines
These computational routines are listed in alphabetical order, with real matrix routines and complex matrix
routines grouped together as appropriate.
Name Purpose
CHECON Estimates the reciprocal of the condition number of a complex Hermitian indefinite matrix,
using the factorization computed by CHETRF.
CHERFS Improves the computed solution to a complex Hermitian indefinite system of linear
equations AX = B and provides error bounds for the solution.
CHETRF Computes the factorization of a complex Hermitian indefinite matrix, using the diagonal
pivoting method.
CHETRI Computes the inverse of a complex Hermitian indefinite matrix, using the factorization
computed by CHETRF.
CHETRS Solves a complex Hermitian indefinite system of linear equations AX = B, using the
factorization computed by CHETRF.
CHPCON Estimates the reciprocal of the condition number of a complex Hermitian indefinite matrix
in packed storage, using the factorization computed by CHPTRF.
CHPRFS Improves the computed solution to a complex Hermitian indefinite system of linear
equations AX = B (A is held in packed storage) and provides error bounds for the solution.
CHPTRF Computes the factorization of a complex Hermitian indefinite matrix in packed storage,
using the diagonal pivoting method.
CHPTRI Computes the inverse of a complex Hermitian indefinite matrix in packed storage, using the
factorization computed by CHPTRF.
CHPTRS Solves a complex Hermitian indefinite system of linear equations AX = B (A is held in
packed storage) using the factorization computed by CHPTRF.
ILAENV Determines tuning parameters (such as the block size).
SBDSQR, CBDSQR Computes the singular value decomposition of a general matrix reduced to bidiagonal form.
SGBCON, CGBCON Estimates the reciprocal of the condition number of a general band matrix, in either the
1-norm or the infinity-norm, using the LU factorization computed by SGBTRF or CGBTRF.
SGBEQU, CGBEQU Computes row and column scalings to equilibrate a general band matrix and reduce its
condition number. Does not multiprocess or call any multiprocessing routines.
SGBRFS, CGBRFS Improves the computed solution to any of the following general banded systems of linear
equations and provides error bounds for the solution: AX = B, A^T X = B, A^H X = B.
SGBTRF, CGBTRF Computes an LU factorization of a general band matrix, using partial pivoting with row
interchanges.
SGBTRS, CGBTRS Solves any of the following general banded systems of linear equations using the LU
factorization computed by SGBTRF or CGBTRF: AX = B, A^T X = B, A^H X = B.
SGEBAK, CGEBAK Back-transforms the eigenvectors of a matrix transformed by SGEBAL or CGEBAL.
SGEBAL, CGEBAL Balances a general matrix A.
SGEBRD, CGEBRD Reduces a general matrix to upper or lower bidiagonal form by an orthogonal/unitary
transformation.
SGECON, CGECON Estimates the reciprocal of the condition number of a general matrix, in either the 1-norm or
the infinity-norm, using the LU factorization computed by SGETRF or CGETRF.
SGEEQU, CGEEQU Computes row and column scalings to equilibrate a general rectangular matrix and to reduce
its condition number.
SGEHRD, CGEHRD Reduces a general matrix to upper Hessenberg form by an orthogonal/unitary transformation.
SGELQF, CGELQF Computes an LQ factorization of a general rectangular matrix.
SGEQLF, CGEQLF Computes a QL factorization of a general rectangular matrix.
SGEQPF, CGEQPF Computes a QR factorization with column pivoting of a general rectangular matrix.
SGTTRF, CGTTRF Computes an LU factorization of a general tridiagonal matrix, using partial pivoting with
row interchanges.
SGTTRS, CGTTRS Solves a general tridiagonal system of linear equations AX = B, A^T X = B, or A^H X = B,
using the LU factorization computed by SGTTRF or CGTTRF.
SHGEQZ, CHGEQZ Computes the eigenvalues of a matrix pair (A,B) in generalized upper Hessenberg form using
the QZ method.
SHSEIN, CHSEIN Computes eigenvectors of an upper Hessenberg matrix by inverse iteration.
SHSEQR, CHSEQR Computes eigenvalues, Schur form, and Schur vectors of an upper Hessenberg matrix.
SLAMCH Computes machine-specific constants.
SLARF, CLARF Applies an elementary reflector.
SLARFB, CLARFB Applies a block reflector.
SLARFG, CLARFG Generates an elementary reflector.
SLARFT, CLARFT Forms the triangular factor of a block reflector.
SLARGV, CLARGV Generates a vector of real or complex plane rotations.
SLARNV, CLARNV Generates a vector of random numbers.
SLARTG, CLARTG Generates a plane rotation.
SLARTV, CLARTV Applies a vector of real or complex plane rotations to two vectors.
SLASR, CLASR Applies a sequence of real plane rotations to a matrix.
SOPGTR, CUPGTR Generates the orthogonal/unitary matrix Q from SSPTRD/CHPTRD.
SPBRFS, CPBRFS Improves the computed solution to a symmetric or Hermitian positive definite banded
system of linear equations AX = B and provides error bounds for the solution.
SPBSTF, CPBSTF Computes a split Cholesky factorization of a symmetric or Hermitian positive definite band
matrix.
SPBTRF, CPBTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite band
matrix.
SPBTRS, CPBTRS Solves a symmetric or Hermitian positive definite banded system of linear equations AX = B,
using the Cholesky factorization computed by SPBTRF or CPBTRF.
SPOCON, CPOCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
definite matrix, using the Cholesky factorization computed by SPOTRF or CPOTRF.
SPOEQU, CPOEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
matrix and reduce its condition number.
SPORFS, CPORFS Improves the computed solution to a symmetric or Hermitian positive definite system of
linear equations AX = B and provides error bounds for the solution.
SPOTRF, CPOTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix.
SPOTRI, CPOTRI Computes the inverse of a symmetric or Hermitian positive definite matrix, using the
Cholesky factorization computed by SPOTRF or CPOTRF.
SPOTRS, CPOTRS Solves a symmetric or Hermitian positive definite system of linear equations AX = B, using
the Cholesky factorization computed by SPOTRF or CPOTRF.
SPPCON, CPPCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
definite matrix in packed storage, using the Cholesky factorization computed by SPPTRF or CPPTRF.
SPPEQU, CPPEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
matrix in packed storage and reduce its condition number.
SPPRFS, CPPRFS Improves the computed solution to a symmetric or Hermitian positive definite system of
linear equations AX = B (A is held in packed storage) and provides error bounds for the solution.
SPPTRF, CPPTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix in
packed storage.
SPPTRI, CPPTRI Computes the inverse of a symmetric or Hermitian positive definite matrix in packed
storage, using the Cholesky factorization computed by SPPTRF or CPPTRF.
SPPTRS, CPPTRS Solves a symmetric or Hermitian positive definite system of linear equations AX = B (A is
held in packed storage) using the Cholesky factorization computed by SPPTRF or CPPTRF.
SPTCON, CPTCON Uses the LDL^H factorization computed by SPTTRF or CPTTRF to compute the reciprocal
of the condition number of a symmetric or Hermitian positive definite tridiagonal matrix.
SPTEQR, CPTEQR Computes eigenvalues and eigenvectors of a symmetric or Hermitian positive definite
tridiagonal matrix.
SPTRFS, CPTRFS Improves the computed solution to a symmetric or Hermitian positive definite tridiagonal
system of linear equations AX = B and provides error bounds for the solution.
SPTTRF, CPTTRF Computes the LDL^H factorization of a symmetric or Hermitian positive definite tridiagonal
matrix.
SPTTRS, CPTTRS Uses the LDL^H factorization computed by SPTTRF or CPTTRF to solve a symmetric or
Hermitian positive definite tridiagonal system of linear equations.
SSBGST, CHBGST Reduces a symmetric or Hermitian definite banded generalized eigenproblem to standard
form.
SSBTRD, CHBTRD Reduces a symmetric or Hermitian band matrix to real symmetric tridiagonal form by an
orthogonal/unitary transformation.
SSPCON, CSPCON Estimates the reciprocal of the condition number of a real or complex symmetric indefinite
matrix in packed storage, using the factorization computed by SSPTRF or CSPTRF.
SSPGST, CHPGST Reduces a symmetric or Hermitian definite generalized eigenproblem to standard form, using
packed storage.
SSPRFS, CSPRFS Improves the computed solution to a real or complex symmetric indefinite system of linear
equations AX = B (A is held in packed storage) and provides error bounds for the solution.
SSPTRD, CHPTRD Reduces a symmetric/Hermitian packed matrix A to real symmetric tridiagonal form by an
orthogonal/unitary transformation.
SSPTRF, CSPTRF Computes the factorization of a real or complex symmetric indefinite matrix in packed
storage, using the diagonal pivoting method.
SSPTRI, CSPTRI Computes the inverse of a real or complex symmetric indefinite matrix in packed storage,
using the factorization computed by SSPTRF or CSPTRF.
SSPTRS, CSPTRS Solves a real or complex symmetric indefinite system of linear equations AX = B (A is held
in packed storage) using the factorization computed by SSPTRF or CSPTRF.
SSTEBZ Computes eigenvalues of a symmetric tridiagonal matrix by bisection.
SSTEIN, CSTEIN Computes eigenvectors of a real symmetric tridiagonal matrix by inverse iteration.
SSTEQR, CSTEQR Computes eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using the
implicit QL or QR method.
SSTERF Compute all eigenvalues of a symmetric tridiagonal matrix using the root-free variant of the
QL or QR algorithm.
SSYCON Estimates the reciprocal of the condition number of a real or complex symmetric indefinite
CSYCON matrix, using the factorization computed by SSYTRF or CSYTRF.
SSYGST Reduce a symmetric or Hermitian definite generalized eigenproblem to standard form.
CHEGST
SSYRFS Improves the computed solution to a real or complex symmetric indefinite system of linear
CSYRFS equations AX = B and provides error bounds for the solution.
SSYTRD Reduces a symmetric/Hermitian matrix A to real symmetric tridiagonal form by an
CHETRD orthogonal/unitary transformation.
SSYTRF Computes the factorization of a real or complex symmetric indefinite matrix, using the
CSYTRF diagonal pivoting method.
SSYTRI Computes the inverse of a real or complex symmetric indefinite matrix, using the
CSYTRI factorization computed by SSYTRF or CSYTRF.
SSYTRS Solves a real or complex symmetric indefinite system of linear equations AX = B, using the
CSYTRS factorization computed by SSYTRF or CSYTRF.
STBCON Estimates the reciprocal of the condition number of a triangular band matrix, in either the
CTBCON 1-norm or the infinity-norm.
STBRFS Provides error bounds for the solution of any of the following triangular banded systems of
CTBRFS linear equations:
$AX = B$, $A^T X = B$, $A^H X = B$
STBTRS Solves any of the following triangular banded systems of linear equations:
CTBTRS $AX = B$, $A^T X = B$, $A^H X = B$
STGEVC Compute eigenvectors of a pair of matrices (A,B) in generalized Schur form.
CTGEVC
STPCON Estimates the reciprocal of the condition number of a triangular matrix in packed storage, in
CTPCON either the 1-norm or the infinity-norm.
STPRFS Provides error bounds for the solution of any of the following triangular systems of linear
CTPRFS equations, where A is held in packed storage:
$AX = B$, $A^T X = B$, $A^H X = B$
STPTRI Computes the inverse of a triangular matrix in packed storage.
CTPTRI
STPTRS Solves any of the following triangular systems of linear equations, where A is held in packed
CTPTRS storage:
$AX = B$, $A^T X = B$, $A^H X = B$
STRCON Estimates the reciprocal of the condition number of a triangular matrix, in either the 1-norm
CTRCON or the infinity-norm.
STREVC Compute eigenvectors of a real upper quasi-triangular matrix.
CTREVC Compute eigenvectors of a complex triangular matrix.
STREXC Exchange diagonal blocks in the real Schur factorization of a real matrix.
CTREXC Exchange diagonal elements in the Schur factorization of a complex matrix.
STRRFS Provides error bounds for the solution of any of the following triangular systems of linear
CTRRFS equations:
$AX = B$, $A^T X = B$, $A^H X = B$
STRSEN Compute condition numbers to measure the sensitivity of a cluster of eigenvalues and its
CTRSEN corresponding invariant subspace.
STRSNA Compute condition numbers for specified eigenvalues and eigenvectors of a real upper
quasi-triangular matrix.
CTRSNA Compute condition numbers for specified eigenvalues and eigenvectors of a complex upper
triangular matrix.
STRSYL Solve the Sylvester matrix equation
CTRSYL
SEE ALSO
LINPACK(3S) which lists the names of the LINPACK routines that are superseded by the linear system
solvers in LAPACK
LAPACK User’s Guide, CRI publication TPD– 0003
NAME
EISPACK – Introduction to Eigensystem computation for dense linear systems
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
EISPACK is a package of Fortran routines for solving the eigenvalue problem and for computing and using
the singular-value decomposition.
The original Fortran versions are described in the Matrix Eigensystem Routines – EISPACK Guide, second
edition, by B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler,
published by Springer-Verlag, New York, 1976, Library of Congress catalog card number 76– 2662. The
original Fortran versions also are documented in the Matrix Eigensystem Routines - EISPACK Guide
Extensions (Lecture Notes in Computer Science, Vol. 51) by B. S. Garbow, J. M. Boyle, J. J. Dongarra, and
C. B. Moler, published by Springer-Verlag, New York, 1977, Library of Congress catalog card number
77– 2802.
Most EISPACK routines are superseded by routines from the more recent public domain package, LAPACK,
described in the LAPACK User’s Guide (see INTRO_LAPACK(3S) for a complete reference). Of particular
interest to EISPACK users who want to switch to LAPACK is Appendix D, "Converting from LINPACK
and EISPACK," of the LAPACK User’s Guide. This appendix contains a table that shows the name of the
LAPACK routines that are functionally equivalent to each EISPACK routine.
Each Scientific Library version of the EISPACK routines has the same name, algorithm, and calling
sequence as the original version. Optimization of each routine includes the following:
• Use of the Level 1 BLAS routines when applicable, and use of the Level 2 and 3 BLAS in TRED1,
TRED2, TRBAK, and REDUC.
• Removal of Fortran IF statements when the result of either branch is the same.
• Unrolling complicated Fortran DO loops to improve vectorization.
• Use of Fortran compiler directives to aid vector optimization.
These modifications increase vectorization and use optimized library routines; therefore, they reduce
execution time. Only the order of computations within a loop is changed; the modified versions produce the
same answers as the original versions, unless the problem is sensitive to small changes in the data.
The following table lists the name, matrix or decomposition, and purpose of each routine.
Name    Matrix or decomposition        Purpose
CORTH   Complex general                Reduces matrix to upper Hessenberg form by using unitary similarity transformations
ELMBAK  Real general                   Forms eigenvectors by back transforming those of the corresponding matrices determined by ELMHES
ELMHES  Real general                   Reduces matrix to upper Hessenberg form by using elementary similarity transformations
ELTRAN  Real general                   Accumulates transformations used in the reduction to upper Hessenberg form done by ELMHES
FIGI    Real nonsymmetric tridiagonal  Reduces to symmetric tridiagonal matrix that has the same eigenvalues
FIGI2   Real nonsymmetric tridiagonal  Reduces to symmetric tridiagonal matrix that has the same eigenvalues, retaining the diagonal similarity transformations
HQR     Real upper Hessenberg          Finds eigenvalues by QR method
HQR2    Real upper Hessenberg          Finds eigenvalues and eigenvectors by QR method
HTRIBK  Complex Hermitian              Finds eigenvectors given the eigenvectors of the real symmetric tridiagonal matrix calculated by HTRIDI (including eigenvectors calculated by TQL2 or IMTQL2)
HTRIB3  Complex Hermitian (packed)     Finds eigenvectors given the eigenvectors of the real symmetric tridiagonal matrix calculated by HTRID3 (eigenvectors calculated by TQL2 or IMTQL2, among others)
HTRIDI  Complex Hermitian              Reduces to real symmetric tridiagonal form by using unitary similarity transformations
HTRID3  Complex Hermitian (packed)     Reduces to real symmetric tridiagonal form by using unitary similarity transformations
IMTQLV  Real symmetric tridiagonal     Finds eigenvalues by using implicit QL method, and associates them with their corresponding submatrix indices
IMTQL1  Real symmetric tridiagonal     Finds eigenvalues by implicit QL method
IMTQL2  Real symmetric tridiagonal     Finds eigenvalues and eigenvectors by implicit QL method
INVIT   Real upper Hessenberg          Finds eigenvectors that correspond to specified eigenvalues by using inverse iteration
MINFIT  Real rectangular               Determines the singular-value decomposition $A = USV^T$, forming $U^T B$ rather than $U$, by using Householder bidiagonalization and a variant of the QR algorithm
TRIDIB  Real symmetric tridiagonal     Finds the eigenvalues that lie between specified indices by using bisection
TSTURM  Real symmetric tridiagonal     Finds the eigenvalues that lie in a specified interval and each corresponding eigenvector by using bisection and inverse iteration
SEE ALSO
LAPACK User’s Guide, CRI publication TPD– 0003
NAME
LINPACK – Single-precision real and complex LINPACK routines
IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)
DESCRIPTION
LINPACK is a public domain package of Fortran routines that solves systems of linear equations and
computes the QR, Cholesky, and singular value decompositions. The original Fortran programs are
described in the LINPACK User’s Guide by J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart,
published by the Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1979, Library of
Congress catalog card number 78– 78206.
Most LINPACK routines are superseded by routines from the more recent public domain package, LAPACK,
described in the LAPACK User’s Guide (see INTRO_LAPACK(3S) for a complete reference). Of particular
interest to LINPACK users who want to switch to LAPACK is Appendix D, "Converting from LINPACK
and EISPACK," of the LAPACK User’s Guide. This appendix contains a table that shows the name of the
LAPACK routines that are functionally equivalent to each LINPACK routine.
Each single-precision Scientific Library version of the LINPACK routines has the same name, algorithm, and
calling sequence as the original version. Optimization of each routine includes the following:
• Replacement of calls to the BLAS routines SSCAL, SCOPY, SSWAP, SAXPY, and SROT with inline
Fortran code vectorized by the Cray Research Fortran compilers. (SROTG is still called by LINPACK.)
• Removal of Fortran IF statements in which the result of either branch is the same.
• Replacement of SDOT to solve triangular systems of linear equations in SPOSL, STRSL, and SCHDD
with more vectorizable code.
These optimizations affect only the execution order of floating-point operations in DO loops. See the
LINPACK User’s Guide for further descriptions. The complex routines have been added without much
optimization.
As mentioned previously, LAPACK does not completely supersede LINPACK. In the following table, an
asterisk (*) marks LINPACK routines that are not superseded in public domain LAPACK. This table lists
the name, matrix or decomposition, and purpose for each routine.
SEE ALSO
INTRO_LAPACK(3S) for information and references about the LAPACK routines that supersede LINPACK
LAPACK User’s Guide, CRI publication TPD– 0003
Dongarra, J. J., C. B. Moler, J. R. Bunch, and G. W. Stewart, LINPACK User’s Guide. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, 1979.
NAME
INTRO_BLAS1 – Introduction to vector-vector linear algebra subprograms
IMPLEMENTATION
See individual man pages for implementation details
DESCRIPTION
The linear algebra subprograms are written to run optimally on UNICOS and UNICOS/mk systems. These
subprograms use call-by-address convention when called by a Fortran, C, or CAL program.
Level 1 Basic Linear Algebra Subprograms
The Level 1 BLAS perform basic vector-vector operations. Only the single-precision real and complex data
types are supported from the standard set of Level 1 BLAS. In addition, several half-precision subroutines
are provided as extensions to the BLAS on UNICOS/mk systems, using the following naming conventions:
H half-precision (32-bit) REAL
G half-precision (32-bit) COMPLEX
The following three types of vector-vector operations are available:
• Dot products and various vector norms
• Scaling, copying, swapping, and computing linear combinations of vectors
• Generating or applying plane or modified plane rotations
Increment arguments
A vector’s description consists of the name of the array (x or y) followed by the storage spacing (increment)
in the array of vector elements (incx or incy). The increment can be positive or negative. When a vector x
consists of n elements, the corresponding actual array arguments must be of a length at least
$1 + (n-1)\,|incx|$. For a negative increment, the first element of x is assumed to be $x(1 + (n-1)\,|incx|)$.
The standard specification of _SCAL, _NRM2, _ASUM, and I_AMAX does not define their behavior for
negative increments, so this functionality is an extension to the standard BLAS.
Setting an increment argument to 0 can cause unpredictable results.
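The increment rule can be sketched as a small helper. This is only an illustration, not a library routine: the name elem_index is hypothetical, and the sketch assumes the usual BLAS convention that a negative increment walks the array backward from position 1 + (n-1)*|inc|.

```fortran
! Hypothetical helper (not part of the library): returns the array
! position of the i-th logical element of an n-element vector that
! is stored with increment inc. The increment must be nonzero.
integer function elem_index(i, n, inc)
  implicit none
  integer, intent(in) :: i, n, inc
  if (inc > 0) then
    ! Forward storage: element 1 is at position 1.
    elem_index = 1 + (i - 1) * inc
  else
    ! Backward storage: element 1 is at the high end of the array.
    elem_index = 1 + (n - i) * abs(inc)
  end if
end function elem_index
```

With n = 4 and inc = -2, logical elements 1 through 4 map to array positions 7, 5, 3, and 1, so the actual array must hold at least 1 + (n-1)*|inc| = 7 elements.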
Fortran type declaration for functions
Always declare the data type of external functions. Declaring the data type of the complex Level 1 BLAS
functions is particularly important because, based on the first letter of their names and the Fortran data
typing rules, the default implied data type would be REAL.
Fortran type declarations for function names follow:
Type Function Name
REAL SASUM, SCASUM, SCNRM2, SDOT, SNRM2, SPDOT, SSUM
COMPLEX CDOTC, CDOTU, CSUM
When using half-precision routines, the following types can only be declared in Fortran 90:
Type Function Name
REAL(KIND=4) HDOT
COMPLEX(KIND=4) GDOTC, GDOTU
Level 1 BLAS search functions
Several search functions are properly a part of Level 1 BLAS, but they are not described in this section of
the manual. See the INTRO_SORTSEARCH(3F) man page for details. These functions are as follows
(functions marked with an asterisk [*] are extensions to the standard set of Level 1 BLAS routines):
ISAMAX, ICAMAX, ISAMIN*, ISMAX*, ISMIN*
These man pages are documented in the Application Programmer’s Library Reference Manual.
Table of Level 1 BLAS routines
The following table contains the purpose, operation, and name of each Level 1 BLAS routine (except search
functions). The first routine name listed in each table block is the name of the manual page that contains
documentation for any routines listed in that block. The routines marked with an asterisk (*) are extensions
to the standard set of Level 1 BLAS routines. For complete details about each operation, see the individual
man pages.
SCASUM  Sums the absolute values of the real and imaginary parts of the elements of a complex vector:
        $scasum \leftarrow \|\mathrm{Real}(x)\|_1 + \|\mathrm{Imag}(x)\|_1 = \sum_{i=1}^{n} |\mathrm{Real}(x_i)| + \sum_{i=1}^{n} |\mathrm{Imag}(x_i)|$
SNRM2   Computes the Euclidean norm (also called the $l_2$ norm) of a real or complex vector:
SCNRM2  $snrm2 \leftarrow \|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$
        $scnrm2 \leftarrow \|x\|_2 = \sqrt{\sum_{i=1}^{n} \bar{x}_i x_i}$
SEE ALSO
Lawson, C., Hanson, R., Kincaid, D., and Krogh, F., "Basic Linear Algebra Subprograms for Fortran Usage,"
ACM Transactions on Mathematical Software, 5 (1979), pp. 308 – 325.
NAME
CSROT – Applies a real plane rotation to a pair of complex vectors
SYNOPSIS
CALL CSROT ( n, x, incx, y, incy, c, s)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.
DESCRIPTION
CSROT applies a real plane rotation to a pair of complex vectors. The form of the operation is the
following:
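$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} := \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix}$$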
SEE ALSO
CROTG(3S), SROT(3S), SROTG(3S), SROTM(3S)
NAME
HAXPY, GAXPY – Adds a scalar multiple of a real or complex vector to another real or complex vector
SYNOPSIS
CALL HAXPY (n, alpha, x, incx, y, incy)
CALL GAXPY (n, alpha, x, incx, y, incy)
IMPLEMENTATION
UNICOS/mk systems
These subroutines execute on a single processor and use private data.
DESCRIPTION
HAXPY adds a scalar multiple of a real vector to another real vector.
GAXPY adds a scalar multiple of a complex vector to another complex vector.
HAXPY and GAXPY perform the following vector operation:
y←αx+y
where α is a real or complex scalar, and x and y are real or complex vectors.
These routines have the following arguments:
n INTEGER(KIND=8). (input)
Number of elements in the vectors. If n ≤ 0, HAXPY and GAXPY return without any
computation.
alpha HAXPY: REAL(KIND=4). (input)
GAXPY: COMPLEX(KIND=4). (input)
Scalar multiplier α. If real α = 0. or complex α = 0 = 0. + 0.i, HAXPY and GAXPY return
without any computation.
x HAXPY: REAL(KIND=4) array of dimension (n– 1) . incx + 1. (input)
GAXPY: COMPLEX(KIND=4) array of dimension (n– 1) . incx + 1. (input)
Contains the vector to be scaled before summation.
incx INTEGER(KIND=8). (input)
Increment between elements of x. incx should not be 0.
y HAXPY: REAL(KIND=4) array of dimension (n– 1) . incy + 1. (input and output)
GAXPY: COMPLEX(KIND=4) array of dimension (n– 1) . incy + 1. (input and output)
Before calling the routine, y contains the vector to be summed. After the routine ends, y
contains the result of the summation.
incy INTEGER(KIND=8). (input)
Increment between elements of y. incy should not be 0.
NOTES
HAXPY and GAXPY are based on SAXPY and CAXPY from the Level 1 Basic Linear Algebra Subprograms
(Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1– incx . (n– 1)), x(1– incx . (n– 2)), . . ., x(1)
y(1– incy . (n– 1)), y(1– incy . (n– 2)), . . ., y(1)
RETURN VALUES
When n ≤ 0, real α = 0., or complex α = 0 = 0.+0.i, these routines return immediately with no change in
their arguments.
NAME
HDOT, GDOTC, GDOTU – Computes a dot product (inner product) of two real or complex vectors
SYNOPSIS
dot = HDOT (n, x, incx, y, incy)
dot = GDOTC (n, x, incx, y, incy)
dot = GDOTU (n, x, incx, y, incy)
IMPLEMENTATION
UNICOS/mk systems
This subroutine executes on a single processor and uses private data.
DESCRIPTION
HDOT computes a dot product of two real vectors ($l_2$ real inner product).
GDOTC computes a dot product of the conjugate of a complex vector and another complex vector ($l_2$ complex
inner product).
GDOTU computes a dot product of two complex vectors.
HDOT and GDOTU perform the following vector operation:
$dot \leftarrow x^T y = \sum_{i=1}^{n} x_i y_i$
where x and y are real or complex vectors and $x^T$ is the transpose of x.
GDOTC performs the following vector operation:
$dot \leftarrow x^H y = \sum_{i=1}^{n} \bar{x}_i y_i$
where x and y are complex vectors, and $x^H$ is the conjugate transpose of x.
These functions have the following arguments:
dot HDOT: REAL(KIND=4). (output)
GDOTC, GDOTU: COMPLEX(KIND=4). (output)
Result (dot product). If n ≤ 0, dot is set to 0.
n INTEGER(KIND=8). (input)
Number of elements in each vector.
x HDOT: REAL(KIND=4) array of dimension (n– 1) . incx + 1. (input)
GDOTC, GDOTU: COMPLEX(KIND=4) array of dimension (n– 1) . incx + 1. (input)
Array x contains the first vector operand.
NOTES
HDOT, GDOTC, and GDOTU are based on SDOT, CDOTC, and CDOTU from the Level 1 Basic Linear Algebra
Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1– incx . (n– 1)), x(1– incx . (n– 2)), . . ., x(1)
y(1– incy . (n– 1)), y(1– incy . (n– 2)), . . ., y(1)
NAME
SASUM, SCASUM – Sums the absolute value of elements in a real or complex vector
SYNOPSIS
sum = SASUM (n, x, incx)
sum = SCASUM (n, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use private data.
DESCRIPTION
SASUM sums the absolute values of the elements of a real vector, as follows:
$sum \leftarrow \|x\|_1 = \sum_{i=1}^{n} |x_i|$
NOTES
SASUM and SCASUM are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1– incx . (n– 1)), x(1– incx . (n– 2)), . . ., x(1)
NAME
SAXPBY, CAXPBY – Adds a scalar multiple of a real or complex vector x to a scalar multiple of another
real or complex vector y
SYNOPSIS
CALL SAXPBY (n, alpha, x, incx, beta, y, incy)
CALL CAXPBY (n, alpha, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS/mk systems
These subroutines execute on a single processor and use private data only.
DESCRIPTION
SAXPBY adds a scalar multiple of a real vector x to a scalar multiple of a real vector y.
CAXPBY adds a scalar multiple of a complex vector x to a scalar multiple of a complex vector y.
y←αx+βy
where x and y are n-vectors and α and β are scalars.
The following special cases are recognized:
α = 0: equivalent to SSCAL or CSCAL
α = 1, β = 0: equivalent to SCOPY or CCOPY
α ≠ 1, β = 0: like SCOPY or CCOPY, with scaling
α ≠ 0, β = 1: equivalent to SAXPY or CAXPY
These routines have the following arguments:
n Integer. (input)
Number of elements of the vectors x and y.
alpha SAXPBY: Real. (input)
CAXPBY: Complex. (input)
The scalar α.
x SAXPBY: Real array of dimension $(1 + (n-1)\,|incx|)$. (input)
CAXPBY: Complex array of dimension $(1 + (n-1)\,|incx|)$. (input)
The vector x. If incx > 0, the i-th element of the vector x is located in $x(1 + (i-1)\,incx)$. If
incx < 0, the i-th element of the vector x is located in $x(1 + (n-i)\,|incx|)$.
incx Integer. (input)
Increment between elements of the vector x. If incx < 0, x is processed in reverse order.
beta SAXPBY: Real. (input)
CAXPBY: Complex. (input)
The scalar β.
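The operation can be sketched as a plain reference loop. This is a minimal sketch for the real case, assuming nonzero increments; the name ref_saxpby is hypothetical, and the code does not reproduce the special-case dispatch listed above or any optimization in the library routine.

```fortran
! Reference loop for y <- alpha*x + beta*y (hypothetical name
! ref_saxpby; the library routine is SAXPBY). Assumes incx and incy
! are nonzero.
subroutine ref_saxpby(n, alpha, x, incx, beta, y, incy)
  implicit none
  integer, intent(in) :: n, incx, incy
  real, intent(in) :: alpha, beta, x(*)
  real, intent(inout) :: y(*)
  integer :: i, ix, iy
  if (n <= 0) return
  ! A negative increment starts at the high end of the array.
  ix = 1
  iy = 1
  if (incx < 0) ix = 1 - (n - 1) * incx
  if (incy < 0) iy = 1 - (n - 1) * incy
  do i = 1, n
    y(iy) = alpha * x(ix) + beta * y(iy)
    ix = ix + incx
    iy = iy + incy
  end do
end subroutine ref_saxpby
```

The special cases in the list are instances of this loop: for example, β = 1 leaves y(iy) = alpha*x(ix) + y(iy), which is the SAXPY operation.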
NAME
SAXPY, CAXPY – Adds a scalar multiple of a real or complex vector to another real or complex vector
SYNOPSIS
CALL SAXPY (n, alpha, x, incx, y, incy)
CALL CAXPY (n, alpha, x, incx, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SAXPY adds a scalar multiple of a real vector to another real vector.
CAXPY adds a scalar multiple of a complex vector to another complex vector.
SAXPY and CAXPY perform the following vector operation:
y ← αx +y
where α is a real or complex scalar, and x and y are real or complex vectors.
These routines have the following arguments:
n Integer. (input)
Number of elements in the vectors. If n ≤ 0, SAXPY and CAXPY return without any
computation.
alpha SAXPY: Real. (input)
CAXPY: Complex. (input)
Scalar multiplier α. If real α = 0 or complex α = 0 = 0. + 0.i, SAXPY and CAXPY return
without any computation.
x SAXPY: Real array of dimension (n– 1) . incx + 1. (input)
CAXPY: Complex array of dimension (n– 1) . incx + 1. (input)
Contains the vector to be scaled before summation.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
NOTES
SAXPY and CAXPY are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1– incx . (n– 1)), x(1– incx . (n– 2)), . . ., x(1)
y(1– incy . (n– 1)), y(1– incy . (n– 2)), . . ., y(1)
RETURN VALUES
When n ≤ 0, real α = 0., or complex α = 0 = 0.+0.i, these routines return immediately with no change in
their arguments.
NAME
SCOPY, CCOPY – Copies a real or complex vector into another real or complex vector
SYNOPSIS
CALL SCOPY (n, x, incx, y, incy)
CALL CCOPY (n, x, incx, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.
DESCRIPTION
SCOPY copies a real vector into another real vector.
CCOPY copies a complex vector into another complex vector.
SCOPY and CCOPY perform the following vector operation:
y←x
where x and y are real or complex vectors.
These routines have the following arguments:
n Integer. (input)
Number of elements to be copied. If n ≤ 0, SCOPY and CCOPY return without any computation.
x SCOPY: Real array of dimension (n– 1) . incx + 1. (input)
CCOPY: Complex array of dimension (n– 1) . incx + 1. (input)
Vector from which to copy.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
y SCOPY: Real array of dimension (n– 1) . incy + 1. (output)
CCOPY: Complex array of dimension (n– 1) . incy + 1. (output)
Result vector.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.
NOTES
SCOPY and CCOPY are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1– incx . (n– 1)), x(1– incx . (n– 2)), . . ., x(1)
y(1– incy . (n– 1)), y(1– incy . (n– 2)), . . ., y(1)
NAME
SDOT, CDOTC, CDOTU – Computes a dot product (inner product) of two real or complex vectors
SYNOPSIS
dot = SDOT (n, x, incx, y, incy)
dot = CDOTC (n, x, incx, y, incy)
dot = CDOTU (n, x, incx, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.
DESCRIPTION
SDOT computes a dot product of two real vectors ($l_2$ real inner product).
CDOTC computes a dot product of the conjugate of a complex vector and another complex vector ($l_2$ complex
inner product).
CDOTU computes a dot product of two complex vectors.
SDOT and CDOTU perform the following vector operation:
$dot \leftarrow x^T y = \sum_{i=1}^{n} x_i y_i$
where x and y are real or complex vectors, and $x^T$ is the transpose of x.
CDOTC performs the following vector operation:
$dot \leftarrow x^H y = \sum_{i=1}^{n} \bar{x}_i y_i$
where x and y are complex vectors, and $x^H$ is the conjugate transpose of x.
These functions have the following arguments:
dot SDOT: Real. (output)
CDOTC, CDOTU: Complex. (output)
Result (dot product). If n ≤ 0, dot is set to 0.
n Integer. (input)
Number of elements in each vector.
x SDOT: Real array of dimension (n– 1) . incx + 1. (input)
CDOTC, CDOTU: Complex array of dimension (n– 1) . incx + 1. (input)
Array x contains the first vector operand.
NOTES
SDOT, CDOTC, and CDOTU are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1– incx . (n– 1)), x(1– incx . (n– 2)), . . ., x(1)
y(1– incy . (n– 1)), y(1– incy . (n– 2)), . . ., y(1)
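The difference between the conjugated and unconjugated products can be sketched with a reference loop. This is an illustration only, assuming nonzero increments; the name ref_cdotc is hypothetical and the code is not the library implementation.

```fortran
! Reference loop for the conjugated dot product x^H y (hypothetical
! name ref_cdotc; the library function is CDOTC). Assumes incx and
! incy are nonzero.
complex function ref_cdotc(n, x, incx, y, incy)
  implicit none
  integer, intent(in) :: n, incx, incy
  complex, intent(in) :: x(*), y(*)
  integer :: i, ix, iy
  ref_cdotc = (0.0, 0.0)
  if (n <= 0) return
  ! A negative increment starts at the high end of the array.
  ix = 1
  iy = 1
  if (incx < 0) ix = 1 - (n - 1) * incx
  if (incy < 0) iy = 1 - (n - 1) * incy
  do i = 1, n
    ! conjg(x) is what distinguishes CDOTC from CDOTU.
    ref_cdotc = ref_cdotc + conjg(x(ix)) * y(iy)
    ix = ix + incx
    iy = iy + incy
  end do
end function ref_cdotc
```

For x = y = (i, 1), the conjugated product is 2, whereas the unconjugated product (the same loop without conjg) is 0. A caller must declare the function as COMPLEX, because its name would otherwise be typed REAL by the implicit typing rules.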
NAME
SHAD – Computes the Hadamard product of two vectors
SYNOPSIS
CALL SHAD (n, alpha, x, incx, y, incy, beta, z, incz)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
SHAD computes the Hadamard product of two vectors X and Y, storing the results in a vector Z.
$z(i) := \alpha\, x(i)\, y(i) + \beta\, z(i), \quad i = 1, \ldots, n$
α = 0 is recognized as a special case. β = 0 or β = 1 is also recognized as a special case.
The SHAD routine accepts the following arguments:
n Integer. (input)
The number of elements in each vector.
alpha Real. (input)
The scalar α.
x Real array, dimension $(1 + (n-1)\,|incx|)$. (input)
The vector x.
If incx > 0, the ith element of the vector x is located in $x(1 + (i-1)\,incx)$.
If incx < 0, the ith element of the vector x is located in $x(1 + (n-i)\,|incx|)$.
incx Integer. (input)
The increment between elements of the vector x.
incx must not be 0.
y Real array, dimension $(1 + (n-1)\,|incy|)$. (input)
The vector y.
If incy > 0, the ith element of the vector y is located in $y(1 + (i-1)\,incy)$.
If incy < 0, the ith element of the vector y is located in $y(1 + (n-i)\,|incy|)$.
incy Integer. (input)
The increment between elements of the vector y. incy must not be 0.
beta Real. (input)
The scalar beta.
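The elementwise operation can be sketched as a reference loop. This is a minimal sketch, assuming nonzero increments and the argument order shown in the synopsis; the name ref_shad is hypothetical, and the code ignores the special-case handling of α and β mentioned above.

```fortran
! Reference loop for z(i) <- alpha*x(i)*y(i) + beta*z(i) (hypothetical
! name ref_shad; the library routine is SHAD). Assumes all increments
! are nonzero.
subroutine ref_shad(n, alpha, x, incx, y, incy, beta, z, incz)
  implicit none
  integer, intent(in) :: n, incx, incy, incz
  real, intent(in) :: alpha, beta, x(*), y(*)
  real, intent(inout) :: z(*)
  integer :: i, ix, iy, iz
  if (n <= 0) return
  ! A negative increment starts at the high end of the array.
  ix = 1
  iy = 1
  iz = 1
  if (incx < 0) ix = 1 - (n - 1) * incx
  if (incy < 0) iy = 1 - (n - 1) * incy
  if (incz < 0) iz = 1 - (n - 1) * incz
  do i = 1, n
    z(iz) = alpha * x(ix) * y(iy) + beta * z(iz)
    ix = ix + incx
    iy = iy + incy
    iz = iz + incz
  end do
end subroutine ref_shad
```

For x = (1, 2, 3), y = (4, 5, 6), z = (1, 1, 1), α = 2, and β = 3 with unit increments, the loop leaves z = (11, 23, 39).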
NAME
SNRM2, SCNRM2 – Computes the Euclidean norm of a vector
SYNOPSIS
enrm = SNRM2 (n, x, incx)
enrm = SCNRM2 (n, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.
DESCRIPTION
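SNRM2 computes the Euclidean ($l_2$) norm of a real vector, as follows:
$enrm \leftarrow \|x\|_2 = \sqrt{x^T x} = \sqrt{\sum_{i=1}^{n} x_i^2}$
SCNRM2 computes the Euclidean ($l_2$) norm of a complex vector, as follows:
$enrm \leftarrow \|x\|_2 = \sqrt{x^H x} = \sqrt{\sum_{i=1}^{n} \bar{x}_i x_i}$
where x is a complex vector, and $x^H$ denotes the conjugate transpose of x.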
These functions have the following arguments:
enrm Real. (output)
Result (Euclidean norm). If n ≤ 0, enrm is set to 0.
n Integer. (input)
Number of elements in the operand vector.
x SNRM2: Real array of dimension (n– 1) . incx + 1. (input)
SCNRM2: Complex array of dimension (n– 1) . incx + 1. (input)
Array x contains the operand vector.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
NOTES
SNRM2 and SCNRM2 are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
The version of these routines on UNICOS systems does not behave the same way that the public domain
FORTRAN version behaves. For performance reasons, they do not scale the input values; input for the
routines must be within a certain range of numbers.
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1– incx . (n– 1)), x(1– incx . (n– 2)), . . ., x(1)
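To illustrate what the omitted input scaling does, the following sketch accumulates a scaled sum of squares. It is not the UNICOS implementation (which, per the note above, does not scale) and is not a copy of any particular public domain release; the name scaled_nrm2 is hypothetical and a nonzero increment is assumed.

```fortran
! Sketch of a scaled Euclidean norm (hypothetical name scaled_nrm2).
! The running maximum "scale" keeps every squared ratio at or below 1,
! so the intermediate sum of squares stays bounded by n.
real function scaled_nrm2(n, x, incx)
  implicit none
  integer, intent(in) :: n, incx
  real, intent(in) :: x(*)
  real :: scale, ssq, absxi
  integer :: i, ix
  scaled_nrm2 = 0.0
  if (n <= 0) return
  scale = 0.0
  ssq = 1.0
  ix = 1
  if (incx < 0) ix = 1 - (n - 1) * incx
  do i = 1, n
    absxi = abs(x(ix))
    if (absxi > 0.0) then
      if (scale < absxi) then
        ! New largest element: rescale the accumulated sum.
        ssq = 1.0 + ssq * (scale / absxi)**2
        scale = absxi
      else
        ssq = ssq + (absxi / scale)**2
      end if
    end if
    ix = ix + incx
  end do
  scaled_nrm2 = scale * sqrt(ssq)
end function scaled_nrm2
```

On a system whose default REAL is 32-bit IEEE, squaring 3.0E30 or 4.0E30 directly would overflow, but this loop returns approximately 5.0E30 for x = (3.0E30, 4.0E30). An unscaled routine requires the caller to keep inputs within a range where the squares neither overflow nor underflow.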
NAME
SPAXPY – Adds a scalar multiple of a real vector to a sparse real vector
SYNOPSIS
CALL SPAXPY (n, alpha, x, y, index)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SPAXPY adds a scalar multiple of a real vector to a sparse real vector. It performs the following vector
operation where α is a real scalar, x is a real vector, and y is a sparse real vector:
y←αx+y
This routine has the following arguments:
n Integer. (input)
Number of vector elements to be used in the computation. If n ≤ 0, SPAXPY returns without
any computation.
alpha Real. (input)
Scalar multiplier α. If α = 0.0, SPAXPY returns without any computation.
x Real array of dimension n. (input)
Contains the dense vector operand to be scaled before adding.
y Real array of dimension MAX{index(1),. . .,index(n)}. (input and output)
On input, y contains the sparse vector used in the addition. On output, y receives the resulting
vector.
index Integer array of dimension n. (input)
Contains the vector of indices for elements of y. All elements in index should be unique.
SPAXPY executes an operation equivalent to the following Fortran code:
      DO 10 I=1,N
         Y(INDEX(I)) = ALPHA*X(I) + Y(INDEX(I))
   10 CONTINUE
NOTES
SPAXPY is an extension to the standard Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
NAME
SPDOT – Computes the dot product of a real vector and a real sparse vector
SYNOPSIS
dot = SPDOT (n, y, index, x)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SPDOT computes the dot product of a real vector and a sparse real vector ($l_2$ real inner product). It
performs the following vector operation where y is a real vector, $y^T$ is the transpose of y, and x is a real
sparse vector:
$dot \leftarrow y^T x = \sum_{i=1}^{n} y_i x_i$
NOTES
SPDOT is an extension to the standard Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
NAME
SROT, CROT – Applies a real plane rotation or complex coordinate rotation
SYNOPSIS
CALL SROT (n, x, incx, y, incy, c, s)
CALL CROT (n, x, incx, y, incy, c, s)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use private data.
DESCRIPTION
SROT applies a plane rotation matrix to a real sequence of ordered pairs:
(x i , y i ), for all i = 1, 2, . . ., n.
CROT applies a rotation matrix to a complex sequence of ordered pairs:
(x i , y i ), for all i = 1, 2, . . ., n.
These routines have the following arguments:
n Integer. (input)
Number of ordered pairs (planar points in SROT) to be rotated. If n ≤ 0, SROT or CROT returns
without computation.
x SROT: Real array of dimension (n– 1) . incx + 1. (input and output)
On input, array x contains the x-coordinate of each planar point to be rotated. On output, array x
contains the x-coordinate of each rotated planar point.
CROT: Complex array of dimension (n– 1) . incx + 1. (input and output)
On input, array x contains the first element of each ordered pair to be rotated. On output, array
x contains the first element of each rotated ordered pair.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
y SROT: Real array of dimension (n– 1) . incy + 1. (input and output)
On input, array y contains the y-coordinate of each planar point to be rotated. On output, array y
contains the y-coordinate of each rotated planar point.
CROT: Complex array of dimension (n– 1) . incy + 1. (input and output)
On input, array y contains the second element of each ordered pair to be rotated. On output,
array y contains the second element of each rotated ordered pair.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.
c Real. (input)
Cosine of the angle of rotation, usually calculated using SROTG(3S) or CROTG(3S).
s SROT: Real. (input)
Sine of the angle of rotation, usually calculated using SROTG.
CROT: Complex. (input)
Complex sine of the angle of rotation, usually calculated using CROTG.
NOTES
SROT and CROT are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS). SROT applies the
following plane rotation to each pair of elements (x_i, y_i):
\[ \begin{bmatrix} x_i \\ y_i \end{bmatrix} \leftarrow \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}, \quad i = 1, 2, \ldots, n \]
If coefficients c and s satisfy c^2 + s^2 = 1.0, the rotation matrix is orthogonal, and the transformation is called
a Givens plane rotation. If c = 1 and s = 0, SROT returns without modifying any input parameters.
CROT applies the following rotation to each pair of complex elements (x_i, y_i):
\[ \begin{bmatrix} x_i \\ y_i \end{bmatrix} \leftarrow \begin{bmatrix} c & s \\ -\bar{s} & c \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}, \quad i = 1, 2, \ldots, n \]
where \( \bar{s} \) is the complex conjugate of s.
For CROT, if the coefficient c is real, and the coefficients c and s satisfy \( c^2 + s\bar{s} = 1.0 \), the rotation matrix is
unitary, and the transformation is called a Givens complex rotation.
To calculate the Givens coefficients c and s from a two-element vector to determine the angle of rotation,
use SROTG(3S) or CROTG(3S).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1)
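The rotation and the increment handling described above can be illustrated with a minimal pure-Python sketch. It is not the library routine: the names `start_index` and `rot` are hypothetical, arrays are 0-based Python lists, and one function stands in for both SROT and CROT by using the conjugate of s in the second row (for real s the conjugate is s itself).

```python
def start_index(n, inc):
    """0-based offset of the first element visited (hypothetical helper).

    For a negative increment the walk starts at the far end of the array,
    mirroring x(1 - incx*(n - 1)) in the 1-based notation of the text.
    """
    return 0 if inc > 0 else (n - 1) * (-inc)


def rot(n, x, incx, y, incy, c, s):
    """Apply [[c, s], [-conj(s), c]] to n pairs (x_i, y_i), in place."""
    if n <= 0:
        return  # nothing to do, as the argument description states
    ix, iy = start_index(n, incx), start_index(n, incy)
    for _ in range(n):
        xi, yi = x[ix], y[iy]
        x[ix] = c * xi + s * yi
        y[iy] = -s.conjugate() * xi + c * yi
        ix += incx
        iy += incy


# Quarter-turn example: c = 0, s = 1 maps (x, y) to (y, -x).
x = [1.0, 0.0]
y = [0.0, 1.0]
rot(2, x, 1, y, 1, 0.0, 1.0)
```

With incx = 2 and incy = −1 the same call pairs x[0] with y[1] and x[2] with y[0], which is the backward traversal the NOTES section describes.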
SEE ALSO
CROTG(3S), SROTG(3S), SROTM(3S)
NAME
SROTG, CROTG – Constructs a Givens plane rotation
SYNOPSIS
CALL SROTG (a, b, c, s)
CALL CROTG (a, b, c, s)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use private data.
DESCRIPTION
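SROTG computes the elements of a rotation matrix such that:
\[ \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix} \]
CROTG computes the elements of a rotation matrix such that:
\[ \begin{bmatrix} c & s \\ -\bar{s} & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix} \]
where
\[ r = \frac{a}{\sqrt{\bar{a}a}} \sqrt{\bar{a}a + \bar{b}b} \]
and the notation \( \bar{z} \) represents the complex conjugate of z.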
These routines have the following arguments:
a SROTG: Real.
CROTG: Complex.
(input and output)
SROTG: On input, the first component of the vector to be rotated. On output, a is overwritten by r,
the first component of the vector in the rotated coordinate system, where:
r = sign(√(a² + b²), a), if |a| > |b|
r = sign(√(a² + b²), b), if |a| ≤ |b|
CROTG: On output, a is overwritten by the unique complex number r, whose size in the complex
plane is the Euclidean norm of the complex vector (a,b), and whose direction in the complex
plane is the same as that of the original complex element a.
b SROTG: Real.
CROTG: Complex.
(input and output)
On input, the second component of the vector to be rotated. On output, b contains z, where:
z = s if |a| > |b|
z = 1/c if |a| ≤ |b| and c ≠ 0
z = 1 if c = 0.
c Real. (output)
Cosine, c, of the angle of rotation.
s SROTG: Real.
CROTG: Complex.
(output)
Sine, s, of the angle of rotation.
NOTES
SROTG and CROTG are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
The value of z, returned in b by SROTG, gives a compact representation of the rotation matrix, which can be
used later to reconstruct c and s as in the following example:
      IF (B .EQ. 1.) THEN
!        z = 1 encodes c = 0
         C = 0.
         S = 1.
      ELSE IF (ABS(B) .LT. 1.) THEN
!        |z| < 1 means z holds s
         C = SQRT(1. - B * B)
         S = B
      ELSE
!        otherwise z holds 1/c
         C = 1. / B
         S = SQRT(1. - C * C)
      END IF
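As an illustration of how r, z, c, and s relate, the following pure-Python sketch computes them for real inputs and then recovers c and s from z alone. The function names `rotg` and `unpack_z` are hypothetical. The handling of a = b = 0 (c = 1, s = 0, r = z = 0) is an assumption borrowed from the reference BLAS, since the text above does not cover that case.

```python
import math


def rotg(a, b):
    """Return (r, z, c, s) for real a and b (hypothetical sketch of SROTG)."""
    norm = math.hypot(a, b)
    if norm == 0.0:
        # Assumption: reference-BLAS convention for the zero vector.
        return 0.0, 0.0, 1.0, 0.0
    # r takes the sign of whichever input is larger in magnitude.
    roe = a if abs(a) > abs(b) else b
    r = math.copysign(norm, roe)
    c, s = a / r, b / r
    if abs(a) > abs(b):
        z = s
    elif c != 0.0:
        z = 1.0 / c
    else:
        z = 1.0
    return r, z, c, s


def unpack_z(z):
    """Recover (c, s) from z, mirroring the Fortran fragment above."""
    if z == 1.0:
        return 0.0, 1.0
    if abs(z) < 1.0:
        return math.sqrt(1.0 - z * z), z
    c = 1.0 / z
    return c, math.sqrt(1.0 - c * c)


r, z, c, s = rotg(3.0, 4.0)
c2, s2 = unpack_z(z)  # c2, s2 agree with c, s up to rounding
```

Storing only z works because the sign convention for r forces c > 0 when |a| > |b| and s > 0 otherwise, so the square root always has the correct sign.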
SEE ALSO
SROT(3S)
NAME
SROTM – Applies a modified Givens plane rotation
SYNOPSIS
CALL SROTM (n, x, incx, y, incy, rparam)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SROTM applies the modified Givens plane rotation constructed by SROTMG(3S).
This routine has the following arguments:
n Integer. (input)
Number of planar points to be rotated. If n ≤ 0, SROTM returns without any computation.
x Real array of dimension (n−1)·|incx| + 1. (input and output)
On input, array x contains the x-coordinate of each planar point to be rotated. On output, array x
contains the x-coordinate of each rotated planar point.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
y Real array of dimension (n−1)·|incy| + 1. (input and output)
On input, array y contains the y-coordinate of each planar point to be rotated. On output, array y
contains the y-coordinate of each rotated planar point.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.
rparam Real array of dimension 5. (input)
Contains rotation matrix information.
SROTM computes a planar rotation, with possible scaling or reflection, as follows:
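\[ \begin{bmatrix} x_i \\ y_i \end{bmatrix} \leftarrow \begin{bmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}, \quad i = 1, 2, \ldots, n \]
where the matrix that contains the elements h_{1,1}, h_{2,1}, h_{1,2}, and h_{2,2} is called a rotation matrix.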
The rparam array determines the contents of the rotation matrix, as follows:
The key parameter, rparam(1), may have one of four values:
1.0, 0.0, – 1.0, or – 2.0
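If rparam(1) = 1.0:
\[ \begin{bmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{bmatrix} = \begin{bmatrix} \mathrm{rparam}(2) & 1.0 \\ -1.0 & \mathrm{rparam}(5) \end{bmatrix} \]
and rparam(3) and rparam(4) are ignored.
If rparam(1) = 0.0:
\[ \begin{bmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{bmatrix} = \begin{bmatrix} 1.0 & \mathrm{rparam}(4) \\ \mathrm{rparam}(3) & 1.0 \end{bmatrix} \]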
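The mapping from rparam to the matrix H can be sketched in pure Python as follows. The names `h_from_rparam` and `rotm` are hypothetical, and the vectors are contiguous 0-based lists (incx = incy = 1). Only the key values 1.0 and 0.0 are spelled out in the text above; the −1.0 (full matrix) and −2.0 (identity) branches follow the reference BLAS convention and are an assumption here.

```python
def h_from_rparam(rparam):
    """Build the 2-by-2 matrix H from a 5-element rparam list.

    rparam[0] is the key, rparam[1..4] hold h11, h21, h12, h22
    (that is, rparam(2)..rparam(5) in the 1-based notation of the text).
    """
    key = rparam[0]
    if key == 1.0:
        return [[rparam[1], 1.0], [-1.0, rparam[4]]]
    if key == 0.0:
        return [[1.0, rparam[3]], [rparam[2], 1.0]]
    if key == -1.0:
        # Assumption (reference BLAS): all four elements are taken from rparam.
        return [[rparam[1], rparam[3]], [rparam[2], rparam[4]]]
    if key == -2.0:
        # Assumption (reference BLAS): H is the identity matrix.
        return [[1.0, 0.0], [0.0, 1.0]]
    raise ValueError("rparam(1) must be 1.0, 0.0, -1.0, or -2.0")


def rotm(x, y, rparam):
    """Apply H to each pair (x[i], y[i]) in place."""
    h = h_from_rparam(rparam)
    for i in range(len(x)):
        xi, yi = x[i], y[i]
        x[i] = h[0][0] * xi + h[0][1] * yi
        y[i] = h[1][0] * xi + h[1][1] * yi


x, y = [1.0], [1.0]
rotm(x, y, [1.0, 2.0, 99.0, 99.0, 3.0])  # entries 99.0 are ignored for key 1.0
```

Because two of the four matrix entries are fixed at ±1 for keys 1.0 and 0.0, a production routine can skip those multiplications, which is the saving the modified rotation is designed for.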
SEE ALSO
SROTMG(3S) for further details about the modified Givens transformation and array rparam
NAME
SROTMG – Constructs a modified Givens plane rotation
SYNOPSIS
CALL SROTMG (d1, d2, b1, b2, rparam)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SROTMG computes the elements of a modified Givens plane rotation matrix.
This routine has the following arguments:
d1 Real. (input and output)
On input, this value is the first diagonal element of the scaling matrix D. On the first call to
SROTMG, this value is typically 1.0. Subsequent calls typically use the value from the previous
call. On output, this value is the first diagonal element of the updated scaling matrix D’.
d2 Real. (input and output)
On input, this value is the second diagonal element of the scaling matrix D. On the first call to SROTMG,
this value is typically 1.0. Subsequent calls typically use the value from the previous call. On
output, this value is the second diagonal element of the updated scaling matrix D′.
b1 Real. (input and output)
On input, this value is the x-coordinate of the vector used to define the angle of rotation, before
scaling (multiplying by the matrix D). On output, this value is the x-coordinate of the rotated
vector, before scaling (multiplying by the matrix D’).
b2 Real. (input)
On input, this value is the y-coordinate of the vector used to define the angle of rotation, before
scaling (multiplying by the matrix D). It is unchanged on output.
rparam Real array of dimension 5. (output)
This array contains rotation matrix information. SROTMG sets up the computed elements in
rparam from inputs d1, d2, b1, and b2.
Standard Givens Rotation
A standard Givens rotation (see SROTG(3S)) is based on an orthogonal matrix G that rotates points on a
Cartesian xy-coordinate plane. To calculate the rotation matrix, you must provide the angle of rotation
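desired, or, equivalently, a vector (point) that lies along the angle of rotation. For a given planar point (x_r,
y_r), G is formed so that:
\[ \begin{bmatrix} x' \\ 0 \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} x_r \\ y_r \end{bmatrix} = G \begin{bmatrix} x_r \\ y_r \end{bmatrix} \]
where \( x' = \sqrt{x_r^2 + y_r^2} \).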
With this rotation matrix G, you can then convert any number of existing planar points to the new (rotated)
xy-coordinate system. For n points, the rotations would be as follows:
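\[ \begin{bmatrix} x_i \\ y_i \end{bmatrix} \leftarrow \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}, \quad i = 1, 2, \ldots, n \]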
Modified Givens Rotation
The algorithm for SROTMG is based on the following observation. The rotation matrix G can be factored
into a scaling matrix (diagonal matrix) and modified rotation matrix H, for which either the diagonal or the
off-diagonal elements are units (that is, ±1). Thus, performing m modified (scaled) rotations on n planar
points requires only 2nm multiplications, rather than the 4nm needed for the standard rotation.
Because you may want to perform several successive rotations, this routine assumes that you have leftover
scaling factors from your previous modified Givens rotation; that is, the routine requires you to input not
only a planar rotation vector (b1, b2) but also the squares of the diagonal elements of the scaling matrix, d1
and d2. The actual rotation vector is specified as follows:
\[ \begin{bmatrix} x_r \\ y_r \end{bmatrix} = \begin{bmatrix} \sqrt{d_1} & 0 \\ 0 & \sqrt{d_2} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = D^{1/2} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \]
where:
\[ D'^{1/2} = \begin{bmatrix} \sqrt{d_1'} & 0 \\ 0 & \sqrt{d_2'} \end{bmatrix} \]
uses the updated scaling factors d1′ and d2′, which are d1 and d2 on output.
\[ H = \begin{bmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{bmatrix} \]
is stored in the output array argument rparam.
b1′ is stored as b1 on output.
\( D'^{1/2} H \) equals \( G D^{1/2} \), not G, as implied earlier. You must account for the old scaling factors when
calculating the new scaling factors.
After calculating the matrix H by using SROTMG, you can then use it in SROTM(3S) to convert points to the
new coordinate system.
NOTES
If rescaling is needed, SROTMG will further modify these output values before the end of the routine. See
case 4 later in this subsection.
Case 3: |√d1 b1| ≤ |√d2 b2| (|x_r| ≤ |y_r|)
In this case, the off-diagonal elements of H are units (to be specific, h_{2,1} = −1 and h_{1,2} = 1). Thus, the
rparam values set on output are as follows:
\[ q_i = \mathrm{int}\left(\log_\gamma \sqrt{d_i'}\right) = \mathrm{int}\left(\frac{1}{2} \cdot \frac{\log_2(d_i')}{12}\right) = \mathrm{int}\left(\frac{\log_2(d_i')}{24}\right), \quad i = 1, 2. \]
Then the following is true:
q_i < 0, if d_i′ < γ^{−2}
q_i = 0, if γ^{−2} ≤ d_i′ ≤ γ^{2}
q_i > 0, if d_i′ > γ^{2}
\[ \mathrm{rparam}(4) \leftarrow h_{1,2}' = h_{1,2} \gamma^{q_1} \]
\[ \mathrm{rparam}(5) \leftarrow h_{2,2}' = h_{2,2} \gamma^{q_2} \]
SEE ALSO
SROTG(3S), SROTM(3S)
Gentleman, W. M., "Least Squares Computations by Givens Transformations Without Square Roots,"
Journal of the Institute for Mathematical Applications 12 (1973), pp. 329 – 336.
Lawson, C., Hanson, R., Kincaid, D., and Krogh, F., "Basic Linear Algebra Subprograms for Fortran Usage,"
ACM Transactions on Mathematical Software, 5 (1979), pp. 308 – 325.
NAME
SSCAL, CSSCAL, CSCAL – Scales a real or complex vector
SYNOPSIS
CALL SSCAL (n, alpha, x, incx)
CALL CSSCAL (n, alpha, x, incx)
CALL CSCAL (n, alpha, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.
DESCRIPTION
SSCAL scales a real vector with a real scalar.
CSSCAL scales a complex vector with a real scalar.
CSCAL scales a complex vector with a complex scalar.
These routines perform the following vector operation:
x←αx
where α is a real or complex scalar, and x is a real or complex vector.
These routines have the following arguments:
n Integer. (input)
Number of elements in the vector. If n ≤ 0, SSCAL, CSSCAL, and CSCAL return without any
computation.
alpha SSCAL, CSSCAL: Real. (input)
CSCAL: Complex. (input)
Scalar value α by which to scale the vector.
x SSCAL: Real array of dimension (n−1)·|incx| + 1. (input and output)
CSSCAL, CSCAL: Complex array of dimension (n−1)·|incx| + 1. (input and output)
Vector to be scaled.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
NOTES
SSCAL, CSSCAL, and CSCAL are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
NAME
SSUM, CSUM – Sums the elements of a real or complex vector
SYNOPSIS
sum = SSUM (n, x, incx)
sum = CSUM (n, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.
DESCRIPTION
SSUM sums the elements of a real vector.
CSUM sums the elements of a complex vector.
SSUM and CSUM perform the following vector operation:
\[ \mathrm{sum} \leftarrow \sum_{i=1}^{n} x_i \]
NOTES
SSUM and CSUM are extensions to the standard Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
NAME
SSWAP, CSWAP – Swaps two real or complex vectors
SYNOPSIS
CALL SSWAP (n, x, incx, y, incy)
CALL CSWAP (n, x, incx, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.
DESCRIPTION
SSWAP swaps two real vectors.
CSWAP swaps two complex vectors.
SSWAP and CSWAP perform the following vector operation:
x ↔ y
where x and y are real or complex vectors.
These routines have the following arguments:
n Integer. (input)
Number of vector elements to be swapped. If n ≤ 0, SSWAP and CSWAP return without any
computation.
x SSWAP: Real array of dimension (n−1)·|incx| + 1. (input and output)
CSWAP: Complex array of dimension (n−1)·|incx| + 1. (input and output)
Vector to be swapped.
incx Integer. (input)
Increment between elements of x.
If incx = 0, the results will be unpredictable.
y SSWAP: Real array of dimension (n−1)·|incy| + 1. (input and output)
CSWAP: Complex array of dimension (n−1)·|incy| + 1. (input and output)
Vector to be swapped.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.
NOTES
SSWAP and CSWAP are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1)
NAME
INTRO_BLAS2 – Introduction to matrix-vector linear algebra subprograms
IMPLEMENTATION
See individual man pages for implementation details
DESCRIPTION
The linear algebra subprograms are written to run optimally on UNICOS and UNICOS/mk systems. These
subprograms use call-by-address convention when called by a Fortran or C program, or the assembler for
your system.
Level 2 Basic Linear Algebra Subprograms
The Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS) consist of CAM or CAL routines for real
and complex data. They handle matrix-vector operations. Only the single-precision real and complex data
types are supported.
Increment arguments for vectors
The description of a vector consists of the name of the array (x or y) followed by the storage spacing
(increment) in the array of vector elements (incx or incy). The increment can be positive or negative. When
a vector x consists of n elements, the corresponding actual array arguments must be of length at least
1+(n−1)·|incx|. For a negative increment, the first element of x is assumed to be x(1+(n−1)·|incx|).
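The increment convention can be made concrete with a small pure-Python sketch. The helper names `min_array_length` and `vector_indices` are hypothetical; the indices returned are 1-based, to match the Fortran notation used in this manual.

```python
def min_array_length(n, inc):
    """Smallest array length that can hold an n-element vector with increment inc."""
    return 1 + (n - 1) * abs(inc)


def vector_indices(n, inc):
    """1-based array positions of vector elements 1..n, in the order they are visited."""
    if inc == 0:
        raise ValueError("a zero increment is not meaningful")
    # A negative increment starts at the far end of the array and walks backward.
    start = 1 if inc > 0 else 1 + (n - 1) * (-inc)
    return [start + i * inc for i in range(n)]


forward = vector_indices(3, 2)    # positions 1, 3, 5
backward = vector_indices(3, -2)  # positions 5, 3, 1
```

Both calls touch the same three storage locations; only the order in which they are treated as elements 1, 2, and 3 of the vector differs.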
Table of Level 2 BLAS routines
The following table describes these routines. If more than one routine name appears for a given block in the
table, the first name listed is the name of the man page that documents all routines listed in that block.
The table is in alphabetic order, except that each Hermitian matrix routine (any routine whose name begins
with CH) is grouped next to equivalent symmetric matrix routines (whose names begin with SS or CS). This
is because the Hermitian property is a type of symmetry.
Each routine in the table marked with an asterisk (*) is an extension to the standard set of Level 2 BLAS
routines.
SEE ALSO
Dongarra, J., J. Du Croz, S. Hammarling, and R. Hanson, "An Extended Set of FORTRAN Basic Linear
Algebra Subprograms," ACM Transactions on Mathematical Software, Vol. 14, No. 1, March 1988, pp. 1 –
17.
NAME
CHBMV – Multiplies a complex vector by a complex Hermitian band matrix
SYNOPSIS
CALL CHBMV (uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CHBMV performs the following matrix-vector operation where α and β are scalars, x and y are n-element
vectors, and A is an n-by-n Hermitian band matrix:
y ← α Ax+ β y
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of the band matrix A is supplied, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
k Integer. (input)
Specifies the number of superdiagonals of matrix A. k ≥ 0.
alpha Complex. (input)
Scalar factor α.
a Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading (k+1)-by-n part of array a must contain the
upper triangular band part of the Hermitian matrix, supplied column-by-column, with the leading
diagonal of the matrix in row k+1 of the array, the first superdiagonal starting at position 2 in
row k, and so on. The top left k-by-k triangle of array a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading (k+1)-by-n part of array a must contain the lower
triangular band part of the Hermitian matrix, supplied column-by-column, with the leading
diagonal of the matrix in row 1 of the array, the first subdiagonal starting at position 1 in row 2,
and so on. The bottom right k-by-k triangle of array a is not referenced.
The imaginary parts of the diagonal elements need not be set and are assumed to be 0. See the
EXAMPLES section for examples of Fortran code that transfer a band matrix from conventional
full matrix storage to band storage.
NOTES
CHBMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1)
EXAMPLES
The following program segment transfers the upper triangular part of a Hermitian band matrix from
conventional full matrix storage to band storage:
      DO 20, J = 1, N
         M = K + 1 - J
         DO 10, I = MAX(1, J - K), J
            A(M + I, J) = MATRIX(I, J)
   10    CONTINUE
   20 CONTINUE
The following program segment transfers the lower triangular part of a Hermitian band matrix from
conventional full matrix storage to band storage:
      DO 20, J = 1, N
         M = 1 - J
         DO 10, I = J, MIN(N, J + K)
            A(M + I, J) = MATRIX(I, J)
   10    CONTINUE
   20 CONTINUE
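For readers who want to check the band layout on a small case, the two Fortran loops can be mirrored in pure Python. This is only a sketch: the function names are hypothetical, matrices are lists of rows, and the loops keep the 1-based i and j of the Fortran code and subtract 1 when indexing. Positions that the routine does not reference are left as 0.

```python
def to_upper_band(full, k):
    """Copy the upper band of an n-by-n matrix into (k+1)-by-n band storage."""
    n = len(full)
    band = [[0j] * n for _ in range(k + 1)]
    for j in range(1, n + 1):
        m = k + 1 - j
        for i in range(max(1, j - k), j + 1):
            band[m + i - 1][j - 1] = full[i - 1][j - 1]
    return band


def to_lower_band(full, k):
    """Copy the lower band of an n-by-n matrix into (k+1)-by-n band storage."""
    n = len(full)
    band = [[0j] * n for _ in range(k + 1)]
    for j in range(1, n + 1):
        m = 1 - j
        for i in range(j, min(n, j + k) + 1):
            band[m + i - 1][j - 1] = full[i - 1][j - 1]
    return band


# Hermitian tridiagonal example (k = 1).
full = [[1, 2 + 1j, 0],
        [2 - 1j, 3, 4j],
        [0, -4j, 5]]
upper = to_upper_band(full, 1)  # diagonal lands in the last row (row k+1)
lower = to_lower_band(full, 1)  # diagonal lands in the first row
```

In the upper layout the superdiagonal occupies row k starting at column 2, and in the lower layout the subdiagonal occupies row 2 starting at column 1, as the description of argument a states.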
SEE ALSO
SSBMV(3S)
NAME
CHEMV – Multiplies a complex vector by a complex Hermitian matrix
SYNOPSIS
CALL CHEMV (uplo, n, alpha, a, lda, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CHEMV performs the following matrix-vector operation:
y ← α Ax + β y
where α and β are scalars, x and y are n-element vectors, and A is an n-by-n Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array a is referenced, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Complex. (input)
Scalar factor α.
a Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of a
is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of a
is not referenced.
The imaginary parts of the diagonal elements need not be set and are assumed to be 0.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program.
Argument lda ≥ MAX(1,n).
x Complex array of dimension 1+(n−1)·|incx|. (input)
Contains vector x.
NOTES
CHEMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1)
SEE ALSO
SSYMV(3S)
NAME
CHER – Performs Hermitian rank 1 update of a complex Hermitian matrix
SYNOPSIS
CALL CHER (uplo, n, alpha, x, incx, a, lda)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CHER performs the following Hermitian rank 1 operation:
\[ A \leftarrow \alpha x x^H + A \]
where α is a real scalar, x is an n-element vector, x^H is the conjugate transpose of x, and A is an n-by-n
Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array a is referenced, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Complex array of dimension 1+(n−1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
a Complex array of dimension (lda,n). (input and output)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of a
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array a.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of a
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array a.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0; on
exit, they are set to 0.
lda Integer. (input)
On entry, lda specifies the first dimension of a as declared in the calling program. lda ≥
MAX(1,n).
NOTES
CHER is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0), this routine starts at the end of the vector and moves backward, as
follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
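The rank 1 update for the uplo = 'U' case can be sketched in pure Python as follows. This is an illustration of the operation, not the library routine: the name `her_upper` is hypothetical, x is a contiguous list (incx = 1), and a is a list of rows indexed from 0.

```python
def her_upper(n, alpha, x, a):
    """A <- alpha * x * x^H + A, touching only the upper triangle of a."""
    for j in range(n):
        for i in range(j + 1):
            a[i][j] += alpha * x[i] * x[j].conjugate()
        # The diagonal of a Hermitian matrix is real; force the imaginary part to 0.
        a[j][j] = complex(a[j][j].real, 0.0)


a = [[1 + 0j, 0j],
     [7 + 7j, 1 + 0j]]  # a[1][0] is a sentinel in the strictly lower triangle
her_upper(2, 2.0, [1 + 1j, 2j], a)
```

After the call the strictly lower entry a[1][0] is unchanged, which matches the statement that the strictly lower triangular part of a is not referenced when uplo = 'U'.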
SEE ALSO
SSYR(3S)
NAME
CHER2 – Performs Hermitian rank 2 update of a complex Hermitian matrix
SYNOPSIS
CALL CHER2 (uplo, n, alpha, x, incx, y, incy, a, lda)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CHER2 performs the following Hermitian rank 2 operation:
\[ A \leftarrow \alpha x y^H + \bar{\alpha} y x^H + A \]
where α is a scalar, \( \bar{\alpha} \) is the complex conjugate of α, x and y are n-element vectors, x^H and y^H are the conjugate
transposes of x and y, respectively, and A is an n-by-n Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array a is referenced, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Complex. (input)
Scalar factor α.
x Complex array of dimension 1+(n−1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
y Complex array of dimension 1+(n−1)·|incy|. (input)
Contains vector y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.
a Complex array of dimension (lda,n). (input and output)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of a
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array a.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of a
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array a.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0; on
exit, they are set to 0.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,n).
NOTES
CHER2 is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1)
SEE ALSO
SSYR2(3S)
NAME
CHPMV – Multiplies a complex vector by a packed complex Hermitian matrix
SYNOPSIS
CALL CHPMV (uplo, n, alpha, ap, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CHPMV performs the following matrix-vector operation:
y ← α Ax + β y
where α and β are complex scalars, x and y are n-element vectors, and A is an n-by-n packed complex
Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Complex. (input)
Scalar factor α.
ap Complex array of dimension n(n+1)/2. (input)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on.
x Complex array of dimension 1+(n−1)·|incx|. (input)
Contains vector x.
NOTES
CHPMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1)
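The packed layout described for argument ap implies a closed-form position for each matrix element. The following pure-Python sketch states that mapping; the function names are hypothetical and all indices are 1-based to match the text. The formulas are derived from the column-by-column packing order and are not quoted from the manual.

```python
def packed_index_upper(i, j):
    """Position in ap of A(i, j) with i <= j, upper triangle packed by columns."""
    # Columns 1..j-1 contribute 1 + 2 + ... + (j-1) = j*(j-1)/2 entries.
    return i + j * (j - 1) // 2


def packed_index_lower(i, j, n):
    """Position in ap of A(i, j) with i >= j, lower triangle packed by columns."""
    # Columns 1..j-1 contribute n + (n-1) + ... + (n-j+2) entries;
    # rewriting i - j + 1 plus that count gives the expression below.
    return i + (j - 1) * (2 * n - j) // 2


first_upper = [packed_index_upper(1, 1), packed_index_upper(1, 2), packed_index_upper(2, 2)]
first_lower = [packed_index_lower(1, 1, 3), packed_index_lower(2, 1, 3), packed_index_lower(3, 1, 3)]
```

Both lists evaluate to positions 1, 2, 3, which agrees with the examples given for ap(1), ap(2), and ap(3) in each case.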
SEE ALSO
SSPMV(3S)
NAME
CHPR – Performs Hermitian rank 1 update of a packed complex Hermitian matrix
SYNOPSIS
CALL CHPR (uplo, n, alpha, x, incx, ap)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.
DESCRIPTION
CHPR performs the following Hermitian rank 1 operation:
\[ A \leftarrow \alpha x x^H + A \]
where α is a real scalar, x is an n-element vector, x^H is the conjugate transpose of x, and A is an n-by-n
packed complex Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Complex array of dimension 1+(n−1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
ap Complex array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0; on
exit, they are set to 0.
NOTES
CHPR is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0), this routine starts at the end of the vector and moves backward, as
follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
SEE ALSO
SSPR(3S)
NAME
CHPR2 – Performs Hermitian rank 2 update of a packed complex Hermitian matrix
SYNOPSIS
CALL CHPR2 (uplo, n, alpha, x, incx, y, incy, ap)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.
DESCRIPTION
CHPR2 performs the following Hermitian rank 2 operation:
\[ A \leftarrow \alpha x y^H + \bar{\alpha} y x^H + A \]
where α is a scalar, \( \bar{\alpha} \) is the complex conjugate of α, x and y are n-element vectors, x^H and y^H are the conjugate
transposes of x and y, respectively, and A is an n-by-n packed complex Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Complex. (input)
Scalar factor α.
x Complex array of dimension 1+(n−1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Increment for the elements of x. incx must not be 0.
y Complex array of dimension 1+(n−1)·|incy|. (input)
Contains vector y.
incy Integer. (input)
Increment for the elements of y. Argument incy must not be 0.
ap Complex array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0; on
exit, they are set to 0.
NOTES
CHPR2 is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1)
SEE ALSO
SSPR2(3S)
NAME
SGBMV, CGBMV – Multiplies a real or complex vector by a real or complex general band matrix
SYNOPSIS
CALL SGBMV (trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
CALL CGBMV (trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.
DESCRIPTION
SGBMV multiplies a real vector by a real general band matrix.
CGBMV multiplies a complex vector by a complex general band matrix.
SGBMV and CGBMV perform one of the following matrix-vector operations:
\[ y \leftarrow \alpha A x + \beta y \]
\[ y \leftarrow \alpha A^T x + \beta y \]
\[ y \leftarrow \alpha A^H x + \beta y \]
where
• α and β are scalars
• x and y are vectors
• A is an m-by-n band matrix with kl subdiagonals and ku superdiagonals
• A^T is the transpose of A
• A^H is the conjugate transpose of A
These routines have the following arguments:
trans Character*1. (input)
Specifies the operation to be performed:
trans = ’N’ or ’n’: y ← α A x + β y
trans = ’T’ or ’t’: y ← α A^T x + β y
trans = ’C’ or ’c’: y ← α A^T x + β y (SGBMV), or y ← α A^H x + β y (CGBMV)
m Integer. (input)
Specifies the number of rows in matrix A. m ≥ 0.
n Integer. (input)
Specifies the number of columns in the matrix A. n ≥ 0.
kl Integer. (input)
Specifies the number of subdiagonals of matrix A. kl ≥ 0.
ku Integer. (input)
Specifies the number of superdiagonals of matrix A. ku ≥ 0.
alpha SGBMV: Real. (input)
CGBMV: Complex. (input)
Scalar factor α.
a SGBMV: Real array of dimension (lda,n). (input)
CGBMV: Complex array of dimension (lda,n). (input)
Before entry, the leading (kl+ku+1)-by-n part of array a must contain the matrix of coefficients,
supplied column-by-column, with the leading diagonal of the matrix in row (ku+1) of the array,
the first superdiagonal starting at position 2 in row ku, the first subdiagonal starting at position 1
in row (ku+2), and so on. Elements in array a that do not correspond to elements in the band
matrix (such as the top left ku-by-ku triangle) are not referenced.
See the NOTES section for an example of Fortran code that transfers a band matrix from
conventional full matrix storage to band storage.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ (kl+ku+1).
x SGBMV: Real array of dimension 1+(kx-1)·|incx|. (input)
CGBMV: Complex array of dimension 1+(kx-1)·|incx|. (input)
Contains the vector x. When trans = ’N’ or ’n’, kx is n; otherwise, it is m.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta SGBMV: Real. (input)
CGBMV: Complex. (input)
Scalar factor β. When beta is supplied as 0, y need not be set on input.
y SGBMV: Real array of dimension 1+(ky-1)·|incy|. (input and output)
CGBMV: Complex array of dimension 1+(ky-1)·|incy|. (input and output)
Contains the vector y. When trans = ’N’ or ’n’, ky is m; otherwise, it is n. On exit, the updated
vector overwrites array y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.
NOTES
The following program segment transfers a band matrix from conventional full matrix storage to band
storage:
      DO 20, J = 1, N
         K = KU + 1 - J
         DO 10, I = MAX(1, J - KU), MIN(M, J + KL)
            A(K + I, J) = MATRIX(I, J)
   10    CONTINUE
   20 CONTINUE
SGBMV and CGBMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1 - incx·(kx - 1)), x(1 - incx·(kx - 2)), ..., x(1)
y(1 - incy·(ky - 1)), y(1 - incy·(ky - 2)), ..., y(1)
where kx and ky are the numbers of elements in x and y, as defined in the argument descriptions.
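As a hedged illustration of the band-storage loop in this NOTES section, the following free-form Fortran sketch packs a 4-by-4 tridiagonal matrix (kl = ku = 1) into band storage and multiplies it by a vector of ones. The program name, the array name full, and the test data are assumptions made for this example. It also assumes that a BLAS library providing SGBMV is linked and that default REAL matches the precision of that library's S-prefixed routines.

```fortran
program gbmv_sketch
  implicit none
  integer, parameter :: m = 4, n = 4, kl = 1, ku = 1
  integer, parameter :: lda = kl + ku + 1
  real :: full(m, n), a(lda, n), x(n), y(m)
  integer :: i, j, k
  external sgbmv

  ! Hypothetical test matrix: 2 on the diagonal, -1 on the sub/superdiagonals.
  full = 0.0
  do j = 1, n
     full(j, j) = 2.0
     if (j > 1) full(j - 1, j) = -1.0
     if (j < n) full(j + 1, j) = -1.0
  end do

  ! Transfer from full storage to band storage, as in the NOTES loop.
  a = 0.0
  do j = 1, n
     k = ku + 1 - j
     do i = max(1, j - ku), min(m, j + kl)
        a(k + i, j) = full(i, j)
     end do
  end do

  x = 1.0
  y = 0.0   ! beta = 0, so y need not be set; initialized here for clarity
  call sgbmv('N', m, n, kl, ku, 1.0, a, lda, x, 1, 0.0, y, 1)

  ! For this data the product A x is (1, 0, 0, 1).
  print *, y
end program gbmv_sketch
```

Only lda = kl+ku+1 = 3 rows are stored per column, instead of the m = 4 rows that full storage would need.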
NAME
SGEMV, CGEMV – Multiplies a real or complex vector by a real or complex general matrix
SYNOPSIS
CALL SGEMV (trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
CALL CGEMV (trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.
DESCRIPTION
SGEMV multiplies a real vector by a real general matrix.
CGEMV multiplies a complex vector by a complex general matrix.
SGEMV and CGEMV perform one of the following matrix-vector operations:
y ← αAx + βy
y ← αA^T x + βy
y ← αA^H x + βy
where
• α and β are scalars
• x and y are vectors
• A is an m-by-n general matrix
• A^T is the transpose of A
• A^H is the conjugate transpose of A
These routines have the following arguments:
trans Character*1. (input)
Specifies the operation to be performed:
trans = ’N’ or ’n’: y ← αAx + βy
trans = ’T’ or ’t’: y ← αA^T x + βy
trans = ’C’ or ’c’: y ← αA^T x + βy (SGEMV), or y ← αA^H x + βy (CGEMV)
m Integer. (input)
Specifies the number of rows in matrix A. m ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix A. n ≥ 0.
NOTES
SGEMV and CGEMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
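x(1 - incx·(kx - 1)), x(1 - incx·(kx - 2)), ..., x(1)
y(1 - incy·(ky - 1)), y(1 - incy·(ky - 2)), ..., y(1)
where kx and ky are the numbers of elements in x and y: n and m when trans = ’N’ or ’n’; otherwise, m and n.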
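The effect of trans and of a negative increment can be seen in a small sketch. The program below is an illustration only: the program name, the array z, and the data are assumptions, and it presumes that a BLAS library providing SGEMV is linked and that default REAL matches the library's S-prefixed routines.

```fortran
program gemv_sketch
  implicit none
  integer, parameter :: m = 2, n = 3, lda = m
  real :: a(lda, n), x(n), y(m), z(n)
  external sgemv

  ! Hypothetical data: A = [1 2 3; 4 5 6], stored column by column.
  a = reshape((/ 1.0, 4.0, 2.0, 5.0, 3.0, 6.0 /), (/ m, n /))
  x = (/ 1.0, 0.0, -1.0 /)

  ! trans = 'N': y <- A x, so x has n elements and y has m elements.
  call sgemv('N', m, n, 1.0, a, lda, x, 1, 0.0, y, 1)
  print *, 'A x          =', y      ! (-2, -2) for this data

  ! trans = 'T': z <- A^T y, so the input has m elements and the output n.
  call sgemv('T', m, n, 1.0, a, lda, y, 1, 0.0, z, 1)
  print *, 'A^T (A x)    =', z      ! (-10, -14, -18)

  ! incx = -1: x is read from its last element back to its first,
  ! so the routine computes A times (-1, 0, 1).
  call sgemv('N', m, n, 1.0, a, lda, x, -1, 0.0, y, 1)
  print *, 'A x reversed =', y      ! (2, 2)
end program gemv_sketch
```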
NAME
SGER, CGERC, CGERU – Performs rank 1 update of a real or complex general matrix
SYNOPSIS
CALL SGER (m, n, alpha, x, incx, y, incy, a, lda)
CALL CGERC (m, n, alpha, x, incx, y, incy, a, lda)
CALL CGERU (m, n, alpha, x, incx, y, incy, a, lda)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SGER performs a rank 1 update of a real general matrix.
CGERC performs a conjugated rank 1 update of a complex general matrix.
CGERU performs an unconjugated rank 1 update of a complex general matrix.
SGER and CGERU perform the rank 1 operation:
A ← αxy^T + A
where y^T is the transpose of y, α is a scalar, x is an m-element vector, y is an n-element vector, and A is an
m-by-n matrix.
CGERC performs the rank 1 operation:
A ← αxy^H + A
where y^H is the conjugate transpose of y, α is a scalar, x is an m-element vector, y is an n-element vector,
and A is an m-by-n matrix.
These routines have the following arguments:
m Integer. (input)
Specifies the number of rows in matrix A. m ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix A. n ≥ 0.
alpha SGER: Real. (input)
CGERC, CGERU: Complex. (input)
Scalar factor α.
NOTES
SGER, CGERC, and CGERU are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), these routines start at the end of the vector and move
backward, as follows:
x(1 - incx·(m - 1)), x(1 - incx·(m - 2)), ..., x(1)
y(1 - incy·(n - 1)), y(1 - incy·(n - 2)), ..., y(1)
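A minimal sketch of the real rank 1 update follows. It starts from a zero matrix so that the result is simply the outer product of x and y. The program name and data are assumptions for illustration, and a BLAS library providing SGER is assumed to be linked.

```fortran
program ger_sketch
  implicit none
  integer, parameter :: m = 2, n = 3, lda = m
  real :: a(lda, n), x(m), y(n)
  integer :: i
  external sger

  a = 0.0
  x = (/ 1.0, 2.0 /)
  y = (/ 1.0, 10.0, 100.0 /)

  ! A <- alpha * x * y^T + A with alpha = 1: A(i,j) becomes x(i) * y(j).
  call sger(m, n, 1.0, x, 1, y, 1, a, lda)

  do i = 1, m
     print *, a(i, :)     ! rows (1, 10, 100) and (2, 20, 200)
  end do
end program ger_sketch
```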
NAME
SGESUM, CGESUM – Adds a scalar multiple of a real or complex matrix to a scalar multiple of another real
or complex matrix
SYNOPSIS
CALL SGESUM (trans, m, n, alpha, a, lda, beta, b, ldb)
CALL CGESUM (trans, m, n, alpha, a, lda, beta, b, ldb)
IMPLEMENTATION
UNICOS/mk systems
These subroutines execute on a single processor and use private data only.
DESCRIPTION
SGESUM adds two real matrices with optional scaling; CGESUM adds two complex matrices.
B ← α op(A) + βB
where
• op(A) represents A, its transpose A^T, or its conjugate transpose A^H
• op(A) and B are m-by-n matrices
• α and β are scalars. β = 0 is a special case, used to copy α·op(A) to B. α = 0 is a special case, used to
scale B.
These routines have the following arguments:
trans Character*1. (input)
Specifies whether the matrix A is transposed.
trans = ’N’ or ’n’: op(A) = A
trans = ’T’ or ’t’: op(A) = A^T
trans = ’C’ or ’c’: op(A) = A^T (SGESUM), or op(A) = A^H (CGESUM)
m Integer. (input)
Specifies the number of rows in matrix op(A) and in matrix B.
n Integer. (input)
Specifies the number of columns in matrix op(A) and in matrix B.
alpha SGESUM: Real. (input)
CGESUM: Complex. (input)
Scalar factor α.
a SGESUM: Real array of dimension (lda,k). (input)
CGESUM: Complex array of dimension (lda,k). (input)
When trans = ’N’ or ’n’, k is n; otherwise, it is m. When trans = ’N’ or ’n’, the leading m-by-n
part of the array a contains matrix A. When trans = ’T’ or ’t’ or trans = ’C’ or ’c’, the leading
n-by-m part of the array a contains matrix A, whose transpose or conjugate transpose will be
used in the matrix sum. If alpha = 0, a need not be specified on entry.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. When trans = ’N’ or ’n’,
lda ≥ MAX(1,m); otherwise, lda ≥ MAX(1,n).
beta SGESUM: Real. (input)
CGESUM: Complex. (input)
Scalar factor β.
b SGESUM: Real array of dimension (ldb,n). (input/output)
CGESUM: Complex array of dimension (ldb,n). (input/output)
On entry, if beta ≠ 0, the m-by-n matrix b contains B. (If beta = 0, b need not be specified on
entry.) On exit, b is overwritten with the matrix sum α op(A) + βB.
ldb Integer. (input)
The leading dimension of array b. ldb ≥ MAX(1,m).
EXAMPLES
An important use of SGESUM is to copy an array to another array, in which the second array may be a
temporary workspace that has a better data layout than the first array. For example, suppose array A was
declared as follows in the main program:
      REAL A(1024, 1024)
This data layout is particularly bad for Level 3 BLAS operations that must fit a block of A in the
direct-mapped cache of UNICOS/mk systems, because every column has exactly the same cache offset as every
other column. It might be worthwhile to operate on a block of A at a time by copying a part of A into a
second array B. Give B a leading dimension that is an odd multiple of 16, so that a 16-by-64 subblock of B
will fit in the cache:
      REAL B(80, 64)
CDIR$ CACHE_ALIGN B
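The block copy described in this example can be modeled with explicit loops. The sketch below is an assumption-laden illustration, not the library's implementation: the corner indices i0 and j0, the program name, and the random test data are hypothetical, and the SGESUM call shown in the comment is only what the argument descriptions suggest the equivalent call would be. The loops themselves need no library.

```fortran
program gesum_copy_sketch
  implicit none
  integer, parameter :: mb = 16, nb = 64     ! block size from the text
  real :: a(1024, 1024), b(80, 64)
  integer :: i, j, i0, j0

  call random_number(a)
  i0 = 33       ! hypothetical top-left corner of the block in A
  j0 = 129

  ! Reference loops for B <- 1.0 * A(block) + 0.0 * B, that is, a plain copy.
  ! On a system that provides SGESUM, the equivalent call would presumably be:
  !   call sgesum('N', mb, nb, 1.0, a(i0, j0), 1024, 0.0, b, 80)
  do j = 1, nb
     do i = 1, mb
        b(i, j) = a(i0 + i - 1, j0 + j - 1)
     end do
  end do

  print *, 'copied block matches: ', &
       all(b(1:mb, 1:nb) == a(i0:i0+mb-1, j0:j0+nb-1))
end program gesum_copy_sketch
```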
SEE ALSO
SAXPBY(3S)
NAME
SSBMV – Multiplies a real vector by a real symmetric band matrix
SYNOPSIS
CALL SSBMV (uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SSBMV performs the following matrix-vector operation:
y ← α Ax + β y
where α and β are scalars, x and y are n-element vectors, and A is an n-by-n symmetric band matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of band matrix A is supplied, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
k Integer. (input)
Specifies the number of superdiagonals of matrix A. k ≥ 0.
alpha Real. (input)
Scalar factor α.
a Real array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading (k+1)-by-n part of array a must contain the
upper triangular band part of the symmetric matrix, supplied column-by-column, with the leading
diagonal of the matrix in row (k+1) of the array, the first superdiagonal starting at position 2 in
row k, and so on. The top left k-by-k triangle of array a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading (k+1)-by-n part of array a must contain the lower
triangular band part of the symmetric matrix, supplied column-by-column, with the leading
diagonal of the matrix in row 1 of the array, the first subdiagonal starting at position 1 in row 2,
and so on. The bottom right k-by-k triangle of array a is not referenced.
See the NOTES section for examples of Fortran code that transfer upper and lower parts of
symmetric band matrices from conventional full matrix storage to band storage.
NOTES
The following program segment transfers the upper triangular part of a symmetric band matrix from
conventional full matrix storage to band storage:
      DO 20, J = 1, N
         M = K + 1 - J
         DO 10, I = MAX(1, J - K), J
            A(M + I, J) = MATRIX(I, J)
   10    CONTINUE
   20 CONTINUE
The following program segment transfers the lower triangular part of a symmetric band matrix from
conventional full matrix storage to band storage:
      DO 20, J = 1, N
         M = 1 - J
         DO 10, I = J, MIN(N, J + K)
            A(M + I, J) = MATRIX(I, J)
   10    CONTINUE
   20 CONTINUE
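The upper-triangle transfer loop can be exercised end to end with a small symmetric tridiagonal matrix. This is a sketch under stated assumptions: the program name, the array name full, and the data are hypothetical, and a BLAS library providing SSBMV is assumed to be linked.

```fortran
program sbmv_sketch
  implicit none
  integer, parameter :: n = 4, k = 1, lda = k + 1
  real :: full(n, n), a(lda, n), x(n), y(n)
  integer :: i, j, m
  external ssbmv

  ! Hypothetical symmetric tridiagonal matrix: 2 on the diagonal, -1 beside it.
  full = 0.0
  do j = 1, n
     full(j, j) = 2.0
     if (j > 1) then
        full(j - 1, j) = -1.0
        full(j, j - 1) = -1.0
     end if
  end do

  ! Upper triangular band storage, as in the first NOTES loop.
  a = 0.0
  do j = 1, n
     m = k + 1 - j
     do i = max(1, j - k), j
        a(m + i, j) = full(i, j)
     end do
  end do

  x = (/ 1.0, 2.0, 3.0, 4.0 /)
  call ssbmv('U', n, k, 1.0, a, lda, x, 1, 0.0, y, 1)
  print *, y      ! (0, 0, 0, 5) for this data
end program sbmv_sketch
```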
NAME
SSPMV, CSPMV – Multiplies a real or complex symmetric packed matrix by a real or complex vector
SYNOPSIS
CALL SSPMV (uplo, n, alpha, ap, x, incx, beta, y, incy)
CALL CSPMV (uplo, n, alpha, ap, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
Only the real routine executes on UNICOS/mk systems (on a single processor, using only private data).
DESCRIPTION
SSPMV and CSPMV perform the following matrix-vector operation:
y ← α Ax + β y
where α and β are scalars, x and y are n-element vectors, and A is an n-by-n symmetric packed matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha SSPMV: Real. (input)
CSPMV: Complex. (input)
Scalar factor α.
ap SSPMV: Real array of dimension n(n+1)/2. (input)
CSPMV: Complex array of dimension n(n+1)/2. (input)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on.
NOTES
SSPMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS). CSPMV is an extension to Level 2
BLAS.
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1 - incx·(n - 1)), x(1 - incx·(n - 2)), ..., x(1)
y(1 - incy·(n - 1)), y(1 - incy·(n - 2)), ..., y(1)
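The packed layouts described for ap correspond to simple index formulas: with column-by-column packing, A(i,j) of the upper triangle (i ≤ j) lands in ap(i + j(j-1)/2), and A(i,j) of the lower triangle (i ≥ j) lands in ap(i + (j-1)(2n-j)/2). The following sketch packs both triangles of the same matrix and checks that SSPMV gives the same product either way. The program and array names and the data are assumptions, and a BLAS library providing SSPMV is assumed to be linked.

```fortran
program spmv_sketch
  implicit none
  integer, parameter :: n = 3
  real :: full(n, n), apu(n*(n+1)/2), apl(n*(n+1)/2), x(n), yu(n), yl(n)
  integer :: i, j
  external sspmv

  ! Hypothetical symmetric matrix: A(i,j) = i + j.
  do j = 1, n
     do i = 1, n
        full(i, j) = real(i + j)
     end do
  end do

  ! Column-by-column packing of each triangle.
  do j = 1, n
     do i = 1, j                                   ! upper: i <= j
        apu(i + j*(j - 1)/2) = full(i, j)
     end do
     do i = j, n                                   ! lower: i >= j
        apl(i + (j - 1)*(2*n - j)/2) = full(i, j)
     end do
  end do

  x = (/ 1.0, 1.0, 1.0 /)
  call sspmv('U', n, 1.0, apu, x, 1, 0.0, yu, 1)
  call sspmv('L', n, 1.0, apl, x, 1, 0.0, yl, 1)

  ! Both calls describe the same matrix, so both give (9, 12, 15).
  print *, yu
  print *, yl
end program spmv_sketch
```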
SEE ALSO
CHPMV(3S)
NAME
SSPR, CSPR – Performs symmetric rank 1 update of a real or complex symmetric packed matrix
SYNOPSIS
CALL SSPR (uplo, n, alpha, x, incx, ap)
CALL CSPR (uplo, n, alpha, x, incx, ap)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SSPR and CSPR each perform the following symmetric rank 1 operation:
A ← αxx^T + A
where x^T is the transpose of x, α is a real or complex scalar, x is an n-element vector, and A is an n-by-n
symmetric packed matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha SSPR: Real. (input)
CSPR: Complex. (input)
Scalar factor α.
x SSPR: Real array of dimension 1+(n-1)·|incx|. (input)
CSPR: Complex array of dimension 1+(n-1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
ap SSPR: Real array of dimension n(n+1)/2. (input and output)
CSPR: Complex array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.
NOTES
SSPR is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS). CSPR is an extension to Level 2
BLAS.
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1 - incx·(n - 1)), x(1 - incx·(n - 2)), ..., x(1)
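A short sketch of the packed rank 1 update, starting from a zero matrix so that the packed result can be read off directly. The program name and data are assumptions for illustration, and a BLAS library providing SSPR is assumed to be linked.

```fortran
program spr_sketch
  implicit none
  integer, parameter :: n = 3
  real :: ap(n*(n+1)/2), x(n)
  external sspr

  ap = 0.0
  x = (/ 1.0, 2.0, 3.0 /)

  ! A <- alpha * x * x^T + A with alpha = 1, upper triangle packed by columns.
  call sspr('U', n, 1.0, x, 1, ap)

  ! Packed order is A(1,1), A(1,2), A(2,2), A(1,3), A(2,3), A(3,3),
  ! so for this data ap holds 1, 2, 4, 3, 6, 9.
  print *, ap
end program spr_sketch
```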
SEE ALSO
CHPR(3S)
NAME
SSPR2 – Performs symmetric rank 2 update of a real symmetric packed matrix
SYNOPSIS
CALL SSPR2 (uplo, n, alpha, x, incx, y, incy, ap)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SSPR2 performs the following symmetric rank 2 operation:
A ← αxy^T + αyx^T + A
where x^T is the transpose of x, y^T is the transpose of y, α is a real scalar, x and y are n-element vectors,
and A is an n-by-n symmetric packed matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Real array of dimension 1+(n-1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Increment for the elements of x. incx must not be 0.
y Real array of dimension 1+(n-1)·|incy|. (input)
Contains vector y.
incy Integer. (input)
Increment for the elements of y. incy must not be 0.
ap Real array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.
NOTES
SSPR2 is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1 - incx·(n - 1)), x(1 - incx·(n - 2)), ..., x(1)
y(1 - incy·(n - 1)), y(1 - incy·(n - 2)), ..., y(1)
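The rank 2 update can be checked by hand on a 2-by-2 case, where each updated element is x(i)·y(j) + y(i)·x(j). The sketch below uses lower packed storage. The program name and data are assumptions, and a BLAS library providing SSPR2 is assumed to be linked.

```fortran
program spr2_sketch
  implicit none
  integer, parameter :: n = 2
  real :: ap(n*(n+1)/2), x(n), y(n)
  external sspr2

  ap = 0.0
  x = (/ 1.0, 2.0 /)
  y = (/ 3.0, 4.0 /)

  ! A <- alpha*x*y^T + alpha*y*x^T + A with alpha = 1, lower triangle packed.
  call sspr2('L', n, 1.0, x, 1, y, 1, ap)

  ! Packed lower order is A(1,1), A(2,1), A(2,2),
  ! which gives 6, 10, 16 for this data.
  print *, ap
end program spr2_sketch
```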
SEE ALSO
CHPR2(3S)
NAME
SSPR12 – Performs two simultaneous symmetric rank 1 updates of a real symmetric packed matrix
SYNOPSIS
CALL SSPR12 (uplo, n, alpha, x, incx, beta, y, incy, ap)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SSPR12 performs the following two simultaneous symmetric rank 1 updates:
A ← αxx^T + βyy^T + A
where x^T is the transpose of x, y^T is the transpose of y, α and β are real scalars, x and y are n-element
vectors, and A is an n-by-n real symmetric packed matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Real array of dimension 1+(n-1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Increment for the elements of x. incx must not be 0.
beta Real. (input)
Scalar factor β.
y Real array of dimension 1+(n-1)·|incy|. (input)
Contains vector y.
incy Integer. (input)
Increment for the elements of y. incy must not be 0.
ap Real array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.
NOTES
SSPR12 is an extension to the standard Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS). It is
similar in function to the Level 2 BLAS routine SSPR2(3S) and is equivalent to two rank 1 updates.
For example,
      CALL SSPR12(UPLO, N, ALPHA, X, INCX, BETA, Y, INCY, AP)
is equivalent to:
      CALL SSPR(UPLO, N, ALPHA, X, INCX, AP)
      CALL SSPR(UPLO, N, BETA, Y, INCY, AP)
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1 - incx·(n - 1)), x(1 - incx·(n - 2)), ..., x(1)
y(1 - incy·(n - 1)), y(1 - incy·(n - 2)), ..., y(1)
NAME
SSYMV, CSYMV – Multiplies a real or complex vector by a real or complex symmetric matrix
SYNOPSIS
CALL SSYMV (uplo, n, alpha, a, lda, x, incx, beta, y, incy)
CALL CSYMV (uplo, n, alpha, a, lda, x, incx, beta, y, incy)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SSYMV multiplies a real vector by a real symmetric matrix.
CSYMV multiplies a complex vector by a complex symmetric matrix.
SSYMV and CSYMV perform the following matrix-vector operation:
y ← α Ax + β y
where α and β are scalars, x and y are n-element vectors, and A is an n-by-n symmetric matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is being supplied, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of A is being supplied.
uplo= ’L’ or ’l’: only the lower triangular part of A is being supplied.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha SSYMV: Real. (input)
CSYMV: Complex. (input)
Scalar factor α.
a SSYMV: Real array of dimension (lda,n). (input)
CSYMV: Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the symmetric matrix. The strictly lower triangular part of a
is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the symmetric matrix. The strictly upper triangular part of a
is not referenced.
NOTES
SSYMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS). CSYMV is an extension to Level 2
BLAS.
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1 - incx·(n - 1)), x(1 - incx·(n - 2)), ..., x(1)
y(1 - incy·(n - 1)), y(1 - incy·(n - 2)), ..., y(1)
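The statement that the opposite triangle is not referenced can be demonstrated by filling it with a sentinel value. This sketch is illustrative only: the program name, the sentinel, and the data are assumptions, and a BLAS library providing SSYMV is assumed to be linked.

```fortran
program symv_sketch
  implicit none
  integer, parameter :: n = 3, lda = n
  real :: a(lda, n), x(n), y(n)
  integer :: i, j
  external ssymv

  ! Fill only the upper triangle with A(i,j) = i + j. The strictly lower
  ! triangle gets a sentinel value to show that it is not referenced.
  a = -999.0
  do j = 1, n
     do i = 1, j
        a(i, j) = real(i + j)
     end do
  end do

  x = 1.0
  call ssymv('U', n, 1.0, a, lda, x, 1, 0.0, y, 1)
  print *, y      ! (9, 12, 15): the sentinel values do not contribute
end program symv_sketch
```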
SEE ALSO
CHEMV(3S)
NAME
SSYR, CSYR – Performs symmetric rank 1 update of a real or complex symmetric matrix
SYNOPSIS
CALL SSYR (uplo, n, alpha, x, incx, a, lda)
CALL CSYR (uplo, n, alpha, x, incx, a, lda)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
Only the real routine executes on UNICOS/mk systems, on a single processor, using only private data.
DESCRIPTION
SSYR and CSYR perform the following symmetric rank 1 operation:
A ← αxx^T + A
where x^T is the transpose of x, α is a real or complex scalar, x is an n-element vector, and A is an n-by-n
symmetric matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array a is referenced, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha SSYR: Real. (input)
CSYR: Complex. (input)
Scalar factor α.
x SSYR: Real array of dimension 1+(n-1)·|incx|. (input)
CSYR: Complex array of dimension 1+(n-1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
a SSYR: Real array of dimension (lda,n). (input and output)
CSYR: Complex array of dimension (lda,n). (input and output)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the symmetric matrix. The strictly lower triangular part of a
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array a.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the symmetric matrix. The strictly upper triangular part of a
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array a.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,n).
NOTES
SSYR is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS). CSYR is an extension to Level 2
BLAS.
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1 - incx·(n - 1)), x(1 - incx·(n - 2)), ..., x(1)
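A minimal sketch of the full-storage rank 1 update with uplo = 'L' follows. It shows that only the lower triangle of the array is written. The program name and data are assumptions, and a BLAS library providing SSYR is assumed to be linked.

```fortran
program syr_sketch
  implicit none
  integer, parameter :: n = 3, lda = n
  real :: a(lda, n), x(n)
  integer :: i
  external ssyr

  a = 0.0
  x = (/ 1.0, 2.0, 3.0 /)

  ! A <- alpha * x * x^T + A with alpha = 2; only the lower triangle is updated.
  call ssyr('L', n, 2.0, x, 1, a, lda)

  ! Lower triangle becomes 2*x(i)*x(j); the strictly upper part stays 0.
  do i = 1, n
     print *, a(i, :)     ! rows (2, 0, 0), (4, 8, 0), (6, 12, 18)
  end do
end program syr_sketch
```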
SEE ALSO
CHER(3S)
NAME
SSYR2 – Performs symmetric rank 2 update of a real symmetric matrix
SYNOPSIS
CALL SSYR2 (uplo, n, alpha, x, incx, y, incy, a, lda)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SSYR2 performs the following symmetric rank 2 operation:
A ← αxy^T + αyx^T + A
where α is a real scalar, y^T is the transpose of y, x^T is the transpose of x, x and y are n-element vectors, and
A is an n-by-n real symmetric matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is being supplied, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of array a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of array a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Real array of dimension 1+(n-1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
On entry, incx specifies the increment for the elements of x. incx must not be 0.
y Real array of dimension 1+(n-1)·|incy|. (input)
Contains vector y.
incy Integer. (input)
On entry, incy specifies the increment for the elements of y. incy must not be 0.
a Real array of dimension (lda,n). (input and output)
Before entry with uplo=’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of
a is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array a.
Before entry with uplo=’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of
a is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array a.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,n).
NOTES
SSYR2 is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1 - incx·(n - 1)), x(1 - incx·(n - 2)), ..., x(1)
y(1 - incy·(n - 1)), y(1 - incy·(n - 2)), ..., y(1)
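The following sketch applies the rank 2 update to a 2-by-2 zero matrix with uplo = 'U', so that each updated element equals x(i)·y(j) + y(i)·x(j). The program name and data are assumptions, and a BLAS library providing SSYR2 is assumed to be linked.

```fortran
program syr2_sketch
  implicit none
  integer, parameter :: n = 2, lda = n
  real :: a(lda, n), x(n), y(n)
  integer :: i
  external ssyr2

  a = 0.0
  x = (/ 1.0, 2.0 /)
  y = (/ 3.0, 4.0 /)

  ! A <- alpha*x*y^T + alpha*y*x^T + A with alpha = 1; upper triangle only.
  call ssyr2('U', n, 1.0, x, 1, y, 1, a, lda)

  ! Upper triangle: A(1,1) = 6, A(1,2) = 10, A(2,2) = 16. A(2,1) stays 0.
  do i = 1, n
     print *, a(i, :)
  end do
end program syr2_sketch
```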
SEE ALSO
CHER2(3S)
NAME
STBMV, CTBMV – Multiplies a real or complex vector by a real or complex triangular band matrix
SYNOPSIS
CALL STBMV (uplo, trans, diag, n, k, a, lda, x, incx)
CALL CTBMV (uplo, trans, diag, n, k, a, lda, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
STBMV multiplies a real vector by a real triangular band matrix.
CTBMV multiplies a complex vector by a complex triangular band matrix.
STBMV and CTBMV perform one of the following matrix-vector operations:
x ← Ax
x ← A^T x
x ← A^H x (CTBMV only)
where A^T is the transpose of A, A^H is the conjugate transpose of A, x is an n-element vector, and A may be
either a unit or nonunit n-by-n upper or lower triangular band matrix with (k+1) diagonals.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is upper or lower triangular, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← Ax
trans = ’T’ or ’t’: x ← A^T x
trans = ’C’ or ’c’: x ← A^T x (STBMV), or x ← A^H x (CTBMV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
k Integer. (input)
uplo = ’U’ or ’u’: k specifies the number of superdiagonals of matrix A.
uplo = ’L’ or ’l’: k specifies the number of subdiagonals of matrix A.
k ≥ 0.
a STBMV: Real array of dimension (lda,n). (input)
CTBMV: Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading (k+1)-by-n upper part of array a must contain
the upper triangular band part of the matrix of coefficients, supplied column-by-column, with the
leading diagonal of the matrix in row (k+1) of the array, the first superdiagonal starting at
position 2 in row k, and so on. The top left k-by-k triangle of array a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading (k+1)-by-n part of array a must contain the lower
triangular band part of the matrix of coefficients, supplied column-by-column, with the leading
diagonal of the matrix in row 1 of the array, the first subdiagonal starting at position 1 in row 2,
and so on. The bottom right k-by-k triangle of array a is not referenced.
See the NOTES section for examples of Fortran code that transfer upper and lower triangular
band matrices from conventional full matrix storage to band storage.
When diag = ’U’ or ’u’, these routines assume that all elements of the array a that represent
diagonal elements of the matrix A are 1. In this case, neither of these routines will reference any
of the diagonal elements.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ (k+1).
x STBMV: Real array of dimension 1+(n-1)·|incx|. (input and output)
CTBMV: Complex array of dimension 1+(n-1)·|incx|. (input and output)
Contains the vector x. On exit, the transformed vector overwrites array x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
NOTES
The following program segment transfers an upper triangular band matrix from conventional full matrix
storage to band storage:
      DO 20, J = 1, N
         M = K + 1 - J
         DO 10, I = MAX(1, J - K), J
            A(M + I, J) = MATRIX(I, J)
   10    CONTINUE
   20 CONTINUE
The following program segment transfers a lower triangular band matrix from conventional full matrix
storage to band storage:
      DO 20, J = 1, N
         M = 1 - J
         DO 10, I = J, MIN(N, J + K)
            A(M + I, J) = MATRIX(I, J)
   10    CONTINUE
   20 CONTINUE
STBMV and CTBMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1 - incx·(n - 1)), x(1 - incx·(n - 2)), ..., x(1)
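As a hedged illustration of the lower-triangle transfer loop, the sketch below packs a 3-by-3 lower bidiagonal matrix (k = 1) and multiplies it into x in place. The program name, the array name full, and the data are assumptions, and a BLAS library providing STBMV is assumed to be linked.

```fortran
program tbmv_sketch
  implicit none
  integer, parameter :: n = 3, k = 1, lda = k + 1
  real :: full(n, n), a(lda, n), x(n)
  integer :: i, j, m
  external stbmv

  ! Hypothetical lower bidiagonal matrix: 1 on the diagonal, 1 below it.
  full = 0.0
  do j = 1, n
     full(j, j) = 1.0
     if (j < n) full(j + 1, j) = 1.0
  end do

  ! Lower triangular band storage, as in the second NOTES loop.
  a = 0.0
  do j = 1, n
     m = 1 - j
     do i = j, min(n, j + k)
        a(m + i, j) = full(i, j)
     end do
  end do

  x = (/ 1.0, 2.0, 3.0 /)
  call stbmv('L', 'N', 'N', n, k, a, lda, x, 1)
  print *, x      ! (1, 3, 5): each element plus its predecessor
end program tbmv_sketch
```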
NAME
STBSV, CTBSV – Solves a real or complex triangular banded system of equations
SYNOPSIS
CALL STBSV (uplo, trans, diag, n, k, a, lda, x, incx)
CALL CTBSV (uplo, trans, diag, n, k, a, lda, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
STBSV solves a real triangular banded system of equations.
CTBSV solves a complex triangular banded system of equations.
STBSV and CTBSV solve one of the following systems of equations, using the operation associated with
each:
Equations     Operation
Ax = b        x ← A^(-1) x
A^T x = b     x ← A^(-T) x
A^H x = b     x ← A^(-H) x (CTBSV only)
where
• b and x are n-element vectors
• A is either a unit or nonunit n-by-n upper or lower triangular band matrix with (k+1) diagonals
• A^(-1) is the inverse of A
• A^T is the transpose of A
• A^(-T) is the inverse of A^T
• A^H is the conjugate transpose of A
• A^(-H) is the inverse of A^H
On input, the right-hand side vector b is stored in the array argument x. On output, the solution vector x
overwrites b in the same array argument x.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is an upper or lower triangular matrix, as follows:
NOTES
The following program segment transfers an upper triangular band matrix from conventional full matrix
storage to band storage:
      DO 20, J = 1, N
         M = K + 1 - J
         DO 10, I = MAX(1, J - K), J
            A(M + I, J) = MATRIX(I, J)
   10    CONTINUE
   20 CONTINUE
The following program segment transfers a lower triangular band matrix from conventional full matrix
storage to band storage:
      DO 20, J = 1, N
         M = 1 - J
         DO 10, I = J, MIN ( N, J + K )
            A( M + I, J ) = MATRIX ( I, J )
   10    CONTINUE
   20 CONTINUE
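The two program segments above can also be checked with a small Python sketch. This is an illustration under stated assumptions, not library code: the function name full_to_band is hypothetical, indices are 0-based, and positions of the band array that the segments never assign are shown as None.

```python
def full_to_band(matrix, k, uplo):
    """Copy a triangular band matrix from full storage to band storage.

    matrix : n-by-n list of lists in conventional full storage
    k      : number of super-diagonals (uplo 'U') or sub-diagonals (uplo 'L')
    Returns a (k+1)-by-n list of lists; unassigned positions stay None.
    """
    n = len(matrix)
    band = [[None] * n for _ in range(k + 1)]
    for j in range(n):
        if uplo in ("U", "u"):
            m = k - j  # 0-based form of M = K + 1 - J
            for i in range(max(0, j - k), j + 1):
                band[m + i][j] = matrix[i][j]
        else:
            m = -j  # 0-based form of M = 1 - J
            for i in range(j, min(n, j + k + 1)):
                band[m + i][j] = matrix[i][j]
    return band


full = [[11, 12, 0, 0],
        [0, 22, 23, 0],
        [0, 0, 33, 34],
        [0, 0, 0, 44]]
for row in full_to_band(full, 1, "U"):
    print(row)
# [None, 12, 23, 34]
# [11, 22, 33, 44]
```

For an upper triangular band matrix the main diagonal lands in the last row of the band array; for a lower triangular band matrix it lands in the first row.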
Tests for singularity or near-singularity are not included in STBSV or CTBSV. You must perform such tests
before calling these routines.
STBSV and CTBSV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
NAME
STPMV, CTPMV – Multiplies a real or complex vector by a real or complex triangular packed matrix
SYNOPSIS
CALL STPMV (uplo, trans, diag, n, ap, x, incx)
CALL CTPMV (uplo, trans, diag, n, ap, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
STPMV and CTPMV perform one of the following matrix-vector operations:
x ← A x
x ← A^T x
x ← A^H x   (CTPMV only)
where A^T is the transpose of A, A^H is the conjugate transpose of A, x is an n-element vector, and A may be
either a unit or nonunit n-by-n upper or lower triangular matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← A x
trans = ’T’ or ’t’: x ← A^T x
trans = ’C’ or ’c’: x ← A^T x (STPMV), or x ← A^H x (CTPMV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
ap STPMV: Real array of dimension n(n+1)/2. (input)
CTPMV: Complex array of dimension n(n+1)/2. (input)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular matrix packed
sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2) contains A(1,2), ap(3)
contains A(2,2), and so on.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular matrix packed
sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2) contains A(2,1), ap(3)
contains A(3,1), and so on.
When diag = ’U’ or ’u’, these routines assume that all elements of the array ap that represent
diagonal elements of the matrix A are 1. In this case, neither of these routines will reference any
of the diagonal elements.
x STPMV: Real array of dimension 1+(n-1)*|incx|. (input and output)
CTPMV: Complex array of dimension 1+(n-1)*|incx|. (input and output)
Contains the vector x. On exit, the transformed vector overwrites array x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
NOTES
STPMV and CTPMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
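EXAMPLES
The packed layout described for the ap argument can be illustrated with a minimal Python sketch. The names pack_triangle and packed_index are hypothetical helpers introduced only for this illustration; they are not library routines. packed_index takes 1-based i and j, as in the text, and returns the 1-based position in ap.

```python
def pack_triangle(matrix, uplo):
    """Pack one triangle of an n-by-n matrix sequentially, column by column."""
    n = len(matrix)
    ap = []
    for j in range(n):
        rows = range(0, j + 1) if uplo in ("U", "u") else range(j, n)
        ap.extend(matrix[i][j] for i in rows)
    return ap


def packed_index(i, j, n, uplo):
    """1-based position in ap of A(i,j), with 1-based i and j.

    Upper triangle (i <= j): i + j*(j-1)/2
    Lower triangle (i >= j): i + (j-1)*(2*n-j)/2
    """
    if uplo in ("U", "u"):
        return i + j * (j - 1) // 2
    return i + (j - 1) * (2 * n - j) // 2


a = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]
print(pack_triangle(a, "U"))  # [11, 12, 22, 13, 23, 33]
print(pack_triangle(a, "L"))  # [11, 21, 31, 22, 32, 33]
```

Either triangle of an n-by-n matrix occupies n(n+1)/2 elements, which is 6 for the 3-by-3 case shown.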
NAME
STPSV, CTPSV – Solves a real or complex triangular packed system of equations
SYNOPSIS
CALL STPSV (uplo, trans, diag, n, ap, x, incx)
CALL CTPSV (uplo, trans, diag, n, ap, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
STPSV and CTPSV solve one of the following systems of equations, using the operation associated with
each:
Equations       Operation
A x = b         x ← A^-1 x
A^T x = b       x ← A^-T x
A^H x = b       x ← A^-H x   (CTPSV only)
where
• b and x are n-element vectors
• A is either a unit or nonunit n-by-n upper or lower triangular matrix, supplied in packed form
• A^-1 is the inverse of A
• A^T is the transpose of A
• A^-T is the inverse of A^T
• A^H is the conjugate transpose of A
• A^-H is the inverse of A^H
On input, the right-hand side vector b is stored in the array argument x. On output, the solution vector x
overwrites b in the same array argument x.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
NOTES
Tests for singularity or near-singularity are not included in STPSV or CTPSV. You must perform such tests
before calling either routine.
STPSV and CTPSV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
NAME
STRMV, CTRMV – Multiplies a real or complex vector by a real or complex triangular matrix
SYNOPSIS
CALL STRMV (uplo, trans, diag, n, a, lda, x, incx)
CALL CTRMV (uplo, trans, diag, n, a, lda, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
STRMV multiplies a real vector by a real triangular matrix.
CTRMV multiplies a complex vector by a complex triangular matrix.
STRMV and CTRMV perform one of the following matrix-vector operations:
x ← A x
x ← A^T x
x ← A^H x   (CTRMV only)
where A^T is the transpose of A, A^H is the conjugate transpose of A, x is an n-element vector, and A may be
either a unit or nonunit n-by-n upper or lower triangular matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is upper or lower triangular, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← A x
trans = ’T’ or ’t’: x ← A^T x
trans = ’C’ or ’c’: x ← A^T x (STRMV), or x ← A^H x (CTRMV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
a STRMV: Real array of dimension (lda,n). (input)
CTRMV: Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular matrix. The strictly lower triangular part of a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular matrix. The strictly upper triangular part of a is not referenced.
When diag = ’U’ or ’u’, these routines assume that all elements of array a that represent
diagonal elements of matrix A are 1. In this case, neither of these routines will reference any of
the diagonal elements.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda must be at least
MAX(1,n).
x STRMV: Real array of dimension 1+(n-1)*|incx|. (input and output)
CTRMV: Complex array of dimension 1+(n-1)*|incx|. (input and output)
Contains the vector x. On exit, the transformed vector overwrites array x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
NOTES
STRMV and CTRMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
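EXAMPLES
The following Python sketch restates the operation performed by STRMV and CTRMV for unit stride. It is a reference illustration only, not the library's implementation: the name trmv is hypothetical, the matrix is a list of lists, and the result is returned as a new list, whereas the library routines overwrite x. The sketch reads only the triangle selected by uplo and skips the diagonal when diag is 'U'.

```python
def trmv(uplo, trans, diag, a, x):
    """Return op(A)*x for a triangular matrix A held in full storage."""
    n = len(x)
    upper = uplo in ("U", "u")
    unit = diag in ("U", "u")

    def elem(i, j):
        # Entry (i, j) of A; the opposite triangle of `a` is never read.
        if i == j:
            return 1 if unit else a[i][i]
        in_triangle = (i < j) if upper else (i > j)
        return a[i][j] if in_triangle else 0

    def op(i, j):
        if trans in ("N", "n"):
            return elem(i, j)
        if trans in ("T", "t"):
            return elem(j, i)
        # 'C': conjugate transpose (same as transpose for real data)
        return elem(j, i).conjugate()

    return [sum(op(i, j) * x[j] for j in range(n)) for i in range(n)]


a = [[2, 3],
     [99, 4]]   # the 99 lies in the strictly lower triangle and is ignored
print(trmv("U", "N", "N", a, [1, 1]))  # [5, 4]
print(trmv("U", "T", "N", a, [1, 1]))  # [2, 7]
print(trmv("U", "N", "U", a, [1, 1]))  # [4, 1]
```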
NAME
STRSV, CTRSV – Solves a real or complex triangular system of equations
SYNOPSIS
CALL STRSV (uplo, trans, diag, n, a, lda, x, incx)
CALL CTRSV (uplo, trans, diag, n, a, lda, x, incx)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
STRSV solves a real triangular system of equations.
CTRSV solves a complex triangular system of equations.
STRSV and CTRSV solve one of the following systems of equations, using the operation associated with
each:
Equations       Operation
A x = b         x ← A^-1 x
A^T x = b       x ← A^-T x
A^H x = b       x ← A^-H x   (CTRSV only)
where
• b and x are n-element vectors
• A is either a unit or nonunit n-by-n upper or lower triangular matrix
• A^-1 is the inverse of A
• A^T is the transpose of A
• A^-T is the inverse of A^T
• A^H is the conjugate transpose of A
• A^-H is the inverse of A^H
On input, the right-hand side vector b is stored in array argument x. On output, the solution vector x
overwrites b in the same array argument x.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is an upper or lower triangular matrix, as follows:
NOTES
Tests for singularity or near-singularity are not included in STRSV or CTRSV. You must perform such tests
before calling either routine.
STRSV and CTRSV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
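EXAMPLES
Because the routines do not test for singularity, a caller might check the diagonal before solving. The Python sketch below shows one such check followed by a plain substitution solve for the A x = b case (trans = 'N') with unit stride. The names has_zero_diagonal and trsv are hypothetical, and the exact-zero test is only the simplest possible check; it does not detect near-singularity.

```python
def has_zero_diagonal(a, n):
    """True if any diagonal element of the n-by-n matrix a is exactly 0."""
    return any(a[i][i] == 0 for i in range(n))


def trsv(uplo, diag, a, b):
    """Solve A x = b for triangular A by substitution and return x.

    Back substitution is used for an upper triangular A and forward
    substitution for a lower triangular A. With diag 'U' the diagonal
    of `a` is not read and is treated as 1.
    """
    n = len(b)
    upper = uplo in ("U", "u")
    unit = diag in ("U", "u")
    if not unit and has_zero_diagonal(a, n):
        raise ZeroDivisionError("A is singular: zero on the diagonal")
    x = list(b)  # the library routines overwrite x; this sketch copies
    order = range(n - 1, -1, -1) if upper else range(n)
    for i in order:
        cols = range(i + 1, n) if upper else range(i)
        s = x[i] - sum(a[i][j] * x[j] for j in cols)
        x[i] = s if unit else s / a[i][i]
    return x


print(trsv("U", "N", [[2.0, 1.0], [0.0, 4.0]], [4.0, 8.0]))  # [1.0, 2.0]
```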
NAME
INTRO_BLAS3 – Introduction to matrix-matrix linear algebra subprograms
IMPLEMENTATION
See individual man pages for implementation details
DESCRIPTION
The Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS) consist of routines for unpacked real and
complex data. They handle matrix-matrix operations.
Level 3 Basic Linear Algebra Subprograms
The following table describes these routines. If more than one routine name appears for a given block in the
table, the first name listed is the name of the man page that describes all routines listed in that block. For
complete information about each operation performed by the routine, see the individual man page for that
routine.
The table is in alphabetic order, except that each Hermitian matrix routine (any routine whose name begins
with CH) is grouped next to equivalent symmetric matrix routines (whose names begin with SS or CS). This
is because the Hermitian property is a type of symmetry.
Each routine in the table marked with an asterisk is an extension to the standard set of Level 3 BLAS
routines.
SEE ALSO
Dongarra, J., J. Du Croz, I. Duff, and S. Hammarling, "A Set of Level 3 Basic Linear Algebra Subprograms,"
ACM Transactions on Mathematical Software, Vol. 16, No. 1, March 1990, pp. 1-17.
NAME
CHEMM – Multiplies a complex general matrix by a complex Hermitian matrix
SYNOPSIS
CALL CHEMM (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CHEMM multiplies a complex general matrix by a complex Hermitian matrix.
CHEMM performs one of the following matrix-matrix operations:
C ← α A B + β C
C ← α B A + β C
where α and β are scalars, A is a Hermitian matrix, and B and C are m-by-n matrices.
This routine has the following arguments:
side Character*1. (input)
Specifies whether the Hermitian matrix A appears on the left or right in the operation, as
follows:
side = ’L’ or ’l’: C ← α A B + β C
side = ’R’ or ’r’: C ← α B A + β C
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of the Hermitian matrix A is referenced, as
follows:
uplo = ’U’ or ’u’: only the upper triangular part of the Hermitian matrix is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of the Hermitian matrix is referenced.
m Integer. (input)
Specifies the number of rows in matrix C. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix C. n must be ≥ 0.
alpha Complex. (input)
Scalar factor α.
a Complex array of dimension (lda,ka). (input)
Contains matrix A. When side = ’L’ or ’l’, ka is m; otherwise, it is n.
Before entry with side = ’L’ or ’l’, the m-by-m part of array a must contain the Hermitian
matrix, such that:
• If uplo = ’U’ or ’u’, the leading m-by-m upper triangular part of array a must contain the
upper triangular part of the Hermitian matrix. The strictly lower triangular part of a is not
referenced.
• If uplo = ’L’ or ’l’, the leading m-by-m lower triangular part of array a must contain the
lower triangular part of the Hermitian matrix. The strictly upper triangular part of a is not
referenced.
Before entry with side = ’R’ or ’r’, the n-by-n part of array a must contain the Hermitian matrix,
such that:
• If uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must contain the
upper triangular part of the Hermitian matrix. The strictly lower triangular part of a is not
referenced.
• If uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must contain the lower
triangular part of the Hermitian matrix. The strictly upper triangular part of a is not
referenced.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. When side = ’L’ or ’l’, lda
≥ MAX(1,m); otherwise, lda ≥ MAX(1,n).
b Complex array of dimension (ldb,n). (input)
Contains matrix B. Before entry, the leading m-by-n part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. ldb ≥ MAX(1,m).
beta Complex. (input)
Scalar factor β. When beta is supplied as 0, c need not be set on input.
c Complex array of dimension (ldc,n). (input and output)
Contains matrix C.
Before entry, the leading m-by-n part of array c must contain matrix C, except when beta is 0; in
which case, c need not be set. On exit, the m-by-n updated matrix overwrites array c.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,m).
NOTES
CHEMM is a Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS).
SEE ALSO
SSYMM(3S)
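EXAMPLES
The following Python sketch restates the CHEMM operation for small matrices held as lists of lists. It is an illustration under stated assumptions, not the library's implementation: the names hermitian_from_triangle, matmul, and hemm are hypothetical, and the result is returned rather than written into c. The sketch reads only the triangle of a selected by uplo, ignores the imaginary parts of the diagonal, and does not read c when beta is 0.

```python
def hermitian_from_triangle(a, n, uplo):
    """Rebuild the full n-by-n Hermitian matrix from the referenced triangle."""
    upper = uplo in ("U", "u")
    full = [[0j] * n for _ in range(n)]
    for i in range(n):
        full[i][i] = complex(a[i][i].real, 0.0)  # imaginary part assumed 0
        for j in range(i + 1, n):
            if upper:
                full[i][j] = a[i][j]
                full[j][i] = a[i][j].conjugate()
            else:
                full[j][i] = a[j][i]
                full[i][j] = a[j][i].conjugate()
    return full


def matmul(p, q):
    """Plain inner-product matrix multiply."""
    return [[sum(p[i][l] * q[l][j] for l in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def hemm(side, uplo, alpha, a, b, beta, c):
    """Return alpha*A*B + beta*C (side 'L') or alpha*B*A + beta*C (side 'R')."""
    m, n = len(b), len(b[0])
    left = side in ("L", "l")
    h = hermitian_from_triangle(a, m if left else n, uplo)
    prod = matmul(h, b) if left else matmul(b, h)
    return [[alpha * prod[i][j] + (beta * c[i][j] if beta != 0 else 0)
             for j in range(n)] for i in range(m)]


a = [[2 + 5j, 1 + 1j],
     [None, 3]]          # lower triangle unset; imaginary part of a(1,1) ignored
b = [[1], [1j]]
print(hemm("L", "U", 1, a, b, 0, [[None], [None]]))  # [[(1+1j)], [(1+2j)]]
```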
NAME
CHER2K – Performs Hermitian rank 2k update of a complex Hermitian matrix
SYNOPSIS
CALL CHER2K (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CHER2K performs a Hermitian rank 2k update of a complex Hermitian matrix.
CHER2K performs one of the following Hermitian rank 2k operations:
C ← α A B^H + conj(α) B A^H + β C
C ← α A^H B + conj(α) B^H A + β C
where the following is true:
• α and β are scalars, and conj(α) is the complex conjugate of α;
• A^H and B^H are the conjugate transposes of A and B, respectively;
• C is an n-by-n Hermitian matrix;
• A and B are n-by-k matrices in the first operation listed previously, and k-by-n matrices in
the second.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array c is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: C ← α A B^H + conj(α) B A^H + β C
trans = ’C’ or ’c’: C ← α A^H B + conj(α) B^H A + β C
n Integer. (input)
Specifies the order of matrix C. n must be ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’, k specifies the number of columns of matrices A and B.
On entry with trans = ’C’ or ’c’, k specifies the number of rows of matrices A and B.
k must be ≥ 0.
alpha Complex. (input)
Scalar factor α.
a Complex array of dimension (lda,ka). (input)
When trans = ’N’ or ’n’, ka is k; otherwise, it is n. Contains matrix A.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array a must contain matrix A;
otherwise, the leading k-by-n part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program.
If trans = ’N’ or ’n’, lda ≥ MAX(1,n); otherwise, lda ≥ MAX(1,k).
b Complex array of dimension (ldb,kb). (input)
When trans = ’N’ or ’n’, kb is k; otherwise, it is n. Contains matrix B.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array b must contain matrix B;
otherwise, the leading k-by-n part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. If trans = ’N’ or ’n’, ldb ≥
MAX(1,n); otherwise, ldb ≥ MAX(1,k).
beta Real. (input)
Scalar factor β.
c Complex array of dimension (ldc,n). (input and output)
Contains matrix C.
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array c must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of c
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array c.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array c must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of c
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array c.
The imaginary parts of the diagonal elements need not be set and are assumed to be 0. On exit,
they are set to 0.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,n).
NOTES
CHER2K is a Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS).
SEE ALSO
SSYR2K(3S)
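EXAMPLES
The following Python sketch restates the CHER2K update for small matrices held as lists of lists. It is an illustration, not the library's implementation: the names conj_transpose and her2k are hypothetical, and the result is returned rather than written into c. The second product uses the complex conjugate of alpha, as in the standard Level 3 BLAS definition; only the triangle of c selected by uplo is read or updated, and the diagonal of the result is forced to be real.

```python
def conj_transpose(m):
    """Return the conjugate transpose of a list-of-lists matrix."""
    return [[m[i][j].conjugate() for i in range(len(m))]
            for j in range(len(m[0]))]


def her2k(uplo, trans, alpha, a, b, beta, c):
    """Return C updated by the Hermitian rank 2k operation."""
    if trans in ("C", "c"):
        # a and b are k-by-n here; the 'C' operation equals the 'N'
        # operation applied to their conjugate transposes.
        a, b = conj_transpose(a), conj_transpose(b)
    n, k = len(a), len(a[0])
    upper = uplo in ("U", "u")
    out = [row[:] for row in c]
    for i in range(n):
        for j in range(n):
            outside = (j < i) if upper else (j > i)
            if outside:
                continue  # the other triangle of c is not referenced
            s = sum(alpha * a[i][l] * b[j][l].conjugate()
                    + alpha.conjugate() * b[i][l] * a[j][l].conjugate()
                    for l in range(k))
            old = c[i][j].real if i == j else c[i][j]
            val = s + beta * old
            out[i][j] = complex(val.real, 0.0) if i == j else val
    return out


a = [[1 + 0j], [1j]]
b = [[1j], [1 + 0j]]
c = [[1 + 0j, 2 + 1j],
     [None, 3 + 0j]]     # strictly lower triangle is never touched
print(her2k("U", "N", 1j, a, b, 1.0, c))
# [[(3+0j), (2+1j)], [None, (1+0j)]]
```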
NAME
CHERK – Performs Hermitian rank k update of a complex Hermitian matrix
SYNOPSIS
CALL CHERK (uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CHERK performs a Hermitian rank k update of a complex Hermitian matrix.
CHERK performs one of the following Hermitian rank k operations:
C ← α A A^H + β C
C ← α A^H A + β C
where the following is true:
• α and β are scalars;
• A^H is the conjugate transpose of A;
• C is an n-by-n Hermitian matrix;
• A is an n-by-k matrix in the first operation listed previously, and a k-by-n matrix in the second.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array c is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: C ← α A A^H + β C
trans = ’C’ or ’c’: C ← α A^H A + β C
n Integer. (input)
Specifies the order of matrix C. n must be ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’, k specifies the number of columns of matrix A.
On entry with trans = ’C’ or ’c’, k specifies the number of rows of matrix A.
k must be ≥ 0.
alpha Real. (input)
Scalar factor α.
a Complex array of dimension (lda,ka). (input)
When trans = ’N’ or ’n’, ka is k; otherwise, it is n. Contains matrix A.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array a must contain matrix A;
otherwise, the leading k-by-n part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. If trans = ’N’ or ’n’, lda ≥
MAX(1,n); otherwise, lda ≥ MAX(1,k).
beta Real. (input)
Scalar factor β.
c Complex array of dimension (ldc,n). (input and output)
Contains matrix C.
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array c must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of c
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array c.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array c must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of c
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array c.
The imaginary parts of the diagonal elements need not be set and are assumed to be 0. On exit,
they are set to 0.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,n).
NOTES
CHERK is a Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS).
SEE ALSO
SSYRK(3S)
NAME
SCOPY2, CCOPY2 – Copies a real or complex matrix into another real or complex matrix
SYNOPSIS
CALL SCOPY2 (m, n, a, lda, b, ldb)
CALL CCOPY2 (m, n, a, lda, b, ldb)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
SCOPY2 copies a real matrix into another real matrix.
CCOPY2 copies a complex matrix into another complex matrix.
SCOPY2 and CCOPY2 perform the following matrix operation:
B←A
where A and B are real or complex matrices.
These routines have the following arguments:
m Integer. (input)
Number of rows of A and B.
n Integer. (input)
Number of columns of A and B.
a SCOPY2: Real array of dimension (lda,n). (input)
CCOPY2: Complex array of dimension (lda,n). (input)
Contains matrix from which to copy.
lda Integer. (input)
Leading dimension of array a.
b SCOPY2: Real array of dimension (ldb,n). (output)
CCOPY2: Complex array of dimension (ldb,n). (output)
Contains matrix into which to copy.
ldb Integer. (input)
Leading dimension of array b.
NOTES
SCOPY2 and CCOPY2 are extensions to the standard set of Level 3 Basic Linear Algebra Subprograms
(Level 3 BLAS). They are matrix analogues of the Level 1 BLAS vector copy routines SCOPY(3S) and
CCOPY(3S).
The 2 in the routine name means "two-dimensional."
This routine vectorizes along the rows or columns, whichever is longer, and processes the other direction in
parallel.
SEE ALSO
CCOPY(3S), SCOPY(3S), SGESUM(3S)
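EXAMPLES
The role of the leading dimensions lda and ldb can be illustrated with a Python sketch that treats each array as a flat, column-major sequence, as Fortran stores it. The name copy2 is hypothetical, and the check that each leading dimension is at least MAX(1,m) is an assumption carried over from the other routines in this manual; it is not stated on this man page.

```python
def copy2(m, n, a, lda, b, ldb):
    """Copy the leading m-by-n part of a into b, both column-major and flat.

    Element (i, j), counted from 0, of an array with leading dimension ld
    is stored at flat position i + j*ld.
    """
    if lda < max(1, m) or ldb < max(1, m):
        raise ValueError("leading dimension smaller than MAX(1,m)")  # assumed rule
    for j in range(n):
        for i in range(m):
            b[i + j * ldb] = a[i + j * lda]


# a is declared 3-by-2 (lda = 3); copy its leading 2-by-2 part
# into b, which is declared 2-by-2 (ldb = 2).
a = [1, 2, 3,
     4, 5, 6]
b = [0] * 4
copy2(2, 2, a, 3, b, 2)
print(b)  # [1, 2, 4, 5]
```

Elements of a outside the leading m-by-n part (here 3 and 6) are not copied.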
NAME
SGEMM, CGEMM – Multiplies a real or complex general matrix by a real or complex general matrix
SYNOPSIS
CALL SGEMM (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
CALL CGEMM (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SGEMM multiplies a real general matrix by a real general matrix.
CGEMM multiplies a complex general matrix by a complex general matrix.
SGEMM and CGEMM perform the following matrix-matrix operation:
C ← α op(A) op(B) + β C
where op(X) is one of the following:
op(X) = X
op(X) = X^T
op(X) = X^H   (CGEMM only)
where
• α and β are scalars
• A, B, and C are matrices
• op(A) is an m-by-k matrix
• op(B) is a k-by-n matrix
• C is an m-by-n matrix
• X^T is the transpose of X
• X^H is the conjugate transpose of X
These routines have the following arguments:
NOTES
SGEMM and CGEMM are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
SEE ALSO
SGEMMS(3S) to multiply general matrices by using Strassen’s algorithm
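EXAMPLES
The operation C ← α op(A) op(B) + β C can be restated as a short Python sketch using the inner product method. It is a reference illustration only: the names apply_op and gemm are hypothetical, matrices are lists of lists, and the result is returned rather than written into c.

```python
def apply_op(x, trans):
    """Return op(X): X, its transpose, or its conjugate transpose."""
    if trans in ("N", "n"):
        return x
    rows, cols = len(x), len(x[0])
    if trans in ("T", "t"):
        return [[x[i][j] for i in range(rows)] for j in range(cols)]
    if trans in ("C", "c"):
        return [[x[i][j].conjugate() for i in range(rows)] for j in range(cols)]
    raise ValueError("trans must be 'N', 'T', or 'C'")


def gemm(transa, transb, alpha, a, b, beta, c):
    """Return alpha*op(A)*op(B) + beta*C."""
    opa, opb = apply_op(a, transa), apply_op(b, transb)
    m, k, n = len(opa), len(opb), len(opb[0])
    return [[alpha * sum(opa[i][l] * opb[l][j] for l in range(k))
             + beta * c[i][j]
             for j in range(n)] for i in range(m)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
zero = [[0, 0], [0, 0]]
print(gemm("N", "N", 1, a, b, 0, zero))  # [[19, 22], [43, 50]]
print(gemm("T", "N", 1, a, b, 0, zero))  # [[26, 30], [38, 44]]
```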
NAME
SGEMMS, CGEMMS – Multiplies a real or complex general matrix by a real or complex general matrix, using
Strassen’s algorithm
SYNOPSIS
UNICOS systems:
CALL SGEMMS (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc, work)
CALL CGEMMS (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc, work)
UNICOS/mk systems:
CALL SGEMMS (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
CALL CGEMMS (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
DESCRIPTION
SGEMMS multiplies a real general matrix by a real general matrix. CGEMMS multiplies a complex general
matrix by a complex general matrix.
SGEMMS and CGEMMS are implementations of Winograd’s variation of Strassen’s algorithm for matrix
multiplication. The algorithm is described in the NOTES section of this man page. Because Strassen’s
algorithm performs operations in a very different order, numerical results from SGEMMS and
CGEMMS may differ slightly from those of SGEMM and CGEMM.
On UNICOS systems, these routines are functionally equivalent to SGEMM and CGEMM except for the
additional argument, work.
On UNICOS/mk systems, SGEMMS and CGEMMS are functionally equivalent to SGEMM and CGEMM except
for the following:
• m, n, k must be greater than 0.
• transa and transb = ’C’ or ’c’ is invalid in SGEMMS.
The UNICOS/mk version of these routines requires a workspace which is allocated, managed, and freed by
the routines.
SGEMMS and CGEMMS perform the following matrix-matrix operation:
C ← α op(A) op(B) + β C
where op(X) is one of the following:
op(X) = X
op(X) = X^T
op(X) = X^H   (CGEMMS only)
where
• α and β are scalars
• A, B, and C are matrices
• op(A) is an m-by-k matrix
• op(B) is a k-by-n matrix
• C is an m-by-n matrix
• X^T is the transpose of X
• X^H is the conjugate transpose of X
These routines have the following arguments:
transa Character*1. (input)
Specifies the form of op(A) to be used in the matrix multiplication, as follows:
If transa = ’N’ or ’n’, op(A) = A
If transa = ’T’ or ’t’, op(A) = A^T
In the UNICOS version, if transa = ’C’ or ’c’, op(A) = A^T (SGEMMS) or op(A) = A^H (CGEMMS)
In the UNICOS/mk version, if transa = ’C’ or ’c’, op(A) = A^H
transb Character*1. (input)
Specifies the form of op(B) to be used in the matrix multiplication, as follows:
If transb = ’N’ or ’n’, op(B) = B
If transb = ’T’ or ’t’, op(B) = B^T
In the UNICOS version, if transb = ’C’ or ’c’, op(B) = B^T (SGEMMS) or op(B) = B^H (CGEMMS)
In the UNICOS/mk version, if transb = ’C’ or ’c’, op(B) = B^H
m Integer. (input)
Specifies the number of rows in matrix op(A) and in matrix C. m must be ≥ 0 for UNICOS
systems; m must be ≥ 1 for UNICOS/mk systems.
n Integer. (input)
Specifies the number of columns in matrix op(B) and in matrix C. n must be ≥ 0 for UNICOS
systems; n must be ≥ 1 for UNICOS/mk systems.
k Integer. (input)
Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). k
must be ≥ 0 for UNICOS systems; k must be ≥ 1 for UNICOS/mk systems.
NOTES
SGEMMS and CGEMMS are extensions to the standard Level 3 Basic Linear Algebra Subprograms (Level 3
BLAS).
Strassen’s Algorithm
Strassen’s algorithm for matrix multiplication is a complex, recursive algorithm that performs the
multiplication in a manner completely different from the usual inner product method.
Suppose you want to multiply a pair of square matrices of order n. The typical inner product method for
matrix multiplication has an operations count on the order of n^3; the operations count for Strassen’s
algorithm is on the order of n^2.8. The trade-off is that Strassen’s algorithm requires an auxiliary work space.
The UNICOS implementations of SGEMMS and CGEMMS require an array work, supplied by the calling
program, of the following size (or equivalently, a real array of twice this dimension for CGEMMS):
2.34*MAX(m,k)*MAX(k,n)
The work array is overwritten, and no diagnostic is given if the supplied array is too small.
For small problem sizes, with dimensions less than or equal to 128 on UNICOS systems and 360 on
UNICOS/mk systems, SGEMMS and CGEMMS call SGEMM and CGEMM, respectively, to compute the matrix
product. Strassen’s algorithm is used only when the problem sizes are larger than the dimensions
indicated above. Because Strassen’s algorithm carries out operations in a very different order,
numerical results of SGEMMS and CGEMMS may differ slightly from those of SGEMM and CGEMM.
SEE ALSO
SGEMM(3S) to multiply general matrices by using the more standard inner product algorithm
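EXAMPLES
The recursive idea behind the NOTES section can be sketched in Python. This sketch uses the classic Strassen formulas with seven block products per level, not the Winograd variation that SGEMMS and CGEMMS implement, and it handles only square matrices whose order stays even down to the cutoff. All names (strassen, classical, work_words) and the default cutoff are hypothetical. The helper work_words evaluates the UNICOS work array size 2.34*MAX(m,k)*MAX(k,n), rounded up to an integer.

```python
def add(p, q):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(p, q)]


def sub(p, q):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(p, q)]


def classical(p, q):
    """Inner product method, on the order of n^3 operations."""
    return [[sum(p[i][l] * q[l][j] for l in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def strassen(a, b, cutoff=2):
    """Multiply square matrices with classic Strassen recursion."""
    n = len(a)
    if n <= cutoff or n % 2:
        return classical(a, b)  # small or odd order: fall back
    h = n // 2

    def block(mat, r, c):
        return [row[c * h:(c + 1) * h] for row in mat[r * h:(r + 1) * h]]

    a11, a12, a21, a22 = block(a, 0, 0), block(a, 0, 1), block(a, 1, 0), block(a, 1, 1)
    b11, b12, b21, b22 = block(b, 0, 0), block(b, 0, 1), block(b, 1, 0), block(b, 1, 1)
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


def work_words(m, n, k):
    """2.34*MAX(m,k)*MAX(k,n), rounded up, using integer arithmetic."""
    return -(-234 * max(m, k) * max(k, n) // 100)


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1))  # [[19, 22], [43, 50]]
print(work_words(200, 300, 100))  # 140400
```

Seven products of half-size blocks per level give an operations count on the order of n^(log2 7), which is approximately n^2.8. With floating-point data the two methods can differ in the last bits, which is the effect noted above.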
NAME
SSYMM, CSYMM – Multiplies a real or complex general matrix by a real or complex symmetric matrix
SYNOPSIS
CALL SSYMM (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
CALL CSYMM (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SSYMM multiplies a real general matrix by a real symmetric matrix.
CSYMM multiplies a complex general matrix by a complex symmetric matrix.
SSYMM and CSYMM perform one of the following matrix-matrix operations:
C ← α A B + β C
C ← α B A + β C
where α and β are scalars, A is a symmetric matrix, and B and C are m-by-n matrices.
These routines have the following arguments:
side Character*1. (input)
Specifies whether the symmetric matrix A appears on the left or right in the operation, as
follows:
side = ’L’ or ’l’: C ← α A B + β C
side = ’R’ or ’r’: C ← α B A + β C
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of the symmetric matrix A is referenced, as
follows:
uplo = ’U’ or ’u’: only the upper triangular part of the symmetric matrix is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of the symmetric matrix is referenced.
m Integer. (input)
Specifies the number of rows in matrix C. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix C. n must be ≥ 0.
alpha SSYMM: Real. (input)
CSYMM: Complex. (input)
Scalar factor α.
NOTES
SSYMM and CSYMM are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
SEE ALSO
CHEMM(3S)
NAME
SSYR2K, CSYR2K – Performs symmetric rank 2k update of a real or complex symmetric matrix
SYNOPSIS
CALL SSYR2K (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
CALL CSYR2K (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SSYR2K performs a symmetric rank 2k update of a real symmetric matrix.
CSYR2K performs a symmetric rank 2k update of a complex symmetric matrix.
SSYR2K and CSYR2K perform one of the following symmetric rank 2k operations:
C ← α A B^T + α B A^T + β C
C ← α A^T B + α B^T A + β C
where
• α and β are scalars
• C is an n-by-n symmetric matrix
• A and B are n-by-k matrices in the first operation listed previously and k-by-n matrices in the second
• A^T and B^T are the transposes of A and B, respectively
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array c is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: C ← α A B^T + α B A^T + β C
trans = ’T’ or ’t’: C ← α A^T B + α B^T A + β C
n Integer. (input)
Specifies the order of matrix C. n must be ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’, k specifies the number of columns of matrices A and B.
On entry with trans = ’T’ or ’t’, k specifies the number of rows of matrices A and B.
k must be ≥ 0.
alpha SSYR2K: Real. (input)
CSYR2K: Complex. (input)
Scalar factor α.
a SSYR2K: Real array of dimension (lda,ka). (input)
CSYR2K: Complex array of dimension (lda,ka). (input)
When trans = ’N’ or ’n’, ka is k; otherwise, it is n. Contains the matrix A.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array a must contain matrix A;
otherwise, the leading k-by-n part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. If trans = ’N’ or ’n’, lda ≥
MAX(1,n); otherwise, lda ≥ MAX(1,k).
b SSYR2K: Real array of dimension (ldb,kb). (input)
CSYR2K: Complex array of dimension (ldb,kb). (input)
When trans = ’N’ or ’n’, kb is k; otherwise, it is n. Contains the matrix B.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array b must contain matrix B;
otherwise, the leading k-by-n part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. If trans = ’N’ or ’n’, ldb ≥
MAX(1,n); otherwise, ldb ≥ MAX(1,k).
beta SSYR2K: Real. (input)
CSYR2K: Complex. (input)
Scalar factor β.
c SSYR2K: Real array of dimension (ldc,n). (input and output)
CSYR2K: Complex array of dimension (ldc,n). (input and output)
Contains the matrix C.
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array c must
contain the upper triangular part of the symmetric matrix. The strictly lower triangular part of c
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array c.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array c must
contain the lower triangular part of the symmetric matrix. The strictly upper triangular part of c
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array c.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,n).
NOTES
SSYR2K and CSYR2K are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
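As an illustration only, the following Python sketch spells out the update that SSYR2K and CSYR2K are described as performing. It is a plain reference loop, not the library routine: the function name `syr2k` and the nested-list matrix layout are assumptions made for this example, and the lda, ldb, and ldc arguments are omitted because Python lists carry their own shape.

```python
def syr2k(uplo, trans, n, k, alpha, a, b, beta, c):
    """Reference sketch of the symmetric rank 2k update.

    trans = 'N': C <- alpha*A*B^T + alpha*B*A^T + beta*C  (A, B are n-by-k)
    trans = 'T': C <- alpha*A^T*B + alpha*B^T*A + beta*C  (A, B are k-by-n)
    Only the triangle of C selected by uplo is read and written.
    """
    upper = uplo.upper() == "U"
    notrans = trans.upper() == "N"
    for j in range(n):
        # Row indices inside the referenced triangle for column j.
        rows = range(0, j + 1) if upper else range(j, n)
        for i in rows:
            s = 0
            for l in range(k):
                if notrans:
                    s += a[i][l] * b[j][l] + b[i][l] * a[j][l]
                else:
                    s += a[l][i] * b[l][j] + b[l][i] * a[l][j]
            c[i][j] = alpha * s + beta * c[i][j]
    return c


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
# The value 99 sits in the strictly lower triangle and is never touched.
C_upper = syr2k("U", "N", 2, 2, 1, A, B, 0, [[0, 0], [99, 0]])
C_lower = syr2k("L", "T", 2, 2, 1, A, B, 1, [[1, 77], [1, 1]])
```

In the first call only the upper triangle is updated, so the entry below the diagonal keeps its original value; the second call shows the same behavior for the lower triangle with trans = ’T’.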
SEE ALSO
CHER2K(3S)
NAME
SSYRK, CSYRK – Performs symmetric rank k update of a real or complex symmetric matrix
SYNOPSIS
CALL SSYRK (uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
CALL CSYRK (uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SSYRK performs a symmetric rank k update of a real symmetric matrix.
CSYRK performs a symmetric rank k update of a complex symmetric matrix.
SSYRK and CSYRK perform one of the following symmetric rank k operations:
$C \leftarrow \alpha A A^{T} + \beta C$
$C \leftarrow \alpha A^{T} A + \beta C$
where $A^{T}$ is the transpose of A; α and β are scalars; C is an n-by-n symmetric matrix; A is an n-by-k matrix
in the first operation and a k-by-n matrix in the second.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array c is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: $C \leftarrow \alpha A A^{T} + \beta C$
trans = ’T’ or ’t’: $C \leftarrow \alpha A^{T} A + \beta C$
n Integer. (input)
Specifies the order of matrix C. n must be ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’, k specifies the number of columns of matrix A.
On entry with trans = ’T’ or ’t’, k specifies the number of rows of matrix A.
k must be ≥ 0.
NOTES
SSYRK and CSYRK are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
SEE ALSO
CHERK(3S)
NAME
STRMM, CTRMM – Multiplies a real or complex general matrix by a real or complex triangular matrix
SYNOPSIS
CALL STRMM (side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
CALL CTRMM (side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
STRMM multiplies a real general matrix by a real triangular matrix.
CTRMM multiplies a complex general matrix by a complex triangular matrix.
STRMM and CTRMM perform one of the matrix-matrix operations:
$B \leftarrow \alpha \, \mathrm{op}(A) \, B$
$B \leftarrow \alpha \, B \, \mathrm{op}(A)$
where α is a scalar; B is an m-by-n matrix; A is a unit or nonunit, upper or lower triangular matrix;
and op(A) is one of the following:
• op(A) = $A$
• op(A) = $A^{T}$
• op(A) = $A^{H}$ (CTRMM only)
where
• $A^{T}$ is the transpose of A
• $A^{H}$ is the conjugate transpose of A.
These routines have the following arguments:
side Character*1. (input)
Specifies whether op(A) multiplies B from the left or right, as follows:
side = ’L’ or ’l’: $B \leftarrow \alpha \, \mathrm{op}(A) \, B$
side = ’R’ or ’r’: $B \leftarrow \alpha \, B \, \mathrm{op}(A)$
uplo Character*1. (input)
Specifies whether matrix A is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
NOTES
STRMM and CTRMM are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
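The following Python sketch illustrates the operation described for STRMM and CTRMM. It is not the library routine. The function name `trmm` and the nested-list layout are assumptions for this example. The values accepted for transa (’N’, ’T’, ’C’) and diag (’U’ for unit, ’N’ for nonunit) follow the usual BLAS convention, which this section does not spell out, so treat them as an assumption as well.

```python
def trmm(side, uplo, transa, diag, m, n, alpha, a, b):
    """Reference sketch: return alpha*op(A)*B (side 'L') or alpha*B*op(A) (side 'R')."""
    left = side.upper() == "L"
    order = m if left else n  # A is m-by-m on the left, n-by-n on the right
    # Build the triangular matrix, reading only the triangle named by uplo.
    t = [[0] * order for _ in range(order)]
    for i in range(order):
        for j in range(order):
            in_triangle = (i <= j) if uplo.upper() == "U" else (i >= j)
            if i == j and diag.upper() == "U":
                t[i][j] = 1  # unit triangular: the diagonal is not referenced
            elif in_triangle:
                t[i][j] = a[i][j]
    # Apply op(A).
    op = transa.upper()
    if op == "T":
        t = [[t[j][i] for j in range(order)] for i in range(order)]
    elif op == "C":
        t = [[complex(t[j][i]).conjugate() for j in range(order)]
             for i in range(order)]
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if left:
                s = sum(t[i][l] * b[l][j] for l in range(m))
            else:
                s = sum(b[i][l] * t[l][j] for l in range(n))
            out[i][j] = alpha * s
    return out


A = [[2, 1], [99, 3]]  # 99 lies outside the upper triangle and is ignored
I2 = [[1, 0], [0, 1]]
left_nonunit = trmm("L", "U", "N", "N", 2, 2, 2, A, I2)
left_unit = trmm("L", "U", "N", "U", 2, 2, 1, A, I2)
right_trans = trmm("R", "U", "T", "N", 1, 2, 1, A, [[1, 2]])
```

Unlike the library routines, which overwrite b, this sketch returns a new matrix so that the input stays available for comparison.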
NAME
INTRO_FFT – Introduction to signal processing routines
IMPLEMENTATION
See individual man pages for implementation details
DESCRIPTION
The signal processing routines consist of Fast Fourier Transform (FFT) routines, filter routines, and
convolution routines.
Fast Fourier Transform Routines
These routines apply to one or more FFTs. The Standard FFT package, available only on UNICOS systems,
is discussed first. Then the superseded FFT routines, available on all Cray vector architectures, are
discussed.
Standard FFT package (UNICOS systems)
The following is a matrix of preferred FFT routines. These routines are preferred because they have more
functionality, are more generally applicable, and are more fully optimized than the superseded routines that
follow them. Each of these routines is multitasked, but they also are highly optimized for single-processor
use. Each routine can compute either a forward or inverse Fourier transform.
In this matrix, columns of the matrix represent input and output data types for the routines in each column:
• Complex-to-complex implies complex input and output. In this column, the routine name in parentheses
is the name of the equivalent UNICOS routine, which is provided in this release for backward
compatibility. Each routine named in this column has a man page in this section.
• Real-to-complex implies real input and complex output. Each routine named in this column has a man
page in this section.
• Complex-to-real implies complex input and real output. Each routine named in this column is
documented with the real-to-complex routine in the same row.
Rows of the matrix represent the number of dimensions for which the FFT is calculated for the routines in
each row:
• One-dimensional (single) calculates one FFT in one dimension.
• One-dimensional (multiple) calculates an FFT in one dimension for each column of a two-dimensional
matrix.
• Two-dimensional calculates one FFT in two dimensions.
• Three-dimensional calculates one FFT in three dimensions.
Those routines marked with an asterisk (*) are available on UNICOS/mk systems only.
Number of
FFTs Complex-to-complex Real-to-complex Complex-to-real
Purpose Name
Convolution Routines
The convolution routines compute the convolution of a complex sequence with one or more other complex
sequences. The following table contains a summary of the convolution routines. Each routine has its own
man page.
Purpose Name
UNICOS/mk Routines
These routines run only on UNICOS/mk systems. Each routine has its own man page.
Purpose Name
NAME
CCFFT – Applies a multitasked complex-to-complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL CCFFT (isign, n, scale, x, y, table, work, isys)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data only.
DESCRIPTION
CCFFT computes the Fast Fourier Transform (FFT) of the complex vector x, and it stores the result in vector
y.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
      COMPLEX X(0:N-1), Y(0:N-1)
The output array is the FFT of the input array, using the following formula for the FFT:
$$Y_k = \mathrm{scale} \cdot \sum_{j=0}^{n-1} X_j \, \omega^{\,\mathrm{isign} \cdot j \cdot k} \quad \text{for } k = 0, \ldots, n-1$$
where
$\omega = e^{2 \pi i / n}$
$i = +\sqrt{-1}$
$\pi = 3.14159\ldots$
isign = ±1
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n · scale). In particular,
if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by using
isign = -1 and scale = 1.0/n.
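To make the inverse relationship concrete, here is a small Python sketch that evaluates the formula above directly and then undoes it with -isign and 1/(n · scale). It is a slow O(n^2) reference evaluation written for illustration, not CCFFT itself; the function name `dft` and the test vector are assumptions of this example.

```python
import cmath


def dft(isign, n, scale, x):
    """Evaluate Y_k = scale * sum_j X_j * w**(isign*j*k), with w = exp(2*pi*i/n)."""
    y = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += x[j] * cmath.exp(2j * cmath.pi * isign * j * k / n)
        y.append(scale * acc)
    return y


x = [1 + 0j, 2 - 1j, 0.5j, -3 + 2j, 4 + 0j]
n = len(x)
y = dft(+1, n, 1.0, x)            # "forward" transform with scale = 1.0
x_back = dft(-1, n, 1.0 / n, y)   # inverse: -isign and scale = 1/(n * 1.0)
max_err = max(abs(u - v) for u, v in zip(x, x_back))
```

Here `max_err` stays at rounding-error level, which is the round-trip property the text describes.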
The output array may be the same as the input array, provided that n has at least 2 factors.
On UNICOS/mk systems only: if the length of the FFT, n, cannot be factored into powers of 2, 3, and 5
(that is, when a fast mixed-radix algorithm cannot be used), you can specify that a fast chirp-z
transform-based algorithm be used instead of a slow $O(n^2)$ algorithm. The isys argument controls this
option: setting isys to 0 selects the slow algorithm, and setting it to 1 selects the fast algorithm.
The required sizes of the table and work arrays depend on the value of isys.
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n, and table.
If isign = +1 or -1, the value of isign is the sign of the exponent used in the FFT formula.
n Integer. (input)
Size of the transform (the number of values in the input array). n ≥ 2.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined by the previous formula.
x Complex array of dimension (0:n-1). (input)
Input array of values to be transformed.
y Complex array of dimension (0:n-1). (output)
Output array of transformed values. The output array may be the same as the input array. In
that case, the transform is done in place and the input array is overwritten with the transformed
values.
table UNICOS: Real array of dimension 100 + 8n. (input or output)
UNICOS/mk: Real array of dimension 2n when isys = 0 and real array of dimension 12n when
isys = 1.
Table of factors and trigonometric functions.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or -1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).
work UNICOS: Real array of dimension 8n. (workspace)
UNICOS/mk: Real array of dimension 4n when isys = 0 and real array of dimension 8n when
isys = 1 (workspace).
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
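The table and work dimensions listed above can be collected in a small helper. The following Python sketch only restates those sizes; the function name `ccfft_array_sizes` and the system labels passed as strings are hypothetical and exist only for this example.

```python
def ccfft_array_sizes(n, system="UNICOS", isys=0):
    """Return (table_len, work_len), the real-array dimensions CCFFT is documented to need."""
    if system == "UNICOS":
        # table: 100 + 8n, work: 8n
        return 100 + 8 * n, 8 * n
    if system == "UNICOS/mk":
        if isys == 0:
            return 2 * n, 4 * n    # mixed-radix (or slow) algorithm
        if isys == 1:
            return 12 * n, 8 * n   # chirp-z based algorithm
        raise ValueError("isys must be 0 or 1")
    raise ValueError("unknown system label")


unicos_sizes = ccfft_array_sizes(1024)
mk_sizes = ccfft_array_sizes(254, system="UNICOS/mk", isys=1)
```

For n = 1024 on UNICOS systems this gives a table of 8292 words and a work array of 8192 words.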
NOTES
This section contains information about the algorithm for CCFFT, the initialization of the table array, the
declaration of dimensions for x and y arrays, some performance tips, and some implementation dependent
details.
Algorithm
UNICOS/mk:
The algorithm used is a variant of Agarwal’s algorithm when n is factorizable into powers of 2, 3, and 5. If
n is prime or is otherwise not factorizable into powers of 2, 3, and 5, setting isys to 1 results in the use of a
fast ($O(n \log n)$) algorithm based on the chirp-z transform. For example, 120 and 256 are factorizable into
powers of 2, 3, and 5, but 254 = 2 · 127 is not. To considerably reduce the time needed to compute the FFT
of a vector of length 254, set isys to 1.
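A quick way to decide whether a length falls into the mixed-radix case is to strip out all factors of 2, 3, and 5 and see whether anything remains. The Python sketch below does exactly that; the function names `is_235_factorizable` and `suggested_isys` are hypothetical, and the isys suggestion applies only to the UNICOS/mk behavior described above.

```python
def is_235_factorizable(n):
    """Return True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def suggested_isys(n):
    """Suggest isys = 1 (chirp-z algorithm) only when the mixed-radix path does not apply."""
    return 0 if is_235_factorizable(n) else 1


checks = {n: is_235_factorizable(n) for n in (120, 256, 254)}
```

This reproduces the cases named in the text: 120 and 256 qualify, while 254 = 2 · 127 does not.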
UNICOS systems:
The algorithm is the "four-step" method, in which the data is considered as a matrix of dimensions n1 by n2,
for which n1 · n2 = n, and the values of n1 and n2 are chosen for efficiency.
The rows are transformed, the phase factors are applied, the columns are transformed, and finally the matrix
is transposed to obtain the result.
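The four steps can be sketched in Python as follows. This is an illustration of the general technique under stated assumptions, not the library's code: the data is viewed as a column-major n1-by-n2 matrix, the inner transforms use a naive O(n^2) DFT, and the caller supplies n1 and n2 instead of having them chosen for efficiency. The names `dft` and `four_step` are hypothetical.

```python
import cmath


def dft(isign, x):
    """Naive transform: y[k] = sum_j x[j] * exp(isign*2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(2j * cmath.pi * isign * j * k / n)
                for j in range(n)) for k in range(n)]


def four_step(isign, x, n1, n2):
    """Length n1*n2 transform: rows, phase factors, columns, transpose."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1*n2")
    # Column-major n1-by-n2 view of the data: m[r][c] = x[r + n1*c].
    m = [[x[r + n1 * c] for c in range(n2)] for r in range(n1)]
    # Step 1: transform each row (length n2).
    m = [dft(isign, row) for row in m]
    # Step 2: apply the phase factors exp(isign*2*pi*i*r*kc/n).
    for r in range(n1):
        for kc in range(n2):
            m[r][kc] *= cmath.exp(2j * cmath.pi * isign * r * kc / n)
    # Step 3: transform each column (length n1); cols[kc][kr] is entry (kr, kc).
    cols = [dft(isign, [m[r][kc] for r in range(n1)]) for kc in range(n2)]
    # Step 4: transpose, so that entry (kr, kc) lands at index kc + n2*kr.
    y = [0j] * n
    for kc in range(n2):
        for kr in range(n1):
            y[kc + n2 * kr] = cols[kc][kr]
    return y


x = [complex(j, 1 - 2 * j) for j in range(12)]
err = max(abs(u - v) for u, v in zip(dft(+1, x), four_step(+1, x, 3, 4)))
```

Comparing against the direct transform, `err` is at rounding-error level for this 3-by-4 split.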
Initialization
The table array stores the trigonometric tables used in calculation of the FFT. You must initialize table by
calling the routine with isign = 0 prior to doing the transforms. If the value of the problem size, n, does not
change, table does not have to be reinitialized.
Dimensions
The preceding description assumes that array subscripts are zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
      COMPLEX X(0:N-1)
      COMPLEX Y(0:N-1)
However, if you prefer the more customary Fortran style with subscripts starting at 1, you do not have
to change the calling sequence. The arrays can be declared as follows (assuming N > 0):
      COMPLEX X(N)
      COMPLEX Y(N)
Performance Tips
This routine computes an FFT for any value of n, but the performance for a given value of n depends on the
prime factorization of n. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when n is a power of 2. In that case, the number of floating-point operations
is approximately $5 n \log_2(n)$.
If n contains factors of 3, computation time is slightly longer, because more floating-point operations are
required. If n contains powers of 5, it is longer still. Slowest performance is when n is a prime number. In
that case, the number of floating-point operations is approximately $8 n^2$ when isys = 0.
On UNICOS/mk systems only, if n is prime, setting isys = 1 results in the use of an algorithm whose
complexity is approximately the same as using a fast mixed-radix algorithm on a vector of twice the length.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5.
(Because the kernel routines have a special case for multiples of 4, powers of 4 will be slightly faster than
odd powers of 2.)
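To get a feel for the difference between the two operation counts quoted above, the following Python sketch evaluates both estimates. The function names are hypothetical, and the numbers are only the approximations stated in this section, not measured performance.

```python
import math


def flops_power_of_two(n):
    """Approximate operation count when n is a power of 2: 5*n*log2(n)."""
    return 5 * n * math.log2(n)


def flops_prime(n):
    """Approximate operation count when n is prime and isys = 0: 8*n**2."""
    return 8 * n * n


fast = flops_power_of_two(1024)   # n = 2**10
slow = flops_prime(1021)          # 1021 is prime
ratio = slow / fast
```

By these estimates a transform of the prime length 1021 costs more than 100 times as many floating-point operations as one of length 1024, which is why padding to a favorable length (or, on UNICOS/mk systems, setting isys = 1) is worth considering.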
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they can be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to two areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. The
subroutine call requires no change, but you may have to change array sizes in the DIMENSION or type
statements that declare the arrays.
• The second area is the isys parameter array, an array that gives certain implementation-specific
information. All features and functions of the FFT routines specific to any particular implementation are
confined to this isys array. On any implementation, you can use the default values by using an argument
value of 0.
In the current UNICOS implementation, no special options are supported; therefore, you can always specify
the isys parameter as a constant 0. Subsequent software releases may provide other options.
EXAMPLES
These examples use the table and workspace sizes appropriate to UNICOS systems.
Example 1: Initialize the table array in preparation for doing an FFT of size 1024. Only the isign,
n, and table arguments are used in this case. You can use dummy arguments or zeros for the other
arguments in the subroutine call.
      REAL TABLE(100 + 8*1024)
      CALL CCFFT(0, 1024, 0.0, DUMMY, DUMMY, TABLE, DUMMY, 0)
Example 2: x and y are complex arrays of dimension (0:1023). Take the FFT of x and store the results in y.
Before taking the FFT, initialize the table array, as in example 1.
      COMPLEX X(0:1023), Y(0:1023)
      REAL TABLE(100 + 8*1024)
      REAL WORK(8*1024)
      ...
      CALL CCFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
      CALL CCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
Example 3: Using the same x and y as in example 2, take the inverse FFT of y and store it back in x. The
scale factor 1/1024 is used. Assume that the table array is already initialized.
      CALL CCFFT(-1, 1024, 1.0/1024.0, Y, X, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change is needed in the subroutine calls.
      COMPLEX X(1024), Y(1024)
      ...
      CALL CCFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
      CALL CCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but put the output back in array x to save storage
space. Assume that table is already initialized.
      COMPLEX X(1024)
      ...
      CALL CCFFT(1, 1024, 1.0, X, X, TABLE, WORK, 0)
SEE ALSO
CCFFTM(3S), SCFFT(3S), SCFFTM(3S)
NAME
CCFFT2D – Applies a two-dimensional complex-to-complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL CCFFT2D (isign, n1, n2, scale, x, ldx, y, ldy, table, work, isys)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.
DESCRIPTION
CCFFT2D computes the two-dimensional complex Fast Fourier Transform (FFT) of the complex matrix X,
and it stores the results in the complex matrix Y.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
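      COMPLEX X(0:n1-1, 0:n2-1)
      COMPLEX Y(0:n1-1, 0:n2-1)
The output array is the two-dimensional FFT of the input array, computed as follows:
$$Y_{k_1,k_2} = \mathrm{scale} \cdot \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} X_{j_1,j_2} \, \omega_1^{\,j_1 k_1} \, \omega_2^{\,j_2 k_2} \quad \text{for } k_1 = 0, \ldots, n_1-1 \text{ and } k_2 = 0, \ldots, n_2-1$$
where
$\omega_1 = e^{\mathrm{isign} \cdot 2 \pi i / n_1}$
$\omega_2 = e^{\mathrm{isign} \cdot 2 \pi i / n_2}$
$i = +\sqrt{-1}$
$\pi = 3.14159\ldots$
isign = ±1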
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n1 · n2 · scale). In
particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by
using isign = -1 and scale = 1/(n1 · n2).
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, CCFFT2D returns without
performing a transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the
Fourier transform, as defined previously.
x Complex array of dimension (0:ldx-1, 0:n2-1). (input)
Input array of values to be transformed.
ldx Integer. (input)
The number of rows in the x array, as it was declared in the calling program (the leading
dimension of x). ldx ≥ MAX(n1, 1).
y Complex array of dimension (0:ldy-1, 0:n2-1). (output)
Output array of transformed values. The output array may be the same as the input array, in
which case, the transform is done in place (the input array is overwritten with the transformed
values). In this case, it is necessary that ldx = ldy.
ldy Integer. (input)
The number of rows in the y array, as it was declared in the calling program (the leading
dimension of y). ldy ≥ MAX(n1, 1).
table UNICOS systems: real array of dimension 100 + 2(n1 + n2). (input or output)
UNICOS/mk systems: Real array of dimension 2(n1 + n2) if both isys(2) and isys(3) are equal to
zero. Private real vector of length 12(n1 + n2) if either isys(2) or isys(3) is equal to 1. (input or
output)
Table of factors and trigonometric functions.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or -1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).
work UNICOS systems: Real array of dimension 512 · MAX(n1, n2). (scratch output)
UNICOS/mk systems: Real array of dimension 2 · n1 · n2. (scratch output)
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
isys UNICOS systems: ignored.
UNICOS/mk systems: Integer array of length 3. (input)
isys(1) = 2
isys(2) = 0 if n1 is factorizable into powers of 2, 3, and 5; 1 if it is not
isys(3) = 0 if n2 is factorizable into powers of 2, 3, and 5; 1 if it is not
NOTES
This section contains information about the algorithm for CCFFT2D, the initialization of the table array, the
declaration of dimensions for x and y arrays, some performance tips, and some implementation dependent
details.
The following notes are for UNICOS systems only. CCFFT2D(3S) on UNICOS/mk systems provides the
functionality of PCCFFT2D(3S) on a single PE. For notes about CCFFT2D on UNICOS/mk systems, see
PCCFFT2D(3S).
Algorithm
CCFFT2D uses a routine very much like CCFFTM(3S) to do multiple FFTs first on all columns in an input
matrix and then on all of the rows.
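The column-then-row structure can be illustrated with a short Python sketch. It uses a naive one-dimensional DFT for each pass and follows the convention in which isign sets the sign of the exponent; it is not the library's implementation, and the names `dft` and `dft2d` and the nested-list layout `x[j1][j2]` are assumptions of this example.

```python
import cmath


def dft(isign, x):
    """Naive 1-D transform: y[k] = sum_j x[j] * exp(isign*2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(2j * cmath.pi * isign * j * k / n)
                for j in range(n)) for k in range(n)]


def dft2d(isign, n1, n2, scale, x):
    """2-D transform of x[j1][j2]: first all columns, then all rows."""
    # Pass 1: transform every column (fixed j2, running over j1).
    cols = [dft(isign, [x[j1][j2] for j1 in range(n1)]) for j2 in range(n2)]
    # Pass 2: transform every row (fixed k1, running over j2), then scale.
    y = []
    for k1 in range(n1):
        row = dft(isign, [cols[j2][k1] for j2 in range(n2)])
        y.append([scale * v for v in row])
    return y


n1, n2 = 4, 3
x = [[complex(j1 + 1, j2) for j2 in range(n2)] for j1 in range(n1)]
y = dft2d(+1, n1, n2, 1.0, x)
back = dft2d(-1, n1, n2, 1.0 / (n1 * n2), y)
err = max(abs(back[a][b] - x[a][b]) for a in range(n1) for b in range(n2))
```

The round trip with -isign and scale 1/(n1 · n2) recovers the input to rounding error, matching the inverse rule given in the DESCRIPTION section.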
Initialization
The table array stores factors of n1 and n2 and also trigonometric tables that are used in calculation of the
FFT. This table must be initialized by calling the routine with isign = 0. If the values of the problem sizes,
n1 and n2, do not change, the table does not have to be reinitialized.
Dimensions
The preceding description assumes that array subscripts are zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
      COMPLEX X(0:ldx-1, 0:n2-1)
      COMPLEX Y(0:ldy-1, 0:n2-1)
However, the calling sequence does not change if you prefer to use the more customary Fortran style with
subscripts starting at 1. The same values of ldx and ldy would be passed to the subroutine even if the input
and output arrays were dimensioned as follows:
      COMPLEX X(ldx, n2)
      COMPLEX Y(ldy, n2)
Performance Tips
This routine computes an FFT for any values of n1 and n2, but the performance depends on the prime
factorizations of n1 and n2. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when both n1 and n2 are powers of 2. In that case, the number of
floating-point operations is approximately $5 \cdot n_1 \cdot n_2 \cdot \log_2(n_1 \cdot n_2)$.
If either n1 or n2 contains factors of 3, computation time is slightly longer, because more floating-point
operations are required. If they contain powers of 5, it is longer still.
The kernel routines are optimized for values of n1 and n2 that are products of powers of 2, 3, and 5.
In the UNICOS systems implementation, it is very important to make the leading dimensions of the arrays
odd numbers (or, if that is not possible, make them an odd multiple of 2) to avoid memory bank conflicts.
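One simple way to follow this advice is to pad the leading dimension up to the next odd number, as the examples in this man page do when they use 129 for a transform size of 128. The Python sketch below shows that rule; the function name `padded_leading_dimension` is hypothetical, and it covers only the odd-number option, not the odd-multiple-of-2 fallback.

```python
def padded_leading_dimension(n1):
    """Return the smallest odd leading dimension that is >= MAX(n1, 1)."""
    ld = max(n1, 1)
    return ld if ld % 2 == 1 else ld + 1


ldx = padded_leading_dimension(128)
```

The extra row costs a little storage but keeps consecutive columns from starting at an even stride.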
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they could be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to three areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. No
change is required to the subroutine call, but you may have to change array sizes in the DIMENSION or
type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
using an argument value of 0.
In the UNICOS systems implementation, no special options are supported; therefore, you can always
specify the isys parameter as a constant 0. Subsequent software releases may provide other options.
• The third area is the issue of which problem sizes or dimensions give optimal performance in a particular
implementation. See the Performance Tips subsection.
EXAMPLES
All examples here are for UNICOS systems only.
Example 1: Initialize the TABLE array in preparation for doing a two-dimensional FFT of size 128 by 256.
In this case only the isign, n1, n2, and table arguments are used; you can use dummy arguments or zeros for
other arguments.
      REAL TABLE(100 + 2*(128 + 256))
      CALL CCFFT2D(0, 128, 256, 0.0, DUMMY, 1, DUMMY, 1,
     &             TABLE, DUMMY, 0)
Example 2: X and Y are complex arrays of dimension (0:128, 0:255). The first 128 elements of each
column contain data. For performance reasons, the extra element forces the leading dimension to be an odd
number. Take the two-dimensional FFT of X and store it in Y. Initialize the TABLE array, as in example 1.
      COMPLEX X(0:128, 0:255)
      COMPLEX Y(0:128, 0:255)
      REAL TABLE(100 + 2*(128 + 256))
      REAL WORK(512*256)
      ...
      CALL CCFFT2D(0, 128, 256, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
      CALL CCFFT2D(1, 128, 256, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/(128 · 256) is used. Assume that the TABLE array is already initialized.
      CALL CCFFT2D(-1, 128, 256, 1.0/(128.0*256.0), Y, 129,
     &             X, 129, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls are not changed.
      COMPLEX X(129, 256)
      COMPLEX Y(129, 256)
      ...
      CALL CCFFT2D(0, 128, 256, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
      CALL CCFFT2D(1, 128, 256, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but put the output back in array X to save storage
space. Assume that the TABLE array is already initialized.
      COMPLEX X(129, 256)
      ...
      CALL CCFFT2D(1, 128, 256, 1.0, X, 129, X, 129, TABLE, WORK, 0)
SEE ALSO
CCFFT(3S), CCFFT3D(3S), CCFFTM(3S), SCFFT(3S), SCFFT2D(3S), SCFFT3D(3S), SCFFTM(3S)
NAME
CCFFT3D – Applies a three-dimensional complex-to-complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL CCFFT3D (isign, n1, n2, n3, scale, x, ldx, ldx2, y, ldy, ldy2, table, work, isys)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.
DESCRIPTION
CCFFT3D computes the three-dimensional complex FFT of the complex matrix X, and it stores the results in
the complex matrix Y.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. So
suppose the arrays are dimensioned as follows:
      COMPLEX X(0:n1-1, 0:n2-1, 0:n3-1)
      COMPLEX Y(0:n1-1, 0:n2-1, 0:n3-1)
NOTES
CCFFT3D is the generalization of CCFFT2D to three dimensions. All of the notes for CCFFT2D apply,
with the obvious modifications for three dimensions.
EXAMPLES
The following examples are for UNICOS systems only. CCFFT3D(3S) on UNICOS/mk systems provides
the functionality of PCCFFT3D(3S) on a single PE. For notes on CCFFT3D(3S) on UNICOS/mk systems,
see PCCFFT3D(3S).
In all of the following examples, isys is set to 0. For small three-dimensional FFTs, setting isys = 1 and
providing adequate workspace can yield better performance.
Example 1: Initialize the TABLE array in preparation for doing a three-dimensional FFT of size 128 by 128
by 128. In this case, only the isign, n1, n2, n3, and table arguments are used; you can use dummy
arguments or zeros for other arguments.
      REAL TABLE(100 + 2*(128 + 128 + 128))
      CALL CCFFT3D(0, 128, 128, 128, 0.0, DUMMY, 1, 1, DUMMY, 1, 1,
     &             TABLE, DUMMY, 0)
Example 2: X and Y are complex arrays of dimension (0:128, 0:128, 0:128). The first 128 elements of each
dimension contain data; for performance reasons, the extra element forces the leading dimensions to be odd
numbers. Take the three-dimensional FFT of X and store it in Y. Initialize the TABLE array, as in example
1.
      COMPLEX X(0:128, 0:128, 0:128)
      COMPLEX Y(0:128, 0:128, 0:128)
      REAL TABLE(100 + 2*(128 + 128 + 128))
      REAL WORK(512*128)
      ...
      CALL CCFFT3D(0, 128, 128, 128, 1.0, DUMMY, 1, 1,
     &             DUMMY, 1, 1, TABLE, WORK, 0)
      CALL CCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
     &             Y, 129, 129, TABLE, WORK, 0)
Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor $1/128^3$ is used. Assume that the TABLE array is already initialized.
      CALL CCFFT3D(-1, 128, 128, 128, 1.0/(128.0**3), Y, 129, 129,
     &             X, 129, 129, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls do not change.
      COMPLEX X(129, 129, 129)
      COMPLEX Y(129, 129, 129)
      ...
      CALL CCFFT3D(0, 128, 128, 128, 1.0, DUMMY, 1, 1,
     &             DUMMY, 1, 1, TABLE, WORK, 0)
      CALL CCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
     &             Y, 129, 129, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but put the output back in the array X to save
storage space. Assume that the TABLE array is already initialized.
      COMPLEX X(129, 129, 129)
      ...
      CALL CCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
     &             X, 129, 129, TABLE, WORK, 0)
SEE ALSO
CCFFT(3S), CCFFT2D(3S), CCFFTM(3S), SCFFT(3S), SCFFT2D(3S), SCFFT3D(3S), SCFFTM(3S)
NAME
CCFFTM – Applies multiple multitasked complex-to-complex Fast Fourier Transforms (FFTs)
SYNOPSIS
CALL CCFFTM (isign, n, lot, scale, x, ldx, y, ldy, table, work, isys)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
CCFFTM computes the FFT of each column of the complex matrix X, and it stores the results in the columns
of complex matrix Y.
Suppose the arrays are dimensioned as follows:
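      COMPLEX X(0:ldx-1, 0:lot-1)
      COMPLEX Y(0:ldy-1, 0:lot-1)
Each column of the output array is the FFT of the corresponding column of the input array, computed as
follows:
$$Y_{k,l} = \mathrm{scale} \cdot \sum_{j=0}^{n-1} X_{j,l} \, \omega^{\,\mathrm{isign} \cdot j \cdot k} \quad \text{for } k = 0, \ldots, n-1 \text{ and } l = 0, \ldots, \mathrm{lot}-1$$
where:
$\omega = e^{2 \pi i / n}$
$i = +\sqrt{-1}$
$\pi = 3.14159\ldots$
isign = ±1
lot = Number of columns to transform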
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n · scale). In
particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by
using the following: isign = -1 and scale = 1.0/n.
NOTES
This section contains information about the algorithm for CCFFTM, the initialization of the table array, the
declaration of dimensions for x and y arrays, some performance tips, and some implementation-dependent
details.
Algorithm
UNICOS only: CCFFTM uses a decimation-in-frequency type FFT. It takes the FFT of the columns and
vectorizes the operations along the rows of the matrix. Thus, the vector length in the calculations depends
on the row size, and the strides for vector loads and stores are the leading dimensions, ldx and ldy.
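The role of the leading dimensions is easiest to see with the matrices stored as flat column-major arrays, as Fortran stores them. The Python sketch below transforms each of the lot columns with a naive DFT and addresses element (j, l) as `x[j + ldx*l]`. It illustrates the data layout only and makes no attempt at the vectorization described above; the function name `ccfftm_ref` is hypothetical, and the per-column formula is assumed to be the same one documented for CCFFT.

```python
import cmath


def ccfftm_ref(isign, n, lot, scale, x, ldx, y, ldy):
    """Transform the first n entries of each of lot columns.

    x and y are flat lists holding column-major matrices with leading
    dimensions ldx and ldy; rows n..ld-1 are padding and are left untouched.
    """
    for col in range(lot):
        # Compute the whole column first so that x and y may be the same list.
        out = []
        for k in range(n):
            acc = 0j
            for j in range(n):
                w = cmath.exp(2j * cmath.pi * isign * j * k / n)
                acc += x[j + ldx * col] * w
            out.append(scale * acc)
        for k in range(n):
            y[k + ldy * col] = out[k]
    return y


n, lot, ldx, ldy = 4, 2, 5, 6
x = [1, 0, 0, 0, 99,     # column 0: an impulse, then one padding element
     1, 1, 1, 1, 99]     # column 1: a constant, then one padding element
y = ccfftm_ref(+1, n, lot, 1.0, x, ldx, [-1] * (ldy * lot), ldy)
```

The impulse column transforms to all ones and the constant column to (4, 0, 0, 0), while the padding entries of y keep their initial value of -1.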
On UNICOS/mk systems, this routine is not optimized.
Initialization
The table array stores the trigonometric tables used in calculation of the FFT. You must initialize the table
array by calling the routine with isign = 0 prior to doing the transforms. If the value of the problem size, n,
does not change, table does not have to be reinitialized.
Dimensions
The preceding description assumes that array subscripts are zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
      COMPLEX X(0:ldx-1, 0:lot-1)
      COMPLEX Y(0:ldy-1, 0:lot-1)
The calling sequence does not have to change, however, if you prefer to use the more customary Fortran
style with subscripts starting at 1. The same values of ldx and ldy would be passed to the subroutine even if
the input and output arrays were dimensioned as follows:
      COMPLEX X(ldx, lot)
      COMPLEX Y(ldy, lot)
Performance Tips
This routine computes an FFT for any value of n, but the performance for a given value of n depends on the
prime factorization of n. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when n is a power of 2. In that case, the number of floating-point operations
is approximately $5 \cdot \mathrm{lot} \cdot n \log_2(n)$.
If n contains factors of 3, computation time is slightly longer, because more floating-point operations are
required. If n contains powers of 5, it is longer still. Slowest performance is when n is a prime number. In
that case, the number of floating-point operations is approximately $8 \cdot \mathrm{lot} \cdot n^2$.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5. Because the
kernel routines have a special case for multiples of 4, even powers of 2 will be slightly faster than odd
powers of 2.
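As a rough planning aid, the operation counts quoted above can be evaluated directly, and a candidate n can be checked for factors other than 2, 3, and 5. This is a sketch in Python; the function names are hypothetical and are not part of the library.

```python
import math


def is_235_smooth(n):
    """Return True if n has no prime factors other than 2, 3, and 5."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def next_235_smooth(n):
    """Smallest size >= n whose only prime factors are 2, 3, and 5."""
    m = max(n, 1)
    while not is_235_smooth(m):
        m += 1
    return m


def flops_power_of_two(n, lot):
    """Approximate operation count quoted for n a power of 2."""
    return 5 * lot * n * math.log2(n)


def flops_prime(n, lot):
    """Approximate operation count quoted for n a prime number."""
    return 8 * lot * n * n
```

For example, with lot = 50 the estimate for n = 128 is 224,000 operations, while the estimate for the prime n = 127 is 6,451,600.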
In this implementation, to avoid memory bank conflicts, it is very important to make the leading dimensions
of the arrays odd numbers (or, if that is not possible, make them an odd multiple of 2). To attain best
vectorization performance, the lot size should be at least 64, and preferably, it should be a multiple of 64.
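One simple way to follow the advice about leading dimensions is to round an even row count up to the next odd number when declaring the arrays. The helper below is a hypothetical Python sketch of that rule, not a library routine.

```python
def odd_leading_dimension(n):
    """Smallest odd leading dimension that can hold n rows."""
    return n if n % 2 == 1 else n + 1
```

For n = 128 this gives 129, the leading dimension used in Example 2 on this page.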
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they can be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to three areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. The
subroutine call requires no change, but you may have to change the array sizes in the DIMENSION or
type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.
In the UNICOS systems implementation, no special options are supported; therefore, you can always
specify the isys parameter as a constant 0. Subsequent software releases may provide other options.
• The third area is the issue of which problem sizes or dimensions give optimal performance in a particular
implementation. See the Performance Tips subsection.
EXAMPLES
Example 1: Initialize the TABLE array in preparation for doing an FFT of size 128. Only the isign, n, and
table arguments are used in this case. You can use dummy arguments or zeros for the other arguments in
the subroutine call.
      REAL TABLE(100 + 2*128)
      CALL CCFFTM(0, 128, 0, 0., DUMMY, 1, DUMMY, 1, TABLE, DUMMY, 0)
Example 2: X and Y are complex arrays of dimension (0:128) by (0:55). The first 128 elements of each
column contain data. For performance reasons, the extra element forces the leading dimension to be an odd
number. Take the FFT of the first 50 columns of X and store the results in the first 50 columns of Y.
Before taking the FFT, initialize the TABLE array, as in example 1.
      COMPLEX X(0:128, 0:55)
      COMPLEX Y(0:128, 0:55)
      REAL TABLE(100 + 2*128)
      REAL WORK(4*128*50)
      ...
      CALL CCFFTM(0, 128, 50, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
      CALL CCFFTM(1, 128, 50, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/128 is used. Assume that the TABLE array is already initialized.
      CALL CCFFTM(-1, 128, 50, 1./128., Y, 129, X, 129, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls are not changed.
      COMPLEX X(129, 55)
      COMPLEX Y(129, 55)
      ...
      CALL CCFFTM(0, 128, 50, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
      CALL CCFFTM(1, 128, 50, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but put the output back in array X to save storage
space. Assume that the TABLE array is already initialized.
      COMPLEX X(129, 55)
      ...
      CALL CCFFTM(1, 128, 50, 1.0, X, 129, X, 129, TABLE, WORK, 0)
SEE ALSO
CCFFT(3S), SCFFT(3S), SCFFTM(3S)
NAME
CCNVL – Computes the convolution of a complex sequence with one or more other complex sequences
SYNOPSIS
CALL CCNVL (nh, nx, m, ny, h, inc1h, x, inc1x, inc2x, y, inc1y, inc2y)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
CCNVL computes the convolution of the complex sequence h with one or more complex sequences x.
This routine has the following arguments:
nh Integer. (input)
Number of elements in the h sequence. nh ≥ 0. If nh = 0, CCNVL zeroes out all elements of
output matrix y.
nx Integer. (input)
Number of elements in each sequence of x values. nx ≥ 0. If nx = 0, CCNVL zeroes out all
elements of output matrix y.
m Integer. (input)
Number of sequences of x values. m ≥ 0. If m = 0, CCNVL returns without calculating a
convolution.
ny Integer. (input)
Number of elements in output sequence y. ny ≥ 0. If ny = 0, CCNVL returns without calculating
a convolution.
h Complex array of dimension nh. (input)
Input sequence to be convolved with x.
inc1h Integer. (input)
Address increment between elements in array h. inc1h must not be zero.
x Complex array of dimension (nx, m). (input)
Input matrix to be convolved with h.
inc1x Integer. (input)
Address increment between elements in each sequence of x values. inc1x must not be zero.
inc2x Integer. (input)
Address increment between sequences of x values. inc2x must not be 0.
y Complex array of dimension (ny, m). (output)
Output matrix of convolutions.
NOTES
The following notes define the convolution more precisely, and discuss its uses and performance.
Convolution Definition
The precise definition of convolution computed by CCNVL is as follows:
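Let h be a sequence of nh elements and X be a matrix with m columns and nx elements in each column, as
follows:
$$
h = \begin{pmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_{nh-1} \end{pmatrix},
\qquad
X = \begin{pmatrix}
x_{0,0} & x_{0,1} & x_{0,2} & \cdots & x_{0,m-1} \\
x_{1,0} & x_{1,1} & x_{1,2} & \cdots & x_{1,m-1} \\
x_{2,0} & x_{2,1} & x_{2,2} & \cdots & x_{2,m-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{nx-1,0} & x_{nx-1,1} & x_{nx-1,2} & \cdots & x_{nx-1,m-1}
\end{pmatrix}
$$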
A complex convolution is similar to the multiplication of complex polynomials. You can think of the
sequence h as a sequence of coefficients of the polynomial
$$H(z) = h_0 + h_1 z + h_2 z^2 + h_3 z^3 + \cdots + h_{nh-1} z^{nh-1}$$
Similarly, each column
$$x_j = \begin{pmatrix} x_{0,j} \\ \vdots \\ x_{nx-1,j} \end{pmatrix}$$
of the matrix X can be considered the coefficients of the polynomial
$$X_j(z) = x_{0,j} + x_{1,j} z + x_{2,j} z^2 + x_{3,j} z^3 + \cdots + x_{nx-1,j} z^{nx-1}$$
The convolution product of h and each column, $y_j = h * x_j$, is the sequence whose elements are the
coefficients of the product polynomial
$$Y_j(z) = H(z) X_j(z)$$
The operation of convolution is commutative, so that $h * x = x * h$. In this subroutine, however, h and x are
not interchangeable, because h is restricted to be a vector, but x can be a matrix of one or more vectors, all
of the same length.
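The polynomial-product definition can be written out directly. The following Python sketch is illustrative and is not the CCNVL implementation; the names convolve and convolve_columns are hypothetical. It assumes that when ny is shorter than the full product the extra coefficients are dropped, and that when ny is longer the remainder is zero; this page does not state either behavior.

```python
def convolve(h, x, ny=None):
    """Coefficients of H(z) * X(z), kept to ny terms."""
    nh, nx = len(h), len(x)
    if ny is None:
        ny = nh + nx - 1 if nh and nx else 0
    y = [0j] * ny
    for i in range(nh):
        for j in range(nx):
            if i + j < ny:
                # The term h_i z^i times x_j z^j contributes to z^(i+j).
                y[i + j] += h[i] * x[j]
    return y


def convolve_columns(h, columns, ny=None):
    """Convolve h with each column of X (given as a list of columns)."""
    return [convolve(h, col, ny) for col in columns]
```

For instance, h = (1, 2) and x = (1, 3) correspond to (1 + 2z)(1 + 3z) = 1 + 5z + 6z^2, so the result is (1, 5, 6).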
Uses for Convolution
Convolutions have numerous applications in signal processing, where the convolution operation is sometimes
called filtering. The sequence h might represent a filter, and the matrix x might represent a set of input
signals. The output matrix y would represent the output signals obtained by filtering (convolving) x with h.
Performance
If the NCPUS environment variable is set and greater than 1, CCNVL multitasks on m; that is, it performs the
convolutions of successive x sequences in parallel.
This routine is efficient for any value of the arguments. For long sequences, however, there is a faster
algorithm that uses a Fourier transform technique. For details, see the CCNVLF(3S) routine.
SEE ALSO
CCNVLF(3S)
NAME
CCNVLF – Computes the convolution of a complex sequence with one or more other complex sequences by
using a Fourier transform method
SYNOPSIS
CALL CCNVLF (nh, nx, m, ny, h, inc1h, x, inc1x, inc2x, y, inc1y, inc2y, table, ntable,
work, nwork)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
CCNVLF computes the convolution of the complex sequence h with one or more complex sequences x.
CCNVLF has exactly the same effect as the CCNVL(3S) routine. The difference is in the algorithm.
CCNVL(3S) computes the convolution directly, but CCNVLF uses a Fourier transform method, using a
routine similar to CCFFTM(3S). CCNVLF requires additional space for tables and workspace, but for long
sequences, it can be significantly faster than CCNVL(3S).
See the CCNVL(3S) man page for a definition of the convolution product that is computed by both routines.
This routine has the following arguments:
nh Integer. (input)
Number of elements in sequence h. nh ≥ 0. If nh = 0, CCNVLF zeroes out all elements of
output matrix y.
nx Integer. (input)
Number of elements in each sequence of x values. nx ≥ 0. If nx = 0, CCNVLF zeroes out all
elements of output matrix y.
m Integer. (input)
Number of sequences of x values. m ≥ 0. If m = 0, CCNVLF returns without calculating a
convolution.
ny Integer. (input)
Number of elements in output sequence y. ny ≥ 0. If ny = 0, CCNVLF returns without
calculating a convolution.
h Complex array of dimension nh. (input)
Input sequence to be convolved with x.
inc1h Integer. (input)
Address increment between elements in array h. inc1h must not be zero.
x Complex array of dimension (nx, m). (input)
Input matrix to be convolved with h.
NOTES
The computed output values y are the same as those computed by routine CCNVL(3S). See the CCNVL(3S)
man page for a definition of the convolution product.
The algorithm of this routine uses the famous Convolution Theorem, which states that the Fourier transform
of a convolution is the product of the Fourier transforms of the original sequences.
This routine performs a Fast Fourier Transform (FFT) on each of the sequences h and x, and an inverse FFT
on the product to compute the convolution. In each case, the order of each FFT is n = nh + nx.
The routine works correctly for any value of n, but as with all FFT routines, the performance depends on the
prime factorization of n. For the best performance, n should be a power of 2. If n is a moderately large
number that is a product of powers of 2, 3, and 5, you will still get very good performance. To obtain a
good value of n, the input sequences can be padded with zeros, if necessary. See the CCFFTM(3S) man page
for general information about FFTs.
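The Fourier transform method described above can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions, not the CCNVLF algorithm itself: it uses a direct O(n^2) transform in place of an FFT, pads both sequences with zeros to the order n = nh + nx given above, and requires non-empty inputs. The names dft and convolve_fourier are hypothetical.

```python
import cmath


def dft(seq, isign):
    """Direct discrete Fourier transform with exponent sign isign."""
    n = len(seq)
    w = cmath.exp(isign * 2j * cmath.pi / n)
    return [sum(seq[j] * w ** (j * k) for j in range(n)) for k in range(n)]


def convolve_fourier(h, x):
    """Convolve h and x by multiplying their transforms."""
    n = len(h) + len(x)
    hp = list(h) + [0] * (n - len(h))   # zero-pad h to length n
    xp = list(x) + [0] * (n - len(x))   # zero-pad x to length n
    product = [a * b for a, b in zip(dft(hp, +1), dft(xp, +1))]
    # Inverse transform, scaled by 1/n.
    return [v / n for v in dft(product, -1)]
```

Because n is at least nh + nx - 1, the cyclic convolution computed by the transforms does not wrap around, and the result agrees with the direct definition up to rounding error.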
The table array is initialized on the first call to routine CCNVLF. It is not reinitialized in subsequent calls
unless the value of n changes.
If the NCPUS environment variable is set and greater than 1, this routine multitasks on m; that is, it performs
the convolutions of successive x sequences in parallel.
SEE ALSO
CCFFTM(3S), CCNVL(3S)
NAME
CFFT – Applies a multitasked complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL CFFT (isign, n, scale, x, incx, y, incy, table, ntable, work, nwork)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
CFFT computes the FFT of the complex vector x, and it stores the result in vector y. For most purposes,
CFFT is superseded by the FFT routine CCFFT(3S).
Suppose arrays X and Y are dimensioned as follows:
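      COMPLEX X(0:N-1), Y(0:N-1)
The output array is the FFT of the input array, computed as follows:
$$Y_k = \mathit{scale} \sum_{j=0}^{n-1} X_j \, \omega^{jk} \quad \text{for } k = 0, \ldots, n-1$$
where
$$\omega = e^{(\mathit{isign}) 2\pi i / n}, \quad \mathit{isign} = \pm 1, \quad \pi = 3.14159\ldots, \quad e = 2.71828\ldots, \quad i = \sqrt{-1}$$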
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. In this routine, when
isign = +1 it is called the forward transform, and when isign = – 1 it is called the inverse transform.
This routine has the following arguments:
NOTES
This section contains information about the algorithm for CFFT, the initialization of the table array, the size
of the table and work arrays, and some performance tips.
Algorithm
The algorithm for CFFT is the "Four Step FFT," which is as follows:
Let n be the order of the transform.
Let $n = n1 \cdot n2$, where n1 and n2 are close to the square root of n and are chosen for performance. Then:
1. Perform n1 simultaneous n2-point FFTs on the n data elements, treated as an n1-by-n2 matrix.
2. Multiply the resulting matrix by the phase factors $\alpha^{jk}$.
3. Transpose the resulting data array, treated as an n1-by-n2 matrix, into an n2-by-n1 matrix.
4. Perform n2 simultaneous n1-point FFTs on the n data elements, treated as an n2-by-n1 matrix.
CFFT includes a special case for n2 = 1, in which case, a conventional FFT is done.
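The four steps can be sketched in plain Python and checked against a direct transform. This is an illustration under stated assumptions, not the CFFT implementation: it assumes the phase factor is the n-th root of unity with the sign given by isign, that the data are viewed column by column as an n1-by-n2 matrix, and it uses a direct transform for the short FFTs. The names dft and four_step_fft are hypothetical.

```python
import cmath


def dft(seq, isign):
    """Direct discrete Fourier transform with exponent sign isign."""
    n = len(seq)
    w = cmath.exp(isign * 2j * cmath.pi / n)
    return [sum(seq[j] * w ** (j * k) for j in range(n)) for k in range(n)]


def four_step_fft(x, n1, n2, isign=1):
    """Transform of x (length n1*n2) by the four-step method."""
    n = n1 * n2
    assert len(x) == n
    w = cmath.exp(isign * 2j * cmath.pi / n)
    # Step 1: n1 transforms of length n2; element (j1, j2) is x[j1 + n1*j2].
    rows = [dft([x[j1 + n1 * j2] for j2 in range(n2)], isign)
            for j1 in range(n1)]
    # Step 2: multiply element (j1, k2) by the phase factor w**(j1*k2).
    rows = [[rows[j1][k2] * w ** (j1 * k2) for k2 in range(n2)]
            for j1 in range(n1)]
    # Step 3: transpose the n1-by-n2 matrix into an n2-by-n1 matrix.
    cols = [[rows[j1][k2] for j1 in range(n1)] for k2 in range(n2)]
    # Step 4: n2 transforms of length n1.
    cols = [dft(cols[k2], isign) for k2 in range(n2)]
    # Read the n2-by-n1 result out column by column.
    y = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            y[k2 + n2 * k1] = cols[k2][k1]
    return y
```

With n2 = 1 the phase factors are all 1 and the transpose is trivial, so the sketch reduces to a single transform of length n, which mirrors the special case noted above.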
Initialization
The table array is used to store factors of n1 and n2 and trigonometric tables that are used in calculation of
the FFT. You can initialize table explicitly by calling the routine with isign = 0. If you do not initialize
table, the routine does so automatically on the first call. If the value of the problem size, n, does not change
between calls, the table does not need to be reinitialized. If you call the routine with a different value of n
without first reinitializing the table, the routine reinitializes the table automatically.
Re-initialization of table is relatively time-consuming. If you are continually changing the problem size, you
might consider using more than one table array so that it will not have to be reinitialized on each subroutine
call.
If you initialize the table explicitly by calling the routine with isign = 0, the only arguments that are
significant are isign, n, table, and ntable. In this case, the other arguments are ignored.
The value of ntable is checked when the table is initialized to verify that the table space you provided is
large enough. If it is not, the routine stops after printing an error message, which indicates the amount of
table space required. (See the following subsection.)
Size of Table and Work Arrays
The precise sizes of the table and work arrays depend on the numbers n1 and n2 in the factorization
$n = n1 \cdot n2$:
$$\mathit{ntable} = 100 + 2(n1 + n2 + np)$$
$$\mathit{nwork} = 4 \, np$$
where $np = 2n$ if n2 is odd, and $np = 2n + n1$ if n2 is even.
Because the user does not know in advance the values of n1 and n2, the estimates given in the preceding
argument list may be used in all cases. If insufficient table or workspace is provided, the subroutine stops
after printing an error message that tells exactly how much space was needed.
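If n1 and n2 were known, the exact sizes would follow from the formulas above. The Python sketch below evaluates them; the function name cfft_array_sizes is hypothetical, and because the routine chooses n1 and n2 internally, the values passed in here are assumptions made for illustration.

```python
def cfft_array_sizes(n1, n2):
    """Return (ntable, nwork) for a given factorization n = n1 * n2."""
    n = n1 * n2
    # np depends on the parity of n2.
    np_ = 2 * n if n2 % 2 == 1 else 2 * n + n1
    ntable = 100 + 2 * (n1 + n2 + np_)
    nwork = 4 * np_
    return ntable, nwork
```

For n = 12 split as n1 = 3, n2 = 4, this gives ntable = 168 and nwork = 108; split as n1 = 4, n2 = 3, it gives ntable = 162 and nwork = 96.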
Performance Tips
CFFT computes an FFT for any value of n, but the performance for a given value of n depends on the
factorization of n. This is characteristic of all FFT algorithms.
Best performance is realized when n is a power of 2. In that case, the number of arithmetic operations is
proportional to $n \log_2(n)$.
Performance is slightly worse when n contains factors of 3; it is even worse when n contains powers of 5.
The worst performance is when n is a prime number. In that case, the number of operations is proportional
to $n^2$. The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5.
CFFT is rather slow for small values of n (for example, n < 128) because in such cases, the vector lengths
are very short. For small n, however, performance should not matter unless you are performing many
transforms. In this case, you should use the MCFFT(3S) routine (multiple complex FFT). MCFFT(3S)
vectorizes in the "lot direction," and performance can be quite good even for small values of n.
CFFT runs in multitasked mode for large values of n. If
$$\frac{\mathrm{MIN}(n1, n2)}{\mathit{ncpus}} \ge 16$$
where ncpus is the number of CPUs being used, the calculation runs in multitasked mode. The values of n
must be fairly large to realize an appreciable performance gain from multitasking, however, because the
vector lengths are proportional to the square root of n.
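The multitasking condition is a simple threshold test. A hypothetical Python helper (not a library routine) that evaluates it for assumed values of n1, n2, and ncpus:

```python
def runs_multitasked(n1, n2, ncpus):
    """True if MIN(n1, n2) / ncpus is at least 16."""
    return min(n1, n2) / ncpus >= 16
```

With 4 CPUs, a 64-by-64 factorization meets the threshold, while a 32-by-64 factorization does not.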
EXAMPLES
The following is a test program for CFFT. It computes the FFT of a random sequence of complex numbers,
first using the direct definition of the Fourier Transform, and then using CFFT. Afterward, it compares the
results. Finally, it computes the inverse transform (dividing by N), and compares with the original data.
      PARAMETER (N = 4*3*5*7)
      COMPLEX I, W, X(0:N-1), Y(0:N-1), YY(0:N-1)
      PARAMETER (NTABLE = 100 + 8*N, NWORK = 12*N)
      REAL TABLE(NTABLE), WORK(NWORK)
      PARAMETER (I = (0.0, 1.0))
      LOGICAL LFWD, LINV
*-----------------------------------------------------
*     Initialize input array, X, with random
*     complex numbers.
      DO 10, J = 0, N-1
         X(J) = CMPLX(RANF(), RANF())
   10 CONTINUE
*-----------------------------------------------------
SEE ALSO
CCFFT(3S), which supersedes most uses of CFFT
CCFFT2D(3S), CFFT2D(3S) to calculate a two-dimensional FFT. CCFFT2D(3S) supersedes most uses of
CFFT2D(3S).
CCFFT3D(3S), CFFT3D(3S) to calculate a three-dimensional FFT. CCFFT3D(3S) supersedes most uses of
CFFT3D(3S).
CCFFTM(3S), MCFFT(3S) to calculate multiple one-dimensional FFTs. CCFFTM(3S) supersedes most uses
of MCFFT(3S).
NAME
CFFT2 – Applies a complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL CFFT2 (init, ix, n, x, work, y)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
CFFT2 performs the following calculation:
$$y_k = \sum_{j=0}^{n-1} x_j \exp\left(\pm \frac{2\pi i}{n} \, j k\right) \quad \text{for } k = 0, \ldots, n-1, \text{ where } i^2 = -1$$
The sign of the exponent is the same as the sign of the argument ix. This routine has the following
arguments:
init Integer. (input)
If nonzero, generates sine and cosine tables in work. If 0, calculates FFTs by using sine and
cosine tables of the previous call.
ix Integer. (input)
>0 Calculates a forward transform
<0 Calculates an inverse transform
n Integer. (input)
Size of the Fourier transform ($2^m$, where m ≥ 3).
x Complex array of dimension n. (input)
Input vector. Range of x:
$$\frac{n}{10^{2466}} \le x_i \le \frac{10^{2466}}{n}, \quad \text{for } i = 1, 2, \ldots, n.$$
Vector x can be equivalenced to the work vector. In this case, scratch work overwrites the
input values.
work Complex array of dimension $5n/2$. (scratch output)
Work storage vector.
y Complex array of dimension n. (output)
Result vector.
SEE ALSO
CCFFT(3S) (which supersedes CFFT2 only on Cray Y-MP systems), CCFFTM(3S), CRFFT2(3S),
RCFFT2(3S)
NAME
CFFT2D – Applies a multitasked two-dimensional complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL CFFT2D (isign, n1, n2, scale, x, inc1x, inc2x, y, inc1y, inc2y, table, ntable, work,
nwork)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
CFFT2D computes the two-dimensional complex Fourier transform of the complex matrix X, and it stores
the results in the complex matrix Y. For most purposes, CFFT2D is superseded by
CCFFT2D(3S).
Suppose the matrices are stored in Fortran arrays dimensioned as follows:
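      COMPLEX X(0:N1-1, 0:N2-1)
      COMPLEX Y(0:N1-1, 0:N2-1)
The output matrix is computed as follows:
$$Y_{k1,k2} = \mathit{scale} \sum_{j1=0}^{n1-1} \sum_{j2=0}^{n2-1} X_{j1,j2} \, \omega_1^{j1 \cdot k1} \, \omega_2^{j2 \cdot k2} \quad \text{for } k1 = 0, \ldots, n1-1, \; k2 = 0, \ldots, n2-1$$
where:
$$\omega_1 = e^{(\mathit{isign}) 2\pi i / n1}, \quad \omega_2 = e^{(\mathit{isign}) 2\pi i / n2}, \quad \mathit{isign} = \pm 1, \quad \pi = 3.14159\ldots, \quad e = 2.71828\ldots, \quad i = \sqrt{-1}$$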
In this documentation, when isign = +1 it is called the forward transform, and when isign = – 1 it is called
the inverse transform.
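The two-dimensional formula can be evaluated directly for small matrices, which is useful for checking results. The following Python sketch is a reference evaluation of the double sum and is not how CFFT2D computes it; the name dft2d is hypothetical, and the matrix is passed as a list of rows.

```python
import cmath


def dft2d(x, isign, scale=1.0):
    """Evaluate the double sum directly for an n1-by-n2 matrix x."""
    n1, n2 = len(x), len(x[0])
    w1 = cmath.exp(isign * 2j * cmath.pi / n1)
    w2 = cmath.exp(isign * 2j * cmath.pi / n2)
    return [[scale * sum(x[j1][j2] * w1 ** (j1 * k1) * w2 ** (j2 * k2)
                         for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]
```

Applying the forward transform (isign = +1, scale = 1) and then the inverse transform (isign = -1, scale = 1/(n1 n2)) recovers the original matrix up to rounding error.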
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array, or whether to do the forward or inverse transform:
isign = 0 Initializes the table array
isign = +1 Computes the forward transform
isign = – 1 Computes the inverse transform
n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, CFFT2D returns without performing
a transform.
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, CFFT2D returns without
performing a transform.
scale Real. (input)
Real scale factor. Each element of the output array is multiplied by scale after taking the
Fourier transform, as defined previously.
x Complex array. (input)
Input array of values to be transformed.
inc1x Integer. (input)
x increment in the first dimension (the address increment between successive complex row
elements of input array x). To use every row element in a given column, set inc1x = 1. inc1x
must not be 0.
inc2x Integer. (input)
x increment in the second dimension (the address increment between successive complex column
elements of input array x). To use every column element in a given row, set inc2x to the
leading dimension of the complex array x. inc2x must not be 0.
y Complex array. (output)
Output array of transformed values. The output array may be the same as the input array. In
that case, the transform is done in place.
inc1y Integer. (input)
y increment in the first dimension (the address increment between successive complex row
elements of output array y). To use every row element in a given column, set inc1y = 1. inc1y
must not be 0.
inc2y Integer. (input)
y increment in the second dimension (the address increment between successive complex column
elements of output array y). To use every column element in a given row, set inc2y to the
leading dimension of the complex array y. inc2y must not be 0.
table Real array of dimension ntable. (input or output)
Table of factors and trigonometric functions. This array may be initialized by a call to CFFT2D
with isign = 0.
ntable Integer. (input)
Number of (real) words in table. The value of ntable should be at least 2(n1 + n2) + 100. If
not enough space is provided, CFFT2D prints an error message and stops.
work Real array. (scratch output)
Work array of size nwork. This is a scratch array used for intermediate calculations. It must be
a different address space from the input and output arrays.
NOTES
This section includes information about the algorithm for CFFT2D, initialization of arrays, the significance
of the increment arguments, and performance.
Algorithm
CFFT2D uses MCFFT(3S) to do multiple FFTs, first on all of the columns of the input matrix and then on
all of the rows.
Initialization
The table array stores factors of n1 and n2 and also trigonometric tables that are used in the calculation of
the FFT. You can initialize table explicitly by calling the routine with isign = 0. If you do not initialize
table, the routine does so automatically on the first call. If the values of the problem size, n1 and n2, do not
change, the table does not need to be reinitialized. If you call the routine with different values of n1 and n2
without first reinitializing the table, the routine will reinitialize the table automatically.
Reinitialization of table is relatively time-consuming. If you are continually changing the problem size, you
might consider using more than one table array, so that it will not have to be reinitialized on each subroutine
call.
If you initialize the table explicitly by calling the routine with isign = 0, the only arguments that are
significant are isign, n1, n2, table, and ntable. In this case, the other arguments are ignored.
The value of ntable is checked when the table is initialized to verify that the table space you provided is
large enough. If it is not, the routine stops after printing an error message, which indicates the amount of
table space required.
Increment Arguments
The inc1x, inc2x, inc1y, and inc2y increment arguments describe how the matrices are stored in Fortran
arrays. These arguments are the link between the mathematical matrices and their representation in computer
memory.
Consider the following 4-by-5 matrix X:
      X(1,1)  X(1,2)  X(1,3)  X(1,4)  X(1,5)
      X(2,1)  X(2,2)  X(2,3)  X(2,4)  X(2,5)
      X(3,1)  X(3,2)  X(3,3)  X(3,4)  X(3,5)
      X(4,1)  X(4,2)  X(4,3)  X(4,4)  X(4,5)
Thus, the increment in the first dimension, inc1x, is 1. The increment in the second dimension, inc2x, is the
(address) distance between X(1,1) and X(1,2), which is 4, the leading dimension of X. Generally, the
increment in the second dimension is the leading dimension of the array as it is declared in the Fortran
program.
The increment arguments are not directly related to the values of n1 and n2, except that the matrix must fit
in the allocated address space.
Performance Tips
CFFT2D works for any values of the arguments, subject only to the restrictions as described. The
performance of this algorithm, however, depends on the values of the following arguments:
• n1 and n2 (the problem size in each dimension)
• inc1x, inc2x, inc1y, inc2y (the increment arguments)
• nwork (the amount of workspace)
Each of these factors is considered separately in the following subsections.
Performance relative to the problem size
CFFT2D computes an FFT for any values of n1 and n2, but the performance depends on the factorization of
these numbers. This is characteristic of all FFT algorithms.
Best performance is realized when n1 and n2 are each a power of 2. In that case, the number of arithmetic
operations in the calculation is proportional to $n1 \cdot n2 \cdot \log_2(n1 \cdot n2)$.
Performance is slightly worse when n1 or n2 contains factors of 3. It is worse if n1 or n2 contains factors of 5.
Worst performance is when n1 and n2 are prime numbers. In that case, the number of arithmetic operations
in the calculation is proportional to $(n1 \cdot n2)^2$.
The kernel routines are optimized for values of n1 and n2 that are products of powers of 2, 3, and 5. The
values of n1 and n2 also relate to vectorization and multitasking performance. Each of the dimensions is
used as a vector length for part of the calculation. Thus, as with all vector calculations, performance is less
than optimal if either n1 or n2 is small (for example, < 8).
If either of the dimensions is large enough, CFFT2D will multitask. If
$$\frac{\mathrm{MIN}(n1, n2)}{\mathit{ncpus}} \ge 16$$
where ncpus is the number of CPUs being used, the entire calculation runs in multitasked mode.
Performance relative to the increment arguments
The increment arguments have no effect on the algorithm itself, but their values are significant for memory
contention.
The stride for vector loads is, alternately, inc1x and inc2x. To avoid memory bank conflicts, neither number
should be a large multiple of 2. Best performance occurs when both numbers are odd. One way to do this
is to make the leading dimension of array x an odd number when the array is declared in the calling
program.
Likewise, the strides for vector stores are inc1y and inc2y, and the best performance occurs when both
numbers are odd.
Performance relative to the amount of workspace
To do all of the FFTs in one lot, the workspace required (in real words of storage) is
$\mathit{nwork} = 4 \cdot n1 \cdot n2$.
If n1 and n2 are large, this amounts to a lot of memory. You can provide less workspace and still obtain
very good performance. At a minimum, you need the following amount of storage, in (real) words:
$$\mathit{nwork} = 4 \cdot \mathrm{MAX}(n1, n2) \cdot \mathrm{MIN}(n1, n2, 16 \cdot \mathit{ncpus})$$
where ncpus is the number of CPUs being used. If you give a value of nwork in the range:
$$4 \cdot \mathrm{MAX}(n1, n2) \cdot \mathrm{MIN}(n1, n2, 16 \cdot \mathit{ncpus}) \le \mathit{nwork} < 4 \cdot n1 \cdot n2$$
CFFT2D divides the work into lots, in which the size of each lot is sufficiently small to be accommodated
by the workspace provided. For best performance, nwork should be at least 128(n1n2)ncpus.
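The minimum and single-lot workspace sizes can be computed ahead of time when choosing nwork. The Python helper below is a hypothetical sketch of those two formulas only; it does not model how CFFT2D divides the work into lots.

```python
def cfft2d_workspace(n1, n2, ncpus=1):
    """Return (minimum nwork, nwork for a single lot), in real words."""
    minimum = 4 * max(n1, n2) * min(n1, n2, 16 * ncpus)
    one_lot = 4 * n1 * n2
    return minimum, one_lot
```

For n1 = 256, n2 = 300, and 4 CPUs, the minimum is 76,800 words and the single-lot size is 307,200 words. Any nwork between the two causes the work to be split into lots.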
EXAMPLES
The following program computes the forward and inverse two-dimensional Fourier transform of a random
matrix of complex numbers and compares the result with the original matrix.
      PARAMETER (N1 = 256, N2 = 300)
      COMPLEX X(N1, N2), Y(N1, N2)
      PARAMETER (NTABLE = 100 + 2*(N1 + N2))
      PARAMETER (NWORK = 4*N1*N2)
      REAL TABLE(NTABLE), WORK(NWORK)
      LOGICAL LPASS
*-----------------------------------------------------
*     Fill array X with random complex numbers.
      DO 2, J = 1, N2
         DO 1, I = 1, N1
            X(I,J) = CMPLX(RANF(), RANF())
    1    CONTINUE
    2 CONTINUE
*-----------------------------------------------------
*     Compute Y = 2-D Fourier transform of X.
      CALL CFFT2D(-1, N1, N2, 1.0/(N1*N2),
     &            Y, 1, N1, Y, 1, N1,
     &            TABLE, NTABLE, WORK, NWORK)
*-----------------------------------------------------
*     Compare X and Y.
SEE ALSO
CCFFT(3S), CFFT(3S) to calculate a one-dimensional FFT. CCFFT(3S) supersedes most uses of CFFT(3S).
CCFFT2D(3S), which supersedes most uses of CFFT2D
CCFFT3D(3S), CFFT3D(3S) to calculate a three-dimensional FFT. CCFFT3D(3S) supersedes most uses of
CFFT3D(3S).
CCFFTM(3S), MCFFT(3S) to calculate multiple one-dimensional FFTs. CCFFTM(3S) supersedes most uses
of MCFFT(3S).
NAME
CFFT3D – Applies a multitasked three-dimensional complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL CFFT3D (isign, n1, n2, n3, scale, x, inc1x, inc2x, inc3x, y, inc1y, inc2y, inc3y,
table, ntable, work, nwork)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
CFFT3D computes the three-dimensional complex Fourier transform of the complex matrix X, and it stores
the results in the complex matrix Y. For most purposes, CFFT3D is superseded by CCFFT3D(3S).
On this man page, the first dimension of a three-dimensional matrix is defined as the row dimension, the
second dimension is the column dimension, and the third dimension is the plane dimension.
Suppose that the matrices are stored in Fortran arrays, which are declared as follows:
      COMPLEX X(0:N1-1, 0:N2-1, 0:N3-1)
      COMPLEX Y(0:N1-1, 0:N2-1, 0:N3-1)
$i = \sqrt{-1}$
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or the inverse transform, and what the scale factor should be in either case. In this documentation,
when isign = +1, it is called the forward transform, and when isign = – 1, it is called the inverse transform.
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
NOTES
This section includes information about the algorithm for CFFT3D, table initialization, increment arguments,
and performance tips.
Algorithm
CFFT3D uses MCFFT(3S) to do multiple FFTs first on all of the rows, then on all of the columns, and then
on all of the planes of the input matrix.
Table Initialization
The table array stores factors of n1, n2, and n3 and also trigonometric tables that are used in calculation of
the FFT. You can initialize table explicitly by calling CFFT3D with isign = 0. If you do not initialize
table, CFFT3D does so automatically on the first call. If the values of the problem size, n1, n2, and n3, do
not change, table does not need to be reinitialized. If you call CFFT3D with different values of n1, n2, and
n3 without reinitializing table first, CFFT3D reinitializes table automatically.
Reinitialization of table is relatively time-consuming. If you are continually changing the problem size, you
might consider using more than one table array, so that it will not have to be reinitialized on each call to
CFFT3D.
If you initialize table explicitly by calling the routine with isign = 0, the only arguments that are significant
are isign, n1, n2, n3, table, and ntable. In this case, the other arguments are ignored.
CFFT3D checks the value of ntable to ensure that enough space is available to store the entire table. If
ntable is not large enough, the routine stops after printing an error message, which indicates the amount of
table space required.
Increment Arguments
The inc1x, inc2x, inc3x, inc1y, inc2y, and inc3y increment arguments describe how the matrices are stored in
Fortran arrays. These arguments are the link between the mathematical matrices and their representation in
computer memory. The use of these increment arguments allows complete generality in specifying the
matrices. Because CFFT3D deals with three-dimensional matrices, some explanation is necessary.
Consider the following 2-by-3-by-4 matrix X:
      X(1,1,1)  X(1,2,1)  X(1,3,1)
      X(2,1,1)  X(2,2,1)  X(2,3,1)
Fortran stores matrices "by column," meaning that it stores the matrix so that the first index changes most
rapidly and the last index changes least rapidly, which results in the following order:
      X(1,1,1) -> X(2,1,1) -> X(1,2,1) -> X(2,2,1) -> X(1,3,1) -> X(2,3,1) ->
      X(1,1,2) -> X(2,1,2) -> X(1,2,2) -> X(2,2,2) -> X(1,3,2) -> X(2,3,2) ->
      X(1,1,3) -> X(2,1,3) -> X(1,2,3) -> X(2,2,3) -> X(1,3,3) -> X(2,3,3) ->
      X(1,1,4) -> X(2,1,4) -> X(1,2,4) -> X(2,2,4) -> X(1,3,4) -> X(2,3,4)
Thus, the increment in the first dimension, inc1x, is 1. The increment in the second dimension, inc2x, is the
(address) distance between X(1,1,1) and X(1,2,1), which is 2, the leading dimension of X.
The increment in the third dimension, inc3x, is the (address) distance between X(1,1,1) and X(1,1,2),
which is 6. This number 6 is the product of the first two leading dimensions of X.
Generally, the increment in the second dimension is the leading dimension of the array as it is declared in
the Fortran program, and the increment in the third dimension is the product of the two leading dimensions
of the array.
The increment arguments are not directly related to the values of n1, n2, and n3, except that the matrix must
fit in the allocated address space.
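The relationship between declared dimensions and increments can be checked with a short calculation. The Python sketch below is illustrative; the names increments and offset are hypothetical. It derives the increments for the 2-by-3-by-4 example and reproduces the storage order by sorting the subscripts on their address offset.

```python
def increments(d1, d2):
    """Increments (inc1, inc2, inc3) for an array declared X(d1, d2, d3)."""
    return 1, d1, d1 * d2


def offset(i, j, k, inc1, inc2, inc3):
    """Address offset of X(i, j, k), with one-based subscripts."""
    return (i - 1) * inc1 + (j - 1) * inc2 + (k - 1) * inc3


inc1, inc2, inc3 = increments(2, 3)
subscripts = [(i, j, k)
              for i in range(1, 3)
              for j in range(1, 4)
              for k in range(1, 5)]
# Sorting by address offset gives the order in which elements are stored.
order = sorted(subscripts, key=lambda t: offset(*t, inc1, inc2, inc3))
```

Here inc2 is 2 and inc3 is 6, matching the distances described above, and order begins X(1,1,1), X(2,1,1), X(1,2,1).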
Negative increments are legal. If a row, column, or plane increment is negative, the address given as the x
or y argument should be the address of the first element used in the array (last in memory) by row number,
column number, or plane number.
Performance Tips
CFFT3D works for any values of the arguments, subject only to the restrictions given previously. The
performance of this algorithm, however, depends on the values of the following arguments:
• n1, n2, n3: problem size in each dimension
• inc1x, inc2x, inc3x, inc1y, inc2y, inc3y: increment arguments
• nwork: amount of workspace
Each of these factors is considered separately in the following subsections.
Performance relative to the problem size
CFFT3D computes an FFT for any value of n1, n2, and n3, but the performance depends on the factorization
of these numbers. This is characteristic of all FFT algorithms.
Best performance is realized when n1, n2, and n3 are each a power of 2. In that case, the number of
arithmetic operations in the calculation is proportional to
\[ n1 \cdot n2 \cdot n3 \cdot \log_2 (n1 \cdot n2 \cdot n3). \]
Performance is slightly worse when n1, n2, or n3 contains factors of 3. It is worse still if n1, n2, or n3
contains factors of 5. Worst performance is when n1, n2, and n3 are prime numbers. In that case, the number of
arithmetic operations in the calculation is proportional to
\[ (n1 \cdot n2 \cdot n3)^2. \]
The kernel routines are optimized for values of n1, n2, and n3 that are products of powers of 2, 3, and 5.
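A quick way to see whether a problem size falls into the optimized class is to strip out the factors 2, 3, and 5 and see what remains. The following Python sketch does that; the function names is_235_smooth and next_235_smooth are hypothetical helpers, not library routines.

```python
# Sketch only: tests whether a size is a product of powers of 2, 3, and 5,
# the class of sizes the text says the kernel routines are optimized for.

def is_235_smooth(n):
    """Return True if n >= 1 has no prime factor other than 2, 3, or 5."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def next_235_smooth(n):
    """Return the smallest size >= n that is a product of powers of 2, 3, 5."""
    while not is_235_smooth(n):
        n += 1
    return n

# The sizes 16, 18, and 25 all qualify; 17 is prime and does not.
print([is_235_smooth(n) for n in (16, 18, 25, 17)])  # [True, True, True, False]
print(next_235_smooth(17))                           # 18
```

Padding a dimension up to the next such size is a common workaround when the application allows it, although padding changes the transform being computed.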
The values of n1, n2, and n3 also relate to vectorization and multitasking performance. Each of the
dimensions is used as a vector length for part of the calculation. Thus, as with all vector calculations,
performance will be less than optimum if any of n1, n2, or n3 is small (for example, < 8).
EXAMPLES
The following program computes the forward and inverse three-dimensional Fourier transform of a random
matrix of complex numbers and compares the result with the original matrix.
      PARAMETER (N1 = 16, N2 = 18, N3 = 25)
      COMPLEX X(N1, N2, N3), Y(N1, N2, N3)
      PARAMETER (NTABLE = 100 + 2*(N1 + N2 + N3))
      PARAMETER (NWORK = 4*N2*N3)
      REAL TABLE(NTABLE), WORK(NWORK)
      LOGICAL LPASS
*---------------------------------------------------------
*     Fill array X with random complex numbers.
      DO 3, K = 1, N3
         DO 2, J = 1, N2
            DO 1, I = 1, N1
               X(I,J,K) = CMPLX(RANF(), RANF())
    1       CONTINUE
    2    CONTINUE
    3 CONTINUE
*---------------------------------------------------------
*     Compute Y = 3-D Fourier transform of X.
      CALL CFFT3D(+1, N1, N2, N3, 1.0,
     &            X, 1, N1, N1*N2, Y, 1, N1, N1*N2,
     &            TABLE, NTABLE, WORK, NWORK)
*---------------------------------------------------------
*     Compute Y = Inverse 3-D transform of Y.
      CALL CFFT3D(-1, N1, N2, N3, 1.0/(N1*N2*N3),
     &            Y, 1, N1, N1*N2, Y, 1, N1, N1*N2,
     &            TABLE, NTABLE, WORK, NWORK)
*---------------------------------------------------------
*     Compare X and Y.
      LPASS = .TRUE.
      DO 6, K = 1, N3
         DO 5, J = 1, N2
            DO 4, I = 1, N1
               ERROR = ABS(X(I,J,K)-Y(I,J,K))/ABS(X(I,J,K))
               LPASS = LPASS .AND. (ERROR .LE. 1.0E-6)
    4       CONTINUE
    5    CONTINUE
    6 CONTINUE
      IF (.NOT. LPASS) PRINT *, ' Failed the test'
      IF (LPASS) PRINT *, ' Passed the test'
      END
SEE ALSO
CCFFT(3S), CFFT(3S) to calculate a one-dimensional FFT. CCFFT(3S) supersedes most uses of CFFT(3S).
CCFFT2D(3S), CFFT2D(3S) to calculate a two-dimensional FFT. CCFFT2D(3S) supersedes most uses of
CFFT2D(3S).
CCFFT3D(3S), which supersedes most uses of CFFT3D
CCFFTM(3S), MCFFT(3S) to calculate multiple one-dimensional FFTs. CCFFTM(3S) supersedes most uses
of MCFFT(3S).
NAME
CFFTMLT – Applies complex-to-complex Fast Fourier Transforms (FFTs) on multiple input vectors
SYNOPSIS
CALL CFFTMLT (ar, ai, work, trigs, ifax, inc, jump, n, lot, isign)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
This routine is now obsolete. It has been replaced by CCFFTM. See the CCFFTM man page for details on
its use.
CFFTMLT applies complex-to-complex FFTs on more than one input vector, as follows:
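\[
\bigl( \mathit{ar}(\mathit{jump} \cdot l + \mathit{inc} \cdot k + 1),\;
       \mathit{ai}(\mathit{jump} \cdot l + \mathit{inc} \cdot k + 1) \bigr)
= \sum_{j=0}^{n-1}
  \exp\!\left( \frac{\mathit{isign} \cdot i \cdot 2\pi \cdot j \cdot k}{n} \right)
  \bigl( \mathit{ar}(\mathit{jump} \cdot l + \mathit{inc} \cdot j + 1),\;
         \mathit{ai}(\mathit{jump} \cdot l + \mathit{inc} \cdot j + 1) \bigr)
\]
for
\[ k = 0, 1, \ldots, n-1, \qquad l = 0, \ldots, \mathit{lot}-1, \qquad i = \sqrt{-1} \]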
This calculation is performed for each of the lot vectors in the input.
Vectorization is achieved by doing parallel transforms, with vector length = lot.
This routine has the following arguments:
ar Real array of dimension n . lot. (input and output)
On input, it contains the real part of the input data. On output, it contains the real part of the
transformed data.
ai Real array of dimension n . lot. (input and output)
On input, it contains the imaginary part of the input data. On output, it contains the imaginary
part of the transformed data.
work Real array of dimension 4 . n . lot. (scratch output)
Work storage array.
trigs Real array of dimension 2*n. (input)
Must be initialized to contain sine and cosine tables. The following call initializes both trigs
and ifax:
      CALL CFTFAX (n, ifax, trigs)
NOTES
In the division by n, the normalization used by CFFTMLT differs from that used by CFFT2, CRFFT2, and
RCFFT2.
SEE ALSO
CCFFTM(3S), which supersedes this routine only on Cray Y-MP systems
RFFTMLT(3S) to calculate multiple real-to-complex or complex-to-real FFTs
NAME
CRFFT2 – Applies a complex-to-real Fast Fourier Transform (FFT)
SYNOPSIS
CALL CRFFT2 (init, ix, n, x, work, y)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
CRFFT2 calculates the following:
\[ y_k = \sum_{j=0}^{n-1} x_j \exp\!\left( \pm \frac{2\pi i}{n} \, j k \right) \quad \text{for } k = 0, 1, \ldots, n-1 \]
NOTES
The x(j) elements are complex and related by x(j) = conjg(x(n−j)) for j = 1, 2, . . ., n/2, where conjg
denotes the complex conjugate. Only the first (n/2)+1 elements are stored in x.
SEE ALSO
CFFT2(3S), RCFFT2(3S)
SCFFT(3S) for a description of CSFFT, a routine that supersedes CRFFT2 only on UNICOS systems
NAME
DESCINIT3D – Initializes a descriptor vector that contains information about the distribution of a
three-dimensional (3D) array across a 3D grid of processors
SYNOPSIS
CALL DESCINIT3D (desc, nx, ny, nz, nxpp, nypp, nzpp, pesx, pesy, pesz, ictxt, lldx, lldy,
info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
DESCINIT3D initializes a descriptor vector desc with information about the distribution of a 3D array A
across a 3D grid of processors. The information contained in this descriptor vector allows any routine that
uses distributed 3D arrays to know how the data is distributed across the processors. To specify a lower
dimensional grid of processors, initialize the corresponding entries in the descriptor vector to be 1.
Description of the Distributed Data
Consider a 3D array A, which is passed to a distributed library routine to be operated on. Let this 3D array
A of global size nx-by-ny-by-nz be distributed over a 3D grid of processors of size npx-by-npy-by-npz where
N$PES = npx × npy × npz.
To specify a two-dimensional (2D) grid of processors, set any one of npx, npy, or npz to 1.
Each processor is assigned an address to denote its location in the 3D grid. The routine that initializes the
3D grid is called GRIDINIT3D(3S), and the user must first call it before calling DESCINIT3D (see the
man pages for GRIDINIT3D(3S)). This sets up the processor set as a 3D grid of processors. If you
initialized a 3D grid of size npx-by-npy-by-npz, the call to GRIDINIT3D(3S) would be as follows:
      CALL GRIDINIT3D (ictxt, npx, npy, npz)
The nxpp, nypp, and nzpp arguments are initialized to the block size along each of the dimensions. Consider
the X dimension. If all of the data along this dimension will reside on the same processor (degenerate
distribution), nxpp would have the same value as nx. If the distribution is block, nxpp would be initialized
to ICEIL(nx, npx). If the distribution were cyclic, nxpp would be initialized to 1, and if the distribution
were block-cyclic, nxpp would be initialized to the corresponding block size.
Distribution along X Dimension    Value of nxpp
Degenerate                        nxpp = nx
Block                             nxpp = ICEIL(nx, npx)
Cyclic                            nxpp = 1
Block-cyclic                      nxpp = block size desired
The nypp and nzpp arguments are initialized accordingly.
As an example, consider an array A of size 128-by-200-by-100 distributed on a 128-processor 2D grid of
size 1-by-16-by-8, as follows:
nx = 128
ny = 200
nz = 100
npx = 1
npy = 16
npz = 8
If the distribution is degenerate along the X axis, block along the Y axis, and block-cyclic with a block
size of 4 along the Z axis, then:
nxpp = 128
nypp = ICEIL(200, 16) = 13
nzpp = 4
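The block-size rules can be sketched as follows. This is a Python illustration only: iceil is assumed to behave like ICEIL, that is, the ceiling of the integer quotient (consistent with ICEIL(200, 16) = 13), and block_size is a hypothetical helper rather than a library routine.

```python
# Sketch only: block size along one dimension for each distribution type.

def iceil(a, b):
    """Ceiling of a / b for positive integers (assumed meaning of ICEIL)."""
    return -(-a // b)

def block_size(n, nprocs, distribution, block=None):
    """Return the nxpp/nypp/nzpp value for n elements over nprocs processors."""
    if distribution == "degenerate":
        return n                      # all data on one processor
    if distribution == "block":
        return iceil(n, nprocs)       # one contiguous block per processor
    if distribution == "cyclic":
        return 1                      # elements dealt out one at a time
    if distribution == "block-cyclic":
        if block is None:
            raise ValueError("block-cyclic needs an explicit block size")
        return block
    raise ValueError("unknown distribution: " + distribution)

# The 128-by-200-by-100 array on the 1-by-16-by-8 grid from the text.
nxpp = block_size(128, 1, "degenerate")
nypp = block_size(200, 16, "block")
nzpp = block_size(100, 8, "block-cyclic", block=4)
print(nxpp, nypp, nzpp)   # 128 13 4
```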
The pesx, pesy, and pesz arguments are the grid coordinates of the processor specifying the location of the
first element of the global array A (that is, the processor that owns A(1,1,1)).
If the first element is in processor pes, then pesx, pesy, and pesz can be obtained by a call to
PCOORD3D(3S), as follows:
CALL PCOORD3D (ictxt, pes, pesx, pesy, pesz)
The first global element of the array A given by A(1,1,1) is usually located in processor 0. Therefore, the
following is true:
pesx = 0
pesy = 0
pesz = 0
However, in some cases it is advantageous to align array A in a slightly skewed manner in regard to another
array to avoid some communication. The following example illustrates this in the one-dimensional case.
Consider two vectors X and Y of length N that are involved in the computation (globally speaking) of a third
vector Z, as follows:
      do i = 1, N
         Z(i) = X(i) + Y(mod(i+N/2, N))
      end do
In this example, the vector Y is to be stored in a skewed manner in regard to X so that the first element of X
will reside on the same processor that has Y(mod(1+N/2, N)). If pesx were this processor, in the
descriptor for the vector X, the user would pass pesx to the routine to initialize that variable. You can
extend this idea to the other two dimensions.
However, as mentioned earlier, most applications would not require this flexibility in the data distribution;
therefore, pesx, pesy, and pesz in these applications would be initialized to 0.
The ictxt argument is a handle that describes the 3D partitioning of the set of processors done by
GRIDINIT3D(3S).
The lldx and lldy arguments are the leading dimensions of the local array in each processor that stores a
share of the global data.
If the distribution along the X or Y axes were degenerate, lldx ≥ nx and lldy ≥ ny, respectively.
If the distribution were block along either dimension, lldx ≥ ICEIL(nx,npx) and lldy ≥ ICEIL(ny,npy).
If the distribution were cyclic along either dimension, lldx ≥ INT(nx/npx) + 1 and lldy ≥ INT(ny/npy) + 1.
If the distribution were block-cyclic along either dimension with a block size of nxpp and nypp,
lldx ≥ INT(nx/(npx*nxpp)) + nxpp and lldy ≥ INT(ny/(npy*nypp)) + nypp.
Returning to the previous example, in which the array A of size 128-by-200-by-100 is distributed on a
128-processor 2D grid of size 1-by-16-by-8, with the data along the X axis degenerate and the data along
the Y axis block, the following is true:
nx = 128
ny = 200
nz = 100
npx = 1
npy = 16
npz = 8
NOTES
The GRIDINIT3D(3S) routine must be called somewhere in the program before the first call to
DESCINIT3D.
SEE ALSO
GRIDINFO3D(3S), GRIDINIT3D(3S), PCOORD3D(3S), PNUM3D(3S)
NAME
FILTERG – Computes a correlation of two vectors
SYNOPSIS
CALL FILTERG (a, m, d, n, o)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
FILTERG computes a correlation of two vectors.
Given the following:
\[ (a_i), \quad i = 1, \ldots, m \qquad \text{Filter coefficients} \]
\[ (d_j), \quad j = 1, \ldots, n \qquad \text{Data} \]
FILTERG computes the following:
\[ o_i = \sum_{j=1}^{m} a_j \, d_{i+j-1}, \qquad i = 1, \ldots, n-m+1 \]
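The correlation sum can be written out directly. The sketch below is in Python with zero-based lists, so a[j] and d[i + j] stand for the one-based a(j+1) and d(i+j+1) of the formula; the function name correlate is hypothetical and the code is not the library implementation.

```python
# Sketch only: direct evaluation of o_i = sum_j a_j * d_(i+j-1).

def correlate(a, d):
    """Return the n - m + 1 correlation values of filter a with data d."""
    m, n = len(a), len(d)
    return [sum(a[j] * d[i + j] for j in range(m)) for i in range(n - m + 1)]

a = [1, 2, 3]            # m = 3 filter coefficients
d = [1, 0, 2, 1, 3]      # n = 5 data values
print(correlate(a, d))   # [7, 7, 13]
```

The output has n − m + 1 = 3 elements, one for each position at which the filter fits entirely inside the data.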
SEE ALSO
FILTERS(3S)
NAME
FILTERS – Computes a correlation of two vectors (symmetric coefficient)
SYNOPSIS
CALL FILTERS (a, m, d, n, r)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
FILTERS computes the same correlation as FILTERG(3S) except that it assumes the filter coefficient vector
is symmetric.
Given the following:
\[ (a_i), \quad i = 1, \ldots, \lceil m/2 \rceil \]
\[ (d_j), \quad j = 1, \ldots, n \]
where
\[ \lceil m/2 \rceil = m/2 \text{ for } m \text{ even, and } (m+1)/2 \text{ for } m \text{ odd.} \]
This is called the ceiling function.
FILTERS computes the following when m is an odd number:
\[ r_i = \sum_{j=1}^{(m-1)/2} a_j \cdot \left( d_{i+j-1} + d_{i+m-j} \right)
       + a_{(m+1)/2} \cdot d_{i+(m+1)/2-1}, \qquad i = 1, \ldots, n-m+1 \]
FILTERS computes the following when m is an even number:
\[ r_i = \sum_{j=1}^{m/2} a_j \cdot \left( d_{i+j-1} + d_{i+m-j} \right), \qquad i = 1, \ldots, n-m+1 \]
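The following Python sketch evaluates the symmetric form and compares it with the general correlation applied to the mirrored coefficient vector. It assumes the unpaired middle coefficient (m odd) multiplies the data element aligned with the center of the filter, which is what equality with the FILTERG result requires; the function names are hypothetical.

```python
# Sketch only: symmetric-filter correlation from the first ceil(m/2)
# coefficients, checked against the general correlation.

def correlate(a, d):
    """General correlation, as computed by the FILTERG formula."""
    m, n = len(a), len(d)
    return [sum(a[j] * d[i + j] for j in range(m)) for i in range(n - m + 1)]

def correlate_symmetric(a_half, m, d):
    """Correlation for a symmetric filter of formal length m."""
    n = len(d)
    pairs = m // 2                       # number of mirrored coefficient pairs
    out = []
    for i in range(n - m + 1):
        r = sum(a_half[j] * (d[i + j] + d[i + m - 1 - j]) for j in range(pairs))
        if m % 2 == 1:                   # unpaired middle coefficient
            r += a_half[pairs] * d[i + pairs]
        out.append(r)
    return out

d = [1, 0, 2, 1, 3, 2]
print(correlate_symmetric([1, 2, 3], 5, d))   # [12, 15]
print(correlate([1, 2, 3, 2, 1], d))          # [12, 15]
```

Each paired term needs one multiplication instead of two, which is where the saving over the general routine comes from.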
This routine has the following arguments:
a Real array of dimension ⌈m/2⌉. (input)
Symmetric filter coefficient vector.
m Integer. (input)
Formal length of vector a. The actual length of a is as indicated previously.
SEE ALSO
FILTERG(3S)
NAME
GGFFT – Applies a multitasked complex-to-complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL GGFFT (isign, n, scale, x, y, table, work, isys)
IMPLEMENTATION
UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data only.
DESCRIPTION
GGFFT computes the Fast Fourier Transform (FFT) of the complex vector x, and it stores the result in vector
y.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
      COMPLEX(KIND=4) X(0:N-1), Y(0:N-1)
The output array is the FFT of the input array, using the following formula for the FFT:
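\[ Y_k = \mathit{scale} \cdot \sum_{j=0}^{n-1} X_j \, \omega^{\mathit{isign} \cdot j \cdot k}
   \quad \text{for } k = 0, \ldots, n-1 \]
where
\[ \omega = e^{2\pi i / n}, \qquad i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1 \]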
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with – isign and 1 / (n . scale). In
particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by
using isign = – 1 and scale = 1.0/n.
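The inverse relationship can be checked numerically with a direct (slow) evaluation of the defining sum. The Python sketch below is not the GGFFT algorithm and does not call the library; the function dft is a hypothetical stand-in that uses the same isign and scale conventions.

```python
# Sketch only: O(n^2) evaluation of Y_k = scale * sum_j X_j * w**(isign*j*k),
# with w = exp(2*pi*i/n), used to check the forward/inverse pairing.
import cmath

def dft(isign, scale, x):
    """Direct discrete Fourier transform with the isign/scale conventions."""
    n = len(x)
    return [
        scale * sum(x[j] * cmath.exp(1j * isign * 2 * cmath.pi * j * k / n)
                    for j in range(n))
        for k in range(n)
    ]

x = [1 + 2j, -1j, 3, 0.5 - 0.5j, 2, -1 + 1j]
n = len(x)

y = dft(+1, 1.0, x)            # forward transform
back = dft(-1, 1.0 / n, y)     # inverse: negate isign, scale by 1/(n*scale)

max_err = max(abs(u - v) for u, v in zip(x, back))
print(max_err < 1e-12)         # True
```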
The output array may be the same as the input array, provided that n has at least 2 factors.
NOTES
This section contains information about the algorithm for GGFFT, the initialization of the table array, the
declaration of dimensions for x and y arrays, some performance tips, and some implementation dependent
details.
Algorithm
The algorithm used is a variant of Agarwal’s algorithm.
Initialization
The table array stores the trigonometric tables used in calculation of the FFT. You must initialize table by
calling the routine with isign = 0 prior to doing the transforms. If the value of the problem size, n, does not
change, table does not have to be reinitialized.
Dimensions
In the preceding description, it is assumed that array subscripts were zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
      COMPLEX(KIND=4) X(0:N-1)
      COMPLEX(KIND=4) Y(0:N-1)
However, if you prefer to use the more customary Fortran style with subscripts starting at 1, you do not have
to change the calling sequence, as in (assuming N > 0):
      COMPLEX(KIND=4) X(N)
      COMPLEX(KIND=4) Y(N)
Performance Tips
This routine computes an FFT for any value of n, but the performance for a given value of n depends on the
prime factorization of n. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when n is a power of 2, in which case the number of floating-point
operations is approximately $5n \log_2(n)$.
If n contains factors of 3, computation time is slightly longer, because more floating-point operations are
required. If n contains powers of 5, it is longer still. Slowest performance is when n is a prime number, in
which case the number of floating-point operations is approximately $8n^2$.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5. (Because the
kernel routines have a special case for multiples of 4, powers of 4 will be slightly faster than odd powers of
2.)
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they can be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to two areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. The
subroutine call requires no change, but you may have to change array sizes in the DIMENSION or type
statements that declare the arrays.
• The second area is the isys parameter array, an array that gives certain implementation-specific
information. All features and functions of the FFT routines specific to any particular implementation are
confined to this isys array. On any implementation, you can use the default values by using an argument
value of 0.
EXAMPLES
Example 1: Initialize the TABLE array in preparation for doing an FFT of size 1024. Only the
ISIGN, N, and TABLE arguments are used in this case; you can use dummy arguments or zeros for the
other arguments in the subroutine call.
      REAL(KIND=4) TABLE(100 + 8*1024)
      CALL GGFFT(0, 1024, 0.0, DUMMY, DUMMY, TABLE, DUMMY, 0)
Example 2: X and Y are complex arrays of dimension (0:1023). Take the FFT of X and store the results in
Y. Before taking the FFT, initialize the TABLE array, as in example 1.
      COMPLEX(KIND=4) X(0:1023), Y(0:1023)
      REAL TABLE(100 + 8*1024)
      REAL WORK(8*1024)
      ...
      CALL GGFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
      CALL GGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
Example 3: Using the same X and Y as in example 2, take the inverse FFT of Y and store it back in X. The
scale factor 1/1024 is used. Assume that the TABLE array is already initialized.
      CALL GGFFT(-1, 1024, 1.0/1024.0, Y, X, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change was needed in the subroutine calls.
      COMPLEX X(1024), Y(1024)
      ...
      CALL GGFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
      CALL GGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but put the output back in array X to save storage
space. Assume that TABLE is already initialized.
      COMPLEX X(1024)
      ...
      CALL GGFFT(1, 1024, 1.0, X, X, TABLE, WORK, 0)
SEE ALSO
CCFFT(3S), HGFFT(3S), SCFFT(3S)
NAME
HCONV – Performs the convolution of two sequences of real numbers
SYNOPSIS
CALL HCONV (nh, nx, ny, h, x, y)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
HCONV computes the convolution of the filter sequence h with the data sequence x, producing the output
sequence y.
Suppose h and x are two sequences of real numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, therefore:
h = h(0), h(1), . . . h(nh – 1)
x = x(0), x(1), . . . x(nx – 1)
The "convolution product," y, is the sequence having elements defined by:
y(0) = h(nh– 1) . x(0) + h(nh– 2) . x(1) + . . . + h(0) . x(nh– 1)
y(1) = h(nh– 1) . x(1) + h(nh– 2) . x(2) + . . . + h(0) . x(nh)
y(2) = h(nh– 1) . x(2) + h(nh– 2) . x(3) + . . . + h(0) . x(nh+1)
This example definition assumes nx > nh.
The precise definition of the convolution is:
\[ y(k) = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h(nh-1-j) \cdot x(k+j) \quad \text{for } 0 \le k \le ny-1. \]
The number of terms in the output sequence is specified by an argument, ny. If ny < nx, then the output
sequence is just truncated. If ny > nx, then zeros are appended to the output sequence.
By choosing ny > nx − nh+1, the routine does what is sometimes called "post-tapered" convolution. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
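The post-tapered behavior can be sketched in Python as follows. The sketch assumes that the summation index j runs from 0 to min(nh−1, nx−1−k), which is the limit implied by treating x as zero beyond its last element; the function name hconv_sketch is hypothetical and the code is not the library implementation.

```python
# Sketch only: y(k) = sum_j h(nh-1-j) * x(k+j), truncating the sum where
# x would run past its end instead of storing padding zeros.

def hconv_sketch(h, x, ny):
    """Return ny output values for filter h and data x."""
    nh, nx = len(h), len(x)
    y = []
    for k in range(ny):
        top = min(nh - 1, nx - 1 - k)    # last j with k + j inside x
        # An empty range (top < 0) gives 0, so outputs past the data are zero.
        y.append(sum(h[nh - 1 - j] * x[k + j] for j in range(top + 1)))
    return y

h = [1, 2, 3]
x = [2, 1, 0, 3, 1]
print(hconv_sketch(h, x, 7))   # [8, 6, 7, 11, 3, 0, 0]
```

The first nx − nh + 1 = 3 outputs use the whole filter, the next two are tapered, and the last two lie beyond the data and are zero. No multiplication by a padding zero is ever performed.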
This routine has the following arguments:
nh INTEGER(KIND=8). (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx INTEGER(KIND=8). (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny INTEGER(KIND=8). (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
h REAL(KIND=4) array of dimension (0:nh−1). (input)
Specifies the input sequence of filter values.
x REAL(KIND=4) array of dimension (0:nx−1). (input)
Specifies the input sequence of data values.
y REAL(KIND=4) array of dimension (0:ny−1). (output)
Specifies the output sequence of convolution values.
NOTES
If ny = 0, the routine just returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y
and return.
EXAMPLES
SEE ALSO
HCORR(3S), HCORRS(3S), SCONV(3S)
NAME
HCORR – Performs the correlation of two sequences of real numbers
SYNOPSIS
CALL HCORR (nh, nx, ny, h, x, y)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
HCORR computes the correlation of the filter sequence h with the data sequence x, producing the output
sequence y.
Suppose h and x are two sequences of real numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "correlation product", y, is the sequence having elements defined by:
y(0) = h(0) . x(0) + h(1) . x(1) + . . . + h(nh – 1) . x(nh – 1)
y(1) = h(0) . x(1) + h(1) . x(2) + . . . + h(nh – 1) . x(nh)
y(2) = h(0) . x(2) + h(1) . x(3) + . . . + h(nh – 1) . x(nh + 1)
This example definition assumes that nx ≥ nh.
The precise definition is as follows:
\[ y(k) = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h(j) \cdot x(k+j) \quad \text{for } 0 \le k \le ny-1 \]
The number of terms in the output sequence is specified by the argument ny. If ny < nx, the output sequence
is just truncated. If ny > nx, zeros are appended to the output sequence.
By choosing ny > nx − nh + 1, the routine does what is sometimes called "post-tapered" correlation. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh INTEGER(KIND=8). (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx INTEGER(KIND=8). (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny INTEGER(KIND=8). (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
h REAL(KIND=4) array of dimension (0:nh−1). (input)
Specifies the input sequence of filter values.
NOTES
If ny = 0, the routine returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y and
return.
EXAMPLES
SEE ALSO
HCONV(3S), HCORRS(3S), SCORR(3S)
NAME
HCORRS – Performs the correlation of two sequences of real numbers (symmetric filter)
SYNOPSIS
CALL HCORRS (nh, nx, ny, h, x, y)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
HCORRS computes the correlation of the symmetric filter sequence h with the data sequence x, producing the
output sequence y. The filter, h, is assumed to be symmetric about its middle.
The computation carried out by HCORRS is exactly the same as that done by routine HCORR, with one
exception: the filter, h, is assumed to be symmetric, so only the first half of the elements are accessed. The
values of the second half are inferred from the first half and do not actually have to be supplied by the
calling routine.
To review the definition of correlation (not necessarily assuming a symmetric filter), suppose h and x are two
sequences of real numbers, having nh and nx elements, respectively. As is customary in signal processing
applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "correlation product", y, is the sequence having elements defined by:
y(0) = h(0) . x(0) + h(1) . x(1) + . . . + h(nh – 1) . x(nh – 1)
y(1) = h(0) . x(1) + h(1) . x(2) + . . . + h(nh – 1) . x(nh)
y(2) = h(0) . x(2) + h(1) . x(3) + . . . + h(nh – 1) . x(nh + 1)
This example definition assumes that nx ≥ nh.
The precise definition of correlation is as follows:
\[ y(k) = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h(j) \cdot x(k+j) \quad \text{for } 0 \le k \le ny-1 \]
The HCORRS routine makes the assumption that the filter is symmetric; in other words, that h(nh − j) = h(j),
for 0 ≤ j ≤ nh / 2.
Only the elements h(0) through h(nh/2) are accessed by the routine. The last half of the filter values are not
accessed and do not actually have to be supplied by the calling routine.
The number of terms in the output sequence is specified by an argument, ny. If ny < nx, then the output
sequence is just truncated. If ny > nx, then zeros are appended to the output sequence.
By choosing ny > nx − nh+1, the routine does what is sometimes called "post-tapered" correlation. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh Integer. (input)
Specifies the number of elements in the filter sequence, h.
nh ≥ 0.
nx Integer. (input)
Specifies the number of elements in the data sequence, x.
nx ≥ 0.
ny Integer. (input)
Specifies the number of elements in the output sequence, y.
ny ≥ 0.
h REAL(KIND=4) array of dimension (0:nh/2). (input)
Specifies the input sequence of filter values. Only values h(0) through h(nh/2) are accessed; the
second half of the filter values are inferred from the symmetry of h.
x REAL(KIND=4) array of dimension (0:nx−1). (input)
Specifies the input sequence of data values.
y REAL(KIND=4) array of dimension (0:ny−1). (output)
Specifies the output sequence.
NOTES
If ny = 0, the routine just returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y
and return.
EXAMPLES
SEE ALSO
HCONV(3S), HCORR(3S), SCORRS(3S)
NAME
HGFFT, GHFFT – Computes a real-to-complex or complex-to-real Fast Fourier Transform (FFT)
SYNOPSIS
CALL HGFFT (isign, n, scale, x, y, table, work, isys)
CALL GHFFT (isign, n, scale, x, y, table, work, isys)
IMPLEMENTATION
UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
HGFFT computes the FFT of the real array X, and it stores the results in the complex array Y. GHFFT
computes the corresponding inverse complex-to-real transform.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. First
the function of HGFFT is described. Suppose that the arrays are dimensioned as follows:
      REAL(KIND=4) X(0:n-1)
      COMPLEX(KIND=4) Y(0:n/2)
Then the output array is the FFT of the input array, using the following formula for the FFT:
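\[ Y_k = \mathit{scale} \cdot \sum_{j=0}^{n-1} X_j \, \omega^{\mathit{isign} \cdot j \cdot k}
   \quad \text{for } k = 0, \ldots, \frac{n}{2} \]
where
\[ \omega = e^{2\pi i / n}, \qquad i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1 \]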
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make these
routines compute any of the various possible definitions, however, by choosing the appropriate values for
isign and scale.
The relevant fact from FFT theory is this: If you call HGFFT with any particular values of isign and scale,
the mathematical inverse function is computed by calling GHFFT with – isign and 1 / (n . scale). In
particular, if you use isign = +1 and scale = 1.0 in HGFFT for the forward FFT, you can compute the
inverse FFT by using GHFFT with isign = – 1 and scale = 1.0/n.
If isys(0) = 0, the default values of such parameters are used. In this case, you can specify the
argument value as the scalar integer constant 0. If isys(0)>0, isys(0) gives the upper bound of
the isys array; that is, if il=isys(0), user-specified parameters are expected in isys(1) through
isys(il).
Real-to-complex FFTs
Notice in the preceding formula that there are n real input values, and n / 2 + 1 complex output values. This
property is characteristic of real-to-complex FFTs.
The mathematical definition of the Fourier transform takes a sequence of n complex values and transforms it
to another sequence of n complex values. A complex-to-complex FFT routine, such as GGFFT(3S), will take
n complex input values, and produce n complex output values. In fact, one easy way to compute a real-to-
complex FFT is to store the input data in a complex array, then call routine GGFFT to compute the FFT.
You get the same answer when using the HGFFT routine.
The reason for having a separate real-to-complex FFT routine is efficiency. Because the input data is real,
you can make use of this fact to save almost half of the computational work.
The theory of Fourier transforms tells us that for real input data, you have to compute only the first n/2 + 1
complex output values, because the remaining values can be computed from the first half of the values by
the following simple formula:
Y(k)=conjg(Y(n-k)) for n / 2 ≤ k ≤ n-1
where the notation conjg(z) represents the complex conjugate of z.
In fact, in many applications, the second half of the complex output data is never explicitly computed or
stored. Likewise, as explained below, only the first half of the complex data has to be supplied for the
complex-to-real FFT.
Another implication of FFT theory is that, for real input data, the first output value, Y(0), will always be a
real number; therefore, the imaginary part will always be 0. If n is an even number, Y(n/2) will also be real
and thus, have zero imaginary part.
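These properties of the transform of real data can be checked with a direct (slow) evaluation of the defining sum. The Python sketch below does not call HGFFT; the function dft is a hypothetical stand-in that uses isign = +1 and scale = 1.0.

```python
# Sketch only: for real input, the first n/2 + 1 outputs determine the rest
# through Y(k) = conjg(Y(n-k)), and Y(0) and Y(n/2) are real.
import cmath

def dft(x):
    """Direct transform Y_k = sum_j x_j * exp(2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]    # n = 8 real values
n = len(x)

full = dft(x)                    # all n complex outputs
half = full[: n // 2 + 1]        # what a real-to-complex routine would store

# Rebuild the second half from the stored first half.
rebuilt = half + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]

err = max(abs(u - v) for u, v in zip(full, rebuilt))
print(err < 1e-12)                                  # True
print(abs(full[0].imag) < 1e-12)                    # True
print(abs(full[n // 2].imag) < 1e-12)               # True
```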
Complex-to-real FFTs
Consider the complex-to-real case. The effect of the computation is given by the preceding formula, but
with X complex and Y real.
Generally, the FFT transforms a complex sequence into a complex sequence. However, in a certain
application we may know the output sequence is real. Often, this is the case because the complex input
sequence was the transform of a real sequence. In this case, you can save about half of the computational
work.
According to the theory of Fourier transforms, for the output sequence, Y, to be a real sequence, the
following identity on the input sequence, X, must be true:
X(k) = conjg(X(n-k)) for n / 2 ≤k ≤ n-1
And, in fact, the input values X(k) for k > n/2 need not be supplied; they can be inferred from the first half
of the input.
Thus, in the complex-to-real routine, GHFFT, the arrays can be dimensioned as follows:
      COMPLEX(KIND=4) X(0:n/2)
      REAL(KIND=4) Y(0:n-1)
There are n / 2 + 1 complex input values and n real output values. Even though only n/2 + 1 input values
are supplied, the size of the transform is still n in this case, because implicitly you are using the FFT
formula for a sequence of length n.
Another implication of the theory is that X(0) must be a real number (that is, it must have zero imaginary
part). Also, if n is even, X(n/2) must also be real. Routine GHFFT assumes that these values are real; if you
specify a nonzero imaginary part, it is ignored.
NOTES
Table Initialization
The table array stores the trigonometric tables used in calculation of the FFT. This table must be initialized
by calling the routine with isign = 0 prior to doing the transforms. The table does not have to be
reinitialized if the value of the problem size, n, does not change. Because HGFFT and GHFFT use the same
format for table, either can be used to initialize it (GGFFT uses a different table format).
Dimensions
In the preceding description, it is assumed that array subscripts were zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared (assuming n > 0):
      REAL(KIND=4) X(0:n-1)
      COMPLEX(KIND=4) Y(0:n/2)
No change is needed in the calling sequence; however, if you prefer you can use the more customary Fortran
style with subscripts starting at 1, as in the following:
      REAL(KIND=4) X(n)
      COMPLEX(KIND=4) Y(n/2 + 1)
Performance Tips
These routines will compute an FFT for any value of n, provided only that n is an even number, n ≥ 2.
Performance for a given value of n depends on the prime factorization of n. This fact is characteristic of all
FFT algorithms.
Fastest performance is realized when n is a power of 2; in which case, the number of floating-point
operations is approximately
5. .
n log 2 (n)
2
If n contains factors of 3, performance is slightly worse; if n contains powers of 5, it is slightly worse still.
Worst performance is when n is a prime number, in which case the number of operations is approximately
$4 n^2$.
The kernel routines are optimized for values of n that are even numbers and are products of powers of 2, 3,
and 5. (Because the kernel routines have a special case for multiples of 4, even powers of 2 will be slightly
faster than odd powers of 2.)
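To decide in advance whether a given n falls in the fast case, you can test its factorization. The following is a minimal sketch in Python; the function names are hypothetical and are not part of the library. `power_of_two_flops` simply evaluates the approximate operation count quoted above and is meaningful only when n is a power of 2.

```python
import math

def is_235_smooth(n):
    """True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def power_of_two_flops(n):
    """Approximate count (5/2) * n * log2(n); applies when n is a power of 2."""
    return 2.5 * n * math.log2(n)

fast_case = is_235_smooth(1024) and 1024 % 2 == 0
estimate = power_of_two_flops(1024)
```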
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they could be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to two areas:
• The first area is the size of the table and work arrays. Different sizes may be needed on different
systems. No change is required to the subroutine call, but you may have to change the array sizes in the
DIMENSION or type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.
EXAMPLES
Example 1: Initialize the complex array TABLE in preparation for doing an FFT of size 1024. In this case
only the arguments isign, n, and table are used; you can use dummy arguments or zeros for the other
arguments in the subroutine call.
REAL(KIND=4) TABLE(100 + 4*1024)
CALL HGFFT(0, 1024, 0.0, DUMMY, DUMMY, TABLE, DUMMY, 0)
Example 2: X is a real array of dimension (0:1023), and Y is a complex array of dimension (0:512). Take
the FFT of X and store the results in Y. Before taking the FFT, initialize the TABLE array, as in example 1.
REAL(KIND=4) X(0:1023)
COMPLEX(KIND=4) Y(0:512)
REAL(KIND=4) TABLE(100 + 4*1024)
REAL(KIND=4) WORK(4*1024 + 4)
...
CALL HGFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL HGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/1024 is used. Assume that the TABLE array is initialized already.
CALL GHFFT(-1, 1024, 1.0/1024.0, Y, X, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls are not changed.
REAL(KIND=4) X(1024)
COMPLEX(KIND=4) Y(513)
...
CALL HGFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL HGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. Assume that the TABLE array is initialized already.
REAL(KIND=4) X(1024)
COMPLEX(KIND=4) Y(513)
EQUIVALENCE (X(1), Y(1))
...
CALL HGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
SEE ALSO
GGFFT(3S), SCFFT(3S)
NAME
HOPFILT – Solves Wiener-Levinson linear equations
SYNOPSIS
CALL HOPFILT (m, a, b, c, r)
IMPLEMENTATION
UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data only.
DESCRIPTION
HOPFILT computes the solution to the Wiener-Levinson system of linear equations Ta = b; T is a
symmetric Toeplitz matrix in which elements are described as follows:
$$t_{ij} = R(1 + \mathrm{MOD}(m + j - i,\ m))$$
for some vector R = (R(1), R(2), . . ., R(m)).
This routine has the following arguments:
m Integer. (input)
Order of the system of equations.
a REAL(KIND=4) array of dimension m. (output)
Resulting vector of filter coefficients.
b REAL(KIND=4) array of dimension m. (input)
Information auto-correlation vector (right-hand side vector in system of linear equations).
c REAL(KIND=4) array of dimension 2m. (scratch output)
Scratch vector.
r REAL(KIND=4) array of dimension m. (input)
Signal auto-correlation vector (band values of the symmetric Toeplitz matrix T).
NOTES
Although HOPFILT solves this matrix equation faster than Gaussian elimination, HOPFILT does no
pivoting; therefore, it is less numerically stable than Gaussian elimination, unless the matrix T is either
positive definite or diagonally dominant.
EXAMPLES
You can solve the following system of linear equations with the call HOPFILT (3,A,B,C,R). Vector c
has a length of at least 6.
| R(1) R(2) R(3) |   | A(1) |   | B(1) |
| R(2) R(1) R(2) | . | A(2) | = | B(2) |
| R(3) R(2) R(1) |   | A(3) |   | B(3) |
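Systems of this shape are classically solved in O(m^2) operations by the Levinson recursion, which, like the routine described here, does no pivoting. The following Python sketch illustrates that recursion under stated assumptions: it uses the matrix layout shown in the example above (the entry in row i, column j is R(1 + |i - j|)), it uses zero-based lists so that `r[0]` corresponds to R(1), and the name `levinson_solve` and the sample values are hypothetical. It is not a description of the library's internal code.

```python
def levinson_solve(r, b):
    """Solve T a = b where T[i][j] = r[abs(i - j)] (zero-based), without pivoting."""
    m = len(r)
    f = [1.0 / r[0]]              # forward vector: T_k f = (1, 0, ..., 0)
    a = [b[0] / r[0]]             # solution of the leading 1-by-1 system
    for k in range(1, m):
        # Residual in the new last row when f is padded with a zero.
        eps_f = sum(r[k - i] * f[i] for i in range(k))
        denom = 1.0 - eps_f * eps_f
        padded_f = f + [0.0]
        padded_b = [0.0] + f[::-1]    # backward vector is f reversed (T is symmetric)
        f = [(pf - eps_f * pb) / denom for pf, pb in zip(padded_f, padded_b)]
        back = f[::-1]                # satisfies T_{k+1} back = (0, ..., 0, 1)
        # Residual of the zero-padded solution in the new last row.
        eps_a = sum(r[k - i] * a[i] for i in range(k))
        a = [ai + (b[k] - eps_a) * bi for ai, bi in zip(a + [0.0], back)]
    return a

r = [4.0, 2.0, 1.0]     # R(1), R(2), R(3)
b = [1.0, 2.0, 3.0]     # B(1), B(2), B(3)
a = levinson_solve(r, b)
residual = [sum(r[abs(i - j)] * a[j] for j in range(3)) - b[i] for i in range(3)]
```

The division by `denom` is where the absence of pivoting matters: if a leading submatrix of T is nearly singular, `denom` is close to zero and accuracy is lost, which is consistent with the stability caveat in the NOTES section.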
SEE ALSO
OPFILT(3S)
NAME
MCFFT – Applies multiple multitasked complex Fast Fourier Transforms (FFTs)
SYNOPSIS
CALL MCFFT (isign, n, m, scale, x, inc1x, inc2x, y, inc1y, inc2y, table, ntable, work,
nwork)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
MCFFT computes the Fourier transform of each column of the complex matrix x, and it stores the results in
the columns of matrix y. For most purposes, MCFFT is superseded by the UNICOS standard FFT routine
CCFFTM(3S).
Suppose the arrays are dimensioned as follows:
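COMPLEX X(0:N-1, M), Y(0:N-1, M)
Then each column of Y is computed from the corresponding column of X by the following formula:
$$Y_{k,l} = \mathrm{scale} \cdot \sum_{j=0}^{n-1} X_{j,l}\, \omega^{jk} \qquad \text{for } k = 0, \ldots, n-1,\quad l = 1, \ldots, m$$
where
$$\omega = e^{\,\mathrm{isign} \cdot 2 \pi i / n}$$
$\mathrm{isign} = \pm 1$
$\pi = 3.14159\ldots$
$e = 2.71828\ldots$
$i = \sqrt{-1}$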
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or the inverse transform, and what the scale factor should be in either case. In this documentation,
when isign = +1, it is called the forward transform, and when isign = – 1, it is called the inverse transform.
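The definition can be evaluated directly to see what the routine is expected to return. The following is a minimal O(n^2) sketch in Python of the formula only; it says nothing about how the library computes it. The name `multi_dft`, the list-of-columns layout, and the test data are assumptions made for illustration.

```python
import cmath

def multi_dft(columns, isign, scale):
    """Apply Y(k,l) = scale * sum_j X(j,l) * w**(j*k) to every column l.

    columns[l][j] holds X(j,l); w = exp(isign * 2*pi*i / n).
    """
    n = len(columns[0])
    w = cmath.exp(isign * 2j * cmath.pi / n)
    return [[scale * sum(col[j] * w ** (j * k) for j in range(n))
             for k in range(n)]
            for col in columns]

n, m = 6, 3
x = [[complex(l + 1, 0)] + [0j] * (n - 1) for l in range(m)]   # one impulse per column
y = multi_dft(x, +1, 1.0)            # forward: each column becomes constant
x_back = multi_dft(y, -1, 1.0 / n)   # inverse: isign = -1 and scale = 1/n
```

With this convention, a forward transform with scale = 1.0 followed by an inverse transform with scale = 1/n returns the original columns.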
NOTES
This section contains information about the algorithm for MCFFT, table initialization, increment arguments,
and performance tips.
Algorithm
MCFFT uses a decimation-in-frequency type of FFT that performs its operations on each row of the matrix. This
means that as the algorithm is transforming each column of the input matrix, it vectorizes along the rows.
Thus, the vector length in the calculations depends on the row size. The performance tips later in this
subsection give more information on the algorithm as it relates to performance.
Table Initialization
The table array stores factors of n and trigonometric tables that are used in calculation of the FFT. You can
initialize table explicitly by calling MCFFT with isign = 0. If you do not initialize table, MCFFT does so
automatically on the first call. If the value of the problem size, n, does not change, table does not have to
be reinitialized. If you call MCFFT with a different value of n without first reinitializing table, MCFFT
reinitializes table automatically.
Reinitialization of table is relatively time-consuming. If you are continually changing the problem size, you
might consider using more than one table array, so that it will not have to be reinitialized on each call to
MCFFT.
If you initialize table explicitly by calling MCFFT with isign = 0, the only arguments that are significant are
isign, n, table, and ntable. In this case, the other arguments are ignored.
The value of ntable is checked when the table is initialized to verify that the table space you provided is
large enough. If it is not, MCFFT stops after printing an error message, indicating the amount of table space
required.
Increment Arguments
The inc1x, inc2x, inc1y, and inc2y increment arguments describe how the matrices are stored in Fortran
arrays. These arguments are the link between the mathematical matrices and their representation in computer
memory.
Consider the following 4-by-5 matrix X.
X(1,1) X(1,2) X(1,3) X(1,4) X(1,5)
X(2,1) X(2,2) X(2,3) X(2,4) X(2,5)
X(3,1) X(3,2) X(3,3) X(3,4) X(3,5)
X(4,1) X(4,2) X(4,3) X(4,4) X(4,5)
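Fortran stores arrays in column-major order, so consecutive elements of a column are adjacent in memory.
Thus, the increment in the first dimension, inc1x, is just 1. The increment in the second dimension, inc2x, is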
the (address) distance between X(1,1) and X(1,2), which is 4, the leading dimension of X. Generally, the
increment in the second dimension is the leading dimension of the array as it is declared in the Fortran
program, or a multiple thereof.
The previous information described transforming each column of X into a column of Y. Actually, it could
just as well have described transforming rows of X into rows of Y. MCFFT can do either one, as follows:
Suppose that X and Y have been declared with the following statement, as in the previous example:
COMPLEX X(4, 5), Y(4, 5)
To transform the columns of X into the columns of Y, using every element of each column, set the
following:
INC1X = 1
INC2X = 4
INC1Y = 1
INC2Y = 4
INC1X and INC1Y are 1, meaning to use every element of the column, and the values of INC2X and
INC2Y are the leading dimensions of the arrays, as they are declared.
To transform the rows of X into the rows of Y, interchange the values of INC1X and INC2X, and also of
INC1Y and INC2Y (the increment arguments) as follows:
INC1X = 4
INC2X = 1
INC1Y = 4
INC2Y = 1
Because of the way that arrays are stored in Fortran, interchanging the increments this way is equivalent to
transposing the matrices.
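The effect of the increments can be illustrated with a flat list standing in for Fortran's column-major storage. This Python sketch assumes that element j of sequence l is read from offset j*inc1 + l*inc2 relative to the first element, which is how the increments are described above; `gather` is a hypothetical helper, not a library routine.

```python
def gather(storage, base, n, m, inc1, inc2):
    """Return m sequences of length n; element j of sequence l is
    storage[base + j*inc1 + l*inc2]."""
    return [[storage[base + j * inc1 + l * inc2] for j in range(n)]
            for l in range(m)]

rows, cols = 4, 5
# Column-major storage of X(4,5); each slot is labelled with its 1-based subscripts.
storage = [(i, j) for j in range(1, cols + 1) for i in range(1, rows + 1)]

# INC1X = 1, INC2X = 4: five sequences of length 4, one per column.
columns = gather(storage, 0, n=rows, m=cols, inc1=1, inc2=rows)
# INC1X = 4, INC2X = 1: four sequences of length 5, one per row.
row_seqs = gather(storage, 0, n=cols, m=rows, inc1=rows, inc2=1)
```

Here `columns[1]` contains X(1,2) through X(4,2), and `row_seqs[2]` contains X(3,1) through X(3,5), so swapping the two increments selects the transposed view of the same storage.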
The increment arguments are not directly related to the values of n and m, except insofar as the matrix must
fit in the allocated address space.
Negative increments are legal. If the row or column increment is negative, the address given as the x or y
argument should be the address of the element at the end of the row or column of the array (not the
beginning).
Because each transform has n elements, this implies that the increment values must satisfy the following
logical expressions:
inc2x ≥ n · inc1x or inc1x ≥ n · inc2x
inc2y ≥ n · inc1y or inc1y ≥ n · inc2y
Performance Tips
MCFFT will work for any values of the arguments, subject only to the restrictions given previously. The
performance of this algorithm, however, depends on the values of the following arguments:
• n: Order of each transform
• m: Number of transforms
• inc1x, inc2x, inc1y, inc2y: Increment arguments
• nwork: Amount of workspace
Each of these factors is considered separately in the following subsections.
Performance relative to the order of transform
MCFFT computes an FFT for any value of n, but the performance for a given value of n depends on the
factorization of n. This is characteristic of all FFT algorithms.
Best performance is realized when n is a power of 2. In that case, the number of operations is proportional
to $m\, n \log_2(n)$.
Performance is slightly worse if n contains factors of 3. It is worse if n contains powers of 5. Worst
performance is when n is a prime number. In that case, the number of operations is proportional to $m\, n^2$.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5. The value of n
has no effect on vectorization or multitasking, which depend only on m.
Performance relative to the number of transforms
MCFFT uses a vectorized FFT algorithm that vectorizes across the rows of x and y. Thus, the vector length
for the computations is m, the number of transforms. As with all vector calculation, performance is poor if
m is small (for example, less than 8). If m ≥ 32, performance will be good. Performance is best when m is
a multiple of 64 (128 on Cray C90 series computer systems), particularly if m ≥ 256.
EXAMPLES
The following program illustrates the use of MCFFT. The program computes 256 one-dimensional FFTs out
of a matrix of random numbers, first by using MCFFT, then by using CFFT(3S) for each column. Then it
compares the two results.
The program then computes each inverse transform, also using MCFFT, and compares the results with the
original sequence.
      PARAMETER (N = 2*3*5*7, M = 256)
      PARAMETER (LD1 = N+1, LD2 = M+3)
      COMPLEX X(LD1, LD2), Y(LD1, LD2), YY(LD1, LD2)
      PARAMETER (NTABLE = 100 + 8*N)
      PARAMETER (NWORK = 4*N*M)
      REAL TABLE(NTABLE), WORK(NWORK)
      LOGICAL LFWD, LINV
*-----------------------------------------------------------------
*     Initialize input array, X
      DO 15, J = 1, M
         DO 10, I = 1, N
            X(I, J) = CMPLX(RANF(), RANF())
   10    CONTINUE
   15 CONTINUE
*-----------------------------------------------------------------
*     Compute Y(:,J) = the Fourier Transform of X(:,J)
*     using MCFFT (arguments in the order given in the SYNOPSIS).
      CALL MCFFT(+1, N, M, 1.0, X, 1, LD1, Y, 1, LD1,
     &           TABLE, NTABLE, WORK, NWORK)
*-----------------------------------------------------------------
*     Compute YY(:,J) = the Fourier Transform of X(:,J)
*     using CFFT.
      DO 20, J = 1, M
         CALL CFFT(+1, N, 1.0, X(1,J), 1, YY(1,J), 1,
     &             TABLE, NTABLE, WORK, NWORK)
   20 CONTINUE
*-----------------------------------------------------------------
*     Compare Y and YY.
      LFWD = .TRUE.
      DO 40, J = 1, M
         DO 30, I = 1, N
            ERROR = ABS( Y(I,J)-YY(I,J) )/ABS( Y(I,J) )
            LFWD = LFWD .AND. (ERROR .LE. 1.0E-6)
   30    CONTINUE
   40 CONTINUE
      IF (.NOT. LFWD) PRINT *, 'Failed forward test'
      IF (LFWD) PRINT *, 'Forward transform OK'
*-----------------------------------------------------------------
*     Compute the inverse transform of Y,
*     and store it back in Y.
      CALL MCFFT(-1, N, M, 1.0/N, Y, 1, LD1, Y, 1, LD1,
     &           TABLE, NTABLE, WORK, NWORK)
      LINV = .TRUE.
      DO 60, I = 1, N
         DO 50, J = 1, M
            ERROR = ABS( X(I,J)-Y(I,J) )/ABS( X(I,J) )
            LINV = LINV .AND. (ERROR .LE. 1.0E-6)
   50    CONTINUE
   60 CONTINUE
      IF (.NOT. LINV) PRINT *, 'Failed inverse test'
      IF (LINV) PRINT *, 'Inverse transform OK'
      IF (LINV .AND. LFWD) PRINT *, 'Test succeeded'
      END
SEE ALSO
CCFFT(3S), CFFT(3S) to calculate a single one-dimensional FFT. CCFFT(3S) supersedes most uses of
CFFT(3S).
CCFFT2D(3S), CFFT2D(3S) to calculate a two-dimensional FFT. CCFFT2D(3S) supersedes most uses of
CFFT2D(3S).
CCFFT3D(3S), CFFT3D(3S) to calculate a three-dimensional FFT. CCFFT3D(3S) supersedes most uses of
CFFT3D(3S).
CCFFTM(3S), which supersedes most uses of MCFFT
NAME
OPFILT – Solves Wiener-Levinson linear equations
SYNOPSIS
CALL OPFILT (m, a, b, c, r)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data only.
DESCRIPTION
OPFILT computes the solution to the Wiener-Levinson system of linear equations Ta = b; T is a symmetric
Toeplitz matrix in which elements are described as follows:
$$t_{ij} = R(1 + \mathrm{MOD}(m + j - i,\ m))$$
for some vector R = (R(1), R(2), . . ., R(m)).
This routine has the following arguments:
m Integer. (input)
Order of the system of equations.
a Real array of dimension m. (output)
Resulting vector of filter coefficients.
b Real array of dimension m. (input)
Information auto-correlation vector (right-hand side vector in system of linear equations).
c Real array of dimension 2m. (scratch output)
Scratch vector.
r Real array of dimension m. (input)
Signal auto-correlation vector (band values of the symmetric Toeplitz matrix T).
NOTES
Although OPFILT solves this matrix equation faster than Gaussian elimination, OPFILT does no pivoting;
therefore, it is less numerically stable than Gaussian elimination, unless the matrix T is either positive
definite or diagonally dominant.
EXAMPLES
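You can solve the following system of linear equations with the call OPFILT (3,A,B,C,R). Vector c
has a length of at least 6.
| R(1) R(2) R(3) |   | A(1) |   | B(1) |
| R(2) R(1) R(2) | . | A(2) | = | B(2) |
| R(3) R(2) R(1) |   | A(3) |   | B(3) |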
NAME
PCCFFT2D – Applies a two-dimensional (2D) complex-to-complex Fast Fourier Transform (FFT) to a
matrix distributed across a set of processors
SYNOPSIS
CALL PCCFFT2D (isign, n1, n2, scale, A, iA, jA, descA, B, iB, jB, descB, table, work,
isys, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PCCFFT2D computes the 2D complex Fast Fourier Transform (FFT) of the distributed complex matrix A,
and it stores the results in the distributed complex matrix B.
Description of the Distributed Data
This routine considers the processors to be partitioned into a one-dimensional (1D) linear array of processors.
The 2D input matrix A is then distributed across this 1D grid of processors as discussed in the following
text.
Consider a 2D matrix A of size nr-by-nc, where nr is the number of rows and nc is the number of columns
of matrix A.
Let the processors (N$PES in number) be partitioned into a 1D grid of size 1-by-npc, where npc = N$PES is
the number of processors assigned to the column dimension of A. Let the number of processors assigned to
the row dimension be 1. To partition processors into this grid, call the BLACS_GRIDINIT routine as
shown:
CALL BLACS_GRIDINIT (ictxt, 'C', 1, npc)
The input matrix, A, and the output matrix, B, are distributed across this 1D linear array of processors by
using the block (as defined in FORTRAN D and HPF) distribution along the columns. The distribution
along the rows is degenerate. The descriptors descA and descB provide information on the distribution of
the matrices A and B across the processor grid. The descriptors descA and descB are initialized using the
DESCINIT(3S) routine. The DESCINIT(3S) routine would have to be called after the call to
BLACS_GRIDINIT and would look like this:
CALL DESCINIT (descA, nr, nc, nr, ICEIL(nc, npc),
pesr, pesc, ictxt, lld, info)
Given that matrix A is distributed in a block manner across the processor grid along the columns, the block
size along the columns would be the following:
ncpp = ICEIL(nc, npc)
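The block size determines which processor holds a given global column. The following Python sketch assumes that ICEIL(a, b) returns the ceiling of a/b and that the block distribution assigns consecutive blocks of ncpp columns to consecutive processors, starting with processor 0; the function names are hypothetical.

```python
def iceil(a, b):
    """Ceiling of a / b for positive integers (assumed meaning of ICEIL)."""
    return -(-a // b)

def owner_and_local(j, nc, npc):
    """For 1-based global column j, return (processor index from 0,
    1-based local column index) under a block distribution."""
    ncpp = iceil(nc, npc)
    return (j - 1) // ncpp, (j - 1) % ncpp + 1

def columns_on_pe(p, nc, npc):
    """Number of columns actually stored on processor p."""
    ncpp = iceil(nc, npc)
    first = p * ncpp + 1
    last = min((p + 1) * ncpp, nc)
    return max(0, last - first + 1)

ncpp = iceil(181, 16)                     # block size for nc = 181 on 16 processors
last_owner = owner_and_local(181, 181, 16)
```

For nc = 181 and npc = 16 the block size is 12, so the first 15 processors hold 12 columns each and the last one holds the single remaining column.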
Further assume that the user wants the 2D FFT to be performed on a submatrix starting at global address
(iA, jA). Let this submatrix be of size n1-by-n2. Then the arguments iA and jA represent the global address
of the first element of the submatrix and n1 and n2 represent the size of the submatrix over which the 2D
FFT is to be performed.
Similarly, the iB and jB arguments represent the first element of the submatrix to which the output is to be
written.
Restrictions
In the current release, the matrices A and B must be distributed identically. This means that all of the
arguments provided to DESCINIT(3S) to initialize descA and descB must be equal except for the local
leading dimension.
The flexibility of performing the FFT over any submatrix of the global input matrix is not available in the
current release. Therefore, users must initialize iA, jA, iB, and jB with 1 and n1 with nr and n2 with nc.
All processors must call this routine. In future releases, only those processors that own the matrix over
which the FFT is to be performed will participate in the computation. All other processors will exit the
routine immediately.
2D FFT Theory
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
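COMPLEX A(0:n1-1, 0:n2-1)
COMPLEX B(0:n1-1, 0:n2-1)
PCCFFT2D computes the following:
$$B_{k_1,k_2} = \mathrm{scale} \cdot \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} A_{j_1,j_2}\, \omega_1^{j_1 k_1}\, \omega_2^{j_2 k_2} \qquad \text{for } k_1 = 0, \ldots, n_1-1,\quad k_2 = 0, \ldots, n_2-1$$
where
$$\omega_1 = e^{\,\mathrm{isign} \cdot 2 \pi i / n_1}, \qquad \omega_2 = e^{\,\mathrm{isign} \cdot 2 \pi i / n_2}$$
$i = +\sqrt{-1}$
$\pi = 3.14159\ldots$
$\mathrm{isign} = \pm 1$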
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. To compute any of the
various possible definitions, however, choose the appropriate values for isign and scale.
If you take the FFT with any particular values of isign and scale, the mathematical inverse function is
computed by taking the FFT with -isign and 1 / (n1 · n2 · scale). In particular, if you use isign = +1 and
scale = 1.0 for the forward FFT, you can compute the inverse FFT by using isign = -1 and
scale = 1.0 / (n1 · n2).
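The relationship between the forward and inverse scale factors can be verified on a small matrix. The following Python sketch evaluates the two-dimensional transform directly from its definition (a double sum over both subscripts with the exponent sign given by isign); `dft2` and the sample values are hypothetical, and the code is serial, with no distributed data.

```python
import cmath

def dft2(a, isign, scale):
    """B(k1,k2) = scale * sum over j1, j2 of A(j1,j2) * w1**(j1*k1) * w2**(j2*k2)."""
    n1, n2 = len(a), len(a[0])
    w1 = cmath.exp(isign * 2j * cmath.pi / n1)
    w2 = cmath.exp(isign * 2j * cmath.pi / n2)
    return [[scale * sum(a[j1][j2] * w1 ** (j1 * k1) * w2 ** (j2 * k2)
                         for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]

n1, n2 = 3, 4
a = [[complex(j1 + 1, j2 - 1) for j2 in range(n2)] for j1 in range(n1)]

isign, scale = +1, 1.0
b = dft2(a, isign, scale)                             # forward transform
a_back = dft2(b, -isign, 1.0 / (n1 * n2 * scale))     # mathematical inverse
```

The element `b[0][0]` is the plain sum of all input elements multiplied by scale, which is a convenient sanity check for any choice of isign.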
If either n1 or n2 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by using the following initializations of isys, which is a
vector of length 3.
If both n1 and n2 are factorizable into powers of 2, 3, and 5 (for example, n1 = 30 and n2 = 120), set
isys(2) and isys(3) to 0. If either dimension is not factorizable into powers of 2, 3, and 5, the following
initializations of isys yield the fastest times:
Case                                        isys(1)  isys(2)  isys(3)
Both n1 and n2 factorizable                    2        0        0
n1 not factorizable, n2 factorizable           2        1        0
n1 factorizable, n2 not factorizable           2        0        1
Neither n1 nor n2 factorizable                 2        1        1
Here isys(1) indicates the dimension of the matrix over which the FFT is being performed.
If the numbers n1 and n2 are not known ahead of time, then isys(2) and isys(3) could be initialized to 0 or 1;
if an inappropriate choice is made, the routine would compute the correct result for n1 and n2, although
slowly (if either n1 or n2 were prime). If initialized to 1, more workspace is needed; see the description of
table which follows.
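When n1 and n2 are known only at run time, the isys settings can be derived from their factorizations. The following Python sketch is an illustration under stated assumptions: the helper names are hypothetical, and the table lengths it returns are those given in the description of the table argument (2(n1 + n2) when isys(2) and isys(3) are both 0, otherwise 12(n1 + n2)).

```python
def is_235_smooth(n):
    """True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def choose_isys_2d(n1, n2):
    """Return (isys, table_length) for a 2D transform of size n1-by-n2."""
    isys = [2,
            0 if is_235_smooth(n1) else 1,
            0 if is_235_smooth(n2) else 1]
    factor = 2 if isys[1] == 0 and isys[2] == 0 else 12
    return isys, factor * (n1 + n2)

isys, table_length = choose_isys_2d(240, 181)   # 181 is prime
```

For n1 = 240 and n2 = 181 this selects isys = (2, 0, 1) and a table of 5052 real elements.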
The storage requirements for the vector table depend on the values of the isys vector. The PCCFFT2D
routine accepts the following arguments (all scalar values are private data):
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Number of rows in the submatrix to be transformed.
n2 Integer. (input)
Number of columns in the submatrix to be transformed.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale factor after taking the
Fourier transform.
A Private complex array of dimension (0:lldA– 1, 0:ICEIL(nc,npc)– 1). (input)
Input array of values to be transformed. lldA is the local leading dimension and is initialized
using DESCINIT(3S). A must be declared in a COMMON block.
iA With jA, the global address of the first element of the global input matrix.
jA With iA, the global address of the first element of the global input matrix.
descA Integer vector of dimension 9. (input)
Contains description of the distribution of the matrix A across a 1D processor grid.
B Private complex array of dimension (0:lldB– 1, 0:ICEIL(nc,npc)– 1). (output)
Output array of transformed values. lldB is the local leading dimension and is initialized using
DESCINIT(3S).
Output array B may be the same as the input array A in which case the input array A is
overwritten with the transformed values. B must be declared in a COMMON block.
iB With jB, the global address of the first element of the global matrix where output will be written.
jB With iB, the global address of the first element of the global matrix where output will be written.
descB Integer vector of dimension 9. (input)
Contains description of the distribution of the matrix B across a 1D processor grid. If the input
array and the output array are the same, you must use the same descriptors.
table Private real vector of length 2(n1 + n2) if both isys(2) and isys(3) are equal to zero. Private real
vector of length 12(n1 + n2) if either isys(2) or isys(3) is equal to 1. (input or output)
Table of trigonometric function values.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).
work Private complex vector of length n1r · ICEIL(n2r, npc).
Where n1r and n2r are the values of n1 and n2 rounded up to the nearest powers of 2 greater
than or equal to them. work must be declared in a COMMON block.
isys Private integer vector of length 3. (input)
isys(1) indicates the dimension of the problem and should be set to 2.
isys(2) indicates whether n1 is factorizable into powers of 2, 3, and 5. Set it to 0 if n1 is
factorizable and to 1 if it is not (for example, if n1 is prime).
isys(3) indicates the same for n2.
NOTES
The scale factor scale can take on values of 1.0 or 1.0 / (n1 · n2), depending on whether the forward or
inverse FFT is being computed.
Algorithm
The routine uses a very efficient single processor FFT routine, CCFFT, to do the FFT of each column on the
processors that own the submatrix. It then transposes the submatrix by using intermediate workspace that it
allocates for the purpose, and it again does the FFT along the columns (FFTs of the rows).
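The column, transpose, column sequence can be sketched serially. The following Python code is an illustration of that decomposition only: it uses a direct O(n^2) transform in place of CCFFT, holds the whole matrix in one process, and uses hypothetical names throughout. It checks the result against direct evaluation of the double-sum definition of the 2D transform.

```python
import cmath

def dft(x, isign):
    """Direct 1D transform of one column; stands in for the 1D FFT routine."""
    n = len(x)
    w = cmath.exp(isign * 2j * cmath.pi / n)
    return [sum(x[j] * w ** (j * k) for j in range(n)) for k in range(n)]

def transpose(a):
    return [list(line) for line in zip(*a)]

def dft2_by_columns(cols, isign, scale):
    """cols[j2][j1] holds A(j1,j2), one inner list per matrix column."""
    step1 = [dft(c, isign) for c in cols]      # transform each column: [j2][k1]
    step2 = transpose(step1)                   # transpose:             [k1][j2]
    step3 = [dft(c, isign) for c in step2]     # transform again:       [k1][k2]
    return [[scale * v for v in line] for line in step3]

def dft2_direct(cols, isign, scale):
    """Double-sum definition, used only to check the result."""
    n2, n1 = len(cols), len(cols[0])
    w1 = cmath.exp(isign * 2j * cmath.pi / n1)
    w2 = cmath.exp(isign * 2j * cmath.pi / n2)
    return [[scale * sum(cols[j2][j1] * w1 ** (j1 * k1) * w2 ** (j2 * k2)
                         for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]

n1, n2 = 4, 3
cols = [[complex(j1 - j2, j1 * j2) for j1 in range(n1)] for j2 in range(n2)]
fast = dft2_by_columns(cols, +1, 1.0)
slow = dft2_direct(cols, +1, 1.0)
```

In this sketch the result is left in transposed order (indexed first by k1, then by k2); a distributed implementation needs communication at the transpose step, which this serial version does not model.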
If either isys(2) or isys(3) or both are initialized to 1, then a fast (O(n log(n))) algorithm based on the chirp-z
transform is used for the one dimensional FFT in the corresponding direction. In this case, the vector table
must be real of length 12(n1+n2).
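A chirp-z approach turns a transform of arbitrary length n into a convolution, which can then be evaluated with power-of-2 FFTs in O(n log(n)) operations. The Python sketch below uses Bluestein's formulation, one common way of doing this; it is an assumption that the library's method resembles it, and all names are hypothetical. It relies on the identity jk = (j² + k² − (k − j)²)/2.

```python
import cmath

def fft_pow2(a, isign):
    """Recursive radix-2 FFT without scaling; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft_pow2(a[0::2], isign)
    odd = fft_pow2(a[1::2], isign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(isign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def chirp_z_dft(x, isign):
    """Y(k) = sum_j x(j) * w**(j*k), w = exp(isign*2*pi*i/n), for any n >= 1."""
    n = len(x)
    # chirp(j) = exp(isign * pi * i * j*j / n); j*j is reduced mod 2n for accuracy.
    chirp = [cmath.exp(isign * 1j * cmath.pi * ((j * j) % (2 * n)) / n)
             for j in range(n)]
    size = 1
    while size < 2 * n - 1:          # padded length for the circular convolution
        size *= 2
    a = [x[j] * chirp[j] for j in range(n)] + [0j] * (size - n)
    b = [0j] * size
    for j in range(n):
        b[j] = chirp[j].conjugate()
        if j:
            b[size - j] = chirp[j].conjugate()   # negative lags wrap around
    fa = fft_pow2(a, +1)
    fb = fft_pow2(b, +1)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], -1)
    return [chirp[k] * conv[k] / size for k in range(n)]

x = [complex(j, 1 - 2 * j) for j in range(7)]    # prime length 7
y = chirp_z_dft(x, +1)
```

The extra storage for the chirp factors and the padded convolution is consistent with the larger table requirement stated above for this case.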
Workspace
The routine dynamically allocates two real work arrays:
Size of one work array:
8 · MAX(nr · nc)
Size of the other work array:
2 · MAX(nr · ICEIL(nc,npc), nc · ICEIL(nr,npc))
The workspaces are freed on exiting.
Example code for PCCFFT2D on a 16-processor partition:
      complex A(256,16)
      complex B(256,16)
      complex work(4096)
      common /abw/ A, B, work
      real table(8192)
      integer ictxt, descA(9), descB(9), isign, isys(3)
      integer nr, nc, iceil, info, np
      integer n1, n2
      real scale
      nr = 240
      nc = 181
      n1 = nr
      n2 = nc
      np = n$pes
*
*     Partitioning the processors into a 1-by-np grid
*     (required before descinit; see DESCRIPTION)
*
      call blacs_gridinit(ictxt, 'C', 1, np)
      call descinit(descA, nr, nc, nr, iceil(nc, np),
     &     0, 0, ictxt, 256, info)
      call descinit(descB, nr, nc, nr, iceil(nc, np),
     &     0, 0, ictxt, 256, info)
      isign = -1
      scale = 1.0
      isys(1) = 2
      isys(2) = 0
      isys(3) = 1
*
*     Initializing the trig tables
*
      call pccfft2d(0, n1, n2, scale, A, 1, 1, descA,
     &     B, 1, 1, descB, table, work, isys, info)
*
*     Computing the FFT
*     (info is assumed to be nonzero on failure)
*
      call pccfft2d(isign, n1, n2, scale, A, 1, 1, descA,
     &     B, 1, 1, descB, table, work, isys, info)
      if (info .ne. 0) then
         call exit(0)
      end if
      call exit(0)
      end
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), BLACS_PCOORD(3S), BLACS_PNUM(3S),
DESCINIT(3S)
NAME
PCCFFT3D – Applies a three-dimensional (3D) complex-to-complex Fast Fourier Transform (FFT) to a
matrix distributed across a set of processors
SYNOPSIS
CALL PCCFFT3D (isign, n1, n2, n3, scale, A, iA, jA, kA, descA, B, iB, jB, kB, descB,
table, work, isys, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PCCFFT3D computes the 3D complex Fast Fourier Transform (FFT) of the distributed complex matrix A,
and it stores the results in the distributed complex matrix B.
Description of the Distributed Data
This routine considers the processors to be partitioned into a two-dimensional (2D) grid. The 3D input
matrix A is then distributed across this 2D grid of processors as discussed in the following text.
Consider a 3D matrix A of size nx-by-ny-by-nz. nx, ny, and nz are the sizes of the matrix A along the X, Y,
and Z dimensions, respectively.
Let the processors (N$PES in number) be partitioned into a 2D grid of size npy-by-npz where npy is the
number of processors assigned to the Y dimension and npz is the number of processors assigned to the Z
dimension. Let the number of processors assigned to the X dimension be 1. To partition the processors into
this grid call the GRIDINIT3D(3S) routine as follows:
CALL GRIDINIT3D (ICTXT, 1, npy, npz)
The input matrix, A, and the output matrix, B, are distributed across this 2D processor grid by using the
block (as defined in FORTRAN D and HPF) distribution along the Y and Z dimensions. The distribution
along the X dimension is degenerate. The descriptors descA and descB provide information on the
distribution of the matrices A and B across the processor grid. The descriptors descA and descB are
initialized using the DESCINIT3D(3S) routine.
The DESCINIT3D(3S) routine would have to be called after the call to GRIDINIT3D(3S) and would look
like this:
CALL DESCINIT3D (descA, nx, ny, nz, nx, ICEIL(ny, npy), ICEIL(nz, npz), pesx, pesy, pesz,
ictxt, lldx, lldy, info)
Given that matrix A is distributed in a block manner across the processor grid along the Y and Z dimensions,
the block size along these two dimensions would be as follows:
nypp = ICEIL(ny, npy) and nzpp = ICEIL(nz, npz)
Further assume that the user wants the 3D FFT to be performed on a submatrix starting at global address
(iA, jA, kA). Let this submatrix be of size n1-by-n2-by-n3. Then the iA, jA, and kA arguments represent the
global address of the first element of the submatrix and n1, n2, and n3 represent the size of the submatrix
over which the 3D FFT will be performed.
Similarly, the iB, jB, and kB arguments represent the first element of the submatrix to which the output will
be written.
Restrictions
In the current release, the matrices A and B must be distributed identically. This means that all of the
arguments provided to DESCINIT3D(3S) to initialize descA and descB must be equal except the local
leading dimensions.
The flexibility of performing the FFT over any submatrix of the global input matrix A is not available in the
current release. Therefore, you must initialize iA, jA, kA, iB, jB, and kB with 1 and n1 with nx, n2 with ny,
and n3 with nz.
All processors must call this routine. In future releases, only those processors that own the matrix over
which the FFT is to be performed will participate in the computation. All other processors will exit the
routine immediately.
3D FFT Theory
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
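COMPLEX A(0:n1-1, 0:n2-1, 0:n3-1)
COMPLEX B(0:n1-1, 0:n2-1, 0:n3-1)
PCCFFT3D computes the following:
$$B_{k_1,k_2,k_3} = \mathrm{scale} \cdot \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} \sum_{j_3=0}^{n_3-1} A_{j_1,j_2,j_3}\, \omega_1^{j_1 k_1}\, \omega_2^{j_2 k_2}\, \omega_3^{j_3 k_3}$$
where
$$\omega_1 = e^{\,\mathrm{isign} \cdot 2 \pi i / n_1}, \qquad \omega_2 = e^{\,\mathrm{isign} \cdot 2 \pi i / n_2}, \qquad \omega_3 = e^{\,\mathrm{isign} \cdot 2 \pi i / n_3}$$
$i = +\sqrt{-1}$
$\pi = 3.14159\ldots$
$\mathrm{isign} = \pm 1$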
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
If you take the FFT with any particular values of isign and scale, the mathematical inverse function is
computed by taking the FFT with -isign and 1 / (n1 · n2 · n3 · scale). In particular, if you use isign = +1 and
scale = 1.0 for the forward FFT, you can compute the inverse FFT by using isign = -1 and
scale = 1 / (n1 · n2 · n3).
If any of n1, n2, or n3 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by using the following initializations of isys, which is a
vector of length 4.
The first element of isys indicates the dimension of the problem, i.e., isys(1) = 3. The next three elements of
isys indicate if the lengths n1, n2 and n3 are factorizable into powers of 2, 3 and 5. isys(2) is set to 0 if n1
is factorizable into powers of 2, 3 and 5 and is set to 1 otherwise. Similarly isys(3) and isys(4) are set to
zero if n2 and n3 are factorizable into powers of 2, 3 and 5 and set to 1 if they are not.
For example if n1 = 256, n2 = 240 and n3 = 254, then the best computational time is obtained by setting
isys(1) = 3 (dimension of the problem)
isys(2) = 0
isys(3) = 0
isys(4) = 1
If the numbers n1, n2 and n3 are not known ahead of time, then isys(2), isys(3) and isys(4) could be
initialized to 0 or 1; if an inappropriate choice is made, the routine would compute the correct result,
although slowly (if either n1, n2 or n3 were not factorizable into powers of 2, 3 and 5). If initialized to 1,
more workspace is needed; see the description of table which follows.
The storage requirements for the vector table depend on the values of the isys vector.
The PCCFFT3D routine accepts the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, n3, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the X dimension.
n2 Integer. (input)
Transform size in the Y dimension.
n3 Integer. (input)
Transform size in the Z dimension.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale factor after taking the
Fourier transform.
A Private complex array of the following dimension: (0:lldxA– 1, 0:lldyA– 1, 0:ICEIL(nz,npz)– 1).
(input)
Input array of values to be transformed.
lldxA and lldyA are the local leading dimensions along the X and Y dimensions, and are
initialized using DESCINIT3D(3S). A must be declared in a COMMON block.
iA With jA and kA, the global address of the first element of the global input matrix.
jA With iA and kA, the global address of the first element of the global input matrix.
kA With iA and jA, the global address of the first element of the global input matrix.
descA Integer vector of dimension 12. (input)
Contains description of the distribution of the matrix A across a 3D processor grid.
B Private complex array of the following dimension: (0:lldxB– 1, 0:lldyB– 1, 0:ICEIL(nz,npz)– 1).
(output)
Output array of transformed values.
lldxB and lldyB are the local leading dimensions along the X and Y dimensions and are
initialized using DESCINIT3D(3S).
The output array B may be the same as the input array A in which case the input array A is
overwritten with the transformed values. B must be declared in a COMMON block.
iB With jB and kB, the global address of the first element of the global matrix where output will be
written.
jB With iB and kB, the global address of the first element of the global matrix where output will be
written.
kB With iB and jB, the global address of the first element of the global matrix where output will be
written.
descB Integer vector of dimension 12. (input)
Contains description of the distribution of the matrix B across a 3D processor grid.
If the input array and the output array are the same, then the same descriptors must be used.
table Private real vector of length 2(n1 + n2 + n3) if isys(2), isys(3) and isys(4) = 0. Private real
vector of length 12(n1 + n2 + n3), if isys(2), isys(3) or isys(4) = 1. (input or output)
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or – 1, the values in the table are assumed to be initialized already by a prior call
with isign = 0 (table is input only).
work Private complex vector of length n1r . ICEIL(n2r,npy) . ICEIL(n3r,npz). (workspace)
n1r, n2r, and n3r are the values of n1, n2, and n3 rounded up to the nearest powers of 2
greater than or equal to them. work must be declared in a COMMON block.
NOTES
The scale factor scale can take on values of 1.0 or 1.0/(n1 . n2 . n3) depending on whether the forward or
inverse FFT is being computed.
Algorithm
The routine uses a very efficient single-processor FFT routine, CCFFT, to do the FFT of each column (X dimension)
on the processors that own the submatrix. It then transposes the submatrix along the X-Y plane, using
intermediate workspace that it allocates for the purpose, and again does the FFT along the columns (FFTs of
the Y dimension). The submatrix is again transposed along the X-Y plane to restore the original
distribution. Now the submatrix is transposed along the X-Z planes and the FFTs along the Z dimension are
computed. Finally another transpose along the X-Z plane restores the original distribution.
If any of isys(2), isys(3), or isys(4) is initialized to 1, then a fast (O(n log(n))) algorithm based on the
chirp-z transform is used for the one-dimensional FFT in the corresponding direction. In this case, the
vector table must be real of length 12(n1+n2+n3).
Workspace
The routine dynamically allocates two real work arrays:
Size of one work array:
8 . MAX(nx, ny, nz)
Size of the other work array:
2 . MAX(nx . ICEIL(ny,npy), ny . ICEIL(nx,npy), nx . ICEIL(nz,npz), nz . ICEIL(nx,npz))
The workspaces are freed on exiting.
EXAMPLES
Example code for PCCFFT3D on a 16-processor partition:
      integer ictxt, descA(12), descB(12), isign, isys(4)
      integer nx, ny, nz, npy, npz, iceil, info
      integer n1, n2, n3
      real scale
      nx = 240
      ny = 181
      nz = 145
      n1 = nx
      n2 = ny
      n3 = nz
      npy = 4
      npz = n$pes / 4
      call descinit3d(descA, nx, ny, nz, nx, iceil(ny, npy),
     &     iceil(nz, npz), 0, 0, 0, ictxt, 256, 70, info)
      call descinit3d(descB, nx, ny, nz, nx, iceil(ny, npy),
     &     iceil(nz, npz), 0, 0, 0, ictxt, 256, 70, info)
      isign = -1
      scale = 1.0
      isys(1) = 3
      isys(2) = 0
      isys(3) = 1
      isys(4) = 1
*
* Initializing the trig tables
*
      call pccfft3d(0, n1, n2, n3, scale, A, 1, 1, 1,
     &     descA, B, 1, 1, 1, descB, table, work, isys, info)
*
* FFT
*
      call pccfft3d(isign, n1, n2, n3, scale, A, 1, 1, 1,
     &     descA, B, 1, 1, 1, descB, table, work, isys, info)
SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), GRIDINIT3D(3S), PCOORD3D(3S), PNUM3D(3S)
NAME
PSCFFT2D, PCSFFT2D – Applies a two-dimensional (2D) real-to-complex or complex-to-real Fast Fourier
Transform (FFT) to a matrix distributed across a set of processors
SYNOPSIS
CALL PSCFFT2D (isign, n1, n2, scale, A, iA, jA, descA, B, iB, jB, descB, table, work,
isys, info)
CALL PCSFFT2D (isign, n1, n2, scale, A, iA, jA, descA, B, iB, jB, descB, table, work,
isys, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSCFFT2D computes the 2D real-to-complex Fast Fourier Transform (FFT) of the distributed real matrix A,
and it stores the results in the distributed complex matrix B.
PCSFFT2D computes the two-dimensional complex-to-real Fast Fourier Transform (FFT) of the distributed
complex matrix A, and it stores the results in the distributed real matrix B.
Description of the Distributed Data
PSCFFT2D considers the processors to be partitioned into a one-dimensional (1D) linear array of processors.
The 2D input matrix A is then distributed across this 1D grid of processors as discussed in the following
text.
Consider a 2D matrix A of size nr-by-nc, where nr is the number of rows and nc is the number of columns
of matrix A.
Let the processors (N$PES in number) be partitioned into a 1D grid of size 1-by-npc, where npc = N$PES is
the number of processors assigned to the column dimension of A. Let the number of processors assigned to
the row dimension be 1. To partition processors into this grid, call the BLACS_GRIDINIT routine as
shown:
CALL BLACS_GRIDINIT (ictxt, 'C', 1, npc)
The input matrix, A, and the output matrix, B, are distributed across this 1D linear array of processors by
using the block (as defined in FORTRAN D and HPF) distribution along the columns. The distribution
along the rows is degenerate. The descriptors descA and descB provide information on the distribution of
the matrices A and B across the processor grid. The descriptors descA and descB are initialized using the
DESCINIT(3S) routine. The DESCINIT(3S) routine would have to be called after the call to
BLACS_GRIDINIT and would look like this:
CALL DESCINIT (descA, nr, nc, nr, ICEIL(nc, npc), pesr, pesc, ictxt, llda, info)
Due to the symmetry in the FFT of A, the only computed values stored in the output matrix B are (0:(nr/2),
0:nc-1). Therefore, the call to DESCINIT(3S) for the output matrix B would look like the following:
nrb = (nr/2) + 1
CALL DESCINIT (descB, nrb, nc, nrb, ICEIL(nc, npc), pesr, pesc, ictxt, lldb, info)
Here, llda and lldb are the local leading dimensions of the private data matrices A and B that store the local
portions of the global input and output 2D matrices participating in the FFT computation.
Given that matrix A is distributed in a block manner across the processor grid along the columns, the block
size along the columns would be the following:
ncpp = ICEIL(nc, npc)
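The block size formula can be illustrated with a short Python sketch (not part of the library). It assumes that processor column pc, numbered from 0, holds the pc-th block of ncpp consecutive columns, which is the usual meaning of a block distribution; the function names iceil and owned_columns are illustrative only.

```python
def iceil(a, b):
    """Integer ceiling of a / b, mirroring the ICEIL helper used in the text."""
    return -(-a // b)


def owned_columns(nc, npc, pc):
    """Half-open range [start, stop) of 0-based global columns held by
    processor column pc (0 <= pc < npc) under a block distribution.

    Assumption: processor pc holds the pc-th block of ncpp columns.
    """
    ncpp = iceil(nc, npc)
    start = min(pc * ncpp, nc)
    stop = min(start + ncpp, nc)
    return start, stop


nc, npc = 181, 16          # column count and processor count from the example
ncpp = iceil(nc, npc)
blocks = [owned_columns(nc, npc, pc) for pc in range(npc)]
print("ncpp =", ncpp)
print("first block:", blocks[0], "last block:", blocks[-1])
```

With nc = 181 and 16 processors, the last processor holds a shorter block than the others, because nc is not a multiple of npc.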
Further assume that the user wants the 2D FFT to be performed on a submatrix starting at global address
(iA, jA). Let this submatrix be of size n1-by-n2. Then the arguments iA and jA represent the global address
of the first element of the submatrix and n1 and n2 represent the size of the submatrix over which the 2D
FFT is to be performed.
Similarly, the iB and jB arguments represent the first element of the submatrix to which the output is to be
written.
Restrictions
In the current release, it is required that the matrices A and B be distributed conformably. This means that
apart from the difference in the dimensions of A and B in the number of rows, the distribution along the
columns must be identical.
The flexibility of performing the FFT over any submatrix of the global matrix is not available in the current
release. Therefore, users must initialize iA, jA, iB, and jB with 1 and n1 with nr and n2 with nc.
All processors must call this routine. In future releases, only those processors that own the matrix over
which the FFT is to be performed will participate in the computation. All other processors will exit the
routine immediately.
2D FFT Theory
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
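REAL A(0:n1-1, 0:n2-1)
COMPLEX B(0:n1/2, 0:n2-1)
PSCFFT2D computes the formula:
$$B_{k_1,k_2} = \mathrm{scale} \cdot \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} A_{j_1,j_2}\, \omega_1^{j_1 k_1}\, \omega_2^{j_2 k_2}$$
for k1 = 0, ..., n1/2 and k2 = 0, ..., n2-1
where
$$\omega_1 = e^{\mathrm{isign} \cdot 2\pi i / n_1}, \qquad \omega_2 = e^{\mathrm{isign} \cdot 2\pi i / n_2}$$
i = +√−1, π = 3.14159..., and isign = ±1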
If in a certain application it is known that the FFT of the complex input matrix is real, then instead of using
PCCFFT2D, you can save computation time by using PCSFFT2D. This is often the case because the
complex input matrix was the transform of a real matrix. In this case, you can save about half the
computational work and PCSFFT2D computes an identical formula with the input and output matrices
reversed.
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. To compute any of the
various possible definitions, however, choose the appropriate values for isign and scale.
If you call PSCFFT2D with any particular values of isign and scale, the mathematical inverse function is
computed by calling PCSFFT2D with -isign and 1/(n1 . n2 . scale). In particular, if you use isign = +1 and
scale = 1.0 in PSCFFT2D for the forward FFT, you can compute the inverse FFT by using PCSFFT2D with
isign = -1 and scale = 1.0/(n1 . n2).
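The relationship between the forward and inverse parameters can be checked with a small Python sketch. It does not call the library: dft2 is an illustrative, direct evaluation of the full complex 2D transform (it does not use the half-size storage of PSCFFT2D and PCSFFT2D), and the sample matrix is arbitrary.

```python
import cmath


def dft2(a, isign, scale):
    """Direct 2D DFT:
    b[k1][k2] = scale * sum over j1, j2 of a[j1][j2] * w1**(j1*k1) * w2**(j2*k2)
    with w1 = exp(isign*2*pi*i/n1) and w2 = exp(isign*2*pi*i/n2)."""
    n1, n2 = len(a), len(a[0])
    w1 = cmath.exp(isign * 2j * cmath.pi / n1)
    w2 = cmath.exp(isign * 2j * cmath.pi / n2)
    return [[scale * sum(a[j1][j2] * w1 ** (j1 * k1) * w2 ** (j2 * k2)
                         for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]


a = [[1.0, 2.0, 0.5],
     [-1.0, 0.0, 3.0],
     [2.5, 1.5, -2.0],
     [0.0, 4.0, 1.0]]
n1, n2 = len(a), len(a[0])

isign, scale = +1, 1.0
b = dft2(a, isign, scale)                              # forward transform
back = dft2(b, -isign, 1.0 / (n1 * n2 * scale))        # inverse transform

roundtrip_error = max(abs(back[j1][j2] - a[j1][j2])
                      for j1 in range(n1) for j2 in range(n2))
print("largest round-trip error:", roundtrip_error)
```

The round-trip error is at the level of floating-point rounding, which is what the rule "-isign and 1/(n1 . n2 . scale)" predicts.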
PSCFFT2D is very similar in function to PCCFFT2D, but it takes the real-to-complex transform in the first
dimension, followed by the complex-to-complex transform in the second dimension.
PCSFFT2D does the reverse. It takes the complex-to-complex FFT in the second dimension, followed by the
complex-to-real FFT in the first dimension.
See the SCFFT(3S) man page for more information about real-to-complex and complex-to-real FFTs. The
2D analog of the conjugate formula is as follows:
B(k1, k2) = conjg(B(n1 - k1, n2 - k2))
for
n1/2 < k1 ≤ n1 - 1
0 ≤ k2 ≤ n2 - 1
where the notation conjg(z) represents the complex conjugate of z.
Therefore, you have to compute only slightly more than half of the output values, namely:
B(k1, k2)
for
0 ≤ k1 ≤ n1/2
0 ≤ k2 ≤ n2 - 1
Therefore, the only values of B that are computed are B(0:n1/2, 0:n2-1).
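The conjugate formula can be verified numerically with the following Python sketch, which is independent of the library. It computes only the stored half of the output for a small real matrix and rebuilds the remaining entries from the formula. It assumes that the indices n1 - k1 and n2 - k2 are taken modulo n1 and n2 (so that k2 = 0 maps to itself); dft2_entry and full_entry are illustrative names.

```python
import cmath


def dft2_entry(a, k1, k2, isign=-1):
    """One output value of the direct 2D DFT of a (scale = 1)."""
    n1, n2 = len(a), len(a[0])
    return sum(a[j1][j2]
               * cmath.exp(isign * 2j * cmath.pi * (j1 * k1 / n1 + j2 * k2 / n2))
               for j1 in range(n1) for j2 in range(n2))


a = [[1.0, 2.0, 0.5],
     [-1.0, 0.0, 3.0],
     [2.5, 1.5, -2.0],
     [0.0, 4.0, 1.0]]
n1, n2 = len(a), len(a[0])

# Stored part of the output: B(0:n1/2, 0:n2-1).
half = {(k1, k2): dft2_entry(a, k1, k2)
        for k1 in range(n1 // 2 + 1) for k2 in range(n2)}


def full_entry(k1, k2):
    """Any B(k1, k2), using the conjugate formula for k1 > n1/2."""
    if k1 <= n1 // 2:
        return half[(k1, k2)]
    return half[((n1 - k1) % n1, (n2 - k2) % n2)].conjugate()


symmetry_error = max(abs(full_entry(k1, k2) - dft2_entry(a, k1, k2))
                     for k1 in range(n1) for k2 in range(n2))
print("stored values:", len(half), "of", n1 * n2)
print("largest reconstruction error:", symmetry_error)
```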
If either n1 or n2 is prime or otherwise not factorizable into powers of 2, 3, and 5, then significant
improvements in computational time can be obtained by using the following initializations of isys, which is a
vector of length 3.
If both n1 and n2 are factorizable into powers of 2, 3 and 5, for example, n1 = 30 and n2 = 120 then
isys(1) = 2
isys(2) = 0
isys(3) = 0
If any one dimension is not factorizable into powers of 2, 3 and 5 then the following initializations of isys
yield the fastest times:
n1 not factorizable but n2 factorizable
isys(1) = 2
isys(2) = 1
isys(3) = 0
n1 factorizable but n2 not factorizable
isys(1) = 2
isys(2) = 0
isys(3) = 1
both n1 and n2 not factorizable
isys(1) = 2
isys(2) = 1
isys(3) = 1
Here isys(1) indicates the dimension of the matrix over which the FFT is being performed.
If the numbers n1 and n2 are not known ahead of time, then isys(2) and isys(3) could be initialized to 0 or 1;
if an inappropriate choice is made, the routine would compute the correct result for n1 and n2, although
slowly. If initialized to 1, more workspace is needed; see the description of table which follows.
The storage requirements for the vector table depend on the values of the isys vector.
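When the sizes are known only at run time, the isys settings described above can be derived by testing each size for factors other than 2, 3, and 5. The following Python sketch shows one way to do this; is_235_smooth and make_isys are illustrative names, not library routines, and element 0 of the returned list corresponds to isys(1).

```python
def is_235_smooth(n):
    """True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def make_isys(*sizes):
    """Build the isys vector for a transform with the given sizes.

    The first element is the number of dimensions; each following element
    is 0 if the matching size factors into powers of 2, 3, and 5, else 1.
    """
    return [len(sizes)] + [0 if is_235_smooth(n) else 1 for n in sizes]


print(make_isys(30, 120))         # both sizes factorizable
print(make_isys(240, 181))        # 181 is prime
print(make_isys(256, 240, 254))   # 254 = 2 * 127
```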
These routines accept the following arguments (all scalar values are private data):
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Number of rows in the submatrix to be transformed.
n2 Integer. (input)
Number of columns in the submatrix to be transformed.
info Integer. (output)
info is set to 0 if all the arguments passed to the routine are legal. If any argument has an
illegal value, the routine exits after setting info to a negative number. – info indicates the
position of the illegal argument.
The argument list for PCSFFT2D is identical to that of PSCFFT2D except that the input array for
PCSFFT2D is complex and the output array is real. If the routine PCSFFT2D was being used to compute
the inverse FFT of the matrix B (the FFT of the matrix A), then the arguments pertaining to A and B in
PSCFFT2D are reversed for PCSFFT2D.
NOTES
The scale factor scale can take on values of 1.0 or 1.0/(n1 . n2), depending on whether the forward or
inverse FFT is being computed.
The format of the vector that stores the trig tables (table) is the same for both routines. It can be initialized
by either routine.
Algorithm
PSCFFT2D uses a very efficient single-processor FFT routine, SCFFT, to do the FFT of each column on the
processors that own the submatrix. It then transposes the submatrix by using intermediate workspace that it
allocates for the purpose, and it again does the FFT along the columns (FFTs of the rows).
PCSFFT2D first transposes the matrix and performs a very efficient single processor FFT routine, CCFFT,
on the columns of the transposed matrix (that is, the rows of the original input matrix). This intermediate
matrix is then transposed again after which a complex-to-real FFT, CSFFT, is applied to the columns.
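The column-FFT, transpose, column-FFT structure described above can be sketched in Python as follows. This is only an illustration of the idea on one processor: it uses a direct complex-to-complex 1D DFT (dft1) in place of the single-processor FFT routines, ignores the distribution across processors, and does not exploit the real input. All function names are illustrative.

```python
import cmath


def dft1(x, isign):
    """Direct 1D DFT of x; stands in for a single-processor FFT routine."""
    n = len(x)
    return [sum(x[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dft2_by_transposes(a, isign):
    """2D DFT of a[j1][j2] built from 1D transforms and transposes."""
    columns = transpose(a)                          # columns[j2][j1]
    step1 = [dft1(col, isign) for col in columns]   # step1[j2][k1]
    rows = transpose(step1)                         # rows[k1][j2]
    return [dft1(row, isign) for row in rows]       # result[k1][k2]


def dft2_direct(a, isign):
    """Reference: direct evaluation of the double sum."""
    n1, n2 = len(a), len(a[0])
    return [[sum(a[j1][j2]
                 * cmath.exp(isign * 2j * cmath.pi * (j1 * k1 / n1 + j2 * k2 / n2))
                 for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]


a = [[1.0, 2.0, 0.5],
     [-1.0, 0.0, 3.0],
     [2.5, 1.5, -2.0],
     [0.0, 4.0, 1.0]]
fast = dft2_by_transposes(a, -1)
ref = dft2_direct(a, -1)
max_err = max(abs(fast[k1][k2] - ref[k1][k2])
              for k1 in range(4) for k2 in range(3))
print("largest difference from the direct double sum:", max_err)
```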
If either isys(2) or isys(3) or both are initialized to 1, then a fast (O(n log(n))) algorithm based on the chirp-z
transform is used for the 1D FFT in the corresponding direction. In this case, the vector table must be real
of length 12(n1+n2).
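The idea behind a chirp-z based transform is the identity jk = (j**2 + k**2 - (k - j)**2) / 2, which turns a DFT of any length n into a convolution with a "chirp" sequence. The Python sketch below demonstrates that identity for a prime length. It is not the library's implementation: for clarity it evaluates the convolution directly, which costs O(n**2), whereas a fast implementation would evaluate the convolution with FFTs of a convenient length. The names chirp_z_dft and direct_dft are illustrative.

```python
import cmath


def direct_dft(x, isign):
    """Reference DFT: X[k] = sum over j of x[j] * exp(isign*2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]


def chirp_z_dft(x, isign):
    """DFT of arbitrary length written as a convolution with a chirp."""
    n = len(x)
    # chirp[m] = exp(isign * pi * i * m**2 / n)
    chirp = [cmath.exp(isign * 1j * cmath.pi * m * m / n) for m in range(n)]
    a = [x[j] * chirp[j] for j in range(n)]          # pre-multiply by the chirp
    # Convolve with the conjugate chirp (direct sum here), then post-multiply.
    return [chirp[k] * sum(a[j] * chirp[abs(k - j)].conjugate()
                           for j in range(n))
            for k in range(n)]


x = [1.0, -2.0, 0.5, 3.0, 4.0, -1.5, 2.0]            # length 7 (prime)
max_err = max(abs(u - v)
              for u, v in zip(chirp_z_dft(x, -1), direct_dft(x, -1)))
print("largest difference from the direct DFT:", max_err)
```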
Workspace
The routine dynamically allocates two real work arrays:
Size of one work array:
8 . MAX(nr, nc)
Size of the other work array:
2 . MAX(nr . ICEIL(nc,npc), nc . ICEIL(nr,npc))
The workspaces are freed on exiting.
EXAMPLES
Example code for PSCFFT2D and PCSFFT2D on a 16-processor partition:
      integer ictxt, descA(9), descB(9), descC(9), isign, isys(3)
      integer nr, nc, iceil, info, np, nrB
      integer n1, n2
      real scale
      nr = 240
      nc = 181
      nrB = (nr/2) + 1
      n1 = nr
      n2 = nc
      np = n$pes
      call descinit(descA, nr, nc, nr, iceil(nc, np),
     &     0, 0, ictxt, 256, info)
      call descinit(descC, nr, nc, nr, iceil(nc, np),
     &     0, 0, ictxt, 256, info)
      call descinit(descB, nrB, nc, nrB, iceil(nc, np),
     &     0, 0, ictxt, 129, info)
      isign = -1
      scale = 1.0
      isys(1) = 2
      isys(2) = 0
      isys(3) = 1
*
* Initializing the trig tables
*
      call pscfft2d(0, n1, n2, scale, A, 1, 1, descA,
     &     B, 1, 1, descB, table, work, isys, info)
      call pcsfft2d(isign, n1, n2, scale, B, 1, 1, descB,
     &     C, 1, 1, descC, table, work, isys, info)
      call exit(0)
      end
SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), BLACS_PCOORD(3S), BLACS_PNUM(3S),
DESCINIT(3S)
NAME
PSCFFT3D, PCSFFT3D – Applies a three-dimensional (3D) real-to-complex or complex-to-real Fast Fourier
Transform (FFT) to a matrix distributed across a set of processors
SYNOPSIS
CALL PSCFFT3D (isign, n1, n2, n3, scale, A, iA, jA, kA, descA, B, iB, jB, kB, descB,
table, work, isys, info)
CALL PCSFFT3D (isign, n1, n2, n3, scale, A, iA, jA, kA, descA, B, iB, jB, kB, descB,
table, work, isys, info)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
PSCFFT3D computes the 3D real-to-complex Fast Fourier Transform (FFT) of the distributed real matrix A,
and it stores the results in the distributed complex matrix B.
PCSFFT3D computes the 3D complex-to-real Fast Fourier Transform (FFT) of the distributed complex
matrix B, and it stores the results in the distributed real matrix A.
Description of the Distributed Data
This routine considers the processors to be partitioned into a two-dimensional (2D) grid. The 3D input
matrix A is then distributed across this 2D grid of processors as discussed in the following text.
Consider a 3D matrix A of size nx-by-ny-by-nz. nx, ny, and nz are the sizes of the matrix A along the X, Y,
and Z dimensions, respectively.
Let the processors (N$PES in number) be partitioned into a 2D grid of size npy-by-npz where npy is the
number of processors assigned to the Y dimension and npz is the number of processors assigned to the Z
dimension. Let the number of processors assigned to the X dimension be 1. To partition the processors into
this grid call the GRIDINIT3D(3S) routine as follows:
CALL GRIDINIT3D (ICTXT, 1, npy, npz)
The input matrix, A, and the output matrix, B, are distributed across this 2D processor grid by using the
block (as defined in FORTRAN D and HPF) distribution along the Y and Z dimensions. The distribution
along the X dimension is degenerate. The descriptors descA and descB provide information on the
distribution of the matrices A and B across the processor grid. The descriptors descA and descB are
initialized using the DESCINIT3D(3S) routine. The DESCINIT3D(3S) routine would have to be called
after the call to GRIDINIT3D(3S) and would look like this:
CALL DESCINIT3D (descA, nx, ny, nz, nx, ICEIL(ny, npy), ICEIL(nz, npz), pesx, pesy, pesz,
ictxt, lldxA, lldyA, info)
Due to the symmetry in the FFT of A, the only computed values stored in the output matrix B are (0:(nx/2),
0:ny-1, 0:nz-1). Therefore, the call to DESCINIT3D(3S) for the output matrix B would look like the
following:
nxB = (nx/2) + 1
CALL DESCINIT3D (descB, nxB, ny, nz, nxB, ICEIL(ny, npy), ICEIL(nz, npz), pesx, pesy,
pesz, ictxt, lldxB, lldyB, info)
Here, lldxA, lldyA, lldxB, and lldyB are the local leading dimensions of the private data matrices A and B that store the local
portions of the global input and output 3D matrices participating in the FFT computation.
Given that matrix A is distributed in a block manner across the processor grid along the Y and Z dimensions,
the block size along these two dimensions would be as follows:
nypp = ICEIL(ny, npy) and nzpp = ICEIL (nz, npz)
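As an illustration of these block sizes, the following Python sketch computes the smallest local extents that each processor needs for A and B. It is not a library routine; local_extents is an illustrative name, and the actual local leading dimensions passed to DESCINIT3D(3S) may be larger than these minimum values.

```python
def iceil(a, b):
    """Integer ceiling of a / b, mirroring the ICEIL helper used in the text."""
    return -(-a // b)


def local_extents(nx, ny, nz, npy, npz):
    """Minimum local extents of A and B on one processor.

    The X dimension is not distributed; Y and Z are split into blocks of
    nypp and nzpp. B keeps only nxB = nx/2 + 1 values along X.
    """
    nxB = nx // 2 + 1
    nypp = iceil(ny, npy)
    nzpp = iceil(nz, npz)
    return (nx, nypp, nzpp), (nxB, nypp, nzpp)


# Sizes from the example: 16 processors arranged as a 4-by-4 grid.
extent_a, extent_b = local_extents(240, 181, 145, 4, 16 // 4)
print("A needs at least", extent_a)
print("B needs at least", extent_b)
```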
Further assume that the user wants the 3D FFT to be performed on a submatrix starting at global address
(iA, jA, kA). Let this submatrix be of size n1-by-n2-by-n3. Then the iA, jA, and kA arguments represent the
global address of the first element of the submatrix and n1, n2, and n3 represent the size of the submatrix
over which the 3D FFT will be performed.
Similarly, the iB, jB, and kB arguments represent the first element of the submatrix to which the output will
be written.
Restrictions
In the current release, the matrices A and B must be distributed identically. This means that all of the
arguments provided to DESCINIT3D(3S) to initialize descA and descB must be equal, except the local
leading dimensions and the size of the matrix in the X dimension.
The flexibility of performing the FFT over any submatrix of the global matrix is not available in the current
release. Therefore, you must initialize iA, jA, kA, iB, jB, and kB with 1 and n1 with nx, n2 with ny, and n3
with nz.
All processors must call this routine. In future releases, only those processors that own the matrix over
which the FFT is to be performed will participate in the computation. All other processors will exit the
routine immediately.
3D FFT Theory
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
REAL A(0:n1-1, 0:n2-1, 0:n3-1)
COMPLEX B(0:n1/2, 0:n2-1, 0:n3-1)
PSCFFT3D computes the formula:
$$B_{k_1,k_2,k_3} = \mathrm{scale} \cdot \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} \sum_{j_3=0}^{n_3-1} A_{j_1,j_2,j_3}\, \omega_1^{j_1 k_1}\, \omega_2^{j_2 k_2}\, \omega_3^{j_3 k_3}$$
for k1 = 0, ..., n1/2; k2 = 0, ..., n2-1; and k3 = 0, ..., n3-1
where
$$\omega_1 = e^{\mathrm{isign} \cdot 2\pi i / n_1}, \qquad \omega_2 = e^{\mathrm{isign} \cdot 2\pi i / n_2}, \qquad \omega_3 = e^{\mathrm{isign} \cdot 2\pi i / n_3}$$
i = +√−1, π = 3.14159..., and isign = ±1
If in a certain application it is known that the FFT of the complex input matrix is real, then instead of using
PCCFFT3D, you can save computation time by using PCSFFT3D. Often, this is the case because the
complex input matrix was the transform of a real matrix. In this case, you can save about half of the
computational work and PCSFFT3D computes an identical formula with the input and output matrices
reversed.
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
If you call PSCFFT3D with any particular values of isign and scale, the mathematical inverse function is
computed by calling PCSFFT3D with -isign and 1/(n1 . n2 . n3 . scale). In particular, if you use isign = +1
and scale = 1.0 in PSCFFT3D for the forward FFT, you can compute the inverse FFT by using PCSFFT3D
with isign = -1 and scale = 1.0/(n1 . n2 . n3).
PSCFFT3D is very similar in function to PCCFFT3D, but it takes the real-to-complex transform in the first
dimension, followed by the complex-to-complex transform in the second and third dimensions. PCSFFT3D
does the reverse. It takes the complex-to-complex FFT in the third and second dimensions, followed by the
complex-to-real FFT in the first dimension. See the SCFFT(3S) man page for more information about
real-to-complex and complex-to-real FFTs. The three-dimensional analog of the conjugate formula is as
follows:
B(k1, k2, k3) = conjg(B(n1 – k1, n2 – k2, n3 – k3))
for
n1 / 2 < k1 ≤ n1 – 1
0 ≤ k2 ≤ n2 – 1
0 ≤ k3 ≤ n3 – 1
where the notation conjg(z) represents the complex conjugate of z.
Therefore, you have to compute only slightly more than half of the output values, namely:
B(k1, k2, k3)
for
0 ≤ k1 ≤ n1/2
0 ≤ k2 ≤ n2 - 1
0 ≤ k3 ≤ n3– 1
Therefore, the only values of B that are computed are B(0: n1 / 2, 0:n2– 1,0:n3– 1).
If any of n1, n2, or n3 is prime or otherwise not factorizable into powers of 2, 3, and 5, then significant
improvements in computational time can be obtained by using the following initializations of isys, which is a
vector of length 4.
The first element of isys indicates the dimension of the problem, i.e., isys(1) = 3. The next three elements of
isys indicate if the lengths n1, n2 and n3 are factorizable into powers of 2, 3 and 5. isys(2) is set to 0 if n1
is factorizable into powers of 2, 3 and 5 and is set to 1 otherwise. Similarly isys(3) and isys(4) are set to
zero if n2 and n3 are factorizable into powers of 2, 3 and 5 and set to 1 if they are not.
For example if n1 = 256, n2 = 240 and n3 = 254, then the best computational time is obtained by setting
isys(1) = 3 (dimension of the problem)
isys(2) = 0
isys(3) = 0
isys(4) = 1
If the numbers n1, n2 and n3 are not known ahead of time, then isys(2), isys(3) and isys(4) could be
initialized to 0 or 1; if an inappropriate choice is made, the routine would compute the correct result,
although slowly. If initialized to 1, more workspace is needed; see the description of table which follows.
The storage requirements for the vector table depend on the values of the isys vector.
The PSCFFT3D routine accepts the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, n3, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the X dimension.
n2 Integer. (input)
Transform size in the Y dimension.
n3 Integer. (input)
Transform size in the Z dimension.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale factor after taking the
Fourier transform.
A Private real array of dimension (0:lldxA-1, 0:lldyA-1, 0:ICEIL(nz,npz)-1). (input)
Input array of values to be transformed. lldxA and lldyA are the local leading dimensions along
the X and Y dimensions, and are initialized using DESCINIT3D(3S). A must be declared in a
COMMON block.
iA With jA and kA, the global address of the first element of the global input matrix.
jA With iA and kA, the global address of the first element of the global input matrix.
kA With iA and jA, the global address of the first element of the global input matrix.
descA Integer vector of dimension 12. (input)
Contains description of the distribution of the matrix A across a 3D processor grid.
B Private complex array of dimension (0:lldxB– 1,0:lldyB– 1,0:ICEIL(nz,npz)– 1). (output)
Output array of transformed values. lldxB and lldyB are the local leading dimensions along the
X and Y dimensions and are initialized using DESCINIT3D(3S). B must be declared in a
COMMON block.
The output array B may be the same as the input array A in which case the input array A is
overwritten with the transformed values.
iB With jB and kB, the global address of the first element of the global matrix where output will be
written.
jB With iB and kB, the global address of the first element of the global matrix where output will be
written.
kB With iB and jB, the global address of the first element of the global matrix where output will be
written.
descB Integer vector of dimension 12. (input)
Contains description of the distribution of the matrix B across a 3D processor grid.
If the input array and the output array are the same, then the same descriptors must be used.
table Private real vector of length 2(n1 + n2 + n3) if isys(2), isys(3) and isys(4) = 0. Private real
vector of length 12(n1 + n2 + n3), if isys(2), isys(3) or isys(4) = 1. (input or output)
If isign = 0, the routine initializes table (table is output only). If isign = +1 or – 1, the values in
the table are assumed to be initialized already by a prior call with isign = 0 (table is input only).
work Private complex vector of length 2(n1r . ICEIL(n2r,npy) . ICEIL(n3r,npz)),
where n1r, n2r, and n3r are the values of n1, n2, and n3 rounded up to the nearest powers of 2
greater than or equal to them. (workspace) work must be declared in a COMMON block.
isys Private integer vector of length 4. (input)
isys(1) indicates the dimension of the problem which is 3. isys(1) should be set to 3.
isys(2), isys(3) and isys(4) should be set to 0 or 1, depending on whether n1, n2, or n3 is
factorizable or not into powers of 2, 3 and 5 correspondingly.
info Integer. (output)
info is set to 0 if all the arguments passed to the routine are legal. If any argument has an
illegal value, the routine exits after setting info to a negative number. – info indicates the
position of the illegal argument.
NOTES
The scale factor scale can take on values of 1.0 or 1.0/(n1 . n2 . n3) depending on whether the forward or
inverse FFT is being computed.
The format of the vector that stores the trig tables (table) is the same for both routines. It can be initialized
by either routine.
Algorithm
The routine uses a very efficient single-processor FFT routine, CCFFT, to do the FFT of each column (X dimension)
on the processors that own the submatrix. It then transposes the submatrix along the X-Y plane, using
intermediate workspace that it allocates for the purpose, and again does the FFT along the columns (FFTs of
the Y dimension). The submatrix is again transposed along the X-Y plane to restore the original
distribution. Now the submatrix is transposed along the X-Z planes and the FFTs along the Z dimension are
computed. Finally another transpose along the X-Z plane restores the original distribution.
If any of isys(2), isys(3), or isys(4) is initialized to 1, then a fast (O(n log(n))) algorithm based on the
chirp-z transform is used for the one-dimensional FFT in the corresponding direction. In this case, the
vector table must be real of length 12(n1+n2+n3).
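The sizes of the table and work arguments of PSCFFT3D follow directly from the argument descriptions and can be computed ahead of time. The Python sketch below is an illustration only; table_length, work_length, and next_pow2 are illustrative names, not library routines.

```python
def iceil(a, b):
    """Integer ceiling of a / b, mirroring the ICEIL helper used in the text."""
    return -(-a // b)


def next_pow2(n):
    """Smallest power of 2 greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p


def table_length(sizes, flags):
    """Length of the real vector table: 2 * (sum of sizes) if every isys
    flag is 0, and 12 * (sum of sizes) if any flag is 1."""
    factor = 12 if any(flags) else 2
    return factor * sum(sizes)


def work_length(n1, n2, n3, npy, npz):
    """Length of the complex vector work as given for PSCFFT3D."""
    n1r, n2r, n3r = next_pow2(n1), next_pow2(n2), next_pow2(n3)
    return 2 * n1r * iceil(n2r, npy) * iceil(n3r, npz)


# Sizes from the example: 240 x 181 x 145 on a 4-by-4 processor grid,
# with isys(2:4) = 0, 1, 1.
print("table:", table_length((240, 181, 145), (0, 1, 1)))
print("work: ", work_length(240, 181, 145, 4, 4))
```

For these sizes the work length equals the 2097152 elements declared in the example, and the table length fits within the 8192 elements declared there.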
Workspace
The routine dynamically allocates two real work arrays:
Size of one work array:
8 . MAX(nx, ny, nz)
Size of the other work array:
2 . MAX(nx . ICEIL(ny,npy), ny . ICEIL(nx,npy), nx . ICEIL(nz,npz), nz . ICEIL (nx,npz))
EXAMPLES
Example code for PSCFFT3D on a 16-processor system:
      real A(256, 50, 40)
      real C(256, 50, 40)
      complex B(129, 50, 40)
      complex work(2097152)
      common /abcw/ A, B, C, work
      real table(8192)
      integer ictxt, descA(12), descB(12), descC(12), isign, isys(4)
      integer nx, ny, nz, npy, npz, iceil, info
      integer n1, n2, n3
      real scale
      nx = 240
      ny = 181
      nz = 145
      n1 = nx
      n2 = ny
      n3 = nz
      npy = 4
      npz = n$pes / 4
      call descinit3d(descA, nx, ny, nz, nx, iceil(ny, npy),
     &     iceil(nz, npz), 0, 0, 0, ictxt, 256, 50, info)
      call descinit3d(descB, nx, ny, nz, nx, iceil(ny, npy),
     &     iceil(nz, npz), 0, 0, 0, ictxt, 129, 50, info)
* descC describes the real output array C, distributed like A
      call descinit3d(descC, nx, ny, nz, nx, iceil(ny, npy),
     &     iceil(nz, npz), 0, 0, 0, ictxt, 256, 50, info)
      isign = -1
      scale = 1.0
      isys(1) = 3
      isys(2) = 0
      isys(3) = 1
      isys(4) = 1
*
* Initializing the trig tables
*
      call pscfft3d(0, n1, n2, n3, scale, A, 1, 1, 1,
     &     descA, B, 1, 1, 1, descB, table, work, isys, info)
*
* FFT
*
      call pscfft3d(isign, n1, n2, n3, scale, A, 1, 1, 1,
     &     descA, B, 1, 1, 1, descB, table, work, isys, info)
      isign = +1
      scale = 1.0/float(n1*n2*n3)
*
* Inverse FFT
*
      call pcsfft3d(isign, n1, n2, n3, scale, B, 1, 1, 1,
     &     descB, C, 1, 1, 1, descC, table, work, isys, info)
SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), GRIDINIT3D(3S), PCOORD3D(3S), PNUM3D(3S)
NAME
RCFFT2 – Applies a real-to-complex Fast Fourier Transform (FFT)
SYNOPSIS
CALL RCFFT2 (init, ix, n, x, work, y)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
RCFFT2 calculates the following:
$$y_k = 2 \sum_{j=0}^{n-1} x_j \exp\left(\pm \frac{2\pi i}{n}\, jk\right) \quad \text{for } k = 0, 1, \ldots, n/2$$
$$\frac{2n}{10^{2466}} \le x_i \le \frac{10^{2466}}{2n} \quad \text{for } i = 1, 2, \ldots, n.$$
work Complex array of dimension (3 . n / 2) + 2. (scratch output)
Work storage vector.
y Complex array of dimension (n / 2) + 1. (output)
Result vector.
SEE ALSO
CFFT2(3S), CRFFT2(3S)
SCFFT(3S), which supersedes this routine only on Cray Y-MP systems
NAME
RFFTMLT – Applies complex-to-real or real-to-complex Fast Fourier Transforms (FFTs) on multiple input
vectors
SYNOPSIS
CALL RFFTMLT (x, work, trigs, ifax, inc1x, inc2x, n, lot, isign)
IMPLEMENTATION
UNICOS systems
DESCRIPTION
When isign = – 1, RFFTMLT applies real-to-complex FFTs (forward transforms) on more than one input
vector. When isign = +1, RFFTMLT applies complex-to-real inverse FFTs (inverse transforms) on more than
one input vector.
This routine has the following arguments:
x Real array of dimension (0:n+1, lot). (input and output)
Contains the input values before the call to RFFTMLT, and output values after the call. On exit,
the computed output values are stored in the space originally occupied by input values. Because
the output is written back into the input array and contains n/2+1 complex values per transform,
you must size the input array to contain at least n+2 real elements per transform. (See the Data
Format subsection.)
work Real array of dimension 2 . n . lot. (scratch output)
Work storage vector.
trigs Real array of dimension 2n. (input)
Sine and cosine tables for FFT calculation. The following call initializes the vectors trigs and
ifax:
CALL FFTFAX (n, ifax, trigs)
n Integer. (input)
Length of each data vector. n ≥ 2. n must be even. Any value of n that is not valid causes
FFTFAX to return the error code ifax(1) = – 99.
lot Integer. (input)
The number of data vectors.
isign Integer. (input)
Sign of the transform:
isign = – 1 Calculates real-to-complex (forward) FFT
isign = +1 Calculates complex-to-real (inverse) FFT
NOTES
Only the first n/2 + 1 complex output values are computed for each input vector. The theory of Fourier transforms
implies that because the input is real, the values obey the symmetry:
$$y_{n-k,m} = \overline{y_{k,m}}$$
(where the notation $\overline{z}$ denotes the complex conjugate of z).
Thus, the last n/2 output values are complex conjugates of the first n/2 output values.
Although the summation in the definition runs from 0 to n– 1, actually only the first n/2+1 values for each
input vector are used. The other input values are deduced from the following symmetry:
$$y_{n-k,m} = \overline{y_{k,m}}$$
which, according to the theory, must be true because the transform of the input data is real-valued. The
isign=– 1 and isign=+1 transforms are inverses of each other.
Data Format
The x array contains both input and output values of either the real-to-complex or complex-to-real transform.
The array is declared real, but, on output from the real-to-complex transform and on input into the
complex-to-real transform, x contains complex values. The following describes how complex values are
arranged in the real array x.
Real-to-complex (isign = – 1)
The output values are stored in the same array as the input values. On input, lot real input vectors of length
n are stored as follows:
$x_{j,m}$ is stored in X(j · inc1x, m) for j = 0, 1, ..., n-1 and m = 1, 2, ..., lot
Space for the values X(n · inc1x, m) and X((n+1) · inc1x, m) must also be reserved, although these values are
not used on input, and they may be undefined.
On output, lot complex output vectors of length n / 2 +1 (same as n+2 "real" elements per vector) are stored
in the same array elements as the real input vectors, so that
Real($y_{k,m}$) is stored in X(2k · inc1x, m)
Imaginary($y_{k,m}$) is stored in X((2k+1) · inc1x, m)
for k = 0, 1, ..., n/2 and m = 1, 2, ..., lot
For all lot output vectors, $y_{0,m}$ and $y_{n/2,m}$ have real number values. Thus, their imaginary parts are set to 0.
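The storage rule for the real-to-complex output can be illustrated with a short Python sketch. It is not part of the library: the function name pack_half_spectrum is hypothetical, and a flat zero-based list stands in for one column of X (a single value of m).

```python
def pack_half_spectrum(y, n, inc1x=1):
    """Lay out n/2+1 complex values in a real list, as the Data Format text describes.

    Hypothetical helper for illustration only; it models one data vector.
    """
    assert n % 2 == 0 and len(y) == n // 2 + 1
    x = [0.0] * ((n + 2) * inc1x)            # n+2 real elements per transform
    for k, value in enumerate(y):
        x[2 * k * inc1x] = value.real        # Real(y_k)      -> X(2k * inc1x)
        x[(2 * k + 1) * inc1x] = value.imag  # Imaginary(y_k) -> X((2k+1) * inc1x)
    return x


# Half spectrum of the real vector (1, 2, 3, 4) for isign = -1, so n = 4.
half = [complex(10, 0), complex(-2, 2), complex(-2, 0)]
packed = pack_half_spectrum(half, n=4)
```

With inc1x = 1 the real and imaginary parts simply alternate, which is why the array needs n+2 real elements even though only n of them carry input data.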
EXAMPLES
The following program shows how to invoke RFFTMLT.
parameter (n = 16, lot = 2, inc = 1, jump = inc*(n+2))
real a(jump, lot), trigs(2*n), work(2*n*lot)
integer ifax(19)
. . .
*-----------------------------------------------------------------------
* Compute the FFT of A, using RFFTMLT
end
SEE ALSO
CFFTMLT(3S), CRFFT2(3S), RCFFT2(3S)
SCFFTM(3S), which supersedes this routine only on Cray PVP systems
NAME
SCFFT, CSFFT – Computes a real-to-complex or complex-to-real Fast Fourier Transform (FFT)
SYNOPSIS
CALL SCFFT (isign, n, scale, x, y, table, work, isys)
CALL CSFFT (isign, n, scale, x, y, table, work, isys)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SCFFT computes the FFT of the real array X, and it stores the results in the complex array Y. CSFFT
computes the corresponding inverse complex-to-real transform.
It is customary in FFT applications to use zero-based subscripts; the formulas are simpler that way. For
SCFFT, suppose that the arrays are dimensioned as follows:
REAL X(0:n-1)
COMPLEX Y(0:n/2)
Then the output array is the FFT of the input array, using the following formula for the FFT:
$$Y_k = \mathrm{scale} \cdot \sum_{j=0}^{n-1} X_j \, \omega^{\mathrm{isign} \cdot j \cdot k} \quad \text{for } k = 0, \ldots, n/2$$
where
$\omega = e^{2 \pi i / n}$, $i = +\sqrt{-1}$, $\pi = 3.14159\ldots$, and isign = ±1.
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make these
routines compute any of the various possible definitions, however, by choosing the appropriate values for
isign and scale.
The relevant fact from FFT theory is this: If you call SCFFT with any particular values of isign and scale,
the mathematical inverse function is computed by calling CSFFT with -isign and 1/(n · scale). In particular,
if you use isign = +1 and scale = 1.0 in SCFFT for the forward FFT, you can compute the inverse FFT by
using CSFFT with isign = -1 and scale = 1.0/n.
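The forward/inverse relationship can be checked numerically with a slow reference model. The following Python sketch is an illustration only: scfft_ref and csfft_ref are hypothetical names that evaluate the formula directly in O(n^2) operations, and they do not reproduce the library's table, work, or isys arguments.

```python
import cmath


def scfft_ref(isign, n, scale, x):
    """Y_k = scale * sum_p X_p * w**(isign*p*k) for k = 0..n/2, with w = exp(2*pi*i/n)."""
    return [scale * sum(x[p] * cmath.exp(isign * 2j * cmath.pi * p * k / n)
                        for p in range(n))
            for k in range(n // 2 + 1)]


def csfft_ref(isign, n, scale, x):
    """Same formula, but with n/2+1 complex inputs and n real outputs."""
    # Rebuild the implied second half of the input from X(k) = conjg(X(n-k)).
    full = [x[k] if k <= n // 2 else x[n - k].conjugate() for k in range(n)]
    return [(scale * sum(full[p] * cmath.exp(isign * 2j * cmath.pi * p * k / n)
                         for p in range(n))).real
            for k in range(n)]


n = 8
data = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 4.0]
spectrum = scfft_ref(+1, n, 1.0, data)          # forward: isign = +1, scale = 1.0
restored = csfft_ref(-1, n, 1.0 / n, spectrum)  # inverse: -isign, 1/(n * scale)
```

Up to rounding error, restored equals data, which is the inverse property stated above for the pair (isign, scale) and (-isign, 1/(n · scale)).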
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n Integer. (input)
Size of transform. If n ≤ 2, SCFFT returns without calculating the transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined in the preceding formula.
x SCFFT: Real array of dimension (0:n– 1). (input)
CSFFT: Complex array of dimension (0:n / 2). (input)
Input array of values to be transformed.
y SCFFT: Complex array of dimension (0:n / 2). (output)
CSFFT: Real array of dimension (0:n– 1). (output)
Output array of transformed values.
The output array, y, is the FFT of the input array, x, computed according to the preceding
formula. The output array may be equivalenced to the input array in the calling program. Be
careful when dimensioning the arrays, in this case, to allow for the fact that the complex array
contains two (real) words more than the real array.
table UNICOS systems: Real array of dimension (100 + 4n). (input or output)
UNICOS/mk systems: Real array of dimension (2n). (input or output)
Table of factors and trigonometric functions.
If isign = 0, the table array is initialized to contain trigonometric tables needed to compute an
FFT of size n.
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0.
work UNICOS systems: Real array of dimension (4 + 4n). (scratch output)
UNICOS/mk systems: Real array of dimension (2n).
Work array used for intermediate calculations. Its address space must be different from that of
the input and output arrays.
isys Integer array of dimension (0:isys(0)). (input and output)
Use isys to specify certain processor-specific parameters or options. The first element of the
array specifies how many more elements are in the array.
If isys(0) = 0, the default values of such parameters are used. In this case, you can specify the
argument value as the scalar integer constant 0. If isys(0) > 0, isys(0) gives the upper bound of
the isys array; that is, if il = isys(0), user-specified parameters are expected in isys(1) through
isys(il).
NOTES
This subsection contains implementation information, initialization information, and performance tips.
Real-to-complex FFTs
Notice in the preceding formula that there are n real input values, and n / 2 + 1 complex output values. This
property is characteristic of real-to-complex FFTs.
The mathematical definition of the Fourier transform takes a sequence of n complex values and transforms it
to another sequence of n complex values. A complex-to-complex FFT routine, such as CCFFT(3S), will take
n complex input values, and produce n complex output values. In fact, one easy way to compute a real-to-
complex FFT is to store the input data in a complex array, then call routine CCFFT to compute the FFT.
You get the same answer when using the SCFFT routine.
The reason for having a separate real-to-complex FFT routine is efficiency. Because the input data is real,
you can make use of this fact to save almost half of the computational work. The theory of Fourier
transforms tells us that for real input data, you have to compute only the first n / 2 + 1 complex output
values, because the remaining values can be computed from the first half of the values by the simple
formula:
Y(k) = conjg(Y(n-k)) for n/2 ≤ k ≤ n-1
where the notation conjg(z) represents the complex conjugate of z.
In fact, in many applications, the second half of the complex output data is never explicitly computed or
stored. Likewise, as explained later, only the first half of the complex data has to be supplied for the
complex-to-real FFT.
Another implication of FFT theory is that, for real input data, the first output value, Y(0), will always be a
real number; therefore, its imaginary part will always be 0. If n is an even number, Y(n/2) will also be real
and thus have a zero imaginary part.
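These symmetry properties are easy to confirm numerically. The Python sketch below is illustrative only; full_dft is a hypothetical helper that computes all n outputs by direct summation, which is exactly the redundant work a real-to-complex routine avoids.

```python
import cmath


def full_dft(x, isign=1):
    """Plain O(n^2) DFT returning all n complex outputs (scale = 1)."""
    n = len(x)
    return [sum(x[p] * cmath.exp(isign * 2j * cmath.pi * p * k / n) for p in range(n))
            for k in range(n)]


x = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25]  # real input, n = 6 (even)
n = len(x)
y = full_dft(x)

# Y(k) = conjg(Y(n-k)) for n/2 <= k <= n-1, so the second half is redundant.
symmetric = all(abs(y[k] - y[n - k].conjugate()) < 1e-9 for k in range(n // 2, n))

# Y(0) is real, and because n is even, so is Y(n/2).
edge_values_real = abs(y[0].imag) < 1e-9 and abs(y[n // 2].imag) < 1e-9
```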
Complex-to-real FFTs
Consider the complex-to-real case. The effect of the computation is given by the preceding formula, but
with X complex and Y real.
Generally, the FFT transforms a complex sequence into a complex sequence. However, in a given
application, you may know that the output sequence is real. Often, this is the case because the complex input
sequence was the transform of a real sequence. In this case, you can save about half of the computational
work.
According to the theory of Fourier transforms, for the output sequence, Y, to be a real sequence, the
following identity on the input sequence, X, must be true:
X(k) = conjg(X(n-k)) for n/2 ≤ k ≤ n-1
And, in fact, the input values X(k) for k > n / 2 need not be supplied; they can be inferred from the first half
of the input.
Thus, in the complex-to-real routine, CSFFT, the arrays can be dimensioned as follows:
COMPLEX X(0:n/2)
REAL Y(0:n-1)
There are n / 2 + 1 complex input values and n real output values. Even though only n / 2 + 1 input values
are supplied, the size of the transform is still n in this case, because implicitly you are using the FFT
formula for a sequence of length n.
Another implication of the theory is that X(0) must be a real number (that is, it must have zero imaginary
part). Also, if n is even, X(n/2) must also be real. Routine CSFFT assumes that these values are real; if you
specify a nonzero imaginary part, it is ignored.
Table Initialization
The table array stores the trigonometric tables used in calculation of the FFT. This table must be initialized
by calling the routine with isign = 0 prior to doing the transforms. The table does not have to be
reinitialized if the value of the problem size, n, does not change. Because SCFFT and CSFFT use the same
format for table, either can be used to initialize it (note that CCFFT uses a different table format).
Dimensions
In the preceding description, the array subscripts are assumed to be zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared (assuming n > 0):
REAL X(0:n-1)
COMPLEX Y(0:n/2)
No change is needed in the calling sequence; however, if you prefer you can use the more customary Fortran
style with subscripts starting at 1, as in the following:
REAL X(n)
COMPLEX Y(n/2 + 1)
Performance Tips
These routines will compute an FFT for any value of n.
Performance for a given value of n depends on the prime factorization of n. This fact is characteristic of all
FFT algorithms.
Fastest performance is realized when n is a power of 2, in which case the number of floating-point
operations is approximately (5/2) · n · $\log_2(n)$.
If n contains factors of 3, performance is slightly worse. If n contains factors of 5, it is slightly worse still.
Worst performance is when n is a prime number. In that case, the number of operations is approximately
$4n^2$.
The kernel routines are optimized for values of n that are even numbers and are products of powers of 2, 3,
and 5. (Because the kernel routines have a special case for multiples of 4, even powers of 2 will be slightly
faster than odd powers of 2.)
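When choosing a transform size, it can help to test candidate values of n against these rules before allocating arrays. The Python sketch below is a hedged illustration: the function names are hypothetical, and the operation counts are only the approximations quoted above, not measured costs.

```python
import math


def is_235_smooth(n):
    """True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def flops_power_of_two(n):
    """Approximate count (5/2) * n * log2(n), quoted for n a power of 2."""
    return 2.5 * n * math.log2(n)


def flops_prime(n):
    """Approximate count 4 * n**2, quoted for prime n."""
    return 4 * n * n


fast = flops_power_of_two(1024)  # n = 2**10
slow = flops_prime(1021)         # 1021 is prime
```

Under these estimates, a prime length close to 1024 costs more than a hundred times as many operations as n = 1024 itself, so padding the data to a nearby size that factors into 2, 3, and 5 is usually worthwhile.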
Implementation-dependent Items
The Standard FFT routines were designed so that they could be implemented efficiently on many different
architectures. The calling sequence is the same in any implementation. Certain details, however, depend on
the particular implementation. These details are confined to two areas:
• The first area is the size of the table and work arrays. Different sizes may be needed on different
systems. No change is required to the subroutine call, but you may have to change the array sizes in the
DIMENSION or type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.
In the UNICOS systems implementation, no special options are supported; therefore, you can always
specify an isys argument as constant 0. Other options may be provided in subsequent software releases.
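Because only the array sizes differ between implementations, the sizes can be isolated in one place. The following Python sketch tabulates the table and work dimensions given in the argument descriptions for SCFFT and CSFFT; the function name and the system strings are assumptions made for this illustration.

```python
def scfft_array_sizes(n, system="UNICOS"):
    """Return (table_words, work_words) for SCFFT/CSFFT of size n.

    Sizes follow the argument descriptions: UNICOS uses 100 + 4n and 4 + 4n,
    UNICOS/mk uses 2n for both arrays.
    """
    if system == "UNICOS":
        return 100 + 4 * n, 4 + 4 * n
    if system == "UNICOS/mk":
        return 2 * n, 2 * n
    raise ValueError("unknown system: " + system)


table_words, work_words = scfft_array_sizes(1024)
```

For n = 1024 on UNICOS systems this matches the declarations TABLE(100 + 4*1024) and WORK(4*1024 + 4) used in the examples that follow.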
EXAMPLES
These examples use the table and workspace sizes appropriate to UNICOS systems.
Example 1: Initialize the array TABLE in preparation for doing an FFT of size 1024. In this case
only the arguments isign, n, and table are used. You can use dummy arguments or zeros for the other
arguments in the subroutine call.
REAL TABLE(100 + 4*1024)
CALL SCFFT(0, 1024, 0.0, DUMMY, DUMMY, TABLE, DUMMY, 0)
Example 2: X is a real array of dimension (0:1023), and Y is a complex array of dimension (0:512). Take
the FFT of X and store the results in Y. Before taking the FFT, initialize the TABLE array, as in example 1.
REAL X(0:1023)
COMPLEX Y(0:512)
REAL TABLE(100 + 4*1024)
REAL WORK(4*1024 + 4)
...
CALL SCFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL SCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/1024 is used. Assume that the TABLE array is initialized already.
CALL CSFFT(-1, 1024, 1.0/1024.0, Y, X, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls are not changed.
REAL X(1024)
COMPLEX Y(513)
...
CALL SCFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL SCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. Assume that the TABLE array is initialized already.
REAL X(1024)
COMPLEX Y(513)
EQUIVALENCE ( X(1), Y(1) )
...
CALL SCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)
SEE ALSO
CCFFT(3S), CCFFTM(3S), SCFFTM(3S)
NAME
SCFFT2D, CSFFT2D – Applies a two-dimensional real-to-complex or complex-to-real Fast Fourier
Transform (FFT)
SYNOPSIS
CALL SCFFT2D (isign, n1, n2, scale, x, ldx, y, ldy, table, work, isys)
CALL CSFFT2D (isign, n1, n2, scale, x, ldx, y, ldy, table, work, isys)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.
DESCRIPTION
SCFFT2D computes the two-dimensional Fast Fourier Transform (FFT) of the real matrix X, and it stores
the results in the complex matrix Y. CSFFT2D computes the corresponding inverse transform.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. First
the function of SCFFT2D is described. Suppose the arrays are dimensioned as follows:
REAL X(0:ldx-1, 0:n2-1)
COMPLEX Y(0:ldy-1, 0:n2-1)
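Then the output array is the FFT of the input array, using the following formula for the FFT:
$$Y_{k_1,k_2} = \mathrm{scale} \cdot \sum_{j_2=0}^{n_2-1} \sum_{j_1=0}^{n_1-1} X_{j_1,j_2} \, \omega_1^{j_1 k_1} \, \omega_2^{j_2 k_2} \quad \text{for } k_1 = 0, \ldots, n_1/2 \text{ and } k_2 = 0, \ldots, n_2 - 1$$
where
$\omega_1 = e^{\mathrm{isign} \cdot 2 \pi i / n_1}$, $\omega_2 = e^{\mathrm{isign} \cdot 2 \pi i / n_2}$, $i = +\sqrt{-1}$, $\pi = 3.14159\ldots$, and isign = ±1.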
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make these
routines compute any of the various possible definitions, however, by choosing the appropriate values for
isign and scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n1 · n2 · scale). In
particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by
using isign = -1 and scale = 1.0/(n1 · n2).
SCFFT2D is very similar in function to CCFFT2D, but it takes the real-to-complex transform in the first
dimension, followed by the complex-to-complex transform in the second dimension.
CSFFT2D does the reverse. It takes the complex-to-complex FFT in the second dimension, followed by the
complex-to-real FFT in the first dimension.
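The two-pass structure can be sketched as follows. This Python model is an illustration under stated assumptions, not the library's algorithm: it assumes the usual double-sum definition of the two-dimensional transform with output indices k1 = 0, ..., n1/2 and k2 = 0, ..., n2-1, the names scfft2d_ref and direct_2d are hypothetical, and x[j1][j2] stands for the Fortran element X(j1, j2).

```python
import cmath


def w(isign, n, e):
    """exp(isign * 2*pi*i * e / n), the twiddle factor raised to the power e."""
    return cmath.exp(isign * 2j * cmath.pi * e / n)


def scfft2d_ref(isign, n1, n2, scale, x):
    """Real-to-complex pass along dimension 1, then complex-to-complex along dimension 2."""
    h = n1 // 2 + 1
    # Pass 1: for each column j2, transform over j1 and keep only k1 = 0..n1/2.
    t = [[sum(x[j1][j2] * w(isign, n1, j1 * k1) for j1 in range(n1))
          for j2 in range(n2)] for k1 in range(h)]
    # Pass 2: for each retained k1, full complex transform over j2.
    return [[scale * sum(t[k1][j2] * w(isign, n2, j2 * k2) for j2 in range(n2))
             for k2 in range(n2)] for k1 in range(h)]


def direct_2d(isign, n1, n2, scale, x, k1, k2):
    """The double sum evaluated directly for one output element."""
    return scale * sum(x[j1][j2] * w(isign, n1, j1 * k1) * w(isign, n2, j2 * k2)
                       for j1 in range(n1) for j2 in range(n2))


n1, n2 = 4, 3
x = [[float(3 * j1 + j2 * j2 - 1) for j2 in range(n2)] for j1 in range(n1)]
y = scfft2d_ref(+1, n1, n2, 1.0, x)
max_err = max(abs(y[k1][k2] - direct_2d(+1, n1, n2, 1.0, x, k1, k2))
              for k1 in range(n1 // 2 + 1) for k2 in range(n2))
```

The two passes agree with the direct double sum to rounding error, and the result has only n1/2 + 1 rows because the first pass discards the redundant half of dimension 1.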
See the SCFFT(3S) man page for more information about real-to-complex and complex-to-real FFTs. The
two-dimensional analog of the conjugate formula is as follows:
$$Y_{k_1,k_2} = \overline{Y_{n_1-k_1,\,n_2-k_2}} \quad \text{for } n_1/2 < k_1 \le n_1 - 1, \; 0 \le k_2 \le n_2 - 1$$
The storage requirements for the vector table depend on the values of the isys vector.
This feature does not exist for UNICOS systems and isys is ignored on those machines.
These routines have the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, SCFFT2D returns without
calculating a transform.
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, SCFFT2D returns without
calculating a transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the
Fourier transform, as defined previously.
x SCFFT2D: Real array of dimension (0:ldx– 1, 0:n2– 1). (input)
CSFFT2D: Complex array of dimension (0:ldx– 1, 0:n2– 1). (input)
Array of values to be transformed.
ldx Integer. (input)
The number of rows in the x array, as it was declared in the calling program. That is, the
leading dimension of x.
SCFFT2D: ldx ≥ MAX(n1, 1).
CSFFT2D: ldx ≥ MAX(n1/2 + 1, 1).
y SCFFT2D: Complex array of dimension (0:ldy– 1, 0:n2– 1). (output)
CSFFT2D: Real array of dimension (0:ldy– 1, 0:n2– 1). (output)
Output array of transformed values. The output array can be the same as the input array, in
which case, the transform is done in place and the input array is overwritten with the
transformed values. In this case, it is necessary that the following equalities hold:
SCFFT2D: ldx = 2 · ldy.
CSFFT2D: ldy = 2 · ldx.
ldy Integer. (input)
The number of rows in the y array, as it was declared in the calling program (the leading
dimension of y).
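For the in-place case, the leading dimensions are tied together, so it is convenient to derive both from n1. The Python sketch below is a hypothetical helper; it assumes that the complex array needs at least n1/2 + 1 rows, which is the bound stated for the complex input of CSFFT2D and is applied here to the complex output of SCFFT2D by analogy.

```python
def in_place_leading_dims(n1):
    """Smallest (ldx, ldy) for an in-place SCFFT2D call.

    ldx counts real elements and ldy counts complex elements per column.
    """
    ldy = max(n1 // 2 + 1, 1)  # rows needed for k1 = 0..n1/2 (assumed bound)
    ldx = 2 * ldy              # in-place requirement: ldx = 2 * ldy
    return ldx, ldy


ldx, ldy = in_place_leading_dims(128)
```

For n1 = 128 this gives ldx = 130 and ldy = 65, so the real array carries two padding rows beyond the 128 rows of input data.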
NOTES
The following notes are for UNICOS systems only. SCFFT2D(3S) and CSFFT2D(3S) on UNICOS/mk
systems provide the functionality of PCSFFT2D(3S) and PSCFFT2D on a single PE. For notes about
CSFFT2D(3S) on UNICOS/mk systems, see PSCFFT2D(3S).
Algorithm
SCFFT2D uses a routine similar to SCFFTM to do a real-to-complex FFT on the columns, then uses a
routine similar to CCFFTM to do a complex-to-complex FFT on the rows.
CSFFT2D uses a routine similar to CCFFTM to do a complex-to-complex FFT on the rows, then uses a
routine similar to CSFFTM to do a complex-to-real FFT on the columns.
Table Initialization
The table array stores factors of n1 and n2, and trigonometric tables that are used in calculation of the FFT.
table must be initialized by calling the routine with isign = 0. table does not have to be reinitialized if the
values of the problem sizes, n1 and n2, do not change.
Dimensions
In the preceding description, the array subscripts are assumed to be zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared:
REAL X(0:ldx-1, 0:n2-1)
COMPLEX Y(0:ldy-1, 0:n2-1)
No change is needed in the calling sequence, however, if you prefer to use the more customary Fortran style
with subscripts starting at 1. The same values of ldx and ldy would be passed to the subroutine even if the
input and output arrays were dimensioned as follows:
REAL X(ldx, n2)
COMPLEX Y(ldy, n2)
Performance Tips
This routine computes an FFT for any values of n1 and n2, but the performance depends on the prime
factorizations of n1 and n2. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when both n1 and n2 are powers of 2, in which case the number of
floating-point operations is approximately 5 · n1 · n2 · $\log_2$(n1 · n2).
If either n1 or n2 contains factors of 3, computation time is slightly longer, because more floating-point
operations are required. If they contain powers of 5, it is longer still.
The kernel routines are optimized for values of n1 and n2 that are products of powers of 2, 3, and 5.
In UNICOS implementation, to avoid memory bank conflicts, it is very important to make the leading
dimensions of the arrays odd numbers (or, if that is not possible, make them an odd multiple of 2).
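The padding rule can be expressed as a small helper. The Python sketch below is illustrative only; the function names are hypothetical, and the rule encoded is just the advice given above (prefer an odd leading dimension, otherwise an odd multiple of 2).

```python
def pad_leading_dimension(n):
    """Smallest odd value that is at least n."""
    return n if n % 2 == 1 else n + 1


def is_odd_multiple_of_two(ld):
    """True if ld = 2 * m with m odd, the fallback when ld cannot be odd."""
    return ld % 2 == 0 and (ld // 2) % 2 == 1


ldx = pad_leading_dimension(128)           # real array holding 128 rows of data
in_place_ok = is_odd_multiple_of_two(130)  # in-place case, where ldx must be even
```

A real array with 128 rows of data is therefore declared with 129 rows, and an in-place transform, whose real leading dimension must be even, can use 130 = 2 · 65.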
Implementation-dependent Items
The Cray Standard FFT routines were designed so that they could be implemented efficiently on many
different architectures. The calling sequence is the same in any implementation. Certain details, however,
depend on the particular implementation. These details are confined to three areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. You
do not have to change the subroutine call, but you might have to change the array sizes in the
DIMENSION or type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.
In the UNICOS implementation, no special options are supported; therefore, you can always specify an
isys argument as constant 0. Other options may be provided in subsequent software releases.
• The third area is the issue of which problem sizes or dimensions give optimal performance in a particular
implementation. See the Performance Tips subsection.
EXAMPLES
The following examples are for UNICOS systems only.
Example 1: Initialize the TABLE array in preparation for doing a two-dimensional FFT of size 128 by 256.
In this case, only the isign, n1, n2, and table arguments are used; you can use dummy arguments or zeros for
other arguments.
REAL TABLE(100 + 2*(128 + 256))
CALL SCFFT2D(0, 128, 256, 0.0, DUMMY, 1, DUMMY, 1,
& TABLE, DUMMY, 0)
Example 2: X is a real array of size (0:128, 0:255), and Y is a complex array of dimension (0:64, 0:255).
The first 128 elements of each column of X contain data; for performance reasons, the extra element forces
the leading dimension to be an odd number. Take the two-dimensional FFT of X and store it in Y. Initialize
the TABLE array, as in example 1.
REAL X(0:128, 0:255)
COMPLEX Y(0:64, 0:255)
REAL TABLE(100 + 2*(128 + 256))
REAL WORK(512*256)
...
CALL SCFFT2D(0, 128, 256, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
CALL SCFFT2D(1, 128, 256, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/(128*256) is used. Assume that the TABLE array is initialized already.
CALL CSFFT2D(-1, 128, 256, 1.0/(128.0*256.0), Y, 65,
& X, 129, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change is needed in the subroutine calls.
REAL X(129, 256)
COMPLEX Y(65, 256)
...
CALL SCFFT2D(0, 128, 256, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
CALL SCFFT2D(1, 128, 256, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. In this case, a row must be added to X, because it is equivalenced to a complex array.
Assume that TABLE is already initialized.
REAL X(130, 256)
COMPLEX Y(65, 256)
EQUIVALENCE ( X(1, 1), Y(1, 1) )
...
CALL SCFFT2D(1, 128, 256, 1.0, X, 130, Y, 65, TABLE, WORK, 0)
SEE ALSO
CCFFT(3S), CCFFT2D(3S), CCFFT3D(3S), CCFFTM(3S), SCFFT(3S), SCFFT3D(3S), SCFFTM(3S)
NAME
SCFFT3D, CSFFT3D – Applies a multitasked three-dimensional real-to-complex Fast Fourier Transform
(FFT)
SYNOPSIS
CALL SCFFT3D (isign, n1, n2, n3, scale, x, ldx, ldx2, y, ldy, ldy2, table, work, isys)
CALL CSFFT3D (isign, n1, n2, n3, scale, x, ldx, ldx2, y, ldy, ldy2, table, work, isys)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.
DESCRIPTION
SCFFT3D computes the three-dimensional Fast Fourier Transform (FFT) of the real matrix X, and it stores
the results in the complex matrix Y. CSFFT3D computes the corresponding inverse transform.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. First,
the function of SCFFT3D is described. Suppose the arrays are dimensioned as follows:
REAL X(0:ldx-1, 0:ldx2-1, 0:n3-1)
COMPLEX Y(0:ldy-1, 0:ldy2-1, 0:n3-1)
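Then the output array is the FFT of the input array, using the following formula for the FFT:
$$Y_{k_1,k_2,k_3} = \mathrm{scale} \cdot \sum_{j_3=0}^{n_3-1} \sum_{j_2=0}^{n_2-1} \sum_{j_1=0}^{n_1-1} X_{j_1,j_2,j_3} \, \omega_1^{j_1 k_1} \, \omega_2^{j_2 k_2} \, \omega_3^{j_3 k_3}$$
for $k_1 = 0, \ldots, n_1/2$, $k_2 = 0, \ldots, n_2 - 1$, and $k_3 = 0, \ldots, n_3 - 1$,
where:
$\omega_1 = e^{\mathrm{isign} \cdot 2 \pi i / n_1}$, $\omega_2 = e^{\mathrm{isign} \cdot 2 \pi i / n_2}$, $\omega_3 = e^{\mathrm{isign} \cdot 2 \pi i / n_3}$, $i = +\sqrt{-1}$, $\pi = 3.14159\ldots$, and isign = ±1.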
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make these
routines compute any of the various possible definitions, however, by choosing the appropriate values for
isign and scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n1 · n2 · n3 · scale).
In particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT
by using isign = -1 and scale = 1.0/(n1 · n2 · n3).
SCFFT3D is very similar in function to CCFFT3D(3S), but it takes the real-to-complex transform in the first
dimension, followed by the complex-to-complex transform in the second and third dimensions.
CSFFT3D does the reverse. It takes the complex-to-complex FFT in the third and second dimensions,
followed by the complex-to-real FFT in the first dimension.
See the SCFFTM(3S) man page for more information about real-to-complex and complex-to-real FFTs. The
three-dimensional analog of the conjugate formula is as follows:
$$Y_{k_1,k_2,k_3} = \overline{Y_{n_1-k_1,\,n_2-k_2,\,n_3-k_3}}$$
for n1/2 < k1 ≤ n1 - 1, 0 ≤ k2 ≤ n2 - 1, and 0 ≤ k3 ≤ n3 - 1,
where the notation $\overline{z}$ represents the complex conjugate of z.
Thus, you have to compute only (slightly more than) half of the output values, namely:
$Y_{k_1,k_2,k_3}$ for 0 ≤ k1 ≤ n1/2, 0 ≤ k2 ≤ n2 - 1, and 0 ≤ k3 ≤ n3 - 1.
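To see what "slightly more than half" means in practice, the retained index ranges can simply be counted. The Python sketch below is an illustration; the function name stored_outputs_3d is hypothetical.

```python
def stored_outputs_3d(n1, n2, n3):
    """Number of complex outputs kept: k1 = 0..n1/2, k2 = 0..n2-1, k3 = 0..n3-1."""
    return (n1 // 2 + 1) * n2 * n3


kept = stored_outputs_3d(128, 128, 128)
total = 128 ** 3
fraction = kept / total
```

For a 128 by 128 by 128 transform, 65 of the 128 planes along the first dimension are retained, so the fraction is 65/128, a little over one half.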
UNICOS/mk systems only
If any of n1, n2, or n3 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by using the following initialization of isys, which is a
vector of length 4.
The first element of isys indicates the dimension of the problem, that is, isys(1) = 3. The next three elements
of isys indicate if the lengths n1, n2 and n3 are factorizable into powers of 2, 3 and 5. isys(2) is set to 0 if
n1 is factorizable into powers of 2, 3 and 5 and is set to 1 otherwise. Similarly, isys(3) and isys(4) are set to
zero if n2 and n3 are factorizable into powers of 2, 3 and 5 and set to 1 if they are not.
For example, if n1 = 256, n2 = 240, and n3 = 254, then the best computational time is obtained by setting
the following:
isys(1) = 3 (dimension of the problem)
isys(2) = 0
isys(3) = 0
isys(4) = 1
If the numbers n1, n2, and n3 are not known ahead of time, then isys(2), isys(3), and isys(4) could be
initialized to 0 and the routine would compute the correct result, albeit slowly, if any of n1, n2, or n3 were not
factorizable into powers of 2, 3, and 5.
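When the sizes are known only at run time, the isys settings can be derived from them. The following Python sketch shows the rule under the assumptions of this section; the names is_235_smooth and make_isys_3d are hypothetical, and the returned list holds isys(1) through isys(4) in order.

```python
def is_235_smooth(n):
    """True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def make_isys_3d(n1, n2, n3):
    """Return [isys(1), isys(2), isys(3), isys(4)] for the UNICOS/mk 3D routines."""
    # isys(1) = 3 is the problem dimension; each flag is 0 for a smooth length, else 1.
    return [3] + [0 if is_235_smooth(n) else 1 for n in (n1, n2, n3)]


isys = make_isys_3d(256, 240, 254)
```

For the sizes in the example above, 256 and 240 factor into powers of 2, 3, and 5, while 254 = 2 · 127 does not, so only the last flag is set.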
The storage requirements for the vector table depend on the values of the isys vector.
UNICOS systems
The isys parameter is used to choose between two multitasking strategies and correspondingly different
amounts of workspace to be provided. A brief discussion about the significance of the isys parameter is
provided in the following argument list.
These routines have the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, n3, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, SCFFT3D returns without
computing a transform.
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, SCFFT3D returns without
computing a transform.
n3 Integer. (input)
Transform size in the third dimension. If n3 is not positive, SCFFT3D returns without
computing a transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined previously.
x SCFFT3D: Real array of dimension (0:ldx– 1, 0:ldx2– 1, 0:n3– 1). (input)
CSFFT3D: Complex array of dimension (0:ldx– 1, 0:ldx2– 1, 0:n3– 1). (input)
Array of values to be transformed.
ldx Integer. (input)
The first dimension of x, as it was declared in the calling program (the leading dimension of x).
SCFFT3D: ldx ≥ MAX(n1, 1).
CSFFT3D: ldx ≥ MAX(n1/2 + 1, 1).
ldx2 Integer. (input)
The second dimension of x, as it was declared in the calling program. ldx2 ≥ MAX(n2, 1).
isys = 0 or 1 depending on the amount of workspace the user can provide to the routine.
UNICOS/mk systems: Integer array of dimension 4. (input)
isys(1) = 3
isys(2) = 0 if n1 is factorizable into powers of 2, 3, and 5; 1 if it is not
isys(3) = 0 if n2 is factorizable into powers of 2, 3, and 5; 1 if it is not
isys(4) = 0 if n3 is factorizable into powers of 2, 3, and 5; 1 if it is not
NOTES
The following notes are for UNICOS systems only. SCFFT3D and CSFFT3D on UNICOS/mk systems
provide the functionality of PSCFFT3D and PCSFFT3D on a single PE. For notes on SCFFT3D and CSFFT3D on
UNICOS/mk systems, see PSCFFT3D(3S).
SCFFT3D is the generalization of SCFFT2D(3S) to three dimensions. All the notes for SCFFT2D(3S)
apply, with the obvious modifications for three dimensions.
Algorithm
SCFFT3D uses a routine similar to SCFFTM(3S) to do multiple FFTs first on all columns of the input
matrix, then uses a routine similar to CCFFTM(3S) on all rows of the result, and then on all planes of that
result. See SCFFTM(3S) and CCFFTM(3S) for more information about the algorithms used.
EXAMPLES
The following examples are for UNICOS systems only. In all the examples shown below, isys is set to 0.
For small 3D FFTs, setting isys = 1 and providing adequate workspace would
yield better performance.
Example 1: Initialize the TABLE array in preparation for doing a three-dimensional FFT of size 128 by 128
by 128. In this case only the isign, n1, n2, n3, and table arguments are used; you can use dummy arguments
or zeros for other arguments.
REAL TABLE(100 + 2*(128 + 128 + 128))
CALL SCFFT3D(0, 128, 128, 128, 0.0, DUMMY, 1, 1, DUMMY, 1, 1,
& TABLE, DUMMY, 0)
Example 2: X is a real array of size (0:128, 0:128, 0:128). The first 128 elements of each dimension
contain data; for performance reasons, the extra element forces the leading dimensions to be odd numbers. Y
is a complex array of dimension (0:64, 0:128, 0:128). Take the three-dimensional FFT of X and store it in
Y. Initialize the TABLE array, as in example 1.
Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/(128**3) is used. Assume that the TABLE array is initialized already.
CALL CSFFT3D(-1, 128, 128, 128, 1.0/128.0**3, Y, 65, 129,
& X, 129, 129, TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change is made in the subroutine calls.
REAL X(129, 129, 129)
COMPLEX Y(65, 129, 129)
REAL TABLE(100 + 2*(128 + 128 + 128))
REAL WORK(512*128)
...
CALL SCFFT3D(0, 128, 128, 128, 1.0, X, 129, 129,
& Y, 65, 129, TABLE, WORK, 0)
CALL SCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
& Y, 65, 129, TABLE, WORK, 0)
Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. Assume that the TABLE array is initialized already.
REAL X(130, 129, 129)
COMPLEX Y(65, 129, 129)
EQUIVALENCE (X(1, 1, 1), Y(1, 1, 1))
...
CALL SCFFT3D(1, 128, 128, 128, 1.0, X, 130, 129,
& Y, 65, 129, TABLE, WORK, 0)
SEE ALSO
CCFFT(3S), CCFFT2D(3S), CCFFT3D(3S), CCFFTM(3S), SCFFT(3S), SCFFT2D(3S), SCFFTM(3S)
NAME
SCFFTM, CSFFTM – Applies multiple real-to-complex or complex-to-real Fast Fourier Transforms (FFTs)
SYNOPSIS
CALL SCFFTM (isign, n, lot, scale, x, ldx, y, ldy, table, work, isys)
CALL CSFFTM (isign, n, lot, scale, x, ldx, y, ldy, table, work, isys)
IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.
DESCRIPTION
SCFFTM computes the FFT of each column of the real matrix X, and it stores the results in the
corresponding column of the complex matrix Y. CSFFTM computes the corresponding inverse transforms.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. First,
the function of SCFFTM is described. Suppose that the arrays are dimensioned as follows:
REAL X(0:ldx-1, 0:lot-1)
COMPLEX Y(0:ldy-1, 0:lot-1)
In fact, in many applications, the second half of the complex output data is never explicitly computed or
stored. Likewise, only the first half of the complex data in each column has to be supplied for the
complex-to-real FFT.
Another implication of FFT theory is that for real input data, the first output value in each column, Y(0, L),
will always be a real number; therefore, its imaginary part will always be 0. If n is an even number,
Y(n/2, L) will also be real and have a 0 imaginary part.
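The redundancy described above can be checked numerically. The following Python sketch is not the library routine: it uses a naive O(n²) transform, and the helper name `dft` and the sign convention exp(−2πi·m·k/n) are assumptions made only for illustration (the symmetry holds for either sign).

```python
import cmath

def dft(x, sign=-1):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                for m in range(n))
            for k in range(n)]

x = [1.0, 4.0, -2.0, 0.5, 3.0, 7.0, -1.0, 2.0]   # one real column, n = 8
n = len(x)
y = dft(x)

# Y(0) and, for even n, Y(n/2) have a zero imaginary part.
first_is_real = abs(y[0].imag) < 1e-9
middle_is_real = abs(y[n // 2].imag) < 1e-9

# The second half mirrors the first: Y(k) = conj(Y(n-k)).
mirrored = all(abs(y[k] - y[n - k].conjugate()) < 1e-9 for k in range(1, n))

# So only Y(0) ... Y(n/2) carry information: n/2 + 1 complex values.
stored = y[: n // 2 + 1]
```

This is why the complex output array in the examples needs only n/2 + 1 rows per column.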
Complex-to-real FFTs
Consider the complex-to-real case. The effect of the computation is given by the preceding formula, but
with X complex and Y real.
In general, the FFT transforms a complex sequence into a complex sequence; however, in a certain
application you may know the output sequence is real, perhaps because the complex input sequence was the
transform of a real sequence. In this case, you can save about half of the computational work.
According to the theory of Fourier transforms, for the output sequence, Y, to be a real sequence, the
following identity on the input sequence, X, must be true:
$$X_{k,L} = \overline{X_{n-k,L}} \quad \text{for } \frac{n}{2} \le k \le n-1$$
And, in fact, the input values $X_{k,L}$ for $k > \frac{n}{2}$ do not have to be supplied, because they can be
inferred from the first half of the input.
Thus, in the complex-to-real routine, CSFFTM, the arrays can be dimensioned as follows:
COMPLEX X(0:ldx-1, 0:lot-1)
REAL Y(0:ldy-1, 0:lot-1)
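As an illustration of why the second half of the input can be omitted, the Python sketch below rebuilds it from the first half and then inverts the transform. It is a model of the idea only, not of CSFFTM itself; the helper names, the naive transform, and the sign conventions (−1 forward, +1 inverse) are assumptions.

```python
import cmath

def dft(x, sign):
    """Naive O(n^2) transform with the given exponent sign (illustration only)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                for m in range(n))
            for k in range(n)]

def expand_half_spectrum(half, n):
    """Rebuild all n values from X(0..n/2) using X(k) = conj(X(n-k))."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full

n = 8
original = [1.0, 4.0, -2.0, 0.5, 3.0, 7.0, -1.0, 2.0]
half = dft(original, -1)[: n // 2 + 1]     # what a real-to-complex step would store
full = expand_half_spectrum(half, n)
scale = 1.0 / n                            # plays the role of the scale argument
recovered = [scale * v for v in dft(full, +1)]

max_imag = max(abs(v.imag) for v in recovered)
max_err = max(abs(v.real - o) for v, o in zip(recovered, original))
```

With the scale factor 1/n the round trip returns the original real data, up to rounding.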
NOTES
Table Initialization
The table array contains the trigonometric tables used in calculation of the FFT. You must initialize this
table by calling the routine with isign = 0 prior to doing the transforms. table does not have to be
reinitialized if the value of the problem size, n, does not change.
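The point of initializing the table once can be sketched in Python as follows. The layout of the real table array is not described here, so `make_table` and `dft_with_table` are hypothetical helpers that only model the idea: the trigonometric factors depend on n alone, so one table serves every column and every later call with the same n.

```python
import cmath

def make_table(n):
    """Precompute the n roots of unity exp(-2*pi*i*k/n) (hypothetical helper)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]

def dft_with_table(column, table):
    """Naive transform of one column that looks factors up instead of recomputing them."""
    n = len(column)
    if len(table) != n:
        raise ValueError("table was initialized for a different n")
    return [sum(column[m] * table[(m * k) % n] for m in range(n))
            for k in range(n)]

n = 4
table = make_table(n)                      # analogous to the isign = 0 call
columns = [[1.0, 2.0, 3.0, 4.0],           # lot = 2 columns of length n
           [0.0, 1.0, 0.0, -1.0]]
results = [dft_with_table(col, table) for col in columns]   # same table for each column
```

If n changes, the lookup no longer matches the problem size, which is why the table must then be rebuilt.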
Dimensions
In the preceding description, it is assumed that array subscripts were zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared (for SCFFTM):
REAL X(0:ldx-1, 0:lot-1)
COMPLEX Y(0:ldy-1, 0:lot-1)
No change is made in the calling sequence, however, if you prefer to use the more customary Fortran style
with subscripts starting at 1. The same values of ldx and ldy would be passed to the subroutine even if the
input and output arrays were dimensioned as follows:
REAL X(ldx, lot)
COMPLEX Y(ldy, lot)
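The reason the same ldx and ldy are passed in both styles is that the storage offset of an element depends only on the leading dimension, not on the lower bounds. A small Python model of Fortran column-major addressing (the function names are illustrative):

```python
def offset_zero_based(i, l, ld):
    """Storage offset of X(i, l) when declared X(0:ld-1, 0:lot-1)."""
    return i + l * ld

def offset_one_based(i, l, ld):
    """Storage offset of X(i, l) when declared X(ld, lot)."""
    return (i - 1) + (l - 1) * ld

ldx, lot = 129, 56
# Element (i, l) in the zero-based declaration and element (i+1, l+1) in the
# one-based declaration occupy the same storage location.
same_layout = all(
    offset_zero_based(i, l, ldx) == offset_one_based(i + 1, l + 1, ldx)
    for i in range(ldx) for l in range(lot)
)
```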
Performance Tips
This routine computes an FFT for any value of n, but the performance for a given value of n depends on the
prime factorization of n. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when n is a power of 2, in which case the number of floating-point
operations is approximately:
$$\frac{5}{2} \cdot \mathit{lot} \cdot n \cdot \log_2(n)$$
If n contains factors of 3, computation time is slightly longer, because more floating-point operations are
required. It is longer still if n contains powers of 5. Slowest performance is when n is a prime number, in
which case the number of floating-point operations is approximately $4 \cdot \mathit{lot} \cdot n^2$.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5. (Because the
kernel routines have a special case for multiples of 4, even powers of 2 will be slightly faster than odd
powers of 2.)
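To get a feel for these estimates, the Python sketch below evaluates the two approximate operation counts quoted above and checks whether a given n has only the factors 2, 3, and 5. The function names are illustrative, and the counts are the approximations from the text, not measurements.

```python
import math

def only_factors_2_3_5(n):
    """True if n is a product of powers of 2, 3, and 5."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def flops_power_of_two(n, lot):
    """Approximate count 5/2 * lot * n * log2(n), valid when n is a power of 2."""
    return 2.5 * lot * n * math.log2(n)

def flops_prime(n, lot):
    """Approximate count 4 * lot * n**2, valid when n is prime."""
    return 4.0 * lot * n * n

fast = flops_power_of_two(128, 50)   # 2.5 * 50 * 128 * 7
slow = flops_prime(127, 50)          # 4 * 50 * 127**2
ratio = slow / fast
```

For lot = 50, moving from the prime length 127 to the power-of-2 length 128 cuts the estimated work by a factor of roughly 29, which is why padding data to a favorable length is often worthwhile.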
In the UNICOS implementation, to avoid memory bank conflicts, it is very important to make the leading
dimensions of the arrays odd numbers (or, if that is not possible, make them an odd multiple of 2). To
attain best vectorization performance, the lot size should be at least 64, and preferably it should be a multiple
of 64.
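The leading dimensions used in the examples of this man page follow from this advice. The Python sketch below reproduces them; the helper names are hypothetical, and the assumption that the complex array needs n/2 + 1 rows is taken from the declaration of Y in example 2, not from a stated requirement.

```python
def smallest_odd_at_least(k):
    """Smallest odd integer >= k (hypothetical helper)."""
    return k if k % 2 == 1 else k + 1

def is_odd_multiple_of_two(k):
    """True for 2, 6, 10, ...: twice an odd number."""
    return k % 2 == 0 and (k // 2) % 2 == 1

n = 128
ldx = smallest_odd_at_least(n)            # rows for n real values, padded to odd
ldy = smallest_odd_at_least(n // 2 + 1)   # rows for the first half of the spectrum
ldx_equivalenced = 2 * ldy                # real array overlaid on the complex one

lot = 50
lot_fills_vectors = lot >= 64 and lot % 64 == 0
```

Here ldx is 129 and ldy is 65, as in example 2, and the equivalenced real array of example 5 gets 130 rows, an odd multiple of 2. A lot size of 50, as used in the examples, does not meet the vectorization recommendation.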
Neither SCFFTM nor CSFFTM is optimized on UNICOS/mk systems.
Implementation-dependent Items
The Standard FFT routines were designed so that they could be implemented efficiently on many different
architectures. The calling sequence is the same in any implementation. Certain details, however, depend on
the particular implementation. These details are confined to three areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. No
change is required to the subroutine call, but you might have to change the array sizes in the
DIMENSION or type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.
No special options are supported; therefore, you can always specify an isys argument as constant 0. Other
options may be provided in subsequent software releases.
• The third area is the issue of which problem sizes or dimensions give optimal performance in a particular
implementation. See the Performance Tips subsection.
EXAMPLES
Example 1: Initialize the TABLE array in preparation for doing an FFT of size 128. In this case
only the isign, n, and table arguments are used; you may use dummy arguments or zeros for the other
arguments in the subroutine call.
REAL TABLE(100 + 2*128)
CALL SCFFTM(0, 128, 1, 0.0, DUMMY, 1, DUMMY, 1,
& TABLE, DUMMY, 0)
Example 2: X is a real array of dimension (0:128, 0:55), and Y is a complex array of dimension (0:64,
0:55). The first 128 elements in each column of X contain data; the extra element forces an odd leading
dimension. Take the FFT of the first 50 columns of X and store the results in the first 50 columns of Y.
Before taking the FFT, initialize the TABLE array, as in example 1.
REAL X(0:128, 0:55)
COMPLEX Y(0:64, 0:55)
REAL TABLE(100 + 2*128)
REAL WORK((2*128 + 4)*50)
...
CALL SCFFTM(0, 128, 50, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
CALL SCFFTM(1, 128, 50, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/128 is used. Assume that the TABLE array is initialized already.
CALL CSFFTM(-1, 128, 50, 1.0/128.0, Y, 65, X, 129,
& TABLE, WORK, 0)
Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change is made in the subroutine calls.
Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. In this case, a row must be added to X, because it is equivalenced to a complex array. The
leading dimension of X is two times an odd number; therefore, memory bank conflicts are minimal. Assume
that TABLE is initialized already.
REAL X(130, 56)
COMPLEX Y(65, 56)
EQUIVALENCE (X(1, 1), Y(1, 1))
...
CALL SCFFTM(1, 128, 50, 1.0, X, 130, Y, 65, TABLE, WORK, 0)
SEE ALSO
CCFFT(3S), CCFFTM(3S), SCFFT(3S)
NAME
SCNVL1D – Computes a real one-dimensional (1D) convolution of two vectors
SYNOPSIS
CALL SCNVL1D (domain, isign, shape, symm, a, m, b, n, c, inc, table, work)
IMPLEMENTATION
UNICOS/mk systems
This routine executes on a single PE and uses only private data.
DESCRIPTION
SCNVL1D computes the convolution of a real filter vector a with a real data vector b, producing the output
vector c.
Let
a = a(1), a(2), . . . , a(m)
b = b(1), b(2), . . . , b(n)
be the filter and data vectors. This routine requires that m ≤ n. If m is greater than n, then a and b must be
interchanged in the calling sequence.
The convolution operation can be defined either with or without a zero-padded data vector. If we assume a
zero-padded data vector, then the convolution output sequence would be the following:
The routine allows the user to choose between computing the convolution product in either the time or
frequency domain. This can be done by setting the character variable domain to either ’F’ or ’f’ to indicate
frequency domain or ’T’ or ’t’ to indicate time domain computation.
If the user chooses to compute the convolution product in the frequency domain, then the variable isign is
used to decide if the trig tables are being initialized or the convolution product is being computed. The
SCNVL1D routine would have to be called twice (like the FFT routines), once to initialize the trig tables and
the second time to actually compute the convolution product.
If the user chooses to compute the convolution in the time domain, then the variable isign is ignored and the
convolution product is computed in the first and only pass. In addition, the two other vectors, table and
work, are also ignored.
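The two modes can be compared with a small Python model. This is a sketch of the general technique (pad, transform, multiply pointwise, invert), not of the library's internal algorithm; the function names, the naive transform, and the table layout are assumptions.

```python
import cmath

def conv_time(a, b):
    """Zero-padded convolution computed directly (time domain, single pass)."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def make_table(size):
    """Trig table for transforms of length `size`, built once and reused."""
    return [cmath.exp(-2j * cmath.pi * k / size) for k in range(size)]

def dft(x, table, inverse=False):
    """Naive transform that reads its factors from the table."""
    size = len(table)
    out = []
    for k in range(size):
        acc = 0j
        for m in range(size):
            w = table[(m * k) % size]
            acc += x[m] * (w.conjugate() if inverse else w)
        out.append(acc / size if inverse else acc)
    return out

def conv_freq(a, b, table):
    """Same product in the frequency domain."""
    size = len(table)
    fa = dft(list(a) + [0.0] * (size - len(a)), table)
    fb = dft(list(b) + [0.0] * (size - len(b)), table)
    prod = [u * v for u, v in zip(fa, fb)]
    return [v.real for v in dft(prod, table, inverse=True)]

a = [1.0, 2.0, 3.0]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
table = make_table(len(a) + len(b) - 1)   # first pass: set up the tables
c_f = conv_freq(a, b, table)              # second pass: compute the product
c_t = conv_time(a, b)                     # time domain needs no table or work array
```

Both paths give the same sequence up to rounding; which one is cheaper depends on m and n, as discussed in the NOTES section.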
SCNVL1D also provides a feature that is important in several signal processing applications: it allows
the user to set a decimation rate for the output vector. This is done using the argument inc. If inc is set to
1, then all the output elements are computed, that is, (n+m−1) elements for zero-padded convolution and
(n−m+1) elements for non-zero-padded convolution. Setting inc to a positive integer greater than 1 makes the
routine compute every inc-th element of the convolution output. For example, setting inc = 2 means that the
routine computes every other output element.
For example, if m = 10, n = 200, and shape = ’V’, then setting inc = 1 would result in 200−10+1 = 191
elements of the output vector being computed and stored in c(1), . . . , c(191). If inc is set to 2, then
trunc((n−m+inc)/inc) = 96 elements of the output vector are computed and stored in c(1), . . . , c(96).
Here, trunc is used to indicate the truncation operation.
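The element counts above can be reproduced with a short Python helper. The formula for shape = 'V' is the one given in the text; the corresponding formula for shape = 'F' is an assumption made by analogy, chosen so that it agrees with the EXAMPLES section. The function name is illustrative.

```python
def output_length(m, n, shape, inc=1):
    """Number of elements stored in c (sketch; the 'F' case is an assumption)."""
    if shape in ("V", "v"):
        full = n - m + 1          # valid data only
    elif shape in ("F", "f"):
        full = n + m - 1          # zero-padded
    else:
        raise ValueError("shape must be 'F' or 'V'")
    # Elements 1, 1+inc, 1+2*inc, ... of the undecimated output are kept,
    # which is trunc((full - 1 + inc) / inc).
    return (full - 1 + inc) // inc

count_all = output_length(10, 200, "V", 1)
count_half = output_length(10, 200, "V", 2)
```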
Another feature of SCNVL1D that is of importance in certain signal processing applications such as FIR
filters is that the filter vector may be symmetric or unsymmetric. If the filter vector is symmetric, the routine
saves some computation by adding the elements of b that contribute equally to the convolution sum. The
filter vector a can be either even or odd symmetric depending on whether m is even or odd. The character
argument symm can be set to ’S’ or ’s’ to indicate symmetry and ’N’ or ’n’ to indicate that a is
non-symmetric.
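The saving from symmetry can be sketched in Python as follows. This models the idea of adding the two data elements that share a coefficient before multiplying; it assumes that a symmetric filter means a(i) = a(m+1−i), and it is not the library's implementation.

```python
def conv_valid(a, b):
    """Convolution without zero padding, computed term by term."""
    m, n = len(a), len(b)
    return [sum(a[i] * b[k + m - 1 - i] for i in range(m))
            for k in range(n - m + 1)]

def conv_valid_symmetric(a, b):
    """Same result for a symmetric filter, with about half the multiplications."""
    m, n = len(a), len(b)
    half = m // 2
    out = []
    for k in range(n - m + 1):
        # Add the two data elements that share a coefficient, then multiply once.
        acc = sum(a[i] * (b[k + i] + b[k + m - 1 - i]) for i in range(half))
        if m % 2 == 1:            # odd length: the middle coefficient is unpaired
            acc += a[half] * b[k + half]
        out.append(acc)
    return out

a_odd = [1, 2, 3, 2, 1]           # m odd
a_even = [1, 2, 2, 1]             # m even
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

same_odd = conv_valid_symmetric(a_odd, b) == conv_valid(a_odd, b)
same_even = conv_valid_symmetric(a_even, b) == conv_valid(a_even, b)
```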
This routine has the following arguments:
domain Character*1. (input)
Specifies whether the computation should proceed in the frequency domain or the time domain.
domain = ’F’ or ’f’ for frequency domain computation
domain = ’T’ or ’t’ for time domain computation
isign Integer. (input)
Specifies if the routine should initialize the trig tables or proceed with the computation of the
convolution product in the frequency domain. If the computation is to be performed in the time
domain then isign is ignored.
isign = 0 to set the trig tables
isign = 1 to compute the convolution product
shape Character*1. (input)
Specifies if the computation should proceed with or without a zero padded data vector.
NOTES
The flexibility of choosing between frequency domain and time domain computation of the convolution has
been provided so that the user may experiment between the two and choose the one that is faster for the
particular problem size at hand.
Usually when n is big and m is small, time domain computation is less expensive. If both m and n are large,
then the frequency domain computation is less expensive. For all problem sizes in between, the user is
encouraged to choose between the two modes depending on the numerical complexity of the two algorithms,
the amount of workspace available, and experimentation.
EXAMPLES
If m = 5, n = 10, a = [1 2 3 4 5], and b = [1 2 3 4 5 6 7 8 9 10], then shape = ’F’ and inc = 1 would yield
c = [1 4 10 20 35 50 65 80 95 110 114 106 85 50].
If shape = ’F’ and inc = 3, then c = [1 20 65 110 85].
If shape = ’V’ and inc = 1, then c = [35 50 65 80 95 110].
If shape = ’V’ and inc = 2, then c = [35 65 95].
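The four results above can be reproduced with a direct time-domain model in Python. This is a sketch for checking the numbers, not the library routine; `scnvl1d_sketch` is a hypothetical name and it ignores the domain, isign, symm, table, and work arguments.

```python
def scnvl1d_sketch(a, b, shape="F", inc=1):
    """Direct model of the documented outputs (illustration only)."""
    m, n = len(a), len(b)
    full = [0] * (n + m - 1)
    for i in range(m):
        for j in range(n):
            full[i + j] += a[i] * b[j]
    if shape in ("V", "v"):
        full = full[m - 1 : n]    # keep only outputs that use no zero padding
    return full[::inc]            # keep every inc-th element

a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

c_full = scnvl1d_sketch(a, b, "F", 1)
c_full_3 = scnvl1d_sketch(a, b, "F", 3)
c_valid = scnvl1d_sketch(a, b, "V", 1)
c_valid_2 = scnvl1d_sketch(a, b, "V", 2)
```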
SEE ALSO
SCNVL2D(3S)
NAME
SCNVL2D – Computes a real two-dimensional (2D) convolution of two matrices
SYNOPSIS
CALL SCNVL2D (domain, isign, shape, symm, A, lda, m1, m2, B, ldb, n1, n2, C, ldc,
inc1, inc2, table, work)
IMPLEMENTATION
UNICOS/mk systems
This routine executes on a single PE and uses only private data.
DESCRIPTION
SCNVL2D computes the convolution of a real filter matrix A with a real data matrix B, producing the output
matrix C.
Let the following be the filter and data matrices:
A = a(j,i) 1≤j≤m1, 1≤i≤m2
B = b(j,i) 1≤j≤n1, 1≤i≤n2
This routine requires that m1 ≤ n1 and m2 ≤ n2. If m1 > n1 and m2 > n2, then A and B must be
interchanged in the calling sequence. The following cases cannot be handled by this routine: m1 > n1 and
m2 ≤ n2, m1 ≤ n1 and m2 > n2.
The convolution operation can be defined either with or without a zero-padded data matrix. If we assume a
zero-padded data matrix, then the convolution output sequence would be the following:
If the user chooses to compute the convolution in the time domain, then the variable isign is ignored and the
convolution product is computed in the first and only pass. In addition, the two other vectors, table and
work, are also ignored.
SCNVL2D also provides a feature that is important in several signal processing applications: it allows
the user to set a decimation rate in both dimensions for the output matrix. This is done using the arguments
inc1 and inc2. If inc1 is set to 1, then all the output elements along the first dimension are computed, that
is, (n1+m1−1) elements for zero-padded convolution and (n1−m1+1) elements for non-zero-padded convolution.
The same is the case for inc2 = 1 along the second dimension. Setting inc1 or inc2 to a positive integer
greater than 1 makes the routine compute every inc1-th or inc2-th element of the convolution output in the
corresponding dimension. For example, setting inc1 = 1 and inc2 = 2 means that the routine computes every
other column in the convolution product.
For example, if m1 = 10, m2 = 12, n1 = 200, n2 = 255, and shape = ’V’, then setting inc1 = 1 and inc2 = 2
would result in (200−10+1) * trunc((255−12+2)/2) = 191*122 elements of the output matrix being computed
and stored in c(1,1), . . . , c(191,122). Here trunc() is used to indicate the truncation operation.
Another feature of SCNVL2D that is of importance in certain signal processing applications such as FIR
filters is that the filter matrix may be symmetric or unsymmetric. If the filter matrix is symmetric, the
routine saves some computation by adding the elements of B that contribute equally to the convolution sum.
The filter matrix A can be either even or odd symmetric depending on whether m1 and m2 are even or odd.
The character argument symm can be set to ’S’ or ’s’ to indicate symmetry and ’N’ or ’n’ to indicate that A
is non-symmetric.
One restriction in exploiting the symmetry feature is that m1 and m2 must be either both odd or both even.
The user cannot have m1 even and m2 odd if symmetry is to be exploited. If symm = ’S’ is specified when m1 =
10 and m2 = 11, for example, then the routine ignores the symm flag and computes the convolution product
assuming no symmetry.
This routine has the following arguments:
domain Character*1. (input)
Specifies if the computation should proceed in the frequency domain or the time domain.
domain = ’F’ or ’f’ for frequency domain computation
domain = ’T’ or ’t’ for time domain computation
isign Integer. (input)
Specifies if the routine should initialize the trig tables or proceed with the computation of the
convolution product in the frequency domain. If the computation is to be performed in the time
domain, isign is ignored.
isign = 0 to set the trig tables
isign = 1 to compute the convolution product
shape Character*1. (input)
Specifies if the computation should proceed with or without a zero-padded data matrix.
shape = ’F’ or ’f’ for a zero-padded convolution product
shape = ’V’ or ’v’ for a convolution with valid data (no zero padding)
NOTES
The flexibility of choosing between frequency domain and time domain computation of the convolution has
been provided so that the user may experiment between the two and choose the one that is faster for the
particular problem size at hand.
Usually when n1 and n2 are big and m1 and m2 are small, time domain computation is less expensive. If
m1, m2, n1, and n2 are all large, the frequency domain computation is less expensive. For all problem
sizes in between, the user is encouraged to choose between the two modes depending on the numerical
complexity of the two algorithms, the amount of workspace available, and experimentation.
EXAMPLES
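If
A = 1 2 3
    2 3 1
    3 1 2
and
B = 1 2 3 4 5
    5 1 2 3 4
    4 5 1 2 3
    3 4 5 1 2
    2 3 4 5 1
then the zero-padded convolution (shape = ’F’) with inc1 = 1 and inc2 = 1 is the 7-by-7 matrix
C =  1  4 10 16 22 22 15
     7 18 32 29 41 36 17
    17 37 48 51 54 40 23
    26 40 60 48 51 28 17
    20 43 57 60 48 31 11
    13 27 44 41 38 12  5
     6 11 19 25 16 11  2
Keeping every other row of that result (rows 1, 3, 5, and 7, which corresponds to inc1 = 2 and inc2 = 1) gives
C =  1  4 10 16 22 22 15
    17 37 48 51 54 40 23
    20 43 57 60 48 31 11
     6 11 19 25 16 11  2
Without zero padding (shape = ’V’), the same decimation keeps rows 1 and 3 of the 3-by-3 valid result:
C = 48 51 54
    57 60 48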
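The matrices in this example can be checked with a direct time-domain model in Python. This is a sketch for verifying the numbers, not the library routine; `scnvl2d_sketch` is a hypothetical name, and the pairing of the decimated matrices with inc1 = 2 and inc2 = 1 is inferred from the values themselves.

```python
def conv2d_full(A, B):
    """Zero-padded 2D convolution computed directly."""
    m1, m2 = len(A), len(A[0])
    n1, n2 = len(B), len(B[0])
    C = [[0] * (n2 + m2 - 1) for _ in range(n1 + m1 - 1)]
    for i in range(m1):
        for j in range(m2):
            for p in range(n1):
                for q in range(n2):
                    C[i + p][j + q] += A[i][j] * B[p][q]
    return C

def scnvl2d_sketch(A, B, shape="F", inc1=1, inc2=1):
    """Model of the documented outputs: optional valid region, then decimation."""
    m1, m2 = len(A), len(A[0])
    n1, n2 = len(B), len(B[0])
    C = conv2d_full(A, B)
    if shape in ("V", "v"):
        C = [row[m2 - 1 : n2] for row in C[m1 - 1 : n1]]   # no zero padding
    return [row[::inc2] for row in C[::inc1]]

A = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
B = [[1, 2, 3, 4, 5],
     [5, 1, 2, 3, 4],
     [4, 5, 1, 2, 3],
     [3, 4, 5, 1, 2],
     [2, 3, 4, 5, 1]]

c_full = scnvl2d_sketch(A, B, "F", 1, 1)
c_full_rows = scnvl2d_sketch(A, B, "F", 2, 1)
c_valid_rows = scnvl2d_sketch(A, B, "V", 2, 1)
```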
SEE ALSO
SCNVL1D(3S)
NAME
SCONV – Performs the convolution of two sequences of real numbers
SYNOPSIS
CALL SCONV (nh, nx, ny, h, x, y)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
SCONV computes the convolution of the filter sequence h with the data sequence x, producing the output
sequence y.
Suppose h and x are two sequences of real numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "convolution product", y, is the sequence having elements defined by:
y(0) = h(nh−1)·x(0) + h(nh−2)·x(1) + . . . + h(0)·x(nh−1)
y(1) = h(nh−1)·x(1) + h(nh−2)·x(2) + . . . + h(0)·x(nh)
y(2) = h(nh−1)·x(2) + h(nh−2)·x(3) + . . . + h(0)·x(nh+1)
This example definition assumes nx > nh.
The precise definition of the convolution is:
$$y_k = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h_{nh-1-j} \cdot x_{k+j} \qquad \text{for } 0 \le k \le ny-1$$
The number of terms in the output sequence is specified by an argument, ny. If ny < nx, the output sequence
is just truncated. If ny > nx, zeros are appended to the output sequence.
By choosing ny > nx – nh + 1, the routine does what is sometimes called "post-tapered" convolution. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
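The definition and the post-tapered behavior can be modeled in a few lines of Python. This is a sketch, not the library routine: the upper summation limit min(nh−1, nx−1−k) is an assumption derived from the statement that x behaves as if padded with zeros, and `sconv_sketch` is a hypothetical name.

```python
def sconv_sketch(h, x, ny):
    """Model of y(k) = sum over j of h(nh-1-j) * x(k+j), for k = 0 .. ny-1."""
    nh, nx = len(h), len(x)
    y = [0] * ny
    if nh == 0 or nx == 0:
        return y                            # nothing to accumulate: y stays zero
    for k in range(ny):
        last = min(nh - 1, nx - 1 - k)      # stop where x runs out (post-tapered)
        y[k] = sum(h[nh - 1 - j] * x[k + j] for j in range(last + 1))
    return y

h = [1, 2, 3]
x = [1, 2, 3, 4, 5]
y = sconv_sketch(h, x, 6)
```

The first nx − nh + 1 = 3 outputs use full-length sums. The next two are tapered because the filter overhangs the end of x, and the last is zero because k has passed the end of the data.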
This routine has the following arguments:
nh Integer. (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx Integer. (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny Integer. (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
NOTES
If ny = 0, the routine just returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y
and return.
EXAMPLES
SEE ALSO
SCORR(3S), SCORRS(3S)
NAME
SCORR – Performs the correlation of two sequences of real numbers
SYNOPSIS
CALL SCORR (nh, nx, ny, h, x, y)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
SCORR computes the correlation of the filter sequence h with the data sequence x, producing the output
sequence y.
Suppose h and x are two sequences of real numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "correlation product", y, is the sequence having elements defined by:
y(0) = h(0)·x(0) + h(1)·x(1) + . . . + h(nh−1)·x(nh−1)
y(1) = h(0)·x(1) + h(1)·x(2) + . . . + h(nh−1)·x(nh)
y(2) = h(0)·x(2) + h(1)·x(3) + . . . + h(nh−1)·x(nh+1)
This example definition assumes that nx ≥ nh.
The precise definition is as follows:
$$y_k = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h_j \cdot x_{k+j} \qquad \text{for } 0 \le k \le ny-1$$
The number of terms in the output sequence is specified by the argument ny. If ny < nx, the output sequence
is just truncated. If ny > nx, zeros are appended to the output sequence.
By choosing ny > nx – nh + 1, the routine does what is sometimes called "post-tapered" correlation. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh Integer. (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx Integer. (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny Integer. (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
h Real array of dimension (0:nh−1). (input)
Specifies the input sequence of filter values.
NOTES
If ny = 0, the routine returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y and
return.
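The correlation defined in the DESCRIPTION section, including the zero-length cases in this note, can be modeled in Python. This is a sketch, not the library routine: it follows the expanded sums y(0), y(1), y(2) given earlier, the upper summation limit is an assumption derived from the zero-padding description, and `scorr_sketch` is a hypothetical name.

```python
def scorr_sketch(h, x, ny):
    """Model of y(k) = sum over j of h(j) * x(k+j), for k = 0 .. ny-1."""
    nh, nx = len(h), len(x)
    y = [0] * ny                            # ny = 0 gives an empty result
    if nh == 0 or nx == 0:
        return y                            # the first ny elements are simply zero
    for k in range(ny):
        last = min(nh - 1, nx - 1 - k)      # stop where x runs out (post-tapered)
        y[k] = sum(h[j] * x[k + j] for j in range(last + 1))
    return y

h = [1, 2, 3]
x = [1, 2, 3, 4, 5]
y = scorr_sketch(h, x, 5)
```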
EXAMPLES
SEE ALSO
SCONV(3S), SCORRS(3S)
NAME
SCORRS – Performs the correlation of two sequences of real numbers (symmetric filter)
SYNOPSIS
CALL SCORRS (nh, nx, ny, h, x, y)
IMPLEMENTATION
UNICOS/mk systems
DESCRIPTION
SCORRS computes the correlation of the symmetric filter sequence h with the data sequence x, producing the
output sequence y. The filter, h, is assumed to be symmetric about its middle.
The computation carried out by SCORRS is exactly the same as that done by routine SCORR, with one
exception: the filter, h, is assumed to be symmetric, so only the first half of the elements are accessed. The
values of the second half are inferred from the first half and do not actually have to be supplied by the
calling routine.
To review the definition of correlation (not necessarily assuming a symmetric filter), suppose h and x are two
sequences of real numbers, having nh and nx elements, respectively. As is customary in signal processing
applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "correlation product", y, is the sequence having elements defined by:
y(0) = h(0)·x(0) + h(1)·x(1) + . . . + h(nh−1)·x(nh−1)
y(1) = h(0)·x(1) + h(1)·x(2) + . . . + h(nh−1)·x(nh)
y(2) = h(0)·x(2) + h(1)·x(3) + . . . + h(nh−1)·x(nh+1)
This example definition assumes that nx ≥ nh.
The precise definition of correlation is as follows:
$$y_k = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h_j \cdot x_{k+j} \qquad \text{for } 0 \le k \le ny-1$$
The SCORRS routine makes the assumption that the filter is symmetric; in other words, that h(nh − j) = h(j),
for 0 ≤ j ≤ nh / 2.
Only the elements h(0) through h(nh/2) are accessed by the routine. The last half of the filter values are not
accessed and do not actually have to be supplied by the calling routine.
The number of terms in the output sequence is specified by an argument, ny. If ny < nx, then the output
sequence is just truncated. If ny > nx, then zeros are appended to the output sequence.
By choosing ny > nx − nh+1, the routine does what is sometimes called "post-tapered" correlation. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh Integer. (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx Integer. (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny Integer. (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
h Real array of dimension (0:nh/2). (input)
Specifies the input sequence of filter values. Only values h(0) through h(nh/2) are accessed; the
second half of the filter values are inferred from the symmetry of h.
x Real array of dimension (0:nx−1). (input)
Specifies the input sequence of data values.
y Real array of dimension (0:ny−1). (output)
Specifies the output sequence.
NOTES
If ny = 0, the routine returns. If either nh = 0 or nx = 0, the routine zeroes the first ny elements in y and
returns.
EXAMPLES
SEE ALSO
SCONV(3S), SCORR(3S)