Quadruple Precision Eigenvalue Calculation Library. QPEigen Ver.1.0


User's Manual
February 2015
Japan Atomic Energy Agency

Contents
1 Overview
2 Matrix diagonalization
3 Quadruple precision algorithm
4 References
5 Directory structure
6 Required software
7 Installation
8 Verification and performance evaluation programs
9 Sample programs
10 Subroutine details

1 Overview
The R&D Office of Simulation Technology at the Center for Computational Science & e-Systems in the Japan Atomic Energy Agency has been engaged in research and development (R&D) for computational techniques relevant to one of the R&D targets in its Mid-Term Plan for the second mid-term, "developing high-precision simulation technologies that can be used for clarification of degradation phenomena of the structural materials of a nuclear reactor, prediction of the material properties of fuel-related actinide compounds, and clarification of the relationship between the structures and functions of high efficiency thermoelectric materials, electric power supply materials, and superconducting materials." In the aforementioned plan, it is a common challenge of computational science to develop numerical calculation techniques for use in the study of material properties, and it is indispensable to develop methods to accurately calculate an electronic state based on quantum mechanics (quantum many-body problem). In order to accurately solve a quantum many-body problem, a very large degree of freedom must be handled without approximation; therefore, it is necessary to perform large-scale numerical calculations while maintaining sufficient precision.

At present, large-scale computation, such as solving a quantum many-body problem, is made possible by using a parallel computer. As long as the total amount of the available memory space collected from each processor is sufficiently large, a problem as large as the computational resources allow is approachable. In contrast, a significant amount of computational error accumulates during an actual large-scale calculation for obtaining accurate quantum states because an enormous number of operations is required to do so. (A computer calculates with a certain number of significant digits. In every calculation, a rounding error is added. The rounding errors accumulate as the number of calculations increases.)
Until recently, the scale of simulations was not sufficiently large to be problematic in terms of error, but it has been pointed out that, when a simulation that requires a very large scale computer such as the K computer to run at its maximum performance is conducted, the rounding errors may add up significantly, and the simulation result may lose most of its significant digits. (It was actually observed that the result of an eigenvalue problem that took all 4096 CPU nodes of a previous version of the Earth Simulator to calculate had only a couple of significant digits.)

Under the situation described above, extending an eigenvalue calculation routine, one of the routines that perform basic operations and are used frequently in computer simulations, to quadruple precision is an indispensable R&D activity for facilitating high-precision very-large-scale parallel simulations in the future. The priority of this R&D activity can be judged as very high because this problem is universal and applicable to all large-scale simulations. Extending a routine to quadruple precision has already been proved effective in our project to convert Basic Linear Algebra Subprograms (BLAS) to quadruple precision, which was conducted prior to this project.

We have developed EigenK, a double precision version of an eigenvalue calculation library used for calculating the eigenvalues and eigenvectors of a real symmetric matrix and a Hermitian matrix. When the eigenvalues and eigenvectors of a symmetric matrix are calculated with the EigenK library, it is possible to select whether to use the tridiagonal algorithm or the pentadiagonal algorithm. In this work, we present three libraries, which are the quadruple precision versions of the EigenK library routines. The first is the quadruple precision eigenvalue calculation library, which uses the tridiagonal algorithm to calculate the eigenvalues and eigenvectors of a real symmetric matrix (hereinafter referred to as "QPEigen_s library"). The second is the "QPEigen_sx library." It is essentially similar to the QPEigen_s library, except for the fact that it uses the pentadiagonal algorithm. The third is the QPEigen_h library. This library calculates the eigenvalues and eigenvectors of a Hermitian matrix with the tridiagonal algorithm. This document describes how to use the QPEigen_s, QPEigen_sx, and QPEigen_h libraries (hereinafter collectively referred to as QPEigen libraries).

2 Matrix diagonalization
Matrix diagonalization is essentially equivalent to eigenvalue calculation and is frequently used in structural analysis and quantum mechanics calculations. Eigenvalue calculation is a process of calculating an eigenvalue $\lambda$ and eigenvector $x$ that satisfy $Ax = \lambda x$ when a matrix $A$ is given. QPEigen_s and QPEigen_sx are libraries used for calculating the eigenvalues of a real symmetric matrix. When $A$ is a Hermitian matrix, QPEigen_h can be used.
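As a small illustration of why diagonalization and eigenvalue calculation are equivalent, the following sketch checks that the eigenvectors of a 2x2 real symmetric matrix diagonalize it. It runs in ordinary double precision and calls no QPEigen routine; the example matrix and the helper names matmul and transpose are assumptions chosen for this illustration only.

```python
import math

def matmul(P, Q):
    """Multiply two small dense matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

# Example matrix (an assumption for this sketch): eigenvalues are 1 and 3.
A = [[2.0, 1.0],
     [1.0, 2.0]]

# Columns of V are the normalized eigenvectors (1, -1)/sqrt(2) and (1, 1)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
V = [[s, s],
     [-s, s]]

# Because A v = lambda v for each column v of V, V^T A V is diagonal
# with the eigenvalues on the diagonal.
D = matmul(transpose(V), matmul(A, V))
for row in D:
    print(row)
```

The diagonal of D holds the eigenvalues and the off-diagonal entries vanish up to rounding, which is the sense in which solving $Ax = \lambda x$ for all eigenpairs diagonalizes the matrix.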
3 Quadruple precision algorithm
IEEE 754-compliant quadruple precision (128 bit) floating point operations have already been implemented in some languages including the latest Fortran. However, hardware has not yet been optimized for those operations, and they are often executed as arbitrary precision calculations. As a result, their calculation speed is extremely slow compared to double precision operations. In order to improve both the precision and the speed of the calculation, QPEigen libraries use the double-double algorithm proposed by D.H. Bailey. In this algorithm, one high-precision number is represented by a combination of two double precision numbers, and a quadruple precision operation is implemented by combining double precision operations. The range and precision of the numeric representation used in the double-double algorithm are slightly worse than those of genuine quadruple precision but still almost double that of standard double precision. This algorithm allows calculations on such high-precision numbers using just a combination of double precision operations and therefore is highly effective for implementing a large-scale high-precision simulation.

4 References
Bailey, D.H.: High-Precision Software Directory. http://crd-legacy.lbl.gov/~dhbailey/mpdist
Susumu Yamada, Narimasa Sasa, Toshiyuki Imamura, and Masahiko Machida: Introduction to quadruple precision basic linear algebra routines QPBLAS and its application, IPSJ SIG Technical Report, Vol. 2012-HPC-137, No. 23, pp. 1-6, 2012.

5 Directory structure
The directory structure in the medium is described below. There are three top directories, Eigen_s-version, Eigen_sx-version, and Eigen_h-version, corresponding to each of the libraries. Eigen_s-version has the following subdirectories. The other libraries also have similar directory structures.

Directory name      Description
ddmpi               Stores a set of quadruple precision (DD) MPI communication library files including source files.
ddblas              Stores a set of DDBLAS files including source files.
ddlapack            Stores a set of DDLAPACK files including source files.
ddscalapack         Stores a set of DDScaLAPACK files including source files.
ddeigen_s           Stores a set of DDEigen_s files including source files.
ddeigen_s_test      Stores a set of programs that run quadruple precision (DD) tests.
eigen_s             Stores a set of double precision Eigen_s files including source files.
eigen_s_test        Stores a set of programs that run double precision Eigen_s tests.
sample              Stores a set of programs that run quadruple precision (DD) sample routines.
bench               Stores a set of programs that run quadruple precision (DD) benchmark routines.
verify              Stores a set of programs that run quadruple precision (DD) verification routines.

6 Required software
The QPEigen library uses DDBLAS, BLAS, quadruple precision ScaLAPACK (hereinafter "DDScaLAPACK"), original ScaLAPACK, quadruple precision LAPACK (hereinafter "DDLAPACK"), original LAPACK, and the quadruple precision MPI communication library. Therefore, you must specify the appropriate options to link DDBLAS, DDScaLAPACK, and DDLAPACK when you link (when load modules are created). Those options are specified in the configure command.

7 Installation
Enter the following commands in a terminal to create a Makefile that matches the user environment and then compile. (The following examples are for QPEigen_s. For the other libraries, replace _s with the appropriate suffixes.)

    tar xzf QPEigen_s-version.tar.gz
    cd QPEigen_s-version
    ./configure options
    make

The major options you may consider using in the configure command and their meanings are as follows.

Option              Meaning
FC=f90_compiler     Specifies the Fortran compiler.
CC=c_compiler       Specifies the C compiler.
FCFLAGS=options     Specifies Fortran compile options. Specifies parallel computation and optimization settings.
CFLAGS=options      Specifies C compile options.
LDFLAGS=options     Specifies link options.
--prefix=path       Changes the install directory.
--disable-openmp    Disables parallel processing via OpenMP. If this option is not specified, parallel processing is enabled using OpenMP.

Examples of the minimum options of the configure command necessary for compiling are as follows.

Example: When GNU C and GNU Fortran are used

    ./configure CC=mpicc FC=mpif90

Example: When Intel C and Intel Fortran are used

    ./configure CC=mpicc FC=mpifort CFLAGS="-O2" --disable-openmp LIBS="-L/usr/lib/gcc -lgfortran"

The following list shows options for the make command.

Command                 Description
make                    Simple "make" is the same as "make all".
make all                Compiles all source codes.
make lib                Compiles only the libraries.
make test               Compiles only the test programs. Note that if the lib folder does not contain
                        dependent libraries (ddblas, blas, lapack, and scalapack), an error is output and
                        the build stops. If other dependent libraries (ddmpi, ddblas, ddlapack, ddscalapack,
                        and ddeigen_s...), which are included in the package, cannot be found, "make lib"
                        is automatically executed.
make sample             Compiles sample programs.
make bench              Compiles benchmark programs.
make verify             Compiles verification programs.
make eigen_s_test       Compiles only eigen_s_test.
make clean              Deletes all compiled files.
make clean-lib          Deletes all files related to libraries.
make clean-test         Deletes all files related to test programs.
make clean-[programs]   Deletes all files related to the specific programs (sample, bench, verify, eigen_s_test).

If the make is successful, the following libraries and programs are created.

File                            Description
ddlapack/libddlapack.a          DDLAPACK library
ddmpi/libmpifdd1.a              DDMPI library
ddmpi/libmpifdd2.a              DDMPI library
ddscalapack/libddscalapack.a    DDScaLAPACK library
eigen_s/libeigen_s.a            Double precision Eigen library
ddeigen_s/libddeigen_s.a        Double-double quadruple precision Eigen library
eigen_s_test/test_frank_d       Double precision Frank matrix eigenvalue verification program
eigen_s_test/test_random_d      Double precision random matrix eigenvalue verification program
ddeigen_s_test/test_frank_dd    Double-double quadruple precision Frank matrix eigenvalue verification program
ddeigen_s_test/test_random_dd   Double-double quadruple precision random matrix eigenvalue verification program
sample/frank5_sample_dd         Sample program for calculating the eigenvalues of the Frank matrix of order 5.
sample/hilbert5_sample_dd       Sample program for calculating the eigenvalues of the Hilbert matrix of order 5.
verify/ddeigen_s_verify         Verification program for calculating eigenvalues and eigenvectors of the real symmetric matrix.
bench/bench_frank_              Benchmark program for calculating the eigenvalues of the Frank matrix.
bench/bench_randam_             Benchmark program for calculating the eigenvalues of the random data.

8 Verification and performance evaluation programs
After you follow the procedure in Section 7 and finish the compile process, enter the following commands. (The following examples are for QPEigen_s. For the other libraries, replace _s with the appropriate suffixes.)

    cd ddeigen_s_test
    ./test_frank_dd     (executing the Frank matrix test)
    ./test_random_dd    (executing the random matrix test)

When you execute a test program, the accuracy of the calculation is output to the standard output. You can determine whether a calculation is performed in quadruple precision based on the accuracy of the calculation. The accuracy of the calculation is calculated as follows.

\[
\text{(Accuracy of calculation)} = \max_{n} \left| A x_n - w_n x_n \right|
\]

$w_n$ : Eigenvalues calculated by numerical solution
$A$ : Matrix set up for precision verification
$x_n$ : Eigenvectors calculated by numerical solution

Figure 8-1 shows an example of the standard output. The output of the accuracy of the calculation is shown in the red box.
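The accuracy measure above can be sketched in a few lines. The example below is only an illustration in ordinary double precision: it does not call the test programs, it uses the analytically known eigenpairs of the symmetric Frank matrix instead of a numerical solver, and it reads the measure as the largest absolute component of $A x_n - w_n x_n$ over all eigenpairs, which is an assumption about the norm. The function names frank, frank_eigenpair, and accuracy are hypothetical.

```python
import math

def frank(n):
    """Symmetric Frank matrix: a[i][j] = n - max(i, j) for 0-based i, j."""
    return [[float(n - max(i, j)) for j in range(n)] for i in range(n)]

def frank_eigenpair(n, k):
    """Analytic eigenpair k (1..n) of the symmetric Frank matrix of order n."""
    theta = (2 * k - 1) * math.pi / (2 * n + 1)
    # 1 / (2 - 2 cos(theta)), written with sin to avoid cancellation
    w = 1.0 / (4.0 * math.sin(theta / 2.0) ** 2)
    x = [math.sin((n - i) * theta) for i in range(n)]
    norm = math.sqrt(sum(c * c for c in x))
    return w, [c / norm for c in x]

def accuracy(A, pairs):
    """Largest absolute component of A x - w x over all eigenpairs (w, x)."""
    worst = 0.0
    for w, x in pairs:
        for row, xi in zip(A, x):
            residual = sum(a * c for a, c in zip(row, x)) - w * xi
            worst = max(worst, abs(residual))
    return worst

n = 5
A = frank(n)
pairs = [frank_eigenpair(n, k) for k in range(1, n + 1)]
acc = accuracy(A, pairs)
print("largest eigenvalue:", pairs[0][0])
print("accuracy:", acc)
```

In double precision the printed accuracy stays near the double precision rounding level; a result computed and verified in double-double arithmetic would be many orders of magnitude smaller, which is how the value distinguishes the two cases.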

Figure 8-1 Example of Standard Output after Executing a Test Program

For information about the procedure for calling each routine and a description of the arguments, see Section 10.

9 Sample programs
The sample directory contains some QPEigen sample programs written in Fortran77.

frank5_sample_dd      Diagonalizing the Frank matrix of order 5 in double-double quadruple precision
hilbert5_sample_dd    Diagonalizing the Hilbert matrix of order 5 in double-double quadruple precision

9.1 Diagonalizing the Frank matrix (frank5_sample_dd)
Frank matrices are ill-conditioned matrices with known eigenvalues and are used in a benchmark test for the numerical calculation of eigenvalues. The Frank matrix of order 5 is as follows.

\[
A = \begin{pmatrix}
5 & 4 & 3 & 2 & 1 \\
4 & 4 & 3 & 2 & 1 \\
3 & 3 & 3 & 2 & 1 \\
2 & 2 & 2 & 2 & 1 \\
1 & 1 & 1 & 1 & 1
\end{pmatrix}
\]

A quadruple precision matrix is given by substituting the elements of the Frank matrix for the high word and 0.0d0 for the low word. The cstab_get_optdim subroutine is called beforehand to obtain the size of the necessary work area. The ddmatrix_adjust_s subroutine is called to copy the matrix to the work area, and then the ddeigen_* subroutine is called to calculate the eigenvalues and eigenvectors.

9.2 Diagonalizing the Hilbert matrix (hilbert5_sample_dd)
Hilbert matrices are typical badly conditioned matrices that are often used in benchmark tests for numerical calculations. The Hilbert matrix of order 5 is as follows.

\[
H = \begin{pmatrix}
1 & 1/2 & 1/3 & 1/4 & 1/5 \\
1/2 & 1/3 & 1/4 & 1/5 & 1/6 \\
1/3 & 1/4 & 1/5 & 1/6 & 1/7 \\
1/4 & 1/5 & 1/6 & 1/7 & 1/8 \\
1/5 & 1/6 & 1/7 & 1/8 & 1/9
\end{pmatrix},
\qquad h_{ij} = \frac{1}{i + j - 1}
\]

A quadruple precision matrix is given by substituting the elements of the Hilbert matrix, as the results of quadruple precision divisions, for the high word and low word. The cstab_get_optdim subroutine is called beforehand to obtain the size of the necessary work area. The ddmatrix_adjust_s subroutine is called to copy the matrix to the work area, and then the ddeigen_s subroutine is called to calculate the eigenvalues and eigenvectors.

9.3 Execution of sample programs
Those sample programs have already been compiled during the make, and you can, for example, execute frank5_sample_dd by entering the following. (The following examples are for QPEigen_s. For the other libraries, replace _s with the appropriate suffixes.)

    cd sample
    ./frank5_sample_dd

If you want to compile this program yourself, perform, for example, the following.

    mpifort frank5_sample_dd.f -I../ddeigen_s ../ddeigen_s/libddEigen_s.a ../ddscalapack/libddscalapack.a ../ddlapack/libddlapack.a ../ddmpi/libMpifdd1.a ../ddmpi/libMpifdd2.a ../../lib/libddblas.a ../../lib/libscalapack.a ../../lib/liblapack.a ../../lib/libblas.a

The source code of the sample program "frank5_sample_dd.f" is as follows.

      program frank5_sample_dd
      use ddcommunication_s, only : eigen_init
     &                            , eigen_free
      implicit double precision (a-h,o-v,x-z)
!
!     ah/al: high/low words of the input matrix
!     bh/bl: distributed copy, zh/zl: eigenvectors, wh/wl: eigenvalues
      double precision, allocatable :: ah(:,:), al(:,:)
      double precision, pointer :: bh(:), bl(:)
      double precision, pointer :: zh(:), zl(:)
      double precision, pointer :: wh(:), wl(:)
      logical iexist
      include 'mpif.h'
      include 'trd.h'

      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world,i$inod,ierr)
      call mpi_comm_size(mpi_comm_world,i$nnod,ierr)

      n=5
      allocate(ah(n,n),al(n,n))

!     query the process grid and the size of the work area
      call eigen_init(2)
      NPROW = size_of_col
      NPCOL = size_of_row
      nx = ((n-1)/nprow+1)
      call CSTAB_get_optdim(nx, 2, 1, 2, nm)
!     call CSTAB_get_optdim(nx, 6, 16*4, 16*4*2, nm)
      call eigen_free(0)

      NB = 32
!     NB = 64+32
      nmz = ((n-1)/nprow+1)
      nmz = ((nmz-1)/nb+1)*nb+1
      nmw = ((n-1)/npcol+1)
      nmw = ((nmw-1)/nb+1)*nb+1
      larray = MAX(nmz,nm)*nmw
      allocate( bh(larray), bl(larray),zh(larray),zl(larray),
     &          wh(n),wl(n), stat=istat)
      if(istat.ne.0) then
         print*,"memory exhausted"
         call flush(6)
         stop
      endif

!     Frank matrix of order 5: high words hold the values, low words are zero
      ah(1,1:5) = (/ 5.0d0, 4.0d0, 3.0d0, 2.0d0, 1.0d0 /)
      ah(2,1:5) = (/ 4.0d0, 4.0d0, 3.0d0, 2.0d0, 1.0d0 /)
      ah(3,1:5) = (/ 3.0d0, 3.0d0, 3.0d0, 2.0d0, 1.0d0 /)
      ah(4,1:5) = (/ 2.0d0, 2.0d0, 2.0d0, 2.0d0, 1.0d0 /)
      ah(5,1:5) = (/ 1.0d0, 1.0d0, 1.0d0, 1.0d0, 1.0d0 /)
      al(1,1:5) = (/ 0.0d0, 0.0d0, 0.0d0, 0.0d0, 0.0d0 /)
      al(2,1:5) = (/ 0.0d0, 0.0d0, 0.0d0, 0.0d0, 0.0d0 /)
      al(3,1:5) = (/ 0.0d0, 0.0d0, 0.0d0, 0.0d0, 0.0d0 /)
      al(4,1:5) = (/ 0.0d0, 0.0d0, 0.0d0, 0.0d0, 0.0d0 /)
      al(5,1:5) = (/ 0.0d0, 0.0d0, 0.0d0, 0.0d0, 0.0d0 /)

!     blocking factor (about 32 to 128); it was left unset in the listing
      m = NB
      call ddmatrix_adjust_s(n, ah, al, bh, bl, nm)
      call ddeigen_s(n, bh, bl, nm, wh, wl, zh, zl, nm, m, 1)

      write(*,*) 'EigenValue=', wh
*
      deallocate(ah,al)
      deallocate(bh,bl)
      deallocate(zh,zl)
      deallocate(wh,wl)

      call MPI_Finalize(ierr)
      end
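The high and low word arrays in the sample (ah and al) hold the two halves of a double-double number. The following sketch, written in Python rather than Fortran and independent of the library, shows how a value such as the Hilbert entry 1/3 splits into a high word and a low word, and how two such numbers can be added using only double precision operations. The function names split_dd, two_sum, and dd_add are hypothetical, and dd_add is a simple textbook variant, not necessarily the formula used inside QPEigen.

```python
from fractions import Fraction

def split_dd(value):
    """Split an exact rational into (high, low) doubles with high + low close to value."""
    hi = float(value)                  # nearest double to the exact value
    lo = float(value - Fraction(hi))   # what the high word missed, rounded to a double
    return hi, lo

def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def dd_add(xh, xl, yh, yl):
    """Add two double-double numbers using double precision operations only."""
    s, e = two_sum(xh, yh)   # exact sum of the high words
    e += xl + yl             # fold in the low words
    return two_sum(s, e)     # renormalize into (high, low)

third = Fraction(1, 3)
fifth = Fraction(1, 5)
xh, xl = split_dd(third)
yh, yl = split_dd(fifth)
sh, sl = dd_add(xh, xl, yh, yl)

exact = third + fifth
err_double = abs(Fraction(xh + yh) - exact) / exact
err_dd = abs(Fraction(sh) + Fraction(sl) - exact) / exact
print("relative error, double only  :", float(err_double))
print("relative error, double-double:", float(err_dd))
```

For the Frank matrix the low words are all zero because its integer entries are exactly representable in double precision; for the Hilbert matrix the low words carry the part of each quotient that a single double cannot hold.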

10 Subroutine details
Table 1 shows a list of the routine modules in the QPEigen_s library.

Table 1. Subroutines of QPEigen_s

Routine name        Function
ddeigen_s           Calculates the eigenvalues and eigenvectors of a real symmetric matrix by tridiagonalizing the matrix using the Householder method. Executes a series of actions: tridiagonalization, eigenvalue calculation, and eigenvector calculation.
ddeigen_trd         Tridiagonalizes a real symmetric matrix using Householder tridiagonalization.
ddeigen_dc          Calculates the eigenvalues of an input matrix using the divide-and-conquer algorithm if the input matrix is a tridiagonal matrix obtained by Householder tridiagonalization of a real symmetric matrix.
ddeigen_tbk         Calculates the eigenvectors of the tridiagonal matrix obtained by Householder tridiagonalization of a real symmetric matrix.
ddmatrix_adjust_s   The DDEigen library uses two-dimensional cyclic distribution. This conversion routine is used to set up the data distribution of a matrix to support cyclic distribution. This routine must be called before using this library.
ddcommunication_s   Module used for MPI communication
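The first stage listed in Table 1, Householder tridiagonalization, can be sketched as follows. This is a serial, unblocked, double precision illustration in Python; the library itself works on distributed double-double data with a blocking factor, so the code below only shows the underlying idea. The function name householder_tridiagonalize is hypothetical.

```python
import math

def householder_tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form T = Q^T A Q (dense, unblocked)."""
    n = len(A)
    T = [row[:] for row in A]
    for k in range(n - 2):
        # Part of column k below the diagonal
        x = [T[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(c * c for c in x))
        if norm_x == 0.0:
            continue  # column is already in tridiagonal form
        # Reflect x onto alpha * e1; the sign choice avoids cancellation
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(c * c for c in v))
        u = [0.0] * (k + 1) + [c / norm_v for c in v]
        # Apply H = I - 2 u u^T from both sides: T <- T - 2 (u q^T + q u^T)
        p = [sum(T[i][j] * u[j] for j in range(n)) for i in range(n)]
        kappa = sum(u[i] * p[i] for i in range(n))
        q = [p[i] - kappa * u[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                T[i][j] -= 2.0 * (u[i] * q[j] + q[i] * u[j])
    return T

# Frank matrix of order 5, as used by the sample program
n = 5
A = [[float(n - max(i, j)) for j in range(n)] for i in range(n)]
T = householder_tridiagonalize(A)
for row in T:
    print(" ".join(f"{c:10.6f}" for c in row))
```

The transformation is a similarity transformation, so T has the same eigenvalues as A; the later stages then only need the diagonal and subdiagonal of T, and the back-transformation maps the eigenvectors of T to those of A.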

Table 2 shows a list of the routine modules in the QPEigen_sx library.

Table 2. Subroutines of QPEigen_sx

Routine name        Function
ddeigen_sx          Calculates the eigenvalues and eigenvectors of a real symmetric matrix by pentadiagonalizing the matrix using the NarrowBandReduction method. Executes a series of actions: pentadiagonalization, eigenvalue calculation, and eigenvector calculation.
ddeigen_prd         Pentadiagonalizes a real symmetric matrix using NarrowBandReduction pentadiagonalization.
ddeigen_dcx         Calculates the eigenvalues of an input matrix using the divide-and-conquer algorithm if the input matrix is a pentadiagonal matrix obtained by NarrowBandReduction pentadiagonalization of a real symmetric matrix.
ddeigen_pbk         Calculates the eigenvectors of the pentadiagonal matrix obtained by NarrowBandReduction pentadiagonalization of a real symmetric matrix.
ddmatrix_adjust_sx  The DDEigen library uses two-dimensional cyclic distribution. This conversion routine is used to set up the data distribution of a matrix to support cyclic distribution. This routine must be called before using this library.
ddcommunication_sx  Module used for MPI communication

Table 3 shows a list of the routine modules in the QPEigen_h library.

Table 3. Subroutines of QPEigen_h

Routine name        Function
ddeigen_h           Calculates the eigenvalues and eigenvectors of a Hermitian matrix.
ddeigen_hrd         Diagonalizes a Hermitian matrix.
ddeigen_dch         Calculates the eigenvalues of an input matrix using the divide-and-conquer algorithm.
ddeigen_hbk         Calculates the eigenvectors of the matrix.
ddmatrix_adjust_h   The DDEigen library uses two-dimensional cyclic distribution. This conversion routine is used to set up the data distribution of a matrix to support cyclic distribution. This routine must be called before using this library. This routine is used for real data.
ddmatrix_adjust_hi  This routine is used for imaginary data.
ddcommunication_h   Module used for MPI communication

Routine name    ddeigen_s
Function        Calculates the eigenvalues and eigenvectors of a real symmetric matrix by tridiagonalizing the matrix using the Householder method. Executes a series of actions: tridiagonalization, eigenvalue calculation, and eigenvector calculation.
Specification   ddeigen_s(n, ah, al, lda, wh, wl, zh, zl, ldz, m, ifl)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of quadruple precision real matrix
al        i                               Low word of quadruple precision real matrix
lda       i     integer                   Size of array A
wh        o                               High word of eigenvalues
wl        o                               Low word of eigenvalues
zh        o                               High word of eigenvectors
zl        o                               Low word of eigenvectors
ldz       i     integer                   Size of array Z
m         i     integer, about 32 to 128  Blocking factor
ifl       i     integer, 0 or 1           Option for output eigenvectors

Routine name    ddeigen_trd
Function        Tridiagonalizes a real symmetric matrix using Householder tridiagonalization.
Specification   ddeigen_trd(n, ah, al, lda, dh, dl, eh, el, m)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of quadruple precision real matrix
al        i                               Low word of quadruple precision real matrix
lda       i     integer                   Size of array A
dh        o                               High word of diagonal elements
dl        o                               Low word of diagonal elements
eh        o                               High word of subdiagonal elements
el        o                               Low word of subdiagonal elements
m         i     integer                   Blocking factor

Routine name    ddeigen_dc
Function        Calculates the eigenvalues of an input matrix using the divide-and-conquer algorithm if the input matrix is a tridiagonal matrix obtained by Householder tridiagonalization of a real symmetric matrix.
Specification   ddeigen_dc(n, wh, wl, eh, el, zh, zl, ldz, info)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
wh        o                               High word of eigenvalues
wl        o                               Low word of eigenvalues
eh        i                               High word of subdiagonal elements
el        i                               Low word of subdiagonal elements
zh        o                               High word of eigenvectors
zl        o                               Low word of eigenvectors
ldz       i     integer                   Size of array Z
info      o     integer                   Error number

Routine name    ddeigen_tbk
Function        Calculates the eigenvectors of the tridiagonal matrix obtained by Householder tridiagonalization of a real symmetric matrix.
Specification   ddeigen_tbk(n, ah, al, lda, zh, zl, ldz, eh, el, m)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of the matrix
al        i                               Low word of the matrix
lda       i     integer                   Size of array A
zh        o                               High word of eigenvectors
zl        o                               Low word of eigenvectors
ldz       i     integer                   Size of array Z
eh        i                               High word of subdiagonal components
el        i                               Low word of subdiagonal components
m         i     integer                   Blocking factor

Routine name    ddmatrix_adjust_s
Function        The DDEigen library uses two-dimensional cyclic distribution. This conversion routine is used to set up the data distribution of a matrix to support cyclic distribution. This routine must be called before using this library.
Specification   ddmatrix_adjust_s(n, ah, al, bh, bl, nm)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of original matrix
al        i                               Low word of original matrix
bh        o                               High word of distributed data
bl        o                               Low word of distributed data
nm        i     integer                   Size of array b
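The two-dimensional cyclic distribution that the matrix adjustment routine sets up can be pictured with a short sketch. The code below is an assumption-laden illustration in Python, not the library's implementation: it uses 0-based indices, a pure element-wise cyclic layout, and a process grid of nprow x npcol, and the names owner_and_local, local_extent, and distribute are hypothetical. The bound (n-1)/nprow+1 mirrors the expression used in the sample program.

```python
def owner_and_local(i, j, nprow, npcol):
    """Map 0-based global indices (i, j) to ((process row, process column), (local i, local j))."""
    return (i % nprow, j % npcol), (i // nprow, j // npcol)

def local_extent(n, nproc):
    """Upper bound on rows (or columns) held by one process, as in ((n-1)/nprow+1)."""
    return (n - 1) // nproc + 1

def distribute(A, nprow, npcol):
    """Scatter a dense matrix into per-process dictionaries keyed by local indices."""
    blocks = {(pr, pc): {} for pr in range(nprow) for pc in range(npcol)}
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            owner, local = owner_and_local(i, j, nprow, npcol)
            blocks[owner][local] = value
    return blocks

# Frank matrix of order 5 on a hypothetical 2 x 2 process grid
n, nprow, npcol = 5, 2, 2
A = [[float(n - max(i, j)) for j in range(n)] for i in range(n)]
blocks = distribute(A, nprow, npcol)
for owner in sorted(blocks):
    print(owner, "holds", len(blocks[owner]), "elements")
print("local row bound:", local_extent(n, nprow))
```

Because neighbouring rows and columns go to different processes, the work stays balanced as the reduction shrinks the active part of the matrix, which is the usual reason for choosing a cyclic layout.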

Routine name    ddeigen_sx
Function        Calculates the eigenvalues and eigenvectors of a real symmetric matrix by pentadiagonalizing the matrix using the NarrowBandReduction method. Executes a series of actions: pentadiagonalization, eigenvalue calculation, and eigenvector calculation.
Specification   ddeigen_sx(n, ah, al, nma, wh, wl, zh, zl, dh, dl, eh, el, nme, m0, ifl)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of quadruple precision real matrix
al        i                               Low word of quadruple precision real matrix
nma       i     integer                   Size of array A
wh        o                               High word of eigenvalues
wl        o                               Low word of eigenvalues
zh        o                               High word of eigenvectors
zl        o                               Low word of eigenvectors
dh        o                               High word of the array that stores the main diagonal elements
dl        o                               Low word of the array that stores the main diagonal elements
eh        o                               High word of the array that stores the subdiagonal elements
el        o                               Low word of the array that stores the subdiagonal elements
nme       i     integer                   Size of array e
m0        i     integer, about 32 to 128  Blocking factor
ifl       i     integer, 0 or 1           Option for output eigenvectors

Routine name    ddeigen_prd
Function        Pentadiagonalizes a real symmetric matrix using NarrowBandReduction pentadiagonalization.
Specification   ddeigen_prd(n, ah, al, nma, dh, dl, eh, el, nme, m0)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of quadruple precision real matrix
al        i                               Low word of quadruple precision real matrix
nma       i     integer                   Size of array A
dh        o                               High word of diagonal elements
dl        o                               Low word of diagonal elements
eh        o                               High word of subdiagonal elements
el        o                               Low word of subdiagonal elements
nme       i     integer                   Size of array e
m0        i     integer, about 32 to 128  Blocking factor

Routine name    ddeigen_dcx
Function        Calculates the eigenvalues of an input matrix using the divide-and-conquer algorithm if the input matrix is a pentadiagonal matrix obtained by NarrowBandReduction pentadiagonalization of a real symmetric matrix.
Specification   ddeigen_dcx(n, wh, wl, eh, el, nme, zh, zl, nmz)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
wh        i                               High word of eigenvalues
wl        i                               Low word of eigenvalues
eh        i                               High word of subdiagonal elements
el        i                               Low word of subdiagonal elements
nme       i     integer                   Size of array e
zh        o                               High word of eigenvectors
zl        o                               Low word of eigenvectors
nmz       i     integer                   Size of array Z

Routine name    ddeigen_pbk
Function        Calculates the eigenvectors of the pentadiagonal matrix obtained by NarrowBandReduction pentadiagonalization of a real symmetric matrix.
Specification   ddeigen_pbk(n, ah, al, nma, zh, zl, nmz, eh, el, m0)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of quadruple precision real matrix
al        i                               Low word of quadruple precision real matrix
nma       i     integer                   Size of array A
zh        o                               High word of eigenvectors
zl        o                               Low word of eigenvectors
nmz       i     integer                   Size of array Z
eh        i                               High word of subdiagonal components
el        i                               Low word of subdiagonal components
m0        i     integer, about 32 to 128  Blocking factor

Routine name    ddmatrix_adjust_sx
Function        The DDEigen library uses two-dimensional cyclic distribution. This conversion routine is used to set up the data distribution of a matrix to support cyclic distribution. This routine must be called before using this library.
Specification   ddmatrix_adjust_sx(n, ah, al, bh, bl, nm)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of original matrix
al        i                               Low word of original matrix
bh        o                               High word of distributed data
bl        o                               Low word of distributed data
nm        i     integer                   Size of array b

Routine name    ddeigen_h
Function        Calculates the eigenvalues and eigenvectors of a Hermitian matrix.
Specification   ddeigen_h(n, arh, arl, aih, ail, wh, wl, lda, m0, ifl)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
arh       i/o                             High word of the real part of the matrix; on exit, high word of the real part of eigenvectors
arl       i/o                             Low word of the real part of the matrix; on exit, low word of the real part of eigenvectors
aih       i/o                             High word of the imaginary part of the matrix; on exit, high word of the imaginary part of eigenvectors
ail       i/o                             Low word of the imaginary part of the matrix; on exit, low word of the imaginary part of eigenvectors
wh        o                               High word of eigenvalues
wl        o                               Low word of eigenvalues
lda       i     integer                   Size of array A
m0        i     integer                   Blocking factor
ifl       i     integer, 0 or 1           Option for output eigenvectors

Routine name    ddeigen_hrd
Function        Diagonalizes a Hermitian matrix.
Specification   ddeigen_hrd(n, arh, arl, aih, ail, lda, dh, dl, eh, el, e2h, e2l, tauh, taul, m, zrh, zrl, zih, zil)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
arh       i                               High word of the real part of the matrix
arl       i                               Low word of the real part of the matrix
aih       i                               High word of the imaginary part of the matrix
ail       i                               Low word of the imaginary part of the matrix
lda       i     integer                   Size of array A
dh        o                               High word of diagonal elements
dl        o                               Low word of diagonal elements
eh        o                               High word of subdiagonal elements
el        o                               Low word of subdiagonal elements
e2h       o                               Work area
e2l       o                               Work area
tauh      o                               Work area
taul      o                               Work area
m         i     integer, about 128        Blocking factor
zrh       o                               High word of the real part of the eigenvectors
zrl       o                               Low word of the real part of the eigenvectors
zih       o                               High word of the imaginary part of the eigenvectors
zil       o                               Low word of the imaginary part of the eigenvectors

Routine name    ddeigen_dch
Function        Calculates the eigenvalues of an input matrix using the divide-and-conquer algorithm.
Specification   ddeigen_dch(n, dh, dl, eh, el, zh, zl, ldz, info)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
dh        o                               High word of eigenvalues
dl        o                               Low word of eigenvalues
eh        o                               Work area
el        o                               Work area
zh        o                               High word of the real part of the eigenvectors
zl        o                               Low word of the real part of the eigenvectors
ldz       i     integer                   Size of array Z
info      o     integer                   Error number

Routine name    ddeigen_hbk
Function        Calculates the eigenvectors of the matrix.
Specification   ddeigen_hbk(n, arh, arl, aih, ail, lda, zrh, zrl, zih, zil, ldz, tauh, taul, ltau, n)

Argument  I/O   Remarks                   Description
arh       i                               High word of the real part of the matrix
arl       i                               Low word of the real part of the matrix
aih       i                               High word of the imaginary part of the matrix
ail       i                               Low word of the imaginary part of the matrix
lda       i     integer                   Size of array A
zrh       i                               High word of the real part of the matrix
zrl       i                               Low word of the real part of the matrix
zih       i                               High word of the imaginary part of the matrix
zil       i                               Low word of the imaginary part of the matrix
ldz       i     integer                   Size of array z
tauh      o                               Work area
taul      o                               Work area
ldtau     o     integer                   Work area
n         i     integer                   Dimension of a matrix

Routine name    ddmatrix_adjust_h, ddmatrix_adjust_hi
Function        The DDEigen library uses two-dimensional cyclic distribution. This conversion routine is used to set up the data distribution of a matrix to support cyclic distribution. This routine must be called before using this library. ddmatrix_adjust_h is used for real data; ddmatrix_adjust_hi is used for imaginary data.
Specification   ddmatrix_adjust_h(n, ah, al, bh, bl, nm)
                ddmatrix_adjust_hi(n, ah, al, bh, bl, nm)

Argument  I/O   Remarks                   Description
n         i     integer                   Dimension of a matrix
ah        i                               High word of original matrix
al        i                               Low word of original matrix
bh        o                               High word of distributed data
bl        o                               Low word of distributed data
nm        i     integer                   Size of array b