Chiral Fermion Proposal
Physics on the BlueGene/L with QDP++ and Chroma
Robert Edwards
Jefferson Lab
Physics Goals
Dynamical chiral fermions – motivation:
– Good chiral symmetry control
– No taste breaking
– Avoid valence smearing
– Fully consistent valence and sea quarks
LHP Collaboration
Dru Renner – University of Arizona
Richard Brower, James Osborn – Boston University
Rebecca Irwin, Michael Ramsey-Musolf – CalTech
Jimmy Juge, Adam Lichtl, Colin Morningstar – Carnegie Mellon University
Robert Edwards, Nilmani Mathur, David Richards – JLab
Kostas Orginos – JLab/William & Mary
Bojan Bistrovic, Jonathan Bratt, Patrick Dreher, Oliver Jahn, John Negele, Andrew Pochinsky, Dmitry Sigaev – MIT
Matthias Burkardt, Michael Engelhardt – New Mexico State University
George Fleming – Yale
Constantia Alexandrou, Antonios Tsapalis – University of Cyprus
Wolfram Schroers – DESY
Philipp Hägler – Vrije Universiteit
Realization of QCD on a lattice
Continuous space-time → 4-dim Euclidean lattice
Derivatives → finite differences
Quarks on sites, gluons on links. Gluons are elements of the group SU(3):
$U_\mu(x) = \exp(i g a A_\mu(x))$
Expectation values from the path integral:
$\langle O \rangle = \frac{1}{Z} \int dU\, d\psi\, d\bar\psi\; O[U, \psi, \bar\psi]\; e^{-S(U,\psi,\bar\psi)}$
Integrate anti-commuting fermion fields → det(M(U)) and M^{-1}(U) factors:
$\langle O \rangle = \frac{1}{Z} \int dU\; O[U, M^{-1}(U)]\; \det M(U)\; e^{-S_G(U)}$
Gauge action ~ continuum:
$S_G = \frac{1}{4} \int d^4x\, F^a_{\mu\nu} F^a_{\mu\nu} + O(a^2)$
Goal: Overlap operator
Overlap operator on the lattice: $D(0) = \frac{1}{2}\left(1 + \gamma_5\, \varepsilon(H)\right)$
Four dimensional space of algorithms:
Kernel:
Approximation:
Representation: (CF = Continued Fraction, PF = Partial Fraction, DWF = CT = Cayley Transform)
Constraint: (5D, 4D)
Only 4D operator physically relevant:
The Goal – Overlap Determinant
Chiral fermion determinant:
Overlap determinant via 5 dimensions (e.g., DWF)
Upshot - Can exploit connection between 4D and 5D to
improve speed of calculations
Terminology:
– Identical physics – 4D- and 5D-generated propagators are identical on each config
– Equivalent physics – kernel (H) the same, but different approximations; all identical for a perfect approximation
– Inequivalent physics – different kernels (H) (different lattice-spacing dependence) even with the same approximation
Kernel
Choice of kernel affects “physics” (cutoff effects)
Wilson kernel:
Shamir kernel: (standard Domain-Wall)
Möbius kernel: (interpolates between the two; see the forms sketched below)
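For orientation, the standard forms of these kernels can be written as follows (conventions vary across the literature; $a_5$ is the fifth-dimension spacing and $b_5, c_5$ are the Möbius parameters, with Shamir recovered at $b_5 = a_5,\ c_5 = 0$):
$H_w = \gamma_5\, D_w(-M), \qquad H_T = \gamma_5\, \frac{a_5 D_w}{2 + a_5 D_w}, \qquad H_{\text{Mobius}} = \gamma_5\, \frac{(b_5 + c_5)\, D_w}{2 + (b_5 - c_5)\, D_w}$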
Representations
Continued Fraction – Euler representation; the coefficients $\beta_i$ determine the approximation.
Partial Fraction:
Euclidean Cayley Transform (Domain-Wall):
Approximations
Two popular approximations
Polar (“tanh”) [induced by DWF]
Zolotarev: optimal rational approximation, built from the Jacobi elliptic functions sn(z/M, λ) and sn(z, k)
Trick – projection: supplement the approximation with exact treatment of the lowest eigenvectors
Example: Continued Fraction
Want solution to $\varepsilon(H)\,\psi = b$ (schematically) in the continued-fraction representation
Use back-substitution – a 5D algorithm!
Equivalent to solving a 5D linear system, as sketched below
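Schematically (coefficients and signs depend on convention), the continued fraction $\varepsilon(H) \approx \beta_0 H + \cfrac{1}{\beta_1 H + \cfrac{1}{\beta_2 H + \cdots}}$ embeds in a tridiagonal 5D system over auxiliary fields $\phi_i$:
\[
\begin{pmatrix}
\beta_0 H & 1 & & \\
1 & -\beta_1 H & 1 & \\
 & 1 & \beta_2 H & \ddots \\
 & & \ddots & \ddots
\end{pmatrix}
\begin{pmatrix} \psi \\ \phi_1 \\ \phi_2 \\ \vdots \end{pmatrix}
=
\begin{pmatrix} b \\ 0 \\ 0 \\ \vdots \end{pmatrix}
\]
Eliminating the $\phi_i$ from the bottom row upward (back-substitution) reproduces the continued fraction acting on $\psi$; e.g., truncating at $\beta_1$ gives $\left(\beta_0 H + (\beta_1 H)^{-1}\right)\psi = b$.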
5D Operator – Generic Case
Want solution to the 4D overlap system
Representation for $\varepsilon(H)$ turned into a 5D system – use CG
Advantages: one 5D Krylov space versus nested 4D
Generic linear system:
All 5D operators use a 4D even-odd preconditioning (sketched below):
Roughly a 3x speedup. Cost of Aee is negligible in flops (care needed due to memory loads)
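A sketch of the even-odd (Schur-complement) preconditioning in generic form (details differ per action):
\[
A = \begin{pmatrix} A_{ee} & A_{eo} \\ A_{oe} & A_{oo} \end{pmatrix},
\qquad
\hat A = A_{oo} - A_{oe}\, A_{ee}^{-1} A_{eo}.
\]
One solves $\hat A\, \psi_o = b_o - A_{oe} A_{ee}^{-1} b_e$ on the odd sublattice only and reconstructs $\psi_e = A_{ee}^{-1}\left(b_e - A_{eo}\, \psi_o\right)$; the better-conditioned $\hat A$ accounts for the quoted ~3x speedup, and $A_{ee}^{-1}$ is cheap in flops when $A_{ee}$ is (block-)diagonal.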
Which 5D Fermion Action?
LHPC/UKQCD
Evaluate “cost” of various chiral fermion actions
Consider only 5D inverters for use in force term
in HMC
No projection – have residual mass
Decide by a metric – cost (number of Dw applications) for fixed mres
Tests
Use Nf=2 DWF ensembles (RBC), mπ ≈ 500 MeV
Actions: $D(0) = \frac{1}{2}\left(1 + \gamma_5\, \varepsilon(H)\right)$
– Möbius: (rescaled) Shamir (H = HT) and Overlap (H = Hw)
– Continued Fraction rep. for $\varepsilon(H_w)$ in 5D form
Different actions with same 4D physics $\varepsilon(H)$
Reduce mres by a better approximation of $\varepsilon(x)$: Zolotarev (Chebyshev) and tanh approximations to $\varepsilon(x)$, built from sn(z/M, λ) and sn(z, k)
Results – Cost Comparisons
Of actions tested, standard DWF Shamir is least effective.
Zolotarev Continued Fraction (Hw and HT) are candidates
Second Moment
Second norm is not crazy – shows there are no wild cancellations in mres
Zolotarev Continued Fraction (Hw) is winner
Chiral Fermions in HMC
Generate gauge fields according to: $P(U) \propto \det\!\left(M^\dagger M\right)\, e^{-S_G(U)}$
Have choices:
– 4D pseudofermions:
– 5D pseudofermions:
– Hybrid: 4D pseudofermions with 5D inversions!
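In either case the target distribution follows from the path integral above; for two flavors the determinant is traded for a pseudofermion integral (a standard identity, shown schematically):
\[
P(U) \propto \det\!\left(M^\dagger M\right) e^{-S_G(U)}
= \int d\phi^\dagger\, d\phi\;\; e^{-S_G(U) \;-\; \phi^\dagger \left(M^\dagger M\right)^{-1} \phi},
\]
where the pseudofermion field $\phi$ lives on the 4D lattice for a 4D overlap operator, or on the 5D lattice for a 5D operator.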
Chiral Fermions in HMC
Generate gauge fields according to: $P(U) \propto \det\!\left(M^\dagger M\right)\, e^{-S_G(U)}$
Gauge force usually MUCH noisier than fermion force!
Use multi-timescale integrators
Other tricks:
– Imperfect chirality during integration:
– Determinant splitting, Nth rootery – lower condition number
Cost Estimates
SciDAC Software Structure
Level 3: Optimised Dirac Operators, Inverters
– Wilson Op, DWF Inv for P4; Wilson and Stag. Op for QCDOC
Level 2: QDP (QCD Data Parallel) – lattice-wide operations, data shifts
– QIO: XML I/O, LIME
– Exists in C, C++, scalar and MPP using QMP
Level 1: QLA (QCD Linear Algebra), QMP (QCD Message Passing)
– Exists, implemented in MPI, GM, gigE and QCDOC
QMP Simple Example
char buf[size];
QMP_msgmem_t mm;
QMP_msghandle_t mh;

mm = QMP_declare_msgmem(buf, size);       // multiple calls possible
mh = QMP_declare_send_relative(mm, +x);
QMP_start(mh);
// Do computations
QMP_wait(mh);

The receiving node follows the same steps, except:
mh = QMP_declare_receive_from(mm, -x);
Data Parallel QDP/C,C++ API
Hides architecture and layout
Operates on lattice fields across sites
Linear algebra tailored for QCD
Shifts and permutation maps across sites
Reductions
Subsets
Entry/exit – attach to existing codes
QDP++ Type Structure
Lattice fields have various kinds of indices
Color: $U^{ab}(x)$   Spin: $\gamma_\alpha$   Mixed: $\psi^a_\alpha(x)$, $Q^{ab}_{\alpha\beta}(x)$
Tensor product of indices forms the type:

               Lattice    Color        Spin         Complexity
Gauge fields:  Lattice    Matrix(Nc)   Scalar       Complex
Fermions:      Lattice    Vector(Nc)   Vector(Ns)   Complex
Scalars:       Scalar     Scalar       Scalar       Scalar
Propagators:   Lattice    Matrix(Nc)   Matrix(Ns)   Complex
Gamma:         Scalar     Scalar       Matrix(Ns)   Complex
QDP++ forms these types via nested C++ templating
Formation of new types (eg: half fermion) possible
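A sketch of how these nested templates compose; the type names follow QDP++'s scheme (OLattice, PSpinVector, PColorMatrix, RComplex), but treat the exact spellings and ordering as assumptions:

#include <qdp.h>            // QDP++ headers
using namespace QDP;

// Nesting runs grid (OLattice/OScalar) -> spin -> color -> reality.
// Gauge field: Lattice x Scalar spin x Matrix(Nc) x Complex
typedef OLattice< PScalar< PColorMatrix< RComplex<REAL>, Nc > > >
        MyLatticeColorMatrix;

// Fermion: Lattice x Vector(Ns) spin x Vector(Nc) color x Complex
typedef OLattice< PSpinVector< PColorVector< RComplex<REAL>, Nc >, Ns > >
        MyLatticeFermion;

// A new type, e.g. a half fermion with only 2 spin components:
typedef OLattice< PSpinVector< PColorVector< RComplex<REAL>, Nc >, 2 > >
        MyLatticeHalfFermion;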
Data-parallel Operations
Unary and binary:
-a; a-b; …
Unary functions:
adj(a), cos(a), sin(a), …
Random numbers:
// platform independent
random(a), gaussian(a)
Comparisons (booleans)
a <= b, …
Broadcasts:
a = 0, …
Reductions:
sum(a), …
Fields have various types (indices) – tensor product:
Lattice: $A(x)$   Color: $U^{ij}(x)$   Spin: $\gamma_\alpha$   Mixed: $\psi^i_\alpha(x)$, $Q^{ij}_{\alpha\beta}(x)$
QDP Expressions
Can create expressions
$c^i_\alpha(x) = \sum_j U^{ij}(x)\, b^j_\alpha(x + \hat\mu) + 2\, d^i_\alpha(x), \qquad x \in \text{Even}$
QDP/C++ code
multi1d<LatticeColorMatrix> u(Nd);
LatticeDiracFermion b, c, d;
int mu;
c[even] = u[mu] * shift(b,mu) + 2 * d;
PETE: Portable Expression Template Engine
Temporaries eliminated, expressions optimised
Linear Algebra Implementation
// Lattice operation
A = adj(B) + 2 * C;
// Lattice temporaries
t1 = 2 * C;
t2 = adj(B);
t3 = t2 + t1;
A = t3;
// Merged Lattice loop
for (i = ... ; ... ; ...) {
A[i] = adj(B[i]) + 2 * C[i];
}
– Naïve ops involve lattice temporaries – inefficient
– Eliminate lattice temporaries – PETE
– Allows further combining of operations (adj(x)*y)
– Overlap communications/computations
– Full performance – expressions at site level
QDP++ Optimization
Optimizations “under the hood”
Select numerically intensive operations through template
specialization.
PETE recognises expression templates like z = a*x + y from type information at compile time
Calls a machine-specific optimised routine (axpyz)
The optimized routine can use assembler, reorganize loops, etc.
Optimized routines can be selected at configuration time
Unoptimized fallback routines exist for portability (a minimal sketch of the mechanism follows)
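A minimal, self-contained sketch of the mechanism (illustrative only, not the QDP++/PETE source): operator overloads build a lightweight expression node, and assignment pattern-matches the node type to dispatch to an optimized kernel, here a plain C++ stand-in for axpyz.

#include <cstddef>
#include <vector>

struct Field;

// Unevaluated "a * x" and "a * x + y" expression nodes
struct Scaled { double a; const Field* x; };
struct Axpy   { double a; const Field* x; const Field* y; };

struct Field {
    std::vector<double> data;
    explicit Field(std::size_t n) : data(n, 0.0) {}
    Field& operator=(const Axpy& e);      // specialized assignment
};

// Building the expression records operands instead of computing temporaries
inline Scaled operator*(double a, const Field& x)        { return Scaled{a, &x}; }
inline Axpy   operator+(const Scaled& s, const Field& y) { return Axpy{s.a, s.x, &y}; }

// The "optimized" routine; in QDP++ this slot could be assembler
inline void axpyz(Field& z, double a, const Field& x, const Field& y) {
    for (std::size_t i = 0; i < z.data.size(); ++i)
        z.data[i] = a * x.data[i] + y.data[i];   // one fused loop, no temporaries
}

inline Field& Field::operator=(const Axpy& e) {
    axpyz(*this, e.a, *e.x, *e.y);               // dispatch recognized pattern
    return *this;
}

int main() {
    Field x(1024), y(1024), z(1024);
    z = 2.0 * x + y;    // builds an Axpy node; '=' calls axpyz
}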
Performance Test Case – Wilson Conjugate Gradient

LatticeFermion psi, p, r;
Real c, cp, a, d;
Subset s;

for(int k = 1; k <= MaxCG; ++k)
{
  // c = | r[k-1] |**2
  c = cp;

  // a[k] := | r[k-1] |**2 / < M p[k], M p[k] >
  // Mp = M(u) * p
  M(mp, p, PLUS);           // Dslash
  // d = | mp |**2
  d = norm2(mp, s);         // norm squares
  a = c / d;

  // psi[k] += a[k] p[k]
  psi[s] += a * p;          // VAXPY operation

  // r[k] -= a[k] M^dag . M . p[k]
  M(mmp, mp, MINUS);
  r[s] -= a * mmp;
  cp = norm2(r, s);
  if ( cp <= rsd_sq ) return;

  // b[k+1] := |r[k]|**2 / |r[k-1]|**2
  b = cp / c;

  // p[k+1] := r[k] + b[k+1] p[k]
  p[s] = r + b * p;         // VAXPY
}
– VAXPY operations
– In C++, significant room for performance degradation
– Performance limitations in linear algebra ops (VAXPY) and norms
Optimization:
– Functions return a container holding the function type and operands
– At “=”, replace the expression with optimized code by template specialization
Performance:
– QDP overhead ~ 1% of peak
– Wilson on QCDOC: 283 Mflops/node @ 350 MHz, 4^4 per node
Chroma
A lattice QCD toolkit/library built on top of QDP++
Library is a module – can be linked with other codes.
Features:
Utility libraries (gluonic measure, smearing, etc.)
Fermion support (DWF, Overlap, Wilson, Asqtad)
Applications: spectroscopy, propagators & 3-pt functions, eigenvalues – all in the same code; heatbath, HMC
– e.g., McNeile: computes propagators with CPS, measures pions with Chroma
Optimization hooks – level-3 Wilson-Dslash for Pentium, QCDOC, BG/L, IBM SP-like nodes (via Bagel)
Chroma Lib Structure
Chroma Lattice Field Theory library
Support for gauge and fermion actions
– Boson action support
– Fermion action support: fermion actions, fermion boundary conditions, inverters, fermion linear operators, quark propagator solution routines
– Gauge action support: gauge actions, gauge boundary conditions
IO routines
– Enums
Measurement routines
– Eigenvalue measurements
– Gauge fixing routines
– Gluonic observables
– Hadronic observables
– Inline measurements: eigenvalue, glue, hadron, smear, and psibar-psi measurements
– Schroedinger functional
– Smearing routines
– Trace-log support
Gauge field update routines
– Heatbath
– Molecular dynamics support: Hamiltonian systems, HMC trajectories, HMD integrators, HMC monomials, HMC linear-system-solver initial guess
Utility routines
– Fermion manipulation routines
– Fourier transform support
– Utility routines for manipulating color matrices
– Info utilities
Fermion Actions
Actions are factory objects (foundries)
Do not hold gauge fields – only params
Factory/creation functions with gauge field argument
Takes a gauge field - creates a State & applies fermion BC.
Takes a State – creates a Linear Operator (dslash)
Takes a State – creates quark prop. solvers
Linear Ops are function objects
E.g., class Foo { int operator()(int x); } fred;  // int z = fred(1);
Argument to CG, MR, etc. – simple functions
Created with XML (a schematic sketch of the pattern follows)
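A schematic of this pattern with hypothetical names (illustrative, not Chroma's actual classes):

#include <memory>
#include <vector>

using Fermion = std::vector<double>;
struct GaugeField { std::vector<double> links; };

struct ConnectState {                 // gauge field with fermion BCs applied
    GaugeField u;
};

struct LinearOperator {               // function object: chi = M(U) psi
    virtual void operator()(Fermion& chi, const Fermion& psi) const = 0;
    virtual ~LinearOperator() = default;
};

struct WilsonLinOp : LinearOperator {
    std::shared_ptr<ConnectState> state;
    double kappa;
    WilsonLinOp(std::shared_ptr<ConnectState> s, double k)
        : state(std::move(s)), kappa(k) {}
    void operator()(Fermion& chi, const Fermion& psi) const override {
        chi = psi;                    // stub: would apply Wilson dslash via state->u
    }
};

struct WilsonFermAct {                // holds only parameters, no gauge field
    double kappa;
    std::shared_ptr<ConnectState> createState(const GaugeField& u) const {
        auto s = std::make_shared<ConnectState>();
        s->u = u;                     // fermion boundary conditions applied here
        return s;
    }
    std::shared_ptr<LinearOperator> linOp(std::shared_ptr<ConnectState> s) const {
        return std::make_shared<WilsonLinOp>(std::move(s), kappa);
    }
};

int main() {
    GaugeField u;
    WilsonFermAct S{0.11};
    auto state = S.createState(u);    // gauge field -> State
    auto M = S.linOp(state);          // State -> linear operator; hand M to CG, MR, etc.
    Fermion psi(16, 1.0), chi(16);
    (*M)(chi, psi);
}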
Fermion Actions - XML
Tag FermAct is the key in a lookup map of constructors
During construction, the action reads XML
The FermBC tag invokes another lookup
<FermionAction>
  <FermAct>WILSON</FermAct>
  <Kappa>0.11</Kappa>
  <FermionBC>
    <FermBC>SIMPLE_FERMBC</FermBC>
    <boundary>1 1 1 -1</boundary>
  </FermionBC>
  <AnisoParam>
    <anisoP>false</anisoP>
    <t_dir>3</t_dir>
    <xi_0>1.0</xi_0>
    <nu>1.0</nu>
  </AnisoParam>
</FermionAction>
XPath used in chroma/mainprogs/main/propagator.cc
/propagator/Params/FermionAction/FermAct
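A minimal sketch of reading that tag with a QDP++-style XMLReader (the file name is hypothetical; the XPath is the one quoted above):

#include <string>
#include <qdp.h>                      // QDP++ XMLReader
using namespace QDP;

std::string getFermAct()
{
    // Pull the FermAct tag out of a propagator input file
    XMLReader xml_in("propagator_input.xml");    // hypothetical input file
    std::string fermact;
    read(xml_in, "/propagator/Params/FermionAction/FermAct", fermact);
    return fermact;                   // e.g. "WILSON"
}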
HMC and Monomials
HMC is built on Monomials
– Monomials define Nf, gauge, etc.
– Only provide Mom → deriv(U) and S(U); pseudofermions not visible
– Have Nf=2 and rational Nf=1
– Both 4D and 5D versions

<Monomials>
  <elem>
    <Name>TWO_FLAVOR_WILSON_FERM_MONOMIAL</Name>
    <FermionAction>
      <FermAct>WILSON</FermAct>
      ……
    </FermionAction>
    <InvertParam>
      <invType>CG_INVERTER</invType>
      <RsdCG>1.0e-7</RsdCG>
      <MaxCG>1000</MaxCG>
    </InvertParam>
    <ChronologicalPredictor>
      <Name>LAST_SOLUTION_4D_PREDICTOR</Name>
    </ChronologicalPredictor>
  </elem>
  <elem> …. </elem>
</Monomials>
Gauge Monomials
Gauge monomials: Plaquette, Rectangle, Parallelogram
The Monomial constructor will invoke the constructor for Name in GaugeAction

<Monomials>
  <elem> …. </elem>
  <elem>
    <Name>WILSON_GAUGEACT_MONOMIAL</Name>
    <GaugeAction>
      <Name>WILSON_GAUGEACT</Name>
      <beta>5.7</beta>
      <GaugeBC>
        <Name>PERIODIC_GAUGEBC</Name>
      </GaugeBC>
    </GaugeAction>
  </elem>
</Monomials>
Chroma – Inline Measurements
– HMC has Inline measurements
– Chroma.cc is Inline-only code
– Former mainprogs are now inline measurements
– Measurements are registered with a constructor call
– Measurements are given the gauge field – no return value
– Only communicate with each other via disk (maybe memory buffers??)

<InlineMeasurements>
  <elem>
    <Name>MAKE_SOURCE</Name>
    <Param>...</Param>
    <Prop>
      <source_file>./source_0</source_file>
      <source_volfmt>MULTIFILE</source_volfmt>
    </Prop>
  </elem>
  <elem>
    <Name>PROPAGATOR</Name>
    <Param>...</Param>
    <Prop>
      <source_file>./source_0</source_file>
      <prop_file>./propagator_0</prop_file>
      <prop_volfmt>MULTIFILE</prop_volfmt>
    </Prop>
  </elem>
  <elem>….</elem>
</InlineMeasurements>
Level-3 Wilson-Dirac Computation Strategy

$\chi(x) = \sum_{\mu=0}^{N_d-1} U_\mu(x)\,(1-\gamma_\mu)\,\psi(x+\hat\mu) \;+\; \sum_{\mu=0}^{N_d-1} U^\dagger_\mu(x-\hat\mu)\,(1+\gamma_\mu)\,\psi(x-\hat\mu)$

Communication ................................................. Computation
.......................................................................... $a_\mu(x) = \mathrm{spinproj}(\psi(x), -\mu)$
.......................................................................... $b_\mu(x) = \mathrm{spinproj}(\psi(x), +\mu)$
Communicate(boundaries($a_\mu$, $-\mu$)) ....... $c_\mu = U^\dagger_\mu(x)\, b_\mu$
Synchronize() ..................................................
Communicate(boundaries($a_\mu$, $+\mu$)) ...... $d_\mu = U_\mu(x)\, a_\mu$
Synchronize() ..................................................
.......................................................................... $\chi(x) = \sum_{\mu=0}^{N_d-1} \mathrm{reconstruct}(c_\mu, d_\mu, \mu)$
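A hedged sketch of this communication/computation overlap, reusing the simplified QMP calls from the earlier slide; compute_interior, compute_boundary, and the message handles are hypothetical stand-ins:

// Overlap sketch: start communication, do interior work, finish the faces.
QMP_start(mh_send);      // ship spin-projected boundary data
QMP_start(mh_recv);      // post receives from neighbors
compute_interior();      // sites whose stencil needs no remote data
QMP_wait(mh_send);       // Synchronize()
QMP_wait(mh_recv);
compute_boundary();      // apply reconstruct on the received faces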
Another Level-3 Wilson-Dirac Strategy

Same dslash $\chi(x)$ as above.

Communication ................................................. Computation
.......................................................................... $a_\mu(x) = \mathrm{spinproj}(\psi(x), -\mu)$
Communicate(boundaries($a_\mu$, $-\mu$)) ....... $c_\mu(x) = U^\dagger_\mu(x)\, \mathrm{spinproj}(\psi(x), +\mu)$
Synchronize() ..................................................
Communicate(boundaries($c_\mu$, $+\mu$)) ...... $\chi(x) = \sum_{\mu=0}^{N_d-1} \mathrm{reconstruct}(U_\mu(x)\, a_\mu(x), +\mu)$
Synchronize() ..................................................
.......................................................................... $\chi(x) \mathrel{+}= \sum_{\mu=0}^{N_d-1} \mathrm{reconstruct}(c_\mu(x), -\mu)$
Details – Performance - DWF
Level-3 DWF – strip-mined, with length-4 vectorization over the 5th-dimension index
[Figure: DWF performance, Mflops/node vs. local lattice size, comparing JLab 3G and 4G clusters at Level II and Level III with FNAL Myrinet and Infiniband clusters at Level III.]
For More Information on Software
U.S. Lattice QCD Home Page:
http://www.usqcd.org/
The JLab Lattice Portal
http://lqcd.jlab.org/
High Performance Computing at JLab
http://www.jlab.org/hpc/