Overview
There are several types of symmetry functions available in RuNNer.
A detailed description of most of them can be found in
J. Behler, J. Chem. Phys. 2011, 134, 074106.
For high-dimensional neural network potentials there are radial and angular
symmetry functions, which are all many-body functions and not to be confused
with two- and three-body functions.
In addition, spin-dependent atom-centered symmetry functions proposed in
M. Eckhoff, J. Behler, npj Comput. Mater. 2021, 7, 170
take atomic spin degrees of freedom into account to represent different magnetic orders of collinear spin-polarized atoms.
Further, RuNNer
contains some special dummy symmetry functions for testing
purposes, which are not suitable for constructing potential energy surfaces.
The Cutoff Function
The cutoff function is a central part of any symmetry function, because it
defines the local atomic environments. In RuNNer
there are several cutoff
functions available, which are selected by the keyword
cutoff_type
.
The functions
cutoff_type
1 and
cutoff_type
2 are
well tested and recommended for applications. The other cutoff functions
currently have an experimental status only and should be used with care.
For cutoff_type
1
the function
is used. Where \(R_{\mathrm{inner,c}}\) =
cutoff_alpha
* \(R_{\mathrm c}\) and x is defined as
For cutoff_type
2
it is the function
Usually, the inner cutoff \(R_{\text inner, c}\) is set to zero.
The following table shows a list of the functional forms of all cutoff functions within the cutoff radius and without the inner cutoff radius. Outside the cutoff radius the values are zero. In addition, please be aware that the inner cutoff radius must be smaller than the shortest bond length in the system in order to avoid losing information about the close atomic environment.
cutoff_type | Name | Formula |
---|---|---|
0 | hard function | 1 |
1 | cosine function | \(\frac{1}{2} [\cos(\pi x)+1]\) |
2 | hypertangent function v1 | \(\tanh^{3} (1-\frac{R_{ij}}{R_{\mathrm{c}}})\) |
3 | hypertangent function v2 | \((\frac{e+1}{e-1})^3 \tanh^{3}(1-\frac{R_{ij}}{R_{\mathrm{c}}})\) |
4 | exponential function | \(\exp(1-\frac{1}{1-x^2})\) |
5 | polynomial function v1 | \((2x -3)x^2+1\) |
6 | polynomial function v2 | \(((15-6x)x-10)x^3+1\) |
7 | polynomial function v3 | \((x(x(20x-70)+84)-35)x^4+1\) |
8 | polynomial function v4 | \((x(x(x(315-70x)-540)+420)-126)x^5+1\) |
Radial Symmetry Functions
The radial functions describe the radial distribution of neighboring atoms of a specified element with respect to the central atom \(i\). They can often be interpreted as continuous coordination numbers of the central atom.
Radial Function Type 1
For the description of the environment of atom \(i\) type 1 is the plain cutoff
function of
cutoff_type
1.
(no other cutoff type can be used here):
For \(R_{ij} < R_{\mathrm{c}}\) we then have
This symmetry function has a non-zero derivative with respect to the coordinates
of the reference atom \(x_i\), \(y_i\), \(z_i\) and with respect to the coordinates of
all atoms \(j\), \(x_j\), \(y_j\) and \(z_j\). The derivative dsfuncdxyz
with respect
to a coordinate \(\alpha\) is then
where \(\alpha\) can be any of \(x_i\), \(y_i\), \(z_i\), \(x_j\), \(y_j\) or \(z_j\). With the definition of \(R_{ij}\)
we then have
and
For the derivatives \(\frac{\partial G^{\mu}_i}{\partial \alpha}\) in
RuNNer
the array
dsfuncdxyz(num_funcvalues, max_num_atoms, max_num_atoms, 3)
is used.
Field 1 defines the number of the symmetry function (not its type
number!), field 2 specifies the number of the central atom, i.e., the
atom that is characterized by the symmetry function, field 3 the number
of the atom, for whose coordinate we are calculating the derivative, and
field 4 specifies the coordinate of the derivative, so field 3 and 4
together define \(\alpha\).
The use of this function is discouraged, because for small cutoffs there can be problems in the accuracy of energies and forces. Therefore, the symmetry function has not yet been generalized to automatically adapt to the chosen cutoff type.
Radial Function Type 2
This function is the most important radial function. It is a sum of shifted Gaussians multiplied by the cutoff function. The result is a "shell" around the atom at a certain radius determined by the shift \(R_s\). The width of the Gaussian is determined by the parameter \(\eta\).
The derivative dsfuncdxyz
with respect to a coordinate \(\alpha\) is
A good starting point is to first try \(R_s=0\).
Radial Function Type 4
The purpose of this symmetry function is to provide a more systematic way of describing the atomic environments by using a cosine function with increasing frequency determined by \(\eta\).
A disadvantage is the possible cancellation of positive and negative constributions from different neighboring atoms. This symmetry function is rarely used.
Radial Spin-dependent Function Type 21
Spin-dependent atom-centered symmetry functions can distinguish different collinear atomic spin arrangements for the construction of magnetic high-dimensional neural network potentials. To represent the magnetic order, the atomic spin coordinate,
with
is used. \(M_S\) equals the difference between the numbers of spin-up \(n_i^\uparrow\) and spin-down electrons \(n_i^\downarrow\) at an atom site \(i\) multiplied by the spin of an electron \(\tfrac{1}{2}\,\hbar\). The threshold \(M_S^\mathrm{thres}\) is set to \(0.25\,\hbar\). If the atomic spin is not smaller than \(M_S^\mathrm{thres}\), the atomic spin coordinate will be the sign of \(M_S\), i.e., \(-1\) or 1.
The radial spin-dependent symmetry functions are based on radial symmetry function type 2, but contain in addition a radial spin augmentation function \(M^\mathrm{x}\),
The radial spin-dependent symmetry function type 21 uses the spin augmentation function
This spin augmentation function represents only non-magnetic interactions.
Radial Spin-dependent Function Type 22
The radial spin-dependent symmetry function type 22 uses the spin augmentation function
This spin augmentation function represents ferromagnetic interactions between a magnetic central atom and a magnetic neighbor atom.
Radial Spin-dependent Function Type 23
The radial spin-dependent symmetry function type 23 uses the spin augmentation function
This spin augmentation function represents antiferromagnetic interactions between a magnetic central atom and a magnetic neighbor atom.
Angular Symmetry Functions
Angular Function Type 3
The angular terms are constructed for all triplets of atoms by summing the cosine values of the angles \(\theta_{ijk}=\frac{\bf{R}_{ij}\cdot\bf{R}_{ik}}{R_{ij} R_{ik}}\) centered at atom \(i\), with \(\bf{R}_{ij}=\bf{R}_i-\bf{R}_j\),
\(\lambda\) can only have the values +1 and -1. The derivative
dsfuncdxyz
with respect to a coordinate \(\alpha\) is
Angular Function Type 7
This symmetry function is currently not implemented.
Angular Function Type 8
This is an angular function with maxima at arbitrary angles, which is symmetric with respect to \(\alpha=\pi\) and \(\alpha=2\pi\).
Angular Function Type 9
This symmetry function is closely related to angular symmetry function 3, but is missing the terms depending on \(R_{jk}\).
Due to the missing multiplication by \(f_{\mathrm{c} }\left(R_{jk}\right)\) also triples with distances between \(j\) and \(k\) being larger than the cutoff radius will be considered. Therefore the numerical values of this symmetry function will be generally larger.
Angular Spin-dependent Function Type 24
The angular spin-dependent symmetry functions are based on angular symmetry function type 3, but contain in addition an angular spin augmentation function \(M^\mathrm{xx}\),
Note: The sum over \(j\) and \(k\neq j\) takes into account all contributions of a \(j\times k\) matrix except for diagonal contributions (self-interaction). Therefore, \(2^{-\zeta}\) is used as prefactor instead of \(2^{1-\zeta}\) which is employed for the sum over \(j\) and \(k>j\). In the latter case only the upper triangular (all entries above the main diagonal) are used and multiplied by an additional factor \(2\) of the prefactor because these are typically identical to the lower triangular. The only exception is the angular spin-dependent symmetry function type 27 which requires to take into account all contributions explicitly.
The angular spin-dependent symmetry function type 24 uses the spin augmentation function
This spin augmentation function represents only non-magnetic interactions of the central atom with its neighbor atoms.
Angular Spin-dependent Function Type 25
The angular spin-dependent symmetry function type 25 uses the spin augmentation function
This spin augmentation function represents interactions between three magnetic atoms of the same atomic spin coordinate. In addition, it describes ferromagnetic interactions between a magnetic central atom, one magnetic neighbor atom, and one non-magnetic neighbor atom.
Angular Spin-dependent Function Type 26
The angular spin-dependent symmetry function type 26 uses the spin augmentation function
This spin augmentation function represents interactions between three magnetic atoms, whereby the atomic spin coordinate of the central atom is different to those of the neighbor atoms. In addition, it describes antiferromagnetic interactions between a magnetic central atom, one magnetic neighbor atom, and one non-magnetic neighbor atom.
Angular Spin-dependent Function Type 27
The angular spin-dependent symmetry function type 27 uses the spin augmentation function
This spin augmentation function represents interactions between three magnetic atoms, whereby the two neighbor atoms exhibit different atomic spin coordinates.
Angular Spin-dependent Function Type 28
The angular spin-dependent symmetry function type 28 uses the spin augmentation function
This spin augmentation function represents ferromagnetic interactions between a magnetic central atom, one magnetic neighbor atom, and one non-magnetic neighbor atom.
Angular Spin-dependent Function Type 29
The angular spin-dependent symmetry function type 29 uses the spin augmentation function
This spin augmentation function represents interactions between three atoms of the same atomic spin coordinate.
Angular Spin-dependent Function Type 30
The angular spin-dependent symmetry function type 30 uses the spin augmentation function
This spin augmentation function represents antiferromagnetic interactions between a magnetic central atom, one magnetic neighbor atom, and one non-magnetic neighbor atom.
Angular Spin-dependent Function Type 31
The angular spin-dependent symmetry function type 31 uses the spin augmentation function
This spin augmentation function represents interactions between three magnetic atoms, whereby the atomic spin coordinate of the central atom is different to those of the neighbor atoms.
Other Symmetry Functions
Function Type 5
This symmetry functions is either the \(x\) (\(\eta=1.0\)), \(y\) (\(\eta=2.0\)) or \(z\) (\(\eta=3.0\)) Cartesian coordinate of an atom \(i\). It is included for testing and debugging purposes only. It cannot be used to construct potential energy surfaces.
Function Type 6
This symmetry functions is only applicable to diatomic molecules and corresponds to the interatomic distance. It is included for testing and debugging purposes only. It cannot be used to construct potential energy surfaces.
Technical Notes on the Implementation of the Symmetry Functions
For each symmetry function specified in the input.nn
file there is one
symmetry function value per atom in case of a monocomponent system. In
case of a general multicomponent system also all cross-terms have to be
specified. The number of cross-terms is different for the radial and the
angular symmetry functions. For a radial function there is one function
value for each element in the system. For an angular function all
permutations have to be considered, i.e., for a three-component system
with the elements A, B and C we have the angular functions describing
the angles A-A-A, A-B-B, A-C-C, A-A-B=A-B-A, A-A-C=A-C-A, and
A-B-C=A-C-B, where the first letter indicates the central atom of the
angle. For a system containing \(N_\mathrm{elem}\) elements we have
symmetry functions. In RuNNer the symmetry functions are stored in the
array
symfunction(max_num_atoms, num_funcvalues)
, where num_funcvalues
specifies \(N_\mathrm{sym}\). For a given structure the functions are
calculated in the subroutine calconefunction.f90
. The calculation of
the symmetry functions is done block-wise for a set of npoints
structures, where npoints < nblock
, and nblock
is the maximum
number of structures being stored in the memory at the same time. The
array storing the symmetry functions for a full set of structures is
called symfunction_list(nblock, max_num_atoms, num_funcvalues)
and is
constructed in calcfunctions.f90
from the symmetry function sets of
the individual structures.
Storing the calculated symmetry function values in the right entry of
the symfunction
array (field 2) is controlled by an index variable
nindex
. The value of nindex
is obtained from the array
nnindex(num_functions,nelem,nelem)
by inserting the element index of
the two neighboring atoms in the fields two and three for the angular
functions. For the radial functions field three is not used, because
there is only one neighbor. Please note that in case of angular
functions the order of the two elements does not matter, nnindex
will
yield the same value for nindex
, because
nnindex(,A,B)=nnindex(,B,A)
. The array nnindex
is determined in
subroutine getnnindex.f90
called by getstructurefunctions.f90
.
Scaling of the Symmetry Functions
For numerical reasons it may be advantageous to scale and/or center the
range of symmetry function values. This is done with the keywords
lscalesym
and lcentersym
.
If the symmetry functions are just scaled (lscalesym=.true.
and
lcentersym=.false.
) then the range of values for each symmetry
function is between 0 and 1 and the values are calculated according to
If the symmetry functions are just centered (lscalesym=.false.
and
lcentersym=.true.
) then the average value of the centered symmetry
function values is zero.
If the symmetry functions are centered and scaled (lscalesym=.true.
and lcentersym=.true.
) then the symmetry functions are
and there are positive and negative values (the range is not -1 to 1!)
If the symmetry functions are scaled, also the derivatives dsfuncdxyz
have to be scaled to keep the gradients consistent with the original
function.
Optimization of the Weight Parameters
Calculation of the Weight Derivatives for the Short Range NN
For the optimization of the weight parameters we need the derivatives of the NN function with respect to the weights \(b_i^j\) and \(a_{ij}^{kl}\). The derivatives with respect to the bias weights for the example NN are:
Consequently, the derivatives can be calculated recursively.
The derivatives with respect to the connecting weights for the example NN are:
Comparing the bias weight derivatives and the connecting weight derivatives yields:
Consequently, if we have calculated the bias weight derivatives, it is straightforward to calculate the derivatives with respect to the connecting weights. The derivative with respect to the connecting weights is simply the derivative with respect to the bias weight in the target node times the node value of the node in the previous layer.
The weight derivatives of the short range energy are calculated in
subroutine getdeshortdw.f90
, and inside this subroutine for each atom
the contribution is calculated by subroutine getonedeshortdw.f90
. The
derivatives are called
deshortdw(nelem, num_weightsshort, nodes_short(num_layersshort))
.
The target quantity is the short range energy only, if also an electrostatic energy contribution is used. The short range energy of the training points can be calculated by subtracting the electrostatic energy calculated from the electrostatic energy arising from the atomic charges of the training points.
Calculation of the Weight Derivatives for the Charge NN
For the weight optimization the derivatives of the total energy with respect to the weight parameters are required. Obtaining the derivatives with respect to the weights in the subnets determining the short range interaction is a standard procedure and they are given only by the NN topology. The derivatives with respect to the weights determining the electrostatic energy requires the calculation of the derivative of the Ewald energy with respect to the weight parameters \(w_{ik}^{k}\)' of the atomic charges subnets. The Ewald energy depends on these weights indirectly via the atomic charges \(Q\). We thus have
The derivatives \(\frac{\partial Q_i}{\partial w'}\) of the atomic charges with respect to the weights \(w'\) are calculated in the same way as the derivative \(\frac{\partial E_i}{\partial w}\) of the short range atomic energy contributions and are given by the NN subnet topology. The derivatives of the Ewald energy with respect to the atomic charges are
with
and
The atomic charges have to be fitted under the constraint of overall charge neutrality. This constraint has to be taken into account in the weight updates.
The optimum set of weight parameters is obtained if the error function has a minimum, i.e., when the derivative of the error function with respect to all weight parameters is zero. Since the error function, like the sum of charges, has to be zero, the sum of charges for each training point can simply be added to the error function of that point.
Consequently we need the derivative of the sum of charges with respect to the weights.
The Kalman Filter
The EKF recursions are:
The update of the weight parameters according to the Kalman filter is
done in subroutine iupdatekalman.f90
.
The following notation is used in the code
- \(\mathbf{w}=\)
weights
- \(\mathbf{K}=\)
kalgainmat
- \(\mathbf{J}=\)
dedw
- \(\mathbf{P}=\)
corrmatrix?
- \(\mathbf{Q}=\)
noise matrix
- \(\lambda=\)
kalmanlambda
For the update of the Kalman gain matrix we use\
coh
\(=\mathbf{P}(n-1)\mathbf{J}(n)\)\
coh
is calculated using the BLAS library dspmv
, which calculates
\(y=\alpha\cdot A\cdot x +\beta \cdot y\). For this we use
- \(\alpha=1\)
- \(\beta=0\)
- \(A=\mathbf{P}(n-1)\)
- \(x=\mathbf{J}(n)\)
We further define
We then obtain
For the update of the symmetric matrix \(\mathbf{P}\) we can make use of
\(\mathbf{P}(n)\) is calculated using the BLAS library dspr
, which
calculates \(A=\alpha\cdot x \cdot x^T + A\) We use
- \(\alpha=-\lambda^{-1}\cdot \texttt{inverse}\)
- \(x=\)
coh
- \(A=\lambda^{-1}\mathbf{P}(n-1)\)
Using Forces for the Weight Optimization
The forces contain valuable local information on the potential energy surface. This information can be used for the optimization of the weight parameters and RuNNer can use both, total energies and atomic forces from reference data to optimize the weight parameters.
The NN forces in RuNNer contain a short range and an electrostatic contribution. Unfortunately, these contributions cannot be separated easily in the DFT training data, because in the NN the charges are environment-dependent, and therefore the force contains the derivative of the charges with respect to the Cartesian coordinates. For the training set this information is not available. Unlike in case of the total energies, therefore the short range and electrostatic components cannot be separated and have to be fitted simultaneously. It is not possible to simply subtract the electrostatic forces to get the short range forces, because the derivative of the electrostatic energy expression assumes constant charges.
In order to use the forces for the weight optimization, the derivative of the forces with respect to the weight parameters of the short range and the electrostatic NN has to be known.
\(w\) can either be a bias weight \(b_i^j\) or a connection weight \(a_{ij}^{kl}\).
Optimization Using the Short Range Force
The short range force is given by
The derivative with respect to a short range weight \(w\) is then
because
\(\frac{\partial}{\partial w}\frac{\partial G_i^j}{\partial \alpha}=0\).
The derivative \(\frac{\partial E_i}{\partial G_i^j}\) is also calculated
in subroutine getshortforces.f90
(deshortdsfunc
) for the calculation
of the short range forces, but it needs to be recalculated for each new
set of weights. It has been shown above that the dependence on the weight parameters for
an example 2-3-4-5-1 NN is
The values at the nodes before the application of the activation functions depend on the weights:
Therefore also the derivatives of the activation functions \(\frac{\partial f(x)}{\partial G_i}\) at point \(x\) depend on these weight parameters. We thus need the derivative of the derivative of the activation functions.
The derivatives of the activation functions with respect to the \(G_i\)
have been calculated in subroutine getdnodes_values.f90
and are called
dnodes_values( num_layersshort, maxnodes_short)
. The values at the
nodes \(x_{\mu}^{\nu}\) before the application of the activation functions
\(f\) are called nodes_sum
and are calculated in subroutine
calconeatom.f90
.
Another approach to calculate \(\frac{\partial }{\partial w}\frac{\partial E_i}{\partial G_i^j}\) might be better: It is also possible to rewrite the derivative of the forces with respect to the weights as
\(\frac{\partial F_{\alpha}}{\partial w}\) is called
dfshortdw( num_weightsshort)
in the code.
\(\frac{\partial G_i^j}{\partial \alpha}\) is
dsfuncdxyz (num_funcvalues, max_num_atoms, max_num_atoms, 3)
in the
code. The term \(\frac{\partial E_i}{\partial w}\) (deshortdw
) has been
calculated before in subroutine getonedeshortdw.f90
as
dedw( num_weightsshort, nodes_short(num_layersshort))
to optimize the
short range weight parameters using the short range energies. Its values
cannot be used here, because the weight parameters have changed. Still,
we can reuse the Kalman equations for the bias weight derivatives of \(E_m\):
Then, the derivatives with respect to the input nodes for the bias weight derivatives are:
The second derivatives
\(\frac{\partial^2 f^{\nu}(x_{\mu}^{\nu})}{\partial G_i \partial b_{\mu}^{\nu}}\)
are ddnodes_values( num_layersshort, maxnodes_short)
in the code
(calculated in subroutine getddnodes_values.f90
). The derivatives
\(\frac{\partial }{\partial G_i}\frac{\partial E_m}{\partial b_{\mu}^{\nu}}\)
are called
dedgdw (num_weightsshort, num_funcvalues, nodes_short (num_layersshort))
in the code. They are calculated in subroutine getdfshortdw
.
For the derivatives with respect to the \(G_i\) we then obtain for the connecting weights
Note that in the last equation
\(\frac{\partial E_m}{\partial a_{ij}^{01}}\) has been replaced by
\(\frac{\partial E_m}{\partial a_{01}^{hj}}\) to distinguish the indices
properly (Thanks to Andi!). The terms
\(\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_{\mu}^{\nu}}\)
have been derived before. The derivatives
\(\frac{\partial E_m}{\partial b_{\mu}^{\nu}}\), too.
We still need an expression for \(\frac{\partial}{\partial G_i}x_{\mu}^{\nu}\)
(dnodes_values_dg( num_layersshort, maxnodes_short, num_funcvalues)
):
\(\frac{\partial }{\partial G_i}x_{\mu}^{\nu}\) = dnodes_values_dg
is
calculated in subroutine get_dnodes_values_dg.f90
.
\(\frac{\partial f^{\nu}(x_{\mu}^{\nu})}{\partial G_i}\) = dnodes_values
is calculated in subroutine getdfshortdw.f90
.
Hints for Debugging
For debugging purposes it is useful to have a small NN that can be evaluated manually, e.g., a 4-2-2-1 lll NN. For such a network the weights are stored in the following order:
The numbers assigned to the weights here are arbitrary numbers to calculate the NN manually. The corresponding pointer array windex(2\(\cdot\)num_layersshort) is then
If we set the symmetry function values of the 4 input nodes as
we obtain for the values at the nodes
nodes_values(num_layersshort, maxnodes_short)
:
For the partial derivatives dnodes_values(num_layersshort, maxnodes_short)
we obtain
$$ \begin{align} dnodes_values(1, 1)&= 1.0000000000 \ dnodes_values(1, 2)&= 1.0000000000 \ dnodes_values(2, 1)&= 1.0000000000 \ dnodes_values(2, 2)&= 1.0000000000 \ dnodes_values(3, 1)&= 1.0000000000
\end{align} $$
For a 4-2-2-1 ttl NN with the hyperbolic tangent activation function in the hidden layers we would have
For the second derivatives (needed in subroutine getdfshortdw.f90
)
ddnodes_values
we obtain for a 4-2-2-1 lll NN:
and with a 4-2-2-1 ttl NN: