Skip to content

Overview

There are several types of symmetry functions available in RuNNer. A detailed description of most of them can be found in J. Behler, J. Chem. Phys. 2011, 134, 074106. For high-dimensional neural network potentials there are radial and angular symmetry functions, which are all many-body functions and not to be confused with two- and three-body functions. In addition, spin-dependent atom-centered symmetry functions proposed in M. Eckhoff, J. Behler, npj Comput. Mater. 2021, 7, 170 take atomic spin degrees of freedom into account to represent different magnetic orders of collinear spin-polarized atoms. Further, RuNNer contains some special dummy symmetry functions for testing purposes, which are not suitable for constructing potential energy surfaces.

The Cutoff Function

The cutoff function is a central part of any symmetry function, because it defines the local atomic environments. In RuNNer there are several cutoff functions available, which are selected by the keyword cutoff_type.

The functions cutoff_type 1 and cutoff_type 2 are well tested and recommended for applications. The other cutoff functions currently have an experimental status only and should be used with care.

For cutoff_type 1 the function

\[ \begin{align} f^1_{\mathrm{c}}\left(R_{\mathrm{ij}}\right)=\Bigg\{ \begin{array}{rcl} 1 & \mbox{for} & R_{\mathrm {ij}} \leq R_{\mathrm{inner,c}} \\ 0.5\cdot\left[\cos{\left(\pi x \right)}+1\right] & \mbox{for} & R_{\mathrm{inner,c}} \leq R_{\mathrm {ij}}\leq R_{\mathrm{c}} \\ 0 & \mbox{for} & R_{\mathrm {ij}} > R_{\mathrm{c}} . \end{array} \end{align} \]

is used. Where \(R_{\mathrm{inner,c}}\) = cutoff_alpha * \(R_{\mathrm c}\) and x is defined as

\[ x = \frac{R_{\mathrm {ij}} - R_{\mathrm{inner,c}}} {R_{\mathrm c} - R_{\mathrm{inner,c}}} \]

For cutoff_type 2 it is the function

\[ \begin{align} f^2_{\mathrm{c}}\left(R_{\mathrm {ij}}\right)=\Bigg\{ \begin{array}{rcl} \tanh^3(1) & \mbox{for} & R_{\mathrm{ij}} \leq R_{\mathrm{inner,c}}\\ \left[\tanh^3{\left(1-\frac{ R_{i\mathrm {ij}}}{R_{\mathrm{c}}}\right)}\right] & \mbox{for} & R_{\mathrm{inner,c}} \leq R_{\mathrm {ij}}\leq R_{\mathrm{c}} \\ 0 & \mbox{for} & R_{\mathrm {ij}} > R_{\mathrm{c}} . \end{array} \end{align} \]

Usually, the inner cutoff \(R_{\text inner, c}\) is set to zero.

The following table shows a list of the functional forms of all cutoff functions within the cutoff radius and without the inner cutoff radius. Outside the cutoff radius the values are zero. In addition, please be aware that the inner cutoff radius must be smaller than the shortest bond length in the system in order to avoid losing information about the close atomic environment.

cutoff_type Name Formula
0 hard function 1
1 cosine function \(\frac{1}{2} [\cos(\pi x)+1]\)
2 hypertangent function v1 \(\tanh^{3} (1-\frac{R_{ij}}{R_{\mathrm{c}}})\)
3 hypertangent function v2 \((\frac{e+1}{e-1})^3 \tanh^{3}(1-\frac{R_{ij}}{R_{\mathrm{c}}})\)
4 exponential function \(\exp(1-\frac{1}{1-x^2})\)
5 polynomial function v1 \((2x -3)x^2+1\)
6 polynomial function v2 \(((15-6x)x-10)x^3+1\)
7 polynomial function v3 \((x(x(20x-70)+84)-35)x^4+1\)
8 polynomial function v4 \((x(x(x(315-70x)-540)+420)-126)x^5+1\)

Radial Symmetry Functions

The radial functions describe the radial distribution of neighboring atoms of a specified element with respect to the central atom \(i\). They can often be interpreted as continuous coordination numbers of the central atom.

Radial Function Type 1

For the description of the environment of atom \(i\) type 1 is the plain cutoff function of cutoff_type 1. (no other cutoff type can be used here):

\[ \begin{align} G^1_i = \sum_j f_{\mathrm{c}}\left(R_{ij}\right) \end{align} \]

For \(R_{ij} < R_{\mathrm{c}}\) we then have

\[ \begin{align} G^{1}_i = \sum_j \frac{1}{2} \cdot \left[\cos{\left(\frac{\pi R_{ij}}{R_{\mathrm{c}}}\right)}+1\right] \end{align} \]

This symmetry function has a non-zero derivative with respect to the coordinates of the reference atom \(x_i\), \(y_i\), \(z_i\) and with respect to the coordinates of all atoms \(j\), \(x_j\), \(y_j\) and \(z_j\). The derivative dsfuncdxyz with respect to a coordinate \(\alpha\) is then

\[ \begin{align} \frac{\partial G^1_i}{\partial \alpha} = -\sum_j \frac{1}{2} \sin{ \left( \frac{\pi R_{ij}}{R_{\mathrm{c}}} \right)} \cdot \frac{\pi}{R_{\mathrm{c}}} \cdot \frac{\partial R_{ij}}{\partial \alpha} \end{align} \]

where \(\alpha\) can be any of \(x_i\), \(y_i\), \(z_i\), \(x_j\), \(y_j\) or \(z_j\). With the definition of \(R_{ij}\)

\[\begin{align} R_{ij}=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2+(z_i-z_j)^2} \end{align} \]

we then have

\[ \begin{align} \frac{\partial R_{ij}}{\partial x_i} & = \frac{1}{2\cdot \sqrt{(x_i-x_j)^2+(y_i-y_j)^2+(z_i-z_j)^2} } \cdot 2(x_i-x_j) \\ & = \frac{1}{R_{ij}}(x_i-x_j) \end{align} \]

and

\[ \begin{align} \frac{\partial R_{ij}}{\partial x_j} = -\frac{\partial R_{ij}}{\partial x_i} \end{align} \]

For the derivatives \(\frac{\partial G^{\mu}_i}{\partial \alpha}\) in RuNNer the array dsfuncdxyz(num_funcvalues, max_num_atoms, max_num_atoms, 3) is used. Field 1 defines the number of the symmetry function (not its type number!), field 2 specifies the number of the central atom, i.e., the atom that is characterized by the symmetry function, field 3 the number of the atom, for whose coordinate we are calculating the derivative, and field 4 specifies the coordinate of the derivative, so field 3 and 4 together define \(\alpha\).

The use of this function is discouraged, because for small cutoffs there can be problems in the accuracy of energies and forces. Therefore, the symmetry function has not yet been generalized to automatically adapt to the chosen cutoff type.

Radial Function Type 2

This function is the most important radial function. It is a sum of shifted Gaussians multiplied by the cutoff function. The result is a "shell" around the atom at a certain radius determined by the shift \(R_s\). The width of the Gaussian is determined by the parameter \(\eta\).

\[ \begin{align} G^2_i = \sum_{j} e^{-\eta \left(R_{ij}-R_{\mathrm{s}}\right)^2}\cdot f_{\mathrm{c}}\left(R_{ij}\right) . \end{align} \]

The derivative dsfuncdxyz with respect to a coordinate \(\alpha\) is

\[ \begin{align} \frac{\partial G^2_i}{\partial \alpha} = \sum_j \left[-2 \eta \left(R_{ij}-R_s\right) f_{\mathrm c}(R_{ij})e^{-\eta \left(R_{ij}-R_s\right)^2} \cdot \frac{\partial R_{ij}}{\partial \alpha}+e^{-\eta \left(R_{ij}-R_s\right)^2} \cdot \frac{\partial f_{\mathrm c}(R_{ij})}{\partial \alpha}\right] \end{align} \]

A good starting point is to first try \(R_s=0\).

Radial Function Type 4

The purpose of this symmetry function is to provide a more systematic way of describing the atomic environments by using a cosine function with increasing frequency determined by \(\eta\).

\[ \begin{align} G^{4}_{i} = \sum_{j} \cos{(\eta\cdot R_{ij})} \cdot f_{\mathrm{c}}\left(R_{ij}\right) \quad . \end{align} \]

A disadvantage is the possible cancellation of positive and negative constributions from different neighboring atoms. This symmetry function is rarely used.

Radial Spin-dependent Function Type 21

Spin-dependent atom-centered symmetry functions can distinguish different collinear atomic spin arrangements for the construction of magnetic high-dimensional neural network potentials. To represent the magnetic order, the atomic spin coordinate,

\[ \begin{align} s_i=\begin{cases}0&\mathrm{for}\ |{M_S}_i|<M_S^\mathrm{thres}\\\mathrm{sgn}({M_S}_i)&\mathrm{otherwise}\end{cases}\ , \end{align} \]

with

\[ \begin{align} {M_S}_i=\frac{\hbar}{2}\left(n_i^\uparrow-n_i^\downarrow\right)\ , \end{align} \]

is used. \(M_S\) equals the difference between the numbers of spin-up \(n_i^\uparrow\) and spin-down electrons \(n_i^\downarrow\) at an atom site \(i\) multiplied by the spin of an electron \(\tfrac{1}{2}\,\hbar\). The threshold \(M_S^\mathrm{thres}\) is set to \(0.25\,\hbar\). If the atomic spin is not smaller than \(M_S^\mathrm{thres}\), the atomic spin coordinate will be the sign of \(M_S\), i.e., \(-1\) or 1.

The radial spin-dependent symmetry functions are based on radial symmetry function type 2, but contain in addition a radial spin augmentation function \(M^\mathrm{x}\),

\[ \begin{align} G^{21-23}_{i}=\sum_jM^\mathrm{x}(s_i,s_j)\cdot\mathrm{e}^{-\eta R_{ij}^2}\cdot f_\mathrm{c}\left(R_{ij}\right)\ . \end{align} \]

The radial spin-dependent symmetry function type 21 uses the spin augmentation function

\[ \begin{align} M^\mathrm{0^*}(s_i,s_j)=1-\left|s_is_j\right|\ . \end{align} \]

This spin augmentation function represents only non-magnetic interactions.

Radial Spin-dependent Function Type 22

The radial spin-dependent symmetry function type 22 uses the spin augmentation function

\[ \begin{align} M^\mathrm{+}(s_i,s_j)=\tfrac{1}{2}\left|s_is_j\right|\cdot\left|s_i+s_j\right|\ . \end{align} \]

This spin augmentation function represents ferromagnetic interactions between a magnetic central atom and a magnetic neighbor atom.

Radial Spin-dependent Function Type 23

The radial spin-dependent symmetry function type 23 uses the spin augmentation function

\[ \begin{align} M^\mathrm{-}(s_i,s_j)=\tfrac{1}{2}\left|s_is_j\right|\cdot\left|s_i-s_j\right|\ . \end{align} \]

This spin augmentation function represents antiferromagnetic interactions between a magnetic central atom and a magnetic neighbor atom.

Angular Symmetry Functions

Angular Function Type 3

The angular terms are constructed for all triplets of atoms by summing the cosine values of the angles \(\theta_{ijk}=\frac{\bf{R}_{ij}\cdot\bf{R}_{ik}}{R_{ij} R_{ik}}\) centered at atom \(i\), with \(\bf{R}_{ij}=\bf{R}_i-\bf{R}_j\),

\[ \begin{align} G^{3}_{i} &=2^{1-\zeta}\sum_{j}\sum_k\left[ \left(1+\lambda \cdot\cos{\theta_{ijk}}\right)^{\zeta} \cdot e^{-\eta\left(R_{ij}^2+R_{ik}^2+R_{jk}^2\right)}\right. \notag \\ & \cdot \left.f_{\mathrm{c}}\left(R_{ij}\right) \cdot f_{\mathrm{c}}\left(R_{ik}\right) \cdot f_{\mathrm{c}}\left(R_{jk}\right)\right] \end{align} \]

\(\lambda\) can only have the values +1 and -1. The derivative dsfuncdxyz with respect to a coordinate \(\alpha\) is

\[ \begin{align} \frac{\partial G^3_i}{\partial \alpha}=2^{1-\zeta}\sum_{j}\sum_k %\left[ \frac{\partial \left(1+\lambda\cdot\cos{\theta_{ijk}}\right)^{\zeta}}{\partial \alpha} \cdot e^{-\eta\left(R_{ij}^2+R_{ik}^2+R_{jk}^2\right)} \cdot f_{\mathrm{c}}\left(R_{ij}\right)\cdot f_{\mathrm{c}% }\left(R_{ik}\right)\cdot f_{\mathrm{c}}\left(R_{jk}\right) \nonumber \\ + \frac{\partial e^{-\eta\left(R_{ij}^2+R_{ik}^2+R_{jk}^2\right)} }{\partial \alpha} \cdot \left(1+\lambda\cdot\cos{\theta_{ijk}}\right)^{\zeta} \cdot f_{\mathrm{c}}\left(R_{ij}\right)\cdot f_{\mathrm{c}% }\left(R_{ik}\right)\cdot f_{\mathrm{c}}\left(R_{jk}\right) \nonumber \\ + \frac{\partial f_{\mathrm{c}}\left(R_{ij}\right)}{\partial \alpha} \cdot \left(1+\lambda\cdot\cos{\theta_{ijk}}\right)^{\zeta} \cdot e^{-\eta\left(R_{ij}^2+R_{jk}^2+R_{jk}^2\right)} \cdot f_{\mathrm{c} }\left(R_{ik}\right)\cdot f_{\mathrm{c}}\left(R_{jk}\right) \nonumber \\ + \frac{\partial f_{\mathrm{c}}\left(R_{ik}\right)}{\partial \alpha} \cdot \left(1+\lambda\cdot\cos{\theta_{ijk}}\right)^{\zeta} \cdot e^{-\eta\left(R_{ij}^2+R_{ik}^2+R_{jk}^2\right)} \cdot f_{\mathrm{c} }\left(R_{ij}\right)\cdot f_{\mathrm{c}}\left(R_{jk}\right) \nonumber \\ + \frac{\partial f_{\mathrm{c}}\left(R_{jk}\right)}{\partial \alpha} \cdot \left(1+\lambda\cdot\cos{\theta_{ijk}}\right)^{\zeta} \cdot e^{-\eta\left(R_{ij}^2+R_{ik}^2+R_{jk}^2\right)} \cdot f_{\mathrm{c} }\left(R_{ij}\right)\cdot f_{\mathrm{c}}\left(R_{ik}\right) %\right] \end{align} \]

Angular Function Type 7

This symmetry function is currently not implemented.

Angular Function Type 8

This is an angular function with maxima at arbitrary angles, which is symmetric with respect to \(\alpha=\pi\) and \(\alpha=2\pi\).

\[ \begin{align} G^{8}_{i} &=\left[ e^{-\eta\cdot\left(\theta_{ijk}-\theta_{shift}\right)^2} +e^{-\eta\cdot\left(\theta_{ijk}-\left(360^\circ-\theta_{shift} \right)\right)^2} +e^{-\eta\cdot\left(\theta_{ijk}+\theta_{shift}\right)^2} +e^{-\eta\cdot\left(\theta_{ijk}-\left(360+\theta_{shift}\right)\right)^2} \right] \notag \\ & \cdot f_c\left(R_{ij}\right) \cdot f_c\left(R_{ik}\right) \cdot f_c\left(R_{jk}\right) \end{align} \]

Angular Function Type 9

This symmetry function is closely related to angular symmetry function 3, but is missing the terms depending on \(R_{jk}\).

\[ \begin{align} G^{9}_{i} = 2^{1-\zeta}\sum_{j}\sum_k\left[ \left(1+\lambda\cdot\cos{\theta_{ijk}}\right)^{\zeta} \cdot e^{-\eta\left(R_{ij}^2+R_{ik}^2\right)} \cdot f_{\mathrm{c}}\left(R_{ij}\right) \cdot f_{\mathrm{c}}\left(R_{ik}\right) \right] \end{align} \]

Due to the missing multiplication by \(f_{\mathrm{c} }\left(R_{jk}\right)\) also triples with distances between \(j\) and \(k\) being larger than the cutoff radius will be considered. Therefore the numerical values of this symmetry function will be generally larger.

Angular Spin-dependent Function Type 24

The angular spin-dependent symmetry functions are based on angular symmetry function type 3, but contain in addition an angular spin augmentation function \(M^\mathrm{xx}\),

\[ \begin{align} G^{24-31}_{i}=2^{-\zeta}\sum_j\sum_{k\neq j}M^\mathrm{xx}(s_i,s_j,s_k)\cdot\left[1+\lambda\cos\left(\theta_{ijk}\right)\right]^\zeta\cdot\mathrm{e}^{-\eta\left(R_{ij}^2+R_{ik}^2+R_{jk}^2\right)}\cdot f_\mathrm{c}\left(R_{ij}\right)\cdot f_\mathrm{c}\left(R_{ik}\right)\cdot f_\mathrm{c}\left(R_{jk}\right)\ . \end{align} \]

Note: The sum over \(j\) and \(k\neq j\) takes into account all contributions of a \(j\times k\) matrix except for diagonal contributions (self-interaction). Therefore, \(2^{-\zeta}\) is used as prefactor instead of \(2^{1-\zeta}\) which is employed for the sum over \(j\) and \(k>j\). In the latter case only the upper triangular (all entries above the main diagonal) are used and multiplied by an additional factor \(2\) of the prefactor because these are typically identical to the lower triangular. The only exception is the angular spin-dependent symmetry function type 27 which requires to take into account all contributions explicitly.

The angular spin-dependent symmetry function type 24 uses the spin augmentation function

\[ \begin{align} M^\mathrm{00^*}(s_i,s_j,s_k)=\left(1-\left|s_i\right|\right)+\left|s_i\right|\cdot\left(1-\left|s_j\right|\right)\cdot\left(1-\left|s_k\right|\right)\ . \end{align} \]

This spin augmentation function represents only non-magnetic interactions of the central atom with its neighbor atoms.

Angular Spin-dependent Function Type 25

The angular spin-dependent symmetry function type 25 uses the spin augmentation function

\[ \begin{align} M^\mathrm{++}(s_i,s_j,s_k)=\begin{cases}\tfrac{1}{2}\left|s_i\right|\cdot\left(\left|s_i+s_j+s_k\right|-1\right)&\hspace{-0.15cm}\begin{cases}\mathrm{for}\ s_{j}\neq0\land s_{k}\neq0\\ \mathrm{for}\ s_{j}=0\land s_{k}=0\end{cases}\\ \tfrac{1}{2}\left|s_i\right|\cdot\left|s_i+s_j+s_k\right|&\hspace{-0.15cm}\mathrm{otherwise}\end{cases}\ . \end{align} \]

This spin augmentation function represents interactions between three magnetic atoms of the same atomic spin coordinate. In addition, it describes ferromagnetic interactions between a magnetic central atom, one magnetic neighbor atom, and one non-magnetic neighbor atom.

Angular Spin-dependent Function Type 26

The angular spin-dependent symmetry function type 26 uses the spin augmentation function

\[ \begin{align} M^\mathrm{--}(s_i,s_j,s_k)=\begin{cases}\tfrac{1}{2}\left|s_i\right|\cdot\left(\left|s_i-s_j-s_k\right|-1\right)&\hspace{-0.15cm}\begin{cases}\mathrm{for}\ s_{j}\neq0\land s_{k}\neq0\\ \mathrm{for}\ s_{j}=0\land s_{k}=0\end{cases}\\ \tfrac{1}{2}\left|s_i\right|\cdot\left|s_i-s_j-s_k\right|&\hspace{-0.15cm}\mathrm{otherwise}\end{cases}\ . \end{align} \]

This spin augmentation function represents interactions between three magnetic atoms, whereby the atomic spin coordinate of the central atom is different to those of the neighbor atoms. In addition, it describes antiferromagnetic interactions between a magnetic central atom, one magnetic neighbor atom, and one non-magnetic neighbor atom.

Angular Spin-dependent Function Type 27

The angular spin-dependent symmetry function type 27 uses the spin augmentation function

\[ \begin{align} M^\mathrm{+-}(s_i,s_j,s_k)=\left|s_is_js_k\right|\cdot\left(\left|s_i+s_j-s_k\right|-1\right)\ . \end{align} \]

This spin augmentation function represents interactions between three magnetic atoms, whereby the two neighbor atoms exhibit different atomic spin coordinates.

Angular Spin-dependent Function Type 28

The angular spin-dependent symmetry function type 28 uses the spin augmentation function

\[ \begin{align} M^\mathrm{++2}(s_i,s_j,s_k)=\tfrac{1}{2}\left|s_i\right|\cdot\left(1-\left|s_js_k\right|\right)\cdot\left(\left|s_j\right|+\left|s_k\right|\right)\cdot\left|s_i+s_j+s_k\right|\ . \end{align} \]

This spin augmentation function represents ferromagnetic interactions between a magnetic central atom, one magnetic neighbor atom, and one non-magnetic neighbor atom.

Angular Spin-dependent Function Type 29

The angular spin-dependent symmetry function type 29 uses the spin augmentation function

\[ \begin{align} M^\mathrm{++3}(s_i,s_j,s_k)=\tfrac{1}{2}\left|s_is_js_k\right|\cdot\left(\left|s_i+s_j+s_k\right|-1\right)\ . \end{align} \]

This spin augmentation function represents interactions between three atoms of the same atomic spin coordinate.

Angular Spin-dependent Function Type 30

The angular spin-dependent symmetry function type 30 uses the spin augmentation function

\[ \begin{align} M^\mathrm{--2}(s_i,s_j,s_k)=\tfrac{1}{2}\left|s_i\right|\cdot\left(1-\left|s_js_k\right|\right)\cdot\left(\left|s_j\right|+\left|s_k\right|\right)\cdot\left|s_i-s_j-s_k\right|\ . \end{align} \]

This spin augmentation function represents antiferromagnetic interactions between a magnetic central atom, one magnetic neighbor atom, and one non-magnetic neighbor atom.

Angular Spin-dependent Function Type 31

The angular spin-dependent symmetry function type 31 uses the spin augmentation function

\[ \begin{align} M^\mathrm{--3}(s_i,s_j,s_k)=\tfrac{1}{2}\left|s_is_js_k\right|\cdot\left(\left|s_i-s_j-s_k\right|-1\right)\ . \end{align} \]

This spin augmentation function represents interactions between three magnetic atoms, whereby the atomic spin coordinate of the central atom is different to those of the neighbor atoms.

Other Symmetry Functions

Function Type 5

This symmetry functions is either the \(x\) (\(\eta=1.0\)), \(y\) (\(\eta=2.0\)) or \(z\) (\(\eta=3.0\)) Cartesian coordinate of an atom \(i\). It is included for testing and debugging purposes only. It cannot be used to construct potential energy surfaces.

\[\begin{aligned} G^5_i=x_i \lor G^5_i=y_i \lor G^5_i=z_i\end{aligned}\]

Function Type 6

This symmetry functions is only applicable to diatomic molecules and corresponds to the interatomic distance. It is included for testing and debugging purposes only. It cannot be used to construct potential energy surfaces.

\[\begin{aligned} G^6_i=R_{ij}\end{aligned}\]

Technical Notes on the Implementation of the Symmetry Functions

For each symmetry function specified in the input.nn file there is one symmetry function value per atom in case of a monocomponent system. In case of a general multicomponent system also all cross-terms have to be specified. The number of cross-terms is different for the radial and the angular symmetry functions. For a radial function there is one function value for each element in the system. For an angular function all permutations have to be considered, i.e., for a three-component system with the elements A, B and C we have the angular functions describing the angles A-A-A, A-B-B, A-C-C, A-A-B=A-B-A, A-A-C=A-C-A, and A-B-C=A-C-B, where the first letter indicates the central atom of the angle. For a system containing \(N_\mathrm{elem}\) elements we have

\[ \begin{align} N_\mathrm{sym} &= N_\mathrm{elem}\cdot N_\mathrm{rad} + N_\mathrm{elem}\cdot N_\mathrm{ang} + N_\mathrm{ang}\cdot \sum_{i=1}^{N_\mathrm{elem}-1}i\\ N_\mathrm{sym} &= N_\mathrm{elem} \cdot N_\mathrm{rad} + N_\mathrm{ang}\cdot \sum_{i=1}^{N_\mathrm{elem}}i \end{align} \]

symmetry functions. In RuNNer the symmetry functions are stored in the array symfunction(max_num_atoms, num_funcvalues), where num_funcvalues specifies \(N_\mathrm{sym}\). For a given structure the functions are calculated in the subroutine calconefunction.f90. The calculation of the symmetry functions is done block-wise for a set of npoints structures, where npoints < nblock, and nblock is the maximum number of structures being stored in the memory at the same time. The array storing the symmetry functions for a full set of structures is called symfunction_list(nblock, max_num_atoms, num_funcvalues) and is constructed in calcfunctions.f90 from the symmetry function sets of the individual structures.

Storing the calculated symmetry function values in the right entry of the symfunction array (field 2) is controlled by an index variable nindex. The value of nindex is obtained from the array nnindex(num_functions,nelem,nelem) by inserting the element index of the two neighboring atoms in the fields two and three for the angular functions. For the radial functions field three is not used, because there is only one neighbor. Please note that in case of angular functions the order of the two elements does not matter, nnindex will yield the same value for nindex, because nnindex(,A,B)=nnindex(,B,A). The array nnindex is determined in subroutine getnnindex.f90 called by getstructurefunctions.f90.

Scaling of the Symmetry Functions

For numerical reasons it may be advantageous to scale and/or center the range of symmetry function values. This is done with the keywords lscalesym and lcentersym.

If the symmetry functions are just scaled (lscalesym=.true. and lcentersym=.false.) then the range of values for each symmetry function is between 0 and 1 and the values are calculated according to

\[ \begin{align} G = \frac{G-G_{\rm min}}{G_{\rm max}-G_{\rm min}} \end{align} \]

If the symmetry functions are just centered (lscalesym=.false. and lcentersym=.true.) then the average value of the centered symmetry function values is zero.

\[ \begin{align} G=G-G_{\rm average} \end{align} \]

If the symmetry functions are centered and scaled (lscalesym=.true. and lcentersym=.true.) then the symmetry functions are

\[ \begin{align} G=\frac{G-G_{\rm average}}{G_{\rm max}-G_{\rm min}} \end{align} \]

and there are positive and negative values (the range is not -1 to 1!)

If the symmetry functions are scaled, also the derivatives dsfuncdxyz have to be scaled to keep the gradients consistent with the original function.

\[ \begin{align} \frac{\partial G}{\partial \alpha} = \frac{\partial G}{\partial \alpha}\cdot\frac{1}{G_{\rm max}-G_{\rm min}} \end{align} \]

Optimization of the Weight Parameters

Calculation of the Weight Derivatives for the Short Range NN

For the optimization of the weight parameters we need the derivatives of the NN function with respect to the weights \(b_i^j\) and \(a_{ij}^{kl}\). The derivatives with respect to the bias weights for the example NN are:

\[\begin{align} \frac{\partial E_m}{\partial b_m^4} &=\frac{\partial f^4\left(x_m^4\right)}{\partial b_m^4} \\ % \frac{\partial E_m}{\partial b_l^3} &=\frac{\partial f^4\left(x_m^4\right)}{\partial b_l^3} \cdot a_{lm}^{34} \frac{\partial f^3\left(x_l^3\right)}{\partial b_l^3} \\ &=\frac{\partial E_m}{\partial b_m^4}\cdot a_{lm}^{34} \frac{\partial f^3\left(x_l^3\right)}{\partial b_l^3} \\ % \frac{\partial E_m}{\partial b_k^2} &=\frac{\partial f^4\left(x_m^4\right)}{\partial b_k^2} \cdot \sum_{l=1}^5 a_{lm}^{34} \frac{\partial f^3\left(x_l^3\right)}{\partial b_l^3} \cdot a_{kl}^{23} \frac{\partial f^2(x_k^2)}{\partial b_k^2}\\ &=\sum_{l=1}^5\frac{\partial E_m }{\partial b_l^3}\cdot a_{kl}^{23} \frac{\partial f^2(x_k^2)}{\partial b_k^2} \\ % \frac{\partial E_m}{\partial b_j^1} &=\frac{\partial f^4\left(x_m^4\right)}{\partial b_j^1} \cdot \sum_{l=1}^5 a_{lm}^{34} \frac{\partial f^3\left(x_l^3\right)}{\partial b_l^3} \cdot \sum_{k=1}^4 a_{kl}^{23} \frac{\partial f^2(x_k^2)}{\partial b_k^2} a_{jk}^{12} \frac{\partial f^1(x_j^1)}{\partial b_j^1} \\ &=\sum_{l=1}^5 \frac{\partial E_m}{\partial b_l^3}\sum_{k=1}^4 a_{kl}^{23} \frac{\partial f^2(x_k^2)}{\partial b_k^2} a_{jk}^{12} \frac{\partial f^1(x_j^1)}{\partial b_j^1} \\ &= \sum_{k=1}^4 \frac{\partial E_m}{\partial b_k^2}a_{jk}^{12} \frac{\partial f^1(x_j^1)}{\partial b_j^1} \end{align} \]

Consequently, the derivatives can be calculated recursively.

The derivatives with respect to the connecting weights for the example NN are:

\[ \begin{align} \frac{\partial E_m}{\partial a_{lm}^{34}} &=\frac{\partial f^4\left(x_m^4\right)}{\partial a_{lm}^{34}}\cdot y_l^3 \\ % \frac{\partial E_m}{\partial a_{kl}^{23}} &=\frac{\partial f^4\left(x_m^4\right)}{\partial a_{kl}^{23}} \cdot a_{lm}^{34} \frac{\partial f^3\left(x_l^3\right)}{\partial a_{kl}^{23}} \cdot y_k^2 \\ % \frac{\partial E_m}{\partial a_{jk}^{12}} &=\frac{\partial f^4\left(x_m^4\right)}{\partial a_{jk}^{12}} \sum_{l=1}^5 a_{lm}^{34} \frac{\partial f^3\left(x_l^3\right)}{\partial a_{jk}^{12}} \cdot a_{kl}^{23} \frac{\partial f^2\left(x_k^2\right)}{\partial a_{jk}^{12}} y_j^1 \\ % \frac{\partial E_m}{\partial a_{ij}^{01}} &=\frac{\partial f^4\left(x_m^4\right)}{\partial a_{ij}^{01}} \sum_{l=1}^5 a_{lm}^{34} \frac{\partial f^3\left(x_l^3\right)}{\partial a_{j}^{01}} \cdot \sum_{k=1}^4 a_{kl}^{23} \frac{\partial f^2\left(x_k^2\right)}{\partial a_{ij}^{01}} a_{jk}^{12} \frac{\partial f^1\left(x_j^1\right)}{\partial a_{ij}^{01}} G_i \end{align} \]

Comparing the bias weight derivatives and the connecting weight derivatives yields:

\[ \begin{align} \frac{\partial E_m}{\partial a_{lm}^{34}} &= \frac{\partial E_m}{\partial b_m^4} \cdot y_l^3 \\ % \frac{\partial E_m}{\partial a_{kl}^{23}} &= \frac{\partial E_m}{\partial b_l^3} \cdot y_k^2 \\ % \frac{\partial E_m}{\partial a_{jk}^{12}} &= \frac{\partial E_m}{\partial b_k^2} \cdot y_j^1 \\ % \frac{\partial E_m}{\partial a_{ij}^{01}} &= \frac{\partial E_m}{\partial b_j^1} \cdot G_i \end{align} \]

Consequently, if we have calculated the bias weight derivatives, it is straightforward to calculate the derivatives with respect to the connecting weights. The derivative with respect to the connecting weights is simply the derivative with respect to the bias weight in the target node times the node value of the node in the previous layer.

The weight derivatives of the short range energy are calculated in subroutine getdeshortdw.f90, and inside this subroutine for each atom the contribution is calculated by subroutine getonedeshortdw.f90. The derivatives are called deshortdw(nelem, num_weightsshort, nodes_short(num_layersshort)).

The target quantity is the short range energy only, if also an electrostatic energy contribution is used. The short range energy of the training points can be calculated by subtracting the electrostatic energy calculated from the electrostatic energy arising from the atomic charges of the training points.

Calculation of the Weight Derivatives for the Charge NN

For the weight optimization the derivatives of the total energy with respect to the weight parameters are required. Obtaining the derivatives with respect to the weights in the subnets determining the short range interaction is a standard procedure and they are given only by the NN topology. The derivatives with respect to the weights determining the electrostatic energy requires the calculation of the derivative of the Ewald energy with respect to the weight parameters \(w_{ik}^{k}\)' of the atomic charges subnets. The Ewald energy depends on these weights indirectly via the atomic charges \(Q\). We thus have

\[ \begin{align} \frac{\partial E_{\rm tot}}{\partial w'} = \frac{\partial E_{\rm Ewald}}{\partial w'} = \sum_i\frac{\partial E_{\rm Ewald}}{\partial Q_i} \cdot\frac{\partial Q_i}{\partial w'} \end{align} \]

The derivatives \(\frac{\partial Q_i}{\partial w'}\) of the atomic charges with respect to the weights \(w'\) are calculated in the same way as the derivative \(\frac{\partial E_i}{\partial w}\) of the short range atomic energy contributions and are given by the NN subnet topology. The derivatives of the Ewald energy with respect to the atomic charges are

\[ \begin{align} \frac{\partial E_{\rm Ewald}}{\partial Q_i} = \frac{\partial E_{\rm real}}{\partial Q_i} + \frac{\partial E_{\rm recip}}{\partial Q_i} + \frac{\partial E_{\rm self}}{\partial Q_i} \end{align} \]

with

\[ \begin{align} \frac{\partial E_{\rm real}}{\partial Q_i} &= \sum_{j\ne i}^N \frac{{\rm erfc} \left(\alpha R_{ij}\right)}{R_{ij}} \frac{1}{4\pi \epsilon_0}Q_j \quad, \\ % \frac{\partial E_{\rm recip}}{\partial Q_i} &= \frac{1}{2V\epsilon_0}\sum_{k\ne 0}^\infty \frac{{\rm exp}\left(-\frac{k^2}{4\alpha^2}\right)}{k^2} \notag \\ % & \cdot \Bigg[ 2\cdot \left( \sum_{j=1}^N Q_j\:{\rm cos} \left(\mathbf{k} \mathbf{R}_j\right) \right) \cdot{\rm cos}\left(\mathbf{k}\mathbf{R}_i\right) \notag \\ % & + 2\cdot \left( \sum_{j=1}^N Q_j\:{\rm sin}\left(\mathbf{k} \mathbf{R}_j\right) \right) \cdot{\rm sin}\left(\mathbf{k}\mathbf{R}_i\right) \Bigg] \end{align} \]

and

\[ \begin{align} \frac{\partial E_{\rm self}}{\partial Q_i} = \frac{1}{2\pi\epsilon_0}\frac{\alpha}{\sqrt{\pi}}Q_i \end{align} \]

The atomic charges have to be fitted under the constraint of overall charge neutrality. This constraint has to be taken into account in the weight updates.

\[\begin{align} \sum_i Q_i=0 \end{align} \]

The optimum set of weight parameters is obtained if the error function has a minimum, i.e., when the derivative of the error function with respect to all weight parameters is zero. Since the error function, like the sum of charges, has to be zero, the sum of charges for each training point can simply be added to the error function of that point.

\[ \begin{align} F(j)=E_{\rm DFT}(j)- E_{\rm NN}(j) + \sum_i Q_i(j) = min \end{align} \]

Consequently we need the derivative of the sum of charges with respect to the weights.

\[ \begin{align} \frac{\partial}{\partial w_{ij}^k}\sum_i Q_i \end{align} \]

The Kalman Filter

The EKF recursions are:

\[ \begin{align} \mathbf{K}(n) &= \lambda^{-1}\mathbf{P}(n-1)\mathbf{J}(n)\left[ \mathbf{I}+\lambda^{-1}\mathbf{J}^T(n)\mathbf{P}(n-1)\mathbf{J}(n) \right]^{-1} \notag \\ % &= \frac{\mathbf{P}(n-1)\mathbf{J}(n)}{ \lambda + \mathbf{J}^T(n)\mathbf{P}(n-1)\mathbf{J}(n) } \notag \\ % \mathbf{w}(n) &= \mathbf{w}(n-1)+\mathbf{K}(n)\left[ \mathbf{t}(n)-\mathbf{y}(\mathbf{w}(n-1),\mathbf{x}(n)) \right] \notag \\ \mathbf{P}(n) &= \lambda^{-1}\mathbf{P}(n-1) -\lambda^{-1}\mathbf{K}(n)\mathbf{J}^T(n)\mathbf{P}(n-1) \notag \\ % &= \lambda^{-1}\left[\mathbf{P}(n-1) - \mathbf{K}(n)\mathbf{J}^T(n)\mathbf{P}(n-1)\right] + \mathbf{Q}(n-1) \end{align} \]
\[ \begin{align} \mathbf{Q}(n) &= q(t)\mathcal{I} \\ q(t) &= max(q_{0}e^{-t/\tau_{q}},q_{min})\\ \lambda(k) &= \lambda_0\lambda(k-1)+1-\lambda_0 \end{align} \]

The update of the weight parameters according to the Kalman filter is done in subroutine iupdatekalman.f90.

The following notation is used in the code

  • \(\mathbf{w}=\) weights
  • \(\mathbf{K}=\) kalgainmat
  • \(\mathbf{J}=\) dedw
  • \(\mathbf{P}=\) corrmatrix?
  • \(\mathbf{Q}=\) noise matrix
  • \(\lambda=\) kalmanlambda

For the update of the Kalman gain matrix we use\ coh\(=\mathbf{P}(n-1)\mathbf{J}(n)\)\ coh is calculated using the BLAS library dspmv, which calculates \(y=\alpha\cdot A\cdot x +\beta \cdot y\). For this we use

  • \(\alpha=1\)
  • \(\beta=0\)
  • \(A=\mathbf{P}(n-1)\)
  • \(x=\mathbf{J}(n)\)

We further define

\[\begin{align} \texttt{inverse}=\frac{1}{\lambda + \mathbf{J}^T(n)\mathbf{P}(n-1)\mathbf{J}(n)} \end{align} \]

We then obtain

\[ \begin{align} \mathbf{K}(n)=\texttt{inverse}\cdot \texttt{coh} \end{align} \]

For the update of the symmetric matrix \(\mathbf{P}\) we can make use of

\[ \begin{align} \left(\mathbf{P}(n-1)\mathbf{J}(n)\right)^{T}=\mathbf{J}^T(n) \mathbf{P}^T(n-1)=\mathbf{J}^T(n) \mathbf{P}(n-1) \end{align} \]

\(\mathbf{P}(n)\) is calculated using the BLAS library dspr, which calculates \(A=\alpha\cdot x \cdot x^T + A\) We use

  • \(\alpha=-\lambda^{-1}\cdot \texttt{inverse}\)
  • \(x=\)coh
  • \(A=\lambda^{-1}\mathbf{P}(n-1)\)

Using Forces for the Weight Optimization

The forces contain valuable local information on the potential energy surface. This information can be used for the optimization of the weight parameters and RuNNer can use both, total energies and atomic forces from reference data to optimize the weight parameters.

The NN forces in RuNNer contain a short range and an electrostatic contribution. Unfortunately, these contributions cannot be separated easily in the DFT training data, because in the NN the charges are environment-dependent, and therefore the force contains the derivative of the charges with respect to the Cartesian coordinates. For the training set this information is not available. Unlike in case of the total energies, therefore the short range and electrostatic components cannot be separated and have to be fitted simultaneously. It is not possible to simply subtract the electrostatic forces to get the short range forces, because the derivative of the electrostatic energy expression assumes constant charges.

In order to use the forces for the weight optimization, the derivative of the forces with respect to the weight parameters of the short range and the electrostatic NN has to be known.

\[ \begin{align} \frac{\partial F_{\alpha}}{\partial w} =\frac{\partial F_{\alpha,short}}{\partial w} +\frac{\partial F_{\alpha,elec}}{\partial w} \end{align} \]

\(w\) can either be a bias weight \(b_i^j\) or a connection weight \(a_{ij}^{kl}\).

Optimization Using the Short Range Force

The short range force is given by

\[ \begin{align} F_{\alpha} = -\sum_i\sum_j\frac{\partial E_i}{\partial G_i^j} \frac{\partial G_i^j}{\partial \alpha} \end{align} \]

The derivative with respect to a short range weight \(w\) is then

\[ \begin{align} \frac{\partial F_{\alpha}}{\partial w} &= -\sum_i\sum_j \frac{\partial}{\partial w}\left( \frac{\partial E_i}{\partial G_i^j} \frac{\partial G_i^j}{\partial \alpha} \right) \notag \\ &= -\sum_i\sum_j\left( \frac{\partial G_i^j}{\partial \alpha} \frac{\partial }{\partial w} \frac{\partial E_i}{\partial G_i^j} +\frac{\partial E_i}{\partial G_i^j} \frac{\partial}{\partial w} \frac{\partial G_i^j}{\partial \alpha } \right) \\ &= -\sum_i\sum_j \frac{\partial G_i^j}{\partial \alpha} \frac{\partial}{\partial w} \frac{\partial E_i}{\partial G_i^j} \end{align} \]

because \(\frac{\partial}{\partial w}\frac{\partial G_i^j}{\partial \alpha}=0\). The derivative \(\frac{\partial E_i}{\partial G_i^j}\) is also calculated in subroutine getshortforces.f90 (deshortdsfunc) for the calculation of the short range forces, but it needs to be recalculated for each new set of weights. It has been shown above that the dependence on the weight parameters for an example 2-3-4-5-1 NN is

\[ \begin{align} \frac{\partial E}{\partial G_i} &= \frac{\partial f^4(x_1^4)}{\partial G_i} \cdot \sum_{l=1}^5 a_{l1}^{34} \cdot\frac{\partial f^3(x_l^3)}{\partial G_i} \cdot\sum_{k=1}^4 a_{kl}^{23} \cdot\frac{\partial f^2(x_k^2)}{\partial G_i} \cdot\sum_{j=1}^3 a_{jk}^{12} \cdot\frac{\partial f^1(x_j^1)}{\partial G_i} a_{ij}^{01} \end{align} \]

The values at the nodes before the application of the activation functions depend on the weights:

\[\begin{align} x_1^4&=x_1^4\left(b_1^4,a_{l1}^{34},b_l^3,a_{kl}^{23},b_k^2,a_{jk}^{12},b_j^1,a_{ij}^{01}\right) \\ x_l^3&=x_l^3\left(b_l^3,a_{kl}^{23},b_k^2,a_{jk}^{12},b_j^1,a_{ij}^{01}\right) \\ x_k^2&=x_k^2\left(b_k^2,a_{jk}^{12},b_j^1,a_{ij}^{01}\right) \\ x_j^1&=x_k^2\left(b_j^1,a_{ij}^{01}\right) \end{align} \]

Therefore also the derivatives of the activation functions \(\frac{\partial f(x)}{\partial G_i}\) at point \(x\) depend on these weight parameters. We thus need the derivative of the derivative of the activation functions.

The derivatives of the activation functions with respect to the \(G_i\) have been calculated in subroutine getdnodes_values.f90 and are called dnodes_values( num_layersshort, maxnodes_short). The values at the nodes \(x_{\mu}^{\nu}\) before the application of the activation functions \(f\) are called nodes_sum and are calculated in subroutine calconeatom.f90.

Another approach to calculate \(\frac{\partial }{\partial w}\frac{\partial E_i}{\partial G_i^j}\) might be better: It is also possible to rewrite the derivative of the forces with respect to the weights as

\[ \begin{align} \frac{\partial F_{\alpha}}{\partial w} &=-\sum_i\sum_j \frac{\partial G_i^j}{\partial \alpha} \frac{\partial }{\partial w} \frac{\partial E_i}{\partial G_i^j}\\ &=-\sum_i\sum_j \frac{\partial G_i^j}{\partial \alpha} \frac{\partial}{\partial G_i^j} \frac{\partial E_i}{\partial w} \end{align} \]

\(\frac{\partial F_{\alpha}}{\partial w}\) is called dfshortdw( num_weightsshort) in the code. \(\frac{\partial G_i^j}{\partial \alpha}\) is dsfuncdxyz (num_funcvalues, max_num_atoms, max_num_atoms, 3) in the code. The term \(\frac{\partial E_i}{\partial w}\) (deshortdw) has been calculated before in subroutine getonedeshortdw.f90 as dedw( num_weightsshort, nodes_short(num_layersshort)) to optimize the short range weight parameters using the short range energies. Its values cannot be used here, because the weight parameters have changed. Still, we can reuse the Kalman equations for the bias weight derivatives of \(E_m\):

\[ \begin{align} {\color{blue}\frac{\partial E_m}{\partial b_m^4}} &=\frac{\partial f^4\left(x_m^4\right)}{\partial b_m^4} \\ % {\color{blue}\frac{\partial E_m}{\partial b_l^3}} &=\frac{\partial E_m}{\partial b_m^4}\cdot a_{lm}^{34} \frac{\partial f^3\left(x_l^3\right)}{\partial b_l^3} \\ % {\color{blue}\frac{\partial E_m}{\partial b_k^2}} &=\sum_{l=1}^5\frac{\partial E_m }{\partial b_l^3}\cdot a_{kl}^{23} \frac{\partial f^2(x_k^2)}{\partial b_k^2} \\ % {\color{blue}\frac{\partial E_m}{\partial b_j^1}} &=\sum_{k=1}^4 \frac{\partial E_m}{\partial b_k^2}a_{jk}^{12} \frac{\partial f^1(x_j^1)}{\partial b_j^1} \end{align} \]

Then, the derivatives with respect to the input nodes for the bias weight derivatives are:

\[ \begin{align} {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_{m}^{4}}} &=\frac{\partial}{\partial G_i} \frac{\partial f^4\left(x_m^4\right)}{\partial b_m^4} ={\color{violet}\frac{\partial^2 f^4\left(x_m^4\right)} {\partial G_i \partial b_m^4}}\cdot {\color{green2}\frac{\partial }{\partial G_i}x_m^4} \\ % {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_l^3}} &= \frac{\partial}{\partial G_i}\left( \frac{\partial E_m}{\partial b_m^4}\cdot a_{lm}^{34} \frac{\partial f^3(x_l^3)}{\partial b_l^3} \right) \\ &= a_{lm}^{34}\cdot \frac{\partial f^3(x_l^3)}{\partial b_l^3} {\color{red}\frac{\partial }{\partial G_i} \frac{\partial E_m}{\partial b_m^4}} +{\color{blue}\frac{\partial E_m}{\partial b_m^4}}\cdot a_{lm}^{34} {\color{violet}\frac{\partial^2 f^3(x_l^3)}{\partial G_i \partial b_l^3}} {\color{green2}\frac{\partial}{\partial G_i}x_l^3} \\ % {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_k^2}} &= \frac{\partial}{\partial G_i}\left( \sum_{l=1}^5 \frac{\partial E_m}{\partial b_l^3}\cdot a_{kl}^{23} \frac{\partial f^2(x_k^2)}{\partial b_k^2} \right)\\ &=\sum_{l=1}^5\left[ a_{kl}^{23}\frac{\partial f^2(x_k^2)}{\partial b_k^2} {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_l^3}} + {\color{blue}\frac{\partial E_m}{\partial b_l^3}}\cdot a_{kl}^{23} {\color{violet}\frac{\partial^2 f^2(x_k^2)}{\partial G_i \partial b_k^2}} {\color{green2}\frac{\partial}{\partial G_i}x_k^2} \right]\\ % {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_j^1}} &= \frac{\partial}{\partial G_i}\left( \sum_{k=1}^4 \frac{\partial E_m}{\partial b_k^2}a_{jk}^{12} \frac{\partial f^1(x_j^1)}{\partial b_j^1} \right)\\ &=\sum_{k=1}^4\left[ a_{jk}^{12}\frac{\partial f^1(x_j^1)}{\partial b_j^1} {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_k^2}} + {\color{blue}\frac{\partial E_m}{\partial b_k^2}}\cdot a_{jk}^{12} {\color{violet}\frac{\partial^2 f^1(x_j^1)}{\partial G_i \partial b_j^1}} {\color{green2}\frac{\partial}{\partial G_i}x_j^1} \right] \end{align} \]

The second derivatives \(\frac{\partial^2 f^{\nu}(x_{\mu}^{\nu})}{\partial G_i \partial b_{\mu}^{\nu}}\) are ddnodes_values( num_layersshort, maxnodes_short) in the code (calculated in subroutine getddnodes_values.f90). The derivatives \(\frac{\partial }{\partial G_i}\frac{\partial E_m}{\partial b_{\mu}^{\nu}}\) are called dedgdw (num_weightsshort, num_funcvalues, nodes_short (num_layersshort)) in the code. They are calculated in subroutine getdfshortdw.

\[ \begin{align} \frac{\partial E_m}{\partial a_{lm}^{34}} &=\frac{\partial E_m}{\partial b_m^4} \cdot y_l^3 \\ % \frac{\partial E_m}{\partial a_{kl}^{23}} &=\frac{\partial E_m}{\partial b_l^3} \cdot y_k^2 \\ % \frac{\partial E_m}{\partial a_{jk}^{12}} &= \frac{\partial E_m}{\partial b_k^2} \cdot y_j^1 \\ % \frac{\partial E_m}{\partial a_{ij}^{01}} &= \frac{\partial E_m}{\partial b_j^1} \cdot G_i \end{align} \]

For the derivatives with respect to the \(G_i\) we then obtain for the connecting weights

\[ \begin{align} \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial a_{lm}^{34}}&=&y_l^3 \cdot \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_{m}^{4}} +\frac{\partial E_m}{\partial b_m^4}\frac{\partial y_l^3}{\partial G_i}\\ &=&y_l^3 \cdot {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_{m}^{4}}} +{\color{blue}\frac{\partial E_m}{\partial b_m^4}}{\color{orange}\frac{\partial f^3(x_l^3)}{\partial G_i}}{\color{green2}\frac{\partial}{\partial G_i}x_l^3} \\ % \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial a_{kl}^{23}}&=&y_k^2 \cdot \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_l^3}+\frac{\partial E_m}{\partial b_l^3}\frac{\partial y_k^2}{\partial G_i}\\ &=&y_k^2 \cdot {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_l^3}}+{\color{blue}\frac{\partial E_m}{\partial b_l^3}}{\color{orange}\frac{\partial f^2(x_k^2)}{\partial G_i}}{\color{green2}\frac{\partial}{\partial G_i}x_k^2} \\ % \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial a_{jk}^{12}}&=&y_j^1 \cdot \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_k^2}+\frac{\partial E_m}{\partial b_k^2}\frac{\partial y_j^1}{\partial G_i}\\ &=&y_j^1 \cdot {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_k^2}}+{\color{blue}\frac{\partial E_m}{\partial b_k^2}}{\color{orange}\frac{\partial f^1(x_j^1)}{\partial G_i}}{\color{green2}\frac{\partial}{\partial G_i}x_j^1} \\ % \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial a_{hj}^{01}} %&=&G_i \cdot {\color{red}\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_j^1}}+{\color{blue}\frac{\partial E_m}{\partial b_j^1}} \nonumber \\ &=&\frac{\partial}{\partial G_i}\left( \frac{\partial E_m}{\partial b_j^1}\cdot G_h\right) \nonumber \\ &=& G_h\cdot \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_j^1} +\frac{\partial E_m}{\partial b_j^1}\frac{\partial}{\partial G_i}G_h \nonumber \\ &=& G_h\cdot \frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_j^1}+\delta_{hi}\frac{\partial E_m}{\partial b_j^1} \end{align} \]

Note that in the last equation \(\frac{\partial E_m}{\partial a_{ij}^{01}}\) has been replaced by \(\frac{\partial E_m}{\partial a_{01}^{hj}}\) to distinguish the indices properly (Thanks to Andi!). The terms \(\frac{\partial}{\partial G_i}\frac{\partial E_m}{\partial b_{\mu}^{\nu}}\) have been derived before. The derivatives \(\frac{\partial E_m}{\partial b_{\mu}^{\nu}}\), too. We still need an expression for \(\frac{\partial}{\partial G_i}x_{\mu}^{\nu}\) (dnodes_values_dg( num_layersshort, maxnodes_short, num_funcvalues)):

\[ \begin{align} {\color{green2}\frac{\partial}{\partial G_i}x_j^1}&=&\frac{\partial }{\partial G_i}\left(b_j^1 +\sum_{i=1}^2 a_{ij}^{01}G_i\right)=a_{ij}^{01}\\ % {\color{green2}\frac{\partial}{\partial G_i}x_k^2}&=&\frac{\partial }{\partial G_i}\left(b_k^2 +\sum_{j=1}^3 a_{jk}^{12}y_j^1 \right)=\frac{\partial}{\partial G_i}\sum_{j=1}^3 a_{jk}^{12}f^1(x_j^1)\\ &=&\sum_{j=1}^3 a_{jk}^{12}{\color{orange}\frac{\partial f^1(x_j^1)}{\partial G_i}}\cdot {\color{green2}\frac{\partial}{\partial G_i}x_j^1} \\ % {\color{green2}\frac{\partial}{\partial G_i}x_l^3}&=&\frac{\partial }{\partial G_i}\left(b_l^3 +\sum_{k=1}^4 a_{kl}^{23}y_k^2 \right)=\frac{\partial}{\partial G_i}\sum_{k=1}^4 a_{kl}^{23}\frac{\partial}{\partial G_i}f^2(x_k^2)\\ &=&\sum_{k=1}^4 a_{kl}^{23}{\color{orange}\frac{\partial f^2(x_k^2)}{\partial G_i}}\cdot {\color{green2}\frac{\partial}{\partial G_i}x_k^2} \\ % {\color{green2}\frac{\partial}{\partial G_i}x_m^4}&=&\frac{\partial }{\partial G_i}\left(b_m^4 +\sum_{l=1}^5 a_{lm}^{34}y_l^3 \right)=\sum_{l=1}^5 a_{lm}^{34}\frac{\partial}{\partial G_i}f^3(x_l^3)\\ &=&\sum_{l=1}^5 a_{lm}^{34}{\color{orange}\frac{\partial f^3(x_l^3)}{\partial G_i}}\cdot {\color{green2}\frac{\partial}{\partial G_i}x_l^3} \end{align} \]

\(\frac{\partial }{\partial G_i}x_{\mu}^{\nu}\) = dnodes_values_dg is calculated in subroutine get_dnodes_values_dg.f90.

\(\frac{\partial f^{\nu}(x_{\mu}^{\nu})}{\partial G_i}\) = dnodes_values is calculated in subroutine getdfshortdw.f90.

Hints for Debugging

For debugging purposes it is useful to have a small NN that can be evaluated manually, e.g., a 4-2-2-1 lll NN. For such a network the weights are stored in the following order:

\[ \begin{align} weights\_short(:,1)&=1.0 &=a_{11}^{01} \\ weights\_short(:,2)&=2.0 &=a_{12}^{01} \\ weights\_short(:,3)&=3.0 &=a_{21}^{01} \\ weights\_short(:,4)&=4.0 &=a_{22}^{01} \\ weights\_short(:,5)&=5.0 &=a_{31}^{01} \\ weights\_short(:,6)&=6.0 &=a_{32}^{01} \\ weights\_short(:,7)&=7.0 &=a_{41}^{01} \\ weights\_short(:,8)&=8.0 &=a_{42}^{01} \\ weights\_short(:,9)&=1.0 &=b_{1}^{1} \\ weights\_short(:,10)&=2.0 &=b_{2}^{1} \\ weights\_short(:,11)&=0.1 &=a_{11}^{12} \\ weights\_short(:,12)&=0.2 &=a_{12}^{12} \\ weights\_short(:,13)&=0.3 &=a_{21}^{12} \\ weights\_short(:,14)&=0.4 &=a_{22}^{12} \\ weights\_short(:,15)&=3.0 &=b_{1}^{2} \\ weights\_short(:,16)&=4.0 &=b_{2}^{2} \\ weights\_short(:,17)&=2.0 &=a_{11}^{23} \\ weights\_short(:,18)&=3.0 &=a_{21}^{23} \\ weights\_short(:,19)&=5.0 &=b_{1}^{3} \end{align} \]

The numbers assigned to the weights here are arbitrary numbers to calculate the NN manually. The corresponding pointer array windex(2\(\cdot\)num_layersshort) is then

\[ \begin{align} windex(1)&=1 \\ windex(2)&=9 \\ windex(3)&=11 \\ windex(4)&=15 \\ windex(5)&=17 \\ windex(6)&=19 \end{align} \]

If we set the symmetry function values of the 4 input nodes as

\[ \begin{align} symfunction(:,1)&=1.0d0 \\ symfunction(:,2)&=2.0d0 \\ symfunction(:,3)&=3.0d0 \\ symfunction(:,4)&=4.0d0 \end{align} \]

we obtain for the values at the nodes nodes_values(num_layersshort, maxnodes_short):

\[ \begin{align} nodes\_values(1, 1)&= 51.0000000000 &=y_1^1 \\ nodes\_values(1, 2)&= 62.0000000000 &=y_2^1 \\ nodes\_values(2, 1)&= 26.7000000000 &=y_1^2 \\ nodes\_values(2, 2)&= 39.0000000000 &=y_2^2 \\ nodes\_values(3, 1)&= 175.4000000000 &=y_3^1=E \end{align} \]

For the partial derivatives dnodes_values(num_layersshort, maxnodes_short) we obtain

$$ \begin{align} dnodes_values(1, 1)&= 1.0000000000 \ dnodes_values(1, 2)&= 1.0000000000 \ dnodes_values(2, 1)&= 1.0000000000 \ dnodes_values(2, 2)&= 1.0000000000 \ dnodes_values(3, 1)&= 1.0000000000

\end{align} $$

For a 4-2-2-1 ttl NN with the hyperbolic tangent activation function in the hidden layers we would have

\[ \begin{align} dnodes\_values(1, 1)&= 0.0000000000 \\ dnodes\_values(1, 2)&= 0.0000000000 \\ dnodes\_values(2, 1)&= 0.0044451932 \\ dnodes\_values(2, 2)&= 0.0004040759 \\ dnodes\_values(3, 1)&= 1.0000000000 \end{align} \]

For the second derivatives (needed in subroutine getdfshortdw.f90) ddnodes_values we obtain for a 4-2-2-1 lll NN:

\[ \begin{align} ddnodes\_values(1, 1)&= 0.0000000000 \\ ddnodes\_values(1, 2)&= 0.0000000000 \\ ddnodes\_values(2, 1)&= 0.0000000000 \\ ddnodes\_values(2, 2)&= 0.0000000000 \\ ddnodes\_values(3, 1)&= 0.0000000000 \end{align} \]

and with a 4-2-2-1 ttl NN:

\[ \begin{align} ddnodes\_values(1, 1)&=& 0.0000000000 \\ ddnodes\_values(1, 2)&=& 0.0000000000 \\ ddnodes\_values(2, 1)&=& -0.0088706046 \\ ddnodes\_values(2, 2)&=& -0.0008079886 \\ ddnodes\_values(3, 1)&=& 0.0000000000 \end{align} \]