Crawfish is a Python library for pCOHP analysis of JDFTx calculations.
Crawfish (originally called ultraSoftCrawfish) is a Python library intended primarily for bonding analysis on the output of JDFTx calculations. Its reason for existing (as alluded to in the original name) is that the state-of-the-art COHP analysis software, LOBSTER, only supports calculations with PAW pseudopotentials. While the developers of LOBSTER have shown unavoidable pitfalls when attempting COHP analysis on non-PAW calculations, COHP analysis on calculations with other pseudopotential types is far from meaningless and still provides considerable insight. Thus the goal of crawfish is to give DFT users who do not use PAW pseudopotentials access to COHP analysis. While this library is intended for JDFTx, crawfish also offers support for pCOHP analysis on user-provided system data, whether generated by another DFT code or created by the user for learning purposes.
All arrays are named in the general format `name_indices`, where `name` indicates the meaning of the array and `indices` gives the array's dimensionality and the significance of each dimension. For example, in `h_uu`, `h` signifies the system Hamiltonian, and `uu` signifies that the array is 2-dimensional, with both dimensions corresponding to atomic orbitals (the meaning of each index name is given below). Parts of the indices are occasionally separated by an underscore for clarity, but the underscore carries no meaning (e.g. `s_tj_uu` is equivalent to `s_tjuu`).
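As a small illustration of the naming convention, the sketch below builds two placeholder arrays and checks that the number of index letters matches the number of dimensions. The sizes and the helper `suffix_ndim` are made up for this example and are not part of crawfish.

```python
import numpy as np

nstates, nbands, nproj = 4, 6, 3  # made-up sizes for illustration

# "h_uu": Hamiltonian-like array with two orbital dimensions
h_uu = np.zeros((nproj, nproj))

# "s_tj_uu": overlap-like array indexed by state, band, orbital, orbital
s_tj_uu = np.zeros((nstates, nbands, nproj, nproj))


def suffix_ndim(name: str) -> int:
    """Count the index letters after the first underscore (hypothetical helper).

    Underscores inside the index part are ignored, so "s_tj_uu" and "s_tjuu"
    both give 4.
    """
    _, _, indices = name.partition("_")
    return len(indices.replace("_", ""))


print(suffix_ndim("h_uu"), h_uu.ndim)
print(suffix_ndim("s_tj_uu"), s_tj_uu.ndim)
```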
Spin and k-points are collapsed to a single index `t`, called a "state" (`nstates` gives the total number of states for a calculation). When un-collapsed, spin is given the index `s` (`nspin`), and steps along the first, second, and third reciprocal lattice vectors are given the indices `a`, `b`, and `c` (`nka`, `nkb`, `nkc` = `kfolding`). Bands are always indexed by `j` (`nbands`). Orbitals are indexed by either `u` or `v` (`nproj`).
- `proj` is used to signify the projection vector, typically in shape `tju`. In bra-ket notation, `proj_tju[t,j,u]` = $\langle\phi_\mu|\psi_j(t)\rangle$.
- `e` ($\epsilon$) is used to signify the Kohn-Sham eigenvalues of the DFT calculation, and has either the shape `tj` or `sabcj`.
- `wk` is used to signify the weights of each k-point, and has only the shape `t`.
- `occ` ($f$) is used to signify the occupation at each state (k-point + spin) and band, and thus has either shape `tj` or `sabcj`.
- `s` is used to signify orbital overlaps, and thus will have either shape `uu` ($\langle\phi_\mu|\phi_\nu\rangle$) or `tj_uu` ($\langle\phi_\mu|\psi_j(t)\rangle\langle\psi_j(t)|\phi_\nu\rangle$).
- `p` is used to signify orbital-overlap populations, and thus will have either shape `uu` ($\langle\phi_\mu|\hat{\rho}|\phi_\nu\rangle$) or `tj_uu` ($f_j(t)\langle\phi_\mu|\psi_j(t)\rangle\rho_j(t)\langle\psi_j(t)|\phi_\nu\rangle$).
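To make the shapes concrete, here is a minimal NumPy sketch that collapses an `sabcj` array to `tj` and builds `tj_uu` overlap and population tensors from random projections. It is not crawfish's implementation. The flattening order of spin and k-point indices into `t` is an assumption, and the population is taken in its simplest reading (occupation times overlap), which is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
nspin, nka, nkb, nkc, nbands, nproj = 2, 2, 1, 1, 5, 3
nstates = nspin * nka * nkb * nkc

# Un-collapsed occupations (shape sabcj) collapsed to shape tj.
# Assumption: t runs over (s, a, b, c) in C order.
occ_sabcj = rng.random((nspin, nka, nkb, nkc, nbands))
occ_tj = occ_sabcj.reshape(nstates, nbands)

# Random complex stand-ins for proj_tju[t,j,u] = <phi_u|psi_j(t)>
proj_tju = rng.normal(size=(nstates, nbands, nproj)) + 1j * rng.normal(
    size=(nstates, nbands, nproj)
)

# s_tj_uu[t,j,u,v] = <phi_u|psi_j(t)><psi_j(t)|phi_v>
s_tj_uu = np.einsum("tju,tjv->tjuv", proj_tju, proj_tju.conj())

# Assumed simplest form of the population: occupation times overlap
p_tj_uu = occ_tj[:, :, None, None] * s_tj_uu

print(occ_tj.shape, s_tj_uu.shape, p_tj_uu.shape)
```

Each `s_tj_uu[t,j]` block is Hermitian in its two orbital indices, which is a quick consistency check on the index order used in the `einsum` call.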
Unless otherwise indicated, all energies are in Hartree and are not shifted relative to the Fermi level!
- `trim_excess_bands` is a bool class variable of `ElecData`. When enabled, only `nproj` bands are included in the analysis. By trimming excess bands, the projection vector `proj_tju` becomes square at each state. So far this has primarily been useful as a means of allowing the projections at each state to be normalized for each band and each orbital. In principle, this also allows the projections to undergo a band Löwdin orthogonalization, but its usefulness has not been investigated. It also allows for using the dual space of the projections (allowing for less ad hoc approaches to charge conservation), but this has yet to be implemented.
- `los_orbs` is a bool class variable of `ElecData`. When enabled, orbitals are made orthogonal to one another via Löwdin orthogonalization. This may seem counterintuitive in a framework centered around how orbitals interact with each other, but remember that this orthogonality ($\langle\phi_\mu|\phi_\nu\rangle=\delta_{\mu,\nu}$) does not eliminate overlap between orbitals at individual bands and states ($\langle\phi_\mu|\psi_j(t)\rangle\langle\psi_j(t)|\phi_\nu\rangle$), only over the sum of all bands and states ($\sum_{j,t}w_t\langle\phi_\mu|\psi_j(t)\rangle\langle\psi_j(t)|\phi_\nu\rangle=\delta_{\mu,\nu}$). This is an incredibly useful technique when trying to reformulate our calculation in an LCAO picture, as it ensures that for all bonding interactions (bands `j` at state `t` where $c_{\mu,j}(t)^* c_{\nu,j}(t)>0$), there are enough antibonding interactions ($c_{\mu,j}(t)^* c_{\nu,j}(t)<0$) such that the sum over all bands at that state for any orbital pair $\mu,\nu$ equals $\delta_{\mu,\nu}$. Löwdin orthogonalization is the obvious choice here, as it is simple to employ (it takes 5 lines of vectorized numpy operations) and minimizes the deviation of each projection from its original value. JDFTx will orthogonalize the orbitals if given the argument `band-projection-params yes no` prior to evaluating and dumping the band projections. However, due to the incompleteness of the space spanned by the bands at each state, this orthogonality is lost when evaluating total overlap with the dumped projections. The same holds for the bands, due to the incompleteness of the space spanned by the orbitals.
- `p_uu_consistent` is a bool class variable of `ElecData` ensuring charge conservation when building the orbital-overlap population matrix. When `True`, it will temporarily re-scale `proj` such that summing over `u` and `v` for `p_tj_uu[t,j,u,v]` equals `occ_tj[t,j]` ($$\sum_{\mu, \nu}P_{\mu, \nu}(t, j)=f_j(t)$$).
- `s_tj_uu_real` is a bool class variable of `ElecData` ensuring that the orbital overlap is a real value. Since planewaves have a complex component, the orbital/band projections `proj_tju` ($\langle\phi_\mu|\psi_j(t)\rangle$) are typically complex.
- `s_tj_uu_pos` is a bool class variable of `ElecData` ensuring that the orbital overlap is a positive value. This is done by subtracting the smallest value from the entire tensor, and then rescaling the entire tensor such that the sum over all indices `tjuv` matches the original sum.
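Two of the operations described above can be sketched generically in NumPy: a Löwdin orthogonalization of the orbital index of the projections, and the shift-and-rescale used to make an overlap tensor non-negative. This is a sketch under stated assumptions, not crawfish's code. The function names are hypothetical, the total overlap is assumed to be the `wk_t`-weighted sum over states and bands, and the shift assumes the original sum is positive and the tensor is not constant.

```python
import numpy as np


def lowdin_orthogonalize(proj_tju, wk_t):
    """Return projections transformed so the total orbital overlap is the identity."""
    # Total overlap S_uv = sum_{t,j} w_t <phi_u|psi_j(t)><psi_j(t)|phi_v>
    s_uu = np.einsum("t,tju,tjv->uv", wk_t, proj_tju, proj_tju.conj())
    # S is Hermitian, so S^(-1/2) follows from its eigendecomposition
    vals, vecs = np.linalg.eigh(s_uu)
    s_inv_sqrt = (vecs * vals**-0.5) @ vecs.conj().T
    # Apply S^(-1/2) to the orbital index of every projection
    return np.einsum("uv,tjv->tju", s_inv_sqrt, proj_tju)


def shift_positive(s_tjuv):
    """Subtract the minimum, then rescale so the total sum is unchanged."""
    total = s_tjuv.sum()
    shifted = s_tjuv - s_tjuv.min()
    return shifted * (total / shifted.sum())


rng = np.random.default_rng(0)
nstates, nbands, nproj = 4, 6, 3
proj_tju = rng.normal(size=(nstates, nbands, nproj)) + 1j * rng.normal(
    size=(nstates, nbands, nproj)
)
wk_t = np.full(nstates, 1.0 / nstates)

proj_los = lowdin_orthogonalize(proj_tju, wk_t)
s_uu_los = np.einsum("t,tju,tjv->uv", wk_t, proj_los, proj_los.conj())

# Real-valued stand-in tensor with a positive total sum
s_real = rng.normal(loc=0.5, size=(nstates, nbands, nproj, nproj))
s_pos = shift_positive(s_real)
```

After the transformation, `s_uu_los` is the identity to numerical precision, while the individual `tj` blocks still carry off-diagonal overlap.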
For the following equations, shorthand symbols are used for the projections (`proj_tju[t,j,u]`) and the eigenvalues (`e_tj[t,j]`), and the energy axis is denoted `erange`. By default, Gaussian smearing is employed, with the smearing controlled by `sig`. If linear tetrahedron integration (`lti`) is requested, the `libtetrabz` package is used.

- pDOS: Projected density of states (pDOS) is primarily included in this package for sanity checks.
- pCOHP: Projected crystal orbital Hamilton population.

Similar techniques (pCOOP and COBI) are available but not recommended, as they currently benchmark very poorly in this implementation.
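For orientation, a generic Gaussian-smeared pDOS and pCOHP can be sketched as follows. This uses the common textbook forms, pDOS$_\mu(E)=\sum_{t,j}w_t|\langle\phi_\mu|\psi_j(t)\rangle|^2\,g(E-\epsilon_j(t))$ and pCOHP$_{\mu\nu}(E)=\sum_{t,j}w_t\,\mathrm{Re}[\langle\phi_\mu|\psi_j(t)\rangle\langle\psi_j(t)|\phi_\nu\rangle]\,H_{\mu\nu}\,g(E-\epsilon_j(t))$, which are assumptions and may differ from what crawfish evaluates. The function names are hypothetical, `h_uu` is taken as a given real symmetric matrix, and all inputs are random placeholders.

```python
import numpy as np


def gaussian(x, sig):
    """Normalized Gaussian of width sig."""
    return np.exp(-0.5 * (x / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))


def smear(erange, e_tj, weight_tj, sig):
    """Sum weight_tj[t,j] * g(E - e_tj[t,j]) over all states and bands."""
    g_etj = gaussian(erange[:, None, None] - e_tj[None, :, :], sig)
    return np.einsum("etj,tj->e", g_etj, weight_tj)


def get_pdos_sketch(erange, proj_tju, e_tj, wk_t, u, sig):
    """pDOS of orbital u: weights are w_t * |proj|^2."""
    weight_tj = wk_t[:, None] * np.abs(proj_tju[:, :, u]) ** 2
    return smear(erange, e_tj, weight_tj, sig)


def get_pcohp_sketch(erange, proj_tju, e_tj, wk_t, h_uu, u, v, sig):
    """pCOHP of orbital pair (u, v): weights are w_t * Re(overlap) * H_uv."""
    s_tj = np.real(proj_tju[:, :, u] * np.conj(proj_tju[:, :, v]))
    weight_tj = wk_t[:, None] * s_tj * h_uu[u, v]
    return smear(erange, e_tj, weight_tj, sig)


rng = np.random.default_rng(0)
nstates, nbands, nproj = 4, 6, 3
proj_tju = rng.normal(size=(nstates, nbands, nproj)) + 1j * rng.normal(
    size=(nstates, nbands, nproj)
)
e_tj = rng.uniform(-1.0, 1.0, size=(nstates, nbands))  # Hartree, placeholder
wk_t = np.full(nstates, 1.0 / nstates)
h_uu = rng.normal(size=(nproj, nproj))
h_uu = 0.5 * (h_uu + h_uu.T)  # symmetrize the placeholder Hamiltonian

erange = np.arange(-2.0, 2.0, 0.005)
sig = 0.05
pdos = get_pdos_sketch(erange, proj_tju, e_tj, wk_t, 0, sig)
pcohp = get_pcohp_sketch(erange, proj_tju, e_tj, wk_t, h_uu, 0, 1, sig)
```

Because the Gaussian is normalized, integrating either spectrum over `erange` recovers the sum of its unsmeared weights, which is a convenient sanity check.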
- Non-PAW JDFTx calculations: The intended audience for `crawfish` is anyone curious about the bonding within a non-PAW pseudopotential calculation performed using JDFTx. While LOBSTER is not explicitly supported by JDFTx, the output of any unsupported calculation with PAW pseudopotentials can be converted by the user to mimic the output of a calculation which is supported by LOBSTER, circumventing the need for explicit support. If your calculation does not use PAW pseudopotentials, `crawfish` is here for you.
- General non-PAW calculations: The techniques used by `crawfish` are made available to other DFT calculators, so long as the user is able to acquire the data required to construct an `ElecData` object. The instructions for how to do so are available in the "Creating your own `ElecData`" section of this readme. This process requires providing `crawfish` with the Kohn-Sham eigenvalues and the projections of each Kohn-Sham wavefunction onto each orbital (as well as some other information that is typically much easier to obtain). If you are interested in doing so, please reach out to me ([email protected]) so I can help with any obstacles that might require fixing some less-tested parts of the code.
- Create an `ElecData` object: `ElecData` is the class used to house all electronic data and derived tensors for a given calculation. Provided the JDFTx calculation has been run with the required settings (`band-projection-params yes no`, `dump End BandProjections`, and `dump End BandEigs`), this can be done in one line as `edata = ElecData.from_calc_dir(calc_dir)`, where `ElecData` has been imported from `crawfish.core.elecdata` and `calc_dir` is either a `str` or a `Path` giving the full path to the directory containing the calculation output data.
- Change desired settings: If there are any parameters you wish to change that affect the computed tensors required for pCOHP analysis (e.g. `edata.los_orbs`), you can change these values in the typical fashion (`edata.los_orbs = False`), which triggers a re-evaluation of the affected tensors. If you have multiple settings to change, you can avoid repeated re-evaluations by changing each setting's private value (`edata._los_orbs = False`) and then either changing the final setting through its public value or running `edata.alloc_elec_data()`.
- Import function(s) for desired analysis: All spectrum-generating functions for a given analysis technique `<mode>` can be imported from `crawfish.funcs.<mode>`, which contains a DOS-like spectrum-generating function `get_<mode>` and a spectrum-integrating function `get_i<mode>`.
- Generate spectra and plot: The DOS-like generating functions and the spectrum-integrating functions both return a length-2 tuple `erange, spectrum`, corresponding to the energy-axis values and the spectrum values at those energies. If you are unfamiliar with plotting in Python, this can be easily done with `matplotlib` (installing this library ensures your Python environment has this package) by running `import matplotlib.pyplot as plt` followed by `plt.plot(erange, spectrum)`. The arguments for these functions can all be checked in the docstrings of the function definitions.
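The setter behavior described in the settings step (public assignment triggers a re-evaluation, private assignment does not) can be pictured with the toy class below. This is an illustration of the pattern only, not crawfish's `ElecData`. The class `ToySettings` and its counter `n_evals` are hypothetical, and the attribute names merely mirror the ones mentioned above.

```python
class ToySettings:
    """Toy stand-in (not crawfish code) whose public setters trigger a rebuild."""

    def __init__(self):
        self._los_orbs = True
        self._s_tj_uu_real = True
        self.n_evals = 0
        self.alloc_elec_data()

    def alloc_elec_data(self):
        # Placeholder for the expensive re-evaluation of derived tensors
        self.n_evals += 1

    @property
    def los_orbs(self):
        return self._los_orbs

    @los_orbs.setter
    def los_orbs(self, value):
        self._los_orbs = value
        self.alloc_elec_data()

    @property
    def s_tj_uu_real(self):
        return self._s_tj_uu_real

    @s_tj_uu_real.setter
    def s_tj_uu_real(self, value):
        self._s_tj_uu_real = value
        self.alloc_elec_data()


# Public setters: one rebuild per changed setting
a = ToySettings()
a.los_orbs = False
a.s_tj_uu_real = False

# Private values plus one explicit rebuild at the end
b = ToySettings()
b._los_orbs = False
b._s_tj_uu_real = False
b.alloc_elec_data()

print(a.n_evals, b.n_evals)
```

Both objects end in the same state, but the second route performs one rebuild for the two changes instead of two.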
- Collect (or artificially construct) the following objects:
  - A pymatgen `Structure` of your system. It is critical that the species in this structure are ordered by their atomic number.
  - The band eigenvalues in shape `tj` as a numpy array.
  - The k-point folding and the corresponding k-points (if you are unsure but know the total number of k-points, the kfolding can be set arbitrarily and the k-points can be left as None).
  - The k-point weights. If you are unsure but know there was no symmetry reduction (i.e. every k-point on your Monkhorst-Pack grid was evaluated), then all your k-point weights are equal to nspin/nkpts. If there was symmetry reduction of your k-mesh but you know how many k-points were reduced into each of the output k-points, multiply the weights by the number of k-points each output k-point "represents".
  - The number of projections (orbitals) gathered for each atom type and their corresponding quantum numbers (the exact principal quantum number n matters less, as long as you know their ordering for multiple shells of a given angular momentum).
  - The projection coefficients for each band+state on each orbital in shape `tju`. It is critical that you have the actual projections and not their absolute values (the latter are typically what is dumped, as they are all that is needed for pDOS analysis), since taking the absolute value removes all information about the phase of the orbital at that band+state (a bonding vs antibonding interaction is determined solely by the matching of phases between two orbitals). The ordering of the projections must match the ordering of the atoms as given in the Structure (e.g. for an all-electron calculation of an Li2 structure, projections 0-3 should correspond to Li #1's 1s, Li #1's 2s, Li #2's 1s, and Li #2's 2s).
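The point about phases can be seen in a two-orbital toy example: the bonding and antibonding combinations below have identical absolute values, and only the sign of $\mathrm{Re}(c_\mu^* c_\nu)$ tells them apart. The numbers and the helper name are made up for illustration.

```python
import numpy as np

# Coefficients (c_u, c_v) of two orbitals within a single band at one state
bonding = np.array([0.6 + 0.0j, 0.6 + 0.0j])       # same phase
antibonding = np.array([0.6 + 0.0j, -0.6 + 0.0j])  # opposite phase


def phase_product(c):
    """Re(c_u^* c_v): positive for matching phases, negative for opposite."""
    return float(np.real(np.conj(c[0]) * c[1]))


print(phase_product(bonding), phase_product(antibonding))

# The absolute values alone cannot distinguish the two cases
print(np.allclose(np.abs(bonding), np.abs(antibonding)))
```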
- Initialize an empty `ElecData` object with the class method `edata = ElecData.as_empty()` to circumvent the initialization procedures for JDFTx calculations.
- Set `user_proj_tju` (e.g. `edata.user_proj_tju = np.random.random([10, 5, 4])`). This array will not be touched, and all projection manipulations will be performed on a copy of it. This will automatically define `nstates`, `nbands`, and `nproj`.
- Set the `atom_orb_labels_dict` property for the class as a dictionary mapping each element in your calculation to a list of strings representing the quantum numbers of the projections gathered for that element (e.g. for a calculation of C2H4O, set `edata.atom_orb_labels_dict = {"H": ["s"], "C": ["s", "px", "py", "pz"], "O": ["s", "px", "py", "pz"]}`). If you do not wish to perform orbital resolution in your analysis, it does not matter what you put here, as long as all the elements of each list are unique and the list length matches the number of projections for that element. The ordering of the labels must match the ordering of the projections as they are listed in your `user_proj_tju`. If you have multiple projections for a given angular momentum, include the principal quantum number as well (e.g. `["1s", "2s"]`); they do not need to be the true principal quantum numbers.
- Set `kfolding` (e.g. `edata.kfolding = [3,3,3]`). If you are unsure but know `nspin`, set it as `[int(edata.nstates/nspin), 1, 1]`.
- Set `wk_t` (e.g. `edata.wk_t = np.ones(edata.nstates)*edata.nspin/np.prod(edata.kfolding)`). If you are unsure, just make sure they sum to `nspin`.
- Set the Fermi level as `edata.mu` (if you are going to set `occ_tj` explicitly, this step becomes optional but is still useful for plotting).
- If you have the state/band occupation, set it as `edata.occ_tj`. Otherwise, it will be calculated for you using `edata.broadening` and `edata.broadening_type`.
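Before handing data to `ElecData`, it can help to assemble the inputs and check that their shapes agree. The sketch below does this with NumPy only, for a hypothetical C2H4O system with random placeholder projections. The values of `nbands`, `kfolding`, and `nspin` are made up, and the crawfish assignments are left as comments because they require the library to be installed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical C2H4O system, elements ordered by atomic number: H, C, O
atom_counts = {"H": 4, "C": 2, "O": 1}
atom_orb_labels_dict = {
    "H": ["s"],
    "C": ["s", "px", "py", "pz"],
    "O": ["s", "px", "py", "pz"],
}

# Total number of projections implied by the labels
nproj = sum(n * len(atom_orb_labels_dict[el]) for el, n in atom_counts.items())

nspin = 1
kfolding = [3, 3, 3]
nstates = nspin * int(np.prod(kfolding))
nbands = 20  # made-up value

# Complex placeholder projections in shape tju (phases must be preserved)
user_proj_tju = rng.normal(size=(nstates, nbands, nproj)) + 1j * rng.normal(
    size=(nstates, nbands, nproj)
)

# Equal weights for an unreduced k-mesh, as in the step above
wk_t = np.ones(nstates) * nspin / np.prod(kfolding)

# Labels for each element must be unique
labels_unique = all(len(set(v)) == len(v) for v in atom_orb_labels_dict.values())

# With crawfish installed, the assignments described above would follow:
# edata = ElecData.as_empty()
# edata.user_proj_tju = user_proj_tju
# edata.atom_orb_labels_dict = atom_orb_labels_dict
# edata.kfolding = kfolding
# edata.wk_t = wk_t
```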
Any github-url pip installation method should work, but below are the steps I have tested and know should work.
- Clone this repo somewhere
git clone https://github.com/benrich37/crawfish.git
- Activate the Python environment you wish to use when performing pCOHP analysis. NOTE: At the moment, the JDFTx IO module that part of this library depends on only exists on an independent fork of pymatgen. At the time of writing (10/24/24) this fork is fully up to date, but later on this installation may roll back your pymatgen to an older version. If you are worried about dependency conflicts, I would recommend creating a conda virtual environment with Python 3.12 (the latest as of writing).
- Navigate to ~/crawfish/ where you cloned this repo (not ~/crawfish/src/crawfish/) and install via pip
cd ./crawfish
pip install .
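After installing, one quick way to confirm that the package is visible to the active environment is the standard-library check below. The helper name `is_installed` is made up for this example, and the check only tells you whether the top-level `crawfish` package can be found, not whether its dependencies import cleanly.

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return find_spec(package) is not None


if __name__ == "__main__":
    print("crawfish importable:", is_installed("crawfish"))
```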