DescriptorDatasetBase

class hdnnpy.dataset.descriptor.descriptor_dataset_base.DescriptorDatasetBase(order, structures)

Bases: abc.ABC

Base class for descriptor datasets built from atomic structures.

It initializes the instance variables common to all descriptor datasets.

Parameters:
  • order (int) – Derivative order of descriptor to calculate.
  • structures (list [AtomicStructure]) – Descriptors are calculated for these atomic structures.
__getitem__(item)

Return the descriptor data held by this instance.

If item is a string, the corresponding descriptor is returned; the available keys are listed in the descriptors attribute. Otherwise, a list of the descriptors sliced by item is returned.
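
The two lookup modes described above can be sketched with a small stand-in class. This is an illustration only, not the hdnnpy implementation: ToyDescriptorDataset, its internal dict, and the descriptor names 'value' and 'derivative' are all hypothetical.

```python
class ToyDescriptorDataset:
    """Hypothetical stand-in; not the hdnnpy implementation."""

    def __init__(self, dataset):
        # Maps descriptor name -> per-structure data (assumed layout).
        self._dataset = dataset

    @property
    def descriptors(self):
        # Names usable as string keys in __getitem__.
        return list(self._dataset)

    def __getitem__(self, item):
        if isinstance(item, str):
            # String key: return that one descriptor.
            return self._dataset[item]
        # Anything else (index or slice): apply it to every descriptor.
        return [data[item] for data in self._dataset.values()]


toy = ToyDescriptorDataset({
    'value': [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    'derivative': [[1.0], [2.0], [3.0]],
})
by_key = toy['value']   # data of a single descriptor
by_slice = toy[0:2]     # every descriptor, first two structures only
```

In this sketch, by_slice holds one entry per descriptor, in the same order as toy.descriptors.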

__len__()

Number of atomic structures given at initialization.

calculate_descriptors(structure)

Calculate the required descriptors for a single structure.

This is an abstract method; subclasses of this base class must override it.

Parameters: structure (AtomicStructure) – The structure for which to calculate descriptors.
Returns: Calculated descriptors. The length is the same as the order given at initialization.
Return type: list [ndarray]
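
As a rough sketch of how a subclass might satisfy this abstract interface, the following uses a self-contained stand-in base class rather than hdnnpy itself. ToyDatasetBase, PairDistanceDataset, and the 'pair_distance' key are hypothetical, and a structure is assumed here to be a plain list of (x, y, z) positions, which is not how AtomicStructure is necessarily laid out.

```python
import math
from abc import ABC, abstractmethod


class ToyDatasetBase(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    def __init__(self, order, structures):
        self._order = order
        self._structures = structures

    def __len__(self):
        # Number of structures given at initialization.
        return len(self._structures)

    @abstractmethod
    def calculate_descriptors(self, structure):
        """Return a list of descriptors for one structure."""

    @abstractmethod
    def generate_feature_keys(self, *args, **kwargs):
        """Return a list of unique feature keys."""


class PairDistanceDataset(ToyDatasetBase):
    """Toy descriptor: sorted interatomic distances."""

    def calculate_descriptors(self, structure):
        # Assumption: structure is a list of (x, y, z) tuples.
        distances = sorted(
            math.dist(a, b)
            for i, a in enumerate(structure)
            for b in structure[i + 1:]
        )
        return [distances]

    def generate_feature_keys(self, *args, **kwargs):
        return ['pair_distance']


structures = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
dataset = PairDistanceDataset(order=1, structures=structures)
result = dataset.calculate_descriptors(structures[0])
```

Because both methods are declared with abstractmethod, instantiating ToyDatasetBase directly raises TypeError; only subclasses that override both can be created.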
clear()

Reset instance variables to their initial state.

generate_feature_keys(*args, **kwargs)

Generate the feature keys for the current state.

This is an abstract method; subclasses of this base class must override it.

Returns: Unique keys of the feature dimension.
Return type: list [str]
load(file_path, verbose=True, remake=False)

Load the dataset from a .npz file.

Only the root MPI process loads the dataset.

It validates that the loaded dataset is compatible with the atomic structures given at initialization in the following respects:

  • length of data
  • elemental composition
  • elements
  • tag

It also validates that the loaded dataset satisfies the following requirements:

  • feature keys
  • order
Parameters:
  • file_path (Path) – Path of the file to load the dataset from.
  • verbose (bool, optional) – Print log to stdout.
  • remake (bool, optional) – If the loaded dataset lacks any feature key or descriptor, recalculate the dataset from scratch and overwrite file_path. Otherwise, a ValueError is raised.
Raises:
  • AssertionError – If the loaded dataset is incompatible with the atomic structures given at initialization.
  • ValueError – If the loaded dataset lacks any feature key or descriptor and remake=False.
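
The validation order and the effect of remake can be sketched as below. This is a simplified illustration under stated assumptions, not the library's code: validate_loaded is a hypothetical helper, and plain dicts stand in for the .npz contents and for the state of the instance.

```python
def validate_loaded(loaded, expected, required_descriptors, required_keys,
                    remake=False):
    """Return True when the dataset should be recalculated.

    Hypothetical helper for illustration; `loaded` and `expected` are
    plain dicts standing in for the file contents and the instance state.
    """
    # Compatibility with the structures given at initialization.
    for field in ('length', 'elemental_composition', 'elements', 'tag'):
        if loaded[field] != expected[field]:
            raise AssertionError(f'{field} does not match')

    # Requirements on what the loaded dataset must contain.
    missing = [d for d in required_descriptors
               if d not in loaded['descriptors']]
    missing += [k for k in required_keys
                if k not in loaded['feature_keys']]
    if not missing:
        return False
    if remake:
        # Caller is expected to recalculate and overwrite the file.
        return True
    raise ValueError(f'loaded dataset lacks: {missing}')


expected = {
    'length': 2,
    'elemental_composition': ['Ga', 'Ga', 'N', 'N'],
    'elements': ['Ga', 'N'],
    'tag': 'CrystalGa2N2',
}
loaded = dict(expected, descriptors=['value'], feature_keys=['k0', 'k1'])
needs_remake = validate_loaded(
    loaded, expected, ['value', 'derivative'], ['k0'], remake=True)
```

Here the loaded data lacks the hypothetical 'derivative' descriptor, so the helper signals a remake; with remake=False the same call would raise ValueError instead.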
make(verbose=True)

Calculate and retain the descriptor dataset.

The dataset is calculated in a data-parallel manner using MPI communication.
The calculated dataset is retained only on the root MPI process.
Parameters: verbose (bool, optional) – Print log to stdout.
save(file_path, verbose=True)

Save the dataset to a .npz file.

Only the root MPI process saves the dataset.

Parameters:
  • file_path (Path) – Path of the file to save the dataset to.
  • verbose (bool, optional) – Print log to stdout.
Raises: RuntimeError – If this instance does not have any data.

DESCRIPTORS = []

Names of descriptors for each derivative order.

Type: list [str]
descriptors

Names of the descriptors this instance has.

Type: list [str]
elemental_composition

Elemental composition of atomic structures given at initialization.

Type: list [str]
elements

Elements of atomic structures given at initialization.

Type: list [str]
feature_keys

Unique keys of feature dimension.

Type: list [str]
has_data

True if the dataset has been successfully loaded or made, False otherwise.

Type: bool
n_feature

Length of feature dimension.

Type: int
name = None

Name of this descriptor class.

Type: str
order

Derivative order of descriptor to calculate.

Type: int
tag

Unique tag of atomic structures given at initialization.

Usually, it takes a form like <any prefix> <chemical formula> (e.g. CrystalGa2N2).

Type: str