DescriptorDatasetBase

class hdnnpy.dataset.descriptor.descriptor_dataset_base.DescriptorDatasetBase(order, structures)

Bases: abc.ABC

Base class for descriptor datasets built from atomic structures.

Initializes the instance variables common to all descriptor datasets.

Parameters:	order (int) – Derivative order of the descriptors to calculate.
	structures (list [AtomicStructure]) – Atomic structures for which the descriptors are calculated.

__getitem__(item)

Return the descriptor data held by this instance.

If item is a string, the corresponding descriptor is returned. The available keys are listed in the descriptors attribute. Otherwise, a list of the descriptors, each sliced by item, is returned.
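The two lookup modes can be sketched with a toy stand-in class. `ToyDataset`, its `_dataset` attribute, and the descriptor names `'value'` and `'gradient'` are hypothetical and exist only for illustration; this is not the hdnnpy implementation.

```python
class ToyDataset:
    """Hypothetical stand-in that mimics the documented lookup rules."""

    def __init__(self, descriptors, dataset):
        self.descriptors = descriptors  # descriptor names (hypothetical)
        self._dataset = dataset         # one sequence per descriptor name

    def __getitem__(self, item):
        if isinstance(item, str):
            # String key: return the matching descriptor.
            return self._dataset[self.descriptors.index(item)]
        # Anything else (int, slice, ...): apply it to every descriptor.
        return [data[item] for data in self._dataset]


toy = ToyDataset(['value', 'gradient'],
                 [[10, 11, 12], [20, 21, 22]])
by_key = toy['gradient']  # lookup by descriptor name
by_slice = toy[0:2]       # one sliced entry per descriptor
```

Here `by_key` is the full `'gradient'` sequence, while `by_slice` holds the first two entries of each descriptor.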

__len__(): Number of atomic structures given at initialization.

calculate_descriptors(structure)

Calculate the required descriptors for a single structure.

This is an abstract method. Subclasses of this base class must override it.

Parameters:	structure (AtomicStructure) – The structure for which descriptors are calculated.
Returns:	Calculated descriptors. The length is the same as `order` given at initialization.
Return type:	list [ndarray]

clear(): Reset instance variables to their initial state.

generate_feature_keys(*args, **kwargs)

Generate the feature keys for the current state.

This is an abstract method. Subclasses of this base class must override it.

Returns:	Unique keys of the feature dimension.
Return type:	list [str]
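The contract of the two abstract methods can be illustrated with a self-contained sketch. `ToyBase` and `ToyPowerDataset` are hypothetical stand-ins written for this example (they are not hdnnpy classes), and a "structure" is reduced to a plain list of numbers, one per atom.

```python
from abc import ABC, abstractmethod


class ToyBase(ABC):
    """Hypothetical stand-in for the base class (not hdnnpy code)."""

    def __init__(self, order, structures):
        self.order = order
        self.structures = structures

    @abstractmethod
    def calculate_descriptors(self, structure):
        """Return a list of descriptors for a single structure."""

    @abstractmethod
    def generate_feature_keys(self, *args, **kwargs):
        """Return the unique keys of the feature dimension."""


class ToyPowerDataset(ToyBase):
    """Hypothetical subclass: the features of an atom are x and x squared."""

    def generate_feature_keys(self, *args, **kwargs):
        return ['x', 'x^2']

    def calculate_descriptors(self, structure):
        # One feature row per atom, wrapped in a list of descriptors.
        features = [[x, x ** 2] for x in structure]
        return [features]


try:
    ToyBase(1, [])  # abstract classes cannot be instantiated
    base_is_abstract = False
except TypeError:
    base_is_abstract = True

dataset = ToyPowerDataset(order=1, structures=[[1.0, 2.0]])
result = dataset.calculate_descriptors(dataset.structures[0])
keys = dataset.generate_feature_keys()
```

Instantiating the base directly fails with `TypeError`, which is how `abc` enforces that both methods are overridden.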

load(file_path, verbose=True, remake=False)

Load the dataset from a .npz file.

Only the root MPI process loads the dataset.

It validates that the loaded dataset is compatible with the atomic structures given at initialization in the following respects:

length of data

elemental composition

elements

tag

It also validates that the loaded dataset satisfies the following requirements:

feature keys

order

Parameters:	file_path (Path) – Path of the file to load the dataset from.
	verbose (bool, optional) – Print log to stdout.
	remake (bool, optional) – If True and the loaded dataset lacks any required feature key or descriptor, recalculate the dataset from scratch and overwrite `file_path`. Otherwise, ValueError is raised.
Raises:	`AssertionError` – If the loaded dataset is incompatible with the atomic structures given at initialization.
	`ValueError` – If the loaded dataset lacks any required feature key or descriptor and `remake=False`.
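The split between the two error types can be sketched with NumPy's own `.npz` routines. The helper `load_checked` and the array names `descriptor` and `feature_keys` are assumptions made for this example; the actual layout of an hdnnpy dataset file is not described here.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_checked(file_path, required_keys, expected_length):
    """Hypothetical helper mirroring the documented checks (not hdnnpy code)."""
    with np.load(file_path) as npz:
        data = {key: npz[key] for key in npz.files}
    # Compatibility check: a mismatch raises AssertionError.
    assert len(data['descriptor']) == expected_length, 'length mismatch'
    # Requirement check: missing feature keys raise ValueError.
    missing = set(required_keys) - set(data['feature_keys'].tolist())
    if missing:
        raise ValueError(f'missing feature keys: {sorted(missing)}')
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'toy_dataset.npz'
    np.savez(path,
             descriptor=np.zeros((3, 2)),
             feature_keys=np.array(['a', 'b']))
    # All required keys are present, so loading succeeds.
    ok = load_checked(path, required_keys=['a'], expected_length=3)
    # Key 'c' is absent, so the requirement check fails.
    try:
        load_checked(path, required_keys=['a', 'c'], expected_length=3)
        error = None
    except ValueError as exc:
        error = str(exc)
```

A caller that wants the `remake=True` behavior would catch the `ValueError`, recompute the dataset, and write it back to the same path.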

make(verbose=True)

Calculate and retain the descriptor dataset.

The dataset is calculated in a data-parallel manner using MPI communication.

The calculated dataset is retained only on the root MPI process.

Parameters:	verbose (bool, optional) – Print log to stdout.
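The data-parallel pattern can be illustrated without MPI by simulating the ranks in a plain loop. Everything below is a hypothetical sketch: the round-robin split, the function names, and the toy `calculate` callback are assumptions, and how hdnnpy actually partitions the structures across processes is not stated here.

```python
def split_round_robin(n_items, n_ranks):
    """Assign item indices to ranks in round-robin fashion."""
    return [list(range(rank, n_items, n_ranks)) for rank in range(n_ranks)]


def simulate_make(structures, n_ranks, calculate):
    """Serially simulate a scatter / compute / gather-to-root cycle."""
    shares = split_round_robin(len(structures), n_ranks)
    # Each simulated rank computes descriptors for its own share only.
    partial = [[(i, calculate(structures[i])) for i in share]
               for share in shares]
    # "Gather" on the root: merge partial results and restore input order.
    gathered = sorted(pair for part in partial for pair in part)
    return [value for _, value in gathered]


structures = [1, 2, 3, 4, 5]
dataset = simulate_make(structures, n_ranks=2, calculate=lambda s: s * s)
```

With a real MPI library, each rank would run only its own share and the merge step would happen on the root process, which is why only the root ends up holding the full dataset.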

save(file_path, verbose=True)

Save the dataset to a .npz file.

Only the root MPI process saves the dataset.

Parameters:	file_path (Path) – Path of the file to save the dataset to.
	verbose (bool, optional) – Print log to stdout.
Raises:	`RuntimeError` – If this instance does not have any data.

DESCRIPTORS = []

Names of descriptors for each derivative order.

Type:	list [str]

descriptors

Names of the descriptors this instance has.

Type:	list [str]

elemental_composition

Elemental composition of atomic structures given at initialization.

Type:	list [str]

elements

Elements of atomic structures given at initialization.

Type:	list [str]

feature_keys

Unique keys of the feature dimension.

Type:	list [str]

has_data

True if the dataset has been loaded or made successfully, False otherwise.

Type:	bool

n_feature

Length of the feature dimension.

Type:	int

name = None

Name of this descriptor class.

Type:	str

order

Derivative order of the descriptors to calculate.

Type:	int

tag

Unique tag of atomic structures given at initialization.

It usually takes a form like <any prefix> <chemical formula> (e.g. CrystalGa2N2).

Type:	str