HDNNPDataset

class hdnnpy.dataset.hdnnp_dataset.HDNNPDataset(descriptor, property_, dataset=None)

Bases: object

Combine and preprocess the descriptor and property datasets.

The types of descriptor and property used for an HDNNP
should be fixed at initialization.
A new instance does not hold any dataset of its own;
call construct() to build it.
If dataset is given, it is used as the instance’s own dataset.

Parameters:

descriptor (DescriptorDatasetBase) – Descriptor instance you want to use as HDNNP input.
property_ (PropertyDatasetBase) – Property instance you want to use as HDNNP label.
dataset (dict [ndarray], optional) – If specified, dataset will be initialized with this.

__getitem__(item): Return the indexed or sliced dataset as a dict.

__len__(): Redirect to partial_size.
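The two special methods above can be illustrated with a small stand-in class. This is a sketch only, not hdnnpy's implementation: `ToyDataset`, the keys `inputs` and `labels`, and the use of plain lists in place of ndarrays are all assumptions made for illustration.

```python
class ToyDataset:
    """Hypothetical stand-in; NOT hdnnpy's HDNNPDataset."""

    def __init__(self, dataset):
        # dataset maps a key to a sequence with one entry per sample
        self._dataset = dataset

    @property
    def partial_size(self):
        # Number of samples held locally; every value shares this length
        return len(next(iter(self._dataset.values())))

    def __len__(self):
        # len() simply redirects to partial_size
        return self.partial_size

    def __getitem__(self, item):
        # Apply the same index or slice to every entry and return a dict
        return {key: values[item] for key, values in self._dataset.items()}


toy = ToyDataset({
    'inputs': [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # hypothetical key
    'labels': [-1.0, -2.0, -3.0],                    # hypothetical key
})
first = toy[0]   # one sample, as a dict
head = toy[:2]   # first two samples, as a dict
```

Indexing with an integer yields one sample per key, while slicing keeps the sample axis.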

construct(all_elements=None, preprocesses=None, shuffle=True, verbose=True)

Construct an instance’s own dataset.

This method performs the following steps:

1. Check compatibility between the descriptor and property datasets.
2. Expand the feature dimension of the descriptor dataset according to all_elements, pre-process the descriptor dataset in the given order, and add it to the instance’s own dataset.
3. Add the property dataset to the instance’s own dataset.
4. Clear the original data held by the descriptor and property datasets.
5. Shuffle the order of the data.

Parameters:

all_elements (list [str], optional) – If specified, it expands feature dimensions of descriptor dataset according to this.
preprocesses (list [PreprocessBase], optional) – If specified, it pre-processes descriptor dataset in a given order.
shuffle (bool, optional) – If True, it shuffles the order of the data.
verbose (bool, optional) – If True, it prints log messages to stdout.

Raises:

AssertionError – If descriptor and property datasets are incompatible.
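The first and last steps of construct() can be sketched in isolation. The function below is a hypothetical illustration, not the library's code: `shuffle_together`, its `seed` argument, the dict keys, and the equal-length check (used here as a stand-in for the compatibility check, whose actual criteria are not described on this page) are assumptions.

```python
import random


def shuffle_together(dataset, seed=None):
    """Shuffle every entry of a dict with one shared permutation.

    Hypothetical helper for illustration; not part of hdnnpy.
    """
    # Stand-in compatibility check: all entries hold the same number of samples
    sizes = {len(values) for values in dataset.values()}
    assert len(sizes) == 1, 'all entries must hold the same number of samples'

    # One permutation, applied to every entry, keeps inputs and labels paired
    order = list(range(sizes.pop()))
    random.Random(seed).shuffle(order)
    return {key: [values[i] for i in order] for key, values in dataset.items()}


data = {'inputs': [10, 20, 30, 40], 'labels': [1, 2, 3, 4]}
shuffled = shuffle_together(data, seed=0)
```

Using a single permutation for all entries is what keeps each input aligned with its label after shuffling.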

scatter(max_buf_len=268435456)

Scatter dataset by MPI communication.

Each instance is re-initialized with received dataset.

Parameters:	max_buf_len (int, optional) – Each piece of data is divided into chunks of at most this size.
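The role of max_buf_len can be pictured with a small chunking sketch. It does not use MPI and is not the library's implementation: `iter_chunks` is a hypothetical helper, and treating the size as a count of bytes is an assumption, since the unit is not stated here. The default value equals 2**28.

```python
def iter_chunks(buffer, max_buf_len=268435456):
    """Yield consecutive pieces of buffer, each at most max_buf_len long.

    Hypothetical helper for illustration; not part of hdnnpy.
    """
    if max_buf_len <= 0:
        raise ValueError('max_buf_len must be positive')
    for start in range(0, len(buffer), max_buf_len):
        yield buffer[start:start + max_buf_len]


payload = bytes(range(10))
chunks = list(iter_chunks(payload, max_buf_len=4))
# The receiver can rebuild the payload by joining the chunks in order
rebuilt = b''.join(chunks)
```

Only the last chunk may be shorter than the limit; all others have exactly max_buf_len elements.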

take(index)

Return a copy of the object that holds the indexed or sliced dataset.

Parameters:	index (int or slice) – The copied object holds the dataset indexed or sliced by this.
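The difference between take() and plain indexing is that take() returns a new object rather than a dict. A minimal sketch under stated assumptions follows: `SliceableData` is a hypothetical stand-in, plain lists replace ndarrays, and converting an integer index to a one-element slice is a choice made here to keep the sample axis, not documented behavior.

```python
import copy


class SliceableData:
    """Hypothetical stand-in; NOT hdnnpy's HDNNPDataset."""

    def __init__(self, dataset):
        self._dataset = dataset

    def __getitem__(self, item):
        # Indexing returns a plain dict
        return {key: values[item] for key, values in self._dataset.items()}

    def take(self, index):
        # Assumption: turn an int into a one-element slice to keep the sample axis
        if isinstance(index, int):
            index = slice(index, index + 1 or None)
        # Copy the object, then give the copy its own independent selection
        new = copy.copy(self)
        new._dataset = copy.deepcopy(self[index])
        return new


full = SliceableData({'inputs': [[1], [2], [3]], 'labels': [1.0, 2.0, 3.0]})
subset = full.take(slice(0, 2))
last = full.take(-1)
subset._dataset['labels'][0] = 99.0  # does not affect the original
```

Because the selection is deep-copied in this sketch, modifying the subset leaves the original data untouched.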

descriptor

Descriptor dataset instance.

Type:	DescriptorDatasetBase

elemental_composition

Elemental composition of the dataset.

Type:	list [str]

elements

Elements of the dataset.

Type:	list [str]

n_input

Number of dimensions of input data.

Type:	int

n_label

Number of dimensions of label data.

Type:	int

partial_size

Number of data points held after scattering by MPI communication.

Type:	int

property

Property dataset instance.

Type:	PropertyDatasetBase

tag

Unique tag of the dataset.

Usually, it has the form <any prefix><chemical formula> (e.g. CrystalGa2N2).

Type:	str

total_size

Number of data points before scattering by MPI communication.

Type:	int
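The relationship between total_size and partial_size can be sketched as follows. This assumes a near-even split across processes, which is a common convention but is not confirmed by this page; `split_sizes` is a hypothetical helper and not part of hdnnpy.

```python
def split_sizes(total_size, n_procs):
    """Return a per-process sample count for a near-even split.

    Hypothetical helper for illustration; the library's actual
    distribution of samples may differ.
    """
    if n_procs <= 0:
        raise ValueError('n_procs must be positive')
    base, extra = divmod(total_size, n_procs)
    # The first `extra` ranks receive one additional sample
    return [base + 1 if rank < extra else base for rank in range(n_procs)]


sizes = split_sizes(10, 4)
```

Under this assumption each entry plays the role of one process's partial_size, and the entries always sum to total_size.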