HDNNPDataset

class hdnnpy.dataset.hdnnp_dataset.HDNNPDataset(descriptor, property_, dataset=None)

Bases: object

Combine and preprocess descriptor and property datasets.

The descriptor and property types used for the HDNNP should be fixed at initialization.
A newly created instance holds no dataset of its own; call construct() to build it.
If dataset is given, it becomes the instance’s own dataset.
Parameters:
  • descriptor (DescriptorDatasetBase) – Descriptor dataset instance to use as the HDNNP input.
  • property_ (PropertyDatasetBase) – Property dataset instance to use as the HDNNP label.
  • dataset (dict [ndarray], optional) – If specified, the instance’s dataset is initialized with this.
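The initialization rules can be pictured with a toy class. This is a minimal sketch, not the hdnnpy implementation: `LifecycleSketch` and the dictionary keys `"inputs"`/`"labels"` are hypothetical, and plain array-likes stand in for the descriptor and property dataset instances.

```python
import numpy as np


class LifecycleSketch:
    """Hypothetical stand-in for the documented init/construct behaviour."""

    def __init__(self, descriptor, property_, dataset=None):
        # The descriptor and property sources are fixed at initialization.
        self.descriptor = descriptor
        self.property = property_
        # No dataset of its own unless one is passed in explicitly.
        self.dataset = dataset

    def construct(self):
        # Build the instance's own dataset from the stored sources
        # (hypothetical keys; the real dataset layout may differ).
        self.dataset = {
            "inputs": np.asarray(self.descriptor, dtype=float),
            "labels": np.asarray(self.property, dtype=float),
        }


sketch = LifecycleSketch([[0.0, 1.0], [2.0, 3.0]], [[1.0], [5.0]])
print(sketch.dataset)                  # None: nothing until construct()
sketch.construct()
print(sketch.dataset["inputs"].shape)  # (2, 2)
```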
__getitem__(item)

Return indexed or sliced dataset as dict data.

__len__()

Redirect to partial_size, i.e. return the number of data held after MPI scattering.
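The indexing and length behaviour of the two methods above can be mimicked with a dict of arrays. This is a hypothetical sketch (`IndexableSketch` is not part of hdnnpy); it only assumes that every array in the dataset shares the same first axis.

```python
import numpy as np


class IndexableSketch:
    """Hypothetical stand-in mimicking the documented __getitem__/__len__."""

    def __init__(self, dataset):
        # dataset: dict of arrays that all share the same first axis.
        self._dataset = dataset
        self.partial_size = len(next(iter(dataset.values())))

    def __getitem__(self, item):
        # Apply the same index or slice to every array; return a dict.
        return {key: value[item] for key, value in self._dataset.items()}

    def __len__(self):
        # Redirect to partial_size, the number of locally held data.
        return self.partial_size


data = IndexableSketch({"inputs": np.arange(12).reshape(6, 2),
                        "labels": np.arange(6)})
print(len(data))            # 6
print(data[1:3]["labels"])  # [1 2]
```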

construct(all_elements=None, preprocesses=None, shuffle=True, verbose=True)

Construct an instance’s own dataset.

This method performs the following steps:

  • Check compatibility between the descriptor and property datasets.
  • Expand the feature dimension of the descriptor dataset according to all_elements, pre-process it with the given preprocesses in order, and add the result to its own dataset.
  • Add the property dataset to its own dataset.
  • Clear the original data held by the descriptor and property datasets.
  • Shuffle the order of the data (if shuffle is True).
Parameters:
  • all_elements (list [str], optional) – If specified, the feature dimensions of the descriptor dataset are expanded according to this list.
  • preprocesses (list [PreprocessBase], optional) – If specified, the descriptor dataset is pre-processed by these in the given order.
  • shuffle (bool, optional) – If True (default), shuffle the order of the data.
  • verbose (bool, optional) – If True (default), print log to stdout.
Raises:

AssertionError – If descriptor and property datasets are incompatible.
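The check, pre-process and shuffle steps of construct() can be sketched with NumPy. This is a minimal sketch under stated assumptions: `construct_sketch` is hypothetical, preprocesses are modeled as plain callables (the real PreprocessBase interface is not shown here), and the all_elements feature expansion and data-clearing steps are not modeled. The key point it illustrates is that one shared permutation keeps every input paired with its label.

```python
import numpy as np


def construct_sketch(inputs, labels, preprocesses=(), shuffle=True, seed=None):
    """Hypothetical sketch of the check/preprocess/shuffle steps of construct()."""
    # Check: both sources must describe the same number of data.
    assert len(inputs) == len(labels), "descriptor and property are incompatible"
    # Pre-process the inputs with each callable, in the given order.
    for preprocess in preprocesses:
        inputs = preprocess(inputs)
    dataset = {"inputs": inputs, "labels": labels}
    # Shuffle every array with one shared permutation so that each
    # input stays paired with its label.
    if shuffle:
        order = np.random.default_rng(seed).permutation(len(inputs))
        dataset = {key: value[order] for key, value in dataset.items()}
    return dataset


x = np.arange(10.0).reshape(5, 2)
y = 10 * x[:, :1]  # each label is tied to the first input column
ds = construct_sketch(x, y, preprocesses=[lambda a: a + 1, lambda a: 2 * a], seed=0)
```

Because the preprocesses run in order, swapping them (add one then double, versus double then add one) gives different inputs.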

scatter(max_buf_len=268435456)

Scatter the dataset among processes by MPI communication.

Each instance is re-initialized with the dataset it receives.

Parameters:
  • max_buf_len (int, optional) – Each piece of data is divided into chunks of at most this size.
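The chunking idea behind max_buf_len can be shown without MPI. This is a hypothetical sketch: `split_into_chunks` and `reassemble` are illustrative helpers, the actual send/receive calls are omitted, and it assumes the limit is measured in bytes (under that assumption the default 268435456 = 2^28 bytes is 256 MiB).

```python
import numpy as np


def split_into_chunks(array, max_buf_len):
    """Hypothetical: split an array's raw bytes into chunks of at most max_buf_len bytes."""
    buffer = np.ascontiguousarray(array).tobytes()
    return [buffer[start:start + max_buf_len]
            for start in range(0, len(buffer), max_buf_len)]


def reassemble(chunks, dtype, shape):
    # Receiving side: join the chunks and restore the original array.
    return np.frombuffer(b"".join(chunks), dtype=dtype).reshape(shape)


data = np.arange(10, dtype=np.float64)  # 80 bytes
chunks = split_into_chunks(data, max_buf_len=32)
print([len(c) for c in chunks])         # [32, 32, 16]
```

Chunk boundaries may fall in the middle of an element; that is harmless here because the receiver joins all chunks before reinterpreting the bytes.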
take(index)

Return a copy of this object whose dataset is indexed or sliced by index.

Parameters:
  • index (int or slice) – Index or slice applied to the dataset of the copy.
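A copy-then-slice pattern like take() can be sketched with the standard copy module. `TakeSketch` is hypothetical; whether hdnnpy copies shallowly or deeply is not stated, so the sketch uses a shallow copy and notes that basic NumPy slices are views.

```python
import copy

import numpy as np


class TakeSketch:
    """Hypothetical stand-in for an object that owns a dict of arrays."""

    def __init__(self, dataset):
        self.dataset = dataset

    def take(self, index):
        # Shallow-copy the object, then give the copy its own dict of
        # indexed arrays; the original object's dict is left untouched.
        # Note: basic slices are NumPy views, so array memory may be shared.
        new = copy.copy(self)
        new.dataset = {key: value[index] for key, value in self.dataset.items()}
        return new


full = TakeSketch({"labels": np.arange(5)})
head = full.take(slice(0, 2))
print(head.dataset["labels"], full.dataset["labels"])  # [0 1] [0 1 2 3 4]
```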
descriptor

Descriptor dataset instance.

Type: DescriptorDatasetBase
elemental_composition

Elemental composition of the dataset.

Type: list [str]
elements

Elements of the dataset.

Type: list [str]
n_input

Number of dimensions of the input data.

Type: int
n_label

Number of dimensions of the label data.

Type: int
partial_size

Number of data held after scattering by MPI communication.

Type: int
property

Property dataset instance.

Type: PropertyDatasetBase
tag

Unique tag of the dataset.

It usually takes the form <any prefix><chemical formula>, e.g. CrystalGa2N2.

Type: str
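A tag of this form could be assembled from an elemental composition as follows. This is a hypothetical sketch: `make_tag` is not an hdnnpy function, and it assumes elemental_composition lists one element symbol per atom and that every count is written out (including 1), which may differ from the library.

```python
from collections import Counter


def make_tag(prefix, elemental_composition):
    """Hypothetical: build '<prefix><formula>' from a per-atom element list."""
    # Counter keeps elements in order of first appearance (Python 3.7+).
    counts = Counter(elemental_composition)
    formula = "".join(f"{element}{count}" for element, count in counts.items())
    return prefix + formula


print(make_tag("Crystal", ["Ga", "Ga", "N", "N"]))  # CrystalGa2N2
```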
total_size

Number of data before scattering by MPI communication.

Type: int
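The relation between total_size and partial_size can be illustrated with an even split across processes. This is an assumption for illustration only: `partial_sizes` is hypothetical and hdnnpy may distribute data differently. In this sketch the per-process partial sizes always sum to the total size and differ by at most one.

```python
def partial_sizes(total_size, n_procs):
    """Hypothetical: split total_size data as evenly as possible over n_procs ranks."""
    base, extra = divmod(total_size, n_procs)
    # The first `extra` ranks receive one extra datum each.
    return [base + (1 if rank < extra else 0) for rank in range(n_procs)]


print(partial_sizes(10, 3))  # [4, 3, 3]
```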