General utility

class madas.utils.BatchIterator(n_entries: int, batch_size: int, n_tasks: int = 1, task_id: int = 0, symmetric: bool = True)

A helper class for parallelizing the calculation of large batched similarity matrices.

A batch is defined here as a nested list of integers describing the indices of a sub-matrix.

To parallelize the calculation of large matrices across different nodes in a compute cluster, each node may compute several sub-matrices that can be combined into the complete matrix.

To reduce the memory footprint, each node loads only a subset of the required fingerprints and calculates the (overlap) similarity matrix from these.

This class iteratively provides the mapping of fingerprint indices in a given list. Thus, upon iteration over an object of this class, all combinations of indices that are required to compute a unique sub-matrix for a specific task id are returned.
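The batching idea described above can be sketched without the library. This is a minimal illustration, not the implementation used by BatchIterator: the function names split_indices and blocks_for_task are hypothetical, and the round-robin assignment of blocks to tasks is an assumption.

```python
from itertools import combinations_with_replacement, product


def split_indices(n_entries, batch_size):
    """Split range(n_entries) into consecutive chunks of at most batch_size."""
    return [list(range(start, min(start + batch_size, n_entries)))
            for start in range(0, n_entries, batch_size)]


def blocks_for_task(n_entries, batch_size, n_tasks=1, task_id=0, symmetric=True):
    """Return the [row indices, column indices] blocks assigned to one task."""
    chunks = split_indices(n_entries, batch_size)
    if symmetric:
        # A symmetric matrix only needs the blocks on or above the diagonal.
        pairs = combinations_with_replacement(chunks, 2)
    else:
        pairs = product(chunks, repeat=2)
    # Assumption: blocks are dealt out to tasks in round-robin order.
    return [[rows, cols] for i, (rows, cols) in enumerate(pairs)
            if i % n_tasks == task_id]


# Each task only needs the fingerprints whose indices appear in its blocks.
for rows, cols in blocks_for_task(5, 2, n_tasks=2, task_id=0):
    print(rows, cols)
```

With 5 entries and a batch size of 2, the symmetric case yields 6 unique blocks in total, which the two tasks share evenly in this sketch.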

property batch_size

Range of integers in a batch.

property batches

_All_ batches, regardless of the task id.

get_batch_rows()

Get all rows of batches. The result depends on the task index.

get_batches_for_index(index: int)

Return all batches that contain a given index.

Arguments:

index: int

Index that should be contained in the returned batches.

Returns:

batches: List[List[List[int]]]

batches containing the given index
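As a rough illustration of this lookup, the sketch below filters a hand-written list of batches. It assumes each batch is a nested list of the form [row indices, column indices]; the helper name batches_containing is hypothetical and is not part of the library.

```python
def batches_containing(batches, index):
    """Keep every batch in which index occurs among the rows or the columns."""
    return [batch for batch in batches
            if any(index in axis for axis in batch)]


# Example batches for 4 entries and a batch size of 2 (upper triangle only).
example_batches = [
    [[0, 1], [0, 1]],
    [[0, 1], [2, 3]],
    [[2, 3], [2, 3]],
]

print(batches_containing(example_batches, 3))
```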

static linear_batch_list(size, batch_size)

List of indices that splits a list of size size into lists of length batch_size.

property n_entries

Total number of entries, i.e. the size of the range of integers to consider.

plot_batch_rows(figure=True, show=True, text_fontsize=10)

Generate a plot of the batches in a row for visualization.

Keyword arguments:

figure: bool

Create new matplotlib.pyplot.figure

default: True

show: bool

show plot

default: True

plot_batches(figure=True, show=True, text_fontsize=10)

Generate a plot of the batches for visualization.

Keyword arguments:

figure: bool

Create new matplotlib.pyplot.figure

default: True

show: bool

show plot

default: True

class madas.utils.JSONNumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Encoder to

default(obj: Any) -> Any

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
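
The same pattern can be applied to array-like objects. The following is a minimal sketch and not the implementation of JSONNumpyEncoder: the class name ToListEncoder is hypothetical, and it assumes that the objects to serialize expose a tolist() method, as NumPy arrays and scalars do. The standard-library array type stands in for NumPy to keep the example free of dependencies.

```python
import json
from array import array


class ToListEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts objects exposing tolist() to lists."""

    def default(self, o):
        # NumPy arrays and scalars provide tolist(); so does array.array,
        # which is used below in place of NumPy.
        if hasattr(o, "tolist"):
            return o.tolist()
        # Let the base class default method raise the TypeError.
        return super().default(o)


print(json.dumps({"values": array("d", [1.0, 2.5])}, cls=ToListEncoder))
```

An encoder class is passed to json.dumps through the cls argument, so existing serialization calls only need that one extra keyword.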

madas.utils.print_dict_tree(dict_: dict, indent: int = 0, indent_symbol: str = ' ') -> None

Recursively print keys of a dictionary.
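A recursive key printer of this kind might look like the sketch below. The function name print_keys is hypothetical, and the exact output format of print_dict_tree may differ; here it is assumed that each nesting level adds one indent_symbol in front of the key.

```python
def print_keys(dict_, indent=0, indent_symbol=" "):
    """Print every key, indenting nested keys by one symbol per level."""
    for key, value in dict_.items():
        print(indent_symbol * indent + str(key))
        if isinstance(value, dict):
            # Descend into nested dictionaries with a deeper indentation.
            print_keys(value, indent + 1, indent_symbol)


print_keys({"structure": {"atoms": 4, "cell": {"a": 1.0}}, "energy": -3.2},
           indent_symbol="  ")
```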

madas.utils.print_key_paths(key_name: str, dictionary: dict, parent_path: str = '', child_of: str | None = None) -> None

Iterate recursively through a dict and print all paths that end with a given key name.

Arguments:

key_name: str

Key to search paths for.

dictionary: dict

(Nested) dictionary in which the key path is searched.

Keyword arguments:

parent_path: str

path until current iteration

default: ""

child_of: str or None

if not None, print only paths that contain the string specified here

default: None
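
The search described here can be sketched as follows. This is an illustration under assumptions, not the library code: the helper find_key_paths is hypothetical, it returns the paths instead of printing them, and it assumes "/" as the path separator. A child_of filter is applied afterwards as a plain substring check.

```python
def find_key_paths(key_name, dictionary, parent_path=""):
    """Collect all '/'-separated paths that end with key_name."""
    paths = []
    for key, value in dictionary.items():
        path = f"{parent_path}/{key}" if parent_path else str(key)
        if key == key_name:
            paths.append(path)
        if isinstance(value, dict):
            # Continue the search below this key, carrying the path so far.
            paths.extend(find_key_paths(key_name, value, path))
    return paths


data = {"run": {"energy": 1.0, "system": {"energy": 2.0}}, "energy": 3.0}
all_paths = find_key_paths("energy", data)
# Keep only paths that contain the given string, similar to child_of.
system_paths = [p for p in all_paths if "system" in p]
print(all_paths)
print(system_paths)
```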

madas.utils.report_error(logger: Logger, error_message: str)

Report error by writing it to a logging.Logger instance or to sys.stderr.

Arguments:

logger: logging.Logger or None

Log target. Writes to the log, or to stderr if logger is None.

error_message: str

Message to write

Returns:

None

madas.utils.resolve_nested_dict(archive: dict, path: str, error_message='failed to resolve path', fail_on_key_error=False) -> Any

Given a nested dictionary (including lists), return the entry at path in the dictionary.

Arguments:

archive: Dict[Dict, List]

Nested dictionary with str or int keys

Example: {'a': {'b': [{'c': 5}]}, 'd': 'data'}

path: str

Keys to navigate the dictionary given as archive, separated by a '/'

Example: 'a/b/0/c' # return value 5

Keyword arguments:

error_message: str

Message to display if the path cannot be resolved

default: "failed to resolve path"

fail_on_key_error: bool

If the path cannot be resolved, raise the exception instead of returning None.

Returns:

archive: Any

Value at the specified location in the dictionary, or None if the path cannot be resolved.

Raises:

KeyError

Path can not be resolved and fail_on_key_error==True
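
The path resolution documented above can be sketched in a few lines. This is a simplified illustration, not the library implementation: the function resolve_path is hypothetical, it omits the error_message handling, and it assumes that dictionary keys are strings and that list positions are given as integers in the path.

```python
def resolve_path(archive, path, fail_on_key_error=False):
    """Follow a '/'-separated path through nested dicts and lists."""
    current = archive
    for key in path.split("/"):
        try:
            if isinstance(current, list):
                # Path components that address a list are parsed as integers.
                current = current[int(key)]
            else:
                current = current[key]
        except (KeyError, IndexError, ValueError, TypeError) as error:
            if fail_on_key_error:
                raise KeyError(path) from error
            return None
    return current


archive = {"a": {"b": [{"c": 5}]}, "d": "data"}
print(resolve_path(archive, "a/b/0/c"))
print(resolve_path(archive, "a/missing"))
```

Returning None by default keeps bulk extraction over many archives simple, while fail_on_key_error=True is useful when a missing entry should stop the run.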

madas.utils.rmsle(y_true, y_pred)

Root mean squared logarithmic error as used in the NOMAD Kaggle competition:

Sutton, C., Ghiringhelli, L.M., Yamamoto, T. et al. Crowd-sourcing materials-science challenges with the NOMAD 2018 Kaggle competition. npj Comput Mater 5, 111 (2019). https://doi.org/10.1038/s41524-019-0239-3

Arguments:

y_true: List[float]

List of true target values

y_pred: List[float]

List of predicted target values

Returns:

rmsle: float

Root mean squared logarithmic error
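
For reference, the metric is commonly defined as the square root of the mean of (log(1 + y_pred) - log(1 + y_true))^2. The sketch below computes it with the standard library only. The function name rmsle_sketch is hypothetical, and the library function may differ in details such as input validation or its use of NumPy.

```python
import math


def rmsle_sketch(y_true, y_pred):
    """Root mean squared logarithmic error for two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # log1p(x) computes log(1 + x) accurately, also for small x.
    squared_errors = [(math.log1p(pred) - math.log1p(true)) ** 2
                      for true, pred in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmsle_sketch([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Because the error is taken on a logarithmic scale, the metric penalizes relative rather than absolute deviations, and all values must be greater than -1.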

madas.utils.safe_log(message: str, logger: Logger | None = None, level: str = 'error')

Report a message by writing it to a logging.Logger instance or, if no logger is given, to stdout or stderr.

Arguments:

message: str

Message to write

Keyword arguments:

logger: logging.Logger or None

Log target. Writes to the log, or to stdout or stderr if logger is None.

default: None

level: str

Selects the log target: logger.info or logger.error if a logger is given, otherwise stdout or stderr.

Options: "error", "info"

default and fallback: "error"

Returns:

None
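
The dispatch described above might be sketched like this. It is not the library implementation: the function log_or_print is hypothetical, and it assumes that "info" maps to stdout and that every other level falls back to the error target.

```python
import logging
import sys


def log_or_print(message, logger=None, level="error"):
    """Write message to the logger if given, otherwise to stdout or stderr."""
    if level == "info":
        if logger is not None:
            logger.info(message)
        else:
            print(message, file=sys.stdout)
    else:
        # Any unknown level falls back to the error target.
        if logger is not None:
            logger.error(message)
        else:
            print(message, file=sys.stderr)


log_or_print("finished batch 3", level="info")
log_or_print("failed to load entry", logger=logging.getLogger("example"))
```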

madas.utils.seed_random_number_generators(random_seed)

Seed Python standard library random generator and numpy random generator.

Arguments:

random_seed: Any

Seed passed to random number generators.
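
Seeding both generators can be sketched as below. The function name seed_everything is hypothetical and this is not the library code. NumPy is imported optionally so that the example also runs without it. Note that numpy.random.seed accepts a narrower range of seed types than random.seed, so an integer seed is the safe choice.

```python
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional in this sketch.
    np = None


def seed_everything(random_seed):
    """Seed the standard-library and, if available, the NumPy generator."""
    random.seed(random_seed)
    if np is not None:
        np.random.seed(random_seed)


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

Seeding both generators with the same value makes runs that mix random and numpy.random reproducible.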