General utility
- class madas.utils.BatchIterator(n_entries: int, batch_size: int, n_tasks: int = 1, task_id: int = 0, symmetric: bool = True)
A helper class for parallelizing the calculation of large, batched similarity matrices.
A batch is defined here as a nested list of integers describing the indices of a sub-matrix.
To parallelize the calculation of large matrices across different nodes of a compute cluster, each node may compute several sub-matrices that can be combined into the complete matrix.
To reduce the memory footprint, each node loads only a subset of the required fingerprints and calculates the (overlap) similarity matrix from these.
This class iteratively provides the mapping of fingerprint indices in a given list. Thus, upon iteration over an object of this class, all combinations of indices that are required to compute a unique sub-matrix for a specific task id are returned.
- property batch_size
Range of integers in a batch.
- property batches
_All_ batches, regardless of the task id.
- get_batch_rows()
Get all rows of batches. Note that the result depends on the task id.
- get_batches_for_index(index: int)
Return all batches that contain a given index.
Arguments:
- index: int
Index that should be contained in the returned batches.
Returns:
- batches: List[List[List[int]]]
batches containing the given index
- static linear_batch_list(size, batch_size)
List of index lists that splits a list of length size into lists of length batch_size.
- property n_entries
Total number of entries, i.e. the size of the range of integers to consider.
- plot_batch_rows(figure=True, show=True, text_fontsize=10)
Generate a plot of the batches in a row for visualization.
Keyword arguments:
- figure: bool
Create new matplotlib.pyplot.figure
default: True
- show: bool
show plot
default: True
- plot_batches(figure=True, show=True, text_fontsize=10)
Generate a plot of the batches for visualization.
Keyword arguments:
- figure: bool
Create new matplotlib.pyplot.figure
default: True
- show: bool
show plot
default: True
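The batching scheme described for BatchIterator can be illustrated with a minimal sketch. It does not reproduce the madas implementation: the helper names split_indices and blocks_for_task are hypothetical, and the round-robin assignment of sub-matrix blocks to tasks is an assumption made for illustration only.

```python
from itertools import combinations_with_replacement, product


def split_indices(size, batch_size):
    """Split range(size) into consecutive lists of at most batch_size indices."""
    return [list(range(start, min(start + batch_size, size)))
            for start in range(0, size, batch_size)]


def blocks_for_task(n_entries, batch_size, n_tasks=1, task_id=0, symmetric=True):
    """Return the [row indices, column indices] blocks assigned to one task.

    Assumption: blocks are dealt out round-robin; the real class may differ.
    """
    chunks = split_indices(n_entries, batch_size)
    if symmetric:
        # A symmetric matrix only needs the upper triangle, diagonal included.
        pairs = combinations_with_replacement(range(len(chunks)), 2)
    else:
        pairs = product(range(len(chunks)), repeat=2)
    return [[chunks[i], chunks[j]]
            for k, (i, j) in enumerate(pairs)
            if k % n_tasks == task_id]
```

With 4 entries, a batch size of 2 and two tasks, the symmetric case has three blocks in total: under this assumed scheme task 0 receives the two diagonal blocks and task 1 the single off-diagonal block.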
- class madas.utils.JSONNumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
JSON encoder for objects that contain NumPy data.
- default(obj: Any) → Any
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
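As an illustration of how a default override can make array-like objects serializable, here is a minimal sketch. It is not the madas implementation: the class names ToListEncoder and FakeArray are hypothetical, and the approach relies on the fact that NumPy arrays and NumPy scalars expose a tolist() method. FakeArray only stands in for a NumPy array so that the sketch runs without NumPy installed.

```python
import json


class ToListEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts objects exposing tolist() to plain Python."""

    def default(self, o):
        # NumPy arrays and NumPy scalars both provide a tolist() method.
        tolist = getattr(o, "tolist", None)
        if callable(tolist):
            return tolist()
        # Let the base class raise the TypeError for unsupported objects.
        return super().default(o)


class FakeArray:
    """Stand-in for a NumPy array (assumption for this self-contained sketch)."""

    def __init__(self, values):
        self.values = values

    def tolist(self):
        return list(self.values)


encoded = json.dumps({"x": FakeArray([1, 2, 3])}, cls=ToListEncoder)
print(encoded)  # {"x": [1, 2, 3]}
```

An encoder of this kind is passed to json.dump or json.dumps through the cls argument, as shown in the last lines.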
- madas.utils.print_dict_tree(dict_: dict, indent: int = 0, indent_symbol: str = ' ') → None
Recursively print keys of a dictionary.
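A recursive key printout of this kind might look like the following sketch. The function dict_tree_lines is a hypothetical stand-in that returns the lines instead of printing them directly, and increasing the indentation by one symbol per nesting level is an assumption.

```python
def dict_tree_lines(dict_, indent=0, indent_symbol=" "):
    """Return one line per key, indented according to its nesting depth."""
    lines = []
    for key, value in dict_.items():
        lines.append(indent_symbol * indent + str(key))
        if isinstance(value, dict):
            # Descend into nested dictionaries with one more indent level.
            lines.extend(dict_tree_lines(value, indent + 1, indent_symbol))
    return lines


example = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
print("\n".join(dict_tree_lines(example, indent_symbol="  ")))
```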
- madas.utils.print_key_paths(key_name: str, dictionary: dict, parent_path: str = '', child_of: str | None = None) → None
Iterate recursively through a dict and print all paths that end with a given key name.
Arguments:
- key_name: str
Key name to search paths for.
- dictionary: dict
(Nested) dictionary in which the key path is searched.
Keyword arguments:
- parent_path: str
path until current iteration
default: “”
- child_of: str or None
if not None, print only paths that contain the string specified here
default: None
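The search described above can be sketched as follows. The function find_key_paths is hypothetical and collects the paths in a list rather than printing them. The "/" separator, the substring test for child_of, and the restriction to nested dictionaries (lists are not traversed) are assumptions, not confirmed behavior of madas.

```python
def find_key_paths(key_name, dictionary, parent_path="", child_of=None):
    """Collect all '/'-separated paths whose last element equals key_name."""
    paths = []
    for key, value in dictionary.items():
        path = f"{parent_path}/{key}" if parent_path else str(key)
        if key == key_name and (child_of is None or child_of in path):
            paths.append(path)
        if isinstance(value, dict):
            # Continue the search below this key, extending the current path.
            paths.extend(find_key_paths(key_name, value, path, child_of))
    return paths


data = {"run": {"energy": 1, "system": {"energy": 2}}, "energy": 3}
print(find_key_paths("energy", data))
print(find_key_paths("energy", data, child_of="system"))
```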
- madas.utils.report_error(logger: Logger, error_message: str)
Report error by writing it to a logging.Logger instance or to sys.stderr.
Arguments:
- logger: logging.Logger or None
Log target. Write to the log, or to stderr if logger is None.
- error_message: str
Message to write
Returns:
None
- madas.utils.resolve_nested_dict(archive: dict, path: str, error_message='failed to resolve path', fail_on_key_error=False) → Any
Given a nested dictionary (including lists), return the entry at path in the dictionary.
Arguments:
- archive: Dict[Dict, List]
Nested dictionary with str or int keys
Example: {'a': {'b': [{'c': 5}]}, 'd': 'data'}
- path: str
Keys to navigate the dictionary given as archive, separated by a '/'
Example: 'a/b/0/c' # return value 5
Keyword arguments:
- error_message: str
Message to display if the path can not be resolved
default: “failed to resolve path”
- fail_on_key_error: bool
If the path cannot be resolved, raise the exception instead of returning None.
Returns:
- archive: Any
Value at the specified location in the dictionary. None if the path cannot be resolved.
Raises:
- KeyError
Path can not be resolved and fail_on_key_error==True
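The path resolution can be illustrated with a minimal sketch. The function resolve_path is a hypothetical simplification, not the madas code: it treats path elements as string keys for dictionaries and as integer positions for lists, and it omits the error_message handling.

```python
def resolve_path(archive, path, fail_on_key_error=False):
    """Follow a '/'-separated path through nested dicts and lists."""
    current = archive
    for key in path.split("/"):
        try:
            if isinstance(current, list):
                current = current[int(key)]  # list elements are addressed by position
            else:
                current = current[key]
        except (KeyError, IndexError, ValueError, TypeError) as error:
            if fail_on_key_error:
                raise KeyError(f"failed to resolve path: {path}") from error
            return None
    return current


archive = {"a": {"b": [{"c": 5}]}, "d": "data"}
print(resolve_path(archive, "a/b/0/c"))  # 5
print(resolve_path(archive, "a/x"))      # None
```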
- madas.utils.rmsle(y_true, y_pred)
Root mean squared logarithmic error as used in the NOMAD Kaggle competition:
Sutton, C., Ghiringhelli, L.M., Yamamoto, T. et al. Crowd-sourcing materials-science challenges with the NOMAD 2018 Kaggle competition. npj Comput Mater 5, 111 (2019). https://doi.org/10.1038/s41524-019-0239-3
Arguments:
- y_true: List[float]
List of true target values
- y_pred: List[float]
List of predicted target values
Returns:
- rmsle: float
Root mean squared logarithmic error
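For reference, the root mean squared logarithmic error is commonly defined as the square root of the mean of (log(1 + y_pred) - log(1 + y_true))^2. The sketch below implements this definition with the standard library only; the name rmsle_sketch is hypothetical, and the madas function may differ in details such as input validation or array handling.

```python
import math


def rmsle_sketch(y_true, y_pred):
    """Root mean squared logarithmic error using only the standard library."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2  # log1p(x) = log(1 + x)
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


print(rmsle_sketch([1.0, 2.0], [1.0, 2.0]))  # 0.0
```

Because the error is computed on log(1 + y), it penalizes relative rather than absolute deviations and requires all values to be greater than -1.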
- madas.utils.safe_log(message: str, logger: Logger | None = None, level: str = 'error')
Report a message by writing it to a logging.Logger instance or to stdout/stderr.
Arguments:
- message: str
Message to write
Keyword arguments:
- logger: logging.Logger or None
Log target. Write to the log, or to stderr if logger is None.
default: None
- level: str
Choose the log target: logger.info or logger.error if a logger is given, otherwise stdout or stderr.
Options: “error”, “info”
default, and fallback, is “error”
Returns:
None
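The fallback behavior described above might be implemented along these lines. The function safe_log_sketch is hypothetical, and the mapping of "info" to stdout and "error" to stderr when no logger is given is an assumption based on the description of the level argument.

```python
import logging
import sys


def safe_log_sketch(message, logger=None, level="error"):
    """Write message to the logger if given, otherwise to stdout or stderr."""
    if level not in ("error", "info"):
        level = "error"  # documented default and fallback
    if logger is not None:
        # Dispatch to logger.info or logger.error.
        getattr(logger, level)(message)
    else:
        stream = sys.stdout if level == "info" else sys.stderr
        print(message, file=stream)


safe_log_sketch("calculation finished", level="info")            # goes to stdout
safe_log_sketch("something failed", logging.getLogger("demo"))   # goes to the logger
```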
- madas.utils.seed_random_number_generators(random_seed)
Seed Python standard library random generator and numpy random generator.
Arguments:
- random_seed: Any
Seed passed to random number generators.
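Seeding both generators can be sketched as shown below. The function seed_generators_sketch is hypothetical. Using the legacy global numpy.random.seed and skipping NumPy when it is not installed are assumptions made so that the example is self-contained; the madas function may seed NumPy differently.

```python
import random


def seed_generators_sketch(random_seed):
    """Seed the standard library generator and, if installed, NumPy's."""
    random.seed(random_seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy not installed; only the stdlib generator was seeded.
    # Legacy global seeding; expects an integer or an array of integers.
    np.random.seed(random_seed)


seed_generators_sketch(42)
first = random.random()
seed_generators_sketch(42)
second = random.random()  # same value as first, because the seed is identical
```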