About this package

MADAS is a Python framework for computing similarities between materials.

It was written to simplify the analysis of materials data from various (or changing) sources. It streamlines data analytics tasks such as similarity searches and clustering, and it enhances reproducibility.

The figure below presents an overview of the framework structure.

[Figure: overview of the MADAS framework structure (_images/WorkflowDiagram.png)]

The left side of the figure shows the data-management components: External APIs can be queried using API Interfaces. In addition to the built-in classes, it is easy to define new API Interfaces, which will be compatible with the rest of the framework.
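To illustrate the idea of a pluggable API Interface, here is a minimal sketch. It does not reproduce the MADAS classes: the names APIInterface, InMemoryAPI, and get_entry are hypothetical, and a plain dictionary stands in for an external service.

```python
from abc import ABC, abstractmethod


class APIInterface(ABC):
    """Hypothetical base class: anything that can fetch one entry by identifier."""

    @abstractmethod
    def get_entry(self, entry_id: str) -> dict:
        """Return the normalized record for entry_id."""


class InMemoryAPI(APIInterface):
    """Toy interface backed by a dict instead of a remote service (assumption)."""

    def __init__(self, records: dict):
        self._records = records

    def get_entry(self, entry_id: str) -> dict:
        record = self._records[entry_id]
        # Normalize to one common shape so downstream code does not
        # depend on where the data came from.
        return {"mid": entry_id, "data": dict(record)}


api = InMemoryAPI({"mat-1": {"formula": "NaCl"}})
entry = api.get_entry("mat-1")
```

Because every interface returns the same shape, code that consumes entries needs no changes when a data source is swapped.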

API Interfaces return data as Material objects, which provide a consistent way of exchanging data between MADAS components. Each Material contains a unique identifier, the Material IDentifier (mid); the atomic structure (if available) as an ASE Atoms object; and material data and properties.
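The contents of such an object can be pictured as a small container. The sketch below is an illustration only, not the MADAS Material class: the name MaterialSketch and its field layout (separate data and properties dictionaries) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MaterialSketch:
    """Hypothetical stand-in for a material container."""

    mid: str                     # unique material identifier
    atoms: Optional[Any] = None  # atomic structure, e.g. an ase.Atoms object, if available
    data: dict = field(default_factory=dict)        # material data
    properties: dict = field(default_factory=dict)  # material properties


# A material without a known structure is still a valid object.
material = MaterialSketch(mid="mat-1", data={"formula": "NaCl"})
```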

The database is used to interact with data: it receives data from API Interfaces and writes them to a backend, which is responsible for the physical storage of data, e.g., on a hard drive. Currently, two backends are implemented: a relational database using the ASE AtomsDatabase, and a simple storage based on Python dictionaries.
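The separation between database and backend can be sketched as follows. This is a simplified illustration under assumed names (DictBackend, DatabaseSketch, write, read, add, get), not the MADAS implementation.

```python
class DictBackend:
    """Hypothetical backend that keeps records in a Python dictionary."""

    def __init__(self):
        self._store = {}

    def write(self, mid: str, record: dict) -> None:
        self._store[mid] = record

    def read(self, mid: str) -> dict:
        return self._store[mid]

    def __len__(self) -> int:
        return len(self._store)


class DatabaseSketch:
    """Hypothetical database front end: delegates all storage to its backend."""

    def __init__(self, backend):
        self.backend = backend

    def add(self, mid: str, record: dict) -> None:
        self.backend.write(mid, record)

    def get(self, mid: str) -> dict:
        return self.backend.read(mid)


db = DatabaseSketch(DictBackend())
db.add("mat-1", {"formula": "NaCl"})
record = db.get("mat-1")
```

Any object that offers the same write and read methods could replace DictBackend here without changes to DatabaseSketch.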

Material objects from the database can be used to generate Fingerprints, each of which combines a descriptor of a material with a similarity measure. New types of fingerprints can be defined quickly using the Fingerprint base class.
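The pairing of descriptor and similarity measure can be illustrated with a small sketch. The class and method names below (FingerprintSketch, calculate, get_similarity, ElementSetFingerprint) are hypothetical and are not taken from the MADAS API. The example descriptor is the set of chemical elements, compared with the Jaccard (Tanimoto) coefficient.

```python
from abc import ABC, abstractmethod


class FingerprintSketch(ABC):
    """Hypothetical base class: a descriptor plus a similarity measure."""

    def __init__(self, mid: str, data: dict):
        self.mid = mid
        self.descriptor = self.calculate(data)

    @abstractmethod
    def calculate(self, data: dict):
        """Turn material data into a descriptor."""

    @abstractmethod
    def get_similarity(self, other: "FingerprintSketch") -> float:
        """Return a similarity score between 0 and 1."""


class ElementSetFingerprint(FingerprintSketch):
    """Descriptor: set of elements. Similarity: Jaccard coefficient."""

    def calculate(self, data: dict):
        return frozenset(data["elements"])

    def get_similarity(self, other: "FingerprintSketch") -> float:
        union = self.descriptor | other.descriptor
        if not union:
            return 1.0  # two empty sets are treated as identical
        return len(self.descriptor & other.descriptor) / len(union)


a = ElementSetFingerprint("mat-1", {"elements": ["Na", "Cl"]})
b = ElementSetFingerprint("mat-2", {"elements": ["K", "Cl"]})
similarity = a.get_similarity(b)  # 1 shared element / 3 distinct elements
```

A new fingerprint type only has to supply the two abstract methods; everything that consumes fingerprints can stay unchanged.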

Fingerprints can be used to calculate similarity matrices. These store the similarity scores between materials together with the respective mids. By default, the SimilarityMatrix class parallelizes the calculation over all available CPU cores and computes only the unique entries of the matrix. Very large matrices can be computed with the BatchedSimilarityMatrix, which allows the computation to be parallelized across HPC clusters; currently, the SLURM resource manager is supported.
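Computing only the unique entries can be sketched as follows. The sketch assumes a symmetric similarity measure with a self-similarity of 1, so only the pairs above the diagonal need to be evaluated. The function names are hypothetical, and a thread pool stands in for the CPU-core parallelism described above; this is not the SimilarityMatrix implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Symmetric toy similarity measure on sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(mids, descriptors, measure=jaccard):
    """Evaluate each unordered pair once and mirror the result."""
    n = len(mids)
    pairs = list(combinations(range(n), 2))  # entries above the diagonal
    with ThreadPoolExecutor() as pool:
        scores = list(
            pool.map(lambda p: measure(descriptors[p[0]], descriptors[p[1]]), pairs)
        )
    # Assumption: every material is fully similar to itself.
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), score in zip(pairs, scores):
        matrix[i][j] = matrix[j][i] = score
    return {"mids": list(mids), "matrix": matrix}


result = similarity_matrix(
    ["mat-1", "mat-2", "mat-3"],
    [frozenset({"Na", "Cl"}), frozenset({"K", "Cl"}), frozenset({"Na", "Cl"})],
)
```

For n materials this evaluates n(n-1)/2 pairs instead of n^2, and the mids are stored next to the scores so that rows and columns stay traceable to materials.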

Additional AI tools for clustering and data analysis are available and are optimized for use with Fingerprint and SimilarityMatrix objects.