The data framework

The data framework is the basic building block of a data analysis workflow. Data gathered from different sources can be stored in a local database, which provides a fixed source of data that does not change even if the external data source is altered. Thus, the analysis is repeatable and the results remain reproducible.

You can learn more about backends in the backend module description.

You can learn more about fingerprints in the fingerprint module description.

The remaining parameters and methods can be found below.

class madas.data_framework.MaterialsDatabase(filename: str = 'materials_database.db', filepath: str = 'data', name: str | None = None, key_name: str = 'mid', api: object | None = None, backend: str | object = 'ase', log_mode: str = 'full')

A database wrapper to simplify materials data download from online repositories and study the similarity of materials based on different measures. Materials in the database can be accessed via material identifiers (mid).

Keyword arguments

filename: str

Name of database file. Ignored if a Backend is specified explicitly.

default: ‘materials_database.db’

filepath: str

Path of database file. Ignored if a Backend is specified explicitly.

default: ‘data’

key_name: str

Name of unique key used in the database backend. Ignored if a Backend is specified explicitly.

default: ‘mid’

api: madas.apis.api_core.APIClass object or None

API object that provides an interface to web databases. By default, the NOMAD Encyclopedia API is used.

default: None

backend: str or madas.backend.backend_core.Backend object

Name of the database backend to use, or a backend object. The default is ASE's AtomsDatabase.

default: ‘ase’

log_mode: str

Logging mode. Choose between:

“full”: Write to screen and log file

“silent”: Write to log file only

“stream”: Write to screen only

“None”: Do not log

default: “full”
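As an illustration of the four logging modes, the following sketch maps each mode onto handlers from Python's standard logging module. The helper build_logger and the log file name are hypothetical and not part of MADAS; the actual implementation may differ.

```python
import logging
import sys


def build_logger(log_mode="full", log_file="example.log"):
    """Hypothetical helper: translate a log_mode string into logging handlers."""
    if log_mode not in ("full", "silent", "stream", "None"):
        raise ValueError(f"unknown log_mode: {log_mode!r}")
    logger = logging.getLogger(f"log_mode_sketch.{log_mode}")
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep messages away from the root logger
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls
    if log_mode in ("full", "stream"):
        # Write to screen.
        logger.addHandler(logging.StreamHandler(sys.stdout))
    if log_mode in ("full", "silent"):
        # Write to file; delay=True opens the file only on the first message.
        logger.addHandler(logging.FileHandler(log_file, delay=True))
    if log_mode == "None":
        # Swallow all messages.
        logger.addHandler(logging.NullHandler())
    return logger
```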

Methods:

add_fingerprint(fp_type: str | type, name: str | None = None, show_progress: bool = True, force_calculate: bool = False, fingerprint_kwargs: dict = {}, **kwargs) None

Calculate fingerprints of all materials in the database and store them.

Arguments:

fp_type: str or type

Type of fingerprint X, must correspond to a XFingerprint() object

Fingerprint types can also be passed as types; the data will then be stored using the function Fingerprint().serialize(). Fingerprints deserialized from the database will be generic Fingerprint objects, and the similarity function is not set.

Keyword arguments:

name: str

Name of the fingerprint as used in the database, if name == None: name = fp_type. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.

default: None

force_calculate: bool

Force calculation of the fingerprint even if fingerprint data was stored in the database before.

default: False

fingerprint_kwargs: dict

Additional keyword arguments that are passed to Fingerprint().__init__.

default: {}

show_progress: bool

Show a progress bar during calculation.

default: True

Additional keyword arguments are passed to Fingerprint().calculate().
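The naming convention for fp_type (a string X corresponds to a class XFingerprint, while a type is used directly) can be sketched as follows. The classes, the registry dictionary, and the function resolve_fingerprint_class are hypothetical stand-ins for this illustration and do not use MADAS.

```python
class Fingerprint:
    """Stand-in base class for this sketch only."""


class ExampleFingerprint(Fingerprint):
    """Hypothetical fingerprint type that would be addressed as "Example"."""


def resolve_fingerprint_class(fp_type, known_classes):
    """Return the class for fp_type, given as a string X or as a type."""
    if isinstance(fp_type, type):
        return fp_type                      # a type is used as-is
    class_name = f"{fp_type}Fingerprint"    # "X" -> "XFingerprint"
    try:
        return known_classes[class_name]
    except KeyError:
        raise ValueError(f"Unknown fingerprint type: {fp_type!r}") from None


known = {"ExampleFingerprint": ExampleFingerprint}
print(resolve_fingerprint_class("Example", known).__name__)
# ExampleFingerprint
```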

add_fingerprints(fp_types: List[str | type], names: List[str] = [None], show_progress=False, fingerprint_kwargs_list: List[dict] | None = None, fingerprint_calculate_kwargs_list: List[dict] | None = None, force_calculate=False)

Calculate several fingerprints of each material in the database and store them. Because storing data can be slow for large databases, this method should be preferred over adding fingerprints one by one.

Arguments:

fp_types: List[str or type]

List of types of fingerprint X, must correspond to a XFingerprint() object

Keyword arguments:

names: List[str]

Names of the fingerprints as used in the database. For any name in the list that is None, the corresponding fp_type is used as the name. This parameter is used to distinguish between fingerprints in the database, i.e. each name must be unique for each unique fingerprint.

default: [None]

force_calculate: bool

Force calculation of the fingerprint even if fingerprint data was stored in the database before.

default: False

fingerprint_kwargs_list: List[dict]

List of additional keyword arguments that are passed to each Fingerprint().__init__.

default: None

fingerprint_calculate_kwargs_list: List[dict]

List of additional keyword arguments that are passed to each Fingerprint().calculate().

default: None

show_progress: bool

Show a progress bar during calculation.

default: False

Raises:

AssertionError: The numbers of kwargs passed to the __init__() and calculate() functions of the fingerprints are inconsistent. Thus, it is ambiguous which parameters correspond to which fingerprint.
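The consistency requirement behind this AssertionError can be sketched as follows. The helper pair_fingerprint_arguments is hypothetical and does not use MADAS; it only illustrates how one name and one set of keyword arguments are matched to each fingerprint type by position. The defaults chosen for omitted lists are assumptions of this sketch.

```python
def pair_fingerprint_arguments(fp_types, names=None, init_kwargs_list=None,
                               calculate_kwargs_list=None):
    """Hypothetical helper: align per-fingerprint arguments by position."""
    n = len(fp_types)
    # Assumed defaults: no custom names and no extra keyword arguments.
    if names is None:
        names = [None] * n
    if init_kwargs_list is None:
        init_kwargs_list = [{} for _ in range(n)]
    if calculate_kwargs_list is None:
        calculate_kwargs_list = [{} for _ in range(n)]
    # Every list needs exactly one entry per fingerprint type.
    assert len(names) == len(init_kwargs_list) == len(calculate_kwargs_list) == n, \
        "Argument lists must have one entry per fingerprint type."
    jobs = []
    for fp_type, name, init_kw, calc_kw in zip(
            fp_types, names, init_kwargs_list, calculate_kwargs_list):
        jobs.append({
            "fp_type": fp_type,
            "name": fp_type if name is None else name,  # name falls back to fp_type
            "init_kwargs": init_kw,
            "calculate_kwargs": calc_kw,
        })
    return jobs


# "X" and "Y" are placeholder fingerprint types.
jobs = pair_fingerprint_arguments(["X", "Y"], names=[None, "custom_y"])
print([job["name"] for job in jobs])
# ['X', 'custom_y']
```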

add_material(*args, **kwargs) None

Add a specified material to the database. Keyword arguments are passed to the api. Arguments passed to this function are used to construct the database mid.

add_property(mid, property_name, **kwargs)

Add a property to a specific material of the database through the API.

Arguments:

mid: str

Id of the material to update

property_name: str

Name of property for storage in the database

Keyword arguments are passed to the API to retrieve the property.

fill_database(*args, repeat_query: bool = False, **kwargs)

Fills the database with all materials matching the query. Parameters depend on the API that is used. To perform the query even though it has been performed before, set

repeat_query = True

See below for the documentation of the API functions.
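The effect of repeat_query can be pictured with a small stand-in that remembers which queries have already been run. The function run_query_once, the set of performed queries, and the fetch callback are hypothetical; how MADAS actually records queries is not specified here.

```python
import json


def run_query_once(performed_queries, query, fetch, repeat_query=False):
    """Hypothetical helper: skip a query that was run before unless forced."""
    # Serialize with sorted keys so the key does not depend on argument order.
    key = json.dumps(query, sort_keys=True)
    if key in performed_queries and not repeat_query:
        return []                  # nothing new is downloaded
    performed_queries.add(key)
    return fetch(query)            # delegate the actual download
```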

get_fingerprint(fp_type: str, mid: str, name: str | None = None, force_calculate=False, similarity_function: Callable | None = None, fingerprint_kwargs: dict = {}, **kwargs) Fingerprint

Get a single fingerprint of type fp_type for a material with id mid.

Arguments:

fp_type: str

Type of fingerprint X, must correspond to a XFingerprint() object defined elsewhere

mid: str

Material id of the requested material

Keyword arguments:

name: str

Name of the fingerprint as used in the database, if name == None: name = fp_type. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.

default: None

force_calculate: bool

Force calculation of the fingerprint even if fingerprint data was stored in the database before.

default: False

similarity_function: Callable

Similarity function that takes two fingerprints as arguments and calculates their similarity.

default: None

fingerprint_kwargs: dict

Additional keyword arguments that are passed to Fingerprint().__init__.

default: {}

Additional keyword arguments are passed to Fingerprint().calculate().

Returns:

Fingerprint() object

None if calculation of Fingerprint failed

Raises:

KeyError: No material with specified id in the database.
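The lookup rules described above can be pictured with a small stand-in. Everything in this sketch (the dictionary-based store, the function fetch_fingerprint, and the calculate callback) is hypothetical and does not use MADAS. It only mirrors the documented behaviour: the name falls back to fp_type, stored data is reused unless force_calculate is set, an unknown mid raises a KeyError, and a failed calculation yields None.

```python
def fetch_fingerprint(store, fp_type, mid, calculate, name=None,
                      force_calculate=False):
    """Hypothetical helper mirroring the documented lookup rules."""
    if mid not in store:
        raise KeyError(f"No material with id {mid!r} in the database.")
    name = fp_type if name is None else name
    entry = store[mid]
    stored = entry.get("fingerprints", {})
    if name in stored and not force_calculate:
        return stored[name]                # reuse previously stored data
    try:
        return calculate(entry["data"])    # (re)calculate from the material data
    except Exception:
        return None                        # calculation failed


store = {"a": {"data": [1, 2, 3], "fingerprints": {"X": "stored"}}}
print(fetch_fingerprint(store, "X", "a", calculate=sum))
# stored
print(fetch_fingerprint(store, "X", "a", calculate=sum, force_calculate=True))
# 6
```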

get_fingerprints(fp_type: str | type, name: str | None = None, fingerprint_kwargs: dict = {}, force_calculate=False, similarity_function: Callable | None = None, show_progress: bool = True, **kwargs) List[Fingerprint]

Get fingerprints of type fp_type for all materials in the database.

Generates fingerprints if they don’t exist. To retrieve existing fingerprints,

Arguments:

fp_type: str or type

Type of fingerprint X, must correspond to a XFingerprint() object defined elsewhere

Keyword arguments:

name: str

Name of the fingerprint as used in the database, if name == None: name = fp_type. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.

default: None

force_calculate: bool

Force calculation of the fingerprint even if fingerprint data was stored in the database before.

default: False

fingerprint_kwargs: dict

Additional keyword arguments that are passed to Fingerprint().__init__.

default: {}

similarity_function: Callable

Similarity function that takes two fingerprints as arguments and calculates their similarity.

default: None

show_progress: bool

Show a progress bar during calculation.

default: True

Additional keyword arguments are passed to Fingerprint().calculate().

Returns:

List[Fingerprint() object or None]

get_metadata() dict

Get the metadata of the database.

get_properties(property_name: str, output_mids: bool = False, show_progress: bool = True) List[Any]

Get a list of properties for all materials of the database.

Arguments:

property_name: str

Name or path in Material object of the requested property

Example:

“code_name” –> stored in Material().data[“code_name”] retrieves the code used for the DFT calculation

“electronic_dos/dos_values” –> Material().data[“electronic_dos”][“dos_values”] retrieves the DOS values of the material

Keyword arguments:

output_mids: bool

Output the list of corresponding material ids together with the properties

default: False

show_progress: bool

Show a progress bar.

default: True

Returns:

properties: list

List of all properties of materials. If a property cannot be retrieved, it is set to None.

mids: list

List of material ids. Only returned, as the second element of a tuple (properties, mids), if output_mids == True.
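To make the path notation concrete, the sketch below resolves a slash-separated path in nested dictionaries and collects the values for several materials. The helper names resolve_path and collect_properties, as well as the example data, are hypothetical; the code does not call MADAS and only mirrors the behaviour described above (missing properties become None, and the mids are returned only on request).

```python
def resolve_path(data, property_path):
    """Follow a path such as "electronic_dos/dos_values" through nested dicts."""
    value = data
    for key in property_path.split("/"):
        if not isinstance(value, dict) or key not in value:
            return None            # property cannot be retrieved
        value = value[key]
    return value


def collect_properties(materials, property_path, output_mids=False):
    """Collect one property for all materials; optionally also return the mids."""
    mids = list(materials)
    properties = [resolve_path(materials[mid], property_path) for mid in mids]
    return (properties, mids) if output_mids else properties


# Made-up example data: material "b" has no DOS stored.
materials = {
    "a": {"code_name": "code-1", "electronic_dos": {"dos_values": [0.1, 0.4]}},
    "b": {"code_name": "code-2"},
}
print(collect_properties(materials, "electronic_dos/dos_values"))
# [[0.1, 0.4], None]
```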

get_property(mid: str, property_name: str) Any

Get a property of a single material specified by name or path in Material object from the database.

Arguments:

mid: string

Material id of the requested material

property_name: str

Name or path in Material object of the requested property

Example:

“code_name” –> stored in Material().data[“code_name”] retrieves the code used for the DFT calculation

“electronic_dos/dos_values” –> Material().data[“electronic_dos”][“dos_values”] retrieves the DOS values of the material

Returns:

property: Any –> property value if it exists in the database

None –> else

get_property_dataframe(property_paths: List[str]) DataFrame

Generate a pandas DataFrame object that contains a table of properties.

Arguments:

property_paths: List[str]

List of property paths that point to the respective property.

Returns:

property_dataframe: pandas.DataFrame

Dataframe, where the index contains the mids and the columns the respective properties.

get_random(return_mid: bool = True) Material

Returns a random material from the database.

Keyword arguments:

return_mid: bool

Return the id of the material instead of the material itself.

default: True

Returns:

Material id (str) of a random entry of the database: if return_mid == True

Material object of a random entry of the database: else

get_similarity_matrix(fp_type: str | type, name: str | None = None, dtype=<class 'numpy.float64'>, **kwargs) SimilarityMatrix

Calculate a SimilarityMatrix() object from Fingerprints of type fp_type from all entries of the database. If fingerprints for some entries can not be calculated, they will be excluded from the database.

Arguments:

fp_type: str or type

Type of fingerprint X, must correspond to a XFingerprint() object defined elsewhere

Keyword arguments:

name: str

Name of the fingerprint as used in the database, if name == None: name = fp_type. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.

default: None

dtype: type

Data type used by SimilarityMatrix to store similarities

default: numpy.float64

Additional keyword arguments are passed to SimilarityMatrix().calculate().

Returns:

SimilarityMatrix() object

Raises:

ValueError: Fingerprints could not be obtained, no matrix can be calculated
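Conceptually, a similarity matrix holds the pairwise similarities of all fingerprints that could be obtained. The following sketch builds such a matrix with plain Python lists. The function pairwise_similarity, the use of sets as fingerprints, and the Tanimoto (Jaccard) similarity are illustrative assumptions and not the MADAS SimilarityMatrix implementation.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def pairwise_similarity(fingerprints, similarity=tanimoto):
    """Return (mids, matrix) for all entries whose fingerprint is not None."""
    valid = {mid: fp for mid, fp in fingerprints.items() if fp is not None}
    if not valid:
        raise ValueError("Fingerprints could not be obtained, "
                         "no matrix can be calculated.")
    mids = list(valid)
    n = len(mids)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = float(similarity(valid[mids[i]], valid[mids[j]]))
            matrix[i][j] = matrix[j][i] = value    # the matrix is symmetric
    return mids, matrix


# Entry "c" has no fingerprint and is left out of the matrix.
mids, matrix = pairwise_similarity({"a": {1, 2}, "b": {2, 3}, "c": None})
print(mids)
# ['a', 'b']
```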

set_name(name: str | None)

Set name of the database.

update_entries(mid_list: List[str], dictionary_list: List[dict]) None

Update a list of entries in the database.

Usage:

MaterialsDatabase().update_entries([“a”, “b”], [{“key1”: value1}, {“key2”: value2}])

–> updates parameter key1 of material “a” with value1 and key2 of material “b” with value2

Arguments:

mid_list: List[str]

List of mids of the corresponding database entries

dictionary_list: List[dict]

List of dictionaries that contain the data to update, one dictionary per mid
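The pairing of ids and dictionaries can be sketched with a dictionary-backed stand-in. The function apply_updates and the store are hypothetical and do not use MADAS; in particular, the length check and the KeyError for unknown ids are assumptions of this sketch.

```python
def apply_updates(store, mid_list, dictionary_list):
    """Hypothetical helper: update entry mid_list[i] with dictionary_list[i]."""
    if len(mid_list) != len(dictionary_list):
        raise ValueError("mid_list and dictionary_list must have the same length.")
    for mid, updates in zip(mid_list, dictionary_list):
        store[mid].update(updates)    # raises KeyError for an unknown mid


store = {"a": {"key1": 0}, "b": {}}
apply_updates(store, ["a", "b"], [{"key1": 1}, {"key2": 2}])
print(store)
# {'a': {'key1': 1}, 'b': {'key2': 2}}
```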

update_entry(mid: str, **kwargs) None

Update a single entry of the database from a given dictionary.

Usage:

MaterialsDatabase().update_entry(“a”, key=value)

–> updates parameter key of material “a” with value

Arguments:

mid: str

Mid of the corresponding database entry

Additional keyword arguments are used to update the database entries.