The data framework
The data framework is the basic building block of a data analysis workflow. Data gathered from different sources can be stored in a local database, which provides a fixed source of data that does not change, e.g. if the external data source is altered. Thus, the analysis is repeatable and the results remain reproducible.
You can learn more about backends in the backend module description.
You can learn more about fingerprints in the fingerprint module description.
The remaining parameters and methods can be found below.
- class madas.data_framework.MaterialsDatabase(filename: str = 'materials_database.db', filepath: str = 'data', name: str | None = None, key_name: str = 'mid', api: object | None = None, backend: str | object = 'ase', log_mode: str = 'full')
A database wrapper to simplify materials data download from online repositories and study the similarity of materials based on different measures. Materials in the database can be accessed via material identifiers (mid).
Keyword arguments
- filename: str
Name of database file. Ignored if a Backend is specified explicitly.
default: ‘materials_database.db’
- filepath: str
Path of database file. Ignored if a Backend is specified explicitly.
default: ‘data’
- key_name: str
Name of unique key used in the database backend. Ignored if a Backend is specified explicitly.
default: ‘mid’
- api: madas.apis.api_core.APIClass object or None
API object that provides an interface to web databases. If None, the NOMAD Encyclopedia API is used.
default: None
- backend: str or madas.backend.backend_core.Backend object
Name of database backend to use or backend object. Default is ASE's AtomsDatabase.
default: ‘ase’
- log_mode: str
Logging mode: choose between:
“full”: Write to screen and log file
“silent” : Write to file only
“stream” : Write to screen only
“None” : Do not log
default: “full”
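The four log_mode values map naturally onto handlers of Python's standard logging module. The following is a minimal sketch of such a mapping, not the implementation used by MaterialsDatabase; the helper name make_logger and the log file name are assumptions made for illustration.

```python
import logging


def make_logger(log_mode: str, logfile: str = "example.log") -> logging.Logger:
    """Return a logger configured for one of the documented log modes.

    Hypothetical helper: it only illustrates what the modes mean.
    """
    if log_mode not in ("full", "silent", "stream", "None"):
        raise ValueError(f"Unknown log_mode: {log_mode!r}")
    logger = logging.getLogger(f"example.{log_mode}")
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False
    logger.setLevel(logging.INFO)
    if log_mode in ("full", "stream"):
        # "full" and "stream" write to the screen
        logger.addHandler(logging.StreamHandler())
    if log_mode in ("full", "silent"):
        # "full" and "silent" write to a file; delay=True opens the file
        # only when the first record is written
        logger.addHandler(logging.FileHandler(logfile, delay=True))
    if log_mode == "None":
        # "None" discards all records
        logger.addHandler(logging.NullHandler())
    return logger
```

With this sketch, make_logger("silent") returns a logger that has a file handler only, so nothing is printed to the screen.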
Methods:
- add_fingerprint(fp_type: str | type, name: str | None = None, show_progress: bool = True, force_calculate: bool = False, fingerprint_kwargs: dict = {}, **kwargs) None
Calculate fingerprints of all materials in the database and store them.
Arguments:
- fp_type: str or type
Type of fingerprint X, must correspond to a XFingerprint() object
Fingerprint types can also be passed as types; the data will then be stored using the function Fingerprint().serialize(). Deserialized fingerprints from the database will be generic Fingerprint objects and the similarity function is not set.
Keyword arguments:
- name: str
Name of the fingerprint as used in the database, if name == None: name = fp_type. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.
default: None
- force_calculate: bool
Force calculation of the fingerprint even if fingerprint data was stored in the database before.
default: False
- fingerprint_kwargs: dict
Additional keyword arguments that are passed to Fingerprint().__init__.
default: {}
- show_progress: bool
Show a progress bar during calculation.
default: True
Additional keyword arguments are passed to Fingerprint().calculate().
- add_fingerprints(fp_types: List[str | type], names: List[str] = [None], show_progress=False, fingerprint_kwargs_list: List[dict] | None = None, fingerprint_calculate_kwargs_list: List[dict] | None = None, force_calculate=False)
Calculate several fingerprints of each material in the database and store them. Because storing data can be slow for large databases, this method should be preferred over adding fingerprints one by one.
Arguments:
- fp_types: List[str]
List of types of fingerprint X, must correspond to a XFingerprint() object
Keyword arguments:
- names: List[str]
Names of the fingerprint as used in the database, if name == None: name = fp_type for any name in the list. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.
default: None
- force_calculate: bool
Force calculation of the fingerprint even if fingerprint data was stored in the database before.
default: False
- fingerprint_kwargs_list: List[dict]
List of additional keyword arguments that are passed to each Fingerprint().__init__.
default: None
- fingerprint_calculate_kwargs_list: List[dict]
List of additional keyword arguments that are passed to each Fingerprint().calculate().
default: None
- show_progress: bool
Show a progress bar during calculation.
default: False
Raises:
AssertionError: The numbers of keyword argument sets passed to the __init__() and calculate() functions of the fingerprints are inconsistent. Thus, it is ambiguous which parameters correspond to which fingerprint.
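The consistency requirement behind this AssertionError can be pictured as follows. This is a sketch under the assumption that every list must hold exactly one entry per fingerprint type; the helper name pair_fingerprint_arguments is hypothetical and the code is not taken from the library.

```python
from typing import List, Optional, Tuple


def pair_fingerprint_arguments(
    fp_types: List[str],
    names: Optional[List[Optional[str]]] = None,
    init_kwargs: Optional[List[dict]] = None,
    calc_kwargs: Optional[List[dict]] = None,
) -> List[Tuple[str, str, dict, dict]]:
    """Pair every fingerprint type with its name and keyword arguments."""
    n = len(fp_types)
    # Missing lists fall back to "no name given" / "no extra arguments".
    names = [None] * n if names is None else names
    init_kwargs = [{} for _ in range(n)] if init_kwargs is None else init_kwargs
    calc_kwargs = [{} for _ in range(n)] if calc_kwargs is None else calc_kwargs
    # Lists of different lengths would make the pairing ambiguous.
    assert len(names) == len(init_kwargs) == len(calc_kwargs) == n, (
        "Inconsistent number of fingerprint arguments"
    )
    # A name of None defaults to the fingerprint type itself.
    resolved = [fp if name is None else name for fp, name in zip(fp_types, names)]
    return list(zip(fp_types, resolved, init_kwargs, calc_kwargs))
```

Passing two fingerprint types together with a single dictionary of __init__ arguments fails the assertion, because it is unclear which fingerprint the dictionary belongs to.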
- add_material(*args, **kwargs) None
Add a specified material to the database. Keyword arguments are passed to the api. Arguments passed to this function are used to construct the database mid.
- add_property(mid, property_name, **kwargs)
Add a property to a specific material of the database through the API.
Arguments:
- mid: str
Id of the material to update
- property_name: str
Name of property for storage in the database
Keyword arguments are passed to the API to retrieve the property.
- fill_database(*args, repeat_query: bool = False, **kwargs)
Fills the database with all materials matching the query. Parameters depend on the API that is used. To perform the query even though it has been performed before, set repeat_query = True.
See below for the documentation of the API functions.
- get_fingerprint(fp_type: str, mid: str, name: str | None = None, force_calculate=False, similarity_function: Callable | None = None, fingerprint_kwargs: dict = {}, **kwargs) Fingerprint
Get a single fingerprint of type fp_type for a material with id mid.
Arguments:
- fp_type: str
Type of fingerprint X, must correspond to a XFingerprint() object defined elsewhere
- mid: str
Material id of the requested material
Keyword arguments:
- name: str
Name of the fingerprint as used in the database, if name == None: name = fp_type. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.
default: None
- force_calculate: bool
Force calculation of the fingerprint even if fingerprint data was stored in the database before.
default: False
- similarity_function: Callable
Similarity function that takes two fingerprints as arguments and calculates their similarity.
default: None
- fingerprint_kwargs: dict
Additional keyword arguments that are passed to Fingerprint().__init__.
default: {}
Additional keyword arguments are passed to Fingerprint().calculate().
Returns:
Fingerprint() object
None if calculation of Fingerprint failed
Raises:
KeyError: No material with specified id in the database.
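The interplay of name, force_calculate and the KeyError can be illustrated with plain dictionaries. This is a sketch only: the helper get_or_calculate, the two dictionaries and the calculate callback are stand-ins invented for this example and do not reflect how the database backend stores fingerprints.

```python
from typing import Callable, Dict, Optional, Tuple


def get_or_calculate(
    materials: Dict[str, dict],
    stored: Dict[Tuple[str, str], object],
    mid: str,
    fp_type: str,
    calculate: Callable[[dict], object],
    name: Optional[str] = None,
    force_calculate: bool = False,
) -> object:
    """Return a stored fingerprint, calculating it only when necessary."""
    material = materials[mid]                 # KeyError for an unknown id
    name = fp_type if name is None else name  # name defaults to fp_type
    key = (mid, name)
    if force_calculate or key not in stored:
        stored[key] = calculate(material)     # (re)calculate and store
    return stored[key]
```

A second call with the same mid and name reuses the stored value; only force_calculate=True or a different name triggers a new calculation.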
- get_fingerprints(fp_type: str | type, name: str | None = None, fingerprint_kwargs: dict = {}, force_calculate=False, similarity_function: Callable | None = None, show_progress: bool = True, **kwargs) List[Fingerprint]
Get fingerprints of type fp_type for all materials in the database.
Generates fingerprints if they don’t exist. To retrieve existing fingerprints,
Arguments:
- fp_type: str or type
Type of fingerprint X, must correspond to a XFingerprint() object defined elsewhere
Keyword arguments:
- name: str
Name of the fingerprint as used in the database, if name == None: name = fp_type. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.
default: None
- force_calculate: bool
Force calculation of the fingerprint even if fingerprint data was stored in the database before.
default: False
- fingerprint_kwargs: dict
Additional keyword arguments that are passed to Fingerprint().__init__.
default: {}
- similarity_function: Callable
Similarity function that takes two fingerprints as arguments and calculates their similarity.
default: None
- show_progress: bool
Show a progress bar during calculation.
default: True
Additional keyword arguments are passed to Fingerprint().calculate().
Returns:
List[Fingerprint() object or None]
- get_metadata() dict
Get the metadata of the database.
- get_properties(property_name: str, output_mids: bool = False, show_progress: bool = True) List[Any]
Get a list of properties for all materials of the database.
Arguments:
- property_name: str
Name or path in Material object of the requested property
Example:
“code_name” –> stored in Material().data[“code_name”] retrieves the code used for the DFT calculation
“electronic_dos/dos_values” –> Material().data[“electronic_dos”][“dos_values”] retrieves the DOS values of the material
Keyword arguments:
- output_mids: bool
Output the list of corresponding material ids together with the properties
default: False
- show_progress: bool
Show a progress bar.
default: True
Returns:
tuple of:
- properties: list
List of all properties of materials. If a property cannot be retrieved, it will be set to None.
- [mids]: list
List of material ids
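How a property path such as “electronic_dos/dos_values” selects a value, and how the optional list of mids accompanies the result, can be sketched with nested dictionaries. The functions resolve_path and collect_properties are hypothetical stand-ins, and the sketch assumes that the mids are returned only if output_mids is True.

```python
from typing import Any, Dict, Optional


def resolve_path(data: dict, property_name: str) -> Optional[Any]:
    """Follow a "/"-separated path through nested dictionaries."""
    current: Any = data
    for key in property_name.split("/"):
        if not isinstance(current, dict) or key not in current:
            return None              # the property cannot be retrieved
        current = current[key]
    return current


def collect_properties(
    materials: Dict[str, dict], property_name: str, output_mids: bool = False
):
    """Collect one property for all materials, using None for missing values."""
    properties = [resolve_path(data, property_name) for data in materials.values()]
    if output_mids:
        return properties, list(materials)   # mids in the same order
    return properties
```

Because missing values are kept as None, the list of properties always has the same length and order as the list of mids.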
- get_property(mid: str, property_name: str) Any
Get a property of a single material specified by name or path in Material object from the database.
Arguments:
- mid: string
Material id of the requested material
- property_name: str
Name or path in Material object of the requested property
Example:
“code_name” –> stored in Material().data[“code_name”] retrieves the code used for the DFT calculation
“electronic_dos/dos_values” –> Material().data[“electronic_dos”][“dos_values”] retrieves the DOS values of the material
Returns:
property: Any –> property value if it exists in the database
None –> else
- get_property_dataframe(property_paths: List[str]) DataFrame
Generate a pandas DataFrame object that contains a table of properties.
Arguments:
- property_paths: List[str]
List of property paths that point to the respective property.
Returns:
- property_dataframe: pandas.DataFrame
Dataframe, where the index contains the mids and the columns the respective properties.
- get_random(return_mid: bool = True) Material
Returns a random material from the database.
Keyword arguments:
- return_mid: bool
Return id of material instead of material
default: True
Returns:
Material id (str) of a random entry of the database: if return_mid == True
Material object of a random entry of the database: else
- get_similarity_matrix(fp_type: str | type, name: str | None = None, dtype=<class 'numpy.float64'>, **kwargs) SimilarityMatrix
Calculate a SimilarityMatrix() object from Fingerprints of type fp_type from all entries of the database. If fingerprints for some entries can not be calculated, they will be excluded from the database.
Arguments:
- fp_type: str or type
Type of fingerprint X, must correspond to a XFingerprint() object defined elsewhere
Keyword arguments:
- name: str
Name of the fingerprint as used in the database, if name == None: name = fp_type. This parameter is used to distinguish between fingerprints in the database, i.e. it must be unique for each unique fingerprint.
default: None
- dtype: type
Data type used by SimilarityMatrix to store similarities
default: numpy.float64
Additional keyword arguments are passed to SimilarityMatrix().calculate().
Returns:
SimilarityMatrix() object
Raises:
ValueError: Fingerprints could not be obtained, no matrix can be calculated
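The idea of a similarity matrix built from fingerprints can be sketched without the library. The example assumes that entries without a fingerprint are simply left out of the matrix; the function names, the use of Python sets as fingerprints and the Jaccard measure are illustrative choices, not the behaviour of SimilarityMatrix.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def similarity_matrix(
    fingerprints: Sequence[Optional[set]],
    mids: Sequence[str],
    similarity: Callable[[set, set], float],
) -> Tuple[List[List[float]], List[str]]:
    """Compute all pairwise similarities, skipping missing fingerprints."""
    # Drop entries whose fingerprint could not be calculated.
    kept = [(mid, fp) for mid, fp in zip(mids, fingerprints) if fp is not None]
    if not kept:
        raise ValueError("Fingerprints could not be obtained, no matrix can be calculated")
    kept_mids = [mid for mid, _ in kept]
    fps = [fp for _, fp in kept]
    matrix = [[similarity(a, b) for b in fps] for a in fps]
    return matrix, kept_mids


def jaccard(a: set, b: set) -> float:
    """Stand-in similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Row i and column j of the result hold the similarity between the i-th and j-th remaining material, so the diagonal is 1 for this measure.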
- set_name(name: str | None)
Set name of the database.
- update_entries(mid_list: List[str], dictionary_list: List[dict]) None
Update a list of entries in the database.
Usage:
- MaterialsDatabase().update_entries([“a”, “b”], [{“key1”: value1}, {“key2”: value2}])
–> updates parameter key1 of material “a” with value1 and key2 of material “b” with value2
Arguments:
- mid_list: List[str]
Mids of the corresponding database entries
- dictionary_list: List[dict]
List of dictionaries that contain the data to update, one dictionary per mid
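The pairing of mid_list and dictionary_list is positional: the i-th dictionary updates the i-th material. A minimal sketch on an in-memory dictionary, which stands in for the database and is not the backend implementation, with the length check added as an assumption:

```python
from typing import Dict, List


def update_entries(
    database: Dict[str, dict], mid_list: List[str], dictionary_list: List[dict]
) -> None:
    """Update each listed material with the dictionary at the same position."""
    if len(mid_list) != len(dictionary_list):
        raise ValueError("mid_list and dictionary_list must have the same length")
    for mid, updates in zip(mid_list, dictionary_list):
        database[mid].update(updates)   # KeyError if the mid is unknown
```

Existing keys that are not mentioned in a dictionary keep their values.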
- update_entry(mid: str, **kwargs) None
Update a single entry of the database from a given dictionary.
Usage:
MaterialsDatabase().update_entry(“a”, key = value) –> updates parameter key of material (with id mid) with value
Arguments:
- mid: str
Mid of the corresponding database entry
Additional keyword arguments are used to update the database entries.