{ "cells": [ { "cell_type": "markdown", "id": "d32ef576-c89c-47f0-b089-33e82875e067", "metadata": {}, "source": [ "# Downloading and managing data with MADAS" ] }, { "cell_type": "markdown", "id": "e18424e3-bf74-4686-9ccc-fe8ef59a4c08", "metadata": {}, "source": [ "For this tutorial, we will use data from [NOMAD](https://nomad-lab.eu). NOMAD is a free and FAIR online database of materials-science data, including results from both theory and experiments, which makes it a rich source of data for analytics and machine learning.\n", "\n", "NOMAD follows a user-centric approach to data management: users upload raw data, which is then transformed and archived on the NOMAD platform. It therefore supports many different ways of representing data, including user-defined schemas. This rich metadata is valuable for data analytics because it makes it possible to track the full provenance of the data, to find and understand outliers, and to produce trustworthy results. However, the verbosity of the schemas adds significant complexity, making it hard to find the information relevant to a given application. Furthermore, because the NOMAD data schema is flexible, the central database can contain different versions of the same schema, depending on when the data was processed. In those cases the data provenance is preserved, but the data may need to be processed before an application can use it.\n", "\n", "MADAS is a framework that can connect to the NOMAD API, download data, store it in a local database, apply transformations to the data, and extract the transformed data for downstream applications." ] }, { "cell_type": "markdown", "id": "eca71d9a-8ff6-4dd0-8063-6c9a1b9bd131", "metadata": {}, "source": [ "In this tutorial, you will learn how to:\n", "\n", "
| | volume | band_gap |
|---|---|---|
| count | 191.000000 | 191.000000 |
| mean | 217.414811 | 1.255602 |
| std | 246.428808 | 2.051329 |
| min | 7.709944 | 0.000000 |
| 25% | 64.001745 | 0.000000 |
| 50% | 114.783390 | 0.000000 |
| 75% | 269.220632 | 2.195000 |
| max | 1240.477268 | 9.540000 |

| | volume | band_gap | archive/results/material/chemical_formula_reduced |
|---|---|---|---|
| lYczghNfQInQhaVu7F4TcyaBFPkg | 195.532456 | 2.41 | CdIn2O4 |
| rZF4gjJ48EGz2BBJuuCLJFzkPWVD | 53.789343 | 1.62 | Cu3N |
| Og3YctYdzQznelLm078NKakyvkNO | 44.070003 | 0.0 | CeS |
| luGPZXx92gJIG_ZdF-kw7bY00knV | 178.910746 | 1.61 | Cs3Sb |
| J5MoOxPpWnOl42x2aQJ2Qn_TBVeX | 92.121642 | 0.0 | Ca2H6Ir |
| ... | ... | ... | ... |
| qo3iMLZM-TX3FXLMTOJoW4HCJojV | 95.321992 | 6.86 | BaCl2 |
| xyQyh8qKd5KUJx75KYtLup9sAvP6 | 263.534744 | 0.0 | Be13Ca |
| AL_NDle5ybphhGeeterPG9tKxshp | 29.800045 | 0.0 | CeO |
| V4sjmkEC0kBNBsSTpFvzMCHuaMsS | 87.985549 | 0.0 | Cd3In |
| UUO7gDxGEe2jLENT_yxR1Ygy8c7A | 77.888396 | 0.0 | Ca2Ir |

191 rows × 3 columns