{ "cells": [ { "cell_type": "markdown", "id": "d58e3636-0d5d-41fd-b747-05d32090959a", "metadata": {}, "source": [ "# Writing a custom Backend" ] }, { "cell_type": "markdown", "id": "3a5f6b17-8706-420c-9881-2455fde6eb1b", "metadata": {}, "source": [ "Adding data to a database requires that the data is stored in some kind of file. `MADAS` uses an abstract class, the `Backend`, to describe interactions with the database file. On the one hand, this allows users to work with `MADAS` without knowing anything about database files. On the other hand, it allows users to write their own `Backend`s tailored to their requirements, for example to reuse an already existing database or to reach a certain level of performance." ] }, { "cell_type": "markdown", "id": "b8739d62-86bc-4e11-8e71-b1b7e10265a6", "metadata": {}, "source": [ "This tutorial covers the following steps:\n", "\n", "**[Find the required methods](#Find-the-required-methods)**  \n", "**[A text-file based database](#A-text-file-based-database)**  \n", "**[Writing a class](#Writing-a-class)**  \n", "**[Testing](#Testing)**  \n", "\n", "Let's get started!" ] }, { "cell_type": "code", "execution_count": 1, "id": "14893eb8", "metadata": {}, "outputs": [], "source": [ "# imports\n", "from madas import Material\n", "from madas.backend import Backend" ] }, { "cell_type": "markdown", "id": "a8b596c9-09bc-439c-a70f-ded7d7b50bd7", "metadata": {}, "source": [ "## Find the required methods" ] }, { "cell_type": "markdown", "id": "37542278-2174-46bd-84d9-b74104a0f61b", "metadata": {}, "source": [ "We can inspect which methods are defined for the `Backend` base class. The documentation provides most of the information required to start implementing our own `Backend`." ] }, { "cell_type": "code", "execution_count": 2, "id": "68354a1c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on class Backend in module madas.backend.backend_core:\n", "\n", "class Backend(builtins.object)\n", " | Backend(filename='materials_database.db', filepath='data', make_dirs=True, key_name='mid', log=None)\n", " |\n", " | A database backend wrapper to unify file-based storage of materials data.\n", " |\n", " | **Keyword arguments:**\n", " |\n", " | filename: `str`\n", " | Name of the data file\n", " |\n", " | default: `\"materials_database.db\"`\n", " |\n", " | filepath: `str`\n", " | Path (location) of the file.\n", " |\n", " | default: `\"data\"`\n", " |\n", " | make_dirs: `bool`\n", " | Create directory paths if it does not exist.\n", " |\n", " | default: `True`\n", " |\n", " | key_name: `str`\n", " | Name of unique keys that are used to find (individual) entries in the database.\n", " |\n", " | default: `\"mid\"`\n", " |\n", " | log: `logging.Logger` or `None`\n", " | Logger used displaying logs.\n", " |\n", " | default: `None`\n", " |\n", " | Methods defined here:\n", " |\n", " | __init__(self, filename='materials_database.db', filepath='data', make_dirs=True, key_name='mid', log=None)\n", " | Initialize self. 
See help(type(self)) for accurate signature.\n", " |\n", " | add_many(self, *args, **kwargs) -> None\n", " | Add data to the database.\n", " |\n", " | add_single(self, *args, **kwargs) -> None\n", " | Add data to the database.\n", " |\n", " | get_by_id(self, db_id) -> madas.material.Material\n", " | Return a single entry from an (integer valued) database id.\n", " |\n", " | get_length(self) -> int\n", " | Return the length of the database, i.e. the total number of entries.\n", " |\n", " | get_many(self, mids=None, **kwargs) -> List[madas.material.Material]\n", " | Get a single entry from the database.\n", " |\n", " | get_single(self, mid=None, **kwargs) -> madas.material.Material\n", " | Get a single entry from the database.\n", " |\n", " | has_entry(self, entry_id) -> bool\n", " | Check if an entry with the given id is present in the database.\n", " |\n", " | set_logger(self, logger: logging.Logger) -> None\n", " | Set logger.\n", " |\n", " | update_many(self, *args, update_data: bool = False, **kwargs) -> None\n", " | Update several entries in the database.\n", " |\n", " | update_metadata(self, *args, **kwargs) -> None\n", " | Updata database metadata.\n", " |\n", " | update_single(self, *args, update_data: bool = False, **kwargs) -> None\n", " | Update a single entry in the database.\n", " |\n", " | ----------------------------------------------------------------------\n", " | Readonly properties defined here:\n", " |\n", " | abs_path\n", " | Absolute path property, contains the absolute path of the backend file.\n", " |\n", " | log\n", " | Logger property, returns the log.\n", " |\n", " | metadata\n", " | Metadata property, returns metadata attached to the backend.\n", " |\n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " |\n", " | __dict__\n", " | dictionary for instance variables\n", " |\n", " | __weakref__\n", " | list of weak references to the object\n", "\n" ] } ], "source": [ 
"help(Backend)" ] }, { "cell_type": "markdown", "id": "afb35dc3-6bbf-4184-8046-b22cc7569288", "metadata": {}, "source": [ "## A text-file based database" ] }, { "cell_type": "markdown", "id": "0ed1ac4c-953d-4416-90e6-397f9eb30b69", "metadata": {}, "source": [ "For this tutorial, we will write a simple 'database' that stores one material per line of a text file.  \n", "**Note** that this implementation is **neither efficient nor safe from data loss** and is meant **only for demonstration purposes**.\n", "\n", "We will write the individual methods required for our `Backend` one after another and then combine them in a new class. Finally, we test the implementation with real data." ] }, { "cell_type": "markdown", "id": "b275723c-c72e-422b-9aa2-f1bca60cabb6", "metadata": {}, "source": [ "### Preparations" ] }, { "cell_type": "markdown", "id": "3a7b43a3-307b-4232-b958-869e63eb1d15", "metadata": {}, "source": [ "Our file will contain one serialized `Material` object per line; the line number serves as the database index." ] }, { "cell_type": "markdown", "id": "297aeb86-60b8-422a-b917-2b5d9adc2c7d", "metadata": {}, "source": [ "We will use a single text file called `tutorial_backend_development_file.txt` for demonstration purposes. For simplicity, we will omit storing the database metadata." ] }, { "cell_type": "code", "execution_count": 3, "id": "3e591800-5e0c-46de-b119-b7897fa77c61", "metadata": {}, "outputs": [], "source": [ "# Define a filename\n", "TUTORIAL_FILE_NAME = 'tutorial_backend_development_file.txt'" ] }, { "cell_type": "markdown", "id": "86c5cff3-1e09-467d-9f0a-dfdaf44a8e09", "metadata": {}, "source": [ "We also need some test data to develop our methods. We will generate some synthetic `Material` objects for this purpose." 
] }, { "cell_type": "code", "execution_count": 4, "id": "e393f24a-21bc-439a-b5b7-e5a1ae08bd85", "metadata": {}, "outputs": [], "source": [ "from ase.build import bulk\n", "from madas import Material\n", "\n", "tutorial_test_material1 = Material(\"test_material1\", atoms=bulk(\"Ag\"), data={\"test\":\"data1\"})\n", "tutorial_test_material2 = Material(\"test_material2\", atoms=bulk(\"Au\"), data={\"test\":\"data2\\nor something\"})\n", "tutorial_test_material3 = Material(\"test_material3\", atoms=bulk(\"Cu\"), data={\"test\":\"data3\"})" ] }, { "cell_type": "markdown", "id": "4737f0fd-c46a-4e9b-93ec-46bc78ce9653", "metadata": {}, "source": [ "### Adding data" ] }, { "cell_type": "markdown", "id": "aa3eb6d3-ea40-40d0-8c64-b704eb66af92", "metadata": {}, "source": [ "Adding data to a text file can be achieved with Python's [built-in file functions](https://docs.python.org/3.10/tutorial/inputoutput.html#reading-and-writing-files). The comments below explain each line of the code." ] }, { "cell_type": "markdown", "id": "f48c3f4f-f99f-4d42-90a3-1d9c8966852e", "metadata": {}, "source": [ "#### add_single" ] }, { "cell_type": "code", "execution_count": 5, "id": "40207d39-7093-409b-b207-431f6dbffe8c", "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "# open the file in append mode\n", "with open(TUTORIAL_FILE_NAME, \"a\") as f_:\n", "    # convert the material to a dictionary and dump the latter as json data\n", "    # this data is then written to file\n", "    f_.write(json.dumps(tutorial_test_material1.to_dict()))\n", "    # add newline, such that each material is in its own line\n", "    f_.write(\"\\n\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "d4ad478a-93ff-48c2-a38f-14eb1017b2dd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"mid\": \"test_material1\", \"atoms\": {\"numbers\": [47], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 2.045, 2.045], [2.045, 0.0, 2.045], [2.045, 2.045, 0.0]], \"pbc\": [true, true, 
true]}, \"properties\": {}, \"data\": {\"test\": \"data1\"}}\n" ] } ], "source": [ "# We can show the contents of our database file using the shell escape (`!`) of Jupyter\n", "!cat tutorial_backend_development_file.txt" ] }, { "cell_type": "markdown", "id": "6380a30f-a85d-41ad-a0c8-991963931c2b", "metadata": {}, "source": [ "#### add_many" ] }, { "cell_type": "markdown", "id": "ac289c6f-bee6-461d-81e8-1bb2562d54ae", "metadata": {}, "source": [ "To add several materials, we can just repeat the step above:" ] }, { "cell_type": "code", "execution_count": 7, "id": "333127d0-b78f-4f02-894e-aa82e7620634", "metadata": {}, "outputs": [], "source": [ "with open(TUTORIAL_FILE_NAME, \"a\") as f_:\n", "    for material in [tutorial_test_material2, tutorial_test_material3]:\n", "        f_.write(json.dumps(material.to_dict()))\n", "        f_.write(\"\\n\") # add newline, such that each material is in its own line" ] }, { "cell_type": "code", "execution_count": 8, "id": "5d6bbd98-df3b-4c64-81ba-f8b53ac53e47", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"mid\": \"test_material1\", \"atoms\": {\"numbers\": [47], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 2.045, 2.045], [2.045, 0.0, 2.045], [2.045, 2.045, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {}, \"data\": {\"test\": \"data1\"}}\n", "{\"mid\": \"test_material2\", \"atoms\": {\"numbers\": [79], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 2.04, 2.04], [2.04, 0.0, 2.04], [2.04, 2.04, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {}, \"data\": {\"test\": \"data2\\nor something\"}}\n", "{\"mid\": \"test_material3\", \"atoms\": {\"numbers\": [29], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 1.805, 1.805], [1.805, 0.0, 1.805], [1.805, 1.805, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {}, \"data\": {\"test\": \"data3\"}}\n" ] } ], "source": [ "!cat tutorial_backend_development_file.txt" ] }, { "cell_type": "markdown", "id": 
"8bd338f9-51c8-46ee-b402-9ce2e9caa1a9", "metadata": {}, "source": [ "### Getting data" ] }, { "cell_type": "markdown", "id": "210a2a80-19f4-4bdf-8078-01268bf07d1c", "metadata": {}, "source": [ "Getting data is based on the `mid` of a `Material`. We can find the corresponding line using a linear search." ] }, { "cell_type": "markdown", "id": "9d1dec15-4e1b-4967-888a-a675f6b4ae21", "metadata": {}, "source": [ "#### get_single" ] }, { "cell_type": "code", "execution_count": 9, "id": "dd180ab4-1314-48f4-9e09-a54f4906f20b", "metadata": {}, "outputs": [], "source": [ "TEST_MID = \"test_material1\"\n", "\n", "with open(TUTORIAL_FILE_NAME, \"r\") as f_:\n", "    for new_line in f_:\n", "        # deserialize every material to check its mid\n", "        # (if no entry matches, `material` ends up holding the last entry read)\n", "        material = Material.from_dict(json.loads(new_line))\n", "        if material.mid == TEST_MID:\n", "            break" ] }, { "cell_type": "code", "execution_count": 10, "id": "0b66b47e-401f-4431-97f6-d77afc9f00cd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Material(mid = test_material1, formula = Ag, data = {'test'}, properties = set())" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "material" ] }, { "cell_type": "markdown", "id": "cac4c2ca-5d42-479e-9128-ecdfb704fdb4", "metadata": {}, "source": [ "#### get_many" ] }, { "cell_type": "markdown", "id": "590c1235-d3a6-4286-afd7-f0f726473af0", "metadata": {}, "source": [ "For several mids at once, we can read the file only once instead of running one search per mid:" ] }, { "cell_type": "code", "execution_count": 11, "id": "c502665d-d79a-4253-8fdb-73fabaf59ae2", "metadata": {}, "outputs": [], "source": [ "TEST_MIDS = [\"test_material3\", \"test_material1\"]\n", "\n", "# here we use a set to simplify checking if a material has been found\n", "mids_to_search = set(TEST_MIDS)\n", "found_materials = []\n", "with open(TUTORIAL_FILE_NAME, \"r\") as f_:\n", "    for new_line in f_:\n", "        # stop search if all materials have been found\n", "        if len(mids_to_search) == 0:\n", "            break\n", "        material = Material.from_dict(json.loads(new_line))\n", "        # if the current material is one of the searched ones\n", "        if material.mid in mids_to_search:\n", "            # append it to the results\n", "            found_materials.append(material)\n", "            # remove its mid from the set of mids still to be found\n", "            mids_to_search.discard(material.mid)\n", "# sort to recover input sorting\n", "found_materials = sorted(found_materials, key=lambda x: TEST_MIDS.index(x.mid))" ] }, { "cell_type": "code", "execution_count": 12, "id": "f73106d9-3e5d-47a8-be9e-9d749dcc7794", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Material(mid = test_material3, formula = Cu, data = {'test'}, properties = set()),\n", " Material(mid = test_material1, formula = Ag, data = {'test'}, properties = set())]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "found_materials" ] }, { "cell_type": "markdown", "id": "e360450c-6285-4a97-9654-5e7e8902ac3b", "metadata": {}, "source": [ "#### get_by_id" ] }, { "cell_type": "markdown", "id": "a54b67ee-7d8c-4f6e-917d-c50cb75dace4", "metadata": {}, "source": [ "To get an entry by its database id, which here is the zero-based line number, we iterate through the lines of the file until we reach that line." 
] }, { "cell_type": "code", "execution_count": 13, "id": "2806bebc-58c6-40b8-98bd-447e20f808d4", "metadata": {}, "outputs": [], "source": [ "TEST_ID = 1\n", "\n", "with open(TUTORIAL_FILE_NAME, \"r\") as f_:\n", "    for idx, new_line in enumerate(f_):\n", "        if idx == TEST_ID:\n", "            material = Material.from_dict(json.loads(new_line))\n", "            # stop reading once the requested line has been found\n", "            break" ] }, { "cell_type": "code", "execution_count": 14, "id": "80817946-8b4b-4faa-b199-fa645afdf7c6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Material(mid = test_material2, formula = Au, data = {'test'}, properties = set())" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "material" ] }, { "cell_type": "markdown", "id": "74d77f09-2b68-4f1b-b72f-7c7b8b8ae4e6", "metadata": {}, "source": [ "#### get_length" ] }, { "cell_type": "markdown", "id": "831e339e-d07e-4cde-a59d-45d2c86716a3", "metadata": {}, "source": [ "To get the number of entries in our database, we can count the number of lines:" ] }, { "cell_type": "code", "execution_count": 15, "id": "f85eae55-0c21-4c70-b02d-5ff2fed2035f", "metadata": {}, "outputs": [], "source": [ "with open(TUTORIAL_FILE_NAME, \"r\") as f_:\n", "    length = sum(1 for _ in f_)" ] }, { "cell_type": "code", "execution_count": 16, "id": "2b87ba73-04f9-4f26-9e2d-f1f7c8a4a7ed", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "length" ] }, { "cell_type": "markdown", "id": "35675de7-48a0-4a95-bbc0-d0b04f2dfd53", "metadata": {}, "source": [ "#### has_entry" ] }, { "cell_type": "markdown", "id": "74653102-012c-41b4-bd2b-9c379c50e206", "metadata": {}, "source": [ "Because we store the data in a plain text file without an index, checking whether an entry is present requires iterating through the entries of the file until a match is found." 
] }, { "cell_type": "code", "execution_count": 17, "id": "e64e578b-4828-4750-8062-e93d0a577da6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "mid = TEST_MID\n", "\n", "is_in_db = False\n", "with open(TUTORIAL_FILE_NAME, \"r\") as f_:\n", "    for new_line in f_:\n", "        if json.loads(new_line)[\"mid\"] == mid:\n", "            is_in_db = True\n", "            break\n", "print(is_in_db)" ] }, { "cell_type": "markdown", "id": "d0773c84-1202-4e78-af60-ba2e3b1eef43", "metadata": {}, "source": [ "### Updating entries" ] }, { "cell_type": "markdown", "id": "2653b3bf-6dd8-4607-8a64-13d11a664b4e", "metadata": {}, "source": [ "Updating entries is more complicated, because a line in a text file cannot simply be changed in place. Instead, we read the whole file into memory, modify the respective entry, and overwrite the original file." ] }, { "cell_type": "markdown", "id": "e5fb8b6a-109a-455b-94ed-c82766e343da", "metadata": {}, "source": [ "#### update_single" ] }, { "cell_type": "code", "execution_count": 18, "id": "e24f21d3-988e-480f-b4f9-97e4aee850d5", "metadata": {}, "outputs": [], "source": [ "mid = TEST_MID\n", "updated_data = {\"test\":\"up\"}\n", "\n", "# read the full file as a list of strings\n", "with open(TUTORIAL_FILE_NAME, \"r\") as f_:\n", "    contents = f_.readlines()\n", "\n", "# change the respective entry\n", "for idx in range(len(contents)):\n", "    material = Material.from_dict(json.loads(contents[idx]))\n", "    if material.mid == mid:\n", "        material.properties.update(**updated_data)\n", "        contents[idx] = json.dumps(material.to_dict())+\"\\n\"\n", "        break\n", "\n", "# overwrite the original file\n", "with open(TUTORIAL_FILE_NAME, \"w\") as f_:\n", "    f_.writelines(contents)" ] }, { "cell_type": "code", "execution_count": 19, "id": "946dcb71-dd86-4f40-8e4d-dfa88c79b8d2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"mid\": \"test_material1\", \"atoms\": {\"numbers\": [47], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 2.045, 2.045], [2.045, 0.0, 2.045], 
[2.045, 2.045, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {\"test\": \"up\"}, \"data\": {\"test\": \"data1\"}}\n", "{\"mid\": \"test_material2\", \"atoms\": {\"numbers\": [79], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 2.04, 2.04], [2.04, 0.0, 2.04], [2.04, 2.04, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {}, \"data\": {\"test\": \"data2\\nor something\"}}\n", "{\"mid\": \"test_material3\", \"atoms\": {\"numbers\": [29], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 1.805, 1.805], [1.805, 0.0, 1.805], [1.805, 1.805, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {}, \"data\": {\"test\": \"data3\"}}\n" ] } ], "source": [ "!cat $TUTORIAL_FILE_NAME" ] }, { "cell_type": "markdown", "id": "d7af5710-0deb-407d-a790-e9a0894b4ef9", "metadata": {}, "source": [ "#### update_many" ] }, { "cell_type": "markdown", "id": "aadae558-d504-489c-adb8-a7bfef4aac1a", "metadata": {}, "source": [ "To update many entries at once, we can use the same approach: read the file into memory, modify all matching entries, and write everything back." ] }, { "cell_type": "code", "execution_count": 20, "id": "e6fbec80-0a98-4502-8792-57ec6f8b2e16", "metadata": {}, "outputs": [], "source": [ "mids_to_search = [\"test_material1\", \"test_material3\"]\n", "updated_data = [{\"test\":\"update1\"}, {\"test\":\"update3\"}]\n", "\n", "with open(TUTORIAL_FILE_NAME, \"r\") as f_:\n", "    contents = f_.readlines()\n", "for idx in range(len(contents)):\n", "    material = Material.from_dict(json.loads(contents[idx]))\n", "    if material.mid in mids_to_search:\n", "        for mid_, data_ in zip(mids_to_search, updated_data):\n", "            if mid_ == material.mid:\n", "                material.properties.update(**data_)\n", "        contents[idx] = json.dumps(material.to_dict())+\"\\n\"\n", "with open(TUTORIAL_FILE_NAME, \"w\") as f_:\n", "    f_.writelines(contents)" ] }, { "cell_type": "code", "execution_count": 21, "id": "c1809350-ff7b-4ccf-b0de-01b17582493b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"mid\": \"test_material1\", \"atoms\": {\"numbers\": [47], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 2.045, 2.045], [2.045, 0.0, 2.045], [2.045, 2.045, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {\"test\": \"update1\"}, \"data\": {\"test\": \"data1\"}}\n", "{\"mid\": \"test_material2\", \"atoms\": {\"numbers\": [79], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 2.04, 2.04], [2.04, 0.0, 2.04], [2.04, 2.04, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {}, \"data\": {\"test\": \"data2\\nor something\"}}\n", "{\"mid\": \"test_material3\", \"atoms\": {\"numbers\": [29], \"positions\": [[0.0, 0.0, 0.0]], \"cell\": [[0.0, 1.805, 1.805], [1.805, 0.0, 1.805], [1.805, 1.805, 0.0]], \"pbc\": [true, true, true]}, \"properties\": {\"test\": \"update3\"}, \"data\": {\"test\": \"data3\"}}\n" ] } ], "source": [ "!cat $TUTORIAL_FILE_NAME" ] }, { "cell_type": "markdown", "id": "0458f17e-6909-47a7-93cc-a11c8a63e978", "metadata": {}, "source": [ "### Metadata" ] }, { "cell_type": "markdown", "id": "ba4cba40-4404-4bde-868c-cc4121d757a3", "metadata": {}, "source": [ "We leave the implementation of metadata handling out of this tutorial. One option would be to store the metadata in a separate file.\n", "\n", "To ignore all metadata-related operations, we will implement a method that does not store anything and only emits a warning instead of updating any metadata." 
] }, { "cell_type": "markdown", "id": "d2e2d93e-07be-4563-81ca-c7a3d62f2baa", "metadata": {}, "source": [ "## Writing a class" ] }, { "cell_type": "markdown", "id": "76a5e07b-2c1b-44f8-ad86-817ff16bef42", "metadata": {}, "source": [ "We can combine all of the previously developed code snippets in a new class, which can be used seamlessly within the `MADAS` framework:" ] }, { "cell_type": "code", "execution_count": 22, "id": "f360fdf5-9b37-4285-866e-5cd9a2f49067", "metadata": {}, "outputs": [], "source": [ "import os\n", "from typing import List\n", "import warnings\n", "\n", "from madas.backend import Backend\n", "from madas.utils import safe_log\n", "\n", "class TXTBackend(Backend):\n", "    \"\"\"\n", "    Text-file based database to demonstrate the implementation of `Backend` classes in `MADAS`.\n", "    \"\"\"\n", "\n", "    def __init__(self,\n", "                 filename='materials_database.txt',\n", "                 filepath='data',\n", "                 make_dirs=True,\n", "                 key_name='mid',\n", "                 log=None):\n", "        super().__init__(filename=filename, filepath=filepath, make_dirs=make_dirs, key_name=key_name, log=log)\n", "        if not os.path.exists(self.abs_path):\n", "            with open(self.abs_path, \"w\") as f_:\n", "                f_.write(\"\")\n", "\n", "    ## Add data\n", "    def add_single(self, material: Material):\n", "        \"\"\"\n", "        Add a single material to the database.\n", "        \"\"\"\n", "        with open(self.abs_path, \"a\") as f_:\n", "            f_.write(json.dumps(material.to_dict()))\n", "            f_.write(\"\\n\") # add newline, such that each material is in its own line\n", "\n", "    def add_many(self, materials: List[Material]):\n", "        \"\"\"\n", "        Add several materials to the database.\n", "        \"\"\"\n", "        with open(self.abs_path, \"a\") as f_:\n", "            for material in materials:\n", "                f_.write(json.dumps(material.to_dict()))\n", "                f_.write(\"\\n\") # add newline, such that each material is in its own line\n", "\n", "    ## Retrieve data\n", "    def get_single(self, mid: str) -> Material | None:\n", "        \"\"\"\n", "        Get a single material from the database.\n", "        \"\"\"\n", "        material = None\n", "        with open(self.abs_path, \"r\") as f_:\n", "            for new_line in f_:\n", "                # only deserialize the matching entry, so that None is returned if there is no match\n", "                if json.loads(new_line)[\"mid\"] == mid:\n", "                    material = Material.from_dict(json.loads(new_line))\n", "                    break\n", "        if material is None:\n", "            safe_log(f\"Material with mid: {mid} is not contained in database.\", self.log)\n", "        return material\n", "\n", "    def has_entry(self, mid: str) -> bool:\n", "        \"\"\"\n", "        Check if a material with a given mid is present in the database.\n", "        \"\"\"\n", "        is_in_db = False\n", "        with open(self.abs_path, \"r\") as f_:\n", "            for new_line in f_:\n", "                if json.loads(new_line)[\"mid\"] == mid:\n", "                    is_in_db = True\n", "                    break\n", "        return is_in_db\n", "\n", "    def get_many(self, mids: List[str]) -> List[Material]:\n", "        \"\"\"\n", "        Get many materials from the database.\n", "        \"\"\"\n", "        mids_to_search = set(mids)\n", "        found_materials = []\n", "        with open(self.abs_path, \"r\") as f_:\n", "            for new_line in f_:\n", "                # stop search early if all materials have been found\n", "                if len(mids_to_search) == 0:\n", "                    break\n", "                material = Material.from_dict(json.loads(new_line))\n", "                if material.mid in mids_to_search:\n", "                    found_materials.append(material)\n", "                    mids_to_search.discard(material.mid)\n", "        # sort to recover input sorting\n", "        found_materials = sorted(found_materials, key=lambda x: mids.index(x.mid))\n", "        if len(found_materials) != len(mids):\n", "            safe_log(f\"Number of searched ({len(mids)}) and found ({len(found_materials)}) materials do not match.\", logger = self.log)\n", "        return found_materials\n", "\n", "    def get_by_id(self, db_id: int) -> Material | None:\n", "        \"\"\"\n", "        Get an entry by database id.\n", "        \"\"\"\n", "        material = None\n", "        with open(self.abs_path, \"r\") as f_:\n", "            for idx, new_line in enumerate(f_):\n", "                if idx == db_id:\n", "                    material = Material.from_dict(json.loads(new_line))\n", "                    break\n", "        return material\n", "\n", "    def get_length(self):\n", "        \"\"\"\n", "        Get the length of the database.\n", "        \"\"\"\n", "        with open(self.abs_path, \"r\") as f_:\n", "            length = sum(1 for _ in f_)\n", "        return length\n", "\n", "    # Update data\n", "    def update_single(self, mid: str, updated_data: dict):\n", "        \"\"\"\n", "        Update a single entry.\n", "        \"\"\"\n", "        with open(self.abs_path, \"r\") as f_:\n", "            contents = f_.readlines()\n", "        for idx in range(len(contents)):\n", "            material = Material.from_dict(json.loads(contents[idx]))\n", "            if material.mid == mid:\n", "                material.properties.update(**updated_data)\n", "                contents[idx] = json.dumps(material.to_dict())+\"\\n\"\n", "                break\n", "        with open(self.abs_path, \"w\") as f_:\n", "            f_.writelines(contents)\n", "\n", "    def update_many(self, mids: List[str], updated_data: List[dict]):\n", "        \"\"\"\n", "        Update many entries.\n", "        \"\"\"\n", "        with open(self.abs_path, \"r\") as f_:\n", "            contents = f_.readlines()\n", "        for idx in range(len(contents)):\n", "            material = Material.from_dict(json.loads(contents[idx]))\n", "            if material.mid in mids:\n", "                for mid_, data_ in zip(mids, updated_data):\n", "                    if mid_ == material.mid:\n", "                        material.properties.update(**data_)\n", "                contents[idx] = json.dumps(material.to_dict())+\"\\n\"\n", "        with open(self.abs_path, \"w\") as f_:\n", "            f_.writelines(contents)\n", "\n", "    def update_metadata(self, *args, **kwargs):\n", "        # metadata is not stored by this backend; only warn the user\n", "        warnings.warn(\"Saving metadata is not supported by the current Backend.\", UserWarning)" ] }, { "cell_type": "markdown", "id": "741a89e4-a3a4-4e65-b5d8-720c2e58e8dd", "metadata": {}, "source": [ "## Testing" ] }, { "cell_type": "markdown", "id": "dab667e4-4943-4e97-99fe-4e1e719ca322", "metadata": {}, "source": [ "To test our implementation, we can use it together with a `MADAS` `MaterialsDatabase`." 
] }, { "cell_type": "code", "execution_count": 23, "id": "e4153e03-426e-420b-859e-ea601a46bfc2", "metadata": {}, "outputs": [], "source": [ "from madas import MaterialsDatabase" ] }, { "cell_type": "markdown", "id": "12082a9c-5a60-43e1-8884-b09e2d5b0dbc", "metadata": {}, "source": [ "We initialize the `MaterialsDatabase` with our custom `Backend`." ] }, { "cell_type": "code", "execution_count": 24, "id": "dab3ea0e-57a5-4627-a884-7730b9125ad7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_161544/2145170194.py:143: UserWarning: Saving metadata is not supported by the current Backend.\n", " warnings.warn(\"Saving metadata is not supported by the current Backend.\", UserWarning)\n" ] } ], "source": [ "db = MaterialsDatabase(backend=TXTBackend())" ] }, { "cell_type": "markdown", "id": "4f18c46d-cd36-4eae-ada4-3f60c22602df", "metadata": {}, "source": [ "Adding data works as usual:" ] }, { "cell_type": "code", "execution_count": 25, "id": "5983f1e3-3e5d-408e-b600-3cee1477c5b7", "metadata": {}, "outputs": [], "source": [ "db.add_material(\"nsXJ2rebbTL8XJuomZIlntl6_iMK\")\n", "db.add_material(\"iH8jCHxCcYSNysv7E_9OabHZ33qE\")" ] }, { "cell_type": "markdown", "id": "2f569245-f9ec-43ca-a402-4341f6c997f0", "metadata": {}, "source": [ "Similarly, it can be retrieved:" ] }, { "cell_type": "code", "execution_count": 26, "id": "40778367-dc1b-4559-ac14-b166478cc2dd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Material(mid = nsXJ2rebbTL8XJuomZIlntl6_iMK, formula = AlGaO3, data = {'archive'}, properties = set())" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db[0]" ] }, { "cell_type": "markdown", "id": "768add07-3866-4e36-b101-c7646052e972", "metadata": {}, "source": [ "Fingerprints can be added (testing the `update_entries` function):" ] }, { "cell_type": "code", "execution_count": 27, "id": "aa515432-f30b-4bd7-9f0f-f502972966ea", "metadata": {}, "outputs": [ { 
"name": "stderr", "output_type": "stream", "text": [ "2026-04-09 18:33:15,926 - materials_database_log - INFO - Generating PTE fingerprints...\n", "2026-04-09 18:33:15,926 - materials_database_log - INFO - Generating \"PTE\" fingerprints.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c1288da175fb4dbc8726082a31d69261", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/2 [00:00" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAcAAAAGZCAYAAAAepOFMAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAMf9JREFUeJzt3X1cVHXe//H3gDqj0qCJggiJmoKWN6um4UZmS1J26XptbkiXQmb6UMNMN9c0E0vT7UazX7pbGma13uC2mj3Ky8viksrUzLu6thTTLCkXvA1QU4Q5vz+MM00gMs4oDuf1fDzOY+U753zP99gsHz+f7/ecYzMMwxAAABYTVNMDAACgJhAAAQCWRAAEAFgSARAAYEkEQACAJREAAQCWRAAEAFgSARAAYEkEQACAJREAAT+4//77FRMT49c+b7vtNt12223mz99++61sNpuWLFni1/NMnz5dNpvNr30CgYAAiKvOkiVLZLPZzM3hcKhdu3ZKT09XQUGBJCkmJsZjnwtt5cGiqn1GjRpVg1d7dZo1a5befvvtmh4GcFnZeBYorjZLlizRsGHD9NRTT6lVq1Y6c+aMNm7cqDfffFMtW7bUv/71L61fv14nT540j1m7dq2WL1+uF154QWFhYWZ7r1691Lp1a9lsNt1xxx1KTU2tcL527dqpR48ePo353LlzcrlcstvtPvXzSyUlJZKkevXqSTqfAbZq1Uqvvfaa7r//fr+dp7S0VKWlpXI4HGZbSEiIBg0a5PdsE7ia1KnpAQAXctddd6l79+6SpAcffFBNmjTR3LlztWbNGqWkpHjsm5+fr+XLl2vgwIEXLEW2a9dOQ4YMuSxjrVu3rt/7LA98l8upU6fUsGFD1alTR3Xq8KsA1kMJFAHj9ttvlyQdOHDgip63uLhYjzzyiGJiYmS329WsWTPdcccd2rFjh7nPr+cAy+frnn/+eS1YsECtW7dWgwYN1LdvX+Xl5ckwDM2YMUNRUVGqX7++fv/73+v48eMe5/31HGBlvvjiC91///1q3bq1HA6HIiIi9MADD+jYsWMe+5XP83311Ve677771LhxY91yyy0en5Wz2Ww6deqUXn/9dbNMfP/992vDhg2y2WxavXp1hXEsW7ZMNptNmzdvru5fK2D66KOP1L9/f0VGRspms1Wr/J6Tk6OuXbvKbrfr+uuvv6RqBf/sQ8DYv3+/JKlJkyaXdPyZM2d09OjRCu1Op7PKbGvUqFF66623lJ6erg4dOujYsWPauHGjdu/era5du1Z5zqVLl6qkpERjx47V8ePH9eyzz+ree+/V7bffrpycHE2aNEn79u3TSy+9pEcffVSLFy/26pref/99ffPNNxo2bJgiIiL05ZdfauHChfryyy+1ZcuWCotb/vjHP6pt27aaNWuWLjT78eabb+rBBx9Ujx49NHLkSElSmzZtdPPNNys6OlpLly7Vf/7nf1a4zjZt2ig+Pt6r8QPS+WpE586d9
cADD+gPf/jDRfc/cOCA7r77bo0aNUpLly5Vdna2HnzwQTVv3lxJSUnVP7EBXGVee+01Q5LxwQcfGEeOHDHy8vKMFStWGE2aNDHq169vfP/99xWOee655wxJxoEDByrtU9IFt+XLl1c5ntDQUOOhhx6qcp+0tDSjZcuW5s8HDhwwJBlNmzY1fvzxR7N98uTJhiSjc+fOxrlz58z2lJQUo169esaZM2fMtt69exu9e/eu0Odrr71mtp0+fbrCWJYvX25IMj766COzLSMjw5BkpKSkVNi//LNfatiwoZGWllZh38mTJxt2u93jmg4fPmzUqVPHyMjIqLA/4C1JxurVq6vc589//rNxww03eLQlJycbSUlJXp2LEiiuWomJiWratKmio6M1ePBghYSEaPXq1WrRosUl9ff73/9e77//foWtT58+VR7XqFEjffrppzp06JDX5/zjH/+o0NBQ8+eePXtKkoYMGeIx79azZ0+VlJTohx9+8Kr/+vXrm38uz3BvvvlmSfIo0ZbzdcVramqqzp49q7feestsy8rKUmlp6WWbXwV+bfPmzUpMTPRoS0pK8roETwkUV60FCxaoXbt2qlOnjsLDwxUbG6ugoEv/N1tUVFSF/9NUx7PPPqu0tDRFR0erW7du6tevn1JTU9W6deuLHnvdddd5/FweDKOjoyttP3HihFdjO378uJ588kmtWLFChw8f9vissLCwwv6tWrXyqv9fi4uL00033aSlS5dq+PDhks6XP2+++WZdf/31PvWNmnfmzBlz9bGvDMOoUIK32+1+WSmdn5+v8PBwj7bw8HAVFRXpp59+8viHYVUIgLhq9ejRw1wFWpPuvfdeJSQkaPXq1Vq/fr2ee+45PfPMM1q1apXuuuuuKo8NDg72qt3w8q6ke++9V5s2bdLEiRPVpUsXhYSEyOVy6c4775TL5aqwf3V/MVQlNTVV48aN0/fff6+zZ89qy5Ytmj9/vs/9omadOXNGrVqGKP9wmV/6CwkJ8bhVSZIyMjI0ffp0v/TvDwRAoBqaN2+uMWPGaMyYMTp8+LC6du2qp59++qIB8HI6ceKEsrOz9eSTT2ratGlm+9dff+1z31U9GWbw4MGaMGGCli9frp9++kl169ZVcnKyz+dEzSopKVH+4TId2N5Szmt8mx0rKnapVbfvlJeXJ6fTabb76z7ZiIgI86EY5QoKCuR0Or36Rx4BEKhCWVmZTp486TGP16xZM0VGRurs2bM1ODJ3FvnrrHHevHk+992wYUP9+OOPlX4WFhamu+66S3//+9915swZ3XnnnR4PH0BgaxhyfvNF2c9fSafT6REA/SU+Pl5r1671aHv//fe9XoVMAIRl7N27V3//+98rtIeHh+uOO+6o9Jji4mJFRUVp0KBB6ty5s0JCQvTBBx/os88+05w5cy73kKvkdDp166236tlnn9W5c+fUokULrV+/3i/3SXbr1k0ffPCB5s6dq8jISLVq1cpcwCOdL4MOGjRIkjRjxgyfzwdrO3nypPbt22f+fODAAe3atUvXXnutrrvuOk2ePFk//PCD3njjDUnnF3PNnz9ff/7zn/XAAw/of//3f7Vy5Uq99957Xp2XAAjLKF/1+Wu9e/e+YABs0KCBxowZo/Xr12vVqlVyuVy6/vrr9de//lWjR4++3EO+qGXLlmns2LFasGCBDMNQ37599d///d+KjIz0qd+5c+dq5MiRmjp1qn766SelpaV5BMD+/furcePGcrlcGjBggK+XgauIS4Zc8u0Jmd4ev23bNo/V2BMmTJAkpaWlacmSJfr3v/+tgwcPmp+3atVK7733nsaPH68XX3xRUVFRevXVV727B1A8CxTAJSgtLVVkZKT69++vzMzMmh4O/KCoqEihoaE6lBvllznAyNjvVVhYeFlKoP7CfYAAvPb222/ryJEjlT5cHAgUlEABVNunn36qL774QjNmzNBvfvMb9e7du6aHBD8rMwyV+VgY9PX4K4UMEEC1/e1vf
9Po0aPVrFkzc0ECapfyOUBft0DAHCAAwJwD/G5PpF/mAFvGHbrq5wApgQIATC4ZKrvCq0BrCgEQAGCqidsgagpzgAAASyIAQjk5ObLZbBd89BXgC75fgaV8FaivWyAgAFrI5s2bFRwcrLvvvvui+xqGoUWLFik+Pl5Op1MhISG64YYbNG7cOI9HFlXXpk2b1K9fPzVu3FgOh0MdO3bU3LlzVVbm+eT5AQMG6LrrrpPD4VDz5s01dOjQS3oPH6686n6/ygPir7epU6dKknJzc9WnTx+Fh4fL4XCodevWmjp1qs6dO2f2cfr0aU2ePFlt2rSRw+FQ06ZN1bt3b61Zs+ayXqMVuPy0BQICoIVkZmZq7Nix+uijj6oMKoZh6L777tPDDz+sfv36af369frqq6+UmZkph8OhmTNnenXe1atXq3fv3oqKitKGDRu0Z88ejRs3TjNnztTgwYM9Hubcp08frVy5Urm5ufrnP/+p/fv3m8+cxNWtut+vcrm5ufr3v/9tbo899pgkqW7dukpNTdX69euVm5urefPmadGiRcrIyDCPHTVqlFatWqWXXnpJe/bs0bp16zRo0CAdO3bssl2fVZT9vAjG1y0gePX+eASs4uJiIyQkxNizZ4+RnJxsPP300+ZnGzZsMCQZJ06cMAzDMJYvX25IMtasWVNpXy6Xy/zz1q1bjcTERKNJkyaG0+k0br31VmP79u3m5ydPnjSaNGli/OEPf6jQzzvvvGNIMlasWHHBca9Zs8aw2WxGSUmJt5eMK8ib79evf66O8ePHG7fccov5c2hoqLFkyZIqj2nZsqXx9NNPG8OGDTNCQkKM6Oho45VXXvHquqyksLDQkGR8ubuZcfD7CJ+2L3c3MyQZhYWFNX1ZVSIDtIiVK1cqLi5OsbGxGjJkiBYvXnzBl68uX75csbGxF3zI8S/fFVdcXKy0tDRt3LhRW7ZsUdu2bdWvXz8VFxdLktavX69jx47p0UcfrdBP//791a5dOy1fvrzS8xw/flxLly5Vr169VLduXW8vGVeQN98vb+3bt0/r1q3zeOpMRESE1q5da37PLmTOnDnq3r27du7cqTFjxmj06NHKzc31y7hqqzLDP1sgIABaRGZmpoYMGSJJuvPOO1VYWKgPP/yw0n337t2r2NhYj7ZHHnlEISEhCgkJUVRUlNl+++23a8iQIYqLi1P79u21cOFCnT592ux77969kqT27dtXeq64uDhzn3KTJk1Sw4YN1aRJEx08eJB5nQDgzferXFRUlPmdCgkJqVC+7NWrlxwOh9q2bauEhAQ99dRT5mcLFy7Upk2b1KRJE910000aP368Pvnkkwrn6Nevn8aMGaPrr79ekyZNUlhYmDZs2OCHK669mANErZKbm6utW7cqJSVFklSnTh0lJyd79RT/xx9/XLt27dK0adN08uRJs72goEAjRoxQ27ZtFRoaKqfTqZMnT3q8ukSq+NLWqkycOFE7d+7U+vXrFRwcrNTUVL9lE/C/S/1+ffzxx9q1a5e5NW7c2OPzrKws7dixQ8uWLdN7772n559/3vzs1ltv1TfffKPs7GwNGjRIX375pRISEiq8m7BTp07mn202myIiInT48GFfLxm1BDfCW0BmZqb5+ppyhmHIbrdr/vz5FfZv27ZthTJR06ZN1bRpUzVr1syjPS0tTceOHdOLL76oli1bym63Kz4+XiUlJZKkdu3aSZJ2796tXr16VTjX7t271aFDB4+2sLAwhYWFqV27dmrfvr2io6O1ZcsWr9/2jCvD2+9XuVatWqlRo0YX/Dw6OlqS1KFDB5WVlWnkyJH605/+pODgYEnnF8skJCQoISFBkyZN0syZM/XUU09p0qRJqlevnrnPL9lsNrlcgZKf1AyXbCqT7eI7XqSPQEAGWMuVlpbqjTfe0Jw5czz+tf35558rMjKy0vm3lJQU5ebmVqv0+Mknn5irRW+44QbZ7XYdPXrU/Lxv37669tprK317+jvvvKOvv
/7azBwqU/7L6uzZs9W5XFxhl/L9uhQul0vnzp2rMnh16NBBpaWlOnPmjF/OaVUuwz9bICADrOXeffddnThxQsOHD1doaKjHZ/fcc48yMzP13HPPebQPHjxYq1at0uDBgzV58mQlJSUpPDxc3333nbKyssx/gUvns8U333xT3bt3V1FRkSZOnKj69eubnzds2FCvvPKKBg8erJEjRyo9PV1Op1PZ2dmaOHGiBg0apHvvvVfS+VftfPbZZ7rlllvUuHFj7d+/X0888YTatGlD9neVupTv18UsXbpUdevWVceOHWW327Vt2zZNnjxZycnJZkZ32223KSUlRd27d1eTJk301VdfacqUKerTp89V/fBlXF3IAGu5zMxMJSYmVvjlJJ3/BbVt2zZ98cUXHu02m01ZWVmaN2+e1q5dq9/97neKjY3VAw88oOjoaG3cuNGj/xMnTqhr164aOnSoHn744Qpl0kGDBmnDhg06ePCgEhISFBsbqxdeeEGPP/64VqxYYa4qbdCggVatWmWeb/jw4erUqZM+/PBD2e32y/C3A19dyvfrYurUqaNnnnlGPXr0UKdOnfTkk08qPT1dr776qrlPUlKSXn/9dfXt21ft27fX2LFjlZSUpJUrV/p8TVZX9nMJ1NctEPA6JACA+TqkTV82V4iPr0M6WexSrxv+fdW/DokMEABgScwBAgBMLsMml+HjKlAfj79SCIAAAJM/5vACZQ6QEigAwJLIAAEApjIFqczH3Kjs4rtcFQiAAACT4Yc5QIM5QABAoGEOEPDC2bNnNX36dB5XBr/hO4UrgRvh4bPyG2iv9pteETj4Tl155X/n//1FKzX08Ub4U8Uu3dXpwFX/348SKADA5JJNLh+Lgy4FRl5FCRQAYElkgFVwuVw6dOiQrrnmGvOBzaioqKjI438BX/Gdqj7DMFRcXKzIyEgFBfme01hpEQwBsAqHDh0yX8qJi+PvCv7Gd6r68vLyFBUV5XM/ZUaQygwf7wMMkKUlBMAqXHPNNZKk73bEyBlCtRj+0XF9ak0PAbWI66czOjTxL+bvK1QfAbAK5WVPZ0iQnD6uigLKBdV31PQQUAv5a5rm/CIYHx+GTQkUABBoXH54FBqrQAEAuIqRAQIATCyCAQBYkktB3AgPAEBtRgYIADCVGTaV+fg6I1+Pv1IIgAAAk39eiBsYJVACIADA5DKC5PJxEYwrQBbBMAcIALAkMkAAgIkSKADAklzyfRGLyz9DuewogQIALIkMEABg8s+N8IGRWxEAAQAm/zwKLTACYGCMEgAAPyMDBACYeB8gAMCSKIECAFDLkQECAEz+uRE+MHIrAiAAwOQybHL5eiN8gLwNIjDCNAAAfkYGCAAwufxQAuVGeABAwPHP65AIgACAAFMmm8p8vI/P1+OvlMAI0wAA+BkZIADARAkUAGBJZfK9hFnmn6FcdoERpgEA8DMyQACAiRIoAMCSeBg2AABX0IIFCxQTEyOHw6GePXtq69atVe4/b948xcbGqn79+oqOjtb48eN15swZr85JAAQAmIyf3wfoy2Z4uYgmKytLEyZMUEZGhnbs2KHOnTsrKSlJhw8frnT/ZcuW6bHHHlNGRoZ2796tzMxMZWVlacqUKV6dlwAIADCVl0B93bwxd+5cjRgxQsOGDVOHDh308ssvq0GDBlq8eHGl+2/atEm//e1vdd999ykmJkZ9+/ZVSkrKRbPGXyMAAgAui6KiIo/t7NmzFfYpKSnR9u3blZiYaLYFBQUpMTFRmzdvrrTfXr16afv27WbA++abb7R27Vr169fPq/GxCAYAYPLn65Cio6M92jMyMjR9+nSPtqNHj6qsrEzh4eEe7eHh4dqzZ0+l/d933306evSobrnlFhmGodLSUo0aNcrrEigBEABg8ucLcfPy8uR0Os12u93uU7/lcnJyNGvWLP31r39Vz549tW/fPo0bN04zZszQE088Ue1+CIAAgMvC6XR6BMDKhIWFKTg4WAUFBR7tB
QUFioiIqPSYJ554QkOHDtWDDz4oSerYsaNOnTqlkSNH6vHHH1dQUPUCOHOAAABTeQnU16266tWrp27duik7O9s9BpdL2dnZio+Pr/SY06dPVwhywcHBkiTDMKp9bjJAAIDJpSCfX2jr7fETJkxQWlqaunfvrh49emjevHk6deqUhg0bJklKTU1VixYtNHv2bElS//79NXfuXP3mN78xS6BPPPGE+vfvbwbC6iAAAgBMZYZNZT4ugvH2+OTkZB05ckTTpk1Tfn6+unTponXr1pkLYw4ePOiR8U2dOlU2m01Tp07VDz/8oKZNm6p///56+umnvTqvzfAmX7SYoqIihYaG6sTe1nJeQ7UY/tFq7YM1PQTUIq6fzuj79OkqLCy86HxbVcp/343++A+yh9T1aUxnT57T3xJW+Tymy40MEABg8udtEFc7AiAAwGT44W0QBg/DBgDg6kUGCAAwlcnmhzfCUwIFAAQYl+H7HJ4rQJZWUgIFAFgSGSAAwOTywyIYX4+/UgiAAABT+Uttfe0jEARGmAYAwM/IAAEAppp4FFpNIQACAExWmgMMjFECAOBnZIAAAJNLfngWaIAsgiEAAgBMhh9WgRoEQABAoLHS2yCYAwQAWBIZIADAZKVVoARAAICJEigAALUcGSAAwGSlZ4ESAAEAJkqgAADUcmSAAACTlTJAAiAAwGSlAEgJFABgSX4NgDk5ObLZbPrxxx/92S0A4AopzwB93QLBJQXAzZs3Kzg4WHffffdF9zUMQ4sWLVJ8fLycTqdCQkJ0ww03aNy4cdq3b5/X5960aZP69eunxo0by+FwqGPHjpo7d67Kyso89hswYICuu+46ORwONW/eXEOHDtWhQ4e8Ph8AWIkh960Ql7oZNX0R1XRJATAzM1Njx47VRx99VGVQMQxD9913nx5++GH169dP69ev11dffaXMzEw5HA7NnDnTq/OuXr1avXv3VlRUlDZs2KA9e/Zo3LhxmjlzpgYPHizDcP+19+nTRytXrlRubq7++c9/av/+/Ro0aNClXC4AWIaVMkCvF8GcPHlSWVlZ2rZtm/Lz87VkyRJNmTKl0n2zsrK0YsUKrVmzRgMGDDDbr7vuOt18880eAeuzzz7TlClTtHPnTp07d05dunTRCy+8oK5du0qSTp06pREjRmjAgAFauHChedyDDz6o8PBwDRgwQCtXrlRycrIkafz48eY+LVu21GOPPaaBAwfq3Llzqlu3rreXDQCoZbzOAFeuXKm4uDjFxsZqyJAhWrx4sUcg+6Xly5crNjbWI/j9ks3m/ldCcXGx0tLStHHjRm3ZskVt27ZVv379VFxcLElav369jh07pkcffbRCP/3791e7du20fPnySs9z/PhxLV26VL169aoy+J09e1ZFRUUeGwBYiZUyQK8DYGZmpoYMGSJJuvPOO1VYWKgPP/yw0n337t2r2NhYj7ZHHnlEISEhCgkJUVRUlNl+++23a8iQIYqLi1P79u21cOFCnT592ux77969kqT27dtXeq64uDhzn3KTJk1Sw4YN1aRJEx08eFBr1qyp8tpmz56t0NBQc4uOjq5yfwCobQiAF5Cbm6utW7cqJSVFklSnTh0lJycrMzOz2n08/vjj2rVrl6ZNm6aTJ0+a7QUFBRoxYoTatm2r0NBQOZ1OnTx5UgcPHvQ4/kLZZmUmTpyonTt3av369QoODlZqamqVx0+ePFmFhYXmlpeXV+1zAQACi1dzgJmZmSotLVVkZKTZZhiG7Ha75s+fX2H/tm3bKjc316OtadOmatq0qZo1a+bRnpaWpmPHjunFF19Uy5YtZbfbFR8fr5KSEklSu3btJEm7d+9Wr169Kpxr9+7d6tChg0dbWFiYwsLC1K5dO7Vv317R0dHasmWL4uPjK70+u90uu91ejb8JAKiduBG+EqWlpXrjjTc0Z84c7dq1y9w+//xzRUZGVjr/lpKSotzc3IuWHiXpk08+MVeL3nDDDbLb7Tp69Kj5ed++f
XXttddqzpw5FY5955139PXXX5uZaWVcLpek8/N8AIDKGYbNL1sgqHYG+O677+rEiRMaPny4QkNDPT675557lJmZqeeee86jffDgwVq1apUGDx6syZMnKykpSeHh4fruu++UlZWl4OBgc9+2bdvqzTffVPfu3VVUVKSJEyeqfv365ucNGzbUK6+8osGDB2vkyJFKT0+X0+lUdna2Jk6cqEGDBunee++VJH366af67LPPdMstt6hx48bav3+/nnjiCbVp0+aC2R8AwFqqnQFmZmYqMTGxQvCTzgfAbdu26YsvvvBot9lsysrK0rx587R27Vr97ne/U2xsrB544AFFR0dr48aNHv2fOHFCXbt21dChQ/Xwww9XKJMOGjRIGzZs0MGDB5WQkKDY2Fi98MILevzxx7VixQpzVWmDBg20atUq83zDhw9Xp06d9OGHH1LiBIAq+HoTvD/eJ3il2AxvVpVYTFFRkUJDQ3Vib2s5r+GxqfCPVmsfrOkhoBZx/XRG36dPV2FhoZxO5yX3U/77rufbD6tOQ98ShdJTZ/XpwP/n85guN36rAwAsidchAQBM/ljEUusWwQAAaj9ugwAAoJYjAwQAmCiBAgAsyfBDCZQACAAIOIYkX2+OC5R765gDBABYEhkgAMDkkk02H5/kEihPgiEAAgBMVloEQwkUAGBJZIAAAJPLsMlmkRvhCYAAAJNh+GEVaIAsA6UECgCwJDJAAIDJSotgCIAAAJOVAiAlUACAJZEBAgBMrAIFAFgSq0ABAKjlyAABAKbzGaCvi2D8NJjLjAAIADBZaRUoARAAYDLk+/v8AiQBZA4QAGBNZIAAABMlUACANVmoBkoJFABQ4xYsWKCYmBg5HA717NlTW7durXL/H3/8UQ899JCaN28uu92udu3aae3atV6dkwwQAODmhxKovDw+KytLEyZM0Msvv6yePXtq3rx5SkpKUm5urpo1a1Zh/5KSEt1xxx1q1qyZ3nrrLbVo0ULfffedGjVq5NV5CYAAAFNNPAlm7ty5GjFihIYNGyZJevnll/Xee+9p8eLFeuyxxyrsv3jxYh0/flybNm1S3bp1JUkxMTFej5MSKADgsigqKvLYzp49W2GfkpISbd++XYmJiWZbUFCQEhMTtXnz5kr7feeddxQfH6+HHnpI4eHhuvHGGzVr1iyVlZV5NT4CIADAVL4K1NdNkqKjoxUaGmpus2fPrnC+o0ePqqysTOHh4R7t4eHhys/Pr3SM33zzjd566y2VlZVp7dq1euKJJzRnzhzNnDnTq2ulBAoAcDNsXs/hVdqHpLy8PDmdTrPZbrf71u/PXC6XmjVrpoULFyo4OFjdunXTDz/8oOeee04ZGRnV7ocACAC4LJxOp0cArExYWJiCg4NVUFDg0V5QUKCIiIhKj2nevLnq1q2r4OBgs619+/bKz89XSUmJ6tWrV63xUQIFAJjKF8H4ulVXvXr11K1bN2VnZ5ttLpdL2dnZio+Pr/SY3/72t9q3b59cLpfZtnfvXjVv3rzawU8iAAIAfsnw0+aFCRMmaNGiRXr99de1e/dujR49WqdOnTJXhaampmry5Mnm/qNHj9bx48c1btw47d27V++9955mzZqlhx56yKvzUgIFANSo5ORkHTlyRNOmTVN+fr66dOmidevWmQtjDh48qKAgd74WHR2t//mf/9H48ePVqVMntWjRQuPGjdOkSZO8Oi8BEABgqqlngaanpys9Pb3Sz3Jyciq0xcfHa8uWLV6f55cIgAAATwHyLE9fEQABACYrvQ2CRTAAAEsiAwQAuFnodUgEQADAL9h+3nzt4+pHCRQAYElkgAAAN0qgAABLslAApAQKALAkMkAAgJsfX4d0tSMAAgBM3r7N4UJ9BAJKoAAASyIDBAC4WWgRDAEQAOBmoTlASqAAAEsiAwQAmGzG+c3XPgIBARAA4MYcIADAkpgDBACgdiMDBAC4UQIFAFiShQIgJVAAgCWRAQIA3CyUARIAAQBur
AIFAKB2IwMEAJh4EgwAwJosNAdICRQAYEkEQACAJVECBQCYbPLDHKBfRnL5EQCroeP6VAXVd9T0MFBLHOj3ak0PAbVIUbFLjWt6EAGKAAgAcLPQfYAEQACAm4VWgRIAAQBuFgqArAIFAFgSGSAAwMSTYAAA1kQJFACA2o0MEADgZqEMkAAIADBZaQ6QEigAwJLIAAEAbjwJBgBgSRaaA6QECgCwJDJAAIDJSotgCIAAADcLlUAJgAAANz9kgIESAJkDBABYEhkgAMCNEigAwJIsFAApgQIALIkMEABgstJtEGSAAABLIgACACyJEigAwM1Ci2AIgAAAE3OAAADUcmSAAABPAZLB+YoACABws9AcICVQAIAlkQECAExWWgRDAAQAuFmoBEoABACYrJQBMgcIALAkAiAAwM3w0+alBQsWKCYmRg6HQz179tTWrVurddyKFStks9k0cOBAr89JAAQAuNVAAMzKytKECROUkZGhHTt2qHPnzkpKStLhw4erPO7bb7/Vo48+qoSEBO9O+DMCIACgRs2dO1cjRozQsGHD1KFDB7388stq0KCBFi9efMFjysrK9F//9V968skn1bp160s6LwEQAGAqXwTj6yZJRUVFHtvZs2crnK+kpETbt29XYmKi2RYUFKTExERt3rz5guN86qmn1KxZMw0fPvySr5UACABw82MJNDo6WqGhoeY2e/bsCqc7evSoysrKFB4e7tEeHh6u/Pz8Soe4ceNGZWZmatGiRT5dKrdBAAAui7y8PDmdTvNnu93uc5/FxcUaOnSoFi1apLCwMJ/6IgACANz8eCO80+n0CICVCQsLU3BwsAoKCjzaCwoKFBERUWH//fv369tvv1X//v3NNpfLJUmqU6eOcnNz1aZNm2oNkxIoAMDkzznA6qhXr566deum7Oxss83lcik7O1vx8fEV9o+Li9P//d//adeuXeY2YMAA9enTR7t27VJ0dHS1z00GCACoURMmTFBaWpq6d++uHj16aN68eTp16pSGDRsmSUpNTVWLFi00e/ZsORwO3XjjjR7HN2rUSJIqtF8MARAA4FYDzwJNTk7WkSNHNG3aNOXn56tLly5at26duTDm4MGDCgryf8GSAAgAMNXUs0DT09OVnp5e6Wc5OTlVHrtkyRLvTyjmAAEAFkUGCABw43VIAABLIgACAKzI9vPmax+BgDlAAIAlkQECANwogQIArKimboOoCZRAAQCWRAYIAHCjBAoAsKwACWC+ogQKALAkMkAAgMlKi2AIgAAANwvNAVICBQBYEhkgAMBECRQAYE2UQAEAqN3IAAEAJkqgAABrslAJlAAIAHCzUABkDhAAYElkgAAAE3OAAABrogQKAEDtRgYIADDZDEM2w7cUztfjr5TLngHm5OTIZrPpxx9/vNynAgD4yvDTFgD8FgA3b96s4OBg3X333VXuVx4Qf71NnTpVkpSbm6s+ffooPDxcDodDrVu31tSpU3Xu3Dmzj9OnT2vy5Mlq06aNHA6HmjZtqt69e2vNmjX+uhwAQC3ntxJoZmamxo4dq8zMTB06dEiRkZFV7p+bmyun02n+HBISIkmqW7euUlNT1bVrVzVq1Eiff/65RowYIZfLpVmzZkmSRo0apU8//VQvvfSSOnTooGPHjmnTpk06duyYvy4HACyJVaBeOnnypLKysrRt2zbl5+dryZIlmjJlSpXHNGvWTI0aNarQ3rp1a7Vu3dr8uWXLlsrJydHHH39str3zzjt68cUX1a9fP0lSTEyMunXr5tFPTEyMRo4cqX379ukf//iHGjdurKlTp2rkyJE+XCkA1HKsAvXOypUrFRcXp9jYWA0ZMkSLFy+W4adJ0H379mndunXq3bu32RYREaG1a9equLi4ymPnzJmj7t27a+fOnRozZoxGjx6t3NzcC+5/9uxZFRUVeWwAgNrJLwEwMzNTQ4YMkSTdeeedKiws1IcffljlMVFRUQoJCTG3X5cve/XqJYfDobZt2yohI
UFPPfWU+dnChQu1adMmNWnSRDfddJPGjx+vTz75pMI5+vXrpzFjxuj666/XpEmTFBYWpg0bNlxwTLNnz1ZoaKi5RUdHe/PXAAABr7wE6usWCHwOgLm5udq6datSUlIkSXXq1FFycrIyMzOrPO7jjz/Wrl27zK1x48Yen2dlZWnHjh1atmyZ3nvvPT3//PPmZ7feequ++eYbZWdna9CgQfryyy+VkJCgGTNmePTRqVMn8882m00RERE6fPjwBcc0efJkFRYWmlteXl61/x4AoFaw0CpQn+cAMzMzVVpa6rHoxTAM2e12zZ8//4LHtWrVqtI5wHLl2VeHDh1UVlamkSNH6k9/+pOCg4MlnV8sk5CQoISEBE2aNEkzZ87UU089pUmTJqlevXrmPr9ks9nkcrkueE673S673X7RawYABD6fAmBpaaneeOMNzZkzR3379vX4bODAgVq+fLni4uJ8GqAkuVwunTt3Ti6XywyAv9ahQweVlpbqzJkzZgAEAHiHVaDV9O677+rEiRMaPny4QkNDPT675557lJmZqeeee86rPpcuXaq6deuqY8eOstvt2rZtmyZPnqzk5GQzo7vtttuUkpKi7t27q0mTJvrqq680ZcoU9enTx+PWCgCAlyy0CtSnAJiZmanExMQKwU86HwCfffZZffHFF94NqE4dPfPMM9q7d68Mw1DLli2Vnp6u8ePHm/skJSXp9ddf15QpU3T69GlFRkbqP/7jPzRt2jRfLgcAoMDJ4HxlM/x1v0ItVFRUpNDQUEXNn66g+o6aHg5qiQP9Xq3pIaAWKSp2qXG7b1RYWOhTBaz89123e59Wnbq+/b4rPXdG21c+7vOYLjcehg0AcDOM85uvfQQAAiAAwGSlRTC8DxAAYElkgAAAN1aBAgCsyOY6v/naRyCgBAoAsCQyQACAGyVQAIAVsQoUAIBajgwQAODGjfAAACuiBAoAQC1HBggAcGMVKADAiqxUAiUAAgDcLLQIhjlAAIAlkQECAEyUQAEA1mShRTCUQAEAlkQGCAAwUQIFAFiTyzi/+dpHAKAECgCwJDJAAICbhRbBEAABACab/DAH6JeRXH6UQAEAlkQGCABw41FoAAArKr8NwtfNWwsWLFBMTIwcDod69uyprVu3XnDfRYsWKSEhQY0bN1bjxo2VmJhY5f4XQgAEALgZftq8kJWVpQkTJigjI0M7duxQ586dlZSUpMOHD1e6f05OjlJSUrRhwwZt3rxZ0dHR6tu3r3744QevzksABADUqLlz52rEiBEaNmyYOnTooJdfflkNGjTQ4sWLK91/6dKlGjNmjLp06aK4uDi9+uqrcrlcys7O9uq8BEAAgMlmGH7ZJKmoqMhjO3v2bIXzlZSUaPv27UpMTDTbgoKClJiYqM2bN1drzKdPn9a5c+d07bXXenWtBEAAgJvLT5uk6OhohYaGmtvs2bMrnO7o0aMqKytTeHi4R3t4eLjy8/OrNeRJkyYpMjLSI4hWB6tAAQCXRV5enpxOp/mz3W73+zn+8pe/aMWKFcrJyZHD4fDqWAIgAMD0yxKmL31IktPp9AiAlQkLC1NwcLAKCgo82gsKChQREVHlsc8//7z+8pe/6IMPPlCnTp28HiclUACA2xVeBVqvXj1169bNYwFL+YKW+Pj4Cx737LPPasaMGVq3bp26d+/uxQW6kQECAGrUhAkTlJaWpu7du6tHjx6aN2+eTp06pWHDhkmSUlNT1aJFC3MO8ZlnntG0adO0bNkyxcTEmHOFISEhCgkJqfZ5CYAAALcaeBJMcnKyjhw5omnTpik/P19dunTRunXrzIUxBw8eVFCQu2D5t7/9TSUlJRo0aJBHPxkZGZo+fXq1z0sABACYauqFuOnp6UpPT6/0s5ycHI+fv/32W+9PUAnmAAEAlkQGCABws9DDsAmAAACTzXV+87WPQEAJFABgSWSAAAA3SqAAAEu6hNcZVdpHACAAAgBM/nwU2tWOOUAAgCWRAQIA3JgDB
ABYkiHzfX4+9REAKIECACyJDBAAYLLSIhgCIADAzZAf5gD9MpLLjhIoAMCSyAABAG6sAgUAWJJLks0PfQQASqAAAEsiAwQAmFgFCgCwJgvNAVICBQBYEhkgAMDNQhkgARAA4EYABABYErdBAABQu5EBAgBM3AYBALAmC80BUgIFAFgSGSAAwM1lSDYfMzhXYGSABEAAgJuFSqAEwCoYP/9HdP10poZHgtqkqDhA1ogjIBSdPP99MgIk6FxNCIBVKC4uliQdmviXGh4JapPGNT0A1ErFxcUKDQ31Q09+yAAD5JXwBMAqREZGKi8vT9dcc41sNl/vDK29ioqKFB0drby8PDmdzpoeDmoBvlPVZxiGiouLFRkZ6a8OKYFCCgoKUlRUVE0PI2A4nU5+WcGv+E5Vj38yP+shAAIA3FyGfC5hsgoUABBwDNf5zdc+AgA3wsNndrtdGRkZstvtNT0U1BJ8p3Al2AzWzgKA5RUVFSk0NFSJ0aNVJ8i3f3iUus7qg7y/qbCw8Kqew6UECgBwYw4QAGBJFroNgjlAAIAlkQECANwM+SED9MtILjsCIADAjRIoAAC1GxkgAMDN5ZLk443srsC4EZ4ACABwowQKAEDtRgYIAHCzUAZIAAQAuFnoSTCUQAEAlkQGCAAwGYZLho+vM/L1+CuFAAgAcDMM30uYATIHSAkUAGBJZIAAADfDD4tgAiQDJAACANxcLsnm4xwec4AAgIBjoQyQOUAAgCWRAQIATIbLJcPHEii3QQAAAg8lUAAAajcyQACAm8uQbNbIAAmAAAA3w5DPL8QNkABICRQAYElkgAAAk+EyZPhYAjXIAAEAAcdw+Wfz0oIFCxQTEyOHw6GePXtq69atVe7/j3/8Q3FxcXI4HOrYsaPWrl3r9TkJgACAGpWVlaUJEyYoIyNDO3bsUOfOnZWUlKTDhw9Xuv+mTZuUkpKi4cOHa+fOnRo4cKAGDhyof/3rX16d12YESq4KALhsioqKFBoaqtts/6k6tro+9VVqnFOOsVqFhYVyOp0X3b9nz5666aabNH/+fEmSy+VSdHS0xo4dq8cee6zC/snJyTp16pTeffdds+3mm29Wly5d9PLLL1d7nGSAAAC3K1wCLSkp0fbt25WYmGi2BQUFKTExUZs3b670mM2bN3vsL0lJSUkX3P9CWAQDADCV6pzPD4Ip1TlJ57PKX7Lb7bLb7R5tR48eVVlZmcLDwz3aw8PDtWfPnkr7z8/Pr3T//Px8r8ZJAAQAqF69eoqIiNDGfO8Xk1QmJCRE0dHRHm0ZGRmaPn26X/r3BwIgAEAOh0MHDhxQSUmJX/ozDEM2m82j7dfZnySFhYUpODhYBQUFHu0FBQWKiIiotO+IiAiv9r8QAiAAQNL5IOhwOK7oOevVq6du3bopOztbAwcOlHR+EUx2drbS09MrPSY+Pl7Z2dl65JFHzLb3339f8fHxXp2bAAgAqFETJkxQWlqaunfvrh49emjevHk6deqUhg0bJklKTU1VixYtNHv2bEnSuHHj1Lt3b82ZM0d33323VqxYoW3btmnhwoVenZcACACoUcnJyTpy5IimTZum/Px8denSRevWrTMXuhw8eFBBQe6bFnr16qVly5Zp6tSpmjJlitq2bau3335bN954o1fn5T5AAIAlcR8gAMCSCIAAAEsiAAIALIkACACwJAIgAMCSCIAAAEsiAAIALIkACACwJAIgAMCSCIAAAEsiAAIALIkACACwpP8P9ViaRbKhwzcAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Compute the axis labels once and reuse them for both axes\n", "labels = [entry.atoms.get_chemical_formula() for entry in db]\n", "# matshow creates its own figure, so a separate plt.figure() call is not needed\n", "plt.matshow(similarity_matrix)\n", "plt.xticks(range(len(labels)), labels)\n", "plt.yticks(range(len(labels)), labels)\n", "plt.title(\"PTE similarity\")\n", "plt.clim(0, 1)\n", "plt.colorbar()\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 5 }