Administration
==============

General Notes
-------------

Almost all aspects of the Enterprise Nucleus Server stack are documented via its settings (:code:`.env`) file, included in the stack tarball we provide. We try to keep our documentation on settings and options as close to the "code" as possible: that file should be self-explanatory, with comments describing what each setting does. This document should be considered an addendum to the information in the :code:`.env` file.

Monitoring
----------

Monitoring your instances is imperative to understanding the general health of the system and whether more resources are necessary. At a minimum, one should monitor:

* Disk space
* CPU utilization and load average
* Memory

Additionally, the Nucleus stack itself exposes quite a few metrics about its load characteristics (such as the number of requests per user, per request type, etc.). We recommend taking advantage of these metrics. We expose them in a form consumable by `Prometheus <https://prometheus.io/>`_. As usual, the port for scraping metrics can be found in the stack's :code:`.env` file.

.. _nuc_usage_backups:

Data Management and Backups
---------------------------

The Nucleus Data Directory contains multiple sub-directories utilized by various :ref:`Nucleus components `:

* the :code:`data` subdir contains core data - the actual data residing in Nucleus (elements of Nucleus's file tree: their content and metadata such as ACLs and timestamps) uploaded by its users
* :code:`local-accounts-db` and :code:`tags-db` are the Authentication and Tagging services' databases, respectively
* the :code:`log` subdir contains log files
* :code:`scratch`, :code:`resolver-cache`, and :code:`tmp` contain internal cache and scratch spaces

Core Nucleus Data
+++++++++++++++++

The :code:`data` directory with the actual data hosted by Nucleus is opaque and should not be changed or modified externally. If making a copy of this directory (for migration to another machine, for example), **the Nucleus stack must be stopped**. This bears repeating - **copying the Core data directory "hot" cannot be done safely.**

If backups of Core Data are desired, the :doc:`nucleus-tools ` package contains the necessary tooling to create and restore copies of Core Data.

Services Data Dirs
++++++++++++++++++

These include the accounts and tags databases. They can be safely copied and backed up "hot" (while the respective services are running).

Logs
++++

Logs are text files and should be self-evident, with one major note: **live logs (files that are being appended to by services) are, in general, not externally rotatable**. However, our stack includes rotation and archival sidecars, and you can certainly blow away archives with no ill effects (aside from losing log data, of course). Log files can be copied without stopping services, without problem.

Scratch, Temp, Cache Data
+++++++++++++++++++++++++

Data located in Nucleus's internal caches and scratch spaces does not require backups and can be deleted without ill effects, but only **when the stack is not running**.

.. _nuc_enterprise_usage_migration:

Migration and Upgrades: Methodology
+++++++++++++++++++++++++++++++++++

**We do not support data migration from pre-2021.2.0 versions to 2021.2.0.** :guilabel:`Nucleus 2021.2.0`

When moving between servers, a method we find convenient is the so-called *blue-green* approach, where a new instance is brought up alongside the old one and validated prior to switching users over to it.

Here's a helpful recipe for server migration with as little downtime as possible (*short hostnames are used here for clarity; in practice, we recommend using FQDNs when deploying Nucleus*):

* Suppose Nucleus is deployed on :code:`nucleus-host-1`, and we desire to migrate to :code:`nucleus-host-2`. The DNS CNAME users utilize when accessing this instance is :code:`my-nucleus`, and it currently resolves to :code:`nucleus-host-1`
* :code:`nucleus-host-2` is brought up and the entire data directory is :code:`rsync`'d from :code:`nucleus-host-1` (a sketch of such a pass follows this recipe). Note that while hot copies are not supported (see above), a hot copy is acceptable here to transfer the bulk of the data before shutting down the source instance: in all likelihood it will not be corrupt enough to preclude Nucleus from starting, and the authoritative sync happens later, with both stacks stopped
* The Nucleus stack is configured and launched on :code:`nucleus-host-2`. Data upgrade is performed, if required. The instance is then validated to be operational
* Downtime for :code:`my-nucleus` is scheduled, and users are notified
* Shortly before the scheduled downtime window, :code:`nucleus-host-1`'s data is :code:`rsync`'d to :code:`nucleus-host-2` again - this will not be the final sync, but it will bring the two servers' data very close in state
* During the downtime:

  * Nucleus on :code:`nucleus-host-1` is shut down
  * Nucleus on :code:`nucleus-host-2` is shut down
  * Data is :code:`rsync`'d again. This will be very quick (seconds for terabytes of data)
  * Data upgrade is performed (because :code:`rsync` will have "reverted" the upgrade that was made during the initial test deployment)
  * Nucleus on :code:`nucleus-host-2` is brought up and quickly validated
  * DNS CNAME :code:`my-nucleus` is updated to resolve to :code:`nucleus-host-2`
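Each sync step in this recipe is, at its core, a recursive :code:`rsync` pass pulling the data directory from the old host to the new one. A minimal sketch of such a pass, run from :code:`nucleus-host-2` - the :code:`/opt/nucleus/data` path and SSH access between the hosts are assumptions for illustration, not documented defaults:

.. code-block:: bash

   # Illustrative only: the data directory location and SSH connectivity
   # between hosts are assumptions, not documented defaults.
   # -a preserves permissions, ownership, and timestamps; --delete removes
   # files that have disappeared on the source, keeping an exact mirror.
   rsync -a --delete nucleus-host-1:/opt/nucleus/data/ /opt/nucleus/data/

Because :code:`rsync` only transfers what changed since the previous pass, each pass after the initial bulk copy completes quickly - which is why the final sync, performed with both stacks stopped, takes seconds rather than hours.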
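After updating the CNAME, it is worth confirming that the name users connect with actually resolves to the new host before declaring the migration complete. A hypothetical check using the short names from the recipe (in practice, query the real FQDN):

.. code-block:: bash

   # Hypothetical post-cutover check; my-nucleus stands in for the real
   # name users connect with. Once DNS has propagated, the CNAME should
   # point at the new host:
   dig +short my-nucleus CNAME
   # expected output:
   #   nucleus-host-2.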
Authentication
--------------

Covered in the :doc:`Authentication and User Registration <../usage/auth_user_mgmt>` document.