Administration
==============

General Notes
-------------

Almost all aspects of the Enterprise Nucleus Server stack are documented via its settings (:code:`.env`) file, included in the stack tarball we provide. We try to keep our documentation on settings and options as close to the "code" as possible: that file should be self-explanatory, with comments describing what each setting does. This document should be considered an addendum to the information in the :code:`.env` file.

Monitoring
----------

Monitoring your instances is imperative to understanding the general health of the system and whether more resources are necessary. At a minimum, one should monitor:

* Disk space
* CPU utilization and load average
* Memory

Additionally, the Nucleus stack itself exposes quite a few metrics about its load characteristics (such as the number of requests per user, per request type, etc.). We recommend taking advantage of these metrics. We expose them in a form consumable by `Prometheus <https://prometheus.io/>`_. As usual, the port for scraping metrics can be found in the stack's :code:`.env` file.

.. _nuc_usage_backups:

Data Management and Backups
---------------------------

The Nucleus Data Directory contains multiple sub-directories utilized by various :ref:`Nucleus components `:

* the :code:`data` subdir contains core data - the actual data residing in Nucleus (elements of Nucleus's file tree: their content and metadata such as ACLs and timestamps) uploaded by its users
* :code:`local-accounts-db` and :code:`tags-db` are the Authentication and Tagging services' databases, respectively
* the :code:`log` subdir contains log files
* :code:`scratch`, :code:`resolver-cache`, and :code:`tmp` contain internal cache and scratch spaces

Core Nucleus Data
+++++++++++++++++

The :code:`data` directory with the actual data hosted by Nucleus is opaque and should not be changed or modified externally. If making a copy of this directory (for migration to another machine, for example), **the Nucleus stack must be stopped**. This bears repeating - **copying the Core data directory "hot" cannot be done safely.**

If backups of Core Data are desired, the :doc:`nucleus-tools ` package contains the necessary tooling to create and restore copies of Core Data.

Services Data Dirs
++++++++++++++++++

These include the accounts and tags databases. They can be safely copied and backed up "hot" (while the respective services are running).

Logs
++++

Logs are text files and should be self-evident, with one major note: **live logs (files that are being appended to by services) are, in general, not externally rotatable**. However, our stack includes rotation and archival sidecars, and you can certainly blow away archives with no ill effects (aside from losing log data, of course). Log files can be copied without stopping services, without problem.

Scratch, Temp, Cache Data
+++++++++++++++++++++++++

Data located in Nucleus's internal caches and scratch spaces does not require backups and can be deleted without ill effects, but only **when the stack is not running**.

.. _nuc_enterprise_usage_migration:

Migration and Upgrades: Methodology
+++++++++++++++++++++++++++++++++++

**We do not support data migration from pre-2021.2.0 versions to 2021.2.0.** :guilabel:`Nucleus 2021.2.0`

When moving between servers, a method we find convenient is the so-called *blue-green* approach, where a new instance is brought up alongside the old one and validated prior to switching users over to it.

Here's a helpful recipe for server migration with as little downtime as possible (*short hostnames are used here for clarity; in practice, we recommend using FQDNs when deploying Nucleus*):

* Suppose Nucleus is deployed on :code:`nucleus-host-1`, and we desire to migrate to :code:`nucleus-host-2`. The DNS CNAME users utilize when accessing this instance is :code:`my-nucleus`, and it currently resolves to :code:`nucleus-host-1`
* :code:`nucleus-host-2` is brought up and the entire data directory is :code:`rsync`'d from :code:`nucleus-host-1` (a sketch of such a pass follows this recipe). Note that while hot copies are not supported (see above), a hot copy is acceptable here to transfer the bulk of the data before shutting down the source instance: in all likelihood it will not be corrupt enough to preclude Nucleus from starting, and the authoritative sync happens later, with both stacks stopped
* The Nucleus stack is configured and launched on :code:`nucleus-host-2`. Data upgrade is performed, if required. The instance is then validated to be operational
* Downtime for :code:`my-nucleus` is scheduled, and users are notified
* Shortly before the scheduled downtime window, :code:`nucleus-host-1`'s data is :code:`rsync`'d to :code:`nucleus-host-2` again - this will not be the final sync, but it will bring the two servers' data very close in state
* During the downtime:

  * Nucleus on :code:`nucleus-host-1` is shut down
  * Nucleus on :code:`nucleus-host-2` is shut down
  * Data is :code:`rsync`'d again. This will be very quick (seconds for terabytes of data)
  * Data upgrade is performed (because :code:`rsync` will have "reverted" the upgrade that was made during the initial test deployment)
  * Nucleus on :code:`nucleus-host-2` is brought up and quickly validated
  * DNS CNAME :code:`my-nucleus` is updated to resolve to :code:`nucleus-host-2`
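Each sync step in this recipe is, at its core, a recursive :code:`rsync` pass pulling the data directory from the old host to the new one. A minimal sketch of such a pass, run from :code:`nucleus-host-2` - the :code:`/opt/nucleus/data` path and SSH access between the hosts are assumptions for illustration, not documented defaults:

.. code-block:: bash

   # Illustrative only: the data directory location and SSH connectivity
   # between hosts are assumptions, not documented defaults.
   # -a preserves permissions, ownership, and timestamps; --delete removes
   # files that have disappeared on the source, keeping an exact mirror.
   rsync -a --delete nucleus-host-1:/opt/nucleus/data/ /opt/nucleus/data/

Because :code:`rsync` only transfers what changed since the previous pass, each pass after the initial bulk copy completes quickly - which is why the final sync, performed with both stacks stopped, takes seconds rather than hours.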
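After updating the CNAME, it is worth confirming that the name users connect with actually resolves to the new host before declaring the migration complete. A hypothetical check using the short names from the recipe (in practice, query the real FQDN):

.. code-block:: bash

   # Hypothetical post-cutover check; my-nucleus stands in for the real
   # name users connect with. Once DNS has propagated, the CNAME should
   # point at the new host:
   dig +short my-nucleus CNAME
   # expected output:
   #   nucleus-host-2.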
Authentication
--------------

Covered in the :doc:`Authentication and User Registration <../usage/auth_user_mgmt>` document.