Catalog
Description
Takes a root URL or a file list and builds a catalog, either by walking the file tree or by reading a tab-separated (TSV) file list, and returns a Catalog.
The catalog command can be used to create a list of files in a specified subtree and store the result together with explicit version information in a catalog (aka manifest) file. It will not capture empty folders.
The JSON file produced records the files and their versions as they were at the moment the command was run. Because the source is a live server, running the command again might produce a different catalog if files were added, deleted, or updated in the meantime.
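The idea of a file tree walk that records each file together with a version can be sketched in a few lines. This is a minimal illustration on a local folder, not wrapp's implementation: the function name `walk_manifest` is hypothetical, and a SHA-256 content digest stands in for whatever version information the real catalog stores.

```python
import hashlib
import os


def walk_manifest(root):
    """Map each file path (relative to root, '/'-separated) to a SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            with open(full_path, "rb") as handle:
                # Assumption: the content digest plays the role of the file "version".
                manifest[rel_path] = hashlib.sha256(handle.read()).hexdigest()
    # Only files create entries, so empty folders never appear in the result.
    return manifest
```

Because entries are created per file, a folder with no files leaves no trace in the result, which mirrors the note above that empty folders are not captured.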
To determine whether the cataloged version is still current, use the diff command to compare two catalogs made at different points in time, or even catalogs of different copy locations of the same asset.
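Conceptually, comparing two catalogs means comparing two path-to-version mappings. The sketch below shows that comparison on plain dictionaries. It is an assumption-laden illustration only: `diff_catalogs` is a hypothetical helper, the dictionaries do not reflect the actual catalog JSON layout, and the result does not reflect the output format of the diff command.

```python
def diff_catalogs(old, new):
    """Compare two {path: version} mappings and report what changed."""
    added = sorted(path for path in new if path not in old)
    removed = sorted(path for path in old if path not in new)
    changed = sorted(
        path for path in old if path in new and old[path] != new[path]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Two hypothetical snapshots of the same subtree, taken at different times.
earlier = {"a.usd": "v1", "b.png": "v1", "c.txt": "v1"}
later = {"a.usd": "v2", "b.png": "v1", "d.txt": "v1"}

print(diff_catalogs(earlier, later))
# {'added': ['d.txt'], 'removed': ['c.txt'], 'changed': ['a.usd']}
```

The same comparison applies whether the two snapshots come from different points in time or from two copy locations of the same asset.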
For usage examples, see the Tutorial. For CLI options, run wrapp catalog --help.
Python API Reference
- async wrapp.catalog(
- root_dir: str | None = None,
- file_list: str | List[Tuple[str, str]] | None = None,
- tags: bool = False,
- ignores: str | IgnoreEvaluator | None = None,
- local_hash: bool = False,
- checkpoints: bool = True,
- *,
- context: CommandParameters = CommandParameters(debug=False, verbose=False, dry_run=False, log_file=None, hash_cache_file=None),
- scheduler: SchedulerContext | None = None,
- )
Takes a root URL or a file list and builds a catalog, either by walking the file tree or by reading a tab-separated (TSV) file list, and returns a Catalog.
- Parameters:
root_dir – URL of the root of the files to be cataloged
file_list – Alternative to root_dir: either the name of a tab-separated file listing the files to include, or an explicit list of files given as (base path, relative path) tuples.
tags – Set this to additionally query the tagging service and make tags part of the catalog
ignores – Specify the name of an ignore file containing rules for ignoring items
local_hash – Flag to allow the catalog operation to calculate the hash locally, potentially after a download of the data
checkpoints – Flag, default on, to list checkpoints of local items and add the latest checkpoint into the source URL if the source URL supports checkpoints
context – Global configuration parameters
scheduler – Optional pre-constructed SchedulerContext. When calling many functions in a row, make sure to pre-construct the scheduler.
- Returns:
Catalog created
- Raises:
FailedCommand – Raised when prerequisites are not met
StorageOperationError – Raised when network or file operations fail
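Since catalog is a coroutine function, it has to be awaited. The sketch below shows one possible calling pattern using only the keyword names from the signature above. To keep it runnable without wrapp installed, the coroutine is passed in as an argument: `catalog_subtree` and `fake_catalog` are hypothetical names, the URL is a placeholder, and the stand-in returns a plain dictionary rather than a real Catalog.

```python
import asyncio


async def catalog_subtree(catalog_fn, root_url, ignore_file=None):
    """Await a catalog coroutine using keyword names from the documented signature."""
    return await catalog_fn(
        root_dir=root_url,
        ignores=ignore_file,  # name of an ignore file, or None
        tags=False,           # do not query the tagging service
        local_hash=False,     # do not calculate hashes locally
        checkpoints=True,     # documented default
    )


async def fake_catalog(**kwargs):
    # Stand-in for wrapp.catalog so the sketch runs anywhere.
    # The real function returns a Catalog, not a dictionary.
    return {"received": kwargs}


result = asyncio.run(catalog_subtree(fake_catalog, "file:///tmp/assets"))
print(result["received"]["root_dir"])
```

With wrapp installed, passing `wrapp.catalog` in place of `fake_catalog` should follow the same pattern, assuming the default context and scheduler are acceptable for a single call.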