Catalog
Description
Builds a catalog either from a root URL, by walking the file tree, or from a TSV file list, and returns a catalog manifest.
The catalog command can be used to create a list of files in a specified subtree and store the result together with explicit version information in a catalog (aka manifest) file. It will not capture empty folders.
The JSON file produced records the files and their versions at the time the command is run. Because the cataloged location is a live server, running the command again might produce a different catalog if files were added, deleted, or updated in the meantime.
To determine whether the cataloged version is still the same, use the Diff command to compare two catalogs made at different points in time, or even catalogs of different copy locations of the same asset.
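The idea behind cataloging and comparing can be sketched with the standard library alone. This is a conceptual illustration only: it walks a local directory instead of a server, uses a SHA-256 content digest as a stand-in for version information, and does not reproduce the WRAPP manifest format or the Diff command. The function names `snapshot` and `compare` are hypothetical.

```python
import hashlib
import os


def snapshot(root):
    """Map each file's path (relative to root) to a SHA-256 digest of its content.

    Only files produce entries, so empty folders never appear in the result.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                result[rel] = hashlib.sha256(fh.read()).hexdigest()
    return result


def compare(old, new):
    """Return sorted lists of added, removed, and changed relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

Taking one snapshot, modifying the tree, and taking a second snapshot gives two mappings that `compare` can reduce to the files that were added, removed, or changed in between.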
Use --hash-cache-file (--hcf) to speed up repeated catalog operations by
caching file hashes between runs. See Generic Parameters for details.
Legacy pickle cache files (WRAPP <=2.1.0) are rejected; see CLI for JSON format and
one-time migration guidance.
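To illustrate why a hash cache speeds up repeated runs, here is a minimal sketch of the general technique: a digest is recomputed only when a file's size or modification time has changed since it was cached. The JSON layout and the helper names (`load_cache`, `save_cache`, `cached_hash`) are assumptions made for this example and are not the format WRAPP uses; see CLI for the actual format.

```python
import hashlib
import json
import os


def load_cache(cache_path):
    """Load the cache dictionary, or start empty if the file does not exist."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    return {}


def save_cache(cache, cache_path):
    """Write the cache dictionary to disk as JSON."""
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(cache, fh)


def cached_hash(path, cache):
    """Return the SHA-256 of path, reusing the cached digest when size and mtime match."""
    stat = os.stat(path)
    entry = cache.get(path)
    if entry and entry["size"] == stat.st_size and entry["mtime_ns"] == stat.st_mtime_ns:
        return entry["sha256"]
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    # Illustrative entry layout only; not the WRAPP cache schema.
    cache[path] = {"size": stat.st_size, "mtime_ns": stat.st_mtime_ns, "sha256": digest}
    return digest
```

On the first run every file is read and hashed. On later runs, unchanged files are answered from the cache, so only new or modified files incur the cost of reading their content.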
For usage examples, see the Tutorial. For CLI options, run wrapp catalog --help.
Python API Reference
async wrapp.catalog(
    root_dir: str | None = None,
    file_list: str | List[Tuple[str, str]] | None = None,
    tags: bool = False,
    ignores: str | IgnoreEvaluator | None = None,
    local_hash: bool = False,
    checkpoints: bool = True,
    skip_missing: bool = False,
    *,
    context: CommandParameters = CommandParameters(debug=False, verbose=False, dry_run=False, log_file=None, hash_cache_file=None),
    scheduler: SchedulerContext | None = None,
)
Builds a catalog either from a root URL, by walking the file tree, or from a file list (a TSV file or an explicit list), and returns a Catalog.
- Parameters:
root_dir – URL of the root of the files to be cataloged
file_list – Alternative to root_dir: either the name of a tab-separated file listing the files to include, or an explicit list of all files as (base path, relative path) tuples.
tags – Set this to additionally query the tagging service and make tags part of the catalog
ignores – Specify the name of an ignore file containing rules for ignoring items, or an IgnoreEvaluator object
local_hash – Flag to allow the catalog operation to calculate the hash locally, potentially after a download of the data
checkpoints – Flag, default on, to list checkpoints of local items and add the latest checkpoint into the source URL if the source URL supports checkpoints
skip_missing – When True and using file_list mode, silently skip files that are not found instead of raising an error. This is useful for targeted cataloging where some files may not exist in the destination.
context – Global configuration parameters
scheduler – Optional pre-constructed SchedulerContext. When calling many functions in a row, pre-construct the scheduler and pass it to each call.
- Returns:
Catalog created
- Raises:
FailedCommand – Raised when prerequisites are not met
StorageOperationError – Raised when network or file operations fail
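A minimal sketch of calling the function in file_list mode, based only on the signature above. The helper names `build_file_list` and `catalog_selected` are hypothetical, and the sketch assumes the `wrapp` package is installed and that default values are acceptable for all other parameters.

```python
import asyncio


def build_file_list(base_url, relative_paths):
    """Pair each relative path with the base URL as (base path, relative path) tuples."""
    return [(base_url, rel) for rel in relative_paths]


async def catalog_selected(base_url, relative_paths):
    """Catalog only the listed files, skipping any that are not found."""
    # Imported here so build_file_list can be used without wrapp installed.
    import wrapp

    return await wrapp.catalog(
        file_list=build_file_list(base_url, relative_paths),
        skip_missing=True,  # skip missing files instead of raising an error
    )
```

With a placeholder URL such as `omniverse://example-server/Projects/demo` (hypothetical), the coroutine could be run as `asyncio.run(catalog_selected(url, ["scene.usd", "textures/wood.png"]))`. Passing `root_dir` instead of `file_list` would catalog the whole subtree via a file tree walk.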