Commit e7e820ea authored by Andrii Salnikov

A-REX cache and ACIX description from ARC CE Sysadm Manual ported

parent dc41daab
.. _arc_acix:

The ARC Cache Index (ACIX)
==========================

The ARC Cache Index (ACIX) is a catalog of locations of cached files.
It consists of two components: the *ACIX Scanner*, which runs on the computing resource,
and the *ACIX Index*, which indexes the cache locations
retrieved from the ACIX Scanners. These components are provided
by the packages ``nordugrid-arc-acix-scanner`` and
``nordugrid-arc-acix-index`` respectively. Both depend on a third package,
``nordugrid-arc-acix-core``.

ACIX Scanner
------------
The ACIX Scanner periodically scans the :ref:`A-REX cache <arex_cache>` and constructs a
Bloom filter of cache content. This filter is a way of representing the
cache content in an extremely compressed format, which allows fast query
of any element of the filter and efficient upload of the content to an
index server.
This type of compression can, however, produce false positives,
i.e. a certain file may appear to be present in
the cache according to the filter when it is not. The ACIX Scanner runs
in an HTTPS server and the filter is accessible at the endpoint
``https://hostname:5443/data/cache``.
It scans the caches specified in
the A-REX ``arc.conf``. It does not require any configuration, although some
options can be changed. It is important to make sure that
the ACIX Scanner port (default 5443) is open in the firewall.
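To illustrate why a Bloom filter is compact yet can report false positives, the following is a minimal sketch of the general technique. It is not the ACIX Scanner implementation: the class name, the filter size, the number of hash functions and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a fixed-size bit array plus several hashes.

    Hypothetical sketch only; the sizes and hash scheme are assumptions,
    not the parameters used by the ACIX Scanner.
    """

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True if all positions are set. Bits set by other items can
        # collide, which is the source of false positives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


cache_filter = BloomFilter()
cache_filter.add("http://www.nordugrid.org:80/data/echo.sh")

# An added item is always reported as present (no false negatives).
print("http://www.nordugrid.org:80/data/echo.sh" in cache_filter)
# An item that was never added is usually, but not always, reported absent.
print("http://my.host/data1" in cache_filter)
```

The filter occupies a fixed number of bytes regardless of how many URLs are added, which is what makes it cheap to transfer to an index server. The price is that a positive answer is only probabilistic.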

ACIX Index
----------

The ACIX Index server runs independently of the ACIX Scanner and A-REX, but
can be deployed on the same host as both of them. It is configured with
a list of ACIX Scanners and periodically pulls the cache filter from
each one. It runs within an HTTPS server through which users can query
the cached locations of files. Configuration uses the regular ``arc.conf``
file in the :ref:`reference_acix-index`. Here ACIX Scanners are
specified by the :ref:`reference_acix-index_cachescanner` option. For example:

.. code-block:: ini

   [acix-index]
   cachescanner = https://my.host:5443/data/cache
   cachescanner = https://another.host:5443/data/cache

The ACIX Index server can be queried at the endpoint
``https://hostname:6443/data/index``. The URLs to check are
given as comma-separated values of the ``url`` option of this URL, e.g.::

   https://hostname:6443/data/index?url=http://www.nordugrid.org:80/data/echo.sh,\
   http://my.host/data1

A JSON-formatted response is returned, consisting of a dictionary
mapping each URL to a list of locations. If remote access to the cache is
configured in A-REX, the location will be the endpoint at
which to access the cached file, for example
``https://a-rex.host/a-rex/cache``. Otherwise only the hostname is
returned.
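The query format and the response described above can be sketched in Python as follows. The helper names ``build_index_query`` and ``files_with_cached_copies`` are hypothetical, and ``SAMPLE_RESPONSE`` is a made-up body that only follows the described shape (a dictionary mapping each URL to a list of locations); it is not captured server output. The query string is assembled exactly as in the example above, without percent-encoding.

```python
import json


def build_index_query(index_endpoint, urls):
    """Build an ACIX Index query URL for a list of file URLs.

    The file URLs are joined with commas and passed in the "url" option,
    mirroring the example in the text (no percent-encoding is applied).
    """
    return f"{index_endpoint}?url={','.join(urls)}"


def files_with_cached_copies(response_body):
    """Return the queried URLs that have at least one cached location."""
    locations_by_url = json.loads(response_body)
    return [url for url, locations in locations_by_url.items() if locations]


# Made-up response body for illustration; a real one would be obtained
# with an HTTPS GET on the query URL.
SAMPLE_RESPONSE = """
{
  "http://www.nordugrid.org:80/data/echo.sh": ["https://a-rex.host/a-rex/cache"],
  "http://my.host/data1": []
}
"""

query = build_index_query(
    "https://hostname:6443/data/index",
    ["http://www.nordugrid.org:80/data/echo.sh", "http://my.host/data1"],
)
print(query)
print(files_with_cached_copies(SAMPLE_RESPONSE))
```

Because the index is built from Bloom filters, a location in the response means the file is probably cached there, so a client should be prepared for the occasional miss.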

Using ACIX with A-REX Data Staging
----------------------------------

ACIX can be used as a fallback mechanism for A-REX downloads of input
files required by jobs by specifying :ref:`reference_arex_data-staging_use_remote_acix` in the
:ref:`reference_arex_data-staging` of ``arc.conf``, e.g.:

.. code-block:: ini

   [arex/data-staging]
   use_remote_acix = https://cacheindex.ndgf.org:6443/data/index

If a download from the primary source fails, A-REX can try any
cached locations provided by ACIX, as long as the cache is exposed at those
locations. In some cases it may even be preferable to download from a
nearby SE cache rather than from Grid storage. This can be configured using
the :ref:`reference_arex_data-staging_preferredpattern` configuration option, which tells A-REX in which
order to try to download replicas of a file.
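The idea of ordering replicas by preference can be sketched as below. This is a simplified illustration that assumes plain substring matching against an ordered list of patterns. The actual matching rules of ``preferredpattern`` are defined in the configuration reference and may differ. The function name and the example hostnames are hypothetical.

```python
def order_replicas(replicas, preferred_patterns):
    """Sort replica URLs so that earlier patterns are tried first.

    Simplified assumption: a pattern matches if it occurs anywhere in the
    replica URL. Replicas matching no pattern keep their relative order
    and come last.
    """
    def rank(url):
        for position, pattern in enumerate(preferred_patterns):
            if pattern in url:
                return position
        return len(preferred_patterns)

    # sorted() is stable, so replicas with equal rank keep their order.
    return sorted(replicas, key=rank)


replicas = [
    "srm://grid-storage.example.org/data/file1",
    "https://ce1b.example.org/a-rex/cache/file1",
]
print(order_replicas(replicas, ["ce1b.example.org"]))
```

With the nearby cache host listed first, the cached copy is attempted before the Grid storage replica, and the latter remains available as a fallback.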

Using ACIX for ARC client brokering
-----------------------------------

ACIX can also be used for data-based brokering of ARC jobs. An
ACIX-based broker plugin written in Python comes packaged with the ARC
client tools (in ``$ARC_LOCATION/share/arc/examples/PythonBroker/ACIXBroker.py``) and can be used, for example, with:

.. code-block:: console

   [user ~]$ arcsub -b PythonBroker:ACIXBroker.ACIXBroker:https://cacheindex.ndgf.org:6443/data/index

Target sites for job submission are ranked in order of how many input
files required by the job are cached there. See the comments inside
``ACIXBroker.py`` for more information.
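The ranking rule described above (more cached input files means a higher rank) can be sketched as follows. This is not the code of ``ACIXBroker.py``; the function name and the sample data are hypothetical. The input is assumed to be an already parsed ACIX Index response, i.e. a dictionary mapping each input file URL to the list of sites caching it.

```python
from collections import Counter


def rank_sites(index_response):
    """Order sites by how many of the job's input files they have cached."""
    counts = Counter()
    for locations in index_response.values():
        # Use a set so a site listed twice for one file is counted once.
        counts.update(set(locations))
    # Most cached files first; ties are broken alphabetically.
    return sorted(counts, key=lambda site: (-counts[site], site))


# Made-up response for a job with three input files.
sample = {
    "http://my.host/data1": ["ce1a.example.org", "ce1b.example.org"],
    "http://my.host/data2": ["ce1a.example.org"],
    "http://my.host/data3": [],
}
print(rank_sites(sample))
```

Sites that cache none of the input files do not appear in the result at all, so a real broker would still need to consider them as lower-ranked candidates.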

Deployment use-case
-------------------

.. _fig_acix:

.. figure:: images/ACIX.pdf
   :align: center

   ACIX deployment scenario, with one global ACIX Index and a local ACIX Index for CE 1a and CE 1b.

:numref:`fig_acix` shows an example ACIX setup. Each
CE runs an ACIX Scanner and there is a central ACIX Index server which pulls
content from all CEs. In addition there is one site with two CEs, CE 1a
and CE 1b.
In order to do data-based brokering on just those two CEs
(and ease the load on the global ACIX Index server), a local ACIX Index is
running which pulls content from only these two CEs. In such a setup
it may be desirable to download data from the cache on CE 1a to
CE 1b and vice versa, so those CEs could be configured with the local
ACIX Index server as ``use_remote_acix`` and each other's hostname first in
``preferredpattern``.
......@@ -5,6 +5,8 @@ ARC Data Services Technical Description
:maxdepth: 2
overview.rst
arex_cache.rst
datastaging.rst
dds.rst
acix.rst