Commit b7f1e504 authored by Dmytro Karpenko

* Better formatting for README file;

* Adding a note on compatibility.
parent f57c8f67
Prerequisites
=============
The tool works by converting JURA's archived XML records into JSON and
submitting them to a specified Elasticsearch (ES) instance. It is therefore
assumed that the archiving option for JURA is turned on in arc.conf.
Please note that the tool is tested only against the latest version of ES and
with the latest 'elasticsearch-py' module. The current version of the master
tree is not guaranteed to work with older ES versions.
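
To illustrate the XML-to-JSON step described above, here is a minimal sketch
using only the standard library. It is not the tool's actual conversion code:
the sample record, its element names, and the helper names `local_name` and
`record_to_dict` are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def record_to_dict(xml_text):
    """Flatten the direct children of the root element into a dict.

    Repeated child elements are collected into a list; nested structure
    and attributes are ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    doc = {}
    for child in root:
        key = local_name(child.tag)
        value = (child.text or "").strip()
        if key in doc:
            if not isinstance(doc[key], list):
                doc[key] = [doc[key]]
            doc[key].append(value)
        else:
            doc[key] = value
    return doc


# Hypothetical record; real JURA archive files contain many more fields.
SAMPLE = """<UsageRecord xmlns="http://example.org/ur">
  <JobId>job-1</JobId>
  <Queue>main</Queue>
  <Host>node1</Host>
  <Host>node2</Host>
</UsageRecord>"""

if __name__ == "__main__":
    print(json.dumps(record_to_dict(SAMPLE), indent=2))
```

The resulting dict is what would be serialized and sent to ES; the real tool
may map fields differently to match the index mapping.
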
Installation
============
1. Make sure the following Python modules are present on the endpoint (many are
part of the default Python installation):
* xml
* json
* glob
@@ -22,28 +28,33 @@ Installation
* elasticsearch
* watchdog
2. In the source code directory, run 'python setup.py install'.
2a. If you need an ES host/port and index other than the hardcoded defaults,
run 'python setup.py install --eshost <hostname> --esport <portnumber>
--esindex <indexname>'
3. Start the tool with '/etc/init.d/jura_to_es start'
4. If needed, make the 'jura_to_es' service start at boot with chkconfig
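
The module check in step 1 can be scripted before running setup.py. The sketch
below is not part of the tool; it only covers the module names visible in the
list above (the full list may be longer), and `missing_modules` is a
hypothetical helper name.

```python
import importlib.util

# Only the modules named in the list above; the full list may be longer.
REQUIRED = ["xml", "json", "glob", "elasticsearch", "watchdog"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run it with the same Python interpreter that will run the tool, since module
availability can differ between interpreters on the same host.
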
Logging
=======
The tool logs by default to '/var/log/arc/jura_to_es.log'. The setup script
also installs a logrotate entry for this file.
Some technical stuff
====================
* The tool submits everything as type arcusagerecord, which means the mapping
from the repo is expected to be applied to the receiving index before the tool
starts working.
* The tool does not allow anything to be specified through setup.py. It should
be made configurable at some point.
* The init script has the 'dmytrok_arc_test' index hardcoded. It should be
changed when/if new production indices for ARC data are created.
* See also 'python jura_to_es.py --help'.
* The tool currently does not work in the most efficient way. It picks up a new
file from the JURA archive dir, converts it, and submits it to the ES endpoint.
Since ARC creates one record per accounting endpoint for the same job, the tool
converts the same job's archival record several times and polls ES several
times as well (the submit only happens if the poll returns the "The document
does not exist" status). That is why it is so desirable that ARC itself
performs the submission to ES: treating ES as just another endpoint, it would
interact with ES only once per job and could even perform batch submissions.
* If ARC does not implement this and we have to rely on the tool for a long
time, it might be desirable to implement some mechanism that polls/submits to
ES only once per job.
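
One possible shape for such a once-per-job mechanism is an in-memory cache of
job IDs in front of the poll/submit calls. The sketch below is an assumption,
not existing code in the tool: `OncePerJobSubmitter` is a hypothetical class,
and `exists` and `submit` are caller-supplied callables that would wrap
whatever ES client calls the tool already uses.

```python
class OncePerJobSubmitter:
    """Poll and submit each job at most once per process lifetime.

    `exists(job_id)` and `submit(job_id, document)` are hypothetical
    wrappers around the ES client; this class only adds the cache.
    """

    def __init__(self, exists, submit):
        self._exists = exists
        self._submit = submit
        self._seen = set()  # job IDs already known to be in ES

    def handle(self, job_id, document):
        """Return True if the document was submitted, False if skipped."""
        if job_id in self._seen:
            return False  # already handled: no poll, no submit
        if not self._exists(job_id):
            self._submit(job_id, document)
            submitted = True
        else:
            submitted = False
        # Mark as seen only after the poll/submit succeeded, so a failed
        # submit (raised exception) is retried on the next record.
        self._seen.add(job_id)
        return submitted
```

With this in place, the second and later archive records for the same job cost
neither a poll nor a submit. Checking the cache before the XML conversion would
save the repeated conversion as well. The set grows without bound and is lost
on restart, so a long-running service would likely need to cap or persist it.
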