After completing the setup, you should be able to run

    netatmoqc [opts] SUBCOMMAND [subcommand_opts]

where `[opts]` and `[subcommand_opts]` denote optional command line arguments
that apply, respectively, to `netatmoqc` in general and to `SUBCOMMAND`
specifically.

**Please run `netatmoqc -h` for information** about the supported subcommands
and general `netatmoqc` options. For info about specific subcommands and the
options that apply to them only, **please run `netatmoqc SUBCOMMAND -h`** (note
that the `-h` goes after the subcommand in this case).
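
For instance, using the `select` subcommand described further down in this
README:

    netatmoqc -h         # general options and the list of supported subcommands
    netatmoqc select -h  # options that apply only to the `select` subcommand
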
**N.B.:** A typical `netatmoqc` run with the (preferred) clustering method
[HDBSCAN](https://hdbscan.readthedocs.io/en/latest/index.html) seems to need ca.
**20 GB** of RAM and takes a couple of minutes to finish. Other implemented
clustering strategies have more modest RAM requirements, but:

* [DBSCAN](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)
  results are not as good as HDBSCAN's in our context
* [OPTICS](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html)
  produces results similar to HDBSCAN's, but runs *much* slower

### Parallelism (single-host or MPI)

The `select` subcommand supports parallelism over DTGs. How to activate it
depends on whether you wish to run `netatmoqc` on a single host or to
distribute computations across different computers (e.g. on an HPC cluster).

* If you are running `netatmoqc` on a single host, then you can export the
  environment variable `NETATMOQC_MAX_PYTHON_PROCS` to any value larger
  than 1 and run the code as usual (see the sketch after this list). Don't
  forget, however, to take into account the memory requirements discussed in
  the previous section!

* If you wish to run `netatmoqc` with MPI, then you must have installed it
  with MPI support. Assuming this is the case, you can then run the code as

      mpiexec -n 1 [-usize N] netatmoqc --mpi [opts] select [subcommand_opts]

  Notice that:
  * Arguments between square brackets are optional
  * The `--mpi` switch must come before any subcommand
  * **The value "1" in `-n 1` is mandatory.** The code will always start with
    one "manager" task, which will dynamically spawn new worker tasks as
    needed (up to a maximum number).
  * If `-usize N` is passed, then `N` should be an integer greater than zero.
    `N` defines the maximum number of extra workers that the manager task is
    allowed to spawn if necessary.
  * If `-usize N` is not passed, then:
    * If the run is part of a submitted job managed by SLURM or PBS, then `N`
      will be automatically determined from the options passed to the
      scheduler (e.g. `--nnodes`, `--ntasks`, `--mem-per-cpu`, etc. for
      SLURM). See the batch-script sketch after this list.
    * If the run is interactive, then `N` will take the value of the
      environment variable `NETATMOQC_MAX_PYTHON_PROCS` if set, or will
      otherwise be set to 1.
  * No more than `length(DTGs)` new worker tasks will be spawned.
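
As an illustration of the single-host case, a minimal session sketch follows.
The value `4` is arbitrary (any value larger than 1 enables parallelism), and
`[subcommand_opts]` stands for whatever `select` options you normally use:

    # Let netatmoqc use up to 4 parallel python processes on this host
    export NETATMOQC_MAX_PYTHON_PROCS=4
    netatmoqc select [subcommand_opts]
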
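For the MPI case, the sketch below assumes a SLURM batch job; the `--ntasks`
value is just an illustrative choice, and PBS would work analogously. Since
`-usize` is omitted here, `N` is inferred from the scheduler options:

    #!/bin/bash
    #SBATCH --ntasks=8
    # -usize omitted: the maximum number of extra workers N is determined
    # automatically from the scheduler options above (e.g. --ntasks).
    # The "1" in "-n 1" is mandatory: the manager task spawns the workers.
    mpiexec -n 1 netatmoqc --mpi [opts] select [subcommand_opts]

Submit such a script with, e.g., `sbatch`.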