Problem with iterative outlier removal method
I ran into an issue while trying out alternative clustering and outlier removal methods. Setting the clustering method to "optics" or "dbscan", with an otherwise default configuration, fails with the same error in both cases:
config.toml:
[general]
data_rootdir = "/tmp/data_rootdir"
outdir = "/tmp/outdir"
dtgs.start = "2021030500"
dtgs.end = "2021030521"
clustering_method = "optics" # or "dbscan"
error:
Reading config file /tmp/config.toml
DTG=2021-03-05T00 UTC: Started
Traceback (most recent call last):
File "/usr/local/airflow/.local/bin/netatmoqc", line 8, in <module>
sys.exit(main())
File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/main.py", line 28, in main
args.func(args)
File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/commands_functions.py", line 197, in select_stations
for dtg in config.general.dtgs
File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
if self.dispatch_one_batch(iterator):
File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
for func, args, kwargs in self.items]
File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
for func, args, kwargs in self.items]
File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/commands_functions.py", line 148, in _select_stations_single_dtg
df=df, config=config, n_jobs=cpu_share, calc_silhouette_samples=False,
File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/clustering.py", line 524, in cluster_netatmo_obs
df=df_sub, config=config, **pre_clustering_kwargs
File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/clustering.py", line 452, in _cluster_netatmo_obs_one_domain
**kwargs,
File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/clustering.py", line 339, in run_clustering_on_df
reclustering_function=self_consistent_reclustering,
File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/outlier_removal.py", line 364, in filter_outliers
rtn = filter_outliers_iterative(df, **kwargs)
TypeError: filter_outliers_iterative() got an unexpected keyword argument 'method'
However, at least with the "optics" clustering method, adding the following to config.toml makes the error disappear, so my guess is that the problem lies in the default iterative outlier removal method:
[clustering_method.optics.outlier_removal]
method = "lof"
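For illustration, here is a minimal sketch of the kind of dispatch pattern that could produce this TypeError. It is only a guess based on the traceback, not the actual netatmoqc code: the function bodies, the `max_n_iter` parameter, and the two `dispatch_*` helpers are hypothetical stand-ins. The assumption is that the `method` key is read from the keyword arguments but then forwarded unchanged to a function whose signature does not accept it.

```python
# Stand-ins only: these are NOT the netatmoqc implementations.


def filter_outliers_iterative(df, max_n_iter=10):
    """Hypothetical stand-in whose signature has no `method` parameter."""
    return df


def filter_outliers_lof(df, **kwargs):
    """Hypothetical stand-in that tolerates extra keyword arguments."""
    return df


def dispatch_forwarding_method(df, **kwargs):
    """Assumed failing pattern: `method` is read but left inside kwargs."""
    method = kwargs.get("method", "iterative")
    if method == "lof":
        return filter_outliers_lof(df, **kwargs)
    # `method` is still in kwargs here, so this call raises TypeError.
    return filter_outliers_iterative(df, **kwargs)


def dispatch_popping_method(df, **kwargs):
    """Possible fix: remove `method` from kwargs before forwarding the rest."""
    method = kwargs.pop("method", "iterative")
    if method == "lof":
        return filter_outliers_lof(df, **kwargs)
    return filter_outliers_iterative(df, **kwargs)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]  # placeholder for the real DataFrame
    try:
        dispatch_forwarding_method(data, method="iterative")
    except TypeError as err:
        print("TypeError:", err)
    # The popping variant forwards only the remaining keyword arguments.
    print(dispatch_popping_method(data, method="iterative"))
```

If the real dispatcher follows a similar pattern, that would also be consistent with `method = "lof"` avoiding the error, since the iterative function would never be called in that case.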