Task 2.3 issues
https://source.coderefinery.org/groups/iOBS/wp2/task-2-3/-/issues
2022-02-14T14:38:08Z

Problem with iterative outlier removal method
https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc/-/issues/5
2022-02-14T14:38:08Z
Matias Wargelin

I ran into an issue while trying out alternative clustering and outlier removal methods. Using the clustering method "optics" or "dbscan" with an otherwise default configuration produces the following error in both cases:
config.toml:
```
[general]
data_rootdir = "/tmp/data_rootdir"
outdir = "/tmp/outdir"
dtgs.start = "2021030500"
dtgs.end = "2021030521"
clustering_method = "optics" # or "dbscan"
```
----------------
error:
```
Reading config file /tmp/config.toml
DTG=2021-03-05T00 UTC: Started
Traceback (most recent call last):
  File "/usr/local/airflow/.local/bin/netatmoqc", line 8, in <module>
    sys.exit(main())
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/main.py", line 28, in main
    args.func(args)
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/commands_functions.py", line 197, in select_stations
    for dtg in config.general.dtgs
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/commands_functions.py", line 148, in _select_stations_single_dtg
    df=df, config=config, n_jobs=cpu_share, calc_silhouette_samples=False,
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/clustering.py", line 524, in cluster_netatmo_obs
    df=df_sub, config=config, **pre_clustering_kwargs
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/clustering.py", line 452, in _cluster_netatmo_obs_one_domain
    **kwargs,
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/clustering.py", line 339, in run_clustering_on_df
    reclustering_function=self_consistent_reclustering,
  File "/usr/local/airflow/.local/lib/python3.6/site-packages/netatmoqc/outlier_removal.py", line 364, in filter_outliers
    rtn = filter_outliers_iterative(df, **kwargs)
TypeError: filter_outliers_iterative() got an unexpected keyword argument 'method'
```
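For reference, this kind of `TypeError` is what Python raises when a dispatcher forwards a keyword that the callee does not accept. Below is a minimal sketch of that pattern and of one common way to avoid it. The function names `filter_outliers` and `filter_outliers_iterative` are taken from the traceback, but the bodies, the `max_n_iter` parameter, and `forward_everything` are assumptions for illustration only, not netatmoqc's actual code.

```python
def filter_outliers_iterative(df, max_n_iter=10):
    """Stand-in filter (assumed signature): it has no 'method' parameter."""
    return df


def forward_everything(df, **kwargs):
    """Hypothetical dispatcher that forwards all options, including 'method'."""
    return filter_outliers_iterative(df, **kwargs)


def filter_outliers(df, **kwargs):
    """Hypothetical dispatcher that consumes 'method' before forwarding."""
    method = kwargs.pop("method", "iterative")
    if method == "iterative":
        return filter_outliers_iterative(df, **kwargs)
    raise ValueError(f"unknown outlier removal method: {method!r}")


data = [1, 2, 3]

try:
    forward_everything(data, method="iterative")
except TypeError as exc:
    # Same kind of message as at the end of the traceback above
    print(f"TypeError: {exc}")

# Popping 'method' first means the callee only sees options it accepts
print(filter_outliers(data, method="iterative"))
```

Whether this is what happens inside `outlier_removal.py` would need to be checked against the actual source.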
However, at least when using the "optics" clustering method, adding the following to config.toml makes the error disappear, so my guess would be that the problem is in the default iterative method:
```
[clustering_method.optics.outlier_removal]
method = "lof"
```

Float indices in DomainGrid.thin_obs()
https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc/-/issues/4
2021-03-18T08:51:35Z
Markus Koskela, markus.koskela@csc.fi

When running thinning, I encountered an error message about float indices. I fixed it with the following line (shown as a screenshot because something weird is happening with GitLab code blocks):
![Screenshot_2021-03-12_at_10.46.23](/uploads/46c0c32a89a838e3e6f87d8b69fa134e/Screenshot_2021-03-12_at_10.46.23.png)

Command csv2obsoul fails with integer ids
https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc/-/issues/3
2021-01-27T17:16:02Z
Markus Koskela, markus.koskela@csc.fi

The command `csv2obsoul` seems to assume that ids are strings, but mine are ints. I was able to fix this with:
```
diff --git a/netatmoqc/save_data.py b/netatmoqc/save_data.py
index 958e953..a572897 100644
--- a/netatmoqc/save_data.py
+++ b/netatmoqc/save_data.py
@@ -130,13 +130,17 @@ def save_df_as_obsoul(df, fpath=None, export_params=None):
obs_date = row.time_utc.strftime("%Y%m%d")
# obs_hour must not have leading zeros for the hour
obs_hour = row.time_utc.strftime("%k%M%S").strip()
+ if type(row.id) is str:
+ row_id = row.id[:8]
+ else:
+ row_id = row.id
header = (
17, # Got this from Jelena
obs_type,
obs_code,
row.lat,
row.lon,
- "'{}'".format(row.id[:8]),
+ "'{}'".format(row_id),
obs_date,
obs_hour,
row.alt,
```

TBB threading layer is disabled
https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc/-/issues/1
2020-11-24T08:42:41Z
Markus Koskela, markus.koskela@csc.fi

When running e.g. `netatmoqc --savefig cluster`, I get the following warning:
/home/cloud-user/.local/lib/python3.7/site-packages/numba/np/ufunc/parallel.py:355: NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found TBB_INTERFACE_VERSION = 9002. The TBB threading layer is disabled.
However, it looks like I have a more recent version of `tbb`:
$ pip list | grep tbb
tbb 2020.3.254

NetAtmo IDs from FMI are integers
https://source.coderefinery.org/iOBS/wp2/task-2-3/netatmoqc/-/issues/2
2020-10-08T14:55:17Z
Markus Koskela, markus.koskela@csc.fi

The Netatmo station IDs from FMI are integers with up to (at least) 10 digits, e.g. `1401333882`. I made the following change to get the code working:
```
diff --git a/netatmoqc/load_data.py b/netatmoqc/load_data.py
index 699c871..4d49ea2 100644
--- a/netatmoqc/load_data.py
+++ b/netatmoqc/load_data.py
@@ -85,7 +85,8 @@ def read_netatmo_csv(
data = data.dropna()
# Drop the 'enc:16:' stat id prefix and shorten them to 8 chars
- data["id"] = shorten_stat_id(data["id"])
+ if data["id"].dtype != 'int64':
+ data["id"] = shorten_stat_id(data["id"])
# The netatmo data I got from Norway has mean sea-level pressure
# instead of pressure, but the label is "pressure" there. Fix this.
```
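Both this change and the `csv2obsoul` fix branch on the ID type at the point of use. A minimal sketch of the same guard as a standalone helper follows, assuming (as in the `csv2obsoul` diff) that string IDs are cut to 8 characters while integer IDs are kept whole. `format_station_id` and its `max_len` parameter are hypothetical names, not part of netatmoqc.

```python
import numbers


def format_station_id(station_id, max_len=8):
    """Format a station ID for export (hypothetical helper, not in netatmoqc).

    String IDs are truncated to max_len characters; integer IDs are
    converted to text without truncation.
    """
    if isinstance(station_id, str):
        return station_id[:max_len]
    if isinstance(station_id, numbers.Integral):
        return str(station_id)
    raise TypeError(f"unsupported station id type: {type(station_id).__name__}")


print(format_station_id("abcdefghijkl"))  # string ID: truncated
print(format_station_id(1401333882))      # integer ID: kept whole
```

Using `isinstance` instead of `type(...) is str` also accepts `str` subclasses, and `numbers.Integral` covers NumPy integer scalars as well as plain `int`.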