Commit 5a8b6aa9 authored by Paulo Medeiros

Bugfixes, changes to metrics & clustering opts.

Summary of main changes below:

Added:
    - New metrics calculation methods:
        - correlation_aware_euclidean (the new default)
        - haversine_plus_euclidean
        - haversine_plus_manhattan (the only one implemented previously)
    - "unclusterable_data_columns" general config option
    - Allow choice of HDBSCAN's cluster_selection_method
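
The two haversine-based metric names suggest a great-circle distance on the coordinates combined with a Euclidean or Manhattan distance on the remaining columns. A minimal sketch under that assumption follows. The additive combination, the (lat, lon, other values...) column order, the kilometre units and the function bodies are illustrative and are not taken from this commit.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed unit for the geographic term


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def haversine_plus_euclidean(p, q):
    """Assumed form: haversine on (lat, lon) plus Euclidean on the other columns."""
    geo = haversine_km(p[0], p[1], q[0], q[1])
    rest = math.sqrt(sum((a - b) ** 2 for a, b in zip(p[2:], q[2:])))
    return geo + rest


def haversine_plus_manhattan(p, q):
    """Assumed form: haversine on (lat, lon) plus Manhattan on the other columns."""
    geo = haversine_km(p[0], p[1], q[0], q[1])
    rest = sum(abs(a - b) for a, b in zip(p[2:], q[2:]))
    return geo + rest
```

In practice the non-geographic columns would need to be normalised before being added to a distance in kilometres; how the project weights the two terms is not stated here.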

Changed:
    - Default HDBSCAN method from "leaf" to "eom"
    - Default min_samples and min_cluster_size: 5 --> 10
    - Changed internal data normalisation scheme
    - Metrics now have their own section in the config file
    - Use a stricter GLOSH outlier score threshold for outlier removal
    - Visualised map uses the same projection params as those configured in the domain
    - Remove unused "tstep" from domain configs
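
A rough sketch of how the new clustering defaults and a GLOSH-based outlier cut could fit together, assuming the third-party hdbscan package is used. The helper names (hdbscan_options, remove_outliers, cluster) and the threshold argument are hypothetical; the actual threshold value and code paths are not given in this commit message.

```python
VALID_SELECTION_METHODS = ("eom", "leaf")


def hdbscan_options(min_cluster_size=10, min_samples=10, cluster_selection_method="eom"):
    """Build HDBSCAN keyword arguments using the defaults named in this commit."""
    if cluster_selection_method not in VALID_SELECTION_METHODS:
        raise ValueError(
            f"cluster_selection_method must be one of {VALID_SELECTION_METHODS}"
        )
    return {
        "min_cluster_size": min_cluster_size,
        "min_samples": min_samples,
        "cluster_selection_method": cluster_selection_method,
    }


def remove_outliers(labels, outlier_scores, threshold):
    """Relabel as noise (-1) every point whose GLOSH score exceeds the threshold."""
    return [
        -1 if score > threshold else label
        for label, score in zip(labels, outlier_scores)
    ]


def cluster(points, glosh_threshold, **overrides):
    """Cluster points and apply the outlier cut (hypothetical wrapper)."""
    import hdbscan  # third-party package, imported lazily so the helpers work without it

    clusterer = hdbscan.HDBSCAN(**hdbscan_options(**overrides)).fit(points)
    # labels_ and outlier_scores_ are attributes of a fitted hdbscan.HDBSCAN object.
    return remove_outliers(clusterer.labels_, clusterer.outlier_scores_, glosh_threshold)
```

A lower threshold flags more points as outliers, which is one way to read "stricter" above.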

Fixed:
    - InvalidIndexError raised after the pandas 1.4.0 update
    - Some crashes in outlier removal methods (solves #5)
    - flakehell cannot import 'MergedConfigParser'
    - Some warnings
parents c615173b f148f400
Pipeline #10662 passed with stages in 3 minutes and 28 seconds