Commit 7b7c005a authored by Antti Laitinen

Update README.md

parent 0fed1161
......@@ -3,7 +3,7 @@
## Introduction
The need for daily thresholds for meteorological measurements sparked the idea of an automatic algorithm that assigns daily maximum and minimum thresholds based on the history of the measurement station. The algorithm is based on the work by Hasu & Aaltonen (Automatic Minimum and Maximum Alarm Thresholds for Quality Control, Hasu, V & Aaltonen, A, 2010, Journal of Atmospheric and Oceanic Technology). The mathematical basis of the algorithm is the generalized extreme value (GEV) function. For each day of the year, the data consists of the maximum and minimum measurements from that day in each year of the station's history, plus n adjacent days. The n days are weighted to account for their temporal distance from the day in question. The generalized extreme value function is then fitted to this data, and the threshold is set at the chosen quantile.
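As a rough illustration of the last step, the quantile of a generalized extreme value distribution has a closed form once the location, scale and shape parameters are known. The sketch below is not taken from the repository: the function name `gev_quantile`, the parameter values and the 0.999 quantile are assumptions chosen for illustration, and it uses the convention in which a shape of zero reduces to the Gumbel distribution.

```python
import math


def gev_quantile(p, loc, scale, shape):
    """Return the p-quantile of a GEV distribution (hypothetical helper).

    Convention: shape > 0 gives a heavy upper tail, shape < 0 a bounded
    upper tail, and shape == 0 is the Gumbel distribution.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    y = -math.log(p)
    if abs(shape) < 1e-12:
        # Gumbel limit of the general formula
        return loc - scale * math.log(y)
    return loc + scale * (y ** (-shape) - 1.0) / shape


# Assumed parameters for one day's maxima, in degrees Celsius.
upper_threshold = gev_quantile(0.999, loc=20.0, scale=3.0, shape=-0.1)
```

Choosing a quantile closer to 1 moves the maximum threshold further into the tail, so fewer valid measurements are flagged.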
To calculate the limits for a given station (identified by its FMI ID), the script QC_limits.py can be run. The desired station IDs should be given as a list (line 177 in the file). Alternatively, the keyword 'all' can be used to calculate limits for all stations. The qc_limits can be run with
To calculate the limits, use QC_limits.py.
## QC_limits.py
This file contains the main script for calculating the quality control limits for a given station or stations. The station data can be retrieved from the FMI database if the user has access, or read from a given folder. To get the data from the database, use the lims_from_DB function. To use data from the hard drive, use the lims_for_fold function.
......@@ -13,3 +13,16 @@ This function takes only single input: the list of FMI ID's for the stations tha
### lims_for_fold
This function also takes a single input: the folder where the data files for the wanted stations are located. The data should be in CSV files with the headers "DATA_TIME" for the timestamp of the measurement, "TMIN" for the minimum temperature and "TMAX" for the maximum temperature. The frequency should be daily.
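To make the expected file layout concrete, here is a minimal sketch of reading one such file with the standard library. It is not the repository's reader: the helper name `read_daily_extremes` is hypothetical, and the `%Y-%m-%d` timestamp format is an assumption, since the README does not state how "DATA_TIME" is formatted.

```python
import csv
import io
from datetime import datetime


def read_daily_extremes(handle, time_format="%Y-%m-%d"):
    """Parse rows with DATA_TIME, TMIN and TMAX columns (hypothetical helper)."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            # The timestamp format is an assumption; adjust to the real files.
            "date": datetime.strptime(record["DATA_TIME"], time_format).date(),
            "tmin": float(record["TMIN"]),
            "tmax": float(record["TMAX"]),
        })
    return rows


# In-memory stand-in for a station file, one row per day.
sample = io.StringIO(
    "DATA_TIME,TMIN,TMAX\n"
    "2001-01-14,-12.3,-4.1\n"
    "2001-01-15,-15.0,-6.2\n"
)
rows = read_daily_extremes(sample)
```

With a real file, the same function could be called on an open file handle instead of the `io.StringIO` object.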
### QC_limits
This function calculates and plots the limits when the data is given as a pandas dataframe. It can be used on its own if the data has already been read into a dataframe. The function also needs the station ID and the stationdict dictionary, which is created in the createstationdict function. Creating the dictionary requires the list of station names and IDs in CSV format. If this list is not available, lines 46 and 77 can be commented out.
The actual computation of the limits is done in the get_limits function found in the quality_control file.
## quality_control.py
This file contains the get_limits function, which does most of the heavy computation. For each day of the year, it arranges the data into a list that contains the measurements from that day +/- a configurable number of days, from each year. For example, to calculate the limits for 15 January using +/-15 days, the list would contain data from 31 December until 30 January from each year in the station's history. Each day is then weighted based on its temporal distance from the day that the limits are being calculated for.
After the data is arranged into the lists, the quality control limits are calculated inside the threshold function in the threshold file.
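The arrangement step can be sketched as follows. This is an illustration under stated assumptions, not the get_limits implementation: the function name `window_samples` is hypothetical, the input is assumed to be a plain dictionary from date to value, and the triangular weights are an assumed weighting scheme, since the README does not specify which weights are used.

```python
from datetime import date, timedelta


def window_samples(series, month, day, half_width=15):
    """Collect (value, weight) pairs around a calendar day, across all years.

    series maps datetime.date to a measurement (hypothetical input format).
    """
    samples = []
    for year in sorted({d.year for d in series}):
        try:
            centre = date(year, month, day)
        except ValueError:
            # 29 February does not exist in this year
            continue
        for offset in range(-half_width, half_width + 1):
            current = centre + timedelta(days=offset)
            if current in series:
                # Assumed triangular weight: 1.0 on the target day,
                # decreasing linearly with distance in days.
                weight = 1.0 - abs(offset) / (half_width + 1)
                samples.append((series[current], weight))
    return samples


# Synthetic daily series from 2000-12-01 to 2001-02-28.
series = {date(2000, 12, 1) + timedelta(days=i): float(i) for i in range(90)}
samples = window_samples(series, month=1, day=15, half_width=15)
```

Because the window is built with date arithmetic, it crosses year boundaries naturally, so a window centred early in January also picks up the last days of the previous December.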
## threshold.py
This file contains the threshold function, which calculates the generalized extreme value function parameters and fits the function using these parameters. The weighted statistical parameters are calculated in the statistic function.
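One way to picture how weighted statistics lead to distribution parameters is the sketch below. It is a simplified stand-in, not the repository's threshold or statistic function: it computes a weighted mean and standard deviation, then uses the method of moments for the Gumbel distribution, which is only the zero-shape special case of the generalized extreme value function. The names `weighted_mean_std` and `gumbel_from_moments` are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def weighted_mean_std(values, weights):
    """Return the weighted mean and (population) standard deviation."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


def gumbel_from_moments(mean, std):
    """Method-of-moments Gumbel parameters (zero-shape GEV, an assumption).

    Uses mean = loc + gamma * scale and variance = (pi * scale) ** 2 / 6.
    """
    scale = std * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


# Made-up daily maxima with weights that favour the first sample.
mean, std = weighted_mean_std([1.0, 3.0], [3.0, 1.0])
loc, scale = gumbel_from_moments(mean, std)
```

A full generalized extreme value fit would also estimate the shape parameter, for example by maximizing a weighted likelihood, which this sketch deliberately leaves out.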