Detect and process outliers for temperature data at 3h monitoring stations in Vietnam

    1 Faculty of Information Technology, Hanoi University of Mining and Geology, Vietnam;
    2 AI Academy Vietnam, Vietnam;
    3 Center for Hydro - Meteorological Data and Information, Vietnam;
    4 Faculty of Information Technology Technical University, Vietnam
  • Keywords: Outliers, Anomalies, Z-Score, Box-plot.
  • Received: 15th-Nov-2019
  • Accepted: 6th-Jan-2020
  • Available online: 28th-Feb-2020
Pages: 132 - 146


Data preparation is a compulsory process in any data science project. Many studies have shown that it accounts for about 80% of the time, effort and resources of a data science project. Depending on the particular project and data type, the data preparation step may require different methods. Detecting and processing outliers is one of the important preprocessing steps in data preparation, especially for time series data. This paper reviews two methods for detecting outliers in low-dimensional data, namely the Z-Score and Box-plot methods. We also present the results of experiments that applied these methods to temperature data collected at 3-hour intervals from 43 monitoring stations in Vietnam over the six years from 01/01/2014 to 31/12/2019.
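As a rough illustration of the two methods named in the abstract, the following Python sketch flags outliers in a short list of temperature readings. It is not the authors' implementation: the function names, the Z-Score threshold of 3, the box-plot whisker factor of 1.5 and the sample readings are all assumptions chosen for illustration (the two cut-offs are common conventions, not values taken from the paper).

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices whose absolute Z-Score exceeds `threshold`.

    The threshold of 3 is a common convention, assumed here.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def boxplot_outliers(values, k=1.5):
    """Return indices outside [Q1 - k*IQR, Q3 + k*IQR].

    The factor k = 1.5 is the usual box-plot whisker rule, assumed here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up 3-hourly readings in degrees Celsius; 99.9 mimics a sensor fault.
readings = [24.1, 23.8, 23.5, 25.0, 28.2, 30.1, 29.4, 26.7,
            24.9, 24.2, 23.9, 25.3, 28.0, 99.9, 29.1, 26.5]

print(zscore_outliers(readings))   # [13]
print(boxplot_outliers(readings))  # [13]
```

Both functions return only the positions of suspicious readings; how flagged values are then handled (removed, interpolated, or kept) is a separate decision that this sketch does not cover. Note that the Z-Score depends on the mean and standard deviation, which a large outlier itself inflates, whereas the box-plot rule relies on quartiles and is less affected by extreme values.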

