Detect and process outliers for temperature data at 3-hour monitoring stations in Vietnam

- Organization: 1 Faculty of Information Technology, Hanoi University of Mining and Geology, Vietnam;
2 AI Academy Vietnam, Vietnam;
3 Center for Hydro-Meteorological Data and Information, Vietnam;
4 Faculty of Information Technology, Technical University, Vietnam
- Keywords: Outliers, Anomalies, Z-Score, Box-plot.
- Received: 15th-Nov-2019
- Accepted: 6th-Jan-2020
- Available online: 28th-Feb-2020
- Field: Information Technology
Abstract:
Data preparation is a compulsory process in any data science project. Many studies have shown that it accounts for 80% of the time, effort and resources of a data science project. Depending on the particular project and data type, the data preparation step may require different methods and steps. Detecting and processing outliers is one of the important preprocessing steps in data preparation, especially for time series data. This paper reviews two methods for detecting outliers in low-dimensional data, namely the Z-score and box-plot charts. We also present the results of experiments that applied these methods to 3-hourly temperature data collected from 43 monitoring stations in Vietnam over the six years from 01/01/2014 to 31/12/2019.
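The two detectors named above can be sketched in a few lines of standard-library Python. This is a minimal illustration assuming common textbook conventions (flag a reading when its Z-score exceeds 3 in absolute value, using the population standard deviation; flag it when it falls outside the box-plot fences at 1.5 × IQR beyond the first and third quartiles); the thresholds the authors actually used may differ. The function names `zscore_outliers` and `boxplot_outliers` and the sample readings are hypothetical, not taken from the paper or its data.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Hypothetical helper: indices whose |x - mean| / std exceeds `threshold`.

    Uses the population standard deviation (statistics.pstdev).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant series: no value deviates from the mean
    return [i for i, x in enumerate(values) if abs(x - mean) / std > threshold]


def boxplot_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Hypothetical helper: indices outside the box-plot (Tukey) fences.

    The fences are [Q1 - k * IQR, Q3 + k * IQR] with IQR = Q3 - Q1.
    """
    # "inclusive" interpolates the quartiles within the observed min..max range
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(values) if x < low or x > high]


# Hypothetical 3-hourly readings (°C) for one station over two days;
# 99.9 at index 13 is an injected spurious value, not real station data.
temps = [24.1, 23.6, 23.2, 24.8, 27.9, 29.6, 28.7, 25.9,
         24.3, 23.9, 23.4, 25.1, 28.3, 99.9, 29.0, 26.2]

print(zscore_outliers(temps))   # [13]
print(boxplot_outliers(temps))  # [13]
```

The two tests behave differently on short series. Because the outlier itself inflates the mean and the standard deviation, |z| computed with the population standard deviation can never exceed √(n − 1), so a threshold of 3 cannot flag anything in a window of 10 or fewer readings. The box-plot fences are built from quartiles, which are much less sensitive to a single extreme value.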