eCite Digital Repository

Outlier detection algorithms over fuzzy data with weighted least squares

Citation

Nikolova, N and Rodriguez, RM and Symes, M and Toneva, D and Kolev, K and Tenekedjiev, K, Outlier detection algorithms over fuzzy data with weighted least squares, International Journal of Fuzzy Systems, 23, (5) pp. 1234-1256. ISSN 1562-2479 (2021) [Refereed Article]

Copyright Statement

Copyright Taiwan Fuzzy Systems Association 2021

DOI: 10.1007/s40815-020-01049-8

Abstract

In the classical leave-one-out procedure for outlier detection in regression analysis, we exclude an observation and then construct a model on the remaining data. If the difference between the predicted and observed values is high, we declare this observation an outlier. As a rule, such procedures use single-comparison testing. The problem becomes much harder when each observation is associated with a degree of membership to an underlying population, so that outlier detection must be generalized to operate over fuzzy data. We present a new approach for outlier detection over fuzzy data using two inter-related algorithms. Because of the way outliers enter the observation sample, they may be of various orders of magnitude. To account for this, we divide the outlier detection procedure into cycles, each consisting of two phases. In Phase 1, we apply a leave-one-out procedure to each non-outlier in the dataset. In Phase 2, all previously declared outliers are subjected to the Benjamini–Hochberg step-up multiple testing procedure, which controls the false-discovery rate, and non-confirmed outliers return to the dataset. Finally, we construct a regression model over the resulting set of non-outliers. In this way, Phase 1 yields a reliable, high-quality regression model because the leave-one-out procedure, relying on single-comparison testing, comparatively easily purges dubious observations. At the same time, confirming outlier status against the newly obtained high-quality regression model is much harder because of the multiple testing procedure, so only the true outliers remain outside the data sample. The two phases in each cycle are a good trade-off between the desire to construct a high-quality model (i.e., over informative data points) and the desire to use as many data points as possible (thus leaving as many observations as possible in the data sample). The number of cycles is user-defined, but the procedure can finalize the analysis once a cycle detects no new outliers. We offer one illustrative example and two practical case studies (from real-life thrombosis studies) that demonstrate the application and strengths of our algorithms. In the concluding section, we discuss several limitations of our approach and offer directions for future research.
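The sketch below illustrates the two-phase, cyclic idea described in the abstract, not the authors' actual algorithms: a weighted least squares fit in which membership degrees act as observation weights, a leave-one-out single-comparison test in Phase 1, and a Benjamini-Hochberg confirmation step in Phase 2. The simple linear model, the standardized-residual t-test, the significance levels alpha1 and alpha2, and the stopping rule are all assumptions made for illustration; the paper's exact test statistics and weighting scheme are not given in the abstract.

# Minimal sketch (Python, numpy/scipy) of a cyclic two-phase outlier-detection
# scheme over fuzzy data; details are assumptions, not the paper's method.
import numpy as np
from scipy import stats


def wls_fit(X, y, w):
    """Weighted least squares; membership degrees w act as observation weights."""
    Xd = np.column_stack([np.ones(len(X)), X])        # design matrix with intercept
    W = np.diag(w)
    beta = np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ y)
    resid = y - Xd @ beta
    dof = max(len(y) - Xd.shape[1], 1)
    sigma = np.sqrt(np.sum(w * resid**2) / dof)       # weighted residual scale
    return beta, sigma, dof


def loo_pvalue(X, y, w, i, keep):
    """Fit on the kept points excluding i; two-sided p-value for the
    standardized prediction error of observation i (assumed test)."""
    rest = [j for j in keep if j != i]
    beta, sigma, dof = wls_fit(X[rest], y[rest], w[rest])
    pred = np.array([1.0, X[i]]) @ beta
    z = (y[i] - pred) / max(sigma, 1e-12)
    return 2 * stats.t.sf(abs(z), dof)


def detect_outliers(X, y, w, alpha1=0.05, alpha2=0.05, max_cycles=10):
    X, y, w = map(np.asarray, (X, y, w))
    n = len(y)
    outliers = set()
    for _ in range(max_cycles):
        before = set(outliers)
        keep = [i for i in range(n) if i not in outliers]

        # Phase 1: single-comparison leave-one-out test for every non-outlier.
        for i in list(keep):
            if loo_pvalue(X, y, w, i, keep) < alpha1:
                outliers.add(i)

        # Phase 2: Benjamini-Hochberg step-up over all declared outliers,
        # tested against the model built on the current non-outliers;
        # non-confirmed observations return to the dataset.
        keep = [i for i in range(n) if i not in outliers]
        cand = sorted(outliers)
        if cand:
            pvals = np.array([loo_pvalue(X, y, w, i, keep) for i in cand])
            order = np.argsort(pvals)
            m = len(pvals)
            k_max = 0
            for rank, idx in enumerate(order, start=1):
                if pvals[idx] <= rank * alpha2 / m:
                    k_max = rank
            outliers = {cand[idx] for idx in order[:k_max]}

        if outliers == before:            # stop once a cycle adds nothing new
            break

    keep = [i for i in range(n) if i not in outliers]
    beta, _, _ = wls_fit(X[keep], y[keep], w[keep])   # final model on non-outliers
    return sorted(outliers), beta

The asymmetry in the sketch mirrors the trade-off described in the abstract: Phase 1 flags observations liberally with per-observation tests, while Phase 2 retains only those flags that survive the false-discovery-rate-controlling step-up procedure.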

Item Details

Item Type: Refereed Article
Keywords: regression analysis, leave-one-out method, degree of membership, multiple testing, Benjamini–Hochberg step-up multiple testing, false-discovery rate
Research Division: Mathematical Sciences
Research Group: Statistics
Research Field: Applied statistics
Objective Division: Expanding Knowledge
Objective Group: Expanding knowledge
Objective Field: Expanding knowledge in engineering
UTAS Author: Nikolova, N (Professor Nataliya Nikolova)
UTAS Author: Symes, M (Mr Mark Symes)
UTAS Author: Tenekedjiev, K (Professor Kiril Tenekedjiev)
ID Code: 146414
Year Published: 2021
Web of Science® Times Cited: 2
Deposited By: Maritime and Logistics Management
Deposited On: 2021-09-06
Last Modified: 2021-11-18
Downloads: 0