Novel methods to construct empirical CDF for continuous random variables using censor data
Nikolova, N and Toneva, D and Tsonev, Y and Burgess, B and Tenekedjiev, K, Novel methods to construct empirical CDF for continuous random variables using censor data, Proceedings of the 10th International Conference on Intelligent Systems (IS), 28-30 August 2020, Virtual Conference, Online (Varna, Bulgaria), pp. 61-68. ISBN 978-1-7281-5456-5 (2020) [Refereed Conference Paper]
We deal with the problem of creating an empirical CDF (ECDF) for a continuous random variable X, defined as the time of an event of interest, such as failure or repair. The data sample used to construct the ECDF is the result of an experiment in which completely observed variates of X are combined with right-censored variates. Due to the finite precision of the measurement, ties are allowed in the data sample. The Kaplan-Meier estimator (KME) is the usual method of choice for constructing an ECDF under this setup. Some shortcomings of KME are identified, most of which are due to neglecting the prior information that X is a continuous random variable. A new symmetrical requirement (SR) for any estimator is motivated, which requires equal treatment of the events X < x and X ≤ x. A new universal ECDF estimator (UECDFE) is proposed, which meets SR and overcomes some of the KME shortcomings, especially the partial utilization of the right-censored variates. Another novel estimator, the invertible ECDF estimator with maximum count of nodes (IECDFmax), is developed as a linear interpolation on nodes estimated using UECDFE. IECDFmax produces a continuous, strictly increasing, invertible ECDF, i.e. one with the properties that the true CDF of any continuous variable theoretically possesses. Additionally, the cardinality of the node set is maximal for the given data sample, which improves the resemblance of the ECDF to the true CDF. We also address the difficult technical problem of defining an appropriate domain for IECDFmax. We show that IECDFmax overcomes all the formulated shortcomings of KME and fully utilizes the information contained in the data sample and in the prior knowledge that X is a continuous random variable.
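For orientation, the baseline that the abstract compares against can be sketched in a few lines. The following is a minimal Python sketch of the standard Kaplan-Meier (product-limit) estimator for right-censored data with ties, expressed as an ECDF, F(t) = 1 - S(t). It is not the paper's UECDFE or IECDFmax, whose formulas are not given in this record. The function name `kaplan_meier_ecdf` and the sample values are illustrative assumptions.

```python
import numpy as np

def kaplan_meier_ecdf(times, observed):
    """Standard Kaplan-Meier estimate, returned as an ECDF.

    times    : recorded times (event time or censoring time)
    observed : True if the event was observed, False if right-censored
    Returns the distinct observed event times and F = 1 - S at those times.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Distinct times at which at least one event was actually observed.
    event_times = np.unique(times[observed])

    survival = 1.0
    cdf = []
    for t in event_times:
        # Units still under observation just before t (censored ties included).
        at_risk = np.sum(times >= t)
        # Observed events tied at t are handled together.
        events = np.sum((times == t) & observed)
        survival *= 1.0 - events / at_risk
        cdf.append(1.0 - survival)
    return event_times, np.array(cdf)

# Hypothetical sample: ties at 3.0 and 7.0, censoring at 3.0 and 9.0.
times = [2.0, 3.0, 3.0, 5.0, 7.0, 7.0, 9.0]
observed = [True, True, False, True, True, True, False]
nodes, F = kaplan_meier_ecdf(times, observed)
```

In this sample the largest recorded time is censored, so the estimate stays below 1 at its last node, and the result is a step function rather than a continuous, invertible curve. Both points relate to the KME shortcomings and the domain problem mentioned in the abstract.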
estimation theory, interpolation, probability, random processes, reliability theory, reliability, linear interpolation, data sample, IECDFmax, KME, UECDFE, universal ECDF estimator, symmetrical requirement, Kaplan-Meier estimator