Application of Least Absolute Deviation Regression Using the EM Algorithm on Outlier Data (Case Study: Modeling the Open Unemployment Rate in Java Island in 2021)
Rizki Ayu Fitrian Sari, Setiawan, Santi Puteri Rahayu
Department of Statistics, Institut Teknologi Sepuluh Nopember, Kampus ITS-Sukolilo, Surabaya 60111, Indonesia
Abstract
Linear regression parameters are commonly estimated with the Ordinary Least Squares (OLS) method. OLS is very sensitive to departures from its assumptions, in particular the assumption of normally distributed residuals, which outliers often violate. Robust estimators can reduce the influence of outliers; one such estimator is Least Absolute Deviation (LAD). LAD minimizes the sum of the absolute errors, that is, the sum of the absolute values of the vertical residuals between the fitted values and the corresponding observations. Its advantage is resistance to outliers in the data. LAD is generally computed with Iteratively Reweighted Least Squares (IRLS) iterations, which make the algorithm more complex and time-consuming to compute, so this study considers the EM algorithm to address these computational problems. This study aims to estimate the LAD regression parameters with the EM algorithm, applied to the Open Unemployment Rate in Java Island in 2021. Judged by the larger R-squared value, the results indicate that the LAD method gives better results than the OLS method for data containing outliers.
Keywords: Regression, OLS, Outlier, LAD, EM Algorithm
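The idea summarized in the abstract can be illustrated with a minimal sketch. It assumes the errors follow a Laplace distribution written as a scale mixture of normals, in which case the E-step reduces to weights proportional to 1/|r_i| and the M-step to a weighted least squares fit. The function name `lad_em`, the `eps` floor on the residuals, and the synthetic data are illustrative assumptions. They are not the authors' implementation or the unemployment data used in the study.

```python
import numpy as np

def lad_em(X, y, n_iter=200, tol=1e-8, eps=1e-8):
    """Hypothetical LAD fit via EM-style reweighting (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the OLS solution.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        # E-step: expected inverse variance of each observation, proportional
        # to 1/|r_i|; eps is an assumed floor that avoids division by zero.
        w = 1.0 / np.maximum(np.abs(r), eps)
        # M-step: weighted least squares, solving (X'WX) beta = X'Wy.
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic example (assumed data): y = 1 + 2x with one gross outlier.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[4] += 50.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_lad = lad_em(X, y)
print("OLS:", beta_ols)
print("LAD (EM sketch):", beta_lad)
```

In this toy setting the single outlier pulls the OLS intercept well away from 1, while the reweighted fit stays close to the line through the remaining points, which is the behavior the abstract attributes to LAD.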