2022, Vol. 7, Issue 5, Part B
Imputation methods on Hardy Weinberg equilibrium for missing genome wide expression data
Author(s):
Makambi Abuga Dennis, Fred Monari, Robert Nyamao Nyabwanga and Lameck Ondieki Agasa
Abstract:
Genomic data commonly contain missing expression values. Deleting missing values can seriously bias the results, so missing values in genomics are often handled by imputation. This research examined two imputation methods commonly used to fill in missing values: random forest (RF) and k-nearest neighbours (KNN). A dataset with no missing values was obtained from the GENICA study. A missingness structure was imposed on it, and imputed values were generated using the two methods. Missing genotypes were removed and the markers were evaluated for their relation to Hardy-Weinberg equilibrium (HWE). Assessments of HWE in the presence of missingness were developed using exact p-values and inbreeding coefficients. Data analysis was carried out in R version 4.1.2, and the results are presented in charts and tables. Both data missing at random (MAR) and data missing completely at random (MCAR) were considered. The study found that random forest was the most appropriate method for imputing values that are missing completely at random. In addition, the method was suitable for estimating Hardy-Weinberg proportions with the aim of maintaining HWE. KNN was the most effective method for imputing values that were missing at random: it produced small disparities and gave close approximations to Hardy-Weinberg equilibrium.
Pages: 114-124
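The abstract names two HWE diagnostics, the inbreeding coefficient and an exact p-value, without showing how they are computed. The following is a minimal Python sketch of both for a single biallelic marker, assuming genotype counts for AA, AB and BB are already available. It is not the authors' code (the study used R 4.1.2). The function names are hypothetical, and the exact test shown is the standard conditional test on the heterozygote count, which may differ from the variant used in the article.

```python
import math


def inbreeding_coefficient(n_aa, n_ab, n_bb):
    """Return f = 1 - (observed heterozygotes / expected heterozygotes under HWE)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    expected_het = 2 * p * (1 - p) * n   # expected AB count under HWE
    if expected_het == 0:
        return float("nan")              # monomorphic marker: f is undefined
    return 1 - n_ab / expected_het


def _log_prob_het(h, n, n_a):
    """Log-probability of h heterozygotes given n individuals and n_a copies of A."""
    n_b = 2 * n - n_a
    aa = (n_a - h) // 2
    bb = (n_b - h) // 2
    return (math.lgamma(n + 1) - math.lgamma(aa + 1) - math.lgamma(h + 1)
            - math.lgamma(bb + 1) + h * math.log(2)
            + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
            - math.lgamma(2 * n + 1))


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Two-sided exact p-value: total probability of all heterozygote counts
    that are no more likely than the observed one, given the allele counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a
    observed = math.exp(_log_prob_het(n_ab, n, n_a))
    total = 0.0
    # The heterozygote count must share the parity of the allele count n_a.
    for h in range(n_a % 2, min(n_a, n_b) + 1, 2):
        prob = math.exp(_log_prob_het(h, n, n_a))
        if prob <= observed * (1 + 1e-9):  # small tolerance for rounding
            total += prob
    return min(total, 1.0)


if __name__ == "__main__":
    # Illustrative counts only, not data from the GENICA study.
    print(inbreeding_coefficient(40, 20, 40))
    print(hwe_exact_pvalue(40, 20, 40))
```

A coefficient near zero together with a large exact p-value suggests genotype frequencies close to Hardy-Weinberg proportions. Comparing both quantities before and after imputation is one way to check whether an imputation method preserves HWE.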
How to cite this article:
Makambi Abuga Dennis, Fred Monari, Robert Nyamao Nyabwanga, Lameck Ondieki Agasa. Imputation methods on Hardy Weinberg equilibrium for missing genome wide expression data. Int J Stat Appl Math 2022;7(5):114-124.