Prediction of CENH3 protein in maize using machine learning techniques
Author(s): Suman Dutta, Rajkumar U Zunjare, Vignesh Muthusamy and Firoz Hossain
Abstract: The centromere-specific CENH3 gene, which encodes a variant of the histone H3 protein, causes in vivo haploid induction in maize. Chromosome doubling of haploids by colchicine treatment yields fully fixed inbreds in a single generation, as opposed to the 6-7 generations of selfing required by traditional methods. Understanding the role of CENH3 proteins in chromosome segregation during cell division is therefore of vital importance for in vivo haploid induction. There is currently no online resource that can classify unknown proteins as CENH3 proteins. In this study, our goal was to build a machine learning-based system for predicting whether a protein of unidentified origin is a CENH3 protein. Amino acid composition (AAC) was used as the feature set to construct random forest, decision tree and logistic regression classifiers. A total of 618 protein sequences were examined, including 309 CENH3 sequences from different species and 309 non-CENH3 sequences from Zea mays. The random forest and logistic regression classifiers showed considerable promise, achieving >98% prediction accuracy with AAC features. The t-SNE technique also successfully separated the two classes of proteins in two-dimensional space. In 10-fold cross-validation using the k-fold method, the average accuracy scores of the logistic regression and random forest models were promising, indicating that each model could reliably predict CENH3 proteins. The findings of the study can be applied to different crops before any experiments are conducted.
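As an illustration of the amino acid composition (AAC) feature mentioned in the abstract, the following is a minimal sketch of how a protein sequence might be turned into a 20-dimensional vector of residue fractions. It is not the authors' code: the function name `amino_acid_composition`, the residue ordering in `AMINO_ACIDS`, the choice to skip non-standard characters, and the toy peptide are all assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every feature
# vector has the same layout (ordering is an assumption, not from the paper).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Characters outside the 20 standard residues (e.g. 'X' or '*') are
    ignored; this handling is an illustrative assumption.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy peptide, not a real CENH3 sequence.
toy = "MARTKHQA"
features = amino_acid_composition(toy)
```

Vectors of this kind, one per sequence with a CENH3 or non-CENH3 label, could then be passed to a classifier such as a random forest or logistic regression and evaluated with k-fold cross-validation.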
Suman Dutta, Rajkumar U Zunjare, Vignesh Muthusamy and Firoz Hossain. Prediction of CENH3 protein in maize using machine learning techniques. The Pharma Innovation Journal. 2023; 12(7S): 01-06. DOI: 10.22271/tpi.2023.v12.i7Sa.21185