International Journal of Bioscience, Biochemistry and Bioinformatics, no.4, pp.39-44, 2014 (Peer-Reviewed Journal)
A protein that lacks a stable three-dimensional (3-D) structure in its native state is called natively unfolded or intrinsically disordered. The observation that many intrinsically disordered protein regions play key roles in essential functions has increased interest, within bioinformatics, in identifying such regions. Because the amino acid sequence is widely used to determine protein structure, it has been hypothesized that the sequence also determines disorder. To improve prediction quality, recent studies have focused on finding more informative features and developing more robust predictors. Machine learning techniques are well suited to extracting the complex relationships and correlations hidden in large data sets. In this study, several features of the chosen proteins were combined in different ways to obtain an optimized dataset, and prediction was performed with a widely used method, the support vector machine (SVM), which yielded a significant increase in success rate on the modeled data. In addition, the feature selection method ERGS was used to identify the features that carry sufficient information for detecting disorder. In total, 37 attributes were found to be the most influential features for predicting disordered regions.
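The idea of ranking features before classification can be illustrated with a minimal sketch. It scores each feature by how little the value ranges of the two classes (ordered vs. disordered) overlap, in the spirit of range-based selection methods such as ERGS. It is not the authors' implementation: the function names, the constant `GAMMA`, the prior-weighted range width, and the toy data are all assumptions made for illustration, and the exact formulation used in the paper may differ.

```python
from statistics import mean, pstdev

GAMMA = 1.732  # assumed range-width constant; a hypothetical choice for this sketch


def effective_range(values, prior, gamma=GAMMA):
    """Return (low, high) of a class-conditional range centred on the mean."""
    mu = mean(values)
    half_width = (1.0 - prior) * gamma * pstdev(values)
    return mu - half_width, mu + half_width


def overlap_scores(X, y):
    """Score each feature by how little its two class ranges overlap.

    X: list of rows (one feature vector per residue).
    y: list of labels; exactly two distinct classes are assumed.
    Returns one weight per feature in [0, 1]; higher means better separation.
    """
    n = len(y)
    classes = sorted(set(y))
    priors = {c: sum(1 for label in y if label == c) / n for c in classes}
    coefficients = []
    for j in range(len(X[0])):
        ranges = []
        for c in classes:
            column = [row[j] for row, label in zip(X, y) if label == c]
            ranges.append(effective_range(column, priors[c]))
        ranges.sort()  # order the two ranges by lower bound
        (low_a, high_a), (low_b, high_b) = ranges
        overlap = max(0.0, min(high_a, high_b) - low_b)
        span = max(high_a, high_b) - low_a
        # A zero span means both classes collapse to one value: no separation.
        coefficients.append(overlap / span if span > 0 else 1.0)
    worst = max(coefficients) or 1.0  # avoid dividing by zero when nothing overlaps
    return [1.0 - c / worst for c in coefficients]


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X = [[0.10, 5.0], [0.20, 1.0], [0.15, 3.0],
     [0.90, 4.8], [1.00, 1.2], [0.95, 3.1]]
y = [0, 0, 0, 1, 1, 1]
weights = overlap_scores(X, y)
ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
```

Under these assumptions, the highest-weighted features (for example, the top 37) would be kept and passed to a classifier such as an SVM; the cutoff and the classifier settings are choices that would have to be tuned on held-out data.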