THRONE (a three-layer ensemble predictor for identifying human RNA N7-methylguanosine sites) is a machine-learning method that predicts N7-methylguanosine (m7G) sites in the human genome. m7G is a positively charged modification of the 5’ cap of eukaryotic mRNA and is involved in translation and splicing. The three layers are: 1) baseline models, 2) meta-models, and 3) a final layer in which the predicted probabilities of the best meta-models are fed back into the six classifiers. Compared with the earlier models Stack1, Stack2 and IFR, this design gave the best m7G predictor for the human genome.
Dataset:
Training dataset: 741 m7G and 741 non-m7G
Validation dataset: HepG2 cells. Positive sequences are 41 nucleotides long, with the m7G site flanked by 20 bases upstream and 20 bases downstream. Negative sequences are 41-nucleotide windows from the human genome, each centered on a guanosine that was not detected by the MeRIP-Seq method.
Independent dataset: m7G sequences from HeLa cells, taken from m6A-Atlas. Positive sequences: 334; negative sequences: 3340.
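The 41-nucleotide window layout described above (a central guanosine with 20 flanking bases on each side) can be illustrated with a short sketch. The function name `extract_windows` and the toy sequence are hypothetical and are not taken from the THRONE code; the sketch only shows how candidate windows could be cut from a longer sequence.

```python
def extract_windows(sequence, flank=20, center="G"):
    """Return (position, window) pairs for every `center` base that has
    `flank` bases available on both sides (window length = 2 * flank + 1)."""
    seq = sequence.upper()
    windows = []
    # Only positions with a full flank on both sides are considered.
    for i in range(flank, len(seq) - flank):
        if seq[i] == center:
            windows.append((i, seq[i - flank:i + flank + 1]))
    return windows


# Toy input (made up): two guanosines, only the first has 20 bases on each side.
toy = "A" * 20 + "G" + "C" * 20 + "G" + "A" * 5
for pos, window in extract_windows(toy):
    print(pos, len(window), window[20])
```

In this sketch, guanosines closer than 20 bases to either end are skipped rather than padded; whether the original datasets pad such sites is not stated here.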
Feature Encodings:
- Mononucleotide binary encoding (MBE)
- Nucleotide chemical property and density (NCPD)
- Dinucleotide binary profile and frequency (DBPF)
- Enhanced nucleic acid composition (ENAC)
- Numerical representation Features (NRF)
- Composition of K-spaced nucleic acid pairs (CKSNAP)
- K-mer composition (k-mer)
- Series correlation pseudo-dinucleotide composition (SCPseDNC)
- Maximum mutual Information (MMI)
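As an illustration of the encodings listed above, here is a minimal sketch of three of them (MBE, NCPD and k-mer composition) in plain Python. The function names are hypothetical, and the chemical-property table follows the commonly used (ring structure, functional group, hydrogen bonding) convention; the exact definitions used by THRONE may differ.

```python
from itertools import product

BASES = "ACGU"

# Assumed convention: (purine = 1, amino group = 1, weak hydrogen bonding = 1).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def normalize(seq):
    """Upper-case the sequence and map DNA-style T to U."""
    return seq.upper().replace("T", "U")


def mbe(seq):
    """Mononucleotide binary encoding: one-hot vector of length 4 per position."""
    return [int(base == b) for base in normalize(seq) for b in BASES]


def ncpd(seq):
    """Three chemical-property bits plus the running density of each base,
    i.e. how often the base has occurred up to and including its position."""
    features, seen = [], {}
    for i, base in enumerate(normalize(seq), start=1):
        seen[base] = seen.get(base, 0) + 1
        features.extend(NCP[base])
        features.append(seen[base] / i)
    return features


def kmer(seq, k=2):
    """Normalized frequency of every possible k-mer, in lexicographic order."""
    seq = normalize(seq)
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    total = len(seq) - k + 1
    for i in range(total):
        sub = seq[i:i + k]
        if sub in counts:  # k-mers with ambiguous bases are ignored
            counts[sub] += 1
    return [c / total for c in counts.values()]


example = "GAUG"
print(mbe(example))
print(ncpd(example))
print(kmer(example, k=2))
```

For a 41-nucleotide sequence, this sketch yields 164 values for MBE, 164 for NCPD and 16 for 2-mer composition.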
Used Classifiers:
- Random Forest (RF)
- Support Vector Machine (SVM)
- Extremely Randomized Tree (ERT)
- Gradient Boosting (GB)
- AdaBoost (AB)
- eXtreme gradient boosting (XGB)
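The three-layer idea (baseline models, meta-models, then a final pass of the best meta-model probabilities through the classifiers) can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions, not the THRONE implementation: the data are synthetic, the two "encodings" are made-up feature matrices, XGB is left out to avoid an extra dependency, hyperparameters are arbitrary, and choosing the "best" meta-models by out-of-fold accuracy is an assumed selection rule.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC


def make_classifiers(seed=0):
    """Five of the six listed classifier families (XGB omitted in this sketch)."""
    return {
        "RF": RandomForestClassifier(n_estimators=30, random_state=seed),
        "ERT": ExtraTreesClassifier(n_estimators=30, random_state=seed),
        "SVM": SVC(probability=True, random_state=seed),
        "GB": GradientBoostingClassifier(n_estimators=30, random_state=seed),
        "AB": AdaBoostClassifier(n_estimators=30, random_state=seed),
    }


def oof_probabilities(X, y, seed=0):
    """Out-of-fold P(class 1) from every classifier, one column per classifier."""
    columns = [
        cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
        for clf in make_classifiers(seed).values()
    ]
    return np.column_stack(columns)


# Synthetic stand-in data: 60 negatives, 60 positives, two hypothetical encodings.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
encodings = {
    "enc_a": rng.normal(size=(120, 8)) + 0.8 * y[:, None],
    "enc_b": rng.normal(size=(120, 5)) + 0.5 * y[:, None],
}

# Layer 1: baseline models, one per (encoding, classifier) pair.
layer1 = np.hstack([oof_probabilities(X, y) for X in encodings.values()])

# Layer 2: meta-models trained on the baseline probabilities.
layer2 = oof_probabilities(layer1, y)

# Layer 3: probabilities of the best meta-models go through the classifiers again.
accuracy = ((layer2 >= 0.5) == y[:, None]).mean(axis=0)
best = np.argsort(accuracy)[-2:]
layer3 = oof_probabilities(layer2[:, best], y)

for name, column in zip(make_classifiers(), layer3.T):
    print(name, round(float(((column >= 0.5) == y).mean()), 3))
```

Out-of-fold probabilities are used so that each layer is trained on predictions made for samples the previous layer did not see during fitting. Selecting the best meta-models on the same folds is a simplification that can make the reported accuracy optimistic.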
 1. Shoombuatong, W., et al., THRONE: A New Approach for Accurate Prediction of Human RNA N7-Methylguanosine Sites. Journal of Molecular Biology, 2022. 434(11): p. 167549.