PERSEUCPP makes its predictions with the Extremely Randomized Trees (ExtraTrees) algorithm. From the descriptors computed for each sequence, the model learns patterns that distinguish cell-penetrating peptides (CPPs) from non-CPPs and then estimates the penetration efficiency of predicted CPPs. Prediction is divided into two stages:
- CPP/non-CPP classification – Trained on a balanced dataset of CPP and non-CPP sequences and validated through k-fold cross-validation and independent benchmarks.
- Penetration efficiency prediction – Trained on CPPs labeled by efficiency and validated on an independent set.
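
The two stages above can be sketched with scikit-learn's `ExtraTreesClassifier`. This is a minimal illustration, not PERSEUCPP's actual code: the descriptor matrix is synthetic, the efficiency labels are assumed to be binary (high/low), and the helper `predict_two_stage`, the fold count, and all hyperparameters are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the descriptor matrix: 200 sequences x 10 descriptors.
X = rng.normal(size=(200, 10))
# Assumed stage 1 labels: 1 = CPP, 0 = non-CPP (balanced, 100 of each).
y_cpp = np.repeat([0, 1], 100)
X[y_cpp == 1, 0] += 3.0  # make descriptor 0 informative for the demo

# Stage 1: CPP/non-CPP classifier, checked with stratified 5-fold CV.
stage1 = ExtraTreesClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
stage1_scores = cross_val_score(stage1, X, y_cpp, cv=cv, scoring="accuracy")
stage1.fit(X, y_cpp)

# Stage 2: efficiency classifier, trained on CPPs only.
# Assumed labels: 1 = high efficiency, 0 = low efficiency.
X_cpp = X[y_cpp == 1]
y_eff = (X_cpp[:, 1] > 0).astype(int)
stage2 = ExtraTreesClassifier(n_estimators=100, random_state=0)
stage2.fit(X_cpp, y_eff)


def predict_two_stage(descriptors):
    """Return (is_cpp, efficiency); efficiency is -1 for predicted non-CPPs."""
    descriptors = np.asarray(descriptors)
    is_cpp = stage1.predict(descriptors)
    efficiency = np.full(len(descriptors), -1)
    mask = is_cpp == 1
    if mask.any():
        # Only sequences predicted to be CPPs reach the second stage.
        efficiency[mask] = stage2.predict(descriptors[mask])
    return is_cpp, efficiency


is_cpp, efficiency = predict_two_stage(X[:5])
print("mean CV accuracy:", stage1_scores.mean())
```

In this sketch the second model is never asked about sequences that the first model rejects, which mirrors the staged design described above. An independent benchmark would be evaluated separately on sequences held out from both training sets.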
ExtraTrees combines strong predictive performance with a degree of interpretability: the feature importances of the fitted ensemble indicate which descriptors most influence the predicted cellular penetration ability.
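
As a rough illustration of how influential descriptors can be read from a fitted ensemble, the sketch below ranks features by scikit-learn's impurity-based `feature_importances_`. The descriptor names and data are hypothetical and synthetic; the source does not state which importance measure PERSEUCPP reports.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)

# Hypothetical descriptor names; real descriptors would come from the pipeline.
names = ["descriptor_a", "descriptor_b", "descriptor_c", "descriptor_d"]
X = rng.normal(size=(300, 4))
# In this toy data only descriptor_c determines the label.
y = (X[:, 2] > 0).astype(int)

model = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = model.feature_importances_
ranking = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances are computed on the training data and can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.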