PERSEUCPP

A Supervised Learning Strategy for Predicting Cell-Penetrating Peptides.

PERSEUCPP is a computational strategy designed to identify cell-penetrating peptides (CPPs) and predict their penetration efficiency. It takes amino acid sequences as input and outputs a CPP/non-CPP classification along with an efficiency level. Its key strength is the integration of physicochemical, structural, and atomic descriptors with high interpretability, which enables not only accurate predictions but also insight into the biological factors driving cell penetration.

PERSEUCPP computes a set of numerical descriptors from each amino acid sequence, normalized by peptide length. These include Atomic Composition (relative amounts of C, H, N, O, and S); Dipeptide Composition (DPC) and Tripeptide Composition (TPC), which cover all possible combinations of the 20 amino acids (400 dipeptides and 8,000 tripeptides); and Composition of k-Spaced Amino Acid Group Pairs (CKSAAGP), which measures the frequency of amino acid group pairs (grouped by properties such as charge, polarity, and hydrophobicity) separated by k residues. Additional physicochemical properties, calculated using the Biopython library, include the GRAVY index, molecular weight, isoelectric point, hydrophobicity, and net charge.
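
To make two of the descriptors above concrete, here is a minimal pure-Python sketch of DPC and CKSAAGP. It is an illustration, not PERSEUCPP's own code: the function names, the five-group partition in GROUPS, and the choice to divide counts by sequence length are all assumptions, and the tool itself may group residues or normalize differently.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed grouping of the 20 amino acids by broad physicochemical class.
# The partition PERSEUCPP actually uses may differ.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def dipeptide_composition(seq):
    """Count each of the 400 dipeptides and divide by sequence length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip windows containing non-standard residues
            counts[pair] += 1
    return {pair: n / len(seq) for pair, n in counts.items()}


def cksaagp(seq, k=0):
    """Count group pairs separated by k residues and divide by sequence length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    names = list(GROUPS)
    counts = {f"{g1}.{g2}.gap{k}": 0 for g1 in names for g2 in names}
    for i in range(len(seq) - k - 1):
        g1 = RESIDUE_TO_GROUP.get(seq[i])
        g2 = RESIDUE_TO_GROUP.get(seq[i + k + 1])
        if g1 is not None and g2 is not None:
            counts[f"{g1}.{g2}.gap{k}"] += 1
    return {key: n / len(seq) for key, n in counts.items()}


dpc = dipeptide_composition("RRRR")
pairs_gap1 = cksaagp("KAD", k=1)
```

With five groups, each value of k contributes 25 CKSAAGP features, so a run over several gap sizes yields 25 features per gap. TPC follows the same pattern as DPC with three-residue windows.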

PERSEUCPP uses the Extremely Randomized Trees (ExtraTrees) algorithm to perform predictions. Based on the computed descriptors for each sequence, the model detects patterns that distinguish cell-penetrating peptides (CPPs) from non-CPPs and estimates their penetration efficiency. The prediction process is divided into two stages:

  • CPP/non-CPP classification – Trained on a balanced dataset of sequences, validated through k-fold cross-validation and independent benchmarks.
  • Penetration efficiency prediction – Trained on CPPs labeled by efficiency, also validated on an independent set.
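
The two stages above can be sketched with scikit-learn's ExtraTreesClassifier. This is a hedged illustration only: the descriptor matrix is random placeholder data, the binary low/high efficiency labels, the hyperparameters, and the predict_peptides helper are assumptions, and none of it reproduces PERSEUCPP's actual training setup or datasets.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Placeholder data: 200 peptides x 30 descriptors (random, illustration only).
X = rng.normal(size=(200, 30))
is_cpp = np.repeat([0, 1], 100)   # balanced non-CPP / CPP labels
X[is_cpp == 1, 0] += 3.0          # make descriptor 0 informative

# Stage 1: CPP vs. non-CPP, checked with stratified 5-fold cross-validation.
stage1 = ExtraTreesClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_accuracy = cross_val_score(stage1, X, is_cpp, cv=cv, scoring="accuracy")
stage1.fit(X, is_cpp)

# Stage 2: efficiency level, trained on CPPs only.
# Hypothetical binary labels: 0 = low efficiency, 1 = high efficiency.
X_cpp = X[is_cpp == 1]
efficiency = (X_cpp[:, 1] > 0).astype(int)
stage2 = ExtraTreesClassifier(n_estimators=200, random_state=0)
stage2.fit(X_cpp, efficiency)


def predict_peptides(descriptors):
    """Run stage 1, then stage 2 only for peptides predicted to be CPPs."""
    descriptors = np.atleast_2d(descriptors)
    results = []
    for row, flag in zip(descriptors, stage1.predict(descriptors)):
        if flag == 1:
            level = int(stage2.predict(row.reshape(1, -1))[0])
            results.append(("CPP", "high" if level == 1 else "low"))
        else:
            results.append(("non-CPP", None))
    return results


predictions = predict_peptides(X[:3])
```

Chaining the models this way means an efficiency level is only reported for sequences that stage 1 classifies as CPPs. In practice the cross-validation scores would be complemented by evaluation on a held-out independent set, as described above.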

The use of Extremely Randomized Trees combines high predictive performance with interpretability, making it possible to identify the descriptors that most influence cell-penetration ability.
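
One common way to rank descriptors with an ExtraTrees model is its impurity-based feature_importances_ attribute in scikit-learn. The sketch below assumes that approach; the descriptor names and the synthetic data are hypothetical, and PERSEUCPP may derive its rankings differently.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(1)

# Hypothetical descriptor names and random placeholder data.
names = ["net_charge", "gravy", "molecular_weight",
         "isoelectric_point", "DPC_RR", "DPC_KK"]
X = rng.normal(size=(300, len(names)))
y = np.repeat([0, 1], 150)
X[y == 1, 0] += 3.0  # only "net_charge" carries signal in this toy data

model = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1.
importances = model.feature_importances_
ranking = sorted(zip(names, importances), key=lambda item: item[1], reverse=True)

for name, score in ranking:
    print(f"{name:20s} {score:.3f}")
```

Impurity-based importances can be biased toward features with many distinct values, so a permutation-based check (for example sklearn.inspection.permutation_importance on held-out data) is a reasonable complement.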