This functions cleave polypeptide sequences. Use cleavageSites to find the cleavage sites, cleavageRanges to find the cleavage ranges and cleave to get the cleavage products.

# S4 method for character
cleave(x, enzym = "trypsin", missedCleavages = 0,
                             custom = NULL, unique = TRUE)

# S4 method for AAString
cleave(x, enzym = "trypsin", missedCleavages = 0,
                            custom = NULL, unique = TRUE)

# S4 method for AAStringSet
cleave(x, enzym = "trypsin", missedCleavages = 0,
                               custom = NULL, unique = TRUE)

# S4 method for character
cleavageRanges(x, enzym = "trypsin", missedCleavages = 0,
                                     custom = NULL)

# S4 method for AAString
cleavageRanges(x, enzym = "trypsin", missedCleavages = 0,
                                    custom = NULL)

# S4 method for AAStringSet
cleavageRanges(x, enzym = "trypsin", missedCleavages = 0,
                                       custom = NULL)

# S4 method for character
cleavageSites(x, enzym = "trypsin", custom = NULL)

# S4 method for AAString
cleavageSites(x, enzym = "trypsin", custom = NULL)

# S4 method for AAStringSet
cleavageSites(x, enzym = "trypsin", custom = NULL)

Arguments

x

polypeptide sequences.

enzym

character, cleavage rule.

missedCleavages

numeric, number of missed cleavages.

custom

character, of length 1 or 2. Could be used to define own cleaveage rules. The first element would be the pattern and the optional second element would be an exception (non-cleavage) pattern. Perl-like regular expressions are supported, see gregexpr for details. If custom is set the enzym is ignored.

unique

logical, if TRUE all duplicated cleavage products per peptide are removed.

Value

cleave

If x is a character it returns a list of the same length as x. Each element contains a character vector with the corresponding cleavage products of the polypeptides. If x is an AAString or an AAStringSet an AAStringSet or an AAStringSetList instance of the same length as x is returned. Each element contains an AAString or an AAStringSet instance with the corresponding cleavage products of the polypeptides.

cleavageRanges

If x is a character it returns a list of the same length as x. Each element contains a two-column matrix with the start and end positions of the peptides. If x is an AAString or an AAStringSet instance an IRanges or an IRangesList of the same length as x is returned.

cleavageSites

Returns a list of the same length as x. Each element contains an integer vector with the cleavage positions.

Overview:

InputcleavecleavageRangescleavageSites
characterlist of characterlist of matrixlist of integer
AAStringAAStringSetIRangeslist of integer
AAStringSetAAStringSetListIRangesListlist of integer

Details

The cleavage rules are taken from: https://web.expasy.org/peptide_cutter/peptidecutter_enzymes.html

Cleavage rules (cleavage between P1 and P1'):

Rule nameP4P3P2P1P1'P2'arg-c proteinase
---R--asp-n endopeptidase-
---D-bnps-skatole-c--
-W--caspase1F,W,Y,L-H,A,T
Dnot P,E,D,Q,K,R-caspase2DVAD
not P,E,D,Q,K,R-caspase3DMQDnot P,E,D,Q,K,R
-caspase4LEVDnot P,E,D,Q,K,R-
caspase5L,WEHD--caspase6
VEH,IDnot P,E,D,Q,K,R-caspase7D
EVDnot P,E,D,Q,K,R-caspase8I,LE
TDnot P,E,D,Q,K,R-caspase9LEH
D--caspase10IEAD
--chymotrypsin-high---F,Ynot P
----Wnot M,P-
chymotrypsin-low---F,L,Ynot P-
---Wnot M,P--
--Mnot P,Y---
-Hnot D,M,P,W-clostripain---
R--cnbr---M
--enterokinaseD,ED,ED,EK-
-factor xaA,F,G,I,L,T,V,MD,EGR--
formic acid---D--glutamyl endopeptidase
---E--granzyme-bI
EPD--hydroxylamine--
-NG-iodosobenzoic acid---
W--lysc---K
--lysn----K
-neutrophil elastase---A,V--
ntcb----C-pepsin1.3
-not H,K,Rnot Pnot RF,Lnot Ppepsin-
not H,K,Rnot Pnot RF,L,W,Ynot P-not H,K,R
not PF,L,W,Y-not P-not H,K,Rnot P
F,L-not Pproline endopeptidase--not H,K,RP
not P-proteinase k---A,E,F,I,L,T,V,W,Y-
-staphylococcal peptidase i--not EE--
thermolysin---not D,EA,F,I,L,M,V-thrombin
--GRG-A,F,G,I,L,T,V,M
A,F,G,I,L,T,V,WPRnot D,Enot D,Etrypsin--
-K,Rnot P---W
KP---MR

Exceptions:

Rule nameEnzyme nameP4P3P2P1P1'P2'
trypsin--C,DKD-
--CKH,Y-
--CRK-
--RRH,R-

Rule nameEnzyme name
arg-c proteinaseArg-C proteinase
asp-n endopeptidaseAsp-N endopeptidase
bnps-skatole-cBNPS-Skatole
caspase1Caspase 1
caspase2Caspase 2
caspase3Caspase 3
caspase4Caspase 4
caspase5Caspase 5
caspase6Caspase 6
caspase7Caspase 7
caspase8Caspase 8
caspase9Caspase 9
caspase10Caspase 10
chymotrypsin-highChymotrypsin-high specificity (C-term to [FYW], not before P)
chymotrypsin-lowChymotrypsin-low specificity (C-term to [FYWML], not before P)
clostripainClostripain (Clostridiopeptidase B)
cnbrCNBr
enterokinaseEnterokinase
factor xaFactor Xa
formic acidFormic acid
glutamyl endopeptidaseGlutamyl endopeptidase
granzyme-bGranzyme B
hydroxylamineHydroxylamine
iodosobenzoic acidIodosobenzoic acid
lyscLysC
lysnLysN
neutrophil elastaseNeutrophil elastase
ntcbNTCB (2-nitro-5-thiocyanobenzoic acid)
pepsin1.3Pepsin (pH == 1.3)
pepsinPepsin (pH > 2)
proline endopeptidaseProline-endopeptidase
proteinase kProteinase K
staphylococcal peptidase iStaphylococcal Peptidase I
thermolysinThermolysin
thrombinThrombin
trypsinTrypsin

Author

Sebastian Gibb <mail@sebastiangibb.de>

References

Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A.; "Protein Identification and Analysis Tools on the ExPASy Server". (In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press (2005).
https://web.expasy.org/peptide_cutter/peptidecutter_enzymes.html

See also

AAString, AAStringSet, AAStringSetList, IRanges, IRangesList

Examples

library("cleaver")

## Gastric juice peptide 1 (UniProtKB/Swiss-Prot: GAJU_HUMAN/P01358)
gaju <- "LAAGKVEDSD"

cleave(gaju, "trypsin")
#> $LAAGKVEDSD
#> [1] "LAAGK" "VEDSD"
#> 
# $LAAGKVEDSD
# [1] "LAAGK" "VEDSD"

cleavageRanges(gaju, "trypsin")
#> $LAAGKVEDSD
#>      start end
#> [1,]     1   5
#> [2,]     6  10
#> 
# $LAAGKVEDSD
#      start end
# [1,]     1   5
# [2,]     6  10

cleavageSites(gaju, "trypsin")
#> $LAAGKVEDSD
#> [1] 5
#> 
# $LAAGKVEDSD
# [1] 5

cleave(gaju, "trypsin", missedCleavages=1)
#> $LAAGKVEDSD
#> [1] "LAAGKVEDSD"
#> 
# $LAAGKVEDSD
# [1] "LAAGKVEDSD"

cleavageRanges(gaju, "trypsin", missedCleavages=1)
#> $LAAGKVEDSD
#>      start end
#> [1,]     1  10
#> 
# $LAAGKVEDSD
#      start end
# [1,]     1  10

cleave(gaju, "trypsin", missedCleavages=0:1)
#> $LAAGKVEDSD
#> [1] "LAAGK"      "VEDSD"      "LAAGKVEDSD"
#> 
# $LAAGKVEDSD
# [1] "LAAGK" "VEDSD" "LAAGKVEDSD"

cleavageRanges(gaju, "trypsin", missedCleavages=0:1)
#> $LAAGKVEDSD
#>      start end
#> [1,]     1   5
#> [2,]     6  10
#> [3,]     1  10
#> 
# $LAAGKVEDSD
#      start end
# [1,]     1   5
# [2,]     6  10
# [3,]     1  10


cleave(gaju, "pepsin")
#> $LAAGKVEDSD
#> [1] "LAAGKVEDSD"
#> 
# $LAAGKVEDSD
# [1] "LAAGKVEDSD"
# (no cleavage)


## use AAStringSet
gaju <- AAStringSet("LAAGKVEDSD")

cleave(gaju)
#> AAStringSetList of length 1
#> [["LAAGKVEDSD"]] LAAGK VEDSD
# AAStringSetList of length 1
# [["LAAGKVEDSD"]] LAAGK VEDSD


## Beta-enolase (UniProtKB/Swiss-Prot: ENOB_THUAL/P86978)
enob <- "SITKIKAREILD"

cleave(enob, "trypsin")
#> $SITKIKAREILD
#> [1] "SITK" "IK"   "AR"   "EILD"
#> 
# $SITKIKAREILD
# [1] "SITK" "IK"   "AR"   "EILD"

cleave(enob, "trypsin", missedCleavages=2)
#> $SITKIKAREILD
#> [1] "SITKIKAR" "IKAREILD"
#> 
# $SITKIKAREILD
# [1] "SITKIKAR" "IKAREILD"

cleave(enob, "trypsin", missedCleavages=0:2)
#> $SITKIKAREILD
#> [1] "SITK"     "IK"       "AR"       "EILD"     "SITKIK"   "IKAR"     "AREILD"  
#> [8] "SITKIKAR" "IKAREILD"
#> 
# $SITKIKAREILD
# [1] "SITK"     "IK"       "AR"       "EILD"     "SITKIK"   "IKAR"
# [7] "AREILD"   "SITKIKAR" "IKAREILD"

## define own cleavage rule: cleave at K
cleave(enob, custom="K")
#> $SITKIKAREILD
#> [1] "SITK"   "IK"     "AREILD"
#> 
# $SITKIKAREILD
# [1] "SITK"   "IK"     "AREILD"

cleavageRanges(enob, custom="K")
#> $SITKIKAREILD
#>      start end
#> [1,]     1   4
#> [2,]     5   6
#> [3,]     7  12
#> 
# $SITKIKAREILD
#      start end
# [1,]     1   4
# [2,]     5   6
# [3,]     7  12

## define own cleavage rule: cleave at K but not if followed by A
cleave(enob, custom=c("K", "K(?=A)"))
#> $SITKIKAREILD
#> [1] "SITK"     "IKAREILD"
#> 
# $SITKIKAREILD
# [1] "SITK"     "IKAREILD"

cleavageRanges(enob, custom=c("K", "K(?=A)"))
#> $SITKIKAREILD
#>      start end
#> [1,]     1   4
#> [2,]     5  12
#> 
# $SITKIKAREILD
#      start end
# [1,]     1   4
# [2,]     5  12

cleavageSites(enob, custom=c("K", "K(?=A)"))
#> $SITKIKAREILD
#> [1] 4
#> 
# $SITKIKAREILD
# [1] 4