vignettes/cleaver.Rmd
Abstract
This vignette describes the in-silico cleavage of polypeptides using the cleaver package.
Most proteomics experiments require protein (peptide) separation and cleavage before the molecules can be analyzed or identified by mass spectrometry or other analytical tools.
cleaver performs in-silico cleavage of polypeptide sequences, e.g. to create theoretical mass spectrometry data.
The cleavage rules are taken from the ExPASy PeptideCutter tool (Gasteiger et al. 2005).
Loading the cleaver package:
library("cleaver")
Getting help and listing all available cleavage rules:
help("cleave")
Cleaving gastric juice peptide 1 (P01358) using trypsin:
## cleave it
cleave("LAAGKVEDSD", enzym="trypsin")
## $LAAGKVEDSD
## [1] "LAAGK" "VEDSD"
## get the cleavage ranges
cleavageRanges("LAAGKVEDSD", enzym="trypsin")
## $LAAGKVEDSD
## start end
## [1,] 1 5
## [2,] 6 10
## get only cleavage sites
cleavageSites("LAAGKVEDSD", enzym="trypsin")
## $LAAGKVEDSD
## [1] 5
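How sites, ranges and peptides relate to each other can be sketched in base R. The sketch below assumes a deliberately simplified trypsin rule (cleave after K or R unless the next residue is P); the PeptideCutter rules used by cleaver list further exceptions, so this is not the package's implementation. The helper names `simpleSites` and `sitesToRanges` are hypothetical.

```r
## simplified rule (an assumption for illustration only):
## cleave after K or R, but not if the next residue is P
simpleSites <- function(x) {
  m <- gregexpr("[KR](?!P)", x, perl = TRUE)[[1]]
  sites <- as.integer(m)
  ## drop "no match" (-1) and a match at the very last residue
  sites[sites > 0 & sites < nchar(x)]
}

## turn cleavage sites into a start/end matrix
sitesToRanges <- function(sites, n) {
  cbind(start = c(1L, sites + 1L), end = c(sites, n))
}

x <- "LAAGKVEDSD"
sites <- simpleSites(x)
ranges <- sitesToRanges(sites, nchar(x))
## extract the peptides described by the ranges
peptides <- substring(x, ranges[, "start"], ranges[, "end"])
```

For this sequence the sketch reproduces the site, the two ranges and the two peptides shown above; for sequences that hit one of the PeptideCutter exceptions it may differ from cleaver.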
Sometimes cleavage is not perfect and the enzyme misses some cleavage positions:
## miss one cleavage position
cleave("LAAGKVEDSD", enzym="trypsin", missedCleavages=1)
## $LAAGKVEDSD
## [1] "LAAGKVEDSD"
cleavageRanges("LAAGKVEDSD", enzym="trypsin", missedCleavages=1)
## $LAAGKVEDSD
## start end
## [1,] 1 10
## miss zero or one cleavage positions
cleave("LAAGKVEDSD", enzym="trypsin", missedCleavages=0:1)
## $LAAGKVEDSD
## [1] "LAAGK" "VEDSD" "LAAGKVEDSD"
cleavageRanges("LAAGKVEDSD", enzym="trypsin", missedCleavages=0:1)
## $LAAGKVEDSD
## start end
## [1,] 1 5
## [2,] 6 10
## [3,] 1 10
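One way to picture `missedCleavages` is as joining neighbouring fragments: with `m` missed sites, each resulting peptide spans `m + 1` adjacent fragments. The base R sketch below illustrates that idea for a single sequence; `missedRanges` is a hypothetical helper and not how cleaver is necessarily implemented.

```r
## combine adjacent fragments to emulate missed cleavages
## (hypothetical helper, not part of cleaver)
missedRanges <- function(sites, n, missedCleavages = 0L) {
  start <- c(1L, sites + 1L)
  end <- c(sites, n)
  res <- lapply(missedCleavages, function(m) {
    ## number of peptides that span m + 1 adjacent fragments
    k <- length(start) - m
    if (k < 1L) return(NULL)
    cbind(start = start[seq_len(k)], end = end[seq_len(k) + m])
  })
  do.call(rbind, res)
}

## one cleavage site at position 5 in a sequence of length 10
missedRanges(5L, 10L, missedCleavages = 0:1)
```

With the single site at position 5 this yields the ranges 1-5, 6-10 and 1-10, the same three ranges as in the `cleavageRanges` output above.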
Combining cleaver with Biostrings (Pages et al., n.d.):
## create AAStringSet object
p <- AAStringSet(c(gaju="LAAGKVEDSD", pnm="AGEPKLDAGV"))
## cleave it
cleave(p, enzym="trypsin")
## AAStringSetList of length 2
## [["gaju"]] LAAGK VEDSD
## [["pnm"]] AGEPK LDAGV
cleavageRanges(p, enzym="trypsin")
## IRangesList object of length 2:
## $gaju
## IRanges object with 2 ranges and 0 metadata columns:
## start end width
## <integer> <integer> <integer>
## [1] 1 5 5
## [2] 6 10 5
##
## $pnm
## IRanges object with 2 ranges and 0 metadata columns:
## start end width
## <integer> <integer> <integer>
## [1] 1 5 5
## [2] 6 10 5
cleavageSites(p, enzym="trypsin")
## $gaju
## [1] 5
##
## $pnm
## [1] 5
Downloading Insulin (P01308) and Somatostatin (P61278) sequences from the UniProt (The UniProt Consortium 2012) database using UniProt.ws (Carlson, n.d.).
## load UniProt.ws library
library("UniProt.ws")
## select species Homo sapiens
up <- UniProt.ws(taxId=9606)
## download sequences of Insulin/Somatostatin
s <- select(up,
  keys=c("P01308", "P61278"),
  columns=c("sequence"),
  keytype="UniProtKB"
)
## fetch only sequences
sequences <- setNames(s$Sequence, s$Entry)
## remove whitespace
sequences <- gsub(pattern="[[:space:]]", replacement="", x=sequences)
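The post-processing steps can be tried without a network connection. The sketch below applies the same `setNames` and `gsub` calls to a mock table, assuming the result has `Entry` and `Sequence` columns as in the code above. The accessions and sequences in the mock table are invented placeholders, and the final validity check is an extra step not present in the original workflow.

```r
## mock result table; accessions and sequences are invented placeholders
s <- data.frame(
  Entry = c("X00001", "X00002"),
  Sequence = c("LAAGK VEDSD", "AGEPK\nLDAGV"),
  stringsAsFactors = FALSE
)

## name each sequence by its accession
sequences <- setNames(s$Sequence, s$Entry)
## strip spaces, tabs and line breaks; names are kept
sequences <- gsub(pattern="[[:space:]]", replacement="", x=sequences)

## extra check: only the 20 standard amino acid letters remain
valid <- grepl("^[ACDEFGHIKLMNPQRSTVWY]+$", sequences)
stopifnot(all(valid))
```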
Cleaving using Pepsin:
cleave(sequences, enzym="pepsin")
## $P01308
## [1] "MA" "L" "W" "MRLLP"
## [5] "LL" "A" "WGPDPAAA" "F"
## [9] "VNQH" "CGSH" "VEA" "Y"
## [13] "VCGERG" "FF" "YTPKTRREAED" "QVGQVE"
## [17] "GGGPGAGS" "LQP" "LA" "EGS"
## [21] "QKRGIVEQCCTSICS" "Q" "EN" "CN"
##
## $P61278
## [1] "ML" "SCRL" "QCA"
## [4] "L" "AA" "SIV"
## [7] "A" "GCVTGAPSDPRL" "RQ"
## [10] "FL" "QKS" "LAAAAGKQEL"
## [13] "AK" "Y" "AE"
## [16] "SEPNQTENDA" "LEPED" "SQAAEQDEMRL"
## [19] "EL" "QRSANSNPAMAPRERKAGCKN" "FF"
## [22] "W" "KT" "FTSC"
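Pepsin produces many very short fragments, so it can help to tabulate the peptides and filter them by length. The sketch below uses base R on a small list that mimics the structure returned by `cleave` for character input; it contains only the first few peptides from the output above, and the length threshold of 3 is an arbitrary choice for illustration.

```r
## first few peptides from the pepsin output above, as a named list
peptides <- list(
  P01308 = c("MA", "L", "W", "MRLLP"),
  P61278 = c("ML", "SCRL", "QCA")
)

## one row per peptide, with the protein it came from
tab <- data.frame(
  protein = rep(names(peptides), lengths(peptides)),
  peptide = unlist(peptides, use.names = FALSE),
  stringsAsFactors = FALSE
)
tab$length <- nchar(tab$peptide)

## keep peptides with at least 3 residues (arbitrary threshold)
longPeptides <- tab[tab$length >= 3, ]
```

Here `longPeptides` keeps "MRLLP", "SCRL" and "QCA" together with their protein accessions.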
A common use case of in-silico cleavage is the calculation of the isotopic distribution of peptides (which were enzymatically digested in the in-vitro experimental workflow). Here BRAIN (Claesen et al. 2012; Dittwald et al. 2013) is used to calculate the isotopic distribution of cleaver’s output. Please note that this is only a toy example; e.g. the relative intensities between peptides are not correct.
## load BRAIN library
library("BRAIN")
## cleave insulin
cleavedInsulin <- cleave(sequences[1], enzym="trypsin")[[1]]
## create empty plot area
plot(NA, xlim=c(150, 4300), ylim=c(0, 1),
     xlab="mass", ylab="relative intensity",
     main="tryptic digested insulin - isotopic distribution")
## loop through peptides
for (i in seq_along(cleavedInsulin)) {
  ## count C, H, N, O, S atoms in current peptide
  atoms <- BRAIN::getAtomsFromSeq(cleavedInsulin[[i]])
  ## calculate isotopic distribution
  d <- useBRAIN(atoms)
  ## draw peaks
  lines(d$masses, d$isoDistr, type="h", col=2)
}
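As a rough cross-check of where each peptide should appear on the mass axis, the monoisotopic mass of a neutral peptide can be computed as the sum of its residue masses plus one water. The sketch below does this in base R with the standard monoisotopic residue masses rounded to five decimals; it is independent of BRAIN, and `monoMass` is a hypothetical helper that only handles the 20 standard amino acids.

```r
## standard monoisotopic residue masses (Da), rounded to 5 decimals
residueMass <- c(
  G = 57.02146,  A = 71.03711,  S = 87.03203,  P = 97.05276,
  V = 99.06841,  T = 101.04768, C = 103.00919, L = 113.08406,
  I = 113.08406, N = 114.04293, D = 115.02694, Q = 128.05858,
  K = 128.09496, E = 129.04259, M = 131.04049, H = 137.05891,
  F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931
)
## monoisotopic mass of water (Da)
waterMass <- 18.01056

## hypothetical helper: residue masses plus one water
monoMass <- function(peptide) {
  aa <- strsplit(peptide, "", fixed = TRUE)[[1]]
  stopifnot(all(aa %in% names(residueMass)))
  sum(residueMass[aa]) + waterMass
}

## masses of the two tryptic peptides of gastric juice peptide 1
masses <- sapply(c("LAAGK", "VEDSD"), monoMass)
```

The first peak of an isotopic distribution should lie close to this value, which makes it a quick plausibility check for plots like the one above.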
Session information:
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 11 (bullseye)
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BRAIN_1.42.0 lattice_0.20-41 PolynomF_2.0-5
## [4] UniProt.ws_2.36.5 RSQLite_2.2.16 cleaver_1.34.0
## [7] Biostrings_2.64.1 GenomeInfoDb_1.32.3 XVector_0.36.0
## [10] IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0
## [13] BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.56.0 httr_1.4.4 httpcache_1.2.0
## [4] sass_0.4.2 bit64_4.0.5 jsonlite_1.8.0
## [7] bslib_0.4.0 shiny_1.7.2 assertthat_0.2.1
## [10] highr_0.9 BiocFileCache_2.4.0 BiocManager_1.30.18
## [13] blob_1.2.3 GenomeInfoDbData_1.2.8 yaml_2.3.5
## [16] progress_1.2.2 pillar_1.8.1 glue_1.6.2
## [19] digest_0.6.29 promises_1.2.0.1 htmltools_0.5.3
## [22] httpuv_1.6.5 pkgconfig_2.0.3 bookdown_0.28
## [25] zlibbioc_1.42.0 purrr_0.3.4 xtable_1.8-4
## [28] later_1.3.0 tibble_3.1.8 KEGGREST_1.36.3
## [31] generics_0.1.3 ellipsis_0.3.2 DT_0.24
## [34] cachem_1.0.6 cli_3.3.0 magrittr_2.0.3
## [37] crayon_1.5.1 mime_0.12 memoise_2.0.1
## [40] evaluate_0.16 fs_1.5.2 fansi_1.0.3
## [43] cellxgenedp_1.0.1 textshaping_0.3.6 tools_4.2.1
## [46] prettyunits_1.1.1 hms_1.1.2 lifecycle_1.0.1
## [49] stringr_1.4.1 AnnotationDbi_1.58.0 compiler_4.2.1
## [52] pkgdown_2.0.6 jquerylib_0.1.4 systemfonts_1.0.4
## [55] rlang_1.0.5 grid_4.2.1 RCurl_1.98-1.8
## [58] rappdirs_0.3.3 htmlwidgets_1.5.4 bitops_1.0-7
## [61] rmarkdown_2.16 DBI_1.1.3 curl_4.3.2
## [64] R6_2.5.1 knitr_1.40 dplyr_1.0.9
## [67] fastmap_1.1.0 bit_4.0.4 utf8_1.2.2
## [70] filelock_1.0.2 rprojroot_2.0.3 ragg_1.2.2
## [73] desc_1.4.1 stringi_1.7.8 parallel_4.2.1
## [76] Rcpp_1.0.9 png_0.1-7 vctrs_0.4.1
## [79] dbplyr_2.2.1 tidyselect_1.1.2 xfun_0.32