Computationally efficient methods for tandem mass spectrometry data analysis: dissertation and author's abstract, VAK RF speciality 00.00.00, Candidate of Sciences Frank Lawrence Nii Adoquaye Acquaye

  • Frank Lawrence Nii Adoquaye Acquaye
  • Candidate of Sciences
  • 2025, National Research University Higher School of Economics
  • VAK RF speciality: 00.00.00
  • Number of pages: 193
Acquaye, Frank Lawrence Nii Adoquaye. Computationally efficient methods for tandem mass spectrometry data analysis: Candidate of Sciences dissertation: 00.00.00 - Other specialities. National Research University Higher School of Economics. 2025. 193 pp.

Table of contents of the dissertation, Candidate of Sciences Frank Lawrence Nii Adoquaye Acquaye

Contents

Introduction

1 Background

1.1 Tooling

1.2 Open-Science and Open Source

1.3 Software implementation and availability

1.4 Data repositories

1.4.1 PRIDE

1.4.2 MassIVE

1.4.3 Peptide Atlas

2 Efficient indexing of peptides for database search using tide

2.1 Introduction

2.2 Definition of parameters controlling peptide index generation

2.3 The old version of tide-index

2.4 Faster tide-index

2.5 Experimental settings

2.5.1 Database search

2.5.2 False discovery rate control

2.6 Experimental results

2.6.1 Tide-index is fast and memory efficient

2.6.2 Parameter impact on database size

2.6.3 Investigating the trade-off between database size and statistical power

2.6.4 Tide can handle immunopeptidomics experiments

2.6.5 Microbiome data analysis

2.7 Discussion

3 Fast and memory efficient searching of large-scale mass spectrometry data using tide

3.1 Introduction

3.2 The old version of tide-search

3.3 Faster tide-search

3.3.1 Code optimization

3.3.2 Standardization of modifications and output format

3.4 Experimental settings

3.4.1 Database search

3.4.2 False discovery rate control

3.5 Experimental results

3.5.1 Tide-search's scoring speedups

3.5.2 Microbiome data analysis

3.5.3 Immunopeptidomics data analysis

3.5.4 Human proteome analysis

3.6 Discussion

4 Label-free quantification in the Crux toolkit

4.1 Introduction

4.2 Fast LFQ in crux toolkit

4.2.1 Quantification algorithm

4.3 Experimental Settings

4.3.1 Database search

4.3.2 Peptide detection and quantification

4.4 Experimental results

4.4.1 CruxLFQ results are consistent with existing LFQ tools

4.5 Discussion

Conclusion

Bibliography

List of abbreviations and definitions

Acknowledgments

List of figures

List of tables

List of algorithms

Author Contributions

Appendix: Russian Translation


Introduction of the dissertation (part of the author's abstract) on the topic "Computationally efficient methods for tandem mass spectrometry data analysis"

Introduction

This dissertation focuses on computationally efficient methods for identifying and quantifying peptides in tandem mass spectrometry data. In this context, computational efficiency means achieving the same results while minimising processing time and memory usage. Mass spectrometry has emerged as the de facto method for identifying molecules in various samples across several disciplines, including molecular biology, forensics, the pharmaceutical industry, medicine and general biomarker discovery [2, 32]. Mass spectrometers produce a series of spectra. A spectrum $s$ is a list of pairs $(m_i, I_i)$, where $m_i$ represents the peak location (e.g., mass-to-charge ratio in mass spectrometry) and $I_i$ denotes the corresponding peak intensity, as illustrated in Figure 1. The spectrum can be considered a fingerprint of a molecule yet to be identified. Peptide molecule identification is often done through database search, in which a spectrum $s$ is annotated with the best-scoring database reference $h$; formally:

$$s \mapsto h = \operatorname*{arg\,max}_{h_j \in CP(s)} \theta(s, h_j).$$

This involves: (1) a scoring function $\theta : S \times DB \to \mathbb{R}$, where a higher score indicates a better match, and (2) a database of peptides (DB).

The computational cost is $O(|S| \times |CP|)$, where $|S|$ and $|CP|$ denote the number of experimental spectra and the number of candidate peptides (CP) considered during the search, respectively.
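The database search described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Spectrum` and `Peptide` containers, the mass-window candidate filter and the toy binned-intensity score are hypothetical and are not the scoring function used by Tide or any other real search engine. The nested loop over spectra and candidates is what gives the cost proportional to the number of spectra times the number of candidates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Spectrum:
    precursor_mass: float                   # precursor mass (hypothetical units: Da)
    peaks: Tuple[Tuple[float, float], ...]  # pairs (peak location, intensity)


@dataclass(frozen=True)
class Peptide:
    sequence: str
    mass: float
    fragment_mzs: Tuple[float, ...]         # theoretical fragment locations


def candidate_peptides(s: Spectrum, db: List[Peptide], tol: float) -> List[Peptide]:
    """Candidate set for s: peptides whose mass lies within tol of the precursor mass."""
    return [p for p in db if abs(p.mass - s.precursor_mass) <= tol]


def score(s: Spectrum, p: Peptide, bin_width: float = 1.0) -> float:
    """Toy score (assumption): total intensity of observed peaks that fall
    into the same bin as some theoretical fragment of the peptide."""
    theoretical_bins = {int(mz / bin_width) for mz in p.fragment_mzs}
    return sum(i for mz, i in s.peaks if int(mz / bin_width) in theoretical_bins)


def search(spectra: List[Spectrum], db: List[Peptide],
           tol: float) -> List[Tuple[Spectrum, Optional[Peptide]]]:
    """Annotate every spectrum with its best-scoring candidate (argmax of the score)."""
    results = []
    for s in spectra:                       # |S| iterations
        candidates = candidate_peptides(s, db, tol)
        best = max(candidates, key=lambda h: score(s, h)) if candidates else None
        results.append((s, best))           # None when no candidate is in range
    return results
```

The linear scan in `candidate_peptides` is deliberately naive; it makes the role of the candidate set explicit but would not scale to large databases.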

Mass spectrometers often produce terabytes, if not petabytes, of spectrum data [20], and the peptide database can also contain tens to hundreds of billions of peptides. Analysing data at this scale requires computationally efficient programs.

Figure 1: Experimental spectrum. An illustration of an experimental spectrum generated by a mass spectrometer.

Label-free quantification (LFQ) is a method used to measure the relative amounts of annotated peptides in biological samples without chemical labelling. LFQ operates at the MS1 level, detecting peptide signals, tracking them over time, and integrating the total ion signals for quantification. The current state of the art in LFQ has been shown to be slow and computationally inefficient [3, 42], hence the need for faster tools for performing LFQ tasks.
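The three LFQ steps named above (detect the signal, track it over time, integrate it) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the CruxLFQ or FlashLFQ algorithm: the scan layout, the function names and the fixed ppm tolerance are hypothetical, and real tools also handle isotopic envelopes, charge states and peak boundaries.

```python
from typing import List, Tuple

# One MS1 scan (assumed layout): (retention time, [(m/z, intensity), ...])
Scan = Tuple[float, List[Tuple[float, float]]]


def extract_ion_chromatogram(scans: List[Scan], target_mz: float,
                             ppm: float = 10.0) -> List[Tuple[float, float]]:
    """Track the signal of one peptide ion over time: for each scan, sum the
    intensity of peaks within +/- ppm of the target m/z."""
    tol = target_mz * ppm * 1e-6
    return [
        (rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
        for rt, peaks in scans
    ]


def integrate(chromatogram: List[Tuple[float, float]]) -> float:
    """Integrate intensity over retention time with the trapezoidal rule."""
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(chromatogram, chromatogram[1:]):
        area += 0.5 * (i0 + i1) * (rt1 - rt0)
    return area
```

The integrated area serves as the relative abundance of the peptide in one run; comparing areas for the same peptide across runs gives the label-free relative quantification.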

Indexing in computer science involves creating data structures that facilitate quick information retrieval from large datasets. This well-researched problem served as a key element for this thesis.
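As a generic illustration of indexing, the following Python sketch keeps peptides sorted by mass so that all peptides within a mass window can be retrieved by binary search instead of a full scan. It is an assumption-laden toy: the `MassIndex` class and its methods are hypothetical and do not describe how tide-index stores peptides on disk or in memory.

```python
import bisect
from typing import Iterable, List, Tuple


class MassIndex:
    """In-memory index of (mass, sequence) pairs, sorted by mass."""

    def __init__(self, peptides: Iterable[Tuple[float, str]]) -> None:
        self._entries = sorted(peptides)              # one-off O(n log n) build
        self._masses = [mass for mass, _ in self._entries]

    def query(self, mass: float, tol: float) -> List[str]:
        """Return sequences whose mass lies in [mass - tol, mass + tol].
        The two binary searches cost O(log n); the slice costs O(k) for k hits."""
        lo = bisect.bisect_left(self._masses, mass - tol)
        hi = bisect.bisect_right(self._masses, mass + tol)
        return [seq for _, seq in self._entries[lo:hi]]
```

The design choice is the usual one for indexes: pay a sorting cost once when the index is built so that each later lookup touches only the entries it returns.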

As mass spectrometry increasingly becomes a big data problem—particularly in proteomics [ ]—there is a growing need for tools that can process data efficiently in terms of time and memory. This thesis addresses that need by developing and evaluating computationally efficient methods tailored to tandem mass spectrometry data analysis.

Motivation for research

Although peptide identification and quantification is a highly researched problem [16, 41, 18], most tools are computationally inefficient, even for small datasets. In addition, the rise of platforms such as the PRIDE database, which is maintained by the European Bioinformatics Institute (EMBL-EBI) and serves as a public repository for proteomics data obtained through mass spectrometry, highlights the need for effective data processing software. The PRIDE database accepts around 500 new datasets monthly [46] and aligns with FAIR (Findable, Accessible, Interoperable, and Reusable) data principles for data submission, access, and reuse. Consequently, there is a pressing need for computationally efficient tools to manage and analyse these enormous amounts of data. For this reason, developing fast and memory-efficient tools is crucial for advancing the mass spectrometry field and harnessing the generated data's full potential.

Additionally, clinicians, specifically doctors, frequently encounter challenges due to limited access to high-performance computing resources [48]. For instance, an average doctor in a hospital typically works with a standard computer and lacks access to bioinformaticians who possess the technical expertise to process large datasets, such as through cloud computing. This deficiency in computational power hinders their ability to analyze significant amounts of data effectively. Therefore, there is an urgent need for tools that can operate efficiently on standard computers while managing large datasets, especially in clinical settings.

Moreover, the arrival of the Astral Mass Spectrometer marks a significant advancement in data generation compared to traditional Orbitrap devices [23]. These newer devices produce more comprehensive spectrum data, underscoring the need for reliable and accurate tools to process data from various spectrometers and experimental protocols.

In summary, accessible data repositories, the need for computational efficiency in clinical settings, and advancements in mass spectrometry technology (particularly the requirement to process Astral Mass Spectrometry data) are key factors driving ongoing research to enhance analytical capabilities in this field. Timely in-silico annotation will reduce the time required to draw conclusions from experiments, ultimately supporting clinical decision-making.

Aims and objectives of research

Crux-toolkit is an open-source project designed to provide users with a suite of tools for analysing tandem mass spectrometry data. Crux includes several programs, such as tide-index, tide-search, comet, percolator and kojak. The primary goal of this research was to improve the computational efficiency of the crux-toolkit, namely:

1. Improve the tide-index program in crux-toolkit so that it can handle billions of peptides.

2. Improve the tide-search program in crux-toolkit to process terabytes of tandem mass spectrometry data.

3. Add fast and efficient LFQ functionality to the crux-toolkit for peptide quantification.

Novelty and summary of the author's main results

The main results of this thesis are improvements to the crux-toolkit that enable practitioners to process terabytes of data more quickly on standard computers, without the need for HPC clusters, expensive cloud computing technologies, or trained bioinformaticians or IT technicians.

Publications

This dissertation consists of four articles, as outlined in the List of publications section. Two of these articles have been published in Q1 or A-category journals according to the rankings provided by HSE University's Scientometrics Centre, based on data from Scopus and the Web of Science. The other two articles are currently under review. I am the primary author of two of these articles. Additionally, I have one other publication, listed under Other publications. My contributions are listed in Appendix A.

List of publications

1. Attila Kertesz-Farkas, Frank Lawrence Nii Adoquaye Acquaye, Kishankumar Bhimani, Jimmy K Eng, William E Fondrie, Charles Grant, Michael R Hoopmann, Andy Lin, Yang Y Lu, Robert L Moritz, Michael J MacCoss, William Stafford Noble (2023). The Crux toolkit for analysis of bottom-up tandem mass spectrometry proteomics data. Journal of Proteome Research. https://pubs.acs.org/doi/10.1021/acs.jproteome.2c00615

2. Frank Lawrence Nii Adoquaye Acquaye, Attila Kertesz-Farkas, William Stafford Noble (2023). Efficient Indexing of Peptides for Database Search Using Tide. Journal of Proteome Research. https://pubs.acs.org/doi/10.1021/acs.jproteome.2c00617

3. Attila Kertesz-Farkas, Frank Lawrence Nii Adoquaye Acquaye, Vladislav Ostapenko, Rufino Haroldo Locon, Yang Lu, Charles E. Grant, William Stafford Noble (2025). Fast and memory efficient searching of large-scale mass spectrometry data using Tide. bioRxiv. https://www.biorxiv.org/content/10.1101/2025.04.01.646675v1

4. Frank Lawrence Nii Adoquaye Acquaye, Bo Wen, Charles E. Grant, William Stafford Noble, Attila Kertesz-Farkas (2025). Label-free quantification in the Crux toolkit. bioRxiv. https://www.biorxiv.org/content/10.1101/2025.04.21.649897v1

Other publications

1. Frank Lawrence Nii Adoquaye Acquaye, Latypov I., Attila Kertesz-Farkas. Hypernym Information and Sentiment Bias Probing in Distributed Data Representation, in: ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing. NY: Association for Computing Machinery (ACM). https://dl.acm.org/doi/10.1145/3587716.3587753

The organisation of the thesis

This thesis is organised as follows. Chapter 1 covers the tools and data repositories related to computational tandem mass spectrometry. Chapter 2 focuses on optimising tide-index's memory and time consumption. Building on these speedups, Chapter 3 details the improvements made to the tide-search tool to handle terabytes of tandem mass spectrometry data. Chapter 4 discusses the implementation of a fast quantification algorithm in the crux-toolkit. Finally, the Conclusion summarises and concludes this thesis.


Conclusion of the dissertation on the topic "Other specialities", Frank Lawrence Nii Adoquaye Acquaye

Conclusion

Despite the improvements presented in this dissertation, we often had to make design trade-offs, for example between CPU and RAM usage, or between code efficiency and readability/maintainability. CruxLFQ is one example. Figure 17 shows that CruxLFQ consumes less memory (RAM) than FlashLFQ, but CruxLFQ scales poorly as the data size grows. Nevertheless, overall it is still faster and more memory-efficient than FlashLFQ.

Another example is the assembly code in Tide. Originally, tide-search dynamically generated assembly code from the theoretical peptides during spectrum-peptide scoring. This assembly code was stored on the heap in RAM as executable code. This made tide-search 20% faster than an implementation of spectrum-peptide scoring in pure C++. Unfortunately, the assembly code had several drawbacks. First, it made code maintenance more laborious and error-prone, because few programmers are familiar with assembly. Second, it ran on Intel processors but did not compile on Apple M1 processors. Third, it did not work in a multithreaded environment. In the end, we sacrificed some speed in favour of multithreading support, better maintainability and cross-platform support, and rewrote spectrum-peptide scoring in pure C++.

Finally, we investigated, in a laboratory setting, the benefits of implementing spectrum-peptide scoring in tide-search on a GPU. However, we found that, because of data transfer between the CPU and the GPU, the improvement over the multithreaded solution was insignificant. It would also have complicated the code base. We therefore decided not to pursue this implementation.

In conclusion, the growing complexity and volume of mass spectrometry data, especially in proteomics, underscore the pressing need for computationally efficient tools to manage and process this information effectively. This dissertation identified key areas where computational efficiency can be improved, chiefly through better indexing methods and resource optimisation. By addressing these problems, we aim to close existing gaps in mass spectrometry data processing, which will ultimately contribute to the development of proteomics as a whole. The improvements proposed for tide-index increase the performance of Crux's search capabilities and pave the way for more scalable and efficient approaches to data analysis in scientific research. This work lays the foundation for future innovations and underscores the importance of computational efficiency in solving big-data problems in proteomics and other fields.

Reference list of the dissertation research, Candidate of Sciences Frank Lawrence Nii Adoquaye Acquaye, 2025

References

[1] Frank Lawrence Nii Adoquaye Acquaye, Attila Kertesz-Farkas, and William Stafford Noble. Efficient indexing of peptides for database search using tide. Journal of proteome research, 22(2):577-584, 2023.

[2] Ruedi Aebersold and Matthias Mann. Mass spectrometry-based proteomics. Nature, 422(6928):198, 2003.

[3] Constantin Ammar, Julia Patricia Schessner, Sander Willems, Andre C. Michaelis, and Matthias Mann. Accurate label-free quantification by directlfq to compare unlimited numbers of proteomes. Molecular & Cellular Proteomics : MCP, 22, 2023.

[4] Andrea Argentini, Ludger JE Goeminne, Kenneth Verheggen, Niels Hulstaert, An Staes, Lieven Clement, and Lennart Martens. moFF: a robust and automated approach to extract peptide ion intensities. Nature Methods, 13(12):964-966, 2016.

[5] Michal Bassani-Sternberg, Eva Braunlein, Richard Klar, Thomas Engleitner, Pavel Sinitcyn, Stefan Audehm, Melanie Straub, Julia Weber, Julia Slotta-Huspenina, Katja Specht, et al. Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nature communications, 7(1):13404, 2016.

[6] Jeremiah J. Bowers, Jian Liu, Harsha P. Gunawardena, and Scott A. McLuckey. Protein identification via ion-trap collision-induced dissociation and examination of low-mass product ions. Journal of Mass Spectrometry, 43(1):23-34, 2007.

[7] Hao Chi, Kun He, Bing Yang, Zhen Chen, Ruixiang Sun, Sheng bo Fan, Kun Zhang, Chao Liu, Zuo-Fei Yuan, Quan Wang, Si-Qi Liu, Meng qiu Dong, and Si-Min He. pfind-alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution ms/ms data. Journal of proteomics, 125:89-97, 2015.

[8] Hao Chi, Chao Liu, Hao Yang, Wen feng Zeng, Long Wu, Wen-Jing Zhou, Rui-Min Wang, X. Y. Niu, Yue-He Ding, Yao Zhang, Zhao-Wei Wang, Zhen-Lin Chen, Ruixiang Sun, T. Liu, Guang-Ming Tan, Meng qiu Dong, Ping Xu, Pei Zhang, and Si-Min He. Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine. Nature Biotechnology, 36:1059-1061, 2018.

[9] Jurgen Cox and Matthias Mann. Maxquant enables high peptide identification rates, individualized ppb-range mass accuracies and proteome-wide protein quantification. Nature biotechnology, 26(12):1367-1372, 2008.

[10] Jurgen Cox, Nadin Neuhauser, Annette Michalski, Richard A. Scheltema, Jesper V. Olsen, and Matthias Mann. Andromeda: a peptide search engine integrated into the maxquant environment. Journal of proteome research, 10(4):1794-1805, 2011.

[11] Frank Desiere, Eric W. Deutsch, Nichole L. King, Alexey I. Nesvizhskii, Parag Mallick, Jimmy K. Eng, Sharon Chen, James Eddes, Sandra N. Loevenich, and Ruedi Aebersold. The peptideatlas project. Nucleic Acids Research, 34:D655-D658, 2005.

[12] Benjamin Diament and William Stafford Noble. Faster sequest searching for peptide identification from tandem mass spectra. Journal of proteome research, 10(9):3871-3879, 2011.

[13] Viktoria Dorfer, Peter Pichler, Thomas Stranzl, Johannes Stadlmann, Thomas Taus, Stephan Winkler, and Karl Mechtler. Ms amanda, a universal identification algorithm optimized for high accuracy tandem mass spectra. Journal of Proteome Research, 13(8):3679-3684, 2014.

[14] J. E. Elias and S. P. Gygi. Target-decoy search strategy for mass spectrometry-based proteomics. Methods in Molecular Biology, 604:55-71, 2010.

[15] Jimmy K Eng, Tahmina A Jahan, and Michael R Hoopmann. Comet: an open-source ms/ms sequence database search tool. Proteomics, 13(1):22-24, 2013.

[16] Jimmy K. Eng, Ashley L. McCormack, and John R. Yates. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry, 5:976-989, 1994.

[17] David Fenyo and Ronald C Beavis. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Analytical chemistry, 75(4):768-774, 2003.

[18] Lewis Y. Geer, Sanford P. Markey, Jeffrey A. Kowalak, Lukas Wagner, Ming Jing Xu, Dawn M. Maynard, Xiaoyu Yang, Wenyao Shi, and Stephen H. Bryant. Open mass spectrometry search algorithm. Journal of proteome research, 3(5):958-964, 2004.

[19] Johannes Griss, Andrew R. Jones, Timo Sachsenberg, Mathias Walzer, Laurent Gatto, Jürgen Hartler, Gerhard G. Thallinger, Reza M. Salek, Christoph Steinbeck, Nadin Neuhauser, Jürgen Cox, Steffen Neumann, Jun Fan, Florian Reisinger, Qing-Wei Xu, N del Toro, Yasset Perez-Riverol, Fawaz Ghali, Nuno Bandeira, Ioannis Xenarios, Oliver Kohlbacher, Juan Antonio Vizcaino, and Henning Hermjakob. The mzTab data exchange format: Communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Molecular & Cellular Proteomics, 13:2765-2775, 2014.

[20] Jian Guo, Huaxu Yu, Shipei Xing, and Tao Huan. Addressing big data challenges in mass spectrometry-based metabolomics. Chemical communications, 2022.

[21] Kun He, Mengjie Li, Yan Fu, Fuzhou Gong, and Xiaoming Sun. A direct approach to false discovery rates by decoy permutations. arXiv: Methodology, 2018.

[22] Benjamin J. Heil, Michael M. Hoffman, Florian Markowetz, Su-In Lee, Casey S. Greene, and Stephanie C. Hicks. Reproducibility standards for machine learning in the life sciences. Nature Methods, 18:1132 - 1135, 2021.

[23] Lilian R. Heil, Eugen Damoc, Tabiwang N. Arrey, Anna Pashkova, Eduard Denisov, Johannes Petzoldt, Amelia C. Peterson, Chris Hsu, Brian C. Searle, Nicholas Shulman, Michael Riffle, Brian Connolly, Brendan X. MacLean, Philip M. Remes, Michael W Senko, Hamish I Stewart, Christian Hock, Alexander A. Makarov, Daniel Hermanson, Vlad Zabrouskov, Christine C. Wu, and Michael J. MacCoss. Evaluating the performance of the astral mass analyzer for quantitative proteomics using data-independent acquisition. Journal of Proteome Research, 22:3290 - 3300, 2023.

[24] J. J. Howbert and W. S. Noble. Computing exact p-values for a cross-correlation shotgun proteomics score function. Molecular and Cellular Proteomics, 13(9):2467-2479, 2014.

[25] Jan W Huebbers, Kim Büttgen, Franz Leissing, Melissa Mantz, Markus Pauly, Pitter F Huesgen, and Ralph Panstruga. An advanced method for the release, enrichment and purification of high-quality Arabidopsis thaliana rosette leaf trichomes enables profound insights into the trichome proteome. Plant Methods, 18(1):1-23, 2022.

[26] Philip Jones, Richard G. Cote, Lennart Martens, Antony F. Quinn, Chris F. Taylor, William Derache, Henning Hermjakob, and Rolf Apweiler. Pride: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Research, 34:D659 - D663, 2005.

[27] Lukas Käll, Jesse D. Canterbury, Jason Weston, William Stafford Noble, and Michael J. MacCoss. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nature Methods, 4:923-925, 2007.

[28] Mathias Kalxdorf, Torsten Müller, Oliver Stegle, and Jeroen Krijgsveld. IceR improves proteome coverage and data completeness in global and single-cell proteomics. Nature Communications, 12(1):4787, 2021.

[29] Claire Kamaliddin, Emilie Guillochon, Virginie Salnot, David Rombaut, Stephanie Huguet, François Guillonneau, Sandrine Houze, Michel Cot, Philippe Deloron, Nicolas Argy, et al. Comprehensive analysis of transcript and protein relative abundance during blood stages of Plasmodium falciparum infection. Journal of Proteome Research, 20(2):1206-1216, 2021.

[30] Seo Young Kang, Eun Ji Lee, Jung Woo Byun, Dohyun Han, Yoori Choi, Do Won Hwang, and Dong Soo Lee. Extracellular vesicles induce an aggressive phenotype in luminal breast cancer cells via pkm2 phosphorylation. Frontiers in Oncology, 11, 2021.

[31] Attila Kertesz-Farkas, Frank Lawrence Nii Adoquaye Acquaye, Vladislav Ostapenko, Rufino Haroldo Locon, Yang Young Lu, Charles E. Grant, and William Stafford Noble. Fast and memory efficient searching of large-scale mass spectrometry data using tide. bioRxiv, 2025.

[32] Attila Kertesz-Farkas, Beata Reiz, Michael P Myers, and Sandor Pongor. Database searching in mass spectrometry based proteomics. Current Bioinformatics, 7(2):221-230, 2012.

[33] D. Kessner, M. Chambers, R. Burke, D. Agnus, and P. Mallick. Proteowizard: open source software for rapid proteomics tools development. Bioinformatics, 24(21):2534-2536, 2008.

[34] A. T. Kong, F. V. Leprevost, D. M. Avtonomov, D. Mellacheruvu, and A. I. Nesvizhskii. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nature Methods, 14(5):513-520, 2017.

[35] Andy T. Kong, Felipe da Veiga Leprevost, Dmitry M. Avtonomov, Dattatreya Mellacheruvu, and Alexey I. Nesvizhskii. Msfragger: ultrafast and comprehensive peptide identification in shotgun proteomics. Nature methods, 14:513 - 520, 2017.

[36] PS Kostenetskiy, RA Chulkevich, and VI Kozyrev. Hpc resources of the higher school of economics. In Journal of Physics: Conference Series, volume 1740, page 012050. IOP Publishing, 2021.

[37] M. R. Lazear. Sage: An open-source tool for fast proteomics searching and quantification at scale. Journal of Proteome Research, 22(11):3652-3659, 2023.

[38] A. Lin, J. J. Howbert, and W. S. Noble. Combining high-resolution and exact calibration to boost statistical power: A well-calibrated score function for high-resolution MS2 data. Journal of Proteome Research, 17:3644-3656, 2018.

[39] Andy Lin, Temana Short, William Stafford Noble, and Uri Keich. Improving peptide-level mass spectrometry analysis via double competition. Journal of proteome research, 2022.

[40] Lennart Martens, Henning Hermjakob, Philip Jones, Marcin Adamski, Chris F. Taylor, David J. States, Kris Gevaert, Joel Vandekerckhove, and Rolf Apweiler. Pride: The proteomics identifications database. PROTEOMICS, 5, 2005.

[41] Sean McIlwain, Kaipo Tamura, Attila Kertesz-Farkas, Charles E Grant, Benjamin Diament, Barbara Frewen, J Jeffry Howbert, Michael R Hoopmann, Lukas Käll, Jimmy K Eng, et al. Crux: rapid open source protein tandem mass spectrometry analysis. Journal of Proteome Research, 13(10):4488-4491, 2014.

[42] R. Millikin, S. Solntsev, M. Shortreed, and L. Smith. Ultrafast peptide label-free quantification with FlashLFQ. Journal of Proteome Research, 17:386-391, 2018.

[43] M. C. Mudge, M. Riffle, G. Chebli, D. Plubell, T. Rynearson, W. S. Noble, E. Timmins-Schiffman, J. Kubanek, and B L. Nunn. Harmful algal blooms preceded by a predictable and quantifiable shift in the oceanic microbiome. Nature Communications, 2024. In press.

[44] Johannes B. Müller, Philipp E. Geyer, Ana R. Colaço, Peter V. Treit, Maximilian T. Strauss, Mario Oroshi, Sophia Doll, Sebastian Virreira Winter, Jakob Maximilian Bader, Niklas D. Köhler, Fabian J Theis, Alberto Santos, and Matthias Mann. The proteome landscape of the kingdoms of life. Nature, 582:592-596, 2020.

[45] Christopher Y Park, Aaron A. Klammer, Lukas Käll, Michael J. MacCoss, and William Stafford Noble. Rapid and accurate peptide identification from tandem mass spectra. Journal of proteome research, 7(7):3022-3027, 2008.

[46] Yasset Perez-Riverol, Jingwen Bai, Chakradhar Bandla, David Garcia-Seisdedos, Suresh Hewapathirana, Selvakumar Kamatchinathan, Deepti Jaiswal Kundu, Ananth Prakash, Anika Frericks-Zipper, Martin Eisenacher, Mathias Walzer, Shengbo Wang, Alvis Brazma, and Juan Antonio Vizcaino. The pride database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Research, 50:D543 - D552, 2021.

[47] David N Perkins, Darryl JC Pappin, David M Creasy, and John S Cottrell. Probability-based protein identification by searching sequence databases using mass spectrometry data. ELECTROPHORESIS: An International Journal, 20(18):3551-3567, 1999.

[48] Julia Rechenberger, Patroklos Samaras, Anna Jarzab, Juergen Behr, Martin Frejno, Ana Djukovic, Jaime Sanz, Eva M. Gonzalez-Barbera, Miguel Salavert, Jose Luis Lopez-Hontangas, Karina B. Xavier, Laurent Debrauwer, Jean Marc Rolain, Miguel Angel Sanz, Marc Garcia-Garcera, Mathias Wilhelm, Carles Ubeda, and Bernhard Kuster. Challenges in clinical metaproteomics highlighted by the analysis of acute leukemia patients with gut colonization by multidrug-resistant enterobacteriaceae. Proteomes, 7, 2019.

[49] Siranush Sarkizova, Susan Klaeger, Phuong M Le, Letitia W Li, Giacomo Oliveira, Hasmik Keshishian, Christina R Hartigan, Wandi Zhang, David A Braun, Keith L Ligon, et al. A large peptidome dataset improves hla class i epitope prediction across most of the human population. Nature biotechnology, 38(2):199-209, 2020.

[50] Mikhail M. Savitski, Toby Mathieson, Nico Zinn, Gavain M A Sweetman, Carola Doce, Isabelle Becher, Fiona Pachl, Bernhard Kuster, and Marcus Bantscheff. Measuring and managing ratio compression for accurate itraq/tmt quantification. Journal of proteome research, 12(8):3586-3598, 2013.

[51] B. C. Searle, L. K. Pino, J. D. Egertson, Y. S. Tin, R. T. Lawrence, B. X. MacLean, J. Villén, and M. J. MacCoss. Chromatogram libraries improve peptide detection and quantification by data independent acquisition mass spectrometry. Nature Communications, 9:5128, 2018.

[52] Xiaomeng Shen, Shichen Shen, Jun Li, Qiang Hu, Lei Nie, Chengjian Tu, Xue Wang, Benjamin Orsburn, Jianmin Wang, and Jun Qu. An ionstar experimental strategy for ms1 ion current-based quantification using ultrahigh-field orbitrap: reproducible, in-depth, and accurate protein measurement in large cohorts. Journal of proteome research, 16(7):2445-2456, 2017.

[53] Lauren E. Stopfer, Joshua M. Mesfin, Brian A. Joughin, Douglas A. Lauffenburger, and Forest M. White. Multiplexed relative and absolute quantitative immunopeptidomics reveals mhc i repertoire alterations induced by cdk4/6 inhibition. Nature Communications, 11, 2020.

[54] P. Sulimov and A. Kertesz-Farkas. Tailor: A nonparametric and rapid score calibration method for database search-based peptide identification in shotgun proteomics. Journal of Proteome Research, 19(4):1481-1490, 2020.

[55] Wilfred H. Tang, Benjamin Halpern, Ignat V. Shilov, Sean L. Seymour, Sean P. Keating, Alexander V. Loboda, Alpesh A. Patel, Daniel A. Schaeffer, and Lydia M. Nuwaysir. Discovering known and unanticipated protein modifications using ms/ms database searching. Analytical chemistry, 77(13):3931-3946, 2005.

[56] Stefka Tyanova, Tikira Temu, and Juergen Cox. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nature Protocols, 11(12):2301-2319, 2016.

[57] UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Research, page gku989, 2014.

[58] K. Verheggen, H. Raeder, F. S. Berven, L. Martens, H. Barsnes, and M. Vaudel. Anatomy and evolution of database search engines—a central component of mass spectrometry based proteomic workflows. Mass Spectrometry Reviews, 2017. Epub ahead of print.

[59] Craig D Wenger and Joshua J Coon. A proteomics search algorithm specifically designed for high-resolution tandem mass spectra. Journal of proteome research, 12(3):1377-1386, 2013.

[60] Lei Xin, Rui Qiao, Xin Chen, Hieu Tran, Shengying Pan, Sahar Rabinoviz, Haibo Bian, Xianliang He, Brenton Morse, Baozhen Shan, et al. A streamlined platform for analyzing tera-scale DDA and DIA mass spectrometry data enables highly sensitive immunopeptidomics. Nature Communications, 13(1):3108, 2022.

[61] Fengchao Yu, Sarah E Haynes, Guo Ci Teo, Dmitry M Avtonomov, Daniel A Polasky, and Alexey I Nesvizhskii. Fast quantitative analysis of timsTOF PASEF data with MSFragger and IonQuant. Molecular & Cellular Proteomics, 19(9):1575-1585, 2020.

[62] Yuan yuan Gao, Lingyan Ping, Duc M. Duong, Chengpu Zhang, Eric B. Dammer, Yanchang Li, Peiru Chen, Lei Chang, Huiying Gao, Junzhu Wu, and Ping Xu. Mass-spectrometry-based near-complete draft of the saccharomyces cerevisiae proteome. bioRxiv, 2020.
