Protein Informatics glossary & taxonomy

Evolving Terminology for Emerging Technologies
Comments? Revisions? Suggestions?
Mary Chitty MSLS mchitty@healthtech.com
Last revised January 03, 2020

SCOPE NOTE: Protein informatics is a newer name for an already existing discipline. It encompasses the techniques used in bioinformatics and molecular modeling that are related to proteins. While bioinformatics is mainly concerned with the collection, organization, and analysis of biological data, molecular modeling is devoted to representation and manipulation of the structure of proteins. Karl-Heinz Zimmermann, An Introduction to Protein Informatics, Springer, 2003 https://www.springer.com/us/book/9781402075780

Drug discovery term index   Drug targets   Molecular Diagnostics
Informatics: Algorithms   Bioinformatics   Cheminformatics   Drug discovery Informatics   Genomic Informatics   Ontologies & Taxonomies
Technologies: Protein Technologies   Mass spectrometry   NMR & X-Ray Crystallography   Metabolic engineering
Biology: Protein Structures   Proteins   Functional Genomics   Proteomics

ab initio: From the beginning (Latin).

ab initio protein modeling: Predict 3D structure from sequence without using a homologous model/template; this technology is not at the stage of being broadly applicable to drug discovery. CHI Structural proteomics report

Ab initio methods use the physicochemical properties of the amino acid sequence of a protein to literally calculate a 3D structure (lowest energy model) based on protein folding. As opposed to determining the structure of an entire protein, ab initio methods are typically used to predict and model protein folds (domains). This method is gaining considerably, in part due to the development of novel mathematical approaches, a boost in available computational resources (for example, tera- and petaFLOPS supercomputers), and considerable interest from researchers investigating protein-ligand (or drug) interactions. Christopher Smith "Bioinformatics, Genomics, and Proteomics" Scientist 14[23]:26, Nov. 27, 2000 Related terms: protein structure prediction

ab initio protein structure prediction: Prediction of a protein’s structure based on amino acid sequence alone — that is, without mapping the structure to structures of known sequences.
Broader term: protein structure prediction

ab initio quantum mechanical methods: Methods of quantum mechanical calculations independent of any experiment other than the determination of fundamental constants. The methods are based on the use of the full Schrödinger equation to treat all the electrons of a chemical system. In practice, approximations are necessary to restrict the complexity of the electronic wave function and to make its calculation possible. (Synonymous with non-empirical quantum mechanical methods.) IUPAC Computational

ab initio quantum mechanical modeling: The application of ab initio modelling across diverse fields such as condensed matter physics, materials science and chemistry has been demonstrated over the past 10 years. ... The recent completion of the Human Genome Project will offer an unprecedented number of protein receptors and enzymes as targets for pharmacological intervention in disease processes. However, before this wealth of information can be used to develop pharmaceuticals, an understanding of the biochemistry of the newly identified proteins and their interactions must be obtained. First principles quantum mechanical modelling will play an important role in this process. [Matthew Segall, Ursula Röthlisberger, Paolo Carloni, CECAM/Psi-k Workshop: Ab Initio Modelling in the Biological Sciences, Lyon, France, 11-13 June 2001] http://www.tcm.phy.cam.ac.uk/~mds21/Workshop2001/Scientific/node1.html#SECTION00010000000000000000

annotation protein - dictionary-driven: For many years, computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from sequence have been the focus of numerous research groups, including ours. By general admission, this is a difficult problem and the methods that have been proposed over the years typically concentrated on the analysis of individual genes. With the advent of advanced sequencing methods and systems, the number of amino acid sequences and fragments being deposited in the public databases has been increasing steadily. This in turn generated a renewed demand for automated approaches that can quickly, exhaustively and objectively annotate individual sequences as well as complete genomes. In this paper, we present one such approach. The approach is centered around and exploits the Bio-Dictionary, an exhaustive collection of amino acid patterns (referred to as seqlets) that completely covers the natural sequence space of proteins to the extent that this space is sampled by the currently available public databases. Isidore Rigoutsos, Tien Huynh, Laxmi P. Parida, Daniel E. Platt, Aris Floratos, Dictionary Driven Protein Annotation, Nucleic Acids Research, 30 (no 17) 3901-3916, 2002

CASP: Critical Assessment of Techniques for Protein Structure Prediction. Protein Structure Prediction Center http://predictioncenter.org/ links to CASP meeting results. Wikipedia https://en.wikipedia.org/wiki/CASP

comparative modeling: See homology modeling

comparative proteomics: The C. elegans proteome was used as an alignment template to assist in novel human gene identification … Among the available 18,452 C. elegans protein sequences, our results indicate that at least 83% had human homologous genes, with 7954 records of C. elegans proteins matching known human gene transcripts. [CH Lai et al "Identification of Novel Human Genes Evolutionarily Conserved in Caenorhabditis elegans by Comparative Proteomics" Genome Research 10(5): 703-713 May 2000]
Related terms: Functional Genomics comparative genomics, evolutionary genomics.

computational biophysics: Activities of the Theoretical and Computational Biophysics Group center on the structure and function of supramolecular systems in the living cell, and on the development of new algorithms and efficient computing tools for structural biology. The Resource brings the most advanced molecular modeling, bioinformatics, and computational technologies to bear on questions of biomedical relevance. Theoretical and Computational Biophysics Group, Univ. of Illinois Urbana Champaign, About the Group http://www.ks.uiuc.edu/Overview/intro.html

Our research focuses on the modeling of large macromolecular systems in realistic environments. These efforts have produced insight into biomolecular processes coupled to mechanical force, bioelectronic processes in metabolism and vision, and the function and mechanism of membrane proteins. Theoretical and Computational Biophysics Group, Univ. of Illinois Urbana Champaign, Emerging Studies, http://www.ks.uiuc.edu/Research/Recent/

contextual data: While proteomic studies initially focused largely on expression and protein identification, progress in these areas drove the demand for more detailed types of proteomic data. Now researchers want information about where specific proteins are expressed, both in terms of tissues and localization within the cell. Information relating proteins to function requires additional details of post-translational modification, and studies of protein interactions have moved beyond just looking at binary interactions to studies of protein complexes. For both genomics and proteomics, this shift can be characterized as an interest in more contextual data. Enhanced insight into biological context is essential for obtaining a better understanding of how biology actually works, and thus there is now an emphasis to move from genomic and proteomic snapshots to time series data of expression. Such context is of particular value if biological studies are to be translated into medical advances, because of the importance of being able to predict the impact of potential treatments. The integration of genomic and proteomic data with medical conditions, treatment and outcomes becomes another critical type of contextual information. Christina Lingham, Beyond Genome: Thinking Globally, Cambridge Healthtech

docking: Computational simulation of a candidate ligand binding to a receptor. Wikipedia docking glossary accessed 2018 Aug 26 https://en.wikipedia.org/wiki/Docking_(molecular) Narrower term: pharmacophore based docking

docking studies: Computational techniques for the exploration of the possible binding modes of a substrate to a given receptor, enzyme or other binding site. IUPAC Computational Related terms: drug design, QSAR

domain shuffling: Creating new proteins by bringing domains together. It is thought that this is a major way that new proteins have arisen during evolution. Thus, mining of databases for homology by domains, rather than by whole proteins (which are not as evolutionarily conserved), is important in obtaining clues to functionality.

A protein sequence can have more than one domain. Related term: multi-domain proteins.

energy function: Computationally, a shape is assigned to a protein sequence based on an empirical energy function. The lower the energy of a given structure, the more likely it is to be the correct fold. The structure prediction challenge is therefore divided into two: (1) The first challenge is the creation of many plausible folds or a set of structures that will include the native shape. The creation of the appropriate set depends on existing databases (such as the Protein Data Bank) or on the design of automated algorithms (using physical or statistical information) to generate plausible folds. Once the set is available, a selection procedure is used to "fish" out the correct fold. (2) The "fishing" of the plausible native shapes critically depends on the quality of the energy function. The value of the energy function must be the lowest for the native structure. Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign
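
The two-step procedure described above (build a set of candidate folds, then "fish" out the one with the lowest energy) can be sketched in Python. This is only a toy illustration under stated assumptions, not the method of the cited workshop: the burial-string representation of a fold, the `toy_energy` function, and the candidate fold names are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Assumption: a crude two-class alphabet; real energy functions are far richer.
HYDROPHOBIC = set("AVILMFWC")


def toy_energy(sequence: str, burial: str) -> float:
    """Hypothetical energy of placing `sequence` on a candidate fold.

    `burial` marks each position as 'B' (buried) or 'E' (exposed).
    One penalty unit is added for every hydrophobic residue that is
    exposed and for every polar residue that is buried.
    """
    if len(sequence) != len(burial):
        raise ValueError("sequence and burial pattern must have equal length")
    energy = 0.0
    for residue, environment in zip(sequence, burial):
        is_hydrophobic = residue in HYDROPHOBIC
        if is_hydrophobic and environment == "E":
            energy += 1.0
        elif not is_hydrophobic and environment == "B":
            energy += 1.0
    return energy


def select_fold(
    sequence: str,
    candidates: Dict[str, str],
    energy_fn: Callable[[str, str], float] = toy_energy,
) -> Tuple[str, float]:
    """Step 2: return the candidate fold with the lowest energy."""
    scored = {name: energy_fn(sequence, burial) for name, burial in candidates.items()}
    best = min(scored, key=scored.get)
    return best, scored[best]


# Step 1 (candidate generation) is represented here by a hand-written set.
candidate_folds = {"fold_a": "BBEEBE", "fold_b": "EEBBEB"}
print(select_fold("AVKDLE", candidate_folds))
```

The selection step is only as good as `energy_fn`: if the native structure does not receive the lowest value, the wrong fold is selected, which is the dependence on energy-function quality that the entry describes.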

fold alignment: A critical step in homology modeling, because it provides the key structures for the model. If suitably matched folds cannot be identified, a type of fold assignment known as protein threading can be used.

fold recognition: Methods of protein fold recognition attempt to detect similarities between protein 3D structures that are not accompanied by any significant sequence similarity. There are many approaches, but the unifying theme is to try and find folds that are compatible with a particular sequence. Unlike sequence-only comparison, these methods take advantage of the extra information made available by 3D structure information. In effect, they turn the protein folding problem on its head: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence. Robert B. Russell, Guide to Structure Prediction "Fold recognition methods and links" Sept. 1999 http://www.sbg.bio.ic.ac.uk/people/rob/CCP11BBS/foldrec.html Related terms: threading; Protein structure protein folding, protein folds

foldedness: Methods for analyzing "foldedness" of expressed proteins include NMR and circular dichroism spectroscopies.

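Hidden Markov Model (HMM): A statistical model in which the system being modeled is assumed to follow a Markov process with unobserved (hidden) states. Useful for insights into protein structure, sequence and function. Wikipedia https://en.wikipedia.org/wiki/Hidden_Markov_model Related term: simulated annealing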
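
As an illustration of how an HMM can be applied to a sequence, the sketch below decodes the most probable hidden-state path with the Viterbi algorithm. The two states ("H" and "C"), the two-letter observation alphabet ("h" for hydrophobic, "p" for polar), and every probability are made-up assumptions for demonstration only; they do not come from any published model.

```python
import math
from typing import Dict, List, Sequence


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most probable state path (log-space Viterbi).

    Assumes every probability used is strictly positive, so that
    math.log never receives zero.
    """
    if not observations:
        return []
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = (
                best[t - 1][prev]
                + math.log(trans_p[prev][s])
                + math.log(emit_p[s][observations[t]])
            )
            back[t][s] = prev
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy parameters.
STATES = ("H", "C")
START = {"H": 0.5, "C": 0.5}
TRANS = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.1, "C": 0.9}}
EMIT = {"H": {"h": 0.8, "p": 0.2}, "C": {"h": 0.2, "p": 0.8}}

print("".join(viterbi("hhhppp", STATES, START, TRANS, EMIT)))  # prints HHHCCC
```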

homeomorphic superfamilies: Protein families are clustered into "homeomorphic superfamilies". Sequences are homeomorphic if they can be aligned from end-to-end. In practice, we allow the amino and carboxyl ends to be ragged and moderate internal length variations (represented as gaps in the sequences). However, all members of the superfamily should have the same overall domain architecture, i.e., the same domains in the same order (except for domains missing due to alternative splicing or very recent genetic events). It is assumed, although in most cases this has not been investigated in detail, that the molecules in a homeomorphic superfamily share a common evolutionary history since the acquisition of their constituent domains. Thus, it should be valid to construct an evolutionary tree from the members of a homeomorphic superfamily. If two groups of proteins with the same architecture are shown to have come to that structure independently, they are appropriately separated into two homeomorphic superfamilies. PIR Classification Terminology, Georgetown Univ, revised 1998 http://pir.georgetown.edu/pirwww/aboutpir/doc/short_sf_def.html

homology: Genomic informatics

homology domains: Many types of domains have been found in diverse proteins. In common use, the term "immunoglobulin superfamily" refers to the collection of all proteins that contain an immunoglobulin-like domain. We call such a group a "homology domain superfamily". Any given protein sequence will be assigned to only one homeomorphic superfamily, but it may contain sequence segments belonging to several homology domain superfamilies. PIR Classification Terminology, Georgetown Univ, revised 1998 http://pir.georgetown.edu/pirwww/aboutpir/doc/short_sf_def.html

homology model: A model of a protein, whose three-dimensional structure is unknown, built from, e.g., the X-ray coordinate data of similar proteins or using alignment techniques and homology arguments. IUPAC Computational Related terms: Sequencing alignment

homology modeling: Also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. It has been shown that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below a 20% sequence identity can have very different structure. Evolutionarily related proteins have similar sequences and naturally occurring homologous proteins have similar protein structure. Wikipedia accessed 2018 Oct 23 https://en.wikipedia.org/wiki/Homology_modeling

A computational method for determining the structure of a protein based on its similarity to known structures. The accuracy of structures determined by homology modeling depends largely on the amount of homology between the unknown and the known protein sequence. The most successful tool for prediction of protein structure from sequence, but with significant room for improvement. Related terms: structural homology; Sequencing glossary sequence homology; Proteins glossary hypothetical protein; In silico & Molecular Modeling Compare with similarity
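
Because the accuracy of a homology model depends on how similar the query is to its template, a first check is often the percent identity of the query-template alignment. The Python sketch below is one simple way to compute it, assuming the two sequences are already aligned and use "-" for gaps. Definitions of percent identity differ in the choice of denominator; here it is the number of aligned (non-gap) columns, which is an assumption. The function name `below_identity_threshold` is hypothetical, and its 20% default simply echoes the figure quoted above.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned_pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not aligned_pairs:
        return 0.0
    matches = sum(q == t for q, t in aligned_pairs)
    return 100.0 * matches / len(aligned_pairs)


def below_identity_threshold(identity: float, threshold: float = 20.0) -> bool:
    """Flag templates whose identity falls below the chosen threshold."""
    return identity < threshold


identity = percent_identity("ACDE-FG", "ACDQHFG")
print(round(identity, 1), below_identity_threshold(identity))
```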

interologs: Protein interaction maps have provided insight into the relationships among the predicted proteins of model organisms for which a genome sequence is available. These maps have been useful in generating potential interaction networks, which have confirmed the existence of known complexes and pathways and have suggested the existence of new complexes and/or crosstalk between previously unlinked pathways. However, the generation of such maps is costly and labor intensive. Here, we investigate the extent to which a protein interaction map generated in one species can be used to predict interactions in another species. LR Matthews "Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or 'Interologs'" Genome Research 11 (12): 2120-2126, Dec. 2001
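
The idea of using an interaction map from one species to predict interactions in another can be sketched as a simple mapping step: each known interaction is transferred to every pair of orthologs in the target species. The Python below is a minimal sketch under that assumption and is not the procedure of the cited paper. The protein identifiers and the ortholog table are invented, and a real pipeline would derive orthologs from sequence searches and then validate the predictions.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def predict_interologs(
    source_interactions: Iterable[Tuple[str, str]],
    orthologs: Dict[str, Set[str]],
) -> Set[FrozenSet[str]]:
    """Transfer interactions from a source species to a target species.

    `orthologs` maps each source protein to its target-species orthologs.
    Each predicted interaction is an unordered pair (a frozenset).
    """
    predicted: Set[FrozenSet[str]] = set()
    for protein_a, protein_b in source_interactions:
        for target_a in orthologs.get(protein_a, ()):
            for target_b in orthologs.get(protein_b, ()):
                predicted.add(frozenset((target_a, target_b)))
    return predicted


# Hypothetical identifiers: "y*" in the source species, "w*" in the target.
known = [("y1", "y2"), ("y2", "y3")]
ortholog_table = {"y1": {"w1"}, "y2": {"w2a", "w2b"}}  # y3 has no ortholog

for pair in sorted(sorted(p) for p in predict_interologs(known, ortholog_table)):
    print(pair)
```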

location proteomics: Seeks to provide automated, objective high-resolution descriptions of protein location patterns within cells. Methods have been developed to group proteins into statistically indistinguishable location patterns using automated analysis of fluorescence microscope images. ... Preliminary work suggests the feasibility of expressing each unique pattern as a generative model that can be incorporated into comprehensive models of cell behaviour. RF Murphy, Location proteomics: a systems approach to subcellular location, Biochem Society Transactions, 33 (Pt 3): 535-538, June 2005

membrane proteins: Drug Targets

ontologies - proteomics: A principal aim of post-genomic biology is elucidating the structures, functions and biochemical properties of all gene products in a genome. However, to adequately comprehend such a large amount of information we need new descriptions of proteins that scale to the genomic level. In short, we need a unified ontology for proteomics. Much progress has been made towards this end, including a variety of approaches to systematic structural and functional classification and initial work towards developing standardized, unified descriptions for protein properties. In relation to function, there is a particularly great diversity of approaches, involving placing a protein in structured hierarchies or more-generalized networks and a recent approach based on circumscribing a protein's function through systematic enumeration of molecular interactions. N Lan, GT Montelione, M. Gerstein, Ontologies for proteomics: towards a systematic definition of structure and function that scales to the genome level, Current Opinion in Chemical Biology 7(1): 44-54, Feb. 2003

phylogenetic profiles: Phylogenomics Can be used to hypothesize protein function.

post-translational modification identification: ExPASy Proteomics Tools https://www.expasy.org/proteomics lists a number of tools for prediction of post-translational modification, as do other websites. Identification of these modifications may provide important structural-functional information.

protein analysis sequencing: A process that includes the determination of AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence. MeSH 2000

protein array analysis: Ligand-binding assays that measure protein-protein, protein-small molecule or protein-nucleic acid interactions using a very large set of capturing molecules, i.e., those attached separately on the solid support, to measure the presence or interaction of target molecules in the sample. MeSH 2003

protein bioinformatics: Tools for Protein Informatics
• sequence and structure comparison
• multiple alignments
• phylogenetic tree construction
• composition/pI/mass analysis
• motif/pattern identification
• 2° structure prediction/threading
• TMD (transmembrane domain) prediction/hydrophobicity analysis
• homology modeling
• visualization
A Very very very short introduction to protein bioinformatics, Patricia Babbitt 2003 http://pga.lbl.gov/Workshop/May2003/lectures/Babbitt.pdf See also protein informatics. Is there a difference?
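
As one concrete example from the list above, hydrophobicity analysis is commonly done by averaging a per-residue hydropathy value over a sliding window. The sketch below uses the Kyte-Doolittle hydropathy scale. The window length of 9 is an arbitrary illustrative choice, not a recommendation from the cited lecture, and any cutoff for calling a segment transmembrane would have to be chosen separately.

```python
from typing import List

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each window of `window` consecutive residues.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# A made-up sequence: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("LLLLKKKK", window=4)
print([round(value, 2) for value in profile])
```

Peaks in the profile mark hydrophobic stretches, which is the signal that transmembrane-domain predictors of this simple kind look for.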

protein databases: Protein location can be determined by such genome-wide techniques as green fluorescent protein (GFP) tagging, and protein-protein interactions can be determined by affinity chromatography, immunoprecipitation and yeast two-hybrid experiments. Databases resulting from these methods are beginning to emerge, but they are of uncertain accuracy. Defining the Mandate of Proteomics in the Post-Genomics Era, Board on International Scientific Organizations, National Academy of Sciences, 2002 http://www.nap.edu/books/NI000479/html/R1.html

Dr. Stanley Fields, Professor of Genetics and Medicine at the Univ. of Washington and developer of the yeast two-hybrid system, writes that protein databases "will need to become much more sophisticated if they are to help scientists make sense of the staggering number of experimental measurements that will soon emerge. ... protein data will need to be integrated with results from expression profiling, genome-wide mutation or antisense analyses, and polymorphism detection. As proteomic data accumulate, we will become better at triangulating from multiple disparate bits of information to gain a bearing on what a protein does in the cell." S. Fields "Proteomics in Genomeland" Science 291: 1221-1224 Feb. 16, 2001 Related terms: protein identification, protein localization; Expression expression profiling
Protein databases Databases & software directory

protein dynamics: Certain parts of a particular protein will be rigid, but others may be flexible and change their shape, even when bound. ... NMR has the unique ability to characterize protein fluctuations quantitatively, much more so than crystallography can. Understanding the function of a protein is fundamental for gaining insight into many biological processes. Proteins are stable mechanical constructs that allow certain internal motions to enable their biological function. Structural properties of a protein can be obtained with X-ray crystallography or NMR acquisition techniques. Molecular dynamics (MD) simulations at pico/nanosecond time scales output one or more trajectory files which describe the coordinates of each individual atom over time. The main problem with animating these trajectories is one of temporal scale. Taking large time steps will destroy the impression of smooth motion, while small time steps will result in the camouflage of interesting motions. Henk Huitema, Robert van Liere "Interactive Visualization of Protein Dynamics" ERCIM [European Research Consortium for Computers and Informatics] News No. 44 - January 2001 http://www.ercim.org/publication/Ercim_News/enw44/van_liere.html

protein expression mapping: Maps, genetic & genomic
protein expression profiling: Expression
protein folding problem: Protein structures See also protein structure prediction

protein function: The focus of the group is the understanding of protein function and evolution using genomic, structural and proteomic data. Central to this question is the concept of the domain: a structurally conserved, genetically mobile unit. When viewed at the three-dimensional level of protein structure, a domain is a compact arrangement of secondary structures connected by linker polypeptides. It usually folds independently and possesses a relatively hydrophobic core. The importance of domains is that they cannot be divided into smaller units; they represent a fundamental building block that can be used to understand the evolution and function of proteins... The advent of complete genomic sequences, including more and more eukaryotes, is leading to a fundamental change in protein domain analysis. Having characterised most of the domain families and having developed tools to predict them, we can now start to analyse their function and evolution on a higher level. Protein Function Analysis Group, Max Planck Institute for Molecular Genetics, Germany http://protfunc.molgen.mpg.de/

Function is not a fixed property for many, if not most proteins. There are many ways that gene products can be altered to elicit modified or completely new functions. For example, there exist:
- alternative splicing, which may affect as many as ¼ or more of the genes in a higher eukaryote and can alter biochemical function either drastically or subtly, producing truncated proteins and proteins with different compositions
- post-translational modification, such as phosphorylation and glycosylation (which can occur on numerous sites on the same protein)
- pre-enzymes made for secretion and pro-enzymes that are activated by cleavage
- acylation and ubiquitination
- non-enzymatic modifications like oxidation, so a given protein exists in the cell in different oxidized states.
Defining the Mandate of Proteomics in the Post-Genomics Era, Board on International Scientific Organizations, National Academy of Sciences, 2002 http://www.nap.edu/books/NI000479/html/R1.html

More systematic attempts have been made to place proteins within a hierarchy of standard functional categories or to connect them in overlapping networks of varying types of associations. These networks can obviously include protein-protein interactions ... More broadly, they can include pathways, regulatory systems and signaling cascades... Perhaps, in the future, the systematic combination of networks may provide for a truly rigorous definition of protein function. Mark Gerstein, et al. "Integrating Interactomes" Science 295 (5553): 284, Jan. 2002

A biologically useful definition of the function of a protein requires a description at several different levels. To the biochemist, function means the biochemical role of an individual protein: if it is an enzyme, function refers to the reaction catalyzed; if it is a signaling protein, function refers to the interactions that the protein makes. To the geneticist or cell biologist, function includes these roles but will also encompass the cellular roles of the protein, such as the phenotype of its deletion, the pathway in which it operates, among others. A physiologist or developmental biologist may have an even broader view of function, including tissue specificity and expression during the life cycle of the organism. Gregory A Petsko, Dagmar Ringe "Overview: The Structural Basis of Protein Function" from Chapter 2 of Protein Structure and Function: New Science Press, 1991-2001

In the expanded view of protein function, a protein is defined as an element in the network of its interactions. Various terms have been coined for this expanded notion of function, such as ‘contextual function’ or ‘cellular function’ … Whatever the term, the idea is that each protein in living matter functions as part of an extended web of interacting molecules … Often it is possible to understand the cellular functions of uncharacterized proteins through their linkages to characterized proteins. In broader terms, the networks of linkages offer a new view of the meaning of protein function, and in time should offer a deepened understanding of the function of cells. David Eisenberg et al "Protein function in the post-genomic era" Nature 405: 823-826, 15 June 2000

The principal problem facing the post-genome era. Walter Blackstock & Malcolm Weir "Proteomics" Trends in Biotechnology: 121-134 Mar 1999

Related terms: Protein categories interaction proteomics; Functional genomics gene function, Gene Ontology™; Maps cell mapping

protein identification: The analytical method used most commonly to visualize and identify large numbers of proteins is 2D-gel electrophoresis. One can theoretically visualize changes in protein production, both qualitatively and quantitatively, from two individual samples (e.g., a control preparation and a treated preparation). Furthermore, one can potentially accomplish protein identification by "picking" proteins from the 2D-gel and subjecting the highly purified protein to MALDI-TOF mass spectrometry.

protein informatics: Computational biological research has become an essential component of biological research. The great quantity and diversity of the data being generated by different technologies is daunting, and impossible to organize or oversee without computational assistance. In functional genomics, a great deal of effort has been devoted to developing community-based standards for reporting gene expression data to allow others to replicate experiments. The same will need to be done for proteomics to validate across the different technologies. Perhaps never before has a bioinformatics problem of this magnitude been approached. Without effective and integrated databases to store and retrieve these data and advanced computational methods such as pattern recognition and other machine learning approaches to analyze and interpret them, the full implications of these data will not be realized. Defining the Mandate of Proteomics in the Post-Genomics Era, Board on International Scientific Organizations, National Academy of Sciences, 2002 http://www.nap.edu/books/NI000479/html/R1.html

Although mining of protein structure homology data is a relatively small field now, it is likely to experience dramatic growth and to become pivotal in the ultimate exploitation of genomic data and tools. Related terms: proteoinformatics; Algorithms; protein bioinformatics; In Silico & molecular modeling

protein interactions: Narrower terms: protein-DNA interactions, protein-protein interactions, protein-RNA interactions. Related terms: annotation-proteins, binary interaction, interaction proteomics, protein networks; -Omes & -omics interactome

protein interaction mapping: Maps genomic & genetic
protein linkage maps: Maps genomic & genetic

protein & mRNA data: Although the relationship between mRNA and protein levels is vague for individual genes, some of the statistics for broad categories of protein properties are much more robust... In contrast to the differences between mRNA and protein data for individual genes, the broad categories show that the transcriptome and translatome populations are remarkably similar; both contain roughly the same proportions of secondary structure and functional categories. Moreover, this contrasts the difference with the genome, which appears to have a distinctly different composition of functional categories. This illustrates that we get a more consistent picture when we average across the population, i.e. there is broad similarity between the characteristics of highly expressed mRNA and highly abundant proteins. Dov Greenbaum, Mark Gerstein et al. "Interrelating Different Types of Genomic Data" Dept. of Biochemistry and Molecular Biology, Yale Univ. 2001 http://bioinfo.mbb.yale.edu/e-print/omes-genomeres/text.pdf Related terms: Expression; Genomics genome data; functional genomics data; -Omes & -omics transcriptome, translatome

protein networks: The individual steps in signal transduction pathways involve protein interactions with target molecules that may be other proteins, small molecules or DNA. Identifying all of the proteins that take part in a given class of interactions, on a genome-wide scale, remains an extremely challenging task. We propose to apply mRNA display (1, 2) technology to this problem, with the goal of developing databases of protein-ligand interactions that will add value to the existing and growing sequence databases. PI Jack Szostak, Definition of Protein Networks using mRNA display, ParaBioSYs, MGH, HMS, BU http://pga.mgh.harvard.edu/Parabiosys/projects/protein_networks_rna_display.php

protein sequence: A process that includes the determination of an amino acid sequence of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence. MeSH, 2002 See also amino acid sequence.

protein sequence space: [J.] Maynard-Smith's (1970. Natural Selection and the concept of a protein space. Nature 225: 563-564) concept of a "protein sequence space" in which each site in an alignment is represented on its own axis and the number of axes required to represent all conceivable variants for a protein is equal to the number of sites in its sequence. Each sequence occupies a unique point in this space; variants differing at one site are adjacent (Hamming) neighbours. The collection of all viable sequence variants for a particular protein forms a localized interconnected "neighbourhood" of points within the space. This representation has proved conceptually intuitive and analytically powerful ... In protein sequence space, constraints are reflected in the multidimensional shape of the cluster of points that make up the "neighbourhood" of variants viable for a specific protein. The boundary defining the edge of this neighbourhood is characteristic of the protein's function and can be thought of as its functional "signature". Gavin JP Naylor, "Measuring Shifts In Function and Evolutionary Opportunity Using Variability Profiles: A Case Study of the Globins" also Journal of Molecular Evolution 51 (3): 223-233 Sept. 2000 http://bioinfo.mbb.yale.edu/e-print/protspace-jme/text.pdf
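
The notion of adjacent (Hamming) neighbours can be made concrete with a few lines of Python. The sketch assumes the 20 standard amino acids and equal-length sequences; the function names are illustrative only. It does not model which variants are viable, which is the biological question the entry is about.

```python
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Number of sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def single_site_neighbours(sequence: str) -> Iterator[str]:
    """Yield every variant that differs from `sequence` at exactly one site."""
    for position, original in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue != original:
                yield sequence[:position] + residue + sequence[position + 1:]


neighbours = list(single_site_neighbours("ACD"))
print(len(neighbours))  # 19 alternatives at each of 3 sites = 57
print(hamming_distance("ACD", "ACE"))  # 1
```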

protein sorting signals: Amino acid sequences found in transported proteins that selectively guide the distribution of the proteins to specific cellular compartments. MeSH, 2001
Protein Spotlight, Swiss-Prot http://au.expasy.org/spotlight/ One month, one protein

protein structure prediction: Involves primary sequence alignment, secondary and tertiary structure prediction and homology modelling. Narrower term: ab initio protein structure prediction Related term: CASP

protein taxonomy: A Protein Taxonomy Based on Secondary Structure T. Przytycka, R. Aurora, GD Rose, Nature Structural Biology 6 (7): 1999.

protein threading: See threading

proteogenomics: The systematic study of annotated genomic information to global protein expression in order to determine the relationship between genomic sequences and both expressed proteins and predicted protein sequences. MeSH Year introduced: 2017

proteome informatics: Peer Bork and David Eisenberg, "Genome and proteome informatics" Current Opinion in Structural Biology 10 (3): 341-342, 2000

Proteome Informatics group is part of the Swiss Institute of Bioinformatics (SIB). It is in charge of research and development in the fields of bioinformatics, molecular imaging and the use of Internet for biomedical applications. Current Projects and People, ExPASy, Swiss Institute of Bioinformatics http://au.expasy.org/people/pig/

proteome map: See Maps, genomic & genetic

proteome mining: We present the development and application of a new machine-learning approach to exhaustively and reliably identify major histocompatibility complex class I (MHC-I) ligands among all 20^8 octapeptides and in genome-derived proteomes of Mus musculus, influenza A H3N8, and vesicular stomatitis virus (VSV). Exhaustive Proteome Mining for Functional MHC-I Ligands ACS Chem. Biol., 2013, 8 (9), pp 1876–1881 DOI: 10.1021/cb400252t http://pubs.acs.org/doi/abs/10.1021/cb400252t

proteomic analysis: Systematic and quantitative analysis of the properties that define protein activity and functions within a defined context, essential for biology and medicine. Ruedi Aebersold quoted in Defining the Mandate of Proteomics in the Post- Genomics Era, National Academies Press, 2002 http://www.nap.edu/books/NI000479/html/R1.html

A systematic analysis of proteins for their identity, quantity and function. J Peng and Steven Gygi, Proteomics: the move to mixtures, Journal of Mass Spectrometry 35: 1083-1091, 2001

Proteomic Standards Initiative PSI: The HUPO Proteomics Standards Initiative (PSI) defines community standards for data representation in proteomics to facilitate data comparison, exchange and verification. Proteomic Standards Initiative, HUPO http://www.psidev.info/

regulatory homology: Quantitative analysis of protein expression data obtained by high-throughput methods has led us to define the concept of "regulatory homology" and use it to begin to elucidate the basic structure of gene expression control in vivo. N. Leigh Anderson, Norman G. Anderson "Proteome and proteomics; New technologies, new concepts, and new words" Electrophoresis 19(11):1853-61 August 1998

RNA structural genomics: Structural genomics, the systematic determination of all macromolecular structures represented in a genome, is focused at present exclusively on proteins. It is clear, however, that RNA molecules play a variety of significant roles in cells, including protein synthesis and targeting, many forms of RNA processing and splicing, RNA editing and modification, and chromosome end maintenance. To comprehensively understand the biology of a cell, it will ultimately be necessary to know the identity of all encoded RNAs, the molecules with which they interact and the molecular structures of these complexes. This report focuses on the feasibility of structural genomics of RNA, approaches to determining RNA structures and the potential usefulness of an RNA structural database for both predicting folds and deciphering biological functions of RNA molecules. Jennifer A. Doudna "Structural Genomics of RNA" Nature Structural Biology 7 (11) supp: 954-956 (Nov. 2000)

Rosetta stone method: A way of looking at the correlation of protein domains across species. Some proteins have homologs that are fused in other species, yielding clues as to the proteins with which they might interact. In addition, proteins that have been identified in particular complexes and pathways hint at the location and function of their homologs in other species. S. Spengler “Bioinformatics in the information age” Science 287 (5451): 221- 223 Feb. 18, 2000 Related term: Phylogenomics phylogenetic profiles
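
Illustration (not from the cited source): the fusion logic described above can be sketched as a search over domain compositions. This is a minimal sketch that assumes each protein has already been reduced to a set of domain identifiers. The function name and the toy protein and domain names are hypothetical. A real analysis would detect the shared domains by sequence similarity and would need to filter out promiscuous domains.

```python
from itertools import combinations


def rosetta_stone_pairs(query_proteins, reference_proteins):
    """Suggest candidate interacting pairs from fused homologs.

    query_proteins:     dict of protein name -> set of domain ids
                        (species of interest)
    reference_proteins: dict of protein name -> set of domain ids
                        (another species)
    Returns a list of (protein_a, protein_b, fused_protein) tuples: two
    separate query proteins whose domains all occur together in a single
    reference protein.
    """
    links = []
    for name_a, name_b in combinations(sorted(query_proteins), 2):
        domains_a = query_proteins[name_a]
        domains_b = query_proteins[name_b]
        # Skip proteins without annotated domains and pairs that share a
        # domain, where a "fusion" would not be informative.
        if not domains_a or not domains_b or domains_a & domains_b:
            continue
        for fused_name in sorted(reference_proteins):
            fused_domains = reference_proteins[fused_name]
            if domains_a <= fused_domains and domains_b <= fused_domains:
                links.append((name_a, name_b, fused_name))
    return links


# Toy data, invented for illustration only.
species_1 = {"protA": {"dom1"}, "protB": {"dom2"}, "protC": {"dom3"}}
species_2 = {"fusedAB": {"dom1", "dom2"}, "lonely": {"dom3"}}
candidates = rosetta_stone_pairs(species_1, species_2)
```

Here protA and protB are reported as a candidate pair because their domains occur together in fusedAB. This is only a hint that the two proteins may interact, not evidence that they do.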

sequence homology, amino acid: The degree of similarity between sequences of amino acids. This information is useful for the understanding of genetic relatedness of certain species. MeSH, 1993

sequence similarity searching: A method of searching sequence databases by using alignment to a query sequence. By statistically assessing how well database and query sequences match one can infer homology and transfer information to the query sequence. European Bioinformatics Institute https://www.ebi.ac.uk/Tools/sss/

Sequence similarity searching, typically with BLAST (units 3.3, 3.4), is the most widely used, and most reliable, strategy for characterizing newly determined sequences. Sequence similarity searches can identify "homologous" proteins or genes by detecting excess similarity – statistically significant similarity that reflects common ancestry. Pearson WR. An Introduction to Sequence Similarity ("Homology") Searching. Current Protocols in Bioinformatics, 2013. doi:10.1002/0471250953.bi0301s42
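
Illustration (not from the cited sources): the core of a similarity search is to score a query against every database sequence and rank the hits. The sketch below uses a plain Smith-Waterman local alignment score with a simple match/mismatch/gap scheme. This is an assumption made for brevity and is not how BLAST works internally: BLAST uses heuristics and substitution matrices, and it reports statistical significance (E-values), which this sketch omits. The function names and scoring values are hypothetical.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    previous = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, s in enumerate(subject, start=1):
            diagonal = previous[j - 1] + (match if q == s else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def rank_database(query, database):
    """Return (score, name) pairs, best score first, ties by name."""
    hits = [(local_alignment_score(query, seq), name)
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


# Toy database, invented for illustration only.
toy_database = {"a": "WWWW", "b": "MKACDEFG", "c": "ACD"}
ranking = rank_database("ACDEF", toy_database)
```

A raw score alone does not establish homology. As the definitions above stress, the inference rests on a statistical assessment of how unlikely the score would be by chance.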

structural bioinformatics: Involves the process of determining a protein's three-dimensional structure using comparative primary sequence alignment, secondary and tertiary structure prediction methods, homology modeling, and crystallographic diffraction pattern analyses. Currently, there is no reliable de novo predictive method for protein 3D-structure determination. Over the past half-century, protein structure has been determined by purifying a protein, crystallizing it, then bombarding it with X-rays. The X-ray diffraction pattern from the bombardment is recorded electronically and analyzed using software that creates a rough draft of the 3D structure. Biological scientists and crystallographers then tweak and manipulate the rough draft considerably. The resulting spatial coordinate file can be examined using modeling-structure software to study the gross and subtle features of the protein's structure. Christopher Smith "Bioinformatics, Genomics, and Proteomics" Scientist 14[23]:26, Nov. 27, 2000 Related terms: Algorithms, In silico & Molecular Modeling.

structural genomics: Focuses on the physical aspects of the genome through the construction and comparison of gene maps and sequences, as well as gene discovery, localization, and characterization. Brush up on your 'omics, Chemical & Engineering News, 81(49): 20, Dec. 2003 http://pubs.acs.org/cen/coverstory/8149/8149genomics1.html

Involves quickly determining the 3D structures of large numbers of proteins (or other complex biological molecules, such as nucleic acids), ultimately accounting for an organism’s entire proteome. Footnote: As traditionally defined, the term structural genomics referred to the use of sequencing and mapping technologies, with bioinformatic support, to develop complete genome maps (genetic, physical, and transcript maps) and to elucidate genomic sequences for different organisms, particularly humans. Now, however, the term is increasingly used to refer to high-throughput methods for determining protein structures.

Many of the criticisms leveled at the Human Genome Project in the mid-1980s have been redirected toward structural genomics. Unlike high-throughput genome sequencing, it is not a simple matter to decide when a structural genomics effort has reached completion. SK Burley et al “Structural genomics: beyond the Human Genome Project” Nature Genetics 23: 151 Oct. 1999 Related term: structural proteomics

A good explanation of structural genomics Joint Center for Structural Genomics http://www.jcsg.org/help/robohelp/Definitions/Structural_Genomics.htm
Human Proteomics Initiative, Swiss Institute of Bioinformatics, European Bioinformatics Institute http://us.expasy.org/sprot/hpi/ A major project to annotate all known human sequences according to the quality standards of Swiss-Prot. This means providing, for each known protein, a wealth of information that includes the description of its function, its domain structure, subcellular location, post-translational modifications, variants, similarities to other proteins, etc.

structural homology: Identify 3D structures of proteins or domains in the same family as a sequence of interest. Related terms: homology Functional genomics homology modeling Molecular modeling

structure based design: A design strategy for new chemical entities based on the three-dimensional (3D) structure of the target obtained by X-ray or nuclear magnetic resonance (NMR) studies, or from protein homology models. IUPAC Computational

structure from sequence: See protein structure prediction, structural homology

structure prediction problem: The protein secondary structure prediction problem has become a classic, challenging problem for the artificial-intelligence and machine learning community. Virtually every conceivable computational technique in these fields (e.g., information theory [6, 12, 13], artificial neural networks [15, 20, 22], cascaded networks [18, 19, 27], hybrid systems [28], nearest neighbor methods [21], hidden Markov chains [4], machine learning [17, 25], mutual information [26]) has been applied in the context of protein structure prediction. The reason for this attention is well-founded and clear: If protein structure, even secondary structure, can be accurately predicted from the now abundantly available gene and protein sequences, such sequences become immensely more valuable for the understanding of drug design, the genetic basis of disease, the role of protein structure in its enzymatic, structural, and signal transduction functions, and basic physiology from molecular to cellular, to fully systemic levels. In short, the solution of the protein structure prediction problem (and the related protein folding problem) will bring on the second phase of the revolution. Peter Munson et al. "Protein Secondary Structure Prediction", NIH, 1994 http://abs.cit.nih.gov/reprints/text3.html SWISS-PROT: Databases & software directory

threading: In this approach, a target sequence is “threaded” through a library of 3D folds to try to find a match. This method is used when no sequence of known structure is clearly related to the target sequence.
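
Illustration (not from a cited source): the idea of scoring a sequence against each fold in a library can be sketched with a deliberately crude compatibility score. This is a toy sketch under strong assumptions: each fold is reduced to a string of per-position environments, 'B' for buried and 'E' for exposed; hydrophobic residues are rewarded in buried positions and other residues in exposed positions; and the sequence is slid along each template without gaps. Real threading methods use much richer structural environments, statistical potentials and gapped alignment. All names and the hydrophobic residue set are hypothetical.

```python
# Residues treated as hydrophobic in this toy scoring scheme (an assumption).
HYDROPHOBIC = set("AVILMFWC")


def thread_score(sequence, environments):
    """Score a sequence placed on an equal-length environment string."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and template segment must match in length")
    score = 0
    for residue, environment in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = environment == "B"
        # +1 when the residue type suits the environment, -1 otherwise.
        score += 1 if is_hydrophobic == is_buried else -1
    return score


def best_fold(sequence, fold_library):
    """Return (score, fold_name, offset) of the best gapless placement.

    Returns None when no template is at least as long as the sequence.
    """
    best = None
    for name in sorted(fold_library):
        environments = fold_library[name]
        for offset in range(len(environments) - len(sequence) + 1):
            segment = environments[offset:offset + len(sequence)]
            score = thread_score(sequence, segment)
            if best is None or score > best[0]:
                best = (score, name, offset)
    return best


# Toy fold library, invented for illustration only.
toy_library = {"foldX": "BEBEBE", "foldY": "EEEBBB"}
match = best_fold("LKVKIE", toy_library)
```

In this toy case the alternating hydrophobic and polar pattern of the sequence fits foldX better than foldY. A practical method would also need to judge whether the best score is significantly better than expected by chance.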

Protein informatics resources
Joint Center for Structural Genomics Technologies http://www.jcsg.org/scripts/prod/technologies1.html

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.
