Protein Informatics Glossary & taxonomy
Evolving Terminology for Emerging Technologies

Comments? Revisions? Suggestions? Mary Chitty
Last revised January 12, 2015 

Applications  Drug targets  Molecular Diagnostics
Informatics Algorithms
   Bioinformatics   Cheminformatics  Drug discovery Informatics   Genomic Informatics   Ontologies & Taxonomies  
Technologies  Protein Technologies   Mass spectrometry   NMR & X-Ray Crystallography  Metabolic engineering glossary
  Protein Structures   Proteins  Functional Genomics   Proteomics

ab initio: From the beginning (Latin)

ab initio protein modeling: Predict 3D structure from sequence without using a homologous model/ template; this technology is not at the stage of being broadly applicable to drug discovery. CHI Structural proteomics report

Ab initio methods use the physicochemical properties of the amino acid sequence of a protein to literally calculate a 3D structure (lowest energy model) based on protein folding. As opposed to determining the structure of an entire protein, ab initio methods are typically used to predict and model protein folds (domains). This method is gaining considerably, in part due to the development of novel mathematical approaches, a boost in available computational resources (for example, tera- and petaFLOPS supercomputers), and considerable interest from researchers investigating protein-ligand (or drug) interactions.   Christopher Smith "Bioinformatics, Genomics, and Proteomics"  Scientist 14[23]:26, Nov. 27, 2000   Related terms: protein structure prediction

ab initio protein structure prediction: Prediction of a protein’s structure based on amino acid sequence alone — that is, without mapping the structure to structures of known sequences. 
Broader term: protein structure prediction. Ab initio prediction is the narrower term relative to structure prediction.

ab initio quantum mechanical methods: Methods of quantum mechanical calculations independent of any experiment other than the determination of  fundamental constants. The methods are based on the use of the full Schrödinger equation to treat all the electrons of a chemical system. In practice, approximations are necessary to restrict the complexity of the electronic wave function and to make its calculation possible. (Synonymous with non- empirical quantum mechanical methods.) IUPAC Computational

ab initio quantum mechanical modeling:  The application of ab initio modelling across diverse fields such as condensed matter physics, materials science and chemistry has been demonstrated over the past 10 years. ... The recent completion of the Human Genome Project will offer an unprecedented number of protein receptors and enzymes as targets for pharmacological intervention in disease processes. However, before this wealth of information can be used to develop pharmaceuticals, an understanding of the biochemistry of the newly identified proteins and their interactions must be obtained. First principles quantum mechanical modelling will play an important role in this process.  [Matthew Segall, Ursula Röthlisberger, Paolo Carloni, CECAM/Psi-k Workshop: Ab Initio Modelling in the Biological Sciences Lyon, France 11-13 June 2001]

annotation- proteins: Macromolecular structure determination is moving from a functionally driven initiative to include a genomically driven initiative (structural genomics) where structures are determined based on what is known from a target sequence alone. The resultant structures may be structurally and functionally uncharacterized. Systematic Protein Annotation and Modeling (SPAM) is a multi-institutional initiative to make better use of target sequences and structures. SPAM includes new algorithms for the study of sequence-sequence, sequence-structure and structure-structure and deployment of these methods in resources available to the community  Systematic Protein Annotation and Modeling  Skaggs School of Pharmacy and Pharmaceutical Sciences and the San Diego Supercomputer Center (SDSC) at the University of California San Diego (UCSD), the Keck Graduate Institute (KGI) and the Burnham Institute (TBI). 

annotation protein - dictionary-driven: For many years, computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from sequence have been the focus of numerous research groups, including ours. By general admission, this is a difficult problem and the methods that have been proposed over the years typically concentrated on the analysis of individual genes. With the advent of advanced sequencing methods and systems, the number of amino acid sequences and fragments being deposited in the public databases has been increasing steadily. This in turn generated a renewed demand for automated approaches that can quickly, exhaustively and objectively annotate individual sequences as well as complete genomes. In this paper, we present one such approach. The approach is centered around and exploits the Bio-Dictionary, an exhaustive collection of amino acid patterns (referred to as seqlets) that completely covers the natural sequence space of proteins to the extent that this space is sampled by the currently available public databases. Isidore Rigoutsos, Tien Huynh, Laxmi P. Parida, Daniel E. Platt, Aris Floratos, Dictionary Driven Protein Annotation, Nucleic Acids Research, 30 (no 17) 3901-3916, 2002 

candidate proteins:  NIGMS (part of NIH) is supporting research on identifying candidate proteins and their genes, including those that cause variations in human drug metabolism, transport, distribution, and excretion (for both small organic molecules and macromolecular drugs such as peptides and oligonucleotides), that may play a role in determining individual variations in drug responses and candidate proteins and their genes, including those that are direct targets for drug action (e.g., receptors, enzymes, signal transducing molecules, regulatory factors), that may play a role in determining individual variations in drug responses. National Institute of General Medical Sciences, Recommendations of the NIGMS Working Group -- Understanding Individual Variations in Drug Responses: From Phenotype to Genotype , June 9-10, 1998, Bethesda MD  Related terms: Pharmacogenomics

CASP: Critical Assessment of Techniques for Protein Structure Prediction. Protein Structure Prediction Center, Lawrence Livermore National Lab, US. Links to CASP meeting results and information on the "Ten most wanted" proteins solicitation.

comparative modeling: See homology modeling

comparative proteomics:  The C. elegans proteome was used as an alignment template to assist in novel human gene identification … Among the available 18,452 C. elegans protein sequences, our results indicate that at least 83% had human homologous genes, with 7954 records of C. elegans proteins matching known human gene transcripts. [CH Lai et al "Identification of Novel Human Genes Evolutionarily Conserved in Caenorhabditis elegans by Comparative Proteomics" Genome Research 10(5): 703-713 May 2000] Related terms Functional Genomics glossary comparative genomics, evolutionary genomics.

computational biophysics:  Activities of the Theoretical and Computational Biophysics Group center on the structure and function of supramolecular systems in the living cell, and on the development of new algorithms and efficient computing tools for structural biology.  The Resource brings the most advanced molecular modeling, bioinformatics, and computational technologies to bear on questions of biomedical relevance. Theoretical and Computational Biophysics Group, Univ. of Illinois Urbana Champaign,  About the Group 

Our research focuses on the modeling of large macromolecular systems in realistic environments. These efforts have produced insight into biomolecular processes coupled to mechanical force, bioelectronic processes in metabolism and vision, and the function and mechanism of membrane proteins. Theoretical and Computational Biophysics Group, Univ. of Illinois Urbana Champaign,  Emerging Studies, 

computational proteomics: Large- scale generation and analysis of 3D and 4D protein structural information and the application of structural knowledge across all life science disciplines. [Edward T. Maggio, Kal Ramnarayan "Recent developments in computational proteomics" Trends in Biotechnology 19 (7): 266- 272 July 2001] 

contextual data: While proteomic studies initially focused largely on expression and protein identification, progress in these areas drove the demand for more detailed types of proteomic data. Now researchers want information about where specific proteins are expressed, both in terms of tissues and localization within the cell. Information relating proteins to function require additional details of post- translational modification, and studies of protein interactions have moved beyond just looking at binary interactions to studies of protein complexes. For both genomics and proteomics, this shift can be characterized as an interest in more contextual data. Enhanced insight into biological context is essential for obtaining a better understanding of how biology actually works, and thus there is now an emphasis to move from genomic and proteomic snapshots to time series data of expression. Such context is of particular value if biological studies are to be translated into medical advances, because of the importance of being able to predict the impact of potential treatments. The integration of genomic and proteomic data with medical conditions, treatment and outcomes becomes another critical type of contextual information. Christina Lingham, Beyond Genome: Thinking Globally, Cambridge Healthtech 

designer proteins: Protein design is currently used for the creation of new proteins with desirable traits. In our lab, we focus on the synthesis of proteins with high essential amino acid content having potential applications in animal nutrition. One of the limitations we face in this endeavour is achieving stable proteins despite a highly biased amino acid content. We report here the synthesis and characterisation of two mutants derived from our MB-1 designer protein. Williams M, Gagnon MC, Doucet A, Beauregard M, "Design of high essential amino acid proteins: two design strategies for improving protease resistance of the nutritious MB-1 protein" Journal of Biotechnology 94(3): 245- 254, Apr. 11, 2002
Designer proteins, Scripps

Designer proteins can also refer to high- protein nutritional supplements.

differential proteomes: Mass spectrometry-based differential proteomics is a comprehensive analysis of protein expression that involves comparing distinct proteomes, such as cells, tissues or cell lines that are normal, diseased or treated. Mayo Clinic

differential proteomics: Differential proteomics makes qualitative and quantitative comparisons of proteomes under different conditions. This knowledge enables us to unravel the mysteries of biological processes. Genencor Proteome and Tools  

differential subproteomes: As defined by relative solubilities, cellular location and narrow-range immobilised pH gradients. SJ Cordwell, AS Nouwens, NM Verrills, DJ Basseal, BJ Walsh, Subproteomics based upon protein cellular location and relative solubilities in conjunction with composite two-dimensional electrophoresis gels, Electrophoresis, 21(6): 1094-103, April 2000  Broader terms: subproteomes, subproteomics

docking: Three- dimensional molecular structure is one of the foundations of structure- based drug design. Often, data are available for the shape of a protein and a drug separately, but not for the two together.  The program AutoDock was originally written in FORTRAN-77 in 1990 by David S. Goodsell here in Arthur J. Olson's laboratory.  It performs automated docking of ligands (small molecules like a candidate drug) to their macromolecular targets (usually proteins, sometimes DNA) Garrett B. Morris, “Molecular docking web”, Scripps, Dec. 2000
Wikipedia   Narrower term: pharmacophore based docking

docking programs: Programs for evaluating lead compounds against target proteins; these programs are “informed” by structure data. Traditional ligand-docking programs - such as DOCK, developed by Irwin Kuntz at the University of California, San Francisco; MacroModel, developed by Clark Still at Columbia University; and GOLD from MSI (now part of Pharmacopeia) - give information about potential ligands for a known protein structure.  These programs select molecules predicted to be highly complementary to the receptor structure and can screen many of these ligands against the protein.  This type of virtual screening technology has already been incorporated into many major pharmaceutical companies’ discovery programs and offers the ability to screen many more compounds at once than the traditional laboratory-based method.  CHI Structural proteomics report

docking studies: Computational techniques for the exploration of the possible binding modes of a substrate to a given receptor, enzyme or other binding site. IUPAC Computational Related terms: drug design, QSAR  

domain shuffling: Creating new proteins by bringing domains together. It is thought that this is a major way that new proteins have arisen during evolution. Thus, mining of databases for homology by domains, rather than by whole proteins (which are not as evolutionarily conserved), is important in obtaining clues to functionality.  

A protein sequence can have more than one domain. Related term: multi- domain proteins.

energy function: Computationally, a shape is assigned to a protein sequence based on an empirical energy function. The lower the energy of a given structure, the more likely it is to be the correct fold. The structure prediction challenge is therefore divided into two: (1) The first challenge is the creation of many plausible folds or a set of structures that will include the native shape. The creation of the appropriate set depends on existing databases (such as the Protein Data Bank) or on the design of automated algorithms (using physical or statistical information) to generate plausible folds. Once the set is available, a selection procedure is used to ``fish'' out the correct fold. (2) The ``fishing'' of the plausible native shapes critically depends on the quality of the energy function. The value of the energy function must be the lowest for the native structure.  Opportunities in Molecular Biomedicine in the Era of  Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics  Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana- Champaign
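
The two-step scheme described above (build a set of candidate folds, then "fish" out the one with the lowest energy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the energy function is the simple hydrophobic-polar (HP) lattice contact count, not an empirical energy function used in real structure prediction, and the sequence and candidate conformations are made up.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]

def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Toy HP-lattice energy: -1 for every pair of hydrophobic (H) residues
    that sit on neighbouring lattice sites but are not adjacent in the chain."""
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:
                    energy -= 1
    return energy

def select_fold(sequence: str, candidates: Dict[str, List[Coord]]) -> str:
    """Step 2: return the name of the candidate with the lowest energy."""
    return min(candidates, key=lambda name: hp_energy(sequence, candidates[name]))

# Step 1 (assumed already done): a hypothetical set of plausible folds.
SEQUENCE = "HPPH"
CANDIDATES = {
    "extended": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "hairpin": [(0, 0), (1, 0), (1, 1), (0, 1)],
}

best = select_fold(SEQUENCE, CANDIDATES)
print(best, hp_energy(SEQUENCE, CANDIDATES[best]))
```

The selection step is only as good as the energy function: if the native shape does not receive the lowest value, the wrong candidate is returned, however complete the candidate set is.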

fold alignment: A critical step in homology modeling, because it provides the key structures for the model.  If suitably matched folds cannot be identified, a type of fold assignment known as protein threading can be used. 

fold recognition: Methods of protein fold recognition attempt to detect similarities between protein 3D structures that are not accompanied by any significant sequence similarity. There are many approaches, but the unifying theme is to try and find folds that are compatible with a particular sequence. Unlike sequence-only comparison, these methods take advantage of the extra information made available by 3D structure information. In effect, they turn the protein folding problem on its head: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence. Robert B. Russell, Guide to Structure Prediction "Fold recognition methods and links" Sept. 1999 
Related terms: threading; Protein structure: protein folding, protein folds 
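
The idea of scoring "how well a fold will fit a sequence" can be sketched with a deliberately crude, ungapped example. The assumptions are loud ones: each fold is reduced to a hypothetical string of buried (B) and exposed (E) positions, the fit score simply rewards hydrophobic residues at buried positions using the Kyte-Doolittle hydropathy scale, and the sequence and fold names are invented. Real fold recognition methods use far richer potentials and allow gaps.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def fit_score(sequence: str, burial_pattern: str) -> float:
    """Score a sequence against a fold given as a B/E pattern of equal length.
    Hydrophobic residues score well when buried, polar ones when exposed."""
    if len(sequence) != len(burial_pattern):
        raise ValueError("sequence and burial pattern must have equal length")
    score = 0.0
    for residue, environment in zip(sequence, burial_pattern):
        hydropathy = KYTE_DOOLITTLE[residue]
        score += hydropathy if environment == "B" else -hydropathy
    return score

# Hypothetical fold library: names and patterns are illustrative only.
FOLDS = {"alternating": "BEBEBE", "blocked": "BBBEEE"}
QUERY = "LKVEIK"

scores = {name: fit_score(QUERY, pattern) for name, pattern in FOLDS.items()}
best_fold = max(scores, key=scores.get)
print(best_fold, scores)
```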

foldedness: Methods for analyzing "foldedness" of expressed proteins include NMR and circular dichroism spectroscopies.

functional proteomics:  As the emerging field of proteomics continues to expand at an extremely rapid rate, the relative quantification of proteins, targeted by their function, becomes its greatest challenge. Complex analytical strategies have been designed that allow comparative analysis of large proteomes, as well as in depth detection of the core proteome or the interaction network of a given protein of interest. Functional Proteomics, Methods in Molecular Biology 2008

Functional proteomics is yielding large databases of interacting proteins and extensive pathways. Maps of these interactions are being scored and deciphered by novel high throughput technologies. However, traditional methods of screening have not been very successful in identifying protein-protein interaction inhibitors.  See also activity based protein profiling

Hidden Markov Models (HMM): Searching a protein sequence database for homologues is a powerful tool for discovering the structure and function of a sequence. Amongst the algorithms and tools available for this task, Hidden Markov model (HMM)-based search methods improve both the sensitivity and selectivity of database searches by employing position-dependent scores to characterize and build a model for an entire family of sequences. HMMs have been used to analyze proteins using two complementary strategies. In the first, a sequence is used to search a collection of protein families, such as Pfam, to find which of the families it matches. In the second approach an HMM for a family is used to search a primary sequence database to identify additional members of the family. The latter approach has yielded insights into proteins involved in both normal and abnormal human pathology. Lawrence Berkeley Lab, US "Advanced Computational Structural Genomics"

A widely used probabilistic model for data that are observed in a sequential fashion (e.g., over time). An HMM makes two primary assumptions. The first assumption is that the observed data arise from a mixture of K probability distributions. The second assumption is that there is a discrete-time Markov chain with K states, which is generating the observed data by visiting the K distributions in Markov fashion. The "hidden" aspect of the model arises from the fact that the state-sequence is not directly observed. Instead, one must infer the state-sequence from a sequence of observed data using the probability model. Although the model is quite simple, it has been found to be very useful in a variety of sequential modeling problems, most notably in SPEECH RECOGNITION (Rabiner 1989) and more recently in other disciplines such as computational biology (Krogh et al. 1994). MITECS Online MIT Encyclopedia of the Cognitive Sciences   Related term: simulated annealing
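
Inferring the hidden state sequence from the observed data is commonly done with the Viterbi algorithm. The sketch below is a generic two-state HMM, not a profile HMM of the kind used for protein families; the state names ("core", "loop"), the two-letter observation alphabet (h = hydrophobic, p = polar) and all probabilities are assumptions chosen for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for a non-empty observation
    sequence, working in log space to avoid numerical underflow."""
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor state for arriving in state s at time t.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Illustrative parameters (K = 2 states).
STATES = ("core", "loop")
START = {"core": 0.5, "loop": 0.5}
TRANS = {"core": {"core": 0.8, "loop": 0.2},
         "loop": {"core": 0.2, "loop": 0.8}}
EMIT = {"core": {"h": 0.8, "p": 0.2},
        "loop": {"h": 0.2, "p": 0.8}}

decoded = viterbi("hhhppp", STATES, START, TRANS, EMIT)
print(decoded)
```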

homeomorphic superfamilies: Protein families are clustered into "homeomorphic superfamilies". Sequences are homeomorphic if they can be aligned from end- to- end. In practice, we allow the amino and carboxyl ends to be ragged and moderate internal length variations (represented as gaps in the sequences). However, all members of the superfamily should have the same overall domain architecture, i.e., the same domains in the same order (except for domains missing due to alternative splicing or very recent genetic events). It is assumed, although in most cases this has not been investigated in detail, that the molecules in a homeomorphic superfamily share a common evolutionary history since the acquisition of their constituent domains. Thus, it should be valid to construct an evolutionary tree from the members of a homeomorphic superfamily. If two groups of proteins with the same architecture are shown to have come to that structure independently, they are appropriately separated into two homeomorphic superfamilies. PIR Classification Terminology, Georgetown Univ, revised 1998 

homology: Functional genomics

homology domains: Many types of domains have been found in diverse proteins. In common use, the term "immunoglobulin superfamily" refers to the collection of all proteins that contain an immunoglobulin- like domain. We call such a group a "homology domain superfamily". Any given protein sequence will be assigned to only one homeomorphic superfamily, but it may contain sequence segments belonging to several homology domain superfamilies.  PIR Classification Terminology, Georgetown Univ, revised 1998 

homology model: A model of a protein, whose three-dimensional structure is unknown, built from, e.g., the X-ray coordinate data of similar proteins or using alignment techniques and homology arguments.  IUPAC Computational  Related terms:  Sequencing alignment

homology modeling: This procedure, also termed comparative modeling or knowledge-based modeling, develops a three-dimensional model from a protein sequence based on the structures of homologous proteins. ... Care must be used in applying the term, "homology modeling." In fact, as noted above some authors prefer alternative names for the procedure. One must recognize that homology does not necessarily imply similarity. Homology has a precise definition: having a common evolutionary origin [6,7]. Thus, homology is a qualitative description of the nature of the relationship between two or more things, and it cannot be partial. Either there is an evolutionary relationship or there is not. An assertion of homology usually must remain an hypothesis. Supporting data for a homologous relationship may include sequence or three-dimensional similarities, the relationships between which can be described in quantitative terms.  David R. Bevan, Molecular Modeling of Proteins and Nucleic Acids, Dept. of Biochemistry, Virginia Tech, 1997-2003 

A computational method for determining the structure of a protein based on its similarity to known structures. The accuracy of structures determined by homology modeling depends largely on the degree of sequence similarity between the unknown and the known protein.  It is the most successful tool for prediction of protein structure from sequence, but with significant room for improvement.   
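
One simple quantitative measure of how closely a target sequence resembles a template of known structure is percent identity over a pairwise alignment. The sketch below assumes the alignment has already been produced by some other tool and uses made-up sequences; note that conventions for the denominator differ between programs, and here only columns without gaps are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over the ungapped columns of a
    pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are skipped under this convention
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical target/template alignment.
TARGET = "MKT-AYIAK"
TEMPLATE = "MKSLAYLAK"
identity = percent_identity(TARGET, TEMPLATE)
print(identity)
```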

Related terms: structural homology;  Sequencing glossary sequence homology; Proteins glossary hypothetical protein; In silico & Molecular Modeling  Compare with similarity

hypothetical proteins: Many of the gene products of completely sequenced organisms are “hypothetical” – they cannot be related to any previously characterized proteins – and so are of completely unknown function. ... As each [completely sequenced] organism’s genome is analyzed about one third of the observed open reading frames (ORFs), although conserved among several organisms, encode for “hypothetical” proteins that cannot be related to other proteins of known function or structure. Understanding the physiological function of the protein products of these so-called ‘orphan’ genes has emerged as a major challenge. E Eisenstein et al. “Biological function made crystal clear – annotation of hypothetical proteins via structural genomics” Current Opinion in Biotechnology 11(1): 25-30, Feb. 2000

Searching for hypothetical proteins: theory and practice based upon original data and literature. Lubec G, Afjehi-Sadat L, Yang JW, John JP. Prog Neurobiol. 2005 Sep-Oct;77(1-2):90-127. Epub 2005 Nov 4.  

All predicted protein  sequences lacking any significant sequence similarity to characterised proteins are labeled as ‘hypothetical proteins'. The majority of these cases come from the genome sequencing projects.  "SWISS- PROT" in Introduction to Molecular Biology Databases, R. Apweiler, R. Lopez, B. Marx, 1999 

in silico proteomics: Prediction of protein structure and function. [Gareth W. Roberts and Jonathan Swinton "In Silico Proteomics: Playing by the rules" Current Drug Discovery 5: Aug. 1, 2001]

integral membrane proteins:   See also under membrane proteins

interologs: Protein interaction maps have provided insight into the relationships among the predicted proteins of model organisms for which a genome sequence is available. These maps have been useful in generating potential interaction networks, which have confirmed the existence of known complexes and pathways and have suggested the existence of new complexes and/or crosstalk between previously unlinked pathways. However, the generation of such maps is costly and labor intensive. Here, we investigate the extent to which a protein interaction map generated in one species can be used to predict interactions in another species. LR Matthews "Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or 'Interologs'" Genome Research 11 (12): 2120-2126, Dec. 2001 
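
The transfer of an interaction map from one species to another can be sketched as a lookup through an ortholog table. This is a minimal illustration, not the procedure of the cited paper: the ortholog mapping is assumed to be given already (in practice it would come from sequence-based searches), and all protein identifiers are hypothetical.

```python
def predict_interologs(interactions, orthologs):
    """Map each interacting pair in the source species onto the target
    species; a pair is transferred only if both partners have an ortholog."""
    predicted = set()
    for a, b in interactions:
        if a in orthologs and b in orthologs:
            predicted.add(frozenset((orthologs[a], orthologs[b])))
    return predicted

# Hypothetical source-species interaction map and ortholog table.
SOURCE_INTERACTIONS = {("s1", "s2"), ("s2", "s3"), ("s3", "s4")}
ORTHOLOGS = {"s1": "t1", "s2": "t2", "s3": "t3"}  # s4 has no known ortholog

candidates = predict_interologs(SOURCE_INTERACTIONS, ORTHOLOGS)
print(sorted(sorted(pair) for pair in candidates))
```

The output is a set of candidate interactions only; each predicted pair would still need experimental confirmation in the target species.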

intrinsically disordered proteins IDPs:  Recent studies revealed that functional proteins without unique 3-D structures are highly abundant in nature. These intrinsically disordered proteins (IDPs) possess a number of crucial biological functions that are complementary to functions of structured (ordered) proteins. In any given organism, IDPs constitute a functionally broad and densely populated unfoldome; i.e., a set of unstructured proteins in a proteome. Being structurally and functionally very different from ordered proteins, IDPs require special experimental and computational tools for their identification and analyses.  Intrinsically Disordered Proteins  Unfoldome and Unfoldomics Gordon Research Conferences 2010 

location proteomics: Seeks to provide automated, objective high-resolution descriptions of protein location patterns within cells. Methods have been developed to group proteins into statistically indistinguishable location patterns using automated analysis of fluorescence microscope images. ... Preliminary work suggests the feasibility of expressing each unique pattern as a generative model that can be incorporated into comprehensive models of cell behaviour. RF Murphy, Location proteomics: a systems approach to subcellular location, Biochem Society Transactions, 33 (Pt 3): 535- 538, June 2005  

membrane proteins: Proteins which are found in membranes including cellular and intracellular membranes. They consist of two types, peripheral and integral proteins. They include most membrane- associated enzymes, antigenic proteins, transport proteins, and drug, hormone, and lectin receptors. MeSH, 1977   Narrower term: Drug & disease targets G-protein-coupled receptors GPCRs; 
Related term: Microarray categories membrane microarrays

membrane transport proteins: Membrane proteins whose primary function is to facilitate the transport of molecules across a biological membrane. Included in this broad category are proteins involved in active transport (BIOLOGICAL TRANSPORT, ACTIVE), facilitated transport and ION CHANNELS.  MeSH 2002 

Classification, glossary of membrane transport proteins, IUBMB International Union of Biochemistry and Molecular Biology, 2002, 13 definitions 

Are  there any transport proteins which are not membrane proteins?  Broader term: carrier proteins

membrane proteomics: Membrane proteins comprise the largest set of proteins to resist high-throughput structural genomics efforts. One of the major impediments to the analysis of membrane proteins is the lack of generic and effective expression systems. The aims of the membrane protein platform are to develop the methodologies to perform high-throughput cloning, expression, purification and crystallization of membrane proteins. To date, we have purified over 30 targets to homogeneity. This represents ~10% of the total number of genes cloned (compared to an average of 40% for similar efforts with soluble proteins). The proteins we have purified include several active prokaryotic and eukaryotic rhomboid proteases and human G protein-coupled receptors (GPCRs).  "Membrane Proteomics", A Edwards Labs, Univ. of Toronto, Canada, 2004 

Membrane proteins perform some of the most important functions in the cell, including the regulation of cell signaling through surface receptors, cell-cell interactions, and the intracellular compartmentalization of organelles. Recent developments in proteomic strategies have focused on the inclusion of membrane proteins in high-throughput analyses. While slow and steady progress continues to be made in gel-based technologies, significant advances have been reported in non-gel shotgun methods using liquid chromatography coupled to mass spectrometry (LC/MS).  Wu CC, Yates John R,  The application of mass spectrometry to membrane proteomics Nature Biotechnology 21(3): 262- 267, March 2003  
Related terms: membrane proteins, membranomics

Monte Carlo technique: A simulation procedure consisting of randomly sampling the conformational space of a molecule. IUPAC Computational  Broader term: simulation
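
A common variant of this procedure is Metropolis Monte Carlo, in which random trial moves are accepted or rejected according to their energy change so that low-energy conformations are sampled more often. The sketch below is a toy under stated assumptions: the "molecule" has a single torsion angle, the threefold energy function, temperature factor kT and step size are arbitrary illustrative choices, and the random seed is fixed only to make the run repeatable.

```python
import math
import random

def torsion_energy(phi: float) -> float:
    """Toy threefold torsional potential (arbitrary units), minimum 0."""
    return 1.0 + math.cos(3.0 * phi)

def metropolis(n_steps: int, kT: float = 0.6, max_step: float = 0.5, seed: int = 0):
    """Sample torsion angles with the Metropolis acceptance rule."""
    rng = random.Random(seed)
    phi = 0.0
    energy = torsion_energy(phi)
    samples = []
    for _ in range(n_steps):
        # Propose a random move and wrap the angle back into [-pi, pi].
        trial = phi + rng.uniform(-max_step, max_step)
        trial = (trial + math.pi) % (2.0 * math.pi) - math.pi
        trial_energy = torsion_energy(trial)
        delta = trial_energy - energy
        # Always accept downhill moves; accept uphill with Boltzmann probability.
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            phi, energy = trial, trial_energy
        samples.append(phi)
    return samples

samples = metropolis(5000)
energies = [torsion_energy(phi) for phi in samples]
print(min(energies), sum(energies) / len(energies))
```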

ontologies - proteomics: A principal aim of post- genomic biology is elucidating the structures, functions and biochemical properties of all gene products in a genome. However, to adequately comprehend such a large amount of information we need new descriptions of proteins that scale to the genomic level. In short, we need a unified ontology for proteomics. Much progress has been made towards this end, including a variety of approaches to systematic structural and functional classification and initial work towards developing standardized, unified descriptions for protein properties. In relation to function, there is a particularly great diversity of approaches, involving placing a protein in structured hierarchies or more- generalized networks and a recent approach based on circumscribing a protein's function through systematic enumeration of molecular interactions. N Lan, GT Montelione, M. Gerstein, Ontologies for proteomics: towards a systematic definition of structure and function that scales to the genome level, Current Opinion in Chemical Biology 7(1): 44- 54, Feb. 2003

ORFeome: -Omes & -omics glossary
peptide mapping, peptide maps: Maps, genomic & genetic
peptidomics: -Omes & -omics glossary

phylogenetic profiles: Phylogenomics glossary Can be used to hypothesize protein function.

orphan proteins: Those proteins that do not have significant sequence identity (>10%) with other known proteins.  Bioinformatics.Org general forum

phosphoproteome: Characterization of post-translational modifications in proteins is one of the major tasks that is to be accomplished in the post-genomic era. Phosphorylation is a key reversible modification that regulates enzymatic activity, subcellular localization, complex formation and degradation of proteins. DE Kalume et al., Tackling the phosphoproteome: tools and strategies, Current Opinion in Chemical Biology 7(1): 64-69, Feb. 2003
Ahn NG, Resing KA (2001) Toward the phosphoproteome. Nature Biotechnology 19: 317-318

phosphoproteomics: Developments in the field of phosphoproteomics have been fueled by the need simultaneously to monitor many different phosphoproteins within the signaling networks that coordinate responses to changes in the cellular environment. Marc Mumby, Deirdre Brekken, Phosphoproteomics: new insights into cellular signaling, Genome Biology 2005, 6:230     doi:10.1186/gb-2005-6-9-230  

post-translational modification identification: ExPASy Proteomics Tools lists a number of tools for prediction of post-translational modification, as do other websites. Identification of these modifications may provide important structural-functional information.

predicted proteins:  ORFs with no similarity to other sequences were named predicted proteins.   MIT, Broad Institute, Methanosarcina project information, 2004 

predictive proteomics: The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data needs caution like the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only potential features easily outnumber samples by 10^3 times, but it is easy to neglect information-leakage effects during preprocessing from spectra to peaks. Machine learning methods for predictive proteomics, Annalisa Barla et al., Briefings in Bioinformatics (2008) 9 (2): 119-128. doi: 10.1093/bib/bbn008 

probable protein: See under putative proteins

probable protein (similarity): When a protein exhibits extensive sequence similarity to a characterised protein and/ or has the same conserved regions then the label ‘probable' is used in the DE line. "SWISS- PROT" in Introduction to Molecular Biology Databases, R. Apweiler, R. Lopez, B. Marx, 1999  Related term: Protein categories putative protein

protein analysis sequencing: A process that includes the determination of AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence.  MeSH 2000

protein array analysis: Ligand-binding assays that measure protein- protein, protein- small molecule or protein- nucleic acid interactions using a very large set of capturing molecules, i.e., those attached separately on the solid support, to measure the presence or interaction of target molecules in the sample. MeSH 2003

protein bioinformatics:  Tools for Protein Informatics  • sequence and structure comparison  • multiple alignments • phylogenetic tree construction  • composition/pI/mass analysis • motif/pattern identification • 2° structure prediction/threading • TMD prediction/hydrophobicity analysis • homology modeling • visualization A Very very very short introduction to protein bioinformatics, Patricia Babbitt 2003  See also Proteomics protein informatics Is there a difference?  
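
Two of the tools listed above, composition analysis and hydrophobicity analysis, are simple enough to sketch in a few lines. The example below is only an illustration, not taken from the cited introduction: it uses the Kyte-Doolittle hydropathy scale with a sliding window, the window length is an arbitrary choice, and the function names and the input sequence are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def composition(seq):
    """Fraction of each residue type present in the sequence."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}

def hydropathy_profile(seq, window=9):
    """Mean hydropathy over each sliding window (one value per window start).

    Raises KeyError for characters outside the 20 standard residues.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

# Hypothetical sequence used only to exercise the functions.
example = "MKTLLILAVLAAVSG"
print(composition(example))
print(hydropathy_profile(example, window=9))
```

Runs of high window averages are the usual starting hint for transmembrane (TMD) segments. Dedicated predictors use more than a single scale.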

protein databases: Protein location can be determined by such genome- wide techniques as green fluorescent protein (GFP) tagging, and protein- protein interactions can be determined by affinity chromatography, immunoprecipitation and yeast two- hybrid experiments. Databases resulting from these methods are beginning to emerge, but they are of uncertain accuracy. Defining the Mandate of Proteomics in the Post- Genomics Era, Board on International Scientific Organizations, National Academy of Sciences, 2002

Dr. Stanley Fields, Professor of  Genetics and Medicine at the Univ. of Washington and developer of the yeast two hybrid system writes that protein databases "will need to become much more sophisticated if they are to help scientists make sense of the staggering number of experimental measurements that will soon emerge. ...  protein data will need to be integrated with results from expression profiling, genome- wide mutation or antisense analyses, and polymorphism detection. As proteomic data accumulate, we will become better at triangulating from multiple disparate bits of information to gain a bearing on what a protein does in the cell." S. Fields "Proteomics in Genomeland" Science 291: 1221-1224 Feb. 16, 2001  Related terms: protein identification, protein localization; Expression glossary expression profiling
Protein databases Databases & software directory

protein dynamics: Certain parts of a particular protein will be rigid, but others may be flexible and change their shape, even when bound. ... NMR has the unique ability to characterize protein fluctuations quantitatively, much more so than crystallography can. Understanding the function of a protein is fundamental for gaining insight into many biological processes. Proteins are stable mechanical constructs that allow certain internal motions to enable their biological function. Structural properties of a protein can be obtained with X-ray crystallography or NMR acquisition techniques. Molecular dynamics (MD) simulations at pico/ nano- second time scales output one or more trajectory files which describe the coordinates of each individual atom over time. The main problem with animating these trajectories is one of  temporal scale. Taking large time steps will destroy the impression of smooth motion, while small time steps will result in the camouflage of interesting motions. [Henk Huitema, Robert van Liere " Interactive Visualization of Protein Dynamics" ERCIM [European Research Consortium for Computers and Informatics] News No. 44 - January 2001]  

protein expression mapping: Maps, genetic & genomic 
protein expression profiling: Expression
protein folding problem: Protein structures  See also  protein structure prediction

protein function: The focus of the group is the understanding of protein function and evolution using genomic, structural and proteomic data. Central to this question is the concept of the domain: a structurally conserved, genetically mobile unit. When viewed at the three-dimensional level of protein structure, a domain is a compact arrangement of secondary structures connected by linker polypeptides. It usually folds independently and possesses a relatively hydrophobic core. The importance of domains is that they cannot be divided into smaller units; they represent a fundamental building block that can be used to understand the evolution and function of proteins...  The advent of complete genomic sequences, including more and more eukaryotes, is leading to a fundamental change in protein domain analysis. Having characterised most of the domain families and having developed tools to predict them, we can now start to analyse their function and evolution on a higher level. Protein Function Analysis Group, Max Planck Institute for Molecular Genetics, Germany  

Function is not a fixed property for many, if not most proteins. There are many ways that gene products can be altered to elicit modified or completely new functions. For example, there exist: alternative splicing, which may affect as many as ¼ or more of the genes in a higher eukaryote and can alter biochemical function either drastically or subtly, producing truncated proteins and proteins with different compositions; post- translational modification, such as phosphorylation and glycosylation (which can occur on numerous sites on the same protein); pre-enzymes made for secretion and pro- enzymes that are activated by cleavage; acylation and ubiquitination; and non- enzymatic modifications like oxidation, so that a given protein exists in the cell in different oxidized states. Defining the Mandate of Proteomics in the Post- Genomics Era, Board on International Scientific Organizations, National Academy of Sciences, 2002

More systematic attempts have been made to place proteins within a hierarchy of standard functional categories or to connect them in overlapping networks of varying types of associations.  These networks can obviously include protein- protein interactions ... More broadly, they can include pathways, regulatory systems and signaling cascades... Perhaps, in the future, the systematic combination of networks may provide for a truly rigorous definition of protein function. Mark Gerstein, et al. "Integrating Interactomes" Science 295 (5553): 284, Jan. 2002   

A biologically useful definition of the function of a protein requires a description at several different levels. To the biochemist, function means the biochemical role of an individual protein: if it is an enzyme, function refers to the reaction catalyzed; if it is a signaling protein, function refers to the interactions that the protein makes. To the geneticist or cell biologist, function includes these roles but will also encompass the cellular roles of the protein, such as the phenotype of its deletion, the pathway in which it operates, among others. A physiologist or developmental biologist may have an even broader view of function, including tissue specificity and expression during the life cycle of the organism. Gregory A Petsko, Dagmar Ringe "Overview: The Structural Basis of Protein Function" from Chapter 2 of Protein Structure and Function: New Science Press, 1991-2001 

In the expanded view of protein function, a protein is defined as an element in the network of its interactions. Various terms have been coined for this expanded notion of function, such as ‘contextual function’ or ‘cellular function’ … Whatever the term, the idea is that each protein in living matter functions as part of an extended web of interacting molecules … Often it is possible to understand the cellular functions of uncharacterized proteins through their linkages to characterized proteins. In broader terms, the networks of linkages offer a new view of the meaning of protein function, and in time should offer a deepened understanding of the function of cells. David Eisenberg et al "Protein function in the post- genomic era" Nature 405: 823- 826, 15 June 2000

The principal problem facing the post- genome era. Walter Blackstock & Malcolm Weir "Proteomics" Trends in Biotechnology: 121-134 Mar 1999   

Related terms: Protein categories interaction proteomics; Functional genomics glossary gene function, Gene Ontology™; Maps  cell mapping

protein identification: The analytical method used most commonly to visualize and identify large numbers of proteins is 2D-gel electrophoresis. One can theoretically visualize changes in protein production, both qualitatively and quantitatively, from two individual samples (e.g., a control preparation and a treated preparation). Furthermore, one can potentially accomplish protein identification by "picking" proteins from the 2D- gel and subjecting the highly purified protein to MALDI- TOF mass spectrometry.  "High-Throughput Genomics", CHI Genome Link 14.1    Related term: protein databases

protein informatics: The Protein Informatics Group currently consists of a collaboration between researchers at the Oak Ridge National Laboratory, the University of Missouri, and the University of Georgia. Our common interests are in development of computational tools for solving problems from molecular biology. Our work ranges from construction of mathematical/statistical models to development of algorithms to code implementation on various platforms to applications of computational tools to solve various bio-data analysis problems.  Protein Informatics Group, Computational Biology, Oak Ridge National Lab, US 

Computational biological research has become an essential component of biological research. The great quantity and diversity of the data being generated by different technologies is daunting, and impossible to organize or oversee without computational assistance. In functional genomics, a great deal of effort has been devoted to developing community- based standards for reporting gene expression data to allow others to replicate experiments. The same will need to be done for proteomics to validate across the different technologies. Perhaps never before has a bioinformatics problem of this magnitude been approached. Without effective and integrated databases to store and retrieve these data and advanced computational methods such as pattern recognition and other machine learning approaches to analyze and interpret them, the full implications of these data will not be realized. Defining the Mandate of Proteomics in the Post- Genomics Era, Board on International Scientific Organizations, National Academy of Sciences, 2002

Although mining of protein structure homology data is a relatively small field now, it is likely to experience dramatic growth and to become pivotal in the ultimate exploitation of genomic data and tools.   Related terms: proteoinformatics; Algorithms glossary;  protein bioinformatics; In Silico & molecular modeling glossary

protein interactions: Narrower terms:  protein DNA interactions, protein protein interactions, protein RNA interactions   Related terms: annotation- proteins, binary interaction, interaction proteomics, protein networks; -Omes & -omics glossary interactome

protein interaction mapping: Maps genomic & genetic 
protein linkage maps: Maps genomic & genetic 

protein & mRNA data: Although the relationship between  mRNA and protein levels is vague for individual genes, some of the statistics for broad categories of protein properties are much more robust... In contrast to the differences between mRNA and protein data for individual genes, the broad categories show that the transcriptome and translatome populations are remarkably similar; both contain roughly the same proportions of secondary structure and functional categories. Moreover, this contrasts the difference with the genome, which appears to have a distinctly different composition of functional categories. This illustrates that we get a more consistent picture when we average across the population, i.e. there is broad similarity between the characteristics of highly expressed mRNA and highly abundant proteins.  Dov Greenbaum, Mark Gerstein et al. "Interrelating Different Types of  Genomic Data" Dept. of Biochemistry and Molecular Biology, Yale Univ. 2001  Related terms: Expression glossary; Genomics glossary genome data; functional genomics data Omes & omics transcriptome, translatome

protein networks: The individual steps in signal transduction pathways involve protein interactions with target molecules that may be other proteins, small molecules or DNA. Identifying all of the proteins that take part in a given class of interactions, on a genome-wide scale, remains an extremely challenging task. We propose to apply mRNA display (1, 2) technology to this problem, with the goal of developing databases of protein-ligand interactions that will add value to the existing and growing sequence databases. PI Jack Szostak, Definition of Protein Networks using mRNA display,  ParaBioSYs, MGH, HMS, BU  

protein sequence: A process that includes the determination of an amino acid sequence of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence.  MeSH, 2002  See also amino acid sequence.

protein sequence space:  [J.] Maynard-Smith's (1970. Natural Selection and the concept of a protein space. Nature 225: 563- 564) concept of a "protein sequence space" in which each site in an alignment is represented on its own axis and the number of axes required to represent all conceivable variants for a protein is equal to the number of sites in its sequence. Each sequence occupies a unique point in this space; variants differing at one site are adjacent (Hamming) neighbours. The collection of all viable sequence variants for a particular protein forms a localized interconnected `neighbourhood' of points within the space. This representation has proved conceptually intuitive and analytically powerful  ... In protein sequence space, constraints are reflected in the multidimensional shape of the cluster of points that make up the "neighbourhood" of variants viable for a specific protein. The boundary defining the edge of this neighbourhood is characteristic of the protein's function and can be thought of as its functional "signature".  Gavin JP Naylor, "Measuring Shifts In Function and Evolutionary Opportunity Using Variability Profiles: A Case Study of the Globins" also Journal of Molecular Evolution 51 (3): 223-233 Sept. 2000
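
The geometry described above (one axis per site, single-site variants as adjacent Hamming neighbours) can be made concrete with a short sketch. The function names and the three-residue example are hypothetical, and the alphabet is assumed to be the 20 standard amino acids.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard residues

def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def neighbours(seq):
    """Yield every sequence that differs from `seq` at exactly one site."""
    for i, residue in enumerate(seq):
        for alt in AMINO_ACIDS:
            if alt != residue:
                yield seq[:i] + alt + seq[i + 1:]

point = "MKV"  # a point in a 3-axis sequence space
adjacent = list(neighbours(point))
print(len(adjacent))            # 19 alternatives at each of 3 sites
print(hamming(point, "MRV"))    # differs at one site only
```

A sequence of length n therefore has 19n immediate neighbours. The "neighbourhood" of viable variants in the quoted definition is the subset of such points, connected through single-site steps, that still supports the protein's function.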

protein sorting signals: Amino acid sequences found in transported proteins that selectively guide the distribution of the proteins to specific cellular compartments. MeSH, 2001
Protein Spotlight, Swiss-Prot   One month, one protein 

Protein Structure Initiative: Aims at determination of the 3D structure of all proteins. This aim can be achieved in four steps: Organize known protein sequences into families;  Select family representatives as targets; Solve the 3D structure of targets by X-ray crystallography or NMR spectroscopy; Build models for other proteins by homology to solved 3D structures.          
Protein Structure Initiative NIGMS

protein structure prediction: Methods for protein structure prediction have matured to the point where models produced by prediction algorithms can be used to understand and test hypotheses about biological function. The goal of this community wide effort is to provide structural and functional insights into biologically important proteins, particularly those that are intractable to experimental structural determination. Ten Most Wanted, Critical Assessment of Techniques for Protein Structure Prediction, CASP,  Lawrence Livermore National Lab, US

Protein 3D structures are encoded by a linear sequence of amino acid residues. To predict 3D structure from sequence is a task challenging enough to have occupied a generation of researchers. Have we finally succeeded? The bad news is: we still cannot predict structure for any sequence. The good news is: we have come closer, and growing databases facilitate the task. A solution of the structure prediction problem would supposedly change experimental molecular biology more than any other theoretical method. We may witness such a break- through in the near future. However, the lessons from the Asilomar prediction contests were that we may need a common frame- work to co- ordinate the efforts of the researchers in the field. "Neural networks for protein structure prediction: hype or hit?" Burkhard Rost, Dec. 1999 

Involves primary sequence alignment, secondary and tertiary structure prediction and homology modelling. Narrower term: ab initio protein structure prediction 

protein taxonomy: A Protein Taxonomy Based on Secondary Structure T. Przytycka, R. Aurora, GD Rose, Nature Structural Biology 6 (7): 1999.

protein threading: See threading

proteome informatics: Peer Bork and David Eisenberg, "Genome and proteome informatics" Current Opinion in Structural Biology 10 (3): 341-342, 2000 
Proteome Informatics group is part of the Swiss Institute of Bioinformatics (SIB). It is in charge of research and development in the fields of bioinformatics, molecular imaging and the use of Internet for biomedical applications.  Current Projects and People, ExPASy, Swiss Institute of Bioinformatics  

proteome map: Maps, genomic & genetic glossary  

proteome mining: We present the development and application of a new machine-learning approach to exhaustively and reliably identify major histocompatibility complex class I (MHC-I) ligands among all 20^8 octapeptides and in genome-derived proteomes of Mus musculus, influenza A H3N8, and vesicular stomatitis virus (VSV).  Exhaustive Proteome Mining for Functional MHC-I Ligands, ACS Chem. Biol. 2013, 8 (9), pp 1876–1881 DOI: 10.1021/cb400252t
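
To give a sense of scale, the size of the octapeptide space (20 amino acids at each of 8 positions, i.e. 20^8) and the enumeration of candidate octapeptides from a protein can be sketched as follows. This is only the bookkeeping step, not the machine-learning approach of the cited paper. The function names and the short input sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def peptide_space_size(length, alphabet=AMINO_ACIDS):
    """Number of distinct peptides of the given length over the alphabet."""
    return len(alphabet) ** length

def octapeptides(protein, k=8):
    """Yield every overlapping k-residue window of a protein sequence."""
    for i in range(len(protein) - k + 1):
        yield protein[i:i + k]

print(peptide_space_size(8))               # 20 ** 8
print(list(octapeptides("MKTAYIAKQR")))    # three overlapping windows
```

A protein of length n contributes at most n - 7 octapeptides, so even a whole proteome samples only a small part of the full space. Each window would then be scored by whatever ligand predictor is in use.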

Related term: proteome database mining

proteomic analysis: Systematic and quantitative analysis of the properties that define protein activity and functions within a defined context, essential for biology and medicine. Ruedi Aebersold quoted in Defining the Mandate of Proteomics in the Post- Genomics Era, National Academies Press, 2002

A systematic analysis of proteins for their identity, quantity and function. J Peng and Steven Gygi, Proteomics: the move to mixtures, Journal of Mass Spectrometry 35: 1083- 1091, 2001  

Proteomic Standards Initiative PSI: The HUPO Proteomics Standards Initiative (PSI) defines community standards for data representation in proteomics to facilitate data comparison, exchange and verification.  Proteomic Standards Initiative, HUPO 

putative proteins: Some similarity to one or more existing entries: it is in this category that the adjective "putative" comes into play. For these cases, again there is no experimental proof that the protein exists and there is only limited evidence to point the protein to a particular family. Again, we have no fixed rules on what is "limited" and what isn't. It is a judgment that we make based on which family it is and which, if any, areas are conserved. A primer on UniProtKB/Swiss-Prot annotation Name: ANNBIOCH.TXT Release: 54.0 of 24-Jul-2007 

The label ‘putative' is used in the DE [descriptor] line of proteins that exhibit limited sequence similarity to characterised proteins. These proteins often have a conserved site e.g. ATP-binding site but no other significant similarity to a characterised protein. It is most frequently used for sequences from genome projects.  The assignment of the labels ‘probable' and ‘putative' is dependent primarily on the results of sequence similarity searches against SWISS- PROT. It is important to point out here that no specific cut- off point is used to assign a protein as ‘putative' or ‘probable'.  "SWISS- PROT" in Introduction to Molecular Biology Databases, R. Apweiler, R. Lopez, B. Marx, 1999  Related term probable protein (similarity)

quantitation - proteins: It is likely that in the near future, researchers will continue to use comprehensive gene arrays at the start of their work, to generate hypotheses and narrow their research questions. Then, they might delve deeper into these questions by using non-array- based gene expression studies (to get better quantitation and true relative expression) or go to a focused protein array that covers most of the proteins that are indicated based on the gene array experiments.  "Proteomewide chips - not so fast" CHI's GenomeLink 21.2 

regulatory homology: Quantitative analysis of protein expression data obtained by high - throughput methods has led us to define the concept of "regulatory homology" and use it to begin to elucidate the basic structure of gene expression control in vivo. N. Leigh Anderson, Norman G. Anderson "Proteome and proteomics; New technologies, new concepts, and new words" Electrophoresis 19(11):1853-61 August 1998  

regulatory proteins: A detailed understanding of the interplay between regulating proteins and DNA targets is required to interpret transcriptomic data and to model the dynamics of genetic networks. Two key problems in this respect are the control of protein traffic on DNA and the combined effects of several regulating proteins operating on the same target gene.   [International Workshop on Regulatory Proteins Interplay and Traffic on DNA, July 12-13, 2002, Evry, France] 

reverse proteomics: In reverse proteomics, the starting point is the DNA sequence of the genome of an organism. First, the transcriptome (complete set of transcripts) and proteome (complete set of proteins) are predicted in silico and subsequently this information is used to generate reagents for their analysis. Marc Vidal, AJ Walhout, "Protein Interaction Maps for Model Organisms" Nature Reviews Molecular Cell Biology 2: 55- 63, Jan. 2001

Compounds can be tested to see if they can disrupt protein - protein interactions - a strategy that may be extremely useful for the development of new drugs. [Wellcome Trust, UK "The Human Genome Functional Genomics"]  

RNA structural genomics: The systematic determination of all macromolecular structures represented in a genome, is focused at present exclusively on proteins. It is clear, however, that RNA molecules play a variety of significant roles in cells, including protein synthesis and targeting, many forms of RNA processing and splicing, RNA editing and modification, and chromosome end maintenance. To comprehensively understand the biology of a cell, it will ultimately be necessary to know the identity of all encoded RNAs, the molecules with which they interact and the molecular structures of these complexes. This report focuses on the feasibility of structural genomics of RNA, approaches to determining RNA structures and the potential usefulness of an RNA structural database for both predicting folds and deciphering biological functions of RNA molecules. [Jennifer A. Doudna "Structural Genomics of RNA" Nature Structural Biology  7 (11) supp: 954-956 (Nov. 2000)] 

Rosetta stone method: A way of looking at the correlation of protein domains across species. Some proteins have homologs that are fused in other species, yielding clues as to the proteins with which they might interact. In addition, proteins that have been identified in particular complexes and pathways hint at the location and function of their homologs in other species. S. Spengler “Bioinformatics in the information age” Science 287 (5451): 221- 223 Feb. 18, 2000  Related term: Phylogenomics glossary phylogenetic profiles
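
The domain-fusion idea above can be sketched with toy data: two separate proteins in one species are flagged as candidate interaction partners when another species carries a single protein containing the domains of both. All protein names, domain identifiers and the function name below are hypothetical. A real analysis would derive domain content from sequence alignment and would need to filter out promiscuous domains.

```python
from itertools import combinations

def rosetta_stone_links(species_a, species_b):
    """Return (protein_1, protein_2, fused_protein) candidate links.

    species_a, species_b: dict mapping protein name -> set of domain ids.
    Two proteins of species_a are linked when their non-overlapping domain
    sets are both contained in a single (fused) protein of species_b.
    """
    links = set()
    for fused, fused_domains in species_b.items():
        # Proteins of A whose domains are a proper subset of the fused protein.
        parts = [
            name for name, domains in species_a.items()
            if domains and domains < fused_domains
        ]
        for p, q in combinations(sorted(parts), 2):
            if not species_a[p] & species_a[q]:  # require distinct domains
                links.add((p, q, fused))
    return links

# Hypothetical annotations for two species.
species_a = {"protA": {"dom1"}, "protB": {"dom2"}, "protC": {"dom3"}}
species_b = {"fusedAB": {"dom1", "dom2"}, "single": {"dom3"}}

print(rosetta_stone_links(species_a, species_b))
```

Here protA and protB are linked through fusedAB, while protC has no fusion partner and is left unlinked.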

sequence homology, amino acid: The degree of similarity between sequences of amino acids. This information is useful for the understanding of genetic relatedness of certain species. MeSH, 1993

signal transduction: Metabolic engineering glossary

similarity: Quantity that indicates, for example, the percentage of identical amino acids between two sequences. Similarity is an observed quantity that might, for example, be expressed in percent of residues that are similar between two aligned sequences. Similarity is a bad measure, because it is subjective. The author of the software decides whether Gln and Asp are similar or not. The percentage identity is a much better measure. There is an important difference between similarity and homology. Similarity is a value between 0.0 and 1.0, or between 0 and 100%. On the other hand, there are no degrees of homology. The sequences are either homologous or not.  Center for Molecular and Biomolecular Informatics, Dictionary, Univ. of Nijmegen, Netherlands, 2001 
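
The contrast drawn above between percentage identity and percentage similarity can be shown in a few lines. This is a minimal sketch with assumptions of its own: the two sequences are already aligned, columns containing a gap are excluded from the denominator (conventions differ between tools), and the similarity groups are supplied by the caller, which is exactly the subjective choice the definition warns about. Function names and sequences are hypothetical.

```python
def aligned_pairs(a, b, gap="-"):
    """Residue pairs from two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a, b) if x != gap and y != gap]

def percent_identity(a, b):
    """Percentage of ungapped aligned columns holding identical residues."""
    pairs = aligned_pairs(a, b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def percent_similarity(a, b, groups):
    """Percentage of ungapped columns that are identical or share a group.

    `groups` is a list of residue sets the caller chooses to call 'similar'.
    """
    pairs = aligned_pairs(a, b)
    if not pairs:
        return 0.0

    def similar(x, y):
        return x == y or any(x in g and y in g for g in groups)

    return 100.0 * sum(similar(x, y) for x, y in pairs) / len(pairs)

seq1, seq2 = "ACQE", "ACDE"
print(percent_identity(seq1, seq2))                  # fixed by the alignment
print(percent_similarity(seq1, seq2, [{"Q", "D"}]))  # Gln and Asp called similar
print(percent_similarity(seq1, seq2, []))            # no groups: equals identity
```

The identity value depends only on the alignment, whereas the similarity value moves with the grouping that is passed in.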

structural bioinformatics: Involves the process of determining a protein's three- dimensional structure using comparative primary sequence alignment, secondary and tertiary structure prediction methods, homology modeling, and crystallographic diffraction pattern analyses. Currently, there is no reliable de novo predictive method for protein 3D- structure determination. Over the past half- century, protein structure has been determined by purifying a protein, crystallizing it, then bombarding it with X-rays. The X-ray diffraction pattern from the bombardment is recorded electronically and analyzed using software that creates a rough draft of the 3D structure. Biological scientists and crystallographers then tweak and manipulate the rough draft considerably. The resulting spatial coordinate file can be examined using modeling- structure software to study the gross and subtle features of the protein's structure. Christopher Smith "Bioinformatics, Genomics, and Proteomics"  Scientist 14[23]:26, Nov. 27, 2000  Related terms Algorithms In silico & Molecular Modeling.

Structural Biology Industrial Platform: Fifteen companies, including representatives of some of Europe's largest pharmaceutical industries, have formed the Structural Biology Industrial Platform to work with each other, the European Commission and Research Centres in Europe to promote structural biology research, training and development.

structural genomics: Focuses on the physical aspects of the genome through the construction and comparison of gene maps and sequences, as well as gene discovery, localization, and characterization. Brush up on your 'omics, Chemical & Engineering News, 81(49): 20, Dec. 2003 

The fast-developing fields of structural and functional genomics -- studies of proteins encoded by the entire genome -- are being brought to bear on the problem of understanding the root of many cancers. A protein's structure can tell researchers much about its function, information that ultimately is needed to understand a protein's link to cancer. By determining the detailed, three- dimensional structure of proteins, researchers are better able to understand how each protein functions normally and how faulty protein structures can cause disease. David Brand, MacCHESS moves into cancer research through structural genomics, Cornell, 2001 

Involves quickly determining the 3D structures of large numbers of proteins (or other complex biological molecules, such as nucleic acids), ultimately accounting for an organism’s entire proteome. Footnote: As traditionally defined, the term structural genomics referred to the use of sequencing and mapping technologies, with bioinformatic support, to develop complete genome maps (genetic, physical, and transcript maps) and to elucidate genomic sequences for different organisms, particularly humans. Now, however, the term is increasingly used to refer to high- throughput methods for determining protein structures

Many of the criticisms leveled at the Human Genome Project in the mid- 1980’s have been redirected toward structural genomics. Unlike high- throughput genome sequencing, it is not a simple matter to decide when a structural genomics effort has reached completion. SK Burley et al “Structural genomics: beyond the Human Genome Project” Nature Genetics 23: 151 Oct. 1999 Related term: structural proteomics
A good explanation of structural genomics  
Joint Center for Structural Genomics 
Human Proteome/Structural Genomics Pilot Project, Brookhaven National Laboratory, US   A pilot project to examine the feasibility of  high-throughput determination of 3-dimensional structures of proteins by x-ray crystallography, starting from genome sequences.
Human Proteomics Initiative, Swiss Institute of Bioinformatics, European Bioinformatics Institute   A major project to annotate all known human sequences according to the quality standards of Swiss- Prot. This means providing, for each known protein, a wealth of information that includes the description of its function, its domain structure, subcellular location, post- translational modifications, variants, similarities to other proteins, etc. 
Structural Genomics Initiative,
Structural genomics databases Databases & software directory.

structural homology: Identify 3D structures of proteins or domains in the same family as a sequence of interest.  
Related terms: homology Functional genomics glossary
homology modeling Molecular modeling glossary

structural homology protein: The degree of 3-dimensional shape similarity between proteins. It can be an indication of distant AMINO ACID SEQUENCE HOMOLOGY and used for rational DRUG DESIGN. MeSH 2003

structural proteomics: is focused on the determination of three-dimensional (3D) structures of annotated and un-annotated proteins … emerged from the simultaneous developments of rapid and parallel methodologies in gene cloning, protein purification, and 3D structure determination and recent results have demonstrated the feasibility and importance of this approach for functional annotation. The classic work by Zarembinski et al. (1998, PNAS, 95: 15189-15193) represents one possible outcome of the structural analysis of an unknown protein, in which a protein-bound ligand or cofactor was discovered. Such information is the most useful for functional annotation because it identifies the nature of the ligand, the ligand-binding site and the disposition of catalytic residues, from which a catalytic mechanism can be postulated. Other sources of structure-derived information come by identifying structural homologues in databases or local structural motifs or putative catalytic sites. Our strategy involves the use of X-ray crystallography and NMR spectroscopy to determine the structures of hypothetical proteins. Yeast Integrative Biology Project  Genome Canada funded, University of Toronto

Sometimes referred to as structural genomics, this discipline involves determining the 3D structures of large numbers of proteins, ultimately accounting for an organism's entire proteome. It adds critical information in at least two points in the drug discovery pathway: (1) target identification, or selecting a pathway in which a drug might function, and (2) medicinal chemistry, or the actual design of compounds to modulate this pathway. A high-throughput, system wide means of determining gene function. It typically involves using high- throughput X-ray diffraction methods to determine the structure of proteins encoded by at least one member of each gene family in the genome. This approach is coupled with the use of bioinformatics as a tool in structural proteomics and computational modeling to determine structures of other proteins in the same family. Conversely, an important goal of structural proteomics is the creation of databases of structures. When asked to identify bottlenecks in the structural proteomics field, several academic and industry scientists pointed to the need for faster and more reliable protein production and purification strategies, rather than stronger beams at the X-ray crystallization step.  

structure based design:  A design strategy for new chemical entities based on the three- dimensional (3D) structure of the target obtained by X-ray or nuclear magnetic resonance (NMR) studies, or from protein homology models. IUPAC Computational

structure-based drug design:  Structure-based drug design took nearly two decades of multiple, parallel technological improvements to arrive at its current mainstream position in medicinal chemistry. Developments in computer graphics, high-power radiation sources, computational processing power, refinement protocols, virtual screening and crystallography were all necessary to create the environment for rapid, iterative structure-based drug discovery.  Given the crisis facing the pharmaceutical industry in the translation of early-stage drug discovery results, a different set of tools, concerned with algorithms and methods for predicting biological profiles, will need to be refined.  

structure from sequence: See protein structure prediction, structural homology

structure prediction problem:  The protein secondary structure prediction problem has become a classic, challenging problem for the artificial-intelligence and machine learning community. Virtually every conceivable computational technique in these fields (e.g., information theory [6, 12, 13], artificial neural networks [15, 20, 22], cascaded networks [18, 19, 27], hybrid systems [28], nearest neighbor methods [21], hidden Markov chains [4], machine learning [17, 25], mutual information [26]) has been applied in the context of protein structure prediction. The reason for this attention is well-founded and clear: If protein structure, even secondary structure, can be accurately predicted from the now abundantly available gene and protein sequences, such sequences become immensely more valuable for the understanding of drug design, the genetic basis of disease, the role of protein structure in its enzymatic, structural, and signal transduction functions, and basic physiology from molecular to cellular, to fully systemic levels. In short, the solution of the protein structure prediction problem (and the related protein folding problem) will bring on the second phase of the revolution. Peter Munson et al. "Protein Secondary Structure Prediction", NIH, 1994

SWISS-PROT: Databases & software directory

threading: In this approach, a target sequence is “threaded” through a library of 3D folds to try to find a match.  This method is used when no sequence of known structure is clearly related to the target sequence.  
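
As a rough illustration of the idea, the Python sketch below scores a sequence against each entry in a tiny fold library and reports the best match. Everything in it is an assumption made for illustration: each fold is reduced to a string of buried (B) or exposed (E) positions, the scoring rule is a crude hydrophobicity match, alignments are ungapped, and the fold names are hypothetical. Real threading programs use much richer structural environments, statistical potentials and gapped alignment.

```python
# Toy threading sketch: slide a sequence along each fold's environment
# string and keep the best compatibility score. Data are hypothetical.

HYDROPHOBIC = set("AVILMFWC")  # assumption: a crude hydrophobic residue set


def residue_score(residue, environment):
    """Return +1 if the residue suits the environment, else -1.

    Hydrophobic residues are assumed to prefer buried ('B') positions and
    all other residues to prefer exposed ('E') positions.
    """
    is_hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 1 if is_hydrophobic else -1
    return -1 if is_hydrophobic else 1


def thread(sequence, fold_environments):
    """Best ungapped score of the sequence along one fold, or None if too long."""
    n, m = len(sequence), len(fold_environments)
    if n > m:
        return None  # the sequence cannot fit this fold without gaps
    best = None
    for offset in range(m - n + 1):
        score = sum(
            residue_score(res, fold_environments[offset + i])
            for i, res in enumerate(sequence)
        )
        if best is None or score > best:
            best = score
    return best


def best_fold(sequence, fold_library):
    """Return (fold_name, score) for the highest-scoring fold, or None."""
    scored = []
    for name, environments in fold_library.items():
        score = thread(sequence, environments)
        if score is not None:
            scored.append((name, score))
    if not scored:
        return None
    return max(scored, key=lambda item: item[1])


# Hypothetical two-entry fold library.
FOLD_LIBRARY = {
    "fold_alpha": "EEBBEEBBEE",
    "fold_beta": "BEBEBEBEBE",
}

if __name__ == "__main__":
    print(best_fold("KVKVKV", FOLD_LIBRARY))
```

The alternating sequence KVKVKV fits the alternating exposed/buried pattern of the hypothetical fold_beta better than fold_alpha, which is the kind of sequence-to-fold compatibility signal threading relies on when sequence similarity alone is uninformative.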

virtual genomes: A distributed computing project to use protein design to generate new "virtual genomes."  Our project, Genome@home, studies real genomes and proteins directly, by designing new sequences for existing 3-D protein structures, which come from real genomes. The protein structure files that are sent out as work contain the Cartesian atomic coordinates of a protein. This data was obtained experimentally through X-ray crystallography or NMR techniques. Note that this was not done by us; thousands of scientists have spent decades compiling this data, which is generously made freely available to the public. By designing new sequences that could form these specific protein structures, we're setting the stage to attack a number of significant contemporary issues in structural biology, genetics, and medicine. Vijay Pande, Pande Group Projects, Stanford Univ. US

whole proteome analysis: Proteome analysis has become indispensable and complementary to genomic analysis. With access to whole genome sequences from various organisms and with the imminent completion of many more, the SWISS-PROT group at EBI has developed a research-oriented initiative that utilises many of the existing resources and provides comparative analysis of the predicted protein coding sequences of all complete genomes. Rolf Apweiler "Whole Proteome Analysis: The role of InterPro and CluSTr" Plant & Animal Genome IX, San Diego CA  Jan. 13-17, 2001  

whole proteome interaction mining: A major post-genomic scientific and technological pursuit is to describe the functions performed by the proteins encoded by the genome. One strategy is to first identify the protein-protein interactions in a proteome, then determine pathways and overall structure relating these interactions, and finally to statistically infer functional roles of individual proteins. Although huge amounts of genomic data are at hand, current experimental protein interaction assays must overcome technical problems to scale up for high-throughput analysis. In the meantime, bioinformatics approaches may help bridge the information gap required for inference of protein function. JR Bock, DA Gough, Whole-proteome interaction mining, Bioinformatics 19(1):125-134, Jan. 2003
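
The last step of the strategy above, inferring a protein's role from its interaction partners, can be sketched in Python as a simple neighbor vote. This is not the method of the cited paper: it is an assumed "guilt by association" stand-in, it skips the pathway-determination step entirely, and the protein identifiers and function labels are hypothetical.

```python
# Minimal sketch: predict the function of unannotated proteins from the
# annotations of their interaction partners (majority vote).
from collections import Counter, defaultdict


def build_graph(interactions):
    """Turn a list of (protein_a, protein_b) pairs into an adjacency map."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def infer_functions(interactions, known_functions):
    """Return {protein: (predicted_function, vote_fraction)}.

    Only proteins without a known function and with at least one annotated
    partner receive a prediction. Ties are broken alphabetically so the
    result is deterministic.
    """
    graph = build_graph(interactions)
    predictions = {}
    for protein in sorted(graph):
        if protein in known_functions:
            continue
        votes = Counter(
            known_functions[partner]
            for partner in graph[protein]
            if partner in known_functions
        )
        if not votes:
            continue
        top = max(votes.values())
        label = min(f for f, count in votes.items() if count == top)
        predictions[protein] = (label, top / sum(votes.values()))
    return predictions


if __name__ == "__main__":
    # Hypothetical interaction list and annotations.
    pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P5", "P4")]
    known = {"P2": "kinase signaling", "P3": "kinase signaling", "P4": "transport"}
    for protein, (function, fraction) in infer_functions(pairs, known).items():
        print(protein, function, round(fraction, 2))
```

The vote fraction is only a rough indication of agreement among partners; a statistical treatment would also account for how common each function is across the proteome.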

Joint Center for Structural Genomics Technologies 
Nature Structural Biology, Structural genomics supplement, Nov. 2000 journal/v7/n11s/index.html

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.

