
Structural Genomics Glossary & taxonomy
Evolving Terminology for Emerging Technologies

Comments? Revisions? Suggestions? Mary Chitty  mchitty@healthtech.com
Last revised March 28, 2008 

The determination that the human genome comprises only approximately 35,000 genes - not 60,000 to 100,000 as previously thought - has  directed even more attention to the role of proteins and, therefore, to the field of structural genomics.  One goal of this field is to reveal the structures of all the key “functional” sites of any human protein, information that should make it much easier to develop highly specific drugs, thus leading to more effective, and safer, pharmaceuticals. 

Related glossaries
Applications: Functional Genomics; Proteomics
Informatics: Algorithms; In silico & Molecular Modeling
Technologies: Mass spectrometry; NMR & X-Ray Crystallography
Biology: Protein Structures; Proteins

ab initio: From the beginning (Latin)

ab initio modeling: In silico & molecular Modeling glossary

ab initio protein modeling: Predict 3D structure from sequence without using a homologous model/template; this technology is not at the stage of being broadly applicable to drug discovery. [CHI Structural proteomics report]

Ab initio methods use the physicochemical properties of the amino acid sequence of a protein to literally calculate a 3D structure (lowest energy model) based on protein folding. As opposed to determining the structure of an entire protein, ab initio methods are typically used to predict and model protein folds (domains). This method is gaining considerably, in part due to the development of novel mathematical approaches, a boost in available computational resources (for example, tera- and petaFLOPS supercomputers), and considerable interest from researchers investigating protein-ligand (or drug) interactions.   [Christopher Smith "Bioinformatics, Genomics, and Proteomics"  Scientist 14[23]:26, Nov. 27, 2000] http://the-scientist.com/yr2000/nov/profile_001127.html

Related terms: protein structure prediction

ab initio structure prediction: Prediction of a protein’s structure based on amino acid sequence alone — that is, without mapping the structure to structures of known sequences. 

Broader term: protein structure prediction

atomic resolution data: NMR & X-ray crystallography

biological function: Functional genomics glossary

CASP: Critical Assessment of Techniques for Protein Structure Prediction [Protein Structure Prediction Center, Lawrence Livermore National Lab, US]  http://predictioncenter.llnl.gov/  Links to CASP meeting results and information on the "Ten most wanted" proteins solicitation.

comparative modeling: See homology modeling.

evolutionary homology: Functional genomics glossary

fold alignment: A critical step in homology modeling, because it provides the key structures for the model.  If suitably matched folds cannot be identified, a type of fold assignment known as protein threading can be used. 

fold recognition: Methods of protein fold recognition attempt to detect similarities between protein 3D structures that are not accompanied by any significant sequence similarity. There are many approaches, but the unifying theme is to try to find folds that are compatible with a particular sequence. Unlike sequence-only comparison, these methods take advantage of the extra information made available by 3D structure information. In effect, they turn the protein folding problem on its head: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence. Robert B. Russell, Guide to Structure Prediction "Fold recognition methods and links" Sept. 1999 http://www.bmm.icnet.uk/people/rob/CCP11BBS/foldrec.html

Related terms: threading; Protein structure glossary: protein folding, protein folds

foldedness: Methods for analyzing "foldedness" of expressed proteins include NMR and circular dichroism spectroscopies.

granularity: Computers & computing glossary

Hidden Markov Models HMM: In silico & Molecular modeling glossary

homology model: A model of a protein, whose three-dimensional structure is unknown, built from, e.g., the X-ray coordinate data of similar proteins or using alignment techniques and homology arguments.  [IUPAC Computational]  

Related terms:  Functional genomics glossary homology; Sequencing glossary alignment

homology modeling: This procedure, also termed comparative modeling or knowledge-based modeling, develops a three-dimensional model from a protein sequence based on the structures of homologous proteins. ... Care must be used in applying the term, "homology modeling." In fact, as noted above some authors prefer alternative names for the procedure. One must recognize that homology does not necessarily imply similarity. Homology has a precise definition: having a common evolutionary origin [6,7]. Thus, homology is a qualitative description of the nature of the relationship between two or more things, and it cannot be partial. Either there is an evolutionary relationship or there is not. An assertion of homology usually must remain an hypothesis. Supporting data for a homologous relationship may include sequence or three-dimensional similarities, the relationships between which can be described in quantitative terms.  David R. Bevan, Molecular Modeling of Proteins and Nucleic Acids, Dept. of Biochemistry, Virginia Tech, 1997-2003     http://www.biochem.vt.edu/modeling/homology.html 

A computational method for determining the structure of a protein based on its similarity to known structures. The accuracy of structures determined by homology modeling depends largely on the degree of sequence similarity between the unknown and the known protein.

The most successful tool for prediction of protein structure from sequence, but with significant room for improvement.   

CMBI Homology Modelling Course http://www.cmbi.kun.nl/gvteach/hommod/index.shtml Center for Molecular and Biomolecular Informatics, Univ. of Nijmegen, Netherlands, 2001. Dictionary http://www.cmbi.kun.nl/gvteach/dictionary.shtml  45 definitions.

Related terms: structural homology;  Sequencing glossary sequence homology; Proteins glossary hypothetical protein; In silico & Molecular Modeling  Compare with similarity

NIGMS National Institute of General Medical Sciences: Part of NIH, supports biomedical research not targeted to specific diseases or disorders. Divisions of Cell Biology and Biophysics; Genetics and Developmental Biology; and Pharmacology, Physiology, and Biological Chemistry support research   http://www.nigms.nih.gov/  

NIGMS Structural Genomics Initiatives http://www.nigms.nih.gov/funding/psi.html

pharmacophore: Pharmaceutical biology glossary

protein folding problem: See protein structure prediction

protein informatics: Proteomics glossary

protein production: A major bottleneck and challenge in structural genomics.

protein sequence space:  [J.] Maynard-Smith's (1970. Natural Selection and the concept of a protein space. Nature 225: 563-564) concept of a "protein sequence space" in which each site in an alignment is represented on its own axis and the number of axes required to represent all conceivable variants for a protein is equal to the number of sites in its sequence. Each sequence occupies a unique point in this space; variants differing at one site are adjacent (Hamming) neighbours. The collection of all viable sequence variants for a particular protein forms a localized interconnected "neighbourhood" of points within the space. This representation has proved conceptually intuitive and analytically powerful  ... In protein sequence space, constraints are reflected in the multidimensional shape of the cluster of points that make up the "neighbourhood" of variants viable for a specific protein. The boundary defining the edge of this neighbourhood is characteristic of the protein's function and can be thought of as its functional "signature".  [Gavin JP Naylor, "Measuring Shifts In Function and Evolutionary Opportunity Using Variability Profiles: A Case Study of the Globins" also Journal of Molecular Evolution 51 (3): 223-233 Sept. 2000] http://bioinfo.mbb.yale.edu/e-print/protspace-jme/text.pdf
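
The idea of adjacent (Hamming) neighbours can be illustrated with a minimal Python sketch. The function names and the four-residue variants below are hypothetical and chosen only for illustration; they do not come from the Naylor paper.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def neighbour_pairs(variants):
    """Return the pairs of variants that differ at exactly one site."""
    return [
        (a, b)
        for a, b in combinations(variants, 2)
        if hamming_distance(a, b) == 1
    ]


# Hypothetical variants of a four-site protein (illustration only).
variants = ["MKVL", "MKIL", "MRIL", "AKVA"]
print(neighbour_pairs(variants))
```

In this toy set, MKVL, MKIL and MRIL form a connected chain of single-site neighbours, while AKVA sits two or more substitutions away from every other variant and so lies outside that small "neighbourhood".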

Protein Structure Initiative: Aims at determination of the 3D structure of all proteins. This aim can be achieved in four steps: Organize known protein sequences into families;  Select family representatives as targets; Solve the 3D structure of targets by X-ray crystallography or NMR spectroscopy; Build models for other proteins by homology to solved 3D structures.  http://www.structuralgenomics.org/                         

protein structure prediction: Methods for protein structure prediction have matured to the point where models produced by prediction algorithms can be used to understand and test hypotheses about biological function. The goal of this community wide effort is to provide structural and functional insights into biologically important proteins, particularly those that are intractable to experimental structural determination. Ten Most Wanted, Critical Assessment of Techniques for Protein Structure Prediction, CASP,  Lawrence Livermore National Lab, US http://predictioncenter.llnl.gov/

Involves primary sequence alignment, secondary and tertiary structure prediction and homology modelling.

Protein 3D structures are encoded by a linear sequence of amino acid residues. To predict 3D structure from sequence is a task challenging enough to have occupied a generation of researchers. Have we finally succeeded? The bad news is: we still cannot predict structure for any sequence. The good news is: we have come closer, and growing databases facilitate the task. A solution of the structure prediction problem would supposedly change experimental molecular biology more than any other theoretical method. We may witness such a breakthrough in the near future. However, the lessons from the Asilomar prediction contests were that we may need a common framework to co-ordinate the efforts of the researchers in the field. Burkhard Rost, "Neural networks for protein structure prediction: hype or hit?" Dec. 1999 http://www.embl-heidelberg.de/~rost/Papers/pre1999_tics/paper.html

Narrower term: ab initio protein structure prediction. Related terms: In silico & Molecular Modeling glossary

protein structure, primary, secondary, tertiary and quaternary: Protein Structure glossary.

protein threading: See threading.

RNA structural genomics: Structural genomics, the systematic determination of all macromolecular structures represented in a genome, is focused at present exclusively on proteins. It is clear, however, that RNA molecules play a variety of significant roles in cells, including protein synthesis and targeting, many forms of RNA processing and splicing, RNA editing and modification, and chromosome end maintenance. To comprehensively understand the biology of a cell, it will ultimately be necessary to know the identity of all encoded RNAs, the molecules with which they interact and the molecular structures of these complexes. This report focuses on the feasibility of structural genomics of RNA, approaches to determining RNA structures and the potential usefulness of an RNA structural database for both predicting folds and deciphering biological functions of RNA molecules. [Jennifer A. Doudna "Structural Genomics of RNA" Nature Structural Biology  7 (11) supp: 954-956 (Nov. 2000)] http://www.euchromatin.org/Doudna1.htm

signal transduction: Metabolic engineering glossary

similarity: A quantity that indicates, for example, the percentage of identical amino acids between two sequences. Similarity is an observed quantity that might, for example, be expressed as the percentage of residues that are similar between two aligned sequences. Similarity is a bad measure, because it is subjective. The author of the software decides whether Gln and Asp are similar or not. The percentage identity is a much better measure.

There is an important difference between similarity and homology. Similarity is a value between 0.0 and 1.0, or between 0 and 100%. On the other hand, there are no degrees of homology. The sequences are either homologous or not.  Center for Molecular and Biomolecular Informatics, Dictionary, Univ. of Nijmegen, Netherlands, 2001 http://www.cmbi.kun.nl/gvteach/dictionary.shtml 
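
The contrast between percentage identity and percentage similarity can be sketched in Python. This is a minimal illustration, not the CMBI method: the function names, the two grouping schemes, the example alignment, and the choice to skip gapped columns are all assumptions made here.

```python
def _aligned_pairs(a, b, gap="-"):
    """Return residue pairs from two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(x, y) for x, y in zip(a, b) if x != gap and y != gap]


def percent_identity(a, b):
    """Percentage of ungapped aligned positions holding identical residues."""
    pairs = _aligned_pairs(a, b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(a, b, groups):
    """Percentage of ungapped positions whose residues are identical or share a group."""
    pairs = _aligned_pairs(a, b)
    if not pairs:
        return 0.0

    def similar(x, y):
        return x == y or any(x in g and y in g for g in groups)

    return 100.0 * sum(similar(x, y) for x, y in pairs) / len(pairs)


# Hypothetical alignment; D = Asp, E = Glu, N = Asn, Q = Gln.
seq_a = "ACDEQ-K"
seq_b = "ACQDNRK"

# Two hypothetical grouping schemes that disagree on whether Gln and Asp are similar.
strict = [set("DE"), set("NQ")]
lenient = [set("DENQ")]

print(percent_identity(seq_a, seq_b))                      # same under either scheme
print(round(percent_similarity(seq_a, seq_b, strict), 1))
print(round(percent_similarity(seq_a, seq_b, lenient), 1))
```

The identity figure does not change with the grouping scheme, whereas the similarity figure depends on whether the scheme treats Gln and Asp as similar, which is the subjectivity the definition above warns about.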

structural bioinformatics: Involves the process of determining a protein's three-dimensional structure using comparative primary sequence alignment, secondary and tertiary structure prediction methods, homology modeling, and crystallographic diffraction pattern analyses. Currently, there is no reliable de novo predictive method for protein 3D-structure determination. Over the past half-century, protein structure has been determined by purifying a protein, crystallizing it, then bombarding it with X-rays. The X-ray diffraction pattern from the bombardment is recorded electronically and analyzed using software that creates a rough draft of the 3D structure. Biological scientists and crystallographers then tweak and manipulate the rough draft considerably. The resulting spatial coordinate file can be examined using modeling-structure software to study the gross and subtle features of the protein's structure. Christopher Smith "Bioinformatics, Genomics, and Proteomics"  Scientist 14[23]:26, Nov. 27, 2000  http://the-scientist.com/yr2000/nov/profile_001127.html

Related terms: Algorithms; In silico & Molecular Modeling.

Structural Biology Industrial Platform: Fifteen companies, including representatives of some of Europe's largest pharmaceutical industries, have formed the Structural Biology Industrial Platform to work with each other, the European Commission and Research Centres in Europe to promote structural biology research, training and development. http://www.sbip.org/

structural genomics: Focuses on the physical aspects of the genome through the construction and comparison of gene maps and sequences, as well as gene discovery, localization, and characterization. Brush up on your 'omics, Chemical & Engineering News, 81(49): 20, Dec. 2003 http://pubs.acs.org/cen/coverstory/8149/8149genomics1.html 

The fast-developing fields of structural and functional genomics -- studies of proteins encoded by the entire genome -- are being brought to bear on the problem of understanding the root of many cancers. A protein's structure can tell researchers much about its function, information that ultimately is needed to understand a protein's link to cancer. By determining the detailed, three-dimensional structure of proteins, researchers are better able to understand how each protein functions normally and how faulty protein structures can cause disease. David Brand, MacCHESS moves into cancer research through structural genomics, Cornell, 2001  http://www.news.cornell.edu/Chronicle/01/2.22.01/MacCHESS.html

Involves quickly determining the 3D structures of large numbers of proteins (or other complex biological molecules, such as nucleic acids), ultimately accounting for an organism’s entire proteome. Footnote: As traditionally defined, the term structural genomics referred to the use of sequencing and mapping technologies, with bioinformatic support, to develop complete genome maps (genetic, physical, and transcript maps) and to elucidate genomic sequences for different organisms, particularly humans. Now, however, the term is increasingly used to refer to high-throughput methods for determining protein structures.

Many of the criticisms leveled at the Human Genome Project in the mid-1980s have been redirected toward structural genomics. Unlike high-throughput genome sequencing, it is not a simple matter to decide when a structural genomics effort has reached completion. SK Burley et al “Structural genomics: beyond the Human Genome Project” Nature Genetics 23: 151 Oct. 1999

Related term: structural proteomics

A good explanation of structural genomics   Joint Center for Structural Genomics http://www.jcsg.org/help/robohelp/Definitions/Structural_Genomics.htm 

Structural genomics project links
Human Proteome/Structural Genomics Pilot Project, Brookhaven National Laboratory, US  http://www.proteome.bnl.gov/   A pilot project to examine the feasibility of  high-throughput determination of 3-dimensional structures of proteins by x-ray crystallography, starting from genome sequences.

Human Proteomics Initiative, Swiss Institute of Bioinformatics, European Bioinformatics Institute http://us.expasy.org/sprot/hpi/   A major project to annotate all known human sequences according to the quality standards of Swiss-Prot. This means providing, for each known protein, a wealth of information that includes the description of its function, its domain structure, subcellular location, post-translational modifications, variants, similarities to other proteins, etc.

An effort to annotate, describe and distribute to the life science community a large amount of highly curated information concerning human protein sequences.

Structural Genomics Initiative, NIGMS, US  http://www.nigms.nih.gov/funding/psi.html

Structural genomics databases Databases & software directory.

structural genomics technologies: NMR & X-Ray Crystallography

structural homology: Identify 3D structures of proteins or domains in the same family as a sequence of interest.  

Related terms: homology (Functional genomics glossary); homology modeling (Molecular modeling glossary)

structural homology protein: The degree of 3-dimensional shape similarity between proteins. It can be an indication of distant AMINO ACID SEQUENCE HOMOLOGY and used for rational DRUG DESIGN. [MeSH 2003]

structural proteomics: Often referred to as structural genomics, this discipline involves determining the 3D structures of large numbers of proteins, ultimately accounting for an organism's entire proteome. It adds critical information in at least two points in the drug discovery pathway: (1) target identification, or selecting a pathway in which a drug might function, and (2) medicinal chemistry, or the actual design of compounds to modulate this pathway. 

A high-throughput, system-wide means of determining gene function. It typically involves using high-throughput X-ray diffraction methods to determine the structure of proteins encoded by at least one member of each gene family in the genome. This approach is coupled with the use of bioinformatics as a tool in structural proteomics and computational modeling to determine structures of other proteins in the same family. Conversely, an important goal of structural proteomics is the creation of databases of structures. [CHI Target Validation report]

When asked to identify bottlenecks in the [structural proteomics] field, several academic and industry scientists pointed to the need for faster and more reliable protein production and purification strategies, rather than stronger beams at the X-ray crystallization step.  

structure from sequence: See protein structure prediction, structural homology

structure prediction problem:  The protein secondary structure prediction problem has become a classic, challenging problem for the artificial-intelligence and machine learning community. Virtually every conceivable computational technique in these fields (e.g., information theory [6, 12, 13], artificial neural networks [15, 20, 22], cascaded networks [18, 19, 27], hybrid systems [28], nearest neighbor methods [21], hidden Markov chains [4], machine learning [17, 25], mutual information [26]) has been applied in the context of protein structure prediction. The reason for this attention is well-founded and clear: If protein structure, even secondary structure, can be accurately predicted from the now abundantly available gene and protein sequences, such sequences become immensely more valuable for the understanding of drug design, the genetic basis of disease, the role of protein structure in its enzymatic, structural, and signal transduction functions, and basic physiology from molecular to cellular, to fully systemic levels. In short, the solution of the protein structure prediction problem (and the related protein folding problem) will bring on the second phase of the revolution. [Peter Munson et al. "Protein Secondary Structure Prediction", NIH, 1994] http://abs.cit.nih.gov/reprints/text3.html

target identification: Targets glossary

threading: In this approach, a target sequence is “threaded” through a library of 3D folds to try to find a match.  This method is used when no sequence of known structure is clearly related to the target sequence.
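
A toy Python sketch can make the threading idea concrete. It is a deliberately simplified illustration under stated assumptions, not a real threading program: each fold is reduced to a string of per-position environments ('b' for buried, 'e' for exposed), the score is a crude hydrophobic/polar match count, and no gaps are allowed. Practical methods typically rely on statistical potentials and gapped alignments. The fold library, the residue classification, and the sequence are all hypothetical.

```python
# Residues treated as hydrophobic in this toy scheme (an assumption).
HYDROPHOBIC = set("AVILMFWC")


def position_score(residue, environment):
    """+1 if the residue suits the environment ('b' buried, 'e' exposed), else -1."""
    buried_ok = environment == "b" and residue in HYDROPHOBIC
    exposed_ok = environment == "e" and residue not in HYDROPHOBIC
    return 1 if (buried_ok or exposed_ok) else -1


def thread_score(sequence, fold_env):
    """Best ungapped score over every offset of the sequence along the fold."""
    if len(sequence) > len(fold_env):
        return None  # the sequence does not fit on this template
    best = None
    for offset in range(len(fold_env) - len(sequence) + 1):
        score = sum(
            position_score(res, fold_env[offset + i])
            for i, res in enumerate(sequence)
        )
        if best is None or score > best:
            best = score
    return best


def best_fold(sequence, library):
    """Thread the sequence through each fold; return the best fold name and all scores."""
    scores = {name: thread_score(sequence, env) for name, env in library.items()}
    scores = {name: s for name, s in scores.items() if s is not None}
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


# Hypothetical fold library: one environment letter per structural position.
library = {
    "fold_A": "bebebebe",
    "fold_B": "eeeebbbb",
    "fold_C": "bbeebbee",
}

# Hypothetical target: four polar residues followed by four hydrophobic ones.
name, scores = best_fold("KDSEVLIA", library)
print(name, scores)
```

Note that the score asks how well each fold fits the sequence and never compares the target with another sequence, which is why the approach can still be attempted when no clearly related sequence is available.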

toxicoproteomics: Proteomics glossary

Bibliography
CHI, Structural Proteomics: High-Throughput Approaches Fuel Drug Discovery and Development, Cambridge Healthtech Institute, Malorye Branca, Allan Haberman, Deidre Lockwood  2001 

Joint Center for Structural Genomics Technologies http://www.jcsg.org/scripts/prod/technologies1.html 

Nature Structural Biology, Structural genomics supplement, Nov. 2000 


IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.

 
