ab initio:
From the beginning (Latin)
ab initio
protein modeling:
Predict
3D structure from sequence without using a homologous model/ template; this
technology is not at the stage of being broadly applicable to drug discovery.
CHI Structural
proteomics report
Ab initio
methods use the physicochemical properties of the amino acid sequence of
a protein to literally calculate a 3D structure (lowest energy model) based
on protein folding. As opposed to determining the structure of an entire
protein,
ab initio methods are typically used to predict and model
protein folds (domains). This method is gaining considerably, in part due
to the development of novel mathematical approaches, a boost in available
computational resources (for example, tera- and petaFLOPS supercomputers),
and considerable interest from researchers investigating protein-ligand
(or drug) interactions. Christopher Smith "Bioinformatics,
Genomics, and Proteomics" Scientist 14[23]:26, Nov. 27, 2000 http://the-scientist.com/yr2000/nov/profile_001127.html
Related
terms protein structure prediction
ab initio protein
structure prediction:
Prediction of
a protein’s structure based on amino acid sequence alone — that is, without
mapping the structure to structures of known sequences.
Broader term: protein structure prediction
(compared
with ab initio). Narrower term (compared with structure prediction)
ab initio
quantum mechanical methods:
Methods of quantum
mechanical calculations independent of any experiment other than the determination
of fundamental constants. The methods are based on the use of the
full Schrödinger equation to treat all the electrons of a chemical
system. In practice, approximations are necessary to restrict the complexity
of the electronic wave function and to make its calculation possible. (Synonymous
with non-empirical quantum mechanical methods.) IUPAC Computational
ab initio
quantum mechanical modeling: The application
of ab initio modelling across diverse fields such as condensed matter
physics, materials science and chemistry has been demonstrated over the past 10 years.
... The recent completion of the Human Genome Project will offer an unprecedented
number of protein receptors and enzymes as targets for pharmacological
intervention in disease processes. However, before this wealth of information
can be used to develop pharmaceuticals, an understanding of the biochemistry
of the newly identified proteins and their interactions must be obtained.
First principles quantum mechanical modelling will play an important role
in this process. [Matthew
Segall, Ursula Röthlisberger, Paolo Carloni, CECAM/Psi-k Workshop: Ab Initio
Modelling in the Biological Sciences Lyon, France 11-13 June 2001] http://www.tcm.phy.cam.ac.uk/~mds21/Workshop2001/
Scientific/node1.html#SECTION00010000000000000000
annotation- proteins:
Macromolecular structure determination is moving from a
functionally driven initiative to include a genomically driven initiative
(structural genomics) where structures are determined based on what is known
from a target sequence alone. The resultant structures may be structurally and
functionally uncharacterized. Systematic Protein Annotation and Modeling (SPAM)
is a multi-institutional initiative to make better use of target sequences and
structures. SPAM includes new algorithms for the study of sequence-sequence,
sequence-structure and structure-structure and deployment of these methods in
resources available to the community Systematic Protein Annotation and
Modeling Skaggs School of Pharmacy and Pharmaceutical Sciences and the San
Diego Supercomputer Center (SDSC) at the University of California San Diego
(UCSD), the Keck Graduate Institute (KGI) and the Burnham Institute (TBI). http://spam.sdsc.edu/about.html
annotation
protein - dictionary-driven
For
many years, computational methods seeking to automatically determine the
properties (functional, structural, physicochemical, etc.) of a protein directly
from sequence have been the focus of numerous research groups, including ours.
By general admission, this is a difficult problem and the methods that have been
proposed over the years typically concentrated on the analysis of individual
genes. With the advent of advanced sequencing methods and systems, the number of
amino acid sequences and fragments being deposited in the public databases has
been increasing steadily. This in turn generated a renewed demand for automated
approaches that can quickly, exhaustively and objectively annotate individual
sequences as well as complete genomes. In this paper, we present one such
approach. The approach is centered around and exploits the Bio-Dictionary, an
exhaustive collection of amino acid patterns (referred to as seqlets)
that completely covers the natural sequence space of proteins to the extent that
this space is sampled by the currently available public databases. Isidore Rigoutsos,
Tien Huynh, Laxmi P. Parida, Daniel E. Platt, Aris Floratos, Dictionary
Driven Protein Annotation, Nucleic Acids Research, 30 (no 17) 3901-3916,
2002
candidate proteins:
NIGMS (part of NIH) is supporting
research on identifying candidate proteins and their genes, including those that
cause variations in human drug metabolism, transport, distribution, and
excretion (for both small organic molecules and macromolecular drugs such as
peptides and oligonucleotides), that may play a role in determining individual
variations in drug responses and candidate proteins and their genes, including
those that are direct targets for drug action (e.g., receptors, enzymes,
signal transducing molecules, regulatory factors), that may play a role in
determining individual variations in drug responses. National Institute of
General Medical Sciences, Recommendations of the NIGMS Working Group --
Understanding Individual Variations in Drug Responses: From Phenotype to
Genotype , June 9-10, 1998, Bethesda MD http://www.nigms.nih.gov/news/reports/pharmacogenetics.html
Related
terms: Pharmacogenomics
CASP:
Critical Assessment of Techniques for Protein Structure
Prediction. Protein Structure Prediction Center, Lawrence Livermore National
Lab, US http://predictioncenter.llnl.gov/
Links to CASP meeting results and information on the "Ten most wanted"
proteins solicitation.
comparative modeling: See homology
modeling
comparative proteomics:
The
C. elegans proteome was used
as an alignment template to assist in novel human gene identification …
Among the available 18,452 C. elegans
protein sequences, our results
indicate that at least 83% had human homologous genes, with 7954 records
of C. elegans proteins matching known human gene transcripts. [CH
Lai et al "Identification of Novel Human Genes Evolutionarily Conserved
in Caenorhabditis elegans by Comparative Proteomics" Genome Research
10(5): 703-713 May 2000] Related terms Functional
Genomics glossary comparative genomics, evolutionary
genomics.
computational
biophysics: Activities of the Theoretical and Computational Biophysics
Group center on the structure and function of supramolecular systems in the
living cell, and on the development of new algorithms and efficient computing
tools for structural biology. The Resource brings the most advanced
molecular modeling, bioinformatics, and computational technologies to bear on
questions of biomedical relevance. Theoretical and Computational Biophysics
Group, Univ. of Illinois Urbana Champaign, About the Group http://www.ks.uiuc.edu/Overview/intro.html Our
research focuses on the modeling of large macromolecular systems in realistic
environments. These efforts have produced insight into biomolecular processes
coupled to mechanical force, bioelectronic processes in metabolism and vision,
and the function and mechanism of membrane proteins. Theoretical and
Computational Biophysics Group, Univ. of Illinois Urbana Champaign,
Emerging Studies, http://www.ks.uiuc.edu/Research/Recent/
computational proteomics: Large- scale generation and analysis of 3D
and 4D protein structural information and the application of structural
knowledge across all life science disciplines. [Edward T. Maggio, Kal Ramnarayan
"Recent developments in computational proteomics" Trends in
Biotechnology 19 (7): 266- 272 July 2001] Google - about 1,290
Sept. 10, 2003
contextual
data:
While proteomic studies
initially focused largely on expression and protein identification, progress in
these areas drove the demand for more detailed types of proteomic data. Now
researchers want information about where specific proteins are expressed, both
in terms of tissues and localization within the cell. Information relating
proteins to function requires additional details of post-translational
modification, and studies of protein interactions have moved beyond just looking
at binary interactions to studies of protein complexes. For both genomics and proteomics, this
shift can be characterized as an interest in more contextual data. Enhanced
insight into biological context is essential for obtaining a better
understanding of how biology actually works, and thus there is now an emphasis
to move from genomic and proteomic snapshots to time series data of expression.
Such context is of particular value if biological studies are to be translated
into medical advances, because of the importance of being able to predict the
impact of potential treatments. The integration of genomic and proteomic data
with medical conditions, treatment and outcomes becomes another critical type of
contextual information. Christina Lingham, Beyond Genome: Thinking Globally,
Cambridge Healthtech
designer proteins:
Protein design is currently used for the
creation of new proteins with desirable traits. In our lab, we focus on the
synthesis of proteins with high essential amino acid content having potential
applications in animal nutrition. One of the limitations we face in this
endeavour is achieving stable proteins despite a highly biased amino acid
content. We report here the synthesis and characterisation of two mutants
derived from our MB-1 designer protein. Williams M, Gagnon MC, Doucet A,
Beauregard M, "Design
of high essential amino acid proteins: two design strategies for improving
protease resistance of the nutritious MB-1 protein" Journal of
Biotechnology 94(3): 245- 254, Apr. 11, 2002
Designer
proteins, Scripps http://mgl.scripps.edu/people/goodsell/pdb/pdb70/pdb70_1.html
Designer proteins can also refer to high- protein
nutritional supplements.
differential
proteomes: Google - about 169
Sept. 10, 2003, about 38 Apr 10 2007
differential
proteomics:
Differential proteomics makes qualitative and quantitative
comparisons of proteomes under different conditions. This knowledge enables us
to unravel the mysteries of biological processes. Genencor Proteome and Tools http://www.genencor.com/cms/connect/genencor/technology/protein_chemistry/proteomics/proteome_and_tools/proteome...
Google
= about 14,000 Apr 10, 2007
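The qualitative and quantitative comparisons mentioned in this entry can be sketched in a few lines. This is only an illustration under stated assumptions: the protein names and abundance values are hypothetical, and the log2 ratio with a pseudocount is one common convention, not a method taken from the cited source.

```python
import math

def log2_fold_changes(control, treated, pseudocount=1.0):
    """Quantitative comparison: log2 ratio of abundances for proteins
    seen in both conditions. The pseudocount avoids division by zero."""
    shared = control.keys() & treated.keys()
    return {
        protein: math.log2(
            (treated[protein] + pseudocount) / (control[protein] + pseudocount)
        )
        for protein in shared
    }

def condition_specific(control, treated):
    """Qualitative comparison: proteins detected in only one condition."""
    return control.keys() - treated.keys(), treated.keys() - control.keys()

# Hypothetical abundance tables (arbitrary units) for two conditions.
control = {"P1": 99.0, "P2": 49.0}
treated = {"P1": 399.0, "P2": 49.0, "P3": 10.0}

changes = log2_fold_changes(control, treated)
only_control, only_treated = condition_specific(control, treated)
```

With these toy values P1 is about fourfold more abundant in the treated sample (log2 ratio of 2), P2 is unchanged, and P3 is detected only in the treated sample.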
differential subproteomes: As defined
by relative solubilities, cellular location and narrow-range immobilised pH
gradients. SJ Cordwell, AS Nouwens, NM Verrills, DJ Basseal, BJ Walsh,
Subproteomics based upon protein cellular location and relative solubilities in
conjunction with composite two-dimensional electrophoresis gels,
Electrophoresis, 21(6): 1094- 103, April 2000 Google = about 4 Nov 5, 2005;
about 3 Apr 10, 2007
Broader terms: subproteomes, subproteomics
docking: Three-dimensional molecular structure is one of the
foundations of structure-based drug design. Often, data are available
for the shape of a protein and a drug separately, but not for the two together.
The program AutoDock was originally written in FORTRAN-77 in 1990 by David
S. Goodsell here in Arthur J. Olson's laboratory. It performs automated
docking of ligands (small molecules like a candidate drug) to their macromolecular
targets (usually proteins, sometimes DNA) Garrett B. Morris, “Molecular
docking web”, Scripps, Dec. 2000 http://www.scripps.edu/pub/olson-web/people/gmm/index.html
Wikipedia http://en.wikipedia.org/wiki/Docking_%28molecular%29
Narrower term: pharmacophore based docking
docking programs:
Programs for evaluating lead compounds against
target proteins; these programs are “informed” by structure data. Traditional ligand-docking programs - such as DOCK, developed by Irwin
Kuntz at the University of California, San Francisco; MacroModel, developed
by Clark Still at Columbia University; and GOLD from MSI (now part of
Pharmacopeia) - give information about potential ligands for a known protein structure.
These programs select molecules predicted to be highly complementary to
the receptor structure and can screen many of these ligands against the
protein. This type of virtual screening technology has already been incorporated into many
major pharmaceutical companies’ discovery programs and offers the ability
to screen many more compounds at once than the traditional laboratory- based
method. CHI Structural
proteomics report
docking studies:
Computational techniques for the exploration
of the possible binding modes of a substrate to a given receptor, enzyme
or other binding site. IUPAC Computational Related terms: drug design, QSAR
domain shuffling:
Creating new proteins by bringing domains together.
It is thought that this is a major way that new proteins have arisen during
evolution. Thus, mining of databases for homology by domains, rather than
by whole proteins (which are not as evolutionarily conserved), is important
in obtaining clues to functionality.
A protein
sequence can have more than one domain. Related term: multi- domain proteins.
energy function:
Computationally, a shape is assigned to a protein
sequence based on an empirical energy function. The lower the energy of
a given structure, the more likely it is to be the correct fold. The structure
prediction challenge is therefore divided into two: (1) The first challenge
is the creation of many plausible folds or a set of structures that will
include the native shape. The creation of the appropriate set depends on
existing databases (such as the Protein Data Bank) or on the design of
automated algorithms (using physical or statistical information) to generate
plausible folds. Once the set is available, a selection procedure is used
to ``fish'' out the correct fold. (2) The ``fishing'' of the plausible
native shapes critically depends on the quality of the energy function.
The value of the energy function must be the lowest for the native structure.
Opportunities in Molecular Biomedicine in the Era of
Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource
for Macromolecular Modeling and Bioinformatics Beckman Institute
for Advanced Science and Technology, University of Illinois at Urbana- Champaign
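The two-step scheme described in this entry (generate plausible folds, then "fish" out the one with the lowest energy) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the energy function is a simple hydrophobic contact count on a 2D lattice, not an empirical energy function of the kind used in practice, and the two candidate folds are hand-written rather than generated.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

def contact_energy(sequence: str, coords: List[Coord]) -> int:
    """Toy energy: -1 for every pair of hydrophobic (H) residues that are
    lattice neighbours but not adjacent along the chain."""
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:
                    energy -= 1
    return energy

def pick_lowest_energy(sequence: str, candidates: List[List[Coord]]) -> List[Coord]:
    """Selection step: return the candidate fold with the lowest energy."""
    return min(candidates, key=lambda fold: contact_energy(sequence, fold))

# Hypothetical four-residue chain (H = hydrophobic, P = polar) and two folds.
sequence = "HPPH"
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
hairpin = [(0, 0), (1, 0), (1, 1), (0, 1)]

best = pick_lowest_energy(sequence, [extended, hairpin])
```

The hairpin brings the two hydrophobic residues into contact and so scores lower than the extended chain. As the entry notes, the selection is only as good as the energy function: if the native structure does not have the lowest value, the wrong fold is picked.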
fold alignment:
A critical step in homology modeling,
because it provides the key structures for the model. If suitably
matched folds cannot be identified, a type of fold assignment known as
protein threading can be used.
fold recognition:
Methods of protein fold recognition attempt
to detect similarities between protein 3D structures that are not accompanied
by any significant sequence similarity. There are many approaches, but
the unifying theme is to try and find folds that are compatible with a
particular sequence. Unlike sequence-only comparison, these methods take
advantage of the extra information made available by 3D structure information. In effect,
they turn the protein folding problem on its head: rather than predicting
how a sequence will fold, they predict how well a fold will fit a sequence.
Robert B. Russell, Guide to Structure Prediction "Fold recognition
methods and links" Sept. 1999 http://www.sbg.bio.ic.ac.uk/people/rob/CCP11BBS/foldrec.html
Related terms threading; Protein
structure. protein folding, protein folds
foldedness:
Methods for analyzing "foldedness" of expressed
proteins include NMR and circular dichroism
spectroscopies.
functional proteomics: As the emerging field
of proteomics continues to expand at an extremely rapid rate, the relative
quantification of proteins, targeted by their function, becomes its greatest
challenge. Complex analytical strategies have been designed that allow
comparative analysis of large proteomes, as well as in depth detection of the
core proteome or the interaction network of a given protein of interest.
Functional Proteomics, Methods in Molecular Biology 2008 http://www.springer.com/new+%26+forthcoming+titles+(default)/book/978-1-58829-971-0
Functional proteomics is yielding large databases of interacting proteins and extensive pathways.
Maps of these interactions are being scored and deciphered by novel high
throughput technologies. However, traditional methods of screening have not been
very successful in identifying protein-protein interaction inhibitors.
Google = about 8,480
Feb. 4, 2004 See also activity based protein
profiling
Hidden Markov Models HMM:
Searching a protein sequence database
for homologues is a powerful tool for discovering the structure and function
of a sequence. Amongst the algorithms and tools available for this task,
Hidden Markov model (HMM)-based search methods improve both the sensitivity
and selectivity of database searches by employing position-dependent scores
to characterize and build a model for an entire family of sequences. HMMs have been used to analyze proteins using two complementary strategies.
In the first, a sequence is used to search a collection of protein families,
such as Pfam, to find which of the families it matches. In the second approach
an HMM for a family is used to search a primary sequence database to identify
additional members of the family. The latter approach has yielded insights
into proteins involved in both normal and abnormal human pathology. Lawrence Berkeley Lab, US "Advanced
Computational Structural Genomics" http://cbcg.lbl.gov/ssi-csb/Meso.html
A widely used probabilistic model for data that are observed in a sequential
fashion (e.g., over time). An HMM makes two primary assumptions. The first
assumption is that the observed data arise from a mixture of K
probability distributions. The second assumption is that there is a discrete-time
Markov chain with K states, which is generating the observed
data by visiting the K distributions in Markov fashion. The
"hidden" aspect of the model arises from the fact that the state
sequence is not directly observed. Instead, one must infer the state
sequence from a sequence of observed data using the probability model. Although
the model is quite simple, it has been found to be very useful in a variety of
sequential modeling problems, most notably in SPEECH
RECOGNITION (Rabiner 1989) and more recently in other disciplines such as computational
biology (Krogh et al. 1994). MITECS Online MIT Encyclopedia of the
Cognitive Sciences http://cognet.mit.edu/MITECS/Entry/pearl.html
Related term: simulated annealing
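Inferring the hidden state sequence from observed data, as described in this entry, is commonly done with the Viterbi algorithm. The sketch below is a minimal illustration, not code from any of the cited sources. The two states ("H" and "C"), the two-letter observation alphabet and all probabilities are hypothetical values chosen to keep the example small.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # best[t][s] = (probability of the best path ending in state s at
    # position t, state occupied at position t - 1 on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path

# Hypothetical two-state model: state "H" mostly emits "a",
# state "C" mostly emits "b", and both states tend to persist.
states = ("H", "C")
start_p = {"H": 0.5, "C": 0.5}
trans_p = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.1, "C": 0.9}}
emit_p = {"H": {"a": 0.8, "b": 0.2}, "C": {"a": 0.2, "b": 0.8}}

path = viterbi("aaabbb", states, start_p, trans_p, emit_p)
```

For this toy input the decoded path is three "H" states followed by three "C" states. Profile HMMs for protein families use position-specific match, insert and delete states, and real implementations work with log probabilities to avoid numerical underflow on long sequences.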
homeomorphic superfamilies:
Protein families are clustered into "homeomorphic superfamilies". Sequences are homeomorphic if they can be aligned from
end-to-end. In practice, we allow the amino and carboxyl ends to be ragged and moderate internal length variations (represented as
gaps in the sequences). However, all members of the superfamily should have the same overall domain architecture, i.e., the same
domains in the same order (except for domains missing due to alternative splicing or very recent genetic events). It is assumed, although in most cases this has not been investigated in detail, that the molecules in a homeomorphic superfamily share a common evolutionary history since the acquisition of their constituent domains. Thus, it should be valid to construct an evolutionary tree from the members of a homeomorphic superfamily. If two groups of proteins with the same architecture are shown to have come to that structure independently, they are appropriately separated into two homeomorphic superfamilies.
PIR Classification Terminology, Georgetown Univ, revised 1998 http://pir.georgetown.edu/pirwww/aboutpir/doc/short_sf_def.html
homology: Functional genomics
homology domains:
Many types of domains have been found in diverse proteins. In common use, the term "immunoglobulin superfamily" refers to the collection of all proteins that contain an
immunoglobulin-like domain. We call such a group a "homology domain superfamily". Any given protein sequence will be assigned to only one homeomorphic superfamily, but it may contain sequence segments belonging to several homology domain superfamilies.
PIR Classification Terminology, Georgetown Univ, revised 1998 http://pir.georgetown.edu/pirwww/aboutpir/doc/short_sf_def.html
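The difference between the two groupings can be made concrete with a small sketch. It assumes each protein is already annotated with an ordered list of domains. The protein and domain names are hypothetical, and grouping by identical domain order is a simplification of the end-to-end alignment criterion that PIR describes.

```python
from collections import defaultdict

# Hypothetical proteins, each with its domains listed in sequence order.
proteins = {
    "P1": ["Ig", "Ig", "kinase"],
    "P2": ["Ig", "Ig", "kinase"],
    "P3": ["Ig", "fibronectin"],
    "P4": ["kinase"],
}

def homeomorphic_superfamilies(proteins):
    """Group proteins that share the same domains in the same order.
    Each protein lands in exactly one group."""
    groups = defaultdict(list)
    for name, domains in proteins.items():
        groups[tuple(domains)].append(name)
    return dict(groups)

def homology_domain_superfamilies(proteins):
    """Group proteins by each domain they contain.
    A multi-domain protein can appear in several groups."""
    groups = defaultdict(set)
    for name, domains in proteins.items():
        for domain in domains:
            groups[domain].add(name)
    return dict(groups)

by_architecture = homeomorphic_superfamilies(proteins)
by_domain = homology_domain_superfamilies(proteins)
```

Here P1 and P2 share one homeomorphic group, while the "Ig" homology domain superfamily contains P1, P2 and P3, and P1 also belongs to the "kinase" domain superfamily.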
homology model: A model of a protein, whose three-dimensional
structure is unknown, built from, e.g., the X-ray coordinate data of similar
proteins or using alignment techniques and homology arguments.
IUPAC Computational Related terms: Sequencing
alignment
homology modeling:
This procedure, also termed comparative modeling or
knowledge-based modeling, develops a three-dimensional model from a protein
sequence based on the structures of homologous proteins. ... Care must be used
in applying the term, "homology modeling." In fact, as noted above
some authors prefer alternative names for the procedure. One must recognize that
homology does not necessarily imply similarity. Homology has a precise
definition: having a common evolutionary origin [6,7]. Thus, homology is
a qualitative description of the nature of the relationship between two or more
things, and it cannot be partial. Either there is an evolutionary relationship
or there is not. An assertion of homology usually must remain an hypothesis.
Supporting data for a homologous relationship may include sequence or
three-dimensional similarities, the relationships between which can be described
in quantitative terms. David R. Bevan, Molecular Modeling of Proteins and
Nucleic Acids, Dept. of Biochemistry, Virginia Tech,
1997-2003 http://www.biochem.vt.edu/modeling/homology.html
A computational method for determining the
structure of a protein based on its similarity to known structures. The accuracy
of structures determined by homology modeling depends largely on the amount of
homology between the unknown and the known protein sequence. The most successful tool for prediction of
protein structure from sequence, but with significant room for improvement.
CMBI Homology Modelling Course
http://www.cmbi.kun.nl/gvteach/hommod/index.shtml
Center for Molecular and Biomolecular Informatics, Univ. of Nijmegen,
Netherlands, 2001. Dictionary http://www.cmbi.kun.nl/gvteach/dictionary.shtml
45 definitions. Related terms: structural homology; Sequencing
glossary sequence homology; Proteins glossary hypothetical
protein; In silico & Molecular
Modeling Compare with similarity
hypothetical proteins:
Many of the gene products of completely
sequenced organisms are “hypothetical” – they cannot be related to any
previously characterized proteins – and so are of completely unknown function.
... As each [completely sequenced] organism’s genome is analyzed about one
third of the observed open reading frames (ORFs), although conserved among
several organisms, encode for “hypothetical” proteins that cannot be related
to other proteins of known function or structure. Understanding the physiological
function of the protein products of these so-called ‘orphan’ genes has
emerged as a major challenge. E Eisenstein et al “Biological function
made crystal clear – annotation of hypothetical proteins via structural
genomics” Current Opinion in Biotechnology 11(1): 25- 30 Feb. 2000
Searching for hypothetical proteins: theory and practice
based upon original data and literature. Lubec
G, Afjehi-Sadat L, Yang JW, John JP. Prog Neurobiol. 2005
Sep-Oct;77(1-2):90-127. Epub 2005 Nov 4.
http://myweb.unomaha.edu/~acornish/articles%5Chypothetical_prot_search.pdf
All predicted protein sequences lacking any significant sequence similarity to characterised proteins are labeled
as ‘hypothetical proteins'. The majority of these cases come from the genome
sequencing projects.
"SWISS- PROT" in Introduction to Molecular Biology Databases, R.
Apweiler, R. Lopez, B. Marx, 1999 http://www.ebi.ac.uk/panda/Publications/mbd1.html
in silico proteomics:
Prediction of protein
structure and function. [Gareth W. Roberts and Jonathan Swinton "In Silico
Proteomics: Playing by the rules" Current Drug Discovery 5: Aug. 1, 2001 http://www.current-drugs.com/CDD/CDD/CDDPDF/issue%205/Roberts.pdf]
integral
membrane proteins: http://en.wikipedia.org/wiki/Integral_membrane_protein
See also
under membrane proteins
interologs:
Protein interaction maps have provided insight into the
relationships among the predicted proteins of model organisms for which a genome
sequence is available. These maps have been useful in generating potential
interaction networks, which have confirmed the existence of known
complexes and pathways and have suggested the existence of new complexes
and/or crosstalk between previously unlinked pathways. However, the generation
of such maps is costly and labor intensive. Here, we investigate the extent to
which a protein interaction map generated in one species can be used to predict
interactions in another species. LR Matthews "Identification
of potential interaction networks using sequence-based searches for conserved
protein-protein interactions or 'Interologs'" Genome Research 11 (12):
2120-2126, Dec. 2001
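The idea of carrying an interaction map from one species to another can be sketched as a simple lookup. This is an illustration under assumptions, not the procedure of the cited paper: the ortholog table is taken as given (in practice it would come from sequence-based searches), and all protein identifiers are hypothetical.

```python
def predict_interologs(interactions, orthologs):
    """Transfer each known interaction (a, b) in the source species to every
    pair of orthologs (a2, b2) in the target species."""
    predicted = set()
    for a, b in interactions:
        for a2 in orthologs.get(a, ()):
            for b2 in orthologs.get(b, ()):
                predicted.add(frozenset((a2, b2)))
    return predicted

# Hypothetical source-species interactions and source-to-target orthologs.
source_interactions = [("yA", "yB"), ("yB", "yC")]
orthologs = {"yA": ["wA"], "yB": ["wB1", "wB2"]}  # yC has no known ortholog

predicted = predict_interologs(source_interactions, orthologs)
```

The yA-yB interaction yields two candidate pairs in the target species, one per ortholog of yB. The yB-yC interaction yields none because yC has no ortholog in the table. Such predictions are hypotheses that still need experimental confirmation.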
intrinsically disordered
proteins IDPs: Recent studies revealed that functional
proteins without unique 3-D structures are highly abundant in nature. These
intrinsically disordered proteins (IDPs) possess a number of crucial biological
functions that are complementary to functions of structured (ordered) proteins.
In any given organism, IDPs constitute a functionally broad and densely
populated unfoldome; i.e., a set of unstructured proteins in a proteome. Being
structurally and functionally very different from ordered proteins, IDPs require
special experimental and computational tools for their identification and
analyses. Intrinsically Disordered Proteins Unfoldome and Unfoldomics Gordon Research Conferences
2010 http://www.grc.org/programs.aspx?year=2010&program=intrinsic
location proteomics: Seeks
to provide automated, objective high-resolution descriptions of protein location
patterns within cells. Methods have been developed to group proteins into
statistically indistinguishable location patterns using automated analysis of
fluorescence microscope images. ... Preliminary work suggests the feasibility of
expressing each unique pattern as a generative model that can be incorporated
into comprehensive models of cell behaviour. RF Murphy, Location
proteomics: a systems approach to subcellular location, Biochem Society
Transactions, 33 (Pt 3): 535- 538, June 2005 Google - about 344
Sept. 10, 2003; about 525 March 14, 2006
membrane
proteins:
This meeting addresses the insoluble
nature of membrane proteins, their function and interrelationships, the
challenges of achieving scale, and their importance as drug
targets. Along
the way, we will discover how membrane proteins are housed – either
fully or partially – in lipid bilayers, and why they are considered to
be extremely important intermediaries. For researchers who are
familiar with soluble proteins, membrane proteins’ hydrophobic nature
poses a significant barrier to extraction and keeping them stable in an
aqueous state. Membrane Proteins, January 13-14, 2011, Coronado, CA

Proteins which are found in membranes including cellular and intracellular membranes. They consist of two types,
peripheral and integral proteins. They include most
membrane- associated enzymes, antigenic proteins, transport proteins, and drug,
hormone, and lectin receptors. MeSH, 1977 Narrower term: Drug
& disease targets G-protein-coupled receptors GPCRs;
Related term: Microarray
categories membrane microarrays
membrane transport proteins: Membrane proteins whose primary function is to
facilitate the transport of molecules across a biological membrane. Included in
this broad category are proteins involved in active transport (BIOLOGICAL
TRANSPORT, ACTIVE), facilitated transport and ION CHANNELS. MeSH 2002
Classification,
glossary of membrane transport proteins,
IUBMB International Union of Biochemistry and Molecular Biology, 2002, 13
definitions http://www.chem.qmul.ac.uk/iubmb/mtp/intro.html#glossary
Are there any
transport proteins which are not membrane proteins? Broader term:
carrier proteins
membrane
proteomics:
Membrane proteins comprise the largest set
of proteins to resist high-throughput structural genomics efforts. One of the
major impediments to the analysis of membrane proteins is the lack of generic
and effective expression systems. The aims of the membrane protein platform are
to develop the methodologies to perform high-throughput cloning, expression,
purification and crystallization of membrane proteins. To date, we have
purified over 30 targets to homogeneity. This represents ~10% of the total
number of genes cloned (compared to an average of 40% for similar efforts with
soluble proteins). The proteins we have purified include several active
prokaryotic and eukaryotic rhomboid proteases and human G protein-coupled
receptors (GPCRs). "Membrane Proteomics", A Edwards Labs, Univ.
of Toronto, Canada, 2004 http://www.utoronto.ca/AlEdwardsLab/membrane_proteomics_index.html
Membrane proteins
perform some of the most important functions in the cell, including the
regulation of cell signaling through surface receptors, cell-cell interactions,
and the intracellular compartmentalization of organelles. Recent developments in
proteomic strategies have focused on the inclusion of membrane proteins in
high-throughput analyses. While slow and steady progress continues to be made in
gel-based technologies, significant advances have been reported in non-gel
shotgun methods using liquid chromatography coupled to mass spectrometry
(LC/MS). Wu CC, Yates John R, The
application of mass spectrometry to membrane proteomics Nature Biotechnology
21(3): 262- 267, March 2003 Google = about 25,300
March 1, 2006
Monte Carlo technique:
A simulation procedure consisting of randomly
sampling the conformational space of a molecule. IUPAC Computational Broader
term: simulation
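A minimal sketch of random conformational sampling is given below, using the Metropolis acceptance rule on a single torsion angle. The threefold potential, the temperature factor kT and the step size are hypothetical values chosen for illustration. A real simulation would sample many degrees of freedom with a molecular force field.

```python
import math
import random

def torsion_energy(angle_deg):
    """Toy threefold torsional potential (arbitrary units) with minima at
    60, 180 and 300 degrees."""
    return 1.0 + math.cos(math.radians(3 * angle_deg))

def metropolis_sample(n_steps, kT=0.6, max_move=30.0, start=0.0, seed=0):
    """Sample torsion angles: propose a random move, always accept it if the
    energy drops, otherwise accept with probability exp(-delta / kT)."""
    rng = random.Random(seed)
    angle = start
    energy = torsion_energy(angle)
    samples = [angle]
    for _ in range(n_steps):
        trial = (angle + rng.uniform(-max_move, max_move)) % 360.0
        trial_energy = torsion_energy(trial)
        delta = trial_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            angle, energy = trial, trial_energy
        samples.append(angle)
    return samples

samples = metropolis_sample(1000)
```

Over a long enough run the sampled angles concentrate near the low-energy regions of the potential, while the occasional uphill move lets the walk cross barriers between minima.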
ontologies - proteomics:
A
principal aim of post- genomic biology is elucidating the structures, functions
and biochemical properties of all gene products in a genome. However, to
adequately comprehend such a large amount of information we need new
descriptions of proteins that scale to the genomic level. In short, we need a
unified ontology for proteomics. Much progress has been made towards this end,
including a variety of approaches to systematic structural and functional
classification and initial work towards developing standardized, unified
descriptions for protein properties. In relation to function, there is a
particularly great diversity of approaches, involving placing a protein in
structured hierarchies or more- generalized networks and a recent approach based
on circumscribing a protein's function through systematic enumeration of
molecular interactions. N Lan, GT Montelione, M. Gerstein, Ontologies for
proteomics: towards a systematic definition of structure and function that
scales to the genome level, Current Opinion in Chemical Biology 7(1): 44- 54,
Feb. 2003
ORFeome: Omes & omics glossary
peptide mapping, peptide maps: Maps, genomic & genetic
peptidomics: -Omes &
-omics glossary
Google = about 180 Sept. 18, 2002;
about 748 July 14, 2004
phylogenetic profiles: Phylogenomics glossary
Can be used to hypothesize protein function.
orphan
proteins:
Those proteins that do not have significant sequence identity (>10%) with
other known proteins.
Bioinformatics.Org general forum
http://www.bioinformatics.org/pipermail/bbb/2005-July/002624.html
phosphoproteome:
Characterization of post- translational modifications in
proteins is one of the major tasks that is to be accomplished in the post-
genomic era. Phosphorylation is a key reversible modification that regulates
enzymatic activity, subcellular localization, complex formation and degradation
of proteins. DE Kalume et al., Tackling
the phosphoproteome: tools and strategies, Current Opinion in Chemical
Biology 7(1): 64- 69, Feb. 2003
Ahn NG, Resing KA (2001) Toward
the phosphoproteome. Nature Biotechnology 19: 317- 318
Google = about 88 Sept. 19, 2002;
about 773 June 18, 2004; about 3,600 Feb. 14, 2005, about 1,070 Oct. 25, 2006
phosphoproteomics:
Developments in the field of phosphoproteomics have been fueled by the need
simultaneously to monitor many different phosphoproteins within the signaling
networks that coordinate responses to changes in the cellular environment. Marc
Mumby, Deirdre Brekken, Phosphoproteomics: new insights into cellular signaling,
Genome Biology 2005,
6:230 doi:10.1186/gb-2005-6-9-230 Google = about 6,030
Aug. 15, 2005, about 39,400 Oct. 25, 2006
post- translational modification identification:
ExPASy Proteomics
Tools http://www.expasy.ch/tools/#ptm
lists a number of tools for prediction of post- translational modification, as do
other websites. Identification of these modifications may provide important
structural- functional information.
predicted proteins:
ORFs with no similarity to other sequences were named predicted
proteins. MIT, Broad Institute, Methanosarcina project
information, 2004 http://www.broad.mit.edu/annotation/microbes/methanosarcina/background.html
predictive
proteomics:
The search for predictive biomarkers of disease from high-throughput mass
spectrometry (MS) data requires a complex analysis path. Preprocessing and
machine-learning modules are pipelined, starting from raw spectra, to set up a
predictive classifier based on a shortlist of candidate features. As a
machine-learning problem, proteomic profiling on MS data needs caution like the
microarray case. The risk of overfitting and of selection bias effects is
pervasive: not only potential features easily outnumber samples by 10^3
times, but it is easy to neglect information-leakage effects during
preprocessing from spectra to peaks. Machine learning methods for predictive proteomics, Annalisa Barla et al.,
Briefings in Bioinformatics (2008) 9 (2): 119-128. doi: 10.1093/bib/bbn008 http://bib.oxfordjournals.org/content/9/2/119.abstract
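The selection-bias risk described above can be sketched on simulated data. The example below is a minimal illustration, not the pipeline of the cited paper: the data are pure noise, and the helper names (`make_noise_data`, `select_features`, `cv_accuracy`), the mean-difference ranking and the nearest-centroid classifier are all assumptions made for the demonstration. When features are selected once using every sample (the "leaky" path), test samples influence the shortlist; selecting inside each cross-validation fold avoids that.

```python
import random

def make_noise_data(n_samples=40, n_features=500, seed=1):
    """Pure-noise 'spectra': no feature truly separates the two classes."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # alternating class labels
    return X, y

def class_means(X, y, rows, feats):
    """Mean of each feature in feats, per class, over the sample indices in rows."""
    means = {}
    for label in (0, 1):
        members = [X[i] for i in rows if y[i] == label]
        means[label] = [sum(r[f] for r in members) / len(members) for f in feats]
    return means

def select_features(X, y, rows, k):
    """Rank features by absolute difference of class means; keep the top k."""
    feats = list(range(len(X[0])))
    m = class_means(X, y, rows, feats)
    return sorted(feats, key=lambda f: abs(m[0][f] - m[1][f]), reverse=True)[:k]

def cv_accuracy(X, y, k=10, n_folds=5, leaky=False):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    rows = list(range(len(X)))
    # Leaky variant: features are chosen once, using the test samples too.
    leaked = select_features(X, y, rows, k) if leaky else None
    correct = 0
    for fold in range(n_folds):
        test = [i for i in rows if i % n_folds == fold]
        train = [i for i in rows if i % n_folds != fold]
        # Nested variant: features are chosen from the training samples only.
        feats = leaked if leaky else select_features(X, y, train, k)
        m = class_means(X, y, train, feats)
        for i in test:
            dist = {lab: sum((X[i][f] - m[lab][j]) ** 2 for j, f in enumerate(feats))
                    for lab in (0, 1)}
            correct += int((0 if dist[0] <= dist[1] else 1) == y[i])
    return correct / len(X)

X, y = make_noise_data()
print("leaky selection :", cv_accuracy(X, y, leaky=True))
print("nested selection:", cv_accuracy(X, y, leaky=False))
```

Because the data contain no real signal, an honest estimate should hover near chance (0.5); a leaky estimate tends to look much better than that, which is the optimistic bias the entry warns about.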
probable protein: See under putative proteins
probable protein (similarity):
When a protein exhibits extensive sequence similarity to a characterised protein
and/ or has the same conserved regions then the label ‘probable' is used in the DE line.
"SWISS- PROT" in Introduction to Molecular Biology Databases, R.
Apweiler, R. Lopez, B. Marx, 1999 http://www.ebi.ac.uk/panda/Publications/mbd1.html
Related term: Protein categories putative protein
protein analysis
sequencing: A process that includes the determination of AMINO ACID SEQUENCE
of a protein (or peptide, oligopeptide or peptide fragment) and the information
analysis of the sequence. MeSH 2000
protein array
analysis:
Ligand-binding assays that
measure protein- protein, protein- small molecule or protein- nucleic acid
interactions using a very large set of capturing molecules, i.e., those attached
separately on the solid support, to measure the presence or interaction of
target molecules in the sample. MeSH 2003
protein bioinformatics:
Tools for Protein Informatics • sequence and structure comparison
• multiple alignments • phylogenetic tree construction •
composition/pI/mass analysis • motif/pattern identification • 2° structure
prediction/threading • TMD prediction/hydrophobicity analysis • homology
modeling • visualization A Very very very short introduction to
protein bioinformatics, Patricia
Babbitt 2003 http://pga.lbl.gov/Workshop/May2003/lectures/Babbitt.pdf
See also Proteomics
protein
informatics Is there a difference?
Google = about 690 April 1, 2003
protein
computational tools: Computational life science
modeling has emerged as an important tool from academic research to advance
industrial applications which support strategic decision making throughout the
protein production pipeline. High-throughput technologies are producing reliable
and quantitative data that aid engineering methods to model complex biological
systems. Protein
Computational Tools January 13-14, 2011 • Coronado, CA
 protein data:
applying informatics for biotherapeutics, expression and formulation; data
mining, linking structure to function, enabling expression, cloud computing. Protein Data Integration and
Interrogation January 12-13, 2011 • Coronado, CA

protein databases:
Protein location can be determined by such genome-
wide techniques as green fluorescent protein (GFP) tagging, and protein-
protein interactions can be determined by affinity chromatography,
immunoprecipitation and yeast two- hybrid experiments. Databases resulting from
these methods are beginning to emerge, but they are of uncertain accuracy.
Defining the Mandate of Proteomics in the Post- Genomics Era, Board on
International Scientific Organizations, National Academy of Sciences, 2002 http://www.nap.edu/books/NI000479/html/R1.html
Dr. Stanley Fields, Professor of Genetics and
Medicine at the Univ. of Washington and developer of the yeast two hybrid system
writes that protein databases "will need to become much more sophisticated
if they are to help scientists make sense of the staggering number of experimental
measurements that will soon emerge. ... protein
data will need to be integrated with results from expression profiling, genome-
wide mutation or antisense analyses, and polymorphism detection.
As proteomic data accumulate, we will become better at triangulating from
multiple disparate bits of information to gain a bearing on what a protein
does in the cell." S. Fields "Proteomics in Genomeland" Science
291: 1221-1224 Feb. 16, 2001 Related terms: protein identification, protein localization; Expression
glossary expression profiling
Protein databases Databases
& software directory
protein dynamics: Certain parts of a particular protein will
be rigid, but others may be flexible and change their shape, even when
bound. ... NMR has the unique ability to characterize protein fluctuations
quantitatively, much more so than crystallography can. Understanding the function of a protein is fundamental for gaining insight
into many biological processes. Proteins are stable mechanical constructs
that allow certain internal motions to enable their biological function.
Structural properties of a protein can be obtained with X-ray
crystallography or NMR acquisition techniques. Molecular dynamics
(MD) simulations at pico/ nano- second time scales output one or more
trajectory files which describe the coordinates of each individual atom
over time. The main problem with animating these trajectories is one of
temporal scale. Taking large time steps will destroy the impression of
smooth motion, while small time steps will result in the camouflage of
interesting motions. [Henk Huitema, Robert van Liere "Interactive Visualization
of Protein Dynamics" ERCIM [European Research Consortium for Computers
and Informatics] News No. 44 - January 2001] http://www.ercim.org/publication/Ercim_News/enw44/van_liere.html
Google = about 5,800 Sept. 18, 2002;
about 18,200 July 14, 2004; about 295,000 Nov 10, 2006
protein expression mapping: Maps, genetic
& genomic
protein expression profiling: Expression
protein folding problem: Protein
structures See also protein structure
prediction
protein function: The focus of the group is the understanding
of protein function and evolution using genomic, structural and proteomic data.
Central to this question is the concept of the domain: a structurally conserved,
genetically mobile unit. When viewed at the three-dimensional level of protein
structure, a domain is a compact arrangement of secondary structures connected
by linker polypeptides. It usually folds independently and possesses a
relatively hydrophobic core. The importance of domains is that they cannot be divided
into smaller units; they represent a fundamental building block that can be used
to understand the evolution and function of proteins... The advent of
complete genomic sequences, including more and more eukaryotes, is leading to a
fundamental change in protein domain analysis. Having characterised most of the
domain families and having developed tools to predict them, we can now start to
analyse their function and evolution on a higher level. Protein
Function Analysis Group, Max Planck Institute for Molecular Genetics, Germany
http://protfunc.molgen.mpg.de/
Function is not a fixed property for many, if not most, proteins. There are many
ways that gene products can be altered to elicit modified or completely new
functions. For example, there exist:
- alternative splicing, which may affect as many as ¼ or more of the genes in a
higher eukaryote and can alter biochemical function either drastically or
subtly, producing truncated proteins and proteins with different compositions
- post- translational modification, such as phosphorylation and glycosylation
(which can occur on numerous sites on the same protein)
- pre-enzymes made for secretion and pro- enzymes that are activated by cleavage
- acylation and ubiquitination
- non- enzymatic modifications like oxidation, so a given protein exists in the
cell in different oxidized states.
Defining the Mandate of
Proteomics in the Post- Genomics Era, Board on International Scientific
Organizations, National Academy of Sciences, 2002 http://www.nap.edu/books/NI000479/html/R1.html
More systematic attempts have been made to place
proteins within a hierarchy of standard functional categories or to connect them
in overlapping networks of varying types of associations. These networks
can obviously include protein- protein interactions ... More broadly, they can
include pathways, regulatory systems and signaling cascades... Perhaps, in the
future, the systematic combination of networks may provide for a truly rigorous
definition of protein function. Mark Gerstein, et al. "Integrating
Interactomes" Science 295 (5553): 284, Jan. 2002
A biologically useful definition of the function of a protein requires a description at several different levels. To the biochemist, function means the biochemical role of an individual protein: if it is an
enzyme, function refers to the reaction catalyzed; if it is a signaling protein, function refers to the interactions that the protein makes. To the geneticist or cell biologist, function includes these roles but will also encompass the cellular roles of the protein, such as the
phenotype of its deletion, the pathway in which it operates, among others. A physiologist or developmental biologist may have an even broader view of function, including tissue specificity and
expression during the life cycle of the organism.
Gregory A Petsko, Dagmar Ringe "Overview: The Structural Basis of Protein
Function" from Chapter 2 of Protein Structure and Function: New Science
Press, 1991-2001
In the expanded view of protein function, a
protein is defined as an element in the network of its interactions. Various
terms have been coined for this expanded notion of function, such as ‘contextual
function’ or ‘cellular function’ … Whatever the term, the idea is that
each protein in living matter functions as part of an extended web of interacting
molecules … Often it is possible to understand the cellular functions of
uncharacterized proteins through their linkages to characterized proteins.
In broader terms, the networks of linkages offer a new view of the meaning
of protein function, and in time should offer a deepened understanding
of the function of cells. David Eisenberg et al "Protein function in the post-
genomic era" Nature 405: 823- 826, 15 June 2000
The principal problem facing the post- genome era.
Walter Blackstock & Malcolm Weir "Proteomics" Trends in Biotechnology: 121-134 Mar
1999 Google = about 27,400 Sept. 18, 2002
about 58,400 Aug. 18, 2003, about 133,000 July 14, 2004; about 766,000 Nov 10,
2006 Related terms: Protein
categories interaction proteomics; Functional
genomics glossary gene function, Gene Ontology™; Maps cell mapping
protein identification:
The analytical method used most commonly to
visualize and identify large numbers of proteins is 2D-gel electrophoresis.
One can theoretically visualize changes in protein production, both
qualitatively and quantitatively, from two individual samples (e.g., a
control preparation and a treated preparation). Furthermore, one can potentially
accomplish protein identification by "picking" proteins from the 2D-
gel and subjecting the highly purified protein to MALDI- TOF mass
spectrometry. "High - Throughput Genomics" CHI Genome Link
14.1 http://www.chidb.com/newsarticles/issue14_1.asp
Google = about 8,460 Sept. 18, 2002
about 15,000 Aug. 18, 2003, about 32,000 July 14, 2004; about 494,000 Nov 10,
2006 Related term: protein databases
protein informatics:
The Protein Informatics Group currently consists of a collaboration between
researchers at the Oak Ridge National Laboratory, the University of Missouri,
and the University of Georgia. Our common interests are in development of
computational tools for solving problems from molecular biology. Our work ranges
from construction of mathematical/statistical models to development of
algorithms to code implementation on various platforms to applications of
computational tools to solve various bio-data analysis problems. Protein Informatics Group, Computational Biology, Oak Ridge
National Lab, US http://compbio.ornl.gov/structure/ Computational
biological research has become an essential component of biological research. The great quantity
and diversity of the data being generated by different technologies is daunting,
and impossible to organize or oversee without computational assistance. In
functional genomics, a great deal of effort has been devoted to developing
community- based standards for reporting gene expression data to allow others to
replicate experiments. The same will need to be done for proteomics to validate
across the different technologies. Perhaps never before has a bioinformatics
problem of this magnitude been approached. Without effective and integrated
databases to store and retrieve these data and advanced computational methods
such as pattern recognition and other machine learning approaches to analyze and
interpret them, the full implications of these data will not be realized.
Defining the Mandate of Proteomics in the Post- Genomics Era, Board on
International Scientific Organizations, National Academy of Sciences, 2002 http://www.nap.edu/books/NI000479/html/R1.html
Although mining of protein
structure homology data is a relatively small field now, it is likely to
experience dramatic growth and to become pivotal in the ultimate exploitation of
genomic data and tools. Google = about 561 Sept. 18, 2002,
about 888 Aug. 18, 2003, about 1,810 July 14, 2004; about 16,200 Nov 10, 2006
Related
terms: proteoinformatics; Algorithms glossary;
protein bioinformatics; In Silico & molecular modeling
glossary
protein interactions:
See protein DNA interactions, protein protein interactions, protein RNA
interactions Google = about 59,900 Sept. 18, 2002;
about 141,000 Aug. 18, 2003, about 271,000 July 14, 2004; about 1,170,000 Nov
10, 2006 Narrower terms: annotation- proteins, binary
interaction, interaction proteomics, protein- DNA
interactions, protein- protein
interactions, protein- RNA interactions; Related terms: protein networks; -Omes
& -omics glossary interactome
protein interaction mapping: Maps genomic
& genetic
protein linkage maps: Maps genomic &
genetic
protein & mRNA data:
Although the relationship between
mRNA and
protein levels is vague for individual genes, some of the statistics for broad
categories of protein properties are much more robust... In contrast to the
differences between mRNA and protein data for individual genes, the broad
categories show that the transcriptome and translatome populations are
remarkably similar; both contain roughly the same proportions of secondary
structure and functional categories. Moreover, this contrasts the difference
with the genome, which appears to have a distinctly different composition of
functional categories. This illustrates that we get a more consistent picture
when we average across the population, i.e. there is broad similarity between
the characteristics of highly expressed mRNA and highly abundant proteins.
Dov Greenbaum, Mark Gerstein et al. "Interrelating Different Types of
Genomic Data" Dept. of Biochemistry and Molecular Biology, Yale Univ. 2001 http://bioinfo.mbb.yale.edu/e-print/omes-genomeres/text.pdf
Related terms:
Expression glossary; Genomics
glossary genome data; functional
genomics data Omes & omics
transcriptome, translatome
protein networks:
The individual steps in signal
transduction pathways involve protein interactions with target molecules that
may be other proteins, small molecules or DNA. Identifying all of the proteins
that take part in a given class of interactions, on a genome-wide scale, remains
an extremely challenging task. We propose to apply mRNA display (1,
2) technology to this problem, with the goal of developing
databases of protein-ligand interactions that will add value to the existing and
growing sequence databases. PI Jack Szostak, Definition of Protein Networks
using mRNA display, ParaBioSYs, MGH, HMS, BU http://pga.mgh.harvard.edu/Parabiosys/projects/protein_networks_rna_display.php
Google = about 1,160 Sept. 18, 2002;
about 2,530 Aug. 18, 2003; about 6,170 July 14, 2004; about 138,000 Nov 10, 2006
protein sequence:
A process that includes the determination of an amino acid
sequence of a protein (or peptide, oligopeptide or peptide fragment) and the
information analysis of the sequence. MeSH, 2002 See
also amino acid sequence.
protein sequence space: [J.] Maynard-Smith's (1970. Natural Selection and the concept of a protein space. Nature 225: 563- 564) concept of a "protein
sequence space" in which each site in an alignment is represented on its own axis and the number
of axes required to represent all conceivable variants for a protein is equal to the number of sites
in its sequence. Each sequence occupies a unique point in this space; variants differing at one site
are adjacent (Hamming) neighbours. The collection of all viable sequence variants for a
particular protein forms a localized interconnected `neighbourhood' of points within the space.
This representation has proved conceptually intuitive and analytically powerful
...
In protein sequence space, constraints are reflected in the multidimensional shape of the
cluster of points that make up the "neighbourhood" of variants viable for a specific protein. The
boundary defining the edge of this neighbourhood is characteristic of the protein's function and
can be thought of as its functional "signature". Gavin JP Naylor,
"Measuring Shifts In Function and Evolutionary Opportunity Using
Variability Profiles: A Case Study of the Globins" also Journal of
Molecular Evolution 51 (3): 223-233 Sept. 2000 http://bioinfo.mbb.yale.edu/e-print/protspace-jme/text.pdf
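The geometry described above can be made concrete with a short sketch. It assumes the standard 20-letter amino acid alphabet and treats sequences as plain strings; the function names `hamming` and `one_site_neighbours` are hypothetical, and no notion of "viable" variants is modelled here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20 residues (assumed alphabet)

def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def one_site_neighbours(seq):
    """Yield every variant that differs from seq at exactly one site."""
    for i, residue in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != residue:
                yield seq[:i] + aa + seq[i + 1:]

seq = "MKT"
neighbours = list(one_site_neighbours(seq))
print(len(neighbours))               # 19 substitutions at each of 3 sites
print(len(AMINO_ACIDS) ** len(seq))  # total number of points in this space
```

Each sequence of length L has 19 × L adjacent (Hamming distance 1) neighbours, while the whole space holds 20^L points, which is why the viable variants of a protein occupy only a small, localized neighbourhood.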
protein sorting signals:
Amino acid sequences found in transported proteins that selectively guide the distribution of the proteins to specific cellular compartments.
MeSH, 2001
Protein
Spotlight, Swiss-Prot http://au.expasy.org/spotlight/
One month, one protein
Protein Structure Initiative:
Aims at determination of the 3D
structure of all proteins. This aim can be achieved in four steps: Organize
known protein sequences into families; Select family representatives
as targets; Solve the 3D structure of targets by X-ray crystallography
or NMR spectroscopy; Build models for other proteins by homology to solved
3D structures. http://www.structuralgenomics.org/
Protein Structure
Initiative NIGMS http://www.nigms.nih.gov/funding/psi.html
protein structure prediction:
Methods for
protein structure prediction have matured to the point where models produced by
prediction algorithms can be used to understand and test hypotheses about
biological function. The goal of this community wide effort is to provide
structural and functional insights into biologically important proteins,
particularly those that are intractable to experimental structural
determination. Ten Most Wanted, Critical Assessment of Techniques for Protein
Structure Prediction, CASP, Lawrence Livermore National Lab, US http://predictioncenter.llnl.gov/
Protein 3D structures are encoded
by a linear sequence of amino acid residues. To predict 3D structure from
sequence is a task challenging enough to have occupied a generation of
researchers. Have we finally succeeded? The bad news is: we still cannot
predict structure for any sequence. The good news is: we have come closer,
and growing databases facilitate the task. A solution of the structure
prediction problem would supposedly change experimental molecular biology
more than any other theoretical method. We may witness such a break- through
in the near future. However, the lessons from the Asilomar prediction contests
were that we may need a common frame- work to co- ordinate the efforts of
the researchers in the field. "Neural networks for protein structure prediction:
hype or hit?" Burkhard Rost, Dec. 1999 http://www.embl.de/~rost/Papers/pre1999_tics/paper.html
Involves primary sequence alignment,
secondary and tertiary structure prediction and homology modelling.
Narrower term: ab initio
protein structure prediction
protein taxonomy:
A
Protein Taxonomy Based on Secondary Structure T. Przytycka, R. Aurora, GD Rose, Nature Structural Biology 6
(7): 1999.
protein threading: See
threading
proteome map: Maps, genomic & genetic
glossary
Google = about 149 Sept. 18, 2002;
about 319 Aug. 18, 2003; about 746 June 21, 2004; about 18,300 Nov 10,
2006
proteome mining:
Timothy AJ Haystead "Proteome Mining: Exploiting
serendipity in drug discovery" Current Drug Discovery, March 2001 http://www.current-drugs.com/CDD/CDD/CDDPDF/HAYSTEAD.pdf
Google = about 68 Sept. 18, 2002;
about 156 Aug. 18, 2003; about 276 June 21, 2004; about 951 Nov 10, 2006
Related term: proteome database mining
proteomic analysis:
Systematic and
quantitative analysis of the properties that define protein activity and
functions within a defined context, essential for biology and medicine. Ruedi
Aebersold quoted in Defining the Mandate of Proteomics in the Post- Genomics
Era, National Academies Press, 2002 http://www.nap.edu/books/NI000479/html/R1.html A
systematic analysis of proteins for their identity, quantity and function. J Peng
and Steven Gygi, Proteomics: the move to mixtures, Journal of Mass Spectrometry
35: 1083- 1091, 2001
proteome informatics:
Peer Bork and David Eisenberg, "Genome and
proteome informatics" Current Opinion in Structural Biology 10 (3):
341-342, 2000
Proteome Informatics group
is part of the Swiss
Institute of Bioinformatics (SIB). It is in charge of research and
development in the fields of bioinformatics, molecular imaging
and the use of Internet for biomedical applications. Current Projects and
People, ExPASy, Swiss Institute of Bioinformatics http://au.expasy.org/people/pig/
Google = about 261 Sept. 18, 2002;
about 453 Aug. 18, 2003; about 708 July 14, 2004; about 10,700 Nov 10, 2006
Proteomic
Standards Initiative PSI:
The HUPO Proteomics Standards Initiative (PSI)
defines community standards for data representation in proteomics to facilitate
data comparison, exchange and verification. Proteomic Standards Initiative,
HUPO http://www.psidev.info/
putative proteins:
Some similarity to one or more existing entries: It is in this category that
the adjective "putative" comes into play. For these cases, again there
is no experimental proof that the protein exists and there is only limited
evidence to point the protein to a particular family. Again, we have no fixed
rules on what is "limited" and what isn't. It is a judgment that we
make based on which family it is and which, if any, areas are conserved. A
primer on UniProtKB/Swiss-Prot annotation Name: ANNBIOCH.TXT Release: 54.0 of
24-Jul-2007 http://www.expasy.ch/cgi-bin/lists?annbioch.txt
The label ‘putative' is used in the DE [descriptor] line of proteins that exhibit limited sequence similarity to characterised proteins. These proteins often have a conserved site e.g. ATP-binding site but no other significant similarity to a characterised protein. It is most frequently used for sequences from genome projects.
The assignment of the labels ‘probable' and ‘putative' is dependent primarily on the results of
sequence similarity searches against
SWISS- PROT. It is important to point out here that no specific cut- off point is used to assign a protein as ‘putative' or
‘probable'. "SWISS- PROT" in Introduction to Molecular
Biology Databases, R. Apweiler, R. Lopez, B. Marx, 1999 http://www.ebi.ac.uk/panda/Publications/mbd1.html Related term probable protein (similarity)
quantitation - proteins:
It is likely that in the
near future, researchers will continue to use comprehensive gene arrays at the
start of their work, to generate hypotheses and narrow their research questions.
Then, they might delve deeper into these questions by using non-array- based
gene expression studies (to get better quantitation and true relative
expression) or go to a focused protein array that covers most of the proteins that
are indicated based on the gene array experiments. "Proteomewide
chips - not so fast" CHI's GenomeLink 21.2 http://www.chidb.com/newsarticles/issue21_2.asp
regulatory homology:
Quantitative analysis of protein expression data
obtained by high - throughput methods has led us to define the concept of
"regulatory homology" and use it to begin to elucidate the basic
structure of gene expression control in vivo. N. Leigh Anderson, Norman
G. Anderson "Proteome and proteomics; New technologies, new concepts, and
new words" Electrophoresis 19(11):1853-61 August 1998 Google = about 22 Sept. 18, 2002;
about 38 Aug. 18, 2003; about 49 July 14, 2004; about 144 Nov 10, 2006
regulatory proteins: A detailed understanding
of the interplay between regulating proteins and DNA targets is required to
interpret transcriptomic data and to model the dynamics of genetic networks. Two
key problems in this respect are the control of protein traffic on DNA and the
combined effects of several regulating proteins operating on the same target
gene. International Workshop on Regulatory Proteins Interplay and Traffic
on DNA, July 12-13, 2002, Evry, France http://www.lami.univ-evry.fr/~epigenese/Ecoles/Autrans/workshop.html
reverse proteomics:
In reverse proteomics, the starting point is the
DNA sequence of the genome of an organism. First, the transcriptome (complete
set of transcripts) and proteome (complete set of proteins) are predicted in
silico and subsequently this information is used to generate reagents for
their analysis. Marc Vidal, AJ Walhout, "Protein Interaction Maps for Model Organisms" Nature
Reviews Molecular Cell Biology 2; 55- 63, Jan. 2001 http://www.nature.com/nrm/journal/v2/n1/slideshow/nrm0101_055a_F2.html
Compounds can be tested to see if they
can disrupt protein - protein interactions - a strategy that may be extremely
useful for the development of new drugs. [Wellcome Trust, UK "The Human
Genome Functional Genomics"]
RNA structural genomics:
The systematic determination of all
macromolecular structures represented in a genome, is focused at present
exclusively on proteins. It is clear, however, that RNA molecules play a variety
of significant roles in cells, including protein synthesis and targeting,
many forms of RNA processing and splicing, RNA editing and modification,
and chromosome end maintenance. To comprehensively understand the biology of a cell, it will ultimately be necessary to know the identity of all encoded RNAs,
the molecules with which they interact and the molecular structures of these
complexes. This report focuses on the feasibility of structural genomics of RNA,
approaches to determining RNA structures and the potential usefulness of an RNA
structural database for both predicting folds and deciphering biological
functions of RNA molecules. [Jennifer A. Doudna "Structural Genomics of
RNA" Nature Structural Biology 7 (11) supp: 954-956 (Nov. 2000)] http://www.euchromatin.org/Doudna1.htm
Rosetta stone method: A way of looking at the correlation of
protein
domains across species. Some proteins have homologs that are fused
in other species, yielding clues as to the proteins with which they might
interact. In addition, proteins that have been identified in particular
complexes and pathways hint at the location and function of their homologs
in other species. S. Spengler “Bioinformatics in the information age”
Science 287 (5451): 221- 223 Feb. 18, 2000 Related term: Phylogenomics glossary phylogenetic profiles
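The fusion logic behind this method can be sketched as follows. This is a minimal illustration under stated assumptions: each genome is represented as a hypothetical mapping from protein name to a set of domain identifiers (in practice these would come from sequence or domain searches), and the function name `rosetta_stone_pairs` and all protein and domain labels are invented for the example.

```python
from itertools import combinations

def rosetta_stone_pairs(query_genome, reference_genome):
    """Find pairs of separate query proteins whose domains are fused in one reference protein.

    Both arguments map a protein name to the set of domain identifiers it contains.
    Returns a list of (protein_1, protein_2, fused_reference_protein) tuples.
    """
    predictions = []
    for (p1, d1), (p2, d2) in combinations(sorted(query_genome.items()), 2):
        # Skip proteins without domains, and pairs that share a domain
        # (a shared domain suggests homology rather than a fusion link).
        if not d1 or not d2 or d1 & d2:
            continue
        for fused, domains in sorted(reference_genome.items()):
            if d1 <= domains and d2 <= domains:
                predictions.append((p1, p2, fused))
    return predictions

# Hypothetical toy genomes.
species_a = {"protA": {"dom1"}, "protB": {"dom2"}, "other": {"dom9"}}
species_b = {"fusedAB": {"dom1", "dom2"}, "single": {"dom9"}}

print(rosetta_stone_pairs(species_a, species_b))
# [('protA', 'protB', 'fusedAB')]
```

A predicted pair is only a hint that the two proteins might interact or act in the same pathway; it would still need experimental support.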
sequence homology, amino acid:
The degree of similarity between sequences of amino acids. This information is useful for the understanding of genetic relatedness of certain species.
MeSH, 1993
signal transduction: Metabolic
engineering glossary
similarity:
Quantity that indicates, for example, the percentage of
identical amino acids between two sequences. Similarity is an observed quantity
that might, for example, be expressed in percent of residues that are similar
between two aligned sequences. Similarity is a bad measure, because it is
subjective. The author of the software decides whether Gln and Asp are similar
or not. The percentage identity is a much better measure. There
is an important difference between similarity and homology. Similarity is a
value between 0.0 and 1.0, or between 0 and 100%. On the other hand, there are
no degrees of homology. The sequences are either homologous or not. Center for Molecular and Biomolecular Informatics,
Dictionary, Univ. of Nijmegen,
Netherlands, 2001 http://www.cmbi.kun.nl/gvteach/dictionary.shtml
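The contrast drawn above between identity and similarity can be sketched in a few lines. The example assumes the two sequences are already aligned (equal length, `-` for gaps) and counts only columns without gaps in the denominator, which is one of several conventions in use. The function names and the `groups` argument are hypothetical; `groups` stands in for whatever the software author decides counts as "similar".

```python
def _percent_matching(aligned_a, aligned_b, is_match, gap="-"):
    """Percentage of gap-free alignment columns for which is_match(x, y) is true."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matched = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        matched += int(is_match(x, y))
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matched / compared

def percent_identity(aligned_a, aligned_b):
    """Objective measure: residues must be identical."""
    return _percent_matching(aligned_a, aligned_b, lambda x, y: x == y)

def percent_similarity(aligned_a, aligned_b, groups):
    """Subjective measure: residues match if identical or in the same chosen group."""
    def similar(x, y):
        return x == y or any(x in g and y in g for g in groups)
    return _percent_matching(aligned_a, aligned_b, similar)

a, b = "MKT-AY", "MRTLAY"
print(percent_identity(a, b))                  # 80.0
print(percent_similarity(a, b, [{"K", "R"}]))  # 100.0
```

Changing `groups` changes the similarity score while the identity score stays fixed, which is the sense in which the entry calls similarity subjective.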
structural bioinformatics:
Involves the process of determining
a protein's three- dimensional structure using comparative primary sequence
alignment,
secondary and tertiary structure prediction methods, homology modeling,
and crystallographic diffraction pattern analyses. Currently, there is
no reliable de novo predictive method for protein 3D- structure determination.
Over the past half- century, protein structure has been determined by purifying
a protein, crystallizing it, then bombarding it with X-rays. The X-ray
diffraction pattern from the bombardment is recorded electronically and
analyzed using software that creates a rough draft of the 3D structure.
Biological scientists and crystallographers then tweak and manipulate the
rough draft considerably. The resulting spatial coordinate
file can be examined using modeling- structure software to study the gross
and subtle features of the protein's structure. Christopher Smith "Bioinformatics,
Genomics, and Proteomics" Scientist 14[23]:26, Nov. 27, 2000 http://the-scientist.com/yr2000/nov/profile_001127.html Related terms Algorithms,
In silico & Molecular
Modeling.
Structural Biology Industrial Platform:
Fifteen companies, including
representatives of some of Europe's largest pharmaceutical industries,
have formed the Structural Biology Industrial Platform to work with each
other, the European Commission and Research Centres in Europe to promote
structural biology research, training and development. http://www.sbip.org/
structural genomics:
Focuses on the physical aspects of the genome through the
construction and comparison of gene maps and sequences, as well as gene
discovery, localization, and characterization. Brush up on your 'omics, Chemical
& Engineering News, 81(49): 20, Dec. 2003 http://pubs.acs.org/cen/coverstory/8149/8149genomics1.html
The fast-developing
fields of structural and functional genomics -- studies of proteins encoded by
the entire genome -- are being brought to bear on the problem of understanding
the root of many cancers. A protein's structure can tell researchers much about
its function, information that ultimately is needed to understand a protein's
link to cancer. By determining the detailed, three-dimensional structure of
proteins, researchers are better able to understand how each protein functions
normally and how faulty protein structures can cause disease. David Brand,
MacCHESS moves into cancer research through structural genomics, Cornell,
2001 http://www.news.cornell.edu/Chronicle/01/2.22.01/MacCHESS.html
Involves quickly determining the 3D structures of large numbers of
proteins
(or other complex biological molecules, such as nucleic acids), ultimately
accounting for an organism’s entire proteome. Footnote: As traditionally
defined, the term structural genomics referred to the use of sequencing
and mapping technologies, with bioinformatic support, to develop complete
genome maps (genetic, physical, and transcript maps) and to elucidate genomic
sequences for different organisms, particularly humans. Now, however, the
term is increasingly used to refer to high-throughput methods for determining
protein structures.
Many of the criticisms leveled at the Human Genome Project in
the mid-1980s have been redirected toward structural genomics. Unlike
high-throughput genome sequencing, it is not a simple matter to decide
when a structural genomics effort has reached completion. SK Burley et
al., "Structural genomics: beyond the Human Genome Project," Nature Genetics
23: 151, Oct. 1999. Related term: structural proteomics
A good explanation of structural genomics: Joint
Center for Structural Genomics http://www.jcsg.org/help/robohelp/Definitions/Structural_Genomics.htm
Human Proteome/Structural Genomics Pilot Project, Brookhaven National
Laboratory, US http://www.proteome.bnl.gov/
A pilot project to examine the feasibility of high-throughput
determination of 3-dimensional structures of proteins by x-ray crystallography,
starting from genome sequences.
Human Proteomics Initiative, Swiss Institute of Bioinformatics, European
Bioinformatics Institute http://us.expasy.org/sprot/hpi/
A major project to annotate all known human sequences
according to the quality standards of Swiss-Prot. This means providing, for
each known protein, a wealth of information that includes the description of its
function, its domain structure, subcellular location, post-translational
modifications, variants, similarities to other proteins, etc.
Structural Genomics Initiative,
NIGMS, US http://www.nigms.nih.gov/funding/psi.html
Structural genomics databases Databases & software
directory.
structural homology: Identify
3D structures of proteins or domains in the same family as a sequence of
interest.
Related terms: homology Functional
genomics glossary homology modeling Molecular
modeling glossary
structural homology
protein:
The degree of 3-dimensional shape similarity between proteins. It
can be an indication of distant AMINO
ACID SEQUENCE HOMOLOGY and used for rational DRUG
DESIGN. MeSH 2003
structural proteomics:
is focused on the
determination of three-dimensional (3D) structures of annotated and un-annotated
proteins … emerged from the simultaneous developments of rapid and parallel
methodologies in gene cloning, protein purification, and 3D structure
determination and recent results have demonstrated the feasibility and
importance of this approach for functional annotation. The classic work by
Zarembinski et al. (1998, PNAS, 95: 15189-15193) represents one possible outcome
of the structural analysis of an unknown protein, in which a protein-bound
ligand or cofactor was discovered. Such information is the most useful for
functional annotation because it identifies the nature of the ligand, the
ligand-binding site and the disposition of catalytic residues, from which a
catalytic mechanism can be postulated. Other sources of structure-derived
information come by identifying structural homologues in databases or local
structural motifs or putative catalytic sites. Our strategy involves the use of
X-ray crystallography and NMR spectroscopy to determine the structures of
hypothetical proteins. Yeast Integrative Biology Project
Genome
Canada funded, University of Toronto http://yeastgenomics.ca/structure
Sometimes referred to as structural genomics, this
discipline involves determining the 3D structures of large numbers of proteins,
ultimately accounting for an organism's entire proteome. It adds critical
information at two or more points in the drug discovery pathway: (1) target
identification, or selecting a pathway in which a drug might function, and (2)
medicinal chemistry, or the actual design of compounds to modulate this pathway.
Structural proteomics is also described as a high-throughput, system-wide means
of determining gene function. It
typically involves using high-throughput X-ray diffraction methods to determine
the structure of proteins encoded by at least one member of each gene
family in the genome. This approach is coupled with bioinformatics
and computational modeling
to determine structures of other proteins in the same family. An
important goal of structural proteomics is the creation of databases of
structures. When asked to identify bottlenecks in the structural proteomics field,
several academic and industry scientists pointed to the need for faster and more
reliable protein production and purification strategies, rather than stronger
beams at the X-ray crystallography step.
structure based design:
A design
strategy for new chemical entities based on the three-dimensional (3D) structure of the target obtained by X-ray or nuclear
magnetic resonance (NMR) studies, or from protein homology models. IUPAC
Computational structure-based
drug design: Structure-based drug design took
nearly two decades of multiple, parallel technological improvements to arrive at
its current mainstream position in medicinal chemistry. Developments in computer
graphics, high-power radiation sources, computational processing power,
refinement protocols, virtual screening and crystallography were all necessary
to create the environment for rapid, iterative structure-based drug discovery.
Given the crisis facing the pharmaceutical industry in the
translation of early stage drug discovery results, a different set of tools,
concerned with algorithms and methods for predicting the biological profiles,
will need to be refined. Structure-Based
Drug Design, June 6-8, 2012, Cambridge, MA
structure from sequence:
See protein structure prediction,
structural homology
structure prediction problem:
The protein secondary structure
prediction problem has become a classic, challenging problem for the artificial
intelligence and machine learning community. Virtually every conceivable
computational technique in these fields (e.g., information theory [6, 12, 13], artificial
neural networks [15, 20, 22], cascaded networks [18, 19, 27], hybrid systems
[28], nearest neighbor methods [21], hidden Markov chains [4], machine
learning [17, 25], mutual information [26]) has been applied in the context of
protein structure prediction. The reason for this attention is well-founded and
clear: If protein structure, even secondary structure, can be accurately
predicted from the now abundantly available gene
and protein sequences, such
sequences become immensely more valuable for the understanding of drug
design, the genetic basis of disease, the role of protein
structure in its enzymatic, structural, and signal transduction
functions, and basic physiology from molecular to cellular, to fully systemic
levels. In short, the solution of the protein structure prediction problem (and
the related protein folding problem) will bring on the second phase of
the revolution. Peter Munson et al., "Protein Secondary Structure
Prediction," NIH, 1994 http://abs.cit.nih.gov/reprints/text3.html
SWISS-PROT: Databases &
software directory
threading:
In this approach, a target sequence is "threaded"
through a library of 3D folds to try to find a match. This method
is used when no sequence of known structure is clearly related to the target sequence.
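The idea of threading a sequence through a fold library can be sketched in a few lines of Python. This is a toy under stated assumptions, not a real threading program: each fold is reduced to a hypothetical string of per-position environments ("b" for buried, "e" for exposed), the scoring rule simply rewards hydrophobic residues in buried positions and polar residues in exposed ones, and the alignment is gapless. The fold names, the hydrophobic set, and all function names are invented for illustration.

```python
# Hypothetical set of residues treated as hydrophobic for this toy score.
HYDROPHOBIC = set("AVLIMFWC")


def position_score(residue, environment):
    """+1 if the residue type suits the environment, otherwise -1."""
    is_hydrophobic = residue in HYDROPHOBIC
    is_buried = environment == "b"
    return 1 if is_hydrophobic == is_buried else -1


def thread_score(sequence, fold_env):
    """Best gapless score of the sequence slid along one fold template.

    Returns None when the sequence is longer than the template.
    """
    n, m = len(sequence), len(fold_env)
    if n > m:
        return None
    best = None
    for offset in range(m - n + 1):
        score = sum(
            position_score(res, fold_env[offset + i])
            for i, res in enumerate(sequence)
        )
        if best is None or score > best:
            best = score
    return best


def best_fold(sequence, library):
    """Score the sequence against every fold; return (best name, all scores)."""
    scores = {}
    for name, env in library.items():
        score = thread_score(sequence, env)
        if score is not None:
            scores[name] = score
    if not scores:
        raise ValueError("no fold in the library is long enough")
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy library: environment strings stand in for real 3D folds.
    library = {"fold_A": "bbeebbee", "fold_B": "eeeebbbb"}
    name, scores = best_fold("LVKDLVKD", library)
    print(name, scores)
```

Real threading methods score against full 3D environments and pairwise contact potentials and allow gaps; the sketch only conveys the "try every fold, keep the best-scoring one" structure of the approach.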
virtual genomes:
A distributed computing project to use protein design
to generate new "virtual genomes." Our project, Genome@home,
studies real genomes and proteins directly, by designing new sequences for
existing 3-D protein structures, which come from real genomes. The protein
structure files that are sent out as work contain the Cartesian atomic
coordinates of a protein. This data was obtained experimentally through X-ray
crystallography or NMR techniques. Note that this was not done by us;
thousands of scientists have spent decades compiling this data, which is
generously made freely available to the public. By designing new sequences that
could form these specific protein structures, we're setting the stage to attack
a number of significant contemporary issues in structural biology, genetics, and
medicine. Vijay Pande, Pande Group Projects, Stanford Univ. US http://www.stanford.edu/group/pandegroup/projects.html#design
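As an illustration of what "Cartesian atomic coordinates" in a structure file look like in practice, the following Python sketch reads x, y, z values from fixed-column ATOM records and computes their centroid. It assumes the files follow the PDB text format, which the quoted description does not state; the helper names are hypothetical, and `make_atom_line` is a simplified record writer used only to build example input (it ignores the format's atom-name alignment rules).

```python
def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a simplified PDB-style ATOM record (illustrative only)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_coordinates(pdb_text):
    """Return (atom_name, x, y, z) tuples from ATOM/HETATM records.

    Assumes PDB fixed columns: atom name in 13-16, x in 31-38,
    y in 39-46, z in 47-54 (1-based column numbers).
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            name = line[12:16].strip()
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            atoms.append((name, x, y, z))
    return atoms


def centroid(atoms):
    """Arithmetic mean of the atomic positions."""
    if not atoms:
        raise ValueError("no atoms to average")
    n = len(atoms)
    return tuple(sum(atom[i] for atom in atoms) / n for i in (1, 2, 3))


if __name__ == "__main__":
    # Made-up coordinates for three backbone atoms of one residue.
    text = "\n".join([
        make_atom_line(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0),
        make_atom_line(2, "CA", "MET", "A", 1, 1.5, 0.0, 0.0),
        make_atom_line(3, "C", "MET", "A", 1, 1.5, 3.0, 0.0),
    ])
    atoms = parse_atom_coordinates(text)
    print(atoms)
    print(centroid(atoms))
```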
whole proteome analysis:
Proteome analysis has become indispensable and complementary to genomic analysis. With access to whole genome sequences from various organisms and with the imminent completion of many more, the
SWISS-PROT group at EBI has developed a research-oriented initiative that utilises many of the existing resources and provides comparative analysis of the
predicted protein coding sequences of all complete genomes. Rolf Apweiler
"Whole Proteome Analysis: The role of InterPro and CluSTr" Plant &
Animal Genome IX, San Diego CA Jan. 13-17, 2001 http://www.intl-pag.org/pag/9/abstracts/W22_01.html
Google = about 259 Sept. 18, 2002;
about 586 Aug. 18, 2003; about 1,610 July 14, 2004; about 30,600 Nov 10, 2006
whole proteome interaction mining:
A
major post-genomic scientific and technological pursuit is to describe the
functions performed by the proteins encoded by the genome. One strategy is to
first identify the protein-protein interactions in a proteome, then determine
pathways and overall structure relating these interactions, and finally to
statistically infer functional roles of individual proteins. Although huge
amounts of genomic data are at hand, current experimental protein interaction
assays must overcome technical problems to scale up for high-throughput
analysis. In the meantime, bioinformatics approaches may help bridge the
information gap required for inference of protein function. JR Bock, DA Gough,
Whole-proteome interaction mining, Bioinformatics 19(1): 125-134, Jan. 2003
Bibliography
CHI, Structural Proteomics: High-Throughput
Approaches Fuel Drug Discovery and Development, Cambridge Healthtech Institute, Malorye
Branca, Allan Haberman, Deidre Lockwood 2001
IUPAC
International Union of Pure and Applied Chemistry, Glossary of Terms used in
Computational Drug Design, H. van de Waterbeemd, R.E. Carter, G. Grassy, H.
Kubinyi, Y. C. Martin, M.S. Tute, P. Willett, 1997. 125+ definitions. http://www.iupac.org/reports/1997/6905vandewaterbeemd/glossary.html
Joint Center for
Structural Genomics Technologies http://www.jcsg.org/scripts/prod/technologies1.html
Nature Structural Biology, Structural genomics supplement, Nov.
2000
IUPAC definitions are
reprinted with the permission of the International Union of Pure and Applied
Chemistry.