Gene definitions & taxonomy for pharmaceuticals
One of the most unfortunate legacies of Mendelian genetics is
the lumping together of gene defects and genes. People with various
genetic defects may or may not manifest a disease phenotype. As both
Horace Freeland Judson and Sydney Brenner point out in the articles cited
below, classical genetics was so firmly based on gene defects that only
recently have we begun to determine what "normal" or wild-type genes really
are. Careful reading and/or listening will often reveal that people
use the word gene and a number of related words and phrases (mutations
and other variants) very loosely and interchangeably. And we are only
starting to realize the full extent of the diversity which characterizes
"normal" variants.
How past history leads to present confusion
Sydney Brenner, writing in the special Drosophila genome issue of Science,
made a similar observation: "Old geneticists knew what they were talking
about when they used the term "gene", but it seems to have become
corrupted by modern genomics to mean any piece of expressed sequence, just
as the term algorithm has become corrupted in much the same way to mean
any piece of a computer program. I suggest that we now use the term "genetic
locus" to mean the stretch of DNA that is characterized either by mapped
mutations as in the old genetics or by finding a complete open reading
frame as in the new genomics. In higher organisms, we often find closely
related genes that subserve closely related, but subtly different, functions."
Sydney Brenner "The End of the beginning" Science 287 (5451): 2173, Mar.
24, 2000
Don't expect to know anytime soon exactly
how many human genes there are. About 60% of our genes
exhibit alternative splicing, making the number of protein products close to
100,000, not a very different number from the more recent estimates. Expect to hear more about genes and
the cell cycle, and how gene expression differs throughout it. After
all, the yeast (Saccharomyces cerevisiae)
genome has been sequenced since 1996 and the precise number of genes is not yet
confirmed. It is also useful to read the Oxford English Dictionary's
definitions for genome and note the quotation from Scientific American
Oct. 1970 "The human genome consists of perhaps as many as 10 million
genes."
Definitions of gene
Michael Snyder and Mark Gerstein, "Genomics: Defining Genes in the Genomics Era", Science 300: 258-260, Apr. 11, 2003
Rat Genome Database definition: "the DNA sequence necessary and sufficient
to express the complete complement of functional products derived from a unit of
transcription". Splice variants for each gene are also
listed in the gene report ... Does not include: variations such as mutations,
pseudogenes, protein products which are derived from the modification
of precursor proteins. Rat Genome Database, Medical College of Wisconsin,
Milwaukee, Wisconsin, accessed May 30, 2003 http://rgd.mcw.edu/tu/genes/#definition
A gene can be defined as an
abstraction that is useful for the purposes of nomenclature and for the
assignment of a symbol. It was originally described as a "unit of
inheritance" and has since been described as a "set of features on the
genome that can produce a functional unit", but this latter term does not
encompass all of those objects to which symbols are assigned. HM Wain, JA Blake,
EA Bruford, LJ Maltais, S Povey, Report of ASHG-NW Gene Nomenclature Workshop,
Clarification of the definition of a gene for nomenclature purposes, Oct. 2,
2000 http://www.gene.ucl.ac.uk/nomenclature/workshop/ashgnw_report.html
The definition of gene is evolving
(and lengthening) as we tease apart the incredible complexity of biological and
molecular processes and discover that "junk DNA" has important
regulatory functions. Gene identification in
prokaryotes is almost trivial, as their genomes consist almost entirely
of coding sequence uninterrupted by introns. In humans, however, protein-coding exons make up only about 2% of total DNA.
Human exons are widely separated by immense stretches of introns.
The concept of "gene" didn't come along until 1909, three years after
the term genetics in 1906 (Evelyn Fox Keller, The Century of the Gene,
Harvard University Press, 2000). For some time it remained a quite abstract
term. Even with advances in molecular biology the definition is far from settled.
Is a monolithic gene concept still valid?
William Gelbart, writing on "Databases in Genomic Research" in
Science (282 (5389): 659-661, 23 Oct. 1998), notes: "Nonetheless, we may
well have come to the point where the use of the term "gene" is of
limited value and might in fact be a hindrance to our understanding of the
genome. Although this may sound heretical, especially coming from a card-carrying
geneticist, it reflects the fact that, unlike chromosomes, genes are
not physical objects but are merely concepts that have acquired a great deal of
historic baggage over the past decades. Ultimately, we want to understand the relationships between heritable units,
their gene products, and their phenotypes. ... the
realities of genome organization are much more complex than can be accommodated
in the classical gene concept. Genes reside within one another, share some of
their DNA sequences, are transcribed and spliced in complex patterns, and can
overlap in function with other genes of the same sequence families. Consider so-called
alternative splicing, in which one or more exons are shared among
multiple transcripts. There is a continuum ranging from cases in which two
transcripts are almost identical along their entire length to examples in which
only a small portion of the two mRNAs is shared. Sometimes these products have
very similar biological activities, whereas in other cases their activities are
disparate. What are the rules for deciding whether two partially overlapping
mRNAs should be declared to be alternative transcripts of the same gene or
products of different genes? We have none. Independent of this question is the question of how to relate a mutant
phenotype to alterations in multiple overlapping gene products. Suppose that we
have a missense mutation that falls within one or more exons that contribute to
more than one mRNA and thereby to more than one polypeptide chain. How do we
assess the contributions of defects in the different polypeptides to the
ultimate phenotype elicited by this mutation? For reasons such as these, I believe that we are entering a period in which
we must shift to the view that the genome largely encodes a series of functional
RNAs and polypeptides that are expressed in characteristic spatial, temporal,
and quantitative patterns. The classical concept of the gene ultimately forms a
barrier to trying to understand phenotypes in terms of encoded functional
products. This is not a purely abstract discussion but may well demand that we
reexamine how we are organizing data within genome-related databases. In most
or all of these databases, much biological data is attached to these suspect
units called genes. Although some aspects of these phenotypes might be
associated with different subsets of alternative products of these genes, the
databases might not support the most rigorous parsing of this phenotypic
information." See also Does defining gene
only get harder?
Petter Portin in "The Origin, Development and Present Status of the
Concept of the Gene: A Short Historical Account of the Discoveries" Univ.
of Turku, Finland (2000) writes "The current view of the gene is of necessity an abstract, general, and open one, despite the
fact that our comprehension of the structure and organization of the genetic material has greatly increased. Simply,
our comprehension has outgrown the classical and neoclassical terminology.
... In fact it should be stressed that our comprehension of the very concept of gene has always been abstract and open as
indicated already by Wilhelm Johannsen [2]. Due to the openness of the concept of the gene, it takes different meanings depending on the context. Maxine
Singer and Paul Berg [148] have pointed out that many different definitions of the gene are possible. If we want to
adopt a molecular definition, they suggest the following definition: "A eukaryotic gene is a combination of DNA
segments that together constitute an expressible unit. Expression leads to the formation of one or more specific
functional gene products that may be either RNA molecules or
polypeptides. Each gene includes one or more DNA
segments that regulate the transcription of the gene and thus its expression." (p. 622). Thus the segments of a gene
include [1] a transcription unit, which includes the coding sequences, the
introns, the flanking sequences - the leader and trailer sequences, and [2] the regulatory sequences, which flank the transcription unit and which are
necessary for its specific function." http://www.bentham.org/cg/sample/cg1-1/Portin.pdf
Bioinformatics expert Nat Goodman, writing in the April 2001 issue of Genome
Technology, states that gene "is a highly nuanced noun like "truth". Ten
years ago, it commonly meant "genetic locus" - a region of the genome
linked to a disease or other phenotype. Over time biologists became more
comfortable thinking of a gene as a transcribed region of the genome that
results in a functional molecular product. In its published human genome
paper [Science Feb. 16, 2001] Celera defines a gene as "a locus of
cotranscribed exons" in order to emphasize the importance of alternative
splicing. Ensembl's Gene Sweepstake Web page [see below] took the definition
to new depth: "A gene is a set of connected transcripts. ... Two
transcripts are connected if they share at least part of one exon in the genomic
coordinates." Implicit in the new definitions of a gene is a belief that
the genome can be partitioned into regions such that all exons in a given region
belong to a single gene. These regions are the loci of Celera's
definition. A theoretically possible alternative is that the genome might
contain long chains of overlapping transcripts in which the first transcript
overlaps the second which overlaps the third, but the first and third don't
overlap. I'm not aware of any such examples, but if they exist, then all bets
are off." Nat Goodman "Human Transcriptome Project" Genome Technology:
55-58 April 2001
While some of the terms included below are relevant to all genes, some are specific to humans
and/or other organisms.
Gene definitions
There are many discussions between biologists to find a comprehensive definition of a
gene, which is not easy, if possible at all. For our purposes a gene is a
continuous stretch of a genomic DNA molecule, from which a complex molecular
machinery can read information (encoded as a string of A, T, G, and C) and make
a particular type of a protein or a few different proteins. Alvis Brazma, et
al., A quick introduction to elements of biology: 3.3 Genes and protein
synthesis, European Bioinformatics Institute, Draft, 2001 http://www.ebi.ac.uk/microarray/biology_intro.html#Genomes
Specific sequences of nucleotides along a molecule of DNA
(or, in the case of some viruses, RNA) which
represent the functional units of heredity. The majority of eukaryotic genes
contain coding regions (codons) that are interrupted by non-coding
regions (introns) and are therefore labeled split genes. MeSH, 1965
The functional and
physical unit of heredity passed from parent to offspring. Genes are pieces of
DNA, and most genes contain the information for making a specific protein [NHGRI
glossary]. This definition doesn't specify that it applies only to humans, but
by specifying "parents" it seems to rule out non-animal genes, and
almost implies mammals, or at least warm-blooded organisms.
A gene is a DNA segment that contributes to phenotype/function. In the
absence of demonstrated function a gene may be characterized by sequence, transcription
or homology. HUGO, J.A. White et al. Guidelines for Human Gene Nomenclature HGNC
Human Genome Nomenclature Committee, 1997 http://www.gene.ucl.ac.uk/nomenclature/guidelines.html#2.2
From Genesweep, Ensembl, European Bioinformatics Institute, UK
http://www.ensembl.org/genesweep.html
At the 2000 Cold Spring Harbor Genome conference [May 10-14] “one of
the hotly debated topics was the number of human genes. This has been estimated
at anything from 35,000 to 150,000. Considering the spread of opinion,
the only way to resolve was to get people to bet on it … This led to an
interesting debate on the definition of a gene … and how to assess that
number.”
A gene is a set of connected transcripts. A transcript is a set
of exons [produced] via transcription followed (optionally) by pre-mRNA splicing.
Two transcripts are connected if they share at least part of one exon in
the genomic coordinates. At least one transcript must be expressed outside
of the nucleus and one transcript must encode a protein (see Footnotes).
MGI Glossary, Mouse Genome Informatics, Jackson Lab outlines 7 different possibilities
referred to by "gene". http://www.informatics.jax.org/mgihome/other/glossary.shtml#gene
Assessment of the method used to determine the gene will occur by voting
at Cold Spring Harbor Genome Meeting 2002.
Footnotes
We are restricting ourselves to protein coding genes to allow an
effective assessment. RNA genes were considered too difficult to
assess by 2003.
The key definition in the gene is that alternatively spliced transcripts
all belong to the same gene, even if the proteins that are produced are
different.
The hope is that by 2003 we should have at least a hard floor to the gene
numbers. The voting should be able to determine the best method. [The cost
of betting goes up over the years because people will have more information.]
The scope of the genome is the autosomal chromosomes and X and Y. No
epigenetic nor mitochondrial genes are counted.
Encoding a protein assumes that the translation machinery does translate
the sequence at some time. The scope of the expression of genes is across
all cell types and all developmental stages (obviously!).
The genome is defined as the reference sequence (hence a mosaic of haplotypes)
as defined by Greg Schuler, NCBI.
Somatic recombinant loci are counted after recombination: i.e.,
Ig [immunoglobulin] and TCR [T cell receptor] loci will form one gene per locus.
Transcripts from repetitive regions are not counted even if expressed.
A repetitive region is an element which is both repeated in the genome
and has good evidence that the method of replication is based on a selfish
replication strategy.
If trans-splicing is found in humans (which it has not been so far, and
is unlikely to occur, but just in case), the definition of the transcript
occurs after the trans-splicing event. This will split trans-spliced, polycistronic transcripts into multiple genes by this definition.
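The connected-transcript rule in the GeneSweep definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: exons are represented as hypothetical (chromosome, start, end) tuples with half-open coordinates, strand is ignored, "connected" is treated transitively (so a chain of overlapping transcripts collapses into one gene), and the footnoted requirements about expression and protein coding are not modeled. The function and variable names are invented for this example and are not Ensembl's.

```python
from collections import defaultdict


def exons_overlap(a, b):
    """True if two (chrom, start, end) exons share at least one base.

    Coordinates are assumed half-open; strand is ignored in this sketch.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def group_transcripts(transcripts):
    """Group transcripts into 'genes' as sets of connected transcripts.

    transcripts: dict mapping a transcript name to a list of exons.
    Two transcripts are joined if any pair of their exons overlaps;
    joining is transitive (an assumption of this sketch).
    """
    parent = {name: name for name in transcripts}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    names = list(transcripts)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if any(exons_overlap(x, y)
                   for x in transcripts[first]
                   for y in transcripts[second]):
                parent[find(first)] = find(second)

    groups = defaultdict(set)
    for name in names:
        groups[find(name)].add(name)
    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical transcripts: tx1 overlaps tx2, tx2 overlaps tx3,
# tx1 and tx3 do not overlap directly, tx4 lies on another chromosome.
example = {
    "tx1": [("chr1", 100, 200), ("chr1", 300, 400)],
    "tx2": [("chr1", 350, 450), ("chr1", 600, 700)],
    "tx3": [("chr1", 650, 750)],
    "tx4": [("chr2", 100, 200)],
}
genes = group_transcripts(example)
print(genes)
```

In this example tx1 and tx3 end up in the same group only through tx2. Whether such chains should count as one gene is exactly the kind of question the definition leaves open.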
June 3, 2003 And the winner is... Nature "Human
Gene Number Wager Won" http://www.nature.com/nsu/030602/030602-3.html
But the definitive answer is still some time off. (What were they thinking
in 2000?) After all, does Saccharomyces
cerevisiae, whose genome was completed in 1996, have an absolutely
definitive gene count yet?
Does defining “gene” only get harder? Or are we making progress
by recognizing how complicated it really is?
This is not a new problem. The report of the Invitational DOE Workshop
on Genome Informatics (26-27 April 1993, Baltimore MD) pointed out "The
concept of “gene” is perhaps even more resistant to unambiguous definition
now than before the advent of molecular biology. Our inability to produce
a single definition for “gene” has no adverse effect upon bench research,
[is this true?] but it poses real challenges for the development of federated
genome databases."
http://www.ornl.gov/hgmis/publicat/miscpubs/bioinfo/inf_rep2.html
A tutorial "Ontologies for Molecular Biology Workshop: Semantic Foundations
for Molecular Biologies" at the Intelligent Systems for Molecular Biology
Conference (June 27-28, 1998) in Montreal, Canada noted "Molecular
biology has a communication problem. Many researchers and databases
use (at least partially) idiosyncratic terms and concepts for representing
biological information. Often, terms and definitions differ between groups,
with different groups not infrequently using identical terms with different
meanings. The concept ‘gene’, for example, is used with different
semantics by the major international genomic databases." http://www-lbit.iro.umontreal.ca/ISMB98/anglais/ontology.html
An account of a Gene Nomenclature workshop held in
conjunction with the annual American Society of Human Genetics meeting in
Philadelphia PA, US Oct. 2 2000 reported on discussion between the human and
mouse nomenclature committees (and other interested parties): "A
gene can be defined as an abstraction that is useful for the purposes of
nomenclature and for the assignment of a symbol. It was originally described as
a "unit of inheritance" and has since been described as a "set of
features on the genome that can produce a functional unit", but this latter
term does not encompass all of those objects to which symbols are assigned.
Designations in MGD [Jackson Lab's Mouse Genome Database] specify whether each object is a
marker, gene, D segment
etc., so in this context the actual definition of a gene is not so important.
The GeneSweep definition is
not particularly useful for nomenclature as it indicates all genes must code for
a protein, and hence does not include mRNAs etc. It was agreed that the term
"gene" has been used for a collection of object types and should not
be removed as it is still a very useful term, particularly for the clinician and
for those with a clearly defined locus of interest; however, perhaps it is not
so useful for nomenclature, and the term "genomic feature"
should be used instead. Possible definitions of genomic features were discussed,
including an object which shares exons, that are assumed to be transcripts from
the same gene. Another suggestion was that the term "symbol" should be
defined, rather than "gene", as this is what nomenclature committees
work with, and it can incorporate a number of variations on the term
"gene"." HM Wain et al. "Report of ASHG-NW Gene Nomenclature
Workshop", HUGO, Jan. 2001 http://www.gene.ucl.ac.uk/nomenclature/ashgnw_report.html
See also under gene family. See also William
Gelbart "Databases in Genomic Research", Science, Oct. 23, 1998.
Nomenclature and terminology promise to continue
to be ongoing challenges as comparative genomics matures.
Gene structure, parts of genes
(and potential genes) and gene processes:
Parts of genes and gene processes constitute the rest of this
section. Broader term: DNA
Is protein broader, narrower or somewhere in between? The genome is smaller, in a sense,
than the proteome, and the number of proteins is far larger than the
number of genes. At the
biochemical and molecular level these hierarchies are being redefined, in ways
we are just beginning to comprehend. See also Gene categories
alternative exons:
When interrupted genes produce messenger RNA, there occurs in certain genes
tissue and stage-specific alternative splicing. The interrupted gene
produces as its primary transcription product a heterogeneous nuclear RNA molecule, in
which both exons and introns are represented. Introns, however,
are removed from the primary transcript during the processing of
messenger RNA in specific splicing reactions. Splicing is usually constitutive,
which means that all exons are joined together in the order in which they occur
in the heterogeneous nuclear RNA. In many genes, however, alternative splicing
has also been observed, in which the exons may be combined in some other way
(Fig. 2). For example, some exon or exons may be skipped in the splicing
reaction. The primary order of the exons is not, however, altered even in
alternative splicing. Thus alternative splicing makes it possible for a single
gene to produce more than one messenger RNA molecule, which contradicts the
basic conceptual framework of the neoclassical view of the gene. Petter
Portin in "The Origin, Development and Present Status of the Concept of the
Gene: A Short Historical Account of the Discoveries" Univ. of Turku,
Finland, 2000 http://www.bentham.org/cg/sample/cg1-1/Portin.pdf
Related term: constitutive exons
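The exon-skipping case described above (some exons may be skipped, but the primary order of the exons is never altered) can be illustrated with a short Python sketch. It assumes a hypothetical gene whose exons are just labels, and it enumerates every isoform obtained by skipping any subset of the exons marked as alternative. Real splicing is regulated and far from all combinations occur, so this shows only the combinatorial idea.

```python
from itertools import combinations


def splice_isoforms(exons, alternative):
    """List exon combinations produced by skipping alternative exons.

    exons: exon labels in their order on the primary transcript.
    alternative: set of labels that may be skipped; all other exons
    are treated as constitutive and always retained.
    Exon order is preserved in every isoform.
    """
    optional = [exon for exon in exons if exon in alternative]
    isoforms = []
    for count in range(len(optional) + 1):
        for skipped in combinations(optional, count):
            isoforms.append(tuple(e for e in exons if e not in skipped))
    return isoforms


# Hypothetical four-exon gene in which E2 and E3 are alternative exons.
isoforms = splice_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"})
for isoform in isoforms:
    print("-".join(isoform))
```

The first isoform listed is the constitutive one, in which all exons are joined in their original order. The others show how a single gene can yield more than one messenger RNA.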
alternative splicing: The production of two or more distinct mRNAs from RNA transcripts having the same sequence via differences in
splicing (by the choice of different exons). Mouse Genome
Informatics, Jackson Lab
Different ways of combining a gene's exons to make
variants of the complete protein. DOE, Genome Glossary, Oak Ridge National Lab,
US
Recent genome-wide analyses of alternative splicing indicate that 40-60% of
human genes have alternative splice forms, suggesting that alternative splicing is one of the most significant components of the functional
complexity of the human genome. Here we review these recent results from
bioinformatics studies, assess their reliability and consider the impact of alternative splicing on biological
functions. Although the 'big picture' of alternative splicing that is emerging from genomics is exciting, there are many challenges.
High-throughput experimental verification of alternative splice forms, functional characterization, and regulation of alternative splicing are key directions for research.
B. Modrek, C. Lee, "A genomic view of alternative splicing" Nature Genetics 30
(1): 13-19, Jan. 2002
Alternative splicing involves the processing of a primary gene transcript,
which is a messenger RNA (mRNA) with introns, into a variety of mRNA isoforms,
which differ in their precise combination of exons. Even a small change in the
mRNA sequence, such as the presence or absence of a single exon, can alter the
protein's functional properties in important ways. Paula Grabowski, Dept.
of Biological Sciences, Univ. of Pittsburgh, US, 2001 http://www.pitt.edu/AFShome/b/i/biohome/public/html/Dept/Frame/Faculty/grabowski.htm
Alternative splicing was first observed in animal viruses [87 - 95]. The first observations
of alternative splicing in the genes of eukaryotes concerned murine
immunoglobulin genes [96 - 99]. Since then, alternative splicing has been
observed in hundreds of genes in various eukaryotic organisms, man included [see
100 for review]. The tissue specificity of alternative splicing was first
shown in the fibrinogen genes of rat and man [101]. The first observations of
developmental stage specificity concerned the alcohol dehydrogenase gene of Drosophila
melanogaster [102]. The first demonstration that alternative splicing
was both tissue and stage-specific concerned the tropomyosin gene of D.
melanogaster and rat [103, 104]. The tissue and stage specificity of
alternative splicing naturally constitutes a previously unknown and effective
mechanism of gene regulation. Petter Portin in "The Origin,
Development and Present Status of the Concept of the Gene: A Short Historical
Account of the Discoveries" Univ. of Turku, Finland, 2000 http://www.bentham.org/cg/sample/cg1-1/Portin.pdf
Broader term: splicing
Related terms: alternative splice sites; RNA glossary pre-mRNA splicing, RNA splicing; Sequences, DNA & beyond protein splicing, trans-splicing
alternative transcript: Expression, genes & beyond
biochemical genomics: Functional genomics glossary
Can identify genes by the function of their products.
cDNA complementary DNA:
A single stranded DNA molecule with a
nucleotide sequence that is complementary to an RNA molecule; cDNA is formed
by the action of the enzyme reverse transcriptase on an RNA template. After
conversion to the double stranded form, cDNA is used for molecular cloning
or for hybridization studies. IUPAC Biotech
A complementary DNA for a messenger RNA molecule. Unlike an mRNA, a
cDNA can be easily propagated and sequenced. NCBI
Single-stranded complementary DNA synthesized from an RNA template by the action of
RNA-dependent DNA polymerase. cDNA (i.e., complementary DNA, not circular DNA, not
C-DNA) is used in a variety of molecular cloning experiments as well as serving as a specific hybridization probe.
MeSH, 1994
The term cDNA can encompass "proper"
cDNAs and ESTs. "Proper" cDNAs are long segments of genes,
often full-length. Many specialists believe that cDNAs (including ESTs)
are the highest-value sequences, because they represent experimentally
determined genes. CHI Outlook
for DNA Microarrays report
Logic of Molecular Approaches to Biological Problems, John
Wagner (Cornell
Univ. Graduate School of Medical Science, US) has an extensive and articulate
section on the use of cDNA in experimental design. http://www-users.med.cornell.edu/~jawagne/cDNA_cloning.html
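The complementarity in the cDNA definitions above can be made concrete with a small Python sketch. It assumes an mRNA written 5' to 3' in the RNA alphabet (A, C, G, U) and returns the single-stranded first-strand cDNA, also written 5' to 3', which is the reverse complement in the DNA alphabet. The function name is invented for this example; it mirrors the base-pairing rule only and says nothing about how reverse transcriptase is primed in the laboratory.

```python
# Base-pairing rule from an RNA template base to the complementary DNA base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3') for an mRNA given 5'->3'.

    The cDNA strand is antiparallel to its template, so the template
    is read in reverse while each base is complemented.
    """
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna.upper()))


# Hypothetical three-codon message.
print(first_strand_cdna("AUGGCCUAA"))  # TTAGGCCAT
```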
Narrower term: cDNA maps
Related terms: transcript clusters; DNA glossary EST expressed sequence tag, genomic DNA; Expression gene expression
cDNA databases: Databases & software directory
cis-: This side of; compare with trans-, meaning across.
cis-trans test:
In the cis-trans test cis- and trans-heterozygotes are compared. In the cis-heterozygote
the mutations are in the same chromosome but in the trans-heterozygote
in homologous chromosomes. Thus the genotype of the cis-heterozygote
is designated as a b/+ + and that of the trans-heterozygote as a +/+ b.
If the cis-heterozygote is of a wild type phenotype and the trans-heterozygote
is mutant, a and b are mutations of the same cistron. If, however, both cis-
and trans-heterozygotes are phenotypically of a wild type, a and b
are mutations of different cistrons. The cistron is a synonym of the gene, but
this term should be used only when it is based on cis-trans test or biochemical
evidence. Petter Portin in "The Origin, Development and Present
Status of the Concept of the Gene: A Short Historical Account of the
Discoveries" Univ. of Turku, Finland, 2000 http://www.bentham.org/cg/sample/cg1-1/Portin.pdf
cistron:
HF Judson in The Eighth Day of Creation tells how Seymour
Benzer "wanted to scrap the word "gene" and replace it with three
new terms, "muton" for the smallest spot at which mutation could take
place, "recon" for the irreducibly shortest length on the map that could
not be split by a genetic recombination even at the fine scale he had reached,
and "cistron" for the shortest stretch that comprised a functional
genetic unit. (The last was derived from the mating tactic Benzer used to
determine which mutations lay near each other on the map, which was technically
called the "cis-trans test") ... Over the next decade, Benzer's new
terms came into a considerable vogue, especially "cistron". But the
other two were superfluous once mutations and recombinations could be thought of
in terms of base pairs, while the cistron was, in effect, the gene in its
principal sense; it is the older usage that has lasted and the newer one that
has died away." Horace F Judson, The Eighth Day of Creation, Cold Spring Harbor
Laboratory Press, 1996 pp. 320-321
Term coined by Seymour Benzer in 1955 referring
to DNA coding for a single polypeptide. Originally did not include the
start and stop codons. Related term: polycistronic
coding regions:
The part of a gene that specifies the structure
of a protein. [SNP Consortium] Also referred to as a "coding sequence" or
protein coding region or sequence. Narrower terms: mature peptide or protein
coding sequence, signal peptide coding sequence, transit peptide coding sequence
coding sequence CDS:
Sequence of nucleotides that corresponds
with the sequence of amino acids in a protein (location includes stop codon).
Feature includes amino acid conceptual translation. DDBJ/ EMBL/ GenBank
Feature Table http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
Related terms: coding regions, mature peptide or protein coding
sequence
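The "amino acid conceptual translation" attached to a CDS feature can be sketched as follows. This is a minimal illustration assuming the standard genetic code and a CDS given as DNA on the coding strand, in frame, with the stop codon included as the definition above specifies. The names are invented for this example; it is not the code used by DDBJ, EMBL or GenBank, and it ignores alternative genetic codes and ambiguous bases.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Conceptually translate an in-frame CDS; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Hypothetical CDS: start codon, one alanine codon, stop codon.
print(translate_cds("ATGGCCTGA"))  # MA*
```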
complementary DNA: See cDNA.
constitutive exons:
Splicing is usually constitutive, which
means that all exons are joined together in the order in which they occur in the
heterogeneous nuclear RNA. Petter Portin in "The Origin, Development
and Present Status of the Concept of the Gene: A Short Historical Account of the
Discoveries" Univ. of Turku, Finland, 2000 http://www.bentham.org/cg/sample/cg1-1/Portin.pdf
CpG islands:
Areas of increased density of the dinucleotide sequence
cytosine-phosphate diester-guanine. They form stretches of DNA several hundred to several thousand
base pairs long. In humans there are about 45,000 CpG islands, mostly found at the
5' ends of genes. They are unmethylated except for those on the inactive X chromosome and some associated with
imprinted genes. MeSH, 1996
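"Increased density" of the CpG dinucleotide is usually judged against what base composition alone would predict. The Python sketch below computes two simple statistics for a stretch of DNA: the G+C fraction and the ratio of observed to expected CpG count, with the expectation taken as (number of C x number of G) / length. The choice of these two statistics, and the function name, are assumptions of this illustration and are not part of the MeSH definition. No cutoff values are implied.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string.

    Expected CpG count is (count of C * count of G) / length, so the
    ratio is (CpG count * length) / (count of C * count of G).
    Returns 0.0 for the ratio when the sequence lacks C or G.
    """
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length
    if c_count == 0 or g_count == 0:
        return gc_fraction, 0.0
    return gc_fraction, (cpg_count * length) / (c_count * g_count)


# Hypothetical sequences: one CpG-dense, one with C and G but no CpG.
print(cpg_stats("CGCGCGCG"))
print(cpg_stats("GGGGCCCC"))
```

In practice such statistics would be computed in a sliding window along a genome rather than on a single short string.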
Genomic regions that signal the presence of genes.
Alison Stewart "The human gene map initiative" Genome Digest 2 (2): 1-4
http://www.gene.ucl.ac.uk/hugo/london.htm
EST Expressed sequence tag: DNA glossary
May, but do not necessarily, represent genes.
exons: A section of DNA which carries the coding sequence for
a protein or part of it. Exons are separated by intervening, non-coding sequences (introns). In eukaryotes most genes consist of a number
of exons. IUPAC Bioinorganic
The portion of the genome that is expressed as a processed mRNA. [NHLBI]
The parts of a genetic transcript remaining after the INTRONS are removed and
which are spliced together to become a messenger or structural RNA. MeSH, 1987
The term "exon" is normally applied for regions which are not spliced
out from a pre-mRNA sequence (5' untranslated region (5' UTR), coding sequences
(CDS) and 3' untranslated region (3' UTR)). But this term is often used
also to indicate the protein-coding regions only. “Gene Structure Prediction”
HGMP training course notes, Luciano Milanesi, 1998 http://www.hgmp.mrc.ac.uk/Courses/GeneProteinID/milanesi/milanesi.htm
Exons contain the coding sequences of a gene,
in contrast to introns (sometimes loosely grouped with "junk DNA"), which are excised from the pre-mRNA before the mature mRNA
is translated into a protein.
Narrower terms: alternative exons, constitutive exons, non-coding first
exons; Sequences, DNA & beyond glossary non-coding first exons
expressed sequence: See coding sequence (coding regions)
gene: Definitions and history of are at the beginning of this glossary.
Gene categories: Narrower terms
include antibody genes, candidate genes,
caretaker genes, DME Drug Metabolizing Enzyme genes, DNA repair genes,
developmental genes, differentiated genes, extranuclear genes, gatekeeper genes,
housekeeping genes, hypothetical genes, immunoglobulin genes IG, immediate-early
genes, lethal genes, luxury genes, mitochondrial genes, non-nuclear
genes, nuclear genes, oncogene, operator genes, organelle genes, orphan genes,
pleiotropic gene, predicted genes, processed genes, promoter genes, pseudogenes,
putative genes, RNA genes, rRNA genes, regulator genes, regulatory genes,
reporter genes, silent genes, split genes, structural genes, suppressor genes,
syntenic genes, tumor suppressor gene, virulence genes
gene cluster: A set of closely related genes that code for the same or similar proteins
and which are usually grouped together on the same chromosome. Life Science
In UniGene (an experimental system for
automatically partitioning GenBank sequences into a non-redundant set of gene-oriented
clusters), each UniGene cluster contains sequences that represent
a unique gene, as well as related information such as the tissue types in which
the gene has been expressed and map location. NCBI, UniGene, US http://www.ncbi.nlm.nih.gov/UniGene/index.html
King's Dictionary of Genetics cross references gene
cluster with "multigene family" (which can be on the same or different
chromosomes, and descended by gene duplication) and reiterated genes (which are
multiple genes on the same chromosome). The MeSH term "multigene
family" is based on King's definition. Y and M. Zhang's Dictionary of
Gene Technology Terms definition specifies "identical or related
genes" coding for "the same or similar proteins". The Oxford
Dictionary of Biochemistry & Molecular Biology defines "gene
cluster or gene complex" and specifies "functionally related" and
"closely linked" genes on a chromosome and notes these are "often
structural genes coding for the enzymes that catalyse the various steps of a
metabolic pathway". Is a consensus definition possible? In bacteria see operon
gene coding: See coding regions, coding sequences.
gene discovery methods: See SNPs & other Genetic variations: candidate gene
approach, direct approach, functional cloning, indirect approach, linkage analysis,
positional cloning, random
genome-wide association studies; Functional
genomics; In silico & Molecular modeling: gene identification, gene prediction
gene expression regulation: Expression glossary
gene families: The HUGO Gene Nomenclature Committee
http://www.genenames.org/ has been working to develop a unique symbol, as well as a longer and more
descriptive name, for each human gene. Thus, members of many gene families,
previously cloned in different laboratories and known by a variety of terms, now
share a common gene symbol. A text search in any of the genome browsers will
often return links to all named members of a gene family that have been mapped
to the genome. Whereas Ensembl and UCSC currently return lists of the genes, the
NCBI presents both a list and a graphical overview. "How can one
find all the members of a human gene family?" Nature Genetics 32 supplement:
49-52, 2002 http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v32/n1s/full/ng973.html
Related terms: Functional genomics glossary
gene families -- drug discovery: Drug discovery & development glossary
gene grouping: HGNC Gene Grouping/Family Nomenclature, HUGO Gene
Nomenclature Committee, with link to gene families
currently under review http://www.genenames.org/genefamily.html
gene identification: Molecular modeling glossary
gene imprinting:
A phenomenon in which the phenotype of the disease depends on which parent
passed on the disease gene. For instance, Prader-Willi syndrome and Angelman
syndrome are both inherited when the same part of chromosome 15 is missing. When
the father's complement of 15 is missing, then the child has Prader-Willi, but
when the mother's complement of 15 is missing, the child has Angelman syndrome. [PhRMA]
See also under epigenetics
gene localization: Several lines of evidence suggest that the nucleus
of mammalian cells is compartmentalized. It is possible that the position
of a gene either within the 3‑dimensional space of the nucleus or within a
local chromatin domain could play an important role in the efficiency with which
its transcripts are spliced or polyadenylated, or its mRNA is transported from
the nucleus. [Chao Chen, Lawrence A. Chasin "Effect of Gene Localization on
RNA Processing" Columbia Univ. 2000] http://www.columbia.edu/cu/biology/faculty/chasin/position.html
Related terms: Proteins: protein localization, subcellular localization
gene map, gene mapping: Maps genomic & genetic
gene order:
The sequential location of genes on a chromosome.
MeSH, 2001
gene prediction: Molecular
modeling glossary
Related term: gene validation
gene product: A description of the protein or RNA product
(and its function, if relevant) that is coded for by the gene. [SGD
Saccharomyces Genome Database Glossary, Stanford University] http://genome-www.stanford.edu/Saccharomyces/help/glossary.html#fasta
There is a potential for semantic confusion between a gene product and
its molecular function, because very often these are described in exactly
the same words. For example, "alcohol dehydrogenase" can describe what
you can put in an Eppendorf tube (gene product) or it can describe the
function of this stuff. There is, however, a formal difference -- a "product"
has a (potentially) many- to- many relationship with a "molecular function." [Gene
Ontology TM Documentation] http://www.geneontology.org/GO.doc.html
The biochemical material, either RNA or protein, resulting from
expression
of a gene. The amount of gene product is used to measure how active a gene
is; abnormal amounts can be correlated with disease causing alleles. [DOE]
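The many-to-many relationship noted in the Gene Ontology definition above can be made concrete with two lookup tables. This is a minimal sketch, not the Gene Ontology data model: the product and function names are hypothetical placeholders, and `build_indexes` is an illustrative helper rather than part of any real library.

```python
from collections import defaultdict

# Hypothetical annotations: (gene product, molecular function) pairs.
# These names are placeholders, not real Gene Ontology records.
annotations = [
    ("product_A", "function_X"),
    ("product_A", "function_Y"),
    ("product_B", "function_X"),
]

def build_indexes(pairs):
    """Return (product -> set of functions, function -> set of products)."""
    functions_of = defaultdict(set)
    products_with = defaultdict(set)
    for product, function in pairs:
        functions_of[product].add(function)
        products_with[function].add(product)
    return dict(functions_of), dict(products_with)

functions_of, products_with = build_indexes(annotations)

# One product can carry several functions, and one function can be
# carried by several products.
print(sorted(functions_of["product_A"]))
print(sorted(products_with["function_X"]))
```

Keeping both directions indexed shows why a product and a function cannot be treated as the same thing even when they share a name such as "alcohol dehydrogenase".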
gene recognition: Molecular modeling glossary
gene regulation: http://en.wikipedia.org/wiki/Regulation_of_gene_expression
gene silencing: Genetic manipulation & disruption
gene structural components: Includes exons, introns, regulatory
sequences, splice sites, other?
gene structure:
In prokaryotes, genes tend to be clustered in coordinately regulated
groups called operons. The genes are transcribed together
on a single transcript and each protein within the cluster is translated
separately. Prokaryotes can "couple" transcription and translation,
i.e. an mRNA being transcribed can begin being translated even before
transcription is complete. In eukaryotes, genes are generally not clustered in
operons. In addition,
eukaryotic genes often contain non-coding introns ("intervening
sequences") interspersed among the coding regions (exons). During RNA
processing, introns are removed from RNA transcripts and the exons are spliced
together. Mature mRNA, after being transcribed and processed in the nucleus, is
transported into the cytoplasm where translation occurs. Because transcription
and translation occur in different "compartments" in eukaryotes,
"coupling" of these two processes is not possible. M. A.
Gilles-Gonzalez Micro 521, Ohio State Univ. 1997
http://www.biosci.ohio-state.edu/~mgonzalez/Micro521/05.html
Related term: genetic structures
gene superfamily: Gene superfamily is defined as "a cluster of
evolutionarily related sequences" (Dayhoff, 1976), and consists of
homologous gene families, which are clusters
of genes from different genomes that include both orthologs and paralogs
(Tatusov et al., 1997).
gene validation: Genetic validation of predicted genes.
See under transcript clusters. Related term: Target
validation glossary target validation
genetic code: Sequencing, DNA & beyond glossary
genetic structures: The biological objects that
contain genetic information and that are involved in transmitting genetically
encoded traits from one organism to another. MeSH 2003 Subcategories
include base sequence, chromosome structures, chromosomes, gene library, genes,
genetic code, genetic vectors, genome components, plasmids and others. Related term: gene structure
genome control maps: Expression glossary
localize: Determination of the original position (locus)
of a gene or other marker on a chromosome. [DOE]
Related terms: gene localization; Labels glossary:
immunohistochemistry; Proteins:
protein localization, subcellular localization
locus (plural loci): The word "locus" is not
a synonym for gene but refers to a map position. A more precise definition is
given in the Rules
and Guidelines from the International Committee on Standardized Genetic
Nomenclature for Mice
which states: "A locus is a point in the
genome, identified by a marker, which can be mapped by some means. It does not
necessarily correspond to a gene; it could, for example, be an anonymous
non-coding DNA segment or a cytogenetic feature. A single gene may have several
loci within it (each defined by different markers) and these markers may be
separated in genetic or physical mapping experiments. In such cases, it is
useful to define these different loci, but normally the gene name should be used
to designate the gene itself, as this usually will convey the most
information."
Position on a chromosome of a gene or other
chromosome marker; also the DNA at that position. The use of locus is sometimes
restricted to mean regions of DNA that are expressed. [DOE]
Any genomic site, whether functional or not, that can be mapped through
formal genetic analysis. [NHLBI]
Related term: Expression glossary gene expression
Mendelian genetics: Genomics glossary
mobile genetic elements: Includes retrotransposons, transposons;
Sequences, DNA & beyond glossary LINES, SINES
ORF open reading frame: Sequences, DNA & beyond
ORFs may, but do not necessarily, represent genes. Broader
term: reading frames (Sequences, DNA & beyond)
ORFans:
ORFans comprise 20-30% of the ORFs of most completely sequenced genomes. Because
nothing can be learnt about ORFans via sequence homology, the functions and
evolutionary origins of ORFans remain a mystery... We find that functional and
structural studies of ORFans are not as underemphasized as previously suggested.
These recently determined structures correspond to ORFans from all Kingdoms of
life, and include proteins that have previously been functionally characterized,
as well as structural genomics targets of unknown function labeled as
"hypothetical proteins". This suggests that many of the ORFans in the
databases are likely to correspond to expressed, functional (and even essential)
proteins. Furthermore, the recently determined structures include examples of
the various types of ORFans, suggesting that the functions and evolutionary
origins of ORFans are diverse. N. Siew and D. Fischer, Structural
Biology Sheds Light on the Puzzle of Genomic ORFans, J Mol Biol. 342(2):
369-373, Sept. 10, 2004
Protein-encoding regions [ORFs] with no apparent similarity to proteins in
other genomes. D. Fischer and D. Eisenberg, Finding
families for genomic ORFans, Bioinformatics, 15(9): 759-762, Sept. 1999
Related terms: Gene categories: hypothetical
genes, pleiotropic gene; -Omes & -omics
glossary ORFeome.
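The Fischer and Eisenberg definition above (ORFs with no apparent similarity to proteins in other genomes) amounts to a simple filter over similarity-search results. The sketch below assumes such results are already available as a mapping from ORF identifier to a list of matches; the identifiers, the match lists and the `find_orfans` helper are all hypothetical, and what counts as "apparent similarity" (the search tool and its threshold) is left outside the example.

```python
def find_orfans(orf_hits):
    """Return sorted IDs of ORFs with no similarity hits in other genomes."""
    return sorted(orf_id for orf_id, hits in orf_hits.items() if not hits)

# Hypothetical search results: ORF ID -> matching proteins in other genomes.
orf_hits = {
    "orf1": ["genomeB_p7"],
    "orf2": [],
    "orf3": ["genomeC_p2", "genomeD_p9"],
    "orf4": [],
}

orfans = find_orfans(orf_hits)
fraction = len(orfans) / len(orf_hits)  # share of ORFs that are ORFans
print(orfans, fraction)
```

Because the classification depends on which other genomes have been searched, an ORF can stop being an ORFan as more genomes are sequenced.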
open reading frame: See ORF Sequences,
DNA & beyond
operon:
A functional unit consisting of a promoter, an operator
and a number of structural genes, found mainly in prokaryotes. The structural
genes commonly code for several functionally related enzymes, and although
they are transcribed as one (polycistronic) mRNA each is independently
translated. In the typical operon, the operator region acts as a controlling
element in switching on or off the synthesis of mRNA. (operator gene) IUPAC
Biotech
The genetic unit consisting of a feedback system under the control of
an operator gene, in which a structural gene transcribes its message
in the form of mRNA upon blockade of a repressor produced by a regulator
gene. Included here is the attenuator site of bacterial operons where transcription
termination is regulated. MeSH, 1972
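The IUPAC definition above notes that the structural genes of an operon are transcribed as one polycistronic mRNA yet translated independently. A rough sketch of that idea is to scan one transcript and pull out each coding segment separately. The assumptions here are strong and are not from the source: a segment is taken to run from an AUG to the next in-frame stop codon, ribosome binding sites are ignored, and the example sequence and the `split_cistrons` helper are made up for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_cistrons(mrna):
    """Return each AUG-to-stop coding segment, scanning left to right."""
    cistrons = []
    i = 0
    while i <= len(mrna) - 3:
        if mrna[i:i + 3] == "AUG":
            j = i
            # Step through codons in frame until a stop codon or the end.
            while j <= len(mrna) - 3 and mrna[j:j + 3] not in STOP_CODONS:
                j += 3
            if j <= len(mrna) - 3:  # an in-frame stop codon was found
                cistrons.append(mrna[i:j + 3])
                i = j + 3
                continue
        i += 1
    return cistrons

# Made-up transcript carrying two short coding segments.
polycistronic = "GGAUGAAAUAACCAUGGGGCCCUGAA"
print(split_cistrons(polycistronic))
```

Each returned segment stands for one cistron that a ribosome could translate on its own, which is the sense in which a single transcript yields several proteins.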
paramutation: See under epigenetics.
polycistronic:
Implies coding for two or more proteins.
See also cistron.
polygene:
Genetics. A gene which acts together
with other genes to influence quantitative traits (such as size or weight).
Oxford English Dictionary
Seems to have begun as a concept which referred to a hypothetical
single "gene" which acted with other genes in a less than Mendelian fashion,
and evolved into a class of "genes" which we have yet to truly begin
to understand. Related terms: Genomics
glossary polygenic, post-genomic, post-Mendelian
proper cDNA: See under cDNA
regulon:
In eukaryotes, a genetic unit consisting of a noncontiguous
group of genes under the control of a single regulator gene. In bacteria,
regulons are global regulatory systems involved in the interplay of pleiotropic
regulatory domains. These regulatory systems consist of several operons.
MeSH, 1994
repressors: See under regulator genes
retrotransposon: DNA fragments copied from viral RNA
with reverse transcriptase
that insert in the host chromosomes. Edward Bollenbach, Life Sciences Dictionary
Related term: transposons.
signal peptide coding sequence:
Coding sequence for an N-terminal
domain of a secreted protein; this domain is involved in attaching nascent
polypeptide to the membrane; leader sequence. [DDBJ/ EMBL/ GenBank Feature
Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
splice variants: Splice
variants play an important role within the cell in both increasing the proteome
diversity and in cellular function. Splice variants are also associated with
disease states and may play a role in their etiology. Information about splice
variants has, until now, mostly been derived from the primary transcript or
through cellular studies. In this study information from the transcript and
other studies is combined with tertiary structure information derived from
homology models. Splice
variants: a homology modeling approach. Furnham
N, Ruffle S, Southan C. Proteins. 2004 Feb 15; 54(3): 596-608.
structural genes: Gene categories
synteny:
Two genes which occur on the same chromosome are syntenic;
however, syntenic genes may or may not be linked. [NHLBI]
The presence of two or more genetic loci on the same chromosome. Extensions of this original definition refer to the similarity in content and organization between chromosomes of different species, for example.
MeSH, 2002
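In its original sense, synteny is just a same-chromosome test, which is why syntenic genes need not be linked. A minimal sketch follows, assuming each gene's chromosome assignment is already known; the gene names, the assignments and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical chromosome assignments for illustration only.
chromosome_of = {"geneA": "chr1", "geneB": "chr1", "geneC": "chr7"}

def are_syntenic(gene_a, gene_b, assignments):
    """True if both genes lie on the same chromosome."""
    return assignments[gene_a] == assignments[gene_b]

def syntenic_groups(assignments):
    """Group gene names by chromosome."""
    groups = defaultdict(list)
    for gene, chromosome in assignments.items():
        groups[chromosome].append(gene)
    return {chrom: sorted(genes) for chrom, genes in groups.items()}

print(are_syntenic("geneA", "geneB", chromosome_of))
print(syntenic_groups(chromosome_of))
```

Note that the test uses no map distances: deciding whether two syntenic genes are also linked would need recombination data, which this sketch does not model.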
transcript clusters:
[Bo] Yuan [Ohio State Univ.] avoids calling the
index entries genes, preferring to call them transcript clusters, a
careful term referring to how cDNAs and ESTs from different databases are
grouped together based on homology. "They should be genes, but we don't
have the evidence yet," he says. "We still have to confirm that all
those transcripts and ESTs that align with the genome are functional." ...
Confirming that predictions are real genes, known as validation, is a major
reason the gene count will remain open for a while. "A prediction is just a
prediction," says [Michael] Cooke [Genomics Institute, Novartis Research
Foundation]. "You have to validate the prediction experimentally before you
can call it a gene." Tom Hollon "Human Genes: How Many?"
Scientist 15 (20): 1, Oct. 15, 2001 http://www.the-scientist.com/yr2001/oct/hollon_p1_011015.html
Related terms: gene validation; In
silico & Molecular modeling
glossary: gene prediction
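Grouping cDNAs and ESTs "based on homology", as described above, can be sketched as clustering over pairwise matches. The example below uses single-linkage grouping with a union-find structure. That choice is an assumption made for illustration, not the method used by Yuan's index or by any named database, and the transcript identifiers and match pairs are invented.

```python
def cluster_transcripts(ids, similar_pairs):
    """Group transcript IDs so that any two linked by a chain of matches share a cluster."""
    parent = {t: t for t in ids}

    def find(t):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in similar_pairs:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for t in ids:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical transcripts and pairwise homology matches.
ids = ["est1", "est2", "cdna1", "est3", "cdna2"]
matches = [("est1", "cdna1"), ("est2", "cdna1"), ("est3", "cdna2")]
print(cluster_transcripts(ids, matches))
```

A cluster produced this way is still only a candidate gene: as the quoted researchers stress, it needs experimental validation before it can be called one.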
transit peptide coding sequence: Coding sequence for an
N-terminal domain of a nuclear- encoded organellar protein; this domain
is involved in post- translational import of the protein into the organelle.
DDBJ/ EMBL/ GenBank Feature Table http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
transposons: Sequences, DNA & beyond. Related term: retrotransposon
Bibliography
How to look for other unfamiliar terms
IUPAC definitions are reprinted with the permission of the International
Union of Pure and Applied Chemistry
Evolving terminology for emerging
technologies
Comments? Suggestions? Questions? Mary Chitty mchitty@healthtech.com
Last revised March 23, 2012
Related glossaries particularly include Gene categories, Nomenclature
Applications Genomics Proteomics is also key, since it is the gene's protein products which are ultimately
of interest.
Informatics Bioinformatics
Molecular
modeling
Technologies Sequencing Not until the technologies for working with
DNA and proteins are better integrated will their researchers be better
integrated than they are now. Microarrays &
protein chips show promise in the meantime.
Biology DNA, Expression, Proteins,
RNA,
SNPs & genetic variations, Sequences,
DNA & beyond.
Horace Freeland Judson, writing in the Feb. 2001 human genome issue of Nature
notes problems with terminology. "The phrases current in genetics that
most plainly do violence to understanding begin "the gene for":
the gene for breast cancer, the gene for hypercholesterolaemia, the gene
for schizophrenia, the gene for homosexuality, and so on. We know of course
that there are no single genes for such things. We need to revive and put
into public use the term "allele". Thus, "the gene for breast cancer"
is rather the allele, the gene defect - one of several - that increases
the odds that a woman will get breast cancer. "The gene for" does, of course,
have a real meaning: the enzyme or control element that the unmutated gene,
the wild- type allele, specifies. But often, as yet, we do not know what
the normal gene is for. ... Pleiotropy. Polygeny. Perhaps these terms will not easily become
common parlance; but the critical point never to omit is that genes act in concert with one
another - collectively with the environment. Again, all this has long been understood by biologists,
when they break free of habitual careless words. We will not abandon the reductionist
Mendelian programme for a hand- wringing holism: we cannot abandon the term gene and its allies.
On the contrary, for ourselves, for the general public, what we require
is to get more fully and precisely into the proper language of genetics." Horace Freeland Judson "Talking about the genome" Nature 409: 769, 15
Feb. 2001
Gene is a good example of
a word in the process of evolving from classical genetics meanings (fairly abstract
concepts, rooted in the Mendelian model of monogenic
diseases with high penetrance). The concept of "gene" has been changing so fast that most print
resources (and some online) are out of date. The absolute best source I've found is at http://www.ergito.com/ a
project of Benjamin Lewin and colleagues
(requires registration, subscription fee for parts of site) Molecular Biology: The best- selling textbook GENES
online (which also has an extensive glossary).
gene: (cistron) Structurally, a basic unit of hereditary material;
an ordered sequence of nucleotide bases that encodes one polypeptide chain
(via mRNA). The gene includes, however, regions preceding and following
the coding region (leader and trailer) as well as (in eukaryotes) intervening
sequences (introns) between individual coding segments (exons). Functionally,
the gene is defined by the cis- trans test that determines whether
independent mutations of the same phenotype occur within a single gene
or in several genes involved in the same function. IUPAC Compendium
cDNA maps: Maps genomic & genetic
glossary
central dogma: Sequences DNA & beyond
chromosome: Cell biology glossary
Wikipedia http://en.wikipedia.org/wiki/CpG_island
Wikipedia http://en.wikipedia.org/wiki/Gene_families
Genomic imprinting website
http://www.geneimprint.com/
BioBase http://www.gene-regulation.com/
Databases
genetic linkage maps:
Maps
genomic & genetic glossary
genomic DNA: Genomics glossary
genomic ORFans: See
ORFans
genotype, genotyping: Sequencing glossary
global regulators: Expression glossary
haplotype,
haplotyping: Sequencing glossary
intergenic DNA,
intragenic DNA: DNA glossary
introns: Sequences,
DNA & beyond glossary
jumping genes: See Gene categories transposons
metagenes: Expression
glossary
molecular function: See Functional
genomics glossary Gene Ontology
muton: See under cistron
phenotype, phenotyping: Genomics glossary
pleiotropic gene:
Gene categories
quantitative gene: See under
polygene.
recon: See under cistron
superfamily: See gene superfamily and under gene family.
susceptibility genes: Molecular
Medicine glossary
syntenic genes: See under
synteny.
trans-splicing: Sequences,
DNA & beyond glossary
variants: SNPs
& other Genetic variations
glossary
DDBJ/ EMBL/ GenBank
Feature Table, 2001, 100+ definitions. http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
IUPAC International Union of Pure and Applied Chemistry, Glossary of Terms
used in Bioinorganic Chemistry, Recommendations, 1997. 450+ definitions. http://www.chem.qmw.ac.uk/iupac/bioinorg/
IUPAC International Union of Pure and Applied Chemistry, Glossary for
Chemists of terms used in biotechnology. Recommendations, Pure & Applied
Chemistry 64 (1): 143-168, 1992. 200 + definitions.
Jackson Lab, Mouse
Genome Informatics Glossary, Jackson Lab, US, 250+ definitions, 2006 http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=glossaryIndex&print=no
Lewin, Benjamin Genes VII, Oxford University Press, 1999. To order:
http://www.oup-usa.org/isbn/019879276X.html
Online (full- text) and updated http://www.ergito.com
Has extensive glossary
MeSH Topical Subheadings with Scope Notes, National Library of Medicine, 80 +
definitions http://www.nlm.nih.gov/mesh/topsubscope.html