Sequencing
for biopharmaceuticals Glossary & taxonomy
The "race" to sequence the Human Genome was not a 100 yard
dash, but a marathon. Although the Human Genome Project finished
well ahead of schedule, and a number of genes have been identified, we have
just begun to get a glimpse of what specific genes do and how we might be
able to better use this knowledge for therapeutic interventions. Teasing
apart the interactions of genes and proteins, delineating changes throughout
the cell cycle, and correlating changes with health and disease will take even more time.
But with complete sequences and cross-species comparisons,
we can expect new insights, and progress should speed up over time. Sequencing DNA is only a first step towards finding what functions are
connected with specific sequences. Sequencing proteins
(and determining the structures – and functions of proteins) is
ongoing. Sequencing of carbohydrates
is even more difficult.
alignment:
The process of lining up two or more sequences to achieve maximal levels of
identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
[NCBI Bioinformatics]
Narrower terms:
global alignment, local alignment, optimal
alignment, pairwise alignment.
Related terms BLAST, BEAUTY, BLAST2, FASTA, gapped BLAST, Needleman-Wunsch, Smith-Waterman alignment
annotation:
Bioinformatics
assembled:
The term used to describe the process of using a computer
to join up bits of sequence into a larger whole. Peer Bork, Richard Copley
"Filling in the gaps" Nature 409: 818-820, 15 Feb. 2001
This is different from assembly language, and the source of
some confusion between
biologists and computer scientists.
Related term contig assembly
automation: See sequencers- automation
BEAUTY BLAST Enhanced Alignment Utility: An enhanced version
of the NCBI's BLAST database search tool. BEAUTY, when used to search three
new custom sequence databases that we have developed, incorporates information
on sequence family membership, the location of the conserved domains, and
the locations of any annotated domains and sites directly into BLAST search
results. These enhancements make it much easier to detect weak, but functionally
significant, matches in BLAST database searches. http://searchlauncher.bcm.tmc.edu:9331/seq-search/Help/beauty.html
Beyond
Sequencing June 22-23, 2010 • San
Francisco, CA See also next
generation sequencing.
BLAST (Basic Local Alignment Search Tool): Software program from
NCBI for searching public databases for homologous sequences or proteins.
Designed to explore all available sequence databases regardless of whether
query is protein or DNA. http://www.ncbi.nlm.nih.gov/BLAST/
Faster but less rigorous than FASTA or Smith-Waterman.
BLAST2:
A newer release of BLAST that allows insertions or deletions
in the aligned sequences. Gapped alignments may be more biologically significant.
Synonymous with gapped BLAST
BLAT: BLAST-Like Alignment Tool, Jim Kent http://www.genomeblat.com/genomeblat/index.asp
base caller:
A program that analyzes trace data in chromatogram files and assigns a base for each peak. Some programs, for example,
Phred and TraceTuner also produce a corresponding quality value for each base.
DNA Sequence Glossary Geospiza http://www.geospiza.com/support/glossary.htm
chain termination sequencing method: See Sanger sequencing
(under Maxam- Gilbert & Sanger).
chemical cleavage sequencing:
See Maxam-Gilbert sequencing.
chemical degradation sequencing:
See Maxam-Gilbert sequencing.
clone, cloning: Cell biology
Related term: library
consensus sequence:
A theoretical representative nucleotide or
amino acid sequence in which each nucleotide or amino acid is the one,
which occurs most frequently at that site in the different forms which
occur in nature. The phrase also refers to an actual sequence, which approximates
the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by
conserved sequences. MeSH, 1991
A sequence of DNA, RNA, protein or carbohydrate derived from a number
of similar molecules, which comprises the essential features for a particular
function. [IUPAC Bioinorganic]
Related term:
Protein structures:
amino acid motifs
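As an illustration of the "most frequent at each site" definition above, the following minimal Python sketch derives a consensus from sequences that are already aligned to equal length. The function name and the example sequences are hypothetical, and ties are resolved arbitrarily in favor of the symbol seen first; real tools usually apply thresholds or ambiguity codes.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent symbol at each column of equal-length sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        # most_common(1) gives [(symbol, count)]; ties go to the first-seen symbol
        result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)


# Hypothetical aligned sequences, for illustration only
seqs = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATACT"]
print(consensus(seqs))
```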
conserved sequence:
A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple species. A known set of conserved sequences is represented by a
CONSENSUS SEQUENCE. AMINO ACID MOTIFS are
often composed of conserved sequences. MeSH, 1993
A "highly conserved sequence" is a DNA sequence that is very similar
in several different kinds of organisms. Scientists regard these cross
species similarities as evidence that a specific gene performs some basic
function essential to many forms of life and that evolution has therefore
conserved its structure by permitting few mutations to accumulate in it.
[NHGRI]
contig:
Group of cloned (copied) pieces of DNA representing overlapping regions of a particular chromosome.
[DOE]
A contiguous (i.e.
without gaps) stretch of DNA sequence which has been assembled solely on the
basis of direct sequencing information, i.e. sequencer reads. Note however that
'contig' is used in other contexts in genomics to mean a contiguous assembly of
something (e.g. clones), without necessarily implying that all the bases in the
assembly have been determined. Ensembl Glossary http://vega.sanger.ac.uk/Mus_musculus/helpview?se=1&kw=glossary
Narrower terms: initial sequence contigs, merged sequence
contigs. Related terms clone, contig assembly, scaffolds.
Published genome sequence has many gaps and interruptions. Concept of
"contig" is crucial to our understanding of current limitations. David Galas
"Making sense of the sequence" Science 291 (5507):
1257, Feb. 16, 2001
contig assembly:
One of the most difficult and critical functions
in DNA sequence analysis is putting together fragments from sets of overlapping
segments. Some programs do this better than others, particularly when dealing
with sequences containing gaps. [Laura De Francesco "Some things considered"
Scientist 12[20]:18, Oct. 12, 1998] http://www.the-scientist.com/yr1998/oct/profile1_981012.html
NCBI Contig Assembly
and Annotation Process, National Center for Biotechnology Information, US,
Feb. 2001 http://www.ncbi.nlm.nih.gov/genome/guide/build.html
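The core step of joining overlapping fragments can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: overlaps must match exactly, both fragments are on the same strand, and the function names and the minimum overlap of 3 are made up for the example. Real assemblers tolerate sequencing errors and handle repeats and gaps.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_pair(left, right, min_overlap=3):
    """Join two fragments on their exact overlap, or return None if there is none."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Hypothetical fragments sharing the four bases "GTAC"
print(merge_pair("ATGCGTAC", "GTACCTTA"))
```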
contig mapping: Maps
& mapping
coverage (or depth):
The average number of times that a nucleotide is represented by a
high- quality base in a collection of random raw sequence. Operationally, a
`high- quality base' is defined as one with an accuracy of at least 99% (corresponding to a Phred score of at least 20).
UC-Santa Cruz, US, Human Genome Project Working Draft Terminology, 2001 http://genome.ucsc.edu/goldenPath/term.html
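The link between the 99% accuracy threshold and a Phred score of 20 follows from the standard relation Q = -10 log10(P), where P is the estimated error probability. The Python sketch below shows that conversion and a simplified coverage count; the function names are hypothetical, and the coverage function assumes every read lies entirely inside the region of interest.

```python
import math


def phred_to_error(q):
    """Estimated error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for an estimated error probability."""
    return -10 * math.log10(p)


def high_quality_coverage(read_qualities, region_length, min_q=20):
    """Average number of high-quality bases per position of the region.

    read_qualities: one list of Phred scores per read (assumed to lie in the region).
    """
    good = sum(1 for quals in read_qualities for q in quals if q >= min_q)
    return good / region_length


print(phred_to_error(20))  # error probability at the 99% accuracy threshold
print(high_quality_coverage([[30, 30, 10, 25], [20, 19, 40, 40]], 4))
```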
DNA library: Combinatorial
libraries & synthesis
DNA reaction setup:
Each individual DNA sequencing project defines the
criteria for reaction setup, which include: 1) the type of template; 2) the
amount of DNA sequence that needs to be determined from each template; and 3)
how many templates are being analyzed. Because quantifying each template is time
consuming, template quantities are usually estimated based on standard
purification scale and protocols to expedite the sequencing process. The
sequencing method used is based on the original "Sanger" (dideoxy
chain termination) sequencing methods developed in the 1970s.
data handling:
Before the advent of capillary- based automated
sequencers, DNA sequence data was produced at a comparatively inefficient rate,
therefore data handling and analysis were not as great a concern. DNA sequencing
data production rates of a single lab can now exceed one million bases per
day.
Related terms: Bioinformatics
de novo sequencing:
Determination of sequences (of genes
or amino acids) whose sequence is not yet known. Can be done with LC/MS/MS
or nanoelectrospray MS/MS.
From the Latin "de novo", from the beginning. See also Mass spectrometry
deep sequencing: With the initial stages of the Human
Genome Project completed and new insights gained into the complex interplay of
genomic function, genomic structure and the environment in mental disorders,
attention is shifting towards the translational promise of the completed human
sequence and a new era of genomic medicine in mental disorders. However, there
are still major obstacles that need to be addressed before this occurs. It will
be necessary to see a confluence of developments in genomic technologies and
epidemiological methods, such as high throughput sequencing of candidate genes
or genomic regions; global single nucleotide polymorphism (SNP) or haplotype
analysis; improved measurements of environmental factors; well-characterized
clinical phenotypes and endophenotypes; and the development of new analytical
approaches. Furthermore, these advances will have to be applied in carefully
crafted study designs and, implicitly, in large samples. Deep Sequencing and
Haplotype Profiling of Mental Disorders (R01) Announcement Type, This is a
reissue of PA-05-106,
Released/Posted Dec.20, 2006 http://grants.nih.gov/grants/guide/pa-files/PA-07-209.html
depth:
See under coverage
detection methods:
When DNA sequencing was first developed in the
1970s, detection methods were relatively primitive compared to today’s
technology. Instead of fluorescent tags, DNA fragments were labeled with
radioactive tags. DNA fragments were then resolved, based on size, on vertical
polyacrylamide gels and the gels exposed to X-ray film to capture the DNA
fingerprint image of the separated fragments. The X-ray film was viewed by a
person and scored manually. Sequencing was limited to only 200- 300 bases of
data from a single sample analysis and many reiterative steps were required to
acquire just a few thousand bases of DNA sequence data. Development of
fluorescent detection technology has allowed the data collection and analysis
processes to be streamlined into a single step. DNA is separated on an automated
DNA sequencer and the fluorescent image of DNA migrating through the acrylamide
gel is captured by CCD cameras similar to those found in common home video
recorders. Software specific to the automated sequencer automatically tracks and
interprets the image (i.e., determines the fluorescent identity and hence the
specific nucleotide at the end of each fragment), processes the data, and
"calls", or interprets, the order of bases in the DNA sequence. The
data can then be used in downstream sequence assembly processes.
dideoxy sequencing: See Sanger sequencing under Maxam-Gilbert
& Sanger.
directed sequencing: See under shotgun sequencing
draft genome sequence [human]:
The sequence produced by combining the
information from individual sequenced clones (by creating merged sequenced
contigs and then employing linking information to create scaffolds) and
positioning the sequence along the physical map of the chromosomes. (Nickname
"golden path".) [Univ. of California Santa Cruz Human Genome Project
Working Draft terminology] http://genome.ucsc.edu/goldenPath/term.html
Sequence with lower accuracy than a finished
sequence; some segments are missing or in the wrong order or orientation.
["History of the Human Genome Project: A Genome Glossary" Science 291: pullout chart,
Feb. 16, 2001]
NHGRI Rapid Data
Release Policies, NHGRI, US, 2003 http://www.genome.gov/page.cfm?pageID=10506537
See also: working draft, human sequence
dynamic programming methods:
Assure the optimal global (Needleman and
Wunsch 1970; Sankoff and Kruskal 1983) or local (Smith, et al. 1981)
alignment by simply exploring all possible alignments and choosing the best. ["Pedestrian guide to analysing sequence databases" Burkhard Rost, Reinhard Schneider, 1999]
http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html
These methods allow the introduction of artificial gaps in aligned sequences
to create an optimal alignment.
Related terms alignment, gap penalties,
global alignment, local alignment, Needleman-Wunsch, sequencing
algorithms
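In practice, dynamic programming does not list every alignment one by one; it fills a table in which each cell holds the best score for a pair of prefixes, so all alignments are covered implicitly. A minimal Python sketch of the global (Needleman-Wunsch style) score follows. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration, and a simple per-position gap penalty is used.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # Row 0: aligning the empty prefix of `a` against prefixes of `b` costs only gaps
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "ACT"))
```

Only two rows of the table are kept because the sketch returns the score alone; recovering the alignment itself would require storing the full table and tracing back.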
flow cytometry: Cell biology
FASTA:
The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later the scores of several segments may be summed to generate an
"initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the
"k-tup" variable which specifies the size of a "word". (Pearson and Lipman)
[NCBI Bioinformatics] More rigorous and slower than BLAST. http://fasta.bioch.virginia.edu/
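The role of the "k-tup" word size can be illustrated with a small Python sketch that indexes the words of one sequence and counts shared words along each diagonal (offset) with a query. This is a toy version of the word-lookup idea only, not the FASTA program; the function names are hypothetical and no init1, initn, or opt scores are computed.

```python
from collections import defaultdict


def word_index(sequence, ktup):
    """Map every word of length ktup to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - ktup + 1):
        index[sequence[pos:pos + ktup]].append(pos)
    return index


def diagonal_hits(query, subject, ktup=2):
    """Count word matches per diagonal (subject position minus query position)."""
    index = word_index(subject, ktup)
    hits = defaultdict(int)
    for q_pos in range(len(query) - ktup + 1):
        for s_pos in index.get(query[q_pos:q_pos + ktup], []):
            hits[s_pos - q_pos] += 1
    return dict(hits)


print(diagonal_hits("ACGT", "TTACGT", ktup=2))
print(diagonal_hits("ACGT", "TTACGT", ktup=3))
```

A larger word size yields fewer hits to examine, which is faster but can miss weaker similarities; this is the speed and sensitivity trade-off the definition describes.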
finished sequence - human:
Sequence in which bases are identified to
an accuracy of no more than 1 error in 10,000 and are placed in the right
order and orientation along a chromosome with almost no gaps. ["History
of the Human Genome Project: A Genome Glossary" Science 291: pullout chart,
Feb. 16, 2001]
Each base pair has been sequenced 8-10 times, with the remaining gaps
limited by present technology. ... No eukaryotic genome sequenced so
far has been totally sequenced - current technology isn't up to it. Highly
repetitive regions (not expected to contain many protein- coding genes)
can be impossible (or very difficult) to clone. One definition of "finished"
is that fewer than one base in 10,000
is incorrectly assigned. [Peer Bork, Richard Copley "Filling in the gaps" Nature
409: 818-820, 15 Feb. 2001]
"At some level it’s a little arbitrary when you declare a sequence essentially
complete," says NHGRI Director Francis Collins… "The definition
of finished is evolving. Our definition today is different from
10 years ago. Ten years ago we didn’t even think at the level of genomes,"
says Laurie Goodman, editor of Genome Research. "I think the community
at large should define done. Not everyone is going to agree, but
when you’re using the word you should define what it means." Francis Collins
says "You’re done when you’ve exhausted the standard methods for closing
the gaps. There should be some biological reason why those last bits of
sequence eluded you – not because you just didn’t bother." "Are we there
yet?" The Scientist :12 July 19, 1999 http://www.the-scientist.com/yr1999/july/hopkin_p12_990719.html
Related terms finished clone, Human Genome Project,
post-genomic.
Genomics
finishing standards - Human Genome Project: The International
Human Genome Consortium recognizes the need to maximize the likelihood that
the finished human genome sequence meets consistent standards of quality across
all participating genome centers, and to adopt uniform practices and annotation
for regions that present problems for current sequencing technology. At the
Seventh International Meeting, the Consortium approved a detailed set of
consensus standards for what should be considered as finished sequence, a set of
rules for dealing with regions that are difficult to resolve, and a set of
finishing annotation tags to be submitted with accessions. Finishing Standards
for the Human Genome Project - Version September 7, 2001, Standard Finishing
Practices and Annotation of Problem Regions for the Human Genome Project (Genome
Sequencing Center, Washington Univ. in St. Louis School of Medicine, US) http://www.genome.wustl.edu/Overview/finrulesname.php?G16=1
flow sorting: Cell biology
full shotgun coverage:
The coverage in random raw sequence needed from
a large-insert clone to ensure that it is ready for finishing; this varies among
centers but is typically 8-10 fold. Clones with full shotgun coverage can
usually be assembled with only a handful of gaps per 100 kb. [Univ. of
California Santa Cruz Human Genome Project Working Draft terminology] http://genome.ucsc.edu/goldenPath/term.html
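A rough sense of why 8-10 fold coverage leaves only a few gaps comes from a common idealization that is not part of the cited definition: if reads start at uniformly random positions, the chance that a given base is missed at coverage c is approximately e^(-c). The Python sketch below applies that approximation; the function names are hypothetical, and real gaps also arise from cloning bias and repeats, which this model ignores.

```python
import math


def uncovered_fraction(coverage):
    """Approximate fraction of bases with no read, assuming random read placement."""
    return math.exp(-coverage)


def expected_uncovered_bases(coverage, clone_length):
    """Expected number of unsequenced bases in a clone of the given length."""
    return clone_length * uncovered_fraction(coverage)


for fold in (2, 5, 8, 10):
    print(fold, round(expected_uncovered_bases(fold, 100_000), 1))
```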
gap:
A space introduced into an alignment to compensate for insertions and
deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the
scoring of an alignment.
[NCBI Bioinformatics]
Narrower term gap penalties
gap penalties:
An important problem is the treatment of gaps,
i.e., residue inserted (or deleted) to optimise the objective function.
Usually, gap penalties (cost of inserting and extending gaps) are chosen
to be length dependent. Typically, the cost of extending a gap (gap elongation)
is 5-10 times lower than is the cost for introducing a gap (gap open).
The optimal choice of gap penalties depends on the particular method and,
in detail, on the particular sequence family ["Pedestrian guide to analysing
sequence databases" Burkhard Rost, Reinhard Schneider, 1999] http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html
Related terms alignment, dynamic
programming methods. Broader term gaps
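A length-dependent penalty of the kind described above is often written as an opening cost plus a smaller cost for each additional position. The Python sketch below uses assumed values (open 10, extend 1, a ratio within the 5-10 fold range quoted above) and a hypothetical function name; conventions differ between programs on whether the first gap position is charged the extension cost as well.

```python
def affine_gap_cost(length, gap_open=10.0, gap_extend=1.0):
    """Penalty for one gap of the given length: open once, then extend."""
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


# One gap of length 4 is much cheaper than four separate gaps of length 1
print(affine_gap_cost(4), 4 * affine_gap_cost(1))
```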
genotype:
The genetic constitution of an organism as revealed
by genetic or molecular analysis, i.e. the complete set of genes, both
dominant and recessive, possessed by a particular cell or organism. IUPAC
Biotech
The observed alleles at a genetic locus for an individual.
NHLBI
An organism’s genetic makeup, as revealed through molecular analysis.
genotype to phenotype: Genomics
genotyping: The genetics community is seeing rapid growth in
robust tools that explore the connections between genotypes
and phenotypes. Falling prices, as genotyping platforms move from development to maturity,
result in abundant data to interrogate and analyze. In addition, as
more detailed clinical classification of patients is performed, stronger genetic
associations of complex diseases are discovered. Genotyping
Tools June 2009, San Francisco CA
Used for diagnosis,
drug efficacy, and toxicity. Utilizes genomic DNA
that, after digestion, reacts with a SNP array to
obtain an individual SNP pattern. These variations
can for instance provide information about the
diagnosis of a certain disease, or the
effectiveness or side effect of a certain drug.
The determination of relevant nucleotide- base sequences
in each of the two parental chromosomes. May
refer to identifying one or more, up to the entire gene sequence of an
organism. Compare phenotype
Genotyping implies (though I haven't found this in print) determining known
variants, as opposed to discovery of new ones.
What is the difference
between genotyping and sequencing? 23andme
https://www.23andme.com/ourservice/process/genotyping/
Related terms Genetic
variations; Broader term sequencing;
Narrower terms: haplotyping, viral genotyping genome wide
glass capillary electrophoresis: Chromatography
& electrophoresis A type of automated sequencer.
global alignment:
The alignment of two nucleic acid or protein sequences over their entire length.
[NCBI Bioinformatics]
Related term: dynamic programming methods, Broader term
alignment
Golden Path: The
assembled genome sequence. Contigs are the principal building blocks of
"Golden path". The term was originally applied to the human genome
assemblies coordinated at UCSC, but some people now use it for any genome
assembly. Ensembl Glossary http://vega.sanger.ac.uk/Mus_musculus/helpview?se=1&kw=glossary
GRAILexp:
Gene Recognition and Assembly Internet Link software http://compbio.ornl.gov/Grail-1.3/ The
GRAILexp FAQ http://compbio.ornl.gov/grailexp/gxpfaq1.html
with references to Perceval, an exon prediction program; Galahad, a gene message
alignment program and Gawain, a gene assembly program clearly has scientific and
literary finesse. Does this name relate in any way to Walter Gilbert's description of the Human
Genome Project as the "Holy Grail" of molecular biology? I
should just ask them.
GWAS Genome Wide Association Studies: The NIH is
interested in advancing genome-wide association studies (GWAS) to identify
common genetic factors that influence health and disease. For the purposes of
this policy, a genome-wide association study is defined as any study of genetic
variation across the entire human genome that is designed to identify genetic
associations with observable traits (such as blood pressure or weight), or the
presence or absence of a disease or condition. Whole genome information, when
combined with clinical and other phenotype data, offers the potential for
increased understanding of basic biological processes affecting human health,
improvement in the prediction of disease and patient care, and ultimately the
realization of the promise of personalized medicine. http://grants.nih.gov/grants/gwas/
In the past couple of years, thanks to plunging microarray costs and greater
international collaboration, GWAS have come to dominate the pages of the leading
journals, as scientists successfully pinpoint scores of gene loci associated
with complex diseases including diabetes, heart disease, mental illness and
cancer. Despite their success and popularity, Duke University geneticist David
Goldstein believes their usefulness is limited. BioIT World, 2009 April 15
http://www.bio-itworld.com/news/04/15/09/geneticists-debate-GWAS-NEJM.html
haplotype:
The linear, ordered arrangement of alleles
on a chromosome. Haplotype analysis is useful in identifying recombination
events. [NHLBI]
The genetic constitution of individuals with respect to one member of a pair
of allelic genes, or sets of genes that are closely linked and tend to be
inherited together such as those of the MAJOR HISTOCOMPATIBILITY COMPLEX. MeSH,
1987
A particular pattern of sequential SNPs
found on a single chromosome. These SNPs tend to be inherited together over time
and can serve as disease-gene markers. The examination of single chromosome sets
(haploid sets), as opposed to the usual chromosome pairings (diploid sets), is
important because mutations in one copy of a chromosome pair can be masked by
normal sequences present on the other copy.
From “haploid genotype.” The
key idea is that alleles often travel in packs. This offers the hope that pharmacogenomics
will not hopelessly fragment pharmaceutical markets.
Related terms: haplotyping, haplotyping technologies
Cell biology diploid, haploid, ploidy; Maps
& mapping:
haplotype map HapMap; Narrower term: SNPs
& genetic
variations haploinsufficiency, haplotype block, SNP haplotype
haplotyping:
Somatic cells, as opposed to germ
cells, have two copies of each chromosome. A given single- base position may be homozygous
for the wild- type base (each chromosome has the normal allele),
homozygous for a SNP base (each chromosome has the altered allele), or
heterozygous for two different bases (one chromosome has the normal allele
and the other has the abnormal allele).
Haplotyping involves grouping
subjects by haplotypes, or particular patterns of sequential SNPs, found
on a single chromosome. These SNPs tend to be inherited together over time and
can serve as disease- gene markers. The examination of single chromosome sets (haploid
sets), as opposed to the usual chromosome pairings (diploid sets), is
important because mutations in one copy of a chromosome pair can be masked by
normal sequences present on the other copy. [CHI SNPs
Update report]
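The two ideas in this entry, calling a single position homozygous or heterozygous and grouping subjects by haplotype, can be sketched in Python as follows. The sketch assumes that each subject's two haplotypes are already known (phased), which is itself the hard part in practice; all names and the example data are hypothetical.

```python
from collections import defaultdict


def classify_site(allele_1, allele_2, reference):
    """Describe one single-base position given the alleles on both chromosomes."""
    if allele_1 == allele_2:
        if allele_1 == reference:
            return "homozygous reference"
        return "homozygous variant"
    return "heterozygous"


def group_by_haplotype(subjects):
    """Group subject ids by each haplotype (string of SNP alleles) they carry."""
    groups = defaultdict(set)
    for subject, (hap_a, hap_b) in subjects.items():
        groups[hap_a].add(subject)
        groups[hap_b].add(subject)
    return dict(groups)


# Hypothetical phased data: two haplotypes of three SNPs per subject
subjects = {"s1": ("ACT", "GCT"), "s2": ("ACT", "ACT")}
print(classify_site("A", "G", reference="A"))
print(group_by_haplotype(subjects))
```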
Genes tend to travel
in packs. This is good news for pharmacogenomics. Broader
terms genotyping, sequencing
Haplotyping: A key approach to studying genetic variation,
interview with Mark Daly, Whitehead Institute, CHI's GenomeLink 14.2 http://www.healthtech.com/newsarticles/issue14_2.asp
haplotyping technologies: Include microarrays, mass spectrometry, sequencing
Hidden Markov Models HMM: In Silico & Molecular Modeling. Useful for analyzing protein sequences.
high-throughput sequencing:
DNA resequencing involves sequencing a DNA region where a
reference sequence for the region is already available. These studies provide
important insight into the function of genes and the evolution of genes and
populations. Applications abound including: comparative genomics,
high-throughput SNP detection, identifying mutant genes in disease pathways,
profiling transcriptomes for organisms where little information is available,
researching lowly expressed genes, and identifying newly emerging or genetically
engineered bacterial and viral strains. Compare low throughput sequencing, medium throughput sequencing
homology:
Narrower terms sequence homology, sequence homology- nucleic acid; Functional genomics homology Related terms homolog (homologue), similarity, ortholog, paralog, xenology; Molecular modeling homology modeling
horizontal sequencing: The obvious alternative [to vertical
sequencing] is to perform all four reactions in one vial and determine the
sequence by comparing determined oligonucleotide mass differences with expected
data (horizontal sequencing). Eckhard Nordhoff, Christine Luebbert,
Gabriela Thiele, Volker Heiser, and Hans Lehrach, Rapid determination of short
DNA sequences, Nucleic Acids Res 28(20), Oct 15, 2000 http://www.pubmedcentral.gov/articlerender.fcgi?artid=110802
Related term: vertical sequencing
human sequence: See draft sequence, finished sequence, published sequence, working draft
initial sequence contigs: Derived from sequenced clones [David
Galas "Making sense of the sequence" Science 291: 1257-1260, 16 Feb.
2001]
library; library, genomic: Cell biology
local alignment:
The alignment of some portion of two nucleic acid or protein sequences.
[NCBI Bioinformatics]
Best alignment method for sequences for which
no evolutionary relatedness is known. See Smith-Waterman alignment.
Compare global alignment.
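A local alignment score can be computed with the same table-filling idea used for global alignment, except that cell scores are never allowed to drop below zero and the best cell anywhere in the table is reported. The Python sketch below follows the Smith-Waterman approach with assumed scoring values (match +2, mismatch -1, gap -2) and a hypothetical function name; it returns the score only, not the aligned region.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between any portions of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score of 0 means "start a fresh alignment here"
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical sequences that share only the central "ACGT"
print(smith_waterman_score("TTTACGTAAA", "GGGACGTCCC"))
```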
low throughput sequencing:
A low throughput lab, such as a small
academic lab, most often has one sequencer (perhaps a single gel based unit that
still uses radioactivity) or a core facility available (a single low capacity
fluorescent sequencer). Capacity is low and sequencing of few DNA clones occurs.
For activities beyond that, a small academic lab would have to sub- contract to
a commercial sequencing source. [CHI High
Throughput Genomics] Genomic Report, Dec. 2001. Compare medium
throughput sequencing, high throughput sequencing.
MALDI-TOF: Mass spectrometry
masking:
Also known as filtering. The removal of repeated or low complexity regions from a sequence in order to improve the sensitivity of sequence similarity searches performed with that sequence.
NCBI Bioinformatics
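As a very simplified illustration of masking, the Python sketch below replaces long runs of a single repeated letter with a mask character before a search. This is an assumption-laden toy and not the filtering used by NCBI tools, which rely on more elaborate measures of sequence complexity; the function name, the run length of 6, and the mask character are all hypothetical choices.

```python
import re


def mask_homopolymers(sequence, min_run=6, mask_char="N"):
    """Replace runs of one repeated letter, at least min_run long, with mask_char."""
    # (.)\1{n,} matches a character followed by at least n copies of itself
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), sequence)


print(mask_homopolymers("ACGT" + "A" * 8 + "CGT"))
```

The masked sequence keeps its original length, so positions reported by a later search still map back to the unmasked sequence.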
Maxam-Gilbert sequencing & Sanger sequencing:
The two basic
sequencing approaches, Maxam- Gilbert and Sanger, differ primarily in the
way the nested DNA fragments are produced. Both methods work because gel
electrophoresis produces very high resolution separations of DNA molecules;
even fragments that differ in size by only a single nucleotide can be resolved.
Almost all steps in these sequencing methods are now automated. Maxam-
Gilbert sequencing (also called the chemical degradation method) uses chemicals
to cleave DNA at specific bases, resulting in fragments of different lengths.
A refinement to the Maxam- Gilbert method known as multiplex sequencing
enables investigators to analyze about 40 clones on a single DNA sequencing
gel. Sanger sequencing (also called the chain termination or dideoxy
method) involves using an enzymatic procedure to synthesize DNA chains
of varying length in four different reactions, stopping the DNA replication
at positions occupied by one of the four bases, and then determining the
resulting fragment lengths. [Primer on Molecular Genetics, Oak Ridge
National Lab,
US] http://www.ornl.gov/hgmis/publicat/primer/intro.html
medical resequencing MRS: Key parts of suspect
genes are sequenced and compared between patients and controls to identify
genetic variations that may contribute to disease. Richard Gibbs, "Deeper
into the genome" Nature 7063:1233- 1234, 27 Oct. 2005
medium throughput sequencing:
A medium throughput lab (e.g., a small pharma or a
university wide core facility) most often has one to two small automated
sequencers, using them to sequence a small collection of clones or PCR products
for comparison sequencing. High
Throughput Genomic Report, 2001. Compare low
throughput sequencing, high throughput sequencing.
megalocus: See under haplotype
merged sequence contigs: Derived by merging sequence
contigs from overlapping sequenced clones. [David Galas "Making sense of
the sequence" Science 291: 1257-1260, 16 Feb. 2001]
microsequencing:
Sequencing of proteins or peptides in very small
amounts (sub microgram), sometimes for use as probes.
minisequencing: A solid-phase method for the detection of any
known point mutation or allelic variation of DNA. In this method, amplified,
biotinylated DNA sequences containing the mutation site are immobilized
onto a streptavidin-coated microplate, and primer extension reactions are
carried out using labeled nucleotides. Incorporation of the labeled nucleotide
is dependent on the genotype and is analyzed using an ELISA technique. The assay
method allows automation. [Photometry applications, Labsystems Oy, Finland, no
longer on website]
Single base sequencing.
Related terms:
single base extension array; SNPs
& genetic
variations
multiple sequence alignment:
An alignment of three or more sequences with gaps inserted in the sequences such that residues with common structural positions
and/ or ancestral residues are aligned in the same column. ClustalW is one of the most widely used multiple sequence alignment programs.
NCBI Bioinformatics
The concept of dynamic programming
cannot be extended to align more than three sequences optimally (Murata
1990). A way around this problem is to first find optimal pairwise alignments
and to then merge the pairs "Pedestrian guide to analysing sequence
databases" Burkhard Rost, Reinhard Schneider, 1999 http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html
Related term Hidden Markov Models HMM
nanopore sequencing:
Biological membrane pores are also being
investigated for rapid single DNA molecule analysis. These nanometer-sized
pores are constructed from α-hemolysin
channels, isolated from bacteria, placed in Teflon horizontal bilayers. A single
DNA molecule is pulsed through a nanopore in only hundreds of microseconds. The
major challenge for this technology is developing the capability of reading the
genetic code of the DNA fragment in the brief time that it is traversing the
nanopore. [CHI High
Throughput Genomics] report, 2001.
Related term: Nanoscience
& Miniaturization nanopore
Needleman-Wunsch:
Global sequence alignment algorithm. [Needleman,
S. B., Wunsch, C. D., "A general method applicable to the search for similarities
in the amino acid sequence of two proteins" J. Mol. Biol.( 48): 443-453
Mar. 1970] Related terms dynamic programming; Algorithms
& data management; In Silico & Molecular modeling
next generation sequencing: Several next generation
sequencing (NGS) technologies have recently emerged, including Roche 454,
Illumina GA, and ABI SOLiD, which are able to generate three to four orders of
magnitude more sequence and are considerably less expensive than the Sanger
method on the ABI 3730xL platform (hereafter referred to as ABI Sanger). To
date these new technologies have been successfully applied toward ChIP-sequencing
to identify binding sites of DNA-associated proteins, RNA-sequencing to profile
the mammalian transcriptome, as well as whole human genome sequencing. Currently
there is much interest in applying NGS platforms for targeted sequencing of
specific candidate genes, intervals identified through single nucleotide
polymorphism (SNP)-based association studies, or the entire human exome in
large numbers of individuals. Evaluation of next generation sequencing platforms for
population targeted sequencing studies, Olivier Harismendy, Pauline C Ng, Robert L Strausberg,
Xiaoyun Wang, Timothy B Stockwell, Karen Y Beeson, Nicholas J Schork, Sarah S Murray,
Eric J Topol, Samuel Levy and Kelly A Frazer, Genome Biology 2009, 10:R32 doi:10.1186/gb-2009-10-3-r32
next generation sequencing data analysis: New-generation
sequencing platforms are capable of generating gigabytes of data in a sequence
run - leading to terabytes of data in a single experiment. Thus data storage,
transfer, and analysis will unquestionably be the rate limiting steps in turning
this new sequencing data into knowledge. Next
Generation Sequencing Data Analysis, Sept 2009, Providence RI
Alignments are intended to unravel evolutionary pathways and/or
structural homology between two proteins. These two objectives (functional/structural)
may be mutually
contradictory, i.e., the 'optimal' alignment may differ according to the
objective. Yet another perspective is the 'mathematical' optimal alignment.
This is the alignment that optimises a given objective function, e.g.,
to find the alignment with the highest number of pairwise identical residues.
FASTA and BLAST are not guaranteed to find such a mathematically optimal
alignment. ["Pedestrian guide to analysing sequence databases" Burkhard
Rost, Reinhard Schneider, 1999] http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html
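The idea of a "mathematically optimal" alignment can be illustrated with a small dynamic programming sketch in the style of Needleman-Wunsch global alignment. This is a minimal illustration, not the code of any tool named in this glossary. The function name and the default scores (1 per identical pair, 0 for mismatches and gaps, so the score equals the number of pairwise identical residues) are assumptions chosen to match the objective function described above.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=0):
    """Global alignment by dynamic programming (illustrative sketch).

    Returns (score, aligned_a, aligned_b). With the default scores the
    result maximises the number of identical aligned pairs.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] against a gap
                score[i][j - 1] + gap,       # b[j-1] against a gap
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    best, row_a, row_b = needleman_wunsch("GATTACA", "GCATGCU")
    print(best)
    print(row_a)
    print(row_b)
```

Changing the score parameters changes which alignment counts as optimal, which is the point made above: the answer depends on the objective function. Heuristic tools such as FASTA and BLAST trade this guarantee for speed.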
pathogen sequencing:
In the future, more pathogens will have their
genomes completely sequenced to determine not only how the pathogen causes
disease, but what, if any, treatments will be most effective. The DNA sequences
of viruses like HIV, human papilloma virus (HPV), and hepatitis C (HCV) are
already being characterized and therapies prescribed based on this genetic
information. To perform these types of diagnoses, DNA sequencing will have to
become faster, more cost effective, simpler to perform, and more accessible to
clinical laboratories.
Phrap: Assembler software. http://www.phrap.com/background.htm
Phred: Base calling program for DNA sequence traces;
... developed by Drs. Phil Green and Brent Ewing, and is distributed under
license from the University of Washington. http://www.phrap.org/
protein sequence: Proteins
published working drafts - human genome:
International Human Genome Sequencing Consortium special issue: Nature
409 (6822) 15 Feb 2001 http://www.nature.com/genomics/human/papers/analysis.html
Human Genome [Celera Genomics sequence] special issue: Science 291
(5507) Feb. 16, 2001 http://www.sciencemag.org/content/vol291/issue5507/index.shtml
random sequencing: See under shotgun sequencing
resequencing:
Eric Lander, director of the Whitehead Institute's Center for Genome Research and professor of biology at
MIT, notes: "The human genome will need to be sequenced only once, but it will be
resequenced thousands of times, in order, for example to unravel the polygenic
factors underlying human susceptibilities and predispositions … Re-sequencing
will also provide the ultimate tool for genotyping studies." E. Lander,
"The New Genomics" Science 274: 536, 25 Oct. 1996
A previously sequenced site is resequenced for SNP
discovery or other purposes. DNA resequencing involves sequencing a DNA
region where a reference sequence for the region is already available. These
studies provide important insight into the function of genes and the evolution
of genes and populations. Applications abound, including comparative genomics,
high-throughput SNP detection, identifying mutant genes in disease pathways,
profiling transcriptomes for organisms where little information is available,
researching lowly expressed genes, and identifying newly emerging or genetically
engineered bacterial and viral strains. Related terms:
finished sequence, finishing standards,
published working drafts, rough drafts - human genome,
working drafts
SNP Single Nucleotide Polymorphism: SNPs
& Genetic Variations
SNP scoring: Involves methods to determine the genotypes of many
individuals for particular SNPs that have already been discovered. ... tools
are just beginning to emerge and many more robust technologies are needed. [NIH, Methods for Discovering and Scoring
Single Nucleotide Polymorphisms, Request for Applications, Jan. 9, 1998] http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-98-001.html
Sanger sequencing: See under Maxam-Gilbert sequencing.
scaffolds: Ordered set of contigs placed on the chromosome.
[NCBI, Human Genome Home "Contig Assembly Process" Glossary, Feb. 2001] http://www.ncbi.nlm.nih.gov/genome/guide/build.html#glossary
A series of contigs that are in the right order but are not necessarily
connected in one continuous stretch of sequence. ["History of the Human
Genome Project", "A Genome Glossary", Science 291: pullout chart, Feb. 16,
2001]
The definition of a scaffold appears to be quite different in the Science
and Nature draft published sequences. [David Galas "Making sense of sequence"
Science 291: 1257- Feb. 16, 2001] This is also different from the scaffold defined in Drug
discovery and development.
The result of connecting contigs by linking information, such as paired-end
reads from plasmids, paired-end reads from BACs, known mRNAs, or other sources.
The contigs in a scaffold are ordered and oriented with respect to one another.
Univ. of California Santa Cruz Human Genome Project Working Draft terminology
http://genome.ucsc.edu/goldenPath/term.html
Narrower terms: sequence-contig scaffold, sequenced-clone-contig
scaffold. Related term: contig assembly.
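The UCSC definition above (contigs that are ordered and oriented with respect to one another, but not necessarily joined into one continuous stretch) can be pictured as a simple data structure. The sketch below is illustrative only. The names `PlacedContig`, `render_scaffold` and `gap_after` are hypothetical, and writing each unsequenced gap as a run of `N` characters is an assumed convention, not something the quoted definitions specify.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class PlacedContig:
    """One contig with its orientation and the gap that follows it (hypothetical type)."""
    name: str
    sequence: str
    reverse: bool = False  # True if the contig lies on the opposite strand
    gap_after: int = 0     # estimated number of unsequenced bases before the next contig


def render_scaffold(contigs):
    """Join ordered, oriented contigs, writing each gap as a run of N (assumed convention)."""
    parts = []
    for index, contig in enumerate(contigs):
        seq = reverse_complement(contig.sequence) if contig.reverse else contig.sequence
        parts.append(seq)
        if index < len(contigs) - 1:
            parts.append("N" * contig.gap_after)
    return "".join(parts)


if __name__ == "__main__":
    scaffold = [
        PlacedContig("contig_1", "ACGT", gap_after=3),
        PlacedContig("contig_2", "AACC", reverse=True),
    ]
    print(render_scaffold(scaffold))
```

In practice the order, orientation and gap estimates would come from linking information such as the paired-end reads mentioned above. Here they are simply supplied by hand.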
scanning, scoring: SNPs & other genetic variations
scoring methods:
Many choices; the best choice is often problem dependent. Nice review: ["Sequence
Analysis: Which scoring method should I use?" Pittsburgh Supercomputing Center,
Carnegie Mellon Univ. 1999] http://www.psc.edu/research/biomed/homologous/scoring_primer.html
Related terms: filtering, gap, masking; Molecular
modeling homology modeling
Narrower term: SNP scoring
sequence alignment:
The arrangement of two or more amino acid or base
sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or
homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
MeSH, 1991 Broader term?: alignments.
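As a rough illustration of how "weights assigned to the elements aligned" turn an alignment into a number, the sketch below scores two already-aligned rows column by column and also reports percent identity. The function name and the weights (+1 match, -1 mismatch, -2 gap) are arbitrary assumptions for the example. Real analyses typically use substitution matrices and more elaborate gap penalties.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned rows of equal length; '-' marks a gap.

    Returns (total_score, percent_identity), where percent identity is
    computed over all alignment columns. Weights are illustrative only.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    total = 0
    identical = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap          # a residue aligned against a gap
        elif x == y:
            total += match        # identical residues
            identical += 1
        else:
            total += mismatch     # different residues

    percent_identity = 100.0 * identical / len(row_a) if row_a else 0.0
    return total, percent_identity


if __name__ == "__main__":
    total, identity = score_alignment("GATTACA", "GA-TACA")
    print(total, round(identity, 1))
```

Different weightings can rank the same pair of alignments differently, so a score is only meaningful together with the scheme that produced it.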
sequence analysis: Sequence analysis is a robust field,
and mining sequence data using bioinformatics
is one of the main activities of genomics- based drug discovery. Using
sequence analysis to understand whole genomes may provide an important
advantage for groups looking for new drug targets among genes, or trying to pick
the best among targets they already have. Sequence analysis is one of the most widely used techniques in genomics. A great deal of sequence work will continue to be done, as researchers
fill in the gaps left in the genome maps of humans and other important
organisms. Studies to confirm sequence, and to identify SNPs, will also need to
continue.
sequence homology: The degree of similarity between sequences. Studies of
amino acid and nucleotide sequences provide useful
information about the genetic relatedness of certain species. MeSH, 1993 Broader
term: Functional genomics homology;
Related terms: Functional
genomics evolutionary homology; Proteomics
regulatory homology; Molecular modeling homology
modeling; Structural genomics structural homology
sequence homology, amino acid:
The degree of similarity between sequences of amino acids. This information is useful for the understanding of genetic relatedness of certain species.
MeSH, 1993
sequence homology - nucleic acid:
The sequential correspondence
of nucleotide triplets in a nucleic acid molecule which permits nucleic
acid hybridization. Sequence homology is important in the study of mechanisms
of oncogenesis and also as an indication of the evolutionary relatedness
of different organisms. The concept includes viral homology. MeSH, 1991
Broader term sequence homology
sequence tags:
Sequence bits 2-4 contig residues in length.
Used to determine the mass of a particular sequence. [CHI Proteomics
report] Can
be used to search protein and EST databases with high specificity. Blackstock
& Weir “Proteomics” Trends in Biotechnology 17:121 Mar 1999
sequence-contig scaffold:
Scaffold
produced by connecting a maximal set of sequence contigs joined by bridged gaps.
[Univ. of California Santa Cruz Human Genome Project Working Draft
terminology] http://genome.ucsc.edu/goldenPath/term.html
sequenced-clone-contig scaffold:
Scaffold
produced by joining sequenced clone contigs by bridged SCC gaps. [Univ. of
California Santa Cruz Human Genome Project Working Draft terminology] http://genome.ucsc.edu/goldenPath/term.html
sequencers- automation:
The types and degree of automation needed at
each [sequencing] step can vary greatly from laboratory to laboratory. The
ultimate goal of automation is not to replace the role of humans, but to
increase the speed and accuracy rates of individual steps while decreasing the
number of repetitive manual steps that would otherwise be imposed on personnel.
In fact, some of the steps may still be performed manually if that step can be
performed faster or more proficiently by hand (e.g., preparation of reaction
mixes, loading sequencing gels).
Related term: online automated DNA sequencers
sequencers- miniaturization:
In the future, miniature DNA sequencers
may become state of the art. Pocket- or briefcase-sized DNA sequencers would
permit onsite research to be performed in remote areas and dramatically increase
the speed at which DNA fragments are analyzed.
Related terms:
Microarrays categories:
lab- on- a- chip
sequencing:
Proteins, nucleic acids
-- Analytical procedures for
the determination of the order of amino acids in a polypeptide chain or
of nucleotides in a DNA or RNA molecule. IUPAC Compendium
Sequencing of biomolecules began with the insulin B-chain - a thirty residue
peptide - which Sanger and Tuppy deduced through a combination of limited
proteolysis and chemical analysis in 1951. It was a full 14 years later that
Holley et al. determined the sequence of alanine tRNA from yeast. And it took
another 12 years until "real" DNA sequencing was developed by Maxam
& Gilbert and Sanger et al. in 1977. [Introduction to
bioinformatics, Univ. of Munich Gene Center, Germany, Summer 2000] http://www.lmb.uni-muenchen.de/groups/bioinformatics/01/ch_01_1.html
Largely automated now. Full DNA sequencing is the "gold standard"
for genotyping.
Narrower terms: deep
sequencing, horizontal sequencing, resequencing, sequencing - algorithms,
sequencing, advanced, sequencing - cost of,
sequencing - high- throughput, sequencing - throughput, next generation
sequencing, shotgun sequence, single
DNA molecule sequencing, vertical sequencing, whole genome shotgun sequencing, chain
termination sequencing, chemical cleavage sequencing, chemical degradation
sequencing, de novo sequencing, dideoxy sequencing, microsequencing,
minisequencing, multiplex sequencing, Sanger sequencing, sequencing by
synthesis. Related terms: genotyping, GWAS Genome Wide Association
Sequencing, haplotyping
See also next
generation sequencing, now generation sequencing, sequencing data analysis &
storage
sequencing, advanced: Jay Shendure et al., Advanced
Sequencing Technologies: Methods and Goals, Nature Reviews Genetics, 5:
335- May 2004 http://arep.med.harvard.edu/PGP/Shendure04.pdf
sequencing algorithms: See BLAST, FASTA, Needleman - Wunsch, Smith - Waterman
sequencing by synthesis: Promising
new sequencing technologies, based on sequencing by synthesis (SBS), are
starting to deliver large amounts of DNA sequence at very low cost.
Polymorphism detection is a key application. Quality
scores and SNP detection in sequencing-by-synthesis systems, Brockman W,
Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum
C, Jaffe DB, Genome Research, 2008 Jan 22 [Epub ahead of print]
sequencing - cost of:
The cost of sequencing a single DNA base
[when the Human Genome Project was initiated] was about $10 then; today, sequencing costs have fallen about 100-fold to $.10 to $.20 a base and still are dropping rapidly.
[Human Genome News 11 (1-2) Nov. 2000] http://www.ornl.gov/hgmis/publicat/hgn/v11n1/01giants.html
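The arithmetic behind these figures can be checked with a few lines. Note that "about 100-fold" corresponds to the $.10 end of the quoted range, while $.20 per base is a 50-fold reduction. The genome size used below (roughly 3 billion bases for one copy of the human genome) is an assumption added for illustration and does not come from the entry. The single-pass total also ignores the redundant coverage that real projects need.

```python
# Figures quoted in the entry above (US dollars per base).
COST_AT_HGP_START = 10.00
COST_2000_LOW = 0.10
COST_2000_HIGH = 0.20

# Assumption, not from the entry: about 3 billion bases in one copy of the human genome.
ASSUMED_GENOME_BASES = 3_000_000_000


def fold_reduction(old_cost, new_cost):
    """How many times cheaper the new per-base cost is than the old one."""
    return old_cost / new_cost


def single_pass_cost(cost_per_base, bases=ASSUMED_GENOME_BASES):
    """Cost of reading every base exactly once (no redundant coverage)."""
    return cost_per_base * bases


if __name__ == "__main__":
    for cost in (COST_2000_LOW, COST_2000_HIGH):
        print(
            f"${cost:.2f}/base: {fold_reduction(COST_AT_HGP_START, cost):.0f}-fold cheaper, "
            f"single pass about ${single_pass_cost(cost):,.0f}"
        )
```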
sequencing data analysis: Now-generation sequencing platforms are capable of generating
gigabytes of data in a sequence run leading to terabytes of data in a single
experiment. Thus data storage, transfer, and analysis will unquestionably be the
rate limiting steps in turning this new sequencing data into knowledge. Sequencing Data Analysis
and Storage, March 15-17, 2010, San Diego, CA
sequencing - high- throughput:
Uses robotics, automated DNA-
sequencing machines and computers.
shotgun sequencing method: Sequencing method which involves randomly sequencing
tiny cloned pieces of the genome, with no foreknowledge of where on a chromosome
the piece originally came from. This can be contrasted with "directed"
[sequencing] strategies, in which pieces of DNA from adjacent stretches of a chromosome
are sequenced. Directed strategies eliminate the need for complex reassembly
techniques. Because there are advantages to both strategies, researchers
expect to use both random (or shotgun) and directed strategies in combination
to sequence the human genome. [DOE] Uses dynamic programming methods.
Narrower terms:
chromosome-specific shotgun sequencing,
protein shotgun sequencing, whole genome shotgun sequencing,
YAC shotgun sequencing;
Related term: full shotgun coverage
Shotgun sequencing comes of age, Tabitha
Powledge, Scientist, Dec. 31, 2002 http://www.biomedcentral.com/news/20021231/06
A hybrid of the whole genome shotgun and clone-by-clone approaches is probably best.
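The reassembly step that shotgun sequencing requires can be pictured with a toy greedy assembler that repeatedly merges the two fragments sharing the longest exact suffix-to-prefix overlap. This is a deliberately simplified sketch under strong assumptions: reads are error-free, all come from the same strand, and the sequence has no repeats. The function names and the `min_len` threshold are hypothetical, and production assemblers such as those cited in this glossary work quite differently.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by longest exact overlap; returns the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three overlapping fragments of a short made-up sequence.
    print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

Fragments with no sufficient overlap are returned as separate contigs, which is the situation scaffolding and directed strategies are meant to resolve.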
similarity: Functional genomics
similarity search: BLAST, FASTA
and Smith- Waterman are examples of similarity search
algorithms.
single base extension array: Single Base Extension
[SBE], or mini-sequencing, was one of the first genotyping technologies
developed, and it is categorized as a type of primer extension method. SBE is
very similar to DNA sequencing, with the exception that only one base (the
polymorphic base or SNP) is queried, while sequencing can identify up to several
hundred bases and determine their relative order. The system is cheaper and can
be performed in higher throughput than traditional sequencing. CHA Cambridge
Healthtech Advisors, Clinical
Genomics: The Impact of Genomics on Clinical Trials and Medical Practice
report, 2004
single DNA molecule sequencing: The evolution of technology for single DNA molecule sequencing will ultimately permit
whole genome analysis of populations of cells at high resolution and will obviate current
PCR- based approaches, particularly important for sequencing diploid or polyploid cells.
This is the ultimate in sensitivity, and perhaps difficulty. Further in the future, it might be
possible to utilize the protein synthesis machinery of the cell as a "sequencing engine."
National Center for Research Resources "Integrated Genomics Technologies
Workshop Report" Jan 1999
Smith-Waterman alignment: An amino acid sequence alignment
that illustrates sequence similarity. The alignment is generated using
the Smith- Waterman algorithm (Temple Smith and MS Waterman, J Mol Biol.
147: 195-197, 1981; WR Pearson Genomics 11:635-650, 1991) [SGD Saccharomyces
Genome Database glossary, Stanford Univ.] http://genome-www.stanford.edu/Saccharomyces/help/glossary.htm
Related
terms dynamic programming; Algorithms,
In silico & Molecular
modeling
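A compact sketch of local alignment in the Smith-Waterman style may help show how it differs from global alignment: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell rather than at the corner of the matrix. The function name, the scores (+2 match, -1 mismatch, -1 per gap position) and the simple linear gap penalty are assumptions for illustration. They are not the parameters used by SGD or in the cited papers.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment by dynamic programming (illustrative sketch).

    Returns (best_score, aligned_a, aligned_b) for one best-scoring
    local alignment, using a linear gap penalty.
    """
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                      # start a new local alignment here
                h[i - 1][j - 1] + pair,
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("AAAGATTACATTT", "CCGATTACAGG"))
```

Only the best-matching region is reported, so unrelated flanking sequence on either side does not lower the score.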
templates: Templates used for sequencing generally come in two forms: 1) PCR
products and 2) cloned DNA. PCR (polymerase chain reaction) products are derived
by the PCR process where a specific but minute portion of the genome is
selectively amplified 1 billion- fold from the source DNA (usually a complete
genome). PCR products are generated when only a small but discrete portion of
the genome needs to be analyzed in hundreds or thousands of individuals. ... DNA
clones are derived from cutting large portions of DNA (sometimes a complete
genome) into discrete fragments that are "cloned" or inserted into DNA
vectors. ... A single DNA fragment is inserted into a vector and transferred
into a host (generally E. coli bacteria) where the vector and the DNA
fragment replicate as the host replicates, thus producing mass quantities of a
single DNA fragment that can be purified from the host. CHI High
Throughput Genomics Report 2001
vertical sequencing: Among the many proposed concepts for
sequencing DNA using mass spectrometry, the most successful has been to combine
Sanger cycle sequencing with MALDI-TOF-MS (1,2,4–12).
Four nucleobase- specific oligonucleotide ladders are generated in separate
reaction vials, which are then separated and detected inside the mass
spectrometer. The sequence is determined by comparing the recorded spectra
(vertical sequencing). Eckhard Nordhoff, Christine Luebbert,
Gabriela Thiele, Volker Heiser, and Hans Lehrach, Rapid determination of short
DNA sequences, Nucleic Acids Res. 28(20),
Oct 15, 2000 http://www.pubmedcentral.gov/articlerender.fcgi?artid=110802
Related term: horizontal sequencing
viral genotyping: Genomic
data is enabling researchers to predict a patient's response to therapy based on
the viral genotype for viral infections. HIV genotyping is an early example of
how treatment decisions are made based on the genotype of the virus.
viral homology: See under sequence homology - nucleic acid
whole genome shotgun sequencing:
Celera’s whole genome shotgun sequencing technique involves sequencing from both ends of the double stranded cloned DNA. Celera’s accurately paired clone end sequences are a key tool for assembling the genome much more completely than single stranded sequencing methods allow at comparable levels of sequence coverage. Celera’s paired
end- sequencing strategy, as part of the whole genome shotgun sequencing technique, has now produced sequence pairs from clones that cover the human genome 11 times. The company believes that 99% of the human genome is represented in the cloned DNA.
["Celera Genomics completes sequencing phase of the genome
from one human being" press release, Rockville,
MD, April 6, 2000] http://www.pecorporation.com/press/prccorp040600.html
Broader term: shotgun sequencing method. Related
term: GWAS Genome Wide Association Sequencing
"working draft, human genome sequence":
This site contains a working
draft of the human genome, which is over 90% complete. Approximately half of the
sequence is in a highly accurate 'finished' state. The other half is merely
'draft' quality. Some care must be taken interpreting draft regions, but these
are still often very useful to the working scientist. We encourage you to
explore the working draft with the genome browser, which displays the work of
many annotators worldwide. Human Genome Project Working Draft, Univ. of
California, Santa Cruz, US http://genome.ucsc.edu/
This
milestone was announced at the White House (Washington DC, US) on June
26, 2000. President Bill Clinton was joined by Francis Collins (National
Human Genome Research Institute) and Craig Venter (Celera Genomics) and
heads of the major US genome sequencing centers. Work continues to be done
on annotating the sequence, but further celebration ensued with publication
of two versions of the sequence in Feb. 2001. Related terms:
draft
sequence, finished sequence - human, published working drafts; Genomics Human Genome
Project
Bibliography
How
to look for other unfamiliar terms
IUPAC definitions are reprinted with the permission of the International
Union of Pure and Applied Chemistry.
Evolving Terminologies for Emerging
Technologies
Comments? Suggestions Revisions? Mary Chitty mchitty@healthtech.com
Last revised February 18, 2010.
Related glossaries include Applications Functional
genomics, Proteomics
Informatics Algorithms
Bioinformatics, In
silico & molecular
modeling
Technologies Chromatography
& electrophoresis, Mass spectrometry
Biology Proteins,
Protein
Structures, SNPs & genetic
variations, Sequences
- DNA & beyond
SBIR funding for http://grants.nih.gov/grants/funding/sbir.htm
Pronounced gee-wahs|
Related term: next generation sequencing
next generation sequencing: Purchasing your next-generation sequencing (NGS)
platform(s) was a major decision. Now that you have purchased a platform, how do
you maximize the greatest potential for your investment? Realizing this
potential requires efficient workflow strategies, careful experimental design,
comprehensive targeted enrichment technologies, data analysis, management, and
integration, in addition to maintaining your platform and people management all
at maximum production. Beyond
Sequencing, June 22-23, 2010, San Francisco, CA
Next-generation sequencing (NGS)
technologies are advancing in quality and applications diversity at a
breathtaking pace. The market is diversifying strongly into labs without
previous involvement in sequencing. Next-Generation Sequencing: Solving the Genome, June 2009
Now-Generation
Sequencing, March 17-19, 2010, San Diego, CA
DNA Sequencing glossary, Geospiza, Inc.,
2002, 26 definitions. http://www.geospiza.com/support/glossary.htm
NCBI (US) BLAST Glossary, 2000. 40+ definitions http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html
Human Genome Project Information, Facts about
Genome Sequencing, Oak Ridge National Lab, US, 2002 http://www.ornl.gov/hgmis/faq/seqfacts.html