
Sequences – DNA & beyond 
Evolving terminology for emerging technologies

Comments? Suggestions? Revisions?
Mary Chitty MSLS 
mchitty@healthtech.com
Last revised July 03, 2019



Biology & Chemistry term index: Gene definitions, DNA, Proteins, Protein Structures and RNA are sub-categories linked to this glossary.
Related glossaries include: Genomics, Proteomics, Informatics: Algorithms, In silico & Molecular Modeling
Technologies: Microarrays & protein chips, Sequencing
Biology: Biomolecules, Expression, Glycosciences

3' [three prime] flanking region: The region of DNA which borders the 3' end of a transcription unit and where a variety of regulatory sequences are located. MeSH, 2002

3' UTR (three prime): The sequence at the 3' end of messenger RNA that does not code for product. This region contains transcription and translation regulating sequences. MeSH, 1999

Region at the 3' end of a mature transcript (following the stop codon)  that is not translated into a protein. DDBJ/ EMBL/ GenBank Feature Table  http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

A term that identifies one end of a single-stranded nucleic acid molecule. The 3' end is that end of the molecule which terminates in a 3' hydroxyl group. The 3' direction is the direction toward the 3' end. Nucleic acid sequences are written with the 5' end to the left and the 3' end to the right, in reference to the direction of DNA synthesis during replication (from 5' to 3'), RNA synthesis during transcription (from 5' to 3'), and the reading of mRNA sequence (from 5' to 3') during translation. Broader term: UTR. Related terms: 5' (5-prime), PCR primer extension

5' (5-prime):  The sequence at the 5' end of the messenger RNA that does not code for product. This sequence contains the ribosome binding site and other transcription and translation regulating sequences. MeSH, 1999

A term that identifies one end of a single- stranded nucleic acid molecule. The 5' end is that end of the molecule which terminates in a 5' phosphate group. The 5' direction is the direction toward the 5' end. Nucleic acid sequences are written with the 5' end to the left and the 3' end to the right, in reference to the direction of DNA synthesis during replication (from 5' to 3'), RNA synthesis during transcription (from 5' to 3'), and the reading of mRNA sequence (from 5' to 3') during translation.  [Mouse Genome Informatics] Related term: 3' (3-prime)
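The 5'-to-3' writing convention matters whenever the partner strand of a duplex is needed: complementing each base gives the partner read 3' to 5', and reversing that string restores the usual 5'-to-3' left-to-right order. A minimal Python sketch, assuming unambiguous uppercase DNA letters (A, C, G, T) only; `reverse_complement` is an illustrative name, not taken from any particular library.

```python
# Watson-Crick complements for unambiguous DNA bases (assumption: A/C/G/T only)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq_5to3: str) -> str:
    """Return the opposite strand, also written 5' -> 3'.

    Complementing gives the partner strand read 3' -> 5'; reversing it
    restores the left-to-right 5' -> 3' convention.
    """
    return seq_5to3.upper().translate(COMPLEMENT)[::-1]

if __name__ == "__main__":
    top = "ATGCGT"                    # 5'-ATGCGT-3'
    print(reverse_complement(top))    # 5'-ACGCAT-3'
```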

5' Flanking Region:  The region of DNA which borders the 5' end of a transcription unit and where a variety of regulatory sequences are located.  MeSH 2002

5' UTR (five prime): Region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein. DDBJ/ EMBL/ GenBank Feature Table   http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

5' Untranslated Region: That portion of an mRNA from the 5' end to the position of the first codon used in translation. Related terms: 3' UTR, 3' (3-prime), PCR primer extension. Broader term: UTR

adenine (A): A nitrogenous base, one member of the base pair AT (adenine/ thymine). DOE

amino acid sequence:  The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining protein conformation. MeSH, 1966

ATCG: See adenine, base, base pair, thymine, cytosine, guanine

attenuator: In prokaryotes. 1) region of DNA at which regulation of termination of  transcription occurs, which controls the expression of some bacterial operons;  2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription. DDBJ/ EMBL/ GenBank Feature Table  http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

base: Adenine, cytosine, guanine, thymine, and (only in RNA) uracil. Related terms: base pair, nucleotide [DOE]  

Called bases because they are alkaline (basic) in the acidic DNA structure. Base and base pair are used "fairly indiscriminately" by molecular biologists. [Bains]

base pair (bp): Two bases which form a "rung of the DNA ladder." A DNA nucleotide is made of a molecule of sugar, a molecule of phosphoric acid, and a molecule called a base. The bases are the "letters" that spell out the genetic code. In DNA, the code letters are A, T, G, and C, which stand for the chemicals adenine, thymine, guanine, and cytosine, respectively. In base pairing, adenine always pairs with thymine, and guanine always pairs with cytosine. [NHGRI] Narrower terms: adenine, cytosine, guanine, thymine, uracil

central dogma: Horace Freeland Judson quotes Francis Crick talking about the central dogma: "Nobody tried to go from protein sequence back to nucleic acid, because that just wasn't on. You see. But I don't think it was ever discussed. ... Jim [Watson], you might say, had it first. DNA makes RNA makes protein. That became then the general idea. ... what are all the possible information flows?" [Judson asked why he had called it the central dogma.] "It was because, I think, of my curious religious upbringing. Because Jacques [Monod] has since told me that a dogma is something which a true believer cannot doubt!" Crick laughed. ... "But that wasn't what was in my mind. My mind was, that a dogma was an idea for which there was no reasonable evidence. You see?!" And Crick gave a roar of delight. "I just didn't know what dogma meant. And I could just as well have called it the 'Central Hypothesis' - you know. Which is what I meant to say. Dogma was just a catch phrase. ... And it's a negative hypothesis, so it's very very difficult to prove.... The central dogma is much more powerful [than Crick's sequence hypothesis], and therefore in principle you might have to say it could never be proved. But its utility - there was no doubt about that. Because if you didn't believe that, you could invent theories, unlimited theories, whereas if you just put in that one assumption, ... then, essentially you were on the right track you see." ... "In looking back I am struck not only by the brashness which allowed us to venture powerful statements of a very general nature, but also by the rather delicate discrimination used in selecting what statements to make. Time has shown that not everybody appreciated our restraint." HF Judson, The Eighth Day of Creation, Cold Spring Harbor Laboratory Press 1996 pp. 333-334

Francis Crick "Central dogma of molecular biology" Nature 227 (5258): 561-563 Aug. 8, 1970 [historical article clarifying original explanation]

The Oxford English Dictionary makes clear the duality of dogma, particularly in the context of dogmatic, defined as "accepted as true instead of being based upon experience, particularly if done in an imperious, arrogant manner".  Dogma is defined as "systematised beliefs" (sometimes deprecating). Dogmatic physicians are cited as "an ancient sect" which "endeavoured to discover by reasoning the essence and occult causes" of disease.  Related terms: transcription, translation  

central dogma exceptions: Reverse transcription, prions, retroviruses?   
1. Reverse transcriptase and RNA genomes. DNA is not the only molecule of heredity in nature and, as David Baltimore and Howard Temin showed, the flow of information from DNA to RNA is not the only pathway possible.
2. Catalytic RNAs (ribozymes). Proteins are not the only structures capable of catalyzing a reaction. Tom Cech demonstrated the catalytic nature of certain classes of introns (intervening sequences) that are able to "self-splice." In addition, Harry Noller has shown that the synthesis of the peptide bond during protein synthesis is catalyzed by the 23S rRNA of the ribosome.
3. Heritable proteins. Stanley Prusiner has given us the novel name "prion" (proteinaceous infectious particle) to describe the agent responsible for a number of slow, neurological infectious diseases, including scrapie, bovine spongiform encephalopathy (mad cow disease) and Creutzfeldt-Jakob disease. [Martinez Hewlett, Molecular Biology 411, Univ. of Arizona, Tucson US] http://www.blc.arizona.edu/marty/411/Modules/mod4.html

cis-acting sequences: The sequences just 5' of the start site of transcription are the most important for the initiation of transcription. This is where the transcription complex is built. In general, this region is called the promoter. For eukaryotes, several sequences seem to be conserved among many genes. One such sequence is the TATA box. The sequence is located about 30 bases upstream (-30) from the transcription start site and is the one sequence required for any significant transcription to occur. Other sequences aid in transcription but are not always part of the promoter. The two most commonly found are the CCAAT box (called the CAT box) and the GC box. Because mutants of these three sequences only express mRNAs at low levels, these are considered the most important sequences of the basic transcription complex. Phillip McClean, "Control of gene expression in eukaryotes," North Dakota State Univ. https://www.ndsu.edu/pubweb/~mcclean/plsc731/cis-trans/cis-trans6.htm Compare trans-acting factors
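The promoter layout described above can be turned into a rough motif scan. The Python sketch below assumes a single strand given 5' to 3', a known transcription start site, and simplified consensus patterns (TATAWAW for the TATA box, where W is A or T, and the literal CCAAT box; the GC box is omitted for brevity). These patterns are illustrative assumptions: real promoter elements vary far more. It also uses the common numbering in which the start site is +1 and the base immediately upstream is -1, with no position 0; `scan_promoter` and `relative_position` are hypothetical names.

```python
import re

# Simplified consensus motifs (assumption for illustration; real promoters vary):
# TATA box ~ TATAWAW (W = A or T); CCAAT box as the literal CCAAT.
MOTIFS = {
    "TATA box": re.compile(r"TATA[AT]A[AT]"),
    "CCAAT box": re.compile(r"CCAAT"),
}

def relative_position(index: int, tss_index: int) -> int:
    """Convert a 0-based index to promoter coordinates.

    The transcription start site is +1 and the base just upstream
    of it is -1; there is no position 0.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset

def scan_promoter(seq: str, tss_index: int) -> list[tuple[str, int]]:
    """Return (motif name, start position relative to the TSS) for each hit
    lying entirely upstream of the start site, sorted 5' to 3'."""
    upstream = seq[:tss_index].upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(upstream):
            hits.append((name, relative_position(match.start(), tss_index)))
    return sorted(hits, key=lambda hit: hit[1])
```

With a toy sequence whose TATAAAA begins 30 bases before the start site, the scan reports the TATA box at -30.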

Cis-acting sequences do not usually code for proteins. Compare trans-acting factors. See also the Expression glossary.

cytosine (C): A nitrogenous base, one member of the base pair GC (guanine and cytosine). [DOE]

DNA - RNA - protein: See central dogma. Related term: transposons. How are these two terms different?

ds: Double-stranded (DNA or RNA).

downstream: Identifies sequences proceeding farther in the direction of expression; for example, the coding region is downstream from the initiation codon, toward the 3' end of an mRNA molecule. Sometimes used to refer to a position within a protein sequence, in which case downstream is toward the carboxyl end, which is synthesized after the amino end during translation. [Lemon]

enhancer: A cis- acting sequence that increases the utilization of (some)  eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. Eukaryotes and eukaryotic viruses. [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html 

At the 5' and 3' end of the gene, enhancers are located, which respond to the signals mediated by the proteins regulating the function of the gene. Enhancers can also be located within the introns. The regulative effect of the enhancers is either positive or negative. In the latter case they are often called silencers [for reviews concerning enhancers and silencers, see for example 141, 142]. ... In the cis-trans test, the E- g-/E+ g+ cis-heterozygote is phenotypically wild, whereas the E- g+/E+ g- trans-heterozygote is phenotypically mutant. Thus the cis-trans test gives a positive result. This means that we cannot on the basis of a genetic test alone distinguish between an enhancer and the transcription unit regulated by it; biochemical evidence is needed. Thus, by definition, the regulatory elements of a transcription unit, such as enhancers, have to be included in the gene itself.  Petter Portin in "The Origin, Development and Present Status of the Concept of the Gene: A Short Historical Account of the Discoveries" Current Genomics, 2000   https://pdfs.semanticscholar.org/a61a/4e1a2c28e517d6e4ca9a43fd63bbb65379e4.pdf  

enhancer elements (genetics): Cis- acting DNA sequences which can increase transcription of genes. Enhancers can usually function in either orientation and at various distances from a promoter.  [MeSH, 1988]  Related term: promoter 

genetic code:  The sequence of nucleotides, coded in triplets (codons) along the mRNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence. [DOE] 
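The codon-by-codon lookup in the DOE definition can be sketched in Python with the standard genetic code (NCBI translation table 1), built from the familiar 64-letter amino acid string in UCAG order. The sketch assumes an mRNA coding sequence that is already in frame at its first base, contains only A, C, G and U, and is read until the first stop codon; mitochondrial and other variant codes are not handled, and `translate` is an illustrative name.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# UUU, UUC, UUA, UUG, UCU, ... GGG and "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate an in-frame mRNA coding sequence into one-letter amino acid
    codes, stopping at the first stop codon (the stop itself is not emitted)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("AUGGCCUAA")` reads AUG (Met), GCC (Ala), then stops at UAA, returning `"MA"`.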

The notion of a “code” as the key to information transfer was not articulated publicly until late 1954, when [George] Gamow, Martynas Ycas, and Alexander Rich published an article that defined the code idiom for the first time since Watson and Crick casually mentioned it in a 1953 article. Yet the concept of coding applied to genetic specificity was somewhat misleading, as translation between the 4 nucleic acid bases and the 20 amino acids would obey the rules of a cipher instead of a code. As Crick acknowledged years later, in linguistic analysis, ciphers generally operate on units of regular length (as in the triplet DNA scheme), whereas codes operate on units of variable length (e.g., words, phrases). But the code metaphor worked well, even though it was literally inaccurate, and in Crick’s words, “‘Genetic code’ sounds a lot more intriguing than ‘genetic cipher’.” Codes and the information transfer metaphor were extraordinarily powerful, and heredity was often described as a biological form of electronic communication. [Richard A. Pizzi "Genetic ciphering" Modern Drug Discovery  4 (3): 65- 66 Mar. 2001] http://pubs.acs.org/subscribe/journals/mdd/v04/i03/html/03timeline.html
Who Wrote the Book of Life? A History of the Genetic Code. Lily E. Kay, Stanford University Press, 2000. Related term: central dogma

genomic sequence: In April 2003, the sequence of the human genome will be essentially complete. For the scientific community now to make the best use of that fundamental information resource, the identity and precise location of all sequence-based functional elements in the genome must be determined. While many of the protein-coding genes are already known, many others remain to be identified. Beyond open reading frames, non-protein-coding genes, transcriptional regulatory elements and determinants of chromosome structure and function remain largely unknown. A comprehensive encyclopedia of all of these features is needed to utilize fully the sequence of the human genome to understand human biology better, to predict potential disease risks, and to stimulate the development of new therapies and other interventions to prevent and treat disease. The sequence-based functional elements that will be targeted include, but are not limited to: Transcribed sequences, including both protein-coding and non-protein-coding. A description of the gene structure with transcriptional start sites, polyadenylation sites, along with all alternative transcripts, is an example. Conserved non-coding sequences that may represent functional elements. Cis-acting elements that regulate transcription and/or chromatin structure. These elements include promoters, enhancers, and insulators. Sequence features that affect/control chromosome biology. Examples include origins of replication and hot spots for recombination. Epigenetic changes, such as DNA methylation and chromatin modifications. Workshop on the Comprehensive Extraction of Biological Information from Genomic Sequence, Bethesda, Md. July 23-24, 2002, http://www.genome.gov/10005568 http://grants1.nih.gov/grants/guide/rfa-files/RFA-HG-03-003.html

guanine (G): A nitrogenous base, one member of the base pair GC (guanine and cytosine). [DOE]

human sequence: See Sequencing  draft sequence, finished sequence, published sequence, working draft 

intein-mediated protein splicing: Intein-mediated protein splicing has become an essential tool in modern biotechnology. Fundamental progress in the structure and catalytic strategies of cis- and trans-splicing inteins has led to the development of modified inteins that promote efficient protein purification, ligation, modification and cyclization. Recent work has extended these in vitro applications to the cell or to whole organisms. We review recent advances in intein-mediated protein expression and modification, post-translational processing and labeling, protein regulation by conditional protein splicing, biosensors, and expression of trans-genes. Topilina NI, Mills KV. Recent advances in in vivo applications of intein-mediated protein splicing. Mob DNA. 2014;5(1):5. Published 2014 Feb 4. doi:10.1186/1759-8753-5-5 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3922620/

inteins: Internal protein sequences. Wikipedia http://en.wikipedia.org/wiki/Intein Related terms: exteins, protein splicing.

interspersed repetitive sequences: Copies of transposable elements interspersed throughout the genome, some of which are still active and often referred to as "jumping genes". There are two classes of interspersed repetitive elements. Class I elements (or RETROELEMENTS - such as retrotransposons, retroviruses, LONG INTERSPERSED NUCLEOTIDE ELEMENTS and SHORT INTERSPERSED NUCLEOTIDE ELEMENTS) transpose via reverse transcription of an RNA intermediate. Class II elements (or DNA TRANSPOSABLE ELEMENTS - such as transposons, Tn elements, insertion sequence elements and mobile gene cassettes of bacterial integrons) transpose directly from one site in the DNA to another. MeSH, 1999  Narrower terms: LINES, SINES

LINEs Long Interspersed Nuclear Elements or Long INterspersed Elements: Families of long (average length = 6,500 bp), moderately repetitive elements (about 10,000 copies). LINEs are cDNA copies of functional genes present in the same genome; also known as processed pseudogenes. FAO Glossary

Highly repeated sequences, 6K- 8K base pairs in length, which contain RNA polymerase II promoters. They also have an open reading frame that is related to the reverse transcriptase of retroviruses but they do not contain LTRs (long terminal repeats). Copies of the LINE 1 (L1) family form about 15% of the human genome. The jockey elements of Drosophila are LINEs. MeSH, 1999 Related terms: non-coding, retrotransposons. 

LTR Long Terminal Repeat: A sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses. DDBJ/ EMBL/ GenBank Feature Table http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html  Broader term: terminal repeat sequences

locus control region: A regulatory region first identified in the human beta- globin locus but subsequently found in other loci. The region is believed to regulate transcription by opening and remodeling chromatin structure. It may also have enhancer activity. MeSH, 1998

Open Reading Frame ORF: In molecular genetics, an open reading frame (ORF) is the part of a reading frame that has the ability to be translated. An ORF is a continuous stretch of codons that contain a start codon (usually AUG) and a stop codon (usually UAA, UAG or UGA). An ATG codon within the ORF (not necessarily the first) may indicate where translation starts. The transcription termination site is located after the ORF, beyond the translation stop codon. If transcription were to cease before the stop codon, an incomplete protein would be made during translation. In eukaryotic genes with multiple exons, ORFs span intron/exon regions, which may be spliced together after transcription of the ORF to yield the final mRNA for protein translation. Wikipedia accessed 2018 Nov 8 https://en.wikipedia.org/wiki/Open_reading_frame

'ORF' refers to a stretch of DNA that could potentially be translated into a polypeptide or RNA: i.e., it begins with an ATG "start" codon and terminates with one of the 3 "stop" codons. For an ORF to be considered as a good candidate for coding a bona fide cellular protein, a minimum size requirement has often been set, e.g., during the yeast genome sequencing project an ORF was defined as a stretch of DNA that would encode a protein of 100 amino acids or more. An ORF is not usually considered equivalent to a gene or locus until there has been shown to be a phenotype associated with a mutation in the ORF, and/or an mRNA transcript or a gene product generated from the ORF's DNA has been detected. See ORF naming conventions for how ORFs are named in Saccharomyces cerevisiae. The usage of the term ORF within SGD and typically by the Saccharomyces community is generally called a Coding Sequence (CDS).  SGD Glossary https://sites.google.com/view/yeastgenome-help/sgd-general-help/glossary

Reading frames where successive nucleotide triplets can be read as codons specifying amino acids and where the sequence of these triplets is not interrupted by stop codons. MeSH, 1991

Reading frames without stop codons, which can therefore be read continuously during translation. Broader term: reading frame. Narrower term: URF. Related term: Omes & omics glossary ORFeome

operator regions (genetics): Regulatory elements of an operon to which activators or repressors bind to effect the transcription of genes in the operon. MeSH, 1986

primary (initial, unprocessed) transcript: Includes 5' clipped region (5' clip), 5' untranslated region (5' UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3' UTR), and 3' clipped region (3' clip). DDBJ/ EMBL/ GenBank Feature Table http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

promoter: Region on a DNA molecule involved in RNA polymerase binding to initiate transcription.  DDBJ/ EMBL/ GenBank Feature Table  http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html  

Promoters are DNA sequences on the 5' side of the gene on which the RNA polymerase fastens when transcription begins. In all groups of organisms alternative promoters have been shown for many genes. These alternative promoters have been classified into six classes by Ueli Schibler and Filipe Sierra [121] (Fig. 3). Certain types of alternative promoters make it possible for transcription to start from different points of the gene in different cases, and for the transcripts to have initiation codons at different positions of the chromosome. Thus it is possible for a single gene in this case too to produce more than one type of messenger RNA molecules, encoding more than one polypeptide. This is again against the basic conceptual framework of the neoclassical view of the gene. ... According to whether the unit of transcription is controlled by one or several promoters, simple and complex transcription units are distinguished. [Petter Portin in "The Origin, Development and Present Status of the Concept of the Gene: A Short Historical Account of the Discoveries" Current Genomics, 2000] https://pdfs.semanticscholar.org/a61a/4e1a2c28e517d6e4ca9a43fd63bbb65379e4.pdf Related terms: cis-acting, enhancer, promoter regions; Omes & omics: promoterome

promoter regions: The DNA region, usually upstream to the coding sequence of a gene or operon, which binds and directs RNA polymerase to the correct transcriptional start site and thus permits the initiation of transcription. IUPAC Biotech  

DNA sequences which are recognized (directly or indirectly) and bound by a DNA- dependent RNA polymerase during the initiation of transcription. Highly conserved sequences within the promoter include the Pribnow box in bacteria and the TATA BOX in eukaryotes. MeSH, 1985 Related term: enhancer.

protein splicing: Excision of in- frame internal protein sequences (inteins) of a precursor protein, coupled with ligation of the flanking sequences (exteins). Protein splicing is an autocatalytic reaction and results in the production of two proteins from a single primary translation product: the intein and the mature protein. MeSH, 1997

Protein splicing is defined as the excision of an intervening protein sequence (the INTEIN) from a protein precursor and the concomitant ligation of the flanking protein fragments (the EXTEINS) to form a mature extein host protein and the free intein (Perler 1994). Protein splicing results in a native peptide bond between the ligated exteins (Cooper 1993). Extein ligation differentiates protein splicing from other forms of autoproteolysis. Conserved intein motifs differentiate inteins from other types of in-frame sequences present in one homolog and absent in another homolog or from other types of protein rearrangements. Please Note: The term 'Protein Splicing' has been associated with inteins since 1994 (Perler 1994). Recent papers have described protein rearrangements that are not intein-mediated. The mechanism of these rearrangements is currently unknown, but preliminary evidence suggests that they are mediated by various cellular enzymes. For clarity, we suggest calling these non-intein mediated events either protein rearrangements or Protein Editing. InBase, The Intein Database and Registry, hosted by Hideo Iwai lab 2010 http://www.inteins.com/ Related terms: exteins, inteins Narrower term: intein mediated protein splicing

reading frames: The sequence of codons by which translation may occur. A segment of mRNA 5'-AUCCGA-3' could be translated in three reading frames, 5'-AUC... or 5'-UCC... or 5'-CCG..., depending on the location of the start codon. MeSH, 1991 Narrower term: ORF Open Reading Frames
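The three-frame example in the MeSH definition can be reproduced by stepping through the sequence in triplets from offsets 0, 1 and 2. A minimal Python sketch; `reading_frames` is a hypothetical helper name, and trailing bases that do not fill a complete triplet are simply dropped.

```python
def reading_frames(seq: str) -> list[list[str]]:
    """Split a sequence into its three forward reading frames.

    Frame k starts at offset k (0, 1, 2); trailing bases that do not
    fill a complete triplet are dropped.
    """
    frames = []
    for offset in range(3):
        frame = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append(frame)
    return frames
```

Applied to the MeSH segment, `reading_frames("AUCCGA")` gives the frames beginning AUC, UCC and CCG: `[["AUC", "CGA"], ["UCC"], ["CCG"]]`.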

reference sequences: The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms. RefSeq standards serve as the basis for medical, functional, and diversity studies; they provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses. RefSeqs are used as a reagent for the functional annotation of some genome sequencing projects, including those of human and mouse. NCBI Reference Sequences database  http://www.ncbi.nlm.nih.gov/RefSeq/ 

response elements: Nucleotide sequences, usually upstream, which are recognized by specific regulatory transcription factors, thereby causing gene response to various regulatory agents. These elements may be found in both promoter and enhancer regions. MeSH, 1998

retroelements: Elements that are transcribed into RNA, reverse- transcribed into DNA and then inserted into a new site in the genome. Long terminal repeats (LTRs) similar to those from retroviruses are contained in retrotransposons and retrovirus- like elements. Retroposons, such as LONG INTERSPERSED NUCLEOTIDE ELEMENTS and SHORT INTERSPERSED NUCLEOTIDE ELEMENTS do not contain LTRs. MeSH, 1999

retrotransposon: DNA fragments, copied by (viral) reverse transcriptase, that insert into the host chromosomes. [Life Sciences]

reverse transcriptases: Gene amplification & PCR  Related terms: non- coding, retrotransposons.

reverse transcription: The synthesis of DNA from an RNA template, via reverse transcription, produces complementary DNA (cDNA). Reverse transcriptases (RTs) use an RNA template and a short primer complementary to the 3' end of the RNA to direct the synthesis of the first strand cDNA, which can be used directly as a template for the Polymerase Chain Reaction (PCR). This combination of reverse transcription and PCR (RT-PCR) allows the detection of low abundance RNAs in a sample, and production of the corresponding cDNA, thereby facilitating the cloning of low copy genes. Alternatively, the first-strand cDNA can be made double-stranded using DNA Polymerase I and DNA Ligase. These reaction products can be used for direct cloning without amplification. New England Biolabs https://www.neb.com/applications/cloning-and-synthetic-biology/dna-preparation/reverse-transcription-cdna-synthesis
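The base-pairing logic of first-strand synthesis can be sketched in Python. This idealized sketch assumes a fully copied template containing only A, C, G and U, and ignores primer chemistry, enzyme processivity and the second-strand step described above; `first_strand_cdna` is a hypothetical helper name.

```python
# RNA template base -> DNA base that reverse transcriptase places opposite it
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")

def first_strand_cdna(rna_5to3: str) -> str:
    """Return first-strand cDNA, written 5' -> 3', for an RNA template.

    The cDNA is antiparallel to its template, so the complement is reversed;
    its 5' end is complementary to the RNA's 3' end, where the primer anneals.
    """
    return rna_5to3.upper().translate(RNA_TO_CDNA)[::-1]
```

For an RNA ending in AAA, such as `"AUGGCCAAA"`, the cDNA is `"TTTGGCCAT"`: it starts with TTT, consistent with synthesis primed from the 3' end of the RNA.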
Related terms: reverse transcriptases; Gene definitions: cDNA

SINEs Short Interspersed Nuclear Elements or Short INterspersed Elements: Families of short (150 to 300 bp), moderately repetitive elements of eukaryotes, occurring about 100,000 times in a genome. SINEs appear to be DNA copies of certain tRNA molecules, created presumably by the unintended action of reverse transcriptase during retroviral infection. FAO Glossary

Highly repeated sequences, 100- 300 bases long, which contain RNA polymerase III promoters. The primate Alu (ALU ELEMENTS) and the rodent B1 SINEs are derived from 7SL RNA, the RNA component of the signal recognition particle. Most other SINEs are derived from tRNAs including the MIRs (mammalian- wide interspersed repeats). MeSH, 1999

sequence: The order of neighbouring amino acids in a protein or the purine and pyrimidine bases [A,C,T,G, uracil] in RNA and DNA. IUPAC Bioinorganic Narrower terms: sequence data-  molecular;  Proteins amino acid sequence Related terms: Sequencing draft sequence - human, published sequence - human, working draft sequence - human Glycosciences glossary carbohydrate sequence

sequence data- molecular: Descriptions of specific amino acid, carbohydrate or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GenBank, EMBL, NBRF or other sequence repositories [databases] MeSH, 1988

silencer elements transcriptional: Nucleic acid sequences that are involved in the negative regulation of TRANSCRIPTION by CHROMATIN SILENCING. MeSH 2003

splice sites: In 1993, Richard J. Roberts and Phillip Allen Sharp received the Nobel Prize in Physiology or Medicine for their discovery of "split genes". Using the model adenovirus in their research, they were able to discover splicing: the fact that pre-mRNA is processed into mRNA once introns are removed from the RNA segment. These two scientists discovered the existence of splice sites, thereby changing the face of genomics research. They also discovered that the splicing of the messenger RNA can occur in different ways, opening up the possibility for a mutation to occur. Wikipedia accessed 2018 Aug 26 https://en.wikipedia.org/wiki/Splice_site_mutation

Location in the DNA sequence where RNA removes the noncoding areas to form a continuous gene transcript for translation into a protein. DOE

splice junctions:  Junctions between exons and introns. 

splice variants: The HGNC [HUGO Gene Nomenclature Committee] has no authority over protein nomenclature; however, we are frequently asked how to designate splice variants so we suggest the following: Proteins should be designated using the same symbol as the gene, printed in non-italicized letters. When referring to splice variants, the symbol can be followed by an underscore and the lower case letter "v" then a consecutive number to denote which variant is which. HUGO Gene Nomenclature Committee "Guidelines for Human Gene Nomenclature" Genomics 79(4):464-470 (2002) http://www.genenames.org/guidelines.html
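The HGNC suggestion quoted above is a simple string convention: symbol, underscore, lower-case "v", variant number. A minimal Python sketch; the function name and the choice to start numbering at 1 are assumptions (the guideline says only "consecutive number"), and `XYZ` is a placeholder symbol, not a real gene.

```python
def splice_variant_name(protein_symbol: str, variant_number: int) -> str:
    """Format a splice-variant label as <symbol>_v<number>.

    The protein symbol is the gene symbol (printed non-italicized);
    numbering from 1 is an assumption made for this sketch.
    """
    if variant_number < 1:
        raise ValueError("variant numbers are assumed to start at 1")
    return f"{protein_symbol}_v{variant_number}"
```

For example, `splice_variant_name("XYZ", 2)` returns `"XYZ_v2"`.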

splicing: 1. Of RNA: the procedure by which introns are removed from eukaryotic precursor mRNA molecules and adjacent exon sequences are joined together (spliced). 2. Of DNA: manipulation for joining together double stranded DNA fragments with protruding single stranded "sticky ends" by means of ligases. [IUPAC Biotech, IUPAC Compendium] Narrower terms: cis- splicing, protein splicing, pre- mRNA splicing, RNA splicing, trans- splicing; Gene Definitions  alternative splicing, cDNA; Related terms Cell biology  spliceosomes
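Sense 1 above (RNA splicing) amounts, in sequence terms, to keeping the exon segments of the precursor and joining them in order. A minimal Python sketch of that bookkeeping only, assuming the exon coordinates are already known (for example, from an annotation) rather than predicted from splice sites; the spliceosome chemistry is not modeled, and `splice` is a hypothetical helper name.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exons of a precursor mRNA and drop the introns between them.

    `exons` holds (start, end) pairs in 0-based, end-exclusive coordinates,
    listed in transcript order (assumed known in advance).
    """
    previous_end = 0
    for start, end in exons:
        # Exons must be in order, non-overlapping, non-empty and inside the precursor
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError("exons must be in order, non-overlapping and in range")
        previous_end = end
    return "".join(pre_mrna[start:end] for start, end in exons)
```

With a precursor built as exon `AUGGCC`, an 11-base intron and exon `UUUUAA`, `splice(pre, [(0, 6), (17, 23)])` returns the joined exons `"AUGGCCUUUUAA"`.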

start codon, stop codon: RNA

template: Gene amplification & PCR Template appears in many biological and biochemical contexts.  Do meanings vary?

terminal repeat sequences: Nucleotide sequences repeated on both the 5' and 3' ends of a sequence under consideration. For example, the hallmarks of a transposon are that it is flanked by inverted repeats on each end and the inverted repeats are flanked by direct repeats. The Delta element of Ty retrotransposons and LTRs (long terminal repeats) are examples of this concept. MeSH, 1999
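The distinction drawn above, between the inverted repeats that flank a transposon and direct repeats such as LTRs, can be checked by comparing the two ends of one strand. A minimal Python sketch, assuming one strand written 5' to 3' with only A, C, G and T, and a known repeat length `k`; real repeats tolerate mismatches, which this exact-match sketch does not. Both function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Partner strand of `seq`, written 5' -> 3' (A/C/G/T only assumed)."""
    return seq.translate(COMPLEMENT)[::-1]

def terminal_repeat_type(element: str, k: int) -> str:
    """Compare the first and last k bases of one strand, written 5' -> 3'.

    "direct":   the 3' end repeats the 5' end in the same orientation (as in LTRs)
    "inverted": the 3' end is the reverse complement of the 5' end
                (as in the inverted repeats flanking a transposon)
    "none":     neither pattern (exact matches only)
    """
    if k <= 0 or 2 * k > len(element):
        raise ValueError("k must be positive and at most half the element length")
    element = element.upper()
    left, right = element[:k], element[-k:]
    # An end that is its own reverse complement fits both patterns; "direct" wins.
    if right == left:
        return "direct"
    if right == reverse_complement(left):
        return "inverted"
    return "none"
```

For example, an element ending in ACGTT repeats its ACGTT start directly, while one ending in AACGT carries the inverted form of that start.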

terminator: A sequence of DNA lying beyond the 3’ end of the coding segment of a gene which is recognized by RNA polymerase as a signal to stop synthesizing mRNA. IUPAC Biotech

Sequence of DNA located at the end of the transcript that causes RNA polymerase to terminate transcription. [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

terminator regions (genetics): DNA sequences which signal the termination of transcription. MeSH, 1991

transcript: Expression. Related terms: 3' UTR, 5' UTR, primary transcript, terminator

trans-acting factors: Trans-acting factors functionally have two domains. One domain is required for the factor to bind to DNA, and the second domain is required for the activation of transcription. This was discovered by studying deletion mutants of the factors. Mutant factors were found that could bind DNA but could not activate transcription. Other experiments, in which a hybrid protein consisting of the non-DNA-binding segment of one trans-acting factor fused to the DNA-binding region of a second trans-acting factor activated transcription, defined the second function of trans-acting factors. Phil McClean "Control of gene expression in eukaryotes" North Dakota State Univ. https://www.ndsu.edu/pubweb/~mcclean/plsc731/cis-trans/cis-trans6.htm Compare cis-acting factors

transcription: The process by which the genetic information encoded in a linear sequence of nucleotides in one strand of DNA is copied into an exactly complementary sequence of RNA. IUPAC Biotech

The synthesis of an RNA copy from a sequence of DNA (a gene); the first step in gene expression. Compare translation (the process in which the genetic code carried by mRNA directs the synthesis of proteins from amino acids). [DOE]
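The base-pairing logic of transcription can be sketched in Python. This assumes the DNA template strand is written 5' to 3', which is the usual convention. RNA polymerase reads the template 3' to 5' and builds RNA 5' to 3', so the transcript is the reverse complement of the template with U in place of T. The result matches the coding strand with T replaced by U. The function name `transcribe` is hypothetical, and the sketch ignores promoters, terminators and everything else polymerase actually does.

```python
# Base pairing between a DNA template base and the RNA base placed opposite it.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the RNA transcript (5'->3') of a DNA template strand given 5'->3'.

    The template is read 3'->5' (i.e. reversed), and each base is paired
    with its RNA complement.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_strand.upper()))


template = "TTACGCCAT"        # template strand, 5'->3'
print(transcribe(template))   # AUGGCGUAA (coding strand: 5'-ATGGCGTAA-3')
```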

transcription, genetic: The transfer of genetic information from DNA to messenger RNA by DNA-directed RNA polymerase. It includes reverse transcription and transcription of early and late genes expressed early in an organism's life cycle or during later development. MeSH, 1973 Related terms: translation, attenuator, reverse transcriptases, transcription machinery; Narrower terms: Gene amplification & PCR reverse transcription; Microarrays Northern blotting

translation: The unidirectional process that takes place on the ribosomes whereby the genetic information present in an mRNA is converted into a corresponding sequence of amino acids in a protein. IUPAC Bioinorganic

The conversion of the genetic instructions for a protein from the nucleotide sequence of messenger RNA into a sequence of amino acids. NIGMS

translation, genetic: Formation of peptides on ribosomes, directed by messenger RNA. MeSH, 1973
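The codon-to-amino-acid step of translation can be sketched in Python using the standard genetic code (NCBI translation table 1). Codons are listed in U, C, A, G order with "*" marking stop codons. This sketch assumes the input mRNA begins exactly at the start codon of an open reading frame. It models only the decoding of codons, not ribosomes, tRNAs or alternative genetic codes. The function name `translate` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1): amino acids for codons UUU, UUC, UUA,
# UUG, UCU, ... in UCAG order (first base varies slowest); "*" = stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate codons from the first base until a stop codon (one-letter codes)."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):   # ignore an incomplete final codon
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":                      # stop codon ends the chain
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGUUUGGCUGAAAA"))  # MFG  (AUG UUU GGC, then UGA = stop)
```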

transposons: A mobile genetic element that can replicate itself and insert itself into the genome. Because insertion can interrupt genes and disrupt their function, a transposon is an insertional mutagen.

One of a class of genes that are capable of moving spontaneously from one chromosome to another, or from one position to another in the same chromosome; also known as jumping genes or transposable elements. [Glick]

DNA elements carrying genes for transposition and other genetic functions. In many cases the latter genes enable bacteria to live in extreme environments. Transposons are much longer than IS (insertion sequence) elements. Abbreviated Tn. Schlindwein

First recognized in the 1940s by Dr. Barbara McClintock in studies of peculiar inheritance patterns in the colors of Indian corn. Also known as “jumping DNA”, referring to the fact that some stretches of DNA are unstable and “transposable”, i.e., they can move around on and between chromosomes. Related term: DNA transposable elements. How are these two terms different?

URF: Unidentified Reading Frame

UTR: The parts of the messenger RNA sequence that do not code for product, i.e. the 5' UNTRANSLATED REGIONS and 3' UNTRANSLATED REGIONS. MeSH, 1999

UnTranslated Region: Critical for many aspects of gene regulation and expression. Narrower terms: 3' UTR, 5' UTR.
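Splitting a mature mRNA into its 5' UTR, coding region and 3' UTR can be sketched in Python. This relies on a strong simplifying assumption: the coding sequence starts at the first AUG and ends at the first in-frame stop codon. Real transcript annotation uses much more evidence than this. The stop codon is counted as part of the coding region, so the 3' UTR is the sequence following it. The function name `split_utrs` is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_utrs(mrna: str) -> tuple[str, str, str]:
    """Split a mature mRNA into (5' UTR, coding sequence, 3' UTR).

    Assumes the CDS starts at the first AUG and ends at the first in-frame
    stop codon (included in the CDS).
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon after the start codon")


utr5, cds, utr3 = split_utrs("GCCACC" + "AUGGCGUAA" + "CUGAAAAA")
print(utr5, cds, utr3)  # GCCACC AUGGCGUAA CUGAAAAA
```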

upstream: Identifies sequences located in a direction opposite to that of expression; for example, the bacterial promoter is upstream of the initiation codon. In an mRNA molecule, upstream means toward the 5' end of the molecule. Occasionally used to refer to a region of a polypeptide chain which is located toward the amino terminus of the molecule. Lemon
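On genomic coordinates, "upstream" depends on which strand is expressed. Python can illustrate this. The sketch assumes coordinates increase left to right along the '+' strand, as in genome browsers. Upstream, toward the 5' end of the expressed strand, then means smaller coordinates for '+' features and larger ones for '-' features. The function name `is_upstream` is hypothetical.

```python
def is_upstream(position: int, reference: int, strand: str) -> bool:
    """True if `position` lies upstream of `reference` on the given strand.

    Assumes coordinates increase along the '+' strand, so the 5' (upstream)
    direction is toward smaller coordinates on '+' and larger ones on '-'.
    """
    if strand == "+":
        return position < reference
    if strand == "-":
        return position > reference
    raise ValueError("strand must be '+' or '-'")


# A promoter at 950 lies upstream of a start codon at 1000 on a '+' strand gene.
print(is_upstream(950, 1000, "+"))   # True
# On a '-' strand gene, upstream of position 1000 means higher coordinates.
print(is_upstream(1050, 1000, "-"))  # True
```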

Sequences DNA resources
DDBJ/ EMBL/ GenBank Feature Table, 2017 http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
Ensembl Glossary https://www.ensembl.org/info/website/glossary.html
Mouse Genome Informatics Glossary, Jackson Lab, US, 2006 http://www.informatics.jax.org/mgihome/other/glossary.shtml

How to look for other unfamiliar terms

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.
