
Bioinformatics for pharmaceuticals glossary & taxonomy
Evolving terminology for emerging technologies

Suggestions? Comments? Questions? Mary Chitty mchitty@healthtech.com

Last revised November 16, 2007

Related glossaries include 
Applications: Drug Discovery & Development, Functional genomics
Informatics: Algorithms & data analysis, Chemoinformatics, Computers & computing, Information management & interpretation, Databases & software directory, In silico & molecular Modeling
Technologies: Sequencing
Biology: DNA, Expression, Proteins, Sequences, DNA & beyond

annotated databases:  Databases may contain a combination of amino acid sequences, comments, literature references and notes on known post- translational modifications to the sequence. A database that contains all of these elements is referred to as "annotated". Other databases only contain the sequence, an accession number and a descriptive title. Annotation of each entry is obviously very time- consuming and difficult to maintain without errors. Therefore annotated databases usually have many fewer sequence entries than non- annotated ones. Annotation also implies that some functional or structural information is known about the mature protein, as opposed to a sequence that is known only from the translation of a stretch of nucleotide sequence. Even the best annotated databases now include large numbers of entries that have very little real information about the mature protein other than some reference to who sequenced and translated the nucleotide sequence. Annotated databases are technically superior for many purposes, because they contain information about the true form of the mature protein. [Biopolymer Markup Language — BIOML Working Draft Proposal, 1999] http://www.rdcormia.com/COIN79/b_chpt1.htm

annotation: The annotation process identifies sequence features on the contigs - such as variation, sequence tagged sites, FISH mapped clone regions, known and predicted genes, and gene models. This stage provides contig, mRNA, and protein records with added feature annotation. [NCBI Contig Assembly and Annotation Process, 2001]  http://www.ncbi.nlm.nih.gov/genome/guide/build.html#contig 

The value of a genome is only as good as its annotation. At the Sanger Institute, we are providing high quality manual curation in addition to automated prediction provided by Ensembl. Finished genomic sequence is analysed on a clone by clone basis using a combination of similarity searches against DNA and protein databases as well as a series of ab initio gene predictions. Manual Curation of the Human Genome, Wellcome Trust, Sanger Institute, 2003 http://www.sanger.ac.uk/HGP/havana/ 

Each fragment of DNA contains unique features. A DNA fragment may encode a portion of a gene or a gene control sequence, or the fragment may be a portion of a genome that has no apparent function. Bioinformaticists perform detailed analysis of DNA fragments, comparing new DNA sequence with previously annotated DNA sequences, identifying common characteristics, and assigning known or putative functions to the DNA sequence. Cross species DNA sequence comparison is quite common and can reveal common genes shared between organisms. A bioinformatic study may also require peptide to peptide comparisons, allowing common structural features of proteins to define the function of a DNA fragment encoding a specific protein or enzyme. [CHI High Throughput Genomics report, 2001]

The elucidation and description of biologically relevant features in the sequence is essential in order for genome data to be useful. The quality with which annotation is done will have direct impact on the value of the sequence. At a minimum, the data must be annotated to indicate the existence of gene coding regions and control regions. Further annotation activities that add value to a genome include finding simple and complex repeats, characterizing the organization of promoters and gene families, the distribution of G + C content, and tying together evidence for functional motifs and homologs. [Lawrence Berkeley Lab, US "Advanced Computational Structural Genomics"] 
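As an illustration of one of the simpler annotation activities named above, the distribution of G + C content, here is a minimal Python sketch. The function names, the window and step parameters, and the choice to ignore ambiguous bases such as N are assumptions made for this example, not part of the Lawrence Berkeley Lab definition.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Ambiguous symbols such as N are ignored (an assumption of this sketch).
    return gc / total if total else 0.0


def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for every full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

For example, gc_windows("GGGGAAAA", window=4, step=4) returns [(0, 1.0), (4, 0.0)]. Real annotation pipelines would also have to handle strand, masking and very long sequences, none of which is attempted here.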

Explanatory notes, comments, analysis and commentaries added to a database. May refer to sequence data or protein structures and includes predictions, characterizations, summaries, and other detailed information, including gene function. Annotation can be manual (as in SWISS- PROT) or automated (as in TrEMBL).  Since annotation is highly skilled and labor intensive, efforts are being made to automate the process, at least for preliminary data. 

Related terms: annotated databases, curated databases, comparative genome annotation,  distributed annotation system, genome annotation; SNPs & genetic variations glossary Genetic Annotation Initiative 

Narrower terms: baseline annotation, computational annotation, distributed sequence annotation; Proteomics: annotation - proteins; 

annotation- proteins: Proteomics glossary

BISTI Consortium: Established in May 2000 to serve as the focus of biomedical computing issues at the NIH and to facilitate implementation of the BISTI recommendations. The Consortium is composed of senior-level representatives from the NIH centers and institutes and representatives of other Federal agencies concerned with bioinformatics and computational applications. The mission of the BISTI Consortium is to make optimal use of computer science and technology to address problems in biology and medicine by fostering new basic understandings, collaborations, and transdisciplinary initiatives between the computational and biomedical sciences. http://www.bisti.nih.gov/bistic2.cfm

bacterial bioinformatics: Antibiotic resistance amongst virulent species is on the increase, causing major concern worldwide. ...  It is evident that some current therapies are no longer effective, and accordingly, novel antimicrobials will need to be developed. To help counteract these problems, advances in technology can be used to hasten the hunt for new drug and vaccine targets. One obvious advantage of using computer- based screening techniques (bioinformatics) to scan newly sequenced pathogen genomes is the speed at which identification of novel targets can be carried out. "Bacteriology for the Bioinformatician", Edward Jenner Institute for Vaccine Research, UK http://www.jenner.ac.uk/BacBix3/BACforBIX.htm

Bacterial Bioinformatics http://www.jenner.ac.uk/BacBix3/Welcomehomepage.htm

baseline annotation: As good as possible, computational only annotation. [Project Ensembl, Wellcome Trust, Sanger Institute, EBI, UK, 2001] http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/ScienceDocumentation.html

BioConductor:  An open source and open development software project to provide tools for the analysis and comprehension of genomic data (bioinformatics). http://www.bioconductor.org/ 

biocorba.org: Provides an object- oriented, language neutral, platform independent method for describing and solving bioinformatic problems. BioCORBA's mission is to leverage the code of the other Bio projects in a simple and easy to use fashion. For example language neutral environment allows users to write programs using BioPython and access BioPerl modules through the CORBA server. http://www.biocorba.org/

bioinformatics:  Roughly, bioinformatics describes any use of computers to handle biological information. In practice the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology" - the use of computers to characterise the molecular components of living things. [Damian Counsell, bioinformatics.org FAQ] http://bioinformatics.org/faq/#whatIsBioinformatics

See above bioinformatics.org FAQ for tight and loose definitions of bioinformatics, and information on how long the term has been used. 

The definition of bioinformatics is not universally agreed upon. Generally speaking, we define it as the creation and development of advanced information and computational technologies for problems in biology, most commonly molecular biology (but increasingly in other areas of biology). As such, it deals with methods for storing, retrieving and analyzing biological data, such as nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways and genetic interactions.  Some people construe bioinformatics more narrowly, and include only those issues dealing with the management of genome project sequencing data. Others construe bioinformatics more broadly and include all areas of computational biology, including population modeling and numerical simulations.   Biomedical informatics is a slightly broader umbrella that includes not only bioinformatics, but other areas of informatics in biology, medicine and health-care.  They are closely related.  Russ Altman, "Guide to informatics at Stanford University", 2006  http://www-helix.stanford.edu/people/altman/bioinformatics.html 

Research, development or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Biomedical Information Science and Technology Initiative BISTI Bioinformatics at the NIH, 2000  http://www.bisti.nih.gov/ 

The earliest Medline reference I've found to bioinformatics is William Bain's "Bioinformatics in Europe - the federation strikes back" in Trends in Biotechnology 11(6): 217- 218 June 1993.

We have coined the term Bioinformatics for the study of informatic processes in biotic systems. Our Bioinformatic approach typically involves spatial, multi- leveled models with many interacting entities whose behavior is determined by local information. [Theoretical Biology Group, Univ. of Utrecht, Netherlands, Paulien Hogeweg Director]  http://www-binf.bio.uu.nl/

Original definition was “the study of informatic processes in biotic systems” Paulien Hogeweg MIRROR beyond MIRROR, puddles of LIFE, in Artificial Life, ed. C.G. Langton, Addison Wesley, 297-316, 1988 [Nick Saville's homepage, Theoretical Biology and Bioinformatics, Utrecht Univ., Netherlands, 1997]  

Narrower terms: bacterial bioinformatics, comparative bioinformatics, functional bioinformatics, glycobioinformatics, medical bioinformatics, molecular bioinformatics, pharmaceutical bioinformatics, protein bioinformatics; Structural genomics structural bioinformatics;  Related terms: European Bioinformatics Institute EBI, Open Bioinformatics Foundation; Algorithms glossary  data mining

bioinformatics visualization:  Special issue of Information Visualization, vol. 4 no. 3, Sept. 2005, guest editors Chris North & Theresa-Marie Rhyne http://people.cs.vt.edu/~north/BioVisCFP.html  

biojava.org: An open-source project dedicated to providing Java tools for processing biological data. This will include objects for manipulating sequences, file parsers, CORBA interoperability, access to ACeDB, dynamic programming, and simple statistical routines.  The BioJava library is useful for automating those daily and mundane bioinformatics tasks. http://www.biojava.org/

BioLisp.org:   A public resource supporting scientists who use Lisp to develop intelligent applications in the biological sciences. 

biological computing: Computers & computing glossary

biological databases: Biological databases have inherent complications stemming from the nature of the information they contain and the dependence of computational methods on these data. Most biological data are not digital, making machine- readability of the data (for automated data- mining) impossible. In addition, the lack of standardized nomenclature and ontology, the use of protein aliases (leading to ambiguity), the lack of interoperability across databases, and the presence of errors in database annotations have hindered and complicated the use of computational methods. Defining the Mandate of Proteomics in the Post- Genomics Era, Board on International Scientific Organizations, National Academy of Sciences, 2002  http://www.nap.edu/books/NI000479/html/R1.html

biomedical computing: Information interpretation glossary

bioMOBY: An international group of biological data hosts, biological data service providers, and coders whose aim is to set standards for biological data representation, distribution, and discovery. http://biomoby.org/

BIONLP.org: Natural language processing of biology text. [Bob Futrelle, Computer Science, Northeastern Univ., US, 2002] http://www.ccs.neu.edu/home/futrelle/bionlp/

BioPax:  Biological Pathways Exchange.  A collaborative effort to create a data exchange format for biological pathway data. http://www.biopax.org/ 

Related terms: metabolic pathways

bioperl.org: An international association of developers of open source Perl tools for bioinformatics, genomics and life science research. We work closely with our friends and colleagues at biojava.org, biopython.org and bioxml.org. The Bioperl server provides an online resource for modules, scripts, and web links for developers of  Perl- based software for life science research. http://bio.perl.org/

biopython.org: An international association of developers of freely available Python tools for computational molecular biology. biopython.org provides an online resource for modules, scripts, and web links for developers of Python- based software for life science research. http://www.biopython.org/

biosemiotics: http://www.gypsymoth.ento.vt.edu/~sharov/biosem/biosem.html#topics

BioWidget Consortium: Computational Biology & Informatics Lab, Univ. of Pennsylvania, US.  The bioWidgets toolkit is a collection of Java Beans (used for development of graphics applications and/or applets in the genomics domain).  http://www.cbil.upenn.edu/bioWidgets/

bioxml.org: This site was created to be a center for development for open source biological DTDs.  http://www.xml.com/pub/r/1118 

CORBA: Computers & computing glossary

cellular bioinformatics:   The less developed branch of bioinformatics that focuses on the understanding of the functioning living cell. As such it has to integrate DNA, mRNA, protein and metabolic data. Because of the complexity of the problem, it also needs to invoke mathematical modeling. ... The branch of cellular bioinformatics that focuses on understanding on the basis of all the known experimental data is also called computational biochemistry. Hans Westerhoff, Vrije Universiteit, Netherlands http://www.bio.vu.nl/hwconf/papers/cellbioinf.html  

comparative bioinformatics: The genome sequences from several chordates are being completed; the bioinformatics largely exists in the research community to discover the protein-coding potential of those genomes. However, the bioinformatics to elucidate gene regulation encoded in genomes and gene regulatory networks is not so developed. New bioinformatics, new model organism resources, new experimental approaches, and new collaborations are needed if the community is to understand the gene networks that help create phenotypes of interest. A research team at ORNL and the University of Tennessee is developing some needed bioinformatics. The overall projects include 1) supplying several web services and collaborative bioinformatics that support large consortia of experimental researchers and 2) developing comparative bioinformatics and new data mining environments that can ultimately help understand the nature and evolution of gene regulatory networks. J. Snoddy et al., Univ. of Tennessee, ORNL, International Mammalian Genome 17 Nov. 2002  http://imgs.org/abstracts/2002abstracts/file192.htm

comparative genome annotation:  The major immediate interests of the genome projects are in the identification of protein coding regions. However, a complete description of gene structure necessitates identification of the associated sites which signal the different processes in the gene to protein pathway. Such sites include promoters, transcription start and end points, poly-adenylation sites, splice sites, and translation start and stop sites. In addition, regulatory regions form an important functional component of gene structure. Indeed, gene regulation may utilise alternatives in promoters, splice sites and translation start sites. Accurate identification of coding regions is aided by the identification of such sites, and vice versa. Identification of regulatory sites is more accurate when they are viewed in the context of other surrounding elements.  ["Briefings in Bioinformatics" special issue, proceedings from the symposium on "Genome Based Gene Structure Determination" conducted at the EMBL European Bioinformatics Institute (EBI) during June 1-2, 2000] http://industry.ebi.ac.uk/~thanaraj/BIB_Editorial.htm  

Broader term: genome annotation Related term: Functional genomics comparative genomics
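To make the idea of translation start and stop sites concrete, the following Python sketch scans the forward strand for open reading frames that begin with ATG and end at an in-frame stop codon. It is a deliberately naive illustration: the function name and the min_codons parameter are hypothetical, and it ignores the reverse strand, splice sites, promoters and every other signal the definition above lists.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) spans of forward-strand ORFs.

    start is the index of the A in ATG; end is exclusive and includes the
    stop codon. min_codons counts codons before the stop, including ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume scanning after the stop
    return sorted(orfs)
```

For example, find_orfs("CCATGAAATAGGG") returns [(2, 11)], the span of ATGAAATAG. Accurate gene structure determination needs far more evidence than this, which is the point the definition makes about viewing sites in the context of surrounding elements.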

computational annotation: The workshop began with a series of presentations on computational annotation and experimental approaches to biological confirmation of functional elements in the genomes of both model organisms and the human. Subsequent to those discussions, NHGRI outlined its proposal for a pilot project to exhaustively determine all functional elements in a small fraction (~1 percent) of the human genome. Initial Inventory of Functional Elements to Identify: The participants recommended that both protein- coding genes and non- protein- coding genes need to be identified. For each of these, the complete (full- length) coding sequence and all variants, as well as the transcriptional regulatory elements (e.g., promoters and enhancers) and post- transcriptional regulatory elements (e.g. cis- acting RNA elements) should be described. All pseudogenes should be identified. A number of global sequence features, such as sites of methylation, sequence variation, evolutionary history of sequence blocks and repetitive elements were suggested for inclusion, as were a number of chromosomal elements, such as origins of replication, nuclease hypersensitive sites, matrix attachment sites and histone modifications. Workshop on the Comprehensive Extraction of Biological Information from Genomic Sequence, Bethesda, Md. July 23-24, 2002, http://www.genome.gov/10005568

computational annotation technologies: Several ‘wet bench’ technologies and resources were discussed. These included DNA array studies, RT-PCR/ cDNAs, in situ hybridization, chromatin immunoprecipitation, RNAi, knockout mice, and antibody analysis of protein function. A broad range of computational approaches were also considered to be critical for inclusion. These included both comparative sequence analysis of multiple genomic sequences to identify conserved elements and automated prediction of functional elements, including coding sequences, promoters, alternative splice variants and other highly conserved regions. The importance of ensuring close collaboration between experimental and computational approaches was stressed. Workshop on the Comprehensive Extraction of Biological Information from Genomic Sequence, Bethesda, Md. July 23-24, 2002, http://www.genome.gov/10005568

computational biology: The development and application of data - analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, behavioral, and social systems. Biomedical Information Science and Technology Initiative BISTI Bioinformatics at the NIH, 2000  http://www.bisti.nih.gov/ 

I find that people use "computational biology" when discussing that subset of bioinformatics (in the broadest sense) closest to the field of classical general biology.  Computational biologists interest themselves more with evolutionary, population and theoretical biology rather than cell and molecular biomedicine. It is inevitable that molecular biology is profoundly important in computational biology, but it is certainly not what computational biology is all about (see next paragraph). In these areas of computational biology it seems that computational biologists have tended to prefer statistical models for biological phenomena over physico- chemical ones. This is often wise... 

One computational biologist (Paul J Schulte) did object to the above and makes the entirely valid point that this definition derives from a popular use of the term, rather than a correct one. Paul works on water flow in plant cells and points out that biological fluid dynamics is a field of computational biology in itself - and this, like any application of computing to biology, can be described as computational biology... Where we disagree, perhaps, is in his conclusion from this - which I reproduce in full: "Computational biology is not a "field", but an "approach" involving the use of computers to study biological processes and hence it is an area as diverse as biology itself." 

Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute, expressed an interesting opinion on this distinction in an interview:  "I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology- related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information."  [Damian Counsell, bioinformatics.org FAQ, 2001] https://bioinformatics.org/faq/#definitionOfCompbiol

A field of biology concerned with the development of techniques for the collection and manipulation of  biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories applicable to molecular biology and areas of computer- based techniques for solving biological problems including manipulation of models and datasets.  MeSH, 1997

Computational biology FAQ, Robert D. Phair, US, 2000 http://www.bioinformaticsservices.com/bis/resources/faq/faq.html

conceptual biology:  As we see it, is not a distinct type of science, but rather it has a different source: the information in databases... By logical, critical analysis of existing facts and models, one can generate a hypothesis in which predictions are formulated in testable terms, and then search for relevant information among published reports of experiments that may have had a different purpose altogether. MG Blagosklonny and AB Pardee, Unearthing the gems: Conceptual Biology, Nature 416 (6879): 373, 28 March 2002

The iterative process of analysing existing facts and models available in published literature to generate new hypotheses. Julie C. Barnes, Conceptual biology: a semantic issue and more, Nature 417(6889): 587-588, 6 June 2002

Related terms: Research glossary meta-analyses, meta- analysis

controlled vocabulary: Information management & interpretation glossary 

curated databases: Often less complete than primary databases, but they have less redundancy and the added value of scientific annotation; therefore, a biologically significant sequence should be easier to find in such a database and of greater value. Naturally, the degree of redundancy and annotation in such a database depends on the experience, skills, aims, and devotion of its curators.  ...  The only proper way to curate databases is the way groups like those that developed OMIM [Online Mendelian Inheritance in Man], SWISS- PROT and most commercial databases have done it — that is, through making scientific judgments as data are cleaned up and merged. [CHI Bioinformatics report]

Under the supervision of a curator. Other curated databases include LocusLink, RefSeq, and SGD (Saccharomyces cerevisiae Genome Database).
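The reduced redundancy that the definition above attributes to curated databases can be pictured with a small Python sketch that collapses entries sharing an identical sequence and pools their notes. The Entry class, its fields and the exact-match rule are all assumptions for illustration; real curation relies on scientific judgment as data are cleaned up and merged, which no exact-match rule captures.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    accession: str
    sequence: str
    notes: list[str] = field(default_factory=list)


def merge_redundant(entries: list[Entry]) -> list[Entry]:
    """Collapse entries with identical sequences (case-insensitive).

    The first accession seen is kept; notes from later duplicates are
    appended once each, preserving their original order.
    """
    merged: dict[str, Entry] = {}
    for entry in entries:
        key = entry.sequence.upper()
        if key not in merged:
            merged[key] = Entry(entry.accession, key, list(entry.notes))
        else:
            for note in entry.notes:
                if note not in merged[key].notes:
                    merged[key].notes.append(note)
    return list(merged.values())
```

Given three entries of which two carry the same sequence, merge_redundant returns two entries, and the surviving duplicate holds the combined notes of both.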

data mining: Algorithms & data analysis glossary

databases: Collections of data in machine- readable form, which can be manipulated by software to appear in varying arrangements and subsets. [CHI Bioinformatics report] 

Genetic information is stored in different ways in different databases, which makes it hard to compare their holdings. So while computational biologists are trying to improve the quality of the databases, they are also working to build bridges between them.  So far, they have had only limited success … each database has its own Web site with unique navigation tools and data storage formats that make such searching difficult … programs can’t easily recognize data that are not stored in a uniform way. [Elizabeth Pennisi “Seeking Common language in a Tower of Babel” Science: 449 Oct. 15 1999]   

How can the databases be made most useful? Science Functional Genomics Weblog, 2004 http://sciencemag.blogs.com/sfgblog/2004/10/how_can_the_dat.html 
How do we fund the databases? Science Functional Genomics Weblog, 2004 http://sciencemag.blogs.com/sfgblog/2004/10/how_do_we_fund_.html 

Databases & software directory describes and provides links to around 200 databases and about 30 software tools. 

Narrower terms: annotated databases, curated databases, federated databases, integrated databases, interoperability, non- redundant databases, proprietary databases, redundant databases, relational databases, flat files, indexed flat files.

distributed sequence annotation: The pace of human genomic sequencing has outstripped the ability of sequencing centers to annotate and understand the sequence prior to submitting it to the archival databases. Multiple third-party groups have stepped into the breach and are currently annotating the human sequence with a combination of computational and experimental methods. Their analytic tools, data models, and visualization methods are diverse, and it is self-evident that this diversity enhances, rather than diminishes, the value of their work.  Lincoln Stein et al., Distributed Sequence Annotation, 2000 http://biodas.org/documents/rationale.html

distributed annotation system:  A client- server system in which a single client integrates information from multiple servers. It allows a single machine to gather up genome annotation information from multiple distant web sites, collate the information, and display it to the user in a single view. Little coordination is needed among the various information providers. [Biodas.org] http://biodas.org/
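The client-side collation described above can be sketched in Python as follows. This is not the DAS protocol itself: the Feature class, make_server and collate are hypothetical names, and each "server" is an in-memory stand-in for a remote annotation source so that the example runs without a network.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Feature:
    source: str
    start: int
    end: int
    label: str


# A "server" is any callable that returns features overlapping a region.
AnnotationServer = Callable[[str, int, int], list[Feature]]


def make_server(name: str, records: list[tuple[str, int, int, str]]) -> AnnotationServer:
    """Build an in-memory stand-in for one independent annotation provider."""
    def fetch(region: str, start: int, end: int) -> list[Feature]:
        return [
            Feature(name, s, e, label)
            for (r, s, e, label) in records
            if r == region and s <= end and e >= start  # interval overlap
        ]
    return fetch


def collate(servers: list[AnnotationServer], region: str, start: int, end: int) -> list[Feature]:
    """Gather features from every server and order them for a single view."""
    features: list[Feature] = []
    for fetch in servers:
        features.extend(fetch(region, start, end))
    return sorted(features, key=lambda f: (f.start, f.end, f.source))
```

The providers need no knowledge of one another; the client only assumes that they share a coordinate system, which matches the definition's remark that little coordination is needed among the information providers.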

EBI: European Bioinformatics Institute, Hinxton, Cambridge, UK. An EMBL outstation.  http://www.ebi.ac.uk/

Ensembl: A joint project between EMBL- EBI and the Sanger Centre (UK) to develop a software system which produces and maintains automatic annotation on eukaryotic genomes. Human data are available now; they hope to add mouse data soon.  http://www.ensembl.org/index.html

federated databases: An integrated repository of data from multiple, possibly heterogeneous, data sources presented with consistent and coherent semantics. They do not usually contain any summary data, and all of the data resides only at the data source (i.e. no local storage).   [Lawrence Berkeley Lab "Advanced Computational Structural Genomics" Glossary]

Related term: Information management & interpretation glossary semantic data integration
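A minimal Python sketch of the federated idea: heterogeneous sources are mapped to one common schema at query time, and the integrating layer keeps no copy of the data. The FederatedView class, the Record schema and the field names used in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, str]  # hypothetical common schema: {"gene": ..., "info": ...}


class FederatedView:
    """Query-time integration layer; it stores adapters, never the data."""

    def __init__(self) -> None:
        self._sources: list[tuple[Callable[[], Iterable[dict]], Callable[[dict], Record]]] = []

    def register(self, fetch: Callable[[], Iterable[dict]],
                 to_common: Callable[[dict], Record]) -> None:
        """Add a source: how to read it, and how to map a row to Record."""
        self._sources.append((fetch, to_common))

    def query(self, gene: str) -> Iterator[Record]:
        """Ask every source afresh and yield matches in the common schema."""
        wanted = gene.upper()
        for fetch, to_common in self._sources:
            for raw in fetch():
                record = to_common(raw)
                if record["gene"].upper() == wanted:
                    yield record
```

A source whose rows look like {"symbol": "TP53", "desc": "..."} and another whose rows look like {"GeneName": "tp53", "Organism": "..."} can each be registered with a small mapping function. Because query reads the sources each time, a row added at a source shows up in the next query without any synchronization step.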

federated information systems: Their main characteristic is that they are constructed as an integrating layer over existing legacy applications and databases. They can be broadly classified in three dimensions: the degree of autonomy they allow in integrated components, the degree of heterogeneity between components they can cope with, and whether or not they support distribution. Whereas the communication and interoperation problem has come into a stage of applicable solutions over the past decade, semantic data integration has not become similarly clear. Susanne Busse et al., "Federated Information Systems: Concepts, Terminology and Architecture",  Computergestützte Informations Systeme CIS, Berlin, Germany 1999 http://citeseer.ist.psu.edu/busse99federated.html 

flat files: Pure text documents that are totally unstructured. This type of file generally does not provide very specific search answers, but it is the most popular type of file on the Web and is now a bit easier to search, thanks to the use of hyperlinks.

Narrower term: indexed flat files. Related term: relational databases

functional bioinformatics:  The emerging field of functional bioinformatics focuses on the development of ontologies or concept classifications fed into algorithms used to perform computations of the functions of biomolecules. ["About bioinformatics" George Washington Univ. Medical Center, 2002] http://www.gwumc.edu/bioinformatics/about/bioinfo.htm

An emerging subfield of bioinformatics that is concerned with ontologies and algorithms for computing with biological function. Functional bioinformatics is the computational counterpart of functional genomics ...  is concerned with managing and analyzing functional genomics data, such as gene expression experiments and large- scale knock- out experiments. .. emphasizes large- scale computational problems, such as problems involving complete metabolic networks and genetic networks.  [Peter D. Karp "An ontology for biological function based on molecular interactions" Bioinformatics Ontology 16 (3): 269- 285, 2000] 

Related terms: Functional genomics glossary, Metabolic Engineering glossary

functional informatics: 

Gene Ontology™ (GO): Functional genomics glossary. Broader term: Information management & interpretation glossary ontology.

genome annotation: It is now apparent that the bottleneck in genomics is no longer in sequencing the genomes, but lies in their annotation. Large- scale annotation efforts require handling massive amounts of genome data through automated pipelines, with a need to combine diverse sources of data and methods. In addition, it requires visualisation tools to manually examine the automatic annotation, since integration of human expertise to assess the validity and authenticity of all computational results goes a long way to improve the quality of gene annotation. The "Annotation Jamboree", a collaboration between Celera, the Berkeley Drosophila Genome Project, and a team of experts on the annotation of the Adh region of Drosophila, is an exemplary attempt on how to transform the process of manual annotation into a high- throughput operation. [Paradigm Shifts in the Approaches for Gene Annotation, a special issue of "Briefings in Bioinformatics" which reports on the proceedings from the recently concluded symposium on "Genome Based Gene Structure Determination" conducted at the EMBL European Bioinformatics Institute (EBI) during June 1- 2, 2000.]  http://industry.ebi.ac.uk/~thanaraj/BIB_Editorial.htm  

Narrower term: comparative genome annotation 

Genome Annotation Data Warehouse: Databases & software directory

genome browser, genomic data: Genomics glossary

glycobioinformatics, glycoinformatics: Glycosciences glossary

high- throughput bioinformatics: Bioinformatics is currently undergoing dramatic changes, as high- throughput laboratory methods lead to changes in key approaches, including sequence analysis, gene expression analysis, protein expression analysis, and protein structure prediction and modeling. [CHI Bioinformatics report press release] 

Related terms: Assays & screening glossary throughput; Functional genomics glossary; systems biology; Structural genomics glossary structural proteomics

I2B2 Informatics for Integrating Biology & the Bedside:  An NIH- funded National Center for Biomedical Computing based at Partners HealthCare System. [Boston]  http://www.i2b2.org/ 

indexed flat files IFFs: Partially structured databases, which may include a thesaurus (adding the ability to search synonyms) or other basic search tools. ... IFFs, meanwhile, allow users to interactively navigate among entries in several different databases by means of hypertext links. IFFs do not, however, allow true database integration, and gathering information from these types of files is often haphazard: Because the data are not really structured, researchers may end up with many incorrect matches to their queries. The principal advantage of this technology is that it is cheap and easy to understand. [CHI Bioinformatics report]
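The thesaurus-assisted searching described above can be illustrated with a minimal sketch. The entries, the thesaurus and the function names (`build_index`, `search`) are all hypothetical, invented here for illustration; real indexed flat file systems are considerably more elaborate.

```python
from collections import defaultdict

# Hypothetical flat-file entries: one free-text record per entry ID.
ENTRIES = {
    "E1": "tumor protein p53 regulates the cell cycle",
    "E2": "TP53 mutations are frequent in human cancers",
    "E3": "insulin receptor signalling in muscle",
}

# Hypothetical thesaurus: maps a search term to its known synonyms.
THESAURUS = {"p53": {"tp53"}, "tp53": {"p53"}}


def build_index(entries):
    """Build a word -> set of entry IDs index over the free text."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in text.lower().split():
            index[word].add(entry_id)
    return index


def search(index, term, thesaurus=THESAURUS):
    """Return entry IDs matching the term or any of its synonyms."""
    term = term.lower()
    terms = {term} | thesaurus.get(term, set())
    hits = set()
    for t in terms:
        hits |= index.get(t, set())
    return sorted(hits)


INDEX = build_index(ENTRIES)
print(search(INDEX, "p53"))
```

Because matching is done on bare words rather than on structured fields, a search for a common word returns every entry containing it, relevant or not, which is the haphazard behaviour the definition warns about.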

integrated databases: Integration [of databases] typically is accomplished by creating small, object- oriented software elements, or “wrappers” that let a single overlaying, often browser like, desktop application interact with all the pieces.  The original separate systems are intact and functional, and new ones can be added, while the underlying complexity is transparent to users. There are still many challenges … but computing environments are becoming more unified, flexible and expandable. [A. Thayer “Bioinformatics for the Masses” Chemical & Engineering News 78(6): 19-32 Feb. 7, 2000] 
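As a rough illustration of the wrapper idea, the sketch below places two made-up data sources behind a common `lookup` method so that a single overlaying application can query both. Every class name, method and record here is hypothetical and is not drawn from any of the systems cited above.

```python
class SequenceStore:
    """Hypothetical legacy source: records keyed by accession."""

    def __init__(self):
        self._records = {"ACC001": {"gene": "geneA", "length": 350}}

    def accessions(self):
        return list(self._records)

    def fetch_by_accession(self, accession):
        return self._records[accession]


class ExpressionStore:
    """Hypothetical legacy source: (gene, tissue, level) tuples."""

    def __init__(self):
        self._rows = [("geneA", "liver", 7.2), ("geneB", "brain", 3.1)]

    def rows_for(self, gene):
        return [row for row in self._rows if row[0] == gene]


class SequenceWrapper:
    """Adapts SequenceStore to the common lookup(gene) interface."""
    name = "sequence"

    def __init__(self, store):
        self.store = store

    def lookup(self, gene):
        hits = []
        for accession in self.store.accessions():
            record = self.store.fetch_by_accession(accession)
            if record["gene"] == gene:
                hits.append({"accession": accession, **record})
        return hits


class ExpressionWrapper:
    """Adapts ExpressionStore to the common lookup(gene) interface."""
    name = "expression"

    def __init__(self, store):
        self.store = store

    def lookup(self, gene):
        return [{"gene": g, "tissue": t, "level": lv}
                for g, t, lv in self.store.rows_for(gene)]


class IntegratedView:
    """Single entry point; the underlying stores are left untouched."""

    def __init__(self, wrappers=()):
        self.wrappers = list(wrappers)

    def add_wrapper(self, wrapper):
        # New sources can be added without changing existing ones.
        self.wrappers.append(wrapper)

    def query(self, gene):
        return {w.name: w.lookup(gene) for w in self.wrappers}


view = IntegratedView([SequenceWrapper(SequenceStore()),
                       ExpressionWrapper(ExpressionStore())])
print(view.query("geneA"))
```

The original stores keep their own interfaces; only the thin wrapper classes know how to translate between each store and the shared query.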

Integration of information in OMIM [Online Mendelian Inheritance in Man] and the published working draft of the International Human Genome Sequencing Consortium (Nature, 15 Feb. 2001) has been facilitated by ties to NCBI's RefSeq and LocusLink databases. Are there other good examples of integrated databases?

Related terms:  Bio-Ontology Standards Group, Data Model Standards Group; Functional genomics Gene Ontology  

integration:  Integration of the various types of large- scale data is currently receiving much attention. There appears, however, to be little agreement on what exactly is meant by "integration", not to mention how to achieve it. The word "integration" is being attached to almost any analysis that involves the combined use of two or more large datasets. [Lars J. Jensen, Peer Bork, "Quality analysis and integration of large- scale molecular data sets", Drug Discovery Today: Targets, 3(2): 51-56]

integration (of databases): Allows researchers to increase the value they get from the data, because it increases the base of information they can access and allows for more robust searching. [CHI Bioinformatics report]  

Related terms: Computers & computing middleware, Object Oriented modeling OOM, object protocol model OPM; Maps genomic & genetic memory mapped data structures

interoperability: Information management & interpretation glossary

Interoperable Informatics Infrastructure Consortium I3C: Is this active? still in existence? http://www.consortiuminfo.org/links/detail.php?ID=186

LSID Life Sciences Identifiers: 
Cover pages http://xml.coverpages.org/lsid.html  

medical bioinformatics: Linking clinical data to patient gene profiling. Covers haplotyping, genotyping, population genomics, gene expression profiling, particularly for use in diagnosis, prognosis and therapeutic stratification of patients.

Google = about 512, Oct. 15, 2003

Related terms: Biomarkers glossary, Expression glossary, Microarrays and protein chips glossary

memory-mapped data structures: Computers & computing glossary  

metadata: Information management & interpretation glossary 

middleware, modularity: Computers & computing glossary

molecular bioinformatics: Conceptualizing biology in terms of molecules (in the sense of physical- chemistry) and then applying "informatics" techniques (derived from disciplines such as applied math, CS [computer science] and statistics) to understand and organize the information associated with these molecules on a large- scale. [Mark Gerstein "What is Bioinformatics?" MB&B 474b3, 2001] http://bioinfo.mbb.yale.edu/what-is-it.html

NCBI National Center for Biotechnology Information: Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information - all for the better understanding of molecular processes affecting human health and disease. Part of NIH. http://www.ncbi.nlm.nih.gov

non-redundant databases: Researchers at the National Center for Biotechnology Information (NCBI) coined the term "nr" database (nonredundant database) to refer to a database in which the obviously redundant entries have been merged. These entries are typically those that are 100%, character- by- character identical, and algorithms exist that can remove such redundancy. Although such a database has less redundancy than a primary database, a substantial amount of redundancy remains, and it can be removed only by a curator using scientific judgment. [CHI Bioinformatics report]
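The merging of character-by-character identical entries can be sketched as follows. This is only an illustration under simple assumptions: the accessions and sequences are invented, and `merge_identical` is a hypothetical helper, not the procedure NCBI actually uses.

```python
def merge_identical(records):
    """Merge entries whose sequences are 100% identical.

    records: iterable of (accession, sequence) pairs.
    Returns a list of (accessions, sequence) pairs, one per distinct
    sequence, in order of first appearance.
    """
    merged = {}
    for accession, sequence in records:
        merged.setdefault(sequence, []).append(accession)
    return [(accessions, sequence) for sequence, accessions in merged.items()]


# Hypothetical entries: A1 and B7 are identical; C3 differs by one residue.
RECORDS = [
    ("A1", "MKTAYIAK"),
    ("B7", "MKTAYIAK"),
    ("C3", "MKTAYIAR"),
]

for accessions, sequence in merge_identical(RECORDS):
    print(accessions, sequence)
```

The near-duplicate C3 survives as a separate entry, which shows why exact-match merging leaves a good deal of redundancy for a curator to resolve.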

Many databases try to be “non-redundant”.  Unfortunately, biological data is too complex to fit a simple definition of redundancy … Each “non- redundant” database has its own definition of redundancy. [George Church Lab, Harvard Medical School, US]  http://arep.med.harvard.edu/seqanal/db.html   Examples of non- redundant databases include UniGene and SWISS- PROT, while DDBJ/ EMBL/ GenBank are redundant databases.

OMG Object Management Group: Computers & computing glossary

Object- oriented modeling OOM: Computers & computing glossary

ontology: Information management & interpretation glossary

Open Bioinformatics Foundation OPEN-BIO: The purpose of the foundation is to act as an umbrella organization for the various bio*.org projects that grew out of the original BioPerl project. The goal of the foundation is to provide financial, administrative and technical assistance for our various open source life science projects. http://open-bio.org/

Narrower terms: biojava.org, bioperl.org, biopython.org, bioxml.org; Related term: biocorba.org

pharmaceutical bioinformatics: Bioinformatics and structure- aided drug design are really part of the same continuum. Bioinformatics offers a means to get to a structure through sequence; while structure- aided drug design offers a means to get to a drug through structure. We plan to combine innovative computational techniques with biochemical and structural expertise to bring bioinformatics and structure- aided drug design even closer together. In particular, we intend to blend computational chemistry with computational biology to create software that will aid protein chemists in understanding, evaluating and predicting the structure, function and activity of medically and industrially important proteins. My laboratory is currently involved in three "bioinformatics" projects. These include: (1) the development of novel methods to identify remote sequence/ structure relationships; (2) the creation of a compact, relational database with advanced bioinformatics functionality; and (3) the development of novel methods for predicting and evaluating protein secondary and tertiary structure. David Wishart, Wishart Pharmaceutical Research Group, Univ. of Alberta, Canada  http://redpoll.pharmacy.ualberta.ca/projects/bioinfo.html

private databases: See under proprietary databases

proprietary databases:  Fee- based, copyrighted databases (in contrast to public databases such as those at DDBJ/ EMBL/ GenBank).  Examples include Incyte's LifeSeq and Gene Logic's GeneExpress databases.  Some databases charge subscription fees to commercial organizations, with other arrangements available to non- profits. Also referred to as private databases.

Compare: public databases

protein bioinformatics:  Bioinformatic and experimental analysis of protein superfamilies for understanding protein structure- function relationships and developing strategies for protein engineering. Using superfamily analysis to understand how protein sequence and structure determine protein function. Our computational approach begins with identifying the sets of divergently related proteins that comprise enzyme superfamilies and then attempts to correlate their conserved and variable structural features to similarities and differences in their functions.

This work also requires the development of new tools in protein bioinformatics to identify and evaluate distant relationships and to distinguish those elements of structure that provide common function from those that determine specificity. Designed to take advantage of the huge volumes of data coming out of the genome projects, this approach provides a much more contextual picture of the structure- function paradigm than can be achieved by studying a single protein at a time. This work has been successfully applied to such problems as the prediction of function for unknown reading frames and elucidation of enzyme mechanisms. Patricia Babbitt, Dept. of Biopharmaceutical Sciences, Univ. of California San Francisco, US http://www.ucsf.edu/dbps/faculty/pages/babbitt.html

Very very very short introduction to protein bioinformatics, Patricia Babbitt et al. http://baygenomics.ucsf.edu/education/workshop1/lectures/w1.color2.pdf

See also: Proteomics glossary protein informatics. Is there a difference?

Google = about 690 April 1, 2003

public databases: Freely accessible databases such as GenBank/ EMBL/ DDBJ, ArrayExpress or BLOCKS. There has been much debate about public vs. proprietary databases. 

redundant databases: When sequence databanks were first created, primary [redundant] databases had the advantage of being more comprehensive than curated databases and more likely to contain recently discovered sequences. However, redundancy is no longer much of an advantage. In a highly redundant database, biologically significant results are more likely to be hidden among large numbers of irrelevant reported matches. [CHI Bioinformatics report] 

Related term: non- redundant databases

relational databases: Most or all of the data are structured. These files are the hardest to set up and maintain, and require specific knowledge by a searcher, but they are the easiest to use when doing analysis or integration. Data is categorized by specific fields, and so, by knowing the fields one should be able to capture all the relevant data, quite easily. The searchability of a relational database is totally dependent on how well the database has been structured. [CHI Bioinformatics report]
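A small sketch of field-based searching, using Python's built-in sqlite3 module with an in-memory database. The table layout, the column names and the sample rows are assumptions made up for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one structured table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein ("
        "accession TEXT PRIMARY KEY, organism TEXT, name TEXT, length INTEGER)"
    )
    # Hypothetical rows, for illustration only.
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?, ?)",
        [
            ("X0001", "Homo sapiens", "hypothetical kinase A", 420),
            ("X0002", "Homo sapiens", "hypothetical peptide B", 58),
            ("X0003", "Mus musculus", "hypothetical kinase A", 418),
        ],
    )
    return conn


def proteins_for(conn, organism, min_length=0):
    """Query by named fields; the searcher must know the field names."""
    cursor = conn.execute(
        "SELECT accession, name FROM protein "
        "WHERE organism = ? AND length >= ? ORDER BY accession",
        (organism, min_length),
    )
    return cursor.fetchall()


conn = build_demo_db()
print(proteins_for(conn, "Homo sapiens", min_length=300))
```

Knowing that `organism` and `length` exist as fields is what makes the query precise; a poorly chosen set of fields would limit what can be asked, in line with the point that searchability depends on how well the database has been structured.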

schema: Algorithms & data analysis glossary    

spatio temporal dynamics: Local interactions in space can give rise to large scale spatio temporal patterns (e.g. (spiral) waves, spatio- temporal chaos (turbulence), stationary (Turing- type) patterns and transitions between these modes). Their occurrence and properties are largely independent of the precise interaction structure. They are indeed seen to occur at many organizational levels of biotic systems. Space can be either 'real' space or a state space, e.g. 'phenotype space' in models of speciation or 'shape space' in immunological models of shape- based receptor interactions. We show that such spatio- temporal patterns have important consequences for fundamental bioinformatic processes. Paulien Hogeweg, Overview of Research 1993- 1998, Utrecht University, Netherlands, 1999  http://www-binf.bio.uu.nl/overview/node3.html

standards: Related terms: Bio-ontology Standards Group, CORBA, Data Model Standards Group, object protocol model OPM. EBI [European Bioinformatics Institute] is also working on standards. Microarrays MAML, MGED, MIAME

structural bioinformatics: Structural genomics glossary

structured data: The complex and richly structured data from genomics can be viewed as the greatest encoding problem of all time (e.g. genome → organism). From this perspective, the sequencing of the human and other genomes can be viewed as one of the all- time great opportunities for theorists interested in information, its structure and analysis. [UCLA Bioinformatics Institute] http://www.bioinformatics.ucla.edu/index/mission.htm

Related terms:  indexed flat files, memory mapped data structures, relational databases, unstructured data

systems bioinformatics: With the completion of the Human Genome Project, the scientific community is now faced with the even greater challenge of analyzing the resulting data from this and other large-scale genome projects to better understand the networks underlying biological function. Second International Computational Systems Bioinformatics Conference To be Held August 11-14, 2003 at Stanford University, IEEE CS Bioinformatics Technical Chair via BizWire http://quickstart.clari.net/qs_se/webnews/wed/bx/Bca-ieee-cs_csb2003.RMsB_DuP.html

Google = about 1,230 Sept. 2, 2003; about 8,240 May 25, 2005

systems biology: Genetic manipulation & disruption glossary

taxonomies: Information management & interpretation glossary

translational bioinformatics: AMIA refers to translational bioinformatics as the development of storage, analytic, and interpretive methods to optimize the transformation of increasingly voluminous biomedical data, and genomic data in particular, into proactive, predictive, preventive, and participatory health. Translational bioinformatics includes research on the development of novel techniques for the integration of biological and clinical data and the evolution of clinical informatics methodology to encompass biological observations. The end product of translational bioinformatics is newly found knowledge from these integrative efforts that can be disseminated to a variety of stakeholders, including biomedical scientists, clinicians, and patients. Issues relating to database management, administration, or policy will be coordinated through the Clinical Research Informatics domain.  American Medical Informatics Association, AMIA Strategic Plan, 2007 http://www.amia.org/inside/stratplan/ 

unstructured data: Information management & interpretation glossary

wrappers: See under integrated databases

