
Bioinformatics for pharmaceuticals glossary & taxonomy
Evolving terminologies for emerging technologies

Suggestions? Comments? Questions? Mary Chitty
Last revised December 18, 2014


Related glossaries include  Drug Discovery & Development,  Functional genomics
Informatics: Algorithms & data analysis, Chemoinformatics, Clinical informatics, Drug discovery informatics, Genomic informatics, IT infrastructure, Protein informatics
Technologies  Microarrays  Sequencing  See Genomic Informatics for data analysis using these technologies, as well as gene specific informatics.  Systems biology and cellular and physiological biology informatics are in this Bioinformatics glossary. 
Biology: DNA, Expression, Proteins, Sequences, DNA & beyond


annotation:  The annotation process identifies sequence features on the contigs such as variation, sequence tagged sites, FISH-mapped clone regions, transcript alignments, known and predicted genes, and gene models. This stage provides contig, RNA, and protein records with added feature annotation. In addition, organism specific features, such as Gene Trap clones for mouse, will also be annotated. NCBI Annotation Information 2008

The value of a genome is only as good as its annotation. At the Sanger Institute, we are providing high quality manual curation in addition to automated prediction provided by Ensembl. Finished genomic sequence is analysed on a clone by clone basis using a combination of similarity searches against DNA and protein databases as well as a series of ab initio gene predictions. Manual Curation of the Human Genome, Wellcome Trust, Sanger Institute, 2003 

Each fragment of DNA contains unique features. A DNA fragment may encode a portion of a gene or a gene control sequence, or the fragment may be a portion of a genome that has no apparent function. Bioinformaticists perform detailed analysis of DNA fragments, comparing new DNA sequence with previously annotated DNA sequences, identifying common characteristics, and assigning known or putative functions to the DNA sequence. Cross-species DNA sequence comparison is quite common and can reveal common genes shared between organisms. A bioinformatic study may also require peptide-to-peptide comparisons, allowing common structural features of proteins to define the function of a DNA fragment encoding a specific protein or enzyme.

Explanatory notes, comments, analysis and commentaries added to a database. May refer to sequence data or protein structures and includes predictions, characterizations, summaries, and other detailed information, including gene function. Annotation can be manual (as in SWISS-PROT) or automated (as in TrEMBL).  Since annotation is highly skilled and labor intensive, efforts are being made to automate the process, at least for preliminary data. Related terms: annotated databases, curated databases, comparative genome annotation,  distributed annotation system, genome annotation; SNPs & genetic variations Genetic Annotation Initiative Narrower terms: baseline annotation, computational annotation, distributed sequence annotation; Proteomics: annotation - proteins

big data:

In March 2012, the Obama Administration launched a $200 million "Big Data Research and Development Initiative," which aims to improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data. The initiative will help to transform the use of big data in various sectors including scientific discovery and biomedical research. Big data in medical research is transforming research from hypothesis-driven to data-driven. Efficient analysis and interpretation of big medical data can open up new avenues to explore, new questions to ask, and new ways to answer, leading to better understanding of diseases and development of better and personalized diagnostics and therapeutics. 

The term "Big Data" is meant to capture the opportunities and address the challenges facing all biomedical researchers in releasing, accessing, managing, analyzing, and integrating datasets of diverse data types.  Such data types may include imaging, phenotypic, molecular (including –omics), clinical, behavioral, environmental, and many other types of biological and biomedical data.  They may also include data generated for other purposes (e.g., social media, search histories, and cell phone data).  The datasets are increasingly larger and more complex, and exceed the abilities of currently-used approaches to manage and analyze them.  Biomedical Big Data primarily emanate from three sources: 1) a few groups that produce very large amounts of data, usually as part of projects specifically funded to produce important resources for the research community; 2) individual investigators who produce large datasets for their own projects, which might be broadly useful to the research community; and 3) an even greater number of investigators who each produce small datasets whose value can be amplified by aggregating or integrating them with other data. Centers of Excellence for Big Data Computing in the Biomedical Sciences (U54), July 2013

BioConductor:  An open source and open development software project to provide tools for the analysis and comprehension of genomic data (bioinformatics). 

bioinformatics:
Technologies and tools that bring together relevant -omic data from multiple physical locations for analysis; virtual data integration across multiple research initiatives can be applied to any disease. Other topics include collaboration tools, biomarker research, imaging, computational models, clinically actionable variants, and gene mapping and expression.

In recent years, biologists and medical researchers have increasingly relied on computational methods to perform investigations. Bioinformatics is not only an integral part of basic life science research but also plays an important role in converting basic science results to application and/or commercial tools.  The interdisciplinary fields of Bioinformatics and Computational Biology are locked in a high stakes race with analytical instrument developers and innovators. The pace and scope of change in many fields of biomedical research rivals what we once associated only with semiconductor devices. This report explores the interlocking challenges facing instrumentation advances, computational demands and our evolving systems biology knowledge. Key challenges presented in this report include: instrumentation capable of generating terabytes of raw data daily; storage requirements for human gene sequences; need for cross platform data analysis standards; appropriateness of analysis & modeling applications; database data quality and annotation protocols. Insight Pharma Reports, Bioinformatics & Computational Biology, 2009

The Bioinformatics and Computational Biology program, which supports the National Centers for Biomedical Computing, aims to develop novel, cutting-edge software and data management tools to effectively mine the vast wealth of biomedical data generated from sophisticated modern laboratory techniques and facilitate data sharing between researchers. NIH Common Fund 

Roughly, bioinformatics describes any use of computers to handle biological information. In practice the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology" - the use of computers to characterise the molecular components of living things. [Damian Counsell, FAQ]

See above FAQ for tight and loose definitions of bioinformatics, and information on how long the term has been used. 

The definition of bioinformatics is not universally agreed upon. Generally speaking, we define it as the creation and development of advanced information and computational technologies for problems in biology, most commonly molecular biology (but increasingly in other areas of biology). As such, it deals with methods for storing, retrieving and analyzing biological data, such as nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways and genetic interactions.  Some people construe bioinformatics more narrowly, and include only those issues dealing with the management of genome project sequencing data. Others construe bioinformatics more broadly and include all areas of computational biology, including population modeling and numerical simulations.  Biomedical informatics is a slightly broader umbrella that includes not only bioinformatics, but other areas of informatics in biology, medicine and health-care. They are closely related.  Russ Altman "Guide to informatics at Stanford University",  2006

We have coined the term Bioinformatics for the study of informatic processes in biotic systems. Our Bioinformatic approach typically involves spatial, multi- leveled models with many interacting entities whose behavior is determined by local information. Theoretical Biology Group, Univ. of Utrecht, Netherlands, Paulien Hogeweg Director  

Original definition was “the study of informatic processes in biotic systems” Paulien Hogeweg MIRROR beyond MIRROR, puddles of LIFE, in Artificial Life, ed. C.G. Langton, Addison Wesley, 297-316, 1988

Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines.  Rise and Demise of Bioinformatics? Promise and Progress, Christos A. Ouzounis, PLOS Computational Biology April 2012

The earliest Medline reference I've found to bioinformatics is William Bain's "Bioinformatics in Europe - the federation strikes back" in Trends in Biotechnology 11(6): 217- 218 June 1993. 
Narrower terms: bacterial bioinformatics, comparative bioinformatics, functional bioinformatics, glycobioinformatics, medical bioinformatics, molecular bioinformatics, pharmaceutical bioinformatics, protein bioinformatics; Protein informatics structural bioinformatics;  Related terms: European Bioinformatics Institute EBI, Open Bioinformatics Foundation; Algorithms  data mining
Carole Goble, Seven Deadly Sins of Bioinformatics, 2007

BioJava: An open-source project dedicated to providing Java tools for processing biological data. This will include objects for manipulating sequences, file parsers, CORBA interoperability, access to ACeDB, dynamic programming, and simple statistical routines.  The BioJava library is useful for automating those daily and mundane bioinformatics tasks.

biological computers: 

biological databases: Biological databases have inherent complications stemming from the nature of the information they contain and the dependence of computational methods on these data. Most biological data are not digital, making machine- readability of the data (for automated data- mining) impossible. In addition, the lack of standardized nomenclature and ontology, the use of protein aliases (leading to ambiguity), the lack of interoperability across databases, and the presence of errors in database annotations have hindered and complicated the use of computational methods. Defining the Mandate of Proteomics in the Post- Genomics Era, Board on International Scientific Organizations, National Academy of Sciences, 2002

bioMOBY: An international group of biological data hosts, biological data service providers, and coders whose aim is to set standards for biological data representation, distribution, and discovery.

Natural language processing of biology text. Bob Futrelle, Computer Science, Northeastern Univ., US

BioPax:  Biological Pathways Exchange.  A collaborative effort to create a data exchange format for biological pathway data.  Related terms: metabolic pathways

Bioperl: An international association of developers of open source Perl tools for bioinformatics, genomics and life science research. We work closely with our friends and colleagues at, and The Bioperl server provides an online resource for modules, scripts, and web links for developers of Perl-based software for life science research.

Biopython: An international association of developers of freely available Python tools for computational molecular biology. provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research.


BISTI Consortium: The Biomedical Information Science and Technology Initiative is a consortium of representatives from each of the NIH institutes and centers. ...  The mission of BISTI is to make optimal use of computer science and technology to address problems in biology and medicine by fostering new basic understandings, collaborations, and transdisciplinary initiatives between the computational and biomedical sciences.

cellular bioinformatics:   The lesser developed branch of bioinformatics that focuses on the understanding of the functioning living cell. As such it has to integrate DNA, mRNA, protein and metabolic data. Because of the complexity of the problem, it also needs to invoke mathematical modeling. ... The branch of cellular bioinformatics that focuses on understanding on the basis of all the known experimental data is also called computational biochemistry. Hans Westerhoff, Vrije Universiteit, Netherlands

clinical bioinformatics: Clinical & Medical informatics

comparative bioinformatics:  The main focus of the group is the development of novel algorithms for the comparison of multiple biological sequences. Multiple comparisons have the advantage of precisely revealing evolutionary traces, thus allowing the identification of functional constraints imposed on the evolution of biological entities. Most comparisons are currently carried out on the basis of sequence similarity. Our goal is to extend this scope by allowing comparisons based on any relevant biological signal such as sequence homology, structural similarity, genomic structure, functional similarity and more generally any signal that may be identified within biological sequences. Using such heterogeneous signals serves two complementary purposes: (i) producing better models that take advantage of the signal evolutionary resilience, (ii) improving our understanding of the evolutionary processes that lead to the diversification of biological functions. Centre for Genomic Regulation, Barcelona, Spain

comparative systems biology: My research projects in comparative systems biology have four main thrusts: whole-genome functional annotation, multi-clustering of molecular profiles, cross-condition analysis of functional genomics data, and computationally-driven design of biological experiments. The research I am conducting with my life science colleagues in comparative systems biology has the goal of providing precise functional annotations to hypothetical genes in model organisms and in newly-sequenced genomes; delineating similarities and differences in cellular networks activated in different diseases; identifying core cellular pathways common to response networks for multiple stresses in various model organisms; and refining our understanding of the molecular basis of disease resistance in plant-pathogen interactions. Research interests, TM Murali, Computer Sciences, Virginia Tech,    Broader term: systems biology  

computational annotation: The workshop began with a series of presentations on computational annotation and experimental approaches to biological confirmation of functional elements in the genomes of both model organisms and the human. Subsequent to those discussions, NHGRI outlined its proposal for a pilot project to exhaustively determine all functional elements in a small fraction (~1 percent) of the human genome. Initial Inventory of Functional Elements to Identify: The participants recommended that both protein-coding genes and non-protein-coding genes need to be identified. For each of these, the complete (full-length) coding sequence and all variants, as well as the transcriptional regulatory elements (e.g., promoters and enhancers) and post-transcriptional regulatory elements (e.g., cis-acting RNA elements) should be described. All pseudogenes should be identified. A number of global sequence features, such as sites of methylation, sequence variation, evolutionary history of sequence blocks and repetitive elements were suggested for inclusion, as were a number of chromosomal elements, such as origins of replication, nuclease hypersensitive sites, matrix attachment sites and histone modifications. Workshop on the Comprehensive Extraction of Biological Information from Genomic Sequence, Bethesda, Md. July 23-24, 2002

computational annotation technologies: Several ‘wet bench’ technologies and resources were discussed. These included DNA array studies, RT-PCR/cDNAs, in situ hybridization, chromatin immunoprecipitation, RNAi, knockout mice, and antibody analysis of protein function. A broad range of computational approaches were also considered to be critical for inclusion. These included both comparative sequence analysis of multiple genomic sequences to identify conserved elements and automated prediction of functional elements, including coding sequences, promoters, alternative splice variants and other highly conserved regions. The importance of ensuring close collaboration between experimental and computational approaches was stressed. Workshop on the Comprehensive Extraction of Biological Information from Genomic Sequence, Bethesda, Md. July 23-24, 2002

computational biology: The development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, behavioral, and social systems. Biomedical Information Science and Technology Initiative BISTI Bioinformatics at the NIH, 2000

I find that people use "computational biology" when discussing that subset of bioinformatics (in the broadest sense) closest to the field of classical general biology.  Computational biologists interest themselves more with evolutionary, population and theoretical biology rather than cell and molecular biomedicine. It is inevitable that molecular biology is profoundly important in computational biology, but it is certainly not what computational biology is all about (see next paragraph). In these areas of computational biology it seems that computational biologists have tended to prefer statistical models for biological phenomena over physico-chemical ones. This is often wise...   One computational biologist (Paul J Schulte) did object to the above and makes the entirely valid point that this definition derives from a popular use of the term, rather than a correct one. Paul works on water flow in plant cells and points out that biological fluid dynamics is a field of computational biology in itself - and this, like any application of computing to biology, can be described as computational biology... Where we disagree, perhaps, is in his conclusion from this - which I reproduce in full: "Computational biology is not a "field", but an "approach" involving the use of computers to study biological processes and hence it is an area as diverse as biology itself."  Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute, expressed an interesting opinion on this distinction in an interview:  "I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information."  [Damian Counsell, FAQ, 2001]

A field of biology concerned with the development of techniques for the collection and manipulation of  biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories applicable to molecular biology and areas of computer- based techniques for solving biological problems including manipulation of models and datasets.  MeSH, 1997  Related terms: protein informatics  
Computational biology FAQ, Robert D. Phair, US, 2000

conceptual biology:  As we see it, is not a distinct type of science, but rather it has a different source: the information in databases... By logical, critical analysis of existing facts and models, one can generate a hypothesis in which predictions are formulated in testable terms, and then search for relevant information among published reports of experiments that may have had a different purpose altogether. MG Blagosklonny and AB Pardee, Unearthing the gems: Conceptual Biology, Nature 416 (6879): 373, 28 March 2002

The iterative process of analysing existing facts and models available in published literature to generate new hypotheses. Julie C. Barnes, Conceptual biology: a semantic issue and more, Nature 417(6889): 587-588, 6 June 2002  Related terms: Research  meta-analyses, meta- analysis

curated databases: Often less complete than primary databases, but they have less redundancy and the added value of scientific annotation; therefore, a biologically significant sequence should be easier to find in such a database and of greater value. Naturally, the degree of redundancy and annotation in such a database depends on the experience, skills, aims, and devotion of its curators.  ...  The only proper way to curate databases is the way groups like those that developed OMIM [Online Mendelian Inheritance in Man], SWISS-PROT and most commercial databases have done it, that is, through making scientific judgments as data are cleaned up and merged.   Under the supervision of a curator. Other curated databases include LocusLink, RefSeq, & SGD (Saccharomyces cerevisiae Genome Database)

databases: Collections of data in machine- readable form, which can be manipulated by software to appear in varying arrangements and subsets. 

Genetic information is stored in different ways in different databases, which makes it hard to compare their holdings. So while computational biologists are trying to improve the quality of the databases, they are also working to build bridges between them.  So far, they have had only limited success … each database has its own Web site with unique navigation tools and data storage formats that make such searching difficult … programs can’t easily recognize data that are not stored in a uniform way. Elizabeth Pennisi “Seeking Common language in a Tower of Babel” Science: 449 Oct. 15 1999

Databases & software describes and provides links to around 200 databases and about 30 software tools.  Narrower terms: annotated databases, curated databases, federated databases, integrated databases, interoperability, non-redundant databases, proprietary databases, redundant databases, relational databases, flat files, indexed flat files.

distance functions or similarity scores: The key issue in comparing expression profiles is deciding what it means for two profiles to be "similar." Mathematically, we need a function that takes two expression profiles and calculates a similarity score. It is sometimes easier to work with the opposite concept of distance, and people often speak of distance functions instead of similarity scores. Many similarity or distance functions are used in microarray work, and there is no consensus as to which one is best. Narrower terms: Euclidean distance, Pearson correlation
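As a hedged illustration of a function that "takes two expression profiles and calculates a similarity score", the sketch below computes a Pearson correlation and turns it into a distance as 1 minus the correlation. The function names, the sample profiles, and the 1 - r conversion are assumptions chosen for the example; the glossary does not prescribe any particular formula.

```python
import math


def pearson_similarity(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def pearson_distance(x, y):
    """One common way (an assumption here) to turn similarity into distance."""
    return 1.0 - pearson_similarity(x, y)


# Hypothetical expression profiles measured over four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same shape as gene_a, twice the magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]   # opposite trend to gene_a

print(pearson_similarity(gene_a, gene_b))
print(pearson_distance(gene_a, gene_c))
```

Correlation looks only at the shape of a profile, so gene_a and gene_b count as maximally similar even though their magnitudes differ; whether that is desirable depends on the question being asked, which is one reason there is no consensus on a best measure.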

distributed annotation system:  A client- server system in which a single client integrates information from multiple servers. It allows a single machine to gather up genome annotation information from multiple distant web sites, collate the information, and display it to the user in a single view. Little coordination is needed among the various information providers.
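The client-side collation described above can be sketched as follows. This is only an illustration: the "servers" are in-memory stand-ins, and the names make_source and collate, the feature fields, and the sample data are all hypothetical. It does not reproduce the actual protocol or message format of any distributed annotation system.

```python
def make_source(name, features):
    """Build a stand-in for a remote annotation server (hypothetical)."""
    def query(chrom, start, end):
        # Return features overlapping the requested region, tagged with origin.
        return [
            dict(f, source=name)
            for f in features
            if f["chrom"] == chrom and f["start"] <= end and f["end"] >= start
        ]
    return query


def collate(sources, chrom, start, end):
    """Ask every source about one region and merge the answers into one view."""
    merged = []
    for query in sources:
        merged.extend(query(chrom, start, end))
    # Order by position so the combined view reads along the sequence.
    return sorted(merged, key=lambda f: (f["start"], f["end"]))


genes = make_source("genes", [
    {"chrom": "chr1", "start": 100, "end": 500, "label": "geneA"},
    {"chrom": "chr1", "start": 2000, "end": 2500, "label": "geneB"},
])
snps = make_source("snps", [
    {"chrom": "chr1", "start": 250, "end": 250, "label": "snp1"},
    {"chrom": "chr2", "start": 300, "end": 300, "label": "snp2"},
])

view = collate([genes, snps], "chr1", 1, 1000)
for feature in view:
    print(feature["source"], feature["label"], feature["start"], feature["end"])
```

The sources share nothing except the shape of a query and of a feature record, which mirrors the point that little coordination is needed among the information providers.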

dynamic modeling: Mathematical approaches to studying biological variation have changed little in several decades. There is a need to develop new dynamic models to illuminate how systems interact and evolve. Just as important, it is critical to study the nature of biological and mathematical assumptions of models and statistics. Tools for analyzing and interpreting data on the architecture of complex phenotypes should be developed in the context of real biological information. Genetic Architecture, Biological Variation and Complex Phenotypes, PA-02-110, May 29, 2002- June 5, 2005

Euclidean distance: Commonly used distance function, which works by treating each expression profile as defining a point in a multidimensional space. 
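A minimal sketch of that idea, treating each profile as a point whose coordinates are its expression values; the function name and the sample values are illustrative assumptions.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two profiles seen as points in n-space."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of measurements")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two hypothetical profiles measured over the same two conditions.
profile_1 = [0.0, 0.0]
profile_2 = [3.0, 4.0]
print(euclidean_distance(profile_1, profile_2))
```

Unlike a correlation-based score, this distance is sensitive to the overall magnitude of expression as well as to the shape of the profile.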

European Bioinformatics Institute EBI, Hinxton, Cambridge, UK. An EMBL outstation.

functional bioinformatics:  The emerging field of functional bioinformatics focuses on the development of ontologies or concept classifications fed into algorithms used to perform computations of the functions of biomolecules. "About bioinformatics" George Washington Univ. Medical Center, 2002

An emerging subfield of bioinformatics that is concerned with ontologies and algorithms for computing with biological function. Functional bioinformatics is the computational counterpart of functional genomics ...  is concerned with managing and analyzing functional genomics data, such as gene expression experiments and large-scale knock-out experiments. ... emphasizes large-scale computational problems, such as problems involving complete metabolic networks and genetic networks.  Peter D. Karp "An ontology for biological function based on molecular interactions" Bioinformatics Ontology 16 (3): 269-285, 2000 Related terms: Functional genomics, Metabolic Engineering, Ontologies & taxonomies

genochemistry genomic chemistry: The volume of data from biological and chemical studies has been increasing exponentially in recent years. In particular, there are now 150 billion sequences within GenBank, 60k protein structures in PDB, and 50 million chemicals with unique structures (as of  Sept. 7, 2009, CAS).  As a result, one of the most important challenges has been the annotation of genetic sequences to their functions, and enzymes (encoded by their sequences) to their substrate profiles.  A systematic study of chemistry that links the enzyme's sequence information (including SNP) and substrate structural diversity is needed.  It differs from traditional disciplines in many ways and requires a restructuring of established methods, the standardization of the data collection process, and new bioinformatics and modeling tools. It can take the form of extended biocatalysis complemented by bioinformatics and molecular modeling. We tentatively refer to this discipline as Genochemistry. IUPAC, Genochemistry -- chemistry designed for life sciences: Towards a guideline and a framework of genochemistry, 2010  IUPAC Project Number 2009-021-3-300.  A glossary of specialized terms will be included.

glycobioinformatics, glycoinformatics: Glycosciences

I2B2 Informatics for Integrating Biology & the Bedside:  An NIH- funded National Center for Biomedical Computing based at Partners HealthCare System. [Boston] 

information overload: Biomedicine is in the middle of revolutionary advances. Genome projects, microassay methods like DNA chips, advanced radiation sources for crystallography and other instrumentation, as well as new imaging methods, have exceeded all expectations, and in the process have generated a dramatic information overload that requires new resources for handling, analyzing and interpreting data. Delays in the exploitation of the discoveries will be costly in terms of health benefits for individuals and will adversely affect the economic edge of the country. Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana- Champaign

Many of today's problems stem from information overload and there is a desperate need for innovative software that can wade through the morass of information and present visually what we know.  The development of such tools will depend critically on further interactions between the computer scientists and the biologists so that the tools address the right questions, but are designed in a flexible and computationally efficient manner.  It is my hope that we will see these solutions published in the biological or computational literature.  Richard J. Roberts, The early days of bioinformatics publishing, Bioinformatics 16 (1): 2-4, 2000

"Information overload" is not an overstatement these days. One of the biggest challenges is to deal with the tidal wave of data, filter out extraneous noise and poor quality data, and assimilate and integrate information on a previously unimagined scale    

Where's my stuff? Ways to help with information overload, Mary Chitty, SLA presentation June 10, 2002, Los Angeles CA

integrated databases: Integration [of databases] typically is accomplished by creating small, object- oriented software elements, or “wrappers” that let a single overlaying, often browser like, desktop application interact with all the pieces.  The original separate systems are intact and functional, and new ones can be added, while the underlying complexity is transparent to users. There are still many challenges … but computing environments are becoming more unified, flexible and expandable. A. Thayer “Bioinformatics for the Masses” Chemical & Engineering News 78(6): 19-32 Feb. 7, 2000
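The wrapper idea can be sketched as below: each wrapper adapts one existing store to a shared lookup method, and a thin overlay queries all of them. Every class name, field name, and record here is hypothetical and invented for illustration; real integration layers are considerably more involved.

```python
class FlatFileWrapper:
    """Wraps tab-delimited text lines of the form 'symbol<TAB>description'."""

    def __init__(self, lines):
        self._lines = lines

    def lookup(self, symbol):
        for line in self._lines:
            name, description = line.split("\t")
            if name == symbol:
                return {"symbol": name, "description": description, "origin": "flat file"}
        return None


class KeyValueWrapper:
    """Wraps a dictionary-style store keyed by gene symbol."""

    def __init__(self, table):
        self._table = table

    def lookup(self, symbol):
        entry = self._table.get(symbol)
        if entry is None:
            return None
        return {"symbol": symbol, "description": entry["desc"], "origin": "key-value"}


def integrated_lookup(wrappers, symbol):
    """Overlay: ask every wrapped system and return records in one common shape."""
    records = (wrapper.lookup(symbol) for wrapper in wrappers)
    return [record for record in records if record is not None]


# Two made-up sources that store the same kind of information differently.
flat = FlatFileWrapper(["GENE1\tkinase", "GENE2\ttransporter"])
keyed = KeyValueWrapper({"GENE1": {"desc": "phosphorylates substrate X"}})

hits = integrated_lookup([flat, keyed], "GENE1")
for hit in hits:
    print(hit["origin"], hit["description"])
```

Adding another system means writing one more wrapper with the same lookup method; the original stores are left untouched, and the caller never sees their differing formats.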

Information in OMIM [Online Mendelian Inheritance in Man] and the published working draft of the International Human Genome Sequencing Consortium (Nature 15 Feb. 2001) has been facilitated by ties to NCBI's RefSeq and LocusLink databases. Are there other good examples of integrated databases?  Related terms:  Bio-Ontology Standards Group, Data Model Standards Group; Functional genomics Gene Ontology  

integration:  Integration of the various types of large- scale data is currently receiving much attention. There appears, however, to be little agreement on what exactly is meant by "integration", not to mention how to achieve it. The word "integration" is being attached to almost any analysis that involves the combined use of two or more large datasets.  Lars J. Jensen, Peer Bork, Quality analysis and integration of large- scale molecular data sets. Drug Discovery Today: Targets, 3(2): 51-56

integration (of databases): Allows researchers to increase the value they get from the data, because it increases the base of information they can access and allows for more robust searching. Related terms: IT infrastructure middleware, Object Oriented modeling OOM, object protocol model OPM; Maps genomic & genetic memory mapped data structures

Interoperable Informatics Infrastructure Consortium I3C: 

LSID Life Sciences Identifiers:  Cover pages  

life sciences informatics: Informatics are essential at every step of genomics- based drug discovery and development. The commercial landscape of life sciences information technology has changed dramatically in recent years. Bioinformatics, in particular, has gone through a dramatic boom/bust. While IT companies are looking to the drug discovery and development arena as a new market opportunity, pharmaceutical companies  are faced with rising pressure to reduce (or at least control) costs, and have a growing need for new informatics tools to help manage the influx of data from genomics, and turn that data into tomorrow's drugs. Key IT tools, such as high- performance computing, Web services, and grids, are being used to improve the speed and efficiency of drug discovery and development. True breakthroughs are still lacking, particularly in key areas such as gene prediction, data mining, protein structure modeling and prediction, and modeling of complex biological systems. However, most experts agree that IT and bioinformatics are essential to reaching the improved productivity the pharmaceutical industry craves.  

molecular bioinformatics: Conceptualizing biology in terms of molecules (in the sense of physical- chemistry) and then applying "informatics" techniques (derived from disciplines such as applied math, CS [computer science] and statistics) to understand and organize the information associated with these molecules on a large- scale. Mark Gerstein "What is Bioinformatics?" MB&B 474b3, 2001

molecular information theory: In our laboratory we use Claude Shannon's information theory, computers (Unix, Pascal and PostScript graphics on Sun workstations) and genetic engineering (protein and DNA gels, cloning, sequencing and magnetic bead technology) to study genetic control patterns on DNA and RNA.  "Molecular Information Theory" Tom Schneider, National Cancer Institute, US, 2002
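
One common way to apply Shannon's measure to DNA is to compute the information content, in bits, at each position of a set of aligned binding sites. The sketch below is illustrative only and is not taken from the cited laboratory page: the function name and the example sites are hypothetical, and the small-sample correction used in published analyses is omitted.

```python
import math
from collections import Counter


def information_per_position(sites):
    """Return the information content (bits) of each column of aligned DNA sites.

    Assumes all sites have the same length and use only A, C, G, T.
    No small-sample correction is applied (a simplifying assumption).
    """
    length = len(sites[0])
    bits = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        total = sum(counts.values())
        # Shannon entropy of the base distribution in this column
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        # 2 bits (log2 of 4 bases) is the maximum uncertainty for DNA
        bits.append(2.0 - entropy)
    return bits


# Hypothetical aligned sites, for illustration only
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
profile = information_per_position(sites)
```

A fully conserved column scores 2 bits, while a column in which all four bases are equally frequent scores 0 bits.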

molecular pattern recognition: Developing computational methodologies for the analysis and interpretation of large-scale expression datasets generated by DNA  microarray experiments. Analysis of genome-wide expression patterns and their correlations with phenotypes of interest may provide unique insights into the structure of genetic networks and into biological processes not yet  understood at the molecular level. Whitehead/ MIT [US] Genome Center's  Molecular Pattern Recognition web site.  Broader term: pattern recognition.  Related terms Expression

molecular systems biology: An integrative discipline that seeks to explain the properties and behaviour of complex biological systems in terms of their molecular components and their interactions.  Nature Publishing, Molecular Systems Biology aims & scope   Broader term: systems biology

NCBI  National Center for Biotechnology Information: Established in 1988 as a national  resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information - all for the better understanding of molecular processes affecting human health and disease. Part of  NIH.

non-redundant databases: Researchers at the National Center for Biotechnology Information (NCBI) coined the term "nr" database (nonredundant database) to refer to a database in which the obviously redundant entries have been merged. These entries are typically those that are 100%, character- by- character identical, and algorithms exist that can remove such redundancy. Although such a database has less redundancy than a primary database, a substantial amount of redundancy remains, and it can be removed only by a curator using scientific judgment.

Many databases try to be “non-redundant”.  Unfortunately, biological data is too complex to fit a simple definition of redundancy … Each “non- redundant” database has its own definition of redundancy. George Church Lab, Harvard Medical School, US   Examples of non- redundant databases include UniGene and SWISS- PROT, while DDBJ/ EMBL/ GenBank are redundant databases.
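
The automated step described above, merging entries that are character-by-character identical, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure NCBI uses: the record format, the identifiers, and the decision to ignore letter case are all hypothetical.

```python
def merge_identical(records):
    """Merge records whose sequences are character-by-character identical.

    `records` is an iterable of (identifier, sequence) pairs. The result maps
    each distinct sequence to the list of identifiers that shared it.
    """
    merged = {}
    for identifier, sequence in records:
        key = sequence.upper()  # assumption: case differences are not meaningful
        merged.setdefault(key, []).append(identifier)
    return merged


# Hypothetical entries, for illustration only
records = [
    ("entryA", "ATGGCC"),
    ("entryB", "ATGGCC"),  # exact duplicate of entryA, so it is merged
    ("entryC", "ATGGCA"),  # differs by one base, so it stays separate
]
nr = merge_identical(records)
```

Note that entryC survives even though it differs from the others by a single base. This is the residual redundancy that, as the definition says, only a curator using scientific judgment can resolve.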

ontologies proteomics: Protein informatics

Open Bioinformatics Foundation OPEN-BIO: The purpose of the foundation is to act as an umbrella organization for the various bio*.org projects that grew out of the original BioPerl project. The goal of the foundation is to provide financial, administrative and technical assistance for our various open source life science projects.

prediction: Narrower terms: exon prediction, gene prediction, ORF prediction, protein sequence prediction;  Protein informatics protein structure prediction; Related terms: recognition

proprietary databases:  Fee- based, copyrighted databases (in contrast to public databases such as those at DDBJ/ EMBL/ GenBank). Some databases charge subscription fees to commercial organizations, with other arrangements available to non- profits. Also referred to as private databases.  Compare: public databases

protein bioinformatics protein informatics:  Protein informatics

public databases: Freely accessible databases such as GenBank/ EMBL/ DDBJ, ArrayExpress or BLOCKS. There has been much debate about public vs. proprietary databases. 

recognition: Narrower terms: computational gene recognition, gene recognition, molecular recognition.

recognition site: Pharmaceutical biology

research informatics:  The explosion of genomic information, from sequences and gene expression to SNPs and protein structures, is of limited value to pharmaceutical researchers without powerful software capable of interpreting and comparing it. Key topics remain: data mining; data sharing across multiple locations; computational enhancement of biology and chemistry projects; integration of these efforts; legacy information systems; the very different languages and perspectives of chemists and biologists; and the organizational issues of compartmentalization.

self- organizing map: A type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data. The method has been implemented in a publicly available computer package, GENECLUSTER, that performs the analytical calculations and provides easy data visualization … Expression patterns of some 6,000 human genes were assayed, and an online database was created. GENECLUSTER was used to organize the genes into biologically relevant clusters that suggest novel hypotheses about hematopoietic differentiation. P. Tamayo et al “Interpreting patterns of gene expression with self- organizing maps: methods and application to hematopoietic differentiation” PNAS 96(6): 2907- 2912 Mar 16, 1999  

Similar to k-means, but the algorithm organizes the clusters in a two- dimensional grid, such that clusters that are close together in the grid are more similar than those further apart. This is a very useful feature when working with large numbers of clusters.  Related term: neural networks  
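
The grid-based clustering described above can be sketched as follows. This is a minimal illustration, not the GENECLUSTER implementation: the function names, the linear decay schedules, the Gaussian neighbourhood, and the toy profiles are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on equal-length vectors."""
    rng = random.Random(seed)
    # One weight vector per grid node, initialised from randomly chosen data points
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # visit profiles in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Nodes near the winner on the grid are pulled along with it,
                # which is what keeps neighbouring clusters similar
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
    return weights


# Toy expression profiles (hypothetical values): two clearly different groups
profiles = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 9.0], [9.0, 10.0],
]
som = train_som(profiles, rows=2, cols=2)
low_node = best_matching_unit(som, [0.0, 0.0])
high_node = best_matching_unit(som, [10.0, 10.0])
```

The only difference from an online k-means update is the neighbourhood term `h`: without it, each node would move independently and the grid would carry no information about which clusters resemble each other.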

semantic systems biology: Semantic technologies are playing an increasingly important role in capturing and modeling biological knowledge. Semantic systems biology can complement the bottom-up approach with data-driven generation of hypotheses. Therefore, Semantic Systems Biology (SSB) is a systems biology approach that uses semantic description of knowledge about biological systems to facilitate integrated data analysis.  About Semantic Systems Biology 

spatio temporal dynamics: Local interactions in space can give rise to large scale spatio temporal patterns (e.g. (spiral) waves, spatio- temporal chaos (turbulence), stationary (Turing- type) patterns and transitions between these modes). Their occurrence and properties are largely independent of the precise interaction structure. They are indeed seen to occur at many organizational levels of biotic systems. Space can be either 'real' space or a state space, e.g. 'phenotype space' in models of speciation or 'shape space' in immunological models of shape- based receptor interactions. We show that such spatio- temporal patterns have important consequences for fundamental bioinformatic processes. Paulien Hogeweg, Overview of Research 1993- 1998, Utrecht University, Netherlands, 1999

standards: Related terms: Bio-ontology Standards Group, CORBA, Data Model Standards Group, object protocol model OPM. EBI [European Bioinformatics Institute] is also working on standards. Microarrays MAML, MGED, MIAME

systems bioinformatics: With the completion of the Human Genome Project, the scientific community is now faced with the even greater challenge of analyzing the resulting data from this and other large-scale genome projects to better understand the networks underlying biological function. Second International Computational Systems Bioinformatics Conference To be Held August 11-14, 2003 at Stanford University, IEEE CS Bioinformatics Technical Chair via BizWire    

systems biology: NIGMS views "systems biology" as a conceptual framework for the analysis of complex biological systems. Such systems derive from interactions among many distinct components in varying contexts. These systems exhibit properties, such as nonlinear dynamics and emergent behavior, that cannot easily be inferred from studies of components in isolation. Systems biology relies on mathematical methods and computational models to generate hypotheses and to design new experiments. Iteration between theory and experiment is crucial. The quantity and quality of data required for these approaches often challenge current technologies, and the development of new technologies and cross-disciplinary collaborations may be required. When applied to human health, systems biology can be a powerful tool to test hypotheses relevant to health and disease, particularly the results of therapeutic interventions.  National Centers for Systems Biology

This report focuses on the current and future applications of Systems Biology in drug discovery, specifically in pinpointing optimal individual targets, and combinations of targets, to overcome metabolic pathway redundancies, leading to efficacious and safe products. Insight Pharma Reports, Systems biology: A disruptive technology, 2008

The label “systems biology” is pretty awful, except, of course, for the many even worse labels that have been tried. More important is what SB seeks to do: transform biology and health care into a rigorous, predictive science offering a richer understanding of biology and a vastly improved approach to drug development and medicine. SB would build on the molecular biology revolution and elucidate the wiring diagrams (and their rules) buried in the data.   John Russell, BioIT World, Sept  2007 

Systems biology is frequently defined as the study of all of the elements in a biological system and their relationship to one another in response to perturbation. Advances in science and technology are enabling the development of this emerging and cross-disciplinary field by allowing researchers to explore how biological components function as a network in cells, tissues and organisms. Recently, pharmaceutical companies have begun to embrace systems approaches in an effort to better understand physiology, pathogenic processes and pharmacological responses. This review focuses on recent advances within three core areas of systems biology: data collection, data analysis, and the integration and sharing of data.  Susie Stevens and J. Rung, Advances in systems biology: measurement, modeling and representation, Current Opinion in Drug Discovery and Development, 2006 Mar; 9(2): 240- 250.

Researchers at ISB seek to understand not only each constituent of a biological network but also how all of a network’s constituents function together. They use cutting-edge technologies to gather as much information as they can about a biological system. They then use this information to build mathematical and graphical models that account for the behavior of the system. They test these models by gathering additional data, often by perturbing a system through genetic or environmental changes. In this way, they build an understanding of biological systems that can be used, for example, to explore what goes wrong when a biological system becomes diseased and how to treat or prevent that disease. ISB Institute for Systems biology

There are basically two approaches to systems biology. One has its roots in biology, the other in systems theory. The former sees it as a way to integrate data from a variety of sources. For the latter, the main idea is that the methods developed in those fields might also have a useful application in biology, since engineering sciences have a tradition of borrowing from nature's design principles. Only recently, the prospect of 'designing' biological systems has become feasible. Currently this is mostly done by 'improving' plants or animals by adding genes from other organisms, but first simple from-scratch designs of biological functional modules are starting to appear. Examples are designed cells as thermometers and oscillators which are independent of the cell cycle. Even before all this became possible, though, the possibility of using engineering methods to assist in 'reverse engineering nature' had a certain appeal. Glossary for Systems Biology, Univ of Stuttgart

The very nature of systems biology requires integrating data from a variety of sources generated and interpreted by people skilled in different areas --  engineering, computer science, biology, physics, mathematics, and statistics. Key considerations in this process include the generation of quantitative data, barriers in communication across departments, and organizational challenges.

Glossary for systems biology, Institutes for System Dynamics and Control and for Systems Theory in Engineering of the University of Stuttgart 2011 Wikipedia 
Narrower terms: comparative systems biology, molecular systems biology; hepatocyte systems biology, semantic systems biology; In silico & molecular modeling applied systems biology, in silico biology; Metabolic engineering signal transduction; Pharmaceutical biology integrative biology

thresholding: The researcher defines minimum and maximum values that are considered reliable; measurements that are too low or too high are dropped from the dataset or marked as unreliable. It also makes sense to subtract the minimum value from all other measurements, because this reflects baseline noise. This approach implicitly assumes that microarrays normally operate in the linear part of the dynamic range, and that the transitions between the linear and flat regimes occur abruptly. Broader term: normalization 
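
The procedure above can be sketched as a short function. This is a minimal illustration under stated assumptions: the function name and intensity values are hypothetical, unreliable measurements are marked with `None` rather than dropped, and the "minimum value" subtracted as baseline is taken to be the researcher-defined lower threshold.

```python
def apply_threshold(intensities, low, high):
    """Mark out-of-range measurements as unreliable and subtract the baseline.

    Values outside [low, high] become None. Values inside the range have
    `low` subtracted, treating it as baseline noise (an assumption about
    which minimum the definition refers to).
    """
    result = []
    for value in intensities:
        if low <= value <= high:
            result.append(value - low)
        else:
            result.append(None)  # too low or too high to be reliable
    return result


# Hypothetical raw spot intensities, for illustration only
raw = [12.0, 150.0, 900.0, 70000.0]
cleaned = apply_threshold(raw, low=50.0, high=60000.0)
```

Marking values instead of deleting them keeps each measurement aligned with its gene, which matters when several arrays are compared position by position.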

translational bioinformatics: Clinical and medical informatics

Bioinformatics and Genomics Gateway, BioMedCentral 
Bioinformatics information resources poster
EMBL-EBI Bioinformatics Services, European Molecular Biology Lab, European Bioinformatics Institute
