Biopharmaceutical Databases directory
Indexes include Alleles, Annotation, Bibliographic (and full text), cDNA & clones, Chromosomes, Comparative genomes, Database directories (life sciences), Diseases, Electrophoresis, ESTs, Gene Expression, Gene Prediction, Genes, Genomes, Interactions - genetic and molecular, Maps and Mapping, Microarrays, Model organisms, Molecular Modeling, Mutations, Nomenclature and Systematics and Taxonomy, Patents, Pathways, Phenotypes, Plasmids, Polymorphisms, Probes and primers, protein sequences, Proteins and Protein structures, Protein domains, Protein families, Proteome, RNA, Research in Progress, SNPs, Sequences, Servers, Signal transduction, Transgenics, Variations.
Database descriptions and links follow.
Life Sciences Database Directories
Introduction to Molecular Biology Databases, R. Apweiler, R. Lopez, B. Marx, 1999 http://www.ebi.ac.uk/swissprot/Publications/mbd1.html Covers bibliographic, taxonomy, nucleotide sequence, genetic and protein sequence databases, PIR, SWISS-PROT, TrEMBL and specialised protein sequence databases, secondary protein databases and structure databases.
NCBI Handbook, guide to databases and bioinformatics, National Center for Biotechnology Information, NLM, NIH, 2003
This is not a comprehensive catalog of databases. Both public and proprietary databases are included. Many proprietary databases may make special arrangements for academic users. Please consult individual websites for details. The dividing line between databases, software and integrated systems gets blurrier all the time.
2D PAGE databases index http://au.expasy.org/ch2d/2d-index.html
3Dee Database of Protein Domain Definitions, EBI, UK http://www.compbio.dundee.ac.uk/3Dee/ Structural domain definitions for all protein chains in the Brookhaven Protein Databank (PDB) that have 20 or more residues and are not theoretical models. In addition, the domains have been clustered on sequence similarity and structural similarity. The resulting families are stored as a hierarchy.
3DinSight, BioInfoBank, Japan http://gibk26.bse.kyutech.ac.jp/jouhou/3dinsight/3DinSight.html Integrated database and search tool for structure, property and function of biomolecules.
ALFRED Allele Frequency Database, Kidd Lab, Yale University, US http://alfred.med.yale.edu/alfred/index.asp
ASDB Alternative Splicing DataBase http://hazelton.lbl.gov/~teplitski/alt/
AceDB http://www.acedb.org/ A genome database system developed primarily by Jean Thierry-Mieg (CNRS, Montpellier, France) and Richard Durbin (Sanger Centre, UK). It provides a custom database kernel, with a non-standard data model designed specifically for handling scientific data flexibly, and a graphical user interface with many specific displays and tools for genomic data. AceDB is used both for managing data within genome projects, and for making genomic data available to the wider scientific community. AceDB was originally developed for the C. elegans genome project, from which its name was derived (A C. elegans DataBase). However, the tools in it have been generalized to be much more flexible and the same software is now used for many different genomic databases from bacteria to fungi to plants to man. It is also increasingly used for databases with non-biological content.
allgenes.org, Univ. of Pennsylvania, USA http://www.allgenes.org Comprehensive gene index or gene catalog that includes genes/transcripts predicted by two largely independent methods: 1. Genes (transcripts) predicted by clustering and assembling EST sequences. The EST clusters on allgenes.org are those in the latest release of the Database of Transcribed Sequences (DoTS), which was developed by the Computational Biology and Informatics Laboratory at the University of Pennsylvania. 2. Genes predicted by running the ab initio gene finders GRAIL-EXP and GENSCAN on all available human and mouse genomic sequence. These data come from the Genome Channel, an effort of the Computational Biosciences Section at Oak Ridge National Laboratory.
Amino Acid Index AAI, GenomeNet, Japan http://www.genome.ad.jp/dbget/aaindex.html An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biological properties of amino acids. The AAindex1 section of the Amino Acid Index Database is a collection of published indices together with the result of cluster analysis using the correlation coefficient as the distance between two indices.
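The idea of treating an amino acid index as a 20-value vector and comparing two indices by correlation can be sketched as follows. This is a minimal illustration, not AAindex's own procedure: the entry above does not give the exact distance formula, so `1 - |r|` is an assumption, and `TOY_RESCALED` and `TOY_ALPHABET_RANK` are made-up indices used only to exercise the function. The hydropathy values are the commonly tabulated Kyte-Doolittle scale.

```python
import math

# Kyte-Doolittle hydropathy scale: one value per standard amino acid.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical indices for illustration only (not published indices).
TOY_RESCALED = {aa: 10.0 - 2.0 * v for aa, v in HYDROPATHY.items()}
TOY_ALPHABET_RANK = {aa: float(i) for i, aa in enumerate(sorted(HYDROPATHY))}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def index_distance(index_a, index_b):
    """Distance between two indices, assumed here to be 1 - |r|."""
    residues = sorted(index_a)  # same residue order for both vectors
    r = pearson([index_a[aa] for aa in residues],
                [index_b[aa] for aa in residues])
    return 1.0 - abs(r)


if __name__ == "__main__":
    print(index_distance(HYDROPATHY, TOY_RESCALED))       # linear rescaling
    print(index_distance(HYDROPATHY, TOY_ALPHABET_RANK))  # unrelated index
```

Under this assumed distance, two indices that are linear rescalings of each other fall at distance zero, so a clustering step built on it would group them together.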
ArrayExpress, EBI, UK http://www.ebi.ac.uk/arrayexpress/ A public repository for microarray based gene expression data. Currently the EBI is establishing a pilot database containing microarray gene expression data that are available publicly.
Axeldb (A Xenopus laevis database), DKFZ (German Cancer Research Center), Univ. Heidelberg, Germany http://www.dkfz-heidelberg.de/abt0135/axeldb.htm A database focussing on gene expression in the frog Xenopus laevis. It is the web companion to our paper describing a large-scale in situ hybridization screening in Xenopus embryos. The goals of our "large-scale in situ screen" project are to identify genes by the characterization of their expression pattern, to partially sequence the corresponding cDNAs and to maintain a database collecting the results.
BBID Biological Biochemical Image Database, National Institute on Aging, NIH, US http://bbid.grc.nia.nih.gov/ A searchable database of images of putative biological pathways, macromolecular structures, gene families, and cellular relationships. It is of use to those who are working with large sets of genes or proteins using cDNA arrays, functional genomics, or proteomics.
BioModels Database; BioModels.net initiative, a collaboration amongst the SBML Team (USA), the EMBL-EBI (United Kingdom), the Systems Biology Group of the Keck Graduate Institute (USA), the Systems Biology Institute (Japan), and JWS Online at Stellenbosch University (South Africa). http://www.ebi.ac.uk/biomodels/ Annotated published models: an effort to develop a data resource that allows biologists to store, search and retrieve published mathematical models of biological interest.
BIOSIS Biological Abstracts, Zoological Abstracts http://www.biosis.org/ Bibliographic index to biological literature.
BLOCKS, Fred Hutchinson Cancer Research Center, US http://www.swbic.org/origin/proc_man/Blocks/search/blocks_release.html From PROSITE
BRITE Biomolecular Relations in Information Transmission and Expression, GenomeNet, Japan http://www.genome.ad.jp/brite/ Cell cycle controlling pathways.
Berkeley Drosophila Genome Project BDGP, UC-Berkeley, US http://www.fruitfly.org/ Curated annotated informatics database from the Berkeley and European Drosophila genome projects, with annotations from the literature, comparative sequence analysis and the FlyBase research community.
Biochemical Pathways, Boehringer Mannheim GmbH, Germany http://biochem.boehringer-mannheim.com/prodinfo_fst.htm?/techserv/metmap.htm A digitized version of our Biochemical Pathway Chart is available on the ExPASy Molecular Biology Server of the Geneva University Hospital and the University of Geneva. An electronic index allows for the quick localization of any metabolite or enzyme on the chart. In addition most enzyme names on the chart act as links to the extensive ENZYME database.
BioExpress See GeneExpress
Biology WorkBench, San Diego Supercomputer Center, US http://workbench.sdsc.edu/ A revolutionary web-based tool for biologists. The WorkBench allows biologists to search many popular protein and nucleic acid sequence databases. Database searching is integrated with access to a wide variety of analysis and modeling tools, all within a point and click interface that eliminates file format compatibility problems.
BioMagRes, Univ. of Wisconsin-Madison, US http://www.bmrb.wisc.edu/ Contains NMR chemical shifts derived from proteins and peptides, reference data, amino acid sequence information, and data describing the source of the protein and the conditions used to study the protein. In constructing the database, proteins and larger peptides have been given priority. Shift assignments for hemes, cofactors, and substrates of a protein are also included, when they are reported as part of a complex.
BioMedCentral (UK) http://www.biomedcentral.com/home/ Publisher of journals covering all areas of biology and medicine. We provide free access to peer-reviewed research articles and subscription-based access to reviews, commentaries and other information services.
CATH Protein Structure Classification, University College, London, UK http://www.cathdb.info/ Protein domains classified into superfamilies.
CDD Conserved Domain Database, NCBI, US http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml Database and search service. Currently contains domains derived from two popular collections, Smart and Pfam, plus contributions from colleagues at NCBI. The source databases also provide descriptions and links to citations. Since conserved domains correspond to compact structural units, CDs contain links to 3D-structure via Cn3D whenever possible.
CEPH Genotype Database, Centre d'Etude du Polymorphisme Humain (CEPH), France http://www.cephb.fr/cephdb/ A database of genotypes for genetic markers that have been typed on the CEPH reference family resource for linkage mapping (Genomics 6: 575-577, 1990; Science 265: 2049-2054, 1994). The present version of the database (V10.0 - November 2004) contains genotypes for 32,356 genetic markers.
CGAP Cancer Gene Anatomy Project, NCBI, US http://www.ncbi.nlm.nih.gov/ncicgap/ An interdisciplinary program established and administered by the National Cancer Institute to generate the information and technological tools needed to decipher the molecular anatomy of the cancer cell. CGAP is divided into five complementary Initiatives, each with its own goals, informatics tools and resources.
COG Clusters of Orthologous Groups of Proteins, NCBI, US. http://www.ncbi.nlm.nih.gov/COG/ Delineated by comparing protein sequences encoded in 21 complete genomes, representing 17 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.
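The membership rule stated above (a COG must draw its proteins from at least 3 lineages) can be sketched as a simple filter. This is only an illustration of that one threshold, not the COG construction method, which relies on genome-wide sequence comparison. The cluster names, protein identifiers and lineage labels are all hypothetical.

```python
MIN_LINEAGES = 3  # threshold taken from the COG description above


def qualifying_clusters(clusters, lineage_of):
    """Return clusters whose members span at least MIN_LINEAGES lineages.

    clusters:   dict mapping cluster id -> list of protein ids
    lineage_of: dict mapping protein id -> lineage label
    """
    kept = {}
    for cluster_id, proteins in clusters.items():
        lineages = {lineage_of[p] for p in proteins}
        if len(lineages) >= MIN_LINEAGES:
            kept[cluster_id] = sorted(lineages)
    return kept


# Hypothetical toy data: two candidate clusters, four lineages.
LINEAGE_OF = {
    "p1": "lineage_A", "p2": "lineage_B", "p3": "lineage_C",
    "p4": "lineage_A", "p5": "lineage_A", "p6": "lineage_D",
}
CANDIDATES = {
    "cluster_1": ["p1", "p2", "p3", "p4"],  # three lineages (p1 and p4 are paralogs)
    "cluster_2": ["p5", "p6"],              # only two lineages
}

if __name__ == "__main__":
    print(qualifying_clusters(CANDIDATES, LINEAGE_OF))
```

Paralogs from the same lineage (here `p1` and `p4`) count once, which mirrors the wording "individual proteins or groups of paralogs from at least 3 lineages".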
Caenorhabditis elegans WWW Server, University of Texas Southwestern Medical Center at Dallas, US http://elegans.swmed.edu
C. elegans Gene Knockout Consortium, University of British Columbia, Canada http://elegans.bcgsc.bc.ca/knockout.shtml
Chemical Abstracts CA http://www.cas.org/ Bibliographic index to the chemical literature.
ChipDB*, Richard Young, Whitehead Institute, MIT, US We are dissecting genome regulatory circuitry in yeast and human cells. The transcriptional regulatory circuitry of yeast and human cells is being deduced through the use of high density oligonucleotide arrays. We are exploring the role of the transcription apparatus, chromatin and signaling pathways in regulation of genome expression. (Transcription Initiation Apparatus, Genome-Wide Expression)
NCBI, US http://www.ncbi.nlm.nih.gov/clone that integrated information about clones and libraries, including sequence data, map positions and distributor information. It replaces the former NCBI
Conserved Domain Database See CDD
CrossRef http://www.crossref.org Publishers International Linking Association 77 publishers of over 4,780 journals.
DATABANKS, SRS, EBI, UK http://www.ebi.ac.uk/srs5cgi/wgetz?-fun+PageLibInfo+-info+DATABANKS 450+ databases, compiled nightly from public SRS servers worldwide.
DBGET/LinkDB, GenomeNet, Institute for Chemical Research, Kyoto University, Japan http://www.genome.ad.jp/dbget/ Integrated database retrieval system. Currently supports the following databases and gene catalogs: nucleic acid sequences: GenBank, EMBL; protein sequences: SWISS-PROT, PIR, PRF, PDB, STR; 3D structures: PDB; sequence motifs: PROSITE, EPD, TRANSFAC; enzyme reactions: LIGAND; metabolic pathways: PATHWAY; amino acid mutations: PMD; amino acid indices: AAindex; genetic diseases: OMIM; literature: LITDB, Medline; gene catalogs: E. coli, H. influenzae, M. genitalium, M. pneumoniae, M. jannaschii, Synechocystis, S. cerevisiae. Cross reference EMBL and GenBank.
DDBJ DNA DataBank of Japan Shares information daily with EMBL and GenBank. http://www.ddbj.nig.ac.jp/
DHMHD Dysmorphic Human and Mouse Homology Database, Mothercare Unit of Clinical Genetics and Fetal Medicine, Institute of Child Health, University of London, UK http://www.hgmp.mrc.ac.uk/DHMHD/dysmorph.html This application consists of three separate databases of human and mouse malformation syndromes together with a database of mouse/human syntenic regions. The mouse and human malformation databases are linked together through the chromosome synteny database. The purpose of the system is to allow retrieval of syndromes according to detailed phenotypic descriptions and to be able to carry out homology searches for the purpose of gene mapping. Databases include the London Dysmorphology Database (LDDB), Mouse malformation database, and Human Cytogenetic Aberrations.
DIP Database of Interacting Proteins, UCLA/DOE, US http://dip.doe-mbi.ucla.edu/ Documents experimentally determined protein-protein interactions and interactive methods.
DNA Patents, Georgetown University's Kennedy Institute of Ethics http://dnapatents.georgetown.edu/ Free public access to the full text and analysis of all DNA patents issued by the United States Patent and Trademark Office (PTO). Allows researchers to track data or trends of patents and patent applications in categories such as the date of issue, the inventor or the receipt of government funding.
DOGS Database of Genome Sizes Center for Biological Sequence Analysis, Technical University Denmark http://www.cbs.dtu.dk/databases/DOGS/index.html A comprehensive list of (estimated) genome sizes for different organisms. The purpose of this database is to provide such a list. The ultimate goal is to compile a list of all the known organisms and their respective genome sizes. Both the completed and estimated genomes are listed. The estimated genome sizes are given for both the organisms currently being sequenced and those for which no sequencing programme is in progress.
DOTS Database of Transcribed Sequences, Univ. of Pennsylvania, US. Has been superseded by http://www.allgenes.org/ which combines data from DOTS and the Genome Channel (ORNL).
DrugBank: A unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4300 drug entries including >1,000 FDA-approved small molecule drugs, 113 FDA-approved biotech (protein/peptide) drugs, 62 nutraceuticals and >3,000 experimental drugs. Additionally, more than 6,000 protein (i.e. drug target) sequences are linked to these drug entries. Each DrugCard entry contains more than 80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. DrugBank Wishart DS et al., DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006 1;34
DSSP Definition of Secondary Structures of Proteins, http://bioweb.pasteur.fr/seqanal/interfaces/dssp-simple.html W. Kabsch and Chris Sander (1983) Biopolymers 22, 2577-2637.
Dali, EBI European Bioinformatics Institute http://www.embl-ebi.ac.uk/dali/ The Dali server is a network service for comparing protein structures in 3D. You submit the coordinates of a query protein structure and Dali compares them against those in the Protein Data Bank. A multiple alignment of structural neighbours is mailed back to you.
Database of Macromolecular Movements, Molecular Biophysics and Biochemistry, Yale Univ., US http://bioinfo.mbb.yale.edu/MolMovDB/ This describes the motions that occur in proteins and other macromolecules, particularly using movies. Associated with it are a variety of free software tools and servers for structural analysis. M Gerstein & WG Krebs (1998). Nuc. Acid. Res. 26:4280-4290
Database of Ribosomal Crosslinks, Max Planck Institut, Berlin, Molekulare Genetik, Germany http://www.molgen.mpg.de/~ag_ribo/ag_brimacombe/drc/ To interpret the molecular basis of the translational process, it is essential to have a corresponding knowledge of the higher structure of the ribosome.
dbEST, NCBI http://www.ncbi.nlm.nih.gov/dbEST/index.html Sequence data and other information on "single-pass" cDNA sequences or ESTs, from a number of organisms, part of GenBank.
dbSNP, NCBI http://www.ncbi.nlm.nih.gov/SNP/ Uses "looser variation" definition for SNPs (no requirement or assumption about minimum allele frequencies or the polymorphisms…). Short deletion and insertion polymorphisms, and microsatellite repeats, as well as SNPs are included. Disease causing clinical mutations, as well as neutral polymorphisms, are also in scope. [dbSNP FAQ]
dbSTS, NCBI http://www.ncbi.nlm.nih.gov/dbSTS/ A subset of GenBank, with sequence and mapping data on short genomic landmark sequences (STSs). More comprehensive annotation than in GenBank and regularly updated with BLAST.
Dead DNA: See under Mitomap
DrugTarget Database, LifeSpan Biosciences http://www.lsbio.com/products/expression/ Using either proprietary antibodies or commercial antibodies, LifeSpan has produced over 700 reports that provide information on the localization of approximately 380 G protein-coupled receptors, nuclear receptors, and kinases in at least 25 normal peripheral tissues, 11 brain regions, and 25 major diseases.
EGAD, TIGR, US http://www.tigr.org/tdb/egad/sequence/sequence_page.html Extraction and curation of sequences from GenBank to create a non-redundant set of transcript (HT and ET) sequences.
EMBASE Excerpta Medica http://www.embase.com/home Bibliographic index to biomedical and pharmacological literature.
EMBL (European Molecular Biology Laboratory): Main laboratory is in Heidelberg, Germany, with outstations in Hamburg, Grenoble, France (access to high powered instruments for structure studies) and Hinxton, UK (bioinformatics). Supported by 14 European countries and Israel; shares data daily with DDBJ and GenBank. http://www.embl-heidelberg.de/
EPD Eukaryotic Promoter Database, Bioinformatics Group, ISREC Swiss Institute for Experimental Cancer Research http://www.epd.isb-sib.ch/ An annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis.
ENCODE ENCyclopedia of DNA Elements, NHGRI http://www.genome.gov/10005107 Before the best use of the information contained in the [human genome] sequence can be made, the identity and precise location of all of the protein-encoding and non-protein-encoding genes will have to be determined. The identity of other functional elements encoded in the DNA sequence, such as promoters and other transcriptional regulatory sequences, along with determinants of chromosome structure and function, such as origins of replication, also remain largely unknown. A comprehensive encyclopedia of all of these features is needed to fully utilize the sequence to better understand human biology, to predict potential disease risks, and to stimulate the development of new therapies to prevent and treat these diseases.
Entrez Programming Utilities (E-utilities) are a set of eight server-side programs that provide a stable interface into the Entrez query and database system at the National Center for Biotechnology Information (NCBI). The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data. The E-utilities are therefore the structured interface to the Entrez system, which currently includes 38 databases covering a variety of biomedical data, including nucleotide and protein sequences, gene records, three-dimensional molecular structures, and the biomedical literature. http://www.ncbi.nlm.nih.gov/books/NBK25501/
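The "fixed URL syntax" described above can be illustrated by assembling request URLs from a utility name and a set of parameters. This is a minimal sketch that only builds the URLs and makes no network request. The base address and the `esearch`/`efetch` utilities with their `db`, `term`, `retmax`, `id` and `retmode` parameters follow NCBI's published E-utilities conventions; the helper name `eutils_url`, the search term and the record ID `12345` are placeholders chosen for illustration.

```python
from urllib.parse import urlencode

# Base address of the E-utilities; each utility is a separate .fcgi program.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility, **params):
    """Build an E-utilities URL: <base><utility>.fcgi?<key=value&...>."""
    return f"{EUTILS_BASE}{utility}.fcgi?{urlencode(params)}"


# ESearch: find record IDs in one Entrez database that match a text query.
search_url = eutils_url(
    "esearch", db="pubmed", term="protein domain database", retmax=5
)

# EFetch: retrieve a record by ID (12345 is a placeholder, not a real result).
fetch_url = eutils_url("efetch", db="pubmed", id="12345", retmode="xml")

if __name__ == "__main__":
    print(search_url)
    print(fetch_url)
```

Keeping URL construction in one helper means the same function serves every Entrez database; only the `db` value and the utility name change between calls.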
Entrez Gene, NCBI http://www.ncbi.nih.gov/entrez/query.fcgi?db=gene Provides a unified query environment for genes defined by sequence and/or in NCBI's Map Viewer.
Entrez Genomes, NCBI, US http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome
Entrez Nucleotides, NCBI, US http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide A collection of sequences from several sources, including GenBank, RefSeq, and PDB. The number of bases grows at an exponential rate.
Entrez Proteins, NCBI, US http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein The protein entries in the Entrez search and retrieval system have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq.
ENZYME, ExPASy, Switzerland http://au.expasy.org/enzyme/ Enzyme nomenclature database
EpoDB Erythropoiesis database, CBIL (Computational Biology & Informatics Lab), Univ. of Pennsylvania, US http://www.cbil.upenn.edu/EpoDB/index.html A database of genes that relate to vertebrate red blood cells. It includes DNA sequence, structural features, protein information, gene expression information and transcription factor binding sites.
ExPASy (Expert Protein Analysis System), Swiss Institute of Bioinformatics, Switzerland http://au.expasy.org/ Proteomics server
Express DB, George Church Lab, Harvard Medical School, US http://arep.med.harvard.edu/ExpressDB/ A relational database for maintaining yeast RNA expression data. It is intended as a demonstration of how such data can be managed, and of the benefits such management confers. As of July 1999, over 17.5 million pieces of information have been loaded into ExpressDB, deriving from 11 source studies. The EXD web query system allows data from multiple source studies to be retrieved to user specifications and collated by ORF name. A manuscript on ExpressDB, the data loaded into it, and how it may be analyzed, has been submitted for publication.
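Collating measurements from several source studies by ORF name, as the EXD query system is described as doing, can be sketched with plain dictionaries. This is an in-memory illustration of the collation step only; it does not reflect ExpressDB's actual relational schema or query interface. The study names and expression values are hypothetical, and the ORF identifiers merely follow the usual yeast systematic naming pattern.

```python
from collections import defaultdict


def collate_by_orf(studies):
    """Regroup {study: {orf: value}} into {orf: {study: value}}."""
    collated = defaultdict(dict)
    for study, measurements in studies.items():
        for orf, value in measurements.items():
            collated[orf][study] = value
    return dict(collated)


# Hypothetical expression values from two hypothetical source studies.
STUDIES = {
    "study_1": {"YAL001C": 1.2, "YBR020W": 8.5},
    "study_2": {"YAL001C": 0.9, "YCL018W": 3.3},
}

if __name__ == "__main__":
    for orf, by_study in sorted(collate_by_orf(STUDIES).items()):
        print(orf, by_study)
```

An ORF measured in only one study simply has a single entry, so missing measurements stay visible rather than being filled with a default.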
FlyBase See Berkeley Drosophila Genome Project
FlyView, Univ. Muenster, Germany http://pbio07.uni-muenster.de/ Image database on Drosophila development and genetics, especially expression patterns of genes
GDB Genome DataBase, Hospital for Sick Children, Toronto, Canada http://www.gdb.org Genomic maps, genes, YACs and amplimers. 12 mirror sites: http://www.gdb.org/gdb/contact.html#nodes
GGEG Global Gene Expression Database, MD Anderson Cancer Center http://sciencepark.mdanderson.org/ggeg/default.html Human mRNA sequence data specific to the RAGE and SAGE techniques, general mRNA information.
GOBASE Organelle Genome Database, Univ. of Montreal, Canada http://megasun.bch.umontreal.ca/gobase/ A taxonomically broad organelle genome database that organizes and integrates diverse data related to organelles. The current version focuses on the mitochondrial subset of data. In its second phase, GOBASE will also include information on chloroplasts and representative bacteria that are thought to be specifically related to the bacterial ancestors of mitochondria and chloroplasts.
GPCRDB: Information system for G Protein-Coupled Receptors (GPCRs), Univ. of Nijmegen, UCSF, EBI, IPSI, Leiden/Amsterdam Center for Drug Research, SWISS-PROT, tinyGRAP http://www.gpcr.org/ Data is collected for the following G protein-coupled receptor families: Class A. Receptors related to Rhodopsin and the adrenergic receptor; Class B. Receptors related to the Calcitonin and PTH/PTHrP Receptors; Class C. Receptors related to the Metabotropic Receptors; Class D. Receptors related to the pheromone Receptors; Class E. Receptors related to the cAMP Receptors; Non-GPCR molecules (e.g., G proteins, halorhodopsins, etc.).
GSDB See Genome Sequence DataBase
GSS Genome Survey Sequences, NCBI, US http://www.ncbi.nlm.nih.gov/dbGSS/ The GSS division of GenBank is similar to the EST division, except that its sequences are genomic in origin, rather than cDNA (mRNA). The GSS division contains (but is not limited to) the following types of data: random "single pass read" genome survey sequences, cosmid/BAC/YAC end sequences, exon trapped genomic sequences, Alu PCR sequences.
GXD: Gene Expression Database, Jackson Laboratory, US http://www.informatics.jax.org/mgihome/GXD/gxdgen.shtml#concept Gene expression data on the laboratory mouse.
GadFly Genome Annotation Database, Berkeley Drosophila Genome Project http://www.fruitfly.org/annot/index.html Genome annotations
GenAtlas Univ. Rene Descartes, France http://genatlas.org/ Genes, phenotypes and markers for humans.
GenBank, NCBI, US http://www.ncbi.nlm.nih.gov/Genbank/ NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. Mirrored at EMBL and DDBJ. Currently estimated (early 2000) that over 2 million bases are deposited here each day. This growth will only accelerate in the future. Begun in the 1980s by DOE. Cross reference DDBJ and EMBL. See also Sequencing Glossary.
GeneCards, Weizmann Institute, Israel http://www.genecards.org/index.shtml Numerous mirrored sites, database of human genes, their products and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others.
Gene Census system, Yale University, US http://bioinfo.mbb.yale.edu/genome/ Comprehensive statistical accounting of protein structural features in genomes and sequence databanks.
Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ A high-throughput gene expression / molecular abundance data repository, as well as a curated, online resource for gene expression data browsing, query and retrieval.
Gene Index Project, Dana Farber Cancer Institute http://compbio.dfci.harvard.edu/tgi/ The goal of The Gene Index Project is to use the available EST and gene sequences, along with the reference genomes wherever available, to provide an inventory of likely genes and their variants and to annotate these with information regarding the functional roles played by these genes and their products.
Gene Map of the Human Genome, International RH Mapping Consortium http://www.ncbi.nlm.nih.gov/genemap99/ Includes locations of more than 30,000 genes and provides an early glimpse of some of the most important pieces of the genome.
Genetic Annotation Index (GAI) identifies and characterizes the polymorphisms associated with cancer.
GenLink, Washington Univ. St. Louis, US http://www.genlink.wustl.edu/ A multimedia database resource for human genetics and telomere research.
Genome Analysis Pipeline http://grail.lsd.ornl.gov/tools/pipeline/ Purpose: Submit a sequence and get back the results of (1) gene and exon model predictions, (2) GRAIL annotated features, and eventually (3) BLAST analysis. This tool is currently undergoing development and testing. Genome Centers wanting to help us refine the options available through this tool are encouraged to contact us. Hosted by the Computational Biosciences Section, Oak Ridge National Laboratory, Oak Ridge, Tennessee, US.
Genome Annotation Data Warehouse: A computational annotation pipeline is being applied to the genome sequences of human, mouse, and over 23 other organisms. This analysis integrates experimental data and predictions around a genome sequence framework. The data is periodically obtained from the GenBank/EMBL/DDBJ collaboration and processed through a large-scale computational framework consisting of several analysis modules. [Annotated Genomes, Oak Ridge National Lab, TN, US] http://genome.ornl.gov/GCat/
Genome Catalog, Genome Channel, Oak Ridge National Lab, TN, US http://genome.ornl.gov/GCat/ A functional annotation pipeline is being applied to the genome sequences of human, mouse, and over 23 other organisms. This analysis integrates experimental data and predictions around a genome sequence framework. The data is periodically obtained from the GenBank/EMBL/DDBJ collaboration and processed through a large-scale computational framework consisting of several analysis modules. Gene models and other features are predicted. Links are made to other databases and experimental data. The results are stored in the Genome Annotation Data Warehouse. There are two major sets of interrelated interfaces to these annotated genomes and the links to external databases: Genome Catalog, an HTML browsing and querying interface with summary and detail data about annotation organized around chromosomes, contigs, submitted GenBank clones, GenBank annotated genes, GRAIL-EXP gene models, GENSCAN gene models, STS markers, and other features; and Genome Channel, a Java interactive browser interface that provides a rich graphical view of contigs, clones, genes, and other annotation features.
Genome Channel, Oak Ridge National Laboratory, US http://compbio.ornl.gov/channel/ Search by organism (including human), chromosome?
Genotypes DB, Washington Univ. St. Louis, US http://www.genlink.wustl.edu/gtypes/index.html Makes all genotypic data used in the construction of linkage maps presented in GenLink easily accessible through the WWW.
A central depository for mutation collection efforts undertaken in allegiance with the Human Genome Variation Society (HGVS). An attempt to summarize all known sequence variations in the human genome, to facilitate research into how genotypes affect common diseases, drug responses, and other complex phenotypes.
Sequence variations are presented with details of how they are physically and functionally related to the closest neighbouring gene. Records include SNPs, indels, simple tandem repeats, and other sequence alternatives, regardless of location, allele frequencies, or known effect upon phenotype. All records are highly curated and annotated, ensuring maximal utility and data accuracy. Was HGBASE, Human Genic Bi-Allelic Sequence Database.
HIV Structural Reference Database Biotechnology Division, NIST, in conjunction with NCI http://xpdb.nist.gov/hivsdb/hivsdb.html Structural data for proteins involved in making HIV, the virus that causes AIDS, as well as molecules that inhibit these activities.
HOMOLOGENE, NCBI, US http://www.ncbi.nlm.nih.gov/HomoloGene/ A homology resource which includes both curated and calculated orthologs and homologs for genes represented in UniGene and LocusLink for human, mouse, rat, and zebrafish. The curated orthologs include ortholog gene pairs reported in the Mouse Genome Database (MGD) at the Jackson Laboratory, the Zebrafish Information (ZFIN) database at the University of Oregon, and in published reports. The calculated orthologs and homologs are the result of nucleotide sequence comparisons between all UniGene clusters for each pair of organisms. These orthologs and homologs are considered putative since they are based only on sequence comparisons.
HOVERGEN Homologous Vertebrate Genes Database, PBIL (Pôle Bio-Informatique Lyonnais), Univ. Lyons, France http://biom3.univ-lyon1.fr/databases/hovergen.html A database of homologous vertebrate genes, structured under the ACNUC sequence database management system. It allows one to select sets of homologous genes among vertebrate species, and to visualize multiple alignments and phylogenetic trees. Thus HOVERGEN is particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOVERGEN gives an overall view of what is known about a particular gene family. The database itself contains all vertebrate sequences from GenBank (except ESTs), with some data corrected, clarified or completed (notably to address the problem of redundancy). Homologous coding sequences have been classified in gene families, and protein multiple alignments and phylogenetic trees have been computed for each family. Sequences and related information have been structured in an ACNUC database. The database is updated every four months.
HTGS High Throughput Genomic Sequences, NCBI, US http://www.ncbi.nlm.nih.gov/HTGS/ Created to accommodate a growing need to make 'unfinished' genomic sequence data rapidly available to the scientific community, in a coordinated effort between the three International Nucleotide Sequence databases: DDBJ, EMBL, and GenBank. The HTG division contains 'unfinished' DNA sequences generated by the high-throughput sequencing centers. Sequence data in this division are available for BLAST homology searches against either the "htgs" database or the "month" database, which includes all new submissions for the prior month. The HTG division of GenBank was described in a 1997 Genome Research 7(10) article by Ouellette and Boguski.
Human BAC Ends, TIGR, US http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_intro.html Sequences from the ends of bacterial artificial chromosome (BAC) clones provide highly specific markers. A whole genome sequencing approach has been described in a map-as-you-go strategy. The complete sequence of a seed BAC is searched against a BAC end database and the minimally overlapping clones in each direction are selected for sequencing. As coverage increases, BAC end sequences provide samples for whole genome survey. ~743,000 end sequences from 470,000 clones (20 X clone coverage and 12% sequence coverage) have been generated by TIGR, Univ. of Washington and CalTech, providing a sequence marker every 5 kb across the genome.
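The clone-selection step of the map-as-you-go strategy described above can be sketched roughly as follows. This is a minimal illustration, not TIGR's pipeline: it assumes the BAC end hits have already been located on the seed BAC (the database search itself is not shown), and the names `EndHit`, `overlap_with_seed` and `pick_extension_clones` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class EndHit:
    """One BAC end sequence matched on the seed BAC (hypothetical record)."""
    clone: str    # clone identifier
    start: int    # 0-based start of the match on the seed
    end: int      # exclusive end of the match on the seed
    extends: str  # "left" or "right": side on which the clone leaves the seed


def overlap_with_seed(hit: EndHit, seed_length: int) -> int:
    """Number of seed bases the clone is expected to re-cover."""
    if hit.extends == "right":
        return seed_length - hit.start
    return hit.end


def pick_extension_clones(hits: Iterable[EndHit],
                          seed_length: int) -> Dict[str, Optional[EndHit]]:
    """For each direction, keep the clone that overlaps the seed least."""
    best: Dict[str, Optional[EndHit]] = {"left": None, "right": None}
    for hit in hits:
        current = best[hit.extends]
        if current is None or (overlap_with_seed(hit, seed_length)
                               < overlap_with_seed(current, seed_length)):
            best[hit.extends] = hit
    return best
```

Choosing the smallest overlap in each direction keeps redundant sequencing low while still anchoring the next clone to the finished seed.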
Human Mouse Homology Map, NCBI, US http://www.ncbi.nlm.nih.gov/Homology/ The map is now computed by integrating orthologs curated by the Mouse Genome Database with putative orthologs identified by sequence homology. This version of the Human-Mouse Homology map also differs from the previous Davis map by including several new features: reporting representative STS associated with the loci in the map and linked to the dbSTS pages, linking human cytogenetic locations to NCBI's MapViewer, providing alignments of representative sequences via BLAST2, and linking gene symbols to LocusLink.
HSSP homology-derived secondary structure of proteins, EMBL, Germany http://www.sander.embl-heidelberg.de/hssp/ A database of homology-derived secondary structure of proteins (HSSP), built by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability and sequence profile. Tertiary structures of the aligned sequences are implied, but not modelled explicitly.
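To make "sequence profile" and "sequence variability" concrete, here is a minimal sketch that derives both from a set of aligned sequences. It assumes equal-length rows with "-" for gaps and uses Shannon entropy as the variability measure; that measure and the function names are illustrative assumptions, not HSSP's own definitions.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def column_profiles(alignment: Sequence[str]) -> List[Dict[str, float]]:
    """Per-column residue frequencies, ignoring gap characters."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    profiles = []
    for i in range(length):
        counts = Counter(row[i] for row in alignment if row[i] != "-")
        total = sum(counts.values())
        profiles.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profiles


def column_entropy(profile: Dict[str, float]) -> float:
    """Shannon entropy in bits: 0 for a fully conserved column."""
    return -sum(p * log2(p) for p in profile.values() if p > 0)
```

A fully conserved column scores 0 bits, and the score rises as more residue types share the position.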
HUGE Human Unidentified Gene Encoded Large Proteins, Kazusa DNA Research Institute, Japan http://www.kazusa.or.jp/huge/ The HUGE protein database has been created to publicize the fruits of our Human cDNA project at the Kazusa DNA Research Institute. In this project, we plan to sequence and analyze long (>4 kb) human cDNAs and to establish methods that use the sequence data to predict the primary structure of proteins with various biological activities. Currently, we focus on the analysis of cDNA clones encoding particularly large proteins (>50 kDa). The basic concept underlying our project and the strategies employed have been described elsewhere (Ohara et al., 1997). Our HUGE protein database contains various types of information derived from the predicted primary structure data of newly identified human proteins.
HuGE Human Gene Expression Index, Brigham & Women’s Hospital, US http://www.hugeindex.org/ A comprehensive database to understand the expression of human genes in normal human tissues. Currently, RNA expression of more than 6000 genes is obtained using high-density oligonucleotide array technology.
Highwire Press, Stanford Univ., US http://highwire.org Free (and fee-based), full-text science journals.
Human Gene Index, TIGR, US http://www.tigr.org/tdb/hgi/index.html Human EST sequences from TIGR and GenBank.
Human Genome Sequencing (finished, draft, other statistics, progress reports and access to data) http://www.ncbi.nlm.nih.gov/genome/seq/page.cgi?F=HsHome.html&ORG=Hs
IMAGE Consortium: Integrated Molecular Analysis of Genomes and their Expression, Lawrence Livermore National Lab, US http://image.llnl.gov/ Shares high quality arrayed cDNA libraries and places sequence, map and expression data on the clones in these arrays into the public domain. Human and mouse genomes are first to be studied. They anticipate arraying (and sharing) cDNA libraries from other species in time.
IMGT, the international ImMunoGeneTics database, Univ. Montpellier, CNRS, France http://imgt.cines.fr
IXDB Integrated Chromosome X DataBase, Max Planck Institut, Berlin, Germany http://www.molgen.mpg.de/~xteam/ The purpose of IXDB is to provide an integrated view of the X chromosome mapping field. Ultimately this will allow the construction of an integrated map that will take into account all the data generated by the community, including physical, genetic, transcript and sequence information. This implies acquiring, understanding and formatting an enormous amount of experimental results and can only be accomplished progressively. We have chosen to start the integration process with YAC maps generated by the community. These provide the basis for future higher resolution physical maps, as well as emerging transcript and sequence maps. The current content of IXDB therefore reflects this situation, with the emphasis placed on YAC mapping data. Due to their immediate value, IXDB has also started to systematically include bacterial clone contig maps and EST data. Currently IXDB does not store sequence data, although links to nucleic sequence databases are provided.
Induced Mutant Resource IMR, Jackson Laboratories
InBase: Intein Database, New England Biolabs, US http://www.neb.com/neb/inteins.html Perler, F. B. InBase, the Intein Database. Nucleic Acids Res. 30, 383-384, 2002
Interactive Fly, Purdue Univ., US http://flybase.bio.indiana.edu/allied-data/lk/interactive-fly/aimain/1aahome.htm A cyberspace guide to Drosophila genes and their roles in development, including pathways.
International Nucleotide Database: Composed of DDBJ, EMBL and GenBank. Often - but inaccurately - referred to as GenBank.
KEGG Pathway Database, http://www.genome.ad.jp/kegg/ Links to pathway and other databases (metabolic and regulatory) http://www.genome.jp/kegg/pathway.html See also note on KEGG under Metabolic engineering glossary pathways
Kabat Database of Sequences of Proteins of Immunological Interest http://www.kabatdatabase.com/
KeyNet, Consiglio Nazionale delle Ricerche, Italy. A database of keywords extracted from the EMBL and GenBank databases. The KeyNet structure is based on biological criteria intended to assist the user in data searching and to minimize the risk of loss of information.
Klotho, Toni Kazic, Washington Univ. US http://www.biocheminfo.org/klotho/ Part of our attempt to model biological processes, beginning with biochemistry. I call the whole project Moirai, after the three Fates of antiquity, since fundamentally these are questions about the fates of molecules and cells.
LIGAND database, Institute for Chemical Research, Kyoto Univ. Japan http://www.genome.ad.jp/dbget-bin/www_bfind?ligand Enzymes, compounds and reactions.
Life Seq, Incyte Genomics, US http://www.incyte.com/
LocusLink, NCBI, US http://www.ncbi.nlm.nih.gov/LocusLink/ A single query interface to curated sequence and descriptive information about genetic loci. It presents information on official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites. Note that NCBI LocusLink was replaced by Entrez Gene in 2005
MAGEST, GenomeNet, Japan http://www.genome.ad.jp/magest/ Expression patterns and sequence tags for maternal mRNAs of the ascidian egg, Halocynthia roretzi.
MGD See Mouse Genome Database
MIPS Munich Information Center for Protein Sequences, Germany http://mips.gsf.de/ We are a bioinformatics group of the GSF (National Research Center for Environment and Health) at the Max-Planck-Institut f. Biochemie. MIPS is a member of PIR-International (Protein Identification Resource) and of EMBNET (European Molecular Biological Network).
MIRAGE (Molecular Informatics Resource for the Analysis of Gene Expression), Institute for Transcriptional Informatics, Pittsburgh PA, US http://www.isbi.net Experimental web resource dedicated to the study of gene expression.
MITOMAP, Emory Univ., US http://www.mitomap.org/ A human mitochondrial genome database. A compendium of polymorphisms and mutations of the human mitochondrial DNA.
MKMD Mouse Knockout and Mutation Database, BioMedNet, Current Biology http://research.bmn.com/mkmd Phenotypic information related to knockout and classical mutations in mice. It includes extensive links to MEDLINE on BioMedNet. MKMD was originally created from tables published over 3 issues of Current Biology (Brandon EP, Idzerda R.L., McKnight, G.S.: Current Biology (1995) 5: 569-694; 627-634; 873-881). The database has been expanded to include gene insertion mutations and classical mutants whose molecular nature has been identified.
MMDB Molecular Modeling DataBase, NCBI, US http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure 3D macromolecular structures, including proteins and polynucleotides. MMDB contains over 28,000 structures and is linked to the rest of the NCBI databases, including sequences, bibliographic citations, taxonomic classifications, and sequence and structure neighbors.
ModBase, Univ. of California-San Francisco, US http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi Three-dimensional protein models calculated by comparative modeling.
MOT SEE Genome MOT
Mammalian Gene Collection, NCBI, US http://mgc.nci.nih.gov/ The goal of the Mammalian Gene Collection (MGC) is to provide a complete set of full-length (open reading frame) sequences and cDNA clones of expressed genes for human and mouse. The MGC is an NIH initiative that supports the production of cDNA libraries, clones and sequences.
Medline See PubMed
Mitelman DataBase of Chromosome Aberrations in Cancer, CGAP, NCI, US http://cgap.nci.nih.gov/Chromosomes/Mitelman Relates chromosomal aberrations to tumor characteristics, based either on individual cases or associations. All the data have been manually culled from the literature by Felix Mitelman, Bertil Johansson, and Fredrik Mertens.
Molecular Anatomy and Pathology Database TM, Large Scale Biology Corp., US
Molecular Effects of Drugs Database TM, Large Scale Biology Corp.
Molecular Probe Data Base MPDB, Advanced Biotechnology Center of Genoa, Italy http://www.biotech.ist.unige.it/interlab/mpdb.html Information on about 4,300 synthetic oligonucleotides with a sequence of up to 100 nucleotides. Data are mainly taken from the literature and are encoded on the basis of controlled vocabularies.
Mouse Atlas and Gene Expression Database, Human Genetics Unit, Medical Research Council (MRC), Edinburgh, UK http://genex.hgu.mrc.ac.uk/ Not yet available 11/2/00. A digital atlas of mouse development and a database intended as a resource for spatially mapped data such as in situ gene expression and cell lineage. The project is in collaboration with the Department of Anatomy, University of Edinburgh. The gene expression database is being developed as part of the Mouse Gene Expression Information Resource (MGEIR) in collaboration with the Jackson Laboratory, USA.
Mouse Gene Expression Information Resource (MGEIR) http://genex.hgu.mrc.ac.uk/MouseGeneExpInfoRes/ The gene-expression resource is a collaborative project to produce a single gene-expression resource database for the research community. This resource will be directly linked to the Mouse Genome Database at the Jackson Laboratory. Database design and development is centered at the MRC Human Genetics Unit and the Jackson Laboratory, with biological and technical support from the Department of Anatomy, the ESF Embryonic Databases Network and other collaborating sites. For further details see Ringwald et al., Science 265: 2033-2034, Sept. 30, 1994.
Mouse Genome Database MGD See Mouse Genome Informatics
Mouse Genome Informatics, Jackson Laboratory, US http://www.informatics.jax.org/mgihome/ Provides integrated access to data on the genetics, genomics and biology of the laboratory mouse. The projects contributing to this resource are: Mouse Genome Database (MGD), Gene Expression Database (GXD), Mouse Genome Sequence (MGS).
Mouse Phenome Database, Jackson Labs, US
NIST ATP Funded Projects, National Institute of Standards and Technology, US http://jazz.nist.gov/atpcf/prjbriefs/listmaker.cfm
Nucleic Acids Database NDB, Rutgers Univ., US http://ndbserver.rutgers.edu/ Assembles and distributes structural information about nucleic acids. See also Protein Data Bank PDB
OMIA Online Mendelian Inheritance in Animals, Univ. of Sydney, Australia http://www.angis.org.au/Databases/BIRX/omia/ A database of the genes and phenes* that have been documented in a wide range of animal species other than those for which databases already exist (human, rat and mouse). It is modelled on, and is complementary to, McKusick's Mendelian Inheritance in Man (MIM).
* A phene is a word or words that identify a familial trait. For single-locus traits, the word(s) correspond to one of the phenotypes that arise from segregation at that locus. For example, CITRULLINAEMIA is the phene for the ARGININOSUCCINATE SYNTHETASE locus; and FECUNDITY, BOOROOLA is the phene for a locus that has not yet been identified at the biochemical/molecular level. OMIA also includes multifactorial traits and disorders. Thus, for example, HIP DYSPLASIA is a phene.
OMIM, Online Mendelian Inheritance in Man, NCBI, US http://www.ncbi.nlm.nih.gov/Omim/searchomim.html Gene maps (cytogenetic locations of genes described in OMIM) and morbid maps (alphabetical list of diseases described in OMIM and their corresponding cytogenetic locations). [from the OMIM FAQ]
OMIM Locus Specific Mutation Databases, NCBI, US http://www.ncbi.nlm.nih.gov/Omim/Index/mutation.html Links to a number of locus specific mutation databases.
ooTFD object oriented Transcription factors and gene expression, Institute for Transcriptional Informatics IFTI, US http://www.ifti.org/cgi-bin/ifti/ootfd.pl A successor to TFD (Transcription Factors Database), now referred to as rTFD (relational Transcription Factors Database). ooTFD has been implemented in a number of object-oriented database management systems, including ROL (Rule-based Object Language), MOOD (Materials object-oriented database), and the pure Java object database ozone.
PDB Protein Data Bank, Research Collaboratory for Structural Bioinformatics http://www.rcsb.org/ 3D macromolecular structural data. Incorporates NDB Nucleic Acid Database Project, Rutgers.
PEDB Prostate ESTs, Fred Hutchinson Cancer Research Center, US http://www.pedb.org/ A curated relational database and suite of analysis tools designed for the study of prostate gene expression in normal and disease states. Expressed Sequence Tags (ESTs) and full-length cDNA sequences derived from more than 40 human prostate cDNA libraries are maintained and represent a wide spectrum of normal and pathological conditions.
PIR Protein Information Resource, NBRF, Georgetown Univ. Medical Center, US http://pir.georgetown.edu/pirwww/ The Protein Information Resource (PIR), in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japanese International Protein Sequence Database (JIPID), maintains the PIR-International Protein Sequence Database, a comprehensive, annotated, and non-redundant protein sequence database in which entries are classified into family groups and alignments of each group are available.
PIR-NRL3D http://pir.georgetown.edu/pirwww/dbinfo/nrl3d.html The Sequence-Structure Database is produced by PIR-International from sequence and annotation information extracted from three-dimensional structures in the Protein Databank (PDB). The PIR-NRL3D database makes the sequence information in PDB available for similarity searches and retrieval and provides cross-reference information for use with the other PIR Protein Sequence Databases.
PMD Protein Mutant DataBase, National Institute of Genetics, Japan http://pmd.ddbj.nig.ac.jp/~pmd/
PROSITE, Swiss Institute of Bioinformatics http://au.expasy.org/prosite/ A database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs
PUMA, Phylogeny of Unicellular organisms Metabolic pathways Alignments SEE WIT which supersedes PUMA.
PRIMATOR databases, Munich Information Center for Protein Sequences, Germany http://mips.gsf.de/projects/cdna A pipeline for the analysis, annotation and presentation (PRIMATOR) of so far uncharacterized orangutan (Pongo pygmaeus) and human transcripts
Proteome Analysis, European Bioinformatics Institute http://www.ebi.ac.uk/proteome/ Set up to provide comprehensive statistical and comparative analyses of the predicted proteomes of fully sequenced organisms.
Pfam (from SWISS-PROT and TrEMBL) http://pfam.wustl.edu/ and various European mirror sites including EBI, UK http://www.sanger.ac.uk/Software/Pfam/ and Sweden http://www.cgr.ki.se/Pfam/ A database of multiple alignments of protein domains or conserved protein regions. Hopefully they represent some evolutionarily conserved structure, which has implications for the protein's function. Pfam is actually formed in two separate ways. Pfam-A are accurate human-crafted multiple alignments, whereas Pfam-B is an automatic clustering of the rest of SWISS-PROT and TrEMBL using the program Domainer.
Prints, University College London, UK http://www.biochem.ucl.ac.uk/bsm/dbbrowser/PRINTS/PRINTS.html Compendium of protein fingerprints.
ProDom, INRA, France http://protein.toulouse.inra.fr/prodom.html Protein domain database.
Proteome BioKnowledge Library, BioBase, US http://www.proteome.com/ Biological information about proteins.
proWeb Project, Fred Hutchinson Cancer Research Center, US http://www.proweb.org/ Web-based protein family documentation, links to protein and protein families databases and links to specific protein family websites.
PubGene, Univ. of Oslo http://www.pubgene.uio.no/ Expression analysis and text association.
PubMed Central, NCBI http://www4.ncbi.nlm.nih.gov/PubMed/ Medline
PubRef See CrossRef
REBASE, Restriction Enzyme DataBase, New England Biolabs http://rebase.neb.com/rebase/rebcit.html A collection of information about restriction enzymes and related proteins. It contains published and unpublished references, recognition and cleavage sites, isoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. DNA methyltransferases, homing endonucleases, nicking enzymes, specificity subunits and control proteins are also included. Putative DNA methyltransferases and restriction enzymes, as predicted from analysis of genomic sequences, are also listed. REBASE is updated daily and is constantly expanding.
RGD Rat Genome Database, Medical College of Wisconsin, US http://rgd.mcw.edu/ The goal is the establishment of a Rat Genome Database to collect, consolidate, and integrate data generated from ongoing rat genetic and genomic research efforts and make these data widely available to the scientific community. A secondary but critical goal is to provide curation of mapped positions for quantitative trait loci, known mutations and other phenotypic data.
RNA Abundance Database (RAD), CBIL, Univ. of Pennsylvania, US http://www.cbil.upenn.edu/RAD/ A public gene expression database designed to hold data from array-based (microarrays, high-density oligo arrays, macroarrays) and non-array-based (SAGE) experiments. The ultimate goal is to allow comparative analysis of experiments performed by different laboratories using different platforms and investigating different biological systems. To achieve this goal, RAD contains precise descriptions of the experiments and distinctions between raw data and processed results. In addition, a gene index is used to integrate array elements and gene tags. The selection of experiments to include in RAD will be directed by our research interests and those of our collaborators, such as hematopoiesis.
RNA World, IMB, Jena, Germany http://www.imb-jena.de/RNA.html Links on RNA related topics.
RNAi Database, New York Univ., US http://nematoda.bio.nyu.edu/ RNAi phenotypic data in C. elegans.
Rat Genome Data, Medical College of Wisconsin, US http://rgd.mcw.edu/
RatMap Rat Genome Database, Goteborg University, Sweden http://ratmap.gen.gu.se/ Locus queries, homology (mouse/rat) nomenclature, linkage and physical maps, gene mapping data.
RefSeq Reference Sequences, NCBI, US http://www.ncbi.nlm.nih.gov/RefSeq/ Aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms. RefSeq standards serve as the basis for medical, functional, and diversity studies; they provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses. RefSeqs are used as a reagent for the functional annotation of some genome sequencing projects, including those of human and mouse.
Research Collaboratory for Structural Bioinformatics RCSB See Protein DataBank
SAGEmap, NCBI, US http://www.ncbi.nlm.nih.gov/SAGE/ Serial Analysis of Gene Expression, or SAGE, is an experimental technique designed to gain a quantitative measure of gene expression. The SAGE technique itself includes several steps utilizing molecular biological, DNA sequencing and bioinformatics techniques. These steps have been used to produce 9 or 10 base "tags", which are then, in some manner, assigned gene descriptions
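The tag-to-gene step mentioned above can be illustrated with a short sketch. It assumes the commonly used NlaIII anchoring site (CATG) with the tag taken immediately 3' of the last site in a transcript, and a 10-base tag; the function names and the lookup table are hypothetical and do not reproduce SAGEmap's actual mapping procedure.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

ANCHOR = "CATG"   # assumed anchoring-enzyme site (NlaIII)
TAG_LENGTH = 10   # assumed tag length; the entry above mentions 9 or 10 bases


def extract_tag(transcript: str, anchor: str = ANCHOR,
                tag_length: int = TAG_LENGTH) -> Optional[str]:
    """Return the tag following the 3'-most anchor site, or None if absent."""
    seq = transcript.upper()
    site = seq.rfind(anchor)
    if site == -1:
        return None
    start = site + len(anchor)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def annotate(observed_tags: Iterable[str],
             tag_to_gene: Dict[str, str]) -> Dict[str, Tuple[int, str]]:
    """Count each observed tag and attach a gene description where known."""
    counts = Counter(observed_tags)
    return {tag: (n, tag_to_gene.get(tag, "unassigned"))
            for tag, n in counts.items()}
```

In this sketch the lookup table would be built by running `extract_tag` over known transcripts, and the tag counts from an experiment give the quantitative expression measure.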
SBASE Protein Domain Library, ICGEB, International Centre for Genetic Engineering and Biotechnology, Italy http://hydra.icgeb.trieste.it/sbase/ Annotated protein sequence segments (structural, functional, ligand binding and topogenic). Designed to facilitate detection of domain homologies.
SBIR Small Business Innovation Research Awards SEE CRISP
SCOP: Structural Classification of Proteins, University of Cambridge UK http://scop.mrc-lmb.cam.ac.uk/scop/ SCOP mirrors http://scop.mrc-lmb.cam.ac.uk/scop/mirrors.html Reference: Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.
SGD Saccharomyces Genome Database, Stanford University http://www.yeastgenome.org/
SMART (a Simple Modular Architecture Research Tool) EMBL, Heidelberg, Germany http://smart.ox.ac.uk/ Allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.
SNP Consortium Data Release, SNP Consortium Ltd. http://snp.cshl.org/ Single Nucleotide Polymorphisms for Biomedical Research Final data release 2004
SPAD Signaling Pathway Database, Institute of Genetic Resources, Kyushu Univ., Japan http://www.grt.kyushu-u.ac.jp/eny-doc/index.html Integrated database for genetic information and signal transduction systems.
SRS Sequence Retrieval System http://www.lionbio.co.uk/publicsrs.html The URL has a list of public SRS servers, including EBI, DDBJ, INFOBIOGEN and EMBL. SRS, developed initially as an academic system, is probably the best biological database browsing tool available. SRS allows you to browse the contents of databases through a web interface, exploring links to other databases and launching other programs on the retrieved database records.
SWISS 2D PAGE, Swiss Institute of Bioinformatics http://au.expasy.org/ch2d/ Data on proteins identified on various 2-D PAGE reference maps.
SWISS 3D Image, ExPASy, Switzerland http://au.expasy.org/sw3d/ An image database which strives to provide high quality pictures of biological macromolecules with known three-dimensional structure. The database contains mostly images of experimentally elucidated structures, but also provides views of well accepted theoretical protein models.
SWISS-PROT, ExPASy (Expert Protein Analysis System) Swiss Institute of Bioinformatics A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases. See UniProt.
Saccharomyces Genome Deletion Project http://sequence-www.stanford.edu/group/yeast_deletion_project/deletions3.html
Stanford MicroArray Database, Stanford Univ., US http://genome-www4.stanford.edu/MicroArray/SMD/ Stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, SMD provides interfaces for data retrieval, analysis and visualization. Includes a biological ontology.
TAMBIS, University of Manchester UK
TC-DB Transport Classification Database, Univ. of California, San Diego, http://tcdb.ucsd.edu/tcdb/ A comprehensive classification system for membrane transport proteins
TRIPLES TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces, Yale Univ., US http://ycmi.med.yale.edu/YGAC/triples.htm Defined mutant alleles for the analysis of disruption phenotypes, protein localization and gene expression in Saccharomyces cerevisiae.
TrEMBL, Swiss Institute of Bioinformatics, European Bioinformatics Institute UK A computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT. See UNI-PROT Knowledgebase
Taxonomy, NCBI, US See Nomenclature
ToxExpress See under GeneExpress
Transterm, Univ. of Otago, New Zealand http://uther.otago.ac.nz/Transterm.html Database of sequence contexts about the stop and start codons of many species found in GenBank. TransTerm also contains codon usage data for these same species and summary statistics for the sequences analysed.
UM-BBD University of Minnesota Biocatalysis/Biodegradation Database, US http://umbbd.ahc.umn.edu/index.html Information on microbial biocatalytic reactions and biodegradation pathways for primarily xenobiotic chemical compounds. The goal of the UM-BBD is to provide information on microbial enzyme-catalyzed reactions that are important for biotechnology. The reactions covered are studied for basic understanding of nature, biocatalysis leading to specialty chemical manufacture, and biodegradation of environmental pollutants. Individual reactions and metabolic pathways are presented with information on the starting and intermediate chemical compounds, the organisms that transform the compounds, the enzymes, and the genes.
UniGene, NCBI, US http://www.ncbi.nlm.nih.gov/UniGene/index.html An experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location. Well-characterized genes and ESTs.
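Partitioning sequences into non-overlapping clusters can be illustrated with a generic single-linkage sketch based on union-find. The entry above does not describe UniGene's actual procedure, so this is only an assumed illustration: the pairwise `links` (for example, significant sequence overlaps) are taken as given, and the function name `cluster` is hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def cluster(ids: Sequence[str],
            links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group ids so that any two linked sequences share a cluster."""
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in groups.values())
```

Every sequence ends up in exactly one cluster, which is what makes the resulting set non-redundant at the cluster level.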
UNI-PROT Knowledgebase Universal Protein Resource, EBI, SIB, Georgetown Univ. http://www.expasy.uniprot.org/ Central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR.
UniVec, NCBI, US http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html A database that can be used to quickly identify segments within nucleic acid sequences which may be of vector origin (vector contamination) ... In addition to vector sequences, UniVec also contains sequences for those adapters, linkers and primers commonly used in the process of cloning cDNA or genomic DNA.
V Base: the database of human antibody genes, Centre for Protein Engineering, Medical Research Council, UK
VecScreen, NCBI, US http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html A system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. NCBI developed VecScreen to combat the problem of vector contamination in public sequence databases.
WIT2 What Is There, Argonne National Lab, US http://wit.mcs.anl.gov/WIT2/ Attempts to produce metabolic reconstructions (models of the metabolism of the organism derived from sequence, biochemical, and phenotypic data) for sequenced (or partially sequenced) genomes. For each organism, a table connecting genes (ORFs) to hypothesized functional roles is included. "being transferred to new server" July 2004
Worm Chip Directory, Stanford University, US http://cmgm.stanford.edu/~kimlab/wmdirectorybig.html
ZFIN Zebrafish Information Network, University of Oregon, US http://zfish.uoregon.edu/
Software includes BEAUTY, BLAST, CLUSTALW, DBGET, DBSearching, browsing and analysis tools, Dbsolve, Entrez, ExPASy, Fasta, Gene Identification Software Sites GRAIL, Gapped BLAST, MedMiner, Proteomic tools, PSI-BLAST, SMART (Simple Modular Architecture Research Tool), SWISS-Model, Yeast Tools, WWW Promoter Scan
BEAUTY: BLAST Enhanced Alignment Utility: http://searchlauncher.bcm.tmc.edu/ See also Sequencing glossary
BLAST (Basic Local Alignment Search Tool): Software program from NCBI for searching public databases for homologous sequences or proteins. Designed to explore all available sequence databases regardless of whether query is protein or DNA. http://www.ncbi.nlm.nih.gov/BLAST/ See also Sequencing glossary
Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml A helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez retrieval service.
Electronic PCR, NCBI http://www.ncbi.nlm.nih.gov/STS PCR-based STSs have been used as landmarks for construction of various types of genomic maps. Using e-PCR these sites can be detected in DNA sequences, potentially allowing their map locations to be determined.
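The idea of detecting an STS in a DNA sequence from its primer pair can be sketched as follows. This is a simplified illustration using exact string matching on one strand; it is not NCBI's e-PCR implementation, and the function names and the `margin` parameter are assumptions made for the example.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sts(sequence: str, forward: str, reverse: str,
             expected_size: int, margin: int = 50) -> List[Tuple[int, int]]:
    """Return (start, product_size) for primer pairs at a plausible spacing."""
    seq = sequence.upper()
    fwd = forward.upper()
    rev_rc = reverse_complement(reverse)  # reverse primer as seen on this strand
    hits = []
    start = seq.find(fwd)
    while start != -1:
        rev_pos = seq.find(rev_rc, start + len(fwd))
        while rev_pos != -1:
            size = rev_pos + len(rev_rc) - start
            if abs(size - expected_size) <= margin:
                hits.append((start, size))
            rev_pos = seq.find(rev_rc, rev_pos + 1)
        start = seq.find(fwd, start + 1)
    return hits
```

A hit places the STS at a known offset in the query sequence, which is what allows the sequence to be tied to the marker's map location.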
GeneSpring™ , Silicon Genetics, US http://www.sigenetics.com/Products/GeneSpring/index.html Software, an analytical workbench enabling scientists to visualize and manipulate gene expression data. Experimental data from microarrays, Affymetrix chips, SAGE, or any technique that associates numbers with genes can easily be imported for rigorous analysis
MedMiner http://discover.nci.nih.gov Text-mining tool for gene expression profiling
ORF Finder, NCBI, US http://www.ncbi.nlm.nih.gov/gorf/gorf.html Gene prediction.
PredictProtein Server http://www.embl-heidelberg.de/predictprotein/predictprotein.html Service for sequence analysis and protein structure prediction. A Neural Network based prediction server, which automatically builds a multiple sequence alignment from the most recent version of SwissProt. Ab initio secondary structure prediction.
Protein Explorer http://www.umass.edu/microbio/chime/explorer/ Supersedes RasMol.
Protein Structure 2/3D Structure Prediction & Databases, CMS Molecular Biology Resource, San Diego Supercomputer Center, US http://restools.sdsc.edu/biotools/biotools9.html
Proteomics Tools, ExPASy http://au.expasy.org/tools/
RasMol homepage [Macromolecular structure viewer] See Protein Explorer which is now recommended as easier to use and more powerful than RasMol.
SEQUEST http://fields.scripps.edu/sequest/ Correlates uninterpreted tandem mass spectra of peptides with amino acid sequences from protein and nucleotide databases. SEQUEST will determine the amino acid sequence and thus the protein(s) and organism(s) that correspond to the mass spectrum being analyzed. [Jimmy Eng, John Yates "SEQUEST HomePage" Scripps Research Institute, 1999]
SMART (Simple Modular Architecture Research Tool), EMBL, Heidelberg, Germany http://smart.embl-heidelberg.de/help/smart_glossary.shtml
SWISS-MODEL Repository, Swiss Institute of Bioinformatics and Biozentrum, Basel http://swissmodel.expasy.org/repository/ A database of annotated three-dimensional comparative protein structure models generated by the fully automated homology-modelling pipeline SWISS-MODEL.
Data mining tools
Subject Index to Databases & software
Alleles See Variations
Bibliographic (and full text)
cDNA & clones See also Genes, Gene expression
Comparative genomics See also Gene Expression, Model organisms
Database directories, life sciences
Diseases See also Genes, Variations
Drugs and pipelines
EST See cDNA
Gene expression See also cDNA, Comparative Genomics, Genes
Gene sequences See Sequences
Genes See also Gene Expression, Variations
Genomes See also Comparative genomics, Sequences
Interactions, genetic & molecular See also Pathways
Maps and mapping See also Genes, Variations
Microarrays See also Gene expression
Model organisms See also Comparative genomics
Molecular Modeling See also Proteins
Mutations See Variations
Nomenclature and Systematics/Taxonomy See Nomenclature glossary
Pathways See also Interactions
Phenotype Databases See also Gene expression, Model Organisms
Plasmids See Entrez Genomes
Polymorphisms See Variations
Probes & primers
Protein sequences See Sequences
Protein structure prediction software
Proteins and Protein structures See also Interactions, Sequences
Protein domains (tertiary structures)
Research in Progress
SNPs See Variations
Transgenics See also Model Organisms, Variations
Variations (alleles, mutations, polymorphisms, SNPs) See also Diseases, Genes, Gene Expression