BioPharmaceutical Databases directory
Related glossaries include Bioinformatics (see definitions of databases and various narrower terms) and Chemoinformatics.
This webpage isn't totally up to date.
Life science databases, Wikipedia https://en.wikipedia.org/wiki/List_of_biological_databases
NCBI Handbook, guide, National Center for Biotechnology Information, NLM, NIH, 2013. Databases and other NCBI Resources https://www.ncbi.nlm.nih.gov/books/NBK143764/
Molecular, Cellular and Developmental Biology databases, University of Michigan Library Research guide https://guides.lib.umich.edu/molecular
Bibliographic/journals, e-books and tutorials
This is not a comprehensive catalog of databases. Both public and
proprietary databases are included. Many proprietary databases may make
special arrangements for academic users. Please consult individual
websites for details. The dividing line between databases, software and
integrated systems gets blurrier all the time.
DATABASES
2D PAGE databases index http://au.expasy.org/ch2d/2d-index.html
3Dee Database of Protein Domain Definitions, EBI, UK http://www.compbio.dundee.ac.uk/3Dee/
Structural
domain definitions for all protein chains in the Brookhaven Protein Databank
(PDB) that have 20 or more residues and are not theoretical models [listed
here]. In addition, the domains have been clustered on sequence similarity and
structural similarity. The resulting families are stored as a hierarchy.
ALFRED Allele Frequency Database, Kidd Lab, Yale University, US http://alfred.med.yale.edu/alfred/index.asp
Amino Acid Index AAI, GenomeNet, Japan http://www.genome.ad.jp/dbget/aaindex.html
An
amino acid index is a set of 20 numerical values representing any of the
different physicochemical and biological properties of amino acids. The AAindex1
section of the Amino Acid Index Database is a collection of published indices
together with the result of cluster analysis using the correlation coefficient
as the distance between two indices.
ArrayExpress, EBI, UK http://www.ebi.ac.uk/arrayexpress/
A public repository for microarray-based gene expression data. Currently the EBI is establishing a pilot database containing microarray gene expression data that are available publicly.
BioModels Database, BioModels.net initiative, a collaboration amongst the SBML Team (USA), the EMBL-EBI (United Kingdom), the Systems Biology Group of the Keck Graduate Institute (USA), the Systems Biology Institute (Japan), and JWS Online at Stellenbosch University (South Africa). http://www.ebi.ac.uk/biomodels/
Annotated published models... an effort to develop a data resource that allows biologists to store, search and retrieve published mathematical models of biological interest.
BIOSIS Biological Abstracts https://en.wikipedia.org/wiki/BIOSIS_Previews
Bibliographic index to biological literature.
Berkeley Drosophila Genome Project BDGP, UC-Berkeley, US http://www.fruitfly.org/
Curated annotated informatics database from the Berkeley and European Drosophila genome projects, with annotations from the literature, comparative sequence analysis and the FlyBase research community.
Biochemical Pathways https://web.expasy.org/pathways/
Originally from Boehringer Mannheim GmbH https://www.roche.com/sustainability/philanthropy/science_education/pathways.htm
BioExpress See GeneExpress
BioMagRes, Univ. of Wisconsin-Madison, US http://www.bmrb.wisc.edu/
Contains
NMR chemical shifts derived from proteins and peptides, reference data, amino
acid sequence information, and data describing the source of the protein and the
conditions used to study the protein. In constructing the database, proteins and
larger peptides have been given priority. Shift assignments for hemes,
cofactors, and substrates of a protein are also included, when they are reported
as part of a complex.
BioMedCentral (UK) http://www.biomedcentral.com/home/
Publisher of journals covering all areas of biology and medicine. We provide free access to peer-reviewed research articles and subscription-based access to reviews, commentaries and other information services.
CATH Protein Structure Classification, University College, London, UK http://www.cathdb.info/
Protein domains classified into superfamilies.
CDD Conserved Domain Database, NCBI, US http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
Database and search service. Currently contains domains derived from two
popular collections, Smart and Pfam,
plus contributions from colleagues at NCBI. The source databases also provide
descriptions and links to citations. Since conserved domains correspond to
compact structural units, CDs contain links to 3D-structure via Cn3D
whenever possible.
CGAP Cancer Genome Anatomy Project, NCBI, US http://www.ncbi.nlm.nih.gov/ncicgap/
An
interdisciplinary program established and administered by the National Cancer
Institute to generate the information and technological tools needed to decipher
the molecular anatomy of the cancer cell. CGAP is divided into five
complementary Initiatives, each with its own goals, informatics tools and
resources.
Chemical Abstracts CA http://www.cas.org/
Bibliographic index to the chemical literature.
COG Clusters of Orthologous Groups of Proteins, NCBI, US http://www.ncbi.nlm.nih.gov/COG/
Delineated
by comparing protein sequences encoded in 21 complete genomes, representing 17
major phylogenetic lineages. Each COG consists of individual proteins or groups
of paralogs from at least 3 lineages and thus corresponds to an ancient
conserved domain.
Conserved Domain Database CDD See CDD
CrossRef, Publishers International Linking Association http://www.crossref.org
77 publishers of over 4,780 journals.
Database of Macromolecular Movements, Molecular Biophysics and Biochemistry, Yale Univ., US http://bioinfo.mbb.yale.edu/MolMovDB/
This describes the motions that occur in proteins and other macromolecules,
particularly using movies. Associated with it are a variety of free software
tools and servers for structural analysis. M Gerstein & WG Krebs (1998). Nucleic Acids Res. 26:4280-4290
DBGET/LinkDB, GenomeNet, Institute for Chemical Research, Kyoto University, Japan http://www.genome.ad.jp/dbget/
Integrated database retrieval system. Currently supports the following databases and gene catalogs:
nucleic acid sequences: GenBank, EMBL
protein sequences: SWISS-PROT, PIR, PRF, PDB, STR
3D structures: PDB
sequence motifs: PROSITE, EPD, TRANSFAC
enzyme reactions: LIGAND
metabolic pathways: PATHWAY
amino acid mutations: PMD
amino acid indices: AAindex
genetic diseases: OMIM
literature: LITDB, Medline
gene catalogs: E. coli, H. influenzae, M. genitalium, M. pneumoniae, M. jannaschii, Synechocystis, S. cerevisiae
Cross reference EMBL and GenBank
dbSNP, NCBI http://www.ncbi.nlm.nih.gov/SNP/
Uses a "looser variation" definition for SNPs (no requirement or assumption about minimum allele frequencies or the polymorphisms…). Short deletion and insertion polymorphisms, and microsatellite repeats, as well as SNPs, are included. Disease-causing clinical mutations, as well as neutral polymorphisms, are also in scope. [dbSNP FAQ]
dbSTS, NCBI http://www.ncbi.nlm.nih.gov/dbSTS/
A
subset of GenBank, with sequence and mapping data on short genomic landmark
sequences (STSs). More comprehensive annotation than in GenBank and regularly
updated with BLAST.
DDBJ DNA DataBank of Japan http://www.ddbj.nig.ac.jp/
Shares information daily with EMBL and GenBank.
DOGS Database of Genome Sizes, Center for Biological Sequence Analysis, Technical University of Denmark http://www.cbs.dtu.dk/databases/DOGS/index.html
A comprehensive list of (estimated) genome sizes for different organisms.
The purpose of this database is to provide such a list. The ultimate goal is
to compile a list of all the known organisms and their respective genome
sizes. Both the completed and estimated genomes are listed. The estimated
genome sizes are given for both the organisms currently being sequenced and
those for which no sequencing programme is in progress.
DOTS Database of Transcribed Sequences, Univ. of Pennsylvania, US
Has been superseded by http://www.allgenes.org/ which combines data from DOTS and the Genome Channel (ORNL).
DrugBank:
A unique bioinformatics and cheminformatics resource that combines detailed drug
(i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug
target (i.e. sequence, structure, and pathway) information. The database
contains nearly 4300 drug entries including >1,000 FDA-approved small
molecule drugs, 113 FDA-approved biotech (protein/peptide) drugs, 62
nutraceuticals and >3,000 experimental drugs. Additionally, more than 6,000
protein (i.e. drug target) sequences are linked to these drug entries. Each
DrugCard entry contains more than 80 data fields with half of the information
being devoted to drug/chemical data and the other half devoted to drug target or
protein data. Wishart DS et al., DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006 1;34
EMBASE Excerpta Medica http://www.embase.com/home
Bibliographic index to biomedical and pharmacological literature.
EMBL (European Molecular Biology Laboratory) http://www.embl-heidelberg.de/
Main laboratory is in Heidelberg, Germany, with outstations in Hamburg; Grenoble, France (access to high powered instruments for structure studies); and Hinxton, UK (bioinformatics). Supported by 14 European countries and Israel. Shares data daily with DDBJ and GenBank.
ENCODE
ENCyclopedia of DNA Elements, NHGRI http://www.genome.gov/10005107
Before the best use of the information contained in the [human genome] sequence
can be made, the identity and precise location of all of the protein- encoding
and non- protein- encoding genes will have to be determined. The identity of
other functional elements encoded in the DNA sequence, such as promoters and
other transcriptional regulatory sequences, along with determinants of
chromosome structure and function, such as origins of replication, also remain
largely unknown. A comprehensive encyclopedia of all of these features is needed
to fully utilize the sequence to better understand human biology, to predict
potential disease risks, and to stimulate the development of new therapies to
prevent and treat these diseases.
Entrez Genomes, NCBI, US http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome
Entrez Nucleotides, NCBI, US http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
A collection of sequences from several sources, including GenBank, RefSeq, and PDB. The number of bases grows at an exponential rate.
Entrez
Programming Utilities (E-utilities) are a set of eight server-side programs that
provide a stable interface into the Entrez query and database system at the
National Center for Biotechnology Information (NCBI). The E-utilities use a
fixed URL syntax that translates a standard set of input parameters into the
values necessary for various NCBI software components to search for and retrieve
the requested data. The E-utilities are therefore the structured interface to
the Entrez system, which currently includes 38 databases covering a variety of
biomedical data, including nucleotide and protein sequences, gene records,
three-dimensional molecular structures, and the biomedical literature.
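The fixed URL syntax mentioned in this entry can be illustrated with a short sketch. It assumes the public base address https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ and the esearch utility with its db, term and retmax parameters, as given in the NCBI E-utilities help. The helper name build_eutils_url is hypothetical. The sketch only assembles the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed base address of the E-utilities (see the NCBI help pages).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(utility, **params):
    """Hypothetical helper: return the URL for one E-utility call.

    utility is the program name without its suffix (e.g. "esearch");
    params are the input parameters, URL-encoded in the order given.
    """
    return f"{EUTILS_BASE}{utility}.fcgi?{urlencode(params)}"


# Search the Entrez protein database for "insulin", asking for 5 identifiers.
search_url = build_eutils_url("esearch", db="protein", term="insulin", retmax=5)
print(search_url)
```

The same helper could build calls to other utilities (for example efetch with db, id and rettype), since each is assumed to follow the same pattern of utility name plus encoded parameters.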
http://www.ncbi.nlm.nih.gov/books/NBK25501/
Entrez Proteins, NCBI, US http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein
The protein entries in the Entrez search and retrieval system have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq.
ENZYME, ExPASy, Switzerland http://au.expasy.org/enzyme/
Enzyme nomenclature database.
EpoDB Erythropoiesis database, CBIL (Computational Biology & Informatics Lab), Univ. of Pennsylvania, US http://www.cbil.upenn.edu/EpoDB/index.html
A database of genes that relate to vertebrate red blood cells. It includes DNA
sequence, structural features, protein information, gene expression information
and transcription factor binding sites.
ExPASy (Expert Protein Analysis System), Swiss Institute of Bioinformatics, Switzerland http://au.expasy.org/
Proteomics server.
Express DB, George Church Lab, Harvard Medical School, US http://arep.med.harvard.edu/ExpressDB/
A relational database for maintaining yeast RNA expression
data. It is intended as a demonstration of how such data can be managed, and
of the benefits such management confers. As of July, 1999, over 17.5 million
pieces of information have been loaded into ExpressDB deriving from 11
source studies. The EXD web query system allows data from multiple source
studies to be retrieved to user specifications and collated by ORF name.
FlyBase See Berkeley Drosophila
GenAtlas, Univ. Rene Descartes, France http://genatlas.org/
Genes, phenotypes and markers for humans.
GenBank, NCBI, US http://www.ncbi.nlm.nih.gov/Genbank/
NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. Mirrored at EMBL and DDBJ. Currently estimated (early 2000) that over 2 million bases are deposited here each day. This growth will only accelerate in the future. Begun in the 1980s by DOE. Cross reference DDBJ and EMBL. See also Sequencing Glossary.
GeneCards, Weizmann Institute, Israel http://www.genecards.org/index.shtml
Numerous mirrored sites. Database of human genes, their products and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others.
Gene Census system, Yale University, US http://bioinfo.mbb.yale.edu/genome/
Comprehensive statistical accounting of protein structural features in genomes and sequence databanks.
Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/
A high-throughput gene expression / molecular abundance data repository, as well as a curated, online resource for gene expression data browsing, query and retrieval.
Gene Map of the Human Genome, International RH Mapping Consortium http://www.ncbi.nlm.nih.gov/genemap99/
Includes locations of more than 30,000 genes and provides an early glimpse of some of the most important pieces of the genome.
GGEG Global Gene Expression Database, MD Anderson Cancer Center
Human mRNA sequence data specific to the RAGE and SAGE techniques, general mRNA information.
GPCRDB: Information system for G-Protein Coupled Receptors (GPCRs), Univ. of Nijmegen, UCSF, EBI, IPSI, Leiden/Amsterdam Center for Drug Research, SWISS-PROT, tinyGrap http://gpcrdb.org/
GSDB See Genome Sequence DataBase
GSS Genome Survey Sequences, NCBI, US http://www.ncbi.nlm.nih.gov/dbGSS/
The GSS division of GenBank is similar to the EST division, except
that its sequences are genomic in origin, rather than cDNA (mRNA). The
GSS division contains (but is not limited to) the following types of data:
random "single pass read" genome survey sequences, cosmid/BAC/YAC
end sequences, exon trapped genomic sequences, Alu PCR sequences.
GXD: Gene Expression Database, Jackson Laboratory, US http://www.informatics.jax.org/mgihome/GXD/gxdgen.shtml#concept
Gene expression data on the laboratory mouse.
Highwire Press, Stanford Univ., US http://highwire.org
Free (and fee-based), full-text science journals.
HOMOLOGENE, NCBI, US https://www.ncbi.nlm.nih.gov/homologene/
A homology resource which includes both curated and
calculated orthologs and homologs for genes represented in UniGene
and LocusLink for human, mouse, rat, and zebrafish. The curated
orthologs include ortholog gene pairs reported in the Mouse Genome
Database (MGD) at the Jackson Laboratory, the Zebrafish Information Network
(ZFIN) database at the University of Oregon, and in published reports. The
calculated orthologs and homologs are the result of nucleotide sequence
comparisons between all UniGene clusters for each pair of organisms. These
orthologs and homologs are considered putative since they are based only on
sequence comparisons.
HOVERGEN Homologous Vertebrate Genes Database, PBIL (Pôle Bio-Informatique Lyonnais), Univ. Lyons, France http://biom3.univ-lyon1.fr/databases/hovergen.html
A database of homologous vertebrate genes, structured under ACNUC sequence
database management system. It allows one to select sets of homologous genes
among vertebrate species, and to visualize multiple alignments and
phylogenetic trees. Thus HOVERGEN is particularly useful for comparative
sequence analysis, phylogeny and molecular evolution studies. More generally,
HOVERGEN gives an overall view of what is known about a peculiar
[particular?] gene family. The database itself contains all vertebrate
sequences from GenBank (except ESTs), with some data corrected, clarified or
completed (notably to address the problem of redundancy). Homologous coding
sequences have been classified in gene families and protein multiple
alignments and phylogenetic trees have been computed for each family.
Sequences and related information have been structured in an ACNUC database.
The database is updated every four months.
HTGS High Throughput Genomic Sequences, NCBI, US http://www.ncbi.nlm.nih.gov/HTGS/
Created to accommodate a growing need to make 'unfinished' genomic sequence
data rapidly available to the scientific community. It was done in a
coordinated effort between the three International Nucleotide Sequence
databases: DDBJ, EMBL, and
GenBank. The HTG division contains 'unfinished' DNA sequences generated
by the high-throughput sequencing centers. Sequence data in this division
are available for BLAST homology searches against either the "htgs"
database or the "month" database, which includes all new
submissions for the prior month. The HTG division of GenBank was recently
described in a 1997 Genome Research 7(10) article by Ouellette and Boguski.
HUGE Human Unidentified Gene Encoded Large Proteins, Kazusa DNA Research Institute, Japan http://www.kazusa.or.jp/huge/
The
HUGE protein database has been created to publicize the fruits of our Human
cDNA project at the Kazusa DNA Research Institute. In this project, we plan
to sequence and analyze long (>4 kb) human cDNAs and to establish methods
by using the sequence data how to predict the primary structure of proteins
of various biological activities. Currently, we focus on the analysis of
cDNA clones encoding particularly large proteins (>50 kDa). The basic
concept underlying our project and the strategies employed have been
described elsewhere (Ohara et al., 1997). Our HUGE protein
database contains various types of information derived from the predicted
primary structure data of newly identified human proteins.
Human Mouse Homology Map, NCBI, US http://www.ncbi.nlm.nih.gov/Homology/
Map is now being computed by integrating orthologs curated by the Mouse
Genome Database with putative orthologs identified by sequence homology.
This version of the Human-Mouse Homology map also differs from the previous
Davis map by including several new features: reporting representative STS
associated with the loci in the map and linked to the dbSTS pages, linking
human cytogenetic locations to NCBI's MapViewer, providing alignments of
representative sequences via BLAST2, and linking gene symbols to LocusLink.
IXDB Integrated Chromosome X DataBase, Max Planck Institut, Berlin, Germany http://www.molgen.mpg.de/~xteam/
The purpose of IXDB is to provide an integrated view of the X chromosome
mapping field. Ultimately this will allow the construction of an integrated
map that will take into account all the data generated by the community,
including physical, genetic, transcript and sequence information. This
implies acquiring, understanding and formatting an enormous amount of
experimental results and can only be accomplished progressively. We have
chosen to start the integration process with YAC maps generated by the
community. These provide the basis for future higher resolution physical
maps, as well as emerging transcript and sequence maps. The current content
of IXDB therefore reflects this situation, with the emphasis placed on
YAC mapping data. Due to their immediate value, IXDB has also started to
systematically include bacterial clone contig maps and EST data. Currently
IXDB does not store sequence data, although links to nucleic sequence
databases are provided.
KEGG Pathway Database http://www.genome.ad.jp/kegg/
Links to pathway and other databases (metabolic and regulatory) http://www.genome.jp/kegg/pathway.html
See also note on KEGG under Metabolic engineering glossary pathways
LIGAND database, Institute for Chemical Research, Kyoto Univ., Japan http://www.genome.ad.jp/dbget-bin/www_bfind?ligand
Enzymes, compounds and reactions.
Mammalian Gene Collection, NCBI, US http://mgc.nci.nih.gov/
The
goal of the Mammalian Gene Collection (MGC) is to provide a complete set of
full-length (open reading frame) sequences and cDNA clones of expressed genes
for human and mouse. The MGC is an NIH initiative that supports the
production of cDNA libraries, clones and sequences.
Medline See PubMed
MGD See Mouse Genome Database
Mitelman DataBase of Chromosome Aberrations in Cancer, CGAP, NCI, US http://cgap.nci.nih.gov/Chromosomes/Mitelman
Relates chromosomal aberrations to tumor characteristics, based either on
individual cases or associations. All the data have been manually culled from
the literature by Felix Mitelman, Bertil Johansson, and Fredrik Mertens.
MITOMAP, Emory Univ., US http://www.mitomap.org/
A human mitochondrial genome database. A compendium of polymorphisms and mutations of the human mitochondrial DNA.
Mouse Atlas and Gene Expression Database, Human Genetics Unit, MRC Medical Research Council, Edinburgh, UK http://genex.hgu.mrc.ac.uk/
Not yet available 11/2/00. A digital atlas of mouse development and database
to be a resource for spatially mapped data such as in situ gene expression
and cell lineage. The project is in collaboration with the Department of
Anatomy, University of Edinburgh. The gene expression database is being
developed as part of the Mouse Gene Expression Information Resource (MGEIR)
in collaboration with the Jackson Laboratory, USA.
Mouse Genome Database MGD See Mouse Genome Informatics
Mouse Genome Informatics, Jackson Laboratory, US http://www.informatics.jax.org/
International database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
Mouse Phenome Database, Jackson Labs, US https://phenome.jax.org/
A
collection of baseline phenotypic data on commonly used and genetically
diverse inbred mouse strains through a coordinated international effort.
Nucleic Acids Database NDB, Rutgers Univ., US http://ndbserver.rutgers.edu/
Assembles and distributes structural information about nucleic acids. See also Protein Data Bank PDB
Nucleotide Database https://www.ncbi.nlm.nih.gov/nucleotide
As of December 1, 2018, all records from the databases for Expressed Sequence Tags (EST) and Genome Survey Sequences (GSS) will reside in NCBI’s Nucleotide database.
OMIM, Online Mendelian Inheritance in Man, NCBI, US https://www.ncbi.nlm.nih.gov/omim https://secure.jhu.edu/form/OMIM
ooTFD object oriented Transcription factors and gene expression, Institute for Transcriptional Informatics IFTI, US http://www.ifti.org/cgi-bin/ifti/ootfd.pl
A successor to TFD (Transcription Factors Database), now referred to as rTFD
(relational Transcription Factors Database). ooTFD has been implemented in a
number of object-oriented database management systems, including ROL (Rule-based Object Language), MOOD (Materials object-oriented database), and the pure Java object database ozone.
PDB Protein Data Bank, Research Collaboratory for Structural Bioinformatics http://www.rcsb.org/
3D macromolecular structural data. Incorporates NDB Nucleic Acid Database Project, Rutgers.
PEDB Prostate ESTs, Fred Hutchinson Cancer Research Center, US http://www.pedb.org/
A curated relational database and suite of analysis tools designed for the
study of prostate gene expression in normal and disease states. Expressed
Sequence Tags (ESTs) and full-length cDNA sequences derived from more than
40 human prostate cDNA libraries are maintained and represent a wide
spectrum of normal and pathological conditions.
Pfam https://pfam.xfam.org/
A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
PIR Protein Information Resource, NBRF, Georgetown Univ. Medical Center, US http://pir.georgetown.edu/pirwww/
The Protein Information Resource (PIR), in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japanese International Protein Sequence Database (JIPID), maintains the PIR-International Protein Sequence Database, a comprehensive, annotated, and non-redundant protein sequence database in which entries are classified into family groups and alignments of each group are available.
PROSITE, Swiss Institute of Bioinformatics http://au.expasy.org/prosite/
A database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.
PubMed Central, NCBI http://www4.ncbi.nlm.nih.gov/PubMed/
Medline
Rat Genome Database See RGD
RefSeq Reference Sequences, NCBI, US http://www.ncbi.nlm.nih.gov/RefSeq/
Aims to provide a comprehensive, integrated,
non-redundant set of sequences, including genomic DNA, transcript (RNA), and
protein products, for major research organisms. RefSeq standards serve as the
basis for medical, functional, and diversity studies; they provide a stable
reference for gene identification and characterization, mutation analysis,
expression studies, polymorphism discovery, and comparative analyses. RefSeqs
are used as a reagent for the functional annotation of some genome sequencing
projects, including those of human and mouse.
Research Collaboratory for Structural Bioinformatics RCSB See Protein DataBank
RGD Rat Genome Database, Medical College of Wisconsin, US http://rgd.mcw.edu/
The goal is the establishment of a Rat Genome Database to collect, consolidate, and integrate data generated from ongoing rat genetic and genomic research efforts and make these data widely available to the scientific community. A secondary but critical goal is to provide curation of mapped positions for quantitative trait loci, known mutations and other phenotypic data.
Saccharomyces Genome Deletion Project http://sequence-www.stanford.edu/group/yeast_deletion_project/deletions3.html
SGD Saccharomyces Genome Database, Stanford University http://www.yeastgenome.org/
Structure https://www.ncbi.nlm.nih.gov/structure
Three dimensional structures provide a wealth of
information on the biological function and the evolutionary history of
macromolecules. They can be used to examine sequence-structure-function
relationships, interactions, active sites and more.
SWISS 2D PAGE, Swiss Institute of Bioinformatics http://au.expasy.org/ch2d/
Data on proteins identified on various 2-D PAGE reference maps.
SWISS 3D Image, ExPASy, Switzerland http://au.expasy.org/sw3d/
An image database which strives to provide high quality pictures of
biological macromolecules with known three- dimensional structure. The
database contains mostly images of experimentally elucidated structures, but
also provides views of well accepted theoretical protein models.
SWISS-MODEL Repository, Swiss Institute of Bioinformatics and Biozentrum, Basel http://swissmodel.expasy.org/repository/
A database of annotated three-dimensional comparative protein structure models generated by the fully automated homology-modelling pipeline SWISS-MODEL.
SWISS-PROT, ExPASy (Expert Protein Analysis System), Swiss Institute of Bioinformatics
A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases. See UniProt.
Taxonomy, NCBI, US See Nomenclature
UniGene, NCBI, US http://www.ncbi.nlm.nih.gov/UniGene/index.html
An experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location. Well-characterized genes and ESTs.
UniProt Knowledgebase, Universal Protein Resource http://www.uniprot.org/
A comprehensive, high-quality and freely accessible resource of protein sequence and functional information. Swiss-Prot, TrEMBL, UniRef, UniParc, Proteomes.
UniVec, NCBI, US http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html
A database that can be used to quickly identify segments within nucleic acid sequences which may be of vector origin (vector contamination)... In addition to vector sequences, UniVec also contains sequences for those adapters, linkers and primers commonly used in the process of cloning cDNA or genomic DNA.
V Base: the database of human antibody genes, Centre for Protein Engineering, Medical Research Council, UK
VecScreen, NCBI, US http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html
A system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. NCBI developed VecScreen to combat the problem of vector contamination in public sequence databases.
Software
BLAST (Basic Local Alignment Search Tool) https://blast.ncbi.nlm.nih.gov/Blast.cgi
Finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance... See also Sequencing glossary
Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
A helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez retrieval service.
FASTA https://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml
Software program, from the University of Virginia, used to scan a protein or DNA sequence library for similar sequences. See also Sequencing
GRAIL, Broad Institute https://software.broadinstitute.org/mpg/grail/faq.html
The GRAIL (Gene Recognition and Assembly Internet Link) program for exon prediction was originally described in 1991 by Uberbacher and Mural (Proc Natl Acad Sci USA). For years, along with GENSCAN (Burge & Karlin, J. Mol Biol. 1997), it was widely cited and utilized for defining coding regions within the genome. Until 2005 it was developed and supported by the Oak Ridge National Laboratory; since then it has been unavailable. We apologize for any confusion this might have caused.
GRAIL: Genome Recognition and Assembly Internet Link was at Oak Ridge National Lab.
MedMiner, now Miner suite of bioinformatics software https://discover.nci.nih.gov/
ORF Finder, NCBI, US https://www.ncbi.nlm.nih.gov/orffinder/
Gene prediction.
Protein Explorer http://www.umass.edu/microbio/chime/explorer/
Supersedes RasMol.
RasMol homepage [Macromolecular structure viewer] See Protein Explorer, which is now recommended as easier to use and more powerful than RasMol.
SEQUEST http://fields.scripps.edu/yates/wp/?page_id=17
Data mining tools
Comments? Questions? Revisions? Mary Chitty MSLS mchitty@healthtech.com
Last revised July 10, 2019
Nucleic Acids Research home page https://academic.oup.com/nar has links to the database and web server issues.
Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D1–D7, https://doi.org/10.1093/nar/gkx1235
Databases https://academic.oup.com/nar/issue/46/D1
Nucleic Acids Research, 2018 Web Server issue https://academic.oup.com/nar/issue/46/W1
Web-based software resources for analysis and visualization of molecular biology data.
Dead DNA: See under Mitomap
DIP Database of Interacting Proteins, UCLA/DOE, US http://dip.doe-mbi.ucla.edu/
Documents experimentally determined protein-protein interactions and interactive methods.
GPCRDB: Data are collected for the following G protein-coupled receptor families: Class A, receptors related to rhodopsin and the adrenergic receptor; Class B, receptors related to the calcitonin and PTH/PTHrP receptors; Class C, receptors related to the metabotropic receptors; Class D, receptors related to the pheromone receptors; Class E, receptors related to the cAMP receptors; non-GPCR molecules (e.g., G proteins, halorhodopsins, etc.)
Mouse Atlas and Gene Expression Database: now e-Mouse Atlas http://www.emouseatlas.org/emap/home.html
SCOP2 http://scop2.mrc-lmb.cam.ac.uk/
A successor of Structural Classification of Proteins (SCOP).
Similarly to SCOP, the main focus of SCOP2 is on proteins that are
structurally characterized and deposited in the PDB. Proteins are
organized according to their structural and evolutionary relationships,
but, in contrast to SCOP, instead of a simple tree-like hierarchy these
relationships form a complex network of nodes. Each node represents a
relationship of a particular type and is exemplified by a region of
protein structure and sequence.
SMART (Simple Modular Architecture Research Tool), EMBL, Heidelberg, Germany http://smart.embl-heidelberg.de/help/smart_glossary.shtml
Tools for Data Mining, NCBI, US http://www.ncbi.nlm.nih.gov/Tools/index.html
Provides access to BLAST, Clusters of
Orthologous Groups (COGs), ORF finder, Electronic PCR, UniGene, GeneMap99,
VecScreen, Cancer Genome Anatomy Project CGAP, Cancer Chromosome Aberration
Project cCAP, Human-Mouse Homology Maps, VAST search
Nucleic Acids Research Database Category list https://www.oxfordjournals.org/nar/database/c/
Nucleotide Sequence Databases
RNA sequence databases
Protein sequence databases
Structure Databases
Genomics Databases (non-vertebrate)
Metabolic and Signaling Pathways
Human and other Vertebrate Genomes
Human Genes and Diseases
Microarray Data and other Gene Expression Databases
Proteomics Resources
Other Molecular Biology Databases
Organelle databases
Plant databases
Immunological databases
Cell biology