
Biopharmaceutical information management & interpretation glossary & taxonomy
Evolving Terminology for Emerging Technologies
Comments? Questions? Revisions? Mary Chitty 
mchitty@healthtech.com
Last revised February 26, 2009


The dividing line between this glossary and Algorithms & data analysis is very fuzzy. In general this one focuses on unstructured data (or a combination of structured and unstructured), while Algorithms centers on structured data.
Informatics includes Bioinformatics   Computers & computing    In silico & Molecular Modeling   Ontologies   
Technologies Microarrays & protein chips    Sequencing 

Advances in biology and new high-throughput technologies are generating massive amounts of data that overwhelm the current information technology infrastructure. The challenge is to build a common capability that enables a more efficient translation of data into knowledge that leads to new and effective treatments.   caBIG™ and Molecular Medicine, NCI, NIH http://cabig.cancer.gov/molecular/overview.asp

Google = "data analysis" about 1,420,000 as of July 23, 2002; about 4,480,000 as of Sept. 23, 2004;   "data interpretation" about 58,200 as of July 23, 2002; about 147,000 as of Sept. 23, 2004

3D technologies: Visual communications are pervasive in information technology and are a key enabler of most new emerging media. In this context, the NRC Institute for Information Technology (NRC-IIT) performs research, development and technology transfer activities to enable access to 3D information of the real world. Research in the 3D Technologies program focuses on three main areas: Virtualizing Reality and Visualization, Collaborative Virtual Environments, 3D Data Mining and Management [Institute for Information Technology, National Research Council, Canada, 3D Technologies] 

artificial intelligence: Algorithms & data analysis glossary

Google = about 1,120,000 July 19, 2002; about 3,040,000 Oct. 22, 2004

BIRN Biomedical Informatics Research Network: http://www.nbirn.net/ 

bias: One of the two components of measurement error (the other one being variance). Bias is a systematic error that causes the measurement to differ from the correct value. Since bias is systematic, it affects all experiment replicas the same way. 
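
The split between bias and variance can be illustrated with a few replicate measurements. The following is a minimal sketch, assuming the correct value is known and using made-up readings; the function name `bias_and_variance` is hypothetical and not from any cited source.

```python
from statistics import mean, pvariance

def bias_and_variance(measurements, true_value):
    """Return (bias, variance) for a set of replicate measurements."""
    avg = mean(measurements)
    bias = avg - true_value             # systematic offset shared by all replicas
    variance = pvariance(measurements)  # spread of the replicas around their own mean
    return bias, variance

# Hypothetical replicate readings of a quantity whose correct value is 10.0
replicas = [10.4, 10.6, 10.5, 10.3, 10.7]
bias, variance = bias_and_variance(replicas, true_value=10.0)
print(f"bias={bias:.2f} variance={variance:.3f}")
```

Averaging more replicas shrinks the effect of variance but leaves the bias untouched, which is why a systematic error cannot be removed by repetition alone.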

bibliomining:  The combination of data mining, bibliometrics, statistics, and reporting tools used to extract patterns of behavior- based artifacts from library systems. Scott Nicholson, Bibliomining: Data Mining for Libraries, Syracuse Univ. US http://www.bibliomining.com/ 

bioinformatics visualization: Bioinformatics glossary

biomedical computing: Computers & computing glossary    

Google = about 11,800 July 19, 2002; about 20,900 Oct. 22, 2004

biomedical informatics: 

Google about 66,600 Oct. 22, 2004

biomedical ontologies: Open Biomedical Ontologies is an umbrella web address for well-structured controlled vocabularies for shared use across different biological and medical domains.  http://obo.sourceforge.net/ 

Google = about 102 Jan. 8, 2003; about 294 Oct. 1, 2003; about 490 Oct. 22, 2004; about 488 May 2, 2005

Biomedical Ontologies: Overview

BIONLP.org: Bioinformatics glossary

biopharmaceutical informatics: Drug companies go through a very arduous and regulated discovery, applied research, and development process- typically spanning five years of laboratory research and ten years of clinical studies .. multinational clinical studies, which need to be done with tremendous precision over a very long period of time. The study parameters must be identical for every patient (many times numbering 10,000 patients, followed for five or more years), and all the participating hospitals essentially have to behave in exactly the same way for the trial to be valid. ..  The life science industry is conservative by nature, and therefore it is a late- adopting industry. It is very sensitive to standards because of the legacy according to which these companies have to maintain data and information. Major pharmaceutical companies typically adopt a 100-year minimum document retention policy, ...each of the industry's four industrial sectors - the pharmaceutical, the biotech, the medical device, and the diagnostics sector - has a different set of needs and desires, as well as its own requirements for unique IT solutions.  ... 

Life science companies are dealing with very large computational data sets. Some are now approaching half terabyte sizes and upward. Life science companies also immensely concern themselves with security, because their data represent their crown jewels. Other major concerns expressed by this industry include the stability, scalability, and security of an operating environment. Life science companies and regulatory bodies such as the FDA are more concerned than ever with operating environments that decay with use: When under computational stress, these fragile operating systems have a habit of crashing, and when these systems crash, they tend to corrupt data. ...

Post-genomic, proteomic, chemical information, and other data sets have created a major appetite for solutions to deal with this tremendous amount of data. Scientists are now asking their IT professionals for the ability to better conceptualize and interpret the meaning of this vast information. To do this, scientists need tools for 3D visualization with a tremendous degree of high definition and accuracy. The next step is to take disparate data sets, render them into 3D values, see the DNA and RNA interface, watch protein folds, and then put a therapeutic small molecule in there and see how it relates within a virus that environmentally influences a different process. Scientists Are Demanding Solutions for Dealing with the Post-Genomic, Proteomic, and Chemical Data Deluge: An Interview with Howard Asher, Director, Global Life Sciences Group, Sun Microsystems, CHI GenomeLink 30 http://www.healthtech.com/newsarticles/issue30_1.asp 

Biosemantics Group: http://www.biosemantics.org/  Addresses concept identification and disambiguation algorithms, meta-analysis and visualization techniques, and biological applications [interconnect genes and proteins, semi-automated annotations of protein functions.] Medical Informatics department of the ErasmusMC University Medical Center of Rotterdam and the Center for Human and Clinical Genetics of the Leiden University Medical Center

blog: Wikipedia http://en.wikipedia.org/wiki/Blog 

Related terms: blogging, blogosphere, microcontent, nanopublishing, weblog

blogging:  In the beginning - say 1994 - the phenomenon now called blogging was little more than the sometimes nutty, sometimes inspired writing of online diaries. These days, there are tech blogs and sex blogs and drug blogs and onanistic teenage blogs. But there are also news blogs and commentary blogs, sites packed with links and quips and ideas and arguments that only months ago were the near- monopoly of established news outlets. Poised between media, blogs can be as nuanced and well- sourced as traditional journalism, but they have the immediacy of talk radio.  Andrew Sullivan, "The blogging revolution" Wired Magazine, May 2002 http://www.wired.com/wired/archive/10.05/mustread.html?pg=2

bottom-up ontologies: Are flexible through the use of implicit and, hence, parsimonious part- whole and subconcept-  superconcept relations. The bottom- up method complements current practice, where, as a rule, ontologies are built top- down. The design method is illustrated by an example involving ontologies of pure substances at several levels of detail. It is not claimed that bottom- up construction is a generally valid recipe; indeed, such recipes are deemed uninformative or impossible. Rather, the approach is intended to enrich the ontology developer's toolkit. [Paul E. van der Vet, Nicolaas J.I. Mars, Bottom- Up Construction of Ontologies, IEEE Transactions on Knowledge Engineering, July- Aug, 1998 10(4): 513- 526] http://www.computer.org/tkde/tk1998/k0513abs.htm

Google = "bottom-up ontologies" about 10; bottom-up ontologies (unquoted) about 2,250 July 19, 2002

bottom-up taxonomies: Faceted classification is a hallmark of the bottom-up approach and suggests yet another reason why the phrase "build the taxonomy" is ill-conceived. ... The bottom-up approach suggests a very different way to classify content. When populating a top-down taxonomy, the central question is "where do I put this?" but at the heart of the bottom-up approach is the question "how do I describe this?" By asking this subtly different question, you’ll wind up in a dramatically different destination.  Peter Morville, "Bottoms up: Designing complex, adaptive systems", Faceted Classification, New Architect, 2002  http://www.newarchitectmag.com/documents/s=7733/na1202b/index3.html 

Can mean from specific to general, but it can also mean content- oriented. [Jean Graef "Top down or bottom up" Montague Institute Review, 2001] http://www.montague.com/review/topdown.html

CML Chemical Markup Language: Chemoinformatics glossary

classification: Involves the development and use of a scheme for the systematic organization of knowledge. (Taylor p 576) Arlene Taylor identified three approaches to classification: enumerative, hierarchical, and analytico- synthetic. Enumerative classification attempts to assign headings for every subject and alphabetically enumerates them. Hierarchical classification uses a more philosophical approach based on the inherent organization of the subject being classified, and establishes logical rules for dividing topics into classes, divisions, and subdivisions. Analytico- synthetic classification assigns terms to individual concepts and provides rules for the local cataloger to use in constructing headings for composite subjects. Traditional classification systems in this country are basically enumerative, though many contain some elements of hierarchy and faceting. (Taylor pp 319- 321) Amanda Maple, "FACETED ACCESS: A REVIEW OF THE LITERATURE" Working Group on Faceted Access to Music, Music Library Association Annual Meeting, 10 February 1995 http://theme.music.indiana.edu/tech_s/mla/facacc.rev  

Indexing in the library and information management sense, but also see Algorithms & data analysis glossary classification, classifiers

collaborative filtering: Tools that leverage user preferences, patterns, and purchasing behavior to customize organization and navigation systems. [Peter Morville "Software for Information Architects" Argus Center for Information Architecture, 2000]  http://argus-acia.com/strange_connections/current_article.html 

Amazon's recommendations, based on what other buyers of a specific title are buying, are a familiar example of collaborative filtering.
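
The "buyers of this title also bought" idea can be sketched as simple co-purchase counting. This is only an illustration under stated assumptions: the purchase histories and the function name `also_bought` are hypothetical, and production recommenders use considerably more elaborate models.

```python
from collections import Counter

# Hypothetical purchase histories: one set of titles per buyer
purchases = [
    {"Genomics Primer", "Proteomics Handbook", "Bioinformatics Basics"},
    {"Genomics Primer", "Proteomics Handbook"},
    {"Genomics Primer", "Statistics for Biologists"},
    {"Proteomics Handbook", "Bioinformatics Basics"},
]

def also_bought(title, histories, top_n=3):
    """Rank other titles by how many buyers of `title` also bought them."""
    counts = Counter()
    for basket in histories:
        if title in basket:
            # Count every other title that shares a basket with `title`
            counts.update(basket - {title})
    return counts.most_common(top_n)

print(also_bought("Genomics Primer", purchases))
```

No one describes the titles here; the ranking emerges entirely from recorded buyer behavior, which is what distinguishes collaborative filtering from classification by subject.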

Google = about  21,600 July 19, 2002; about 49,300 Oct. 22, 2004 

collaborative metadata: A robust increase in both the amount and quality of metadata is integral to realizing the Semantic Web. The research reported on in this article addresses this topic of inquiry by investigating the most effective means for harnessing resource authors' and metadata experts' knowledge and skills for generating metadata. Jane Greenberg, W. Davenport Robertson, Semantic web construction: An Inquiry of Authors' Views on Collaborative Metadata Generation, International Conference DC 2002, Metadata for e-Communities, Oct. 13-17, 2002, Florence Italy http://dois.mimas.ac.uk/DoIS/data/Papers/dcmdcflorp:5.html
http://www.bncf.net/dc2002/program/ft/paper5.pdf

Google = about 116 Apr. 24, 2003; about 377 Oct. 22, 2004

common ontology: Defines the vocabulary with which queries and assertions are exchanged among agents. ... The agents sharing a vocabulary need not share a knowledge base; each knows things the other does not, and an agent that commits to an ontology is not required to answer all queries that can be formulated in the shared vocabulary. In short, a commitment to a common ontology is a guarantee of consistency, but not completeness, with respect to queries and assertions using the vocabulary defined in the ontology. [Tom Gruber, "What is an ontology?"  Knowledge Systems Lab, Stanford Univ. 2001] http://www-ksl.stanford.edu/kst/what-is-an-ontology.html

Google = about  1,190 July 19, 2002, about 4,130 Oct. 22, 2004 

Related terms: ontological commitment, reusable ontologies, shared ontologies 

communications standards: Pharmacogenomics glossary

communities of practice:  Alliances glossary  

competitive intelligence: Business of biopharmaceuticals glossary

computational linguistics:  Computational Linguistics, or Natural Language Processing (NLP), is not a new field. As early as 1946, attempts have been undertaken to use computers to process natural language. These attempts concentrated mainly on Machine Translation ... the limited performance of these systems made it clear that the underlying theoretical difficulties of the task had been grossly underestimated, and in the following years and decades much effort was spent on basic research in formal linguistics. Today, a number of Machine Translation systems are available commercially although there still is no system that produces fully automatic high- quality translations (and probably there will not be for some time). Human intervention in the form of pre- and/ or post-editing is still required in all cases.  Another application that has become commercially viable in the last years is the analysis and synthesis of spoken language, i.e. speech understanding and speech generation. ... An application that will become at least as important as those already mentioned is the creation, administration, and presentation of texts by computer. Even reliable access to written texts is a major bottleneck in science and commerce. The amount of textual information is enormous (and growing incessantly), and the traditional, word- based, information retrieval methods are getting increasingly insufficient as either precision or recall is always low (i.e. you get either a large number of irrelevant documents together with the relevant ones, or else you fail to get a large number of the relevant ones in the collection). Linguistically based retrieval methods, taking into account the meaning of sentences as encoded in the syntactic structure of natural language, promise to be a way out of this quandary. [Computational Linguistics FAQ, Univ. of Zurich, Switzerland, 2001] http://www.ifi.unizh.ch/groups/CL/CL_FAQ.html
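
The precision and recall trade-off described above can be made concrete with a small calculation. This is a minimal sketch using the usual information retrieval definitions (precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved); the document IDs and the function name `precision_recall` are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given document ID collections."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant_docs = {"d1", "d2", "d3", "d4"}

# A broad word-based query finds everything relevant plus six irrelevant documents
broad = {f"d{i}" for i in range(1, 11)}
# A narrow query returns a single, relevant document
narrow = {"d1"}

print(precision_recall(broad, relevant_docs))   # low precision, full recall
print(precision_recall(narrow, relevant_docs))  # full precision, low recall
```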

Google = about  97,100 July 19, 2002, about 283,000 Oct. 22, 2004 

Linguistics, natural language, and computational linguistics Meta- Index, Stanford Univ. US  http://www-nlp.stanford.edu/links/linguistics.html

configurable: Many out-of-the-box solutions claim to be easy to "customize," when in fact they are referring to configuration options, not true customizability.  Manufacturers have distinct challenges, some which can be addressed out of the box, but many of which cannot. Manufacturers also need the ability to capitalize on changing dynamics in the marketplace before their competitors do. That's why it's imperative to understand the differences between configuration and customization and the value of selecting a CRM system that offers the flexibility to adapt and model specific manufacturing business processes.  Why you need to know the difference between Customizable and Configurable CRM, CDC Software podcast, Intelligent Enterprise,  2006 http://whitepaper.intelligententerprise.com/cmpintelligententerprise/search/viewabstract/86931/index.jsp 

contextual data: While proteomic studies initially focused largely on expression and protein identification, progress in these areas drove the demand for more detailed types of proteomic data. Now researchers want information about where specific proteins are expressed, both in terms of tissues and localization within the cell. Information relating proteins to function require additional details of post- translational modification, and studies of protein interactions have moved beyond just looking at binary interactions to studies of protein complexes.

For both genomics and proteomics, this shift can be characterized as an interest in more contextual data. Enhanced insight into biological context is essential for obtaining a better understanding of how biology actually works, and thus there is now an emphasis to move from genomic and proteomic snapshots to time series data of expression. Such context is of particular value if biological studies are to be translated into medical advances, because of the importance of being able to predict the impact of potential treatments. The integration of genomic and proteomic data with medical conditions, treatment and outcomes becomes another critical type of contextual information. Christina Lingham, Beyond Genome: Thinking Globally, Cambridge Healthtech http://www.beyondgenome.com/download/editorial.pdf

controlled vocabulary: Robin Cover's XML Cover Pages is described as "a collection of references on matters of Subject Classification, Taxonomies, Ontologies, Indexing, Metadata, Metadata Registries, Controlled Vocabularies, Terminology, Thesauri, Business Semantics", 2003 http://xml.coverpages.org/classification.html

A limited number of words or phrases used in an indexing system (subject headings) or database, to ensure reliable, consistent retrieval. Long used to enhance retrievability and consistency, ontologies and/ or taxonomies certainly sound sexier than "controlled vocabularies" but continue to have a good deal in common. Taxonomies add hierarchies, while ontologies make information "machine- understandable" as well as machine- readable. 

Google = about 39,700 July 19, 2002; about 85,300 Oct. 22, 2004 

Broader terms: ontology, taxonomy Related terms: RDF, semantic web 

Thesauri and controlled vocabulary definitions, National Library of Canada, 2002, http://www.tbs-sct.gc.ca/its-nit/standards/tbits39/crit392_e.asp 

customizable: Quite labor intensive and can be very expensive.  Compare configurable.

DAML DARPA Agent Markup Language: The goal of the DAML effort is to develop a language and tools to facilitate the concept of the semantic web. http://www.daml.org/  Related term: OIL

DAML + OIL http://www.w3.org/TR/daml+oil-walkthru/

data cleaning, data integration: Algorithms & data analysis glossary

Google = "data cleaning" about 12,200; about 22,500 July 3, 2003
"data integration" about 175,000 July 19, 2002; about 306,000 July 3, 2003; about 817,000 Mar. 22, 2004; about 2,940,000 June 22, 2007

data conversion:   Originally data conversion was primarily a matter of moving text and database files from one medium to another, one hardware platform to another, one operating system environment to another. But as text and database representations became more sophisticated it became apparent that application interoperability was going to be the overriding issue of concern. Company History, Data Conversion Lab  http://www.dclab.com/company_history.asp 

Glossary, DCL Labs http://www.dclab.com/glossary.asp 30+ definitions

data management methods: Algorithms & data analysis glossary covers automated methods; the methods in this glossary generally combine human and automated methods.

data management vocabulary: A third type of taxonomy that is valuable in a business setting is the data management vocabulary. This taxonomy is a short list of authorized terms without any hierarchical structure that is used to support business transactions. For example, with a large sales force, it is most efficient if salespeople report their work using the same list of activities. They may count their contacts with companies according to a simple list of contact types (managers, decision-makers, and so on), and they may categorize the businesses they work with according to different controlled descriptors that have to do with the business's size or market. In this case, a shared taxonomy will help to support reporting needs of management and other salespeople trying to mine the information in the future. Without a shared taxonomy, a company risks developing islands of data that cannot be shared or easily utilized by the rest of the organization. Susan Conway and Char Sligar, "What is a taxonomy" Unlocking Knowledge Assets, Chapter 6, Building Taxonomies, Microsoft Press, 2002   http://www.microsoft.com/mspress/books/sampchap/5516a.asp

Google = about 49 July 9, 2007

Related terms: descriptive taxonomies, navigational taxonomies

data mart, data mining, data pipelining, data reduction methods, data warehouse: Algorithms & data analysis glossary

data visualization:  The classical definition of visualization is as follows: the formation of mental visual images, the act or process of interpreting in visual terms or of putting into visual form. A new definition is a tool or method for interpreting image data fed into a computer and for generating images from complex multi-dimensional data sets (1987). Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999 http://www.siggraph.org/education/materials/HyperVis/visgoals/visgoal2.htm   includes information on data visualization.

Related term: information visualization; Broader term: visualization

databases: Bioinformatics glossary; Databases & software directory

deep web:  Most of the Web's information is buried far down on dynamically generated sites, and standard search engines never find it.  The deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request. But a direct query is a "one at a time" laborious way to search.  [Michael K. Bergman "The deep web: surfacing hidden value" White Paper, BrightPlanet, 2000-2002] http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp  Another version at  http://www.press.umich.edu/jep/07-01/bergman.html

Google = about 10,200 Aug. 17, 2002; about 42,900 Oct. 22, 2004

Related term:  invisible web

description logic: Has existed as a field for a few decades yet only somewhat recently has appeared to transform from an area of academic interest to an area of broad interest. This paper provides a brief historical perspective of description logic developments that have impacted DL usability to include communities beyond universities and research labs.  Deborah L. McGuinness. "Description Logics Emerge from Ivory Towers". Stanford Knowledge Systems Laboratory Technical Report KSL-01-08 2001. In the Proceedings of the International Workshop on Description Logics. Stanford, CA, August 2001. http://www.ksl.stanford.edu/people/dlm/papers/dls-emerge-abstract.html

The main effort of the research in knowledge representation is providing theories and systems for expressing structured knowledge and for accessing and reasoning with it in a principled way. Description Logics are considered the most important knowledge representation formalism unifying and giving a logical basis to the well known traditions of Frame- based systems, Semantic Networks and KL- ONE-like languages, Object- Oriented representations, Semantic data models, and Type systems. [Description Logic Knowledge Representation] http://dl.kr.org/

Description Logics Home Page, Patrick Lambrix, Linkoping Univ. Sweden http://www.ida.liu.se/labs/iislab/people/patla/DL/index.html

descriptive ontology: A descriptive ontology would try to explain how things are, whereas a normative ontology would try to tell us how things ought to be. [Robert Kent "Ballot comment", Standard Upper Ontology [SUO] E-mail archive,  IEEE, 2001] http://suo.ieee.org/email/msg05921.html

Google = about 121 July 19, 2002; about 343 Oct. 22, 2004 

descriptive taxonomies: Supports information retrieval through searching. By developing and maintaining a core set of controlled vocabularies, a company can consistently label or tag its content with descriptive metadata selected from these authorized vocabularies. In addition, vocabularies can capture knowledge worker terminology and map it to a company’s preferred terms. ... Active mining of new terms and phrases from emerging content and from search query logs will help keep a descriptive taxonomy relevant to the users of that information. A taxonomy built on the thesaurus model (designating a preferred or authorized term with entry terms or variants) helps to link these different terms together. At search time, the term that the knowledge worker uses is associated with the preferred (or key) term for more precise searching, or the knowledge worker’s term is expanded to include the variant forms of the term as well as the authorized term for a broader search. Taxonomies built on the thesaurus model do not force all work groups to use a common set of terminology. Susan Conway and Char Sligar, "What is a taxonomy" Unlocking Knowledge Assets, Chapter 6, Building Taxonomies, Microsoft Press, 2002   http://www.microsoft.com/mspress/books/sampchap/5516a.asp
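
The thesaurus model described above, with a preferred term linked to entry terms, can be sketched as two lookups: one that narrows a searcher's word to the preferred term, and one that expands it to every variant. The vocabulary and the names `THESAURUS`, `precise_term` and `expanded_terms` are hypothetical, chosen only for illustration.

```python
# Hypothetical thesaurus: preferred term -> entry terms (variants)
THESAURUS = {
    "neoplasms": {"cancer", "tumors", "tumours"},
    "myocardial infarction": {"heart attack", "mi"},
}

# Reverse index: any variant or preferred term -> preferred term
PREFERRED = {variant: pref for pref, variants in THESAURUS.items() for variant in variants}
PREFERRED.update({pref: pref for pref in THESAURUS})

def precise_term(user_term):
    """Map the searcher's word to the preferred term (more precise search)."""
    key = user_term.lower()
    return PREFERRED.get(key, key)  # unknown words pass through unchanged

def expanded_terms(user_term):
    """Expand to the preferred term plus all of its variants (broader search)."""
    pref = precise_term(user_term)
    return {pref} | THESAURUS.get(pref, set())

print(precise_term("Heart attack"))
print(sorted(expanded_terms("cancer")))
```

Because variants are mapped rather than forbidden, each work group can keep its own terminology while searches still converge on the same content.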

Google = about  119 July 19, 2002; about  201 Oct. 22, 2004; about 456 July 9, 2007

Related terms: bottom-up taxonomies, data management vocabulary, navigational taxonomies, shared taxonomies

digital libraries: International digital libraries research is intended to contribute to the fundamental knowledge required to create information systems that can operate in multiple languages, formats, media, and social and organizational contexts. International collaborative research can bring complementary approaches, resources and perspectives to bear on common needs and information technology research challenges. International digital libraries applications testbeds are intended to build operational prototypes for globally distributed, internet- based resources, and to implement these in a variety of applications contexts. The testbeds are expected to advance technologies across the digital libraries lifecycle, focus collective work on organizing domain- specific content, and engage researchers, scholars, students and teachers in enhancing research and knowledge resources in a variety of subject domains. [National Science Foundation, International Digital Libraries Collaborative Research & Applications Testbeds program solicitation, 2002] http://www.nsf.gov/pubs/2002/nsf02085/nsf02085.html

Google = about 197,000 July 19, 2002; about 1,480,000 Oct. 22, 2004 

Directed Acyclic Graph DAG: A directed graph where no path starts and ends at the same vertex. See also directed graph, acyclic graph, cycle. Note: Also called a DAG or acyclic digraph. Also called an oriented acyclic graph. [Paul E. Black, NIST, Dictionary of Algorithms, Data Structures and Problems, 2001] http://www.nist.gov/dads/HTML/directAcycGraph.html

The difference between a DAG and a hierarchy is that in the latter each child can only have one parent; a DAG allows a child to have more than one parent. A child term may be an "instance" of its parent term (is a relationship) or a component of its parent term (part- of relationship). A child term may have more than one parent term and may have a different class of relationship with its different parents. [Gene Ontology Consortium, "General Documentation" 2001] http://www.geneontology.org/doc/GO.doc.html
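
Both properties, several parents per child and no path that returns to its starting vertex, can be sketched with a small parent map. The terms and relationships below are illustrative assumptions rather than actual Gene Ontology content, and the helper names `ancestors` and `is_acyclic` are hypothetical.

```python
# Hypothetical terms: child -> list of (parent, relationship)
PARENTS = {
    "nucleus": [("organelle", "is_a"), ("cell", "part_of")],  # two parents, two relationship classes
    "organelle": [("cell component", "is_a")],
    "cell": [("cell component", "is_a")],
    "cell component": [],
}

def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    found = set()
    stack = [term]
    while stack:
        for parent, _relationship in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def is_acyclic(parents=PARENTS):
    """True when no term is its own ancestor, i.e. no path starts and ends at the same vertex."""
    return all(term not in ancestors(term, parents) for term in parents)

print(sorted(ancestors("nucleus")))
print(is_acyclic())
```

In a strict hierarchy every list in `PARENTS` would hold at most one entry; allowing more than one is the only structural change needed to move from a tree to a DAG.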

Google = about  18,300 July 19, 2002; about 35,000 Oct. 2, 2004 

disambiguate: Make less ambiguous, clarify, elucidate. 

Google = about  33,100 July 19, 2002; about 65,300 Oct. 22, 2004 

domain expertise: 

Google = about 25,500 Dec. 18, 2002; about 68,500 Oct. 22, 2004; about 785,000 June 22, 2007

domain ontology: Ontologies glossary

domain taxonomies: The first step is to define the taxonomy of entities in the domain. This consists of firstly defining the basic classes, then defining the sub- types of these classes.  [Mick O'Donnell, "Defining domain taxonomies" Domain Acquisition in Ilex 3.0, 1993-1996] http://www.hcrc.ed.ac.uk/ilex/Manual/extending/Domain-Acquisition/domacq/node4.html#S0....

Google = about 166 July 19, 2002; about 276 Oct. 22, 2004 

drug discovery informatics:

drug ontology: Drug discovery & Development

Dublin Core Metadata Initiative: An open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. The original workshop for the Initiative was held in Dublin, Ohio [OCLC] in 1995. http://dublincore.org/

dynamic ontology: Ontology glossary

dynamic taxonomies: Developed as a way of sifting through large amounts of data. At its base it uses a domain specific taxonomic hierarchy, consisting of concepts connected by is- a relationships. Examples from the medical domain include UMLS and SNOMED. Concepts from the hierarchy are used to classify chunks of guidelines text. The hierarchy is then used as an augmented index for guidelines chunk retrieval. Navigation is done via the operations of browsing and zooming. [Dennis Wollersheim, Implementation of dynamic taxonomies for clinical guidelines retrieval, La Trobe Univ., Australia, c. 2001]  http://homepage.cs.latrobe.edu.au/lewisba/SPIRT/dw2001c.pdf
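
A simplified reading of the approach above: concepts form an is-a hierarchy, text chunks are classified under concepts, browsing lists subconcepts, and zooming narrows the chunk set to those filed under a concept or any of its descendants. The concepts, chunk IDs and function names are hypothetical, and this sketch is not the cited implementation.

```python
# Hypothetical is-a hierarchy: concept -> direct subconcepts
IS_A_CHILDREN = {
    "disease": ["cardiovascular disease", "infection"],
    "cardiovascular disease": ["hypertension"],
    "infection": [],
    "hypertension": [],
}

# Hypothetical guideline chunks, each classified under one or more concepts
CHUNKS = {
    "chunk-1": {"hypertension"},
    "chunk-2": {"infection"},
    "chunk-3": {"cardiovascular disease", "infection"},
}

def descendants(concept):
    """The concept itself plus everything below it in the is-a hierarchy."""
    result = {concept}
    for child in IS_A_CHILDREN.get(concept, []):
        result |= descendants(child)
    return result

def browse(concept):
    """List the direct subconcepts of a concept."""
    return IS_A_CHILDREN.get(concept, [])

def zoom(concept, chunks=CHUNKS):
    """Keep only chunks classified under the concept or one of its descendants."""
    scope = descendants(concept)
    return {chunk_id: tags for chunk_id, tags in chunks.items() if tags & scope}

focus = zoom("cardiovascular disease")
print(sorted(focus))
print(sorted(zoom("infection", focus)))  # zooming again narrows the current focus
```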

Google = about 119 July 19, 2002; about 369 Oct. 22, 2004 

evolvability:   Defined by Tim Berners-Lee at http://www.w3.org/Talks/1998/0415-Evolvability/slide3-1.htm 

Google = evolvability  about 8,210  July 19, 2002; about 21,400 Oct. 22, 2004

See also under interoperability

facet:  Ranganathan was the first to introduce the word "facet" into library and information science, and the first to consistently develop the theory of facet analysis. A facet is, simply put, a category. Taylor defines facets as "clearly defined, mutually exclusive, and collectively exhaustive aspects, properties, or characteristics of a class or specific subject." Ranganathan demonstrated that analysis, which is the process of breaking down subjects into their elemental concepts, and synthesis, the process of recombining those concepts into subject strings, could be applied to all subjects, and demonstrated that this process could be systematized. (Taylor pp 320- 321; Foskett p 390). The phrase "analytico- synthetic classification" derives from these two processes: analysis and synthesis.  Amanda Maple, "FACETED ACCESS: A REVIEW OF THE LITERATURE" Working Group on Faceted Access to Music, Music Library Association Annual Meeting, 10 February 1995 http://www.musiclibraryassoc.org/BCC/BCC-Historical/BCC95/95WGFAM2.html 

faceted classification: One of the most powerful, yet least understood, methods of organizing information. Most folks, when thinking about organizing objects or information, immediately think of a hierarchical, or taxonomic, organization; a top- down structure, where you start with a number of broad categories that get ever more detailed, until you arrive at the object. In such structures, each object has a single home, and typically, one path to get there -- this is how things are organized in "the real world", where each item can only be in one place. Oftentimes, when thinking of organizing information, a hierarchy is where people begin (think Yahoo!).  Faceted classification, on the other hand, is a bottom- up scheme. Here, each object is tagged with a certain set of attributes and values (these are the facets), and the organization of these objects emerges from this classification, and how a user chooses to access them. ... Faceted classification allows for exploration directed by the user, where a large dataset is progressively filtered through the user's various choices, until arriving at a manageable set that meet the users' basic criteria. Instead of sifting through a pre- determined hierarchy, the items are organized on- the- fly, based on their inherent qualities. [Peter Merholz "Innovation in classification" Sept. 23, 2001] http://www.peterme.com/archives/00000063.html

The use of facets in information retrieval did not originate with Ranganathan. In the 18th century, a Frenchman named Condorcet devised what we would now call a faceted classification scheme for organizing information about objects or facts. (Whitrow) The Dewey Decimal Classification, first published in 1876, contained elements of facet analysis. Dewey recognized four facets common to all basic classes: bibliographic form, time, place, and general subjects (such as statistics or research) that at times are related to other subjects. (Foskett pp 176-7) Dewey provided for "number building" to combine two or more facets to express a complex subject. (Taylor p 320) The Universal Decimal Classification, based on the Dewey Decimal Classification and first published in 1905, was intended to be an international classification scheme. It also had elements of a faceted structure, and partly influenced Ranganathan's thinking. (Foskett p 349; Vickery pp 12- 14)  Amanda Maple, "FACETED ACCESS: A REVIEW OF THE LITERATURE" Working Group on Faceted Access to Music, Music Library Association Annual Meeting, 10 February 1995  http://www.musiclibraryassoc.org/BCC/BCC-Historical/BCC95/95WGFAM2.html 

faceted metadata: Composed of orthogonal [mutually independent] sets of categories. For example, in the domain of architectural images, some possible facets might be Materials (concrete, brick, wood, etc.), Styles (Baroque, Gothic, Ming, etc.) ... and so on. [Jennifer English et al. "Flexible search and navigation using faceted metadata" 2002] http://bailando.sims.berkeley.edu/papers/chi02_short_paper.pdf

Google = about 360 July 19, 2002; about 2,530 Oct. 22, 2004

fractal nature of the web: http://www.w3.org/DesignIssues/Fractal.html Tim Berners- Lee, Commentary on architecture, Fractal nature of the web, first draft  

Society has to be fractal - people want to be involved on a lot of different levels. The need for things that are local and special will create enclaves. And those will give us the diversity of ideas we need to survive. Tim Berners Lee, in "The father of the web", Evan Schwartz, Wired Mar. 1997 http://www.wired.com/wired/archive/5.03/ff_father_pr.html

GIS Geographic Information Systems: Maps have traditionally been used to explore the Earth and to exploit its resources. GIS technology is an expansion of cartographic science. Geographic information systems (GIS) technology can be used for scientific investigations, resource management, and development planning. It has enhanced the efficiency and analytic power of traditional mapping. GIS technology is becoming an essential tool in the effort to understand the process of global change.  [Is GIS in your future?  Boston Chapter, Special Libraries Association meeting, Mar. 12, 2002] http://www.sla.org/chapter/cbos/meetings/fy02/sci_tech.htm

Good Informatics Practices Guidance Document (GIP): A newly drafted comprehensive body of information of regulatory requirements in the form of existing (GLP, GMP, GCP and Part 11) and currently used standards compiled in one reference guide for an IT system of a life science or healthcare environment. http://www.lsit.org/initiatives/gip.php

GUI Graphical User Interface: Computers & computing glossary

granularity: <jargon, parallel> The size of the units of code under consideration in some context. The term generally refers to the level of detail at which code is considered, e.g. "You can specify the granularity for this profiling tool". The most common computing use is in parallelism where "fine grain parallelism" means individual tasks are relatively small in terms of code size and execution time, "coarse grain" is the opposite. You talk about the "granularity" of the parallelism. The smaller the granularity, the greater the potential for parallelism and hence speed- up but the greater the overheads of synchronisation and communication. [FOLDOC 1997] 
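As a rough illustration of the trade-off in the FOLDOC definition, the Python sketch below splits the same workload at two grain sizes. The function name split_into_tasks and the 12-item workload are hypothetical, chosen only for this example.

```python
def split_into_tasks(items, grain_size):
    """Split a workload into tasks of at most `grain_size` items each."""
    if grain_size < 1:
        raise ValueError("grain_size must be at least 1")
    return [items[i:i + grain_size] for i in range(0, len(items), grain_size)]


work = list(range(12))

# Fine grain: many small tasks, so more can run at once,
# but there are more tasks to schedule and synchronise.
fine = split_into_tasks(work, 2)

# Coarse grain: few large tasks, so less scheduling overhead,
# but less opportunity for parallelism.
coarse = split_into_tasks(work, 6)
```

The work itself is identical in both cases; only the size of the unit handed to each worker changes.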

The extent to which a system contains separate components (like granules). The more components in a system - or the greater the granularity - the more flexible it is. [Webopedia] http://www.webopedia.com/TERM/g/granularity.html

Choosing different levels of granularity, i.e., imposing different quality criteria on models built by homology from representative, experimentally determined [protein] structures, leads to different numbers of family representatives as targets. [NIGMS Structural Genomics Targets Workshop February 11-12, 1999] http://www.nigms.nih.gov/news/meetings/structural_genomics_targets.html

Concept of granularity, ISWorld Mailing List, Michael Chilton, 2001 http://www.isworld.org/isworldarchives/research.asp#  

Level of detail seems to be the essence of granularity.

Google = about  250,000 July 19, 2002; about 454,000 Oct. 22, 2004

health information data: Includes Clinical data captured during the process of diagnosis and treatment. Epidemiological databases that aggregate data about a population. Demographic data used to identify and communicate with and about an individual. Financial data derived from the care process or aggregated for an organization or population. Research data gathered as a part of care and used for research or gathered for specific research purposes in clinical trials. Reference data that interacts with the care of the individual or with the healthcare delivery systems, like a formulary, protocol, care plan, clinical alerts or reminders, etc. Coded data that is translated into a standard nomenclature or classification so that it may be aggregated, analyzed, and compared.  [Health Information Management; Professional definitions, Committees on Professional Development, American Health Information Management Association, 1999, 2000] http://www.ahima.org/infocenter/definitions/HIMprofessionaldefinition.htm

health information management:  Health information management improves the quality of healthcare by insuring that the best information is available to make any healthcare decision. Health information management professionals manage healthcare data and information resources. The profession encompasses services in planning, collecting, aggregating, analyzing, and disseminating individual patient and aggregate clinical data. It serves the healthcare industry including: patient care organizations, payers, research and policy agencies, and other healthcare- related industries.  [Health Information Management; Professional definitions, Committees on Professional Development, American Health Information Management Association, 1999, 2000] http://www.ahima.org/infocenter/definitions/HIMprofessionaldefinition.htm

Google = about 56,700  Jan. 2, 2003; about 145,000 Oct. 22, 2004

heavyweight ontologies: Heavyweight ontologies, by contrast [to lightweight], contain class hierarchies, constraints, and inference rules. It takes a long time and many resources to develop and maintain them and it is uncertain if there will be a benefit from this extra effort. Resource Description Framework (RDF) and Web Ontology Language (OWL) of the World-Wide Web Consortium (W3C) are technologies designed to model heavyweight ontologies. Topic Maps are Emerging: Why Should I Care?  H. Holger Rath,  http://www.idealliance.org/papers/dx_xmle04/papers/03-01-03/03-01-03.html 

Google = about 21 July 19, 2002; about 60 Oct. 22, 2004; about 70 May 2, 2005
heavyweight taxonomies, heavyweight taxonomy = 0 [except for this glossary]

heterogeneous data:

informatics: The study of the application of computer and statistical techniques to the management of information. In genome projects, informatics includes the development of methods to search databases quickly, to analyse DNA sequence information, and to predict protein sequence and structure from DNA sequence data. ORD Office of Rare Diseases, NIH glossary http://ord.aspensys.com/asp/resources/glossary_a-e.asp#A 

Narrower terms: bioinformatics; cheminformatics; Computers & computing glossary clinical informatics, molecular informatics,  Biomaterials matinformatics research informatics; Drug discovery & development life sciences informatics, Intellectual property & legal glossary;  patinformatics; Molecular imaging image informatics;  pharmacoinformatics, pharmainformatics Proteomics protein informatics 

information -- how much?  How Much Information 2003, School of Information Science and Systems, Univ. of California, Berkeley, 2003 http://www.sims.berkeley.edu/research/projects/how-much-info-2003/index.htm 

information architecture: "Involves the design of organization, labeling, navigation, and searching systems to help people find and manage information more successfully."  [Lou Rosenfeld, Peter Morville interview quoted in Mark Hurst "About Information Architecture" Apr. 3, 2000] http://www.goodexperience.com/columns/040300infoarch.html

Google = about 132,000 July 19, 2002; about 258,000 July 3, 2003; about 622,000 Oct. 22, 2004

Information architecture glossary, Kat Hagedorn, Argus Associates, 2000, 60 + definitions http://argus-acia.com/white_papers/iaglossary.html

information ecology: CSTB is contemplating a major initiative that would examine the rise of new forms of content, changes in media use patterns and their implications, changes in the supply of different kinds of content or media and their implications (e.g., for access, use, and the evolution of specific industries or institutions), and such ramifications as growing potential for manipulation of digital information, coping with data overload (data mining, visualization, and other data-intensive applications), and the internationalization of content production, ownership, and use. "Under Development" Computer Science and Telecommunications Board, US National Academies, http://www7.nationalacademies.org/cstb/projects_under_development.html

Google = about 11,100 Oct. 22, 2004

information extraction: Computers & computing glossary

information harvesting: See under Knowledge Discovery in Databases KDD

Google = about 871 July 19, 2002; about 1,230 July 3, 2003; about 1,730 Oct. 22, 2004; about 1,140,000 June 22, 2007

information integration: Our research group is developing intelligent techniques to enable rapid and efficient information integration. The focus of our research has been on the technologies required for constructing distributed, integrated applications from online sources. This research includes: Information Extraction: Machine learning techniques for extracting information from online sources; Source Modeling: Constructing a semantic model of wrapped sources so that they can be automatically integrated with other sources; Record Linkage: Learning how to align records across sources; Data Integration: Generating plans to automatically integrate data across sources; Plan Execution: Representing, defining, and efficiently executing integration plans in the Web environment; Constraint-based Integration: Interactive constraint-based planning and integration for the Web environment. Information Integration Research Group, Intelligent Systems Division, Information Sciences Institute (ISI), University of Southern California http://www.isi.edu/integration/
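Of the steps listed above, record linkage is the easiest to show in miniature. The Python sketch below is not the ISI group's method, which learns how to align records; it substitutes a fixed string-similarity threshold from the standard-library difflib module. The function names, the sample records, and the 0.8 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and strip punctuation so trivial differences do not matter."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(source_a, source_b, threshold=0.8):
    """Pair each record in source_a with its best match in source_b,
    keeping only pairs at or above the (assumed) similarity threshold."""
    if not source_b:
        return []
    links = []
    for a in source_a:
        best = max(source_b, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            links.append((a, best))
    return links


# Hypothetical records from two sources that name the same institution differently.
source_a = ["Univ. of Southern California", "MIT Media Lab"]
source_b = ["University of Southern California", "Stanford University"]
links = link_records(source_a, source_b)
```

A hand-set threshold is brittle on real data, which is presumably why the group describes learning the alignment rather than fixing it in advance.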

Google = about 4,430,000 July 3, 2003; about 1,080,000 June 22, 2007

information management:  Information services of various kinds are fundamental to the discovery, development and use of medicines. Within the pharmaceutical industry, often regarded as the epitome of the 'information intensive' industry, research information units provide both external and internal information provision and management to discovery and development programmes, while medical information units provide in- depth information on the company's products to external doctors, pharmacists, etc., and commercial information units handle information on competitors, marketing data, etc. Additionally, information personnel are involved in activities such as records management and archiving, regulatory affairs, data administration, IT support, and many more. Within the NHS [National Health Service, UK], Drug Information Pharmacists provide information services on effective use of medicines to all healthcare professions, and are also involved in database compilation, records management, current awareness etc. The move towards evidence- based medicine, with consequent need for evaluation and presentation of information, is of obvious importance to this group. Other sectors with a heavy reliance on the handling of pharmaceutical information and knowledge include publishing, database production, software services, and consultancy of varied kinds.  [MSc in Pharmaceutical Information Management, City Univ. London, UK, Dept of Information Science, Introduction, 2002] http://www.soi.city.ac.uk/organisation/is/teaching/pim/

Narrower term: health information management

Google = about 1,470,000 Jan. 2, 2003; about 4,200,000 Oct. 22, 2004

information overload: Biomedicine is in the middle of revolutionary advances. Genome projects, microassay methods like DNA chips, advanced radiation sources for crystallography and other instrumentation, as well as new imaging methods, have exceeded all expectations, and in the process have generated a dramatic information overload that requires new resources for handling, analyzing and interpreting data. Delays in the exploitation of the discoveries will be costly in terms of health benefits for individuals and will adversely affect the economic edge of the country. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana- Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

Many of today's problems stem from information overload and there is a desperate need for innovative software that can wade through the morass of information and present visually what we know.  The development of such tools will depend critically on further interactions between the computer scientists and the biologists so that the tools address the right questions, but are designed in a flexible and computationally efficient manner.  It is my hope that we will see these solutions published in the biological or computational literature.  Richard J. Roberts, The early days of bioinformatics publishing, Bioinformatics 16 (1): 2-4, 2000

"Information overload" is not an overstatement these days. One of the biggest challenges is to deal with the tidal wave of data, filter out extraneous noise and poor quality data, and assimilate and integrate information on a previously unimagined scale.

Google = about  118,000 July 19, 2002; about 249,000 Oct. 22, 2004

Where's my stuff? Ways to help with information overload, Mary Chitty, SLA presentation June 10, 2002, Los Angeles CA

information retrieval: 

information theory: Algorithms & data analysis glossary

information visualization: The direct visualization of a representation of selected features or elements of complex multi- dimensional data. Data that can be used to create a visualization includes text, image data, sound, voice, video - and of course, all kinds of numerical data. Our visual analysis systems also provide the tools to interact with the data that has been visualized so that users can explore, discover and learn. Users do not look at static images, but can subset the data, run queries, do time sequence studies and create categories and correlations of data type. [Pacific Northwest National Lab, About Visualization at PNNL, 1999] http://www.pnl.gov/infoviz/

Google = about 28,100 July 19, 2002; about 94,200 Oct. 22, 2004

Information visualization resources on the web, 2002 http://graphics.stanford.edu/courses/cs348c-96-fall/resources.html

Related term: data visualization; Broader term: visualization

institutional repositories: A new strategy that allows universities to apply serious, systematic leverage to accelerate changes taking place in scholarship and scholarly communication, both moving beyond their historic relatively passive role of supporting established publishers in modernizing scholarly publishing through the licensing of digital content, and also scaling up beyond ad-hoc alliances, partnerships, and support arrangements with a few select faculty pioneers exploring more transformative new uses of the digital medium. Clifford Lynch, Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age, ARL Bimonthly Report 226, Feb. 2003 http://www.arl.org/newsltr/226/ir.html

DSpace, MIT http://www.dspace.org/

integrated taxonomy: We developed a comprehensive help taxonomy by combining both user interface and help system attributes, ranging from help access interface, presentation, and supporting knowledge structure, to implementation. The taxonomy systematically identifies independent axes along which help can be categorized which in turn encloses a space of help categories in which to place currently existing help research, and identifies distinct help software architectural features which contrast pros and cons in different approaches to implement help systems. The taxonomy projects a vision of what help can be like if it is on a par with advances in user interface technology, and desirable design features of help system architectures which are in the progressive direction along with the user interface software tools.  [Piyawadee "Noi" Sukaviriya, An Integrated Taxonomy of Online Help Based on User Interface View, GVU, Georgia Institute of Technology, GIT-GVU-91-20] http://www.cc.gatech.edu/gvu/reports/1991/abstracts/91-20.html

Google = about 85 July 19, 2002; about 353 Oct. 22, 2004

integrated view definitions:

Related terms: data mediation, knowledge based mediation

integration: Bioinformatics glossary

interoperability: The ability of two or more systems or components to exchange information and to use the information that has been exchanged. [Institute of Electrical and Electronics Engineers. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. New York, NY: 1990] http://www.sei.cmu.edu/str/indexes/glossary/interoperability.html

Enabling heterogeneous databases to function in an integrated way, sometimes refers to cross platform functionality and operability across relational, object- oriented, and non- standard types of databases.

Google = about 1,080,000 July 19, 2002; about 2,380,000 Oct. 22, 2004

Related terms: metadata, ontology, taxonomies ; Narrower terms: ontology interoperability,  semantic interoperability, software interoperability

invisible web:  For this study, we have avoided the term "invisible Web" because it is inaccurate. The only thing "invisible" about searchable databases is that they are not indexable nor able to be queried by conventional search engines. http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp

Those parts of the web which are inaccessible to current search engines. A straightforward example was PubMed/ Medline (until Google started indexing it). You still usually can't access proprietary (fee- based) databases such as Thomson Dialog or Lexis- Nexis except directly. Until recently PDF documents and PowerPoint slides were inaccessible to search engines.

Google = about 17,300 July 19, 2002; about 278,000 Oct. 22, 2004

Direct Search, Gary Price, George Washington Univ. US gary@freepint.com
Invisible Web: Database contents rarely found in Search Engines, Univ. of California- Berkeley, Spring 2001 http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

Related terms: deep web, semantic web

just in time information: 90,200 websites were found with this phrase by Google on May 23, 2007. An increasing need as we are deluged with information and data -- and still need time to reflect, discuss and think about what all these mean.

Google = about 2,900 March 14, 2002, about 3,400 July 19, 2002; about 51,600 Feb. 21, 2006; about 88,400 May 7, 2007

Just-In-Time Information Retrieval. Bradley J. Rhodes. Ph.D. Dissertation, MIT Media Lab, May 2000. Just in time retrieval agents Bradley J. Rhodes http://www.research.ibm.com/journal/sj/393/part2/rhodes.html

Related terms: information overload, remembrance agents; Bioinformatics modularity

Knowledge Discovery in Databases (KDD): Algorithms & data analysis glossary

knowledge integration:

Related terms: ontologies, semantics

knowledge management:  An organization's collective knowledge - and the ability to access it - comprises a key corporate asset. Smart organizations know that to maintain competitive advantage, they need to manage their data, information, and knowledge effectively and systematically. Knowledge management involves much more than compiling data and retrieving information. It should be seen as an overarching concept that combines a management philosophy with data warehousing, workflow strategies, database management, and knowledge distribution in a network computing environment. [William A. Woods "Knowledge Management Needs Effective Search Technology" Sun Journal] http://www.sun.com/dot-com/sunjournal/V2N1/03_feat2a.html

Google = about 826,000 July 19, 2002; about 3,520,000 Oct. 22, 2004

Knowledge Management, FDA, 2004 http://www.fda.gov/cdrh/strategic/km.html 

Virtual Library: Knowledge Management, May 2000   http://www.brint.com/km/ Definition, articles, white papers, interviews, business and technology library, periodicals and publications, “out of box thinking”, “movers and shakers”, “think tank”, calendar of events, emerging topics. 
Knowledge Management definitions, Charlie Matthews, VisualInterconnections, 2002 http://www.visualinterconnections.com/CEM/definitions.htm
KM Glossary, GOTCHA, Univ. of California Berkeley, 1999  About 50 terms. http://sims.berkeley.edu/courses/is213/s99/Projects/P9/web_site/glossary.htm 

Related terms: ontologies, paraphrase problem, taxonomies

knowledge risk: Business of biopharmaceuticals glossary

laboratory informatics:  

The specialized application of information technology to maximize laboratory operations. Laboratory informatics encompasses data acquisition, data processing, laboratory information management system (LIMS), laboratory automation, scientific data management (including data analysis and long- term archiving), and electronic laboratory notebooks. Focus is on the application of this technology in analytical, production, and R&D laboratories.  Graduate Programs: Laboratory Informatics, Indiana Univ. School of Informatics, US  http://www.informatics.iupui.edu/Academics/graduate/laboratory_informatics/index.php

Related term: Drug discovery & development  LIMS

Laboratory Informatics Primer, Waters Corp http://www.waters.com/WatersDivision/ContentD.asp?watersit=EGOO-6M3TVN 

Google = about 1,250 Dec. 31, 2002; about 3,000 Oct. 22, 2004

lexical semantics: http://en.wikipedia.org/wiki/Lexical_semantics 

lexicon: A machine- readable dictionary that may contain a good deal of additional information about the properties of the words, notated in a form that parsers can utilize. [Bob Futrelle, A brief introduction to NLP, BIONLP.org, Computer Science, Northeastern Univ., US, 2002]  http://www.ccs.neu.edu/home/futrelle/bionlp/intro.html

A linguistics term (words and their definitions), an artificial intelligence term.  Sometimes a synonym for glossary or dictionary.

Google = about 768,000 July 19, 2002; about 1,960,000 Oct. 22, 2004

life sciences informatics: Informatics are essential at every step of genomics- based drug discovery and development. The commercial landscape of life sciences information technology has changed dramatically in the last few years. Bioinformatics, in particular, has gone through a dramatic boom/bust. While IT companies are looking to the drug discovery and development arena as a new market opportunity, pharmaceutical companies  are faced with rising pressure to reduce (or at least control) costs, and have a growing need for new informatics tools to help manage the influx of data from genomics, and turn that data into tomorrow's drugs. Key IT tools, such as high- performance computing, Web services, and grids, are being used to improve the speed and efficiency of drug discovery and development. True breakthroughs are still lacking, particularly in key areas such as gene prediction, data mining, protein structure modeling and prediction, and modeling of complex biological systems. However, most experts agree that IT and bioinformatics are essential to reaching the improved productivity the pharmaceutical industry craves.  

lightweight ontologies: Topic maps are seen as lightweight ontologies because they are able to model knowledge in a very ‘shallow’ way (e.g. just topics, their classes, occurrences, and associations, but no class hierarchies, constraints, or inference rules). Even ‘shallow’ topic maps are already very useful without having put large investments in their creation. Topic Maps are Emerging: Why Should I Care?  H. Holger Rath,  http://www.idealliance.org/papers/dx_xmle04/papers/03-01-03/03-01-03.html 

Google = about 154 July 19, 2002; about 287 Oct. 22, 2004; about 274 May 2, 2005

Compare: heavyweight ontologies 

lightweight taxonomies: Existing ontologies vary in a continuum from lightweight taxonomies (thesaura or conceptual vocabularies) to rigorous formalizations. [Manuela Viezzer, Ontologies and conceptual modeling, 2000-08-31] http://www.cs.bham.ac.uk/~mxv/publications/onto_engineering/node1.html

Google = about 5 July 19, 2002; about 4 Oct. 22, 2004

logic based ontologies: Very expressive, model is a set of theories, well defined semantics,  Automatic derived classification taxonomies, Concepts are defined and primitive. [Robert Stevens' slides, Univ. of Manchester, UK at Synopsis of the Bio- Ontologies Workshop at the EBI for MGED, Dec. 5, 2001] http://www.cbil.upenn.edu/Ontology/EBI_Bioontologies_Workshop.html Some powerpoints still on web.

Google = about 23 July 19, 2002; about 71 July 14, 2004

lower ontologies: See under middle ontologies

Google = "lower ontologies" about 62 "lower level ontologies" about 134 Aug. 8, 2002

machine-readable: See under metadata

Google = about 303,000 July 19, 2002; about 535,000 Oct. 22, 2004

machine-understandable: See under metadata

Google = about 3,730 July 19, 2002; about 8,950 July 14, 2004

markup languages: Computers & computing glossary 

Google = about 639,000 Aug. 9, 2002; about 170,000 Oct. 22, 2004

mash-up: http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)

Google = about 22,100,000 Oct. 27, 2006

Medbiquitous Consortium: Technology standards based on XML and web services.  http://www.medbiq.org/index.html 

medical informatics: The field of information science concerned with the analysis and dissemination of medical data through the application of computers to various aspects of health care and medicine. [MeSH, 1987] 

Medical informatics has to do with all aspects of understanding and promoting the effective organization, analysis, management, and use of information in health care. While the field of medical informatics shares the general scope of these interests with some other health care specialties and disciplines, medical informatics has developed its own areas of emphasis and approaches that have set it apart from other disciplines and specialties. For one, a common thread through medical informatics has been the emphasis on technology as an integral tool to help organize, analyze, manage, and use information. In addition, as professionals involved at the intersection of information and technology and health care, those in medical informatics have historically tended to be engaged in the research, development, and evaluation side of things, and in studying and teaching the theoretical and methodological underpinnings of data applications in health care. However, today medical informatics also counts among its profession many whose activities are focused on dimensions that include the administration and everyday collection and use of information in health care. What is Medical Informatics? History of Medical Informatics, AMIA American Medical Informatics Association http://www.amia.org/history/what.html 

medical Informatics: Consisting of required course work concerning computer applications in medicine, computer- assisted medical decision making, biomedical imaging, and bioinformatics. Mark Musen, Design and Use of Clinical Ontologies: Curricular Goals for the Education of Health Telematics Professionals, Stanford Medical Informatics, 1999 http://smi-web.stanford.edu/pubs/SMI_Reports/SMI-1999-0767.pdf

Google = about 163,000 July 19, 2002; about 479,000 Oct. 22, 2004, about 6,960,000 Oct. 3, 2005

metadata: Could elevate the status of the web from machine- readable to something we might call machine- understandable. Metadata is "data about data" or specifically in our current context "data describing web resources." The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application ("one application's metadata is another application's data"). [W3C, "Introduction to RDF Metadata" 1997] http://www.w3.org/TR/NOTE-rdf-simple-intro

Metadata is machine understandable information for the web. The W3C Metadata Activity addressed the combined needs of several groups for a common framework to express assertions about information on the Web, and was superseded by the W3C Semantic Web Activity.  [W3C, Metadata and Resource Description, W3C Technology and Society Domain, 2001] http://www.w3.org/Metadata/

Information about data that enables intelligent, efficient access and management of data. … metadata is always less than the data. [Robyne M. Sumpter “Whitepaper on Data Management” Lawrence Livermore National Laboratory, February 10, 1994] http://www.llnl.gov/liv_comp/metadata/papers/whitepaper-draft.html  more on metadata Ontologies glossary

Google = about  1,640,000 July 19, 2002; about 4,850,000 Oct. 22, 2004; about 25,600,000 May 9, 2005;  about 62,700,000 May 7, 2007

Narrower terms: Dublin Core Metadata Initiative,  faceted metadata Related terms: interoperability, RDF, semantic web 

micro-theories: An ontology about a specific domain, that fits within, and for the most part is consistent with, an ontology with a broader scope. For example, structural biology fits within the larger context of biology. Structural biology will have its own terminology and specific algorithms that apply within the specific domain, but may not be useful or identical to, for example, the genome community. [Lawrence Berkeley Lab "Advanced Computational Structural Genomics" Glossary]

Google = about 953 July 19, 2002; about 8,670 Oct. 22, 2004

modularity: Bioinformatics glossary

molecular informatics: The effective use of information derived from genomics and proteomics is of central importance and the ability to identify the most important data, to assess its accuracy and to be aware of any assumptions and limitations of hypotheses and predictive models is absolutely essential. Whereas the development of predictive models based on analogy has been very successful in chemistry and cheminformatics, the complex nature of biomolecular systems limits similar transference within bioinformatics. Without a critical analysis, in- silico discovery will be unable to be effectively integrated in the field of molecular informatics. The following themes will be covered: knowledge discovery and data mining, rational drug design, prediction of small molecule bioavailability (ADME Tox) properties, protein structure and function determination, new methods of drug- target modeling, cellular metabolism, and the use of high- throughput methods (biochips) for acquiring gene expression and protein binding information. [Beilstein- Institut, Molecular Informatics: Confronting Complexity, International Workshop May 13- 16 2002]  http://www.beilstein-institut.de/pdf_files/bozen_02_scientific_program.pdf

Unilever is investing over £13M to establish a new world- leading research group within the Department of Chemistry [Univ. of Cambridge, UK] in the emerging field of Molecular Informatics. .. New methods will be devised for creating, manipulating and storing molecular data to deepen our understanding of molecules and their properties and to allow novel in- silico experimentation. Inter- disciplinary research is a fundamental goal of the centre, integrating chemical, biological and materials sciences through molecular informatics. [Cambridge Univ. Chemical Laboratory, UK, 2000-2001] http://www-ucc.ch.cam.ac.uk/

Google = about  2,580 July 19, 2002; about 4,410 Oct. 22, 2004

molecular information theory: Algorithms & data analysis glossary

molecular taxonomy: Cancer genomics glossary 

"molecular taxonomy" Google = about 1,650 July 19, 2002; about 5,260 Oct. 22, 2004
"molecular taxonomies" Google = about 11 July 19, 2002; about 106, Oct. 22, 2004

Broader term: taxonomy

nanopublishing: A term coined by Jeff Jarvis, head of content, technology, and strategic development for Advance. This is part of the Newhouse media group that owns Conde Nast, among other things. In the past, Jarvis started Entertainment Weekly. Now, he's a committed blogger and his company has put its money where his mouth is, that is, in Pyra, the company behind Blogger. Jim McClellan, New biz on the blog, Guardian Jan. 30, 2003 http://www.guardian.co.uk/online/story/0,3605,884658,00.html

National Center for Biomedical Ontology: http://www.bioontology.org/index.html 

natural language ontologies: Hand crafted, flexible but difficult to evolve, maintain and keep consistent, with weak semantics. Example Gene Ontology [Robert Stevens' slides, Univ. of Manchester, UK at Synopsis of the Bio-Ontologies Workshop at the EBI for MGED, Dec. 5, 2001] http://www.cbil.upenn.edu/Ontology/EBI_Bioontologies_Workshop.html

Google = about 69 July 19, 2002; about 96 Oct. 22, 2004

Natural Language Processing NLP: <artificial intelligence> (NLP) Computer understanding, analysis, manipulation, and/or generation of natural language. This can refer to anything from fairly simple string- manipulation tasks like stemming, or building concordances of natural language texts, to higher- level AI [artificial intelligence] -like tasks like processing user queries in natural language. [FOLDOC]  
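As a minimal sketch of the simpler end of that range, the Python below builds a concordance (a map from each word to the positions where it occurs) after applying a deliberately crude suffix-stripping stemmer. The function names, the suffix list and the sample sentence are illustrative assumptions, not part of the FOLDOC definition; a production stemmer such as Porter's handles far more cases.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip a few common English suffixes (illustrative only, not Porter's algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words such as "is" are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_concordance(text):
    """Map each stemmed token to the list of token positions where it appears."""
    concordance = defaultdict(list)
    for position, token in enumerate(tokenize(text)):
        concordance[crude_stem(token)].append(position)
    return dict(concordance)

# Made-up sample sentence for illustration.
sample = "Genes encode proteins. A gene encoding a protein is expressed."
concordance = build_concordance(sample)
```

Here "Genes" and "gene" land under the same key, which is the point of stemming before indexing; "encode" and "encoding" do not, which shows how limited a three-suffix stemmer is.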

The newly emergent interest in natural language processing for biology has been christened "Information Extraction". But work in this area has been going on for many decades under different names and this site includes a good deal of information about past and current work in NLP and in information extraction for biology in particular.  [BIONLP.org, Bob Futrelle, Computer Science, Northeastern Univ., US, 2002] http://www.ccs.neu.edu/home/futrelle/bionlp/

Google = about 166,000 July 19, 2002; about 471,000 Oct. 22, 2004

START Natural Language Question Answering System, InfoLab Group, Computer Science and Artificial Intelligence Lab, MIT  http://www.ai.mit.edu/projects/infolab/start-system.html 

navigational taxonomies: Aimed at discovering information through browsing. Once again the taxonomy provides a controlled vocabulary, but rather than using it in the background for manipulating queries, you can display this taxonomy to knowledge workers to help them find the information they need. The navigational taxonomy consists of labels applied to categories of content based on knowledge workers’ mental models of how the information is organized. ... A navigational taxonomy is based on user behavior and not on content. As a result, the category labels may be organized differently from the concept- based descriptive taxonomy, and they also may contain words or phrases that would not meet the standards of a descriptive taxonomy. ...  navigational taxonomies are often specialized and unique to an instance of information presentation (a portal, a site, an intranet), and multiple content management systems do not typically reuse them as they would a descriptive taxonomy. Navigational taxonomies are therefore not governed by the same rules about which taxonomy terms can be changed.  Susan Conway and Char Sligar, "What is a taxonomy" Unlocking Knowledge Assets, Chapter 6, Building Taxonomies, Microsoft Press, 2002 http://www.microsoft.com/mspress/books/sampchap/5516a.asp

Google = about 21 July 19, 2002; about 27 Oct. 22, 2004; about 83 July 9, 2007

OIL Ontology Inference Layer: A proposal for a web- based representation and inference layer for ontologies, which combines the widely used modelling primitives from frame- based languages with the formal semantics and reasoning services provided by description logics. It is compatible with RDF Schema (RDFS), and includes a precise semantics for describing term meanings (and thus also for describing implied information). http://www.ontoknowledge.org/oil/

object based ontologies: Computers & computing glossary

Google = about 17,500 July 19, 2002

ontological commitment: An agreement to use a vocabulary (i.e., ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology. We build agents that commit to ontologies. We design ontologies so we can share knowledge with and among these agents. [Tom Gruber, "What is an ontology?" Knowledge Systems Lab, Stanford Univ. 2001] http://www-ksl.stanford.edu/kst/what-is-an-ontology.html

Google = about 2,370 July 19, 2002; about 5,980 Oct. 22, 2004

ontology, ontologies:  A formal explicit specification of a shared conceptualization. In this context conceptualization refers to an abstract model of some phenomenon in the world that identifies that phenomenon's relevant concepts. Explicit means that the type of concepts used and the constraints on their use are explicitly defined, and formal means that the ontology should be machine understandable. ... Shared reflects the notion that an ontology captures consensual knowledge, that is, it is not restricted to some individual but is accepted by a group. Dieter Fensel et al. "OIL: An Ontology Infrastructure for the Semantic Web" IEEE Intelligent Systems, Mar/Apr. 2001  www.cs.vu.nl/~frankh/postscript/IEEE-IS01.pdf

The word "ontology" seems to generate a lot of controversy in discussions about AI  [artificial intelligence]. It has a long history in philosophy, in which it refers to the subject of existence. ... In the context of knowledge sharing, I use the term ontology to mean a specification of a conceptualization. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set- of- concept- definitions, but more general. And it is certainly a different sense of the word than its use in philosophy. What is important is what an ontology is for. My colleagues and I have been designing ontologies for the purpose of enabling knowledge sharing and reuse. In that context, an ontology is a specification used for making ontological commitments. ... Notes: 1) Ontologies are often equated with taxonomic hierarchies of classes, class definitions, and the subsumption relation, but ontologies need not be limited to these forms. Tom Gruber, Stanford Univ. "What is an ontology?", 2001 http://www-ksl.stanford.edu/kst/what-is-an-ontology.html     more in Ontologies glossary

1.1 What is an ontology? [W3C, Requirements for a web ontology language, work in progress] http://www.w3.org/TR/webont-req/#onto-def

Similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical, and therefore are general enough to address (at a high level) a broad range of domain areas. [IEEE, Standard Upper Ontology (SUO) Working Group, 2002]  http://suo.ieee.org/

Long used in philosophy and artificial intelligence. The major difference from controlled vocabularies or taxonomies is the idea of making information "machine- understandable" as well as machine- readable, and amenable to logic (particularly by agreeing upon one specific meaning for a term).

Terminology of methods and techniques for defining, sharing, and merging ontologies, John F. Sowa, 2001. 18 definitions, including formal ontology, mixed ontology, prototype type ontology, terminological ontology. http://users.bestweb.net/~sowa/ontology/gloss.htm

Human Ontology Resources, SOFG Standards and Ontologies for Functional Genomics, http://www.sofg.org/resources/human.html#cbil 

Google = ontology about 336,000 July 19, 2002; about 1,140,000 Oct. 1, 2003; about 1,250,000 Oct. 22, 2004

Narrower terms: bottom- up ontologies, biomedical ontologies, common ontology, descriptive ontology, domain ontology, dynamic ontology, heavyweight ontologies, lightweight ontologies, logic based ontologies, micro- theories, middle ontologies, mixed ontologies,  taxonomies, natural language ontologies, navigational ontology, object based ontologies, orthogonal ontologies, pure ontologies, reusable ontologies, shared ontologies, simple ontologies, structured ontology, top- down ontology, upper ontologies; Functional genomics glossary Gene OntologyTM GO;  

Related terms: interoperability, metadata, OIL Ontology Inference Layer, ontological commitment, ontology annotation tools, ontology editors, ontology evolution, ontology interoperability, RDF, semantic web, web ontology language; Microarrays glossary Ontology Working Group

ontology annotation tools: Link unstructured and semistructured information sources with ontologies. [Dieter Fensel et al. "OIL: An Ontology Infrastructure for the Semantic Web" IEEE Intelligent Systems, Mar/Apr. 2001] www.cs.vu.nl/~frankh/postscript/IEEE-IS01.pdf

ontology editors: Help human knowledge engineers build ontologies - they support the definition of concept hierarchies, the definition of attributes for concepts, and the definition of axioms and constraints. They must provide graphical interfaces and conform to existing standards in Web- based software development. They enable the inspecting, browsing, codifying, and modifying of ontologies, and they support ontology development and maintenance tasks. [Dieter Fensel et al. "OIL: An Ontology Infrastructure for the Semantic Web" IEEE Intelligent Systems, Mar/Apr. 2001] www.cs.vu.nl/~frankh/postscript/IEEE-IS01.pdf

Google = about 314  July 19, 2002; about 873 Oct. 22, 2004

Related term: Computers & computing glossary GUI Graphical User Interface

ontology evolution: 3.2 Ontology evolution, W3C, Requirements for a web ontology language, work in progress http://www.w3.org/TR/webont-req/#goal-evolution

Google = about 234 July 19, 2002; about 886 Oct. 22, 2004

ontology interoperability: 3.3 Ontology interoperability,  W3C, Requirements for a web ontology language, work in progress http://www.w3.org/TR/webont-req/#goal-interoperability

Google = about 89 July 19, 2002; about 276 Oct. 1, 2003; about 284 Oct. 22, 2004

Broader term: interoperability

ontology language: An ontology must be encoded in some language. If one is using a simple ontology, few issues arise. However, if one is considering a more complex ontology, expressive power of a representation and reasoning language needs to be considered. As with any problem where a language is being chosen, it must be epistemologically adequate -- the language must be able to express the concepts in the domain. Deborah L. McGuinness, "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2002. http://www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-

Open Biomedical Ontologies OBO: A collaborative experiment involving developers of science-based ontologies who are establishing a set of principles for ontology development with the goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain.  http://www.obofoundry.org/ 

organizational informatics: A field which studies the development and use of computerized information systems and communication systems in organizations. It includes social studies of their conception, design, effective implementation within organizations, maintenance, use, organizational value, conditions that foster risks of failures, and their effects for people and an organization's clients. It is an intellectually rich and practical research area. "Social Informatics" Indiana Univ, School of Library & Information Science  http://www.slis.indiana.edu/SI/oi1.html

Narrower term: social informatics

orthogonal ontologies: Independent, same basis for classification at all levels. Bernd G. Wenzel, Integration of industrial data: Overview, NeuroSTEP and Shell, 1996- 1997 http://www.tc184-sc4.org/SC4_Open/SC4_and_Working_Groups/WG10/N-DOCS/Files/wg10n116.pdf

Google = about 6 July 19, 2002; about 72 Oct. 22, 2004

Related term: pure ontologies. Compare mixed ontologies

orthogonal taxonomies: Independent taxonomies, disjoint, with no overlap.

Google = about 24 July 19, 2002; about 45 Oct. 22, 2004

parallel processing: The processing of program instructions by dividing them among multiple processors with the objective of running a program in less time [whatis.com] http://whatis.techtarget.com/definition/0,289893,sid9_gci212747,00.html

paraphrase problem: The situation that arises when the terminology used in the request is different from that used by the author. [William A. Woods, Sun Microsystems Research] http://research.sun.com/people/wwoods/ Conceptual Indexing for Precision Content Retrieval http://research.sun.com/knowledge/

Google = about 153 July 19, 2002; about 211 Oct. 22, 2004

Related term:  knowledge management

pattern, pattern language:  Patterns, discussion FAQ http://g.oswego.edu/dl/pd-FAQ/pd-FAQ.html 

phylogenetic taxonomy: Phylogenomics glossary 

Google = about 929 July 19, 2002; about 1,900 Oct. 22, 2004

portal: An entry or starting point on the web, with a mixture of content and services, usually capable of personalization.

Narrower term: web portal

precision: Percentage of the material retrieved by a specific query or search statement that is actually related to the request; high precision means that unrelated material has been excluded.

Related terms: Genetic testing glossary analytical specificity, clinical specificity. Compare recall

pure ontologies: The basis for classification is the same throughout the classification hierarchy. Such ontologies can be expected to be orthogonal. Here orthogonal will mean that classes at a level will be mutually exclusive. On the other hand an object can be a member of a class in more than one ontology. [Matthew West, Integration of Industrial Data for Exchange, Access and Sharing (IIDEAS), NIST, ISO TC184/SC4/WG10 N71, 1996]  http://www.nist.gov/sc4/wg_qc/wg10/current/n071/wg10n071.htm

Google = about 13 July 19, 2002; about 17 Oct. 22, 2004

Related term: orthogonal ontologies

query contraction: Needed when a search engine retrieves thousands of citations. May consist of additional terms (Boolean AND) or different terms (Boolean OR).

Google = about  26 July 19, 2002; about 130 Oct. 22, 2004

query expansion: Adding new and/or different terms to a search statement (particularly when a search engine or database retrieves no hits). Often uses Boolean OR.

Google = about 7,500 July 19, 2002; about 21,300 Oct. 22, 2004

Related terms: ontologies, taxonomies
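A rough sketch of how query expansion and query contraction might be assembled as Boolean search strings follows. The helper names and the synonym table are hypothetical; in practice the alternative terms could come from a taxonomy or ontology, and the exact Boolean syntax depends on the search engine.

```python
def expand_query(terms, synonyms):
    """Broaden a query: OR each term together with any synonyms supplied for it."""
    groups = []
    for term in terms:
        alternatives = [term] + synonyms.get(term, [])
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(term)
    return " AND ".join(groups)

def contract_query(query, required_terms):
    """Narrow a query: AND additional required terms onto the existing query."""
    return " AND ".join(["(" + query + ")"] + list(required_terms))

# Hypothetical synonym table, e.g. drawn from a controlled vocabulary.
SYNONYMS = {"cancer": ["neoplasm", "tumor"]}

expanded = expand_query(["cancer", "microarray"], SYNONYMS)
contracted = contract_query("ontology", ["biomedical", "gene"])
```

With these inputs, `expanded` is `(cancer OR neoplasm OR tumor) AND microarray` and `contracted` is `(ontology) AND biomedical AND gene`.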

RDF Resource Description Framework: Integrates a variety of web- based metadata activities including sitemaps, content ratings, stream channel definitions, search engine data collection (web crawling), digital library collections, and distributed authoring, using XML as an interchange syntax. The RDF specifications provide a lightweight ontology system to support the exchange of knowledge on the Web. [W3C, Semantic Web Activity: Resource Description Framework (RDF) Mar. 2001] http://www.w3.org/RDF/

Related term: knowledge management
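RDF describes resources with statements of the form subject, predicate, object. The sketch below models a few such statements as plain Python tuples and matches them against a pattern; it is not an RDF library and does no XML parsing. The example.org resources and literal values are invented, and the predicate URIs use the Dublin Core element namespace.

```python
# Dublin Core element namespace, used here for the predicates.
DC = "http://purl.org/dc/elements/1.1/"

# Each statement is a (subject, predicate, object) tuple; the data is made up.
TRIPLES = {
    ("http://example.org/glossary", DC + "title", "Information management glossary"),
    ("http://example.org/glossary", DC + "creator", "Example Author"),
    ("http://example.org/other", DC + "title", "Another page"),
}

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

titles = match(TRIPLES, predicate=DC + "title")
creators = match(TRIPLES, subject="http://example.org/glossary", predicate=DC + "creator")
```

Because every statement has the same three-part shape, metadata from independent sources can be pooled into one set and queried uniformly, which is the interchange role the definition above describes.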

RSS [Really Simple Syndication] feeds: A Web content syndication format based on XML. Cathleen Moore, Search engines target weblogs, InfoWorld, Mar. 17, 2003 http://www.infoworld.com/article/03/03/17/HNblogs_1.html

Newsreaders http://directory.google.com/Top/Reference/Libraries/Library_and_Informat...e

RSS 2.0 specifications, Dave Winer http://blogs.law.harvard.edu/tech/rss/
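To illustrate what "based on XML" means in practice, here is a small sketch that reads item titles and links from an RSS 2.0 document using Python's standard library. The feed content is invented for the example, and the `read_items` helper is a hypothetical name; real feeds often carry more elements (description, pubDate, guid) than shown here.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document: one channel containing two items.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>A made-up channel for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second</link>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Return (title, link) pairs for every <item> in the feed, in document order."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]

items = read_items(SAMPLE_FEED)
```

A newsreader does essentially this on a schedule, comparing the items it finds with those it has already seen.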

recall: The percentage of applicable material retrieved by a specific query or search statement. 

Compare precision. Related term: Genetic testing glossary sensitivity 
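A short worked example may help keep the two measures apart. It assumes the standard information retrieval formulas: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The document identifiers are invented.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that the query actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: 4 documents retrieved, 5 documents truly relevant, 2 in common.
precision, recall = precision_recall(
    retrieved=["doc1", "doc2", "doc3", "doc4"],
    relevant=["doc2", "doc4", "doc7", "doc8", "doc9"],
)
```

Here precision is 2/4 and recall is 2/5. Broadening the query tends to raise recall at the cost of precision, and narrowing it tends to do the reverse.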

regulated information systems: Drug approvals glossary 

relevance: Percentage of truly related material retrieved by a specific query or search statement. 

Related terms: precision; Genetic testing glossary analytical specificity, clinical specificity. Compare recall

remembrance agents: A set of applications that watch over a user's shoulder and suggest information relevant to the current situation. While query- based memory aids help with direct  recall, remembrance agents are an augmented associative memory. [Bradley Rhodes, Remembrance Agents Because serendipity is too important to be left to chance..., 2001]  http://rhodes.www.media.mit.edu/people/rhodes/RA/

Google = about 673 July 19, 2002; about 549 Oct. 22, 2004

Related terms: collaborative filtering, just in time information

research informatics: Research glossary

resourceome: -Omes & -Omics glossary

reusable ontologies:  A key enabler for electronic Commerce, Richard Fikes, Knowledge Systems Lab, Stanford Univ.  http://ksl-web.stanford.edu/Reusable-ontol/index.html

Google = about 597 July 19, 2002; about 1,330 Oct. 1, 2003; about 778 Oct. 22, 2004

Related term: shared ontologies

reusable taxonomies: Metadata, Taxonomies and Content Reusabilities, Marcia Morante  http://adlcommunity.net/file.php/11/Documents/Eedo_Knowledgeware_Metadata_Taxonomies_and_Content_Reusability.pdf 

Google = about  5 July 19, 2002; about 8 Oct. 1, 2003; about 8 Oct. 22, 2004; about 8 June 22, 2007

Rosetta: A systems- level design language developed to address requirements specification for systems- on- chip designs. Rosetta specifically addresses problems associated with heterogeneity and complexity in current systems. Specifically, Rosetta allows designers to develop and integrate specifications written in multiple semantic models to provide language and semantic support for concurrent engineering of electronic systems.  Accellera Rosetta Standards Committee Homepage, EDA Industry Working Groups, 2002  http://www.eda.org/slds-rosetta/

SOAP Simple Object Access Protocol:  A lightweight protocol for exchange of information in a decentralized, distributed environment. [SOAP, W3C 1.1, work in progress] http://www.w3.org/TR/SOAP/

semantic data integration: Semantic data integration requires a shared understanding of the meaning of mathematical data. Until recently, math protocols provided no support for shared semantics beyond the meaning of the primitive data types and simply assumed that the communicating partners "knew" each other. An important task of the Computer Algebra community is to close this semantic gap. Several initiatives addressing this problem are underway (MP, OpenMath, MathBus) and we hope that more experience and a careful evaluation of the proposals will lead to a unifying solution. Olaf Bachmann, Hans Schönemann "A Proposal for Syntactic Data Integration for Math Protocols" Centre for Computer Algebra, Dept. of Mathematics, Univ. of Kaiserslautern, Germany http://www.mathematik.uni-kl.de/~zca/Reports_on_ca/10/paper_html/node1.html

Google = about  214 July 19, 2002; about 1,530 Oct. 22, 2004; about 23,900 June 22, 2007

semantic grid:  As the Semantic Web is to the Web, so is the Semantic Grid to the Grid. Rather than orthogonal activities, we see the emerging semantic web infrastructure as an infrastructure for grid computing applications. http://www.semanticgrid.org/

Google = about 190 July 19, 2002; about 5,470 Oct. 22, 2004; about 182,000 June 22, 2007

Related term: Computers & computing grid computing

semantic heterogeneity: Semantic heterogeneity in document encoding systems is a serious obstacle to the interoperability required to create a critical mass of content for the electronic publishing industry. This is a problem which persists even after a common syntax (e.g. XML) has been adopted, and sometimes even when common vocabularies are used.  [Scholarly Technology Group, Brown Univ., US, Jan 2002]  http://www.stg.brown.edu/news/2002/nist_report.html

Different databases use different controlled vocabularies, thesauri, taxonomies and/ or free text. 

Google = about 2,820 July 19, 2002; about 6,080 Oct. 22, 2004; about 78,700 June 22, 2007

Contrast with: structural heterogeneity Related terms: Natural Language Processing NLP;  Bioinformatics glossary databases, federated databases, integrated databases 

semantic interoperability: Jeff Heflin, James Hendler, Semantic interoperability on the web, Extreme Markup Languages, 2000  http://www.cs.umd.edu/projects/plus/SHOE/pubs/extreme2000.pdf

Google = about 7,280 Apr. 24, 2003; about 18,300 Oct. 22, 2004; about 330,000 June 22, 2007

semantic relationships: Denote concepts such as water, sea, and river, that are by definition permanent relationships; they arise from the definition of the subjects involved, and are not dependent on any particular document content. ... Foskett described three groups of semantic relationships: equivalence, hierarchical, and affinitive/associative. In equivalence relationships, more than one term denotes the same concept. These relationships are shown through cross- references in an alphabetical tool, and through juxtaposition in a classified tool. Hierarchical relationships are of two kinds: genus/ species and whole/ part. These relationships are shown through hierarchies in classified tools and with Broader and Narrower Term codes in alphabetical tools. Foskett described several kinds of affinitive/ associative relationships; these relationships are denoted by Related Term codes. (Foskett pp 72- 78)  Amanda Maple, "FACETED ACCESS: A REVIEW OF THE LITERATURE" Working Group on Faceted Access to Music, Music Library Association Annual Meeting, 10 February 1995  http://www.musiclibraryassoc.org/BCC/BCC-Historical/BCC95/95WGFAM2.html

Related term: syntactic relationships
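The three groups of semantic relationships can be sketched as a small thesaurus structure using the Broader Term (BT), Narrower Term (NT) and Related Term (RT) codes mentioned above, plus a cross-reference table for equivalence. The specific entries, including "ocean" and "hydrology", are invented for illustration and are not taken from Foskett or Maple.

```python
# Hierarchical (BT/NT) and affinitive/associative (RT) relationships per preferred term.
THESAURUS = {
    "water": {"NT": ["sea", "river"], "RT": ["hydrology"]},
    "sea": {"BT": ["water"]},
    "river": {"BT": ["water"]},
    "hydrology": {"RT": ["water"]},
}

# Equivalence relationships: non-preferred term -> preferred term (a cross-reference).
USE = {"ocean": "sea"}

def preferred(term):
    """Resolve a term to its preferred form via the equivalence cross-references."""
    return USE.get(term, term)

def broader_chain(term):
    """Follow BT links upward from a term, returning the broader terms in order."""
    chain = []
    current = preferred(term)
    while True:
        parents = THESAURUS.get(current, {}).get("BT", [])
        if not parents or parents[0] in chain:  # stop at the top, or on a cycle
            break
        current = parents[0]
        chain.append(current)
    return chain

chain = broader_chain("ocean")
```

Looking up "ocean" first resolves to the preferred term "sea" and then climbs to "water". None of this depends on any particular document, which is what makes these relationships permanent.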

semantic transparency: Within the context of interoperable XML- based information processing, "semantic transparency" means that machines and humans are presented with information that is both unambiguous (having a precise, predictably interpreted meaning) and meaningfully correct (simultaneously satisfying a number of integrity constraints). Computer agents, in particular, must exchange well- defined data in order to calculate and pass along "the correct answer." Semantic transparency first requires that small information objects as well as large information objects built from smaller ones are formally specified at a detailed level in terms of their fundamental characteristics, relationships, and natural integrity constraints, such that validation tools can apply heuristics to test information correctness. Given unambiguous semantic specification, both computing agents and humans can verify that XML- encoded information is meaningful and trustworthy. Managing Names and Ontologies: An XML Registry and Repository, Robin Cover (OASIS) http://www.sun.com/981201/xml/

semantic web: The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications. In order to make this vision a reality for the Web, supporting standards, technologies and policies must be designed to enable machines to make more sense of the Web, with the result of making the Web more useful for humans. 

Facilities and technologies to put machine- understandable data on the Web are rapidly becoming a high priority for many communities. For the Web to scale, programs must be able to share and process data even when these programs have been designed totally independently. The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. [W3C, Semantic Web Activity Statement, Apr. 2001] http://www.w3.org/2001/sw/Activity

The first layer of the semantic Web consists of ontologies and taxonomies, like "A machine bolt is a type of screw." "A huge amount of this is being done very desperately in the realm of biotech, for the human genome and new drug development. When you look at a Web services description, you realize that it's really just a very small ontology"  Tim Berners-Lee, August 30, 2001 keynote at Software Development East in Boston. [Alexandra Weber Morales "Web founder seeks simplicity" Show Daily Online, 2001] http://www.sdgnews.com/sd2001es_006/sd2001es_006.htm

Google = about 71,600 July 19, 2002; about 967,000 Oct. 22, 2004; about 19,800,000 June 22, 2007

Semantic Web Business Special Interest Group: http://business.semanticweb.org/
Semantic web challenge: http://challenge.semanticweb.org/ 
Semantic Web Community Portal http://www.semanticweb.org/
Semantic Web HCLS Health Care and Life Sciences Interest Group http://www.w3.org/2001/sw/hcls/ 

Broader term: web Related terms: metadata, ontology, RDF, taxonomies, XML. Compare: syntax 

semantics: How the information [in a data file] should be interpreted by others. [Russ Altman "Challenges for Biomedical Informatics and Pharmacogenomics", Stanford Medical Informatics, c.2001] http://www-smi.stanford.edu/pubs/SMI_Reports/SMI-2001-0898.pdf

shared ontologies: 3.1 Shared ontologies, W3C, Requirements for a web ontology language, work in progress http://www.w3.org/TR/webont-req/#goal-shared-ontologies

Designing Shared Ontologies, JOHO the Blog, 2004 http://www.hyperorg.com/blogger/mtarchive/003057.html 

Google = about 1,090  July 19, 2002; about 2,450 Oct. 1, 2003; about 2,520 Oct. 22, 2004

Related term: reusable ontologies

shared taxonomies: Shared Taxonomies, LouisRosenfeld.com, 2004 http://www.louisrosenfeld.com/home/bloug_archive/000276.html  

Google = about 12 July 19, 2002; about 70 Oct. 22, 2004; about 86 May 2, 2005; about 217 June 22, 2007

social informatics: Refers to the body of research and study that examines social aspects of computerization -- including the roles of information technology in social and organizational change and the ways that the social organization of information technologies are influenced by social forces and social practices. SI includes studies and other analyses that are labeled as social impacts of computing, social analysis of computing, studies of computer- mediated communication (CMC), information policy, "computers and society," organizational informatics, interpretive informatics, and so on. http://www.slis.indiana.edu/SI/concepts.html

The term "Social Informatics" emerged from a series of lively conversations in February and March 1996 among scholars with an interest in advancing critical scholarship about the social aspects of computerization, including Phil Agre, Jacques Berleur, Brenda Dervin, Andrew Dillon, Rob Kling, Mark Poster, Karen Ruhleder, Ben Shneiderman, Leigh Star and Barry Wellman. As the conversation developed, it became clear that labels that could energize scholars in one sub- community could readily turn off participants in other communities. Various participants preferred different labels; a sufficient consensus emerged around "Social Informatics" that it can serve as a working label.  ["Conceptions of social informatics" Indiana Univ., School of Library and Information Science, 2002] http://www.slis.indiana.edu/SI/concepts.html

A serviceable working conception of "social informatics" is that it identifies a body of research that examines the social aspects of computerization. A more formal definition is "the interdisciplinary study of the design, uses and consequences of information technologies that takes into account their interaction with institutional and cultural contexts." ... Social informatics has been a subject of systematic analytical and critical research for the last 25 years. Unfortunately, social informatics studies are scattered in the journals of several different fields, including computer science, information systems, information science and some social sciences. Each of these fields uses somewhat different nomenclature. This diversity of communication outlets and specialized terminologies makes it hard for many non- specialists (and even specialists) to locate important studies. [Rob Kling, What is social informatics and why does  it matter? D-Lib 5(1): Jan. 1999] http://www.dlib.org/dlib/january99/kling/01kling.html 

Social informatics HomePage http://www.slis.indiana.edu/SI/

Red Rock Eater News Service, Phil Agre, UCLA, US  http://polaris.gseis.ucla.edu/pagre/rre.html 

structural heterogeneity: Different databases use different fields, fieldnames and relationships between elements. This can also be a term in structural biology.

Google = about  2,210 July 19, 2002; about 9,340 Oct. 22, 2004

Compare semantic heterogeneity. Related term: metadata
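The contrast between structural and semantic heterogeneity can be shown with two hypothetical database records that describe the same gene. The database names, field names and mapping table below are all invented; the sketch only shows that renaming fields to a common schema resolves the structural mismatch while leaving differences in vocabulary untouched.

```python
# Hypothetical per-database mappings from local field names to a common schema.
FIELD_MAPS = {
    "db_a": {"gene_symbol": "symbol", "organism": "species"},
    "db_b": {"GeneName": "symbol", "Taxon": "species"},
}

def to_common(record, source):
    """Rename a record's fields to the common schema, dropping unmapped fields."""
    mapping = FIELD_MAPS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}

# The same gene as two made-up databases might store it.
a = to_common({"gene_symbol": "TP53", "organism": "Homo sapiens"}, "db_a")
b = to_common({"GeneName": "TP53", "Taxon": "human"}, "db_b")

same_fields = a.keys() == b.keys()               # structural mismatch resolved
same_species_value = a["species"] == b["species"]  # vocabulary still differs
```

After mapping, both records expose the same fields, yet one says "Homo sapiens" and the other "human". Reconciling those values needs a shared controlled vocabulary or ontology, which is the semantic side of the problem.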

structure:  In a biological or anatomical context, the term structure is associated with two distinct concepts (meanings): 1. a material object generated as a result of coordinated gene expression, which necessarily consists of parts (e.g., hemoglobin molecule, cell, heart, human body); and 2. the manner of organization or interrelation of the parts that constitute a structure specified by the first definition (i.e., the structure of a structure). Both definitions emphasize the critical need for declaring the principles according to which units of organization can be defined in order to be able to state what is 'whole' and what is 'part'. Specifying the manner in which parts interrelate must satisfy two requirements: 1. to determine the kinds of parts of which various structures may be constituted; and 2. to state the manner of spatial organization of parts by describing their boundaries, continuities and attachments, as well as their location, orientation and spatial adjacencies in terms of qualitative coordinates (in addition to the quantitative geometric coordinates, which are embedded in the Visible Human data sets). [Cornelius Rosse et al., Visible Human, Know Thyself: The Digital Anatomist Dynamic Structural Abstraction, National Library of Medicine, US] http://www.nlm.nih.gov/research/visible/vhpconf2000/AUTHORS/ROSSE/TEXTINDX.HTM

Related terms: Cell biology glossary, Expression glossary Compare unstructured.

subsumption: http://ai.eecs.umich.edu/cogarch0/subsump/ 

Google = about 30,800 July 19, 2002; about 80,500 Oct. 22, 2004; about 159,000 May 2, 2005

syntactic heterogeneity: Fausto Giunchiglia, Pavel Shvaiko, rewritten by Stefano Zanobini, Semantic Matching, 2002 http://www.science.unitn.it/~tomasi/think/pdf/zanobini.pdf

Google = about  114 July 19, 2002; about 243 Oct. 1, 2003; about 201 Oct. 22, 2004; about 227 May 9, 2005

syntactic relationships: Denote otherwise unrelated concepts that are brought together as composite subjects in the documents being indexed. These relationships are not permanent, but rather ad hoc. ...  Syntactic relationships are displayed according to the syntax of a normal sentence, either through the syntax of the subject string (in precoordinate indexing), or through devices such as facet indicators (in postcoordinate indexing). The result of not providing for the display of syntactic relationships in postcoordinate systems results in users not being able to distinguish between different contexts for the same term. ... recent research in information retrieval also supports the use of syntactic as well as semantic relationships.  Amanda Maple, "FACETED ACCESS: A REVIEW OF THE LITERATURE" Working Group on Faceted Access to Music, Music Library Association Annual Meeting, 10 February 1995  http://theme.music.indiana.edu/tech_s/mla/facacc.rev

Related term: semantic relationships

syntax: How information is structured in a data file. [Russ Altman "Challenges for Biomedical Informatics and Pharmacogenomics", Stanford Medical Informatics, c.2001]  http://www-smi.stanford.edu/pubs/SMI_Reports/SMI-2001-0898.pdf

Compare semantics

taxonomies, taxonomy:  Taxonomies define a world-view because they specify which characteristics that compose each item count as important and then they lay out the relationships that exist between those characteristics. Taxonomies are political, value-laden instruments of organization that have a wide-array of assumptions embedded within them. Along more formal lines, a taxonomy is a structured vocabulary that identifies a single key term to represent a concept that could be described using several words. [Katherine C. Adams "Immersed in Structure: The Meaning and Function of Taxonomies" Internetworking Aug. 2000] http://www.internettg.org/newsletter/aug00/article_structure.html

Frustrations with search engines and information retrieval (and information overload) have led to increased interest in specialized taxonomies. A taxonomy is a form of controlled vocabulary with hierarchical relationships (broader terms, narrower terms), which provide additional suggestions for browsing, as do lateral relationships (related terms) and preferred terms. While there is theoretical interest in natural language processing, only a very small percentage of web search engine queries actually use natural language processing successfully.
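
The broader, narrower and related term relationships described above can be sketched in a few lines of Python. This is only an illustration, not any particular taxonomy tool: the Term class and the helper functions are hypothetical names, and the sample terms are borrowed from this glossary entry's own broader, narrower and related term lists.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a hypothetical taxonomy."""
    name: str
    broader: list = field(default_factory=list)   # parent terms (BT)
    narrower: list = field(default_factory=list)  # child terms (NT)
    related: list = field(default_factory=list)   # lateral links (RT)


def build_taxonomy(edges, related_pairs=()):
    """Build a term index from (broader, narrower) name pairs."""
    terms = {}

    def get(name):
        return terms.setdefault(name, Term(name))

    for parent, child in edges:
        get(parent).narrower.append(child)
        get(child).broader.append(parent)
    for a, b in related_pairs:
        get(a).related.append(b)
        get(b).related.append(a)
    return terms


def browse_suggestions(terms, name):
    """Return the broader, narrower and related terms a browser might offer."""
    t = terms[name]
    return {"broader": t.broader, "narrower": t.narrower, "related": t.related}


def expand_query(terms, name):
    """Collect a term and all of its narrower terms, depth first."""
    found = [name]
    for child in terms[name].narrower:
        found.extend(expand_query(terms, child))
    return found


# Sample relationships taken from this entry's term lists.
taxonomy = build_taxonomy(
    edges=[
        ("ontologies", "taxonomies"),
        ("taxonomies", "controlled vocabularies"),
        ("taxonomies", "domain taxonomies"),
    ],
    related_pairs=[("taxonomies", "query expansion")],
)
```

Calling browse_suggestions(taxonomy, "taxonomies") returns the three kinds of browsing suggestions, and expand_query shows one simple way a search could be widened to include narrower terms.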

Directories such as Yahoo or the Open Directory Project are sometimes called taxonomies. In biology, taxonomies are so associated with Linnaeus, and bioinformatics is so dependent upon computers, that ontology is almost always the preferred term in this context.

Google taxonomy = about  617,000 July 19, 2002, about 3,270,000 Oct. 1, 2003, about 3,190,000 Oct. 22, 2004

Narrower terms: bottom-up taxonomies, controlled vocabularies, descriptive taxonomies, domain taxonomies, dynamic taxonomies, integrated taxonomy, lightweight taxonomies, morphological taxonomies, navigational taxonomies, orthogonal taxonomies, shared taxonomies, top-down taxonomy; Cancer genomics glossary molecular taxonomies; Phylogenomics glossary molecular taxonomy, phylogenetic taxonomy

Related terms: classifiers, query expansion; Broader term: ontologies

See also FAQ question # 4 which has more about taxonomies.

term mining:  Term Mining in Biomedicine, Sophia Ananiadou - University of Manchester, 2007 http://talks.cam.ac.uk/talk/index/6769 

Google = about 1,990 June 16, 2003; about 2,980 Oct. 22, 2004; about 40,100 June 22, 2007

text categorisation: See Algorithms & data analysis glossary under support vector machines

Google = about 902; "text categorization" about 9,220 July 19, 2002; about 27,100 Oct. 22, 2004

text mining:  Usually data mining technologies mine knowledge from data with well-formed schemes such as relational tables. But, text data don't have such scheme, and information is described freely in the documents. Therefore, we focus on Natural Language Processing (NLP) technologies to extract such information. Using NLP technologies, documents are transformed into a collection of concepts, described using terms discovered in the text.

Usually, "text mining" is used to indicate a text search technique. But, we think of text mining as having more functions. Text mining technologies extract more information than just picking up keywords from texts: facts, author's intentions, their expectations, and their claims.  Tokyo Research Lab, IBM, Text Mining  http://www.trl.ibm.com/projects/textmining/index_e.htm 

Using data mining on unstructured data, such as the biomedical literature.  
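
As a rough illustration of turning free text into a collection of terms, the sketch below builds a simple term profile for each document. It is a deliberately simplified stand-in for real natural language processing (no parsing, no concept recognition), and the stop-word list, function names and sample sentences are all made up for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use far larger ones.
STOP_WORDS = {"a", "an", "and", "as", "in", "is", "of", "the", "to", "with"}


def extract_terms(text):
    """Lower-case the text, split it into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_profile(documents):
    """Map each document id to a Counter of the terms found in it."""
    return {doc_id: Counter(extract_terms(text)) for doc_id, text in documents.items()}


def documents_mentioning(profiles, term):
    """Return the ids of documents whose profile contains the given term."""
    return sorted(doc_id for doc_id, counts in profiles.items() if term in counts)


# Invented example sentences, standing in for biomedical abstracts.
docs = {
    "doc1": "The kinase inhibitor binds to the kinase domain.",
    "doc2": "Expression of the receptor is reduced in treated cells.",
}
profiles = term_profile(docs)
```

Once documents are reduced to term profiles like these, they can be handled with the same structured-data techniques used in ordinary data mining, for example documents_mentioning(profiles, "kinase").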

Competition in the pharmaceutical industry has increasingly become based upon better recognition and analysis of information, much of which is available as published text.  Breakthrough Strategies for Text Mining in Pharmaceutical R&D, May 25, 2006, Philadelphia PA

Text Mining Glossary, ComputerWorld, 2004 http://www.computerworld.com/databasetopics/businessintelligence/story/0,10801,93967,00.html Includes Categorization, clustering, extraction, keyword search, natural language processing, taxonomy, and visualization.

Related terms:  natural language processing; Algorithms & data analysis glossary support vector machines

Google = about  20,600 July 19, 2002 about 39,300 July 3, 2003; about 113,000 Oct. 22, 2004; about 1,110,000 June 22, 2007

thesaurus, thesauri: See under controlled vocabulary

Google = thesaurus about  2,760,000  thesauri  about 448,000 July 19, 2002; thesaurus about 6,270,000 Oct. 22, 2004 

NISO Z39.19 Standard for Structure and Organization of Information Retrieval Thesauri  http://www.niso.org/standards/resources/Z39-19.html

top-down ontology: We spent the first six months attempting to design a top-down ontology of engineering. We accomplished very little until we selected a concrete system and example applications as contexts for our work. [Jay M. Tenenbaum, Lessons from PACT and SHADE, Enterprise Integration Technologies Corporation and Stanford University, 1995] http://tools.org/EI/ICEIMT/archive/abstracts/PACT-SHADE.abstract

Google = about 10 July 19, 2002; about 19 Oct. 22, 2004

top-down taxonomy: Goes from the general to the specific. Can also mean user oriented. Jean Graef "Top down or bottom up" Montague Institute Review, 2001

Google = about  16 July 19, 2002  about 90 June 17, 2003; about 79 Oct. 22, 2004

topic maps: This specification provides a model and grammar for representing the structure of information resources used to define topics, and the associations (relationships) between topics. Names, resources, and relationships are said to be characteristics of abstract subjects, which are called topics. Topics have their characteristics within scopes: i.e. the limited contexts within which the names and resources are regarded as their name, resource, and relationship characteristics. One or more interrelated documents employing this grammar is called a “topic map.”  http://www.topicmaps.org/xtm/1.0/

(XML) Topic Maps, XML Cover Pages , Robin Cover, 2002 http://xml.coverpages.org/topicMaps.html

Google = about 23,400 July 19, 2002

UDDI: Business of biopharmaceuticals glossary 

UMLS Unified Medical Language System: In 1986, the National Library of Medicine (NLM) began a long-term research and development project to build a Unified Medical Language System® (UMLS®). The purpose of the UMLS is to aid the development of systems that help health professionals and researchers retrieve and integrate electronic biomedical information from a variety of sources and to make it easy for users to link disparate information systems, including computer-based patient records, bibliographic databases, factual databases, and expert systems. The UMLS project develops "Knowledge Sources" that can be used by a wide variety of applications programs to overcome retrieval problems caused by differences in terminology and the scattering of relevant information across many databases.  [UMLS FactSheet, National Library of Medicine, NIH, US, 2002] http://www.nlm.nih.gov/pubs/factsheets/umls.html

unstructured data: Today, transforming unstructured data into a structured form is primarily a manual process; it is time consuming and costly. However, all leading software applications must leverage structured data to be effective. [About Mohomine] http://www.mohomine.com/about/index.asp

Generally free text, natural language.

Related term: natural language processing. Compare structured. 

Google = about  21,200 July 19, 2002

upper ontology: An upper ontology is limited to concepts that are meta, generic, abstract and philosophical, and therefore are general enough to address (at a high level) a broad range of domain areas. [Upper Ontology, IEEE Standard Upper Ontology Working Group] http://ontology.teknowledge.com/

Google = about 11,000 Oct. 22, 2004

variance: One of the two components of measurement error (the other one being bias). Variance results from uncontrolled (or uncontrollable) variation that occurs in biological samples, experimental procedures, and arrays themselves.
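
The split of measurement error into bias and variance can be checked numerically. The sketch below assumes the usual statistical reading, in which mean squared error equals bias squared plus variance; the readings and the "true" value are invented numbers, and error_components is a hypothetical helper name.

```python
from statistics import fmean, pvariance


def error_components(measurements, true_value):
    """Return (bias, variance, mean squared error) for repeated measurements."""
    mean = fmean(measurements)
    bias = mean - true_value                      # systematic offset
    variance = pvariance(measurements, mu=mean)   # spread around the mean
    mse = fmean([(m - true_value) ** 2 for m in measurements])
    return bias, variance, mse


# Invented replicate readings of a quantity whose true value is taken to be 10.0.
readings = [10.2, 9.8, 10.4, 10.0, 10.6]
bias, variance, mse = error_components(readings, true_value=10.0)

# Mean squared error decomposes into bias squared plus variance.
assert abs(mse - (bias ** 2 + variance)) < 1e-9
```

Here the bias reflects how far the average reading sits from the true value, while the variance reflects how much the individual readings scatter around their own average.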

visualization:   A method of computing by which the enormous bandwidth and processing power of the human visual (eye-brain) system becomes an integral part of extracting knowledge from complex data.  It utilizes graphics and imaging techniques as well as knowledge of both data management and the human visual system.  [Lloyd Treinish, Visualization for Deep Thunder, IBM Research, 2002] http://www.research.ibm.com/weather/vis/w_vis.htm

Use of computer-generated graphics to make the information more accessible and interactive. Related term: data mining

Narrower terms: data visualization, information visualization; Algorithms & data analysis glossary dendrogram, heat map, profile chart

visualisation: As the quantity of data produced by simulations grows, so does the difficulty of extracting useful information. It is now clear that in many applications visual methods are the only practical way of extracting information from the data. Computer graphics and scientific visualisation techniques have become more important in the last few years with the increased availability of computing resource and of visualisation tools.  Visualisation is becoming one of the key tools for problem solving both in traditional areas such as visualisation of complex flow and in new applications areas like the planning of surgical operations using 3-D reconstruction of anatomical sites using diagnostic images or the development of highly-realistic aeroplane simulators for pilot training.  DIRECT Development of an Interdisciplinary Roundtable for Emerging Computer Technologies,  Edinburgh University, Scotland  http://www.epcc.ed.ac.uk/DIRECT/vect.html 

Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999 http://www.siggraph.org/education/materials/HyperVis/visgoals/visgoal2.htm 

W3C World Wide Web Consortium: Develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential. W3C is a forum for information, commerce, communication, and collective understanding. http://www.w3.org/

web: The genome community was an early adopter of the Web, finding in it a way to publish its vast accumulation of data, and to express the rich interconnectedness of biological information. The Web is the home of primary data, of genome maps, of expression data, of DNA and protein sequences, of X-ray crystallographic structures, and of the genome project's huge outpouring of publications. ... However the Web is much more than a static repository of information. The Web is increasingly being used as a front end for sophisticated analytic software. Sequence similarity search engines, protein structural motif finders, exon identifiers, and even mapping programs have all been integrated into the Web. Java applets are adding rapidly to Web browsers' capabilities, enabling pages to be far more interactive than the original click-fetch-click interface. [Lincoln D. Stein "Introduction to Human Genome Computing via the World Wide Web", Cold Spring Harbor Lab, 1998]  

Related terms: fractal nature of the web, weblike Narrower terms:  semantic web, web portals, web services  

web harvesting: A Web site is usually viewed as a collection of individual pages interconnected by simple URL links. This is the common basis for Web harvesting engines, where these pages are harvested, indexed, and the search results made available to end-users. As Web sites become increasingly large and sophisticated, it is worthwhile to see how prevalent simple linking is, or if other Web page navigation techniques are replacing the simple linking model. [Web Characterization Project, OCLC, 2001] http://wcp.oclc.org/pubs/rn2-navigation.html
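
The simple linking model that harvesting engines rely on can be illustrated with Python's standard library. The sketch below only extracts the href targets of anchor tags from one page held in a string; LinkCollector, harvest_links and the example.com addresses are placeholders for the example, and a real harvester would also fetch pages, respect robots.txt and index the content.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def harvest_links(html, base_url):
    """Return the simple URL links found in one page of HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Placeholder page and addresses, used only to show the behaviour.
page = '<p><a href="/about.html">About</a> <a href="http://example.org/x">X</a></p>'
links = harvest_links(page, "http://example.com/index.html")
```

Navigation built from scripts or forms rather than anchor tags would be invisible to a collector like this one, which is the kind of limitation the quoted passage is concerned with.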

Google = about 536 July 19, 2002; about 3,000 Oct. 22, 2004

weblogs:   Wikipedia http://en.wikipedia.org/wiki/Weblogs 

A history and a perspective http://www.rebeccablood.net/essays/weblog_history.html
Bob's Weblog Backgrounder, Bob Stepno http://radio.weblogs.com/0106327/stories/2002/12/14/bobsWeblogBackgrounder.html 

Related terms: blog, blogging, blogosphere, microcontent, nanopublishing

web ontology language: Requirements for a Web Ontology Language, working draft http://www.w3.org/TR/2002/WD-webont-req-20020307/

Google = about  736 July 19, 2002; about 19,600 Oct. 22, 2004; about 326,000 Nov 17, 2006

web portals: 2.1 Web Portals, W3C, Requirements for a web ontology language, work in progress  http://www.w3.org/TR/webont-req/#usecase-portal

Google = about 74,600 ("web portal" about 738,000) July 19, 2002

Web search glossary, Google http://www.google.com/support/bin/answer.py?answer=50187 60 definitions

web service interoperability: Web services technology has the promise to provide a new level of interoperability between software applications. It should be no wonder then that there is a rush by platform providers, software developers, and utility providers to enable their software with SOAP, WSDL, and UDDI capabilities.  http://www-106.ibm.com/developerworks/webservices/library/ws-inter.html

Google = "web service interoperability" about  412 "web services interoperability" about 9,620 July 19, 2002; about 283,000 Nov 17, 2006

web services:  The goal of the Web Services Activity is to develop a set of technologies in order to bring Web services to their full potential.  W3C "Web Services Activity", 2002  http://www.w3.org/2002/ws/

Google = about 2,110,000 July 19, 2002; about 122,000,000 Nov 17, 2006

Web services glossary, W3C, http://www.w3.org/TR/ws-gloss/

webizing: "Webizing Existing Systems" Tim Berners-Lee, last updated 2001 http://www.w3.org/DesignIssues/Webize

weblike: [Tim Berners-Lee, Ralph Swick, Semantic web Amsterdam, 2000 May 16] http://www.w3.org/2000/Talks/0516-sweb-tbl/slide3-1.html

In Weaving the Web, his account of coming up with the idea of the web, Tim Berners-Lee writes about "learning to think in a weblike way". I don't know that I can claim to approach this yet, but the more that I write and research this glossary on and for the web, the more insight I'm getting into what he might mean. Metaphors like "shooting at a moving target" and, like Wayne Gretzky, "skating to where the puck is going to be" are helpful images.

Google = about  3,020 July 19, 2002; about 5,510 Oct. 22, 2004; about 75,700 Nov 17, 2006 
"web like" about 788,000,000 Nov 17, 2006 

Wiki collaborative software: Allows users to post and edit content remotely. An exciting (and free) way to build and manage content. Wiki Web sites  allow all users to add and edit content. While it might sound like a free-for-all, the authors suggest such Web sites have been used successfully in research, business, and education to document project designs, for brainstorming, and for otherwise creating content in a collaborative fashion.  Bo Leuf, Ward Cunningham, The Wiki Way: Collaboration and sharing on the internet,  2001 

wild cards and Google http://www.google.com/support/bin/answer.py?answer=3178&ctx=sibling Yes you can.

XML: Computers & computing glossary 

Bibliography
Barnes, Ken et al., Microsoft Lexicon or Microspeak made easier, 1995-1998, 150+ terms.  http://www.cinepad.com/mslex.htm
FOLDOC Free On-line Dictionary of Computing, Denis Howe, 2007. 14,400+ terms.  http://foldoc.org/ 
Glossary of Ontology Terms, Stanford Univ., 2001, 24 terms. http://www-ksl-svc.stanford.edu:5915/doc/frame-editor/glossary-of-terms.html
Information Resource Management Glossary, Government of British Columbia, Canada, 2001 http://www.cio.gov.bc.ca/other/daf/IRM_Glossary.htm
Lycos Tech Glossary 2002 http://webopedia.lycos.com/
Schneider, Tom and Karen Lewis, Glossary for Molecular Information Theory and the Delila System, Lab of Computational and Experimental Biology, NCI Frederick, US, 2004. 100+ definitions.  http://www.lecb.ncifcrf.gov/~toms/glossary.html
W3C Glossary and Dictionary http://www.w3.org/2003/glossary/ 
Web search glossary, Google http://www.google.com/support/bin/answer.py?answer=50187 60 definitions
Web services glossary, W3C, http://www.w3.org/TR/ws-gloss/
Webopedia  http://www.webopedia.com/
whatis.com Information Technology encyclopedia. About 3,000 + definitions.   http://whatis.techtarget.com/
XML Glossary http://www.softwareag.com/xml/about/glossary.htm 

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.
