
Biopharmaceutical information management & interpretation glossary & taxonomy
Evolving Terminologies for Emerging Technologies
Comments? Questions? Revisions? Mary Chitty
Last revised November 14, 2013


The dividing line between this glossary and Algorithms & data analysis is very fuzzy. In general this one focuses primarily on unstructured data (or a combination of structured and unstructured), while Algorithms centers on structured data. Finding guide to terms in these glossaries: Informatics term index; Site Map
Informatics includes: Bioinformatics; Clinical informatics; Drug discovery informatics; IT infrastructure. Ontologies & Taxonomies are subsets of, and critical tools for, Information management & interpretation. Technologies: Microarrays & protein chips; Sequencing

3D technologies: Visual communications are pervasive in information technology and are a key enabler of most new emerging media. In this context, the NRC Institute for Information Technology (NRC-IIT) performs research, development and technology transfer activities to enable access to 3D information of the real world. Research in the 3D Technologies program focuses on three main areas: Virtualizing Reality and Visualization, Collaborative Virtual Environments, 3D Data Mining and Management. Institute for Information Technology, National Research Council, Canada, 3D Technologies

artificial intelligence: Algorithms & data analysis  Google = about 1,120,000 July 19, 2002; about 3,040,000 Oct. 22, 2004

bias: One of the two components of measurement error (the other one being variance). Bias is a systematic error that causes the measurement to differ from the correct value. Since bias is systematic, it affects all experiment replicas the same way. 
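The split between bias and variance can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the true value, the 0.5 offset and the noise level are all hypothetical and are not taken from any cited source.

```python
import random
import statistics


def simulate_measurements(true_value, bias, noise_sd, n_replicates, seed=0):
    """Simulate replicate measurements (hypothetical model).

    Every replicate gets the same systematic offset (bias) plus its own
    random noise, whose spread is what shows up as variance.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd)
            for _ in range(n_replicates)]


def estimate_bias_and_variance(measurements, true_value):
    """Return (estimated bias, sample variance) of the measurements."""
    mean = statistics.fmean(measurements)
    return mean - true_value, statistics.variance(measurements)


# Assumed example values: true value 10.0, systematic offset 0.5,
# random noise with standard deviation 0.2 (variance 0.04).
replicates = simulate_measurements(10.0, bias=0.5, noise_sd=0.2,
                                   n_replicates=1000)
est_bias, est_var = estimate_bias_and_variance(replicates, true_value=10.0)
print(f"estimated bias: {est_bias:.3f}, estimated variance: {est_var:.4f}")
```

Averaging more replicates shrinks the effect of the random noise on the mean, but the systematic offset stays, which is why bias cannot be removed by replication alone.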

bibliomining:  The combination of data mining, bibliometrics, statistics, and reporting tools used to extract patterns of behavior-based artifacts from library systems. Scott Nicholson, Bibliomining: Data Mining for Libraries, Syracuse Univ. US 

collaborative filtering: Tools that leverage user preferences, patterns, and purchasing behavior to customize organization and navigation systems. [Peter Morville "Software for Information Architects" Argus Center for Information Architecture, 2000] 

Amazon's recommendations based on what other buyers of a specific title are buying is a familiar example of collaborative filtering.  Google = about  21,600 July 19, 2002; about 49,300 Oct. 22, 2004 
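A very small sketch of the "other buyers of this title also bought" idea is given here, assuming purchase histories are available as sets of titles per buyer. The data, buyer names and the `also_bought` function are hypothetical; production recommenders use far more elaborate models than simple co-occurrence counting.

```python
from collections import Counter


def also_bought(purchases, title, top_n=3):
    """Rank other titles by how often they share a basket with `title`."""
    counts = Counter()
    for basket in purchases.values():
        if title in basket:
            counts.update(item for item in basket if item != title)
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories.
purchases = {
    "buyer_a": {"Genomics Primer", "Bioinformatics Basics",
                "Statistics for Biologists"},
    "buyer_b": {"Genomics Primer", "Bioinformatics Basics"},
    "buyer_c": {"Genomics Primer", "Protein Chips"},
    "buyer_d": {"Protein Chips", "Statistics for Biologists"},
}

print(also_bought(purchases, "Genomics Primer"))
# ['Bioinformatics Basics', 'Protein Chips', 'Statistics for Biologists']
```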

collaborative metadata: A robust increase in both the amount and quality of metadata is integral to realizing the Semantic Web. The research reported on in this article addresses this topic of inquiry by investigating the most effective means for harnessing resource authors' and metadata experts' knowledge and skills for generating metadata. Jane Greenberg, W. Davenport Robertson, Semantic web construction: An Inquiry of Authors' Views on Collaborative Metadata Generation, International Conference DC 2002, Metadata for e-Communities, Oct. 13-17, 2002, Florence Italy  Google = about 116 Apr. 24, 2003; about 377 Oct. 22, 2004

computational linguistics:  Computational Linguistics, or Natural Language Processing (NLP), is not a new field. As early as 1946, attempts have been undertaken to use computers to process natural language. These attempts concentrated mainly on Machine Translation ... the limited performance of these systems made it clear that the underlying theoretical difficulties of the task had been grossly underestimated, and in the following years and decades much effort was spent on basic research in formal linguistics. Today, a number of Machine Translation systems are available commercially although there still is no system that produces fully automatic high-quality translations (and probably there will not be for some time). Human intervention in the form of pre- and/or post-editing is still required in all cases.  Another application that has become commercially viable in the last years is the analysis and synthesis of spoken language, i.e. speech understanding and speech generation. ... An application that will become at least as important as those already mentioned is the creation, administration, and presentation of texts by computer. Even reliable access to written texts is a major bottleneck in science and commerce. The amount of textual information is enormous (and growing incessantly), and the traditional, word-based, information retrieval methods are getting increasingly insufficient as either precision or recall is always low (i.e. you get either a large number of irrelevant documents together with the relevant ones, or else you fail to get a large number of the relevant ones in the collection). Linguistically based retrieval methods, taking into account the meaning of sentences as encoded in the syntactic structure of natural language, promise to be a way out of this quandary. Computational Linguistics FAQ
Linguistics, natural language, and computational linguistics Meta-Index, Stanford Univ. US  Google = about 97,100 July 19, 2002; about 283,000 Oct. 22, 2004

DAML DARPA Agent Markup Language: The goal of the DAML effort is to develop a language and tools to facilitate the concept of the semantic web.  Related term: OIL


data cleaning, data integration: Algorithms & data analysis  Google = "data cleaning" about  12,200; about 22,500 July 3, 2003
"data integration" about 175,000 July 19, 2002; about 306,000 July 3, 2003; about 817,000 Mar. 22, 2004; about 2,940,000 June 22, 2007

data conversion:   Originally data conversion was primarily a matter of moving text and database files from one medium to another, one hardware platform to another, one operating system environment to another. But as text and database representations became more sophisticated it became apparent that application interoperability was going to be the overriding issue of concern. Company History, Data Conversion Lab 
Glossary, DCL Labs 30+ definitions

data management methods: Algorithms & data analysis covers automated methods; the methods in this glossary generally combine human and automated methods.

data mapping: Wikipedia  Google = about 26,700 Aug. 20, 2002; about 55,000 July 26, 2004; about 208,000 Nov 27, 2006

data quality:  A vital consideration for data analysis and interpretation.  While people are still reeling from the vast amount of data becoming available, they need to brace themselves to both discard low quality data and handle much more at the same time.  
Data quality glossary, Graham Rind, GRC Data Intelligence,  6,700 terms. 

data science: Incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products. Data science is a novel term that is often used interchangeably with competitive intelligence or business analytics, although it is becoming more common. Data science seeks to use all available and relevant data to effectively tell a story that can be easily understood by non-practitioners. Wikipedia accessed Nov 11 2013

data scientist: a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) More than anything, what data scientists do is make discoveries while swimming in data. It’s their preferred method of navigating the world around them. At ease in the digital realm, they are able to bring structure to large quantities of formless data and make analysis possible. They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting set. … As they make discoveries, they communicate what they’ve learned and suggest its implications for new business directions. Often they are creative in displaying information visually and making the patterns they find clear and compelling. … Data scientists’ most basic, universal skill is the ability to write code. This may be less true in five years’ time, when many more people will have the title “data scientist” on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand—and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or—ideally—both. … Data scientists want to be in the thick of a developing situation, with real-time awareness of the evolving set of choices it presents. Data Scientist: The Sexiest Job of the 21st Century, Thomas H. Davenport and D.J. Patil, Harvard Business Review Oct 2012

data visualization:  The classical definition of visualization is as follows: the formation of mental visual images, the act or process of interpreting in visual terms or of putting into visual form. A new definition is a tool or method for interpreting image data fed into a computer and for generating images from complex multi-dimensional data sets (1987). Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999   includes information on data visualization.  
Related term: information visualization; Broader term: visualization

databases: Bioinformatics; Databases & software directory

deep web:  Google = about 10,200 Aug. 17, 2002; about 42,900 Oct. 22, 2004  Related term:  invisible web

description logic: Has existed as a field for a few decades yet only somewhat recently has appeared to transform from an area of academic interest to an area of broad interest. This paper provides a brief historical perspective of description logic developments that have impacted DL usability to include communities beyond universities and research labs.  Deborah L. McGuinness. "Description Logics Emerge from Ivory Towers". Stanford Knowledge Systems Laboratory Technical Report KSL-01-08 2001. In the Proceedings of the International Workshop on Description Logics. Stanford, CA, August 2001.

The main effort of the research in knowledge representation is providing theories and systems for expressing structured knowledge and for accessing and reasoning with it in a principled way. Description Logics are considered the most important knowledge representation formalism unifying and giving a logical basis to the well known traditions of Frame-based systems, Semantic Networks and KL-ONE-like languages, Object-Oriented representations, Semantic data models, and Type systems. [Description Logic Knowledge Representation]

disambiguate: Make less ambiguous, clarify, elucidate.  Google = about  33,100 July 19, 2002; about 65,300 Oct. 22, 2004, about 340,000 Nov 18, 2009

domain expertise: Wikipedia  Google = about 25,500 Dec. 18, 2002; about 68,500 Oct. 22, 2004; about 785,000 June 22, 2007; about 1,120,000 Nov 18, 2009

DTDs Document Type Definitions: The National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM) created the Journal Archiving and Interchange Document Type Definition (DTD) with the intent of providing a common format in which publishers and archives can exchange journal content.

Dublin Core Metadata Initiative: An open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. The original workshop for the Initiative was held in Dublin, Ohio [OCLC] in 1995.

evolvability:   Tim Berners Lee defines 
Google = evolvability  about 8,210  July 19, 2002; about 21,400 Oct. 22, 2004; about 51,000 Nov 18, 2009  See also under interoperability

federated databases: An integrated repository of data from multiple, possibly heterogeneous, data sources presented with consistent and coherent semantics. They do not usually contain any summary data, and all of the data resides only at the data source (i.e. no local storage).   Lawrence Berkeley Lab "Advanced Computational Structural Genomics" Glossary  Related term: Information management & interpretation  semantic data integration
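The "no local storage" property can be sketched with two independent in-memory SQLite databases that are queried on demand and mapped to one common record shape. Everything here is a hypothetical illustration: the source names, table layouts and the `federated_genes` function are assumptions, and real federated systems also handle remote access, schema mapping and query planning.

```python
import sqlite3


def make_source(create_sql, insert_sql, rows):
    """Build one independent in-memory data source (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.executemany(insert_sql, rows)
    return conn


# Two heterogeneous sources: different table and column names.
lab_a = make_source("CREATE TABLE genes (symbol TEXT, organism TEXT)",
                    "INSERT INTO genes VALUES (?, ?)",
                    [("BRCA1", "human"), ("TP53", "human")])
lab_b = make_source("CREATE TABLE loci (gene_name TEXT, species TEXT)",
                    "INSERT INTO loci VALUES (?, ?)",
                    [("Trp53", "mouse"), ("BRCA2", "human")])

# Per-source query that maps the local schema onto (symbol, organism).
SOURCES = [
    ("lab_a", lab_a, "SELECT symbol, organism FROM genes WHERE organism = ?"),
    ("lab_b", lab_b, "SELECT gene_name, species FROM loci WHERE species = ?"),
]


def federated_genes(organism):
    """Query every source at request time; nothing is copied or cached."""
    results = []
    for name, conn, query in SOURCES:
        for symbol, org in conn.execute(query, (organism,)):
            results.append({"source": name, "symbol": symbol,
                            "organism": org})
    return results


for record in federated_genes("human"):
    print(record)
```

The caller sees one consistent record format even though the data stays in, and is fetched from, the original sources each time.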

federated information systems: Their main characteristic is that they are constructed as an integrating layer over existing legacy applications and databases. They can be broadly classified in three dimensions: the degree of autonomy they allow in integrated components, the degree of heterogeneity between components they can cope with, and whether or not they support distribution. Whereas the communication and interoperation problem has come into a stage of applicable solutions over the past decade, semantic data integration has not become similarly clear. Susanne Busse et al. "Federated Information Systems: Concepts, Terminology and Architecture"  Computergestützte Informations Systeme CIS, Berlin, Germany 1999 

fractal nature of the web: Tim Berners-Lee, Commentary on architecture, Fractal nature of the web, first draft  

Society has to be fractal - people want to be involved on a lot of different levels. The need for things that are local and special will create enclaves. And those will give us the diversity of ideas we need to survive. Tim Berners-Lee, in "The father of the web", Evan Schwartz, Wired Mar. 1997

granularity: Wikipedia 

<jargon, parallel> The size of the units of code under consideration in some context. The term generally refers to the level of detail at which code is considered, e.g. "You can specify the granularity for this profiling tool". The most common computing use is in parallelism where "fine grain parallelism" means individual tasks are relatively small in terms of code size and execution time, "coarse grain" is the opposite. You talk about the "granularity" of the parallelism. The smaller the granularity, the greater the potential for parallelism and hence speed-up but the greater the overheads of synchronisation and communication. FOLDOC 1997 

The extent to which a system contains separate components (like granules). The more components in a system - or the greater the granularity - the more flexible it is. [Webopedia]

Level of detail seems to be the essence of granularity.   Google = about  250,000 July 19, 2002; about 454,000 Oct. 22, 2004; about 2,170,000 Nov 18, 2009

informatics: A field of study that focuses on the use of technology for improving access to and utilization of information. AHIMA e-HIM™ Work Group on Computer-Assisted Coding. "Delving into Computer-assisted Coding. Appendix G: Glossary of Terms" Journal of AHIMA 75, no.10 (Nov-Dec 2004): web extra.  

Narrower terms: bioinformatics; cheminformatics; Computers & computing  clinical informatics, molecular informatics,  Biomaterials matinformatics research informatics; Drug discovery & development life sciences informatics, Intellectual property & legal;  patinformatics; Molecular imaging image informatics;  pharmacoinformatics, pharmainformatics Proteomics protein informatics 

information -- how much?  How Much Information 2003, School of Information Management and Systems, Univ. of California, Berkeley, 2003 

information architecture: "Involves the design of organization, labeling, navigation, and searching systems to help people find and manage information more successfully."  [Lou Rosenfeld, Peter Morville interview quoted in Mark Hurst "About Information Architecture", Apr. 3, 2000]  Google = about 132,000 July 19, 2002; about 258,000 July 3, 2003; about 622,000 Oct. 22, 2004; about 5,760,000 Nov 18, 2009
Information architecture glossary, Kat Hagedorn, Argus Associates, 2000, 60+ definitions

information ecology: Wikipedia 

The Information Ecology group (formerly the Physical Language Workshop) explores ways to connect our physical environments with information resources. Through the use of low-cost, ubiquitous technologies, we are creating seamless and pervasive ways to interact with our information—and with each other. We focus on projects that harness the ecology of consumer electronics and sensor devices—present and future—to more smoothly mediate the boundaries between the physical and information worlds we inhabit.  MIT Media Lab Design Ecology/Information Ecology  2009  Google = about 11,100 Oct. 22, 2004; about 70,200 Nov 18, 2009

information extraction: Automated ways of extracting unstructured or partially structured information from machine readable files. Compare with information retrieval.  Google = about 43,100 July 19, 2002; about 590,000 Nov 18, 2009  Related terms: natural language processing, term extraction
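As a toy illustration of pulling structured records out of partially structured text, the sketch below extracts the hit counts and dates from the "Google = about N date" notes used throughout this glossary. The regular expression and the `extract_hit_counts` name are assumptions made for this example; practical information extraction from free text generally relies on natural language processing rather than a single pattern.

```python
import re

# Matches e.g. "about 43,100 July 19, 2002" or "about 590,000 Nov 18, 2009".
HIT_COUNT = re.compile(
    r"about\s+(\d[\d,]*)\s+"                # count with thousands separators
    r"([A-Z][a-z]+\.?\s+\d{1,2},\s*\d{4})"  # month (optional period), day, year
)


def extract_hit_counts(text):
    """Return a list of (count, date_string) tuples found in `text`."""
    return [(int(count.replace(",", "")), date)
            for count, date in HIT_COUNT.findall(text)]


note = "Google = about 43,100 July 19, 2002; about 590,000 Nov 18, 2009"
print(extract_hit_counts(note))
# [(43100, 'July 19, 2002'), (590000, 'Nov 18, 2009')]
```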

information harvesting: See under Knowledge Discovery in Databases KDD  Google = about 871 July 19, 2002; about 1,230 July 3, 2003; about 1,730 Oct. 22, 2004; about 1,140,000 June 22, 2007

information integration: Our research group is developing intelligent techniques to enable rapid and efficient information integration. The focus of our research has been on the technologies required for constructing distributed, integrated applications from online sources. This research includes: Information Extraction: Machine learning techniques for extracting information from online sources; Source Modeling: Constructing a semantic model of wrapped sources so that they can be automatically integrated with other sources; Record Linkage: Learning how to align records across sources; Data Integration: Generating plans to automatically integrate data across sources; Plan Execution: Representing, defining, and efficiently executing integration plans in the Web environment; Constraint-based Integration: Interactive constraint-based planning and integration for the Web environment. Information Integration Research Group, Intelligent Systems Division, Information Sciences Institute (ISI), University of Southern California  Google = about 4,430,000 July 3, 2003; about 1,080,000 June 22, 2007; about 1,160,000 Nov 18, 2009

information overload: Biomedicine is in the middle of revolutionary advances. Genome projects, microassay methods like DNA chips, advanced radiation sources for crystallography and other instrumentation, as well as new imaging methods, have exceeded all expectations, and in the process have generated a dramatic information overload that requires new resources for handling, analyzing and interpreting data. Delays in the exploitation of the discoveries will be costly in terms of health benefits for individuals and will adversely affect the economic edge of the country. Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign

Many of today's problems stem from information overload and there is a desperate need for innovative software that can wade through the morass of information and present visually what we know.  The development of such tools will depend critically on further interactions between the computer scientists and the biologists so that the tools address the right questions, but are designed in a flexible and computationally efficient manner.  It is my hope that we will see these solutions published in the biological or computational literature.  Richard J. Roberts, The early days of bioinformatics publishing, Bioinformatics 16 (1): 2-4, 2000

"Information overload" is not an overstatement these days. One of the biggest challenges is to deal with the tidal wave of data, filter out extraneous noise and poor quality data, and assimilate and integrate information on a previously unimagined scale.  Google = about 118,000 July 19, 2002; about 249,000 Oct. 22, 2004; about 1,480,000 Nov 18, 2009

Where's my stuff? Ways to help with information overload, Mary Chitty, SLA presentation June 10, 2002, Los Angeles CA

information retrieval:  Wikipedia 

information visualization: The direct visualization of a representation of selected features or elements of complex multi-dimensional data. Data that can be used to create a visualization includes text, image data, sound, voice, video - and of course, all kinds of numerical data. Our visual analysis systems also provide the tools to interact with the data that has been visualized so that users can explore, discover and learn. Users do not look at static images, but can subset the data, run queries, do time sequence studies and create categories and correlations of data type. Pacific Northwest National Lab, About Visualization at PNNL, 1999   Google = about 28,100 July 19, 2002; about 94,200 Oct. 22, 2004; about 1,330,000 Nov 18, 2009  Related term: data visualization; Broader term: visualization
Information visualization resources on the web, 2002

invisible web:  Those parts of the web which are inaccessible to current search engines. A straightforward example was PubMed/Medline (until Google started indexing it). You still can't usually access proprietary (fee-based) databases such as Thomson Dialog or Lexis-Nexis except directly. Until fairly recently PDF documents and PowerPoint slides were inaccessible to search engines.   Google = about 17,300 July 19, 2002; about 278,000 Oct. 22, 2004; about 802,000 Nov 18, 2009   Related terms: deep web, semantic web
Invisible or Deep Web: What it is, How to find it, and Its inherent ambiguity

just in time information: 90,200 websites were found with this phrase by Google on May 23, 2007. An increasing need as we are deluged with information and data -- and still need time to reflect, discuss and think about what all these mean.  Google = about 2,900 March 14, 2002, about 3,400 July 19, 2002; about 51,600 Feb. 21, 2006; about 88,400 May 7, 2007; about 781,000 Nov 18, 2009

Just-In-Time Information Retrieval. Bradley J. Rhodes. Ph.D. Dissertation, MIT Media Lab, May 2000. Just in time retrieval agents Bradley J. Rhodes

Related terms: information overload, remembrance agents; Bioinformatics modularity

knowledge integration: Wikipedia   Related terms: ontologies, semantics

knowledge management:  Systematic approach to acquiring, analyzing, storing, and disseminating information related to products, manufacturing processes, and components. ICH Q10   Related terms: ontologies, paraphrase problem, taxonomies  Google = about 826,000 July 19, 2002; about 3,520,000 Oct. 22, 2004; about 11,000,000 Nov 18, 2009; about 33,400,000 Feb 15, 2011
KM Glossary, GOTCHA, Univ. of California Berkeley, 1999  About 50 terms. 
Virtual Library: Knowledge Management, May 2000 Definition, articles, white papers, interviews, business and technology library, periodicals and publications, “out of box thinking”, “movers and shakers”, “think tank”, calendar of events, emerging topics. 


lexical semantics: 

lexicon: A machine-readable dictionary that may contain a good deal of additional information about the properties of the words, notated in a form that parsers can utilize. Bob Futrelle, A brief introduction to NLP, Computer Science, Northeastern Univ., US, 2002

A linguistics term (words and their definitions), an artificial intelligence term.  Sometimes a synonym for glossary or dictionary.  Google = about 768,000 July 19, 2002; about 1,960,000 Oct. 22, 2004

linked data:  Linked Data is about using the Web to connect related data that wasn’t  previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.  Linked data glossary  

machine-readable: See under metadata  Google = about 303,000 July 19, 2002; about 535,000 Oct. 22, 2004

machine-understandable: See under metadata  Google = about 3,730 July 19, 2002; about 8,950 July 14, 2004

markup languages: Computers & computing  Google = about 639,000 Aug. 9, 2002; about 170,000 Oct. 22, 2004

mash-up: Google = about 22,100,000 Oct. 27, 2006; about 20,400,000 Nov 18, 2009

Medbiquitous Consortium: Technology standards based on XML and web services. 

metadata: Could elevate the status of the web from machine-readable to something we might call machine-understandable. Metadata is "data about data" or specifically in our current context "data describing web resources." The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application ("one application's metadata is another application's data"). [W3C, "Introduction to RDF Metadata" 1997]

Metadata is machine understandable information for the web. The W3C Metadata Activity addressed the combined needs of several groups for a common framework to express assertions about information on the Web, and was superseded by the W3C Semantic Web Activity.  [W3C, Metadata and Resource Description, W3C Technology and Society Domain, 2001]  Google = about 1,640,000 July 19, 2002; about 4,850,000 Oct. 22, 2004; about 25,600,000 May 9, 2005; about 62,700,000 May 7, 2007  Narrower terms: Dublin Core Metadata Initiative, faceted metadata  Related terms: interoperability, RDF, semantic web 

organizational informatics: A field which studies the development and use of computerized information systems and communication systems in organizations. It includes social studies of their conception, design, effective implementation within organizations, maintenance, use, organizational value, conditions that foster risks of failures, and their effects for people and an organization's clients. It is an intellectually rich and practical research area. "Social Informatics" Indiana Univ, School of Library & Information Science  Narrower term: social informatics  Related term: knowledge management  Google = about 153 July 19, 2002; about 211 Oct. 22, 2004

pattern, pattern language:  Patterns, discussion FAQ 

precision: Percentage of the material retrieved by a specific query or search statement that is actually relevant; high precision means little unrelated material is retrieved. Related terms: Genetic testing analytical specificity, clinical specificity. Compare recall  

query contraction: Needed when a search engine retrieves thousands of citations. May consist of additional terms (Boolean AND) or different terms (Boolean OR).  Google = about 26 July 19, 2002; about 130 Oct. 22, 2004

query expansion: Adding new and/or different terms to a search statement (particularly when a search engine or database retrieves no hits). Often uses Boolean OR.  Google = about 7,500 July 19, 2002; about 21,300 Oct. 22, 2004  Related terms: ontologies, taxonomies
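Query expansion and query contraction can be sketched as simple string operations on a Boolean search statement. The synonym table and both function names below are hypothetical; in practice the alternative terms would come from an ontology, taxonomy or thesaurus, and the exact Boolean syntax depends on the search engine.

```python
# Hypothetical synonym table standing in for an ontology or taxonomy.
SYNONYMS = {
    "text mining": ["text data mining", "literature mining"],
}


def expand_query(term, synonyms):
    """Broaden a search by OR-ing a term with its known alternatives."""
    alternatives = [term] + synonyms.get(term, [])
    return " OR ".join(f'"{alt}"' for alt in alternatives)


def contract_query(query, required_terms):
    """Narrow a search by AND-ing extra required terms onto it."""
    parts = [f"({query})"] + [f'"{term}"' for term in required_terms]
    return " AND ".join(parts)


expanded = expand_query("text mining", SYNONYMS)
print(expanded)
# "text mining" OR "text data mining" OR "literature mining"

print(contract_query(expanded, ["proteomics"]))
# ("text mining" OR "text data mining" OR "literature mining") AND "proteomics"
```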

RDF Resource Description Framework:  Integrates a variety of applications from library catalogs and world-wide directories to syndication and aggregation of news, software, and content to personal collections of music, photos, and events using XML as an interchange syntax. The RDF specifications provide a lightweight ontology system to support the exchange of knowledge on the Web.  W3C Semantic Web Activity, accessed May 5, 2005  
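RDF describes resources as subject, predicate, object statements. The sketch below writes a few such statements about this glossary page using Dublin Core element names as predicates. It deliberately uses the line-based N-Triples syntax rather than the XML syntax mentioned above, because it is easier to read; the resource URI is a made-up example.org address and `to_ntriples` is a hypothetical helper, not part of any RDF library.

```python
# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"


def to_ntriples(triples):
    """Serialize (subject URI, predicate URI, literal) tuples as N-Triples."""
    lines = []
    for subject, predicate, literal in triples:
        # Escape backslashes and double quotes inside the literal value.
        escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Hypothetical URI standing in for this glossary page.
resource = "http://example.org/glossary/information-management"
triples = [
    (resource, DC + "title",
     "Biopharmaceutical information management & interpretation glossary"),
    (resource, DC + "creator", "Mary Chitty"),
    (resource, DC + "date", "2013-11-14"),
]

print(to_ntriples(triples))
```

Each printed line is one machine-readable assertion about the page, which is the sense in which metadata is "data describing web resources".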

recall: The percentage of applicable material retrieved by a specific query or search statement. Compare precision. Related term: Genetic testing  sensitivity 
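Precision and recall are easiest to compare side by side. The sketch below uses the usual information retrieval definitions, which is an assumption about how the terms are meant here: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The document identifiers are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: items the search returned
    relevant:  items that actually answer the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents returned, three truly relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc1", "doc2", "doc5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# precision = 0.50, recall = 0.67
```

Broadening a query tends to raise recall at the cost of precision, and narrowing it tends to do the reverse.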

relevance: Percentage of truly related material retrieved by a specific query or search statement. Related terms: precision Genetic testing & diagnostics analytical specificity, clinical specificity. Compare recall 

remembrance agents: A set of applications that watch over a user's shoulder and suggest information relevant to the current situation. While query-based memory aids help with direct recall, remembrance agents are an augmented associative memory. Bradley Rhodes, Remembrance Agents: Because serendipity is too important to be left to chance.    Google = about 673 July 19, 2002; about 549 Oct. 22, 2004  Related terms: collaborative filtering, just in time information

Rosetta: A systems-level design language developed to address requirements specification for systems-on-chip designs. Rosetta specifically addresses problems associated with heterogeneity and complexity in current systems. Specifically, Rosetta allows designers to develop and integrate specifications written in multiple semantic models to provide language and semantic support for concurrent engineering of electronic systems.  Accellera Rosetta Standards Committee Homepage, EDA Industry Working Groups, 2002

SOAP Simple Object Access Protocol:  A lightweight protocol for exchange of information in a decentralized, distributed environment. SOAP, W3C 1.1

semantic: Ontologies & taxonomies

social informatics:  Social Informatics (SI) refers to the body of research and study that examines social aspects of computerization, including the roles of information technology in social and organizational change, the uses of information technologies in social contexts, and the ways that the social organization of information technologies is influenced by social forces and social practices. 

A serviceable working conception of "social informatics" is that it identifies a body of research that examines the social aspects of computerization. A more formal definition is "the interdisciplinary study of the design, uses and consequences of information technologies that takes into account their interaction with institutional and cultural contexts." ... Social informatics has been a subject of systematic analytical and critical research for the last 25 years. Unfortunately, social informatics studies are scattered in the journals of several different fields, including computer science, information systems, information science and some social sciences. Each of these fields uses somewhat different nomenclature. This diversity of communication outlets and specialized terminologies makes it hard for many non-specialists (and even specialists) to locate important studies. Rob Kling, What is social informatics and why does it matter? D-Lib 5(1): Jan. 1999 
Red Rock Eater News Service, Phil Agre, UCLA, US 
Social informatics HomePage

soft computing: Principal constituents of soft computing (SC) are fuzzy logic (FL), neural network theory (NN) and probabilistic reasoning (PR), with the latter subsuming belief networks, evolutionary computing including DNA computing, chaos theory and parts of learning theory.... Differs from conventional (hard) computing in that, unlike hard computing, it is tolerant of imprecision, uncertainty and partial truth. In effect, the role model for soft computing is the human mind. The guiding principle of soft computing is: Exploit the tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness and low solution cost.  Lotfi A. Zadeh, What is BISC? Berkeley Initiative on Soft Computing

subsumption:   Google = about 30,800 July 19, 2002; about 80,500 Oct. 22, 2004; about 159,000 May 2, 2005 

syntactic, syntax: Ontologies & taxonomies

term extraction: Robert Futrelle, Northeastern Univ., 2001  Google = about 49,900 Nov 18, 2009   
See related information extraction

term mining:  Term Mining in Biomedicine, Sophia Ananiadou - University of Manchester, 2007  Google = about 1,990 June 16, 2003; about 2,980 Oct. 22, 2004; about 40,100 June 22, 2007

text categorisation: See Algorithms & data analysis under support vector machines  Google = about 902; "text categorization" about 9,220 July 19, 2002; about 27,100 Oct. 22, 2004

text mining:  Usually data mining technologies mine knowledge from data with well-formed schemes such as relational tables. But, text data don't have such scheme, and information is described freely in the documents. Therefore, we focus on Natural Language Processing (NLP) technologies to extract such information. Using NLP technologies, documents are transformed into a collection of concepts, described using terms discovered in the text. Usually, "text mining" is used to indicate a text search technique. But, we think of text mining as having more functions. Text mining technologies extract more information than just picking up keywords from texts: facts, author's intentions, their expectations, and their claims.  Tokyo Research Lab, IBM, Text Mining 

Using data mining on unstructured data, such as the biomedical literature.  Related terms:  natural language processing; Algorithms & data analysis: support vector machines  Google = about 20,600 July 19, 2002; about 39,300 July 3, 2003; about 113,000 Oct. 22, 2004; about 1,110,000 June 22, 2007
Text Mining Glossary, ComputerWorld, 2004   Includes Categorization, clustering, extraction, keyword search, natural language processing, taxonomy, and visualization.
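
The simplest step described above, turning free text into terms that can be counted, might be sketched as follows. This is only keyword-level extraction under stated assumptions (a tiny hand-written stopword list, a naive regular-expression tokenizer, and made-up example sentences); it does not attempt the richer extraction of facts, intentions or claims that the IBM definition describes.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "of", "in", "and", "a", "to", "is", "with", "by"}


def extract_terms(text):
    """Lowercase the text and return word-like tokens that are not stopwords."""
    tokens = re.findall(r"[a-z][a-z0-9-]*", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def top_terms(documents, n=3):
    """Count terms across all documents and return the n most frequent."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_terms(doc))
    return counts.most_common(n)


# Made-up example sentences standing in for biomedical abstracts.
docs = [
    "BRCA1 mutations are associated with breast cancer.",
    "Expression of BRCA1 is reduced in sporadic breast cancer.",
]

print(top_terms(docs))
```

Real text mining systems replace each of these pieces with NLP components such as tokenizers, taggers and terminology resources.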

unstructured data: Generally free text, natural language.  Related term: natural language processing. Compare structured.  Google = about  21,200 July 19, 2002

variance: One of the two components of measurement error (the other one being bias). Variance results from uncontrolled (or uncontrollable) variation that occurs in biological samples, experimental procedures, and arrays themselves.
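
The relationship between the two components can be sketched numerically. The example below assumes a known true value and a handful of replicate measurements (both invented for illustration) and checks the standard decomposition: mean squared error = bias squared + variance.

```python
from statistics import mean, pvariance


def bias_and_variance(measurements, true_value):
    """Return (bias, variance) of replicate measurements of one quantity."""
    bias = mean(measurements) - true_value  # systematic offset
    variance = pvariance(measurements)      # spread around the replicate mean
    return bias, variance


def mean_squared_error(measurements, true_value):
    """Average squared distance of the measurements from the true value."""
    return mean([(m - true_value) ** 2 for m in measurements])


# Invented replicate readings of a quantity whose true value is 10.0.
replicates = [10.5, 11.0, 11.5]
bias, variance = bias_and_variance(replicates, 10.0)
mse = mean_squared_error(replicates, 10.0)

print("bias:", bias, "variance:", variance, "MSE:", mse)
```

Averaging more replicates reduces the contribution of variance but leaves the bias unchanged, which is why the two are treated as separate components.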

visualization:   A method of computing by which the enormous bandwidth and processing power of the human visual (eye-brain) system becomes an integral part of extracting knowledge from complex data.  It utilizes graphics and imaging techniques as well as knowledge of both data management and the human visual system.  Lloyd Treinish, Visualization for Deep Thunder, IBM Research, 2002

Use of computer-generated graphics to make information more accessible and interactive. Related term: data mining  Narrower terms: data visualization, information visualization; Algorithms & data analysis: dendrogram, heat map, profile chart

visualisation: As the quantity of data produced by simulations grows, so does the difficulty of extracting useful information. It is now clear that in many applications visual methods are the only practical way of extracting information from the data. Computer graphics and scientific visualisation techniques have become more important in the last few years with the increased availability of computing resource and of visualisation tools.  Visualisation is becoming one of the key tools for problem solving both in traditional areas such as visualisation of complex flow and in new applications areas like the planning of surgical operations using 3-D reconstruction of anatomical sites using diagnostic images or the development of highly-realistic aeroplane simulators for pilot training.  DIRECT Development of an Interdisciplinary Roundtable for Emerging Computer Technologies,  Edinburgh University, Scotland 

Definitions and Rationale for Visualisation, D. Scott Brown, SIGGRAPH, 1999 

W3C World Wide Web Consortium: Develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential. W3C is a forum for information, commerce, communication, and collective understanding.

web: The genome community was an early adopter of the Web, finding in it a way to publish its vast accumulation of data, and to express the rich interconnectedness of biological information. The Web is the home of primary data, of genome maps, of expression data, of DNA and protein sequences, of X-ray crystallographic structures, and of the genome project's huge outpouring of publications. ... However the Web is much more than a static repository of information. The Web is increasingly being used as a front end for sophisticated analytic software. Sequence similarity search engines, protein structural motif finders, exon identifiers, and even mapping programs have all been integrated into the Web. Java applets are adding rapidly to Web browsers' capabilities, enabling pages to be far more interactive than the original click-fetch-click interface. Lincoln D. Stein, "Introduction to Human Genome Computing via the World Wide Web", Cold Spring Harbor Lab, 1998  Related terms: fractal nature of the web, weblike  Narrower terms:  semantic web, web portals, web services  

web service interoperability: Web services technology has the promise to provide a new level of interoperability between software applications. It should be no wonder then that there is a rush by platform providers, software developers, and utility providers to enable their software with SOAP, WSDL, and UDDI capabilities.

Google = "web service interoperability" about 412; "web services interoperability" about 9,620 July 19, 2002; about 283,000 Nov 17, 2006

web services:  The goal of the Web Services Activity is to develop a set of technologies in order to bring Web services to their full potential.  W3C, "Web Services Activity", 2002   Google = about 2,110,000 July 19, 2002; about 122,000,000 Nov 17, 2006  
Web services glossary

webizing: "Webizing Existing Systems" Tim Berners-Lee, last updated 2001

weblike: Tim Berners-Lee, Ralph Swick, Semantic web, Amsterdam, 2000 May 16

In Weaving the Web, his account of coming up with the idea of the web, Tim Berners-Lee writes about "learning to think in a weblike way". I don't know that I can claim to approach this yet, but the more that I write and research this glossary on and for the web, the more insight I'm getting into what he might mean. Metaphors like "shooting at a moving target" and, like Wayne Gretzky, "skating to where the puck is going to be" are helpful images.    Google = about 3,020 July 19, 2002; about 5,510 Oct. 22, 2004; about 75,700 Nov 17, 2006; "web like" about 788,000,000 Nov 17, 2006 

workflows:  A collaborative environment where scientists can safely publish their workflows and experiment plans, share them with groups and find those of others. Workflows, other digital objects and collections (called Packs) can now be swapped, sorted and searched like photos and videos on the Web. ...  myExperiment makes it really easy for the next generation of scientists to contribute to a pool of scientific workflows, build communities and form relationships. It enables scientists to share, reuse and repurpose workflows and reduce time-to-experiment, share expertise and avoid reinvention. myExperiment 

XML eXtensible Markup Language : The universal format for structured documents and data on the Web. W3C, "Extensible Markup Language (XML)"
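
To make "structured documents and data" concrete, here is a minimal sketch that reads a small XML document with Python's standard xml.etree.ElementTree module. The element and attribute names (glossary, entry, term, definition, related) are invented for this example and do not correspond to any published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag and attribute names are made up for illustration.
xml_text = """<glossary>
  <entry term="text mining">
    <definition>Using data mining on unstructured data.</definition>
    <related>natural language processing</related>
  </entry>
  <entry term="unstructured data">
    <definition>Generally free text, natural language.</definition>
    <related>natural language processing</related>
  </entry>
</glossary>"""


def load_entries(text):
    """Parse the XML and return a dict mapping each term to its definition."""
    root = ET.fromstring(text)
    return {
        entry.get("term"): entry.findtext("definition")
        for entry in root.findall("entry")
    }


entries = load_entries(xml_text)
for term, definition in entries.items():
    print(term, "->", definition)
```

Because the structure is explicit in the markup, a program can pick out terms and definitions without guessing where one ends and the next begins, which is the contrast with unstructured free text.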

Barnes, Ken et al., Microsoft Lexicon or Microspeak made easier, 1995-1998, 150+ terms
FOLDOC Free On-line Dictionary of Computing, Denis Howe, 2007. 14,400+ terms. 
Schneider, Tom and Karen Lewis, Glossary for Molecular Information Theory and the Delila System, Lab of Computational and Experimental Biology, NCI Frederick, US, 1999, updated 2013. 100+ definitions.
W3C Glossary and Dictionary 
Webopedia Information Technology encyclopedia. About 3,000 + definitions.
XML Starter kit http://techcommunity. communities/public/webmethods/ products/tamino/faq/xml- starter-kit/

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.
