The Semantic Web: Ontologies and Real-Time Technologies for Business
Dr Gregory Grefenstette of Exalead explains in this article how the semantic web is changing the way organisations manage information. He outlines the main advantages and challenges of the approach, and clarifies the concept of the real-time Internet from the perspective of the semantic web.
By Dr Gregory Grefenstette, Chief Science Officer, Exalead
Companies of all sizes can no longer depend solely on in-house databases to manage their knowledge. Increasingly, they need to match the structured knowledge held in those databases with ‘real-world’, free-form knowledge found elsewhere, in company documents.
For this reason, more and more providers of text-based search solutions now include categories, or facets, in their search engines, as well as database connectors, entity recognition and other semantic tools that consolidate database information with free-text information. Semantic Web tools link a company's real-world knowledge base (its memos, email records, reports, correspondence and contracts) with the meaning and commercial significance of its database records. Where both kinds of knowledge are proliferating, there is a recognised need to link them semantically and present them seamlessly.
Search Based Applications break with a 25-year tradition of database-centred application development. They give access to information held in different semantically typed repositories through powerful search and indexing technologies coupled with standard Web technologies. The result is a single, scalable platform that gives users the look and feel of web search while retaining, in the search results, the semantic typing of entities and the relationships between them.
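As a rough illustration of the idea, the Python sketch below places hypothetical database rows and free-text documents in a single inverted index, keeping each item's semantic type so that search results can be grouped into facets. The sample data, the `type` field and the simple whitespace tokenisation are assumptions made for illustration; this is not how any particular product, Exalead's included, is implemented.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: one structured database row and two free-text documents.
DB_ROWS = [
    {"id": "prod-42", "type": "product", "name": "Widget Pro", "category": "Hardware"},
]
DOCUMENTS = [
    {"id": "memo-1", "type": "memo", "text": "Sales of Widget Pro rose in March"},
    {"id": "mail-7", "type": "email", "text": "Customer complaint about Widget Pro packaging"},
]


def tokenize(text):
    """Naive whitespace tokenisation (an assumption; real engines do far more)."""
    return text.lower().split()


def build_index(rows, docs):
    """Put database rows and documents into one inverted index, keeping each item's type."""
    index = defaultdict(set)
    items = {}
    for row in rows:
        items[row["id"]] = row
        for value in row.values():  # every field value of the row becomes searchable
            for token in tokenize(str(value)):
                index[token].add(row["id"])
    for doc in docs:
        items[doc["id"]] = doc
        for token in tokenize(doc["text"]):
            index[token].add(doc["id"])
    return index, items


def search(index, items, query):
    """Return matching ids (all query terms must match) plus facet counts by type."""
    tokens = tokenize(query)
    if not tokens:
        return [], Counter()
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    facets = Counter(items[h]["type"] for h in hits)  # the semantic type survives search
    return sorted(hits), facets


index, items = build_index(DB_ROWS, DOCUMENTS)
```

Searching for `widget pro` returns the product record, the memo and the email together, with facet counts of one item each for `product`, `memo` and `email`; a production engine would add ranking, stemming and connectors to live databases.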
The Semantic Web provides a model for looking at text in the same way as databases. The challenge, however, is threefold: how to define the basis of the relationship between different kinds of knowledge; how to accommodate the different roles and worldviews within an organisation; and how to incorporate and make available, in appropriate categories, material from the self-labelling discourse of the real-time web.
The Semantic Web: How to make it work
Put simply, the Semantic Web means adding information to web content that makes three things explicit about a concept or entity appearing in free text. First, what type of entity or concept is it? For example, if a text mentions "Michael Jackson", does it mean the singer, the Canadian actor, the English bishop, or someone else? A concept's type is added by placing it in a category or hierarchy. An ontology is a logical structure that can house such categories, or hierarchies; a taxonomy is a simpler structure that can be seen as a simplified ontology.
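The typing step can be sketched as a small taxonomy in which every category points to its parent. In the Python sketch below, the categories, the entity identifiers and the idea that the surrounding context supplies an expected type are all hypothetical; real systems infer the type from much richer evidence.

```python
# Hypothetical taxonomy: each category maps to its parent (None marks the root).
TAXONOMY = {
    "Entity": None,
    "Person": "Entity",
    "Musician": "Person",
    "Actor": "Person",
    "Clergy": "Person",
}

# Hypothetical candidate readings of the surface string "Michael Jackson",
# each given a type by placing it in a taxonomy category.
CANDIDATES = {
    "Michael Jackson": [
        ("michael_jackson_singer", "Musician"),
        ("michael_jackson_actor", "Actor"),
        ("michael_jackson_bishop", "Clergy"),
    ]
}


def ancestors(category):
    """Walk up the hierarchy from a category to the root."""
    path = []
    while category is not None:
        path.append(category)
        category = TAXONOMY[category]
    return path


def type_mention(mention, expected_type):
    """Keep only the readings whose category falls under the expected type."""
    return [
        entity_id
        for entity_id, category in CANDIDATES.get(mention, [])
        if expected_type in ancestors(category)
    ]
```

For a mention in a music review, `type_mention("Michael Jackson", "Musician")` keeps only `michael_jackson_singer`, whereas the broader type `Person` keeps all three readings.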
Second, the Semantic Web adds unique identifiers to concepts or entities, so that a computer can recognise "NHS" and "National Health Service" as the same thing. Third, it gives a clear depiction of the relations between items. Typing, equality and relations all already appear in databases: the field an element appears in determines its type, the entity coding ensures equality, and the rows provide the relationships between elements. The purpose of the Semantic Web is to make information in text look and work like information in databases. This is not a new challenge, but it is one faced by a growing number of organisations as they try to translate their shared assumptions and understanding into an ontology that produces reliable, meaningful results.
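The correspondence between database features and Semantic Web features can be sketched by turning a database row into subject-predicate-object triples. In this minimal Python sketch, the alias table, the `org:` prefix, the `Employee` table and the field-to-type mapping are hypothetical; only the `rdf:type` predicate name is borrowed from RDF.

```python
# Hypothetical alias table: different surface forms resolve to one unique identifier.
ALIASES = {
    "nhs": "org:NHS",
    "national health service": "org:NHS",
}


def canonical_id(mention):
    """Resolve a surface form to its unique identifier, or None if unknown (equality)."""
    return ALIASES.get(str(mention).strip().lower())


def row_to_triples(table, key_field, row, field_types):
    """Turn one database row into (subject, predicate, object) triples.

    - the row's key gives the subject a unique identifier,
    - the field a value appears in can give that value a type,
    - each field of the row becomes a relation from the subject.
    """
    subject = f"{table}:{row[key_field]}"
    triples = [(subject, "rdf:type", table)]
    for field, value in row.items():
        if field == key_field:
            continue
        obj = canonical_id(value) or value  # equality: resolve known aliases
        triples.append((subject, field, obj))  # relation taken from the row
        if field in field_types:
            triples.append((obj, "rdf:type", field_types[field]))  # type taken from the field
    return triples


row = {"staff_id": 17, "name": "Jane Smith", "employer": "National Health Service"}
triples = row_to_triples("Employee", "staff_id", row, {"employer": "Organisation"})
```

The row yields a triple linking `Employee:17` to `org:NHS`, the same identifier that the mention "NHS" resolves to, plus a triple typing `org:NHS` as an `Organisation` because of the field it came from.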
Defining relationships: how to get a good quality ontology
First, organisations must realise that the structured information they already hold in their databases defines their own ontologies. Sometimes these databases will have grown organically and independently of one another, in which case a large part of the organisation’s challenge is to understand its own knowledge. A simple way to begin is to organise the information used in the organisation’s databases into explicit taxonomies, which can then form the basis of its ontology.
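Making a taxonomy explicit can be as simple as nesting the values of existing columns. The Python sketch below assumes a hypothetical product table whose `division`, `category` and `product` columns already encode a hierarchy; the data and column names are illustrative only.

```python
# Hypothetical rows from an existing product table.
PRODUCT_ROWS = [
    {"division": "Consumer", "category": "Audio", "product": "Headphones X"},
    {"division": "Consumer", "category": "Audio", "product": "Speaker Y"},
    {"division": "Business", "category": "Networking", "product": "Router Z"},
]


def build_taxonomy(rows, levels):
    """Nest rows into a taxonomy, one level per column named in `levels`."""
    tree = {}
    for row in rows:
        node = tree
        for column in levels:
            # Reuse the existing branch for this value, or start a new one.
            node = node.setdefault(row[column], {})
    return tree


def show(tree, depth=0):
    """Print the taxonomy as an indented outline, siblings in alphabetical order."""
    for label in sorted(tree):
        print("  " * depth + label)
        show(tree[label], depth + 1)
```

Calling `show(build_taxonomy(PRODUCT_ROWS, ["division", "category", "product"]))` prints the two divisions, their categories and their products as an indented outline that can seed the organisation's ontology.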
Some taxonomies are general and can be acquired from specialised companies or from the Internet: geographical taxonomies, product lists or medical names, for example. The Linked Data Organisation is seeking to make every ontology in the public domain available in Semantic Web format. However, most of an organisation’s knowledge, such as client lists, accounts and sales, will not be open, so simply organising a company’s own knowledge is a big challenge in itself.
Once the organisation’s taxonomies are established, the next challenge is to define the relationships between their elements. These relationships between entities and concepts are the other key part of an ontology. They define, and are defined by, the organisation’s view of the world, and must be specified by people inside the organisation who know which relations matter to the company.
One way to do this is to consider information from the viewpoint of groups of internal users. For example, the marketing department is concerned with products, potential sales, campaigns and advertising venues; the research and development department is concerned with the same products, but also with quality assurance, bugs, production and supplies. Each group within an organisation has its own agenda and its own view of which relationships between concepts are important.
We have found that a useful way of tackling this challenge is to imagine each view of the relationships as a document containing all the elements of interest. This document is divided into fields, which map onto the relations in that view. The document viewpoint is a convenient, intuitive packaging of relationships from an ontology. Since documents can be created dynamically by query-directed extraction from fixed ontologies or from fixed databases, document views offer an agile way to restructure ontological relations without touching the underlying ontology. In other words, a rich, complex ontology detailing all the relationships between organisational entities can be defined once and for all, while different, evolving views of these relationships are extracted and displayed. A document view of an ontology also allows structured and unstructured information to be displayed coherently.
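A document view can be sketched as a query over the ontology that packages selected relations as fields. In the Python sketch below, the triples, the relation names and the two departmental views are hypothetical; the point is that the views change while the triples do not.

```python
# Hypothetical ontology fragment as (subject, relation, object) triples.
ONTOLOGY = [
    ("WidgetPro", "sold_in", "UK"),
    ("WidgetPro", "advertised_in", "TradeWeekly"),
    ("WidgetPro", "has_bug", "BUG-101"),
    ("WidgetPro", "supplied_by", "AcmeParts"),
]

# Each department's view lists the relations it cares about; each becomes a document field.
VIEWS = {
    "marketing": ["sold_in", "advertised_in"],
    "r_and_d": ["has_bug", "supplied_by"],
}


def document_view(entity, view, triples=ONTOLOGY):
    """Extract a document for `entity` whose fields are the relations in `view`.

    The ontology is only read, never modified, so views can evolve freely.
    """
    doc = {"entity": entity}
    for relation in VIEWS[view]:
        doc[relation] = sorted(o for s, r, o in triples if s == entity and r == relation)
    return doc
```

Adding a new view, for instance for a sales team, means adding an entry to `VIEWS`; the `ONTOLOGY` list itself is never modified.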
Top-down or bottom-up? A vendor’s-eye view
The purpose of an ontology in an organisation is to type objects, to recognise equality between objects, and to make the relationships between objects explicit. We believe that the organisation’s existing databases should fulfil these three roles, and that its ontology should therefore be built on them. This is a top-down approach, and search based applications built on organisational ontologies must be able to capture this view. On the other hand, modern search engine technology, through its use of facets and categories, allows ad-hoc structures and user-created tags to be visualised.
Search based applications straddle search engine and database technologies. They permit both top-down, organisational structuring of knowledge, in the form of ontologies derived from established databases, and uncontrolled open text, mixing top-down and bottom-up approaches. A number of semantic technologies in search engines (synonyms, partial matching, related queries, related terms) help users recognise relationships not only between elements spontaneously added to documents by the “crowd”, but also, through the same mechanisms, between these bottom-up terms and the established canonical terms of a more rigid, top-down ontology.
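The mapping from bottom-up tags to canonical terms can be approximated with a synonym table plus a string-similarity fallback for partial matches. This Python sketch uses the standard library's `difflib.get_close_matches` as a stand-in for a search engine's partial matching; the canonical terms, synonyms and the 0.8 cutoff are hypothetical.

```python
import difflib

# Hypothetical canonical terms from a top-down ontology.
CANONICAL_TERMS = ["customer service", "billing", "delivery"]

# Hypothetical synonym table linking crowd vocabulary to canonical terms.
SYNONYMS = {"cust svc": "customer service", "invoice": "billing", "shipping": "delivery"}


def normalise_tag(tag, cutoff=0.8):
    """Map a user-created tag to a canonical term, or return None if nothing fits."""
    t = tag.strip().lower().lstrip("#").replace("_", " ")
    if t in CANONICAL_TERMS:  # already canonical
        return t
    if t in SYNONYMS:  # known synonym
        return SYNONYMS[t]
    # Partial matching: closest canonical term above the similarity cutoff.
    close = difflib.get_close_matches(t, CANONICAL_TERMS, n=1, cutoff=cutoff)
    return close[0] if close else None
```

`#Shipping` maps to `delivery` through the synonym table, the misspelled `custmer service` maps to `customer service` through partial matching, and an unrelated tag such as `weather` is left unmapped (`None`) for a human or a later pass to handle.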
Reconciling the real-time web and the semantic web: new and established classifications
The real-time web is concerned with the spontaneous creation of information, terminology and relations. Twitter is a prime example, with its uncontrolled use of hashtags (#tags) to indicate, in some sense, the “subject” of a tweet. The challenge is to connect these new tags to established tags already present in a controlled ontology. One response is to use entity recognition, based on the known entities and concepts in the ontology, to classify real-time submissions, and then to extend this classification to the newly created tags found in those submissions.
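The classify-then-collect step might look like the Python sketch below, which recognises known entities in tweets by simple substring matching and counts the unfamiliar hashtags that co-occur with them. The entity list, classes and tags are hypothetical, and substring matching is a deliberate simplification of real entity recognition.

```python
import re
from collections import Counter, defaultdict

# Hypothetical known entities from the ontology, mapped to their ontology classes.
KNOWN_ENTITIES = {"widget pro": "Product", "acme": "Company"}
# Hypothetical hashtags already present in the controlled ontology.
KNOWN_TAGS = {"#widgetpro"}

HASHTAG = re.compile(r"#\w+")


def classify(tweet):
    """Return the ontology classes of known entities mentioned in the tweet."""
    text = tweet.lower()
    return {cls for name, cls in KNOWN_ENTITIES.items() if name in text}


def collect_new_tags(tweets):
    """Count, per class, the unfamiliar hashtags that co-occur with known entities."""
    candidates = defaultdict(Counter)
    for tweet in tweets:
        classes = classify(tweet)
        if not classes:
            continue  # nothing recognised, so the tweet's tags stay unclassified
        for tag in HASHTAG.findall(tweet.lower()):
            if tag not in KNOWN_TAGS:
                for cls in classes:
                    candidates[cls][tag] += 1
    return candidates
```

For the tweets "Loving my new Widget Pro #wp2 #gadgets", "Widget Pro battery died again #wp2" and "Nice weather today #sunny", the new tag `#wp2` is counted twice under `Product`, `#gadgets` once, and `#sunny` not at all, because its tweet mentions no known entity.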
Entity-recognition-based classification allows new concepts to ‘gravitate’ towards known ontological classes. One advantage of the real-time web is the sheer quantity of information it produces, which, over time, allows such a ‘gravitational’ approach to work. If the connection is made and the classification is accurate, the real-time web can be linked into the organisation’s view of the world as it exists in a pre-existing ontology. Conversely, the knowledge contained in the ontology can be linked to real-time web expressions and trends derived from the past 15 minutes or the past two days.
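The ‘gravitational’ idea can be sketched as a sliding time window that attaches a new tag to a class only once enough consistent evidence has accumulated. The Python class below is hypothetical, as are its thresholds; the default window is 15 minutes, and a two-day window would be `window_seconds=2 * 24 * 3600`. It assumes observations arrive in timestamp order.

```python
from collections import Counter, deque


class TagGravity:
    """Attach a new tag to an ontology class once enough co-occurrences
    accumulate inside a sliding time window (hypothetical thresholds)."""

    def __init__(self, window_seconds=15 * 60, min_count=3, min_share=0.7):
        self.window = window_seconds
        self.min_count = min_count  # minimum evidence before deciding
        self.min_share = min_share  # fraction of evidence the winning class needs
        self.events = deque()  # (timestamp, tag, ontology_class), oldest first

    def observe(self, timestamp, tag, ontology_class):
        """Record one co-occurrence; assumes timestamps are non-decreasing."""
        self.events.append((timestamp, tag, ontology_class))
        # Drop observations that have fallen out of the window.
        while self.events and self.events[0][0] < timestamp - self.window:
            self.events.popleft()

    def assignment(self, tag):
        """Return the class the tag has gravitated towards, or None if unclear."""
        counts = Counter(cls for _, t, cls in self.events if t == tag)
        total = sum(counts.values())
        if total < self.min_count:
            return None
        cls, n = counts.most_common(1)[0]
        return cls if n / total >= self.min_share else None
```

Three `Product` observations of `#wp2` within the window are enough to attach it to `Product`; once the window moves past them, the attachment lapses until new evidence arrives.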