Microsoft Buys into Semantic Search
While digital specialists have argued over the definition of Web 2.0 for some time now, the same key themes prevail. According to Wikipedia, Web 2.0 technology enhances "creativity, information sharing, and, most notably, collaboration among users". Web 3.0, however, is much harder to pin down.
While some argue that the mobile Internet will be the next key development in the world of the Web, others are looking further afield. One concept that continues to stick is that of the semantic Web.
The Internet is a vast resource of information, yet the Web itself is little more than a place to host content. It remains dumb, with little understanding of what that content actually means. Some argue that advances in the semantic Web space will enable it to 'understand' the meaning of content, essentially making the Internet 'think'.
Microsoft is certainly convinced. The Redmond-based company has recently purchased semantic search pioneer Powerset. Although the details of the deal are sketchy at best, Microsoft reportedly paid $100mn for the private Silicon Valley company. The move comes after Microsoft failed in its bid for Yahoo, although Microsoft claimed that the deal had been in the works for some time and was unrelated.
Powerset's plans to use so-called "natural language" technology to search the Web - as opposed to the keywords used by traditional search engines - have made it one of the most talked-about companies. "Natural language search" is said to attribute meaning to words, with many seeing the approach as a true challenger to Google.
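The distinction between keyword and natural-language search can be illustrated with a toy sketch. This is purely a hypothetical illustration, not Powerset's actual technology: a keyword engine matches literal tokens, while a "semantic" engine first maps query words onto concepts.

```python
# Toy contrast between keyword matching and a crude "semantic" lookup.
# Illustrative only -- real natural-language search is far more involved.

DOCS = {
    "doc1": "how to purchase a new laptop",
    "doc2": "tips for buying a computer",
}

# Hand-built concept map standing in for real language understanding.
CONCEPTS = {
    "buy": "acquire", "buying": "acquire", "purchase": "acquire",
    "laptop": "computer", "computer": "computer",
}

def keyword_search(query):
    """Return docs sharing at least one literal token with the query."""
    terms = set(query.lower().split())
    return sorted(d for d, text in DOCS.items() if terms & set(text.split()))

def semantic_search(query):
    """Return docs sharing at least one *concept* with the query."""
    def concepts(text):
        return {CONCEPTS.get(w, w) for w in text.lower().split()}
    return sorted(d for d, text in DOCS.items() if concepts(query) & concepts(text))

print(keyword_search("buy laptop"))    # literal match only: ['doc1']
print(semantic_search("buy laptop"))   # concept match: ['doc1', 'doc2']
```

The keyword engine misses "tips for buying a computer" because no token matches literally; the concept map lets the second function see that "buy" and "buying" mean the same thing.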
Struggling to understand
Critics have claimed that semantic Web technology is still years away and may not be fully realised. Tom Mortimer, founder of search engine company Lemur Consulting, believes that computers will continue to struggle to understand meaning.
"When we talk about 'semantics' we're talking about meaning and knowledge; and nobody has managed to define these consistently from a philosophical point of view. It's therefore a bit of a leap to say that a computer can deal with them. There have been many attempts to represent knowledge inside computers, but they've all had serious limitations.
"From a pragmatic point of view, if a computer gives people useful answers, it could be said to be 'intelligent', or that it 'knows about things'. But this is different from claiming that it is genuinely intelligent or knowledgeable in a human sense. That is still a long way off so far as anyone can tell," he said.
However, Powerset rejected those claims, insisting that full natural language search of the Web is "absolutely within reach". The company recently launched a limited preview of its search technology with some success.
Technology is real
Graham Moore, director of search solutions firm NetworkedPlanet, agrees that semantic search is not just a pipe dream.
"Ignoring issues of bandwidth, the current web infrastructure stack is adequate 'to understand meaning'. The current web utilises a protocol called HTTP. This is a very simple and powerful protocol for the retrieval, update and querying of web resources. What does need to change is the way in which we/tools/applications use this protocol and the kinds of 'resources' we create and consume. Currently, most resources are produced by humans or machines for consumption by humans, i.e. they end up in a browser as HTML web pages. For machines to understand meaning we need them to be able to produce and consume web resources," said Moore.
"While HTML web pages contain natural language and obviously convey meaning to those who read that language, natural language machine processing is limited. So, a semantic web or a linked data web needs to produce resources that can be processed by machines - such that meaning can be conveyed, received, and comprehended.
"By making resources either contain a semantic mark-up, or by web servers publishing a semantic mark-up, the web can get smarter. By smarter, I really mean that we can start organising and finding information based on concepts and subjects and not via full text inferences," he commented.
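Moore's point about organising and finding information by concepts rather than by full-text matching can be sketched with a minimal RDF-style triple store. This is a hypothetical plain-Python sketch, not a real RDF library; real semantic Web tooling works over standard formats such as RDF/XML or Turtle.

```python
# Minimal RDF-style triple store: facts held as (subject, predicate, object)
# triples, queried by pattern. A sketch of concept-based lookup only.

triples = [
    ("Powerset",  "type",     "Company"),
    ("Powerset",  "field",    "SemanticSearch"),
    ("Microsoft", "type",     "Company"),
    ("Microsoft", "acquired", "Powerset"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# "Which resources are companies?" -- a query over concepts, not text.
print(query(predicate="type", obj="Company"))
print(query(subject="Microsoft", predicate="acquired"))
```

The point of the sketch is that the question "which resources are companies?" is answered by matching a structural pattern, with no text search involved at all.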
Where next for semantic search?
However, understanding meaning in text is one thing; in a Web full of rich media, searching through images and video is a different story. Blinkx, a video search engine founded in 2004 by Suranga Chandratillake, has 18 million hours of searchable video.
The search engine works by teaching a computer to read more of the signs in a video. For instance, a computer can tell that an object has four legs, is black and white, and is standing in a field; beyond those details, however, the picture becomes hazy. Blinkx therefore takes into account much more information than the image itself. If the surrounding site is about milk or farms, Blinkx can identify the animal as a cow.
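The approach described above can be sketched as combining weak visual evidence with page context. The following is a hypothetical illustration of that idea, not Blinkx's actual algorithm; the candidate animals, attributes, and weights are invented for the example.

```python
# Toy illustration: visual features alone are ambiguous between candidates,
# but words from the surrounding page tip the balance.

CANDIDATES = {
    # animal: (visual attributes, context words that favour it)
    "cow":   ({"four legs", "black and white", "in a field"}, {"milk", "farm", "dairy"}),
    "zebra": ({"four legs", "black and white", "in a field"}, {"safari", "savannah"}),
}

def identify(visual, page_words):
    """Pick the candidate scoring highest on visual + context matches."""
    def score(animal):
        attrs, ctx = CANDIDATES[animal]
        # Context is weighted higher: it is what disambiguates here.
        return len(visual & attrs) + 2 * len(page_words & ctx)
    return max(CANDIDATES, key=score)

seen = {"four legs", "black and white", "in a field"}
print(identify(seen, {"milk", "farm"}))    # farming page -> "cow"
print(identify(seen, {"safari", "trip"}))  # safari page  -> "zebra"
```

Both animals match the visual evidence equally; only the page context separates them, which is the insight the article attributes to Blinkx.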
"The next developments in the semantic web are that large online companies will start to publish their data in semantic web formats alongside the existing html formats and that developers will start to build applications on top of these new semantic web structures," said Tom Ilube, CEO of online identity specialist, Garlik.
"In particular, watch for the emergence of a new generation of truly semantic web search engines that know how to handle RDF structures on a truly massive scale. This will catalyse the emergence of the semantic Web and threaten the dominance of today's apparently unstoppable search engines," he continued.