Drawing on his recent experience with the Newspaper Licensing Agency, Charlie Hull of Lemur Consulting discusses some of the complexities in implementing newspaper clippings and media monitoring services.
Parallel to the growth of real-time news is the growth in the importance of archiving it for retrieval. A wide range of social groups are constantly searching for past and current articles, from academic researchers to businesses and the media itself. As the media increasingly goes online and changes in shape and format, there are issues to be faced and resolved.
There is a major concern amongst newspaper and magazine companies over protecting the copyright and financial worth of their material. This is an issue that is exacerbated by the existence of the Internet. How can companies make their content searchable online without compromising their own interests?
You’d think that, these days, searching today’s papers would be something you could just ‘do’. But you can’t.
What it boils down to is that publishers have good commercial reasons to be wary. Total democratisation of copy saps revenue, and without revenue there is no quality media.
You won’t find print information on the web for a variety of reasons. For a start, printed content will seldom appear on the web in its exact original layout. Moreover, concurrently publishing certain types of print content online for free can undermine the ‘premium’ value of printed titles, especially where the copy is not time-sensitive, so media outlets are sometimes hesitant to publish exactly the same features online as they do in print. The Guardian, for example, tends to publish much of its weekend content a week late, minus much of the text and images that accompanied the originals.
Subscription models for newspapers suggest a way of providing searchable archive data online. But these have not proven to be outrageously popular and require a big investment by the user for the sake of what might be one article.
Searching for Answers
It’s also true that the presence of old copy online can generate advertising revenue. But this is a model that is struggling even in the case of premium content.
In order to balance the interests of the media with those of people gathering clippings, what is required is a searchable electronic archive of UK media. We recently completed such a system for the NLA.
Owned by the papers themselves, the NLA runs a closed system, collating print archives and making them searchable.
One of the major challenges of such clipping services is protecting the value and copyright of original copy, so that the material is never made available for free on the general Internet. One way to achieve this is to restrict access to certain types of individuals.
For example, when the NLA launched eClips in 2006, it was a centralised system accessible only by press cuttings agencies. In 2007 this was expanded to a less guarded system, ClipShare, which allowed journalists and librarians to access PDFs of clippings directly.
I have come to understand what a mind-boggling task cataloguing UK clippings is. The NLA’s systems seem fairly representative, holding something like ten million records, with bewildering amounts of metadata and XML tagging.
Clippings databases really throw out unique challenges. Not only does every word have to be catalogued, but also its position in the document, its class, its author, etc. The NLA is projected to be handling up to 35 million records by the end of this year, and these generally have to be searched in under a second. There are also key security considerations – for example, legal judgements may affect access to particular stories.
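To make the cataloguing requirement a little more concrete, here is a minimal Python sketch of a positional index that records where each word occurs in each document and keeps per-document metadata for filtering. It is an illustration only, not a description of the NLA's system or of any particular search engine; the class name, the metadata fields and the sample headlines are all hypothetical.

```python
from collections import defaultdict


class PositionalIndex:
    """Toy positional inverted index with per-document metadata."""

    def __init__(self):
        # term -> {doc_id -> [word positions within that document]}
        self.postings = defaultdict(lambda: defaultdict(list))
        # doc_id -> metadata such as author or section (hypothetical fields)
        self.metadata = {}

    def add(self, doc_id, text, **meta):
        """Index every word of `text` together with its position."""
        self.metadata[doc_id] = meta
        for position, raw in enumerate(text.split()):
            term = raw.strip(".,;:!?\"'()").lower()
            if term:
                self.postings[term][doc_id].append(position)

    def search(self, term, **filters):
        """Return {doc_id: positions} for a term, filtered by metadata."""
        hits = self.postings.get(term.lower(), {})
        return {
            doc_id: list(positions)
            for doc_id, positions in hits.items()
            if all(self.metadata[doc_id].get(k) == v for k, v in filters.items())
        }

    def phrase(self, first, second):
        """Return doc ids where `second` immediately follows `first`."""
        a = self.postings.get(first.lower(), {})
        b = self.postings.get(second.lower(), {})
        matches = []
        for doc_id in a.keys() & b.keys():
            following = set(b[doc_id])
            if any(p + 1 in following for p in a[doc_id]):
                matches.append(doc_id)
        return sorted(matches)


index = PositionalIndex()
index.add("a1", "Press cuttings agencies welcome new archive",
          author="J. Smith", section="media")
index.add("a2", "New archive opens for press cuttings",
          author="A. Jones", section="business")

print(index.search("archive", section="media"))
print(index.phrase("archive", "opens"))
```

Storing positions is what makes phrase and proximity queries possible, and keeping metadata alongside the postings lets a query be narrowed by author or section without scanning the documents themselves. A production system at the scale described would need far more than this (compression, sharding, access control), none of which is attempted here.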
Until recently the NLA’s services were provided on a purely subscription basis. The NLA recognised, however, a stratum of the general public that only wants sporadic, perhaps one-off, access to clippings archives. These individuals are not served well by the subscription model, which seeks buy-in for the sake of copyright and revenue protection. How do we democratise access to archive media, while preserving value for the publishers?
Credit Where Credit's Due
The company therefore turned to a second model. The new product, ‘ClipSearch’, involves a more casual ‘credits’ model: users pay per clipping, buying credits in small blocks. Effectively, the ability to access clippings has been democratised.
But this approach comes at a price. Individual articles can be accessed easily, but the revenue and circulation figures affected by premium content must still be protected. The NLA has therefore decided to only allow access to clippings that are more than three days old.
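The access rule described here can be sketched in a few lines of Python. This is a hedged illustration rather than the NLA's actual logic: the function name and the `subscriber` flag are hypothetical, and it assumes that age is measured in whole calendar days and that subscribers are exempt from the delay.

```python
from datetime import date, timedelta

# Assumption: "more than three days old" is measured in whole calendar days.
EMBARGO = timedelta(days=3)


def is_available(published: date, today: date, subscriber: bool = False) -> bool:
    """Decide whether a clipping may be served to this user today."""
    if subscriber:
        # Assumption: subscribers see content as soon as it is published.
        return True
    # Casual (credit-paying) users only see clippings strictly older
    # than the embargo period.
    return today - published > EMBARGO


published = date(2009, 3, 1)
print(is_available(published, date(2009, 3, 4)))   # exactly three days old
print(is_available(published, date(2009, 3, 5)))   # four days old
print(is_available(published, date(2009, 3, 1), subscriber=True))
```

Keeping the rule in one small function means the embargo length can be changed, or made to vary by title, without touching the search code.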
In examples such as this, clippings companies attempt to strike a balance between making printed content available to casual users when it may be less current, and offering content to subscribers as soon as it goes live. The protection of copy must be a clippings company’s first priority.
Such solutions require a powerful full-text enterprise search engine to ensure that the right results are returned, with full accounting for the rich variety of metadata involved in media archiving.
Similar technology can be applied in the media monitoring sector, by running automatic searches for keywords, chosen by clients, across incoming news stories. Again, the data is complex and the number of articles can be very large: how many news stories appear across the world in a typical day?
Another significant challenge to these kinds of systems is performance. After all, any mention of monitored keywords should ideally be processed within a few minutes of publication.
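One common way to keep this kind of monitoring fast is to turn the problem around: rather than running every client's search against every story, store the clients' keywords in a lookup table and pass each incoming story through it once. The Python sketch below illustrates the idea under simplifying assumptions; it is not how Flax or any specific product works, the class and client names are hypothetical, and it only handles single-word keywords.

```python
import re
from collections import defaultdict


class KeywordMonitor:
    """Toy keyword alerting: one pass per story, however many clients."""

    def __init__(self):
        # keyword -> set of client names watching it
        self.watchers = defaultdict(set)

    def watch(self, client, *keywords):
        """Register single-word keywords for a client (case-insensitive)."""
        for keyword in keywords:
            self.watchers[keyword.lower()].add(client)

    def match(self, story_text):
        """Return {client: sorted matched keywords} for one incoming story."""
        words = set(re.findall(r"[a-z0-9']+", story_text.lower()))
        alerts = defaultdict(set)
        # Only keywords that actually occur in the story are visited.
        for word in self.watchers.keys() & words:
            for client in self.watchers[word]:
                alerts[client].add(word)
        return {client: sorted(found) for client, found in alerts.items()}


monitor = KeywordMonitor()
monitor.watch("acme", "merger", "Flax")
monitor.watch("globex", "merger")

print(monitor.match("Merger talks continue, says Flax."))
print(monitor.match("Nothing of interest today."))
```

The cost of processing a story then depends mainly on the length of the story, not on the number of clients, which matters when alerts are expected within minutes. Real monitoring queries are usually richer than single words (phrases, boolean combinations, metadata restrictions), and those would need a proper query engine rather than a dictionary lookup.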
It’s encouraging for all concerned to see the increasing democratisation of access to published information. We’re likely to see this trend continue. We’re working with a number of companies to implement a powerful media monitoring system based on the Flax search engine.
Companies such as the NLA have shown a strong willingness to tackle the issues involved in the formidable challenge of bringing clippings to the masses. The demand is there. We’re likely to see many more exciting products over the coming years and, for the moment, ClipSearch is the headliner.
The Newspaper Licensing Agency (NLA) was established by the UK national newspapers in 1996 to manage copyright collection. Working on behalf of the UK's newspapers, the company authorises paper and digital copying of press cuttings on behalf of national, regional and international newspapers, including 1,400 in the UK. The NLA currently licenses over 180,000 businesses and organisations, ranging from large government organisations, plcs and limited companies to partnerships and public relations agencies.