Web science: Exploring the network without guesswork

A FEW hundred people travelled to Raleigh, North Carolina, last week for an international meeting intended to bring academic scrutiny to bear on a seemingly unstoppable force. For more than a decade it’s been shaping our lives, transforming business and rewriting the rules of social interaction. That phenomenon is the World Wide Web.

The Raleigh meeting was only the second conference on the emerging discipline of web science. The web is clearly one of the most significant technologies of recent decades, but does it need to be treated as a field of study in its own right?

“The web is too important not to understand,” says Nigel Shadbolt, professor of artificial intelligence at the University of Southampton, UK. “And you have to get some science done before you can understand something.”

People from a diverse range of backgrounds will be needed to understand the technical, social, legal and political forces at work in cyberspace, says Shadbolt, just as in the 1970s the previously separate disciplines of physics, biology, geology and others contributed to the emergence of environmental science.

Interdisciplinary web science research groups and courses are springing up at universities around the world. Shadbolt and web inventor Tim Berners-Lee will co-direct a new Institute of Web Science at Southampton in collaboration with the University of Oxford, with £30 million of initial funding provided by the UK government.

At the Raleigh meeting, Devin Gaffney of the Rensselaer Polytechnic Institute (RPI) in Troy, New York, described how in mid-2009 he set up software to archive every message posted by Iranians using the social messaging service Twitter to coordinate dissident protests. Now that the buzz from bloggers and journalists declaring that this was a “Twitter revolution” has subsided, Gaffney is analysing the 766,263 tweets he has collected in order to assess how justified that description was.

At the time, Twitter boasted about its role in connecting the protestors, but Gaffney’s initial results suggest that Twitter had a greater impact internationally. “Evidence so far suggests a demographic of non-Iranians generating awareness about the situation,” he says.

Gaffney is now trying to find out if the Iranian government itself has been monitoring and reacting to online activity, and whether the authorities have used Twitter to keep track of the protests. “Twitter and Facebook give Iran’s secret services superb platforms for gathering open-source intelligence,” he says.

Gaffney’s work is one example of research that will help us make predictions about the likely impact of web interactions, says Jim Hendler, also at RPI, who recently began teaching the first undergraduate degree in web science in the US.

“I wouldn’t expect to be able to say ‘such and such a company will be successful’ or ‘this technology will succeed or fail’, but to be able to more rapidly model the effects of new technologies,” says Hendler. He likens the predictive use of web science to studying a flu outbreak. It is impossible to say exactly how many people will die, but it is possible to model when and how the outbreak becomes a pandemic.

Shadbolt says such insights will be of interest to business as well as to academics and governments. For example, a web scientist would have been able to advise on the likely reception of Google’s Buzz social network, he says. Launched in February, Buzz automatically assigned Gmail users to groups by looking at the people they emailed most frequently. The service had to be hastily modified after it became clear that people did not like their frequent contacts being displayed online.

Berners-Lee has said that new web services “are really social experiments”, and Shadbolt thinks collecting the results of these experiments will help generate ideas for better web products, driven by considerations other than purely financial ones.

Shadbolt adds that web scientists are already leading the development of the semantic web, or web 3.0. This will allow web services to process text and other online content at a deeper level. A search engine, for example, could “understand” the intent of a query and deliver results it thinks will answer it, instead of crudely picking out the documents in which the search term appears most frequently.

Marshall Kirkpatrick, co-editor of the influential blog ReadWriteWeb, is sceptical. He cautions that what appears ground-breaking to web scientists may fall flat with users. “Anything directly monetisable will be advanced, but the coolest innovations, the most democratic, will be grunted at by an uninspired market,” he says.

Hendler acknowledges that most attempts to understand and exploit the web are driven by financial incentives. But that does not mean we should not study the web for its own sake, he says. “A more neutral, scientific view is needed if we are to understand this important force in our society and make sure it provides the services we need.”