Reflections on the progress of Big Data

Posted by s.aragon on 17 November 2016 - 10:39am

By Anna Leida, eScience Lab, University of Manchester

At the New Scientist Live festival of science and innovation, Professor Sir Nigel Shadbolt of the University of Oxford, co-founder of the Open Data Institute, gave a talk on the promise and peril of big data and artificial intelligence. Big Data is the popular science term for the ability of computers to access and successfully analyse large amounts of data from multiple sources. This ability is the foundation of intelligence. It is something our human brains do on a daily basis, but in which we have so far been in universal solitude, at least as far as we know. So why would we not welcome a little company in the ivory tower of intelligence, even if it is only by artificial means?

During half an hour in a fully packed auditorium, Professor Shadbolt walked the audience through history: from the prosaic description of HAL in "2001: A Space Odyssey" (1968), via the invention of the World Wide Web in the 1980s, to machines now outsmarting humans in a series of data processing and analysis tasks, such as Deep Blue (chess), Watson (Q&A) and AlphaGo (Go).

Despite the impressive power of artificial intelligence, humans are still the most able when handling unforeseen events and disruptions. This was conveniently illustrated in real time when, halfway through the talk, the computer system temporarily froze, a situation elegantly handled by the assisting technicians. It was also a timely reminder of how vulnerable our highly interconnected societies are, as we grow more and more dependent on software systems and technical assistance for our daily activities.

We need computers to handle even the most elementary of tasks, such as paying bills, doing our shopping, staying in contact with friends and family, and even finding our way to the local store using satellite navigation. The use of computers and the sharing of information enrich our lives and give us more time to focus on what we want to do. At the same time, we have lost the ability to manage without each other, without the shared knowledge and without the computers. Who among us could build a computer from scratch, even if our lives depended on it?

The Internet, the main platform for sharing data, is by far the most commonly used source for large-scale data gathering and analysis. At the same time, it is on the Internet that we as individuals share many of our most intimate activities, and this is where interests start to collide. As Big Data grows bigger and information is continuously being transferred to computerised systems, the peril to privacy grows as well. A study from the University of Cambridge shows how likes on social media can reveal private details such as relationship status and political opinion.

The web was designed to transcend borders, to be an open platform for all and to enhance our quality of life. To be open and shared, it also requires that we share, even given the risk that any information we give away, consciously or unconsciously, may be misused. As the Internet is used more and more not only for private, research and communication purposes but also for crime and illegal persecution, there is an increased need for security oversight on the open web as well. But how much is too much, and when is the open web no longer open?

In his talk, Professor Shadbolt pointed out examples from both sides, but the take-home message was that, so far, the benefits of open data far outweigh the downsides. Open data has saved lives after earthquakes, delivered teaching resources to communities that would otherwise lack such access, and helped to compile citizen reports much more quickly and efficiently than any governmental organisation or security force.

As in all areas of life, we need to make sure that there is a balance between surveillance and openness, and a proportionate response to misuse. The only way we can do that is to be vigilant enough to question disproportionate restrictions as well as unnecessary demands for openness, and to remember that even if data cannot always be completely open, the discussion certainly should be.