The knowledge hierarchy: does DIK really lead to W?

Julià Minguillón

Lecturer
Computing, Multimedia and Telecommunications Studies
Universitat Oberta de Catalunya

In today’s information society our intangible assets count for more than our tangible ones, and knowing how to use a product or service is considered more important than understanding how it was conceived or created. Information and knowledge, described as the two middle links in the data–information–knowledge–wisdom (DIKW) hierarchy, continue to be more firmly coupled to the first link than the last, more recognizable as refinements of raw data than wellsprings of wisdom. And since the mid-1990s, when the Internet came into widespread public use, the rate at which we produce, process and share data has increased exponentially, multiplying itself a thousand times every few years and sending us back to our dictionaries for handy prefixes (mega-, giga-, tera-) to reasonably contain all those zeros. With practically real-time speed, governments, corporations and private users generate staggering quantities of data in multiple formats. Volume, variety and velocity, the three Vs often used to characterize big data, are effectively the axes of a cube that is expanding so fast that figures which once seemed very large now look trivial. To take just three of what have become all too familiar examples, in the next minute Twitter users will tweet more than 350,000 times, Google will receive nearly 2.5 million search queries and email users worldwide will send over 140 million messages (figures retrieved from www.internetlivestats.com).
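To give a sense of scale, the per-minute figures quoted above can be extrapolated to a full day. The following is only a rough sketch: it takes the approximate figures from the text as exact and assumes the rates stay constant over 24 hours, which real traffic does not. The variable names are illustrative.

```python
# Approximate per-minute figures quoted in the text (treated as exact here).
PER_MINUTE = {
    "tweets": 350_000,
    "Google searches": 2_500_000,
    "emails": 140_000_000,
}

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

# Assumption: the per-minute rate is constant throughout the day.
per_day = {name: rate * MINUTES_PER_DAY for name, rate in PER_MINUTE.items()}

for name, total in per_day.items():
    print(f"{name}: about {total:,} per day")
```

Under these assumptions, tweets alone run to roughly half a billion a day, and emails to around two hundred billion.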

All this is happening because our world is becoming increasingly digital. As consumers turn en masse towards conveniently flexible, economically priced formats based on the mass digitization of products and services, those who traditionally marketed their wares in physical formats are having to remodel. For example, the news, music and photography industries are replacing their paper broadsheets, LPs or CDs and rolls of film with digital products, which banks must also help us pay for digitally rather than in coins or banknotes. The digitization process also creates a trace in the shape of data waiting to be analysed and transformed, first into usable information by being contextualized, and then into knowledge by being compared with previously acquired data. When in 2006 Clive Humby declared that data was “the new oil”, he was observing its high intrinsic value as the material from which we can extract an even more precious substance, called knowledge. And when four years later David McCandless argued that data was more like “the new soil”, he was proposing that it could also provide people with the opportunity to tell (their) stories. The result of this dichotomy between data as material generated by and for the industry or by and for the user is that while businesses have created new opportunities for themselves, the tensions between producers and consumers have also risen.

The interaction between users, service providers, services and resources in the big data arena is also especially thought-provoking. For example, customers buying books in an online shop regularly benefit from recommender systems whose knowledge is gleaned from processing data on hundreds of thousands or even millions of previous purchases. But when we decide that a recommender system’s suggestion was aptly made (or even “wisely” made, given the subject in question), should this reassure or alarm us? Doesn’t it reveal how predictable we are? As its users, don’t we somehow become data ourselves? And if we do, who are our custodians and what are their intentions? George Orwell’s 1984 portrayed the dystopia of an omnipresent state subjecting its fearful citizens to constant surveillance. What would he have thought of our digital world, where not only governments but private-sector corporations control large swathes of private information? This is even more disconcerting when so many users voluntarily generate, collate and share the details of their private, academic and professional lives. The advent of Web 2.0 in the first decade of this century enabled end users to become generators as well as consumers of Web content. In fact, it effectively reshaped the entire human communication paradigm by allowing a single individual to speak out to hundreds of thousands of others at the same time, simply by going online. Social and professional networking services like Twitter, Facebook, Instagram or LinkedIn have redefined the way we share private information. Users’ habits also help these platforms and third parties to observe, analyse and extract behaviour patterns that can be used to offer us services and products which are more personalized, but which at the same time threaten to infringe upon our privacy.

But it’s not just users that generate interesting data. Any object can become a piece in the gigantic, living puzzle we call the Internet of Things. Sensors, radio-frequency ID tags and other such devices can be designed to establish connections between objects and people that were unthinkable a decade ago. For example, why should we go on building costly on-road traffic sensors to calculate traffic flows when we can use the live data from the connection patterns of the GPS receivers built into motorists’ mobile phones? And similar technologies can be used to create knowledge in commercial contexts, for example when supermarket companies need to calculate which products sell most and in combination with which others, in order to more effectively position, group and organize certain items or promote new ones.
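The supermarket example can be made concrete with a minimal sketch that counts how often each product is sold and how often each pair of products appears in the same purchase. The baskets and the function name `count_items_and_pairs` are invented purely for illustration; a real retailer would work with far larger datasets and more refined measures than raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: one list of products per checkout.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "coffee"],
    ["bread", "eggs"],
    ["bread", "milk", "coffee"],
]

def count_items_and_pairs(baskets):
    """Count single products and unordered product pairs across baskets."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical ordering.
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts

item_counts, pair_counts = count_items_and_pairs(baskets)

# The pair bought together most often in this toy dataset.
top_pair, top_pair_count = pair_counts.most_common(1)[0]
print(top_pair, top_pair_count)
```

In this toy dataset bread and milk are bought together more often than any other pair, which is the kind of observation that might inform where items are placed on the shelves.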

Because it enables us to communicate in real time from anywhere in the world, our technology has given us a series of devices that free us from the constraints of certain kinds of institutional inertia or corporate mistrust and secrecy. It is therefore important to review what can or should happen in each phase of the life cycle of usable data (from its generation, capture and pre-processing to its storage, analysis, visualization and publication) and to ask ourselves not only about the technical details but about the organizational and legal issues and, most importantly, the ethics of data. What constitutes data? How is it generated and managed? Who controls it, what is it used for and when does it become obsolete? These are the questions we need to address if we want to assume responsibility for our role as digital citizens serving the common good.

The big data paradigm opens up any number of avenues to the digital citizen and to society at large. Whichever we choose to explore, the delicate balance between our needs as individual users and as members of society will be directly affected, as it is in many areas of human experience. In short, the data–information–knowledge chain will only lead to any real wisdom when this particular hierarchy becomes stable and beneficial for producers and consumers alike, for each of us according to our needs and for all of us in harmony with the objectives of the common good.

Creative Commons licence (Attribution-Non-Commercial-No Derivative works). They may be consulted and distributed freely provided that the author and publisher are quoted (in accordance with the “Recommended citation” section in each of the articles). However, no derivative works (translation, change of format, etc.) may be made without the publisher’s permission. Therefore, it meets the definition of open access from the Budapest Open Access Initiative declaration. The journal allows the author(s) to hold the copyright without restrictions and to retain publishing rights without restrictions.