By mid-2021, no one is surprised by the term “Big Data”. Every professional or student in any discipline has heard of it, and each can give a description, or rather their own interpretation, of the concept, its limits or its purpose. However, do we really know what Big Data is?
This post does not aim to give a formal, exhaustive definition of the term, since it is now very easy to find thousands of references on the web, at different levels of depth and pitched at any audience. Instead, I would prefer to focus on what is commonly understood by Big Data and on some of the most common problems that arise from an incorrect or incomplete interpretation.
Since the beginning of my professional life, whether in academia or in private companies, I have worked in different areas of Big Data and have discussed its potential with students and professionals from many fields. I have learned a lot from those conversations and they continue to strengthen my knowledge of the area, but in most of them I have found that a good part of the population has a biased or incomplete view.
In fact, in most cases Big Data seems to be understood “simply” as the set of solutions, tools or algorithms that make it possible to store and retrieve high volumes of data, which reduces it to the architectural problem that managing large quantities of data poses. Solving this problem effectively, through advanced indexing, distributed architectures or the type of database best suited to the problem, is one of the key points addressed by the so-called “Big Data management techniques”, but Big Data is much more. This association with the purely technical problem of handling high volumes of data at adequate speed is probably due in large part to a hasty reading of a confusing term (“Big” + “data”). We must also consider that these techniques let us face the challenge of working with data in a variety of formats, whose veracity must be ensured, with the final goal of extracting value that would otherwise remain hidden. This is what is usually known as the 5 V’s of Big Data: Volume, Velocity, Variety, Veracity and Value.
In other words, through the management and analysis of Big Data we should be able not only to quickly obtain the result of a query on a database, but also, for example, to detect behavior patterns among our customers, identify new demands or opportunities, make more accurate estimates or anticipate possible conflicts, all with the aim of being more competitive and efficient. In this context, classic techniques from the field of Artificial Intelligence, such as Machine Learning, Deep Learning or the KDD (Knowledge Discovery in Databases) process, can be the key to exploiting the full potential of the data without needing to be rebranded with some other “cool” name. They can be applied to any good data set, with no requirement for terabytes or petabytes of information, and are therefore applicable in companies of any size.
In short, both the concept of Big Data and the need for businesses to implement strategies to manage it now seem to have been fully assimilated. However, it is important to revisit its real meaning, from all the perspectives involved, so that it does not remain confined to certain sectors or types of company. Some benefit from Big Data is within everyone's reach; let’s not waste it.