Q5 - When analyzing large volumes of data, what does this task imply in terms of technological support and, consequently, in terms of costs?
Answer to the question:
Big Data became viable through the combination of three factors: the availability of large volumes of data, the falling cost of hardware, software, and cloud computing technologies, and companies' need to generate new knowledge from data.
For an application to be considered Big Data, it must comply with the V’s of Big Data.
The five V’s are Volume, Velocity, Variety, Veracity, and Value.
This means that a Big Data application must handle a large volume of varied data, whose veracity (accuracy and trustworthiness) can be assured, process it at very high speed, and generate value for the business.
1 - Data Volume
The first Big Data applications emerged in large Silicon Valley technology companies like Yahoo, LinkedIn, Google, and Facebook, which needed to handle data in the range of TB (Terabytes), PB (Petabytes), and EB (Exabytes).
In simple encodings such as ASCII, one Byte corresponds to one character, so the word “Portugal” is 8 Bytes in size.
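As a quick check of this byte-per-character rule, the minimal Python sketch below (assuming an ASCII/UTF-8 encoding) measures the word “Portugal” in bytes:

    word = "Portugal"

    # In ASCII/UTF-8, each of these characters occupies exactly one byte.
    size_in_bytes = len(word.encode("utf-8"))

    print(size_in_bytes)  # 8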
Storage on computers and mobile phones has been measured in KB (Kilobytes), MB (Megabytes), and GB (Gigabytes).
Next in the scale, we have TB (Terabytes), PB (Petabytes), EB (Exabytes), and ZB (Zettabytes), where Big Data applications take place.
Byte multiples are commonly measured in base 2, the binary system of computers; in the decimal system, the same prefixes follow the familiar progression of a thousand, a million, a billion, a trillion, a quadrillion, and a quintillion characters, or Bytes.
Google generates more than 100 Petabytes of data per day, with 1 Petabyte equivalent to 2^50 bytes in the binary base or 10^15 bytes in the decimal base.
Up to 2007, humanity had generated 300 Exabytes of data, equivalent to 300 x 2^60 bytes. Today we exceed 10,000 Exabytes, and it is estimated that we are heading toward 50,000 EB, or 50 Zettabytes, equal to 50 x 2^70 bytes.
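A small Python sketch makes these orders of magnitude concrete, computing both the binary and the decimal value of each unit and the figures quoted above (the figures themselves come from the text, not from any measurement here):

    # Each step up the scale multiplies by 2^10 (binary) or 10^3 (decimal).
    units = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB"]
    for i, name in enumerate(units, start=1):
        binary = 2 ** (10 * i)
        decimal = 10 ** (3 * i)
        print(f"1 {name} = 2^{10 * i} ≈ {binary:.3e} bytes (binary) "
              f"or 10^{3 * i} = {decimal:.3e} bytes (decimal)")

    # Figures cited in the text:
    print(f"300 EB (up to 2007)    = {300 * 2**60:.3e} bytes")
    print(f"50 ZB (estimated)      = {50 * 2**70:.3e} bytes")
    print(f"100 PB/day (Google)    = {100 * 2**50:.3e} bytes per day")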
These are astronomical measures in the processing of information.
2 - Big Data Technologies
The goal of Big Data is to gain insights for decision-making, which involves capturing and storing data, analyzing it to understand trends, discovering patterns, detecting anomalies, and seeking an explanation for the problem being analyzed.
For this, Big Data requires innovative technologies, parallel processing, distributed computing, scalability, learning algorithms, real-time queries, distributed file systems, computer clusters, cloud storage, support for data variety, as well as specialist technical teams.
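The sketch below illustrates one of these building blocks, parallel processing, in miniature: a data set is split into chunks and analyzed across CPU cores with Python's multiprocessing module to flag anomalous values. The data, the number of workers, and the 3-sigma threshold are illustrative assumptions; a real cluster applies the same divide-and-conquer idea at far larger scale.

    import random
    from multiprocessing import Pool
    from statistics import mean, stdev

    def find_anomalies(chunk, mu, sigma, k=3.0):
        """Return values in this chunk farther than k standard deviations from the mean."""
        return [x for x in chunk if abs(x - mu) > k * sigma]

    if __name__ == "__main__":
        # Synthetic data standing in for a much larger, distributed data set.
        random.seed(42)
        data = [random.gauss(100, 10) for _ in range(1_000_000)]
        data[1234] = 500.0  # inject one obvious anomaly

        mu, sigma = mean(data), stdev(data)

        # Split the data into chunks and analyze them in parallel.
        n_workers = 4
        size = len(data) // n_workers
        chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]

        with Pool(n_workers) as pool:
            results = pool.starmap(find_anomalies, [(c, mu, sigma) for c in chunks])

        anomalies = [x for part in results for x in part]
        print(f"Found {len(anomalies)} anomalous values")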
Initially, Big Data could only be done by large companies that invested in research and whose business was directly tied to data manipulation.
These companies became “data-driven”, with their business and the development of new products directly related to the analysis of large amounts of data and the generation of insights.
3 - Popularization and cost reduction
Big Data is becoming popular, and costs are decreasing, thanks to the development of cloud technologies, software such as Hadoop and machine learning frameworks, hardware such as SSDs for data storage, and GPUs (graphics processing units), which have accelerated Big Data applications.
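As an illustration of how far such software has lowered the entry barrier, the sketch below runs a classic word count with PySpark (an open-source engine from the same ecosystem as Hadoop, used here as an illustrative stand-in) in local mode on a single machine; the input file name is a placeholder.

    from pyspark.sql import SparkSession

    # Local mode: the same API that drives a cluster also runs on a laptop,
    # which is part of what made Big Data tooling cheap to experiment with.
    spark = SparkSession.builder.appName("WordCount").master("local[*]").getOrCreate()

    lines = spark.sparkContext.textFile("input.txt")  # placeholder input file
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    for word, count in counts.take(10):
        print(word, count)

    spark.stop()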
4 - Conclusions for this question
In terms of technological support, today it is possible for small and medium-sized companies to enter this segment, seeking to develop new applications for their businesses.
In terms of cost, it is necessary to evaluate the type of application, estimate the costs of cloud computing, general investments in software and hardware, and the team that will develop the Big Data project.
Each project has a cost based on the amount of hardware, software, and personnel resources to be used.
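As a rough sketch of that evaluation, the snippet below adds up the main cost components of a hypothetical project; every figure is a placeholder to be replaced with real quotes from cloud providers, vendors, and the team's payroll.

    # All values are hypothetical placeholders, not real prices.
    monthly_costs = {
        "cloud_compute": 2_000.0,   # virtual machines / managed cluster
        "cloud_storage": 500.0,     # object storage for raw and processed data
        "software":      300.0,     # licenses or managed-service fees
        "hardware":      400.0,     # on-premises equipment, amortized per month
        "team":          15_000.0,  # data engineers, analysts, project management
    }

    project_months = 12
    total = sum(monthly_costs.values()) * project_months

    for item, cost in monthly_costs.items():
        print(f"{item:15s} {cost * project_months:>12,.2f}")
    print(f"{'total':15s} {total:>12,.2f}")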
The trend is for costs to continue decreasing as hardware and software technologies become more accessible and Big Data applications more common, popularizing their development.