Short & Simple: What is Data Science?
Imagine you are about to start your career as a Data Scientist, and your grandmother asks you what it’s all about. This text gives an explanation simple enough for her to follow.
The New Oil
In order to understand what Data Science is, it is important to look at the essential component of this profession: the data. “Data is the new oil” is the shortened version of a much-quoted saying from The Economist in 2017[1]. I think that’s a very apt comparison. Let’s try a little experiment. Look around and try to find something that is not oil related. Chances are very good that you won’t find anything, because everything around us is in some way a product of oil, or at least oil was required in the production or supply chain. Without oil, the countless consumer goods from all corners of the world would not exist.
From Pen & Paper to Computers and Smart Phones
The same has been true for data since the beginning of the 21st century. There are almost no products or processes about which data is not collected. Only a sophisticated exchange of data and information makes the most complex supply chains possible. As digitization has progressed, i.e. the shift from classic information media such as pen and paper to digital ones such as computers or cell phones, it has become increasingly easy to collect data. Computing power and storage, as well as access to the Internet, are now so cheap that they have reached the whole of society, and (at least in emerging and industrialized countries) everyone can participate in the global exchange of data.
Every cell phone transmits data every second to its manufacturer and to countless other companies whose interest is to find out how we behave. Machines are quite often monitored at a frequency of 100 Hz, i.e. a hundred times per second, and some processes require sensors that sample in the kilohertz or even gigahertz range.
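To get a feeling for how quickly such measurements add up, here is a small back-of-the-envelope sketch in Python. The function names and the assumed size of 8 bytes per reading are illustrative assumptions, not figures from any particular machine or sensor.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def samples_per_day(frequency_hz: float) -> int:
    """Number of readings one sensor produces in a day at the given rate."""
    return int(frequency_hz * SECONDS_PER_DAY)


def bytes_per_day(frequency_hz: float, bytes_per_sample: int = 8) -> int:
    """Raw storage needed per day, assuming a fixed size per reading.

    The default of 8 bytes (one double-precision number) is an assumption.
    """
    return samples_per_day(frequency_hz) * bytes_per_sample


if __name__ == "__main__":
    readings = samples_per_day(100)            # one sensor sampled at 100 Hz
    megabytes = bytes_per_day(100) / 1_000_000
    print(f"{readings:,} readings per day, about {megabytes:.2f} MB")
```

Under these assumptions a single 100 Hz sensor already produces several million readings a day, and a factory may have thousands of such sensors.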
Why is Data Collected?
This data is not collected without reason. In most cases, the company or person collecting the data hopes to gain an economic advantage from it. The information contained in the data is meant to lead to better business decisions. When is it time to replace a component in a machine so that it needs as little maintenance time as possible, but still runs reliably? What advertising do I show to which customers so that they buy my products? Is this anomaly in the tissue a tumor? These are the questions that need to be answered, but there is a problem, and this is where Data Science comes in.
The 4 Vs of Big Data
It has long since ceased to be possible to simply pore over raw data to gain insights. In this context, one speaks of Big Data, a term that refers to the almost infinite mass of measurement data. Beyond sheer size, however, other challenges arise. The data is collected very quickly, and under certain circumstances decisions have to be made just as quickly. The data is also highly varied in structure, and it has to be checked whether it is correct at all, that is, truthful. These four characteristics are known as the Four Vs of Big Data: Volume, Velocity, Variety and Veracity. Because of them, new methods are needed to get to grips with the data.
The Role of a Data Scientist
As a Data Scientist, the task is to transform this data with the help of powerful computers and suitable algorithms in a reasonable amount of time so that it provides meaningful insights for decision-makers or, ideally, so that these decisions can be made automatically. Data has to be collected, prepared, filtered and then analyzed and presented in a comprehensible way. In this context, it is not enough to focus only on the technical side in the form of mathematical expertise and knowledge of the tools to use, such as databases or programming languages. Rather, it is indispensable in Data Science to also engage with the subject area in which the work is supposed to provide insights, for example business administration or medicine.
But even beyond that, there are other topics that Data Scientists need to address, such as data protection and ethical requirements. Data is very often collected from people, and their rights must be respected. Models or analyses created by Data Scientists can only be as good as the data they are based on. Scientists and the public have had to realize that such models can be just as discriminatory as we humans, who generated that data in the first place. Here, too, it is the task of Data Science to address these problems and to define and clearly communicate the limits of what is possible.
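The steps of preparing, filtering and presenting data can be sketched in a few lines of Python. This is only a toy illustration: the readings, the function names and the plausibility range of 0 to 50 degrees are all made up for this example, and real projects involve far more data and far more care.

```python
from statistics import mean

# Hypothetical raw temperature readings as they might arrive from a sensor:
# some are empty, some are not numbers, and one is implausibly high.
raw_readings = ["21.5", "22.0", "", "21.8", "n/a", "95.0", "22.1"]


def prepare(raw):
    """Turn text into numbers and drop entries that cannot be parsed."""
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # skip empty or non-numeric entries
    return cleaned


def filter_plausible(values, low=0.0, high=50.0):
    """Keep only values inside an assumed plausible range (a veracity check)."""
    return [v for v in values if low <= v <= high]


def summarize(values):
    """Condense the data into a few numbers a decision-maker can read."""
    return {
        "count": len(values),
        "mean": round(mean(values), 2),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    usable = filter_plausible(prepare(raw_readings))
    print(summarize(usable))
```

Note that the choice of what counts as "plausible" is a judgment that needs knowledge of the subject area, which is exactly why technical skills alone are not enough.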
Sum it up
So in summary, Data Science is a science that considers the collection, processing, analysis, and presentation of data from a variety of perspectives. The technical side must be taken into account just as much as the domain-specific, legal and ethical sides in order to generate added value for companies in their decision-making.
Sources: