Ever since the term “data scientist” arrived on the technology scene, there has been a raging, cross-generational debate over how to define and distinguish newly branded data scientists from traditional statisticians. In a more serious light, data science is often defined as the meeting of three fields: computer science, mathematics and statistics, and domain knowledge. Implied in this definition is a focus on solving specific problems, as opposed to the kind of deep understanding that is typical of academic statistics.
In this article, we will take another look at the data scientist versus statistician kerfuffle to see whether we can find some common ground, and maybe even a shared end point.
Data science or statistics?
It seems that the title “Full Stack Data Scientist” has taken the world by storm. It is a title that conjures up the mystical ability of someone who effortlessly gleans information from deep reserves of data. It comes from the belief that a data scientist can wave a hand like a 21st-century Houdini and easily extract insight from the data.
What is interesting about the field of data science is the threat it is perceived to pose to other disciplines, especially statistics. I don’t see this threat as real, because the two fields are quite different and complementary. Over the past decade, it has become clear that even though the two fields can exist separately, each is weaker without the other. Statisticians need to understand data modeling and data structures, while data scientists need to understand applied statistics.
It is no wonder that statisticians feel threatened by data scientists to some degree. Statistics deals with seemingly vague concepts such as point estimates, margins of error, confidence intervals, standard errors, p-values, hypothesis testing, and the proverbial argument between “Bayesians” and “frequentists.” Statistics can seem confusing to the general public, and at times statisticians cannot even agree among themselves on what is right.
Data scientists, on the other hand, follow a “data science process” that is more approachable: data ingestion, data transformation, exploratory data analysis, model selection, model evaluation, and data storytelling. Of course, many of these steps rely on statistical methods behind the scenes, but they come in a more appealing and understandable wrapper. Many more people can embrace data science.
To be sure, there will always be a need for a strong foundation in statistics. There are many cases where a data scientist will have no idea what to do with a particular data set without help from someone with a statistical background. At the same time, if a statistician is handed a high-dimensional data set with 5 billion rows and 10,000 variables, they will find it difficult to wrangle the data for analysis without consulting a data scientist.
More compare and contrast
Although data scientists and statisticians tend to collect information for similar purposes, their means of data collection are very different. On the one hand, the volume of data that data scientists work with is often massive; as a result, they spend a lot of time on tasks such as large-scale data ingestion, data cleaning, and transformation. In contrast, statisticians still rely on more traditional, smaller-scale data collection methods such as surveys, polls, and experiments.
Data science problems are usually framed through a modeling process that focuses on a model's predictive accuracy. Data scientists do this by comparing the predictive accuracy of different machine learning algorithms and choosing the model with the best accuracy. Statisticians take a different approach to building and testing their models. The starting point in statistics is usually a simple model, such as linear regression, and the data are checked to see whether they are consistent with the model's assumptions. The model is then improved by addressing any assumptions that are violated. The modeling process is considered complete when all of the model's assumptions have been checked and none are violated.
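To make the data science side of this comparison concrete, here is a minimal sketch of selecting a model by held-out predictive error. Everything in it is an assumption made for illustration: the synthetic data, the two candidate models (a constant-mean baseline and a one-variable least-squares line), the 150/50 split, and the use of mean squared error as the accuracy measure. A real project would more likely use a machine learning library and cross-validation.

```python
import random

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Second candidate: ordinary least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for this sketch): y depends linearly on x plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points so candidates are judged on data they have not seen.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
scores = {name: mse(fit(train_x, train_y), test_x, test_y)
          for name, fit in candidates.items()}

# Pick the candidate with the lowest held-out error.
best = min(scores, key=scores.get)
print(scores, "->", best)
```

The selection step looks only at held-out error; it never asks whether either model's assumptions hold, which is the contrast the paragraph above draws with the statistical workflow.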
While data scientists focus on comparing a number of different methods to build the best machine learning model, statisticians instead work to improve a single, simple model to best fit the data. Statisticians also tend to focus more on quantifying uncertainty than data scientists do. As part of building a statistical model, it is common to measure the relationship between the outcome and each predictor, and any uncertainty about that relationship is quantified as well. This step is less common with the tools data scientists typically use, namely machine learning.
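As an illustration of what quantifying uncertainty can look like, the sketch below fits a one-predictor linear regression and reports the slope together with its standard error and an approximate 95% confidence interval. The data are synthetic, the helper name `slope_with_interval` is hypothetical, and the interval uses a normal approximation in place of the exact t distribution, which is reasonable only for fairly large samples.

```python
import math
import random
from statistics import NormalDist

def slope_with_interval(xs, ys, level=0.95):
    """Least-squares slope, its standard error, and an approximate interval.

    Hypothetical helper for this sketch. It uses a normal critical value
    rather than the exact t quantile, so it assumes a reasonably large sample.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)

    # Standard error of the slope estimate.
    std_err = math.sqrt(sigma2 / sxx)

    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, std_err, (slope - z * std_err, slope + z * std_err)

# Synthetic data (an assumption): the true slope is 3.0, with unit-variance noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

slope, std_err, interval = slope_with_interval(xs, ys)
print(f"slope = {slope:.3f}, standard error = {std_err:.3f}, "
      f"approx. 95% CI = ({interval[0]:.3f}, {interval[1]:.3f})")
```

The output is not a single prediction score but an estimate of one relationship with a stated margin around it, which is the kind of result the statistical workflow emphasizes.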