What is Data Science?

1. Definition from Wikipedia:

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured,[1][2] similar to data mining.

Data science is a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.

10 steps to become a pro data scientist:

1. Develop Skills in Algebra, Statistics, and ML

2. Learn to Love (Big) Data

3. Gain a Thorough Knowledge of Databases

4. Learn to Code

5. Master Data Munging, Visualization, and Reporting

6. Work on Real Projects

7. Look for Knowledge Everywhere

8. Communication Skills

9. Compete

10. Stay Up to Date with the Data Scientist Community

Understand Data: Data is useless, and can be (and often is) misleading, without context. Data needs a story to tell a story. Data is like a colour that needs a surface to prove its existence. Red, for example, cannot prove its existence without a surface: we see a red car, a red scarf, a red tie, red shoes or red something. Similarly, data must be related to its surroundings, its context, its methods and the whole life cycle in which it is born, generated, used, modified, executed and terminated.

I have yet to find a “data scientist” who can talk to me about the “data” without mentioning technologies like Hadoop, NoSQL, Tableau or other sophisticated vendors and buzzwords. You need an intimate relationship with your data; you need to understand it inside out. Asking somebody else about anomalies in “your” data is like asking your wife how she got pregnant. One distinct edge we had in our relationship with the UN, and in the software to secure schools from bombings, is our command over the underlying data. While the world talks about it using statistical charts and figures, we are the ones back home who experience it and live it in our daily lives. The importance, the details and the appreciation of this data that we have cannot be found anywhere else. We do the same with our other projects and clients.

Understand Data Scientist: Unfortunately, one of the most confused and misused terms in the data science field is “data scientist” itself. Some liken it to a mystic oracle who knows everything under the sun, while others reduce it to a statistical expert. For a few it is someone familiar with Hadoop and NoSQL, and for others it is someone who can perform A/B testing and use so many mathematical and statistical terms that they are hard to follow in executive meetings. For some, it’s visualization dashboards and for others, it’s a never-ending ETL process.

For me, a Data Scientist is someone who understands less about the science than the ones who create it and a little less about the data than the ones who generate it, but knows exactly how the two work together. A good data scientist is the one who knows what’s available “outside the box” and who he must connect with or hire, or which technologies he needs to deploy, to get the job done. He can link business objectives with data marts, and can connect the dots from business gains to human behaviours and from data generation to dollars spent.

For someone who is considering self-learning, there are a few learning pathways. You can follow the one below or develop your own schedule.

But one important psychological prerequisite is a real interest in looking at data, understanding it and working with it.

From a knowledge perspective, you need to start by learning to program if you have no prior experience.

First Step: Learn programming (R or Python) and become proficient.

I wrote this free e-book:

Latest E-Book on R Programming Language for Beginners

Or you can watch this video on YouTube:

[Data Science] | How to Identify Missing Values and Outliers Using R | Eduonix

Second Step: Build your knowledge up to intermediate statistics, re-learn college algebra, and understand the theoretical foundations, logic and applications of machine learning algorithms.

Third Step: Work on independent projects. You can get data sets from platforms like Kaggle. Try to apply what you have learned step by step while solving the objectives of those projects.
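A good first exercise on any new data set is to count the missing values and flag possible outliers before doing anything else. Here is a minimal sketch in Python using only the standard library. The sample values and the function names are made up for illustration, and the 1.5 × IQR rule is just one common convention, not the only way to define an outlier.

```python
import statistics

# Hypothetical sample: ages from a small survey; None marks a missing value.
ages = [23, 25, None, 24, 26, 22, 95, None, 27, 24]


def count_missing(values):
    """Return how many entries are missing (None)."""
    return sum(1 for v in values if v is None)


def iqr_outliers(values, k=1.5):
    """Return the non-missing values lying more than k * IQR outside the quartiles."""
    present = [v for v in values if v is not None]
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


missing = count_missing(ages)
outliers = iqr_outliers(ages)
print(f"missing: {missing}, outliers: {outliers}")
```

Whether a flagged value is an error or a real observation is exactly the kind of question that needs context about where the data came from, so treat the output as a list of things to investigate rather than things to delete.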

My suggestion is a beginner-level course:

Introduction to R for Data Science | Data Science Tutorial

But you will find the best path once you begin learning by doing.

