Data science is an intriguing field. In 2012, Harvard Business Review named data scientist the “sexiest job of the 21st century”, and more recently Glassdoor named it the “best job of the year” for 2016. As data grows faster than ever before, being a data scientist is also becoming one of the most lucrative and fastest-growing career options today.

To learn data science, you must first understand what data science is. If you’re not sure, start with a simple introduction to data science. As with learning anything else, you need curiosity, dedication, patience, and practice to learn data science and become a data scientist.

The following guidelines will help you create a path to learning data science.


1. Learn a Programming Language

Kickstart your data science journey by getting familiar with a programming language. A data scientist must know how to write code that tells the computer how to analyze the data. Python, R, and SAS are popular choices. Personally, I started with R. R and Python are both free and open-source programming languages, whereas SAS is commercial software and can be expensive. You can choose either Python or R.
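To give a flavour of what “telling the computer how to analyze the data” looks like, here is a minimal Python sketch that uses only the standard library. The dataset, its column names, and the helper average_by_group are made up for illustration; the same task could just as easily be done in R.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data, for illustration only.
RAW = """region,temperature
north,18
south,22
north,20
south,24
"""

def average_by_group(text, key, value):
    """Return {group: mean of the `value` column} for CSV text."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        totals[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}

print(average_by_group(RAW, "region", "temperature"))
# {'north': 19.0, 'south': 23.0}
```

Reading, grouping, and summarizing like this is the everyday core of data analysis in any language.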



2. Brush up on your Mathematics, Statistics and Probability skills

A good data scientist must be able to understand what the data is telling them. To do that, you need a strong knowledge of linear algebra and a good understanding of probability and statistics. Probability is often described as the science of uncertainty. You need probability to make predictions, and statistics to analyze data and find the patterns that help solve a problem.
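A small worked example of both ideas, sketched in Python with the standard statistics module: descriptive statistics summarize data you have observed, while probability quantifies uncertainty about an outcome. The sample values are a toy dataset, not real measurements.

```python
import statistics
from fractions import Fraction
from itertools import product

# Statistics: summarize a small toy sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(sample)        # arithmetic mean: 5
spread = statistics.pstdev(sample)    # population standard deviation: 2.0

# Probability: chance that two fair dice sum to 7,
# counted over all 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
favourable = sum(1 for a, b in outcomes if a + b == 7)
p_seven = Fraction(favourable, len(outcomes))  # 6/36 = 1/6

print(mean, spread, p_seven)  # 5 2.0 1/6
```

Using Fraction keeps the probability exact (1/6) instead of a rounded decimal.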



3. Learn Databases

Since data science is all about data, you are going to work with a lot of it, so a sound knowledge of databases is very important. A database is used to store and retrieve data. There are many kinds of databases, such as MySQL (relational) and MongoDB and Cassandra (NoSQL). I recommend you learn MySQL.
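SQL is the common language of relational databases such as MySQL. The sketch below uses Python’s built-in sqlite3 module as a stand-in, since it needs no database server; the sales table and its rows are hypothetical. The SQL itself (CREATE TABLE, INSERT, SELECT ... GROUP BY) would look much the same in MySQL, although the parameter placeholder syntax depends on the driver you use.

```python
import sqlite3

def total_sales_by_region(records):
    """Load (region, amount) rows into an in-memory table and sum them per region."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing written to disk
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        # '?' placeholders let the driver escape values safely.
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", records)
        query = ("SELECT region, SUM(amount) FROM sales "
                 "GROUP BY region ORDER BY region")
        return conn.execute(query).fetchall()
    finally:
        conn.close()

rows = total_sales_by_region([("north", 120.0), ("south", 80.0), ("north", 30.0)])
print(rows)  # [('north', 150.0), ('south', 80.0)]
```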



4. Understand the concept of Machine Learning

Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Familiarity with machine learning techniques such as k-nearest neighbors and random forests is essential for becoming a good data scientist.
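To make “learning from data” concrete, here is a from-scratch sketch of k-nearest neighbors, one of the techniques named above: a new point receives the majority label of the k training points closest to it. The toy points and labels are invented, and in practice you would normally reach for a library implementation (for example scikit-learn) rather than this minimal version.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points.

    train: list of ((x, y), label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two made-up clusters of labelled points.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"), ((7.0, 7.0), "B"), ((6.5, 6.0), "B"),
]
print(knn_predict(train, (1.2, 1.5)))  # A
print(knn_predict(train, (6.0, 6.0)))  # B
```

Nothing is “trained” up front here: the model is simply the stored examples, which is why k-nearest neighbors is a common first algorithm to study.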



5. Learn data munging and data cleaning techniques

Data munging (also known as data wrangling) is the process of converting “raw” data into another format that is easier to access and analyze. Data cleaning is the process of detecting and removing records in a database that are incorrect, incomplete, improperly formatted, or duplicated.
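The sketch below shows both steps on a handful of hypothetical records in plain Python: it normalizes the format of each name (munging) and drops rows that are incomplete, improperly formatted, clearly incorrect, or duplicated (cleaning). The field names and the rule that a negative age is invalid are assumptions made for this example.

```python
def clean_records(records):
    """Normalize names and ages, skipping bad or duplicate rows (assumed rules)."""
    cleaned, seen = [], set()
    for rec in records:
        name = (rec.get("name") or "").strip().title()  # munging: tidy the format
        age = rec.get("age")
        if not name or age in (None, ""):
            continue  # incomplete
        try:
            age = int(age)
        except (TypeError, ValueError):
            continue  # improperly formatted
        if age < 0:
            continue  # incorrect (assumed validity rule)
        if (name, age) in seen:
            continue  # duplicate after normalization
        seen.add((name, age))
        cleaned.append({"name": name, "age": age})
    return cleaned

raw = [
    {"name": " alice ", "age": "30"},
    {"name": "Alice", "age": 30},      # duplicate once normalized
    {"name": "bob", "age": ""},        # incomplete
    {"name": "carol", "age": "abc"},   # improperly formatted
    {"name": "dave", "age": "-5"},     # incorrect value
    {"name": "erin", "age": "41"},
]
print(clean_records(raw))
# [{'name': 'Alice', 'age': 30}, {'name': 'Erin', 'age': 41}]
```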



6. Learn Data Visualization and Reporting

You must understand the basics of good data visualization and reporting. Data visualization involves the creation and study of the visual representation of data. You don’t need to be a graphic designer, but you do need to be skilled at creating data reports that a non-specialist, such as a manager or CEO, can understand. Tableau and R Markdown are examples of tools that help with reporting.
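Tools like Tableau handle charting for you, but the core idea, mapping numbers onto visual lengths, fits in a few lines. This dependency-free Python sketch draws a horizontal text bar chart; the sales figures are invented, and for a real report you would more likely use a charting library such as matplotlib or one of the tools mentioned above.

```python
def text_bar_chart(data, width=20):
    """Return the lines of a horizontal bar chart for {label: number}.

    Bars are scaled so the largest value spans `width` characters.
    """
    biggest = max(data.values())
    if biggest <= 0:
        raise ValueError("values must include at least one positive number")
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / biggest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines

# Hypothetical sales by region.
sales = {"North": 150, "South": 80, "East": 40}
print("\n".join(text_bar_chart(sales)))
```

Scaling every bar to the largest value keeps the chart readable regardless of units, which is the same choice most charting tools make by default.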



7. Skill up with big data

Big data refers to large volumes of data, both structured and unstructured, that are so large or complex that traditional data processing software cannot deal with them adequately. Learn Hadoop and Spark. Hadoop is an open-source software framework for storing and processing large datasets across clusters of computers, whereas Apache Spark, often called Hadoop’s speedy Swiss Army knife, is a fast data processing engine that can run on top of Hadoop and adds near-real-time data processing.
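Hadoop’s original processing model, MapReduce, splits a job into a map step that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce step that aggregates each group. The pure-Python sketch below simulates the three phases on a single machine for the classic word-count example; it is not Hadoop’s API, only an illustration of the model, and the input lines are made up. Spark expresses the same computation with RDD operations such as flatMap and reduceByKey, keeping intermediate data in memory.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

docs = ["big data is big", "Spark and Hadoop process big data"]
counts = word_count(docs)
print(counts)
```

On a real cluster each phase runs in parallel on many machines, which is what lets the same simple logic scale to datasets that do not fit on one computer.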



8. Practice

As the saying goes, ‘Practice makes perfect’, so keep practicing to get better. Start with a simple project: find some data, or use data you already have. There are plenty of open data sources on the internet.



9. Follow data science blogs and engage with the community

There are many data science blogs and communities out there, but here are a few that you should follow to keep yourself updated. Also, don’t just be a consumer; try to contribute whenever you can by answering or asking questions, or by sharing your projects on GitHub, which will help other enthusiasts.

R-bloggers, Analytics Vidhya, DataTau, KDnuggets, and Data Science Central are some websites I recommend following.



Samikshya Gautam
