Data science is an intriguing field. In 2012, Harvard Business Review named data scientist the “sexiest job of the 21st century”, and more recently Glassdoor named it the “best job of the year” for 2016. With data growing faster than ever before, data science is also becoming one of the most lucrative and fastest-growing career options today.
To learn data science, you must first understand what data science is. If you’re not sure, here’s a simple introduction to data science. As with anything else worth learning, you need curiosity, dedication, patience, and practice to learn data science and become a data scientist.
The following guidelines will help you create a path to learning data science.
1. Learn a Programming Language
Kickstart your journey into data science by getting familiar with a programming language. A data scientist must know how to write code that tells the computer how to analyze the data. Python, R, and SAS are popular choices. Personally, I started with R. R and Python are both free and open-source, whereas SAS is commercial software and is expensive. You can choose either Python or R.
2. Brush up on your Mathematics, Statistics and Probability skills
A good data scientist must be able to understand what the data is saying. To do that, you need a strong knowledge of linear algebra and a solid understanding of probability and statistics. Probability is often called the science of uncertainty. You should know probability for making predictions, and statistics for analyzing data and finding the patterns that can solve the problem.
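To make this concrete, here is a minimal sketch using only Python's standard library. The visitor counts are made-up numbers for illustration, and the dice example is just one simple way to show probability as counting outcomes; neither comes from a real data set.

```python
import statistics
from fractions import Fraction

# Hypothetical sample: daily visitor counts for one week (made-up numbers).
visits = [120, 135, 128, 150, 142, 90, 95]

mean_visits = statistics.mean(visits)      # average value
median_visits = statistics.median(visits)  # middle value, less sensitive to outliers
spread = statistics.stdev(visits)          # sample standard deviation

# Probability: chance that two fair dice sum to 7,
# found by counting favourable outcomes among all 36 outcomes.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
favourable = [o for o in outcomes if sum(o) == 7]
p_seven = Fraction(len(favourable), len(outcomes))

print("mean:", round(mean_visits, 2))
print("median:", median_visits)
print("std dev:", round(spread, 2))
print("P(sum is 7):", p_seven)
```

The statistics half summarizes data you already have, while the probability half reasons about outcomes you have not seen yet. That is roughly the split between analysis and prediction described above.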
3. Learn Databases
Data science is all about data, so you are going to work with lots of it. A sound knowledge of databases is very important. A database is used to store data. There are many kinds of databases, such as MySQL, MongoDB, and Cassandra. I recommend you learn MySQL.
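As a rough illustration of what working with a database looks like, the sketch below uses SQLite through Python's built-in sqlite3 module. SQLite is used here only because it needs no server setup; it is a stand-in, not MySQL, although the basic SQL shown is much the same in both. The table name, columns, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Hypothetical rows, made up for illustration.
rows = [("north", 120.0), ("south", 80.5), ("north", 45.0), ("east", 60.0)]
conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)

# Aggregate with SQL: total sales per region, largest first.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for region, total in totals:
    print(region, total)
```

Note that the grouping and summing happen inside the database rather than in Python, which is usually the point of learning SQL for data work.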
4. Understand the concept of Machine Learning
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Machine learning tools and techniques such as k-nearest neighbors and random forests are essential to becoming a good data scientist.
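To show what "learning without being explicitly programmed" can mean, here is a small sketch of k-nearest neighbors written from scratch in Python. The function name knn_predict, the training points, and the labels are all hypothetical, and in practice you would more likely use a library implementation. It assumes Python 3.8 or later for math.dist.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (height_cm, weight_kg) -> label.
train = [
    ((150, 50), "small"), ((155, 55), "small"), ((160, 58), "small"),
    ((180, 85), "large"), ((185, 90), "large"), ((190, 95), "large"),
]

print(knn_predict(train, (158, 54)))
print(knn_predict(train, (183, 88)))
```

Nobody wrote a rule such as "below 170 cm means small"; the answer comes entirely from the labeled examples closest to the new point.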
5. Learn data munging and data cleaning techniques
Data munging (also known as data wrangling) is the process of converting “raw” data into another format that is easier to access and analyze. Data cleaning is the process of detecting and correcting or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated.
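The sketch below shows one possible cleaning pass in plain Python. The records, the field names, and the clean function are hypothetical. Dropping bad rows is only one policy; depending on the project, you might prefer to repair or flag them instead.

```python
# Hypothetical raw records, as they might arrive from a CSV export.
raw = [
    {"name": " Alice ", "age": "34", "city": "Paris"},
    {"name": "Bob", "age": "", "city": "Berlin"},        # missing age
    {"name": "alice", "age": "34", "city": "paris"},     # duplicate of the first row
    {"name": "Carol", "age": "abc", "city": "Madrid"},   # malformed age
    {"name": "Dave", "age": "41", "city": " Rome"},
]

def clean(records):
    """Normalize text fields, then drop incomplete, malformed, or duplicate rows."""
    seen = set()
    cleaned = []
    for rec in records:
        # Munging: trim whitespace and use consistent capitalization.
        name = rec["name"].strip().title()
        city = rec["city"].strip().title()
        age_text = rec["age"].strip()
        if not age_text.isdigit():      # incomplete or improperly formatted
            continue
        key = (name, int(age_text), city)
        if key in seen:                 # duplicate after normalization
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": int(age_text), "city": city})
    return cleaned

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Normalizing before checking for duplicates matters here: " Alice " and "alice" only count as the same person once whitespace and capitalization have been made consistent.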
6. Learn Data Visualization and Reporting
You must understand the basics of good data visualization and reporting. Data visualization involves the creation and study of the visual representation of data. You don’t need to be a graphic designer, but you do need to be skilled at creating data reports that a non-specialist, such as a manager or CEO, can understand. Tableau and R Markdown are some of the tools that help with reporting.
7. Skill up with big data
Big data refers to large volumes of data, both structured and unstructured, that are so complex that traditional data processing software is inadequate to deal with them. Learn Hadoop and Spark. Hadoop is an open-source software framework for storing and processing large data sets at scale, whereas Apache Spark is often described as Hadoop’s speedy Swiss Army knife: a fast data analysis engine that adds real-time data processing to the Hadoop ecosystem.
8. Practice
As the saying goes, ‘practice makes perfect’, so keep practicing to get better. Start by taking on a simple project. Find some data, or use data that you already have. There are plenty of open data sources on the internet.
9. Follow data science blogs and engage with the community
There are many data science blogs and communities out there, but here are a few that you should follow to keep yourself up to date. Also, don’t just be a consumer; try to contribute whenever you can by answering questions, asking questions, or sharing your projects on GitHub, which will be helpful to other enthusiasts.