How to self learn data science

  • 11 April 2022

There are different ways you can start a career in Data Science and they don’t necessarily require you to have completed a degree in Data Science or Computer Science. 


With the rapid expansion of the Data Science community, many resources have become available online for self-learning, from blog posts and videos to free books and entire courses.

However, navigating this sea of information and deciding where to start can be overwhelming. 


I found myself in the same situation when I first started my journey in the Data Science world, after finishing my Master's in Astrophysics and having very little knowledge of the topic. 

In this blog post I would like to share what I learned from my personal experience with everyone who would like to start a career in Data Science from scratch. 


The basics of data science


Data Science is a very broad and continuously evolving field, with many different areas you can specialise in. However, there are core skills required in a data scientist role that you should aim to master. 


In this blog post Amy and Sorcha explain very well what these skills are and why they are important. 


At the heart of everything a data scientist does is programming, so this is a good starting point if you are not familiar with coding. Most data scientists work in Python and/or R: both languages offer a wide variety of libraries to implement Machine Learning methods and are also useful for data analysis and visualisation. 


Another key aspect is understanding the algorithms behind the Machine Learning methods you will eventually learn to apply. This requires some mathematical and statistical knowledge that will naturally be easier for some learners than others. Although having a maths and stats background is helpful, becoming skilled in Machine Learning is achievable for anyone willing to put in the work. 


Getting started


I recommend starting with some practical resources first, like a course with hands-on exercises, so you can familiarise yourself with key Data Science and ML concepts, test your knowledge and, most of all, have fun tackling a real-world problem. In time you can then move on to the more theoretical background. 


A personal favourite of mine is this Udemy data science course.

It is very well structured and accessible for a complete beginner, in both programming and Machine Learning. 


The first part of the course is dedicated to programming and offers a crash course in Python, starting with explaining how to set up a Python environment on your computer. It then goes on to focus on libraries that are most useful for Data Science, both for data manipulation and visualisation. 


The second part of the course covers a variety of the Machine Learning methods most commonly used in Data Science: from linear/logistic regression to decision trees/random forests, clustering methods, natural language processing and neural networks. 


For each method there is a mini-project you can work on to test your understanding, mostly using models from Python's scikit-learn library. 
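
To give a flavour of what sits underneath one of these methods, here is a minimal sketch of a simple linear regression written from scratch in plain Python. It is an illustration only and is not taken from the course: the function name `fit_line` and the toy data are made up, and in a real mini-project you would more likely reach for scikit-learn's `LinearRegression` instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance of x and y, and unnormalised variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data, invented for illustration: hours studied vs. test score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

slope, intercept = fit_line(hours, scores)

# Use the fitted line to predict the score after 6 hours of study.
predicted = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted:.1f}")
```

Writing a method out by hand like this once, and then comparing it with the library version, is a good way to connect the practical exercises to the maths behind them.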


My top resource recommendations

Although this is a very good course, it is not enough on its own to prepare you for a real Data Science job: each topic is covered at an introductory level, with little emphasis on the maths behind the models.


My approach was to use this course mostly as a guide. It gave me a structure for what I needed to learn and a general idea of each topic, and I then used a variety of other resources to deepen my knowledge. 


Here is a list of useful resources I particularly enjoyed and highly recommend:


  • Python Data Science Handbook: this is a book about doing Data Science with Python. It assumes the reader has some familiarity with the Python language.

  • R for Data Science: this book explains how to apply the R language to Data Science.

  • 3Blue1Brown: a really good YouTube channel explaining advanced maths concepts using pretty cool visualisation tools. If you don’t have a strong maths background, this channel has a series of videos on linear algebra and calculus.


  • StatQuest: another great YouTube channel explaining Statistics, Machine Learning and Data Science in a friendly and accessible way.

  • An Introduction to Statistical Learning with Applications in R (2nd Edition): this book will give you a good theoretical understanding of a variety of Machine Learning methods. It offers a good balance between descriptive writing, maths and stats concepts and hands-on examples, which makes it accessible to readers from different backgrounds.

  • Machine Learning, A Probabilistic Perspective: this more advanced book explains Machine Learning using concepts from probability theory. The reader is assumed to be familiar with basic calculus, probability and linear algebra.

  • Kaggle: a platform for Data Scientists to access a variety of data sets, share code with other users and join competitions. This is where you can keep having fun with practical projects and learn about the challenges of a Data Science project.


The resources I provided will hopefully give you a solid theoretical background and best prepare you for a first job in Data Science.


My final advice is that it’s okay not to know everything. Each of us has a unique background and set of skills, and what matters is being open to learning. This is only the beginning of your journey in Data Science. Throughout your career you will always be learning and expanding your knowledge, which is both the challenge and the joy of the profession! 
