Technical aspects of Data Science
A place to discuss the modelling and programming side of data science
A common tactic you hear in data science is: “start by building a simple model, then build a more complicated one and see if it improves performance”.
For example, start by building a logistic classifier, then perhaps see if a Random Forest performs better.
But how much improvement can we expect? If our logistic regression gets 60% accuracy and our random forest gets 90% accuracy, is that normal or has something gone wrong?
I’m interested in everyone’s experiences: when you’ve used this “simple model → complex model” tactic, how big a performance boost did you see? Did you see any at all?!
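One rough way to check whether a gap between the simple and the complex model is more than noise is a paired bootstrap over the same held-out set. The sketch below is only an illustration: the function names are hypothetical, and the labels and predictions are randomly generated toy values (roughly 60% and 90% correct, to echo the question above), not results from real models. It also cannot tell you whether a large gap comes from data leakage; it only shows how stable the gap is under resampling.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_accuracy_gap(y_true, pred_simple, pred_complex, n_boot=2000, seed=0):
    """Return (observed gap, 2.5th percentile, 97.5th percentile).

    The gap is accuracy(complex) - accuracy(simple). Each bootstrap round
    resamples test rows with replacement and scores both models on the
    same resampled rows, so the comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(y_true)
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        acc_simple = sum(y_true[i] == pred_simple[i] for i in idx) / n
        acc_complex = sum(y_true[i] == pred_complex[i] for i in idx) / n
        gaps.append(acc_complex - acc_simple)
    gaps.sort()
    low = gaps[int(0.025 * n_boot)]
    high = gaps[int(0.975 * n_boot) - 1]
    observed = accuracy(y_true, pred_complex) - accuracy(y_true, pred_simple)
    return observed, low, high


# Toy data (made up for illustration): binary labels, plus predictions that
# are correct with probability 0.6 (simple) and 0.9 (complex).
toy_rng = random.Random(42)
y_true = [toy_rng.randint(0, 1) for _ in range(200)]
pred_simple = [y if toy_rng.random() < 0.6 else 1 - y for y in y_true]
pred_complex = [y if toy_rng.random() < 0.9 else 1 - y for y in y_true]

gap, low, high = bootstrap_accuracy_gap(y_true, pred_simple, pred_complex)
print(f"accuracy gap: {gap:.3f} (95% bootstrap interval {low:.3f} to {high:.3f})")
```

If the interval sits well above zero, the improvement is unlikely to be a fluke of the particular test split. Whether a jump that large is plausible for your problem is a separate question, and worth checking against things like leakage between training and test data.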
Welcome to the Programming in Data Science part of the Community 👋
Some ideas for discussion for this section:
New technologies that people are using and finding useful in their data science work
Questions on how to use a particular programming language or technology, for example how to use ggplot in R for making charts or how to use Docker
Sharing of resources you find elsewhere that might be of use to the community
And anything else relevant to programming in data science!
Welcome to the Data Science Modelling part of the Community 👋
Some ideas for discussion for this section:
The pros and cons of different types of models (e.g. linear models vs neural networks)
How to evaluate particular models, and what to look out for
How to prepare your data for modelling, e.g. if you’re building an in-market model how should you set up your training data
Sharing content on new models or methods coming out of the research literature
And anything else relevant to data science modelling!