Placement Experience | Microsoft Data Science
My on-campus placement experience and advice for those preparing for Data Scientist roles. 🚀
Hi, I am Varshil. I recently graduated from the EEE department at IIT Guwahati and am currently working as a Data & Applied Scientist at Microsoft. While documenting my placement experience, I realized that my story may not fit the standard template of placement experiences. I had competing priorities between placements & higher education, and the latter usually stayed on top. Nevertheless, this article covers my personal story and the lessons I learned throughout the placement season, which should be helpful for students applying for data science profiles in the upcoming placement season.
Why Data Science:
Coming to my experience with the placement process, it is critical to start with a clear view of the profiles to apply for. The first questions that need answering are: Why pursue Data Science? And what do these roles involve? I personally think anyone with a knack for innovating & engineering products using data and recent advances in machine learning should definitely give it a try. In short, data scientists are the middlemen who bring in state-of-the-art research and engineer it into product improvements. Moreover, for students interested in the research aspects of ML, these positions require keeping up with state-of-the-art techniques and collaborating with research scientists, which maintains one’s exposure to the domain & helps with higher studies.
Pre-Test preparations:
When I started out, I was clear on which companies I would be targeting, and I directed my preparations accordingly. Having decided to pursue both higher education & placement preparations, I applied only to companies with Data Science job profiles & targeted a few individual companies working in ML even when the job position didn’t explicitly mention Data Science. Before the tests, I focused mainly on revisiting machine learning concepts (especially their application aspects), probability & linear algebra, solving puzzles and reviewing ML libraries in Python. I also practised some coding problems covering the basics of data structures & algorithms from InterviewBit & GeeksForGeeks. I wasn’t able to devote a lot of time to coding. Still, it is always better to practise more coding, as there are companies with Data Science profiles that conduct coding tests only. My preparation in ML was built on 2+ years of experience in the field, which may not be the case for everyone. So, in the points below, I will briefly go over some ‘before test preparation’ tips.
- Resume:
- It is a critical part of the placement process. Some go-to tips I followed: Start your resume early and keep it concise yet self-explanatory, on one page if possible. Describe your internships and projects such that even a layman with little understanding of the topic can grasp the idea. Avoid complex terms, code names, etc. unless absolutely necessary. If you have multiple projects, include only those you can talk about in depth and where you can show your contribution.
- Portfolio:
- For Data Science profiles, projects and internships in ML-related domains matter a lot. Having great (personal & internship) projects, open-source contributions to well-known ML libraries or packages, an excellent Kaggle profile, and research papers will definitely help you stand out from the crowd.
- Machine Learning:
- One needs a thorough understanding of machine learning algorithms, their theoretical aspects and applications. Basic knowledge of deep learning is also a good addition. I used this book as my go-to reference for ML concepts and applications. Apart from this, some companies ask coding problems on simple ML models based on regression and time-series data. For those, one also needs familiarity with libraries like pandas, scikit-learn, etc.
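To give a feel for the "simple ML models based on regression" mentioned above, here is a minimal sketch of an ordinary least squares line fit in plain Python. The data and the helper names `fit_line` and `predict` are hypothetical and only for illustration; the article does not list actual test questions, and in a real test one would more likely reach for scikit-learn's `LinearRegression`.

```python
# Hypothetical toy example: not an actual test question from any company.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Made-up data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
next_value = predict(slope, intercept, 5.0)
```

Being able to write the closed-form fit by hand is also a handy way to check one's understanding of what the library call does underneath.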
Test:
For Microsoft’s Data Scientist role, the test was MCQ based, consisting of 60+ MCQs in 60 minutes. The test covered theoretical & application aspects of several topics in machine learning, such as regression, classification, decision trees, random forests, SVM, neural networks, generative/discriminative models and dimensionality reduction algorithms. Furthermore, it included some questions on skip connections and language models, thus requiring a brief idea of essential concepts in deep learning. While the Microsoft test was solely MCQ based, some other tests actually needed us to code up solutions for ML problems in a short time frame. I also encountered questions on SQL and the R language, but these are occasional and can be anticipated from the job description. For test preparation, I have shared a brief list of topics at the end of this article that covers the vast majority of test & interview topics.
Two days before the interviews, around 11 students were shortlisted for the interview process. For the SDE profile, Microsoft has a standard ‘Group Fly’ round a day before the interviews, where they give coding questions to solve and select a handful of students for the interviews. In my case, however, we were informed that the ‘Group Fly’ round was not meant for Data Scientist profiles, and we moved on to the interview process directly. Twelve hours before the interviews, I revisited all my notes on ML algorithms, reviewed my resume and focused primarily on previous interview questions and case studies. Apart from that, I relaxed, chilled out and ate a lot of chocolates. :P
Interview:
Coming to the interviews, the process had 3 rounds, all technical, with no HR round. In the first round, the interviewer asked me to choose any machine learning algorithm I liked and probed its theoretical aspects. For some specific questions, he expected answers understandable to a layman. Below are some of the questions that were asked:
- How would you explain information gain in decision trees?
- How would you design an anomaly detection algorithm using variants of decision trees?
- Neural networks & weight matrices, backpropagation
- Questions related to ensemble models and stacking
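For the information gain question, a small worked example can make a layman-level explanation concrete. The sketch below is my own illustration, not the answer given in the interview: it computes the entropy of a set of labels and the drop in entropy after a split. The function names and toy labels are assumptions made for this example, and it assumes non-empty label lists.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy labels: a perfectly mixed parent node.
parent = ["yes", "yes", "no", "no"]

# A split that separates the classes completely removes all uncertainty.
perfect_split = [["yes", "yes"], ["no", "no"]]

# A split whose children are as mixed as the parent tells us nothing.
useless_split = [["yes", "no"], ["yes", "no"]]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

In plain words: information gain measures how much less "surprised" we are by the labels after asking a question. A decision tree picks, at each node, the question with the largest gain.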
The second round was more of a case study. First, the interviewer asked me if I had worked on any projects involving feature engineering. Then, he presented a problem in which one is expected to build a classification system that detects faults or bone cracks in high-resolution X-ray images while optimizing several other factors like time delay, cloud processing, scalability, need for local compute, etc. With each idea I proposed, he corrected me and directed me towards a different aspect. The round went on for 30 minutes or more on the same question. Though my final answer was not what he had expected, he was happy that I came pretty close and kept churning out viable solutions.
In the third round, the interviewer focused more on my projects & internships and was observant & inquisitive about them. As most of my projects involved reinforcement learning, he asked me about some of the mathematical formulations behind it. He was also curious about the motivation behind those projects. Finally, he started asking questions on ML algorithms & data pipelines, each more difficult than the last, until I wasn’t able to answer. The final question, where the interview ended, was:
How would you design a clustering algorithm using decision trees?
Overall, all the interviewers were extremely friendly and polite, and always entertained questions from my side. What I feel helped me a lot during the interviews, apart from core ML knowledge, was my past experience with multiple related projects & internships. Revisiting those learning pathways during interviews added practicality & strong support to the solutions I proposed. Apart from all that, keep your interview priorities straight and clear, be attentive & confident, and last but not least, be curious and learn from every experience.
There is no foolproof way to crack data science interviews, but one’s own experience, along with that of those who came before, can help navigate the unknown. Enjoy the placements, learn from them, help your friends and don’t forget to smile throughout the process. May the force be with you!
List of ML topics to review:
- Naive Bayes (Decision boundary)
- Linear & Logistic Regression
- Multi-class vs Multi-Label Classification
- SVM
- Decision Trees
- Random Forests
- Bias-Variance Trade-Off
- Time-Series Data Handling
- k-Means Clustering
- PCA, LDA, t-SNE
- L[1-inf] losses and their behaviours
- Metrics (Precision, Recall, ROC AUC, F1 Score, PR AUC)
- Regularization
- Backpropagation
- Ensemble Models
- Skip connections
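As a quick refresher for the metrics entry in the list above, here is a minimal sketch that computes precision, recall and F1 score from scratch for a binary problem. The function name `precision_recall_f1` and the toy labels are assumptions made for illustration; scikit-learn provides equivalent, more general functions in `sklearn.metrics`.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class, from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here precision answers "of everything flagged positive, how much was right?" and recall answers "of everything actually positive, how much was found?"; F1 is their harmonic mean.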