Data Scientist, Inference: develops methods and tools that increase the rigor and efficiency of the
experimentation platform using causal inference techniques, with a focus on A/B testing and on
designing and analyzing experiments.
Data Scientist, ML: a programming-heavy role that involves building models and deploying machine
learning systems to production. Similar to ML engineers.
Basic
1. Linear Algebra
Required -
Optional -
Topics to cover:
● Vectors
● Matrix multiplication, factorization
● Singular value decomposition
● Distance metrics (Euclidean distance, cosine similarity)
Resources:
● 3B1B: link
● Khan Academy (beginner friendly): link
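A minimal NumPy sketch of the distance metrics and SVD listed above. The vectors and matrix are arbitrary toy values chosen so the results are easy to check by hand; the helper names are illustrative, not from a library.

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L2 distance: square root of the sum of squared coordinate differences."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: dot product over the product of norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

u = np.array([3.0, 4.0])
v = np.array([6.0, 8.0])
print(euclidean_distance(u, v))  # 5.0
print(cosine_similarity(u, v))   # 1.0 (same direction, different length)

# Singular value decomposition: A = U @ diag(s) @ Vt, singular values sorted descending
A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True

# Rank-1 approximation keeps only the largest singular value and its vectors
A1 = s[0] * np.outer(U[:, 0], Vt[0])
```

Note that Euclidean distance and cosine similarity disagree here on purpose: the vectors point the same way (cosine 1.0) but are 5 units apart.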
2. Calculus
Required -
Optional -
Topics to cover:
● Differential
● Integral
● Single Variable Calculus
● Multivariable Calculus
Resources:
● 3B1B: link
● Khan Academy (beginner friendly): link
● Single Variable Calculus MIT course: link
● Multivariable Calculus MIT course: link
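Numerical derivatives and integrals are a handy way to sanity-check hand-derived results, for example before trusting a gradient you derived for a model. A minimal sketch with the standard library only; the function names are illustrative.

```python
import math

def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Approximate f'(x) with the symmetric difference quotient (truncation error is O(h^2))."""
    return (f(x + h) - f(x - h)) / (2 * h)

def trapezoid(f, a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] using n equal-width trapezoids."""
    width = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * width) for i in range(1, n))
    return total * width

# d/dx sin(x) = cos(x), and the integral of x^2 over [0, 1] is 1/3
print(central_difference(math.sin, 1.0), math.cos(1.0))
print(trapezoid(lambda x: x ** 2, 0.0, 1.0))
```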
3. Probability and Statistics
Topics to cover:
● Dependence and Independence
● Conditional Probability/Bayes’ Theorem
● Random Variables
● Probability distributions (no need to know every distribution in depth, but understand the
definition; a couple of distributions are worth knowing in depth)
○ Continuous Distribution
○ Discrete Distribution
● Normal Distribution
○ Central Limit Theorem
○ Skewness (distribution shape)
○ Central tendency (mean, mode, median)
○ Confidence interval
● Binomial Distribution
○ Central tendency (mean, mode, median)
○ Normal approximation to the binomial
● Statistical hypothesis testing (e.g. ANOVA, t-test, z-test, chi-square test)
○ P-value
● Correlation (e.g. Pearson’s correlation coefficient)
● Concept of multicollinearity
● Random sampling
● Variance and standard deviation/standard error
● Law of Large Numbers
● Type I/Type II error
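A short sketch of two topics above, Bayes' theorem and a normal-approximation (CLT-based) confidence interval, using only the Python standard library. The test characteristics (1% prevalence, 99% sensitivity, 5% false positive rate) and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+), with P(+) from the law of total probability."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

def normal_ci(sample: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """CLT-based interval: mean +/- z * standard error, where standard error = s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = stdev(sample) / math.sqrt(len(sample))
    m = mean(sample)
    return m - z * se, m + z * se

# Hypothetical: even a 99%-sensitive test gives only ~17% P(disease | positive) at 1% prevalence
print(round(posterior(0.01, 0.99, 0.05), 4))  # 0.1667

sample = [2.1, 2.5, 1.9, 2.8, 2.3, 2.0, 2.6, 2.4]  # hypothetical measurements
print(normal_ci(sample))
```

For a sample this small, a t-based interval (slightly wider critical value) is the more conservative choice; the z version is shown because it follows directly from the Central Limit Theorem.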
Resources:
● OpenIntro Statistics: link
○ Quoc Anh: “In 2022, I used this book to teach statistics to a complete
beginner. This book is truly beginner friendly, i.e. doesn’t require anything
beyond high school math. While light on math, it’s still conceptually deep,
building up gradually from sample vs population, to sampling distribution,
to hypothesis testing. My student was able to read about 80% of the book
independently, and asked me clarifying questions about the rest. The
book does have a decent lab section, but it’s a bit too hand-holding and
thus doesn’t develop strong coding skills.”
● MIT OCW (beginner friendly): link
● Blitzstein's Probability Course (Stat 110): link
● Rigollet's Statistics Course: link
● Probability and Statistics for Data Science by Carlos Fernandez-Granda: link
● Brilliant.org Prob/stats course (not free): link
● Nick Singh’s 40 Prob/Stat Questions: link
● Bayes' rule applied (Medium article): link
● Bayes' rule visualized: link
● Causal inference (pick one of these books): link
4. Coding
Required -
Optional -
Topics to cover:
● Basic data structures (e.g. hash map, list, stack) and recursion
● Object-oriented programming (e.g. classes and inheritance)
● Dataframe/Data wrangling/data manipulation:
○ Packages: pandas, numpy, seaborn, pyspark, sklearn
● Coding with probability and statistics
○ Probabilities, implementing efficient calculations
● Implementation of simple/basic ML models
○ KNN
○ Linear/Logistic regression
○ K-means clustering
● Algorithms
○ Types of sorts
○ Types of searches
○ Recursion (especially for ML/AI roles)
○ Iterations
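As an example of "implementation of simple ML models", here is a minimal from-scratch K-nearest-neighbors classifier in NumPy. The toy training points are hypothetical; ties in the vote are broken by whichever label `Counter` encounters first.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train: np.ndarray, y_train: np.ndarray, X_query: np.ndarray, k: int = 3) -> np.ndarray:
    """Classify each query point by majority vote among its k nearest training points (Euclidean)."""
    predictions = []
    for x in X_query:
        distances = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = np.argsort(distances)[:k]                # indices of the k closest points
        label, _ = Counter(y_train[nearest].tolist()).most_common(1)[0]
        predictions.append(label)
    return np.array(predictions)

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.5, 0.5], [5.5, 5.5]])))  # [0 1]
```

This brute-force version costs O(n) distance computations per query; in practice you would scale features first and use a library index (KD-tree, ball tree) for large datasets.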
Resources:
● Any of these online courses Datacamp, EdX, Udemy, or Coursera (beginner
friendly)
● Harvard CS50: link
● OOP - Class & Inheritance: link
● Kaggle: link
5. Database/SQL
Required -
Topics to cover:
● GROUP BY
● Joins (including self-joins)
● Subqueries
● Window functions: link
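The four SQL topics above can be practiced locally with Python's built-in `sqlite3` module and an in-memory database. The `employees` table and its rows are hypothetical; window functions require SQLite 3.25 or newer, which recent Python builds bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Ana', 'Eng', 200, NULL),
  (2, 'Bo', 'Eng', 150, 1),
  (3, 'Cy', 'Eng', 120, 1),
  (4, 'Di', 'Sales', 130, NULL),
  (5, 'Ed', 'Sales', 90, 4);
""")

# GROUP BY: average salary per department
group_by = conn.execute(
    "SELECT dept, AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()

# Self-join: pair each employee with their manager (the table joined to itself)
self_join = conn.execute("""
SELECT e.name, m.name FROM employees e
JOIN employees m ON e.manager_id = m.id
ORDER BY e.id
""").fetchall()

# Subquery: employees paid above the company-wide average
subquery = conn.execute("""
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY id
""").fetchall()

# Window function: rank salaries within each department without collapsing rows
window = conn.execute("""
SELECT name, dept, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
FROM employees
ORDER BY dept, dept_rank
""").fetchall()

for rows in (group_by, self_join, subquery, window):
    print(rows)
```

The contrast between the GROUP BY query (one row per department) and the window query (one row per employee, with a per-department rank attached) is a common interview talking point.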
Resources:
● Coursera: link
● StrataScratch: link
● Hackerrank: link
● SQL LeetCode questions (easy-medium): link
6. Product sense
Required -
Optional -
Topics to cover:
● Product Diagnostic: understanding deviations from the norm (e.g. the number of views
decreased by 10% today; examine the problem and propose solutions)
● Product Improvement: how do you improve certain existing products
● Product Design: should we add more marketing promotion emails, should we
make the Submit button smaller/larger, etc.
● How to measure the success of a product (GAME template: Goal, Action, Metric,
Evaluations)
● Decision to launch/not launch a product/service (e.g. how should we decide
whether to roll out the FB campus feature or not?)
● List useful metrics for identification (e.g. how to identify small businesses on our
platform?)
● Components of A/B testing: link
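One core component of an A/B test is the significance test on the primary metric. A minimal sketch of a two-sided, two-proportion z-test (pooled variance under the null hypothesis) with the standard library; the conversion counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates between control (a) and treatment (b)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                # conversion rate assuming H0: no difference
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))            # two-sided tail probability
    return z, p_value

# Hypothetical experiment: control converts 500/10,000, treatment 580/10,000
z, p = two_proportion_ztest(500, 10_000, 580, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The p-value only addresses Type I error; the sample size (and therefore Type II error) has to be fixed before the experiment starts, which is what a power calculation is for.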
Resources:
● A brief overview of the product sense interview: link
● Trustworthy Online Controlled Experiments (A Practical Guide to A/B Testing):
link
● Emma's YouTube channel: link
● Cracking the PM interview: link
● Data masked (not free): link
● StellarPeers: link
7. Machine Learning
Required -
Optional -
Topics to cover:
● Regression vs. Classification Problems
○ Multi-class classifications
○ Multi-label classifications
○ Binary classification: techniques for heavily imbalanced datasets
● Supervised vs. unsupervised learning
● Parametric vs. non-parametric models
● Comparing advantages / disadvantages across different models
● Bias-Variance Tradeoff
● Overfitting/Underfitting
● L1/L2 Regularization
● Preprocessing: normalization, standardization, and techniques for each
advanced topic
● How to deal with missing data and high-dimensional data
● Cross validation
● Evaluation metrics: MAE, MSE, RMSE, accuracy, precision/recall, AUC, etc.
● Data Centric: link
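The classification metrics above are easy to get subtly wrong in an interview, so here is a minimal pure-Python sketch. AUC is computed through its probabilistic interpretation (the chance that a random positive is scored above a random negative, ties counting one half); the labels and scores are hypothetical.

```python
def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Binary metrics computed from true positives, false positives, and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if ps > ns else 0.5 if ps == ns else 0.0 for ps in pos for ns in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.35, 0.6, 0.8, 0.1]
print(precision_recall_f1(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise AUC loop is O(positives x negatives), which is fine for explaining the idea; library implementations sort the scores instead.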
ML Models
● K-Nearest Neighbor
● Linear Regression
● Logistic Regression
● Decision Tree/Random Forest
● Gradient Boosting
● Support Vector Machines
● K-Means Clustering
● Principal Components Analysis
Resources:
● An Introduction to Statistical Learning (with Applications in R): link
● The Elements of Statistical Learning: link
● Stanford CS229: link
● CMU 10-601: link
● Hands-On Machine Learning with Scikit-Learn: link
● Comparative study on classic ML algorithms: link
● Machine-learning-interview repo by khangic: link
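Cross validation, L2 regularization, and the bias-variance tradeoff fit together in one small exercise: pick the ridge penalty by k-fold CV. A minimal NumPy sketch on synthetic data (the true weights and noise level are arbitrary choices); it fits no intercept, which is acceptable here only because the synthetic data has zero mean.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form L2-regularized least squares: w = (X^T X + alpha I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def kfold_mse(X: np.ndarray, y: np.ndarray, alpha: float, k: int = 5, seed: int = 0) -> float:
    """Average held-out mean squared error over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = ridge_fit(X[train_idx], y[train_idx], alpha)
        errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic data: 100 samples, 10 features, only the first 3 matter
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
true_w = np.array([2.0, -1.0, 0.5] + [0.0] * 7)
y = X @ true_w + rng.normal(scale=0.5, size=100)

scores = {alpha: kfold_mse(X, y, alpha) for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]}
best_alpha = min(scores, key=scores.get)
print(scores, best_alpha)
```

A very large penalty shrinks the weights toward zero (high bias), which shows up as a higher held-out error. With real data, any scaling or imputation should be fit inside each training fold to avoid leakage.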
Advanced
Deep Learning
Topics:
● Backprop
● Loss functions
○ Regression: MSE, RMSE
○ Classification: BCE/Cross-entropy loss, Hinge loss, NLL
■ Relation with logistic regression
■ Softmax formula, smoothing
○ Distribution: KL Divergence
● Optimization
○ Gradient calculation
○ Gradient descent methods
■ Stochastic gradient descent
■ Gradient descent with momentum/acceleration
● Nesterov momentum and similar methods
■ Newton’s method
○ Gradient Clipping
● Feature engineering
○ Potentially multimodal, e.g. image and text
● Overfitting
○ Early stopping
○ Regularization, dropout, layer or batch normalization
● Evaluation
○ Precision, recall, F-score
○ ROC, Precision-recall curve
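A minimal NumPy sketch of the loss-function items above: a numerically stable softmax, cross-entropy, and KL divergence. "Smoothing" is interpreted here as label smoothing, which is one common reading of that bullet (an assumption). With two classes and logits [0, z], softmax reduces to the sigmoid used in logistic regression, which is the relation the list mentions.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max avoids overflow without changing the result."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(logits: np.ndarray, labels: np.ndarray, smoothing: float = 0.0) -> float:
    """Mean cross-entropy with optional label smoothing, using a stable log-softmax."""
    n, n_classes = logits.shape
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Smoothed targets: (1 - smoothing) on the true class plus smoothing spread over all classes
    targets = np.full((n, n_classes), smoothing / n_classes)
    targets[np.arange(n), labels] += 1.0 - smoothing
    return float(-(targets * log_probs).sum(axis=1).mean())

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i), skipping terms where p_i = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]])
labels = np.array([0, 1])
print(softmax(logits))
print(cross_entropy(logits, labels), cross_entropy(logits, labels, smoothing=0.1))
print(kl_divergence(np.array([0.9, 0.1]), np.array([0.5, 0.5])))
```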
Resources:
● Deep Learning book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville: link
● Deeplearning.ai: link
● Dive into Deep Learning: link
● NYU DS-GA 1008: link
● UPenn ESE546: link
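The optimization bullets (gradient calculation, SGD, momentum/Nesterov, gradient clipping) can be tied together by training logistic regression by hand. A minimal NumPy sketch on synthetic data; the hyperparameters are arbitrary illustrative choices, and Nesterov is shown in the "look-ahead gradient" form.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def bce_gradient(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient of mean binary cross-entropy for logistic regression: X^T (sigmoid(Xw) - y) / n."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def clip_by_norm(g: np.ndarray, max_norm: float) -> np.ndarray:
    """Gradient clipping: rescale g so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(g)
    return g * (max_norm / norm) if norm > max_norm else g

def train_nesterov_sgd(X, y, lr=0.1, momentum=0.9, epochs=50, batch_size=32, max_norm=5.0, seed=0):
    """Mini-batch SGD with Nesterov momentum: the gradient is taken at the look-ahead point w + mu * v."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    v = np.zeros_like(w)
    for _ in range(epochs):
        order = rng.permutation(len(y))                   # reshuffle every epoch
        for start in range(0, len(y), batch_size):
            batch = order[start:start + batch_size]
            g = clip_by_norm(bce_gradient(w + momentum * v, X[batch], y[batch]), max_norm)
            v = momentum * v - lr * g                     # velocity update
            w = w + v
    return w

rng = np.random.default_rng(1)
X = np.hstack([np.ones((500, 1)), rng.normal(size=(500, 2))])  # first column acts as the intercept
true_w = np.array([-0.5, 2.0, -1.0])
y = (rng.random(500) < sigmoid(X @ true_w)).astype(float)

w = train_nesterov_sgd(X, y)
accuracy = np.mean((sigmoid(X @ w) > 0.5) == y)
print(w, accuracy)
```

Setting `momentum=0.0` turns this into plain mini-batch SGD, which makes it easy to compare convergence speed in an interview discussion.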
Natural Language Processing
Problems
● Text classification (e.g. fraud detection, sentiment analysis, spam or not spam)
● Text understanding (e.g. applications in NER, recommender systems, and ranking and
retrieval in general)
● Text generation (e.g. question-answering, reading comprehension, text
summarization, machine translation)
Topics
● Word embedding
○ Contextual (Transformer-based)
○ Non-contextual (word2vec)
○ Skip-gram and CBOW
● Statistical model (e.g. Markov Chain Model, Conditional Random Field)
● Deep Learning model
○ Recurrent: RNN/LSTM/GRU
○ Attention:
■ Encoder: BERT
■ Decoder: GPT-1/2/3
■ Encoder-Decoder: the original Transformer, T5, BART
● Techniques:
○ Metrics:
■ Understanding tasks: F1, Precision, Recall
■ Generation tasks: BLEU/GLEU and METEOR (for translation), ROUGE (for
summarization), GLEU (for grammatical error correction), perplexity, WER (word
error rate)
○ Decoding algorithms: beam search, conditional random field, copy/pointer mechanisms
(to handle out-of-vocabulary words)
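Beam search is a common whiteboard question. A minimal sketch, assuming a hypothetical `next_token_probs` function that maps a token prefix to a next-token distribution (a stand-in for a real decoder). The toy model is built so that greedy decoding (beam width 1) picks a worse sequence than beam width 2.

```python
import math
from typing import Callable

def beam_search(next_token_probs: Callable[[tuple], dict], beam_width: int = 2,
                max_len: int = 5, eos: str = "</s>") -> list:
    """Keep the beam_width highest log-probability prefixes per step; finished hypotheses leave the beam."""
    beams = [((), 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in next_token_probs(tokens).items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    return sorted(finished + beams, key=lambda c: c[1], reverse=True)

# Hypothetical toy language model: any prefix not listed ends the sentence
TOY_MODEL = {
    (): {"the": 0.5, "a": 0.4, "</s>": 0.1},
    ("the",): {"cat": 0.3, "dog": 0.3, "</s>": 0.4},
    ("a",): {"cat": 0.9, "</s>": 0.1},
}

def toy(tokens: tuple) -> dict:
    return TOY_MODEL.get(tokens, {"</s>": 1.0})

print(beam_search(toy, beam_width=1)[0][0])  # ('the', '</s>')
print(beam_search(toy, beam_width=2)[0][0])  # ('a', 'cat', '</s>')
```

Greedy decoding commits to "the" (0.5) and ends with total probability 0.2, while the beam keeps "a" alive and finds "a cat" with probability 0.36. Real decoders usually add length normalization, which this sketch omits.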
Resources:
● Stanford NLP classes and textbooks (publicly available online): link
● Georgia Tech NLP classes and textbooks (publicly available online): link
● NLP Progress: link
Recommender Systems
Topics to cover:
● Collaborative filtering (user-based)
● Content-based / item-based filtering
● Evaluation metrics: Recall@k, NDCG@k, Hit Ratio, A/B testing
● Deep Learning RecSys (e.g. Neural Collaborative Filtering, Wide-and-Deep)
● Tasks
○ Next-item Recommendation
○ Within-Basket Recommendation
○ Session-based Recommendation
○ Topic Discovery
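The offline ranking metrics listed above, in a minimal pure-Python sketch with binary relevance. The recommended list and the set of items the user actually bought are hypothetical.

```python
import math

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k recommendations."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def hit_ratio_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if at least one relevant item is in the top k, else 0.0 (averaged over users in practice)."""
    return 1.0 if set(ranked[:k]) & relevant else 0.0

def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """DCG of the ranking divided by the DCG of an ideal ranking; position i is discounted by log2(i + 2)."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

ranked = ["milk", "bread", "soda", "eggs", "chips"]
relevant = {"soda", "chips"}
print(recall_at_k(ranked, relevant, 3), hit_ratio_at_k(ranked, relevant, 3), ndcg_at_k(ranked, relevant, 3))
```

Recall@k and Hit Ratio ignore where in the top k the relevant item sits; NDCG@k rewards placing it higher, which is why it is usually reported alongside them.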
Resources:
● Overall understanding of collaborative/content-based filtering: link
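To make item-based collaborative filtering concrete, here is a minimal NumPy sketch: compute item-item cosine similarity from a user-item matrix, then score unseen items for a user by the similarity-weighted sum of that user's ratings. The 4 x 5 matrix is hypothetical, with 0 meaning "no interaction".

```python
import numpy as np

def item_based_scores(ratings: np.ndarray, user: int) -> np.ndarray:
    """Score every item for one user; ratings is a users x items matrix with 0 = not rated."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0                                 # avoid division by zero for unrated items
    sim = (ratings.T @ ratings) / np.outer(norms, norms)    # item-item cosine similarity
    np.fill_diagonal(sim, 0.0)                              # an item should not vote for itself
    scores = sim @ ratings[user]
    scores[ratings[user] > 0] = -np.inf                     # do not re-recommend items already rated
    return scores

R = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 1, 0, 0],
    [0, 0, 0, 5, 4],
    [5, 0, 0, 0, 0],
], dtype=float)
print(int(np.argmax(item_based_scores(R, user=3))))  # 1
```

User 3 has only interacted with item 0, and item 1 is the item most often co-consumed with item 0, so it is recommended first. User-based filtering is the same idea applied to rows instead of columns.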
Topics:
● Multi-task learning:
○ Issues: calibrating convergence speed and scale among loss terms
● Active learning, meta-learning
● Calibration
● Reinforcement learning
● VAE
● GAN
● Graph Representation Learning
○ Aggregating neighbors:
■ Issues:
● Deep layers lead to indistinguishable node features (over-smoothing)
● Sampling a large number K of neighbors requires huge computing
resources
■ Transductive Graph Learning (e.g. GCN, GAN): uses node
embeddings
■ Inductive Graph Learning (e.g. GraphSAGE): uses node metadata
○ Graph Transformer: limited by the predefined graph size
● Machine Learning for Chemistry
○ Tasks:
■ Property Prediction
■ Molecule Generation
■ Property-guided Molecule Generation
○ Problems:
■ Input representation:
● SMILES: a linear string representation of molecules, in which any
change yields a completely different molecule.
● Adjacency matrix to represent molecule structures.
■ Recurrent approaches:
● Any error in SMILES predictions easily leads to error in
reconstructing molecule structures.
■ Graph approaches:
● SOTA results come from graph models, which are heavily limited by
graph size (mostly <100 nodes). Any molecule with >100
nodes (atoms) causes performance degradation.
● Hierarchical graph models have recently been applied to generate
motifs (substructures of molecules) to overcome the graph-size limit,
but they require a good distribution of motifs.
● Contrastive Learning: applications in security, ranking and retrieval
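"Aggregating neighbors" and its over-smoothing issue can be seen in a few lines of NumPy. A minimal sketch of one GCN-style layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), on a hypothetical 4-node path graph; identity weights are used so only the aggregation effect is visible (and ReLU never clips these non-negative features).

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN-style layer: symmetric-normalized average over each node's neighbors and itself."""
    A_hat = A + np.eye(A.shape[0])                          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Hypothetical path graph 0-1-2-3 with 2-dimensional node features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
W = np.eye(2)

H1 = gcn_layer(A, H, W)  # one round of neighbor aggregation

# Stacking many layers drives every node's feature vector toward the same direction (over-smoothing)
H_deep = H
for _ in range(30):
    H_deep = gcn_layer(A, H_deep, W)

rows = H_deep / np.linalg.norm(H_deep, axis=1, keepdims=True)
print(np.round(rows, 4))
```

After 30 layers every row points in essentially the same direction, so the nodes become indistinguishable; this is why GNNs are usually kept shallow or use residual connections.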
Other Resources
● Steve Nouri’s 800 DS questions: link
● Ian’s Data Science cheat sheet: link
INTERVIEW EXPERIENCE
THIS IS THE TEMPLATE TO SHARE YOUR INTERVIEW EXPERIENCE
Notes:
● If you accepted the offer, could you please share the pros and cons of
working at your current team/company?
● Are you okay with sharing your experience to a larger audience?
● Do you wish to stay anonymous?
FAIRE - DATA SCIENTIST
Timeline
- 4 weeks since online application (no referral)
Format
- Online assessment: 2-hour CodeSignal
- Hiring Manager interview: 45-min
- Onsite: 4 rounds, 1 coding, 1 ML model building, 1 product sense, 1 behavioral
Online assessment
- 2-hour data science assessment via CodeSignal. Lots of SQL (3-4 questions, basic join
+ subqueries, no window function needed) and basic ML questions (how to deal with
overfitting, what is gradient descent, what is the difference between boosting and
bagging, etc.). There might also be prob/stats questions.
Onsite Interview
- 1hr coding challenge. The problem doesn’t require knowledge of any
specific algorithms or data structures; rather, it tests whether you can implement
basic logic and control flow (loops, if/else statements), write clean/organized/idiomatic
code, and debug edge cases. The question itself is not too hard, but make sure to
convey your thought process throughout the interview. Communicating calmly and clearly
with the interviewer is the key to getting through this round (and of course your code
has to work and pass all test cases lol).
- 1hr of model building. For this exercise, you will be expected to perform exploratory data
analysis, prepare the data for model building, and train/evaluate a model on the dataset
provided. You’ll have the option to train either a regression or a classification model with
techniques of your choice (logistic regression, linear regression, tree-based models, etc.).
The interviewer allowed me to use Google or whatever templates I had, so make sure
you have sample code for EDA, data manipulation, and model
training/evaluation: pretty much the whole pipeline of a standard Kaggle project.
The interviewer asked me lots of “why” questions during this interview, to test my
understanding of the dataset, my ability to break down a vague question into smaller
pieces, and my logic behind using certain algorithms.
- 30 mins of behavioral questions. Also very standard: tell me about a time you failed, tell
me about a time you led a team, tell me about a time you had a conflict with
coworkers/boss. Just use the STAR template, have 4-6 different projects/stories ready,
practice a few times, and you should be fine.
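Since templates were allowed in the model-building round, a compact end-to-end template is worth having ready. This is a minimal scikit-learn sketch, not the interview's actual dataset or task: `make_classification` stands in for the provided data, and the ~5% missing values are injected artificially.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the interview dataset: 8 numeric features plus a binary target
X_arr, y_arr = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=0)
df = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])
rng = np.random.default_rng(0)
df = df.mask(rng.random(df.shape) < 0.05)  # blank out ~5% of values to mimic missing data
df["target"] = y_arr

# Quick EDA: shape, missingness per column, class balance
print(df.shape)
print(df.isna().mean().round(3))
print(df["target"].value_counts(normalize=True))

# Hold out a stratified test set before fitting anything
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Imputation and scaling live inside the pipeline, so they are fit on training data only
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC AUC: {auc:.3f}")
print(classification_report(y_test, model.predict(X_test)))
```

Swapping the last pipeline step for a tree-based model is a one-line change, which makes it easy to answer the "why this algorithm?" questions by comparing two models on the same split.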
Overall Experience
- Faire's interviews were among the most challenging and well-rounded/well-structured
interviews I had. My technical interviewers all had PhDs in CS/Maths/Physics from great
schools and were highly intelligent. They were also down to earth and very helpful. They
knew how to ask questions that gauge an interviewee’s knowledge and how to lead a
conversation. I did not enjoy my behavioral round, though. The interviewer seemed
disengaged, kept reading questions from the list, and did not seem interested in
my answers, but oh well.
- Faire is a great company with very interesting products and a very bright future. One month
after I declined their offer, the company raised its Series G, and my offer would have
increased quite a bit had I accepted. I still regret it to this day.
Timeline
- Applied with referral, interviewed for 2-3 weeks, received offer a week after final round
Format
- 1st round: 20-minute phone screening
- 2nd round: 45-minute technical interview
- Final round: one 45-60 minute presentation, three 45-minute technical interviews
Onsite
- Because of Covid, my onsite was conducted via Zoom. I had one week to prepare a
45-minute presentation about 1-2 research projects. I presented my work to the
entire team, followed by a 15-minute Q&A.
- After the presentation, I had three back-to-back 45-minute interviews with three team
members, one of whom is my current manager. Question types included 1/ coding
(LeetCode easy to medium, plus modeling), 2/ machine learning concepts (very close to
the ML, DL, NLP topics above), and 3/ technical details of my projects. For example, I
used BERT in my previous work, so I got asked about BERT a lot. If you say you know BERT,
make sure you have carefully studied at least the original paper by Devlin et al.
Overall Experience
- I enjoyed the process and the types of questions they asked. The questions covered a wide
variety of skills and topics, including coding, math, mostly ML/DL, and a little bit of
advanced NLP. I think this format suits me better than a coding-heavy one because I don’t
have a background in CS. I also think it sets a pretty accurate expectation of the requirements
of this role and the type of work.
Timeline
- I received the Microsoft Career Opportunities email on July 23, 2021
- Got the Microsoft remote interview invitation email on Sep. 1, 2021
- First round: Sep. 17, 2021; virtual onsite: Oct. 21, 2021; offer: Oct. 25, 2021
- Since I had another offer with a Nov. 1 deadline, HR helped me speed up the process.
Format
- The first round included many short-answer questions (behavioral questions with very simple coding)
- Virtual Onsite
- Two technical rounds: simple SQL coding (LeetCode easy level), simple
Python coding, machine learning concepts, professional experience, deep
learning concepts, and some statistics knowledge
- Hiring Manager: professional experience and some follow-up questions based on
your responses
Overall Experience
- The virtual onsite had three back-to-back interviews, so it was kind of tiring. However, the
difficulty level was okay. Although I am not familiar with deep learning concepts, I think it was
manageable.
Timeline
- Applied with referral, interviewed for about 4 weeks, then stopped because I accepted
another offer.
Format
- 1st round: 1-hour technical interview with a data scientist
- 2nd round: take home DS assignment
- Final round: one 30-minute presentation, followed by questions
First round
- This round consists of 3 main topics: prob/stats, A/B testing, and product sense
- Prob/stats: lots of questions came straight from StrataScratch (with some minor
modifications), which really surprised me (because these questions were asked 2
years ago), e.g. Cost of discount coupon, Discount Coupon Usages, Two coupons
- A/B testing: everything about A/B testing. Some very tricky questions:
- How do you determine the sample size for the A/B testing?
- We usually split the control and experimental group 50-50. But if I split it
25-75, would you still feel "confident" about the result?
- Product sense: very similar to Meta DS analytics interview
- We want to launch a shared drive product, how to evaluate the success?
- The total number of booked rides decreased by 5% since last week. How would you
interpret this?
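For the two tricky A/B testing questions above (sample size, and a 25-75 split), a power calculation makes the trade-off concrete. A minimal sketch using the standard normal approximation with unpooled variances for a two-sided two-proportion test; the 10% baseline and +1 percentage point minimum detectable effect are hypothetical.

```python
import math
from statistics import NormalDist

def sample_sizes(p_control: float, mde: float, alpha: float = 0.05, power: float = 0.8,
                 treatment_share: float = 0.5) -> tuple[int, int]:
    """Per-group sizes to detect an absolute lift `mde` with a two-sided two-proportion z-test.

    Normal approximation with unpooled variances; treatment_share is the fraction of traffic in treatment.
    """
    p_treat = p_control + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    k = treatment_share / (1 - treatment_share)  # ratio n_treatment / n_control
    variance_term = p_control * (1 - p_control) + p_treat * (1 - p_treat) / k
    n_control = math.ceil((z_alpha + z_beta) ** 2 * variance_term / mde ** 2)
    n_treatment = math.ceil(k * n_control)
    return n_control, n_treatment

even = sample_sizes(0.10, 0.01)
skewed = sample_sizes(0.10, 0.01, treatment_share=0.25)
print(even, sum(even))
print(skewed, sum(skewed))
```

Under these assumptions, a 25-75 split is still a valid randomized experiment, so the result can be trusted, but it needs noticeably more total traffic than a 50-50 split to reach the same power; with a fixed amount of traffic, the unbalanced test is simply less sensitive.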
Overall Experience
- I can only speak to the first round. I really enjoyed my conversation with the DS. He
seemed very genuine and enjoyed what he does at Lyft. The conversation was
awkward at first, as he literally picked questions from a list (the prob/stats part), but the
longer it went, the more he opened up about what he likes/doesn’t like about the
company. Lyft definitely offers a lot in terms of TC. I’m not knowledgeable enough about
ride-sharing services to say how they fare against Uber. I hope they remove the
take-home assignment, though; it seems to cause lots of friction among candidates.
VERISK - DATA SCIENTIST
Timeline
- About 2 months (no referral) from September to November
Format
- Online assessment
- Video interview
- Onsite
Online Assessment
- 90-minute assessment on HackerRank with coding, statistics, and ML questions
- 2 easy coding questions
- Multiple-choice questions relating to probabilities, statistics and ML topics
Video Interview
- A quick video interview on Wepow
- Answered 3 questions from 3 data scientists, all behavioral
Hiring Manager
- Verisk hires candidates for the entire company and later assigns them to different
business units, so there was no hiring manager. Later I had a chance to join the DS
interview committee and learned that after the onsite round, the committee discusses
each candidate and puts them into one of 3 buckets: offer made, no offer,
or more discussion needed. Everyone on the committee, from entry level to director
level, has an equal voice in evaluating the candidates.
Onsite
- Round 1: Interview with HR, mostly to learn about the candidate’s interests and preferred
location for team matching
- Round 3: Presentation
I was given a dataset several days before the interview. For this round, I needed to
prepare a presentation of my analysis of the data. I just prepared a Colab
notebook with my code, analysis, visualizations, and detailed interpretation. The
interviewer was a Lead Data Scientist who described himself as very nerdy. He asked
me several questions about my analysis and choice of algorithms. Fortunately, for every
question he asked, I had already written an explanation in my Colab notebook, so
thorough preparation helped me a lot in this round. After the interview, I was told I was
well suited for engineering roles in Data Science.
Overall Experience
- The interview experience was very well structured. But since I was not interviewed by
the team I would join (team matching would take place later), the interview
experience was not very personal, and I couldn’t ask any questions about team
culture.
Timeline
- 04/02: Submitted application
- 04/05: Invited to phone screen with recruiter
- 04/08: 30 min behavioral phone screen with recruiter
- 04/13: 45 min interview with Hiring Manager
- 04/21: 45 min Stat/Business case + 60 min live ML interview
- Offer after 2 hours
- 3 weeks from application, no referral
Format
- Hiring Manager: 45 min
- Statistics/Business case: 45 min
- ML: 1hr
Onsite
- 45 min Stat: a big focus on the business case. The interviewer also had an Econ background
(an undergrad degree from UChicago, experience working and co-authoring with 3 renowned
economists including a Nobel laureate, and a PhD from a top-1 program) and a few years of
prior experience. The question started off vague and was also related to churn. I had some
trouble navigating the scope of the problem, so the first few minutes were a bit awkward. The
interviewer was understanding and tried to help by suggesting ways for me to narrow down
and redefine the problem. He mentioned that there would be two cases, but we only did one
(I don’t know whether he changed his mind in the middle of the conversation or whether I
managed time poorly and never got to a solution for the first case). He would question my
assumptions and push deeper, e.g. "You mentioned that you would want to predict revenue;
how would you do that? Now that you have a model, which features would you include? How
do you know if the model is working? Assuming you want to test responses to a coupon, how
would you design the experiment? What information do you need? Explain power calculation.
How do you measure the components? If we run the experiment and find no statistically
significant result, but other PMs use the same data on a subset of retailers, get significant
results, and would like to roll the feature out, what would you do?" There were a few more
questions along those lines. Basically, for any idea I raised, he tested why and how I would
realize it. We talked about predictive models (regression models with fixed effects),
XGBoost (how would you design an XGBoost model, what would your y labels be), and time
series forecasting (how do you treat seasonality). I think I brought this on myself because I
mentioned these terms. So yeah, make sure you know your stuff. I wasn’t too confident about
the XGBoost stuff and confused myself for a moment. My impression is that this interview was
like rapid fire.
- 1hr ML interview. The interviewer had an MS from UC Berkeley plus some prior work experience
and now specializes in Risk. The dataset was sent by email; googling and templates were
allowed. The interviewer stated the objective of the exercise very clearly: clean the data,
split it into train/test sets, train a classifier, evaluate it on the test set, and print the 10
retailers with the highest probabilities of defaulting. He said not to worry about feature
engineering (there were about 40 features); everything was numerical, but a few features had
missing values. The dataset was at the transaction level while the prediction target was at
the retailer level, so I needed to make a few assumptions. The interviewer muted himself, so I
was kinda on my own. I got stuck when one line of my code wasn’t running (a silly basic syntax
error); he offered to help, but I consulted Google and got through. I did make some mistakes in
preparing the data, so it was not even compatible with my model; I just went back and fixed it.
I made sure to think out loud and communicate what I planned to do and why I thought it might
work if I had more time. He also asked about metrics: why I used them and what they mean. I
honestly didn’t finish the last item, but I was close enough that he said he thought I was on
the right track. He did help me redefine the problem and told me I was overthinking it hahaha.
So overall, I think he was quite lenient with me. He asked me to send him the code, and I took
~10 extra minutes to brush it up a bit more.
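The tricky part of that ML round was that the data was at the transaction level while predictions were needed per retailer. A minimal sketch of one way to handle it (aggregate first, then model), assuming pandas and scikit-learn; all column names, the data-generating process, and the choice to rank retailers within the test set are hypothetical illustrations, not the interview's actual data or expected answer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_retailers, n_tx = 300, 6000

# Hypothetical retailer-level ground truth: riskier retailers pay later and default more often
risk = rng.normal(size=n_retailers)
defaulted = (rng.random(n_retailers) < 1 / (1 + np.exp(-(2 * risk - 1)))).astype(int)

# Hypothetical transaction-level table (illustrative column names)
retailer_id = rng.integers(0, n_retailers, size=n_tx)
tx = pd.DataFrame({
    "retailer_id": retailer_id,
    "order_value": rng.gamma(2.0, 50.0, size=n_tx),
    "days_late": np.maximum(0, rng.normal(5 * risk[retailer_id], 3)),
})
tx.loc[rng.random(n_tx) < 0.05, "days_late"] = np.nan  # some missing values

# Roll transactions up to one row per retailer, the unit we predict for
features = tx.groupby("retailer_id").agg(
    n_orders=("order_value", "count"),
    total_value=("order_value", "sum"),
    mean_days_late=("days_late", "mean"),
    max_days_late=("days_late", "max"),
)
labels = pd.Series(defaulted, name="defaulted").loc[features.index]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=0
)
clf = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Default probabilities indexed by retailer_id, then the 10 riskiest retailers in the test set
test_proba = pd.Series(clf.predict_proba(X_test)[:, 1], index=X_test.index, name="p_default")
auc = roc_auc_score(y_test, test_proba)
top10 = test_proba.nlargest(10)
print(f"Test ROC AUC: {auc:.3f}")
print(top10)
```

Splitting after aggregation keeps all of a retailer's transactions on one side of the split, which avoids leaking the same retailer into both train and test.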
Overall Experience
- So far this is the best company when it comes to communication. The platform to set up
interview time and communication about the interviewers, type of interviews, and next
steps are clear and smooth. I almost never had any wait time. They promised to get back
to me within 2 business days (even if I get rejected) and they got back in 2 hours.
- Everybody was on time and easy to talk to. The ML interviewer talked a bit fast and had an
accent, but overall I have nothing to complain about.
Format
- A quick call with a recruiter
- 1st Tech round 1 hour with a Machine Learning Scientist
- 2nd Tech round 1 hour with an Applied Scientist
- Hiring Manager interview: 45-min
Tech interview
- The tech interviews were a combination of machine learning/statistics questions and coding
problems. It is very important to know the details of the models you list on your
resume. They will ask you not only about the big picture of your work, but also about the
technical details of the ML models you used (e.g. How does the K-means clustering algorithm
work? What is the objective of K-means clustering?). Also, the team I was interviewing with
focused on inference work, so they asked me lots of regression/causal inference questions.
Knowing fundamental concepts and explaining them concisely is important as well (e.g.
How would you explain a regression coefficient to a non-technical person?)
- The first-round interviewer asked me 3-4 SQL questions (LeetCode medium). The
second-round interviewer asked me 2 Python questions (LeetCode easy). Both
interviewers asked me one or two leadership principle questions (e.g. Dive Deep or Learn
and Be Curious).
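For the "how does K-means work and what is its objective?" question, a minimal from-scratch sketch of Lloyd's algorithm in NumPy may help. The objective it reduces at each step is the inertia, the sum of squared distances from each point to its assigned centroid. The two-blob data is synthetic, and initialization is plain random sampling from the data (k-means++ is the usual improvement).

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> tuple:
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initialization from the data
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if its cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = float(dists[np.arange(len(X)), labels].sum())  # the k-means objective
    return centroids, labels, inertia

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)), rng.normal(5, 0.5, size=(50, 2))])
centroids, labels, inertia = kmeans(X, k=2)
print(centroids, inertia)
```

Neither step can increase the inertia, so the algorithm always converges, but only to a local minimum; that is why implementations rerun it from several initializations and keep the lowest-inertia result.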
Overall Experience
- Overall, the interview experience was good. As long as you know the models/techniques on
your resume in detail, plus fundamental statistics/causal inference, I think the interview
should not be bad. Preparing LeetCode easy/medium SQL/Python questions should be enough.
However, interview types can differ by team, so do some research about your interview team
if you can (some interview loops might not be team-specific, though). Also, even if you don't
know everything the interviewers ask (or don't know a topic at all), instead of just saying
that you don't know, you can first list what you do know and say that you are looking
forward to learning more about it. In my experience, it’s better to say this than to pretend
that you know and then get more difficult questions later.
How did you prepare?
- Machine learning course/Statistic courses from my education
- Some YouTube channels were helpful (https://www.youtube.com/c/joshstarmer)
META - DATA SCIENCE INTERN
Timeline
- Applied with internal referral (i.e. rejected for a different role, and was referred to this role
by the recruiter)
Format
- Recruiter call (which I skipped due to internal referral)
- Coding Round
- Product Sense / Modeling Round
Recruiter Call (for a different team within the same company) - 30 min
- Asked about my background, followed by a product sense question. The question was “how
would you identify real-life best friends from Instagram” (it’s okay to share now since I
interviewed with them a year ago and they have since scrapped this question). They
look for structure in your response. I would recommend studying the STAR framework
and practicing how to first state the problem, explain possible metrics, rank metrics, and
discuss caveats. There’s hardly any ML in product sense questions; they are rather an
exercise in thinking out loud and proposing feasible rationales on the spot.
- They might also ask behavioral questions, e.g. about previous projects.
HR Round - 45 min
- What happens in this round heavily depends on your (potential) group. If you are
applying for DS Analytics, it is more or less the same (i.e. another product sense round). I
instead had a modeling interview on a hypothetical scenario regarding revenue
projection. My HR round with a different team asked about advertisement search. It’s still
very much intuition-based. They want to hear you break down a problem, think
about what could possibly be done, and list a few algorithms under certain assumptions.
The most important skill for me was being able to list the pros and cons of different
ML algorithms and knowing when to use which.
Timeline
- Applied w/o referral and took 3 weeks to offer
Format
- Hired directly for the Bixby Lab
- 3 technical rounds interviewed with teammates and the direct supervisor
Onsite:
- 1st round: Medium LeetCode-style coding questions
- 2nd & 3rd round:
- 1-2 Medium LeetCode-style coding questions or implementation of an ML
algorithm
- Theoretical ML questions regarding traditional ML, Deep Learning
- SOTA NLP questions and other advanced Deep Learning techniques
Overall Experience
- Since this was a direct hire for Bixby Lab, I could easily narrow down the knowledge scope. In
my opinion, this is standard for a research internship at big tech.
- The interview process was fast and intense: I had 3 interviews in 2 weeks.
Pros
- I worked on SOTA techniques for virtual assistants and was directly supervised by
well-known scientists.
- Encouraged to file patents and publish papers.
Cons
- The DevOps teams are located in South Korea, so it takes time to get help,
especially in obtaining compute resources to conduct experiments.
- Bixby is not a brand-name product, so it may not give your resume a big boost.
7-ELEVEN - DATA SCIENTIST INTERN
Timeline
- Applied w/ referral and received offer in 2 weeks
Format
- Hired directly for the team
- 1 technical round directly with supervisor
Onsite:
- 1st round: ~1hr, 1 Medium LeetCode question + theoretical questions about
ML/DL/RecSys/NLP + designing ML algorithms for specific problems
Overall Experience
- My work focused on a new RecSys task that is not actively researched. During the
internship, I was also assigned mini tasks (e.g. fine-tuning algorithms currently in
deployment) besides working on the internship project.
- The supervisor was supportive. However, the team was not familiar with Deep Learning.
Hence, I didn’t receive much help from the team.
- RecSys in convenience retail is very challenging due to the short (5-minute) visit time.
Pros
- The task is challenging but big-impact.
- Gained an understanding of e-commerce and convenience retail.
Cons
- GPU resources are only available upon request.
- The IT department is slow to resolve equipment and software requests, which slows down
progress.
- 7-Eleven is not a big name in the Deep Learning field, so it won’t give your resume a big
boost.
CREDIT
Special thanks to these wonderful data scientists who devoted their time and effort to
completing this document!
● Somang Han
● Caroline Liongosari
● Sarah Ye
● Dat Ngo
● My Phung
● Yuchen Zhang
● Ian McDonald