
How to Polish a Perfect DS Job Resume (Chinese/English Version)

Here are the 7 heuristics used to quickly screen your Data Science CV:
1. Prior experience as a Data Scientist​
I’m going to quickly run through your CV to look at your previous positions and see which are
marked as ‘Data Scientist’. There are some other adjacent terms (depending on the role I’m
hiring for), such as ‘Machine Learning Engineer’, ‘Research Scientist’ or ‘Algorithm
Engineer’. I don’t include ‘Data Analyst’ in this bucket as the day-to-day work is typically
different from that of a Data Scientist and the Data Analyst title is an extremely broad term.​
If you’re doing data science work at your present job but hold some other creative job
title, it’s probably in your best interest to have your title changed to Data Scientist.
This is especially true for Data Analysts who are de facto Data Scientists. Remember, even if the
CV contains descriptions of the projects you’ve worked on (and they include machine
learning), a title other than Data Scientist adds unnecessary ambiguity.
Additionally, if you’ve completed a data science bootcamp or a full-time master’s degree in the
field, this will probably be considered the beginning of your data science experience (unless you
worked in a similar role earlier, which will warrant questions at a later stage).
2. Business-oriented achievements​
Ideally, I’d like to read what you did (the technical aspects) and what the business outcome was.
Technically savvy data scientists who can also talk in business terms are in short supply. If you
can share the business KPIs that your work impacted, that’s a big thumbs-up in my book. For
example, reporting your model’s improvement in AUC is alright, but citing the
conversion rate increase that resulted from your model improvement shows you ‘get it’: the
business impact is what really matters at the end of the day. Compare the following alternatives
describing the same work with a different emphasis (technical vs. business):
a. Bank loan default rate model — improved model’s Precision-Recall AUC from 0.94 to 0.96.​
b. Bank loan default rate model — increased business unit’s annual revenue by 3% ($500K
annually) while maintaining constant default rates.​
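One way to get from a model metric to a statement like (b) is to evaluate the model’s decisions in
business units rather than in AUC points. The sketch below is a minimal illustration, not the
author’s method: the `Loan` fields, the `business_summary` helper, the approval rule and the toy
portfolio are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    score: float     # model's predicted probability of default (toy value)
    defaulted: bool  # observed outcome
    revenue: float   # revenue earned if the loan is repaid (toy value)


def business_summary(loans, threshold):
    """Approve loans scored below `threshold` and report business-facing numbers."""
    approved = [loan for loan in loans if loan.score < threshold]
    n_approved = len(approved)
    n_defaults = sum(loan.defaulted for loan in approved)
    return {
        "approved": n_approved,
        "default_rate": n_defaults / n_approved if n_approved else 0.0,
        "revenue": sum(loan.revenue for loan in approved if not loan.defaulted),
    }


# Invented toy portfolio, for illustration only.
loans = [
    Loan(0.05, False, 1000.0),
    Loan(0.10, False, 1200.0),
    Loan(0.30, True, 900.0),
    Loan(0.40, False, 1100.0),
    Loan(0.80, True, 1000.0),
]

summary = business_summary(loans, threshold=0.35)
print(summary)
```

Running the same summary for the old and the new model at thresholds that give equal default rates
yields a revenue difference, which is the kind of number a CV bullet can report.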
3. Education​
What’s your formal education, and in what field? Is it from a well-known institution? For more
recent grads, I’ll also look at their GPA and whether they received any excellence awards or honors
such as making the Rector’s or Dean’s list. Since Data Science is a wide-open field without
any standardized tests or required body of knowledge, people enter the field in various ways. In
my last blog post, I wrote about the 3 main paths into the field, and based on your education
and its timing, I’ll figure out which one you probably took. The timing helps me understand
your story: how and when you transitioned into data science. If you don’t have any formal
education in data science, that’s fine, but you need to demonstrate a track record of
work in the field, an advanced degree in a similar field, or both.
4. Layout / visual appeal​
I’ve seen some beautiful CVs (I’ve saved a few of these for personal inspiration) but I’ve also
received text files (.txt) that lack any formatting. Working on your CV can be a pain, and if
you’ve chosen data science as your endeavor there’s a good chance you don’t enjoy
creating aesthetic designs in your spare time. Without going overboard, you do want to look for
a nice template that lets you get everything across in limited space. Use the space wisely:
it’s useful to split the page and highlight specific sections that don’t fall under the
chronological work/education experience. These can include the tech stack you’re familiar with,
a list of personal projects, links to your GitHub or blog, and so on. A few simple icons can also
help emphasize section headers.
Many candidates use 1–5 stars or bar charts next to each language/tool they are familiar with.
Personally, I’m not a big fan of this approach for several reasons:
• It’s extremely subjective: is your ‘5 stars’ the same as someone else’s ‘2 stars’?
• They mix languages with tools, and in the worst cases with soft skills: saying you’re ‘4.5
stars’ at Leadership isn’t helpful. As a strong believer in a growth mindset, I find that claiming
to max out a skill (especially a soft skill that is harder to quantify and harder to master)
feels very presumptuous.
I’ve also seen this approach abused even further by taking the subjective measures and
turning them into a pie chart (30% Python, 10% team player, etc.). While this was probably
supposed to be a creative way to stand out, it demonstrates a lack of basic understanding
of the concepts behind different chart types.
Here are two examples of CVs I’ve found visually appealing, with details blurred for anonymity.​
Credit Eva Mishor (used with permission)​
Visually appealing CVs of Data Scientists, details blurred. Note the vertical split used in both
examples to differentiate experience, skills, achievements and publications. In both cases, a
short summary paragraph helps describe their background and desires. Used with permission
from owners.​
5. Machine Learning variety​
There are two types of variety I look for:​
1. Type of algorithms: structured/classic ML vs. Deep Learning. Some candidates have only
worked with Deep Learning, including on structured data that would have been better suited
to tree-based models. While there’s no problem per se with being an expert at DL,
limiting your toolset can limit your solutions. As Maslow said: “If the only tool you have is a
hammer, you tend to see every problem as a nail.” At Riskified we deal with structured,
domain-driven, feature-engineered data, which is best handled by various forms of boosted
trees. Having someone whose entire CV points back to DL is an issue.
2. ML domain: this is usually relevant for two domains that require deep expertise,
computer vision and NLP. Experts in these fields are in demand, and in many cases their entire
career is focused on these domains. While this is crucial if you’re looking for someone
to work in that field, it’s typically a bad fit for a more general data
science role. So, if most of your experience is in NLP and you’re applying for a position
outside the domain, try to emphasize positions/projects where you’ve worked on
structured data to demonstrate variety.
6. Tech Stack​
This can generally be broken down into languages, specific packages (scikit-learn, pandas, dplyr,
etc.), clouds and their services (AWS, Azure, GCP) or other tools. Some candidates mix this up
with algorithms or architectures they are familiar with (RNN, XGBoost, K-NN). On a personal
note, I prefer that this section revolve around technologies and tools; when a specific algorithm
is mentioned it makes me wonder whether the candidate’s theoretical ML knowledge is limited
to just those specific algorithms.
Here, I’m looking for the relevance of the tech stack: whether the tools are from the last few
years (a positive sign that the candidate is hands-on and learning new skills), the breadth of the
stack (whether the candidate is limited to specific tools or familiar with quite a few things) and
the fit with our stack (how much we will need to teach them).
7. Projects​
Is there something you’ve worked on that you can share on GitHub? Any Kaggle competition or
side project can be very helpful, since it lets interviewers look at concise code, types of
preprocessing, feature engineering, EDA, choice of algorithm and countless other issues that need
to be addressed in a real-life project. Add links to your GitHub and Kaggle accounts for
interviewers to dive into your code. If you don’t have much experience, there’s a good chance
you’ll be asked about one or more of these projects. In some interviews I’ve conducted, the
candidate didn’t remember much about the project and we couldn’t develop a conversation about the
choices they made and the reasons behind them. Be sure you brush up on the work you did or
keep it out of the CV. Similarly, make sure you present your best work and that you’ve put enough
time and effort into it. It’s better to have 2–3 high-quality projects than 8–10 medium (or
lower) quality ones.

The following are the 7 elements used to quickly screen a data science resume:
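01 Experience as a Data Scientist
I will quickly skim your resume, look at your previous positions, and see which of them are
labeled ‘Data Scientist’. I also look at some related terms (depending on the position being hired
for), such as ‘Machine Learning Engineer’, ‘Research Scientist’ or ‘Algorithm Engineer’. I don’t
include ‘Data Analyst’ in this category, because the day-to-day work usually differs from that of
a Data Scientist and the Data Analyst title is a very broad term.
If you currently do data science work but hold some other creative job title, changing your title
to Data Scientist is probably in your best interest; this is especially the right move for Data
Analysts who are in fact Data Scientists. Even if the resume describes the projects you worked on
(including machine learning), a title other than Data Scientist adds unnecessary ambiguity.
In addition, if you attended a data science bootcamp or hold a master’s degree in the field, this
will probably be considered the start of your data science experience (unless you did similar work
before).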
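02 Business-oriented achievements
Ideally, I want to see what you did (the technical side) and what the business result was. Data
scientists who are technically proficient and can also communicate in business terms are scarce.
If you can share the business key performance indicators that your work affected, that is a big
plus in my book. For example, stating your model’s improvement in AUC is fine, but citing the
conversion rate increase that resulted from your model improvement means you ‘get it’: business
impact is what ultimately matters. Compare the following alternatives, which describe the same
work with different emphases (technical vs. business):
• a. Bank loan default rate model: improved the model’s Precision-Recall AUC from 0.94 to 0.96.
• b. Bank loan default rate model: increased the business unit’s annual revenue by 3% ($500K
annually) while keeping default rates constant.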
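03 Education
What education did you receive, and in what field? Is it a well-known institution? For recent
graduates, I also look at their GPA and whether they received any excellence awards or honors,
such as making the Rector’s or Dean’s list. Because data science is a very open field, with no
standardized tests and no required body of knowledge, people can enter it in various ways. If you
have no formal education in data science, that’s fine, but you need to demonstrate a track record
of work in the field and/or an advanced degree in a similar field.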
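04 Layout / visual appeal
I have seen some beautiful resumes (I saved a few for personal inspiration), but I have also
received text files (.txt) with no formatting at all. Writing a resume is painful, and if you
chose data science, there’s a good chance you don’t enjoy creating aesthetic designs in your spare
time. You need to find a good template that lets you get everything across in limited space. Use
the space wisely: it is useful to split the page and highlight the sections that don’t belong to
the chronological work/education history. These can include the tech stack you’re familiar with,
a list of personal projects, links to your GitHub or blog, and so on. A few simple icons can also
make section headers stand out.
Many candidates put 1-5 stars or bar charts next to each language/tool they are familiar with.
Personally, I’m not fond of this approach, for the following reasons:
• It is very subjective: is your ‘5 stars’ the same as someone else’s ‘2 stars’?
• They mix languages with tools, and in the worst case with soft skills: saying you are ‘4.5
stars’ at leadership isn’t helpful.
I have also seen this approach abused further, by taking the subjective measures and turning them
into a pie chart (30% Python, 10% teamwork, etc.). While this may count as a creative way to stand
out, it shows a lack of basic understanding of the concepts behind different charts.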
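05 Machine learning variety
I look for two types of variety:
1. Type of algorithms: structured/classic ML vs. deep learning. Some candidates have only used
deep learning, including on structured data that would have been better suited to tree-based
models. While there is nothing wrong per se with being a DL expert, limiting your toolset
limits your solutions. As Maslow said: “If the only tool you have is a hammer, you will see
every problem as a nail.” At Riskified, we deal with structured, domain-driven,
feature-engineered data, which is best handled by various forms of boosted trees; someone whose
entire resume points to DL is a problem.
2. ML domain: this usually concerns computer vision and natural language processing, two domains
that require a lot of expertise. Experts in these fields are in demand, and in many cases their
entire career is focused on these domains. This is crucial if you are looking for someone to
work in that field, but it is usually a poor fit for a general data science role. So, if most
of your experience is in NLP and you are applying for a position outside that domain, try to
emphasize positions/projects where you have worked on structured data to demonstrate variety.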
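06 Tech stack
This can usually be broken down into languages, specific packages (scikit-learn, pandas, dplyr,
etc.), clouds and their services (AWS, Azure, GCP), or other tools. Some candidates mix this up
with algorithms or architectures they are familiar with (RNN, XGBoost, K-NN). Personally, I prefer
this section to revolve around technologies and tools; when a specific algorithm is mentioned, I
wonder whether the candidate’s theoretical ML knowledge is limited to just those algorithms.
Here, I look at the relevance of the tech stack: whether the tools are from the last few years (a
positive sign that the candidate is hands-on and learning new skills), the breadth of the stack
(whether the candidate is limited to specific tools or familiar with many), and the fit with our
stack (how much we will need to teach).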
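07 Project experience
Is there anything you have worked on that you can share on GitHub? Any Kaggle competition or side
project is very helpful, as it lets interviewers look at concise code, types of preprocessing,
feature engineering, EDA, choice of algorithm, and countless other issues that need to be
addressed in a real-life project. Add links to your GitHub and Kaggle accounts so interviewers can
dig into your code.
If you don’t have much experience, you are very likely to be asked about one or more of these
projects. In some interviews I conducted, the candidate didn’t remember much about the project,
and we could not have a conversation about the choices they made and the reasons behind them. Be
sure to brush up on your earlier work, or else leave it off the resume. Likewise, make sure you
present your best work and that you have put enough time and effort into it. 2-3 high-quality
projects are better than 8-10 projects of medium (or lower) quality.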
