Uploaded by sathyaseelan160

Analytical Questions for Job seekers

These are mostly open-ended questions, to assess the technical horizontal knowledge of a senior candidate for a rather high-level position, e.g. director.

1. What is the biggest data set that you processed, how did you process it, and what were the results?
2. Tell me two success stories about your analytic or computer science projects. How was lift (or success) measured?
3. What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
4. What is: collaborative filtering, n-grams, map reduce, cosine distance?
5. How to optimize a web crawler to run much faster, extract better information, and better summarize data to produce cleaner databases?
6. How would you come up with a solution to identify plagiarism?
7. How to detect individual paid accounts shared by multiple users?
8. Should click data be handled in real time? Why? In which contexts?
9. What is better: good data or good models? And how do you define "good"? Is there a universal good model? Are there any models that are definitely not so good?
10. What is probabilistic merging (AKA fuzzy merging)? Is it easier to handle with SQL or other languages? Which languages would you choose for semi-structured text data reconciliation?
11. How do you handle missing data? What imputation techniques do you recommend?
12. What is your favorite programming language / vendor? Why?
13. Tell me 3 things positive and 3 things negative about your favorite statistical software.
14. Compare SAS, R, Python, Perl.
15. What is the curse of big data?
16. Have you been involved in database design and data modeling?
17. Have you been involved in dashboard creation and metric selection? What do you think about Birt?
18. What features of Teradata do you like?
19. You are about to send one million emails (marketing campaign). How do you optimize delivery? How do you optimize response? Can you optimize both separately? (answer: not really)

20. Toad or Brio or any other similar clients are quite inefficient to query Oracle databases. Why? What would you do to increase speed by a factor 10, and be able to handle far bigger outputs?
21. How would you turn unstructured data into structured data? Is it really necessary? Is it OK to store data as flat text files rather than in an SQL-powered RDBMS?
22. What are hash table collisions? How are they avoided? How frequently do they happen?
23. How to make sure a mapreduce application has good load balance? What is load balance?
24. Examples where mapreduce does not work? Examples where it works very well? What are the security issues involved with the cloud? What do you think of EMC's solution offering a hybrid approach - both internal and external cloud - to mitigate the risks and offer other advantages (which ones)?
25. Is it better to have 100 small hash tables or one big hash table, in memory, in terms of access speed (assuming both fit within RAM)? What do you think about in-database analytics?
26. Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
27. Have you been working with white lists? Positive rules? (In the context of fraud or spam detection)
28. What is star schema? Lookup tables?
29. Can you perform logistic regression with Excel? (yes) How? (use linest on log-transformed data) Would the result be good? (Excel has numerical issues, but it's very interactive)
30. Have you optimized code or algorithms for speed: in SQL, Perl, C++, Python etc.? How, and by how much?
31. Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy? Depends on the context?
32. Define: quality assurance, six sigma, design of experiments. Give examples of good and bad designs of experiments.
33. What are the drawbacks of general linear model? Are you familiar with alternatives (Lasso, ridge regression, boosted trees)?
34. Do you think 50 small decision trees are better than a large one? Why?
35. Is actuarial science not a branch of statistics (survival analysis)? If not, how so?
36. Give examples of data that does not have a Gaussian distribution, nor log-normal. Give examples of data that has a very chaotic distribution.
37. Why is mean square error a bad measure of model performance? What would you suggest instead?
38. How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything? Are you familiar with A/B testing?
39. What is sensitivity analysis? Is it better to have low sensitivity (that is, great robustness) and low predictive power, or the other way around? How to perform good cross-validation? What do you think about the idea of injecting noise in your data set to test the sensitivity of your models?
40. Compare logistic regression with decision trees, neural networks. How have these technologies been vastly improved over the last 15 years?
41. Do you know / have you used data reduction techniques other than PCA? What do you think of step-wise regression? What kind of step-wise techniques are you familiar with? When is full data better than reduced data or sample?
42. How would you build non-parametric confidence intervals, e.g. for scores? (see the AnalyticBridge theorem)
43. Are you familiar either with extreme value theory, Monte Carlo simulations or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
44. What is root cause analysis? How to identify a cause vs. a correlation? Give examples.
45. How would you define and measure the predictive power of a metric?
46. How to detect the best rule set for a fraud detection scoring technology? How do you deal with rule redundancy, rule discovery, and the combinatorial nature of the problem (for finding optimum rule set - the one with best predictive power)? Can an approximate solution to the rule set problem be OK? How would you find an OK approximate solution? How would you decide it is good enough and stop looking for a better one?
47. How to create a keyword taxonomy?
48. What is a Botnet? How can it be detected?
49. Any experience with using APIs? Programming APIs? Google or Amazon APIs? AaaS (Analytics as a service)?
50. When is it better to write your own code than using a data science software package?
51. Which tools do you use for visualization? What do you think of Tableau? R? SAS? (for graphs). How to efficiently represent 5 dimensions in a chart (or in a video)?
52. What is POC (proof of concept)?
53. What types of clients have you been working with: internal, external, sales / finance / marketing / IT people? Consulting experience? Dealing with vendors, including vendor selection and testing?
54. Are you familiar with software life cycle? With IT project life cycle - from gathering requests to maintenance?
55. What is a cron job?
56. Are you a lone coder? A production guy (developer)? Or a designer (architect)?
57. Is it better to have too many false positives, or too many false negatives?
58. Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples.
59. How does Zillow's algorithm work? (to estimate the value of any home in the US)
60. How to detect bogus reviews, or bogus Facebook accounts used for bad purposes?
61. How would you create a new anonymous digital currency?
62. Have you ever thought about creating a startup? Around which idea / concept?
63. Do you think that typed login / password will disappear? How could they be replaced?
64. Have you used time series models? Cross-correlations with time lags? Correlograms? Spectral analysis? Signal processing and filtering techniques? In which context?
65. Which data scientists do you admire most? Which startups?
66. How did you become interested in data science?
67. What is an efficiency curve? What are its drawbacks, and how can they be overcome?
68. What is a recommendation engine? How does it work?
69. What is an exact test? How and when can simulations help us when we do not use an exact test?
70. What do you think makes a good data scientist?
71. Do you think data science is an art or a science?
72. What is the computational complexity of a good, fast clustering algorithm? What is a good clustering algorithm? How do you determine the number of clusters? How would you perform clustering on one million unique keywords, assuming you have 10 million data points - each one consisting of two keywords, and a metric measuring how similar these two keywords are? How would you create this 10 million data points table in the first place?
73. Give a few examples of "best practices" in data science.
74. What could make a chart misleading, difficult to read or interpret? What features should a useful chart have?
75. Do you know a few "rules of thumb" used in statistical or computer science? Or in business analytics?
76. What are your top 5 predictions for the next 20 years?
77. How do you immediately know when statistics published in an article (e.g. newspaper) are either wrong or presented to support the author's point of view, rather than correct, comprehensive factual information on a specific subject? For instance, what do you think about the official monthly unemployment statistics regularly discussed in the press? What could make them more accurate?
78. Testing your analytic intuition: look at these three charts. Two of them exhibit patterns. Which ones? Do you know that these charts are called scatter-plots? Are there other ways to visually represent this type of data?
79. You design a robust non-parametric statistic (metric) to replace correlation or R square, that (1) is independent of sample size, (2) always between -1 and +1, and (3) based on rank statistics. How do you normalize for sample size? Write an algorithm that computes all permutations of n elements. How do you sample permutations (that is, generate tons of random permutations) when n is large, to estimate the asymptotic distribution for your newly created metric? You may use this asymptotic distribution for normalizing your metric. Do you think that an exact theoretical distribution might exist, and therefore, we should find it, and use it rather than wasting our time trying to estimate the asymptotic distribution using simulations?
80. More difficult, technical question related to previous one. There is an obvious one-to-one correspondence between permutations of n elements and integers between 1 and n! Design an algorithm that encodes an integer less than n! as a permutation of n elements. What would be the reverse algorithm, used to decode a permutation and transform it back into a number? Hint: An intermediate step is to use the factorial number system representation of an integer. Feel free to check this reference online to answer the question. Even better, feel free to browse the web to find the full answer to the question (this will test the candidate's ability to quickly search online and find a solution to a problem without spending hours reinventing the wheel).
81. How many "useful" votes will a Yelp review receive? My answer: Eliminate bogus accounts (read this article), or competitor reviews (how to detect them: use taxonomy to classify users, and location - two Italian restaurants in same Zip code could badmouth each other and write great comments for themselves). Detect fake likes: some companies (e.g. FanMeNow.com) will charge you to produce fake accounts and fake likes. Eliminate prolific users who like everything, those who hate everything. Have a blacklist of keywords to filter fake reviews. See if IP address or IP block of reviewer is in a blacklist such as "Stop Forum Spam". Create honeypot to catch fraudsters. Also watch out for disgruntled employees badmouthing their former employer. Watch out for 2 or 3 similar comments posted the same day by 3 users regarding a company that receives very few reviews. Is it a brand new company? Add more weight to trusted users (create a category of trusted users). Flag all reviews that are identical (or nearly identical) and come from same IP address or same user. Create a metric to measure distance between two pieces of text (reviews). Create a review or reviewer taxonomy. Use hidden decision trees to rate or score review and reviewers.
82. What did you do today? Or what did you do this week / last week?
83. What/when is the latest data mining book / article you read? What/when is the latest data mining conference / webinar / class / workshop / training you attended? What/when is the most recent programming skill that you acquired?
84. What are your favorite data science websites? Who do you admire most in the data science community, and why? Which company do you admire most?
85. What/when/where is the last data science blog post you wrote?
86. In your opinion, what is data science? Machine learning? Data mining?
87. Who are the best people you recruited and where are they today?
88. Can you estimate and forecast sales for any book, based on Amazon public data? Hint: read this article.
89. What's wrong with this picture?
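
For the question above on encoding an integer less than n! as a permutation of n elements, the hinted route through the factorial number system can be sketched as follows. This is a minimal illustration, not a reference answer: the function names are hypothetical, integers are taken as zero-based (0 to n! - 1 rather than 1 to n!), and the permuted elements are assumed to be 0, 1, ..., n - 1.

```python
import random
from math import factorial


def int_to_permutation(k, n):
    """Encode integer k (0 <= k < n!) as a permutation of 0..n-1."""
    if not 0 <= k < factorial(n):
        raise ValueError("k must satisfy 0 <= k < n!")
    # Factorial number system digits: the digit for radix r lies in 0..r-1.
    digits = []
    for radix in range(1, n + 1):
        k, d = divmod(k, radix)
        digits.append(d)
    digits.reverse()  # most significant digit first
    # Each digit picks one of the elements that are still unused.
    pool = list(range(n))
    return [pool.pop(d) for d in digits]


def permutation_to_int(perm):
    """Decode a permutation of 0..n-1 back into its integer."""
    n = len(perm)
    pool = list(range(n))
    k = 0
    for i, value in enumerate(perm):
        d = pool.index(value)  # rank of value among unused elements
        pool.pop(d)
        k += d * factorial(n - 1 - i)
    return k


def random_permutation(n, rng=random):
    """Sample a permutation by decoding a uniformly drawn integer below n!."""
    return int_to_permutation(rng.randrange(factorial(n)), n)
```

With this digit order the encoding follows lexicographic order, so integer 0 maps to the identity permutation and n! - 1 to the fully reversed one. For the sampling part of the previous question, drawing an integer and decoding it is one option; shuffling a list in place with `random.shuffle` is usually simpler when n is large.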
