
Ismail

(Data Scientist)

PROFESSIONAL SUMMARY

 Around 8 years of hands-on experience and comprehensive industry knowledge of Machine Learning,
Statistical Modeling, Deep Learning, Data Analytics, Data Modeling, Data Architecture, Data Analysis,
Data Mining, Text Mining & Natural Language Processing (NLP), Artificial Intelligence algorithms and
Business Intelligence.
 Good experience with analytics models such as Decision Trees and Linear & Logistic Regression, and
with tools including Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL and Erwin.
 Strong knowledge of all phases of the SDLC (Software Development Life Cycle): analysis, design,
development, testing, implementation and maintenance.
 Experienced in Data Modeling techniques employing Data Warehousing concepts like star/snowflake
schema and Extended Star.
 Expertise in applying Data Mining techniques and optimization techniques in B2B and B2C industries.
 Expertise in writing functional specifications, translating business requirements into technical
specifications, and creating/maintaining/modifying database design documents with detailed descriptions
of logical entities and physical tables.
 Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research.
Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and
exposure to the Big Data ecosystem.
 Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation,
Integration, Data Import and Data Export through the use of multiple ETL tools such as Informatica
PowerCenter.
 Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
 Good knowledge of and experience in deep learning algorithms such as Artificial Neural Networks (ANN),
Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and LSTMs, including RNN-based
speech recognition using TensorFlow.
 Excellent knowledge and experience in OLTP/OLAP System Study with focus on Oracle Hyperion Suite
of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables,
Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical
Data Modeling using Erwin tool.
 Used Cognitive Science in Machine Learning for Neurofeedback training which is essential for
intentional control of brain rhythms.
 Experienced in building data models using machine learning techniques for Classification, Regression,
Clustering and Associative mining.
 Good knowledge of Natural Language Processing (NLP) and Time Series Analysis and Forecasting
using the ARIMA model in Python and R (a brief sketch follows this summary).
 Enabled rapid insight generation from adverse-event data via cognitive technology to increase
translational research capabilities.
 Working experience in the Hadoop ecosystem and the Apache Spark framework, including HDFS,
MapReduce, HiveQL, Spark SQL and PySpark.
 Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes
services like EC2, S3, and EMR.
 Proficient in data visualization tools such as Tableau, Python Matplotlib, R Shiny to create visually
powerful and actionable interactive reports and dashboards.
 Expertise in building and publishing customized interactive reports and dashboards with customized
parameters and user filters using Tableau (9.x/10.x).
 Experienced in Agile methodology and SCRUM process.
 Strong business sense and abilities to communicate data insights to both technical and nontechnical clients.
 Proficient in Python, with experience building and productionizing end-to-end systems.
 Strong programming expertise, primarily in Python, and strong database/SQL skills.
 Solid coding and engineering skills applied to Machine Learning, with broad exposure to Python and its
package ecosystem.
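
A minimal sketch of the ARIMA-based time-series forecasting mentioned above, assuming Python with pandas and statsmodels; the file name "sales.csv" and its columns are hypothetical placeholders, not data from an actual engagement.

# Minimal ARIMA forecasting sketch (assumes pandas and statsmodels are installed).
# "sales.csv", "month" and "units_sold" are hypothetical placeholders.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Load a monthly series indexed by date.
series = (
    pd.read_csv("sales.csv", parse_dates=["month"], index_col="month")["units_sold"]
    .asfreq("MS")
)

# Fit a simple ARIMA(p, d, q) model; the (1, 1, 1) order is illustrative only and
# would normally be chosen via ACF/PACF plots or AIC comparison.
fitted = ARIMA(series, order=(1, 1, 1)).fit()

# Forecast the next 12 months with confidence intervals.
forecast = fitted.get_forecast(steps=12)
print(forecast.predicted_mean)
print(forecast.conf_int())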

EDUCATION

Bachelor of Computer Science

TECHNICAL SKILLS

Databases: MySQL, PostgreSQL, Oracle, HBase, Amazon Redshift, MS SQL Server 2016/2014/2012/2008 R2/2008, Teradata
Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation
Machine Learning: Regression Analysis, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, Sentiment Analysis, K-Means Clustering, KNN and Ensemble Methods
Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Sqoop, Flume
Reporting Tools: Tableau Suite of Tools 10.x, 9.x, 8.x (Desktop, Server and Online), SQL Server Reporting Services (SSRS)
Languages: Python (2.x/3.x), R, SAS, SQL, T-SQL
Operating Systems: PowerShell, UNIX/UNIX Shell Scripting, Linux, Windows
Data Analytics Tools: Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot)
Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2, Microsoft Office
R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elastic net and other machine learning packages

PROFESSIONAL EXPERIENCE:

Role: Sr. Data scientist/Machine learning Engineer


Client: Best Buy, Minneapolis, MN May 2018 – Till Date

Description: Best Buy Co., Inc. is an American multinational consumer electronics retailer headquartered in
Richfield, Minnesota. It was originally founded by Richard M. Schulze and James Wheeler in 1966 as an audio
specialty store called Sound of Music. In 1983, it was re-branded under its current name with an emphasis placed on
consumer electronics. Best Buy is the largest specialty retailer in the United States consumer electronics retail
industry.

Responsibilities:
 Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib and R to apply a broad variety of
machine learning methods including classification, regression and dimensionality reduction, and used the
resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
 Participated in feature engineering such as feature intersection generation, feature normalization and label
encoding with Scikit-learn preprocessing.
 Performed statistical analysis on textual data and built Machine Learning / Deep Learning models in the
domain of Natural Language Processing.
 Used Python 3.x (numpy, scipy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop a
variety of models and algorithms for analytic purposes.
 Applied various Machine Learning algorithms and statistical models such as decision trees, regression
models, neural networks, SVM and clustering to identify volume, using the scikit-learn package in Python
and MATLAB.
 Created and built Docker images for prototype deep learning models running on a local GPU.
 Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
Created various types of data visualizations using Python and Tableau.
 Developed and implemented predictive models using machine learning algorithms such as linear
regression, classification, multivariate regression, Naive Bayes, RandomForest, K-means clustering,
KNN, PCA and regularization for Data Analysis.
 Performed Data Collection, Data Cleaning, Data Visualization and developing Machine Learning
Algorithms by using several packages: Numpy, Pandas, Scikit-learn and Matplotlib.
 Implemented various data pre-processing techniques to handle unstructured, structured and imbalanced
data, including SMOTE for class imbalance.
 Clustered customers' actions data by using K-means clustering and Hierarchical clustering, then
segmented them into different groups for further analyses.
 Built Support Vector Machine algorithms for detecting the fraud and dishonest behaviours of customers
by using several packages: Scikit-learn, Numpy, Pandas in Python.
 Designed and developed NLP models for sentiment analysis.
 Led discussions with users to gather business processes requirements and data requirements to develop a
variety of Conceptual, Logical and Physical Data Models. Expert in Business Intelligence and Data
Visualization tools: Tableau, Microstrategy.
 Developed and evangelized best practices for statistical analysis of Big Data.
 Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
 Developed a deep learning algorithm that generated hedging strategies providing 15% ROI per month with
a standard deviation of 2.7% (results based on testing the strategies on real data for 3 months).
 Designed the Enterprise Conceptual, Logical and Physical Data Models for the 'Bulk Data Storage
System' using Embarcadero ER Studio; the data models were designed in 3NF.
 Worked on machine learning on large size data using Spark and MapReduce.
 Collaborated with data engineers and operation team to implement ETL process, wrote and optimized
SQL queries to perform data extraction to fit the analytical requirements.
 Performed data analysis by using Hive to retrieve the data from Hadoop cluster, SQL to retrieve data
from RedShift.
 Explored and analyzed the customer specific features by using SparkSQL.
 Performed data imputation using Scikit-learn package in Python.
 Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms and
utilized optimization techniques, linear regressions, K-means clustering, Naïve Bayes and other
approaches.
 Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning.
 Developed Spark/Scala, SAS and R programs for a regular expression (regex) project in the Hadoop/Hive
environment with Linux/Windows for big data resources.
 Conducted analysis assessing customer consumption behaviours and discovered the value of customers with
RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering
and Hierarchical Clustering.
 Implemented deep learning algorithms to identify fraudulent transactions.
 Built regression models including Lasso, Ridge, SVR and XGBoost to predict Customer Lifetime Value.
 Built classification models including Logistic Regression, SVM, Decision Tree and Random Forest to
predict Customer Churn Rate (see the sketch after this list).
 Used F-Score, AUC/ROC, Confusion Matrix, MAE and RMSE to evaluate model performance.
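
A brief sketch of the churn-classification and evaluation workflow referenced in the bullets above, using scikit-learn; the synthetic data stands in for the confidential customer feature matrix and churn labels, and Random Forest is shown only as one of the classifiers compared.

# Churn-classification sketch with scikit-learn; X and y are synthetic stand-ins
# for the customer feature matrix and churn labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Random Forest is illustrative; Logistic Regression, SVM and Decision Trees were
# evaluated the same way.
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
proba = clf.predict_proba(X_test)[:, 1]

# The metrics cited above: F-score, AUC/ROC and the confusion matrix.
print("F1:", f1_score(y_test, pred))
print("ROC AUC:", roc_auc_score(y_test, proba))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))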

Environment: AWS RedShift, EC2, EMR, Hadoop Framework, S3, HDFS, Spark (PySpark, MLlib, Spark SQL),
Python 3.x (Scikit-Learn/Scipy/Numpy/Pandas/Matplotlib/Seaborn), Tableau Desktop (9.x/10.x), Tableau Server
(9.x/10.x), Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, XGboost, LightGBM,
Collaborative Filtering, Ensemble), Deep Learning, Teradata, Git 2.x, Agile/SCRUM

Role: Data scientist/Machine learning Engineer


Client: Johnson and Johnson, Raritan, NJ Jan 2017 – Apr 2018

Description:

Johnson & Johnson is an investment holding company with interests in health care products. It engages in
research and development, manufacturing and sale of personal care and hygiene products, pharmaceuticals and
surgical equipment. The company operates through multiple business segments.

Responsibilities:

 Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE and cost-
sensitive algorithms with Python Scikit-learn (a brief sketch follows this list).
 Wrote complex Spark SQL queries for data analysis to meet business requirement.
 Developed MapReduce/Spark Python modules for predictive analytics & machine learning in Hadoop
on AWS.
 Built optimization models using Machine Learning and Deep Learning algorithms.
 Worked on data cleaning and ensured Data Quality, consistency, integrity using Pandas, Numpy.
 Participated in feature engineering such as feature intersection generation, feature normalization and label
encoding with Scikit-learn preprocessing.
 Improved fraud prediction performance by using random forest and gradient boosting for feature selection
with Python Scikit-learn.
 Performed feature engineering and NLP using techniques such as Bag of Words (BOW), TF-IDF,
Word2Vec and Doc2Vec.
 Applied Naïve Bayes, KNN, Logistic Regression, Random Forest, SVM and XGBoost to identify
whether a loan will default.
 Implemented an ensemble of Ridge, Lasso Regression and XGBoost to predict potential loan default
loss.
 Used various Metrics (RMSE, MAE, F-Score, ROC and AUC) to evaluate the performance of each model.
 Performed data cleaning and feature selection using MLlib package in PySpark and working with deep
learning frameworks.
 Actively involved in all phases of data science project life cycle including Data Extraction, Data
Cleaning, Data Visualization and building Models.
 Experience in working with languages Python and R.
 Developed text mining models using TensorFlow & NLP (NLTK, spaCy and CoreNLP) on call
transaction & social media interaction data for existing customer management.
 Experienced in Agile methodology and SCRUM process.
 Experience in the Extract, Transform and Load (ETL) process using tools like DataStage, Data Integrator
and SSIS for Data Migration and Data Warehousing projects.
 Experienced in Data Integration Validation and Data Quality controls for ETL process and Data
Warehousing using MS Visual Studio, SSAS, SSIS and SSRS.
 Used big data tools Spark (PySpark, Spark SQL and MLlib) to conduct real-time analysis of loan default
on AWS.
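
A minimal sketch of the SMOTE oversampling step from the first bullet in this list, assuming the imbalanced-learn (imblearn) package; the synthetic dataset is a placeholder for the actual fraud data.

# SMOTE oversampling sketch; requires scikit-learn and imbalanced-learn (imblearn).
# The synthetic dataset stands in for the confidential fraud data.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, n_features=15, weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
print("Before SMOTE:", Counter(y_train))

# Oversample only the training split so the test set keeps its natural imbalance.
X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X_train, y_train)
print("After SMOTE:", Counter(y_resampled))

# X_resampled / y_resampled would then feed a cost-sensitive or tree-based classifier.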

Environment: MS SQL Server 2014, Teradata, ETL, SSIS, Alteryx, Tableau (Desktop 9.x/Server 9.x),
Python 3.x (Scikit-Learn/Scipy/Numpy/Pandas), Machine Learning (Naïve Bayes, KNN, Regressions, Random
Forest, SVM, XGboost, Ensemble), AWS Redshift, Deep Learning, Spark (PySpark, MLlib, Spark SQL), Hadoop
2.x, MapReduce, HDFS, SharePoint.

Role: Data Scientist


Client: RetailMeNot INC, Austin, TX Nov 2015 – Dec 2016

Description: RetailMeNot, Inc. is a leading digital savings destination connecting consumers with retailers,
restaurants and brands, both online and in-store. The company enables consumers across the globe to find
hundreds of thousands of digital offers and discounted gift cards to save money while they shop or dine out.

Responsibilities:

 Gathered, analyzed, documented and translated application requirements into data models and supported
standardization of documentation and the adoption of standards and practices related to data and
applications.
 Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using
Sqoop, Pig, Flume, Hive, MapReduce and HDFS.
 Automated CSV to chatbot-friendly JSON transformation by writing NLP scripts, reducing development
time by 20%.
 Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data.
 Performed data cleaning, feature scaling and feature engineering using the pandas and numpy packages in
Python.
 Applied clustering algorithms such as Hierarchical and K-Means using Scikit-learn and SciPy (see the
sketch after this list).
 Created the logical data model from the conceptual model and converted it into the physical database
design using ERWIN.
 Mapped business needs/requirements to subject area model and to logical enterprise model.
 Worked with DBAs to create a best-fit physical data model from the logical data model.
 Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/
columns as part of data analysis responsibilities.
 Enforced referential integrity in the OLTP Data Model for consistent relationships between tables and
efficient database design.
 Developed the data warehouse model (star schema) for the proposed central model for the project.
 Created 3NF business area data modeling with de-normalized physical implementation data and
information requirements analysis using ERWIN tool.
 Worked on the Snow-flaking the Dimensions to remove redundancy.
 Worked with Teradata 14 tools like FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel
Transporter (TPT) and BTEQ.
 Helped in migration and conversion of data from the Sybase database into Oracle database, preparing
mapping documents and developing partial SQL scripts as required.
 Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data
from legacy Oracle and SQL Server database systems.
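
A short sketch of the K-Means and hierarchical clustering step flagged earlier in this list, using scikit-learn and SciPy; the random matrix below is a placeholder for scaled customer features, and k = 4 is purely illustrative.

# Clustering sketch with scikit-learn (K-Means) and SciPy (hierarchical linkage).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 6))   # placeholder customer features
scaled = StandardScaler().fit_transform(features)

# K-Means with an illustrative k; in practice k is chosen via the elbow method
# or silhouette scores.
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)

# Hierarchical clustering via Ward linkage, cut into the same number of clusters.
tree = linkage(scaled, method="ward")
hier_labels = fcluster(tree, t=4, criterion="maxclust")

print(np.bincount(kmeans_labels), np.bincount(hier_labels))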

Environment: Machine Learning (KNN, Clustering, Regressions, Random Forest, SVM, Ensemble), Linux,
Python 2.x (Scikit-Learn/Scipy/Numpy/Pandas), R, Tableau (Desktop 8.x/Server 8.x), Hadoop, MapReduce,
HDFS, Hive, Pig, HBase, Sqoop, Flume, Oracle 11g, SQL Server 2012.

Role: BI Developer/Data Analyst


Client: Deutsche Bank, New York City, NY May 2014 – Oct 2015

Description: Deutsche Bank is a leading global investment bank with a strong and profitable private client’s
franchise. The Project was to implement machine learning techniques and develop statistical models to identify
loan default pattern and predict potential default loss for the company.

Responsibilities:

 Used SSIS to create ETL packages to Validate, Extract, Transform and Load data into Data Warehouse
and Data Mart.
 Maintained and developed complex SQL queries, stored procedures, views, functions and reports that meet
customer requirements using Microsoft SQL Server 2008 R2.
 Created Views and Table-valued Functions, Common Table Expression (CTE), joins, complex
subqueries to provide the reporting solutions.
 Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant
data, normalizing tables, establishing joins and creating indexes.
 Created SSIS packages using Pivot Transformation, Fuzzy Lookup, Derived Columns, Conditional Split,
Aggregate, Execute SQL Task, Data Flow Task and Execute Package Task.
 Migrated data from the SAS environment to SQL Server 2008 via SQL Server Integration Services (SSIS).
 Developed and implemented several types of Financial Reports (Income Statement, Profit & Loss
Statement, EBIT, ROIC Reports) by using SSRS.
 Collaborated with database engineers to implement ETL process, wrote and optimized SQL queries to
perform data extraction and merging from SQL server database.
 Created Complex ETL Packages using SSIS to extract data from staging tables to partitioned tables with
incremental load.
 Gathered, analyzed and translated business requirements, and communicated with other departments to
collect client business requirements and access available data.
 Migrated data from a legacy system to SQL Server using SQL Server Integration Services 2012.
 Used C# scripts to map records.
 Involved in writing complex SQL Queries, Stored Procedures, Triggers, Views, Cursors, Joins,
Constraints, DDL, DML and User Defined Functions to implement the business logic and created
clustered and non-clustered indexes.
 Created and modified Stored Procedures, Functions, and Indexes.
 Developed SQL Scripts to Insert/Update and Delete data in MS SQL database tables.
 Created various ad-hoc SQL queries for customer reports, executive management reports and report
types such as tables, matrices and sub-reports.
 Designed and developed new reports and maintained existing reports using Microsoft SQL Reporting
Services (SSRS) and Microsoft Excel to support the firm's strategy and management.
 Created sub-reports, drill-down reports, summary reports, parameterized reports and ad-hoc reports using
SSRS.
 Used SAS/SQL to pull data out from databases and aggregate to provide detailed reporting based on the
user requirements.
 Used SAS for pre-processing data, SQL queries, Data Analysis, generating reports, Graphics, and
Statistical analyses.
 Provided statistical research analyses and Data Modeling support for mortgage product.
 Performed analyses such as regression analysis, logistic regression, discriminant analysis and cluster
analysis using SAS programming.

Environment: SQL Server 2008 R2, DB2, Oracle, SQL Server Management Studio, SAS/BASE, SAS/SQL,
SAS/Enterprise Guide, MS BI Suite (SSIS/SSRS), T-SQL, SharePoint 2010, Visual Studio 2010, Agile/SCRUM

Role: Data Analyst


Client: Exceloid Soft Systems, India Jan 2013 – Apr 2014

Description: Exceloid Soft Systems delivers made-for-future technological strategies and is regarded as one of
the best Openbravo specialists in India, supporting retail clients from the early days of their retail journeys.

Responsibilities:

 Wrote SQL queries for data validation on the backend systems and used various tools like TOAD and
DbVisualizer for DBMS (Oracle).
 Performed data analysis, backend database testing and data modeling, and developed SQL queries to
solve problems and meet users' needs for database management in the Data Warehouse.
 Utilized object-oriented languages and concepts, database design, star schemas and databases.
 Created algorithms as needed to manage and implement proposed solutions.
 Participated in test planning and test execution for functional, system, integration, regression, UAT (User
Acceptance Testing), load and performance testing.
 Worked with test automation tools for recording/coding in the database, and executed them in regression
testing cycles.
 Transferred data from various OLTP data sources, such as Oracle, MS Access, MS Excel, Flat files, CSV
files into SQL Server. 
 Worked with DB2, Oracle DM and SQL Server databases for database testing and maintenance.
 Involved in writing and executing User Acceptance Testing (UAT) with end users.
 Involved in post-implementation validations after changes had been made to the Data Marts.
 Charted graphs and reports in QC to show the percentage of test cases passed, and thereby the percentage
of quality achieved, and uploaded the status daily to ART Reports, an in-house tool.
 Performed extensive Data Validation, Data Verification against Data Warehouse. 
 Used UNIX to check the Data marts, Tables and Updates made to the tables.
 Wrote advanced SQL queries against the Data Marts and Landing areas to verify that the changes had
been made.
 Involved in client requirement gathering, participated in discussion & brainstorming sessions and
documented requirements.
 Validated and profiled flat-file data loaded into Teradata tables using UNIX shell scripts.
 Actively participated Functional, System and User Acceptance testing on all builds and supervised releases
to ensure system / functionality integrity.
 Closely interacted with designers and software developers to understand application functionality and
navigational flow and keep them updated about Business user sentiments.
 Interacted with developers to resolve different Quality Related Issues.
 Wrote and executed manual test cases for functional, GUI and regression testing of the application to
make sure that new enhancements did not break working features.
 Wrote and executed manual test cases in HP Quality Center.
 Wrote test plans for positive and negative scenarios for GUI and functional testing.
 Involved in writing SQL queries and stored procedures using Query Analyzer and matched the results
retrieved from the batch log files.
 Created Project Charter documents & Detailed Requirement document and reviewed with Development &
other stake holders.

Environment: Subversion, TortoiseSVN, Jira, Agile-Scrum, Web Services, Mainframe, Oracle, Perl, UNIX,
LINUX, Shell Scripts, UML, Quality Center, RequisitePro, SQL, MS Visio, MS Project, Excel, Power Point,
Word, SharePoint, Win XP/7 Enterprise.

Role: Data Analyst/Data Modeler


Client: ZEN3 Info Solutions, India May 2011 – Dec 2012

Description: Zen3 is a leading software solutions group developing innovative solutions for media, travel and
technology industries.

Responsibilities:

 Data analysis and reporting using MySQL, MS PowerPoint, MS Access and SQL Assistant.
 Involved in MySQL, MS PowerPoint and MS Access database design, and designed a new, optimized
database on Netezza.
 Used DB2 Adapters to integrate between Oracle database and Microsoft SQL database in order to transfer
data.
 Designed the data marts using Ralph Kimball's Dimensional Data Mart modeling methodology in
ER Studio.
 Involved in writing T-SQL, working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing and Data
Migration.
 Used Normalization methods up to 3NF and De-normalization techniques for effective performance in
OLTP systems.
 Initiated and conducted JAD sessions inviting various teams to finalize the required data fields and their
formats.
 Involved in designing and implementing the Data Extraction (XML DATA stream) procedures.
 Created base tables, views and indexes. Built a complex Oracle procedure in PL/SQL for extracting,
loading and transforming internal data into the warehouse via DBMS Scheduler.
 Involved in writing scripts for loading data to target data Warehouse using BTEQ, Fast Load, MultiLoad.
 Created ETL scripts using Regular Expressions and custom tools (Informatica, Pentaho and SyncSort) to
move data.
 Developed SQL Service Broker to flow and sync data from MS-I to Microsoft's master data
management (MDM).
 Extensively involved in the recovery process for capturing incremental changes in the source systems and
updating the staging area and data warehouse respectively.
 Strong knowledge of Entity-Relationship concept, Facts and dimensions tables, slowly changing
dimensions and Dimensional Modeling (Star Schema and Snow Flake Schema).
 Involved in loading data between Netezza tables using NZSQL utility.
 Worked on Data modeling using Dimensional Data Modeling, Star Schema/Snow Flake schema, and Fact
& Dimensional, Physical & Logical data modeling.
 Generated Statspack/AWR reports from the Oracle database and analyzed the reports for Oracle 8.x wait
events, time-consuming SQL queries, tablespace growth and database growth.

Environment: ER Studio, MySQL, MS PowerPoint, MS Access, Netezza, DB2, T-SQL, DTS, Informatica MDM,
SSIS, SSRS, SSAS, ETL, MDM, 3NF and De-normalization, Teradata, Oracle 8.x, Star Schema and Snowflake
Schema.
