"WILLIAM A THORNDIKE\r\nBig Data Engineer\r\n\r\nEmail:

williamthorndikeIII@gmail.com \r\nPhone: (651) 371-1285\r\n\r\n5 Years� Experience


I.T. and Hadoop/Big Data

Seasoned Hadoop/Big Data Engineer skilled in the use of Spark, Spark Streaming, and Spark DataFrames. Experienced with Hadoop components, Kafka, Kibana, and PySpark. Designs and implements on-prem and cloud Big Data ecosystems and pipelines using Hadoop and Spark.

PROFESSIONAL PROFILE
• Proficient in extracting and generating analysis with the Business Intelligence tool Tableau for better analysis of data.
• Effective in HDFS, YARN, Pig, Hive, Impala, Sqoop, HBase, and Cloudera.
• Experienced with MLlib and Spark GraphX.
• Experience processing data using the Spark Streaming API with Scala.
• Solid grasp of Spark architecture, including Spark Core, Spark SQL, and Spark Streaming.
• ETL (data extraction, transformation, and load) using Hive, Pig, and HBase.
• Very good knowledge of and hands-on experience with Cassandra, Flume, and YARN.
• Experience implementing user-defined functions (UDFs) for Pig and Hive.
• Extensive knowledge of the development, analysis, and design of ETL methodologies across all phases of the data warehousing life cycle.
• Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
• Expertise in Python and Scala, including user-defined functions (UDFs) for Hive and Pig written in Python.
• Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala (see the sketch after this list).
• Expertise in preparing test cases, documenting, and performing unit and integration testing.
• Hands-on experience with major components of the Hadoop ecosystem, including Spark, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Flume, and Kafka.
• Work experience with cloud infrastructure such as Amazon Web Services.
• Experience importing and exporting data with Sqoop from Oracle and mainframe DB2 into HDFS and a Data Lake.
• Experience developing shell scripts, Oozie scripts, and Python scripts.
• Expert in writing complex SQL queries against databases such as DB2, MySQL, and SQL Server.
• Experience importing and exporting data between Hadoop and RDBMS using Sqoop and SFTP.
• Extensive experience with databases such as MySQL and Oracle 11g.
• Experience using Kafka as a messaging system to implement real-time streaming solutions with Spark Streaming.
• Expertise with tools in the Hadoop ecosystem, including HDFS, Pig, Hive, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and Zookeeper.
• Knowledge of implementing advanced procedures such as text analytics and processing using Apache Spark with Python.
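To make the DataFrame work described above concrete, here is a minimal Scala sketch of the kind of Spark SQL aggregation that would otherwise be written as a Hive query; the database, table, and column names (sales_db.orders, order_status, region, order_total, customer_id) are hypothetical placeholders rather than details taken from the roles below.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrderRevenueByRegion {
  def main(args: Array[String]): Unit = {
    // Hive support lets the same query run either through Spark SQL or
    // directly in Hive, which is how the two engines can be compared.
    val spark = SparkSession.builder()
      .appName("OrderRevenueByRegion")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table; replace with a real database.table name.
    val orders = spark.table("sales_db.orders")

    // Equivalent of a Hive GROUP BY, expressed through the DataFrame API.
    val revenueByRegion = orders
      .filter(col("order_status") === "COMPLETED")
      .groupBy("region")
      .agg(sum("order_total").as("total_revenue"),
           countDistinct("customer_id").as("unique_customers"))
      .orderBy(desc("total_revenue"))

    revenueByRegion.show(20, truncate = false)
    spark.stop()
  }
}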
TECHNICAL SKILLS PROFILE

APACHE: Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Pig, Apache Spark, Apache Tez, Apache Zookeeper, Cloudera Impala, HDFS

HADOOP: Hive, Pig, Zookeeper, Sqoop, Oozie, YARN, Maven, Ant, Flume, HDFS, Apache Airflow

VERSIONING: Git, GitHub

DATA VISUALIZATION TOOLS: Pentaho, QlikView, Tableau

PROGRAMMING: Spark, Scala, PySpark, PyTorch, Java

FRAMEWORKS: Spark, Kafka

FILE FORMATS: Parquet, Avro, JSON, ORC

HADOOP ADMINISTRATION: Ambari, YARN, Workflows, Zookeeper, Oozie, Cluster Management, Cluster Security

SCRIPTING: Pig Latin, HiveQL, MapReduce, Shell scripting, SQL, Spark SQL

SOFTWARE DEVELOPMENT: Test-Driven Development, Continuous Integration, Unit Testing, Functional Testing, Scenario Testing, Regression Testing, Object-Oriented Programming, Functional Programming

IDE: Jupyter Notebooks, PyCharm

CONTINUOUS INTEGRATION (CI/CD): Jenkins

DATA MANAGEMENT: HDFS, Data Lake, Data Warehouse, Database

PROJECT METHODOLOGY: Agile Scrum, Sprint Planning, Sprint Retrospective, Sprint Grooming, Backlog, Daily Scrums

BIG DATA DISTRIBUTIONS AND PLATFORMS: AWS Cloud, Hadoop On-Prem, Cloudera (CDH), Hortonworks Data Platform (HDP)

AMAZON AWS CLOUD: AWS Lambda, AWS S3, AWS RDS, AWS EMR, AWS Redshift, AWS Kinesis, AWS ELK, AWS CloudFormation, AWS IAM

DATABASE: Apache Cassandra, AWS Redshift, Amazon RDS, Apache HBase, SQL, NoSQL, Elasticsearch

PROFESSIONAL EXPERIENCE PROFILE
BIG DATA ENGINEER
3M
Maplewood, MN
July 2017 - Present

• Worked with Spark to create structured data from the pool of unstructured data received.
• Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
• Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
• Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, and Elasticsearch.
• Maintained ELK (Elasticsearch, Kibana) and wrote Spark scripts using the Scala shell.
• Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
• Fine-tuned resources for long-running Spark applications to achieve better parallelism and more executor memory for caching.
• Used Apache Spark and Scala on large datasets to process real-time data.
• Transferred streaming data from different data sources into HDFS and HBase using Apache Flume.
• Fetched live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
• Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing it with Spark Streaming.
• Developed ETL pipelines using Spark and Hive to perform various business-specific transformations.
• Automated the Spark pipelines for bulk loads as well as incremental loads of various datasets.
• Built input adapters for data dumps from FTP servers using Apache Spark.
• Integrated Kafka with Spark for real-time data processing.
• Performed streaming data ingestion into the Spark environment using Kafka.
• Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded it into Cassandra (see the sketch after this list).
• Tuned Elasticsearch and Logstash performance and configuration.
• Responsible for designing and deploying new ELK clusters (Elasticsearch, Logstash, Kibana, Beats, Kafka, Zookeeper, etc.).
• Used Bash to build ETL pipelines from source databases into Kibana and Elasticsearch.
• Involved in data acquisition, data pre-processing, and data exploration for the project in Scala.
• Developed Spark applications for the entire batch processing workload using Scala.
• Developed Spark scripts with Scala shell commands as per requirements and used PySpark for proof-of-concept work.
• Implemented Hadoop using the Hortonworks Data Platform (HDP).
• Worked on continuous integration with Jenkins and automated JAR builds at the end of each day.
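A rough illustration of the Kafka-to-Cassandra feed mentioned above, written as a Scala Structured Streaming job rather than the older DStream API; the broker address, topic, JSON schema, keyspace, and table are hypothetical, and writing to Cassandra assumes the DataStax spark-cassandra-connector is available on the cluster.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object SensorFeedToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SensorFeedToCassandra")
      .getOrCreate()

    // Hypothetical JSON payload schema for the incoming Kafka records.
    val schema = StructType(Seq(
      StructField("sensor_id", StringType),
      StructField("reading", DoubleType),
      StructField("event_time", TimestampType)
    ))

    // Read the live feed from Kafka (broker and topic are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "sensor-readings")
      .load()

    // Kafka delivers bytes; parse the JSON value into typed columns.
    val readings = raw
      .select(from_json(col("value").cast("string"), schema).as("data"))
      .select("data.*")

    // Write each micro-batch to Cassandra via the DataStax connector.
    val query = readings.writeStream
      .foreachBatch { (batch: org.apache.spark.sql.DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "telemetry")    // hypothetical keyspace
          .option("table", "sensor_readings") // hypothetical table
          .mode("append")
          .save()
      }
      .option("checkpointLocation", "/tmp/checkpoints/sensor-feed")
      .start()

    query.awaitTermination()
  }
}

Using foreachBatch keeps the Cassandra write on the batch DataFrame writer, so the same sink code can also be reused for bulk and incremental loads.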
BIG DATA ENGINEER - AMAZON AWS CLOUD
Clif Bar
Emeryville, CA
April 2016 - July 2017

• Implemented Spark RDD transformations and actions to carry out business analysis (see the sketch after this list).
• Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
• Used AWS Redshift clusters to sync data from Hoot and used AWS RDS to store the data for retrieval by the dashboard.
• Performed AWS data migrations between different database platforms, such as SQL Server to Amazon Aurora, using RDS tooling.
• Responsible for continuous monitoring and management of the Elastic MapReduce (EMR) cluster through the AWS console.
• Implemented AWS Lambda functions to run scripts in response to events in an Amazon DynamoDB table or S3 bucket, or to HTTP requests through Amazon API Gateway.
• Worked with AWS Kinesis to process large volumes of real-time data.
• Automated the installation of the ELK agent (Filebeat) with an Ansible playbook.
• Developed a Kafka queue system to collect log data without data loss and publish it to various sources.
• Used AWS CloudFormation to ensure successful deployment of database templates.
• Automated cloud deployments using Chef, Python (Boto and Fabric), Ruby scripting, and AWS CloudFormation templates.
• Configured AWS IAM and security groups as required and distributed them into the various availability zones of the VPC.
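As a minimal sketch of the RDD transformation-and-action style of analysis referenced in the first bullet above, the following Scala job totals sales per store from a CSV dump; the S3 path and record layout are invented for illustration.

import org.apache.spark.sql.SparkSession

object DailySalesSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailySalesSummary")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical CSV dump of orders: order_id,store_id,amount
    val lines = sc.textFile("s3://example-bucket/orders/2017-01-01.csv")

    // Transformations are lazy: nothing runs until an action is called.
    val salesByStore = lines
      .map(_.split(","))
      .filter(_.length == 3)                          // drop malformed rows
      .map(fields => (fields(1), fields(2).toDouble)) // (store_id, amount)
      .reduceByKey(_ + _)                             // total sales per store

    // Actions trigger the computation and bring results to the driver.
    val topStores = salesByStore.sortBy(-_._2).take(10)
    topStores.foreach { case (store, total) =>
      println(f"$store%-10s $$${total}%.2f")
    }

    spark.stop()
  }
}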
BIG DATA DEVELOPER
Intuitive Research & Technology
Huntsville, AL
December 2014 - April 2016

• Migrated streaming and static data from RDBMS into the Hadoop cluster using Hive, Pig, Flume, and Sqoop.
• Implemented HDFS access controls, directory and file permissions, and user authorization to provide stable, secure access for multiple users in a large multi-tenant cluster.
• Developed applications using Hadoop ecosystem components such as Spark, Kafka, HDFS, Hive, Oozie, and Sqoop.
• Worked with Big Data Hadoop ecosystem technologies including HDFS, MapReduce, YARN, Apache Hive, Apache Spark, HBase, Scala, and Python for distributed data processing.
• Automated the jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
• Involved in scheduling the Oozie workflow engine to run multiple HiveQL, Sqoop, and Pig jobs.
• Designed HBase row keys and data models for inserting into HBase tables, using lookup-table and staging-table concepts.
• Involved in creating frameworks that ran a large number of Spark and Hadoop applications in series to form one cohesive end-to-end Big Data pipeline.
• Used Spark SQL to load Parquet data, created Datasets defined by case classes, and handled the structured data with Spark SQL, finally storing it in Hive tables for downstream consumption (see the sketch after this list).
• Implemented several highly distributed, scalable, large-scale applications on Cloudera Hadoop.
• Used Cloudera Manager to collect metrics.
• Developed shell scripts, Oozie scripts, and Python scripts.
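The Parquet-to-Hive flow described above might look roughly like this Scala sketch; the Order case class, input path, and Hive table names are hypothetical placeholders.

import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical record layout for the incoming Parquet files.
case class Order(orderId: Long, customerId: String, amount: Double, orderDate: String)

object ParquetToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetToHive")
      .enableHiveSupport() // needed so saveAsTable writes to the Hive metastore
      .getOrCreate()
    import spark.implicits._

    // Load Parquet data and map it onto a strongly typed Dataset[Order].
    val orders = spark.read.parquet("/data/landing/orders/").as[Order]

    // Structured handling through Spark SQL on the typed Dataset.
    orders.createOrReplaceTempView("orders_staging")
    val dailyTotals = spark.sql(
      """SELECT orderDate, SUM(amount) AS daily_total, COUNT(*) AS order_count
        |FROM orders_staging
        |GROUP BY orderDate""".stripMargin)

    // Persist to a Hive table for downstream consumers.
    dailyTotals.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("analytics.daily_order_totals")

    spark.stop()
  }
}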
HADOOP DEVELOPER
Sage Rutty
Rochester, NY
August 2013 - December 2014

• Monitored the Hadoop cluster using tools such as Nagios, Ganglia, and Ambari.
• Managed Hadoop clusters via Cloudera Manager, the command line, and the Hortonworks Ambari agent.
• Installed and configured Tableau Desktop to connect to the Hortonworks Hive framework (database), which holds the bandwidth data from the locomotive, through the Hortonworks ODBC connector for further analysis of the data.
• Developed Oozie workflows for scheduling and orchestrating the ETL process within the Cloudera Hadoop system.
• Automated workflows with shell scripts that pull data from various databases into Hadoop.
• Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos/Ranger), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
• Balanced the Hadoop cluster using balancer utilities to spread data evenly across the cluster.
• Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (see the sketch after this list).
• Configured the YARN capacity scheduler to support various business SLAs.
• Implemented capacity schedulers on the YARN ResourceManager to share cluster resources among users' MapReduce jobs.
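A small Scala sketch of running a Hive query through Spark on YARN with explicitly tuned executor resources and a named capacity-scheduler queue; the queue name, resource sizes, and the telemetry.bandwidth_usage table are assumptions for illustration, and in practice the master and queue would usually be supplied through spark-submit.

import org.apache.spark.sql.SparkSession

object YarnHiveAnalytics {
  def main(args: Array[String]): Unit = {
    // Resource settings and the YARN queue name are illustrative values;
    // in practice they would match the cluster's capacity-scheduler setup.
    val spark = SparkSession.builder()
      .appName("YarnHiveAnalytics")
      .master("yarn")
      .config("spark.yarn.queue", "analytics") // hypothetical queue
      .config("spark.executor.instances", "8")
      .config("spark.executor.memory", "4g")
      .config("spark.executor.cores", "2")
      .enableHiveSupport()
      .getOrCreate()

    // Run the analysis directly against a Hive table (name is a placeholder).
    val monthlyUsage = spark.sql(
      """SELECT account_id, MONTH(event_date) AS month, SUM(bandwidth_mb) AS total_mb
        |FROM telemetry.bandwidth_usage
        |GROUP BY account_id, MONTH(event_date)""".stripMargin)

    monthlyUsage.show(20, truncate = false)
    spark.stop()
  }
}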
EDUCATION
Master of Computer Science
Western Illinois University
Macomb, Illinois
