Google Certified Professional - Data Engineer

Job Role Description
A Google Certified Professional - Data Engineer enables data-driven decision making by collecting,
transforming, and visualizing data. The data engineer should be able to design, build, maintain, and
troubleshoot data processing systems with a particular emphasis on the security, reliability,
fault-tolerance, scalability, fidelity, and efficiency of such systems. The data engineer should also be able
to analyze data to gain insight into business outcomes, build statistical models to support
decision-making, and create machine learning models to automate and simplify key business processes.

Certification Exam Guide
Section 1: Designing data processing systems
1.1
Designing flexible data representations. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● distributed systems
● schema design
1.2
Designing data pipelines. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● system availability
● distributed systems
● schema design
● common sources of error (e.g., removing selection bias)
1.3
Designing data processing infrastructure. Considerations include:
● future advances in data technology
● changes to business requirements
● awareness of current state and how to migrate the design to a future state
● data modeling
● tradeoffs
● system availability
● distributed systems
● schema design
● capacity planning
● different types of architectures: message brokers, message queues, middleware, service-oriented

Section 2: Building and maintaining data structures and databases
2.1
Building and maintaining flexible data representations
2.2
Building and maintaining pipelines. Considerations include:
● data cleansing
● batch and streaming
● transformation
● acquire and import data
● testing and quality control
● connecting to new data sources
2.3
Building and maintaining processing infrastructure. Considerations include:
● provisioning resources
● monitoring pipelines
● adjusting pipelines
● testing and quality control

Section 3: Analyzing data and enabling machine learning
3.1
Analyzing data. Considerations include:
● repeatability
● generalization
● distributed computing
● improved model accuracy
3.2
Transforming data to enable machine learning and pattern discovery. Considerations include:
● data profiling
● data correlation
● patterns and insights
● anomaly detection
● statistical models
● machine learning
● assessing the statistical relevance of conclusions
3.3
Identifying or building data visualization and reporting tools. Considerations include:
● automation
● decision support
● data summarization
● enabling patterns and insights

Section 4: Modeling business processes for analysis and optimization
4.1
Mapping business requirements to data representations. Considerations include:
● working with business users
● gathering business requirements
4.2
Optimizing data representations, data infrastructure performance, and cost. Considerations include:
● resizing and scaling resources
● data cleansing, distributed systems
● high performance algorithms
● common sources of error (e.g., removing selection bias)

Section 5: Ensuring reliability
5.1
Performing quality control. Considerations include:
● verification
● building and running test suites
● pipeline monitoring
5.2
Assessing, troubleshooting, and improving data representations and data processing infrastructure.
5.3
Recovering data. Considerations include:
● planning (e.g., fault-tolerance)
● executing (e.g., rerunning failed jobs, performing retrospective re-analysis)
● stress testing data recovery plans and processes

Section 6: Visualizing data and advocating policy
6.1
Building (or selecting) data visualization and reporting tools. Considerations include:
● automation
● decision support
● data summarization (e.g., translation up the chain, fidelity, trackability, integrity)
6.2
Advocating policies and publishing data and reports.

Section 7: Designing for security and compliance
7.1
Designing secure data infrastructure and processes. Considerations include:
● Identity and Access Management (IAM)
● data security
● penetration testing
● Separation of Duties (SoD)
● security controls
7.2
Designing for legal compliance. Considerations include:
● legislation (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children's Online Privacy Protection Act (COPPA), etc.)
● audits