Module 1: Introduction
Topics:
- Course Introduction
- Beam and Dataflow Refresher
Objectives:
- Introduce the course objectives.
- Demonstrate how Apache Beam and Dataflow work together to fulfill your organization’s data processing needs.
Module 2: Beam Portability
Topics:
- Beam Portability
- Runner v2
- Container Environments
- Cross-Language Transforms
Objectives:
- Summarize the benefits of the Beam Portability Framework.
- Customize the data processing environment of your pipeline using custom containers.
- Review use cases for cross-language transformations.
- Enable the Portability Framework for your Dataflow pipelines, as sketched after this list.
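A minimal sketch, not from the course materials, of how a Python pipeline might opt in to Runner v2 and a custom container; the project, bucket, and image names are placeholders:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                # placeholder
        region="us-central1",                # placeholder
        temp_location="gs://my-bucket/tmp",  # placeholder
        experiments=["use_runner_v2"],       # opt in to the portable Runner v2
        # Custom container image for the SDK worker environment:
        sdk_container_image="us-central1-docker.pkg.dev/my-project/repo/beam-sdk:latest",
    )

    with beam.Pipeline(options=options) as p:
        p | beam.Create(["hello", "portable", "beam"]) | beam.Map(print)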
Module 3: Separating Compute and Storage with Dataflow
Topics:
- Dataflow
- Dataflow Shuffle Service
- Dataflow Streaming Engine
- Flexible Resource Scheduling
Objectives:
- Enable the Shuffle Service and Streaming Engine, for batch and streaming pipelines respectively, for maximum performance.
- Enable Flexible Resource Scheduling for more cost-efficient performance (see the options sketch after this list).
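A minimal sketch, not from the course materials, of the options involved; resource names are placeholders, and the Shuffle Service is already the default for batch jobs in many regions:

    from apache_beam.options.pipeline_options import PipelineOptions

    # Batch: Shuffle Service plus Flexible Resource Scheduling (FlexRS).
    batch_options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        experiments=["shuffle_mode=service"],  # Dataflow Shuffle Service
        flexrs_goal="COST_OPTIMIZED",          # delayed, cheaper scheduling
    )

    # Streaming: the Streaming Engine moves shuffle and state off the workers.
    streaming_options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        streaming=True,
        enable_streaming_engine=True,
    )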
Module 4: IAM, Quotas, and Permissions
Topics:
- IAM
- Quotas
- Permissions
Objectives:
- Select the right combination of IAM permissions for your Dataflow job; a service account sketch follows this list.
- Determine your capacity needs by inspecting the relevant quotas for your Dataflow jobs.
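A minimal sketch, with placeholder names, of attaching a dedicated controller service account so the job's IAM grants can be scoped to exactly what the pipeline touches:

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        # Workers run as this account; grant it roles/dataflow.worker plus
        # only the permissions the job's sources and sinks require.
        service_account_email="df-worker@my-project.iam.gserviceaccount.com",
    )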
Module 5: Security
Topics:
- Data Locality
- Shared VPC
- Private IPs
- CMEK
Objectives:
- Select your zonal data processing strategy using Dataflow, depending on your data locality needs.
- Implement best practices for a secure data processing environment, as combined in the sketch after this list.
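A minimal sketch, with placeholder resource names, combining the controls this module covers: a Shared VPC subnetwork, private worker IPs, and a customer-managed encryption key (CMEK):

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-service-project",
        region="us-central1",  # processing stays in the region you select
        temp_location="gs://my-bucket/tmp",
        # Shared VPC subnetwork, owned by the host project (full URL form):
        subnetwork=("https://www.googleapis.com/compute/v1/projects/"
                    "my-host-project/regions/us-central1/subnetworks/my-subnet"),
        use_public_ips=False,  # workers receive private IPs only
        # CMEK protecting the pipeline's state and temporary data:
        dataflow_kms_key=("projects/my-project/locations/us-central1/"
                          "keyRings/my-ring/cryptoKeys/my-key"),
    )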
Module 6: Beam Concepts Review
Topics:
- Beam Basics
- Utility Transforms
- DoFn Lifecycle
Objectives:
- Review the main Apache Beam concepts (Pipeline, PCollection, PTransform, Runner, reading/writing, utility PTransforms, side inputs), bundles, and the DoFn lifecycle; a minimal pipeline sketch follows.
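A minimal sketch of those concepts in one place: a Pipeline, PCollections produced by PTransforms, and a DoFn with its lifecycle methods, runnable locally on the DirectRunner:

    import apache_beam as beam

    class TagWords(beam.DoFn):
        def setup(self):
            # Called once per DoFn instance: open clients or connections here.
            self.prefix = "word:"

        def start_bundle(self):
            # Called once per bundle of elements.
            self.count_in_bundle = 0

        def process(self, element):
            # Called per element; may emit zero or more outputs.
            self.count_in_bundle += 1
            yield f"{self.prefix}{element}"

        def finish_bundle(self):
            # Called after each bundle: flush per-bundle buffers here.
            pass

        def teardown(self):
            # Called before the instance is discarded: release resources.
            pass

    with beam.Pipeline() as p:  # DirectRunner by default
        (p
         | "Create" >> beam.Create(["a", "b", "c"])  # root PTransform -> PCollection
         | "Tag" >> beam.ParDo(TagWords())
         | "Print" >> beam.Map(print))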
Module 7: Windows, Watermarks, Triggers
Topics:
- Windows
- Watermarks
- Triggers
Objectives:
- Implement logic to handle your late data.
- Review different types of triggers.
- Review core streaming concepts (unbounded PCollections, windows); a windowing sketch follows this list.
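A minimal sketch of these pieces together: fixed windows, a watermark-based trigger that re-fires for late data, and an allowed-lateness bound; the values are illustrative:

    import apache_beam as beam
    from apache_beam import window
    from apache_beam.transforms.trigger import (
        AccumulationMode, AfterProcessingTime, AfterWatermark)

    with beam.Pipeline() as p:
        (p
         | beam.Create([("user", 1), ("user", 2), ("user", 3)])
         # In a real streaming job, timestamps come from the source.
         | beam.Map(lambda kv: window.TimestampedValue(kv, 10))
         | beam.WindowInto(
             window.FixedWindows(60),                # 1-minute windows
             trigger=AfterWatermark(                 # fire at the watermark,
                 late=AfterProcessingTime(30)),      # then again for late data
             allowed_lateness=120,                   # keep window state 2 minutes
             accumulation_mode=AccumulationMode.ACCUMULATING)
         | beam.CombinePerKey(sum)
         | beam.Map(print))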
Module 8: Sources and Sinks
Topics:
- Sources and Sinks
- Text IO and File IO
- BigQuery IO
- PubSub IO
- Kafka IO
- Bigtable IO
- Avro IO
- Splittable DoFn
Objectives:
- Write the I/O of your choice for your Dataflow pipeline, as sketched after this list.
- Tune your source/sink transformation for maximum performance.
- Create custom sources and sinks using SDF.
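A minimal sketch, with placeholder paths and table names, of pairing one source connector with one sink connector:

    import apache_beam as beam
    from apache_beam.io import ReadFromText, WriteToBigQuery

    with beam.Pipeline() as p:
        (p
         | ReadFromText("gs://my-bucket/input/*.csv")  # source (placeholder path)
         | beam.Map(lambda line: {"raw": line})
         | WriteToBigQuery(                            # sink (placeholder table)
             "my-project:my_dataset.my_table",
             schema="raw:STRING",
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))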
Module 9: Schemas
Topics:
- Beam Schemas
- Code Examples
Objectives:
- Introduce schemas, which give developers a way to express structured data in their Beam pipelines.
- Use schemas to simplify your Beam code and improve the performance of your pipeline (see the sketch below).
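A minimal sketch of Python schemas: a NamedTuple registered with RowCoder gives elements a schema, which unlocks schema-aware transforms such as GroupBy:

    import typing
    import apache_beam as beam
    from apache_beam import coders

    class Purchase(typing.NamedTuple):
        user_id: str
        amount: float

    coders.registry.register_coder(Purchase, coders.RowCoder)

    with beam.Pipeline() as p:
        (p
         | beam.Create([Purchase("alice", 9.99),
                        Purchase("bob", 3.50)]).with_output_types(Purchase)
         # Schema-aware aggregation by field name:
         | beam.GroupBy("user_id").aggregate_field("amount", sum, "total")
         | beam.Map(print))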
Module 10: State and Timers
Topics:
- State API
- Timer API
- Summary
Objectives:
- Identify use cases for state and timer API implementations.
- Select the right type of state and timers for your pipeline; a buffering sketch follows this list.
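A minimal sketch of both APIs: buffer values per key in a bag state and flush when an event-time timer fires; applied to a keyed PCollection, e.g. pcoll | beam.ParDo(BufferDoFn()):

    import apache_beam as beam
    from apache_beam.coders import VarIntCoder
    from apache_beam.transforms.timeutil import TimeDomain
    from apache_beam.transforms.userstate import BagStateSpec, TimerSpec, on_timer

    class BufferDoFn(beam.DoFn):
        BUFFER = BagStateSpec("buffer", VarIntCoder())    # per-key bag state
        FLUSH = TimerSpec("flush", TimeDomain.WATERMARK)  # event-time timer

        def process(self,
                    element,
                    ts=beam.DoFn.TimestampParam,
                    buffer=beam.DoFn.StateParam(BUFFER),
                    flush=beam.DoFn.TimerParam(FLUSH)):
            _, value = element
            buffer.add(value)
            flush.set(ts + 60)  # flush 60s past this element's timestamp

        @on_timer(FLUSH)
        def flush_buffer(self, buffer=beam.DoFn.StateParam(BUFFER)):
            yield sum(buffer.read())
            buffer.clear()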
Module 11: Best Practices
Topics:
- Schemas
- Handling Unprocessable Data
- Error Handling
- AutoValue Code Generator
- JSON Data Handling
- Utilizing the DoFn Lifecycle
- Pipeline Optimizations
Objectives:
- Implement best practices for Dataflow pipelines; a dead-letter sketch follows.
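A minimal sketch of one of those practices, the dead-letter pattern: elements that fail parsing are routed to a tagged side output instead of failing the job:

    import json
    import apache_beam as beam

    class ParseJson(beam.DoFn):
        DEAD_LETTER = "dead_letter"

        def process(self, element):
            try:
                yield json.loads(element)
            except (ValueError, TypeError):
                # Unprocessable data goes to a tagged side output.
                yield beam.pvalue.TaggedOutput(self.DEAD_LETTER, element)

    with beam.Pipeline() as p:
        results = (p
                   | beam.Create(['{"ok": 1}', "not json"])
                   | beam.ParDo(ParseJson()).with_outputs(
                       ParseJson.DEAD_LETTER, main="parsed"))
        results.parsed | "PrintGood" >> beam.Map(print)
        results.dead_letter | "PrintBad" >> beam.Map(
            lambda e: print("dead letter:", e))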
Module 12: Dataflow SQL and DataFrames
Topics:
- Dataflow and Beam SQL
- Windowing in SQL
- Beam DataFrames
Objectives:
- Develop a Beam pipeline using SQL and DataFrames, as sketched below.
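A minimal sketch of both APIs over the same data; note that SqlTransform expands through a Java service, so a JRE must be available when the pipeline is built:

    import apache_beam as beam
    from apache_beam.dataframe.convert import to_dataframe, to_pcollection
    from apache_beam.transforms.sql import SqlTransform

    with beam.Pipeline() as p:
        rows = p | beam.Create([
            beam.Row(user="alice", amount=9.99),
            beam.Row(user="bob", amount=3.50),
        ])

        # Beam SQL over a schema'd PCollection.
        totals = rows | SqlTransform(
            "SELECT user, SUM(amount) AS total FROM PCOLLECTION GROUP BY user")
        totals | "PrintSql" >> beam.Map(print)

        # The same aggregation through the deferred DataFrame API.
        df = to_dataframe(rows)
        per_user = df.groupby("user")["amount"].sum()
        to_pcollection(per_user, include_indexes=True) | "PrintDf" >> beam.Map(print)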
Module 13: Beam Notebooks
Topics:
- Beam Notebooks
Objectives:
- Prototype your pipeline in Python using Beam notebooks.
- Launch a job to Dataflow from a notebook (see the sketch below).
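A minimal sketch of the Interactive Beam workflow behind Beam Notebooks: build against the InteractiveRunner, inspect a PCollection inline, then hand the same graph to Dataflow:

    import apache_beam as beam
    import apache_beam.runners.interactive.interactive_beam as ib
    from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

    p = beam.Pipeline(InteractiveRunner())
    words = p | beam.Create(["prototype", "in", "a", "notebook"])
    counts = words | beam.combiners.Count.PerElement()

    ib.show(counts)  # materializes and displays the PCollection in the notebook

    # To launch the same pipeline on Dataflow (options omitted here):
    # from apache_beam.runners import DataflowRunner
    # DataflowRunner().run_pipeline(p, options=dataflow_options)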
Module 14: Monitoring
Topics:
- Job List
- Job Info
- Job Graph
- Job Metrics
- Metrics Explorer
Objectives:
- Navigate the Dataflow Job Details UI.
- Interpret Job Metrics charts to diagnose pipeline regressions.
- Set alerts on Dataflow jobs using Cloud Monitoring; a custom-metric sketch follows this list.
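A minimal sketch of a custom metric; user counters like this appear alongside the built-in charts on the Job Metrics page and in Metrics Explorer, where they can back Cloud Monitoring alerts:

    import apache_beam as beam
    from apache_beam.metrics import Metrics

    class CountEmpty(beam.DoFn):
        def __init__(self):
            # Namespaced counter, reported as a custom job metric.
            self.empty_lines = Metrics.counter(self.__class__, "empty_lines")

        def process(self, element):
            if not element.strip():
                self.empty_lines.inc()
            yield element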
Module 15: Logging and Error Reporting
Topics:
- Logging
- Error Reporting
Objectives:
- Use the Dataflow logs and diagnostics widgets to troubleshoot pipeline issues, as sketched below.
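A minimal sketch of where those log entries come from: messages written with Python's standard logging module inside a DoFn surface in the job's logs panel and in Cloud Logging:

    import logging
    import apache_beam as beam

    class Validate(beam.DoFn):
        def process(self, element):
            if element < 0:
                # Shows up at WARNING severity in the Dataflow worker logs.
                logging.warning("dropping negative value: %s", element)
                return
            yield element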
Module 16: Troubleshooting and Debug
Topics:
- Troubleshooting Workflow
- Types of Failures
Objectives:
- Use a structured approach to debug your Dataflow pipelines.
- Examine common causes for pipeline failures.
Module 17: Performance
Topics:
- Pipeline Design
- Data Shape
- Sources, Sinks, and External Systems
- Shuffle and Streaming Engine
Objectives:
- Understand performance considerations for Dataflow pipelines.
- Consider how the shape of your data can affect pipeline performance; a combiner sketch follows this list.
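A minimal sketch of one data-shape consideration: CombinePerKey lets the runner combine values before the shuffle (combiner lifting), so hot keys move far less data than a GroupByKey followed by a reduction:

    import apache_beam as beam

    with beam.Pipeline() as p:
        pairs = p | beam.Create([("k", 1), ("k", 2), ("k", 3)])

        # Ships every value for a key to one worker, then sums:
        slower = pairs | beam.GroupByKey() | beam.MapTuple(
            lambda k, vs: (k, sum(vs)))

        # Lets the runner pre-combine values ahead of the shuffle:
        faster = pairs | beam.CombinePerKey(sum)

        faster | beam.Map(print)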
Module 18: Testing and CI/CD
Topics:
- Testing and CI/CD Overview
- Unit Testing
- Integration Testing
- Artifact Building
- Deployment
Objectives:
- Review testing approaches for your Dataflow pipeline; a unit-test sketch follows this list.
- Review frameworks and features available to streamline your CI/CD workflow for Dataflow pipelines.
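A minimal sketch of the unit-testing pattern this module builds on: TestPipeline plus the assert_that matcher:

    import apache_beam as beam
    from apache_beam.testing.test_pipeline import TestPipeline
    from apache_beam.testing.util import assert_that, equal_to

    def test_doubling():
        with TestPipeline() as p:
            result = (p
                      | beam.Create([1, 2, 3])
                      | beam.Map(lambda x: x * 2))
            assert_that(result, equal_to([2, 4, 6]))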
Module 19: Reliability
Topics:
- Introduction to Reliability
- Monitoring
- Geolocation
- Disaster Recovery
- High Availability
Objectives:
- Implement reliability best practices for your Dataflow pipelines.
Module 20: Flex Templates
Topics:
- Classic Templates
- Flex Templates
- Using Flex Templates
- Google-provided Templates
Objectives:
- Use Flex Templates to standardize and reuse Dataflow pipeline code; an entry-point sketch follows.
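A minimal sketch, with hypothetical parameter names, of the entry point you package into a Flex Template: template parameters arrive as command-line options and are parsed before the pipeline is built:

    import argparse
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run(argv=None):
        parser = argparse.ArgumentParser()
        parser.add_argument("--input", required=True)   # template parameter
        parser.add_argument("--output", required=True)  # template parameter
        known_args, pipeline_args = parser.parse_known_args(argv)

        with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
            (p
             | beam.io.ReadFromText(known_args.input)
             | beam.io.WriteToText(known_args.output))

    if __name__ == "__main__":
        run()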
Module 21: Summary
Topics:
- Summary
Objectives:
- Recap the main topics covered in the training.