GCP Data Engineering
Module-1: Introduction to Cloud Computing
- Differences between on-premises and cloud
- What is Cloud Computing
- Cloud Service Models (IaaS, PaaS and SaaS)
- Cloud Deployment Models (Public, Private and Hybrid)
- Leading Cloud Providers (AWS, Azure and GCP)
Module-2: Starting with Google Cloud
- Understanding the fundamentals of Google Cloud Platform
- Create a GCP free tier account
- How to use the Cloud Console
- Cloud Locations - Regions and Zones
- Google Cloud Services Overview
- GCP Interfaces
- How to use Cloud Shell
- Cloud SDK installation and setup
Module-3: IAM and Resource Hierarchy
- Resource Management & IAM Introduction
- What is a Resource? Resource Hierarchy and its Benefits
- Demo – Create and Manage Project
- Billing Account Introduction
- IAM introduction
- What are Members and Service Accounts
- What are Roles and Permissions
- Demo – Assign role to member
- What is a Policy
Module-4: Google Storage and Database Services
1. Cloud Storage :
a. Introduction to Google Cloud Storage
b. How to store and retrieve data
c. What are buckets and objects? How to create them
d. Working with GCS via both the Console and Shell (gsutil commands)
e. What are storage classes, and how to choose the right one?
f. How to control access to buckets and objects
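The bucket/object model above can be sketched in a few lines. This is a pure-Python illustration of GCS semantics (flat namespace, "folders" as key prefixes, a storage class per bucket), not the real google-cloud-storage client; the bucket name and object paths are made up for the example.

```python
# Conceptual sketch of the GCS object model: a bucket is a flat
# namespace mapping object names (which may contain "/") to bytes.
# Illustration only -- not the google-cloud-storage client library.

class Bucket:
    def __init__(self, name, storage_class="STANDARD"):
        self.name = name
        self.storage_class = storage_class  # STANDARD, NEARLINE, COLDLINE, ARCHIVE
        self._objects = {}                  # object name -> bytes

    def upload(self, object_name, data):
        # Like `gsutil cp local.csv gs://bucket/object`
        self._objects[object_name] = data

    def download(self, object_name):
        # Like `gsutil cp gs://bucket/object local.csv`
        return self._objects[object_name]

    def list_objects(self, prefix=""):
        # "Folders" in GCS are just shared name prefixes
        return [n for n in sorted(self._objects) if n.startswith(prefix)]

bucket = Bucket("my-demo-bucket")
bucket.upload("raw/2024/sales.csv", b"id,amount\n1,100\n")
print(bucket.list_objects(prefix="raw/"))  # ['raw/2024/sales.csv']
```

In class the same operations run against real buckets from the Console and with gsutil.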
2. Cloud SQL :
a. Introduction to Cloud SQL
b. How to set up, operate and manage relational databases with Cloud SQL
c. Difference between relational and NoSQL databases
d. Processing bulk data loads and setting up migration jobs
e. Create and connect to database engines such as SQL Server, PostgreSQL and MySQL
f. Difference between transactional (OLTP) and data warehouse (OLAP) systems
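The relational basics carry over regardless of engine, so here is a minimal sketch using Python's built-in sqlite3 as a stand-in for a Cloud SQL database (in class we connect to real MySQL/PostgreSQL/SQL Server instances instead; the table and data are invented for the example).

```python
# Relational (OLTP-style) basics, with sqlite3 standing in for Cloud SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema-on-write: the structure is fixed before data is loaded --
# a key contrast with the NoSQL stores covered later.
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
cur.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                [("alice", 120.0), ("bob", 75.5), ("alice", 30.0)])
conn.commit()

# Transactional query: a few rows, filtered by an attribute
total = cur.execute("SELECT SUM(amount) FROM orders WHERE customer = ?",
                    ("alice",)).fetchone()[0]
print(total)  # 150.0
```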
3. Cloud Spanner :
a. Introduction to Cloud Spanner
b. Spanner data types and models
c. How to set up, operate and manage relational databases with Cloud Spanner
d. Difference between Cloud SQL and Cloud Spanner
e. How to create instances and databases, and manage data in Spanner
f. What if we already have data and want to migrate to Spanner?
4. Bigtable :
a. Introduction to Bigtable
b. How to set up, operate and manage NoSQL databases with Bigtable
c. Introduction to NoSQL databases and their types
d. Bigtable schema design: row keys, column families and column values
e. Create an instance, connect via the HBase shell and process the data
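Bigtable's wide-column model (rows sorted by row key, cells grouped under column families) can be modelled in plain Python. This is only a sketch of the data model to motivate row-key design; the `sensor#id#date` key scheme is a made-up example, not a Bigtable API.

```python
# Conceptual model of a Bigtable table:
#   row key -> {column family: {qualifier: cell value}}
table = {}

def put(row_key, family, qualifier, value):
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

def prefix_scan(prefix):
    # Bigtable stores rows lexicographically by row key, so scanning a
    # key prefix is cheap -- the core idea behind good row-key design.
    return [(k, table[k]) for k in sorted(table) if k.startswith(prefix)]

put("sensor#42#2024-01-01", "stats", "temp", 21.5)
put("sensor#42#2024-01-02", "stats", "temp", 22.0)
put("sensor#99#2024-01-01", "stats", "temp", 18.0)

# All readings for sensor 42, in one contiguous scan:
print([k for k, _ in prefix_scan("sensor#42#")])
# ['sensor#42#2024-01-01', 'sensor#42#2024-01-02']
```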
Module-5: Google Big Data Services
1. BigQuery :
a. What is big data, and the 5 V's of big data
b. Introduction to BigQuery, data warehousing on GCP
c. BigQuery (modern DWH) vs traditional warehouses
d. How to collect, store and analyse data with BigQuery
e. Native tables and external tables
f. What are Views and Authorized Views
g. Storage optimization :
i. Partitioned tables (Partitioning)
ii. Clustered tables (Clustering)
h. Query results : temporary tables and permanent tables
i. How to create datasets, work with tables and apply transformations
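Why partitioning matters can be shown with a tiny model: BigQuery prunes partitions, so a query filtered on the partition column scans only the matching partition instead of the whole table. This pure-Python sketch (invented sales rows, date as the partition key) is an illustration of the idea, not BigQuery itself.

```python
# Model of a date-partitioned table: partition key -> rows
from collections import defaultdict

partitions = defaultdict(list)
rows = [
    {"date": "2024-01-01", "product": "a", "amount": 10},
    {"date": "2024-01-01", "product": "b", "amount": 20},
    {"date": "2024-01-02", "product": "a", "amount": 30},
]
for r in rows:
    partitions[r["date"]].append(r)  # ingestion routes each row to its partition

# A query with a partition filter touches one partition, not all rows.
# (Clustering additionally sorts rows *within* each partition by the
# clustering columns, speeding up filters on those columns.)
scanned = partitions["2024-01-02"]
print(len(scanned), sum(r["amount"] for r in scanned))  # 1 30
```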
2. Dataproc :
a. Introduction to Hadoop and Spark infrastructure
b. What is the Dataproc service, its features and benefits
c. What is a cluster, how to set up a cluster and component gateways
d. Traditional vs Dataproc clusters
e. Hadoop vs GCP
f. Introduction to PySpark SQL DataFrames, and how to process data with PySpark in Dataproc
g. How to submit PySpark jobs
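The classic Spark word count has a flatMap -> map -> reduceByKey shape; the sketch below expresses it with plain Python builtins so the shape is visible without a cluster. On Dataproc the same logic runs distributed via PySpark RDDs/DataFrames (this is a stand-in, not pyspark code; the input lines are made up).

```python
# Word count: the "hello world" of Spark, in stdlib Python.
from collections import Counter
from itertools import chain

lines = ["big data on gcp", "data engineering on gcp"]

words = chain.from_iterable(line.split() for line in lines)  # ~ flatMap
counts = Counter(words)                                      # ~ map + reduceByKey
print(counts["gcp"], counts["data"])  # 2 2
```

In PySpark, each step becomes a transformation on a distributed collection, and the job is submitted to the cluster with `gcloud dataproc jobs submit pyspark`.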
3. Dataflow :
a. Introduction to Batch and Stream data processing
b. What is Dataflow, features and benefits
c. Apache Beam Programming Model
d. Data Pipeline vs ETL
e. Difference between Dataflow and Dataproc
f. How to create, manage and monitor data pipelines with Dataflow
g. Pre-defined vs Custom Templates
h. How can you use Apache Beam and Cloud Dataflow together to process large sets of data
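Beam's programming model is a PCollection (immutable collection of elements) flowing through PTransforms. The sketch below mirrors the semantics of `beam.Map`, `beam.Filter` and `beam.GroupByKey` in plain Python so the model is clear before touching the SDK; a real pipeline would use the apache_beam package with the Dataflow runner, and the event data here is invented.

```python
# Pure-Python model of Beam's core transforms (illustration only).
from collections import defaultdict

def pardo_map(pcoll, fn):      # ~ beam.Map
    return [fn(x) for x in pcoll]

def pardo_filter(pcoll, fn):   # ~ beam.Filter
    return [x for x in pcoll if fn(x)]

def group_by_key(pcoll):       # ~ beam.GroupByKey over (key, value) pairs
    grouped = defaultdict(list)
    for k, v in pcoll:
        grouped[k].append(v)
    return sorted(grouped.items())

events = [("user1", 5), ("user2", 3), ("user1", 7), ("user2", 1)]
big = pardo_filter(events, lambda kv: kv[1] >= 3)           # drop small events
totals = [(k, sum(vs)) for k, vs in group_by_key(big)]       # per-key aggregate
print(totals)  # [('user1', 12), ('user2', 3)]
```

The same pipeline runs unchanged on batch or streaming input; only the source and the runner (DirectRunner locally, DataflowRunner on GCP) change.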
4. Pub/Sub :
a. Introduction to Stream data processing
b. What is Pub/Sub, features and benefits
c. When and how to use Cloud Pub/Sub in your application
d. How to create Topic and add subscriptions
e. How to integrate Pub/Sub with other services for ingestion
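The topic/subscription semantics can be modelled simply: a topic fans each published message out to every subscription, and each subscriber drains its own subscription independently. This stdlib sketch illustrates that fan-out (the topic and subscription names are made up; the real google-cloud-pubsub client also involves acks and delivery retries).

```python
# Minimal model of Pub/Sub fan-out (illustration, not the client library).
from collections import deque

class Topic:
    def __init__(self, name):
        self.name = name
        self.subscriptions = {}   # subscription name -> message queue

    def create_subscription(self, sub_name):
        self.subscriptions[sub_name] = deque()

    def publish(self, message):
        # Every subscription receives its own copy of the message
        for queue in self.subscriptions.values():
            queue.append(dict(message))

def pull(topic, sub_name):
    # Simplified pull: real Pub/Sub also requires an explicit ack
    return topic.subscriptions[sub_name].popleft()

topic = Topic("orders")
topic.create_subscription("billing")
topic.create_subscription("analytics")
topic.publish({"order_id": 1})

billing_msg = pull(topic, "billing")
analytics_msg = pull(topic, "analytics")
print(billing_msg, analytics_msg)
```

Decoupling producers from consumers this way is what lets Pub/Sub feed Dataflow, BigQuery and other services without the publisher knowing about them.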
5. Airflow/ Cloud Composer :
a. Introduction to Cloud Composer/Airflow and features
b. Understanding the working of Airflow
c. Airflow terminologies
- DAG, Tasks, Operators and dependencies
d. How to create, schedule, monitor and manage the DAGs with Python
e. How to setup composer environment and connect to Airflow webserver
f. How to trigger a DAG and check the logs for troubleshooting
g. How to integrate with other services
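The scheduling idea behind a DAG is that a task runs only after all its upstream tasks have finished, i.e. tasks execute in a topological order. The sketch below shows that resolution in plain Python with a made-up extract/transform/load pipeline; a real DAG would use `airflow.DAG` and operators, with dependencies set via `>>`.

```python
# How Airflow-style dependencies resolve to a run order (illustration only).
dag = {                        # task -> list of upstream dependencies
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "notify": ["load"],
}

def run_order(dag):
    done, order = set(), []
    while len(done) < len(dag):
        for task, upstream in dag.items():
            # A task is runnable once every upstream task has completed
            if task not in done and all(u in done for u in upstream):
                order.append(task)   # "run" the task
                done.add(task)
    return order

print(run_order(dag))  # ['extract', 'transform', 'load', 'notify']
```

Airflow also handles what this sketch ignores: schedules, retries, parallel branches and per-task logs in the web UI.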
6. Data Fusion :
a. Introduction to Data fusion, features and benefits
b. How to create data pipelines without any code
c. How to perform transformations without any code
d. Setting up an instance with Wrangler
e. Create, monitor, schedule and manage pipelines with Data Fusion by integrating with other services
Module-6: Google Compute Services
- Compute Engine: virtual machines
- Kubernetes Engine
- App Engine
- What is Continuous integration and Continuous Deployment
- Understanding CI/CD components with GCP Services
- Cloud Build
- Cloud Source Repositories
- Google Container Registry
Module-7: Real-Time Projects
- Building a Data Warehouse in BigQuery
- Building a Data Lake Using Dataproc
- Building Orchestration for Batch Data Loading Using Cloud Composer
- Processing Streaming Data with Pub/Sub and Dataflow
- ETL case study with PySpark
Topics: We cover almost all the services related to the modules mentioned above.
Pre-Requisite: None. The course starts from the basics with everyone in mind, and concepts will be explained in both Telugu and English as needed.