Description
Duration: 5 days
During this five-day course, students will develop the skills to design and implement big data engineering workflows with the Microsoft cloud ecosystem and Microsoft HDInsight, and to extract the greatest value from data.
The MCSA: Data Engineering with Azure certification validates the skills learned in implementing big data engineering workflows with Microsoft cloud services and Microsoft HDInsight.
This course is ideal for:
Data Engineer
Data Architect
Data Scientist
Data Developer
After completing this course, students will be able to:
Describe the purpose of Azure Data Factory and explain how it works
Describe how to create Azure Data Factory pipelines that can transfer data efficiently
Describe how to perform transformations using an Azure Data Factory pipeline
Describe how to monitor Azure Data Factory pipelines and how to protect the data flowing through these pipelines
20775: Perform Data Engineering on Microsoft HDInsight
Deploy HDInsight Clusters
Authorize Users to Access Resources
Load Data into HDInsight
Troubleshoot HDInsight
Implement Batch Solutions
Design Batch ETL Solutions for Big Data with Spark
Analyze Data with Hive and Phoenix
Describe Stream Analytics
Implement Spark Streaming Using the DStream API
Develop Big Data Real-Time Processing Solutions with Apache Storm
Build Solutions that use Kafka and HBase
Perform Big Data Engineering on Microsoft Cloud Services (beta)
Describe common architectures for processing Big Data using Azure tools and services
Use Azure Stream Analytics to design and implement stream processing over large-scale data
Include custom functions and incorporate machine learning activities into an Azure Stream Analytics job
Use Azure Data Lake Store as a large-scale repository of data files
Use Azure Data Lake Analytics to examine and process data held in Azure Data Lake Store
Create and deploy custom functions and operations, integrate with Python and R, and protect and optimize jobs
Use Azure SQL Data Warehouse to create a repository that can support large-scale analytical processing over data at rest
Use Azure SQL Data Warehouse to perform analytical processing, maintain performance, and protect the data
Use Azure Data Factory to import, transform, and transfer data between repositories and services
Describe the purpose of Azure Data Factory and explain how it works
Create Azure Data Factory pipelines that can transfer data efficiently
Perform transformations using an Azure Data Factory pipeline
Monitor Azure Data Factory pipelines and protect the data flowing through these pipelines
Prerequisites
It is recommended that students interested in this course have previous knowledge or experience with:
Azure Data Services
The Microsoft Windows operating system and its core functionality
Relational databases
Programming using R, and familiarity with common R packages
Common statistical methods and data analysis best practices
What’s included?
- Authorized Courseware
- Intensive Hands-on Skills Development with an Experienced Subject Matter Expert
- Hands-on practice on real servers and extended lab support 1.800.482.3172
- Examination Vouchers & Onsite Certification Testing (excluding Adobe and PMP Boot Camps)
- Academy Code of Honor: Test Pass Guarantee
- Optional: Package for Hotel Accommodations, Lunch and Transportation
The Academy offers several convenient training delivery methods, so getting the training you need is easy. You can learn in a classroom, in a live online virtual classroom, through training videos hosted online, or in private group classes hosted at your site. We offer expert instruction to individuals, government agencies, non-profits, and corporations. Our live classes, on-site classes, and online training videos all feature certified instructors who teach a detailed curriculum and share their expertise and insights with trainees. However you prefer to receive the training, you can count on The Academy for an engaging and effective learning experience.
Methods
- Instructor Led (the best training format we offer)
- Live Online Classroom – Online Instructor Led
- Self-Paced Video
Speak to an Admissions Representative for complete details
Start | Finish |
---|---|
9/23/2024 | 9/27/2024 |
10/14/2024 | 10/18/2024 |
11/4/2024 | 11/8/2024 |
11/25/2024 | 11/29/2024 |
12/16/2024 | 12/20/2024 |
1/6/2025 | 1/10/2025 |
1/27/2025 | 1/31/2025 |
2/17/2025 | 2/21/2025 |
3/10/2025 | 3/14/2025 |
3/31/2025 | 4/4/2025 |
4/21/2025 | 4/25/2025 |
5/12/2025 | 5/16/2025 |
6/2/2025 | 6/6/2025 |
6/23/2025 | 6/27/2025 |
7/14/2025 | 7/18/2025 |
8/4/2025 | 8/8/2025 |
8/25/2025 | 8/29/2025 |
9/15/2025 | 9/19/2025 |
10/6/2025 | 10/10/2025 |
10/27/2025 | 10/31/2025 |
11/17/2025 | 11/21/2025 |
12/8/2025 | 12/12/2025 |
12/29/2025 | 1/2/2026 |
Curriculum
20775: Perform Data Engineering on Microsoft HDInsight
Module 1: Getting Started with HDInsight
Lessons
What is Big Data?
Introduction to Hadoop
Working with the MapReduce Function
Introducing HDInsight
Lab: Working with HDInsight
Provision an HDInsight cluster and run MapReduce jobs
Module 2: Deploying HDInsight Clusters
Lessons
Identifying HDInsight cluster types
Managing HDInsight clusters by using the Azure portal
Managing HDInsight Clusters by using Azure PowerShell
Lab: Managing HDInsight clusters with the Azure Portal
Create an HDInsight cluster that uses Data Lake Store storage
Customize HDInsight by using script actions
Delete an HDInsight cluster
Module 3: Authorizing Users to Access Resources
Lessons
Non-domain-joined clusters
Configuring domain-joined HDInsight clusters
Manage domain-joined HDInsight clusters
Lab: Authorizing Users to Access Resources
Prepare the Lab Environment
Manage a non-domain-joined cluster
Module 4: Loading data into HDInsight
Lessons
Storing data for HDInsight processing
Using data loading tools
Maximizing value from stored data
Lab: Loading Data into your Azure account
Load data for use with HDInsight
Module 5: Troubleshooting HDInsight
Lessons
Analyze HDInsight logs
YARN logs
Heap Dumps
Operations Management Suite
Lab: Troubleshooting HDInsight
Analyze HDInsight logs
Analyze YARN logs
Monitor resources with Operations Management Suite
Module 6: Implementing Batch Solutions
Lessons
Apache Hive storage
HDInsight data queries using Hive and Pig
Operationalize HDInsight
Lab: Implement Batch Solutions
Deploy HDInsight cluster and data storage
Use data transfers with HDInsight clusters
Query HDInsight cluster data
Module 7: Design Batch ETL solutions for big data with Spark
Lessons
What is Spark?
ETL with Spark
Spark performance
Lab: Design Batch ETL solutions for big data with Spark
Create an HDInsight Cluster with access to Data Lake Store
Use HDInsight Spark cluster to analyze data in Data Lake Store
Analyzing website logs using a custom library with Apache Spark cluster on HDInsight
Managing resources for Apache Spark cluster on Azure HDInsight
Module 8: Analyze Data with Spark SQL
Lessons
Implementing iterative and interactive queries
Perform exploratory data analysis
Lab: Performing exploratory data analysis by using iterative and interactive queries
Build a machine learning application
Use Zeppelin for interactive data analysis
View and manage Spark sessions by using Livy
Module 9: Analyze Data with Hive and Phoenix
Lessons
Implement interactive queries for big data with Interactive Hive
Perform exploratory data analysis by using Hive
Perform interactive processing by using Apache Phoenix
Lab: Analyze data with Hive and Phoenix
Implement interactive queries for big data with Interactive Hive
Perform exploratory data analysis by using Hive
Perform interactive processing by using Apache Phoenix
Module 10: Stream Analytics
Lessons
Stream analytics
Process streaming data from stream analytics
Managing stream analytics jobs
Lab: Implement Stream Analytics
Process streaming data with stream analytics
Managing stream analytics jobs
Module 11: Implementing Streaming Solutions with Kafka and HBase
Lessons
Building and Deploying a Kafka Cluster
Publishing, Consuming, and Processing data using the Kafka Cluster
Using HBase to store and Query Data
Lab: Implementing Streaming Solutions with Kafka and HBase
Create a virtual network and gateway
Create a Storm cluster for Kafka
Create a Kafka producer
Create a streaming processor client topology
Create a Power BI dashboard and streaming dataset
Create an HBase cluster
Create a streaming processor to write to HBase
Module 12: Develop big data real-time processing solutions with Apache Storm
Lessons
Persist long-term data
Stream data with Storm
Create Storm topologies
Configure Apache Storm
Lab: Developing big data real-time processing solutions with Apache Storm
Stream data with Storm
Create Storm Topologies
Module 13: Create Spark Streaming Applications
Lessons
Working with Spark Streaming
Creating Spark Structured Streaming Applications
Persistence and Visualization
Lab: Building a Spark Streaming Application
Installing Required Software
Building the Azure Infrastructure
Building a Spark Streaming Pipeline
20776A: Performing Big Data Engineering on Microsoft Cloud Services
Module 1: Architectures for Big Data Engineering with Azure
Lessons
Understanding Big Data
Architectures for Processing Big Data
Considerations for designing Big Data solutions
Lab: Designing a Big Data Architecture
Design big data architecture
Module 2: Processing Event Streams using Azure Stream Analytics
Lessons
Introduction to Azure Stream Analytics
Configuring Azure Stream Analytics jobs
Lab: Processing Event Streams with Azure Stream Analytics
Create an Azure Stream Analytics job
Create another Azure Stream Analytics job
Add an Input
Edit the ASA job
Determine the nearest Patrol Car
Module 3: Performing custom processing in Azure Stream Analytics
Lessons
Implementing Custom Functions
Incorporating Machine Learning into an Azure Stream Analytics Job
Lab: Performing Custom Processing with Azure Stream Analytics
Add logic to the analytics
Detect consistent anomalies
Determine consistencies using machine learning and ASA
Module 4: Managing Big Data in Azure Data Lake Store
Lessons
Using Azure Data Lake Store
Monitoring and protecting data in Azure Data Lake Store
Lab: Managing Big Data in Azure Data Lake Store
Update the ASA Job
Upload details to ADLS
Module 5: Processing Big Data using Azure Data Lake Analytics
Lessons
Introduction to Azure Data Lake Analytics
Analyzing Data with U-SQL
Sorting, grouping, and joining data
Lab: Processing Big Data using Azure Data Lake Analytics
Add functionality
Query against Database
Calculate average speed
Module 6: Implementing custom operations and monitoring performance in Azure Data Lake Analytics
Lessons
Incorporating custom functionality into Analytics jobs
Managing and Optimizing jobs
Lab: Implementing custom operations and monitoring performance in Azure Data Lake Analytics
Custom extractor
Custom processor
Integration with R/Python
Monitor and optimize a job
Module 7: Implementing Azure SQL Data Warehouse
Lessons
Introduction to Azure SQL Data Warehouse
Designing tables for efficient queries
Importing Data into Azure SQL Data Warehouse
Lab: Implementing Azure SQL Data Warehouse
Create a new data warehouse
Design and create tables and indexes
Import data into the warehouse
Module 8: Performing Analytics with Azure SQL Data Warehouse
Lessons
Querying Data in Azure SQL Data Warehouse
Maintaining Performance
Protecting Data in Azure SQL Data Warehouse
Lab: Performing Analytics with Azure SQL Data Warehouse
Performing queries and tuning performance
Integrating with Power BI and Azure Machine Learning
Configuring security and analyzing threats
Module 9: Automating the Data Flow with Azure Data Factory
Lessons
Introduction to Azure Data Factory
Transferring Data
Transforming Data
Monitoring Performance and Protecting Data
Lab: Automating the Data Flow with Azure Data Factory
Automate the Data Flow with Azure Data Factory