Data Engineering with AWS: Learn how to design and build cloud-based data transformation pipelines using AWS

Author: Gareth Eagar
Publisher: Packt Publishing
Publication date: 2021-12-29
ISBN-13: 9781800560413
ISBN-10: 1800560419
Binding: Paperback (quality paper, also called trade paper)
Pages: 482





Description


Key Features

Learn about common data architectures and modern approaches to generating value from big data
Explore AWS tools for ingesting, transforming, and consuming data, and for orchestrating pipelines
Learn how to architect and implement data lakes and data lakehouses for big data analytics

Book Description
Knowing how to architect and implement complex data pipelines is a highly sought-after skill. Data engineers are responsible for building pipelines that ingest, transform, and join raw datasets, creating new value from the data in the process.
Amazon Web Services (AWS) offers a range of tools that simplify a data engineer's job, making it a popular platform for data engineering tasks.
This book takes you through the services and skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that make up the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. The book also covers populating data marts and data warehouses, and explains how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including tools for ad hoc SQL queries and for creating visualizations. In the final chapters, you'll see how machine learning and artificial intelligence can be used to draw new insights from data.
By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.
What you will learn

Understand data engineering concepts and emerging technologies
Ingest streaming data with Amazon Kinesis Data Firehose
Optimize, denormalize, and join datasets with AWS Glue Studio
Use Amazon S3 events to trigger an AWS Lambda function that transforms a file
Run complex SQL queries on data lake data using Amazon Athena
Load data into an Amazon Redshift data warehouse and run queries
Create a visualization of your data using Amazon QuickSight
Extract sentiment data from a dataset using Amazon Comprehend
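
As a hedged illustration of the S3-to-Lambda item above, here is a minimal Python sketch (not taken from the book) of a handler that reads the bucket name and object key from an S3 event notification and plans where a transformed copy would be written. The helper names parse_s3_event and output_key_for, the "transformed/" prefix, and the ".parquet" suffix are hypothetical choices for illustration; the actual read, transform, and write step (for example with boto3) is deliberately left out.

```python
import json
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload.

    Object keys arrive URL-encoded (a space becomes '+'), so they are
    decoded before use.
    """
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def output_key_for(key, prefix="transformed/"):
    # Hypothetical naming convention: write the transformed copy under a
    # separate prefix with a .parquet suffix. Assuming the S3 notification
    # is filtered on the input prefix, the output will not re-trigger it.
    base = key.rsplit("/", 1)[-1]
    stem = base.rsplit(".", 1)[0] if "." in base else base
    return f"{prefix}{stem}.parquet"


def lambda_handler(event, context):
    # Lambda invokes this with the S3 event and a runtime context object.
    # This sketch only plans the work; the transform itself is omitted.
    plan = [
        {"bucket": b, "source_key": k, "target_key": output_key_for(k)}
        for b, k in parse_s3_event(event)
    ]
    print(json.dumps(plan))
    return {"statusCode": 200, "processed": len(plan)}
```

Keeping event parsing and key naming in small pure functions makes them easy to test locally without deploying to AWS.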

Who this book is for
This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone who is new to data engineering and wants to learn about the foundational concepts while gaining practical experience with common data engineering services on AWS will also find this book useful.
A basic understanding of big data-related topics and Python coding will help you get the most out of this book but is not needed. Familiarity with the AWS console and core services is also useful but not necessary.


Table of Contents

An Introduction to Data Engineering
Data Management Architectures for Analytics
The AWS Data Engineer's Toolkit
Data Cataloging, Security and Governance
Architecting Data Engineering Pipelines
Ingesting Batch and Streaming Data
Transforming Data to Optimize for Analytics
Identifying and Enabling Data Consumers
Loading Data into a Data Mart
Orchestrating the Data Pipeline
Ad Hoc Queries with Amazon Athena
Visualizing Data with Amazon QuickSight
Enabling Artificial Intelligence and Machine Learning
Wrapping Up the First Part of Your Learning Journey


About the Author


Gareth Eagar has worked in the IT industry for over 25 years, starting in South Africa, then working in the United Kingdom, and now based in the United States. In 2017, he joined Amazon Web Services (AWS) as a solutions architect, working with enterprise customers in the NYC metro area. Gareth has become a recognized subject matter expert on building data lakes on AWS, and in 2019 he launched the Data Lake Day educational event at the AWS Lofts in NYC and San Francisco. He has also delivered a number of public talks and webinars on big data topics. In 2020, Gareth moved to the AWS Professional Services organization as a senior data architect, helping customers architect and build complex data pipelines.




Related Books

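Google Apps Script 雲端自動化與動態網頁系統實戰 (附320分鐘影音教學/範例程式碼) (Google Apps Script: Cloud Automation and Dynamic Web Systems in Practice, with 320 minutes of video tutorials and sample code)

Authors: 呂國泰, 白乃遠, 王榕藝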

2021-12-29

微軟 Azure 實戰參考 (Microsoft Azure: A Practical Reference)

Authors: 李競, 陳勇華

2021-12-29

Kubernetes 使用指南 (Kubernetes User Guide)

Authors: 龔正, 吳治輝, 葉伙榮, 張龍春; translated by Philipz (鄭淳尹)

2021-12-29