Big Data Analysis Using Pig and Hive


Course number: 3536

Why study at John Bryce?

Innovative, dynamic learning with advanced tools, combining simulations, practice, and lab environments
A range of technology training programs with content matched to technological developments and to demand in the high-tech industry
Leading the field of training for the high-tech and technology world for 30 years, with a community of tens of thousands of graduates
You choose how to learn: in person in the classroom, remotely via Live Class, or through self-study


Course for groups

This course opens in a group format only, customized for organizations.
For further details: Muzman@johnbryce.co.il

Course duration

Study hours:

Number of sessions:

Morning course:

3

Course format

Courses ordered by organizations are fully tailored to the organization's needs. The curricula are flexible and can incorporate relevant, dedicated content.

Big data analytics is the process of examining large volumes of varied data to uncover hidden patterns and other useful information that can provide a competitive advantage and better business results.

Big data analytics can be done with commonly used software tools, but the unstructured data sources it draws on may not fit in traditional data warehouses. Traditional data warehouses may also be unable to handle the processing demands of big data. As a result, a new class of big data technology has emerged and is now used in many big data analytics environments.

In this course, participants will learn how to analyze big data stored in Hadoop by using Pig and Hive.

The course introduces the Hadoop architecture and the MapReduce framework, the Pig scripting language, and Hive.

full syllabus


Module 1: Introduction to Big Data

RDBMS – Advantages and disadvantages
Dynamic schema, sharding, replication and caching
Performance
Motivation for Hadoop

Module 2: Introduction to Hadoop

Hadoop overview
HDFS architecture
MapReduce framework
Joins with MapReduce
- Map-side join
- Reduce-side join
- Join with distributed cache
Hands on: launching Hadoop and running a MapReduce job on it
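
To make the reduce-side join topic above concrete, here is a minimal sketch in plain Python rather than on a Hadoop cluster. The `users` and `orders` datasets and all function names are hypothetical; the point is only the pattern: mappers tag each record with its source, the shuffle groups records by join key, and the reducer pairs up the two sides.

```python
from collections import defaultdict


def map_phase(users, orders):
    """Tag every record with its source and emit (join_key, tagged_record)."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, amount in orders:
        yield user_id, ("order", amount)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Inner join: pair every user record with every order record for one key."""
    names = [v for tag, v in values if tag == "user"]
    amounts = [v for tag, v in values if tag == "order"]
    for name in names:
        for amount in amounts:
            yield key, name, amount


def reduce_side_join(users, orders):
    groups = shuffle(map_phase(users, orders))
    result = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        result.extend(reduce_phase(key, groups[key]))
    return result


# Hypothetical sample data: (user_id, name) and (user_id, amount).
users = [(1, "dana"), (2, "avi"), (3, "noa")]
orders = [(1, 30.0), (1, 12.5), (3, 99.0), (4, 5.0)]
joined = reduce_side_join(users, orders)
```

Keys that appear on only one side (user 2, order for user 4) produce no output, which is the inner-join behavior. A map-side join would avoid the shuffle by having one side already available to every mapper.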

Module 3: Pig

Thinking like a Pig
Pig vs RDBMS
Learning Pig Latin
- Structure
- Statement
- Expressions
- Data Types
- Schemas
- Functions
- Macros
Data processing operators:
- Loading
- Grouping
- Sorting
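
As a rough illustration of the loading, grouping and sorting operators listed above, the following Python sketch mimics the data flow of a small Pig script. It is not Pig Latin, and the sample data, schema and helper names are assumptions made for the example. It relies on tab being the default field delimiter of Pig's PigStorage loader.

```python
import csv
import io
from itertools import groupby

# Hypothetical tab-delimited input: year, temperature.
RAW = "1950\t0\n1950\t22\n1949\t111\n1949\t78\n"


def load(text, schema):
    """Analog of LOAD ... AS (schema): split each line and cast every field."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [
        tuple(cast(value) for (_name, cast), value in zip(schema, fields))
        for fields in reader
    ]


def group(records, key_index):
    """Analog of GROUP ... BY: one (key, bag of records) pair per distinct key."""
    ordered = sorted(records, key=lambda r: r[key_index])
    return [(k, list(g)) for k, g in groupby(ordered, key=lambda r: r[key_index])]


records = load(RAW, [("year", int), ("temperature", int)])
grouped = group(records, 0)
# Analog of FOREACH ... GENERATE group, MAX(...): one row per year.
max_temp = [(year, max(t for _, t in bag)) for year, bag in grouped]
# Analog of ORDER ... BY ... DESC.
ranked = sorted(max_temp, key=lambda r: r[1], reverse=True)
```

Each step takes a relation and produces a new one, which is the step-by-step data-flow style that distinguishes Pig Latin from a single declarative SQL statement.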

Module 4: Pig Advanced

Data sampling
Execution plan
User defined functions
Case studies

Module 5: Hive

Hive introduction
- What is Hive?
- Hive schema and data storage
- Hive vs RDBMS
- Hive vs Pig
- When to use Hive?
- Interacting with Hive
- Hive services
The Hive commands
Hive as relational data
The Metastore

Module 6: Hive Data Types

Primitive Data types
Collection data types (struct, map and array)
Text file encoding of data values
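
The text file encoding topic can be illustrated with a short Python sketch that parses one line of a delimited text table. It assumes Hive's default delimiters for text files: Ctrl-A (`\x01`) between fields, Ctrl-B (`\x02`) between collection items, and Ctrl-C (`\x03`) between a map key and its value. The employee-style columns and the `parse_row` helper are hypothetical.

```python
FIELD_SEP = "\x01"  # Ctrl-A: separates columns
ITEM_SEP = "\x02"   # Ctrl-B: separates items of an array, struct or map
KEY_SEP = "\x03"    # Ctrl-C: separates a map key from its value


def parse_row(line):
    """Parse one line into name (string), salary (float),
    subordinates (array of strings) and deductions (map of string to float)."""
    name, salary, subordinates, deductions = line.rstrip("\n").split(FIELD_SEP)
    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subordinates.split(ITEM_SEP) if subordinates else [],
        "deductions": {
            key: float(value)
            for key, value in (
                item.split(KEY_SEP) for item in deductions.split(ITEM_SEP)
            )
        } if deductions else {},
    }


# Build a sample line the way it would be stored on disk.
sample_line = FIELD_SEP.join([
    "John Doe",
    "100000.0",
    ITEM_SEP.join(["Mary Smith", "Todd Jones"]),
    ITEM_SEP.join(["Federal Taxes" + KEY_SEP + "0.2", "State Taxes" + KEY_SEP + "0.05"]),
]) + "\n"

row = parse_row(sample_line)
```

Because the delimiters are control characters that rarely occur in real text, values seldom need escaping, which is one reason these defaults are used instead of commas or tabs.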

Module 7: HiveQL – DDL

Database and table commands
External tables
Partitioned table
Storage formats

Module 8: HiveQL – Data Manipulation

Loading data
Inserting data into table from queries
Exporting data

Module 9: HiveQL – Queries

SELECT clauses
WHERE clauses
Nested SELECT
Using functions
GROUP BY
JOIN statements
ORDER BY
SORT BY
Queries that sample data
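
The difference between ORDER BY and SORT BY in the list above is easy to miss: ORDER BY produces one total ordering, while SORT BY only orders the rows within each reducer. The Python sketch below imitates this. The round-robin distribution of rows to reducers is an assumption made purely for illustration, and the function names are hypothetical.

```python
def distribute(rows, num_reducers):
    """Spread rows across reducers (round-robin here, for illustration only)."""
    buckets = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    return buckets


def sort_by(rows, key, num_reducers):
    """SORT BY analog: each reducer sorts only the rows it received."""
    return [sorted(bucket, key=key) for bucket in distribute(rows, num_reducers)]


def order_by(rows, key):
    """ORDER BY analog: a single total ordering over all rows."""
    return sorted(rows, key=key)


rows = [5, 1, 4, 2, 3, 6]
per_reducer = sort_by(rows, key=lambda r: r, num_reducers=2)
concatenated = [r for bucket in per_reducer for r in bucket]
total = order_by(rows, key=lambda r: r)
```

Here `per_reducer` holds two individually sorted lists, but `concatenated` is not globally sorted, whereas `total` is. The trade-off is that a total ordering funnels all rows through one sorting step, which can be slow on large inputs.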

Module 10: HiveQL – Views

Why use views?
View and map types for dynamic tables

Module 11: HiveQL – Indexes

Creating an index
Rebuilding the index

Module 12: Schema design

Schema on read vs schema on write
Table by day
Partitioning
Unique keys and normalization
Bucketing
Compression
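
Partitioning and bucketing from the list above can be pictured as two levels of physical layout: a partition becomes a directory named `column=value`, and within it rows are split into a fixed number of buckets by hashing a column. The Python sketch below is a simplified model under stated assumptions: plain modulo on an integer id stands in for Hive's hash function, and the `bucket_N` file names, the `logs` table and its columns are hypothetical.

```python
from collections import defaultdict


def partition_path(table, row, partition_col):
    """A partition maps to a directory named column=value under the table."""
    return f"{table}/{partition_col}={row[partition_col]}"


def bucket_for(row, bucket_col, num_buckets):
    """Pick a bucket by hashing the column (modulo stands in for the hash)."""
    return row[bucket_col] % num_buckets


def layout(table, rows, partition_col, bucket_col, num_buckets):
    """Return {file path: [bucket column values stored in that file]}."""
    files = defaultdict(list)
    for row in rows:
        directory = partition_path(table, row, partition_col)
        bucket = bucket_for(row, bucket_col, num_buckets)
        files[f"{directory}/bucket_{bucket}"].append(row[bucket_col])
    return dict(files)


rows = [
    {"user_id": 1, "dt": "2024-01-01"},
    {"user_id": 2, "dt": "2024-01-01"},
    {"user_id": 3, "dt": "2024-01-02"},
    {"user_id": 5, "dt": "2024-01-01"},
]
files = layout("logs", rows, partition_col="dt", bucket_col="user_id", num_buckets=2)
```

A query that filters on `dt` only needs to read the matching directory, and two tables bucketed the same way on `user_id` can be joined bucket by bucket. This is the motivation for combining both techniques in schema design.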

Module 13: Hive advanced

Hive optimization and tuning:
- Using explain
- Job execution plans
- Optimized joins
- Local mode
- Parallel execution
- Tuning with Limit
- Controlling the number of mappers and reducers
- Indexing
- Dynamic partitioning

Module 14: Functions

Standard functions
Aggregate functions
Table generating functions
Statistics and data mining functions in Hive
User defined functions

Module 15: Advanced format types

File formats:
- SequenceFile
- RCFile
CSV, TSV and SerDe-based files
XML
JSON SerDe

Prerequisites

SQL and basic UNIX or Linux commands
A background in Java is NOT required
Prior knowledge of Apache Hadoop is NOT required
