Big Data - Hadoop Ecosystem, Map-Reduce & HDFS

Course code: #3572 | Course length: 40 academic hours | Number of sessions: 5

Hadoop HDFS lets you manage a distributed file system and thereby make better use of your elastic domains. In most cases, Hadoop is also the infrastructure underlying the higher-level distributed abstractions and services, such as NoSQL implementations.

This course is designed to introduce you to:

  • Basic NoSQL database types, concepts and applications.
  • Hadoop architecture and frameworks.
  • The Hadoop ecosystem.

Hands-on exercises are included.

Target audience

  • Architects and developers

Prerequisites

  • Basic knowledge of Java.
  • Prior knowledge of Hadoop is not required.


Introduction to Big Data and NoSQL databases

  • RDBMS - Advantages and disadvantages.
  • NoSQL vs. traditional enterprise relational data.
  • ACID.
  • Dynamic schema, sharding, replication and caching.
  • Performance.
  • Scaling vs consistency.
  • CAP theorem.
  • NoSQL types and use cases.
  • Key/value stores.
  • Document databases.
  • Column oriented databases.
  • Graph databases. 

Introduction to Hadoop

  • The motivation for Hadoop.
  • Installation and configuration files.
  • Hadoop vs traditional data storage and processing.
  • Hadoop distributions: Cloudera, Hortonworks, MapR.
  • The building blocks of Hadoop.
  • NameNode, DataNode, etc.
  • Working with HDFS
  • Basic file commands
  • Architecture.
  • Reading and writing to HDFS programmatically.
  • Serialization: the Writable interface.
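
As a taste of the serialization topic, the sketch below mirrors the two-method contract of Hadoop's Writable interface (write to a DataOutput, readFields from a DataInput) in plain Java. SimpleWritable is a local stand-in so the example compiles without Hadoop on the classpath, and PageView and its fields are hypothetical names chosen for illustration. It is a minimal sketch, not course material.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in (assumption) that mirrors the two methods of
// org.apache.hadoop.io.Writable so no Hadoop dependency is needed.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type, used only for illustration.
class PageView implements SimpleWritable {
    String url = "";
    long hits;

    PageView() {}  // no-arg constructor: the object is filled in later by readFields

    PageView(String url, long hits) {
        this.url = url;
        this.hits = hits;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeLong(hits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in exactly the order they were written.
        url = in.readUTF();
        hits = in.readLong();
    }

    // Serialize any SimpleWritable to a byte array.
    static byte[] toBytes(SimpleWritable w) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Rebuild a PageView from bytes produced by toBytes.
    static PageView fromBytes(byte[] bytes) {
        PageView pv = new PageView();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            pv.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pv;
    }
}
```

Calling PageView.fromBytes(PageView.toBytes(new PageView("/index.html", 42L))) should give back an object with the same url and hits. In a real job the class would implement Hadoop's own Writable instead of the stand-in.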

Writing Map-Reduce

  • Hadoop data types.
  • InputFormat and OutputFormat
  • Classic Map-Reduce (Map-Reduce 1).
  • Mapper, Reducer, Partitioner and Combiner.
  • Distributed Cache.
  • Job scheduler
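
To make the roles of Mapper, Combiner, Partitioner and Reducer concrete, here is a single-process word count in plain Java. It is only a sketch of the data flow under simplifying assumptions: WordCountSketch and its methods are hypothetical names, no Hadoop classes are used, each input line stands in for one map task, and the reduce function doubles as the combiner because summing is associative and commutative. The partition formula follows the hash-modulo approach of Hadoop's default HashPartitioner.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(w, 1));
            }
        }
        return out;
    }

    // Partitioner: decide which reducer receives a given key.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum all values for one key (also used as the combiner).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines, int numReducers) {
        // One sorted group map per reducer, standing in for the shuffle and sort.
        List<Map<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            shuffled.add(new TreeMap<>());
        }
        for (String line : lines) {
            // Group this "map task's" output by key.
            Map<String, List<Integer>> local = new TreeMap<>();
            for (Map.Entry<String, Integer> kv : map(line)) {
                local.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            // Combiner: pre-aggregate locally, then route the partial sum to a reducer.
            for (Map.Entry<String, List<Integer>> e : local.entrySet()) {
                int partial = reduce(e.getKey(), e.getValue());
                shuffled.get(partition(e.getKey(), numReducers))
                        .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(partial);
            }
        }
        // Each reducer sums the partial counts for the keys it owns.
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, List<Integer>> part : shuffled) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return result;
    }
}
```

For example, WordCountSketch.run(java.util.Arrays.asList("the cat sat", "the cat ran the race"), 2) counts "the" three times and "cat" twice. The result does not depend on the number of reducers, since every occurrence of a key is routed to the same partition.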

YARN (Map-Reduce 2)

  • Motivation for YARN.
  • Architecture.
  • Features.


Hive

  • Introduction to Hive for ad-hoc queries
  • Hive basics
  • Hive data types
  • HiveQL


Pig

  • Introduction to Pig as a data flow language
  • Pig Latin basic expressions
  • Operators for data processing


HBase

  • Introduction to HBase for processing huge tables
  • HBase data model
  • HBase vs. RDBMS
  • Client API (CRUD, queries and batch operations)
  • Interactive REST clients

Apache Spark

  • Basics
  • More on RDD Operations
  • Caching
  • Modules built on Spark
  • Spark Streaming
  • Spark SQL