Streaming and Transporting Data into Apache Hadoop

Course ID: #3576 | Course duration: 8 academic hours | Number of sessions: 1

Over the last few years, many organizations have made a strategic decision to turn to big data. At the heart of this challenge is the process of extracting data from many sources, transforming it, and then loading it into a data warehouse for subsequent analysis, a process known as “Extract, Transform, and Load” (ETL).

Apache Hadoop is one of the most common platforms for managing big data. In this course, we'll introduce you to three common tools for transporting and streaming data into the Hadoop Distributed File System (HDFS):

  • Data transfer between Hadoop and relational databases using Apache Sqoop
  • Collecting, aggregating, and moving large amounts of streaming data into Hadoop using Apache Flume and Apache Kafka

Hands-on exercises are included.

This course is offered only to groups on behalf of organizations; inquiries can be submitted only for a group.

Target Audience

This course is mainly intended for System Administrators, Developers, Business Intelligence professionals, and other roles responsible for transferring data into Hadoop.

Prerequisites

  • Working experience with Databases
  • Prior knowledge of Hadoop, in particular working experience with HDFS
  • Basic understanding of Apache Hive


Working with Apache Sqoop

  • Introduction to Sqoop
  • Import Architecture
  • Transferring RDBMS tables into HDFS
  • Integrating with Hive
  • Incremental Import
  • Export Architecture
  • Exporting data from HDFS into an RDBMS

Working with Apache Flume

  • Introduction to Flume
  • Flume Architecture
  • Setting up Flume Agents

Working with Apache Kafka

  • Introduction to Kafka
  • Use Cases
  • Kafka in the Enterprise
  • Topics and Partitions
  • Brokers
  • Topic Replication Factor
  • Producers
  • Consumers
  • ZooKeeper
  • Kafka Basic Configuration