Big Data Analytics

Course code: #3572 | Course duration: 40 academic hours | Number of sessions: 5

Hadoop lets you manage a distributed file system and thereby make better use of your elastic environments. In most cases, Hadoop is also the infrastructure underlying the higher-level distributed abstractions and services.
This course is designed to introduce you to:

  • Hadoop ecosystem
  • Hadoop architecture and frameworks
  • Basic NoSQL database types, concepts and applications

Hands-on exercises are included.

For further details, call 03-7100673

Target Audience

This course is mainly intended for Database Administrators, Database Developers, Business Intelligence professionals, QA professionals, Data Analysts, and other roles responsible for analyzing high volumes of data.

Prerequisites

  • Basic knowledge of Python, or experience with other programming languages
  • Working experience with databases
  • Prior knowledge of Hadoop is not required

Topics

  • Introduction to Big Data and NoSQL databases
    • RDBMS - Advantages and disadvantages.
    • NoSQL vs. traditional enterprise relational data
    • ACID.
    • Dynamic schema, sharding, replications and caching.
    • Performance.
    • Scaling vs consistency.
    • CAP theorem.
    • NoSQL types and use cases
    • Key/value stores.
    • Document databases.
    • Column-oriented databases.
    • Graph databases. 
  • Introduction to Hadoop
    • The motivation for Hadoop.
    • Installation and configuration files.
    • Hadoop vs traditional data storage and processing.
    • Hadoop distributions: Cloudera, Hortonworks, MapR
    • The building blocks of Hadoop.
    • NameNode, DataNode, etc.
    • Working with HDFS
    • Basic file commands
    • Architecture.
    • Reading and writing to HDFS programmatically.
  • YARN (MapReduce 2)
    • Motivation for YARN.
    • Architecture.
    • Features.
  • Hive
    • Introduction to Hive for ad-hoc queries
    • Hive data types
    • HiveQL
  • Pig
    • Introduction to Pig as a data flow language
    • Pig Latin basic expressions
    • Operators for data processing
  • HBase
    • Introduction to HBase for processing huge tables
    • HBase data model
    • HBase vs. RDBMS
    • Client API (CRUD, queries and batch operations)
    • Interactive REST clients
  • Apache Spark (PySpark)
    • Basics
    • More on RDD Operations
    • Caching
    • Modules built on Spark
    • Spark Streaming
    • Spark SQL