Big Data Analytics

Catalog number: #3572 | Course length: 40 academic hours | Number of sessions: 5

Business success in the information age is predicated on the ability of organizations to convert massive amounts of raw data coming from various sources into high-grade business information.
Many organizations are overwhelmed by the sheer volume of information they have to process in order to stay competitive. Traditional database systems may become prohibitively expensive as data volumes grow exponentially, or may simply prove unsuitable for the job.
Hadoop lets you manage a distributed file system and thereby make better use of elastic compute and storage resources. In most cases, Hadoop also serves as the infrastructure for the higher-level distributed abstractions and services built on top of it.
In addition to Hadoop, there are several other popular platforms designed to address Big Data use cases.

This course is designed to introduce you to:

  • The Hadoop ecosystem
  • Hadoop architecture and frameworks
  • Basic NoSQL database types, concepts and applications

Hands-on exercises are included.

Target Audience

This course is mainly intended for Database Administrators, Database Developers, Business Intelligence professionals, QA professionals, Data Analysts, and other roles responsible for analyzing high volumes of data.

Prerequisites

  • Basic knowledge of Python, or experience with another programming language
  • Working experience with databases
  • Prior knowledge of Hadoop or any other Big Data solution is not required

Topics

Introduction to Big Data and NoSQL Databases

  • RDBMS - Advantages and disadvantages
  • NoSQL vs. traditional enterprise relational data
  • ACID
  • Dynamic schema, sharding, replications and caching
  • Performance
  • Scaling vs. consistency
  • CAP theorem
  • NoSQL types and use cases
  • Key/value stores
  • Document databases
  • Column oriented databases
  • Graph databases
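
To make the four NoSQL data models concrete, here is a minimal Python sketch (illustrative only, not part of the course labs) showing how the same customer record might be represented in a key/value store, a document database, a column-oriented store and a graph database; all names and values are made up for the example.

    # A hypothetical customer record expressed in the four common NoSQL data models.

    # Key/value store: an opaque value addressed by a single key.
    kv_store = {"customer:1001": '{"name": "Dana", "city": "Tel Aviv"}'}

    # Document database: a self-describing, nested JSON-like document.
    document = {"_id": 1001, "name": "Dana", "address": {"city": "Tel Aviv"},
                "orders": [{"sku": "A17", "qty": 2}]}

    # Column-oriented store: values grouped into column families under a row key.
    wide_row = {"row_key": "1001",
                "profile": {"name": "Dana", "city": "Tel Aviv"},
                "orders": {"2024-01-05": "A17"}}

    # Graph database: explicit nodes and relationships.
    nodes = [{"id": 1001, "label": "Customer", "name": "Dana"},
             {"id": "A17", "label": "Product"}]
    relationships = [{"from": 1001, "to": "A17", "type": "PURCHASED"}]

    print(document["address"]["city"])  # documents are queried by structure, not by joins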

Introduction to Hadoop

  • The motivation for Hadoop
  • Installation and configuration files
  • Hadoop vs. traditional data storage and processing
  • Hadoop distributions: Cloudera, Hortonworks, MapR
  • The building blocks of Hadoop
  • NameNode, DataNode, etc.
  • Working with HDFS
  • Basic file commands
  • Architecture
  • Reading and writing to HDFS programmatically
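
As a taste of reading and writing to HDFS programmatically, here is a minimal Python sketch using the HdfsCLI (hdfs) package over WebHDFS; the NameNode URL, user name and paths are assumptions for the example, and the same operations can also be run from the shell with hdfs dfs commands.

    from hdfs import InsecureClient  # HdfsCLI package, talks to HDFS over WebHDFS

    # Assumed NameNode WebHDFS endpoint and user; adjust to your cluster.
    client = InsecureClient('http://namenode:9870', user='student')

    # Write a small text file into HDFS (roughly: hdfs dfs -put).
    client.write('/user/student/hello.txt', data='hello hdfs\n',
                 encoding='utf-8', overwrite=True)

    # List a directory (roughly: hdfs dfs -ls /user/student).
    print(client.list('/user/student'))

    # Read the file back (roughly: hdfs dfs -cat).
    with client.read('/user/student/hello.txt', encoding='utf-8') as reader:
        print(reader.read())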

YARN (Map-Reduce 2)

  • Motivation for YARN
  • Architecture
  • Features
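
To make the ResourceManager's role concrete, the sketch below queries YARN's ResourceManager REST API for cluster metrics and running applications; the host name and default port 8088 are assumptions for an unsecured test cluster, and similar information is available from the yarn CLI (e.g. yarn application -list).

    import requests  # plain HTTP client; the ResourceManager exposes a REST API

    # Assumed ResourceManager web address (8088 is the usual unsecured default).
    RM = 'http://resourcemanager:8088'

    # Cluster-level metrics reported by the ResourceManager.
    metrics = requests.get(f'{RM}/ws/v1/cluster/metrics').json()
    print('active nodes:', metrics['clusterMetrics']['activeNodes'])

    # Applications currently running (similar to: yarn application -list).
    apps = requests.get(f'{RM}/ws/v1/cluster/apps', params={'states': 'RUNNING'}).json()
    for app in ((apps['apps'] or {}).get('app') or []):
        print(app['id'], app['name'], app['state'])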

Hive

  • Introduction to Hive for ad-hoc queries
  • Hive data types
  • HiveQL
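
The sketch below gives a flavour of ad-hoc HiveQL from Python via the PyHive library; the HiveServer2 host and the page_views table are assumptions made up for the example.

    from pyhive import hive  # PyHive client for HiveServer2

    # Assumed HiveServer2 endpoint; adjust host/port/username to your cluster.
    conn = hive.Connection(host='hiveserver', port=10000, username='student')
    cursor = conn.cursor()

    # A typical ad-hoc HiveQL aggregation over a (hypothetical) page_views table.
    cursor.execute("""
        SELECT country, COUNT(*) AS views
        FROM page_views
        WHERE view_date >= '2024-01-01'
        GROUP BY country
        ORDER BY views DESC
        LIMIT 10
    """)
    for country, views in cursor.fetchall():
        print(country, views)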

Pig

  • Introduction to Pig as data flow language
  • Pig Latin basic expressions
  • Operators for data processing
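
For a feel of Pig Latin as a data-flow language, the sketch below embeds a small hypothetical script in Python and submits it with the pig command-line tool; the input path and field layout are assumptions, and a local Pig installation is required to actually run it.

    import subprocess

    # A minimal Pig Latin data flow: load, filter, group, aggregate, dump.
    pig_script = """
    logs   = LOAD '/data/access_logs' USING PigStorage('\\t')
             AS (user:chararray, url:chararray, bytes:long);
    big    = FILTER logs BY bytes > 1024;
    byuser = GROUP big BY user;
    totals = FOREACH byuser GENERATE group AS user, SUM(big.bytes) AS total_bytes;
    DUMP totals;
    """

    with open('traffic.pig', 'w') as f:
        f.write(pig_script)

    # Submit the script to the cluster with the pig CLI.
    subprocess.run(['pig', 'traffic.pig'], check=True)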

HBase

  • Introduction to HBase for processing huge tables
  • HBase data model
  • HBase vs. RDBMS
  • Client API (CRUD, queries and batch operations)
  • Interactive REST clients
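
The client API topic can be previewed with the happybase Python library, which talks to HBase through its Thrift gateway; the gateway host, the users table and the info column family below are assumptions for illustration.

    import happybase  # Python client for HBase via the Thrift gateway

    # Assumed Thrift server host and an existing table 'users' with column family 'info'.
    connection = happybase.Connection('hbase-thrift-host')
    table = connection.table('users')

    # Create / update: put a row keyed by user id.
    table.put(b'user-1001', {b'info:name': b'Dana', b'info:city': b'Tel Aviv'})

    # Read: fetch a single row, or scan a range of row keys.
    print(table.row(b'user-1001'))
    for key, data in table.scan(row_prefix=b'user-'):
        print(key, data)

    # Delete: remove the row.
    table.delete(b'user-1001')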

Apache Spark (PySpark)

  • Basics
  • More on RDD Operations
  • Caching
  • Modules built on Spark
  • Spark Streaming
  • Spark SQL
  • Introduction to Machine Learning using PySpark
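
A short PySpark sketch tying several of these topics together: basic RDD operations, caching, and a Spark SQL query over a DataFrame. The data is generated in-line, so the example runs against a local Spark installation.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("intro-demo").getOrCreate()
    sc = spark.sparkContext

    # Basic RDD operations: map, filter, reduce, with caching of an intermediate result.
    numbers = sc.parallelize(range(1, 101))
    squares = numbers.map(lambda x: x * x).cache()   # cached for reuse
    even_sum = squares.filter(lambda x: x % 2 == 0).reduce(lambda a, b: a + b)
    print("sum of even squares:", even_sum)

    # Spark SQL: the same data as a DataFrame queried with SQL.
    df = spark.createDataFrame([(x, x * x) for x in range(1, 101)], ["n", "square"])
    df.createOrReplaceTempView("squares")
    spark.sql("SELECT COUNT(*) AS big FROM squares WHERE square > 5000").show()

    spark.stop()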

MongoDB

  • Introduction to MongoDB
  • Core concepts
  • Environments
  • CRUD and the MongoDB Shell
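
The CRUD topic maps directly onto the pymongo driver, which mirrors the MongoDB shell; the connection string, database and collection names below are assumptions for a local test instance.

    from pymongo import MongoClient

    # Assumed local MongoDB instance; adjust the URI for your environment.
    client = MongoClient('mongodb://localhost:27017')
    customers = client['demo_db']['customers']

    # Create
    customers.insert_one({'name': 'Dana', 'city': 'Tel Aviv', 'orders': 3})

    # Read
    print(customers.find_one({'name': 'Dana'}))

    # Update
    customers.update_one({'name': 'Dana'}, {'$set': {'orders': 4}})

    # Delete
    customers.delete_one({'name': 'Dana'})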

Neo4j

  • Graph database concepts
  • Basic querying - Nodes and Relationships
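
Basic querying of nodes and relationships is done in Cypher; the sketch below uses the official neo4j Python driver against a local instance, with the URI, credentials and labels being assumptions for the example.

    from neo4j import GraphDatabase

    # Assumed local Neo4j instance and credentials.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    with driver.session() as session:
        # Create two nodes and a relationship between them.
        session.run(
            "MERGE (a:Person {name: $a}) "
            "MERGE (b:Person {name: $b}) "
            "MERGE (a)-[:KNOWS]->(b)",
            a="Dana", b="Noa",
        )
        # Query nodes through their relationships.
        result = session.run(
            "MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS a, b.name AS b")
        for record in result:
            print(record["a"], "knows", record["b"])

    driver.close()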