~ Home ~

Description

This course aims to give students an understanding of the operating principles of mainstream Big Data computing systems, together with hands-on experience in using them. Open-source platforms for Big Data processing and analytics will be discussed. Data mining algorithms and machine learning applications form another major stream of the course. In addition, widely adopted optimization methods and models for Big Data analytics will be investigated. Topics to be covered include:

Programming models and design patterns for mainstream Big Data computational frameworks;
System architecture and resource management for data-center-scale computing;
Algorithm design for Big Data analytics, e.g., SVM models, K-means clustering, deep neural networks;
Optimization methods, e.g., convex optimization, gradient descent, online optimization.

Course Pre-requisite:

This course contains substantial hands-on components that require a solid background in programming and practical experience with operating systems. If you have never used a command-line interface to install, configure, or manage an operating system, e.g., a Linux-based one, you will need to pick up these skills yourself, and IT CAN BE VERY TIME-CONSUMING for you to complete the homework. (Students without the required background described above may take several tens of hours to finish EACH homework assignment.)

Course Information

Lecture time and venue:

6F503 (2:30pm - 5:00pm, Tuesday)

Instructor:

Dr. Huanle Xu. xhlcuhk [at] gmail [dot] com
Office hours: Fri 4:30-5:15pm or by appointment (9A 304)

Teaching Assistant:

Zizhao Mo yc17461@connect.um.edu.mo

Recommended Programming References

[DataAlgorithms] Data Algorithms: Recipes for Scaling Up with Hadoop and Spark, by Mahmoud Parsian, published by O'Reilly Media, Aug 2015
[Pig] Programming Pig, by Alan Gates, published by O'Reilly Media
[Hive] Programming Hive, by Edward Capriolo, Dean Wampler, Jason Rutherglen, published by O'Reilly Media
[OpenStackOp] OpenStack Operations Guide, published by O'Reilly Media (current version available online at: http://docs.openstack.org/openstack-ops/content )

Course Assessment

Your grade will be based on the following components:

Homework & Programming assignments (about 3-4 sets in total): 40%
Project: 60%