~ Home ~



This course aims to give students an understanding of the operating principles of mainstream Big Data computing systems, along with hands-on experience using them. Open-source platforms for Big Data processing and analytics will be discussed. Data mining algorithms and machine learning applications form another major stream of the course. In addition, widely adopted optimization methods and models for Big Data analytics will be investigated. Topics to be covered include:

  • Programming models and design patterns for mainstream Big Data computational frameworks;

  • System architecture and resource management for data-center-scale computing;

  • Algorithm design for Big Data analytics, e.g., SVM models, K-means clustering, deep neural networks;

  • Optimization methods, e.g., convex optimization, gradient descent, online optimization.

Course Prerequisite:

This course contains substantial hands-on components that require a solid background in programming and hands-on experience with operating systems. If you have never used a command-line interface to install, configure, or manage an operating system (e.g., a Linux-based one), you will need to pick up these skills yourself, and IT CAN BE VERY TIME-CONSUMING for you to complete the homework. (Students without the required background may take several tens of hours to finish EACH homework assignment.)

Course Information

Lecture time and venue:

  • 6F503 (Tuesday, 2:30pm - 5:00pm)

Instructor:

  • Dr. Huanle Xu, xhlcuhk [at] gmail [dot] com
  • Office hours: Fri 4:30-5:15pm or by appointment (9A 304)

Teaching Assistant:

  • Zizhao Mo, yc17461@connect.um.edu.mo

Recommended Programming References

  • [DataAlgorithms] Data Algorithms: Recipes for Scaling Up with Hadoop and Spark, by Mahmoud Parsian, published by O'Reilly Media, Aug 2015

  • [Pig] Programming Pig, by Alan Gates, published by O'Reilly Media

  • [Hive] Programming Hive, by Edward Capriolo, Dean Wampler, Jason Rutherglen, published by O'Reilly Media

  • [OpenStackOp] OpenStack Operations Guide, published by O'Reilly Media (current version available online at http://docs.openstack.org/openstack-ops/content)

Course Assessment

Your grade will be based on the following components:

  • Homework & Programming assignments (about 3-4 sets in total): 40%
  • Project: 60%