~ Reference ~
Previous course website
http://mobitec.ie.cuhk.edu.hk/iems5709Spring2016
Additional References
Related Courses offered Elsewhere
[JLinUWaterloo] INST 767 Big Data Infrastructure, by Jimmy Lin, University of Waterloo Course INST 767, http://lintool.github.io/UMD-courses/bigdata-2015-Spring
[UIUCcs498] CS498 Cloud Computing, by Roy Campbell and Reza Farivar, UIUC.
[UPennNETS] NETS212 Scalable and Cloud Computing, by Andreas Haeberlen, UPenn
[UPennCS555] CIS455/555 Internet and Web Systems, by Andreas Haeberlen, UPenn
[CornellBirman] CS5412 Cloud Computing, by Ken Birman, Cornell
[CMUQatar] 15-319 Cloud Computing, by M. F. Sakr and M. Hammoud, CMU Qatar
[LASERsummer2013] Software for the Cloud and Big Data, 10th LASER Summer School on Software Engineering, Sept 2013, http://laser.inf.ethz.ch/2013/lectures.php
[TwitterUCB] Analyzing Big Data with Twitter, by Marti Hearst et al, UC Berkeley School of Information, Course i290, http://blogs.ischool.berkeley.edu/i290-abdt-s12/
[JLeskovecMMDS] Mining Massive Data Sets, by Jure Leskovec, Stanford Course CS246, http://www.stanford.edu/class/cs246/
[ASmolaUCB] Scalable Machine Learning, by Alex Smola, UC Berkeley Course Statistics 241B, CS281B, http://alex.smola.org/teaching/berkeley2012/
[WCohenCMU] Machine Learning with Large Datasets, by William W. Cohen, CMU Course 10-605, http://curtis.ml.cmu.edu/w/courses/index.php/Machine_Learning_with_Large_Datasets_10-605_in_Spring_2014
General Big Data
- [Tim Harford] Tim Harford, Big data: are we making a big mistake?
Infrastructure for Big Data Processing/ Cloud Computing
[DataCenter] The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, by Luiz André Barroso and Urs Hölzle, Published by Morgan and Claypool, 2009, http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F09/wharehousesizedcomputers.pdf
[CloudData] Siba Mohammad, Sebastian Breß, Eike Schallehn, "Cloud Data Management: A Short Overview and Comparison of Current Approaches," 24th GI-Workshop on Foundations of Databases, May 2012. http://ceur-ws.org/Vol-850/paper_mohammad.pdf
[JupiterRising] Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network, SIGCOMM, 2015, http://www.datascienceassn.org/sites/default/files/Jupiter Rising - A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network.pdf
MapReduce and other Big Data Processing Platforms
[Cloudera] Cloudera Developer Training for Apache Hadoop, http://cloudera.com/content/cloudera/en/training/courses/developer-training.html , http://www.cloudera.com/content/dam/cloudera/Resources/PDF/Developer_Training_for_Apache_Hadoop.pdf
[MMDSHadoopLabs] Mining Massive Data Sets: Hadoop Labs, by Daniel Templeton and Jure Leskovec, Stanford Course CS246H, http://www.stanford.edu/class/cs246h/
[PlatformsKentU] Advanced Computing Platforms for Data Processing, by Ruoming Jin, Kent State University Course, http://www.cs.kent.edu/~jin/Cloud12Spring/Cloud.html
[BDAS] The Berkeley Data Analytics Stack (BDAS), https://amplab.cs.berkeley.edu/software/
[Mahout] Apache Mahout: Scalable Machine Learning and Data Mining, http://mahout.apache.org
[TeraSort] TeraByte Sort on Apache Hadoop, Yahoo, http://sortbenchmark.org/YahooHadoop.pdf
[TeraSort] TeraSort using Hadoop, http://www.slideshare.net/tungld/terasort
[Kay Ousterhout] Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker and Byung-Gon Chun, Making Sense of Performance in Data Analytics Frameworks, NSDI 2015. https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-ousterhout.pdf
Mining Massive Graphs and Graph-based Processing Platforms
[PowerLaw] Zipf, Power-Laws and Pareto: A Ranking Tutorial, by L. Adamic, http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
[Pregel] G. Malewicz et al, “Pregel: A System for Large-Scale Graph Processing,” ACM SIGMOD 2010.
[GraphLab] GraphLab: Large-scale Machine Learning on Graphs, http://graphlab.org
[GraphLab2] Carlos Guestrin et al, “GraphLab 2: Parallel Machine Learning for Large-Scale Natural Graphs,” NIPS Big Learning Workshop 2011.
[GraphLab1] Yucheng Low, Joseph Gonzalez et al, “GraphLab: A New Framework for Parallel Machine Learning,” UAI 2010.
[PowerGraph] Joseph Gonzalez et al, “PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,” OSDI 2012.
Data Stream Processing Algorithms
[GaroRamaUCB] CS286 Implementation of Database Systems, UC Berkeley, Minos Garofalakis, Raghu Ramakrishnan, http://db.cs.berkeley.edu/cs286sp07/
[JXu] A Tutorial on Network Data Streaming, by Jun (Jim) Xu, ACM SIGMETRICS 2007, http://www.cc.gatech.edu/~jx/8803DS08/sigm07.pdf
[SmolaUCB] Stat 260 Scalable Machine Learning of UC Berkeley, by Alex Smola, CMU, http://alex.smola.org/teaching/berkeley2012/streams.html
High-level Big Data Query Language/ Processing Systems
- Pig Cheat Sheet from Mortar: http://mortar-public-site-content.s3-website-us-east-1.amazonaws.com
- Pig on Spark: https://cwiki.apache.org/confluence/display/PIG/Pig+on+Spark and the original effort it grew out of, Spork: https://www.sigmoid.com/spork/
- Hive Cheat Sheet for SQL users from Hortonworks: http://hortonworks.com/blog/hive-cheat-sheet-for-sql-users/
- A List of Subtle Differences Between HiveQL and SQL: http://spryinc.com/blog/list-subtle-differences-between-hiveql-and-sql
- A VLDB 2015 tutorial on SQL-on-Hadoop Systems by Daniel Abadi et al: Abstract: http://cs-www.cs.yale.edu/homes/dna/papers/sql-on-hadoop-tutorial.pdf ; Slides: http://www.slideshare.net/abadid/sqlonhadoop-tutorial
Big Data Processing Architectures in the Real World
[FacebookHive] Hive - A Peta-scale Data Warehouse System on Hadoop, by Ning Zhang, Data Infrastructure Team in Facebook, https://www.facebook.com/notes/facebook-engineering/hive-a-petabyte-scale-data-warehouse-using-hadoop/89508453919
[FacebookDataArch] Peta-scale Data at Facebook, by Dhruba Borthakur, XLDB Conference at Stanford University, 2012, http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf