~ Project ~


1. Handwritten Digits Classification with Kernel-SVM

The dataset is available from the MNIST database. In this project, you need to do the following:

  • SVM method: Use a kernel method to train the SVM model on MapReduce and classify the digits.
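
One possible way to split the kernel computation into map and reduce steps is sketched below. This is only an illustration under assumptions: the RBF kernel, the value of gamma, the toy data, and the function names (rbf_kernel, map_gram_rows, reduce_gram) are all hypothetical choices, and the map and reduce phases are simulated in plain Python rather than run on Hadoop. The project does not prescribe this particular decomposition.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2). gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def map_gram_rows(partition, data, gamma=0.5):
    """Map step: for each sample index in this partition, emit (row_index, kernel_row)."""
    for i in partition:
        yield i, [rbf_kernel(data[i], x, gamma) for x in data]

def reduce_gram(mapped):
    """Reduce step: assemble the emitted rows into the Gram matrix, ordered by row index."""
    return [row for _, row in sorted(mapped, key=lambda pair: pair[0])]

# Toy data standing in for flattened digit images (hypothetical values).
data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
partitions = [[0, 1], [2]]  # sample indices assigned to two simulated mappers

mapped = [pair for part in partitions for pair in map_gram_rows(part, data)]
gram = reduce_gram(mapped)
```

The resulting Gram matrix could then be handed to any SVM solver that accepts a precomputed kernel. How the solver itself is distributed is a separate design decision.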

2. Handwritten Digits Classification with CNN

Using the same dataset as above, you need to do the following:

  • In the first step, train a Convolutional Neural Network on a single CPU and test it.

  • In the second step, run distributed training on at least two CPUs/GPUs and evaluate the training time.
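
The idea behind the second step can be illustrated with a minimal sketch of synchronous data-parallel training, assuming the common scheme in which each worker computes a gradient on its own shard and the gradients are then averaged. The scalar linear model, the toy data, and the function names are hypothetical stand-ins for a real CNN and framework. The workers are simulated sequentially here.

```python
def gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y)^2 over one shard, for the scalar model y = w * x."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.1):
    """One synchronous step: local gradients per worker, then an averaging (all-reduce) step."""
    local_grads = [gradient(w, shard) for shard in shards]  # one entry per simulated worker
    avg_grad = sum(local_grads) / len(local_grads)          # average across workers
    return w - lr * avg_grad

# Toy data generated from y = 2x (hypothetical values).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]  # two equally sized shards, one per worker

w_parallel = data_parallel_step(0.0, shards)  # two simulated workers
w_single = data_parallel_step(0.0, [data])    # single-worker baseline
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so both calls produce the same update. For the timing evaluation, wall-clock time around the training loop can be measured with time.perf_counter() in both the single-device and the distributed runs.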

3. Comparison between Ceph and HDFS

In this project, you need to install Hadoop with Ceph, another popular distributed file system. Run the TeraSort benchmark with at least 1 TB of input data to compare the performance of Ceph and HDFS. The comparison should include:

  • The overall running time of TeraSort under each file system.

  • The actual I/O throughput.
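
One way to report the throughput figure is to divide the total bytes read and written by the elapsed time. The helper below is a minimal sketch under that assumption. The function name and the input numbers are hypothetical placeholders, not measurements, and the byte counts would in practice come from the job counters of each run.

```python
def io_throughput_mb_s(bytes_read, bytes_written, elapsed_seconds):
    """Aggregate I/O throughput in MB/s, using 1 MB = 10**6 bytes."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (bytes_read + bytes_written) / elapsed_seconds / 1e6

TB = 10**12  # decimal terabyte

# Placeholder values only: 1 TB read, 1 TB written, 1000 seconds elapsed.
example = io_throughput_mb_s(TB, TB, 1000)
```

Using the same definition and the same units for both Ceph and HDFS keeps the two numbers directly comparable.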

4. Graph Algorithm Implemented in MapReduce

In this project, you need to implement graph algorithms in parallel under MapReduce. The input graph should be at least 16 GB in size, and the number of machines should be no fewer than 3. At a minimum, you shall run the following two graph algorithms:

  • Find the minimum spanning tree of a graph.
  • Find the connected components of a graph.

You also need to compare the running time of the parallel case with that of the single-machine case.
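
The connected-components task can be approached with iterative minimum-label propagation, where each iteration corresponds to one MapReduce job. The sketch below simulates that idea in plain Python. The function names and the toy graph are hypothetical, and this is one possible approach rather than a required one.

```python
def map_phase(labels, edges):
    """Emit (vertex, candidate_label): the vertex's own label plus each neighbor's label."""
    for v, label in labels.items():
        yield v, label
    for u, v in edges:
        yield u, labels[v]
        yield v, labels[u]

def reduce_phase(pairs):
    """For each vertex, keep the smallest candidate label it received."""
    best = {}
    for v, label in pairs:
        best[v] = min(label, best.get(v, label))
    return best

def connected_components(vertices, edges):
    """Repeat map and reduce rounds until no label changes."""
    labels = {v: v for v in vertices}  # every vertex starts in its own component
    while True:
        new_labels = reduce_phase(map_phase(labels, edges))
        if new_labels == labels:
            return labels
        labels = new_labels

# Toy undirected graph (hypothetical): two multi-vertex components and one isolated vertex.
edges = [(1, 2), (2, 3), (4, 5)]
labels = connected_components([1, 2, 3, 4, 5, 6], edges)
```

On termination, vertices that share a label belong to the same component. The number of rounds grows with the graph diameter, which is worth keeping in mind when comparing the parallel and single-machine running times.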