My Blog

In this blog, I mainly talk about three important research issues for MapReduce framework which are:

Job scheduling for minimizing the total response time
Data locality issue
Speculative Execution

For each issue I will make two categories which are theoretical analysis based optimization and heuristic based algorithm design. Hope you can get something useful from this summary.

The Power of Choice in Data-Aware Cluster Scheduling This paper acutally studies two problems which are speculative execution and data locality.

Job Scheduling

Theoretical Analysis based:

B. Moseley, A. Dasgupta, R. Kumar, and T. Sarlos, “On scheduling in map-reduce and flow-shops,” in SPAA 2011. This paper seeks to minimize the flowtime of jobs which is motivated from the two-stage flow shop problem. The tasks to machine mapping is unknown and needs to be determined such that the total flowtime is minimized. It gives the offline and online algorithm in both homogeneous and heterogenous environment. The analysis here is nontrivial and gives us a flavor how traditional SRPT algorithm can apply to the scheduling in MapReduce System.
Hyunseok Chang, Murali Kodialam, Ramana Rao Kompella, T. V. Lakshman, Myungjin Lee, Sarit Mukherjee, “Scheduling in MapReduce-like Systems for Fast completion time,” in Infocom 2011. This paper seeks to optimize the total completion time of job running in the Mapreduce system. It decides the scheduling order of tasks on each processor. Moreover, this paper doesn’t consider the dependency between map task and reduce tasks. A linear program is formulated to relax the original problem. It then outlines a combinatorial approach to solve the LP problem.
F. Chen, M. Kodialam, and T. Lakshman, “Joint scheduling of processing and shuffle phases in MapReduce systems,” in Infocom 2012. Actually this paper is the follow-up work of the above paper. To be more practical, it considers the dependency between map and reduce tasks. Moreover, it jointly optimizes the job completion time by imposing the shuffle phase constraint.
Jian Tan, Xiaoqiao Meng, Li Zhang, “Delay Tails in MapReduce Scheduling,” in SIGMETRICS 2012. This paper focuses on the the starvation issue of small jobs under fair scheduler in MapReduce. The root reason of this problem is caused by launching the reduce tasks so early. To efficiently solve this issue and improve the job response performance, it presents a couple scheduler which assign the reduce tasks based on the progress of the map phase. The theoretical analysis here is very complicated.
Yousi Zheng, Ness Shroff, Prasun Sinha, “A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers,” in Infocom 2013. This paper is the first work to write down the formulation of mapreduce job scheduling including both map and reduce phases. There is no distinguish between map and reduce slots here. It uses adopts the knowledge in probability to prove that any work-conserving scheduler can achieves a constant efficiency ratio (not competitive ratio). Moreover, this paper presents the ASRPT algorithm which is inspired from the famous SRPT (Shortest remaining time first) algorithm and bound the competitive ratio to be 2.
M. Lin, L. Zhang, A. Wierman and J. Tan, “Joint optimization of overlapping phases in MapReduce,” in IFIP 2013.. This is the first work to consider the overlapping of map phase and shuffle phase so far. A nice formulation is also written down here. Hover, even the offline case with batch arrival is shown to be NP-Complete. It then presents the MaxSRPT ans SpiltSRPT algorithm which are complementary to solve schedule the jobs.

Heuristic based:

I haven’t read any papers which present heuristic-based algorithm to optimize the job completion time in MapReduce system.

Data locality

Theoretical Analysis based:

Qiaomin Xie, Yi Lu, “Degree-guided map-reduce task assignment with data locality constraint,” in ISIT 2012. This paper presents the peeling algorithm and adopts the mean-field equations to analyze the evolution (I cannot understand it well). However, it makes an impractical assumption which is that all the tasks with data locality on the machine can be finished in a time slot with probability p while the tasks without data locality can only be finished with probability q and p > q. Also the theoretical here is very complex.
W. Wang, K. Zhu, L. Ying, J. Tan, L. Zhang, “Map Task Scheduling in MapReduce with Data Locality: Throughput and Heavy-Traffic Optimality,” in Infocom 2013. This paper makes the same assumption as the above paper presented in ISIT 2012. Its task scheduling algorithm includes two aspects which are Routing and Scheduling. The purpose of routing is to keep load balancing for all the machines. Also scheduling process makes a very simple decision based on the length of queues. It adopts the Lyapunov function to prove the stability of the algorithm. The analysis here is also nontrivial.
J. Tan, X. Meng, L. Zhang, “Improving ReduceTask Data Locality for Sequential MapReduce Jobs,” in Infocom 2013. This is the first paper to analyze the data locality issue for reduce tasks with theoretical analysis. Actually, the model here is quite simple, the objective is only to minimize the data transfer cost in shuffle phase with proper reduce task placement. The data transfer cost gives a very good angle for analyzing data locality issue.

Heuristic based:

Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar and Andrew Goldberg, “Quincy: Fair Scheduling for Distributed Computing Clusters,” in SOSP 2009. This is the first paper to address the data locality issue and fairness problem in MapReduce-like systems. It encodes the scheduling as a flow network. In this network, the edge weights encode the demands of data locality and fairness. This is a very novel and beautiful work.
Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Sunnyvale, Scott Shenker, Ion Stoica, “Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling,” in EuroSys 2010. This paper is very famous among all the papers which address the data locality issue. It presents the delay scheduling algorithm which delay the job at the head of the queue a little bit if data locality is not achieved with the available slot. As described next, this algorithm has a obvious drawback: there exists a probability that some slots wont’e be launching any tasks and keeps idle all the way. This phenomenon waste the resource in the cluster and needs to be improved.
J. Tan, X. Meng, L. Zhang, “Coupling Task Progress for MapReduce Resource-Aware Scheduling”, in Infocom 2013. This paper mainly includes 3 pieces of work and for each work a heuristic-based algorithm I is proposed: 1) Compute the mismatch between the map and reduce progress. The job gets the maximum mismatch value has the highest priority to assign a reduce task. 2) Waiting scheduling for reduce task. The main idea behind this algorithm is to assign a reduce task to an available slot who is close to the ‘centrality’ of the already generated intermediate date produced by the map phase. 3) Random peaking algorithm to assign the map task. This algorithm is motivated from the famous ‘delay scheduling’ algorithm may cause a bad resource utilization level. Actually, the heuristic algorithm for assigning map tasks is very complex to implement and doesn’t make a clear intuition. However, the bad resource utilization for ‘Delay Scheduling’ is a very good observation.

Speculative execution

Heuristic based:

J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, in USENIX OSDI, 2004. Actually, this paper is the technical report to the MapReduce system design of Google. It talks about the back up strategy with a short paragraph. The strategy is that when a MapReduce operation is close to complete, it will randomly choose some remaining tasks and launch the duplicates of them on alternative machines. It concludes that this speculative execution scheme can decrease the job execution time by 44%.
M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, “Improving mapreduce performance in heterogeneous environments,” in OSDI 2008. In this paper, a new strategy named LATE is presented. LATE monitors the progress rate of the tasks and estimates their remaining time. Tasks with their progress rate below slowTaskTherehold are chosen as backup candidates and the one with the longest remaining time is given the highest priority when the number of backups in the cluster does not exceed speculativeCap.
G. Ananthanarayctions of Computer Networks 2013. This paper proposes a Smart Speculative Execution algorithm. The main contributions it makes are twofold: i) Use exponentially weighted moving average to predict process speed and compute a task’s remaining time; ii) Determine which task to backup based on the load of a cluster using a cost-benefit model.

In our following research, we can consider to optimize 
the job scheduling in a heterogeneous environment where
the machines are not identical.

Talk Something About My Research

Matrix Calculation

Basic notations:

Another way to illustrate the basic notations:

Chain rule:

Derivative of determinant:

Some fundamental tricks:

Research About MapReduce

Job Scheduling

Theoretical Analysis based:

Heuristic based:

Data locality

Theoretical Analysis based:

Heuristic based:

Speculative execution

Heuristic based:

Basic notations:

Another way to illustrate the basic notations:

Chain rule:

Derivative of determinant:

Some fundamental tricks:

Some new papers related to MapReduce scheduling and task cloning

Job Scheduling

Theoretical Analysis based:

Heuristic based:

Data locality

Theoretical Analysis based:

Heuristic based:

Speculative execution

Heuristic based: