Talk Something About My Research

I am a postgraduate student and therefore research plays a very important role in my current life. My research interests mainly include cloud computing, decentralised social network, graph algorithms and machine learning.

My first research project is motivated from the simple and heuristic-based speculative execution approach under the MapReduce framework. Since the previous work lacks the theoretical understanding, I consider to tackle this issue from the perspective of system modelling.

To begin with, I build a very simple model to characterize the tradeoff between the job completion time and resource consumption under the speculative execution scheme. In the first optimization problem, I only consider one single job which consists of multiple tasks that can run in parallel on multiple machines. Moreover, the optimization objective is to minimize the resource consumption while satisfy the job’s SLA requirement.

Matrix Calculation

Basic notations:

In the following descriptions, $x \in R^n $, $b \in R^n$ and $X \in R^{n \times n}$, $A \in R^{n \times n}$ and $f (x) \in R, f (X) \in R^{}$. To begin with, we first standardize the following basic notations:

Another way to illustrate the basic notations:


Chain rule:

An illustration of chain rule:

As such,

and


Derivative of determinant:

It holds that $A A^{\ast} = | A | I$ where $A^{ \ast }$ is the Adjugate matrix of $A$, hence,

and

If $A$ is a non-singular matrix, the following equations hold:

If A is also symmetric,

It then immediately follows that:


Some fundamental tricks:

  • $A \in S^n$, $A = U \Sigma U^T$ where $U$ is an orthogonal matrix and $A = A^{1/2} A^{1/2}$ where $A^{1/2} = U \Sigma^{1/2} U^T$;

  • If $| x | = 1$, then $(I + \lambda x x^T)^{- 1} = I - \frac{\lambda}{1 + \lambda} x x^T$;

  • $A \in S^{n}$, $\ln |I + A| = \sum_{i = 1}^n \ln (1 + \lambda_i)$ where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ and $\lambda_i > - 1$.

Research About MapReduce

In this blog, I mainly talk about three important research issues for MapReduce framework which are:

  • Job scheduling for minimizing the total response time
  • Data locality issue
  • Speculative Execution

For each issue I will make two categories which are theoretical analysis based optimization and heuristic based algorithm design. Hope you can get something useful from this summary.


Job Scheduling

Theoretical Analysis based:

Heuristic based:

I haven’t read any papers which present heuristic-based algorithm to optimize the job completion time in MapReduce system.


Data locality

Theoretical Analysis based:

Heuristic based:


Speculative execution

Heuristic based:

1
2
3
In our following research, we can consider to optimize 
the job scheduling in a heterogeneous environment where
the machines are not identical.