Auto-Tuning for Cloud Resource Configuration

An interesting research direction

Many applications, especially distributed machine learning jobs, can leverage auto-tuning of resource configuration to achieve better performance.

Matrix Calculation

Basic notations:

In the following descriptions, $x \in R^n$, $b \in R^n$, $X \in R^{n \times n}$, $A \in R^{n \times n}$, $f(x) \in R$ and $f(X) \in R$. To begin with, we standardize the following basic notations:

Another way to illustrate the basic notations:

Chain rule:

An illustration of chain rule:

As such,


Derivative of determinant:

It holds that $A A^{\ast} = | A | I$, where $A^{\ast}$ is the adjugate matrix of $A$; hence,
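The adjugate identity above can be checked numerically. The sketch below builds the adjugate from cofactors, verifies $A A^{\ast} = |A| I$, and compares a finite-difference gradient of $|A|$ with the cofactor matrix $(A^{\ast})^T$, which is the standard form of the determinant derivative. The layout convention (entry $(i, j)$ holds $\partial |A| / \partial A_{ij}$), the helper names `adjugate` and `numerical_grad_det`, and the random test matrix are assumptions made for illustration.

```python
import numpy as np


def adjugate(A):
    """Adjugate of A: the transpose of the cofactor matrix (illustrative helper)."""
    n = A.shape[0]
    cofactors = np.empty((n, n), dtype=float)
    for i in range(n):
        for j in range(n):
            # Minor: delete row i and column j, then take the determinant.
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cofactors[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cofactors.T


def numerical_grad_det(A, eps=1e-6):
    """Central finite-difference estimate of d|A| / dA[i, j] for every entry."""
    n = A.shape[0]
    grad = np.zeros((n, n), dtype=float)
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n), dtype=float)
            E[i, j] = eps
            grad[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # assumed test matrix, non-singular in practice
adjA = adjugate(A)
detA = np.linalg.det(A)

# A A* = |A| I
identity_ok = np.allclose(A @ adjA, detA * np.eye(4))
# d|A|/dA = (A*)^T under the assumed layout convention
grad_ok = np.allclose(numerical_grad_det(A), adjA.T, atol=1e-5)
# For non-singular A, (A*)^T = |A| A^{-T}
inv_form_ok = np.allclose(adjA.T, detA * np.linalg.inv(A).T)

print(identity_ok, grad_ok, inv_form_ok)
```

Because the determinant is linear in each individual entry, the central difference is exact up to floating-point error, which makes this a convenient sanity check.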


If $A$ is a non-singular matrix, the following equations hold:

If $A$ is also symmetric,

It then immediately follows that:

Some fundamental tricks:

  • $A \in S^n$, $A = U \Sigma U^T$ where $U$ is an orthogonal matrix; if $A$ is also positive semidefinite, $A = A^{1/2} A^{1/2}$ where $A^{1/2} = U \Sigma^{1/2} U^T$;

  • If $| x | = 1$ and $\lambda \neq - 1$, then $(I + \lambda x x^T)^{- 1} = I - \frac{\lambda}{1 + \lambda} x x^T$;

  • $A \in S^{n}$, $\ln |I + A| = \sum_{i = 1}^n \ln (1 + \lambda_i)$ where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ and $\lambda_i > - 1$.
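The three tricks above can be sanity-checked numerically. The sketch below does so with NumPy under a few assumptions that are not part of the original text: $A$ is generated as $B B^T$ so that it is symmetric positive semidefinite (which keeps $\Sigma^{1/2}$ real and all $\lambda_i > -1$), $\lambda = 0.7$ is an arbitrary value, and the random seed is fixed only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Assumed test matrix: symmetric positive semidefinite by construction.
B = rng.standard_normal((n, n))
A = B @ B.T

# Trick 1: A = U Sigma U^T and A^{1/2} = U Sigma^{1/2} U^T.
eigvals, U = np.linalg.eigh(A)
sigma = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
A_half = U @ np.diag(np.sqrt(sigma)) @ U.T
decomp_ok = np.allclose(U @ np.diag(eigvals) @ U.T, A)
sqrt_ok = np.allclose(A_half @ A_half, A)

# Trick 2: (I + lam x x^T)^{-1} = I - lam / (1 + lam) x x^T for a unit vector x.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
lam = 0.7  # arbitrary value with lam != -1
lhs = np.linalg.inv(np.eye(n) + lam * np.outer(x, x))
rhs = np.eye(n) - lam / (1.0 + lam) * np.outer(x, x)
inverse_ok = np.allclose(lhs, rhs)

# Trick 3: ln|I + A| = sum_i ln(1 + lambda_i).
sign, logdet = np.linalg.slogdet(np.eye(n) + A)
logdet_ok = bool(sign > 0 and np.isclose(logdet, np.sum(np.log1p(eigvals))))

print(decomp_ok, sqrt_ok, inverse_ok, logdet_ok)
```

`np.linalg.slogdet` is used instead of `np.log(np.linalg.det(...))` because it stays numerically stable when the determinant is very large or very small.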

Research About MapReduce

In this post, I discuss three important research issues for the MapReduce framework:

  • Job scheduling for minimizing the total response time
  • Data locality
  • Speculative execution

For each issue, I group the work into two categories: theoretical-analysis-based optimization and heuristic-based algorithm design. I hope you find something useful in this summary.

Job Scheduling

Theoretical Analysis based:

Heuristic based:

I haven’t read any papers that present a heuristic-based algorithm for optimizing job completion time in a MapReduce system.

Data locality

Theoretical Analysis based:

Heuristic based:

Speculative execution

Heuristic based:

In our future research, we can consider optimizing
job scheduling in a heterogeneous environment where
the machines are not identical.
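To make the heterogeneous setting concrete, here is a minimal sketch of how machine speeds change a schedule. It is not an algorithm from any of the work summarized here: the function name `greedy_schedule`, the shortest-job-first ordering, the earliest-finish assignment rule, and the job sizes and speeds are all hypothetical choices made for illustration.

```python
def greedy_schedule(job_sizes, speeds):
    """Hypothetical greedy rule: take jobs shortest-first and place each one
    on the machine where it would finish earliest.

    Returns (total response time, list of (job size, machine index)).
    """
    free_at = [0.0] * len(speeds)  # time at which each machine becomes idle
    total_response = 0.0
    assignment = []
    for size in sorted(job_sizes):
        # Finish time of this job on every machine, given its speed and backlog.
        finish = [free_at[m] + size / speeds[m] for m in range(len(speeds))]
        best = min(range(len(speeds)), key=lambda m: finish[m])
        free_at[best] = finish[best]
        total_response += finish[best]
        assignment.append((size, best))
    return total_response, assignment


jobs = [4.0, 2.0, 8.0, 6.0]  # assumed job sizes (units of work)

# Two identical machines versus one fast and one slow machine.
identical_total, identical_plan = greedy_schedule(jobs, [1.0, 1.0])
hetero_total, hetero_plan = greedy_schedule(jobs, [2.0, 0.5])

print(identical_total, identical_plan)
print(hetero_total, hetero_plan)
```

With these particular numbers the greedy rule sends every job to the fast machine in the heterogeneous case, which shows why a scheduler designed for identical machines cannot simply be reused.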