Auto-Tuning for Cloud Resource Configuration

An interesting research direction

Many applications, especially distributed machine learning jobs, can use auto-tuning of resource configuration to improve their performance.

Matrix Calculation

Basic notations:

To begin with, we standardize the basic notation. In the following descriptions, $x \in R^n$, $b \in R^n$, $X \in R^{n \times n}$, $A \in R^{n \times n}$, $f(x) \in R$ and $f(X) \in R$.

Chain rule:

An illustration of chain rule:

As such,

and

Derivative of determinant:

It holds that $A A^{\ast} = | A | I$, where $A^{\ast}$ is the adjugate matrix of $A$; hence,

and

If $A$ is a non-singular matrix, the following equations hold:

If $A$ is also symmetric,

It then immediately follows that:
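The determinant relations in this section can be checked numerically. The sketch below assumes the standard results $\partial |A| / \partial A = (A^{\ast})^T = |A| A^{-T}$ for a non-singular $A$ whose entries are treated as independent. The helper names `adjugate` and `numerical_det_gradient` and the test matrix are illustrative choices, not something taken from the post.

```python
import numpy as np


def adjugate(A):
    """Adjugate (transpose of the cofactor matrix), built from minors."""
    n = A.shape[0]
    cofactors = np.zeros_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cofactors[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cofactors.T


def numerical_det_gradient(A, eps=1e-6):
    """Central finite-difference estimate of d|A| / dA_{ij}."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            grad[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
# Strictly diagonally dominant, hence non-singular (and not symmetric).
A = rng.uniform(-1.0, 1.0, size=(4, 4)) + 5.0 * np.eye(4)
det_A = np.linalg.det(A)

# A A^* = |A| I
print(np.allclose(A @ adjugate(A), det_A * np.eye(4)))
# d|A|/dA = (A^*)^T = |A| A^{-T}
print(np.allclose(numerical_det_gradient(A), adjugate(A).T, atol=1e-4))
print(np.allclose(adjugate(A).T, det_A * np.linalg.inv(A).T))
```

The test matrix is deliberately non-symmetric: when $A$ is constrained to be symmetric its entries are no longer independent, so the symmetric case needs separate treatment.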

Some fundamental tricks:

• If $A \in S^n$, then $A = U \Sigma U^T$ where $U$ is an orthogonal matrix; if $A$ is also positive semidefinite, then $A = A^{1/2} A^{1/2}$ where $A^{1/2} = U \Sigma^{1/2} U^T$;

• If $\| x \|_2 = 1$ and $\lambda \neq -1$, then $(I + \lambda x x^T)^{- 1} = I - \frac{\lambda}{1 + \lambda} x x^T$;

• $A \in S^{n}$, $\ln |I + A| = \sum_{i = 1}^n \ln (1 + \lambda_i)$ where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ and $\lambda_i > - 1$.
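The three tricks above can be sanity-checked with a few lines of NumPy. This is a minimal sketch: the matrix size, the random seed, the value of `lam`, and the choice of a positive semidefinite test matrix $A = B B^T$ (so that the square root is real) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Symmetric positive semidefinite test matrix.
B = rng.standard_normal((n, n))
A = B @ B.T

# Trick 1: A = U Sigma U^T and A = A^{1/2} A^{1/2}.
eigvals, U = np.linalg.eigh(A)
eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
A_half = U @ np.diag(np.sqrt(eigvals)) @ U.T
sqrt_ok = np.allclose(A_half @ A_half, A)

# Trick 2: (I + lam x x^T)^{-1} = I - lam / (1 + lam) x x^T for a unit vector x.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
lam = 0.7
lhs = np.linalg.inv(np.eye(n) + lam * np.outer(x, x))
rhs = np.eye(n) - lam / (1.0 + lam) * np.outer(x, x)
inverse_ok = np.allclose(lhs, rhs)

# Trick 3: ln|I + A| = sum_i ln(1 + lambda_i).
sign, logdet = np.linalg.slogdet(np.eye(n) + A)
logdet_ok = bool(sign > 0 and np.isclose(logdet, np.sum(np.log1p(eigvals))))

print(sqrt_ok, inverse_ok, logdet_ok)
```

`np.linalg.slogdet` is used instead of `np.log(np.linalg.det(...))` because it avoids overflow or underflow when the determinant is very large or very small.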

In this blog post, I discuss three important research issues for the MapReduce framework:

• Job scheduling for minimizing the total response time
• Data locality issue
• Speculative Execution

For each issue, I divide the work into two categories: theoretical-analysis-based optimization and heuristic-based algorithm design. I hope you find something useful in this summary.

Job Scheduling

Heuristic based:

I haven’t read any papers that present a heuristic-based algorithm to optimize the job completion time in a MapReduce system.