AliExpress is the largest cross-border trading platform in the world. Every day, millions of consumers from over 200 countries visit AliExpress to order products from global merchants. Keeping AliExpress operating at high performance at all times is a significant technical challenge. Over the past few years, we have refined our best practices for high availability and high performance. In this session, we will explain both the process and some of our findings.
For high availability, we will share some high-profile failure cases and explain the root causes of failures and why failures are unavoidable for an Internet business. We will analyze the cost of a failure and the cost of avoiding a failure. From these analyses, we will illustrate the necessity of failure management: managing failures effectively is the most economical way to sustain the growth of an Internet business while keeping the operational budget at a reasonable level. From this conclusion, we will explain the best practices of failure management: risk identification, risk detection, failure detection, monitoring and alerting, and loss minimization. Furthermore, based on deep analysis of thousands of failure cases, AliExpress has built a set of guiding principles for failure management. These principles have safeguarded the growth of AliExpress over the past five years.
For performance management, we will introduce the concept of performance loss, which measures how website performance affects business results, and explain how it can be measured in real time with Big Data infrastructure. We will explain the various tactics we have employed to increase the performance of an Internet business spanning multiple continents. In 2015, AliExpress achieved a 50% performance increase globally by applying these tactics. This result illustrates how engineering innovations can lead to significant business results.
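
As a rough illustration of the performance loss idea, the sketch below estimates how many orders are lost when slower page loads convert worse than the fastest ones. The definition used here is an assumption for illustration only, not the AliExpress formula: sessions are grouped into hypothetical load-time buckets, the fastest bucket's conversion rate is taken as the baseline, and the loss is the shortfall of actual orders against that baseline. The names `Bucket` and `performance_loss` and all numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Sessions grouped by page-load time (hypothetical grouping)."""
    label: str
    sessions: int
    orders: int


def performance_loss(buckets, baseline_label):
    """Estimate orders lost relative to the baseline bucket's conversion rate.

    Assumption: every bucket would convert at the baseline rate if its
    pages loaded as fast as the baseline bucket's pages.
    """
    baseline = next(b for b in buckets if b.label == baseline_label)
    baseline_rate = baseline.orders / baseline.sessions
    loss = 0.0
    for b in buckets:
        expected_orders = b.sessions * baseline_rate
        # Count only shortfalls; a bucket that beats the baseline adds no loss.
        loss += max(0.0, expected_orders - b.orders)
    return loss


# Illustrative data only, not real AliExpress measurements.
sample = [
    Bucket("<1s", sessions=1000, orders=50),
    Bucket("1-3s", sessions=2000, orders=80),
    Bucket(">3s", sessions=1000, orders=20),
]
estimated_loss = performance_loss(sample, baseline_label="<1s")
```

In a real-time setting, the same calculation could be applied to a sliding window of recent sessions per region, so that a regression in load time shows up as a rising loss estimate rather than only as a latency number.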