Published on 24 Jan 2022

Data centre twinning for the win

Developed by Prof Wen Yonggang of NTU’s School of Computer Science and Engineering (SCSE) and his team, DCWiz is an integrated industrial AI solution that optimises the operations of data centres.

Electricity use by data centres has sky-rocketed in recent years, fuelled by the demand for mission-critical information and communications technology (ICT) infrastructure. Sustaining such rapid growth while lowering the overall carbon footprint is a challenge. At the same time, the increasing complexity of data centre management has led to more unplanned data outages, resulting in considerable economic losses.

Optimising data centre operations

Against this backdrop, artificial intelligence (AI) presents an unprecedented opportunity for data centres to enhance their energy efficiency and optimise their system management. My team’s integrated industrial AI solution, DCWiz, combines an industry-grade digital twin with Artificial Intelligence of Things (AIoT) (Figure 1).
Figure 1: Interactions between AI and the data centre digital twin in DCWiz. Credit: Wen Yonggang, Anna Chua and Yang Fan.

 

A digital twin is a virtual representation that serves as the real-time digital counterpart of a physical object, in this case, a data centre. It provides an accurate and intuitive 3D simulation platform that allows experts to better grasp information about the conditions—for instance, temperature and air flow rate—of the data centre hall and quickly pinpoint anomalies. In addition, a high-fidelity digital twin is able to generate massive amounts of synthesised data to augment datasets for AI algorithms.

Building on the data from the digital twin, our AIoT offers three tiers of intelligence. First, descriptive AI can accurately model the internal behaviour of the system based on historical and online data. Second, on the prescriptive level, moves to improve system management and efficiency can be proposed and then safely verified and validated on the cyber system before implementation. Finally, through predictive AI, we can forecast system behaviours with hypothetical inputs to anticipate data centre anomalies and failures.
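
To make the three tiers concrete, here is a minimal toy sketch in Python. It is not the DCWiz implementation: the historical readings, the straight-line model standing in for the digital twin, the function names and the 27°C inlet limit are all hypothetical, chosen only to show how a descriptive model can support a forecast and a proposal that is checked on the model before anything is applied to the physical system. It also assumes, for illustration, that a higher cooling setpoint is preferable as long as the forecast stays within the limit.

```python
from statistics import mean

# Hypothetical historical log (not real data):
# (cooling setpoint in °C, observed rack inlet temperature in °C).
history = [(18.0, 22.1), (20.0, 24.0), (22.0, 25.9), (24.0, 28.0)]

def fit_line(points):
    """Descriptive tier: least-squares line of inlet temperature against setpoint."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def predict_inlet(model, setpoint):
    """Predictive tier: forecast inlet temperature for a hypothetical setpoint."""
    slope, intercept = model
    return slope * setpoint + intercept

def propose_setpoint(model, candidates, inlet_limit):
    """Prescriptive tier: highest candidate whose forecast stays within the limit.

    The check runs against the model only, standing in for validation
    on the digital twin before any change reaches the real hall.
    """
    safe = [s for s in candidates if predict_inlet(model, s) <= inlet_limit]
    return max(safe) if safe else None

model = fit_line(history)
best = propose_setpoint(model, [21.0, 22.0, 23.0, 24.0, 25.0], inlet_limit=27.0)
print(best)  # 23.0 for the toy data above
```

A real digital twin would replace the straight line with a far richer simulation of the hall, but the division of labour between the three tiers would stay the same.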

Compared with traditional system management, which relies solely on human expert knowledge and limited sensor readings from the data centre infrastructure management system, DCWiz offers high-precision, high-safety and efficient “what-if” analyses with an easy-to-understand user interface, on top of an automated cyber-physical control loop.

DCWiz in the wild

Our team has successfully conducted proof-of-concept trials of DCWiz in both China and Singapore. In China, Alibaba Group deployed DCWiz in 2018 during its “Double Eleven” cybersales day, an event where Alibaba handled more than 13,000 transactions per second and recorded sales revenue of US$43 billion.

The fully automated digital twin calibration process was able to achieve accuracy to within ±0.5°C. With no prior maintenance required, the digital twin shortened the testing duration from one month to a mere week, saving the company tremendous operating costs in the process. Alibaba hailed the DCWiz solution as a “from zero to one” breakthrough in digitalising, optimising and automating data centre operations and management.
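
One simple way to read the calibration figure is as a bound on the gap between what the sensors measure and what the twin simulates at the same locations. The short Python sketch below checks that bound. The paired readings are invented for illustration, and using the worst-case absolute error as the criterion is an assumption, since the article does not say how the accuracy was measured.

```python
# Hypothetical paired temperatures in °C (not real data):
# (sensor measurement, digital twin simulation) at the same location.
readings = [(24.1, 24.3), (26.8, 26.5), (22.0, 22.4), (25.2, 25.2)]

TOLERANCE_C = 0.5  # the 0.5-degree band quoted in the text

def max_abs_error(pairs):
    """Largest absolute gap between measured and simulated temperature."""
    return max(abs(measured - simulated) for measured, simulated in pairs)

def is_calibrated(pairs, tolerance=TOLERANCE_C):
    """True if every simulated value lies within the tolerance of its sensor."""
    return max_abs_error(pairs) <= tolerance

print(is_calibrated(readings))
```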

In Singapore, a trial was conducted at the enterprise-scale data centres of the National Supercomputing Centre. Here, DCWiz improved the power usage effectiveness from 1.35 to 1.3 for 40 server racks, with accompanying energy cost savings of S$6,000 (US$4,500) per month. With the help of DCWiz, the supercomputing centre achieved energy savings of 15% for an air-cooled system and 30% for a water-cooled system.
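
Power usage effectiveness (PUE) is the ratio of a facility's total energy use to the energy used by its IT equipment alone, so a value of 1.0 would mean no overhead for cooling or power distribution. The sketch below works through what a drop from 1.35 to 1.3 implies if the IT load is assumed to stay constant. That constant-load assumption and the function names are ours, and the results are not meant to reproduce the 15% and 30% savings quoted above, whose baseline the article does not specify.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """PUE = total facility energy / IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh

def facility_energy_reduction(pue_before, pue_after):
    """Fractional cut in total facility energy, assuming constant IT load."""
    return (pue_before - pue_after) / pue_before

def overhead_reduction(pue_before, pue_after):
    """Fractional cut in non-IT overhead energy, assuming constant IT load."""
    return (pue_before - pue_after) / (pue_before - 1.0)

print(f"{facility_energy_reduction(1.35, 1.3):.1%}")  # 3.7%
print(f"{overhead_reduction(1.35, 1.3):.1%}")         # 14.3%
```

Under that assumption, a 0.05 drop in PUE is a small share of total facility energy but a much larger share of the overhead (cooling, power distribution and so on) that sits on top of the IT load.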

Widely recognised in industry and academia, DCWiz has won a series of prestigious awards—such as the 2020 IEEE TCCPS Industrial Technical Excellence Award, 2016 ASEAN ICT Award (Gold Medal), and 2015 DCD APAC Award—in addition to the Nanyang Research Award, NTU’s top research award, in 2020.

We are currently developing a minimum viable product, with all the essential components of DCWiz, that will be integrated into a cloud-based platform. Our plans include a series of proof-of-value trials with local partners, followed by commercialisation of DCWiz through a spin-off company.

 

By Wen Yonggang, Anna Chua and Yang Fan

Prof Wen Yonggang is the Alibaba-NTU President’s Chair in Computer Science and Engineering at NTU’s School of Computer Science and Engineering (SCSE), where he heads the Cloud Application and Platform Lab. He is also Associate Dean (Research) at NTU’s College of Engineering.

Dr Anna Chua is Assistant Director in Business Development at NTU’s College of Engineering, and Yang Fan is a research associate in SCSE.

Details of this research can be found in BuildSys ’20: Proceedings of the 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (2020), DOI: 10.1145/3408308.3427982; IEEE Transactions on Neural Networks and Learning Systems (2020), DOI: 10.1109/TNNLS.2020.3008249; IEEE International Conference on Distributed Computing Systems (2019), DOI: 10.1109/ICDCS.2019.00069; IEEE International Conference on Distributed Computing Systems (2019), DOI: 10.1109/ICDCS.2019.00070; Hacker Noon (2019), hackernoon.com/prediction-inthree-dimensions-alibaba-launches-its-live-cfd-based-sandbox-c3c452ae5418; and IEEE Transactions on Cybernetics (2018), arXiv:1709.05077v4.

 

The article appeared first in NTU's research & innovation magazine Pushing Frontiers (issue #19, August 2021).