KWOH Chee Keong


Dr. Kwoh Chee Keong, PBM  



AsC - Graduate Studies 

ACM - Service

PD MSc – Bioinformatics


School of Computing Science and Engineering

Block N4, Level 2, Section C, Room 73/74 Nanyang Avenue, Singapore 639798

Tel: (65) 6790 6057

Fax: (65) 6792 6559





I think the nicest, most sincere compliments that I have received are those from my students and people I did not expect. 

Notes from Students and Friends



National Day Awards

Public Service Medal (National Day Award)

National Day Awards 2008, The Public Service Medal (PBM), conferred by the President of Singapore, Mr S R Nathan


Ministry of Education Long Service Medal (National Day Award) 2016



Other Awards (NTU)

Best Faculty Mentor Award from Temasek Foundation (TF) LEARN 2014

Best Faculty Mentor Award from Temasek Foundation (TF) LEARN 2013



I am looking for versatile, highly motivated Research Fellow/Postdoc and PhD candidates. Successful candidates will build on ongoing research and help define and explore this exciting area of research.


Applicants must have a strong background in Computer Science and/or closely related areas (e.g. Mathematics, Bioinformatics, Statistics and Physics) and excellent skills in both written and spoken English, as the working language of the Faculty is English.



For PhD applications, please visit Graduate Studies by Research at NTU before writing. Please note that the PhD program is very intensive: applicants must have a strong interest, a strong analytical mind, and a sound technical grounding in data mining, learning theory, algorithms and computer programming. You must be highly independent, show good initiative, and aspire to publish in top-tier journals. If you are interested and suitable,


Enquiries about these vacancies can be sent (the deadlines are flexible) with your CV, your proposed research area, and at least 3 references (either your own publications or papers that inspired you to do research).



My main interests lie in making sense of big heterogeneous data for real applications in engineering, life science, and medicine.

Machine Learning and Statistical Inference

Machine learning and statistical modeling techniques learn from data to support decision making and classification, with applications in almost every area. There are many approaches, each with its own merits. The following have been heavily used in my group: support vector machines, decision tree learning, artificial neural networks, Bayesian networks, and genetic and memetic algorithms. Examples include applications to supertype-specific HLA Class I binding peptides and protein–ligand binding affinity.

Learning with Unlabeled Data

In the context of machine learning, PU learning is a collection of semi-supervised techniques for training binary classifiers on positive (P) and unlabeled (U) examples. To improve performance, it is important to partition U into four sets, namely the reliable negative set RN, the likely positive set LP, the likely negative set LN, and the weak negative set WN. This approach has been used to identify disease genes in the human genome, an important but challenging task in biomedical research.
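As a rough illustration of the partitioning idea, the Python sketch below splits U by Euclidean distance to the centroid of P. The distance rule, the ordering of the four sets (LP nearest, then WN, LN, RN), the cut values and the function names are all assumptions made for illustration. They are not taken from PUDI or from any published method.

```python
from math import sqrt


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition_unlabeled(positives, unlabeled, cuts=(1.0, 2.0, 3.0)):
    """Split U into LP, WN, LN and RN by distance to the positive centroid.

    The three cut values are hypothetical; in practice they would have to
    be chosen from the data.
    """
    center = centroid(positives)
    lp_cut, wn_cut, ln_cut = cuts
    parts = {"LP": [], "WN": [], "LN": [], "RN": []}
    for example in unlabeled:
        d = euclidean(example, center)
        if d <= lp_cut:
            parts["LP"].append(example)   # close to P: likely positive
        elif d <= wn_cut:
            parts["WN"].append(example)   # weak negative
        elif d <= ln_cut:
            parts["LN"].append(example)   # likely negative
        else:
            parts["RN"].append(example)   # far from P: reliable negative
    return parts
```

A classifier could then treat the four sets differently, for example by weighting RN examples more heavily as negatives than WN examples; that weighting is again a design choice left open here.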

Meta and Ensemble learning

Meta learning applies automatic learning algorithms to meta-data about machine learning experiments. The main goal is to use meta-data to understand how automatic learning can become flexible in solving different kinds of learning problems and enrich the knowledge discovered. Coupled with ensemble methods that integrate the results of multiple predictive methods into one system, this approach has been found to be instrumental in improving predictive performance. It has been widely used on big data in areas such as bioinformatics and medical informatics. Examples include multiple kernel learning for heterogeneous data fusion, sparse learning in genome-wide association studies (GWAS), and drug-target interaction prediction.
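The simplest way to integrate several predictive methods into one system is majority voting over their predicted labels. The Python sketch below shows only that basic idea; the function name and the toy predictions are hypothetical, and the ensemble schemes actually used in the work mentioned above may differ.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several models by majority vote.

    `predictions` is a list with one entry per model; each entry is a list
    of predicted labels, and all entries have the same length.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Toy predictions from three hypothetical classifiers on four examples.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
ensemble = majority_vote([model_a, model_b, model_c])
```

Using an odd number of binary classifiers avoids ties; with an even number, a tie-breaking rule would need to be chosen explicitly.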

Ontology for Knowledge Representation

An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. There have been major initiatives in medicine and bioinformatics to standardize the representation of terminology and relationships across diseases, species and databases. The main outcomes are controlled vocabularies of terms and descriptions of product characteristics, which improve annotation and the representation of meta-concepts and greatly enhance analysis. In particular, ontologies have been applied extensively in complex mining and in inferring gene-disease-phenotype associations.

Computational Structural Biology

This is where Computational Science and Engineering finds its applications in understanding the molecular structure of biological macromolecules. Dealing with computational models and simulations, the field makes use of high-performance computing to solve complex and expensive problems in biology. Results include the acceleration of docking and its application to understanding influenza and other viruses.


CovalentDock Cloud: a web server for automated covalent docking

Covalent binding is an important mechanism through which many drugs gain their function. We developed a computational algorithm to model this chemical event and extended it to a web server, the CovalentDock Cloud, to make it accessible directly online without any local installation and configuration. It provides a simple, user-friendly web interface to perform covalent docking experiments and analysis online.


Software for Accelerating Autodock Vina

QuickVina: This project aims at accelerating AutoDock Vina, a program for protein-ligand docking. The main idea is to skip local searches that are not promising for finding a better solution.

Quick Vina 2 is a fast and accurate molecular docking tool, aimed at accurately accelerating AutoDock Vina. It was tested against the 195 protein–ligand complexes that compose the core set of the 2014 release of PDBbind. Using the default exhaustiveness level of 8, QVina 2 attained up to 20.49-fold acceleration over Vina.

Software for Learning with Unlabeled Data

1.      PUDI (2013) - a Positive-Unlabeled (PU) learning based method aiming to address the problem of disease gene identification

Software – for complexes

1.      CACHET- Discovery of Protein Complexes with Core-Attachment Structures from TAP Data

2.      COACH- COre-AttaCHment based Complex Mining


Software – for Computational Structural Biology

1.      CovalentDock Cloud (2013) - This web server allows researchers and scientists to perform protein-ligand covalent docking.

2.      CovalentDock: Automated covalent docking with parameterized covalent linkage energy estimation and molecular geometry constraints

3.      QuickVina: Accelerating AutoDock Vina Using Gradient-based Heuristics for Global Optimization



List of Research Grants

·         Computational Virulence Model with Functional Information for Influenza Viruses

·         Methodological Investigation for Automatic Detection of Primary Angle Closure Condition (PAC) and PAC induced Glaucoma

·         Bioinformatics Algorithms for Detecting Genetic and Epigenetic Determinants of Meiotic Recombination Hotspots from Genomic Data

·         Whole genome sequencing, single nucleotide polymorphisms, electron microscopy, Acinetobacter baumannii, lipopolysaccharide

·         Collaborative Research Programme On Bioinformatics Algorithms And Tools

·         Structural Analysis and Characterisation of Protein Complexes

·         Genomic analysis and development of a new multilocus variable-number tandem-repeat analysis - scheme for molecular epidemiological typing of Acinetobacter baumannii

·         Core-Attachment Based Mining For Protein Complexes & Small molecule Interactions

·         Characterization of novel extracellular proteins produced by a newly-isolated strain of Bacillus subtilis.

·         Improved Design via Evolutionary Algorithms

·         Protein binding hotspots are water-free?

·         Analysis of Past DRG data for the study of LOS for better utilization of Hospital Resources

·         Data Warehousing and Data Mining Analysis on Staphylococcus Aureus

·         A novel approach for inter- to intra- network analysis of genetic diseases using high-throughput data

·         Neural Systems modeling with functional MRI

·         SCE incubator proposal for “Evolutionary and Complex Systems Lab”

·         The Application of ultrasound based augmented reality with the directional vacuum-assisted breast biopsy device in the treatment of breast cancer

·         Distributed Diagnosis and Home Healthcare (D2H2)

·         Development of a robotic semi-automated remote handling system for radioiodine dispensing

·         Functional MR Time-Series Analysis

·         Augmented Reality for Prosthesis Cup Placement

·         Robotic Skull Based Surgery

·         Cardiovascular and Respiratory Systems' Signal Simulation, Processing and Analysis for ICU, OR and Telemedicine Applications.

·         Strategic research: Interventive augmented reality for medical applications.

·         Surgeon Assistant Robot for Selected urological disorder.




·         YIN RUI (PhD, 2016 -)

·         Aly Mohamed Alaa Eldin Aly Ezzat - Biological Network Mining and its application in healthcare (PhD, 2013-)

·         Tan Kuan Pern - N-body Statistical Force-Field for 3D Structure Modeling of Biomolecules (PhD, 2012 -)

·         Amr Ali Mokhtar Alhossary - Accelerating Drug Design Workflow (PhD, 2012 -)

·         Pan Hong – DNA methylation biomarkers of personal disease risk (PhD, 2012)

·         Luay Aswad - A molecular basis of the 5-gene breast tumor aggressiveness grading signature (AGS) and its network (PhD, 2012 -)

·         Han Xu - Constructing the Semantic Web for Biomedical Literature (PhD, 2011 - )

·         Ouyang Xuchang - Automated and Accelerated Covalent Docking and Covalent Virtual Screening (PhD, 2010–)

·         Thidathip Wongsurawat - Computational Analysis and Prediction of Specific Genomic Regions Forming R-loop Structure and Chromosomal Variations Associated with Cancer (PhD, 2015)

·         Zhang Zhou - Knowledge Discovery In Post Genome-Wide Association Study For Glaucoma (PhD, 2015)

·         Su Tran To Chinh - Improving the Discrimination of Near-Native Complexes for Protein Rigid Docking by Implementing Interfacial Water into Protein Interfaces (PhD, 2015)

·         Yang Peng - Computational Approaches for Disease Gene Identification (PhD, 2014)

·         Wu Min - Mining Protein Complexes From Protein Interaction Data (PhD, 2012)

·         Zhang Tianyou - Contact Network Based Framework For Infectious Disease Interventions (PhD, 2015)

·         Stephanus Daniel Handoko - Constrained-Oriented Refinement-Efficacious Memetic Algorithms for Efficient Optimization of Computationally-Expensive Problems (PhD, 2014)

·         Adrianto Wirawan - Whole-Genome Discovery Of Transcriptional Regulator Binding Sites (PhD, 2011)

·         Zhang Guanglan - Computational Epitope-Driven Vaccine Design (PhD, 2008)

·         Zheng Yun - Design Of Gene Expression Networks From Microarray Data (PhD, 2006)

·         Zhao Ying - Efficient Model And Feature Selection For SVM In Biomedical Data Analysis (M Eng, 2004)

·         Zhao Jianhui - Human Animation from Motion Recognition, Analysis and Optimisation (PhD, 2003)

·         Chen Yintao - Image Processing For Ultrasound Guidance System In Breast Lump Operation (M Eng, 2002)

·         Wang Yan - Image-Based Indexing And Retrieval Of Trademark Logos (M Eng, 2001)

·         Veena Mohan Bhajammanavar - Image Processing Of The Digital Mammogram For Segmentation And Characterization Of Microcalcifications (M Eng, 2000)

·         Misra Sabita - Time Series Analysis Of ECG For Detection Of Premature Ventricular Contraction (M Eng, 2000)

·         Zou Qingsong - Object Based Volume Visualisation For Medical Imaging (PhD, 2001)




Planned and lectured subjects:

  1. CZ4032 Data Analytics and Mining (2014,15): Data Mining is an analytic process designed to explore big data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the algorithms to new data.
  2. CE7411 Bioinformatics (2015): This course covers basic bioinformatics concepts, databases, tools and applications. Introduction: cell biology's central dogma; biological technologies for collecting and storing genomic sequence data; databases that store these data and strategies to extract information from them; pairwise sequence alignment for assessment of similarity to infer homology; fundamentals of scoring matrices to understand the scores assigned when performing alignment; the popular heuristic search tool Basic Local Alignment Search Tool (BLAST) and advanced database searching; multiple sequence alignment and phylogenetic trees to complete the coverage of genomic sequences. Functional genomics with an introduction to gene expression; processes for microarray data analysis; feature selection and classification for microarray data analysis. Protein families and proteomics; protein structure and structural genomics; and molecular evolution and phylogeny.
  3. BI6123 Methods and Tools of Proteomics (2007): Proteomics studies and identifies protein structure, protein/protein and protein/DNA interactions, and the biology of organisms. We further introduce newly developed technology for the quantitative analysis of protein expression and function on a genome-wide scale.
  4. BI1602 and SC448 Introductory Bioinformatics (2005,06): Basic bioinformatics concepts. Databases, tools and applications.
  5. BI1603 Computational Biology (2006): Introduces the application of techniques from computer science, applied mathematics, and statistics to problems inspired by biology. Major computational techniques used in biology include Bayes, HMM, MI, etc.
  6. BG3011 Biocomputing (2005, 06): Introduced the new biocomputing course for students in SCBE; the subject was first offered in July 2005. It covers concepts, bioinformatics databases, sequence alignment, phylogeny and protein structure prediction.
  7. BI6104 Biostatistics: First offered in July 2003, this course equips MSc students with knowledge of statistics, experimental design and statistical learning.
  8. Curriculum for MSc in Bioinformatics: From August 2001 to June 2002, I worked with Vice-Dean (Academic) SCE, Head, Natural Science of NIE, Vice-Dean (Academic) of SBS and Professor from MPE and EEE to structure the new MSc in Bioinformatics. 
  9. SC104 Mathematics I: Fundamentals of mathematics for engineering, including statistics and calculus.
  10. CE307 Computer Peripherals: From 1996 to 2000, redesigned the course to include state-of-the-art techniques such as PRML, USB and Bluetooth.
  11. M495 & M6524 Medical Assist Surgery (2000-2002): Co-planned and lectured the final-year and MSc elective for Biomedical Engineering.
  12. Digital Signal Processing (1992): Planned and lectured the final-year elective for computer engineering.



GRADUATE ADVISORS:  Prof Duncan Fyfe Gillies - Professor of Biomedical Data Analysis, Department of Computing, Imperial College London


My PhD thesis: Probabilistic Reasoning From Correlated Objective Data, University of London, Imperial College




From Google Scholar