**E**xtreme **L**earning **M**achines **(ELM)**: Filling the Gap between Frank Rosenblatt's Dream and John von Neumann's Puzzle

- Learning without iterative tuning
- Random hidden neurons
- Random features

Neural networks (NN) and support vector machines (SVM) play key roles in machine learning and data analysis. Feedforward neural networks and support vector machines are usually considered different learning techniques in the computational intelligence community. Both popular learning techniques face some challenging issues, such as intensive human intervention, slow learning speed, and poor learning scalability.

It is clear that the learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for the past decades. Two key reasons may be: 1) slow gradient-based learning algorithms are extensively used to train neural networks, and 2) all the parameters of the networks are tuned iteratively by such learning algorithms. On the other hand, due to their outstanding classification capability, the support vector machine and its variants, such as the least squares support vector machine (LS-SVM), have been widely used in binary classification applications. The conventional SVM and LS-SVM cannot be applied directly to regression and multi-class classification, although different SVM/LS-SVM variants have been proposed to handle such cases.

ELM works for “generalized” single-hidden-layer feedforward networks (SLFNs), but the hidden layer (also called the feature mapping) in ELM need not be tuned. Such SLFNs include, but are not limited to, support vector machines, polynomial networks, RBF networks, and conventional (both *single-hidden-layer* and *multi-hidden-layer*) feedforward neural networks. Contrary to the tenet in neural networks that all the hidden nodes in SLFNs need to be tuned, ELM learning theory shows that the hidden nodes / neurons of generalized feedforward networks need not be tuned and can be randomly generated. All the hidden node parameters are independent of the target functions and the training datasets. ELM theories conjecture that this randomness may also hold for biological learning in animal brains. Although in theory all the parameters of ELMs can be determined analytically instead of being tuned, for the sake of efficiency, in real applications the output weights of ELMs may be determined in different ways (with or without iterations, with or without incremental implementations, etc.).
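As a minimal sketch of the basic idea, the NumPy code below trains an SLFN whose hidden layer is random and never tuned; only the output weights are computed, analytically, as the least-squares solution via the Moore-Penrose pseudo-inverse. The sigmoid activation, the uniform [-1, 1] sampling range, the helper names `elm_fit` / `elm_predict`, and the toy sine target are illustrative assumptions, not taken from any particular ELM release.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, n_hidden=50, seed=0):
    """Train a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Hidden-node parameters are drawn at random and never tuned afterwards.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)          # hidden-layer output matrix, (n_samples, n_hidden)
    beta = np.linalg.pinv(H) @ T    # minimum-norm least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Usage: approximate y = sin(x) on [-3, 3] with 50 random hidden nodes.
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = np.sin(X)
W, b, beta = elm_fit(X, y, n_hidden=50)
train_mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
print(f"training MSE: {train_mse:.2e}")
```

There is no iteration or learning rate here: the only "training" step is one linear solve.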

**Why can learning be made without tuning hidden neurons?**

**What kind of activation functions can be used in hidden neurons?**

**Does such a network have feature learning, clustering, regression and classification capabilities?**

According to ELM theory:

The hidden node / neuron parameters are independent not only of the training data but also of each other, and standard feedforward neural networks with such hidden nodes have universal approximation and separation capabilities. Such hidden nodes and their related mappings are termed ELM random nodes, ELM random neurons, or ELM random features.

Unlike conventional learning methods, which MUST see the training data before generating the hidden node / neuron parameters, ELM can randomly generate the hidden node / neuron parameters before seeing the training data.
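To illustrate that the hidden layer can exist before any data is available, here is a small sketch in which the hidden-node parameters depend only on the input dimension, the number of hidden nodes and a random seed. Each node's weights and bias are sampled i.i.d., so nodes are also independent of each other. The helper names, the standard-normal sampling, and the two activations shown are assumptions chosen for illustration, not a list of what ELM allows.

```python
import numpy as np

def generate_hidden_params(n_inputs, n_hidden, seed=None):
    """Draw hidden-node parameters from the network size alone (no data)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_inputs, n_hidden))  # i.i.d. per node
    b = rng.standard_normal(n_hidden)
    return W, b

# Two illustrative activations (assumed choices, not exhaustive).
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "sine": np.sin,
}

def random_feature_map(X, W, b, activation="sigmoid"):
    """Map inputs X (n_samples x n_inputs) to random features h(x)."""
    return ACTIVATIONS[activation](X @ W + b)

# The parameters are fixed before any data is seen...
W, b = generate_hidden_params(n_inputs=4, n_hidden=16, seed=42)

# ...and the same mapping can later be applied to any dataset.
rng = np.random.default_rng(0)
X_a = rng.random((10, 4))
X_b = rng.random((3, 4))
H_a = random_feature_map(X_a, W, b)
H_b = random_feature_map(X_b, W, b, activation="sine")
print(H_a.shape, H_b.shape)
```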

Networks with multiple hidden layers can be built with hierarchical ELMs.
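One way to sketch a multi-hidden-layer network built from ELMs is to stack ELM autoencoders. Each layer's random hidden nodes are used only to compute output weights `beta` that reconstruct the layer's input, and `beta.T` then serves as that layer's input weights. This stacking scheme, the ridge constant `C`, the layer sizes and the helper names are assumptions for illustration. It is not presented as the exact hierarchical ELM algorithm, which may differ in detail (for example, by using sparse autoencoders).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_ae_beta(X, n_hidden, rng, C=1e3):
    """ELM autoencoder: random hidden layer + ridge weights reconstructing X."""
    A = rng.standard_normal((X.shape[1], n_hidden))  # random, never tuned
    c = rng.standard_normal(n_hidden)
    H = sigmoid(X @ A + c)
    # Solve (I/C + H^T H) beta = H^T X; beta has shape (n_hidden, n_features).
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)

def transform(X, betas):
    """Pass X through the stacked layers, using each beta.T as input weights."""
    Z = X
    for beta in betas:
        Z = sigmoid(Z @ beta.T)
    return Z

def build_stack(X, layer_sizes, seed=0):
    """Learn one layer at a time, each from the previous layer's output."""
    rng = np.random.default_rng(seed)
    betas, Z = [], X
    for n_hidden in layer_sizes:
        beta = elm_ae_beta(Z, n_hidden, rng)
        betas.append(beta)
        Z = sigmoid(Z @ beta.T)  # (n_samples, n_hidden)
    return betas, Z

# Usage on toy data: two stacked layers, then an ordinary ELM output layer.
rng = np.random.default_rng(1)
X = rng.random((100, 8))
T = (X.sum(axis=1, keepdims=True) > 4.0).astype(float)  # toy binary target
betas, Z = build_stack(X, layer_sizes=[20, 10])
C_out = 1e3
beta_out = np.linalg.solve(np.eye(Z.shape[1]) / C_out + Z.T @ Z, Z.T @ T)
pred = transform(X, betas) @ beta_out
```

Each layer is still obtained by a single linear solve, so no layer is tuned by backpropagation.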

ELM was originally proposed for standard single-hidden-layer feedforward neural networks with random hidden nodes (random hidden neurons, random features), and has recently been extended to kernel learning as well:

- ELM provides a unified learning platform with a wide variety of feature mappings and can be applied directly to regression and multi-class classification;
- From the optimization point of view, ELM has milder optimization constraints than SVM, LS-SVM and PSVM;
- In theory, ELM can approximate any continuous target function and classify any disjoint regions;
- In theory, SVM, LS-SVM and PSVM achieve suboptimal solutions and require higher computational complexity than ELM (cf. "Details on the reasons why SVM/LS-SVM provide suboptimal solutions").
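For the kernel case, one commonly cited formulation computes the output as `f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega)^(-1) T`, where `Omega` is the kernel matrix on the training inputs, `C` a regularization constant, and `T` the target matrix. With one-hot rows in `T`, the same linear solve handles any number of classes, which is what "applied directly" to multi-class classification means here (real-valued `T` would give regression). The sketch below assumes a Gaussian kernel; the values of `C` and `gamma`, the helper names and the toy blobs are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_elm_fit(X, labels, n_classes, C=100.0, gamma=1.0):
    """Solve alpha = (I/C + Omega)^(-1) T, with T the one-hot target matrix."""
    T = np.eye(n_classes)[labels]        # (n_samples, n_classes)
    Omega = rbf_kernel(X, X, gamma)      # kernel matrix on the training data
    return np.linalg.solve(np.eye(len(X)) / C + Omega, T)

def kernel_elm_predict(X_new, X_train, alpha, gamma=1.0):
    scores = rbf_kernel(X_new, X_train, gamma) @ alpha  # one output per class
    return scores.argmax(axis=1)

# Toy 3-class problem: three well-separated Gaussian blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 30)
X = centers[labels] + 0.5 * rng.standard_normal((90, 2))
alpha = kernel_elm_fit(X, labels, n_classes=3)
pred = kernel_elm_predict(X, X, alpha)
accuracy = (pred == labels).mean()
print("training accuracy:", accuracy)
```

No one-vs-one or one-vs-rest decomposition is needed: all class outputs come from one solve.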

ELM is efficient in:

- Batch learning
- Sequential learning
- Incremental learning
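Sequential learning can be sketched with the recursive least-squares update used by online sequential ELM (OS-ELM). After an initial block, each new chunk with hidden-layer output `H` and targets `T` updates `P = P - P H^T (I + H P H^T)^(-1) H P` and then `beta = beta + P H^T (T - H beta)`, so earlier samples never need to be revisited. The small ridge term `I/C` in the initial `P`, the chunk sizes and the class name `OSELM` are assumptions added for this illustration; the ridge term keeps the initial inverse well-conditioned.

```python
import numpy as np

def hidden(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

class OSELM:
    """Online sequential ELM sketch: recursive least-squares updates of beta."""

    def __init__(self, n_inputs, n_hidden, C=1e2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.uniform(-1.0, 1.0, (n_inputs, n_hidden))  # random, fixed
        self.b = rng.uniform(-1.0, 1.0, n_hidden)
        self.C = C
        self.P = None
        self.beta = None

    def init_batch(self, X0, T0):
        H0 = hidden(X0, self.W, self.b)
        # Assumed ridge term I/C keeps the initial inverse well-conditioned.
        self.P = np.linalg.inv(np.eye(H0.shape[1]) / self.C + H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        # Update with the new chunk only; earlier samples are not revisited.
        H = hidden(X, self.W, self.b)
        S = np.eye(len(X)) + H @ self.P @ H.T
        self.P = self.P - self.P @ H.T @ np.linalg.solve(S, H @ self.P)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return hidden(X, self.W, self.b) @ self.beta

# Usage: an initial block of 50 samples, then ten chunks of 25.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (300, 3))
T = np.sin(X).sum(axis=1, keepdims=True)
model = OSELM(n_inputs=3, n_hidden=20)
model.init_batch(X[:50], T[:50])
for start in range(50, 300, 25):
    model.partial_fit(X[start:start + 25], T[start:start + 25])

# Compare with a batch ridge solve on all 300 samples at once.
H_all = hidden(X, model.W, model.b)
beta_batch = np.linalg.solve(np.eye(20) / model.C + H_all.T @ H_all, H_all.T @ T)
print(np.allclose(model.beta, beta_batch, atol=1e-6))
```

Because the update is exact recursive least squares, the sequentially learned `beta` should match the batch solution on all samples, which is what the final comparison checks.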

ELM has been successfully used in the following applications:

- Biometrics
- Bioinformatics
- Image processing (image segmentation, image quality assessment, image super-resolution)
- Signal processing
- Human action recognition
- Disease prediction and eHealthCare
- Location positioning system
- Brain computer interface
- Human computer interface
- Feature selection
- Time-series
- Real-time learning and prediction
- Security and data privacy

Driven by the demand for ELM solutions, ELM may help advance R&D in the following areas and make feasible in the future some applications that previously seemed impossible:

- Machine learning and artificial intelligence
- Matrix theory and optimization theory
- Functioning artificial “brain”
- Robotics and automation
- Data and knowledge discovery
- Cognitive and reasoning systems
- Big data analytics
- Internet of Things (IoT)

International Conference on Extreme Learning Machines (ELM2015)

Singapore, December 15-17, 2015

**Organized by:**

Nanyang Technological University, Singapore

**Co-Organized by:**

Zhejiang University, China

Tsinghua University, China