Many DNN models have been developed over the past two decades. Each of these models has a different "network architecture" in terms of the number of layers, layer types, layer shapes (i.e., filter size, number of channels and filters), and connections between layers [25].
Figure A.4-1 presents three popular structures of DNNs: multilayer perceptrons (MLPs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). The multilayer perceptron (MLP) is the most basic DNN and is composed of a series of fully connected layers [41]. In a fully connected layer, all outputs are connected to all inputs, as shown in Figure A.4-1, so the number of weights in a layer is the product of its input and output counts. Hence an MLP requires a significant amount of storage and computation.
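The storage cost of fully connected layers can be illustrated with a minimal sketch in plain Python. The helper names (`dense_forward`, `relu`, `mlp_parameter_count`), the ReLU activation, and the layer sizes 784, 512 and 10 are illustrative assumptions and do not come from the cited references.

```python
def dense_forward(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of ALL inputs."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Element-wise activation (assumed here; other activations are possible)."""
    return [max(0.0, v) for v in values]


def mlp_parameter_count(layer_sizes):
    """Weights plus biases for a chain of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical layer with 3 inputs and 2 outputs: one weight row per output.
weights = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
biases = [0.0, 1.0]
outputs = relu(dense_forward([2.0, 4.0, 6.0], weights, biases))

# Hypothetical MLP with 784 inputs, one hidden layer of 512, and 10 outputs.
param_count = mlp_parameter_count([784, 512, 10])
```

Even this small hypothetical network holds 407,050 parameters, because every layer stores one weight per input-output pair.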
One approach to limiting the number of weights that contribute to an output is to calculate the output using only a function of a fixed-size window of inputs. An extremely popular window-based DNN model uses a convolution operation to structure the computation and is hence named the convolutional neural network (CNN) [25]. A CNN is composed of multiple convolutional layers, as shown in Figure A.4-2. By applying various convolutional filters, CNN models can capture high-level representations of the input data, which makes them popular for image classification [7] and speech recognition [42] tasks. In recent years, modern CNN models (e.g., AlexNet [7], VGG network [8], GoogLeNet [9], ResNet [18], MobileNet [19]) have dramatically improved the performance of image classification tasks, as shown in Figure A.4-3 [25].
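The window-based computation can be sketched as a single-channel 2-D convolution in plain Python. The function name `conv2d_valid`, the 3x4 input and the 2x2 filter are illustrative assumptions. The sketch uses no padding and a stride of 1, and it omits the multiple channels, bias and activation that a real convolutional layer would normally have.

```python
def conv2d_valid(image, kernel):
    """Slide a fixed-size window over the image (no padding, stride 1).

    Each output depends only on the inputs inside its window, and the same
    kernel weights are reused at every window position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 3x4 input and 2x2 filter.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
]
kernel = [[1, 0], [0, -1]]
feature_map = conv2d_valid(image, kernel)

# Weight count: shared window versus a fully connected layer of the same shape.
conv_weights = len(kernel) * len(kernel[0])
n_inputs = len(image) * len(image[0])
n_outputs = len(feature_map) * len(feature_map[0])
fully_connected_weights = n_inputs * n_outputs
```

For this toy case the convolution needs 4 weights, whereas a fully connected layer mapping the same 12 inputs to the same 6 outputs would need 72.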
Recurrent neural network (RNN) models are another type of DNN, in which data is fed sequentially. The input of an RNN consists of the current input and the previous samples. Each neuron in an RNN has an internal memory that keeps information from the computation on the previous samples. As shown in Figure A.4-4, the basic unit of an RNN is called a cell. Each cell consists of layers, and a series of cells enables the sequential processing of RNN models. RNN models have been widely used in natural language processing tasks on mobile devices, e.g., language modelling, machine translation, question answering, word embedding, and document classification.
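The role of the internal memory can be sketched with a scalar recurrent cell in plain Python. The tanh update rule, the function names `rnn_cell` and `run_rnn`, and the weight values are illustrative assumptions. The text above does not specify a cell type, and practical RNNs use vector states and more elaborate cells.

```python
import math


def rnn_cell(x, h_prev, w_x, w_h, b):
    """One cell: combine the current input with the state kept from earlier samples."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Apply the same cell to each sample in order, carrying the state forward."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_cell(x, h, w_x, w_h, b)
        states.append(h)
    return states


# The second input is 0.0, yet its state is non-zero because the cell
# still remembers the first sample.
states = run_rnn([1.0, 0.0])
no_history = run_rnn([0.0])
```

Comparing `states[1]` with `no_history[0]` shows the effect of the memory: the same input of 0.0 yields a different state depending on what preceded it.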
Deep reinforcement learning (DRL) is not another DNN model; rather, it combines DNNs with reinforcement learning [43]. As illustrated in Figure A.4-5, the goal of DRL is to create an intelligent agent that can execute efficient policies to maximize the rewards of long-term tasks with controllable actions. A typical application of DRL is solving various scheduling problems, such as decision problems in games and rate selection for video transmission.
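One common way to quantify "long-term reward" is the discounted return, sketched below in plain Python. The discount factor `gamma`, the function name `discounted_return` and the reward sequences are assumptions made for illustration. The text above does not state which objective is used, and the sketch omits the DNN that a DRL agent would use to choose its actions.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, with the reward at step t weighted by gamma**t."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences produced by two policies on the same task.
greedy_policy_rewards = [5.0, 0.0, 0.0]    # takes a small reward immediately
patient_policy_rewards = [0.0, 0.0, 10.0]  # waits for a larger reward

greedy_return = discounted_return(greedy_policy_rewards)
patient_return = discounted_return(patient_policy_rewards)
```

Under this assumed objective the patient policy scores higher (about 8.1 against 5.0), which is the kind of long-term trade-off the agent is meant to learn.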