Guide on macro and micro search strategies in ENAS

2019-03-27 09:41:07

 

This blog is copied from https://towardsdatascience.com/illustrated-efficient-neural-architecture-search-5f7387f9fb6

 

Neural Architecture Search (NAS) is the task of automating the manual process of designing neural networks. NAS owes its growing research interest to the increasing prominence of deep learning models of late.

There are many ways to search for or discover neural architectures. Over the past couple of years, the community has seen different search methods proposed, including:

  • reinforcement learning
  • evolutionary algorithms
  • sequential model-based optimisation
  • Bayesian optimisation
  • gradient-based optimisation

In ENAS, a controller RNN trained with reinforcement learning generates child architectures using one of two strategies: macro search and micro search. That’s right: a neural network building another neural network.

This post illustrates how the macro and micro search strategies lead to generating neural networks. While the illustrations and animations serve to guide the reader, the sequence of animations does not necessarily reflect the actual flow of operations (due to vectorisation, etc.).

This walkthrough focuses on neural architecture search for CNNs in an image classification task. It assumes that the reader is familiar with the basics of RNNs, CNNs, and reinforcement learning. Familiarity with deep learning concepts like transfer learning and skip/residual connections will greatly help, as they are heavily used in the architecture search. It is not required to have read the ENAS paper, but it would speed up your understanding.

 


 

Contents

  0. Overview
  1. Search Strategy
     1.1 Macro Search
     1.2 Micro Search
  2. Notes
  3. Summary
  4. Implementations
  5. References

 


 

0. Overview

In ENAS, there are 2 types of neural networks involved:

  • Controller – a predefined RNN, which is a long short-term memory (LSTM) cell
  • Child model – the desired CNN for image classification

Like most other NAS algorithms, ENAS involves 3 concepts:

  1. Search space — all the different possible architectures or child models that can possibly be generated
  2. Search strategy — a method to generate these architectures or child models
  3. Performance evaluation — a method to measure the effectiveness of the generated child models

Let’s see how these five ideas form the ENAS story.

First, the controller generates a candidate architecture, i.e. a child model, by sampling a sequence of decisions from the search space.

This child model is then trained for a certain number of child epochs, say 100. Then, a validation accuracy is obtained from this trained model.

Then, we update the controller’s parameters using REINFORCE, a policy-gradient reinforcement learning algorithm, to maximise the expected reward, which here is the validation accuracy. This parameter update aims to improve the controller so that it generates decisions that achieve higher validation accuracies.

This whole cycle of generating, training, and evaluating a child model, plus updating the controller, constitutes one controller epoch. We then repeat this for a specified number of controller epochs, say 2000.

The child model with the highest validation accuracy then becomes the neural network for your image classification task. However, this child model must go through just one more round of training (again specified by the number of child epochs) before it can be used for deployment.

A pseudo algorithm for the entire training is written below:

CONTROLLER_EPOCHS = 2000
CHILD_EPOCHS = 100
Build controller network
for i in CONTROLLER_EPOCHS:
    1. Generate a child model
    2. Train this child model for CHILD_EPOCHS
    3. Obtain val_acc
    4. Update controller parameters
Get child model with the highest val_acc
Train this child model for CHILD_EPOCHS
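
To make this loop concrete, below is a minimal Python sketch of the same procedure. The helper names (sample_child, train_child, evaluate, update_controller) are hypothetical placeholders for illustration, not the authors' API; the controller update follows the REINFORCE idea of scaling the log-probability of the sampled decisions by the reward minus a moving-average baseline.

# Minimal sketch of the ENAS training procedure. sample_child, train_child,
# evaluate and update_controller are hypothetical placeholder functions.
CONTROLLER_EPOCHS = 2000
CHILD_EPOCHS = 100

history = []    # (architecture, validation accuracy) pairs
baseline = 0.0  # moving average of rewards, reduces REINFORCE variance

for i in range(CONTROLLER_EPOCHS):
    arch, log_prob = sample_child(controller)  # 1. generate a child model
    train_child(arch, epochs=CHILD_EPOCHS)     # 2. train this child model
    val_acc = evaluate(arch)                   # 3. obtain val_acc
    baseline = 0.95 * baseline + 0.05 * val_acc
    # 4. update controller parameters: favour decisions whose reward
    #    (val_acc) beats the running baseline
    update_controller(controller, reward=val_acc - baseline, log_prob=log_prob)
    history.append((arch, val_acc))

best_arch, _ = max(history, key=lambda pair: pair[1])  # highest val_acc
train_child(best_arch, epochs=CHILD_EPOCHS)            # final round of training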

This entire problem is essentially a reinforcement learning framework with the archetypal elements:

  • Agent — Controller
  • Action — The decisions taken to build the child network
  • Reward — Validation accuracy from the child network

The aim of this reinforcement learning task is to maximise the reward (validation accuracy) from the actions taken (decisions taken to build child model architecture) by the agent (controller).


1. Search Strategy

Recall from the previous section that the controller generates the child model’s architecture using a certain search strategy. There are two questions you should ask about this statement: (1) how does the controller make decisions, and (2) what are the search strategies?

How does the controller make decisions?

This brings us to the model of the controller, which is an LSTM. This LSTM samples decisions via softmax classifiers, in an auto-regressive fashion: the decision in the previous step is fed as input embedding into the next step.
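
As a rough sketch of this mechanism, the snippet below (a simplified PyTorch illustration of the idea, not the authors' implementation) embeds the previous decision, advances the LSTM by one step, and samples the next decision from a softmax classifier. The real controller uses separate softmax heads for operations and skip connections; only the operation head is shown here, and the sizes are arbitrary.

import torch
import torch.nn as nn
from torch.distributions import Categorical

NUM_OPS, HIDDEN = 6, 64                  # 6 candidate operations (see Section 2)
embed = nn.Embedding(NUM_OPS, HIDDEN)    # embeds the previous decision
lstm = nn.LSTMCell(HIDDEN, HIDDEN)
classifier = nn.Linear(HIDDEN, NUM_OPS)  # softmax head over operations

def sample_operations(num_layers):
    """Auto-regressively sample one operation per layer."""
    h = torch.zeros(1, HIDDEN)
    c = torch.zeros(1, HIDDEN)
    x = torch.zeros(1, HIDDEN)           # fixed input for the first step
    decisions, log_probs = [], []
    for _ in range(num_layers):
        h, c = lstm(x, (h, c))
        dist = Categorical(logits=classifier(h))
        op = dist.sample()               # sample from the softmax
        decisions.append(op.item())
        log_probs.append(dist.log_prob(op))
        x = embed(op)                    # decision becomes the next input
    return decisions, torch.stack(log_probs).sum()

ops, log_prob = sample_operations(num_layers=4)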

What are the search strategies?

ENAS offers 2 strategies for searching for, or generating, an architecture:

  1. Macro search
  2. Micro search

In macro search, the controller designs the entire network. In micro search, the controller designs modules or building blocks, which are combined to build the final network. Some papers that implement the micro approach are Hierarchical NAS, Progressive NAS and NASNet.

In the following 2 sub-sections we will see how ENAS implements these 2 strategies.


1.1 Macro Search

In macro search, the controller makes 2 decisions for every layer in the child model:

  • the operation to perform (see Section 2. Notes for the list of operations)
  • the previous layer(s) to connect to for skip connections

In this macro search example, we will see how the controller generates a 4-layer child model. Each layer in this child model is colour-coded with red, green, blue and purple respectively.

Convolutional Layer 1 (Red)

Say the controller first samples the conv3×3 operation.

What this means for the child model is that we perform a convolution with a 3×3 filter on the input image.

 
The output from the first time step (conv3×3) of the controller corresponds to building the first layer (red) in the child model. This means the child model will first perform a 3×3 convolution on the input image.

I know I mentioned that the controller needs to make 2 decisions but there’s only 1 here. Since this is the first layer, we can only sample one decision which is the operation to perform, because there’s nothing else to connect to except for the input image itself.

Convolutional Layer 2 (Green)

For this layer, the controller samples two decisions: the operation sep5×5, and the previous layer to connect to: 1, i.e. the output from the red layer.

 
The outputs from the 2nd and 3rd time steps (1 and sep5×5) in the controller correspond to building Convolutional Layer 2 (green) in the child model.

Convolutional Layer 3 (Blue)

For this layer, the controller samples the operation max3×3, and the previous layers to connect to: 1 and 2.

 
The outputs from the 4th and 5th time steps (1,2 and max3×3) in the controller correspond to building Convolutional Layer 3 (blue) in the child model.

Convolutional Layer 4 (Purple)

For this layer, the controller samples the operation conv5×5, and the previous layers to connect to: 1 and 3.

 
The outputs from the 6th and 7th time steps (1,3 and conv5×5) in the controller correspond to building Convolutional Layer 4 (purple) in the child model.
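
Putting the four sampled decisions together, here is a hedged PyTorch sketch of the generated child model (my own illustration, not the paper’s code). I assume skip connections are merged by concatenating the chosen layers’ outputs along the channel dimension and applying a 1×1 convolution to restore the channel count (see Section 2. Notes); the channel count C and the stem convolution are arbitrary choices for this sketch.

import torch
import torch.nn as nn

# Sampled macro-search decisions: (operation, previous layers to connect to)
ARCH = [("conv3x3", []), ("sep5x5", [1]), ("max3x3", [1, 2]), ("conv5x5", [1, 3])]
C = 16  # channel count, arbitrary for this sketch

def make_op(name):
    # Each op preserves C channels and the spatial size.
    if name == "conv3x3":
        return nn.Conv2d(C, C, 3, padding=1)
    if name == "conv5x5":
        return nn.Conv2d(C, C, 5, padding=2)
    if name == "sep5x5":  # depthwise-separable convolution
        return nn.Sequential(nn.Conv2d(C, C, 5, padding=2, groups=C),
                             nn.Conv2d(C, C, 1))
    if name == "max3x3":
        return nn.MaxPool2d(3, stride=1, padding=1)
    raise ValueError(name)

class MacroChild(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, C, 3, padding=1)
        self.merge = nn.ModuleList()  # 1x1 convs restoring the channel count
        self.ops = nn.ModuleList()
        for name, skips in ARCH:
            self.merge.append(nn.Conv2d(C * max(len(skips), 1), C, 1))
            self.ops.append(make_op(name))

    def forward(self, image):
        outs = [self.stem(image)]  # outs[i] is the output of layer i (0 = stem)
        for (name, skips), merge, op in zip(ARCH, self.merge, self.ops):
            x = torch.cat([outs[i] for i in skips], dim=1) if skips else outs[-1]
            outs.append(op(merge(x)))
        return outs[-1]

model = MacroChild()
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])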

End

And there you have it — a child model generated using macro search! Now on to micro search. Heads up: micro search isn’t as straightforward as macro search.


1.2 Micro Search

In micro search, the controller designs two kinds of cells: convolutional cells and reduction cells. Simply put, a convolutional cell or reduction cell is just a block of operations. Both are similar — the only thing different about reduction cells is that the operations are applied with a stride of 2, thus reducing the spatial dimensions.

How to connect these cells to form the final network, you may ask?

The final network

Below is an image that gives you a quick overview of the final generated child model.

 
Fig. 1.2.1: Overview of the final neural network generated. Image source.

Let’s come back to this in a bit.

Building units for networks derived for micro search

There’s sort of a hierarchy of the ‘building units’ of child networks derived from micro search. From biggest to smallest:

  • block
  • convolutional cell / reduction cell
  • node

A block comprises N convolutional cells and 1 reduction cell, and each cell comprises B nodes. (N and B are hyperparameters that can be tuned by the architect.)

The figure below shows a block with N=3 convolutional cells and 1 reduction cell. The operations within each cell are not shown here.

 
Fig. 1.2.2: A block comprising N=3 convolutional cells and 1 reduction cell. Image source.

So how to generate this child model from micro search, you may ask? Continue reading!

Generate a child model from micro search

For this walkthrough, suppose each cell comprises B=4 nodes. This means our generated child model should look like this:

 
Fig. 1.2.3: A neural network generated from micro search which has 1 block, consisting of 3 convolutional cells and 1 reduction cell. The individual operations are not shown here.

Let’s now build a convolutional cell!

Fast forward

Suppose Convolutional Cells #1 and #2 have already been built for us, with their sampled operations joined by add operations. Let’s just take this for granted for now.

 
Fig. 1.2.4: Two convolutional cells already built in micro search.

With 2 convolutional cells built for us, let’s move on to the third.

Convolutional Cell #3

Now, let’s ‘prepare’ the third convolutional cell — the cell that you and I will be building together.

 
Fig. 1.2.5: ‘Preparing’ the third convolutional cell in micro search.

Remember that every cell comprises B=4 nodes? So where are these 4 nodes?

The first 2 nodes are the outputs of the two previous convolutional cells. What about the other 2 nodes? These 2 nodes fall within the very convolutional cell that we are building right now. Let’s make clear where these nodes are:

 
Fig. 1.2.6: Identifying the 4 nodes while building Convolutional Cell #3.

From this section onwards, you can safely disregard the ‘Convolutional cell’ labels you see on the image above and concentrate on the ‘Nodes’ labels:

Node 1 — red (Convolutional Cell #1)

Node 2 — blue (Convolutional Cell #2)

Node 3 — green

Node 4 — purple

If you’re wondering if these nodes will change for every convolutional cell we’re building, the answer is yes! Every cell will ‘assign’ the nodes in this manner.

You might also wonder — since we’ve already built the operations in Node 1 and Node 2 (which are Convolutional Cells #1 and #2), what’s there left to build in these nodes? You asked the right question.

Convolutional Cell #3: Node 1 (red) and Node 2 (blue)

Node 1 and Node 2 only serve as inputs to the other nodes. In our example, since we are building 4 nodes, Node 1 and Node 2 can be inputs to Node 3 and Node 4. So, yay! We don’t have to do anything for Node 1 and Node 2, and we can now move on to building Node 3 and Node 4. Phew!

Convolutional Cell #3: Node 3 (Green)

Node 3 is where the building starts. Unlike in macro search, where the controller samples 2 decisions for every layer, here in micro search the controller samples 4 decisions for us (or rather 2 sets of decisions):

  • 2 nodes to connect to
  • the respective 2 operations to perform on the nodes to connect to

With 4 decisions to make, the controller runs 4 time steps. Have a look below:

 
Fig. 1.2.7: The outputs of the first four controller time steps (2, 1, avg5×5, sep5×5), which will be used to build Node 3.

The controller sampled 2, 1, avg5×5 and sep5×5 from the four time steps. How does this translate to the architecture of the child model? Let’s see:

 
Fig. 1.2.8: How the outputs of the first four controller time steps (2, 1, avg5×5, sep5×5) are translated to build Node 3.

From the above, there are three things that just happened:

  1. The output of Node 2 goes through the avg5×5 operation.
  2. The output of Node 1 goes through the sep5×5 operation.
  3. The results of both operations are combined via an add operation.

This is how every node works: the outputs of its two sampled operations are always combined via add operations.
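
For instance, Node 3’s computation could be sketched as follows (illustrative PyTorch with an arbitrary channel count; in the actual algorithm the operation weights are shared across sampled architectures):

import torch
import torch.nn as nn

C = 16  # arbitrary channel count for this sketch
avg5x5 = nn.AvgPool2d(5, stride=1, padding=2)   # applied to Node 2's output
sep5x5 = nn.Sequential(nn.Conv2d(C, C, 5, padding=2, groups=C),
                       nn.Conv2d(C, C, 1))      # applied to Node 1's output

def node3(node1_out, node2_out):
    # Node 3 = add(avg5x5(Node 2), sep5x5(Node 1))
    return avg5x5(node2_out) + sep5x5(node1_out)

out = node3(torch.randn(1, C, 32, 32), torch.randn(1, C, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])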

Convolutional Cell #3: Node 4 (Purple)

avg3×3.

 
Fig. 1.2.9: The outputs of the next four controller time steps (3, 1, id, avg3×3), which will be used to build Node 4.

This translates to building the following:

 
Fig. 1.2.10: How the outputs of the next four controller time steps (3, 1, id, avg3×3) are translated to build Node 4.

What just happened?

  1. The output of Node 3 goes through the id operation.
  2. The output of Node 1 goes through the avg3×3 operation.
  3. The results of both operations are combined via an add operation.

And that’s it: we’re done with Convolutional Cell #3.

Reduction Cell

Since N=3 in this tutorial and we’ve just finished Convolutional Cell #3, it’s time to build a reduction cell. As mentioned earlier, a reduction cell is designed in the same way as a convolutional cell, except that the sampled operations are applied with a stride of 2.
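
In code terms, the only difference is the stride of the sampled operations. Compare the same pooling operation as it would appear in a convolutional cell versus a reduction cell (an illustrative PyTorch comparison, not the paper’s implementation):

import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)
conv_cell_op = nn.AvgPool2d(3, stride=1, padding=1)  # convolutional cell: keeps 32x32
reduction_op = nn.AvgPool2d(3, stride=2, padding=1)  # reduction cell: halves to 16x16

print(conv_cell_op(x).shape)  # torch.Size([1, 16, 32, 32])
print(reduction_op(x).shape)  # torch.Size([1, 16, 16, 16])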

End

And so that wraps up generating a child model out of the micro search strategy. Phew! I hope that wasn’t too much for you, because it was for me when I first read the paper.


2. Notes

Because this post mainly shows the macro and micro search strategies, I’ve left out many small details (especially on the concept of transfer learning). Let me briefly cover them:

  • What’s so ‘efficient’ in ENAS? Answer: transfer learning. If a computation between two nodes has been done (trained) before, the weights from the convolutional filters and the 1×1 convolutions (used to maintain the number of output channels; not mentioned in the previous sections) are reused. This is what makes ENAS faster than its predecessors!
  • It is possible for the controller to sample a decision where no skip connection is needed.
  • There are 6 operations available to the controller: convolutions with filter sizes 3×3 and 5×5, depthwise-separable convolutions with filter sizes 3×3 and 5×5, and max pooling and average pooling with kernel size 3×3 (see the sketch after this list).
  • Do read up on the concatenate operation at the end of each cell, which ties up the ‘loose ends’ of any unused nodes.
  • Do read up briefly on the policy gradient algorithm (REINFORCE) in reinforcement learning.
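
As a quick reference, the 6 operations could be instantiated roughly as follows (a sketch with an arbitrary channel count C; the padding values are my choice to keep spatial sizes unchanged):

import torch.nn as nn

C = 16  # arbitrary channel count for illustration

def sep_conv(k):
    # Depthwise-separable convolution: depthwise k x k, then pointwise 1x1.
    return nn.Sequential(nn.Conv2d(C, C, k, padding=k // 2, groups=C),
                         nn.Conv2d(C, C, 1))

OPS = {
    "conv3x3": nn.Conv2d(C, C, 3, padding=1),
    "conv5x5": nn.Conv2d(C, C, 5, padding=2),
    "sep3x3": sep_conv(3),
    "sep5x5": sep_conv(5),
    "max3x3": nn.MaxPool2d(3, stride=1, padding=1),
    "avg3x3": nn.AvgPool2d(3, stride=1, padding=1),
}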


3. Summary

Macro search (for an entire network)

The final child model is as shown below.

 
Fig. 3.1: Generating a convolutional neural network with macro search.

Micro search (for a convolutional cell)

Note that only part of the final child model is shown here.

 
Fig. 3.2: Generating a convolutional neural network with micro search. Only part of the full architecture is shown.


4. Implementations

  • TensorFlow implementation by the authors
  • Keras implementation
  • PyTorch implementation


5. References

Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient Neural Architecture Search via Parameter Sharing.

Zoph, B., & Le, Q. V. (2017). Neural Architecture Search with Reinforcement Learning.

Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning Transferable Architectures for Scalable Image Recognition.

 


 

This wraps up the walkthrough of Efficient Neural Architecture Search via Parameter Sharing. If you have any questions, please leave a comment.

Other Articles on Deep Learning

General

Counting No. of Parameters in Deep Learning Models

Related to NLP

Animated RNN, LSTM and GRU

Attn: Illustrated Attention

Line-by-Line Word2Vec Implementation

Related to Computer Vision

Breaking down Mean Average Precision (mAP)

Optimisation

Step-by-Step Tutorial on Linear Regression with Stochastic Gradient Descent

10 Gradient Descent Optimisation Algorithms + Cheat Sheet

Thanks to the readers who offered ideas, suggestions and corrections to this article.

 

 
