Convolutional Neural Network for Coronal Hole Detection
A Supervised Learning Image Segmentation Problem
The goal of this project was to create a machine learning model for the detection of Coronal Holes in line-of-sight solar disk images. Because this is an image segmentation problem with labeled data, it calls for a supervised learning algorithm, and I decided to use a Convolutional Neural Network (CNN). The creation of the final data product (a full-Sun map) was broken down into four main steps: dataset creation, model training, model validation, and mapping.
Dataset Creation
The data used were pre-processed images stored in HDF5 files by my previously created image pre-processing pipeline. Here, long-term averages of EUVI/STEREO A/B (195 Å) and SDO/AIA (193 Å) images are used to compute data-derived corrections for center-to-limb variations and intensity differences among instruments. The calculation of these corrections has been greatly simplified by the modern database storage and querying techniques implemented in the image processing pipeline. Each image has limb-brightening and inter-instrument intensity corrections applied before moving on to the creation of a segmentation mask.
After image processing, Coronal Hole regions are detected using our previously created two-threshold, region-growing algorithm, which uses pixel-connectivity requirements to avoid false detections. These Coronal Hole Detections are used as the segmentation mask (Y data) while the pre-processed images are used as the input images (X data) for training the model.
To train this model, I began by creating my training dataset. I used solar disk images from the year 2011 at a two-hour cadence, with 3 images per timestamp (one from each instrument). Each image underwent the image pre-processing corrections and was then stored in an HDF5 file to decrease the amount of memory used when training the model. An additional advantage of the HDF5 format was that I was able to hierarchically structure my dataset and group together images and segmentation masks from the same timestamp.
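As a rough illustration of that hierarchical grouping (the group and dataset names here are hypothetical, not necessarily the ones used in the pipeline), each timestamp group could hold the three instrument images alongside their matching masks:

import h5py
import numpy as np

# Dummy stand-ins for the processed disk images and their CHD masks
img_height, img_width = 512, 512
instruments = ("AIA", "EUVI-A", "EUVI-B")
images = {inst: np.random.rand(img_height, img_width) for inst in instruments}
masks = {inst: np.zeros((img_height, img_width), dtype=np.uint8) for inst in instruments}

# One group per timestamp, with a subgroup per instrument holding the
# processed EUV image and its matching Coronal Hole segmentation mask
with h5py.File("training_data.h5", "w") as f:
    grp = f.create_group("2011-03-07T120000")
    for inst in instruments:
        sub = grp.create_group(inst)
        sub.create_dataset("image", data=images[inst], compression="gzip")
        sub.create_dataset("chd_mask", data=masks[inst], compression="gzip")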
The processed disk images and accompanying Coronal Hole Detections were two-dimensional intensity arrays of shape (IMG_HEIGHT, IMG_WIDTH); however, for use in a CNN model, this data had to be converted to RGB (or RGB-A) arrays of shape (IMG_HEIGHT, IMG_WIDTH, N_CHANNELS). The conversion from intensity-based arrays to RGB(-A) arrays was done using the matplotlib colors package.
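A minimal sketch of that conversion, assuming a simple linear normalization and one of matplotlib's built-in colormaps (the actual colormap and scaling used in the pipeline may differ):

import numpy as np
from matplotlib import cm, colors

# Dummy intensity image of shape (IMG_HEIGHT, IMG_WIDTH)
intensity = np.random.rand(512, 512)

# Normalize intensities to [0, 1] and apply a colormap, giving an
# (IMG_HEIGHT, IMG_WIDTH, 4) RGBA array; drop the alpha channel for RGB
norm = colors.Normalize(vmin=intensity.min(), vmax=intensity.max())
rgba = cm.ScalarMappable(norm=norm, cmap="viridis").to_rgba(intensity)
rgb = rgba[..., :3]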
This data was then split into training and validation datasets using the sklearn.model_selection.train_test_split() method. Once split, the data was converted into dictionaries of "image" and "segmentation_mask" for each datapoint. I then used the tensorflow tf.data.Dataset.from_tensor_slices() method to turn these dictionaries into tensorflow-compatible tensor slices for training. This was the most time consuming pre-processing step, as converting numpy arrays to tensorflow-compatible tensors is a computationally intensive process.
After being converted into tensor format, each dataset (training and validation) was mapped to the model's input image size and normalized.
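Putting these steps together, a rough sketch of the split, conversion, and mapping might look like the following (the array shapes, IMG_SIZE, batch size, and normalization choices are placeholders for the user-defined values, not the exact ones I used):

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

IMG_SIZE = 128
BATCH_SIZE = 8

# Dummy RGB images (X data) and segmentation masks (Y data)
x = np.random.rand(16, 256, 256, 3).astype("float32")
y = np.random.randint(0, 2, (16, 256, 256, 1)).astype("float32")

# Split into training and validation sets
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2)

def preprocess(sample):
    # Resize each image/mask pair to the model input size; values stay in [0, 1]
    image = tf.image.resize(sample["image"], (IMG_SIZE, IMG_SIZE))
    mask = tf.image.resize(sample["segmentation_mask"], (IMG_SIZE, IMG_SIZE),
                           method="nearest")
    return image, mask

train_dataset = (tf.data.Dataset
                 .from_tensor_slices({"image": x_train, "segmentation_mask": y_train})
                 .map(preprocess)
                 .batch(BATCH_SIZE))
val_dataset = (tf.data.Dataset
               .from_tensor_slices({"image": x_val, "segmentation_mask": y_val})
               .map(preprocess)
               .batch(BATCH_SIZE))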
Model Creation
At its core, the model I used was a U-Net. This is an encoder-decoder architecture favored for its ability to produce precise segmentations even when trained on relatively few images. The model has a symmetric contraction and expansion path that give it its characteristic U-shape and name!
Below we see the full expansion of the model:
The model input was a four-dimensional tensor of size (BATCH_SIZE, IMG_SIZE, IMG_SIZE, N_CHANNELS), where each dimension was a user-defined, adjustable parameter. The model went through an encoder and decoder sequence, and each block used two consecutive convolutional layers, each followed by a ReLU activation function.
In the encoder (contraction) pathway, regular convolutions and max pooling were applied. The goal of pooling is to downsample an input image to extract dominant features. Max pooling is a downsampling method by which the maximum value is picked out of the receptive field. With max pooling, you look at the maximum presence of a feature rather than the average presence (such as in average pooling). Through contraction, the model learns pixel intensity information about the image, but loses spatial information in the process.
We then have the decoder (expansion) pathway. In the decoder path, we apply upsampling along with our regular convolutions. Upsampling is done with transposed convolutions, and its goal is to regain spatial information and produce an output image with the same dimensions as the input image. The functions I used to build both the encoder and decoder pathways came from the tensorflow.keras.layers package.
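As a simplified sketch of one contraction and one expansion block built from tensorflow.keras.layers (the filter counts, depth, and input size here are illustrative, not the exact architecture I used):

from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(128, 128, 3))

# Encoder block: two 3x3 convolutions with ReLU, then 2x2 max pooling
c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(c1)
p1 = layers.MaxPooling2D(2)(c1)

# Bottleneck at the bottom of the "U"
b = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
b = layers.Conv2D(32, 3, activation="relu", padding="same")(b)

# Decoder block: transposed convolution to upsample, skip connection
# back to the encoder, then two more convolutions
u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(b)
u1 = layers.concatenate([u1, c1])
c2 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)
c2 = layers.Conv2D(16, 3, activation="relu", padding="same")(c2)

# Single-channel sigmoid output for the binary Coronal Hole mask
outputs = layers.Conv2D(1, 1, activation="sigmoid")(c2)
model = Model(inputs, outputs)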
In addition to the encoder and decoder pathways that form the basic structure of the model, a few other pieces of model creation were important, specifically model compilation and training. Below is the line used to compile the model:
from tensorflow.keras.optimizers import Adam
model.compile(optimizer=Adam(), loss="binary_crossentropy", metrics=["accuracy"])
The optimizer and loss arguments are user defined, and the tensorflow.keras package has a multitude of options. The optimizer I used was the Adam optimizer. The Adam optimizer is computationally efficient and easily implemented, even with large training datasets. It is a stochastic gradient descent method based on adaptive estimation of first- and second-order moments. This means it adapts the step size for each parameter using not only the mean of the gradients (first moment) but also their uncentered variance (second moment). It has a recommended learning rate of 0.001, which I used. In an optimization algorithm, the learning rate determines the step size of each iteration. Choosing an appropriate learning rate is important: too large a rate can overshoot minima and fail to converge, while too small a rate makes training slow.
The binary cross entropy (log) loss function is a bit different from the traditional softmax loss function. It couples a sigmoid activation function with cross entropy loss, which means the loss calculated for each output in the CNN is independent of the other values. Cross entropy is a measure of the difference between two probability distributions. We can visualize this loss function with the log loss equation below:
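In its standard form, for true pixel labels $y_i \in \{0, 1\}$ and predicted probabilities $\hat{y}_i$ over $N$ pixels, the binary cross entropy (log loss) is:

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \Big[\, y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \,\Big]$$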
The next step was fitting the model to training data:
results = model.fit(train_dataset, epochs=EPOCHS, callbacks=callbacks, validation_data=val_dataset)
In addition to using the traditional model.fit() function, I implemented callbacks using the tensorflow Keras library. The three functions I used were EarlyStopping(), ReduceLROnPlateau(), and ModelCheckpoint().
EarlyStopping() was used to end model training early if the validation loss ceased to improve after 10 epochs. This was to protect against continuing model training without any improvement in accuracy. ReduceLROnPlateau() is a Keras function used for learning rate decay. If after 5 epochs (a user-defined value) the validation loss had continued to plateau, the learning rate would decay by a factor of 0.1 (user defined) until reaching the minimum learning rate of 0.00001 (user defined). This ensured a balance between a low learning rate, which allows for better prediction yet takes longer, and a high learning rate, which may converge faster but could result in an inaccurate model. ModelCheckpoint() is the Keras function to save model weights as an HDF5 file. It was set to save the model after each epoch, yet only keep the best model. This means that if the model at epoch 24 was not better than that of epoch 23, the HDF5 file would not be overwritten.
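A sketch of how these three callbacks could be configured with the values described above (the monitored quantity and the output file name are my assumptions):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

callbacks = [
    # Stop training if the validation loss has not improved for 10 epochs
    EarlyStopping(monitor="val_loss", patience=10),
    # Decay the learning rate by a factor of 0.1 after 5 stagnant epochs,
    # down to a minimum learning rate of 0.00001
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5, min_lr=0.00001),
    # Save the model to an HDF5 file after each epoch, keeping only the best one
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
]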
The model was then trained using the training dataset I created, as described earlier. Training was set to halt after 50 epochs; however, in some instances training ended early, around epoch 30. Below, we can see the learning curve from model training for the model that showed the best results. The training curve is shown in orchid and the validation curve in light blue. The red X marks the best model, which is the one saved and used for model validation and implementation.
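For reference, a learning curve like this one can be plotted directly from the History object returned by model.fit(); a minimal sketch (the exact styling of my figure differs):

import matplotlib.pyplot as plt

# results is the History object returned by model.fit()
plt.plot(results.history["loss"], color="orchid", label="training loss")
plt.plot(results.history["val_loss"], color="lightblue", label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("Binary cross entropy loss")
plt.legend()
plt.show()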
Model Validation
The model was tested and validated using the training, validation, and test datasets. Each of these datasets was stored in an HDF5 file that was read and parsed as needed. The predictions from each set of model validation steps are seen below. Each prediction has three images: the initial EUV image, the CHD segmentation mask, and the model prediction. Additionally, I created a binary model prediction by thresholding the model output at 0.5: if the predicted probability at a pixel was above 0.5, that pixel was classified as a Coronal Hole, and otherwise it was not.
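A minimal sketch of that thresholding step, reusing the model and validation dataset from the earlier sketches (variable names are illustrative):

import numpy as np

# Raw sigmoid output from the model: predicted Coronal Hole probabilities in [0, 1]
pred = model.predict(val_dataset)

# Binary prediction: 1 where the predicted probability exceeds 0.5, else 0
binary_pred = (pred > 0.5).astype(np.uint8)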
Prediction of Training Data
This prediction was done on data the model saw while training. We see that the model prediction and segmentation mask are extremely similar, because the model has seen this datapoint before.
Prediction of Validation Data
We see a prediction done on validation data below. This was the data the model used to monitor the loss function and model accuracy during training.
Prediction of Test Data
Finally, below we see a prediction done on test data: completely new data that the model has never seen before. We see that this prediction seems extremely accurate!
Mapping Pipeline
After all this hard work, we are able to integrate our model into the full mapping pipeline. There are multiple mapping pipelines for the creation of various full-Sun, Carrington Rotation, and ensemble detection maps. Below we see a full-Sun synchronic map with an overlaid Coronal Hole detection. How pretty!