The High Accuracy Diagnosis for MRI Imaging Of Alzheimer’s Disease using Xgboost

Alzheimer’s disease (AD) is the most common type of dementia. The cause of the disease and an effective treatment remain unidentified. However, when the impairment is still at a preliminary stage, known as mild cognitive impairment (MCI), the symptoms might be better controlled and the treatment can be more effective. As a result, computational diagnosis of the disease based on brain medical images is crucial for early diagnosis. In this study, an efficient computational method was introduced to classify MRI brain scans of patients with Alzheimer’s disease (AD), patients with mild cognitive impairment (MCI), and normal aging controls (NC). The method comprises three main steps: I) feature extraction, II) feature selection, and III) classification. Although most current approaches use binary classification, the proposed model can differentiate between multiple stages of Alzheimer’s disease and achieves superior results in early-stage AD diagnosis. A total of 158 magnetic resonance images (MRI) were taken from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database and were preprocessed and normalized so that volume, cortical thickness, sulcal depth, and gyrification index measures could be extracted for various brain regions of interest (ROIs), as these measures play a considerable role in the detection of AD. An embedded feature selection method was used to select the most informative features for AD diagnosis. Three models were used to classify AD based on the selected features: extreme gradient boosting (XGBoost), support vector machine (SVM), and K-nearest neighbor (KNN). XGBoost achieved the highest accuracy of 92.31%, with a precision of 0.92, recall of 0.92, F1-score of 0.92, and AUC of 0.9543. Recent research has reported using multivariable data analysis to classify dementia stages such as MCI and AD and employing machine learning to predict dementia stages. In the proposed method, we achieved good performance for early-stage AD (MCI) detection, which is the most targeted stage to be identified. Moreover, we investigated the most reliable features for the diagnosis of AD.


INTRODUCTION
Alzheimer's disease (AD) is a progressive neurodegenerative disease for which there is currently no cure. However, detecting the disease at an earlier stage can help slow the progression of AD. Neuropathological changes due to AD appear prior to the onset of clinical symptoms. Hence, there is a need to detect brain alt- [...] to a target brain via automated high-dimensional registration to obtain labeled ROIs from the target. Following this, regional tissue measurements such as ROI volume, cortical thickness, sulcal depth, and gyrification index were estimated. Various software packages are available for this automatic feature extraction and analysis, such as Statistical Parametric Mapping (SPM), Computational Anatomy Toolbox (CAT12) [6], and FreeSurfer [7].
The chief concern addressed in several studies on diagnosing AD at its early stages is building more efficient biomarkers from MRI scans for AD detection. The use of various machine learning classifiers to select features related to Alzheimer's and to develop an accurate detection system is a current area of research. MRI scans have been studied to obtain several Alzheimer's biomarkers and to study the most atrophic regions using volume [8,9], shape [10], texture [11], cortical [12,13], and sulcal measurements [14]. These measurements were performed on various affected brain regions, including the hippocampus [15], amygdala [16,17], whole brain [18], entorhinal cortex [19], brainstem [20], and ventricles [21]. Although volume and cortical thickness are the most dominant biomarkers studied, there have been very few investigations of other surface-based features, such as the gyrification index and sulcal depth, in early AD diagnosis. Analysis of the gyrification index and sulcal depth could provide valuable information about the alterations in brain shape caused by AD. These alterations are not detected with conventional volumetric analyses but could be captured with cortical gyrification analysis [22]. The sulcal depth and sulcal width were observed to be lower in normal controls and to increase with the severity level of AD [12,14]. The gyrification index, which is the ratio of the inner surface (GM/WM interface) area to the outer surface (GM/CSF interface) area, is an excellent feature for the early diagnosis of patients with mild AD and for separating them from normal controls [22]. Some researchers have suggested that combining complementary biomarkers carrying different information could provide more efficient and accurate evidence for AD, MCI, and NC diagnosis [23].
Recent advances in machine learning techniques, such as support vector machine (SVM), K-nearest neighbor (KNN), decision tree [24], and ensemble models [25], enhance the process of disease diagnosis and increase accuracy through automated systems instead of relying entirely on physician experience. However, selecting the best biomarkers that represent Alzheimer's and can be used to distinguish between stages of the disease remains a major challenge. The particle swarm optimization (PSO) algorithm [26], XGBoost [27], and the t-test [29] are some of the feature selection methods that have been employed in recent studies. One study combined SVM and PSO to classify AD from NC with an accuracy of up to 94.12% and MCI from NC with an accuracy of up to 88.89%, using volume and shape features [30]. Another study demonstrated 96.5% classification accuracy for AD/NC by investigating the temporal lobe and whole-brain gray matter [31]. Furthermore, a multistage classifier-based method used 88 features (50 volumes and 38 regional cortical thicknesses) to predict AD/MCI/NC with an accuracy of up to 81.3%. On the other hand, one report obtained 0.52 average precision and 0.56 average recall for classifying AD, MCI, and NC using an ensemble random forest. Moreover, a surface-based morphometry report differentiated between AD and NC with 93.3% specificity and 87.1% sensitivity [32].
Our contribution is to identify the most informative subset of features from a large pool of AD biomarkers in order to differentiate between AD stages and diagnose the early stage with high accuracy using XGBoost. In this study, MRI scans were acquired from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and processed to extract volumetric features for 71 brain regions. In addition, cortical thickness, sulcal depth, and gyrification index features were obtained for 68 brain regions, and all features were combined to obtain 275 attributes. We used XGBoost to find the features that best represent Alzheimer's disease in order to build a more precise classification system. Finally, three different classifiers (XGBoost, SVM, and KNN) were used to compare the classification accuracies.

Database
Data required for this study were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). ADNI was launched in 2003 as a public-private partnership by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and nonprofit organizations. ADNI's main objective was to test whether specific biomarkers, clinical and neuropsychological assessment, positron emission tomography (PET), and serial MRI can be combined to evaluate the progression of MCI and early Alzheimer's.
A total of 158 T1-weighted MRI scans were taken from ADNI: 26 females and 28 males in the AD stage, 28 females and 25 males in the NC stage, and 27 females and 24 males in the MCI stage. The age range of the participants was 50-85 years. The imaging parameters were as follows: magnetic field strength = 3 T, flip angle = 9°, repetition time = 2,300 ms, echo time = 3.0 ms, slice thickness = 1.2 mm, acquisition matrix = 240 × 256, pixel spacing X = 1.0 mm, pixel spacing Y = 1.0 mm, and number of slices = 176. Some criteria were not considered in the dataset, such as the Clinical Dementia Rating Scale (CDR), Mini-Mental State Examination (MMSE), chronic diseases, and the medical history of the patient. The demographic characteristics of the subjects are presented in Table 1.

Image Analysis
Data were collected from ADNI and preprocessed using CAT12. The preprocessing workflow included a spatial adaptive nonlocal means (SANLM) denoising filter [33] to reduce noise while preserving edges, bias field inhomogeneity correction and affine registration to obtain a higher-quality segmentation outcome, skull stripping with the adaptive probability region growing (APRG) approach, and segmentation into three tissues (GM, WM, and CSF) using the Adaptive Maximum A Posteriori (AMAP) technique [34]. After that, spatial normalization of the three tissues was performed using the DARTEL [35] template. Then, the Hammers atlas [36], one of the volume atlases, was used to calculate GM and WM volumes for specific brain regions. Surface-based processing was performed once the previous processing was complete. The projection-based thickness (PBT) method [37] was used to estimate cortical thickness, and the central cortical surface was reconstructed. The central surface is the surface between the inner (WM/GM boundary) and outer (GM/CSF boundary) cortical surfaces; it represents the cortex well and enables reliable estimation of cortical parameters (GI, SD). Ultimately, 71 raw volumetric measurements, 68 cortical thickness (CT) measurements, 68 gyrification index (GI) measurements, and 68 sulcal depth (SD) measurements were extracted. Volume measurements involving the hippocampus, amygdala, temporal pole, fusiform, insula, putamen, thalamus, lateral temporal ventricle, and cuneus were normalized by the intracranial volume. Relative volumes provided more precise measurements by reducing the influence of factors such as head size and brain size. Surface-based features (CT, GI, and SD) cover regions such as the entorhinal, temporal pole, insula, fusiform, and parahippocampal regions. By combining volume with surface-based features, we collect most of the important parameters that indicate the existence of the disease, as they are complementary biomarkers with valid information (Fig. 1).
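The normalization by intracranial volume and the assembly of the combined feature vector can be illustrated with a minimal Python sketch. The function names and the placeholder values are hypothetical and are not part of the CAT12 workflow; only the group sizes (71 volumes and 68 regions for each of CT, GI, and SD) come from the text.

```python
# Minimal sketch (hypothetical helper names, placeholder values).
N_VOLUME_ROIS = 71    # volumetric measurements per subject (from the text)
N_SURFACE_ROIS = 68   # regions for each of CT, GI and SD (from the text)


def relative_volumes(raw_volumes, intracranial_volume):
    """Divide each regional volume by the subject's intracranial volume."""
    if intracranial_volume <= 0:
        raise ValueError("intracranial volume must be positive")
    return [v / intracranial_volume for v in raw_volumes]


def build_feature_vector(volumes, thickness, gyrification, sulcal_depth):
    """Concatenate the four feature groups into one vector per subject."""
    if len(volumes) != N_VOLUME_ROIS:
        raise ValueError("expected 71 volumetric features")
    for group in (thickness, gyrification, sulcal_depth):
        if len(group) != N_SURFACE_ROIS:
            raise ValueError("expected 68 features per surface-based group")
    return list(volumes) + list(thickness) + list(gyrification) + list(sulcal_depth)


# Placeholder subject: every value here is made up for illustration.
volumes = relative_volumes([3.0] * N_VOLUME_ROIS, intracranial_volume=1500.0)
features = build_feature_vector(
    volumes,
    thickness=[2.5] * N_SURFACE_ROIS,
    gyrification=[28.0] * N_SURFACE_ROIS,
    sulcal_depth=[1.1] * N_SURFACE_ROIS,
)
print(len(features))
```

The length check makes the arithmetic explicit: 71 + 3 × 68 = 275 attributes per subject.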

Feature Selection
The feature selection process uses a specific algorithm to determine the dominant features that contribute most to the prediction variable, in order to improve model accuracy and reduce computational cost. There are three types of feature selection methods: filter, wrapper, and embedded. In embedded methods, feature selection is part of the training process, as the model picks the features that maximize accuracy [38]. Embedded methods have an advantage over wrapper methods because they eliminate the computation time required to reclassify different subsets. Moreover, they outperform filter methods by considering the dependencies between features [39]. Therefore, there is no need for a separate step that inspects the correlation between features. Thus, we used one of the embedded feature selection methods, XGBoost, to obtain the top-ranked features.
We have a high-dimensional feature vector consisting of 68 × 3 surface-based features (cortical thickness, sulcal depth, and gyrification index) and 71 volumetric features, and not all of them carry important information for diagnosing AD.
In XGBoost, we used the gain value associated with each feature to rank the features. The gain of a feature corresponds to the average loss reduction obtained when this feature is used to split trees. After ranking the features, we built models by progressively increasing the number of features, starting with the most important ones, and recorded the accuracy at each step. The accuracy stabilized from 16 to 21 features; beyond 21 features, it decreased by approximately 15%. Adding more features therefore does not improve the performance and makes the model more complex. Thus, the number of features was limited to 16.
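The ranking-and-sweep procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the gain values and the accuracy table are made up, and `evaluate` stands in for training and scoring a classifier on a feature subset. With the xgboost Python package, per-feature gains of a trained model can be read with `Booster.get_score(importance_type="gain")`.

```python
def rank_by_gain(gain_by_feature):
    """Return feature names sorted from highest to lowest gain."""
    return sorted(gain_by_feature, key=gain_by_feature.get, reverse=True)


def sweep_top_k(ranked_features, evaluate):
    """Evaluate the top-1, top-2, ... feature subsets in ranking order."""
    results = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        results.append((k, evaluate(subset)))
    return results


def smallest_best_k(results, tol=1e-9):
    """Smallest number of features that reaches the best recorded accuracy."""
    best = max(acc for _, acc in results)
    for k, acc in results:
        if acc >= best - tol:
            return k


# Made-up gains for four features named in the paper (illustration only).
toy_gains = {"rAmy": 9.1, "lentorhinal": 7.4, "lfusiform": 3.2, "rinsula": 1.0}
# Made-up accuracies: performance saturates after three features.
toy_accuracy = {1: 0.60, 2: 0.75, 3: 0.90, 4: 0.90}


def toy_evaluate(subset):
    # Stand-in for "train a classifier on `subset` and return its accuracy".
    return toy_accuracy[len(subset)]


ranked = rank_by_gain(toy_gains)
results = sweep_top_k(ranked, toy_evaluate)
print(ranked, smallest_best_k(results))
```

Choosing the smallest subset that reaches the best accuracy mirrors the reasoning in the text: additional features beyond that point add complexity without improving performance.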

Classification
In the classification step, three techniques were used: two conventional ones, SVM and KNN, and a more recent model, XGBoost. XGBoost is an enhanced version of the gradient boosting ensemble learning method with highly precise and promising results, introduced by Chen and Guestrin [40]. XGBoost comprises a series of decision trees (weak learners) that are created sequentially and whose decisions are combined to predict the target. As shown in Fig. (2), training begins by fitting one tree (weak classifier) to a split of the features. Another tree is then fitted to the training error (residual) of the previous tree, and this process is repeated. The final predicted output combines the results of all trees.
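The idea of fitting each new tree to the residual of the previous ones can be shown with a deliberately simplified sketch. It is plain gradient boosting with one-split trees (stumps) on a single feature and squared loss, not XGBoost itself, and it performs regression rather than three-class classification; the data and the learning rate of 0.5 are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:  # the largest value cannot separate anything
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    trees = []
    preds = [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return trees


def predict(trees, x, learning_rate=0.5):
    """The ensemble output is the (shrunken) sum of all tree outputs."""
    return sum(learning_rate * tree(x) for tree in trees)


def mse(trees, xs, ys, learning_rate=0.5):
    errors = [(y - predict(trees, x, learning_rate)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


XS = [1, 2, 3, 4, 5, 6]              # made-up feature values
YS = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]  # made-up targets
for rounds in (1, 5, 20):
    print(rounds, mse(boost(XS, YS, rounds), XS, YS))
```

Each round reduces the training error because the new stump is fitted to what the current ensemble still gets wrong.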
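The prediction function is defined as:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \qquad (1)$$

where $\hat{y}_i$ is the predicted class of the i-th observation, $x_i$ is the corresponding feature vector, and $K$ is the total number of decision trees. Each tree $f_k(x_i)$ is defined as:

$$f_k(x_i) = w_{q_k(x_i)} \qquad (2)$$

where $q_k(x_i)$ is the structure function of the k-th decision tree that maps $x_i$ to the corresponding leaf node, and $w$ is the vector of leaf weights.
XGBoost uses gradient descent to minimize the errors of weak learners. The objective function is expressed as follows:

$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \qquad (3)$$

where $n$ is the number of observations, $l$ is the loss function that measures the deviation between the prediction $\hat{y}_i$ and the true value $y_i$, and $\Omega$ is the regularization term (tree model complexity penalty term). $\Omega(f_k)$ is defined as:

$$\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 \qquad (4)$$

where $T$ is the number of leaf nodes, $w$ is the vector of leaf weights, and $\gamma$ and $\lambda$ are regularization coefficients that penalize the number of leaves and the magnitude of the leaf weights, respectively. The model is trained in an additive manner. Let $\hat{y}_i^{(t)}$ be the predicted value of the i-th observation at the t-th iteration; the prediction function is:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) \qquad (5)$$

And the objective function is altered to:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \qquad (6)$$

The regularization term penalizes model complexity, which discourages overly complex trees and thereby reduces the risk of overfitting.
XGBoost uses a second-order Taylor approximation to optimize the objective function quickly:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \qquad (7)$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss function with respect to $\hat{y}_i^{(t-1)}$.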
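The second-order approximation leads to closed-form expressions for the best weight of a leaf and for the gain of a candidate split, which is the quantity used to rank features. Writing G and H for the sums of the first and second derivatives of the loss over the observations in a leaf, the optimal leaf weight is −G / (H + λ), and the gain of a split is ½ [G_L² / (H_L + λ) + G_R² / (H_R + λ) − (G_L + G_R)² / (H_L + H_R + λ)] − γ. The sketch below evaluates these expressions for the squared loss ½(y − ŷ)², for which the first derivative is ŷ − y and the second derivative is 1; the function names and toy values are illustrative assumptions.

```python
def leaf_weight(grads, hessians, lam):
    """Optimal leaf weight: -G / (H + lambda)."""
    return -sum(grads) / (sum(hessians) + lam)


def leaf_score(grads, hessians, lam):
    """Structure score of a leaf: G^2 / (H + lambda)."""
    total_grad = sum(grads)
    return total_grad * total_grad / (sum(hessians) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction obtained by splitting a leaf into left and right."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma


# Toy example: targets [1, 1, 3, 3], current predictions all 0.
# Squared loss gives gradient = prediction - target and hessian = 1.
g_left, h_left = [-1.0, -1.0], [1.0, 1.0]
g_right, h_right = [-3.0, -3.0], [1.0, 1.0]

print(leaf_weight(g_left, h_left, lam=0.0))                    # mean residual of the left leaf
print(split_gain(g_left, h_left, g_right, h_right, lam=0.0))   # unregularized gain
print(split_gain(g_left, h_left, g_right, h_right, lam=1.0))   # smaller gain with lambda = 1
```

With λ = 0 the leaf weight reduces to the mean residual of the leaf, and increasing λ or γ shrinks the gain, which is how the regularization term discourages unnecessary splits.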
XGBoost is faster than conventional gradient boosting because it exploits parallel processing, which makes it practical to train on large datasets. It also handles small and sparse data efficiently and uses regularization to avoid overfitting. XGBoost provides a large variety of tuning parameters for cross-validation, regularization, user-defined objective functions, missing values, and tree parameters. It uses the features of each MRI image for training and computes an importance score, which indicates how significant each feature was in building the boosted decision trees within the model. The values used for each parameter are given in Table 2.
SVM is a supervised machine learning model used for classification or regression and has been broadly used in various successful applications. SVM chooses the best hyperplane or group of hyperplanes that maximizes the margin between classes to classify data. For non-linearly separable data, SVM uses a kernel function that maps the input data (training samples) to a higher-dimensional space, such as the Gaussian kernel [41]:

$$K(x, y) = \exp\left(-\gamma \lVert x - y \rVert^2\right) \qquad (8)$$

where $\gamma$ (gamma) controls the influence that each training point has on the position of the decision boundary, and $\lVert x - y \rVert^2$ is the squared Euclidean distance between the two feature vectors.
We used a polynomial kernel with a polynomial order of 3 and a box constraint of 5.23.
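For illustration, the two kernels mentioned above can be written as short Python functions. The Gaussian kernel follows Equation (8). The polynomial kernel is given in the common form (scale · x·y + offset)^order; the paper states only the order (3) and the box constraint (5.23), so the `scale` and `offset` defaults below are assumptions.

```python
import math


def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, order=3, scale=1.0, offset=1.0):
    """Polynomial kernel; `scale` and `offset` are assumed defaults."""
    dot = sum(a * b for a, b in zip(x, y))
    return (scale * dot + offset) ** order


u = [1.0, 2.0]
v = [3.0, 4.0]
print(gaussian_kernel(u, u, gamma=0.5))   # identical vectors give 1.0
print(gaussian_kernel(u, v, gamma=0.5))   # decays with distance
print(polynomial_kernel(u, v))            # (1*3 + 2*4 + 1) ** 3
```

The box constraint does not appear in the kernel itself; it bounds the penalty placed on margin violations during SVM training.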
KNN is a supervised nonparametric machine learning method. It stores all labeled data in memory during the training process; therefore, it is memory dependent and does not require model fitting. A test point is then classified based on a similarity measure between this point and its neighbors. Given x0 as a new point, the KNN search selects the k nearest points to x0 in terms of distance. The number of data points in each class is counted among these k neighbors, and the new point is classified based on votes from the neighbors [42]. The city block (Manhattan) distance was used to measure the distance between points, and the number of neighbors was 12.
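A minimal sketch of this voting rule with the city block distance is shown below. The training points and labels are made up, and the example uses k = 3 only because the toy dataset is tiny; the study reports k = 12.

```python
from collections import Counter


def cityblock(a, b):
    """City block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, x0, k=12):
    """Classify x0 by majority vote among its k nearest training points."""
    if k > len(train_points):
        raise ValueError("k cannot exceed the number of training points")
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: cityblock(pair[0], x0),
    )
    votes = Counter(label for _, label in by_distance[:k])
    # On a tie, Counter keeps first-seen order, so the nearest label wins.
    return votes.most_common(1)[0][0]


# Made-up two-feature training set (illustration only).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["NC", "NC", "NC", "AD", "AD", "AD"]

print(knn_predict(points, labels, (0.5, 0.5), k=3))
print(knn_predict(points, labels, (5.2, 5.1), k=3))
```

Because all labeled data are kept and searched at prediction time, there is no fitting step, which matches the memory-dependent behavior described above.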
Instead of using all features for classification, feature selection approaches are commonly used to improve the accuracy and performance, especially for high-dimensional datasets. XGBoost was used for feature selection, then XGBoost, SVM and KNN were used to classify the brain MRI scans into three classes: AD, MCI, and NC. The proposed classification approach is shown in Fig. (3).

RESULTS
There were 158 cases in this study: 119 participants (41 NC, 40 MCI, 38 AD) for training the models and 39 participants (12 NC, 11 MCI, 16 AD) for testing the performance of the classifiers. The features fall into four main groups: volume features, cortical thickness, sulcal depth, and gyrification index. Volume was measured in 71 regions of interest (ROIs) of the brain. Each of the other three features was measured for 68 ROIs, as explained in Appendix 1.
The XGBoost algorithm was used to obtain feature importance. Feature importance is a score that illustrates the value of each attribute in the building of boosted decision trees within the model. The higher the relative importance of an attribute, the more it is used to make key decisions in the decision trees. Feature importance is measured explicitly for each feature in the dataset by calculating the average decrease in impurity or the error function (such as the Gini impurity) for each feature across all decision trees within the model.
Using the XGBoost algorithm, we ranked all features. Then, starting with the most informative feature, we gradually increased the number of features until adding features no longer improved the performance (16 and 17 features). The accuracy plateaued at 92.31%, as shown in Fig. (4). We obtained 16 top-ranked features: from the volume features group: rAmy, lAmy, lSupFroGy, rPosTeLo, and lCbe; from the gyrification index features group: lparahippocampal, lparacentral, rinferiortemporal, rparsopercularis, and rlateralorbitofrontal; from the sulcal depth features group: lsuperiortemporal, lparacentral, and rtemporalpole; and from the cortical thickness features group: lentorhinal, lfusiform, and rinsula, as illustrated in Table 3 and Fig. (5). Subsequently, the three classifiers (SVM, KNN, and XGBoost) were trained on these features. We considered four commonly used metrics, ACC (accuracy), SEN (sensitivity), SPE (specificity), and AUC (area under the curve), to evaluate the classification performance. To achieve more stable results and compare all classifiers under the same conditions, we used 10-fold cross-validation. The metrics are defined as:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{sensitivity} = \text{recall} = \frac{TP}{TP + FN}$$

$$\text{specificity} = \frac{TN}{TN + FP}$$

where TP, TN, FP, and FN are true positive, true negative, false positive, and false negative, respectively. The area under the curve (AUC) is the two-dimensional area under the receiver operating characteristic (ROC) curve, which plots the true positive rate (sensitivity, y-axis) against the false positive rate (1 − specificity, x-axis) at various thresholds (0-1).
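For a three-class problem, these metrics are usually computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below assumes that convention; the confusion matrix is made up for illustration and does not reproduce the results of this study. Precision and the F1-score, which are reported later, are included using their standard definitions.

```python
def one_vs_rest_counts(confusion, class_index):
    """TP, TN, FP, FN for one class; rows are true labels, columns predictions."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp
    fp = sum(row[class_index] for row in confusion) - tp
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def class_metrics(confusion, class_index):
    """Per-class metrics (no guard against empty classes in this sketch)."""
    tp, tn, fp, fn = one_vs_rest_counts(confusion, class_index)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Made-up confusion matrix, class order: NC, MCI, AD.
toy_confusion = [
    [8, 1, 1],
    [1, 9, 0],
    [0, 2, 8],
]
print(overall_accuracy(toy_confusion))
print(class_metrics(toy_confusion, 0))
```

In this toy matrix, the NC class has a lower recall than precision, meaning its errors are mostly NC subjects labeled as patients rather than patients labeled as NC.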
From Table 4, XGBoost gave the highest accuracy (92.31%), compared with 89.18% for the other classifiers (SVM and KNN). In addition, XGBoost had the best values for all metrics (precision, recall, F1-score, and AUC) compared with SVM and KNN. It had a precision of 0.92, recall of 0.92, F1-score of 0.92, and AUC of 0.9543, as shown in Table 5 and Fig. (6). The F1-score and AUC for the MCI stage had the highest values across all three classifiers, which means that, with these selected features, the MCI stage can be differentiated very well from the other stages (NC and AD). Moreover, the AD stage had a fairly high F1-score and AUC.
NC had the lowest F1-score of the three stages in all classifiers but a fairly high precision, which means that the errors of all classifiers tended to label NC subjects as AD or MCI patients. In disease diagnosis, this type of error is preferable to classifying a patient as normal.
We used the original number of features for each group, and from Table 6, we can determine that the volume features are the best group of features for detecting AD, followed by the GI and CT groups. According to the results, the SD feature group alone was not very effective in the diagnosis, although it improved the overall accuracy when combined with the other feature groups. As shown in Fig. (7), we repeated the training and testing process for the three models (SVM, KNN, and XGBoost) with different numbers of features (14-23 features). The results show that XGBoost achieved the highest accuracy with the smallest number of features (16 features).

DISCUSSION
Recent research has reported using multivariable data analysis to classify dementia stages such as MCI and AD, as well as employing machine learning to predict dementia stages. Multivariate analysis studies have found that MCI is characterized by major temporal lobe atrophy, particularly in the superior and inferior temporal gyrus and hippocampus. One such study, which classified early MCI against healthy elderly people using only two structural regions in both hemispheres, the amygdala and hippocampus, reported a best accuracy of up to 0.9 [43]. The temporal lobes are mostly associated with the encoding of memory and the processing of auditory information. The temporal lobe is also thought to play a critical role in processing certain aspects of vision and language [44]. Posterior medial temporal deterioration is related to disturbances in episodic memory in patients with AD [45].
Moreover, one study reported that, even in the early stages of dementia, the level of amygdala atrophy was associated with the severity of cognitive impairment (as determined by the MMSE and CDR-SB) [16]. In addition, amygdala atrophy is associated with abnormal motor behavior, with a potential association with agitation and anxiety [46], which appear in Alzheimer's, as the amygdala plays a vital role in memory and in the processing of emotional responses [47]. The mean volume of the amygdala is 3.17 cm³ and is lower in AD, where the average volume is 2.729 cm³ [48].
Another study reported that the insular cortex, entorhinal cortex, and fusiform were among the most significant ROIs for predicting a specific score for AD neuropathologic changes [49]. According to post-mortem AD neuropathological examination, entorhinal cortical thickness assessment was significantly related to neurofibrillary tangles in a recent pre-mortem MRI analysis [24]. Insular functions vary from basic functions, such as interoception and gustation, to integrative functions, such as decision-making, self-awareness, and self-consciousness [50]. Alzheimer's disease (AD) often involves visceral dysfunction and behavioral dyscontrol, which are not found in other disorders that affect cognition. This may be associated with autonomic instability and loss of self-awareness, and pathological changes within the insular cortex may play an important role [51].
On the other hand, the hippocampus and entorhinal cortex are critical for memory and spatial navigation [52]. The entorhinal cortex sends information to the hippocampus from different areas of the cerebral cortex, collectively known as the association cortex, and also returns information processed by the hippocampus back to the association cortex [53]. These are the first brain regions to be affected in Alzheimer's disease. The average hippocampus volume for subjects aged 66.27 ± 6.1 years is 5.202 (±0.76) cm³ and is reduced by 25% in Alzheimer's [54]. The entorhinal cortex has an average volume of 1.93 cm³ for NC and 1.417 cm³ for AD [55].
Sulci have been used in only a few studies to distinguish between MCI and NC subjects. Among them, Park and his colleagues employed cortical thickness and sulcal depth to classify AD and MCI [56,57]. Sulcal abnormalities have been associated with normal ageing and cognitive impairment [58,59]. Most of the features reported in previous studies as correlated with either MCI or AD agree with the features that we employed in our model.
Our approach has the advantage of using a combination of features (volume, CT, GI, and SD) rather than depending on one group of features, as these groups complement each other and cover the anatomical changes in AD. In addition, it classifies multiple stages of AD in one step and produces excellent results.
There is currently no predictive imaging biomarker for Alzheimer's disease that has confirmed/substantial neuropathologic correlations, especially in the early stage. However, using advances in imaging and machine learning for the early detection of anatomical abnormalities in the prodromal stage, before they become clinically manifest, will be beneficial for preventing disease progression and designing effective treatments. By applying XGBoost to the 16 features selected from the four feature groups extracted from MRI images, the classification of NC, MCI, and AD can be performed with an accuracy of 92.31%.

CONCLUSION
In the proposed method, we achieved good performance for early-stage AD (MCI) detection, which is the most targeted stage to be identified. Moreover, we investigated the most reliable features for the diagnosis of AD. This approach relies on using an embedded method, XGBoost, to select the most important features representing AD from a large pool of features. In addition, three classifiers (XGBoost, SVM, and KNN) were compared to determine the classifier with the highest accuracy. Among all the tested models, XGBoost was the best classifier because it had the highest precision, sensitivity, F1-score, ROC-AUC, and overall accuracy (92.31%). Moreover, the following features are the most important for detecting MCI and AD together with NC: from the volume features group: rAmy, lAmy, lSupFroGy, rPosTeLo, and lCbe; from the gyrification index features group: lparahippocampal, lparacentral, rinferiortemporal, rparsopercularis, and rlateralorbitofrontal; from the sulcal depth features group: lsuperiortemporal, lparacentral, and rtemporalpole; and from the cortical thickness features group: lentorhinal, lfusiform, and rinsula. Furthermore, combining the volume features with the cortical thickness, sulcal depth, and gyrification index of the brain regions yields more accurate results than using any of them independently.

HUMAN AND ANIMAL RIGHTS
Not applicable.

CONSENT FOR PUBLICATION
Not applicable.

STANDARDS OF REPORTING
STROBE guidelines were followed.

AVAILABILITY OF DATA AND MATERIALS
The data supporting the findings of this study are available within the article.

FUNDING
None.

CONFLICT OF INTEREST
The authors declare no conflict of interest, financial or otherwise.

ACKNOWLEDGEMENTS
Data used in this study was acquired from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this paper. Data collection and sharing for this project were funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: