Engineering Services - 4VDATASOLUTION-AI

ML-Breast Cancer Detection

Breast cancer is a disease caused by the abnormal growth of breast cells and is the second most common cancer worldwide, following lung cancer. Women face a 12% lifetime risk of developing breast cancer, with the likelihood increasing with age. Several factors contribute to its development, including gender, age, family history, obesity, and genetic mutations. Early detection is crucial, and imaging techniques such as mammography and ultrasound are widely used for screening. However, mammography has limitations, particularly in women with dense breast tissue, where its sensitivity is reduced.

AI-based models, particularly machine learning (ML) algorithms, are transforming breast cancer detection by analysing medical image datasets and patient characteristics to identify cancer or assess risk. Through radiomics, ML algorithms can extract quantitative features from mammograms and ultrasound images, enhancing diagnostic precision. AI-driven prediction models incorporate various risk factors—such as genetics, lifestyle, and environmental influences—to develop personalized imaging and treatment plans. Deep learning (DL) tools further enhance detection accuracy and efficiency.

These data-driven techniques have the potential to revolutionize breast imaging by leveraging large datasets to automatically learn and identify complex malignancy patterns. This paper provides an overview of recent ML-based approaches and architectures in mammography, highlighting their strengths and limitations. Additionally, it explores the challenges and opportunities of integrating deep learning into breast cancer screening and diagnosis to improve patient outcomes.

Objective

The objective of this study is to develop a deep convolutional neural network (CNN) model that, given an image of a patient’s breast tissue, can accurately predict whether the sample contains invasive ductal carcinoma (IDC). A benign tumor, also known as a benign neoplasm or benign growth, is a noncancerous collection of cells. Ductal carcinoma in situ (DCIS) is a condition that affects the cells of the milk ducts in the breast: the cells lining the ducts turn malignant (cancerous) but stay in place (in situ). IDC is the most common type of breast cancer, accounting for 80% of cases, and invades nearby breast tissue. IDC can be classified by hormone receptor status and HER2 status. Our goal here is to identify IDC.

CNN Model Architecture

Mammography is an X-ray imaging technique designed for breast tissue that relies on differential X-ray absorption. It is primarily used for breast cancer screening, early detection, and diagnosis, and can reveal small abnormalities such as tumors and microcalcifications.

Detecting breast cancer in screening mammograms is a complex image classification task due to the small size of tumors relative to the overall breast image. For instance, a full-field digital mammography (FFDM) image typically measures 5000 × 4000 pixels, while a potentially cancerous region of interest (ROI) may be as small as 200 × 200 pixels.
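To put those sizes in perspective, the arithmetic can be checked in a few lines of Python. This is only a sketch: the variable names are illustrative, and the pixel dimensions are the ones quoted above.

```python
# Pixel dimensions quoted in the text.
ffdm_pixels = 5000 * 4000   # full-field digital mammogram (FFDM)
roi_pixels = 200 * 200      # small region of interest (ROI)

# Share of the full image that the ROI occupies.
roi_fraction = roi_pixels / ffdm_pixels
print(f"ROI covers {roi_fraction:.2%} of the image")
```

At these sizes the region of interest covers about 0.2% of the image, which is why a classifier working on the full image has very little signal to go on.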

To streamline the management of screening mammogram images, we consolidate all images in a centralized repository while ensuring that each image retains its ownership information and its classification label (malignant or normal). The label, or target, depends on the HER2 test status and clinical characteristics of each patient. The mammography images below show tissue samples that are normal, benign, and malignant.

Analysing the input data reveals that most mammograms fall into two different types, and distinguishing between the two with the naked eye is challenging. However, the model can analyse subtle, hidden patterns within the images, which allows it to determine their classification.

Architecture Diagram

1) Input Layer: This is where the data is initially fed into the model, as images of shape (25,25,3). The raw values of the input observations are not modified before they enter the network; they go straight into the first convolution, configured with kernel size (4,4), filters=32, and activation=relu, whose output is passed on to the following layers.

2) Pooling Layer: The pooling layer (MaxPool2D(2,2)) is a component of a convolutional neural network (CNN) that reduces the spatial dimensions of feature maps while preserving the strongest activations. This decreases computational cost and helps limit overfitting.
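As an illustration of what 2 x 2 max pooling does to a feature map, here is a minimal plain-Python sketch. The function name and the 4 x 4 example values are made up for illustration; a real model would use the framework's pooling layer instead.

```python
def max_pool_2x2(feature_map):
    """Apply 2x2 max pooling with stride 2 to a 2-D list of numbers."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    # Step over the map in non-overlapping 2x2 windows; a trailing odd
    # row or column is dropped.
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 9, 7],
    [1, 1, 3, 5],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

Each output value is the largest value in its window, so the map shrinks to half its height and width while the strongest responses survive.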

3) Hidden Layer: This is the hidden layer (filters=256, activation=relu), where the neural network processes and learns patterns from the input data. It extracts meaningful features and refines the learned representations. During training, its weights are adjusted by comparing the network's predictions with the labels. The processed information is then passed to the Output Layer for final predictions.

4) Output Layer: The output layer is a softmax layer with 2 outputs (malignant/non-malignant). It generates the final classification of the network based on the patterns learned in the hidden layers.

5) Code: Python code using Python libraries.
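The layers described in items 1 to 4 could be assembled in Keras roughly as follows. This is a sketch under stated assumptions, not the original code: the text gives "filter=256" for the hidden layer without saying whether it is a convolutional or a fully connected layer, so a Dense layer with 256 units after a Flatten step is assumed here. The optimizer, the loss, and the function name build_model are also assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(input_shape=(25, 25, 3), num_classes=2):
    """Small CNN following the layer description; several details are assumed."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # First convolution: 32 filters, 4x4 kernel, ReLU (from the text).
        layers.Conv2D(32, kernel_size=(4, 4), activation="relu"),
        # 2x2 max pooling (from the text).
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Assumption: flatten, then a fully connected hidden layer of 256 units.
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        # Softmax over the two classes (malignant / non-malignant).
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Assumption: Adam optimizer and categorical cross-entropy on one-hot labels.
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


my_model = build_model()
my_model.summary()
```

With a two-unit softmax output, the labels need to be one-hot encoded for this loss; integer labels would call for sparse categorical cross-entropy instead.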

6) Model Summary: Model parameters and summary.
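The output shapes and parameter counts that a model summary reports can be worked out by hand from the layer settings. The sketch below does this for a 25 x 25 x 3 input, a 4 x 4 convolution with 32 filters, and 2 x 2 pooling. It assumes an unpadded convolution with stride 1 and assumes the 256-unit hidden layer is fully connected; neither detail is stated in the text, so the totals are illustrative rather than the reported summary.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length of one spatial dimension for an unpadded convolution."""
    return (size - kernel) // stride + 1


height, width, channels = 25, 25, 3   # input shape from the text
filters, kernel = 32, 4               # first convolution from the text

# Convolution: each filter has kernel*kernel*channels weights plus one bias.
conv_h = conv_output_size(height, kernel)
conv_w = conv_output_size(width, kernel)
conv_params = kernel * kernel * channels * filters + filters

# 2x2 max pooling halves each spatial dimension and has no parameters.
pool_h, pool_w = conv_h // 2, conv_w // 2
flat_units = pool_h * pool_w * filters

# Assumption: the hidden layer is fully connected with 256 units.
hidden_units = 256
hidden_params = flat_units * hidden_units + hidden_units

# Softmax output with 2 classes.
output_params = hidden_units * 2 + 2

total_params = conv_params + hidden_params + output_params
print(f"conv output:  {conv_h}x{conv_w}x{filters}, params: {conv_params}")
print(f"pool output:  {pool_h}x{pool_w}x{filters}")
print(f"hidden params: {hidden_params}, output params: {output_params}")
print(f"total params: {total_params}")
```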

7) Train/validation split: The split ratio is 70:30. Training set: Class 0: 178,800 samples; Class 1: 70,800 samples. Validation set: Class 0: 19,800 samples; Class 1: 7,800 samples.
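The class balance implied by these counts can be checked with a short calculation. The sketch below uses the counts exactly as listed. Taken at face value, they place about 90% of the samples in the training set, which differs from the stated 70:30 ratio. The class weights at the end are a common way to offset class imbalance; the text does not say whether weighting was used, so that part is an assumption.

```python
# Sample counts as listed in the text.
train_counts = {0: 178_800, 1: 70_800}
val_counts = {0: 19_800, 1: 7_800}

train_total = sum(train_counts.values())
val_total = sum(val_counts.values())

# Share of all samples that ended up in the training set.
train_share = train_total / (train_total + val_total)

# Share of class 1 in each set; similar values suggest a stratified split.
class1_share_train = train_counts[1] / train_total
class1_share_val = val_counts[1] / val_total

# "Balanced" class weights: total / (number of classes * class count).
# Assumption: the source does not state that class weighting was applied.
class_weight = {
    label: train_total / (len(train_counts) * count)
    for label, count in train_counts.items()
}

print(f"training share of all samples: {train_share:.1%}")
print(f"class 1 share, training:   {class1_share_train:.1%}")
print(f"class 1 share, validation: {class1_share_val:.1%}")
print("class weights:", {k: round(v, 3) for k, v in class_weight.items()})
```

Class 1 makes up roughly 28% of both sets, so class 0 outnumbers it by about 2.5 to 1.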

8) Model Training: my_model.fit(trainData, validation_data=validationData, epochs=60, verbose=1, callbacks=[early_stop])

Epoch 7/60 24878/24878 [==============================] - 233s 9ms/step - loss: 0.3591 - accuracy: 0.8480 - val_loss: 0.3655 - val_accuracy: 0.8457
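The early_stop callback passed to fit is not defined in the text. In Keras, such a callback is typically an instance of keras.callbacks.EarlyStopping, which halts training once a monitored metric stops improving for a set number of epochs (the patience). The plain-Python sketch below illustrates a simplified version of that patience rule. The function name, the patience value, and the loss values are made up for illustration and are not taken from the training run above.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses: the best value is reached at epoch 3.
hypothetical_losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.48]
stop_epoch = early_stopping_epoch(hypothetical_losses, patience=3)
print(stop_epoch)  # 6
```

A rule like this lets a run scheduled for 60 epochs end much earlier once validation loss plateaus.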

9) Model Result: After multiple iterations, we arrived at an effective model. It achieves 86% accuracy with minimal overfitting, which indicates that it generalizes well from the training data to the validation data.

10) Save and load model: The model and its artifacts can now be saved to a file and loaded again to verify the model on unseen data.
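A save-and-reload round trip in Keras might look like the following sketch. The tiny stand-in model, the file name idc_model.keras, and the random input batch are all assumptions for illustration; in practice the trained model would be saved instead.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Stand-in model with the same input and output shapes as described
# in the text (an assumption; the trained model would be used in practice).
model = keras.Sequential([
    keras.Input(shape=(25, 25, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(2, activation="softmax"),
])

# Save to a temporary location, then load the model back from disk.
model_path = os.path.join(tempfile.mkdtemp(), "idc_model.keras")
model.save(model_path)
restored = keras.models.load_model(model_path)

# The restored model should reproduce the original predictions.
batch = np.random.rand(4, 25, 25, 3).astype("float32")
same_predictions = np.allclose(
    model.predict(batch, verbose=0),
    restored.predict(batch, verbose=0),
    atol=1e-6,
)
print("predictions match after reload:", same_predictions)
```

Comparing predictions before and after the reload is a quick check that the saved file captured the weights correctly.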

Prediction: We took 20,000 images covering both classes for prediction. The model classified 20.85% of them as cancer and 79.15% as no cancer.
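Percentages like these come from taking the most probable class for each image and counting the results. The sketch below shows that step on made-up softmax outputs and then converts the reported percentages back into image counts. It assumes class 1 means cancer and class 0 means no cancer, which the text does not state explicitly; the function name and sample values are illustrative.

```python
def class_percentages(probabilities):
    """Percentage of samples whose most probable class is 0 or 1."""
    # Index of the largest probability in each row (argmax).
    predicted = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    total = len(predicted)
    return {label: 100.0 * predicted.count(label) / total for label in (0, 1)}


# Made-up softmax outputs for four images: [P(class 0), P(class 1)].
sample_outputs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
shares = class_percentages(sample_outputs)
print(shares)  # {0: 75.0, 1: 25.0}

# Image counts implied by the percentages reported in the text.
total_images = 20_000
cancer_images = round(total_images * 20.85 / 100)
no_cancer_images = total_images - cancer_images
print(cancer_images, no_cancer_images)  # 4170 15830
```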

Conclusion

This study examines a deep learning CNN model for breast cancer classification after preprocessing a mammography X-ray dataset. The data is processed using the StandardScaler module, and feature selection is performed using Python’s scikit-learn package. Breast cancer is a prevalent disease affecting women worldwide, and machine-learning approaches have the potential to improve early detection and prognosis. The two forms discussed here are invasive ductal carcinoma (IDC) and ductal carcinoma in situ (DCIS). Early detection is crucial for successful treatment, and appropriate screening technologies are essential. Mammography, ultrasonography, and thermography are common imaging modalities for detecting breast cancer. Advances in artificial intelligence have made mammography more accurate, and deep learning models are being developed to recognize breast cancer in digital mammograms. Convolutional neural networks and other AI methods are emerging in healthcare to improve image processing and reduce reliance on recognition by the human eye.