Abstract

This project aims to transform the diagnostic delivery process in Mexican public hospitals by identifying and understanding each stage of that process. The goal is to deploy an intelligent traffic light signaling system that supports breast cancer diagnosis from mammograms using convolutional neural networks in a real-world environment.

Code

https://github.com/sanchezcarlosjr/breast-cancer-toolkit

https://github.com/sanchezcarlosjr/breast-cancer-pipeline

Introduction

Breast cancer is the leading cause of cancer death among women in Mexico, which makes it a major public health problem; worldwide, it accounts for 2.26 million cases. In other words, it is the type of cancer with the highest incidence and mortality in women: every day at least 14 women, chiefly between 50 and 69 years of age, die from it. Indeed, as the figure below shows, breast cancer follows an increasing trend compared with other cancers.

[Figure: 1-s2.0-S1470204512702462-gr1.jpg]

Currently, doctors at Mexican public hospitals carry out analyses using traditional 2D mammograms based on patient requests. In Ensenada, doctors request private external assistance. Oncologists annotate the medical images, chiefly reporting the patient's BI-RADS score. "BI-RADS" stands for Breast Imaging Reporting and Data System, and it is the scoring standard radiologists and oncologists use to describe mammogram results. We explain BI-RADS further later in this document.
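To make the BI-RADS scale and the traffic light idea from the abstract more concrete, here is a minimal Python sketch. The category names follow the standard BI-RADS assessment categories 0 to 6. The mapping from category to color and the function name `traffic_light` are our own illustrative assumptions, not a clinical rule and not necessarily the mapping used in the project repositories.

```python
from enum import IntEnum


class BiRads(IntEnum):
    """BI-RADS assessment categories 0 to 6."""
    INCOMPLETE = 0
    NEGATIVE = 1
    BENIGN = 2
    PROBABLY_BENIGN = 3
    SUSPICIOUS = 4
    HIGHLY_SUGGESTIVE_OF_MALIGNANCY = 5
    KNOWN_BIOPSY_PROVEN_MALIGNANCY = 6


# ASSUMPTION: this color mapping is hypothetical and only illustrates
# how a category could be turned into a traffic light signal.
TRAFFIC_LIGHT = {
    BiRads.NEGATIVE: "green",
    BiRads.BENIGN: "green",
    BiRads.INCOMPLETE: "yellow",
    BiRads.PROBABLY_BENIGN: "yellow",
    BiRads.SUSPICIOUS: "red",
    BiRads.HIGHLY_SUGGESTIVE_OF_MALIGNANCY: "red",
    BiRads.KNOWN_BIOPSY_PROVEN_MALIGNANCY: "red",
}


def traffic_light(score: int) -> str:
    """Return the (hypothetical) signal color for a BI-RADS score.

    Raises ValueError if the score is not a valid category (0 to 6).
    """
    return TRAFFIC_LIGHT[BiRads(score)]


if __name__ == "__main__":
    for category in BiRads:
        print(category.value, category.name, traffic_light(category))
```

Keeping the mapping in a single dictionary means clinicians could adjust the thresholds without touching the rest of the code.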

Our goal is to build data mining models that interpret mammograms and predict the risk of developing breast cancer, continuing prior work. Because the model output concerns a person's future health, we use both descriptive and predictive methods, applying machine learning algorithms to datasets. We do not expect to replace medical doctors but to assist them. Other computer-aided detection systems have been developed for breast cancer detection, but to our knowledge none has been applied in regional cities, and they are not free.

We expect the project to help thousands of women through quicker cancer detection: if the models achieve good metrics, deep learning can be faster and cheaper than human review, which would contribute to lowering the death rate. PACS stands for picture archiving and communication system.

Related work

The body of work related to breast cancer diagnostics using mammograms and convolutional neural networks is expansive. Various resources provide complementary perspectives, techniques, and tools.

For instance, the Open Health Imaging Foundation (OHIF) provides an open-source DICOM Viewer available on GitHub. The viewer is a zero-footprint medical image viewer provided as a Meteor package (OHIF, n.d.). It enables practitioners to visualize and navigate medical imaging data directly, enhancing understanding and improving diagnosis accuracy.

The Radiological Society of North America (RSNA) has published numerous papers discussing the importance of certain mammographic findings and terminologies. In one of these papers, they delve into the BI-RADS terminology for mammography reports, explaining what medical residents need to know (RSNA, 2023). This work informs the interpretation and communication of mammography results, which is a crucial step in diagnosing breast cancer.

Methodology

The first step in understanding the machine learning methodology is becoming familiar with the key concepts. Data science is essentially the process of extracting meaningful insights and patterns from large and unstructured datasets. This process leverages techniques such as machine learning, neural networks, and statistical methods to make sense of raw data, which can often be vast and complex.

An important tool in this context is the Digital Imaging and Communications in Medicine standard, or DICOM. DICOM is a standard protocol used for the transmission, storage, retrieval, and sharing of medical images. This protocol aids in the visualization and analysis of these images, enabling the identification of potential patterns or traits that might be of particular interest. In the world of data mining, this step is often referred to as "DICOM View."
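As a small illustration of what a DICOM file looks like on disk, the sketch below checks the DICOM Part 10 layout (a 128-byte preamble followed by the prefix "DICM") and parses the first data element after it. It uses only the Python standard library. The function names are hypothetical, and the parser assumes the short explicit-VR little-endian element header used by the file meta group; a real pipeline would more likely rely on a dedicated DICOM library.

```python
import struct

PREAMBLE_LEN = 128  # DICOM Part 10 files start with a 128-byte preamble
MAGIC = b"DICM"     # ...followed by the 4-byte prefix "DICM"


def is_dicom_part10(data: bytes) -> bool:
    """Return True if the bytes start with a DICOM preamble and prefix."""
    start = PREAMBLE_LEN
    return len(data) >= start + 4 and data[start:start + 4] == MAGIC


def read_first_element(data: bytes):
    """Parse the first data element after the prefix.

    ASSUMPTION: the element uses the short explicit-VR little-endian
    header: 2-byte group, 2-byte element, 2-char VR, 2-byte length.
    """
    offset = PREAMBLE_LEN + 4
    group, element, vr, length = struct.unpack_from("<HH2sH", data, offset)
    value = data[offset + 8:offset + 8 + length]
    return (group, element), vr.decode("ascii"), value


if __name__ == "__main__":
    # Build a tiny in-memory header: tag (0002,0000), VR "UL", value 200.
    sample = bytes(PREAMBLE_LEN) + MAGIC + struct.pack(
        "<HH2sHI", 0x0002, 0x0000, b"UL", 4, 200
    )
    print(is_dicom_part10(sample))
    print(read_first_element(sample))
```

The in-memory sample avoids depending on a real patient file, which matters when working with protected medical data.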

In the context of mammography analysis, one might utilize the Digital Database for Screening Mammography, or DDSM. DDSM is one of the largest publicly available collections of mammograms. As part of the preprocessing step, the mammograms from DDSM can be analyzed and preprocessed to identify and potentially remove any noise or inconsistencies in the data. This process might involve data cleaning, normalization, transformation, and other techniques to prepare the data for further analysis.
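The cleaning and normalization steps mentioned above could look roughly like the following sketch. It is written in plain Python on nested lists so that it runs without dependencies. The function names, the background threshold, and the toy image are assumptions for illustration, not the project's actual preprocessing; a real pipeline would typically operate on arrays loaded from DDSM files.

```python
def crop_to_foreground(image, threshold):
    """Keep the bounding box of pixels brighter than `threshold`.

    Mammograms often have large dark borders; dropping them is one
    simple form of cleaning. Returns [] if no pixel exceeds the threshold.
    """
    rows = [r for r, row in enumerate(image)
            if any(p > threshold for p in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > threshold for row in image)]
    if not rows or not cols:
        return []
    return [row[cols[0]:cols[-1] + 1]
            for row in image[rows[0]:rows[-1] + 1]]


def min_max_normalize(image):
    """Scale pixel intensities linearly to the range [0.0, 1.0]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image carries no contrast; map it to zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


if __name__ == "__main__":
    # Toy 4x4 "mammogram" with a dark border (hypothetical values).
    image = [
        [0, 0, 0, 0],
        [0, 100, 200, 0],
        [0, 300, 400, 0],
        [0, 0, 0, 0],
    ]
    cropped = crop_to_foreground(image, threshold=0)
    print(cropped)
    print(min_max_normalize(cropped))
```

Cropping before normalizing keeps the dark background from dominating the intensity range.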
