APPLICATION OF DATA MINING TECHNIQUES IN HEALTHCARE: IDENTIFYING INTER-DISEASE RELATIONSHIPS THROUGH ASSOCIATION RULE MINING

This research applies data mining techniques in a healthcare environment, using ICD-10 coded patient visit data from Hospital X to identify relationships between diseases. The FP-Growth algorithm, followed by Association Rule Mining, is used to find diseases that frequently occur together in the data set. The research process involved data pre-processing, transformation into binary format, and careful parameter setting (minimum support 0.95 and minimum confidence 0.9). The results showed a strong association between chronic conditions such as hypertension and diabetes, which are prevalent in the patient population. This association provides insight into potential comorbidities and may assist healthcare providers in improving diagnostic accuracy and treatment effectiveness. The findings demonstrate the potential of data mining techniques to improve predictive analytics and strategic planning in healthcare. This approach not only aids in the efficient allocation of healthcare resources but also aligns with the broader goal of improving personalized patient care.


INTRODUCTION
The healthcare industry has undergone significant transformations over the past few decades, primarily driven by advancements in technology and data management (Wang et al., 2018). The digitalization of health records and the systematic collection of patient data have improved the quality of care and opened new avenues for analyzing health trends and managing diseases more effectively (Senbekov et al., 2020).
In Indonesia, a nation characterized by its vast geographic spread and diverse population, the management and analysis of health data present unique challenges and opportunities (Hilbert, 2016). The country's healthcare system must handle various diseases and health conditions, from communicable diseases common in tropical climates to non-communicable diseases arising from lifestyle changes. Tracking and analyzing these conditions accurately is crucial for effective public health management and policy-making (Groseclose & Buckeridge, 2017).
Within this context, the International Classification of Diseases, Tenth Revision (ICD-10), plays a pivotal role. As a global standard for reporting diseases and health conditions, ICD-10 provides a comprehensive system that allows healthcare professionals and policymakers to systematically store, retrieve, and analyze health information (Alyahya & Khader, 2019). This ensures consistency and accuracy in the data collected, facilitating effective national and international health strategies, disease management, and epidemiological studies (Alonge et al., 2019).
Data mining has emerged as a powerful tool in healthcare, offering the potential to extract valuable insights from vast amounts of data (Gul et al., 2021). Several methods and algorithms can be applied in the healthcare industry. One such method is association rule mining, which aims to discover meaningful relationships between diseases or health conditions (Behzadnia et al., 2020). By applying techniques such as Association Rule Mining, healthcare providers can uncover hidden patterns and relationships within the data, leading to improved disease prediction, patient care, and resource allocation (Domadiya & Rao, 2019). These techniques are instrumental in identifying trends and associations that may not be evident through conventional analysis methods.
Focusing on a specific instance, this study utilizes anonymized patient visit data from Hospital X, which includes over 65,535 patient entries from 2019 to 2023, coded with ICD-10. This dataset provides a rich source of information for exploring the complex interrelations of diseases within the hospital's patient population. By applying data mining techniques, particularly Association Rule Mining, this research aims to decode the intricate web of disease patterns, offering insights that could influence future healthcare decisions and strategies in Indonesia (Chaudhry et al., 2023).
Furthermore, by integrating advanced data mining techniques with robust classification systems like ICD-10, the research at Hospital X represents a significant step forward in understanding and managing health in Indonesia. Through this study, we aim to highlight the potential of data-driven approaches in enhancing the efficacy and efficiency of healthcare services nationwide (Cascini et al., 2021).
The research aims to examine Hospital X's data using Association Rule Mining, seeking hidden disease patterns that regular epidemiological studies might miss. This exploration is crucial for predictive health analytics and strategic healthcare planning. Furthermore, the study intends to assess the effectiveness of Association Rule Mining in leveraging ICD-10 coded data to gain deeper insights into disease trends and improve patient management strategies within Indonesia's healthcare landscape. Ultimately, this research strives to connect extensive data resources with practical healthcare insights, employing advanced data mining methods to enhance the responsiveness and knowledge base of Indonesia's healthcare system (Karatas et al., 2022).

METHOD
This section outlines the methodology used to identify inter-disease relationships through Association Rule Mining in the data from Hospital X. The methodology is structured as follows:

Data Collection
The data for this study consists of anonymized patient visit records from Hospital X, spanning from 2019 to 2023. Each record includes the visit date and the diagnosis codes assigned according to the ICD-10 classification. The dataset encompasses 65,535 entries, reflecting diverse patient interactions within the hospital.

Data Preprocessing
Before applying any data mining techniques, the dataset will undergo several preprocessing steps:
1. Data Cleaning: Remove any inconsistencies or errors in the data, such as duplicate records or missing values.
2. Data Transformation: Convert the diagnosis codes into a binary matrix format where rows represent patient visits and columns represent ICD-10 codes. Each entry in the matrix will be set to 1 if the diagnosis was made during the visit and 0 otherwise.
3. Data Reduction: Filter out infrequent ICD-10 codes to focus on the most common and potentially more relevant ones for discovering significant associations.
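The three preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the pipeline used in the study: the function name `to_binary_matrix`, the `(visit_id, icd10_code)` record layout, the `min_code_count` threshold, and the sample records are all hypothetical.

```python
from collections import Counter

def to_binary_matrix(visits, min_code_count=2):
    """visits: iterable of (visit_id, icd10_code) pairs (hypothetical layout)."""
    # Data cleaning: drop records with a missing code and collapse exact duplicates.
    cleaned = {(v, c.strip().upper()) for v, c in visits
               if v is not None and c and c.strip()}
    # Data reduction: keep only codes recorded in at least `min_code_count` visits.
    counts = Counter(code for _, code in cleaned)
    codes = sorted(c for c, n in counts.items() if n >= min_code_count)
    # Data transformation: one row per visit, one 0/1 column per retained code.
    by_visit = {}
    for visit_id, code in cleaned:
        by_visit.setdefault(visit_id, set()).add(code)
    rows = {v: [1 if c in by_visit[v] else 0 for c in codes]
            for v in sorted(by_visit)}
    return codes, rows

# Hypothetical sample records, including one duplicate and one missing code.
sample = [(1, "I10"), (1, "E11.9"), (1, "I10"), (2, "I10"),
          (2, None), (3, "E11.9"), (3, "K30")]
codes, rows = to_binary_matrix(sample)
```

In this sample, K30 appears in only one visit and is therefore filtered out, leaving the columns E11.9 and I10.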

Application of Association Rule Mining
Association Rule Mining will be implemented to discover interesting relationships between diseases:
1. Algorithm Selection: Use the FP-Growth algorithm, which is efficient for datasets where specific patterns frequently occur.
2. Setting Parameters: Define the minimum support and confidence thresholds to identify meaningful rules. The support threshold eliminates rare itemsets, while the confidence threshold ensures that only associations strong enough to be considered reliable are included.
3. Rule Generation: Generate association rules from the frequent itemsets discovered by the FP-Growth algorithm. These rules will indicate which diseases tend to co-occur within the dataset.
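To make the frequent-itemset step concrete, the following is a compact, self-contained sketch of FP-Growth in pure Python. It is an illustrative implementation under assumptions rather than the tool used in the study: the names `Node`, `build_tree`, and `fp_growth`, the use of an absolute `min_count` instead of a relative support threshold, and the sample visits are all hypothetical.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a link to its parent, and a count."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs; return item counts and node links."""
    freq = defaultdict(int)
    for items, weight in weighted:
        for item in set(items):
            freq[item] += weight
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root, links = Node(None, None), defaultdict(list)
    for items, weight in weighted:
        # Insert frequent items in descending-frequency order so prefixes are shared.
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                links[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return freq, links

def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): absolute support count} for all frequent itemsets."""
    freq, links = build_tree(weighted, min_count)
    found = {}
    for item, count in freq.items():
        itemset = suffix | {item}
        found[itemset] = count
        # Conditional pattern base: the prefix path above every node holding `item`.
        conditional = []
        for node in links[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Recurse on the conditional database to grow longer itemsets.
        found.update(fp_growth(conditional, min_count, itemset))
    return found

# Hypothetical visits, each a list of ICD-10 codes diagnosed together.
visits = [["I10", "E11.9"], ["I10", "E11.9", "K30"], ["I10"], ["K30"]]
itemsets = fp_growth([(v, 1) for v in visits], min_count=2)
```

With `min_count=2`, the sample yields three frequent single codes and one frequent pair, {I10, E11.9}. A relative threshold such as the minimum support can be converted to `min_count` by multiplying it by the number of visits.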

Analysis of Results
The generated rules will be analyzed to identify and interpret significant disease relationships:
1. Rule Evaluation: Assess the rules based on their support, confidence, and lift values. Rules with high lift values are of particular interest, as they suggest a strong positive association between the antecedent and the consequent.
2. Medical Relevance: Consult with medical experts to interpret the findings in the context of clinical significance and existing medical knowledge.
The flowchart in Figure 1 illustrates the research methodology that will be implemented.
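The rule evaluation step can be illustrated with a short Python sketch that derives rules from frequent itemsets and scores them by support, confidence, and lift. The function name `generate_rules`, the dictionary layout, and the itemset counts below are hypothetical and serve only as an example; they are not results from the Hospital X data.

```python
from itertools import combinations

def generate_rules(itemsets, n_transactions, min_confidence):
    """itemsets: {frozenset: absolute count}, containing every frequent subset."""
    rules = []
    for itemset, count in itemsets.items():
        # Split each itemset into every non-empty antecedent/consequent pair.
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                consequent = itemset - antecedent
                confidence = count / itemsets[antecedent]
                if confidence < min_confidence:
                    continue
                # Lift compares the rule's confidence with the consequent's base rate.
                lift = confidence / (itemsets[consequent] / n_transactions)
                rules.append({
                    "antecedent": antecedent,
                    "consequent": consequent,
                    "support": count / n_transactions,
                    "confidence": confidence,
                    "lift": lift,
                })
    # Rules with the highest lift come first.
    return sorted(rules, key=lambda r: r["lift"], reverse=True)

# Hypothetical frequent itemsets counted over four visits.
example_itemsets = {
    frozenset({"I10"}): 3,
    frozenset({"E11.9"}): 2,
    frozenset({"K30"}): 2,
    frozenset({"I10", "E11.9"}): 2,
}
rules = generate_rules(example_itemsets, n_transactions=4, min_confidence=0.9)
```

In this example only the rule E11.9 → I10 passes the 0.9 confidence threshold; the reverse rule has a confidence of 2/3 and is discarded.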

RESULTS AND DISCUSSION
Data Collection
The initial phase of our research involved a comprehensive examination of the raw data collected from Hospital X. This dataset encompasses patient visit records from 2019 to 2023, with each entry detailing the visit date and the respective ICD-10 diagnosis codes. The raw data provides a foundational understanding of the patient demographics, frequency of visits, and the diversity of diagnoses made during the period under review (Ng et al., 2016).
Below is a table representing a snapshot of the raw data. This table includes sample entries from the dataset, illustrating how the data is structured and the type of information available for each patient visit. This initial analysis is crucial for identifying any discrepancies, missing values, or outliers in the data that may affect subsequent data processing and analysis stages. By ensuring the integrity and completeness of the raw data, we establish a robust basis for applying data mining techniques to uncover meaningful patterns and associations between diseases.

Data Preprocessing
Following the initial review of the raw data, a pivot transformation was applied to analyze the patterns in the patient visits further. This transformation involved creating a pivot table where the rows represent the dates of patient visits and the columns correspond to the ten most frequently occurring diseases based on the ICD-10 diagnosis codes. This restructuring allows for an aggregated view of the data, highlighting the prevalence and temporal distribution of the primary health conditions over the specified period.
The pivot table created for this study includes the ten most frequently occurring diseases among the patients visiting Hospital X from 2019 to 2023. These diseases, represented by their respective ICD-10 codes, are central to understanding the health trends and challenges within the patient community served by the hospital.
Below is a brief description of each disease based on the ICD-10 classification:
Below is the pivot table that displays the date of visit along with the counts of the ten most common diseases:
Following the creation of the pivot table, the next crucial step involved transforming the data to prepare for Association Rule Mining. This transformation entailed converting the dataset into a binary format, where each cell in the table was assigned a value based on the presence or absence of a disease diagnosis on a particular visit date. Specifically, cells with a recorded diagnosis were marked with '1', indicating the presence of the disease, while cells without a diagnosis were marked with '0', indicating its absence.
This binary transformation is essential for the application of Association Rule Mining, as it simplifies the data and focuses the analysis on the occurrence and association of diseases rather than their frequency. This format allows the mining algorithm to efficiently identify and generate rules based on the co-occurrence of diseases across different patient visits.
Below is the transformed table, now formatted to suit the requirements of the Association Rule Mining process:
This binary format highlights patterns and relationships that might not be evident when considering the raw count data. It sets the stage for the subsequent analysis phase, which involves applying the Association Rule Mining algorithm to discover significant associations between the diseases that frequently co-occur during patient visits.
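The pivot and binarization steps described in this subsection can be sketched as follows. The function name `pivot_top_codes`, the `(visit_date, icd10_code)` record layout, and the sample records are assumptions made for illustration; the study's own table used the ten most frequent codes, while the example uses two to stay short.

```python
from collections import Counter

def pivot_top_codes(records, top_n=10):
    """records: iterable of (visit_date, icd10_code) pairs (hypothetical layout)."""
    records = list(records)
    # Select the `top_n` most frequently recorded diagnosis codes.
    totals = Counter(code for _, code in records)
    top = [code for code, _ in totals.most_common(top_n)]
    # Pivot: count each selected code per visit date.
    per_date = {}
    for date, code in records:
        if code in top:
            per_date.setdefault(date, Counter())[code] += 1
    count_table = {d: [per_date[d][c] for c in top] for d in sorted(per_date)}
    # Binarize: 1 if the code was diagnosed on that date, 0 otherwise.
    binary_table = {d: [int(n > 0) for n in row] for d, row in count_table.items()}
    return top, count_table, binary_table

# Hypothetical records for three visit dates.
records = [
    ("2019-01-01", "I10"), ("2019-01-01", "I10"), ("2019-01-01", "K30"),
    ("2019-01-02", "I10"), ("2019-01-02", "E11.9"), ("2019-01-02", "K30"),
    ("2019-01-03", "I10"),
]
top, count_table, binary_table = pivot_top_codes(records, top_n=2)
```

The count table keeps how often each code was recorded on a date, whereas the binary table keeps only presence or absence, which is the form the mining step consumes.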

Results of Data Mining Implementation Using FP-Growth and Association Rules Mining
The application of the FP-Growth algorithm followed by Association Rule Mining has yielded significant insights into the relationships between various diseases recorded in the data from Hospital X. By setting the minimum support at 0.95 and the minimum confidence at 0.9, we ensured that only the most relevant and frequently occurring disease associations were considered, thereby focusing on the most impactful relationships. The results table below lists the premises, conclusions, and metrics such as support, confidence, Laplace, gain, p-s, lift, and conviction for each rule identified. These tabular data quantify the strength and reliability of each association rule discovered during the analysis.
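For reference, several of the reported metrics have standard textbook definitions, given here for a rule $A \Rightarrow B$ over $N$ visits. These are the commonly used forms and are offered as an assumption about how the values were computed; the text does not state the exact formulas applied by the mining tool. Laplace and gain are left out because their definitions rely on additional parameters that the text does not report.

```latex
\begin{align}
\mathrm{supp}(X) &= \frac{\lvert \{\, t : X \subseteq t \,\} \rvert}{N} \\
\mathrm{conf}(A \Rightarrow B) &= \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)} \\
\mathrm{lift}(A \Rightarrow B) &= \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)} \\
\text{p-s}(A \Rightarrow B) &= \mathrm{supp}(A \cup B) - \mathrm{supp}(A)\,\mathrm{supp}(B) \\
\mathrm{conv}(A \Rightarrow B) &= \frac{1 - \mathrm{supp}(B)}{1 - \mathrm{conf}(A \Rightarrow B)}
\end{align}
```

A lift above 1 and a positive p-s value both indicate that the premise and conclusion co-occur more often than would be expected if they were independent.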
For example, one of the strongest rules discovered (Rule No. 65) shows that the presence of conditions coded as K30 (Dyspepsia), R50.9 (Fever, unspecified), I63.9 (Cerebral infarction, unspecified), and M54.5 (Low back pain) together in a patient significantly increases the likelihood of a diagnosis of I10 (Essential hypertension), with a confidence of 97.7% and a lift of over 13, indicating a strong association that is unlikely to be due to random chance. This analysis highlights the common co-occurrences and points towards potential comorbid conditions that could inform clinical decisions and healthcare policy planning.
The graphical representation of the association rules derived from the analysis (Figure 2) shows the complex interplay between various diseases. This visualization helps in quickly understanding the strength and direction of the relationships among different ICD-10 coded diseases, providing a clear picture of how multiple conditions correlate within the patient population of Hospital X. The graphical representation serves as a powerful tool for communicating the results to stakeholders involved in healthcare management and planning.
This study's use of Association Rule Mining (ARM) with the FP-Growth algorithm to identify inter-disease relationships aligns with existing research such as Sanati-Mehrizy et al. and Kulkarni and Mundhe, demonstrating ARM's effectiveness in healthcare data analysis. Dicuonzo et al. emphasize the growing importance of big data in healthcare, a trend reflected in this study's approach to digital healthcare data, supported by the focus on electronic medical records (EMR) highlighted by Berros et al. (2023). Significant findings from this study are consistent with those of Wang et al., who noted the effectiveness of sophisticated data mining techniques in healthcare (Sun et al., 2018). The accuracy of medical coding discussed by the Thought Leadership Team ensures the reliability of ARM in analyzing disease codes, enhancing the applicability of the findings to public health surveillance and interventions, as noted by Morgenstern et al. (2021). Furthermore, the research supports the call for diversity and inclusive leadership in healthcare (Ashikali et al., 2021), as understanding disease interactions can lead to better health outcomes by tailoring healthcare practices to diverse patient needs. Overall, this study reinforces the value and feasibility of using data mining to enhance healthcare management and decision-making, contributing valuable insights to the field and promoting a more informed approach to healthcare policy.

CONCLUSION
This study utilized the FP-Growth algorithm and Association Rule Mining to analyze patient data from Hospital X, uncovering significant inter-disease relationships. The findings highlight strong associations between diseases like hypertension and diabetes, confirming the utility of data mining in healthcare for enhancing predictive modeling and resource allocation. These insights can inform future healthcare strategies, emphasizing the integration of data analytics into clinical practice for improved patient outcomes and more personalized treatment plans. This approach encourages a data-driven direction in healthcare, aligning with global trends towards innovative solutions.

Figure
Figure 1. Research Flowchart
Figure 2. Isometric Graph of Diseases and Rules