Python-Powered Machine Learning for Credit Card Fraud Detection: An Analytical Tool Approach
Persistent URL
Author(s)
Ochieng, Bill
Date Issued
May 3, 2024
Abstract
The growth of digital payments has increased both the complexity and the frequency of credit card fraud, presenting a significant challenge for financial security. This research applies Logistic Regression to predict fraudulent transactions, using a dataset of one million transactions with an 8.74% fraud incidence rate. Key features, such as the transaction's distance from the cardholder's home, the transaction type (online or chip-based), and whether a PIN was used, are considered to determine the likelihood of fraud.
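As an illustration of how a logistic regression model turns features like these into a fraud likelihood, the following is a minimal sketch, not the thesis's fitted model. The feature names, weights, and bias are hypothetical values chosen only for the example; a real model would learn them from the training data.

```python
import math


def fraud_probability(features, weights, bias):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, NOT taken from the thesis.
example_weights = {
    "distance_from_home": 0.02,  # farther from home -> higher score
    "online_order": 1.5,         # online transaction -> higher score
    "used_chip": -0.8,           # chip-based transaction -> lower score
    "used_pin_number": -2.0,     # PIN entered -> lower score
}
example_bias = -3.0

# Two hypothetical transactions described with the same feature names.
nearby_chip_pin = {"distance_from_home": 2.0, "online_order": 0,
                   "used_chip": 1, "used_pin_number": 1}
far_online = {"distance_from_home": 150.0, "online_order": 1,
              "used_chip": 0, "used_pin_number": 0}

p_low = fraud_probability(nearby_chip_pin, example_weights, example_bias)
p_high = fraud_probability(far_online, example_weights, example_bias)
print(f"nearby chip+PIN: {p_low:.3f}, far online: {p_high:.3f}")
```

A transaction is flagged as fraudulent when its probability exceeds a chosen threshold, commonly 0.5.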
This study first examines the highly unbalanced nature of the data, in which fraudulent cases are significantly underrepresented. Logistic Regression is applied to the unbalanced data and its performance is evaluated, revealing limitations in handling the imbalance. An oversampling technique is then employed to balance the dataset, improving precision and recall for fraudulent transactions from 0.61 and 0.72 to 0.93 and 0.95, respectively.
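The abstract does not name the oversampling technique used, so the sketch below assumes plain random oversampling, in which minority-class rows are duplicated with replacement until the classes are the same size. The function name `random_oversample` and the toy data are hypothetical and serve only to show the idea.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows (sampled with replacement) until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    rows_by_class = {}
    for row, label in zip(features, labels):
        rows_by_class.setdefault(label, []).append(row)

    target_size = max(len(rows) for rows in rows_by_class.values())
    out_features, out_labels = [], []
    for label, rows in rows_by_class.items():
        extra = [rng.choice(rows) for _ in range(target_size - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)

    # Shuffle so the duplicated rows are not grouped at the end.
    order = list(range(len(out_labels)))
    rng.shuffle(order)
    return [out_features[i] for i in order], [out_labels[i] for i in order]


# Toy data (assumption): 100 rows, 9 labelled fraud (1), 91 legitimate (0).
toy_features = [[float(i), i % 2] for i in range(100)]
toy_labels = [1 if i < 9 else 0 for i in range(100)]

balanced_features, balanced_labels = random_oversample(toy_features, toy_labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

In practice, oversampling is usually applied only to the training split, so that duplicated fraud rows do not leak into the data used for evaluation.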
After oversampling, the model shows a robust capability to identify fraudulent transactions, which is crucial for the real-time fraud detection systems used by financial institutions. Although accuracy decreased marginally, from 95.9% to 94.1%, the adjusted model offers a more reliable solution for predicting fraud, reducing the risk of fraud and the potential financial losses it causes.
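One way to see why accuracy alone can mislead on this dataset is a short worked calculation using only the figures stated above (one million transactions, 8.74% fraud). The sketch scores a trivial model that never flags fraud. The helper `classification_metrics` is a hypothetical name, and reporting precision as 0.0 when nothing is flagged is a convention assumed here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Convention (assumption): report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


total_transactions = 1_000_000
fraud_count = total_transactions * 874 // 10_000  # 8.74% of the dataset
legit_count = total_transactions - fraud_count

# Trivial baseline: predict "not fraud" for every transaction.
baseline = classification_metrics(tp=0, fp=0, fn=fraud_count, tn=legit_count)
print(fraud_count, legit_count, baseline)
```

On the original class distribution, this baseline is correct for all 912,600 legitimate transactions and so reaches 91.26% accuracy while catching none of the 87,400 fraudulent ones. Precision and recall on the fraud class therefore say more about detection quality than a small change in accuracy does.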
Major
Computer Science
Economics
First Reader(s)
Jumadinova, Janyl A.
Other Reader(s)
Navarro-Sanchez, Francisco
Department
Computer and Information Science
Business and Economics
Type of Publication
Senior Project Paper
File(s)
Name
SeniorThesis_final.pdf
Size
3.83 MB
Format
Adobe PDF
Checksum (MD5)
08c5b3ead8716e18aaba117e24724aa0