| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Ümit Yilmaz |  | 
| dc.contributor.author | Zafer Aydin |  | 
| dc.contributor.author | V. Çağri Güngör |  | 
| dc.contributor.author | Cengiz Gezer |  | 
| dc.date.accessioned | 2022-04-08T07:16:55Z |  | 
| dc.date.available | 2022-04-08T07:16:55Z |  | 
| dc.date.issued | 2021 | en_US | 
| dc.identifier.isbn | 978-1-4503-8954-9 |  | 
| dc.identifier.uri | https://doi.org/10.1145/3471287.3471299 |  | 
| dc.identifier.uri | https://hdl.handle.net/20.500.12573/1256 |  | 
| dc.description.abstract | Identifying potential customers is crucial in direct marketing, and data mining techniques are among the most important tools companies use for this task. However, because potential customers are far outnumbered by non-potential customers, the data exhibit a class imbalance problem that significantly degrades the performance of data mining techniques. In this paper, different combinations of basic and advanced resampling techniques, namely Synthetic Minority Oversampling Technique (SMOTE), Tomek Link, Random Under-Sampling (RUS), and Random Over-Sampling (ROS), were evaluated to improve customer classification performance. Several feature selection techniques, namely Information Gain, Gain Ratio, Chi-squared, and Relief, were used to remove non-informative features from the data. Classification performance was compared across several data mining techniques, including LightGBM, XGBoost, Gradient Boost, Random Forest, AdaBoost, ANN, Logistic Regression, Decision Trees, SVC, and Bagging Classifier, using the ROC AUC and sensitivity metrics. The combination of Tomek Link and Random Under-Sampling as the resampling technique with the Chi-squared method as the feature selection algorithm outperformed the other combinations. Detailed performance evaluations showed that, with the proposed approach, LightGBM, a gradient boosting algorithm based on decision trees, gave the best results among the classifiers, with a sensitivity of 0.947 and an ROC AUC of 0.896. © 2021 ACM. | en_US | 
| dc.description.sponsorship | Illinois State University; South Asia Institute of Science and Engineering (SAISE); University of Hawaii at Hilo | en_US | 
| dc.language.iso | eng | en_US | 
| dc.publisher | Association for Computing Machinery | en_US | 
| dc.relation.isversionof | 10.1145/3471287.3471299 | en_US | 
| dc.rights | info:eu-repo/semantics/closedAccess | en_US | 
| dc.subject | Data Mining | en_US | 
| dc.subject | Direct Marketing | en_US | 
| dc.subject | Imbalanced Data | en_US | 
| dc.subject | Machine Learning | en_US | 
| dc.subject | Tomek Link | en_US | 
| dc.title | Data Mining Techniques in Direct Marketing on Imbalanced Data using Tomek Link Combined with Random Under-sampling | en_US | 
| dc.type | conferenceObject | en_US | 
| dc.contributor.department | AGÜ, Faculty of Engineering, Department of Computer Engineering | en_US | 
| dc.contributor.institutionauthor | Yilmaz, Ümit |  | 
| dc.contributor.institutionauthor | Aydin, Zafer |  | 
| dc.contributor.institutionauthor | Güngör, V. Çağri |  | 
| dc.relation.journal | ACM International Conference Proceeding Series | en_US | 
| dc.relation.publicationcategory | Conference Item - International - Institutional Academic Staff | en_US |
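The resampling combination highlighted in the abstract (Tomek Link combined with Random Under-Sampling) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a binary label, Euclidean distance, Tomek link cleaning applied before random under-sampling, removal of only the majority-class member of each link, and a 1:1 target class ratio. The record does not specify any of these details. The function names `nearest_neighbor`, `tomek_links`, and `tomek_then_rus` are hypothetical. In practice, the imbalanced-learn library provides `TomekLinks` and `RandomUnderSampler` classes for these two steps.

```python
import math
import random
from collections import Counter


def nearest_neighbor(X, i):
    """Index of the point closest to X[i] (Euclidean), excluding i itself."""
    best_j, best_d = -1, math.inf
    for j, row in enumerate(X):
        if j == i:
            continue
        d = math.dist(X[i], row)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def tomek_then_rus(X, y, seed=0):
    """Hypothetical pipeline (assumed order): Tomek link cleaning, then
    random under-sampling of the majority class to a 1:1 ratio."""
    counts = Counter(y)
    # Assumes exactly two classes; unpacking fails otherwise.
    minority, majority = sorted(counts, key=counts.get)
    # Step 1: drop only the majority-class member of each Tomek link.
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority}
    keep = [k for k in range(len(X)) if k not in drop]
    # Step 2: randomly keep as many majority samples as there are minority samples.
    maj_idx = [k for k in keep if y[k] == majority]
    min_idx = [k for k in keep if y[k] == minority]
    rng = random.Random(seed)
    maj_idx = rng.sample(maj_idx, min(len(maj_idx), len(min_idx)))
    selected = sorted(maj_idx + min_idx)
    return [X[k] for k in selected], [y[k] for k in selected]
```

For example, with `X = [[0.0], [0.4], [1.0], [1.2], [3.0], [3.5], [6.0]]` and `y = [0, 0, 0, 1, 0, 0, 1]`, the only Tomek link is the index pair `(2, 3)`, so the majority point `[1.0]` is dropped before two of the remaining four majority points are sampled to match the two minority points. The brute-force neighbour search is O(n²) and is meant only for small illustrative inputs.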