Performance evolution for sentiment classification using machine learning algorithm

Authors

  • Faisal Hassan, University of Karachi
  • Naseem Afzal Qureshi, University of Karachi
  • Muhammad Zohaib Khan, Shaheed Mohtarma Benazir Bhutto Institute of Trauma
  • Muhammad Ali Khan, Mehran UET
  • Abdul Salam Soomro, Mehran UET
  • Aisha Imroz, Avanza Solutions (Pvt.) Ltd
  • Hussain Bux Marri, BBSUTSD

DOI:

https://doi.org/10.4995/jarte.2023.19306

Keywords:

Machine Learning, K-Means, Logistic Regression, Random Forest, Decision Tree Algorithms

Abstract

Machine Learning (ML) is an Artificial Intelligence (AI) approach that allows systems to adapt to their environment based on past experience. ML and Natural Language Processing (NLP) techniques are commonly used in sentiment analysis and Information Retrieval Techniques (IRT). This study supports the use of ML approaches, such as K-Means, to produce accurate outcomes in clustering and classification tasks. The main objective of this research is to explore methods for sentiment classification and IRT. To that end, a combination of machine learning algorithms is applied to datasets drawn from Amazon unlocked mobile phone reviews and telecom tweets, with the aim of improving on the accuracy of earlier predictions for sentiment classification and IRT. The datasets consist of user reviews and ratings, and the algorithms used are K-Means clustering, Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT). Combining each classifier with K-Means resulted in high levels of accuracy. Specifically, K-Means combined with LR yielded an accuracy of 99.98%, K-Means combined with RF yielded 99.906%, and K-Means combined with DT yielded 99.83%. These results indicate that the combined approach can produce efficient, effective, and accurate outcomes.
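The abstract does not say how K-Means was combined with each classifier. One common reading, assumed here purely for illustration, is to cluster the samples first and then feed the cluster membership to the classifier as extra features. The sketch below shows that idea with K-Means followed by logistic regression in plain Python. The toy data, the feature choice (star rating and a word-count score), and every function name are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy data: (star rating, positive-minus-negative word count).
reviews = [(1, -3), (5, 4), (2, -2), (4, 3), (1, -4), (5, 5), (2, -1), (4, 2)]
labels = [0, 1, 0, 1, 0, 1, 0, 1]  # 0 = negative, 1 = positive


def kmeans(points, k, iters=20):
    """Lloyd's K-Means, seeded with the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - m) ** 2 for a, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def train_logreg(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 when the linear score is non-negative, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


K = 2
centroids, clusters = kmeans(reviews, K)

# Assumed combination step: append a one-hot cluster indicator to each sample.
X_aug = [list(p) + [1.0 if a == c else 0.0 for c in range(K)]
         for p, a in zip(reviews, clusters)]

w, b = train_logreg(X_aug, labels)
preds = [predict(w, b, x) for x in X_aug]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(f"clusters: {clusters}")
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The accuracy printed here is a training score on eight hand-made points and says nothing about the figures reported in the abstract. A faithful evaluation would use the actual review and tweet datasets with a held-out test split.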

Author Biographies

Faisal Hassan, University of Karachi

Department of Mathematics, Faculty of Science

Naseem Afzal Qureshi, University of Karachi

Department of Computer Science, Faculty of Science

Muhammad Zohaib Khan, Shaheed Mohtarma Benazir Bhutto Institute of Trauma

Software and Data Engineer

Muhammad Ali Khan, Mehran UET

Professor (Assistant), Industrial Engineering and Management

Abdul Salam Soomro, Mehran UET

Professor & Chairman, Industrial Engineering and Management

Aisha Imroz, Avanza Solutions (Pvt.) Ltd

Software Engineer

Hussain Bux Marri, BBSUTSD

Professor (Meritorious) & Dean Faculty of Engineering Technology

Published

2023-05-31

Section

Articles