Data Engineering and Data Science

Concepts and Applications
Edited by Kukatlapalli Pradeep Kumar, Aynur Unal, Vinay Jha Pillai, Hari Murthy, and M. Niranjanamurthy
Series: Advances in Data Engineering and Machine Learning
Copyright: 2024   |   Status: Published
ISBN: 9781119841876  |  Hardcover  |  461 pages
Price: $225 USD

One Line Description
Written and edited by one of the most prolific and well-known experts in the field and his team, this exciting new volume is the “one stop shop” for the concepts and applications of data science and engineering for data scientists across many industries.

Audience
Engineers, designers, researchers, and undergraduate, postgraduate and research students and faculty working in the areas of artificial intelligence, machine learning models, architectures and their applications

Description
The field of data science is incredibly broad, encompassing everything from cleaning data to deploying predictive models. However, it is rare for any single data scientist to work across the whole spectrum day to day. Data scientists usually focus on a few areas and are complemented by a team of other scientists and analysts. Data engineering is also a broad field, and no individual data engineer needs to know the whole spectrum of skills. Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information.

In this exciting new volume, the team of editors and contributors sketches the broad outlines of data engineering, then walks through more specific descriptions that illustrate particular data engineering roles. Data-driven discovery is revolutionizing the modeling, prediction, and control of complex systems. This book brings together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science. It highlights many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy. Whether for the veteran engineer or scientist working in the field or laboratory, or the student or academic, this is a must-have for any library.

Author / Editor Details
Kukatlapalli Pradeep Kumar, PhD, is an associate professor and the Program Coordinator for Data Science at Christ University, Bangalore, India. He has 13 years of research and academic experience. He has published in many journals and presented numerous conference papers.

Aynur Unal, PhD, educated at Stanford University (class of ’73), has taught at Stanford University for almost 40 years and established the Acoustics Institute. Her work on “New Transform Domains for the Onset of Failures” received a prestigious research award.

Vinay Jha Pillai, PhD, is an associate professor in the Department of Electronics and Communication Engineering at CHRIST University, Bangalore, India. He has 12 years of academic experience and holds two patents. He has also completed two funded projects as principal investigator.

Hari Murthy, PhD, is a faculty member in the Department of Electronics and Communication Engineering, CHRIST University, Bengaluru, India. He completed his PhD at the University of Canterbury, New Zealand, where his thesis was on novel anticorrosion materials. He has authored book chapters, published papers in international journals and conferences, and served on the program committees of several international conferences.

M. Niranjanamurthy, PhD, is an assistant professor in the Department of Computer Applications, M S Ramaiah Institute of Technology, Bangalore, Karnataka. He earned his PhD in computer science at JJTU, Rajasthan, India. He has over 11 years of teaching experience and two years of industry experience as a software engineer. He has published several books, and he is working on numerous books for Scrivener Publishing. He has published over 60 papers in scholarly journals and conferences, and he serves as a reviewer for 22 scientific journals. He also has numerous awards to his credit.


Table of Contents
Preface
1. Quality Assurance in Data Science: Need, Challenges and Focus
Jasmine K.S., Ajay D. K. and Aditya Raj
1.1 Introduction
1.1.1 Quality Assurance and Testing
1.1.2 Data Science and Quality Assurance
1.1.3 Background
1.2 Testing and Quality Assurance
1.2.1 Key Terminologies Associated With Testing
1.3 Product Quality and Test Efforts
1.3.1 Testing Metrics
1.3.2 How to Improve the Business Value to Products Using Test Automation
1.3.3 Data Analysis and Management in Test Automation
1.3.4 Data Models in Data Science
1.4 Data Masking in Data Model and Associated Risks
1.5 Prediction in Data Science
Case Study
1.6 Role of Metrics in Evaluation
1.7 Quantity of Data in Quality Assurance
1.8 Identifying the Right Data Sources
1.8.1 Need to Gather Up-to-Date Data
1.8.2 Synthesising Existing Advanced Technologies for Continuous Business Improvements
1.9 Conclusion
References
2. Design and Implementation of Social Media Mining – Knowledge Discovery Methods for Effective Digital Marketing Strategies
Prashant Bhat and Pradnya Malaganve
2.1 Introduction
2.1.1 Objectives of the Study
2.2 Literature Review
2.3 Novel Framework for Social Media Data Mining and Knowledge Discovery
2.4 Classification for Comparison Analysis
2.5 Clustering Methodology to Provide Digital Marketing Strategies
2.5.1 Status (Text Form)
2.5.2 Images (Photos)
2.5.3 Video Post
2.5.4 Link Post
2.6 Experimental Results
2.7 Conclusion
References
3. A Study on Big Data Engineering Using Cloud Data Warehouse
Manjunath T. N., Pushpa S. K., Ravindra S. Hegadi and Ananya Hathwar K. S.
3.1 Introduction
3.2 Comparison Study of Different Cloud Data Warehouses
3.2.1 Amazon Redshift
3.2.2 High-Level Architecture of Amazon Redshift
3.2.3 Features of Amazon Redshift Cloud Data Warehouse
3.2.4 Pricing of Amazon Redshift Cloud Data Warehouse
3.3 Snowflake Cloud Data Warehouse
3.3.1 High-Level Architecture of Snowflake Cloud Data Warehouse
3.3.2 Features of Snowflake Cloud Data Warehouse
3.3.3 Snowflake Cloud Data Warehouse Pricing
3.4 Google BigQuery Cloud Data Warehouse
3.4.1 High-Level Architecture of Google BigQuery Cloud Data Warehouse
3.4.2 Features of Google BigQuery Cloud Data Warehouse
3.4.3 Google BigQuery Cloud Data Warehouse Pricing
3.5 Microsoft Azure Synapse Cloud Data Warehouse
3.5.1 Microsoft Azure Synapse Cloud Data Warehouse Architecture
3.5.2 Features of Microsoft Azure Synapse Cloud Data Warehouse
3.5.3 Pricing of Microsoft Azure Synapse Cloud Data Warehouse
3.6 Informatica Intelligent Cloud Services (IICS)
3.6.1 Informatica Intelligent Cloud Services Architecture
3.6.2 Salient Features of Informatica Intelligent Cloud Services
3.6.3 Informatica Intelligent Cloud Services Pricing Model
3.7 Conclusion
Acknowledgements
References
4. Data Mining with Cluster Analysis Through Partitioning Approach of Huge Transaction Data
Sampath Kini K. and Karthik Pai B.H.
4.1 Introduction
4.2 Methodology Used in Proposed Cluster Analysis System
4.2.1 Design of Algorithms
4.3 Literature Survey on Existing Systems
4.3.1 Experimental Results
4.4 Conclusion
References
5. Application of Data Science in Macromodeling of Nonlinear Dynamical Systems
Nagaraj S., Seshachalam D. and Jayalatha G.
5.1 Introduction
5.2 Nonlinear Autonomous Dynamical System
5.3 Nonlinear System - MOR
5.3.1 Proper Orthogonal Decomposition
5.4 Data Science Life Cycle
5.4.1 Problem Identification
5.4.2 Identifying Available Data Sources and Data Collection
5.4.3 Data Processing
5.4.4 Data Exploration
5.4.5 Feature Extraction
5.4.6 Modeling
5.4.7 Model Performance Evaluation
5.5 Artificial Neural Network in Modeling
5.5.1 Machine Learning
5.5.2 Biological Neuron Model
5.5.3 Artificial Neural Networks
5.5.4 Network Topologies
5.5.4.1 NARX Neural Network
5.5.5 ANN Modeling Using Mathematical Models
5.6 Neuron Spiking Model Using FitzHugh-Nagumo (F-N) System
5.6.1 Linearization of F-N System
5.6.2 Reduced Order Model of Linear System
5.6.3 Finite Difference Discretization of F-N System
5.6.4 MOR of F-N System Using POD-Galerkin Method
5.7 Ring Oscillator Model
5.7.1 Model Order Reduction of Ring Oscillator Circuit
5.7.2 Ring Oscillator Circuit Approximation Using Linear System MOR
5.7.3 POD-ANN Macromodel of Ring Oscillator Circuit
5.8 Nonlinear VLSI Interconnect Model Using Telegraph Equation
5.8.1 Macromodeling of VLSI Interconnect
5.8.2 Discretisation of Interconnect Model
5.8.3 Linearization of VLSI Interconnect Model
5.8.4 Reduced Order Linear Model of VLSI Interconnect
5.9 Macromodel Using Machine Learning
5.9.1 Activation Function
5.9.2 Bayesian Regularization
5.9.3 Optimization
5.10 MOR of Dynamical Systems Using POD-ANN
5.10.1 Accuracy and Performance Index
5.11 Numerical Results
5.11.1 F-N System
5.11.2 Ring Oscillator Model
5.11.3 Reduced Order POD Approximation of Ring Oscillator
5.11.3.1 Study of POD-ANN Approximation of Ring Oscillator for Variation in Amplitude of Input Signal and for Different Input Signals
5.11.3.2 POD-ANN Approximation of Ring Oscillator for Variation in Frequency
5.11.4 POD-ANN Approximation of VLSI Interconnect
5.12 Conclusion
References
Terminologies Used in this Chapter
6. Comparative Analysis of Various Ensemble Approaches for Web Page Classification
J. Dutta, Yong Woon Kim and Dalia Dominic
6.1 Introduction
6.2 Literature Survey
6.3 Material and Methods
6.4 Ensemble Classifiers
6.4.1 Bagging
6.4.1.1 Bagging Meta Estimator
6.4.1.2 Random Forest
6.4.2 Boosting
6.4.2.1 AdaBoost
6.4.2.2 Gradient Tree Boosting
6.4.2.3 XGBoost
6.4.3 Stacking
6.5 Results
6.5.1 Bagging Meta Estimator
6.5.2 Random Forest
6.5.3 AdaBoost
6.5.4 Gradient Tree Boosting
6.5.5 XGBoost
6.5.6 Stacking
6.5.7 Comparison with Single Classifiers
6.6 Conclusion
Acknowledgement
References
7. Feature Engineering and Selection Approach Over Malicious Image
P.M. Kavitha and B. Muruganantham
7.1 Introduction
7.2 Feature Engineering Techniques
7.2.1 Methodologies in Feature Engineering
7.2.2 Strides in Feature Engineering
7.2.3 Feature Extraction
7.2.4 Feature Selection
7.2.5 Feature Engineering in Image Processing
7.2.6 Importance of Feature Engineering in Image Processing
7.3 Malicious Feature Engineering
7.4 Image Processing Technique
7.4.1 Steps Involved in Image Processing Technique
7.4.2 Image Processing Task
7.4.2.1 Image Enhancement
7.4.2.2 Image Restoration
7.4.2.3 Coloring Image Processing
7.4.2.4 Wavelets Processing and Multiple Solutions
7.4.2.5 Image Compression
7.4.2.6 Character Recognition
7.4.2.7 Characteristics of Image Processing
7.5 Image Processing Techniques for Analysis on Malicious Images
7.6 Conclusion
References
8. Cubic-Regression and Likelihood Based Boosting GAM to Model Drug Sensitivity for Glioblastoma
Satyawant Kumar, Vinai George Biju, Ho-Kyoung Lee and Blessy Baby Mathew
8.1 Introduction
8.1.1 Glioblastoma
8.2 Literature Survey
8.3 Materials and Methods
8.3.1 Methodology
8.3.1.1 Generalized Additive Models (GAMs)
8.3.1.2 Model-Based Boosting – Boosted GAM
8.3.2 Datasets Description
8.4 Evaluations, Results and Discussions
8.4.1 Akaike Information Criterion (AIC)
8.4.2 Adjusted R-Squared
8.4.3 Discussion
Conclusion
References
9. Unobtrusive Engagement Detection through Semantic Pose Estimation and Lightweight ResNet for an Online Class Environment
Michael Moses Thiruthuvanathan, Balachandran Krishnan and Madhavi Rangaswamy
9.1 Introduction
9.2 Related Work
9.2.1 Analysis for a Classroom Environment
9.2.2 Pose Estimation
9.2.3 Face Alignment and Landmark Estimation
9.2.4 Deep Networks for Emotional Analysis
9.3 Proposed Methodology
9.3.1 Data Description
9.3.2 Facial Detection and Recognition
9.3.2.1 Face Detection
9.3.2.2 Facial Landmark Detection
9.3.3 Emotion Quantification
9.3.4 Pose Estimation
9.3.4.1 Facial Pose Estimation
9.4 Experimentation
9.5 Results and Discussions
Conclusion
References
10. Building Rule Base for Decision Making – A Fuzzy-Rough Approach
Sabu M. K., Neeraj Krishna M. S. and Reshmi R.
10.1 Introduction
10.2 Literature Review
10.3 Discretization of the Dataset Using Fuzzy Set Theory
10.4 Description of the Dataset
10.5 Process Involved in Proposed Work
10.6 Experiment
10.7 Evaluation Result
10.8 Discussion
Conclusion
References
11. An Effective Machine Learning Approach to Model Healthcare Data
Shaila H. Koppad, S. Anupama Kumar and Mohan Kumar
11.1 Introduction
11.2 Types of Data in Healthcare
11.3 Big Data in Healthcare
11.4 Different V’s of Big Data
11.5 About COPD
11.6 Methodology Implemented
Conclusion
References
12. Recommendation Engine for Retail Domain Using Machine Learning Techniques
Chandrashekhara K. T., Gireesh Babu C. N. and Thungamani M.
12.1 Introduction
12.2 Proposed System
12.2.1 Classification of Suppliers
12.2.2 Recommendation for Buyer
12.2.3 Forecasting Using ARIMA Model
12.3 Results
12.3.1 ARIMA Forecasting
12.4 Conclusion
References
13. Mining Heterogeneous Lung Cancer from Computer Tomography (CT) Scan with the Confusion Matrix
Denny Dominic and Krishnan Balachandran
13.1 Introduction
13.2 Literature Review
13.3 Methodology
13.3.1 Description of the Data
13.3.2 Image Preprocessing
13.3.3 Image Segmentation
13.3.4 Image Processing
13.3.5 Zero Component Analysis (ZCA) Whitening
13.3.6 Local Binary Pattern (LBP Feature)
13.3.7 LESH Vector
13.3.8 Local Energy Map and Orientation Map
13.3.9 Training with Deep Learning Methods
13.4 Result
13.4.1 Lorenz Curve
13.4.2 Confusion Matrix
13.4.3 Gini Coefficient
13.5 Conclusion and Future Scope
References
14. ML Algorithms and Their Approach on COVID-19 Data Analysis
Kambaluru Ashok, Penumalli Anvesh Reddy and Kukatlapalli Pradeep Kumar
14.1 Introduction
14.2 DataSet
14.2.1 Labeled Datasets
14.2.2 Unlabelled Datasets
14.2.3 COVID-19 Data
14.3 Types of Machine Learning Algorithms
14.3.1 Supervised Learning
14.3.2 Unsupervised Learning
14.3.3 Semi-Supervised Learning
14.3.4 Reinforcement Learning
14.4 Conclusion
References
15. Analysis and Design for the Early Stage Detection of Lung Diseases Using Machine Learning Algorithms
Sindhu Madhuri, Mahesh T. R., Vivek V., Shashikala H. K. and C. Saravanan
15.1 Introduction
15.2 Machine Learning Algorithms
15.2.1 Linear Regression
15.2.2 Logistic Regression
15.2.3 Decision Tree
15.2.4 Random Forest
15.2.5 Naïve Bayes
15.2.6 Support Vector Machine (SVM)
15.3 Evaluation Metrics and Comparative Results for Early Detection of Lung Diseases
15.3.1 Accuracy (A)
15.3.2 Precision (P) and Recall (R)
15.3.3 Mean Squared Error (MSE)
15.3.4 Matthews Correlation Coefficient (MCC)
15.4 Conclusion
References
16. Estimation of Cancer Risk through Artificial Neural Network
K. Aditya Shastry, Sanjay H. A., Balaji N. and Karthik Pai B. H.
16.1 Introduction
16.2 Case Studies Related to Cancer Risk Estimation Using ANN
16.2.1 ANN Technique for Early LC Detection
16.2.2 ANNs in Image Processing for Early Diagnosis of BC
16.3 Datasets Used in Cancer Risk Estimation
16.3.1 Datasets Related to Breast Cancer
16.3.2 Dataset Related to Lung Cancer
16.3.3 BC Coimbra Data Set and BC Wisconsin (Diagnostic) Data
16.3.4 Comparison of ANN Techniques with Other Methods for Cancer Risk Estimation
16.4 Discussion
16.5 Future Scope
16.6 Conclusion
References
17. Applications and Advancements in Data Science and Analytics
T. Mamatha, A. Balaram, B. Rama Subba Reddy, C. Shoba Bindu and M. Niranjanamurthy
17.1 Data Science and Analytics in Software Testing
17.2 Applications of Data Science and Analytics
17.3 Selenium Testing Tool in Data Science
17.3.1 Basic Techniques for Testing Voice-Based Applications
17.3.2 Image Web Scraping Using Selenium
17.4 Challenges and Advancements in Data Science
17.5 Data Science and Analytics Tools
17.6 Conclusion
References
About the Editors
Index

