Data Wrangling

Concepts, Applications, and Tools
M. Niranjanamurthy, Kavita Sheoran, Geetika Dhand, and Prabhjot Kaur
Copyright: 2023   |   Status: Published
ISBN: 9781119879688  |  Hardcover

Price: $225 USD

One Line Description
Written and edited by some of the world’s top experts in the field, this exciting new volume presents state-of-the-art research and the latest technological breakthroughs in data wrangling, covering its theoretical concepts, practical applications, and tools for solving everyday problems.

Audience
Engineers, scientists, students, faculty, software developers, and other industry professionals in computer applications, computer science and engineering, electronics, communication engineering, information technology, and research

Description
Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis. This process typically includes manually converting and mapping data from one raw form into another format to allow for more convenient consumption and organization of the data. Data wrangling is increasingly ubiquitous at today's top firms.
Data cleaning focuses on removing inaccurate data from your data set, whereas data wrangling focuses on transforming the data's format, typically by converting raw data into another format more suitable for use. Data wrangling is a necessary component of any business. Data wrangling solutions, such as Datameer, Infogix, Paxata, Talend, Tamr, TMMData, and Trifacta, are specifically designed and architected to handle diverse, complex data at any scale.
This book synthesizes the processes of data wrangling into a comprehensive overview, with a strong focus on recent and rapidly evolving agile analytic processes in data-driven enterprises, which businesses and other enterprises can use to find solutions for their everyday problems and practical applications. For the veteran engineer, scientist, or other industry professional, this book is a must-have for any library.

Author / Editor Details
M. Niranjanamurthy, PhD, is an assistant professor in the Department of Computer Applications, M S Ramaiah Institute of Technology, Bangalore, Karnataka. He earned his PhD in computer science at JJTU, Rajasthan, India. He has over 11 years of teaching experience and two years of industry experience as a software engineer. He has published several books, and he is working on numerous books for Scrivener Publishing. He has published over 60 papers in scholarly journals and conferences, and he serves as a reviewer for 22 scientific journals. He also has numerous awards to his credit.

Kavita Sheoran, PhD, is an associate professor in the Computer Science Department, MSIT, Delhi. She earned her PhD in computer science from Gautam Buddha University, Greater Noida. With over 17 years of teaching experience, she has published various papers in reputed journals and has published two books.

Geetika Dhand, PhD, is an associate professor in the Department of Computer Science and Engineering at Maharaja Surajmal Institute of Technology. She earned her PhD in computer science from Manav Rachna International Institute of Research and Studies, Faridabad, and has over 17 years of teaching experience. She has published one book and a number of papers in technical journals.

Prabhjot Kaur has over 19 years of teaching experience and has earned two PhDs for her work in two different research areas. She has authored two books and more than 40 research papers in reputed journals and conferences. She also has one patent to her credit.


Table of Contents
Preface
1. Basic Principles of Data Wrangling
Akshay Singh, Surender Singh and Jyotsna Rathee
1.1 Introduction
1.2 Data Workflow Structure
1.3 Raw Data Stage
1.3.1 Data Input
1.3.2 Output Actions at Raw Data Stage
1.3.3 Structure
1.3.4 Granularity
1.3.5 Accuracy
1.3.6 Temporality
1.3.7 Scope
1.4 Refined Stage
1.4.1 Data Design and Preparation
1.4.2 Structure Issues
1.4.3 Granularity Issues
1.4.4 Accuracy Issues
1.4.5 Scope Issues
1.4.6 Output Actions at Refined Stage
1.5 Produced Stage
1.5.1 Data Optimization
1.5.2 Output Actions at Produced Stage
1.6 Steps of Data Wrangling
1.7 Do’s for Data Wrangling
1.8 Tools for Data Wrangling
References
2. Skills and Responsibilities of Data Wrangler
Prabhjot Kaur, Anupama Kaushik and Aditya Kapoor
2.1 Introduction
2.2 Role as an Administrator (Data and Database)
2.3 Skills Required
2.3.1 Technical Skills
2.3.1.1 Python
2.3.1.2 R Programming Language
2.3.1.3 SQL
2.3.1.4 MATLAB
2.3.1.5 Scala
2.3.1.6 EXCEL
2.3.1.7 Tableau
2.3.1.8 Power BI
2.3.2 Soft Skills
2.3.2.1 Presentation Skills
2.3.2.2 Storytelling
2.3.2.3 Business Insights
2.3.2.4 Writing/Publishing Skills
2.3.2.5 Listening
2.3.2.6 Stop and Think
2.3.2.7 Soft Issues
2.4 Responsibilities as Database Administrator
2.4.1 Software Installation and Maintenance
2.4.2 Data Extraction, Transformation, and Loading
2.4.3 Data Handling
2.4.4 Data Security
2.4.5 Data Authentication
2.4.6 Data Backup and Recovery
2.4.7 Security and Performance Monitoring
2.4.8 Effective Use of Human Resource
2.4.9 Capacity Planning
2.4.10 Troubleshooting
2.4.11 Database Tuning
2.5 Concerns for a DBA
2.6 Data Mishandling and Its Consequences
2.6.1 Phases of Data Breaching
2.6.2 Data Breach Laws
2.6.3 Best Practices For Enterprises
2.7 The Long-Term Consequences: Loss of Trust and Diminished Reputation
2.8 Solution to the Problem
2.9 Case Studies
2.9.1 UBER Case Study
2.9.1.1 Role of Analytics and Business Intelligence in Optimization
2.9.1.2 Mapping Applications for City Ops Teams
2.9.1.3 Marketplace Forecasting
2.9.1.4 Learnings from Data
2.9.2 PepsiCo Case Study
2.9.2.1 Searching for a Single Source of Truth
2.9.2.2 Finding the Right Solution for Better Data
2.9.2.3 Enabling Powerful Results with Self-Service Analytics
2.10 Conclusion
References
3. Data Wrangling Dynamics
Simarjit Kaur, Anju Bala and Anupam Garg
3.1 Introduction
3.2 Related Work
3.3 Challenges: Data Wrangling
3.4 Data Wrangling Architecture
3.4.1 Data Sources
3.4.2 Auxiliary Data
3.4.3 Data Extraction
3.4.4 Data Wrangling
3.4.4.1 Data Accessing
3.4.4.2 Data Structuring
3.4.4.3 Data Cleaning
3.4.4.4 Data Enriching
3.4.4.5 Data Validation
3.4.4.6 Data Publication
3.5 Data Wrangling Tools
3.5.1 Excel
3.5.2 Altair Monarch
3.5.3 Anzo
3.5.4 Tabula
3.5.5 Trifacta
3.5.6 Datameer
3.5.7 Paxata
3.5.8 Talend
3.6 Data Wrangling Application Areas
3.7 Future Directions and Conclusion
References
4. Essentials of Data Wrangling
Menal Dahiya, Nikita Malik and Sakshi Rana
4.1 Introduction
4.2 Holistic Workflow Framework for Data Projects
4.2.1 Raw Stage
4.2.2 Refined Stage
4.2.3 Production Stage
4.3 The Actions in Holistic Workflow Framework
4.3.1 Raw Data Stage Actions
4.3.1.1 Data Ingestion
4.3.1.2 Creating Metadata
4.3.2 Refined Data Stage Actions
4.3.3 Production Data Stage Actions
4.4 Transformation Tasks Involved in Data Wrangling
4.4.1 Structuring
4.4.2 Enriching
4.4.3 Cleansing
4.5 Description of Two Types of Core Profiling
4.5.1 Individual Values Profiling
4.5.1.1 Syntactic
4.5.1.2 Semantic
4.5.2 Set-Based Profiling
4.6 Case Study
4.6.1 Importing Required Libraries
4.6.2 Changing the Order of the Columns in the Dataset
4.6.3 To Display the DataFrame (Top 10 Rows) and Verify that the Columns are in Order
4.6.4 To Display the DataFrame (Bottom 10 rows) and Verify that the Columns Are in Order
4.6.5 Generate the Statistical Summary of the DataFrame for All the Columns
4.7 Quantitative Analysis
4.7.1 Maximum Number of Fires on Any Given Day
4.7.2 Total Number of Fires for the Entire Duration for Every State
4.7.3 Summary Statistics
4.8 Graphical Representation
4.8.1 Line Graph
4.8.2 Pie Chart
4.8.3 Bar Graph
4.9 Conclusion
References
5. Data Leakage and Data Wrangling in Machine Learning for Medical Treatment
P.T. Jamuna Devi and B.R. Kavitha
5.1 Introduction
5.2 Data Wrangling and Data Leakage
5.3 Data Wrangling Stages
5.3.1 Discovery
5.3.2 Structuring
5.3.3 Cleaning
5.3.4 Improving
5.3.5 Validating
5.3.6 Publishing
5.4 Significance of Data Wrangling
5.5 Data Wrangling Examples
5.6 Data Wrangling Tools for Python
5.7 Data Wrangling Tools and Methods
5.8 Use of Data Preprocessing
5.9 Use of Data Wrangling
5.10 Data Wrangling in Machine Learning
5.11 Enhancement of Express Analytics Using Data Wrangling Process
5.12 Conclusion
References
6. Importance of Data Wrangling in Industry 4.0
Rachna Jain, Geetika Dhand, Kavita Sheoran and Nisha Aggarwal
6.1 Introduction
6.1.1 Data Wrangling Entails
6.2 Steps in Data Wrangling
6.2.1 Obstacles Surrounding Data Wrangling
6.3 Data Wrangling Goals
6.4 Tools and Techniques of Data Wrangling
6.4.1 Basic Data Munging Tools
6.4.2 Data Wrangling in Python
6.4.3 Data Wrangling in R
6.5 Ways for Effective Data Wrangling
6.5.1 Ways to Enhance Data Wrangling Pace
6.6 Future Directions
References
7. Managing Data Structure in R
Mittal Desai and Chetan Dudhagara
7.1 Introduction to Data Structure
7.2 Homogeneous Data Structures
7.2.1 Vector
7.2.2 Factor
7.2.3 Matrix
7.2.4 Array
7.3 Heterogeneous Data Structures
7.3.1 List
7.3.2 Dataframe
8. Dimension Reduction Techniques in Distributional Semantics: An Application Specific Review
Pooja Kherwa, Jyoti Khurana, Rahul Budhraj, Sakshi Gill, Shreyansh Sharma and Sonia Rathee
8.1 Introduction
8.2 Application Based Literature Review
8.3 Dimensionality Reduction Techniques
8.3.1 Principal Component Analysis
8.3.2 Linear Discriminant Analysis
8.3.2.1 Two-Class LDA
8.3.2.2 Three-Class LDA
8.3.3 Kernel Principal Component Analysis
8.3.4 Locally Linear Embedding
8.3.5 Independent Component Analysis
8.3.6 Isometric Mapping (Isomap)
8.3.7 Self-Organising Maps
8.3.8 Singular Value Decomposition
8.3.9 Factor Analysis
8.3.10 Auto-Encoders
8.4 Experimental Analysis
8.4.1 Datasets Used
8.4.2 Techniques Used
8.4.3 Classifiers Used
8.4.4 Observations
8.4.5 Results Analysis Red-Wine Quality Dataset
8.5 Conclusion
References
9. Big Data Analytics in Real Time for Enterprise Applications to Produce Useful Intelligence
Prashant Vats and Siddhartha Sankar Biswas
9.1 Introduction
9.2 The Internet of Things and Big Data Correlation
9.3 Design, Structure, and Techniques for Big Data Technology
9.4 Aspiration for Meaningful Analyses and Big Data Visualization Tools
9.4.1 From Information to Guidance
9.4.2 The Transition from Information Management to Valuation Offerings
9.5 Big Data Applications in the Commercial Surroundings
9.5.1 IoT and Data Science Applications in the Production Industry
9.5.1.1 Devices that are Inter Linked
9.5.1.2 Data Transformation
9.5.2 Predictive Analysis for Corporate Enterprise Applications in the Industrial Sector
9.6 Big Data Insights’ Constraints
9.6.1 Technological Developments
9.6.2 Representation of Data
9.6.3 Data That Is Fragmented and Imprecise
9.6.4 Extensibility
9.6.5 Implementation in Real Time Scenarios
9.7 Conclusion
References
10. Generative Adversarial Networks: A Comprehensive Review
Jyoti Arora, Meena Tushir, Pooja Kherwa and Sonia Rathee
List of Abbreviations
10.1 Introduction
10.2 Background
10.2.1 Supervised vs Unsupervised Learning
10.2.2 Generative Modeling vs Discriminative Modeling
10.3 Anatomy of a GAN
10.4 Types of GANs
10.4.1 Conditional GAN (CGAN)
10.4.2 Deep Convolutional GAN (DCGAN)
10.4.3 Wasserstein GAN (WGAN)
10.4.4 Stack GAN
10.4.5 Least Square GAN (LSGANs)
10.4.6 Information Maximizing GAN (INFOGAN)
10.5 Shortcomings of GANs
10.6 Areas of Application
10.6.1 Image
10.6.2 Video
10.6.3 Artwork
10.6.4 Music
10.6.5 Medicine
10.6.6 Security
10.7 Conclusion
References
11. Analysis of Machine Learning Frameworks Used in Image Processing: A Review
Gurpreet Kaur and Kamaljit Singh Saini
11.1 Introduction
11.2 Types of ML Algorithms
11.2.1 Supervised Learning
11.2.2 Unsupervised Learning
11.2.3 Reinforcement Learning
11.3 Applications of Machine Learning Techniques
11.3.1 Personal Assistants
11.3.2 Predictions
11.3.3 Social Media
11.3.4 Fraud Detection
11.3.5 Google Translator
11.3.6 Product Recommendations
11.3.7 Videos Surveillance
11.4 Solution to a Problem Using ML
11.4.1 Classification Algorithms
11.4.2 Anomaly Detection Algorithm
11.4.3 Regression Algorithm
11.4.4 Clustering Algorithms
11.4.5 Reinforcement Algorithms
11.5 ML in Image Processing
11.5.1 Frameworks and Libraries Used for ML Image Processing
11.6 Conclusion
References
12. Use and Application of Artificial Intelligence in Accounting and Finance: Benefits and Challenges
Ram Singh, Rohit Bansal and Niranjanamurthy M.
12.1 Introduction
12.1.1 Artificial Intelligence in Accounting and Finance Sector
12.2 Uses of AI in Accounting & Finance Sector
12.2.1 Pay and Receive Processing
12.2.2 Supplier on Boarding and Procurement
12.2.3 Audits
12.2.4 Monthly, Quarterly Cash Flows, and Expense Management
12.2.5 AI Chatbots
12.3 Applications of AI in Accounting and Finance Sector
12.3.1 AI in Personal Finance
12.3.2 AI in Consumer Finance
12.3.3 AI in Corporate Finance
12.4 Benefits and Advantages of AI in Accounting and Finance
12.4.1 Changing the Human Mindset
12.4.2 Machines Imitate the Human Brain
12.4.3 Fighting Misrepresentation
12.4.4 AI Machines Make Accounting Tasks Easier
12.4.5 Invisible Accounting
12.4.6 Build Trust through Better Financial Protection and Control
12.4.7 Active Insights Help Drive Better Decisions
12.4.8 Fraud Protection, Auditing, and Compliance
12.4.9 Machines as Financial Guardians
12.4.10 Intelligent Investments
12.4.11 Consider the “Runaway Effect”
12.4.12 Artificial Control and Effective Fiduciaries
12.4.13 Accounting Automation Avenues and Investment Management
12.5 Challenges of AI Application in Accounting and Finance
12.5.1 Data Quality and Management
12.5.2 Cyber and Data Privacy
12.5.3 Legal Risks, Liability, and Culture Transformation
12.5.4 Practical Challenges
12.5.5 Limits of Machine Learning and AI
12.5.6 Roles and Skills
12.5.7 Institutional Issues
12.6 Suggestions and Recommendation
12.7 Conclusion and Future Scope of the Study
References
13. Obstacle Avoidance Simulation and Real-Time Lane Detection for AI-Based Self-Driving Car
B. Eshwar, Harshaditya Sheoran, Shivansh Pathak and Meena Rao
13.1 Introduction
13.1.1 Environment Overview
13.1.1.1 Simulation Overview
13.1.1.2 Agent Overview
13.1.1.3 Brain Overview
13.1.2 Algorithm Used
13.1.2.1 Markovs Decision Process (MDP)
13.1.2.2 Adding a Living Penalty
13.1.2.3 Implementing a Neural Network
13.2 Simulations and Results
13.2.1 Self-Driving Car Simulation
13.2.2 Real-Time Lane Detection and Obstacle Avoidance
13.2.3 About the Model
13.2.4 Preprocessing the Image/Frame
13.3 Conclusion
References
14. Impact of Suppliers Network on SCM of Indian Auto Industry: A Case of Maruti Suzuki India Limited
Ruchika Pharswan, Ashish Negi and Tridib Basak
14.1 Introduction
14.2 Literature Review
14.2.1 Prior Pandemic Automobile Industry/COVID-19 Thump on the Automobile Sector
14.2.2 Maruti Suzuki India Limited (MSIL) During COVID-19 and Other Players in the Automobile Industry and How MSIL Prevailed
14.3 Methodology
14.4 Findings
14.4.1 Worldwide Economic Impact of the Epidemic
14.4.2 Effect on Global Automobile Industry
14.4.3 Effect on Indian Automobile Industry
14.4.4 Automobile Industry Scenario That Can Be Expected Post COVID-19 Recovery
14.5 Discussion
14.5.1 Competitive Dimensions
14.5.2 MSIL Strategies
14.5.3 MSIL Operations and Supply Chain Management
14.5.4 MSIL Suppliers Network
14.5.5 MSIL Manufacturing
14.5.5 MSIL Distributors Network
14.5.6 MSIL Logistics Management
14.6 Conclusion
References
About the Editors
Index

