Written and edited by some of the world’s top experts in the field, this new volume presents state-of-the-art research and the latest technological breakthroughs in data wrangling, covering its theoretical concepts, practical applications, and tools for solving everyday problems.
Table of Contents
Preface
1. Basic Principles of Data Wrangling
Akshay Singh, Surender Singh and Jyotsna Rathee
1.1 Introduction
1.2 Data Workflow Structure
1.3 Raw Data Stage
1.3.1 Data Input
1.3.2 Output Actions at Raw Data Stage
1.3.3 Structure
1.3.4 Granularity
1.3.5 Accuracy
1.3.6 Temporality
1.3.7 Scope
1.4 Refined Stage
1.4.1 Data Design and Preparation
1.4.2 Structure Issues
1.4.3 Granularity Issues
1.4.4 Accuracy Issues
1.4.5 Scope Issues
1.4.6 Output Actions at Refined Stage
1.5 Produced Stage
1.5.1 Data Optimization
1.5.2 Output Actions at Produced Stage
1.6 Steps of Data Wrangling
1.7 Do’s for Data Wrangling
1.8 Tools for Data Wrangling
References
2. Skills and Responsibilities of Data Wrangler
Prabhjot Kaur, Anupama Kaushik and Aditya Kapoor
2.1 Introduction
2.2 Role as an Administrator (Data and Database)
2.3 Skills Required
2.3.1 Technical Skills
2.3.1.1 Python
2.3.1.2 R Programming Language
2.3.1.3 SQL
2.3.1.4 MATLAB
2.3.1.5 Scala
2.3.1.6 Excel
2.3.1.7 Tableau
2.3.1.8 Power BI
2.3.2 Soft Skills
2.3.2.1 Presentation Skills
2.3.2.2 Storytelling
2.3.2.3 Business Insights
2.3.2.4 Writing/Publishing Skills
2.3.2.5 Listening
2.3.2.6 Stop and Think
2.3.2.7 Soft Issues
2.4 Responsibilities as Database Administrator
2.4.1 Software Installation and Maintenance
2.4.2 Data Extraction, Transformation, and Loading
2.4.3 Data Handling
2.4.4 Data Security
2.4.5 Data Authentication
2.4.6 Data Backup and Recovery
2.4.7 Security and Performance Monitoring
2.4.8 Effective Use of Human Resource
2.4.9 Capacity Planning
2.4.10 Troubleshooting
2.4.11 Database Tuning
2.5 Concerns for a DBA
2.6 Data Mishandling and Its Consequences
2.6.1 Phases of Data Breaching
2.6.2 Data Breach Laws
2.6.3 Best Practices for Enterprises
2.7 The Long-Term Consequences: Loss of Trust and Diminished Reputation
2.8 Solution to the Problem
2.9 Case Studies
2.9.1 Uber Case Study
2.9.1.1 Role of Analytics and Business Intelligence in Optimization
2.9.1.2 Mapping Applications for City Ops Teams
2.9.1.3 Marketplace Forecasting
2.9.1.4 Learnings from Data
2.9.2 PepsiCo Case Study
2.9.2.1 Searching for a Single Source of Truth
2.9.2.2 Finding the Right Solution for Better Data
2.9.2.3 Enabling Powerful Results with Self-Service Analytics
2.10 Conclusion
References
3. Data Wrangling Dynamics
Simarjit Kaur, Anju Bala and Anupam Garg
3.1 Introduction
3.2 Related Work
3.3 Challenges: Data Wrangling
3.4 Data Wrangling Architecture
3.4.1 Data Sources
3.4.2 Auxiliary Data
3.4.3 Data Extraction
3.4.4 Data Wrangling
3.4.4.1 Data Accessing
3.4.4.2 Data Structuring
3.4.4.3 Data Cleaning
3.4.4.4 Data Enriching
3.4.4.5 Data Validation
3.4.4.6 Data Publication
3.5 Data Wrangling Tools
3.5.1 Excel
3.5.2 Altair Monarch
3.5.3 Anzo
3.5.4 Tabula
3.5.5 Trifacta
3.5.6 Datameer
3.5.7 Paxata
3.5.8 Talend
3.6 Data Wrangling Application Areas
3.7 Future Directions and Conclusion
References
4. Essentials of Data Wrangling
Menal Dahiya, Nikita Malik and Sakshi Rana
4.1 Introduction
4.2 Holistic Workflow Framework for Data Projects
4.2.1 Raw Stage
4.2.2 Refined Stage
4.2.3 Production Stage
4.3 The Actions in Holistic Workflow Framework
4.3.1 Raw Data Stage Actions
4.3.1.1 Data Ingestion
4.3.1.2 Creating Metadata
4.3.2 Refined Data Stage Actions
4.3.3 Production Data Stage Actions
4.4 Transformation Tasks Involved in Data Wrangling
4.4.1 Structuring
4.4.2 Enriching
4.4.3 Cleansing
4.5 Description of Two Types of Core Profiling
4.5.1 Individual Values Profiling
4.5.1.1 Syntactic
4.5.1.2 Semantic
4.5.2 Set-Based Profiling
4.6 Case Study
4.6.1 Importing Required Libraries
4.6.2 Changing the Order of the Columns in the Dataset
4.6.3 To Display the DataFrame (Top 10 Rows) and Verify That the Columns Are in Order
4.6.4 To Display the DataFrame (Bottom 10 Rows) and Verify That the Columns Are in Order
4.6.5 Generate the Statistical Summary of the DataFrame for All the Columns
4.7 Quantitative Analysis
4.7.1 Maximum Number of Fires on Any Given Day
4.7.2 Total Number of Fires for the Entire Duration for Every State
4.7.3 Summary Statistics
4.8 Graphical Representation
4.8.1 Line Graph
4.8.2 Pie Chart
4.8.3 Bar Graph
4.9 Conclusion
References
5. Data Leakage and Data Wrangling in Machine Learning for Medical Treatment
P.T. Jamuna Devi and B.R. Kavitha
5.1 Introduction
5.2 Data Wrangling and Data Leakage
5.3 Data Wrangling Stages
5.3.1 Discovery
5.3.2 Structuring
5.3.3 Cleaning
5.3.4 Improving
5.3.5 Validating
5.3.6 Publishing
5.4 Significance of Data Wrangling
5.5 Data Wrangling Examples
5.6 Data Wrangling Tools for Python
5.7 Data Wrangling Tools and Methods
5.8 Use of Data Preprocessing
5.9 Use of Data Wrangling
5.10 Data Wrangling in Machine Learning
5.11 Enhancement of Express Analytics Using Data Wrangling Process
5.12 Conclusion
References
6. Importance of Data Wrangling in Industry 4.0
Rachna Jain, Geetika Dhand, Kavita Sheoran and Nisha Aggarwal
6.1 Introduction
6.1.1 Data Wrangling Entails
6.2 Steps in Data Wrangling
6.2.1 Obstacles Surrounding Data Wrangling
6.3 Data Wrangling Goals
6.4 Tools and Techniques of Data Wrangling
6.4.1 Basic Data Munging Tools
6.4.2 Data Wrangling in Python
6.4.3 Data Wrangling in R
6.5 Ways for Effective Data Wrangling
6.5.1 Ways to Enhance Data Wrangling Pace
6.6 Future Directions
References
7. Managing Data Structure in R
Mittal Desai and Chetan Dudhagara
7.1 Introduction to Data Structure
7.2 Homogeneous Data Structures
7.2.1 Vector
7.2.2 Factor
7.2.3 Matrix
7.2.4 Array
7.3 Heterogeneous Data Structures
7.3.1 List
7.3.2 Dataframe
8. Dimension Reduction Techniques in Distributional Semantics: An Application Specific Review
Pooja Kherwa, Jyoti Khurana, Rahul Budhraj, Sakshi Gill, Shreyansh Sharma and Sonia Rathee
8.1 Introduction
8.2 Application Based Literature Review
8.3 Dimensionality Reduction Techniques
8.3.1 Principal Component Analysis
8.3.2 Linear Discriminant Analysis
8.3.2.1 Two-Class LDA
8.3.2.2 Three-Class LDA
8.3.3 Kernel Principal Component Analysis
8.3.4 Locally Linear Embedding
8.3.5 Independent Component Analysis
8.3.6 Isometric Mapping (Isomap)
8.3.7 Self-Organising Maps
8.3.8 Singular Value Decomposition
8.3.9 Factor Analysis
8.3.10 Auto-Encoders
8.4 Experimental Analysis
8.4.1 Datasets Used
8.4.2 Techniques Used
8.4.3 Classifiers Used
8.4.4 Observations
8.4.5 Results Analysis: Red-Wine Quality Dataset
8.5 Conclusion
References
9. Big Data Analytics in Real Time for Enterprise Applications to Produce Useful Intelligence
Prashant Vats and Siddhartha Sankar Biswas
9.1 Introduction
9.2 The Internet of Things and Big Data Correlation
9.3 Design, Structure, and Techniques for Big Data Technology
9.4 Aspiration for Meaningful Analyses and Big Data Visualization Tools
9.4.1 From Information to Guidance
9.4.2 The Transition from Information Management to Valuation Offerings
9.5 Big Data Applications in the Commercial Surroundings
9.5.1 IoT and Data Science Applications in the Production Industry
9.5.1.1 Devices That Are Interlinked
9.5.1.2 Data Transformation
9.5.2 Predictive Analysis for Corporate Enterprise Applications in the Industrial Sector
9.6 Big Data Insights’ Constraints
9.6.1 Technological Developments
9.6.2 Representation of Data
9.6.3 Data That Is Fragmented and Imprecise
9.6.4 Extensibility
9.6.5 Implementation in Real Time Scenarios
9.7 Conclusion
References
10. Generative Adversarial Networks: A Comprehensive Review
Jyoti Arora, Meena Tushir, Pooja Kherwa and Sonia Rathee
List of Abbreviations
10.1 Introduction
10.2 Background
10.2.1 Supervised vs Unsupervised Learning
10.2.2 Generative Modeling vs Discriminative Modeling
10.3 Anatomy of a GAN
10.4 Types of GANs
10.4.1 Conditional GAN (CGAN)
10.4.2 Deep Convolutional GAN (DCGAN)
10.4.3 Wasserstein GAN (WGAN)
10.4.4 Stack GAN
10.4.5 Least Squares GAN (LSGAN)
10.4.6 Information Maximizing GAN (InfoGAN)
10.5 Shortcomings of GANs
10.6 Areas of Application
10.6.1 Image
10.6.2 Video
10.6.3 Artwork
10.6.4 Music
10.6.5 Medicine
10.6.6 Security
10.7 Conclusion
References
11. Analysis of Machine Learning Frameworks Used in Image Processing: A Review
Gurpreet Kaur and Kamaljit Singh Saini
11.1 Introduction
11.2 Types of ML Algorithms
11.2.1 Supervised Learning
11.2.2 Unsupervised Learning
11.2.3 Reinforcement Learning
11.3 Applications of Machine Learning Techniques
11.3.1 Personal Assistants
11.3.2 Predictions
11.3.3 Social Media
11.3.4 Fraud Detection
11.3.5 Google Translator
11.3.6 Product Recommendations
11.3.7 Video Surveillance
11.4 Solution to a Problem Using ML
11.4.1 Classification Algorithms
11.4.2 Anomaly Detection Algorithm
11.4.3 Regression Algorithm
11.4.4 Clustering Algorithms
11.4.5 Reinforcement Algorithms
11.5 ML in Image Processing
11.5.1 Frameworks and Libraries Used for ML Image Processing
11.6 Conclusion
References
12. Use and Application of Artificial Intelligence in Accounting and Finance: Benefits and Challenges
Ram Singh, Rohit Bansal and Niranjanamurthy M.
12.1 Introduction
12.1.1 Artificial Intelligence in Accounting and Finance Sector
12.2 Uses of AI in Accounting & Finance Sector
12.2.1 Pay and Receive Processing
12.2.2 Supplier Onboarding and Procurement
12.2.3 Audits
12.2.4 Monthly, Quarterly Cash Flows, and Expense Management
12.2.5 AI Chatbots
12.3 Applications of AI in Accounting and Finance Sector
12.3.1 AI in Personal Finance
12.3.2 AI in Consumer Finance
12.3.3 AI in Corporate Finance
12.4 Benefits and Advantages of AI in Accounting and Finance
12.4.1 Changing the Human Mindset
12.4.2 Machines Imitate the Human Brain
12.4.3 Fighting Misrepresentation
12.4.4 AI Machines Make Accounting Tasks Easier
12.4.5 Invisible Accounting
12.4.6 Build Trust through Better Financial Protection and Control
12.4.7 Active Insights Help Drive Better Decisions
12.4.8 Fraud Protection, Auditing, and Compliance
12.4.9 Machines as Financial Guardians
12.4.10 Intelligent Investments
12.4.11 Consider the “Runaway Effect”
12.4.12 Artificial Control and Effective Fiduciaries
12.4.13 Accounting Automation Avenues and Investment Management
12.5 Challenges of AI Application in Accounting and Finance
12.5.1 Data Quality and Management
12.5.2 Cyber and Data Privacy
12.5.3 Legal Risks, Liability, and Culture Transformation
12.5.4 Practical Challenges
12.5.5 Limits of Machine Learning and AI
12.5.6 Roles and Skills
12.5.7 Institutional Issues
12.6 Suggestions and Recommendation
12.7 Conclusion and Future Scope of the Study
References
13. Obstacle Avoidance Simulation and Real-Time Lane Detection for AI-Based Self-Driving Car
B. Eshwar, Harshaditya Sheoran, Shivansh Pathak and Meena Rao
13.1 Introduction
13.1.1 Environment Overview
13.1.1.1 Simulation Overview
13.1.1.2 Agent Overview
13.1.1.3 Brain Overview
13.1.2 Algorithm Used
13.1.2.1 Markov Decision Process (MDP)
13.1.2.2 Adding a Living Penalty
13.1.2.3 Implementing a Neural Network
13.2 Simulations and Results
13.2.1 Self-Driving Car Simulation
13.2.2 Real-Time Lane Detection and Obstacle Avoidance
13.2.3 About the Model
13.2.4 Preprocessing the Image/Frame
13.3 Conclusion
References
14. Impact of Suppliers Network on SCM of Indian Auto Industry: A Case of Maruti Suzuki India Limited
Ruchika Pharswan, Ashish Negi and Tridib Basak
14.1 Introduction
14.2 Literature Review
14.2.1 Prior Pandemic Automobile Industry/COVID-19 Thump on the Automobile Sector
14.2.2 Maruti Suzuki India Limited (MSIL) During COVID-19 and Other Players in the Automobile Industry and How MSIL Prevailed
14.3 Methodology
14.4 Findings
14.4.1 Worldwide Economic Impact of the Epidemic
14.4.2 Effect on Global Automobile Industry
14.4.3 Effect on Indian Automobile Industry
14.4.4 Automobile Industry Scenario That Can Be Expected Post COVID-19 Recovery
14.5 Discussion
14.5.1 Competitive Dimensions
14.5.2 MSIL Strategies
14.5.3 MSIL Operations and Supply Chain Management
14.5.4 MSIL Suppliers Network
14.5.5 MSIL Manufacturing
14.5.6 MSIL Distributors Network
14.5.7 MSIL Logistics Management
14.6 Conclusion
References
About the Editors
Index