Udemy – The Mahalanobis Distance Test for Outliers
Published 11/2024
Created by Akram Najjar
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz, 2 Ch
Genre: eLearning | Language: English | Duration: 11 Lectures ( 4h 19m ) | Size: 1.35 GB
Using a 7-Step Procedure, learn how to calculate and validate the Mahalanobis Distance Test
What you'll learn
Calculating distances in datasets
A 7-Step procedure for calculating the Mahalanobis Distance Matrix
Using the Chi-Square Distribution to identify outliers
Matrix operations and how they are used in developing the Mahalanobis Distance Matrix
The Covariance Matrix and how it is calculated and used in the Mahalanobis Distance Matrix
Requirements
A working knowledge of Excel
A working knowledge of Matrix algebra (supported by a detailed Lecture in the course)
A working knowledge of statistical parameters: Covariance, Variance and Correlation (also supported by a lecture in the course)
Description
A) The Purpose of the Course

Most Machine Learning methods and algorithms that analyze datasets need to investigate how "close" or "far" the items of the dataset are from each other. This allows analysts to look for outliers and anomalies, classify data items into clusters, and establish whether there are associations between the items.

To do that, Machine Learning methods rely on a mathematical concept: the distance between items in a dataset. We are used to thinking of distance as the length between two points. Mathematicians use the term more widely. A customer dataset consisting of thousands of customers will have a set of M attributes about each customer, and M can be in the two-digit range. If M = 1, 2 or 3, we can visualize the distance between points using charts. This stops being possible for M > 3. The distance then becomes a mathematical expression built from a vector for each item in the dataset, where the vector holds the values of the attributes for that item.

What makes this more interesting is that distances can be calculated in various ways: Euclidean, Manhattan, Minkowski and Chebyshev distances. The Euclidean Distance is the most common. However, over time, Machine Learning methods that used the Euclidean Distance produced anomalous results, which led to invalid conclusions.

In multivariate space, the Euclidean Distance is calculated by multiplying the transpose of the dataset, P^T, by the dataset P. This is where Prasanta Chandra Mahalanobis, with his genius in statistics, came up with the idea: why not transform the dataset P before multiplying it by its transpose? This resolves a large number of issues with the Euclidean Distance.

The objective of the course is to present a 7-Step Procedure used to calculate the Mahalanobis Distances and, from the resulting matrix, identify the outliers. Identification is based on specifying a significance level (such as 0.1%, 1% or 5%).

The course also provides support lectures covering the prerequisite knowledge and practices needed to apply the 7 steps.

B) So, Why Do We Present a Course Based on Excel?

The course uses Excel specifically for educational purposes and not as a machine learning tool. The course is not set up to show you how to apply the 7-Step Procedure in real life. That would require more advanced programming environments. The course mainly aims to clarify the procedure for identifying outliers using the Mahalanobis Distance test. For that, Excel is used as an educational tool, as it is easy to understand and is well known by most business analysts.

C) What Does the Course Cover?

The course is made up of 3 sections.

Section 1: Introducing the Course
This section consists of one lecture that presents the objectives of the course, its structure and resources, as well as what to expect and what not to expect.

Section 2: The 7-Step Procedure
This is the heart of the course and consists of 6 lectures (Lectures 2 to 7):
2) Introducing Distances, Specifically the Mahalanobis Distance
   A. Introducing Prasanta Chandra Mahalanobis
   B. Introducing Mahalanobis Distance
   C. The Importance of Measuring Distance between Items of Data
3) Practices in our Data and the Matrix Representation of the Euclidean Distance
   D. Introducing Some Terms and Practices in our Data
   E. Starting with the Matrix Representation of the Euclidean Distance
4) Shortcomings of Euclidean Distances
   F. Shortcomings of Euclidean Distances
5) The 7-Step Procedure for Calculating Mahalanobis Distances
   G. The 7-Step Procedure for Calculating Mahalanobis Distances
6) Conditions for the Covariance Matrix to be Positive Definite
   H. Interlude: Conditions for the Covariance Matrix to be Positive Definite
7) How to Identify Outliers and More Examples
   I. Calculating the Mahalanobis Distance for the Two Equidistant Points
   J. How to Identify Outliers by Using the Chi-Square Distribution

Section 3: Support Presentations
This section consists of 4 lectures covering prerequisite material needed to appreciate and use the calculations in the 7-Step Procedure, which leads to the identification of outliers through the Mahalanobis Distance test:
8) Support - Matrices and Transformation
9) Support - The Cholesky Decomposition
10) Support - Multivariate Data and their Parameters
   K. Introducing Univariate and Bivariate Data and their Parameters
   L. Calculating the Variance, Standard Deviation and Covariance
11) Support - Covariance and Correlation
   M. The Covariance Matrix
   N. 6 Methods for Calculating the Covariance Matrix
   O. Correlation
   P. How to Calculate the Correlation Coefficient R

Resources

All lectures are supported by a variety of resources:
· Each lecture has its PowerPoint presentation uploaded in PDF format for your later use
· Solved and documented workouts in Excel
· Dedicated workbooks that animate and describe various probability distributions
· Links to interesting articles and books
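
For readers who want to see the idea outside Excel, here is a minimal Python sketch of the outlier test described above. It is not the course's 7-Step Procedure, which this listing does not spell out. It uses the textbook formula for the squared Mahalanobis distance of each item from the dataset mean and compares it with a Chi-Square cutoff. The dataset is invented for illustration, the function names are hypothetical, and the use of M degrees of freedom for M attributes is the usual convention assumed here. With M = 2, the Chi-Square cutoff has a closed form, so no statistics library is needed.

```python
import math
import numpy as np

def mahalanobis_squared(data):
    """Squared Mahalanobis distance of each row from the column means."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # sample covariance (divides by n - 1)
    # Solve cov @ z = centered.T rather than inverting cov explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)

def chi2_cutoff_2dof(alpha):
    """Upper-tail Chi-Square cutoff for exactly 2 degrees of freedom.

    With 2 degrees of freedom the CDF is 1 - exp(-x / 2),
    so the cutoff at significance level alpha is -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

# Invented data: 20 items whose two attributes rise together,
# plus one item (the last) that breaks the pattern.
points = [(i, i + (0.5 if i % 2 else -0.5)) for i in range(1, 21)]
points.append((4, 17))

d2 = mahalanobis_squared(points)
cutoff = chi2_cutoff_2dof(0.01)  # 1% significance level
outliers = [i for i, d in enumerate(d2) if d > cutoff]

# Euclidean distance from the mean, for comparison.
arr = np.asarray(points, dtype=float)
euclid = np.linalg.norm(arr - arr.mean(axis=0), axis=1)

print("cutoff:", round(cutoff, 3))
print("flagged item indices:", outliers)
```

In this invented dataset the last item lies closer to the mean in Euclidean terms than the items at either end of the trend, yet it is the only one flagged, because the Mahalanobis distance takes the covariance between the two attributes into account.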
Who this course is for
Data Scientists and Analysts
Machine Learning Engineers
Artificial Intelligence Researchers
Software Developers
Business Analysts
Market Researchers
Healthcare Professionals
Finance Professionals
Educators and Researchers
Cybersecurity Experts
Natural Language Processing (NLP) Specialists
Students embarking on machine learning and data science careers
Product Managers
Business Improvement Experts
Quality Assurance Professionals
Social Scientists
Homepage
https://www.udemy.com/course/the-mahalanobis-distance-test-for-outliers/