GitHub Repo
https://github.com/vrajpatel9988/Human-Activity-Recognition-Using-OpenCV
vrajpatel9988/Human-Activity-Recognition-Using-OpenCV
Human-Activity-Recognition-Using-OpenCV is a Python implementation of human activity recognition built on the OpenCV computer vision library. It uses background subtraction, motion detection, and feature extraction to classify activities from input video data, with a support vector machine (SVM) as the classifier.
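The pipeline the README describes (background subtraction, motion features, SVM) can be sketched in a dependency-light way. The real project uses OpenCV; the version below substitutes median-frame background subtraction in plain NumPy, and the feature set is a hypothetical illustration, not the repository's code.

```python
# Sketch of the background-subtraction -> motion-features -> SVM idea.
# The repository uses OpenCV; this illustration uses NumPy frame differencing,
# and the three summary features are assumptions for demonstration only.
import numpy as np
from sklearn.svm import SVC

def motion_features(frames, threshold=25):
    """frames: array (n_frames, h, w) of grayscale values.
    Background-subtract against the median frame and summarize motion."""
    background = np.median(frames, axis=0)
    fg = np.abs(frames - background) > threshold   # foreground (motion) mask
    area = fg.mean(axis=(1, 2))                    # moving-pixel fraction per frame
    return np.array([area.mean(), area.std(), area.max()])

def train_classifier(clips, labels):
    """clips: list of (n_frames, h, w) arrays; labels: activity class ids."""
    X = np.array([motion_features(c) for c in clips])
    return SVC(kernel="rbf").fit(X, labels)
```

A real system would add richer shape and trajectory features per foreground blob, but the train/predict structure stays the same.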
GitHub Repo
https://github.com/Aastha2104/Parkinson-Disease-Prediction
Aastha2104/Parkinson-Disease-Prediction
Introduction
Parkinson’s Disease is the second most prevalent neurodegenerative disorder after Alzheimer’s, affecting more than 10 million people worldwide. Parkinson’s is characterized primarily by the deterioration of motor and cognitive ability. There is no single test that can be administered for diagnosis; instead, doctors must perform a careful clinical analysis of the patient’s medical history. Unfortunately, this method of diagnosis is highly inaccurate: a study from the National Institute of Neurological Disorders finds that early diagnosis (symptoms present for 5 years or less) is only 53% accurate. This is not much better than random guessing, yet an early diagnosis is critical to effective treatment. Because of these difficulties, I investigate a machine learning approach to accurately diagnosing Parkinson’s, using a dataset of various speech features (a non-invasive yet characteristic signal) from the University of Oxford.

Why speech features?
Speech is highly predictive and characteristic of Parkinson’s disease; almost every Parkinson’s patient experiences severe vocal degradation (inability to produce sustained phonations, tremor, hoarseness), so it makes sense to use voice to diagnose the disease. Voice analysis has the added benefits of being non-invasive, inexpensive, and very easy to perform clinically.

Background: Parkinson’s Disease
Parkinson’s is a progressive neurodegenerative condition resulting from the death of the dopamine-containing cells of the substantia nigra (a region that plays an important role in movement). Symptoms include "frozen" facial features, bradykinesia (slowness of movement), akinesia (impairment of voluntary movement), tremor, and voice impairment. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated and 80% of striatal dopamine has been depleted.
Performance Metrics
TP = true positive, FP = false positive, TN = true negative, FN = false negative.
Accuracy: (TP+TN)/(P+N), where P and N are the total numbers of positive and negative instances.
Matthews Correlation Coefficient (MCC): (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 1 = perfect, 0 = random, -1 = completely inaccurate.

Algorithms Employed
Logistic Regression (LR): uses the sigmoid (logistic) function with weights (coefficient values) and biases (constants) to model the probability of a class in binary classification. An output of 1 represents one class and an output of 0 the other. Training learns the optimal weights and biases.
Linear Discriminant Analysis (LDA): assumes the data are Gaussian and that each feature has the same variance. LDA estimates the mean and variance of each class from the training data, then uses Bayes' theorem and the Gaussian distribution to compute the probability that a particular instance belongs to each class. The class with the largest probability is the prediction.
k Nearest Neighbors (KNN): makes predictions about the validation set using the entire training set. KNN predicts a new instance by searching the entire training set for the k "closest" instances, where closeness is measured with a proximity metric (Euclidean distance) across all features. The class held by the majority of the k closest instances is the predicted class.
Decision Tree (DT): represented by a binary tree, where each internal node tests an input variable against a split point and each leaf node contains an output value used to make a prediction.
Neural Network (NN): loosely models the way the human brain makes decisions. Each neuron takes one or more inputs and applies an activation function to a weighted sum (weights plus a bias) to produce an output. Neurons are arranged in layers, and multiple layers form a network that can model complex decisions. Training the network uses the training instances to optimize the weights and biases.
Naive Bayes (NB): simplifies the calculation of probabilities by assuming that all features are independent of one another (a strong but often effective assumption). It applies Bayes' theorem to compute the probability that the instance to be predicted belongs to each class, then predicts the class with the highest probability.
Gradient Boosting (GB): generally used when a model with very high predictive performance is sought. It reduces bias and variance ("error") by combining multiple "weak learners" (individually poor models) into a "strong learner" (a high-performance model). It involves three elements: a loss function to be optimized, a weak learner (a decision tree) to make predictions, and an additive model that adds trees one by one, using gradient descent to minimize the loss after each addition.

Engineering Goal
Produce a machine learning model that diagnoses Parkinson's disease from various features of a patient's speech with at least 90% accuracy and/or a Matthews Correlation Coefficient of at least 0.9. Compare various algorithms and parameters to determine the best model for predicting Parkinson's.
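The seven algorithms above could be instantiated in scikit-learn roughly as follows. The README does not state the exact implementation, so the class choices and hyperparameters here are assumptions, not the author's configuration.

```python
# Sketch: the seven algorithms from the README as scikit-learn estimators.
# Hyperparameters are illustrative assumptions; the README does not give them.
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier

def make_models():
    """Return the candidate models keyed by the abbreviations used above."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": KNeighborsClassifier(n_neighbors=5),  # Euclidean distance by default
        "DT": DecisionTreeClassifier(random_state=0),
        "NN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
        "NB": GaussianNB(),
        "GB": GradientBoostingClassifier(random_state=0),
    }
```

Keeping the models behind a single factory function makes it easy to loop over all of them with the same train/evaluate code.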
Dataset Description
Source: the University of Oxford.
• 195 instances (147 subjects with Parkinson’s, 48 without).
• 22 features (elements possibly characteristic of Parkinson’s, such as frequency, pitch, and amplitude/period of the sound wave).
• 1 label (1 for Parkinson’s, 0 for no Parkinson’s).

Project Pipeline
[pipeline diagram]

Summary of Procedure
1. Split the Oxford Parkinson’s Dataset into two parts: one for training and one for validation (to evaluate how well the model performs).
2. Train each of the following algorithms on the training set: Logistic Regression, Linear Discriminant Analysis, k Nearest Neighbors, Decision Tree, Neural Network, Naive Bayes, Gradient Boost.
3. Evaluate results on the validation set.
4. Repeat for the following training/validation splits: 80%/20%, 75%/25%, and 70%/30%.
5. Repeat with a rescaled version of the dataset (all values scaled to the range 0 to 1, which helps reduce the effect of outliers).
6. Conduct 5 trials and average the results.

Data
[results figures omitted: accuracy and MCC on the original and rescaled datasets]

Data Analysis
In general, the models performed best (in both accuracy and Matthews Correlation Coefficient) on the rescaled dataset with a 75/25 train-test split. The two highest-performing algorithms, k Nearest Neighbors and the Neural Network, both achieved an accuracy of 98%. The NN achieved an MCC of 0.96, while KNN achieved an MCC of 0.94. These figures outperform most existing literature and significantly outperform current methods of diagnosis.

Conclusion and Significance
These robust results suggest that a machine learning approach can indeed significantly improve diagnosis of Parkinson’s disease. Given the necessity of early diagnosis for effective treatment, my machine learning models provide a very promising alternative to the current, rather ineffective method of diagnosis.
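The split/rescale/train/evaluate procedure above can be sketched as follows. This is a minimal illustration assuming scikit-learn, with KNN as the example model; dataset loading is omitted and the author's actual code may differ.

```python
# Sketch of the evaluation loop: split, optionally rescale to [0, 1],
# train, score with accuracy and MCC, and average over trials.
# Model choice and split handling here are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef

def evaluate(X, y, test_size=0.25, rescale=True, n_trials=5):
    """Average accuracy and MCC over n_trials random train/validation splits."""
    accs, mccs = [], []
    for trial in range(n_trials):
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=trial)
        if rescale:
            scaler = MinMaxScaler().fit(X_tr)   # fit on training data only
            X_tr, X_va = scaler.transform(X_tr), scaler.transform(X_va)
        model = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
        pred = model.predict(X_va)
        accs.append(accuracy_score(y_va, pred))
        mccs.append(matthews_corrcoef(y_va, pred))
    return float(np.mean(accs)), float(np.mean(mccs))
```

Running this for each split (test_size of 0.20, 0.25, 0.30), each model, and both the original and rescaled data reproduces the shape of the comparison described above.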
Current methods of early diagnosis are only 53% accurate, while my machine learning model achieves 98% accuracy. This 45-percentage-point increase is critical because an accurate, early diagnosis is needed to treat the disease effectively. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated and 80% of striatal dopamine has been depleted; with an earlier diagnosis, much of this degradation could be slowed or treated. My results are very significant because Parkinson’s affects over 10 million people worldwide who could benefit greatly from an early, accurate diagnosis. Not only is my machine learning approach more accurate diagnostically, it is also more scalable, less expensive, and therefore more accessible to people who might not have access to established medical facilities and professionals. The diagnosis is also much simpler, requiring only a 10-15 second voice recording and producing an immediate result.

Future Research
Given more time and resources, I would investigate the following:
• Create a mobile application that lets the user record his/her voice, extracts the necessary vocal features, and feeds them into my machine learning model to diagnose Parkinson’s.
• Use larger datasets in conjunction with the University of Oxford dataset.
• Tune and improve my models even further to achieve even better results.
• Investigate different structures and types of neural networks.
• Construct a novel algorithm specifically suited to the prediction of Parkinson’s.
• Generalize my findings and algorithms to all types of dementia disorders, such as Alzheimer’s.

References
Bind, Shubham. "A Survey of Machine Learning Based Approaches for Parkinson Disease Prediction." International Journal of Computer Science and Information Technologies 6 (2015). Web. 8 Mar. 2017.
Brooks, Megan. "Diagnosing Parkinson's Disease Still Challenging." Medscape Medical News. National Institute of Neurological Disorders, 31 July 2014. Web. 20 Mar. 2017.
Hashmi, Sumaiya F. "A Machine Learning Approach to Diagnosis of Parkinson’s Disease." Claremont Colleges Scholarship. Claremont College, 2013. Web. 10 Mar. 2017.
Karplus, Abraham. "Machine Learning Algorithms for Cancer Diagnosis." Mar. 2012. Web. 20 Mar. 2017.
Little, M. A., P. E. McSharry, S. J. Roberts, D. A. E. Costello, and I. M. Moroz. "Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection." BioMedical Engineering OnLine 6:23 (26 June 2007).
Little, Max. "Parkinsons Data Set." UCI Machine Learning Repository. University of Oxford, 26 June 2008. Web. 20 Feb. 2017.
Ozcift, Akin, and Arif Gulten. "Classifier Ensemble Construction with Rotation Forest to Improve Medical Diagnosis Performance of Machine Learning Algorithms." Computer Methods and Programs in Biomedicine 104.3 (2011): 443-51. Web. 15 Mar. 2017.
"Parkinson’s Disease Dementia." UCI MIND. 19 Oct. 2015. Web. 17 Feb. 2017.
Salvatore, C., A. Cerasa, I. Castiglioni, F. Gallivanone, A. Augimeri, M. Lopez, G. Arabia, M. Morelli, M. C. Gilardi, and A. Quattrone. "Machine Learning on Brain MRI Data for Differential Diagnosis of Parkinson's Disease and Progressive Supranuclear Palsy." Journal of Neuroscience Methods 222 (2014): 230-37. Web. 18 Mar. 2017.
Shahbakhi, Mohammad, Danial Taheri Far, and Ehsan Tahami. "Speech Analysis for Diagnosis of Parkinson’s Disease Using Genetic Algorithm and Support Vector Machine." Journal of Biomedical Science and Engineering 7.4 (2014): 147-56. Web. 2 Mar. 2017.
"Speech and Communication." Parkinson's Disease Foundation. Web. 22 Mar. 2017.
Sriram, Tarigoppula V. S., M. Venkateswara Rao, G. V. Satya Narayana, and D. S. V. G. K. Kaladhar. "Diagnosis of Parkinson Disease Using Machine Learning and Data Mining Systems from Voice Dataset." SpringerLink. Springer, Cham. Web. 17 Mar. 2017.
GitHub Repo
https://github.com/drraj/ProgAssignment3
drraj/ProgAssignment3
Programming Assignment 3: R Programming

Introduction
Download the file ProgAssignment3-data.zip containing the data for Programming Assignment 3 from the Coursera web site. Unzip the file in a directory that will serve as your working directory. When you start up R, make sure to change your working directory to the directory where you unzipped the data.

The data for this assignment come from the Hospital Compare web site (http://hospitalcompare.hhs.gov) run by the U.S. Department of Health and Human Services. The purpose of the web site is to provide data and information about the quality of care at over 4,000 Medicare-certified hospitals in the U.S. This dataset essentially covers all major U.S. hospitals. This dataset is used for a variety of purposes, including determining whether hospitals should be fined for not providing high quality care to patients (see http://goo.gl/jAXFX for some background on this particular topic). The Hospital Compare web site contains a lot of data, and we will only look at a small subset for this assignment. The zip file for this assignment contains three files:

• outcome-of-care-measures.csv: Contains information about 30-day mortality and readmission rates for heart attacks, heart failure, and pneumonia for over 4,000 hospitals.
• hospital-data.csv: Contains information about each hospital.
• Hospital_Revised_Flatfiles.pdf: Descriptions of the variables in each file (i.e., the code book).

A description of the variables in each of the files is in the included PDF file named Hospital_Revised_Flatfiles.pdf. This document contains information about many other files that are not included with this programming assignment. You will want to focus on the variables for Number 19 ("Outcome of Care Measures.csv") and Number 11 ("Hospital Data.csv"). You may find it useful to print out this document (at least the pages for Tables 19 and 11) to have next to you while you work on this assignment.
In particular, the numbers of the variables for each table indicate column indices in each table (i.e., "Hospital Name" is column 2 in the outcome-of-care-measures.csv file).

1. Plot the 30-day mortality rates for heart attack

Read the outcome data into R via the read.csv function and look at the first few rows.

> outcome <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
> head(outcome)

There are many columns in this dataset. You can see how many by typing ncol(outcome) (you can see the number of rows with the nrow function). In addition, you can see the names of each column by typing names(outcome) (the names are also in the PDF document).

To make a simple histogram of the 30-day death rates from heart attack (column 11 in the outcome dataset), run

> outcome[, 11] <- as.numeric(outcome[, 11])
> ## You may get a warning about NAs being introduced; that is okay
> hist(outcome[, 11])

Because we originally read the data in as character (by specifying colClasses = "character"), we need to coerce the column to be numeric. You may get a warning about NAs being introduced, but that is okay. There is nothing to submit for this part.

2. Finding the best hospital in a state

Write a function called best that takes two arguments: the 2-character abbreviated name of a state and an outcome name. The function reads the outcome-of-care-measures.csv file and returns a character vector with the name of the hospital that has the best (i.e., lowest) 30-day mortality for the specified outcome in that state. The hospital name is the name provided in the Hospital.Name variable. The outcomes can be one of "heart attack", "heart failure", or "pneumonia". Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.

Handling ties. If there is a tie for the best hospital for a given outcome, then the hospital names should be sorted in alphabetical order and the first hospital in that set should be chosen (i.e., if hospitals "b", "c", and "f" are tied for best, then hospital "b" should be returned).

The function should use the following template.

best <- function(state, outcome) {
        ## Read outcome data
        ## Check that state and outcome are valid
        ## Return hospital name in that state with lowest 30-day death
        ## rate
}

The function should check the validity of its arguments. If an invalid state value is passed to best, the function should throw an error via the stop function with the exact message "invalid state". If an invalid outcome value is passed to best, the function should throw an error via the stop function with the exact message "invalid outcome". Here is some sample output from the function.

> source("best.R")
> best("TX", "heart attack")
[1] "CYPRESS FAIRBANKS MEDICAL CENTER"
> best("TX", "heart failure")
[1] "FORT DUNCAN MEDICAL CENTER"
> best("MD", "heart attack")
[1] "JOHNS HOPKINS HOSPITAL, THE"
> best("MD", "pneumonia")
[1] "GREATER BALTIMORE MEDICAL CENTER"
> best("BB", "heart attack")
Error in best("BB", "heart attack") : invalid state
> best("NY", "hert attack")
Error in best("NY", "hert attack") : invalid outcome

Save your code for this function to a file named best.R. Use the submit script provided to submit your solution to this part. There are 3 tests that need to be passed for this part of the assignment.

3. Ranking hospitals by outcome in a state

Write a function called rankhospital that takes three arguments: the 2-character abbreviated name of a state (state), an outcome (outcome), and the ranking of a hospital in that state for that outcome (num). The function reads the outcome-of-care-measures.csv file and returns a character vector with the name of the hospital that has the ranking specified by the num argument. For example, the call rankhospital("MD", "heart failure", 5) would return a character vector containing the name of the hospital with the 5th lowest 30-day death rate for heart failure.
The num argument can take values "best", "worst", or an integer indicating the ranking (smaller numbers are better). If the number given by num is larger than the number of hospitals in that state, then the function should return NA. Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.

Handling ties. It may occur that multiple hospitals have the same 30-day mortality rate for a given cause of death. In those cases ties should be broken by using the hospital name. For example, in Texas ("TX"), the hospitals with the lowest 30-day mortality rates for heart failure are shown here.

> head(texas)
                        Hospital.Name Rate Rank
3935       FORT DUNCAN MEDICAL CENTER  8.1    1
4085  TOMBALL REGIONAL MEDICAL CENTER  8.5    2
4103 CYPRESS FAIRBANKS MEDICAL CENTER  8.7    3
3954           DETAR HOSPITAL NAVARRO  8.7    4
4010           METHODIST HOSPITAL,THE  8.8    5
3962  MISSION REGIONAL MEDICAL CENTER  8.8    6

Note that Cypress Fairbanks Medical Center and Detar Hospital Navarro both have the same 30-day rate (8.7). However, because Cypress comes before Detar alphabetically, Cypress is ranked number 3 in this scheme and Detar is ranked number 4. One can use the order function to sort multiple vectors in this manner (i.e., where one vector is used to break ties in another vector).

The function should use the following template.

rankhospital <- function(state, outcome, num = "best") {
        ## Read outcome data
        ## Check that state and outcome are valid
        ## Return hospital name in that state with the given rank
        ## 30-day death rate
}

The function should check the validity of its arguments. If an invalid state value is passed to rankhospital, the function should throw an error via the stop function with the exact message "invalid state". If an invalid outcome value is passed to rankhospital, the function should throw an error via the stop function with the exact message "invalid outcome". Here is some sample output from the function.
> source("rankhospital.R")
> rankhospital("TX", "heart failure", 4)
[1] "DETAR HOSPITAL NAVARRO"
> rankhospital("MD", "heart attack", "worst")
[1] "HARFORD MEMORIAL HOSPITAL"
> rankhospital("MN", "heart attack", 5000)
[1] NA

Save your code for this function to a file named rankhospital.R. Use the submit script provided to submit your solution to this part. There are 4 tests that need to be passed for this part of the assignment.

4. Ranking hospitals in all states

Write a function called rankall that takes two arguments: an outcome name (outcome) and a hospital ranking (num). The function reads the outcome-of-care-measures.csv file and returns a 2-column data frame containing the hospital in each state that has the ranking specified in num. For example, the function call rankall("heart attack", "best") would return a data frame containing the names of the hospitals that are the best in their respective states for 30-day heart attack death rates. The function should return a value for every state (some may be NA). The first column in the data frame is named hospital, which contains the hospital name, and the second column is named state, which contains the 2-character abbreviation for the state name. Hospitals that do not have data on a particular outcome should be excluded from the set of hospitals when deciding the rankings.

Handling ties. The rankall function should handle ties in the 30-day mortality rates in the same way that the rankhospital function handles ties.

The function should use the following template.

rankall <- function(outcome, num = "best") {
        ## Read outcome data
        ## Check that state and outcome are valid
        ## For each state, find the hospital of the given rank
        ## Return a data frame with the hospital names and the
        ## (abbreviated) state name
}

NOTE: For the purpose of this part of the assignment (and for efficiency), your function should NOT call the rankhospital function from the previous section.
The function should check the validity of its arguments. If an invalid outcome value is passed to rankall, the function should throw an error via the stop function with the exact message "invalid outcome". The num variable can take values "best", "worst", or an integer indicating the ranking (smaller numbers are better). If the number given by num is larger than the number of hospitals in that state, then the function should return NA. Here is some sample output from the function.

> source("rankall.R")
> head(rankall("heart attack", 20), 10)
                              hospital state
AK                                <NA>    AK
AL      D W MCMILLAN MEMORIAL HOSPITAL    AL
AR   ARKANSAS METHODIST MEDICAL CENTER    AR
AZ JOHN C LINCOLN DEER VALLEY HOSPITAL    AZ
CA               SHERMAN OAKS HOSPITAL    CA
CO            SKY RIDGE MEDICAL CENTER    CO
CT             MIDSTATE MEDICAL CENTER    CT
DC                                <NA>    DC
DE                                <NA>    DE
FL      SOUTH FLORIDA BAPTIST HOSPITAL    FL
> tail(rankall("pneumonia", "worst"), 3)
                                     hospital state
WI MAYO CLINIC HEALTH SYSTEM - NORTHLAND, INC    WI
WV                     PLATEAU MEDICAL CENTER    WV
WY           NORTH BIG HORN HOSPITAL DISTRICT    WY
> tail(rankall("heart failure"), 10)
                                                           hospital state
TN                        WELLMONT HAWKINS COUNTY MEMORIAL HOSPITAL    TN
TX                                       FORT DUNCAN MEDICAL CENTER    TX
UT VA SALT LAKE CITY HEALTHCARE - GEORGE E. WAHLEN VA MEDICAL CENTER   UT
VA                                         SENTARA POTOMAC HOSPITAL    VA
VI                           GOV JUAN F LUIS HOSPITAL & MEDICAL CTR    VI
VT                                             SPRINGFIELD HOSPITAL    VT
WA                                        HARBORVIEW MEDICAL CENTER    WA
WI                                   AURORA ST LUKES MEDICAL CENTER    WI
WV                                        FAIRMONT GENERAL HOSPITAL    WV
WY                                       CHEYENNE VA MEDICAL CENTER    WY

Save your code for this function to a file named rankall.R. Use the submit script provided to submit your solution to this part. There are 3 tests that need to be passed for this part of the assignment.
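The ranking rule shared by best, rankhospital, and rankall (drop hospitals with missing rates, sort by rate, break ties alphabetically by hospital name) can be illustrated outside R. The following Python/pandas sketch shows only the logic; the assignment itself must be solved in R, and the function and column names here are illustrative.

```python
# Sketch of the sort-then-tiebreak ranking logic, in Python/pandas for
# illustration only (the assignment requires R).
import pandas as pd

def rank_hospitals(df, num="best"):
    """df: columns 'hospital' and 'rate' for one state/outcome.
    Drop missing rates, sort by (rate, hospital), return the requested name."""
    ranked = (df.dropna(subset=["rate"])
                .sort_values(["rate", "hospital"])   # name breaks rate ties
                .reset_index(drop=True))
    if num == "best":
        idx = 0
    elif num == "worst":
        idx = len(ranked) - 1
    else:
        idx = num - 1                                # 1-based rank to 0-based index
    if idx < 0 or idx >= len(ranked):
        return None                                  # the R version returns NA
    return ranked.loc[idx, "hospital"]
```

In R the same effect is what order(rate, hospital.name) gives: the second vector breaks ties in the first.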
GitHub Repo
https://github.com/atharv6f/Feature-Subset-Selection-on-Heliophysics-Time-Series-data
atharv6f/Feature-Subset-Selection-on-Heliophysics-Time-Series-data
Solar Energetic Particles (SEPs) can be associated with solar flares and coronal mass ejections (CMEs), with energy spectra ranging from a few keV to many GeV. These events can occur without any notable warning and alter the radiation environment of the inner solar system, potentially creating hazardous conditions for humans in space, damaging sensitive spacecraft electronics, and triggering radio blackouts. Identifying the physical parameters measured by the Solar Dynamics Observatory (SDO) that are most critical for detecting SEPs would allow a swift response to these adverse effects. The SDO now provides a profusion of high-quality time series data that captures the modulating background of magnetic activity and the inherently dynamic pre-flare and post-flare phases, in contrast to the non-representative point-in-time measurements employed earlier; this makes the selection of vital parameters for solar flare classification with machine learning a well-fitted problem. The primary difficulty with multivariate time series (mvts) data is the large number of physical parameters sampled at a rapid frequency, which makes the dimensionality very high and hampers the learning process. Moreover, manually selecting vital parameters is a tedious and costly task on which experts may not always agree. In response, we examined feature subset selection using multiple algorithms, both on the mvts data and on statistical features derived from mvts segments (vectorized data). We used the SWAN-SF (Space Weather Analytics for Solar Flares) benchmark dataset, collected from May 2010 to September 2018, to conduct our experiments. This comprehensive study yields a stable scheme for recognizing the critical physical parameters, which speeds up the learning process and can serve as a blueprint for forecasting future solar flare episodes.
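The vectorized branch of the study (statistical features per mvts segment, then feature subset selection) might look roughly like the sketch below. The choice of statistics and the univariate scoring function are assumptions for illustration, not the repository's actual pipeline.

```python
# Sketch: reduce each mvts segment to per-parameter statistics, then rank
# features by univariate relevance. Statistics and scorer are assumptions.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def vectorize(segments):
    """segments: array of shape (n_samples, n_timesteps, n_params).
    Reduce each parameter's time series to mean and std statistics."""
    means = segments.mean(axis=1)
    stds = segments.std(axis=1)
    return np.hstack([means, stds])      # shape (n_samples, 2 * n_params)

def top_features(segments, labels, k=5):
    """Score vectorized features with the ANOVA F-test and return the
    indices of the k highest-scoring features; each index maps back to a
    (parameter, statistic) pair."""
    X = vectorize(segments)
    selector = SelectKBest(f_classif, k=k).fit(X, labels)
    return np.argsort(selector.scores_)[::-1][:k]
```

Aggregating such rankings across several selection algorithms is one way to obtain the stable parameter scheme the abstract describes.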
GitHub Repo
https://github.com/AnandMukhopadhyay/Tutorial_AcousticWirelessSensorNode
AnandMukhopadhyay/Tutorial_AcousticWirelessSensorNode
This tutorial illustrates the process of deriving a feature vector (FV) from acoustic signals and then training an ANN model. The data considered are human footstep sounds in the presence of forest background noise. This is applicable to Internet of Things (IoT) security-surveillance applications for detecting human intrusion into restricted zones.
GitHub Repo
https://github.com/dhvanikotak/Emotion-Detection-in-Videos