A REGRESSION-BASED MACHINE LEARNING MODEL TO ESTIMATE THE MISSING FAT-MASS PARAMETER IN THE CALERIE STUDY DATASET
Clinical trials are run at many scales around the world, and from a statistician's point of view they are important for revealing patterns in test results that would otherwise go unseen. Most of the tests, however, are administered to humans by humans, so data entry always carries an element of human error that can lead to misinterpretation. Participants are also sometimes unavailable for certain tests. One such problem is observed in the CALERIE study, where fat-mass values measured by DXA were missing on almost four occasions over two years. This missing-data problem can be addressed with data imputation: a machine learning model is trained on the existing data to estimate the missing values from the other features. For this purpose, several regression models were built to estimate fat mass. The Lasso linear regression model gave comparatively better results, with an R-squared value of 0.986 and a cross-validation score of 0.978 on the test data. Lasso performed better because it can identify the significant features in the dataset without a separate feature-selection step. The resulting model fills the missing fat-mass values in the CALERIE dataset, and its promising results support further work to understand the CALERIE study in terms of energy expenditure and the specific reactions of the human body to various parameters under various circumstances.
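The imputation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation and uses no CALERIE data: the four features, the synthetic fat-mass values, the penalty `lam=0.2`, and all function names are assumptions chosen for illustration. Lasso is fitted from scratch by coordinate descent, so that the soft-thresholding step, which drives the coefficients of uninformative features to exactly zero, is visible. The fitted model then fills the rows where fat mass is missing.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1.

    Assumes the columns of X are on comparable scales (roughly standardized).
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]  # y - intercept - X @ coef
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlate feature j with the residual that excludes its own contribution.
            rho = sum(row[j] * (r + coef[j] * row[j]) for row, r in zip(X, resid)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - coef[j]
            if delta != 0.0:
                resid = [r - delta * row[j] for row, r in zip(X, resid)]
            coef[j] = new
        # The intercept is not penalized: re-center it on the current residual.
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
    return intercept, coef


def predict(intercept, coef, row):
    return intercept + sum(c * v for c, v in zip(coef, row))


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (hypothetical): only the first two of four
# standardized features actually influence fat mass.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
true_fat = [20.0 + 2.0 * r[0] + 1.0 * r[1] + rng.gauss(0, 0.3) for r in X]

# Hide every fifth measurement to mimic missing DXA values.
fat_mass = [None if i % 5 == 0 else v for i, v in enumerate(true_fat)]

# Fit on the rows where fat mass was observed.
observed = [i for i, v in enumerate(fat_mass) if v is not None]
intercept, coef = fit_lasso(
    [X[i] for i in observed], [fat_mass[i] for i in observed], lam=0.2
)

# Impute: keep observed values, predict the missing ones.
imputed = [
    predict(intercept, coef, X[i]) if v is None else v
    for i, v in enumerate(fat_mass)
]

# Because the data are synthetic, the hidden true values can score the imputation.
missing = [i for i, v in enumerate(fat_mass) if v is None]
holdout_r2 = r_squared(
    [true_fat[i] for i in missing], [imputed[i] for i in missing]
)
print("coefficients:", coef)
print("R-squared on hidden rows:", holdout_r2)
```

In practice a library implementation with a cross-validated penalty would be the more usual choice. The sketch is written from scratch only to show where the built-in feature selection comes from. With real data the hidden true values are not available, so the model would instead be scored on a held-out portion of the observed rows.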