8th International Conference on Body Area Networks

Research Article

Pattern Recognition of Big Nutritional Data in RCT

  • @INPROCEEDINGS{10.4108/icst.bodynets.2013.253690,
        author={Jin Wang and Hua Fang and Honggang Wang and Gin-Fei Olendzki and Chonggang Wang and Yunsheng Ma},
        title={Pattern Recognition of Big Nutritional Data in RCT},
        proceedings={8th International Conference on Body Area Networks},
        keywords={big data, random controlled trial, pattern recognition, heterogeneity, simulation, dietary quality, nutritional datasets, gaussian mixture model (gmm), hidden markov random fields (hmrfs), self-organizing map-based neural networks (som), k-means, agglomerative hierarchical clustering},
        year={2013},
        doi={10.4108/icst.bodynets.2013.253690}
    }
  • Jin Wang
    Hua Fang
    Honggang Wang
    Gin-Fei Olendzki
    Chonggang Wang
    Yunsheng Ma
    Year: 2013
    Pattern Recognition of Big Nutritional Data in RCT
    DOI: 10.4108/icst.bodynets.2013.253690
Jin Wang1, Hua Fang2, Honggang Wang1,*, Gin-Fei Olendzki2, Chonggang Wang3, Yunsheng Ma2
  • 1: University of Massachusetts Dartmouth
  • 2: University of Massachusetts Medical School
  • 3: Interdigital
*Contact email: hwang1@umassd.edu


As technology develops and research environments improve, large volumes of data are collected for analysis. Unfortunately, much of this data is never fully used, and some is left untouched. In particular, big data from health and medical studies poses significant methodological challenges. This paper presents a new multi-clustering approach, with multi-validation criteria, for pattern recognition of big data in a randomized controlled trial (RCT). To demonstrate the approach, we used a nutritional dataset generated from an NIH-funded RCT for patients with metabolic syndrome. The proposed approach includes a suite of emerging and popular clustering methods: the probability-based Gaussian Mixture Model (GMM), Hidden Markov Random Fields (HMRFs), Self-Organizing Map (SOM)-based neural networks, K-means, and the agglomerative hierarchical method. Using our RCT data and the multi-validation criteria, the approach identified the most sufficient set of nutritional variables and detected distinct dietary change patterns, with universal agreement among the methods. The trajectory patterns were then generated using the method with the highest clustering accuracy, which was cross-validated via simulation. These patterns yielded new and finer-grained results for the outcomes of the RCT. Although we demonstrated more accurate and comprehensive clustering only for big nutritional data in an RCT, the approach can be generalized to big data in other research fields.
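
The abstract reports agreement among the clustering methods but does not say how that agreement was measured. As a minimal sketch, and only as an assumption, one common way to quantify agreement between two sets of cluster labels is the adjusted Rand index (ARI), which is 1 when two methods produce the same partition regardless of how the clusters are numbered. The method names and label vectors below are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    total = comb(len(labels_a), 2)  # number of item pairs
    if total == 0:
        return 1.0  # fewer than two items: trivially identical

    # Contingency counts: items in cluster i of A and cluster j of B.
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put everything in one cluster
    return (sum_pairs - expected) / (max_index - expected)


def pairwise_agreement(labelings):
    """ARI for every pair of methods; `labelings` maps method name to labels."""
    return {
        (m1, m2): adjusted_rand_index(labelings[m1], labelings[m2])
        for m1, m2 in combinations(sorted(labelings), 2)
    }


# Hypothetical cluster assignments for eight participants (not from the paper).
labelings = {
    "kmeans": [0, 0, 0, 1, 1, 1, 2, 2],
    "gmm": [1, 1, 1, 0, 0, 0, 2, 2],           # same partition, labels permuted
    "hierarchical": [0, 0, 1, 1, 1, 1, 2, 2],  # one participant moved
}
scores = pairwise_agreement(labelings)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

In this toy input the K-means and GMM labelings describe the same partition, so their ARI is 1, while the hierarchical labeling differs by one participant and scores lower. A full multi-validation study would combine such agreement scores with other criteria.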