The HMS IT Research Computing (RC) newsletter highlights advocates in our community who engage with members of RC to drive science and innovation. This month, we spotlight Dr. David Harmin, an instructor in the Neurobiology department and a member of the Greenberg Lab. In collaboration with a lab fellow, Dr. Sara Trowbridge, he studies how Autism Spectrum Disorder (ASD) diagnoses in 12,000 subjects from the Simons Foundation Powering Autism Research for Knowledge (SPARK) study correlate with their genomic variants.
The project involved processing several large tables containing Variant Call Format (VCF) files (one per chromosome). The computational pipelines required all objects to persist as uncompressed flat files while variants were extracted and collected for each individual. A complete genome was then constructed for each individual so that it could be queried for specific features of interest. The expanded files required 60 TB of storage space. For scale, a typical digital photo is about 5 MB, so 60 TB could hold about 12 million digital photos. That is a lot of data!
Dr. Harmin contacted HMS-RC to leverage the resources and consulting services available to members of the HMS community. The RC team helped him optimize data workflows, troubleshoot a cryptic bug, provision the best storage option, and obtain a temporary exception on Scratch to deploy a computationally demanding workflow. Dr. Harmin continues to collaborate with RC as the resources needed to run the workflow grow and as he migrates old data to a better-suited storage system. The combination of computational and storage resources, together with one-on-one consultations, has allowed Dr. Harmin to continue interrogating the dataset for clues on how disruption of gene regulatory sites in human genomes may contribute to ASD.
When RC asked Dr. Harmin to share his thoughts about data management, he said, “I'm much more careful now about pre-testing small versions of large jobs to anticipate and plan for their memory and time usage on the cluster.”
We also asked Dr. Harmin what advice he would give someone who is just getting started and has never used RC services before. He responded, “Take your courses; know how to file a help request and feel free [to] use it -- RC's responsiveness is reliable.”