I had quite a lesson today, and learnt some critical steps for working on a data project. How should one really approach a data project? (I assume we already have a dataset.)

  • Know your Data.
    • Spend a week getting to know your data.
    • Ask Questions:
      • What is the source of my dataset?
      • Why was the data measured in the first place?
      • Why is the dataset important?
      • What are the different features of my dataset, and what do they mean?
      • How accurate is the data? Are there features that could contain faulty observations?
    • You will be tempted to start your analysis right away, but it will all be a waste if you don’t do the above.
  • Set up aims and objectives for your analysis. What do you want to achieve? What are you interested in knowing from the data?
    • If you are working in a specific domain, ask the domain-specific question first.
    • For example: I have started working in MATH-BIO, so the first question I should ask is: what biological question can this dataset be used to answer? Is there a lengthy wet-lab process that could be simplified using the dataset?
    • Fix the computational question next. Are you clustering points, classifying them, or doing something else?
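
To make the "know your data" step a bit more concrete, here is a minimal sketch of a first pass over a dataset: count missing values per feature, look at the range of each feature, and flag observations that look faulty. Everything in it is an assumption made for illustration: the toy dataset, the column names (`sample_id`, `gene_expression`, `cell_count`), the helper functions `profile` and `flag_negative`, and the rule that a negative count is a faulty observation. It only uses the Python standard library, and it is no substitute for actually asking where the data came from.

```python
import csv
import io
from statistics import mean

# Hypothetical toy dataset, made up purely for illustration.
RAW = """sample_id,gene_expression,cell_count
s1,2.5,100
s2,,120
s3,3.1,-5
s4,2.9,110
"""


def profile(rows, numeric_columns):
    """Summarise each numeric column: missing count, min, max and mean."""
    summary = {}
    for col in numeric_columns:
        # Empty strings are treated as missing values (an assumption).
        values = [float(r[col]) for r in rows if r[col] != ""]
        summary[col] = {
            "missing": sum(1 for r in rows if r[col] == ""),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
    return summary


def flag_negative(rows, column):
    """Return the ids of rows whose value in `column` is negative.

    Treating negative values as faulty is an assumed rule; the real rule
    has to come from knowing how the data was measured.
    """
    return [
        r["sample_id"]
        for r in rows
        if r[column] != "" and float(r[column]) < 0
    ]


rows = list(csv.DictReader(io.StringIO(RAW)))
summary = profile(rows, ["gene_expression", "cell_count"])
suspect = flag_negative(rows, "cell_count")

for col, stats in summary.items():
    print(col, stats)
print("suspect rows:", suspect)
```

A summary like this will not tell you why the data was measured or why it matters, but it does surface the missing and implausible values that you should ask about before starting any analysis.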