I had quite a lesson today and learnt some critical steps for working on a data project. How should one really approach a data project? (We assume we already have a dataset.)
- Know your Data.
- Spend a week getting to know your data.
- Ask Questions:
- What is the source of my dataset?
- Why was the data measured in the first place?
- Why is the dataset important?
- What are the different features of my dataset and what do they mean?
- How accurate is my data? Are there features that may contain faulty observations?
- You may be tempted to start your analysis right away, but it will all be a waste if you don’t do the above.
- Set up aims and objectives for your analysis. What do you want to achieve? What are you interested in knowing from the data?
- If you are working in a specific domain, ask the domain-specific question first.
- For example: I have started working in MATH-BIO, so the first question I should ask is what biological question this dataset can be used to answer. Is there any lengthy wet-lab process that can be simplified using the dataset?
- Fix the computational question next. Are you looking at clustering points, classifying them, or something else?
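
To make the "know your data" questions a bit more concrete, here is a minimal sketch of a first look at a dataset using only the Python standard library. It counts missing values, distinct values, and the numeric range for each feature. The function name `profile_columns`, the toy CSV, and its column names are all made up for illustration; a real dataset would be read from a file and would likely need its own rules for what counts as missing.

```python
import csv
import io


def profile_columns(rows):
    """Return a per-feature summary: missing count, distinct values, numeric range."""
    summary = {}
    if not rows:
        return summary
    for name in rows[0].keys():
        values = [row.get(name) for row in rows]
        # Treat None and blank strings as missing observations (an assumption).
        present = [v for v in values if v is not None and str(v).strip() != ""]
        numeric = []
        for v in present:
            try:
                numeric.append(float(v))
            except (TypeError, ValueError):
                pass  # not a number; leave it out of the numeric range
        summary[name] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "non_numeric": len(present) - len(numeric),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return summary


# Hypothetical toy data standing in for a real file; the column names are made up.
SAMPLE_CSV = """sample_id,expression,tissue
s1,2.5,liver
s2,,liver
s3,-999,kidney
s4,3.1,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_columns(rows)
for name, stats in report.items():
    print(name, stats)
```

In this toy data the `expression` feature has one blank entry and a minimum of -999, which looks more like a placeholder than a real measurement. That is the kind of possibly faulty observation worth asking the data's source about before starting any analysis.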