# Identify variables in your statistical study

One of the things we have learnt is that statistics is the technology of extracting information from data, and in this mini project you will do exactly that. Given the current circumstances of the pandemic, you may collect raw data from the internet on a topic of your interest.

Please do not complete the project piecemeal; instead, take a holistic approach in order to maximize your grade.

Part 1:

Formulate a research question that would allow you to design an experiment to answer it. Choose a question on a topic that interests you or that you understand well. When collecting data, keep in mind that you need a sample of at least 50 observations. (CP2)
Identify the variables in your statistical study. (CP2)
Identify the population and the type of data, and determine the level of measurement of each variable. (CP2)
Discuss the possible ethical, subject-confidentiality, and privacy issues that could have been considered when the raw data was collected. (CR2)
Describe a sampling technique that could have been used to collect the data. If you were in charge of collecting the raw data, which sampling technique would you prefer, and why? (CP2)
How could this study result in undercoverage? (CR1)
If it were possible to use simulation, would it be appropriate? Explain why or why not. (CR1)
If you were told that there is a non-sampling error in your study, explain what that means and give a specific example of how it could have happened. (CP2)
Design a completely randomized experiment as well as a randomized block experiment. (CP1)
Discuss possible confounding and lurking variables. (CR2)
Suppose your study were sponsored by an individual or a company. How and why could that affect the findings? (CR1)
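
The sampling and design tasks above can be illustrated with a minimal Python sketch. It assumes a hypothetical population of 200 numbered subjects split into four hypothetical age groups; the population, the `age_group` function, and the two treatment labels are illustrative assumptions, not part of the assignment. It draws simple random, systematic, and stratified samples, then assigns the sampled subjects to treatments under a completely randomized design and under a randomized block design.

```python
import random
from collections import Counter, defaultdict

def simple_random_sample(population, n, rng):
    """Simple random sample: every subset of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Systematic sample: random start, then every k-th unit, k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sample: a simple random sample of fixed size from each stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

def completely_randomized(subjects, treatments, rng):
    """Completely randomized design: shuffle all subjects, then deal them
    out to the treatments in turn so group sizes differ by at most one."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

def randomized_block(subjects, block_of, treatments, rng):
    """Randomized block design: group subjects into blocks first, then
    randomize treatments separately inside each block."""
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomized(members, treatments, rng))
    return assignment

# ASSUMPTION: a hypothetical population of 200 numbered subjects.
population = list(range(1, 201))

def age_group(subject):
    """ASSUMPTION: hypothetical blocking/stratifying variable, 4 groups of 50."""
    return (subject - 1) // 50

rng = random.Random(42)  # fixed seed so the sketch is reproducible
srs = simple_random_sample(population, 50, rng)
sys_sample = systematic_sample(population, 50, rng)
strat_sample = stratified_sample(population, age_group, 13, rng)

treatments = ["control", "treatment"]  # hypothetical two-level factor
subjects = srs  # the 50 sampled subjects become the experimental units
crd = completely_randomized(subjects, treatments, rng)
rbd = randomized_block(subjects, age_group, treatments, rng)

print("CRD group sizes:", Counter(crd.values()))
for block in sorted({age_group(s) for s in subjects}):
    members = [s for s in subjects if age_group(s) == block]
    print(f"block {block}:", Counter(rbd[s] for s in members))
```

Randomizing inside each block makes each treatment appear in (nearly) equal numbers within every age group, which helps keep the blocking variable from being confounded with the treatment.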


Part 2:

Organize the raw data using a frequency table. (CP1)
Construct a frequency histogram and a relative frequency histogram, as well as an ogive. Comment on the distribution of the data and interpret the graphs. (PS1)
Compute the mean, mode, and median, and explain what each represents. Which one would you pick to represent the sample? Is there a significant difference between the mean and the trimmed mean? Using Chebyshev's theorem, find the minimum percentage of data that lies within 1 standard deviation of the mean and the maximum percentage that lies more than 4 standard deviations from the mean. (PS2)
Finally, make a box-and-whisker plot and describe the spread of the data. (CP2)
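
The Part 2 computations can be sketched in Python using only the standard library. This is a sketch under stated assumptions: the 50 values are simulated placeholders (replace them with your own raw data), the table uses seven equal-width classes (a hypothetical choice), and the trimmed mean cuts 5% from each end. Chebyshev's theorem only gives bounds: at least 1 - 1/k² of any data set lies within k standard deviations of the mean, which guarantees nothing for k = 1 and caps the share beyond 4 standard deviations at 1/16 = 6.25%.

```python
import random
import statistics

def frequency_table(data, num_classes):
    """Equal-width classes. Each row: (lower, upper, freq, rel_freq, cum_rel_freq).
    Classes are [lower, upper) except the last, which also includes the maximum."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_classes
    n = len(data)
    rows, cum = [], 0
    for i in range(num_classes):
        lower, upper = lo + i * width, lo + (i + 1) * width
        last = i == num_classes - 1
        freq = sum(1 for x in data
                   if lower <= x < upper or (last and lower <= x <= hi))
        cum += freq
        rows.append((lower, upper, freq, freq / n, cum / n))
    return rows

def trimmed_mean(data, proportion=0.05):
    """Drop `proportion` of the values from EACH end, then average the rest."""
    xs = sorted(data)
    k = int(len(xs) * proportion)
    return statistics.mean(xs[k:len(xs) - k])

def chebyshev_min_within(k):
    """At least 1 - 1/k**2 of the data lies within k SDs; 0 (no guarantee) for k <= 1."""
    return max(0.0, 1 - 1 / k ** 2)

def five_number_summary(data):
    # statistics.quantiles uses the 'exclusive' method by default; textbooks
    # and calculators may compute quartiles slightly differently.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

# ASSUMPTION: simulated placeholder data standing in for your own raw data.
rng = random.Random(7)
data = [round(rng.gauss(70, 10)) for _ in range(50)]

table = frequency_table(data, num_classes=7)
for lower, upper, f, rf, crf in table:
    print(f"{lower:6.1f}-{upper:6.1f}  f={f:2d}  rf={rf:.2f}  cum={crf:.2f}")

mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)  # a list, since data can have several modes
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
t_mean = trimmed_mean(data, 0.05)
print(f"mean={mean:.2f} median={median} mode(s)={modes} 5% trimmed mean={t_mean:.2f}")

# Compare Chebyshev's guaranteed bounds with what this sample actually shows.
within_1 = sum(abs(x - mean) <= sd for x in data) / len(data)
outside_4 = sum(abs(x - mean) > 4 * sd for x in data) / len(data)
print(f"at least {chebyshev_min_within(1):.0%} within 1 SD (observed {within_1:.0%})")
print(f"at most {1 - chebyshev_min_within(4):.2%} beyond 4 SD (observed {outside_4:.0%})")

mn, q1, q2, q3, mx = five_number_summary(data)
iqr = q3 - q1
print(f"five-number summary: {mn}, {q1}, {q2}, {q3}, {mx}; IQR={iqr}")
print(f"outlier fences: {q1 - 1.5 * iqr} and {q3 + 1.5 * iqr}")
```

The ogive is the cumulative relative frequency (last column of the table) plotted against the upper class limits. A plotting library such as matplotlib (`pyplot.hist`, `pyplot.boxplot`) could draw the histograms and the box plot from the same `data` list.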

