You’re probably wondering when and where you’ll use such a broad topic. You may be surprised to learn that you already do: every time you use the internet or read a newspaper, you come across statistics about real estate, crime, sports, education and so on. You are given information from a sample, on the basis of which you judge how much truth the data holds and try to decide whether a statement is a fact or merely a claim. This is where statistics comes into play. Statistical methods help us make the “best educated guess”. They give us techniques to analyse critically the data we are presented with every day.
Statistics is a multi-faceted subject with applications in various fields, including medical science and sociology. It is often considered a branch of mathematics that deals with charts, figures and the organisation of data. According to Agresti & Finlay (1997),
“Statistics consists of a body of methods for collecting and analysing data”.
In general, we can say that a methodology of collecting, analysing and interpreting data is adopted for the purpose of drawing concrete conclusions from it.
All activities involving any form of collection, processing, interpretation and presentation of information find their roots in statistics.
In the following example we can see how statistics are used in various fields:
Two very basic ideas in statistics are the population and the sample.
A population is the collection of people, things or objects that a statistician, or for that matter any researcher, is interested in studying.
Often a census, in which the required measurements are collected for every member of the population, is not feasible, so only a part of the population is observed for the purpose of the study. This subset of the population is called a sample. A sample is often collected through surveys or through various experimental designs. Observational studies, such as cohort studies, supply data for the descriptive statistics that we’ll discuss below.
Weiss (1999) defines a population as the collection of all the people or things under consideration in a statistical study. Alternatively, when the purpose is to make inferences, the population can be viewed as the set of values, generally quantitative in nature, measured on those people or things.
Weiss (1999) again defines a Sample as “that part of a population which is observed and from which the information is collected”. Most studies require only a few features of the people under observation to be studied at the same time.
The target population is always established during an investigation, but it is the sample drawn from that population that is actually studied. Populations can be further divided into different kinds, and two distinctions are well established. The first is the finite population: a population whose members physically exist, such as people or things. Examples: the children in a school, the computers in a classroom. The second is the hypothetical population: here the population can be abstract and may arise from the “phenomenon under consideration”. Example: an automobile manufacturer building a car. If the same infrastructure, materials and mode of production are used in the future, the cars already built can be used to make inferences about the mobility and quality of the new batch. Here, the new cars are considered a hypothetical population.
The type of data being collected can also differ. For example, the psychologist Stanley Smith Stevens defined four scales of measurement: nominal, ordinal, interval and ratio. The categorical scales do not support numerical measurement. On the nominal scale the categories are mutually exclusive and their order has no meaning, as with names, gender and pass/fail. On the ordinal scale the order does have meaning, as with motivation level or university marks. The quantitative scales involve discrete or continuous numerical values. On the interval scale both the order and the distance between values have meaning, but there is no true zero; examples are dates and temperature in Fahrenheit. On the ratio scale there is also a true zero, so ratios of values have meaning as well, as with height and weight.
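The four scales can be sketched in code as a rough guide to which summaries make sense for each one. This is only an illustration: the `Scale` enum and the `meaningful_summaries` function are hypothetical names introduced here, and the list of summaries follows the usual textbook reading of Stevens' scales rather than anything stated above.

```python
from enum import Enum


class Scale(Enum):
    """Stevens' four scales, ordered from least to most structure."""
    NOMINAL = 1   # categories only, e.g. names, pass/fail
    ORDINAL = 2   # ordered categories, e.g. motivation level
    INTERVAL = 3  # meaningful distances, e.g. temperature in Fahrenheit
    RATIO = 4     # meaningful distances and a true zero, e.g. height


def meaningful_summaries(scale):
    """Return the summaries that are usually considered meaningful
    for data measured on the given scale (hypothetical helper)."""
    summaries = ["mode"]  # counting categories works on every scale
    if scale.value >= Scale.ORDINAL.value:
        summaries.append("median")  # needs a meaningful order
    if scale.value >= Scale.INTERVAL.value:
        summaries.append("mean")  # needs meaningful distances
    if scale is Scale.RATIO:
        summaries.append("ratio of two values")  # needs a true zero
    return summaries


for scale in Scale:
    print(scale.name, "->", meaningful_summaries(scale))
```

Each scale inherits every summary allowed on the scale before it, which is why the function builds the list cumulatively.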
The vast field of statistics has two important branches, namely descriptive and inferential statistics.
Descriptive statistics is largely self-explanatory: it is the branch concerned with describing data for the purpose of summarising it. Weiss officially defines Descriptive Statistics as a “set of methods used for organising and summarizing of information” (Weiss, 1999). This branch deals with the construction of various charts, graphs, tables etc., and also includes descriptive measures such as percentiles and averages.
Inferential statistics is similarly self-explanatory: it is the branch that makes inferences about the population by observing the data collected from a small sample. Weiss officially defines Inferential statistics as a “set of methods used for drawing and measuring the reliability of conclusions about population based on information obtained from a sample of the population.” (Weiss, 1999). This branch includes various methods based on probability theory, such as point estimation, interval estimation and the testing of hypotheses.
Example: Suppose one rolls a die a number of times, say 50 times. The numbers that turn up on each roll form the data for a sample.
In this case, descriptive statistics can be used to construct a table of the outcomes and their frequencies (the number of times each number turned up). The outcomes can be summarised further with measures such as the mean, median and mode. Inferential statistics can then be used to assess whether the die is fair or not. We can conclude that the two types of statistics are interconnected. In most cases both descriptive and inferential methods are required for a thorough analysis of the study being conducted. Descriptive statistics is needed to process and organise the data. The initial analysis using descriptive statistics often shows us the appropriate method to use in inferential statistics.
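The die example can be sketched in a few lines of Python. The text does not say which test to use for fairness, so the chi-square goodness-of-fit statistic below is one common choice, not the only one. The simulated rolls, the seed, the 5% significance level and the function names are all assumptions made for illustration.

```python
import random
import statistics
from collections import Counter


def summarise_rolls(rolls):
    """Descriptive step: frequency table plus mean, median and mode."""
    counts = Counter(rolls)
    return {
        "frequencies": {face: counts.get(face, 0) for face in range(1, 7)},
        "mean": statistics.mean(rolls),
        "median": statistics.median(rolls),
        "mode": statistics.mode(rolls),
    }


def chi_square_statistic(frequencies, n_rolls):
    """Inferential step: compare observed counts with the counts a fair
    die would be expected to give (n_rolls / 6 for each face)."""
    expected = n_rolls / 6
    return sum((obs - expected) ** 2 / expected for obs in frequencies.values())


# Critical value of the chi-square distribution with 5 degrees of freedom
# at the 5% significance level (an assumed choice of level).
CRITICAL_VALUE = 11.07

rng = random.Random(0)  # fixed seed so the simulated sample is repeatable
rolls = [rng.randint(1, 6) for _ in range(50)]

summary = summarise_rolls(rolls)
chi2 = chi_square_statistic(summary["frequencies"], len(rolls))

print("Frequencies:", summary["frequencies"])
print("Mean:", summary["mean"], "Median:", summary["median"], "Mode:", summary["mode"])
print("Chi-square statistic:", round(chi2, 2))
print("Evidence against fairness at 5%:", chi2 > CRITICAL_VALUE)
```

The first function is purely descriptive, while the second uses the frequency table it produces to draw a conclusion about the die, which mirrors the way the two branches depend on each other.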
In some rare cases, information can be collected from the whole population, so a descriptive study can be performed on a population as well as on a sample. A study only becomes inferential when an inference is made about the population on the basis of the data obtained from the sample.
More often than not, the population under observation can be summarised numerically through various parameters. However, the true parameters are usually unknown, and the purpose of the entire study can be to investigate them. This is when sample statistics are used to make inferences about them.
A parameter can be defined as an “unknown numerical summary of the population”. “A statistic is a known numerical summary of the sample which can be used to make inference about parameters” (Agresti & Finlay, 1997).
In this case, known sample statistics are used to make inferences about unknown parameters, since the main interest of most studies lies in the parameters and not in the collected data or the statistics computed from them. The sample and its statistics are significant only because of what they tell us about the unknown parameters.
As an example, consider researching what percentage of teenagers between the ages of 14 and 17 drink Red Bull. Here the parameter is the proportion r of all 14-17 year-olds who drink Red Bull at least twice a week. The statistic is the proportion r̂ of 14-17 year-olds who drink Red Bull at least twice a week, measured from a sample of 14-17 year-olds.
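The difference between the parameter r and the statistic r̂ can be illustrated with a small simulation. Everything numeric here is invented for the sketch: the population of 10,000 teenagers, the "true" proportion of 0.30, the sample size of 200 and the seed are assumptions, not real survey data. In a real study r would be unknown and only r̂ would be available.

```python
import random

# Hypothetical population; the true proportion is made up for illustration.
TRUE_R = 0.30
POPULATION_SIZE = 10_000

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# True marks a teenager who drinks Red Bull at least twice a week.
n_drinkers = round(TRUE_R * POPULATION_SIZE)
population = [True] * n_drinkers + [False] * (POPULATION_SIZE - n_drinkers)


def sample_proportion(population, sample_size, rng):
    """Draw a simple random sample without replacement and return
    the proportion of True values in it (the statistic r_hat)."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# Parameter: computed from the whole population (normally impossible).
r = sum(population) / len(population)

# Statistic: computed from a sample of 200 teenagers.
r_hat = sample_proportion(population, 200, rng)

print(f"parameter r     = {r:.3f}")
print(f"statistic r_hat = {r_hat:.3f}")
```

Running the sketch with different seeds gives different values of r̂ while r stays fixed, which is the sense in which a statistic is known but variable and a parameter is fixed but unknown.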
In applied statistics, data analysis is the process of transforming seemingly insignificant data into useful information by drawing conclusions and inferences for the purpose of decision-making. It involves two types of analysis: Exploratory Data Analysis (EDA), a form of descriptive statistics in which new features of the data are discovered, and Confirmatory Data Analysis (CDA), which focuses on confirming or refuting existing hypotheses. The process begins with the formulation of the research problem. A clear population and sample are then defined for collecting the data. The data then goes through descriptive analysis, after which an appropriate statistical method is used to solve the research problem. Finally, the results are reported, which marks the end of the Data Analysis Cycle.
From the above, it should be clear that statistics is much more than the tabulation of numbers and the graphical presentation of those tabulated numbers. It not only makes life easier but also equips us to make informed decisions. Statistics is the science of gaining information from numerical and categorical data.