Regression analysis is a statistical modelling method used in fields such as finance and investing to examine the relationship between a dependent variable and one or more independent variables. It is a form of predictive analysis.
The basic idea is to understand how the dependent variable changes when the independent variable varies. For two variables, the model is based on a linear equation.
With one independent variable, this linear equation takes the form:
y = a + bx
where a and b are constants. Here x is the independent variable and y the dependent variable; substituting a value for x gives the corresponding value of y.
Examples:
y = 4 + 3x
y = 2 + 0.5x
Each of these equations can be represented graphically as a straight line.
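As a quick illustration, here is a minimal Python sketch (our own addition, not part of the original example) that evaluates the two lines above for a few values of x:

```python
# Evaluate a line y = a + b*x at a given x.
def line(a, b, x):
    return a + b * x

for x in [0, 1, 2, 3]:
    print(f"x={x}: first line y={line(4, 3, x)}, second line y={line(2, 0.5, x)}")
```

Plotting these (x, y) pairs for each equation would trace out the two straight lines.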
Observed data, however, hardly ever falls exactly on a line; we have to settle for the line that approximates the data as well as possible. This is depicted in the example below.
Example: In a class of 11 students, the data set records x as the marks obtained out of 80 in Statistics and y as the exam score out of 200 in English. We want to be able to predict the English score of a student when only the Statistics score is known:
X (Statistics, out of 80) | Y (English, out of 200) |
65 | 175 |
67 | 133 |
71 | 185 |
71 | 163 |
66 | 126 |
75 | 198 |
67 | 153 |
70 | 163 |
71 | 159 |
69 | 151 |
69 | 159 |
Here the Statistics score x is the independent variable and the English score y is the dependent variable. We fit a line through the points that best represents the data; we use the least-squares regression line to obtain this.
This process of fitting the line that best represents the data set is called linear regression.
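As a sketch of how this least-squares line can be computed, the following Python snippet (the use of numpy and the variable names are our own choices) fits a line to the table above and predicts an English score:

```python
import numpy as np

# Statistics marks (x, out of 80) and English scores (y, out of 200).
x = np.array([65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69])
y = np.array([175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159])

# Least-squares fit of y = a + b*x; polyfit returns [b, a] for degree 1.
b, a = np.polyfit(x, y, 1)
print(f"fitted line: y = {a:.2f} + {b:.2f}x")

# Predict the English score for a student who scored 68 in Statistics.
print("predicted English score:", round(a + b * 68, 1))
```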
Regression analysis is used for a variety of reasons, the most important being trend forecasting. It is also used to determine the strength of predictor variables and to forecast an effect.
Multiple linear regression has a single dependent variable along with two or more independent variables.
Logistic regression also has one dependent variable and two or more independent variables.
Ordinal and multinomial regression both have one dependent variable along with one independent variable.
Regression analysis predicts the value of the y variable when the value of x is given. When the prediction is made within the interval of the observed data, it is known as interpolation; when it is made outside that interval, it is called extrapolation. Regression also has limitations: linear regression cannot capture certain relationships, such as quadratic ones.
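To make the interpolation/extrapolation distinction concrete, here is a small sketch reusing the fit from the previous example (the query points 68 and 90 are our own choices; the observed Statistics scores range from 65 to 75):

```python
import numpy as np

x = np.array([65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69])
y = np.array([175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159])
b, a = np.polyfit(x, y, 1)

# x = 68 lies inside the observed range [65, 75]: interpolation.
print("interpolation at x = 68:", a + b * 68)
# x = 90 lies outside the observed range: extrapolation, far less reliable.
print("extrapolation at x = 90:", a + b * 90)
```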
In statistics and probability theory, covariance measures the joint variability of two random variables. If the two variables show similar behaviour, i.e. larger values of the first variable correspond to larger values of the second, the covariance is positive. If they show opposite behaviour, i.e. larger values of the first variable correspond to smaller values of the second, the covariance is negative. Covariance thus indicates the tendency of a linear relationship between the first and second variables.
For two jointly distributed random variables, the covariance is defined as the expected value of the product of their deviations from their respective expected values:
Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
Here E[X] denotes the expected value of X, also called the mean of X. Using the linearity of expectation, this definition can be simplified to the expected value of the product minus the product of the expected values:
Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
= E[XY - X E[Y] - E[X] Y + E[X] E[Y]]
= E[XY] - E[X] E[Y] - E[X] E[Y] + E[X] E[Y]
= E[XY] - E[X] E[Y]
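This identity is easy to check numerically; a minimal sketch (the simulated data is our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)  # y is correlated with x

# Definition: E[(X - E[X])(Y - E[Y])]
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
# Simplified form: E[XY] - E[X]E[Y]
cov_simple = np.mean(x * y) - x.mean() * y.mean()
print(cov_def, cov_simple)  # identical up to floating-point error
```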
Covariance has several properties:
1) When the two variables are identical, i.e. they always take the same value, the covariance reduces to the variance.
2) For real-valued random variables X, Y, V, W and constants a, b, c, d, the following hold:
Cov(X, a) = 0
Cov(X, X) = Var(X)
Cov(X, Y) = Cov(Y, X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(aX + bY, cW + dV) = ac Cov(X, W) + ad Cov(X, V) + bc Cov(Y, W) + bd Cov(Y, V)
3) Uncorrelatedness and independence: if two variables are independent, their covariance is zero, because in that case
E[XY] = E[X] E[Y]
However, the converse does not hold. Consider a scenario where X is uniformly distributed on [-1, 1] and Y = X².
Here the two variables are clearly dependent, but we get:
Cov(X, Y) = Cov(X, X²)
= E[X · X²] - E[X] E[X²]
= E[X³] - E[X] E[X²]
= 0 - 0 · E[X²]
= 0
Because the relationship between the two variables is non-linear, covariance and correlation fail to detect it, since they measure only linear dependence. So the fact that two variables are uncorrelated does not mean that they are independent. If X and Y are jointly normally distributed, however, uncorrelatedness does imply independence. This is not true when the two variables are merely individually normally distributed.
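This uncorrelated-but-dependent case is easy to reproduce by simulation; a short sketch (the sample size is our own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=1_000_000)
y = x ** 2  # Y is fully determined by X, hence dependent on it

# The sample covariance is close to zero despite the dependence.
print("cov(X, X^2) ≈", np.mean(x * y) - x.mean() * y.mean())
```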
Conclusion: Covariance is used in numerous fields today. Molecular biology, for example, uses it to study DNA sequences and the relationships between closely related species. Covariance is also a very important tool in financial economics, where it is used to estimate the returns of an asset in the capital asset pricing model.