A researcher investigating the association between two variables collected some data and was surprised when he calculated the correlation. He had expected to find a fairly strong association, yet the correlation was near 0. Discouraged, he didn’t bother making a scatterplot. Explain to him how the scatterplot could still reveal the strong association he anticipated.

The correlation coefficient measures only the strength of the linear relationship between two variables. If the two variables are related in some other way (e.g., a quadratic, logarithmic, or seasonal pattern), then the correlation may be near zero even though the association is strong. A scatterplot would show such a pattern directly.
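To make the point above concrete, here is a minimal sketch using made-up data in which y is an exact quadratic function of x. The data values and the helper name `pearson_r` are assumptions chosen for illustration; they do not come from the exercise.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is completely determined by x, but not linearly.
xs = list(range(-5, 6))        # -5, -4, ..., 5
ys = [x ** 2 for x in xs]      # a perfect parabola

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")          # essentially zero despite the exact relationship
```

A scatterplot of these points would show a clear U shape, which is the kind of strong association the correlation coefficient cannot detect.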
For example, for the time series data below, the correlation is approximately -0.16, which is near zero. That does not mean the two variables are unassociated: there is a seasonal (time series) pattern between them, and the scatterplot reveals a strong association.

A friend of the instructor recorded the amount of ice cream sold (in tons, with data obtained from the local Chamber of Commerce) in an ocean resort for each of 25 consecutive seasons. A season runs from Memorial Day to Labor Day. She also recorded the number of drownings at the resort for each of the same 25 seasons.
A scatterplot of drownings against the amount of ice cream sold is shown below. There is quite a strong, positive relationship between these two variables. Does this suggest that people are ignoring their mother’s advice about going in the water immediately after eating (ice cream)? Explain.

The data show quite a strong, positive relationship between the amount of ice cream sold and the number of drownings. However, an association alone does not show that eating ice cream causes drownings, so it does not suggest that people are ignoring their mother’s advice about going in the water immediately after eating (ice cream).
A more plausible explanation is a lurking variable: the number of people at the resort may have increased over the 25 seasons, and a season with more people has both more ice cream eaten and more drownings.

If women always married men who were two years older than themselves, what would be the correlation between the ages of husband and wife?

The correlation between the ages of husband and wife would be +1 (a perfect positive linear relationship), because every point satisfies husband's age = wife's age + 2 and so lies exactly on a line with positive slope.

A study shows that there is a moderate, positive correlation between the size of a hospital (measured by its number of beds, X) and the median number of days, Y, that patients remain in the hospital.
Does this mean that you can shorten a hospital stay by choosing a small hospital? Why?

No. Correlation does not imply causation, and patients are usually matched to a hospital according to their condition. In general, small hospitals treat routine problems and bigger hospitals treat more severe ones, so patients tend to stay fewer days in smaller hospitals than in bigger ones regardless of the hospital's size as such. Further, the median number of days, Y, that patients remain in the hospital gives only the middle value of the data.
The mean number of days that patients remain in the hospital may be quite different (higher or lower).

A correlation analysis describes the association between two variables, whereas a regression analysis is used to predict a response (dependent) variable from an explanatory (independent) variable, assuming there is a linear relationship between the two. Moreover, correlation analysis is typically the first step: it checks whether there is any association between the two variables, and regression analysis is then performed if a correlation is present.
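The distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six data points and the function name `correlation_and_line` are hypothetical, and the fit is the ordinary least squares line for one explanatory variable.

```python
import math

def correlation_and_line(xs, ys):
    """Return (r, slope, intercept) for simple linear regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)          # correlation: strength of linear association
    slope = sxy / sxx                       # regression: least squares slope
    intercept = mean_y - slope * mean_x     # regression: least squares intercept
    return r, slope, intercept

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

r, slope, intercept = correlation_and_line(xs, ys)

# R-Sq: the fraction of variation in y explained by the fitted line.
mean_y = sum(ys) / len(ys)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r_sq = 1 - sse / sst

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"R-Sq = {r_sq:.3f}, r squared = {r ** 2:.3f}")
```

With a single explanatory variable, the R-Sq value reported by regression software equals the square of the correlation coefficient, which is why the two printed values agree.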
The gestational age (in weeks) and the infant birth weight (in grams) were recorded for 100 low birth weight infants born in Boston. The scatterplot of weight against gestational age is shown below together with some relevant computer output.

The regression equation is birthwt = -932 + 70.3 gestage
S = 203.884   R-Sq = 43.6%

(a) Use the equation of the least squares line to predict the infant birth weight for an infant born after a gestation period of 30 weeks.

birthwt = -932 + 70.3 gestage = -932 + 70.3*30 = 1177 grams

(b) Interpret the slope of the line in this case (in terms of weight and age).
Each additional week of gestational age is associated with an increase of about 70.3 grams in the predicted infant birth weight.

(c) Interpret the value for R-Sq in this case (in terms of weight and age).

Approximately 43.6% of the variation in infant birth weight is explained by the linear relationship with gestational age.

A pediatrician is interested in the relationship between the ages of children and hours of sleep. She asks the parents of 15 children, varying in age from 3 to 11 years old, to record how long their child slept on each of 10 consecutive school nights. The scatterplot below shows the average hours of sleep against age for the 15 children.
Also shown is some relevant computer output.

The regression equation is sleep = 12.7 - 0.380 age
S = 0.599702   R-Sq = 76.9%

(a) Interpret the slope of the least squares line in this case.

Each additional year of a child's age is associated with a decrease of about 0.38 hours in the predicted average hours of sleep.

(b) Interpret the value for R-Sq in this case.

Approximately 76.9% of the variation in hours of sleep is explained by the linear relationship with the child's age.

(c) Suppose that Adam is four years older than Alice. Predict how much longer Alice will sleep compared to Adam.

Alice is predicted to sleep 1.52 (= 4*0.38) hours longer than Adam.
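The arithmetic in the two regression exercises above can be checked with a short sketch. The coefficients come from the computer output quoted in the exercises; the function names and the specific ages chosen for Alice and Adam are assumptions for illustration (only their four-year difference matters).

```python
def predict_birthwt(gestage_weeks):
    """Predicted birth weight in grams: birthwt = -932 + 70.3 gestage."""
    return -932 + 70.3 * gestage_weeks

def predict_sleep(age_years):
    """Predicted average hours of sleep: sleep = 12.7 - 0.380 age."""
    return 12.7 - 0.380 * age_years

# Birth weight exercise, part (a): gestation period of 30 weeks.
birthwt_30 = predict_birthwt(30)
print(f"Predicted birth weight at 30 weeks: {birthwt_30:.0f} grams")

# Sleep exercise, part (c): hypothetical ages, four years apart.
alice_age = 5                # assumed for illustration
adam_age = alice_age + 4     # Adam is four years older
extra_sleep = predict_sleep(alice_age) - predict_sleep(adam_age)
print(f"Alice sleeps about {extra_sleep:.2f} hours longer than Adam")
```

Because the fitted line is linear, the predicted difference depends only on the age gap (4 years times 0.38 hours per year), not on the particular ages chosen, as long as both ages lie within the observed range of 3 to 11 years.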