Vol. XXXIV (March 1996), pp. 97-114

The Standard Error of Regressions

By DEIRDRE N. MCCLOSKEY

and

STEPHEN T. ZILIAK

University of Iowa

Suggestions by two anonymous and patient referees greatly improved the paper. Our thanks also to seminars at Clark, Iowa State, Harvard, Houston, Indiana, and Kansas State universities, at Williams College, and at the universities of Virginia and Iowa. A colleague at Iowa, Calvin Siebert, was materially helpful.

THE IDEA OF statistical significance is old, as old as Cicero writing on forecasts (Cicero, De Divinatione, 1. xiii. 23). In 1773 Laplace used it to test whether comets came from outside the solar system (Elizabeth Scott 1953, p. 20). The first use of the very word "significance" in a statistical context seems to be John Venn's, in 1888, speaking of differences expressed in units of probable error;

...cant for science or policy and yet be insignificant statistically, ignored by the less thoughtful researchers.

In the 1930s Jerzy Neyman and Egon S. Pearson, and then more explicitly Abraham Wald, argued that actual investigations should depend on substantive not merely statistical significance. In 1933 Neyman and Pearson wrote of type I and type II errors:

Is it more serious to convict an innocent man or to acquit a guilty? That will depend on the consequences of the error; is the punishment death or fine; what is the danger to the community of released criminals; what are the current ethical views on punishment? From the point of view of mathematical theory all that we can do is to show how the risk of errors may be controlled and minimised. The use of these statistical tools in any given case, in determining just how the balance should be struck, must be left to the investigator.

(Neyman and Pearson 1933, p. 296; italics supplied)...

...Chapter 4 Multiple Linear Regression

Section 4.1 The Model and Assumptions

Objectives. Participants will: understand the elements of the model; understand the major assumptions of doing a regression analysis; learn how to verify the assumptions; understand a median split.

The Model. $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + e$, or in matrix notation $Y = X\beta + e$, where $Y$ is the dependent variable ($n \times 1$), $X$ holds the independent variables ($n \times (p+1)$), $\beta$ is the vector of unknown parameters ($(p+1) \times 1$), and $e$ is the error ($n \times 1$).

Questions. How many unknown parameters are there? Can you name them? How many populations will be sampled? What are conceptual populations?

Major Requirements for Doing a Regression Analysis. The errors are normally distributed (not Y). Constant variance – What is the null hypothesis? Linear in the parameters. Errors are independent. Some people call these assumptions. $E(Y) = X\beta$.

Example. We have observed y = response (change in blood pressure) and x = dosage level of a drug. We assume a linear relationship between E(y) and x. The two graphs are the same, but they have been rotated to give additional views.

Example (continued). Sketch E(y). Based on the graphs, make comments about the assumptions. Do they appear to be satisfied or violated? How many populations are represented by the graphs? List all of the parameters. Write the model down.

Checking Assumptions. Testing the residuals for normality PROC......

...Introduction

Regression analysis was developed by Francis Galton in 1886 to determine the weight of mother/daughter sweet peas. Regression analysis is a parametric test used for the inference from a sample to a population. The goal of regression analysis is to investigate how effective one or more variables are in predicting the value of a dependent variable. In the following we conduct three simple regression analyses.

Benefits and Intrinsic Job Satisfaction

Regression output from Excel

SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.616038 |
| R Square | 0.379503 |
| Adjusted R Square | 0.371338 |
| Standard Error | 0.773609 |
| Observations | 78 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 27.81836 | 27.81836 | 46.48237 | 1.93E-09 |
| Residual | 76 | 45.48382 | 0.598471 | | |
| Total | 77 | 73.30218 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
| Intercept | 2.897327 | 0.310671 | 9.326021 | 3.18E-14 | 2.278571 | 3.516082 | 2.278571 | 3.516082 |
| X Variable 1 | 0.42507 | 0.062347 | 6.817798 | 1.93E-09 | 0.300895 | 0.549245 | 0.300895 | 0.549245 |

Graph

Benefits and Extrinsic Job Satisfaction

Regression output from Excel

SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.516369 |
| R Square | 0.266637 |
| Adjusted R Square | 0.256987 |
| Standard Error | 0.35314 |
| Observations | 78 |

ANOVA ...
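The figures in the first Excel summary output above are tied together by simple arithmetic, and checking those links is a quick way to read such a table. The sketch below is only an illustration: the variable names are invented here, and the numbers are copied from the quoted output rather than recomputed from the survey data, which the excerpt does not include.

```python
import math

# Figures copied from the "Benefits and Intrinsic Job Satisfaction" output.
ss_regression = 27.81836
ss_residual = 45.48382
ss_total = 73.30218
df_regression = 1
df_residual = 76
slope = 0.42507
slope_se = 0.062347

# R Square: share of total variation accounted for by the regression.
r_squared = ss_regression / ss_total

# Mean squares: sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F: ratio of the regression mean square to the residual mean square.
f_stat = ms_regression / ms_residual

# "Standard Error" in the Regression Statistics block is the standard
# error of the regression: the square root of the residual mean square.
standard_error_of_regression = math.sqrt(ms_residual)

# t Stat: coefficient divided by its own standard error.
t_stat = slope / slope_se

print(f"R Square         {r_squared:.6f}")
print(f"F                {f_stat:.4f}")
print(f"Standard Error   {standard_error_of_regression:.6f}")
print(f"t Stat (slope)   {t_stat:.4f}")
```

With a single regressor the F statistic is the square of the slope's t statistic, which is why both rows report the same p-value, 1.93E-09. None of these quantities says whether a slope of 0.425 is large enough to matter; that remains a judgment about the subject matter.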

...Regression Models

Student Name
Grantham University
BA/520 – Quantitative Analysis
Instructor Name
April 6, 2013

Abstract

This paper discusses regression models and the benefits that variables provide when developing and examining such models. It also explains why scatter diagrams are used, describes the simple linear regression model, and covers multiple regression analysis and the potential uses for this type of model.

Regression Models

A regression model is a statistical technique that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables). Regression models provide the scientist with a powerful tool, allowing predictions about past, present, or future events to be made with information about past or present events. Inference based on such models is known as regression analysis. The main purpose of regression analysis is to predict the value of a dependent or response variable based on values of the independent or explanatory variables. According to Render, Stair, and Hanna (2011), there are two reasons for which regression analyses are used: one is to understand the relationship between variables, and the second is to predict one variable's value based on the value of the other. Variables provide many advantages when creating models. One of the......

...Methods and Analysis

Instructor Leonidas Murembya
April 23, 2013

Abstract

This paper discusses regression analysis using survey responses from the AIU data set, covering benefits and intrinsic, benefits and extrinsic, and benefits and overall job satisfaction. It also gives an overview of these regressions along with what they would mean to a manager (AIU Online).

Introduction

Regression analysis can help us predict how the needs of a company are changing and where the greatest need will be. That allows companies to hire the employees they need before they are needed, so they are not left in the lurch. Our regression analysis compares two factors only, an independent variable and a dependent variable (Murembya, 2013).

Benefits and Intrinsic Job Satisfaction

Regression output from Excel

SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.018314784 |
| R Square | 0.000335431 |
| Adjusted R Square | -0.009865228 |
| Standard Error | 1.197079687 |
| Observations | 100 |

Note on R Square: this is the portion of the relationship explained by the line; about 0.034% of the relationship is linear.

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 0.04712176 | 0.047122 | 0.032883 | 0.856477174 |
| Residual | 98 | 140.4339782 | 1.433 | | |
| Total | 99 | 140.4811 | | | |

Coefficients Standard Error t Stat P-value Lower 95% Upper......

...Regression Analysis: Basic Concepts

Allin Cottrell

1 The simple linear model

Suppose we reckon that some variable of interest, y, is ‘driven by’ some other variable x. We then call y the dependent variable and x the independent variable. In addition, suppose that the relationship between y and x is basically linear, but is inexact: besides its determination by x, y has a random component, u, which we call the ‘disturbance’ or ‘error’. Let i index the observations on the data pairs (x, y). The simple linear model formalizes the ideas just stated:

$y_i = \beta_0 + \beta_1 x_i + u_i$

The parameters $\beta_0$ and $\beta_1$ represent the y-intercept and the slope of the relationship, respectively. In order to work with this model we need to make some assumptions about the behavior of the error term. For now we’ll assume three things:

$E(u_i) = 0$: u has a mean of zero for all i
$E(u_i^2) = \sigma_u^2$: it has the same variance for all i
$E(u_i u_j) = 0,\ i \neq j$: no correlation across observations

We’ll see later how to check whether these assumptions are met, and also what resources we have for dealing with a situation where they’re not met. We have just made a bunch of assumptions about what is ‘really going on’ between y and x, but we’d like to put numbers on the parameters $\beta_0$ and $\beta_1$. Well, suppose we’re able to gather a sample of data on x and y. The task of estimation is then to come up with coefficients (numbers that we can calculate from the data, call them $\hat\beta_0$ and $\hat\beta_1$) which serve as estimates of the unknown......
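The excerpt breaks off before stating how the estimates are obtained. For orientation only, the usual ordinary least squares result for this model is given below; it is the standard textbook formula and not a quotation from the excerpted notes. Here $\bar{x}$ and $\bar{y}$ denote the sample means, notation introduced for this illustration.

```latex
\hat\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
```

These are the values that minimize the sum of squared residuals $\sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$.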

...negative or positive. This is told by whether the graph increases or decreases.

Benefits and Intrinsic Job Satisfaction

Regression output from Excel

SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.069642247 |
| R Square | 0.004850043 |
| Adjusted R Square | -0.00471871 |
| Standard Error | 0.893876875 |
| Observations | 106 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 0.404991362 | 0.404991 | 0.50686 | 0.478094147 |
| Residual | 104 | 83.09765015 | 0.799016 | | |
| Total | 105 | 83.50264151 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
| Intercept | 5.506191723 | 0.363736853 | 15.13784 | 4.8E-28 | 4.784887893 | 6.2274956 | 4.7848879 | 6.22749555 |
| Benefits | -0.05716561 | 0.080295211 | -0.711943 | 0.47809 | -0.21639402 | 0.1020628 | -0.216394 | 0.10206281 |

Y = 5.5062 - 0.0572x

Graph

Benefits and Extrinsic Job Satisfaction

Regression output from Excel

SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.161906 |
| R Square | 0.026214 |
| Adjusted R Square | 0.01685 |
| Standard Error | 1.001305 |
| Observations | 106 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 2.806919 | 2.806919 | 2.799606 | 0.097293 |
| Residual | 104 | 104.2717 | 1.002612 | | |
| Total | 105 | 107.0786 | | | |

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper......

...Linear regression deals with numerical measures that express the relationship between two variables. Relationships between variables can be strong or weak, and direct or inverse. One example is the amount McDonald’s spends on advertising per month and its total sales in that month. Likewise, the amount of study time one puts toward a statistics course can be compared with the grades received using the regression method. Formally, regression analysis yields an equation that allows one to estimate the value of one variable based on the value of another. A key objective in performing a regression analysis is estimating the dependent variable Y based on a selected value of the independent variable X. To explain, Nike could measure how much it spends on celebrity endorsements and the effect this has on sales in a month. In that case, the amount spent on celebrity endorsements would be the independent variable X. Without the X variable, Y would be impossible to estimate. The general form of the regression equation is Ŷ = a + bX, where Ŷ is the estimated value of the Y variable for a selected X value. a represents the Y-intercept; it is the estimated value of Y when X = 0. Furthermore, b is the slope of the line, or the average change in Ŷ for each change of one unit in the independent variable X. Finally, X is any value of the independent variable that is......

...STATISTICS FOR ENGINEERS (EQT 373)

TUTORIAL CHAPTER 3 – INTRODUCTORY LINEAR REGRESSION

1) Given 5 observations for two variables, x and y.

| x | 3 | 12 | 6 | 20 | 14 |
| y | 55 | 40 | 55 | 10 | 15 |

a. Develop a scatter diagram for these data.
b. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables?
c. Develop the estimated regression equation by computing the values $b_0$ and $b_1$.
d. Use the estimated regression equation to predict the value of y when x = 10.
e. Compute the coefficient of determination. Comment on the goodness of fit.
f. Compute the sample correlation coefficient (r) and explain the result.

2) The Tenaga Elektik MN Company is studying the relationship between kilowatt-hours (thousands) used and the number of rooms in a private single-family residence. A random sample of 10 homes yielded the following.

| Number of rooms | 12 | 9 | 14 | 6 | 10 | 8 | 10 | 10 | 5 | 7 |
| Kilowatt-Hours (thousands) | 9 | 7 | 10 | 5 | 8 | 6 | 8 | 10 | 4 | 7 |

a. Identify the independent and dependent variable.
b. Compute the coefficient of correlation and explain.
c. Compute the coefficient of determination and explain.
d. Test whether there is a positive correlation between both variables. Use α = 0.05.
e. Determine the regression equation (use the least squares method).
f. Determine the value of kilowatt-hours used if the number of rooms is 11.
g. Can you use the model in (f.) to predict the kilowatt-hours if number of......
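Parts (c) to (f) of question 1 can be worked through in a few lines. The sketch below assumes that the first row of the five-observation table holds x and the second holds y, since the row labels did not survive in the excerpt; the function name `predict` and the other variable names are chosen for this illustration.

```python
import math

# Assumption: first table row is x, second table row is y.
x = [3, 12, 6, 20, 14]
y = [55, 40, 55, 10, 15]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# (c) Least squares slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def predict(x_new):
    """(d) Point prediction from the estimated regression equation."""
    return b0 + b1 * x_new


# (e) Coefficient of determination and (f) sample correlation coefficient.
r_squared = s_xy ** 2 / (s_xx * s_yy)
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"b0 = {b0}, b1 = {b1}")
print(f"prediction at x = 10: {predict(10)}")
print(f"r^2 = {r_squared:.4f}, r = {r:.4f}")
```

Under that reading of the table, the fitted line is ŷ = 68 - 3x, the prediction at x = 10 is 38, r² is about 0.876, and r is about -0.936, so the relationship is strongly negative in this small sample.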

...MULTIPLE REGRESSION

After completing this chapter, you should be able to:
understand model building using multiple regression analysis;
apply multiple regression analysis to business decision-making situations;
analyze and interpret the computer output for a multiple regression model;
test the significance of the independent variables in a multiple regression model;
use variable transformations to model nonlinear relationships;
recognize potential problems in multiple regression analysis and take the steps to correct the problems;
incorporate qualitative variables into the regression model by using dummy variables.

Multiple Regression Assumptions
The errors are normally distributed.
The mean of the errors is zero.
Errors have a constant variance.
The model errors are independent.

Model Specification
Decide what you want to do and select the dependent variable.
Determine the potential independent variables for your model.
Gather sample data (observations) for all variables.

The Correlation Matrix
Correlation between the dependent variable and selected independent variables can be found using Excel: Tools / Data Analysis… / Correlation.
Can check for statistical significance of correlation with a t test.

Example
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
Dependent variable: Pie sales (units per......

...Applied Regression Analysis 41100-81
Christian Hansen
Winter 2015

“I pledge my honor that I have not violated the Honor Code during this assignment.”

Kataras, Peter; Foltyn, Tom; Erzen, Robert; Scholl, Katie

In order to begin we first had to gain a high level understanding of the 6000 observations that we were given. We ran descriptive statistics on all of the original variables after transforming the variable Color into a dummy variable called White (white wine = 1, red wine = 0).

Descriptive Statistics
| | N | Minimum | Maximum | Mean | Std. Deviation |
| quality | 6000 | 2.5000 | 9.5000 | 5.825317 | .9206965 |
| fixed_acidity | 6000 | 3.8000 | 15.9000 | 7.221233 | 1.3094165 |
| volatile_acidity | 6000 | .0800 | 1.5800 | .340727 | .1653986 |
| citric_acid | 6000 | .0000 | 1.6600 | .318008 | .1455540 |
| residual_sugar | 6000 | .6000 | 65.8000 | 5.425650 | 4.7411670 |
| chlorides | 6000 | .0100 | .6100 | .056483 | .0344872 |
| free_sulfur_dioxide | 6000 | 1.0 | 289.0 | 30.482 | 17.7550 |
| total_sulfur_dioxide | 6000 | 6.0 | 440.0 | 115.576 | 56.5940 |
| density | 6000 | .99 | 1.04 | .9949 | .00504 |
| pH | 6000 | 2.74 | 4.01 | 3.2195 | .16022 |
| sulphates | 6000 | .2200 | 2.0000 | .532073 | .1487300 |
| alcohol | 6000 | 8.0000 | 14.9000 | 10.491008 | 1.1901957 |
| White | 6000 | 0 | 1 | .75 | .433 |

Some of our variables in the dataset have very tight ranges, for example density has a min of .99 and a max of 1.04. On the other hand, total sulfur......

...Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Local Government Engineering Department (LGED) is a public sector organization under the Ministry of Local Government, Rural Development & Cooperatives. The prime mandate of LGED is to plan, develop and maintain local level rural, urban and small scale water resources infrastructure throughout the country. Here, I take LGED as the organization and, for one project covering eight districts, treat “available fund” as the independent variable and “development” (length of road developed, in km) as the dependent variable. The values of the variables are:

| Districts | Fund, X (lakh tk) | Development, Y (km) |
| Panchagar | 450 | 10 |
| Thakurgaon | 310 | 6.8 |
| Dinajpur | 1500 | 32 |
| Nilphamari | 1160 | 24.5 |
| Rangpur | 1450 | 31 |
| Kurigram | 450 | 9 |
| Lalmonirhat | 950 | 16 |
| Gaibandha | 1550 | 33 |

For the two variables “available fund” and “development”, the regression equation can be given as: Y = a + bX, where Y = Development, X = Fund, b = rate of change of development, a...

...A) Estimated regression equation – First Order: y = β0 + β1x1 + β2x2 + ε

Output of 1st Model

Regression Statistics
| Multiple R | 0.763064634 |
| R Square | 0.582267636 | SSR/SST |
| Adjusted R Square | 0.512645575 |
| Standard Error | 547.737482 |
| Observations | 15 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 2 | 5018231.543 | 2509115.772 | 8.363263464 | 0.005313599 |
| Residual | 12 | 3600196.19 | 300016.3492 | | |
| Total | 14 | 8618427.733 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | -20.35201243 | 652.7453202 | -0.031179101 | 0.975639286 | -1442.561891 | 1401.857866 |
| Age (x1) | 13.35044655 | 7.671676501 | 1.740225432 | 0.107375657 | -3.364700634 | 30.06559374 |
| Hours (x2) | 243.7144645 | 63.51173661 | 3.837313819 | 0.002363965 | 105.334278 | 382.0946511 |

B) Equation: ŷ = -20.3520124320994 + 13.3504465516772 x1 + 243.714464532425 x2

C) Interpretation of β

β̂1 = 13.35044655. If the number of hours worked (x2) is held fixed, we estimate that for every one-year increase in age (x1) the mean of annual earnings increases by 13.35044655.

β̂2 = 243.7144645. If age (x1) is held fixed, we estimate that for every one-hour increase in work (x2), the mean of......
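Two further relationships can be read off the output above: how Adjusted R Square follows from R Square, and how the 95% bounds are built around a coefficient. The sketch below uses only numbers quoted in the excerpt. The variable names are invented for this illustration, and the critical t value is backed out of the reported interval rather than taken from a table, so no statistics library is assumed.

```python
# Figures copied from the quoted output of the first model.
r_squared = 0.582267636
n_obs = 15
n_predictors = 2  # Age (x1) and Hours (x2)

coef_hours = 243.7144645
se_hours = 63.51173661
lower_95 = 105.334278
upper_95 = 382.0946511

# Adjusted R Square penalizes R Square for the number of predictors.
residual_df = n_obs - n_predictors - 1
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# t Stat is the coefficient divided by its standard error.
t_stat_hours = coef_hours / se_hours

# The interval is coefficient +/- t_critical * standard error, so its
# midpoint is the coefficient and its half-width reveals t_critical.
interval_midpoint = (lower_95 + upper_95) / 2
t_critical = (upper_95 - lower_95) / (2 * se_hours)

print(f"Adjusted R Square   {adjusted_r_squared:.6f}")
print(f"t Stat (Hours)      {t_stat_hours:.4f}")
print(f"interval midpoint   {interval_midpoint:.4f}")
print(f"implied critical t  {t_critical:.4f} on {residual_df} df")
```

The implied critical value is about 2.179, which agrees with the two-sided 5% point of the t distribution with 12 degrees of freedom. The Hours coefficient exceeds that threshold and the Age coefficient (t Stat 1.74) does not, which is a statement about sampling precision and not about whether either effect is large in practical terms.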

...Project Title: A STATISTICAL ANALYSIS OF NBA PLAYER SALARIES USING A MULTIPLE REGRESSION.

ABSTRACT

Basketball is one of the most popular sports in the world, and the National Basketball Association (NBA) is the most popular basketball league in the world. The NBA is based in the United States of America and consists of 30 teams. The NBA is so popular that the NBA finals are the 2nd most watched televised event in the U.S. after the NFL (National Football League) Super Bowl. Sometimes when we think about NBA players and the enormous amount of money they are making, we become a little jealous. It is often said that some star players make a great deal of money, or are overpaid, and yet can hardly form a sentence. The greatest challenge for the board of the NBA has been how to harmonize the salaries. Because of this, various people have tried to come up with different solutions. Some argue that height, weight and physical strength play a big role in a team winning, but this is not the case, as some players who are short help their teams win on several occasions. To solve this problem a multiple regression analysis will be utilized to analyze the salary data. A relationship will be established between the salary and performance variables. The other challenge will be choosing the model parameters that will be significant in order to be included in the model that will be developed. This can be solved by arranging the factors affecting an NBA player salary in a decreasing order of......

ACKNOWLEDGEMENT

For the completion of this task, we cannot claim all the credit. There were a lot of people who helped us by providing valuable information, advice and guidance. The course report is an important part of the BBA program, as one can gather practical knowledge within a short period of time by observing and doing this type of task. In this regard our report has been prepared on 'regression analyses'.

...REYEM AFFAIR

Regression Case
Quantitative Methods II

To Prof. Arnab Basu
On October 21, 2011
By GROUP NO. 5

Bharati vishal (11110), akshay ram (11110), dhanashree vinayak shirodkar (11110), amol devnath kumbhare (11110), ajusal sugathan (11110), arun prabu (11110), ghule nilesh vishnu (11110), mudavath swetha (11110), Raja Simon J (1111052), sagar behera (11110), shreya sethi (11110), swati murarka (11110)

Indian Institute Of Management, Bangalore

Table of Contents
| S.No | Particulars | Pages |
| 1. | Executive Summary | 3-4 |
| 2. | Understanding of the Problem | 4 |
| 3. | Model Description | 5-13 |
| | Model 1 | 5-8 |
| | Prediction interval Vs Confidence Interval | 6 |
| | Step wise Regression: A closer look | 7 |
| | Test of Model: Analysis of Results | 8 |
| | Model 2 | 9-13 |
| | Test of Model: Analysis of Results | 11-13 |
| | Other Models | 13 |
| 4. | Conclusions and Recommendations | 14 |
| 5. | Appendix: 1. Variables Entered/Removed 2. Model Summary 3. ANOVA 4. Coefficients 5. Residual Statistics | 15 |

Executive Summary

Reyem Affiar has recently found the below described condominium in Mid-Cambridge that he wants to purchase.

Street Address : 236 Ellery Street
Last Price : $169000
Area & Area Code : M/9
Bed : 2
Bath : 1
Rooms : 5
Interior : 1040
Condo : $175
Tax : $1121
RC :......
