Wednesday, May 6, 2020
Electricity Consumption
Question: Explain the project aimed at performing a detailed analysis of the various factors that affect electricity consumption.

Answer: The project is aimed at performing a detailed analysis of the various factors that affect electricity consumption. Some of the important factors are family income, the number of occupants of the household and the number of rooms of the household. Apart from these three factors, five more factors are considered in the analysis.

We first drew a random sample of size 100 from the main population data of size 1000 and carried out the relevant analysis on the sample. We then generalized our results to the population of size 1000. The sample was drawn without replacement, so the 100 data points are unique and there are no repetitions.

We have represented these eight variables visually with diagrams and summarised them with descriptive measures. This gives a brief description of each variable considered in the analysis, and we have seen that the mean energy bill and the mean weekly income are not very high for the given population. We have also constructed interval estimates for the mean energy bill and the mean weekly income. We are 95% confident about the intervals constructed, and their validity was verified against the population means.

We then tested whether energy consumption varies with home ownership, across states, or with the average age of the household. The results showed that none of the three factors produces significantly different energy bills across its categories. Thus there is no significant difference in electricity consumption between renters and owners, between NSW and the other states, or between the age group below 45 years and the age group above 45 years.
We have also investigated whether there is a linear relationship between Energy Consumption and Weekly Income, and between Energy Consumption and Number of Rooms in the household. Where a relationship exists, we must also find out how strong it is. This was done with the help of regression analyses using statistical tools. We found a statistically significant linear relationship between Energy Consumption and Weekly Income and also between Energy Consumption and Number of Rooms, although in both cases the correlation is not very strong. Both variables are positively related to energy consumption. The results obtained from the sample in this project are taken to hold for the entire population under consideration.

Introduction:

The main objective of the project is to investigate the factors that affect household electricity consumption. The electricity consumption of a household depends on several factors, such as the number of members of the family, the number of electrical appliances and their average usage. Income is also an important influence on electricity consumption. Our objective is therefore to identify the factors that affect electricity consumption and to investigate how large a role each plays.
The variables considered in this project are:

Home Ownership: The state of home ownership (O: Owned and R: Rented)
Average Age: Average age of occupants
State: Location of property (NSW: New South Wales, Vic: Victoria and Qld: Queensland)
No of Rooms: Number of rooms in a house
Energy Bill: Total cost of monthly energy bills (in AUD)
Language: Language spoken at home (1: English and 2: Other than English)
No of Occupants: Number of people living in a house
Weekly Income: Average weekly income of a household (in AUD)

For each of these eight variables, we represent the data graphically and produce descriptive statistics wherever needed. For the variables Home Ownership, Average Age, State, No of Rooms, Language and No of Occupants, we use graphical representations such as the bar chart and the pie diagram, since the data available on them are categorical here. We also provide a frequency distribution table for each of these variables. For the variables Energy Bill and Weekly Income, we use a histogram, since the data are continuous and measured on a ratio scale, and we provide some important descriptive statistics that give a short summary of the distribution of each variable.

We also conduct tests regarding correlation and tests on the averages of some of the variables. We construct confidence intervals for some of the variables and then check whether the actual population mean falls inside each interval. This helps us to judge whether the confidence interval is reliable.

Before conducting the statistical tests and analyses, it is first necessary to draw a suitable representative sample. We have drawn a random sample of size 100 from the population of size 1000. The sample chosen must be a good representative of the population and should be collected without replacement.
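The sampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is represented by hypothetical household IDs 0 to 999 (the actual data file is not shown in this post), the function name is made up, and the seed value is arbitrary, chosen only so that the draw is reproducible.

```python
import random

POPULATION_SIZE = 1000  # size of the population stated in the post
SAMPLE_SIZE = 100       # size of the sample stated in the post


def draw_srswor(population_size, sample_size, seed=2020):
    """Draw a simple random sample without replacement.

    Household IDs 0..population_size-1 are hypothetical stand-ins
    for the rows of the real data set.
    """
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    # random.sample returns distinct elements; every unit is equally likely
    return rng.sample(range(population_size), sample_size)


sample_ids = draw_srswor(POPULATION_SIZE, SAMPLE_SIZE)
# Number of drawn units and number of distinct units (both 100)
print(len(sample_ids), len(set(sample_ids)))
```

Because the draw is made without replacement, the count of distinct IDs always equals the sample size.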
The sample in this project was therefore drawn by simple random sampling without replacement. In this method, all units of the population are given equal importance and thus have an equal chance of being selected in the sample. Because selection is done without replacement, once a unit has been chosen it cannot appear in the sample a second time. Thus, the sample contains 100 distinct units.

Analysis:

DESCRIPTIVE STATISTICS

Here we use statistical techniques such as graphical representation and descriptive statistics (average, variability etc.). We consider one variable at a time.

Home Ownership

The variable Home Ownership is categorical, the two possible categories being Owned and Rented. The variable is thus measured on a nominal scale, and we use a pie diagram to represent the percentages of households that are Owned and Rented. The pie diagram is shown below:

The frequency distribution table of Home Ownership is shown below:

Home Ownership    Frequency
Owned             73
Rented            27

We observe that the number of houses that are Owned is much greater than the number that are Rented. Hence most of the sampled houses are owned by their occupants.

Average Age

The average age of a household is naturally measured on a ratio scale, since age is a numerical quantity with a true zero. However, in this case the average age has been grouped into four categories, so we treat the variable as ordinal (the age groups can be ordered from lower to higher ages) and summarise the data with a bar diagram.
The Bar diagram is shown below:

Also, the frequency distribution table for the age groups is shown below:

Age         Frequency
Under 24    23
24-44       25
45-64       19
Over 65     33

We observe that the largest group of households (33 out of 100) has an average age greater than 65, which implies that aged people constitute a substantial proportion of the population. It can be assumed that aged people spend most of their time indoors and are thus expected to consume more electricity. The lowest frequency is shown by the class 45-64, so comparatively few households fall in that age group.

State

The variable State is also categorical and is measured on a nominal scale. The appropriate diagram here is the pie chart, which is shown below:

The frequency distribution table is shown below:

We observe that the highest number of households has been selected from NSW, and Qld makes the lowest contribution to the data. This may reflect the relative sizes of the states.

Number of Rooms

The variable Number of Rooms is discrete and is count data. Being a numerical count with a true zero, it is measured on a ratio scale, and because it takes only a few distinct values we use a bar diagram to represent it. The diagram is shown below:

The frequency table is shown below:

We observe that houses with 3 rooms are the most common and houses with 7 rooms the least common. The more rooms a house has, the more electrical appliances it is likely to contain, and thus the higher the electricity consumption. Since the most common house size here is 3 rooms, electricity consumption may be expected to be modest.

Electricity Bill

The variable Electricity Bill is continuous and is measured on a ratio scale. We therefore use a histogram to represent the data graphically, since a histogram is suited to continuous variables.
The diagram is shown below:

Some of the descriptive statistics are also listed below:

We observe that the mean electricity bill is 122.5, and the histogram suggests that the highest frequency of electricity bill lies somewhere around 115.9. Thus the average electricity bill is not very high for the chosen sample.

Language

Here the language has been coded into two categories, 1: English and 0: Other than English. Since the data are categorical and measured on a nominal scale, we use a pie diagram to represent them. The frequency distribution table is shown below:

We observe that most of the households chosen in the sample speak English: the percentage of English-speaking households is 68%, which is quite high.

Number of Occupants

The variable Number of Occupants is discrete and is count data. Being a numerical count with a true zero, it is measured on a ratio scale, and because it takes only a few distinct values we use a bar diagram to represent it. The diagram is shown below:

The frequency table is shown below:

We observe that houses with 2 occupants are the most common and houses with 8 occupants the least common. The more occupants a house has, the more electrical appliances are likely to be in use, and thus the higher the electricity consumption. Since the most common household size here is 2 occupants, electricity consumption may be expected to be modest.

Weekly Income

The variable Weekly Income is continuous and is measured on a ratio scale. We therefore use a histogram to represent the data graphically, since a histogram is suited to continuous variables. The diagram is shown below:

Some of the descriptive statistics are also listed below:

We observe that the mean weekly income is 639.2, and the histogram suggests that the highest frequency of Weekly Income lies somewhere around 65.6.
Thus the average Weekly Income is not very high for the chosen sample.

CONFIDENCE INTERVAL

We are 95% confident that the actual mean Weekly Income of households of Home Owners lies in the range [525.4, 861.9]. The actual mean Weekly Income of households of Home Owners is 554.3544. Clearly the value 554.3544 lies in this range, so the confidence interval captures the population mean.

We are 95% confident that the average Total Bill of the households lies in the range [96.8, 148.3]. The actual mean Total Bill for all the 1000 households is 119.001. Clearly the value 119.001 lies in this range, so this confidence interval also captures the population mean.

HYPOTHESIS TESTING

We first perform a hypothesis test to find out whether renters consume less energy on average than home owners. The test is lower-tailed, i.e. we test whether the mean electricity consumption of renters is less than that of home owners. The test statistic is given by

T = (m1 - m2) / (s * sqrt(1/n1 + 1/n2))

where m1 and m2 are the mean Electricity Bill for Renters and Home Owners, s is the pooled standard deviation, and n1 and n2 are the numbers of Renters and Home Owners. The test statistic has 98 degrees of freedom. The value of the test statistic is T = -0.62 and the p-value of the test is p = 0.269. We observe that p > 0.05. Hence, at the 5% level of significance we fail to reject the null hypothesis: there is no significant evidence that renters consume less energy than home owners.

Next we perform a hypothesis test to find out whether the energy bill for NSW is different from that of the two other states taken together. The test is two-tailed, i.e. we test whether the mean energy bill for NSW is either less or more than that of the two other states. The test statistic is given by

T = (m1 - m2) / (s * sqrt(1/n1 + 1/n2))

where m1 and m2 are the mean Electricity Bill for NSW and for the two other states, s is the pooled standard deviation, and n1 and n2 are the numbers of households in NSW and in the two other states. The test statistic has 98 degrees of freedom.
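The pooled two-sample statistic used in these tests can be sketched in Python as follows. The function name and the two short lists of bills are made up for illustration and are not the project's data; the sketch only shows how T and its degrees of freedom are computed from two groups under the equal-variance assumption.

```python
import math
import statistics


def pooled_t_statistic(group1, group2):
    """Return (T, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # Sample variances (divisor n - 1) of each group
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances
    s = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / df)
    t = (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, df


# Made-up monthly bills (AUD), for illustration only
renters = [95.0, 110.0, 120.0, 105.0]
owners = [100.0, 125.0, 130.0, 115.0, 120.0]
t_value, dof = pooled_t_statistic(renters, owners)
print(round(t_value, 3), dof)
```

With the 100 sampled households split into two groups, n1 + n2 - 2 = 98, which is where the 98 degrees of freedom come from. The p-value is then read from the t distribution with that many degrees of freedom, using the lower tail, the upper tail or both tails depending on the hypothesis.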
For the NSW comparison, the value of the test statistic is T = 1.78 and the p-value of the test is p = 0.079. We observe that p > 0.05. Hence, at the 5% level of significance we fail to reject the null hypothesis: there is no significant evidence that the energy bill of NSW differs from that of the other two states.

Finally we perform a hypothesis test to find out whether the average number of occupants is higher in the age group below 44 years than in the age groups above 45 years. The test is upper-tailed, i.e. we test whether the mean number of occupants is greater for the age group below 44 years than for the age group above 45 years. The test statistic is given by

T = (m1 - m2) / (s * sqrt(1/n1 + 1/n2))

where m1 and m2 are the mean number of occupants in the two age groups, s is the pooled standard deviation, and n1 and n2 are the numbers of households in the two age groups. The test statistic has 98 degrees of freedom. The value of the test statistic is T = -0.98 and the p-value of the test is p = 0.835. We observe that p > 0.05. Hence, at the 5% level of significance there is no significant evidence that the mean number of occupants in the age group below 44 years is greater than in the age group above 45 years.

CORRELATION and REGRESSION

Here we first consider the Energy Bill and Weekly Income of the households where English is spoken at home. The dependent variable is Energy Bill and the independent variable is Weekly Income, because households with a higher income would tend to spend more on energy. We begin the analysis with a scatterplot of the two variables, which is shown below:

The diagram shows that most of the points tend to follow a straight line but are tightly clustered together. Some of the points deviate from the straight line and lie above and below the fitted regression line. Some points are widely scattered and lie far away from the fitted line. The trend of the linear relationship is in the positive direction.
Now, to fit a linear regression of Energy Bill (Y) on Weekly Income (X), we model the relation as

Y = a + bX + u

where u is the random error and a, b are unknown parameters to be estimated by the method of least squares. We obtain the estimated model given below:

Y = 41.34 + 0.138X

Here the value 41.34 is the y-intercept, that is, the fitted line passes through the point (0, 41.34). The value 0.138 is the regression coefficient, and it gives the change in Y for a one-unit increase in X: with a unit increase in X, the value of Y increases by 0.138. Since the regression coefficient is positive, the relation between X and Y is positive; as X increases, Y also increases.

The value of the correlation coefficient between X and Y is 0.454, which is not very high, and thus we may conclude that the linear relationship between X and Y is not very strong. The value of the coefficient of determination is 0.206, which is also low: only about 20.6% of the variation in Y is explained by the fitted line, so the line is of limited use for predicting Y. There may be some other kind of relationship between X and Y, such as quadratic, exponential or cubic.

Now we test whether there is any linear relationship between X and Y. For this purpose we use the right-tailed test with the test statistic F = MST/MSE, where MST is the mean square due to regression (treatment) and MSE is the mean square error. The degrees of freedom of F are 1 and 66. The value of the test statistic is F = 17.120255. The p-value of the test is p = 0.0001 (< 0.05). At the 5% level of significance we therefore reject the null hypothesis of no linear relationship and conclude that there exists a linear relationship between Energy Bill and Weekly Income for those households where only English is spoken.

Next we consider the Energy Bill and Number of Rooms for all the households in the sample. Here the dependent variable is Energy Bill and the independent variable is Number of Rooms.
This is because households with more rooms would tend to spend more on energy. We begin the analysis with a scatterplot of the two variables, which is shown below:

We observe from the scatterplot that, for a given number of rooms, there are several values of Energy Bill. The points are thus clustered at a few values of X and are not uniformly distributed. We cannot determine any clear trend from the scatterplot. Most of the points are scattered away from the fitted linear regression line.

Now, to fit a linear regression of Energy Bill (Y) on Number of Rooms (X), we model the relation as

Y = a + bX + u

where u is the random error and a, b are unknown parameters to be estimated by the method of least squares. We obtain the estimated model given below:

Y = 54.783 + 17.285X

Here the value 54.783 is the y-intercept, that is, the fitted line passes through the point (0, 54.783). The value 17.285 is the regression coefficient, and it gives the change in Y for a one-unit increase in X: with a unit increase in X, the value of Y increases by 17.285. Since the regression coefficient is positive, the relation between X and Y is positive; as X increases, Y also increases.

The value of the correlation coefficient between X and Y is 0.28, which is not very high, and thus we may conclude that the linear relationship between X and Y is not very strong. The value of the coefficient of determination is 0.07862, which is also very low: less than 8% of the variation in Y is explained by the fitted line, so the line is of limited use for predicting Y. There may be some other kind of relationship between X and Y, such as quadratic, exponential or cubic.

Now we test whether there is any linear relationship between X and Y.
For this purpose we use the right-tailed test with the test statistic F = MST/MSE, where MST is the mean square due to regression (treatment) and MSE is the mean square error. The degrees of freedom of F are 1 and 98. The value of the test statistic is F = 8.362. The p-value of the test is p = 0.005 (< 0.05). At the 5% level of significance we therefore reject the null hypothesis of no linear relationship and conclude that there exists a linear relationship between Energy Bill and Number of Rooms.

CONCLUSIONS AND LIMITATIONS:

After conducting the relevant analyses, we conclude the following:

The average Energy Bill and average Weekly Income are not very high for the given population.
The sample of size 100 is a good representative of the actual population of size 1000, since the confidence intervals constructed for the means of the variables contained the actual population means.
Energy consumption does not vary significantly with respect to Home Ownership.
Energy consumption does not vary significantly between NSW and the other states taken together.
Age of the household members does not have any significant effect on energy consumption.
Energy consumption depends linearly on Weekly Income.
Energy consumption also has a linear relationship with Number of Rooms in the household.

The limitations of the analyses conducted in this project are:

In most cases we have assumed a theoretical distribution for the variables, but these assumptions were not validated.
In most cases we have taken the level of significance to be 5%. A smaller level of significance (for example 1%) would have made the conclusions more conservative.
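As a consistency check on the two regression sections, the F statistic of a simple linear regression can be recovered from the coefficient of determination through the standard identity F = R^2 (n - 2) / (1 - R^2). The short Python sketch below applies it to the values reported above. The sample sizes are assumptions inferred from the stated degrees of freedom (n = 68 for the English-speaking households, n = 100 for the full sample), and the function name is illustrative.

```python
def f_from_r_squared(r_squared, n):
    """F statistic of a simple linear regression with n observations.

    Uses F = R^2 * (n - 2) / (1 - R^2), with 1 and n - 2 degrees of freedom.
    """
    return r_squared * (n - 2) / (1.0 - r_squared)


# Reported values: R^2 = 0.206 (Weekly Income) and 0.07862 (Number of Rooms).
# n is inferred from the reported error degrees of freedom (66 and 98).
f_income = f_from_r_squared(0.206, 68)
f_rooms = f_from_r_squared(0.07862, 100)
print(round(f_income, 2), round(f_rooms, 2))
```

Both values come out close to the reported F statistics (17.12 and 8.362); the small gap for Weekly Income is what one would expect from rounding R^2 to three decimals.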