Crash Course in Working with Spreadsheets

Spreadsheets are very useful for doing calculations. Simple uses include computing the sum, the maximum, the mean, or the median of a set of values.

1. Computing the sum of a number of values
a. Open a fresh spreadsheet.
b. Notice that columns are labeled with letters, and rows are labeled with numbers.
c. In the first 5 rows of column A, enter the numbers 1, 3, 2, 6, 4.
d. Select the cells A1 through A5 by clicking and holding cell A1 and dragging with your mouse down to A5; the cells should turn blue.
e. If you're using Google Spreadsheets, you'll see the sum of this series in the bottom right corner (in Excel it'll be in the right half of the status bar at the bottom of the window).
f. Now click cell A6, type the command =SUM(A1:A5) (don't forget the = sign!) and press enter. Cell A6 now shows the sum of the cells A1 through A5 (which is indicated by the notation A1:A5).
g. If cell A6 is selected (by clicking it once), you'll notice that the command you just typed is shown in the formula bar just above the cells. You may also notice a dashed outline around the cells A1 through A5, in the same color as the expression A1:A5 in the formula.
h. Try to get the MAX, MEDIAN, AVERAGE, and VAR (variance) in cells A7, A8, A9, and A10, respectively.
i. You can also use the value in one cell to compute the value in another: in cell B1 type the command =A1*A2+A3^2+A4/A5 and press enter. The value in cell B1 should now be 1*3 + 2^2 + 6/4 = 8.5.

2. Generating sequences of numbers and of formulas
a. Spreadsheets are 'smart' in that they can recognize a sequence from a few example values and generate more values: enter the values 1, 3, 5 in the first three cells of column C. Now select those three cells by clicking and holding C1 and dragging the mouse down to C3. The three cells should turn blue (like earlier).
Also, there's a small dark square at the bottom right of the selection. Click and hold this square and drag down to cell C12. The cells down to C12 should now be filled with the subsequent values in the series (1, 3, 5, 7, 9, 11, etc.).
b. Enter the values 1 and 2 in cells D1 and D2, respectively. By selecting these cells and dragging the little square at the bottom right of the selection, generate the sequence 1, 2, 3, ..., 30.
c. The really powerful thing about spreadsheets is that they can also generate sequences of formulas: in cell E1, type the command =(D1 - 15)^2 + 3 and press enter. Then select cell E1 and drag the small square at the bottom right of the cell down to E30. In column E you should now find a sequence of values that runs from 199 down to 3, and then back up again to 228.
d. Click on any of the cells in column E (that have just been filled with numbers) and look at the expression in the formula bar. What do you notice in the formulas as you go from one cell to the next in column E?
e. You can make a graph of the values in column E. To do so, select the cells D1:E30 (that is, click and hold cell D1 and drag down to D30 and then one cell to the right to end up at E30; the cells should turn blue). Now click the graph icon on the toolbar (or choose 'Chart' from the 'Insert' menu), and look for the Scatter plot (XY Scatter in Excel). Click 'Insert' ('Finish' in Excel) to insert the graph in the spreadsheet.

3. Using a cell as a global variable
a. In formulas you can refer to the value of any cell in the spreadsheet (by specifying its column and row, such as A2 or B4), except the cell itself. So, for instance, cell F3 cannot contain the formula =3*F3^3. Just try it: you'll get an error message. We can, however, accumulate a series of values by referring to the previous cell and adding another value: in cell G1 type the formula =E1, and in cell G2 type the formula =G1 + E2 and press enter (notice that we're referring to the value in the previous row of column G as well as to the value in the same row of column E). Now click cell G2, and drag the small square in the bottom right corner down to cell G30. The value in cell G30 should now be 2345. Verify that indeed G2 = E1+E2, G3 = E1+E2+E3, G4 = E1+E2+E3+E4, etc.
b. If you click any of the cells in column G and look at the formula bar, you again notice that the spreadsheet has been 'smart' in changing the cell references in the formula for us. Sometimes you don't want this, however. For instance, we can also calculate the values in column G as follows: in cell H1 type the command =SUM($E$1:E1) and press enter.
Now click cell H1 once, and drag the small square in the bottom right corner of the selection down to cell H30. The same values as in column G should now appear in column H. Click any of the cells in column H and inspect the formulas in the formula bar. Notice that in the formulas generated by the spreadsheet the $E$1 part of $E$1:E1 didn't change, while the part after the colon did. Dollar signs ($) can (and must) be used to mark a cell reference that must not change when a sequence of formulas is generated.
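
For readers who want to check the spreadsheet results outside the spreadsheet, the columns built above can be sketched in Python. This is only an illustration, not part of the original exercise: the list names d, e, g and h are chosen here to mirror columns D, E, G and H.

```python
from itertools import accumulate

# Column D: the sequence 1, 2, ..., 30.
d = list(range(1, 31))

# Column E: the formula =(D1 - 15)^2 + 3, applied row by row.
e = [(x - 15) ** 2 + 3 for x in d]

# Column G: each row adds the current E value to the previous running total
# (G1 = E1, G2 = G1 + E2, ...), like a relative reference to the row above.
g = list(accumulate(e))

# Column H: =SUM($E$1:E1) dragged down, a sum that always starts at the
# fixed first row and ends at the current row.
h = [sum(e[:k]) for k in range(1, len(e) + 1)]

print(e[0], e[14], e[29])  # 199 3 228
print(g[-1])               # 2345
print(g == h)              # True
```

The two constructions give the same numbers; they differ only in whether each row depends on the row above it (column G) or on a range anchored at the first row (column H).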


Stochastic Simulation

1. Random number generation
a. Think about the following: can a computer generate randomness?
b. One way to generate a sequence of 'random' numbers with a computer (often called pseudo-random numbers) is the following:
   i. Start with an initial number n_0.
   ii. Compute the sequence n_(k+1) = (7^5 * n_k + 1321) mod (2^31 - 1), where x mod a denotes the remainder after subtracting from x the largest multiple of a that does not exceed x.
c. Implement this in a fresh sheet of your spreadsheet as follows:
   i. In cells A1 through A30, generate the sequence 1, 2, 3, ..., 30.
   ii. In cell B1 type the initial number n_0; we'll first use n_0 = 10 (so enter just the number 10 in B1).
   iii. In cell B2 type the formula =MOD(7^5 * B1 + 1321, 2^31 - 1) and press enter. Click cell B2 and drag the small dark square in the bottom right corner of the selected cell down to B30. The value in B30 should now be 166803366. (You may need to make column B a bit wider to be able to read all the digits of the number.) How can I know what random number you get in your spreadsheet?
d. Change the initial value of the sequence (in cell B1) a number of times and observe how the series in column B changes.
e. To generate some more of these 'random' numbers, select the cells A29:B30 (click and hold A29 and drag towards B30; cells A29, B29, A30 and B30 should be blue). Now click and drag the small dark square in the bottom right of the selected range of cells all the way down to B100.
f. To visualize the sequence, select the cell range A1:B100 and insert a scatter plot. As you may observe in the scatter plot, the values in the series change quite wildly, in a seemingly random way. It turns out, however, that this is a bad random number generator for statistical purposes. Fortunately, there are much better random number generators in your spreadsheet program (and in other number-crunching programs), all of which are based on exactly the same principle as the above sequence.

2. Simulating throws with a die
a. To simulate a throw with a 6-faced die, in cell D1 type the command =RANDBETWEEN(1, 6) and press enter. In Excel, if you get a #NAME? error, install the Analysis ToolPak add-in by clicking Add-Ins in the Tools menu and checking the box next to Analysis ToolPak in the popup window. If this doesn't work, you can try the formula =ROUND(6*RAND()+0.5,0) instead.
b. Generate multiple throws by clicking cell D1 once and dragging the small dark square down to cell D100.
c. Make a histogram of the die faces as follows:
   i. In cells E1 through E6 generate the sequence 1, 2, 3, 4, 5, 6.


   ii. First select cells F1:F7 by clicking and holding F1 and dragging down to F7. These cells should now be blue.
   iii. Now type the command =FREQUENCY(D1:D100, E1:E6), then hold down the Ctrl and Shift keys and press enter. (You need to hold Ctrl and Shift while pressing Enter because this is a so-called array formula, which means that an array of values is returned.) A sequence of values should now appear, ending with the value 0. (If this is not the case in Excel, you may need to select F1 and drag the small square down to F7.)
   iv. Select cells E1:F6 (just F1:F6 in Excel) and click the Chart icon in the toolbar. Choose the bar chart, and make sure the values in column E are used as labels for the x-axis. Also set the y-axis range to go from 0 to 25.
d. Is the shape of the histogram as expected? That is, what did you expect, and how does the histogram deviate from it?
e. What accounts for the deviation from what you expected?
f. How could you get it closer to what you expected?

3. Simulating Bernoulli trials
a. RANDBETWEEN generates integers that are uniformly distributed between its first and second argument (in the die experiment, between 1 and 6). To simulate a Bernoulli trial, we want to be able to specify the probability of success. There is no built-in function for generating Bernoulli trials, however. Fortunately we can use another random number generator, RAND, which generates real numbers between 0 and 1 in a uniform fashion: create a fresh spreadsheet. In cell A1 type the formula =RAND() and press enter. By selecting and dragging, generate more random values in the cells A2:A10.
b. We can use this function to generate Bernoulli trials with the following trick: suppose we would like to have success rate 1/3. In cell B1 type the formula =IF(RAND() < 1/3, 1, 0) and press enter. This formula says "If the random number between 0 and 1 returned by RAND is smaller than 1/3, the value is 1; otherwise, the value is 0".
By clicking and dragging, generate Bernoulli observations in cells B2:B100.
c. In cell C1 type the formula =AVERAGE($B$1:B1) and press enter. Select C1 and drag the small dark square in the bottom right of the selected cell down to C100.
d. Explain what is calculated by this sequence (refer back to the questions in the Crash Course in Working with Spreadsheets if necessary). What do you observe in the resulting sequence?
e. Select cells C1:C100 and click the graph icon. Insert a Line Chart. What do you see happening in the chart as the number of values in the averages increases? How does the start of the series differ from the end of the series in terms of variability?
f. Using covariance calculus, try to explain analytically what is happening.
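
The Bernoulli columns from this section can also be sketched in Python, which may help when a longer series than 100 rows is wanted. This is a rough sketch under stated assumptions: Python's random module stands in for the spreadsheet's RAND, the function name running_average is made up for this illustration, and the fixed seed is an arbitrary choice that only serves to make a run repeatable.

```python
import random

def running_average(p, n, seed=0):
    """Simulate n Bernoulli(p) trials and return the running averages.

    The name, the arguments and the seed are illustrative assumptions;
    they do not come from the exercise text.
    """
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    total = 0
    averages = []
    for k in range(1, n + 1):
        trial = 1 if rng.random() < p else 0  # like =IF(RAND() < 1/3, 1, 0)
        total += trial
        averages.append(total / k)            # like =AVERAGE($B$1:B1) in row k
    return averages

avgs = running_average(1 / 3, 100)
print(avgs[:5])   # running averages after the first few trials
print(avgs[-1])   # running average over all 100 trials
```

Calling running_average with a larger n (for example 10000) and plotting the result gives the same kind of line chart as in part e, over a longer horizon.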


4. Simulating test scores
a. A test score is often the number of correctly answered items in a fixed set of items. Suppose a test consists of 4 items, and suppose subject X takes this test and for that person all the items are equally difficult, meaning that for each of the 4 items he has the same probability p of answering the item correctly. Let p = 0.5, and simulate the correctness of the answers on each item for subject X in cells E1, F1, G1 and H1, respectively (1 = correct, 0 = incorrect).
b. Compute the test score for subject X in cell I1 by entering the formula =SUM(E1:H1) and pressing enter.
c. Now suppose we could brainwash subject X, such that he has no recollection of the test or the items, and does not even remember being tested at all. Furthermore, suppose we do this multiple times. Simulate this by selecting cells E1:I1 and dragging the small dark square in the bottom right corner of the selected cells down to row 500 (cell I500; in Google Spreadsheets you may need to add 400 rows at the bottom of the sheet). What do the numbers in column I represent?
d. Make a normalized histogram as follows: in cells K1:K5 type the values 0, 1, 2, 3, 4. Select cells L1:L6, enter the formula =FREQUENCY(I1:I500, K1:K5) and hold the Ctrl and Shift keys down while pressing enter. In cell M1 type the formula =L1/500 and press enter. Select cell M1 and drag the small dark square in the bottom right corner of the selection down to M5. Select M1:M5 and insert a bar chart.
e. Is the normalized bar chart as expected?
f. To redo the simulation press Ctrl-R, and notice how the chart is also updated with the new simulated values.
g. What do you expect this bar chart to look like? What is the name of this distribution?
h. Do parts a–g again (if necessary in fresh sheets), but this time simulate tests with 12, 24 and 48 items, and normalize the test scores by first subtracting n*p and then dividing by SQRT(n*p*(1-p)), where n is the number of items (i.e., 12, 24 or 48) and p is the probability of answering an item correctly. What do you notice about the shape of the distribution of these normalized test scores?
i. In real life, of course, we can't brainwash subjects. We can, however, test multiple subjects, each having their own probability of answering an item correctly. To simulate this, start with a fresh sheet, and in column A generate the sequence 0, 0.005, 0.01, etc., up to 0.995. These will be the probabilities of answering correctly for 200 different subjects. Now simulate answers to the items on a test of 12 items by entering in cell B1 the formula =IF(RAND()<$A1, 1, 0) and pressing enter, and then selecting B1 and dragging the small dark square to the right to cell M1. Then select cells B1:M1 and drag the small dark square down to the 200th row (cell M200).
j. In column N compute the test score for each subject.


k. Make a normalized histogram of the test scores. Does it conform to what you expected in advance?

5. Using RAND to generate observations from other distributions
a. As indicated earlier, RAND() generates real numbers between 0 and 1 (1 not included) with equal probability. That is, it generates from the continuous analogue of the (discrete) uniform distribution that we used in the die simulation (where we used RANDBETWEEN). (Actually, because of computer limitations, RAND also generates from a discrete outcome space, but the spacing between the numbers is so small that most of the time we can safely pretend that it is continuous.) In a fresh sheet, use the function RAND to fill column A with 1000 draws from the continuous uniform distribution, and make a normalized histogram with 16 bins: in cell B1 type the value 0.0625, and in cell B2 type the value 0.125. Select cells B1:B2 and drag the small dark square down to B16. Then select cells C1:C16, type the formula =FREQUENCY(A1:A1000, B1:B16) and hold down the Ctrl and Shift keys while you press Enter. Normalize this frequency histogram in column D and make a bar chart. Is the histogram as expected?
b. In column F generate 500 numbers uniformly from the interval (-2, 2) by multiplying the values in column A by an appropriate constant and adding another appropriate constant. Make a normalized histogram (with 16 bins).
c. In column K generate 1000 numbers by transforming the values in column A: in cell K1 type the formula =-LN(1 - A1), press enter, select cell K1 and drag the small dark square down to K1000. Make a normalized histogram (with 16 bins) and describe the shape of the histogram.
d. In column P generate 1000 numbers by transforming the values in column K as follows: in cell P1 type the formula =1 - EXP(-K1) and press enter. Click P1 and drag the small dark square down to P1000.
   i. What are the minimum and maximum values in column K?
   ii. Create a normalized histogram with 16 bins in the range from 0 to 10.
   iii. Compare the normalized histogram of the values in column P with the histogram of the values in column K. What has changed? Compare the normalized histogram of the values in column A with that of column P. What do you notice?
   iv. Try to explain the result analytically.
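
The transformations in this section can be tried outside the spreadsheet as well. The following is a minimal Python sketch, not the handout's own method: Python's random module stands in for RAND, the seed is arbitrary, and the helper normalized_histogram is a hypothetical function written for this illustration. It counts values up to and including each bin edge, which is how FREQUENCY is used above, but it simply ignores values above the last edge instead of reporting them in an extra cell.

```python
import math
import random

rng = random.Random(1)  # arbitrary fixed seed, assumed here for repeatability

a = [rng.random() for _ in range(1000)]   # column A: =RAND()
k = [-math.log(1 - u) for u in a]         # column K: =-LN(1 - A1)
p = [1 - math.exp(-x) for x in k]         # column P: =1 - EXP(-K1)

def normalized_histogram(values, upper_edges):
    """Return the fraction of values that falls into each bin.

    Bin i holds the values that are <= upper_edges[i] and greater than the
    previous edge. Values above the last edge are not counted.
    """
    counts = [0] * len(upper_edges)
    for v in values:
        for i, edge in enumerate(upper_edges):
            if v <= edge:
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

edges = [(i + 1) / 16 for i in range(16)]  # 0.0625, 0.125, ..., 1.0
hist_a = normalized_histogram(a, edges)

print([round(f, 3) for f in hist_a])  # normalized histogram of column A
print(min(k), max(k))                 # range of the values in column K
```

The same helper can be applied to k and p with suitable bin edges (for instance 16 equally spaced edges up to 10 for k) to reproduce the comparisons asked for in part d.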


Mathematical Statistics Workgroup 2 – exercises