A02 CoCalc and Sage Introduction
CoCalc with Sage combines the capabilities of a word processor and a sophisticated calculator. Read and follow the instructions below to obtain an introduction to CoCalc and Sage and a review of some important mathematical concepts and techniques. You are encouraged to collaborate with other students and to seek assistance from the instructor and others. However, each student must submit their own completed notebook and acknowledge any collaborators, resources used, and assistance received. Note that there are some suggested exercises along the way, but it is only the assignment in the last section that should be submitted for a grade.
Introduction
Notebooks are partitioned into markdown and code cells. To edit a markdown cell: double-click, type the text you want, and press shift-enter to make it look pretty. Lines starting with one to six hash marks (#) are converted into header font sizes. Text surrounded by single underscores or asterisks is emphasized, and text surrounded by two underscores or asterisks is strongly emphasized. More details about the markdown syntax can be obtained by clicking on 'Help' in the menu bar and then clicking on 'Markdown' in the dropdown menu.
Exercise. Insert a new markdown cell below this one and type a random sentence with a word emphasized.
Code cells contain commands to be executed. The following code cell calculates . Execute the command by clicking anywhere in the command and pressing shift-enter.
Several observations can be made. Commands are typed with standard keyboard characters. The characters +, -, *, and / are used for addition, subtraction, multiplication, and division, respectively. Functions are represented with standard names: cos for cosine, log for the natural logarithm, and sqrt for the square root. The arguments to functions are enclosed within parentheses. Finally, the output is an exact value: cannot be written as an integer or rational number and so is left as , and the rational number is left as 17/2 rather than 8.5 because the presence of a decimal point is Sage's way of representing an approximate real number. You can ask Sage to output an approximate real number or 'numerical' result by using the n function. Execute the next code cell and observe the output.
Pretty output can be obtained with the 'show' function. Execute the next code cell and observe the output. (This is the last time you will be explicitly told to do this. Every time you reach a code cell, you should execute it, observe the output, and make sure you understand the meaning of the command and output.)
You may have observed that symbolic expressions in the markdown cells, such as have appeared in the pretty format. This is accomplished through LaTeX enclosed within dollar signs (single for in-line expressions and double for displayed expressions). If you double-click on this markdown cell, you can see the LaTeX used to create the pretty symbolic expression. In LaTeX,
symbolic names are preceded by the backslash character (e.g., \cos),
superscripts are created with base^{superscript enclosed in curly braces},
square roots are created with \sqrt{contents enclosed within curly braces},
fractions are created with \frac{numerator}{denominator}, and
resizable parentheses are created with \left( and \right).
A good summary of LaTeX symbols and commands can be found at https://artofproblemsolving.com/wiki/index.php/LaTeX:Symbols.
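As a small illustration of the constructs listed above, the following made-up expression (it is not taken from this notebook) combines a named function, resizable parentheses, a fraction, a square root, and a superscript. In a markdown cell it would be typed between double dollar signs to be displayed on its own line:

```latex
$$\cos\left(\frac{\pi}{3}\right) + \sqrt{x^{2} + 1}$$
```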
Exercise. Edit this markdown cell to include LaTeX versions of the following symbolic expressions:
a linear equation with slope 3 and vertical intercept 5
the area of a circle with radius
a sinusoidal function with period 5, amplitude 7, and mean 3
a formula for the length of the hypotenuse of a right triangle with leg lengths and
One way to create a new cell is through the 'Insert' menu item. A second way is by pressing the a or b keys while in command mode (the cursor is not where a cell can be edited and the cell is framed in blue and gray instead of green). By default, a new code cell is created. It can be changed to a markdown cell with the dropdown found immediately below the menu bar.
Exercise. Insert a code cell below. Use it to calculate the area of a circle with radius 7.3 centimeters. Insert a markdown cell below the calculation, and write a sentence to summarize your result. Be sure to use an appropriate number of digits and include appropriate units.
Exercise. Insert two code cells below. Use them to calculate exactly and approximately the length of the hypotenuse of a right triangle with side lengths 7 and 8. Insert a markdown cell below the calculations, and write a sentence to summarize your results.
We close this section with illustrations of other ways Sage is different from a standard calculator. Try calculating 100! or finding 200 decimal digits of with a calculator versus the output of the following commands.
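For comparison, exact big-integer arithmetic is also available in plain Python, on which Sage is built. The sketch below is not the Sage cell from this notebook. It only shows that 100! can be computed exactly, where a typical calculator would round or overflow.

```python
import math

# Python integers have arbitrary precision, so 100! is computed exactly.
factorial_100 = math.factorial(100)

print(factorial_100)
print(len(str(factorial_100)), "digits")
```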
Algebra
In high school algebra, you learned how to expand and factor symbolic expressions and solve equations and inequalities. Sage can carry out such symbol manipulations.
In Sage, numbers can be assigned to names (which computer programmers call "variables"). The following code
assigns to the name radius the number 17,
assigns to the name volume the number (4/3)*pi*17^3,
assigns to the name avolume the numerical approximation of the number assigned to volume, and
outputs a list containing what has been assigned to the three names.
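The same steps can be sketched in plain Python, which behaves a little differently from Sage: powers are written with ** rather than ^, and there is no exact symbolic pi in the standard library. As an assumption made only for this sketch, the exact part of the volume is therefore stored as a rational multiple of pi under the hypothetical name volume_over_pi.

```python
from fractions import Fraction
import math

radius = 17

# Exact rational coefficient of pi in (4/3)*pi*17^3.
# Plain Python writes powers with **; the ^ operator means something else here.
volume_over_pi = Fraction(4, 3) * radius**3

# Floating-point approximation of the volume itself.
avolume = float(volume_over_pi) * math.pi

print([radius, volume_over_pi, avolume])
```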
Names can also be used as symbols (which mathematicians call "variables"). The following code
makes the names a, b, c, and x act as symbols, and
outputs a list containing the symbol a and the number assigned to radius.
Notice what happens when we try to output a name that has neither been assigned a number nor been made a symbol.
Symbolic expressions can be expanded.
Symbolic expressions can be factored.
Symbolic equations can be solved. Since a single equals sign is used for assignment, two equals signs are used to represent the equals sign in an equation. All solutions are displayed in a list.
The next code finds solutions in to the equation .
Here is how to solve the following system of equations:
Sage cannot find exact solutions to the equation .
In such a situation, an approximate solution within an interval can be sought. The next three code cells look for an approximate solution for the equation on each of the intervals , , and .
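To make the idea of seeking an approximate solution within an interval concrete, here is a minimal bisection sketch in plain Python. It is not necessarily the method Sage's find_root uses, and the equation cos(x) = x is a stand-in chosen for illustration rather than the equation from this notebook. The helper name bisect_root is hypothetical.

```python
import math

def bisect_root(func, left, right, tolerance=1e-10):
    """Approximate a root of func on [left, right] by repeated halving.

    Assumes func is continuous and that func(left) and func(right)
    have opposite signs (or one of them is zero).
    """
    f_left, f_right = func(left), func(right)
    if f_left * f_right > 0:
        raise ValueError("no sign change on the interval")
    while right - left > tolerance:
        middle = (left + right) / 2
        f_middle = func(middle)
        if f_left * f_middle <= 0:
            right = middle                   # a root lies in [left, middle]
        else:
            left, f_left = middle, f_middle  # a root lies in [middle, right]
    return (left + right) / 2

# Stand-in equation cos(x) = x, rewritten as cos(x) - x = 0.
root = bisect_root(lambda x: math.cos(x) - x, 0, 1)
print(root)
```

An interval with no sign change, such as [2, 3] for this stand-in equation, raises an error instead of returning a misleading value. This mirrors why a different interval must be tried when no solution is found on the first one.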
The next code cell calculates that the solution to the inequality is the set .
Exercise. Simple Bank offers a 9% annual rate of interest without compounding. Compound Bank offers a 7% annual rate compounded annually. If $100 is deposited into each bank, when will the accumulated amount in Compound Bank overtake the accumulated amount in Simple Bank? Include both one or more code cells with your computations and a final markdown cell with your answer expressed in a complete sentence.
Exercise. One widget contains 3 bolts and 4 links. One framingstan contains 5 bolts and 2 links. If 172 bolts and 122 links were used, how many widgets and framingstans were built? Include both one or more code cells with your computations and a final markdown cell with your answer expressed in a complete sentence.
Exercise. Where does the circle of radius 3 and center at the origin intersect with the line passing through the points and ? Include both one or more code cells with your computations and a final markdown cell with your answer expressed in a complete sentence.
Exercise. Describe the difference between what computer programmers and mathematicians call variables.
Functions and Plots
A function is a rule that yields an output for each possible input. The set of possible inputs is called the domain of the function, and the set of actual outputs is called the range of the function. In calculus, we typically define functions symbolically and assume that the domain consists of all real numbers for which the symbolic expression makes sense. For example, defines a function on the domain of all real numbers and Symbolic functions can be defined, evaluated, and graphed in Sage in a natural manner.
Observe that f is the function and f(x) is a symbolic expression for the rule.
Exercise. Approximate to two decimal places all solutions to the equation . Hint: Argue why all solutions must lie in the closed interval . Graph over the interval . Finally, use find_root and state your answer.
The piecewise defined symbolic function is defined in Sage in the following manner. The RealSet methods are used to avoid mismatched parentheses and square brackets which would be a syntax error. To improve readability, the 'single' line of code is broken into three lines with the continuation marker (\) telling Sage to ignore the line break.
The following graph shows that Sage does not simply plot equally spaced points that are then connected by lines. The graph of for should oscillate an infinite number of times as approaches 0, and Sage has done a reasonable job of representing that behavior. Nonetheless, Sage does connect points it adaptively chooses to plot. This means it will connect two points on either side of , resulting in a nearly vertical line near . The exclude option tells the plot command not to connect the points plotted immediately to the left and right of 1.
Exercise. In addition to exclude, there are many additional options available to enhance a graph. A good way to learn about a Sage function is to type its name followed by a question mark (?). Execute the following cell and find information about the plot function. Use this information to create a graph of on the interval which is green, only shows the y values between -2 and 2, includes a title, and includes axes labels. Feel free to experiment with additional options.
In addition to symbolic functions, Sage allows functions to be defined using Python code. The next code cell redefines the function by the symbolic expression , and then defines to be a pseudo inverse of . More precisely, if , then is the value of satisfying , and if , then is the value of satisfying . (It might be helpful to look at the graph of obtained earlier when trying to read and understand the previous sentence.)
A Python function can be evaluated in the same manner as a symbolic function.
As we have seen previously, the plot function takes as its first argument the expression of a symbolic function (e.g., f(x)); however, for a Python function, the first argument must be the name of the function (e.g., g).
Calculus
Given a function, we can calculate its symbolic derivative and its derivative at a particular number.
Given an expression, we can calculate its indefinite integral and any definite integral.
Fitting a Functional Model to Bivariate Data
Eight Ponderosa pine trees had their circumference at waist height measured in inches. The trees were then chopped down, and the amount of usable lumber from each was measured in board-feet. These measurements are given in the following table. We can store this data in Sage using the following cell.
Observe that the zip function puts together the data from the two separate lists into a single list of ordered pairs.
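A minimal illustration of zip, using short hypothetical lists rather than the Ponderosa pine measurements (the names circumferences and lumber are assumptions made for this sketch):

```python
# Hypothetical measurements, not the Ponderosa pine data from the table.
circumferences = [10, 20, 30]
lumber = [5, 40, 135]

# zip pairs the first entries, then the second entries, and so on.
pairs = list(zip(circumferences, lumber))
print(pairs)  # [(10, 5), (20, 40), (30, 135)]
```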
We are most interested in the relationship (if any) between the amount of usable lumber in a Ponderosa pine and its circumference at waist height. If there is a strong relationship, then we could use that relationship to predict usable lumber in a forest simply by measuring the circumference of the trees at waist height. A scatter plot of the data shows that there is a strong positive relationship between usable lumber and circumference at waist height.
A close look at the usable lumber vs. circumference scatter plot of the Ponderosa pine tree data suggests a parabola. The following code chooses the parameters , , and that minimize the root mean square error among all models of the form where is the usable lumber in board-feet and is the circumference at waist height in inches. It then displays the best fit model, a graph of the data and best fit model, and the mean, root mean square, and maximum errors.
You should make sure you have a basic understanding of the code. One way to do that is to examine some of the names that were created. For example, the following code outputs the errors list. Comparing these numbers to the above graph, you can hopefully see that they are the vertical distances between the model curve and data point for each of the eight data points.
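The three error summaries can be sketched in plain Python as follows. The data and the model here are hypothetical stand-ins, and treating each error as an absolute (unsigned) vertical distance is an assumption. The notebook's own errors list may be signed.

```python
import math

# Hypothetical data and model, chosen only to illustrate the error summaries.
data = [(1, 2.5), (2, 3.5), (3, 7.0), (4, 8.0)]

def model(x):
    return 2 * x  # stand-in for a fitted model

# Vertical distance between each data point and the model curve.
errors = [abs(y - model(x)) for x, y in data]

mean_error = sum(errors) / len(errors)
rms_error = math.sqrt(sum(e**2 for e in errors) / len(errors))
max_error = max(errors)

print(errors, mean_error, rms_error, max_error)
```

For any list of errors, the mean error is at most the root mean square error, which in turn is at most the maximum error, so the three numbers give an increasingly pessimistic view of how far predictions may be off.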
Visually, the model fits the data fairly well. The root mean square error of 123 board-feet and maximum error of 210 board-feet give us a sense of the precision of any predictions made by the best fit model . One downside of this model is that it is somewhat difficult to provide a meaningful verbal description of it.
You will never need to come up with the above code from memory. You can always copy, paste, and appropriately modify it for other situations. For example, I have done that below where there was no need to repeat the first few lines and then the only change was to use a power function model .
I prefer this power model to the quadratic model because
the root mean square error is smaller (119 vs. 123),
fewer parameters were used (2 vs. 3), and
there are natural interpretations.
The parameters in the power model have standard verbal descriptions:
The 1.5 can be described as the board-feet of usable lumber from a tree with 1 inch circumference. Of course, no tree in the sample was that small, and so it would be better to give meaning to the 1.5 parameter value with a larger tree. For example, we could say that the model predicts that a tree having a 5 inch circumference would have board-feet.
Notice that according to the best fit model, doubling the circumference of a tree increases the amount of usable lumber times. This is a standard invariant to report: a doubling of the circumference increases the usable lumber by a factor of 7.5.
More qualitatively, the model is almost a cubic relationship. Since usable lumber is a volume measurement and circumference is a linear measurement, it is reasonable to think that usable lumber would be proportional to the cube of the circumference.
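The doubling factor depends only on the exponent of a power model: if the model has the form a*x^b (notation assumed here), then replacing x by 2x multiplies the output by 2^b, whatever the values of a and x. The sketch below uses that fact to recover the exponent implied by the reported factor of 7.5 and to compare it with a pure cubic. The function name scale_factor is hypothetical.

```python
import math

def scale_factor(exponent, multiplier=2.0):
    """Factor by which a*x**exponent grows when x is multiplied by multiplier."""
    return multiplier ** exponent

# Exponent implied by the reported doubling factor of 7.5.
implied_exponent = math.log2(7.5)

print(implied_exponent)   # a little under 3
print(scale_factor(3))    # a pure cubic would give a factor of 8
```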
Importing and Describing Grouped Data
TwoProcesses.csv contains measurements of an old process and a new process. Import this data into the variables old and new.
Two ways of comparing central tendency are the mean and median.
Two ways of comparing spread are the standard deviation and interquartile range. The latter statistic is not built into Sage but can be found in a standard Python library.
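As a rough sketch of these four summaries, the following uses Python's statistics module on two short hypothetical samples. This is not necessarily the library the notebook relies on, and different libraries use slightly different quartile conventions, so interquartile ranges may not match exactly. The helper name summarize is an assumption.

```python
import statistics

# Hypothetical samples; the real values live in TwoProcesses.csv.
old = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9]
new = [11.0, 11.4, 10.9, 11.2, 11.1]

def summarize(sample):
    q1, _, q3 = statistics.quantiles(sample, n=4)  # the three quartile cut points
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation
        "iqr": q3 - q1,
    }

old_summary = summarize(old)
new_summary = summarize(new)
print(old_summary)
print(new_summary)
```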
Sage has a built-in histogram function, but it does not seem to work with two data sets of different sizes, and Sage does not have a built-in boxplot. Instead we make use of matplotlib to obtain a boxplot and a histogram.
Error Propagation
A random variable can be thought of as a black box from which a number is returned each time it is queried. The number returned on any particular query is unknown; however, the distribution of numbers after a large number of queries is known. For example, here is how to define a random variable that simulates a fair six-sided die.
Query the random variable 100 times (i.e., simulate rolling a fair six-sided die 100 times) and display a histogram of the results.
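The same simulation can be sketched with Python's standard random module instead of Sage's random variable tools. The fixed seed is an assumption added so that the sketch is repeatable, and the histogram is drawn as text.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable; remove it to vary runs

rolls = [random.randint(1, 6) for _ in range(100)]  # 100 simulated rolls
counts = Counter(rolls)

# Text histogram: one '#' per occurrence of each face.
for face in range(1, 7):
    print(face, "#" * counts[face])
```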
Exercise. Execute the previous input cell several times. What do you observe? Change from 100 to 900 rolls in the previous input cell (maybe end the first line with a semicolon so that the 900 numbers are not displayed) and execute several times. What do you observe?
The accuracy and precision of a measurement can often be modeled as a random variable having a normal distribution. Assume that the floor of a room is a rectangle, the length is 425 inches but actual measurements could vary with a standard deviation of 1 inch, and the width is 296 inches but actual measurements could vary with a standard deviation of 3 inches. Generate 900 simulated measurements of the length and width of the room.
The mean and standard deviation of the length measurements should be near the true mean and standard deviation. The histogram of the length measurements should be symmetric about the mean and bell shaped with the inflection points about one standard deviation away from the mean.
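A plain-Python sketch of this simulation, using random.gauss in place of Sage's tools, is shown below. The seed and the names lengths and widths are assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # repeatable; an arbitrary choice

# 900 simulated length measurements: true length 425 in, standard deviation 1 in.
lengths = [random.gauss(425, 1) for _ in range(900)]
# 900 simulated width measurements: true width 296 in, standard deviation 3 in.
widths = [random.gauss(296, 3) for _ in range(900)]

print(statistics.mean(lengths), statistics.stdev(lengths))
print(statistics.mean(widths), statistics.stdev(widths))
```

With 900 samples, the sample means and standard deviations should land close to, but not exactly on, the true values.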
The empirical rule states that about 68%, 95%, and 99.7% of the distribution is within one, two, and three standard deviations of the mean, respectively.
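The percentages in the empirical rule can be checked for an exactly normal distribution: the fraction within k standard deviations of the mean equals erf(k/sqrt(2)). A short sketch, with the hypothetical helper name fraction_within:

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
```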
If and are constants and and are independent random variables, then and . Undergraduate science courses usually use the rules that multiplying by a constant multiplies the error by a constant and adding two measurements adds the absolute errors. This latter rule is equivalent to assuming that . While this may seem like an algebraic error we reprimand students for making, it does simplify the rule and provides a conservative overestimate for the error.
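The difference between the two rules for sums can be seen in a small simulation. The sketch below adds two independent normal measurements, borrowing the standard deviations 1 and 3 from the room example, and compares the simulated standard deviation of the sum with the square root of the sum of squares (the probability-theory result for independent variables) and with the plain sum (the rule of thumb). The seed and sample size are arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(2)  # repeatable; an arbitrary choice

sd_x, sd_y = 1.0, 3.0  # standard deviations borrowed from the room example
sums = [random.gauss(425, sd_x) + random.gauss(296, sd_y) for _ in range(10000)]

simulated_sd = statistics.stdev(sums)
quadrature_sd = math.sqrt(sd_x**2 + sd_y**2)  # probability-theory rule
simple_sum_sd = sd_x + sd_y                   # rule-of-thumb overestimate

print(simulated_sd, quadrature_sd, simple_sum_sd)
```

The simulated value should sit near the quadrature value of about 3.16, while the rule of thumb gives 4, an overestimate on the safe side.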
If and are independent random variables, then and Undergraduate science courses usually use the rule that multiplying two measurements adds the relative errors. This latter rule is equivalent to assuming that Again, this simpler rule provides a reasonable overestimate for the error.
Whether you know the simple rules of thumb for error propagation for sums, differences, products, and quotients, or the more accurate rules from probability theory, you will be stuck if you are using more complicated formulas. For example, to measure the vertical height of a tall object, you can measure the angle between the horizontal and the top of the object at a point a known horizontal distance from the tall object. The height is then given by the formula . Suppose feet and degrees. What is meant by ? Usually it means that we have a high degree of confidence that the true measurement is within of the stated measurement. If our level of confidence is 95%, then according to the empirical rule, is roughly 2 times the standard deviation of the random variable representing the possible measurements. In general, means that is a constant multiple of the standard deviation, and so our previous approach of randomly generating measurements can again be applied.
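The simulation approach can be sketched in plain Python as follows. All numbers here are hypothetical and are not the values from this notebook: a distance of 100 ± 2 feet and an angle of 30 ± 1 degrees, with each ± read as two standard deviations. The height is computed as distance times the tangent of the angle, in notation assumed for this sketch.

```python
import math
import random
import statistics

random.seed(3)  # repeatable; an arbitrary choice

# Hypothetical measurements: distance 100 ± 2 feet, angle 30 ± 1 degrees,
# with each ± read as two standard deviations (95% confidence).
distance_mean, distance_sd = 100.0, 2.0 / 2
angle_mean, angle_sd = 30.0, 1.0 / 2

heights = []
for _ in range(10000):
    distance = random.gauss(distance_mean, distance_sd)
    angle = math.radians(random.gauss(angle_mean, angle_sd))  # tan expects radians
    heights.append(distance * math.tan(angle))

height_mean = statistics.mean(heights)
height_error = 2 * statistics.stdev(heights)  # back to a 95%-style ± value
print(height_mean, height_error)
```

No closed-form propagation rule is needed: the spread of the simulated heights directly estimates the uncertainty in the computed height for these made-up inputs.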
A reasonable conclusion is that the height of the object is feet.
Assignment
Solve the following two problems in a new notebook. You should include a title or subtitle with your name at the top of the notebook, and each problem should be in its own section. Include all relevant calculations and explanations (but no more).
Problem 1. Display a graph of the function and the line tangent to the graph of when . Choose the domain of your graph to clearly show the point of tangency and the next intersection of the graph of and the tangent line. Find the area (to at least three significant digits) enclosed by the graph of and the tangent line between the point of tangency and the next intersection point.
Problem 2. Suppose is a normally distributed random variable having a mean of and standard deviation of . Suppose further that . What are the mean and standard deviation of ? What is the relationship for an arbitrary mean and standard deviation ?