Using JupyterLab
Overview
Teaching: 0 min
Exercises: 0 minQuestions
How can I use JupyterLab for Climate Data Analysis
Objectives
Learn key features of JupyterLab to use for Climate Data Analysis
JupyterLab (Jupyter Notebook)
JupyterLab allows us to create Jupyter Notebooks which can contain a combination of code, figures, links, Formatted text, and latex LaTex equations.
It is also a web-based programming interface for mutiple languages (e.g. Python, R, Matlab). We will use it as our Python programming interface.
Because it also allows figures, links, test, and equations in addition to code, it is very useful for use in research allowing all your information related to your research to be kept together rather than in separate documents.
Creating and working with a Jupyter Notebook
First, let’s create a new Jupyter Notebook by clicking File
->New
->Noteboo
or clicking on Python 3
This creates a new notebook with a default title Untitled.ipynb
or Untitled#.ipynb
. Note that Jupyter Notebooks end in .ipynb
Change the name of your notebook to PracticeNotebook.ipynb
by clicking File
->Save Notebook As
The rectangular box in your notebook is called a cell
it contains a block of code
, Markdown
, or Raw
text. What a cell
contains is indicated in te menu above.
To determine what kind of Code
a cell contains for a given notebook, the kernel
is shown in the upper right. This notebook contains Python 3
code.
What is Markdown?
Markdown is a formatting language that allows you to provide formatted text (e.g. bold, italics, links, different sized font, and LaTeX equations.
As an example, let’s type the following in a cell and change the cell to Markdown:
CLIM 680 Practice Notebook
by Kathy Pegion
for class
We can insert
LaTeX
equationsThe equation for the mean is given by: \begin{equation} \mu_n=\sum_{i=1}^{N}X \end{equation}
We can link to papers
The analysis in this notebook follows, Pegion et al. 2019
We can make a numbered list; This notebook will:
- First thing
- Second thing
- Third thing
We can make a bulleted list. Important things for this notebook are:
- something important
- something else important
Key Points
JupyterLab can be used as a Python programming environment
You can create notebooks with codes, figures, links, text, and equations
You can run your codes in JupyterLab cell by cell
Repeating Actions with Loops
Overview
Teaching: 30 min
Exercises: 0 minQuestions
How can I do the same operations on many different values?
Objectives
Explain what a
for
loop does.Correctly write
for
loops to repeat simple calculations.Trace changes to a loop variable as the loop runs.
Trace changes to other variables as they are updated by a
for
loop.
In the last episode, we wrote Python code that plots values of interest from our first
inflammation dataset (inflammation-01.csv
), which revealed some suspicious features in it.
We have a dozen data sets right now, though, and more on the way. We want to create plots for all of our data sets with a single statement. To do that, we’ll have to teach the computer how to repeat things.
An example task that we might want to repeat is printing each character in a word on a line of its own.
word = 'lead'
In Python, a string is basically an ordered collection of characters, and every
character has a unique number associated with it – its index. This means that
we can access characters in a string using their indices.
For example, we can get the first character of the word 'lead'
, by using
word[0]
. One way to print each character is to use four print
statements:
print(word[0])
print(word[1])
print(word[2])
print(word[3])
l
e
a
d
This is a bad approach for three reasons:
-
Not scalable. Imagine you need to print characters of a string that is hundreds of letters long. It might be easier to type them in manually.
-
Difficult to maintain. If we want to decorate each printed character with an asterisk or any other character, we would have to change four lines of code. While this might not be a problem for short strings, it would definitely be a problem for longer ones.
-
Fragile. If we use it with a word that has more characters than what we initially envisioned, it will only display part of the word’s characters. A shorter string, on the other hand, will cause an error because it will be trying to display part of the string that doesn’t exist.
word = 'tin'
print(word[0])
print(word[1])
print(word[2])
print(word[3])
t
i
n
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-7974b6cdaf14> in <module>()
3 print(word[1])
4 print(word[2])
----> 5 print(word[3])
IndexError: string index out of range
Here’s a better approach:
word = 'lead'
for char in word:
print(char)
l
e
a
d
This is shorter — certainly shorter than something that prints every character in a hundred-letter string — and more robust as well:
word = 'oxygen'
for char in word:
print(char)
o
x
y
g
e
n
The improved version uses a for loop to repeat an operation — in this case, printing — once for each thing in a sequence. The general form of a loop is:
for variable in collection:
# do things using variable, such as print
Using the oxygen example above, the loop might look like this:
where each character (char
) in the variable word
is looped through and printed one character
after another. The numbers in the diagram denote which loop cycle the character was printed in (1
being the first loop, and 6 being the final loop).
We can call the loop variable anything we like, but
there must be a colon at the end of the line starting the loop, and we must indent anything we
want to run inside the loop. Unlike many other languages, there is no command to signify the end
of the loop body (e.g. end for
); what is indented after the for
statement belongs to the loop.
What’s in a name?
In the example above, the loop variable was given the name
char
as a mnemonic; it is short for ‘character’. We can choose any name we want for variables. We can even call our loop variablebanana
, as long as we use this name consistently:word = 'oxygen' for banana in word: print(banana)
o x y g e n
It is a good idea to choose variable names that are meaningful, otherwise it would be more difficult to understand what the loop is doing.
Here’s another loop that repeatedly updates a variable:
length = 0
for vowel in 'aeiou':
length = length + 1
print('There are', length, 'vowels')
There are 5 vowels
It’s worth tracing the execution of this little program step by step.
Since there are five characters in 'aeiou'
,
the statement on line 3 will be executed five times.
The first time around,
length
is zero (the value assigned to it on line 1)
and vowel
is 'a'
.
The statement adds 1 to the old value of length
,
producing 1,
and updates length
to refer to that new value.
The next time around,
vowel
is 'e'
and length
is 1,
so length
is updated to be 2.
After three more updates,
length
is 5;
since there is nothing left in 'aeiou'
for Python to process,
the loop finishes
and the print
statement on line 4 tells us our final answer.
Note that a loop variable is a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well:
letter = 'z'
for letter in 'abc':
print(letter)
print('after the loop, letter is', letter)
a
b
c
after the loop, letter is c
Note also that finding the length of a string is such a common operation
that Python actually has a built-in function to do it called len
:
print(len('aeiou'))
5
len
is much faster than any function we could write ourselves,
and much easier to read than a two-line loop;
it will also give us the length of many other things that we haven’t met yet,
so we should always use it when we can.
From 1 to N
Python has a built-in function called
range
that generates a sequence of numbers.range
can accept 1, 2, or 3 parameters.
- If one parameter is given,
range
generates a sequence of that length, starting at zero and incrementing by 1. For example,range(3)
produces the numbers0, 1, 2
.- If two parameters are given,
range
starts at the first and ends just before the second, incrementing by one. For example,range(2, 5)
produces2, 3, 4
.- If
range
is given 3 parameters, it starts at the first one, ends just before the second one, and increments by the third one. For example,range(3, 10, 2)
produces3, 5, 7, 9
.Using
range
, write a loop that usesrange
to print the first 3 natural numbers:1 2 3
Solution
for number in range(1, 4): print(number)
Understanding the loops
Given the following loop:
word = 'oxygen' for char in word: print(char)
How many times is the body of the loop executed?
- 3 times
- 4 times
- 5 times
- 6 times
Solution
The body of the loop is executed 6 times.
Computing Powers With Loops
Exponentiation is built into Python:
print(5 ** 3)
125
Write a loop that calculates the same result as
5 ** 3
using multiplication (and without exponentiation).Solution
result = 1 for number in range(0, 3): result = result * 5 print(result)
Reverse a String
Knowing that two strings can be concatenated using the
+
operator, write a loop that takes a string and produces a new string with the characters in reverse order, so'Newton'
becomes'notweN'
.Solution
newstring = '' oldstring = 'Newton' for char in oldstring: newstring = char + newstring print(newstring)
Computing the Value of a Polynomial
The built-in function
enumerate
takes a sequence (e.g. a list) and generates a new sequence of the same length. Each element of the new sequence is a pair composed of the index (0, 1, 2,…) and the value from the original sequence:for idx, val in enumerate(a_list): # Do something using idx and val
The code above loops through
a_list
, assigning the index toidx
and the value toval
.Suppose you have encoded a polynomial as a list of coefficients in the following way: the first element is the constant term, the second element is the coefficient of the linear term, the third is the coefficient of the quadratic term, etc.
x = 5 coefs = [2, 4, 3] y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2 print(y)
97
Write a loop using
enumerate(coefs)
which computes the valuey
of any polynomial, givenx
andcoefs
.Solution
y = 0 for idx, coef in enumerate(coefs): y = y + coef * x**idx
Key Points
Use
for variable in sequence
to process the elements of a sequence one at a time.The body of a
for
loop must be indented.Use
len(thing)
to determine the length of something that contains other values.
Storing Multiple Values in Lists
Overview
Teaching: 30 min
Exercises: 15 minQuestions
How can I store many values together?
Objectives
Explain what a list is.
Create and index lists of simple values.
Change the values of individual elements
Append values to an existing list
Reorder and slice list elements
Create and manipulate nested lists
Similar to a string that can contain many characters, a list is a container that can store many values. Unlike NumPy arrays, lists are built into the language (so we don’t have to load a library to use them). We create a list by putting values inside square brackets and separating the values with commas:
odds = [1, 3, 5, 7]
print('odds are:', odds)
odds are: [1, 3, 5, 7]
We can access elements of a list using indices – numbered positions of elements in the list. These positions are numbered starting at 0, so the first element has an index of 0.
print('first element:', odds[0])
print('last element:', odds[3])
print('"-1" element:', odds[-1])
first element: 1
last element: 7
"-1" element: 7
Yes, we can use negative numbers as indices in Python. When we do so, the index -1
gives us the
last element in the list, -2
the second to last, and so on.
Because of this, odds[3]
and odds[-1]
point to the same element here.
If we loop over a list, the loop variable is assigned to its elements one at a time:
for number in odds:
print(number)
1
3
5
7
There is one important difference between lists and strings: we can change the values in a list, but we cannot change individual characters in a string. For example:
names = ['Curie', 'Darwing', 'Turing'] # typo in Darwin's name
print('names is originally:', names)
names[1] = 'Darwin' # correct the name
print('final value of names:', names)
names is originally: ['Curie', 'Darwing', 'Turing']
final value of names: ['Curie', 'Darwin', 'Turing']
works, but:
name = 'Darwin'
name[0] = 'd'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-220df48aeb2e> in <module>()
1 name = 'Darwin'
----> 2 name[0] = 'd'
TypeError: 'str' object does not support item assignment
does not.
Ch-Ch-Ch-Ch-Changes
Data which can be modified in place is called mutable, while data which cannot be modified is called immutable. Strings and numbers are immutable. This does not mean that variables with string or number values are constants, but when we want to change the value of a string or number variable, we can only replace the old value with a completely new value.
Lists and arrays, on the other hand, are mutable: we can modify them after they have been created. We can change individual elements, append new elements, or reorder the whole list. For some operations, like sorting, we can choose whether to use a function that modifies the data in-place or a function that returns a modified copy and leaves the original unchanged.
Be careful when modifying data in-place. If two variables refer to the same list, and you modify the list value, it will change for both variables!
salsa = ['peppers', 'onions', 'cilantro', 'tomatoes'] my_salsa = salsa # <-- my_salsa and salsa point to the *same* list data in memory salsa[0] = 'hot peppers' print('Ingredients in my salsa:', my_salsa)
Ingredients in my salsa: ['hot peppers', 'onions', 'cilantro', 'tomatoes']
If you want variables with mutable values to be independent, you must make a copy of the value when you assign it.
salsa = ['peppers', 'onions', 'cilantro', 'tomatoes'] my_salsa = list(salsa) # <-- makes a *copy* of the list salsa[0] = 'hot peppers' print('Ingredients in my salsa:', my_salsa)
Ingredients in my salsa: ['peppers', 'onions', 'cilantro', 'tomatoes']
Because of pitfalls like this, code which modifies data in place can be more difficult to understand. However, it is often far more efficient to modify a large data structure in place than to create a modified copy for every small change. You should consider both of these aspects when writing your code.
Nested Lists
Since a list can contain any Python variables, it can even contain other lists.
For example, we could represent the products in the shelves of a small grocery shop:
x = [['pepper', 'zucchini', 'onion'], ['cabbage', 'lettuce', 'garlic'], ['apple', 'pear', 'banana']]
Here is a visual example of how indexing a list of lists
x
works:Using the previously declared list
x
, these would be the results of the index operations shown in the image:print([x[0]])
[['pepper', 'zucchini', 'onion']]
print(x[0])
['pepper', 'zucchini', 'onion']
print(x[0][0])
'pepper'
Thanks to Hadley Wickham for the image above.
Heterogeneous Lists
Lists in Python can contain elements of different types. Example:
sample_ages = [10, 12.5, 'Unknown']
There are many ways to change the contents of lists besides assigning new values to individual elements:
odds.append(11)
print('odds after adding a value:', odds)
odds after adding a value: [1, 3, 5, 7, 11]
removed_element = odds.pop(0)
print('odds after removing the first element:', odds)
print('removed_element:', removed_element)
odds after removing the first element: [3, 5, 7, 11]
removed_element: 1
odds.reverse()
print('odds after reversing:', odds)
odds after reversing: [11, 7, 5, 3]
While modifying in place, it is useful to remember that Python treats lists in a slightly counter-intuitive way.
As we saw earlier, when we modified the salsa
list item in-place, if we make a list, (attempt to) copy it and then modify this list, we can cause all sorts of trouble. This also applies to modifying the list using the above functions:
odds = [1, 3, 5, 7]
primes = odds
primes.append(2)
print('primes:', primes)
print('odds:', odds)
primes: [1, 3, 5, 7, 2]
odds: [1, 3, 5, 7, 2]
This is because Python stores a list in memory, and then can use multiple names to refer to the
same list. If all we want to do is copy a (simple) list, we can again use the list
function, so we do
not modify a list we did not mean to:
odds = [1, 3, 5, 7]
primes = list(odds)
primes.append(2)
print('primes:', primes)
print('odds:', odds)
primes: [1, 3, 5, 7, 2]
odds: [1, 3, 5, 7]
Turn a String Into a List
Use a for-loop to convert the string “hello” into a list of letters:
['h', 'e', 'l', 'l', 'o']
Hint: You can create an empty list like this:
my_list = []
Solution
my_list = [] for char in 'hello': my_list.append(char) print(my_list)
Subsets of lists and strings can be accessed by specifying ranges of values in brackets, similar to how we accessed ranges of positions in a NumPy array. This is commonly referred to as “slicing” the list/string.
binomial_name = 'Drosophila melanogaster'
group = binomial_name[0:10]
print('group:', group)
species = binomial_name[11:23]
print('species:', species)
chromosomes = ['X', 'Y', '2', '3', '4']
autosomes = chromosomes[2:5]
print('autosomes:', autosomes)
last = chromosomes[-1]
print('last:', last)
group: Drosophila
species: melanogaster
autosomes: ['2', '3', '4']
last: 4
Slicing From the End
Use slicing to access only the last four characters of a string or entries of a list.
string_for_slicing = 'Observation date: 02-Feb-2013' list_for_slicing = [['fluorine', 'F'], ['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
'2013' [['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
Would your solution work regardless of whether you knew beforehand the length of the string or list (e.g. if you wanted to apply the solution to a set of lists of different lengths)? If not, try to change your approach to make it more robust.
Hint: Remember that indices can be negative as well as positive
Solution
Use negative indices to count elements from the end of a container (such as list or string):
string_for_slicing[-4:] list_for_slicing[-4:]
Non-Continuous Slices
So far we’ve seen how to use slicing to take single blocks of successive entries from a sequence. But what if we want to take a subset of entries that aren’t next to each other in the sequence?
You can achieve this by providing a third argument to the range within the brackets, called the step size. The example below shows how you can take every third entry in a list:
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37] subset = primes[0:12:3] print('subset', subset)
subset [2, 7, 17, 29]
Notice that the slice taken begins with the first entry in the range, followed by entries taken at equally-spaced intervals (the steps) thereafter. If you wanted to begin the subset with the third entry, you would need to specify that as the starting point of the sliced range:
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37] subset = primes[2:12:3] print('subset', subset)
subset [5, 13, 23, 37]
Use the step size argument to create a new string that contains only every other character in the string “In an octopus’s garden in the shade”. Start with creating a variable to hold the string:
beatles = "In an octopus's garden in the shade"
What slice of
beatles
will produce the following output (i.e., the first character, third character, and every other character through the end of the string)?I notpssgre ntesae
Solution
To obtain every other character you need to provide a slice with the step size of 2:
beatles[0:35:2]
You can also leave out the beginning and end of the slice to take the whole string and provide only the step argument to go every second element:
beatles[::2]
If you want to take a slice from the beginning of a sequence, you can omit the first index in the range:
date = 'Monday 4 January 2016'
day = date[0:6]
print('Using 0 to begin range:', day)
day = date[:6]
print('Omitting beginning index:', day)
Using 0 to begin range: Monday
Omitting beginning index: Monday
And similarly, you can omit the ending index in the range to take a slice to the very end of the sequence:
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
sond = months[8:12]
print('With known last position:', sond)
sond = months[8:len(months)]
print('Using len() to get last entry:', sond)
sond = months[8:]
print('Omitting ending index:', sond)
With known last position: ['sep', 'oct', 'nov', 'dec']
Using len() to get last entry: ['sep', 'oct', 'nov', 'dec']
Omitting ending index: ['sep', 'oct', 'nov', 'dec']
Overloading
+
usually means addition, but when used on strings or lists, it means “concatenate”. Given that, what do you think the multiplication operator*
does on lists? In particular, what will be the output of the following code?counts = [2, 4, 6, 8, 10] repeats = counts * 2 print(repeats)
[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
[4, 8, 12, 16, 20]
[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]
[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]
The technical term for this is operator overloading: a single operator, like
+
or*
, can do different things depending on what it’s applied to.Solution
The multiplication operator
*
used on a list replicates elements of the list and concatenates them together:[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
It’s equivalent to:
counts + counts
Key Points
[value1, value2, value3, ...]
creates a list.Lists can contain any Python object, including lists (i.e., list of lists).
Lists are indexed and sliced with square brackets (e.g., list[0] and list[2:9]), in the same way as strings and arrays.
Lists are mutable (i.e., their values can be changed in place).
Strings are immutable (i.e., the characters in them cannot be changed).