Chapter 2.6 - Tuples, Sets, and Dictionaries - Computer Programming for the Geosciences

Chapter 2.6.1 - Tuple

Full name: tuple

Python keyword: tuple or ()

Python data type group: sequence

A tuple is an unmodifiable “list-like” data type.

Tuples are organizations of primitive (or even composite) data types and cannot be modified once defined.
You can access values in a tuple by position (0, 1, 2, and so on) by placing square brackets after the variable name, just like with strings.
You can perform “sequence unpacking” to assign each value in a tuple to its own variable.
Useful for:
- Creating unrelated groups of data
- Storing information generated by a method or function

We define a tuple in one of two ways:

Commas between variables or values set equal to a variable (avoid doing it this way)

my_tuple = 1, 2, 3, 4

Commas between variables or values within open and closed parentheses (do it this way)

my_tuple = (1, 2, 3, 4)
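As a quick check, the following minimal sketch shows that both forms produce the same tuple. The variable names (without_parens, with_parens, single, not_a_tuple) are made up for this illustration. It also shows one detail that is easy to miss: a tuple with a single item still needs a comma.

```python
# Both forms create the same tuple; the parentheses only make it easier to read.
without_parens = 1, 2, 3, 4
with_parens = (1, 2, 3, 4)

print(type(without_parens))
print(without_parens == with_parens)

# A tuple with a single item still needs a trailing comma.
# Without the comma, the parentheses are treated as ordinary grouping.
single = (1,)
not_a_tuple = (1)

print(type(single), type(not_a_tuple))
```

The parentheses do not create the tuple; the commas do. This is why the version with parentheses is recommended: it makes your intent obvious to anyone reading the code.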

Common errors when using tuples

Case 1: You meant to create a list and remembered the commas, but forgot the square brackets. Python recognizes this as a tuple instead (see the first way of defining a tuple above). Note that the list methods that modify a list are not available for tuple variables. For example, if you try to use append you will get an error:

my_tuple = 1, 2, 3, 4

my_tuple.append(5)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 3
      1 my_tuple = 1, 2, 3, 4
----> 3 my_tuple.append(5)

AttributeError: 'tuple' object has no attribute 'append'

Case 2: You create a tuple, but you assume that you can modify values within the tuple. This data type cannot be modified once it is defined.

my_tuple = (1, 2, 3, 4)

my_tuple[0] = 0

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 3
      1 my_tuple = (1, 2, 3, 4)
----> 3 my_tuple[0] = 0

TypeError: 'tuple' object does not support item assignment

Otherwise, a tuple is similar to a list. For example, indexing works just like it does in a list:

my_tuple = (1, 2, 3, 4)

print("my_tuple[0] prints", my_tuple[0])
print("my_tuple[:2] prints", my_tuple[:2])

my_tuple[0] prints 1
my_tuple[:2] prints (1, 2)

You can perform “type casting” to convert a list to a tuple, or the other way around:

# set x to a tuple
x = (1, 2, 3, 4)

# cast the tuple x into a list
# set the result to x
x = list(x)

# cast the list x into a tuple
# set the result to y
y = tuple(x)

print(x, y)

[1, 2, 3, 4] (1, 2, 3, 4)
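Type casting also gives you a workaround when you need a changed version of a tuple. The sketch below (temp_list is a hypothetical name) casts to a list, changes the list, and casts back. Note that this builds a new tuple rather than modifying the original one.

```python
my_tuple = (1, 2, 3, 4)

# Tuples cannot be changed in place, so work on a list copy instead.
temp_list = list(my_tuple)
temp_list[0] = 0

# Cast back to a tuple; this creates a new tuple and reuses the old name.
my_tuple = tuple(temp_list)

print(my_tuple)
```

If you find yourself doing this often, a list is probably the better data type for that variable.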

Sequence unpacking

Tuples are most often used for storing the results of a method or function. This is because they have built-in functionality to “unpack” their contents into multiple variables.

Consider the following tuple definition:

my_tuple = (85, 70, "cloudy", "West Wind")

print(my_tuple)

(85, 70, 'cloudy', 'West Wind')

If instead you would like to set each one of these to a variable, you could do so like this:

high, low, weather, wind = (85, 70, "cloudy", "west wind")

print(f"The high was {high}F, the low was {low}F")
print(f"the weather was {weather}, and there was a {wind}")

The high was 85F, the low was 70F
the weather was cloudy, and there was a west wind

which is equivalent to:

high = 85
low = 70
weather = "cloudy"
wind = "west wind"

print(f"The high was {high}F, the low was {low}F")
print(f"the weather was {weather}, and there was a {wind}")

The high was 85F, the low was 70F
the weather was cloudy, and there was a west wind
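Unpacking works the same way on a tuple that is already stored in a variable, but the number of variables on the left must match the number of values in the tuple. The following sketch illustrates both points; the names first and second are placeholders chosen for this example.

```python
my_tuple = (85, 70, "cloudy", "West Wind")

# Unpack an existing tuple variable into four names.
high, low, weather, wind = my_tuple
print(high, low, weather, wind)

# Too few names on the left side raises a ValueError.
try:
    first, second = my_tuple
except ValueError as error:
    print("ValueError:", error)
```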

Tuples and methods

Later in the course, we will use the concept of importing to add more functionality to our Python programs.

For example, we can import math to get access to many mathematical operations and functions that are not available by default:

import math

math.exp is a function that takes a numeric parameter (int or float), calculates $e^{x}$, where x is the number you provide, and “returns” the result as a float value.

In the following example, we are using the int 5 as the exponent. Since the value of e is approximately 2.71828, we can compare the results using a basic approach (simple exponent operator).

Notice that the results are only equivalent to 2 decimal places. This is due to how we simplified e by only including 5 decimal places.

x = 5

result_math = math.exp(x)
result_basic = 2.71828 ** x

print(result_math, result_basic)

148.4131591025766 148.41265995084171

Some functions can return 2 or more results.

Consider the following function definition that takes a list as a parameter and returns the unique values and the count of unique values:

def unique(my_list):
    
    unique_values = list(set(my_list))
    # count the unique values, not the items in the original list
    unique_count = len(unique_values)
    
    return unique_values, unique_count

x = [1, 2, 1, 1, 4, 5]

result = unique(x)

print(result)

([1, 2, 4, 5], 4)

Hopefully you noticed that result is a tuple!

It may be more convenient to set the result to two variables. We can do this automatically by using sequence unpacking:

def unique(my_list):
    
    unique_values = list(set(my_list))
    unique_count = len(unique_values)
    
    return unique_values, unique_count

x = [1, 2, 1, 1, 4, 5]

items, count = unique(x)

print(f"The unique values were {items}")
print(f"There were {count} unique values")

The unique values were [1, 2, 4, 5]
There were 4 unique values
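Some of Python's built-in functions follow the same pattern. As one small illustration, divmod returns the whole-number quotient and the remainder of a division together as a tuple, which can be unpacked in the same way (quotient and remainder are names chosen for this sketch):

```python
# divmod is a built-in function that returns two results as a tuple:
# the whole-number quotient and the remainder.
result = divmod(17, 5)
print(result)

# Sequence unpacking assigns each result to its own variable.
quotient, remainder = divmod(17, 5)
print(f"17 divided by 5 is {quotient} with a remainder of {remainder}")
```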

Chapter 2.6.2 - Set

Full name: set

Python keyword: set or {} with items inside (an empty {} creates a dict, so use set() for an empty set)

Python data type group: set

A set is also somewhat similar to a list, except that it only keeps unique values, does not preserve their order, and only allows immutable items such as primitive data types.

Sets are organizations of primitive data types that can be modified after they are created (but only by adding or removing items).
You cannot access values in a set based on their position using indexing.
You can use the set method add() to add an item to a set. If the item is already in the set, the set does not change.
You can use the set method remove() to remove an item from a set based on its value, as long as it is already in the set (otherwise you get a KeyError). Because sets only keep unique values, removing a value removes the only instance of it.
Useful for:
- Identifying unique values
- Removing duplicates

We define a set in one of two ways:

Open and closed curly brackets with items separated by commas:

my_set = {1, 2, 3, 4}

“Type casting” an existing list or tuple to a set

my_set = set(my_list)

Even when you define the set initially, duplicates are removed:

my_set = {1, 2, 3, 5, 5}

print(my_set)

{1, 2, 3, 5}
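One situation that the two approaches above do not cover is creating an empty set. The short sketch below (variable names are illustrative) shows that empty curly brackets give you a dictionary, so an empty set has to be created with set(). It also shows type casting from a tuple.

```python
# Empty curly brackets create a dict, not a set.
empty_set = set()
empty_dict = {}

print(type(empty_set))
print(type(empty_dict))

# Casting also works on a tuple, and duplicates are dropped.
my_tuple = (1, 1, 2, 3)
print(set(my_tuple))
```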

Common issues when using a set

Case 1:

You create a set, but then try to access an index in the set. Remember that you cannot access items in a set using indexing:

my_set = {1, 2, 3, 5, 5}

my_set[0]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[13], line 3
      1 my_set = {1, 2, 3, 5, 5}
----> 3 my_set[0]

TypeError: 'set' object is not subscriptable

Case 2:

You create a set, but then try to modify an item in that set using indexing. The same issue as the previous example applies here:

my_set = {1, 2, 3, 5, 5}

my_set[0] = 5

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 3
      1 my_set = {1, 2, 3, 5, 5}
----> 3 my_set[0] = 5

TypeError: 'set' object does not support item assignment

Case 3

You create a list, and then use type casting to turn it into a set. You then wonder why some of the values disappeared and why the order of the values is different. This is because duplicate values will be removed and order is not preserved in a set:

my_list = ["a", "b", "c", 1, 1, "c", "1", 8]

my_set = set(my_list)

print(my_set)

{1, '1', 8, 'a', 'c', 'b'}

Case 4

You create a set and then try to add an item that already exists in the set. Because the set will not accept duplicate values, it will not modify the set:

my_set = {1, 2, 3, 4, 5, 5, 5}

print("my_set after definition", my_set)

my_set.add(6)

print("my_set after adding value not in the set", my_set)

my_set.add(6)

print("my_set after adding value already in the set", my_set)

my_set after definition {1, 2, 3, 4, 5}
my_set after adding value not in the set {1, 2, 3, 4, 5, 6}
my_set after adding value already in the set {1, 2, 3, 4, 5, 6}
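Removing items behaves a little differently from adding them. The following sketch shows remove() on a value that is in the set, and then on a value that is no longer there. It also uses discard(), a related set method that does nothing when the value is missing.

```python
my_set = {1, 2, 3, 4, 5}

# remove() deletes an item by value.
my_set.remove(5)
print("after removing 5:", my_set)

# Removing a value that is not in the set raises a KeyError.
try:
    my_set.remove(5)
except KeyError as error:
    print("KeyError:", error)

# discard() removes the value if present and does nothing otherwise.
my_set.discard(5)
print("after discard:", my_set)
```

Use remove() when a missing value would indicate a mistake you want to know about, and discard() when it does not matter whether the value was there.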

“List-like” operations to test if items are in a `set`, `list` or `tuple`

It may be useful to check if an item is in a composite data type.

The syntax is:

True if item is in the list, False otherwise:

item in list

True if item is not in the list, False otherwise:

item not in list

my_list = ["a", 1, 1.5, True]

print("Is True in my_list?", True in my_list)

print('Is "a" not in my_list?', "a" not in my_list)

print("Is 1.5 in my_list?", 1.5 in my_list)

print("Is 1 in my_list?", 1 in my_list)

Is True in my_list? True
Is "a" not in my_list? False
Is 1.5 in my_list? True
Is 1 in my_list? True
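The same syntax works for a tuple and a set. A minimal sketch, using made-up values:

```python
my_tuple = ("a", 1, 1.5)
my_set = {"a", 1, 1.5}

# "in" and "not in" work the same way on tuples and sets as on lists.
print("Is 1.5 in my_tuple?", 1.5 in my_tuple)
print('Is "b" in my_set?', "b" in my_set)
print('Is "b" not in my_set?', "b" not in my_set)
```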

Operations unique to set

You can use a set to perform some more complex operations that are possible due to every value being unique.

Some examples of set operators include:

union operator |: Create a new set that combines the unique values of two sets

set1 = {1, 2, 3, 4, 5}
set2 = {3, 4, 5, 6, 7}

union = set1 | set2

print(union)

{1, 2, 3, 4, 5, 6, 7}

intersection operator &: Create a new set that combines the unique values that are in both sets

set1 = {1, 2, 3, 4, 5}
set2 = {3, 4, 5, 6, 7}

intersection = set1 & set2

print(intersection)

{3, 4, 5}

difference operator -: Create a new set that includes unique values that are in the set on the left side of the operator and not in the set on the right side of the operator

set1 = {1, 2, 3, 4, 5}
set2 = {3, 4, 5, 6, 7}

diff = set1 - set2

print(diff)

{1, 2}

symmetric difference operator ^: Create a new set that includes unique values that are either in the set on the left side of the operator or in the set on the right side of the operator, but not both.

set1 = {1, 2, 3, 4, 5}
set2 = {3, 4, 5, 6, 7}

symm = set1 ^ set2

print(symm)

{1, 2, 6, 7}
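Each of these four operators also has an equivalent set method with a descriptive name. The sketch below compares each operator with its method on the same two sets; every comparison should print True.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {3, 4, 5, 6, 7}

# Each operator has a method that returns the same result.
print(set1.union(set2) == (set1 | set2))
print(set1.intersection(set2) == (set1 & set2))
print(set1.difference(set2) == (set1 - set2))
print(set1.symmetric_difference(set2) == (set1 ^ set2))

# Neither the operators nor these methods change the original sets.
print(set1, set2)
```

Which form to use is mostly a matter of readability: the operators are compact, while the method names say what the operation does.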

Example: You want to find shared hurricane names during two seasons: 2005 and 2020

year_2005 = {"Arlene", "Bret", "Cindy", "Dennis", "Emily", "Franklin", "Gert",
             "Harvey", "Irene", "Jose", "Katrina", "Lee", "Maria", "Nate",
             "Ophelia", "Philippe", "Rita", "Stan", "Tammy", "Vince", "Wilma",
             "Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"}

year_2020 =  {"Arthur", "Bertha", "Cristobal", "Dolly", "Edouard", "Fay", "Gonzalo",
              "Hanna", "Isaias", "Josephine", "Kyle", "Laura", "Marco", "Nana",
              "Omar", "Paulette", "Rene", "Sally", "Teddy", "Vicky", "Wilfred",
              "Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta", "Eta", "Theta",
              "Iota"}

You can use the & operator, which finds shared unique values:

both = year_2005 & year_2020

print(both)

{'Beta', 'Gamma', 'Zeta', 'Epsilon', 'Alpha', 'Delta'}

Example: You were moving a bunch of files from your external hard drive to your laptop, but the power went out in the middle of the transfer. You want to find out which files are missing on your laptop so you know which ones to copy over.

files_on_external = {'zophet_521.csv', 'qumira_87.csv', 'talverin_404.csv', 'wosneta_191.csv',
                     'byrolan_732.csv', 'neropi_613.csv', 'jaxisor_205.csv', 'fyntaro_888.csv',
                     'lodrex_75.csv', 'marqen_452.csv', 'thirvo_323.csv', 'xelmar_190.csv',
                     'ponivra_601.csv', 'cryvorn_288.csv', 'heltrix_945.csv', 'joventa_71.csv',
                     'rukiel_379.csv', 'valtor_556.csv', 'zontri_220.csv', 'mevrik_482.csv',
                     'yandrel_649.csv', 'korvix_118.csv', 'drelta_532.csv', 'omrika_744.csv',
                     'tivorn_269.csv'}

files_on_laptop = {'zophet_521.csv', 'qumira_87.csv', 'wosneta_191.csv',
                   'neropi_613.csv', 'jaxisor_205.csv', 'lodrex_75.csv',
                   'thirvo_323.csv', 'ponivra_601.csv', 'heltrix_945.csv',
                   'rukiel_379.csv', 'valtor_556.csv', 'zontri_220.csv',
                   'mevrik_482.csv', 'yandrel_649.csv', 'korvix_118.csv',
                   'omrika_744.csv', 'tivorn_269.csv'}

You can use the - operator, which gives you unique values that are in the left set but are not in the right set:

only_external = files_on_external - files_on_laptop

print(only_external)

{'byrolan_732.csv', 'xelmar_190.csv', 'marqen_452.csv', 'talverin_404.csv', 'drelta_532.csv', 'joventa_71.csv', 'cryvorn_288.csv', 'fyntaro_888.csv'}

You can also make sure that files_on_laptop is a subset of files_on_external by reversing the order. If you get an empty set, you know that there are no files that are on your laptop but not your external.

only_laptop = files_on_laptop - files_on_external

print(only_laptop)

set()
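Python can also answer the subset question directly. The sketch below uses two small, made-up sets of file names (on_external and on_laptop are hypothetical stand-ins for the larger sets) with the <= operator and the issubset() method, both of which return True or False.

```python
# Small made-up file sets for illustration.
on_external = {"a.csv", "b.csv", "c.csv"}
on_laptop = {"a.csv", "c.csv"}

# True if every item in the left set is also in the right set.
print(on_laptop <= on_external)
print(on_laptop.issubset(on_external))

# The reverse is False because "b.csv" is only on the external drive.
print(on_external <= on_laptop)
```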

Chapter 2.6.3 - Dictionary

Full name: dictionary

Python keyword: dict or {'key': 'value'}

Python data type group: mapping

Dictionaries are organizations of primitive (or even composite) data types that can be modified once defined. Each value is stored under a key, and a key can be any immutable value, such as a string or a number.
- Items in a dictionary can be numbers, strings, etc., just like in list, tuple, and set
The biggest difference between a dictionary and a list is how you access items: by key instead of by position.
Useful for:
- Organizing data with human-readable indexes
- Defining parameters to be used in a method
- Generating data that can be easily converted to csv/excel files and used in pandas

We define a dict in one of two ways:

Open and closed curly brackets with key, value pairs separated by commas:

my_dict = {key1: value1, key2: value2}

The keyword dict with key, value pairs identified using = and separated by commas

my_dict = dict(key1=value1, key2=value2)

my_dict = {'temperature': 85, 'dewpoint': 70}

print(my_dict)

{'temperature': 85, 'dewpoint': 70}

my_dict = dict(temperature=85, dewpoint=70)

print(my_dict)

{'temperature': 85, 'dewpoint': 70}

Basic `dict` usage

You need to choose useful indexes (keys) that describe the values they point to.

For example, we might have data on an observation at NIU:

observation = {'temperature': 85,
               'dewpoint': 75,
               'wind_speed': 10,
               'wind_direction': 'SSW',
               'weather_conditions': 'Partly Cloudy'}

observation

{'temperature': 85,
 'dewpoint': 75,
 'wind_speed': 10,
 'wind_direction': 'SSW',
 'weather_conditions': 'Partly Cloudy'}

If you want to access the temperature, you would do the following:

add an opening square bracket to the dictionary name: observation[
type in the key exactly as it appears in the dictionary definition: observation['temperature'
add a closing square bracket: observation['temperature']

observation['temperature']

85
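Because dictionaries can be modified once defined, the same square bracket syntax also works on the left side of an assignment. A minimal sketch, using a shortened version of the observation dictionary:

```python
observation = {'temperature': 85, 'dewpoint': 75}

# Assigning to an existing key replaces its value.
observation['temperature'] = 88

# Assigning to a new key adds a key, value pair.
observation['wind_speed'] = 10

print(observation)
```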

Common issues when using a dict:

Case 1

You create a dictionary, but try to access the first value in that dictionary by using list indexing. dict indexing requires the key you defined when creating the dictionary. You get a KeyError whenever you try to access a key that does not exist.

observation = {'temperature': 85,
               'dewpoint': 75,
               'wind_speed': 10,
               'wind_direction': 'SSW',
               'weather_conditions': 'Partly Cloudy'}

observation[0]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[31], line 7
      1 observation = {'temperature': 85,
      2                'dewpoint': 75,
      3                'wind_speed': 10,
      4                'wind_direction': 'SSW',
      5                'weather_conditions': 'Partly Cloudy'}
----> 7 observation[0]

KeyError: 0

Critical thinking: Create a dictionary that does act like a list. In other words, you can use 0-based indexing like a list.

test_dict = {}

print(test_dict[0])

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[32], line 3
      1 test_dict = {}
----> 3 print(test_dict[0])

KeyError: 0

Case 2

You create a dictionary and try to access a value using a key, but you spell the key incorrectly or use the incorrect case. Be aware that case and spelling matter!

observation = {'temperature': 85,
               'dewpoint': 75,
               'wind_speed': 10,
               'wind_direction': 'SSW',
               'weather_conditions': 'Partly Cloudy'}

observation['Temperature']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[33], line 7
      1 observation = {'temperature': 85,
      2                'dewpoint': 75,
      3                'wind_speed': 10,
      4                'wind_direction': 'SSW',
      5                'weather_conditions': 'Partly Cloudy'}
----> 7 observation['Temperature']

KeyError: 'Temperature'

observation['tempratrue']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[34], line 1
----> 1 observation['tempratrue']

KeyError: 'tempratrue'
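If you are not sure whether a key exists, you can check before accessing it. The sketch below shows two common ways to do this on a shortened observation dictionary: the in operator and the dict method get(), which returns None (or a default you choose) instead of raising a KeyError.

```python
observation = {'temperature': 85, 'dewpoint': 75}

# Check that a key exists before using it.
if 'Temperature' in observation:
    print(observation['Temperature'])
else:
    print("No key named 'Temperature'")

# get() returns None (or a default you choose) instead of raising a KeyError.
print(observation.get('Temperature'))
print(observation.get('Temperature', 'missing'))
print(observation.get('temperature'))
```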

Dictionaries with composite data types

A very common pattern used with dictionaries is to use the indexes as placeholders for item containers like a list.

For example, you may want to have a list of observations on different days.

You would define the dictionary like above, except you would replace each value with an empty list, []:

observation = {'temperature': [],
               'dewpoint': [],
               'wind_speed': [],
               'wind_direction': [],
               'weather_conditions': []}

observation

{'temperature': [],
 'dewpoint': [],
 'wind_speed': [],
 'wind_direction': [],
 'weather_conditions': []}

Now, you can access each list in the dictionary and add values using append:

observation = {'date': [],
               'temperature': [],
               'dewpoint': [],
               'wind_speed': [],
               'wind_direction': [],
               'weather_conditions': []}


observation['date'].append('1999-05-03')
observation['temperature'].append(85)
observation['dewpoint'].append(75)
observation['wind_speed'].append(10)
observation['wind_direction'].append("SSW")
observation['weather_conditions'].append("Partly Cloudy")

observation['date'].append('1999-05-04')
observation['temperature'].append(95)
observation['dewpoint'].append(78)
observation['wind_speed'].append(15)
observation['wind_direction'].append("S")
observation['weather_conditions'].append("Thunderstorms")

observation

{'date': ['1999-05-03', '1999-05-04'],
 'temperature': [85, 95],
 'dewpoint': [75, 78],
 'wind_speed': [10, 15],
 'wind_direction': ['SSW', 'S'],
 'weather_conditions': ['Partly Cloudy', 'Thunderstorms']}
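With this layout, the same position in every list refers to the same day. The following sketch, using a shortened copy of the dictionary and a hypothetical variable named day, reads one day back out and then loops over every key and its list:

```python
observation = {'date': ['1999-05-03', '1999-05-04'],
               'temperature': [85, 95],
               'dewpoint': [75, 78]}

# The same position in every list refers to the same day.
day = 1
print(observation['date'][day],
      observation['temperature'][day],
      observation['dewpoint'][day])

# Loop over each key and the list stored under it.
for key, values in observation.items():
    print(key, len(values), values)
```

This only works if you append to every list each time you add a day, so that all of the lists stay the same length.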

Dictionaries are everywhere in Python

When we learn about functions later in the course, we will be using parameters to modify the behavior of functions. Functions in Python follow a pattern much like the functions in a math class: a function definition names the variables, and the code inside defines the general behavior of the function.

Consider the following example that tests to see if value is in my_list. We “pass” 5 in as the value and the list named x as my_list. Since the value 5 is in my_list, the result is True.

def isin(value, my_list):
    
    return value in my_list

x = [1, 2, 1, 1, 4, 5]

print(isin(5, x))

True

We can define the parameter list using a dict, and then pass that dict into the function using the “unpacking” operator **:

x = [1, 2, 1, 1, 4, 5]

params = dict(my_list=x, value=5)

result = isin(**params)

print(result)

True
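The keys in the dictionary are matched to the parameter names, so their order does not matter, but their spelling does. The sketch below repeats the isin definition so that it runs on its own; bad_params is a hypothetical dictionary with a misspelled key.

```python
def isin(value, my_list):
    return value in my_list

# Keys can be listed in any order because they are matched by name.
params = {'my_list': [1, 2, 1, 1, 4, 5], 'value': 5}
print(isin(**params))

# This is the same as writing the keyword arguments out by hand.
print(isin(value=5, my_list=[1, 2, 1, 1, 4, 5]))

# A key that does not match a parameter name raises a TypeError.
bad_params = {'my_list': [1, 2], 'val': 5}
try:
    isin(**bad_params)
except TypeError as error:
    print("TypeError:", error)
```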

Dictionaries are very important for data analysis tasks in Python

Dictionaries and the Python pandas package make it easy to visualize, analyze, and save your data on the hard drive.

Remember the dict we created before with list values?

observation

{'date': ['1999-05-03', '1999-05-04'],
 'temperature': [85, 95],
 'dewpoint': [75, 78],
 'wind_speed': [10, 15],
 'wind_direction': ['SSW', 'S'],
 'weather_conditions': ['Partly Cloudy', 'Thunderstorms']}

We can automatically turn this into a pandas DataFrame:

import pandas as pd

df = pd.DataFrame.from_dict(observation)

df = df.set_index('date')

df
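To see why this dictionary layout converts so easily to a table, here is a rough sketch that writes the same data as CSV text using Python's built-in csv module instead of pandas. This is a swapped-in technique for illustration only, not the pandas approach used in this chapter. The names buffer, writer, and csv_text are made up, and the file name in the comment is hypothetical.

```python
import csv
import io

observation = {'date': ['1999-05-03', '1999-05-04'],
               'temperature': [85, 95],
               'dewpoint': [75, 78],
               'wind_speed': [10, 15],
               'wind_direction': ['SSW', 'S'],
               'weather_conditions': ['Partly Cloudy', 'Thunderstorms']}

# Write to an in-memory text buffer. A real file opened with
# open("observations.csv", "w", newline="") would work the same way
# (the file name is a made-up example).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

# Header row: the dictionary keys become the column names.
writer.writerow(observation.keys())

# Data rows: zip(*lists) groups the values for each day together.
writer.writerows(zip(*observation.values()))

csv_text = buffer.getvalue()
print(csv_text)
```

Each key becomes a column and each position in the lists becomes a row, which is the same structure a DataFrame uses.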

Chapter 2.6.4 - Summary

Examples using a `list`

Accessing values

Note: You cannot index set values.

1a. Access one value in a list

a = [1, 2, 3]

print(a[0])

1

1b. Access one value in a tuple

a = (1, 2, 3)

print(a[0])

1

1c. Access one value in a dict

a = {'ijk': 1, 'xyz': 2, 'abc': 3}

print(a['ijk'])

1

Adding values

Note: You cannot add values to a tuple.

2a. Add a value to a list:

a = [1, 2, 3]

a.append(5)

print(a)

[1, 2, 3, 5]

2b. Add a value to a set:

a = {1, 2, 3}

a.add(4)

print(a)

{1, 2, 3, 4}

2c. Add a key, value pair to a dictionary:

a = {'ijk': 1, 'xyz': 2, 'abc': 3}

a['def'] = 4

print(a)

{'ijk': 1, 'xyz': 2, 'abc': 3, 'def': 4}

Combining composite data types

3a. Add a value to a list within a dictionary:

a = {'ijk': [1, 2, 3], 'xyz': [4, 5, 6], 'abc': [7, 8, 9]}

a['abc'].append(10)

print(a)

{'ijk': [1, 2, 3], 'xyz': [4, 5, 6], 'abc': [7, 8, 9, 10]}

3b. Access a value in a list within a dictionary:

a = {'ijk': [1, 2, 3], 'xyz': [4, 5, 6], 'abc': [7, 8, 9]}

print(a['abc'][0])

7

3c. Access multiple values using slicing in a list within a dictionary:

a = {'ijk': [1, 2, 3], 'xyz': [4, 5, 6], 'abc': [7, 8, 9]}

print(a['ijk'][:2])

[1, 2]

Chapter 2.6.5 - Practice

Create a dictionary that contains the following tornado report data. Make sure you use the correct data types:

Time  F-Scale  Location          County  State
0149  1        1 E Big Rock      Kane    IL
0223  0        1 ESE Glen Ellyn  DuPage  IL
0229  0        Villa Park        DuPage  IL
0234  0        Bensenville       DuPage  IL
