Chapter 2.2 - Strings - Computer Programming for the Geosciences

A string is any text written in Python code between a pair of double quotes (" ") or single quotes (' ').

Full name: string

Python type name: str

Python data type group: text sequence

Chapter 2.2.1 - String overview

Strings are our first introduction to composite (sometimes referred to as sequence) data types. These data types combine multiple primitive values into a sequence. Depending on the sequence type, the values can be int, float, bool, or other primitive data types.

Unlike statically typed languages such as C and C++, Python is dynamically typed. Values have types in both Python and C++, but a Python variable can be reassigned to a value of a different type at any time. This can be useful for beginning programmers who might have trouble with the extra syntax and behavior of static typing. However, it can also hide problems in your program. You might, for example, expect a counter to always hold an int, but if you assign a str to it at some point, Python will not stop you. Python only enforces types at run time, when an unsupported operation (e.g., “adding” an int and a str) raises an error.
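The counter scenario described above can be sketched in a few lines. This is only an illustration; the variable names counter and error_message are made up for the example.

```python
# A variable is just a name; the value it refers to carries the type.
counter = 0
print(type(counter))    # <class 'int'>

# Python lets the same name refer to a str later on, with no warning.
counter = "zero"
print(type(counter))    # <class 'str'>

# The mismatch only surfaces when an unsupported operation is attempted.
error_message = ""
try:
    counter = counter + 1    # str + int is not defined
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```

The reassignment itself succeeds silently; only the later addition fails, which is why this kind of bug can be hard to trace back to its source.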

Strings are a special case of sequence data types because every item in the sequence must be a character: a letter, digit, punctuation mark, space, or other symbol. We will learn about other sequence data types that do not enforce this requirement later in this chapter.

We call these values “characters” (technically, Unicode code points) to match the data type name in other common languages (e.g., char in C++, Java, etc.). Python itself has no separate character type: a single character is simply a str of length 1. Strings, therefore, are sequences of characters, where each character occupies a specific position (index) in that sequence. For example, the first character in this sentence is “F”.
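As a brief sketch of what “position in a sequence” means in practice, the example below looks up individual characters by index. The variable names and the coordinate value are illustrative only.

```python
sentence = "For example, the first character in this sentence is F."

first_character = sentence[0]    # indexing starts at 0, not 1
last_character = sentence[-1]    # negative indexes count from the end
length = len(sentence)           # number of characters in the sequence

print(first_character)           # F
print(last_character)            # .
print(type(first_character))     # <class 'str'>: there is no separate char type

# Each character corresponds to a Unicode code point (an integer).
code_point = ord(first_character)
print(code_point)                # 70
print(chr(70))                   # F

# Characters need not be alphanumeric: spaces and symbols count too.
coordinate = "41.9°N"            # hypothetical example value
print(coordinate[4])             # °
```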

There are multiple ways to define strings:

a = "double quotes"
a = 'single quotes'
a = """triple quotes"""

While a string in single or double quotes must start and end on the same line, triple quotes let your definition span multiple lines. Any line breaks and indentation inside the triple quotes become part of the string.

a = "double quotes"
b = 'single quotes'
c = """
    triple 
    quotes
    work
    on
    multiple
    lines
    """

print(a)
print(b)
print(c)

double quotes
single quotes

    triple 
    quotes
    work
    on
    multiple
    lines
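To see why the printed output starts with a blank line and keeps its indentation, it can help to inspect the string with repr(), which shows the line breaks as \n. The following is a small sketch using a shortened version of the string.

```python
c = """
    triple
    quotes
    """

# repr() shows the hidden newline characters and the leading spaces.
print(repr(c))    # '\n    triple\n    quotes\n    '

# splitlines() breaks the string apart at each newline.
lines = c.splitlines()
print(lines)      # ['', '    triple', '    quotes', '    ']
```

The first item is empty because the string begins with a line break immediately after the opening triple quotes.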

Chapter 2.2.2 - String rules

The following rules will help you to effectively use strings in Python:

Rule #1

You must “close” a string using a starting " and ending " (alternatively a starting ' and ending ', or """ and """ for multiple lines). Whatever you choose, you must open and close with the same symbol (e.g., do not start with " and end with ').

Rule #2

You cannot add an int or float to a str without first casting (type converting) those values.

Rule #3

If you need to format your strings or insert values, use an f-string to do so. (Python has other formatting tools, but this chapter uses f-strings throughout.)

Related to Rule #2, you may be surprised to learn that we can “add” two strings with +, just as we add int or float values, as long as both values are strings. For strings, + joins (concatenates) the two sequences end to end.
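A minimal sketch of string concatenation is shown here; the variable names and text are made up for illustration.

```python
school = "NIU"
mascot = "Huskies"

# + joins strings end to end; spaces must be added explicitly.
cheer = "Go " + school + " " + mascot + "!"

print(cheer)    # Go NIU Huskies!
```

Note that + does not insert any separator on its own, so the space between the two words has to be its own string.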

Fix the code below to print out Huskies #1.

string1 = "Huskies #" 
string2 = 1

string3 = string1 + string2

print(string3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[13], line 4
      1 string1 = "Huskies #" 
      2 string2 = 1
----> 4 string3 = string1 + string2
      6 print(string3)

TypeError: can only concatenate str (not "int") to str

In practice, we usually need to combine strings with numbers, which requires one of the following approaches:

casting / type conversion: converting int or float into str
format string / f-string: defining where and how to insert an int or float into a str (see Rule #3)

Chapter 2.2.3 - Defining strings (Rule #1)

Matching quotes are the first thing you should check if you are debugging / troubleshooting code with strings.

In the following example, I try to set a variable to a string value, but I do not close the end of the string with ". You get a SyntaxError, which means Python could not parse the code at all, so none of it runs. The message “EOL while scanning string literal” means that the interpreter reached the end of the line (EOL) without finding the closing ". (Python 3.10 and later word this error as “unterminated string literal”.)

a = "Test

  Cell In[4], line 1
    a = "Test
             ^
SyntaxError: EOL while scanning string literal

You can also get an error if you mix up your " and ':

a = "Test'

  Cell In[5], line 1
    a = "Test'
              ^
SyntaxError: EOL while scanning string literal

Be consistent!

a = "NIU"
b = 'NIU'

print(a)
print(b)

NIU
NIU
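One practical reason Python accepts both quote styles is that it lets you include one kind of quote inside the other. The short sketch below uses illustrative values only.

```python
# An apostrophe is fine inside double quotes.
a = "Let's go"

# Double quotes are fine inside single quotes.
b = 'She said "go"'

# The quote style does not change the value that is stored.
same = "NIU" == 'NIU'

print(a)       # Let's go
print(b)       # She said "go"
print(same)    # True
```

Whichever style you pick, the opening and closing quotes still have to match.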

Why does the following not work?

a = Test

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[17], line 1
----> 1 a = Test

NameError: name 'Test' is not defined

Chapter 2.2.4 - String conversions (Rule #2)

Rule #2 holds in most programming languages, and it is a special case of a more general rule: data types have different purposes and abilities. It is up to the programmer to make sure the right data type is being used. If the correct data type is not in place, casting (type conversion) is required.

In the case of float and int, the conversion is straightforward:

int_example = 1
float_example = 1.5

int_to_str_example = str(int_example)
float_to_str_example = str(float_example)

print(int_to_str_example)
print(float_to_str_example)

1
1.5

The original int and float values are still incompatible with strings. If you try to add one of them to a str, you get a TypeError, which signals that an operation received a data type it does not support.

NOTE: the variables defined in the code cell above can be used by the code below. This works as long as the code cell above is “run” before the code cells below, or the variables are otherwise defined before the examples below use them.

combined_str = "The answer is " + int_example

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 combined_str = "The answer is " + int_example

TypeError: can only concatenate str (not "int") to str

combined_str = "The answer is " + float_example

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 combined_str = "The answer is " + float_example

TypeError: can only concatenate str (not "float") to str

However, after casting, the variables are now also str and can be added to another string:

combined_str = "The answer is " + int_to_str_example

print(combined_str)

The answer is 1

combined_str = "The answer is " + float_to_str_example

print(combined_str)

The answer is 1.5

In some cases, a string can be cast to an int or float. This only works if the entire string is a valid number: float("1.452") succeeds, but float("1.452 km") and int("1.452") both raise a ValueError.

str_number = "1.452"

str_to_float_number = float(str_number)

print(str_to_float_number)

1.452
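The sketch below shows which strings convert cleanly and one possible way to handle those that do not. The helper to_float_or_none is a hypothetical function written for this example, not part of Python.

```python
as_int = int("42")           # whole-number text converts to int
as_float = float("1.452")    # decimal text converts to float
padded = float("  3.5  ")    # surrounding whitespace is ignored

print(as_int, as_float, padded)    # 42 1.452 3.5


def to_float_or_none(text):
    """Hypothetical helper: return a float, or None if text is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


print(to_float_or_none("1.452"))    # 1.452
print(to_float_or_none("1.5 m"))    # None (units are not part of a number)
print(to_float_or_none("1,452"))    # None (commas are not accepted)

# int() is stricter than float(): decimal text is rejected.
try:
    int("1.452")
except ValueError as err:
    int_error = str(err)
    print("ValueError:", int_error)
```

Catching the ValueError is one design choice; depending on the program, letting the error stop execution may be the safer behavior.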

Did the following code work as expected? If not, what do you have to do to make it work?

str_number_1 = "1.452"
str_number_2 = "3.134"

result = str_number_1 + str_number_2

print(result)

1.4523.134

Chapter 2.2.5 - Formatting strings (Rule #3)

A basic string is defined as in the examples above:

a = "test"

print(a)

test

If you have multiple variables to print, you can pass them to a single print call separated by commas. print displays the values with a space between each:

a = "Let's"
b = "Go"
c = "Huskies!"

print(a, b, c)

Let's Go Huskies!

This works well with strings, but can produce undesirable output when using other data types, particularly float:

a = 1
b = 3
c = a / b

print(a, "/", b, "=", c)

1 / 3 = 0.3333333333333333

Notice that we have many decimal places.

Often you need to limit the display to two decimal places.

There is no obvious way to make this work with what we have learned so far.

f-string syntax

f-strings (formatted strings) can take other data types, format them in a predictable way, and then output the result.

There are only two main differences between a normal string and an f-string:

You must put an f in front of the opening quote.
You must place “{ }” in the string where you would like to insert a value, with the variable name inside the braces.

An example is shown below, where you insert a, b, and c like above into a string and then print the string:

a = 1
b = 3
c = a / b

result = f"{a} / {b} = {c}"

print(result)

1 / 3 = 0.3333333333333333

This is the most basic way we can insert variables into a string. While this is not particularly useful, we will see the utility in subsequent examples.
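One immediate convenience, sketched below, is that an f-string converts inserted values to text for you, so no explicit str() cast is needed. The variable names are illustrative.

```python
int_example = 1
float_example = 1.5

# Concatenation requires an explicit cast.
with_cast = "The answer is " + str(int_example)

# An f-string performs the conversion automatically.
with_fstring = f"The answer is {int_example}"

print(with_cast)       # The answer is 1
print(with_fstring)    # The answer is 1

# The braces can hold any expression, not only a variable name.
expression = f"{int_example} + {float_example} = {int_example + float_example}"
print(expression)      # 1 + 1.5 = 2.5
```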

Formatting numbers

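When inserting numbers (int, float) into a string, you can control how they are displayed by providing up to three pieces of information within the braces ({ }), with a colon separating the variable from the rest:

The variable you wish to insert
The formatting code (e.g., .2 for two decimal places)
The presentation type of the value (e.g., f for a fixed-point float)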

The most common formatting task is to take a float and display a fixed number of decimal places.

A basic template for this process can be seen below, where precision is the count of decimal places (try changing the precision variable from 2 to 3):

variable = 1.66666666666
precision = 2

formatted_float = f"{variable:.{precision}f}"

print(formatted_float)

1.67
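To see the effect of the precision value without editing the cell by hand, the template can be run in a loop. This is a small sketch that reuses the same variable value; the list name results is made up for the example.

```python
variable = 1.66666666666

results = []
for precision in range(4):
    # The inner braces are filled in first, giving .0f, .1f, .2f, .3f
    formatted_float = f"{variable:.{precision}f}"
    results.append(formatted_float)
    print(precision, formatted_float)

# Printed lines:
# 0 2
# 1 1.7
# 2 1.67
# 3 1.667
```

Notice that the displayed value is rounded at each precision rather than simply cut off.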

You can also write the precision directly in the format code:

variable = 1.66666666666

formatted_float = f"{variable:.2f}"

print(formatted_float)

1.67

Applied to the a, b, c example above, the formatted version looks like this:

a = 1
b = 3
c = a / b

result = f"{a} / {b} = {c:.2f}"

print(result)

1 / 3 = 0.33
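It is worth keeping in mind that formatting only changes the text that is displayed; the underlying number keeps its full precision. The sketch below illustrates this, with two_thirds added as an extra example value.

```python
c = 1 / 3
text = f"{c:.2f}"

print(text)          # 0.33
print(type(text))    # <class 'str'>: the result is text, not a number
print(c)             # 0.3333333333333333 (the float itself is unchanged)

# The display is rounded, not truncated.
two_thirds = 2 / 3
rounded_text = f"{two_thirds:.2f}"
print(rounded_text)  # 0.67
```

Because the formatted result is a str, it should be used for display only; keep the original float for any further calculations.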

Chapter 2.2.6 - Student Summarization

Use markdown cells to describe the code, and use code cells to include code examples from the slideshow that summarize string usage.

# code cell

markdown cell