Class 2: Virtual Environments and Object-oriented Programming (OOP)

Review

  • Describe the function of the break and continue statements.

  • Briefly explain type hinting in Python and its significance.

  • How would you recommend implementing a simple file I/O operation?

Virtual Environments

The basic idea is that every Python project you have should be contained within an isolated environment, which includes the interpreter and any libraries (i.e. external Python packages) that are needed for that specific project.

Why do we need virtual environments?

Once a project accumulates a number of dependencies (external Python packages used in your code), ensuring compatibility between all components becomes tricky (see “dependency hell”). Python libraries are updated regularly, and each released version may introduce bug fixes and new features, or remove functionality that was flagged as deprecated. Code that was written using one version of some external package will not necessarily run successfully using a different version. Virtual environments allow us to create as many isolated environments as we’d like, each containing only the libraries we need, and in the exact versions that guarantee compatibility.

Summary of environments and namespaces

If this concept feels a little vague, don’t be too worried. Once you’ve created a couple of virtual environments you’ll get it in no time.

Creating a Virtual Environment

There are numerous tools for creating and managing virtual environments with Python. In this course we will use venv, which is one of Python’s built-in packages and is therefore a simple option that will always be available to you.

Creating a new virtual environment with venv is as easy as running:

$ python3 -m venv /path/to/new/virtual/environment/
C:\> C:\Python38\python -m venv C:\path\to\new\virtual\environment\

We are calling <python binary> -m to execute the venv module as a script and pass it the path in which it should create the new virtual environment.
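
The same module can also be driven from Python code. The sketch below is only an illustration (in this course we use the command line): it calls venv.create() on a temporary directory, skips installing pip (with_pip=False) to keep things fast, and checks for the pyvenv.cfg file that venv writes into every environment it creates. The temporary directory and the my_env name are placeholders.

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway parent directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "my_env"  # placeholder name for the new environment

    # Roughly what `python -m venv --without-pip <env_dir>` does.
    venv.create(env_dir, with_pip=False)

    # Every environment created by venv contains a pyvenv.cfg file.
    created = (env_dir / "pyvenv.cfg").is_file()
    print(f"Environment created: {created}")
```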

The Virtual Environment Directory

Looking at the newly created virtual environment’s directory tree, please note the following files and folders:

venv
├── bin/
│   ├── python
│   ├── pip
│   ├── activate
│   ├── <Other binaries and CLI tools>
├── lib/
│   ├── python3.8/
│   │   ├── site-packages/
│   │   │   ├── <pip-installed packages>
├── <...>
  • venv/bin/python: The copy of the Python interpreter used when the virtual environment is activated.

  • venv/bin/pip: The copy of the pip CLI used when the virtual environment is activated.

  • venv/bin/activate: A script we will use to activate the virtual environment (making the prior two files the default python and pip executables).

  • lib/python3.8/site-packages/: This directory will contain all external packages installed with pip install <package-name> when our virtual environment is activated.
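
To see where the interpreter you are currently running keeps these files, you can ask Python itself. This is a small sketch using only the standard library; the printed paths depend on your machine and on whether a virtual environment is activated.

```python
import sys
import sysconfig

# Root directory of the Python installation (or virtual environment) in use.
print(f"Environment root: {sys.prefix}")

# "purelib" is the directory that pure-Python packages are installed into.
site_packages = sysconfig.get_paths()["purelib"]
print(f"Packages are installed into: {site_packages}")
```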

Activating the Virtual Environment

Activating a virtual environment means modifying your working environment to use a particular, isolated installation of Python (created in Creating a Virtual Environment). This will normally also “override” some other related CLI tools with those installed within the virtual environment’s bin directory (such as pip), and sometimes also change some OS-related environment variables.

To activate the virtual environment and use the newly created isolated Python environment, simply run:

$ source <venv>/bin/activate

* Replace <venv> with your virtual environment’s directory path.

C:\> <venv>\Scripts\activate

* Replace <venv> with your virtual environment’s directory path.

You should now see your virtual environment’s directory name used as a prefix ((venv)) in your terminal window.

To validate the activation, run:

(venv) $ which python
<venv>/bin/python
(venv) C:\> where python
<venv>\Scripts\python

The returned path should be the path of the Python interpreter within the virtual environment’s directory.
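
The same check can be made from within Python. The sketch below relies on the fact that, inside an environment created by venv, sys.prefix and sys.base_prefix differ. The helper name in_virtual_environment is made up for this example.

```python
import os
import sys


def in_virtual_environment() -> bool:
    """Return True if this interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Running inside a virtual environment: {in_virtual_environment()}")
# The activate script sets VIRTUAL_ENV; this prints None if it is not set.
print(f"VIRTUAL_ENV: {os.environ.get('VIRTUAL_ENV')}")
```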

For the purposes of this course one virtual environment will probably be enough. Always remember to activate it before starting work, and be sure to create a new virtual environment for any new project you begin in the future.

Project Setup Exercise

  1. Create a new my_project directory on your computer.

  2. Start VSCode and open your new project’s directory.

  3. Initialize Git to add version control to your project.

  4. Create a virtual environment for your project and activate it (this will also let VSCode know what Python executable is used in this project).

  5. Create a .py file with a function that simply prints your name.

  6. Create another .py file that imports your function and runs it.

  7. Install the numpy package using pip and make sure you’re able to import it.

  8. Add an MIT license to the repo, as well as a basic README.md.

  9. If the virtual environment’s directory is found under the project’s directory, create a .gitignore file and add its relative path to it. This will tell Git to ignore this directory (and not archive everything in it as part of our code repository, which would be an absolute mess).

  10. Create your first commit.

  11. Publish this project to your GitHub account.

Object-Oriented Programming: Part 1

Introduction

There are three main programming paradigms in use in mainstream programming languages:

  • Procedural

  • Functional

  • Object-oriented

While the functional paradigm is very interesting, we will not be discussing it in this course. You can read about Haskell, OCaml, F# and other functional programming languages wherever you get your information from.

Procedural Programming

Procedural Programming PB&J

The procedural paradigm is the most widely used paradigm… in academia. And it’s probably the one you’re most familiar with from your work with MATLAB.

For example, if we wanted to write a naive script that multiplies the elements in two lists, we could write something like:

l1 = [1, 2, 3]
l2 = [4, 5, 6]
result = []
for item1, item2 in zip(l1, l2):
    result.append(item1 * item2)
result
[4, 10, 18]

If we later run into more lists we need to multiply, we’ll again write:

l3 = [10, 20, 30]
l4 = [40, 50, 60]
result2 = []
for item3, item4 in zip(l3, l4):
    result2.append(item3 * item4)
result2
[400, 1000, 1800]

At this point we’ll recognize a pattern and immediately be reminded of the DRY (“Don’t Repeat Yourself”) principle, leading us to define a function and replace these two parts:

def list_multiplier(l1, l2):
    """
    Multiply two lists element-wise.
    
    Parameters
    ----------
    l1 : list
        First list
    l2 : list
        Second list    
    
    Returns
    -------
    list
        Element-wise multiplication result
    """
    result = []
    for item1, item2 in zip(l1, l2):
        result.append(item1 * item2)
    
    return result

This new procedure does one thing, and one thing only. This is what’s so powerful about it.
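
With the function in place, the two repeated blocks from before shrink to one call each. A short usage sketch (the function is redefined here in a compact form so that the snippet runs on its own):

```python
def list_multiplier(l1, l2):
    """Multiply two lists element-wise."""
    return [item1 * item2 for item1, item2 in zip(l1, l2)]


result = list_multiplier([1, 2, 3], [4, 5, 6])
result2 = list_multiplier([10, 20, 30], [40, 50, 60])
print(result)   # [4, 10, 18]
print(result2)  # [400, 1000, 1800]
```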

Procedural programming allows us to group and order our code base into small units, called functions or procedures, that have a specific, defined task.

It usually contains a “wrapper” script that defines the order of running for these functions:

def run_pipeline(foldername, num_of_columns):
    """Main data pipeline script."""
    data = get_user_input(foldername)
    data_without_fieldnames = extract_fieldnames(data)
    columnar_data = generate_columns(data_without_fieldnames, num_of_columns)
    # ...
    return columnar_data

# At the end of the file it will contain:
if __name__ == '__main__':
    foldername = '/path/to/folder'
    num_of_columns = 3  # example value
    result = run_pipeline(foldername, num_of_columns)
    print(result)

You should decisively eliminate any repeating code. It’s perhaps the most common source of errors in scientific computing, and the following two practices help you avoid it:

  • Encapsulation

    # String concatenation
    first_string = 'abcd'
    second_string = 'efgh'
    concat = first_string + second_string[:-1] + 'zzz'  # you suddenly remember that you wish to exclude 
    # the last character in "second_string" and add the 'zzz' sequence at the end.
    # Program continues...
    # ...
    third_string = 'poiu'
    fourth_string = 'qwer'
    concat2 = third_string + fourth_string + 'zzz' # you wish to achieve the same goal in this 
    # concatenation - but you forgot that you excluded the last character of the second string.
    

    The moment you realize that you have a recurring operation on strings, you have to encapsulate it in a function. Be ruthless!

  • Parametrization

    Instead of writing:

    def process_data(data):
        scaled_data = data * 0.3  #  what is 0.3 exactly? Parametrize it.
    

    We might do:

    def process_data(data, na_concentration=0.3):
        """Multiplies data by the Na concentration."""
        scaled_data = data * na_concentration
    

    But this is usually not enough. When calling process_data, avoid hard-coding the na_concentration value at each call site as well. This will help you avoid a situation such as:

    data = b * c - 1 + a
    process_data(data, 0.4)
    # Script continues...
    process_data(data2, 0.5)  # Perhaps you really did wish to call "process_data" with two different
    # parameters, but it's more likely that you decided that 0.5 was too high, so you changed it to 0.4
    # in the first call, but forgot that you had a second call. This parameter should appear somewhere at
    # the top of your script.
    

    Instead, we can specify module-wide constants, e.g.:

    NA_CONCENTRATION = 0.4
    
    def process_data(data, na_concentration=NA_CONCENTRATION):
        """Multiplies data by the Na concentration."""
        scaled_data = data * na_concentration
    
    data = b * c - 1 + a
    process_data(data)
    # If we really do wish to process the data with some other na_concentration:
    process_data(data2, na_concentration=0.3) 
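
For the string example in the first bullet, the encapsulated version could look like the sketch below. The function name join_trimmed, its suffix parameter and the SUFFIX constant are made up for illustration. The point is that the "drop the last character, then append 'zzz'" rule now lives in exactly one place.

```python
SUFFIX = "zzz"  # module-wide constant, defined once at the top of the script


def join_trimmed(first: str, second: str, suffix: str = SUFFIX) -> str:
    """Concatenate first with second (minus its last character), then add suffix."""
    return first + second[:-1] + suffix


concat = join_trimmed("abcd", "efgh")
concat2 = join_trimmed("poiu", "qwer")
print(concat)   # abcdefgzzz
print(concat2)  # poiuqwezzz
```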
    

While procedural programming works great for most simple tasks, it might be considered inferior when writing code that is meant to scale and be collaborated on.

Classes and Objects

Classes are user-defined types. Just like str, dict, tuple and the rest of the standard types, Python allows us to create our own types.

Objects are instances of classes; in our case, instances of a type we made. Actually, all instances of all types are objects in Python. This means that every variable and function in Python is itself an instance of some type. A function you write is an instance of the function type, for example. We’ll get to this during later stages of the course.
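
A quick way to convince yourself of this is to ask Python for the type of a few things. A minimal sketch (the greet function is just a placeholder):

```python
def greet():
    """A function that does nothing interesting."""
    return "hello"


# Every value has a type, including functions and types themselves.
print(type(42))      # <class 'int'>
print(type("text"))  # <class 'str'>
print(type(greet))   # <class 'function'>
print(type(int))     # <class 'type'>

# All of them are objects.
print(all(isinstance(thing, object) for thing in (42, "text", greet, int)))  # True
```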

Classes are a type of abstraction we create with our code. A variable is the simplest type of abstraction - it’s a thing that is closely tied to a “real value” in a very simple relationship: My variable \(x\) represents the value \(y\).

Classes are more abstract - they don’t relate to a specific value directly, but rather they try to convey an idea of an object.

Example I: The Point Class

To show what we mean by “our own type”, we’ll define the Point type.

So, what is a point?

  • In a 2D space it’s a pair of values, \((x, y)\), specifying a location on a grid.

  • \(x\) and \(y\) are the coordinates of the point.

  • Points have special relations to other points and to the space they reside in.

From these three simple observations, we expect our Point type to include both data about its coordinates, and functions, or methods, used to interact with the grid and/or other points.

An object usually bundles together data (attributes) and methods we wish to express as some abstract template in our code. It might seem like a lot to write at first, but it pays off tremendously in no time.

# Introducing the class keyword:
class Point:
    """Represents a point in a 2D space."""
    pass

Point
# A new type is born in __main__
__main__.Point

The name Point is now a factory to create new Point instances. To make one, we have to call it like we do with a function:

blank = Point()
blank
<__main__.Point at 0x7f43c02ff5b0>

We call this instantiation (and blank is now an instance of Point).

# Assign the point's data in the form of coordinates
blank.x = 1.0
blank.y = 0.0

# x and y are now attributes of our instance:
blank.x
1.0

The . means x is an attribute or method (callable) of blank (and of course there’s no conflict between a variable named x and blank.x).

print(1 + blank.x)
2.0
f"A case of a pointy Point at {(blank.x, blank.y)}"
'A case of a pointy Point at (1.0, 0.0)'
def print_point(p: Point) -> None:
    """Print a Point object.
    
    Parameters
    ----------
    p : Point
        The point instance to print
    """
    print(f"{p.x, p.y}")

print_point(blank)
(1.0, 0.0)
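
Note that attributes assigned this way belong to one particular instance, not to every Point. The following sketch (which redefines an empty Point so that it runs on its own; the variable names are arbitrary) shows that a second instance starts out without them:

```python
class Point:
    """Represents a point in a 2D space."""


first = Point()
first.x = 1.0
first.y = 0.0

second = Point()  # a fresh instance, no coordinates assigned yet

print(vars(first))           # {'x': 1.0, 'y': 0.0}
print(vars(second))          # {}
print(hasattr(second, "x"))  # False

# Reading a missing attribute raises an AttributeError.
try:
    second.x
except AttributeError as error:
    print(f"AttributeError: {error}")
```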

Exercise: calculate_distance()

Write the calculate_distance() function that takes two points (p1 and p2) and returns the Euclidean (straight-line) distance between them.

Example II: Rectangles

Take a minute to think how you would implement a Rectangle class.

Here are a couple of options:

  • We can decide to define it with a point (corner or center) and two sides.

  • We can also use two opposing points.

We’ll go with the first option, with the point being the corner.

class Rectangle:
    """
    Rectangle model.
    
    Attributes
    ----------
    corner : Point
        Bottom left corner
    height : float
        Length of vertical side
    width : float
        Length of horizontal side
    """
    pass

rect = Rectangle()
rect.width = 100.0
rect.height = 200.0
corner = Point()
corner.x = 0.0
corner.y = 0.0
rect.corner = corner

rect
<__main__.Rectangle at 0x7f43c02c7220>

We can return instances of classes (just like we do with instances of dictionaries):

def find_center(rect: Rectangle) -> Point:
    """ 
    Return a Point instance with coordinates pointing to the center of the Rectangle.
    
    Parameters
    ----------
    rect : Rectangle
        Rectangle instance to calculate the center of
    
    Returns
    -------
    Point
        Rectangle center
    """
    p = Point()
    p.x = rect.corner.x + rect.width / 2
    p.y = rect.corner.y + rect.height / 2
    return p

center = find_center(rect)
print_point(center)
(50.0, 100.0)

Also, objects are mutable:

def grow_rectangle(rect: Rectangle, dwidth: float, dheight: float) -> None:
    """
    Take a Rectangle instance and grow it by (*dwidth*, *dheight*).

    Parameters
    ----------
    rect : Rectangle
        Rectangle instance to grow
    dwidth : float
        Width delta
    dheight : float
        Height delta
    """
    rect.width += dwidth
    rect.height += dheight
print(rect.width, rect.height)
100.0 200.0
grow_rectangle(rect, 100, 100)
print(rect.width, rect.height)
200.0 300.0
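
Mutability also means that assigning an object to a second name does not copy it; both names refer to the same instance. A minimal sketch, using a stripped-down Rectangle and the standard library's copy module (the variable names are arbitrary):

```python
import copy


class Rectangle:
    """Bare-bones rectangle used only for this demonstration."""


rect = Rectangle()
rect.width = 100.0

alias = rect                # a second name for the same object
snapshot = copy.copy(rect)  # an independent (shallow) copy

rect.width += 50.0

print(alias is rect)   # True
print(alias.width)     # 150.0
print(snapshot.width)  # 100.0
```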

Methods

We really haven’t done object-oriented programming yet. Our objects currently contain only data (as attributes), and we wrote independent functions to manipulate them as required. Methods are functions bound to objects, describing actions they can do, or that can be done to them.

For example, a real-world car can drive. So a Car object should have a drive() method. It should also have a park() method, and a couple of attributes, like number_of_wheels, manufacturer and model.

As we’ll see in a second, the only difference between methods and functions is that methods are part of an object, and they only make sense in the context of that object or an instance of it. A park() method has no meaning when we try to run it on a Rectangle.

We’ve already met many methods and used them successfully. For example, we used the append() method of a list instance. In this case it’s clear why a method is always bound to a specific class - it’s irrelevant to “append” an item to an object which is not a list.

Let’s add a method to our Point object:

class Point:
    """A 2D point."""
    
    def transpose(self): # The first argument to a method is always the calling instance itself (self)
        """Transposes by flipping x and y"""
        self.x, self.y = self.y, self.x

p = Point()
p.x, p.y = 10, 20
print(f"transpose is now of type: {type(p.transpose)}")
transpose is now of type: <class 'method'>
print(f"Before:\tp.x: {p.x}, p.y: {p.y}")
p.transpose()
print(f"After:\tp.x: {p.x}, p.y: {p.y}")
Before:	p.x: 10, p.y: 20
After:	p.x: 20, p.y: 10

The conceptual change here is the following: The active agents here are the objects, not the functions. Instead of transpose(point) we have the point transpose itself with p.transpose().

In general, most functions that take an instance of some object as one of their parameters should be a candidate for becoming a method, bound to that object, since you might need it later on for other instances as well.

Note

Even though methods have self as their first argument, when we call them we don’t need to pass that first parameter. self acts as a reference to the instance, or object, that we’re currently handling. This is what makes methods “special” - they work with the data “inside” the object they’re a part of, and can modify this data if needed. All methods must be defined with the self parameter as their first parameter (self isn’t actually a special keyword, rather it’s just the convention for the first argument in the method definition).

# This doesn't work, look at the number of arguments:
p.transpose(p)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-cde66be33c83> in <module>
      1 # This doesn't work, look at the number of arguments:
----> 2 p.transpose(p)

TypeError: transpose() takes 1 positional argument but 2 were given

Two arguments were given? We gave only one. The second was the self argument that is implicitly passed.
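
One way to see the implicit argument is to call the method through the class instead of the instance. In that form nothing is bound yet, so the instance has to be passed explicitly. A short sketch (Point is redefined so that the snippet is self-contained):

```python
class Point:
    """A 2D point."""

    def transpose(self):
        """Transposes by flipping x and y."""
        self.x, self.y = self.y, self.x


p = Point()
p.x, p.y = 10, 20

p.transpose()       # Python passes p as self for us
print(p.x, p.y)     # 20 10

Point.transpose(p)  # the same call, with self spelled out
print(p.x, p.y)     # 10 20
```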

One thing is still missing, though. Each time we create a point we go through a three-step process:

  1. Create the instance: p = Point()

  2. Add the x attribute: p.x = 2

  3. Add the y attribute: p.y = 3

First, it would be nice if we could make this process shorter. Second, the Point instance is really unusable unless it has both attributes (x, y) set, so we want to make sure that we don’t have a Point without both x and y. This is accomplished by the __init__ method.

The __init__ Method

Classes have several special methods attached to them. While most are out of the scope of this course, the __init__() method is regularly used and we should definitely familiarize ourselves with it.

The __init__() method allows us to define the attributes of each instance inside the class definition:

class Point:
    """A 2D point."""

    def __init__(self, x: float, y: float) -> None:
        """
        Initialize a new Point instance.

        Parameters
        ----------
        x : float
            X-axis coordinate
        y : float
            Y-axis coordinate
        """
        self.x = x
        self.y = y

    def transpose(self) -> None:
        """
        Transposes by flipping *x* and *y*.
        """
        self.x, self.y = self.y, self.x

Now, in order to create a Point instance, we have to pass in the two arguments that the __init__() method requires:

p = Point(10, 20)
print(f"p.x: {p.x}, p.y: {p.y}")
p.transpose()
print(f"p.x: {p.x}, p.y: {p.y}")
p.x: 10, p.y: 20
p.x: 20, p.y: 10
p2 = Point()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-13fd077f7b13> in <module>
----> 1 p2 = Point()

TypeError: __init__() missing 2 required positional arguments: 'x' and 'y'

As we said, this is better because it forces users of Point to initialize all attributes, which makes the other Point methods easier to use. Chances are that the first method you’ll write for a newly defined class is the __init__() method.

Let’s look at a broader example using the Rectangle class we defined earlier, together with the two functions we wrote for it.

class Rectangle:
    """
    Representation of a rectangle in Cartesian space based on the
    bottom left corner and the sizes of its sides.
    """

    def __init__(self, corner: Point, height: float = 10, width: float = 10):
        """
        Initialize a Rectangle instance.

        Parameters
        ----------
        corner : Point
            Bottom left corner
        height : float
            Length of vertical side
        width : float
            Length of horizontal side
        """
        self.corner = corner
        self.height = height
        self.width = width

    def find_center(self) -> Point:
        """
        Return a Point to the center of the Rectangle box.

        Returns
        -------
        Point
            This rectangle instance's center
        """
        x = self.corner.x + self.width / 2
        y = self.corner.y + self.height / 2
        return Point(x, y)

    def grow(self, dwidth: float, dheight: float) -> None:
        """
        Change this instance's size by (*dwidth*, *dheight*).

        Parameters
        ----------
        dwidth, dheight : float
            Change the first and second axes by +dwidth/dheight
        """
        self.width += dwidth
        self.height += dheight

    def move_to_origin(self) -> None:
        """Moves the center of the rectangle to (0, 0)."""
        center = self.find_center()
        centered = center.x == 0 and center.y == 0
        if not centered:
            self.corner = Point(-self.width / 2, -self.height / 2)

Note

  • Class names are written in CamelCase.

  • The docstring of the entire class describes its general purpose.

  • The __init__ method takes in three arguments, but two of them are optional.

  • We added the two functions we defined earlier to the class as methods, since they only operate on rectangles in the first place.

rect = Rectangle(p)
print(f"rect.width: {rect.width}, rect.height: {rect.height}")
rect.width: 10, rect.height: 10

If we now wish to create a new Rectangle instance and find its center, we can:

corner = Point(10, 10)
rect = Rectangle(corner, 4, 4)
center = rect.find_center()
print(f"The center of the rectangle is {(center.x, center.y)}")
The center of the rectangle is (12.0, 12.0)

Move it to origin:

rect.move_to_origin()
new_center = rect.find_center()
print(f"The center of the moved rectangle is {(new_center.x, new_center.y)}")
The center of the moved rectangle is (0.0, 0.0)

Note how the object modifies itself and acts upon itself using its methods. We’re not modifying the internal parts of the instance ourselves, we let the methods do it for us.

The __str__ Method

Another interesting dunder (“double underscore”) method is the __str__() method, which defines what an instance of the class will show when invoked with the print(class_instance) command. For example:

class ShoppingList:
    def __init__(self, vegetables=10, fruit=5, bread=1):
        self.vegetables = vegetables
        self.fruit = fruit
        self.bread = bread
    
    def __str__(self):
        n_items = self.vegetables + self.fruit + self.bread
        return f"""
        Shopping List:
            Vegetables: {self.vegetables}
            Fruits: {self.fruit}
            Bread: {self.bread}
            Total items: {n_items}
        """
shopping_list = ShoppingList()
print(shopping_list)
        Shopping List:
            Vegetables: 10
            Fruits: 5
            Bread: 1
            Total items: 16
        

Note

We can change the order of parameters when using keyword arguments, e.g.:

shopping_list_2 = ShoppingList(fruit=5, bread=1, vegetables=3)
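
Going back to __str__() for a moment: print() is not the only place where it is used, since str() and f-strings call it as well. A small sketch with a hypothetical Temperature class (not part of the course material):

```python
class Temperature:
    """Hypothetical class used only to demonstrate __str__."""

    def __init__(self, celsius: float):
        self.celsius = celsius

    def __str__(self) -> str:
        return f"{self.celsius:.1f} C"


reading = Temperature(21.456)
print(reading)                         # 21.5 C
print(str(reading))                    # 21.5 C
print(f"Room temperature: {reading}")  # Room temperature: 21.5 C
```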

Exercise: The __str__ Dunder Method

Implement a __str__() method for the Point class from earlier.

Operator Overloading

One of the most interesting properties of Python (although it’s not unique to it) is operator overloading. It means that we can force our self-declared types (i.e. classes) to behave in a certain way with the standard mathematical operations.
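
Built-in types already work this way: roughly speaking, an expression such as a + b is translated into a call to the __add__() method of a. A short sketch using only built-in types:

```python
# The + operator and an explicit __add__() call give the same results.
print(1 + 2, (1).__add__(2))              # 3 3
print([1, 2] + [3], [1, 2].__add__([3]))  # [1, 2, 3] [1, 2, 3]
print("ab" + "cd", "ab".__add__("cd"))    # abcd abcd

# Other operators have their own methods, e.g. * maps to __mul__().
print("ab" * 2, "ab".__mul__(2))          # abab abab
```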

We’ll use the ShoppingList class as an example. Say we want to add two different shopping lists. Naively, we might just try the following:

shopping_list_a = ShoppingList()
shopping_list_b = ShoppingList()
print(shopping_list_a + shopping_list_b)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-0334a09f9174> in <module>
      1 shopping_list_a = ShoppingList()
      2 shopping_list_b = ShoppingList()
----> 3 print(shopping_list_a + shopping_list_b)

TypeError: unsupported operand type(s) for +: 'ShoppingList' and 'ShoppingList'

To us, this expression seems completely fine - adding two shopping lists should just concatenate the items one after the other. The fact that it’s a very readable line of code makes it a good line of code, since you have to remember that we write code for humans to read, not computers.

Unfortunately, Python can’t add two shopping lists because it was never taught how to do that. Luckily, we can override the behavior of the addition operator, by defining the __add__() method in the class definition:

class ShoppingList:
    """Represents a shopping list."""

    def __init__(self, vegetables: int = 10, fruit: int = 5, bread: int = 1):
        """
        Initialize a ShoppingList instance.

        Parameters
        ----------
        vegetables : int
            Number of vegetable items
        fruit : int
            Number of fruit items
        bread : int
            Number of bread items
        """
        self.vegetables = vegetables
        self.fruit = fruit
        self.bread = bread

    def __str__(self) -> str:
        """
        Return a string representation of this instance.

        Returns
        -------
        str
            String representation
        """
        n_items = self.vegetables + self.fruit + self.bread
        return f"""
        Shopping List:
            Vegetables: {self.vegetables}
            Fruits: {self.fruit}
            Bread: {self.bread}
            Total items: {n_items}
        """

    # ----- New method below: ------
    def __add__(self, other: "ShoppingList") -> "ShoppingList":
        """
        Combines two shopping lists and returns the result.

        Notes
        -----
        This method returns a new shopping list, meaning it doesn't modify
        any of the existing instances it was given.

        Parameters
        ----------
        other : ShoppingList
            Another shopping list

        Returns
        -------
        ShoppingList
            Combined shopping list
        """
        return ShoppingList(
            vegetables=self.vegetables + other.vegetables,
            fruit=self.fruit + other.fruit,
            bread=self.bread + other.bread,
        )

Now we can safely add two ShoppingList instances together:

shopping_list_a = ShoppingList()
shopping_list_b = ShoppingList()

shopping_list_c = shopping_list_a + shopping_list_b
print(shopping_list_c)
        Shopping List:
            Vegetables: 20
            Fruits: 10
            Bread: 2
            Total items: 32
        

Addition of something other than a ShoppingList instance will result in an AttributeError.

shopping_list_a + 1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-31-e48709e66585> in <module>
----> 1 shopping_list_a + 1

<ipython-input-29-1d6b0f8bbde6> in __add__(self, other)
     58         """
     59         return ShoppingList(
---> 60             vegetables=self.vegetables + other.vegetables,
     61             fruit=self.fruit + other.fruit,
     62             bread=self.bread + other.bread,

AttributeError: 'int' object has no attribute 'vegetables'
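
The AttributeError above is a side effect of other lacking the expected attributes. A common alternative is to check the type of other and return the special value NotImplemented, in which case Python raises a TypeError like the one built-in types produce. The sketch below uses a hypothetical, simplified Basket class rather than ShoppingList:

```python
class Basket:
    """Hypothetical, simplified stand-in for ShoppingList."""

    def __init__(self, items: int = 0):
        self.items = items

    def __add__(self, other: "Basket") -> "Basket":
        if not isinstance(other, Basket):
            # Tell Python this operand type is unsupported.
            return NotImplemented
        return Basket(items=self.items + other.items)


total = Basket(2) + Basket(3)
print(total.items)  # 5

# Adding an unsupported type now fails with a TypeError.
try:
    Basket(2) + 1
except TypeError as error:
    print(f"TypeError: {error}")
```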

Exercise: Dunder Methods

Modify the __add__() method so that if other is an integer, it will simply add that number to all items in the shopping list.

Note

To learn more about other Python operators, see this overview.

Summary

OOP is the most important programming paradigm for you to master on your Python journey. Some problems fit this paradigm like a hand in a glove; however, it’s not the “ultimate” answer to every design difficulty you have. Some problems can be solved by using intricate objects and multiple inheritance, but in reality they’re much simpler when solved using a procedural design. Remember to write code that humans, and especially your future self, can read and understand.

With that being said, throughout this course I prefer you write too many objects over writing too few. Whether you’ll be writing new objects every day or not, you will certainly be using them any time you write Python code, and creating classes will nurture your confidence when doing so.

Exercise: The Vector Class

  1. Create a Vector class that simulates a 1D vector array. Assume the inputs to the class are valid. The Vector instance should be initialized with at least two attributes.

  2. The Vector class should enable adding either an integer or a different vector to a vector.

  3. The Vector class should enable checking which of two vectors is bigger, element-wise. The output is another vector with the corresponding True and False values.