2 Introduction to Python

In this session, we will introduce you to Jupyter Notebooks and guide you through some of the essential programming concepts that form the foundation of working with data. Whether you are completely new to programming or looking to strengthen your understanding, this chapter is designed to help you build confidence as you take your first steps into coding.

This session is specific to Python, but (as you will find out in a few lines) Jupyter Notebook is compatible with many programming languages, and the programming concepts introduced here are fundamental to every programming language.

A Jupyter Notebook is an interactive environment where you can write and execute code in small, manageable pieces, known as cells. This allows you to see the results of your code immediately, making it an excellent tool for learning, experimenting, and exploring data. It combines text, code, and the results of that code all in one place, making it a popular choice for data scientists, researchers, and educators.

In this session, we will be using Python, a programming language known for its simplicity and readability, which makes it ideal for beginners. We will cover core programming concepts, such as variables, functions, and loops, and you will learn how to apply these concepts to perform basic data exploration tasks.

Getting familiar with Jupyter Notebook

Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text. It supports various programming languages, including Python, R, Julia, and more. However, it is most commonly used with Python.

Jupyter Notebook provides an interactive computing environment where you can write and execute code in a series of cells. Each cell can contain code, markdown text, equations, or visualizations. You can run individual cells or the entire notebook to see the output of the code and the results of any computations.

The name “Jupyter” combines the names of three programming languages: Julia, Python, and R, which were the first three languages supported by the Jupyter project. Jupyter was originally developed as part of the IPython project, but has since evolved into a language-agnostic tool that supports multiple programming languages.

In the context of Python, Jupyter Notebook is a popular tool for data analysis, scientific computing, machine learning, education, and research. It allows users to write, test, and document Python code in an interactive and visually appealing manner, making it a valuable tool for both beginners and experienced programmers alike.

Jupyter Notebook cells

Jupyter Notebook cells can be either code, markdown, or raw. For the simple purpose of programming and writing text, you can ignore the raw option. You can easily switch between code and markdown by selecting the cell, pressing Esc, and then M for markdown or Y for code.

Markdown is a language for formatting text: it allows you to quickly and easily create formatted documents using a simple and intuitive syntax. This cell, like every other cell displaying text in this notebook, is written in markdown. You can learn the basics of markdown syntax in a few minutes by reading here, or simply by looking at the content of the text cells in this notebook and seeing what happens when you select them and run them.

You can tell that the selected cell is a code cell because you will see square brackets on its left ([ ]:).

If you want to delete a cell, use Esc + DD (press Esc and then D twice).

WARNING: If your code cell has empty square brackets, it means it has not been run YET.

Main programming concepts

There are some programming concepts that are common to all programming languages and can be found in any program:

variables and data types;
sequences of objects;
functions;
loops;
conditional statements;
packages (also called libraries or modules)

Variables and data types

In programming, a variable is a container for a value. This value can be a number, a string (a piece of text, such as a word), or any other type of programming object (we will talk about other possible objects later). Let’s initialise (define for the first time) some variables:

name = 'Stefano'
favourite_planet = 'Saturn'
birth_day = 6

In the previous cell we stored the word Stefano in the variable name, the word Saturn in the variable favourite_planet, and the value 6 in the variable birth_day. From now on, every time we need to use one of these values in our code, we just need to type the corresponding variable name.

In Jupyter notebooks, if you want to check the value contained in a variable (so its content), you can simply run a cell with the variable name inside:

name

'Stefano'

birth_day

6

name
birth_day

6

As you can see, when you write several variable names in the same cell, only the value of the last one is displayed.
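If you want to see more than one value from the same cell, one option is to call print() on each variable (print() is covered in more detail in the Functions section). A minimal sketch, repeating the variables defined above so that the cell runs on its own:

```python
# Variables from the earlier cell, repeated so this cell is self-contained
name = 'Stefano'
birth_day = 6

# print() displays every value, not only the last one in the cell
print(name)
print(birth_day)

# A single print() call can also take several values at once
print(name, birth_day)
```

The last line shows both values on the same row, separated by a space.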

Sequences of objects

We can store single numbers and words inside a variable, but what if we want to store a sequence of numbers or words, or a mix of the two? We can do that using a Python object called a list:

names = ['Stefano','Pippo','Alfio','Tano']
ages = [20,34,94,'unknown']

names[0]

'Stefano'

ages[3]

'unknown'

In Python, lists are defined by listing a sequence of values separated by commas inside square brackets (variable_name = [… , … , …]). Values stored in a list can be accessed using indexing. In Python you count items starting from 0, so the first item in a list has index 0. This means that to access the first item in the list names we type names[0], and to access the last item in the list ages we type ages[3].
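Writing ages[3] works only because we know the list has four items. The sketch below shows two other ways to reach the last item that do not depend on the length being known in advance. It uses len(), a function introduced later in this chapter, and negative indices, which count from the end of the list. The variable names first_name, last_age, and also_last_age are chosen just for this example.

```python
names = ['Stefano', 'Pippo', 'Alfio', 'Tano']
ages = [20, 34, 94, 'unknown']

first_name = names[0]              # index 0 is the first item
last_age = ages[len(ages) - 1]     # the last index is always length - 1
also_last_age = ages[-1]           # negative indices count from the end

print(first_name, last_age, also_last_age)
```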

You can create lists of any object, even lists of lists:

info = [names,ages]
info[0]

['Stefano', 'Pippo', 'Alfio', 'Tano']

info[0][0]

'Stefano'

If we want to change a particular value in a list, we first need to access it and then we need to use the operator = to specify the new list value. For example, if we want to change ‘Stefano’ into ‘Steve’, we would do:

info[0][0] = 'Steve'
info[0]

['Steve', 'Pippo', 'Alfio', 'Tano']
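One detail worth noticing: info does not hold a copy of names, it holds the list itself. Changing info[0][0] therefore also changes names, which is why names starts with 'Steve' in the dictionary examples that follow. A small sketch to check this (the is operator, not covered in this chapter, tests whether two names refer to the very same object):

```python
names = ['Stefano', 'Pippo', 'Alfio', 'Tano']
ages = [20, 34, 94, 'unknown']
info = [names, ages]

info[0][0] = 'Steve'

print(info[0])           # the list stored inside info
print(names)             # the original list shows the same change
print(info[0] is names)  # both names point to the same list object
```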

Data types: Dictionaries

In Python there are several ways to store information. We just talked about lists, which are simple ordered sequences of objects. Another kind of data structure is called a dictionary. In general, a dictionary is a reference that provides definitions or explanations of words, terms, or concepts. It is usually organised as a list of alphabetically ordered words, each with an associated explanation. A Python dictionary follows the same organising principle: each key is associated with a value.

info_dict = {'name':'Stefano','favourite_number':6}

To define a dictionary we use curly brackets ({}) instead of square brackets. Inside the curly brackets we specify key/value pairs separated by commas, with a colon between each key and its value. Keys need to be unique, while values can be any Python object.

info_dict = {'names':names}

info_dict['names']

['Steve', 'Pippo', 'Alfio', 'Tano']

To access the values contained inside a dictionary you cannot use numerical indices, as you would do for lists. Instead, you must use the name of the key related to the value. In the previous case the object names (a Python list) is associated with the key ‘names’. So, in order to access it, we need to type info_dict['names'].

In a similar way, if you want to change the value related to a key, or create a new key/value pair, you access that key and then use the ‘=’ sign to assign a new value:

info_dict['names'] = ['Steve','Josef','Alfonse','Gerrit']
info_dict['names']

['Steve', 'Josef', 'Alfonse', 'Gerrit']
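The cell above replaces the value of an existing key. Creating a new key/value pair uses exactly the same syntax, as the following sketch suggests. The replacement value 7 is made up for illustration, and the in operator is used here to check whether a key is present.

```python
info_dict = {'name': 'Stefano', 'favourite_number': 6}

# Assigning to a key that does not exist yet creates a new key/value pair
info_dict['favourite_planet'] = 'Saturn'

# Assigning to an existing key replaces its value (7 is an arbitrary example)
info_dict['favourite_number'] = 7

print(info_dict)
print('favourite_planet' in info_dict)  # is this key in the dictionary?
print(len(info_dict))                   # number of key/value pairs
```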

Functions

A function is a Python object that performs an action given some parameters. In Python, function names are usually verbs: if variables can be thought of as the subjects and objects of a sentence, functions are the verbs. Python already comes with some built-in functions that are ready to use. Here you can find the built-in Python functions.

result = print(name)
result

Stefano

The way a function works is common to all programming languages: you give the function one or more parameters, the function performs an action, and it returns a result. This happens so fast that, as a matter of fact, you can already think of a function call as its result. In the previous cell the function print() received the variable name as its input parameter and printed its value on the screen. Printing is the action; the value that print() returns is the empty value None, which is why the second line of the cell, result, displays nothing.

numbers = [1,2,3,4,5,6,7,8,9,10]
result = sum(numbers)
print(result)

55

In the previous cell, we defined a list of values (the integers from 1 to 10), then we used the function sum() to (guess what?) sum all the numbers in the list, and we stored the result in the variable result. Finally, we printed the result using the function print().

Because we know that variable values are printed automatically in Jupyter notebook cells when they contain the variable name, we could write directly:

sum(numbers)

55

Indeed sum(numbers) represents an operation that returns the value 55 and can be considered equivalent to the value 55 itself, so that when we write it in a cell, we obtain the result printed on the screen.

len(numbers)

10

The function len() is one of the most used functions on objects containing many items. It tells us how many items are contained in that object (i.e. the length of that object). The function type() returns the type of a variable:

type(numbers)

list
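Because a function call can be treated as its result, the results of several functions can be combined. The sketch below computes an average from sum() and len(), and also shows the difference between a function that returns a value and one, like print(), that only performs an action. The variable names total, count, average, and returned are chosen just for this example.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

total = sum(numbers)        # 55
count = len(numbers)        # 10
average = total / count     # 5.5

print(total, count, average)
print(type(average))        # division gives a float, not an integer

# print() displays its argument but hands back None
returned = print('hello')
print(returned)
```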

How many functions are there? Thousands, probably millions. Some of them have very intuitive names (like print() and sum()), others have more complicated names. However, every function that can be used in Python comes with its own documentation, explaining which parameters it accepts, which additional options you can specify, and which kind of result you get back when applying it. To find out about a function, just search online for its name followed by “Python documentation”, or ask ChatGPT about it.
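You can also read a function's documentation without leaving the notebook. As a small sketch, the built-in function help() prints the documentation of whatever function you pass to it, here len():

```python
# help() prints the documentation of the function passed to it
help(len)

# The same description is stored on the function itself
print(len.__doc__)
```

In a Jupyter notebook, running a cell containing len? is an IPython shortcut that shows similar information.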

Methods

Methods are functions that are object-specific. What does this mean? There are certain operations that can be performed only on a certain type of object. For example, if we consider a function that transforms lowercase characters into capital letters, it would not make much sense to apply this function to a number.

All objects in Python can have their own specific functions, and these object-specific functions are called methods. To use a method on an object, you need to apply the syntax <object_name>.method(). Do you see the difference compared to the general function syntax? In a general function we have function(par1,par2,...), while with a method we already know that the function will be applied to its object. Therefore, inside the parentheses we only have additional parameters.

Like functions, methods can accept all kinds of parameters, but, of course, their main parameter is the object itself. Let’s see some examples:

name = 'Stefano Rapisarda Arthurus Micaelus'
numbers = [1,2,3,4,5]

name.split()

['Stefano', 'Rapisarda', 'Arthurus', 'Micaelus']

We initialised two variables: a string made of several words and a list of numbers. The split() method (a string-specific function) divides the string into a list of strings according to a separator. If you don’t specify any separator (as in our case), white spaces will be considered as separators. Let’s see another example:

numbers.pop(2)

3

numbers

[1, 2, 4, 5]

The variable numbers is already a list, and using the method pop(x) we can remove the item occupying the 3rd position (index 2). The method modifies the list itself and returns the value it just removed.

How can we find out about methods if there are so many? Usually a Google search can point you to the method or function you need. In general, you can always consult the Python documentation. You will find information about string and list methods here and here, respectively.
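A few more method calls may help to fix the idea. The sketch below uses upper(), the string method that turns lowercase characters into capital letters, split() with an explicit separator, and pop() with its returned value stored in a variable. The string 'a,b,c' and the variable names are made up for this example, and hasattr() is a built-in function used here only to check that integers have no upper() method.

```python
name = 'Stefano Rapisarda Arthurus Micaelus'

upper_name = name.upper()          # string method: lowercase -> capital letters
parts = 'a,b,c'.split(',')         # split() with an explicit separator

print(upper_name)
print(parts)

numbers = [1, 2, 3, 4, 5]
removed = numbers.pop(2)           # removes the item at index 2 and returns it
print(removed, numbers)

# Integers have no upper() method, so this check gives False
print(hasattr(6, 'upper'))
```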

Loops

One of the main advantages of using machines is that they can repeat the same operation hundreds, millions, or billions of times.

Let’s say I have a list of names and I want to print them on the screen one by one:

names = ['James','Martin','Sandra','Paul','Chani']
print(names[0])
print(names[1])
print(names[2])
print(names[3])
print(names[4])

James
Martin
Sandra
Paul
Chani

This did not take us much time, because there are only 5 names, but imagine you have a list of 1000 names: writing a print() line for each of them would take a very long time. Looking at the previous cell, we notice that we repeatedly use the function print(), passing as input the values contained in the list names. Every time we need to repeat an operation many times, we can use a loop, specifically a for loop:

for i in range(5):
    print(i,names[i])

0 James
1 Martin
2 Sandra
3 Paul
4 Chani

In the previous cell the same operation (print()) is executed 5 times, but at each step, that is at each iteration, the variable i changes, going from 0 to 4, one step at a time.

To achieve this result we start by declaring for i. Here i is the variable name acting as a placeholder for a value that will change at every step of the iteration. We chose the letter i, but you can choose any other name. After for i, we need to specify which values i can assume at each iteration. in range(5) means that i will go from 0 to 4, so it will increase by 1 per iteration, stopping just before 5. Instead of a range of numbers, we can specify any other object containing several objects. In that case, at each iteration the variable i (or whatever you decide to call it) will take, in turn, each value contained in the specified object. Let’s see some examples:

for a in range(12):
    print(a)

for value in [0,1,4,5,6,7]:
    print(value)

for name in names:
    print(name)

James
Martin
Sandra
Paul
Chani
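In the first loop of this section the number 5 was written by hand inside range(). A sketch of a more flexible variant uses len() so that the loop adapts to the length of the list. The second loop relies on the fact that range() also accepts a start and a stop value.

```python
names = ['James', 'Martin', 'Sandra', 'Paul', 'Chani']

# range(len(names)) adapts to the list, whatever its length
for i in range(len(names)):
    print(i, names[i])

# range(start, stop): here i takes the values 2, 3, 4
for i in range(2, 5):
    print(i, names[i])
```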

For looping over dictionaries, the concept is the same, but the syntax is a bit different because of the key/value structure of dictionaries:

info_dict = {
    'name':'Stefano',
    'surname':'Rapisarda',
    'favourite_number':6,
    'favourite_planet':'Saturn'
}
for key,value in info_dict.items():
    print(key,':',value)

name : Stefano
surname : Rapisarda
favourite_number : 6
favourite_planet : Saturn
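The items() method gives both the key and the value at each iteration. As a sketch of two alternatives, you can also loop over the keys alone (and look up each value yourself) or over the values alone with the values() method:

```python
info_dict = {
    'name': 'Stefano',
    'surname': 'Rapisarda',
    'favourite_number': 6,
    'favourite_planet': 'Saturn'
}

# Looping over the dictionary itself yields its keys
for key in info_dict:
    print(key, ':', info_dict[key])

# values() yields only the values
for value in info_dict.values():
    print(value)
```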

WARNING: You may have noticed that the line after the for statement is indented by 4 spaces. You can make that indent using the TAB key. The indent tells Python that this line of code is inside the loop and, therefore, needs to be repeated. Lines written without indents, before or after the loop, are executed normally, i.e. once, one after the other.

print('Beginning of the for loop, we will have 10 iterations')
print('='*72)
for i in range(10):
    print('This is iteration number:',i)
    print('The next iteration will be:',i+1)
    print('End of iteration',i)
    print('-'*62)
print('='*72)
print('End of the loop')

Beginning of the for loop, we will have 10 iterations
========================================================================
This is iteration number: 0
The next iteration will be: 1
End of iteration 0
--------------------------------------------------------------
This is iteration number: 1
The next iteration will be: 2
End of iteration 1
--------------------------------------------------------------
This is iteration number: 2
The next iteration will be: 3
End of iteration 2
--------------------------------------------------------------
This is iteration number: 3
The next iteration will be: 4
End of iteration 3
--------------------------------------------------------------
This is iteration number: 4
The next iteration will be: 5
End of iteration 4
--------------------------------------------------------------
This is iteration number: 5
The next iteration will be: 6
End of iteration 5
--------------------------------------------------------------
This is iteration number: 6
The next iteration will be: 7
End of iteration 6
--------------------------------------------------------------
This is iteration number: 7
The next iteration will be: 8
End of iteration 7
--------------------------------------------------------------
This is iteration number: 8
The next iteration will be: 9
End of iteration 8
--------------------------------------------------------------
This is iteration number: 9
The next iteration will be: 10
End of iteration 9
--------------------------------------------------------------
========================================================================
End of the loop
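The same rule matters when a loop is used to compute something rather than just to print. In the sketch below (the variable names are chosen just for this example), the indented lines run once per item, while the last line runs a single time after the loop has finished:

```python
numbers = [1, 2, 3, 4, 5]
total = 0

for number in numbers:
    total = total + number             # indented: runs once per item
    print('Running total:', total)     # indented: runs once per item

print('Final total:', total)           # not indented: runs once, after the loop
```

If the last line were indented as well, the final total would be printed five times instead of once.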

Conditional statements

We have seen how to store data and information in variables and how to access this information by indexing, that is by referring to the position of values inside an object. How about selecting information using other criteria? What if we want to display people’s names only if they are older than 30, or print the names of towns that start with an ‘s’? To do that in programming we need to use conditional statements. Conditional statements are conditions that need to be satisfied in order for something to happen. What is “something”? Whatever action we want: an operation, a printing function, etc.

for key,value in info_dict.items():
    if 'favourite' in key:
        print(key,':',value)
    else:
        print('Not interested!')

Not interested!
Not interested!
favourite_number : 6
favourite_planet : Saturn

We used the same for loop to explore the dictionary’s keys and values, but this time, inside it, we wrote a conditional statement. The syntax for a conditional statement is:

if <condition>: 
    action
else:
    other_action

<condition> is the condition that needs to be satisfied. In this case we want the word ‘favourite’ to be contained inside the key. If this happens, the condition is True and the “action” is performed (in our case, key and value will be printed). If the condition is False, the “other_action” will be performed (in our case, the ‘Not interested!’ message will be printed).

You can also write conditions that compare quantities:

numbers = [1,2,3,4,5,6,7]
for number in numbers:
    if number < 4:
        print(number,'is smaller than 4')
    elif number > 4:
        print(number,'is larger than 4')
    elif number == 4:
        print(number,'is exactly 4')

1 is smaller than 4
2 is smaller than 4
3 is smaller than 4
4 is exactly 4
5 is larger than 4
6 is larger than 4
7 is larger than 4
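Comparisons like these are enough to answer the two questions asked at the beginning of this section. The sketch below uses invented data (the ages and the town names are hypothetical) and the list method append(), which adds an item to the end of a list:

```python
# Hypothetical data, invented for this example
names = ['Stefano', 'Pippo', 'Alfio', 'Tano']
ages = [20, 34, 94, 28]

# Collect the names of people older than 30
older_than_30 = []
for i in range(len(names)):
    if ages[i] > 30:
        older_than_30.append(names[i])

print(older_than_30)

# Print only the towns whose name starts with 'S'
towns = ['Siracusa', 'Catania', 'Utrecht', 'Salerno']
for town in towns:
    if town[0] == 'S':      # strings can be indexed like lists
        print(town)
```

Note that the comparison is case sensitive: 'S' and 's' are different characters.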

Conditional statements may also be combined:

for number in numbers:
    if (number < 4) or (number > 4):
        print(number, 'is not 4')
    else:
        print(number, 'must be 4')

1 is not 4
2 is not 4
3 is not 4
4 must be 4
5 is not 4
6 is not 4
7 is not 4

In the example with if and elif we used three conditions, satisfied when a number is smaller than, larger than, or equal to 4. In the last example we combined two conditions with or: the combined condition is True when at least one of the two is True, so it is False only for the number 4.

Using loops in combination with conditional statements is particularly useful when it’s time to select data. For example, imagine we have data in a table with two columns, where one contains years and the other can be any kind of measurement. In this case, you can use conditional statements to select measurements in very specific time intervals.
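As a rough sketch of this idea, the table can be represented with two lists, one per column. The years and measurements below are invented for illustration, and the two conditions are combined with and, which is True only when both of them are True:

```python
# Hypothetical table: one list per column, values invented for illustration
years = [2015, 2016, 2017, 2018, 2019, 2020]
measurements = [3.2, 3.8, 4.1, 3.9, 4.6, 5.0]

selected = []
for i in range(len(years)):
    # keep rows whose year falls between 2017 and 2019, both included
    if (years[i] >= 2017) and (years[i] <= 2019):
        selected.append(measurements[i])

print(selected)
```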

Packages

There are millions of functions and objects out there, so how can we use them? A Python installation does not come with ALL the functions ever written for Python. Functions and objects are usually organized in packages (also called libraries or modules). Each package contains a set of tools specific to certain tasks. There are tools for statistics, machine learning, building websites, text mining, etc. How can we access all these tools? First of all, we need to download the package onto our computer. Usually the documentation page of the package contains installation instructions. Once installed, the package needs to be imported.

import pandas as pd
from matplotlib import pyplot as plt

In the previous cell we imported two packages, pandas and pyplot. When we import something, it is convenient to choose an alias for it, so that, when needed, we don’t need to write its entire name. In our case, pd will be the alias for pandas.

In the second line we see a slightly different statement. In this case, we import the package pyplot. This package is a sub-package of the massive library matplotlib. Therefore, we need to specify the macro-package containing pyplot. We could also import pyplot in the following way:

import matplotlib.pyplot as plt

From now on, every time we need to use a pandas function or object, we just need to specify the alias of the package before the function or object we want to use:

df = pd.DataFrame()

In the previous case, we initialised a variable called df with a pandas DataFrame. Let’s see what happens if we forget to specify pd:

df = DataFrame()

NameError: name 'DataFrame' is not defined

We obtain an error because Python does not recognize the name DataFrame on its own: it is only available through the alias pd.
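The two import styles can be tried without installing anything, because some packages ship with Python itself. The sketch below uses statistics and math, both part of the standard library. The alias st and the example values are our own choices for illustration.

```python
import statistics as st          # import a whole module under an alias
from math import sqrt            # import a single function from a module

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = st.mean(values)     # module functions need the alias prefix
root = sqrt(16)                  # functions imported by name do not

print(mean_value, root)
```

Writing mean(values) without the st. prefix would raise the same kind of NameError we saw above, because only the name st was imported.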