Glossaries

Data Glossary

Controlled vocabularies

Standardised sets of terms or phrases used to ensure consistency and accuracy in categorising and retrieving information.

Data cleaning

The process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataset to improve its quality and reliability.

Data harmonisation

The process of integrating and standardising data from different sources or formats to ensure consistency and compatibility for analysis or other purposes.

Data models

Abstract representations defining the structure, relationships, and constraints of data within a system or database.

Enrichment

The process of enhancing or augmenting existing data with additional information to improve its quality, usability, or value.

Graph database

A database structured around graph theory, where data entities are represented as nodes and their relationships as edges, facilitating complex and interconnected data querying.

ID

Identification or identifier used to uniquely distinguish an entity within a system.

Normalisation

The process of organising data in a database to reduce redundancy and dependency by dividing large tables into smaller ones and defining relationships between them.

Relational database

A type of database management system (DBMS) organised around tables and relationships, adhering to the principles of the relational model.

Programming glossary

Code

Code is like a set of instructions that tells the computer what to do. It’s similar to a recipe for the computer. The instructions for a specific task may very according to the programming language you use, that is why you also usually specify the language you are using (e.g. “Python code”).

csv

CSV (coma separated values) is a way to store information, like making lists. It’s a simple way to organize data, like names and ages, using commas. A file containing data organised in this way, has usually the extention “.csv”. Even if the word “coma” is present in the acronym, data can be also separated by other symbols such as “;” or “:” and still be contained in a .csv file.

DataFrame

A DataFrame is a Python object included in the pandas library. It is basically a table where information is organised in rows and columns. Every DataFrame row has an index that can be either a numeric value or a string (i.e. a label)

Initialisation

Initialise basically means getting things ready. It’s the starting point before using something in a program. Initialising a variable, in particular, means assigning a value to it for the first time:

# We initialise the variable name and age for the first time with a string (a word) and a number
name = 'Stefano'
age = 28

Library

See package

Loop

A loop is like a computer doing something over and over again. It’s a way to repeat a task multiple times.

Object

An object is like a thing in the computer’s world. It has characteristics and things it can do. For example, a dog can have a name and bark.

Package

A library or package is like a toolbox with ready-made tools. It’s a collection of helpful code (objects and functions) that programmers can use to make their work easier. Packages are made by the programming community and are usually organised according to certain specific tasks. You may find packages specific for time series analysis, text analysis, satellite image analysis, etc. The advantage of using a package is that you do not have to spend time and energy in finding solutions to problems already tackled by other people. Packages do not usually come automatically with the basic programming language installation (so to optimize space), but need to first be downloaded and imported. You may think at packages as building tools. Downloading a package would be similar to buying them from the shop and importing them is similar to get them ready to work (you do not need all your tools ALL the time for ANY house job, right?)

Series

A Series is a DataFrame with a single column. Like DataFrames, a Series is an object belonging to the pandas library. Series rows, like in a DataFrame, have indices that can be either values or labels.

Variable

A variable is like a box where you can keep and change information. It is a container for numbers or words.