Summary

Databases and Data Analysis

In the first part of the workshop, we conducted an interactive exercise to experience the main challenges associated with converting unstructured data into structured, organized, tabular (and generally more mathematical) data. Having structured, machine-readable data is essential for proper data analysis. It is important to note that database creation, which marks the beginning of every data analysis process, can be affected by errors, missing data, and biases driven by human decisions.
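A minimal sketch of this conversion step in Python may help make the challenges concrete. The notes, the sentence pattern, and the `to_record` helper below are all hypothetical and invented for illustration; they are not the material used in the workshop. The sketch shows how a human decision (here, the assumed sentence pattern and the choice to store a missing age as `None`) shapes the resulting table.

```python
import re

# Hypothetical free-text notes, invented for illustration only.
notes = [
    "Anna, 34 years old, lives in Rome",
    "Marco lives in Milan",  # the age is missing in this note
    "Lucia, 29 years old, lives in Turin",
]

# Assumed pattern: "<name>[, <age> years old,] lives in <city>".
# Real unstructured data rarely follows a single pattern this neatly.
PATTERN = re.compile(
    r"^(?P<name>\w+)(?:, (?P<age>\d+) years old,)? lives in (?P<city>\w+)$"
)


def to_record(text):
    """Turn one note into a row (dict), or None if the note does not fit."""
    match = PATTERN.match(text)
    if match is None:
        return None
    age = match.group("age")
    return {
        "name": match.group("name"),
        # Human decision: a missing age is stored as None, not guessed.
        "age": int(age) if age is not None else None,
        "city": match.group("city"),
    }


records = [to_record(note) for note in notes]
for record in records:
    print(record)
```

Notes that do not fit the assumed pattern come back as `None`, so they can be counted and inspected rather than silently lost.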

In the second part of the workshop, we focused on data analysis. Data analysis is often considered only in relation to programming tools. We aimed instead to emphasize its fundamental principles: analyzing data essentially means querying a dataset. While some information may be easy to retrieve, other information may be hidden or may require assumptions and speculation. Python (or any other programming language) is simply a tool for translating our questions into a machine-readable form.
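As a small illustration of analysis as querying, the sketch below asks two questions of a tiny dataset in Python. The dataset and the variable names are hypothetical and invented for this example. The first question can be answered directly; the second requires an assumption (how to treat a missing value), which is stated in a comment so that it is documented.

```python
# Hypothetical dataset, invented for illustration only.
people = [
    {"name": "Anna", "age": 34, "city": "Rome"},
    {"name": "Marco", "age": None, "city": "Milan"},
    {"name": "Lucia", "age": 29, "city": "Rome"},
]

# Easy question: who lives in Rome?
in_rome = [p["name"] for p in people if p["city"] == "Rome"]

# Harder question: what is the average age?
# Assumption: records with a missing age are excluded from the average.
known_ages = [p["age"] for p in people if p["age"] is not None]
average_age = sum(known_ages) / len(known_ages)
n_excluded = len(people) - len(known_ages)

print("Living in Rome:", in_rome)
print("Average age:", average_age, "(records excluded:", n_excluded, ")")
```

Reporting how many records were excluded alongside the result lets a reader judge how much the assumption matters.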

Regardless of our data analysis process and the tools we use, it is crucial to describe and document our choices so that other researchers can reproduce the entire workflow that led to our conclusions.

General data analysis workflow

  1. Define a Research Question:
    • Understand the problem or question you are trying to address;
    • Clearly define the goals, objectives, and sub-tasks needed to answer the research question.
  2. Collect/Organise Data:
    • Collect relevant data from various sources;
    • Ensure data quality: address any missing or inconsistent data and check that the data are properly structured.
  3. Clean Data:
    • Clean and preprocess the data to handle missing values, outliers, and errors;
    • Standardize or normalize data formats if necessary.
  4. Explore Data:
    • Explore the data using statistical and visual methods;
    • Identify patterns, trends, and relationships in the data.
  5. (Model):
    • Select appropriate models based on the analysis goals;
    • Evaluate the model’s performance using metrics relevant to the analysis;
    • Fine-tune the model if necessary.
  6. Interpret Data:
    • Interpret the results of the analysis in the context of the initial research question;
    • Draw conclusions and make recommendations based on the findings.
  7. Visualize and Report:
    • Create visualizations to communicate key findings;
    • Prepare a comprehensive report summarizing the analysis process, results, and insights.