Course manual 2021/2022

Course content

Timely science supporting our future planet requires well-developed skills in data handling, analysis and modelling. We therefore offer a modular course in which students can develop these skills in the field of their choice.

Modules to choose from:

  • Programming in Python
  • Programming in R
  • Modelling and Simulation
  • Agent Based Models
  • Basic Statistics in R
  • Statistical Learning A (and more advanced B, C and D)

These modules are all illustrated with examples from earth science and ecology.

Study materials

Practical training material

  • Manual
  • Data
  • Exercises

Objectives

  • The students will study key theoretical concepts of diverse data analysis and/or modelling methodologies
  • The students will master basic and some advanced skills, necessary for independently applying data analysis techniques and/or developing and applying dynamic models
  • The students will develop an attitude to independently solve analytical problems using statistical techniques and/or dynamic models and communicate the results and conclusions

Teaching methods

  • Coached self-tuition
  • Lecture
  • Seminar

Each module starts with an introductory lecture that explains the subject matter. Students then work on the module during scheduled computer lab sessions.

Learning activities

Activity          Number of hours
Lectures          8
Short seminars    8
Self-study        140
Tests             12
Total             160

Attendance

Requirements of the programme concerning attendance (OER-B):

  1. Attendance during practical components and exercises is mandatory.

Additional requirements for this course:

Each module takes one week. At the end of the week, students must be present for the exam.

Assessment

Item and weight    Details
Final grade
1 (100%)           Exam 1

During this course ten modules are available. Each student selects four modules according to their field of interest. At the end of each week the module is graded on the basis of a project that is handed in, a presentation of the results, or a written exam. Each module counts for 25% of the final grade. Students may not fail more than two modules.
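The weighting described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `final_grade` and the pass mark of 5.5 are hypothetical and are not specified in this manual; only the four modules, the 25% weight and the limit of two failed modules come from the text.

```python
def final_grade(module_grades, pass_mark=5.5, max_failed=2):
    """Return (final grade, whether the fail limit is respected).

    pass_mark is an ASSUMED threshold; this manual does not state one.
    """
    if len(module_grades) != 4:
        raise ValueError("exactly four module grades are required")

    weight = 0.25  # each module counts for 25% of the final grade
    grade = sum(weight * g for g in module_grades)

    # Count modules below the assumed pass mark.
    failed = sum(1 for g in module_grades if g < pass_mark)
    return grade, failed <= max_failed


# Example with made-up grades for four modules.
example_grade, example_ok = final_grade([7.0, 8.0, 6.0, 5.0])
print(example_grade, example_ok)
```

With equal weights the final grade is simply the mean of the four module grades.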

Assignments

Module 1. Programming in Python

  • One-hour practical test at the end of the week.

Module 2. Modelling and Simulation

  • One-hour practical test at the end of the week, covering all topics and approaches treated in the module.

Module 5. Programming in R

  • One-hour practical test at the end of the week.

Module 6. Basic Statistics in R

  • Statistics is the science (and art) of learning from data, and experts in many fields use statistics to understand and analyze large data sets. This module emphasizes statistical thinking: it teaches key concepts and applies them to case studies in R. It covers basic probability, descriptive statistics, hypothesis testing (two means or proportions, association between two variables, goodness-of-fit), analysis of variance, and simple and multiple regression. The module focuses on conceptual understanding and interpretation, and assumes that you have no experience with R. The final assignment is a one-hour practical test.

Module 7. Statistical Learning A

  • Statistical learning encompasses many methods, including regression modelling, that are used to build predictive models from complex data sets and systems and to explore cause-effect relationships between variables. The focus here is on understanding the techniques at a conceptual (non-mathematical) level. You will complete practical exercises in R, so you should already have basic R programming and statistics skills. The final assignment is a one-hour practical test.

Module 8. Statistical Learning B

  • Classification techniques (logistic regression, linear discriminant analysis and K-nearest neighbors) are models used primarily for predicting categorical outcomes (e.g. species distribution modeling, land use classification, medical applications). Hands-on exercises in this module will teach you to use R to develop and fit classification models, and to compare and experiment with different modelling techniques, data sets and methods. By the end of the module, you will be able to apply what you have learned by creating your own classification models in R. The final assignment is a one-hour practical test.

Module 3. Agent Based Models

  • Agent based models (ABMs) are increasingly being used in ecology, economics and the humanities to understand the behaviour of complex systems. ABMs describe a system by specifying rules or relationships between basic entities (agents), and subsequently use simulation and model experiments to study the results of these design choices. In this module you will learn to build different types of ABMs and apply them to understand typical ecological systems involving navigation, competition, and evolution. The final assignment is a one-hour practical test.
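To give a flavour of what "specifying rules between agents and then simulating" means, here is a minimal Python sketch of an agent based model. It is not taken from the course material: the function `run_abm`, the ring of cells, the energy rules and all parameter values are hypothetical choices made purely for illustration.

```python
import random


def run_abm(n_agents=20, n_cells=10, steps=50, seed=1):
    """Simulate agents competing for food on a ring of cells.

    All rules below are illustrative assumptions, not course content.
    Returns the population size at the start and after every step.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    agents = [{"pos": rng.randrange(n_cells), "energy": 3}
              for _ in range(n_agents)]
    history = [len(agents)]

    for _ in range(steps):
        food = [1] * n_cells      # one unit of food regrows in every cell
        rng.shuffle(agents)       # random order decides who eats first
        for agent in agents:
            # Rule 1 (navigation): step left or right on the ring.
            agent["pos"] = (agent["pos"] + rng.choice((-1, 1))) % n_cells
            # Rule 2 (cost of living): moving costs one unit of energy.
            agent["energy"] -= 1
            # Rule 3 (competition): only the first arrival eats the food.
            if food[agent["pos"]] > 0:
                food[agent["pos"]] -= 1
                agent["energy"] += 2
        # Rule 4 (mortality): agents without energy are removed.
        agents = [a for a in agents if a["energy"] > 0]
        history.append(len(agents))

    return history


history = run_abm()
print(history[0], history[-1])
```

A simple model experiment would be to rerun `run_abm` with a different `n_cells` and compare how the population size changes over time.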

Module 9. Statistical Learning C

  • Resampling, model selection algorithms and criteria, regularization, and semi-parametric methods for estimating non-linearity make machine learning ‘tick’ (in fact, these methods facilitate the learning component). In this module you will learn the statistical basics of each of these techniques and apply them in practical applications in R. After completion, you will be able to apply resampling methods to select among different classification and regression models, understand general model selection procedures, understand the role of regularization, and evaluate non-linear relations with generalized additive models. The final assignment is a one-hour practical test.

Module 10. Statistical Learning D

  • Machine learning methods (tree-based methods, support vector machines, and unsupervised techniques such as dimension reduction and clustering) are algorithmic procedures that can find patterns in data and build predictive models, provided that sufficient relevant data are available for the system under study. In this module you will learn to apply machine learning methods to earth-scientific and ecological examples in R and to interpret the results. After completion you should be able to understand many of the machine learning techniques encountered in the current scientific literature and apply such methods to a given data set. The final assignment is a one-hour practical test.

The student selects a total of four modules from this list.

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' apply to this course. Compliance will be monitored carefully. Upon suspicion of fraud or plagiarism, the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

  • One module a week, depending on students' choice.
  • Daily hand-in of results (schedule can be found on Canvas)
  • Written exam, oral presentation or project deadline every Friday.

Timetable

The schedule for this course is published on DataNose.

Additional information

Basic knowledge of statistics and mathematics is assumed, at the level that can be expected from students with a BSc in Earth Sciences, Biology, Environmental Sciences or Future Planet Studies.

Contact information

Coordinator

  • Eldar Rakhimberdiev