Course manual 2021/2022

Course content

In this class, we will learn how to use statistical models to learn from data. Rather than memorizing many different types of tests and formulas, you will learn the fundamentals of building statistical models and using models to understand data. Importantly, we will show that dozens of different tests you may have heard of in statistics are just special cases of general linear models. We will teach you to use this flexible framework to learn from the many types of data you may come across in your research.
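As a small illustration of the claim that familiar tests are special cases of linear models, the following R sketch compares a classical two-sample t-test with a linear model that uses a single categorical predictor. The data are simulated for this example only, and the variable names (`group`, `y`, `fit`) are illustrative assumptions, not taken from the course materials.

```r
# Simulated data for illustration only (not course data).
set.seed(1)
group <- factor(rep(c("control", "treatment"), each = 20))
y <- rnorm(40, mean = ifelse(group == "treatment", 11, 10), sd = 2)

# Classical two-sample t-test, assuming equal variances in both groups.
tt <- t.test(y ~ group, var.equal = TRUE)

# The same comparison written as a linear model: y = b0 + b1 * treatment.
fit <- lm(y ~ group)
coefs <- summary(fit)$coefficients

# The slope b1 is the difference in group means (treatment minus control).
# t.test() reports control minus treatment, so its t statistic has the
# opposite sign; the p-values are the same.
t_from_ttest <- unname(tt$statistic)
t_from_lm <- coefs["grouptreatment", "t value"]
p_from_lm <- coefs["grouptreatment", "Pr(>|t|)"]

print(c(t_test = -t_from_ttest, lm = t_from_lm))
print(c(t_test = tt$p.value, lm = p_from_lm))
```

Under the equal-variance assumption the two approaches should agree up to the sign of the t statistic, which is one way to see the t-test as a linear model with one two-level predictor.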

Above all else, the course is based on a philosophy to promote:

  • critical thinking about data and models
  • a down-to-earth attitude to data analysis (as opposed to cookbook statistics)
  • a comfortable attitude towards models, math, and statistics (should not be scary or intimidating)

If you are taking this class for the second time, note that it has been completely redesigned. While there is some overlap in the topics covered, the format has changed entirely, and you should treat this as a completely new class.

Study materials

Literature

  • https://bookdown.org/connect/#/apps/1c17bcd1-d444-46fd-aaed-7d00c47d2aa1/access

Syllabus

Software

  • R and RStudio

Objectives

  • Be able to formulate general linear models to analyze data
  • Be able to fit linear models to data and interpret parameters
  • Interpret and use probability density functions and cumulative distribution functions
  • Understand and interpret uncertainties in parameter estimates of statistical models
  • Be able to accurately interpret p-values and identify problematic uses of null hypothesis significance testing
  • Interpret the results of statistical models without relying on statistical jargon
  • Formulate, implement, and interpret statistical models with multiple independent variables
  • Recognize common pitfalls in the application and interpretation of statistical models

Teaching methods

  • Computer lab session/practical training
  • Self-study
  • Lecture

Lectures: We will have one lecture per week (2 hrs) for 7 weeks. During lectures, you will be encouraged to actively participate in discussions, ask questions, and participate in live polls.

Lab Practicals: We will have 4 computer practicals where you will work in self-selected groups of two. If you prefer to work alone, that is also fine. Each pair will work on a problem set to get practical experience analyzing data in R. The goal of the lab practicals is for you to learn how to apply the theory covered in the lectures to analyze data in R. Course instructors will be present to assist groups with their work and answer questions.

Assignment: In week 6, groups of two will analyze a data set and write up a short report based on their analysis. This assignment will put into practice the theory covered in the class.

Self-study: You are expected to spend six hours per week on self-study. This involves reading and watching the assigned materials, reviewing course notes and lab practicals, attending question hours, and taking the practice exam.

Learning activities

Activity        Hours
Digital Test    2
Lecture         14
Labs            8
Assignment      8
Self study      40
Total           82 (3 EC x 28 hr)

Attendance

Programme's requirements concerning attendance (OER-B):

  • Participation in fieldwork is compulsory and cannot be replaced by assignments or other courses.
  • In case of practical sessions, the student is obliged to attend at least 90% of the sessions and to prepare adequately, unless indicated otherwise in the course manual. In case the student attends less than 90%, the practical sessions should be redone entirely.
  • In case of tutorials/seminars with assignments, the student is obliged to attend at least 7 out of 8 seminars and to prepare thoroughly for these meetings, unless indicated otherwise in the course manual. If the course has more than 8 seminars, the student can miss up to 1 extra meeting for every (part of) 8 tutorials/seminars. If the student attends fewer than the mandatory number of tutorials/seminars, the course cannot be completed.

Additional requirements for this course:

Attendance is mandatory for the lab practicals.

Assessment

Item and weight                               Details
Final grade
1 (100%) Digital exam (Tentamen digitaal)

10% of your grade will be based on completing the lab practicals. Your grade will be based on whether you followed instructions and thoughtfully attempted to answer every question on the practical.

20% of your grade will be based on the quality of the assignment. The assignment will be a more in-depth analysis of a complex real-world data set which you will have one week to work on as a group. A grading rubric will be provided.

70% of your grade will be determined by your score on the week 8 exam.

Assessment diagram

Every course goal will be assessed with an equal number of questions in the digital exam.

Students that were enrolled in the course in previous years

There are no special rules for students who have taken the previous course 'From Analysis to Evidence'.

Assignments

There are no assignments in the course.

Fraud and plagiarism

The 'Regulations governing fraud and plagiarism for UvA students' applies to this course. This will be monitored carefully. Upon suspicion of fraud or plagiarism the Examinations Board of the programme will be informed. For the 'Regulations governing fraud and plagiarism for UvA students' see: www.student.uva.nl

Course structure

Week 6

  • Self-study:
  • Lecture 1: Introduction to statistical modelling/Linear models
    • What are statistical models and how are they used/misused?
    • Linear models (parameters, independent and dependent variables)
    • Categorical vs. continuous variables

Week 7

  • Self-study:
    • Read Course Reader Part 2
  • Lecture 2: Fitting models to data
    • How do you ‘fit’ a model to data?
    • Probability density functions
    • What does it mean for a model to fit the data well? 
  • Lab 1: Fitting models to data

Week 8

  • Self-study:
    • Read Course Reader Part 3
  • Lecture 3: Quantifying uncertainty
    • Sampling distributions
    • Standard error of parameter values
    • Cumulative distribution functions
    • Confidence intervals
  • Lab 2: Quantifying uncertainty

Week 9

  • Self-study:
    • Read Course Reader Part 4
  • Lecture 4: Null hypothesis testing
    • What is a p-value?
    • False positives and false negatives
    • Correcting for multiple hypothesis tests
    • Power analysis
  • Lab 3: Null Hypothesis testing

Week 10

  • Self-study:
    • Read Course Reader Part 5
  • Lecture 5: Multiple independent variables
    • Factorial design experiments
    • Interactions
  • Lab 4: Multiple independent variables

Week 11

  • Self-study:
    • Read Course Reader Part 5
  • Lecture 6: Multiple independent variables (part 2) and review
    • Controlling for a variable
    • Multicollinearity 
    • How do I formulate, implement and interpret statistical models?
  • Start assignment project

Week 12

  • Self-study: review notes, labs, and reader
  • Lecture 7: Doing reproducible statistics
    • Reproducibility crisis
    • P-hacking
    • HARKing
  • Turn in project assignment
  • Take practice test

Week 13

  • Self-study: review notes, labs, and reader
  • Test

Timetable

The schedule for this course is published on DataNose.

Last year's course evaluation

To give students some insight into how we use the feedback from student evaluations to enhance the quality of education, we decided to include the table below in all course guides.

Course Name (#EC)    N
Strengths
Notes for improvement
Response lecturer:

Contact information

Coordinator

  • dr. B.T. Martin

Staff

  • Walter van Dijk
  • Emma Polman