AIOS Mock Exam — 9 essay questions
Format mirrors the real exam: 9 questions, 2 hours, ~13 min/question. Sit it under timed conditions, ideally on Thu 2026-05-28 (the day before the exam).
Style was reverse-engineered from Van Rooij's in-class mock (see lecture_08_synthesis.md). Mix:
- 3 short recall + brief elaboration
- 3 paper-specific empirical / mechanistic
- 2 critical open-ended
- 1 integrative synthesis
Model answers are outlines, not full essays — write them out fully under timed conditions to feel the rate.
Question 1 — Foundations of CDR (L1)
In the introductory lecture and Elliott et al. (2021), the TRUST framework for Corporate Digital Responsibility (CDR) is presented. (a) State what each of the five letters of TRUST stands for. (b) For each letter, name an IOS platform that is most directly affected when that component fails, and justify in one sentence.
Model answer outline (~250 words):
- T = Transparency: be clear about what you're doing and why. Fails → Futures of Democracy (citizens can't evaluate or contest decisions they cannot see).
- R = Responsibility: be reputable and accountable. Fails → Contesting Governance (no addressable actor to challenge).
- U = Understanding: provide services so customers understand outcomes and impact. Fails → In/Equality (those without resources to demand explanations bear disproportionate harm).
- S = Stewardship: be a good custodian of data. Fails → Behaviour and Institutions (informational sovereignty erodes; trust in any institution collapses).
- T = Truth: validate data accuracy; ensure inferences are beneficial not harmful. Fails → Security in Open Societies (false-positive policing or moderation harms innocents).
Each justification = one sentence on the mechanism.
Marking hints: all 5 letters must be correct (Transparency, Responsibility, Understanding, Stewardship, Truth) — losing any one is half-marks. Platform matches just have to be defensible; multiple right answers exist.
Question 2 — LBA mechanism (L2 + Palada)
(a) Describe the Linear Ballistic Accumulator (LBA) model in 4–6 sentences. Name its four free parameters. (b) Which parameter changes with task difficulty, and which with speed/accuracy instruction? (c) In Palada et al. (2016), what happens to these two parameters as workload increases for the cloud-monitoring task, and what is the practical implication for designing AI alerts for air-traffic controllers?
Model answer outline:
- (a) Each response option has its own accumulator with a start point sampled from U[0, A], a drift rate sampled from N(v, s), a threshold b, and a non-decision time t₀. The four free parameters are A, v, b and t₀; s is usually fixed as a scaling constant. The accumulators race linearly toward threshold; whichever crosses first wins → that option is chosen. The model reproduces full RT distributions for correct and error responses.
- (b) Difficulty → drift rate (harder evidence → slower accumulation). Instruction → threshold (speed instruction → lower threshold → faster but more errors).
- (c) In Palada, increasing workload lowers drift rate (harder to extract info under workload) and lowers threshold (strategic, to meet time pressure). Practical implication: AI alerts should compensate for both — provide cleaner feature presentation (raises effective drift) AND avoid imposing harsh deadlines that force threshold reductions (which trade accuracy for speed).
Marking hints: must distinguish drift rate from threshold and tie each to a manipulation. The "both parameters move with workload" insight is the Palada-specific finding.
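To get a feel for the two parameter effects in (b), the race described in (a) can be simulated directly. The following is a minimal sketch, not a fitted model: the parameter values (A = 0.5, s = 1.0, t₀ = 0.2, the drift means and the two thresholds) are illustrative assumptions, and the function names `lba_trial` and `summarize` are hypothetical. Trials in which no accumulator draws a positive drift are simply dropped, and the median RT is reported because the mean is sensitive to rare very slow trials.

```python
import random
import statistics


def lba_trial(drifts, b, A, s, t0, rng):
    """Simulate one trial; return (index of winning accumulator, RT) or (None, None)."""
    winner, best = None, float("inf")
    for i, v in enumerate(drifts):
        start = rng.uniform(0.0, A)   # start point ~ U[0, A]
        rate = rng.gauss(v, s)        # trial-specific drift ~ N(v, s)
        if rate <= 0:
            continue                  # this accumulator never reaches threshold
        time_to_threshold = (b - start) / rate  # linear, noise-free rise
        if time_to_threshold < best:
            winner, best = i, time_to_threshold
    if winner is None:
        return None, None             # no accumulator finished: no response
    return winner, t0 + best


def summarize(drifts, b, A=0.5, s=1.0, t0=0.2, n=20000, seed=1):
    """Accuracy and median RT over n trials; accumulator 0 is the correct option."""
    rng = random.Random(seed)
    rts, n_correct = [], 0
    for _ in range(n):
        choice, rt = lba_trial(drifts, b, A, s, t0, rng)
        if choice is None:
            continue
        rts.append(rt)
        n_correct += (choice == 0)
    return {"accuracy": n_correct / len(rts), "median_rt": statistics.median(rts)}


baseline = summarize(drifts=(3.0, 1.0), b=1.0)
harder = summarize(drifts=(2.0, 1.0), b=1.0)   # lower drift for the correct option
speeded = summarize(drifts=(3.0, 1.0), b=0.6)  # lower threshold

for name, res in [("baseline", baseline), ("harder", harder), ("speeded", speeded)]:
    print(f"{name:9s} accuracy={res['accuracy']:.3f} median_rt={res['median_rt']:.3f}")
```

Under these assumed values, the "harder" condition should come out slower and less accurate than baseline, and the "speeded" condition faster but less accurate, which is the drift-versus-threshold distinction the marking hints ask for.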
Question 3 — Machine behaviour applied (L3 + Rahwan)
(a) State Rahwan et al.'s (2019) working definition of an AI agent. (b) Name the three scales and four domains in their machine-behaviour framework. (c) Apply the framework to the Cambridge Analytica / 2016-US-election / Brexit-Vote case: locate it at one scale and one domain, and trace the digital-trace → psychological-targeting → political-behaviour loop.
Model answer outline:
- (a) "Complex and simple algorithms used to make decisions."
- (b) Scales: individual machine · collective machine · hybrid human–machine. Domains: Democracy · Kinetics · Markets · Society.
- (c) Cambridge Analytica is a hybrid human–machine case in the Democracy domain. Loop: (1) Facebook likes and other digital traces (Rafaeli 2019) are scraped; (2) Kosinski-style models predict latent traits (political view, BIG5) with computer-equals-spouse accuracy (Hinds & Joinson 2019); (3) Matz et al. 2017-style psychological targeting selects ads congruent with the inferred trait, lifting conversion ~1.4–1.8×; (4) Kramer et al. 2014 confirms feed manipulation shifts emotional posts at population scale. Together this is the full digital-traces → prediction → targeting → behaviour loop, with elections as the output → democratic legitimacy threat.
Marking hints: scales (3) and domains (4) memorised exactly. The loop should mention at least three of the four reading-list papers from L3.
Question 4 — Mis/disinformation in bounded confidence (L4 + Douven-Hegselmann)
(a) In the Hegselmann–Krause bounded-confidence model, what is ε and how does the update rule work? (b) Douven & Hegselmann (2021) extend HK with three agent types — name them and the update rule for each. (c) Define misinformation vs. disinformation as the paper uses these terms, and state which logically implies which. (d) Describe one counter-intuitive D&H finding and explain its mechanism.
Model answer outline:
- (a) ε is the confidence bound: the maximum distance between two opinions at which an agent still takes the other into account. Update: an agent's new opinion is the average of all opinions that lie within ε of its own, its own opinion included; agents outside ε are ignored.
- (b) Free Riders — base HK only. Truth Seekers — pos_new = (1−α)·pos_social + α·τ (pull toward truth τ). Campaigners — hold a fixed position ρ regardless of τ.
- (c) Misinformation = aim to make the public believe a falsehood; disinformation = aim to impede/distract the public from believing a truth. Misinformation logically implies disinformation but not vice versa.
- (d) Counter-intuitive: without truth-seekers, more extreme campaign positions hurt the campaigner — because they isolate themselves outside others' ε and lose contact. OR (alternative): with truth-seekers, a subtle disinformation campaign (ρ close to τ) outperforms a bold one, because subtle ρ stays inside truth-seekers' ε and drags them off truth. Either is acceptable if mechanism is given.
Marking hints: the three agent types and the mis/dis distinction are recall. The counter-intuitive finding plus mechanism = the discriminating mark.
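The update rules in (a) and (b) are short enough to run by hand or in code. The sketch below follows the outline above rather than the paper's own implementation; the function names `hk_step` and `run`, the agent-type labels, and all numerical values (ε = 0.12, α = 0.1, τ = 0.7, the starting opinions) are assumptions chosen for illustration. It shows the isolation mechanism from (d) on a toy scale: a campaigner far from everyone is outside every ε and moves nobody, while a nearby campaigner pulls the free riders slightly before losing contact.

```python
def hk_step(opinions, kinds, eps, alpha=0.0, tau=0.0):
    """One synchronous update for free riders, truth seekers and campaigners."""
    new = []
    for x, kind in zip(opinions, kinds):
        if kind == "campaigner":
            new.append(x)  # campaigners keep their fixed position rho
            continue
        # Bounded confidence: average everyone within eps, the agent itself included.
        peers = [y for y in opinions if abs(y - x) <= eps]
        social = sum(peers) / len(peers)
        if kind == "truth_seeker":
            new.append((1 - alpha) * social + alpha * tau)  # pull toward truth tau
        else:
            new.append(social)  # free rider: social averaging only
    return new


def run(opinions, kinds, eps, alpha=0.0, tau=0.0, steps=50):
    """Iterate hk_step a fixed number of times and return the final opinions."""
    for _ in range(steps):
        opinions = hk_step(opinions, kinds, eps, alpha, tau)
    return opinions


free = [0.40, 0.45, 0.50, 0.55, 0.60]
kinds = ["free_rider"] * 5 + ["campaigner"]

far = run(free + [0.95], kinds, eps=0.12)   # extreme campaigner, outside every eps
near = run(free + [0.70], kinds, eps=0.12)  # moderate campaigner, within reach of 0.60

mean_far = sum(far[:5]) / 5
mean_near = sum(near[:5]) / 5
print(f"free-rider mean, far campaigner:  {mean_far:.4f}")
print(f"free-rider mean, near campaigner: {mean_near:.4f}")

# Truth seekers alone: every opinion is drawn to tau regardless of eps.
seekers = run([0.1, 0.3, 0.5, 0.9], ["truth_seeker"] * 4,
              eps=0.12, alpha=0.1, tau=0.7, steps=200)
print([round(x, 4) for x in seekers])
```

With the far campaigner the free riders settle at their own midpoint; with the near campaigner they end slightly above it. The effect is small here because the toy population drifts out of the campaigner's reach after a few steps, which is itself an instance of the "lose contact" mechanism.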
Question 5 — Proceed with caution (L5 + van der Vegt)
Van der Vegt, Kleinberg & Gill (2023) urge researchers and practitioners to "proceed with caution" when using computational linguistics in threat assessment. (a) Describe one of the two studies covered in the lecture (politicians or incels). (b) Specify what went wrong in the Google Perspective API's identity-attack measure in Study 1, and give one concrete example. (c) State why this measurement bias has democratic consequences, naming the IOS platform most affected.
Model answer outline:
- (a) Study 1: 1.9 million tweets @-mentioning all 22 Dutch party leaders in 2022, collected via Twitter Academic API; six Perspective API measures (toxicity, severe toxicity, identity attack, insult, profanity, threat) modelled with regression on gender × ethnic-minority status × political position. Finding: female ethnic-minority politicians received the most threatening tweets.
- (b) Perspective's identity-attack measure under-detects misogynistic content. Concrete: a tweet calling a female politician a "stupid bimbo, retarded crap-woman" scored 0.06, while structurally analogous antisemitic and anti-Islamic insults scored 0.84 and 0.85. Even though the definition of identity attack explicitly names gender, the model fails on gendered slurs.
- (c) Democratic consequence: if the AI used to allocate close-protection resources or prioritise content moderation systematically under-counts abuse aimed at women and minorities, those groups are under-protected exactly when they need protection most → drives them out of public office → narrows democratic participation. Most affected IOS platform: Security in Open Societies, with strong coupling to In/Equality and Gender, Diversity and Global Justice.
Marking hints: specific numerical example from the slides (0.06 vs 0.84/0.85) is the discriminating mark.
Question 6 — Six threats and toeslagenaffaire (L7 + Grimmelikhuijsen-Meijer)
Grimmelikhuijsen & Meijer (2022) identify six threats that algorithmic decision-making poses to the legitimacy of public administration. (a) Name and briefly define each. (b) Apply at least four of these threats to the Dutch toeslagenaffaire, citing concrete features of how the algorithm was designed or deployed. (c) What does "calibrated institutional response" mean in their framework, and why is no single mitigation enough?
Model answer outline:
- (a) (1) Reduced expertise / deskilling: frontline officers lose discretionary judgement. (2) Opacity: citizens can't see why a decision was made. (3) Bias and unequal treatment: training data and proxy variables encode historical discrimination. (4) Privacy infringement: wide data integration erodes informational sovereignty. (5) Reduced human oversight and accountability: "computer says no" / diffused responsibility. (6) Erosion of public values: speed/consistency optimised at the cost of mercy/individualised judgement.
- (b) In toeslagenaffaire: Bias: dual nationality and postcode used as proxies for ethnicity → systematic over-flagging of immigrant families. Opacity: flagged families could not see the model's reasoning. Accountability: diffused; no single official was answerable; appeals were broken. Public-value erosion: "hard line on fraud" optimisation crushed proportionality and family integrity. (Optionally also: deskilling, as officials deferred to the model rather than judging individual circumstances.)
- (c) "Calibrated" = each threat needs its own institutional mitigation: pre-deployment bias audits for (3), XAI / explainability for (2), human-in-the-loop with real authority for (5), DPIAs for (4), training/non-AI rotation for (1), democratic deliberation about values for (6). No single mitigation addresses all six because they operate at different layers (data, model, process, institutional, normative); a single XAI tool may improve (2) but does nothing for (1) or (6).
Marking hints: all six listed; four mapped concretely to toeslagenaffaire with at least one concrete feature each; calibration explanation must say why a single fix is inadequate.
Question 7 — Medical AI + digital twins (L6 + Wang)
(a) Distinguish classification from stratification in medical ML and give one Van Rooij example of each. (b) Define a digital twin of a city and explain what makes it "bi-directional." (c) Wang et al. (2023) use federated edge learning for traffic digital twins. Explain which of Bontje's SWOT threats this architecture mitigates and which it does not. (d) Tie your answer to one letter of the Elliott TRUST framework.
Model answer outline:
- (a) Classification = supervised, predict a-priori group labels (e.g. ADHD vs control from fMRI: Topic 1 in Van Rooij, Gaussian Process Classifier, 77% accuracy). Stratification = unsupervised, identify hidden subgroups (e.g. ASD subjects clustered by brain morphometry via normative modelling + spectral clustering: Topic 3).
- (b) A virtual model of a physical city that reflects real-time sensor data and feeds decisions back to physical infrastructure (vehicles, traffic lights). "Bi-directional" = the data flow is both into and out of the twin.
- (c) Bontje's threats: privacy and overreliance on models. Federated edge learning mitigates privacy (raw data stays on device; only gradient updates leave) but does not mitigate overreliance: an opaque federated model is still opaque to operators and citizens.
- (d) Federated/edge operationalises Stewardship: data is kept local and minimally exposed, fulfilling the "good custodian of data" obligation at city scale. But it does not by itself improve Transparency or Truth, so TRUST is only partially served.
Marking hints: all four sub-answers required; the "partial TRUST" framing at the end is what lifts a 7 to an 8.
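The privacy argument in (c) rests on one mechanism: devices train on their own data and share only model updates. The sketch below illustrates that mechanism with generic federated averaging on a one-parameter linear model. It is not Wang et al.'s architecture; the function names `local_update` and `federated_round`, the three devices, their data (all generated from y = 3x) and the learning rate are hypothetical. For simplicity each device sends back its updated weight rather than a raw gradient.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient steps on one device's private data; return the new weight."""
    for _ in range(epochs):
        # Gradient of the mean squared error of the model y_hat = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, devices):
    """One round: each device trains locally, the server averages the weights."""
    total = sum(len(data) for data in devices)
    # Only the locally updated weights reach the server, weighted by sample count.
    return sum(len(data) * local_update(w_global, data) for data in devices) / total


# Hypothetical edge devices, each holding private (x, y) pairs with y = 3x.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
    [(2.5, 7.5)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, devices)
print(round(w, 3))
```

The server recovers a weight close to 3 without ever seeing an (x, y) pair. Nothing in the procedure explains the resulting model to an operator, which is why the outline treats overreliance and opacity as unmitigated.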
Question 8 — Rare events and class imbalance (open-ended)
AI tools are increasingly used to predict rare-but-severe events such as terror attacks, mass shootings, or imminent suicide. Explain one major problem with such tools, naming the underlying statistical phenomenon, giving an illustrative numerical sketch, and discussing how it interacts with measurement bias of the kind van der Vegt et al. (2023) document. Conclude with one institutional safeguard.
Model answer outline (the model L8 question expanded):
- Phenomenon: class imbalance / base-rate fallacy. When the positive class is extraordinarily rare, even a highly accurate model produces overwhelmingly more false positives than true positives.
- Sketch: suppose that among ~100,000 surveilled individuals the true rate of would-be attackers is 1 in 100,000 (10⁻⁵), i.e. about one genuine case. A model with 99% sensitivity and 99% specificity yields ~1 expected true positive and ~1,000 false positives (1% of the ~100,000 innocent people). Precision is ≈ 0.1%. The dashboard appears to "work" by traditional accuracy metrics (overall accuracy ≈ 99%) but is operationally useless and harmful.
- Interaction with measurement bias (van der Vegt 2023): Perspective-style models also mis-detect identity attacks on women and minorities. False positives therefore concentrate on already-marginalised groups → discriminatory surveillance. The base-rate fallacy and measurement bias compound: the model both flags lots of innocent people and picks who to flag in biased ways.
- Institutional safeguard: AI outputs must be used as prioritisation aids feeding into a structured-professional-judgement workflow (e.g., CTAP-25 from L5) where each flagged case is reviewed by a trained analyst; never as autonomous decision triggers. Add periodic disparity audits and a clear human-with-authority-to-override.
Marking hints: four moves — name the phenomenon, give a numerical sketch, link to van der Vegt bias, propose a safeguard. Missing any one move loses marks.
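The numerical sketch is worth being able to reproduce from scratch in the exam. The short calculation below works through the same figures (100,000 people, base rate 10⁻⁵, 99% sensitivity and specificity); the function name `screening_outcomes` is a hypothetical helper, and the counts are expected values rather than observed data.

```python
def screening_outcomes(population, base_rate, sensitivity, specificity):
    """Expected confusion-matrix counts and summary metrics for a screening tool."""
    positives = population * base_rate            # genuine cases
    negatives = population - positives            # everyone else
    tp = positives * sensitivity                  # genuine cases that are flagged
    fp = negatives * (1 - specificity)            # innocent people that are flagged
    tn = negatives - fp
    return {
        "true_positives": tp,
        "false_positives": fp,
        "precision": tp / (tp + fp),              # share of flags that are genuine
        "accuracy": (tp + tn) / population,       # share of all decisions that are right
    }


result = screening_outcomes(population=100_000, base_rate=1e-5,
                            sensitivity=0.99, specificity=0.99)
for key, value in result.items():
    print(f"{key:16s} {value:.4f}")
```

Overall accuracy comes out at about 99% while precision is about 0.1%: roughly a thousand innocent people are flagged for every genuine case. Changing `base_rate` shows how quickly precision recovers once the event stops being rare, which is the core of the base-rate argument.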
Question 9 — Integrative synthesis (the cross-lecture question)
Pick one IOS platform. Trace it across at least three lectures: state how each lecture's methodological tools can be applied to that platform's research questions, citing each lecture's obligatory paper. Conclude with the single unifying methodological move across the three.
Model answer outline, worked example: Future of Work (you could equally pick Open Cities, In/Equality, etc.)
- L2 (Van Maanen, Palada 2016): The LBA gives a user model of workers under workload, useful for selection, training and protection of high-stakes operators (air-traffic controllers, surveillance analysts, medical readers). Palada shows workload affects both drift rate and threshold; AI-enhanced workplaces should account for both.
- L3 (Hortensius, Rahwan 2019): Machine behaviour gives us the empirical lens to study how worker–AI interaction reshapes performance, autonomy and skill. Hinds & Joinson 2019: algorithms can judge workers' personality as accurately as their spouses can → consent and surveillance issues at work.
- L7 (Grimmelikhuijsen & Meijer 2022): When AI is used for hiring, performance review, or termination, six legitimacy threats apply: bias (training-data over-representation), opacity (black-box hiring), accountability (who answers for an AI rejection?), deskilling (managers defer to recommendation engines), etc.
- Unifying move: each lecture follows the same pattern. Humans are modelled, then AI augments or replaces the human, then social/institutional design decides whether that augmentation strengthens or undermines an open society. The methodological work happens at the modelling step; the normative work happens at the institutional-design step. AIOS argues both are necessary.
Marking hints: at least three lectures and three papers, each with a specific methodological tool, ending with a named unifying move that goes beyond "AI affects society."
Final exam-day tips
- Read all 9 questions in the first 3 minutes. Mentally allocate your hardest answers a few extra minutes.
- Cite paper authors + year when you make a claim — that's the easiest way to demonstrate you've done the reading.
- Always state the IOS platform at least once per essay — that's the course's organising frame.
- Don't pad. The model answers above are tight; the grader is reading 30–50 of these, so clarity wins over volume.
- If you blank on a paper-specific finding, fall back to the methodology you do remember and frame the answer around it.
Good luck.