Lecture 6 — Medical AI & Digital Twins (Van Rooij + Bontje)

Paper: Wang, W., He, F., Li, Y., Tang, S., Li, X., Xia, J., & Lv, Z. (2023). Data information processing of traffic digital twins in smart cities using edge intelligent federation learning. Information Processing & Management, 60(1), 103171.

Type: Thematic (orange). Two short half-lectures combined into one session:
- Van Rooij: ML applied to medical and neuro data (classification and stratification).
- Bontje: digital twins of urban traffic (sensor-driven simulations of cities).

Both halves share the Promises vs. Risks framing that has structured the thematic lectures.

Half A — AI in Medicine (Van Rooij)

Background

AI applications in healthcare have skyrocketed in the last decade:
- AI tools to develop new medicines
- AI tools to analyse wearables data
- AI tools to diagnose patients
- AI tools to investigate (neuro)biological underpinnings of disorders

Two methodological families

Method	Type	Purpose
Classification	Supervised ML	Predict a priori group labels (patient vs. control, diagnosis) and identify which features are most predictive
Stratification	Unsupervised ML	Identify hidden / underlying subgroups within a population (data-driven phenotyping)

Three worked examples

Topic 1 — Classifying ADHD patients from fMRI

Data: fMRI activation maps during an Inhibition task.
Classifier: Gaussian Process Classifier (a Bayesian ML method).
Outcome: patient diagnosis + a feature-weight map showing which voxels drove the prediction.
Performance: N = 700; accuracy 77.1%; sensitivity 75%; specificity 80%; ROC AUC ≈ 0.82.
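
To make the performance figures concrete, here is a minimal Python sketch of how accuracy, sensitivity and specificity follow from confusion-matrix counts, and of why a decent classifier can still give poor individual predictions. The counts and the 5% prevalence are hypothetical values chosen for illustration; they are not the study's data. Only the sensitivity (75%) and specificity (80%) passed to the second function come from the notes above.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true patients labelled patient
        "specificity": tn / (tn + fp),  # share of true controls labelled control
    }


def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive prediction is correct (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: 100 patients and 100 controls (not the study's data).
metrics = classification_metrics(tp=75, fn=25, tn=80, fp=20)

# Reported sensitivity/specificity, combined with an ASSUMED 5% prevalence.
ppv = positive_predictive_value(0.75, 0.80, prevalence=0.05)

print(metrics)
print(round(ppv, 3))
```

Under the assumed 5% prevalence, only about one in six positive predictions would be correct, which is one way to read the "inaccurate individual predictions" risk.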

Promises	Risks
Better classification of patients	Determinism / wrongful interpretation (77% accuracy does not show that "the brain causes ADHD")
More insight into neurobiology of disorders	Inaccurate individual predictions
More insight into between-subject heterogeneity	Malignant use (e.g. by insurance companies)

Topic 2 — Predicting COVID-19 cases & deaths from demographics

Data: COVID-19 clinical data + community-level demographics.
Method: logistic regression.
Outcome: odds-ratios / risk estimates for community-level COVID cases and deaths.
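
As a reminder of what an odds ratio is, the sketch below computes one in two equivalent ways: directly from a 2x2 table, and as the exponential of a logistic-regression coefficient. All counts are hypothetical and the function names are illustrative. The equivalence shown holds for a model with a single binary predictor; with several predictors each odds ratio is adjusted for the others.

```python
import math


def odds_ratio_from_table(exposed_cases: int, exposed_noncases: int,
                          unexposed_cases: int, unexposed_noncases: int) -> float:
    """Odds of being a case in the exposed group divided by odds in the unexposed group."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


def odds_ratio_from_coefficient(beta: float) -> float:
    """A logistic-regression coefficient is a log odds ratio, so exponentiate it."""
    return math.exp(beta)


# Hypothetical community-level counts (illustration only).
or_table = odds_ratio_from_table(30, 70, 15, 85)

# With one binary predictor, the fitted coefficient equals log(OR).
beta = math.log(or_table)
or_model = odds_ratio_from_coefficient(beta)

print(round(or_table, 2), round(or_model, 2))
```

An odds ratio above 1 describes an association at group level; it says nothing by itself about causes, which is the "wrongful attribution of causality" risk listed below.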

Promises	Risks
Better understanding of COVID-19 spread	Discrimination / inequality (some groups labelled "high-risk" → over-policing or exclusion)
Targeted interventions in at-risk populations	Wrongful attribution of causality (black-box confounders)
Improvement of healthcare system	Wrongful biological determinism
	"Garbage in, garbage out" — biased input data → biased policy

Topic 3 — Stratifying autism (ASD) subjects from brain structure

Data: structural morphometry of 53 brain segments.
Methods: normative modelling (deviation from a learned normative range) + spectral clustering.
Outcome: data-driven clusters of patients with distinct clinical profiles.
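
The "deviation from a normative range" idea can be sketched as a z-score per brain region. This is a deliberate simplification: real normative models usually condition the expected value on covariates such as age and sex, whereas this sketch uses a plain reference-cohort mean and standard deviation. Region names and values are hypothetical, and the clustering step that would run on these deviation profiles is not shown.

```python
from statistics import mean, stdev


def deviation_scores(reference: dict, subject: dict) -> dict:
    """z-score of each subject measurement relative to the reference cohort."""
    scores = {}
    for region, ref_values in reference.items():
        mu = mean(ref_values)
        sigma = stdev(ref_values)  # sample standard deviation
        scores[region] = (subject[region] - mu) / sigma
    return scores


# Hypothetical reference cohort: five control subjects, two regions.
reference = {
    "region_a": [2.0, 2.2, 2.4, 2.6, 2.8],
    "region_b": [10.0, 11.0, 12.0, 13.0, 14.0],
}
subject = {"region_a": 2.4, "region_b": 16.0}

z = deviation_scores(reference, subject)

# Flag regions that fall more than 2 standard deviations from the norm.
extreme = [region for region, score in z.items() if abs(score) > 2]
print(z, extreme)
```

Each subject ends up with a profile of deviations rather than a single label, and it is these profiles that a clustering method can then group into candidate subtypes.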

Promises	Risks
Insight into between-subject heterogeneity	Statistically spurious clusters that don't replicate
Targeted interventions based on subject profile

Van Rooij's wrap-up

Many different applications for AI in the medical field.
More data, better access, better methods.
Benefits to science and health are real, but so are the risks.
Understanding the AI methods themselves is necessary to mitigate the risks — black-box clinical decisions cannot be ethically justified.

Half B — Digital Twins of (Urban) Traffic (Bontje)

What is a digital twin?

A virtual, bi-directional model of a physical city that visualises urban processes in real time and supports planning, management, and decision-making. "Bi-directional" = the twin both reflects sensor data from the city and feeds decisions back to it.

Why for cities?

Cities are complex and constantly changing.
Digital twin (DT) data make that complexity visible.
DTs support faster and better-informed decisions — traffic, air pollution, accessibility, safety, livability.

Traffic specifically

Traffic is one of the most important urban systems.
Goal: predict and control traffic flow.
Use cases: safety and accessibility analysis; event planning / crowd management; scenario testing ("what if we close a road?").

The Dutch state of play (DMI programme)

No "network" of DTs yet, but lots of shared assets: 3D city models, dashboards, simulations, fieldlabs / pilots.
Aim: move from isolated pilots / closed systems toward open, modular, reusable systems — reusable building blocks, shared standards, "Digital Twin as a Service," a European Digital Twin Appstore.

Emerging techniques

2D/3D visualisation.
AI and language models as analysis layers.
Dynamic sensor data with local AI — sensors collect real-time traffic data; local computing (edge processing) close to the street processes it; AI recognises traffic situations; the system feeds back to vehicles or traffic infrastructure.
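
The sensor-to-feedback loop in the last item can be sketched as a single edge-processing step. Everything here is hypothetical: the field names, the thresholds and the action labels are invented for illustration, and the threshold rule is only a stand-in for whatever trained model would recognise traffic situations in a real deployment.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    intersection_id: str
    vehicles_per_minute: float
    mean_speed_kmh: float


def classify_situation(reading: SensorReading) -> str:
    """Placeholder rule standing in for a trained traffic-recognition model."""
    if reading.vehicles_per_minute > 40 and reading.mean_speed_kmh < 15:
        return "congested"
    return "free_flow"


def edge_step(reading: SensorReading) -> dict:
    """Process one reading locally and decide what to feed back to the street."""
    situation = classify_situation(reading)
    action = "extend_green" if situation == "congested" else "keep_schedule"
    # Only this compact summary, not the raw sensor stream, would be passed on.
    return {
        "intersection_id": reading.intersection_id,
        "situation": situation,
        "action": action,
    }


busy = edge_step(SensorReading("A1", vehicles_per_minute=55, mean_speed_kmh=9))
quiet = edge_step(SensorReading("A1", vehicles_per_minute=12, mean_speed_kmh=38))
print(busy, quiet)
```

The design point is that the decision is made next to the sensor, so the feedback does not depend on a round trip to a central system.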

Challenges

Many twins are still pilots.
Expensive custom solutions.
Dependence on one software provider (vendor lock-in).
Need for standards.
Uncertainty in models.

SWOT analysis of DTs of urban traffic (Bontje's framing)

	Positive	Negative
Internal	Strengths: scenario testing; better data-based decisions	Weaknesses: limited real-time traffic data; model uncertainty
External	Opportunities: national network of local DTs; reusable standards/modules; AI + sensor data	Threats: privacy risks; overreliance on models

IOS connections explicitly drawn by Bontje

Open Cities — test redevelopment scenarios before implementation; compare effects on movement, accessibility, safety, livability.
Behaviour and Institutions — embed cognitive models of pedestrians/cyclists in DTs to simulate how people perceive, decide and move (this is Bontje's PhD topic; bridges to L2 cognitive modelling).
Fair Transitions — DTs make redevelopment impacts visible across different user groups, supporting inclusive and transparent decisions.

Paper 7 — Wang et al. (2023): Edge intelligent federation learning for traffic DTs

⚠️ Reconstructed from the title and general knowledge — confirm against the paper.

What's in the title

Traffic digital twins in smart cities — same object Bontje describes.
Data information processing — the central problem: each car, sensor, intersection produces data; how do you fuse it into a coherent, real-time twin?
Edge intelligent federation learning — the methodological contribution. Federated learning trains a shared model across many local devices without moving raw data to a central server; "edge intelligent" puts the inference and partial training on the local sensor/device.

Why federated + edge for DTs?

Bandwidth: streaming all raw sensor data to a central server is infeasible at city scale.
Latency: safety-critical traffic-control decisions may need responses within milliseconds; a round trip to the cloud can be too slow.
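Privacy: GDPR, plus the privacy concern Bontje flagged as a Threat in the SWOT. Federation keeps raw data local; only model updates (such as gradients) leave the device, although such updates can still leak some information unless they are further protected.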
Resilience: a local intersection's model keeps working even if the central system is unreachable.

Likely paper structure

Architecture: device → edge → city layer.
Federation protocol: clients compute local gradients on local traffic data; aggregator combines them into a global model; global model is deployed back to clients.
Evaluation: prediction accuracy (traffic flow, congestion, accident risk) vs. centralised baselines, communication cost, latency, privacy guarantees.
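
The federation protocol outlined above can be illustrated with a toy federated-averaging loop. This is a generic sketch, not the paper's method: it uses the common FedAvg variant in which clients send locally updated weights rather than raw gradients, the model is a one-parameter line y = w * x, and the client data are hypothetical noise-free points generated with w = 2.

```python
def local_update(w: float, data: list, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(client_weights: list, client_sizes: list) -> float:
    """Aggregator: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


# Hypothetical local datasets; the raw (x, y) pairs never leave the client.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]

global_w = 0.0
for _ in range(50):  # communication rounds
    updates = [local_update(global_w, data) for data in clients]
    global_w = federated_average(updates, [len(data) for data in clients])

print(round(global_w, 4))
```

Only one number per client crosses the network in each round, which is the bandwidth and privacy argument in miniature. Whether the shared updates are safe to expose is a separate question that this sketch does not address.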

Why this paper closes the loop with L1's CDR

Stewardship of citizen-generated traffic data → federation/edge is how you operationalise stewardship at city scale.
Transparency / Truth → DTs are only legitimate if their predictions are auditable; federated systems can be more auditable than monolithic clouds but only if standards demand it.
Bontje's Threats (privacy risks, overreliance) → federated edge architecture mitigates the first but not the second.

Why this matters for an open society

This lecture is methodologically diverse but normatively unified by the Promises × Risks lens:

Medical AI → Transitions & Wellbeing pillar (healthcare delivery), with Equity & Diversity stakes (discriminatory misuse, biological determinism).
Digital twins → Open Cities and Behaviour & Institutions platforms, with privacy and overreliance as Equity & Democracy threats.
The unifying argument across both halves: AI in high-stakes thematic domains demands method-literate citizens, scientists, and policymakers — you cannot evaluate the legitimacy of a clinical classifier or a smart-city DT without understanding the basics of the underlying algorithm. This is essentially the course's core normative claim, re-stated in a thematic key.

Likely essay-question angles

"Distinguish classification from stratification in medical AI. Apply each to a concrete example from Van Rooij's lecture and discuss its risks."
"What is a digital twin of a city? Use Bontje's SWOT to argue whether the Dutch DMI programme is more likely to advance or threaten the IOS pillar 'Open Cities'."
"Wang et al. (2023) use federated edge learning for traffic DTs. Explain how this architecture mitigates some of Bontje's listed threats but not others. Tie back to Elliott's TRUST."
"Compare medical AI and traffic DTs as deployments of opaque ML systems in high-stakes public domains. Which of Grimmelikhuijsen & Meijer's six threats is most acute for each?"

Quick self-test

Difference between classification and stratification in ML — give one medical example of each.
Three risks Van Rooij identifies for classification of psychiatric patients from neuroimaging.
Definition of a digital twin — and what does "bi-directional" mean here?
Bontje's SWOT: name one item in each of the four cells.
Why federated learning + edge inference for traffic DTs? Three reasons.
Which IOS platforms does each half of this lecture most directly speak to?