Track and compare data quality across online survey platforms.
Researchers submit their study results, and the dashboard updates as new studies are added.
A central concern with online studies is whether each response was submitted by a single attentive human participant.
This breaks down into three specific data quality concerns:
Whether participants pay attention to study materials
Whether responses are generated by AI agents, LLMs, or bots
Whether the same person submitted multiple responses through duplicate or fake accounts
Each of these concerns can be addressed with targeted quality checks, which may evolve over time. As a starting point, this dashboard tracks the checks discussed in our companion paper, and it will expand as researchers contribute new metrics.
Builds on “Mission Possible: The Collection of High Quality Online Data” by Çelebi, Exley, Harrs, Kivimaki, Serra-Garcia & Yusof (2026). The paper provides an evidence-based assessment of data quality across online survey platforms, using laboratory and AI agent responses as benchmarks. It also proposes a new two-stage recruitment method. Materials are available in the companion GitHub repository.
Each metric is an indicator equal to 1 if the check passes. Higher pass rates are better across all metrics.
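Concretely, because every check is a 0/1 indicator, a platform's pass rate on a check is simply the mean of that indicator column. A minimal sketch in pandas (platform labels and column names are illustrative, not the dashboard's actual schema):

```python
import pandas as pd

# Illustrative data: one row per response, one 0/1 column per quality check.
df = pd.DataFrame({
    "platform":        ["A", "A", "B", "B", "B"],
    "attention_check": [1, 1, 1, 0, 1],
    "unique_ip":       [1, 0, 1, 1, 1],
})

# Per-platform pass rate for each check is the mean of its indicator column.
pass_rates = df.groupby("platform")[["attention_check", "unique_ip"]].mean()
print(pass_rates)
```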
Equal to 1 if the respondent passes instructional attention checks. For example, the respondent correctly selects the option furthest to the “left” or “right” when instructed to do so in a 5-point Likert question.
Equal to 1 if the respondent correctly types four numbers shown sequentially in a short video.
Equal to 1 if the open-text response was entered through manual typing.
Three conditions must hold: (i) no paste event is recorded; (ii) there is no discrete jump of more than 50 characters in text length without corresponding keystrokes; and (iii) at least one keystroke is recorded. Together these detect copy–paste, drag-and-drop, and fully automated insertion.
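A minimal sketch of how these three conditions could be evaluated, assuming the survey software exports an ordered event log per response (the event format shown is illustrative, not any specific platform's schema):

```python
def manually_typed(events: list[dict], jump_threshold: int = 50) -> int:
    """Return 1 if an open-text response appears manually typed, else 0.

    `events` is an ordered log of dicts such as
    {"type": "keystroke" or "paste", "text_length": length after the event}.
    """
    # (i) No paste event recorded.
    if any(e["type"] == "paste" for e in events):
        return 0
    # (ii) No jump of more than `jump_threshold` characters in text length
    #      between consecutive logged events.
    lengths = [e["text_length"] for e in events]
    if any(b - a > jump_threshold for a, b in zip(lengths, lengths[1:])):
        return 0
    # (iii) At least one keystroke recorded.
    if not any(e["type"] == "keystroke" for e in events):
        return 0
    return 1
```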
Equal to 1 if the respondent typed the open-text response and the median inter-keystroke interval exceeds 75 milliseconds.
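Given that the manual-typing condition above holds, the timing condition reduces to a median over consecutive keystroke gaps. A minimal sketch, assuming keystroke timestamps are logged in milliseconds:

```python
from statistics import median

def human_typing_speed(keystroke_times_ms: list[float], cutoff_ms: float = 75.0) -> int:
    """Return 1 if the median inter-keystroke interval exceeds the cutoff."""
    if len(keystroke_times_ms) < 2:
        return 0  # fewer than two keystrokes: no interval to measure
    gaps = [b - a for a, b in zip(keystroke_times_ms, keystroke_times_ms[1:])]
    return int(median(gaps) > cutoff_ms)
```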
Equal to 1 if Google’s reCAPTCHA score is at least 0.5. The score aggregates behavioral and contextual signals during the respondent’s interaction with the survey interface; lower scores indicate a higher likelihood of bot activity.
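For reference, a reCAPTCHA v3 token is scored server-side through Google's siteverify endpoint; a minimal sketch of applying the 0.5 cutoff (the secret key and token come from your own reCAPTCHA integration):

```python
import requests

def recaptcha_pass(secret_key: str, token: str, threshold: float = 0.5) -> int:
    """Score a reCAPTCHA v3 token server-side and apply the cutoff."""
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={"secret": secret_key, "response": token},
        timeout=10,
    )
    result = resp.json()
    # v3 responses include `score` between 0.0 (likely bot) and 1.0 (likely human).
    return int(result.get("success", False) and result.get("score", 0.0) >= threshold)
```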
Equal to 1 if the Pangram AI likelihood score for the open-text response is below 0.5. The score reflects the estimated probability that the text was generated by a large language model rather than a human.
Equal to 1 if the respondent clicked on the screen at least once during the questionnaire.
Equal to 1 if the respondent moved the mouse at least once within the questionnaire page.
Equal to 1 if the IP address is unique within the sample.
Equal to 1 if the IP address geolocates to the targeted country, as determined by an IP intelligence service (e.g., MaxMind).
Equal to 1 if fewer than five responses in the sample share the same geographic latitude–longitude coordinates, as reported by the survey platform (e.g., Qualtrics) or an IP intelligence service (e.g., MaxMind).
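The three location-based checks above can all be computed from response-level metadata. A minimal pandas sketch (the IPs, countries, and coordinates are illustrative, with the US assumed as the targeted country):

```python
import pandas as pd

# One row per response; geolocation fields as returned by an IP
# intelligence service (e.g., MaxMind).
df = pd.DataFrame({
    "ip":      ["1.2.3.4", "1.2.3.4", "5.6.7.8"],
    "country": ["US", "US", "CA"],
    "lat":     [40.71, 40.71, 43.65],
    "lon":     [-74.01, -74.01, -79.38],
})

# Unique IP address: 1 if no other response shares the IP.
df["unique_ip"] = (~df["ip"].duplicated(keep=False)).astype(int)

# Targeted country: 1 if the IP geolocates to the targeted country.
df["in_country"] = (df["country"] == "US").astype(int)

# Location cluster: 1 if fewer than five responses share the same coordinates.
cluster_size = df.groupby(["lat", "lon"])["lat"].transform("size")
df["location_ok"] = (cluster_size < 5).astype(int)
```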
Equal to 1 if the submission is not flagged as a duplicate by the survey platform (e.g., Qualtrics), which identifies repeated submissions using browser cookies.
Equal to 1 if no other response in the sample shares the same device fingerprint, as provided by a device fingerprinting service (e.g., Fingerprint.com). Unlike cookies, device fingerprints persist across IP changes and browser resets, making them more robust for detecting repeated submissions from the same device.
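The same uniqueness logic as the IP check applies here, given a column of visitor identifiers from the fingerprinting service (values illustrative):

```python
import pandas as pd

fingerprints = pd.Series(["fp_a", "fp_b", "fp_a", "fp_c"])  # illustrative visitor IDs

# 1 if no other response in the sample shares the same device fingerprint.
unique_fingerprint = (~fingerprints.duplicated(keep=False)).astype(int)
```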
Help expand the dataset and improve comparisons across platforms.
Want to update a previously submitted entry? Click "Edit Existing Entry" below.
Takes about 5 minutes