The original survey collected ~970,000 responses. This public dataset is a 15,503-respondent anonymized subsample, representative by age, gender, and politics but aggressively noised and binned.
Anonymization
98.4% of responses were dropped, leaving a representative subsample. Columns were aggressively binned, demographic info was reduced, and multiple layers of noise were added. If a row looks like someone specific, it almost certainly isn't.
Correlations are dampened
Noise reduces correlations by roughly 25% (range 15–30%). A correlation of 0.5 in this data was therefore likely about 0.67 in the original (0.5 / 0.75), or somewhere between 0.59 and 0.71 across the stated range. Patterns are real in direction, weaker in magnitude.
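As a rough sketch of that arithmetic, the snippet below backs out an estimated original correlation from an observed one. It assumes the dampening is purely multiplicative, i.e. observed = (1 - reduction) × original; the actual noise process is not described here, so treat the output as a ballpark figure. The function name and its parameters are illustrative only and are not part of the dataset or any tool.

```python
def estimate_original_correlation(observed, reduction=0.25):
    """Undo an assumed multiplicative dampening.

    Assumption (not from the dataset docs): observed = (1 - reduction) * original.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    estimate = observed / (1.0 - reduction)
    # A correlation cannot exceed 1 in magnitude, so clamp the estimate.
    return max(-1.0, min(1.0, estimate))


observed = 0.5
typical = estimate_original_correlation(observed)                # 25% reduction
low = estimate_original_correlation(observed, reduction=0.15)   # mildest dampening
high = estimate_original_correlation(observed, reduction=0.30)  # strongest dampening
print(f"{low:.2f} to {high:.2f}, typical {typical:.2f}")
```

Reporting the whole range rather than a single corrected value keeps the uncertainty about the noise level visible.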
Limited demographic window
Ages 14–32, US/Canada and Europe only. Some extreme fetishes were removed.
Directional, not definitive
Good for spotting patterns and generating hypotheses. Not a source of precise measurements.
Tip: use Data Quality to see which questions have missing answers before making strong claims.
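If you are working with the rows outside the Data Quality view, a similar check can be sketched in a few lines. This is a minimal example, assuming each row is a dict and that a missing answer appears as an empty string or None. The column names and values below are made up for illustration and are not the dataset's real fields.

```python
def missing_share(rows, missing_values=(None, "")):
    """Return {column: fraction of rows whose answer is missing}."""
    total = len(rows)
    if total == 0:
        return {}
    columns = {key for row in rows for key in row}
    return {
        # A column absent from a row counts as missing (row.get returns None).
        col: sum(row.get(col) in missing_values for row in rows) / total
        for col in sorted(columns)
    }


# Hypothetical rows; real column names and bins will differ.
rows = [
    {"age_bin": "18-24", "question_a": "3", "question_b": ""},
    {"age_bin": "25-32", "question_a": "", "question_b": ""},
    {"age_bin": "18-24", "question_a": "5", "question_b": "1"},
    {"age_bin": "14-17", "question_a": "2", "question_b": None},
]
shares = missing_share(rows)
for column, share in shares.items():
    print(f"{column}: {share:.0%} missing")
```

Rows read with the standard library's csv.DictReader fit this shape, since empty fields come back as empty strings.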