Dataclysm Summary and Review

by Christian Rudder

Has Dataclysm by Christian Rudder been sitting on your reading list? Pick up the key ideas in the book with this quick summary.

The internet certainly feels like it offers a high degree of anonymity. When you notice that someone has written a nasty comment about Justin Bieber on your favorite Bieber music video, it’s easy to reply with a simple “u suck!” and never have to worry about the consequences.

But while your YouTube enemy may never uncover your identity, and while you might erase your browser history and protect your devices with secure passwords, your online data is still easily procured and monitored.

And with this data, companies learn all sorts of interesting things about what you do online; and from this, we can learn a lot about human behavior in general.

In this summary of Dataclysm by Christian Rudder, you’ll also discover:

  • that white men really do love The Allman Brothers Band;
  • why the word “pizza” is a force that binds all humans; and
  • why being less attractive can actually get you more dates.

Dataclysm Key Idea #1: Raw data from online dating sites has a lot to tell us about our preferences in potential partners.

When researchers interview people about sensitive issues, they have to account for a certain degree of dishonesty. Even when we voluntarily participate in a study, that doesn’t mean that questions – or our honest answers – won’t embarrass us.

The internet, and in this case data from dating website OkCupid, has enabled researchers to gather unfiltered information directly from the source.

For example, the data reveals that heterosexual men largely share one set of partner preferences, while heterosexual women largely share another.

When men are interviewed about their age preferences for the opposite sex, for instance, they tend to give numbers closer to their own age. However, data from OkCupid profiles reveals that most men actually prefer women in their early 20s.

OkCupid’s profile ratings data also shows that women tend to prefer men who are older than they are – that is, until the men reach their 30s. At that point, women will show a preference for both older men and men who are their own age.

The data also demonstrates qualitative differences in men’s and women’s preferences, namely: men are most interested in physical attributes, while women are more interested in materialistic things, such as social status and wealth.

In addition, and contrary to common wisdom, being seen as conventionally attractive is not always beneficial when seeking a partner online. In fact, having a low profile rating on dating sites can actually bring you more attention.

A woman with a rating of two out of ten, for instance, is more likely to find a match than a “perfect ten.” The assumption is that there is less competition for the lower-scoring woman as a partner, which in turn means a greater chance of success for an interested man.

Conversely, a woman with a higher score may give people the impression that the competition for her attention is high, and so she becomes less appealing to potential suitors.

Dataclysm Key Idea #2: While technology has killed the pen and paper, we write more today than we ever have before.

Some people believe that the internet has degraded our cognitive abilities and distracted us from having a real social life.

However, even critics cannot deny that the internet has vastly improved our abilities in at least one art form: writing.

Thanks to social media, we write far more than previous generations ever did. Indeed, the internet is a writer’s world. Whether it’s teenagers on Facebook, Twitter, Instagram or Snapchat, or career bloggers, the very foundations of internet communication require the written word.

Even when we post non-text media, such as photos or videos, words are crucial for captioning and providing context, commenting on the material or discussing it with others. Amazingly, more words will be written on Twitter in the next two years than are contained in all the books ever printed.

Not only is writing online ubiquitous, but social media platforms such as Twitter may actually improve our writing skills. Twitter’s 140-character limit, for example, forces users to exercise brevity in order to express themselves.

Research shows that, despite the character limits, writing on Twitter isn’t “dumbed down.” For instance, abbreviations such as “u” in place of “you” are no more prevalent on Twitter than they are in any other medium, and their use has everything to do with the individual preferences of the user.

Moreover, the need to be concise on Twitter can actually be a good thing. Linguists, for example, have measured Twitter’s lexical density, finding that its proportion of “content-carrying words,” such as verbs and nouns, is not only higher than in emails, but even comparable to the writing on Slate, the control used for magazine-level syntax.
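To make the idea of lexical density concrete, here is a minimal Python sketch. It is not the linguists' actual method: a real study would identify content-carrying words with a part-of-speech tagger, whereas this sketch assumes a small, hypothetical list of function words (FUNCTION_WORDS) and treats every other word as content-carrying.

```python
import re

# Hypothetical, deliberately tiny list of function words. This is an
# assumption for illustration; a real study would use a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "to", "in", "on", "at",
    "for", "with", "is", "are", "was", "were", "be", "it", "this", "that",
    "i", "you", "we", "they", "he", "she", "my", "your",
}


def lexical_density(text: str) -> float:
    """Return the share of words not in FUNCTION_WORDS (0.0 for empty text)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)
```

Under this rough definition, a clipped message such as "pizza tonight" scores higher than "the cat is on the mat", which is the kind of difference the comparison between tweets and emails is measuring.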

So, although the medium is changing, we are all still writing, and writing a lot, whether it’s through misspelled status updates, Instagram captions or eloquent tweets.

Dataclysm Key Idea #3: We learn and are inspired by the ideas around us. So the more connections, the better.

Even before social media, interpersonal interaction was viewed as important. We even see this reflected in the way we design spaces – such as the bathrooms at Pixar.

To force interdepartmental small talk, Pixar decided to put the only bathrooms in the building in its central atrium. The idea was that bringing people together – even if only on the way to the bathroom – would bring about the collision and synthesis of innovative ideas.

Ideas, start-ups, movies and anything else that requires a signal boost through word-of-mouth communication are usually spread by people to whom you have only loose ties. So, it’s very important to maintain connections with people in general.

Think about it – you don’t have to be someone’s best friend to overhear a movie recommendation on the train, and perhaps become inspired to check it out yourself. In fact, we’ve known since the 1970s that innovations come from intersecting ideas, but thanks to robust online data, we are now even more convinced this is the case.

Indeed, the cat videos that go viral and are shared with millions of people globally prove that “word of mouth” knows no geographic boundaries.

Social connections aren’t just for cute cat videos, however; they can also affect your romantic relationships.

For instance, Facebook data shows that you and your spouse are the link between two different social groups. However, the more connections you have in common, the more likely you are to stay together.

In contrast, the fewer mutual friends you share with your spouse on Facebook, the more likely it is that you’ll be disconnected in real life. This can lead to having separate lives, which can quickly escalate to having “secret lives,” followed shortly by a potentially nasty breakup.

Now that you’ve learned a bit about how behavioral data from the web can be applied, the next key ideas will examine the differences in how people express themselves publicly and privately.

Dataclysm Key Idea #4: People are prejudiced, superficial and even racist, when they think no one’s watching.

How often have we been told not to judge a book by its cover? Or not to make a snap judgment after meeting someone for only a moment?

We do it anyway, and eventually these judgments transform into schemas, theories or thought-models which then stick with us. Everyone has their own set of expectations and attitudes which often are not logical.

Let’s look at what happened when OkCupid introduced an app, Crazy Blind Date, which allowed two users to exchange information to set up a date in the immediate future. While it sounded good, the app ultimately failed. But why?

The problem was that users weren’t able to see what their potential date looked like until they actually met. People absolutely “judge a book by its cover,” and thus wanted the opportunity to judge their potential dates solely on how they look.

And yet, those who did meet through the app gave it exceptional ratings, showing that appearances and conventional attractiveness had little, if anything, to do with how well the date went.

Darker still, people are pervasively racist. While it’s no longer socially acceptable to express overt racism, internet data reveals that people still hold racist ideas and beliefs. Taking a look at the numbers, Google data reveals that the “n-word” appears in seven million searches per year.

Google’s autocomplete feature, which finishes search terms as you type them based on past searches from other users, is often plagued with racist queries.

Examples include such queries as: “Why do black people like fried chicken?” “Why do Asians look alike?” and “Why do Muslims hate America?”

If people cannot be openly racist, then they will simply keep their racism out of public view – at least when it can’t easily be traced back to them.

Dataclysm Key Idea #5: Are all humans miserable jerks? Not necessarily, but online anonymity doesn’t bring out our best.

Have you ever made the mistake of scrolling through the comments on a YouTube video? Often, comments degrade quickly to bickering, in which people insult each other over things completely unrelated to the video itself.

Unfortunately, this kind of vitriol is all too easy to find in every corner of the internet. But why?

Basically, people are cruel when there are no consequences for their actions. The anonymity of the internet allows people to act with a total lack of restraint, a phenomenon known as the online disinhibition effect.

Hateful people figure that, since nobody knows their true identity, they can write whatever they want, no matter how hurtful or hateful their language may be.

We can see this in the unfortunate example of Safiyyaah Nawaz, who on January 1, 2014, tweeted that “this beautiful earth is now 2014 years old, amazing.” Whether it was a joke or simple ignorance didn’t seem to matter: her tweet was then re-tweeted countless times, far and wide.

At first, people were confused, but they eventually turned aggressive, reaching a point where comments became hateful and rude. One user even suggested Nawaz commit suicide, writing: “Kill yourself you stupid motherfuck.”

Of course, hateful behavior is not exclusive to the internet. We’ve been berating and humiliating each other for as long as humans have been on earth. Even the most ancient polytheistic religions, from Norse to Egyptian to Greek, all have gods dedicated to the dark art of gossip.

Even some of the Bible’s most famous verses deal with gossip, such as: “judge not lest you be judged.”

Negativity and hate are inherent in humanity, and social media of all forms – Facebook, Reddit, YouTube, Twitter and so on – stands as a testament to that.

You’ve seen how nasty we can be when we’re anonymous. But how do people choose to represent themselves when their data is more easily available?

Dataclysm Key Idea #6: Tell me what words you use, and I will tell you who you are (and whether you like pizza).

How do you choose to label yourself? By ethnicity, gender or perhaps age? Interestingly, the vocabulary you use is often enough to identify you as belonging to one group or another.

People tend to use words that specifically relate to their ethnic, sexual and political identity. For instance, if you were to map out the most common words found on OkCupid, the results would draw an interesting yet stereotypical caricature of different social groups.

For example, “my blue eyes,” “campfire” and “Allman Brothers” are the most common phrases found in the profiles of white men. Black men often use “dreads,” “Jamie Foxx” and “Paid in Full.”

Asian women write “Taiwan,” “tall for an Asian” and “filipina” more often, and Latinas write “una,” “merengue bachata” and “Colombian” the most. In essence, OkCupid users represented their own cultural backgrounds without ever making explicit reference to that background.

As it turns out, ethnic and cultural backgrounds are not the only things that can be distinguished by language alone. Gender, that is, the characteristics that distinguish masculinity and femininity, is equally (if not more) represented by our choice of vocabulary.

If you compare the phrases that men and women most commonly use on Twitter, you’ll find some predictable ones that clearly distinguish one group from the other.

The most common phrases for women include: “my nails done,” “cute texts,” “girls night” and “my makeup.” Men, on the other hand, write “good bro,” “ps4,” “the squad” and “hoopin.”

However, essential vocabulary like “the” and “pizza” transcends racial and gender boundaries. So perhaps we aren’t so different after all!
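The kind of group-by-group word comparison described in this key idea can be sketched in a few lines of Python. This is only an illustrative assumption about how such a comparison might work, not the method Rudder used: the function name distinctive_words and the scoring rule (a word's share of a group's text minus its share of everyone else's) are hypothetical.

```python
from collections import Counter


def distinctive_words(groups: dict[str, list[str]], top_n: int = 3) -> dict[str, list[str]]:
    """For each group, rank its words by in-group share minus share elsewhere."""
    # Count lowercase, whitespace-separated words per group.
    counts = {
        name: Counter(word for text in texts for word in text.lower().split())
        for name, texts in groups.items()
    }
    result = {}
    for name, own in counts.items():
        # Pool the word counts of every other group.
        others = Counter()
        for other_name, other_counts in counts.items():
            if other_name != name:
                others.update(other_counts)
        total_own = sum(own.values()) or 1
        total_others = sum(others.values()) or 1
        score = {w: own[w] / total_own - others[w] / total_others for w in own}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(score, key=lambda w: (-score[w], w))
        result[name] = ranked[:top_n]
    return result
```

With this scoring, a word that every group uses at the same rate gets a score of zero and never ranks as distinctive, while a word concentrated in one group rises to the top of that group's list.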

Dataclysm Key Idea #7: We all deserve online privacy; but we have to control what and how we share online, too.

With a growing degree of openness and sharing online, the issue of privacy is a hot topic and will remain so for some time.

A good starting point is to ask: How much privacy do we even have when online?

In essence, we have significant control over our privacy if we choose what and how much we share on social media.

If you choose to limit your social media usage, then you’ll enjoy greater privacy. It’s harder for internet megacorporations to collect your online data if you aren’t constantly sharing photos or allowing social networking sites to publish when or where you’re traveling.

However, privacy comes at a cost. The services we enjoy, such as Google and Facebook, are free only because these companies have access to and can sell the data we share.

Basically, we barter away our private information for the chance to get free information from Google or to effortlessly connect with old friends on Facebook.

But what happens if you are a social media user, yet decide you no longer want to be? What should happen to your data?

Massachusetts Institute of Technology professor Alex Pentland believes that we should have a New Deal on Data which would give us greater control of how our data is used online.

One aspect of this New Deal would entail the ability to remove your data from a website whenever you feel it is being, or might be, misused.

It would also mean being able to take your data with you. You should have private access to the personal data collected on you, so you could then sell that data to scientists on your own if you so choose.

In the end, however, you can only have as much privacy as you allow yourself.

In Review: Dataclysm Book Summary

The key message in this book:

The massive amounts of data collected by internet services offer scientists and researchers entirely new information that they can then use to investigate the human condition. While the results aren’t always flattering, the data nevertheless helps us better understand our online behavior.

Suggested further reading: Big Data by Viktor Mayer-Schönberger and Kenneth Cukier

Big Data provides an insightful look at why the move to “big data” is a major shift in how we collect, use and think about the data around us. It provides great explanations and examples of how individuals and companies already ahead of the curve are using the tools of big data to create value and profit. Casting an eye forward, the book also outlines what a big-data society could mean in terms of risks, opportunities and legal implications.