Okay, unlike the last big data book I read, this WAS the big data book I wanted to read. While it doesn't describe the mathematics behind the different ways big data affects society, it does show how using small bits of innocent data can reveal surprising truths about who we are and what we really think.
If you start with the assumption that everybody lies on surveys (and this isn't a bad assumption, people mess with surveys all the time, to make themselves look good, to make someone else look bad, to withhold information for ulterior reasons, or really just to f--- with the survey data), then the fundamental data from which theories and beliefs develop is wrong.
But put people in a place where they believe their thoughts are anonymous, truly believe the data they provide will not be traced back to them, then said people become more open and, well, more honest with what they are thinking.
That is where Stephens-Davidowitz's research idea came from: use Google search data as a research source, one where many assumptions about what people are thinking can be debunked.
And the results are fascinating.
Stephens-Davidowitz provides a number of examples of "here's common knowledge, everyone knows this," and shows where, with big data, the "knowledge" is wrong. Either we aren't the same as when the knowledge was first determined, or it was declared as true based on something unknown, or simply accepted as true based on some voice of authority. Regardless of the source, the actual data, the actual numbers, show a different story, and that is the fascinating part.
This book is a good introduction to how powerful big data can be. It only briefly touches upon how badly big data warps society, which makes this book a good first book on what big data can do, and WMD a good follow-up on the flip side. Together, they make a good good-side-bad-side story of Big Data™.
I enjoyed this book a lot. I strongly recommend it.
In the previous three elections, the candidate who appeared first in more searches received the most votes. More interesting, the order the candidates were searched was predictive of which way a particular state would go. The order in which candidates are searched also seems to contain information that the polls can miss.
But for now it doesn’t matter. Seder is in the prediction business, not the explanation business. And, in the prediction business, you just need to know that something works, not why.
Women also like men who express support and sympathy. If a man says, “That’s awesome!” or “That’s really cool,” a woman is significantly more likely to report a connection. Likewise if he uses phrases such as “That’s tough” or “You must be sad.”
And as previously noted, a woman is also more likely to report a connection after a date where she talks about herself. Thus it is a great sign, on a first date, if there is substantial discussion about the woman. The woman signals her comfort and probably appreciates that the man is not hogging the conversation.
Some of the findings, however, were more interesting. Women use the word “tomorrow” far more often than men do, perhaps because men aren’t so great at thinking ahead.
Adding the letter “o” to the word “so” is one of the most feminine linguistic traits. Among the words most disproportionately used by women are “soo,” “sooo,” “soooo,” “sooooo,” and “soooooo.”
Maybe it was my childhood exposure to women who weren’t afraid to throw the occasional f-bomb. But I always thought cursing was an equal-opportunity trait. Not so. Among the words used much more frequently by men than women are “fuck,” “shit,” “fucks,” “bullshit,” “fucking,” and “fuckers.”
I am a guy. Or as Priyanka says, I was a guy in a previous life. No wait, still am.
Many people, particularly Marxists, have viewed American journalism as controlled by rich people or corporations with the goal of influencing the masses, perhaps to push people toward their political views.
Gentzkow and Shapiro’s paper suggests, however, that this is not the predominant motivation of owners. The owners of the American press, instead, are primarily giving the masses what they want so that the owners can become even richer.
These days, a data scientist must not limit herself to a narrow or traditional view of data. These days, photographs of supermarket lines are valuable data. The fullness of supermarket bins is data. The ripeness of apples is data. Photos from outer space are data. The curvature of lips is data. Everything is data! And with all this new data, we can finally see through people’s lies.
Men’s top Googled question related to how their body or mind would change as they aged was whether their penis would get smaller.
For every search women make about a partner’s phallus, men make roughly 170 searches about their own.
More than 40 percent of complaints about a partner’s penis size say that it’s too big.
Still cracking up.
In fact, we are all so busy judging our own bodies that there is little energy left over to judge other people’s.
There is also probably a connection between two of the big concerns revealed in the sexual searches on Google: lack of sex and an insecurity about one’s sexual attractiveness and performance. Maybe these are related.
Maybe if we worried less about sex, we’d have more of it.
Who is more sexually generous, men or women?
And when men do look for tips on how to give oral sex, they are frequently not looking for ways of pleasing another person. Men make as many searches looking for ways to perform oral sex on themselves as they do how to give a woman an orgasm. (This is among my favorite facts in Google search data.)
Parents are about twice as likely to ask how to get their daughters to lose weight as they are to ask how to get their sons to do the same.
Just as with giftedness, this gender bias is not grounded in reality. About 28 percent of girls are overweight, while 35 percent of boys are. Even though scales measure more overweight boys than girls, parents see—or worry about—overweight girls much more frequently than overweight boys.
Parents are also one and a half times more likely to ask whether their daughter is beautiful than whether their son is handsome. And they are nearly three times more likely to ask whether their daughter is ugly than whether their son is ugly. (How Google is expected to know whether a child is beautiful or ugly is hard to say.)
In general, parents seem more likely to use positive words in questions about sons. They are more apt to ask whether a son is “happy” and less apt to ask whether a son is “depressed.”
A recent Southern Poverty Law Center report linked nearly one hundred murders in the past five years to registered Stormfront members.
It turns out, some kids make some tragic, and heart-wrenching, searches on Google—such as “my mom beat me” or “my dad hit me.” And these searches present a different—and agonizing—picture of what happened during this time. The number of searches like this shot up during the Great Recession, closely tracking the unemployment rate.
Here’s what I think happened: it was the reporting of child abuse cases that declined, not the child abuse itself. After all, it is estimated that only a small percentage of child abuse cases are reported to authorities anyway. And during a recession, many of the people who tend to report child abuse cases (teachers and police officers, for example) and handle cases (child protective service workers) are more likely to be overworked or out of work.
In 2015, in the United States, there were more than 700,000 Google searches looking into self-induced abortions. By comparison, there were some 3.4 million searches for abortion clinics that year.
What drives interest in self-induced abortion? The geography and timing of the Google searches point to a likely culprit: when it’s hard to get an official abortion, women look into off-the-books approaches.
Search rates for self-induced abortion were fairly steady from 2004 through 2007. They began to rise in late 2008, coinciding with the financial crisis and the recession that followed. They took a big leap in 2011, jumping 40 percent. The Guttmacher Institute, a reproductive rights organization, singles out 2011 as the beginning of the country’s recent crackdown on abortion; ninety-two state provisions that restrict access to abortion were enacted.
We can’t blindly trust government data. The government may tell us that child abuse or abortion has fallen and politicians may celebrate this achievement. But the results we think we’re seeing may be an artifact of flaws in the methods of data collection. The truth may be different—and, sometimes, far darker.
Facebook is digital brag-to-my-friends-about-how-good-my-life-is serum.
Not exactly cheery stuff. Often, after I give a talk on my research, people come up to me and say, “Seth, it’s all very interesting. But it’s so depressing.” I can’t pretend there isn’t a darkness in some of this data. If people consistently tell us what they think we want to hear, we will generally be told things that are more comforting than the truth.
Digital truth serum, on average, will show us that the world is worse than we have thought.
First, there can be comfort in knowing that you are not alone in your insecurities and embarrassing behavior. It can be nice to know others are insecure about their bodies.
In fact, I think Big Data can give a twenty-first-century update to a famous self-help quote: “Never compare your insides to everyone else’s outsides.”
The second benefit of digital truth serum is that it alerts us to people who are suffering.
The final—and, I think, most powerful—value in this digital truth serum is indeed its ability to lead us from problems to solutions.
When we lecture angry people, the search data implies that their fury can grow. But subtly provoking people’s curiosity, giving new information, and offering new images of the group that is stoking their rage may turn their thoughts in different, more positive directions.
Noah finds baseball impossibly boring, and his hatred of the sport has long been a core part of his identity.
Huh. Can't say "core" is accurate, but I might understand some part of this.
One hypothesis—and this is speculative—was put forth by David Cutler, one of the authors of the study and one of my advisors. Contagious behavior may be driving some of this. There is a large amount of research showing that habits are contagious. So poor people living near rich people may pick up a lot of their habits.
In fact, Chetty’s team found even more evidence that knowledge drove this kind of cheating. When Americans moved from an area where this variety of tax fraud was low to an area where it was high, they learned and adopted the trick. Through time, cheating spread from region to region throughout the United States. Like a virus, cheating on taxes is contagious. Now stop for a moment and think about how revealing this study is. It demonstrated that, when it comes to figuring out who will cheat on their taxes, the key isn’t determining who is honest and who is dishonest. It is determining who knows how to cheat and who doesn’t.
The greater the percentage of foreign-born residents in an area, the higher the proportion of children born there who go on to notable success. (Take that, Donald Trump!) If two places have similar urban and college populations, the one with more immigrants will produce more prominent Americans. What explains this? A lot of it seems to be directly attributable to the children of immigrants.
For better or worse (okay, clearly worse), there is a huge random component to life. Nobody knows for sure what or who is in charge of the universe.
For Ahmed Yilmaz, the son of an insurance agent and teacher in Queens, Stuy was “the high school.”
He still remembers the day he received the envelope with the results. He missed by two questions. I asked him what it felt like. “What does it feel like,” he responded, “to have your world fall apart when you’re in middle school?”
How horrible would this be?
More than a decade later, Yilmaz admits that he sometimes wonders how life would have played out had he gone to Stuy. “Everything would be different,” he says. “Literally, everyone I know would be different.”
But these what-ifs seem unanswerable. Life is not a video game. You can’t replay it under different scenarios until you get the results you want.
The economists found that prisoners assigned to harsher conditions were more likely to commit additional crimes once they left. The tough prison conditions, rather than deterring them from crime, hardened them and made them more violent once they returned to the outside world.
People adapt to their experience, and people who are going to be successful find advantages in any situation. The factors that make you successful are your talent and your drive.
This book is called Everybody Lies. By this, I mostly mean that people lie—to friends, to surveys, and to themselves—to make themselves look better. But the world also lies to us by presenting us with faulty, misleading data.
The fundamental problem is that they tested too many things. And if you test enough things, just by random chance, one of them will be statistically significant.
We can find patterns in ANYTHING.
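The "test enough things" point is easy to check with a few lines of code. This is a minimal sketch of my own, not anything from the book: every "test" is a fair coin flipped 100 times, so there is nothing real to find, and any "significant" result is pure chance. The function names, the 0.05 threshold, and the 1.96 cutoff (a normal approximation) are all my assumptions for the illustration.

```python
import random


def chance_of_any_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    looks significant at level alpha when nothing real is going on."""
    return 1 - (1 - alpha) ** num_tests


def count_spurious_hits(num_tests, flips_per_test=100, seed=0):
    """Flip a fair coin flips_per_test times for each test and count how
    many tests look 'significant' (|z| > 1.96) purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_tests):
        heads = sum(rng.random() < 0.5 for _ in range(flips_per_test))
        # Under the null hypothesis the mean is n/2 and the
        # standard deviation is sqrt(n)/2.
        z = (heads - flips_per_test / 2) / (flips_per_test ** 0.5 / 2)
        if abs(z) > 1.96:
            hits += 1
    return hits


if __name__ == "__main__":
    print(chance_of_any_false_positive(20))
    print(count_spurious_hits(1000))
```

The first call gives the odds that at least one of 20 unrelated tests clears the 0.05 bar by luck alone. The second counts how many of 1,000 perfectly fair coins look rigged.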
This time, IGF2r did not correlate with IQ. Plomin — and this is a sign of a good scientist — retracted his claim.
This is huge. I applaud Plomin. How hard it is to admit you're wrong. How nearly impossible is it to admit you're wrong with the world knowing? F'ing hard.
I wish we were all good scientists.
How can you overcome the curse of dimensionality? You have to have some humility about your work and not fall in love with your results. You have to put these results through additional tests.
Social scientists call this an “out-of-sample” test. And the more variables you try, the more humble you have to be. The more variables you try, the tougher the out-of-sample test has to be. It is also crucial to keep track of every test you attempt.
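Here is a rough sketch of what an out-of-sample test looks like in practice, again my own illustration rather than the book's method. Everything is random noise: an outcome and a few hundred candidate predictors. The code picks the predictor that correlates best with the outcome on one set of rows, then checks that same predictor on rows it has never seen. The names, sample sizes, and seed are hypothetical choices for the demo.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def out_of_sample_check(num_predictors=300, train_rows=100,
                        test_rows=400, seed=0):
    """Pick the best-looking noise predictor on the training rows,
    then measure the same predictor on the held-out rows."""
    rng = random.Random(seed)
    total = train_rows + test_rows
    outcome = [rng.gauss(0, 1) for _ in range(total)]
    predictors = [[rng.gauss(0, 1) for _ in range(total)]
                  for _ in range(num_predictors)]

    # Selection uses the training rows only.
    best = max(
        predictors,
        key=lambda p: abs(pearson(p[:train_rows], outcome[:train_rows])),
    )
    in_sample = pearson(best[:train_rows], outcome[:train_rows])
    out_of_sample = pearson(best[train_rows:], outcome[train_rows:])
    return in_sample, out_of_sample


if __name__ == "__main__":
    in_r, out_r = out_of_sample_check()
    print("in-sample r:", in_r)
    print("out-of-sample r:", out_r)
```

Since the winner was chosen out of hundreds of tries, its in-sample correlation is flattering. The held-out rows are the honest number, and raising num_predictors shows why more variables call for a tougher test.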
But it points to a potential problem with people using data to make decisions. Numbers can be seductive. We can grow fixated with them, and in so doing we can lose sight of more important considerations.
Consider the twenty-first-century emphasis on testing in American schools—and judging teachers based on how their students score. While the desire for more objective measures of what happens in classrooms is legitimate, there are many things that go on there that can’t readily be captured in numbers.
We can measure how students do on multiple-choice questions. We can’t easily measure critical thinking, curiosity, or personal development.
It was easy to measure offense and pitching but not fielding, so some organizations ended up underestimating the importance of defense.
You might think — or at least hope — that a polite, openly religious person who gives his word would be among the most likely to pay back a loan. But in fact this is not the case. This type of person, the data shows, is less likely than average to make good on their debt.
This fact cracks me up.
Hey, guess what, you can be an honest person without the threat of an omniscient, bearded man watching you.
Phrases such as “lower interest rate” or “after-tax” indicate a certain level of financial sophistication on the borrower’s part, so it’s perhaps not surprising they correlate with someone more likely to pay their loan back. In addition, if he or she talks about positive achievements such as being a college “graduate” and being “debt-free,” he or she is also likely to pay their loans.
Generally, if someone tells you he will pay you back, he will not pay you back. The more assertive the promise, the more likely he will break it. If someone writes “I promise I will pay back, so help me God,” he is among the least likely to pay you back.
Another word that indicates default is “explain,” meaning if people are trying to explain why they are going to be able to pay back a loan, they likely won’t.
In sum, according to these researchers, giving a detailed plan of how he can make his payments and mentioning commitments he has kept in the past are evidence someone will pay back a loan. Making promises and appealing to your mercy is a clear sign someone will go into default.
Regardless of the reasons—or what it tells us about human nature that making promises is a sure sign someone will, in actuality, not do something — the scholars found the test was an extremely valuable piece of information in predicting default. Someone who mentions God was 2.2 times more likely to default.
They found that Facebook likes are frequently correlated with IQ, extraversion, and conscientiousness. For example, people who like Mozart, thunderstorms, and curly fries on Facebook tend to have higher IQs. People who like Harley-Davidson motorcycles, the country music group Lady Antebellum, or the page “I Love Being a Mom” tend to have lower IQs. Some of these correlations may be due to the curse of dimensionality. If you test enough things, some will randomly correlate. But some interests may legitimately correlate with IQ.
First, it must be acknowledged that there is growing evidence that Google searches related to criminal activity do correlate with criminal activity. Christine Ma-Kellams, Flora Or, Ji Hyun Baek, and Ichiro Kawachi have shown that Google searches related to suicide correlate strongly with state-level suicide rates. In addition,
If more people are making searches saying they want to do something, more people are going to do that thing.
My phone is filled with emails I forgot to respond to, e-vites I never opened, Bumble messages I ignored.
I learned about Bumble only a week or so ago. I wish I had not learned about Bumble.
For example, Leonard Cohen once gave his nephew the following advice for wooing women: “Listen well. Then listen some more. And when you think you are done listening, listen some more.” That seems to be roughly similar to what these scientists found.