Tuesday, October 26, 2021

The Best Idea From Facebook Staffers for Fixing Facebook: Learn From Google

[Image from Murat Yükselif/The Globe and Mail]
The Facebook Papers trove of internal documents shows that Facebook employees understand the harms of their service and have many good ideas for limiting them. Many of those ideas have value as part of a total solution. But only one has been proven not only to limit the distribution of harmful content but also to select for quality content -- and to work economically at huge scale, across a multitude of languages and cultures.

Facebook knows that filtering for quality in newsfeeds (and filtering out mis/disinformation and hate) doesn't require advanced AI -- or humans to understand content -- or the self-defeating Luddite remedy of prohibiting algorithms. It takes clever algorithms that weigh external signals of quality to augment human user intelligence, much as Google PageRank does.
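To make that concrete, here is a minimal toy sketch of a PageRank-style graph-authority iteration -- my own illustration of the general technique, not Google's or Facebook's actual implementation. Authority flows along links: a page cited by high-authority pages earns authority, while a page that nothing links to earns almost none, no matter how much it links out.

```python
def pagerank(links, damping=0.85, iters=50):
    """Toy PageRank. links: dict mapping page -> list of pages it links to."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with uniform authority
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for src, outs in links.items():
            if outs:
                # Distribute src's authority evenly across its outlinks.
                share = rank[src] / len(outs)
                for dst in outs:
                    new[dst] += damping * share
            else:
                # Dangling page: spread its authority evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[src] / n
        rank = new
    return rank

# Toy graph: two quality pages cite each other; the "troll" page links out
# aggressively but receives no citations, so it earns almost no authority.
ranks = pagerank({
    "quality_a": ["quality_b"],
    "quality_b": ["quality_a"],
    "troll": ["quality_a", "quality_b"],
})
```

In this toy run the troll page ends up with the lowest rank even though it has the most outgoing links -- the basic reason graph authority is hard for content farms to game.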

I was pleased to read Gilad Edelman's capsule on this in Wired on 10/26, which brought me to Karen Hao's report in Tech Review from 9/16, both based on a leaked 10/4/19 report by Jeff Allen, a senior-level data scientist who was then leaving Facebook. I have long advocated such an approach -- seemingly as a voice in the wilderness -- and view this as a measure of validation. Here is a quick note (hopefully to be expanded).

Jeff Allen's Facebook Paper "How Communities are Exploited on Our Platforms"

As Karen Hao reports on Allen: 

“It will always strike me as profoundly weird ... and genuinely horrifying,” he wrote. “It seems quite clear that until that situation can be fixed, we will always be feeling serious headwinds in trying to accomplish our mission.”

The report also suggested a possible solution. “This is far from the first time humanity has fought bad actors in our media ecosystems,” he wrote, pointing to Google’s use of what’s known as a graph-based authority measure—which assesses the quality of a web page according to how often it cites and is cited by other quality web pages—to demote bad actors in its search rankings.

“We have our own implementation of a graph-based authority measure,” he continued. If the platform gave more consideration to this existing metric in ranking pages, it could help flip the disturbing trend in which pages reach the widest audiences.

When Facebook’s rankings prioritize engagement, troll-farm pages beat out authentic pages, Allen wrote. But “90% of Troll Farm Pages have exactly 0 Graph Authority … [Authentic pages] clearly win.” 

And as Gilad Edelman reports,

Allen suggests that Graph Authority should replace engagement as the main basis of recommendations. In his post, he posits that this would obliterate the problem of sketchy publishers devoted to gaming Facebook, rather than investing in good content. An algorithm optimized for trustworthiness or quality would not allow the fake-news story “Pope Francis Shocks World, Endorses Donald Trump for President” to rack up millions of views, as it did in 2016. It would kneecap the teeming industry of pages that post unoriginal memes, which according to one 2019 internal estimate accounted at the time for as much as 35 to 40 percent of Facebook page views within News Feed. And it would provide a boost to more respected, higher quality news organizations, who sure could use it.

Allen's original 2019 report expands: "...this is far from the first time humanity has fought bad actors in our media ecosystems. And it is even far from the first time web platforms have fought similar bad actors. There is a proven strategy to aligning media ecosystems and distribution platforms with important missions, such as ours, and societal value." He capsules the history of Google's PageRank as "the algorithm that built the internet" and notes that graph-based authority measures date back to the '70s.

He recounts the history of "yellow journalism" over a century ago, and how Adolph Ochs' New York Times changed that by establishing a reputation for quality, and then digs into Google (emphasis added):

So Let's Just Follow Google's Lead. Google has set a remarkable example of how to build Ochs' idea into a web platform. How to encode company values and missions into ranking systems. Figuring out how to make some of it work for FB and IG would provide the whole company with enormous value.

Google is remarkably transparent about how they work and how they fight these types of actors. If you haven't read “How Search Works" I highly recommend it. It is an amazing lesson in how to build a world class information retrieval system. And if you haven't read “How Google Fights Disinformation”, legitimately stop what you're doing right now and read it.

The problem of information retrieval (And Newsfeed is 100% an information retrieval system) comes down to creating a meaningful definition of both the quality of the content producer and the relevance of the content. Google's basic method was to use their company mission to define the quality.

Google's mission statement is to make the world's information widely available and useful. The most important word in that mission statement is "useful". A high quality content producer should be in alignment with the IR system's mission. In the case of Google, that means a content producer that makes useful content. A low quality producer makes content that isn't useful. Google has built a completely objective and defensible definition of what useful content is that they can apply at scale. This is done in their "Search Quality Rater Guidelines", which they publish publicly.

The way Google breaks down the utility of content basically lands in 3 buckets. How much expertise does the author have in the subject matter of the content, as determined by the credentials the author presents to the users. How much effort does the author put into their content. And the level of 3rd party validation the author has.

If the author has 0 experience in the subject, doesn't spend any time on the content, and doesn't have any 3rd party validation, then that author is going to be labeled lowest quality by Google and hardly get any search traffic. Does that description sound familiar? It is a pretty solid description of the Troll Farms.

Google calls their quality work their first line of defense against disinformation and misinformation. All we have to do is figure out what the objective and defensible criteria are for a Page to build community, and bring the world closer together. We are leaving a huge obvious win on the table by not pursuing this strategy.

...It seems quite clear that until that situation can be fixed, we will always be feeling serious headwinds in trying to accomplish our mission. Newsfeed and specifically ranking is such an integral part of our platform. For almost everything we want to accomplish, Feed plays a key role. Feed is essential enough that it doesn't particularly need any mission beyond our company's. FB and IG need to figure out what implications our company mission has on ordering posts from users' inventory.

Until we do, we should expect our platform to continue to empower actors who are antithetical to the company mission.

My views on applying this

Allen is focused here on troll-farm Pages rather than pure user generated content, and that is where Google's page ranking strategy is most directly parallel. It also may be the most urgent to remedy. 

UGC is more of a long tail -- more items, harder to rate according to the first two of Allen's "3 buckets." But he did not explain the third bucket -- how Google uses massive data, such as links placed by human "Webmasters," plus feedback on which items in search hit lists users actually click, and even dwell times on those clicks. That is similar to the data on likes, shares, and comments that I have suggested be used to create graph authority reputations for ordinary users and their posts and comments. For details on just how I see that working, see my 2018 post, The Augmented Wisdom of Crowds: Rate the Raters and Weight the Ratings.
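As a rough illustration of that "rate the raters and weight the ratings" idea -- the function, data, and numbers here are hypothetical, my own simplification rather than anything from that post or from Facebook -- a rater's reputation can rise when their ratings agree with the reputation-weighted consensus, and item scores are then computed using those reputations as weights. Iterated, this drains influence from coordinated low-quality raters:

```python
def weighted_scores(ratings, iters=10):
    """Toy rate-the-raters loop. ratings: dict rater -> {item: rating in [0, 1]}."""
    rep = {r: 1.0 for r in ratings}  # every rater starts with equal reputation
    items = {i for rs in ratings.values() for i in rs}
    score = {}
    for _ in range(iters):
        # Item score = reputation-weighted mean of its ratings.
        for i in items:
            num = sum(rep[r] * rs[i] for r, rs in ratings.items() if i in rs)
            den = sum(rep[r] for r, rs in ratings.items() if i in rs)
            score[i] = num / den
        # Rater reputation = closeness of their ratings to the consensus.
        for r, rs in ratings.items():
            err = sum(abs(rs[i] - score[i]) for i in rs) / len(rs)
            rep[r] = max(1.0 - err, 0.01)  # floor keeps weights positive
    return score, rep

# Hypothetical example: two raters broadly agree; a troll rates the opposite way.
score, rep = weighted_scores({
    "alice": {"post1": 0.9, "post2": 0.2},
    "bob":   {"post1": 0.8, "post2": 0.3},
    "troll": {"post1": 0.0, "post2": 1.0},
})
```

In this toy run the troll's reputation falls below the others', so the troll's ratings count for less each round -- the "weight the ratings" half of the idea.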

Of course there will be challenges for any effort to apply this to social media. Google has proven that technology can do this kind of thing efficiently at Internet scale, but social media UGC and virality is even more of a challenge than Web pages. 

The biggest challenge is incentives -- to motivate Facebook to optimize for quality, rather than engagement. One way to make it happen is to unbundle the filtering/ranking services from Facebook, as I described in Tech Policy Press, and as discussed by eminent scholars in the recent Tech Policy Press mini-symposium (and in other writings listed in the Selected Items tab, above). That could realign the incentives to filter for quality, by making filtering a service to users, not platforms or advertisers.

Maybe the level of anger at this increasingly blatant and serious abuse of society and threat to democracy will finally spur regulatory action -- and the realization that we need deep fixes, not just band-aids.

In any case, it is great to finally see recognition of the merit of this strategy from within Facebook (even if by an employee reportedly departing in frustration). 
