Tuesday, October 26, 2021

The Best Idea From Facebook Staffers for Fixing Facebook: Learn From Google

[Image from Murat Yükselif/The Globe and Mail]
The Facebook Papers trove of internal documents shows that Facebook employees understand the harms of their service and have many good ideas for limiting them. Many of those ideas have value as part of a total solution. But only one has been proven not only to limit distribution of harmful content but also to select for quality content -- and to work economically at huge scale and across a multitude of languages and cultures.

Facebook knows that filtering for quality in newsfeeds (and filtering out mis/disinformation and hate) doesn’t require advanced AI -- or humans to understand content -- or the self-defeating Luddite remedy of prohibiting algorithms. It takes clever algorithms that weigh external signals of quality to augment human user intelligence, much as Google's PageRank does.
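To make that concrete, here is a toy sketch of the kind of graph-based authority computation that PageRank pioneered. The link graph, the page names, and the damping factor are all hypothetical illustration -- this is nothing like a production implementation -- but it shows the key property: pages earn authority only from links that carry authority, so an isolated cluster of pages citing each other gains little.

```python
# Minimal power-iteration PageRank over a toy link graph.
# Purely illustrative -- not Google's or Facebook's actual implementation.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline; the rest flows along links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly across the graph.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical graph: quality pages are cited by an outside blog;
# troll pages only cite each other.
links = {
    "quality_news": ["reference_site"],
    "reference_site": ["quality_news"],
    "troll_farm": ["troll_farm_2"],
    "troll_farm_2": ["troll_farm"],
    "reader_blog": ["quality_news", "reference_site"],
}
ranks = pagerank(links)
```

In this toy graph, the quality pages end up with the highest authority because an independent page links into them, while the troll pages get only what they trade between themselves -- the "external signals of quality" doing the work.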

I was pleased to read Gilad Edelman's capsule on this in Wired on 10/26, which brought me to Karen Hao's report in Tech Review from 9/16, both based on a leaked 10/4/19 report by Jeff Allen, a senior-level data scientist then leaving Facebook. I have long advocated such an approach -- seemingly as a voice in the wilderness -- and view this as a measure of validation. Here is a quick note (hopefully to be expanded).

Jeff Allen's Facebook Paper "How Communities are Exploited on Our Platforms"

As Karen Hao reports on Allen: 

“It will always strike me as profoundly weird ... and genuinely horrifying,” he wrote. “It seems quite clear that until that situation can be fixed, we will always be feeling serious headwinds in trying to accomplish our mission.”

The report also suggested a possible solution. “This is far from the first time humanity has fought bad actors in our media ecosystems,” he wrote, pointing to Google’s use of what’s known as a graph-based authority measure—which assesses the quality of a web page according to how often it cites and is cited by other quality web pages—to demote bad actors in its search rankings.

“We have our own implementation of a graph-based authority measure,” he continued. If the platform gave more consideration to this existing metric in ranking pages, it could help flip the disturbing trend in which pages reach the widest audiences.

When Facebook’s rankings prioritize engagement, troll-farm pages beat out authentic pages, Allen wrote. But “90% of Troll Farm Pages have exactly 0 Graph Authority … [Authentic pages] clearly win.” 

And as Gilad Edelman reports,

Allen suggests that Graph Authority should replace engagement as the main basis of recommendations. In his post, he posits that this would obliterate the problem of sketchy publishers devoted to gaming Facebook, rather than investing in good content. An algorithm optimized for trustworthiness or quality would not allow the fake-news story “Pope Francis Shocks World, Endorses Donald Trump for President” to rack up millions of views, as it did in 2016. It would kneecap the teeming industry of pages that post unoriginal memes, which according to one 2019 internal estimate accounted at the time for as much as 35 to 40 percent of Facebook page views within News Feed. And it would provide a boost to more respected, higher quality news organizations, who sure could use it.
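The flip Allen proposes can be sketched mechanically: rank feed items by an authority-dominated score instead of raw engagement. The items, engagement counts, authority scores, and blending weight below are entirely hypothetical -- the point is only that the ordering inverts.

```python
# Toy re-ranking sketch: engagement-ranked vs. authority-ranked feeds.
# All data and the weighting scheme are hypothetical illustration.

feed_items = [
    {"title": "Fake pope endorsement", "engagement": 950_000, "graph_authority": 0.0},
    {"title": "Investigative report", "engagement": 40_000, "graph_authority": 0.9},
    {"title": "Recycled meme page", "engagement": 600_000, "graph_authority": 0.05},
]

# Status quo: pure engagement ranking puts the troll-farm item on top.
by_engagement = sorted(feed_items, key=lambda item: item["engagement"], reverse=True)

def authority_first_score(item, authority_weight=0.9):
    # Hypothetical blend: authority dominates; engagement only breaks ties.
    normalized_engagement = item["engagement"] / 1_000_000
    return (authority_weight * item["graph_authority"]
            + (1 - authority_weight) * normalized_engagement)

ranked = sorted(feed_items, key=authority_first_score, reverse=True)
```

Under the engagement sort, the zero-authority fake-news item wins; under the authority-weighted score, the trusted publisher does -- which is exactly the "90% of Troll Farm Pages have exactly 0 Graph Authority" dynamic Allen describes.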

Allen's original 2019 report expands: "...this is far from the first time humanity has fought bad actors in our media ecosystems. And it is even far from the first time web platforms have fought similar bad actors. There is a proven strategy to aligning media ecosystems and distribution platforms with important missions, such as ours, and societal value." He capsules the history of Google's PageRank as "the algorithm that built the internet" and notes that graph-based authority measures date back to the '70s.

He recounts the history of "yellow journalism" over a century ago, and how Adolph Ochs' New York Times changed that by establishing a reputation for quality, and then digs into Google (emphasis added):

So Let's Just Follow Googles Lead. Google has set a remarkable example of how to build Ochs’ idea into a web platform. How to encode company values and missions into ranking systems. Figuring out how to make some of it work for FB and IG would provide the whole company with enormous value.

Google is remarkably transparent about how they work and how they fight these types of actors. If you haven't read “How Search Works" I highly recommend it. It is an amazing lesson in how to build a world class information retrieval system. And if you haven't read “How Google Fights Disinformation”, legitimately stop what you're doing right now and read it.

The problem of information retrieval (And Newsfeed is 100% an information retrieval system) comes down to creating a meaningful definition of both the quality of the content producer and the relevance of the content. Google's basic method was to use their company mission to define the quality.

Google's mission statement is to make the worlds information widely available and useful. The most important word in that mission statement is “useful”. A high quality content producer should be in alignment with the IR systems mission. In the case of Google, that means a content producer that makes useful content. A low quality producer makes content that isn't useful. Google has built a completely objective and defensible definition of what useful content is that they can apply at scale. This is done in their “Search Quality Rater Guidelines”, which they publish publicly.

The way Google breaks down the utility of content basically lands in 3 buckets. How much expertise does the author have in the subject matter of the content, as determined by the credentials the author presents to the users. How much effort does the author put into their content. And the level of 3rd party validation the author has.

If the author has 0 experience in the subject, doesn't spend any time on the content, and doesn't have any 3rd party validation, then that author is going to be labeled lowest quality by Google and hardly get any search traffic. Does that description sound familiar? It is a pretty solid description of the Troll Farms.

Google calls their quality work their first line of defense against disinformation and misinformation. All we have to do are figure out what the objective and defensible criteria are for a Page to build community, and bring the world closer together. We are leaving a huge obvious win on the table by not pursuing this strategy.

...It seems quite clear that until that situation can be fixed, we will always be feeling serious headwinds in trying to accomplish our mission. Newsfeed and specifically ranking is such an integral part of our platform. For almost everything we want to accomplish, Feed plays a key role. Feed is essential enough that it doesn't particularly need any mission beyond our companies. FB, and IG, need to figure out what implications our company mission has on ordering posts from users inventory.

Until we do, we should expect our platform to continue to empower actors who are antithetical to the company mission.
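The "3 buckets" Allen describes -- expertise, effort, and third-party validation -- can be sketched as a toy scoring function. The signal values and equal weights below are hypothetical, not Google's actual rater guidelines; the point is only that an author scoring zero on all three buckets (Allen's troll-farm profile) bottoms out no matter how the weights are set.

```python
# Toy sketch of the "3 buckets" quality assessment described above.
# Signals, scale, and weights are hypothetical illustration.

def quality_score(expertise, effort, validation, weights=(1/3, 1/3, 1/3)):
    """Each signal is scored 0-1; the result is their weighted average."""
    signals = (expertise, effort, validation)
    return sum(w * s for w, s in zip(weights, signals))

# Allen's troll-farm profile: no expertise, no effort, no validation.
troll_farm_page = quality_score(expertise=0.0, effort=0.0, validation=0.0)
# A hypothetical credentialed newsroom with original, widely cited work.
newsroom_page = quality_score(expertise=0.9, effort=0.8, validation=0.7)
```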

My views on applying this

Allen is focused here on troll-farm Pages rather than pure user generated content, and that is where Google's page ranking strategy is most directly parallel. It also may be the most urgent to remedy. 

UGC is more of a long tail -- more items, harder to rate according to the first two of Allen's "3 buckets." But he did not explain the third bucket -- how Google uses massive data, such as links placed by human "Webmasters," plus feedback on which items in search hit lists users actually click on, and even dwell times on those clicks. That is similar to the data on likes, shares, and comments that I have suggested be used to create graph authority reputations for ordinary users and their posts and comments. For details on just how I see that working, see my 2018 post, The Augmented Wisdom of Crowds: Rate the Raters and Weight the Ratings.
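A minimal sketch of that rate-the-raters idea: item scores are reputation-weighted averages of user ratings, and a user's reputation in turn depends on how closely their ratings track the weighted consensus, iterated to a fixed point. The data and the specific update rule are hypothetical illustration, not the full scheme from that post.

```python
# Toy "rate the raters and weight the ratings" loop.
# Ratings data and the reputation update rule are hypothetical.

def rate_the_raters(ratings, iterations=20):
    """ratings: dict {user: {item: rating in [0, 1]}}."""
    reputation = {user: 1.0 for user in ratings}
    items = {item for user_ratings in ratings.values() for item in user_ratings}
    scores = {}
    for _ in range(iterations):
        # Item score: average of ratings, weighted by rater reputation.
        for item in items:
            num = den = 0.0
            for user, user_ratings in ratings.items():
                if item in user_ratings:
                    num += reputation[user] * user_ratings[item]
                    den += reputation[user]
            scores[item] = num / den if den else 0.0
        # Rater reputation: 1 minus mean distance from the consensus.
        for user, user_ratings in ratings.items():
            errors = [abs(r - scores[item]) for item, r in user_ratings.items()]
            reputation[user] = (1.0 - sum(errors) / len(errors)) if errors else 1.0
    return scores, reputation

ratings = {
    "careful_user": {"post_a": 0.9, "post_b": 0.1},
    "careful_user2": {"post_a": 0.8, "post_b": 0.2},
    "troll": {"post_a": 0.0, "post_b": 1.0},
}
scores, reputation = rate_the_raters(ratings)
```

With each pass, the dissenting troll's reputation shrinks, so their ratings count for less -- the "augmented wisdom of crowds" effect, without any human needing to inspect the content itself.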

Of course there will be challenges for any effort to apply this to social media. Google has proven that technology can do this kind of thing efficiently at Internet scale, but social media UGC and virality are even more of a challenge than Web pages. 

The biggest challenge is incentives -- to motivate Facebook to optimize for quality, rather than engagement. One way to make it happen is to unbundle the filtering/ranking services from Facebook, as I described in Tech Policy Press, and as discussed by eminent scholars in the recent Tech Policy Press mini-symposium (and in other writings listed in the Selected Items tab, above). That could realign the incentives to filter for quality, by making filtering a service to users, not platforms or advertisers.

Maybe the level of anger at this increasingly blatant and serious abuse of society and threat to democracy will finally spur regulatory action -- and the realization that we need deep fixes, not just band-aids.

In any case, it is great to finally see recognition of the merit of this strategy from within Facebook (even if by an employee reportedly departing in frustration). 

Tuesday, October 12, 2021

It Will Take a Moonshot to Save Democracy From Social Media

A moonshot is what struck me, after some reflection on the afternoon’s dialog at the Tech Policy Press mini-symposium, Reconciling Social Media & Democracy on 10/7/21. It was crystallized by a tweet later that evening about “an optimistic note.” My optimism that there is a path to a much better future was reinforced, but so was my sense of the weight of the task.

Key advocates now see the outlines of a remedial program, and many are now united in calling for reform. But the task is unlikely to be undertaken voluntarily by the platforms -- and is far too complex, laborious, and uncertain to be effectively managed by legislation or existing regulatory bodies. There seemed to be general agreement on an array of measures as promising -- despite considerable divergence on details and priorities. The clearest consensus was that a new, specialized, expert agency is needed to work with and guide the industry to serve users and society.

While many of the remedies have been widely discussed, the focal point was a less-known strategy arising from several sources and recently given prominence by Francis Fukuyama and his Stanford-based group. The highly respected Journal of Democracy featured an article by Fukuyama, then a debate by other scholars plus Fukuyama’s response. Our event featured Fukuyama and most of those other debaters, plus several notable technology-focused experts. I moderated the opening segment with Fukuyama and two of the other scholars, drawing on my five-decade perspective on the evolution of social media to try to step back and suggest a long-term guiding vision.

The core proposal is to unbundle the filtering of items in our newsfeeds, creating an open market in filtering services (“middleware”) that users can choose from to work as their agents. The idea is 1) to reduce the power of the platforms to control for each of us what we see, and 2) to decouple that from the harmful effects of engagement-driven business incentives that favor shock, anger, and divisiveness. That unbundling is argued to be the only strategy that limits unaccountable platform power over what individuals see, as a “loaded gun on the table” that could be picked up by an authoritarian platform or government to threaten the very foundations of democracy.

Key alternatives, favored by some, are the more familiar remedies of shifting from extractive, engagement-driven, advertising-based business models; stronger requirements for effective moderation and transparency; and corporate governance reforms. These too have weaknesses: moderation is very hard to do well no matter what, and government enforcement of content-based moderation standards would likely fail First Amendment challenges.

Some of the speakers are proponents of even greater decentralization. My opening comments suggested that be viewed as a likely long-term direction, and that the unbundling of filters was an urgent first step toward a much richer blend of centralized and decentralized services and controls -- including greater user control and more granular competitive options.

There was general agreement by most speakers that there is no silver bullet, and that most of these remedies are needed at some level as part of a holistic solution. There were concerns about whether the unbundling of filters would do enough to stop harmful content or filter-bubble echo chambers, but general agreement that shifting power from the platforms is important. The recent Facebook Files and hearings make it all too clear that platform self-regulation cannot be relied on and that all but the most innocuous efforts at regulation will be resisted or subverted. My suggested long-term direction of richer decentralization seemed to generate little dispute.

This dialog may help bring more coherence to this space, but the deeper concern is just how hard reform will be. There seemed to be full agreement on the urgent need for a new Digital Regulatory Agency with new powers to draw on expertise from government, industry, and academia to regulate and monitor with an ongoing and evolving discipline (and that current proposals to expand the FTC role are too limited).

The Facebook Files and recent whistleblower testimony may have stirred regulators to action (or not?), but we need a whole-of-society effort. We see the outlines of the direction through a thicket of complex issues, but cannot predict just where it will lead. That makes us all uncomfortable.

That is why this is much like the Apollo moonshot. Both are concerted attacks on unsolved, high-risk problems -- taking time, courage, dedication, multidisciplinary government/industry organization, massive financial and manpower resources, and navigation through a perilous and evolving course of trial and error.

But this problem of social media is far more consequential than the moonshot. “The lamps are going out all over the free world, and we shall not see them lit again in our lifetime” (paraphrasing Sir Edward Grey as the First World War began) -- this could apply within a very few years. We face the birthing of the next stage of democracy -- much as after Gutenberg, industrialization, and mass media. No one said this would be easy, and our neglect over the past two decades has made it much harder. It is not enough to sound alarms -- or to ride off in ill-considered directions. But there is reason to be optimistic -- if we are serious about getting our act together.


This is my quick take, from my own perspective (and prior to access to recordings or transcripts) -- feedback reflecting other takes on this is welcome. More to follow...

Running updates on these important issues can be found here, and my updating list of Selected Items is on the tab above.