Wednesday, October 10, 2018

In the War on Fake News, All of Us are Soldiers, Already!

This is intended as a supplement to my posts "A Cognitive Immune System for Social Media -- Developing Systemic Resistance to Fake News" and "The Augmented Wisdom of Crowds: Rate the Raters and Weight the Ratings." (But hopefully this stands on its own as well.) Maybe this can make a clearer point of why the methods I propose are powerful and badly needed...
---

A NY Times article titled "Soldiers in Facebook’s War on Fake News Are Feeling Overrun" provides a simple context for showing how I propose to use information already available from all of us, on what is valid and what is fake.

The Times article describes a fact checking organization that works with Facebook in the Philippines (emphasis added):
On the front lines in the war over misinformation, Rappler is overmatched and outgunned - and that could be a worrying indicator of Facebook’s effort to curb the global problem by tapping fact-checking organizations around the world.
...it goes on to describe what I suggest is the heart of the issue:
When its fact checkers determine that a story is false, Facebook pushes it down on users’ News Feeds in favor of other material. Facebook does not delete the content, because it does not want to be seen as censoring free speech, and says demoting false content sharply reduces abuse. Still, falsehoods can resurface or become popular again.
The problem is that the fire hose of fake news is too fast and furious, and too diverse, for any specialized team of fact-checkers to keep up with it. Plus, the damage is done by the time they do identify the fakes and begin to demote them.

But we are all fact checking to some degree without even realizing it. We are all citizen-soldiers. Some do it better than others.

The trick is to draw out all of the signals we provide, in real time -- and use our knowledge of which users' signals are reliable -- to get smarter about what gets pushed down and what gets favored in our feeds. That can serve as a systemic cognitive immune system -- one based on rating the raters and weighting the ratings.

We are all rating all of our news, all of the time, whether implicitly or explicitly, without making any special effort:

  • When we read, "like," comment, or share an item, we provide implicit signals of interest, and perhaps approval.
  • When we comment or share an item, we provide explicit comments that may offer supplementary signals of approval or disapproval.
  • When we ignore an item, we provide a signal of disinterest (and perhaps disapproval).
  • When we return to other activity after viewing an item, the time elapsed signals our level of attention and interest.
Individually, inferences from the more implicit signals may be erratic and low in meaning. But when we have signals from thousands of people, the aggregate becomes meaningful. Trends can be seen quickly. (Facebook already uses such signals to target its ads -- that is how it makes so much money).
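To make this concrete, here is a minimal sketch in Python (the event names and weights are hypothetical, not any platform's actual API) of how raw engagement events might be turned into per-item signals and aggregated across many users:

```python
from collections import defaultdict

# Hypothetical weights: how strongly each kind of engagement event is
# read as an implicit rating of an item (positive or negative).
EVENT_WEIGHTS = {
    "share": 1.0,      # strong signal of interest, perhaps approval
    "like": 0.6,
    "comment": 0.4,    # interest; the comment text may refine this later
    "read_long": 0.3,  # dwelled on the item for a while
    "read_short": 0.0, # glanced and moved on
    "ignore": -0.2,    # scrolled past without engaging
}

def aggregate_signals(events):
    """events: iterable of (user_id, item_id, event_type) tuples.
    Returns a dict item_id -> (total_signal, number_of_users)."""
    totals = defaultdict(float)
    users = defaultdict(set)
    for user_id, item_id, event_type in events:
        totals[item_id] += EVENT_WEIGHTS.get(event_type, 0.0)
        users[item_id].add(user_id)
    return {item: (totals[item], len(users[item])) for item in totals}

# Any single event tells us little; the totals, across thousands of
# users, start to say something.
events = [
    ("alice", "story-1", "share"),
    ("bob", "story-1", "like"),
    ("carol", "story-1", "ignore"),
    ("dan", "story-2", "ignore"),
    ("erin", "story-2", "read_short"),
]
print(aggregate_signals(events))
```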

But simply adding all these signals can be misleading. 
  • Fake news can quickly spread through groups who are biased (including people or bots who have an ulterior interest in promoting an item) or are simply uncritical and easily inflamed -- making such an item appear to be popular.
  • But our platforms can learn who has which biases, and who is uncritical and easily inflamed.
  • They can learn who is respected within and beyond their narrow factions, and who is not, who is a shill (or a malicious bot) and who is not.
  • They can use this "rating" of the raters to weight their ratings higher or lower.
Done at scale, that can quickly provide probabilistically strong signals that an item is fake or misleading or just low quality. Those signals can enable the platform to demote low quality content and promote high quality content. 
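As a rough illustration of what "rating the raters" adds, this sketch (hypothetical names and scores, assuming the platform has already learned a per-user reliability estimate) weights each user's signal by that reliability, so amplification by shills and credulous accounts counts for little:

```python
def weighted_item_score(signals, reliability, default_reliability=0.5):
    """signals: list of (user_id, signal) pairs for one item, where signal
    runs roughly from -1.0 (looks fake) to +1.0 (looks sound).
    reliability: dict user_id -> learned reliability in [0, 1].
    Returns a reliability-weighted average score for the item."""
    num = 0.0
    den = 0.0
    for user_id, signal in signals:
        w = reliability.get(user_id, default_reliability)
        num += w * signal
        den += w
    return num / den if den else 0.0

reliability = {"careful_carl": 0.9, "credulous_cathy": 0.2, "bot_4711": 0.05}
signals = [("careful_carl", -1.0),     # flags the item as dubious
           ("credulous_cathy", +1.0),  # shares it enthusiastically
           ("bot_4711", +1.0)]         # amplifies it
print(weighted_item_score(signals, reliability))  # negative: demote, don't delete
```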

To expand just a bit:
  • Facebook can use outside fact checkers, and can build AI to automatically signal items that seem questionable as one part of its defense.
  • But even without any information at all about the content and meaning of an item, it can make realtime inferences about its quality based on how users react to it.
  • If most of the amplification is from users known to be malicious, biased, or unreliable, it can downrank items accordingly.
  • It can test that downranking by monitoring further activity.
  • It might even enlist "testers" by promoting a questionable item to users known to be reliable, open, and critical thinkers -- and may even let some generally reliable users self-select as validators (being careful not to overload them).
  • By being open-ended in this way, such downranking is not censorship -- it is merely a self-regulating learning process that works at Internet scale, on Internet time.
That is how we can augment the wisdom of the crowd -- in real time, with increasing reliability as we learn. That is how we build a cognitive immune system (as my other posts explain further).
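To picture the testing step mentioned above, here is a hedged sketch of promoting a questionable item to a small panel of known-reliable users before committing to a downranking; the panel size, threshold, and verdict hook are all invented for illustration:

```python
import random

def test_questionable_item(item_id, validators, get_verdict, sample_size=20,
                           demote_threshold=0.35):
    """Show a questionable item to a small random sample of users who have
    earned a reputation as reliable, critical readers, and use their verdicts
    to confirm or reverse a tentative downranking.

    validators: list of user_ids with high learned reliability.
    get_verdict: callable(user_id, item_id) -> True if that user judges the
    item sound, False if they judge it fake or misleading (hypothetical hook).
    """
    panel = random.sample(validators, min(sample_size, len(validators)))
    votes = [get_verdict(user, item_id) for user in panel]
    approval = sum(votes) / len(votes) if votes else 0.0
    return "demote" if approval < demote_threshold else "restore"

# Usage sketch: a stubbed verdict function stands in for real user feedback.
validators = [f"trusted_user_{i}" for i in range(100)]
verdict = lambda user, item: random.random() < 0.25  # most of the panel rejects it
print(test_questionable_item("story-1", validators, verdict))
```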

This strategy is not new or unproven. It is the core of Google's wildly successful PageRank algorithm for finding useful search results. And (as I have noted before), it was recently reported that Facebook is now beginning to do a similar, but apparently still primitive, form of rating the trustworthiness of its users to try to identify fake news -- they track who spreads fake news and who reports abuse truthfully or deceitfully.*

What I propose is that we take this much farther, and move rapidly to make it central to our filtering strategies for social media -- and more broadly. An all out effort to do that quickly may be our last, best hope for enlightened democracy.

---
Please see my other posts for more.
----
(*More background from Facebook on their current efforts was cited in the Times article: Hard Questions: What is Facebook Doing to Protect Election Security?)

[Update 10/12:] A subsequent Times article by Sheera Frenkel adds perspective on the scope and pace of the problem -- and the difficulty in definitively identifying items as fakes that can rightly be censored "because of the blurry lines between free speech and disinformation" -- but such questionable items can be down-ranked.

Monday, October 08, 2018

A Cognitive Immune System for Social Media -- Developing Systemic Resistance to Fake News

To counter the spread of fake news, it's more important to manage and filter its spread than to try to interdict its creation -- or to try to inoculate people against its influence. 

A recent NY Times article on its inside look at Facebook's election "war room" highlights the problem, quoting cybersecurity expert Priscilla Moriuchi:
If you look at the way that foreign influence operations have changed these last two years, their focus isn’t really on propagating fake news anymore. It’s on augmenting stories already out there which speak to hyperpartisan audiences.
That is why much of the growing effort to respond to the newly recognized crisis of fake news, Russian disinformation, and other forms of disruption in our social media fails to address the core of the problem. We cannot solve the problem by trying to close our systems off from fake news, nor can we expect to radically change people's natural tendency toward cognitive bias. The core problem is that our social media platforms lack an effective "cognitive immune system" that can resist our own tendency to spread the "cognitive pathogens" that are endemic in our social information environment.

Consider how living organisms have evolved to contain infections. We did that not by developing impermeable skins that could be counted on to keep all infections out, nor by making all of our cells so invulnerable that they can resist whatever infectious agents may unpredictably appear.

We have powerfully complemented what we can do in those ways by developing a richly nuanced internal immune system that is deeply embedded throughout our tissues. That immune system uses emergent processes at a system-wide level -- to first learn to identify dangerous agents of disease, and then to learn how to resist their replication and virulence as they try to spread through our system.

The problem is that our social media lack an effective "cognitive immune system" of this kind. 

In fact many of our social media platforms are designed by the businesses that operate them to maximize engagement so they can sell ads. In doing so, they have learned that spreading incendiary disinformation that makes people angry and upset, polarizing them into warring factions, increases their engagement. As a result, these platforms actually learn to spread disease rather than to build immunity. They learn to exploit the fact that people have cognitive biases that make them want to be cocooned in comfortable filter bubbles and feel-good echo-chambers, and to ignore and refute anything that might challenge beliefs that are wrong but comfortable. They work against our human values, not for them.

What are we doing about it? Are we addressing this deep issue of immunity, or are we just putting on band-aids and hoping we can teach people to be smarter? (As a related issue, are we addressing the underlying issue of business model incentives?) Current efforts seem to be focused on measures at the end-points of our social media systems:
  • Stopping disinformation at the source. We certainly should apply band-aids to prevent bad-actors from injecting our media with news, posts, and other items that are intentionally false and dishonest. Of course we should seek to block such items and those who inject them. Band-aids are useful when we find an open wound that germs are gaining entry through. But band-aids are still just band-aids.
  • Making it easier for individuals to recognize when items they receive may be harmful because they are not what they seem. We certainly should provide "immune markers" in the form of consumer-reports-like ratings of items and of the publishers or people who produce them (as many are seeking to do). Making such markers visible to users can help prime them to be more skeptical, and perhaps apply more critical thinking -- much like applying an antiseptic. But that depends on the willingness of users to pay attention to such markers and apply the antiseptic. There is good reason to doubt that will have more than modest effectiveness, given people's natural laziness and instinct for thinking fast rather than slow. (Many social media users "like" items based only on click-bait headlines that are often inflammatory and misleading, without even reading the item -- and that is often enough to cause those items to spread massively.)
These end-point measures are helpful and should be aggressively pursued, but we need to urgently pursue a more systemic strategy of defense. We need to address the problem of dissemination and amplification itself. We need to be much smarter about what gets spread -- from whom, to whom, and why.

Doing that means getting deep into the guts of how our media are filtered and disseminated, step by step, through the "viral" amplification layers of the media systems that connect us. That means integrating a cognitive immune system into the core of our social media platforms. Getting the platform owners to buy in to that will be challenging, but it is the only effective remedy.

Building a cognitive immune system -- the biological parallel

This perspective comes out of work I have been doing for decades, and have written about on this blog (and in a patent filing since released into the public domain). That work centers on ideas for augmenting human intelligence with computer support. More specifically, it centers on augmenting the wisdom of crowds. It is based on the idea that our wisdom is not the simple result of a majority vote -- but results from an emergent process that applies smart filters that rate the raters and weight the ratings. That provides a way to learn which votes should be more equal than others (in a way that is democratic and egalitarian, but also merit-based). This approach is explained in the posts listed below. It extends an approach that has been developing for centuries.

Supportive of those perspectives, I recently turned to some work on biological immunity that uses the term "cognitive immune system." That work highlights the rich informational aspects of actual immune systems, as a model for understanding how these systems work at a systems level. As noted in one paper (see longer extract below*), biological immune systems are "cognitive, adaptive, fault-tolerant, and fuzzy conceptually." I have only begun to think about the parallels here, but it is apparent that the system architecture I have proposed in my other posts is at least broadly parallel, being also "cognitive, adaptive, fault-tolerant, and fuzzy conceptually." (Of course being "fuzzy conceptually" makes it not the easiest thing to explain and build, but when that is the inherent nature of the problem, it may also necessarily be the essential nature of the solution -- just as it is for biological immune systems.)

An important aspect of this being "fuzzy conceptually" is what I call The Tao of Truth. We can't definitively declare good-faith "speech" as "fake" or "false" in the abstract. Validity is "fuzzy" because it depends on context and interpretation. ("Fuzzy logic" recognizes that in the real world, facts are often not entirely true or false but, rather, have degrees of truth.) That is why only the clearest cases of disinformation can be safely cut off at the source. But we can develop a robust system for ranking the probable (fuzzy) value and truthfulness of speech, revising those rankings, and using that to decide how to share it with whom. For practical purposes, truth is a filtering process, and we can get much smarter about how we apply our collective intelligence to do our filtering. It seems the concepts of "danger" and "self/not-self" in our immune systems have a similarly fuzzy Tao -- many denizens of our microbiome that are not "self" are beneficial to us, and our immune systems have learned that we live better with them inside of us.
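For readers who want something more tangible, here is one toy way to express that fuzziness (an illustration only, not the proposed system itself): an item carries a graded truth score that is continually revised by reputation-weighted assessments, rather than receiving a one-time true/false verdict:

```python
def revise_truth_score(prior, assessments):
    """prior: current degree-of-truth estimate in [0, 1].
    assessments: list of (score, weight) pairs, where score is a rater's
    graded judgment in [0, 1] and weight reflects that rater's earned
    reputation in this context. Returns a revised fuzzy score, never a
    final verdict."""
    num = prior * 1.0  # the prior counts as one unit of weight
    den = 1.0
    for score, weight in assessments:
        num += score * weight
        den += weight
    return num / den

# An item starts uncertain (0.5) and drifts down as weightier raters doubt it.
print(revise_truth_score(0.5, [(0.2, 0.9), (0.8, 0.1), (0.3, 0.7)]))
```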

My proposals

Details of the architecture I have proposed for a cognitive immune system -- and the need for it -- are here:
  • The Tao of Fake News – the essential need for fuzziness in our logic: the inherent limits of experts, moderators, and rating agencies – and the need for augmenting the wisdom of the crowd (as essential to maintaining the intellectual openness of our democratic/enlightenment values).
(These works did not explicitly address the parallels with biological cognitive immune systems -- exploring those parallels might well lead to improvements on these strategies.)

To those without a background in the technology of modern information platforms, this brief outline may seem abstract and unclear. But as noted in these more detailed posts, these methods are a generalization of methods used by Google (in its PageRank algorithm) to do highly context-relevant filtering of search results using a similar rate the raters and weight the ratings strategy. (That is also "cognitive, adaptive, fault-tolerant, and fuzzy conceptually.") These methods are not simple, but they are a small stretch from the current computational methods of search engines, or from the ad targeting methods already well-developed by Facebook and others. They can be readily applied -- if the platforms can be motivated to do so.

Broader issues of support for our cognitive immune system

The issue of motivation to do this is crucial. For the kind of cognitive immune system I propose to be effective, it must be built deeply into the guts of our social media platforms (whether directly, or via APIs). As noted above, getting incumbent platforms to shift their business models to align their internal incentives with that need will be challenging. But I suggest it need not be as difficult as it might seem.

A related non-technical issue that many have noted is the need for education of citizens 1) in critical thinking, and 2) in the civics of our democracy. Both seem to have been badly neglected in recent decades. Aggressively remedying that is important, to help inoculate users against disinformation and sloppy thinking -- but it will have limited effectiveness unless we alter the overwhelmingly fast dynamics of our information flows (with the cognitive immune system suggested here) -- to help make us smarter, not dumber, in the face of this deluge of information.

---
[Update 10/12:] A subsequent Times article by Sheera Frenkel, adds perspective on the scope and pace of the problem -- and the difficulty in definitively identifying items as fakes that can rightly be censored "because of the blurry lines between free speech and disinformation" -- but such questionable items can be down-ranked.
-----
*Background on our Immune Systems -- from the introduction to the paper mentioned above, "A Cognitive Computational Model Inspired by the Immune System Response" (emphasis added):
The immune system (IS) is by nature a highly distributed, adaptive, and self-organized system that maintains a memory of past encounters and has the ability to continuously learn about new encounters; the immune system as a whole is being interpreted as an intelligent agent. The immune system, along with the central nervous system, represents the most complex biological system in nature [1]. This paper is an attempt to investigate and analyze the immune system response (ISR) in an effort to build a framework inspired by ISR. This framework maintains the same features as the IS itself; it is cognitive, adaptive, fault-tolerant, and fuzzy conceptually. The paper sets three phases for ISR operating sequentially, namely, “recognition,” “decision making,” and “execution,” in addition to another phase operating in parallel which is “maturation.” This paper approaches these phases in detail as a component based architecture model. Then, we will introduce a proposal for a new hybrid and cognitive architecture inspired by ISR. The framework could be used in interdisciplinary systems as manifested in the ISR simulation. Then we will be moving to a high level architecture for the complex adaptive system. IS, as a first class adaptive system, operates on the body context (antigens, body cells, and immune cells). ISR matured over time and enriched its own knowledge base, while neither the context nor the knowledge base is constant, so the response will not be exactly the same even when the immune system encounters the same antigen. A wide range of disciplines is to be discussed in the paper, including artificial intelligence, computational immunology, artificial immune system, and distributed complex adaptive systems. Immunology is one of the fields in biology where the roles of computational and mathematical modeling and analysis were recognized...
The paper supposes that immune system is a cognitive system; IS has beliefs, knowledge, and view about concrete things in our bodies [created out of an ongoing emergent process], which gives IS the ability to abstract, filter, and classify the information to take the proper decisions.

Monday, August 27, 2018

The Tao of Fake News

We are smarter than this!

Everyone with any sense sees "fake news" disinformation campaigns as an existential threat to "truth, justice, and the American Way," but we keep looking for a Superman to sort out what is true and what is fake. A moment's reflection shows that, no Virginia, there is no SuperArbiter of truth. No matter who you choose to check or rate content, there will always be more or less legitimate claims of improper bias.
  • We can't rely on "experts" or "moderators" or any kind of "Consumer Reports" of news. We certainly can't rely on the Likes of the Crowd, a simplistic form of the Wisdom of the Crowd that is too prone to "The Madness of Crowds." 
  • But we can Augment the Wisdom of the Crowd.
  • We can't definitively declare good-faith "speech" as "fake" or "false." 
  • But we can develop a robust system for ranking the probable value and truthfulness of speech, revising those rankings, and using that to decide how to share it with whom.
For practical purposes, truth is a filtering process, and we can get much smarter about how we apply our collective intelligence to do our filtering.

The Tao of Fake News, Truth, and Meaning

Truth is a process. Truth is complex. Truth depends on interpretation and context. Meaning depends on who is saying something, to whom, and why (as Humpty-Dumpty observed). The truth in Rashomon is different for each of the characters. Truth is often very hard for individuals (even "experts") to parse.

Truth is a process, because there is no practical way to ensure that people speak the truth, nor any easy way to determine if they have spoken the truth. Many look to the idea of flagging fake news sources, but who judges, on what basis and what aspects? (A recent NiemanLab assessment of NewsGuard's attempt to do this shows how open to dispute even well-funded, highly professional efforts to do that are.)

Truth is a filtering process: How do we filter true speech from false speech? Over centuries we have come to rely on juries and similar kinds of panels, working in a structured process to draw out and "augment" the collective wisdom of a small crowd. In the sciences, we have a more richly structured process for augmenting the collective wisdom of a large crowd of scientists (and their experiments), informally weighing the authority of each member of the crowd -- and avoiding over-reliance on a few "experts." Our truths are not black and white, absolute, and eternal -- they are contingent, nuanced, and tentative -- but this Tao of truth has served us well.

It is now urgent that our methods for augmenting and filtering our collective wisdom be enhanced. We need to apply computer-mediated collaboration to apply a similar augmented wisdom of the crowd at Internet scale and speed. We can make quick initial assessments, then adapt, grow, and refine our assessments of what is true, in what way, and with regard to what.

Filtering truth -- networks, context, and community

If our goal is to exclude all false and harmful material, we will fail. The nature of truth, and of human values, is too complex. We can exclude the most obviously pernicious frauds -- but for good-faith speech from humans in a free society, we must rely on a more nuanced kind of wisdom.

Our media filter what we see. Now the filters in our dominant social media are controlled by a few corporations motivated to maximize ad revenue by maximizing engagement. They work to serve the advertisers that are their customers, not us, the users (who are now really their product). We need to get them to change how the filters operate, to maximize value to their users.

We need filters to be tuned to the real value of speech as communication from one person to other people.  Most people want the "firehose" of items on the Internet to be filtered in some way, but just how may vary. Our filters need to be responsive to the desires of the recipients. Partisans may like the comfort of their distorting filter bubbles, but most people will want at least some level of value, quality, and reality, at least some of the time. We can reinforce that by doing well at it.

There is also the fact that people live in communities. Standards for what is truthful and valuable vary from community to community -- and communities and people change over time. This is clearer than ever, now that our social networks are global.

Freedom of speech requires that objectionable speech be speak-able, with very narrow exceptions. The issue is who hears that speech, and what control they have over what they hear. A related issue is when third parties have a right to influence those listener choices, and how to keep that intrusive hand as light as possible. Some may think we should never see a swastika or a heresy, but who has the right to draw such lines for everyone in every context?

We cannot shut off objectionable speech, but we can get smarter about managing how it spreads. 

To see this more clearly, consider our human social network as a system of collective intelligence, one that informs an operational definition of truth. Whether at the level of a single social network like Facebook, or all of our information networks, we have three kinds of elements:
  • Sources of information items (publishers, ordinary people, organizations, and even bots) 
  • Recipients of information items  
  • Distribution systems that connect the sources and recipients using filters and presentation services that determine what we see and how we see it (including optional indicators of likely truthfulness, bias, and quality).
Controlling truth at the source may, at first, seem the simple solution, but requires a level of control of speech that is inconsistent with a free society. Letting misinformation and harmful content enter our networks may seem unacceptable, but (with narrow exceptions) censorship is just not a good solution.

Some question whether it is enough to "downrank" items in our feeds (not deleting them, but making them less likely to be presented to us), but what better option do we have than to do that wisely? The best we can reasonably do is manage the spread of low quality and harmful information in a way that is respectful of the rights of both sources and recipients, to limit harm and maximize value.*

How can we do that, and who should control it? We, the people, should control it ourselves (with some limited oversight and support).  Here is how.

Getting smarter -- The Augmented Wisdom of Crowds

Neither automation nor human intelligence alone is up to the scale and dynamics of the problem.  We need a computer-augmented approach to managing the wisdom of the crowd -- as embodied in our filters, and controlled by us. That will pull in all of the human intelligence we can access, and apply algorithms and machine learning (with human oversight) to refine and apply it. The good news is that we have the technology to do that. It is just a matter of the will to develop and apply it.

My previous post outlines a practical strategy for doing that -- "The Augmented Wisdom of Crowds: Rate the Raters and Weight the Ratings." Google has already shown how powerful a parallel form of this strategy can be to filter which search results should be presented to whom -- on Internet scale. My proposal is to broaden these methods to filter what our social media present to us.

The method is one of considering all available "signals" in the network and learning how to use them to inform our filtering process. The core of the information filtering process -- that can be used for all kinds of media, including our social media -- is to use all the data signals that our media systems have about our activity. We can consider activity patterns across these three dimensions:
  • Information items (content of any kind, including news items, personal updates, comments/replies, likes, and shares/retweets).
  • Participants (and communities and sub-communities of participants), who can serve as both sources and recipients of items (and of items about other items)
  • Subject and task domains (and sub-domains) that give important context to information items and participants.
We can apply this data with the understanding that any item or participant can be rated, and any item can contain one or more ratings (implicit or explicit) of other items and/or participants. The trick is to tease out and make sense of all of these interrelated ratings and relationships. To be smart about that, we must recognize that not all ratings are equal, so we "rate the raters, and weight the ratings" (using any data that signals a rating). We take that to multiple levels -- my reputational authority depends not only on the reputational authority of those who rate me, but on those who rate them (and so on).
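A minimal sketch of that multi-level idea, with made-up data and a fixed number of iterations standing in for the real adaptive process: each participant's reputation is recomputed from the reputations of those who rate them, so "rate the raters" happens n levels deep:

```python
def recompute_reputations(rater_ratings, rounds=10, default=0.5):
    """rater_ratings: dict rated_user -> list of (rating_user, score), where
    score in [0, 1] reflects how much the rating user trusts the rated user.
    Iteratively re-weights each rating by the current reputation of the user
    who gave it -- 'rate the raters, and weight the ratings'."""
    users = set(rater_ratings)
    for ratings in rater_ratings.values():
        users.update(u for u, _ in ratings)
    rep = {u: default for u in users}
    for _ in range(rounds):
        new_rep = {}
        for user in users:
            ratings = rater_ratings.get(user, [])
            num = sum(rep[r] * score for r, score in ratings)
            den = sum(rep[r] for r, _ in ratings)
            new_rep[user] = num / den if den else default
        rep = new_rep
    return rep

# Reputable participants rate each other well; a shill is rated low by them,
# so the shill's future ratings carry little weight.
ratings = {
    "alice": [("bob", 0.9), ("carol", 0.8)],
    "bob":   [("alice", 0.9), ("carol", 0.8)],
    "carol": [("alice", 0.7), ("bob", 0.6)],
    "shill": [("alice", 0.1), ("bob", 0.2)],
}
print(recompute_reputations(ratings))
```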

This may seem very complicated (and at scale, it is), but Google proved the power of such algorithms to determine which search results are relevant to a user's query (at mind-boggling scale and speed). Their PageRank algorithm considers what pages link to a given page to assess the imputed reputational authority of that page -- with weightings based on the imputed authority of the pages that link to it (again to multiple levels). Facebook uses similarly sophisticated algorithms to determine what ads should be targeted to whom -- tracking and matching user interests, similarities, and communities and matching that with information on their response to similar ads.

In some encouraging news, it was recently reported that Facebook is now also doing a very primitive form of rating the trustworthiness of its users to try to identify fake news -- they track who spreads fake news and who reports abuse truthfully or deceitfully. What I propose is that we take this much farther, and make it central to our filtering strategies for social media and more broadly.

With this strategy, we can improve our media filters to better meet our needs, as follows:
  • Track explicit and implicit signals to determine authority and truthfulness -- both of the speakers (participants) and of the things they say (items) -- drawing on the wisdom of those who hear and repeat it (or otherwise signal how they value it).
  • Do similar tracking to understand the desires and critical thinking skills of each of the recipients
  • Rate the raters (all of us!) -- and weight the votes to favor those with better ratings. Do that n-levels deep (much as Google does).
  • Let the users signal what levels and types of filtering they want. Provide defaults and options to accommodate users desiring different balances of ease or of fine control and reporting. Let users change that as they desire, depending on their wish to relax, to do focused critical thinking, or to open up to serendipity.
  • Provide transparency and auditability -- to each user (and to independent auditors) -- as to what is filtered for them and how.**
  • Open the filtering mechanisms to independent providers, to spur innovation in a competitive marketplace in filtering algorithms for users to choose from. (A sketch of such pluggable filters follows this list.)
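As promised above, here is a sketch of what open, user-selectable filtering might look like; the filter names, scoring formulas, and item fields are all hypothetical, and a real system would draw on the reputation machinery described earlier:

```python
from typing import Callable, Dict, List

# A "filter" here is just a ranking function a user can select or swap out.
RankFn = Callable[[dict, dict], float]  # (item, user_profile) -> score

def quality_first(item: dict, user: dict) -> float:
    # Favors items with high learned truth scores, then personal relevance.
    return item["truth_score"] * 0.7 + item["relevance"].get(user["id"], 0.0) * 0.3

def serendipity_mode(item: dict, user: dict) -> float:
    # Deliberately favors well-rated items from outside the user's usual communities.
    outside = 0.0 if item["community"] in user["communities"] else 0.5
    return item["truth_score"] * 0.5 + outside

FILTERS: Dict[str, RankFn] = {"quality": quality_first, "serendipity": serendipity_mode}

def build_feed(items: List[dict], user: dict, mode: str, top_n: int = 10) -> List[dict]:
    rank = FILTERS[mode]  # the user, not the platform, picks the mode
    return sorted(items, key=lambda it: rank(it, user), reverse=True)[:top_n]

user = {"id": "dana", "communities": {"local-news"}}
items = [
    {"id": "a", "truth_score": 0.9, "relevance": {"dana": 0.8}, "community": "local-news"},
    {"id": "b", "truth_score": 0.4, "relevance": {"dana": 0.9}, "community": "local-news"},
    {"id": "c", "truth_score": 0.8, "relevance": {"dana": 0.2}, "community": "science"},
]
print([it["id"] for it in build_feed(items, user, "serendipity")])
```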
That is the best broad solution that we can apply. As we get good at it we will be amazed at how effective it can be. But given the catastrophic folly of where we have let this get to...

First, do no harm!

Most urgently, we need to change the incentives of our filters to do good, not harm. At present, our filters are pouring gasoline on the fires (even as their corporate owners claim to be trying to put them out). As explained in a recent HBR article, "current digital advertising business models incentivize the spread of false news." That article explains the insidious problem of the ad model for paying for services (others have called it "the original sin of the Web") and offers some sensible remedies.  

I have proposed more innovative approaches to better-aligning business models -- and to using a light-handed, market-driven, regulatory approach to mandate doing that -- in "An Open Letter to Influencers Concerned About Facebook and Other Platforms."

We have learned that the Internet has all the messiness of humanity and its truths. We are facing a Pearl Harbor of a thousand pin-pricks that is rapidly escalating. We must mobilize onto a war footing now, to halt that before it is too late.
  • First we need to understand the nature and urgency of this threat to democracy, 
  • Then we must move on both short and longer time horizons to slow and then reverse the threat. 
The Tao of fake news contains its opposite, the Tao of Augmented Wisdom. If we seek that, the result will be not only to manage fake news, but to be smarter in our collective wisdom than we can now imagine.

---
*Of course some information items will be clearly malicious, coming from fraudulent human accounts or bots -- and shutting some of that off at the source is feasible and desirable. But much of the spread of "fake news" (malicious or not) is from real people acting in good faith, in accord with their understanding and beliefs. We cannot escape that non-binary nature of human reality, and must come to terms with our world in nuanced shades of gray. But we can get very sophisticated at distinguishing when news is spread by participants who are usually reliable from when it is spread by those who have established a reputation for being credulous, biased, or malicious.

**The usual concern with transparency is that if the algorithms are known, then bad-actors will game them. That is a valid concern, and some have suggested that even if the how of the filtering algorithm is secret, we should be able to see and audit the why for a given result.  But to the extent that there is an open market in filtering methods (and in countermeasures to disinformation), and our filters vary from user to user and time to time, there will be so much variability in the algorithms that it will be hard to game them effectively.

---
[Update 8/30:]  Giuliani and The Tao of Truth 

To indulge in some timely musing, the Tao of Truth gives a perspective on the widely noted recent public statement that "truth isn't truth." At the level of the Tao, we can say that "truth is/isn't truth," or more precisely, "truth is/isn't Truth" (with one capital T). That is the level at which we understand truth to be a process in which the question "what is truth?" depends on what we mean, at what level, in what context, with what assurance -- and how far we are in that process. We as a society have developed a broadly shared expectation of how that process should work. But as the process does its never-ending work, there are no absolutes -- only more or less strong evidence, reasoning, and consensus about what we believe the relevant truth to be. (That, of course, is an Enlightenment social perspective, and some disagree with this very process, and instead favor a more absolute and authoritarian variation.) Perhaps most fundamentally, we are now in a reactionary time in which our prevailing process for truth is being prominently questioned. The hope here is that continuing development of a free, open, and wise process prevails over a return to a closed, authoritarian one -- and over the loss of any consensus at all.

[Update 10/12:] A Times article by Sheera Frenkel adds perspective on the scope and pace of the problem -- and the difficulty in definitively identifying items as fakes that can be censored "because of the blurry lines between free speech and disinformation" -- but such questionable items can be down-ranked.

Sunday, July 22, 2018

The Augmented Wisdom of Crowds: Rate the Raters and Weight the Ratings

How technology can make us all smarter, not dumber

We thought social media and computer-mediated communications technologies would make us smarter, but recent experience with Facebook, Twitter, and others suggests they are now making us much dumber. We face a major and fundamental crisis. Civilization seems to be descending into a battle of increasingly polarized factions who cannot understand or accept one another, fueled by filter bubbles and echo chambers.

Many have begun to focus serious attention on this problem, but it seems we are fighting the last war -- not using tools that match the task.

A recent conference, "Fake News Horror Show," convened people focused on these issues from government, academia, and industry. One of the questions raised was who decides what is "fake news," how, and on what basis. There are many efforts at fact checking, and at certification or rating of reputable vs. disreputable sources -- but also recognition that such efforts can be crippled by circularity: who is credible enough in the eyes of diverse communities of interest to escape the charge of "fake news" themselves?

I raised two points at that conference. This post expands on the first point and shows how it provides a basis for addressing the second:
  • The core issue is one of trust and authority -- it is hard to get consistent agreement in any broad population on who should be trusted or taken as an authority, no matter what their established credentials or reputation. Who decides what is fake news? What I suggested is that this is the same problem that has been made manageable by getting smarter about the wisdom of crowds -- much as Google's PageRank algorithm beat out Yahoo and AltaVista at making search engines effective at finding content that is relevant and useful.

    As explained further below, the essence of the method is to "rate the raters" -- and to weight those ratings accordingly. Working at Web scale, no rater's authority can be relied on without drawing on the judgement of the crowd. Furthermore, simple equal voting does not fully reflect the wisdom of the crowd -- there is deeper wisdom about those votes to be drawn from the crowd.

    Some of the crowd are more equal than others. Who is more equal, and whose vote should be weighted more heavily, can be determined by how people rate the raters -- and how those raters are rated -- and so on. Those ratings are not universal, but depend on the context: the domain and the community -- and the current intent or task of the user. Each of us wants to see what is most relevant, useful, appealing, or eye-opening -- for us -- and perhaps with different balances at different times. Computer intelligence can distill those recursive, context-dependent ratings, to augment human wisdom.
  • A major complicating issue is that of biased assimilation. The perverse truth seems to be that "balanced information may actually inflame extreme views." This is all too clear in the mirror worlds of pro-Trump and anti-Trump factions and their media favorites like Fox, CNN, and MSNBC. Each side thinks the other is unhinged or even evil, and layers a vicious cycle of distrust around anything they say. It seems one of the few promising counters to this vicious cycle is what Cass Sunstein referred to as surprising validators: people one usually gives credence to, but who suggest one's view on a particular issue might be wrong. A recent example of a surprising validator was the "Confession of an Anti-GMO Activist." This item is  readily identifiable as a "turncoat" opinion that might be influential for many, but smart algorithms can find similar items that are more subtle, and tied to less prominent people who may be known and respected by a particular user. There is an opportunity for electronic media services to exploit this insight that "what matters most may be not what is said, but who, exactly, is saying it."
These are themes I have been thinking and writing about on and off for decades. This growing crisis, as highlighted by the Fake News Horror Show conference, spurred me to write this outline for a broad architecture (and specific methods) for addressing these issues. Discussions at that event led to my invitation to an upcoming workshop hosted by the Global Engagement Center (a US State Department unit) focused on "technologies for use against foreign propaganda, disinformation, and radicalization to violence." This post is offered to contribute to those efforts.

Beyond that urgent focus, this architecture has relevance to the broader improvement of social media and other collaborative systems. Some key themes:
  • Binary, black or white thinking is easy and natural, but humans are capable of dealing with the fact that reality is nuanced in many shades of gray, in many dimensions. Our electronic media can augment that capability.
  • Instead, our most widely used social media now foster simplistic, binary thinking.
  • Simple strategies (analogous to those proven and continually refined in Google's search engine) enable our social media systems to recognize more of the underlying nuance, and bring it to our attention in far more effective ways.
  • We can apply an architecture that draws on some core structures and methods to enable intelligent systems to better augment human intelligence, and to do that in ways tuned to the needs of a diversity of people -- from different schools of thought and with different levels of intelligence, education, and attention.
  • Doing this can not only better expose truly fake news for what it is, but can make us smarter and more aware and reflective of nuance. 
  • This can not only guide our attention toward quality, but can also enable us to be more favored by surprising validators and other forms of serendipity needed to escape our filter bubbles.
Where I am coming from

I was first exposed to early forms of augmented intelligence and hypermedia in 1969 (notably Nelson and Engelbart), and to collaborative systems in 1971 (notably Turoff). That set a broad theme for my work. After varied roles in IT and media technology, I became an inventor, and one of my patent applications outlined a collaborative system for social development of inventions and other ideas (in 2002). While my specific business objective proved elusive (as the world of patents changed), what I described was a general architecture for collaborative development of ideas that has very wide applicability ("ideas" include news stories, social media posts, and "likes"). That is obviously more timely now than ever. I had written on this blog about some specific aspects of those ideas in 2012: "Filtering for Serendipity -- Extremism, 'Filter Bubbles' and 'Surprising Validators.'" To encourage use of those ideas, I released that patent filing into the public domain in 2016.

Here, I take a first shot at a broad description of these strategies that is intended to be more readable and relevant to our current crisis than the legalese of the patent application. As a supplement to this, a copy of that patent document with highlighting of the portions that remain most relevant is posted online.*

Of course some of these ideas are more readily applied than others. But the goal of an architecture is to provide a vision and a framework to build on. Considering the broad scope of what might be done over time is the best way to be sure that we do the best we can do at any point in time. We can then adjust and improve on that to build toward still-better solutions.

Augmenting the wisdom of crowds

Civilization has risen because of our human skills: to cooperate, to learn from one another, and to coalesce on wisdom and resist folly -- difficult as it may often be to distinguish which is which.

Life is complex, and things are rarely black or white. The Tao symbolizes the realization that everything contains its opposite -- Ted Nelson put it that "everything is deeply intertwingled," and conceived of hypertext as a way to reflect that. But throughout human history this nuanced intertwingling has remained challenging for people to grasp.

Behavioral psychology has elucidated the mechanisms behind our difficulty. We are capable of deep and subtle rational thought (Kahneman's System 2, "thinking slow"), but we are pragmatic and lazy, and prefer the more instinctive, quick, and easy path (System 1, "thinking fast"), a mode that offers great survival value when faced with urgent decisions. Only reluctantly do we think more deeply. The thinking fast of System 1 favors biased assimilation, with its reliance on "cognitive ease," quick reactions, and emotional and tribal appeal, rather than rationality.

Augmenting human intellect

For over half a century, a seminal dream of computer technology has been "augmenting human intellect" based on "man-computer symbiosis." The developers of our augmentation tools and our social media believed in their power to enhance community and wisdom -- but we failed to realize how easily our systems can reduce us to the lowest common denominator if we do not apply consistent and coherent measures to better augment the intelligence they automated. A number of early collaborative Web services recognized that some contributors should be more equal than others (for example, Slashdot, with its "karma" reputation system). Simple reputation systems have also proven important for eBay and other market services. However, the social media that came to dominate broader society failed to realize how important that is, and were motivated to "move fast and break things" in a rush to scale and profit.

Now, we are trying to clean up the broken mess of this Frankenberg's monster, to find ways to flag "fake news" in its various harmful forms. But we still seem not to be applying the seminal work in this field. That failure has made our use of the wisdom of crowds stupid to the point of catastrophe. Instead of augmenting our intellect as Engelbart proposed, we are de-augmenting it. People see what is popular, read a headline without reading the full story, jump to conclusions and "like" it, making it more popular, so more people see it. The headlines increasingly become clickbait that distorts the real story. Influence shifts from ideas to memes. This is clearly a vicious cycle -- one that the social media services have little economic incentive to change -- polarization increases engagement, which sells more ads. We urgently need fundamental changes to these systems.

Crowdsourced, domain-specific authorities -- rating the raters -- much like Google

Raw forms of the wisdom of crowds look to "votes" from the crowd, weight them equally, and select the most popular or "liked" items (or a simple average of all votes). This has been done for forecasting, for citation analysis of academic papers, and in early computer searching. But it becomes apparent that this can lead to the lowest common denominator of wisdom, and is easily manipulated with fraudulent votes. Of course we can restrict this to curated "expert" opinion, but then we lose the wisdom of the larger crowd (including its ability to rapidly sense early signs of change).

It was learned that better results can be obtained by weighting votes based on authority, as done in Google's PageRank algorithm, so that votes with higher authority count more heavily (while still using the full crowd to balance the effects of supposed authorities who might be wrong). In academic papers, it was realized that it matters which journal cites an article (now that many low-quality pay-to-publish journals have proliferated).

In Google's search algorithm (dating from 1996, and continuously refined), it was realized that links from a widely-linked-to Web site should be weighted higher in authority than links from another that has few links in to it. The algorithm became recursive: PageRank (used to rank the top search results) depends on how many direct links come in, weighted by a second level factor of how many sites link in to those sites, and weighted in turn by a third level factor of how many of those have many inward links, and so on. Related refinements partitioned these rankings by subject domain, so that authority might be high in one domain, but not in others. The details of how many levels of recursion and how the weighting is done are constantly tuned by Google, but this basic rate the raters strategy is the foundation for Google's continuing success, even as it is now enhanced with many other "signals" in a continually adaptive way. (These include scoring based on analysis of page content and format to weight sites that seem to be legitimate above those that seem to be spam or link farms.)
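For readers who want to see the recursion spelled out, here is a bare-bones rendering of the published PageRank idea (heavily simplified; Google's production ranking adds many other signals and refinements):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to.
    Each link is treated as a vote; a page's score is the weighted sum of the
    scores of the pages voting for it, applied recursively until it settles."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# "hub" votes for three pages; "a" and "b" also vote for each other, so they
# end up with the most imputed authority.
links = {
    "hub": ["a", "b", "c"],
    "a": ["b"],
    "b": ["a"],
    "c": ["a"],
}
print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))
```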

Proposed methods and architecture

My patent disclosure explains much the same rate the raters strategy (call it RateRank?) as applicable to ranking items of nearly any kind, in a richly nuanced, open, social context for augmenting the wisdom of crowds. (It is a strategy that can itself be adapted and refined by augmenting the wisdom of crowds -- another case of "eat your own dog food!")

The core architecture works in terms of three major dimensions that apply to a full range of information systems and services:
  1. Items. These can be any kind of information item, including contribution items (such as news stories, blog posts, or social media posts, or even books or videos, or collections of items), comment/analysis items (including social media comments on other items), and rating/feedback items (including likes and retweets, as well as comments that imply a rating of another item)
  2. Participants (and communities and sub-communities of participants). These are individuals, who may or may not have specific roles (including submitters, commenters, raters, and special roles such as experts, moderators, or administrators). In social media systems, these might include people (with verified IDs or anonymous), collections of people in the form of businesses, commercial advertisers, political advertisers, and other organizations. (Special rules and restrictions might apply to non-human participants, including bots and corporate or state actors.) Communities of participants might be explicit (with controlled membership), such as Facebook groups, and implicit (and fuzzy), based on closeness of social graph relationships and domain interests. These might include communities of interest, practice, geographic locality, or  degree of social graph closeness. 
  3. Domains (and sub-domains). These may be subject-matter domains in various dimensions. Domains may overlap or cross-cut. (For example issues about GMOs might involve cross-cutting scientific, business, governmental/regulatory, and political domains.)
An important aspect of generality in this architecture is that:
  • Any item or participant can be rated (explicitly or implicitly)
  • Any item can contain one or more ratings of other items or participants (and of itself)
It should be understood that Google's algorithm is a specialized instance of such an architecture -- one where all the items are Web pages, and all links between Web pages are implicit ratings of the link destination by the link source. The key element of man-computer symbiosis here is that the decision to place a link is assumed to be a "rating" decision of a human Webmaster or author (a vote for the destination, by the source, from the source context), but the analysis and weighting of those links (votes) is algorithmic. Much as could be applied to fake news, Google has developed finely tuned algorithms for detecting the multitudes of "link farms" that use bots that seek to fraudulently mimic this human intelligence, and downgrades the weighting of such links.

How the augmenting works

The heart of the method is a fully adaptive process that rates the raters recursively, using explicit and implicit ratings of items and raters (and potentially even of the algorithms of the system itself). Rate the raters, rate those who rate the raters, and so on. Weight the ratings according to the rater's reputation (in context), so the wisest members of the crowd, in the current context, as judged by the crowd, have the most say. "Wisest in context" means wisest in the domains and communities that are most relevant to the current usage context. But still, all of the crowd should be considered at some level.

This causes good items and raters (and algorithms) to bubble up into prominence, and less well-rated ones to sink from prominence. This process would rarely be binary black and white. Highly rated items or participants can lose that rating over time, and in other contexts. Poorly rated items or participants might never be removed (except for extreme abuse) but simply downgraded (to contribute what small weight is warranted, especially if many agree on a contrary view) and can remain accessible with digging, when desired. (As noted below, our social media systems have become essential utilities, and exclusion of people or ideas on the fringe is at odds with the value of free speech in our open society.) The rules and algorithms could be continuously learning and adaptive, using a hybrid of machine learning and human oversight. 

Attention management systems can ensure that the best items tend to be made most visible, and the worst least visible, but the system should adjust those rankings to the context of what is known about the user in general, and what is inferred about what the user is seeking at a given time -- with options for explicit overrides (much as Google adjusts its search rankings to the user and their current query patterns). It should be noted that Facebook and others already use some similar methods, but unfortunately these are oriented to maximizing an intensity of "engagement" that optimizes for the company's ad sale opportunities, rather than a quality of content and engagement for the user. We need the sophistication of algorithms, data science, and machine learning applied to quality for users, not just engagement for advertisers and those who would manipulate us.

Participants might be imputed high authority in one domain, or in one community, but lower in others. Movie stars might outrank Nobel prize-winners when considering a topic in the arts or even in social awareness, but not in economic theory. NRA members might outrank gun control advocates for members of an NRA community, but not for non-members of that community.
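A toy illustration of that context dependence (names and numbers invented for the example): authority is looked up per domain, with a modest default where nothing has been learned yet, so the same participant carries different weight in different contexts:

```python
# Hypothetical per-domain authority scores for a few participants.
AUTHORITY = {
    "movie_star": {"arts": 0.9, "economics": 0.2},
    "nobel_economist": {"arts": 0.3, "economics": 0.95},
}

def authority_in_context(participant, domain, default=0.4):
    """Return a participant's learned authority for the given domain,
    falling back to a modest default where nothing has been learned yet."""
    return AUTHORITY.get(participant, {}).get(domain, default)

for who in ("movie_star", "nobel_economist", "unknown_user"):
    print(who, authority_in_context(who, "economics"))
```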

Openness is a key enabling feature: these algorithms should not be monolithic, opaque, and controlled by any one system, but should be flexible, transparent, and adaptive -- and depend on user task/context/desires/skill at any given time. Some users may choose simple default processes and behaviors, but others could be enabled to mix and match alternative ranking and filtering processes, and to apply deeper levels of analytics to understand why the system is presenting a given view. Users should be able to modify the view they see as they may desire, either by changing parameters or swapping alternative algorithms. Such alternative algorithms could come from a single provider, or from alternative sources in an open marketspace, or be "roll your own."

Within this framework, key design factors include how these key processes are managed to work in concert, and to change how each of these behaves, for a given user, at given time, depending on task/context/desires/skill (including the level of effort a user wishes to put in):
  • The core rate the raters process, based on both implicit and explicit ratings, weighted by authority as assessed by other raters (as themselves weighted based on ratings by others), with selective levels of partitioning by community and domain. Consideration of formal and institutional authority can be applied to partially balance crowdsourced authority. Dynamic selection of weighting and balancing methods might depend on user task/context/desires
  • Attention tools that filter irrelevant items and highlight relevant ones. Thus different Facebook or Twitter users might be able to get different views of their feed, and change that as desired.
  • Consideration with regard to which communities and sub-communities most contribute to rankings for specific items at specific times.  Communities might have graded openness (in the form of selectively permeable boundaries) to avoid groupthink and cross-fertilize effectively. This could be applied by using insider/outsider thresholds to manage separation/openness.
  • Consideration with regard to domains and sub-domains to maximize the quality and relevance of ratings, authority, and attention, and to avoid groupthink and cross-fertilize effectively.
  • Consideration of explicit vs. implicit ratings. While explicit ratings may provide the strongest and most nuanced information, implicit ratings may be far more readily available, thus representing a larger crowd, and so may have the greatest value in augmenting the wisdom of the crowd. Just as with search and ad targeting, implicit ratings can include subtle factors, such as measures of attention, sentiment, emotion, and other behaviors.
  • Consideration of verified vs. unverified vs. anonymous participants. It may be desirable to allow a range of levels, using weightings where anonymous participants have no reputation or a negative one. Bots might be banned, or given very poor reputation.
  • Open creation, selection and use of alternative tools for filtering, discovery, attention/alerting, ranking, and analytics depending on user task/context/desires. This kind of openness can stimulate development and testing of creative alternatives and enable market-based selection of the best-suited tools.
  • Use of valuation, crowdfunding, recognition, publicity, and other non-monetary incentives can also be used to encourage productive and meaningful participation, to bring out the best of the crowd.
(As expanded on below, all of this should be done with transparency and user control.)
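
To make the core process more concrete, here is a minimal sketch (in Python) of authority-weighted rating aggregation, partitioned by domain. All names, structures, and numbers here (Rating, score_item, the sample weights) are illustrative assumptions for this post, not part of any existing platform.

```python
# Minimal sketch of authority-weighted rating aggregation (illustrative only).
from dataclasses import dataclass

@dataclass
class Rating:
    rater_id: str       # who rated
    value: float        # -1.0 (reject) .. +1.0 (endorse); implicit or explicit
    weight_hint: float  # e.g. 1.0 for explicit ratings, less for implicit signals

def score_item(ratings, authority, domain, default_authority=0.1):
    """Aggregate ratings for one item, weighting each rater by the
    authority they have earned in the relevant domain or community."""
    num, den = 0.0, 0.0
    for r in ratings:
        a = authority.get((r.rater_id, domain), default_authority)
        w = a * r.weight_hint
        num += w * r.value
        den += w
    return num / den if den else 0.0

# Example: two raters with earned authority outweigh a swarm of low-reputation accounts.
authority = {("alice", "economics"): 0.9, ("bob", "economics"): 0.7}
ratings = [Rating("alice", +1.0, 1.0), Rating("bob", +0.5, 1.0)] + \
          [Rating(f"bot{i}", -1.0, 0.3) for i in range(20)]
print(round(score_item(ratings, authority, "economics"), 2))  # ~0.3: still positive
```

Even this toy version shows the key property: a handful of raters with earned authority can outweigh a swarm of low-reputation accounts or bots, without anyone's item being deleted outright.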

[Update 10/10/18:] A subsequent post, "In the War on Fake News, All of Us are Soldiers, Already!," may help make this more concrete and clarify why it is badly needed.

Applying this to social media -- fake news, community standards, polarization, and serendipity

A core objective is to augment the wisdom of crowds -- to benefit from the crowd to filter out the irrelevant or poor quality -- but to have augmented intelligence in determining relevance and quality in a dynamically nuanced way that reduces the de-augmenting effect of echo chambers and filter bubbles.

Using these methods, true fake news, which is clearly dishonest and created by known bad actors, can be readily filtered out, with low risk of blocking good-faith contrarian perspectives from quality sources. Such fake news can readily be distinguished from legitimate partisan spin (point and counterpoint), from legitimate criticism (a news photo of a Nazi sign) or historically important news items (the Vietnam "terror of war" photo), and from legitimate humor or satire.

A dilemma that has become very apparent in our social media relates to "community standards" for managing people and items that are "objectionable." Since our social media systems have become essential utilities, exclusion of people or ideas on the fringe is at odds with the rights of free speech in our open society. Jessica Lessin recently commented on Facebook's "clumsy" struggles with content moderation, and on the calls of some to ban people and items. She observes that Facebook wants the community to determine the rules, but also is pressed to placate regulators -- and observes that "getting two billion people to write your rules isn’t very practical."

"Getting two billion people to write your rules" is just what the augmented wisdom of crowds does seek to make practical -- and more effective than any other strategy. The rules would rarely ban people (real humans) or items, but simply limit their visibility beyond the participants and communities that choose to accept such people or items. Such "objectionable" people have no right to require they be granted wide exposure, and, at the same time, those who find some people or materials objectionable rarely have a right to insist on an absolute and total ban.

This ties back to the converse issue, the seeking of surprising validators and serendipity described in my 2015 post. By understanding the items and participants, how they are rated by whom, and how they fit into communities, social graphs, and domains, highly personalized attention management tools can minimize exposure to what is truly objectionable, but can find and present just the right surprising validators for each individual user (at times when they might be receptive). Similarly, these tools can custom-choose serendipitous items from other communities and domains that would otherwise be missed.

This is an area where advanced augmentation of crowd wisdom can become uniquely powerful. The mainstream will become more aware and accepting of fringe views and materials (and might set aside specific times for exploring such items), and the extremes will have the freedom to choose either (1) to make their case in a way that others can accept as unpleasant but not unreasonable or antisocial, or (2) to be placed beyond the pale of broader society: hard to find, but still short of total exclusion. Again, a high degree of customization can be applied (and varied with changing context). Those who want walled gardens can create them -- with windows and gates that open where and when desired.

Innovation, openness, transparency, and privacy

Of course, the key issues are how we apply quick fixes to our current crisis, how we evolve toward better media ecosystems, and how we balance privacy and transparency. I generally advocate for openness and transparency.

The Internet and the early Web were built on openness and transparency, which fueled a huge burst of innovation. (Just as I refer back to my 2002 patent filing, one can argue broadly that many of the most important ideas of digital society emerged around that "dot-com" era or before.) Open, interoperable systems (both Web 1.0 and Web 2.0) enabled a thousand flowers to bloom. There are similar lessons from financial market data systems (one of the first great data market ecologies), fueled by open access to market data from trading exchanges and by competing, interoperable distribution, analytics, and presentation services. The patent filing I describe here (and others of mine) builds on similar openness and interoperability.

Now that we have veered down a path of closed, monopolistic walled gardens that have gained great power, we face difficult questions of how to manage them for the public good. I suggest we probably need a mix of all five of the following. Determining just how to do that will be challenging. (Some suggestions related to each of these follow.)
  1. Can we motivate monopolies like Facebook to voluntarily shift to better serve us? Ideally, that would be the fastest solution, since they have full power to introduce such methods (and the skills to do so are much the same as the skills they now apply for targeting ads).
  2. Can we independently layer needed functions on top of such services (or in competition with them)? The questions are how to interface to existing services (with or without cooperation) and how to gain critical mass. Even at more limited scale, such secondary systems might provide augmented wisdom that could be fed back into the dominant systems, such as to help flag harmful items.
  3. Should we mandate regulatory controls, accepting these systems as natural monopolies to be regulated as such (much like the early days of regulating the Bell System's monopoly on telephone service)? There seem to be strong arguments for at least some of this, but being smart about it will be a challenge.
  4. Should we open them up or break portions of them apart (much like the later days of regulating the Bell System)? Here, too, there seem to be strong arguments for at least some of this, but being smart about it will be a challenge.
  5. Can we use regulation to force the monopolies to better serve their users (and society) by forcing changes in their business model (with incentives to serve users rather than advertisers)? I suggest that may be one of the most feasible and effective levers we can apply.
My suggestions about those alternatives:
A transparent society?

A central (and increasingly urgent) dilemma relates to privacy. Some of my suggestions for openness and transparency in our social media and similar collaborative systems could potentially conflict with privacy concerns. We may have to choose between strict privacy and smart, effective systems that create immense new value for users and society. We need to think more deeply about which objectives matter, and how to get the best mix. Privacy is an important human issue, but its role in our world of Big Data and AI is changing:
  • As David Brin suggested in The Transparent Society, the question of privacy is not just what is known about us, but who controls that information. Brin suggests the greatest danger is that authoritarian governments will control information and use it to control us (as China is increasingly on track to do).
  • We now face a similar concern with monopolies that have taken on quasi-governmental roles -- they seem to be answerable to no one, and are motivated not to serve their users, but to manipulate us to serve the advertisers from whom they profit. (And then there are the advertisers themselves.)
  • Brin suggested our technology will return us to the more transparent human norms of the village -- everyone knew one another's secrets, but that created a balance of power in which all but the most antisocial secrets were largely ignored and accepted. We seem to be well on the way to accepting less privacy, as long as our information is not abused.
  • I suggest we will gain the most by moving in the direction of openness and transparency -- with care to protect the aspects of privacy that really need protection (by managing well-targeted constraints on who has access to what, under what controls). 
That takes us back to the genius of man-computer symbiosis -- AI and machine learning thrive on big data. Locking up or siloing big data can cripple our ability to augment the wisdom of crowds and leave us at the mercy of the governments or businesses that do have our data. We need to find a wise middle ground of openness that fuels augmented intelligence and market forces -- in which service providers are driven by customer demand and desires, and constrained only by the precision-crafted privacy protections that are truly needed.

-----------------------
Related Posts:

------

*Appendix -- My patent disclosure document (now in public domain)

This post draws on the architecture and methods described in detail in my US patent application entitled "Method and Apparatus for an Idea Adoption Marketplace" (10/692,974), which was published 9/17/04. It was filed 10/24/03, formalizing a provisional filing on 10/24/02. I released this material into the public domain on 12/19/16. I retain no patent rights in it, and it is open to all who can benefit from it.

A copy of that application, with highlighting of the portions that remain most relevant to current needs, is now online. While it is written in the legalese style required for patent applications and is not very readable, the highlighted sections should be accessible to those with interest. (A duplicate copy is here.)

The highlighted sections present a broad architecture that now seems more timely than ever, and provides an extensible framework for far better social media -- and important aspects of digital democracy in general.

Tuesday, June 26, 2018

AI = Augmented Intelligence: One More Time: Man + Machine (via HBR and SMR)

In a notable bit of synchronicity, the summer issues of both Harvard Business Review and MIT Sloan Management Review have feature articles advocating a more symbiotic approach to AI:

As Malone encapsulates it, what we need is, "an architecture for general purpose, problem-solving superminds: Computers use their specialized intelligence to solve parts of the problem, people use their general intelligence to do the rest, and computers help engage and coordinate far larger groups of people than has ever been possible."

Why do we keep forgetting how important such a symbiotic approach is?  As I have written multiple times on this blog (most recently in my last post):
Another very powerful aspect of networks and algorithms that many neglect is the augmentation of human intelligence. This idea dates back some 60 years (and more), when "artificial intelligence" went through its first hype cycle -- Licklider and Engelbart observed that the smarter strategy is not to seek totally artificial intelligence, but to seek hybrid strategies that draw on and augment human intelligence. Licklider called it "man-computer symbiosis," and used ARPA funding to support the work of Engelbart on "augmenting human intellect." In an age of arcane and limited uses of computers, that proved eye-opening at a 1968 conference ("the mother of all demos"), and was one of the key inspirations for modern user interfaces, hypertext, and the Web.
The term augmentation is resurfacing in the artificial intelligence field, as we are once again realizing how limited machine intelligence still is, and that (especially where broad and flexible intelligence is needed) it is often far more effective to seek to apply augmented intelligence that works symbiotically with humans, retaining human visibility and guidance over how machine intelligence is used.
Both articles are valuable updates and teachings on how and why to pursue this understanding. But why is it so hard to keep in mind that what we seek is not man or machine, but man augmented by machine?

Thursday, April 26, 2018

Architecting Our Platforms to Better Serve Us -- Augmenting and Modularizing the Algorithm

We dreamed that our Internet platforms would serve us miraculously, but now see that they have taken a wrong turn in many serious respects. That realization has reached a crescendo in the press and in Congress with regard to Facebook and Google's advertising-driven services, but it reaches far more deeply.

"Titans on Trial: Do internet giants have too much power? Should governments intervene?" -- I had the honor last night of attending this stimulating mock trial, with author Ken Auletta as judge and FTC Commissioner Terrell McSweeny and Rob Atkinson, President of the Information Technology and Innovation Foundation (ITIF) as opposing advocates (hosted by Genesys Partners). My interpretation of the the jury verdict (voted by all of the attendees, who were mostly investors or entrepreneurs) was: yes, most agree that regulation is needed, but it must be nuanced and smartly done, not heavy handed. Just how to do that will be a challenge, but it is a challenge that we must urgently consider.

I have been outlining views on this that go in some novel directions, but are generally consistent with the views of many other observers. This post takes a broad view of those suggestions, drawing from several earlier posts.

One of the issues touched on below is a core business model issue -- the idea that the ad-model of "free" services in exchange for attention to ads is "the original sin of the Internet." It has made users of Facebook and Google (and many others) "the product, not the customer," in a way that distorts incentives and fails to serve the user interest and the public interest. As the Facebook fiasco makes clear, these business model incentives can drive these platforms to provide just enough value to "engage" us to give up our data and attend to the advertiser's messages and manipulation and even to foster dopamine-driven addiction, but not necessarily to offer consumer value (services and data protection) that truly serves our interests.

That issue is specifically addressed in a series of posts in my other blog that focuses on a novel approach to business models (and regulation that centers on that), and those posts remain the most focused presentations on those particular issues:
The rest of this post adapts a broader outline of ideas previously embedded in a book review (of Niall Ferguson's "The Square and the Tower: Networks and Power from the Freemasons to Facebook," a historical review of power in the competing forms of networks and hierarchies). Here I abridge and update that post to concentrate on our digital platforms. (Some complementary points on the need for new thinking on regulation -- and the need for greater tech literacy and nuance -- are in a recent HBR article, "The U.S. Needs a New Paradigm for Data Governance.")

Rethinking our networks -- and the algorithms that make all the difference

Drawing on my long career as a systems analyst/engineer/designer, manager, entrepreneur, inventor, and investor (including early days in the Bell System when it was a regulated monopoly providing "universal service"), I have recently come to share the fear of many that we are going off the rails.

But in spite of the frenzy, it seems we are still failing to refocus on better ways to design, manage, use, and govern our networks -- to better balance the best of hierarchy and openness. Few who understand technology and policy are yet focused on the opportunities that I see as reachable, and now urgently needed.

New levels of man-machine augmentation, and new levels of decentralizing and modularizing intelligence, can make these networks smarter and more continuously adaptable to our wishes, while maintaining sensible and flexible levels of control -- and with the innovative efficiency of an open market. We can build on distributed intelligence in our networks to find more nuanced ways to balance openness and stability (without relying on unchecked levels of machine intelligence). Think of it as a new kind of systems architecture for modular engineering of rules that blends top-down stability with bottom-up emergence, applying checks and balances that work much like our representative democracy. This is a still-formative development of ideas that I have written about for years, and plan to continue into the future.

First some context. The crucial differences among all kinds of networks (including hierarchies) are in the rules (algorithms, code, policies) that determine which nodes connect, and with what powers. We now have the power to create a new synthesis. Modern computer-based networks enable our algorithms to be far more nuanced and dynamically variable. They become far more emergent in both structure and policy, while still subject to basic constraints needed for stability and fairness.

Traditional networks have rules that are either relatively open (but somewhat slow to change), or constrained by laws and customs (and thus resistant to change). Even our current social and information networks are constrained in important ways. Some examples:
  • The US constitution defines the powers and the structures for the governing hierarchy, and processes for legislation and execution, made resilient by its provisions for self-amendable checks and balances. 
  • Real-world social hierarchies have structures based on empowered people that tend to shift more or less slowly.
  • Facebook has a social graph that is emergent, but the algorithms for filtering who sees what are strictly controlled by, and private to, Facebook. (In January they announced a major change --  unilaterally -- perhaps for the better for users and society, if not for content publishers, but reports quickly surfaced that it had unintended consequences when tested.)
  • Google has a page graph that is given dynamic weight by the PageRank algorithm, but the management of that algorithm is strictly controlled by Google. It has been continuously evolving in important respects, but the details are kept secret to make it harder to game.
Our vaunted high-tech networks are controlled by corporate hierarchies (FANG: Facebook, Amazon, Netflix, and Google in much of the world, and BAT: Baidu, Alibaba, and Tencent in China) -- but are subject to limited levels of government control that vary in the US, EU, and China. This corporate control is a source of tension and resistance to change -- and a barrier to more emergent adaptation to changing needs and stressors (such as the Russian interference in our elections). These new monopolistic hierarchies extract high rents from the network -- meaning us, the users -- mostly indirectly, in the form of advertising and sales of personal data.

Smarter, more open and emergent algorithms -- APIs and a common carrier governance model

The answer to the question of governance is to make our network algorithms not only smarter, but more open to appropriate levels of individual and multi-party control. Business monopolies or oligarchies (or governments) may own and control essential infrastructure, but we can place limits on what they control and what is open. In the antitrust efforts of the past century, governments found it necessary to regulate rail and telephone networks as common carriers, with limited corporate-owner power to control how they are used, giving marketplace players (competitors and consumers) a share in that control.

Initially this was rigid and regulated in great detail by the government, but the Carterfone decision showed how to open the old AT&T Bell System network to allow connection of devices not tested and approved by AT&T. Many forget how only AT&T phones could be used (except for a few cases of alternative devices like early fax machines that went through cumbersome and often arbitrary AT&T approval processes). Remember the acoustic modem coupler, needed because modems could not be directly connected? That changed when the FCC's decision opened the network up to any device that met defined electrical interface standards (using the still-familiar RJ11, a "Registered Jack").

Similarly, only AT&T long-distance connections could be used, until the antitrust Consent Decree broke the "Baby Bells" off from AT&T Long Lines and opened long-distance service to competition on equal terms with carriers like MCI and Sprint. Manufacturing was also opened to new competitors.

In software systems, such plug-like interfaces are known as APIs (Application Program Interfaces), and are now widely accepted as the standard way to let systems interoperate with one another -- just enough, but no more -- much like a hardware jack does. This creates a level of modularity in architecture that lets multiple systems, subsystems, and components interoperate as interchangeable parts -- extending the great advance of the first Industrial Revolution to software.

What I suggest as the next step in the evolution of our networks is a new kind of common carrier model that recognizes networks like Facebook, Google, and Twitter as common utilities once they reach some level of market dominance. Then antitrust protections would mandate open APIs to allow substitution of key components by customers -- enabling them to choose from an open market of alternatives that offer different features and different algorithms. Some specific suggestions are below (including the very relevant model of sophisticated interoperability in electronic mail networks), but first, a bit more on the motivations.

Modularity, emergence, markets, transparency, and democracy

Systems architects have long recognized that modularity is essential to making complex systems feasible and manageable. Software developers saw from the early days that monolithic systems did not scale -- they were hard to build, maintain, or modify. (The picture here of the tar pits is from Fred Brooks' classic 1975 book on IBM's first large software project.) Web 2.0 extended that modularity to our network services, using network APIs that could be opened to the marketplace. Now we see wonderful examples of rich applications in the cloud that are composed of elements of logic, data, and analytics from a vast array of companies (such as travel services that seamlessly combine air, car rental, hotel, local attractions, loyalty programs, advertising, and tracking services from many companies).

The beauty of this kind of modularity is that systems can be highly emergent, based on the transparency and stability of published, open APIs, to quickly adapt to meet needs that were not anticipated. Some of this can be at the consumer's discretion, and some is enabled by nimble entrepreneurs. The full dynamics of the market can be applied, yet basic levels of control can be retained by the various players to ensure resilience and minimize abuse or failures.

The challenge is how to apply hierarchical control, in the form of regulation, in a way that limits risks while enabling emergence driven by market forces. What we need is new focus on how to modularize critical common core utility services and how to govern the policies and algorithms that are applied, at multiple levels in the design of these systems (another, more hidden and abstract, kind of hierarchy). That can be done through some combination of industry self-regulation (where a few major players have the capability to do it, probably faster and more effectively than government) and government regulation where necessary (preferably only to the extent and duration necessary).

That obviously will be difficult and contentious, but it is now essential, if we are not to endure a new age of disorder, revolution, and war much like the age of religious war that followed Gutenberg (as Ferguson described). Silicon Valley and the rest of the tech world need to take responsibility for the genie they have let out of the bottle, and to mobilize to deal with it, and to get citizens and policymakers to understand the issues.

Once that progresses and is found to be effective, similar methods may eventually be applied to make government itself more modular, emergent, transparent, and democratic -- moving carefully toward "Democracy 2.0." (The carefully part is important -- Ferguson rightfully noted the dangers we face, and we have done a poor job of teaching our citizens, and our technologists, even the traditional principles of history, civics, and governance that are prerequisite to a working democracy.)

Opening the FANG walled gardens (with emphasis on Facebook and Google, plus Twitter)

This section outlines some rough ideas. (Some were posted in comments on an article in The Information by Sam Lessin, titled, "The Tower of Babel: Five Challenges of the Modern Internet.")

The fundamental principle is that entrepreneurs should be free to innovate improvements to these "essential" platforms -- which can then be selected by consumer market forces. Just as we moved beyond the restrictive walled gardens of AOL, and the early closed app stores (initially limited to apps created by Apple), we have unleashed a cornucopia of innovative Web services and apps that have made our services far more effective (and far more valuable to the platform owners as well, in spite of their early fears). Why should first movers be allowed to block essential innovation? Why should they have sole control and knowledge of the essential algorithms that are coming to govern major aspects of our lives? Why shouldn't our systems evolve toward fitness functions that we control and understand, with just enough hierarchical structure to prevent excessive instability at any given time?

Consider the following specific areas of opportunity.

Filtering rules. Filters are central to the function of Facebook, Google, and Twitter. As Ferguson observes, there are issues of homophily, filter bubbles, echo chambers, fake news, and spoofing that are core to whether these networks make us smart or stupid, and whether we are easily manipulated to think in certain ways. Why do we not mandate that platforms be opened to user-selectable filtering algorithms (and/or human curators)? The major platforms can control their core services, but could allow users to select separate filters that interoperate with the platform. Let users control their filters, whether just by setting key parameters, or by substituting pluggable alternative filter algorithms. (This would work much like third-party analytics in financial market data systems.) Greater competition and transparency would allow users to compare alternative filters and decide what kinds of content they do or do not want. It would stimulate innovation to create new kinds of filters that might be far more useful and smart.
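
To illustrate what a user-selectable filter plug-in might look like at the API level, here is a minimal Python sketch. The interface and names (FeedFilter, rank_feed, the item fields) are hypothetical assumptions for this post, not any platform's actual API.

```python
# Illustrative-only sketch of a pluggable, user-selectable feed filter API.
from typing import Iterable, List, Protocol

class FeedFilter(Protocol):
    def score(self, item: dict, user: dict) -> float:
        """Return a relevance/quality score for one candidate feed item."""
        ...

class ChronologicalFilter:
    # Simplest possible filter: newest first, no curation at all.
    def score(self, item: dict, user: dict) -> float:
        return item["timestamp"]

class AuthorityWeightedFilter:
    # Uses an authority map (e.g. the output of a rate-the-raters process) to
    # demote items from low-reputation sources without deleting them.
    def __init__(self, authority: dict):
        self.authority = authority
    def score(self, item: dict, user: dict) -> float:
        return self.authority.get(item["author"], 0.1) * item.get("engagement", 1.0)

def rank_feed(items: Iterable[dict], user: dict, flt: FeedFilter) -> List[dict]:
    # The platform supplies the candidate items; the user chooses (or swaps) the filter.
    return sorted(items, key=lambda it: flt.score(it, user), reverse=True)
```

The design point is the boundary: the platform keeps control of its core service (supplying candidate items), while the user chooses, configures, or swaps the filter that ranks them -- much as one swaps a browser or an email client today.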

For example, I have proposed strategies for filters that can help counter filter bubble effects by being much smarter about how people are exposed to views that may be outside of their bubble, doing it in ways that they welcome and want to think about. My post, Filtering for Serendipity -- Extremism, "Filter Bubbles" and "Surprising Validators" explains the need, and how that might be done. The key idea is to assign levels of authority to people based on the reputational authority that other people ascribe to them (think of it as RateRank, analogous to Google's PageRank algorithm). This approach also suggests ways to create smart serendipity, something that could be very valuable as well.

The "wisdom of the crowd" may be a misnomer when the crowd is an undifferentiated mob, but,  I propose seeking the wisdom of the smart crowd -- first using the crowd to evaluate who is smart, and then letting the wisdom of the smart sub-crowd emerge, in a cyclic, self-improving process (much as Google's algorithm improves with usage, and much as science is open to all, but driven by those who gain authority, temporary as that may be).

Social graphs: Why do Facebook, Twitter, LinkedIn, and others own separate, private forms of our social graph? Why not let other user agents interoperate with a given platform’s social graph? Does the platform own the data defining my social graph relationships, or do I? Does the platform control how that affects my filter, or do I? Yes, we may have different flavors of social graph, such as personal for Facebook and professional for LinkedIn, but we could still have distinct sub-communities that we select when we use an integrated multi-graph, and those could offer greater nuance and flexibility with more direct user control.

User agents versus network service agents: Email systems were modularized in Internet standards long ago, so that we compose and read mail using user agents (Outlook, Apple mail, Gmail, and others) that connect with federated remote mail transfer agent servers (that we may barely be aware of) which interchange mail with any other mail transfer agent to reach anyone using any kind of user agent, thus enabling universal connectivity.

Why not do much the same, to let any social media user agent interoperate with any other, using a federated social graph and federated message transfer agents? We could then set our user agent to apply filters to let us see whichever communities we want to see at any given time. Some startups have attempted to build stand-alone social networks that focus on sub-communities like family or close friends versus hundreds of more or less remote acquaintances. Why not just make that a flexible and dynamic option, that we can control at will with a single user agent? Why require a startup to build and scale all aspects of a social media service, when they could just focus on a specific innovation? (The social media UX can be made interoperable to a high degree across different user agents, just as email user agents handle HTML, images, attachments, emojis, etc. -- and as do competing Web browsers.)
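
To suggest how the email analogy might carry over, here is a hypothetical sketch of a social "user agent" that reads from multiple federated graph hosts and lets the user decide which sub-community to view at the moment. The server endpoints and payload fields are assumptions; no such protocol exists as sketched here.

```python
# Hypothetical user agent for a federated social graph (illustrative only).
import json
from urllib.request import urlopen

# Graph hosts the user belongs to, analogous to mail servers (endpoints are made up).
FEDERATED_SERVERS = [
    "https://graph.example-personal.net",
    "https://graph.example-work.net",
]

def fetch_feed(user_id):
    """Pull candidate items from every federated graph host the user belongs to."""
    items = []
    for server in FEDERATED_SERVERS:
        # Assumed common read API across hosts -- like mail transfer agents all speaking SMTP.
        with urlopen(f"{server}/v1/feed?user={user_id}") as resp:
            items.extend(json.load(resp))
    return items

def render(items, community=None):
    """The user agent, not the platform, decides which sub-community to show right now."""
    for item in items:
        if community is None or item.get("community") == community:
            print(f"[{item.get('community', '?')}] {item['author']}: {item['text']}")
```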

Identity: A recurring problem with many social networks is abuse by anonymous users (often people with many aliases, or even just bots). Once again, this need not be a simple binary choice. It would not be hard to have multiple levels of participant, some anonymous and some with one or more levels of authentication as real human individuals (or legitimate organizations). First class users would get validated identities, and be given full privileges, while anonymous users might be permitted but clearly flagged as such, with second class privileges. That would allow users to be exposed to anonymous content, when desired, but without confusion as to trust levels. Levels of identity could be clearly marked in feeds, and users could filter out anonymous or unverified users if desired. (We do already see some hints of this, but only to a very limited degree.)
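
A graded identity scheme could be as simple as the following sketch; the levels, names, and fields are illustrative assumptions, not a standard.

```python
# Sketch of graded identity levels with a user-controlled visibility threshold.
from enum import IntEnum

class IdentityLevel(IntEnum):
    ANONYMOUS = 0        # permitted, but clearly flagged, with second-class privileges
    PSEUDONYMOUS = 1     # persistent handle with some reputation history
    VERIFIED_HUMAN = 2   # validated as a real individual
    VERIFIED_ORG = 3     # validated organization

def visible_items(items, minimum=IdentityLevel.ANONYMOUS):
    """Each user chooses the lowest identity level they are willing to see."""
    return [it for it in items if it["author_level"] >= minimum]

# Example: a user who wants no anonymous or unverified content in their feed.
feed = [{"text": "hi", "author_level": IdentityLevel.ANONYMOUS},
        {"text": "news", "author_level": IdentityLevel.VERIFIED_ORG}]
print(visible_items(feed, minimum=IdentityLevel.VERIFIED_HUMAN))
```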

Value transfers and extractions: As noted above, another very important problem is that the new platform businesses are driven by advertising and data sales, which means the consumer is not the customer but the product. Short of simply ending that practice (to end advertising and make the consumer the customer), those platforms could be driven to allow customer choice about such intrusions and extractions of value. Some users may be willing to opt in to such practices, to continue to get "free" service, and some could opt out by paying compensatory fees -- and thus becoming the customer. If significant numbers of users opted to become the customer, then the platforms would necessarily become far more customer-first -- for consumer customers, not the business customers who now pay the rent.

I have done extensive work on alternative strategies that adaptively customize value propositions and prices to markets of one -- a new strategy for a new social contract that can shape our commercial relationships to sustain services in proportion to the value they provide, and our ability to pay, so all can afford service. A key part of the issue is to ensure that users are compensated for the value of the data they provide. That can be done as a credit against user subscription fees (a "reverse meter"), at levels that users accept as fair compensation. That would shift incentives toward satisfying users (effectively making the advertiser their customer, rather than the other way around). This method has been described in the Journal of Revenue and Pricing Management: “A novel architecture to monetize digital offerings,” and very briefly in Harvard Business Review. More detail is in my FairPayZone blog and my book (see especially the posts about the Facebook and Google business models that are listed in the opening section, above, and again at the end.*)
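
As a toy illustration of the "reverse meter" idea (the numbers and function are made up for illustration, not a proposal for actual prices):

```python
# Toy "reverse meter": the value of a user's data/attention is credited against the fee.
def monthly_bill(base_fee, data_value_credit, opted_into_ads):
    if opted_into_ads:
        return 0.0   # ad-supported users pay nothing, but remain "the product"
    # Paying users become the customer; the value of their data offsets the fee.
    return max(0.0, base_fee - data_value_credit)

print(monthly_bill(base_fee=8.00, data_value_credit=3.50, opted_into_ads=False))  # 4.5
```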

Analytics and metrics: We need access to relevant usage data and performance metrics to help test and assess alternatives, especially when independent components interact in our systems. Both developers and users will need guidance on alternatives. The Netflix Prize contests for improved recommender algorithms provided anonymized test data from Netflix to participant teams. Concerns about Facebook's algorithm, and the recent change that some testing suggests may do more harm than good, point to the need for independent review. Open alternatives will increase the need for transparency and validation by third parties.

Sensitive data could be restricted to qualified organizations, with special controls to avoid issues like the Cambridge Analytica misuse. The answer to such abuse is not greater concentration of power in one platform, as Maurice Stucke points out in Harvard Business Review, "Here Are All the Reasons It’s a Bad Idea to Let a Few Tech Companies Monopolize Our Data." (Facebook has already moved toward greater concentration of power.)

If such richness sounds overly complex, remember that complexity can be hidden by well-designed user agents and default rules. Those who are happy with a platform's defaults need not be affected by the options that other users might enable (or swap in) to customize their experience. We do that very successfully now with our choice of Web browsers and email user agents. We could have similar flexibility and choice in our platforms -- innovations that are valuable can emerge for use by early adopters, and then spread into the mainstream if success fuels demand. That is the genius of our market economy -- a spontaneous, emergent process for adaptively finding what works and has value -- in ways more effective than any hierarchy (as Ferguson extols, with reference to Smith, Hayek, and Levitt).

Augmentation of humans (and their networks)

Another very powerful aspect of networks and algorithms that many neglect is the augmentation of human intelligence. This idea dates back some 60 years (and more), when "artificial intelligence" went through its first hype cycle -- Licklider and Engelbart observed that the smarter strategy is not to seek totally artificial intelligence, but to seek hybrid strategies that draw on and augment human intelligence. Licklider called it "man-computer symbiosis," and used ARPA funding to support the work of Engelbart on "augmenting human intellect." In an age of arcane and limited uses of computers, that proved eye-opening at a 1968 conference ("the mother of all demos"), and was one of the key inspirations for modern user interfaces, hypertext, and the Web.

The term augmentation is resurfacing in the artificial intelligence field, as we are once again realizing how limited machine intelligence still is, and that (especially where broad and flexible intelligence is needed) it is often far more effective to seek to apply augmented intelligence that works symbiotically with humans, retaining human visibility and guidance over how machine intelligence is used.

Why not apply this kind of emergent, reconfigurable augmented intelligence to drive a bottom-up way to dynamically assign (and re-assign) authority in our networks, much like the way representative democracy assigns (and re-assigns) authority from the citizen up? Think of it as dynamically adaptive policy engineering (and consider that a strong bottom-up component will keep such "engineering" democratic and not authoritarian). Done well, this can keep our systems human-centered.

Reality is not binary:  "Everything is deeply intertwingled"

Ted Nelson (who coined the term "hypertext" and was another of the foundational visionaries of the Web), wrote in 1974 that "everything is deeply intertwingled." As he put it, "Hierarchical and sequential structures, especially popular since Gutenberg, are usually forced and artificial. Intertwingularity is not generally acknowledged—people keep pretending they can make things hierarchical, categorizable and sequential when they can't."

It's a race:  augmented network hierarchies that are emergently smart, balanced, and dynamically adaptable -- or disaster

If we pull together to realize this potential, we can transcend the dichotomies and conflicts that are so wickedly complex and dangerous. Just as Malthus failed to account for the emergent genius of civilization, and the non-linear improvements it produces, many of us discount how non-linear the effect of smarter networks, with more dynamically augmented and balanced structures, can be. But we are racing along a very dangerous path, and are not being nearly smart or proactive enough about what we need to do to avert disaster. What we need now is not a top-down command and control Manhattan Project, but a multi-faceted, broadly-based movement, with elements of regulation, but primarily reliant on flexible, modular architectural design.

---
Related posts:
---

Coda:  On becoming more smartly intertwingled

Everything in our world has always been deeply intertwingled. Human intellect augmented with technology enables us to make our world more smartly intertwingled. But we have lost our way, in the manner that Engelbart alluded to in his illustration of de-augmentation -- we are becoming deeply polarized, addicted to self-destructive dopamine-driven engagement without insight or nuance. We are being de-augmented by our own technology run amok.


(I plan to re-brand this blog as "Smartly Intertwingled" -- that is the objective that drives my work. The theme of "User-Centered Media" is just one important aspect of that.)


--------------------------------------------------------------------------------------------

*On business models - FairPay (my other blog): As noted above, a series of posts in my other blog focuses on a novel approach to business models (and regulation that centers on that), and those posts remain my best presentation on those issues: