A Quick Sketch for Discussion: Formative thoughts on
addressing open concerns, posted in anticipation of an 11/9 conference
session at Stanford on the “middleware”
unbundling proposals. (This also suggests linkage to an 11/10 session on “data
cooperatives” at the same event). [Update: As noted at the end, there was some discussion at the 11/9 conference session that was generally supportive of the directions suggested here.]
Abstract
Recent proposals to unbundle filtering services from social media platforms to
better serve user interests have generated support, tempered by concern -- notably about business models and privacy protection.
Instead of the one-level functional unbundling that has been proposed, these
concerns may be better handled by a two-level unbundling.
Between the platforms
and the large numbers of unbundled filtering services that need resources and access
to sensitive personal data to filter effectively on their users’ behalf, add a
layer with a small number of better-resourced “infomediaries” that are fiduciaries
for users. The infomediaries can manage coordination of services, data protection,
and revenue sharing in service to user interests, and enable the many independent
filtering services to share resources and run their filters in privacy-protected
ways.
The time may be ripe for the long-gestating idea of “infomediaries”
to emerge as a linchpin for resolving some of the management and control dilemmas
we now face with social media. The session with Francis Fukuyama and others that
I moderated at the Tech Policy Press event on 10/7, Reconciling Social Media & Democracy: Fukuyama, Keller, Maréchal
& Reisman (along with other
speakers who followed) generated a wide-ranging discussion of those
proposals, providing context for the upcoming Stanford session in which he
will participate.
Knotty problems with the “middleware” proposal
The unbundling proposals that Fukuyama, I, and others
advocate are widely viewed as having considerable appeal in principle, but the 10/7
discussion sharpened many previously
raised questions about whether they can work -- relating to speech, business
models, privacy, compatibility and interoperability, and technological feasibility.
Reflecting on the privacy issues led me to refocus on “infomediaries”
as an important part of a solution, and on how they might clarify the business model
issues as well. Infomediaries were first proposed in the dot-com era, as
agents of consumers that could negotiate with businesses over data and attention,
to give consumers control and compensation for their information. The imbalance
of power over consumers has grown in the world of e-commerce, but social media have
given this even more importance and urgency.
The unbundling proposal is to spin out the filtering of what
users see in their newsfeeds from the platforms -- to create independent filtering
“middleware” services that users select in an open market to serve as their agents.
There is wide agreement that their ad-engagement-driven business model drives social
media to promote harmful speech in powerful and dangerous ways. Fukuyama raised
an even deeper concern that the concentration of power to control what we each see
is a mortal threat to democracy, “a loaded gun sitting on the table” that we cannot
rely on good actors to not pick up.
Unbundling of the filtering services would take that loaded gun
from the platforms (and those who might coerce them) and reduce its power -- by giving
individual users more independent control of what they see in their social media
newsfeeds and recommendations. But -- how can those unbundled services be funded,
since users seem disinclined to pay for them? -- and how can the filtering services
use the personal data needed to do filtering effectively without breaches of privacy?
This problem is compounded because we would want a wide diversity
of filtering services innovating and competing for users. Many would be small, and
under-resourced -- and there is no simple, automated solution to understanding the
content they filter and its authority.
- How would they have the resources -- to not only do the basic filtering task of ranking, but also to moderate the overwhelming firehose of harmful content that already taxes the ability of giants like Facebook?
- How would a multitude of small filtering services be able to protect non-public multi-party content, as well as the multi-party personal metadata, needed to understand the provenance and authority of what they filter?
These are challenging tasks, and there is reluctance to proceed
without a clear idea of how we might operationalize a solution.
The role of infomediaries
I suggest the answer to this dilemma could be a more sophisticated
distribution of functions. Not just two levels: platform and filtering services (as user agents);
but three levels: platform, infomediaries (as a few, privileged user agents),
and filtering services (as many, more limited user agents).
"Infomediaries" (Information intermediaries)
were suggested in 1997 in Harvard Business Review --as a trusted user agent
that manages a consumer’s data and attention -- and negotiates with businesses on
how it is used and for what compensation. Similar ideas re-surfaced in a law review
article in 2016 as "Information Fiduciaries" and then in HBR in 2018
as "Mediators of Individual Data" ("MIDs").
(As I was writing this, I learned that another session at
the Stanford event is on a more recent variant, “Data
Cooperatives.” Despite that coincidence, I am not aware that a connection has been
drawn, except for the observation in this recent work that social media data is
not individual but “collective.” If the participants at those two sessions
are not in communication, I suggest that might be productive.)
Why have infomediaries not materialized in any significant way?
It seems network effects and the "original sin of the Internet," advertising,
have proven so hugely powerful that infomediaries never got critical mass in commerce
beyond narrow uses. (I was CTO from ‘98-‘00 for a basic kind of infomediary service
that had some success before the crash.)
But now, with the harms of social media bringing the broader
abuses of “attention capitalism” to a head, regulators may see that the only
way to systematically limit these harms – and the harms of attention capitalism
more broadly -- is to mandate the creation of infomediaries to serve as negotiating
and custodial agents for consumers. They offer a way to enable business models
that balance business power with consumer power, especially regarding compensation
for attention and data -- in ways that empower users to decide what to allow,
for what benefit. They also offer a new solution to protecting sensitive multi-party
social media messages and related metadata -- while enabling society to refine
and benefit from the wisdom of the crowd that it contains -- to help us manage our attention.
Here is a sketch of how filtering services might be supported
by infomediaries. Working out the details will be a complex task that should be
guided by a dedicated Digital Regulatory Agency with significant business and
independent expert participation.
- Put all personal data of social media users under the control of carefully regulated infomediaries (IMs) who interface with the platforms and the filtering services (FSs), as fiduciary agents for their users. Create a small number of infomediaries (five to seven?) to support defined subsets of users. After that, users would be free to migrate among infomediaries in a free market -- and very limited numbers of new infomediary entrants might be enabled.
- Spin out the filtering services from the platforms – and create processes to encourage new entrants. The infomediaries would cooperate to enable the filtering services to benefit from the data of all qualified infomediaries, while protecting personally identifiable data.
- Empower the infomediaries to negotiate a share of advertising revenue from the platforms on behalf of their users, in compensation for their data and attention – to be shared with the filtering services (and perhaps the users). Alternatively, provide for a mix of user support or public subsidy, much like existing public media. Ideally that could grow to include user support for the platforms as an alternative to some or all advertising.
- Use regulatory power to work with industry to manage interface standards and the ongoing conduct of these roles and negotiations, much as other essential, complex, and dynamic industries like finance, telecom, transport, power, and other utilities are regulated. Creation of new infomediaries might be strictly limited by regulators, much like banks or securities exchanges.
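To make this division of roles concrete, here is a minimal sketch of the three-level interfaces in Python. Every name and method signature is an illustrative assumption for discussion, not a specification:

```python
from typing import Protocol

class Platform(Protocol):
    """Hosts content and serves feeds, but no longer controls ranking."""
    def candidate_items(self, user_id: str) -> list[str]: ...
    def pay_revenue_share(self, infomediary_id: str, amount: float) -> None: ...

class FilteringService(Protocol):
    """A user-chosen agent that defines ranking criteria, but never
    holds raw personal data itself."""
    def ranking_criteria(self, user_id: str) -> dict[str, float]: ...

class Infomediary(Protocol):
    """A regulated fiduciary that custodies user data, applies each
    filtering service's criteria to that data, and negotiates revenue."""
    def rank(self, user_id: str, service: FilteringService,
             candidates: list[str]) -> list[str]: ...
    def negotiate_share(self, platform: Platform) -> float: ...
```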
The virtue of this two-level unbundling architecture is that
it concentrates elements of the infomediary role that have network-wide impact and
sensitive data in a small number of large competitive entities -- they could
apply the necessary resources and technology to maintain privacy and provide
complex services, with some competitive diversity. It enables much larger
numbers of filtering services that serve diverse user needs to be lean and unburdened.
Because the new infomediaries would be accredited custodians
of sensitive messaging data, as fiduciaries for the users, they could share
that data among themselves, providing a collective resource to safely power the
filtering services.
This could be done in two ways: 1) by providing purpose- and
time-limited, privacy-protected data to the filtering services, or, perhaps
simpler and more secure, 2) by acting as a platform that runs filtering
algorithms defined by the filtering services and returning rankings without divulging
the data itself. (More on how that can be done, and why, is below). Either way,
the platforms would no longer control or be gatekeepers for the filtering.
This multilevel breakup may sound very complex and raise questions
of regulatory power, but it would be very analogous to the breakup of the Bell System,
which unbundled the integrated AT&T into a long-distance carrier (AT&T),
seven regional local-service operating companies (the RBOCs), and a manufacturing
business (Western Electric, later spun off as Lucent), all of which were opened to new competitive entrants,
unleashing a torrent of valuable innovation.
As our social media ecosystem becomes the underlying fabric
of most human discourse, a similarly ambitious undertaking is not only economically
desirable and justifiable, but essential to the survival of democracy and free
speech. Functional specialization multiplies the number of entities, but it simplifies
the tasks of those entities – and enables competition, innovation, and
resilience. To the fear of technical solutions to social problems that Nathalie
Maréchal spoke of, I submit that the problem of algorithms that select for virality (thus exacerbating a social problem) is a newly created technical problem, driven by an incentives problem – one that
this architecture (or some improvement on it) can help solve.
A casual reader might stop here. The following
sections dig deeper into how this addresses, first, the business model challenges
that infomediaries were conceived to solve, and then the difficult privacy
issues of middleware unbundling, along with other problems they might help
finesse.
----
Looking deeper...
Resolving the business model issues
Even as one who had read it then, it is now enlightening to
turn the clock back to 1997 to read the original HBR article on
infomediaries by John Hagel III and Jeffrey F. Rayport, “The
Coming Battle for Customer Information,” for perspective on the current problems
of surveillance and attention capitalism. The authors predicted:
In order to help [consumers] strike
the best bargain with vendors, new intermediaries will emerge. They will
aggregate consumers and negotiate on their behalf within the economic
definition of privacy determined by their clients. … When ownership of
information shifts to the consumer, a new form of supply is created. By
connecting information supply with information demand and by helping both
parties involved determine the value of that information, infomediaries would
be building a new kind of information supply chain.
A 1999 book co-authored by Hagel greatly expands on this idea (and is also worth a look).
It specifically refers to “filtering services” that include or exclude marketing
messages to match the needs or preferences of their clients.
Growing due to network effects and scale economies, vendors
like Amazon and ad-tech services like Google and Facebook have effectively usurped
the vendor side of the infomediary function. These powers are now so entrenched
and engorged with obscene profits that there is little hope that infomediaries that
do represent user interests can emerge without regulatory action.
The proposal that unbundled filtering services be funded by
a revenue share from the platforms has struck critics as implausible and
complex. But if that role is not dispersed among large numbers of often-small
filtering services, but instead managed by a small number of larger infomediaries who
have a mandate from regulators, the task may be far more tractable.
Yes, this would be a complex ecosystem, with multiple levels
of cooperating businesses for which economically sound revenue shares would
need to be negotiated: ad revenues from platforms to infomediaries, to
filtering services, and possibly to consumers; or, alternatively, from consumers
or sponsors or public funding -- in whichever direction corresponds to
the value transfer. But many industries – such as financial services, ad-tech,
telecom, logistics -- flourish with equally complex revenue shares (whether
called shares, fees, commissions, settlements or whatever), often overseen by
regulators that ensure fairness.
Once such a multiplayer market begins to operate, innovation
can enable better revenue models. My 2018 article “Reverse
the Biz Model” explored some possible variations, and explained how they
could work via infomediaries, or directly between business and consumer. It also
suggested how consumer funding to eliminate ads on an individual basis could be
commensurate with ability to pay. The inherent economics are more egalitarian
than one might first think because those with low income have low value to
advertisers. They would have to contribute less to compensate for lost ad
revenue. Mediated well, users could even benefit from whatever level of
non-intrusive and relevant advertising they desire, and platforms would still
bring in sufficient funding to disperse through the ecosystem -- perhaps more
than now, given that there would be less waste. (Note that filtering services might specialize in advertising/marketing messages or in user-generated content to better address the different issues for each.)
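To illustrate the arithmetic (with purely hypothetical dollar figures), consider what each user would need to pay to make a platform whole for forgone ad revenue:

```python
# Illustrative arithmetic only -- the dollar figures are hypothetical.
# To keep a platform whole without ads, each user's fee need only match
# the ad revenue that user personally generated, which tends to be
# lower for lower-income users.
monthly_ad_value = {
    "high_income_user": 12.00,   # heavily targeted by advertisers
    "median_user": 4.00,
    "low_income_user": 1.50,     # low value to advertisers
}

for user, lost_revenue in monthly_ad_value.items():
    print(f"{user}: ad-free fee of about ${lost_revenue:.2f}/month")
```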
Some fear that having filtering services receive funding
from advertising, even indirectly, would continue the perverse incentives for
engagement that are so harmful. But revenue shares to the infomediaries and filtering
services need not be tied to engagement – they could be tied to monthly active
users or other user-value-based metrics. With a multitude of filtering services,
any one service's effect on platform engagement would be diluted, so no individual
filtering service could materially move engagement. These services might be structured as nonprofits, benefit corporations, or cooperatives, to further shift incentives toward user and social value.
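As a minimal sketch of that decoupling, under assumed service names and numbers, revenue could be apportioned by monthly active users rather than engagement:

```python
# A sketch of distributing a revenue pool to filtering services in
# proportion to monthly active users (MAU) rather than engagement,
# so no service profits from amplifying virality. All names and
# numbers are illustrative assumptions.
def mau_based_shares(pool: float, mau: dict[str, int]) -> dict[str, float]:
    total = sum(mau.values())
    return {service: pool * users / total for service, users in mau.items()}

shares = mau_based_shares(
    1_000_000.00,                      # hypothetical monthly pool
    {"civic_filter": 40_000,           # illustrative services and MAU
     "family_filter": 25_000,
     "science_filter": 10_000})
# -> civic_filter ~$533K, family_filter ~$333K, science_filter ~$133K
```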
Resolving the privacy issues
The other key opportunity for infomediaries is to manage data
privacy. This takes on special significance because key aspects of filtering
and recommendations depend on either message content or the metadata about how
users interact with those messages -- both of which are often privacy-sensitive.
Importantly, as noted by the recent proposals for data
cooperatives, that data is not individual, but collective.
Infomediaries may offer a way to finesse the concerns pinpointed
in the 10/7 discussion. I suggested that the most promising strategy for
filtering to understand quality -- given the limitations of AI and of human review
of billions of content items in hundreds of languages and contexts -- is to use
the metadata that signals how other users responded to that content. Daphne
Keller nicely delineated the privacy concern:
… I think a lot of content
moderation does depend on metadata. For example, spam detection and demotion is
very much driven by metadata. And Twitter has said that a lot of how they
detect terrorist content, isn’t really by the content, it’s by the patterns of
connections between accounts following each other or coming from the same IP
address or appearing the same– those aren’t the examples they gave, but what I
assume they’re using. And I think it’s a big part of what Camille Francois has
called the ABC framework, the Actors-Behavior-Content, as these three
frameworks for approaching responding to problematic online content.
And I think it just makes
everything much harder because if we pretend that metadata isn’t useful to
content moderation, that kind of simplifies things. If we acknowledge that
metadata is useful, that is often personally identifiable data about users,
including users who haven’t signed up for this new middleware provider, and
it’s a different kind of personally identifiable data than just the fact that
they posted particular content at a particular time. And all of the concerns that
I raised, but in particular, the privacy concern and just like how do we even
do this? What is the technology that takes metadata structured around the
backend engineering of Twitter or whomever and share it with a competitor? That
gets really hard. So I’m scared to hear you bring up metadata because that adds
another layer of questions I’m not sure how to solve.
This is what drove me to refocus on infomediaries as the way
to cut through the dilemma. The platforms could have filtered using as much of this
data as they wished, since they now control that data. Similar data is central
to Google search (the PageRank algorithm that was the key to their success) --
but search is less driven by engagement than social media.
Privacy has been a sore point for the unbundling of
filtering. The kind of issues that Keller raised led Fukuyama and his
colleagues to back off from the broadest unbundling to advocate more limited
ambitions, such as labelling, that are content-based rather than
metadata-based. He points to services like NewsGuard that rate news sources for
their credibility. As I have argued
elsewhere, that is a useful service, but severely limited because it only
applies to limited numbers of established news services (which do represent
large amounts of content), not the billions of user-generated content sources (obviously
significant in aggregate, but intractable for expert ratings). Instead, I
suggest using metadata to draw out the wisdom
of crowds, much as Google does. Recent studies
support the idea that crowdsourced assessment of quality can be as good as expert
ratings, and there is no question that automated crowdsourced methods that draw
on passively obtained metadata are far more capable of operating at Internet
scale and speed – likely the only approach that can keep pace with the need.
Thus, it would be a huge loss to society to not be able to
filter social media based on interaction metadata -- an infomediary strategy
for making that feasible is well worth some added complexity. A small
number of infomediaries could manage this data to include most (but not necessarily
all) users in this crowdsourcing. Each infomediary would only have a subset of
the users’ data, but that data could be pooled among properly regulated
infomediaries and restricted to use only in filtering.
More technical/operational detail on filtering and data
protection
As noted above, and drawing on work on trust and data
sharing by Sandy
Pentland (one of the speakers in the Stanford Data Cooperatives session),
and similar suggestions by Stephen
Wolfram (in his 2019 testimony to a US Senate subcommittee), there seem to
be two basic alternatives: 1) providing limited, privacy-protected data to the
filtering services, or, perhaps simpler and more secure, 2) acting as a
platform for running filtering algorithms defined by the filtering services and
returning rankings, without divulging the data itself.
Perhaps emerging technologies for secure data sharing (such
as those described by Pentland) might allow the fiduciaries to grant the filtering
services controlled and limited access to this data. But that is not necessary
to this architecture -- as noted above, the simpler solution appears to be that
of having the infomediaries act as a platform for running filtering algorithms
defined by the filtering services without divulging the data itself. Send the
algorithm to the data.
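As a rough illustration of that principle, here is a minimal Python sketch, with all names assumed for discussion; in practice the submitted logic would be sandboxed or expressed as declarative criteria rather than arbitrary code:

```python
# "Send the algorithm to the data": the filtering service submits a
# scoring function; the infomediary evaluates it against private
# metadata and returns only an ordering of item IDs, never the data.
from typing import Callable

ScoringFn = Callable[[dict], float]   # scores one item from its features

class InfomediaryRankingHost:
    def __init__(self, private_metadata: dict[str, dict]):
        # item_id -> privacy-sensitive signals; never leaves this host
        self._metadata = private_metadata

    def rank(self, item_ids: list[str], score: ScoringFn) -> list[str]:
        scores = {i: score(self._metadata[i]) for i in item_ids}
        # Only the ranking is returned -- the underlying data is not.
        return sorted(item_ids, key=scores.get, reverse=True)

# Illustrative use: a spam-demoting filter defined by a filtering service.
host = InfomediaryRankingHost(
    {"a": {"spam_signals": 0.9, "shares": 10},
     "b": {"spam_signals": 0.1, "shares": 200}})
ranking = host.rank(["a", "b"],
                    lambda f: f["shares"] * (1 - f["spam_signals"]))
```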
Adapting the approach and terminology suggested by Wolfram,
the infomediary retains operational control of the filtering operation, and all
the data used for that -- working essentially as a “final ranking provider” – as
a fully trusted level of user agent. But the setting of specific criteria for
that ranking is delegated to one or more user-chosen filtering services to operate
essentially as a “constraint provider” that instructs that rankings be done
in accord with the preferences they set on behalf of their users. (In contrast,
the platforms now serve as both constraint providers and final ranking
providers -- and users have very little say in how that is done.)
Note that, ideally, these rankings should be done in a composable
format, such that rankings from multiple filters can be arithmetically combined
into a composite ranking. This might be done with relative weightings that users
can select for each filtering service, such as with sliders, to compose an
overall ranking drawn from all the services they choose. Users might be
enabled to change their filter selections and weightings at any time to suit varying
objectives and moods. Thus, users control the filters by choosing the filtering
services (and setting any variations they enable), but the actual process of filtering
and the data needed for that remains within the cooperating array of secure infomediaries.
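A minimal sketch of that composition step, with illustrative service names, scores, and slider weights:

```python
# Composable rankings: each chosen filtering service returns a
# normalized score per item; the user's slider weights combine them
# arithmetically into one composite ranking. All values are assumed.
def composite_ranking(scores_by_service: dict[str, dict[str, float]],
                      slider_weights: dict[str, float]) -> list[str]:
    composite: dict[str, float] = {}
    for service, scores in scores_by_service.items():
        weight = slider_weights.get(service, 0.0)
        for item, score in scores.items():
            composite[item] = composite.get(item, 0.0) + weight * score
    return sorted(composite, key=composite.get, reverse=True)

ranked = composite_ranking(
    {"local_news_filter": {"post1": 0.9, "post2": 0.2},
     "fact_check_filter": {"post1": 0.4, "post2": 0.8}},
    {"local_news_filter": 0.7, "fact_check_filter": 0.3})  # user's sliders
# post1: 0.7*0.9 + 0.3*0.4 = 0.75; post2: 0.7*0.2 + 0.3*0.8 = 0.38
```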
Back to Keller’s concerns: The boundaries between the
platforms and the infomediaries are clear, with well-defined interfaces,
much as in any complex, evolving ecosystem. There is nothing shared with
competitors, only with partners. It is co-opetition, among trusted peers, on
how shared data is used and protected, at what price. The personal data never
goes beyond a team of infomediaries, all trusted with purpose-specific portions
of one-another’s clients’ data. There is no more implementation complexity than
in Google’s ad business. It won’t happen in a day, but it is eminently doable –
if we really want it.
Improved functionality for filtering, blocking, and flow
control
Consider how this two-level architecture can enable the rich,
diverse functionality needed to address the multi-faceted challenges of filtering, blocking,
and flow control as we face new technical/social issues like "ampliganda." Growing evidence favors not just filtering (ranking and
recommenders) or blocking (takedowns and bans), but flow controls. These include
circuit-breakers and other forms of friction that can slow the effects of
virality (such as nudging users to take time and read an item before sharing it).
The infomediaries could pool their real-time network flow data to serve as the empowered
coordinating locus for such measures -- with diversity, and with independence from
the platforms.
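As one hedged illustration of such a flow control, a simple circuit breaker might add friction to items whose share velocity crosses a threshold (the threshold and the response here are assumptions, not proposals):

```python
# A virality "circuit breaker": once an item's shares-per-hour rate
# exceeds a limit, sharing triggers friction rather than a block.
import time
from collections import defaultdict, deque

class ViralityCircuitBreaker:
    def __init__(self, max_shares_per_hour: int = 5_000):
        self.max_rate = max_shares_per_hour
        self.shares: dict[str, deque] = defaultdict(deque)

    def on_share(self, item_id: str) -> str:
        now = time.time()
        window = self.shares[item_id]
        window.append(now)
        while window and now - window[0] > 3600:   # 1-hour sliding window
            window.popleft()
        if len(window) > self.max_rate:
            return "add_friction"   # e.g., nudge: "read before resharing"
        return "allow"
```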
The infomediaries might also be the independent coordinating
locus for takedowns of truly illegal content in ways that protect user rights
of privacy, explanation, and appeal, much as common carriers handle such roles
in traditional telecom services. Criteria here might be relatively top-down
(because takedowns are draconian binaries), in contrast to the bottom-up
rankings of the filtering services (which are fuzzy, not preventing anyone from
seeing content, merely making it less likely to be fed into one’s attention). The infomediaries could better shield these functions from corporate or political interference than
leaving them with the platforms would; they would serve as an institutional layer insulated from platform control. The infomediaries could outsource takedown decision inputs to specialized services (much like email spam-blocking services) that could compete
based on expertise in various domains. Here again, the co-opetition among trusted
peers (and their agents) keeps private data secure.
Note that this can evolve to a more general infrastructure
that works across multiple social media platforms and user subsets. It can also
support higher levels of user communities and special interest groups on this
same infrastructure, so that the notion of independent platforms can blur into
independent groups and communities using a full suite of interaction modalities,
all on a common backbone network infrastructure.
Whatever the operational details, the primary responsibility
for control of personal data would remain with the infomediaries, as data custodians
for the data relating to the users they serve. To the extent that the platforms
and/or filtering services (and other cooperating infomediaries) have access to
that data at all, it could be limited to their specifically authorized transient
needs and removed from their reach as soon as that need is satisfied -- subject
to legal audits and enforcement. That enables powerful filtering based on rich
data across platforms and user populations.
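A minimal sketch of what such a purpose- and time-limited grant might look like (field names are illustrative, not a proposed standard):

```python
# Purpose- and time-limited data access: a grant scoped to one declared
# purpose that lapses automatically and leaves an audit trail for
# legal enforcement. All names here are illustrative assumptions.
import time
from dataclasses import dataclass, field

@dataclass
class AccessGrant:
    grantee: str                  # a filtering service or peer infomediary
    purpose: str                  # the single authorized use
    expires_at: float             # epoch seconds; access lapses after this
    audit_log: list = field(default_factory=list)

    def check(self, requested_purpose: str) -> bool:
        allowed = (requested_purpose == self.purpose
                   and time.time() < self.expires_at)
        # Every access attempt is recorded for later legal audit.
        self.audit_log.append((time.time(), self.grantee,
                               requested_purpose, allowed))
        return allowed

grant = AccessGrant("civic_filter", "spam_ranking",
                    expires_at=time.time() + 900)   # a 15-minute grant
assert grant.check("spam_ranking")                  # authorized purpose
assert not grant.check("ad_targeting")              # denied and logged
```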
This is not unlike how trust has long been enforced in financial
services ecosystems. Is our information ecosystem less critical to our welfare
than our financial ecosystem? Is our ability to exchange our ideas less
critical than our financial exchanges?
[Update 11/8/21:] Feedback from Sandy Pentland (a panelist for the upcoming Data Cooperatives session) led me to the introduction to his new book, which provides an excellent perspective on how this kind of infomediary can evolve, and be distributed in a largely bottom-up way. My description above highlights the institutional role of infomediaries and how they can balance top-down order to serve users -- but Sandy's book suggests how, as these new data technologies mature, they might provide a much more fully distributed blend of bottom-up control and cooperation that can still balance privacy and autonomy with constructive social mediation processes.
[Update
11/10/21:] There
was discussion of data cooperatives as relevant to filtering middleware in the 11/9
HAI middleware session. Panelist Katrina Ligett emphasized the need to consider not only
content items, but the data about the social flow graph of how content moves
through the network and draws telling reactions from users. She referred to
data cooperatives as another kind of middleware, and Ashish Goel also saw
promise in this approach. I will be writing more on that.
NOW ONLINE: Directions Toward Re-Architecting Social Media to Serve Society
------
For additional background, see the Selected Items tab.