A Quick Sketch for Discussion: Formative thoughts on addressing open concerns, posted in anticipation of an 11/9 conference session at Stanford on the “middleware” unbundling proposals. (This also suggests linkage to an 11/10 session on “data cooperatives” at the same event.) [Update: As noted at the end, there was some discussion at the 11/9 conference session that was generally supportive of the directions suggested here.]
Abstract
Recent proposals to unbundle filtering services from social media platforms to better serve user interests have generated support, tempered by concern -- notably about business models and privacy protection. These concerns may be better handled by a two-level unbundling than by the one-level functional unbundling that has been proposed.
Between the platforms and the large numbers of unbundled filtering services that need resources and access to sensitive personal data to filter effectively on their users’ behalf, add a layer with a small number of better-resourced “infomediaries” that are fiduciaries for users. The infomediaries can manage coordination of services, data protection, and revenue sharing in service to user interests, and enable the many independent filtering services to share resources and run their filters in privacy-protected ways.
The time may be ripe for the long-gestating idea of “infomediaries” to emerge as a linchpin for resolving some of the management and control dilemmas we now face with social media. The session with Francis Fukuyama and others that I moderated at the Tech Policy Press event on 10/7, Reconciling Social Media & Democracy: Fukuyama, Keller, Maréchal & Reisman (along with other speakers that followed), generated a wide-ranging discussion of issues with those proposals that provides context for the upcoming Stanford session in which he will participate.
Knotty problems with the “middleware” proposal
The unbundling proposals that Fukuyama, I, and others advocate have been viewed as having considerable appeal in principle, but the 10/7 discussion sharpened many previously raised questions about whether they can work -- questions relating to speech, business models, privacy, compatibility and interoperability, and technological feasibility.
Reflecting on the privacy issues led me to refocus on “infomediaries” as an important part of a solution, and how they might clarify the business model issues, as well. Infomediaries were first proposed in the dot-com era, as agents of consumers that could negotiate with businesses over data and attention, to give consumers control and compensation for their information. The imbalance of power over consumers has grown in the world of e-commerce, but social media have given this even more importance and urgency.
The unbundling proposal is to spin out the filtering of what users see in their newsfeeds from the platforms -- to create independent filtering “middleware” services that users select in an open market to serve as their agents. There is wide agreement that their ad-engagement-driven business model drives social media to promote harmful speech in powerful and dangerous ways. Fukuyama raised an even deeper concern that the concentration of power to control what we each see is a mortal threat to democracy, “a loaded gun sitting on the table” that we cannot rely on good actors to not pick up.
Unbundling of the filtering services would take that loaded gun from the platforms (and those who might coerce them) and reduce its power -- by giving individual users more independent control of what they see in their social media newsfeeds and recommendations. But -- how can those unbundled services be funded, since users seem disinclined to pay for them? -- and how can the filtering services use the personal data needed to do filtering effectively without breaches of privacy?
This problem is compounded because we would want a wide diversity of filtering services innovating and competing for users. Many would be small and under-resourced -- and there is no simple, automated solution to understanding the content they filter and its authority.
- How would they have the resources -- to not only do the basic filtering task of ranking, but also to moderate the overwhelming firehose of harmful content that already taxes the ability of giants like Facebook?
- How would a multitude of small filtering services be able to protect non-public multi-party content, as well as the multi-party personal metadata, needed to understand the provenance and authority of what they filter?
The role of infomediaries
I suggest the answer to this dilemma could be a more sophisticated distribution of functions. Not just two levels -- platform and filtering services (as user agents) -- but three: platform, infomediaries (a few, privileged user agents), and filtering services (many, more limited user agents).
"Infomediaries" (information intermediaries) were suggested in 1997 in Harvard Business Review -- as trusted user agents that manage a consumer’s data and attention and negotiate with businesses over how it is used and for what compensation. Similar ideas resurfaced in a law review article in 2016 as "Information Fiduciaries" and then in HBR in 2018 as "Mediators of Individual Data" ("MIDs").
(As I was writing this, I learned that another session at the Stanford event is on a more recent variant, “Data Cooperatives.” Despite that coincidence, I am unaware that a connection has been made, except for the observation in this recent work that social media data is not individual but “collective.” If the participants at those two sessions are not in communication, I suggest that might be productive.)
Why have infomediaries not materialized in any significant way? It seems network effects and the "original sin of the Internet," advertising, have proven so hugely powerful that infomediaries never got critical mass in commerce beyond narrow uses. (I was CTO from ‘98-‘00 for a basic kind of infomediary service that had some success before the crash.)
But now, with the harms of social media bringing the broader abuses of “attention capitalism” to a head, regulators may see that the only way to systematically limit these harms -- and the harms of attention capitalism more broadly -- is to mandate the creation of infomediaries to serve as negotiating and custodial agents for consumers. They offer a way to enable business models that balance business power with consumer power, especially regarding compensation for attention and data -- in ways that empower users to decide what to allow, for what benefit. They also offer a new solution to protecting sensitive multi-party social media messages and related metadata -- while enabling society to refine and benefit from the wisdom of the crowd that it contains -- to help us manage our attention.
Here is a sketch of how filtering services might be supported by infomediaries. Working out the details will be a complex task that should be guided by a dedicated Digital Regulatory Agency with significant business and independent expert participation.
- Put all personal data of social media users under the control of carefully regulated infomediaries (IMs) who interface with the platforms and the filtering services (FSs), as fiduciary agents for their users. Create a small number of infomediaries (five to seven?) to support defined subsets of users. After that, users would be free to migrate among infomediaries in a free market -- and very limited numbers of new infomediary entrants might be enabled.
- Spin out the filtering services from the platforms – and create processes to encourage new entrants. The infomediaries would cooperate to enable the filtering services to benefit from the data of all qualified infomediaries, while protecting personally identifiable data.
- Empower the infomediaries to negotiate a share of advertising revenue from the platforms on behalf of their users, in compensation for their data and attention -- to be shared with the filtering services (and perhaps the users). Alternatively, provide for a mix of user support or public subsidy, much like existing public media. Ideally that could grow to include user support for the platforms as an alternative to some or all advertising.
- Use regulatory power to work with industry to manage interface standards and the ongoing conduct of these roles and negotiations, much as other essential, complex, and dynamic industries like finance, telecom, transport, power, and other utilities are regulated. Creation of new infomediaries might be strictly limited by regulators, much like banks or securities exchanges.
The virtue of this two-level unbundling architecture is that it concentrates the elements of the infomediary role that have network-wide impact and handle sensitive data in a small number of large, competitive entities that can apply the necessary resources and technology to maintain privacy and provide complex services, with some competitive diversity. It enables the much larger numbers of filtering services that serve diverse user needs to be lean and unburdened.
Because the new infomediaries would be accredited custodians of sensitive messaging data, as fiduciaries for the users, they could share that data among themselves, providing a collective resource to safely power the filtering services.
This could be done in two ways: 1) by providing purpose and time-limited, privacy protected data to the filtering services, or perhaps simpler and more secure, 2) by acting as a platform that runs filtering algorithms defined by the filtering services and returning rankings without divulging the data itself. (More on how that can be done, and why, is below). Either way, the platforms would no longer control or be gatekeepers for the filtering.
This multilevel breakup may sound very complex and raise questions of regulatory power, but it would be very analogous to the breakup of the Bell System, which unbundled the integrated AT&T into a long-distance service (AT&T), seven regional local-service operating companies (the RBOCs), and a manufacturing arm (eventually Lucent), all of which were opened to new competitive entrants, unleashing a torrent of valuable innovation.
As our social media ecosystem becomes the underlying fabric of most human discourse, a similarly ambitious undertaking is not only economically desirable and justifiable, but essential to the survival of democracy and free speech. Functional specialization multiplies the number of entities, but it simplifies the tasks of those entities -- and enables competition, innovation, and resilience. To the fear of technical solutions to social problems that Nathalie Maréchal spoke of, I submit that the problem of algorithms that select for virality (thus exacerbating a social problem) is a newly created technical problem, driven by an incentives problem -- one that this architecture (or some improvement on it) can help solve.
A casual reader might stop here. The following sections dig deeper on how this addresses first, the business model challenges that infomediaries were conceived to solve, and then, the difficult privacy issues of middleware unbundling and other problems that it seems they might help finesse.
----
Looking deeper...
Resolving the business model issues
Even for one who read it then, it is now enlightening to turn the clock back to 1997 and read the original HBR article on infomediaries by John Hagel III and Jeffrey F. Rayport, “The Coming Battle for Customer Information,” for perspective on the current problems of surveillance and attention capitalism. The authors predicted:
In order to help [consumers] strike the best bargain with vendors, new intermediaries will emerge. They will aggregate consumers and negotiate on their behalf within the economic definition of privacy determined by their clients. … When ownership of information shifts to the consumer, a new form of supply is created. By connecting information supply with information demand and by helping both parties involved determine the value of that information, infomediaries would be building a new kind of information supply chain.
A 1999 book co-authored by Hagel greatly expands on this idea (and is also worth a look). It specifically refers to “filtering services” that include or exclude marketing messages to match the needs or preferences of their clients.
Growing due to network effects and scale economies, vendors like Amazon and ad-tech services like Google and Facebook have effectively usurped the vendor side of the infomediary function. These powers are now so entrenched and engorged with obscene profits that there is little hope that infomediaries that do represent user interests can emerge without regulatory action.
The proposal that unbundled filtering services be funded by a revenue share from the platforms has struck critics as implausible and complex. But if that role is not dispersed among large numbers of often-small filtering services, but instead managed by a small number of larger infomediaries with a mandate from regulators, the task may be far more tractable.
Yes, this would be a complex ecosystem, with multiple levels of cooperating businesses for which economically sound revenue shares would need to be negotiated: ad revenues from platforms to infomediaries, to filtering services, and possibly to consumers -- or alternatively, from consumers or sponsors or public funding -- in whichever direction corresponds to the value transfer. But many industries -- such as financial services, ad-tech, telecom, and logistics -- flourish with equally complex revenue shares (whether called shares, fees, commissions, settlements, or whatever), often overseen by regulators that ensure fairness.
Once such a multiplayer market begins to operate, innovation can enable better revenue models. My 2018 article “Reverse the Biz Model” explored some possible variations, and explained how they could work via infomediaries, or directly between business and consumer. It also suggested how consumer funding to eliminate ads on an individual basis could be commensurate with ability to pay. The inherent economics are more egalitarian than one might first think because those with low income have low value to advertisers. They would have to contribute less to compensate for lost ad revenue. Mediated well, users could even benefit from whatever level of non-intrusive and relevant advertising they desire, and platforms would still bring in sufficient funding to disperse through the ecosystem -- perhaps more than now, given that there would be less waste. (Note that filtering services might specialize in advertising/marketing messages or in user-generated content to better address the different issues for each.)
Some fear that having filtering services receive funding from advertising, even indirectly, would continue the perverse incentives for engagement that are so harmful. But revenue shares to the infomediaries and filtering services need not be tied to engagement -- they could be tied to monthly active users or other user-value-based metrics. With a multitude of filtering services, the value of engagement to the platform would be decoupled, so that no individual filtering service would materially affect engagement. These services might be structured as nonprofits, benefit corporations, or cooperatives, to further shift incentives toward user and social value.
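As a toy illustration of the decoupling (all figures and service names are hypothetical), a payout keyed to monthly active users depends only on how many users choose each filtering service, not on how much engagement any item generates:

```python
# Illustrative sketch (hypothetical figures and names): distribute a
# platform revenue pool to filtering services in proportion to monthly
# active users (MAU) rather than engagement, decoupling funding from
# any incentive to amplify viral content.

def mau_based_payouts(pool, mau_by_service):
    """Split a revenue pool across filtering services by MAU share."""
    total_mau = sum(mau_by_service.values())
    return {svc: pool * mau / total_mau for svc, mau in mau_by_service.items()}

payouts = mau_based_payouts(
    1_000_000,  # hypothetical pool negotiated by the infomediaries
    {"fs_news": 60_000, "fs_local": 30_000, "fs_science": 10_000},
)
print(payouts)  # fs_news gets 60% of the pool, and so on
```

Under a metric like this, a filtering service grows its revenue only by attracting and retaining users, not by driving shares or clicks.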
Resolving the privacy issues
The other key opportunity for infomediaries is to manage data privacy. This takes on special significance because key aspects of filtering and recommendations depend on either message content or the metadata about how users interact with those messages -- both of which are often privacy-sensitive. Importantly, as noted by the recent proposals for data cooperatives, that data is not individual, but collective.
Infomediaries may offer a way to finesse the concerns pinpointed in the 10/7 discussion. I suggested that the most promising strategy for filtering to understand quality -- given the limitations of AI and of human review of billions of content items in hundreds of languages and contexts -- is to use the metadata that signals how other users responded to that content. Daphne Keller nicely delineated the privacy concern:
… I think a lot of content moderation does depend on metadata. For example, spam detection and demotion is very much driven by metadata. And Twitter has said that a lot of how they detect terrorist content, isn’t really by the content, it’s by the patterns of connections between accounts following each other or coming from the same IP address or appearing the same– those aren’t the examples they gave, but what I assume they’re using. And I think it’s a big part of what Camille Francois has called the ABC framework, the Actors-Behavior-Content, as these three frameworks for approaching responding to problematic online content.
And I think it just makes everything much harder because if we pretend that metadata isn’t useful to content moderation, that kind of simplifies things. If we acknowledge that metadata is useful, that is often personally identifiable data about users, including users who haven’t signed up for this new middleware provider, and it’s a different kind of personally identifiable data than just the fact that they posted particular content at a particular time. And all of the concerns that I raised, but in particular, the privacy concern and just like how do we even do this? What is the technology that takes metadata structured around the backend engineering of Twitter or whomever and share it with a competitor? That gets really hard. So I’m scared to hear you bring up metadata because that adds another layer of questions I’m not sure how to solve.
This is what drove me to refocus on infomediaries as the way to cut through the dilemma. The platforms could have filtered using as much of this data as they wished, since they now control that data. Similar data is central to Google search (the PageRank algorithm that was the key to its success) -- but search is less driven by engagement than social media.
Privacy has been a sore point for the unbundling of filtering. The kind of issues that Keller raised led Fukuyama and his colleagues to back off from the broadest unbundling and advocate more limited ambitions, such as labelling, that are content-based rather than metadata-based. He points to services like NewsGuard that rate news sources for their credibility. As I have argued elsewhere, that is a useful service, but severely limited because it applies only to limited numbers of established news services (which do represent large amounts of content), not the billions of user-generated content sources (obviously significant in aggregate, but intractable for expert ratings). Instead, I suggest using metadata to draw out the wisdom of crowds, much as Google does. Recent studies support the idea that crowdsourced assessment of quality can be as good as expert ratings, and there is no question that automated crowdsourced methods that draw on passively obtained metadata are far more capable of operating at Internet scale and speed -- the only solution that can really scale as needed.
Thus, it would be a huge loss to society not to be able to filter social media based on interaction metadata -- an infomediary strategy for making that feasible is well worth some added complexity. A manageable number of infomediaries could steward this data to include most (but not necessarily all) users in this crowdsourcing. Each infomediary would have only a subset of the users’ data, but that data could be pooled among properly regulated infomediaries and restricted to use only in filtering.
More technical/operational detail on filtering and data protection
As noted above, and drawing on work on trust and data sharing by Sandy Pentland (one of the speakers in the Stanford Data Cooperatives session), and similar suggestions by Stephen Wolfram (in his 2019 testimony to a US Senate subcommittee), there seem to be two basic alternatives: 1) providing limited, privacy-protected data to the filtering services, or, perhaps simpler and more secure, 2) acting as a platform for running filtering algorithms defined by the filtering services and returning rankings, without divulging the data itself.
Perhaps emerging technologies for secure data sharing (such as those described by Pentland) might allow the fiduciaries to grant the filtering services controlled and limited access to this data. But that is not necessary to this architecture -- as noted above, the simpler solution appears to be that of having the infomediaries act as a platform for running filtering algorithms defined by the filtering services without divulging the data itself. Send the algorithm to the data.
Adapting the approach and terminology suggested by Wolfram, the infomediary retains operational control of the filtering operation, and all the data used for it -- working essentially as a “final ranking provider” -- a fully trusted level of user agent. But the setting of specific criteria for that ranking is delegated to one or more user-chosen filtering services, each operating essentially as a “constraint provider” that instructs that rankings be done in accord with the preferences they set on behalf of their users. (In contrast, the platforms now serve as both constraint providers and final ranking providers -- and users have very little say in how that is done.)
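The "send the algorithm to the data" pattern can be sketched in code. This is a minimal illustration, not any party's actual design: the class names, feature keys, and scoring function are all hypothetical, and a real system would run the delegated logic in a sandbox with auditing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Item:
    """A post plus the private interaction metadata the infomediary
    holds but never releases (feature names are hypothetical)."""
    item_id: str
    features: Dict[str, float]  # e.g. {"source_credibility": 0.9, "share_velocity": 0.1}

# A "constraint provider" (filtering service) supplies only a scoring
# function over abstract features -- it never sees the raw data.
ScoringFn = Callable[[Dict[str, float]], float]

class Infomediary:
    """Acts as "final ranking provider": runs the delegated scoring
    function over its private data and returns only ranked item IDs."""
    def __init__(self, private_items: List[Item]):
        self._items = private_items  # sensitive data stays inside this boundary

    def rank(self, score: ScoringFn) -> List[str]:
        ranked = sorted(self._items, key=lambda it: score(it.features), reverse=True)
        return [it.item_id for it in ranked]  # only IDs leave the boundary

# Example constraint provider: favors credible, low-virality content.
def calm_credible(features: Dict[str, float]) -> float:
    return 2.0 * features.get("source_credibility", 0.0) - features.get("share_velocity", 0.0)

im = Infomediary([
    Item("a", {"source_credibility": 0.9, "share_velocity": 0.1}),
    Item("b", {"source_credibility": 0.2, "share_velocity": 0.9}),
])
print(im.rank(calm_credible))  # ['a', 'b'] -- the credible item outranks the viral one
```

The design choice this illustrates: the filtering service expresses *what* to value, while the data and the ranking computation never leave the infomediary.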
Note that, ideally, these rankings should be done in a composable format, such that rankings from multiple filters can be arithmetically combined into a composite ranking. This might be done with relative weightings that users can select for each filtering service, such as with sliders, to compose an overall ranking drawn from all the services they choose. Users might be enabled to change their filter selections and weightings at any time to suit varying objectives and moods. Thus, users control the filters by choosing the filtering services (and setting any variations they enable), but the actual process of filtering and the data needed for it remains within the cooperating array of secure infomediaries.
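A minimal sketch of the composable-ranking idea (the service names, scores, and weights are invented for illustration): the infomediary combines per-service scores using slider-style weights the user sets for each service.

```python
# Hypothetical sketch: compose one ranking from several user-chosen
# filtering services, weighted by user-set sliders.

def composite_ranking(service_scores, weights):
    """service_scores: {service_name: {item_id: score}}
       weights: {service_name: slider weight, e.g. in [0, 1]}
       Returns item IDs sorted by the weighted sum of their scores."""
    totals = {}
    for service, scores in service_scores.items():
        w = weights.get(service, 0.0)  # unselected services get zero weight
        for item_id, s in scores.items():
            totals[item_id] = totals.get(item_id, 0.0) + w * s
    return sorted(totals, key=totals.get, reverse=True)

scores = {
    "fact_checker": {"a": 0.9, "b": 0.1, "c": 0.5},
    "local_news":   {"a": 0.2, "b": 0.8, "c": 0.6},
}
# This user's sliders: trust fact-checking twice as much as local relevance.
print(composite_ranking(scores, {"fact_checker": 1.0, "local_news": 0.5}))
# ['a', 'c', 'b']
```

Because the combination is simple arithmetic over scores, a user can re-weight or swap services at any time without any service seeing the others' data.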
Back to Keller’s concerns: The boundaries between the platforms and the infomediaries are clear, with well-defined interfaces, much as in any complex, evolving ecosystem. There is nothing shared with competitors, only with partners. It is co-opetition, among trusted peers, on how shared data is used and protected, at what price. The personal data never goes beyond a team of infomediaries, all trusted with purpose-specific portions of one another’s clients’ data. There is no more implementation complexity than in Google’s ad business. It won’t happen in a day, but it is eminently doable -- if we really want it.
Improved functionality for filtering, blocking, and flow control
Consider how this two-level architecture can enable rich functionality with diverse characteristics needed to address the multi-faceted challenges of filtering, blocking, and flow control as we face new technical/social issues like "ampliganda." Growing evidence favors not just filtering (ranking and recommenders) or blocking (takedowns and bans), but flow controls. These include circuit-breakers and other forms of friction that can slow the effects of virality (such as nudging users to take time and read an item before sharing it). The infomediaries could pool their real-time network flow data to serve as the empowered coordinating locus for such measures -- with diversity, and with independence from the platforms.
The infomediaries might also be the independent coordinating locus for takedowns of truly illegal content in ways that protect user rights of privacy, explanation, and appeal, much as common carriers handle such roles in traditional telecom services. Criteria here might be relatively top-down (because takedowns are draconian binaries), in contrast to the bottom-up rankings of the filtering services (which are fuzzy, not preventing anyone from seeing content, merely making it less likely to be fed into one’s attention). The infomediaries could better shield these functions from corporate or political interference than leaving them with the platforms would. They would serve as an institutional layer insulated from platform control. The infomediaries could outsource takedown decision inputs to specialized services (much like email spam blocking services) that could compete based on expertise in various domains. Here again, the co-opetition among trusted peers (and their agents) keeps private data secure.
Note that this can evolve to a more general infrastructure that works across multiple social media platforms and user subsets. It can also support higher levels of user communities and special interest groups on this same infrastructure, so that the notion of independent platforms can blur into independent groups and communities, using a full suite of interaction modalities, all on a common backbone network infrastructure.
Whatever the operational details, the primary responsibility for control of personal data would remain with the infomediaries, as data custodians for the data relating to the users they serve. To the extent that the platforms and/or filtering services (and other cooperating infomediaries) have access to that data at all, it could be limited to their specifically authorized transient needs and removed from their reach as soon as that need is satisfied -- subject to legal audits and enforcement. That enables powerful filtering based on rich data across platforms and user populations.
This is not unlike how trust has long been enforced in financial services ecosystems. Is our information ecosystem less critical to our welfare than our financial ecosystem? Is our ability to exchange our ideas less critical than our financial exchanges?
[Update 11/8/21:] Feedback from Sandy Pentland (a panelist for the upcoming Data Cooperatives session) led me to the introduction to his new book, which provides an excellent perspective on how this kind of infomediary can evolve, and be distributed in a largely bottom-up way. My description above highlights the institutional role of infomediaries and how they can balance top-down order to serve users -- but Sandy's book suggests how, as these new data technologies mature, they might provide a much more fully distributed blend of bottom-up control and cooperation that can still balance privacy and autonomy with constructive social mediation processes.
[Update 11/10/21:] There was discussion of data cooperatives as relevant to filtering middleware in the 11/9 HAI middleware session. Panelist Katrina Ligett emphasized the need to consider not only content items, but the data about the social flow graph of how content moves through the network and draws telling reactions from users. She referred to data cooperatives as another kind of middleware, and Ashish Goel also saw promise in this other kind of middleware. I will be writing more on that.
NOW ONLINE: Directions Toward Re-Architecting Social Media to Serve Society
------
For additional background, see the Selected Items tab.