Faceted Id/entity:
Managing representation in a digital world

danah boyd
MIT Media Lab
Master's Thesis

Thesis document
- Abstract
- Introduction
- Negotiating Identity in Social Interactions
- Reconsidering Social Interaction for the Digital Realm
- Self-Awareness in Social Interactions
- Digital Identity Management
- Example Applications
- Social Network Fragments: A Self-Awareness Application
- SecureId: An Identity Management Application
- Conclusion
- Bibliography


Chapter 5: Digital Identity Management

When we present ourselves to others, we want both awareness of and control over what we present and how we present it. Without the ability to manage the aspects of the self that we perform, awareness is simply a reflective exercise. For people's online interactions to be truly empowered, they must be able to manage the impressions that they construct and the information that they provide. Yet, giving people these abilities is a challenging design task.

Offline, interactions have an ephemeral quality. While initial impressions certainly shape all future negotiations, the data itself does not persist. Thus, future interactions are affected only by memory-driven impressions, not by the constant reemergence of previous interactions. Online, social data is quite persistent. Thus, it is not only the initial impressions that matter, but also how the data from previous interactions persists in an archived, out-of-context form. While the ability to research someone online is quite valuable, finding a young professional's angst-ridden tirades from their early teenage years is not necessarily a valuable or appropriate basis for forming an impression. Yet, with that data archived, the young professional has no way to eliminate that decade-old data and must continually confront the impressions that it renders. Persistent interactions create immense challenges for identity and impression management.

I refer to both identity and impression management because, although they are tangled conceptually, they cover separate ideas that must each be considered. In social interactions, a viewer perceives both the identity information that one is conveying and subtler, underlying cues that leave impressions on the viewer. While the former can include things such as one's occupation and political leanings, the latter is much more difficult to pin down; people leave impressions on others simply through the way they smile. While people aim to leave specific impressions, they must rely on and react to others' perceptions. Impression management, as detailed by Goffman (1959), is the negotiation of leaving and receiving impressions.

Identity management, on the other hand, is concerned with the underlying structure of what is presented about the individual when making impressions. It is the control system behind impression management, since it governs the facets of one's identity that one controls during presentations. Identity management is strongly affected by the impressions that one leaves, but one manages one's identity regardless of those impressions. Impression management, by contrast, is completely tied to the reactions of others; without those reactions, there are no impressions.

While impression management is certainly crucial for identity management and for the construction of oneself online, it requires a level of awareness of others' reactions that is not currently possible online and that lies outside the scope of this thesis. Thus, for the remainder of this chapter, i focus solely on identity management and address the resulting impressions when appropriate. The goal of this chapter is to discuss what, beyond self-awareness, is needed for one to properly manage one's digital identity. I start by discussing why control is necessary online, introduce some of the current systems for digital identity management, and then propose some of my own thoughts on this matter. This chapter prepares the reader for the issues raised by SecureId, an identity management tool for users that is addressed in detail in Chapter 8.

Why control? Why management?

As i discussed in Chapter 2 and Chapter 3, identity management empowers people to regulate their social behavior and engage in more meaningful social interactions. Between the persistence of data, the collapsing of contexts, and the marketability of their identity, people have very little say in how their identity is represented online. For this reason, people desire the ability to manage and control their presentation.

Lacking even basic control over these systems, many people feel immediately disempowered. The market encourages both surveillance and profiling. Online, people cannot access many services without submitting to the profiling requests of corporations; if they do not agree to the terms of service, they have no mechanism to dissent and still use the systems. The data that they provide to one service can be bought and sold, and the terms of service can be changed and enforced without any recourse for the user; users do not own the data that passes through other people's servers.

Yet, users choose to use these systems because they provide a service that people see as valuable. Without realizing how valuable their data is, people are willing to sell it in return for what appear to be free services. Yet, this lack of privacy awareness, together with the absence of any default protection of data, is precisely what worries many privacy experts (Rosen 2000; Garfinkel 2000; Lessig 1999; EPIC 1994). While corporations are increasingly aggressive in requesting profiling data, and privacy experts attempt to educate the public, online participants are working within the systems to provide what they believe to be anonymous or falsified information. The current environment encourages anonymity and deception among users who seek privacy but have no other way to gain access.

People lack control because the architecture makes it easy for the market to seize people's presentations as assets owned by companies. People lack control because they do not realize how valuable their information is, what they are giving away, or how corporations use the data to profit at the expense of individuals. People lack control because they are not aware of their own presentation, let alone what it would mean to have the tools for control. Yet, people's naïveté is not an excuse for the abuse of their privacy. While the law curbs the most egregious abuses of data control, it will not provide the level of protection that users need to develop a rich social environment. Thus, it becomes the responsibility of designers to consider the needs and interests of users and to construct environments that give them the ability to control their data and barter it at will, not on demand.

The value of identity control is not simply autonomy and freedom; it is the underlying structure necessary for people to develop rich social environments. Without the ability to manage one's presentation in a faceted and contextual manner, anonymity will remain the only option for those who seek control. Even the context-creating mechanisms discussed in Chapter 3 provide only a temporary bandage over the growing wound in individual control.

Regulation through federated identity

When users attempt to regain control of their identity online, they do so through anonymity or multiple accounts. Although these mechanisms create control for some, they also provide an environment in which fraudulent behavior, harassment, hate speech, abusive deception, and other less desirable qualities of society can flourish. Not surprisingly, corporations are seeking accountability, if for no other reason than to eliminate the fraudulent abuses that are costing them economically.

Seeing these abuses as intimately tied to users' ability to maintain anonymous and multiple digital personas, there has been a recent push for genuine authentication combined with the elimination of multiple logins. With proposals such as Microsoft's Passport and Sun's Liberty Alliance, corporations are drumming up support for single-login systems as a mechanism to end abuses and ease the hassle that users experience in maintaining numerous accounts. While many of the intentions behind these systems are admirable, the systems not only ignore privacy issues and put users at notable risk, but also fail to accommodate users' need to control their own data and representation. Without serious design reconsideration, such systems run the risk of providing the ideal digital Panopticon, where an authority figure is able to observe every action of all individuals without their knowing what is being observed, when, or for what purpose. As Foucault (1995/1975) recognized, such a structure produces external discipline and control out of fear. Such an environment is not advantageous to social interaction, particularly for marginalized individuals.

Considering Microsoft's Passport

In order to reflect on the design issues of these systems, consider Microsoft's Passport. As the name implies, this system is designed to provide a single access point to many sites on the Internet. Yet, as the metaphor obscures, it is Microsoft, not the user, that holds the information in one's Passport. When users create a Passport, they are asked to provide traditional corporate profiling information: name, email, sex, occupation, income, postal code, etc. In order to gain access to the federated sites that have integrated Passport, users must present their Passport to the site. When authenticating the user's login, the site can also access the profile information that Microsoft has collected about the user. The site may then link this information with its own database and provide the content to its advertisers. Judging from Passport's technical notes, Microsoft does not currently receive any of the information that other companies collect about the user. In addition to the profile data that Microsoft maintains, Wallet, a component of Passport, stores encrypted credit card information about users for their ease of access. Here, too, the metaphor misleads: one does not hold one's Wallet; Microsoft maintains it for the user.

While any site can pay to join the Passport authentication federation, many of the sites that require Passport are Microsoft's own, and not just those that focus on e-commerce. Microsoft's Communities portal, which provides users with Hotmail email access, chatrooms, message boards and instant messaging, requires users to authenticate with Passport. As these technologies are the basis for many people's digital experience, Microsoft can easily associate one's profile data with one's social network, IP address, login habits, and other data. Therefore, regardless of its connections with other sites, Microsoft maintains most of the valuable data about one's digital presentation.

As users can be logged in to only one Passport at a time, it is not simple to maintain separate Passports for separate application contexts. The problem is magnified when users want regular access to instant messaging or email, applications that users tend to leave running for the duration of their connection. With the latest version of Windows, users are limited in the applications and information they can access if they are not logged in to their Passport; thus, when they first start Windows XP, they are actively encouraged to create an account. As this information is integrated into both social and commerce environments, certain information cannot be hidden: a user is unlikely to purchase a book and have it shipped to a false address out of a desire to maintain privacy. Thus, by requiring the user to provide certain accurate and authenticated information in the commerce environment, the system binds the user to convey the same information in the social environment, regardless of its potential impact. Such a scheme provides Microsoft and its collaborators with a system that practically requires users to provide authentic data.

Just as a person with multiple nationalities can hold a passport for each, a digital individual may currently control multiple Passports. While multiple logins give users the ability to present the proper form of identification in the proper scenario, they raise some of the same questions as their physical counterpart. When is it appropriate to present which passport? Once you enter a country with one passport, you must use that one throughout the duration of your stay. What happens when aspects of that passport are considered socially unacceptable? Why can you not travel on multiple passports at once?

Such hassles limit the number of Passports that users are motivated to maintain, as it is quite inconvenient to log off of IM in order to check an email account associated with a separate facet. Therefore, only the highest self-monitors are likely to maintain these distinctions, just as among the few cell phone users who maintain separate SIM cards are gay men (Green et al. 2001) and businessmen who work in both Hong Kong and China (Bell 2001). Those at greatest risk recognize the social and personal consequences.

Although managing separate Passports is a nuisance, it does provide a strict boundary between two different facets of one's identity. At any given time, an individual can present only one facet. This is the strict separation that employers desire, as it keeps employees from surfing the web and checking personal email at work. While this separation more accurately mimics physical-world behavior and an employer's ideal situation, it is not in sync with typical users' behavior, as most users frequently manage unrelated interactions simultaneously.

While it is possible to maintain and manage multiple Passports, this is not encouraged behavior. With security-driven calls for a national ID, both in the United States and abroad, and an increased desire for authentication, it is quite reasonable to assume that it will not always be possible to separate one's identity online. The designers at Microsoft certainly recognize that a system such as Passport is a valuable way to curb unacceptable online behavior, yet they fail to acknowledge that they are also disrupting certain types of beneficial social behavior. With a uniform Passport, a sociable user is required to choose one of two values for "gender": male or female. By default, this marker is accessible to anyone with whom the individual interacts, regardless of the social setting. As i discussed in "Sexing the Internet," this alters the social realm by sexualizing the environment and creating unnecessary expectations, built on poorly constructed mental models drawn from coarse data (boyd 2001). Although such profiling data is ostensibly intended for aggregate use only, even the Federal Trade Commission (2000) recognizes that online profiling must be addressed. The limitations of profile data, particularly static and uniform profile data, are one of the central weaknesses of a system such as Passport.

Perhaps the most problematic impact of Passport is that it eliminates the user's means of context replacement without providing a reasonable alternative. Although users can create multiple accounts if they feel pressure to separate their facets, this system magnifies the difficulty of doing so and does not help provide the contextual information and separation that users are seeking to recover. By creating a uniform login across multiple sites, Passport furthers the collapsing of contexts. Prior to Passport, advertisers could only guess when accounts on different sites belonged to the same person, often through IP address matching or connected email addresses. With Passport, Microsoft does not even need to collect all of the data for it to be collapsed outside of the user's control. Passport requires that users have the same login name for all of the different sites. Thus, any information recorded in the cookies for a given login is guaranteed to belong to the same individual, and mass data collection becomes much simpler.
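To make this linkage concrete, the following is a minimal Python sketch, not a description of how Passport is actually implemented. It assumes two hypothetical sites (`bookstore.example` and `chat.example`) that each log activity under the user's login name. When both logs share one identifier, merging them into a single profile is a trivial grouping operation. For contrast, the sketch also derives a different identifier for each site with HMAC-SHA256, a pairwise-pseudonym technique swapped in here purely for illustration; `link_records`, `pairwise_id`, and the secret value are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical activity logs kept by two unrelated sites under one shared login.
bookstore_log = [{"user": "pat", "event": "bought book"}]
chat_log = [{"user": "pat", "event": "joined support group"}]


def link_records(*logs):
    """Group events from several sites by user identifier (context collapse)."""
    profile = {}
    for log in logs:
        for record in log:
            profile.setdefault(record["user"], []).append(record["event"])
    return profile


def pairwise_id(secret: bytes, site: str) -> str:
    """Derive a stable identifier that differs per site (illustrative HMAC-SHA256)."""
    return hmac.new(secret, site.encode(), hashlib.sha256).hexdigest()[:16]


# With one shared login, both contexts merge into a single profile.
print(link_records(bookstore_log, chat_log))
# {'pat': ['bought book', 'joined support group']}

# With per-site identifiers, the same activity stays in two separate profiles.
secret = b"held-by-the-user-agent"  # assumption: a secret the sites never see
bookstore_pw = [{"user": pairwise_id(secret, "bookstore.example"), "event": "bought book"}]
chat_pw = [{"user": pairwise_id(secret, "chat.example"), "event": "joined support group"}]
print(len(link_records(bookstore_pw, chat_pw)))  # 2
```

The point of the contrast is the asymmetry: a uniform login makes aggregation across contexts trivial for anyone holding both logs, whereas per-site identifiers force a would-be aggregator to obtain the secret first.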

Corporate control of personal data

In 1965, worried about potential unethical abuses of a national databank, the United States Congress decided not to pursue a National Data Center until individual privacy could be guaranteed (Garfinkel 2000: 14-15). With that decision, and the privacy regulations that unfolded in the 1970s, the United States made it difficult for the government to collect and maintain integrated records on its citizens. Yet, there are no restrictions on what the private sector can collect. While government agencies and credit bureaus are required to publicize their algorithms for computing scores and to provide individuals with a mechanism for disputing the data kept on them, the private sector has no such regulations. Corporations do not need to make available the data that they have collected, nor the methods by which they evaluate their users. They do not need to address users' disagreements, nor do they need to correct inaccurate information. As long as the fine print reminds users that their accounts can be terminated at any time, for any reason, corporations can deny service without even offering an explanation (Scheeres 2002). Since users have no alternative to these contracts, they are bound by unregulated restrictions grounded in values that shift at the whim of the site.

Users lack any recourse for dissenting from contracts or challenging the data that has been collected about their behavior. They also lack ownership of their own data. When Google purchased the Usenet archives owned by Deja, it also purchased all of the Usenet content collected there. That content is a collection of public statements made by individuals, yet those words were bought and sold without the permission of their authors. Not surprisingly, users did not appreciate the commodification of their knowledge (Hauben 2002). In order to have their words removed from the archive, users must contact Google directly, either from the address from which the posts were made or by otherwise proving their identity. Google promises to do its best to remove the data, yet it makes no guarantees. Additionally, if another user has directly quoted an individual's statement in the same thread, the individual has no recourse for removing that portion of their content.

Any site that collects data on users can sell that data without the permission of the subjects, and the purchaser does not have to abide by the contracts that users agreed to when they gave the original site permission to use their data. At will, sites may change the contract, sell the data, and deny service without informing the user. For example, when eGroups was purchased by Yahoo!, users were surprised to find that they were locked out of their data unless they provided Yahoo! with a complete profile and agreed to a new terms-of-service agreement. Users who declined found that Yahoo! still owned their data and the archives of their correspondence. In October 2001, various listserv owners were stunned when all of their archives and data were deleted; they were given neither an explanation nor any form of recourse, and all attempts at contacting Yahoo! went unanswered. Consulting the terms of service offered no explanation either, as most of those affected could see no violation there. It was not until a Washington Post article (Cha 2001) was published that these owners even learned why their data had been deleted: Yahoo! had declared them terrorists.

Perhaps the reader is thinking that these might have been terrorist organizations, and perhaps many of them were. However, i am inclined to believe that many victims of this abuse of data ownership resembled my own situation during this time. Of the 20+ listservs that i moderated and the 50+ listservs from which i received messages, two of the most heavily trafficked listservs that i moderated on Yahoo! disappeared without notice in early October. Their topical content was identical, as both were listservs intended for college and worldwide organizers of V-Day productions. V-Day, a non-profit aimed at raising money for organizations working to end violence against women, and its associated listservs had two offending qualities: they discussed helping women in Afghanistan, and they used "pornographic" terms, as V-Day raises money through productions of "The Vagina Monologues." Throughout October, my attempts to get an explanation were ignored. After the Washington Post article was published, i contacted Yahoo! again, explaining what the organization was, what we did and why we were not terrorists. Although i received no response, most of my archives were reinstated within the week. While i was relieved to understand why my listservs had suddenly disappeared, i was horrified to realize how little control i had over the content that i managed. Not only could my access be taken away at a moment's notice, but Yahoo! also continued to own my data after removing my access, such that the data could be restored whenever it suited them.

While tech-savvy users can avoid using corporate services to host their data, no one is free from the impact of this control. When a user sends an email message to a Hotmail account, Microsoft now owns that data on its server. When an archiving system records webpages or Usenet posts, that system owns the data. This lack of control is a matter of ownership as much as privacy, and it affects everyone online.

Approaches to identity management

Lawrence Lessig argues that there are four mechanisms by which behavior can be controlled: the law, the market, the architecture and social norms (Lessig 1999). In Chapter 3, i dissected some of the underlying forces of the digital architecture and explained why that architecture does not provide the means for people to enact socially normative regulation. In the last section, i introduced some ways in which the market regulates social behavior and personal identity. And while the law is only beginning to address issues of cyberspace, it is still entrenched in metaphors that map the physical onto the digital, offering legislation and decisions that fail to acknowledge how the digital architecture is constructing a very different social environment.

In Lessig's model, regulation works best when the various forces are all operating effectively, yet this is not the case online. With the architecture dramatically affecting what is possible, social norms are often ineffective, and the market is capitalizing on these changes while the legal community is not acting as though this space must be regulated differently. Although the law is already starting to shape what counts as acceptable usage (e.g. Intel v. Hamidi) and acceptable architecture (e.g. Napster), its approach to architectural change has focused on protecting corporate interests and copyright, ignoring individual interests and the underlying architecture. For example, when Napster was declared illegal, it was forced to shut down because its architecture promoted the exchange of copyrighted materials. As a result, peer-to-peer networks were built such that no one could be held responsible. In response, ISPs began to regulate their traffic, and most recently, new technologies are being considered to eliminate the ability to copy music and other data. Additionally, bills in Congress (such as the CBDTPA) are attempting to legislate architecture without an understanding of the architectural complexities involved. Thus, the legal impact has mostly been an impetus for system designers to work around the barriers that the law has created.

In any case, the legal approach will handle only the most egregious incidents; it is up to designers to adjust the architecture to give people control. In particular, architects have the opportunity to create environments that promote self-regulation instead of relying on the market and the law to develop or require such construction. In order for people to properly self-regulate, they must be able to manage their representations. Thus, designers must develop identity management systems that authenticate users without also degrading their ability to control their presentation in a meaningful way.

1) In order to empower users, an identity management system should give individuals ownership over their data, its use and its distribution. In effect, people must own the rights to their words, thoughts and data. Copyright and intellectual property (IP) are not simply about protecting registered artists and their managers, but also about the publication of the thoughts of all people.

2) The system should allow users to choose what types of information to reveal, to whom, and when. Individuals should be able to develop and maintain the facets of their identity and have control over the contexts in which those facets are presented. Users should be aware of what can be seen about them and have the ability to adjust that information.

3) Users should have the ability to present the level of information that they consider appropriate. Systems should not require users to share personal data in order to gain access, as this allows for discrimination.

4) Users should have control over the redistribution of their data. If personal data is valuable enough for companies to trade it in return for free services, users should have the right to acquire those services at a price comparable to the value of their data, and users should be compensated for the profits made from their data. No system should aggregate or distribute a user's data without their permission.
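The four principles can be read as constraints on a data model. The sketch below is a hypothetical Python illustration (3.9 or later), not the design of SecureId or of any existing system; the names `Facet`, `Identity`, `reveal`, `audit`, and `may_redistribute` are invented for this example. Each facet lists the audiences the user has opened it to, disclosure reveals nothing by default (principles 2 and 3), the user can see what each audience sees (the awareness half of principle 2), and redistribution is refused without explicit consent (principle 4). Ownership of the underlying data (principle 1) cannot be enforced by code this simple and is simply assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Facet:
    """One facet of a person's identity: a named set of attributes."""
    name: str
    attributes: dict[str, str]
    # Audiences (hypothetical site or group names) the user has opened this facet to.
    audiences: set[str] = field(default_factory=set)


@dataclass
class Identity:
    facets: list[Facet]
    # Parties the user has explicitly allowed to redistribute their data.
    redistribution_consent: set[str] = field(default_factory=set)

    def reveal(self, audience: str) -> dict[str, str]:
        """Return only attributes from facets opened to this audience; default is nothing."""
        visible: dict[str, str] = {}
        for facet in self.facets:
            if audience in facet.audiences:
                # If two open facets share a key, the later facet's value wins.
                visible.update(facet.attributes)
        return visible

    def audit(self) -> dict[str, dict[str, str]]:
        """Show the user what each known audience can currently see."""
        audiences = set().union(*(f.audiences for f in self.facets))
        return {a: self.reveal(a) for a in sorted(audiences)}

    def may_redistribute(self, party: str) -> bool:
        """No aggregation or redistribution without the user's permission."""
        return party in self.redistribution_consent


me = Identity(
    facets=[
        Facet("work", {"name": "A. User", "occupation": "researcher"}, {"employer"}),
        Facet("activism", {"handle": "organizer"}, {"listserv"}),
    ]
)
print(me.reveal("employer"))              # {'name': 'A. User', 'occupation': 'researcher'}
print(me.reveal("advertiser"))            # {}
print(me.may_redistribute("advertiser"))  # False
```

Defaulting to an empty disclosure, rather than to a full profile that the user must prune, is the design choice that distinguishes this model from a uniform profile of the kind Passport exposes.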

Certainly, these ideas are utopian in the current digital era. The architecture does not support such control; data can be easily transferred and copied, such that control over the data is nearly impossible. Yet, as companies develop technology intended to protect copyrighted material, these efforts should be appropriated to afford users the same level of protection as artists. Much of what is needed requires cooperation from the companies that so actively seek to profit from their sole control over users' data. Thus, changes must come at the architectural level, with social and legal support.

In order for architectural changes to be effective, they must be implemented at one of two levels. Either the foundation of the digital environment must be fundamentally altered to allow control over bits, or mechanisms must be placed on top of the current environment to regain control. Although the former is ideal, the latter can be implemented without the cooperation of most corporations. It is with that in mind that i designed SecureId as a prototype for exploring the issues involved in building an identity management tool. As is discussed in detail in Chapter 8, the process of developing SecureId revealed the immense challenges that lie ahead in properly giving users identity management tools. Although i stand behind the theoretical approach outlined above, i realize that it is only embryonic, as much work remains, both conceptually and functionally, to provide users with the proper information.