Assessing intrusiveness of smartphone apps


Assessing Intrusiveness of Smartphone Apps

by

Fan Zhang

Submitted to the Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2012

© 2012 Fan Zhang. All rights reserved.

The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole and in part in any medium now known or hereafter created.

Author: Department of Electrical Engineering and Computer Science, August 22, 2012

Certified by: Hal Abelson, Professor, Thesis Supervisor

Accepted by: Prof. Dennis M. Free


Assessing Intrusiveness of Smartphone Apps

by

Fan Zhang

Submitted to the Department of Electrical Engineering and Computer Science on August 22, 2012, in partial fulfillment of the requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

Abstract

We tackle the challenge of improving transparency for smartphone apps by focusing on the intrusiveness component of assessing privacy risk. Specifically, we develop a framework for qualitatively assessing and quantitatively measuring the intrusiveness of apps based on their data access behavior. This framework has two essential components: 1) the Privacy Fingerprint, a concise yet holistic visual that captures the data access patterns unique to each app, including which types of sensitive data are collected and under which privacy-relevant usage contexts, and 2) an Intrusiveness Score that numerically measures each app's level of intrusiveness, based on real data accesses gathered from empirical testing on about 40 popular Android apps across 4 app categories. Used together, the Privacy Fingerprint and Intrusiveness Score help smartphone users easily and accurately assess the relative intrusiveness of apps during the decision-making process of installing apps. Our study demonstrates that the Intrusiveness Score is especially useful in helping to compare apps that exhibit similar types of data accesses. Another major contribution of the thesis is the identification and quantification of the proportion of accesses that are made while the user is idle. As our preliminary user study will show, this level of idle access activity significantly enhances the profiling potential of apps, increasing their intrusiveness. When quantified, idle access activity has a significant impact on an app's Intrusiveness Score and its relative intrusiveness ranking within a given app category.

Thesis Supervisor: Hal Abelson

Title: Professor


Acknowledgments

This thesis would not have been possible without the help and support of many people.

First and foremost, I would like to thank my advisor, Hal Abelson, for his guidance and encouragement throughout these past years. His shrewd insights and constructive critiques have pushed me to refine my ideas and turn them into more compelling arguments. He has always been very supportive, believing in me even when I was faltering.

Fuming Shih has been a wonderful source of ideas and illuminating discussions. I thank him for his numerous insights and patience through our long weekly brainstorming sessions, where our wandering thoughts crystallized into fully formed ideas. I appreciate his constant encouragement and his contagious enthusiasm. The novel ideas presented in this thesis are inspired by our talks.

Wen Dong helped me tremendously in processing the location data and turning them into much more appealing visuals and animations. I thank him for all his thorough explanations and useful tips.

Brian Patt was an enormous help in the initial stages of this thesis. He provided much valued expertise in navigating through undocumented Android Source Code. His advice was always very useful, and without his help, I would not have been able to move forward from the implementation stages.

K. Krasnow Waterman was a great source of background knowledge during the initial phases of framing the thesis. I thank her for helping me find a suitable topic that was worth studying.

My friends have provided me with constant emotional support. I thank them for their understanding, even when I disappeared for long stretches of time to work on this thesis, and for being there to entertain me during my much needed breaks.

Finally, I'd like to thank my parents for their unwavering support and encouragement in all my endeavors. They have always been patient and kind, even when I was undeserving, and this thesis is a tribute to them.


List of Figures

2-1 Intrusiveness Assessment Interface

4-1 AppWindow Architecture

6-1 Privacy Fingerprint for Badoo - Meet New People

7-1 Heatmap of locations accessed by Google Maps over 4 weeks, collected while user was idle

7-2 The user's level of activity is estimated by the average number of data accesses logged by AppWindow each day.

7-3 User activity is estimated by the average number of data accesses logged by AppWindow each hour.

7-4 Interaction patterns over a week are estimated by counting up the number of times a given bluetooth device is logged by AppWindow each day.

7-5 Interaction patterns over a day are estimated by counting up the number of times a given bluetooth device is logged by AppWindow each hour.

8-1 Privacy Fingerprint showing the data access behavior for Angry Birds

A-1 Privacy Fingerprint showing the data access behavior for Angry Birds

A-2 Privacy Fingerprint showing the data access behavior for Bow Man

A-3 Privacy Fingerprint showing the data access behavior for Cut the Rope

A-4 Privacy Fingerprint showing the data access behavior for Fishing Star

A-5 Privacy Fingerprint showing the data access behavior for Flow

A-6 Privacy Fingerprint showing the data access behavior for Fruit Ninja

A-7 Privacy Fingerprint showing the data access behavior for Paper Toss app

A-8 Privacy Fingerprint showing the data access behavior for Scramble Free

A-9 Privacy Fingerprint showing the data access behavior for Temple Run

A-10 Privacy Fingerprint showing the data access behavior for UNOO

A-11 Privacy Fingerprint showing the data access behavior for Camera360 Ultimate

A-12 Privacy Fingerprint showing the data access behavior for FxCamera

A-13 Privacy Fingerprint showing the data access behavior for Instagram

A-14 Privacy Fingerprint showing the data access behavior for Photo Grid

A-15 Privacy Fingerprint showing the data access behavior for PicsArt - Photo Studio

A-16 Privacy Fingerprint showing the data access behavior for Pudding Camera

A-17 Privacy Fingerprint showing the data access behavior for Facebook Messenger

A-18 Privacy Fingerprint showing the data access behavior for GO SMS Pro

A-19 Privacy Fingerprint showing the data access behavior for Google Voice

A-20 Privacy Fingerprint showing the data access behavior for Handcent SMS

A-21 Privacy Fingerprint showing the data access behavior for Kakao Talk

A-22 Privacy Fingerprint showing the data access behavior for Pinger

A-23 Privacy Fingerprint showing the data access behavior for Skype

A-24 Privacy Fingerprint showing the data access behavior for WhatsApp

A-25 Privacy Fingerprint showing the data access behavior for Badoo - Meet New People

A-26 Privacy Fingerprint showing the data access behavior for Bump

A-27 Privacy Fingerprint showing the data access behavior for Facebook

A-28 Privacy Fingerprint showing the data access behavior for Foursquare

A-29 Privacy Fingerprint showing the data access behavior for Google+

A-30 Privacy Fingerprint showing the data access behavior for WhatsApp

A-31 Privacy Fingerprint showing the data access behavior for MeetMe - Meet New People

A-32 Privacy Fingerprint showing the data access behavior for Skout


List of Tables

2.1 Android Permissions for Hypothetical Social Hangout Apps

4.1 List of Sensitive Datatypes logged by AppWindow

6.1 Intrusiveness Weights for Sensitive Access Contexts

6.2 Distribution of access contexts for Badoo

7.1 Percentage of Idle Access Activity

8.1 Ranking of App Categories by Average Intrusiveness

8.2 Average Intrusiveness Subscores by Category

8.3 Ranking of App Categories by Average Intrusiveness Subscores

8.4 Ranking of Game Apps by Decreasing Intrusiveness Score

8.5 Intrusiveness Subscore Breakdown of Game Apps

8.6 Rankings of Game Apps by Intrusiveness Subscores

8.7 Effect of Idle Access Activity on Overall Intrusiveness Ranking of Games Apps

8.8 Intrusiveness Scores for Photography Apps

8.9 Intrusiveness Subscore Breakdown of Photography Apps

8.10 Ranking of Photography Apps by Intrusiveness Subscores

8.11 Effect of Idle Access Activity on Overall Intrusiveness Ranking of Photography Apps

8.12 Intrusiveness Subscore Breakdown of Messaging Apps

8.14 Effect of Idle Access Activity on Overall Intrusiveness Ranking of Messaging Apps

8.15 Social Apps Ranked by Decreasing Intrusiveness Score

8.16 Intrusiveness Subscore Breakdown of Social Apps

8.17 Ranking of Social Apps by Intrusiveness Subscores

8.18 Effect of Idle Access Activity on Overall Intrusiveness Ranking of Social Apps


Chapter 1 Introduction

1.1 Background

In today's world, where data is the new currency, social networks the new playing ground, and mobile phones the new vehicle for information exchange, privacy is becoming obsolete. As consumers increasingly rely on smartphones for their daily needs, more of their personal information is placed onto mobile devices, providing apps and third-party advertisers with a wealth of privacy-sensitive information to mine for profiling user behavior. Indeed, numerous studies [7, 20, 4] led by academic researchers and investigative journalists alike have confirmed consumer fears over potential privacy breaches of their data.

Most consumers voluntarily offer their personal data in return for free and useful services, oftentimes without sufficiently realizing the privacy risks that accompany such personal information disclosures. This lack of understanding, however, can hardly be blamed on app users. They have been brought up in an environment that offers users little choice and control over what types of data they disclose, and little visibility into how the data are used once obtained.

Various efforts have been made by app developers and mobile platforms like iOS and Android to increase transparency in order to gain user trust, but most of these mechanisms are ineffective or, at best, incomplete. Android's permission framework requires developers to declare the sensitive resources (fine-grain location, device id, personal contacts, etc.) required to run their application software, asking smartphone users for all the necessary permissions upon installation. However, permission descriptions are rarely read by users, and even when they are, rarely understood, most likely due to their length and abundant technical jargon [10]. This lack of user interest and comprehension may be motivated by the fact that users have little choice in installing an Android app. They must accept all permissions, however overreaching, or be precluded from using the app altogether.

iOS conducts its own review process for content, quality, and security, but its criteria are available only to developers and not for public perusal. Thus, it does not enhance transparency for users. Given that numerous popular iPhone apps [4, 20] were exposed as collecting sensitive information (device id, contact data) unnecessary for app functionality, it is doubtful how effective Apple's review process actually is at filtering out rogue apps. Currently, iPhone apps are only required to present notifications for user consent to gather location information, while accesses to other types of potentially sensitive data (device id, contact info, etc.) occur in the background, unbeknownst to the user. Nevertheless, the notification scheme is received positively by consumers, although it often fosters a false sense of assurance, as many users subsequently mistake location to be the only type of personal data collected by the app [15].

Both platforms also support Privacy Policies posted by app companies, but these policies are often consumed by legalese, rendering them inaccessible to the typical user who is eager to explore the functionalities of a new app and does not have the patience to sift through 60-page documents in fine print (and even finer print on a tiny smartphone screen) about legal matters.

1.2 Our Research

With privacy breaches (intentional or not) hitting the news every couple of months and a lack of supporting transparency structures that effectively educate consumers and assuage their privacy concerns, there is a compelling need for mechanisms that clearly and concisely help users gain a holistic and accurate understanding of the sensitive access behaviors of their apps, empowering them to make better-informed decisions about which apps to install.

We tackle the challenge of improving transparency for smartphone apps by focusing on the intrusiveness component of assessing privacy risk. Specifically, the purpose of this thesis is to provide a framework for qualitatively assessing and quantitatively measuring the intrusiveness of apps based on their sensitive data access behavior, so that users can easily and effectively assess and compare the relative intrusiveness of apps. This work is accomplished in three high-level steps, each motivating a different component of our study:

1. We determine the types and patterns of data accesses performed by apps.

2. We qualitatively assess the intrusiveness of such data collections.

3. We construct a framework to quantify the intrusiveness of apps so that their intrusiveness can be easily assessed and compared by users.

The first step is accomplished by building AppWindow, a context-aware information monitoring system constructed from existing Android Open Source Code that logs every instance of privacy-sensitive data accessed by an app (see Table 4.1 in Chapter 4 for a comprehensive list of the sensitive datatypes logged by AppWindow). We also log contextual information for each data request, because it is crucial for appropriately assessing the level of intrusiveness of each data access (see Section 4.2 for the list of context information logged by AppWindow).

The second step is reached by the first of two experimental studies: the Qualitative Intrusiveness Study. We use this study to confirm the prevalence of data accesses performed under unexpected, and thereby intrusive, conditions. To justify the importance of this study, we assess the privacy implications of long-term intrusive accesses. We do this by performing aggregate analysis on data collected over one month from one smartphone user, simulating the powerful inferences that can be gleaned by an app about the user's movement, activity, and interaction patterns based on the location data collected by the app and accompanying contextual information.


The third step is achieved by the Quantitative Intrusiveness Study. We handpick about 40 Android apps from the top 20 apps listed in each of 4 app categories (Games, Messaging, Photography, Social) from the Google Play Store and rigorously test each app under a set of predefined usage conditions, using AppWindow to monitor all sensitive data accesses. We break down an app's data accesses into a set of access contexts, assigning each a different intrusiveness weight based on the type of sensitive data accessed and the privacy-relevant usage contexts under which the data was collected. We use these weighted access contexts to generate a Privacy Fingerprint, visually capturing the sensitive data access patterns unique to each app. We then quantify the intrusiveness of these data accesses by computing a single-number Intrusiveness Score based on a normalized sum of the weighted access contexts, so that users can easily and accurately assess the relative intrusiveness of apps when they choose which ones to install (see Section A in the Appendix for the Privacy Fingerprints generated for all the tested apps).
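To make the notion of an access context concrete, the following minimal Python sketch groups logged accesses by data type and usage condition and reports each context's share of all accesses, along with the overall idle share. It is an illustration only, not AppWindow's code: the record layout (`Access` with `data_type`, `moving`, `idle`) and the helper names are assumptions introduced here.

```python
from collections import Counter
from typing import NamedTuple


class Access(NamedTuple):
    """One logged sensitive data access (hypothetical record layout)."""
    data_type: str  # e.g. "gps", "contacts", "device_id"
    moving: bool    # was the user moving when the access occurred?
    idle: bool      # was the user idle (not interacting with the app)?


def context_proportions(accesses):
    """Map each distinct access context to its share of all accesses."""
    counts = Counter(accesses)  # identical records fall into one context
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def idle_fraction(accesses):
    """Fraction of accesses made while the user was idle."""
    if not accesses:
        return 0.0
    return sum(a.idle for a in accesses) / len(accesses)


# Made-up log for illustration; real logs would come from empirical testing.
log = [
    Access("contacts", moving=False, idle=True),
    Access("contacts", moving=False, idle=True),
    Access("gps", moving=True, idle=False),
    Access("device_id", moving=False, idle=False),
]
proportions = context_proportions(log)
idle_share = idle_fraction(log)
```

The per-context shares are the kind of quantity a fingerprint-style visual could plot, and the idle share corresponds to the idle access percentage discussed throughout the thesis.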

1.3 Contribution

Together, the Privacy Fingerprint and Intrusiveness Score make significant improvements to existing transparency mechanisms in the following ways:

Our framework quantifies an app's intrusiveness.

Current transparency mechanisms (e.g., Android Permissions Model, TaintDroid) focus on qualitatively assessing an app's potential privacy risk by identifying sensitive data accesses. Our research probes deeper into an app's access behaviors, quantifying the intrusiveness of access contexts based on various privacy-relevant usage factors. We use these access contexts, weighted by intrusiveness, to compute a single-number Intrusiveness Score, which can be used to efficiently compare the relative intrusiveness of apps.


Unlike Android Permissions Lists and official app Privacy Policies, which provide overly technical and ambiguous cues about possible sensitive data accesses, the Intrusiveness Score is computed based on an app's real access behavior collected under various privacy-relevant usage contexts, using the empirical testing data logged by AppWindow.

The Privacy Fingerprint and Intrusiveness Score are concise representations.

The visual representation of data access patterns and single score for intrusiveness are concise and intuitive for users to understand. Users are no longer required to sift through inaccessible technical jargon and lengthy Privacy Policies and Terms of Service agreements to understand what kind of data apps are collecting and under which privacy-relevant usage conditions they are collecting it.

The Intrusiveness Score accounts for various intrusive usage contexts.

We quantify intrusiveness not only as a function of the sensitivity of the data accessed, but also of the sensitivity of the usage contexts under which the data is accessed. For example, data retrieved while a user is idle is considered intrusive, because the access is likely unexpected by the user. We also deem data accessed while the user is moving more intrusive than data retrieved while the user is not moving, because the app is able to glean more about a user's movement profile under the former condition.

1.4 Thesis Roadmap

In this thesis, we present both the technical and conceptual framework for assessing the intrusiveness of smartphone applications, identifying important intrusiveness factors and evaluating their effect on the relative intrusiveness of apps. We first motivate the need for a better transparency framework by presenting a concrete scenario that illustrates the limitations of current mechanisms, namely Android's Permission Model. We follow this hypothetical yet realistic scenario-driven discussion with a comprehensive review of related work, including user studies on the effectiveness of existing transparency frameworks, information monitoring frameworks, access control frameworks, context-aware privacy frameworks, and an approach for quantifying data access patterns of apps. We highlight the contribution and limitations of each study.

Next, we take a step back from the conceptual discourse and delve into the implementation details behind AppWindow, explaining how it was built and how it monitors the data access activities of applications. We use AppWindow to log the sensitive data requests made by apps in our two experimental studies (Qualitative Intrusiveness Study, Quantitative Intrusiveness Study), and we describe the methodology and experimental design decisions for each. Key findings from each study are then presented and discussed. For the Qualitative Intrusiveness Study, we present the powerful inferences that can be made by apps about a smartphone user's movement, activity, and interaction patterns. For the Quantitative Intrusiveness Study, we explain how the Intrusiveness Score is computed, justifying the inclusion of various contributing factors. We compare and contrast the Intrusiveness Scores of various apps in four categories (Games, Photography, Messaging, Social), highlighting trends in intrusiveness within and across app categories. Most notably, we find that Messaging apps are the most intrusive of the four app categories, whereas Photography apps, on average, are far less intrusive. We also find that most of the tested apps, especially in the Messaging and Social categories, exhibit substantial idle access activity, which significantly affects the relative intrusiveness rankings of apps within each category. Finally, we summarize our findings and qualify our contributions, suggesting areas for improvement and further research.


Chapter 2 Motivating Scenario

We describe a scenario that illustrates the limitations of the Android Permission Model in promoting app transparency to users. We walk through the internal motivations and concerns of one hypothetical Android user named Anna when she is searching for an app that recommends fun hangout places for her and her friends. We then discuss how the Privacy Fingerprint and Intrusiveness Score can be used to assuage her privacy and usability concerns, enabling her to make a more informed, confident decision in choosing which app to install.

Note to reader: Though hypothetical, our scenario reflects the concerns of users interviewed in numerous studies conducted to assess the effectiveness of Android's Permission Model [10, 15, 14, 9]. We focus on Android apps for the scenario because our current implementation of AppWindow is tied to the Android framework. However, our conceptual framework for the Privacy Fingerprint is generalizable to any smartphone platform.

2.1 Scenario Description

As a new freshman in college, Anna is interested in exploring the party scene, but she and her new friends are all unfamiliar with the area. Anna's parents just bought her an Android smartphone as a going-away present, so she decides to use an app to find popular hangout places in the city. Anna knows there are several well-known apps that provide this service, so she goes to the Google Play Store to find the best one. Anna types "social hangout" into the search bar, and she is presented with a list of candidate apps. She focuses on the top 3: FriendHangout, PlacesForYou, and HotSpots. Each has more than 100,000 Android smartphone users. Anna reads the reviews for all three. They are mostly positive, and the features look similar. All three are rated 4.5 stars, but HotSpots has the fewest users. Anna still can't make a decision based on these metrics, so she clicks on another tab - "Permissions" - to see what extra information it can offer. Here is a summary of what she sees:

Table 2.1: Android Permissions for Hypothetical Social Hangout Apps

FriendHangout              PlacesForYou               HotSpots
(123,143 users)            (145,345 users)            (121,341 users)
Coarse location            Coarse location            Coarse location
Fine location              Fine location              Fine location
Contact data               Contact data               Contact data
Calendar data              Send SMS                   Change wifi state
Device id                  Record audio
Browsing data              Take pictures and videos
Modify/delete USB storage  Coarse location
and SD card content

Anna notices that all three apps access fine and coarse location and contact data. Beyond that, they all request different information, but she does not know how to assess the further permissions. There are descriptions displayed under each permission of varying length, but she tires of reading them because most of them are rather long and technical. For example, she sees the following description for fine location permission (as opposed to coarse):

Access fine location sources such as the Global Positioning System on the tablet, where available. Malicious apps may use this to determine where you are, and may consume additional battery power. Access fine location sources such as the Global Positioning System on the phone, where available. Malicious apps may use this to determine where you are, and may consume additional battery power.

Anna only reads the first two sentences and then quickly loses focus because the description is too long. The word "malicious" strikes her as a bit disconcerting, but she reasons that the app is safe because almost all of the popular apps require the Fine Location permission. She glances over the other permissions required by the three apps under consideration and is confused by the practical implications of "Modify/delete USB storage and SD card content" and "Change wifi state".

Eager to try out the app, Anna decides to install PlacesForYou because it has the most users and received good ratings. She also likes the idea that it allows users to send SMS and share pictures and videos (as deduced from the Permissions List). Though the permissions would give the app access to much of her personal data (contacts, SMS, pictures, location), Anna reasons that given its popularity and high ratings, the app cannot be doing anything too egregious, lest the company lose consumer confidence. She also naively assumes that a big and reputable company like Google must perform some sort of review process for all the apps uploaded to the Google Play Store, as Apple does for iPhone, in order to filter out malicious and rogue apps that may try to sell her personal data for money.

2.2 Scenario Discussion

Anna's experience demonstrates the ineffectiveness of the Android Permissions List in conveying an app's privacy risk. Indeed, the permissions list barely raises any privacy concerns because Anna bases her trust on the app's positive reviews, popularity, high ratings, and Google's reputation [15]. If anything, she uses the list as an indicator of features, not privacy risk. Furthermore, the Android Permissions List, though intended to provide greater transparency, is inaccessible and does not effectively promote understanding about an app's access activities. On the contrary, the interface actually facilitates misunderstanding as a consequence of users' resulting ignorance. Without a clear grasp of the apps' sensitive access behaviors, Anna falls back on a false sense of security, causing her to misplace trust in the Android platform and its applications. To summarize, we identify the following limitations of the Android Permission Model as barriers to user understanding:

• Long descriptions (Don't want to read it)

• Technical jargon (Don't understand it)

• Naive trust in platform (Might as well trust it)

• Hard to compare apps (How do I pick one over the other?)

In the next section, we show how the Privacy Fingerprint and Intrusiveness Score can be used to overcome these hurdles, instilling greater knowledge and assurance in smartphone users over the sensitive behavior of their apps.

2.3 Benefit of Privacy Fingerprint and Intrusiveness Score

We argue that when used together, the Privacy Fingerprint and Intrusiveness Score are more effective than the existing Android Permission Model at promoting transparency. We cite the following reasons:

• Concise (Very little explanatory text, primarily a visual and a numerical score)

• Minimal technical jargon (Focus on intrusiveness and not understanding technical terms)

• Informed trust (More understanding about how apps access data)

• Enables easy comparison of apps (Intrusiveness Scores are easy to compare)

Imagine that instead of perusing a lengthy Permissions List cluttered by verbose descriptions, Anna clicks on a tab called "Intrusiveness Assessment" for PlacesForYou. The first thing that catches her eye is the big, bolded Intrusiveness Score displayed at the top right corner. Not knowing what this number means, she quickly scans the page and finds the comparison chart immediately below it, with an arrow pointing to the current app under consideration. There, she finds that PlacesForYou is the most intrusive out of the apps that offer similar services. Whereas FriendHangout is only two spots behind, HotSpots is ranked significantly lower in intrusiveness. Thus, the Intrusiveness Score, when combined with the comparison chart, gives Anna a good sense of the relative intrusiveness of social hangout apps.

Figure 2-1: Intrusiveness Assessment Interface (shown for PlacesForYou: a Privacy Fingerprint with an axis labeled proportion of data accesses; Intrusiveness Score 18.7; a "Compare with Similar Apps" chart listing intrusiveness ranking, app name, and Intrusiveness Score: 1, PlacesForYou, 18.7; 3, FriendHangout, 17.2; 20, HotSpots, 11.0)

Although Anna finds the score comparison helpful, she does not know what this "intrusiveness" means, nor how it is calculated. She now looks to the visual representation (Privacy Fingerprint) of the app's data access behavior displayed on the left-hand side of the interface. The long bands near the bottom of the fingerprint indicate that PlacesForYou spends the most time accessing contact information. Since the bands are at the bottom, they also represent more intrusive accesses. Anna quickly glances over the other data types displayed in bold, and finds that the app also accesses gps, phone number, and device id. Moving up the visual, she is surprised, and a bit spooked, to find that much of this information is gathered when she's not even using the app, with 61% of the accesses taking place while the user is idle. She considers this practice to be sneaky and unwarranted and wonders how this access behavior compares to those of FriendHangout and HotSpots, which rank lower in intrusiveness. Anna now clicks on the link for HotSpots in the comparison chart at the bottom right-hand corner, which takes her to a similar interface. There (not shown), she finds that HotSpots only accesses device id information and has an idle access percentage of 10%, significantly lower than PlacesForYou's 61%. Now, Anna has a better intuitive understanding of why PlacesForYou was ranked so much higher in intrusiveness than HotSpots. HotSpots accesses fewer pieces of sensitive information, as shown by the smaller number of bands in the Privacy Fingerprint (1 vs. 5). Furthermore, HotSpots's access of device id is not as intrusive as the accesses of contact info, phone number, and gps performed by PlacesForYou (indicated by its higher vertical placement in the fingerprint). Finally, most of the sensitive information gathered by HotSpots is collected while the user is active, unlike PlacesForYou, which accesses the majority of its data while the user is idle.

Anna now considers all the information she has at hand about each of the three apps - comparable reviews, comparable ratings, comparable popularity (although PlacesForYou has about 20,000 more users than HotSpots), but vastly different Intrusiveness Scores. The Intrusiveness Assessment Interface has helped to differentiate the otherwise similar apps on one important factor - intrusiveness, thereby simplifying Anna's decision-making process. Instead of going with PlacesForYou based on her original rationale of greater popularity, she now compromises on that factor in favor of reduced intrusiveness and chooses to install HotSpots. Her increased understanding of the apps' access behaviors not only leads her to the surprising realization that apps access data while users are idle, but also gives her comfort and assurance that the app she has chosen is not accessing her more personal information (SMS, phone number, etc.) behind her back and sending it to advertisers. Anna now feels confident that she has installed an app that provides great performance and usability, with minimal intrusiveness.


2.4 Descriptions of the Privacy Fingerprint and Intrusiveness Score

We spent the last section walking through a scenario and explaining how the Privacy Fingerprint and Intrusiveness Score can improve the transparency of sensitive app behavior, aiding in quicker, more accurate risk assessment during the decision-making process of choosing apps. Now, we provide a brief discussion of what these components are and how they are generated. The step-by-step details of this computation are further explained in Chapter 6.

The Intrusiveness Score captures the aggregate intrusiveness of data accesses performed by the app in a single, easily remembered number. This concise yet potent metric helps smartphone users make quick yet accurate assessments about the relative intrusiveness of apps, facilitating the decision-making process. To compute the Intrusiveness Score, we divide the data accesses of each app into unique access contexts and assign each context an intrusiveness weight based on various factors that contribute to intrusiveness. These factors include the sensitivity of the data collected (determined by how personally identifiable a particular data type is), the percentage of total accesses accounted for by that particular access context, and the privacy-relevant usage conditions under which the data is accessed (user moving vs. user not moving, user active vs. user idle). The intrusiveness formula then distills these factors into one number, so that the intrusiveness of apps can be easily compared.
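The actual formula is developed in Chapter 6. The following is only a minimal sketch of how the three factors named above might be combined. The sensitivity values, the idle and moving multipliers, and the multiplicative form of the weight are all illustrative assumptions rather than the weights used in this thesis.

```python
from collections import Counter
from typing import Iterable, Tuple

# An access context: (datatype, user_idle, user_moving).
Context = Tuple[str, bool, bool]

# Hypothetical sensitivity values (higher = more personally identifiable).
SENSITIVITY = {"device id": 1.0, "gps location": 3.0, "contact phone number": 4.0}

# Hypothetical multipliers for the privacy-relevant usage conditions.
IDLE_FACTOR = 2.0    # applied when the user is idle rather than active
MOVING_FACTOR = 1.5  # applied when the user is moving


def context_weight(context: Context) -> float:
    """Intrusiveness weight of one access context (assumed multiplicative form)."""
    datatype, user_idle, user_moving = context
    weight = SENSITIVITY[datatype]
    if user_idle:
        weight *= IDLE_FACTOR
    if user_moving:
        weight *= MOVING_FACTOR
    return weight


def intrusiveness_score(accesses: Iterable[Context]) -> float:
    """Sum of context weights, each scaled by its share of all accesses."""
    counts = Counter(accesses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(context_weight(c) * n / total for c, n in counts.items())


# Example: an app that mostly reads location while the user is idle.
accesses = [("gps location", True, False)] * 3 + [("device id", False, False)]
print(intrusiveness_score(accesses))
```

With these made-up weights, the idle location context contributes 6.0 x 3/4 and the device id context contributes 1.0 x 1/4, so an app dominated by idle accesses to sensitive data scores higher than one that only reads the device id while the user is active.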

Accompanying the numerical metric of intrusiveness is its necessary complement, the Privacy Fingerprint. It is used as an explanatory visual aid for the Intrusiveness Score, efficiently capturing the rationale behind each app's Intrusiveness Score to facilitate understanding and aid the decision-making process in cases of vacillation caused by consideration of other app factors besides intrusiveness (e.g., performance, usability, popularity). Generated from empirical testing data collected under a set of predefined usage conditions, the Privacy Fingerprint provides a concise yet holistic representation of the data access patterns unique to each app. By displaying the breakdown and intrusiveness of each of an app's data access contexts (composed of the type of sensitive data collected and the privacy-relevant usage conditions under which it is obtained), the Privacy Fingerprint allows smartphone users to quickly grasp the various factors that lead to different and similar Intrusiveness Scores. Thus, the fingerprint provides Anna with greater transparency and understanding into an app's access behavior, details which are otherwise masked by the Intrusiveness Score.

To summarize, this newly proposed Intrusiveness Assessment Interface for smartphone apps provides a transparency mechanism that is concise, informative and holistic, and enables easier comparison across apps, a significant improvement upon existing tools (e.g., Android Permissions List). By encouraging user engagement, these improved features increase the likelihood that users like Anna will more accurately understand the access behavior of apps and appropriately assess and compare the intrusiveness of apps during their decision-making process.

In the long term, we hope increased engagement with the Intrusiveness Assessment Interface will push app developers to minimize the frequency and volume of sensitive data they collect in order to maintain a competitive advantage over similar app companies.


Chapter 3 Related Work

Here we discuss work that has been done in the field of smartphone transparency and permission control. We first motivate the need for better transparency mechanisms by discussing the ineffectiveness of existing transparency models. Then we examine proposed solutions to this problem, highlighting their contributions and their limitations and explaining how our study differs from each. We then look at quantitative approaches that can be used to aid our work in developing a framework for measuring intrusiveness.

3.1 Current Transparency Mechanisms

Recent user studies have shown that existing transparency frameworks on the Android and iOS platforms are largely ineffective in conveying an app's true actions, motivating the need for improved transparency mechanisms that will increase user understanding and facilitate more accurate risk assessment of apps.

Felt et al. [10] interviewed and observed 25 Android users to assess the effectiveness of Android permissions at warning users about the potential risks of installing applications. These studies showed that only 17% of participants even looked at permissions. Moreover, only 21% of those who did accurately understood their content. 42% were completely unaware of Android's Permission Lists, leading the authors to conclude that current Android permissions warnings do not help most users make security decisions. While the platform's goal is to "inform the user of the capabilities [their] applications have" [1], the study's findings suggest that the permissions system is an ineffective way of providing transparency to users.

King [15] interviewed 24 iPhone and Android users about their privacy expectations of both their apps and the smartphone platform. The author found there was a noticeable "privacy gap" among "users' privacy expectations, smartphone usage, and the current information access practices by application developers". The interviews showed that the majority of Android users demonstrate an over-reliance on existing assurance structures, wrongly believing that Google reviews all applications for quality and security control before posting them in the Google Play Store for users to download. Furthermore, only 2 out of the 13 Android users felt they understood what the permissions granted after reading them, suggesting the language may have been ambiguous or overly technical. Despite their lack of understanding, however, the majority of the Android interviewees expressed that they liked being asked permission about personal data collection. This response indicates that transparency efforts are indeed appreciated by users. King's study also showed that participants were more comfortable sharing their real-time location, device ID, and location history than photos, address book, call logs, text messages and files stored on SD cards, with iPhone users selecting location twice as often as Android users. The author posits that the iOS runtime prompt for accessing location has made iPhone users more aware of (and perhaps less sensitive to) location requests. Overall, King's study shows that current transparency efforts, though appreciated by users, have created confusion and caused them to overestimate the actual amount of assurance provided by platforms in protecting users against unscrupulous apps.

Kelley et al. [14] also conducted interviews with 20 Android users in two American cities to assess their understanding of Android permissions. The study found that while users generally view and read the Android permissions list before installing an app, the permissions themselves are not well understood. Based on the interviews, Kelley et al. partly attribute this lack of understanding to language that is "at best vague, and at worst confusing, misleading, jargon-filled, and poorly grouped". Thus, the permission descriptions are not likely to be a significant factor in making the decision to install an app. Furthermore, interviewees showed difficulty describing the possible harm that could be caused by applications collecting and sharing their personal information, highlighting another area of mobile app privacy design that can be improved. Our profiling study shows what types of deep inferences can be made from seemingly benign data accesses when performed over a longer period of time (4 weeks in this study) and quantifies this level of profiling potential in our Intrusiveness Score.

3.2 Information Monitoring Frameworks

Several tools have been built to monitor the sensitive information collected by applications on the Android platform. While these frameworks provide a good foundation for elucidating the types of data collected by smartphone applications, their use is currently limited to the identification of certain and possible undesirable transactions, rather than holistic assessments of an app's overall intrusiveness based on aggregate analysis of real-time accessed data. Furthermore, most of these systems [7, 13, 12, 11, 21, 6] fail to log contextual information about data accesses, an important factor in determining privacy intrusion.

Fuchs et al. [11] developed a tool called ScanDroid that provides automated security certification of Android applications. It performs incremental checks on apps during install-time to detect possible security breaching behavior. It extracts security specifications from manifests and checks whether data flows through those applications are consistent with those specifications. ScanDroid is limited to install-time checking, so it cannot detect leaks that are outside the scope of installation.

Enck et al. [7] created TaintDroid as an information tracking tool for privacy sensitive smartphone data. It uses dynamic taint analysis to track privacy data flow on a smartphone and any leaks to third party servers. Unlike ScanDroid, TaintDroid can also detect leaks that are not limited to install-time activity. Like AppWindow, TaintDroid is a modification of the Android Open Source code and can be used to track the information flow of existing Android apps. However, unlike AppWindow, TaintDroid lacks contextual information for its taints (e.g., location, surrounding bluetooth device information, screen mode, current running application), which we use later to generate the Privacy Fingerprints and calculate the Intrusiveness Scores for apps.

Egele et al. [6] created a similar tool for iOS, called PiOS, which applies static analysis on iOS apps to detect possible privacy leaks to third parties. PiOS first constructs the control flow graph of an iOS app and then determines whether there exists an execution path from the nodes that access privacy sources to the nodes of network operations. If such a path exists, PiOS deems there to be a potential privacy leak. While this system can be used to identify potentially undesirable app behavior, the leak is found based on static analysis, and not on holistic analysis of real-time data accesses, which our Privacy Fingerprint provides. Furthermore, like TaintDroid, PiOS does not use contextual data to identify these privacy leaks, and the severity of these leaks (most often device ID) may change according to the usage context and varying levels of privacy sensitivity across users. In our study, these limitations are overcome by the Privacy Fingerprint.
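The path check attributed to PiOS above amounts to a reachability query on a graph. The sketch below illustrates that idea only; the dictionary encoding of the graph, the node names, and the function name are hypothetical, and the real tool operates on the control flow graph of an actual iOS app.

```python
from collections import deque


def has_potential_leak(cfg, sources, sinks):
    """Return True if any sink node is reachable from any source node.

    `cfg` maps each node to the list of nodes that may execute after it.
    """
    sinks = set(sinks)
    queue = deque(sources)
    visited = set(sources)
    while queue:
        node = queue.popleft()
        if node in sinks:
            return True
        for successor in cfg.get(node, []):
            if successor not in visited:
                visited.add(successor)
                queue.append(successor)
    return False


# Hypothetical graph: reading the device ID can flow into a network send.
cfg = {
    "start": ["read_device_id", "draw_ui"],
    "read_device_id": ["build_request"],
    "build_request": ["send_http"],
    "draw_ui": [],
}
leak_found = has_potential_leak(cfg, ["read_device_id"], ["send_http"])
```

A check of this kind reports that a leak is possible, not that it occurs on a given run or under which usage conditions, which is the gap the empirical approach in this thesis is meant to address.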

Not all monitoring frameworks lack context awareness, however. Researchers at the MIT Media Lab developed Funf [16], an open-source extensible sensing and data processing framework for mobile devices. Funf allows developers to use various sensor probes to log information about context and sensory data related to data transactions. It also provides visualizations (e.g., heatmaps, frequency stats) on data usage given the collected contextual information. Although Funf can be used to contextually monitor data accesses by apps, it is not compatible with existing Android apps. Funf's focus is to provide developers with tools for context-aware data collection, analysis, and visualization when they create new apps, instead of providing transparency mechanisms for smartphone users to evaluate the actual activities of existing apps.

ConUCON [2] can also log contextual information for data accessed by apps, but unlike Funf, it targets the user rather than the developer. ConUCON provides users with fine-grained control of Android permissions and a well-defined policy enforcement framework which uses contextual information to resolve user-specified policies relating to data protection and resource usage control during runtime. The framework monitors accesses to resources, data and files along with context types that are relevant to the user's specified policies (e.g., temporal, spatial, battery, signal strength, acceleration, Bluetooth state, WiFi state, CPU utilization, and memory amount). While ConUCON can be used as a context-aware information monitoring system for existing Android apps, the purpose of its monitoring is to enhance user control over application data, rather than elucidate the access activities of applications, which is the focus of our study.

3.3 Using Transparency to Increase User Control

Many other studies like ConUCON have built monitoring systems to empower users, affording them finer-grained control over the specific permissions required by each app at runtime [19, 22, 12, 13]. Some of these systems [13, 3] alert the user to potentially undesirable information leaks, giving them greater insight into what apps are doing with sensitive data, while others provide an interface for users to define their own policies that prevent the disclosure of certain types of sensitive data under specified contexts [19, 22]. It is important to note that controlling disclosure is out of the scope of our thesis, because we believe the majority of smartphone users are ill-equipped to intelligently and meaningfully decide which data accesses they ought to revoke or allow (they always click OK) [15], given their lack of comprehension of the existing Android Permissions [10]. Also, revoking data collection may reduce and impair app performance [13].

3.4 Improving Transparency Interfaces

Lin et al. [17] propose a new privacy interface that uses privacy expectations to model the privacy concerns of users. The study uses crowdsourcing techniques to capture users' expectations of which types of sensitive data are requested by apps and then uses the crowdsourced data to display the most unexpected accesses for a given app. By operating under a model of privacy expectations, the interface aims to correct misconceptions of users in order to allay privacy concerns in cases of uncertainty and misunderstanding, as well as educate them about the real types of data disclosed by apps in order to aid them in making better trust decisions. Like our study, Lin et al.'s work aims to provide a quantifiable indicator for privacy risk by measuring user expectations of various types of data accesses. They include the purpose of data collection (e.g., major app functionality, minor supporting app functionality, targeted advertising) as a context of their data accesses, whereas we capture usage contexts. Both contexts are relevant to quantifying intrusiveness, and the two studies can be combined to enhance the coverage and accuracy of the overall Intrusiveness Score.

3.5 Quantifying App Behavior

While many studies have qualitatively assessed the risk of apps by uncovering instances of unexpected data collections and detecting potentially malicious behavior [8], very little has been done to quantify such risk for each app. One study [5] uses a probabilistic framework to quantify usage behaviors unique to each user of a "bag" of Android apps. This "bag-of-apps" model robustly represents the level of phone usage over specific times of the day. Whereas this empirical approach was used to identify the unique usage behaviors of users, we use it to quantify the data access patterns unique to each application in order to create each app's Privacy Fingerprint. We then distill the fingerprint data into a single-number Intrusiveness Score.

Lin et al. [17] used crowdsourcing to capture users' expectations of which sensitive resources are used by various Android apps, as well as to ask users to guess for which purpose(s) (e.g., major app functionality, minor supporting app functionality, targeted advertising) the data are used. The study uses TaintDroid to track the destinations of sensitive data transfers in order to determine the likely purpose of certain data accesses and compares this "ground truth" purpose to the purposes guessed by users. Then, users were asked to rate their comfort level, from +2 (very comfortable) to -2 (very uncomfortable), with a given data access and its purpose (e.g., location is used by Toss it for targeted advertising). Researchers correlated the comfort level with the level of users' expectation and correct identification of the purpose. Findings suggest that both users' expectations and the purpose for which sensitive resources are used have a major impact on users' subjective feelings and their trust decisions. We believe this study is a valuable starting point in quantifying the sensitivity of certain data accesses. Currently, we only account for the identifiability of the datatype and the intrusiveness of various usage contexts. Adding the user's comfort level would make the intrusiveness calculation customized to each user's privacy sensitivity. This factor would add a further dimension to our Intrusiveness Score, and Lin's study provides an excellent starting point for quantifying this subjective feeling. Another major finding is that properly informing users of the purpose of resource access can ease their privacy concerns to some extent, particularly in the case where users expect app behavior to be more intrusive than it actually is (e.g., expected behavior is targeted advertising, but actual behavior is support for a minor feature of an app). This finding further motivates the need for our research and the development of an interface that clearly and concisely conveys an app's privacy-relevant data accesses, in order to aid users in more effectively assessing privacy risk as well as assuage unnecessarily paranoid fears.


Chapter 4 AppWindow Architecture

In this chapter, we explain the implementation details behind AppWindow, an information monitoring framework we developed from existing Android Open Source code to track all sensitive data requests made by Android apps. AppWindow is composed of two major components:

1. Monitoring Framework

2. Logger Application

4.1 Monitoring Framework

The current version of AppWindow's Monitoring Framework is a light modification of existing Android framework source code, version 2.3.4_r1 (Gingerbread). To monitor the sensitive data accessed by apps, we placed additional logging code into public API methods of various Android framework modules responsible for managing potentially privacy sensitive information (see Figure 4-1).


Table 4.1: List of Sensitive Datatypes logged by AppWindow

Sensitive Datatypes
sms
audio
photo
contact name
contact email
contact phone number
browsing info
voicemail
gps location
sim serial number
imsi
msisdn
device id
wifi info (SSID, BSSID, RSSI, ip address, mac address)
cell location
sensor accelerometer
sensor rotation vector
bluetooth info (device address, name)

For more information about which Android classes and public API methods were modified and the correspondence of specific data types to publicly accessible methods, please see Section B of the Appendix.

We also created our own LoggingManager module as part of Android's existing framework to manage the logging information captured for each data access. The LoggingManager packages up the type and content of sensitive data accessed, along with the requesting app's name into an Android Intent and sends it to the dedicated Logger Application residing on the client device.

4.2 Logger Application

We created the Logger App as an Android application in order to store the sensitive data accesses in a dedicated SQLite database on the smartphone. The Logger App logs a transaction record for each privacy sensitive request made by any app installed on the smartphone. This transaction record includes:

• name of requesting application
• datatype requested
• time of request
• contextual screenmode (on/off)
• contextual app status (foreground/background request)
• address and name of nearby bluetooth devices
• contextual location (longitude, latitude)

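As an illustration of the transaction record listed above, the sketch below stores one record per request in a SQLite table. The table name, column names, and encodings are hypothetical, and Python's sqlite3 module stands in for the database access that the Logger App performs as an Android application.

```python
import sqlite3

# Hypothetical schema: one row per privacy-sensitive request.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transactions (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    app_name      TEXT    NOT NULL,  -- name of requesting application
    datatype      TEXT    NOT NULL,  -- e.g. 'gps location'
    request_time  INTEGER NOT NULL,  -- Unix time of the request, in seconds
    screen_on     INTEGER NOT NULL,  -- contextual screenmode: 1 = on, 0 = off
    in_foreground INTEGER NOT NULL,  -- contextual app status: 1 = foreground
    bluetooth     TEXT,              -- nearby devices, e.g. 'address|name;...'
    longitude     REAL,              -- contextual location
    latitude      REAL
)
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_transaction(conn, app_name, datatype, request_time, screen_on,
                    in_foreground, bluetooth=None, longitude=None, latitude=None):
    """Insert one transaction record."""
    conn.execute(
        "INSERT INTO transactions (app_name, datatype, request_time, screen_on,"
        " in_foreground, bluetooth, longitude, latitude)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (app_name, datatype, request_time, int(screen_on), int(in_foreground),
         bluetooth, longitude, latitude),
    )
    conn.commit()


# Example: a background location request made while the screen is off.
conn = open_log()
log_transaction(conn, "FriendHangout", "gps location", 1345600000,
                screen_on=False, in_foreground=False,
                longitude=-71.09, latitude=42.36)
```

Keeping the context fields in the same row as the access makes later aggregation straightforward, for example counting the share of an app's requests made with the screen off.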
4.3 Use Case

[Figure 4-1 (diagram): Android apps (Google Maps, Evernote, Foursquare, Cut the Rope) on an Android smartphone; Android Framework modules LocationManager, WifiManager, TelephonyManager, BluetoothService, SensorManager, ContentResolver, and LoggingManager; process steps numbered 1 to 5.]

Figure 4-1: AppWindow Architecture.

Components outlined in red represent structures that are either a modified version of existing Android Source Code or newly created for AppWindow. The Monitoring Framework is shown on the left side of the vertical line, and the Logger App is displayed on the right.

Figure 4-1 shows how the two components (Monitoring Framework and Logger Application) fit together in the overall architecture of AppWindow. We now use the diagram to walk through a use case with the sample app FriendHangout (see Chapter 2) in order to more clearly convey how AppWindow monitors and logs sensitive information requested by apps. Each step below corresponds to a process number shown in the architecture diagram.

1. FriendHangout wants to access a user's location so that it can recommend nearby hangout places. It calls the getLastKnownLocation() function available through the Android API.

2. The LocationManager module, which is part of the existing Android Framework, is responsible for returning the cached location to FriendHangout.

3. Our modified version of LocationManager passes the datatype (gps location) of the accessed data to the LoggingManager.

4. The LoggingManager adds contextual information (time of access, name of requesting application) to the information received from LocationManager and packages all the data into an Android Intent. It sends this intent to the Logger App (written for AppWindow) installed on the smartphone. The Logger App receives the intent and writes its contents, along with additional contextual information (screenmode on/off, app in foreground or background, location, bluetooth info) gathered by the Logger App itself, into a SQLite database.

5. The LocationManager returns the last known location to FriendHangout.
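The five steps above can be mimicked in a few lines. The sketch below is a simplified stand-in, not AppWindow's code: the real components are part of the Android framework and communicate through an Android Intent, whereas here the classes are plain Python objects, the intent is a dictionary, the method names are hypothetical, and an in-memory list replaces the database.

```python
import time


class LoggerApp:
    """Stand-in for the Logger App: adds its own context and stores records."""

    def __init__(self, read_context):
        self.read_context = read_context  # callable returning a dict of context
        self.records = []                 # stands in for the database

    def on_intent(self, intent):
        record = dict(intent)
        record.update(self.read_context())  # screenmode, foreground status, ...
        self.records.append(record)


class LoggingManager:
    """Stand-in for the LoggingManager added to the framework."""

    def __init__(self, logger_app):
        self.logger_app = logger_app

    def log_access(self, app_name, datatype):
        # Step 4: package datatype, app name and time into an intent-like dict.
        intent = {"app": app_name, "datatype": datatype, "time": time.time()}
        self.logger_app.on_intent(intent)


class LocationManager:
    """Stand-in for the modified LocationManager (step 2)."""

    def __init__(self, logging_manager, cached_location):
        self.logging_manager = logging_manager
        self.cached_location = cached_location

    def get_last_known_location(self, calling_app):
        # Step 3: report the datatype of the access to the LoggingManager.
        self.logging_manager.log_access(calling_app, "gps location")
        # Step 5: return the cached location to the caller as usual.
        return self.cached_location


# Step 1: FriendHangout asks for the last known location.
logger = LoggerApp(lambda: {"screen_on": False, "foreground": False})
location_manager = LocationManager(LoggingManager(logger), (42.36, -71.09))
location = location_manager.get_last_known_location("FriendHangout")
```

The requesting app receives the same return value it would have received without monitoring; the logging happens as a side effect inside the framework call.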

In summary, AppWindow provides a "window" into the sensitive access behavior of existing Android apps. It not only logs privacy sensitive requests made by apps, but also records privacy-relevant context information corresponding to each data request, so that the intrusiveness of each access can be more accurately assessed (see Chapter 6 for details on how this is done). Like TaintDroid, AppWindow serves as a monitoring tool, but it does not figure out the destination of data transfers, just their content and usage context. While the contextual information logged by AppWindow can ultimately be used to resolve user-defined policies regarding an app's collection of personal data [19, 22], the current prototype does not include a policy enforcement module. We believe that once smartphone users are sufficiently and accurately informed of apps' sensitive data access behaviors and the privacy implications involved, they will be better equipped to make smart and meaningful decisions about which sensitive permissions to revoke or allow, making the use of policy enforcement add-ons more relevant and useful.


Chapter 5 Intrusiveness Studies

In this chapter, we present the methodology and design decisions for two experimental studies conducted to assess the intrusiveness of apps. The first study is the Qualitative Intrusiveness Study, wherein we aim to 1) confirm the prevalence of unexpected, intrusive data accesses by popular Android apps and 2) assess the long-term privacy implications of these intrusive accesses by examining the inferences that can be made about a user's behavioral profile. The second study is the Quantitative Intrusiveness Study. For this experiment, we test about 40 popular Android apps across 4 different app categories under pre-defined usage contexts and use the empirical data collected to construct a framework for quantifying the intrusiveness of apps.

5.1 Qualitative Intrusiveness Study

We conducted the Qualitative Intrusiveness Study to confirm the prevalence of unexpected, intrusive data accesses and gain a preliminary glimpse into the powerful inferences that can be made about a user's personal profile as a consequence of those intrusive accesses. Results from the profiling analysis can be found in Chapter 7.

