
Predicting Interpersonal Relationship based on Mobile Communication Patterns

Jumin Chi, Hyungeun Jo, Jung-hee Ryu
Graduate School of Culture Technology, KAIST, Daejeon, Korea
{zziju, acid, junghee.ryu}@kaist.ac.kr
ABSTRACT

In this poster, we propose a method to measure the intimacy scores of interpersonal relationships (IR) using mobile communication data. We draw on previous literature to define communication factors that affect intimacy. Using these factors, we implemented a pattern-based intimacy prediction method. We found 33 communication patterns, and the correlation between the intimacy predicted from those patterns and the self-reported intimacy was 0.61 (p<0.01). Using this method, we can automatically predict IR.
Author Keywords

Mobile, Communication, Interpersonal Relationship, Intimacy

ACM Classification Keywords

J.4 SOCIAL AND BEHAVIORAL SCIENCES, Sociology; H.4 INFORMATION SYSTEMS APPLICATIONS, H.4.3 Communications Applications
General Terms

Human Factors
INTRODUCTION

With various communication media such as mobile phones and email, people establish and manage many relationships. However, it is not easy to manage IR because relationship changes are neither instant nor easily recognizable. Several studies have tried to predict IR from communication media. So far, predicting a binary-choice relationship (e.g., work or social) has reached an accuracy of 80% [2, 3], whereas predicting a three-category relationship reached an accuracy of 50.2% [2]. Categorizing IR in general, however, suffers from a lack of distinct concepts. Fiske [4], a social psychologist, is one of several researchers who have attempted to categorize human relationships; Fiske generalized human relationships into four categories. Although this is an excellent categorization of human relationships, it is less helpful for prediction systems because the correlations between two of the categories are high [4]. To avoid the unclear nature of categorization schemes, we used intimacy measuring tools [1, 6]. Using intimacy surveys and mobile communication history, we present a method for predicting the relative intimacy of relationships among people.

SYSTEM DESIGN
Defining Intimacy Factors

Eagle et al. [3] identified friends based on patterns such as proximity on Saturday nights or proximity outside work; this contextual information matched friendship to a significant degree. Avrahami and Hudson [2] revealed that many instant message (IM) communication characteristics, including communication length, frequency, and interval, were affected by the relationships between users and their buddies. Moreover, Hu et al. [5] showed a positive correlation between the amount of IM use and intimacy. We extracted the following characteristics from these studies: Frequency (the number of communications), Intensity (communication duration), and Time and Day (when the communication occurred). We also added two characteristics that apply to the mobile communication environment: Directivity (which person mostly initiated the communication between the user and his/her partner) and Channel (the tendency to choose calls or SMS).
Calculating Factors

The data obtained from the user-partner pairs, each described by six factors, were used in the calculations shown in Tables 1 and 2. All logs are transformed into session-based data. Each call is regarded as one session. For SMS, a run of consecutive messages is regarded as one session if no call was made in between and every interval between the messages is shorter than a given threshold. Each user has a different threshold, which is calculated by clustering that user's SMS intervals into two classes, 'short' and 'long', with k-means clustering and taking the border value. Each SMS is counted as 60 seconds. Note that two factors (Frequency and Intensity) are log-transformed to fit the normal distribution more closely.

Table 1. Calculating method (frequency, intensity, and time)

Factor      Statistical representative value
Frequency   Number of sessions
Intensity   Duration of the longest session
Time        Starting time of the latest midnight session*

* The time was coded between 1 (= 2 AM, treated as midnight) and -1 (= 2 PM).

Table 2. Calculating method (day, directivity, and channel)

Factor        A                       B
Day           # of Holiday sessions   # of Workday sessions
Directivity   # of In sessions        # of Out sessions
Channel       # of Call sessions      # of SMS sessions

Copyright is held by the author/owner(s). CSCW 2010, February 6–10, 2010, Savannah, Georgia, USA. ACM 978-1-60558-795-0/10/02.
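The session rule described under Calculating Factors can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Event` record and all function names are hypothetical, the 'border value' is assumed to be the midpoint between the two k-means centroids, and the 60-second rule is read as 60 seconds per message in an SMS session.

```python
from dataclasses import dataclass
from statistics import mean

SMS_SECONDS = 60.0  # from the text: each SMS is counted as 60 seconds


@dataclass
class Event:  # hypothetical log record, not defined in the paper
    kind: str              # "call" or "sms"
    start: float           # timestamp in seconds
    duration: float = 0.0  # call length in seconds (unused for SMS)


def sms_intervals(events):
    """Gaps between consecutive SMS timestamps in the given events."""
    times = sorted(e.start for e in events if e.kind == "sms")
    return [b - a for a, b in zip(times, times[1:])]


def two_means_border(intervals, max_iter=100):
    """1-D k-means with k=2; returns the midpoint between the two centroids
    (an assumed reading of 'taking the border value')."""
    lo, hi = min(intervals), max(intervals)
    for _ in range(max_iter):
        short = [x for x in intervals if abs(x - lo) <= abs(x - hi)]
        long_ = [x for x in intervals if abs(x - lo) > abs(x - hi)]
        if not long_:
            break
        new_lo, new_hi = mean(short), mean(long_)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def build_sessions(events, threshold):
    """Turn one conversation's events into (kind, start, duration) sessions."""
    sessions, run = [], []  # run = current chunk of consecutive SMS

    def flush():
        if run:
            sessions.append(("sms", run[0].start, SMS_SECONDS * len(run)))
            run.clear()

    for ev in sorted(events, key=lambda e: e.start):
        if ev.kind == "call":
            flush()  # a call in between ends the SMS chunk
            sessions.append(("call", ev.start, ev.duration))
        else:
            if run and ev.start - run[-1].start >= threshold:
                flush()  # the gap is not 'short', so start a new session
            run.append(ev)
    flush()
    return sessions
```

In this sketch the threshold would be computed once per user, for example with `two_means_border(sms_intervals(all_events_of_user))`, and then passed to `build_sessions` for each partner.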
Setting up the reference

Table 3. The test to decide the thresholds of conditions

Pearson r: 0.90 0.85 0.75 0.70 0.65 0.60 0.80
IOS SD: 0.79 0.94 0.98 0.98 0.70
MSIS SD: 0.87 0.95 0.99 1.00 0.78
N=5: 0 0 7 10 16 14 16

These test results were used to determine the thresholds of the intimacy SD and Pearson R. With all conditions satisfied, Pearson R = 0.80 and its corresponding intimacy SDs (IOS SD 0.70, MSIS SD 0.78) were ultimately chosen.

We collected the mobile communication logs (text messages and call logs) of 14 South Korean people over a period of one month (males: 8, females: 6, mean age: 27.4, age standard deviation (SD): 2.6, total number of logs: 7,939). We selected 15 partners, on average, for each user and collected the intimacy level of each partner via questionnaire surveys. We used the Inclusion of Other in the Self Scale (IOS) [1] and the Miller Social Intimacy Scale (MSIS) [6]. These scales provide quantitative measures of intimacy that can be applied to any relationship. The intimacy scores and all the factors from the logs were standardized with respect to each user, both to compensate for individual differences and to compare the factors on the same scale.
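The per-user standardization described above can be illustrated with a short Python sketch. The row layout (dictionaries with a `user` key), the function name, the use of the population SD, and applying the log transform before standardizing are all assumptions made for illustration; the text does not specify them.

```python
from collections import defaultdict
from math import log
from statistics import mean, pstdev


def standardize_within_user(rows, columns, log_columns=()):
    """Z-score each column separately for every user.

    rows        : list of dicts, one per user-partner pair, with a "user" key
    columns     : names of the factor / intimacy columns to standardize
    log_columns : columns that are log-transformed first (values must be > 0)
    """
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user"]].append(row)

    out = []
    for user_rows in by_user.values():
        stats = {}
        for col in columns:
            vals = [log(r[col]) if col in log_columns else r[col]
                    for r in user_rows]
            stats[col] = (vals, mean(vals), pstdev(vals))
        for i, r in enumerate(user_rows):
            z = dict(r)  # keep identifying fields such as "partner"
            for col, (vals, mu, sd) in stats.items():
                # a user whose values are all equal gets 0.0 for that column
                z[col] = (vals[i] - mu) / sd if sd > 0 else 0.0
            out.append(z)
    return out
```

Standardizing within each user means that a partner called ten times a month can score high for an infrequent caller and low for a heavy caller, which is the individual difference the text aims to compensate for.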
Pattern Clustering

We extracted intimacy patterns from the reference set, which consists of pairs of six intimacy factors and the corresponding intimacy scores, through the following clustering process (Figure 1 gives the pseudo-code of this method). Some patterns can be defined by all the factors, whereas others can only be defined by a subset of them. For this reason, we reduced the number of factors from six to four while forming combinations of factors. A pattern must satisfy the following three conditions: the Pearson R among the clustered pairs must be greater than the threshold, the intimacy SD within the cluster must be lower than the threshold, and the pattern must consist of at least five user-partner pairs to be reliable. To determine the thresholds of the intimacy SD and of the Pearson R between factors, an empirical method was used (Table 3). In the end, 33 patterns were derived.

    for i ← 6 to 4:
        F ← the combinations of i factors from all factors
        sort F
        for each f in F:
            C ← agglomerativeClustering(ReferenceSet) with
                    variable: f
                    distance: Pearson R
                    method: complete linkage
                    stopWhen: distance > Tpearson
            for each c in C:
                c.remove(data that have Euclidean distance > Teuclid from the nearest data)
                if |c| >= 5 && (TIOSSD - SD(IOS(c))) + (TMSISSD - SD(MSIS(c))) > 0:
                    Patterns.push(c)
                    ReferenceSet.remove(data in c)
                end if
            end for
        end for
    end for

EVALUATION & DISCUSSION

For the evaluation, one-month mobile communication logs from 10 Korean people were used (males: 5, females: 5, mean age: 28.4, age SD: 2.9, total number of logs: 5,345). We selected approximately 15 partners for each user, and the users answered an intimacy survey about them. For each of the predefined patterns, the centroid value of each factor was used as the representative value. Each test datum was assigned to the most similar pattern among the predefined patterns (the one whose representative values had the highest Pearson R with it) and was given the centroid intimacy value of that pattern. As Table 4 shows, our method correlated more highly with the self-reported intimacy values than frequency alone did. We also present results for the closely matched data, which showed high similarity to the representative patterns.
Table 4. Prediction results (Pearson R, p<0.01)

Case                                                    IOS     MSIS
Pattern closely matched data only, R>0.85 (N = 65)      0.576   0.607
Pattern closely matched data only, R>0.75 (N = 109)     0.541   0.588
Pattern-Intimacy case, all data (N = 159)               0.478   0.496
Frequency-Intimacy case, all data                       0.399   0.433
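The classification step used in the evaluation can be sketched as below. This is an illustrative Python sketch rather than the authors' code: the pattern dictionaries (`factors`, `centroid`, `intimacy`) and the function names are hypothetical, and 'most similar' is assumed to mean the highest Pearson R between a test datum's factor values and a pattern's centroid.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def predict_intimacy(sample, patterns):
    """Assign a user-partner pair to its most similar pattern.

    sample   : dict mapping factor name -> standardized value
    patterns : list of dicts with keys
               "factors"  (factor names the pattern is defined on),
               "centroid" (centroid factor values, same order),
               "intimacy" (centroid intimacy score of the pattern)
    Returns (predicted intimacy, Pearson R to the chosen pattern).
    """
    scored = [(pearson([sample[k] for k in p["factors"]], p["centroid"]), p)
              for p in patterns]
    best_r, best = max(scored, key=lambda t: t[0])
    return best["intimacy"], best_r


def closely_matched(results, r_min=0.85):
    """Keep only predictions whose match exceeds r_min (0.85 or 0.75 in Table 4)."""
    return [(score, r) for score, r in results if r > r_min]
```

Returning the match strength alongside the prediction makes it possible to report the closely matched subsets separately from the all-data case.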

Because the data were limited to mobile communication histories, the effects of other communication media were not considered, so the data may not be sufficient to derive stable results. Nevertheless, our proposed method could yield more practical mobile communication patterns than a simple frequency-based prediction method. This method could also help people manage their IR with information obtained semi-automatically from their mobile communication logs.
REFERENCES
1. Aron, A., Aron, E., and Smollan, D. Inclusion of Other in the Self Scale and the Structure of Interpersonal Closeness. Personality and Social Psychology, 63, 4 (1992).
2. Avrahami, D. and Hudson, S.E. Communication Characteristics of Instant Messaging: Effects and Predictions of Interpersonal Relationships. In Proc. CSCW, ACM Press (2006), 505-514.
3. Eagle, N., Pentland, A., and Lazer, D. Inferring Social Network Structure Using Mobile Phone Data. In PNAS (2007).
4. Haslam, N. and Fiske, A. P. Relational Models Theory: A Confirmatory Factor Analysis. Personal Relationships, 6, 2 (1999), 241-250.
5. Hu, Y., Wood, J. F., Smith, V., and Westbrook, N. Friendships through IM: Examining the Relationship between Instant Messaging and Intimacy. Computer-Mediated Communication, 2004.
6. Miller, R. and Lefcourt, H. The Assessment of Social Intimacy. Personality Assessment, 46, 5 (1982), 514-518.

Figure 1. F and C are the arrays that contain the combinations of factors and the clusters of user-partner pairs from the Reference Set, respectively. In detail, |F| (the number of combinations) is 1 with i=6, 6 with i=5, and 15 with i=4. When F is sorted, the combinations whose factors correlate more highly with the intimacy scores in the reference set are placed first. Teuclid: threshold of Euclidean distance (1.5); data (user-partner pairs) farther than this from their nearest datum are removed. Tpearson: threshold of Pearson R. TIOSSD: threshold of IOS SD (0.70). TMSISSD: threshold of MSIS SD (0.78).
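The procedure of Figure 1 might be implemented along the following lines. This is a naive Python sketch under several assumptions, not the authors' implementation: the Pearson 'distance' is taken to be 1 - r, the nearest datum is searched within the same cluster, the sorting of F is omitted, and the row layout (`ios` and `msis` keys) and all function names are hypothetical. The default thresholds are the values stated in Table 3 and Figure 1.

```python
from itertools import combinations
from math import dist, sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def complete_linkage(points, t_pearson):
    """Agglomerative clustering on distance 1 - r; stops once the closest
    pair of clusters is farther apart than 1 - t_pearson."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):  # complete linkage: the farthest pair decides
        return max(1 - pearson(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        if gap(clusters[a], clusters[b]) > 1 - t_pearson:
            break
        merged = clusters.pop(b)  # a < b, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def extract_patterns(reference, factors, t_pearson=0.80, t_euclid=1.5,
                     t_ios_sd=0.70, t_msis_sd=0.78, min_size=5, min_factors=4):
    patterns, remaining = [], list(reference)
    for i in range(len(factors), min_factors - 1, -1):  # 6, 5, 4 factors
        for f in combinations(factors, i):  # Figure 1 also sorts F; omitted
            pts = [[row[k] for k in f] for row in remaining]
            used = set()
            for c in complete_linkage(pts, t_pearson):
                # drop members whose nearest cluster neighbour is too far away
                kept = [m for m in c
                        if any(dist(pts[m], pts[n]) <= t_euclid
                               for n in c if n != m)]
                if len(kept) < min_size:
                    continue
                ios_sd = pstdev([remaining[m]["ios"] for m in kept])
                msis_sd = pstdev([remaining[m]["msis"] for m in kept])
                if (t_ios_sd - ios_sd) + (t_msis_sd - msis_sd) > 0:
                    patterns.append({"factors": f,
                                     "members": [remaining[m] for m in kept]})
                    used.update(kept)
            # data claimed by a pattern leave the reference set
            remaining = [r for idx, r in enumerate(remaining) if idx not in used]
    return patterns
```

The clustering loop recomputes every cluster-to-cluster distance on each merge, which keeps the sketch short but would be slow on a full reference set; a library implementation of hierarchical clustering would be the practical choice.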
