Cross-Platform Data Synchronization
Dan Grover
Good morning. I'm going to talk today about how you can write your own cross-platform data synchronization as part of your iPhone apps.
Outline
1 Why Syncing Is Important
2 Syncing Through The Ages (and why you still might want to write your own)
3 Algorithms and Architecture
4 Implementing Sync in Objective-C
Here's what we're going to talk about today.
- First, I want to persuade you that data synchronization is important, and show why you might want to add it to your app.
- Next, I'll explain the ways data syncing has been solved by apps before, the advantages and disadvantages of various approaches, and why you may want to write your own.
- Then we're going to go over different algorithms that you can use to write your data synchronization code. I'm going to be very abstract and handwavy because it's hard to talk about this kind of stuff when you're also talking about implementation details.
- Finally, we'll dig in and talk about how to actually implement this stuff in Objective-C using the Cocoa APIs available to you.
Who I Am
Former Northeastern student
Independent software developer
Last November, I got an email from a friend of mine involved in the local Mobile Monday group here in Boston. They were going to do a fancy event at the Omni Parker House on up-and-coming mobile companies. Unfortunately, they couldn't find enough up-and-coming mobile companies, so they asked me to present instead. "So, do you have anything you would like to present?" At that time, I was mostly focused on Mac software -- I had a game out, but nothing much. So I said, "Oh, of course, I can demo the new iPhone version of ShoveBox." Unfortunately, there was no iPhone version of ShoveBox. I didn't really want to do one. It was kind of beyond the scope of the app. And syncing was HAAARRRD. So I made up a functional prototype of the iPhone app. I added a pretend dialog to the Mac app to show it syncing. I had a script that I used to convert the example data over so it looked the same.
?
Friday, October 16, 2009 7
I actually *did* want to make the iPhone version for real, though. But I had no idea how it was going to work. I played around with a few half-way solutions -- storing the new entries and just propagating those on sync. But I realized that real, honest-to-god two-way syncing was doable if I just sat down and thought about it for a while. I studied all the ways that people are doing syncing and realized it wouldn't be too hard to write my own from scratch. Sounds crazy.
A few months later, I finally ship the iPhone version. Sales quadruple, and it gets two reviews in Macworld. Still some bugs with syncing, but eventually those get ironed out.
Quick Demo
1 Why Syncing Is Important
I'm going to get on my soapbox for a moment and explain briefly why I think this is an important topic, and how it's applicable to more apps than you'd think.
Syncing is something people have been trying to solve for a long time. If you follow the current hype, we don't have to worry about it because...
the CLOUD
...you put everything on the cloud! The cloud will solve all our problems! The popular conception of the trend of cloud computing is a little wrong. People think of it as a monolithic thing.
But the reality is that the huge benefit of cloud computing is that you can outsource the right things to the right people. I use one company for sending my email newsletters, because they have the best infrastructure and software for that. I use another for my regular web hosting, and yet another to host downloads. And I use a help desk app called Zendesk. So it's not really on the cloud -- it's on a lot of clouds! So we're back to the same problem -- data is going to be in a ton of different places, and you have to build systems that can deal with that. Sync plays a big part.
[Diagram: several separate clouds]
So the future's more complicated than it seems. It's not the cloud, but lots of clouds and client apps and platforms and apparently goats. And they all have to share data but operate independently.
And if you don't think data synchronization applies to your app, I'd like you all to try this while you're in the city. I call it the Green Line Test.
I used to live near Lechmere in East Cambridge, and I'd commute in to classes at Northeastern using the Green Line. The Green Line touches a lot of areas of Boston and goes above ground and below. Some of the stations underground are dead, some have reception. Inevitably, the stations where the train inexplicably stops for 20 minutes are the ones that don't. You see, they've upgraded all the trains and haven't quite got all the kinks worked out.

If your app is one of those thin or hybrid apps that needs to make an HTTP request to do anything, you should try running your app for the entirety of a Green Line ride. How does it handle it when you lose connectivity for a minute? Pop up an error? Or stall indefinitely? How good an experience is it? Do you cache things well, or does it always need a connection?

If you find that it's not very good in this situation, you should consider making more of your application operate on the device itself, and then sync its state back to the cloud. It will be more responsive and usable more of the time. You've probably avoided something like this because, well, syncing is a pain. But what I'm going to talk about in this presentation will help.
2 Syncing Through The Ages (and why you still might want to write your own)
I thought this tweet from Steven Frank was funny. It's true. It never works. I think that's because there's not a lot of knowledge about syncing out there. There are a lot of companies that have written (bad) syncing, and a few academic papers on it. But not a lot of talk about syncing as a subject. If people didn't have to waste all this time learning the basics for themselves, we could have better syncing as more people work out the kinks and integrate it in more systems.
Set-Reconciliation Problem
Academics call syncing the "set reconciliation problem." You've got two sets, and you want to reconcile their differences. The literature on it is pretty limited, though.
rsync
Subversion
Subversion is a kind of syncing a lot of us probably use every day. Like most version control systems, the idea is that your whole team can have the most current copy of the code.
Data Files
But it's important to note that there's a big difference between syncing *data* and syncing *files*. Syncing data is a LOT harder!
Dropbox
Dropbox is a consumer file syncing solution. But it actually ends up working a lot more like Subversion than you'd think. It keeps revisions and actually handles conflicts in a neat way.
HotSync
Palm was one of the first companies to try to make a comprehensive syncing solution for consumers. The way HotSync works is that, once you've done the first sync, the Palm sets status flags on any piece of data that you change. That makes it really fast to sync back up with your PC, because the PC has an old copy of the data that both devices had the last time you synced.
Sync Services
Mac OS X
Sync Services is Apple's syncing framework. It's pretty neat, and if you were like me and trying to write a Mac app that synced with an iPhone app, it would *almost* work.
Sync Services
Your App
Sync Services has this concept of a "Truth Database" -- where you replicate all your data so that it can sync it elsewhere. It gives you lots of goodies to sync your app to the Truth Database -- pushing and pulling changes. They give you tools to define the schema you want the Truth to keep for your data. But then it gets magically put on MobileMe and synced to other Macs. You don't have any control over that. The iPhone supports MobileMe, but only for syncing contacts, appointments, and notes. It doesn't read in the Truth Database from Sync Services; it's totally separate. There is no Sync Services for the iPhone. So that's kind of a bummer.
History-Based
PROS
Ex-Post-Facto
PROS
- Easy to bolt onto an existing system
- Hot swappable: arbitrary configurations of devices in any state can be synced
CONS
CONS
- All client software must maintain status flags/history
- Does not scale as well
- Complicated
History-Based: Subversion, Dropbox, HotSync (Fast)
Ex-Post-Facto: Rsync, Sync Services, HotSync (Slow)
3 Algorithms and Architecture
[Venn diagram: sets A and B]
So in these algorithms, we're going to be a little abstract and think of this as two sets of data.
- A is all the data that's on your first device, B is all the data that's on the second device.
- Here's all the data that's *only on A*. If it was added since the last sync, it needs to be put on B; if not, it was deleted on B and needs to be deleted from A.
- Here's all the data that's *only on B*. If it was added since the last sync, it needs to be put on A; if not, it was deleted on A and needs to be deleted from B.
- Here's the data that's on both. This is the trickiest part. We need to sift through this data and figure out if any of it has been modified since the last sync. We need to merge modifications when we can, and otherwise, ask the user to resolve the conflict.
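As a rough illustration of the three regions just described, here is a minimal Python sketch (the talk's own code is Objective-C; Python is used here only to keep the example short). It assumes every object carries a unique ID; the function name `partition` and the sample IDs are made up for this example.

```python
def partition(a_ids, b_ids):
    """Split two collections of object IDs into only-A, only-B, and shared.

    Assumption: each object has a unique ID that is the same on both devices.
    """
    a_ids, b_ids = set(a_ids), set(b_ids)
    only_a = a_ids - b_ids   # candidates to copy to B or delete from A
    only_b = b_ids - a_ids   # candidates to copy to A or delete from B
    both = a_ids & b_ids     # candidates for modification checks and merging
    return only_a, only_b, both


# Hypothetical IDs, for illustration only.
only_a, only_b, both = partition({"u1", "u2", "u3"}, {"u1", "u4"})
```

Each region is then handled by its own pass of the sync.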
So what is the goal of any sync algorithm? To make both sets of data the same. Well, that part is pretty easy. I could just erase what's on your server account and erase what's on your iPhone. Done! Turns out it's more complicated. There are a lot of *correct* ways to make this happen, but only some of them are what the user is expecting to see. The sync also has to be fast. This usually means transferring a minimum of data.
Three Algorithms
Copy, Sync, Merge
But there are a few ways to skin a cat. Let's look at each of these. They all meet the definition we discussed, but go about it differently.
Copy (before)
A: Good Will Hunting, The Departed, 21
B: Good Will Hunting, Spenser: For Hire, The Boondock Saints, With Honors

Copy (after)
A: Good Will Hunting, The Departed, 21
B: Good Will Hunting, The Departed, 21
Merge (before)
A: Good Will Hunting, The Departed, 21
B: Good Will Hunting, Spenser: For Hire, The Boondock Saints, With Honors

Merge (after)
A: Good Will Hunting, The Departed, 21, Spenser: For Hire, The Boondock Saints, With Honors
B: Good Will Hunting, The Departed, 21, Spenser: For Hire, The Boondock Saints, With Honors
Sync (before)
A: Good Will, The Departed, 21
B: Good Will, Boondock, With Honors
Timestamps shown: created 2PM / modified 2PM; created 11AM / modified 11AM; created 1PM / modified 1PM

Sync (after)
A: Good Will, The Departed, Boondock, With Honors
B: Good Will, The Departed, Boondock, With Honors
Timestamps shown: created 2PM / modified 2PM; created 11AM / modified 11AM; created 2PM / modified 2PM
Three Algorithms
Copy, Sync, Merge
So let's go back here and talk about when to use each of these algorithms:
SYNC: This is what you're going to want to do 95% of the time. The other two algorithms are for when you're first setting two devices up to sync.
COPY: Some people doing sync like to offer you a choice of which device's data becomes the one true set of data.
MERGE: What I do with ShoveBox is just do a merge the first time -- because there might be data on both devices they want to keep. It avoids any confusion over the choice, and nobody's going to be pissed with the initial result.
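To make the difference between the two first-time operations concrete, here is a small Python sketch. It assumes each device's data is a dictionary keyed by a unique ID; the function names `copy_sync` and `merge_sync` and the IDs are hypothetical, and leaving records that exist on both sides untouched during a merge is an assumption of this sketch, not something the slides specify.

```python
def copy_sync(a, b):
    """Copy: replace B's contents with A's. Anything only on B is lost."""
    b.clear()
    b.update(a)


def merge_sync(a, b):
    """Merge: non-destructive union. Nothing is deleted from either side."""
    for uid, record in list(a.items()):
        b.setdefault(uid, record)   # add A's records that B lacks
    for uid, record in list(b.items()):
        a.setdefault(uid, record)   # add B's records that A lacks


# The movie lists from the slides, with made-up IDs.
a = {"u1": "Good Will Hunting", "u2": "The Departed", "u3": "21"}
b = {"u1": "Good Will Hunting", "u4": "Spenser: For Hire",
     "u5": "The Boondock Saints", "u6": "With Honors"}
merge_sync(a, b)   # both sides now hold all six titles
```

Merge never throws data away, which is why it is the less surprising choice for a first sync.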
Sync: In Depth
PREPARE
SYNC OBJECTS IN ONLY A
SYNC OBJECTS IN ONLY B
SYNC INTERSECTION
CLEAN UP
PREPARE
Establish communication with sources
Grab summaries from A and B: UUIDs, creation, modification
SYNC OBJECTS IN ONLY A
For each object o only in a:
    if o.creation > last sync then
        tell b to copy o over
    else
        tell a to delete o
    end if
next
SYNC OBJECTS IN ONLY B
For each object o only in b:
    if o.creation > last sync then
        tell a to copy o over
    else
        tell b to delete o
    end if
next
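The two one-sided passes are mirror images, so one helper can serve both. The following Python sketch is only an illustration: the name `sync_one_sided`, the IDs, and the noon last-sync time are assumptions made for the example.

```python
from datetime import datetime


def sync_one_sided(only_here, last_sync):
    """Decide the fate of objects that exist on one source only.

    only_here maps object ID -> creation time.
    Returns (to_copy, to_delete) as lists of IDs.
    """
    to_copy, to_delete = [], []
    for uid, created in only_here.items():
        if created > last_sync:
            to_copy.append(uid)     # new since the last sync: send it across
        else:
            to_delete.append(uid)   # predates the last sync: the other side deleted it
    return to_copy, to_delete


# Assumed values for illustration: last sync at noon.
last_sync = datetime(2009, 10, 16, 12, 0)
only_a = {
    "the-departed": datetime(2009, 10, 16, 14, 0),  # created 2PM
    "21": datetime(2009, 10, 16, 11, 0),            # created 11AM
}
to_copy, to_delete = sync_one_sided(only_a, last_sync)
```

Calling it once with the objects only on A and once with the objects only on B covers both passes.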
SYNC INTERSECTION
For each object o in both a and b:
    if o.modification < last sync on both a and b then
        skip it
    else if only a's mod > last sync then
        propagate a's version to b
    else if only b's mod > last sync then
        propagate b's version to a
    else if both a and b's mod > last sync then
        present conflict
    end if
next
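The four cases can be written as a small decision function. This Python sketch is an illustration only: the name `classify_shared`, the returned labels, and the use of plain numbers (hours of the day) as timestamps are assumptions for brevity.

```python
def classify_shared(a_mod, b_mod, last_sync):
    """Decide what to do with an object that exists on both A and B.

    a_mod and b_mod are the object's modification times on each side.
    """
    a_changed = a_mod > last_sync
    b_changed = b_mod > last_sync
    if a_changed and b_changed:
        return "conflict"   # modified on both sides: ask the user
    if a_changed:
        return "a_to_b"     # propagate A's version to B
    if b_changed:
        return "b_to_a"     # propagate B's version to A
    return "skip"           # untouched since the last sync


# Timestamps as hours of the day, with the last sync at 12.
decisions = [
    classify_shared(9, 10, 12),
    classify_shared(14, 10, 12),
    classify_shared(9, 13, 12),
    classify_shared(14, 13, 12),
]
```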
CLEAN UP
2. Single modification date makes merging hard
SOLUTION
Keep per-attribute modification dates on each source
INTERSECTION REVISITED
else if both a and b's mod > last sync then
    let c = new list of conflicting keys
    let e = new entry record
    for each key k on o
        if a[o].k == b[o].k then
            e.k = a[o].k
        else if only a[o].k.mod > o.last sync then
            e.k = a[o].k
        else if only b[o].k.mod > o.last sync then
            e.k = b[o].k
        else
            c += k
        end if
    next
end if
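A per-attribute merge along these lines might look like the Python sketch below. It assumes each entry maps a key to a (value, modification time) pair and that both sides have the same keys; the name `merge_entry` and the sample data are hypothetical.

```python
def merge_entry(a, b, last_sync):
    """Merge two versions of one entry, attribute by attribute.

    a and b map key -> (value, modified). Returns (merged, conflicts),
    where conflicts lists the keys the user still has to resolve.
    """
    merged, conflicts = {}, []
    for key in a:
        a_val, a_mod = a[key]
        b_val, b_mod = b[key]
        a_changed = a_mod > last_sync
        b_changed = b_mod > last_sync
        if a_val == b_val:
            merged[key] = a_val      # same value on both sides: nothing to decide
        elif a_changed and not b_changed:
            merged[key] = a_val      # only A touched this attribute
        elif b_changed and not a_changed:
            merged[key] = b_val      # only B touched this attribute
        else:
            conflicts.append(key)    # cannot decide automatically
    return merged, conflicts


# Timestamps as hours of the day, with the last sync at 12.
a = {"title": ("Good Will Hunting", 9), "notes": ("Boston", 14), "rating": (5, 15)}
b = {"title": ("Good Will Hunting", 9), "notes": ("", 9), "rating": (4, 13)}
merged, conflicts = merge_entry(a, b, 12)
```

Only the rating, which changed on both sides, is left for the user to resolve.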
Going Further
On textual keys, if the same key on the same entry was modified on both entries, then use diff to do a text merge and only ask the user to select one version or the other if there is a text merge conflict
Architecture
Syncer
A: Source
B: Source
Architecture
Syncer
A: LocalSource (DB: SQLite)
B: Source (Web Service)
Architecture
iPhone App
Web Service
Architecture
iPhone App
The Cloud
Web Service
Architecture
Mac App
iPhone App
Architecture
Syncer
A: Source
B: Source
A sync source supports:
- Create/Overwrite Object
- Delete Object
- Get Object
- Get summary
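The four operations map naturally onto a small interface. The Python sketch below is only an illustration of that shape: the class names `SyncSource` and `MemorySource`, the method names, and the dictionary layout of an object are assumptions, not the talk's actual Objective-C protocol.

```python
from abc import ABC, abstractmethod


class SyncSource(ABC):
    """Hypothetical interface covering the four operations listed above."""

    @abstractmethod
    def put(self, uid, obj):
        """Create or overwrite an object."""

    @abstractmethod
    def delete(self, uid):
        """Delete an object."""

    @abstractmethod
    def get(self, uid):
        """Return a full object."""

    @abstractmethod
    def summary(self):
        """Return {uid: (created, modified)} for every object."""


class MemorySource(SyncSource):
    """In-memory stand-in for a local database or a web service."""

    def __init__(self):
        self._objects = {}

    def put(self, uid, obj):
        self._objects[uid] = dict(obj)

    def delete(self, uid):
        self._objects.pop(uid, None)

    def get(self, uid):
        return dict(self._objects[uid])

    def summary(self):
        return {uid: (o["created"], o["modified"])
                for uid, o in self._objects.items()}


source = MemorySource()
source.put("u1", {"title": "The Departed", "created": 14, "modified": 14})
```

Because the syncer only talks to this interface, the same engine can sit between a local database and a web service, or between two apps.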
4 Implementing Sync in Objective-C
UUIDs
example:
DBCE017A-AF95-11DE-98BE-228156D89593
how to generate:
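CFUUIDRef uuid = CFUUIDCreate(kCFAllocatorDefault);
CFStringRef uuidString = CFUUIDCreateString(kCFAllocatorDefault, uuid);
CFRelease(uuid); // the caller also owns uuidString and must release it when done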
Dates
NSDate represents an absolute point in time, independent of time zone
You can compare two NSDate objects or two timestamps
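Whichever representation you pick, sync decisions should compare absolute instants rather than local wall-clock times. The Python sketch below illustrates the idea with the standard `datetime` module; the fixed UTC-4 offset standing in for Boston and the sample times are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset for Boston in October (UTC-4), for illustration only.
boston = timezone(timedelta(hours=-4))

two_pm_boston = datetime(2009, 10, 16, 14, 0, tzinfo=boston)
six_pm_utc = datetime(2009, 10, 16, 18, 0, tzinfo=timezone.utc)
last_sync = datetime(2009, 10, 16, 16, 0, tzinfo=timezone.utc)

# Time-zone-aware datetimes compare as instants, not as wall-clock readings.
same_instant = two_pm_boston == six_pm_utc
changed_since_sync = two_pm_boston > last_sync

# The same comparison using plain timestamps (seconds since the Unix epoch).
same_timestamp = two_pm_boston.timestamp() == six_pm_utc.timestamp()
```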
Networking
Protocol choices:
- HTTP
- GameKit
- BEEP/BLIP-based protocol
- Roll your own (not recommended)
- Using Bonjour/ZeroConf
You have a few choices for your protocol. If you're communicating with a server, you can make yourself a web service API. Your sync source is just wrapping code that makes NSURLRequests. I made the unfortunate choice of using HTTP locally over the network. Writing an HTTP server that just has to talk with one other device isn't too hard, but it was a really dumb architectural decision. Routers like to screw with it, even when it's on a non-standard port.
SBIPhoneSyncSource
SBLocalDBSyncSource
- (id) initWithLastSyncDate:(NSDate *)lastSync
                    sourceA:(NSObject<SBSyncSource> *)a
                    sourceB:(NSObject<SBSyncSource> *)b
                  operation:(SBSyncEngineOperation)newOperation;
- (IBAction) start:(id)sender;
- (IBAction) cancel:(id)sender;
- (NSDate *) lastSyncDate;
- (NSString *) currentlySyncingObjectName;
- (SBSyncEngineOperation) operation;
- (NSObject<SBSyncSource> *) sourceA;
- (NSObject<SBSyncSource> *) sourceB;
- (NSObject<SBSyncEngineDelegate> *) delegate;
- (void) setDelegate:(NSObject<SBSyncEngineDelegate> *)theDelegate;
typedef enum SBSyncEngineOperation {
    SBSyncEngineOperationSync = 0,  // Time-based sync A and B
    SBSyncEngineOperationMerge = 1, // Non-destructive merge between A and B
    SBSyncEngineOperationCopy = 2,  // Replace B's contents with A's
} SBSyncEngineOperation;
@protocol SBSyncEngineDelegate
- (void) syncEngineFinishedSyncingSuccesfully:(SBSyncEngine *)syncEngine;
- (void) syncEngineDidCancel:(SBSyncEngine *)syncEngine;
- (void) syncEngine:(SBSyncEngine *)syncEngine abortedWithError:(NSError *)err;
- (BOOL) syncEngine:(SBSyncEngine *)syncEngine pausedWithRecoverableError:(NSError *)err; // return YES to continue, NO to cancel
- (void) syncEngine:(SBSyncEngine *)syncEngine syncedObjects:(NSUInteger)objects ofTotal:(NSUInteger)total;

// return the index of the correct choice
- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine encounteredEntryConflictWithA:(NSDictionary *)aEntryInfo b:(NSDictionary *)bEntryInfo;
- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine encounteredFolderConflictWithA:(NSDictionary *)aFolderInfo b:(NSDictionary *)bFolderInfo;
- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine encounteredSimpleEntityConflictWithKeyPath:(NSString *)keyPath aValue:(id)aValue bValue:(id)bValue;
@end
Questions/Discussion