http://code.google.com/p/protobuf/
Overview
What are Protocol Buffers
Structure of a .proto file
How to use a message
How messages are encoded
Important points to remember
More stuff
What are Protocol Buffers?
Serialization format by Google
  used by Google for almost all internal RPC protocols and file formats
  (currently 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.)
Goals:
  Simplicity
  Compatibility
  Performance
Comparison: XML vs. Protobuf
XML                         Protobuf
Readable by humans          Binary format
Self-describing             Garbage without .proto file
Big files                   Small files (3-10 times smaller)
Slow to serialize/parse     Fast (20-100 times faster)
.xsd (complex)              .proto (simple, less ambiguous)
Complex access              Easy access
Comparison: XML vs. Protobuf (cont'd)
XML:
  <person>
    <name>John Doe</name>
    <email>jdoe@example.com</email>
  </person>
  (== 69 bytes, 5,000-10,000 ns to parse)
Protobuf (shown in text form; the size refers to the binary encoding):
  Person {
    name: "John Doe"
    email: "jdoe@example.com"
  }
  (== 28 bytes, 100-200 ns to parse)
Access with XML:
  cout << "Name: "
       << person.getElementsByTagName("name")->item(0)->innerText()
       << endl;
  cout << "Email: "
       << person.getElementsByTagName("email")->item(0)->innerText()
       << endl;
Access with Protobuf:
  cout << "Name: " << person.name() << endl;
  cout << "Email: " << person.email() << endl;
Example
message Person {
  required string name = 1;   // name of person
  required int32 id = 2;      // id of person
  optional string email = 3;  // email address
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
  repeated PhoneNumber phone = 4;
}
From .proto to runtime
Messages defined in .proto file(s)
Compiled into source code with protoc
  C++
  Java
  Python
  More languages via add-ons (C#, PHP, Perl, ObjC, etc.)
Usage in code
Passed via network/files
Message Definition
Messages defined in .proto files
Syntax: message [MessageName] { ... }
Can be nested
Will be converted to e.g. a C++ class
Message Contents
Each message may have
  Messages
  Enums:
    enum <name> {
      valuename = value;
    }
  Fields
    Each field is defined as
    <rule> <type> <name> = <id> {[<options>]};
Field rules
Required
  Exactly once (msg.fieldname())
Optional
  None or one
  Query existence (msg.has_fieldname())
Repeated
  None to infinite (ordered array)
  Query count (msg.fieldname_size())
  Use option packed=true for efficient encoding
Required is required
Field rule required is a tough decision
Once a field is required, it must stay required forever unless compatibility between versions is to be broken (not such a good idea)
Some engineers at Google advise to never use required
Field types
.proto type                    Note                                           C++ type
float/double                                                                  float/double
int32/int64                    Variable length, primarily suited for          int32/int64
                               positive numbers
uint32/sint32 (ditto ...64)    Variable length, un/signed                     (u)int32/(u)int64
(s)fixed32, (s)fixed64         Fixed length (un/signed), better suited        (u)int32/(u)int64
                               for values > 2^28 / 2^56
bool                                                                          bool
string                         UTF-8 or 7-bit ASCII                           std::string
bytes                          Arbitrary sequence of bytes                    std::string
Message or Enum type                                                          Corresponding class
Field id (tag)
Each field has a unique tag (id) (1..2^29-1)
  (Unique per message definition)
  Variable-length encoded: 1..15 == one byte
Identifies the field within the binary format
  i.e. field names are NOT used in the encoded data
Assigned for life
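Why ids 1..15 cost one byte can be checked with a few lines of C++. This is only a sketch: it assumes the key layout (id << 3) | wire_type and the 7-bits-per-byte varint storage described in the encoding part of this deck, and KeySizeInBytes is a hypothetical helper, not a function of the protobuf library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the protobuf API): number of bytes the
// key of a field occupies on the wire. The key is (id << 3) | wire_type,
// stored as a varint with 7 payload bits per byte.
int KeySizeInBytes(uint32_t id) {
  assert(id >= 1 && id <= 536870911u);  // valid ids: 1 .. 2^29 - 1
  uint32_t key = id << 3;  // the 3 wire-type bits do not change the size
  int bytes = 1;
  while (key >= 0x80) {    // more than 7 bits left: one more byte needed
    key >>= 7;
    ++bytes;
  }
  return bytes;
}
```

With this layout, ids 1..15 need one byte, ids 16..2047 need two, and the largest id needs five.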
Options, namespaces and importing
Options:
  [default = value] sets a default value (beware: default values are not encoded!)
  [packed = false/true] more compact encoding of repeated fields
  [deprecated = false/true] marks a field as obsolete
  option optimize_for = SPEED/CODE_SIZE/LITE_RUNTIME; (file-level option)
  Java package and outer class name
Namespaces/packages can be defined via e.g. package com.example.message;
Importing of messages defined in other files via import "filename.proto";
Example (again)
message Person {
  required string name = 1;   // name of person
  required int32 id = 2;      // id of person
  optional string email = 3;  // email address
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
  repeated PhoneNumber phone = 4;
}
Overview: Where are we?
What are Protocol Buffers
Structure of a .proto file
How to use a message
How messages are encoded
Important points to remember
More stuff
From .proto to code
protoc compiler creates classes in desired language
Example: protoc --cpp_out=. person.proto
  will create person.pb.cc and person.pb.h
Generated code
// name
bool has_name() const;
void clear_name();
const string& name() const;
void set_name(const string& value);
void set_name(const char* value);
string* mutable_name();

// id
bool has_id() const;
void clear_id();
int32_t id() const;
void set_id(int32_t value);

// phone
inline int phone_size() const;
inline void clear_phone();
inline const RepeatedPtrField<Person_PhoneNumber>& phone() const;
inline RepeatedPtrField<Person_PhoneNumber>* mutable_phone();
inline const Person_PhoneNumber& phone(int index) const;
inline Person_PhoneNumber* mutable_phone(int index);
Setting values in a message
#include "person.pb.h"

Person person;
person.set_name("Hans Mustermann");
person.set_id(1234);  // example value; id is required, IsInitialized() fails without it
person.set_email("hans@muster.mann");
// std::string* name = person.mutable_name();
// *name = "Hans Mustermann";

Person::PhoneNumber* phone;
phone = person.add_phone();
phone->set_number("03012345678");
phone->set_type(Person::WORK);
phone = person.add_phone();
phone->set_number("0170987654321");
phone->set_type(Person::MOBILE);
// check for validity: person.IsInitialized() == true?
Serializing
Serialize data via
  person.SerializeAsString() (returns std::string)
  person.SerializeToString(std::string*)
  person.SerializeToFileDescriptor(int)
  person.SerializeToOstream(std::ostream*)
  person.SerializeToArray(void*, int size)
Example:
  std::ofstream file(filename, std::ios::out | std::ios::binary);
  if (false == file.fail()) {
    person.SerializeToOstream(&file);
  }
Parsing
Parse via
  person.ParseFromIstream(std::istream*)
  person.ParseFromString(const std::string&)
  person.ParseFromFileDescriptor(int)
  person.ParseFromArray(const void*, int size)
Example:
  std::ifstream file(filename, std::ios::in | std::ios::binary);
  if (false == file.fail()) {
    person.ParseFromIstream(&file);
  }
Retrieving values from a message
#include "person.pb.h"

Person person;
person.ParseFromIstream(&file);  // file: an open std::ifstream, as on the Parsing slide
if (person.IsInitialized()) {
  cout << "Name: " << person.name() << endl;
  if (person.has_email()) {
    cout << "Email: " << person.email() << endl;
  }
  for (int i = 0; i < person.phone_size(); i++) {
    cout << "Phone: " << person.phone(i).number() << endl;
  }
}
Overview: Where are we?
What are Protocol Buffers
Structure of a .proto file
How to use a message
How messages are encoded
Important points to remember
More stuff
Message encoding
Full description at
code.google.com/intl/apis/protocolbuffers/docs/encoding.html
Messages are encoded in binary format, as many key/value pairs
Key = (id << 3) | wire_type
  0 = Varint (u/s/int32/64, bool, enum)
  1 = 64-bit (fixed64, sfixed64, double)
  2 = Length-delimited (string, bytes, messages, packed repeated fields)
  5 = 32-bit (fixed32, sfixed32, float)
Little endian
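The key formula can be tried out directly. The following is a minimal sketch of building and splitting a key; MakeKey, KeyToId and KeyToWireType are hypothetical names chosen for illustration and are not part of the protobuf library.

```cpp
#include <cassert>
#include <cstdint>

// Wire types as listed on the slide.
enum WireType : uint32_t {
  kVarint = 0,
  kFixed64 = 1,
  kLengthDelimited = 2,
  kFixed32 = 5
};

// Hypothetical helper: Key = (id << 3) | wire_type.
uint32_t MakeKey(uint32_t id, WireType type) {
  assert(id >= 1 && id <= 536870911u);  // valid ids: 1 .. 2^29 - 1
  return (id << 3) | type;
}

// The upper bits of a key hold the field id ...
uint32_t KeyToId(uint32_t key) { return key >> 3; }

// ... and the lowest three bits hold the wire type.
WireType KeyToWireType(uint32_t key) {
  return static_cast<WireType>(key & 0x7);
}
```

For the Person example, name (id 1, length-delimited) gets the key 0x0A and the repeated phone field (id 4, length-delimited) gets 0x22.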
Message encoding: Varints
The lower 7 bits per byte are used to store data; if the MSB is set, the next byte belongs to this value as well.
Example: 1 → 0000 0001
         300 (100101100) → 1010 1100 0000 0010
Example: message Test1 { required int32 a = 1; }
  and setting a to 150 (0x96) is encoded as 08 96 01:
  08 = 0000 1000, so wire type = 0 (varint) and id = 1
  96 01 = 1001 0110 0000 0001 → 000 0001 ++ 001 0110 → 10010110 = 150
Generic/unsigned integer types use varint encoding
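The varint rule above can be sketched in a few lines of C++. AppendVarint and ReadVarint are hypothetical helper names used for illustration only; the protobuf library ships its own coded stream classes for this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append `value` as a varint (7 data bits per byte,
// least significant group first, MSB set on every byte except the last).
void AppendVarint(uint64_t value, std::vector<uint8_t>* out) {
  assert(out != nullptr);
  while (value >= 0x80) {
    out->push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<uint8_t>(value));
}

// Hypothetical helper: read one varint starting at in[*pos] and advance
// *pos past it. Returns false if the input is truncated or longer than
// the 10 bytes a 64-bit value can occupy.
bool ReadVarint(const std::vector<uint8_t>& in, size_t* pos, uint64_t* value) {
  assert(pos != nullptr && value != nullptr);
  uint64_t result = 0;
  for (int shift = 0; shift < 64 && *pos < in.size(); shift += 7) {
    const uint8_t byte = in[(*pos)++];
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {  // MSB clear: this was the last byte
      *value = result;
      return true;
    }
  }
  return false;
}
```

Encoding 300 with this sketch yields the bytes AC 02, and 150 yields 96 01, matching the examples on the slide.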
Message encoding: ZigZag
int32 stores negative values in full length
Signed integer types (e.g. sint32) use ZigZag
Mapping small positive AND negative values to small sizes:
   0 → 0
  -1 → 1
  +1 → 2
  -2 → 3
  +2 → 4
  i.e. n → (n << 1) ^ (n >> 31)
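The ZigZag formula for sint32 can be written out as below. This is a sketch with hypothetical function names; it assumes that right-shifting a negative int32_t is an arithmetic shift and that unsigned-to-signed conversion wraps, which C++20 guarantees and common compilers have always done.

```cpp
#include <cassert>
#include <cstdint>

// ZigZag for sint32: n -> (n << 1) ^ (n >> 31).
// The left shift is done on the unsigned value to avoid signed overflow;
// n >> 31 is 0 for n >= 0 and all ones for n < 0 (arithmetic shift assumed).
uint32_t ZigZagEncode32(int32_t n) {
  return (static_cast<uint32_t>(n) << 1) ^ static_cast<uint32_t>(n >> 31);
}

// Inverse mapping: the lowest bit carries the sign, the rest the magnitude.
int32_t ZigZagDecode32(uint32_t z) {
  return static_cast<int32_t>((z >> 1) ^ (0u - (z & 1u)));
}

// Reproduces the mapping table from the slide.
void CheckZigZagTable() {
  assert(ZigZagEncode32(0) == 0u);
  assert(ZigZagEncode32(-1) == 1u);
  assert(ZigZagEncode32(1) == 2u);
  assert(ZigZagEncode32(-2) == 3u);
  assert(ZigZagEncode32(2) == 4u);
}
```

The encoded value is then written as an ordinary varint, so -1 takes one byte instead of the full length a plain int32 would need.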
Message encoding: The rest
string, bytes: varint-encoded length + raw data
float, double: as is (little endian)
Repeated fields:
  packed=false: tag/id occurs multiple times
  packed=true: tag + size + elements
Unused fields are not part of the message
strings
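As a worked example of the length-delimited rule, the sketch below hand-encodes the two string fields of the Person from the XML comparison (name = 1, email = 3, as in the example .proto) and arrives at the 28 bytes quoted there. AppendStringField and EncodeExamplePerson are hypothetical helpers, and the required id field is left out on purpose, just as in that comparison, so real generated code would not accept this message as initialized.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: append one string field (wire type 2) to `out`
// as key, varint length, raw bytes. To keep the sketch short it only
// supports id <= 15 and values of at most 127 bytes, so that key and
// length each fit into a single byte.
void AppendStringField(uint32_t id, const std::string& value,
                       std::string* out) {
  assert(out != nullptr);
  assert(id >= 1 && id <= 15 && value.size() <= 127);
  out->push_back(static_cast<char>((id << 3) | 2));  // key
  out->push_back(static_cast<char>(value.size()));   // length
  out->append(value);                                // raw data
}

// Hand-encodes name and email of the example Person.
std::string EncodeExamplePerson() {
  std::string out;
  AppendStringField(1, "John Doe", &out);          // 1 + 1 + 8  = 10 bytes
  AppendStringField(3, "jdoe@example.com", &out);  // 1 + 1 + 16 = 18 bytes
  return out;
}
```

Each string costs two bytes of overhead here, which is where the difference to the 69 bytes of the XML version comes from.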
Overview: Where are we?
What are Protocol Buffers
Structure of a .proto file
How to use a message
How messages are encoded
Important points to remember
More stuff
Important points to remember
Always remember that backward and forward compatibility is goal #1 with protobuf
Be absolutely sure about a field's long-term necessity when using required
Choose id numbers 1-15 for often-used values (more efficiently encoded)
Choose appropriate data types, based on expected values: signed/unsigned/generic may result in better encoding
Updating a message
To update a message
  Define new fields as repeated or optional and set sensible default values (for backwards compatibility)
  Do not change tags/ids and do not recycle tags/ids (when e.g. removing optional fields in an update, make sure that the id will not be used again, preferably by prefixing the name of the obsolete field with e.g. OBSOLETE_)
  Some data type changes (e.g. between ints) possible
  When changing defaults, remember that default values are not encoded but always used as defined in .proto
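Why adding fields under new ids does not break old code follows from the wire format: a reader can skip any field it does not know by looking at the wire type alone. The sketch below shows this for a hypothetical "old" reader that only knows field id 1 (a varint); ReadVarint and ReadFieldOne are illustrative names, not protobuf library functions, and only the four wire types listed in this deck are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads one varint at data[*pos]; returns false on truncated input.
bool ReadVarint(const std::vector<uint8_t>& data, size_t* pos,
                uint64_t* value) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64 && *pos < data.size(); shift += 7) {
    const uint8_t byte = data[(*pos)++];
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *value = result;
      return true;
    }
  }
  return false;
}

// Hypothetical "old" reader: extracts varint field 1 and skips every
// other field based on its wire type. Returns false if field 1 is
// missing or the data is malformed.
bool ReadFieldOne(const std::vector<uint8_t>& data, uint64_t* field_one) {
  assert(field_one != nullptr);
  size_t pos = 0;
  bool found = false;
  while (pos < data.size()) {
    uint64_t key = 0;
    uint64_t tmp = 0;
    if (!ReadVarint(data, &pos, &key)) return false;
    const uint64_t id = key >> 3;
    size_t skip = 0;  // payload bytes to jump over
    switch (key & 0x7) {
      case 0:  // varint
        if (!ReadVarint(data, &pos, &tmp)) return false;
        if (id == 1) {
          *field_one = tmp;
          found = true;
        }
        break;
      case 1:  // 64-bit
        skip = 8;
        break;
      case 5:  // 32-bit
        skip = 4;
        break;
      case 2:  // length-delimited: the length itself is a varint
        if (!ReadVarint(data, &pos, &tmp)) return false;
        if (tmp > data.size()) return false;
        skip = static_cast<size_t>(tmp);
        break;
      default:  // wire types not covered in this deck
        return false;
    }
    if (skip > data.size() - pos) return false;  // truncated payload
    pos += skip;
  }
  return found;
}
```

Given the bytes 08 96 01 followed by fields 2 and 3 that this reader has never heard of, it still returns 150 for field 1. Recycling an id for a different type would defeat this, which is why ids are assigned for life.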
More stuff
Extensions
  Define ranges of tags/ids that can be defined in another .proto file
message OneMessage {
  extensions 100 to max;
}
// Elsewhere...
extend OneMessage {
  optional Foo foo_ext = 100;
  optional Bar bar_ext = 101;
  optional Baz baz_ext = 102;
}
More stuff (cont'd)
Services
  Possible to create stubs for RPC services using protobuf, e.g.
service SearchService {
  rpc Search(SearchRequest) returns (SearchResponse);
}
Self-describing messages, Reflection
Custom options
Questions?