
ITS - TECNICO SUPERIORE PER LE APPLICAZIONI DI DATA INTEGRATION IN AMBIENTE CLOUD

v 1.0 - Lorenzo Zimolo


TECNICO SUPERIORE PER LE APPLICAZIONI
DI DATA INTEGRATION IN AMBIENTE
CLOUD
v 1.0
Lorenzo Zimolo
lorenzo.zimolo@sinesy.it
google.com/+LorenzoZimoloSinesy
Cloud computing
"Cloud computing is a model for enabling ubiquitous,
convenient, on-demand network access to a shared
pool of configurable computing resources (e.g., networks,
servers, storage, applications, and services) that can be rapidly
provisioned and released with minimal management effort or
service provider interaction."
NIST, the National Institute of Standards and Technology
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Cloud computing: why?
pay per use (many forms)
availability (distributed and replicated data centers)
security (physical and logical)
scalability (from zero to infinite, with caps)
flexibility
computation power
costs
reduced in-house infrastructure
So....
think about services you need
rethink IT role in your company
where is my data?
Examples of current providers
Google
Amazon
Microsoft
VMware
Force.com
IBM
...
In Italy
Telecom Italia
Aruba
....
Every provider brings its own ideas and implementation of the cloud.
Service models
Software as a Service (SaaS). The capability provided to the consumer is to use the provider's applications
running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client
interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control
the underlying cloud infrastructure including network, servers, operating systems, storage, or
even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure
consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by
the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers,
operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the
application-hosting environment.
Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage,
networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which
can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure
but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking
components (e.g., host firewalls).
Service models stack
Cloud computing dimensions
Google Cloud Implementation
All three service models: SaaS, PaaS, IaaS.
All main cloud aspects are addressed, with particular emphasis on:
less infrastructure management
pay only for what you use (pay as you go)
high scalability
Focused on public cloud, multitenant SaaS or reserved execution runtime instances.
Chrome
Download from:
https://www.google.com/intl/en/chrome/browser/
Incognito mode:
https://support.google.com/chrome/answer/95464?hl=it
App store:
https://chrome.google.com/webstore/category/apps
Google SaaS
Gmail (and more!)
Google Apps (for Business, for Education) http://www.google.com/enterprise/apps/business/
Google Maps (for Business) http://www.google.com/enterprise/mapsearth/
Google Analytics http://www.google.com/analytics/
Google Ads (AdSense, AdWords) http://www.google.com/ads/
Google Apps for Business
Google Apps site:
https://www.google.com/intx/it/enterprise/apps/business/
Included products:
https://www.google.com/intx/it/enterprise/apps/business/products.html
Google Apps Marketplace:
http://www.google.com/enterprise/marketplace/
Google Apps for Business
Documentation:
https://www.google.com/intx/it/enterprise/apps/business/resources/library.html
Webinars:
https://www.google.com/intx/it/enterprise/apps/business/resources/recorded-webinars.html
Gmail support and technical documentation:
https://support.google.com/mail/?hl=en#topic=3394144
Google Apps for Business
Drive documentation:
https://support.google.com/drive/?hl=en#topic=14940
Documentation for all products:
https://support.google.com/
Status dashboard:
http://www.google.com/appsstatus#hl=en&v=status
E-mail protocols
RFC: Request For Comments. Official documents describing Internet protocols.
POP3
RFC: http://tools.ietf.org/html/rfc1939
Wikipedia: http://it.wikipedia.org/wiki/Post_Office_Protocol
http://en.wikipedia.org/wiki/Post_Office_Protocol
IMAP:
RFC: http://tools.ietf.org/html/rfc3501
Wikipedia: http://en.wikipedia.org/wiki/Internet_Message_Access_Protocol
E-mail protocols
SMTP
RFC: https://james.apache.org/server/rfclist/smtp/rfc0821.txt
Wikipedia: http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol
Explanation:
http://computer.howstuffworks.com/e-mail-messaging/email3.htm
Revision control - Versioning
CVS: http://en.wikipedia.org/wiki/Concurrent_Versions_System
Revision Control: http://en.wikipedia.org/wiki/Revision_control
How it works:
http://betterexplained.com/articles/a-visual-guide-to-version-control/
Open source products:
SVN
Git
List: http://en.wikipedia.org/wiki/List_of_revision_control_software
Google Cloud Platform
IaaS and PaaS by Google
Official site: https://cloud.google.com/
Products: https://cloud.google.com/products/
Documentation: https://cloud.google.com/developers/
Google Cloud Platform
Load balancer
Doc:
http://en.wikipedia.org/wiki/Load_balancing_(computing)
HTTP/HTTPS
HTTP:
http://en.wikipedia.org/wiki/HTTP
HTTPS:
http://en.wikipedia.org/wiki/HTTP_Secure
Google App Engine (GAE)
Fully Managed Platform
Easy Development & Deployment
Focus On Your Code Not Your Server
Automatic Scaling
Popular Programming Language Support
Flexible and Scalable Application Storage
Services (Cron, Queue, Memcache, etc)
Datastore
Versioning and Traffic Splitting
Local Developer Tools
Third-party Frameworks and Extensions
Google Cloud Platform Developer Docs
Home page:
https://developers.google.com/cloud/
App Engine:
https://developers.google.com/appengine/
GAE Java development environment
Java SE Development Kit (JDK) 7
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Java 7 API
http://docs.oracle.com/javase/7/docs/api/
Eclipse IDE for Java EE Developers 4.3
https://www.eclipse.org/downloads/
App Engine development
Google SDK
https://developers.google.com/appengine/downloads
Google Plugin for Eclipse:
https://developers.google.com/appengine/docs/java/tools/eclipse
https://developers.google.com/eclipse/docs/install-eclipse-4.3
Java Web Technologies: Servlets and JSPs
Tutorial:
http://courses.coreservlets.com/Course-Materials/csajsp2.html
http://docs.oracle.com/javaee/5/tutorial/doc/bnafd.html
App Engine
What it is:
https://developers.google.com/appengine/docs/whatisgoogleappengine
Scalability
the app gets a spike of traffic in the event of an earthquake
Reliability
it's useless if it does not work at the time of a disaster
Cost efficiency
it's too expensive to provision enough hardware to handle the peak traffic; it would sit idle most of the time
Traditional solution
What if you have:
Hardware failures
Traffic Spike
Growing Big Data
No initial funding
No one to build or operate it
Hosting challenges
The Google Way!
App Engine encourages Google's best practices for scalability and reliability.
Non-relational data model by Datastore/Bigtable
sharding, denormalization...
Portable and fine-grained app design
fast request handling to optimize server resource utilization
independent of each physical server
It's not just a hosting service: App Engine empowers you to design your app the Googley way!
Results...
Significantly lower Total Cost of Ownership
Economy of scale
Easy to develop and deploy
Free to start - no initial cost
Lower operational cost
no security patches, upgrades, etc.
24x7 operation by Google SREs
Request Queue
App Engine watches the Pending Request Queue of each version
Let's see what would happen if your app gets a traffic spike
Instances are dynamically added/removed based on queue size
[Figure: Pending Request Queue, Idle Instances, Pending Latency]
GAE Console and status check
https://appengine.google.com/
Log View (quota errors!)
Versions
Downtime notify group: google-appengine-downtime-notify
Status page: https://code.google.com/status/appengine
Features, prices, quotas
Features: https://developers.google.com/appengine/features/
Prices: https://developers.google.com/appengine/pricing
Quotas: https://developers.google.com/appengine/docs/quotas
Quotas limit resource usage to protect the App Engine system
Quota errors!
To avoid quota errors, enable billing and set a budget for resources.
Billing
To enable Billing on your account:
Admin Console > Billing Status > Enable Billing
Your app will then run under a billing-enabled account
Application versions
Version is stored in
Java: WEB-INF/appengine-web.xml
Python: app.yaml
You can use any text as the version identifier
Only one version can be the default
Min/Max idle instance settings apply to the default version
It takes a few moments to switch to a new default version. This depends on the complexity of the application, its start-up time and the current load on your application.
Traffic splitting can route a percentage of traffic to non-default versions
Logging API
https://developers.google.com/appengine/docs/java/logs/
Logging in Java
https://developers.google.com/appengine/docs/java/?csw=1#Java_Logging
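App Engine's Java runtime picks up messages written through the standard java.util.logging API, and they then show up in the Log View. The following is only a minimal sketch of that idiom: the class LoggingSketch and its greet method are hypothetical names made up for illustration, not part of any App Engine API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingSketch {
    // One logger per class, named after the class, is the usual java.util.logging idiom.
    private static final Logger log = Logger.getLogger(LoggingSketch.class.getName());

    // Hypothetical request handler: logs at different levels and returns a greeting.
    static String greet(String user) {
        if (user == null || user.isEmpty()) {
            log.warning("greet called without a user name");
            return "hello anonymous";
        }
        log.info("greeting user " + user);
        // FINE messages are hidden unless the logging configuration enables them.
        log.log(Level.FINE, "detailed diagnostic message");
        return "hello " + user;
    }

    public static void main(String[] args) {
        System.out.println(greet("joe"));
        System.out.println(greet(""));
    }
}
```

Which levels are actually recorded depends on the logging configuration of the application, so treat the levels used here as placeholders.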
Apache JMeter
Load test and performance measurement suite.
Web Site: https://jmeter.apache.org/
App Stats for GAE
Application efficiency (and cost!)
App Stats:
https://developers.google.com/appengine/docs/java/tools/appstats
Exercise: enable appstats in your Java application
Enable app stats
<filter>
    <filter-name>appstats</filter-name>
    <filter-class>com.google.appengine.tools.appstats.AppstatsFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>appstats</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
<servlet>
    <servlet-name>appstats</servlet-name>
    <servlet-class>com.google.appengine.tools.appstats.AppstatsServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>appstats</servlet-name>
    <url-pattern>/appstats/*</url-pattern>
</servlet-mapping>
Authentication & Authorization
Authentication: who are you?
Authorization: what can you do in the app?
Authentication
Otherwise: Custom code
Custom Authentication
Develop any custom authentication mechanism within the application
Enterprise SSO systems (only if they use ports 80/443)
Username/password
How?
Set up the application as "Open to All Google Account users"
Do not restrict access
Restricting access
https://developers.google.com/appengine/docs/java/users/
https://developers.google.com/appengine/docs/java/config/webxml#Security_and_Authentication
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/users/package-summary
web.xml (any logged-in user):
<security-constraint>
    <web-resource-collection>
        <web-resource-name>profile</web-resource-name>
        <url-pattern>/profile/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <role-name>*</role-name>
    </auth-constraint>
</security-constraint>
ADMIN (application administrators only):
<security-constraint>
    <web-resource-collection>
        <web-resource-name>developer</web-resource-name>
        <url-pattern>/developer</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <role-name>admin</role-name>
    </auth-constraint>
</security-constraint>
Exercise: add a protected URL in your app
User API
Identify basic details about the logged in user
Nickname
Email
User ID (empty for federated user)
Federated identity (Open ID identifier)
Federated provider (url of federation provider)
Functions
Create Login URL
get current user
is current user admin
Environment variable
Domain (AUTH_DOMAIN)
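Java
import com.google.appengine.api.users.UserService;
import com.google.appengine.api.users.UserServiceFactory;

// Inside a servlet method; req is the HttpServletRequest
UserService userService = UserServiceFactory.getUserService();
if (req.getUserPrincipal() != null) {
    // logged in user
    if (userService.isUserAdmin()) {
        // logged in user is an Admin
        // (isUserAdmin() throws IllegalStateException when nobody is logged in)
    }
} else {
    // logged out
}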
Configuration files
web.xml
https://developers.google.com/appengine/docs/java/config/webxml
appengine-web.xml
https://developers.google.com/appengine/docs/java/config/appconfig
Data storage options
Cloud Datastore
Cloud SQL
Google Cloud Storage
Used for large flat files
File I/O interface
Blobstore
Key/Value storage
Google Apps
Docs, Spreadsheets, Drawings, etc.
Not appropriate for application back-end data
Storage options
https://developers.google.com/appengine/docs/java/storage
Cloud Datastore
https://developers.google.com/appengine/docs/java/datastore/
Cloud Datastore: motivation
Single Instance
Performance limited by machine resources
Single point of failure
Replication (copy)
Consistency among instances
Sharding (split among machines)
Lock control (transaction)
Consistency
Strong Consistency
Data is always consistent among all database instances
even just after a write operation
even after a crash in the middle of a write operation
Eventual Consistency
It takes time until all data becomes consistent after a write
(Think of DNS as an example)
Datastore secrets
[Figure: three boxes labelled "Table", each receiving several "write" arrows]
Terminology
Concept                           Datastore   RDBMS
Category of object                Kind        Table
One entry/object                  Entity      Row
Unique identifier of data entry   Key         Primary Key
Individual data                   Property    Field
Entities
Each object in the datastore is an entity
Each entity has a unique key
Each entity has one or more named properties
Can be multi-valued (== tests if any value matches)
Variety of data types (int, float, boolean, String, Date, etc.)
Each entity is of a particular kind
Example entities (entity kind, key, properties):
BlogEntry
    Key: ID=1234
    name: joe@ex.com
    message: xxxxx
    date: 1/1/2012 12:32
Following
    Key: joe@ex.com
    email: joe@ex.com
    following: [usr2@ex.com, usr3@ex.com]
    followers: []
Following
    Key: usr2@ex.com
    email: usr2@ex.com
    following: []
    followers: [joe@ex.com]
Create an entity
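import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import java.util.Date;

DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
// "Employee" is the kind; the key ID is assigned by the datastore on put()
Entity employee = new Entity("Employee");
employee.setProperty("name", "Antonio Salieri");
employee.setProperty("hireDate", new Date());
employee.setProperty("attendedHrTraining", true);
datastore.put(employee);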
Available APIs
Java
Low-level API
The best performance, but more coding
JDO/JPA
More portability by Java standard APIs
Third party frameworks
Objectify, Twig, Slim3...
Sophisticated features with better performance
Python
DB API
Traditional Datastore API for Python
NDB API (New DB)
Automatic caching, sophisticated queries, atomic transactions
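Querying the Datastore
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;

Query query = new Query("Person");
Query.Filter nameFilter = new FilterPredicate(
    "name", FilterOperator.EQUAL, "John");
query.setFilter(nameFilter);
PreparedQuery results = datastore.prepare(query);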
Filters
Filter on:
property values
keys
ancestors
Filter on property values
Equality Filter (Equal to)
IN -- Member of a list
Inequality Filters
Not equal to
Less than
Less than or equal to
Greater than
Greater than or equal to
Adding more filters
Query query = new Query("Person");
Query.Filter filter1 = new FilterPredicate(...);
Query.Filter filter2 = new FilterPredicate(...);
Query.Filter comboFilter =
CompositeFilterOperator.and(filter1, filter2);
query.setFilter(comboFilter);
Sort by properties
Sort by ascending or descending value of a property
Some restrictions on sorting (discussed later)
query.addSort("name", SortDirection.ASCENDING);
Query for descendants
An entity can have a parent
Specify the parent when you create the entity
You can query for descendants of an entity
Conference1
Workshop1
Workshop2
Ticket2
Ticket1
Ancestor query
To add an ancestor filter to a query:
Query query = new Query("Kind");
query.setAncestor(parentKey);
Index and queries
Datastore Query:
SELECT * FROM Person
WHERE height < 72
ORDER BY height DESC
Index table for height:
height: 76
height: 75
height: 73
height: 71
height: 70
height: 68
height: 67
height: 64
Range Scan on Bigtable
Entities in the query result:
first_name: John, height: 71
first_name: Bob, height: 70
first_name: Kate, height: 68
Datastore requires an index for every query
Otherwise the query fails
This is unlike an index in an RDBMS, which is only used to improve performance
The Index Scan makes it possible for query performance to scale with the size of the result set, not the data set.
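The idea of answering a query with a range scan over a sorted index can be sketched in plain Java with a TreeMap. This is only a toy model and not the Datastore API: the class HeightIndexSketch is hypothetical, heights are assumed unique, and the names Ann, Carl and Dana are invented to fill the index rows that have no name on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HeightIndexSketch {
    // Toy stand-in for the "index table for height": sorted height -> entity name.
    static TreeMap<Integer, String> buildIndex() {
        TreeMap<Integer, String> index = new TreeMap<>();
        index.put(76, "Ann");   // hypothetical name
        index.put(75, "Carl");  // hypothetical name
        index.put(73, "Dana");  // hypothetical name
        index.put(71, "John");
        index.put(70, "Bob");
        index.put(68, "Kate");
        return index;
    }

    // Models: SELECT * FROM Person WHERE height < limit ORDER BY height DESC
    static List<String> shorterThan(TreeMap<Integer, String> index, int limit) {
        List<String> result = new ArrayList<>();
        // headMap(limit, false) is the contiguous key range strictly below limit;
        // descendingMap() walks that range from the largest key downwards.
        for (Map.Entry<Integer, String> row
                : index.headMap(limit, false).descendingMap().entrySet()) {
            result.add(row.getValue() + ":" + row.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shorterThan(buildIndex(), 72)); // prints [John:71, Bob:70, Kate:68]
    }
}
```

The loop only visits rows inside the requested range, which is why the work grows with the size of the result set rather than with the size of the whole index.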
Single property index
Key of Index table
Query for kind Person
first_name >= "A" and first_name < "C"
Scan range [Person first_name A, Person first_name B]
Person / first_name / Audrey
Person / first_name / Ben
Person / first_name / Bridgit
Person / first_name / Cathy
Two single-property indices are created automatically: ascending and descending
Queries supported by single-property indexes
Equality filters on one or more properties
first_name = 'Bob' AND last_name = 'James'
Inequality filters on one property
first_name >= 'B' AND first_name < 'C' AND first_name != 'Bob'
(first_name != 'Bob' will be executed as first_name < 'Bob' OR first_name > 'Bob')
One sort order
ORDER BY last_name ASC
Complex Queries
A Composite Index must be explicitly configured.
Equality filter + Inequality filter
Query for kind Person
last_name="Smith"
first_name > "A" and first_name < "D"
Scan range [Person Smith B, Person Smith C]
Kind / last_name / first_name
Person / Raley / Jane
Person / Smith / Ben
Person / Smith / Cathy
Person / Smith / Daniel
Person / Thomas / Alice
How to create indexes
App Engine creates single-property indexes for all properties
You can run queries in the development server to create custom indexes
You can create or edit the index configuration file
Java (XML):
WEB-INF/datastore-indexes.xml
WEB-INF/appengine-generated/datastore-indexes-auto.xml
Multi valued fields
Entity kind Person
name = Brian
lucky_number = {1, 5, 7, 9}
Kind / property / value
Person / lucky_number / 1
Person / lucky_number / 5
Person / lucky_number / 7
Person / lucky_number / 9
An index entry is created for EVERY value of a property
Multi valued properties in queries
Entity kind Person
name = Brian
lucky_number = {1, 5, 7, 9}
Multi-valued properties match a query if AT LEAST ONE value matches ALL the filters
Matches query for:
Kind is Person
lucky_number = 1
Matches query for:
Kind is Person
lucky_number > 2 and lucky_number < 6
Does NOT match query for:
Kind is Person
lucky_number > 1 and lucky_number < 5
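The matching rule for multi-valued properties can be checked with a few lines of plain Java. This is a sketch of the rule only, not of how the Datastore evaluates queries internally; the class MultiValueMatchSketch and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class MultiValueMatchSketch {
    // True when at least one single value satisfies BOTH bounds (both exclusive).
    static boolean matchesRange(List<Integer> values, int greaterThan, int lessThan) {
        for (int v : values) {
            if (v > greaterThan && v < lessThan) {
                return true;
            }
        }
        return false;
    }

    // An equality filter matches when any one of the values equals the target.
    static boolean matchesEqual(List<Integer> values, int target) {
        return values.contains(target);
    }

    public static void main(String[] args) {
        List<Integer> luckyNumbers = Arrays.asList(1, 5, 7, 9);
        System.out.println(matchesEqual(luckyNumbers, 1));    // true
        System.out.println(matchesRange(luckyNumbers, 2, 6)); // true: 5 satisfies both filters
        System.out.println(matchesRange(luckyNumbers, 1, 5)); // false: no single value is in (1, 5)
    }
}
```

The last case is the instructive one: 5 satisfies "> 1" and 1 is the closest to "< 5", but no single value satisfies both filters at once, so the entity does not match.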
Missing properties
Entities that lack the filtered property, or hold an unindexed value for it, are not included in results
A missing property is not equal to Null/None
Query for:
Kind = Person
last_name != Arthur
Kind     last_name   first_name
Person   Anderson    Jane
Person   Arundel     (missing)
Person   (missing)   Jenny        X (not returned)
Inequality filters
Inequality filters: limited to one property per query
OK:
Query for:
first_name = Cathy
last_name > Able
last_name < Mooney
X (not allowed):
Query for:
first_name > Cathy
last_name > Able
Inequality filters and sorting
A property with an inequality filter must be sorted first
OK:
Query for:
first_name = Cathy
last_name > Able
sort by last_name
X (not allowed):
Query for:
last_name > Able
sort by first_name
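The two restrictions on inequality filters (one property per query, and that property sorted first) can be expressed as a small checker. This is a hypothetical helper written for illustration in plain Java; the Datastore enforces these rules itself, and QueryRulesSketch is not part of its API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class QueryRulesSketch {
    // Returns null when the query shape is allowed, otherwise a short reason.
    // inequalityProps: properties used in inequality filters (may repeat).
    // sortProps: properties in the sort order, first sort first.
    static String check(List<String> inequalityProps, List<String> sortProps) {
        Set<String> distinct = new LinkedHashSet<>(inequalityProps);
        if (distinct.size() > 1) {
            return "inequality filters on more than one property: " + distinct;
        }
        if (distinct.size() == 1 && !sortProps.isEmpty()) {
            String prop = distinct.iterator().next();
            if (!sortProps.get(0).equals(prop)) {
                return "first sort order must be on " + prop;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        // last_name > Able AND last_name < Mooney: one property, allowed
        System.out.println(check(Arrays.asList("last_name", "last_name"), none));
        // first_name > Cathy AND last_name > Able: two properties, rejected
        System.out.println(check(Arrays.asList("first_name", "last_name"), none));
        // last_name > Able, sort by first_name: rejected
        System.out.println(check(Arrays.asList("last_name"), Arrays.asList("first_name")));
    }
}
```

Equality filters are deliberately left out of the checker because the slides place no such limit on them.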
JOINs are not permitted
A relational query with a JOIN:
SELECT * FROM PERSON p, ADDRESS a
WHERE a.person_id = p.id AND p.age > 25
AND a.country = 'US'
Use Denormalization
Maintain country in Person
SELECT * FROM Person
WHERE age > 25 AND country = 'US'
It's a known practice for any scalable database design
Aggregation Queries not supported
Datastore does not support aggregation queries (group by, having, sum, avg, max, min)
Use a special entity that maintains aggregated values
counter entity
be careful not to make the entity a bottleneck (limit of about 1 update/sec)
use the sharded counter pattern
or Memcache putIfUntouched() + Datastore insert
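The sharded counter idea can be sketched in memory: instead of one counter entity that every request updates, the count is split over several shards, each write picks one shard at random, and a read sums them all. The class below is a plain-Java simulation with hypothetical names; a real implementation would store each shard as its own Datastore entity.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

class ShardedCounterSketch {
    // One slot per shard; in the Datastore each slot would be a separate entity.
    private final AtomicLongArray shards;

    ShardedCounterSketch(int numShards) {
        shards = new AtomicLongArray(numShards);
    }

    // Each increment touches one randomly chosen shard, spreading the write load.
    void increment() {
        int shard = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(shard);
    }

    // Reading the total means summing all shards.
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        ShardedCounterSketch counter = new ShardedCounterSketch(20);
        for (int i = 0; i < 1000; i++) {
            counter.increment();
        }
        System.out.println(counter.total()); // prints 1000
    }
}
```

The trade-off is that writes get cheaper to scale while reads get more expensive, so the number of shards (20 here, chosen arbitrarily) should follow the expected write rate.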
Aggregation queries not supported
Use batch processing to aggregate values asynchronously
Backend instance
App Engine MapReduce
Datastore Statistics
for counting entities
updated once per day
Use Sorting for MIN() or MAX()
Sort by a property: the first entity will have min/max value.
Cost of indexes
Indexes consume Datastore space & instance hours
Take the cost of indexes into account for cost estimation
Read:
Understanding Write Costs
How Entities and Indexes are Stored
Building a new index for a large set of entities may take a long time
Datastore statistics
View Statistics in the Admin Console
Datastore > Datastore Statistics
Setting property as unindexed
conference.setUnindexedProperty(
"mainContact", "Adam Bolivar");
Good practice -- don't index long strings, such as descriptions: you won't usually be querying them. To search them, you would use the Search API.
In addition to any unindexed properties you declare explicitly, those typed as long text strings (Text) and long byte strings (Blob) are automatically treated as unindexed.
Deleting old index
Existing indexes remain when you change the index config file
This lets you leave an older version of the app running while new indexes are being built, and revert to the older version if needed
To delete an unused index:
Update index config file
Then:
appcfg.sh vacuum_indexes myapp
Query and indexes
Every query to the datastore uses an index:
an automatically-generated single property index or
a custom index
When do the indexes get updated?
When are indexes updated?
Every entity update has multiple writes:
commit phase:
writes data in log
write phase:
writes data to datastore
updates indexes (might take longer than writing to the datastore)
If commit phase succeeds, write phase is guaranteed to succeed, but might not happen immediately
What happens if I query an entity before the indexes are updated?
What do you get when you run a query?
Queries always return results from the INDEX
What's an ancestor query?
It's a query that uses an ancestor filter.
Results only include descendants of a specific entity.
Results are strongly consistent -- completely up to date
to get latest updates, use ancestor queries
ancestor queries force the index to update
Entity groups
When you create an entity, you can specify its parent
Each entity is its own entity group by default
Parent-child relationships are forever!
Terminology Tip:
Entities that descend from a common ancestor are in an entity group
Conference1
Workshop1
Workshop2
Ticket2
Ticket1
Eventual vs Strong Consistency
Queries using an ancestor filter force applicable index updates to complete
strongly consistent
Queries without an ancestor filter get results from the last index update
eventually consistent
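The difference between the two kinds of query can be illustrated with a deliberately simplified in-memory model: a commit only queues an index update, a query without an ancestor filter reads the index as it currently is, and a query with an ancestor filter first forces the pending updates to complete. The class and method names are hypothetical, and the model ignores entity groups, keys and everything else the real Datastore does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ConsistencySketch {
    // Committed writes whose index update has not been applied yet.
    private final Deque<String> pendingIndexUpdates = new ArrayDeque<>();
    // The index that queries read from.
    private final List<String> index = new ArrayList<>();

    // Commit phase: the write is accepted, the index is updated later.
    void commit(String entityName) {
        pendingIndexUpdates.add(entityName);
    }

    // Write phase: apply every pending index update.
    void applyPending() {
        while (!pendingIndexUpdates.isEmpty()) {
            index.add(pendingIndexUpdates.poll());
        }
    }

    // Query without an ancestor filter: sees the last index update only.
    List<String> globalQuery() {
        return new ArrayList<>(index);
    }

    // Query with an ancestor filter: forces pending updates to complete first.
    List<String> ancestorQuery() {
        applyPending();
        return new ArrayList<>(index);
    }

    public static void main(String[] args) {
        ConsistencySketch store = new ConsistencySketch();
        store.commit("Ticket1");
        System.out.println(store.globalQuery());   // prints []
        System.out.println(store.ancestorQuery()); // prints [Ticket1]
    }
}
```

In this toy model the global query becomes up to date only after something has applied the pending updates, which is the "eventually" in eventual consistency.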
Entity groups: used for...
Entity groups are useful for ancestor queries to get strongly consistent results.
What else are entity groups used for?
Entity groups are used in:
Ancestor queries
Transactions
What is a transaction?
Transaction - a set of operations performed on a data store that preserves ACID characteristics.
Atomicity
Each transaction is "All or Nothing"
Consistency
Each transaction brings the datastore from one valid state to another
Isolation
Concurrent execution of transactions does not break consistency
Durability
Committed results of transaction persist after hardware failures
Snapshot isolation
All reads in a transaction reflect the state of the Datastore at the time
the transaction started
If an entity is modified or deleted in the transaction,
a query or get returns the original version of the entity, or nothing if the
entity did not exist then
https://developers.google.com/appengine/docs/python/datastore/transactions#Isolation_and_Consistency
https://developers.google.com/appengine/articles/transaction_isolation
Optimistic concurrency
What happens if multiple transactions try to update the same entity group at the same time?
The first transaction to commit its changes succeeds.
All others fail.
The others can try again to apply their changes to the updated data.
A transaction commits its changes only if the values updated by the transaction have not changed since the snapshot was taken.
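The commit rule above can be sketched with an in-memory stand-in. VersionedCell is a hypothetical class, not part of the Datastore API: it keeps a value plus a version counter, rejects a commit whose snapshot is stale, and offers a simple retry loop.

```java
import java.util.function.IntUnaryOperator;

// In-memory stand-in for one entity group: a value plus a version counter.
final class VersionedCell {
    private int value;
    private long version;

    VersionedCell(int initial) {
        this.value = initial;
    }

    synchronized long version() { return version; }
    synchronized int value() { return value; }

    // Commit succeeds only if nobody committed since the snapshot was taken.
    synchronized boolean commit(long snapshotVersion, int newValue) {
        if (version != snapshotVersion) {
            return false; // conflict: another transaction committed first
        }
        value = newValue;
        version++;
        return true;
    }

    // Take a snapshot, compute the new value, try to commit; retry on conflict.
    boolean update(IntUnaryOperator change, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long snapshot;
            int current;
            synchronized (this) {
                snapshot = version;
                current = value;
            }
            if (commit(snapshot, change.applyAsInt(current))) {
                return true;
            }
        }
        return false; // gave up after maxAttempts conflicts
    }
}
```

If two callers take a snapshot at the same version, only the first commit() returns true; the second must re-read and try again, which is what update() does.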
Using transactions
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Transaction txn = datastore.beginTransaction();
try {
    Key empKey = KeyFactory.createKey("Employee", "Joe");
    // get() and put() run inside the current transaction
    Entity employee = datastore.get(empKey);
    /* ... reading and writing on employee ... */
    datastore.put(employee);
    txn.commit();
} finally {
    // still active here means commit() was not reached or did not succeed
    if (txn.isActive()) {
        txn.rollback();
    }
}
Entity operations in a transaction
Single-group transactions:
update a single entity group
Cross-group (XG) transactions:
update up to 5 entity groups
Operations on an entity group:
create entities
update entities
delete entities
Queries inside a transaction return results from the state of the datastore before the transaction started
Transaction limits
Limit to the number of entity groups
1 for single entity group transaction
5 for cross-entity group transactions
Limit to number of updates per entity group per second
Usually between 1 and 5 updates per second
Duration limits
Max duration of 60 seconds
Best practices
A transaction should happen quickly to minimize chances of external
changes that conflict with the transaction
Prepare data outside the transaction
Prepare keys outside the transaction
Use the keys to fetch entities inside the transaction
Let's code!
Working with entities:
https://developers.google.com/appengine/docs/java/datastore/entities#Java_Working_with_entities
Queries:
https://developers.google.com/appengine/docs/java/datastore/queries
https://developers.google.com/appengine/docs/java/datastore/projectionqueries
What is memcache?
Memcache is an in-memory key-value data store
Put a value with a key
Get a value with a key
key : value
"user001" : "John Doe"
"user002" : "Larry Page"
A key or value can be anything that is serializable
Memcache is a shared service accessed via App Engine APIs.
Why memcache?
Improve Application Performance
Reduce Application Cost
What is memcache for?
Caching In Front of Datastore
Cache entities for low-latency reads
Integrated into most ORM frameworks (ndb, Objectify, ...)
Caching for read-heavy operations
User authentication tokens and session data
API call or other computation results
Semi-durable Shared state Across App Instances
Sessions
Counters / Metrics
Application Configurations
How fast is memcache?
[Chart: Datastore query latency compared with Memcache read latency]
Memcache APIs
Java
JCache APIs
GAE Low-Level Memcache APIs
Objectify for Datastore
Python
google.appengine.api.memcache module
ndb for Datastore
General pattern for Datastore
Coordinate data read with Datastore:
Check if the Memcache value exists
if it does, display/use the cached value directly; otherwise fetch the value from Datastore and write it to Memcache
Coordinate data write with Datastore:
Update the Memcache value
to detect and handle race conditions, use put-if-untouched (compare and set)
Write the value to Datastore
optionally, use the task queue for background writes
Java code example
import java.util.logging.Level;
import com.google.appengine.api.memcache.*;
...
MemcacheService syncCache = MemcacheServiceFactory.getMemcacheService();
syncCache.setErrorHandler(ErrorHandlers.getConsistentLogAndContinue(Level.INFO));
byte[] value = (byte[]) syncCache.get(key); // read from cache
if (value == null) {
    value = getDataFromDb(key); // cache miss: fetch value from datastore
    syncCache.put(key, value);  // write to cache (key and value must be serializable)
}
...
Batch operations
getAll(), putAll(), deleteAll()
A single read or write operation for multiple memcache entries
Note: batching further improves Memcache performance
Batch size < 32 MB
Atomic operations
increment(key, delta), incrementAll(...)
Provide atomic increment of numeric value(s)
getIdentifiable(), putIfUntouched()
A mechanism for concurrent requests to update a value consistently
Note: helps manage memcache data consistency in multi-instance/concurrent environments
Other
Asynchronous calls
Provide a mechanism to make non-blocking calls for memcache operations.
Namespace
Logically separates data layers for different application purposes (such as multi-tenancy) across many GAE services, such as Datastore, Memcache, Task Queue, etc.
Memcache is volatile
Entries can be evicted anytime for various reasons:
entry reaches expiration
entry is evicted because memcache memory is full
memcache server fails
It's important to handle cache misses gracefully!
Tip: implement write-through logic by backing memcache with datastore in your application!
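A minimal sketch of write-through plus graceful cache-miss handling, using plain maps as stand-ins for memcache and the datastore. WriteThroughCache, evictAll() and storeReads are hypothetical names for illustration; none of them are App Engine APIs.

```java
import java.util.HashMap;
import java.util.Map;

// A volatile cache in front of a durable store, both simulated with maps.
final class WriteThroughCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> store; // stands in for the datastore
    int storeReads = 0;                      // counts how often we fell back to the store

    WriteThroughCache(Map<String, String> store) {
        this.store = store;
    }

    // Write-through: durable write first, then refresh the cache.
    void put(String key, String value) {
        store.put(key, value);
        cache.put(key, value);
    }

    // A miss (never cached, expired, or evicted) falls back to the store.
    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            storeReads++;
            value = store.get(key);
            if (value != null) {
                cache.put(key, value); // repopulate for later reads
            }
        }
        return value;
    }

    // Simulates expiration, memory pressure, or a memcache server failure.
    void evictAll() {
        cache.clear();
    }
}
```

After evictAll(), the next get() still returns the right value because the store, not the cache, is the source of truth; only the latency changes.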
Memcache is not transactional
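Cached value after each step:
$100: Instance 1 reads $100
$100: Instance 2 reads $100
$80: Instance 2 deducts $20
$70: Instance 1 deducts $30 (Instance 2's deduction is lost; the correct balance would be $50)
Tip: use getIdentifiable() and putIfUntouched(...) for optimistic locking.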
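The lost update, and the compare-and-set fix, can be reproduced with a small in-memory sketch. Balance is a hypothetical class built on java.util.concurrent.atomic.AtomicLong. It compares the value itself, whereas memcache's putIfUntouched() is documented as working on an identifiable handle for the entry, so treat this only as an analogy.

```java
import java.util.concurrent.atomic.AtomicLong;

// Shared balance updated by several instances.
final class Balance {
    private final AtomicLong amount;

    Balance(long initial) {
        amount = new AtomicLong(initial);
    }

    long read() {
        return amount.get();
    }

    // Unconditional write: overwrites whatever another instance stored.
    void blindWrite(long newAmount) {
        amount.set(newAmount);
    }

    // Conditional write: succeeds only if the value is still the one we read.
    boolean writeIfUntouched(long expected, long newAmount) {
        return amount.compareAndSet(expected, newAmount);
    }

    // Optimistic loop: re-read and retry until the conditional write succeeds.
    void deduct(long delta) {
        while (true) {
            long seen = read();
            if (writeIfUntouched(seen, seen - delta)) {
                return;
            }
        }
    }
}
```

With blindWrite(), replaying the four steps of the slide ends at 70. With writeIfUntouched(), the second instance's stale write is rejected and a retry through deduct() ends at 50.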
Free memcache is limited
Your application should function without memcache.
"My application does NOT have enough memcache!" Tips:
Only cache what is useful and necessary.
Compression
Improve the cache-hit rate
Dedicated memcache
Cache size in GB (QPS 10K/GB)
Memcache key points
1. Memcache is supported natively in GAE. Take advantage of it to improve your GAE application's performance.
2. Memcache supports the open standard JCache API. Many advanced features are available via the GAE Memcache APIs to suit your application's needs, e.g. batch, atomic and asynchronous operations.
3. Seamless integration with GAE Datastore in a few libraries such as Python ndb and Java Objectify.
4. Data that is read frequently and written rarely is most suitable for use with Memcache.
5. Handle Memcache's volatility in your application.
6. Use Memcache wisely: it is not an unlimited resource.
Task queues and cron
Task Queues
Push Queues
Pull Queues
Cron or Scheduled Tasks
Task basic concepts
Task: a unit of work such as 'write object to datastore' or 'send an e-mail'
All versions of an application share queues
push queues for automatic execution
pull queues to programmatically consume tasks
Tasks have a unique name
generated automatically if not assigned
inserting a new task with an existing name will fail
[Diagram: a push queue and a pull queue, each holding tasks that are handed to instances. Two task layouts are shown:]
QueueName, TaskName, Tag, Payload
QueueName, TaskName, URL + Params (e.g. ?id=x), Method (GET, POST, etc.), RetryOptions (MaxBackoff, MaxDoublings, MinBackoff, AgeLimit, RetryLimit), TaskRetryCount, ExecutionCount, TaskETA, ExecutionDelay, Tag, Payload
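To make the RetryOptions fields more concrete, here is a hedged sketch of how MinBackoff, MaxBackoff and MaxDoublings could shape the delay between retries. The growth rule is an assumption about how these options are interpreted: the delay starts at MinBackoff, doubles MaxDoublings times, then grows linearly, and never exceeds MaxBackoff. BackoffSchedule is a hypothetical helper, not an App Engine class.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {
    // Delays (in seconds) before each of the first `retries` retries.
    static List<Long> delays(long minBackoff, long maxBackoff, int maxDoublings, int retries) {
        List<Long> out = new ArrayList<>();
        long delay = minBackoff;
        // Linear increment used once doubling stops: 2^maxDoublings * minBackoff.
        long step = minBackoff << maxDoublings;
        for (int i = 0; i < retries; i++) {
            out.add(Math.min(delay, maxBackoff));
            delay = (i < maxDoublings) ? delay * 2 : delay + step;
            if (delay > maxBackoff) {
                delay = maxBackoff; // cap, which also avoids overflow on long runs
            }
        }
        return out;
    }
}
```

For example, delays(10, 300, 3, 8) yields 10, 20, 40, 80, 160, 240, 300, 300 under this rule.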
Tasks overview
The task queue is a simple way to perform work outside of a user request.
Push Queue features:
Executed ASAP
May cause new instances
Frontend or Backend
- 10 minute deadline (Frontend)
- Unlimited deadline (Backend)
Max 100KB task size
Pull Queue features:
Task leased by worker
REST interface (w/ACL)
- Can be outside App Engine
Max 1MB task size
Push task creation code
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
Queue queue = QueueFactory.getDefaultQueue();
queue.add(TaskOptions.Builder.withUrl("/worker").param("id", "123"));
//calls url "/worker" via POST with param id=123
Deleting a task
Admin interface (one task or entire queue)
Code
Java:
one named task:
QueueFactory.getQueue("foo").deleteTask("myTask");
or all tasks:
QueueFactory.getQueue("foo").purge();
Pull queues
Add task:
Queue q = QueueFactory.getQueue("pull-queue");
q.add(
TaskOptions.Builder.withMethod(TaskOptions.Method.PULL)
.payload("hello world"));
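Lease then delete:
List<TaskHandle> tasks = q.leaseTasks(3600, TimeUnit.SECONDS, 100); // lease up to 100 tasks for one hour
// Do work!!!
q.deleteTask(tasks); // delete the tasks once they have been processed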
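The lease-then-delete cycle can be illustrated with an in-memory model. LeaseQueue is a hypothetical class, not the Task Queue API. Tasks are identified by their payload for brevity, and the current time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeaseQueue {
    // payload -> time (ms) until which the task is leased; 0 means available
    private final Map<String, Long> leasedUntil = new LinkedHashMap<>();

    void add(String payload) {
        leasedUntil.put(payload, 0L);
    }

    // Lease up to `limit` available tasks for `leaseMs`, as seen at time `now`.
    List<String> lease(long now, long leaseMs, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : leasedUntil.entrySet()) {
            if (out.size() == limit) {
                break;
            }
            if (e.getValue() <= now) {      // not leased, or the lease has expired
                e.setValue(now + leaseMs);  // hide the task from other workers
                out.add(e.getKey());
            }
        }
        return out;
    }

    // A worker deletes tasks only after finishing the work.
    void delete(List<String> payloads) {
        for (String p : payloads) {
            leasedUntil.remove(p);
        }
    }

    int size() {
        return leasedUntil.size();
    }
}
```

A task that is leased but never deleted becomes available again once its lease expires, so another worker can pick it up if the first one crashes.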
Cron
App Engine allows tasks to be scheduled at defined times or regular
intervals via cron.
Cron tasks use a queue named "__cron".
At the predefined time, it executes a GET request to the specified path
Java
cron.xml:
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/recache</url>
    <description>Repopulate the cache</description>
    <schedule>every 2 minutes</schedule>
    <target>version-2</target>
  </cron>
</cronentries>
Cron configuration
Parameters:
url: the url to call (escape &, <, >, ', ")
schedule: the times/dates to execute the task
timezone: optional, the standard zoneinfo name (defaults to UTC)
target: optional, the target version of the application (defaults to the default version)
Schedule format:
every 12 hours
every 5 minutes from 10:00 to 14:00
2nd,third mon,wed,thu of march 17:00
every monday 09:00
1st monday of sep,oct,nov 17:00
every day 00:00
Please note: specify "synchronized" to execute at a regular interval regardless of how long each run takes.
every 2 hours synchronized
Cloud Storage
A fast, scalable, highly available, strongly consistent object store
Objects can have almost arbitrary size (max 5 TB)
Use cases: SongPop, Ubisoft
Cost: storage $0.026/GB/month + egress traffic $0.10/GB
https://developers.google.com/storage/docs/overview
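As a worked example of the pricing formula, the sketch below computes a monthly bill from the two rates quoted on this slide ($0.026 per GB stored per month, $0.10 per GB of egress). StorageCost and the sample volumes are hypothetical, and real pricing has more components than these two.

```java
final class StorageCost {
    // Rates taken from the slide; treat them as a snapshot, not current pricing.
    static final double STORAGE_USD_PER_GB_MONTH = 0.026;
    static final double EGRESS_USD_PER_GB = 0.10;

    // Monthly cost = stored GB * storage rate + egress GB * egress rate.
    static double monthlyUsd(double storedGb, double egressGb) {
        return storedGb * STORAGE_USD_PER_GB_MONTH + egressGb * EGRESS_USD_PER_GB;
    }
}
```

For instance, 500 GB stored with 200 GB of egress comes to 500 × 0.026 + 200 × 0.10 = 13 + 20 = 33 USD per month.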
Cloud Storage Structure
1. Projects
a. All data belongs to a project
2. Buckets
a. Buckets are the basic data containers
b. Buckets belong to a project
3. Objects
a. Objects are the individual pieces of data
b. Objects belong to a bucket
Note: No hierarchical structure of objects or buckets (i.e., no folders)
Buckets
Buckets are the basic container
Buckets cannot be nested
Bucket names must conform to standard Domain Name System (DNS) naming conventions
Bucket names share a single global namespace across all of Google Cloud Storage
Must be unique
Don't put any confidential information into a bucket name
Buckets Geographical Locations
EU
US
experimental: regional buckets in the US
Can specify region at bucket creation time
Objects
Objects are the immutable pieces of data you store in Google Cloud Storage
Object names are unique within a bucket
Object names can be up to 1024 unicode characters
Directory structure:
No concept of directories. Everything is a blob of data.
Slashes ('/') are legal in object names, and you can mimic directory listings by using a slash as the delimiter parameter
myfirstbucket/faucets/grohe_201x.jpg
myfirstbucket/showers/grohe_202b.jpg
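How a flat namespace can mimic directories is easier to see in code. The sketch below is a plain in-memory illustration, not the Cloud Storage API: DelimiterListing is a hypothetical helper that, given a prefix and a delimiter, returns objects directly under the prefix plus the "sub-directory" prefixes below it.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DelimiterListing {
    // Lists the immediate "children" of `prefix` in a flat list of object names.
    static Set<String> list(List<String> objectNames, String prefix, String delimiter) {
        Set<String> out = new TreeSet<>();
        for (String name : objectNames) {
            if (!name.startsWith(prefix)) {
                continue; // outside the requested "directory"
            }
            int cut = name.indexOf(delimiter, prefix.length());
            // No further delimiter: a plain object. Otherwise collapse everything
            // after the next delimiter into one common prefix.
            out.add(cut < 0 ? name : name.substring(0, cut + delimiter.length()));
        }
        return out;
    }
}
```

With the object names faucets/grohe_201x.jpg and showers/grohe_202b.jpg, listing with an empty prefix and "/" as delimiter gives faucets/ and showers/, which reads like a directory listing even though no directories exist.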
Objects
Objects are strongly consistent
There is no limit on how many objects you can put into a bucket
Listing is eventually consistent
For speed, you can index the objects using an index service if you plan to store more than a few thousand objects (e.g., use Cloud Datastore as your index)
Objects can be up to 5 TB in size
Cloud console
Manage billing
Create and manage projects
Create and manage buckets
Browse buckets
Delete objects
Access control
Allows you to share your objects and buckets with:
Google Account User
Google Apps Domain
Google Groups
All Authenticated Users
All Users
Anonymous Users
ACLs
An Access Control List entry consists of:
Grantee - who
Google Storage ID
Google account email address
Google group email address
Google Apps domain
Special identifier - AllAuthenticatedUsers / AllUsers
Permission - what they can do
READ/WRITE/FULL_CONTROL
permissions are concentric: WRITE includes READ, and FULL_CONTROL includes both
Access Control List = (Grantee + Permission)+, i.e. one or more entries
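A minimal sketch of concentric permissions, assuming the ordering READ < WRITE < FULL_CONTROL so that a higher permission implies the lower ones. Permission and Acl are hypothetical types for illustration and the grantee strings are made up; this is not the Cloud Storage client API.

```java
import java.util.HashMap;
import java.util.Map;

// Declaration order encodes the concentric levels.
enum Permission { READ, WRITE, FULL_CONTROL }

final class Acl {
    // One entry per grantee: grantee -> permission.
    private final Map<String, Permission> entries = new HashMap<>();

    void grant(String grantee, Permission permission) {
        entries.put(grantee, permission);
    }

    // A request is allowed if the grantee's own entry, or the AllUsers entry,
    // is at least as strong as the permission needed.
    boolean allows(String grantee, Permission needed) {
        return covers(entries.get(grantee), needed)
            || covers(entries.get("AllUsers"), needed);
    }

    private static boolean covers(Permission granted, Permission needed) {
        return granted != null && granted.compareTo(needed) >= 0;
    }
}
```

Granting WRITE to one account and READ to AllUsers lets that account read and write, while everyone else can only read.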
Access control summary
Project Team - Who can create/delete/list buckets
Bucket ACL - Who can create/delete/list objects
Object ACL - Who can read object
Bucket ACL Permissions
READ - List bucket's content
WRITE - Create/Overwrite/Delete objects in bucket
FULL_CONTROL - READ/WRITE + READ/WRITE bucket ACL
Default ACL is project-private
Project Members - READ
Project Editors - FULL_CONTROL
Project Owners - FULL_CONTROL
Default ACL can be changed with gsutil acl sub-command
Objects ACL permissions
READ - Can download object
WRITE - Does not apply
FULL_CONTROL - READ + READ/WRITE object ACL
Default ACL is project-private
Project Members - READ
Project Editors - FULL_CONTROL
Project Owners - FULL_CONTROL
Can be changed with gsutil defacl sub-command
Can specify an ACL during upload
Bucket and Object ACL are independent!
Command line tool
Access Google Cloud Storage from the command line (gsutil)
Allows for a wide range of bucket and object management tasks such as:
Create and delete buckets or objects
Get and set bucket or object ACLs
Move, copy and rename objects
In short
Google Cloud Storage is an Infrastructure as a Service (IaaS) offering that provides industrial-strength data storage.
Easy to use. Just projects, buckets and objects.
Tools that make mastering the service easy.
Provides a RESTful interface for programmatic access to perform Create, Read, Update, Delete (CRUD) operations.
You can choose the API set that best satisfies your requirements, from the native XML and JSON APIs to the App Engine APIs.
Google Cloud Storage leverages the power, reliability, speed and ubiquity of Google's worldwide network.
Cloud Storage: upload file (BlobStore API)
Code Example: upload form
<%
BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();
// "<BUCKET_NAME>" is a placeholder: replace it with the name of your bucket
String uploadUrl = blobstore.createUploadUrl("/uploadcallback",
    UploadOptions.Builder.withGoogleStorageBucketName("<BUCKET_NAME>"));
%>
. . .
<form action="<%= uploadUrl %>" method="post" enctype="multipart/form-data">
  <textarea name="title" placeholder="Your title or comment" maxlength="500"
      class="titleTextArea" required></textarea>
  <input type="file" name="fileName">
  <input class="active btn" type="submit" value="Upload">
</form>
Code example: upload call back servlet
BlobstoreService blobstoreService = BlobstoreServiceFactory.getBlobstoreService();
// Maps each form field name to the files uploaded for that field
Map<String, List<FileInfo>> blobs = blobstoreService.getFileInfos(req);
for (List<FileInfo> fileInfos : blobs.values()) {
    FileInfo fileInfo = fileInfos.get(0); // first file of this form field
    if (fileInfo.getSize() == 0) {
        continue; // skip empty uploads
    }
    String gsFileName = fileInfo.getGsObjectName();
    log.info("gs storage is " + gsFileName);
    // DO WORK
}
Cloud storage: object serving (Image service)
Code example: serving URL
ServingUrlOptions options =
    ServingUrlOptions.Builder.withGoogleStorageFileName(image.getObjectName());
ImagesService imagesService = ImagesServiceFactory.getImagesService();
try {
    // the "=s100" suffix asks for a version resized to 100 pixels
    url = imagesService.getServingUrl(options) + "=s100";
} catch (Exception ex) {
    // we are in development env
}
API Documentation
Blobstore: https://developers.google.com/appengine/docs/java/blobstore/
Cloud storage and blobstore: https://developers.google.com/appengine/docs/java/blobstore/#Java_Using_the_Blobstore_API_with_Google_Cloud_Storage
Image services: https://developers.google.com/appengine/docs/java/images/
Course Evaluation
1. Multiple choice question test: 60%
2. Final programming exercise: 20%
3. Date example: 10% (remember: version 1)
4. Tutorial example: 10% (remember: version 2)
Versioning System
Team development with SVN or Git
Programming test
PhotoShare App
