Document Service Introduction
Zend_Cloud_DocumentService abstracts the interfaces to all major
document databases - both in the cloud and locally deployed - so developers can access their
common functionality through one API. In other words, an application can make use of these
databases and services with no concern over how the application will be deployed. The data
source can be chosen through configuration changes alone at the time of deployment.
Document databases and services are increasingly common in application development. These
data sources are somewhat different from traditional relational data sources, as they eschew
complex relationships for performance, scalability, and flexibility. Examples of
document-oriented services include Amazon SimpleDB and Azure Table Storage.
The Simple Cloud API offers some flexibility for vendor-specific features with an
$options array in each method signature. Some adapters require certain
options that also must be added to the $options array. It is a good
practice to retrieve these options from a configuration file to maintain compatibility with
all services and databases; unrecognized options will simply be discarded, making it
possible to use different services based on environment.
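As a rough sketch of this practice, the snippet below keeps vendor-specific method options in one array, as it might be loaded from a configuration file, and forwards the same array to whichever adapter is configured. The updateFields() helper and the $methodOptions variable are hypothetical, and passing the options as the last argument of updateDocument() is an assumption based on the description above. The merge and verify_etag keys are the ones listed in the option tables in this section.

```php
<?php
// Hypothetical option set, as it might be read from a configuration file.
// 'merge' is recognized by the SimpleDB adapter and 'verify_etag' by the
// Windows Azure adapter; each adapter is expected to discard the other key.
$methodOptions = array(
    'merge'       => array('tags' => true),
    'verify_etag' => true,
);

/**
 * Hypothetical helper: update some fields of a document, forwarding the
 * configured options unchanged to the adapter.
 *
 * @param object $documents  a document service adapter
 * @param string $collection collection name
 * @param mixed  $id         document ID
 * @param array  $fields     fields to update
 * @param array  $options    vendor-specific options
 */
function updateFields($documents, $collection, $id, array $fields, array $options)
{
    return $documents->updateDocument($collection, $id, $fields, $options);
}

// Usage, given an adapter in $documents:
// updateFields($documents, 'mydata', 'DocumentID', array('tags' => 'new'), $methodOptions);
```

Because the application code never inspects the options itself, switching services should only require changing the configured values.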
If further vendor-specific features are needed, the developer should extend the specific
Zend_Cloud_DocumentService adapter to add support for these features.
In this manner, vendor-specific features can be called out in the application by referring
to the Simple Cloud API extensions in the subclass of the Simple Cloud adapter.
Zend_Cloud_DocumentService_Adapter Interface
The Zend_Cloud_DocumentService_Adapter interface defines methods
that each concrete document service adapter implements. The adapters documented in
this section are Zend_Cloud_DocumentService_Adapter_SimpleDb and
Zend_Cloud_DocumentService_Adapter_WindowsAzure.
To instantiate a document service adapter, use the static method
Zend_Cloud_DocumentService_Factory::getAdapter(), which accepts
a configuration array or a Zend_Config object. The
document_adapter key should specify the concrete adapter class by
classname. Adapter-specific keys may also be passed in this configuration parameter.
Example #1 Using the SimpleDB adapter
$adapterClass = 'Zend_Cloud_DocumentService_Adapter_SimpleDb';
$documents = Zend_Cloud_DocumentService_Factory::getAdapter(array(
    Zend_Cloud_DocumentService_Factory::DOCUMENT_ADAPTER_KEY    => $adapterClass,
    Zend_Cloud_DocumentService_Adapter_SimpleDb::AWS_ACCESS_KEY => $amazonKey,
    Zend_Cloud_DocumentService_Adapter_SimpleDb::AWS_SECRET_KEY => $amazonSecret
));
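Because the adapter class and its credentials are plain configuration values, they can be kept outside the code. The following is a minimal sketch that reads them from INI text with PHP's parse_ini_string(). The section names, the placeholder credentials, and the loadDocumentConfig() helper are hypothetical; the option keys are those documented under Supported Adapter Options.

```php
<?php
// Hypothetical configuration; in practice this text would live in a file
// and be read with parse_ini_file().
$ini = <<<INI
[production]
document_adapter = "Zend_Cloud_DocumentService_Adapter_SimpleDb"
aws_accesskey = "YOUR-ACCESS-KEY"
aws_secretkey = "YOUR-SECRET-KEY"

[development]
document_adapter = "Zend_Cloud_DocumentService_Adapter_WindowsAzure"
storage_accountname = "devaccount"
storage_accountkey = "YOUR-ACCOUNT-KEY"
INI;

/**
 * Hypothetical helper: return the adapter configuration for one environment.
 */
function loadDocumentConfig($iniText, $environment)
{
    // The second argument asks for one sub-array per [section].
    $sections = parse_ini_string($iniText, true);
    if ($sections === false || !isset($sections[$environment])) {
        throw new InvalidArgumentException("No configuration for '$environment'");
    }
    return $sections[$environment];
}

$config = loadDocumentConfig($ini, 'production');

// With Zend Framework loaded, the adapter would then be created with:
// $documents = Zend_Cloud_DocumentService_Factory::getAdapter($config);
```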
Supported Adapter Options
Zend_Cloud_DocumentService_Adapter Common Options
Option key | Description | Used in | Required | Default |
document_class | Class to use to represent returned documents. The class provided must extend Zend_Cloud_DocumentService_Document to ensure compatibility with all document services. For all methods that return a document or collection of documents, this class will be used. | Constructor | No | Zend_Cloud_DocumentService_Document |
documentset_class | Class to use to represent collections of documents, Zend_Cloud_DocumentService_DocumentSet by default. Typically, objects of this class will be returned by listDocuments() and query(). Any class provided for this configuration value must extend Zend_Cloud_DocumentService_DocumentSet. | Constructor | No | Zend_Cloud_DocumentService_DocumentSet |
Zend_Cloud_DocumentService_Adapter_SimpleDb Options
Option key | Description | Used in | Required | Default |
query_class | Class to use for creating and assembling queries for this document service; select() will create objects of this class name, as will listDocuments(). | Constructor | No | Zend_Cloud_DocumentService_Adapter_SimpleDb_Query |
aws_accesskey | Your Amazon AWS access key | Constructor | Yes | None |
aws_secretkey | Your Amazon AWS secret key | Constructor | Yes | None |
http_adapter | HTTP adapter to use in all access operations | Constructor | No | Zend_Http_Client_Adapter_Socket |
merge | If a boolean true, all attribute values are merged. You may also specify an array of key pairs, where the key is the attribute key to merge, and the value indicates whether or not to merge; a boolean true value will merge the given key. Any attributes not specified in this array will be replaced. | updateDocument() | No | True |
return_documents | If a boolean true, query() returns a Zend_Cloud_DocumentService_DocumentSet object containing Zend_Cloud_DocumentService_Document objects (default case); otherwise, it returns an array of arrays. | query() | No | True |
Zend_Cloud_DocumentService_Adapter_WindowsAzure Options
Option key | Description | Used in | Required | Default |
query_class | Class to use for creating and assembling queries for this document service; select() will create objects of this class name, as will listDocuments(). | Constructor | No | Zend_Cloud_DocumentService_Adapter_WindowsAzure_Query |
default_partition_key | The default partition key to use if none is specified in the document identifier. Windows Azure requires a two-part document ID, consisting of a PartitionKey and a RowKey. The PartitionKey will typically be common across your database or a collection, while the RowKey will vary. As such, this setting allows you to specify the default PartitionKey to use for all documents. If not specified, the adapter will default to using the collection name as the PartitionKey. | Constructor, setDefaultPartitionKey() | No | Name of whatever collection the document belongs to |
storage_accountname | Windows Azure account name | Constructor | Yes | None |
storage_accountkey | Windows Azure account key | Constructor | Yes | None |
storage_host | Windows Azure access host, default is table.core.windows.net | Constructor | No | table.core.windows.net |
storage_proxy_host | Proxy hostname | Constructor | No | None |
storage_proxy_port | Proxy port | Constructor | No | 8080 |
storage_proxy_credentials | Proxy credentials | Constructor | No | None |
HTTP Adapter | HTTP adapter to use in all access operations | Constructor | No | None |
verify_etag | Verify ETag on the target document and perform the operation only if the ETag matches the expected value | updateDocument(), replaceDocument(), deleteDocument() | No | False |
Basic concepts
Each document-oriented service and database uses its own terminology and constructs in
its API. The Simple Cloud API identifies and abstracts a number of common concepts and
operations that are shared among providers.
Document storage consists of a number of collections, which are
logical storage units analogous to database tables in the SQL world. Collections contain
documents, which are essentially a set of key-value pairs, along
with some metadata specific to the storage engine, and are identified by a unique
document ID.
Each document has its own structure (set of fields) that does not necessarily have to
match the structure of any other document, even in the same collection. In fact, you can
change this structure after the document is created.
Documents can be retrieved by ID or by querying a collection.
Documents are represented by the class
Zend_Cloud_DocumentService_Document. Note that the document
class does not validate the supplied IDs and data, and does not enforce compatibility
with each adapter's requirements.
The document fields can be accessed using keys as object properties and as array
elements.
The basic interface of Zend_Cloud_DocumentService_Document is as
follows:
/**
 * ArrayAccess allows accessing fields by array key:
 *     $doc['fieldname']
 *
 * IteratorAggregate allows iterating over all fields:
 *     foreach ($document as $field => $value) {
 *         echo "$field: $value\n";
 *     }
 *
 * Countable provides a count of all fields:
 *     count($document)
 */
class Zend_Cloud_DocumentService_Document
    implements ArrayAccess, IteratorAggregate, Countable
{
    const KEY_FIELD = '_id';

    /**
     * $fields may be an array or an object implementing ArrayAccess.
     * If no $id is provided, it will look for a field matching KEY_FIELD to
     * use as the identifier.
     */
    public function __construct($fields, $id = null);

    public function setId($id);
    public function getId();
    public function getFields();
    public function getField($name);
    public function setField($name, $value);

    /**
     * These allow overloading, so you may access fields as if they were
     * native properties of the document
     */
    public function __get($name);
    public function __set($name, $value);

    /**
     * Alternately, you can access fields as if via native getters and
     * setters:
     *     $document->setFoo($value);       // set "Foo" field to value
     *     $value = $document->getFoo();    // get "Foo" field value
     */
    public function __call($name, $args);
}
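To make the three access styles concrete, here is a simplified stand-in class that mimics them. It is a sketch for illustration only: SketchDocument is a hypothetical name, it is not the implementation of Zend_Cloud_DocumentService_Document, and it omits document IDs entirely.

```php
<?php
/**
 * Simplified stand-in for illustration only. This is NOT the code of
 * Zend_Cloud_DocumentService_Document; it only mimics the access styles
 * described above. The #[\ReturnTypeWillChange] attribute silences
 * return-type deprecation notices on PHP 8.1+ and is parsed as a comment
 * on older versions.
 */
class SketchDocument implements ArrayAccess, IteratorAggregate, Countable
{
    protected $_fields = array();

    public function __construct(array $fields)
    {
        $this->_fields = $fields;
    }

    public function getField($name)
    {
        return isset($this->_fields[$name]) ? $this->_fields[$name] : null;
    }

    public function setField($name, $value)
    {
        $this->_fields[$name] = $value;
        return $this;
    }

    // Property overloading: $doc->name
    public function __get($name)         { return $this->getField($name); }
    public function __set($name, $value) { $this->setField($name, $value); }

    // Method overloading: $doc->getName() and $doc->setName($value)
    public function __call($name, $args)
    {
        $prefix = substr($name, 0, 3);
        $field  = substr($name, 3);
        if ($prefix === 'get') {
            return $this->getField($field);
        }
        if ($prefix === 'set' && count($args) === 1) {
            return $this->setField($field, $args[0]);
        }
        throw new BadMethodCallException("Unknown method: $name");
    }

    // ArrayAccess: $doc['name']
    #[\ReturnTypeWillChange]
    public function offsetExists($offset)      { return isset($this->_fields[$offset]); }
    #[\ReturnTypeWillChange]
    public function offsetGet($offset)         { return $this->getField($offset); }
    #[\ReturnTypeWillChange]
    public function offsetSet($offset, $value) { $this->setField($offset, $value); }
    #[\ReturnTypeWillChange]
    public function offsetUnset($offset)       { unset($this->_fields[$offset]); }

    // IteratorAggregate and Countable
    #[\ReturnTypeWillChange]
    public function getIterator() { return new ArrayIterator($this->_fields); }
    #[\ReturnTypeWillChange]
    public function count()       { return count($this->_fields); }
}

$doc = new SketchDocument(array('title' => 'Dune'));
$doc->year  = 1965;          // property notation
$doc['tag'] = 'scifi';       // array notation
$doc->setAuthor('Herbert');  // setter notation, sets the "Author" field
```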
Note: Windows Azure Document Identifiers
Windows Azure technically requires a combination of two fields to uniquely
identify documents: the PartitionKey and
RowKey, and as such, keys are fully qualified by the structure
array(PartitionKey, RowKey) -- which makes them non-portable. In most
situations, the PartitionKey will not differ for documents in a
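single collection (and potentially not even across your entire table instance). As
such, the DocumentService lets you specify keys in several ways: for example, the
default_partition_key option supplies the PartitionKey when a document identifier
does not include one, and the collection name is used if that option is not set.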
The takeaway is that you can utilize string keys if you wish to maximize portability
of your application. Just be aware that your record will contain a few extra fields
to denote the key (PartitionKey, RowKey, and
the previously undiscussed Timestamp) should you ever migrate
your data to another service.
Example #2 Creating a document
$document = new Zend_Cloud_DocumentService_Document(array(
    'key1' => 'value1',
    'key2' => 123,
    'key3' => 'thirdvalue',
), "DocumentId");
$document->otherkey = 'some more data';
echo "key 1: " . $document->key1 . "\n";    // object notation
echo "key 2: " . $document['key2'] . "\n";  // array notation
Example #3 Exploring the document data
$document = $documents->fetchDocument("mydata", $id);
echo "Document ID: " . $document->getId() . "\n";
foreach ($document->getFields() as $key => $value) {
    echo "Field $key is $value\n";
}
Exceptions
If some error occurs in the document service,
Zend_Cloud_DocumentService_Exception is thrown. If the exception
was caused by the underlying service driver, you can use the
getClientException() method to retrieve the original exception.
Since different cloud providers implement different sets of services, some drivers do
not implement certain features. In this case, the
Zend_Cloud_OperationNotAvailableException exception is thrown.
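A minimal sketch of handling both exception types follows. The tryCreateCollection() helper and its return strings are hypothetical, and the code assumes that getClientException() returns null when there is no underlying driver exception.

```php
<?php
/**
 * Hypothetical helper: try to create a collection and report what happened.
 *
 * @param  object $documents a document service adapter
 * @param  string $name      collection name
 * @return string 'created', 'unsupported', or 'failed: <message>'
 */
function tryCreateCollection($documents, $name)
{
    try {
        $documents->createCollection($name);
        return 'created';
    } catch (Zend_Cloud_OperationNotAvailableException $e) {
        // The configured adapter does not implement this feature.
        return 'unsupported';
    } catch (Zend_Cloud_DocumentService_Exception $e) {
        // Prefer the message of the underlying driver exception, if any.
        $cause   = $e->getClientException();
        $message = $cause ? $cause->getMessage() : $e->getMessage();
        return 'failed: ' . $message;
    }
}

// Usage, given an adapter in $documents:
// echo tryCreateCollection($documents, 'mydata'), "\n";
```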
Creating a collection
A new collection is created using createCollection().
Example #4 Creating a collection
$documents->createCollection("mydata");
If you call createCollection() with a collection name that
already exists, the service will do nothing and leave the existing collection untouched.
Deleting a collection
A collection is deleted by calling deleteCollection().
Example #5 Deleting a collection
$documents->deleteCollection("mydata");
Deleting a collection automatically deletes all documents contained in that collection.
Note:
Deleting a collection can take significant time for some services. You cannot
re-create a collection with the same name until the collection and all its documents
have been completely removed.
Deleting a non-existent collection will have no effect.
Listing available collections
A list of existing collections is returned by
listCollections(). This method returns an array of all the
names of collections belonging to the account you specified when you created the
adapter.
Example #6 Listing collections
$list = $documents->listCollections();
foreach ($list as $collection) {
    echo "My collection: $collection\n";
}
Inserting a document
To insert a document, you need to provide a
Zend_Cloud_DocumentService_Document object or associative array
of data, as well as the collection in which you are inserting it.
Many providers require that you provide a document ID with your document. If using a
Zend_Cloud_DocumentService_Document, you can specify this by
passing the identifier to the constructor when you instantiate the object. If using an
associative array, the name of the key that holds the ID is adapter-specific: for example,
on Azure, the ID is made up of the PartitionKey and RowKey, and on Amazon SimpleDB, the ID
is the ItemName. You may also specify the ID in the _id field to be
more portable.
As such, the easiest and most compatible way to specify the key is to use
a Document object.
Example #7 Inserting a document
// Instantiating and creating the document
$document = new Zend_Cloud_DocumentService_Document(array(
    'key1' => 'value1',
    'key2' => 123,
    'key3' => 'thirdvalue',
), "DocumentID");
// inserting into the "mydata" collection
$documents->insertDocument("mydata", $document);
Replacing a document
Replacing a document means removing all document data associated with a particular
document key and substituting it with a new set of data. Unlike
updating, this operation does not merge old and new data but
replaces the document as a whole. The replace operation, like
insertDocument(), accepts a
Zend_Cloud_DocumentService_Document document or an array of
key-value pairs that specify names and values of the new fields, and the collection in
which the document exists.
Note: Document ID is required
To replace the document, the document ID is required. Just like inserting a document,
if you use an associative array to describe the document, you will need to provide a
provider-specific key indicating the document ID. As such, the most compatible way
to replace a document across providers is to utilize a Document object, as shown in
the examples.
Example #8 Replacing a document
$document = new Zend_Cloud_DocumentService_Document(array(
    'key1' => 'value1',
    'key2' => 123,
    'key3' => 'thirdvalue',
), "DocumentID");
// Replace the document as found in the "mydata" collection
$documents->replaceDocument("mydata", $document);
You may also use an existing Document object, re-assign the fields and/or assign new
fields, and pass it to the replaceDocument() method:
$document->key4 = '4th value';
// Replace the document as found in the "mydata" collection
$documents->replaceDocument("mydata", $document);
Updating a document
Updating a document changes the key/value pairs in an existing
document. This operation does not share the replace semantics; the
values of the keys that are not specified in the data set will not be changed. You must
provide both a document key and data, either via a
Zend_Cloud_DocumentService_Document document or an array, to this
method. If the key is null and a document object is provided, the document key is used.
Example #9 Updating a document
// update one field
$documents->updateDocument("mydata", "DocumentID", array("key2" => "new value"));
// or with document; this could be a document already retrieved from the service
$document = new Zend_Cloud_DocumentService_Document(array(
    'key1' => 'value1',
    'key2' => 123,
    'key3' => 'thirdvalue',
), "DocumentID");
$documents->updateDocument("mydata", null, $document);
// copy document to another ID
$documents->updateDocument("mydata", "AnotherDocumentID", $document);
Amazon SimpleDB supports multi-value fields, so an update can merge new values into a
field's existing values instead of replacing them. The merge option may be a boolean
true, which merges all fields, or an array of key/value pairs in which each key is a
field name and each value is a boolean indicating whether to merge that field
(true merges; false does not). Any fields not listed in the
merge array will be replaced instead of merged.
Example #10 Merging document fields
// key2 is overwritten, key3 is merged
$documents->updateDocument('mydata', 'DocumentID',
    array('key2' => 'new value', 'key3' => 'additional value'),
    array('merge' => array('key3' => true))
);
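The merge semantics described above can be pictured with a small conceptual model. This is not adapter code: modelUpdate() is a hypothetical function, and representing a multi-value field as a PHP array is an assumption made purely for illustration.

```php
<?php
/**
 * Conceptual model of the merge semantics described above. This is NOT
 * adapter code; representing a multi-value field as a PHP array is an
 * assumption made for illustration.
 *
 * @param  array      $stored current field values of the document
 * @param  array      $update new field values
 * @param  bool|array $merge  true to merge every field, or field => bool pairs
 * @return array      resulting field values
 */
function modelUpdate(array $stored, array $update, $merge)
{
    foreach ($update as $field => $value) {
        $shouldMerge = ($merge === true)
            || (is_array($merge) && !empty($merge[$field]));
        if ($shouldMerge && array_key_exists($field, $stored)) {
            // Merge: keep the old value(s) and add the new one.
            $stored[$field] = array_merge((array) $stored[$field], (array) $value);
        } else {
            // Replace: the new value overwrites the old one.
            $stored[$field] = $value;
        }
    }
    return $stored;
}

$result = modelUpdate(
    array('key1' => 'value1', 'key2' => 123, 'key3' => 'thirdvalue'),
    array('key2' => 'new value', 'key3' => 'additional value'),
    array('key3' => true)
);
// $result['key2'] is replaced; $result['key3'] holds both the old and new value
```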
Deleting a document
A document can be deleted by passing its key to
deleteDocument(). Deleting a nonexistent document has no
effect.
Example #11 Deleting a document
$documents->deleteDocument("collectionName", "DocumentID");
Fetching a document
You can fetch a specific document by specifying its key.
fetchDocument() returns one instance of
Zend_Cloud_DocumentService_Document.
Example #12 Fetching a document
$document = $documents->fetchDocument('collectionName', 'DocumentID');
foreach ($document->getFields() as $key => $value) {
    echo "Field $key is $value\n";
}
Querying a collection
To find documents in the collection that meet some criteria, use the
query() method. This method accepts either a string, which is an
adapter-dependent query passed as-is to the concrete adapter, or a structured query
object, an instance of Zend_Cloud_DocumentService_Query. The method returns
a Zend_Cloud_DocumentService_DocumentSet containing instances
of Zend_Cloud_DocumentService_Document that satisfy the query.
The DocumentSet object is iterable and countable.
Example #13 Querying a collection using a string query
$docs = $documents->query(
    "collectionName",
    "RowKey eq 'rowkey1' or RowKey eq 'rowkey2'"
);
foreach ($docs as $doc) {
    $id = $doc->getId();
    echo "Found document with ID: "
        . var_export($id, true)
        . "\n";
}
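Since the returned set is both iterable and countable, it can be handed to ordinary PHP code that knows nothing about the adapter. The summarizeResult() helper below is a hypothetical sketch of that idea; it relies only on count(), foreach, and the getId() method of each document.

```php
<?php
/**
 * Hypothetical helper: summarize a query result.
 *
 * $docs may be any countable, iterable collection of objects that expose
 * getId(), such as the document set returned by query().
 *
 * @param  Countable|Traversable $docs
 * @return array with 'count' and 'ids' keys
 */
function summarizeResult($docs)
{
    $ids = array();
    foreach ($docs as $doc) {
        $ids[] = $doc->getId();
    }
    return array('count' => count($docs), 'ids' => $ids);
}

// Usage, given an adapter in $documents and a query string in $queryString:
// $summary = summarizeResult($documents->query('collectionName', $queryString));
```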
If using a structured query object, typically, you will retrieve it using the
select() method. This ensures that the query object is specific
to your adapter, which will ensure that it is assembled into a syntax your adapter
understands.
Example #14 Querying a collection with structured query
$query = $documents->select();
$query->from('collectionName')
      ->where('year > ?', array(1945))
      ->limit(3);
$docs = $documents->query('collectionName', $query);
foreach ($docs as $doc) {
    $id = $doc->getId();
    echo "Found document with ID: "
        . var_export($id, true)
        . "\n";
}
Zend_Cloud_DocumentService_Query classes do not limit which query
clauses can be used, but the clause must be supported by the underlying concrete
adapter. Currently supported clauses include:
- select() - defines which fields are returned in the result.
  Note: Windows Azure ignores this clause's argument and always returns the whole
  document.
- from() - defines the collection name used in the query.
- where() - defines the conditions of the query. It accepts three parameters: the
  condition, an array of arguments to replace "?" fields in the condition, and a
  conjunction argument, which should be "and" or "or" and which will be used to join
  this condition with previous conditions. Multiple where() clauses may be specified.
- whereId() - defines the condition by document ID (key). The matching document must
  have the same key. The method accepts one argument: the required ID (key).
- limit() - limits the returned data to the specified number of documents.
- order() - sorts the returned data by the specified field. Accepts two arguments: the
  first is the field name and the second is 'asc' or 'desc', specifying the sort
  direction.
  Note: This clause is not currently supported by Windows Azure.
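The clauses listed above can be combined roughly as follows. This is a sketch under assumptions: buildFilmQuery() is a hypothetical helper, the field names and values are examples, and every clause method is assumed to return the query object so that calls can be chained.

```php
<?php
/**
 * Hypothetical helper: build a structured query from the clauses listed
 * above. The field names ('year', 'genre') and values are examples only.
 *
 * @param  object $documents a document service adapter
 * @return object the adapter-specific query object
 */
function buildFilmQuery($documents)
{
    return $documents->select()
        ->from('collectionName')
        ->where('year > ?', array(1945))
        // Third argument: join this condition to the previous one with "and"
        ->where('genre = ?', array('drama'), 'and')
        // Not supported by Windows Azure, per the note above
        ->order('year', 'desc')
        ->limit(10);
}

// Usage, given an adapter in $documents:
// $docs = $documents->query('collectionName', buildFilmQuery($documents));
```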
Creating a query
For the user's convenience, the select() method instantiates a
query object specific to the adapter, and sets the SELECT clause for it.
Example #15 Creating a structured query
$query = $documents->select()
    ->from('collectionName')
    ->where('year > ?', array(1945))
    ->limit(3);
$docs = $documents->query('collectionName', $query);
foreach ($docs as $doc) {
    $id = $doc->getId();
    echo "Found document with ID: "
        . var_export($id, true)
        . "\n";
}
Accessing concrete adapters
Sometimes it is necessary to retrieve the concrete adapter for the service that the
Document API is working with. This can be achieved by using the
getAdapter() method.
Note:
Accessing the underlying adapter breaks portability among services, so it should be
reserved for exceptional circumstances only.
Example #16 Using concrete adapters
// Since the Simple Cloud Document API doesn't support batch upload, use the concrete adapter
$amazonSdb = $documents->getAdapter();
$amazonSdb->batchPutAttributes($items, 'collectionName');