In this course, you will be introduced to MongoDB. You will learn how to install it and how to operate it via its shell. You will also learn how to access it programmatically from Java and how to leverage Map Reduce with it. Finally, more advanced concepts such as sharding and replication will be explained.
The MongoDB shell is the best tool for discovering MongoDB features and managing every aspect of your server deployments, instances, databases, collections and documents. It uses the JavaScript language to execute commands and queries. Do not worry if you have little or no knowledge of JavaScript: you should be able to follow almost every example without effort, as they all share a common pattern.
Because JSON is the format used to represent documents, it is also used to specify commands and queries and to return their results. This unification brings a lot of benefits because JSON is inherently simple, human-friendly and easy to understand.
In this part of the tutorial we go through most of the commands and queries supported by MongoDB using its shell, except those related to sharding (covered in detail in Part 4. MongoDB Sharding Guide) and replication (covered in detail in Part 5. MongoDB Replication Guide). More advanced topics are covered in Part 7. MongoDB Security, Profiling, Indexing, Cursors and Bulk Operations Guide.
MongoDB also has many internal and experimental commands, most of which we will not cover. Their usage is limited to very specific scenarios you may never encounter, or their behavior might be unstable.
MongoDB shell provides a couple of command helpers which allow establishing the context and implicitly populating shell variables, including:
db: current database context variable
rs: replica set context variable
sh: sharding context variable
With no command line arguments provided, the MongoDB shell by default connects to the local MongoDB server instance on port 27017 and to the database named test (which may not physically exist on disk).
Command
use <database>
Description
Switches current database to <database> and assigns shell variable db to the current database.
Example
In MongoDB shell, let us issue the command: use mydb
Command
show log [name]
Description
Outputs the last segment of the log in memory. If the logger name is omitted, the global logger is used by default.
Example
In MongoDB shell, let us issue the command: show log global
Command
load(<filename>)
Description
Loads and executes the JavaScript file <filename> inside the current MongoDB shell environment.
Example
Let us prepare a sample db.js script, located inside the MongoDB installation folder, which lists all available databases and prints their names to the console:
// Ask the admin database for the list of all databases.
var result = db.getSiblingDB( 'admin' )
    .runCommand( { listDatabases: 1 } );
// Print the name of every database returned by the command.
for( var index in result.databases ) {
    print( result.databases[ index ].name );
}
In MongoDB shell, let us issue the command: load( 'db.js' )
With MongoDB shell, there are at least two ways to run commands:
using the generic db.runCommand() function call
using the more convenient db.<command> or db.<collection>.<command> wrapper function calls
In most cases the second option is much more readable, and it will be our choice for the examples in the following sections. Where applicable, both options will be demonstrated side by side so you can pick your favorite way to run commands. Please notice that not all commands have MongoDB shell wrappers; those can be run with the db.runCommand() function call only.
Command
db.runCommand(<command>)
Description
Provides a helper to run specified database commands. This is the preferred method to issue database commands, as it provides a consistent interface between the shell and drivers.
Example
In MongoDB shell, let us issue the command: db.runCommand( { buildInfo: 1 } )
A database is the top-level data container in MongoDB and holds one or more collections of documents. For every database MongoDB creates a physical file (or files) on disk, aggressively pre-allocating data files to reserve space and avoid file system fragmentation.
The data file names follow a pattern: the first data file is named <databasename>.0, the next one <databasename>.1, and so on. The first pre-allocated file is 64 megabytes, the second 128 megabytes, the next 256 megabytes, and so on, up to a maximum size of 2 gigabytes (from that point on, all subsequent files are 2 gigabytes in size). One thing to keep in mind: MongoDB will not permanently create a database until data is inserted into it.
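The naming and doubling scheme described above can be sketched in plain JavaScript. The function names dataFileSizesMB and dataFileNames are hypothetical, and the code only reproduces the arithmetic stated in the text (64 megabytes, doubling, capped at 2 gigabytes); it does not talk to a server.

```javascript
// Sizes (in megabytes) of the first `count` pre-allocated data files,
// following the progression described above: 64, 128, 256, ... capped at 2048.
function dataFileSizesMB( count ) {
    var sizes = [];
    var size = 64;                           // the first file is 64 MB
    for( var i = 0; i < count; i++ ) {
        sizes.push( size );
        size = Math.min( size * 2, 2048 );   // never exceed 2 GB
    }
    return sizes;
}

// File names follow the <databasename>.<index> pattern, starting at 0.
function dataFileNames( databaseName, count ) {
    var names = [];
    for( var i = 0; i < count; i++ ) {
        names.push( databaseName + '.' + i );
    }
    return names;
}
```

For instance, calling dataFileSizesMB( 8 ) shows the size levelling off once the 2 gigabyte cap is reached.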
By default, MongoDB also creates journal files, which store write operations on disk before they are applied to the databases.
Command
db.help()
Description
Show help for database methods.
Example
In MongoDB shell, let us issue the command: db.help()
Command
db.getSiblingDB(<database>)
Description
Returns another database without modifying the db variable in the shell environment. It can be used as an alternative to the use <database> helper (please see Shell Command Helpers).
Example
In MongoDB shell, let us issue the command: db.getSiblingDB( 'admin' ).getName()
Every MongoDB server instance has its own local database, which stores data used in the replication process, and other instance-specific data. The local database is not touched by replication: collections in the local database are never replicated (Part 5. MongoDB Replication Guide talks more about replication). Also, there is an admin database – a privileged database which users must have access to in order to run certain administrative commands.
To run a command in the context of the admin database, the following options are available:
use admin
db.runCommand( <command> )
Please notice that the current database will be switched to admin.
db.getSiblingDB( 'admin' ).runCommand( <command> )
A slightly more verbose chained call, but the current database stays unchanged.
db.adminCommand( <command> )
A shortcut to db.getSiblingDB( 'admin' ).runCommand( <command> ).
Command
listCommands
Wrapper
db.listCommands()
Description
Displays a list of all database commands with examples of usage and expected parameters. The commands which require administrative privileges are marked as adminOnly.
Example
In MongoDB shell, let us issue the command: db.listCommands()
Alternatively, let us run the same command using runCommand() call: db.runCommand( { listCommands: 1 } )
Note: Only a fragment of the output is shown.
Copies a database from a remote host to the current host or copies a database to another database within the current host. It should be run in the context of the admin database.
This command clones a database with the same name as the current database from a remote MongoDB instance running on <hostname> and port <port> to the current host.
Example
Assuming there is another instance of MongoDB server running on port 27018, let us issue the command in the shell: db.cloneDatabase( 'localhost:27018' )
Alternatively, let us run the same command using runCommand() call: db.runCommand( { clone: 'localhost:27018' } )
Flushes all pending writes from the storage layer to disk. Optionally, it can lock the server instance and block write operations for the purpose of capturing backups. It should be run in the context of the admin database.
Collections are the containers of MongoDB documents that share one or more indexes. For users familiar with RDBMS concepts, a collection is the equivalent of a table. A collection belongs to a single database and does not enforce any schema on the documents it contains. There is no limit on the number of documents an individual collection can contain, unless it is a special type of collection called a capped collection: a fixed-size collection that automatically overwrites its oldest entries when it reaches its maximum size.
Together with the database name, collections form a namespace: the database name concatenated with the collection name using the period '.' character, for example:
test.collection1
test.collection1.subcollection1
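As a small sketch of how such namespaces are put together and taken apart, the helpers below (hypothetical names toNamespace and parseNamespace) split at the first period only, on the assumption that the database name itself contains no period, so any further periods belong to the collection name.

```javascript
// Build a namespace from a database name and a collection name.
function toNamespace( database, collection ) {
    return database + '.' + collection;
}

// Split a namespace at the first period: everything before it is the
// database name, everything after it is the collection name.
function parseNamespace( namespace ) {
    var dot = namespace.indexOf( '.' );
    return {
        database: namespace.substring( 0, dot ),
        collection: namespace.substring( dot + 1 )
    };
}
```

With the second example above, parseNamespace yields the database test and the collection collection1.subcollection1.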
In addition to user-defined collections, MongoDB stores system information in collections that use the <database>.system.* namespace and are reserved for internal use. The admin database (please see the Databases section) includes the following system collections:
admin.system.roles
admin.system.users
admin.system.version
Each user database has the following system collections defined:
<database>.system.namespaces
<database>.system.indexes
<database>.system.profile
<database>.system.js
In this section we will not explore system collections directly but if you are interested in getting more details, please refer to the official documentation.
Command
db.<collection>.help()
Description
Show help on collection methods. The <collection> can be the name of an existing collection or a non-existing collection.
Example
In MongoDB shell, let us issue the command: db.mycoll.help()
Note: Only a fragment of the output is shown.
Command
db.getCollectionNames()
Description
Returns all collections in the current database.
Example
In MongoDB shell, let us issue the command: db.getCollectionNames()
Command
db.getCollection(<name>)
Description
Returns a collection object by its name. This is useful for a collection whose name might interact with the shell itself (for example, begins with _ or has the same name as a built-in database command).
Example
In MongoDB shell, let us issue the command: db.getCollection( 'system.indexes' )
Checks the structures within a collection <collection> for correctness by scanning the collection’s data and indexes. The command returns information regarding the on-disk representation of the collection.
Copies a collection <collection> from a remote host to the current host.
Example
Assuming there is another instance of MongoDB server running on port 27018, let us issue the command in the shell: db.cloneCollection( 'localhost:27018', 'test.mycoll' )
Alternatively, let us run the same command using runCommand() call: db.runCommand( { cloneCollection: 'test.mycoll', from: 'localhost:27018' } )
Creates a new capped collection <capped collection> from an existing non-capped collection <existing collection> within the same database. The operation does not affect the original non-capped collection.
Drops all indexes on a collection <collection> and recreates them. This operation may be expensive for collections that have a large amount of data and/or a large number of indexes.
Example
In MongoDB shell, let us issue the command: db.mycoll.reIndex()
Alternatively, let us run the same command using runCommand() call: db.runCommand( { reIndex: 'mycoll' } )
Copies all documents from collection <collection> into new collection <newCollection> using server-side JavaScript. If collection <newCollection> does not exist, it will be created. The command returns the number of documents copied (or 0 if source collection is empty).
In MongoDB, data is represented and stored as JSON documents: field and value pairs. More precisely, MongoDB uses binary JSON (BSON) to store serialized documents on disk, but to the user it looks like regular JSON (at least in the MongoDB shell). The range of supported field data types is quite impressive (please refer to the BSON data types reference):
Double
String
Object
Array
Binary Data
Undefined
Object Id
Boolean
Date
Null
Regular Expression
JavaScript (with/without scope)
32-bit integer
64-bit integer
Timestamp
Other documents (so called embedded documents), arrays, arrays of documents and references are supported as well.
The field names have a couple of restrictions:
The field named _id is reserved for use as a primary key and must be unique within the whole collection (it is immutable and may be of any type other than an array). This field is always the first field in the document.
The field names cannot start with the dollar sign '$' character and cannot contain the dot '.' character. Those are reserved.
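The two naming restrictions can be checked with a few lines of plain JavaScript. This is only an illustrative sketch: the helper name checkFieldNames is hypothetical, it inspects top-level fields only, and it is not how the server itself validates documents.

```javascript
// Returns a list of problems found in a document's top-level field names,
// based only on the two restrictions listed above.
function checkFieldNames( doc ) {
    var problems = [];
    Object.keys( doc ).forEach( function( name ) {
        if( name.charAt( 0 ) === '$' ) {
            problems.push( name + ': starts with $' );
        }
        if( name.indexOf( '.' ) !== -1 ) {
            problems.push( name + ': contains a dot' );
        }
    } );
    return problems;
}
```

An empty result means none of the top-level field names violates either rule.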
For example, the simple document representing a Book may look like this:
References (DBRefs) are the pointers from one document to another (using the value of the document’s _id field, collection name, and, optionally, its database name):
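As a minimal sketch, a reference can be written as a plain document using the conventional $ref, $id and (optional) $db fields, and resolved with one additional query. The collection name authors, the identifier value and the helper resolveRef are hypothetical; the helper takes the shell's db object as a parameter and relies only on getSiblingDB(), getCollection() and findOne().

```javascript
// A reference expressed as a plain document (all values are hypothetical).
var authorRef = { $ref: 'authors', $id: 1, $db: 'test' };

// Resolve a reference by running one extra query against the target
// collection; the current database is used when $db is not given.
function resolveRef( db, ref ) {
    var target = ref.$db ? db.getSiblingDB( ref.$db ) : db;
    return target.getCollection( ref.$ref ).findOne( { _id: ref.$id } );
}
```

In the MongoDB shell it could be invoked as resolveRef( db, authorRef ).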
For users familiar with RDBMS concepts, references may look similar to foreign keys and joins, but MongoDB does not support joins: to resolve a reference, an additional query (or queries) must be executed. Nevertheless, it is a useful and common way to represent links between documents. Looking back at our Book example, let us assume the authors are stored in a separate collection and every book has a reference to its author. Here is an example of an Author document:
Later in this section we will see more examples of inserting different documents and using document references.
The one hard limit to keep in mind is that the maximum size of a document (represented in BSON format) is 16 megabytes. For a more comprehensive overview of the document data model, please refer to the official documentation.
The write/update commands have a notion of write concern: the guarantee that MongoDB provides when reporting on the success of a write operation. The strength of the write concern determines the level of guarantee. When inserts, updates and deletes have a weak write concern, write operations return quickly. Conversely, with a strong write concern clients may have to wait for the MongoDB server instance(s) to confirm the write operations. In some failure cases, write operations issued with a weak write concern may not be persisted. We are going to get back to write concern in Part 5. MongoDB Replication Guide, but if you would like more details right now, please refer to the official documentation.
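To make the trade-off concrete, here is a hedged sketch of two inserts that differ only in their write concern. The collection name books and both function names are hypothetical; the functions receive the shell's db object as a parameter and pass the writeConcern option to insert().

```javascript
// Strong write concern: wait until a majority of replica set members
// acknowledge the write, giving up after 5 seconds.
function insertWithMajority( db, doc ) {
    return db.books.insert( doc, {
        writeConcern: { w: 'majority', wtimeout: 5000 }
    } );
}

// Weak write concern: do not wait for any acknowledgement (returns
// quickly, but the write may be lost in some failure cases).
function insertUnacknowledged( db, doc ) {
    return db.books.insert( doc, { writeConcern: { w: 0 } } );
}
```

Which variant fits depends on how much latency the application can trade for durability.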
Modifies an existing document or documents in a collection <collection> or inserts a new one if no documents match the query and the parameter upsert is set to true. The command can modify specific fields of an existing document or documents or replace an existing document entirely, depending on the update.
By default, the command updates a single document. If the parameter multi is set to true, all documents that match the query criteria are updated.
The query syntax will be discussed in detail in the Queries section.
Example
In MongoDB shell, let us issue the command (the original document will be replaced if it exists):
The update command supports a variety of operators to control the modification semantics, which are listed below (for more details please refer to the official documentation):
$inc: Increments the value of the field by the specified amount.
$mul: Multiplies the value of the field by the specified amount.
$rename: Renames a field.
$setOnInsert: Sets the value of a field upon document creation during an upsert. Has no effect on update operations that modify existing documents.
$set: Sets the value of a field in an existing document.
$unset: Removes the specified field from an existing document.
$min: Only updates the field if the specified value is less than the existing field value.
$max: Only updates the field if the specified value is greater than the existing field value.
$currentDate: Sets the value of a field to the current date, either as a Date or a Timestamp.
$: Acts as a placeholder to update the first element that matches the query condition in an update.
$addToSet: Adds elements to an existing array only if they do not already exist in the set.
$pop: Removes the first or last item of an array.
$pullAll: Removes all matching values from an array.
$pull: Removes all array elements that match a specified query.
$push: Adds an item to an array.
$each: Modifies the $push and $addToSet operators to append multiple items for array updates.
$slice: Modifies the $push operator to limit the size of updated arrays.
$sort: Modifies the $push operator to reorder documents stored in an array.
$position: Modifies the $push operator to specify the position in the array to add elements.
$bit: Performs bitwise AND, OR, and XOR updates of integer values.
$isolated: Modifies the behavior of multi-updates to improve the isolation of the operation.
Here is an example of the update using some of the operators from the table above:
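As one possible illustration, the sketch below combines $set, $inc, $push with $each and $currentDate in a single update call. The collection name books, the function name applyBookUpdate and all field names and values are hypothetical; the function takes the shell's db object as a parameter.

```javascript
// Update a single matching book: set a field, bump a counter, append
// two tags and record the modification time.
function applyBookUpdate( db ) {
    return db.books.update(
        { title: 'Sample Book' },                              // query
        {
            $set: { publisher: 'Sample Press' },
            $inc: { copiesSold: 1 },
            $push: { tags: { $each: [ 'nosql', 'database' ] } },
            $currentDate: { lastModified: true }
        },
        { upsert: false, multi: false }                        // options
    );
}
```

In the MongoDB shell it could be invoked as applyBookUpdate( db ). Because only operators are used, the matched document is modified in place rather than replaced.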
Removes documents from a collection <collection>. Multiple delete specifications can be provided. The command cannot operate on capped collections.
The query syntax will be discussed in detail in the Queries section.
Finds, modifies and returns a single document in a collection <collection>. By default, the returned document does not include the modifications made by the update. To return the document with the modifications made, the new option should be set to true. If query selects multiple documents, the sort parameter determines which document should be modified.
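The interplay of the query, sort and new parameters can be sketched as follows. The collection name tasks, its fields and the function name claimNextTask are hypothetical; the function takes the shell's db object as a parameter.

```javascript
// Pick the highest-priority pending task, mark it as running and
// return the document as it looks after the modification.
function claimNextTask( db ) {
    return db.tasks.findAndModify( {
        query: { status: 'pending' },            // may match many documents
        sort: { priority: -1 },                  // decides which one is modified
        update: { $set: { status: 'running' } },
        new: true                                // return the modified document
    } );
}
```

Leaving out new (or setting it to false) would return the document as it was before the update.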
Example
In MongoDB shell, let us issue the commands (please notice that the original document will be replaced with the updated one):
The latest MongoDB release introduces support for bulk operations, which, along with indexing, will be covered in Part 7. MongoDB Security, Profiling, Indexing, Cursors and Bulk Operations Guide.
6. Queries
Queries are used to retrieve data stored in a MongoDB database. In MongoDB, queries select documents from a single collection only (as we already know, joins are not supported). Queries specify the criteria to match the documents against. A query may include a projection to select which fields of the matching documents to return (quite useful to limit the amount of data sent over the network).
Command
db.<collection>.find(<criteria>, <projection>)
Description
Selects documents in a collection <collection> and returns a cursor to the selected documents. Cursors will be covered in Part 7. MongoDB Security, Profiling, Indexing, Cursors and Bulk Operations Guide.
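A minimal sketch of a query with both criteria and a projection is shown below. The collection name books, its fields and the function name findTitlesByPublisher are hypothetical; the function takes the shell's db object as a parameter and returns the cursor produced by find().

```javascript
// Select books of one publisher and return only their titles.
function findTitlesByPublisher( db, publisher ) {
    return db.books.find(
        { publisher: publisher },   // criteria: which documents to match
        { title: 1, _id: 0 }        // projection: include title, drop _id
    );
}
```

The projection keeps the result small because only the title field of each matching document is returned.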
Returns only one document from the collection <collection> that satisfies the specified query criteria. If multiple documents match the query, this method returns the first document according to the natural order, which reflects the order of documents on disk. In capped collections, natural order is the same as insertion order. This command is very similar to db.<collection>.find(<criteria>, <projection>) described above but limits the result to at most one document.
With the latest release, MongoDB allows limiting the query processing time using the maxTimeMS option (in milliseconds). Please notice that maxTimeMS only accounts for CPU time and does not include network latency or idle time.
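In the shell, the time limit is applied to the cursor returned by find(). The sketch below assumes a hypothetical books collection and a hypothetical helper name findWithTimeLimit; the function takes the shell's db object as a parameter.

```javascript
// Run a query whose processing time is limited to `limitMs` milliseconds.
function findWithTimeLimit( db, criteria, limitMs ) {
    return db.books.find( criteria ).maxTimeMS( limitMs );
}
```

In the MongoDB shell it could be invoked as findWithTimeLimit( db, { publisher: 'Sample Press' }, 100 ).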
Specifies a point for which a geospatial query returns the closest documents from <collection> first. The query returns the documents from nearest to farthest. It is an alternative to the $near query operator. We are going to cover Geo indexes in detail in Part 7. MongoDB Security, Profiling, Indexing, Cursors and Bulk Operations Guide.
In Part 7. MongoDB Security, Profiling, Indexing, Cursors and Bulk Operations Guide we will go through some advanced topics related to cursors, query profiling and query plans.
7. Aggregations
MongoDB provides a family of commands to perform collection-wide aggregation operations (the so-called aggregation framework). We are going to cover most of these commands in this section, except mapReduce, which will be covered in Part 6. MongoDB Map Reduce Tutorial.
Finds the distinct values for a specified key <field> across a single collection <collection>. The query syntax is discussed in detail in the Queries section.
Groups documents in a collection <collection> by the specified key <field> and performs simple aggregation functions, such as computing counts and sums. For users familiar with RDBMS concepts, this command is analogous to a SELECT <…> GROUP BY <…> statement in SQL. The query syntax is discussed in detail in the Queries section.
Performs aggregation operation using the aggregation pipeline <pipeline>: processing the data from a collection <collection> with a sequence of stage-based manipulations (for more details please refer to official documentation).
The pipeline aggregation operators include:
$project
$match
$redact
$limit
$skip
$unwind
$group
$sort
$geoNear
$out
Each pipeline operator also supports the expression operators to calculate values within the pipeline (for more details please refer to official documentation).
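To show how stages are chained, here is a hedged sketch of a pipeline that filters, groups, sorts and limits. The collection name books, its fields and the function name booksPerPublisher are hypothetical; the function takes the shell's db object as a parameter and passes the stages to aggregate() as an array.

```javascript
// Count published books per publisher and keep the five largest groups.
function booksPerPublisher( db ) {
    return db.books.aggregate( [
        { $match: { published: true } },                          // filter first
        { $group: { _id: '$publisher', total: { $sum: 1 } } },    // count per key
        { $sort: { total: -1 } },                                 // largest first
        { $limit: 5 }                                             // top five only
    ] );
}
```

Placing $match first reduces the number of documents the later stages have to process.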
GridFS allows storing and retrieving files that exceed the MongoDB document size limit of 16MB (please see the Documents section). Instead of storing a file in a single document, GridFS divides a file into parts (or chunks) and stores each of those chunks as a separate document. By default, the size of a chunk is 255KB. GridFS stores files in two collections, fs.files and fs.chunks (the fs prefix may be changed).
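The chunking arithmetic can be sketched in plain JavaScript. The names DEFAULT_CHUNK_SIZE and chunkCount are hypothetical, and the code only divides a file size by the chunk size mentioned above; it does not use GridFS itself.

```javascript
// 255KB, the default chunk size mentioned above, expressed in bytes.
var DEFAULT_CHUNK_SIZE = 255 * 1024;

// Number of chunk documents needed for a file of the given size;
// the last chunk may be only partially filled.
function chunkCount( fileSizeBytes, chunkSizeBytes ) {
    var size = chunkSizeBytes || DEFAULT_CHUNK_SIZE;
    return Math.ceil( fileSizeBytes / size );
}
```

For example, a file one byte larger than the chunk size already needs two chunks.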
MongoDB server supports a vast variety of commands to inspect its internals and monitor current activities. To satisfy the needs of enterprise deployments, MongoDB has a powerful, role-based security model to ensure that users and applications have access to only the data they are allowed to. Being a large topic, it will be covered in Part 7. MongoDB Security, Profiling, Indexing, Cursors and Bulk Operations Guide.
Terminates an operation as specified by the operation ID (returned by db.currentOp()). It is recommended to use this command only to terminate operations initiated by clients and not to terminate internal database operations.
Returns information about the underlying system that MongoDB server instance is running on. Some of the returned fields are only included on some platforms.
Example
In MongoDB shell, let us issue the command: db.hostInfo()
Alternatively, let us run the same command using runCommand() call: db.runCommand( { hostInfo: 1 } )
Alternatively, let us run the same command using the adminCommand() call: db.adminCommand( { shutdown: 1 } )
Please notice that you have to restart your MongoDB server instance after the command finishes execution, as the instance will be terminated and the MongoDB shell will not be able to connect to it anymore.
Loads data from the data storage layer into memory. It can load the indexes, data (documents) or both data (documents) and indexes. Execution of this command ensures that a collection <collection>, and/or its indexes, is/are in memory before another operation begins. By loading the collection or indexes into memory, MongoDB server instance might be able to perform subsequent operations more efficiently.
Allows rotating the MongoDB server instance logs to prevent a single log file from consuming too much disk space. It should be run in the context of the admin database.
Example
In MongoDB shell, let us issue the command: db.adminCommand( { logRotate: 1 } )
Allows retrieving the value of MongoDB server instance options normally set on the command line. The <option> parameter follows the setParameter command specification. It should be run in the context of the admin database.
Example
In MongoDB shell, let us issue the command: db.adminCommand( { getParameter: 1, "textSearchEnabled": 1 } )
In this section we have played quite a lot with MongoDB shell and seen most of the MongoDB commands in action. In the next section we are going to learn how to integrate MongoDB in your Java applications.
Andriy is a well-grounded software developer with more than 12 years of practical experience using Java/EE, C#/.NET, C++, Groovy, Ruby, functional programming (Scala), databases (MySQL, PostgreSQL, Oracle) and NoSQL solutions (MongoDB, Redis).