Saturday, September 28, 2013

Mongodb: Loading and querying the data

This blog post explains how can we load data from json files and perform simple queries on it. If you wish to install and configure the mongodb, please refer previous post abut the same.

Building database

You can go to mongo shell by clicking on "mongo" or running "./mongo" in bin folder. You can use following simple command to get list of all databases,

> show dbs
admin 0.03125GB

Now you can create the database a database by following command,

> use twitterdb
switched to twitterdb

However if you run "show dbs" then it will not show the database. Mongodb will actually create database when you try to save something in it. You can create a collection in the database. Collection is logical equivalent of table in RDBMS. The hierarchy is Database > Collection > Document. You can create and authenticate user with following commands in mongo shell.

> db.addUser("username", "password");
> db.auth("username", "password");

Loading the data 

I had around 3000 json files containing twitter feed gathered from twitter API. First I tested for single file if its working by following command. [Note: you have to use this command from terminal not mongo shell]

> ./mongoimport --host localhost --db <testdb> --collection <collection-name> --username <user> --password <password> --type json --file <filename>.json

Now after loading data from one json file we can use simple shell script to load other files as following from terminal. [Note: This script is written for files named as "file1.json", "file2.json"..."file3000.json"]

> for x in {1..3000}; do
./mongoimport --host localhost --db <testdb> --collection <collection-name> --username <user> --password <password> --type json --file file$x.json

If you are using windows you might like write similar batch script. Another simple way is we can store key-value information in some object say "data" and execute following command from mongo shell.


Querying the data

The following command is equivalent of "SELECT * FROM test". Here test is the collection name.


Now we can add parameter name and expected value is it,

> db.test.find( { id_str: '1836728' } ) // find documents where id_str is 1836728
> db.test.find( { city: 'Tucson' , username: 'Akshay' } ) // Find document with user name Akshay and city as Tucson
> db.test.find( { age: {$lt : 18} } ) # find records where age is less than 18

Now these queries will show all fields from documents which satisfy given conditions. We can limit fields to be displayed for matching documents like following, "

>db.test.find( { age: {$lt : 18} } , { username: 1} ) // Display all user names where age is less than 18

No comments:

Post a Comment