Call for Schemas

kchodorowAugust 22, 2012MongoDB

mongodbtdg — The Return of the Mongoose Lemur

I just started working on MongoDB: The Definitive Guide, 2nd Edition! I’m planning to add:

Lots of ops info
Real-world schema design examples
Coverage of new features since 2010… so quite a few

However, I need your help on the schema design part! I want to include some real-world schemas people have used and why they worked (or didn’t). If you’re working on something 1) interesting and 2) non-confidential and you’d like to either share or get some free advice (or both), please email me (kristina at 10gen dot com) or leave a comment below. I’ll set up a little interview with you.

I am particularly looking for “cool” projects (video games, music, TV, sports), recognizable companies (Fortune 50 & HackerNews 500*), and geek elite (Linux development, research labs, robots, etc.). However, if you’re working on something you think is interesting that doesn’t fall into any of those categories, I’d love to hear about it!

* There isn’t really a HackerNews 500, I mean projects that people in the tech world recognize and thinks are pretty cool (DropBox, Github, etc.).

16 thoughts on “Call for Schemas”

Steven Hatfield says:

August 22, 2012 at 10:20 am

Hi Kristina,
One problem that I’ve run into in the short time that I’ve been working with mongoDB is how to evolve a changing schema. I have a couple of collections with references to one from the other. I can run renameCollection() to change the name of the collection being referenced, but it leaves the referring collection’s documents with broken ref links! A work around for this kind of stuff would be excellent to include in the book.
Thanks,
Steven

LikeLike

Reply
1. kristina1 says:
  
  August 22, 2012 at 10:43 am
  
  So noted, I’ll add that to the application administration section. Thanks for the suggestion!
  
  LikeLike
  
  Reply
Justin Dearing says:

August 26, 2012 at 8:34 am

Is Mike working on the second edition again? Also, may I suggest including chocolatey directions for installing on windows, in addition to the windows directions.

LikeLike

Reply
1. kristina1 says:
  
  August 27, 2012 at 7:01 am
  
  Unfortunately, Mike decided not work on the 2nd edition.
  
  I’ll make a note on the Windows thing, you mean like directions on installing it as a service instead of just unpack->run? (Like here: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/#mongodb-as-a-windows-service?) Mind if I run that section by you when it’s done? 🙂
  
  LikeLike
  
  Reply
  1. Justin Dearing says:
    
    August 27, 2012 at 7:03 am
    
    Chocolatey (
    http://chocolatey.org/ ) is like yum/apt-get/pkg-add for windows. Alan Stevens wrote a mongo package for it so you can install mongo with “cinst mongo” from a command prompt.
    
    But yes, it would be g
    
    LikeLike
  2. kristina1 says:
    
    August 27, 2012 at 10:07 am
    
    Ah, I thought you were just using chocolatey as flavor (HA!) text.
    
    I checked with the Windows people at 10gen and I don’t think it’s widely-enough known to include yet, sorry.
    
    LikeLike
  3. Justin Dearing says:
    
    August 27, 2012 at 10:43 am
    
    Fair enough, hit me up when you want the section reviewed.
    
    LikeLike
Aravind Udayashankara says:

August 30, 2012 at 10:52 am

Hey First I would like to thank you for contributing this wonderful book , It will be very helpful if you could add some more information on Sharding by taking some more practical examples Or Usecases like twitter . Because in general MongoDB is recommended in the scenarios where large amount of sharding is required .

LikeLike

Reply
1. kristina1 says:
  
  August 30, 2012 at 12:00 pm
  
  Thank you! I’ll definitely include some concrete examples for sharding.
  
  LikeLike
  
  Reply
Aravind Udayashankara says:

August 30, 2012 at 10:34 pm

Hi kristina ,

I have following scenario , in sharding , Lets say I have following setup

I have 1 server running config server

I have 2 servers running mongod instances called node1 and node2

I have 1 server running mongos

I have added the servers node1 and node2 as shard servers in the mongos shell by using the command

db.runCommand( { addshard : “serverhostname[:port]” } );

later I created database in node1 and enabled sharding for that database from Mongos

I created 300 collections by connecting to mongos but still all collections are going to only one server what is wrong in the above setup , Actually it should create some collections in the other shard server .

NOTE : – Since I am not in a position to upgrade the application layer I cant shard in collection level , I have a script which creates several collections dynamically for some requirement .

Please Provide your suggestions on how we can shard in database level so that collections gets equally distributed among all shard servers .

I am looking for some solution like

If i create 300 collections automatically mongos should create around 150 or 100 collections in each of the two sharded servers .

LikeLike

Reply
1. kristina1 says:
  
  August 31, 2012 at 5:32 am
  
  See http://www.kchodorow.com/blog/2012/07/25/controlling-collection-distribution/.
  
  LikeLike
  
  Reply
  1. Aravind Udayashankara says:
    
    August 31, 2012 at 10:24 am
    
    Hey this is really a good new feature I could learn , But my problem is collections are dynamically created , I cant manually shard them as and when they are created , Is there a way to distribute collections , which are created in a database for which sharding is enabled
    
    LikeLike
  2. kristina1 says:
    
    August 31, 2012 at 10:49 am
    
    No, not yet.
    
    LikeLike
Tyler Brock says:

September 2, 2012 at 2:37 pm

Kristina, I’ve got one that might be interesting. I’ll umm just tell you about it at work. It was actually an app built with a relational db at first but it didn’t work well and we transformed it to mongo and it worked really well. Was for my old company http://www.readrboard.com/

LikeLike

Reply
1. kristina1 says:
  
  September 4, 2012 at 1:42 pm
  
  Cool, that looks really interesting (I was looking for something like that for my blog a while ago, actually!). Look forward to hearing about it.
  
  LikeLike
  
  Reply
whardier says:

September 3, 2012 at 10:01 am

Here’s the example schema for my Informadiko generic search system. This mainly covers the fundamental accounting and access privileges as well as all of the templating for each document repository (called collections in my software.. to be confusing). One of the tricks I’ve been using since I started working in this schema style has been the ‘shortnames’ technique as seen on ine 54. This allows me to know the ObjectID of a _id key in a dict inside a list relative to the shortnames list. In the end this gives me some very vital shortcuts within my code. Some of which may be rather obvious when combined with my project here: https://github.com/whardier/MongoDict. MongoDict wraps a dict class so that when loading into a dict via PyMongo a shortcut list is generated on demand using this schema. This allows the developer to reference an item in a list by a lookup key rather than traversing it. The Schema: .gist table { margin-bottom: 0; } This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters Show hidden characters #!/usr/bin/mongo informadiko // ┏━┓┏━╸┏━╸┏━┓╻ ╻┏┓╻╺┳╸┏━┓ // ┣━┫┃ ┃ ┃ ┃┃ ┃┃┗┫ ┃ ┗━┓ // ╹ ╹┗━╸┗━╸┗━┛┗━┛╹ ╹ ╹ ┗━┛ db.dropDatabase() // BE CAREFUL //db.createCollection("system.profile", {capped:true, size:1024000000}) // 1024 Megs db.setProfilingLevel(2); db.accounts.ensureIndex({ '_id' : 1, 'timestamp' : 1}); // We do this for a very fun caching reason later on. db.accounts.ensureIndex({ '_id' : 1, 'collections._id' : 1}); // Lets find those collections quickly. // rethink //db.accounts.ensureIndex({ '_id' : 1, 'collections.indexed' : 1}); // Lets find new batches quickly.. possibly make this a less primary index later on db.accounts.ensureIndex({ 'shortname' : 1, '_id' : 1 }); // We do this for a very fun caching reason later on. // For readability only company1_account = ObjectId('4f3daa355eb9736480000000'); company1_collection_cdr = ObjectId('4f3daa355eb9736480000005'); company1_collection_cdr_criteria_department = ObjectId('4f3daa355eb973648000000b'); company1_collection_cdr_criteria_destination = ObjectId('4f3daa355eb973648000000c'); company1_collection_cdr_criteria_source = ObjectId('4f3daa355eb973648000000a'); company1_collection_cdr_criteria_direction = ObjectId('4f3daa355eb9736480000009'); company1_collection_cdr_criteria_citystate = ObjectId('4f3daa355eb973648000000f'); company1_collection_cdr_criteria_disposition = ObjectId('4f3daa355eb97364800a0000'); company1_collection_monitored = ObjectId('4f3daa355eb9736480000006'); company1_collection_chat = ObjectId('4f3daa355eb9736480000007'); company1_collection_chat_company1_tv = ObjectId('4f3daa355eb9736480000008'); company1_template_cdr_1 = ObjectId('4f3daa355eb97364800000f0'); company1_template_cdr_2 = ObjectId('4f3daa355eb97364800000f1'); company1_template_monitored_1 = ObjectId('4f3daa355eb97364800000d1'); company1_template_company1_tv_1 = ObjectId('4f3daa355eb97364800000f5'); company1_criteria_company1_tv_1_source = ObjectId('4f3daa355eb9736480000ff1'); company1_criteria_company1_tv_1_destination = ObjectId('4f3daa355eb9736480000ff2'); company1_criteria_company1_tv_1_chat = ObjectId('4f3daa355eb9736480000ff3'); company2_account = ObjectId('4f3daa355eb9736480000001'); company2_collection_cdr = ObjectId('4f3daa355eb9736480000011'); company2_collection_monitored = ObjectId('4f3daa355eb9736480000012'); shane_user = ObjectId('4f3c4dd75eb973500f000000'); // there are two templates with the same shortname. This is because they need to be timestamped and selected when reindexing documents for a new template.. multiple template support? sure… Templates should not be overwritten db.accounts.save({ '_id' : company1_account, 'timestamp' : 1329617789.376707, 'name' : 'company1', 'shortname' : 'company10000', 'collection_shortnames': { 'cdr': company1_collection_cdr, 'monitored': company1_collection_monitored, 'chatlogs': company1_collection_chat, 'company1tv': company1_collection_chat_company1_tv, }, 'collections' : [ { '_id' : company1_collection_cdr, 'name' : 'Call Detail Records (CDR)', 'shortname' : 'cdr', 'path' : [company1_collection_cdr], 'links' : [ { 'collection': company1_collection_monitored, 'shortname': 'monitored', 'label': 'Monitor File', 'description': 'Should the call detail record be checked first to see if it contains a monitor file link?' } ], 'summary': 'Call from <B>{{ src }}</B> to <B>{{ dst }}</B>', 'template_shortnames': { 'default': company1_template_cdr_1, }, 'templates': [ { '_id': company1_template_cdr_1, 'shortname': 'default', 'type': 'detailtable', 'metadata': true, 'links': true, 'elements': [ {'name': 'Date', 'date': 'effective', 'type': 'datetime'}, {'name': 'Source', 'meta': 'src', 'type': 'string'}, {'name': 'Destination', 'meta': 'dst', 'type': 'string'}, {'name': 'Duration', 'meta': 'duration', 'type': 'dms'}, {'name': 'Disposition', 'meta': 'disposition', 'type': 'string'}, {'name': 'Userfield', 'meta': 'userfield', 'type': 'string', 'full': true}, ], 'timestamp': 1329790288.432687, }, { '_id': company1_template_cdr_2, 'shortname': 'default', 'type': 'detailtable', 'metadata': true, 'links': true, 'elements': [ ['calldate', 'datetime'], ['display_source','string'], ['display_destination','display_destination'], ['duration', 'dms'], ['disposition', 'string'], ['userfield', 'string'], ], 'timestamp': 1329790289.833056, } ], 'criteria_shortnames': { 'department': company1_collection_cdr_criteria_department, 'source': company1_collection_cdr_criteria_source, 'destination': company1_collection_cdr_criteria_destination, 'citystate': company1_collection_cdr_criteria_citystate, 'direction': company1_collection_cdr_criteria_direction, 'disposition': company1_collection_cdr_criteria_disposition, }, 'criteria' : [ { '_id' : company1_collection_cdr_criteria_department, 'label' : 'Department', 'shortname': 'department', 'type' : 'text', 'description' : 'internal department shared by this call', }, { '_id' : company1_collection_cdr_criteria_source, 'label' : 'Source', 'shortname': 'source', 'type' : 'text', 'extra' : { 'exact_startswith_endswith': 'exact', } }, { '_id' : company1_collection_cdr_criteria_destination, 'label' : 'Destination', 'shortname': 'destination', 'type' : 'text', 'extra' : { 'exact_startswith_endswith': 'startswith', } }, { '_id' : company1_collection_cdr_criteria_citystate, 'label' : 'Caller City/State (Where did this call originate from?)', 'shortname': 'citystate', 'type' : 'text', }, { '_id' : company1_collection_cdr_criteria_direction, 'label' : 'Call Direction', 'shortname': 'direction', 'type' : 'select', 'ignore': ['inout'], 'default' : 'inout', 'choices' : [ ['inout','Inbound and Outbound'], ['in','Inbound Only'], ['out','Outbound Only'], ['internal','Internal'], ], }, { '_id' : company1_collection_cdr_criteria_disposition, 'label' : 'Call disposition', 'shortname': 'disposition', 'type' : 'select', 'ignore': ['any'], 'default' : 'any', 'choices' : [ ['any','No Preference'], ['answered','Answered'], ['no answer','No Answer'], ], }, ], }, { '_id' : company1_collection_monitored, 'name' : 'Monitored Calls', 'icon' : 'images/tango/16×16/actions/media-record.png', 'shortname' : 'monitored', 'path' : [company1_collection_cdr, company1_collection_monitored], 'hidden' : true, 'template_shortnames': { 'default': company1_template_monitored_1, }, 'templates': [ { '_id': company1_template_monitored_1, 'shortname': 'default', 'type': 'detailtable', 'metadata': true, 'links': true, 'elements': [ {'name': 'Date', 'date': 'effective', 'type': 'datetime'}, {'name': 'Link', 'meta': 'url', 'type': 'url', 'full': true}, ], 'timestamp': 1329790288.432687, }, ], }, { '_id' : company1_collection_chat, 'shortname' : 'chatlogs', 'path' : [company1_collection_chat], 'name' : 'Chat Logs', 'templates': [] }, { '_id' : company1_collection_chat_company1_tv, 'shortname' : 'company1tv', 'path' : [company1_collection_chat, company1_collection_chat_company1_tv], 'name' : 'iChat Server (chat.company1.tv)', 'template_shortnames': { 'default': company1_template_company1_tv_1, }, 'templates': [ { '_id': company1_template_company1_tv_1, 'shortname': 'default', 'type': 'detailtable', 'metadata': true, 'links': true, 'elements': [ {'name': 'Date', 'date': 'effective', 'type': 'datetime'}, {'name': 'Source', 'meta': 'src', 'type': 'string'}, {'name': 'Destination', 'meta': 'dst', 'type': 'string'}, {'name': 'Chat', 'meta': 'chat', 'type': 'string', 'wrap': true, 'full': true}, ], 'timestamp': 1329790288.432687, }, ], 'criteria_shortnames': { 'source': company1_criteria_company1_tv_1_source, 'destination': company1_criteria_company1_tv_1_destination, 'chat': company1_criteria_company1_tv_1_chat, }, 'criteria' : [ { '_id' : company1_criteria_company1_tv_1_source, 'label' : 'Source', 'shortname': 'source', 'type' : 'text', }, { '_id' : company1_criteria_company1_tv_1_destination, 'label' : 'Destination', 'shortname': 'destination', 'type' : 'text', }, { '_id' : company1_criteria_company1_tv_1_chat, 'label' : 'Chat', 'shortname': 'chat', 'type' : 'text', 'extra' : { 'exact_startswith_endswith': 'startswith', } }, ], } ], 'paths': [ [company1_collection_cdr], [company1_collection_cdr, company1_collection_monitored], [company1_collection_chat], [company1_collection_chat, company1_collection_chat_company1_tv], ], 'timezone': 'America/Anchorage', 'timezones': { 'United States of America': { 'Alaska': { 'America/Adak': 'Adak, Alaska, USA', 'America/Anchorage': 'Anchorage, Alaska, USA', }, 'Idaho': { 'America/Los_Angeles': 'Coeur'd Alene, Idaho, USA', }, 'Hawaii': { 'Pacific/Honolulu': 'Kapolei, Hawaii, USA', 'Pacific/Honolulu': 'Kona, Hawaii, USA', }, }, 'Misc': { 'UTC': { 'UTC': 'UTC', }, }, }, }); db.accounts.save({ '_id' : company2_account, 'timestamp' : 1329617789.376707, 'name' : 'Liberty Telecom', 'shortname' : 'company2com0000', 'collections_shortnames': { 'cdr': company2_collection_cdr, 'monitored': company2_collection_monitored, }, 'collections' : [ { '_id' : company2_collection_cdr, 'name' : 'Call Detail Records (CDR)', 'shortname' : 'cdr', 'path': [company2_collection_cdr], 'links' : [ company2_collection_monitored ], 'templates': [] }, { '_id' : company2_collection_monitored, 'name' : 'Monitored Calls', 'shortname' : 'monitored', 'path': [company2_collection_cdr, company2_collection_monitored], 'hidden' : true, 'templates': [] } ], 'paths': [ [company2_collection_cdr], [company2_collection_cdr, company2_collection_monitored], ], 'timezone': 'America/Anchorage', 'timezones': { 'United States of America': { 'Alaska': { 'America/Anchorage': 'Anchorage, Alaska, USA', }, }, 'Asia': { 'Phillipines': { 'Asia/Manila': 'Manila, Metro Manila, Phillipines', }, }, 'Misc': { 'UTC': { 'UTC': 'UTC', }, }, }, }); // ╻ ╻┏━┓┏━╸┏━┓┏━┓ // ┃ ┃┗━┓┣╸ ┣┳┛┗━┓ // ┗━┛┗━┛┗━╸╹┗╸┗━┛ db.users.ensureIndex({ '_id' : 1, 'timestamp' : 1}); db.users.ensureIndex({ 'username' : 1, 'password' : 1, '_id' : 1}); db.users.save({ '_id': shane_user, 'timestamp' : 1329617789.376707, 'username': 'shanespencer', 'firstname': 'Shane', 'lastname': 'Spencer', 'password': '$2a$12$Z62NF.c6f38bFWnM4hf2ce2RejrPHqGsc7IvLupYXuZgDfc9bDM2u', 'nickname': 'whardier', 'accounts': [ { '_id': company1_account, 'active': true, 'timezone': 'America/Anchorage', }, { '_id': company2_account, 'active': true, 'timezone': 'America/Anchorage', } ], 'timezones': { 'United States of America': { 'Alaska': { 'America/Anchorage': 'Anchorage, Alaska, USA', }, }, 'Misc': { 'UTC': { 'UTC': 'UTC', }, }, }, }); // TODO: Add collection level access // ╺┳┓┏━┓┏━╸╻ ╻┏┳┓┏━╸┏┓╻╺┳╸┏━┓ // ┃┃┃ ┃┃ ┃ ┃┃┃┃┣╸ ┃┗┫ ┃ ┗━┓ // ╺┻┛┗━┛┗━╸┗━┛╹ ╹┗━╸╹ ╹ ╹ ┗━┛ db.documents.attachments.chunks.ensureIndex({'account': 1, 'collection': 1, 'files_id' : 1, 'n' : 1 }, {'unique' : true}) db.documents.ensureIndex({'account': 1, 'collection': 1, '_id': 1}) db.documents.ensureIndex({'account': 1, 'collection': 1, 'batch': 1, 'indexed': 1}) //test sparse db.documents.ensureIndex({'account': 1, 'collection': 1, '_uuid': 1, '_id': 1}) db.documents.batches.ensureIndex({'indexed': 1, '_id': 1}) db.documents.batches.ensureIndex({'account': 1, 'collection': 1, '_id': 1}) // find it quickly view raw gistfile1.js hosted with ❤ by GitHub Some other fun tricks have been how the timezone schema works for preference for each user as well as the company preference. This has been very important when searching for time based records/documents. With a simple function the allowed timezones and the preferred timezones can be merged and sorted and displayed as an option. It has been very useful to search from east coast at one time to west coast at another. And of course pay attention to the indexes. They have been paramount toward the speed of this solution. One of the tricky indexes is on line 15 where we use the advance $returnKey option when finding documents to hit ONLY the index and not the collection itself. Very simple Redis replacement at that point. company1_tv is actually company1.tv (as a domain) and company2_com is of course company2.com similarly. LikeLike

Reply