We needed some kind of ID cache for synchronization purposes that is fast and durable. Having some experience with CouchDB, a NoSQL solution seemed worth a try, so I decided to run some performance tests against the MongoDB instance already available and running in our server infrastructure.
Previously we filled a LinkedList (yes, it is a bigger Java project, as the "java" tag suggests ;)) from an expensive SQL database query at server start (Oracle or SQL Server). Afterwards we had to check, again very expensively, with "contains" whether elements were in the list. The list was also limited to a fixed size and contained only the newest IDs.
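Roughly, that old approach boiled down to something like the following sketch (class name and size are made up here; the real list was filled from the SQL database at startup):

import java.util.LinkedList;

// Illustrative sketch of the former in-memory cache: a LinkedList bounded
// to a fixed size that keeps only the newest IDs.
public class IdListCache {
    private final LinkedList<String> ids = new LinkedList<>();
    private final int maxSize;

    public IdListCache(int maxSize) {
        this.maxSize = maxSize;
    }

    // Newest IDs go to the front; the oldest entry is dropped once the
    // fixed size is reached.
    public void add(String id) {
        ids.addFirst(id);
        if (ids.size() > maxSize) {
            ids.removeLast();
        }
    }

    // The expensive part: contains() scans the list linearly on every check.
    public boolean contains(String id) {
        return ids.contains(id);
    }
}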
Using the Java MongoDB driver I inserted 6.5 million simple documents ({id: "prefix_1"}) into a test database and created an index on the field "id". Afterwards I ran several different queries to find out which approach suits our situation best.
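A minimal sketch of that setup, written against the current MongoDB Java driver API (the connection string, database and collection names are assumptions; the original test used the driver version available at the time):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

// Sketch of the test setup: bulk-insert 6.5 million {id: "..."} documents
// and index the "id" field. Names and connection string are placeholders.
public class MongoIdCacheSetup {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("idcache_test");
            MongoCollection<Document> coll = db.getCollection("ids");

            List<Document> batch = new ArrayList<>(10_000);
            for (int i = 0; i < 6_500_000; i++) {
                batch.add(new Document("id", "prefix_" + i));
                if (batch.size() == 10_000) {   // insert in batches to keep memory flat
                    coll.insertMany(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                coll.insertMany(batch);
            }

            // Index the "id" field so the lookup checks below can use it.
            coll.createIndex(Indexes.ascending("id"));
        }
    }
}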
Test machine: Win7 SP1 64-bit, Core i7 @ 3.46 GHz, 16 GB RAM
No of documents: 6500000
CHECK #1: Querying one single doc
Duration (in msec): 3
Duration (in nsec): 3484028
Found element: true
--------------------------
CHECK #2: Querying 1000 (many single queries fired in a loop)
Duration (in msec): 160
Duration (in nsec): 160787507
FOUND: 1000 items
--------------------------
CHECK #3: Querying 1000 (one query with 'OR')
{ "$or" : [ { "id" : "id_0"} , { "id" : "id_1"} , { "post_id" : "id_2"}, ... }]}
Duration (in sec): 1
Duration (in msec): 1521
Duration (in nsec): 1521424872
FOUND: 1000 items
--------------------------
CHECK #4: Querying all documents into list
Duration (in sec): 45
Duration (in msec): 45424
Duration (in nsec): 45424462212
No of docs: 6500000
--------------------------
CHECK #5: Check complete list with contains
FOUND: fb_post_id_123456
Duration (in msec): 109
Duration (in nsec): 109090349
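For reference, checks #1 to #3 roughly correspond to queries like the sketch below (again written against the current Java driver API, with assumed names; the timing and output formatting differ from the original test code):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.util.ArrayList;
import java.util.List;

// Rough sketch of the lookup checks: one indexed lookup, 1000 lookups in a
// loop, and the same 1000 IDs combined into a single $or query.
public class MongoIdCacheChecks {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("idcache_test").getCollection("ids");

            // CHECK #1: one single indexed lookup
            long t0 = System.nanoTime();
            boolean found = coll.find(Filters.eq("id", "prefix_0")).first() != null;
            System.out.println("single lookup: " + found + " in "
                    + (System.nanoTime() - t0) / 1_000_000 + " ms");

            // CHECK #2: 1000 single queries fired in a loop
            t0 = System.nanoTime();
            int hits = 0;
            for (int i = 0; i < 1000; i++) {
                if (coll.find(Filters.eq("id", "prefix_" + i)).first() != null) {
                    hits++;
                }
            }
            System.out.println("1000 single queries: " + hits + " hits in "
                    + (System.nanoTime() - t0) / 1_000_000 + " ms");

            // CHECK #3: the same 1000 IDs in one $or query
            List<Bson> clauses = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                clauses.add(Filters.eq("id", "prefix_" + i));
            }
            t0 = System.nanoTime();
            long orHits = coll.countDocuments(Filters.or(clauses));
            System.out.println("$or query: " + orHits + " hits in "
                    + (System.nanoTime() - t0) / 1_000_000 + " ms");
        }
    }
}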
Result: Checking IDs directly with a database query (at least with NoSQL and MongoDB) is very cheap, even when fired in a loop, especially compared to a SQL DBMS and a Java LinkedList. Unexpected: a single $or query is slower than many individual queries.