
    Advice on NoSQL integration?

    Hi -- I'm researching how to implement a database for my SmartClient app. Based on my research so far, I think a NoSQL database will work best for me because I like working with JSON directly on the server in Python. It looks like MongoDB will work better than CouchDB. I intend to use this database for file uploads/downloads as well as for my scientific data, which will not necessarily be formatted uniformly (hence the NoSQL appeal). Right now, my SmartClient site is the only tool for accessing the database.

    Are there any recommendations on how to integrate MongoDB with SmartClient in a way that minimizes the code I'd have to write and maximizes the number of SmartClient features I can use hassle-free? There seem to be two approaches:

    1) Use a RestDataSource. It looks like this might be the easiest to set up (pending further research into the glitch mentioned involving binary data), but your Quick Start Guide *highly* recommends using the SmartClient Server instead...

    2) Use a JDBC driver and a DataSource. Which serverType should I use? I'd rather not write boilerplate code...

    2a) "sql". MongoDB is not an SQL database, but there is a mapping that could potentially be exploited for the purpose of a SmartClient interface:
    http://docs.mongodb.org/manual/reference/sql-comparison/

    2b) "hibernate". I'm seeing "beta" release on the downloads:
    http://hibernate.org/ogm/
    But this description of the configuration is encouraging:
    https://docs.jboss.org/hibernate/ogm/4.0/reference/en-US/html_single/#ogm-mongodb

    2c) "jpa". Looks like I could use Spring for this:
    http://projects.spring.io/spring-data-mongodb/

    2d) "generic". Looks like I don't want to go this route, but I can't rule it out either.
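
For option 1, here is a minimal sketch of what a Python server would return for a RestDataSource fetch. The envelope field names (`response`, `status`, `startRow`, `endRow`, `totalRows`, `data`) are given as I recall them from the RestDataSource JSON format and should be checked against the SmartClient docs; the helper name `rest_response` and the sample records are hypothetical.

```python
import json


def rest_response(records, start_row=0, total_rows=None, status=0):
    """Wrap a page of records in a RestDataSource-style JSON envelope.

    Assumptions: status 0 signals success, and endRow is treated as
    exclusive (start_row + number of records returned). Verify both
    against the RestDataSource documentation before relying on them.
    """
    if total_rows is None:
        # With no paging information, assume this page is the whole result.
        total_rows = start_row + len(records)
    return json.dumps({
        "response": {
            "status": status,
            "startRow": start_row,
            "endRow": start_row + len(records),
            "totalRows": total_rows,
            "data": records,
        }
    })


# Hypothetical page of two records out of ten matching rows.
body = rest_response(
    [{"job_id": 1, "status": "done"}, {"job_id": 2, "status": "running"}],
    start_row=0,
    total_rows=10,
)
```

Any Python web framework could send `body` back with a JSON content type; the point is only that the records must be wrapped, not returned as a bare list.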


    I hope this post convinces you I'm trying to do my own research on this... but if anyone can offer advice on which routes might work best I'd appreciate it. Thanks!

    #2
    We'd recommend server-side integration (your "2" series of options) for many different reasons, including being able to do various exports with no additional effort, slightly faster data delivery, and various other features mentioned here.

    We can't really comment on the quality of the JPA or Hibernate Mongo support beyond what those projects themselves are saying. But we would caution that SmartClient makes pretty full use of the JPA and Hibernate APIs so if they have bugs in the MongoDB implementation, it's somewhat likely you'll hit them.

    Also, if you like the idea of server-side JSON in Python, you will probably find it frustrating that JPA/HB will require you to build a whole POJO model around your (somewhat freeform-sounding) scientific data.

    So we would overall steer you towards "generic" or "sql". Note we weren't able to determine, from a brief look at this site you linked for "sql", whether there actually is a JDBC driver (you would need one), or whether this is just a conceptual mapping to help SQL users understand Mongo.
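
To illustrate what a purely conceptual mapping looks like, as opposed to a JDBC driver, the sketch below pairs one SQL statement with the equivalent MongoDB filter document and evaluates that filter in plain Python. The `jobs` collection, its fields, and the `matches` helper are hypothetical, and only a tiny subset of MongoDB's comparison operators is handled; nothing here talks to a real database.

```python
def matches(doc, query):
    """Evaluate a small subset of MongoDB-style filters against a dict.

    Supports plain equality and the $gt / $gte / $lt / $lte operators.
    A missing field never satisfies a range comparison.
    """
    ops = {
        "$gt": lambda a, b: a is not None and a > b,
        "$gte": lambda a, b: a is not None and a >= b,
        "$lt": lambda a, b: a is not None and a < b,
        "$lte": lambda a, b: a is not None and a <= b,
    }
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            # Operator form, e.g. {"score": {"$gt": 5}}
            if not all(ops[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            # Plain equality form, e.g. {"status": "done"}
            return False
    return True


# SQL:   SELECT * FROM jobs WHERE status = 'done' AND score > 5
# Mongo: db.jobs.find({"status": "done", "score": {"$gt": 5}})
query = {"status": "done", "score": {"$gt": 5}}

jobs = [
    {"job_id": 1, "status": "done", "score": 7},
    {"job_id": 2, "status": "done"},  # non-uniform document: no score field
    {"job_id": 3, "status": "running", "score": 9},
]
selected = [d["job_id"] for d in jobs if matches(d, query)]
```

The second document shows the non-uniform data case: it has no `score` field, so it simply fails the range test rather than raising an error. A "sql" serverType would still need an actual JDBC driver to issue queries like this against MongoDB.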

    Finally, just a caution that many people get excited by the hype around NoSQL DBs and choose them even where SQL is better. They are useful for certain massive scale, mostly-read, non-ACID use cases, and that's relatively rare.



      #3
      Thanks for your very helpful response. I'll continue posting here as I research in the interest of helping people who search your documentation in the future. There are no questions in this reply, so you can ignore it unless you're bored. :-)

      Ok, I'll leave behind the JPA, Hibernate, and RestDataSource options.

      I did some research on JDBC drivers for MongoDB, and it looks like there are two. One of them is "experimental":
      https://github.com/erh/mongo-jdbc
      and the other is commercial:
      http://www.unityjdbc.com/mongojdbc/mongo_jdbc.php
      UnityJDBC does have a free version which will return a max of 100 rows/query. So neither of those options is ideal... And they may not solve a problem I just thought of -- how to map a single item (i.e. row+column, where row is some kind of "job id", and the result I want to display would be one of the columns) to a DataSource instead of the table itself. So a NoSQL option is probably an uphill battle all the way.

      I've also been doing research on SQL databases, and it looks like there are some neat features available. There are ORM tools like SQLAlchemy which do something similar to NoSQL for my purposes (and maybe better), and PostgreSQL allows me to insert Python functions into the database. As I do more reading, this is maybe what makes the most sense to me. E.g. I could store my raw Python data in the database using SQLAlchemy's PickleType, have the server prepare the JSON in response to some RPC, and load it into a DataSource through dataURL or the return data of the RPC.
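
A rough sketch of that idea, using only the standard library: `sqlite3` and `pickle` stand in for PostgreSQL and SQLAlchemy's PickleType, so this is not the SQLAlchemy API itself. The `jobs` table, the function names, and the record shape (`name`/`value`) are hypothetical; the second function plays the role of the RPC handler that turns one pickled column of one job row into JSON records.

```python
import json
import pickle
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, result BLOB)")


def save_result(job_id, result):
    """Store an arbitrary (picklable) Python object under a job id."""
    conn.execute(
        "INSERT OR REPLACE INTO jobs (job_id, result) VALUES (?, ?)",
        (job_id, pickle.dumps(result)),
    )
    conn.commit()


def fetch_result_json(job_id):
    """Return one job's result as a JSON list of name/value records."""
    row = conn.execute(
        "SELECT result FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return json.dumps([])
    # Only unpickle data this application wrote itself; pickle is not
    # safe for untrusted input.
    result = pickle.loads(row[0])
    records = [{"name": k, "value": v} for k, v in sorted(result.items())]
    return json.dumps(records)


save_result(42, {"temperature": 291.5, "samples": [1, 2, 3]})
payload = fetch_result_json(42)
```

This assumes each stored result is a dict of JSON-serializable values. Flattening it into a list of records is one way to present a single row+column item as if it were its own table.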

      You may ask why all the fuss then if I'm just going to do a dataURL... the answer is that I'll have other tables in my database which I can link a DataSource to directly.
