Linux - Server This forum is for the discussion of Linux Software used in a server related context.


02-14-2014, 06:21 AM   #1
LQ Newbie
Registered: Sep 2009
Location: United Kingdom
Distribution: Ubuntu, RHEL, Fedora
Posts: 17

Rep: Reputation: 0
What is big data? When and how to use a NoSQL database?

I'm curious about NoSQL databases, but my knowledge is so far a bit limited. So I have questions ....!

1. Is a NoSQL database basically just a big dictionary? A key - value list? Or is it more complex than that?

2. I suppose NoSQL databases start to make sense once you have "Big Data", or you know/suspect beforehand that you will reach that stage. But how big does the data have to be before it's "big"?

We have a relatively "big" table in our Oracle RDBMS - currently around 30G, whereas the nightly dump file for the whole database is only about 35G. Is that "big"? I guess maybe it's not that big compared to many others?

However, we are looking to expand this database to include other scientific disciplines, and so it's possible it will grow considerably both in data size and complexity. In this context I feel I need to understand NoSQL databases a bit better, to determine whether it's something we should consider.

3. My understanding so far is that NoSQL databases can be used alongside RDBMSes. Either as an integrated part of the overall data storage, or as an archive for all or part of the RDBMS. (Data in the RDBMS is moved over to the NoSQL database once it reaches a certain age - 6 months, a year etc.) Is that correct?

4. Our applications run some queries with multiple joins, mostly just joining on primary key - foreign key, although some other columns are also used. Could a NoSQL database do this kind of thing effectively? Or is that kind of data best stored in an RDBMS?

5. I like how NoSQL databases are supposed to be very flexible regarding changing the data model. Although I feel our RDBMS isn't all that rigid either - I can do a lot of changes to the schema without affecting its operation/availability. It's not quite clear to me whether a NoSQL database schema / data model can somehow "magically" allow just about any structural changes without affecting client applications. Any thoughts on how NoSQL can supposedly better handle such changes?

Your thoughts on any or all of the above are much appreciated!

Apologies for the verbosity / lengthiness.
02-14-2014, 06:26 AM   #2
LQ Guru
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 11,843

Rep: Reputation: 3601
have you found this page:
or this:
02-14-2014, 06:35 AM   #3
Registered: Feb 2014
Location: Europe
Distribution: Debian, Mint, Arch (multiboot)
Posts: 90

Rep: Reputation: Disabled
ad 1) NoSQL is just another clone of MySQL, so it inherits all the components (plus structure),
ad 2) Well, I have seen db's about 100GB and more in size so 30-35GB is rather small....
ad 3) Moving any sql database based on its age is not so stupid, because you have data-flow-pass every ftp.App
ad 4) NoSQL supports scretching HUJA>< so it will cope with public/private keys.
ad 5) hmmm....this should be question to dev team... but I think this is best use of NoSQL
02-15-2014, 08:36 AM   #4
LQ Veteran
Registered: Feb 2003
Location: Maryland
Distribution: Slackware
Posts: 7,803
Blog Entries: 1

Rep: Reputation: 418
1) I think it depends upon what kind of nosql database you're talking about. Most of them are some sort of key-value pair, but others like XML databases don't quite fit that model. And no, nosql is NOT just another clone of MySQL.

2) Actually, nosql databases can be a big help even if you're not dealing with big data. It is all about the use cases. I work on a project that involves a whole lot of XML that changes schema fairly frequently, but probably isn't more than a few hundred gigs. XML databases like BaseX have been a very, very handy tool. Similarly, graph databases like Neo4j can be extremely useful when modeling even moderately complex relationships, regardless of size. Overall, the project has a few petabytes of data to manage, and that is still small potatoes compared to "big data".

3) Yeah, it is becoming more common to mix databases. The downside is it puts a significant burden on the API in front of those databases to understand where the data are and how to get them. You're probably looking at creating a custom API if you mix databases.

4) You gotta think differently when it comes to nosql databases. The whole concept of joining tables is pretty much out the window and for good reason. Joins are some of the most computationally expensive things RDBMSs do. And the bigger the data, the bigger the join and the slower the performance. Dumping joins is one of the biggest reasons to move to a nosql database. If you have current applications using SQL queries, moving to a nosql environment is going to mean you need to re-engineer your applications as well. It also suggests that maybe the API to the RDBMS wasn't thought out well enough, or the developers never thought of moving off of an RDBMS.

5) My experience is that flexibility is the biggest reason for using a nosql database. Changes are easier than in an RDBMS, and frequently are transparent. Like I said earlier, one of the big changes you run into with nosql is that your API layer needs to take on a much bigger share of the load, particularly if you are using multiple technologies.
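
To make points 4) and 5) a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions: the `sample` and `reading` tables, the `depth_cm` field and all function names are made up for this example, an in-memory SQLite database stands in for the RDBMS, and a plain dict stands in for a document store. Real nosql products each have their own client APIs.

```python
import sqlite3

# Relational side: two tables linked by a primary key / foreign key pair.
# (Hypothetical schema, with in-memory SQLite standing in for the RDBMS.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reading (id INTEGER PRIMARY KEY,
                          sample_id INTEGER REFERENCES sample(id),
                          value REAL);
    INSERT INTO sample VALUES (1, 'soil-A');
    INSERT INTO reading VALUES (10, 1, 7.2), (11, 1, 6.9);
""")


def readings_via_join(sample_name):
    """Fetch readings the relational way: the join happens at query time."""
    rows = conn.execute(
        "SELECT r.value FROM sample s JOIN reading r ON r.sample_id = s.id "
        "WHERE s.name = ? ORDER BY r.id",
        (sample_name,),
    ).fetchall()
    return [value for (value,) in rows]


# Document side: a plain dict stands in for a document store. The readings
# are embedded in the sample document, so no join is needed at read time.
documents = {
    "soil-A": {"name": "soil-A", "readings": [7.2, 6.9]},
    # A newer document carries an extra field; older ones are left untouched.
    "soil-B": {"name": "soil-B", "readings": [5.1], "depth_cm": 30},
}


def readings_from_document(sample_name):
    """Fetch readings from the embedded list: a single lookup by key."""
    return documents[sample_name]["readings"]


def depth_of(sample_name, default=None):
    """The application layer, not the database, copes with a missing field."""
    return documents[sample_name].get("depth_cm", default)
```

Both read paths return the same readings for `soil-A`, but the second one only works because the data was stored in the shape the application reads it. The `depth_of` helper shows the flip side of the flexibility: adding a field needs no schema change, but the code in front of the store has to handle documents that lack it.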
02-15-2014, 11:10 AM   #5
Registered: Jan 2007
Posts: 416

Rep: Reputation: 70
I agree w/hangdog on this one. Fowler has some stuff you may find useful:

The term 'nosql' is a catchall for several different types of db that share one thing in common: They don't use structured query language. Different nosql db's were invented to scratch different itches, so it's helpful to first break things down to what type of nosql db you're talking about.
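
As a rough picture of how those types differ, here is a small Python sketch using only built-in data structures. The keys, names and the `reachable` helper are invented for illustration and are not the API of any particular product; it only shows the shape of the data each family is built around.

```python
# Key-value: an opaque value looked up by key, nothing more.
kv_store = {"session:42": b"opaque-bytes"}

# Document (CouchDB, MongoDB style): a nested, self-describing record
# stored and fetched as one unit.
doc_store = {"user:1": {"name": "Ada", "langs": ["en", "fr"]}}

# Wide-column (HBase, Cassandra style): row key -> column name -> value,
# where different rows may carry different columns.
wide_store = {
    "row1": {"info:name": "Ada", "info:city": "London"},
    "row2": {"info:name": "Grace"},
}

# Graph (Neo4j style): nodes plus edges; queries walk relationships.
edges = {"Ada": {"Grace"}, "Grace": {"Linus"}, "Linus": set()}


def reachable(start):
    """Return every node that can be reached by following edges from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

For example, `reachable("Ada")` walks two hops and returns `{"Grace", "Linus"}`, the kind of relationship query a graph db is built for and a plain key-value store is not.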

That said, when I first started hearing about nosql in reference to 'big data' Apache HBase and Hadoop were usually being discussed.

Since then there are many, many others, several of the more popular of which have joined under the Apache umbrella, e.g. couchdb, cassandra, etc. Mongodb being the major exception.



All times are GMT -5. The time now is 01:15 AM.
