From now on I am going to post some technical topics on my blog.
As a first post, here is an introduction to the concepts of Big Data. The content is based on books and online articles I have read, and on demos of software that I or other people use. My aim is to condense perspectives on the domain into a crisp, readable mind map that gives readers the big picture.
I use mind maps. If mind maps do not work for you, another format may be a better way to understand these topics.
This mind map covers the basic facts about Big Data and the challenges posed by the amount of data the world generates today. These include constraints on the hardware we can use and constraints that come from the nature of the data we want to process.
When discussing drawbacks, it is also useful to understand the alternative ways a problem can be solved and the risks each alternative carries. Operations on Big Data can be done entirely online or entirely offline, so it is important to understand the choices we have for handling data in each mode.
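To make the online/offline distinction concrete, here is a minimal sketch in Python. It assumes that "offline" means batch processing over a dataset that is already stored, and "online" means updating a result as each record arrives. The function names and the sample readings are made up for illustration and do not come from any particular Big Data tool.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Offline (batch): the whole dataset is available before we compute."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Online (streaming): update the mean as each record arrives.

    Only the running count and mean are kept, never the full dataset.
    """
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


# Hypothetical sample data, used only for illustration.
readings = [4.0, 8.0, 6.0, 2.0]

offline_result = batch_mean(readings)
online_results = list(streaming_mean(readings))

print("offline:", offline_result)
print("online, after each record:", online_results)
```

Both approaches end at the same answer here. The difference is that the batch version needs all the data up front, while the streaming version gives a usable answer after every record and keeps only a small amount of state.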
Furthermore, data is processed in different ways depending on where it comes from and what exactly we want to do with it (indexing, searching, etc.). The map also lists tools that are meant for specific types of processing.
Hadoop has been the popular distributed data processing framework until now, and it carries a lot of weight: many companies have invested in using it and enhancing it to produce results from large amounts of data, for purposes such as business analytics and data science.
Finally, the map covers the related tools and projects that exist in the Hadoop ecosystem today and what each of them can do to help in the Big Data domain. From there, you can delve deeper into whichever area interests you most.
I will leave you with a quote I saw in a book that captures why we need a different way to attack the problems Big Data poses.
If you need an ox for heavier pulling you don’t grow a larger ox, you put more oxen