B**C
My Experience Getting Certified In Hadoop
This book is the single best source to begin your career in Big Data development. However, it should not be your first entry point; starting with it will frustrate you. This review hopes to help the juniors and newbies who want to enter the big data world.

The Cloudera CCD-410 certification ranges from tough to very tough. Period.

TRAINING: You are not required to take a training course. I took a relatively inexpensive one ($300) from edureka.in, an online training website in India. They give a good overview at 10,000 feet and are very good for the price, but nowhere close enough to get certified. Check out their first session, available for free on YouTube. They do have steps to install your own VM, a simple project, Hive, Pig, etc. If time and money permit, I strongly suggest going to the official Cloudera training. It costs about $3000 and includes a free test voucher, so effectively about $2700. It saves you months of preparation time and gives you a distinct advantage over your peers that should pay for itself.

Install a VM and try a few commands, including Pig and Hive commands. Also try Amazon Elastic MapReduce, which reduces a lot of manual typing and allows you to focus on the coding itself.

LEARNING FROM THIS BOOK: After a training, start with this book. The first eight chapters are critical (approximately 300 out of 550 pages). If you are smart, sharp and young, expect to read these eight chapters about three times; more is just fine. Add some time to read the rest of the chapters once or twice before the test, along with all the external links. If you are a busy professional, give yourself a six-month window to take the test. Knowing Java is a definite plus. Buy the Cloudera mock examination ($125) after getting comfortable and familiar with MapReduce. It is a nice resource: it explains every answer and links to where you can get more information.
Just as an FYI, the real test was far more complex and difficult.

SCENARIOS BASED ON A MAPREDUCE CODE: You will need to go through the example code and understand what each line does, why it is there, and what happens if you comment out a line of the code. As an example:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
return job.waitForCompletion(false) ? 0 : -1;
> What does waitForCompletion mean?
> Is the reduce job a must or optional?
> How many files will running a map job produce?
> Will the code compile, or will it fail at run time because of the datatypes?
> What will happen if you run the same job twice?
> What happens to the map data after the job?
> How does Hadoop handle huge files that cross block boundaries?
> What happens if you do not explicitly set a mapper or reducer?
> Will a combiner help, based on a scenario?
> Which daemon decides the number of map jobs to run?
> How does Hadoop handle the blocks when a node crashes?

SCENARIOS BASED ON HIVEQL: This is an extension of the previous scenarios. You are given a small table and a simple SQL query (example: select stationid,max(temp) from tableX). The answer choices are four sets of MapReduce code and you have to choose the right one. Expect to read and understand the MapReduce code that emulates how you do a distinct, a sum, an average, a max, a min, etc.

According to the Cloudera website, these are the percentages of questions:
CHAPTER 3 : 17 percent
CHAPTER 4 : 6 percent
CHAPTER 5 : 7 percent
CHAPTER 6 : 18 percent
CHAPTER 7 : 6 percent
CHAPTER 8 : 7 percent
PIG/HIVE/SQOOP/Zookeeper : 8 percent combined (no HBase)
Chapter 2 is not referenced in that breakdown but is very important. Expect several questions from that chapter, since it gives a good overview. The remainder comes from the links that Cloudera suggests you read and get familiar with. Sqoop import syntax, creating a Hive table via Sqoop, and creating and populating a Hive table via Sqoop are must-knows.

WHY GETTING CERTIFIED: I have heard the tired argument that certification is purely academic.
Tell that to your doctor or your dentist. Sound fundamentals are the foundation of real-world experience. Big Data is no different. Understanding the basics will give you confidence; experience will follow while you keep your client happy.

WHY BIG DATA: My interest in Big Data was sparked by the Harvard Business Review article claiming that "Data Scientist" was the hottest job of the 21st century. Follow that by googling "Rayid Ghani", claimed to be the data scientist behind Obama's second-term victory.
hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1

OTHER CHOICES:
> Coursera provides a free course, "Introduction To Data Science". I signed up for their first batch but could not finish because of office commitments.
> Search YouTube for "Stanford University Hadoop" by Amr Awadallah.

I was impressed with these books; you might also like them.
> Big Data: A Revolution That Will Transform How We Live, Work and Think
> Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
> Data Science for Business: What you need to know about data mining and data-analytic thinking

SUMMARY: Someday Big Data will become a commodity skillset, but not now. I did a search on Glassdoor to see the demand for Hadoop vs. some other hot skills. Hadoop is head and shoulders above the rest.
Hadoop - 30,011 postings in Apr 2014
Oracle DBA - 9227 postings (a perpetual hot skillset)
Salesforce - 9968 postings

Please post any questions in the comment section and I will certainly try to answer them.
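The HiveQL scenario in the review above (a max of temp per stationid, emulated as MapReduce) can be pictured with a small plain-Java sketch. This is only an illustration under stated assumptions: it uses no Hadoop classes, the class and method names (MaxTempSketch, map, reduce, run) are hypothetical, the input is assumed to be "stationid,temp" text lines, and the query is assumed to group by stationid. In a real job the same two steps would live in Mapper and Reducer subclasses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map -> shuffle -> reduce flow; no Hadoop classes are used.
// All names here are hypothetical and chosen only for illustration.
final class MaxTempSketch {

    // Map phase: parse one "stationid,temp" record into a (key, value) pair.
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.split(",");
        return Map.entry(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // Reduce phase: collapse all values that share a key into their maximum.
    static int reduce(List<Integer> temps) {
        int max = Integer.MIN_VALUE;
        for (int t : temps) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Driver: map every record, group values by key (the shuffle), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample rows standing in for tableX.
        List<String> tableX = List.of("s1,22", "s2,31", "s1,35", "s2,-4");
        System.out.println(run(tableX));
    }
}
```

Because max is associative and commutative, the same reduce logic could also run as a combiner on each mapper's output, which is the kind of reasoning the "will a combiner help" questions ask for.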
A**I
good book on a complex subject
Hadoop is a pretty complex technology for even seasoned engineers to grasp and appreciate fully. Attempting to explain its core concepts and usage in a book is no small feat, but I think the author did an admirable job of capturing the essence of Hadoop and the surrounding landscape. The thing that makes Hadoop so fascinating but so hard to fully grasp is that you need to understand its surrounding, complementary technologies to truly understand what Hadoop is and why it is so popular.

Can this book serve as a beginner's guide? I am not sure. I have read a few Hadoop blogs and articles and have some prior hello-world setup experience with Hadoop, and yet I couldn't always follow the book. It is definitely not a beginner's book with foolproof, detailed instructions to set up and run every example. It is, however, an excellent book for educating users about the world of Hadoop: what Hadoop really is, what it involves, and the complementary set of technologies that integrate with or build on top of Hadoop and make it even more useful.

I walk away from this book with a much better understanding of the inner workings of Hadoop (HDFS, MapReduce), a solid grasp of its surrounding technologies (Pig, Hive, HBase) and a much better appreciation of the power of Hadoop, especially when used alongside its many complementary technologies. This is not a beginner's introductory book, nor does it cover any high-level data analysis or BI solution scenarios. It is also not an admin/configuration guide for setting up, designing and maintaining complex Hadoop clusters. But if you read this book with the right expectations, you won't be disappointed.

My take on the current state of Hadoop is that it is still in its infancy, with an overly complex set of technologies, and functioning at a pretty low level. In due time, Hadoop will form the backbone of distributed technology but will be pretty much shielded from and invisible to most users.
Higher level data analysis solutions and real time queries will be the new rage powered by Hadoop in the background. I am looking forward to the next battleground!
R**N
*The* standard and definitive guide for Hadoop
This book is so all-encompassing of Hadoop and so helpful day after day after day. It's a monumental task writing such an excellent reference for all aspects of a large, incredible, powerful stack of software. Tom White has done a fantastic job here. If you are learning about Hadoop, working with Hadoop, or thinking about Hadoop, you should have the book.

The 3rd edition brings all of the juicy enhancements to the Hadoop stack, including REST API access (WebHDFS), NameNode High Availability, and many more things. Love this book. :)
R**H
The Bible, but not a Tutorial
This is the best reference out there regarding Hadoop, but do not mistake it for a tutorial -- it's not really meant to be read cover to cover. If you want that, I've heard good things about Chuck Lam's "Hadoop in Action".

Now that you know what it isn't, here's what this book is: a comprehensive, "roll up your sleeves, here's some Java" deep dive into Hadoop. It covers the basics as well as advanced topics and a brief tour of the supporting projects (like Hive, Pig, etc). No single book will do Hadoop justice, but this book is the best attempt so far. If you only have enough cheddar to buy a single book, this is the one you should own.
S**Y
Five Stars
The book is great for learning Hadoop basics, and it arrived as promised, in good condition.
S**V
Good
Good
N**O
Must have book
The ultimate book for those who are interested in the Hadoop framework. Lots of clear and precise explanations for setting up and using a Hadoop cluster. I highly recommend this book!
D**E
A MUST for all Hadoop users
This book can be called the "reference bible" for Hadoop, which makes it a must for every serious Hadoop user, whether beginner or advanced.

What "Java ist auch eine Insel" is for general Java programming, this book is for Hadoop. It also covers optional tools and add-ons such as Hive, Pig, Sqoop, etc.
A**N
Bought just out of curiosity
I hope to use Hadoop in a production environment in the future. With this book you can understand what big data processing is and how to use it.