Apache Hadoop
-
Enterprise Java
Hadoop Hangover: Launch a hadoop cluster CDH4 using Apache Whirr
This post is about how to launch a CDH4 MRv1 or CDH4 YARN cluster on EC2 instances. It’s said that you…
-
Enterprise Java
MapReduce Algorithms – Secondary Sorting
We continue with our series on implementing the MapReduce algorithms found in the book Data-Intensive Text Processing with MapReduce. Other posts in…
-
Enterprise Java
MapReduce Algorithms – Order Inversion
This post is another segment in the series presenting MapReduce algorithms as found in the Data-Intensive Text Processing with MapReduce…
-
Enterprise Java
Calculating A Co-Occurrence Matrix with Hadoop
This post continues our series on implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book.…
-
DevOps
Hadoop Single Node Set Up
With this post I hope to share the procedure for setting up Apache Hadoop on a single node. Hadoop is…
-
Enterprise Java
Hadoop + Amazon EC2 – An updated tutorial
There is an old tutorial on Hadoop’s wiki page: http://wiki.apache.org/hadoop/AmazonEC2, but recently I had to follow this tutorial and…
-
Enterprise Java
Testing Hadoop Programs with MRUnit
This post will take a slight detour from implementing the patterns found in Data-Intensive Text Processing with MapReduce to discuss something…
-
DevOps
Distributed Apache Flume Setup With an HDFS Sink
I have recently spent a few days getting up to speed with Flume, Cloudera’s distributed log offering. If you haven’t…
-
Enterprise Java
MapReduce: Working Through Data-Intensive Text Processing – Local Aggregation Part II
This post continues the series on implementing algorithms found in the Data-Intensive Text Processing with MapReduce book. Part one…