
Map Reduce

Map Reduce is needed when a large amount of data has to be scanned in order to run some function over it, for example reading through a large collection of documents.


  • User Defines:

    • {key,value}
    • mapper and reducer functions
  • Hadoop handles the logistics: it distributes and executes the map and reduce functions and manages the intermediate results.

  • map() reads data and outputs {key,value}

  • reduce() reads {key, value} and outputs your result

    Fig1 : Map Reduce
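
    The division of labour above can be sketched in a few lines of Python. This is a single-process illustration only, not how Hadoop is implemented: run_mapreduce, mapper and reducer are hypothetical names, and the in-memory sort stands in for the distribution and intermediate-result handling that Hadoop would perform across machines. The sample job (longest line per first letter) is made up purely to show the flow.

```python
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, mapper, reducer):
    """Toy single-process stand-in for the framework (hypothetical helper)."""
    # Map phase: every input record may emit any number of (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))

    # Sort phase: bring pairs with the same key next to each other.
    intermediate.sort(key=itemgetter(0))

    # Reduce phase: call the reducer once per key with all of its values.
    results = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append(reducer(key, values))
    return results


# User-defined parts: the {key, value} choice, the mapper and the reducer.
def mapper(line):
    # key = first letter of the line, value = length of the line
    yield (line[0], len(line))


def reducer(key, values):
    # keep the largest value seen for this key
    return (key, max(values))


output = run_mapreduce(["apple", "avocado", "banana"], mapper, reducer)
print(output)  # [('a', 7), ('b', 6)]
```

    Only mapper and reducer change from one job to the next; the surrounding loop is the part a framework such as Hadoop takes over.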

Word Count Example

  • In the standard approach we would maintain a table of words with their corresponding frequencies and update this table as we read through the documents.

  • What if the number of documents is in the millions? A single table updated by a single reader no longer scales.

  • Let {word, 1} be the {key, value} pair that map() emits for every word it reads.

  • e.g.
    line = A long time ago in a galaxy far far away
    keys = A | long | time | ago | in | a | galaxy | far | far | away
    emit {key, value} = {A, 1} {long, 1} {time, 1} {ago, 1} {in, 1} {a, 1} {galaxy, 1} {far, 1} {far, 1} {away, 1}

    Fig2 : map()

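  • reduce(): the {key, value} pairs arrive sorted by key, so identical words are adjacent.
    Loop over keys {
      Get next {word} {value}
      if {word} is same as previous word
        add {value} to count
      else
        emit {previous word} {count}
        set count to {value}
    }
    emit the last {word} {count}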


    Fig3 : reduce()
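
    The map() and reduce() steps of the word count can be sketched in Python as follows. This is a minimal single-process sketch, not Hadoop code: map_words and reduce_sorted are hypothetical names, and the call to sorted() stands in for the sort that the framework performs between the two phases. The reducer assumes its input is already sorted by key, so that equal words sit next to each other.

```python
def map_words(line):
    """map(): emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield (word, 1)


def reduce_sorted(pairs):
    """reduce(): sum the values of adjacent pairs that share a key.

    Assumes `pairs` is already sorted by key.
    """
    previous_word = None
    count = 0
    for word, value in pairs:
        if word == previous_word:
            # Same word as before: keep accumulating.
            count += value
        else:
            # New word: emit the finished one (if any) and start over.
            if previous_word is not None:
                yield (previous_word, count)
            previous_word = word
            count = value
    # Emit the last word, which no later key change would flush.
    if previous_word is not None:
        yield (previous_word, count)


line = "A long time ago in a galaxy far far away"
pairs = sorted(map_words(line))  # stands in for the framework's sort step
word_counts = dict(reduce_sorted(pairs))
print(word_counts["far"])  # 2
```

    Note that "A" and "a" are counted as different keys here, exactly as in the emitted pairs above; lowercasing the word inside map_words would merge them.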
