In this post we'll see how to compute, for each month, the mean of the daily maximum temperatures for the city of Milan.
The temperature data is taken from http://archivio-meteo.distile.it/tabelle-dati-archivio-meteo/, where it is displayed as a table; by inspecting the page's HTTP traffic we found that the data comes from this URL and is in JSON format.
Using Jackson, we converted this JSON into a format that is simpler to use with Hadoop: CSV. The result of the conversion has one line per day, holding the date (ddMMyyyy), the minimum temperature and the maximum temperature:

01012000,-4.0,5.0
02012000,-5.0,5.1
03012000,-5.0,7.7
04012000,-3.0,9.7
...

If you're curious to see how we transformed it, take a look at the source code.
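
Before bringing Hadoop in, it may help to see the target computation on its own. The following is only a minimal plain-Java sketch, assuming the CSV layout shown above (date as ddMMyyyy, then min, then max); the class name MonthlyMeanSketch and the method meanOfMaxByMonth are made up for this illustration and are not part of the job's source code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original job: computes the same result in memory.
public class MonthlyMeanSketch {

    // Returns, for each month key (MMyyyy), the mean of the daily max temperatures.
    public static Map<String, Double> meanOfMaxByMonth(List<String> csvLines) {
        // month -> {sum of max temperatures, number of days seen}
        Map<String, double[]> totals = new LinkedHashMap<>();

        for (String line : csvLines) {
            String[] fields = line.split(",");
            if (fields.length != 3) {
                continue; // skip malformed lines, like the defensive check in the mapper
            }
            String month = fields[0].substring(2); // drop the day: ddMMyyyy -> MMyyyy
            double max = Double.parseDouble(fields[2]);

            double[] total = totals.computeIfAbsent(month, k -> new double[2]);
            total[0] += max;
            total[1] += 1;
        }

        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> entry : totals.entrySet()) {
            means.put(entry.getKey(), entry.getValue()[0] / entry.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "01012000,-4.0,5.0",
                "02012000,-5.0,5.1",
                "03012000,-5.0,7.7",
                "04012000,-3.0,9.7");
        // Prints a single entry for the key "012000" (January 2000).
        System.out.println(meanOfMaxByMonth(lines));
    }
}
```

The MapReduce job has to produce the same numbers, but with the lines spread over many mappers, so no single task sees all the days of a month.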

Let's look at the mapper class for this job:

public static class MeanMapper extends Mapper<Object, Text, Text, SumCount> {

    private final int DATE = 0;
    private final int MIN = 1;
    private final int MAX = 2;

    private Map<Text, List<Double>> maxMap = new HashMap<>();
 
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

        // gets the fields of the CSV line
        String[] values = value.toString().split(",");

        // defensive check
        if (values.length != 3) {
            return;
        }

        // gets date and max temperature
        String date = values[DATE];
        Text month = new Text(date.substring(2));
        Double max = Double.parseDouble(values[MAX]);

        // if not present, put this month into the map
        if (!maxMap.containsKey(month)) {
            maxMap.put(month, new ArrayList<Double>());
        }

        // adds the max temperature for this day to the list of temperatures
        maxMap.get(month).add(max);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {

        // loops over the months collected in the map() method
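        for (Map.Entry<Text, List<Double>> entry : maxMap.entrySet()) {

            // sums the max temperatures collected for this month
            double sum = 0;
            for (double max : entry.getValue()) {
                sum += max;
            }

            // emits the month with its partial sum and count
            // NOTE: assumes a SumCount(double sum, int count) constructor; SumCount is not shown in this section
            context.write(entry.getKey(), new SumCount(sum, entry.getValue().size()));
        }
    }
}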
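
The mapper's output value type is SumCount rather than a plain number. That class is not shown in this section, so what follows is only a sketch of the idea its name suggests: carrying a partial sum together with the number of values behind it. PartialMean is a hypothetical stand-in written for this illustration; a real Hadoop value type would also have to implement Writable. The point is that partial sums and counts can be merged in any order and still give the exact mean, while averaging per-mapper means does not.

```java
public class SumCountSketch {

    // Hypothetical stand-in for SumCount: a partial sum plus how many values went into it.
    public static final class PartialMean {
        public final double sum;
        public final int count;

        public PartialMean(double sum, int count) {
            this.sum = sum;
            this.count = count;
        }

        // Combines two partial results for the same month.
        public PartialMean merge(PartialMean other) {
            return new PartialMean(sum + other.sum, count + other.count);
        }

        public double mean() {
            return sum / count;
        }
    }

    public static void main(String[] args) {
        // Two mappers saw a different number of days of the same month.
        PartialMean first = new PartialMean(10.0, 1);   // one day at 10.0
        PartialMean second = new PartialMean(12.0, 3);  // three days at 4.0 each

        double exact = first.merge(second).mean();                  // (10 + 12) / (1 + 3)
        double meanOfMeans = (first.mean() + second.mean()) / 2;    // (10 + 4) / 2

        System.out.println(exact + " vs " + meanOfMeans); // prints "5.5 vs 7.0"
    }
}
```

With only one day in the first split and three in the second, the mean of the two per-split means overweights the single day, which is why the count has to travel along with the sum.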
