【Question title】: How to train an Italian language model in OpenNLP on Hadoop?
【Posted】: 2015-05-29 17:42:30
【Question】:

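I want to implement natural language processing algorithms for Italian on Hadoop.

I have 2 questions:

  1. How can I find a stemming algorithm for Italian?
  2. How can I integrate it into Hadoop?

Here is my code: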

// Paths are left elided exactly as in the original post.
String pathSent=...tagged sentences...;
String pathChunk=....chunked train path....;
File fileSent = new File(pathSent);
File fileChunk = new File(pathChunk);
InputStream inSent = null;
InputStream inChunk = null;

inSent = new FileInputStream(fileSent);
inChunk = new FileInputStream(fileChunk);

// Train the POS tagger from word_tag sentences; the last two arguments are cutoff and iterations.
POSModel posModel = POSTaggerME.train("it",
        new WordTagSampleStream(new InputStreamReader(inSent)),
        ModelType.MAXENT, null, null, 3, 3);

// Train the chunker from the chunk training file (cutoff 1, 1 iteration).
ObjectStream<String> stringStream = new PlainTextByLineStream(new InputStreamReader(inChunk));
ObjectStream<ChunkSample> chunkStream = new ChunkSampleStream(stringStream);
ChunkerModel chunkModel = ChunkerME.train("it", chunkStream, 1, 1);

this.tagger = new POSTaggerME(posModel);
this.chunker = new ChunkerME(chunkModel);

// Release the training files once both models are built.
inSent.close();
inChunk.close();

【Question comments】:

    Tags: java hadoop nlp opennlp linguistics


    【Answer 1】:

    You need a grammatical sentence engine that labels each word of a sentence, for example:

    "io voglio andare a casa"
    
    io, sostantivo
    volere, verbo
    andare, verbo
    a, preposizione semplice
    casa, oggetto
    

    Once you have tagged sentences, you can use them to train OpenNLP.
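OpenNLP's POS tagger trainer reads one sentence per line, with each token written as `word_tag` and tokens separated by spaces. The following is a minimal sketch of producing such a line from the tagged example above. The class name `TaggedLineFormatter`, the method `toTrainingLine`, and the tag names are hypothetical and chosen only for illustration; they are not part of OpenNLP or of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns (token, tag) pairs into one word_tag training line. */
final class TaggedLineFormatter {

    /** Joins each token with its tag using '_' and separates the pairs with spaces. */
    static String toTrainingLine(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            pairs.add(tokens[i] + "_" + tags[i]);
        }
        return String.join(" ", pairs);
    }

    public static void main(String[] args) {
        String[] tokens = {"io", "voglio", "andare", "a", "casa"};
        // Tag names are illustrative placeholders, not an official Italian tag set.
        String[] tags = {"PRON", "VERB", "VERB", "PREP", "NOUN"};
        // prints: io_PRON voglio_VERB andare_VERB a_PREP casa_NOUN
        System.out.println(toTrainingLine(tokens, tags));
    }
}
```

A file made of such lines could then serve as the tagged-sentence input that the question's code passes to `WordTagSampleStream`.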

    Create a custom Mapper on Hadoop:

    public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // your code here
        }
    }
    

    Create a custom Reducer on Hadoop:

    public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // your reduce here
        }
    }
    

    Then configure the job:

    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();

      // Job.getInstance replaces the deprecated Job constructor.
      Job job = Job.getInstance(conf, "opennlp");
      job.setJarByClass(CustomOpenNLP.class);

      // Types of the (key, value) pairs written by the reducer.
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);

      job.setMapperClass(Map.class);
      job.setReducerClass(Reduce.class);

      job.setInputFormatClass(TextInputFormat.class);
      job.setOutputFormatClass(TextOutputFormat.class);

      // args[0] is the input path, args[1] the output path.
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));

      // Exit with a non-zero status if the job fails.
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
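
The mapper and reducer bodies above are left as placeholders. As one possible illustration, and only as an assumption about what they might do, the sketch below counts how often each tag occurs in `word_tag` lines, using plain Java collections so that it runs without a cluster. The class `TagCountSketch` and its methods `mapLine` and `reduce` are hypothetical names; in a real job the same logic would sit inside `Map.map` and `Reduce.reduce` and use `Text` and `IntWritable` instead of `String` and `Integer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, cluster-free sketch of the map and reduce steps for counting tags. */
final class TagCountSketch {

    /** Map step: emit the tag of every word_tag token found in one input line. */
    static List<String> mapLine(String line) {
        List<String> tags = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            // Skip tokens that have no word or no tag around the separator.
            if (sep > 0 && sep < token.length() - 1) {
                tags.add(token.substring(sep + 1));
            }
        }
        return tags;
    }

    /** Reduce step: sum the occurrences of each emitted key. */
    static Map<String, Integer> reduce(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        emitted.addAll(mapLine("io_PRON voglio_VERB andare_VERB a_PREP casa_NOUN"));
        // prints: {NOUN=1, PREP=1, PRON=1, VERB=2}
        System.out.println(reduce(emitted));
    }
}
```

Keeping the per-line logic in small pure functions like these makes it easy to unit test before wiring it into the Hadoop classes.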
    

    【Comments】:
