2. Download and configure Terrier
- Terrier Requirements:
Terrier’s single requirement consists of an installed Java JRE 1.8.0 or higher.
- Download Terrier
- Step by Step Unix Installation
After downloading Terrier, copy the file to the directory where you want to install Terrier. Navigate to this directory and execute the following command to decompress the distribution:
tar -zxvf terrier-project-5.0-bin.tar.gz
This will create a Terrier directory (terrier-project-5.0) in your current directory. Next, make sure that the correct Java version is available on the system. Type:
echo $JAVA_HOME
If the environment variable $JAVA_HOME is set, this command will output the path of your Java installation (e.g. /usr/java/jre1.8.0). If this shows that you have a correct Java version (1.8.0 or later) installed, then you're all done. If your system does not meet these requirements, you can download Java 1.8 from the JRE 1.8 download website and set the environment variable by including the following line in either your /etc/profile or ~/.bashrc file:
export JAVA_HOME=<absolute-path-of-java-installation>
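Since $JAVA_HOME only shows a path, it can help to check the version number itself. The following is a minimal sketch, not part of Terrier: the helper name java_version_ok is made up for illustration, and it assumes that java -version prints a quoted version string such as "1.8.0_181" or "11.0.2".

```shell
#!/bin/sh
# Hypothetical helper (not part of Terrier): succeed if a Java version
# string such as "1.8.0_181" or "11.0.2" denotes Java 8 (1.8) or later.
java_version_ok() {
    version=$1
    major=${version%%.*}            # text before the first dot
    if [ "$major" = "1" ]; then     # legacy scheme: 1.8.0 means Java 8
        rest=${version#*.}
        major=${rest%%.*}
    fi
    # A non-numeric value makes the test fail, which counts as "not ok".
    [ "$major" -ge 8 ] 2>/dev/null
}

# Usage: read the quoted version from `java -version` (printed on stderr).
if command -v java >/dev/null 2>&1; then
    installed=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if java_version_ok "$installed"; then
        echo "Java $installed is recent enough for Terrier"
    else
        echo "Java $installed does not look like 1.8 or later"
    fi
else
    echo "No java executable found on PATH"
fi
```

The function handles both the old numbering (1.8) and the newer one (9, 11, 17), so the same check keeps working after a Java upgrade.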
3. Using Terrier
- Indexing
- Go to the Terrier folder:
cd terrier-project-5.0
- Set up Terrier for using a TREC test collection by calling trec_setup.sh with the path of the collection directory:
bin/trec_setup.sh /Users/zcy/Desktop/information/document
In our example we are using a collection called VASWANI_NPL located at share/vaswani_npl/. It has the structure of a traditional TREC test collection, with a corpus file, topics, and relevance assessments (qrels), all in the usual TREC formats.
1) If necessary, check/modify the collection.spec file. This might be required if the collection directory contains files that you do not want to index (READMEs, etc.). A document in the corpus looks like this:
<DOC>
<DOCNO>21</DOCNO>
<TITLE>[Biochemical studies on camomile components/III. In vitro studies about the antipeptic activity of (--)-alpha-bisabolol (author's transl)].</TITLE>
<TEXT>(--)-alpha-Bisabolol has a primary antipeptic action depending on dosage, which is not caused by an alteration of the pH-value. The proteolytic activity of pepsin is reduced by 50 percent through addition of bisabolol in the ratio of 1/0.5. The antipeptic action of bisabolol only occurs in case of direct contact. In case of a previous contact with the substrate, the inhibiting effect is lost</TEXT>
<TEXT> </TEXT>
</DOC>
2) Now we are ready to begin indexing the collection. This is achieved using the batchindexing command (batch indexers are the code for indexing corpora of documents), called from the terrier script as follows:
bin/terrier batchindexing
With Terrier’s default settings, the resulting index will be created in the var/index folder within the Terrier installation folder.
Note: If you do not need the direct index structure (e.g. for query expansion), you can use bin/terrier batchindexing -j for the faster single-pass indexing.
Once indexing completes, you can verify your index by obtaining its statistics, using the indexstats command of Terrier:
bin/terrier indexstats
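Step 1) above mentions removing unwanted files such as READMEs from collection.spec before indexing. A minimal sketch of doing that with grep follows. It assumes the file lists one path per line; the function name filter_spec, the README pattern and the demo paths are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: print a collection.spec-style file list with
# README-like entries removed. Assumes one file path per line.
filter_spec() {
    # Drop any path whose last component starts with "readme" (any case).
    grep -v -i -E '(^|/)readme[^/]*$' "$1"
}

# Demo on a throwaway file with made-up paths.
demo_spec=$(mktemp)
printf '%s\n' \
    '/data/corpus/part1.xml' \
    '/data/corpus/README.txt' \
    '/data/corpus/part2.xml' > "$demo_spec"

filter_spec "$demo_spec"
```

In practice you would redirect the filtered list to a new file, review it, and only then replace the real collection.spec, so that a pattern that is too broad cannot silently drop part of the corpus.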
- Retrieval
The query file (topics2017.xml) is not in the traditional TREC topic format, so Terrier has to be told how to read it. Since we want to build the query from the disease, gene, demographic and other tags, we need to set up the properties as follows:
TrecQueryTags.doctag=topic
TrecQueryTags.idtag=num
TrecQueryTags.process=disease,gene,demographic,other
TrecQueryTags.skip=DESC,NARR
Run the query command:
bin/trec_terrier.sh -r -Dtrec.model=PL2 -c 10.99 -Dtrec.topics=/Users/zcy/Desktop/information/topics2017.xml
Once retrieval completes, you can find a result file (named TF_IDF_2.res in our case) in var/results.
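A quick sanity check on the result file is to count how many documents were retrieved for each query. The sketch below assumes the .res file uses the usual TREC run layout (query id, Q0, document number, rank, score, run tag). The function name count_per_query and the sample lines are made up for illustration and are not real Terrier output.

```shell
#!/bin/sh
# Hypothetical helper: count retrieved documents per query in a
# TREC-style run file.
# Assumed line layout: query_id Q0 docno rank score run_tag
count_per_query() {
    awk '{ n[$1]++ } END { for (q in n) print q, n[q] }' "$1" | sort -n
}

# Demo on a throwaway file with made-up lines.
demo_res=$(mktemp)
cat > "$demo_res" <<'EOF'
1 Q0 doc21 0 7.31 demo
1 Q0 doc35 1 6.90 demo
2 Q0 doc21 0 5.12 demo
EOF

count_per_query "$demo_res"
```

A query that is missing from this listing returned no documents at all, which often points to a problem with the TrecQueryTags settings rather than with the index.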
- Evaluation
Now we will use the “-e” parameter to evaluate the results:
bin/trec_terrier.sh -e -Dtrec.qrels=/Users/zcy/Desktop/information/qrels-treceval-abstracts.2017.txt
Terrier looks in the var/results directory, evaluates every .res file it finds there, and saves each evaluation as a .eval file with the same name as the corresponding .res file.
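The naming rule just described (same name, with .res replaced by .eval) can be checked with a few lines of shell. This is only a sketch: the function names eval_name_for and check_evals are hypothetical, and the directory is whatever results folder you pass in.

```shell
#!/bin/sh
# Hypothetical helper: map a result file name to its evaluation file
# name (same base name, .res replaced by .eval).
eval_name_for() {
    printf '%s\n' "${1%.res}.eval"
}

# Hypothetical helper: for each .res file in a directory, report
# whether the matching .eval file exists.
check_evals() {
    for res in "$1"/*.res; do
        [ -e "$res" ] || continue    # directory has no .res files
        ev=$(eval_name_for "$res")
        if [ -e "$ev" ]; then
            echo "$ev: present"
        else
            echo "$ev: missing"
        fi
    done
}

# Usage from the Terrier folder; prints nothing if there are no .res files.
check_evals var/results
```

Any file reported as missing has not been evaluated yet, for example because retrieval was rerun after the last evaluation.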
We can view the evaluation measures in the .eval file: