
390/0/2; 389/07/20. Indexed in ISC, SCOPUS and LISA. aminnezarat@gmail.com, mosavit@pnu.ac.ir

... Google ... "foundation courses" ...
Footnotes: 1. Information Retrieval 2. Dictionary 3. Chunks 4. Keywords

...
Footnote: 1. Corpora

... 60% ... 40% ... (Hull and Grefenstette 1996) ... 53% ... 42% ... (Chen 2002) ... 83% ...
Footnote: 1. Morphologic

... (Mosavi Miangah 2008) ... (Mosavi Miangah 2008) ...

... (Lewis and Ringuette 1994) ...
Footnotes: 1. Classification 2. Text Mining 3. Chunk

... (Mohammadian 2004) ... (Luck and Padgham ...) ...

... HTML ... XML ... GO ... XML ... XML: ID, Text, Type, Link, Date, Time ...
Footnotes: 1. Type 2. Link 3. Robot 4. Driving list 5. Tag

... SQL Server 2005 ... SQL ... C# ...
Footnotes: 1. Queries 2. Cluster 3. Intelligent Agent

... (Mosavi Miangah 2009) ... "money laundering" ... WordNet (Miller et al. 1993) ...
Footnotes: 1. Types 2. Concordance

... (Chunk) ... g ... C# ... χ²(d, c) ... c, d ... χ² (Mosavi Miangah and Nezarat 2010) ... AS ... χ² ...

... 6.63 ... P (critical value) ... χ² ... AS ... χ² ... 6.63 ... AS ...
Footnote: 1. Every other day

... AS ... Every other day ... AS ...
The; The committee; The committee convenes; The committee convenes every; committee ...; committee convenes; 03.7; Every other ...; committee convenes every; 234.; Every other day ...; committee convenes every other
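The fragments surviving on this page ("The", "The committee", "The committee convenes", and so on) look like contiguous word sequences taken from the sentence "The committee convenes every other day". A minimal sketch of how such candidate chunks could be enumerated is given here. The function name `contiguous_chunks` and the `max_len` cap are hypothetical, and the source does not confirm that the authors generated candidates in this way.

```python
from typing import List, Optional


def contiguous_chunks(sentence: str, max_len: Optional[int] = None) -> List[str]:
    """Return every contiguous word sequence of the sentence, left to right.

    Assumption: candidates are plain whitespace-delimited word n-grams.
    The paper's actual chunking procedure is not recoverable from the text.
    """
    words = sentence.split()
    n = len(words)
    limit = n if max_len is None else min(max_len, n)
    chunks = []
    for start in range(n):
        # Grow the candidate one word at a time from this start position.
        for length in range(1, min(limit, n - start) + 1):
            chunks.append(" ".join(words[start:start + length]))
    return chunks


if __name__ == "__main__":
    sentence = "The committee convenes every other day"
    for chunk in contiguous_chunks(sentence, max_len=4):
        print(chunk)
```

Each candidate produced this way would then still need a score, such as the AS mentioned on this page, before it could be kept or discarded.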

... χ² ...
Footnotes: 1. Corpus-based dictionary 2. Translation memory system

References

Carlson, C. N. 2004. Information overload, retrieval strategies and Internet user empowerment. In Proceedings of COSTA Action 269, Helsinki, 69-76. http://www. (accessed 10 Dec. 2009).

Chen, H. H. 2002. Chinese information extraction techniques. Summer School of Intelligent Media and Information Processing (SSIMIP), Chapter 2, National University of Singapore. pretrack.pdf (accessed 6 Dec. 2009).

Douglas, O. W., and B. J. Dorr. 1996. A survey of multilingual text retrieval. Technical Report UMIACS-TR-96-19, Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA.

Hull, D., and G. Grefenstette. 1996. Querying across languages: a dictionary based approach to multilingual information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland. Zurich: Assn for Computing Machinery.

Lewis, D., and M. Ringuette. 1994. A comparison of two learning algorithms for text categorization. In Proceedings of SDAIR94 3rd Annual Symposium on Document Analysis and Information Retrieval, Vol. 33, Citeseer, Las Vegas, NV, IRSI, University of Nevada, Las Vegas.

Luck, M., and L. Padgham, eds. Agent oriented software engineering VIII: The 8th International Workshop on Agent Oriented Software Engineering, AOSE 2007, Honolulu, HI, May 14, 2007, Revised Selected Papers (LNCS 4951). Berlin, Germany: Springer Verlag.

Miller, G., R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1993. Introduction to WordNet: an on-line lexical database. Journal of Lexicography 3.

Mohammadian, M. 2004. Intelligent agents for data mining and information retrieval. Hershey: Idea Group Publishing.

Mosavi Miangah, T. 2008. Automatic term extraction for cross-language information retrieval using a bilingual parallel corpus. In Proceedings of the 6th International Conference on Informatics and Systems (INFOS2008), March 2008, Cairo, Egypt: IEEE.

Mosavi Miangah, T. 2009. Constructing a large-scale English-Persian parallel corpus. META 54.

Mosavi Miangah, T., and A. Nezarat. 2010. A novel method for cross-language retrieval of chunks using monolingual and bilingual corpora. In Proceedings of the International Conference on Advances in Information and Communication Technologies (ICT 2010), ACEEE - Association of Computer, Electronics and Electrical Engineers, Cochin, India: IEEE.

Sihem, A. Y., and M. Lalmas. XML search: languages, INEX and scoring. SIGMOD Record 35 (4). DOI: doi.acm.org/10.1145/ (accessed 4 Dec. 2009).


Designing and Implementing a Cross-Language Information Retrieval System Using Linguistic Corpora

Amin Nezarat*
MS in IT, Islamic Azad University, Yazd Branch

Tayebeh Mosavi Miangah
Associate Professor of Applied Linguistics, Payame Noor University, Yazd

Iranian Research Institute For Science and Technology
Indexed in LISA, SCOPUS & ISC
Special issue on Information Storage, Retrieval and Management (winter 2012)

Abstract: Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval (CLIR) is information retrieval in which the language of the query differs from that of the searched documents: the user presents queries in one language to retrieve documents in another. This paper constructs a bilingual lexicon of parallel English and Persian chunks from two very large monolingual corpora and an English-Persian parallel corpus; the lexicon can be applied directly to cross-language information retrieval tasks. For this purpose, a statistical measure known as Association Score (AS) was used to compute the association value between every two corresponding chunks in the corpus, using a couple of algorithms. Once the CLIR system had been developed using this bilingual lexicon, an experiment was performed on a set of one hundred English and Persian phrases and collocations to see to what extent the system helped users find the most relevant and suitable equivalents of their queries in either language.

Keywords: Cross-language information retrieval, linguistic corpora, automated translation, intelligent factors.

*Corresponding author: aminnezarat@gmail.com
mosavit@pnu.ac.ir
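The abstract describes scoring the association between pairs of corresponding chunks, and the body text appears to pair a χ² statistic with a critical value of 6.63, which matches the standard χ² threshold for one degree of freedom at p = 0.01. The exact AS formula is not recoverable from this text, so the following is only a sketch that uses the standard Pearson χ² on a 2×2 co-occurrence table as a stand-in. The function names, the four count parameters and the example counts are all assumptions made for illustration.

```python
CRITICAL_VALUE = 6.63  # chi-square threshold, 1 degree of freedom, p = 0.01


def chi_square_2x2(both: int, src_only: int, tgt_only: int, neither: int) -> float:
    """Pearson chi-square for a 2x2 table of aligned-segment counts.

    both     : segments containing the source chunk and the target chunk
    src_only : segments containing only the source chunk
    tgt_only : segments containing only the target chunk
    neither  : segments containing neither chunk
    (Hypothetical parameterisation; not taken from the paper.)
    """
    n = both + src_only + tgt_only + neither
    row_src, row_no_src = both + src_only, tgt_only + neither
    col_tgt, col_no_tgt = both + tgt_only, src_only + neither
    denom = row_src * row_no_src * col_tgt * col_no_tgt
    if denom == 0:
        # A zero marginal means the table carries no evidence either way.
        return 0.0
    return n * (both * neither - src_only * tgt_only) ** 2 / denom


def is_associated(both: int, src_only: int, tgt_only: int, neither: int) -> bool:
    """True when the chunks co-occur more often than chance at p = 0.01."""
    positive = both * neither > src_only * tgt_only
    return positive and chi_square_2x2(both, src_only, tgt_only, neither) > CRITICAL_VALUE


if __name__ == "__main__":
    # Made-up counts for one English/Persian chunk pair in 1000 aligned segments.
    print(chi_square_2x2(30, 10, 5, 955))
    print(is_associated(30, 10, 5, 955))
```

The direction check in `is_associated` is a design choice of this sketch: χ² is also large when two chunks avoid each other, and such pairs are not useful translation candidates.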
