Division of the sentence into phrases

Ministry of Education of the Republic of Belarus
Educational Institution
"F. Skorina Gomel State University"
Faculty of Philology
Course paper
Division of the sentence into phrases
Author:
Student of group K-42
Lapitskaya T.E.
Gomel 2005

Contents

Introduction
Presentation
Algorithm for division of the sentence into phrases
Lists used by Algorithm No 2
Some examples of the performance of Algorithm No 2
Conclusion
References

Introduction

For many purposes in Text Processing and Machine Translation, there is often a need to divide the sentence into smaller units that can be processed more easily than the whole sentence, especially when the sentence is a long one. To that end we have devised an efficient algorithm based on the assumptions presented in the next section.

Presentation

When we say that we are going to divide the sentence into phrases, we must first state how we define the phrase: where it starts and where it ends. For the purposes of the present algorithm (and not for any other, especially theoretical, purposes) the phrase is delimited on its left and on its right by Punctuation Marks and Auxiliary words. The phrase usually starts with an Auxiliary word and ends with the appearance of a Punctuation Mark or an Auxiliary word.
The Auxiliary words, marking the boundaries of the phrases, are presented in tables (Lists). Each table lists Auxiliary words of a particular type. It was observed that some Auxiliary words (as well as some sequences of consecutively used Auxiliary words) usually start longer and more independent phrases than others. For example, in a sentence like
It is often difficult to seek solutions through the curtailment of consumption.
the Auxiliary word through followed by the Article the (another Auxiliary word) starts a phrase that ends with the appearance of a Punctuation Mark, while the Auxiliary word of starts a sub-phrase which is part of a longer phrase. In our algorithm (see Algorithm No 2 in Section 3) this subdivision of the sentence into longer phrases, and of the longer phrases into smaller constituent phrases, is expressed by leaving different lengths of space between one phrase and another. The longer the space left before the phrase, the more self-sufficient and independent the phrase is thought to be. In this study we have established five types of phrases, depending on their relative independence within the sentence. This independence is expressed by a particular Auxiliary word (or words) or by a Punctuation Mark. The longest, most self-sufficient and relatively independent phrase starts and ends with a Punctuation Mark. The second most independent phrase starts with a word from List No 1 and ends with a Punctuation Mark or with the appearance of another Auxiliary word from List No 1. For example:
(6 spaces left) One US government study estimated
(5 spaces left) that there are 68 large manufacturing complexes
(4 spaces left) in the region
(5 spaces left) that have significant idle capacity, (end)
The full stop at the start of the sentence is equivalent to six spaces. In other words, a smaller space following a larger space to its left means that the phrase starting after the smaller space is dependent on, and a constituent of, the larger phrase. The smaller space in the example above (4 spaces) shows that the phrase following it is dependent on the previous phrase, that there are 68 large manufacturing complexes, and explains it (or brings additional information about it, here location), while the five spaces left after region signify that the next phrase is dependent on the previous large phrase (the one that has a longer space left in front), in this case One US government study estimated that there are 68 large manufacturing complexes.
The space left between the phrases depends on the actual Preposition (or Punctuation Mark) used, or on the sequence of Punctuation Marks and/or Auxiliary words, as specified (for more details see the instructions for Algorithm No 2 below).
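
The relation between space length and dependence can be illustrated with a short Python sketch. It is only one possible reading of the rule described above: the function names render and parents are hypothetical, and the assumption is that each phrase depends on the nearest earlier phrase that has a strictly longer space in front of it.

```python
from typing import List, Optional, Tuple

# (number of spaces left before the phrase, phrase text)
Phrase = Tuple[int, str]


def render(phrases: List[Phrase]) -> str:
    """Write the phrases on one line, leaving the stated space before each."""
    return "".join(" " * gap + text for gap, text in phrases)


def parents(phrases: List[Phrase]) -> List[Optional[int]]:
    """For each phrase, find the nearest earlier phrase with a strictly
    longer space in front of it (assumed reading of the dependence rule)."""
    result: List[Optional[int]] = []
    for i, (gap, _) in enumerate(phrases):
        parent: Optional[int] = None
        for j in range(i - 1, -1, -1):
            if phrases[j][0] > gap:
                parent = j
                break
        result.append(parent)
    return result


# The example sentence from the text, with the space lengths given there.
example: List[Phrase] = [
    (6, "One US government study estimated"),
    (5, "that there are 68 large manufacturing complexes"),
    (4, "in the region"),
    (5, "that have significant idle capacity,"),
]

print(render(example))
print(parents(example))  # [None, 0, 1, 0]
```

Under this reading, in the region attaches to the phrase about the manufacturing complexes, while the last phrase skips back over it to the phrase with six spaces in front.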

Algorithm for division of the sentence into phrases

(i) Input text.
(ii) Comparing of each word entry with the Auxiliary words or Punctuation Marks (presented in Lists) and identifying the Auxiliary words or Punctuation Marks.
(iii) Searching left or right (up to two words) for other Auxiliary words or Punctuation Marks.
(iv) Output result: a phrase.
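
The general flow (input text, comparison of each word with the Lists, a check of neighbouring words, output of a phrase) can be sketched in Python as follows. This is a simplified illustration and not the 27-instruction algorithm itself: the names tokenize and divide are hypothetical, AUXILIARY holds only a small assumed subset of the Lists, and any run of consecutive Auxiliary words is merged instead of limiting the search to two words.

```python
import re
from typing import List

# Small illustrative subset of the Auxiliary words (assumed for this sketch).
AUXILIARY = {"through", "the", "of", "a", "an", "in", "that", "which", "or", "and"}
PUNCTUATION = {".", ",", ";", ":", "?", "!"}


def tokenize(text: str) -> List[str]:
    """Split the input text into words and single Punctuation Marks."""
    return re.findall(r"[A-Za-z0-9'-]+|[.,;:?!]", text)


def divide(text: str) -> List[str]:
    """Cut before each run of Auxiliary words and after each Punctuation Mark."""
    phrases: List[str] = []
    current: List[str] = []
    previous_was_auxiliary = False
    for token in tokenize(text):
        if token in PUNCTUATION:
            # A Punctuation Mark closes the phrase collected so far.
            if current:
                current[-1] += token
                phrases.append(" ".join(current))
                current = []
            previous_was_auxiliary = False
            continue
        is_auxiliary = token.lower() in AUXILIARY
        # A new phrase starts at an Auxiliary word, unless that word merely
        # continues a sequence of Auxiliary words (e.g. "through the").
        if is_auxiliary and not previous_was_auxiliary and current:
            phrases.append(" ".join(current))
            current = []
        current.append(token)
        previous_was_auxiliary = is_auxiliary
    if current:
        phrases.append(" ".join(current))
    return phrases


sentence = "It is often difficult to seek solutions through the curtailment of consumption."
print(divide(sentence))
# ['It is often difficult to seek solutions', 'through the curtailment', 'of consumption.']
```

The word to is left out of the subset on purpose, since List No 4 registers it only as a Preposition and a plain word lookup cannot tell the Preposition from the infinitive marker.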
Note: The algorithm (27 digital instructions in all) is available for free download on the Internet (see Internet Downloads at the end of the book).

Lists used by Algorithm No 2

NB The words not registered in the Lists are recorded as they follow, in the same sequence, after those registered in the Lists.
(i) List No 1: besides, therefore, however, whereas, thus, hence, though, despite, with, nevertheless, throughout, through, during, that, only, but, if, otherwise, again, which, although, thereby, already, against, unless, thereafter etc.
(ii) List No 2: over, as, what, toward(s), for, into, about, by, so, from, at, above, under, beside, below, onto, since, behind, in front of, beyond, around, before, after, then, altogether, among(st), between, beneath etc.
(iii) List No 3: both, neither, none etc.
(iv) List No 4: of, to (as Preposition)
(v) List No 5: the, a, an
(vi) List No 6: so much as, so far as, so far, as long as, as soon as, so long as, in order that, in order to, lest, as well as, and, or, nor etc.
(vii) List No 7: such, than, onto, until, all, near, even, when, while, within, last, next, also, less, more, most, whether, much, once, one, any, many, some, where, another, other, each, then, whose, who, whoever, till, until, what, across, whence, according, due to, owing, whereby, prior, wherever, whenever, already, moreover, likewise, however etc.
(viii) List No 8: out, in, on, down etc.

Some examples of the performance of Algorithm No 2

Below we present a text divided into phrases according to the instructions for the algorithm:
(i) Many countries also have established or have under construction a free zone, where exporters have access to shipping facilities, a pool of labour and freedom from exchange controls.
(ii) The Caribbean Basin Initiative, a US package of aid and trade incentives to encourage manufacturing, has given an added boost to industrial development in this region.
The analysis of the sentence starts with checking the contents of the memory and printing any information stored up to this moment (this is done at the start of each new sentence), and also with ascertaining whether the sentence has ended or not and recording the analysed word in the memory if it is not recorded yet (a procedure carried out after each word). Then the algorithm reads the next word (in No 4a), which in the case of (i) above is many, and proceeds to analyse it in 5. Since it is not a full stop or any other Punctuation Mark (5, 7), nor a word specified in 9, 11, 13, 15, 17 or 19, the analysis yields no result until the program gets to operation No 21, where the word many is located in List No 7. Here the program, through operation No 22, checks whether many is followed by yet another word from the Lists. Operation 22ab certifies that it is not, and instructs the program to cut the sentence at this point and to leave three spaces (before many) when recording it, then to return to operation No 2 to start the analysis of the next word. The next word, countries, cannot be identified (it is not registered in the Lists), therefore operation 27 instructs the program to record it in the memory as the next consecutive word of the phrase and to return to 2 to continue the analysis of the sentence.
The word also follows next. The program cannot locate the word and proceeds further, after registering it. The next words, have and established, are dealt with in a similar way. Next comes the Conjunction or. The program locates the word in operation No 17, then it checks whether other words from the Lists follow (18). A single space is left before recording it (No 18b). The word have is registered next and the program reaches under (15) to draw a dividing line by leaving four spaces (16ab), and this carries on till the end of the text.
These procedures can be applied to any English language text. The actual users of the algorithm can improve it by adding new words to the Lists or by changing the dividing lines to suit other strategies and other interpretations of the boundaries of the English phrase.
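
The first steps of this walk-through can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not the numbered operations of Algorithm No 2: the name divide_with_gaps is hypothetical, only a few words per List are included, the space lengths are those mentioned in the text (five for List No 1, four for under, one for or, three for many), and the case of several consecutive words from the Lists is not handled. Starting a fresh phrase with six spaces after every Punctuation Mark is likewise an assumption.

```python
import re
from typing import Dict, List, Tuple

# Space lengths per List as they appear in the examples of the text.
GAP_BY_LIST: Dict[int, int] = {1: 5, 2: 4, 6: 1, 7: 3}
# Only a handful of words per List (assumed subset for this sketch).
WORDS_BY_LIST: Dict[int, List[str]] = {
    1: ["that", "which", "through"],
    2: ["under", "from", "for"],
    6: ["or", "and"],
    7: ["many", "where"],
}
GAP_BY_WORD: Dict[str, int] = {
    word: GAP_BY_LIST[number]
    for number, words in WORDS_BY_LIST.items()
    for word in words
}
BOUNDARY_GAP = 6  # the text equates a full stop with six spaces
PUNCTUATION = {".", ",", ";", ":", "?", "!"}


def divide_with_gaps(sentence: str) -> List[Tuple[int, str]]:
    """Return (space length, phrase) pairs for one sentence."""
    phrases: List[Tuple[int, List[str]]] = []
    start_new = True  # True at the sentence start and after a Punctuation Mark
    for token in re.findall(r"[A-Za-z0-9'-]+|[.,;:?!]", sentence):
        if token in PUNCTUATION:
            if phrases:
                phrases[-1][1][-1] += token  # attach the mark to the last word
            start_new = True
            continue
        gap = GAP_BY_WORD.get(token.lower())
        if start_new:
            phrases.append((gap if gap is not None else BOUNDARY_GAP, [token]))
        elif gap is not None:
            phrases.append((gap, [token]))  # word found in a List: cut here
        else:
            phrases[-1][1].append(token)  # unlisted word: continue the phrase
        start_new = False
    return [(gap, " ".join(words)) for gap, words in phrases]


clause = "Many countries also have established or have under construction a free zone,"
for gap, phrase in divide_with_gaps(clause):
    print(" " * gap + phrase)
```

For this clause the sketch yields three phrases: Many countries also have established (three spaces), or have (one space) and under construction a free zone, (four spaces), which agrees with the cuts described above.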

Conclusion
Algorithm No 2 was developed with the special purpose of aiding the overall automatic analysis of the sentence. The division of the sentence into smaller units helps us to understand its meaning better, though the division, as presented in this section, is based not on meaning but on formal features. The reader will find a somewhat different and much more accurate interpretation of the existing boundaries within a sentence in Part 2.
In the course of this study it was observed that each foregoing phrase finds further interpretation of its meaning in the next phrase. In other words, the first phrase of a sentence carries a certain meaning, which with each successive phrase becomes more and more clear and complete: the next phrase simply adds more information to the meaning of the previous phrase. The phrases have varied mutual interdependence, which we tried to express with a margin left between them. We will express this graphically in Figure 2.2, which considers two sentences.
The brackets show the dependence of each succeeding phrase both on the previous one and on all preceding ones. In the second sentence, the phrases are separated with equal space left between them. In those cases where the space left is smaller, this means that the tie with the previous phrase is stronger (i.e. the next phrase is an integral part of the preceding one). A sudden surge of the interval signals the division between two phrases, as in the example in Figure 2.3. In this example, the second large phrase (Clause) explains the meaning of the first. This is indicated with the interval left and with the brackets.

References
1. Brill, E. and Mooney, R.J. (1997), 'An overview of empirical natural language processing', in AI Magazine, 18 (4): 13-24.
2. Chomsky, N. (1957), Syntactic Structures. The Hague: Mouton.
3. Curme, G.O. (1955), English Grammar. New York: Barnes and Noble.
4. Dowty, D.R., Karttunen, L. and Zwicky, A.M. (eds) (1985), Natural Language Parsing. Cambridge: Cambridge University Press.
5. Garside, R. (1986), 'The CLAWS word-tagging system', in R. Garside, G. Leech and G. Sampson (eds) The Computational Analysis of English. Harlow: Longman.
6. Gazdar, G. and Mellish, C. (1989), Natural Language Processing in POP-11. Reading, UK: Addison-Wesley.
7. Georgiev, H. (1976), 'Automatic recognition of verbal and nominal word groups in Bulgarian texts', in t.a. information, Revue International du traitement automatique du langage, 2, 17-24.
8. Georgiev, H. (1991), 'English Algorithmic Grammar', in Applied Computer Translation, Vol. 1, No. 3, 29-48.
9. Georgiev, H. (1993a), 'Syntparse, software program for parsing of English texts', demonstration at the Joint Inter-Agency Meeting on Computer-assisted Terminology and Translation, The United Nations, Geneva.
10. Georgiev, H. (1993b), 'Syntcheck, a computer software program for orthographical and grammatical spell-checking of English texts', demonstration at the Joint Inter-Agency Meeting on Computer-assisted Terminology and Translation, The United Nations, Geneva.
11. Georgiev, H. (1994-2001), Softhesaurus, English Electronic Lexicon, produced and marketed by LANGSOFT, Sprachlernmittel, Switzerland; platform: DOS/Windows.
12. Georgiev, H. (1996-2001a), Syntcheck, a computer software program for orthographical and grammatical spell-checking of German texts, produced and marketed by LANGSOFT, Sprachlernmittel, Switzerland; platform: DOS/Windows.
13. Georgiev, H. (1996-2001b), Syntparse, software program for parsing of German texts, produced and marketed by LANGSOFT, Sprachlernmittel, Switzerland; platform: DOS/Windows.
14. Georgiev, H. (1997-2001a), Syntcheck, a computer software program for orthographical and grammatical spell-checking of French texts, produced and marketed by LANGSOFT, Sprachlernmittel, Switzerland; platform: DOS/Windows.
15. Georgiev, H. (1997-2001b), Syntparse, software program for parsing of French texts, produced and marketed by LANGSOFT, Sprachlernmittel, Switzerland; platform: DOS/Windows.
16. Georgiev, H. (2000-2001), Syntcheck, a computer software program for orthographical and grammatical spell-checking of Italian texts, produced and marketed by LANGSOFT, Sprachlernmittel, Switzerland; platform: DOS/Windows.
17. Giorgi, A. and Longobardi, G. (1991), The Syntax of Noun Phrases: Configuration, Parameters and Empty Categories. Cambridge: Cambridge University Press.
18. Graver, B.D. (1971), Advanced English Practice. Oxford: Oxford University Press.
19. Grishman, R. (1986), Computational Linguistics. Cambridge: Cambridge University Press.
20. Harris, Z.S. (1982), A Grammar of English on Mathematical Principles. New York: Wiley.
21. Hausser, R. (1989), Computation of Language. Berlin: Springer.
22. Hornby, A.S. (1958), A Guide to Patterns and Usage in English. London: Oxford University Press.
23. Kavi, M. and Nirenburg, S. (1997), 'Knowledge-based systems for natural language', in A.B. Tucker (ed.) The Computer Science and Engineering Handbook. Boca Raton, FL: CRC Press, Inc., 637-53.
24. Koverin, A.A. (1972), 'Grammatical analysis, on a computer, of French scientific and technical texts' (in Russian), PhD thesis, Leningrad University, Russia.
25. Leech, G. and Svartvik, J. (1975), A Communicative Grammar of English. London: Longman.
26. Manning, C. and Schütze, H. (1999), Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
27. Marcus, M.P. (1980), A Theory of Syntactic Recognition for Natural Language. Cambridge, MA: MIT Press.
28. McEnery, T. (1992), Computational Linguistics. Wilmslow, UK: Sigma Press.
29. Mihailova, I.V. (1973), 'Automatic recognition of the nominal group in Spanish texts' (in Russian), in R.G. Piotrovskij (ed.) Injenernaja Linguistika. St Petersburg: Polytechnical Institute, 148-75.
30. Primov, U.V. and Sorokina, V.A. (1970), 'Algorithm for automatic recognition of the nominal group in English technical texts' (in Russian), in R.G. Piotrovskij (ed.) Statistika Teksta, II. Minsk: Polytechnical Institute.
31. Pullum, G.K. (1984), 'On two recent attempts to show that English is not a CFL', Computational Linguistics, 10 (3-4), 182-6.
32. Quirk, R. and Greenbaum, S. (1983), A University Grammar of English. London: Longman.
33. Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. (1972), Grammar of Contemporary English. London: Longman.
34. Reichman, R. (1985), Getting Computers to Talk like You and Me. Cambridge, MA: MIT Press.
35. Sestier, A. and Dupuis, L. (1962), 'La place de la syntaxe dans la traduction automatique des langues. Esquisse d'un nouveau système de description grammaticale et de son utilisation pour la reconstruction des structures grammaticales', Ingénieurs et Techniciens, No. 1555, 43-50.
36. Schank, R. and Fano, A. (1992), 'Knowledge, memory, learning and teaching. A survey of our research', in t.a.l., Traitement Automatique des Langues, Vol. 33, No. 1-2.
37. Shanks, D. (1993), 'Breaking Chomsky's rules', New Scientist, February, 26-30.
38. Shieber, S.M. (1985), 'Evidence against the non-context-freeness of natural language', in Linguistics and Philosophy, 8, 333-43.
39. Stannard, A. (1974), Living English Structure. London: Longman.
40. Urdang, L. (ed.) (1968), The Random House Dictionary of the English Language (College Edition). New York: Random House.
