Pérennou G., De Calmès M., BDLEX lexical data and knowledge base of spoken and written French, European Conference on Speech Technology, pp. 393-396, Edinburgh (Scotland), September 1987.
....Figure 1: French coverage of the three corpora. Figure 1 shows the number of lexical forms found in each of the three corpora. These forms are obtained by computing word frequencies on each corpus. The maximal list of French lexical words is built from two lexicons, BDLex [4] and the ABU dictionaries [5], and contains more than 400,000 forms. Potentially, WebFr contains contextual information, as used in n-gram models, on more than twice as many words as Grace. WebFr4 is more varied than the others, with slightly fewer than 200,000 lexical forms. In ....
....speaking domains except Switzerland (too small a percentage of French documents) and Canada (too far from us on the network). In order to check for French words in WebFr4, we need an exhaustive list of them. The maximal list of French lexical words is built from two French lexicons, BDLex [3] and the ABU dictionaries, and contains more than 400,000 forms. With this list, we counted more than 3 billion words in the text extracted from WebFr4. This gave us the French coverage of the corpus: slightly fewer than 200,000 different lexical forms. We sorted them according to ....
....in the experiments. This database is rather easy for our task, since the sentences were pronounced carefully by speakers accustomed to speaking to a speech recognition system. Nespole: This second test database consists of 77 sentences (extracted from transcriptions of the NESPOLE dialog database [3]) recorded on a client terminal and transmitted through the network with the NESPOLE hardware architecture. VoIP G711 speech was collected at the distant site connected to the client terminal. We thus had 77 test speech signals transmitted through the network with G711 coding (will be referred ....
Pérennou G., De Calmès M., "BDLEX lexical data and knowledge base of spoken and written French", European Conference on Speech Technology, pp. 393-396, Edinburgh (Scotland), September 1987.
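The coverage count described in the WebFr and WebFr4 excerpts above (compute word frequencies on a corpus, keep only the forms present in the lexicon, then count and sort the distinct forms) can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `lexical_coverage`, the tokenizing regular expression, and the toy lexicon are all assumptions, and the real lexicon would be the merged BDLex and ABU list of more than 400,000 forms.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, optionally joined by hyphens.
# The excerpts do not say how the corpora were actually tokenized.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def lexical_coverage(text, lexicon):
    """Count how often each lexicon form occurs in the text.

    Returns a Counter holding only tokens that belong to the lexicon;
    its length is the number of distinct lexical forms covered.
    """
    tokens = WORD.findall(text.lower())
    return Counter(token for token in tokens if token in lexicon)


# Toy stand-in for the merged lexicon (assumption, for illustration only).
lexicon = {"le", "chat", "dort", "chien", "mange"}
counts = lexical_coverage("Le chat dort. Le chien dort aussi.", lexicon)

distinct_forms = len(counts)    # number of lexical forms found in the corpus
ranked = counts.most_common()   # forms sorted by decreasing frequency
```

Comparing `distinct_forms` across corpora gives the kind of figure the excerpts report, for example slightly fewer than 200,000 forms for WebFr4.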
....Figure 1: filtering web text into French sentences. The first filter takes Internet documents and produces text with inserted document separators (1). Next, we used two lexicons to select sentences: BDLex (Pérennou, 1987), a dictionary with 245,000 entries, enlarged with ABU (Universal Bibliophiles Association) to about 400,000 lexical forms. The second filter produces all the sentences made exclusively of words from this vocabulary (2). It also transcribes numbers in context (date, money, etc. to textual ....
Pérennou G., De Calmès M. (1987) "BDLEX lexical data and knowledge base of spoken and written French", European Conference on Speech Technology, pp. 393-396, Edinburgh (Scotland), September 1987.
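The second filter in the excerpt above keeps only sentences made exclusively of vocabulary words. A minimal sketch of that idea follows, under stated assumptions: the helper names `split_sentences` and `keep_in_vocabulary`, the punctuation-based sentence splitter, and the toy vocabulary are hypothetical, and the number-to-text transcription mentioned in the excerpt is not implemented here.

```python
import re

# Hypothetical tokenizer: letter runs, optionally hyphenated. Digits are
# ignored, so this sketch does not reproduce the number transcription step.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def keep_in_vocabulary(sentences, vocabulary):
    """Keep sentences whose every word belongs to the vocabulary."""
    kept = []
    for sentence in sentences:
        words = WORD.findall(sentence.lower())
        # Reject sentences with no words or with any unknown word.
        if words and all(word in vocabulary for word in words):
            kept.append(sentence)
    return kept


# Toy stand-in for the roughly 400,000-form vocabulary (assumption).
vocabulary = {"le", "chat", "dort", "chien", "mange"}
text = "Le chat dort. Le zorglub dort. Le chien mange!"
filtered = keep_in_vocabulary(split_sentences(text), vocabulary)
```

Rejecting a whole sentence on a single unknown word is strict, but it matches the excerpt's wording that retained sentences are made exclusively of vocabulary words.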