Abstract: The World Wide Web is the largest information space ever seen: it is distributed all over the world, written in many languages, and covers a wide variety of topics. In the first part of this paper, we study the evolution of a French subset of this space over the last three years. During this time, the amount of text automatically extracted for language modelling grew by a factor of 6.5. Moreover, the French coverage has grown from 140,000 to 200,000 lexical forms. We thus show that we can obtain more and more reliable data...
Context of citations to this paper:
...extraction on a unique field. On the other hand, we have shown that the Web is a very interesting source for spoken language modeling [VAU01]. Training such a language model needs a great diversity of words, and the Web is very useful for this task compared to other corpora....
Cited by:
An Annotated Bibliography on Temporal and Evolution Aspects in.. - Grandi (2003)
Web as Huge Information Source for Noun Phrases.. - Géry, Haddad, Vaufreydaz
Active bibliography (related documents):
0.6: A New Methodology For Speech Corpora Definition.. - Vaufreydaz.. (2000)
0.4: From generic to task-oriented speech recognition : .. - Vaufreydaz.. (2001)
0.2: A Stateful Intrusion Detection System for World-Wide.. - Vigna, Robertson.. (2003)
Similar documents based on text:
0.4: Internet Documents: A Rich Source for Spoken Language.. - Vaufreydaz, Akbar.. (1999)
0.4: A Network Architecture for Building Applications That.. - Vaufreydaz.. (1999)
0.3: Unknown - Newmethod Bh For
Related documents from co-citation:
4: Detecting and Representing Relevant Web Deltas using Web Join - Bhowmick, Madria et al. - 2000
4: Multi-modal Presentation of Changes in Web Repositories - Saeyor, Ishizuka - 1999
4: Extending temporal database concepts to the World Wide Web - Grandi, Scalas - 1997
BibTeX entry:
D. Vaufreydaz and M. Géry, Internet evolution and progress in full automatic French language modelling, ASRU, Madonna di Campiglio, Italy, 2001. http://citeseer.ist.psu.edu/vaufreydaz01internet.html
@misc{ vaufreydaz01internet,
author = "D. Vaufreydaz and M. Géry",
title = "Internet evolution and progress in full automatic French language modelling",
text = "D. Vaufreydaz and M. Géry, Internet evolution and progress in full automatic
French language modelling, ASRU, Madonna di Campiglio, Italy, 2001.",
year = "2001",
url = "citeseer.ist.psu.edu/vaufreydaz01internet.html" }
Citations (may not include all citations):
20: Cross-Language Information Retrieval Based on Parallel Texts.. - Nie, Simard et al. - 1999
7: Internet Documents: A Rich Source for Spoken Language Modell.. - Vaufreydaz, Akbar et al.
7: Parole et traduction automatique : le module de reconnaissan.. - Akbar, Caelen - 1998
4: A Method for Web Robots Control - Koster - 1996
3: BDLEX lexical data and knowledge base of spoken and written .. - Pérennou, De Calmès - 1987
1: Organisation de la première campagne Aupelf pour l'évaluation .. - Dolmazon, Bimbot et al. - 1997
http://abu.cnam.fr/
http://www.limsi.fr/TLP/grace/index.html
Documents on the same site (http://www-geod.imag.fr/vaufreyd/Publications.asp):
A Network Architecture for Building Applications That.. - Vaufreydaz.. (1999)
Internet Documents: A Rich Source for Spoken Language.. - Vaufreydaz, Akbar.. (1999)
A New Methodology For Speech Corpora Definition.. - Vaufreydaz.. (2000)