Abstract: The World Wide Web is the largest information space yet seen, distributed all over the world, in many languages and on a wide variety of topics. In the first part of this paper, we study the evolution of a French subset of this space over the last 3 years. During this time, the amount of automatically extracted text for language modelling was multiplied by 6.5. Moreover, the French coverage has grown from 140,000 to 200,000 lexical forms. We thus show that we can get more and more reliable data...
Similar documents based on text:
0.6: A New Methodology For Speech Corpora Definition.. - Vaufreydaz.. (2000)
0.5: Web as Huge Information Source for Noun Phrases.. - Géry, Haddad, Vaufreydaz
0.4: Internet Documents: A Rich Source for Spoken Language.. - Vaufreydaz, Akbar.. (1999)
Related documents from co-citation:
4: Detecting and Representing Relevant Web Deltas using Web Join - Bhowmick, Madria et al. - 2000
4: Multi-modal Presentation of Changes in Web Repositories - Saeyor, Ishizuka - 1999
4: Extending temporal database concepts to the World Wide Web - Grandi, Scalas - 1997
BibTeX entry:
D. Vaufreydaz and M. Géry, Internet evolution and progress in full automatic French language modelling, ASRU, Madonna di Campiglio, Italy, 2001. http://citeseer.ist.psu.edu/vaufreydaz01internet.html
@misc{ vaufreydaz01internet,
author = "D. Vaufreydaz and M. G{\'e}ry",
title = "Internet evolution and progress in full automatic French language modelling",
text = "D. Vaufreydaz and M. G{\'e}ry, Internet evolution and progress in full automatic
French language modelling, ASRU, Madonna di Campiglio, Italy, 2001.",
year = "2001",
url = "http://citeseer.ist.psu.edu/vaufreydaz01internet.html" }
Citations (may not include all citations):
20: Cross-Language Information Retrieval Based on Parallel Texts.. - Nie, Simard et al. - 1999
7: Internet Documents: A Rich Source for Spoken Language Modell.. - Vaufreydaz, Akbar et al.
7: Parole et traduction automatique : le module de reconnaissan.. - Akbar, Caelen - 1998
4: A Method for Web Robots Control - Koster - 1996
3: BDLEX lexical data and knowledge base of spoken and written .. - Pérennou, De Calmès - 1987
1: Organisation de la première campagne Aupelf pour l'évaluation .. - Dolmazon, Bimbot et al. - 1997
http://abu.cnam.fr/
http://www.limsi.fr/TLP/grace/index.html
Documents on the same site (http://www-geod.imag.fr/vaufreyd/Publications.asp):
A Network Architecture for Building Applications That.. - Vaufreydaz.. (1999)
Internet Documents: A Rich Source for Spoken Language.. - Vaufreydaz, Akbar.. (1999)
A New Methodology For Speech Corpora Definition.. - Vaufreydaz.. (2000)