The first steps of the computerization of the language sciences, Machine Translation and Corpus Linguistics. Historical perspectives
Histoire des Théories Linguistiques, CNRS, Université Paris Diderot
Jacqueline Léon is Research Director at the Laboratoire d’Histoire des Théories Linguistiques (HTL, UMR7597) of the Centre National de la Recherche Scientifique (CNRS) in Paris, and Chairman of the Société d’Histoire et d’Epistémologie des Sciences du Langage. Her main research topics are the transfer of scientific and technical knowledge and applications (the history of the automatization of language, the history of Corpus Linguistics, the institutionalization of applied linguistics, and the reception of statistical theories and Information Theory in the language sciences), empiricism in American and British linguistics (1940-1960), and the constitution of a collection of archives and documents on the history of Machine Translation and Natural Language Processing.
In this paper, I will assume that the computerization of the language sciences was carried out in two steps involving two main areas: Machine Translation and Corpus Linguistics.
Early experiments in Machine Translation began in 1949. Although it is one of the most difficult tasks of Natural Language Processing, Machine Translation was the first non-numerical application of computers. It was devised as a war technology, originating in the war sciences, which were characterized by the intertwining of engineering and fundamental research that prevailed during the Second World War. Cybernetics, Information Theory and the computational sciences emerged at that time as new sciences that typically illustrate this new connection. They were devised at the Massachusetts Institute of Technology, the very centre of the new scientific and technological configuration, where Machine Translation was intended to provide mass translations for the strategic purposes of the Cold War. Linguistics, however, did not belong to the war sciences, so the computerization of the language sciences took various forms according to the linguistic traditions of the four main protagonists of the Cold War (the USA, the USSR, Great Britain and France) and their respective anchorage in the first mathematization of language in the 1930s. Consequently, Machine Translation and, later, computational linguistics can be said to mark the second mathematization of language and the first turn of its computerization.
Thanks to the unprecedented technological development of computers, Corpus Linguistics emerged in the 1990s. Based on statistical methods and probabilities (rather than on mathematical logic, as Machine Translation was), it was developed within the British empiricist tradition. Unlike Machine Translation, which was designed outside the language sciences, Corpus Linguistics was developed in the wake of the Firthian conceptions of meaning, context and lexis. It can thus be considered the second turn of the computerization of the language sciences. The generalized use of large amounts of data, which are treated by similar methods whether they are digital or linguistic, has now reached natural language processing. This “data turn” has crucial consequences: linguistic data lose their specificity and are processed like any other data, and natural language processing applications no longer need linguistics. One may wonder whether this marks the emergence of a third step in the computerization of language.