{"id":102828,"date":"2018-03-11T10:25:11","date_gmt":"2018-03-11T10:25:11","guid":{"rendered":"https:\/\/www.deberes.net\/tesis\/sin-categoria\/identificacion-y-recuperacion-de-corpus-paralelos-en-la-world-wide-web\/"},"modified":"2018-03-11T10:25:11","modified_gmt":"2018-03-11T10:25:11","slug":"identificacion-y-recuperacion-de-corpus-paralelos-en-la-world-wide-web","status":"publish","type":"post","link":"https:\/\/www.deberes.net\/tesis\/cadiz\/identificacion-y-recuperacion-de-corpus-paralelos-en-la-world-wide-web\/","title":{"rendered":"Identificaci\u00f3n y recuperaci\u00f3n de corpus paralelos en la World Wide Web"},"content":{"rendered":"<h2>Doctoral thesis by <strong>Mar\u00eda Elo\u00edsa Yr\u00e1yzoz D\u00edaz De Lia\u00f1o<\/strong><\/h2>\n<p>This doctoral thesis presents work on the identification and retrieval of parallel corpora on the web. Parallel corpora are essential working tools in many fields of research. The thesis follows two lines of work: the first covers the selection of the features that allow parallel texts to be identified, while the second develops a tool for retrieving those parallel texts from the web. First, a large document collection was built from European Parliament sources, made up of documents written in five different languages. This collection was used to extract the document features that allow parallel texts to be identified. The features were obtained by working on two distinct parameters. The first parameter consisted of obtaining statistical features of the documents. 
These statistical studies were carried out on the following variables: number of characters in the title of each document, number of words in the title of each document, and document size measured in KB. The second parameter consisted of studying data on the syntax of the documents. The second line of work develops a tool for retrieving parallel corpora. The prototype is a web crawler implemented in Java that incorporates the parameters obtained in the identification of parallel texts. This tool rejects documents that are false candidates for parallel texts and selects only the likely parallel texts.<\/p>\n<p>&nbsp;<\/p>\n<h3>Academic details of the doctoral thesis \u00ab<strong>Identificaci\u00f3n y recuperaci\u00f3n de corpus paralelos en la World Wide Web<\/strong>\u00bb<\/h3>\n<ul>\n<li><strong>Thesis title:<\/strong>\u00a0 Identificaci\u00f3n y recuperaci\u00f3n de corpus paralelos en la World Wide Web (Identification and retrieval of parallel corpora on the World Wide Web)<\/li>\n<li><strong>Author:<\/strong>\u00a0 Mar\u00eda Elo\u00edsa Yr\u00e1yzoz D\u00edaz De Lia\u00f1o<\/li>\n<li><strong>University:<\/strong>\u00a0 C\u00e1diz<\/li>\n<li><strong>Thesis defense date:<\/strong>\u00a0 13\/07\/2010<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3>Supervision and examination committee<\/h3>\n<ul>\n<li><strong>Thesis supervisor<\/strong>\n<ul>\n<li>Antonio Jorge Tomeu Hardasmal<\/li>\n<\/ul>\n<\/li>\n<li><strong>Examination committee<\/strong>\n<ul>\n<li>Committee chair: Buenaventura Clares Rodr\u00edguez<\/li>\n<li>Jorge Rami\u00f3 Aguirre (member)<\/li>\n<li>David Almorza Gomar (member)<\/li>\n<li>Francisco Manuel Solis Cabrera (member)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Doctoral thesis by Mar\u00eda Elo\u00edsa Yr\u00e1yzoz 
D\u00edaz De Lia\u00f1o This doctoral thesis presents work on the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"footnotes":""},"categories":[1506,5526,2528],"tags":[55237,35611,85147,73138,115928,208445],"class_list":["post-102828","post","type-post","status-publish","format-standard","hentry","category-cadiz","category-documentacion","category-inteligencia-artificial","tag-antonio-jorge-tomeu-hardasmal","tag-buenaventura-clares-rodriguez","tag-david-almorza-gomar","tag-francisco-manuel-solis-cabrera","tag-jorge-ramio-aguirre","tag-maria-eloisa-yrayzoz-diaz-de-liano"],"_links":{"self":[{"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/posts\/102828","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embed
dable":true,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/comments?post=102828"}],"version-history":[{"count":0,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/posts\/102828\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/media?parent=102828"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/categories?post=102828"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/tags?post=102828"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}