{"id":66340,"date":"2018-03-09T22:54:39","date_gmt":"2018-03-09T22:54:39","guid":{"rendered":"https:\/\/www.deberes.net\/tesis\/sin-categoria\/on-clustering-and-evaluation-of-narrow-domain-short-text-corpora\/"},"modified":"2018-03-09T22:54:39","modified_gmt":"2018-03-09T22:54:39","slug":"on-clustering-and-evaluation-of-narrow-domain-short-text-corpora","status":"publish","type":"post","link":"https:\/\/www.deberes.net\/tesis\/ciencia-de-los-ordenadores\/on-clustering-and-evaluation-of-narrow-domain-short-text-corpora\/","title":{"rendered":"On clustering and evaluation of narrow domain short-text corpora"},"content":{"rendered":"<h2>Doctoral thesis by <strong> David Eduardo Pinto Avenda\u00f1o <\/strong><\/h2>\n<p>This doctoral thesis investigates the problem of clustering a special kind of document collection: narrow-domain short texts. To carry out this task, several corpora and clustering methods were analyzed. In addition, corpus evaluation measures, term selection techniques, and cluster validity measures were introduced in order to study the following problems: (1) determining the relative difficulty of clustering a corpus and studying some of its characteristics, such as text length, domain broadness, stylometry, class imbalance, and structure; and (2) contributing to the state of the art in clustering corpora composed of narrow-domain short texts. The research is partially focused on \u00abshort-text clustering\u00bb. This topic is considered relevant given the current and future way in which people tend to use a \u00abreduced language\u00bb made up of short texts (for example, blogs, snippets, news, and text messages such as e-mail and chat). 
Additionally, the domain broadness of corpora is studied. In this sense, a corpus can be considered narrow or wide if its degree of vocabulary overlap is high or low, respectively. In the categorization task, it is quite difficult to deal with narrow-domain corpora such as scientific articles, technical reports, patents, etc. The main goal of this work is to study possible strategies for dealing with the following two problems: a) the low frequencies of vocabulary terms in short texts, and b) the high vocabulary overlap associated with narrow domains. Although each of these problems is challenging enough on its own, when dealing with narrow-domain short texts the complexity of the problem incr<\/p>\n<p>&nbsp;<\/p>\n<h3>Academic details of the doctoral thesis \u00ab<strong>On clustering and evaluation of narrow domain short-text corpora<\/strong>\u00bb<\/h3>\n<ul>\n<li><strong>Thesis title:<\/strong>\u00a0 On clustering and evaluation of narrow domain short-text corpora <\/li>\n<li><strong>Author:<\/strong>\u00a0 David Eduardo Pinto Avenda\u00f1o <\/li>\n<li><strong>University:<\/strong>\u00a0 Polit\u00e9cnica de Valencia<\/li>\n<li><strong>Thesis defense date:<\/strong>\u00a0 15\/07\/2008<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3>Supervision and committee<\/h3>\n<ul>\n<li><strong>Thesis supervisor<\/strong>\n<ul>\n<li>Paolo Rosso<\/li>\n<\/ul>\n<\/li>\n<li><strong>Committee<\/strong>\n<ul>\n<li>Committee chair: Manuel Palomar Sanz <\/li>\n<li>Eneko Agirre Bengoa (member)<\/li>\n<li>Benno Mar\u00eda Stein (member)<\/li>\n<li>Luis Alfonso Ure\u00f1a L\u00f3pez (member)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Doctoral thesis by David Eduardo Pinto Avenda\u00f1o This doctoral thesis investigates the problem of clustering 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-gradient":""}},"footnotes":""},"categories":[1890,2528,16820],"tags":[146187,146186,55068,146188,37679,86707],"class_list":["post-66340","post","type-post","status-publish","format-standard","hentry","category-ciencia-de-los-ordenadores","category-inteligencia-artificial","category-politecnica-de-valencia","tag-benno-maria-stein","tag-david-eduardo-pinto-avendano","tag-eneko-agirre-bengoa","tag-luis-alfonso-urena-lopez","tag-manuel-palomar-sanz","tag-paolo-rosso"],"_links":{"self":[{"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/posts\/66340","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/comments?post=66340"}],"version-history":[{"count":0,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/posts\/66340\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/media?parent=66340"}],"wp:term":[{"taxonomy":"category","embedda
ble":true,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/categories?post=66340"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.deberes.net\/tesis\/wp-json\/wp\/v2\/tags?post=66340"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}