anc.org
Domain age: 23 years, 11 months, 6 days. HTTP/1.1 200 OK. Accessed: 2012-09-09 04:28:40. Server: Apache/2.2.14 (Ubuntu). Accept-Ranges: bytes. Vary: Accept-Encoding. Transfer-Encoding: chunked. Content-Type: text/html; charset=utf-8. Site encoding: utf-8.
What's New

OANC in GrAF format
The full 15-million-word OANC is now available in GrAF format. GrAF is the ISO standard serialization format for standoff annotations over linguistic data. GrAF annotations can be loaded into annotation tools such as GATE and UIMA and/or transduced to other formats using ANC2Go. Please consult the latest draft of the ISO TC37 SC4 Linguistic Annotation Framework for details about GrAF.

BBN Named Entity annotations of the OANC
Inline named entity annotations produced by the BBN tagger are now available. A rendering of these annotations in GrAF, which will allow them to be merged with other OANC annotations, is forthcoming. Contributed by Sameer Pradhan.

Syntactic parses of 11 million words of OANC data
Three syntactic parses of 11 million words of the OANC, produced with the Charniak & Johnson (2005) parser, MaltParser, and the LTH dependency converter, have been contributed by Rasul Kalajahi.

ANC UIMAUtils
With funding from an IBM UIMA Innovation Award, we have developed tools that enable the import and export of OANC and MASC annotations in GrAF format in UIMA.

MASC V1.0.3
A revision of the MASC I annotations that fixes a few errors. This version will soon be distributed by the Linguistic Data Consortium.

ANC2Go
We provide a web application that allows users to select the texts and annotations they want and obtain them in any of several different formats.

The First Release of the Manually Annotated Sub-Corpus (MASC)
MASC I consists of approximately 82,000 words drawn from the OANC.
The corpus includes manual annotations of WordNet senses and full-text FrameNet frames, as well as validated annotations for token and sentence boundaries, part of speech, noun chunks, verb chunks, named entities, and Penn Treebank syntax. The corpus includes texts from the Language Understanding (LU) Corpus, and many of the LU Corpus annotations are also included in MASC. In addition, about half of the corpus was annotated in the Unified Linguistic Annotation (ULA) project, and annotations for opinion, PropBank, and TimeML are either included in MASC I or forthcoming. All annotations, both in-house and contributed, are in LAF/GrAF format and can therefore be merged or combined using the ANC Tool and transduced to other formats using ANC2Go.

OANC NGram Search Engine
A beta version of the OANC Ngram Search Engine, created by Satoshi Sekine using his Linguistic Knowledge Discovery Tool, is available. We will be porting the engine to its permanent home on the ANC server this summer.

ANC in the News
The ANC has
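The GrAF and MASC items above rely on standoff annotation: in broad terms, each annotation layer is stored apart from the primary text and points into it by character offsets, so the text itself is never modified and layers from different sources can be combined. The following is a minimal in-memory Python sketch of that idea under our own assumptions. The classes `Region` and `Annotation` and the functions `covered_text`, `merge_layers`, and `within` are hypothetical illustrations, not part of GrAF, the ANC Tool, ANC2Go, or any other ANC software; GrAF itself is an XML serialization of annotation graphs, not Python objects. The layer names and Penn Treebank-style tags are illustrative only.

```python
# Hypothetical sketch of standoff annotation layers; NOT the GrAF XML format or any ANC API.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """A span of the primary text, given as character offsets [start, end)."""
    start: int
    end: int

    def contains(self, other: "Region") -> bool:
        # True if `other` lies entirely inside this region.
        return self.start <= other.start and other.end <= self.end


@dataclass
class Annotation:
    layer: str    # illustrative layer name, e.g. "pos" or "ne"
    label: str    # the annotation's tag or type
    region: Region
    features: dict = field(default_factory=dict)


def covered_text(text: str, ann: Annotation) -> str:
    """Recover the annotated string; the annotation itself stores only offsets."""
    return text[ann.region.start:ann.region.end]


def merge_layers(*layers: list[Annotation]) -> list[Annotation]:
    """Combine layers that point into the same, unmodified primary text.

    No re-alignment is needed: collect every annotation and order them by
    start offset, placing longer spans first when two spans start together.
    """
    merged = [ann for layer in layers for ann in layer]
    merged.sort(key=lambda a: (a.region.start, -a.region.end, a.layer))
    return merged


def within(merged: list[Annotation], outer: Annotation, layer: str) -> list[Annotation]:
    """Return annotations from `layer` whose regions fall inside `outer`'s region."""
    return [a for a in merged if a.layer == layer and outer.region.contains(a.region)]


if __name__ == "__main__":
    text = "Sameer Pradhan contributed annotations."
    pos = [
        Annotation("pos", "NNP", Region(0, 6)),
        Annotation("pos", "NNP", Region(7, 14)),
        Annotation("pos", "VBD", Region(15, 26)),
        Annotation("pos", "NNS", Region(27, 38)),
        Annotation("pos", ".", Region(38, 39)),
    ]
    ne = [Annotation("ne", "PERSON", Region(0, 14))]

    merged = merge_layers(pos, ne)
    for ann in merged:
        print(ann.layer, ann.label, repr(covered_text(text, ann)))
    person = merged[0]
    print([covered_text(text, a) for a in within(merged, person, "pos")])
```

In this example the merged list starts with the `PERSON` span covering "Sameer Pradhan", and `within` finds the two `NNP` tokens from the separate part-of-speech layer inside it. Neither layer had to know about the other; the shared offsets are what make the combination possible.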