Ucto is a Unicode-compliant tokeniser. It takes one or more untokenised texts as input and tokenises them. Several languages are supported, and the software can be extended to others.
2.5.2
Maarten van Gompel, Ko van der Sloot
Centre for Language and Speech Technology, Radboud University and KNAW Humanities Cluster
lamasoftware@science.ru.nl
https://languagemachines.github.io/ucto
https://languagemachines.github.io/ucto/style/icon.png
GNU General Public License v3

Data processing notice: All data you upload to this service and all data obtained using this service remain yours and are accessible only to you and our technical staff. Your data will not be shared with third parties or used for any purpose other than this service's operation. You can remove your projects at any time and are encouraged to do so; this removes your data from our servers permanently. We cannot guarantee long-term storage of your data, so we recommend that you download the results and store them yourself immediately; projects on the server are automatically deleted after 30 days. Despite our security precautions, we discourage use of this service for highly confidential material, as the storage is not encrypted. Finally, we collect some statistics on how often this service is used; when shared, these are always anonymised.
English, Dutch, Dutch on Twitter, French, German, Italian, Frisian, Swedish, Russian, Spanish, Portuguese, Turkish
Convert from PDF Document, Convert from MS Word Document, Convert from Latin-1 (iso-8859-1), Convert from Latin-9 (iso-8859-15)