Part of speech tagging across Vale and Vale Server

Hossep Dolatian

Posted on October 12, 2020

Vale uses prose to do part-of-speech tagging, while Vale Server can use LanguageTool to do part-of-speech tagging. I'm curious how the POS tags can be kept consistent across the two NLP systems.
  1. Are there any known conflicts between prose and LanguageTool?
  2. Can the two tools (or their lists of tagged words) be combined in some way? 
  3. Can Vale Server use both prose and LanguageTool?

Joseph Kato

Posted on October 13, 2020

There's no overlap between the two. Both Vale and Vale Server use prose to do their internal NLP-related work, while LanguageTool does its own.

The Vale Server + LanguageTool integration is done simply through input/output processing: Vale Server passes text (from one of its clients) to a local instance of LanguageTool standalone and then processes the output provided by LanguageTool. In other words, Vale Server doesn't know about the internal logic of LanguageTool and LanguageTool doesn't know about Vale Server's.
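
As a rough illustration of that input/output loop (a sketch, not Vale Server's actual code), a client could talk to a local LanguageTool server as shown below. The port 8081, the `en-US` language code, and the function names are assumptions made for the example; the `/v2/check` endpoint, its `text` and `language` parameters, and the `matches` array in the response come from LanguageTool's HTTP API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a LanguageTool standalone server is listening locally on port 8081.
LT_URL = "http://localhost:8081/v2/check"


def build_check_request(text, language="en-US", url=LT_URL):
    """Build a POST request carrying plain text for LanguageTool to check."""
    body = urllib.parse.urlencode({"text": text, "language": language})
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


def summarize_matches(response):
    """Reduce LanguageTool's JSON response to (rule id, offset, length, message) tuples."""
    return [
        (m["rule"]["id"], m["offset"], m["length"], m["message"])
        for m in response.get("matches", [])
    ]


def check(text, language="en-US", url=LT_URL):
    """Send text to the server and return the summarized matches."""
    with urllib.request.urlopen(build_check_request(text, language, url)) as resp:
        return summarize_matches(json.load(resp))
```

The caller only ever sees text going in and a list of matches coming out, which is why neither side needs to know anything about the other's internals.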

It's also important to note that, as described in the documentation, not all of LanguageTool's rules are currently supported. 

Vale Server offers a free trial to allow you to experiment with its various features and integrations prior to purchasing a license. 

Hossep Dolatian

Posted on October 13, 2020

Thank you for the clarification. Two follow-up questions:

1) The documentation says which LanguageTool rules Vale *can* support. I know this is redundant, but which rules can't Vale support?

2) Do Vale and Vale Server use the same dictionary of tagged words? Is it this? Is there any documentation on how accurate the tags are, or on what the tags mean?

Joseph Kato

Posted on October 14, 2020

1) You can browse the selection of rules here: https://community.languagetool.org/rule/list?lang=en. The ones listed under the categories mentioned in the docs are all that's enabled for now.

2) That's the default dictionary used for spell checking in Vale and Vale Server (it's not related to part-of-speech tagging). You can read about the tagging process here: https://github.com/jdkato/prose#tagging.

Hossep Dolatian

Posted on October 27, 2020

Thanks for all the information. Regarding this point:
The Vale Server + LanguageTool integration is done simply through input/output processing: Vale Server passes text (from one of its clients) to a local instance of LanguageTool standalone and then processes the output provided by LanguageTool. In other words, Vale Server doesn't know about the internal logic of LanguageTool and LanguageTool doesn't know about Vale Server's.
This makes sense to me. However, the website also says:
Additionally, by using this add-on, you also get to take advantage of Vale Server's understanding of markup—which LanguageTool lacks altogether on its own.
So then, if a DITA file has a sentence with inline code like `codeph`, `uicontrol`, or `keyref`, how does Vale Server tell LT not to apply any rules on those tagged elements?



Joseph Kato

Posted on October 28, 2020

LanguageTool's API accepts "Annotated Text", which allows clients (in this case, Vale Server) to provide a description of their content in terms of what is and isn't markup.

We use the standard LanguageTool features, so their HTTP Server docs (https://dev.languagetool.org/http-server) should be able to answer most questions.
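
To make the "Annotated Text" idea concrete, here is a minimal sketch of how a client might split a DITA-like sentence into `text` and `markup` parts and pack them into the `data` parameter of `/v2/check`. The tag regex, the `SKIPPED` set, and the function names are hypothetical choices for this example, and the decision to hide the contents of `codeph` is an assumption, not a description of what Vale Server does internally.

```python
import json
import re
import urllib.parse

# Matches opening, closing, and self-closing tags; group 2 is the element name.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")

# Hypothetical choice: the contents of these elements are hidden from LanguageTool.
SKIPPED = {"codeph"}


def annotate(source, skipped=SKIPPED):
    """Split source into LanguageTool 'text' and 'markup' annotation parts."""
    annotation = []
    depth = 0  # greater than 0 while inside a skipped element
    pos = 0

    def emit(kind, value):
        if not value:
            return
        if annotation and kind in annotation[-1]:
            annotation[-1][kind] += value  # merge adjacent parts of the same kind
        else:
            annotation.append({kind: value})

    for m in TAG.finditer(source):
        emit("markup" if depth else "text", source[pos:m.start()])
        closing, name = m.group(1), m.group(2)
        if name in skipped and not m.group(0).endswith("/>"):
            depth = max(0, depth - 1) if closing else depth + 1
        emit("markup", m.group(0))
        pos = m.end()
    emit("markup" if depth else "text", source[pos:])
    return {"annotation": annotation}


def build_form(source, language="en-US"):
    """URL-encode the annotated document for a POST to /v2/check."""
    return urllib.parse.urlencode(
        {"data": json.dumps(annotate(source)), "language": language}
    )
```

With this sketch, LanguageTool would only check the parts labeled `text`, so a typo inside `<codeph>` is never reported. LanguageTool's annotation format also allows an `interpretAs` value on a markup part, which can stand in for the hidden content when dropping it would distort the sentence.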

Tags

part of speech
pos
tagging
vale server
prose
vale
languagetool
nlp