ArDoCo (Architecture Documentation Consistency) is a framework that connects architecture documentation and models and identifies missing or deviating elements (inconsistencies). An element can be any representable item of the model, such as a component or a relation. To do so, ArDoCo first creates trace links and then uses them, together with other information, to identify inconsistencies.
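To make this idea more concrete, the following Python sketch shows how two kinds of missing elements could be derived once trace links exist. The first kind is model elements that no part of the text links to. The second is textual mentions that no model element links to. All names (`ModelElement`, `TraceLink`, `find_inconsistencies`) are hypothetical and are not ArDoCo's API. ArDoCo itself draws on additional information and also handles deviating elements, which this sketch leaves out.

```python
from dataclasses import dataclass

# Hypothetical data types for illustration only; not ArDoCo's API.
@dataclass(frozen=True)
class ModelElement:
    identifier: str
    name: str
    kind: str  # e.g. "component" or "relation"

@dataclass(frozen=True)
class TraceLink:
    sentence: int          # index of the sentence in the documentation
    mention: str           # name as it appears in that sentence
    element: ModelElement  # model element the mention refers to

def find_inconsistencies(mentions, elements, links):
    """Return (undocumented model elements, unlinked textual mentions).

    mentions: set of (sentence index, name) pairs found in the text.
    """
    linked_elements = {link.element for link in links}
    linked_mentions = {(link.sentence, link.mention) for link in links}
    # Model elements without any trace link are not covered by the text.
    undocumented = [e for e in elements if e not in linked_elements]
    # Mentions without a trace link may refer to elements missing from the model.
    unlinked = sorted(mentions - linked_mentions)
    return undocumented, unlinked

db = ModelElement("c1", "Database", "component")
ui = ModelElement("c2", "UserInterface", "component")
cache = ModelElement("c3", "Cache", "component")
mentions = {(0, "Database"), (1, "UserInterface"), (2, "Scheduler")}
links = [TraceLink(0, "Database", db), TraceLink(1, "UserInterface", ui)]

undocumented, unlinked = find_inconsistencies(mentions, [db, ui, cache], links)
print([e.name for e in undocumented])  # ['Cache']
print(unlinked)                        # [(2, 'Scheduler')]
```

Here "Cache" exists in the model but is never mentioned in the text, and "Scheduler" is mentioned in the text but has no counterpart in the model.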
Vision
Documenting the architecture of a software system is important, especially to capture reasoning and design decisions. When the documentation is incomplete, much tacit knowledge is easily lost, which threatens the system's success and increases costs. However, software architecture documentation is often missing or outdated. One explanation for this phenomenon is that creating documentation is tedious and costly compared to its (perceived) low benefits. With this project, we want to take a step towards our long-term vision of persisting information from any source, e.g., from whiteboard discussions, so that crucial information about a system is not lost. A core problem in this vision is that information from different sources may be inconsistent. A major challenge in ensuring consistency is checking the consistency between formal artifacts, i.e., models, and informal documentation. We plan to address consistency analyses between models and natural language text artifacts using natural language understanding, and to include knowledge bases to improve these analyses. After extracting information from the natural language documents, we plan to create traceability links and check whether statements in the textual documentation are consistent with the software architecture models.
How does it work?
ArDoCo Traceability Link Recovery (TLR)
The image shows the idea of the approach and its processing steps. Text (Architecture Documentation) and models (Architecture Model or Code Model) are given as input. If the text has not been preprocessed yet, ArDoCo preprocesses it with Stanford CoreNLP. The goal of the preprocessing is to analyse the text and annotate it with additional linguistic information, such as dependencies, named entities, part-of-speech tags, or relations found between words.
Based on the given documentation, ArDoCo first extracts potential entity names, entity types, and relations from the text. This stage is called text extraction or information extraction.
Next, the recommendation generation (or element generation) stage combines the entity names and types. To improve performance, we use the metamodel as additional input for this stage, which makes potential types easier to detect. The name-type combinations are treated as instances on the textual side. Thereby, ArDoCo is able to recommend textual instances as candidates for trace links without knowledge of the instantiated model.
In the subsequent connection generation (or link generation) stage, ArDoCo has access to the instantiated model and creates trace links between the recommended instances and the entities of the models. Finally, this information can be used to detect inconsistencies; for example, we analyse the existence or non-existence of trace links for certain types of model elements.
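The stages described above can be sketched in Python as follows. This is a deliberately simplified illustration under several assumptions and is not ArDoCo's implementation. A naive regex tokenizer stands in for Stanford CoreNLP. Capitalized words stand in for name candidates. A hypothetical three-word set `METAMODEL_TYPES` stands in for the metamodel. `difflib.SequenceMatcher`, with an assumed threshold of 0.8, is a swapped-in name-similarity measure. All function names, as well as the example text and model, are hypothetical.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical stand-in for the metamodel: words that denote element types.
METAMODEL_TYPES = {"component", "interface", "service"}
DETERMINERS = {"The", "A", "An", "This", "Each"}

@dataclass(frozen=True)
class RecommendedInstance:
    name: str
    kind: Optional[str]  # None if no type word follows the name
    sentence: int

def preprocess(text):
    """Stand-in for Stanford CoreNLP: split after periods, keep alphabetic tokens."""
    sentences = [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]
    return [re.findall(r"[A-Za-z]+", s) for s in sentences]

def extract_names(tokens):
    """Text extraction: capitalized tokens other than determiners are name candidates."""
    return [i for i, t in enumerate(tokens) if t[0].isupper() and t not in DETERMINERS]

def recommend(sentences):
    """Recommendation generation: combine each name with a following metamodel type word."""
    instances = []
    for s, tokens in enumerate(sentences):
        for i in extract_names(tokens):
            following = tokens[i + 1].lower() if i + 1 < len(tokens) else None
            kind = following if following in METAMODEL_TYPES else None
            instances.append(RecommendedInstance(tokens[i], kind, s))
    return instances

def link(instances, model, threshold=0.8):
    """Link generation: pair instances and model elements with similar names and compatible types."""
    links = []
    for inst in instances:
        for element_id, (name, kind) in model.items():
            score = SequenceMatcher(None, inst.name.lower(), name.lower()).ratio()
            if score >= threshold and inst.kind in (None, kind):
                links.append((inst.sentence, inst.name, element_id))
    return links

text = ("The Database component stores all orders. "
        "The WebUI sends requests to the OrderService. "
        "The Logger records events.")
# Hypothetical architecture model: element id -> (name, type).
model = {"m1": ("Database", "component"),
         "m2": ("Web UI", "component"),
         "m3": ("OrderService", "component")}

instances = recommend(preprocess(text))
print(link(instances, model))
# [(0, 'Database', 'm1'), (1, 'WebUI', 'm2'), (1, 'OrderService', 'm3')]
```

Note that the recommendation step never looks at `model`, which mirrors the idea that textual instances are recommended without knowledge of the instantiated model. In this example, "Logger" is recommended as a textual instance but receives no trace link. That is the kind of signal the inconsistency detection can build on.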
You can find open topics for the ArDoCo project in our SDQ Wiki.