The Syntactic tool for pattern Search in Time Series (SSTS) presents a novel approach to query search and pattern matching in time series. The proposed methodology offers a more interactive and expressive way of matching desired patterns in time series.
Nowadays, data scientists have a plethora of tools at their disposal to manipulate and extract complex information from time series data. However, the available methodologies still demand considerable cognitive effort to reach the desired solutions, mostly because the focus falls more on what surrounds the problem than on the problem itself. For example, someone who programs for data analysis faces a greater challenge designing algorithms in C++ than in Python: the former has a syntax closer to the machine and involves worrying about manual memory management, pointers, and type declarations, among others (many aspects that are unrelated to the problem itself), while the latter has a syntax much closer to human reasoning and is more accessible. The point is not that Python is better than C++ or vice versa, but that implementing code in Python is generally easier and quicker for general purposes. Given the complexity of pattern and query search tasks and their effect on productivity, it is important to design new methodologies that are tightly related to the reasoning and visual analysis of time series. With more expressive tools, pattern search, processing, and prediction in time series can reach a broader audience, and solutions can be designed more quickly.
In a recent study titled “SSTS: A syntactic tool for pattern search on time series”, published in the journal Information Processing & Management, a team of researchers from the Nova University of Lisbon and Fraunhofer AICOS created an innovative tool for exploratory data analysis that works much like the way we usually search for keywords or patterns in text files (using the “find & replace” tool of text software, for instance).
The method uses regular expression queries to search the desired patterns in a symbolic representation of time series data, as will be explained further.
Human Reasoning in Data Science
Satosi Watanabe defines a pattern as “a vaguely defined entity that is the opposite of chaos, to which a name can be given”. The human brain is good at recognizing these entities immersed in chaos: it distinguishes or finds similarities in the features that characterize a certain pattern, an innate capability that supports decision-making.
This intrinsic capacity also shows up in tasks that involve querying time series. Data scientists often face two common tasks: query search, in which a priori information is given and the goal is to find instances in the time series that are similar to a predefined query; and motif discovery, in which no a priori information is given and the goal is to find repeated, similar patterns in the time series.
To accomplish these tasks, data scientists look for trends, features, fluctuations, and morphological variations in time series. These visual clues perceived by the human brain are then implemented in traditional query mechanisms. Consider the example of an electrocardiogram (ECG) signal, an exhaustively studied time series that reflects the variations in heartbeat dynamics self-regulated by an orchestral ensemble of electric, neurohormonal, respiratory, and cardiac components. One might need to find the big narrow peak (R peak) or the peak that comes afterward (T peak). This task is easily described in words or by observation, and the description relates to the morphological and statistical properties of the pattern, such as amplitude or derivative. There is therefore an inherent language, with an associated syntax and grammar, that human reasoning uses to query time series.
What is the Syntactic Search on Time Series?
The Syntactic tool for pattern Search in Time Series (SSTS) was built on the previous observations and explores time series data in three steps: Pre-Processing, Symbolic Connotation, and Search. The SSTS tool follows what is typically done in time series analysis: (1) the time series is transformed so that only the needed information is visible, preparing it for the next procedures (Pre-Processing); (2) the major features of the time series are extracted to retrieve the information needed for the pattern match (Symbolic Connotation); (3) the search is performed based on the previous step (Search).
However, in this case, the time series is transformed into a symbolic representation, and the search is performed with regular expression queries. By adopting a set of symbolic methods, this approach aims to increase expressiveness in solving standard pattern and query tasks, enabling queries that are more closely related to the reasoning and visual analysis of the time series.
More specifically, the pre-processing stage applies routine pre-processing tasks that filter the time series and remove noise from it. Thereafter, the time series is converted from the numerical domain into the symbolic domain using a connotation process, which generates a sequence of symbols following a grammar formalism. This translation takes advantage of our ability to observe which properties are most important.
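The two stages described above can be illustrated with a minimal Python sketch. It is not the SSTS implementation: the moving-average filter, the function names `moving_average` and `connote_slope`, the tolerance `tol`, and the three-letter alphabet (`p` for rising, `n` for falling, `z` for flat) are all assumptions chosen for illustration.

```python
def moving_average(signal, window=3):
    """Pre-processing (assumed filter): centered moving average.

    The window shrinks at the edges so the output keeps the input length.
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def connote_slope(signal, tol=1e-3):
    """Symbolic connotation (assumed alphabet): one symbol per sample.

    'p' = value rose, 'n' = value fell, 'z' = change within +/- tol.
    """
    if not signal:
        return ""
    symbols = ["z"]  # the first sample has no predecessor, so treat it as flat
    for prev, curr in zip(signal, signal[1:]):
        step = curr - prev
        symbols.append("p" if step > tol else "n" if step < -tol else "z")
    return "".join(symbols)


# A small synthetic series: flat, rising, plateau, falling.
raw = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
text = connote_slope(raw)
print(text)  # zzppznn
```

Because each sample maps to exactly one character, a position in the string corresponds directly to a sample index in the series. On noisy data, `moving_average` would be applied before `connote_slope`.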
Once the time series has been converted from numerical values to a string, queries can be performed using string-matching mechanisms. Regular expressions are a powerful tool commonly used for such tasks. The search procedure returns the intervals at which positive matches occurred.
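As a rough sketch of this search step, the snippet below runs a regular expression over a symbolic string using Python's standard `re` module. The string, the alphabet (`p` for rising, `n` for falling, `z` for flat), the `search` helper, and the "peak" query are hypothetical and only illustrate the idea; they are not taken from the SSTS tool.

```python
import re


def search(symbolic, pattern):
    """Return (start, end) intervals, end exclusive, for each regex match."""
    return [m.span() for m in re.finditer(pattern, symbolic)]


# Hypothetical symbolic series with one character per sample.
symbolic = "zzppznnzzpppnnnz"

# A "peak" in words: a rise, an optional plateau, then a fall.
peak = r"p+z*n+"

matches = search(symbolic, peak)
print(matches)  # [(2, 7), (9, 15)]
```

With one symbol per sample, each returned interval can be used directly to slice the original numerical series, which is what makes the query read so close to its verbal description.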
SSTS allows intuitive queries
SSTS is generic and can be applied to any type of time series, on the assumption that the user has a good knowledge of the signal's properties and of the structure being searched for. This new methodology is potentially quite effective in exploratory contexts.
Although the current version of this tool requires some knowledge of data analysis, we imagine a future version that depends less on this knowledge, to meet the needs of a broader range of specialists working on all kinds of problems, from healthcare to urban sustainability.
If you are interested in exploring the capabilities of this tool in more depth, an interactive live demo is available on the SSTS website.
These findings are described in the article entitled SSTS: A syntactic tool for pattern search on time series, recently published in the journal Information Processing & Management (Information Processing & Management 56, 1 (2019) 61-76). This work was conducted by João Rodrigues, David Belo, and Hugo Gamboa from LIBPhys-UNL (Universidade Nova de Lisboa), and Duarte Folgado from Fraunhofer AICOS (Associação Fraunhofer Portugal Research).
Published by Duarte Folgado