Frequently Asked Questions
How does Trendalyze compete with cloud providers like Amazon (AWS Timestream) or Microsoft (Azure Time Series Insights)?
Trendalyze does not compete with cloud providers. We partner with and deploy applications on all cloud platforms, and our services complement their offerings. Their ML and DL offerings typically target the developer community, while we target business professionals with self-service tools. All of them use traditional statistical methods that have been around for many years. Trendalyze is based on a completely different paradigm: a mathematical rather than statistical approach to time-series mining and analysis. It is analogous to how physicists estimate probabilities mathematically instead of statistically. Because of these fundamental differences in approach, many of the assumptions for statistical modeling do not apply to motif discovery.
What about the cloud computing leaders who are trying to innovate in the time-series analysis space?
At this time, the big vendors are more focused on computer vision and speech recognition modeling, relegating time-series analysis to traditional methods and tools. The cloud vendors have been trying to innovate in the BI and analytics space too, and it has taken many of them more than 10 years to come up with offerings. We believe the same will happen in the time-series analytics space.
Open source technology such as Facebook's Prophet has attracted a lot of attention recently, but the Bayesian modeling it utilizes has been around for many years. Bayesian modeling is more difficult than ARIMA, and there is debate about which of the two methods produces better results. The market demands self-service tools precisely because of the skills required to use either method.
Many customers ask for turnkey solutions, especially in the analytics area. This is because of the shortage of knowledgeable resources and the high failure rate of analytical projects. Turnkey solutions are perceived as risk-free. Yet turnkey solutions have neither replaced nor eliminated the demand for custom apps and self-service tools. Our experience is that custom applications remain strong in areas where they deliver competitive advantage, while turnkey solutions are better for standard, non-differentiating business processes. Self-service has emerged to gain insights quickly and then operationalize the insights via custom or embedded applications to monetize them. Trendalyze is built to shorten the path from insight to monetization: discover patterns and monitor for them in real time to achieve desirable outcomes.
AWS Timestream or Azure Time Series Insights seem to have a similar offering, but with more capacity to scale.
The approaches are different. Both AWS and Azure scale the estimation and scoring of statistical models, which is very computationally intensive. Trendalyze scales search and matching, like Google, which is less computationally intensive. Our backend is built on Hadoop and Spark, which have proven scalability.
The data quality issues need to be handled with data management tools in the flow of the data, especially for real-time decision support systems. Trendalyze can be integrated with any of these tools. Compared to traditional statistical tools, Trendalyze is less sensitive to noise in the data and more robust when dealing with sparse data. See the Imputation section of this FAQ for more details on handling missing data.
Trendalyze does not require feature engineering because it does not use statistical estimation of coefficients for variables and attributes. Instead, we compute distance scores between the shapes of patterns in the data. We also do not need transformations to correct the data distribution, nor do we need to encode categorical variables. Pattern comparisons are performed at the grain, i.e., we create one time series for each combination of dimensions. If there are 100 stores and 50 SKUs, there are 5,000 individual series, one for each SKU in every store. Trendalyze finds the patterns in all of them.
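Trendalyze's actual distance measures are not published; as a rough illustration of comparing pattern shapes rather than estimating coefficients, a z-normalized Euclidean distance (a common choice in time-series mining, assumed here purely for the sketch) could look like:

```python
import numpy as np

def znorm(series):
    """Z-normalize so the comparison is based on shape, not scale."""
    s = np.asarray(series, dtype=float)
    return (s - s.mean()) / s.std()

def shape_distance(a, b):
    """Euclidean distance between two z-normalized sequences of equal length."""
    return float(np.linalg.norm(znorm(a) - znorm(b)))

# Two series with the same shape at different scales score as near-identical.
store_1 = [10, 12, 15, 13, 11]       # units sold per day at one store
store_2 = [100, 120, 150, 130, 110]  # ten times the volume, same pattern
print(shape_distance(store_1, store_2))  # ~0.0: same shape, different scale
```

Run once per grain (each store/SKU series), a score like this needs no dummy variables, transformations, or fitted coefficients.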
Severe data quality issues will affect any type of modeling. Wrong values must be handled by a data quality process downstream, i.e., as the data comes into the system, and not during the modeling process. Handling data quality during the modeling process produces one-off research projects and does not allow the results to be operationalized, i.e., the model cannot be put into a production environment where scoring or forecasting is automated. See an example of how motif discovery works better with missing values in the Imputation section of this FAQ.
Do analytical techniques such as Trendalyze require considerable pre-processing to deal with absent or erroneous data?
This is true of statistical modeling methods that require pre-processing like normalization, standardization, transformations, and imputations to meet various model assumptions. Satisfying those assumptions is a requirement for the validity of the estimated model. Trendalyze motif discovery is not a statistical method and thus does not have such requirements. However, we support all of these methods in the platform for users who want to do machine learning or deep learning in addition to motif discovery. See the Comparison category of this FAQ for more details.
Motif discovery is non-statistical and, thus, does not require feature engineering. In its simplest implementation, a subject matter expert picks out a few motifs through interactive data exploration. Those hand-picked motifs are used as the baseline for search and monitoring. That process is no different from data exploration and labeling for statistical modeling, except that it does not require the modeling. In an unsupervised learning mode, Trendalyze profiles the time series algorithmically and annotates similar segments. This approach is analogous to clustering in traditional statistics. See the Feature Engineering / Feature Encoding section of this FAQ for more details and examples.
Feature engineering is the selection of inputs for a model. These may be direct inputs or engineered inputs, i.e. inputs based on some calculations or transformations. We do not need any of these. The only input parameter that the system takes in an unsupervised learning mode is either (1) the length of the sequence, or (2) a purposefully chosen base sequence, or (3) a randomly chosen sequence (which is analogous to the random seed in clustering).
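To illustrate option (1), where the window length is the only input, a brute-force sketch of classic motif-pair discovery (a textbook formulation, not Trendalyze's implementation) takes just a window length and returns the two most similar non-overlapping segments:

```python
import numpy as np

def znorm(s):
    """Z-normalize a sequence; guard against zero variance."""
    s = np.asarray(s, dtype=float)
    sd = s.std()
    return (s - s.mean()) / (sd if sd else 1.0)

def find_motif_pair(series, window):
    """Brute-force motif discovery: the only input parameter is the
    window length. Returns the start indices of the two most similar
    non-overlapping subsequences and their distance."""
    n = len(series) - window + 1
    best = (None, None, float("inf"))
    for i in range(n):
        for j in range(i + window, n):  # skip trivially overlapping matches
            d = float(np.linalg.norm(
                znorm(series[i:i + window]) - znorm(series[j:j + window])))
            if d < best[2]:
                best = (i, j, d)
    return best

series = [0, 1, 3, 1, 0, 5, 6, 5, 0, 1, 3, 1, 0]
i, j, d = find_motif_pair(series, window=5)
print(i, j, round(d, 3))  # finds the repeated bump at positions 0 and 8
```

No feature selection or transformation is supplied; the repeated shape falls out of the pairwise distance comparison alone.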
Domain-specific knowledge is key in every modeling situation regardless of the type of models used – machine learning, deep learning, or motif discovery. Trendalyze’s value proposition is that it leverages the domain knowledge of the experts directly. In machine learning and deep learning, the domain experts must transfer their domain expertise to data scientists and statisticians. This is one of the reasons why many projects get delayed and even fail.
We do not have prepackaged libraries. We provide the tools so that customers and VARs can build their own industry- and company-specific libraries. Pattern libraries have high business value because pattern-based strategies provide significant differentiation and competitive advantage. Companies such as trading firms, medical device manufacturers, and energy providers guard both their pattern libraries and the data used to derive them very carefully.
When monitoring is configured, a domain expert decides on the cut-off point, but this is no different from putting a threshold on a BI KPI. Goodness of fit is a similar measure in statistical models: in some cases an R-squared of 0.70 is enough, and in others it is not. Confidence intervals, error terms, gains charts, ROC curves, and many other techniques are used in statistics to determine a tolerance level.
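The cut-off working like a KPI threshold can be sketched generically (hypothetical motif names and a z-normalized distance are assumed here; this is not Trendalyze's monitoring engine):

```python
import numpy as np

def znorm(s):
    """Z-normalize so matching is shape-based."""
    s = np.asarray(s, dtype=float)
    return (s - s.mean()) / s.std()

def check_window(window, motif_library, cutoff):
    """Compare the latest data window against each saved motif and
    return the motifs that fall within the expert-chosen cut-off."""
    alerts = []
    for name, motif in motif_library.items():
        d = float(np.linalg.norm(znorm(window) - znorm(motif)))
        if d <= cutoff:
            alerts.append(name)
    return alerts

# Hypothetical motif library; the expert sets cutoff=0.5 the same way
# a BI user would set a KPI threshold.
library = {"spike": [0, 0, 5, 0, 0], "ramp": [0, 1, 2, 3, 4]}
print(check_window([1, 1, 9, 1, 1], library, cutoff=0.5))  # matches "spike"
```

Tightening or loosening `cutoff` plays the same role as choosing an acceptable R-squared or ROC operating point in a statistical workflow.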
The platform works like self-service BI tools: you give the tools to the experts to explore the data and find relevant motifs that can be monetized. In the BI space, Tableau did not launch with predefined use cases; it offered users easy tools to explore data, and out of those explorations came many applications.
In many industries, domain experts maintain patterns in Excel spreadsheets or even in paper books, and some of these are hand-drawn. In Trendalyze, you can: (1) hand-draw a pattern point by point, or (2) import the data set for each pattern definition and save it in the library.
How would the process work for constructing motifs in an industry where they may not have existing motifs?
The best approach is to follow the collaborative sales methodology, where a consultant or pre-sales engineer conducts a short assessment and, together with the customer, discovers a few meaningful motifs. Such assessments can be done very quickly when the data is readily accessible. The assessment is followed by project implementation. The scope of the implementation will depend on the data engineering and application development complexity, especially if Trendalyze is being embedded in third-party applications. This process is fairly similar to how such applications are built for BI projects. The general steps are as follows: (1) Ingest data from the process into Trendalyze (we support many data ingestion methods); (2) Machine-profile and annotate the data, or visually explore the data, to find relevant motifs; (3) Decide on the importance of each motif; (4) Back-validate on historical data the accuracy of searching and monitoring with the selected motifs (calculate the true and false positives for search); (5) Configure or develop a monitoring application. Steps 1-4 are typically done by SMEs or business analysts. Step 5 can be configured by the user in Trendalyze dashboards, or embedded by an IT specialist in a third-party application via APIs.
Yes. Or it can be a VAR or SI.
Comparison to ML/DL
Motif discovery does not require meeting assumptions about the data distribution, as traditional statistical methods do. Trendalyze is pattern-matching AI: if you transform the data to correct the distribution, you will lose the pattern. This is also a major problem for traditional statistics, as many real-life data sets do not meet distribution assumptions even when transformed. It is an even bigger problem with big data, as distributions can change over time and within dimensions, which ends up requiring multiple models. See the Non-Linear Transformation section of this FAQ for more details.
In machine learning, encoding is required because the algorithms cannot make mathematical estimates on categorical data (i.e., string data such as store name, product name, etc.). Motif discovery does not require encoding because it finds the pattern within each time series for any combination of dimensions. Because Trendalyze also supports ML and DL natively, we work with all the standard encoding methods and more. Encoding is a big problem in statistics because it can create too many dummy variables, which may cause both accuracy and processing problems. For example, if you have 10,000 retail stores, you have to encode their store IDs to estimate each store's effect on sales, since each store contributes differently to sales. This results in 10,000 dummy variables. Add 1,000 products, and you have 11,000 dummies in the model. The number of variables can grow very fast in multi-dimensional data. When this happens, you may not have sufficient data to estimate the model, and the processing time can become too long to be practical. For such cases, our patent-pending distance encoding produces just one variable for each encoded variable. So instead of 10,000 dummies for store IDs, you have only one variable that captures the variation between stores. This encoding method significantly improves both accuracy and processing time.
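The dummy-variable explosion versus a single-column encoding can be illustrated generically. Mean/target encoding is used below purely as a stand-in for "one number per category"; it is not Trendalyze's patent-pending distance encoding, whose details are not public, and the store data is hypothetical.

```python
from collections import defaultdict

sales = [  # hypothetical (store_id, sales) observations
    ("store_a", 120.0), ("store_a", 130.0),
    ("store_b", 80.0), ("store_b", 90.0),
    ("store_c", 200.0),
]

# One-hot encoding: one dummy column per distinct store.
stores = sorted({s for s, _ in sales})
one_hot = {s: [1.0 if s == t else 0.0 for t in stores] for s in stores}

# Single-column encoding: each store ID becomes one number summarizing
# its relation to the target (its mean sales here), so 10,000 stores
# would still produce just one model column instead of 10,000 dummies.
totals, counts = defaultdict(float), defaultdict(int)
for s, v in sales:
    totals[s] += v
    counts[s] += 1
single_column = {s: totals[s] / counts[s] for s in totals}

print(len(one_hot["store_a"]))   # 3 dummy columns for 3 stores
print(single_column["store_a"])  # 125.0: one number per store
```

With one-hot encoding the column count grows linearly with cardinality; with a single-column scheme it stays constant, which is the processing-time argument made above.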
This is a standard function in most databases, BI and analytics tools, including our backend in the Metadata Creation module.
When building standard ML and DL models, different methods of imputation are supported. Imputation is part of building those models because they cannot produce results with missing data. Motif discovery does not require this. Consider a period-on-period analysis: you want to compare 60-minute trading periods on how similar or dissimilar they are. Your base motif in period 1 has 60 points; the next period has only 57 due to missing data. Motif discovery will still estimate how similar they are, and the missing data will be factored into the estimate directly. This has a tremendous benefit, as it represents the true state of things. If you present the second motif to the user with imputed data, the similarity may be 100%, but it will not be factual. So which one is going to produce a more accurate decision? This is a subject of considerable debate in science, as many mathematicians argue that statistics incorporates the biases of the modeler. In other words, the decision to impute data and the choice of imputation method create an impression of similarity that is not factual. Missing data is just missing; this is the fact. If too much data is missing, say the second period has only 10 points, i.e., 50 points are missing, motif discovery will estimate no similarity. Imputing 50 points based on 10, on the other hand, rests on a very weak assumption. Too many missing points is a data quality issue that needs to be handled in the downstream data collection process.
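The period-on-period example can be sketched as follows, assuming one simple way (not necessarily Trendalyze's) of factoring missing points into the score: compute the distance only over observed points and report the coverage alongside it, rather than imputing.

```python
import numpy as np

def distance_with_missing(base, observed):
    """Distance between a base motif and a window that may contain
    missing points (None). Nothing is imputed: the distance covers
    only the observed points, and the coverage share is returned so
    heavy gaps can be flagged rather than papered over."""
    pairs = [(b, o) for b, o in zip(base, observed) if o is not None]
    coverage = len(pairs) / len(base)
    if not pairs:
        return float("inf"), 0.0
    a = np.array([p[0] for p in pairs])
    b = np.array([p[1] for p in pairs])
    return float(np.linalg.norm(a - b)), coverage

base = [1.0, 2.0, 3.0, 2.0, 1.0]     # the base motif
window = [1.0, 2.0, None, 2.0, 1.0]  # one of five points missing
d, cov = distance_with_missing(base, window)
print(d, cov)  # 0.0 over the observed points, 80% coverage
```

Reporting distance and coverage together keeps the "missing is just missing" fact visible to the decision maker, instead of hiding it behind an imputed 100% match.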
Dimensionality Reduction (Principal Component Analysis): group or consolidate highly correlated variables
Motif discovery does not need dimensionality reduction. Patterns can occur within any dimension. Consider medical devices such as ECG monitors: the more dimensions you have and the more patterns you find within each combination of dimensions, the better treatment you can provide. Data today is highly dimensional, and new techniques are being developed to capture patterns across all dimensions. In the past, dimensionality reduction was done for efficiency, to improve processing and minimize the problems associated with categorical data encoding. But the higher the dimensionality, the more insight you get, so methods like deep learning and motif discovery were developed precisely to handle high-dimensional data.
Non-linear transformations are needed to satisfy assumptions about the data distribution. Income does not have a normal distribution, and a log transformation corrects this. The log transformation changes the shape of the data: it turns a distribution skewed toward lower incomes into a normal, bell-shaped one. These are two different graphs, L-shaped versus bell-shaped. Motif discovery is a shape-based approach; thus it does not need to change the distribution. Such transformations increase the accuracy of traditional statistical models, but they are very hard to explain to business users and require a very sophisticated validation process. Explainability is very important: when you transform the data, you also transform the interpretation. A log transformation is interpreted as a percentage change rather than an absolute change, so the end user has to do a lot of mental calculation. A different version of the problem exists in deep learning, where the predictions are unexplainable because the model is a black box; you may know what the future will be, but you do not know why. Motif discovery is explainable.
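The shape change is easy to see numerically. A minimal sketch (with hypothetical income values) showing that a log transform reduces the right skew of income-like data:

```python
import math

def skewness(xs):
    """Sample skewness: 0 for a symmetric distribution, positive for a
    right-skewed (lower-income-heavy) one."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - m) / sd) ** 3 for x in xs) / n

incomes = [20_000, 25_000, 30_000, 40_000, 60_000, 250_000]  # hypothetical
logs = [math.log(x) for x in incomes]

print(round(skewness(incomes), 2))  # strongly right-skewed
print(round(skewness(logs), 2))     # noticeably less skewed after the transform
```

The transformed series is easier for a model to fit but harder for a business user to read: a coefficient on `logs` speaks in percentage changes, not in dollars.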