Data Quest Tool at Hike

Conversing with your data using natural language search


By Manjeet Singh and Dr Ankur Narang, AI & Data Technologies Team at Hike

A data-first attitude is intrinsic to Hike’s culture, enabling decisions and insights to be data-driven. This not only helps us prioritize better and align to user preferences, but also reinforces our culture of “outcomes over egos”. The bottom line: what gets measured gets improved.

However, for data exploration to work across touchpoints and resources, it has to be simplified and made easy for all teams to access, making collective decision-making more effective.

This becomes even more crucial in the case of remote operations. We have now been operating remotely for over six months; here’s a quick look at how we have leveraged tech to further our journey of democratizing data at Hike in the midst of a global pandemic.

Need for data and DataTools at Hike:

Data helps us make better decisions. At Hike, the mission of the Data Engineering team is to democratize data across the whole company, enabling data-driven decisions by all teams at Hike: CEO Office, Data Science, Product, Engineering, Product Marketing, Quality, Finance, Design, etc.

To accomplish this, we have built various in-house tools in the areas of data collection, data pipelines, a seamless aggregation framework, data visualization, reporting, flows/funnels visualization, etc. We also use a few open source and enterprise data products.

Why we started looking for a new DataTool (Data Quest):

Every data tool requires some prior knowledge to get started; new users cannot use them from day one. This defeats our goal of democratizing data across the whole company, because many non-tech teams are unable to use these technical, complex data tools. We tried a tool that doesn’t require the skill of writing queries and instead lets users analyze data with drag-and-drop functionality, but this fails too, as users still need prior knowledge of schemas and it remains very technical for them. So we needed a tool intuitive enough to help any non-tech member fetch data by simply writing a question in natural language, such as: “Get me the daily active users for yesterday”.

We explored such tools in the market, but most of them require prior knowledge of schemas. A few examples are Tableau’s Ask Data, Google’s Data QnA, and ThoughtSpot’s SearchIQ. All of them ask the user to select the data source to search, for which the user must know the schemas. The reason for requiring schema knowledge is to limit the context of the query: queries with unbounded context can return anomalous data. So we were left with the option of building an in-house tool that answers natural language questions with data and different visualizations. Since we know our own company’s data terminology, we can handle unbounded queries and company-specific terminology much better.

Data Quest Components:

Now, let’s discuss the major components of the Data Quest Tool.

User Interface:

It’s a minimal, super easy web interface where a user can type a question and get the output with different viewing options. Once the output is displayed, the user has advanced options to tweak the question and get the corresponding output.

Voice Search:

Users can simply speak, and the system converts the voice signal into a text question; the output is then displayed without any manual intervention. Home-grown voice models have been integrated to recognize both general terms and company-specific terminology.
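To make the flow concrete, here is a minimal sketch of how the voice path can hand off to the same text pipeline. Both `transcribe` (standing in for the home-grown speech model) and `answer_question` (the entry point of the text pipeline) are hypothetical names used for illustration:

```python
# Illustrative only: `transcribe` stands in for a home-grown
# speech-to-text model and `answer_question` for the text pipeline;
# both names are hypothetical.
def handle_voice_query(audio_bytes: bytes):
    question = transcribe(audio_bytes)    # voice signal -> text question
    return answer_question(question)      # same flow as a typed question
```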

Knowledge Base:

Data Quest is built on top of entities, which can be a metric, a dimension, a product feature, etc. A knowledge base is built for each product feature: every feature maintains a list of supported metrics and dimensions, and every dimension has a list of supported values. An alias list is also maintained for all the possible variants of each entity. We keep updating the knowledge base to increase its coverage.
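As an illustration, the knowledge base can be modeled roughly as below. The feature, metric, and dimension names are made-up examples for the sketch, not Hike’s actual entities:

```python
from dataclasses import dataclass, field

@dataclass
class Dimension:
    name: str
    values: list[str]                        # supported values for this dimension
    aliases: list[str] = field(default_factory=list)

@dataclass
class Feature:
    name: str
    metrics: list[str]                       # metrics this feature supports
    dimensions: list[Dimension]
    aliases: list[str] = field(default_factory=list)

# A made-up "stickers" feature entry for illustration.
stickers = Feature(
    name="stickers",
    metrics=["daily_active_users", "sticker_sends"],
    dimensions=[
        Dimension(name="platform", values=["android", "ios"],
                  aliases=["os", "device"]),
    ],
    aliases=["sticker", "sticker pack"],
)
```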

Question Parsing:

When we parse the question, we look for various things in it: aggregate function, metric, dimension, feature name, stopwords, group-by clause, date range, etc. Entities are identified using our knowledge base graph. If some words in the question are not identified, we pass this information back to the client, and those words are shown struck through to tell the user that the system did not recognize them. If we cannot find enough information in the question, we suggest the nearest matching entities to the user.
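A rough sketch of this lookup, assuming a flat alias table built from the knowledge base (the entries and stopword list below are illustrative):

```python
import difflib

# Illustrative alias table and stopwords; in practice these would be
# built from the knowledge base described above.
KNOWN_ENTITIES = {
    "daily active users": "metric:daily_active_users",
    "platform": "dimension:platform",
    "yesterday": "date_range:yesterday",
}
STOPWORDS = {"get", "me", "the", "for", "by"}

def parse_question(question: str):
    tokens = question.lower().rstrip("?").split()
    entities, unidentified = [], []
    i = 0
    while i < len(tokens):
        # Try the longest multi-word entity starting at position i first.
        for size in range(len(tokens) - i, 0, -1):
            phrase = " ".join(tokens[i:i + size])
            if phrase in KNOWN_ENTITIES:
                entities.append(KNOWN_ENTITIES[phrase])
                i += size
                break
        else:
            if tokens[i] not in STOPWORDS:
                # Unknown word: return it with the nearest known entity
                # as a suggestion for the user.
                nearest = difflib.get_close_matches(tokens[i], list(KNOWN_ENTITIES), n=1)
                unidentified.append((tokens[i], nearest))
            i += 1
    return entities, unidentified

print(parse_question("Get me the daily active users for yesterday"))
# (['metric:daily_active_users', 'date_range:yesterday'], [])
```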

Query Building:

Query building is done after the parsing step. We have multiple schemas in the backend against which a query can be processed. Metrics are unique and disjoint across the schemas, so based on the metric we identify the schema to query. All the items identified in the parsing step are filled into a query template for that schema, and we use default values for any information the question does not provide. The only thing the user must provide in the question is the metric.
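A simplified illustration of the template filling; the metric-to-table routing, table names, and defaults below are assumptions made for the example:

```python
# Metrics are unique and disjoint across schemas, so the metric alone
# determines which table to query. These mappings are illustrative.
METRIC_TO_TABLE = {
    "daily_active_users": "analytics.engagement_daily",
    "sticker_sends": "analytics.stickers_daily",
}
DEFAULT_DATE_RANGE = ("2020-09-14", "2020-09-14")    # e.g. "yesterday"

def build_query(parsed: dict) -> str:
    metric = parsed["metric"]                 # the one mandatory entity
    table = METRIC_TO_TABLE[metric]
    start, end = parsed.get("date_range", DEFAULT_DATE_RANGE)
    group_by = parsed.get("group_by")
    select = f"SELECT SUM({metric}) AS {metric}"
    if group_by:
        select = f"SELECT {group_by}, SUM({metric}) AS {metric}"
    sql = (f"{select} FROM `{table}` "
           f"WHERE event_date BETWEEN '{start}' AND '{end}'")
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql

print(build_query({"metric": "daily_active_users", "group_by": "platform"}))
```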

Query Execution and Output:

We use Google BigQuery as our data warehouse. All the schemas in BigQuery are refreshed daily by automated data pipelines. The query created in the query building step is executed on BigQuery, and the response is returned to the client, which renders the output in different chart types. The UI gives the user various options to play around with the output by changing the input parameters; advanced filters are shown once the client has received the response.
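The execution step itself is thin. Here is a sketch using the official google-cloud-bigquery client, assuming project credentials are configured in the environment:

```python
from google.cloud import bigquery

client = bigquery.Client()    # picks up project and credentials from the environment

def execute(sql: str) -> list[dict]:
    """Run the generated SQL on BigQuery and return rows for the client to render."""
    rows = client.query(sql).result()     # blocks until the query job finishes
    return [dict(row) for row in rows]
```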

Conclusion:

With the release of Hike’s Data Quest Tool, we have given all Hikers a new way to converse with data in natural language. This helps every team member be data-aware without needing a data specialist. Everyone is a data specialist now! 😉

Sounds like something you might want to be a part of? Check out our open roles and apply here → work.hike.in 🚀
