
Building a Search Engine for Afterpay’s Shop Directory

We discuss some of the initial issues with the search capabilities in our Shop Directory, our technical approach to addressing them, lessons learned, and next steps.

Afterpay

Nov 24, 2020


By Jose Picado, Qiao Wang, and Yi Li

Context

Our Shop Directory is used by our consumers to discover stores, brands, and products. It is incredibly valuable for our retail partners, generating almost 20 million referrals per month in the July to September 2020 quarter. The Shop Directory, available on the Web and in our mobile app, contains nearly 64,000 stores, each selling thousands of products. Providing an effective search capability that helps consumers get relevant results quickly is therefore important for a smooth shopping experience, and a sizable percentage of referrals come directly from search. Our long-term goal is to make the Shop Directory the place where most consumers start their shopping journey by providing great value in meeting their purchasing needs.

A key limitation of the current search functionality

The system that powers the current search functionality is primarily based on matching search strings against keywords (similar to Google, Bing, etc.) and sorting the matching stores by relevance. For each store in the Shop Directory, we generate a diverse list of relevant keywords using proprietary in-house algorithms. This works particularly well for store names and their variants.

An important limitation of this approach is that the keywords generated for a store may not contain information about the products or brands it sells, which hurts the relevance of results. For instance, Air Jordan basketball shoes are among the most popular items purchased by our consumers, according to the Afterpay Fashion and Beauty Report. Yet when a consumer searches for “air jordan”, neither the suggestions shown below the search bar nor the returned results are entirely relevant, as shown in Figures 1 and 2, respectively.

Figure 1: Suggestions shown below the search bar when typing “air jordan” with the keyword-based system.

Figure 2: Search results for the query “air jordan” with the keyword-based system.

What did we set out to build?

Motivated by this and other similar findings, we decided to build a completely new search platform to power the Shop Directory. Based on what we learned and the feedback we received, we zeroed in on the following key improvements for the new version:

Enhanced matching and ranking: Search stores by name, category, and the products and brands they sell.
Enhanced search-as-you-type: Incorporate auto-complete for store names, and search-as-you-type based on the enhanced matching and ranking.
Typo tolerance: Both search and search-as-you-type need to tolerate typos.
Data freshness: Ensure that the search index reflects new stores and new products sold by stores.
Support human augmentation: Provide a way for analysts to augment search queries with synonyms. For instance, if the consumer’s search query is “furniture”, analysts can add words such as “chair” and “table” to increase the probability that a relevant store is returned (recall).
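The post does not say how analyst synonyms are wired into Elasticsearch. One plausible approach, sketched below under that assumption, is a search-time synonym_graph token filter whose rules are generated from an analyst-maintained list, so that a query for “furniture” also matches “chair” and “table”. The ANALYST_SYNONYMS dictionary, the analyzer and filter names, and the field names are all hypothetical.

```python
import json

# Hypothetical analyst-maintained synonyms: query term -> extra terms to match.
ANALYST_SYNONYMS: dict[str, list[str]] = {
    "furniture": ["chair", "table"],
}


def synonym_rules(synonyms: dict[str, list[str]]) -> list[str]:
    """Render one-way expansion rules, e.g. 'furniture => furniture, chair, table'."""
    rules = []
    for term, extras in sorted(synonyms.items()):
        # Keep the original term on the right-hand side so it still matches itself.
        rules.append(f"{term} => {', '.join([term, *extras])}")
    return rules


def index_body(synonyms: dict[str, list[str]]) -> dict:
    """Hypothetical index settings/mappings with a synonym-expanding search analyzer."""
    text_field = {
        "type": "text",
        "analyzer": "standard",                     # applied when indexing stores
        "search_analyzer": "search_with_synonyms",  # applied only to queries
    }
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "analyst_synonyms": {
                        "type": "synonym_graph",
                        "synonyms": synonym_rules(synonyms),
                    }
                },
                "analyzer": {
                    "search_with_synonyms": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "analyst_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Illustrative field names, not the production schema.
                "name": dict(text_field),
                "products": dict(text_field),
                "categories": dict(text_field),
                "name_suggest": {"type": "completion"},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(index_body(ANALYST_SYNONYMS), indent=2))
```

Keeping synonyms out of the index-time analyzer means that changing a rule does not require reindexing the stored store documents.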

In addition, we needed to maintain hygiene metrics such as availability and scalability, so that the system handles the extremely high traffic seen during shopping seasons while keeping latency low (at most 100 ms).

Detailed Solution Architecture

We implemented the search platform using the following architecture:

Figure 3: System architecture.

Sourcing the data: We obtain data from two sources. When a consumer completes an order on a store’s website, the Afterpay Checkout Service sends an order completion event to Kafka. The Product Service obtains information about products from external data sources and sends product ingestion events to Kafka. The Kafka events flow into the Amazon Redshift data warehouse (DWH).
Indexing the data: We run scheduled jobs, orchestrated by Apache Airflow, to index new data. The Indexing Service reads data from the DWH and writes into the Elasticsearch index.
Search and ranking: The Search Service calls Elasticsearch to find the stores relevant to the search query. We use multi-match queries with typo tolerance and synonym expansion to search over store names, products, and categories, and the Elasticsearch completion suggester to auto-complete store names. The Search Service then sorts the stores based on ranking rules and returns the results. It also records all search events in the DWH by sending them to Kafka.
Feedback and analytics: When the consumer clicks on a store’s tile, the consumer is referred to the store’s website. The User Client records search, click, and order events in Amplitude. We use these events to measure relevant business metrics and to improve our matching and ranking mechanisms.
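The queries themselves are not published here. The following minimal sketch, assuming a Python Search Service, shows what request bodies for the two Elasticsearch features named above could look like, together with a hypothetical ranking rule. The field names, the name^3 boost, the store-suggest suggestion name, and the referrals tie-breaker are illustrative assumptions, not Afterpay’s actual configuration.

```python
# Hypothetical request bodies and ranking rule for the Search Service.

def search_body(query: str, size: int = 20) -> dict:
    """Multi-match query over store names, products, and categories."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query,
                # Boosting store names above products/categories is an assumption.
                "fields": ["name^3", "products", "categories"],
                # AUTO: terms of 1-2 chars must match exactly, 3-5 allow one
                # edit, and longer terms allow two.
                "fuzziness": "AUTO",
            }
        },
    }


def suggest_body(prefix: str) -> dict:
    """Completion-suggester request for auto-completing store names."""
    return {
        "suggest": {
            "store-suggest": {
                "prefix": prefix,
                "completion": {
                    "field": "name_suggest",
                    "fuzzy": {"fuzziness": "AUTO"},
                },
            }
        }
    }


def rank(hits: list[dict], top_k: int = 10) -> list[str]:
    """Hypothetical ranking rule: relevance score first, then referral count."""
    ordered = sorted(
        hits,
        key=lambda h: (h["_score"], h["_source"].get("referrals", 0)),
        reverse=True,
    )
    return [h["_source"]["name"] for h in ordered[:top_k]]
```

The bodies are plain dictionaries, so they can be sent as the body of a _search request with any Elasticsearch client, and rank can be applied to the returned hits.hits list. Re-ranking in the service rather than inside the query keeps business rules easy to change without touching the index.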

Rolling out a new search experience

The new search system overcomes the limitations that we saw in the keyword-based system. For instance, Figures 4 and 5 show the suggestions below the search bar and the returned stores, respectively, when the consumer searches for “air jordan”. The returned stores sell items related to the search query, such as Air Jordan shoes and apparel.

Figure 4: Suggestions shown below the search bar when typing “air jordan” with the new search system.

Figure 5: Search results for the query “air jordan” with the new search system.

The search system supports auto-complete functionality. For instance, if the consumer types “finish”, the first suggestion shown under the search bar is Finish Line, a store that sells athletic shoes and apparel. The system also supports typo tolerance. For instance, if the consumer types “finsh” (notice the typo), the first suggestion shown under the search bar is still Finish Line. Further, if the consumer clicks on the search button, the first store in the search results is also Finish Line.
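How typo tolerance is tuned is not described. As an illustration, assuming the completion suggester’s fuzzy option with "fuzziness": "AUTO", the sketch below re-creates the idea in plain Python: a store name is suggested if some prefix of it is within the allowed number of edits of what the consumer typed, so “finsh” still reaches “Finish Line” with a single insertion. It uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition as one edit, and its ordering by edit count is a simplification, so treat it as an approximation rather than the engine’s exact behavior. The function names are hypothetical.

```python
def auto_fuzziness(term: str) -> int:
    """Edits allowed under Elasticsearch's default AUTO fuzziness (AUTO:3,6)."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def prefix_edit_distance(query: str, candidate: str) -> int:
    """Minimum Levenshtein distance between `query` and any prefix of `candidate`."""
    # prev[j] = distance between the query chars seen so far and candidate[:j].
    prev = list(range(len(candidate) + 1))
    for i, qc in enumerate(query, start=1):
        curr = [i] + [0] * len(candidate)
        for j, cc in enumerate(candidate, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete qc from the query
                curr[j - 1] + 1,           # insert cc into the query
                prev[j - 1] + (qc != cc),  # substitute (free when equal)
            )
        prev = curr
    # Any prefix of the candidate may match, so take the best column.
    return min(prev)


def fuzzy_suggest(query: str, store_names: list[str]) -> list[str]:
    """Store names whose prefix is within AUTO fuzziness of the query."""
    q = query.lower()
    max_edits = auto_fuzziness(q)
    scored = []
    for name in store_names:
        n = name.lower()
        # Mirrors the suggester's default prefix_length of 1: the first
        # character is not fuzzed, so it must match exactly.
        if not q or not n or q[0] != n[0]:
            continue
        distance = prefix_edit_distance(q, n)
        if distance <= max_edits:
            scored.append((distance, name))
    # Fewer edits first; ties broken alphabetically for determinism.
    return [name for _, name in sorted(scored)]
```

For example, fuzzy_suggest("finsh", ["Nike", "Foot Locker", "Finish Line"]) keeps only “Finish Line”: “Nike” fails the first-character check and “Foot Locker” needs more than the one edit allowed for a five-letter input.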

We have rolled out the new search functionality to a small percentage of our consumers and are A/B testing it against the existing keyword-based system. With the new system, we have seen an overall increase in search-to-click conversion for queries about products and brands. For example, conversion for “nike”, one of our top search queries, increased by 15%. However, the keyword-based system still outperforms the new system for some queries. We will continue improving our system with the goal of achieving higher search-to-click conversion across all queries, and thereby a better shopping experience for our consumers.
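Search-to-click conversion is not defined precisely here, nor is it stated whether the 15% figure is relative or absolute. Assuming conversion is clicks divided by searches for a query, and that the reported change is a relative lift, the per-variant comparison reduces to the arithmetic below. The Variant class and the counts in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant counts for a single query in an A/B test."""
    searches: int  # search events for the query
    clicks: int    # store-tile clicks following those searches

    @property
    def conversion(self) -> float:
        # Assumed definition: clicks per search (0.0 when there were no searches).
        return self.clicks / self.searches if self.searches else 0.0


def relative_lift(control: Variant, treatment: Variant) -> float:
    """Relative change in conversion, e.g. 0.15 for a 15% increase."""
    if control.conversion == 0:
        raise ValueError("control conversion is zero; relative lift is undefined")
    return treatment.conversion / control.conversion - 1
```

With hypothetical counts of 400 clicks per 1,000 searches in the control group and 460 per 1,000 in the treatment group, relative_lift returns approximately 0.15. The sketch deliberately omits significance testing, which a real A/B analysis would also need.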

What did we learn?

Search is challenging because we cannot know a consumer’s intent from the search terms alone.
Putting things into production frequently, learning from real user feedback and data (similar to ML projects), and iteratively improving the system worked particularly well.
Ranking is as important as matching: consumers really only care about the top few results.

What are we excited to work on?

Combine store, product, brand, and category information with the keywords used by the keyword-based system. This will help with both matching and ranking.
Use Machine Learning techniques to improve ranking.
Personalize search results by customer to improve relevancy.

The journey is only 1% complete. Our goal is to build a powerful and robust search platform that provides an optimal search experience. Our team of product managers, data scientists, and engineers are working together to achieve this goal. Thanks to James Taylor, Ivy Black, Lei Yan, and Guangyu Dong. This project has been supported by the Machine Learning, Data Science Engineering, Consumer Engineering, Global Data Engineering, and Platform Engineering Teams.

Sound interesting?

Interested in joining us in a Software Engineering role and helping us improve lives for millions of consumers around the world, every day? Consider joining our team.