A leading home appliance retailer has full-line and specialty stores across the United States and Canada along with a large offering of tools, lawn and garden, home electronics and automotive repair and maintenance products. The company manages over 120 million customer contacts, originated through phone and web interactions. These contacts include requests for services as diverse as delivery, installation, home improvement and ordering replacement parts.
As a company with an extensive reach, tremendous inventory and an ongoing relationship with millions of customers, it produces and holds large amounts of highly specialized, highly diverse data.
The company maintains a database that includes product data for every part sold through its stores, catalog and website, including frequently updated information such as pricing and availability. This incremental database, from which data is never purged, contains records for over seven million parts and grows daily. A separate database, which itself exceeds 50 million rows, maintains all of the product model data and related parts.
The parts and model data was originally intended for internal use only and was therefore created with no set formatting or standards. The data was entered from multiple sources inside and outside the company, with no standardization at the point of entry.
After more than 20 years of unstructured, unmanaged data entry into these databases, the data was riddled with inconsistencies, inaccuracies, misspellings and unrecognizable abbreviations, and only a small fraction of it was truly identifiable or available.
The company sought to cleanse this data and use it to fuel an online customer resource that allows users to search by model number or part number. The resource provides a listing of subcomponents and replacement parts for a particular model, as well as alternative parts if the original part has been discontinued. To make the data customer-facing, the retailer first needed to transform it into accurate, high-quality information.
DataFlux Data Management Studio provides sophisticated data profiling and data matching technologies driven from an intuitive interface. Advanced DataFlux fuzzy-matching technology can successfully match incomplete, misspelled and inconsistent information to create a standard, unified and accurate record.
DataFlux allows users to easily create customized data matching rules, and then extend those rules across the enterprise – in batch or real time. The company used DataFlux technology to scan the millions of records in its databases, successfully identify related records and transform bad data into useful information.
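The idea of fuzzy matching described above can be illustrated with a minimal sketch. It is not DataFlux's algorithm or API; it assumes Python's standard-library difflib as a stand-in similarity measure, and the function names, the 0.85 threshold and the sample descriptions are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not affect the score."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_matches(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedily place each record in the first group whose first member it resembles."""
    groups: list[list[str]] = []
    for record in records:
        for group in groups:
            if similarity(record, group[0]) >= threshold:
                group.append(record)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([record])
    return groups


# Hypothetical part descriptions with typical entry errors.
sample = [
    "Dishwasher door gasket",
    "dishwasher  door gaskt",
    "Dishwaser Door Gasket",
    "Lawn mower blade 21 in",
]
groups = group_matches(sample)
```

Here the three gasket variants fall into one group and the mower blade into another. Comparing every record against every group would not scale to millions of rows; a production matching tool would typically narrow the candidates first, which this sketch omits.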
With DataFlux, an initiative that many within the company had thought infeasible, if not downright impossible, became a reality.
The company used DataFlux technology to cleanse its databases of 20 years' worth of bad data. The company standardized all part descriptions and their formats, accurately combined model and parts descriptions, eliminated all abbreviations and corrected misspellings.
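As a rough illustration of what standardizing descriptions and eliminating abbreviations can involve, the following sketch expands abbreviations from a lookup table and applies one consistent format. The abbreviation table and the function name are hypothetical; the source does not describe the actual rules the company used.

```python
# Hypothetical abbreviation table; a real one would come from profiling the data.
ABBREVIATIONS = {
    "assy": "assembly",
    "brkt": "bracket",
    "scr": "screw",
    "w/": "with",
}


def standardize_description(raw: str) -> str:
    """Expand known abbreviations and return the text in a single sentence-case format."""
    words = []
    for token in raw.lower().replace(",", " ").split():
        key = token.rstrip(".")  # treat "assy." and "assy" as the same token
        words.append(ABBREVIATIONS.get(key, key))
    text = " ".join(words)
    return text[:1].upper() + text[1:]


cleaned = standardize_description("BRKT ASSY., w/ SCR")
```

For this input, the result is "Bracket assembly with screw". Tokens not found in the table pass through unchanged, so unknown abbreviations would still need review.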
Furthermore, by establishing an ongoing, monitored data governance program, the company gained the ability to define attributes at the part level, parsing attributes such as color, shape and size from the model descriptions to provide more granularity when searching the site. The data was enhanced with indicators that flag records where an accessory or additional part or product is required. A local store availability indicator was also added, providing more value to the system.
In just 12 months, with DataFlux at the foundation of its data initiative, the company transformed two decades' worth of disparate data into more than seven million records that were both accurate and useful for its customers.