It’s pretty unusual these days to come across a business that doesn’t harvest, store and use data. At the same time, bad data can be worse than no data at all. Without reliable information your data analyses are meaningless: you risk annoying customers and prospects, lose potential advocates, miss opportunities and, ultimately, miss the point of having data in the first place. The result is a string of bad business decisions, and that means a less-than-optimal bottom line.
Poor data undermines the exciting new challenges and fresh opportunities posed by the big data revolution. The sheer size of the data means small mistakes are amplified many times over, sometimes making total nonsense of the conclusions drawn. What a waste of time, effort and money.
These are the data quality challenges we face in today’s marketing and sales landscape. So what, exactly, is dirty data, and can you fix it?
Defining Dirty Data
It’s great having masses of big data, but not so great when it arrives in different formats. That matters when you want to integrate the information, because multiple formats quickly get confusing. It might be as simple as one dataset expressed in UK pounds and another in US dollars, but whenever you have more than one data format you have disparate data that you can’t easily merge.
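To make the pounds-and-dollars case concrete, here’s a minimal Python sketch of normalizing two incompatible record sets before a merge. The field names and the fixed GBP-to-USD rate are illustrative assumptions, not real data.

```python
# Illustrative sketch: merge UK (GBP) and US (USD) spend records by
# normalizing everything to one currency first. The rate is assumed.
GBP_TO_USD = 1.25  # hypothetical fixed conversion rate

def normalize(uk_records, us_records):
    """Express every record's spend in USD so the two sets can be merged."""
    merged = []
    for r in uk_records:
        merged.append({"customer": r["customer"],
                       "spend_usd": round(r["spend_gbp"] * GBP_TO_USD, 2)})
    for r in us_records:
        merged.append({"customer": r["customer"], "spend_usd": r["spend_usd"]})
    return merged

uk = [{"customer": "Acme Ltd", "spend_gbp": 100.0}]
us = [{"customer": "Acme Inc", "spend_usd": 150.0}]
print(normalize(uk, us))
```

Converting at load time, before the merge, keeps one canonical unit in the combined data and avoids the silent pounds-versus-dollars mix-up described above.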
Dirty doesn’t always mean incorrect. It also covers mistakes made through ignorance. Something as seemingly simple as misclassifying a chunk of data, for example treating a ‘year’ field as a ‘currency’ field by mistake, can have catastrophic results. This kind of thing tends to happen when the data is structured in a way that’s far too complex, something that frequently affects big transactional databases. It’s also common when you don’t really know enough about the data’s source.
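A simple range check can catch exactly this kind of field mix-up. Below is a hedged sketch; the field name `unit_price` and the 1900–2100 range are invented for illustration.

```python
# Sketch: flag values that look like years sitting in a currency field.
def looks_like_year(value):
    return (isinstance(value, (int, float))
            and 1900 <= value <= 2100
            and float(value).is_integer())

def validate_row(row):
    """Return warnings for values that look misclassified."""
    warnings = []
    if looks_like_year(row.get("unit_price")):
        warnings.append("unit_price looks like a year - possible field mix-up")
    return warnings

print(validate_row({"unit_price": 2019}))   # flagged
print(validate_row({"unit_price": 19.99}))  # clean
```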
Data transfer is fraught with risk, too. A basic typing error can transform useful data into meaningless nonsense, a risk that’s especially acute when the data is external, provided by someone whose protocols you don’t control. And server outages can be catastrophic, leading to malfunctions, duplicates and lost data.
How To Clean Up Dirty Data
Cleaning dirty data can cost a lot of money, and it can be a real challenge. Your first step is to make a sensible decision about whether it’s worth the money and time, or whether you’d be better off starting from scratch with a ready-cleaned batch of data created by experts, for example a comprehensive UK business email database that has recently been verified, checked, de-duplicated and cleaned. Starting fresh might make more sense, but you won’t know until you get quotes for cleaning, classifying and ordering the data.
Any organization that wants reliable data quality should put key checking protocols at the heart of its data processing procedures. It makes a lot of sense to create checkpoints at the data collection or mining stage, and again at the delivery, storage, integration, recovery and analysis stages. And to do that properly, you need a plan.
A detailed quality assurance plan is a great starting point. Design yours so it kicks in when you’re analyzing big data, something that is particularly risky because it’s sometimes still done manually. It’s also a good idea to carry out an early check if your organization doesn’t have a standardized format for data input. Check that data is entered correctly, isn’t duplicated, and doesn’t contain so many abbreviations that you can’t tell what it means.
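Those entry checks can be sketched in a few lines of Python. The duplicate key (email address) and the abbreviation heuristic (a high share of vowel-free tokens) are illustrative assumptions, not a standard.

```python
import re

seen_emails = set()  # records already accepted in this session

def is_abbreviation(word):
    """Crude heuristic: a multi-letter token with no vowels, e.g. 'Mktg'."""
    w = word.lower()
    return len(w) > 1 and not any(c in "aeiou" for c in w)

def check_entry(record):
    """Return a list of problems found in one input record."""
    issues = []
    email = record.get("email", "").strip().lower()
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        issues.append("invalid email")
    elif email in seen_emails:
        issues.append("duplicate record")
    else:
        seen_emails.add(email)
    words = record.get("company", "").split()
    if words and sum(is_abbreviation(w) for w in words) / len(words) > 0.4:
        issues.append("company name looks heavily abbreviated")
    return issues

print(check_entry({"email": "a@b.co", "company": "Intl Mktg Svcs Ltd"}))  # abbreviation warning
print(check_entry({"email": "a@b.co", "company": "Northern Widgets"}))    # duplicate warning
```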
Preventative checking is useful too, and that concerns the architecture of your data process. It might mean building integrity checkpoints into the process, enforcing existing checkpoints more rigorously, or making input easier for staff, for instance by offering drop-down lists rather than leaving them to invent free-form classifications. You can incentivize accuracy so input staff take better care over their work, or remove system limitations. If your CRM doesn’t pull data straight from your sales revenue database, for example, there’s always a risk of error in the transfer process. Can you automate it?
How About Error Detection?
Error detection is extremely valuable. You could analyze the data’s accuracy thoroughly, taking a detailed inventory of the current situation: expensive but reliable. Or you could examine a collection of random samples to check the overall quality: less reliable, but enough to indicate whether you need more detailed checks.
You can examine the consistency of different data elements to see if they correspond, another good way to make quick checks that pin down the overall reliability of the information. You might want to quantify any systems errors that may damage data quality. And you can measure the success of completed processes, everything from the data collection stage onwards. If you spot a lot of invalid or incomplete records, it’s a sure sign there’s more work to do.
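The random-sampling idea can be sketched as follows; the validity rule here (a non-empty email field) is a stand-in for whatever a real audit would check, and the dataset is invented.

```python
import random

def sample_quality(records, sample_size, is_valid, seed=42):
    """Estimate the share of valid records by auditing a random sample."""
    random.seed(seed)  # fixed seed so the sketch is repeatable
    sample = random.sample(records, min(sample_size, len(records)))
    return sum(is_valid(r) for r in sample) / len(sample)

# 1,000 hypothetical records, 10% of them incomplete
records = [{"email": f"user{i}@example.com"} for i in range(900)]
records += [{"email": ""} for _ in range(100)]

estimate = sample_quality(records, 200, lambda r: bool(r["email"]))
print(f"estimated valid share: {estimate:.0%}")
```

Auditing 200 records instead of 1,000 gives an estimate that should land near the true 90% valid share; if it comes back low, that’s the signal to run the more detailed, record-by-record check.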
Whose Job Is Data Quality?
In a nutshell, it’s everyone’s. Anyone who collects, handles, inputs, stores, maintains, manipulates or analyzes data has a responsibility to keep an eye on its quality. If your people spot a handful of database errors every day, and take steps to improve or fix those records, you’re getting somewhere.
Is there anyone at your company who is a natural fit for the task? Someone detail-oriented who works with enterprise data and feels comfortable with it? People like this have been called ‘data provocateurs’, and they play a vital role. They can be from any department and any level of seniority, but it’s their job to constantly challenge data quality and to think creatively about how to improve it.
In conclusion, the businesses that take care of their data best of all tend to enjoy better marketing results, make more sales, irritate their customers less, generate more loyalty and – in general – make more money. When you involve everyone from the top down in data quality, you end up boosting your profits and enjoying a real competitive edge.