Duplicate items returned

Currently contains 7 posts

August 11, 2013 07:45

chomply

I don’t know how widespread this issue is, but for Outback Steakhouse at least, I am receiving a large number of duplicate items. This happens for both GET and POST requests. For example, look at items with IDs 513fc993927da70408001a4b and 513fc9c6673c4fbc260024b3. They’re both “Classic Blue Cheese Wedge Salad (Dressing Included)” and all the fields are identical except the former is missing an item_description and an old_api_id. Interestingly, they even have identical updated_at values.

This is the case for most items at Outback Steakhouse. I haven’t checked a large selection of restaurants, but it’s concerning that this is happening for one of the first restaurants I tried.

Am I doing something wrong with my query, or is there simply duplicate data in your database? If it's the latter and I need to cull the duplicates on my end, what is the proper way to do it? Should I discard anything missing an item_description if another item with the same name has one? Discard anything without an old_api_id? I could add these restrictions to my query so I get fewer results back (and hit your database less often), but I'm afraid of accidentally culling an item that doesn't have a duplicate.
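
One conservative way to cull on the client side, sketched in Python under stated assumptions: treat two records as duplicates only when every field matches apart from the ones that can be missing on one copy (item_description, old_api_id) and the item ID, then keep the most complete copy. An item without an identical twin forms a group of its own, so it is never discarded. The sketch assumes each result is a dict parsed from the JSON response and that the item ID is stored under a key named `_id`; both are assumptions, not confirmed API details.

```python
from __future__ import annotations

from typing import Any

# Fields that may be missing on one copy of a duplicated item (per this thread),
# plus the assumed "_id" key; these are ignored when deciding whether two records match.
IGNORED_FIELDS = {"_id", "item_description", "old_api_id"}


def completeness(item: dict[str, Any]) -> int:
    """Count non-empty fields so the most complete copy can be kept."""
    return sum(1 for v in item.values() if v not in (None, ""))


def cull_duplicates(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge records that are identical on every field except IGNORED_FIELDS.

    An item with no matching twin forms a group of one and is always kept.
    Results come back in the order each group was first seen.
    """
    best: dict[tuple, dict[str, Any]] = {}
    for item in items:
        # repr() makes unhashable values (lists, nested dicts) usable in the key.
        key = tuple(sorted(
            (k, repr(v)) for k, v in item.items() if k not in IGNORED_FIELDS
        ))
        current = best.get(key)
        if current is None or completeness(item) > completeness(current):
            best[key] = item
    return list(best.values())
```

Matching on all remaining fields is deliberately strict: two records that differ in any other value (calories, serving size, updated_at) are both kept. That avoids culling genuinely different items, at the cost of not catching near-duplicates.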

August 11, 2013 20:01

Matt Silverman

This does not appear to be a widespread issue, but we are now looking into how these items were duplicated. Thanks for reporting this! I will follow up with an update shortly.

August 25, 2015 17:57

dcwhetstone

I have found the same thing. A simple search for milk returns, for example:
2 – Milk, Essential Everyday
3 – Milk, Hannaford
4 – Milk, Hood
2 – Milk, Berkeley Farms
All are 1 cup servings.

August 25, 2015 18:01

Matt Silverman

@dcwhetstone —

Grocery items may have duplicate names due to different brands having the same item. For example, the items you listed are milks from 4 different brands.

You can optionally filter the API results on the client side to show only unique names in each set of results, but when dealing with 500K different food products, distinct names are not always possible for each item.
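
A minimal sketch of that client-side filter, keeping the first result for each name. The `include_brand` flag is a hypothetical option added here so that same-named products from different brands (such as the milks listed above) are not merged into one. The item_name and brand_name fields come from this thread; everything else is illustrative.

```python
from __future__ import annotations

from typing import Any


def unique_names(items: list[dict[str, Any]], include_brand: bool = False) -> list[dict[str, Any]]:
    """Keep only the first result for each item name (or brand + name pair)."""
    seen: set[object] = set()
    unique: list[dict[str, Any]] = []
    for item in items:
        # include_brand is a hypothetical option: without it, "Milk" from
        # four different brands collapses into a single result.
        if include_brand:
            key = (item.get("brand_name"), item.get("item_name"))
        else:
            key = item.get("item_name")
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

First-wins filtering silently drops later records that share a key even when they differ in other fields (servings per container, for instance), so it suits display lists better than nutrition calculations.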

September 10, 2015 02:29

dcwhetstone

I found the difference was the number of servings per container. I can understand that.
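
When two results share a name, a field-by-field comparison is a quick way to find what actually sets them apart, as with the servings-per-container difference above. A small Python sketch; the ignored `_id` key and the `nf_servings_per_container` field in the example are assumed names, not confirmed from this thread.

```python
from __future__ import annotations

from typing import Any


def differing_fields(
    a: dict[str, Any],
    b: dict[str, Any],
    ignore: frozenset[str] = frozenset({"_id"}),
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (value_in_a, value_in_b)} for every field whose values differ.

    A field missing from one record is treated as None on that side.
    Fields listed in `ignore` (by default the assumed `_id` key) are skipped.
    """
    diffs: dict[str, tuple[Any, Any]] = {}
    for key in sorted((a.keys() | b.keys()) - ignore):
        va, vb = a.get(key), b.get(key)
        if va != vb:
            diffs[key] = (va, vb)
    return diffs


# Illustrative records; the nf_servings_per_container field name is assumed.
a = {"_id": "1", "item_name": "Milk", "brand_name": "Hood", "nf_servings_per_container": 8}
b = {"_id": "2", "item_name": "Milk", "brand_name": "Hood", "nf_servings_per_container": 16}
print(differing_fields(a, b))  # {'nf_servings_per_container': (8, 16)}
```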

But the one I really have a question about is brand_name = Quest Bar and item_name = Protein Bar, Chocolate Chip Cookie Dough. It comes back with 21 entries, and a SELECT DISTINCT produces 8 rows. They are all 1 bar per container but have three different calorie counts: 170, 190, and 210. The biggest difference I found was in updated_at. But if I select the most recent (last updated) entry, it shows 170 calories, which does not match the nutrition facts on the Quest Nutrition website.
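
To repeat this kind of investigation on the client side, here is a sketch that tallies calorie values across all records for one brand/item pair and finds the most recently updated record. It assumes `updated_at` is an ISO 8601 string in the same format on every record and that calories are stored under a field named `nf_calories`; that field name is an assumption, so it is a parameter. As the post points out, the newest record is not necessarily the correct one, so the tally is more useful for spotting conflicts than `newest` is for resolving them.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime
from typing import Any, Optional


def parse_timestamp(value: str) -> datetime:
    # Assumes ISO 8601 strings; a trailing "Z" is rewritten as "+00:00" so that
    # datetime.fromisoformat accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_variants(
    items: list[dict[str, Any]],
    brand: str,
    name: str,
    calorie_field: str = "nf_calories",  # assumed field name
) -> tuple[Counter, Optional[dict[str, Any]]]:
    """Tally calorie values across matching records and return the newest one."""
    matches = [
        i for i in items
        if i.get("brand_name") == brand and i.get("item_name") == name
    ]
    calories = Counter(i.get(calorie_field) for i in matches)
    newest = max(matches, key=lambda i: parse_timestamp(i["updated_at"]), default=None)
    return calories, newest
```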

September 10, 2015 14:42

Matt Silverman

@dcwhetstone,

Can you post the item IDs? I will investigate further. Thanks!

September 10, 2015 17:17

dcwhetstone

5526d21caf6365e97d697267
5550ec1c4619d01249e45b20
555a44654f27e14275fb5692
5575db6290d65ac876abfed5
5590ffb66b5e33a473022036
55b070fd52f9b6c862d201f7
5558f2d8e7a9180f701061d0
557dc37b1e01923d4d58d1cb
5559806055ab3f3a2a3acc12
558180855aa3265007d122e7
535519ffeeb492a177faaea5
546a09f0399532f861bc6d0e
5481fd34c35672027455985c
54e4b096ceb9f4a37cc70312
54fe80fdbd3c19fd1cbe3889
550fbfdbb9e5866a18dcffcc
558308f70dd6b09b26065a13
55269a075356633d7ebe10c2
553638b22db13fbf349ed03d
5571e59d54f112724d076da1
55bfb4a9fa91ff2260aaaa47