“Drank lots and lots of depth, it won’t disappoint”
–computer-generated review of O’Vineyards wine
I’m playing with some software that will allow me to analyze all the comments O’Vineyards wines have received online. One of the sillier, more fun applications of this analysis is that my computer can now generate comments on its own. 😀
Some of you might be familiar with the silly tasting note generator or similar sites, but these use slightly different technology. I’m working with n-gram analyses of the reviews I get from Naked Wines customers.
What is an n-gram analysis?
Basically, the computer counts every word, then every word pair, then every word triplet, and so on. These counts let the computer draw some conclusions about which words tend to appear together. So if I do an n-gram analysis of the phrase “I went to the movies”, the word pairs are:
- X I
- I went
- went to
- to the
- the movies
- movies X
The X’s indicate the start or end of a phrase.
The word triplets in the same phrase would be:
- X I went
- I went to
- went to the
- to the movies
- the movies X
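For the programmers in the audience, here is a minimal Python sketch of the counting step described above. It is not the software I’m actually using. The names `ngrams` and `count_ngrams` are made up for illustration, and it assumes words are simply separated by spaces and that one X marker goes on each end of the phrase, as in the lists above.

```python
from collections import Counter

BOUNDARY = "X"  # same marker used above for the start or end of a phrase


def ngrams(phrase, n):
    """Return every run of n consecutive words, with one X on each end."""
    words = [BOUNDARY] + phrase.split() + [BOUNDARY]
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(phrases, n):
    """Count how often each n-gram appears across many phrases."""
    counts = Counter()
    for phrase in phrases:
        counts.update(ngrams(phrase, n))
    return counts


pairs = ngrams("I went to the movies", 2)
triplets = ngrams("I went to the movies", 3)
pair_counts = count_ngrams(["I went to the movies", "I went to the shops"], 2)
```

With real reviews, `count_ngrams` would be fed every comment at once, so pairs that show up in lots of reviews end up with the highest counts.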
How does the computer generate new sentences?
The more data you feed into the computer, the more n-grams it collects, and eventually it can draw some relatively accurate conclusions. Imagine I feed it a longer sentence like “I went to the movies and had to wait in the longest line ever to buy some popcorn”. The program would notice all the previous word pairs as well as new ones like “to buy”. Roughly speaking, to write a sentence of its own, it chains together words that it has already seen next to each other. So the computer might conclude that it’s normal to say “I went to buy some popcorn”, and that is actually correct! Of course, a lot of the time the computer tries hard but just spouts gibberish, like “I went to the longest popcorn ever to buy some movies.”
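Here is a rough Python sketch of that idea, using word pairs only. It is an illustration under my own assumptions, not the exact program I’m running: `build_successors`, `generate`, and `is_possible` are hypothetical names, and the generator just picks a random next word from the ones it has seen following the current word.

```python
import random
from collections import defaultdict

BOUNDARY = "X"  # marks the start or end of a phrase


def build_successors(phrases):
    """Map each word to the list of words seen right after it."""
    successors = defaultdict(list)
    for phrase in phrases:
        words = [BOUNDARY] + phrase.split() + [BOUNDARY]
        for current, following in zip(words, words[1:]):
            successors[current].append(following)
    return successors


def generate(successors, rng, max_words=30):
    """Start at X and keep picking a seen follower until X comes up again."""
    word = BOUNDARY
    output = []
    while len(output) < max_words:
        word = rng.choice(successors[word])
        if word == BOUNDARY:
            break
        output.append(word)
    return " ".join(output)


def is_possible(successors, sentence):
    """True if every word pair in the sentence was seen in the data."""
    words = [BOUNDARY] + sentence.split() + [BOUNDARY]
    return all(b in successors[a] for a, b in zip(words, words[1:]))


corpus = ["I went to the movies and had to wait in the longest line ever to buy some popcorn"]
table = build_successors(corpus)
sentence = generate(table, random.Random(0))
```

Because followers are stored with repeats, common pairs get picked more often. Checking `is_possible(table, "I went to buy some popcorn")` confirms that the popcorn sentence is one the computer could stumble onto from that single example.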
This differs from the silly tasting note generator mentioned above because that generator works more like a mad lib. It has long lists of words that are manually categorized as modifiers, nouns, verbs, or other parts of speech, and it uses pre-written sentence structures. It makes more sense given very little data, but it is limited to what it has been taught. What I’m working on could eventually be applied to any body of text and generate reviews based on an n-gram analysis of that text (so I could do this for Japanese reviews even though I don’t speak Japanese!)
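For contrast, a mad-lib style generator can be sketched in a few lines of Python. I have no idea how the silly tasting note generator is really built, so everything here is invented for illustration: the templates, the word lists, and the function name `mad_lib_note`.

```python
import random

# Hypothetical hand-written sentence structures with named blanks.
TEMPLATES = [
    "This {adjective} {noun} {verb} with hints of {flavor}.",
    "A {adjective} {noun} that {verb} and finishes on {flavor}.",
]

# Hypothetical word lists, sorted by hand into categories.
WORDS = {
    "adjective": ["bold", "silky", "cheeky"],
    "noun": ["red", "blend", "Merlot"],
    "verb": ["opens", "lingers", "surprises"],
    "flavor": ["dark plum", "violets", "black pepper"],
}


def mad_lib_note(rng):
    """Pick a template, then fill each blank from its word list."""
    template = rng.choice(TEMPLATES)
    choices = {slot: rng.choice(options) for slot, options in WORDS.items()}
    return template.format(**choices)


note = mad_lib_note(random.Random(1))
```

Every note this produces is grammatical because a person wrote the sentence structures, but it can never say anything that isn’t already in the lists. The n-gram approach needs no hand-made lists at all, which is why it can be pointed at any pile of reviews.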
Gibberish examples
Most of the time, the computer-generated reviews are total gibberish. The syntax can be terribly wrong. Here are some fun examples of typical gibberish reviews:
This is a very good black-red with onions, sauteed pots with our Les American than Languedoc and complicated, dark fruitiness notes, but this achieved the lower they called it loved it. Even my 81 year when the wine, we had to open the duration is elastic, then essentially the oven as it needs taste when the tasted some mixed cases now decreased to say about to email the sale this bottle
lot of purple. Very floral with the market Place right-hand drive!
Got through fruit and Joe are in the minimum quantity !
Wouldn’t spoil something else on my anatomy. I do buy wine is not in favour of the buyer, and less fun!
I found it was a please passed over the price and my guests both gave it 5 out on it! I really want it?
Big (not one a couple of days when we got back?
Almost there are dark plum tang and can under for anything wrong with Sunday lunch – open the last remnants post-food start to see how this aspect of the price, in recent trip to Carcassonne and price.
If you’re looking to hear your tounge without food and you at the Trah Lah Lah Lah was reminiscent of view it is dashed good! Which we found it interesting last remnants post-food start to show the silly name it’s frigging fantastic price. Remember if it was a 2008 or 2009 vintage) compares to taking decanted, and do under for anything. I was very intense, a good time favourite of the best wishes for a while to get the two, i sense a marketplace (the 2006 is supposed to be missing out of 5 others one not to everyone (that’s just slid down and Joe) may be more than a Merlot) Cabernet blend, or from her tasting and it was subtle and give the producer an enjoyed this is due to financial constraints, and you are missing out of 5 others one changing to see how those 5’s ! :)”
It’s clear that the words are related to wine (and the computer does manage to group brand names like Trah Lah Lah, and mention my region, vintages, and other things that make this sound like a tasting note). So it sounds like English. But then when you actually look at the whole paragraph, there’s no sense at all! 😀
Computer-generated wisdom
Sometimes though, the gibberish words line up just right and there’s a strange sort of wisdom in the computer’s misuse of the English language.
Hi Sandy, you get what you pay for what does she know ha ha ha.
Swirl it intense, a good with the yanks in men, what I had, but the wine, but not quite quick). This wine front of her, was an open it was quite French.
I have bid? – I thinking wine. Rich and can understand the base proposition of those tannin heavy so a good with food… Lamb medallions, sauteed pots with onions, snow peas, and body from naked wines and as we worked our way throughout our stay. Ryan and dirty with food… Lamb medallions, sauteed pots with the seller can extending that basis I have order, you wanted us to the extra years in the vineyard and as always the sale this remarkable wine in the front of parma violets are they used to make a lot of purple. Very floral smell of Lilies and lots of flavour packed the grapes and Edinburgh and less fun!
Ya, I still need to work on it.
Totally unrelated to wine
Sometimes, the reviews seem totally unrelated to wine!
I found as always the last night.
Big (not one a couple of days when we got back?
I’d been toying it!
Why the heck am I doing this?
If you know me at all, you really should get used to me doing strange stuff all the time. But there is actually a reason for this. It’s raining outside and the paint is drying in B&B room #3 (codename: the Cabardes Room). So it’s a perfect opportunity to further my research in data visualization and analysis. I’m going to try to broach this subject with my technical audiences much more often in 2012 (including but not limited to a potential SxSW talk on data analysis for non-verbal experiences like wine drinking).