NaNoGenLab: an experiment a day during November #10
Comments
I will not have as much time to commit to this this year as I did last year. It might end up just being the first two days after Hallowe'en. It will not use Markov processes. It might use a genetic algorithm with a Levenshtein distance as part of its fitness function. Or it might just use a grammar-based generator (a recursive-descent parser "in reverse", whatever you call that). Don't know yet. It will not use Twitter. It might use Project Gutenberg. But also, it might not.
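The genetic-algorithm idea above is only mused about, not implemented, in this thread. As a minimal sketch of what it might look like: evolve a population of strings, using Levenshtein distance to a target phrase as the fitness function. The target, alphabet, and GA parameters here are all invented for illustration.

```python
import random

def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def mutate(s):
    # Replace one random character.
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

def evolve(target, pop_size=100, generations=500):
    # Start from random strings; keep the fittest fifth each
    # generation and refill the population with their mutants.
    pop = ["".join(random.choice(ALPHABET) for _ in target)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: levenshtein(s, target))
        if levenshtein(pop[0], target) == 0:
            break
        survivors = pop[:pop_size // 5]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return pop[0]

print(evolve("a dark and stormy night"))
```

A real novel generator would of course want a fitness function that rewards more than closeness to one fixed phrase, but the distance-as-fitness skeleton is the same.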
This is just me thinking out loud. It occurs to me that you can write an unremarkable generator which generates a remarkable novel, or a remarkable generator which generates an unremarkable novel. Or both or neither of course, but what I'm getting at is, you can concentrate your efforts on either side. Is a program that generates a single novel still a novel-generator? If, as the rulebook suggests, a script that downloads a novel from Project Gutenberg and spits it out is a novel generator, then, yes it is. So, you could participate in both NaNoGenMo and NaNoWriMo by opening a text file like the following and banging on it throughout the month:
And if you get stuck, just print "meow" for the remaining words. Or, more interestingly, use some less trite application of logic and looping and whatnot to construct parts of the text. I think you could call this a hybrid novel/novel-generator.
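The text file referred to above is not preserved in this thread, but the hybrid novel/novel-generator idea can be sketched directly: hand-write prose into the script all month, and have the program pad whatever is missing with "meow" to hit the word count. The opening sentence here is a placeholder, not the original.

```python
# A hand-written opening; in the hybrid scheme you would keep
# appending real prose to this string throughout the month.
PROSE = """
It was the first of November, and the generator was still empty.
"""

TARGET_WORDS = 50000

words = PROSE.split()
# If you get stuck, pad the remaining words with "meow",
# exactly as the comment suggests.
words += ["meow"] * (TARGET_WORDS - len(words))
novel = " ".join(words)
print(novel[:72], "...")
```

The generator is unremarkable; whether the novel is remarkable depends entirely on how much banging-on the author does.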
That was just an experiment. The program which produced it, and other experiments, will eventually be placed into:
regarding eliza v eliza: Did you see the Eliza twitterbot that was talking to GamerGaters?
What would happen if we hooked up Eliza to Twitter, went back to Twitter for another comment, something apropos…
Here is my novel-generating algorithm:
Playing requests now in the bandstand! 15 dollars a day, weddings, parties... bongo jams a speciality! @MichaelPaulukonis Unfortunately, and even though ElizaRBarr is my new hero, I don't do Twitter. (I realize I must be in the distinct minority, here.) So you might be waiting a long time. This variation on the algorithm might be more efficient:
Meditating on @dariusk's statement, encouraged by @hugovk, and inspired by @MichaelPaulukonis's tireless research into Propp and… With the following caveats:
Everything in the NaNoGenLab is in the public domain, so feel free to steal it, build on it, sell it, don't even credit me, whatever. I'm trying to use only verifiably public-domain external resources as well (Project Gutenberg, chroniclingamerica, and images in Wikimedia's "PD" categories, so far).
good GOD, man!
I'm into it.
@MichaelPaulukonis We are doing science so hard right now!! I guess I didn't mention that I have a fairly long commute which constitutes a large chunk of my free time. If I can bang it out while on the train, it's an "experiment". And y'understand that some of these are going to be just bloody awful.
Sea-Shanty "Tale of Two Cities" is admirable, admiral. What happens when that's pointed at something smaller, like a short story? It would get longer, wouldn't it? Or does the shanty stop after three verses?
It'll keep producing verses as long as there are more words in the input; the main problem is that the input words are given on the command-line, so you'll have to use… If I get to 30 experiments and have time remaining, I'll clean it up. And try to add more templates for verses, too (currently there are only two). Here's some output from this morning's experiment:
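The output itself is not preserved in this scrape, and neither is the generator, but the behavior described (cycling through verse templates and producing verses for as long as input words remain) can be sketched like so. The two templates here are placeholders, not the real ones from the repo.

```python
import itertools

# Two invented verse templates; the real generator reportedly also
# has only two, but these are placeholders, not the originals.
TEMPLATES = [
    "Oh, the {0} and the {1} went to sea,\n"
    "Way, haul away, we'll haul away the {2}!",
    "What shall we do with a drunken {0},\n"
    "Early in the morning? ({1}! {2}!)",
]

def shanty(words):
    """Yield verses, consuming input words three at a time,
    until the words run out."""
    templates = itertools.cycle(TEMPLATES)
    it = iter(words)
    while True:
        group = list(itertools.islice(it, 3))
        if len(group) < 3:
            break
        yield next(templates).format(*group)

for verse in shanty("cities best times worst wisdom foolishness".split()):
    print(verse)
    print()
```

Feeding it words on stdin instead of the command line would sidestep the shell's argument-length limits for longer input texts.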
It may amuse to try to guess the method used before reading the report. What's frightening is how simple it is.
Indeed! I do hope I'll be part of the mad control group that doesn't get its world taken over when the time comes. But wouldn't a mad engineer spend their time doing mad calcs to prove that their mad blueprint meets the mad specifications and doesn't violate any mad codes...? (snaps fingers) Mad blueprint generator! Hmm...
Here is some output from this morning's experiment, btw.
"two hours ago" it says to me at 9 AM on an EST Saturday. Where are you, and how can you start so early? I'm only online because my wife is out of the house and the kids are playing Angry Birds while I tweak some code (nobody has eaten breakfast yet). I did add more configurable genders, and buff up the wordbank passing mechanism. Tiny tiny minuscule tweaks. Bigger projects keep getting shunted to the side....
Well it is important to remember that NaNoGenLab is just one of many (mad) arms of the entire vast (and mad) Cat's Eye Technologies Lab Complex, which spans several hundred square kilometers inside the hollow Earth and has (mad) openings to the surface near Calgary, Reykjavík, Krasnoyarsk, and Venice. But I'm working remotely from near Oxford right now. Actually, that reminds me that I really ought to open a bug report about this whole "Na = National" thing, because when I glanced over participants' GitHub profiles, I counted at least 6 countries. Well, at least I've found a workaround that works really well for me ("just ignore it") so it's kind of low priority I guess. Anyway, here's some output from this afternoon's experiment.
[edit: fixed names that were incorrectly assigned in parts of the dialogue]
East Germany? Liechtenstein? The Monastic State of the Teutonic Knights? Good ol' Bob and Alice!
The latest experiment has gone horribly wrong I'm afraid; I'm just lucky nothing exploded, I suppose. Well, it's not that bad, it's just that it needs a very specific input text before I will be able to make use of it. The text needs to be 215 words long and all the words need to be unique. So I find myself working on a poem... only 89 words so far, and I've already used "you", "no", "for", "at", "the", and "and". Tricky business, this poetry stuff.
You can always take an arbitrarily large text, walk through it grabbing only the unique ones that you don't already have, and stop when you have 215. |
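The walk-and-collect suggestion above (later translated to Javascript as the "Text Uniquifier" further down the thread) is simple enough to sketch: scan a text, keep each word the first time it appears, and stop once you have the 215 you need. The tokenizing regex here is an assumption about what counts as a word.

```python
import re

def first_unique_words(text, n=215):
    """Walk through text, keeping each word only the first time it
    appears, until n distinct words have been collected."""
    seen = []
    seen_set = set()
    # Lowercase and split on letters/apostrophes; this tokenizer is
    # an invented detail, not the one from the actual experiment.
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in seen_set:
            seen.append(word)
            seen_set.add(word)
            if len(seen) == n:
                break
    return seen

sample = "it was the best of times it was the worst of times"
print(first_unique_words(sample, n=8))
```

If the source text is short on distinct words, you get fewer than n back, which is why feeding it "an arbitrarily large text" matters.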
@ikarth What, and end up with gibberish? I don't think so. Oh, wait... right. Well anyway, poem's written now. No going back. The idea (detailed here) was to try to answer the question: If we wanted to submit a novel to NaNoGenMo that was exactly 50,000 words in length, and we wanted to generate it using only permutations or combinations (with repetitions allowed, or not) of r words drawn from a set of n words, which combinatoric method, and what values of r and n, should we pick? And let's ignore trivial solutions like P(50000, 1). Turns out (unless there was a flaw in my maths) that you cannot get 50,000 out of a single non-trivial combinatoric function, although C(317, 2) = 50086 which is quite close; although also, as I realized somewhat late in the research, that is just the number of ways to pick two elements out of 317; if you wanted to count all those elements picked, it would actually be r times that. The closest, taking that into account, is 2·P(159,2) = 50244. So this led me to ask (and instruct my computer to find out) if there were any two non-trivial combinatoric expressions of the latter sort that, when added up, totalled 50000. Turns out, yes: 3·C(21,3) + 2·C(215,2) = 50000. (And because choose has a symmetry in it, there are three other possibilities, using r = 18 in the first C and/or r = 213 in the second C.) Directly I collected 21 unique words roughly meaning "section of a text", and wrote a 215-unique-word poem, and threw together something to pull all the combinations and output Markdown, and the result is: 3×C(21,3)+2×C(215,2)=50000: The Novel
And @ikarth, just to let you know, your suggestion was not made in vain.
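The arithmetic in the comment above can be checked mechanically, and a small brute-force search in the same spirit as the "instruct my computer to find out" step recovers the winning pair. The search bounds (n up to 400, r up to 4) are assumptions, not the author's actual parameters.

```python
from math import comb, perm

# C(317, 2): ways to pick 2 of 317 words -- close to 50,000 but not exact.
assert comb(317, 2) == 50086

# Counting the words actually picked multiplies by r; the closest
# single-expression total of that sort is 2 * P(159, 2).
assert 2 * perm(159, 2) == 50244

# The sum actually used for the novel is exact:
assert 3 * comb(21, 3) + 2 * comb(215, 2) == 50000

# The symmetry C(n, r) == C(n, n - r) that gives the other variants:
assert comb(21, 18) == comb(21, 3) and comb(215, 213) == comb(215, 2)

# A small brute-force search: find pairs of "r * C(n, r)" word-count
# terms (non-trivial, so r >= 2) that sum to exactly 50,000.
terms = {(n, r): r * comb(n, r)
         for n in range(2, 400) for r in range(2, 5)
         if r * comb(n, r) <= 50000}
hits = [(a, b) for a in terms for b in terms
        if a <= b and terms[a] + terms[b] == 50000]
assert ((21, 3), (215, 2)) in hits
print(hits)
```

Note `math.comb` and `math.perm` need Python 3.8 or later; earlier versions can compute the same values by hand.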
Some recent results: Recursively expanding templates without localizing the variables first:
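The linked result isn't preserved in this scrape, but "recursively expanding templates without localizing the variables first" can be illustrated: if one shared binding dictionary is threaded through every recursive expansion, then a variable like `{hero}` resolves to the same word everywhere in the output. The grammar and word lists below are invented for illustration.

```python
import random
import re

# An invented toy grammar: <name> is a nonterminal, {name} a variable.
GRAMMAR = {
    "story": "<scene> <scene> <scene>",
    "scene": "{hero} glared at {villain}.",
}
WORDS = {"hero": ["Alice", "Bob"], "villain": ["Mallory", "Trudy"]}

def expand(text, bindings, localize):
    """Recursively expand nonterminals, then fill in variables.
    With localize=False, the single shared bindings dict is passed
    into every recursive call, so each variable is bound exactly
    once for the whole story."""
    def nonterm(m):
        child = {} if localize else bindings
        return expand(GRAMMAR[m.group(1)], child, localize)
    def var(m):
        name = m.group(1)
        if name not in bindings:
            bindings[name] = random.choice(WORDS[name])
        return bindings[name]
    text = re.sub(r"<(\w+)>", nonterm, text)
    return re.sub(r"\{(\w+)\}", var, text)

print("localized: ", expand("<story>", {}, True))
print("global:    ", expand("<story>", {}, False))
```

In the localized run each scene may star a different hero; in the global run the whole story is stuck with one.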
Converting a binary file into a great big number and then treating that number as a phone number mnemonic (you know, like 1-800-GET-LOST):
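The output of that experiment isn't preserved here either, but the described transformation can be sketched: read the file's bytes as one big integer, write out its decimal digits, and replace each digit with a letter from its phone key. The real experiment presumably tries to form word-like runs; this sketch just takes the first letter on each key, which keeps the crucial property discussed below, that the mapping is invertible. All naming here is invented.

```python
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def to_big_number(data: bytes) -> str:
    """Treat the whole file as one big base-256 integer."""
    return str(int.from_bytes(data, "big"))

def mnemonic(digits: str) -> str:
    # Replace each digit 2-9 with the first letter on its key;
    # 0 and 1 have no letters on a phone keypad, so keep them.
    return "".join(KEYPAD.get(d, d)[0] for d in digits)

def from_mnemonic(text: str) -> str:
    # Letters map back to their key's digit, so the original
    # number (and hence the file) is recoverable.
    inverse = {ch: d for d, letters in KEYPAD.items() for ch in letters}
    return "".join(inverse.get(ch, ch) for ch in text)

digits = to_big_number(b"meow")
assert from_mnemonic(mnemonic(digits)) == digits
print(digits, "->", mnemonic(digits))
```

Picking letters to maximize dictionary-word coverage, rather than always the first letter, is where the dynamic-programming idea mentioned below would come in.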
I believe I did say that some of these were going to be bloody awful.
You could grab out only the sections of that number that are dictionary…
@enkiv2 Ironically, all of those words did come from… Actually, thinkin' about it, doesn't a real dictionary usually have entries for "J" and "FM" and such too? Anyway, I know what you mean, and yes, you could throw the whole thing through a filter to clean it up, but then it would lose an interesting property. As it is now, you ought to be able to reconstruct the original binary file from the words. The whole thing was a hack of course (I'm starting to regret the experiment-a-day goal; it's like speed chess; don't play speed chess, Bobby, it'll ruin you) and I doubt it generates an "optimal" phone number mnemonic. I'm pretty sure it would be possible to do better with some kind of dynamic-programming-ish solution. (And I'm pretty sure the same applies to a number of other experiments I've done so far as well.) @MichaelPaulukonis Your frequent musical references are sorely tempting me to write a synthesized music generator as one of the experiments. Arguing that a piece of music is a novel is probably beyond even my own post-modernist-conceptual-non-media-specific (lack-of-)sensibilities, though.
@ikarth: to follow up on some things you mentioned on other issue-threads:
Really? Here I thought it was a way of procrastinating until I came across a solid concept... Or maybe it was an experiment in answering the question "Where is the bar set?" by throwing the bar across the room. Elsewhere, you also said…
As a SCIENTIST I should agree whole-heartedly with the idea that the results ought to be reproducible! But in my own experiments, too often I've just gone and used Python's pseudo-random number generator without choosing or recording the seed... so the output is not, technically speaking, reproducible. (Not without some sort of brute-force search that I'm sure no one wants to do.) Although obviously it's usually obvious that you're obtaining similar results... (And JavaScript's PRNG doesn't even let you seed it, last time I checked; you have to use one written in JavaScript if you want to do that.) Need to write some kind of seed-chooser-and-recorder device as a piece of lab equipment. Ah, but there'll be time for that later. I still have one or two more silly ideas, and as long a commute as always...
This is probably the best description of this whole event that anyone has come up with.
I just went to a lot of trouble to set up a stored seed for my own project. Of course, in my case, I had the extra incentive of writing a pure functional system, so the random shuffling was the first thing that broke perfect repeatability. On the other hand, a lost random seed may be the closest the computer can come to the impermanent: an artifact that has never been generated before and may never be generated again.
If you need a static seed to generate a worthwhile novel, that's a bug in…
@enkiv2 I was thinking, something like this. (This is untested and should be considered pseudo-code.)
This way, it doesn't get in the way, but you can set a specific seed if you want, and (maybe more importantly) when it does produce a gem it will at least write the seed somewhere so that you have a better chance at reproducing it. (Of course, there are yet other variables like "what version of the script was I using", "what input files was I using", etc.)
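The pseudo-code from the comment above is not preserved in this scrape. A minimal sketch matching the description: use a seed from the command line if one is given, otherwise pick one at random, and in either case append it to a log file before seeding the generator. The function name and log format are invented.

```python
import random
import sys
import time

def get_seed(argv=None, log_file="seeds.log"):
    """Use the seed from the command line if given; otherwise pick
    one at random. Either way, record it so a lucky run can be
    reproduced later."""
    argv = sys.argv if argv is None else argv
    if len(argv) > 1:
        seed = int(argv[1])
    else:
        seed = random.randrange(2 ** 32)
    # Append, don't overwrite: the log becomes a history of runs.
    with open(log_file, "a") as f:
        f.write("%s seed=%d\n" % (time.ctime(), seed))
    random.seed(seed)
    return seed

if __name__ == "__main__":
    print("running with seed", get_seed())
```

Calling `get_seed()` at the top of a generator script costs almost nothing and makes `python novel.py 123456` reproduce the run that the log says produced the gem.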
The benefit of having the seed is that it's shorter than the novel that…
We might be talking at cross-purposes here, a bit... hosting the seeds instead of the generated result is definitely not what I had in mind. Maybe I should clarify that, beyond the playing-science trope of chanting "Reproducibility! Yes! (Remember cold fusion, after all!)", my own it'd-actually-be-a-nice-thing-to-have use case for this would be when I have just run…
and pressed…
I worked on getting a random key last year, and just let it slide by the wayside this year. In the field of generative visual art, it is really really really a nice thing to have. Also, for unit-testing.
Tests? This is a lab! This is no place for tests... Regarding seeds, I'm now inclined to play devil's advocate: just maybe, in this modern age where everything we do is archived forever in the cloud, we should be grateful for all the ephemerality we can get? Shrug? Anyway, latest experiment is here and it is a total flop, by which I mean a total success, by which I mean that artists often run experiments but the hypothesis is almost always "I hypothesize that if I try this, the result will be pleasing enough, or at least the experience will be rewarding enough, that it was worth the effort of trying it."
Just to report: I had a vague goal that I'd produce a cut-up novel of some sort -- with four experiments run in the name of doing so -- and you can see how far I got with that here: https://github.com/catseye/NaNoGenLab/blob/master/sensible-paste-up/sample-cheese.jpg I think it has promise ("cheese, stirring it until is is pneumonia", for example, and "FOUNDER…"). Plus, running it for, what, 200 pages (or whatever would feel sufficiently 50,000-words-ish) would result in a massive file which I'd have to host somewhere and, ehh, that'd just be more hassle right now. So, maybe next year.
That idea has potential, but I can see why you're holding off.
I got to try my hand at procedural image processing anyway, which is not something I'd ever really done before. And learned a bit about using PIL. So that's something.

It has been a good year. A fun month -- an exhausting month, in many ways -- fwiw I do not recommend the experiment-a-day approach, unless you just have way too many ideas and want to surprise yourself by how quick-and-dirty you are willing to code to get them down. And, stupidly, I seem to have even more ideas now. Arrgh. Well, next year... arrgh. Next year is eleven months away!

Well, what about the off-season? Dunno; last year after NaNoGenMo I (mercifully?) lost my taste for generated text, but now... One thing I'm tempted to do is to extract the possibly-useful "lab equipment" into some kind of reusable library-slash-suite of utilities. The name NaNoGenLib suggests itself, but maybe that's a bit presumptuous. KTLN, a toolkit for unnatural language processing, also suggests itself, especially if I can think of a better backronym than "Kitten Talks Like Nixon". Shrug?

Also fwiw @hugovk, I don't think this issue deserves a "Completed" tag, due to its tangential nature. I've been following your lead and opening separate issues for each novel. And actually, since I uploaded them all as gists, a handy index / summary can be found here: https://gist.github.com/cpressey/
more fwiw: At the request of a friend, I translated the uniquifier experiment to Javascript and put it online here: Text Uniquifier. Also, this is neither here nor there, but I just noticed that:
I noticed this while looking for public results from NaNoWriMo this year (y'know, to compare notes, sort of.) I haven't yet found any, although granted I haven't spent a lot of time hunting yet. The NaNoWriMo site has links to authors' websites, most of whom are "published for-reals" and have, at best, a link to an ebook for you to purchase -- sometimes, from a draft completed during NaNoWriMo. Take this for what you will, my only point is: different. |
If anybody is interested in text manipulation in the off season, it is an…
@cpressey It had a "preview" label, but I've now de-labelled it.
@MichaelPaulukonis There's probably enough interest to establish some kind of communication channel for that, if someone organizes it.
@cpressey "I do not recommend the experiment-a-day approach" I'll just put this here... https://www.flickr.com/photos/ranjit/collections/72157627384812764/ |
@MichaelPaulukonis I am interested year-round!
@moonmilk Indeed. I think I would get funny looks from the other commuters if I were to try that on the train. (well, funniER.) @MichaelPaulukonis Consider me interested too, at least enough to lurk on said communications channel...
I started up an out-of-season repo last year @ https://github.com/TextGenTex/TextGenTex I'm certainly open to "better" communication channels.
I'm definitely interested in off-season experiments as well. I've…
How about a google group for the off-season text stuff? They're free, and IRC is nice, but even if someone is archiving it, it's much harder to look…
-r
I'd join a google group for this if someone produced one. I'm treating…
https://groups.google.com/d/forum/generativetext or… assuming I've set up the settings correctly. Which seems unlikely.
It seems OK, other than being a private group.
I intend to participate again this year.
If you are wondering why I used the word "again" in the previous sentence, it may help to understand that the account I was using last year has since been converted into an organization.
I don't know what I intend to do, yet, but the end result better consist of 50,000 of something that I can make a fair argument are "words" or I will surely be forced to pack my bags and catch the next Greyhound out of town in my shame.