A Testers Contest – The Crappy Little DataGenerator


At House of Test a couple of us have been developing a simple little tool for generating test data. We call it the “HoTies Little DataGenerator” and we want to share it with the testing community. But before we go public we are releasing the “Crappy Little DataGenerator” as a challenge to you testers. Besides the honor and glory the winner will receive a cool HoT goodie bag! 

The challenge is to test the tool and report your findings in the comments field of this post. I, Martin, will be the almighty judge of which finding or result is the most interesting and earns its author the prize. This is a friendly challenge to the testing community, but take it seriously and use it as an opportunity to practice your testing skills. In other words, don’t let your testing be sloppy; do the groundwork.

The tool is written in Java, and as long as you have Java installed on your system you should be able to just double-click the .jar file to start the GUI. You can find the crappy little datagenerator here: https://s3.eu-central-1.amazonaws.com/houseoftest/Exercises/crappy_little_datagenerator_v_1.0.jar
If you like the tool but dislike the bugs, then you will be able to download the real version after the contest is over.
Ladies and gentlemen, start your testing. You have until the end of the week, Sunday the 28th of February, to send in your test results.

71 thoughts on “A Testers Contest – The Crappy Little DataGenerator”

  1. =)
    Performance issue:

    How to reproduce:
    Length of string 1000000
    Numbers checked, everything else unchecked.
    Generate randomstring.

    20 minutes and the program is still not finished,

    Expected result:
    That the program would be faster than me typing it manually =)

  2. Summary:
    Generated data is lost from the Log File each time a new instance is run.

    Steps to reproduce:
    1. Open a new instance of the data generator – This will create a new empty log file in the folder location.
    2. Tick the “Print Logs to File” checkbox.
    3. Generate a new string (the choice does not matter).
    4. Open the datageneratorlog.txt file and view the generated string.
    5. Close the data generator instance.
    6. Open a new instance of the data generator
    7. Open the datageneratorlog.txt file

    Failure Point:
    On opening the log file, all the previous data generator sessions are now erased from the file.

    Risk and Implication:
    If the user’s intention was to use this tool as a means of creating test data for future test sessions, they would need to recreate the data each time they use the tool. A unique file name, possibly with a date and time stamp in it, could prevent this.

    The formatting of the log file output also makes it difficult for the user to cut and paste the data to the desired target location. Each new entry is appended directly to the end of the previous one rather than written on a new line.
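
The two suggestions in this comment (a time-stamped file name and one entry per line) could be sketched roughly as follows. This is only an illustration: the class `LogFileSketch`, its method names and the file name pattern are hypothetical and are not taken from the tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogFileSketch {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Hypothetical naming scheme: one log file per session, named after its start time.
    static String timestampedName(LocalDateTime startedAt) {
        return "datageneratorlog-" + STAMP.format(startedAt) + ".txt";
    }

    // Appends one generated string on its own line. The file is created if it is
    // missing and is never truncated, so earlier entries survive.
    static void appendEntry(Path logFile, String generated) throws IOException {
        Files.write(logFile,
                (generated + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Either measure alone would address the data loss: a fresh name per session avoids overwriting, and opening in append mode keeps a single file growing across sessions.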

  3. Summary:
    The data generator prints extra characters on the Ascii table when writing the string to the log file.

    Steps to reproduce:
    1. Open a new instance of the data generator
    2. Tick the “Print Logs to File” checkbox.
    3. Generate a new “Get Ascii table” string
    4. Open the datageneratorlog.txt file and view the generated string.

    Failure Point:
    On opening the log file, extra data is added.

    “????????????????????????????????” is added to the output file. When the string is pasted into a local .txt file using the generator’s “copy string to clipboard”, these extra characters do not appear.

  4. While trying to invoke the app, I received an exception that said “A java exception has occured”. I’m using windows10. Don’t think it has anything to do with that though.

    BONUS BUG: Why is everything I’m typing here coming out in caps? It’s annoying… please correct that.

    1. What could be the reason behind such an error? Is there any way to check with someone else using a Win 10 machine to see if the same error occurs?

      And I have no idea why the typing is in caps, I have sent a change request 🙂 And in the meantime I will try to figure out why no one has reacted to this before :p

  5. 23.02.2016 at 07.17


    1. Open the link and start the download (save the file to C:private Files/Crappy little
    2. Download finished
    3. Click the “Open the file” button
    4. Error message: Java Virtual Machine Launcher / Could not find the main class: yhtest.tool.controller.Testtoolgui. / Program will exit

    Java Version 2011

  6. Was a bit too quick with posting yesterday…

    Generates random email without including a dot.

    Steps to reproduce:
    1. Set the “Length of string” to anything between 6-256.
    2. Press the “Generate Random Email” button.

    The random email address will sometimes be generated without a dot included.

    The validity check at most registrations requires the user to input a dot in front of the TLD part.

  7. Windows 7 machine:





    Running from the command line using java -jar crappy_little_datagenerator_v_1.0


    C:\WINDOWS\system32>java -jar C:\Users\jsteven2\Downloads\crappy_little_datagenerator_v_1.0.jar
    Exception in thread "main" java.lang.UnsupportedClassVersionError: yhtest/tool/controller/TestToolGUI : Unsupported major.minor version 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(Unknown Source)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$000(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    Could not find the main class: yhtest.tool.controller.TestToolGUI. Program will exit.

    Java version:
    C:\WINDOWS\system32>java -version
    java version "1.6.0_105"
    Java(TM) SE Runtime Environment (build 1.6.0_105-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 20.105-b03, mixed mode)

    1. Ahhh – Maybe you need to mention which version of Java it is compatible with! Works with 1.8 of the JRE

      Or, as a future enhancement, add a check for Java compatibility.
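
For context on the error above: class-file version 52.0 is the format written by a Java 8 compiler, which a 1.6 runtime cannot load. A compatibility check along the lines suggested here might look like the sketch below. The class `JavaVersionCheck` and its methods are hypothetical names, and the required version of 8 is taken from the "Works with 1.8" observation in this thread.

```java
final class JavaVersionCheck {
    // Class-file major version 52 corresponds to Java 8 (major minus 44).
    static int javaReleaseForClassMajor(int major) {
        return major - 44;
    }

    // Parses the "java.specification.version" property: "1.6" -> 6, "1.8" -> 8, "11" -> 11.
    static int parseSpecVersion(String spec) {
        String[] parts = spec.split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // Returns null when the running Java is new enough, otherwise a message for the user.
    static String checkMessage(String spec, int required) {
        int running = parseSpecVersion(spec);
        return running >= required
                ? null
                : "This tool needs Java " + required + " or later, but Java " + running + " is running.";
    }

    public static void main(String[] args) {
        String message = checkMessage(System.getProperty("java.specification.version"), 8);
        System.out.println(message == null ? "Java version OK" : message);
    }
}
```

One caveat: a check like this only helps if the class containing it is compiled for the oldest runtime it should report on. Otherwise the JVM rejects the class before the check ever runs, which is exactly the failure shown in the stack trace.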

  8. Some quick observations:

    No error check on the length of the string, or at least no message for the user.
    No error checking at all, or at least no messages to users.
    Selecting none of the check boxes: no error reported.
    It would be good to be able to choose the counterstring delimiter (sometimes * is no good, especially for number-only fields).
    Selecting international chars on their own does nothing.
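
As an illustration of the delimiter suggestion: a counterstring is a string in which each delimiter is preceded by the number of its own position, so a truncated string tells you how long it was. The sketch below builds one with a caller-chosen delimiter. It follows the commonly described counterstring technique and is an assumption about the intended behavior, not the tool's code. The class and method names are hypothetical.

```java
final class CounterString {
    // Builds the string from the end: the last delimiter sits at position `length`
    // and is preceded by that number, then the same rule is applied to what is left.
    static String counterString(int length, char delimiter) {
        StringBuilder reversed = new StringBuilder(Math.max(length, 0));
        int position = length;
        while (position > 0) {
            String token = position + String.valueOf(delimiter); // e.g. "10*"
            // Append the token back to front, stopping if the string is full.
            for (int i = token.length() - 1; i >= 0 && reversed.length() < length; i--) {
                reversed.append(token.charAt(i));
            }
            position = length - reversed.length();
        }
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(counterString(10, '*')); // prints *3*5*7*10*
    }
}
```

A non-digit delimiter is still needed for the positions to stay readable, so a number-only field would remain awkward to test with this technique whatever delimiter is chosen.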

  9. Another little quirk – If the DataGenerationLog.txt file is set to read only – the application does not start.

    Another enhancement would be the ability to set a name for the log file.

  10. OK, so completed an initial scout of the application, found some behaviours that *I* didn’t expect but to continue my investigation I need some information on the context.

    For starters:
    What is important to you in terms of this application?
    What is this application going to be used for?
    Who are the likely users of this application and how will they use it?
    Are there any areas or risks that you are particularly interested in?

    As this is a competition and competitions typically have rules…
    Do you want my observations in a particular format?
    Do you want a single report, observations as I find them or something else?

    1. Great! Finally someone is asking for the actual purpose of this application!

      This little data generator is intended to be used by testers, and maybe developers, who quickly want to generate test data. Since testers are the intended audience, it is important that they can trust the generated data and understand what is generated. I expect this tool to be beneficial when testing websites where the default maximum input length is 524288 characters (http://www.w3schools.com/tags/att_input_maxlength.asp), so strings much longer than that might be of less use. But occasionally a tester might need a large amount of test data, and the tester should be able to generate that with this tool.

      This competition does have rules, but since it is a testers’ competition it is also intended to simulate a real testing situation. Often there is more information available if one only looks or asks for it. This is the reason I am not adding this information to the blog post but leaving it here in the comment section. Anyone who spends a minimum of effort reading through the blog post and the comments will see this information. Anyone who jumps right into the testing might miss it.

      As this is a friendly challenge I simply want the observations and findings written in this comments section for everyone to see and in no particular format. But bundling at least a couple of observations might be nice 🙂

  11. After having read that it is important to trust and understand the data being generated, I focused on investigating the character distribution in the resulting string. I tested different strings with a length of 100000 characters to get some statistical significance. I found that characters that belong to two categories (e.g. the “@” character, which is included in both the “Special chars” and the “password friendly chars” categories) occur twice as often as other characters in the string. This might be intended behavior but could be confusing to some users (me included). It is a problem if you for some reason really need a truly randomly distributed string with no weighted characters.
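
One plausible explanation for this observation, offered as an assumption rather than a reading of the tool's code, is that the selected categories are simply concatenated into one pool, so a character listed in two categories gets two slots. The sketch below contrasts that with a de-duplicated pool. The class name and the example category strings are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class CharacterPool {
    // Naive pool: concatenation keeps duplicates, so a shared character such as '@'
    // occupies two slots and is drawn twice as often.
    static String concatenated(String... categories) {
        return String.join("", categories);
    }

    // De-duplicated pool: every distinct character gets exactly one slot.
    static String deduplicated(String... categories) {
        Set<Character> seen = new LinkedHashSet<>();
        for (String category : categories) {
            for (char c : category.toCharArray()) {
                seen.add(c);
            }
        }
        StringBuilder pool = new StringBuilder(seen.size());
        for (char c : seen) {
            pool.append(c);
        }
        return pool.toString();
    }

    // Counts how many slots a character occupies in a pool.
    static long occurrences(String pool, char c) {
        return pool.chars().filter(x -> x == c).count();
    }
}
```

With the hypothetical categories `"!?@#"` and `"_-!?@"`, the concatenated pool holds `@` twice while the de-duplicated pool holds it once.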

  12. Hello Martin ,

    I read the below statement.
    The challenge is to test the tool and report your findings in the comments field of this post.

    This statement does not make clear to me what information (findings) you need from the testing. Test against what criteria? Kindly prioritize what kind of finding is your immediate need.

    I need your help in understanding what an important finding is for
    1) the couple of others who are working to build this tool;
    2) you;
    3) HoT;
    4) the testing community.

    And let me know why the findings (information) you want from testing right now are critical.


  13. I consider the command line interface to be a bit difficult to access and once you manage to launch it it is rather clunky to use.

    There don’t seem to be any short versions of the commands. I haven’t been able to supply the commands as arguments rather than as manual input. There is no easy way of repeating the previous command (for instance, arrow up won’t do you any good).

    The command line interface seems to otherwise suffer from the same logical issues as the GUI version.

    It is sort of usable, but I wouldn’t call it useful, nor successful.

    Welcome to my Random Test Data Generator tool!

    You can use the following options:
    * randomstring
    Enter one or more of the following types in one word: U = Uppecase, L = Lowercase, D = Digits, I = International characters, P = sPace, S = Special Characters
    * counterString
    * email
    * clipBoard (turn copy to clipBoard On/Off
    * exit

    Copy to clipboard enabled [X]

    Enter your choice:

  14. It looks a lot like the team forgot to add international characters.
    There are characters other than SPACE which I would consider to be whitespace, but they are missing.
    Their lowercase characters alphabet seems to be a bit heavy on the O side. It is also a completely unnecessary hardcoded variable (how about just using the uppercaseone.tolower() or something along those lines?).
    Aren’t there more special characters than the ones provided by the tool?

    String uppercaseCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    String lowercaseCharacters = "abcdefghijklmnoopqrstuvwxyz";
    String digits = "0123456789";
    String internationalChars = "";
    String specialCharacters = " !\"#$%&'()* ,-./:;?@[\\]^_`{|}~\u00a1\u00a2\u00a3\u00a4\u00a5\u00a6\u00a7\u00a8\u00a9\u00aa\u00ab\u00ac\u00ad\u00ae\u00af\u00b0\u00b1\u00b2\u00b3\u00b4\u00b5\u00b6\u00b7\u00b8\u00b9\u00ba\u00bb\u00bc\u00bd\u00be\u00bf";
    String space = " ";
    String passwordSpecialCharachters = "_-!?@\u00a3\u20ac";
    String emailFriendlySpecialCharacters = "!#$%&'* -/=?^_`{|}~";
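
The "just derive it from the uppercase one" idea from this comment could be sketched like this. The class `Alphabets` and its constant names are hypothetical.

```java
import java.util.Locale;

final class Alphabets {
    static final String UPPERCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Derived rather than typed by hand, so a stray duplicate letter cannot creep in.
    // Locale.ROOT avoids locale-specific case rules (for example the Turkish dotless i).
    static final String LOWERCASE = UPPERCASE.toLowerCase(Locale.ROOT);
}
```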

  15. It is really rather tricky to format your replies to this posting when the input field defaults to CAPS no matter if caps is on or not.

  16. The code base does not seem to be very consistent in following one naming convention.
    What is up with:
    public String getPasswordFriendly() {
        return this.passwordSpecialCharachters;
    }

    In the code below?

    public String getUpperCase() {
        return this.uppercaseCharacters;
    }

    public String getLowercase() {
        return this.lowercaseCharacters;
    }

    public String getDigits() {
        return this.digits;
    }

    public String getSpace() {
        return this.space;
    }

    public String getPasswordFriendly() {
        return this.passwordSpecialCharachters;
    }

    public String getInternationa() {
        return this.internationalChars;
    }

    public String getSpecial() {
        return this.specialCharacters;
    }

  17. You have code that will never run. It doesn’t really do any harm right now but it’s not a good idea to build like this.

    public String generateEmail(int length) throws IOException {
        if (length < 6) {
            throw new IllegalArgumentException("An email cannot be less than 6 characters long");
        }
        Random randomNumber = new Random();
        if (length < 6) { // can never be true: the check above has already thrown
            length = 6;
        }

  18. Your e-mail generator sometimes won’t include an actual top-level domain at the end, and some of the top-level domains won’t be actual domains. There are also many top-level domains your generator won’t ever generate. I’d consider the entire e-mail generator broken by design.

  19. You handle exceptions related to logging to file differently at different locations throughout the code base. It might work at the moment, but it is a breeding ground for future issues as it makes your code base harder to maintain.

  20. Judging by the debug and output prints that the developer forgot to change when copy-pasting these method bodies, I’d suggest going over all of them and making the output format uniform and accurate to avoid confusion.

  21. Now, I am looking at a decompiled version of the code, so I’m not getting any comments, and it is possible that the dead whitespace I’m seeing is due to the decompiler. But combined with the coding style seen otherwise, I’m going to roll the dice and gamble that the whitespace was put there by the developer. I’d suggest keeping the code a bit tighter and limiting the dead whitespace to whatever your chosen coding style allows.

  22. Now to some low hanging GUI level fruit:
    * If you resize the window, the layout of the GUI does not respond to the changes. Consider making the GUI more responsive or locking the window size.
    * As it turns out, the big image at the bottom right of the GUI is a link to a website. There was no indicator of this until the image was clicked. Consider making it clear that the image is a link.
    * There are plenty of truncated labels in the GUI. Maybe consider shorter labels or a layout that allows longer labels?
    * What is up with the … in the GUI next to the dashed lines?
    * Get ascii-table takes a little while to generate the first time; what is up with that?
    * There is no mouse-over description for International characters (as it turns out by code inspection, because the actual variable is empty).
    * There is no message to the user when disallowed low numbers are used for e-mail generation.
    * If you generate something (which is allowed) and then generate something that is not allowed, the actual output is not changed, which could result in the user thinking he/she has a string formatted in a certain way when in fact it is the same string as was generated before the failed attempt. Potentially not great.
    * If you enter input that for some reason is not accepted and click random e-mail or counterstring, the length value is reset to 200 and the string is generated. No warning or error is given to the user. (Hitting random string results in nothing happening or updating in the UI, which is also interesting. Why don’t these buttons show the same behavior?)
    * If you enter a big number, but not too big 😉, you’ll make the application unresponsive for an unknown amount of time. It could possibly terminate eventually, but I didn’t bother sticking around for that long.

  23. For the record.

    I would have preferred to sit down with the developing team to talk about my findings and how I found them instead of trying to type it out in text.

  24. Let’s see what happens with the layout… copy/paste from Google Docs.

    Note: I didn’t test on my own computer, thus I didn’t install anything it didn’t yet have. I also wanted to gather notes, instead of waiting for replies to questions, while PO is not available. This means that the notes should be seen as points I would like to clarify and use to update my test strategy. BTW, I am not sure if this product is very important for him as his availability is very limited and all communication is directed to a blog post comment section. I suspect he just wanted to crowd test his application near freely. =P

    I didn’t have Java on the computer, so I focused on reading the code. (It’s a long time since I previously have worked with Java, so I might make mistakes in my interpretations.) Most of the things I mention would be rather simple to find by testing the tool via GUI, but because I am looking at the code, I also can give details to why certain things are potential problems. My testing will not notice e.g. environment, performance, and compatibility issues.

    The tool has various outputs, such as automatically copying to clipboard, writing to a file, and printing the test data on screen. Since I based my testing on reading the code, those might have problems I didn’t detect. An example of such a problem could be impossibility to write to DataGenerationLog.txt file with restricted user rights on a system where administrator has installed the application (or for example made the file write protected).

    The analysis of code focuses on 3 topics:
    1) Input/output processing
    2) Coding mistakes (no code analysis tools used)
    3) Logical errors

    This means I didn’t give a rat’s ass about maintainability etc. I also left most of the code base unread because I got hungry and went to make food. Additional notes and suggestions are provided for further testing and product improvements.


    boolean debug = false;
    I would like to see those texts somewhere so I know if something happened. Also, when processing a longer time, it would be nice to know if the application is still loading.

    String lowercaseCharacters = “abcdefghijklmnoopqrstuvwxyz”;
    “o” appears twice, so it will turn up more often than other letters in the generated string. With 27 entries in the alphabet, a random draw of 100 letters would contain “o” about 7.4 times and each other letter about 3.7 times.
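
The arithmetic behind those figures is 2/27 for “o” and 1/27 for every other letter, assuming each position of the alphabet string is equally likely to be picked. A small sketch that computes the expected share of each character in any pool (the class and method names are hypothetical):

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class ExpectedFrequency {
    // Expected share of each character when every position of the pool is equally likely.
    static Map<Character, Double> shares(String pool) {
        Map<Character, Double> shares = new TreeMap<>();
        for (char c : pool.toCharArray()) {
            shares.merge(c, 1.0 / pool.length(), Double::sum);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<Character, Double> s = shares("abcdefghijklmnoopqrstuvwxyz");
        System.out.println(String.format(Locale.ROOT,
                "o: %.1f per 100 letters, a: %.1f per 100 letters",
                s.get('o') * 100, s.get('a') * 100));
    }
}
```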

    String internationalChars = “”;
    There are no international characters listed, thus nothing will be generated with this.

    String specialCharacters includes soft hyphen (“u00ad”) which is not supported in ISO 8859-11 (Thai/Latin). This and some other observations (e.g. extended ASCII 128-159 / Microsoft Windows Latin-1) made me believe there might be encoding issues when using the tool with different systems/configurations.

    String space = ” “;
    The tester might want other kind of space too to be included in the test data. Testers are weird that way.

    public String generateEmail
    “if (length 1) …
    Firstly, nextInt(2) returns either 0 or 1, thus end of domain can never reach 3 characters.
    Secondly, because of the “>1” comparison, when nextInt returns 0, the email address will not have “.” and domain at all.
    Thirdly, when the “.” is missing, the length of the string will be 2 characters less than what user wanted.
    It could also be useful to notify the user when the email length is over 256 characters.
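
To illustrate the first two points: `Random.nextInt(2)` only ever yields 0 or 1, so a `> 1` test on its result can never be true. A minimal sketch of a generator that always emits the dot and the ending, and hits the requested length exactly, might look like this. The class and method names are hypothetical, the ending is random letters rather than a real top-level domain, and this is not the tool's actual code.

```java
import java.util.Random;

final class EmailSketch {
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";

    private static String letters(Random random, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(LETTERS.charAt(random.nextInt(LETTERS.length())));
        }
        return sb.toString();
    }

    // Layout: local@domain.ending, where the ending has 2 or 3 letters.
    static String generate(int length, Random random) {
        if (length < 6) {
            throw new IllegalArgumentException("An email cannot be less than 6 characters long");
        }
        // nextInt(2) yields 0 or 1, so add 2 to get an ending of 2 or 3 letters.
        int endingLength = Math.min(2 + random.nextInt(2), length - 4);
        int remaining = length - endingLength - 2;           // minus '@' and '.'
        int localLength = 1 + random.nextInt(remaining - 1); // leaves at least 1 for the domain
        int domainLength = remaining - localLength;
        return letters(random, localLength) + "@" + letters(random, domainLength)
                + "." + letters(random, endingLength);
    }
}
```

Computing the three part lengths up front, instead of deciding late whether to append the ending, is what keeps the total length equal to what the user asked for.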

    Note this: int defaultLength = 200;
    The “Generate Randomstring” button doesn’t have
    catch (NumberFormatException e1) like the other buttons that take in a string, thus I expect non-numeric inputs (e.g. “100b” or “25 6”) don’t produce anything for the user when trying to create a random string, but when generating random email or counter string, the string length is changed to 200 and that is used to generate the test data.
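
A sketch of how all three buttons could share one input check, so that inputs such as “100b” or “25 6” are reported to the user instead of being ignored or silently replaced by 200. The class `LengthInput` is hypothetical, and treating zero and negative numbers as invalid is an assumption made for this example.

```java
import java.util.OptionalInt;

final class LengthInput {
    // Returns the parsed length, or empty when the text is not a positive whole number.
    // The caller can then show a message instead of silently falling back to a default.
    static OptionalInt parse(String text) {
        try {
            int value = Integer.parseInt(text.trim()); // tolerate leading and trailing spaces
            return value > 0 ? OptionalInt.of(value) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }
}
```

Each button handler would call `LengthInput.parse(...)` and, on an empty result, clear the output area and display an error, which would also make the three buttons behave consistently.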

    Further testing

    As my oracles are limited, I suggest to apply tools in testing, such as
    – somacon.com/p525.php for checking how many times each character appears in a random string
    – charactercountonline.com for counting amount of characters
    – Excel for processing the data
    – Playing around with character sets / encodings

    Different Java versions and devices.

    Combinations for selections (gets difficult when reading code)

    Boundary values (e.g. the email address has different rules in code than in the tooltip text)

    Edit code to generate what you want 🙂

    Performance etc. as mentioned at the beginning with a computer running Java – would be nice to see how the GUI works, especially with multiple instances running at the same time

    Adjust based on what PO will reply to these notes

    Involve potential users if possible

    Go apeshit with automation

    Static code analyzer(s)

    Product improvements

    Typing mistakes all around, but maybe they are not important for the PO.

    The error message for not finding the external IP address is not very helpful.

    There are few to no (error) messages for the user.

    General notice of the log file: using for example CSV format could generate a nice file that is easy to play around with in Excel.

    The file might also grow really fast, which might not be wanted. Then again, it’s overwritten when the app is restarted.

    Version for mobile applications

  25. 1. Long loading times without feedback to the end-user (To reproduce: In the Length of string field enter 1000001 > no loading sign or any error message, but no other button on the GUI can be pressed. Tendency is to just close the app)
    2. Previous inputs are not reliably deleted from the results window (Enter 0 in the length of string field, Then click on Get my External IP > Click again on Generate Randomstring. Result: the previous IP is not deleted from the results window, where there should be some message showing the error like “please enter a number above x to generate a string”)
    3. No error message for invalid input in the length field (string)
    4. Even when input type is valid (number, ex 56), a leading space is not ignored and no output is shown. Same goes for a space at the end of a valid numerical input.
    5. Not all criteria are taken into account in the generation of the string (Enter 10 in the length field, then select Capital Chars, Numbers & Small Chars. My expectation was that the generated string includes all selected character types (“AND” rule), but on repeated clicks I got a string without numbers)
    6. International char option does not work at all (Steps: put 10 for length of string, then only select International Char) >> no result
    7. The Generate Random Email option does not always respect the number of chars selected: ex. leave 10 as the length of string number, then click on Generate Random Email repeatedly. The generated result does not always contain 10 chars as expected (ex results with 8 chars)
    8. The Generate Random Email option does not always produce a valid email format even when the input length is valid (below 256 chars) (examples from results with 10 as char length: dkhcu@qp, xppt@coj)
    9. Usability: the description of checkboxes is not readable, the end of words/sentences has …. I accidentally stumbled upon the tooltips some 20 minutes after I started using the tool
    10. The “about” description has several typos in it („after the contests is over“) :))
    11. Tested on a Mac, OS 10.10: the generated output cannot be copied via right mouse click, which is counterintuitive as I might just wanna copy the output and paste it into an input field in my field under test
    12. Inconsistencies: the tooltip of the Random email generator buttons states that a valid email address should not be above 256 chars, however a random email is generated when 300 chars are fed into the length of string field
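
Regarding item 5 in the list above: if the “AND” reading of the checkboxes is the intended one, a generator could guarantee one character from every selected category and fill the rest at random. The sketch below shows that idea. Whether the tool should behave this way is a requirements question, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class AllCategoriesString {
    // Builds a string of the given length containing at least one character
    // from every selected category (the "AND" reading of the checkboxes).
    static String generate(int length, List<String> categories, Random random) {
        if (categories.isEmpty() || length < categories.size()) {
            throw new IllegalArgumentException(
                    "Select at least one category and a length no smaller than the number of categories");
        }
        List<Character> chars = new ArrayList<>(length);
        StringBuilder pool = new StringBuilder();
        for (String category : categories) {
            if (category.isEmpty()) {
                throw new IllegalArgumentException("A selected category has no characters");
            }
            chars.add(category.charAt(random.nextInt(category.length()))); // one guaranteed pick
            pool.append(category);
        }
        while (chars.size() < length) {
            chars.add(pool.charAt(random.nextInt(pool.length())));
        }
        Collections.shuffle(chars, random); // so the guaranteed picks are not always first
        StringBuilder result = new StringBuilder(length);
        for (char c : chars) {
            result.append(c);
        }
        return result.toString();
    }
}
```

The explicit check for an empty category would also surface the empty international characters set (item 6) as an error instead of a silent no-op.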

  26. In order to type comments correctly:

    In Firefox – right click on the Comment field > inspect element > uncheck “text-transform: uppercase”

    Find how to do the same in whatever other browser you are using.


  27. My biggest issue is with the international chars checkbox.

    The assumption behind it is that i18n is the exception and not the norm.

    This is a real risk while thinking about software that gets textual input from any source, manually by humans or from files.

    Your generator should include i18n chars by default.

    Your testers and developers should see i18n chars all the time.

    I would add a special switch such as /disableinternational that will be used only in extreme cases.

  28. A bit late to the game, I see, but there are a few things that disturb me that haven’t been mentioned yet.

    In the jar, there’s extraneous libraries that aren’t being used (jUnit, hamcrest). Okay, minor problem, perhaps, but then, they are not mentioned anywhere. According to the license(s) under which they have been published, this makes it a possible copyright violation.

    A small functional thing: the length of string is potentially ambiguous. Is the program referring to the number of characters or the length in bytes? (In reality it is the number of chars, but that might not be what is expected.)

    I would have wanted a better command line interface, but that might not be in the requirements…

    1. Oops, ignore the comment on characters. I converted the output to a useful charset. In the output charset, the number is of course the same.

  29. Martin, you make some interesting statements about this tool. I’d like to focus on the trust part.

    What do you mean by ‘trust the generated data’? Who says testers need to be able to trust their data? The degree to which I need to be able to trust my data may vary wildly. In testing medical applications, sure, there is no room for mistakes with data. In testing an online calculator for hobby purposes, I guess the validity of the data you throw at it matters to a lesser degree. I feel the need to trust the data also depends on the ‘formality’ of the test results that are based on the data. Will the test results be scrutinized by auditors, for example?

    Why should I trust the data your tool generates? Is the application ‘certified’? 😉 Will you claim that this tool generates ‘trusted’ data? What are the requirements for this thing? Who created them? Have these requirements been tested? Can I look at the results?

    Especially with the ‘randomly’ generated data, at this point, and without further information about the algorithms that you use to generate the data, I wouldn’t trust this tool for the world. What’s out there already with regards to the creation of random data? For random data, I’d be more likely to use https://www.random.org/ just because they explain the process by which they generate their data.

    Best regards,


    1. Very good comments!

      And you are right that if there is a need for “truly” random data (or very close to it), this tool should probably not be trusted.

      For a user who might not care that the randomized strings are statistically proven to be random, what could be important to be able to trust?

      1. Hi Martin,

        Just off the top of my head, I think that I would really hate it if a string generated with this tool tripped up my tests (and possibly caused a false positive or negative) because the generated string was not what it ‘promised’ to be. A very basic example would be if I expected to generate a string consisting purely of letters and there were a number in there.

        There are quite some functions in your tool that generate strings for which one or the other standard has been defined. I’d expect the tool to adhere to those standards or at least be made aware that there is a deviation from the standard. Standards have been mentioned in the other comments, but not explored explicitly as a particular subject.

        Given that we would come to some sort of agreement about the purpose of this tool, and with the little information I have at the moment, I’d investigate claims first, standards second, comparable products thereafter. But this would probably take a considerable amount of time. Would that be acceptable to you? If not, why not?

        Anyway, fun exercise! Thanks!

        Best regards,


  31. Hi, I just landed on this page and had a quick look at the randomized data generated. The immediate thought that came to my mind is that, as the generated strings are very random, it might be difficult to analyze an observation made during a test.
    Generating random data is fine, but random data might be difficult or unfriendly to work with when things go wrong. For example, if I use a random email address of long length, say 100 characters, and it is not shown completely on the UI, several letters might be repeated, so a tester who wants to know exactly where it breaks might find that a little tough, especially when rerunning the test with strings of different lengths.
