Test Data Generator
Generate repeatable synthetic test data from field recipes and seeds, with CSV, JSON, NDJSON, SQL output and recipe quality checks.
Introduction
Test data has to look real enough to exercise software, but fake enough to avoid exposing people, customers, or internal business facts. A useful fixture preserves the shapes that matter to the target workflow: field names, dates, identifiers, optional blanks, category choices, quoting, and row volume.
Different testing jobs need different kinds of samples. A unit test may need only three rows with obvious values. An import smoke test needs enough fields, commas, quotes, nulls, and date values to stress parsing. A regression test needs the same rows again after a bug fix, while a demo seed needs coherent-looking values that do not borrow from real accounts.
Repeatability is the main reason to use a recipe and a seed instead of a fresh random list every time. If the row count, recipe, date range, and seed stay fixed, a failing check can be rerun with the same fixture. When those inputs drift, the failure may follow the data instead of the product change.
| Choice | Why It Matters | Common Mistake |
|---|---|---|
| Small fixture | Fast to review and good for focused unit or component checks. | Too few rows to reveal pagination, sorting, or bulk import problems. |
| Fixed seed | Keeps generated rows stable across repeated test runs. | Changing the seed while investigating a failure and losing comparability. |
| Reserved domain | Prevents email or URL values from pointing at real recipients or sites. | Using production-like domains that can leak or accidentally contact real systems. |
| Blank values | Exercises optional-field handling, null conversion, and import defaults. | Leaving every field populated and missing the failure path. |
Synthetic data is not the same thing as anonymized production data. A copied list of rare plan names, customer segments, internal IDs, or recognizable examples can still reveal business context. Safer fixtures are written intentionally, use reserved or internal test domains, and keep option lists reviewed before the rows leave a private workspace.
The best dataset is usually the smallest one that exercises the behavior under test. Five rows may be enough for a unit test, while a bulk import check may need hundreds. Stable recipes make failures reproducible; deliberately varied recipes help find edge cases after the baseline is understood.
How to Use This Tool:
Start with the field recipe, then set the row count and output shape that match the fixture, seed, import file, or sample payload you need.
- Write one active line per field in Field recipe using name:type:options. For example, customer_id:sequence:start=1001 creates a stable identifier, and plan:choice:Starter|Team|Enterprise limits a field to named choices.
- Use Browse recipe, drag-and-drop, Load sample, or Normalize when the recipe starts in a note, CSV header draft, or schema sketch. If the file warning appears, use a TXT, CSV, or recipe file under 128 KB.
- Set Rows between 1 and 500. Choose Output format as CSV for tabular imports, JSON array for structured fixtures, NDJSON for line-by-line processing, or SQL INSERT for database seed statements.
- When Output format is SQL INSERT, fill SQL table name and choose SQL dialect. The selected dialect controls identifier quoting for PostgreSQL, MySQL, SQLite, SQL Server, Oracle, Snowflake, BigQuery, or ANSI-style SQL.
- Open Advanced when you need fixed date limits, a non-routable email domain, nullable fields through Global blank rate, a different Default sequence start, a JSON wrapper through JSON root key, or CSV without a header row.
- Check the summary and the Quality Ledger. Fix blocking errors such as no active field lines or more than 24 parsed fields before copying, downloading, or exporting the generated data.
- Review Field Recipe, Synthetic Dataset, and Field Mix Chart before using the output in a test run. The first-row examples, blank count, seed, and field-type mix should match the behavior you intend to exercise.
Interpreting Results:
Synthetic dataset ready means the recipe parsed, the row count stayed within range, and the selected format produced text. It does not prove that the target importer, API contract, database constraints, or business rules will accept the rows.
Quality Ledger is the main place to check confidence. It reports row volume, parsed field coverage, repeatability, blank cells, artifact format, recipe warnings, and blocking errors. A clean ledger means the generator found no current recipe problem, not that the sample is statistically realistic or privacy safe.
| Result Area | Use It For | Verify Separately |
|---|---|---|
| Synthetic Dataset | The generated CSV, JSON, NDJSON, or SQL text | Import acceptance, schema constraints, and downstream validation |
| Field Recipe | Normalized field names, value types, options, blank rates, examples, and unique counts | Exact column names expected by the target workflow |
| Quality Ledger | Warnings, errors, blank count, row count, seed, and selected format | Whether the rows cover edge cases that matter to the test |
| Field Mix Chart | A quick count of generated field types | Whether each type has the right values for the scenario |
False confidence usually comes from neat rows that skip the failure path. If the target system must reject a duplicate email, invalid SKU, expired date, missing amount, or unknown status, add that case deliberately or keep a separate hand-written fixture for negative testing.
Technical Details:
A field recipe is a lightweight fixture schema. Each non-comment line becomes a field name, a value type, and optional settings. Blank lines and lines that begin with # are ignored, field names are normalized to identifier-style names, and duplicate names receive suffixes so both columns remain visible.
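The parsing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual parser: the function name parse_recipe, the dictionary layout, and the exact suffix format for duplicates (_2, _3, ...) are assumptions made for the example.

```python
def parse_recipe(text):
    """Split 'name:type:options' lines into field dicts (illustrative sketch)."""
    fields = []
    seen = {}  # how many times each field name has appeared so far
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        # Split on the first two colons only, so option text may contain ':'.
        parts = line.split(":", 2)
        name = parts[0].strip()
        ftype = parts[1].strip() if len(parts) > 1 else ""
        options = parts[2].strip() if len(parts) > 2 else ""
        # Assumed suffix scheme: the second 'email' becomes 'email_2'.
        seen[name] = seen.get(name, 0) + 1
        if seen[name] > 1:
            name = f"{name}_{seen[name]}"
        fields.append({"name": name, "type": ftype, "options": options})
    return fields
```

Calling parse_recipe on a recipe with two email lines keeps both fields and renames the second, which mirrors the duplicate handling described above.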
Repeatability comes from a deterministic pseudo-random sequence built from the seed text, the active recipe source lines, and the row count. Keeping those inputs unchanged recreates the same rows. Changing a field option, row count, or seed gives a different sequence, even when the visible field names look similar.
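One way to picture this is a generator whose starting state is derived from all three inputs. The sketch below is an assumption-laden stand-in: the tool does not document its hash or generator, so SHA-256 and Python's random.Random are used here only to show the idea, and make_rng is a hypothetical name.

```python
import hashlib
import random

def make_rng(seed_text, recipe_lines, row_count):
    """Derive a repeatable generator from seed, recipe lines, and row count.

    SHA-256 and random.Random are illustrative choices, not the tool's own.
    """
    material = "\n".join([seed_text, *recipe_lines, str(row_count)])
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer seed.
    return random.Random(int.from_bytes(digest[:8], "big"))
```

Two calls with identical inputs yield the same stream of values, while changing any one input (for example the row count) starts a different stream.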
Generated values come from built-in lists, field options, arithmetic ranges, and date ranges. That makes the rows useful for fixtures and smoke tests, but it does not infer production distributions, enforce database relationships, create secure tokens, or provide differential privacy guarantees.
Transformation Core:
| Stage | Rule | User Check |
|---|---|---|
| Parse recipe | Each active line is split into field name, type, and options; aliases and inferred types are normalized. | Field Recipe shows the parsed field, display type, options, blank rate, example, and unique count. |
| Validate bounds | At least one active field is required, and a recipe can contain up to 24 parsed fields. | Quality Ledger shows Recipe error rows for blocking failures. |
| Generate rows | The seeded sequence creates 1 to 500 rows from the current fields, date range, blank rate, sequence start, and field-level options. | The summary reports row count, field count, blank count, selected format, and seed. |
| Format artifact | The same row objects are rendered as CSV, JSON array, NDJSON, or SQL INSERT statements. | Synthetic Dataset shows the copy-ready artifact, while JSON shows the run record. |
Formula Core:
Sequence and numeric fields use simple deterministic arithmetic after the seeded random value is selected. In these calculations, i is the zero-based row position, r is a seeded random value from 0 up to but not including 1, and d is the decimal-place setting.
A sequence field with start=1001, step=5, and row position 3 produces 1016. An integer field with min=1 and max=100 can produce both 1 and 100 because the upper bound is inclusive. A money field defaults to two decimal places unless decimals= changes the precision.
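A reading of these rules that agrees with the figures above can be sketched in Python. The function names are hypothetical, and the exact integer mapping and rounding method are assumptions; only the sequence example (1016), the inclusive upper bound, and the two-decimal default come from the text.

```python
import math

def sequence_value(start, step, i):
    """Sequence field: start plus step for each zero-based row position i."""
    return start + i * step

def integer_value(lo, hi, r):
    """Integer field: map r in [0, 1) onto lo..hi with both ends reachable."""
    lo, hi = min(lo, hi), max(lo, hi)  # reversed endpoints are normalized
    # The +1 makes the upper bound inclusive; min() guards float edge cases.
    return min(hi, lo + math.floor(r * (hi - lo + 1)))

def money_value(lo, hi, r, d=2):
    """Money field: scale r into the range, then round to d decimal places."""
    lo, hi = min(lo, hi), max(lo, hi)
    return round(lo + r * (hi - lo), d)
```

For instance, sequence_value(1001, 5, 3) gives 1016, and integer_value(1, 100, r) returns 1 when r is 0 and 100 when r is just below 1.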
Format and Boundary Rules:
| Area | Rule | Boundary to Check |
|---|---|---|
| Supported value types | sequence, uuid, names, email, phone, company, city, country, dates, numbers, money, boolean, choice, word, sentence, SKU, URL, and IPv4 values are supported. | Unknown or alias-like types may be inferred from the field name, so review Recipe warning rows. |
| Rows and fields | The row count is bounded from 1 through 500, and parsed fields are bounded from 1 through 24. | Out-of-range field counts block output; row count is constrained to the supported range. |
| Blank values | Global and field-level blank rates apply to non-sequence fields before the type-specific value is generated. | Blanks render as empty CSV cells, null in JSON and NDJSON, and NULL in SQL. |
| Dates and decimals | Date and datetime values draw from the configured inclusive date range. Number and money fields use 0 through 6 decimal places. | Reversed date or numeric endpoints are normalized before values are drawn. |
| CSV and JSON | CSV output quotes cells containing commas, quotes, or line breaks. JSON array output can be wrapped with an optional root key, while NDJSON writes one JSON object per line. | CSV header presence is controlled by the header switch; NDJSON is not wrapped by the JSON root key. |
| SQL INSERT | SQL output writes one INSERT statement per generated row and quotes identifiers according to the selected dialect. | It does not create tables, infer column types, add indexes, or validate foreign keys. |
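The SQL INSERT row can be illustrated with a short Python sketch. The quoting characters are the conventional ones for each dialect; the dialect keys, function names, boolean rendering, and sample row are assumptions made for the example, not the tool's documented behavior.

```python
import re

# Conventional identifier quote characters per dialect (open, close).
QUOTES = {
    "postgresql": ('"', '"'), "sqlite": ('"', '"'), "oracle": ('"', '"'),
    "snowflake": ('"', '"'), "ansi": ('"', '"'),
    "mysql": ("`", "`"), "bigquery": ("`", "`"),
    "sqlserver": ("[", "]"),
}

def quote_ident(name, dialect):
    """Wrap an identifier-style name in the dialect's quote characters."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"not an identifier-style name: {name!r}")
    open_q, close_q = QUOTES[dialect]
    return f"{open_q}{name}{close_q}"

def sql_literal(value):
    """Render one cell; blanks (None) become NULL."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # Assumption: TRUE/FALSE literals; some dialects expect 1/0 instead.
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # double single quotes

def insert_statement(table, row, dialect="postgresql"):
    """Build one INSERT statement for one generated row."""
    cols = ", ".join(quote_ident(c, dialect) for c in row)
    vals = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {quote_ident(table, dialect)} ({cols}) VALUES ({vals});"
```

Nothing here creates the table or checks types, which matches the boundary in the last column: the statements only make sense against a table that already exists with compatible columns.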
Because the random stream is deterministic, it is useful for reproducible fixtures and unsuitable for secrets, passwords, security tokens, or sampling claims. Treat generated rows as test artifacts that still need schema checks and human review before they are shared.
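The inclusive date range and endpoint normalization from the table above can be sketched as follows. The function name date_value and the day-level granularity are assumptions for illustration; r is the seeded random value from 0 up to but not including 1.

```python
from datetime import date, timedelta

def date_value(start, end, r):
    """Pick a date from an inclusive range using r in [0, 1)."""
    if start > end:
        start, end = end, start  # reversed endpoints are normalized
    span_days = (end - start).days + 1  # +1 so the end date is reachable
    offset = min(span_days - 1, int(r * span_days))
    return start + timedelta(days=offset)
```

With a range of 1 January to 31 January, an r of 0 returns the first day and an r close to 1 returns the last, regardless of the order in which the two limits were entered.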
Accuracy and Privacy Notes:
Fabricated rows can reduce exposure to real records, but they are not automatically private or representative. Built-in names, locations, companies, and words are convenience lists for fixtures, not a statistical model of a population.
- Do not paste production records into the recipe. Field names, rare category labels, and example option values can reveal internal business details.
- Use reserved or internal domains for email and URL fields, such as domains ending in .test, when generated contact values should never point at real recipients.
- Recipe files are read into the page for generation, and the visible workflow does not submit recipe text or generated rows for a separate generation step.
- Do not treat these rows as anonymized production data. Differential privacy requires specific algorithms and guarantees that recipe-based fixture generation does not provide.
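A quick guard for the reserved-domain advice above might look like the sketch below. The reserved names come from RFC 2606 (the .test, .example, .invalid, and .localhost top-level names, plus example.com, example.net, and example.org); the function name is hypothetical, and internal test domains would need to be added to the sets by hand.

```python
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}
RESERVED_DOMAINS = {"example.com", "example.net", "example.org"}

def is_reserved_domain(domain):
    """Return True if the domain falls under a name reserved by RFC 2606."""
    d = domain.strip().lower().rstrip(".")
    if d.split(".")[-1] in RESERVED_TLDS:
        return True
    # Match the reserved second-level names and any of their subdomains.
    return any(d == r or d.endswith("." + r) for r in RESERVED_DOMAINS)
```

Running a check like this over the email domain and any URL options before generating rows helps catch a production-like domain before it reaches a shared fixture.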
Worked Examples:
Customer Import Smoke Test:
A recipe with customer_id:sequence:start=1001, email:email, plan:choice:Starter|Team|Enterprise, monthly_spend:money:min=12,max=480,decimals=2, and signup_date:date can create a 12-row CSV import sample. Quality Ledger should show Row volume as 12 rows, Repeatability with the selected seed, and Artifact format as CSV.
Line-Oriented API Fixture:
For a log or stream consumer, choose NDJSON and use fields such as event_id:uuid, account_id:sequence:start=2000, event_type:choice:created|updated|cancelled, and created_at:datetime. Synthetic Dataset should show one JSON object per line, and Field Recipe should show the first-row example for each field.
SQL Seed with Nullable Fields:
Select SQL INSERT for a table named customers and define id:sequence:start=5000, company:company, support_score:integer:min=1,max=100,blank=15, and active:boolean. When nullable values appear, Blank cells reports the count and the SQL text writes NULL for those cells.
Duplicate Name Cleanup:
If two active recipe lines both begin with email, the parser keeps both by adding a suffix to the duplicate field. Quality Ledger adds a Recipe warning, and Field Recipe shows the normalized names. Rename the fields yourself when an importer expects exact column names.
Blocked Recipe Boundary:
A pasted recipe with no active field lines changes the summary to Recipe needs attention and shows Add at least one field line before generating data. A recipe with more than 24 parsed fields also blocks output. Reduce the field list or split the fixture, then confirm that Synthetic dataset ready returns.
FAQ:
Why did the same seed create different rows?
The seed is combined with the active recipe lines and Rows. Keep Field recipe, Rows, and Seed unchanged when you need the exact same dataset again.
Why did my field name change?
Field names are normalized to identifier-style names. Spaces and punctuation become underscores, names that start unsafely are adjusted, and duplicate names receive numeric suffixes.
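As a rough illustration of that normalization, the Python sketch below replaces runs of spaces and punctuation with underscores and adjusts names that start with a digit. The exact rules are assumptions: the f_ prefix, the fallback name field, and the choice to preserve letter case are invented for the example and may differ from what the tool does.

```python
import re

def normalize_name(raw):
    """Turn a free-form label into an identifier-style field name (sketch)."""
    # Runs of anything other than letters, digits, or underscores become '_'.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", raw.strip())
    name = name.strip("_") or "field"  # assumed fallback for empty results
    if name[0].isdigit():
        name = "f_" + name  # assumed fix for names that start with a digit
    return name
```

Under these assumptions, a label such as "Signup Date" becomes Signup_Date and "2nd-email" becomes f_2nd_email.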
What happens to blank values?
The global blank rate applies to non-sequence fields unless a field line sets blank=. Blanks become empty CSV cells, null in JSON and NDJSON, and NULL in SQL.
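The CSV and JSON side of that answer can be sketched with Python's standard csv and json modules. The function names and the sample row are assumptions; the sketch only shows how one blank cell (represented as None) could surface in each text format.

```python
import csv
import io
import json

def to_csv(rows, header=True):
    """Render rows as CSV; blanks become empty cells."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only cells with commas, quotes, or line breaks.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    if header:
        writer.writerow(list(rows[0].keys()))
    for row in rows:
        writer.writerow(["" if v is None else v for v in row.values()])
    return buf.getvalue()

def to_ndjson(rows):
    """Render one JSON object per line; blanks become null."""
    return "\n".join(json.dumps(row) for row in rows)

def to_json(rows, root_key=None):
    """Render a JSON array, optionally wrapped under a root key."""
    return json.dumps({root_key: rows} if root_key else rows)
```

The same row objects feed all three functions, so a blank that shows as an empty cell in CSV is the same blank that shows as null in JSON and NDJSON.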
Can SQL mode create my table?
No. SQL mode writes INSERT statements only. It does not create tables, choose column types, add constraints, or check relationships against a live database.
Can I use generated UUIDs or strings as secrets?
No. The generator is deterministic so fixtures can be repeated. Do not use its UUIDs, words, SKUs, or other values as passwords, security tokens, or production identifiers.
Does recipe file content get uploaded for generation?
Recipe files are read into the page, and generated values are produced in the page. The visible workflow does not submit recipe text or generated rows for a separate generation step.
Glossary:
- Field recipe: A line-based description of generated fields using a field name, a type, and optional settings.
- Seed: Text that helps make generated rows repeatable when the recipe and row count stay the same.
- Fixture: A small dataset used for tests, imports, demos, seed data, or staging workflows.
- NDJSON: Newline-delimited JSON, where each generated row is written as one JSON object per line.
- Blank rate: The percentage chance that a non-sequence generated field becomes blank for a row.
- SQL dialect: The selected SQL style used for identifier quoting in generated INSERT statements.
- Quality Ledger: The result table that reports row volume, field coverage, repeatability, blanks, format, warnings, and errors.
References:
- Guidelines for Evaluating Differential Privacy Guarantees, NIST, March 2025.
- RFC 2606: Reserved Top Level DNS Names, IETF, June 1999.
- RFC 4180: Common Format and MIME Type for CSV Files, IETF, October 2005.
- RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format, IETF, December 2017.