Test Data Generator

Field recipe:

Define one field per line as name:type:options. Types include sequence, uuid, name, email, city, date, integer, number, money, boolean, choice, sku, url, and word.

Drop TXT, CSV, or recipe text onto the textarea.

Rows:

Use enough rows to test pagination, import validation, and common empty-state handling.

Output format:

Choose the file shape you want to paste into tests, fixtures, seeds, or import tools.

SQL table name:

Use the destination fixture, seed, or staging table name.

SQL dialect:

PostgreSQL and SQLite use double quotes; MySQL uses backticks.

Seed:

Change this when you want a different but repeatable dataset.


Date range:

Use fixed dates so repeated seeds stay stable over time.

Email domain:

Use reserved or internal test domains rather than production customer domains.

Global blank rate:

Keep at 0 for clean fixtures; use small rates to test optional-field handling.

Default sequence start:

Field-level options such as id:sequence:start=1001,step=5 override this.

JSON root key:

Only affects JSON array output. NDJSON stays one JSON object per line.

Include CSV header row

Only affects CSV output. Other formats always include field names.

Test data has to look real enough to exercise software, but fake enough to avoid exposing people, customers, or internal business facts. A useful fixture preserves the shapes that matter to the target workflow: field names, dates, identifiers, optional blanks, category choices, quoting, and row volume.

Different testing jobs need different kinds of samples. A unit test may need only three rows with obvious values. An import smoke test needs enough fields, commas, quotes, nulls, and date values to stress parsing. A regression test needs the same rows again after a bug fix, while a demo seed needs coherent-looking values that do not borrow from real accounts.

Repeatability is the main reason to use a recipe and a seed instead of a fresh random list every time. If the row count, recipe, date range, and seed stay fixed, a failing check can be rerun with the same fixture. When those inputs drift, the failure may follow the data instead of the product change.

Common synthetic test data choices and the risk each one controls
Choice	Why It Matters	Common Mistake
Small fixture	Fast to review and good for focused unit or component checks.	Too few rows to reveal pagination, sorting, or bulk import problems.
Fixed seed	Keeps generated rows stable across repeated test runs.	Changing the seed while investigating a failure and losing comparability.
Reserved domain	Prevents email or URL values from pointing at real recipients or sites.	Using production-like domains that can leak or accidentally contact real systems.
Blank values	Exercises optional-field handling, null conversion, and import defaults.	Leaving every field populated and missing the failure path.

Figure: Field recipe, seed, and row count flowing into a repeatable synthetic dataset with quality checks

Synthetic data is not the same thing as anonymized production data. A copied list of rare plan names, customer segments, internal IDs, or recognizable examples can still reveal business context. Safer fixtures are written intentionally, use reserved or internal test domains, and have their option lists reviewed before the rows leave a private workspace.

The best dataset is usually the smallest one that exercises the behavior under test. Five rows may be enough for a unit test, while a bulk import check may need hundreds. Stable recipes make failures reproducible; deliberately varied recipes help find edge cases after the baseline is understood.

How to Use This Tool:

Start with the field recipe, then set the row count and output shape that match the fixture, seed, import file, or sample payload you need.

Write one active line per field in Field recipe using name:type:options. For example, customer_id:sequence:start=1001 creates a stable identifier, and plan:choice:Starter|Team|Enterprise limits a field to named choices.
Use Browse recipe, drag-and-drop, Load sample, or Normalize when the recipe starts in a note, CSV header draft, or schema sketch. If the file warning appears, use a TXT, CSV, or recipe file under 128 KB.
Set Rows between 1 and 500. Choose Output format as CSV for tabular imports, JSON array for structured fixtures, NDJSON for line-by-line processing, or SQL INSERT for database seed statements.
When Output format is SQL INSERT, fill SQL table name and choose SQL dialect. The selected dialect controls identifier quoting for PostgreSQL, MySQL, SQLite, SQL Server, Oracle, Snowflake, BigQuery, or ANSI-style SQL.
Open Advanced when you need fixed date limits, a non-routable email domain, nullable fields through Global blank rate, a different Default sequence start, a JSON wrapper through JSON root key, or CSV without a header row.
Check the summary and the Quality Ledger. Fix blocking errors such as no active field lines or more than 24 parsed fields before copying, downloading, or exporting the generated data.
Review Field Recipe, Synthetic Dataset, and Field Mix Chart before using the output in a test run. The first-row examples, blank count, seed, and field-type mix should match the behavior you intend to exercise.

Interpreting Results:

"Synthetic dataset ready" means the recipe parsed, the row count stayed within range, and the selected format produced text. It does not prove that the target importer, API contract, database constraints, or business rules will accept the rows.

Quality Ledger is the main place to check confidence. It reports row volume, parsed field coverage, repeatability, blank cells, artifact format, recipe warnings, and blocking errors. A clean ledger means the generator found no current recipe problem, not that the sample is statistically realistic or privacy safe.

Test data result areas and checks to make before using generated rows
Result Area	Use It For	Verify Separately
`Synthetic Dataset`	The generated CSV, JSON, NDJSON, or SQL text	Import acceptance, schema constraints, and downstream validation
`Field Recipe`	Normalized field names, value types, options, blank rates, examples, and unique counts	Exact column names expected by the target workflow
`Quality Ledger`	Warnings, errors, blank count, row count, seed, and selected format	Whether the rows cover edge cases that matter to the test
`Field Mix Chart`	A quick count of generated field types	Whether each type has the right values for the scenario

False confidence usually comes from neat rows that skip the failure path. If the target system must reject a duplicate email, invalid SKU, expired date, missing amount, or unknown status, add that case deliberately or keep a separate hand-written fixture for negative testing.

Technical Details:

A field recipe is a lightweight fixture schema. Each non-comment line becomes a field name, a value type, and optional settings. Blank lines and lines that begin with # are ignored, field names are normalized to identifier-style names, and duplicate names receive suffixes so both columns remain visible.
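
The parsing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual parser: the function names `normalize_name` and `parse_recipe`, the exact normalization regex, the `f_` prefix for names that start with a digit, and the `_2`, `_3` suffix style are all assumptions made for the example. Type inference and alias handling are left out.

```python
import re


def normalize_name(raw: str) -> str:
    """Turn arbitrary text into an identifier-style name (assumed rules)."""
    name = re.sub(r"[^0-9A-Za-z_]+", "_", raw.strip()).strip("_")
    if not name:
        return "field"          # hypothetical fallback for an empty name
    if name[0].isdigit():
        name = "f_" + name      # hypothetical fix for names that start unsafely
    return name


def parse_recipe(text: str) -> list[dict]:
    """Split each active line into name, type, and options."""
    fields, seen = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue            # blank lines and comment lines are ignored
        parts = line.split(":", 2)
        name = normalize_name(parts[0])
        ftype = parts[1].strip().lower() if len(parts) > 1 else ""
        options = parts[2].strip() if len(parts) > 2 else ""
        seen[name] = seen.get(name, 0) + 1
        if seen[name] > 1:
            name = f"{name}_{seen[name]}"   # keep duplicates visible with a suffix
        fields.append({"name": name, "type": ftype, "options": options})
    return fields
```

Splitting on at most two colons keeps option text such as `Starter|Team|Enterprise` or `start=1001,step=5` intact as a single string for a later options parser.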

Repeatability comes from a deterministic pseudo-random sequence built from the seed text, the active recipe source lines, and the row count. Keeping those inputs unchanged recreates the same rows. Changing a field option, row count, or seed gives a different sequence, even when the visible field names look similar.
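
One way to picture this kind of repeatability is to hash the three inputs into a single number and use it to seed a pseudo-random generator. The sketch below is an assumption-heavy stand-in: the tool's real hash and generator are not documented here, so SHA-256 and Python's `random.Random` are used purely for illustration, and `make_rng` is a hypothetical name.

```python
import hashlib
import random


def make_rng(seed_text: str, recipe_lines: list[str], row_count: int) -> random.Random:
    """Derive a repeatable generator from seed, active recipe lines, and row count."""
    material = "\n".join([seed_text, *recipe_lines, str(row_count)])
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as the integer seed (illustrative choice).
    return random.Random(int.from_bytes(digest[:8], "big"))


lines = ["id:sequence:start=1", "email:email"]
first = make_rng("demo", lines, 12)
again = make_rng("demo", lines, 12)
other = make_rng("demo", lines, 13)   # only the row count differs
```

Here `first` and `again` yield identical streams, while `other` diverges because the row count is part of the seed material. Such a generator is predictable by design, which is what makes it suitable for fixtures.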

Generated values come from built-in lists, field options, arithmetic ranges, and date ranges. That makes the rows useful for fixtures and smoke tests, but it does not infer production distributions, enforce database relationships, create secure tokens, or provide differential privacy guarantees.

Transformation Core:

How a field recipe becomes a generated synthetic dataset
Stage	Rule	User Check
Parse recipe	Each active line is split into field name, type, and options; aliases and inferred types are normalized.	`Field Recipe` shows the parsed field, display type, options, blank rate, example, and unique count.
Validate bounds	At least one active field is required, and a recipe can contain up to 24 parsed fields.	`Quality Ledger` shows `Recipe error` rows for blocking failures.
Generate rows	The seeded sequence creates 1 to 500 rows from the current fields, date range, blank rate, sequence start, and field-level options.	The summary reports row count, field count, blank count, selected format, and seed.
Format artifact	The same row objects are rendered as CSV, JSON array, NDJSON, or SQL INSERT statements.	`Synthetic Dataset` shows the copy-ready artifact, while `JSON` shows the run record.

Formula Core:

Sequence and numeric fields use simple deterministic arithmetic. In the formulas below, i is the zero-based row position, r is a seeded random value from 0 up to but not including 1, low and high are the min= and max= options, and d is the decimal-place setting. Sequence values depend only on i; integer and decimal values are computed after r is drawn.

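\begin{array}{lcl} \mathrm{Sequence}_{i} & = & start + i \times step \\ \mathrm{Integer} & = & \lfloor low + r \times (high - low + 1) \rfloor \\ \mathrm{Decimal} & = & \frac{\operatorname{round}\left( (low + r \times (high - low)) \times 10^{d} \right)}{10^{d}} \end{array}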

A sequence field with start=1001, step=5, and row position 3 produces 1016. An integer field with min=1 and max=100 can produce both 1 and 100 because the upper bound is inclusive. A money field defaults to two decimal places unless decimals= changes the precision.
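
The arithmetic in this section can be checked with a short Python sketch. The sequence and integer rules follow the text directly. The decimal rule is an assumption: it scales the drawn value by 10^d, rounds, and divides back, which is a common way to fix decimal places but is not confirmed as the tool's exact method. The function names are hypothetical.

```python
import math


def sequence_value(i: int, start: int = 1, step: int = 1) -> int:
    """Row i (zero-based) of a sequence field."""
    return start + i * step


def integer_value(r: float, low: int, high: int) -> int:
    """Map r in [0, 1) onto the inclusive integer range low..high."""
    return math.floor(low + r * (high - low + 1))


def decimal_value(r: float, low: float, high: float, d: int = 2) -> float:
    """Assumed rule: scale by 10**d, round, then divide back."""
    scale = 10 ** d
    return round((low + r * (high - low)) * scale) / scale


print(sequence_value(3, start=1001, step=5))   # 1016, matching the example above
print(integer_value(0.0, 1, 100))              # 1, the lower bound
print(integer_value(0.999999, 1, 100))         # 100, the inclusive upper bound
```

The `+ 1` inside the integer rule is what makes the upper bound reachable: without it, a value of r just below 1 would floor to `high - 1`.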

Format and Boundary Rules:

Supported output formats and generator boundary rules
Area	Rule	Boundary to Check
Supported value types	`sequence`, `uuid`, names, email, phone, company, city, country, dates, numbers, money, boolean, choice, word, sentence, SKU, URL, and IPv4 values are supported.	Unknown or alias-like types may be inferred from the field name, so review `Recipe warning` rows.
Rows and fields	The row count is bounded from 1 through 500, and parsed fields are bounded from 1 through 24.	Out-of-range field counts block output; row count is constrained to the supported range.
Blank values	Global and field-level blank rates apply to non-sequence fields before the type-specific value is generated.	Blanks render as empty CSV cells, `null` in JSON and NDJSON, and `NULL` in SQL.
Dates and decimals	Date and datetime values draw from the configured inclusive date range. Number and money fields use 0 through 6 decimal places.	Reversed date or numeric endpoints are normalized before values are drawn.
CSV and JSON	CSV output quotes cells containing commas, quotes, or line breaks. JSON array output can be wrapped with an optional root key, while NDJSON writes one JSON object per line.	CSV header presence is controlled by the header switch; NDJSON is not wrapped by the JSON root key.
SQL INSERT	SQL output writes one INSERT statement per generated row and quotes identifiers according to the selected dialect.	It does not create tables, infer column types, add indexes, or validate foreign keys.
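
The CSV, JSON, and NDJSON rules in the table map closely onto Python's standard `csv` and `json` modules, so a rough sketch of the rendering step is short. The function names and the `rows` shape (a list of dicts with `None` for blanks) are assumptions for illustration; the tool's own renderer may differ in details such as indentation or line endings.

```python
import csv
import io
import json


def render_csv(rows: list[dict], header: bool = True) -> str:
    """Render rows as CSV; assumes at least one row. None becomes an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    if header:
        writer.writeheader()
    writer.writerows(rows)   # cells with commas or quotes are quoted automatically
    return buf.getvalue()


def render_ndjson(rows: list[dict]) -> str:
    """One JSON object per line; the root key never applies here."""
    return "\n".join(json.dumps(row) for row in rows)


def render_json(rows: list[dict], root_key: str = "") -> str:
    """JSON array, optionally wrapped in an object under root_key."""
    payload = {root_key: rows} if root_key else rows
    return json.dumps(payload, indent=2)


rows = [{"id": 1, "name": 'Ada "Test", Jr.', "score": None}]
print(render_csv(rows))
print(render_ndjson(rows))
```

The same row objects feed every renderer, so a blank shows up as an empty CSV cell in one output and as `null` in the JSON outputs without any change to the generated data.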

Because the random stream is deterministic, it is useful for reproducible fixtures and unsuitable for secrets, passwords, security tokens, or sampling claims. Treat generated rows as test artifacts that still need schema checks and human review before they are shared.

Accuracy and Privacy Notes:

Fabricated rows can reduce exposure to real records, but they are not automatically private or representative. Built-in names, locations, companies, and words are convenience lists for fixtures, not a statistical model of a population.

Do not paste production records into the recipe. Field names, rare category labels, and example option values can reveal internal business details.
Use reserved or internal domains for email and URL fields, such as domains ending in .test, when generated contact values should never point at real recipients.
Recipe files are read into the page and rows are generated there; the visible workflow does not send recipe text or generated rows elsewhere for generation.
Do not treat these rows as anonymized production data. Differential privacy requires specific algorithms and guarantees that recipe-based fixture generation does not provide.
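
A quick guard for the reserved-domain advice can be written as a small check before a fixture is shared. The sketch below covers only the names reserved by RFC 2606: the top-level domains .test, .example, .invalid, and .localhost, plus example.com, example.net, and example.org. The function name `is_reserved_domain` is hypothetical, and internal test domains would need their own allowlist.

```python
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}
RESERVED_DOMAINS = {"example.com", "example.net", "example.org"}


def is_reserved_domain(domain: str) -> bool:
    """Return True when the domain falls under an RFC 2606 reserved name."""
    cleaned = domain.strip().lower().rstrip(".")
    if not cleaned:
        return False
    labels = cleaned.split(".")
    if labels[-1] in RESERVED_TLDS:
        return True
    # Subdomains of example.com / .net / .org are also covered.
    return ".".join(labels[-2:]) in RESERVED_DOMAINS


for candidate in ["qa.test", "mail.example.com", "gmail.com"]:
    print(candidate, is_reserved_domain(candidate))
```

Running a check like this over generated email and URL columns is a cheap way to catch a recipe that was accidentally pointed at a real domain.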

Worked Examples:

Customer Import Smoke Test:

A recipe with customer_id:sequence:start=1001, email:email, plan:choice:Starter|Team|Enterprise, monthly_spend:money:min=12,max=480,decimals=2, and signup_date:date can create a 12-row CSV import sample. Quality Ledger should show Row volume as 12 rows, Repeatability with the selected seed, and Artifact format as CSV.

Line-Oriented API Fixture:

For a log or stream consumer, choose NDJSON and use fields such as event_id:uuid, account_id:sequence:start=2000, event_type:choice:created|updated|cancelled, and created_at:datetime. Synthetic Dataset should show one JSON object per line, and Field Recipe should show the first-row example for each field.

SQL Seed with Nullable Fields:

Select SQL INSERT for a table named customers and define id:sequence:start=5000, company:company, support_score:integer:min=1,max=100,blank=15, and active:boolean. When nullable values appear, Blank cells reports the count and the SQL text writes NULL for those cells.
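
To make the SQL behavior concrete, here is a hedged sketch of how one row could become an INSERT statement. Only the PostgreSQL, SQLite, and MySQL quoting styles come from this page; treating every other dialect as double-quoted, rendering booleans as TRUE and FALSE, and the function names themselves are assumptions for the example.

```python
def quote_ident(name: str, dialect: str) -> str:
    """Quote a table or column name for the chosen dialect."""
    if dialect == "mysql":
        return "`" + name.replace("`", "``") + "`"
    # PostgreSQL and SQLite use double quotes; other dialects are assumed to match.
    return '"' + name.replace('"', '""') + '"'


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal; None becomes NULL."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"   # assumed boolean rendering
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def insert_statement(table: str, row: dict, dialect: str = "postgresql") -> str:
    """Build one INSERT statement for one generated row."""
    cols = ", ".join(quote_ident(c, dialect) for c in row)
    vals = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {quote_ident(table, dialect)} ({cols}) VALUES ({vals});"


row = {"id": 5000, "company": "O'Neil Labs", "support_score": None, "active": True}
print(insert_statement("customers", row, "postgresql"))
print(insert_statement("customers", row, "mysql"))
```

The boolean check comes before the numeric check because Python treats `bool` as a subclass of `int`; reversing the order would render `True` as the text `True` through `str(value)` instead of the intended SQL keyword.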

Duplicate Name Cleanup:

If two active recipe lines both begin with email, the parser keeps both by adding a suffix to the duplicate field. Quality Ledger adds a Recipe warning, and Field Recipe shows the normalized names. Rename the fields yourself when an importer expects exact column names.

Blocked Recipe Boundary:

A pasted recipe with no active field lines changes the summary to "Recipe needs attention" and shows the message "Add at least one field line before generating data." A recipe with more than 24 parsed fields also blocks output. Reduce the field list or split the fixture, then confirm that "Synthetic dataset ready" returns.
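
The two boundary behaviors differ: field counts outside 1 through 24 block output, while the row count is pulled back into 1 through 500. A minimal sketch of that distinction follows. The function name `check_bounds` and the wording of the over-limit message are hypothetical; only the empty-recipe message is quoted from this page.

```python
MAX_FIELDS = 24
MIN_ROWS, MAX_ROWS = 1, 500


def check_bounds(field_count: int, requested_rows: int) -> tuple[int, list[str]]:
    """Return the constrained row count and any blocking recipe errors."""
    errors = []
    if field_count < 1:
        errors.append("Add at least one field line before generating data.")
    elif field_count > MAX_FIELDS:
        # Hypothetical wording; the tool's actual message is not shown on this page.
        errors.append(f"Recipe has {field_count} fields; the limit is {MAX_FIELDS}.")
    rows = max(MIN_ROWS, min(MAX_ROWS, requested_rows))
    return rows, errors


print(check_bounds(0, 12))     # blocked: no active fields
print(check_bounds(25, 9999))  # blocked: too many fields, rows constrained to 500
print(check_bounds(5, 0))      # allowed: rows constrained up to 1
```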

FAQ:

Why did the same seed create different rows?

The seed is combined with the active recipe lines and Rows, so changing any recipe line or the row count produces different rows even when Seed is unchanged. Keep Field recipe, Rows, and Seed unchanged when you need the exact same dataset again.

Why did my field name change?

Field names are normalized to identifier-style names. Spaces and punctuation become underscores, names that start unsafely are adjusted, and duplicate names receive numeric suffixes.

What happens to blank values?

The global blank rate applies to non-sequence fields unless a field line sets blank=. Blanks become empty CSV cells, null in JSON and NDJSON, and NULL in SQL.

Can SQL mode create my table?

No. SQL mode writes INSERT statements only. It does not create tables, choose column types, add constraints, or check relationships against a live database.

Can I use generated UUIDs or strings as secrets?

No. The generator is deterministic so fixtures can be repeated. Do not use its UUIDs, words, SKUs, or other values as passwords, security tokens, or production identifiers.

Does recipe file content get uploaded for generation?

Recipe files are read into the page, and generated values are produced in the page. The visible workflow does not send recipe text or generated rows elsewhere for generation.

Glossary:

Field recipe: A line-based description of generated fields using a field name, a type, and optional settings.
Seed: Text that helps make generated rows repeatable when the recipe and row count stay the same.
Fixture: A small dataset used for tests, imports, demos, seed data, or staging workflows.
NDJSON: Newline-delimited JSON, where each generated row is written as one JSON object per line.
Blank rate: The percentage chance that a non-sequence generated field becomes blank for a row.
SQL dialect: The selected SQL style used for identifier quoting in generated INSERT statements.
Quality Ledger: The result table that reports row volume, field coverage, repeatability, blanks, format, warnings, and errors.

References:

Guidelines for Evaluating Differential Privacy Guarantees, NIST, March 2025.
RFC 2606: Reserved Top Level DNS Names, IETF, June 1999.
RFC 4180: Common Format and MIME Type for CSV Files, IETF, October 2005.
RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format, IETF, December 2017.