Is this free to use?
Yes. The free tier covers manual generation and basic API usage. Higher limits, team features, and production workflows require a paid plan.
What makes DerpData different from Mockaroo or Faker.js?
Most generators are list-driven or template-driven. DerpData also uses Markov models for text-heavy fields, so names, addresses, notes, and descriptions follow realistic token transitions instead of repeating obvious patterns.
Can I use this for ML training data?
Yes, if your pipeline needs realistic structure without real PII. Teams use DerpData to build synthetic corpora for training, evaluation, and red-team tests before touching regulated data.
Does the API have rate limits?
Yes. Limits vary by plan and endpoint. Anonymous traffic is throttled more aggressively, so high-volume workflows should authenticate with an API key and batch their requests.
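As a rough illustration of client-side batching, the sketch below splits a large request into fixed-size batches and backs off exponentially when a batch is rate limited. It is not DerpData's client: `request_batch`, `RateLimited`, `fetch_all`, and the default batch size are hypothetical names and values chosen for the example, and the real API's signalling of rate limits (for instance an HTTP 429 response) is an assumption.

```python
import time
from typing import Callable, Iterator, List


class RateLimited(Exception):
    """Hypothetical error raised by request_batch when the server rejects a call for rate limiting."""


def batch_sizes(total: int, batch_size: int) -> Iterator[int]:
    """Yield batch sizes that add up to `total`, each at most `batch_size`."""
    remaining = total
    while remaining > 0:
        n = min(batch_size, remaining)
        yield n
        remaining -= n


def fetch_all(
    request_batch: Callable[[int], List[dict]],  # assumed: returns n generated rows
    total: int,
    batch_size: int = 500,       # assumed value, not a documented limit
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Collect `total` rows in batches, retrying rate-limited batches with exponential backoff."""
    rows: List[dict] = []
    for n in batch_sizes(total, batch_size):
        for attempt in range(max_retries + 1):
            try:
                rows.extend(request_batch(n))
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the final retry
                sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return rows
```

In practice `request_batch` would wrap whatever authenticated HTTP call your plan allows. Passing `sleep` as a parameter keeps the helper easy to test without real delays.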
Can I self-host this?
The managed version is the primary product today. Self-hosting is available on request for teams that need a private deployment in a controlled environment.
What formats does export support?
JSON, CSV, SQL, XML, YAML, and JSONL are supported, though availability varies by endpoint. Schema Builder and the masking workflows expose the common formats directly in the UI.
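To make the difference between two of these formats concrete, here is a small sketch that serializes the same records as JSONL (one JSON object per line) and as CSV, using only the Python standard library. The sample records and the helper names `to_jsonl` and `to_csv` are invented for illustration and do not reflect DerpData's actual export code or field names.

```python
import csv
import io
import json

# Invented sample rows, standing in for generated output.
records = [
    {"id": 1, "name": "Ada Moreno", "city": "Lisbon"},
    {"id": 2, "name": "Tomas Reed", "city": "Oslo"},
]


def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


def to_csv(rows):
    """Serialize rows as CSV with a header taken from the first row's keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_jsonl(records), end="")
    print(to_csv(records), end="")
```

JSONL tends to suit streaming and line-by-line ingestion, while CSV is convenient for spreadsheets and bulk database loads.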
Is the generated data truly random or does it follow patterns?
It follows statistical patterns on purpose. Pure random output looks fake to people and systems. DerpData uses controlled randomness with corpus-informed probabilities so results stay varied and plausible.
How does the Markov engine work?
The engine tokenizes training corpora and builds transition probabilities for token sequences. Generation then walks that state graph with weighted sampling, producing text that mirrors the distribution and flow of the corpus without copying source records.
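The general technique can be sketched in a few lines. The example below is a minimal word-level Markov chain, not DerpData's engine: the whitespace tokenizer, the `<s>` and `</s>` boundary markers, the `order` parameter, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_transitions(corpus, order=1):
    """Count how often each token follows each state (a tuple of `order` tokens)."""
    transitions = defaultdict(Counter)
    for line in corpus:
        # Pad with start markers and close with an end marker (assumed convention).
        tokens = ["<s>"] * order + line.split() + ["</s>"]
        for i in range(len(tokens) - order):
            state = tuple(tokens[i:i + order])
            transitions[state][tokens[i + order]] += 1
    return transitions


def generate(transitions, order=1, max_tokens=20, rng=None):
    """Walk the state graph, sampling each next token in proportion to its count."""
    rng = rng or random.Random()
    state = ("<s>",) * order
    out = []
    for _ in range(max_tokens):
        counts = transitions.get(state)
        if not counts:
            break  # unseen state: nothing to sample from
        candidates, weights = zip(*counts.items())
        token = rng.choices(candidates, weights=weights, k=1)[0]
        if token == "</s>":
            break
        out.append(token)
        state = state[1:] + (token,)  # slide the window forward by one token
    return " ".join(out)


if __name__ == "__main__":
    corpus = ["alice lives in paris", "bob lives in rome"]
    table = build_transitions(corpus)
    print(generate(table, rng=random.Random(0)))
```

With this two-line corpus the walk can recombine fragments, for example pairing "alice" with "rome", which is how a chain produces plausible text that need not match any single source line. A higher `order` follows the corpus more closely at the cost of less variety.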