part of what prompted the work on plait.py was that joke2k/faker was reasonably slow to generate 10K fake names for me: https://paste.ubuntu.com/26354987/
it's a drop-in replacement for stympy/faker, but not for joke2k/faker
joke2k/faker is python and the data is stored in code (all or most of the random values are in .py files around the codebase), perhaps leading to its slowness.
stympy/faker is ruby and its random values are in yaml files, with some fields defined as ruby functions (those are not supported by plait.py).
you can use 'plait -l', 'plait -ll name' and 'plait -ll name.name' (more info in the README) to get a list of the fake fields available.
Another great approach to generating notional data is using the Haskell QuickCheck library and specifically the Arbitrary type class. Super simple and extremely flexible/composable.
Interesting, though I must ask: Why YML instead of a DSL?
Granted, I come from Ruby, and writing DSLs is pretty typical. Maybe not so popular in Python.
I am asking because I become suspicious of config languages that read like code. Isn't a bona fide programming language the better choice in this scenario? That is, don't all overly configurable formats (e.g. Terraform .tf files, JSON schemas...) converge on just being a new scripting language?
good point! i'm not against a DSL. as I was working on plait.py, one thought going through my head was: "am i re-writing haskell or lisp but worse"? my experience with python and DSLs is that I need to use YACC / PLY to create a grammar and so on, which may be a lot of work. i take it that it's easier in ruby?
yaml was a format I chose because it is easy to write (close to human), but cannot express full programming concepts (though it does allow some metaprogramming). i did not want the templates to be full powered, as they are meant to express relationships between variables, but not much more (especially not side effects). they also support lazy evaluation - statements do not need to be in order. this is closer to a "mathematical language" for me.
the choice of yaml was also based on the premise that if performance becomes an issue, i can hopefully move to another language but retain the templates (i will have to re-implement compatibility with python's "random", though)
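To illustrate the "statements do not need to be in order" point, here is a minimal sketch of lazy field resolution. It is not plait.py's implementation: the `LazyRecord` class, the `generate` helper and the field names are all hypothetical, chosen only to show how a field can refer to another field that is defined later.

```python
import random


class LazyRecord:
    """Resolve fields on demand, so definition order does not matter.

    Hypothetical helper for illustration; not part of plait.py.
    """

    def __init__(self, fields, rng):
        self._fields = fields    # name -> function(record) -> value
        self.rng = rng           # shared random source for all fields
        self._cache = {}         # values already computed for this record
        self._resolving = set()  # fields currently being computed

    def __getitem__(self, name):
        if name in self._cache:
            return self._cache[name]
        if name in self._resolving:
            raise ValueError(f"circular dependency on field {name!r}")
        self._resolving.add(name)
        try:
            value = self._fields[name](self)
        finally:
            self._resolving.discard(name)
        self._cache[name] = value
        return value


def generate(fields, rng=None):
    """Build one record as a plain dict, resolving each field lazily."""
    if rng is None:
        rng = random.Random()
    record = LazyRecord(fields, rng)
    return {name: record[name] for name in fields}


# "is_adult" is declared before "age" but still resolves correctly.
fields = {
    "is_adult": lambda r: r["age"] >= 18,
    "age": lambda r: r.rng.randint(1, 90),
}

example = generate(fields, random.Random(0))
```

Each field is computed at most once per record and has no side effects beyond drawing from the shared random source, which is roughly the "relationships between variables, but not much more" constraint described above.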
Cool, is there a way to use it to dynamically generate data (for streaming)? Would be nice to be able to just call something like .next() and get another record so a simulator can run for an indefinite period of time.
if you create a template and keep calling .gen_record(), i think it will do what you want. Template() does not implement python's __next__ or __iter__ at the moment, but adding them is a good idea - i'm very open to diffs :-D
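Until something like that lands, a small generator wrapper gives the .next() style of access. This is only a sketch: `StandInTemplate` is a hypothetical stand-in so the example runs without plait.py installed, and the only thing assumed about the real Template is the gen_record() method mentioned above.

```python
import itertools
import random


class StandInTemplate:
    """Hypothetical stand-in for a template; only gen_record() is assumed."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def gen_record(self):
        # Placeholder record; a real template would produce richer fields.
        return {"age": self._rng.randint(18, 90)}


def stream_records(template):
    """Yield records indefinitely by calling gen_record() repeatedly."""
    while True:
        yield template.gen_record()


records = stream_records(StandInTemplate(seed=1))
first = next(records)                        # one record at a time
batch = list(itertools.islice(records, 5))   # or a bounded slice of the stream
```

Because the generator is infinite, a simulator can pull from it for as long as it likes; itertools.islice is one way to take a bounded chunk.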
The fake data that I've bothered to model is weighted age ranges. Fortunately, as of Python 3.6, weighted selection is available via random.choices [2] in the stdlib.
[1] https://github.com/joke2k/faker
[2] https://docs.python.org/3/library/random.html#random.choices
example: https://gist.github.com/Dowwie/8409d871ddae913e44c61bc4d47ce...
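For readers who have not used random.choices, a minimal sketch of weighted age ranges might look like the following. The ranges and weights are made up for illustration and are not taken from the gist above.

```python
import random

# Illustrative age brackets and relative weights (assumed values).
AGE_RANGES = [(18, 24), (25, 34), (35, 49), (50, 64), (65, 90)]
WEIGHTS = [15, 25, 30, 20, 10]


def random_age(rng):
    """Pick a bracket by weight, then a uniform age inside that bracket."""
    low, high = rng.choices(AGE_RANGES, weights=WEIGHTS, k=1)[0]
    return rng.randint(low, high)


rng = random.Random(42)  # seeded so runs are repeatable
ages = [random_age(rng) for _ in range(10)]
```

random.choices treats the weights as relative, so they do not need to sum to 100.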