Show HN: Plait.py – a fake data modeler

Dowwie · on Jan 9, 2018

I've found the faker library [1] useful.

The fake data that I've bothered to model are weighted age ranges. Fortunately, as of Python 3.6, you can do weighted sampling with random.choices [2] in the stdlib.

[1] https://github.com/joke2k/faker

[2] https://docs.python.org/3/library/random.html#random.choices

example: https://gist.github.com/Dowwie/8409d871ddae913e44c61bc4d47ce...
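A minimal sketch of the weighted-age-range idea with random.choices, assuming made-up buckets and weights (the linked gist is not reproduced here, and `AGE_RANGES`, `WEIGHTS` and `fake_ages` are hypothetical names):

```python
import random

# Hypothetical age buckets and weights, for illustration only (not real demographic data).
AGE_RANGES = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 90)]
WEIGHTS = [12, 18, 17, 16, 15, 22]


def fake_ages(n, seed=None):
    """Return n fake ages drawn from weighted age ranges."""
    rng = random.Random(seed)
    # random.choices (Python 3.6+) samples k buckets with replacement,
    # each bucket picked with probability proportional to its weight.
    buckets = rng.choices(AGE_RANGES, weights=WEIGHTS, k=n)
    # Within each chosen bucket, pick an age uniformly (randint is inclusive).
    return [rng.randint(lo, hi) for lo, hi in buckets]


print(fake_ages(5, seed=1))
```

Passing a seed makes the output reproducible, which is handy when fake data feeds tests.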

logv · on Jan 9, 2018

part of what prompted the work on plait.py was that joke2k/faker was fairly slow to generate 10K fake names for me: https://paste.ubuntu.com/26354987/

PS. that's a really cool python tip!
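To reproduce a comparison like the one in the paste, a tiny timing harness is enough. This is a sketch: `time_generation` and the stand-in name generator are hypothetical, and the pasted numbers are not reproduced here.

```python
import random
import time


def time_generation(generate, n=10_000):
    """Call generate() n times and return (elapsed_seconds, results)."""
    start = time.perf_counter()
    results = [generate() for _ in range(n)]
    return time.perf_counter() - start, results


# Hypothetical stand-in name generator so the sketch runs with only the stdlib.
FIRST = ["Ada", "Grace", "Alan", "Edsger"]
LAST = ["Lovelace", "Hopper", "Turing", "Dijkstra"]
rng = random.Random(0)


def stand_in_name():
    return f"{rng.choice(FIRST)} {rng.choice(LAST)}"


elapsed, names = time_generation(stand_in_name)
print(f"generated {len(names)} names in {elapsed:.3f}s")
```

With joke2k/faker installed, `from faker import Faker` followed by `time_generation(Faker().name)` should time the same 10K workload; no timings are claimed here.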

Dowwie · on Jan 9, 2018

Well done, then! :) Is plait a drop-in replacement for faker?

logv · on Jan 9, 2018

it's a drop-in replacement for stympy/faker, but not for joke2k/faker

joke2k/faker is python and the data is stored in code (all or most of the random values are in .py files around the codebase), perhaps leading to its slowness.

stympy/faker is ruby and its random values are in yaml files, with some fields defined as ruby functions (those are not supported by plait.py).

you can use 'plait -l', 'plait -ll name' and 'plait -ll name.name' (more info in the README) to list the available fake fields.
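Dotted names such as `name.name` suggest a nested lookup into YAML data. As a rough sketch, assuming a tree in the style of stympy/faker where some entries are templates like `#{first_name} #{last_name}` that reference other fields, a plain dict can stand in for the parsed YAML. The layout, values, and the `fake` and `list_fields` helpers are hypothetical; this is not plait.py's code, and `list_fields` does not reproduce the output of `plait -l`.

```python
import random
import re

# Stand-in for parsed YAML data; structure and values are illustrative assumptions.
DATA = {
    "name": {
        "first_name": ["Ada", "Grace", "Alan"],
        "last_name": ["Lovelace", "Hopper", "Turing"],
        # Template entries reference other fields with #{...}.
        "name": ["#{first_name} #{last_name}"],
    },
}

REF = re.compile(r"#\{([\w.]+)\}")


def list_fields(namespace=None):
    """List the namespaces, or the fields inside one namespace."""
    if namespace is None:
        return sorted(DATA)
    return sorted(DATA[namespace])


def fake(path, rng=random):
    """Return a random value for a dotted path such as 'name.first_name'."""
    namespace, field = path.split(".", 1)
    value = rng.choice(DATA[namespace][field])

    def expand(match):
        ref = match.group(1)
        # Undotted references resolve within the same namespace.
        return fake(ref if "." in ref else f"{namespace}.{ref}", rng)

    return REF.sub(expand, value)


print(list_fields())        # ['name']
print(list_fields("name"))  # ['first_name', 'last_name', 'name']
print(fake("name.name", random.Random(0)))
```

Fields defined as Ruby functions have no equivalent in a data-only lookup like this, which is consistent with plait.py not supporting them.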

Dowwie · on Jan 9, 2018

if/when you tire of the latest performance improvements, consider porting to Rust and exposing it to Python via cffi

sfvisser · on Jan 9, 2018

Another great approach to generating notional data is using the Haskell QuickCheck library and specifically the Arbitrary type class. Super simple and extremely flexible/composable.

Probably also available in other languages.
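To show what "composable" means here without Haskell, a rough plain-Python analogue: each generator is a function from an RNG to a value, and small combinators build bigger generators from smaller ones. This is not QuickCheck's Arbitrary class or any library's API; the helper names are hypothetical, chosen to echo Python's hypothesis strategies (`integers`, `sampled_from`, `lists`, `builds`), which do this properly and add shrinking.

```python
import random


# In this sketch a "generator" is any function taking a random.Random and returning a value.
def integers(lo, hi):
    return lambda rng: rng.randint(lo, hi)


def sampled_from(options):
    return lambda rng: rng.choice(options)


def lists(elem, max_size=5):
    # Pick a length, then draw that many elements from the element generator.
    return lambda rng: [elem(rng) for _ in range(rng.randint(0, max_size))]


def builds(factory, **fields):
    # Draw each named field, then pass them all to factory (like a constructor).
    return lambda rng: factory(**{k: g(rng) for k, g in fields.items()})


# Composing small generators into a record generator (field names are hypothetical).
person = builds(
    dict,
    name=sampled_from(["Ada", "Grace", "Alan"]),
    age=integers(18, 90),
    scores=lists(integers(0, 100), max_size=3),
)

rng = random.Random(42)
print([person(rng) for _ in range(3)])
```

Because every piece has the same shape, a generator for one field can be reused inside any larger record without changes.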

aldanor · on Jan 9, 2018

The ones I used and can recommend:

“hypothesis” package in Python (+pytest plugin)

“rapidcheck” in C++

“quickcheck” in Rust

LrnByTeach · on Jan 10, 2018

good list for generating model data / fake data

portlander12345 · on Jan 10, 2018

testcheck is a JavaScript equivalent.

ironix · on Jan 10, 2018

Interesting, though I must ask: Why YML instead of a DSL?

Granted, I come from Ruby, and writing DSLs is pretty typical. Maybe not so popular in Python.

I am asking this because I've grown suspicious of config languages that read like code. Isn't a bona fide programming language the better choice in this scenario? i.e., don't all overly configurable formats (e.g. Terraform .tf files, JSON schemas...) converge on just being a new scripting language?

logv · on Jan 10, 2018

good point! i'm not against a DSL. as I was working on plait.py, one thought going through my head was: "am i re-writing haskell or lisp, but worse?" my experience with python and DSLs is that I need to use YACC / PLY to create a grammar and so on, which can be a lot of work. i take it that it's easier in ruby?

yaml was a format I chose because it is easy to write (close to human language) but cannot express full programming concepts (though it does allow some metaprogramming). i did not want the templates to be fully powered: they are meant to express relationships between variables, but not much more (especially not side effects). they also support lazy evaluation - statements do not need to be in order. this is closer to a "mathematical language" for me.

the choice of yaml was also based on the premise that if performance becomes an issue, i can hopefully move to another language but keep the templates (though i will have to re-implement compatibility with python's "random" module)
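A small sketch of what order-independent, lazily evaluated fields can look like, assuming fields are plain Python functions that ask for other fields by name. plait.py's templates are YAML and this is not its evaluator; the template layout and `evaluate_record` are hypothetical. Each field is computed at most once per record, so `email` and `full_name` agree on the same `first` and `last`.

```python
import random

# Hypothetical template: each field is a function of (rng, get), where get(name)
# returns another field's value. Fields are deliberately listed out of order.
TEMPLATE = {
    "full_name": lambda rng, get: f"{get('first')} {get('last')}",
    "email": lambda rng, get: f"{get('first').lower()}.{get('last').lower()}@example.com",
    "first": lambda rng, get: rng.choice(["Ada", "Grace", "Alan"]),
    "last": lambda rng, get: rng.choice(["Lovelace", "Hopper", "Turing"]),
}


def evaluate_record(template, rng):
    """Evaluate every field lazily, resolving dependencies on demand."""
    cache = {}
    in_progress = set()

    def get(name):
        if name in cache:
            return cache[name]  # each field is evaluated at most once per record
        if name in in_progress:
            raise ValueError(f"cyclic dependency at {name!r}")
        in_progress.add(name)
        cache[name] = template[name](rng, get)
        in_progress.discard(name)
        return cache[name]

    # Output keeps the template's key order, whatever order evaluation happened in.
    return {name: get(name) for name in template}


print(evaluate_record(TEMPLATE, random.Random(0)))
```

Keeping fields free of side effects is what makes this safe: evaluation order can change without changing the meaning of the template.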

mikeokner · on Jan 9, 2018

Cool, is there a way to use it to dynamically generate data (for streaming)? Would be nice to be able to just call something like .next() and get another record so a simulator can run for an indefinite period of time.

logv · on Jan 9, 2018

if you create a template and keep calling .gen_record(), i think it will do what you want. Template() does not implement python's __next__ or __iter__ at the moment, but that's a good idea - i'm very open to diffs :-D
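Until `Template` implements `__iter__`/`__next__`, a generator function can wrap repeated `.gen_record()` calls, as described above. `StandInTemplate` is a hypothetical stand-in so the sketch runs without plait.py installed; with plait you would pass your own template object instead, assuming `gen_record()` takes no arguments.

```python
import itertools
import random


class StandInTemplate:
    """Hypothetical stand-in for a template object exposing gen_record()."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def gen_record(self):
        return {"user_id": self.rng.randint(1, 10_000), "value": self.rng.random()}


def stream(template):
    """Yield records forever by repeatedly calling template.gen_record()."""
    while True:
        yield template.gen_record()


# Take a finite slice of the infinite stream, e.g. one batch for a simulator tick.
for record in itertools.islice(stream(StandInTemplate(seed=0)), 3):
    print(record)
```

The two-argument form `iter(template.gen_record, None)` is an equivalent one-liner: it calls `gen_record()` until it returns `None`, which a record dict never does.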

dgrant · on Jan 9, 2018

Could I use this to generate XML?

logv · on Jan 9, 2018

if you already have a way of printing XML, you can add a "printer" field (that is a python function) to your template, like so: http://github.com/plaitpy/plaitpy/blob/master/templates/test...

if that function uses an import, you might also need to add an "imports" field, like in this example: https://github.com/plaitpy/plaitpy/blob/master/templates/web...

otherwise, that's a feature that could be added here: https://github.com/plaitpy/plaitpy/blob/master/src/fields.py... If it works for you (and is added behind a flag), i'd be happy to take patches.
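If you don't yet have an XML printer, the standard library's `xml.etree.ElementTree` can serialize a flat record. This is a sketch assuming records arrive as flat dicts whose keys are valid XML tag names; how plait.py passes records to a "printer" field, and what signature it expects, is not shown in the thread, so `record_to_xml` is a hypothetical helper you would adapt.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, tag="record"):
    """Serialize a flat dict to an XML string, one child element per field."""
    root = ET.Element(tag)
    for key, value in record.items():
        # Keys become element names, so they must be valid XML tag names.
        child = ET.SubElement(root, key)
        # ElementTree escapes &, < and > in text automatically.
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


print(record_to_xml({"name": "Ada Lovelace", "age": 36}))
# <record><name>Ada Lovelace</name><age>36</age></record>
```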

bedros · on Jan 10, 2018

What advantages does it have over faker?

dgrant · on Jan 9, 2018

Could this be used to generate XML?

ship_it · on Jan 9, 2018

Yes, it's open-source.

dalacv · on Jan 9, 2018

a faker faker