Data model

One of the frequently asked questions by projects that use e-mission is "where is the data model"? The answer is that it is stored in code.

The main reason is that there is no dedicated documentation team to keep the code and documentation up to date, and I didn't want to cause confusion by having obsolete docs.

So the data model is represented by wrapper classes, stored in emission/core/wrapper.

  • Each python wrapper class contains the list of fields, along with a brief description.
  • Classes can inherit from other classes (e.g. cleaned_trip inherits from trip)
  • Entry is a special class that represents an entry in the timeseries, including both data and metadata.
  • A full list of classes, including a brief description of each type, is in emission/core/wrapper/entry.py
  • The class represents the data part of an entry

The whole wrapper mechanism is based on the attrdict module, with some extensions for validating not just which fields exist in the object, but which are valid for a particular class.

In [1]: import emission.core.get_database as edb
Connecting to database URL localhost

In [2]: import emission.core.wrapper.entry as ecwe

In [3]: entry_dict = edb.get_timeseries_db().find_one()

In [4]: entry_dict["metadata"]["key"]
Out[4]: 'stats/server_api_time'

In [5]: entry = ecwe.Entry(entry_dict)

In [6]: entry.metadata.key
Out[6]: 'stats/server_api_time'

In [7]: type(entry)
Out[7]: emission.core.wrapper.entry.Entry

In [8]: type(entry.data)
Out[8]: emission.core.wrapper.statsevent.Statsevent

In [9]: entry.data.name
Out[9]: 'POST_/usercache/get'

In [10]: entry.data.reading
Out[10]: 0.41276121139526367

The wrappers also so some basic validation of attributes so that errors can be caught at compile time. Even if that is a bit un-pythonic :)

In [11]: entry.beta
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-6d2537c58b13> in <module>()
----> 1 entry.beta

/Users/shankari/e-mission/e-mission-server/emission/core/wrapper/wrapperbase.py in __getattr__(self, key)
     73         return self._build(key, self[key])
     74     else:
---> 75         raise AttributeError("property %s is not defined for %s" % (key, self.__class__.__name__))
     76
     77   def _writable(self, key):

AttributeError: property beta is not defined for Entry

In [12]: entry.data.beading
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-500ae1bd7984> in <module>()
----> 1 entry.data.beading

/Users/shankari/e-mission/e-mission-server/emission/core/wrapper/wrapperbase.py in __getattr__(self, key)
     73         return self._build(key, self[key])
     74     else:
---> 75         raise AttributeError("property %s is not defined for %s" % (key, self.__class__.__name__))
     76
     77   def _writable(self, key):

AttributeError: property beading is not defined for Statsevent

Note that, in order to be reproducible, our data model is designed to be read-only. This means that if you work on some data and generate some output, that output is a separate object which you should store separately. This ensures that we can blow away all analysis results at any time and recreate them. Because of this, the wrappers are designed to be read-only as well.

In [13]: entry.data.reading = 50000
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-53ad43c60534> in <module>()
----> 1 entry.data.reading = 50000

/Users/shankari/e-mission/e-mission-server/emission/core/wrapper/wrapperbase.py in __setattr__(self, key, value)
     95                 return super(WrapperBase, self).__setattr__(key, value)
     96         else:
---> 97             raise AttributeError("property %s is read-only" % key)
     98     else:
     99         raise AttributeError("property %s is not defined for %s" % (key, self.__class__.__name__))

AttributeError: property reading is read-only

In order to create a new entry, you can use createEntry with a data object. There are examples of creating entries all over the analysis pipeline, but here's an example similar to the one above.

We first create the data object

In [14]: import emission.core.wrapper.statsevent

In [15]: new_data = emission.core.wrapper.statsevent.Statsevent()

In [16]: new_data.name = "modified"

In [17]: new_data.reading = 5000

In [19]: new_data.ts = 12345678

In [20]: new_data.fmt_time = "this is the formatted_time"

In [21]: new_data
Out[21]: Statsevent({'name': 'modified', 'reading': 5000, 'ts': 12345678, 'fmt_time': 'this is the formatted_time'})

We can't set the old data!!

In [22]: entry.data
Out[22]: Statsevent({'reading': 0.41276121139526367, 'name': 'POST_/usercache/get', 'ts': 1481533136.761428})

In [23]: entry.data = new_data
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-bb4e1aee5054> in <module>()
----> 1 entry.data = new_data

/Users/shankari/e-mission/e-mission-server/emission/core/wrapper/wrapperbase.py in __setattr__(self, key, value)
     95                 return super(WrapperBase, self).__setattr__(key, value)
     96         else:
---> 97             raise AttributeError("property %s is read-only" % key)
     98     else:
     99         raise AttributeError("property %s is not defined for %s" % (key, self.__class__.__name__))

AttributeError: property data is read-only

We create a new entry using create_entry

In [24]: new_entry = ecwe.Entry.create_entry(entry.user_id, entry.metadata.key, new_data
    ...: )

In [25]: new_entry.data
Out[25]: Statsevent({'name': 'modified', 'reading': 5000, 'ts': 12345678, 'fmt_time': 'this is the formatted_time'})