This commit is contained in:
Mike Bayer
2011-11-17 14:16:10 -05:00
parent 4fa66beb0c
commit e3f1bcfdc9
12 changed files with 886 additions and 284 deletions


@@ -1,6 +1,6 @@
syntax:regexp
^build/
^doc/build/output
^docs/build/output
.pyc$
.orig$
.egg-info


@@ -51,196 +51,8 @@ fall through, and not be blocked. It is expected that
the "stale" version of the resource remain available at this
time while the new one is generated.
Using a Value Function with a Memcached-like Cache
---------------------------------------------------
The dogpile lock includes a more intricate mode of operation that optimizes
the use of a cache like Memcached. The difficulties Dogpile addresses
in this mode are:
* Values can disappear from the cache at any time, before our expiration
time is reached. Dogpile needs to be made aware of this and possibly
call the creation function ahead of schedule.
* There's no function in a Memcached-like system to "check" for a key without
actually retrieving it. If we have to retrieve the value in order to "check"
for it, we'd like to use that retrieved value instead of fetching it a second time.
* If we did end up generating the value on this get, we should return
that value instead of doing a cache round-trip.
To use this mode, the steps are as follows:
* Create the Dogpile lock with ``init=True``, to skip the initial
"force" of the creation function. This assumes you'd like to
rely upon the "check the value" function for the initial generation.
Leave ``init`` at its default of ``False`` if you'd like the application to
regenerate the value unconditionally when the dogpile lock is first created
(i.e. typically application startup).
* The "creation" function should return the value it creates.
* An additional "getter" function is passed to ``acquire()`` which
should return the value to be passed to the context block. If
the value isn't available, raise ``NeedRegenerationException``.
Example::

    from dogpile import Dogpile, NeedRegenerationException

    def get_value_from_cache():
        value = my_cache.get("some key")
        if value is None:
            raise NeedRegenerationException()
        return value

    def create_and_cache_value():
        value = my_expensive_resource.create_value()
        my_cache.put("some key", value)
        return value

    dogpile = Dogpile(3600, init=True)

    with dogpile.acquire(create_and_cache_value, get_value_from_cache) as value:
        return value
Note that ``get_value_from_cache()`` should not raise ``NeedRegenerationException``
a second time directly after ``create_and_cache_value()`` has been called.
Locking the "write" phase against the "readers"
------------------------------------------------
The dogpile lock can provide a mutex to the creation
function itself, so that the creation function can perform
certain tasks only after all "stale reader" threads have finished.
The example of this is when the creation function has prepared a new
datafile to replace the old one, and would like to switch in the
"new" file only when other threads have finished using it.
To enable this feature, use ``SyncReaderDogpile()``.
``SyncReaderDogpile.acquire_write_lock()`` then provides a safe-write lock
for the critical section where readers should be blocked::
    from dogpile import SyncReaderDogpile

    dogpile = SyncReaderDogpile(3600)

    def some_creation_function():
        create_expensive_datafile()
        with dogpile.acquire_write_lock():
            replace_old_datafile_with_new()
Using Dogpile for Caching
--------------------------
Dogpile is part of an effort to "break up" the Beaker
package into smaller, simpler components (which also work better). Here, we
illustrate how to replicate Beaker's "cache decoration"
function, to decorate any function and store the value in
Memcached::
    import pylibmc
    mc_pool = pylibmc.ThreadMappedPool(pylibmc.Client("localhost"))

    from dogpile import Dogpile, NeedRegenerationException

    def cached(key, expiration_time):
        """A decorator that will cache the return value of a function
        in memcached given a key."""

        def get_value():
            with mc_pool.reserve() as mc:
                value = mc.get(key)
                if value is None:
                    raise NeedRegenerationException()
                return value

        dogpile = Dogpile(expiration_time, init=True)

        def decorate(fn):
            def gen_cached():
                value = fn()
                with mc_pool.reserve() as mc:
                    mc.put(key, value)
                return value

            def invoke():
                with dogpile.acquire(gen_cached, get_value) as value:
                    return value
            return invoke

        return decorate
Above we can decorate any function as::

    @cached("some key", 3600)
    def generate_my_expensive_value():
        return slow_database.lookup("stuff")
The Dogpile lock will ensure that only one thread at a time performs ``slow_database.lookup()``,
and at most once every 3600 seconds, unless Memcached has removed the value, in which
case it will be called again as needed.
In particular, Dogpile's system allows us to call the Memcached ``get()`` function at most
once per access, whereas Beaker's system calls it twice; Dogpile also doesn't make us call
``get()`` when we just created the value.
Using Dogpile across lots of keys
----------------------------------
The above patterns all feature the usage of Dogpile as an object held persistently
for the lifespan of some value. Two more helpers allow the dogpile to be created
as needed and then disposed, while still ensuring that concurrent threads lock
against the same dogpile.
Here's the memcached example again using that technique::

    import pylibmc
    mc_pool = pylibmc.ThreadMappedPool(pylibmc.Client("localhost"))

    from dogpile import Dogpile, NeedRegenerationException

    import pickle
    import time

    def cache(expiration_time):
        dogpile_registry = Dogpile.registry(expiration_time)

        # "fn" is the function that generates the value
        def get_or_create(key, fn):
            def get_value():
                with mc_pool.reserve() as mc:
                    value = mc.get(key)
                    if value is None:
                        raise NeedRegenerationException()
                    # deserialize a tuple
                    # (value, createdtime)
                    return pickle.loads(value)

            dogpile = dogpile_registry.get(key)

            def gen_cached():
                value = fn()
                with mc_pool.reserve() as mc:
                    # serialize a tuple
                    # (value, createdtime)
                    value = (value, time.time())
                    mc.put(key, pickle.dumps(value))
                return value

            with dogpile.acquire(gen_cached, value_and_created_fn=get_value) as value:
                return value

        return get_or_create
Above, we use ``Dogpile.registry()`` to create a name-based "registry" of ``Dogpile``
objects. This object will provide to us a ``Dogpile`` object that's
unique on a certain name (or any hashable object) when we call the ``get()`` method.
When all usages of that name are complete, the ``Dogpile``
object falls out of scope. This way, an application can handle millions of keys
without needing to have millions of ``Dogpile`` objects persistently resident in memory.
The next part of the approach here is that we tell Dogpile we'll supply the
"creation time" ourselves, stored in our cache - we do this using the
``value_and_created_fn`` argument, which assumes we'll be storing and loading
the value as a tuple of (value, createdtime). The creation time should always
be calculated via ``time.time()``. The ``acquire()`` function returns the
"value" portion of the tuple to us and uses the "createdtime" portion to
determine if the value is expired.
Dogpile is at the core of the `dogpile.cache <http://bitbucket.org/zzzeek/dogpile.cache>`_ package
which provides for a basic cache API and sample backends based on the dogpile concept.
Development Status
-------------------

docs/build/Makefile vendored Normal file

@@ -0,0 +1,95 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = output
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml pickle json htmlhelp qthelp latex changes linkcheck doctest
help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dist-html  same as html, but places files in /doc"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	-rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dist-html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) ..
	@echo
	@echo "Build finished. The HTML pages are in ../."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Dogpile.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Dogpile.qhc"

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \
	      "run these through (pdf)latex."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

docs/build/api.rst vendored Normal file

@@ -0,0 +1,23 @@
===
API
===

Dogpile
========

.. automodule:: dogpile.dogpile
    :members:

NameRegistry
=============

.. automodule:: dogpile.nameregistry
    :members:

Utilities
==========

.. automodule:: dogpile.readwrite_lock
    :members:

docs/build/builder.py vendored Normal file

@@ -0,0 +1,9 @@
def autodoc_skip_member(app, what, name, obj, skip, options):
    # document __init__ methods that have a docstring, which
    # autodoc would otherwise skip
    if what == 'class' and skip and name in ('__init__',) and obj.__doc__:
        return False
    else:
        return skip

def setup(app):
    app.connect('autodoc-skip-member', autodoc_skip_member)

docs/build/conf.py vendored Normal file

@@ -0,0 +1,209 @@
# -*- coding: utf-8 -*-
#
# Dogpile documentation build configuration file, created by
# sphinx-quickstart on Sat May 1 12:47:55 2010.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys, os
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.append(os.path.abspath('.'))
# If your extensions are in another directory, add it here. If the directory
# is relative to the documentation root, use os.path.abspath to make it
# absolute, like shown here.
sys.path.insert(0, os.path.abspath('../../'))
sys.path.insert(0, os.path.abspath('.'))
import dogpile
# -- General configuration -----------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.intersphinx', 'builder']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'Dogpile'
copyright = u'2011, Mike Bayer'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = dogpile.__version__
# The full version, including alpha/beta/rc tags.
release = dogpile.__version__
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of documents that shouldn't be included in the build.
#unused_docs = []
# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = []
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
autodoc_default_flags = ['special-members']
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. Major themes that come with
# Sphinx are currently 'default' and 'sphinxdoc'.
html_theme = 'nature'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_use_modindex = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = ''
# Output file base name for HTML help builder.
htmlhelp_basename = 'dogpiledoc'
# -- Options for LaTeX output --------------------------------------------------
# The paper size ('letter' or 'a4').
#latex_paper_size = 'letter'
# The font size ('10pt', '11pt' or '12pt').
#latex_font_size = '10pt'
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
    ('index', 'dogpile.tex', u'Dogpile Documentation',
     u'Mike Bayer', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# Additional stuff for the LaTeX preamble.
#latex_preamble = ''
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_use_modindex = True
#{'python': ('http://docs.python.org/3.2', None)}
intersphinx_mapping = {'sqla':('http://www.sqlalchemy.org/docs/', None)}

docs/build/index.rst vendored Normal file

@@ -0,0 +1,26 @@
===================================
Welcome to Dogpile's documentation!
===================================
`Dogpile <http://bitbucket.org/zzzeek/dogpile>`_ provides the *dogpile* lock,
one which allows a single thread or process to generate
an expensive resource while other threads/processes use the "old" value, until the
"new" value is ready.
Dogpile is at the core of the `dogpile.cache <http://bitbucket.org/zzzeek/dogpile.cache>`_ package
which provides for a basic cache API and sample backends based on the dogpile concept.
.. toctree::
    :maxdepth: 2

    usage
    api
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

docs/build/make.bat vendored Normal file

@@ -0,0 +1,113 @@
@ECHO OFF
REM Command file for Sphinx documentation
set SPHINXBUILD=sphinx-build
set BUILDDIR=build
set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% source
if NOT "%PAPER%" == "" (
set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
)
if "%1" == "" goto help
if "%1" == "help" (
:help
echo.Please use `make ^<target^>` where ^<target^> is one of
echo. html to make standalone HTML files
echo. dirhtml to make HTML files named index.html in directories
echo. pickle to make pickle files
echo. json to make JSON files
echo. htmlhelp to make HTML files and a HTML help project
echo. qthelp to make HTML files and a qthelp project
echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
echo. changes to make an overview over all changed/added/deprecated items
echo. linkcheck to check all external links for integrity
echo. doctest to run all doctests embedded in the documentation if enabled
goto end
)
if "%1" == "clean" (
for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
del /q /s %BUILDDIR%\*
goto end
)
if "%1" == "html" (
%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/html.
goto end
)
if "%1" == "dirhtml" (
%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
goto end
)
if "%1" == "pickle" (
%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
echo.
echo.Build finished; now you can process the pickle files.
goto end
)
if "%1" == "json" (
%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
echo.
echo.Build finished; now you can process the JSON files.
goto end
)
if "%1" == "htmlhelp" (
%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
echo.
echo.Build finished; now you can run HTML Help Workshop with the ^
.hhp project file in %BUILDDIR%/htmlhelp.
goto end
)
if "%1" == "qthelp" (
%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
echo.
echo.Build finished; now you can run "qcollectiongenerator" with the ^
.qhcp project file in %BUILDDIR%/qthelp, like this:
echo.^> qcollectiongenerator %BUILDDIR%\qthelp\Dogpile.qhcp
echo.To view the help file:
echo.^> assistant -collectionFile %BUILDDIR%\qthelp\Dogpile.qhc
goto end
)
if "%1" == "latex" (
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
echo.
echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
goto end
)
if "%1" == "changes" (
%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
echo.
echo.The overview file is in %BUILDDIR%/changes.
goto end
)
if "%1" == "linkcheck" (
%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
echo.
echo.Link check complete; look for any errors in the above output ^
or in %BUILDDIR%/linkcheck/output.txt.
goto end
)
if "%1" == "doctest" (
%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
echo.
echo.Testing of doctests in the sources finished, look at the ^
results in %BUILDDIR%/doctest/output.txt.
goto end
)
:end

docs/build/usage.rst vendored Normal file

@@ -0,0 +1,321 @@
Introduction
============
At its core, Dogpile provides a locking interface around a "value creation" function.
The interface supports several levels of usage, starting from
one that is very rudimentary, then providing more intricate
usage patterns to deal with certain scenarios. The documentation here will attempt to
provide examples that use successively more of these features, as
we approach how a fully featured caching system might be constructed around
Dogpile.
Note that when using the `dogpile.cache <http://bitbucket.org/zzzeek/dogpile.cache>`_
package, the constructs here provide the internal implementation for that system,
and users of that system don't need to access these APIs directly (though understanding
the general patterns is a terrific idea in any case).
Using the core Dogpile APIs described here directly implies you're building your own
resource-usage system outside, or in addition to, the one
`dogpile.cache <http://bitbucket.org/zzzeek/dogpile.cache>`_ provides.
Usage
=====
A simple example::

    from dogpile import Dogpile

    # store a reference to a "resource", some
    # object that is expensive to create.
    the_resource = [None]

    def some_creation_function():
        # create the resource here
        the_resource[0] = create_some_resource()

    def use_the_resource():
        # some function that uses
        # the resource.  Won't reach
        # here until some_creation_function()
        # has completed at least once.
        the_resource[0].do_something()

    # create Dogpile with 3600 second
    # expiry time
    dogpile = Dogpile(3600)

    with dogpile.acquire(some_creation_function):
        use_the_resource()
Above, ``some_creation_function()`` will be called
when :meth:`.Dogpile.acquire` is first called. The
remainder of the ``with`` block then proceeds. Concurrent threads which
call :meth:`.Dogpile.acquire` during this initial period
will be blocked until ``some_creation_function()`` completes.
Once the creation function has completed successfully the first time,
new calls to :meth:`.Dogpile.acquire` will call ``some_creation_function()``
each time the "expiretime" has been reached, allowing only a single
thread to call the function. Concurrent threads
which call :meth:`.Dogpile.acquire` during this period will
fall through, and not be blocked. It is expected that
the "stale" version of the resource remain available at this
time while the new one is generated.
By default, :class:`.Dogpile` uses Python's ``threading.Lock()``
to synchronize among threads within a process. This can
be altered to support any kind of locking as we'll see in a
later section.
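For instance, a minimal sketch of substituting a different mutex via the
``lock`` parameter accepted by the :class:`.Dogpile` constructor (any object
providing ``acquire()`` and ``release()`` methods is assumed to work)::

    import threading

    from dogpile import Dogpile

    # substitute an RLock for the default threading.Lock();
    # a file-based or distributed lock could be passed the same way
    dogpile = Dogpile(3600, lock=threading.RLock())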
Locking the "write" phase against the "readers"
------------------------------------------------
The dogpile lock can provide a mutex to the creation
function itself, so that the creation function can perform
certain tasks only after all "stale reader" threads have finished.
The example of this is when the creation function has prepared a new
datafile to replace the old one, and would like to switch in the
"new" file only when other threads have finished using it.
To enable this feature, use :class:`.SyncReaderDogpile`.
:meth:`.SyncReaderDogpile.acquire_write_lock` then provides a safe-write lock
for the critical section where readers should be blocked::
    from dogpile import SyncReaderDogpile

    dogpile = SyncReaderDogpile(3600)

    def some_creation_function():
        create_expensive_datafile()
        with dogpile.acquire_write_lock():
            replace_old_datafile_with_new()

    # usage:
    with dogpile.acquire(some_creation_function):
        read_datafile()
With the above pattern, :class:`.SyncReaderDogpile` will
allow concurrent readers to read from the current version
of the datafile as
the ``create_expensive_datafile()`` function proceeds with its
job of generating the information for a new version.
When the data is ready to be written, the
:meth:`.SyncReaderDogpile.acquire_write_lock` call will
block until all current readers of the datafile have completed
(that is, they've finished their own :meth:`.Dogpile.acquire`
blocks). The ``some_creation_function()`` function
then proceeds, as new readers are blocked until
this function finishes its work of
rewriting the datafile.
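Under the hood, :class:`.SyncReaderDogpile` coordinates the readers and the
writer with the :class:`.ReadWriteMutex` from ``dogpile.readwrite_lock``. A
rough sketch of the same coordination done by hand, reusing the hypothetical
``read_datafile()`` and ``replace_old_datafile_with_new()`` functions from
above::

    from dogpile.readwrite_lock import ReadWriteMutex

    rw = ReadWriteMutex()

    # reader threads; any number may hold the read lock at once
    rw.acquire_read_lock()
    try:
        read_datafile()
    finally:
        rw.release_read_lock()

    # the writer thread blocks until all read locks are released
    rw.acquire_write_lock()
    try:
        replace_old_datafile_with_new()
    finally:
        rw.release_write_lock()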
Using a Value Function with a Cache Backend
-------------------------------------------
The dogpile lock includes a more intricate mode of operation that optimizes
the use of a cache like Memcached. The difficulties Dogpile addresses
in this mode are:
* Values can disappear from the cache at any time, before our expiration
time is reached. Dogpile needs to be made aware of this and possibly
call the creation function ahead of schedule.
* There's no function in a Memcached-like system to "check" for a key without
actually retrieving it. If we have to retrieve the value in order to "check"
for it, we'd like to use that retrieved value instead of fetching it a second time.
* If we did end up generating the value on this get, we should return
that value instead of doing a cache round-trip.
To use this mode, the steps are as follows:
* Create the Dogpile lock with ``init=True``, to skip the initial
"force" of the creation function. This assumes you'd like to
rely upon the "check the value" function for the initial generation.
Leave ``init`` at its default of ``False`` if you'd like the application to
regenerate the value unconditionally when the dogpile lock is first created
(i.e. typically application startup).
* The "creation" function should return the value it creates.
* An additional "getter" function is passed to ``acquire()`` which
should return the value to be passed to the context block. If
the value isn't available, raise ``NeedRegenerationException``.
Example::

    from dogpile import Dogpile, NeedRegenerationException

    def get_value_from_cache():
        value = my_cache.get("some key")
        if value is None:
            raise NeedRegenerationException()
        return value

    def create_and_cache_value():
        value = my_expensive_resource.create_value()
        my_cache.put("some key", value)
        return value

    dogpile = Dogpile(3600, init=True)

    with dogpile.acquire(create_and_cache_value, get_value_from_cache) as value:
        return value
Note that ``get_value_from_cache()`` should not raise :class:`.NeedRegenerationException`
a second time directly after ``create_and_cache_value()`` has been called.
Using Dogpile for Caching
--------------------------
Dogpile is part of an effort to "break up" the Beaker
package into smaller, simpler components (which also work better). Here, we
illustrate how to approximate Beaker's "cache decoration"
function, to decorate any function and store the value in
Memcached. We create a Python decorator function called ``cached()`` which
will provide caching for the output of a single function. It's given
the "key" which we'd like to use in Memcached, and internally it makes
usage of its own :class:`.Dogpile` object that is dedicated to managing
this one function/key::
    import pylibmc
    mc_pool = pylibmc.ThreadMappedPool(pylibmc.Client("localhost"))

    from dogpile import Dogpile, NeedRegenerationException

    def cached(key, expiration_time):
        """A decorator that will cache the return value of a function
        in memcached given a key."""

        def get_value():
            with mc_pool.reserve() as mc:
                value = mc.get(key)
                if value is None:
                    raise NeedRegenerationException()
                return value

        dogpile = Dogpile(expiration_time, init=True)

        def decorate(fn):
            def gen_cached():
                value = fn()
                with mc_pool.reserve() as mc:
                    mc.put(key, value)
                return value

            def invoke():
                with dogpile.acquire(gen_cached, get_value) as value:
                    return value
            return invoke

        return decorate
Above we can decorate any function as::

    @cached("some key", 3600)
    def generate_my_expensive_value():
        return slow_database.lookup("stuff")
The Dogpile lock will ensure that only one thread at a time performs ``slow_database.lookup()``,
and at most once every 3600 seconds, unless Memcached has removed the value, in which
case it will be called again as needed.
In particular, Dogpile's system allows us to call the Memcached ``get()`` function at most
once per access, whereas Beaker's system calls it twice; Dogpile also doesn't make us call
``get()`` when we just created the value.
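To make the comparison concrete, a sketch of the flow each access follows
(illustrative comments only; the real logic lives inside ``Dogpile.acquire()``)::

    # 1. acquire() calls get_value() -- exactly one mc.get() per access.
    #    If the value is present and not expired, it is returned;
    #    there is no second get().
    # 2. If get_value() raises NeedRegenerationException (or the value
    #    is expired), the one thread that wins the dogpile lock calls
    #    gen_cached(), which returns the new value directly -- no
    #    round-trip to read back what was just written.
    # 3. Other threads either return the stale value, or if nothing
    #    is available at all, wait for the winner to finish.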
Scaling Dogpile against Many Keys
----------------------------------
The patterns so far have illustrated how to use a single, persistently held
:class:`.Dogpile` object which maintains a thread-based lock for the lifespan
of some particular value. The :class:`.Dogpile` object is also responsible for
maintaining the last known "creation time" of the value; this is available
on a given :class:`.Dogpile` object via the :attr:`.Dogpile.createdtime`
attribute.
For an application that may deal with an arbitrary
number of cache keys retrieved from a remote service, this approach must be
revised so that we don't need to store a :class:`.Dogpile` object for every
possible key in our application's memory.
The two challenges here are:
* We need to create new :class:`.Dogpile` objects as needed, ideally
sharing the object for a given key with all concurrent threads,
but then not hold onto it afterwards.
* Since we aren't holding the :class:`.Dogpile` persistently, we
need to store the last known "creation time" of the value somewhere
else, i.e. in the cache itself, and ensure :class:`.Dogpile` uses
it.
The approach is another one derived from Beaker, where we will use a *registry*
that can provide a unique :class:`.Dogpile` object given a particular key,
ensuring that all concurrent threads use the same object, but then releasing
the object to the Python garbage collector when this usage is complete.
The :class:`.NameRegistry` object provides this functionality, again
constructed around the notion of a creation function that is only invoked
as needed. We also will instruct the :meth:`.Dogpile.acquire` method
to use a "creation time" value that we retrieve from the cache, via
the ``value_and_created_fn`` parameter, which supersedes the
``value_fn`` we used earlier and expects a function that will return a tuple
of ``(value, createdtime)``::
    import pylibmc
    import pickle
    import time

    from dogpile import Dogpile, NeedRegenerationException, NameRegistry

    mc_pool = pylibmc.ThreadMappedPool(pylibmc.Client("localhost"))

    def create_dogpile(key, expiration_time):
        return Dogpile(expiration_time)

    dogpile_registry = NameRegistry(create_dogpile)

    def cache(expiration_time):
        # "fn" is the function that generates the value;
        # keys are used as-is here, though a real implementation
        # might hash or mangle long keys first
        def get_or_create(key, fn):
            def get_value():
                with mc_pool.reserve() as mc:
                    value = mc.get(key)
                    if value is None:
                        raise NeedRegenerationException()
                    # deserialize a tuple
                    # (value, createdtime)
                    return pickle.loads(value)

            dogpile = dogpile_registry.get(key, expiration_time)

            def gen_cached():
                value = fn()
                with mc_pool.reserve() as mc:
                    # serialize a tuple
                    # (value, createdtime)
                    value = (value, time.time())
                    mc.put(key, pickle.dumps(value))
                return value

            with dogpile.acquire(gen_cached, value_and_created_fn=get_value) as value:
                return value

        return get_or_create
Above, we use a :class:`.NameRegistry` to create a name-based "registry" of
:class:`.Dogpile` objects. This object will provide to us a :class:`.Dogpile`
object that's unique on a certain name (or any hashable object) when we call
the :meth:`.NameRegistry.get` method. When all usages of that name are
complete, the :class:`.Dogpile` object falls out of scope. This way, an
application can handle millions of keys without needing to have millions of
:class:`.Dogpile` objects persistently resident in memory.

The next part of the approach is that we tell Dogpile we'll supply the
"creation time" ourselves, stored in our cache - we do this using the
``value_and_created_fn`` argument, which assumes we'll be storing and loading
the value as a tuple of ``(value, createdtime)``. The creation time should
always be calculated via ``time.time()``. The ``acquire()`` function returns
the "value" portion of the tuple to us and uses the "createdtime" portion to
determine if the value is expired.
Using a File or Distributed Lock with Dogpile
----------------------------------------------
The example below will use a file-based mutex using `lockfile <http://pypi.python.org/pypi/lockfile>`_.
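As a sketch of that approach, assuming the ``lockfile`` package's ``FileLock``
class, which provides the ``acquire()``/``release()`` methods that the ``lock``
parameter of :class:`.Dogpile` expects::

    from lockfile import FileLock

    from dogpile import Dogpile

    # a file-based mutex synchronizes the "creation" phase across
    # processes on one host, not just across threads
    lock = FileLock("/tmp/my_dogpile.lock")

    dogpile = Dogpile(3600, lock=lock)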


@@ -1,72 +1,7 @@
"""A "dogpile" lock, one which allows a single thread to generate
an expensive resource while other threads use the "old" value, until the
"new" value is ready.
Usage::

    # store a reference to a "resource", some
    # object that is expensive to create.
    the_resource = [None]

    def some_creation_function():
        # create the resource here
        the_resource[0] = create_some_resource()

    def use_the_resource():
        # some function that uses
        # the resource.  Won't reach
        # here until some_creation_function()
        # has completed at least once.
        the_resource[0].do_something()

    # create Dogpile with 3600 second
    # expiry time
    dogpile = Dogpile(3600)

    with dogpile.acquire(some_creation_function):
        use_the_resource()
Above, ``some_creation_function()`` will be called
when :meth:`.Dogpile.acquire` is first called. The
block then proceeds. Concurrent threads which
call :meth:`.Dogpile.acquire` during this initial period
will block until ``some_creation_function()`` completes.
Once the creation function has completed successfully,
new calls to :meth:`.Dogpile.acquire` will route a single
thread into new calls of ``some_creation_function()``
each time the expiration time is reached. Concurrent threads
which call :meth:`.Dogpile.acquire` during this period will
fall through, and not be blocked. It is expected that
the "stale" version of the resource remain available at this
time while the new one is generated.
The dogpile lock can also provide a mutex to the creation
function itself, so that the creation function can perform
certain tasks only after all "stale reader" threads have finished.
The example of this is when the creation function has prepared a new
datafile to replace the old one, and would like to switch in the
"new" file only when other threads have finished using it.
To enable this feature, use :class:`.SyncReaderDogpile`.
Then use :meth:`.SyncReaderDogpile.acquire_write_lock` for the critical section
where readers should be blocked::

    from dogpile import SyncReaderDogpile

    dogpile = SyncReaderDogpile(3600)

    def some_creation_function():
        create_expensive_datafile()
        with dogpile.acquire_write_lock():
            replace_old_datafile_with_new()
"""
from util import thread, threading
import time
import logging
from readwrite_lock import ReadWriteMutex
from nameregistry import NameRegistry
log = logging.getLogger(__name__)
@@ -80,36 +15,41 @@ class NeedRegenerationException(Exception):
NOT_REGENERATED = object()
class Dogpile(object):
"""Dogpile class.
"""Dogpile lock class.
:param expiretime: Expiration time in seconds.
Provides an interface around an arbitrary mutex that allows one
thread/process to be elected as the creator of a new value,
while other threads/processes continue to return the previous version
of that value.
"""
def __init__(self, expiretime, init=False):
self.dogpilelock = threading.Lock()
def __init__(self, expiretime, init=False, lock=None):
"""Construct a new :class:`.Dogpile`.
:param expiretime: Expiration time in seconds.
:param init: if True, set the 'createdtime' to the
current time.
:param lock: a mutex object that provides
``acquire()`` and ``release()`` methods.
"""
if lock:
self.dogpilelock = lock
else:
self.dogpilelock = threading.Lock()
self.expiretime = expiretime
if init:
self.createdtime = time.time()
else:
self.createdtime = -1
    @classmethod
    def registry(cls, *arg, **kw):
        """Return a name-based registry of :class:`.Dogpile` objects.

        The registry is an instance of :class:`.NameRegistry`,
        and calling its ``get()`` method with an identifying
        key (anything hashable) will construct a new :class:`.Dogpile`
        object, keyed to that key. Subsequent usages will return
        the same :class:`.Dogpile` object for as long as the
        object remains in scope.

        The given arguments are passed along to the underlying
        constructor of the :class:`.Dogpile` class.

        """
        return NameRegistry(lambda identifier: cls(*arg, **kw))
    createdtime = -1
    """The last known 'creation time' of the value,
    stored as an epoch (i.e. from ``time.time()``).

    If the value here is -1, it is assumed the value
    should recreate immediately.

    """
    def acquire(self, creator,
                value_fn=None,
@@ -129,8 +69,7 @@ class Dogpile(object):
        lock.  This option removes the need for the dogpile lock
        itself to remain persistent across usages; another
        dogpile can come along later and pick up where the
        previous one left off.  Should be used in conjunction
        with a :class:`.NameRegistry`.
        previous one left off.

        """
        dogpile = self
@@ -214,11 +153,22 @@ class Dogpile(object):
pass
class SyncReaderDogpile(Dogpile):
    """Provide a read-write lock function on top of the :class:`.Dogpile`
    class.

    """
    def __init__(self, *args, **kw):
        super(SyncReaderDogpile, self).__init__(*args, **kw)
        self.readwritelock = ReadWriteMutex()

    def acquire_write_lock(self):
        """Return the "write" lock context manager.

        This will provide a section that is mutexed against
        all readers/writers for the dogpile-maintained value.

        """
        dogpile = self
        class Lock(object):
            def __enter__(self):


@@ -10,6 +10,8 @@ class NameRegistry(object):
    class MyFoo(object):
        "some important object."

        def __init__(self, identifier):
            self.identifier = identifier

    registry = NameRegistry(MyFoo)
@@ -19,24 +21,52 @@ class NameRegistry(object):
    # thread 2
    my_foo = registry.get("foo1")
Above, "my_foo" in both thread #1 and #2 will
be *the same object*.
Above, ``my_foo`` in both thread #1 and #2 will
be *the same object*. The constructor for
``MyFoo`` will be called once, passing the
identifier ``foo1`` as the argument.
When thread 1 and thread 2 both complete or
otherwise delete references to "my_foo", the
object is *removed* from the NameRegistry as
otherwise delete references to ``my_foo``, the
object is *removed* from the :class:`.NameRegistry` as
a result of Python garbage collection.
:class:`.NameRegistry` is a utility object that
is used to maintain new :class:`.Dogpile` objects
against a certain key, for as long as that particular key
is referenced within the application. An application
can deal with an arbitrary number of keys, ensuring that
all threads requesting a certain key use the same
:class:`.Dogpile` object, without the need to maintain
each :class:`.Dogpile` object persistently in memory.
"""
    _locks = weakref.WeakValueDictionary()
    _mutex = threading.RLock()

    def __init__(self, creator):
        """Create a new :class:`.NameRegistry`.

        :param creator: A function that will create a new
          value, given the identifier passed to the :meth:`.NameRegistry.get`
          method.

        """
        self._values = weakref.WeakValueDictionary()
        self._mutex = threading.RLock()
        self.creator = creator
    def get(self, identifier, *args, **kw):
        """Get and possibly create the value.

        :param identifier: Hash key for the value.
          If the creation function is called, this identifier
          will also be passed to the creation function.
        :param \*args, \**kw: Additional arguments which will
          also be passed to the creation function if it is
          called.

        """
        try:
            if identifier in self._values:
                return self._values[identifier]


@@ -4,7 +4,17 @@ except ImportError:
import dummy_threading as threading
class ReadWriteMutex(object):
    """A mutex which allows multiple readers, single writer."""
    """A mutex which allows multiple readers, single writer.

    :class:`.ReadWriteMutex` uses a Python ``threading.Condition``
    to provide this functionality across threads within a process.

    The Beaker package also contained a file-lock based version
    of this concept, so that readers/writers could be synchronized
    across processes with a common filesystem.  A future Dogpile
    release may include this additional class at some point.

    """
    def __init__(self):
        # counts how many asynchronous methods are executing
@@ -17,6 +27,7 @@ class ReadWriteMutex(object):
        self.condition = threading.Condition(threading.Lock())

    def acquire_read_lock(self, wait = True):
        """Acquire the 'read' lock."""
        self.condition.acquire()
        try:
            # see if a synchronous operation is waiting to start
@@ -37,6 +48,7 @@ class ReadWriteMutex(object):
            return True

    def release_read_lock(self):
        """Release the 'read' lock."""
        self.condition.acquire()
        try:
            self.async -= 1
@@ -55,6 +67,7 @@ class ReadWriteMutex(object):
            self.condition.release()

    def acquire_write_lock(self, wait = True):
        """Acquire the 'write' lock."""
        self.condition.acquire()
        try:
            # here, we are not a synchronous reader, and after returning,
@@ -91,6 +104,7 @@ class ReadWriteMutex(object):
            return True

    def release_write_lock(self):
        """Release the 'write' lock."""
        self.condition.acquire()
        try:
            if self.current_sync_operation is not threading.currentThread():