tag:blogger.com,1999:blog-87606401981360262122024-03-13T16:56:50.914-07:00ZevilsMore fun than a gallon of strawberries.Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comBlogger18125tag:blogger.com,1999:blog-8760640198136026212.post-54126643341528387332021-12-03T10:47:00.000-08:002021-12-03T10:47:32.119-08:00Quick Ranked-Choice Elections with Google FormsWant to run a really quick ranked-choice election, like "which restaurant should we go to" or "where should we ask the city to build a crosswalk"? See <a href="https://forms.gle/hsLK2DeLmY8nocfW6">here</a> for an example:
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha9cNZjWwh4hQc9FWSRbJISyBdxyD1Mayoidw9PQtJD03yTfx4qD-Gdfot2-8bJ-uMe3trvSZ8AkKfPtqtmPqZgdWQhb6Gr6id-FeqTXMoTq9KZO-NQyjZz0eqvaMh7zLKWwIISBbn3sA/s652/Screenshot+2021-12-03+102243.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="439" data-original-width="652" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha9cNZjWwh4hQc9FWSRbJISyBdxyD1Mayoidw9PQtJD03yTfx4qD-Gdfot2-8bJ-uMe3trvSZ8AkKfPtqtmPqZgdWQhb6Gr6id-FeqTXMoTq9KZO-NQyjZz0eqvaMh7zLKWwIISBbn3sA/s320/Screenshot+2021-12-03+102243.png"/></a></div>
Here's one way to do it:
<ol>
<li><a href="https://forms.new/">Create a new Google Form.</a></li>
<li>In the form description, explain each of the choices.</li>
<li>Add a "multiple choice grid" question.</li>
<li>In the "rows" of the question, add one row for each choice: "Chocolate", "Vanilla", etc.</li>
<li>In the "columns" of the question, add a "rank number" for each choice: "1st", "2nd", etc.</li>
<li>In the "three dots" menu at the bottom-right of the question, turn on "limit to one response per column": <div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJNKpJYLZWXM_GNDftm9steP65zkOg28gYEWJZYGlGEJcJ0eLavV1BDQXXLbneGo3ohQZoER1gxx0CAInAy4IlzhhRaF9NOB1pV7f5Y7_S9WHik54_T4Ygvl5kDF7FbRI0mB0QQsYwqGg/s724/Screenshot+2021-12-03+102835.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="200" data-original-height="255" data-original-width="724" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJNKpJYLZWXM_GNDftm9steP65zkOg28gYEWJZYGlGEJcJ0eLavV1BDQXXLbneGo3ohQZoER1gxx0CAInAy4IlzhhRaF9NOB1pV7f5Y7_S9WHik54_T4Ygvl5kDF7FbRI0mB0QQsYwqGg/s200/Screenshot+2021-12-03+102835.png"/></a></div></li>
<li>Send out the form and wait for people to vote.</li>
<li>Once the votes are in, go to the "Response" tab of the form and export the ballots to a CSV using the option under the "three dots" menu: <div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwvEPffKQTazofDTiDhyvCOOdJPhkebOYg-yLOLNi8Leq5mrLBpSty6IXslpuoUpNfV04V27zGMJ66U7VKRshgNSEqQFKik88Hv51mu8hNZGGIMOeWizSbcQf4vzAlrxtQCV5amtCg9YY/s820/Screenshot+2021-12-03+104430.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="273" data-original-width="820" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwvEPffKQTazofDTiDhyvCOOdJPhkebOYg-yLOLNi8Leq5mrLBpSty6IXslpuoUpNfV04V27zGMJ66U7VKRshgNSEqQFKik88Hv51mu8hNZGGIMOeWizSbcQf4vzAlrxtQCV5amtCg9YY/s320/Screenshot+2021-12-03+104430.png"/></a></div></li>
<li>Download <a href="https://github.com/matthewg/Zevils/tree/main/ballots.py">ballots.py</a> and <tt>pip install pyrankvote</tt>.</li>
<li>Adjust <tt>NUMBER_OF_SEATS</tt> in ballots.py to be the number of candidates you want to elect, e.g. how many flavors are you going to buy?</li>
<li>Unzip the ballot CSV, then run ballots.py with the CSV on standard input: <tt>./ballots.py < ballots.csv</tt></li>
</ol>
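<p>To make the counting step less of a black box, here is a minimal single-winner instant-runoff sketch using only the standard library. It is not what ballots.py does (that script relies on pyrankvote and supports <tt>NUMBER_OF_SEATS</tt>); the function names, the sample data, and the assumed CSV layout (one column per choice, headed like <tt>Flavor [Chocolate]</tt>, holding "1st", "2nd", etc.) are illustrative assumptions, and ties are broken arbitrarily by name.</p>

```python
import csv
import io
from collections import Counter

# Hypothetical sample export; the column layout is an assumption.
SAMPLE_CSV = (
    "Timestamp,Flavor [Chocolate],Flavor [Vanilla],Flavor [Strawberry]\n"
    "t1,1st,2nd,3rd\n"
    "t2,1st,3rd,2nd\n"
    "t3,2nd,1st,3rd\n"
    "t4,3rd,1st,2nd\n"
    "t5,2nd,3rd,1st\n"
)


def parse_ballots(csv_text):
    """Turn the CSV into ballots: lists of choices, most preferred first."""
    ballots = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ranked = []
        for header, value in row.items():
            # Only columns shaped like "Question [Choice]" carry a ranking.
            if header and "[" in header and header.endswith("]") and value:
                choice = header[header.index("[") + 1:-1]
                digits = "".join(ch for ch in value if ch.isdigit())
                if digits:
                    ranked.append((int(digits), choice))
        ballots.append([choice for _, choice in sorted(ranked)])
    return ballots


def instant_runoff(ballots):
    """Return the instant-runoff winner, or None if there are no choices."""
    remaining = {choice for ballot in ballots for choice in ballot}
    while remaining:
        # Count each ballot for its highest-ranked choice still in the race.
        tally = Counter({choice: 0 for choice in remaining})
        for ballot in ballots:
            top = next((c for c in ballot if c in remaining), None)
            if top is not None:
                tally[top] += 1
        total = sum(tally.values())
        leader, votes = max(tally.items(), key=lambda kv: (kv[1], kv[0]))
        if len(remaining) == 1 or votes * 2 > total:
            return leader
        # No majority yet: drop the weakest choice and count again.
        loser = min(tally.items(), key=lambda kv: (kv[1], kv[0]))[0]
        remaining.discard(loser)
    return None
```

<p>With the sample data, <tt>instant_runoff(parse_ballots(SAMPLE_CSV))</tt> eliminates Strawberry in the first round and elects Chocolate in the second.</p>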
Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-69780646063141329662021-09-08T16:19:00.006-07:002021-09-08T16:31:01.521-07:00Maslow's Hierarchy of Engineering Team Needs<blockquote><em>Management is the continuation of prioritization by other means.</em> —Carl von Clausewitz</blockquote>
<p><i><a href="https://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs">Maslow's Hierarchy of Needs</a></i> is the idea that people have an inherent and universal set of priorities. If you don't have enough air to breathe or water to drink, you'd better prioritize solving that problem, or you won't be around for very long to solve any other problems. Once you have that sorted out, you can focus on loftier goals, such as "feeling loved", "experiencing beauty", and "living a meaningful life". Software engineering teams have a similar natural hierarchy, and if you're leading one of them, thinking about the lowest point in the hierarchy where your team is struggling is a good way to decide how to invest your time.</p>
<p><em>This post is also available as a <a href="https://twitter.com/mattsachs/status/1435747223996755969">Twitter thread</a>.</em></p>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjG44mc7Ud06W9KIqsVFDcfyjf4MtzaetIgQfZwp54_SSsEMTGDEPtB03AtEt4uYy5z9UfMX7jgkXsxc23xZQLJ9vpFKEMRgr0YK53wSeOV2mWGyNJ5Vvb1kOgXmH0V-8qjJmHRw2Z_Xj4/s0/Maslows+Hierarchy.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="547" data-original-width="593" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjG44mc7Ud06W9KIqsVFDcfyjf4MtzaetIgQfZwp54_SSsEMTGDEPtB03AtEt4uYy5z9UfMX7jgkXsxc23xZQLJ9vpFKEMRgr0YK53wSeOV2mWGyNJ5Vvb1kOgXmH0V-8qjJmHRw2Z_Xj4/s0/Maslows+Hierarchy.png"/></a></div>
Just as the overriding goal of life is to maximize the spread of its genes and ideas, the overriding goal of an engineering team is to maximize the value it creates for its users. Here are the sub-goals on the way to achieving this, from the base of the hierarchy upward:
<h1>Existence</h1>
Congratulations, you have an idea, or an organizational charter, or some other mandate to do something! If you don't have a team to do it with, you're probably not going to do that something. Maybe that team is just you, maybe it's scads of highly specialized non-you people... But a team of zero is not much of a team at all.
<h1>Culture</h1>
If your team is a terrible place to work, nobody's going to work there for long, and those who do aren't going to deliver very good work. Create an environment where people can succeed, where it's <a href="https://en.wikipedia.org/wiki/Psychological_safety">safe to take risks</a>, and so on. If you want people to accomplish anything, create the conditions that enable it.
<h1>Engineering Velocity</h1>
<a href="https://www.poetryfoundation.org/poems/46565/ozymandias">Look upon thy works ye mighty and despair,</a> you have a team of more than zero people, and they're trying to write software instead of stabbing each other in the back and hunting for a less-crappy job! Are they successfully writing software? If it's impossible to get anything done because you have no tests, or your codebase is sued by Olive Garden for theft of trade secrets, or your documentation aspires to <em><a href="https://lithub.com/finnegans-wake-at-80-in-defense-of-the-difficult/">Finnegans Wake</a></em> levels of clarity or <em><a href="https://en.wikipedia.org/wiki/The_Winds_of_Winter">The Winds of Winter</a></em> levels of existence... your team is not an effective deliverer of value for your users.
<h1>Reliability</h1>
Yay, you can write software! I tried to give you a medal, but you were busy fighting production fires. And, oddly, I couldn't find a customer to write a testimonial, in spite of the large numbers of them amassed outside your headquarters with torches and pitchforks. (Next time you're going to leak and delete all of their data, try not to do it in that order.) Unreliable software doesn't deliver much value.
<h1>Market Exists</h1>
Ok, now we're actually getting somewhere. You have a team, they're writing software and doing it well. Does anyone care? Unless you're solving a problem that someone actually has, the answer is no.
<h1>Product-Market Fit</h1>
You're attempting to solve a problem that people actually have. Does your solution actually solve the problem? If so, you've achieved the vaunted <a href="https://twitter.com/shreyas/status/1426594663671107585">product-market fit</a>. If not, congratulations on identifying a problem that needs solving, but your value comes from solving it, and you're not there yet.
<h1>Out-Compete</h1>
If you're effectively solving an important problem, but someone else is solving it better, why would anyone use <em>your</em> solution? If the answer is "they don't", then you're not actually creating value for anybody.
<h1>Delight</h1>
If you do this well, people will be <em>happy</em> that they're using your software. They'll want to use more of it, they'll want to tell other people to use it, and so on. And so more people will use it. How can you fail here if you're doing everything else on this list? Maybe your solution works pretty well but is awkward to use. Maybe it has a price or licensing terms or other cost that people will only just barely tolerate. Maybe your reputation or that of your company is terrible, so folks <em>hate</em> that your solution is the best option for them.Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-1047079689580297232011-06-23T19:59:00.000-07:002011-06-23T19:59:49.355-07:00Act IV, Scene I<span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px;"></span><br />
<div class="im" style="color: #500050;">
<blockquote class="gmail_quote" style="border-left-color: rgb(204, 204, 204); border-left-style: solid; border-left-width: 1px; margin-bottom: 0px; margin-left: 0.8ex; margin-right: 0px; margin-top: 0px; padding-left: 1ex;">
- what wizardry is this!? svn supports symlinks?</blockquote>
<div>
<br /></div>
</div>
<div>
<span style="font-family: Times; font-size: small;"><i>A dark Filesystem. In the middle, a Repository boiling. Thunder.</i><br />
<i>Enter the three</i> Programmers.<br />
1 P<span>ROGRAMMER. </span> Thrice the padded buf hath oe'r runn'd.<br /> 2 <span><span style="font-size: small;">P<span>ROGRAMMER</span></span>. </span> Thrice and once, the platter spun.<br /> 3 <span><span style="font-size: small;">P<span>ROGRAMMER</span></span>. </span> Hexate cries:—'tis time! 'tis time!<br /> 1 <span><span style="font-size: small;">P<span>ROGRAMMER</span></span>. </span> Round about the repo go;<br /> In spaghetti'd source code throw.—<br /> Functors, that on blackest ARM,<br /> Caused a user grievous harm;<br /> Refactor'd business logic got,<br /> Compile first i' the charmed pot!<br /> A<span>LL. </span> Double, double toil and trouble;<br /> Cycles burn, and repo bubble.<br /> 2 <span><span style="font-size: small;">P<span>ROGRAMMER</span></span>. </span> Mock-up of an inode struct,<br /> In the repo run amock;<br /> Superblock, corrupt extent,<br /> File flat, and file bent,<br /> Meta data, magic prop,<br /> B-tree hash, and sign that's dropped,—<br /> For VCS of powerful trouble,<br /> Like a hell-broth boil and bubble.<br /> A<span>LL. </span> Double, double toil and trouble;<br /> Cycles burn, and repo bubble.<br /> 3 <span><span style="font-size: small;">P<span>ROGRAMMER</span></span>. </span> Shard of cluster; meg of RAM;<br /> ASIC ripped from Don Knuth's pram;<br /> Recursive matrix transform hack;<br /> Toggle switch that won't switch back;<br /> Heap address that ain't been writ,<br /> Yet has a quine contained in it;<br /> NSA encryption key;<br /> Source code for an AI bee;<br /> Lambda of Alonzo Church<br /> Found by Turing's A* search;—<br /> Document this noxious gruel<br /> With Microsoft's new WinWord tool.<br /> A<span>LL. </span> Double, double toil and trouble;<br /> Cycles burn, and repo bubble.<br /> 2 <span><span style="font-size: small;">P<span>ROGRAMMER</span></span>. </span> Cool it with a peltier,<br /> Then the code can ship, hooray!<br />
<div>
<br /></div>
</span></div>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-11219653830288755202010-11-14T08:09:00.000-08:002011-06-04T09:52:45.527-07:00Multithreaded Python, extensions, and static data<h3>The GIL</h3><h4>The GIL and Context Switching</h4><p>I use <a href="http://wiki.python.org/moin/boost.python">Boost.Python</a> to write some C++ extensions for Python. I work on an I/O-bound Python program; Python has a <a href="http://wiki.python.org/moin/GlobalInterpreterLock">global interpreter lock</a> ("the GIL") which means that in a multi-threaded program, only one thread can be executing code inside the Python interpreter at once. Now, a thread can drop the GIL, and the built-in Python read and write routines do this so that while one thread is doing I/O, another thread can run. However, due to a peculiarity in how the GIL is implemented,<sup><a href="#gil-fn-1" id="gil-fnref-1">1</a></sup> even though the actual I/O takes place during system calls that drop the GIL, the need to re-acquire the GIL after every I/O operation was killing our performance.</p><p>For instance, one of the things that the application does a lot of is logging. Doing the logging synchronously -- as the code is executing and it wants to write something to its log, it needs to wait for the write to the logfile to finish before it can continue going about its business -- turned out to be a bottleneck. My first attempt to do something about that was to spawn off a separate thread for the logfile, and have logging look something like this:</p>
<pre lang="python">
import threading

class Logger:
    def __init__(self):
        self._buffer = []
        self._file = open("foo.log", "w")
        self._lock = threading.Lock()
        # The condition shares the buffer's lock, so wait() and notify()
        # are always called with that lock held.
        self._hasMessages = threading.Condition(self._lock)

    def writeEntriesForever(self):
        while True:
            with self._lock:
                while len(self._buffer) == 0:
                    self._hasMessages.wait()
                messages = self._buffer
                self._buffer = []
            # Write outside the lock so log() never blocks on file I/O.
            self._file.write("".join(messages))

    def log(self, message):
        with self._lock:
            self._buffer.append(message)
            self._hasMessages.notify()

def main():
    logger = Logger()
    t = threading.Thread(target=logger.writeEntriesForever)
    t.start()
    logger.log("foo")
    logger.log("bar")
</pre>
<p>
That did get the I/O off of the main thread. However, while the <code>write</code> call in <code>Logger.writeEntriesForever</code> would make the logging thread drop the GIL, allowing the main thread to continue executing, the logging thread would need to reacquire the GIL when <code>write</code> returned. Now, it'd drop the GIL <em>again</em> while <code>wait</code>ing, which is where the thread would spend most of its time, but then it'd need to acquire the GIL again between the end of <code>wait</code> and the start of <code>write</code>. All of these context switches were almost completely negating any performance win from offloading the actual I/O to a separate thread.
</p>
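<p>For comparison, the same producer/consumer shape can be written with the standard library's <code>queue.Queue</code>, which does the locking and waiting internally. This is only a sketch of the pattern: <code>QueueLogger</code>, its <code>close</code> method, and the stop sentinel are names made up for illustration, and the worker is still pure Python, so it has to reacquire the GIL after every blocking call exactly as described above.</p>

```python
import io
import queue
import threading


class QueueLogger:
    """Background logger: log() enqueues, a worker thread does the writes."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, stream):
        self._stream = stream
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._write_entries, daemon=True)
        self._thread.start()

    def _write_entries(self):
        while True:
            message = self._queue.get()  # blocks until something is queued
            if message is self._STOP:
                return
            self._stream.write(message)

    def log(self, message):
        self._queue.put(message)

    def close(self):
        """Write out everything queued so far, then stop the worker."""
        self._queue.put(self._STOP)
        self._thread.join()
```

<p>Unlike the hand-rolled version, this one has a way to shut down: <code>close()</code> queues the sentinel behind any pending messages and waits for the worker to drain them.</p>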
<h4>Boost.Python</h4>
<p>
Enter Boost.Python. The GIL is only needed when using the Python interpreter, so if the entire body of <code>writeEntriesForever</code> doesn't <em>need</em> the interpreter, the thread can drop the GIL as soon as it enters that method and never reacquire it. This means writing that method in some language other than Python, which Boost.Python makes easy.
</p><p>
The way Boost.Python works, you wind up compiling and linking your C++ code into a dynamic library, and that library is a Python extension. In the example above, the new Python code would look like:
</p>
<pre lang="python">
import threading
from logging_extension import Logger

def main():
    logger = Logger()
    t = threading.Thread(target=logger.writeEntriesForever)
    t.start()
    logger.log("foo")
    logger.log("bar")
</pre>
<p>
and then you'd have C++ code that would look something like:
</p>
<pre lang="c++">
#include <fstream>
#include <list>
#include <string>
#include <boost/python.hpp>
#include <boost/thread.hpp>

class ScopedGILRelease {
    // The GIL is released when an instance of this class comes into scope
    // and reacquired when it goes out of scope.
public:
    ScopedGILRelease() { m_thread_state = PyEval_SaveThread(); }
    ~ScopedGILRelease() {
        PyEval_RestoreThread(m_thread_state);
        m_thread_state = NULL;
    }
private:
    PyThreadState *m_thread_state;
};

class Logger {
public:
    Logger();
    void log(boost::python::str message);
    void writeEntriesForever();
private:
    typedef std::list<std::string> LogBuffer;
    LogBuffer buffer;
    std::ofstream file;
    boost::condition_variable hasMessages;
    boost::mutex mutex;
};

Logger::Logger() : file("foo.log") {}

void Logger::log(boost::python::str message) {
    // Copy the text out of the Python object while we still hold the GIL.
    std::string text = boost::python::extract<std::string>(message);
    ScopedGILRelease noGIL; // Drop GIL before acquiring mutex to avoid deadlock.
    boost::mutex::scoped_lock lock(mutex);
    buffer.push_back(text);
    hasMessages.notify_one();
}

void Logger::writeEntriesForever() {
    ScopedGILRelease noGIL;
    while (true) {
        LogBuffer messages;
        {
            boost::mutex::scoped_lock lock(mutex);
            while (buffer.empty()) hasMessages.wait(lock);
            messages.swap(buffer); // Take the pending messages, leave an empty buffer.
        }
        for (LogBuffer::iterator i = messages.begin(); i != messages.end(); ++i) {
            file << *i;
        }
        file.flush();
    }
}

BOOST_PYTHON_MODULE(logging_extension)
{
    using namespace boost::python;
    class_<Logger, boost::noncopyable>("Logger")
        .def("log", &Logger::log)
        .def("writeEntriesForever", &Logger::writeEntriesForever)
    ;
}
</pre>
<h3>Python Extensions and Static Data</h3>
<p>
So, that all worked fine and dandy until I did three things.
</p>
<h4>Multiple Extensions</h4>
<p>
The first thing that caused a problem was that I also decided to move the code for interacting with Oracle into a Boost.Python extension. For the usual reasons, I didn't want to have that code <em>and</em> the logging code in one big honking library of doom, so I put it in its own extension; there were now <code>logging.so</code> and <code>oracle.so</code>.
</p>
<h4>Static Data</h4>
<p>
The second thing that caused a problem is that our logging code is actually more complicated than the example above. We have a syslog-like framework where there are different categories of log message, and the app can be configured so that different categories have different log levels. There are a lot of <code>LOG_DEBUG</code> statements in the application, but if none of the logging categories are configured to be that verbose, those statements will never actually make it into the log.
</p><p>
Since logging settings are application-wide, and it'd be ugly to have to pass around a "logging state" object everywhere (or for that matter, an instance of <code>Logger</code>), I used static data for that:
</p>
<pre lang="c++">
static std::map<destination, Logger> theLoggers; //Map logging destination to the logger object.
static std::map<category, level> theSettings; //Map logging category to its log level.
</pre>
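<p>The real framework is C++, but the idea of application-wide, per-category log levels is easy to sketch in Python. Everything here (<code>Level</code>, <code>LogSettings</code>, <code>the_settings</code>, <code>do_log</code>, and the list used as a log sink) is a hypothetical stand-in for illustration, not the actual extension's API.</p>

```python
import enum


class Level(enum.IntEnum):
    DEBUG = 10
    INFO = 20
    ERROR = 30


class LogSettings:
    """Application-wide map from logging category to its minimum level."""

    def __init__(self, default=Level.INFO):
        self._default = default
        self._levels = {}

    def set_level(self, category, level):
        self._levels[category] = level

    def is_enabled(self, category, level):
        # Categories without an explicit setting fall back to the default.
        return level >= self._levels.get(category, self._default)


# Module-level ("static") state: every caller in the process shares it.
the_settings = LogSettings()


def do_log(level, category, message, sink):
    """Append the message to sink only if the category is verbose enough."""
    if the_settings.is_enabled(category, level):
        sink.append(f"[{category}] {level.name}: {message}")
```

<p>The important property is that <code>the_settings</code> exists exactly once per process, so a level change made in one place is seen by every caller.</p>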
<h4>Using One Extension From Another</h4>
<p>
The third thing was that I wanted to actually log things from inside the database interaction code. Simply including the logging headers from inside the DB sources didn't work:
</p>
<pre lang="c++">
#include "../logging/logging.hpp"
logging::doLog(logging::LEVEL_DEBUG, logging::CATEGORY_DATABASE, "Hello, world!");
</pre>
<p>
It would compile, but it wouldn't run because the symbols from logging.so were unresolved. Okay, easy enough to fix. I added <code>-l:logging.so</code> to the link line for <code>oracle.so</code> and went about my merry business.
</p>
<h4>Symbol Visibility</h4>
<p>
This looked like it worked, but none of the messages from <code>oracle.so</code> were actually making it into the log! I thought I must be doing something threading-related incorrectly, or something. But, eventually, while debugging in GDB I noticed an odd message.
</p>
<pre>
(gdb) b
Breakpoint 1 at 0x1234567: file logging.cpp, line 6. (2 locations)
</pre>
<p>
2 locations? Oh. Well, that was the problem. To load the dynamic library at runtime on Linux, Python uses the <code>dlopen</code> function. The documentation for dlopen mentions, in the description of the <code>flag</code> argument, that the <code>RTLD_LOCAL</code> flag (the default) means that "symbols defined in this library are not made available to resolve references in subsequently loaded libraries." This meant that when Python loaded <code>oracle.so</code>, ld.so would map in a new copy of <code>logging.so</code> (because <code>oracle.so</code> was linked to it), ignoring the copy pulled in when Python did <code>dlopen("logging.so", RTLD_LOCAL);</code>. So when Python routines called <code>logging</code> functions, they got one copy of the static data, while when <code>oracle</code> routines called <code>logging</code> functions, they got their own copy of the static data! As a result, the database code wasn't seeing any of the logging settings changes made from the Python code.
</p><p>
My solution was to stop linking <code>oracle.so</code> against <code>logging.so</code>, and to create a new <code>pre_c_import.py</code>, with a dire warning in a comment that before importing any Boost.Python extensions, one <em>must</em> import this file:
</p>
<pre lang="python">
###---*** IMPORTANT! Before importing any Boost.Python modules, you *must*
###---*** import this, e.g.:
###---*** import pre_c_import
###---*** import c_extensions.logging
###---*** import c_extensions.oracle
###---*** If you don't, your import will fail with unresolved symbol errors.
###---*** Whenever you add a new extension module, you should add it to
###---*** the import below.
import sys, DLFCN
current = sys.getdlopenflags()
sys.setdlopenflags(current | DLFCN.RTLD_GLOBAL)
import c_extensions.logging
import c_extensions.oracle
sys.setdlopenflags(current)
</pre>
<p><code>RTLD_GLOBAL</code> is the opposite of <code>RTLD_LOCAL</code>, so now the extensions were able to see each other's symbols, and everything was happy.</p>
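<p>The <code>DLFCN</code> module used above only exists in Python 2; in Python 3 the same constants live in <code>os</code>. Here is a sketch of how the flag juggling could be wrapped in a context manager there, assuming a Unix platform where <code>sys.setdlopenflags</code> is available. The name <code>global_symbols</code> is made up for illustration.</p>

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def global_symbols():
    """Temporarily add RTLD_GLOBAL to the flags Python passes to dlopen()."""
    previous = sys.getdlopenflags()
    sys.setdlopenflags(previous | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore the old flags even if an import inside the block fails.
        sys.setdlopenflags(previous)


# Intended use, with the extension modules from pre_c_import.py:
#
#     with global_symbols():
#         import c_extensions.logging
#         import c_extensions.oracle
```

<p>The <code>try</code>/<code>finally</code> matters: if one of the imports raises, the interpreter should not be left loading every later extension with <code>RTLD_GLOBAL</code>.</p>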
<div class="footnotes">
<hr />
<ol>
<li id="gil-fn-1"><p>Dave Beazley has <a href="http://dabeaz.blogspot.com/2010/01/python-gil-visualized.html">a good explanation</a> and <a href="http://blip.tv/file/2232410">hour-long video</a> describing the issues. <a href="#gil-fnref-1">↩</a></p></li>
</ol>
</div>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-66207158791809721372009-09-16T06:21:00.000-07:002011-06-04T09:59:26.915-07:00Sloppy Graph, Sloppy Design<p>I was spending time trying to fine-tune a <a href="http://www.graphviz.org/">graphviz</a> file documenting the call graph of a piece of code and describing some of the critical functions. graphviz isn't really designed for the kind of long node labels I wanted to give it, so it would do things like put nodes in places which made it have to draw arrows reaching clear across the page.<br/><br/>Finally I realized that rather than trying to talk graphviz into reordering its nodes, I could just refactor the thing I was graphing so that the flow wasn't so darned convoluted in the first place.<br/><br/>Before (image links to full size version):<br/><br/><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixtHPZ7FL2b4-rDyRWztL51vUCOXiTEAjtv4Ukq-32QpVcfJMsXQvC2JNKHicyhZefwz06tedUVG_f5Pqjf_n7QMm7tbroKYR__uhNok33NMd0k1gi8t9mMRtF1sB2vQmvukX5UXC3D2M/s400/calls-old-greek.png
"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPhXiT1wcf89FamMD5FBDSjyeFKuHrxupgSbr2j2M3p5KYoHpP7Bo5sRneCQ1mRH_XOu-EkVtUYcr8gO8wygbBVASUEkMg9R8Z1JccWW1ByPiKaxx4LLQXAwwAuFi8fimcwjy8D-OO3iU/s400/calls-old-greek-300x227.png" alt="calls-old-greek" title="calls-old-greek" width="300" height="227" class="alignnone size-medium wp-image-102" /></a><br/><br/>After (image links to full size version):<br/><br/><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbkvUE8dpNZ-2w2GOslpgZ0DUp3fNyt63o53nyMFrU8uSdQy1z-U_7lkGLC4Qo5uFXRT4EuR3dES3Ehp4Bj6E5c_LVMOo9JvZeC8CFb5nfiiYhFXqbJdY_4hwkjBDb2nVmv3dn2dQSYkk/s400/calls-new-greek.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjakc4YoitrYfXq0SjOR5VbiCxnbZtq3zrcbmJaHYBFQZ-QMgaksM9V85M-VAy5CPnUmKDsCgYvjb0w-7-Wy47I74YVgBwan4yW9YTwGumM4zEJUHE0MhTCZh9JxkVo39U5VY7SGtM8qJ4/s400/calls-new-greek-300x131.png" alt="calls-new-greek" title="calls-new-greek" width="300" height="131" class="alignnone size-medium wp-image-103" /></a><br/><br/>Corollary? 
If it's hard to get your call flow graph to look pretty, well, the graph isn't the only thing that's ugly...</p><br />Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-58146213047158820342008-12-22T01:54:00.001-08:002012-09-11T20:40:08.550-07:00An Avocado in the Snow<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbJ3bw-mJHt-5YvLdTL9-HZd6chbOj2q1kUakp0slnxfQGnHnyy2n8rp12eHYsIwSTeWT599Tumg-F83dLjROkMwXZsWIKjTJQIJebOqetAkk6kICjaypJ9h1JJVLyonn7YztCMDqMT9c/s400/avocado.jpg"><img alt="An avocado" class="size-medium wp-image-90" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIqpHBH60D4vrpD6Eh4hJpM1Yh27zqrdn2Fq1GFXXl0TU3c14K7fxVSMFcVw6tEdj2jfwqYME5amr-PSef6BkVL6ySPE0mbv9RjPrk0Qyq3WYFQrDJB7pxp3AthyphenhyphenB93G79iNG85HPCRWk/s400/avocado-225x300.jpg" title="An avocado, found in the snow near Portland St and Broadway, Cambridge, MA" width="225" /></a><br /><em>An avocado, found in the snow near Portland St and Broadway, Cambridge, MA</em><br /><br />An avocado in the snow. <br />Who left it there? I do not know. <br />Not Father, Son, nor Ghost so holy, <br />Rebirths you into guacamole. <br />Did leaping from some wretched fate <br />Allow you to feel special, great? <br />Or did, cast down like ancient foe, <br />You weep from terror, weep from woe? <br />But lie here now, near Portland Street, <br />And rest, green flesh and tasty meat.<br />
<br />Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-91712826865523941212008-01-05T17:59:00.001-08:002011-06-04T10:12:59.353-07:00Switching Finks<p>One of the open-source projects I contribute to is
<a href="http://finkproject.org/">Fink</a>, a package manager for OS X; if you've
used <code>apt-get</code> or <code>yum</code> on Linux, it provides a similar facility,
allowing you to install, say, GnuPG by running <code>fink install gnupg</code>.
It installs things into its own directory tree, rooted at <code>/sw</code> by
default, to avoid interfering with things shipped by Apple (<code>/</code>,
<code>/usr</code>) or manually installed by the user (<code>/usr/local</code>). That is, if
you have Fink installed, your system will have <code>/sw/bin</code>, <code>/sw/lib</code>,
<code>/sw/etc</code>, <code>/sw/share/man</code>, &c.
</p><p>
So that you can run things installed in these nonstandard locations,
Fink provides some shell commands in <code>/sw/bin/init.sh</code> which edit
environment variables like <code>PATH</code> and <code>MANPATH</code> to include the <code>/sw/*</code>
directories. Most Fink users have <code>. /sw/bin/init.sh</code> in their
<code>~/.profile</code>, so these commands will be invoked when their shell
starts.
</p><p>
Having my shell automatically pull in Fink at startup doesn't work for
me, though. It's important to me to have a clean environment
available, for instance when I'm contributing to non-Fink open-source projects, helping someone who doesn't have Fink installed troubleshoot something, or submitting a bug report for a program that interacts with other programs where I have the Fink version installed but Apple ships a different version with the system. (Note that this is only an issue if program A interacts with program B by <em>invoking it as a standalone process</em> without using an absolute path.)
</p><p>
Also, as a Fink developer, I actually have multiple Fink
installations at different paths, and I only want one loaded at a
time; I don't want to activate <code>/Volumes/SandBox/fink/dev-sw</code> in an
environment where <code>/Volumes/SandBox/fink/sw</code> has already been pulled
in!
</p><p>
It's much easier to pull Fink stuff in later when I
need it than to undo the changes that <code>/sw/bin/init.sh</code> makes to my
environment. My solution for making it easy to activate a particular Fink installation was to add the following to <code>~/.bashrc</code>:
</p>
<pre lang="bash" lineno="1">
if [ -n "$SW" ]; then
    export CFLAGS="-I$SW/include"
    export LDFLAGS="-L$SW/lib"
    export CXXFLAGS="$CFLAGS"
    export CPPFLAGS="$CXXFLAGS"
    export ACLOCAL_FLAGS="-I \"$SW/share/aclocal\""
    export PKG_CONFIG_PATH="$SW/lib/pkgconfig"
    export PS1="[$SW_DISPNAME \\W@$(hostname -s)]\\\$ "
    . "$SW/bin/init.sh"
    export PATH=~/bin:"$PATH"
fi
</pre>
<p>
This arranges things so that if I start a new shell with <code>SW</code>
and <code>SW_DISPNAME</code> set, it'll pull in the Fink installation rooted at
the directory <code>$SW</code> and put <code>$SW_DISPNAME</code> in my shell prompt so that
I can see which environment I'm using. The extra environment
variables before <code>. $SW/bin/init.sh</code> set things up so that if I
compile things by hand, they'll find and link against Fink-installed
libraries; the <code>PATH</code> setting at the end is because <code>init.sh</code> places
the Fink bin directory at the front of the <code>PATH</code>, and I want my
personal bin directory to come before it.
</p><p>
I run the following script (saved as <code>~/bin/finkinit</code>) when I want to
pull in Fink:
</p><pre>
#!/bin/bash
FINK=${1:-main}
case "$FINK" in
    main)
        SW=/Volumes/SandBox/fink/sw
        SW_DISPNAME="fink"
        ;;
    dev)
        SW=/Volumes/SandBox/fink/dev-sw
        SW_DISPNAME="fink-dev"
        ;;
    *)
        echo "Unknown fink install '$FINK'" >&2
        exit 1
        ;;
esac
export SW SW_DISPNAME
exec /bin/bash
</pre>
<p>
This gives me a subshell with Fink turned on, which I can exit out of
when I want to return to a clean environment. If I run it as <code>finkinit</code>, I get my main Fink installation, or I can run <code>finkinit dev</code> to get an alternate Fink.
</p><p>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-70535921649397643762008-01-01T15:59:00.001-08:002011-06-02T19:31:49.962-07:00For All Your Finger-Pointing Needs<p>While working with a large codebase, I often want to find the origin of a particular line. Subversion offers a tool, <code>annotate</code> (aka <code>blame</code>, aka <code>praise</code>), which displays the author and revision for every line in a file, indicating who made the last change to a line. However, the last change is often not very useful; it was a minor change as a result of some other change you're not interested in, or the code was moved around due to refactoring, and you need to go back even further.<br/><br/>When I need to do this, I find myself doing a sequence of:<br/><br/>1. <tt>svn blame <i>FILE</i> | less</tt>; find the revision <em>N</em> where the line was last changed<br/>2. <tt>svn log -r<i>N</i> <i>FILE</i> | less</tt>; if the change is interesting, read the commit log for the file<br/>3. <tt>svn blame <i>FILE</i>@<i>N-1</i> | less</tt>; using Subversion's little-known peg revision syntax, find the previous time the line was changed<br/>4. Using the revision found in step 3 as the new <em>N</em>, return to step 2.<br/><br/>On peg revisions: pretty much any Subversion command that takes a path argument can be given <tt><i>PATH</i>@<i>REVISION</i></tt> instead to use the version of the path at a particular revision. This is great for <code>diff</code> and <code>cat</code> as well as <code>blame</code>. 
I use it for working with deleted files and branches and diffing a branch against trunk.<br/><br/>I've put together a rough version of a tool to make this easier; it's at <code>/trunk/blamegame</code> in my repository, which is <a href="viewvc.cgi/trunk/blamegame">here</a> for browsing with ViewVC, or it can be checked out with <tt>svn co <a href="http://zevils.com/svn/trunk/blamegame">http://zevils.com/svn/trunk/blamegame</a> blamegame</tt> . It still needs some fine-tuning and documentation, but invoke it like <tt>blamegame <i>FILE</i> <i>LINE</i></tt> (where <em>FILE</em> is a URL or the path to a file in a Subversion working copy) to start looking at a particular line of a file. You can navigate and search the file using a <code>less</code>-like interface. To drill down to the previous change to a line, hit <code>r</code> and then enter the line number. <code>l</code>, <code>o</code>, <code>n</code>, and <code>m</code> switch between viewing the commit log, the changed parts of the old file, the changed parts of the new file, and (the default) the diff. If you need to change the path you're looking at (for instance, to jump inside a branch), use the <code>p</code> command. <code>h</code> will show the available commands.<br/><br/>Let me know what you think.</p><br />Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-14003432547691404572007-12-31T07:12:00.000-08:002011-06-04T10:50:51.856-07:00Wrong Dates in iCal Birthday Calendar<p>To keep track of people's birthdays, I use Mac OS X's <a href="http://www.unseemlyraptor.com/2007/11/19/address-book-as-birthday-calendar/">Birthday Calendar feature</a> of Address Book/iCal. I was going through my calendar the other day, and I noticed that a birthday which I knew was sometime in January wasn't showing up. It was on the corresponding Address Book contact, though. 
I deleted the birthday from this contact and reentered it, which fixed that entry, but on the suspicion that more birthdays might be missing, I flipped through my calendar and found:
</p><p>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-2FNhmA1xn8VmWmP8MosnLHeKhk3L4jO4cr6krW4J8uQIIX3sfQSnjcpPMMuQcyHa0f5sE4jsGHyPxu5kCuP_pp2cQYfzxMEhrYNXrF6NjaLqyCSg2DT7dyWvp8KHbH9UCXnhuCjBwi8/s400/ical-wrong-1.png" width="381" height="137" alt="Address Book says Mar 23, iCal says Mar 21">
</p><p>
The Address Book birthday field has the misfeature that it forces a year to be specified. What a rude thing for Address Book to be asking! Anyway, I'd arbitrarily picked year 1 for the year for any contacts whose birth years I didn't know. Maybe, I thought, the <a href="http://en.wikipedia.org/wiki/Gregorian_calendar#Gregorian_reform">Gregorian reform</a> was throwing things off. However, changing the year to 1900 didn't help matters, and in fact made them worse:
</p><p>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy9aPn_PotaUvUBPQz3iucBndgTUcdYpKwWysgtR8ZKLInV-5YrClT0EhTevJTQNXDRlFXR2fS8SNy_iEqItRRPhy9fXXaNp8Xr-GDeKc0NEgS6wRImU4wS6vA65xP9KmnumKy95N33VM/s400/ical-wrong-2.png" width="371" height="238" alt="Address Book says Mar 23, iCal says June 23">
</p><p>
Turning the birthday calendar off (which wipes out iCal's backing store for the calendar) and on didn't help matters. A web search turned up <a href="http://discussions.apple.com/thread.jspa?threadID=1218209&tstart=0">some other people</a> having the same problem, but the only useful solution they came up with was deleting and recreating entire contacts by hand.
</p><p>
I wanted to see if the raw data was wrong in Address Book's database. Address Book uses Core Data in a way that makes the database difficult to work with at the SQLite command-line level, so instead I hacked <code>/Developer/Examples/Python/PyObjC/AddressBook/Scripts/exportBook.py</code> to emit the birthday field by adding <code>('Birthday', AddressBook.kABBirthdayProperty)</code> to <code>FIELD_NAMES</code> and the following to <code>encodeField</code>:
</p><pre>
    elif isinstance(value, AppKit.NSCalendarDate):
        return value.descriptionWithCalendarFormat_("%Y-%m-%d")
</pre><p>
It turns out that a number of entries had <em>negative</em> years, e.g. <code>-1900-03-23</code> instead of <code>1900-03-23</code>. I'm not sure how this happened, but here's a script (which you can <a href="http://zevils.com/svn/trunk/misc/fixABBirthday">download</a>) to fix it:
</p><pre>
#!/usr/bin/python
"""
Fix negative birthday years in Address Book.
This work is hereby released into the Public Domain.
"""
import AddressBook
import AppKit


def personName(person):
    return "%s %s" % (
        person.valueForProperty_(AddressBook.kABFirstNameProperty),
        person.valueForProperty_(AddressBook.kABLastNameProperty)
    )


def formatDate(date):
    return date.descriptionWithCalendarFormat_("%Y-%m-%d")


def fixBirthday(birthday):
    """Return a corrected date if the year is negative, otherwise None."""
    year = int(birthday.descriptionWithCalendarFormat_("%Y"))
    if year < 0:
        # Adding twice the absolute year flips the sign: -1900 + 3800 = 1900.
        return birthday.dateByAddingYears_months_days_hours_minutes_seconds_(
            -year * 2, 0, 0, 0, 0, 0)
    else:
        return None


def fixPersonBirthday(person):
    birthdayProp = AddressBook.kABBirthdayProperty
    birthday = person.valueForProperty_(birthdayProp)
    if birthday is None:
        return
    fixedBirthday = fixBirthday(birthday)
    if fixedBirthday is not None:
        print("Fixing up %s: %s -> %s" % (
            personName(person),
            formatDate(birthday),
            formatDate(fixedBirthday)
        ))
        person.setValue_forProperty_(fixedBirthday, birthdayProp)


book = AddressBook.ABAddressBook.sharedAddressBook()
for person in book.people():
    fixPersonBirthday(person)
book.save()
</pre>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-27009935724008742292007-12-28T08:19:00.000-08:002011-06-04T10:58:21.556-07:00Internationalization of Names<h2>Names are complicated</h2>
<p>
What's in a name? The answer turns out to vary quite widely around
the world. When an English-language form, either electronic or paper,
asks for a person's name, it usually provides separate fields for
first and last name, and sometimes middle name or middle initial.
Aristotle Pagaltzis <a href="http://plasmasturm.org/log/485/">linked to</a> a
<a href="http://blog.jclark.com/2007/12/thai-personal-names.html">post by Jim
Clark</a> on
Thai names, demonstrating that this approach, or even the alternative "given name, family name", falls down pretty quickly outside the English-speaking world. Thai names consist of:
</p><ul>
<li>A given name, similar to the English first name, except that it must come from a list of government-approved names;</li>
<li>A family name, which is also government-regulated; all people with the same family name are related, and new Thai citizens must select an unused name. Like all non-namespaced identifiers (domain names, instant messenger handles, user names on popular web services), the good short ones are taken; and</li>
<li>A <em>chue len</em>, which is typically translated as <em>nickname</em>, but according to Mr. Clark is more like an <em>informal given name</em>; it's selected by one's parents or close relatives early in life (though not necessarily at birth).</li>
</ul><p>
The obvious mapping of Thai name components onto English, (given name,
family name, chue len) → (first name, last name, nickname),
doesn't work very well. Consider the Thai name <em>Thaksin Shinawatra</em>,
chue len <em>Meow</em>, the former prime minister. His (romanized; more on
that later) legal name is <em>Thaksin Shinawatra</em>. If addressing him
politely, I would refer to him as <em>Khun Thaksin</em>.<sup><a id="names-fnref-1" href="#names-fn-1">1</a></sup> Note that this
is {honorific} {given name}, not {honorific} {family name}; in other
words, <em>Mr. Matthew</em> as opposed to <em>Mr. Sachs</em>. His friends and
family will call him <em>Meow</em>, not <em>Thaksin</em> or <em>Shinawatra</em>.
</p><p>
A further wrinkle is that when sorting a list of Thai names, the given
name, not the family name, should be the sort key. Then there's also
the matter that <em>Thaksin Shinawatra</em>, aka <em>Meow</em> isn't really the
gentleman's name at all; it's
<em>ทักษิณ
ชินวัตร</em>, aka
<em>แม้ว</em>. There are several standard
romanizations for Thai, and whichever one the named individual prefers
is considered canonical. There are also other quirks involved in the
Thai script form of a name, like the lack of whitespace between the
honorific and the given name.
</p>
<h2>Non-Thai complications</h2>
<p>
Then there are whole sets of different requirements for other kinds of names. The comments on Jim Clark's blog entry, and <a href="http://rishida.net/blog/?p=100">this post</a> by Richard Ishida, who's in charge of i18n issues for the W3C, give some other good examples.
</p><ul>
<li>Russian and Icelandic have gender suffixes on the family name (Fuzaylova for a woman, Fuzaylov for a man; Fjalar Jónsson vs. Katrín Jónsdóttir.)</li>
<li>Russian has nicknames (which, like Thai "nicknames", are much more widely used than English nicknames) which are usually (always?) systematically derivable from their given names; Vladimir → Vova.</li>
<li>Scandinavian given names typically include spaces, and convention varies as to how acceptable it is to refer to <em>Hans Christian Andersen</em> as <em>Hans</em> vs. <em>Hans Christian</em>. This isn't unheard of in the southern United States, either -- Billy Jean, &c. In some parts of Europe, these multipart given names are hyphenated, as in the Austrian <em>Hans-Christian</em> or the French <em>Jean-Claude</em>.</li>
<li>In France and Italy, names can have a comma which essentially divides a series of first names from a series of middle names; in France, the middle names are rarely used outside of legal contexts, while in Italy, the middle names <em>aren't</em> used in legal contexts. A <em>Mario, Alberto Giovanni Rossi</em> would have a legal name of <em>Mario Rossi</em> in Italy, whereas a French <em>Jean, Christophe Dupond</em> would be commonly known as <em>Jean Dupond</em> but legally <em>Jean, Christophe Dupond</em>.</li>
<li>Many countries use patronymics instead of stable family names, so a set of related people won't have the same family name.</li>
<li>Many Chinese take arbitrary western nicknames for ease of communicating with westerners.</li>
<li>Chinese names also have generational markers, so a set of siblings will all have the same "middle" name, and names are written {family}{generational}{given} in Chinese script.</li>
</ul>
<h2>So what?</h2>
<p>
How much of this do we really need to worry about? When I say that
Thai names <em>should</em> be sorted by given name, <em>should</em>, of course, is a
horribly loaded term. If an American border control agent pulls
up a list of people who have entered the country at a particular
point, they probably want the sort key to be <em>Shinawatra</em>, not
<em>Thaksin</em>. Mapping (given, family) → (first, last) is also
probably fine for this application. So when, exactly, does the extra
information need to be preserved?
</p><p>
Some reasons that a system might be interested in a name, or parts of
a name, are:
</p><ul>
<li>Correlating records with other systems</li>
<li>Displaying people's names</li>
<li>Addressing people in writing ("Dear Mr. Sachs,", "Welcome, Matthew!") or on the phone</li>
<li>Identifying people ("To look up your records, enter your name")</li>
<li>Searching for people (on, say, a social networking site)</li>
<li>Sorting a list of people</li>
</ul><p>
For most English-language applications that don't cater to a large
international audience, it might be "good enough" to simply
have a flat name field where users can enter arbitrary names, or
at least their romanizations.<sup><a id="names-fnref-2" href="#names-fn-2">2</a></sup> A flat name field is much more
flexible. Since you probably need to support substring searches
anyway, it doesn't lose anything as far as searching is concerned.
</p><p>
If you want to sort by last name, or communicate with other systems
that take a (first name, last name) tuple, it might be good enough to
just split off the last whitespace-separated token and treat that as
the last name.<sup><a id="names-fnref-3" href="#names-fn-3">3</a></sup> If that's not good enough, a pair of (first names,
last name) or (given names, family name) inputs may be called for, but
characters such as spaces and apostrophes (<em>O'Flannagan</em>) should be
valid. If your application wants to try to automatically derive a
secondary form of address from the name entered, maybe it shouldn't.
Is the ability to have form letters say <em>Mr. Sachs</em> as opposed to
<em>Matthew Sachs</em> really worth the faux pas of <em>Mr. Shinawatra</em>? I
guess it depends on how international your audience is; you could always ask for multiple forms of address.<sup><a id="names-fnref-4" href="#names-fn-4">4</a></sup>
</p><p>
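The last-token split described above, together with the particle list suggested in footnote 3, might look something like the following Python sketch. The function name <code>split_name</code>, the contents of <code>PARTICLES</code>, and the choice to treat a single-token name as a first name are all assumptions made for illustration; none of this comes from a real library.
</p>

```python
# Hypothetical helper: split a flat name into (first names, last name).
# The particle list is illustrative only and far from exhaustive.
PARTICLES = {"de", "van", "von", "da", "di"}


def split_name(full_name, particles=PARTICLES):
    """Treat the last whitespace-separated token as the last name.

    Tokens from the particle list that directly precede the last token
    are pulled into the last name as well, so "Charles de Gaulle"
    splits into ("Charles", "de Gaulle").
    """
    tokens = full_name.split()
    if len(tokens) < 2:
        # Assumption: a single-token name is reported as a first name.
        return (full_name.strip(), "")
    split_at = len(tokens) - 1
    # Always leave at least one token on the first-name side.
    while split_at > 1 and tokens[split_at - 1].lower() in particles:
        split_at -= 1
    return (" ".join(tokens[:split_at]), " ".join(tokens[split_at:]))
```

<p>
Under these assumptions, <code>split_name("Hans Christian Andersen")</code> yields <code>("Hans Christian", "Andersen")</code>. The heuristic knows nothing about naming conventions, so <em>Thaksin Shinawatra</em> still comes out as (first, last), which is exactly the limitation discussed above.
</p><p>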
For applications that want to really get localized names right, like a
system-wide address book or a global social networking site, a more
complex approach is called for. For instance, the Mac OS X address
book framework knows about the address formats for various countries;
it could extend that functionality to support different name formats.
It has some rudimentary support for this, in that an individual
address book entry can have a set of <em>name ordering flags</em> associated
with it, either <em>first name first</em> or <em>last name first</em> (sic); name fields
are fixed at <em>title</em>, <em>first name</em>, <em>middle name</em>, <em>last name</em>,
<em>suffix</em>, <em>nickname</em>, <em>maiden name</em>, and <em>phonetic (first, middle,
last) name</em>.
</p><p>
Per-country address format support doesn't change which fields exist,
but it changes the order they're displayed in. Per-country name
format would need to be more complicated. A <code>Name</code> (which a person might have more than one of with different <code>NameFormat</code>s) might consist of:
</p><ul>
<li><code>NameFormat</code>, defining the (country, language) associated with the name (e.g. <code>en.US</code>) and the set of available <code>NameComponent</code>s</li>
<li>A list of (<code>NameComponent</code>, <code>Value</code>, (optional) <code>PhoneticValue</code>)</li>
</ul><p>
The system could provide functions like:
</p><ul>
<li><code>int Name.compareWith(Name)</code></li>
<li><code>String Name.representation(NAME_REPRESENTATION)</code> where <code>NAME_REPRESENTATION</code> is one of:<ul>
<li><code>LEGAL_NAME</code></li>
<li><code>FORMAL_NAME</code></li>
<li><code>SHORT_FORMAL_NAME</code></li>
<li><code>INFORMAL_NAME</code></li>
<li><code>VERY_INFORMAL_NAME</code></li></ul></li>
<li><code>Name Name.convertTo(NameFormat)</code> would try to convert to a different name representation using automated rules for things like romanization.</li></ul>
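<p>
To make the shape of such an API a little more concrete, here is a rough Python sketch of the <code>Name</code> and <code>NameFormat</code> idea. Everything in it is hypothetical: the field names, the template strings, the locale tags, and the use of a single <code>sort_key</code> in place of <code>compareWith</code> are assumptions for illustration, and only three of the representations are modeled.
</p>

```python
from dataclasses import dataclass
from enum import Enum


class Representation(Enum):
    LEGAL_NAME = "legal"
    FORMAL_NAME = "formal"
    INFORMAL_NAME = "informal"


@dataclass(frozen=True)
class NameFormat:
    locale: str          # e.g. "en.US"; the tag spelling is an assumption
    sort_component: str  # which component drives sorting in this format
    templates: dict      # Representation -> format string over component names


@dataclass
class Name:
    fmt: NameFormat
    values: dict         # component name -> value

    def representation(self, kind):
        # Fill the per-format template with this name's components.
        return self.fmt.templates[kind].format(**self.values)

    def sort_key(self):
        # Simplification: real cross-format comparison would need collation rules.
        return self.values[self.fmt.sort_component].casefold()


EN_US = NameFormat("en.US", "family", {
    Representation.LEGAL_NAME: "{given} {family}",
    Representation.FORMAL_NAME: "{honorific} {family}",
    Representation.INFORMAL_NAME: "{given}",
})

TH_TH = NameFormat("th.TH", "given", {
    Representation.LEGAL_NAME: "{given} {family}",
    Representation.FORMAL_NAME: "Khun {given}",
    Representation.INFORMAL_NAME: "{chue_len}",
})

thaksin = Name(TH_TH, {"given": "Thaksin", "family": "Shinawatra", "chue_len": "Meow"})
matthew = Name(EN_US, {"honorific": "Mr.", "given": "Matthew", "family": "Sachs"})
```

<p>
With these definitions, <code>thaksin.representation(Representation.FORMAL_NAME)</code> gives <em>Khun Thaksin</em>, while the same call on <code>matthew</code> gives <em>Mr. Sachs</em>. The point of the design is that the knowledge of which component to use lives in the format, not in the calling code.
</p>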
<div class="footnotes">
<hr />
<ol>
<li id="names-fn-1"><em>Khun</em> is a generic honorific roughly akin to <em>Mr./Ms./Mrs.</em> There
might be a better one to use for a (former) Prime Minister. <a href="http://web.utk.edu/~wratchuk/learningthai/mar5.html">This
list</a> includes
ones for teacher, aunt, sister, older person, and younger person, but
suggests that <em>khun</em> is always used when addressing someone formally. <a href="#names-fnref-1">↩</a></li>
<li id="names-fn-2">In <a href="http://rishida.net/blog/?p=105">part two of his post</a> Mr. Ishida recommends that applications that expect ASCII input specify it; detecting and erroring on input in unsupported scripts is probably sufficient. <a href="#names-fnref-2">↩</a></li>
<li id="names-fn-3">It might be worth having a list of tokens which will also get treated as part of the last name, such as <em>de</em>, with this approach. <a href="#names-fnref-3">↩</a></li>
<li id="names-fn-4">"Enter your name and how you'd like to be addressed:" ? <a href="#names-fnref-4">↩</a></li>
</ol>
</div>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-4597044632585380962007-12-26T08:02:00.000-08:002011-06-02T20:10:44.059-07:00Migrating a wiki from Trac to MediaWiki<p>I'd set up a <a href="http://trac.edgewall.org/">Trac</a> installation for wedding planning, instead of using MediaWiki (the system Wikipedia uses, which I already had a couple of installations of) since we wanted both a wiki (venue data, possible honeymoon destinations, guest lists... shut up, it's useful!) and a ticket system (useful for tracking things like thank-you notes and being able to assign specific ones to either Liz or myself).<br/><br/>However, <a href="/about/">Dreamhost</a> doesn't support mod_python, so pages were taking way too long to load. I decided to switch over to MediaWiki for the wiki part and just use my existing Bugzilla installation for ticket tracking. Hence, a new script over on the <a href="/code/">code</a> page, <a href="http://zevils.com/viewvc.cgi/trunk/misc/trac2mw">trac2mw</a>. Our wiki was fairly tiny, so caveat user. I didn't bother having it migrate tickets or attachments, since we didn't have any data there that was worth preserving. The input format, a MySQL XML dump, probably isn't ideal for a lot of people (since Trac runs on SQLite by default.) 
It does fix up the wiki page syntax (the parts of it we were using, at least), though.</p><br />Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-90785081456417888782007-12-17T15:30:00.000-08:002011-06-04T11:01:51.546-07:00Less Edward Tufte, More Don Martin<p>A New York Times <a href="http://cityroom.blogs.nytimes.com/2007/12/17/an-uneasy-guide-to-holiday-tipping/index.html?ex=1355547600&en=18e254250aff5004&ei=5088&partner=rssnyt&emc=rss">blog post on holiday tipping</a> linked to a gem from the Times archives, <a href="http://query.nytimes.com/gst/abstract.html?res=9902E7DC123AE633A25753C1A9649D946096D6CF">its own ancestor from 1911</a>.
</p><p>
The most striking feature of the article, which appeared on page six of the magazine section, is the large political cartoon-like illustration in the center (drawn by Reginald Russom, who evidently went on to help found <a href="http://www.abwac.org.au/ACAhistory.htm">what later became the Australian Cartoonists' Association</a>.) From what I've noticed, while the Times Magazine still employs plenty of illustrations, they're mostly charts and graphs; when there's a lead image that's not a more or less realist photograph of the article's subject, it tends to be a photo like <a href="http://www.nytimes.com/2007/12/16/magazine/16wwln-medium-t.html">this one.</a>
</p><p>
I love how one old newspaper article can shed light on:
</p><ul>
<li>Other concerns of the period (the legality of a state (or city?)-wide income tax debate was argued before the State Supreme Court)</li>
<li>Typical incomes and wages (a bit over $1M/yr in 2006 dollars is their example income for a "well-bred" New Yorker)</li>
<li>Types of service-sector employees one might utilize (such as elevator boy, <a href="http://en.wikipedia.org/wiki/Charlady">charwoman</a>, furnaceman, telephone operator, milkman, and stenographer, in addition to less remarkable professions)</li>
<li>Things that one might fear malfunctioning in an apartment (how little some things change; here we have the electric buzzer, hot water, windows (by the glass being broken, not routine mechanical failure), and mail delivery)</li>
</ul><p>
Maybe this is still routine in Manhattan, at least in the more highfalutin co-ops, but I also found it noteworthy that the building's management was expected to send you candidates if you wanted to sublet your apartment (but watch out; if you anger your super by not tipping around Christmas, he might send "several negroes and a Chinaman" your way!)
</p><p>
When I first got Times archives access (by subscribing to TimesSelect back in the day), I trawled the archives; there's a lot of good stuff there. If anyone else has a favorite, I'd love to hear about it in the comments.</p>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-27279848045728558752006-03-14T07:16:00.002-08:002011-06-02T18:25:55.191-07:00Diagnosis of Inferior Social Proclivity Disorder in Young Adult Patients: A Case Study<p>
<a href="http://www.lyricsfreak.com/f/frank-sinatra/56210.html">Rodgers N. Hart, F. Sinatra, and E. Fitzgerald</a>,
Lorenz Institute for the Advancement of Clinical Psychology</p>
<p>
Note: This paper has also been accepted for <a href="http://community.livejournal.com/reformat_songs/85302.html">publication</a> in the Annals of <a href="http://community.livejournal.com/reformat_songs/profile">reformat_songs</a>.</p>
<h2>
Introduction</h2>
<p>
Inferior social proclivity disorder, or “trampiness”, is commonly mistaken for adjustment disorder not otherwise specified.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> However, this condition is surprisingly common in early post-adolescent patients, especially females.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> We examine the diagnosis and treatment of one patient, who we shall refer to as Lady. Lady, when she began treatment, was a 24-year-old who referred herself to our private practice. She had become increasingly concerned over her difficulty in forming social relationships at her place of employment, a finishing school.</p>
<p>
<span id="more-29"></span></p>
<h2>
Initial Work</h2>
<p>
We spent several sessions simply becoming familiar with the patient<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> and allowing the therapeutic relationship to coalesce, and listening to the cognitive-behavioral paradigms<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> which the patient used to self-describe the internalities<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup> of her situation. Lady seemed to view herself through a neo-behavioralist<sup id="fnref:6"><a href="#fn:6" rel="footnote">6</a></sup> lens, and attempted to leverage this paradigm to assert control over her situation. She would often attempt to defer meals until excessively late hours, although these control attempts were never successfully realized due to her inability to stave off her hunger. Peculiarly, she was unusually consistent in her failures; she routinely ate dinner at exactly 7:55 in the evening. This led us to suspect a possible anorexia nervosa (restricting type) in conjunction with obsessive-compulsive personality disorder.<sup id="fnref:7"><a href="#fn:7" rel="footnote">7</a></sup> Her consistent timeliness at cultural events — she was a regular patron of the theatre — reinforced this notion.<sup id="fnref:8"><a href="#fn:8" rel="footnote">8</a></sup> However, our experiences with disorders of these spectra suggested that it would be premature to form anything more than a tentative diagnosis at this point.<sup id="fnref:9"><a href="#fn:9" rel="footnote">9</a></sup> Using a hybrid talk therapy approach,<sup id="fnref:10"><a href="#fn:10" rel="footnote">10</a></sup> we probed further.</p>
<h2>
Contraindications for Obsessive-Compulsive Personality Disorder</h2>
<p>
Further work with Lady led to the discovery that she exhibited several behaviors which contraindicated OCPD. First and foremost amongst these was a strong revulsion to gambling and excessive personal grooming.<sup id="fnref:11"><a href="#fn:11" rel="footnote">11</a></sup> Two contexts in which her coworkers often socialized were informal gambling nights with members of the local political establishment and outings to nightclubs with rigorous formal dress codes. Lady claimed that she felt excluded from these events due to her aversion to these activities. In addition to serving as social bonding rituals, her coworkers used these occasions to undertake the exchange of critical back-channel social collateral, or “gossip”.<sup id="fnref:12"><a href="#fn:12" rel="footnote">12</a></sup></p>
<h3>
Contraindications for Anorexia Nervosa</h3>
<p>
We also found evidence that she did not have anorexia nervosa, or any other eating disorder. Eating disorders are typically characterized by a need by the patient for control over his or her environment, actualized by control over the frequency and manner of dietary events.<sup id="fnref:13"><a href="#fn:13" rel="footnote">13</a></sup> It is expected, in cases of these disorders, to find, upon a closer examination, a pattern of control mechanisms. However, Lady did not seem to have any extra-dietary retentiveness behaviors. She was almost alarmingly nonchalant about upcoming major life events and her financial situation. She hoped to leave California (her state of residence) at some point, stating a preference for a warmer, more arid climate, but neither had nor desired strategies for attaining this goal. On a smaller scale, she would often arrive for appointments with her hair in a state of disarray, claiming (when prompted) that it had been disturbed by the wind on the drive over, but making no attempt to correct it.</p>
<h2>
Diagnosis of Inferior Social Proclivity Disorder</h2>
<p>
We concluded that Lady was probably not suffering from OCPD or anorexia nervosa. We considered a diagnosis of general social anxiety disorder, but she genuinely did seem to desire to connect with her coworkers, and she was quite active in other social circles. Then, in one session, Lady revealed a key piece of information. She said that her avoidance of the contexts in which her coworkers preferred to socialize was probably a good thing, because her financial situation did not permit her to engage in the expense of attending such nights on the town. She felt that her non-luxury automobile and other secondary socioeconomic characteristics placed her in a position of inferiority, and that she would be taken advantage of by the sophisticated and (in her view) unsavory characters who would often accompany her coworkers on these social outings. She wished to pursue a deeper connection with her coworkers, but she characterized their other associates as “sharpies” and “frauds.”</p>
<p>
We then asked how her coworkers could maintain such extravagant lifestyles while she, in a similar job at the same place of employment, could not. Her response to this was the final piece of the puzzle. This reinforces the critical importance of a close reading of responses to even innocuous questions in talk therapy.<sup id="fnref:14"><a href="#fn:14" rel="footnote">14</a></sup> She said that she had been offered many increases in salary, but had repeatedly turned them down because she “didn’t want the hassle.” This was a clear-cut case of ISPD. The patient was intentionally holding herself to an “inferior” social position, had difficulty functioning because of it, and did not perceive of her assumed position as problematic.<sup id="fnref:15"><a href="#fn:15" rel="footnote">15</a></sup></p>
<h2>
Motivating Factor Analysis</h2>
<p>
At this point we had diagnosed Lady, but this only really told us the “how” of her “trampiness”. Although it is often difficult or impossible to do so successfully,<sup id="fnref:16"><a href="#fn:16" rel="footnote">16</a></sup> we elected to explore the motivating factors behind her disorder (the “why” of her “trampiness.”) Such analysis often reveals additional disorders, or at least provides information which may prove invaluable in treatment. This analysis is still ongoing, and we do not have any results yet.</p>
<h3>
Treatment Plan</h3>
<p>
Treatment of Lady is currently ongoing. We are continuing talk therapy, both for its own merits, and as a component of the aforementioned motivating factor analysis. We are also attempting to use a combination of cognitive behavioral therapy and desensitization to address some of her avoidance issues.<sup id="fnref:17"><a href="#fn:17" rel="footnote">17</a></sup> We have had some preliminary success in exposing her to fast food sprayed with a solution which will cause it to induce greater than normal levels of nausea when consumed, and we have instructed her to bring gradually larger amounts of cash with her on her visits to our office. We hope to discuss the efficacy of these techniques in a future publication.</p>
<div class="footnotes">
<hr>
<ol>
<li id="fn:1">
<p>
A. Hasapemapetalan, B. F. Goodwrench; <em>Misdiagnosis of Social Proclivity Disorders</em>; Annals of the Bowling University Watercooler; 1973. <a href="#fnref:1" rev="footnote">↩</a></p>
</li>
<li id="fn:2">
<p>
D. Sedaris, T. Mobile; <em>Covariant Statistical Analysis via Modified Stochastic ANOVA of ISPD Demographics</em>; Quarterly Christian Statistical Review; 2001. <a href="#fnref:2" rev="footnote">↩</a></p>
</li>
<li id="fn:3">
<p>
F. Vuzayloya, R. Nachlin; <em>Look Who’s Talking: Techniques for Patient-Therapist Acclimation</em>; Proceedings of the Windsor University Conference on Clinical Techniques; 1999. <a href="#fnref:3" rev="footnote">↩</a></p>
</li>
<li id="fn:4">
<p>
J. Evans, B. Wilson; <em>Quantum Entanglement and the Cognitive-Behavioral Paradigm</em>; Psychological Humourism; 1273. <a href="#fnref:4" rev="footnote">↩</a></p>
</li>
<li id="fn:5">
<p>
B. Allen, M. Davis, L. Fracalossi, M. Sue; <em>Internalities: A New Paradigm for Patient Perception Analysis, and its Applications for the Treatment of Inferior Fictive Disorder</em>; Psychology Fortnight; 1999. <a href="#fnref:5" rev="footnote">↩</a></p>
</li>
<li id="fn:6">
<p>
K. Reeves, A. Wachowski, L. Wachowski; <em>A New Kind of Behavioralism</em>; Zion Review of Psychology; 2235. <a href="#fnref:6" rev="footnote">↩</a></p>
</li>
<li id="fn:7">
<p>
M. Tee, S. L. Jackson; <em>Foolish Diagnoses: A Case Study of an Aviaphobic-Ophidiophobic Complex</em>; Scientific Moldovan; 2004. <a href="#fnref:7" rev="footnote">↩</a></p>
</li>
<li id="fn:8">
<p>
I. Asimov; <em>The Endochronic Properties of Resublimated Thiotimoline</em>; Astounding Science Fiction; 1948. <a href="#fnref:8" rev="footnote">↩</a></p>
</li>
<li id="fn:9">
<p>
D. Savage, D. Iskowitz; <em>It Happens To All Therapists: On The Avoidance And The Acceptance Of Premature Diagnosis</em>; Journal of the Association for Computing Machinery; 2003. <a href="#fnref:9" rev="footnote">↩</a></p>
</li>
<li id="fn:10">
<p>
P. Hanks, J. Pusteyevski, R. Jakenduf; <em>Semantic Metrics for Evaluation of Talk Therapy Approaches</em>; Psychological Linguistics; 2006. <a href="#fnref:10" rev="footnote">↩</a></p>
</li>
<li id="fn:11">
<p>
U. Ulrich, D. Davidson, L. Richards, L. Rudolfo, B. Abrams; <em>Coded Contraindications</em>; Proceedings of the 30th Annual Hashimoto University Conference on Psychological Methodologies; 1986. <a href="#fnref:11" rev="footnote">↩</a></p>
</li>
<li id="fn:12">
<p>
D. Wikiberg, S. Bunan; <em>Byzantine Generals In Space: Network Theory, Social Dynamics, and Back-Channel Communications</em>; RISKS Digest; 1997. <a href="#fnref:12" rev="footnote">↩</a></p>
</li>
<li id="fn:13">
<p>
M. Powers; <em>Controlling Massively Parallel High-Resolution Event Timers in Low-Memory Environments</em>; Nature; 2000. <a href="#fnref:13" rev="footnote">↩</a></p>
</li>
<li id="fn:14">
<p>
H. P. Grice; <em>Logic and Conversation</em>; Syntax and Semactics, Vol. iii; 1975. <a href="#fnref:14" rev="footnote">↩</a></p>
</li>
<li id="fn:15">
<p>
T. Geisel (ed.); <em>The Delightful Diagnostic Dictionary</em>; Scholastic Books; 1960. <a href="#fnref:15" rev="footnote">↩</a></p>
</li>
<li id="fn:16">
<p>
S. Hill, P. Graves, et. al.; <em>Administrative Disavowment in High-Stress Environments</em>; Organizational Psychology; 1966. <a href="#fnref:16" rev="footnote">↩</a></p>
</li>
<li id="fn:17">
<p>
C. Thulhu, Y. Sothoth, S. Niggurath; <em>Inspiring Fear in Humankind</em>; Applied Noneuclidianism Review; 1986. <a href="#fnref:17" rev="footnote">↩</a></p>
</li>
</ol>
</div>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-3298431662135509882006-03-08T09:36:00.003-08:002020-10-28T21:43:20.969-07:00Introduction to Unit Testing<p>
<small>Notes for a lecture given to Brandeis University’s COSI 22a.</small></p>
<h3>
What Is Unit Testing, and Why Should I Care?</h3>
<p>
Unit testing is the process of writing tests for individual bits of your
program, in isolation. A “bit” is a small piece of functionality. We’ll
discuss how small later. How can you know whether or not your program works
if you don’t test it? If you’ve ever lost points on a programming assignment
because something didn’t work right, you could’ve saved yourself from that
by testing your program.</p>
<p>
If you go on to take COSI 31a, you <em>will</em> do better on
the programming assignments if you write tests! More importantly, it’s a
good habit to get into as a programmer. Having tests for your code turns
programming from an art (“gee, it looks right and seems to work, I think I’m
done”) into a science (“this is the evidence I have to support
the claim that my program is behaving correctly”).</p>
<p>
Unit testing is one of the easier ways to get into all the nooks and crannies of your code and make sure it’s doing the right thing. The act of writing tests often helps reveal areas where it isn’t
clear what it means to do “the right thing.”</p>
<p>
<span id="more-28"></span></p>
<h3>
What to Test</h3>
<p>
To figure out what to test, start by thinking about what it means for
your program to work. If you have a formal specification, that’s a great
place to start. For your homework assignments, you’ve had such a
specification, the <a href="http://java.sun.com/j2se/1.5.0/docs/api/">Java API reference</a> for whichever class you were supposed to be implementing.</p>
<p>
You should also think about what all the different parts of the task are.
You want at least one test for every public method in every public class.
One way to measure the quality of unit tests is a metric called <em>coverage</em>.
Coverage measures how much of your code is hit when you run your
tests. Consider the following code for the function <code>isNegative</code>:</p>
<pre class="java">
if(n > 0)
return false;
else
return true;
</pre>
<p>
If you wrote one test for this function, which tested <code>n = -5</code>, you
would only have 50% coverage, because only one of the two branches is taken by that test
(the <code>return false</code> branch is never executed). To achieve complete
coverage, you also need a test for a positive <code>n</code>, say <code>n = 5</code>.
Conceptually, you’re not fully testing the function if you only test that it
returns <code>true</code> for negative numbers; <strong>you also need to test that it
returns <code>false</code> for positive numbers</strong>. Otherwise, it could be replaced
by a function that always returned <code>true</code> and your test suite (the
collection of all of your tests) would
have no idea! This is a common error I saw in the homework. A lot of
people were doing things like only testing <code>isEmpty()</code> on an empty
list.</p>
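<p>
As a minimal sketch of the two tests described above: the class name and the <code>check</code> helper are my own stand-ins for a real test framework, and only the body of <code>isNegative</code> comes from the example.</p>

```java
// Hypothetical harness: only the body of isNegative comes from the text above.
class IsNegativeCoverage {
    static boolean isNegative(int n) {
        if (n > 0)
            return false;
        else
            return true;
    }

    // Tiny stand-in for a test framework's assertEquals.
    static void check(String name, boolean expected, boolean actual) {
        if (expected != actual) {
            throw new AssertionError(name + ": expected " + expected + ", got " + actual);
        }
        System.out.println("PASS " + name);
    }

    public static void main(String[] args) {
        // Takes only the "return true" branch.
        check("isNegative(-5)", true, isNegative(-5));
        // Takes the "return false" branch. Without this test, a version of
        // isNegative that always returned true would pass the whole suite.
        check("isNegative(5)", false, isNegative(5));
    }
}
```

<p>
With only the first <code>check</code>, half of the function is never exercised; the second one is what distinguishes the real function from one that always returns <code>true</code>.</p>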
<p>
There’s one trap I should mention here. If you’re writing your test suite
and thinking about how to achieve maximum coverage, one way to do it is to
look at the source for your class while you’re writing the test suite and
go through every method and branch. The problem with this is that it ties
your test suite to implementation details of your code. It’s important to
think about the <em>logical</em> cases of the underlying problem you’re solving.
Consider the <code>isNegative</code> example. What does it return for
<code>n = 0</code>? According to a mechanical coverage check, you don’t need
to add a test for that, since you’ve already tested both cases in the code.
The zero case is easy to get wrong, though. It’s
the boundary between negative and positive. A good rule of thumb is
to <strong>always write specific tests for boundary conditions.</strong> The
<code>isNegative</code> above does the wrong thing for zero, and it’s very easy to miss
unless you explicitly check <code>isNegative(0)</code>.
The way to find the boundary cases, the corner cases, and the
weird inputs that will give you problems is to have a detailed mental
picture of what a particular method is supposed to do. If you understand what
it really means to test whether a number is negative, it should occur to
you that 0 is an interesting case to check. Think about ways to implement the
functionality, and ways to implement it incorrectly. When comparing the
size of two lists, you should probably test not only cases like
<code>{1, 2, 3} == {1, 2, 3, 4}</code>, but also <code>{1, 2, 3, 4} == {1, 2, 3}</code>,
because catching one but not the other is a common mistake to make. Figuring
out what the easy mistakes are is hard. Of course, figuring out the hard
mistakes is harder.</p>
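<p>
Here is a rough sketch of a boundary check for <code>isNegative</code>. The class name and <code>isNegativeFixed</code> are assumptions of mine (one possible correction, not something from the homework); the point is only that the input <code>0</code> is where the two versions disagree.</p>

```java
class IsNegativeBoundary {
    // The version from the text above.
    static boolean isNegative(int n) {
        if (n > 0)
            return false;
        else
            return true;
    }

    // One possible correction (an assumption; the notes don't show a fix).
    static boolean isNegativeFixed(int n) {
        return n < 0;
    }

    public static void main(String[] args) {
        // Values on either side of the boundary, plus the boundary itself.
        int[] inputs = {-1, 0, 1};
        boolean[] expected = {true, false, false};
        for (int i = 0; i < inputs.length; i++) {
            int n = inputs[i];
            System.out.println("n = " + n
                + ": expected " + expected[i]
                + ", original returns " + isNegative(n)
                + ", corrected returns " + isNegativeFixed(n));
        }
    }
}
```

<p>
A suite that only tried <code>-1</code> and <code>1</code> would report full coverage for both versions and never notice the difference.</p>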
<p>
Also <strong>make sure to test the side effects and error conditions.</strong> If a method
is supposed to throw particular exceptions on particular invalid inputs, does
it? If <code>LinkedList.addAll(Collection)</code> is supposed to return
<code>true</code> to indicate that the list was modified, does it return <code>false</code>
when the collection is empty? A well-written spec makes this job a lot easier.
Look at the documentation for the method and make sure you’re testing that
it does everything that the documentation specifies, and exactly what the
documentation specifies.</p>
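<p>
To make this concrete, here is a small sketch that checks a return value and an error condition against the documented behavior. It runs against <code>java.util.LinkedList</code> as a stand-in for your own implementation; the class name and the <code>check</code> and <code>throwsIndexOutOfBounds</code> helpers are made up for the illustration.</p>

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class AddAllContractTest {
    // Tiny stand-in for a test framework's assertEquals.
    static void check(String name, boolean expected, boolean actual) {
        if (expected != actual) {
            throw new AssertionError(name + ": expected " + expected + ", got " + actual);
        }
        System.out.println("PASS " + name);
    }

    // Returns true if list.get(index) throws IndexOutOfBoundsException.
    static boolean throwsIndexOutOfBounds(List<String> list, int index) {
        try {
            list.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<String>();
        // Return value and side effect when something is actually added.
        check("addAll(non-empty) returns true", true, list.addAll(Arrays.asList("a", "b")));
        check("addAll(non-empty) adds the elements", true, list.size() == 2);
        // Return value and (lack of) side effect for an empty collection.
        check("addAll(empty) returns false", false, list.addAll(Collections.<String>emptyList()));
        check("addAll(empty) leaves the list alone", true, list.size() == 2);
        // The error condition, plus the nearest input that must NOT raise it.
        check("get(2) on a 2-element list throws", true, throwsIndexOutOfBounds(list, 2));
        check("get(1) on a 2-element list does not throw", false, throwsIndexOutOfBounds(list, 1));
    }
}
```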
<p>
Another source of tests is bugs. When you find a bug, it indicates something
that you forgot to test. When this happens, <strong>write a test case for it</strong>.
You should do this <strong>before</strong> fixing the bug to verify that the test
case fails when the bug is present. Then fix the bug and make sure that
the test case starts passing. Things that you got wrong once are things
that you’re liable to get wrong again as things change. These sorts of
tests are called <em>regression tests</em>, because they’re testing that your
quality is always moving forward and never regressing.</p>
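<p>
A sketch of that workflow, using an invented bug: <code>listsEqualBuggy</code> and <code>listsEqualFixed</code> are hypothetical, and the bug is the one-directional list comparison mentioned earlier. The regression test is written first, shown to fail against the buggy version, and then kept in the suite once the fix makes it pass.</p>

```java
import java.util.Arrays;
import java.util.List;

class ListEqualsRegression {
    // Hypothetical buggy version: it only walks the first list, so a shorter
    // first list "equals" any longer list that starts with the same elements.
    static boolean listsEqualBuggy(List<Integer> a, List<Integer> b) {
        for (int i = 0; i < a.size(); i++) {
            if (i >= b.size() || !a.get(i).equals(b.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical fix: compare sizes before comparing elements.
    static boolean listsEqualFixed(List<Integer> a, List<Integer> b) {
        return a.size() == b.size() && listsEqualBuggy(a, b);
    }

    public static void main(String[] args) {
        List<Integer> shorter = Arrays.asList(1, 2, 3);
        List<Integer> longer = Arrays.asList(1, 2, 3, 4);

        // The regression test: these two lists must not compare equal.
        // Run it against the buggy code first to confirm it detects the bug.
        System.out.println("before the fix, test passes? " + !listsEqualBuggy(shorter, longer));
        System.out.println("after the fix, test passes?  " + !listsEqualFixed(shorter, longer));

        // The opposite direction already worked, which is why the bug was missed.
        System.out.println("longer vs. shorter (buggy):  " + listsEqualBuggy(longer, shorter));
    }
}
```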
<h3>
How to Test It</h3>
<p>
Take a look at the included <code>PizzaTest</code> class and <code>Pizza</code>
documentation. I’ve written a package,
<code>Pizza</code>, for determining a set of toppings that will make a group of people
happy when they’re trying to order a pizza. Full source code for <code>Pizza</code>
is on the web; see below for the URL.</p>
<p>
The test suite is structured into groups of tests which test units of
functionality. The simple classes, <code>Topping</code> and <code>ToppingConstraint</code>,
have one group for each class. <code>Pizza</code> has a few different groups.
I isolated each group so that it doesn’t depend on anything done in any
of the other groups. Each group that needs to construct a <code>Pizza</code>
initializes its own topping list. This way the test groups aren’t dependent
on each other and a failure in one small area of the test suite won’t randomly
break a bunch of tests that should work. In order for a test suite to be useful,
you want it to help you figure out exactly what is failing. There are trade-offs,
though. I use <code>Topping.equals(Object)</code>, even in tests for completely
unrelated things. These tests will break if <code>Topping.equals</code> is
broken. It would be a lot of extra busywork to avoid using <code>Topping.equals</code>,
and it couldn’t be done without tying myself to the internal makeup of
<code>Topping</code>. I shouldn’t need to rewrite the entire test suite if
another attribute is added to toppings! One solution to this would be to
indicate in some fashion that some of the other groups of tests, such as
the <code>applyConstraints()</code> tests, <em>depend</em> on the <code>toppings()</code>
tests, and we shouldn’t even bother running the <code>applyConstraints()</code>
tests if the <code>toppings()</code> tests fail. There are frameworks to help
you write unit tests, such as JUnit, and some of them (TestNG, for example) let you declare this kind of dependency between tests.</p>
<p>
The first test group, <code>mustSetToppings()</code>, tests that an error
condition is generated when it should be and <strong>not</strong>
generated otherwise. It’s also a good example of how to test
whether or not an exception is thrown.</p>
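<p>
The general shape of such a test looks something like the sketch below. <code>MiniPizza</code> is a made-up stand-in, not the real <code>Pizza</code> class, and its method names and choice of <code>IllegalStateException</code> are assumptions for the sake of the example.</p>

```java
import java.util.Arrays;
import java.util.List;

class MustSetToppingsSketch {
    // Hypothetical stand-in, NOT the real Pizza class.
    static class MiniPizza {
        private List<String> toppings; // stays null until setToppings is called

        void setToppings(List<String> toppings) {
            this.toppings = toppings;
        }

        int countToppings() {
            if (toppings == null) {
                throw new IllegalStateException("toppings not set");
            }
            return toppings.size();
        }
    }

    // Returns true if countToppings() throws IllegalStateException.
    static boolean throwsIllegalState(MiniPizza pizza) {
        try {
            pizza.countToppings();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        MiniPizza unset = new MiniPizza();
        MiniPizza set = new MiniPizza();
        set.setToppings(Arrays.asList("mushroom", "onion"));

        // The error must be raised when it should be...
        if (!throwsIllegalState(unset)) {
            throw new AssertionError("expected an exception when toppings are unset");
        }
        // ...and must not be raised when it shouldn't be.
        if (throwsIllegalState(set)) {
            throw new AssertionError("unexpected exception when toppings are set");
        }
        System.out.println("both checks passed");
    }
}
```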
<p>
The second test group, <code>toppings()</code>, tests the <code>Topping</code>
class. It’s a fairly trivial class, but we test it anyway. It’s nice to
not have to worry about whether or not it’s working. The test suite
can get things wrong, of course, so don’t get overconfident. Note that
two toppings are defined to be equal only if they have both the same
name and the same type, so the tests for <code>Topping.equals(Object)</code> cover
cases where they have the same name but different types and vice versa, not
just a case where they’re completely different and a case where they’re
completely identical. This way, if, say, the name comparison is broken, we will
know exactly what went wrong: the “completely identical”
test will fail if the name comparison is broken to return false negatives,
and the “different names, same types” test will fail if the name comparison is
broken to return false positives.</p>
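<p>
The four equality cases can be sketched like this. <code>MiniTopping</code> is a hypothetical two-field class written for this illustration, not the real <code>Topping</code>, and the topping names and types are invented.</p>

```java
class ToppingEqualsSketch {
    // Hypothetical stand-in for Topping: equal only if name AND type match.
    static final class MiniTopping {
        private final String name;
        private final String type;

        MiniTopping(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof MiniTopping)) {
                return false;
            }
            MiniTopping t = (MiniTopping) other;
            return name.equals(t.name) && type.equals(t.type);
        }

        @Override
        public int hashCode() {
            return 31 * name.hashCode() + type.hashCode();
        }
    }

    // Tiny stand-in for a test framework's assertEquals.
    static void check(String name, boolean expected, boolean actual) {
        if (expected != actual) {
            throw new AssertionError(name + ": expected " + expected + ", got " + actual);
        }
        System.out.println("PASS " + name);
    }

    public static void main(String[] args) {
        MiniTopping base = new MiniTopping("pepper", "vegetable");
        check("completely identical", true,
              base.equals(new MiniTopping("pepper", "vegetable")));
        check("same names, different types", false,
              base.equals(new MiniTopping("pepper", "spice")));
        check("different names, same types", false,
              base.equals(new MiniTopping("onion", "vegetable")));
        check("completely different", false,
              base.equals(new MiniTopping("ham", "meat")));
    }
}
```

<p>
Each of the two middle cases varies exactly one field, so a failure points at the comparison of that field rather than at <code>equals</code> in general.</p>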
<p>
<code>applyConstraints()</code> is the most complicated test group. This makes sense:
it’s testing the really hard bit of <code>Pizza</code>. The individual tests are
straightforward; the tricky part was figuring out which tests
there should be. To come up with those
test cases, I spent a lot of time thinking about the different ways in which
this could go wrong. I intentionally picked a loosely-specified problem to
make this job more interesting. The problem that <code>Pizza</code> is attempting
to solve, how it’s supposed to work, what sorts of results it should return…
these are all somewhat open to interpretation. That’s often what you have to
do when you’re programming. A lot of times, you’ll get a vague problem,
and you have to figure out how to solve it. Sometimes these are “business
requirements” handed to you by your boss; sometimes it’s you thinking
that it would be cool to do “foo”. The homeworks and labs <em>have</em>
been based on fairly detailed specifications, and there have still been
ambiguities! It took me around four hours to write all of this code,
<code>Pizza</code> and <code>PizzaTest</code> and <code>PizzaMain</code>, and at least
an hour, maybe two hours, was writing the <code>applyConstraints()</code> tests.
Most of <em>that</em> time was figuring out what tests I needed to write!</p>
<h3>
Further Resources</h3>
<ul>
<li><a href="https://github.com/matthewg/Zevils/tree/master/PizzaSource">Pizza source code</a></li>
<li>Some other types of testing: <a href="http://en.wikipedia.org/wiki/Integration_testing">integration testing</a>, <a href="http://en.wikipedia.org/wiki/Stress_testing">stress testing</a>, <a href="http://en.wikipedia.org/wiki/Fuzz_testing">fuzz testing</a>, <a href="http://en.wikipedia.org/wiki/Performance_testing">performance testing</a></li>
<li><a href="http://www.junit.org/index.htm">JUnit</a> is a framework for doing unit testing in Java. It handles a lot of the grunt work for you. There
are <a href="http://java-source.net/open-source/code-coverage">some additional packages</a>
for it that will automatically measure the coverage of your test suite.</li>
</ul>
<p>
<small>These lecture notes and all associated source code are in the
public domain.</small></p>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-29214112773546497382006-02-13T08:59:00.002-08:002011-06-02T18:42:21.222-07:00The Design of Laptop "fn" Keys<p>On every PC laptop made in the past 5+ (10+ ?) years, many of the “F1” (F2, F3, …) keys, and sometimes some of the other keys (the arrow keys in particular) serve two purposes. When pressed normally, they act as their respective key — F1 acts as F1, etc. However, when pressed in conjunction with the “Fn” key, they perform a special function indicated by an icon on the key. Usually both the icon and the label on the Fn key will be blue (whereas the other key labels are white.) For instance:</p>
<p><img src='https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhM2xFeQG6gOT3UviaT808dB6RJ9kJZI5asXNy1KjGHe1UQjnqB44ZqgyWyS6Su5C358loDMEDhayKfsSIunt82Hc4ZFoAe-agAf9izjUwukAD8mqqsBVCL05hu4y1Yvf5FrGeGs2KhkqY/s400/fn-keys-example.jpg' alt='Demonstration of Fn-modifiable keycaps' width="284" height="115"></p>
<p>Today, one of my professors tried to hook up his laptop to the projector and was befuddled when it didn’t work. As soon as I saw him struggling, I knew that the problem was that he had to turn on the external video out. PC laptops typically have three display output modes: internal LCD only, external (VGA, or sometimes DVI these days) connector only, or both internal and external simultaneously. In order to change the mode, one typically has to use the Fn function of one of the F keys (typically F5, F6, or F7). Sometimes it can also be done through some buried option in the Display control panel.</p>
<p>The reason I knew that this was a problem is because almost every single professor who I’ve seen hook a laptop up to a projector has had to do this and had no idea what they had to do or how they were supposed to do it. The notion of hitting Fn in conjunction with some other key didn’t even seem to occur to them. Here’s something that’s a common thing to need to do, and laptop designers have tried to come up with a design that affords doing it (Fn is always next to Ctrl, so it should be natural to interpret it as a modifier key, and the color labels reinforce hitting Fn in conjunction with specific keys), but their design has failed, even after it’s been around for so many years and people have had a chance to get accustomed to it. Why doesn’t their design work? (And why do they keep using it?)
<span id="more-22"></span></p>
<p>One problem with the “Fn+blue” design is that the labels on the keys are almost always terrible. Okay, the volume labels — often the Fn+arrow keys will control the volume or generate page up/page down/home/end — are pretty recognizable, but it’s easy enough to control that from within Windows, and a dark blue label on a black background doesn’t stand out (although some keyboards are much worse about this than others, I’ve seen ones where you need to really squint to see that there’s anything there at all), so that one doesn’t tend to get noticed or remembered. Also, many professors, the group of users who I’ve had the most opportunities to observe trying to hook up a laptop to a projector, don’t use sound on their laptops, so they’ve never needed to control the volume. The label for toggling the display mode is either a cryptic image (<img src='https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhn-h_ct7gfcgdeyjWxMHbyEBepG3JKHNWBNSBjhb-p6jutNuPhB9RHL6hXMjOsero22-wMxMJd4IHJybCt4xvE9auFl_bNhA17Q9TDWAdUs2HepWtfq3wfKKpk6EdRSymuytJvtArSoUI/s400/fn-key-white.jpg' alt='A keycap' width="52" height="43">), a slightly less cryptic image (<img src='https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPD4K5odKlmw7cf_eRMioatXXYcAH7nHMDqQsuVA8lqUV1ReuKDFLfF7jidrJPo519rTlq7cVDXp_Bm3CCSYPWH4uq4jia_TlMS2UFDQBXf0M1iqx5E3dCjmO1OJrfdRncUhq02cqoxbE/s400/fn-key-2.jpg' alt='Another Fn keycap' width="70" height="52">), or a confusing text label (<img src='https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2ji4Y35yt10C23eDX7xL9RvSxG4tRQRRCf5-MIal4ODlwcY_14jlmDhijU_HWajzTQmdJDmp1p_GKGfak087k5a17NsrIZO9D1gd_p3LSZBoS51nftL72WIHm0VuqDu-K95Tyhq85f6I/s400/fn-key-3.jpg' alt='Yet another Fn keycap' width="54" height="43"> — even those few users who know what a CRT is are unlikely to associate it with trying to use a projector, since a projector is not a CRT).</p>
<p>Another problem is that users don’t expect to have to turn on the VGA output. It doesn’t match any of their experiences with plugging things in (most of them haven’t plugged in digital audio cables to their sound cards or receivers), and it doesn’t even match their experiences in plugging in monitors to desktops. It also isn’t very consistent. Sometimes it does just work, because they happen to have been in “internal+external” mode, and then for no apparent reason, they’ll have gotten switched to internal-only the next day.</p>
<p>Finally, I don’t think that users conceptualize Ctrl, Alt, etc. as modifier keys; that is, keys which, when pressed in conjunction with another key, change the behavior of that key in a predictable way. I think they get conceptualized as chording keys, or parts of a two-key combination. Ctrl-S doesn’t get conceptualized as “like pressing S, but I’m also pressing Ctrl so it will behave differently than the way I normally press S.” Instead, Ctrl-S turns into “pressing these two keys together to act as a different key entirely.” Users are right. The change in behavior produced by modifier keys is so rarely systematic (what does it mean to “Ctrl” something? To “Alt” it?), and the behavior produced by the combination bears so little resemblance to the normal behavior of the base key (the act of saving has nothing whatsoever to do with the act of producing the letter ‘s’), that there’s no reason to expect people to think of Ctrl/Alt/Fn as modifiers. It’s even hard to say what “Ctrl” means as an independent concept. Ctrl (in Windows) means “do something”, which is pretty meaningless. Alt (again, in Windows) seems to mean “shortcut to menus”, but most users don’t know about that either. The consequence of this is that users don’t feel that the behavior of chording keys is something they can predict. If Foo-S has nothing to do with either Foo or S, but does something completely novel, why should one be able to intuit what Bar-S might do? Fn actually does have a meaning — manipulate system hardware functionality in the way stated by the blue labels on the key caps — but meaning is not something that users expect to find.</p>
<p>The way Apple does things is different in a revealing way. First, Apple doesn’t use color to differentiate between the Fn function of its keys and the standalone behavior; everything is the same shade of gray. They use position (unmodified behavior on the left side of the key cap, modified behavior on the right). Color would probably be better. Second, the modified and unmodified behavior are the inverse of PCs: F4 by itself decreases the volume, and Fn+F4 produces F4. (The exception is the arrow keys, which are arrow keys unmodified and page up/page down/home/end in conjunction with Fn.) The only time I’ve ever had to use one of the F keys on a Mac is F5 to bring up autocomplete in Xcode, and F12 for Dashboard, so the fact that needing to press Fn+F4 to get F4 doesn’t leap out at a naive user isn’t all that critical. The F keys get used a lot more in Windows (application menu items are often bound to F keys by default). Third, it’s much easier to find the software display controls in Mac OS X: they’re obvious in the Displays system preferences panel, and if it’s something you do often, you can check a box in Displays to have it right in your menu bar. Finally, it’s much more likely to just work (automatically detect that you’ve plugged something in and start displaying to it) on a PowerBook than on any PC laptop I’ve seen. This is borne out by my experiences observing professors: PowerBook users are much less likely to need to do anything and much more likely to be able to figure out what to do when they do.</p>
<p><img src='https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXhyphenhyphen7sdETOSkxoKgevDf7-LwP6qRkoO3smqQjG_f5g-pYQ-Gdd0byWtzMCAa8PuiXzkrvhFL5iCFnDKL3fRPuGhNo4jE65G7gcX5oLCY43c7o8v1tKBqcrmOQShAXoue7pCyEBPCFxCyc/s400/mac-fn-keys.jpg' alt='Powerbook function keycaps' width="616" height="44" /></p>
<p>Interestingly, many PC desktop keyboards are emulating the Mac model these days. The current generation of PC keyboards with “media” keys, e.g. a dedicated key to change to the next track in WMP or open Internet Explorer, typically make the F keys serve double-duty as the extended keys, and have a “Fn lock” which defaults to on and must be turned off to get the standard F behavior. Oddly, most of those keyboards don’t also have an Fn key, which makes them a pain in the butt.</p>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-5701490332182985812004-12-20T08:47:00.002-08:002011-06-02T18:47:10.661-07:00Enhancing Machine Translation via Frame-Semantic Data<p>I’ve just finished my final assignment for the semester, a paper for LING 190. Click the title for the full text of the paper, read the abstract below, and see the cut at the bottom of this entry for a layperson’s explanation of the technical bits.</p>
<h2><a href="https://docs.google.com/a/sachsfam.org/viewer?a=v&pid=explorer&chrome=true&srcid=0B6TptVJl1TumMmIxODE1ODUtNzI4Yy00MzgzLWI0ZjEtOGNhOTBiYzc1Mzc5">Enhancing Machine Translation via Frame-Semantic Data</a></h2>
<h3>Abstract</h3>
<p>Frame semantics is an approach to examining meaning in natural language by considering clusters of related concepts. For instance, in the “commercial transaction” frame, there is a buyer, a seller, goods, and money; different predicates in this frame will place these agents in different syntactic roles, so, in English, the buyer will be the subject of buy while the seller will be the subject of sell.</p>
<p>Frame semantics presents a powerful aid to machine translation. Frame-semantic knowledge of an input phrase facilitates more precise word-sense disambiguation and allows greater flexibility in deciding which of multiple valid word orderings to emit in the target language. I have demonstrated this by creating a rudimentary system for translating from Spanish to English which can optionally take advantage of frame-annotated input, and then testing this system on a small corpus of phrases in the commercial transaction frame.
<span id="more-20"></span></p>
<p>Glossary, in order of appearance in paper:</p>
<dl>
<dt>predicate</dt>
<dd>verb</dd>
<dt>word-sense disambiguation</dt>
<dd>For a word that has multiple meanings, figuring out which meaning a particular occurrence of that word is referring to.</dd>
<dt>corpus, corpora</dt>
<dd>A bunch of arbitrary text gathered from real-world sources. “Corpora” is the plural.</dd>
<dt>parsing</dt>
<dd>Taking natural language input and arranging it into phrases, subphrases, clauses, etc.</dd>
<dt>lemma</dt>
<dd>The root form of a word. For verbs, the lemma would be the infinitive form.</dd>
<dt>tokenization</dt>
<dd>Splitting a stream of natural language into a series of tokens, which are basically the same thing as words and punctuation. So, “Hello, world!” would be split into the following series of tokens: “Hello”, “,”, “world”, “!”</dd>
<dt>tagging</dt>
<dd>Annotating each token with its part of speech, so in the sentence “he ran”, ‘ran’ would be marked as (amongst other things) a past-tense third-person singular verb.</dd>
<dt>lexicon</dt>
<dd>Dictionary.</dd>
<dt>gendered/neuter pronouns</dt>
<dd>“He” and “she” are gendered pronouns in English. “It” is a neuter pronoun in English.</dd>
<dt>syntactic distribution of frame roles</dt>
<dd>This is referring to the way that particular frame roles are assigned to particular grammatical components in a particular frame-predicate (a particular predicate in a particular frame). For instance, in English, “BUY” has the buyer as the subject, the seller in an optional “from” clause, the goods as the direct object, and money in an optional “for” clause.</dd>
<dt>area for further research</dt>
<dd>I’m too lazy to look into it / give me more grant money.</dd>
<dt>syntactically motivated</dt>
<dd>It’s a result of syntax.</dd>
<dt>collocations</dt>
<dd>Two words are considered collocates when they appear near each other more often than you’d expect given their respective frequencies.</dd>
<dt>anaphora resolution</dt>
<dd>When you have a pronoun, figuring out which noun it refers to.</dd>
</dl>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-53246342304778908252003-11-03T08:44:00.002-08:002011-06-02T18:47:41.624-07:00Why Subnets are Good: The Party<p>(This is an attempt to explain, to a non-technical audience, why a large network, such as the Brandeis network, should be divided into subnets. It comes from <a href="http://my.brandeis.edu/bboard/q-and-a-fetch-msg.tcl?msg%5fid=0002MT">this bboard thread</a>.)</p>
<p>The reasons why one wouldn’t want to have the entire campus be one huge subnet are technical, but I’ll try to explain them. Think of the Brandeis network as a huge party. It’s a tremendous party, with every student, faculty-member, and staff-member in one huge room. The university is sponsoring it to celebrate the grand opening of the Carl and Ruth Shapiro Supremely Massive Empty Room.</p>
<p><span id="more-19"></span></p>
<p>You think your friend Alice is at the party, and you want to talk to her. So, you have to find her. You don’t know where she is in the massive room, so you have to either shout really loudly, or spend a lot of time looking for her. Everyone else is also trying to find people, so everybody’s shouting and running around.</p>
<p>Furthermore, people are trying to get to different places in the room. Once people have found their friends, they need to walk over to them, plus people need to get to the exits, the bathrooms, the bar, the DJ, and so forth. Some people are even trying to dance. So, everybody is walking everywhere, and dancing, and shouting. This might be fun, if you’re into that kind of thing, but it’d be really hard to find Alice, walk over to her without bumping into people and without taking some crazy zig-zagging route, and have a conversation with her.</p>
<p>Then it gets worse. Bob, that insufferable boor, has too much to drink, and gets in a fight with Charlie. Their shouting and scuffling drown out the rest of the activity in the room, and makes it impossible for anyone to have a good time until Public Safety comes and escorts them out. It takes a long time for Public Safety to locate Bob and Charlie in the room, despite their loudness, because the crowd is so thick, and even when Bob and Charlie are found, it takes many minutes for the officers to push through the crowd and get the drunks removed.</p>
<p>Well, the party is a disaster. Carl and Ruth get separated by people running between them trying to get to Alice, and they can’t find each other until 3AM. Jehuda spends the whole night trying to dance the Electric Slide, but people are shouting over the music and he can’t keep the beat.</p>
<p>To make things up to the Shapiros, Brandeis decides to throw another party. This time, they decide to get Brandeis’s most renowned social event planners to help them out with it, and so they go to the ITS department, whose infamous VoIP Rollout Bash is still whispered about in reverent tones.</p>
<p>ITS throws the party in a large house, with many different rooms. Each room has between fifty and a couple of hundred people in it. Furthermore, people are assigned to rooms alphabetically, so you know which room everyone is in. Each room has a robot in it which can relay sounds, 3D images, smells, and even the sensation of someone dancing with you, from one room to another. This robot is so powerful that all you have to do is say, at a normal conversational volume, “Hi, Alice?”, and the robot will zip into the corridor, fly down to some out-of-the-way closet, and give the message to a page, which will zip it over to the robot in Alice’s room. The robot is almost never too busy, and the layout of the corridors is so efficient that messages can get between rooms instantaneously.</p>
<p>People can find each other and talk to each other, and are having a good time. You and Alice dance the night away. Bob and Charlie get drunk and start fighting again, but only the people in their room are affected, since the robot doesn’t pass on their alcohol-fueled shouting; also, Public Safety knows what room they’re in, and because the room is much smaller they can get them escorted out much more quickly. The party is a smashing success. Jehuda wins the dance competition, and Carl and Ruth have a wonderful time.</p>Matthew Sachshttp://www.blogger.com/profile/01455266173457039291noreply@blogger.comtag:blogger.com,1999:blog-8760640198136026212.post-31605112909343694322003-05-01T09:14:00.002-07:002011-06-02T18:48:30.009-07:00When Daemons Attack: Debugging Linux Applications<p>Notes from a talk I gave to the Brandeis University Computer Operators Group.
<span id="more-18"></span></p>
<ul>
<li>Call tracing
<ul>
<li>System call tracing — strace (Linux), truss (BSD), <a href="http://razor.bindview.com/tools/desc/strace_readme.html">strace for NT</a></li>
<li>Library call tracing — ltrace (Linux/BSD)</li>
</ul></li>
<li>Trace example (<code>ltrace -S</code>)
<ul>
<li><strong>First, the program is linked and loaded…</strong></li>
</ul></li>
</ul>
<pre>
SYS_uname(0xbffff600) = 0
SYS_brk(NULL) = 0x0804c000
SYS_open("/etc/ld.so.preload", 0, 010000210574) = -2
SYS_open("/etc/ld.so.cache", 0, 00) = 3
SYS_fstat64(3, 0xbfffeda0, 0x400114ac, 0, 0x400115e4) = 0
SYS_mmap(0xbfffed70, 0, 0x400114ac, 3, 0x40011594) = 0x40012000
SYS_close(3) = 0
SYS_open("/lib/libc.so.6", 0, 027777767210) = 3
SYS_read(3, "\177ELF\001\001\001", 1024) = 1024
SYS_fstat64(3, 0xbfffedf0, 0x400114ac, 0, 0x400115e4) = 0
SYS_mmap(0xbfffecd0, 0x40011d30, 0x400114ac, 2, 0xbfffecf0) = 0x40029000
SYS_mprotect(0x40131000, 32292, 0, 2, 0xbfffecf0) = 0
SYS_mmap(0xbfffecd0, 0x40011d30, 0x400114ac, 0x40011d30, 0xbfffed08) = 0x40131000
SYS_mmap(0xbfffecd0, 0, 0x400114ac, 0x40137000, 0xbfffed08) = 0x40137000
SYS_close(3) = 0
SYS_mmap(0xbffff360, 8, 0x400114ac, 4096, 112) = 0x40139000
SYS_munmap(0x40012000, 92718) = 0
</pre>
<ul>
<li><strong>We’ve finally reached the program’s <code>main</code> function.</strong></li>
</ul>
<pre>
__libc_start_main(0x08048b28, 2, 0xbffffa94, 0x08048760, 0x08049b40 &lt;unfinished ...&gt;
setlocale(6, "") = "C"
bindtextdomain("coreutils", "/usr/share/locale" &lt;unfinished ...&gt;
SYS_brk(NULL) = 0x0804c000
SYS_brk(0x0804d000) = 0x0804d000
SYS_brk(NULL) = 0x0804d000
&lt;... bindtextdomain resumed&gt; ) = "/usr/share/locale"
textdomain("coreutils") = "coreutils"
__cxa_atexit(0x08048f34, 0, 0, 13, 0x08049edd) = 0
getenv("POSIXLY_CORRECT") = NULL
getopt_long(2, 0xbffffa94, "+", 0x0804a080, NULL) = -1
fputs_unlocked(0xbffffbe2, 0x40132c40, 0x080489f8, 20304, 0xbffffa50 &lt;unfinished ...&gt;
</pre>
<ul>
<li><strong>strace shows this next line as <code>fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0</code>, which is why you should try strace before going to ltrace: its output is prettier.</strong></li>
</ul>
<pre>
SYS_fstat64(1, 0xbffff8d0, 0x40135f60, 0x40136560, 0) = 0
</pre>
<ul>
<li><strong>This is libc’s fputs allocating a buffer.</strong></li>
</ul>
<pre>
SYS_mmap(0xbffff8b0, 0xbffff8d0, 0x40135f60, 0x40132c40, 4096) = 0x40012000
&lt;... fputs_unlocked resumed&gt; ) = 1
exit(0 &lt;unfinished ...&gt;
__fpending(0x40132c40, 0x400114ac, 0x400116d8, 0x08048728, 0x40135f60) = 13
fclose(0x40132c40 &lt;unfinished ...&gt;
SYS_write(1, "Hello, world\n", 13Hello, world
) = 13
</pre>
<ul>
<li><strong>echo explicitly closes stdout for some reason.</strong></li>
</ul>
<pre>
SYS_close(1) = 0
SYS_munmap(0x40012000, 4096) = 0
&lt;... fclose resumed&gt; ) = 0
SYS_exit_group(0 &lt;unfinished ...&gt;
+++ exited (status 0) +++
</pre>
<ul>
<li>Example: Finding errors with <code>strace</code>
<ul>
<li><code>foo: No such file or directory</code></li>
<li>Negative return values usually indicate errors, so try grepping for them</li>
<li>If the program is printing an error message, start at where that error message is printed and work backwards</li>
</ul></li>
</ul>
<pre>
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=1384816, ...}) = 0
mmap2(NULL, 1384816, PROT_READ, MAP_PRIVATE, 3, 0) = 0x4019a000
close(3) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=57, ws_col=158, ws_xpixel=1584, ws_ypixel=1144}) = 0
brk(0) = 0x805a000
brk(0x805d000) = 0x805d000
<strong>stat64("foo", 0x8059a5c) = -1 ENOENT (No such file or directory)
lstat64("foo", 0x8059a5c) = -1 ENOENT (No such file or directory)</strong>
write(2, "ls: ", 4ls: ) = 4
write(2, "foo", 3foo) = 3
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=2627, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40012000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2627
brk(0) = 0x805d000
brk(0x805e000) = 0x805e000
read(3, "", 4096) = 0
close(3) = 0
munmap(0x40012000, 4096) = 0
open("/usr/share/locale/en_US.ISO-8859-1/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.iso88591/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.ISO-8859-1/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.iso88591/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
<strong>write(2, ": No such file or directory", 27: No such file or directory) = 27</strong>
write(2, "\n", 1
) = 1
exit_group(1)
</pre>
<ul>
<li>Source-level Debugging
(This is why having source code is good…) Useful gdb commands:
<ul>
<li><code>bt</code> (backtrace) — Shows the stack of function calls</li>
<li><code>b</code> (breakpoint) — Sets a breakpoint</li>
<li><code>print expr</code> (print the value of an expression)</li>
</ul></li>
<li>Example: Using <code>gdb</code></li>
</ul>
<pre class="c">
<span style="color: #808080; font-style: italic;">/* Silly C program to print numbers */</span>
<span style="color: #339933;">#include &lt;stdio.h&gt;</span>
<span style="color: #339933;">#include &lt;stdlib.h&gt;</span>
<span style="color: #993333;">void</span> fillarray<span style="color: #66cc66;">(</span><span style="color: #993333;">int</span> *array, <span style="color: #993333;">int</span> size<span style="color: #66cc66;">)</span> <span style="color: #66cc66;">{</span>
<span style="color: #993333;">int</span> i;
<span style="color: #b1b100;">for</span><span style="color: #66cc66;">(</span>i = <span style="color: #cc66cc;">0</span>; i &lt; size; i++<span style="color: #66cc66;">)</span> array<span style="color: #66cc66;">[</span>i<span style="color: #66cc66;">]</span> = i;
<span style="color: #66cc66;">}</span>
<span style="color: #993333;">int</span> main<span style="color: #66cc66;">(</span><span style="color: #993333;">int</span> argc, <span style="color: #993333;">char</span> *argv<span style="color: #66cc66;">[</span><span style="color: #66cc66;">]</span><span style="color: #66cc66;">)</span> <span style="color: #66cc66;">{</span>
<span style="color: #993333;">int</span> *numarray, i;
numarray = malloc<span style="color: #66cc66;">(</span>atoi<span style="color: #66cc66;">(</span>argv<span style="color: #66cc66;">[</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">]</span><span style="color: #66cc66;">)</span><span style="color: #66cc66;">)</span>;
fillarray<span style="color: #66cc66;">(</span>numarray, <span style="color: #993333;">sizeof</span><span style="color: #66cc66;">(</span>numarray<span style="color: #66cc66;">)</span><span style="color: #66cc66;">)</span>;
<span style="color: #b1b100;">for</span><span style="color: #66cc66;">(</span>i = <span style="color: #cc66cc;">0</span>; i &lt; <span style="color: #993333;">sizeof</span><span style="color: #66cc66;">(</span>numarray<span style="color: #66cc66;">)</span>; i++<span style="color: #66cc66;">)</span> <a href="http://www.opengroup.org/onlinepubs/009695399/functions/printf.html"><span style="color: #000066;">printf</span></a><span style="color: #66cc66;">(</span><span style="color: #ff0000;">"%d<span style="color: #000099; font-weight: bold;">\n</span>"</span>, numarray<span style="color: #66cc66;">[</span>i<span style="color: #66cc66;">]</span><span style="color: #66cc66;">)</span>;
<span style="color: #b1b100;">return</span> <span style="color: #cc66cc;">0</span>;
<span style="color: #66cc66;">}</span>
</pre>
<pre>
bash$ printnums
Segmentation fault
bash$ gcc -ggdb -o printnums printnums.c
bash$ gdb printnums
(gdb) r
Starting program: /home/matthewg/printnums
Program received signal SIGSEGV, Segmentation fault.
0x40052a6e in __strtol_internal () from /lib/libc.so.6
(gdb) bt
#0 0x40052a6e in __strtol_internal () from /lib/libc.so.6
#1 0x40050849 in atoi () from /lib/libc.so.6
#2 0x080483e3 in main (argc=1, argv=0xbffffa04) at printnums.c:15
(gdb) up
#1 0x40050849 in atoi () from /lib/libc.so.6
(gdb) up
#2 0x080483e3 in main (argc=1, argv=0xbffffa04) at printnums.c:15
15 numarray = malloc(atoi(argv[1]));
(gdb) print argv[1]
$3 = 0x0
(gdb) print argc
$4 = 1
(gdb) quit
A debugging session is active.
Do you still want to close the debugger?(y or n) y
</pre>
<ul>
<li><strong>The program expects a command-line argument. We forgot to give it one, and the program never checks for that condition.</strong></li>
</ul>
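<ul>
<li>One way to guard against the missing argument is to check <code>argc</code> before touching <code>argv[1]</code>, and to use <code>strtol</code> so that bad input can be detected. This is only a sketch: the helper name <code>parse_count</code> and the rule that the count must be positive are assumptions, not part of the original program.</li>
</ul>

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse argv[1] as a positive element count.
 * Returns 0 on success and stores the value in *count; returns -1 if the
 * argument is missing or is not a positive decimal number. */
int parse_count(int argc, char *argv[], long *count) {
    char *end;
    long value;

    if (argc < 2) return -1;  /* argv[1] is NULL here; do not dereference it */

    errno = 0;
    value = strtol(argv[1], &end, 10);
    /* Reject overflow, empty input, trailing junk, and non-positive counts. */
    if (errno != 0 || end == argv[1] || *end != '\0' || value <= 0) return -1;

    *count = value;
    return 0;
}
```

<ul>
<li>In <code>main</code>, a <code>-1</code> return would be the place to print a usage message and exit instead of crashing inside <code>atoi</code>.</li>
</ul>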
<pre>
Continuing.
0
1
2
3
Program exited normally.
(gdb) b 20
Breakpoint 2 at 0x8048451: file printnums.c, line 20.
(gdb) r
Starting program: /home/matthewg/printnums 1
Breakpoint 2, main (argc=2, argv=0xbffff9f4) at printnums.c:20
20 numarray = malloc(atoi(argv[1]));
(gdb) s
21 fillarray(numarray, sizeof(numarray));
(gdb) print sizeof(numarray)
$7 = 4
(gdb) print argv[1]
$8 = 0xbffffb64 "1"
(gdb) c
Continuing.
Breakpoint 1, fillarray (array=0x804a008, size=4) at printnums.c:9
9 for(i = 0; i < size; i++) array[i] = i;
(gdb) clear
Deleted breakpoint 1
(gdb) c
Continuing.
0
1
2
3
Program exited normally.
(gdb) quit
</pre>
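<ul>
<li>The session above shows <code>sizeof(numarray)</code> evaluating to 4: it is the size of the pointer, not the number of elements. A possible fix is to carry the element count around explicitly, as sketched below. The helper name <code>make_numbers</code> is hypothetical, and the sketch assumes the command-line argument was meant as an element count, so it also allocates <code>count * sizeof *numarray</code> bytes rather than <code>count</code> bytes.</li>
</ul>

```c
#include <stdlib.h>

/* Fill the first `count` elements. The count is passed in explicitly,
 * because sizeof(array) here would only be the size of the pointer. */
void fillarray(int *array, size_t count) {
    size_t i;
    for (i = 0; i < count; i++) array[i] = (int)i;
}

/* Hypothetical helper: allocate and fill `count` ints.
 * Returns NULL if the allocation fails; the caller frees the result. */
int *make_numbers(size_t count) {
    /* malloc takes a size in bytes, so scale by the element size. */
    int *numarray = malloc(count * sizeof *numarray);

    if (numarray == NULL) return NULL;
    fillarray(numarray, count);
    return numarray;
}
```

<ul>
<li>The printing loop in <code>main</code> would then run up to the same <code>count</code> instead of <code>sizeof(numarray)</code>.</li>
</ul>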
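<ul>
<li><code>sizeof(numarray)</code> doesn’t do what the programmer thought it did: it is the size of the pointer (4 bytes here), not the number of elements in the array.</li>
<li>Replacing Library Functions
<ul>
<li><code>fputs.c</code> source:</li>
</ul></li>
</ul>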
<pre class="c">
<span style="color: #339933;">#define _GNU_SOURCE</span>
<span style="color: #339933;">#include &lt;stdio.h&gt;</span>
<span style="color: #339933;">#include &lt;dlfcn.h&gt;</span>
<span style="color: #993333;">int</span> fputs_unlocked<span style="color: #66cc66;">(</span><span style="color: #993333;">const</span> <span style="color: #993333;">char</span> *s, FILE *stream<span style="color: #66cc66;">)</span> <span style="color: #66cc66;">{</span>
<span style="color: #993333;">int</span> <span style="color: #66cc66;">(</span>*orig_fputs<span style="color: #66cc66;">)</span><span style="color: #66cc66;">(</span><span style="color: #993333;">const</span> <span style="color: #993333;">char</span> *, FILE *<span style="color: #66cc66;">)</span>;
<span style="color: #993333;">int</span> retval;
orig_fputs = dlsym<span style="color: #66cc66;">(</span>RTLD_NEXT, <span style="color: #ff0000;">"fputs_unlocked"</span><span style="color: #66cc66;">)</span>;
<a href="http://www.opengroup.org/onlinepubs/009695399/functions/printf.html"><span style="color: #000066;">printf</span></a><span style="color: #66cc66;">(</span><span style="color: #ff0000;">"Doing fputs...<span style="color: #000099; font-weight: bold;">\n</span>"</span><span style="color: #66cc66;">)</span>;
retval = orig_fputs<span style="color: #66cc66;">(</span>s, stream<span style="color: #66cc66;">)</span>;
<a href="http://www.opengroup.org/onlinepubs/009695399/functions/printf.html"><span style="color: #000066;">printf</span></a><span style="color: #66cc66;">(</span><span style="color: #ff0000;">"fputs returning %d.<span style="color: #000099; font-weight: bold;">\n</span>"</span>, retval<span style="color: #66cc66;">)</span>;
<span style="color: #b1b100;">return</span> retval;
<span style="color: #66cc66;">}</span>
</pre>
<pre>
bash$ gcc -shared -ldl -o fputs.so fputs.c
bash$ LD_PRELOAD=./fputs.so /bin/echo "Hello, world"
Doing fputs...
Hello, worldfputs returning 1.
bash$
</pre>
<ul>
<li>Why is there a line break after “fputs returning 1.” but none between “Hello, world” and “fputs returning 1.”?</li>
</ul>