ES2015 (also known as ES6), the version of the JavaScript specification ratified in 2015, is a huge improvement to the language’s expressive power thanks to features like classes, for-of, destructuring, spread, tail calls, and much more. But powerful language features come at a high cost for language implementers. Prior to ES6, WebKit had spent years optimizing the idioms that arise in ES3 and ES5 code. ES6 requires completely new compiler and runtime features in order to get the same level of performance that we have with ES5.
WebKit’s JavaScript implementation, called JSC (JavaScriptCore), implements all of ES6. ES6 features were implemented to be fast from the start, though we only had microbenchmarks to measure the performance of those features at the time. While it was initially helpful to optimize for our own microbenchmarks and microbenchmark suites like six-speed, to truly tune our engine for ES6, we needed a more comprehensive benchmark suite. It’s hard to optimize new language features without knowing how those features will be used. Since we love programming, and ES6 has many fun new language features to program with, we developed our own ES6 benchmark suite. This post describes the development of our first ES6 benchmark, which we call ARES-6. We used ARES-6 to drive significant optimization efforts in JSC, and this post describes three of them: high throughput generators, a new Map/Set implementation, and phantom spread. The post concludes with an analysis of performance data to show how JSC’s performance compares to other ES6 implementations.
ARES-6 Benchmark
The suite consists of two subtests we wrote in ES6 ourselves, and two subtests that we imported that use ES6. The first subtest, named Air, is an ES6 port of the WebKit B3 JIT’s Air::allocateStack phase. The second subtest, named Basic, is an ES6 implementation of the ECMA-55 BASIC standard using generators. The third subtest, named Babylon, is an import of Babel’s JavaScript parser. The fourth subtest, named ML, is an import of the feedforward neural network library from the mljs machine learning framework. ARES-6 runs the Air, Basic, Babylon, and ML benchmarks multiple times to gather lots of data about how the browser starts up, warms up, and janks up. This section describes the four benchmarks and ARES-6’s benchmark running methodology.
Air
Air is an ES6 reimplementation of JSC’s allocateStack compiler phase, along with all of Assembly Intermediate Representation needed to run that phase. Air tries to faithfully use new features like arrow functions, classes, for-of, and Map/Set, among others. Air doesn’t avoid any features out of fear that they might be slow, in the hope that we might learn how to make those features fast by looking at how Air and other benchmarks use them.
At the time that Air was written, most JavaScript benchmarks used ES5 or older versions of the language. ES6 testing mostly relied on microbenchmarks or conversions of existing tests to ES6. We use larger benchmarks to avoid over-optimizing for small pieces of code. We also avoid changing existing benchmarks because that approach has no limiting principle: if it’s OK to change a benchmark to use a feature, does that mean we can also change it to remove the use of a feature we don’t like? We feel that one of the best ways to avoid falling into the trap of creating benchmarks that only reinforce what some JS engine is already good at is to create a new benchmark from first principles.
We only recently completed our new JavaScript compiler, named B3. B3’s backend, named Air, is very CPU-intensive and uses a combination of object-oriented and functional idioms in C++. Additionally, it relies heavily on high speed maps and sets. It goes so far as to use customized map/set implementations – even more so than the rest of WebKit. This makes Air a great candidate for ES6 benchmarking. The Air benchmark in ARES-6 is a faithful ES6 implementation of JSC’s Air. It pulls no punches: just as the original C++ Air was written with expressiveness as a top priority, ES6 Air is liberal in its use of modern ES6 idioms whenever this helps make the code more readable. Unlike the original C++ Air, ES6 Air doesn’t exploit a deep understanding of compilers to make the code easy to compile.
Design
Air runs one of the more computationally expensive C++ Air phases, Air::allocateStack(). It turns abstract stack references into concrete stack references, by selecting how to lay out stack slots in the stack frame. This requires liveness analysis and an interference graph.
Air relies on three major ES6 features more so than most of the others:
Arrow functions. Like the C++ version with lambdas, Air uses a functional style of iterating most non-trivial data-structures. This is because the functional style allows the callbacks to mutate the data being iterated: if the callback returns a non-null value, forEachArg() will replace the argument with that value. This would not have been possible with for-of. Air’s use of forEachArg() and arrow functions usually looks like this:
inst.forEachArg((arg, role, type, width) => ...)
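For example, here is a hedged sketch of how a callback might swap one argument for another (oldArg and newArg are purely illustrative names, not part of Air’s API):

inst.forEachArg((arg, role, type, width) => {
    if (arg === oldArg)
        return newArg; // returning a value makes forEachArg store it back into the instruction
    // returning nothing (undefined) leaves the argument unchanged
});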
For-of. Many Air data structures are amenable to for-of iteration. While the innermost loops tend to use functional iteration, pretty much all of the outer logic uses for-of heavily. For example:
for (let block of code) // Iterate over the basic blocks in a program
    for (let inst of block) // Iterate over the instructions in a block
        ...
Map/Set. The liveness analysis in Air::allocateStack() relies on maps and sets. For example, we use a liveAtHead map that is keyed by basic block. Its values are sets of live stack slots. This is a relatively crude way of doing liveness, but it is exactly how the original Air::LivenessAnalysis worked. So we view it as being quite faithful to how a sensible programmer might use Map and Set.
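Illustratively, the shape of that data structure is just a Map whose values are Sets (a sketch, not the benchmark’s exact code):

let liveAtHead = new Map(); // basic block -> Set of live stack slots
for (let block of code)
    liveAtHead.set(block, new Set());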
Air also uses some other ES6 features. For example, it makes light use of a Proxy. It makes extensive use of classes, let/const, and Symbols. Symbols are used as enumeration elements, and so, they frequently show up as cases in switch statements.
The workflow of an Air run is pretty simple: we do 200 runs of allocateStack on four IR payloads.
Each IR payload is a large piece of ES6 code that constructs an Air Code object, complete with basic blocks, temporaries, stack slots, and instructions. These payloads are generated by running the Air::dumpAsJS() phase. This is a phase we wrote in the C++ version of Air that will generate JS code that builds an IR payload. Just prior to running the C++ allocateStack phase, we perform Air::dumpAsJS() to capture the payload. The four payloads we generate are for the hottest functions in four other major JS benchmarks:
Kraken/imaging-gaussian-blur, the gaussianBlur function.
Octane/Typescript, the scanIdentifier function.
Air (yes, it’s self referential), an anonymous closure identified by our profiler as ACLj8C.
These payloads allow Air to precisely replay allocateStack on those actual functions.
Air validates its results. We added a Code hashing capability to both the C++ Air and ES6 Air, and we assert that each payload looks identical after allocateStack to what it would have looked like after the original C++ allocateStack. We also validate that the payloads hash properly before allocateStack, to help catch bugs during payload initialization. We have not measured how long hashing takes, but it’s an O(N) operation, while allocateStack is closer to O(N^2). We suspect that barring some engine pathologies, hashing should be much faster than allocateStack, and allocateStack should be where the bulk of time is spent.
Summary
At the time that Air was written, we weren’t happy with the ES6 benchmarks that were available to us. Air makes extensive use of ES6 features in the hope that we can learn about possible optimization strategies by looking at this and other benchmarks.
Air does not use generators at all. We almost used them for forEachArg iteration, but it would have been a very unusual use of generators because Air’s forEach functions are more like map than for-of. The whole point is to allow the caller to mutate entries. We thought that functional iteration was more natural for this case than for-of and generators. But this led us to wonder: how would we use generators?
Basic
Web programming requires reasoning about asynchrony. But back when some of us first learned to program with the BASIC programming language, we didn’t have to worry about that: if you wanted input from the user, you said INPUT and the program would block until the user typed things. In a sense, this is what generators are about: when you say yield, your code blocks until some other stuff happens.
Basic is an ES6 implementation of ECMA-55 BASIC. It implements the interpreter as a recursive abstract syntax tree (AST) walk, where the recursive functions are generators so that they can yield when they encounter BASIC’s INPUT command. The lexer is also written as a generator. Basic tests multiple styles of generators, including functions with multiple yield points and recursive yield* calls.
Lexers are usually written as a lex function that maintains some state and returns the next token whenever you call it. Basic uses a generator to implement lex, so that it can use local variables for all of its state.
Basic’s lexer is a generator function with multiple nested functions inside it. It contains eight uses of yield, none of which are recursive.
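Basic’s actual lexer isn’t reproduced here, but a minimal sketch of the style shows why a generator is attractive for this job: all of the lexer’s state lives in ordinary local variables that simply persist across yields.

function* lex(program)
{
    let index = 0; // lexer state is just a local, preserved across yields
    while (index < program.length) {
        let c = program[index];
        if (/\s/.test(c)) {
            index++;
            continue;
        }
        if (/[0-9]/.test(c)) {
            let start = index;
            while (index < program.length && /[0-9]/.test(program[index]))
                index++;
            yield {kind: "number", value: Number(program.substring(start, index))};
            continue;
        }
        yield {kind: "operator", value: program[index++]};
    }
}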
AST Walk
Walking the AST is an easy way to implement an interpreter and this was a natural choice for Basic. In such a scheme, the program is represented as a tree, and each node has code associated with it that may recursively invoke sub-nodes in the tree. Basic uses plain object literals to create nodes. For example:
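(The original snippet isn’t reproduced here; the following is an illustrative sketch, and the exact field and helper names may differ from Basic’s actual code.)

return {evaluate: Basic.NumberPow, left: parsePrimary(), right: parsePrimary()};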
This code in the parser creates an AST node whose evaluation function is Basic.NumberPow. Were it not for INPUT, Basic could use ordinary functions for all of the AST. But INPUT requires blocking; so, we implement that with yield. This means that all of the statement-level AST nodes use generators. Those generators may call each other recursively using yield*.
The AST walk interpreter contains 18 generator functions with two calls to yield (in INPUT and PRINT) and one call to yield* (in Basic.Program).
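As a hedged sketch of that recursive pattern (this.statements and the evaluate naming are illustrative, not Basic’s exact code), the program node can delegate to each statement’s generator with yield*:

Basic.Program = function* (state)
{
    for (let statement of this.statements)
        yield* statement.evaluate(state);
};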
Workloads
Each Basic benchmark iteration runs five simple Basic programs:
Hello world!
Print numbers 1..10.
Print a random number.
Print random numbers until 100 is printed.
Find all prime numbers from 2 to 1999.
The interpreter uses a fixed seed, so the random parts always behave the same way. The Basic benchmark runs 200 iterations.
Summary
We wrote Basic because we wanted to see what an aggressive attempt at using generators looks like. Prior to Basic, all we had were microbenchmarks, which we feared would focus our optimization efforts only on the easy cases. Basic uses generators in some non-obvious ways, like having multiple generator functions, many yield points, and recursive generator calls. Basic also uses for-of, classes, Map, and WeakMap.
Babylon and ML
It is important to us that ARES-6 consists both of tests we wrote, and tests we imported. Babylon and ML are two tests we imported with minor modifications to make them run in the browser without Node.js style modules or ES6 modules. (The reason for not including tests with ES6 modules is that at the time of writing this, <script type="module"> is not on by default in all major browsers.) Writing new tests for ARES-6 is important because it allows us to use ES6 in interesting and sophisticated ways. Importing programs we didn’t write is also imperative for performance analysis because it ensures that we measure the performance of interesting programs already using ES6 out in the wild.
Babylon is an interesting test for ARES-6 because it makes heavy use of classes. We think that many people will dip their toes into ES6 by adopting classes since many ES5 programs are written in ways that lead to an easy translation to classes. We’ve seen this firsthand in the WebKit project when Web Inspector moved all their ES5 pseudo-classes to use ES6 class syntax. You can browse the Babylon source in ARES-6 here. The Babylon benchmark runs 200 iterations where each iteration consists of parsing four JS programs.
Air, Basic and Babylon each run for 200 iterations, and ML runs for 60 iterations. Each iteration does the same kind of work. We simulate page reload before the first of those iterations, to minimize the chances that the code will benefit from JIT optimizations at the beginning of the first iteration. ARES-6 analyzes these four benchmarks in three different ways:
Start-up performance. We want ES6 code to run fast right from the start, even if our JITs haven’t had a chance to perform optimizations yet. ARES-6 reports a First Iteration score that is just the execution time of the first of the 60 or 200 iterations.
Worst-case performance. JavaScript engines have a lot of different kinds of jank. The GC, JIT, and various adaptive optimizations all run the risk of causing programs to sometimes run for much longer than the norm. We don’t want our ES6 optimizations to introduce new kinds of jank. ARES-6 reports a Worst 4 Iterations score that is the average of the execution times of the worst 4 of the 60 or 200 iterations, excluding the first iteration which was used for measuring start-up performance.
Throughput. If you write some ES6 code and it runs for long enough, then it should eventually benefit from all of our optimizations. ARES-6 reports an Average score that is the average execution time of all iterations excluding the first.
Each repetition of ARES-6 yields 3 scores (measured in milliseconds): First Iteration, Worst 4 Iterations, and Average, for each of the 4 subtests: Air, Basic, Babylon, and ML, for a total of 12 scores. The geometric mean of all 12 scores is the Overall score for the repetition. ARES-6 runs 6 repetitions, and reports the averages of each of these 13 scores along with their 95% confidence intervals. Since these scores are measures of execution time, lower scores are better because they mean that the benchmark completed in less time.
Optimizations
We built ARES-6 so that we could have benchmarks with which to tune our ES6 implementation. ARES-6 was built specifically to allow us to study the impact of new language features on our engine. This section describes three areas where we made significant changes to JavaScriptCore in order to improve our score on ARES-6. We revamped our generator implementation to give generator functions full access to our optimizing JITs. We rewrote our Map/Set implementation so that our JIT compiler can inline Map/Set lookups. Finally, we added significant new escape analysis functionality to eliminate the object allocation of the rest parameter when used with the spread operator.
High-Throughput Generators
The performance of ES6 generators is critical to the overall performance of JavaScript engines. We expect generators to be frequently used as ES6 adoption increases. Generators are a language-supported way to suspend and resume an execution context. They can be used to implement value streams and custom iterators, and are the basis of ES2017’s async and await, which streamlines the notoriously tricky asynchronous Promise programming model into the direct style programming model. To be performant, it was an a priori goal that our redesigned generator implementation had a sound and simple compilation strategy in all of our JIT tiers.
A generator must suspend and resume the execution context at the point of a yield expression. To do this, JSC needs to save and restore its execution state: the instruction pointer and the current stack frame. While we need to save the logical JavaScript instruction pointer to resume execution at the appropriate point in the program, the machine’s instruction pointer may change. If the function is compiled in upper JIT tiers (like baseline, DFG, and FTL), we must select the code compiled in the best tier when resuming. In addition, the instruction pointer may be invalidated if the compiled code is deoptimized.
JSC’s bytecode is a 3AC-style (three-address code) IR (intermediate representation) that operates over virtual registers. The bytecode is allowed an effectively “infinite” number of registers, and the stack frame comprises slots to store each register. In practice, the number of virtual registers used is small. The key insight into our generator implementation is that it is a transformation over JSC’s bytecode. This completely hides the complexity of generators from the rest of the engine. The phases prior to bytecode generatorification (parsing, bytecode generation from the AST) are allowed to view yield as if it were a function call — about the easiest kind of thing for those phases to do. Best of all, the phases after generatorification do not have to know anything about generators. That’s a fantastic outcome, since we already have a massive interpreter and JIT compiler infrastructure that consumes this bytecode.
When generating the original bytecode, JSC’s bytecode compiler treats a generator mostly as if it were a program with normal control flow, where each “yield” point is simply an expression that takes arguments, and results in a value. However, this is not how yield is implemented at a machine level. To transform this initial form of bytecode into a version that does proper state restoration, we rewrite the generator’s bytecode to turn the function into a state machine.
To properly restore the instruction pointer at yield points, our bytecode transformation inserts a switch statement at the top of each generator function. It performs a switch over an integer that represents where in the program the generator should resume when called. Secondly, we must have a way to restore each virtual register at each yield point. It turns out that JSC already has a mechanism for doing exactly this. We turned this problem into a problem of converting each bytecode virtual register in the original bytecode into a closure variable in JSC’s environment record data structure. Each transformed generator allocates a closure to store all generator state. Every yield point saves each live virtual register into the closure, and every resume point restores each live virtual register from the closure.
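As a rough JavaScript-level sketch of the idea (the real transformation operates on bytecode, not on source, and the names below are purely illustrative), a generator that yields once can be thought of as turning into something like this:

// Source form: function* gen() { let x = 20; let v = yield 42; return v + x; }
function makeGen()
{
    let state = {resumePoint: 0, x: undefined};
    return function resume(sentValue) {
        switch (state.resumePoint) {
        case 0:
            state.x = 20; // save the live local into the closure-like record
            state.resumePoint = 1; // remember where to resume
            return {value: 42, done: false}; // stands in for "yield 42"
        case 1:
            return {value: sentValue + state.x, done: true};
        }
    };
}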
Because this bytecode transformation changes the program only by adding closures and a switch statement, this approach trivially meets our goal of being compilable in our optimizing compiler tiers. JSC’s optimizing compilers already do a good job at optimizing switch statements and closure variables. The generator’s dispatch gets all of JSC’s switch lowering optimizations just by virtue of using the switch bytecode. This includes our B3 JIT’s excellent switch lowering optimization, and less aggressive versions of that same optimization in lower JIT tiers and in the interpreter. The DFG also has numerous optimizations for closures. This includes inferring types of closure variables, optimizing loads and stores to closures down to a single instruction in most cases, and properly modeling the aliasing effects of closure accesses. We get these benefits without introducing new constructs into the optimizing compiler.
Example
Let’s walk through an example of the bytecode transformation for an example JavaScript generator function:
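A representative generator in the spirit of this walkthrough (the variable temp is live across the yield, which matters for the liveness discussion below):

function* generator()
{
    let temp = 20;
    let value = yield 42;
    return value + temp;
}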
JSC generates the bytecode sequence shown in the following figure, which represents the control flow graph.
When the AST-to-bytecode compiler sees a yield expression, it just emits the special op_yield bytecode operation. This opcode is a macro. It’s not recognized by any of our execution engines. But it has well-understood semantics, which allows us to treat it as bytecode syntactic sugar for closures and switches.
Desugaring is performed by the generatorification bytecode phase. The figure above explains how generatorification works. Generatorification rewrites the bytecode so that there are no more op_yield statements and the resulting function selects which code to execute using a switch statement. Closure-style property access bytecodes are used to save/restore state around where the op_yield statements used to be.
Previously, JSC bytecode was immutable. It was used to carry information from the bytecode generator, which wrote bytecode as a byproduct of walking the AST, to the compilers, which would consume it to create some other byproduct (either machine code or some other compiler IR). Generatorification changes that. To facilitate this phase, we created a general-purpose bytecode rewriting facility for JSC. It allows us to insert and remove any kind of bytecode instruction, including jumps. It permits us to perform sophisticated control-flow edits on the bytecode stream. We use this to make generatorification easier, and we look forward to using it to implement other crazy future language features.
To allow suspension and resumption of the execution context at op_yield, the rewriter inserts the op_switch_imm opcode just after the op_enter opcode. At the point of op_yield, the transformed function saves the integer that represents the virtual instruction pointer and then returns from the function with op_ret. Then, when resuming, we use the inserted op_switch_imm with the saved integer, jumping to the point just after the op_yield which suspended the context.
To save and restore live registers, this pass performs liveness analysis to find live registers at op_yield and then inserts op_put_to_scope and op_get_from_scope operations to save their state and restore it (respectively). These opcodes are part of our closure object model, which happens to be appropriate here because JSC’s closures are just objects that statically know what their fields are and how they will be laid out. This allows fast allocation and fast access, which we already spent a great deal of time optimizing for actual JS closures. In this figure, generatorification performs liveness analysis and finds that the variable temp is live over the op_yield point. Because of this, the rewriter emits op_put_to_scope and op_get_from_scope to save and restore temp. This does not disrupt our optimizing compiler’s ability to reason about temp’s value, since we had already previously solved that same problem for variables saved to closures.
Summary
The generatorification rewrite used bytecode desugaring to encapsulate the whole idea of generators into a bytecode phase that emits idiomatic JSC bytecode. This allows our entire optimization infrastructure to join the generator party. Programs that use generators can now run in all of our JIT tiers. At the time this change landed, it sped up Basic’s Average score by 41%. It also improved Basic overall: even the First Iteration score got 3% faster, since our low-latency DFG optimizing JIT is designed to provide a boost even for start-up code.
The data in the above graph was taken on a Mid 2014, 2.8 GHz Core i7, 16GB RAM, 15″ MacBook Pro, running macOS Sierra version 10.12.0. Safari 10.0.2 is the first version of Safari to ship with JavaScriptCore’s complete ES6 implementation. Safari 10.0.2 does not include any of the optimizations described in this post. The above graph shows that since implementing ES6, we’ve made significant improvements to JSC’s performance running Basic. We’ve made a 1.2x improvement in the First Iteration score, a 1.4x improvement in the Worst 4 Iterations score, and a 2.8x improvement in the Average score.
Desugaring is a classic technique. Crazy compiler algorithms are easiest to think about when they are confined to their own phase, so we turned our bytecode into a full-fledged compiler IR and implemented generatorification as a compiler phase over that IR. Our new bytecode rewriter is powerful enough to support many kinds of phases. It can insert and remove any kind of bytecode instruction including jumps, allowing for complex control flow edits. While it’s only used for generators presently, this rewriting facility can be used to implement complex ECMAScript features in the future.
Fast Map and Set
Map and Set are two new APIs in ES6 that make it a more expressive language to program in. Prior to ES6, there was no official hashtable API for JavaScript. It’s always been possible to use a JavaScript object as a hashtable if only Strings and Symbols are used as keys, but this isn’t nearly as useful as a hashtable that can take any value as a key. ES6 fixes this ancient language deficiency by adding new Map and Set types that accept any JavaScript value as a key.
Both Air and Basic use Map and Set. Profiling showed that iteration and lookup were by far the most common operations, which is fairly consistent with our experience elsewhere in the language. For example, we have always found it more important to optimize property loads than property stores, because loads are so much more common. In this section we present our new Map/Set implementation, which we optimized for the iteration and lookup patterns of ARES-6.
Fast Lookup
To make lookup fast, we needed to teach our JIT compilers about the internal workings of our hashtable. The main performance benefit of doing this comes from inlining the hashtable lookup into the IR for a function. This obviates the need to call out to C++ code. It also allows us to implement the lookup more efficiently by leveraging the compiler’s smarts. To understand why, let’s analyze a hashtable lookup in more detail. JSC’s hashtable implementation is based off the linear probing algorithm. When you perform map.has(key) inside a JS program, we’re really performing a few abstract operations under the hood. Let’s break those operations down into pseudo code:
let findBucket = (key) => {
    let h = hash(key);
    let bucket = startBucket(h);
    while (true) {
        if (isEmptyBucket(bucket))
            return emptyBucketSentinel;
        // Note that the actual key comparison for ES6 Map and Set is not triple equals.
        // But it's close. We can assume it's triple equals for the sake of this explanation.
        if (bucket.key === key)
            return bucket;
        h = nextIndex(h);
        bucket = bucketAtIndex(h); // advance to the next bucket in the probe sequence
    }
};
There are many things that can be optimized here based on information we have about key. The compiler will often know the type of key, allowing it to emit a more efficient hash function, and a more efficient triple equals comparison inside the findBucket loop. The C++ code must handle JavaScript key values of all possible types. However, inside the compiler, if we know the type of key, we may be able to emit a triple equals comparison that is only a single machine compare instruction. The hash function over key also benefits from knowing the type of key. The C++ implementation of the hash function must handle keys of all types. This means that it’ll have to do a series of type checks on key to determine how it should be hashed. The reason for this is that we hash numbers differently than strings, and differently than objects. The JIT compiler will often be able to prove the type of key, thereby allowing us to avoid multiple branches to learn key’s type.
These changes alone already make Map and Set much faster. However, because the compiler now knows about the inner workings of a hashtable, it can further optimize code in a few neat ways. Let’s consider a common use of the Map API:
if (map.has(key))
    return map.get(key);
...
To understand how our compiler optimizations come into play, let’s look at how our DFG (Data Flow Graph) IR represents this program. DFG IR is used by JSC for performing high-level JavaScript-specific optimizations, including optimizations based on speculation and self-modifying code. This hashtable program will be transformed into roughly the following DFG IR (actual DFG IR dumps have a lot more information):
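The actual IR isn’t shown here; as a rough sketch in pseudo-JavaScript over the hashtable operations named below (not real DFG syntax), the two calls expand to something like:

let h1 = Hash(key);
let bucket1 = GetBucket(map, h1, key);
if (IsNonEmptyBucket(bucket1)) {
    let h2 = Hash(key); // redundant: same value as h1
    let bucket2 = GetBucket(map, h2, key); // redundant: same bucket as bucket1
    return LoadFromBucket(bucket2);
}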
DFG IR allows for precise modeling of effects of each operation as part of the clobberize analysis. We taught clobberize how to handle all of our new hashtable operations, like GetBucket, IsNonEmptyBucket, Hash, and LoadFromBucket. The DFG CSE (common subexpression elimination) phase uses this data and runs with it: it can see that the same Hash and GetBucket operations are performed twice without any operations that could change the state of the map in between them. Lots of other DFG phases use clobberize to understand the aliasing and effects of operations. This unlocks loop-invariant code motion and lots of other optimizations.
In this example, the optimized program will look like:
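Again as a sketch rather than a real IR dump, CSE folds the second hash and bucket lookup into the first:

let h = Hash(key);
let bucket = GetBucket(map, h, key);
if (IsNonEmptyBucket(bucket))
    return LoadFromBucket(bucket);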
Our compiler was able to optimize the redundant hashtable lookups in has and get down to a single lookup. In C++, hashtable APIs go to great lengths to provide ways of avoiding redundant lookups like the one here. Since ES6 provides only very simple Map/Set APIs, it’s difficult to avoid redundant lookups. Luckily, our compiler will often get rid of them for you.
Fast Iteration
As we wrote Basic and Air, we often found ourselves iterating over Maps and Sets because writing such code is natural and convenient. Because of this, we wanted to make Map and Set iteration fast. However, it’s not immediately obvious how to do this because ES6’s Map and Set iterate over keys in insertion order. Also, if a Map or Set is modified during iteration, the iterator must reflect the modifications.
We needed to use a data structure that is fast to iterate over. An obvious choice is to use a linked list. Iterating a linked list is fast. However, testing for the existence of something inside a linked list is O(n), where n is the number of elements in the list. To accommodate the fast lookup of a hashtable, and the fast iteration of a linked list, we chose a hybrid approach of a combo linked-list hashtable. Every time something is added to a Map or Set, it goes onto the end of the linked list. Every element in the linked list is a bucket. Each entry in the hashtable’s internal lookup buffer points to a bucket inside the linked list.
This is best understood through a picture. Consider the following program:
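A program along these lines (keys 1, 2, and 3, matching the deletion example below; the exact values are illustrative):

let map = new Map;
map.set(1, "one");
map.set(2, "two");
map.set(3, "three");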
A traditional linear probing hashtable would look like this:
However, in our scheme, we need to know the order in which we need to iterate the hashtable. So, our combo linked-list hashtable will look like this:
As mentioned earlier, when iterating over a Map or Set while changing it, the iterator must iterate over newly added values, and must not iterate over deleted values. Using a linked list data structure for iteration allows for a natural implementation of this requirement. Inside JSC, Map and Set iterators are just wrappers over hashtable buckets. Buckets are garbage collected, so an iterator can hang onto a bucket even after it has been removed from the hashtable. As an item is deleted from the hashtable, the bucket is removed from the linked list by updating the deleted bucket’s neighbors to now point to each other instead of the deleted bucket. Crucially, the deleted bucket will still point to its neighbors. This allows iterator objects to still point to the deleted bucket and then find their way back to non-deleted buckets. Asking such an iterator for its value will lead the iterator to traverse its neighbor pointer until it finds the next non-deleted bucket. The key insight here is that deleted buckets can always find the next non-deleted bucket by doing a succession of pointer chasing through their neighbor pointer. Note that this pointer chasing might lead the iterator to the end of the list, in which case, the iterator is closed.
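A minimal sketch of the bucket structure and of how an iterator skips deleted buckets, using illustrative names rather than JSC’s actual internals:

class Bucket {
    constructor(key, value, prev) {
        this.key = key;
        this.value = value;
        this.deleted = false;
        this.prev = prev; // neighbor pointers form the insertion-order list
        this.next = null;
        if (prev)
            prev.next = this;
    }
}

// A deleted bucket keeps its neighbor pointers, so an iterator parked on it can
// always chase next pointers until it reaches a live bucket (or the end of the list).
function nextNonDeletedBucket(bucket)
{
    while (bucket && bucket.deleted)
        bucket = bucket.next;
    return bucket;
}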
For example, consider the previous program, now with the key 2 deleted:
Let’s consider what happens when there is an iterator that’s pointing to the bucket for 2. When next() is called on that iterator, it’ll first check if the bucket it’s pointing to is deleted, and if so, it will traverse the linked list until it finds the first non-deleted entry. In this example, the iterator will notice the bucket for 2 is deleted, and it will traverse to the bucket for 3. It will see that 3 is not deleted, and it will yield its value.
Summary
We found ourselves using Map/Set a lot in the ES6 code that we wrote, so we decided to optimize these data structures to further encourage their use. Our Map/Set rewrite represents the hashtable’s underlying data structures in a way that is most natural for our garbage collector and DFG compiler to reason about. At the time this rewrite was committed, it contributed to an 8% overall performance improvement on ARES-6.
Phantom Spread
Another ES6 feature we found ourselves using is the combination of the rest parameter and the spread operator. It’s an intuitive programming pattern to gather some arguments into an array using the rest parameter, and then forward them to another function call using spread. For example, here is one way we use this in the Air benchmark:
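The exact Air source isn’t reproduced here; a hypothetical sketch of the pattern, where the rest parameter gathers everything after the first two arguments and spread forwards them on, looks like:

function visitArg(index, func, ...args)
{
    // args holds all but the first two arguments to visitArg;
    // spread forwards them to the next call without copying by hand.
    return func(index, ...args);
}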
A naive implementation of spread over the rest parameter will cause this code to run slower than needed because a spread operator requires performing the iterator protocol on its argument. Because the function visitArg is creating the args array, the DFG compiler knows exactly what data will be stored into it. Specifically, it’ll be filled with all but the first two arguments to visitArg. We chose to implement an optimization for this programming pattern because we think that the spread of a rest parameter will become a common ES6 programming idiom. In JSC, we also implement an optimization for the ES5 variant of this programming idiom:
function foo()
{
    bar.apply(null, arguments);
}
The ideal code the DFG could emit to set up the stack frame for the call to func inside visitArg is a memcpy from the arguments on its own stack frame to that of func’s stack frame. To be able to emit such code, the DFG compiler must prove that such an optimization is sound. To do this, the DFG must prove a few things. It must first ensure that the array iterator protocol is not observable. It proves it is not observable by ensuring that:
Array.prototype[Symbol.iterator] is its original value,
By performing such a proof, the DFG knows exactly what the protocol will do and it can model its behavior internally. Secondly, the DFG must prove that the args array hasn’t changed since being populated with the values from the stack. It does this by performing an escape analysis. The escape analysis will give a conservative answer to the question: has the args array changed since its creation on entry to the function? If the answer to the question is “no”, then the DFG does not need to allocate the args array. It can simply perform a memcpy when setting up func’s stack frame. This is a huge speed improvement for a few reasons. The primary speed gain is from avoiding performing the high-overhead iterator protocol. The secondary improvement is avoiding a heap object allocation for the args array. When the DFG succeeds at performing this optimization, we say it has converted a Spread into a PhantomSpread. This optimization leverages the DFG’s ability to not only optimize away object allocations, but to rematerialize them if speculative execution fails and we fall back to executing the bytecode without optimizations.
The data in the above graph was taken on a Mid 2014, 2.8 GHz Core i7, 16GB RAM, 15″ MacBook Pro, running macOS Sierra version 10.12.0. PhantomSpread was a significant new addition to our DFG JIT compiler. The Phantom Spread optimization, the rewrite of Map/Set, along with other overall JSC improvements, show that we’ve sped up Air’s Average score by nearly 2x since Safari 10.0.2. Crucially, we did not introduce such a speed up at the expense of the First Iteration and Worst 4 Iterations scores. Both of those scores have also progressed.
Performance Results
We believe that a great way to keep ourselves honest about JavaScriptCore’s performance is to compare to the best performance baselines available. Therefore, we routinely compare our performance both to past versions of JavaScriptCore and to other JavaScript engines. This section shows how JavaScriptCore compares to Firefox and Chrome’s JavaScript engines.
The graphs in this section use data taken on a Late 2016, 2.9 GHz Core i5, 8GB RAM, 13″ MacBook Pro, running macOS High Sierra Beta 1.
The figure above shows the overall ARES-6 scores in three different browsers: Safari 11 (13604.1.21.0.1), Firefox 53.0.3, and Chrome 58.0.3029.110. Our data shows that we’re nearly 1.8x faster than Chrome, and close to 5x faster than Firefox.
The graph above shows detailed results for all four benchmarks and their three scores. It shows that JavaScriptCore is the fastest engine at running ARES-6 not only in aggregate score, but also in all three scores for each individual subtest.
Conclusion
ES6 is a major improvement to the JavaScript language, and we had fun giving it a test drive when creating ARES-6. Writing Air and Basic, and importing Babylon and ML, gave us a better grasp of how ES6 features are likely to be used. Adding new benchmarks to our repertoire always teaches us new lessons. Long term, this only works to our benefit if we keep adding new benchmarks and don’t over optimize for the same set of stale benchmarks. Going forward, we plan to add more subtests to ARES-6. We think adding more tests will keep us honest in not over-tuning for specific ES6 code patterns. If you have exciting ES6 (or ES7 or beyond) code that you think is worth including in ARES-6, we’d love to hear about it. You can get in touch either by filing a bug or contacting Filip, Saam, or Yusuke, on Twitter.
The success of the web as a platform relies on user trust. Many users feel that trust is broken when they are being tracked and privacy-sensitive data about their web activity is acquired for purposes that they never agreed to.
WebKit has long included features to reduce tracking. From the very beginning, we’ve defaulted to blocking third-party cookies. Now, we’re building on that. Intelligent Tracking Prevention is a new WebKit feature that reduces cross-site tracking by further limiting cookies and other website data.
What Are Cross-Site Tracking and Third-Party Cookies?
Websites can fetch resources such as images and scripts from domains other than their own. This is referred to as cross-origin or cross-site loading, and is a powerful feature of the web. However, such loading also enables cross-site tracking of users.
Imagine a user who first browses example-products.com for a new gadget and later browses example-recipes.com for dinner ideas. If both these sites load resources from example-tracker.com and example-tracker.com has a cookie stored in the user’s browser, the owner of example-tracker.com has the ability to know that the user visited both the product website and the recipe website, what they did on those sites, what kind of web browser was used, et cetera. This is what’s called cross-site tracking and the cookie used by example-tracker.com is called a third-party cookie. In our testing we found popular websites with over 70 such trackers, all silently collecting data on users.
How Does Intelligent Tracking Prevention Work?
Intelligent Tracking Prevention collects statistics on resource loads as well as user interactions such as taps, clicks, and text entries. The statistics are put into buckets per top privately-controlled domain or TLD+1.
Machine Learning Classifier
A machine learning model is used to classify which top privately-controlled domains have the ability to track the user cross-site, based on the collected statistics. Out of the various statistics collected, three vectors turned out to have strong signal for classification based on current tracking practices: subresource under number of unique domains, sub frame under number of unique domains, and number of unique domains redirected to. All data collection and classification happens on-device.
Actions Taken After Classification
Let’s say Intelligent Tracking Prevention classifies example.com as having the ability to track the user cross-site. What happens from that point?
If the user has not interacted with example.com in the last 30 days, example.com website data and cookies are immediately purged and continue to be purged if new data is added.
However, if the user interacts with example.com as the top domain, often referred to as a first-party domain, Intelligent Tracking Prevention considers it a signal that the user is interested in the website and temporarily adjusts its behavior as depicted in this timeline:
If the user has interacted with example.com in the last 24 hours, its cookies will be available when example.com is a third party. This allows for “Sign in with my X account on Y” login scenarios.
This means users only have long-term persistent cookies and website data from the sites they actually interact with and tracking data is removed proactively as they browse the web.
Partitioned Cookies
If the user has interacted with example.com in the last 30 days but not in the last 24 hours, example.com gets to keep its cookies but they will be partitioned. Partitioned means third parties get unique, isolated storage per top privately-controlled domain or TLD+1, e.g. account.example.com and www.example.com share the partition example.com.
This makes sure users stay logged in even if they only visit a site occasionally while restricting the use of cookies for cross-site tracking. Note that WebKit already partitions caches and HTML5 storage for all third-party domains.
What Does This Mean For Web Developers?
With Intelligent Tracking Prevention, WebKit strikes a balance between user privacy and websites’ need for on-device storage. That said, we are aware that this feature may create challenges for legitimate website storage, i.e. storage not intended for cross-site tracking. Please let us know of such cases and we will try to help (contact info at the end of this blog post).
To get you started, here are some guidelines.
Storage Requires User Interaction
Check to make sure that you aren’t relying on cookies and other storage to persist if the user does not interact directly with your website on a regular basis. Requiring user interaction covers most legitimate uses of client-side storage. It also provides better transparency and gives users more control over who gets to store data on their devices.
Web Analytics
Make sure to configure your web analytics to not rely on third-party cookies from domains that don’t get user interaction. A popular way to do cross-site analytics for a family of sites is to use link decoration, i.e. pad links with information that needs to be carried across origins and navigations.
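For example (illustrative only), a cross-site link might carry the needed information as query parameters that the destination page reads on arrival:

https://example-destination.com/landing?visitor=abc123&campaign=spring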
Ad Attribution
We recommend server-side storage for attribution of ad impressions on your website. Link decoration can be used to pass on attribution information in navigations.
Managing Single Sign-On
If you run a single sign-on system with a centralized session, the user needs to interact with the domain that controls the session. Otherwise you run the risk of Intelligent Tracking Prevention treating your session controller domain as a tracker.
Imagine a scenario as depicted above; a central session at account.com used for the three sites SiteA.com, SiteB.com, and SiteC.com. Session information can be propagated from account.com to the dependents during account.com’s 24-hour exemption from cookie partitioning. From that point on the sites must maintain sessions without account.com cookies, or they must re-authenticate daily with a brief stop at account.com to acquire new user interaction. You can grant the sites the ability to propagate session information between themselves through navigations and new cookies set in HTTP responses. Single sign-out needs to invalidate the account.com session on the server.
Feedback and Bug Reports
Please report bugs through bugs.webkit.org and send feedback to our evangelist Jon Davis if you are a web developer or a web user and Intelligent Tracking Prevention isn’t working as intended for your website. If you have technical questions on how the feature works you can find me on Twitter @johnwilander.
Creating a mobile app for your business is an effective method for boosting customer engagement. Since mobile-first is now the norm, having an app is no longer an option--it is a necessity. Customers want the ability to access your services while on the go. Yet, have you ever thought about internal mobile apps? It's been around a decade since the launch of the first-generation iPhone. This is the phone that made mobile apps part of our daily interactions. Today, using apps to increase productivity is normal. And, if your business isn't using an app to connect your employees with your infrastructure, then you are lagging behind your competitors.
Internal business apps create real efficiency
Depending on the mobile app development framework you choose, an internal business app can do a lot to enhance employee productivity and accessibility. You want your app to have an enjoyable user interface that intrigues your employees. This can be done through various programming languages and frameworks such as:
BuildFire.js
Java
Python
C#
PHP
Imagine your team is at an important industry trade show promoting your products. Then, they start networking and picking up warm leads. How can you immediately send those leads back to your company database? Your employees could send an email with a list of names and contact information. On the other hand, they could use an internal business app that sends their leads straight into your organization's CRM system.
Market research
If your team is engaged in understanding the needs of your target market through research and focus groups, it can help to have an app that will instantly log details of any answers and customer data. Instead of writing things down, and hoping they don't get lost in translation, an app can take those answers and add them to your database directly.
Custom apps
When an out-of-the-box solution won't work, then your company can utilize bespoke business apps. This way, you know your app will fit right in with your internal processes and procedures. Again, with one of the above programming languages, you can create an app to give your employees more convenience and flexibility.
To illustrate, your employees can report expenses while traveling, track tech issues or collaborate with remote employees from anywhere. All of these functions help to increase productivity. Apps can range from simple tools that streamline reporting processes to well-researched products that transform your business.
Manage projects
Many businesses can't afford to let projects lag. Beating, or meeting deadlines, is essential to improving your bottom line. Customers want to know that you can meet their requirements on a timely basis. When you fall behind schedule, it costs more money in terms of resources and pushes other projects behind as well.
In this regard, it is critical to have a business app to help you manage all of your company projects. It can have functions such as anywhere-access, uploading features, financial reporting and real-time updates across multiple teams. You'll never have to spend hours tracking down the latest update again.
Many businesses are already utilizing internal apps to help improve efficiency and convenience. Now is the time to get started on this trend because your competitors are certainly doing so.
Whether you manage a business or just yourself - keeping your life organized can be a challenge. From bills to important documents - from your health to fitness - it can be difficult to remember everything you have to do, much less actually get around to doing it.
Luckily - there’s a ton of technology out there designed to help you with your everyday tasks. According to Mike Saile, managing attorney of Cordisco and Saile, LLC, "A lot of people categorize themselves as 'forgetful.' I've met people who have done everything from forget a bill for so long that they got in trouble, to forget to pick up their pet from the groomer. But, a lot of my friends have made use of smartphone apps to help remind them to do things and keep them organized. This can really be a lifesaver if you're prone to forgetting important things."
If you have a smartphone, many of these services can be found in app form for cheap - or free. Apps can help you do everything from budget for your upcoming bill payments, to remind you to drink the right amount of water per day to stay hydrated.
Budget
Tracking and maintaining a strict and accurate budget can be incredibly difficult for the unorganized person. Luckily, there’s plenty of services out there designed to help you. Mint is an app that’s designed not only to help you keep track of your spending and monetary information all in one place - it helps you to design a budget based on your spending patterns. Once the budget is created, it also notifies you if you go over, or stay under budget - which will help you plan for future budgets.
It’s an easy to use, all-in-one service designed to help keep the average person organized and aware of their finances without extra hassle. It’s easy to check and maintain - and perfect for the person who doesn’t know a good way to budget on their own, or just prefers to have things taken care of for them without too much hassle.
Organize your Files
Business, personal, or anything in between - you have to keep your files organized or you’re asking for disaster to strike. Some people are perfectly happy to design their own intricate filing system on their personal computers and in paper form - but some people don’t have time for that around everything else they need to do. For those people, there’s plenty of organizational apps out there designed to help. One of the most popular is Evernote.
This service is a cloud-based file storage system that can keep all of your important documents organized. Not only can you manually organize things in their system - they also offer a highly advanced search feature, which can find keywords in documents as well as pictures and PDFs - so if you organized something in the wrong place, or just didn’t organize it at all, you can still find what you need without digging forever trying to find it.
Plan your Meals
It can be hard to find time in the day to plan meals with so many other things going on. But, many people who don’t find time to plan their meals find that they end up eating out more than they should - or just generally eating in unhealthy ways. Well, there’s an app to help you plan your meals as well! Designed to fit each person’s unique taste and dietary restrictions, Mealime helps you plan your meals for the week, and puts together a grocery list for you to execute said meals - without much effort on your part at all.
When you first sign up for the app, you simply input your tastes, allergies or dietary restrictions, and how many meals you want planned - and the app does the rest. It gives you the shopping list, the recipes, prep time, and more, to ensure that you can save time while still eating healthy at home.
Find Cheap Gas
Nobody likes paying for gas. And if you live in the city, or near a major highway, you know that you’re paying too much for gas every time you stop at the nearest gas station. Luckily, there’s an app for this as well: GasBuddy. This app allows users to post when they find cheap gas prices, ranking them for the area.
This app also has the addresses of the gas stations and their distance from your location. The mobile app also allows you to pull up a map and gives you directions to the gas station - ensuring that you don’t waste gas getting lost while looking for cheaper gas options.
Keep Hydrated
Let’s face it - everyone forgets to drink enough water. No matter what you do, chances are you forget to drink enough water during the day and end up thirsty and irritated later in the day. Since app developers know that people forget even an easy thing like drinking water, there’s a ton of different apps out there, designed to help you remember. There’s all sorts, from apps that offer games or pets, to apps that have plants in them. But, one of the simplest apps available is called Daily Water.
This simple app does only what it needs to - it reminds you to drink water periodically, keeps track of how much you drink, and logs that data in case you need it later. You can set how much water you would like to drink, when you want to be reminded, and more with this app. Now you can stay hydrated without thinking about it, and focus on other things that need your attention.
Target. eBay. Home Depot. It seems as if every year brings a new story of a major company experiencing a cyber-security breach, exposing their customers and clients’ data to hackers. It’s natural that - in the wake of these breaches - small and large business owners would be concerned about the security of their systems. After all, every business has its share of private information. Whether that’s your customers’ names, or their complete personal information - it’s your responsibility as a business to ensure that this data is secure.
It’s possible that no one understands this better than lawyers, who have to safeguard their clients’ information carefully. According to Laurence B. Green, attorney and co-founder of Berger & Green, “Your customers are your most valued asset. You need to take steps to ensure that their private data is secure. If you don’t and you experience a breach, you will violate your clients’ trust, as well as lose their respect and their business.”
So how do you ensure that this type of breach does not happen to you? Luckily, there are some fairly easy and painless ways to secure your data and keep it secure. Nothing is completely secure, however, so it’s always wise to keep an eye on your data and watch for security breaches. But you can diminish the threat significantly.
Encrypt Your Sensitive Files
If you have files that contain sensitive information about your business or your customers, make sure that you encrypt and password protect those files. There’s a lot of software out there that will help you accomplish this.
Encrypting your files will ensure that - even if your system is breached - the hackers will still have trouble getting to your sensitive information. So take the time to ensure that all of your sensitive data is encrypted.
Update Your Wi-Fi Connection
If you have an old internet connection or router, you may be in more danger of a breach than if you had a newer one. Ensure that your Wi-Fi connection is protected by WPA2. WPA and WEP are both older and more vulnerable to attack.
All three are wireless encryption protocols designed to make it difficult for outsiders to access your internet information and computer. Even if you are using WPA2, make sure you have a strong password protecting your internet. If your business offers free Wi-Fi to patrons, your best bet is to have a completely separate router for public access. That way you can still offer Wi-Fi, and keep your data secure.
Compile Your Sensitive Information in One Place
This may seem counter-intuitive at first glance. Isn’t there a saying about putting all your valuables in one place? But in this case, it can be smarter than trying to keep it spread out. Not only does that mean that you have one place that you know has all your information, it means that the one computer that stores this information can be more securely protected. Make sure the computer is password protected and connected to a secure internet service. This leaves your other computers or devices a little more free to use.
Password Protect Everything
At this point, this is basically common sense. That, and I’ve mentioned it in every point so far. But it’s important. Make sure that every aspect of your internet and files is secure with a strong password. File? Password protected. Computer? Password protected. Wi-Fi? Password protected. The more passwords you have between a potential hacker and your sensitive data, the more secure you will be in the end.
Ensure that your passwords are not all the same. There are plenty of ideas and guidelines out there to help you design a strong password. You can even randomly generate a string of letters, numbers, and symbols to use as a very secure password. Just make sure that if you do that, you write down the password and keep it in a secure location in case you forget it.
Again - even if you follow every piece of advice on this list, there is still a possibility that you will experience a breach. No system is perfect, and as developers create new tools to combat hackers, the hackers develop new ways to beat the systems. So make sure you stay up to date on the latest systems and security details - don’t let your antivirus software expire. Keep your operating systems up to date. Keep up on the latest scams and how they work. Being aware of the danger will help you combat it.
I put this on the ColinPaice z/OS blog ... and thought it worth sharing with a wider audience...
I was working with a customer who had an "MQ problem" which turned out to be too many virtual machines (VMs) running on the box, causing a lack of CPU. They fixed this and "the MQ problem went away".
There is the well-known Maslow's hierarchy of needs, which says you need air before you think about safety, a sense of belonging, etc.
So here is Colin's hierarchy of needs... fix 1) first, then fix 2), etc.
1. CPU - is the image short of CPU?
2. Memory (real and virtual storage) - is there any paging?
3. IO - check the IO response time is good
4. Check network response time is good
5. Check subsystem - eg MQ, DB2 are giving good response time
6. Check applications
If you have fixed a problem - start your checks from the top. For example fixing the IO problem allows much more work to flow - so there may now be a CPU problem.
If you have a performance problem, go through the list to see where the problem is. It may save you time before calling for help, as the support team may assume you have gone through the list.
I showed this list to a colleague who said it is really obvious - but if it is so obvious, why do we have so many problems caused by it?
As I was writing this, I was asked about another 'MQ problem' which turned out to be CPU.
Here are some real examples of problems I have dealt with. Tick the ones you have experienced:
"The MQ performance was so bad - I could not even logon to the machine to display the MQ error log" - this was a lack of CPU in the VM
"It cannot be a CPU problem there are 20 cores on this machine" - yes but the VM is only configured to have one core. Defective End User
This server does not have a CPU problem" - yes - but half the messages are being routed (using MQ clustering) to that server which does have a CPU problem - problem between keyboard and chair.
"On average the CPU is only 50% busy" - yes - that is because you have peak workload where you run out of CPU followed by long periods where nothing happens.
Whoops - I made the MQ buffer pool so big that it caused paging.
Throughput dropped at 8pm each evening - they did backups at 8pm - and the IO response time doubled - so commits took twice as long and transaction rate halved.
MQ distributed performance was poor - someone had reconfigured the connection to the SAN - IO problem
"You are running MQ on that system?- that SAN is due to be replace next month as it is old and overloaded. The reason why that machine was not being used is that it is so old and about to be scrapped - and you are running production MQ on it? " - lack of planning and communications
MQ throughput between MQ on z/OS and Linux died every Saturday. - Backups taken from all distributed machine to z/OS - which swamped the network
MQGETs are slow since we made the messages persistent - messages were out of syncpoiint - so IO for every message.
MQ throughput very low - because the application is doing a remote database insert over the network. The MQGET was very quick - the database update was not..
The official voting period will be between Friday, June 19 and Friday, June 26, 2017, following the 45-day review of the specification. For the convenience of members, voting will actually open a week before this on Friday, June 12 for members who have completed their reviews by then.
If you’re not already an OpenID Foundation member, or if your membership has expired, please consider joining to participate in the approval vote. Information on joining the OpenID Foundation can be found at https://openid.net/foundation/members/registration.
“A valid concern is that dependence on AI may cause us to forfeit human creativity. As Kasparov observes, the chess games on our smartphones are many times more powerful than the supercomputers that defeated him, yet this didn’t cause human chess players to become less capable — the opposite happened. There are now stronger chess players all over the world, and the game is played in a better way.”
“AI-MATHS, a robot designed by a Chinese firm, will sit the math portion of China’s national college entrance examinations, or gaokao, on Wednesday, media reported.”
Summary: We’ve been mapping air quality with Street View cars since 2014. Today, with our partners EDF and Aclima, we're announcing the results of this effort for the City of Oakland and making the data available to scientists and researchers.
As the next entry in the "Success at Apache" series, I had to think about what kind of blog post I wanted to write. Given my personal focus, it made sense to focus on new projects coming in and the incubator. When I'm not busy dreaming up new ideas and working on personal projects, I'm helping new projects get into Apache, keeping their goals in alignment with the Apache Way http://apache.org/foundation/governance/ . I'm a member of a few different PMCs here at Apache, notably the Incubator. I'm a mentor to five different podlings right now. While my primary programming focus is on programming models, my podlings are all over the place. Starting a new project here at Apache can be a daunting task: how do I get in? What if I don't build a diverse community? Becoming a podling has more to do with the community than it does with the technical aspects of the project. We don't expect you to be experts in the Apache Way, but we do expect new projects to be experts in how their own software works. We want to teach you, and we want you to be receptive to learning about The Apache Software Foundation and its best practices.
I'm not sure if everyone does it, but I build a lot of parallels between how an ASF project works and how an Agile team works. Agile teams start off as a bunch of people who don't really know each other but have assembled themselves into an informal team focused on solving a problem, or some number of problems, knowing that they can only do it together. They have common goals and objectives, but lack camaraderie early on to be able to work together smoothly. Over time, they get to know one another, figure out strengths and weaknesses and can resolve issues together. A well-functioning team isn't one at the beginning. It takes time and practice for them to work well - both together and as an outwardly facing unit.
Projects here at Apache follow the same type of maturity progression. Whether it's learning The Apache Way or learning to work with one another, it takes them time to mature and get into a good groove.
Open Communication
The ASF is pretty big on open communication, wherever it's a sensible solution. We want to discuss with each other what we're doing, ideas around how to solve it and come up with a good solution together, as a team, in an open manner.
This all ties into agile practices. We host stand-ups to talk about what we're doing and see if others have an opinion about it.
When a project comes to Apache, the original authors need to remember that they're bringing in a lot of experience, and the expectation is that those existing contributors must help get new contributors from the outside - outside their organization specifically - to contribute to the project. By driving towards open communication, outside of your own organization, you're encouraging more people to participate. This sort of governance model ensures that all parties who can participate are aware of decisions being made.
Open communication isn't for everything, though. We need to remember to be respectful in our communications with others, and if it's felt that something’s awry, speak privately. But remember that those private conversations aren't part of the decision-making process. Likewise, any time we're talking about individuals, in either a positive or negative way, that should be conducted on the project's private list.
Turning Into a Well Oiled Machine
Once a project begins to grow, new people start to be attracted to it. As a community, you have to figure out how to work together. Building a community of diverse ideas and skills will ensure that new ideas keep flowing. Contributors can react quickly to a user's question on list and help them resolve the problem, put in an enhancement request, or get a bug report squashed in a following commit. Time is of the essence: people contribute when they have the availability to work on something.
There can't be a long, drawn-out, waterfall-style process when dealing with Open Source. At the same time, making sure there's a documented decision process, and sometimes an in-depth design, is critical for new and existing contributors alike to come to a shared understanding of what is being proposed.
Sustaining
Projects need to plan for longevity. Longevity comes in many forms. A strong backlog of features is important. Having a diverse set of committers is even more critical. You could even say that each helps create the other. Just like any feature set, we get to a point where the feature is complete enough that we can move on to another feature.
How do you get there? Apache's main path to this point is incubation http://incubator.apache.org/ . You can't get there by yourselves; first-hand experience from existing Foundation members will help your community turn over a new leaf and adopt this way of working. We want you to be successful, as long as your project can dedicate itself to the practices that have been set forth within the Foundation.
If you're on an existing project, we want to hear your perspectives on how the Foundation works. You may want to reach out to dev@community http://community.apache.org/lists.html to let others know your thoughts, or even just subscribe and see what others have to say. We're all working together to make the foundation better. The more input we receive, both positive and negative, the better we can shape everyone's actions in the community.
WikiLeaks is still dumping CIA cyberweapons on the Internet. Its latest dump is something called "Pandemic":
The Pandemic leak does not explain what the CIA's initial infection vector is, but does describe it as a persistent implant.
"As the name suggests, a single computer on a local network with shared drives that is infected with the 'Pandemic' implant will act like a 'Patient Zero' in the spread of a disease," WikiLeaks said in its summary description. "'Pandemic' targets remote users by replacing application code on-the-fly with a Trojaned version if the program is retrieved from the infected machine."
The key to evading detection is its ability to modify or replace requested files in transit, hiding its activity by never touching the original file. The new attack then executes only on the machine requesting the file.
Version 1.1 of Pandemic, according to the CIA's documentation, can target and replace up to 20 different files with a maximum size of 800MB for a single replacement file.
"It will infect remote computers if the user executes programs stored on the pandemic file server," WikiLeaks said. "Although not explicitly stated in the documents, it seems technically feasible that remote computers that provide file shares themselves become new pandemic file servers on the local network to reach new targets."
The CIA describes Pandemic as a tool that runs as kernel shellcode that installs a file system filter driver. The driver is used to replace a file with a payload when a user on the local network accesses the file over SMB.
Major release of the cornerstone of the Big Data ecosystem, from which dozens of Apache Big Data projects and countless industry solutions originate.
Forest Hill, MD —5 June 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today momentum with Apache® Hadoop® v2.8, the latest version of the Open Source software framework for reliable, scalable, distributed computing.
Now ten years old, Apache Hadoop dominates the greater Big Data ecosystem as the flagship project and community amongst the ASF's more than three dozen projects in the category.
"Apache Hadoop 2.8 maintains the project's momentum in its stable release series," said Chris Douglas, Vice President of Apache Hadoop. "Our community of users, operators, testers, and developers continue to evolve the thriving Big Data ecosystem at the ASF. We're committed to sustaining the scalable, reliable, and secure platform our greater Hadoop community has built over the last decade."
Apache Hadoop supports processing and storage of extremely large data sets in a distributed computing environment. The project has been regularly lauded by industry analysts worldwide for driving market transformation. Forrester Research estimates that firms will spend US$800M in Hadoop software and related services in 2017. According to Zion Market Research, the global Hadoop market is expected to reach approximately US$87.14B by 2022, growing at a CAGR of around 50% between 2017 and 2022.
Apache Hadoop 2.8 is the result of two years of extensive collaborative development from the global Apache Hadoop community. With 2,914 commits of new features, improvements, and bug fixes since v2.7, highlights include:
Several important security-related enhancements, including Hadoop UI protection against Cross-Frame Scripting (XFS), an attack that combines malicious JavaScript with an iframe that loads a legitimate page in an effort to steal data from an unsuspecting user, and Hadoop REST API protection against Cross-Site Request Forgery (CSRF), an attack that attempts to force an authenticated user to execute functionality without their knowledge.
Support for Microsoft Azure Data Lake as a source and destination of data. This benefits anyone deploying Hadoop in Microsoft's Azure Cloud. The Azure Data Lake service was actually developed for Hadoop and analytics workloads.
The "S3A" client for working with data stored in Amazon S3 has been radically enhanced for scalability, performance, and security. The performance enhancements were driven by Apache Hive and Apache Spark benchmarks. In Hive TCP-DS benchmarks, Apache Hadoop is currently faster working with columnar data stored in S3 than Amazon EMR's closed-source connector. This shows the benefit of collaborative Open Source development.
Several WebHDFS related enhancements include integrated CSRF prevention filter in WebHDFS, support OAuth2 in WebHDFS, disallow/allow snapshots via WebHDFS, and more.
Integration with other applications has been improved with a separate HDFS client JAR, distinct from the hadoop-hdfs JAR that contains all the server-side code. Downstream projects that access HDFS can depend on the hadoop-hdfs-client module to reduce the number of transitive classpath dependencies.
YARN NodeManager resource reconfiguration through the RM Admin CLI for a live cluster, allowing YARN clusters to have a more flexible resource model, especially for Cloud deployments.
In addition to physical Hadoop clusters, where the majority of storage and computation lies, Apache Hadoop is very popular within Cloud infrastructures. Contributions from Apache Hadoop's diverse community include improvements provided by Cloud infrastructure vendors and large Hadoop-in-Cloud users. These improvements, Azure and S3 storage support and YARN reconfiguration in particular, improve Hadoop's deployment on and integration with Cloud infrastructures. The improvements in Hadoop 2.8 enable Cloud-deployed clusters to be more dynamic in sizing, adapting to demand by scaling up and down.
"My colleagues and I are happy that tests of Apache Hive and Hadoop 2.8 show that we are able to provide a similar experience reading data in from S3 as Amazon EMR, with its closed-source fork/rewrite of S3," said Steve Loughran, member of the Apache Hadoop Project Management Committee.
Hailed as a "Swiss army knife of the 21st century" by the Media Guardian Innovation Awards and "the most important software you’ve never heard of…helped enable both Big Data and Cloud computing" by author Thomas Friedman, Apache Hadoop is used by an array of companies such as Alibaba, Amazon Web Services, AOL, Apple, eBay, Facebook, foursquare, IBM, HP, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, SAP, Tencent, Teradata, Tesla Motors, Uber, and Twitter. Yahoo, an early pioneer, hosts the world's largest known Hadoop production environment to date, spanning more than 38,000 nodes.
Catch Apache Hadoop in action at DataWorks Summit 13-15 June 2017 in San Jose, CA.
Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
CEE All Stars connected 40 of Central and Eastern Europe's most promising startups together with 25 venture capital firms, making it the first - and the biggest - fundraising event of its kind in the region.
This two-day live online training covers JShell, a tool Paul Deitel calls “one of Java’s most significant new learning, discovery, and developer-productivity-enhancement features since its inception 20+ years ago.”
Paul Deitel, CEO of Deitel & Associates, Inc., co-author of Java® 9 for Programmers, and an Oracle Java Champion, will lead a live online introduction to JShell for interactive Java® on June 19-20.
Is there such a thing as pitching your video game to journalists too early or too late? A real-life video game and technology journalist answers these questions and more.
Last year we awarded 13,500 scholarships to European Android developers to take a course with Udacity. Four all-star scholars were invited to Google I/O 2017.
“Assurance leaders at PwC envision a future where, with 100 percent of data available for analysis, as opposed to select samples, auditors will be able to study a business in its entirety. It’s only a matter of time before this technology is available in the accounting industry below the enterprise level.”
“Sailing alone or short-handed? Use the patented TillerClutch to “lash the helm” with the simple click of a lever. It’s conveniently mounted under the tiller with the lever right in your steering hand.”
49d to #TSFM2017. #NationalTrailsDay yesterday: 10.4mi & 2129' of mist cooled trails at SFRC, day after #NPSF hills. Rest day today.
Saw deer, rabbits, and a partridge. Ran the last mile on pavement to make it back in time for the Good Earth breakfast bar. Everything felt fine, if just a bit slower, the day after tantek.com/2017/153/t1/sunrise-fog-summits-npsf.
Happy National Trails Day!
Photos: 1. Old Springs trail start and deer! 2. Old Springs trail view to a fog covered Pacific 3. Wolf Ridge Trail into the mist 4. Coastal Trail single track descent: “Steep grade / Stay on trail” 5. Coastal Fire Road ascent, @micheleperras in sight! 6. Tennessee Valley view to Tennessee Valley Beach 7. Peak: Coyote Ridge 2 Reference Mark (Previously: tantek.com/t4jj3) 8. Fork left to Marin Drive, into a twisty maze of roads 9. @Strava run path satellite hybrid view 10. Golden Gate Bridge southbound back to foggy SF
The Wealth Paradox: Economic Prosperity and the Hardening of Attitudes [Goodreads] is the right book at the right time. Short, succinct, and with hard data to prove their central thesis, The Wealth Paradox is worthy of a thoughtful read by policy makers, political operatives, academics, and in these troubled times, the general public.
The last few years have seen what Mols and Jetten declare in their preface to be a "perfect storm" in both Western liberal democracies and other countries that pretend to the democratic mantle. A combination of deep economic recessions and global crises has seen 21 million people earn the legal title of refugee and an estimated 65 million people forcibly displaced from their homes. A bit of political turmoil was bound to occur.
Readers might immediately think of Donald Trump's populist rise in the United States in the frantic few months following the British vote for Brexit. Mols was one of very few political scientists to foresee the election of Donald Trump to the presidency of the United States. His rationale forms the central thesis of The Wealth Paradox: The rise of far-right parties and political movements are not simply attributable to the poor and dispossessed but also to middle class voters with some modest degree of wealth to protect.
There are others. News watchers could not have missed the Turkish leader Recep Tayyip Erdoğan's grab for dictatorial powers after transitioning from prime minister to president of that country. In the Philippines, strongman Rodrigo Duterte grabbed the presidency with his promises to murder drug dealers, street children, and, purely as a form of collateral damage, political opponents. Russia's Vladimir Putin, like Erdoğan a former prime minister of his country and now president, went several times better by being prime minister, then president, then prime minister, and now president again. One must give him points for consistency.
All of these leaders were democratically elected. Something to notice is how close these decisions have been. Trump became the second Republican president in a row to lose the popular vote on his way to the White House. Putin won his first presidential bid in 2000 with 53% of the vote. Erdoğan won his presidential bid in 2014 with 51.79%. Duterte won with a minority 39.1%. The referendum deciding that Britain should leave the European Union was passed with 51.89% voting to leave. In all of these cases and many more, a populist platform was adopted with nearly half of the electorates voting for the opposite.
Invoking [Godwin's Law], it seems an excellent time to recall that at the time Adolph Hitler was appointed Chancellor of Germany, he was head of a political party that had garnered a third of the seats in the German parliament by democratic means.
Why should people elect leaders who so often pursue unarguably unpopular policies, or who hold unpopular ideas? Mols and Jetten argue that enough middle class voters, those with above average incomes, do so in order to protect their own narrow interests. It is this point, and the data behind it, that makes The Wealth Paradox worth reading.
Recent votes in the Netherlands and France rejecting populist parties have left little time to celebrate. The combination of Byzantine political systems and continued strong showings by populist parties clearly show that history is not over. We may yet see a spread of their simplistic mixture of xenophobia and protectionism.
The authors of The Wealth Paradox are not, of course, the first scholars to note the connection between the middle class and populism, nor the odd (to the settled mind) desire to rip and replace an imperfect system with a new one.
The British historian George Dangerfield, writing in the 1930s about the pre-World War I actions of the Tory party then in opposition, made Mols' and Jetten's case for them. Dangerfield's crisis resulted in the partition of Ireland and the mutiny of a portion of the British Army:
The Tory Rebellion was not merely a brutal attack upon an enfeebled opponent - that is to say, political; it was not merely the impassioned defence of impossible privileges - that is to say, economic; it was also, and more profoundly, the unconscious rejection of an established security. For nearly a century men had discovered in the cautious phrase, in the respectable gesture, in the considered display of reasonable emotions, a haven against those irrational storms which threatened to sweep through them. And gradually the haven lost its charms; worse still, it lost its peace. Its waters, no longer unruffled by the wind, ceased to reflect, with complacent ease, the settled skies, the untangled stars of accepted behaviour and sensible conviction; and men, with a defiance they could not hope to understand, began to put forth upon little excursions into the vast, the dark, the driven seas beyond. (George Dangerfield. The Strange Death of Liberal England. Stanford University Press, 1997, pp. 122-3.)
Dangerfield could have been writing about today's political challenges. We find ourselves coming off of an unprecedented post-war period of established security that, when buffeted by the "perfect storm", resulted in rejection. It is little wonder that his book became the archetypal modern history.
Worrying, too, is the lesson learned by unrepentant socialist Christopher Hitchens. Visiting his literary superhero Jorge Luis Borges in his unhappy home in Buenos Aires, Hitchens read at Borges' request Rudyard Kipling's "Harp Song of the Dane Women" whose opening verse:
What is a woman that you forsake her And the hearth fire and the home acre To go with that old grey widow-maker?
so beautifully gets to the beating heart of the human male's yearning for adventure, and the acceptance of the accompanying risk. Hitchens was dismayed that his idol "heartily preferred" the "gentlemen" of the brutal and populist regime of Juan Perón, a regime that abused both his family and Borges himself. Borges, for all his stunning illumination of human foibles, himself fell in his old age into a sort of populist Stockholm Syndrome.
Herodotus noted millennia ago how to react to those protective of their wealth. "Great wealth can make a man no happier than moderate means, unless he has the luck to continue in prosperity to the end... Now if a man thus favoured died as he has lived, he will be just the one you are looking for: the only sort of person who deserves to be called happy. But mark this: until he is dead, keep the word 'happy' in reserve. Till then, he is not happy, but only lucky." Those voting for populist leaders should carefully note the warning. Pursuit of short term interests must be carefully weighed with longer term consequences.
No, The Wealth Paradox is not entirely new. It is up to date, well researched, and particularly timely.
The 191 pages of main matter make The Wealth Paradox a respectable size for an audience uncomfortable with lengthy prose. Forget War and Peace: One sometimes wonders how many years will pass before the last undergraduate slogs to the end of Kafka’s The Metamorphosis at 55 pages, or the 64 pages of Robert Louis Stevenson’s The Strange Case of Dr. Jekyll and Mr. Hyde. No time have we in these days of Internet-connected pocket supercomputers for the massive 4,736 pages of Winston Churchill's The Second World War. Even our academics must adjust to doling out words short enough to absorb during a commute or a visit to the toilet. But perhaps I simply suffer from last century's skills. As Kurt Vonnegut so ironically juxtaposed his writing with Abraham Lincoln's Gettysburg Address in his geriatric romp A Man Without a Country, "I am windy".
On a private Facebook thread, where a friend expressed dismay at Trump leaving the Paris Climate Accords, a smug skeptic dropped in a dataviz that argued against the effects of combatting climate change.
This struck a nerve in me. Data visualizations are a tool, and can be used for good or evil. Because they are visual, they can have a more immediate and visceral impact than mere statements, in part because of the Picture Superiority Effect. So, I responded, in a thread that quickly grew too long. My new opponent, who I cleverly detected as having a libertarian bent by glancing at their last couple public Facebook posts, used many of the classic tropes of Internet troll arguments, though stopping well short of the Hitler-Nazi threshold. They used blunted accusations of logical fallacies, while engaging in fallacies themselves.
I probably shouldn’t have engaged, but I felt compelled: they were WRONG on the INTERNET! They used the facile quasi-reason that is such an insidious force on public dialog, and they misused a dataviz. I could not, as the kids say, even.
Much to my dismay, having put in so much work on taking down this skeptic, I realized that this conversation was on a private thread. How could I rest when people don’t know just how damned clever I am? (It’s one of my things… I have to be right, all the time, every time. I’m working on it.)
I have some ethical concerns about posting content from a private thread, but I don’t want to misrepresent my opponent (they did so well on that themself), so I’ll anonymize the author (as Anonymous Libertarian Troll, or ALT), and only include the necessary posts. (Copyright concerns? Good point! I’ll cite two legal defenses: 1. Fair Use. 2. Don’t talk crap on social media with strangers.)
It’s not my best work, since I was just dashing it off, and I was a little harsher than I usually am… but for your morbid pleasure, here’s the exchange:
Some Sensible Stranger: It was a step in the right direction!
ALT: No, it wasn’t. It was a waste of the money we will need to help people cope with a changing climate.
ALT: The practical reality is that the goals set by the Paris Climate Conference are akin to the Kyoto goals: small enough not to bankrupt the entire world, and small enough to not actually do a darned thing to reduce CO2 emissions.
Put it this way: the US, using the free market, has done more to reduce CO2 emissions than any agreement or accord or treaty we’ve signed or not signed.
Me: There are a number of problems with that dataviz, not least of which is that it’s unsourced. Where did those projections come from? What are the margins of error, and what are the variables? Who made the projection? What is their goal in making the dataviz? What is the impact of their 0.17ºC change?
Even assuming that the dataviz is correct, which is a leap of faith, you assert that it’s wasted money, and that it wasn’t a step in the right direction (e.g. that it doesn’t lay the groundwork for future, stricter agreements). What’s your basis for those assertions? Where specifically could the money be spent more effectively, and are you assuming that those resources (as expressed financially) are indeed fungible and have the same sources of control?
In short, this seems to be mere opinion, of a similar vein to the trendy nihilist distraction that paralyzes rather than effects change.
ALT: Unsourced? Or are you simply wishing it were unsourced?
Me: I stand corrected. What I took for the URL for the picture itself (the bit.ly URL in the bottom-right corner) was in fact the source. (I’d argue that’s bad provenance hygiene, since it’s a fragile link with no supporting text or search terms, but at least there is a source quoted).
In fact, the source is Bjørn Lomborg, who’s well known as a climate change skeptic. He’s clever because he doesn’t deny that climate change is happening, nor that it’s caused by human activity, but only argues that it’s too little and too late to do anything about it.
The article cited in the dataviz is “Impact of Current Climate Proposals”, and it feeds his line that we should spend the money elsewhere, such as AIDS, malaria, and malnutrition; again, the assumption is that the necessary resources are fungible.
His assumptions and projections in that article are refuted (Ward), as are most of his arguments, by the majority of climate scientists.
His methodology is convenient for libertarians and for corporations, governments, and organizations that wish to halt or stall efforts to rein in climate change and apply “free market” solutions, but is counter to the overwhelming consensus.
ALT: You’re actually correct. Because of climate hysteria, people are willing to spend trillions of dollars to suppress plant food, but without that unreasoning hysteria, people are unwilling to spend money on Clean Water for Everyone (for example).
Maybe we should stop with the hysteria? The IPCC predicts a 2 degree C warming, but we’ve already seen 1.4 with no apparent harm to the world. The IPCC says that the harm from climate change will be less than many other harms felt by humans, including war, poverty, pestilence, drought, and famine.
I’m not a climate scientist. I have to rely on what the IPCC says. How ’bout you? Where did you get your degree in Climate Science from?
Ward says “Neither of these scenarios corresponds to expected policies beyond 2030.” If that’s not hand-waving, I’m a monkey’s uncle. The fact is that CO2 emissions have been growing. You cannot assume that a subsequent agreement will be signed, nor can you assume that, counter to all evidence, people will voluntarily impoverish themselves by ceasing to use fossil fuels.
May I add that impugning the source of information has little or no value in a discussion of the value of the information. Such a reference is ad-hominem. Similarly, imputing veracity to a document merely because the source is well-known is an appeal to authority, similarly a fallacy. Can we possibly not resort *immediately* to fallacies? It would be better to wait until after you’ve failed to convince me, which you surely will do. I’ve looked at the evidence, and you don’t have any that will convince me.
Me: The strawman of your first paragraph is simple sarcasm, and meaningless. You really don’t seem to get the fungibility issue. The money “spent” or “saved” on climate action is not the same resource that would be “spent” or “saved” by completely different actions toward eradicating disease or providing abundant food or water. Of course, climate change is making it increasingly hard to get clean water around the world, but you live in the US, so it doesn’t affect you yet, so I’ll move on.
“No apparent harm to the world”? Nonsense. Educate yourself about the effects of climate change. Information is abundant.
Yes, I used an ad hominem argument. I did so deliberately, and with good reason. The source of the information you cited, Lomborg, is subject to the same scrutiny that anyone in the scientific community is; that’s how science works. Lomborg has been refuted again and again, so his arguments should carry less weight than those by scientists who have better reputations.
Your classic blunder is in assuming that an ad hominem argument is by its nature a fallacy. Specifically:
“However, in some cases, ad hominem attacks can be non-fallacious; i.e., if the attack on the character of the person is directly tackling the argument itself. For example, if the truth of the argument relies on the truthfulness of the person making the argument—rather than known facts—then pointing out that the person has previously lied is not a fallacious argument.”
Nor did I cite any particular document or author as “well-known”; I just posted a specific refutation of Lomborg’s paper that you cited. I wasn’t appealing to authority for a particular author (well-known or not), but rather showing an example of the kinds of criticism leveled against Lomborg by the climate science community at large. So, I wasn’t using an appeal to authority, I was reinforcing How Science Works, which is by repeatability and consensus opinion.
BUT! If I had been appealing to authority, citing some specific and well-known climate change expert, that too would not have been a fallacy. You keep conflating “types of arguments” with “types of fallacies”. Many formal arguments have both valid and invalid forms. Saying “X is a climate scientist; X says Y about climate science; therefore Y is more likely to be true” is a valid form of appealing to authority; in fact, you used it yourself when image-bombing that dataviz into this thread, and had your source been more reliable, it would have been a persuasive, strong, and valid argument by authority. An invalid appeal to authority might be, “X is a medical doctor; X says Y about climate science; therefore Y is more likely to be true”, which neither of us made.
(As an aside, I wonder if there’s a fallacy for mistakenly accusing others of fallacies? That seems rampant these days. Maybe “argument by fallacious fallacy”?)
A good example of a fallacy that doesn’t have a valid argument form is cherry picking, which is “the act of pointing to individual cases or data that seem to confirm a particular position, while ignoring a significant portion of related cases or data that may contradict that position”. For example, selecting a single author who holds a controversial and widely-refuted position, out of a field of many researchers on that same topic who hold contrary views, in order to make or reinforce a claim or viewpoint, would be cherry picking. It would be like if you picked the one well-known climate change skeptic that validates some libertarian “free market” philosophical solution to addressing climate change, and ignoring the 95+% of climate change scientists who agree that climate change is real, that it’s caused by humans, and that we can take active steps to decrease its negative effects. But neither you nor I would cherry-pick like that, because it would harm our credibility, and we know that since neither of us is a climate scientist, we should trust the scientific consensus by climate scientists rather than applying our own inadequate filter to select a fringe view. We’re both rational people who realize that the stakes are too high to indulge our default reactions and worldviews on topics we’re not qualified to advise others on. That would be the appeal to the common man, a logical fallacy akin to the appeal to authority, which holds that everyone’s opinion is equally valid in all matters… and we all know that that’s just silly.
Of course I realize that I won’t and can’t convince you. That’s not my goal. It’s clear from the style and substance of your arguments that your confirmation bias has a firm grip on your ability to process information. My goal is to present clear refutations of your claims, so that such toxic ideas have a decreased chance of taking root in other people who might otherwise think your rational-sounding position is based in anything pragmatic.
It’s probably too late for you. I don’t think it’s too late for the planet.
I’m satisfied that I’ve accomplished my goal, so I’ll leave you the last words in this discussion.
Surprisingly, they haven’t posted since. I’ll keep you updated if they do.
If approved, these will be the second Implementer’s Drafts of the first three profiles and the first Implementer’s Drafts of the last two – the Fast Healthcare Interoperability Resources (FHIR) profiles.
An Implementer’s Draft is a stable version of a specification providing intellectual property protections to implementers of the specification. This note starts the 45-day public review period for the specification drafts in accordance with the OpenID Foundation IPR policies and procedures. Unless issues are identified during the review that the working group believes must be addressed by revising the drafts, this review period will be followed by a seven-day voting period during which OpenID Foundation members will vote on whether to approve these drafts as OpenID Implementer’s Drafts. For the convenience of members, voting will actually begin a week before the start of the official voting period.
The relevant dates are:
Implementer’s Drafts public review period: 2017-06-03 to 2017-07-18 (45 days)
You can send feedback on the specifications in a way that enables the working group to act upon it by (1) signing the contribution agreement at http://openid.net/intellectual-property/ to join the working group (please specify that you are joining the “HEART” working group on your contribution agreement), (2) joining the working group mailing list at http://lists.openid.net/mailman/listinfo/openid-specs-heart, and (3) sending your feedback to the list.
— Michael B. Jones – OpenID Foundation Board Secretary
🌇 #sunrise vs #fog on 4 summits before 6:40, then #NPSF #hillsforbreakfast. 1552' in 4.7mi. 51d til #TSFM2017, my longest race so far @THESFMARATHON.
Photos from / by / of: 1. Twin Peaks Summit 2. Twin Peaks Summit (North/Eureka Peak) 3. Twin Peaks Summit (South/Noe Peak) 4. Mount Davidson (view from vista point) 5. 📷 by https://instagram.com/lauramcgreen 6. My attempt to honor @ryanscura of dooster.tv with a dramatic backlit shot of him running counter-balanced by rainbow tinted sunbeams 7. Similar shot of friends @jerseyblauvs & instagram.com/therealboops 8. @Strava run elevation graph for the morning 9. Mount Davidson view after running @Nov_Project_SF Hills
Running uphill is still the hardest for me to run, though with sustained effort and practice I am slowly getting better at it. I climbed over 2000' just two days ago, the combined total from double-PR Wednesday^1 @Nov_Project_SF and evening run/hikes to finish out the Strava Mt. Everest Running Climbing Challenge for the month of May^2. Thanks to a restday yesterday and enough sleep last night, my legs felt fine (not sore) on the hills this morning.
I know that “proper” training for a marathon requires increasing distance, both in long runs, and total weekly miles. Since uphill is my weakpoint, I’m focusing my training on that, and especially on trails since they are more forgiving (thus more body sustainable) than running on the road. Rather than total miles, I’m focusing on time spent at high-effort running. That feels more right for my body, and if my PR Wednesday improvement over the past two months is any indication, the right thing for me to do too.
Now that I have Saturdays again, I will be running SFRC every Saturday morning that I’m in SF, doing my long runs on trails as well. I’ll eventually do at least one longish run (maybe 16-18 miles?) on road just to get used to doing so in the shoes I plan to wear for the race. Also planning to do at least a few track workouts to mix-in some speedwork.
For now I’m happy with the progress I’ve made, both in the past few months and the past few years running with @Nov_Project_SF, and grateful for the increased body awareness from yoga teacher training^3 that has really helped me consciously keep track of how my body feels while running, and adjust breathing, posture, pace, stride, etc. accordingly.
As always, grateful to running/fitness/yoga friends, teachers, and communities (@Nov_Project, @missioncliffs, @YogaFlowSF) whose support has been essential to reinforcing regular sustainable physical practices.
On behalf of the Jython development team, I'm pleased to announce that the second release candidate of Jython 2.7.1 is available! This is a bugfix release. Bug fixes include improvements in ssl and pip support.
Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.
This release is being hosted at maven central. There are three main distributions. In order of popularity:
Most likely, you want the traditional installer. NOTE: the installer automatically installs pip and setuptools (unless you uncheck that option), but you must unset JYTHON_HOME if you have it set. See the installation instructions for using the installer.
To see all of the files available including checksums, go to the maven query for org.python+Jython and navigate to the appropriate distribution and version.
There's lots of video of squid as undersea predators. This is one of the few instances of squid as prey (from a deep submersible in the Pacific):
"We saw brittle stars capturing a squid from the water column while it was swimming. I didn't know that was possible. And then there was a tussle among the brittle stars to see who got to have the squid," says France.
As usual, you can also use this squid post to talk about the security stories in the news that I haven't covered.
So CloudCamp in London on the 6th July has the tagline “Serverless and the death of devops”. OMG folks can you please stop doing this? I really like the CloudCamp folks – it’s a fun event and a really vibrant community, but the whole “death of devops” thing really grinds my gears. I blame Simon Wardley. 😉
Seriously – one of the most interesting aspects of the whole serverless thing, certainly as curated for example by the good people of Serverlessconf, is that DevOps is very much part of the serverless wave. Charity Majors, Honeycomb.io cofounder, and Patrick Debois (of DevOpsDays fame) are stalwarts of the serverless community precisely because they do such a great job explaining how serverless doesn’t remove the need for ops. Ops will always be a key underpinning of every wave of technology. Automation doesn’t replace ops, it augments it. Abstraction doesn’t replace ops, it hides it. Function as a service doesn’t remove complexity, it increases it exponentially.
I appreciate the power of a good headline. I used to be a journalist. I am the guy that recently wrote how AWS is demolishing the cult of youth. But the whole Death of DevOps thing is getting a little old. Frankly in tech nothing ever dies. My excellent colleague Fintan wrote a great post in March about how serverless is redefining DevOps.
“One of the more interesting immediate evolutions, alongside the volume compute aspects we have previously noted, is in the use of serverless approaches by DevOps practitioners for dealing with tasks that were previously dealt with by having a dedicated virtual machine, or more recently containers, and a set of scripts in place. The kind of tasks that we are hearing about range from simple log processing, to tagging and identifying development instances for utilization monitoring to far more complex streams of activities which are processed and pushed into other cloud based services.”
So much this. Serverless is an excellent adjunct to automation and event-driven ops. Hackers gonna hack. Any good ops person using AWS, Azure, Bluemix or GCP is going to use their respective serverless implementations to drive better manageability of apps.
So anyway – I was lucky enough to chat with Tim Wagner, AWS GM of Serverless recently, and as far as I am concerned he pretty much put the matter to bed. He said simply:
“DevOps is our gateway drug.”
Customers start with something really simple like using Lambda to run cron jobs. Cron jobs are one of the least glamorous but most useful single function jobs in IT. Lambda makes great glue and bailing wire for managing all your AWS services, or whatever cloud you’re running on, or even inter cloud event-based calls.
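To make that concrete, here is a hypothetical sketch of the kind of scheduled Lambda “cron job” an ops team might run: a Node.js handler, assumed to be triggered by a CloudWatch Events schedule rule, that flags development instances nobody has claimed. The scenario and tag names are illustrative only, not taken from Tim Wagner or the RedMonk post.

// Hypothetical Node.js Lambda handler, triggered on a schedule (the serverless
// equivalent of a cron job). It flags running EC2 instances with no "owner" tag.
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2();

exports.handler = async () => {
  // Find running instances that nobody has claimed with an "owner" tag.
  const result = await ec2.describeInstances({
    Filters: [{ Name: 'instance-state-name', Values: ['running'] }]
  }).promise();

  const untagged = [];
  for (const reservation of result.Reservations) {
    for (const instance of reservation.Instances) {
      const tags = instance.Tags || [];
      if (!tags.some(tag => tag.Key === 'owner')) {
        untagged.push(instance.InstanceId);
      }
    }
  }

  // Flag them for follow-up; a real team might notify a chat channel
  // or stop the instances instead.
  if (untagged.length > 0) {
    await ec2.createTags({
      Resources: untagged,
      Tags: [{ Key: 'needs-owner', Value: 'true' }]
    }).promise();
  }

  return { reservations: result.Reservations.length, flagged: untagged.length };
};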
DevOps is the ultimate reactive, or event-driven, tech use case. It’s not going anywhere.
For those of you who are not aware, #ID24 (Inclusive Design 24) returns for another 24 hours of live accessibility talks on 9 June starting at midnight UTC/GMT (here is a handy countdown clock). We here at The Paciello Group are really proud of the schedule we’ve put together and we’re looking forward to hosting a fantastic group of presenters.
I thought I’d take this opportunity to share some suggestions on how to maximize the value of #ID24 and get the most out of the 24 hours.
Spread the word!
What good is a party if no one shows up? We need your help spreading the word about #ID24 around the world. Please share the #ID24 programme with all your contacts, using the #ID24 hashtag where appropriate. And don’t forget our URL: InclusiveDesign24.org
Turn us on
Work in an office? Have a board room? Well then… we have a YouTube playlist for #ID24, why not throw the event up in a board room and leave it playing for the day?
Lunch and Learn
There may be no such thing as a “free” lunch, but there can be a free lunch and learn! Why not organize a viewing party for the lunch talks in your time zone and get together, learn, and discuss as a team. While you’re at it, why not pick up the tab for a few pizzas and treat your team? They deserve it!
Keep the party going
#ID24 runs for 24 hours so no matter what timezone you’re in, chances are pretty good you’ll be able to catch us after work hours. Why not organize an event, or viewing party, or whatever? Accessibility should be fun!
These are just a few suggestions; I’d love to hear what you come up with. Share below and let us know how things went!
No matter how you choose to celebrate #ID24, we hope you have a great day and we appreciate you taking the time to support it.
The Obama administration had placed the moratorium on permits because the fees collected hadn’t been adjusted in decades, and the American taxpayers were not getting good value for their money. Zinke ostensibly gives a nod to this concern by chartering a committee to look into the fees but sees no reason not to give out permits in the meantime.
The Justice Department has asked the courts to hold two EPA court cases, including the case related to the Clean Power Plan, in abeyance. The request is a result of Trump’s recent Executive Order seeking to weaken or undermine EPA rules and regulations. The DOJ is asking for an abeyance until 30 days after the EPA determines whether to change or revoke the rules.
One of the first actions Trump took when he entered office was to issue an Executive Order (EO) instituting a travel ban that generated confusion and caused havoc at border entry points across the country. Dozens of lawsuits were filed against the ban, and its implementation was quickly halted in federal court.
In response, Trump issued another EO in March, which, while not as destructive or as blatantly discriminatory as the first, still generated concerns about its Constitutionality.
The problem: We want to display a large number of search results (from our DAM system) on a Web page. Gathering the results on the server and transferring them to the browser takes a while. To improve the user experience and show the first results as soon as possible, we want to “stream” the results. Each item needs to be rendered as soon as the browser receives it. A simple Ajax call waits until the server has returned everything, so we’ll have to do something a little more advanced.
In our old UI, we used Oboe.js for streaming, but I like the new approach much better because it requires little code, thanks to the magic of Vue.js and EventSource (the Web browser’s built-in SSE support, not available in IE and Microsoft Edge) – and because it’s very lightweight, requiring nothing but a simple script include (no npm, no build toolchain).
Server-sent events use a simple text protocol. Here’s what my server sends; pretty self-explanatory (note the text/event-stream content type):
$ curl -v http://example.com/stream.php
> GET /stream.php HTTP/1.1
>
< HTTP/1.1 200 OK
< Content-Type: text/event-stream; charset=UTF-8
<
event:header
data:{"total_items":20,"msg":"Hello from server"}
event:item
data:{"cnt":1,"text":"[1] Hello from server"}
event:item
data:{"cnt":2,"text":"[2] Hello from server"}
[…]
event:item
data:{"cnt":20,"text":"[20] Hello from server"}
event:close
data:[]
This format is transparently handled by the EventSource object. We just need it to feed data into a Vue.js model. Vue.js updates the HTML automatically when the model changes. (Check out the Vue.js documentation.)
It’s important to know that EventSource is designed for permanent connections so it keeps requesting the same URL in an endless loop when the server closes the connection. The evtSource.close() call prevents that, stopping after the server has signalled that all results have been delivered.
It’s all in a single HTML file, stream.html:
<html>
<head>
  <meta charset="UTF-8"/>
</head>
<body>
  <!-- This div is rendered by Vue.js: -->
  <div id="app">
    <form v-on:submit.prevent="run">
      <input v-model="msg" type="text" placeholder="Enter message"/>
      <button v-on:click="run" type="button">{{ buttonLabel }}</button>
    </form>
    <p>{{ items.length }} of {{ total_items }} times “{{ actual_msg }}”:</p>
    <ul>
      <my-item v-for="item in items" :key="item.cnt" v-bind:item="item">
      </my-item>
    </ul>
  </div>
  <!-- Include Vue.js: -->
  <script src="https://unpkg.com/vue/dist/vue.min.js"></script>
  <!-- Our own JavaScript code: -->
  <script type="application/javascript">
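    // NOTE: the script body from the original post is not reproduced above, so
    // what follows is only a rough sketch of the wiring described earlier: an
    // EventSource feeding a Vue.js model. The stream URL ("stream.php"), the
    // query parameter, the initial data values, and the <my-item> template are
    // assumptions, not the author's code.
    Vue.component('my-item', {
      props: ['item'],
      template: '<li>{{ item.text }}</li>'
    });

    new Vue({
      el: '#app',
      data: {
        msg: 'Hello from server',
        actual_msg: '',
        buttonLabel: 'Start',
        total_items: 0,
        items: []
      },
      methods: {
        run: function () {
          var self = this;
          self.items = [];
          // Open the SSE connection; each named event (header, item, close)
          // gets its own listener.
          var evtSource = new EventSource('stream.php?msg=' + encodeURIComponent(self.msg));
          evtSource.addEventListener('header', function (e) {
            var header = JSON.parse(e.data);
            self.total_items = header.total_items;
            self.actual_msg = header.msg;
          });
          evtSource.addEventListener('item', function (e) {
            // Pushing onto the reactive array renders each item as it arrives.
            self.items.push(JSON.parse(e.data));
          });
          evtSource.addEventListener('close', function () {
            // Stop here, otherwise EventSource would reconnect in a loop.
            evtSource.close();
          });
        }
      }
    });
  </script>
</body>
</html>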
Yes, may Film Roundup bring you blessings throughout the year! No Twin Peaks spoilers, please.
Antitrust (2001): Semi-hate-watch with Sumana. (Here's her review.) She saw this on a plane in 2001 and had ever since wanted to revisit it to make fun of the bad tech. But... turns out the tech isn't all that bad. Pretty accurate for the most part. The software development processes we see aren't great, but they're in line with what I saw in the 90s. The biggest technical flub (an impractical plan to spy on people as they write code) is, IMO, just a way of dramatizing GPL violations.
Which is not to say that this is a good movie. It's bad. "Killer App" (1995), a television pilot, does a better job of just about everything that's not directly related to free software, e.g. the Bill Gates character's house-of-the-future. Not recommended unless you gotta see an old version of GNOME on the big screen.
Bahubali 2: The Conclusion (2017): First, read my review of part 1. Now, read it again, because the sequel is another 3-hour battering of the senses with spectacle. There were some moments where the plot got more complicated than your average blockbuster, but also moments where they passed up what I considered obvious opportunities for coolness. I was never bored, not at all, but where the first movie frequently went in directions I wasn't anticipating, this movie... didn't do that.
It's hard to stay unpredictable when half your movie is a direct prequel to a movie your audience has already seen. Sumana and I agree that the Bahubali series needs a mode where you can just watch the sub-films in chronological order. Currently it's as if you had to watch the Hobbit movies halfway through the Lord of the Rings movies, with Elijah Wood playing both Bilbo and Frodo.
Cops and Robbers (1973): Really good crime-and-grime heist movie with just the right mix of NYC and Long Island. Nothing serious, but classic popcorn. Not a science-fiction film, but does a great job incorporating Apollo stock footage into the plot.
There's a feeling in twentieth-century crime movies that's made explicit in Cops and Robbers. Society expects people to stay in their lane, crime-wise. You expect a cop to take bribes, a store manager to embezzle, a stockbroker to commit securities fraud. What causes a problem/creates a movie plot is when you pull off a crime that someone like you isn't supposed to do. There's this great scene in Cops and Robbers where the two cops realize that someone else has casually piggybacked a much more successful caper on top of theirs, and they react with the same New York "whadayagonna do" attitude they exhibit when stuck in a traffic jam. Exploited by the ruling class again! Wah wah.
I saw a series of films by Charles and Ray Eames which ranged from the somnolent (House: After 5 Years of Living) to the hypnotic (Tops) to the suspiciously sexy (S-73 Sofa Compact). Powers of Ten is always a treat.
Guardians of the Galaxy Vol. 2 (2017): This movie succeeds where Interstellar fails: it turns Solaris into a summer blockbuster. I had a good time! Better than the first one (which I also liked), though significantly more violent. Still not sure why these movies bother to have villains. Isn't it enough that the heroes hate each other?
D.O.A. (1950): An unbeatable first scene is, in fact, not beaten by anything else in this kinda dull noir. Nice dramatic structure though. The most interesting bit is how it dramatizes the white-collar nightmare that something you did at your boring desk job, something you don't even remember, has made an enemy of someone you don't know.
Thief (1981): Another criminal-goes-out-of-his-lane heist film. Not as punchy or as... subtle?... as Cops and Robbers, but fun enough. Dave Thomas of Wendy's has a masterful turn as the evil spirit of capitalism. Wait, I'm being informed that that role is actually played by Robert Prosky of Gremlins 2 fame.
Big bombastic Tangerine Dream soundtrack in this movie. I didn't even know that was a thing. I thought Tangerine Dream just made music to put you to sleep.
Don't think I didn't notice the Rififi reference. I see all!
Bottle Rocket (1996): Right from his first feature, the ups and downs of a Wes Anderson film are visible. Funny first sequence, emotionally effective final sequence, and in the middle lots of well-framed shots I don't care much about. Watch Fantastic Mr Fox instead.
IMDB trivia: "Originally, Owen Wilson had no plans to act in the film at all." In fact, this is true of every actor and every film.
ASF Board – management and oversight of the business and affairs of the corporation in accordance with the Foundation's bylaws. - Next Board Meeting: 21 June 2017. Board calendar and minutes http://apache.org/foundation/board/calendar.html
ASF Infrastructure – our distributed team on four continents keeps the ASF's infrastructure running around the clock. - 7M+ weekly checks yield grand performance at 99.92% uptime http://status.apache.org/
Apache Calcite™ Avatica – a framework for building database drivers. - Apache Calcite Avatica 1.10.0 released https://calcite.apache.org/avatica/
Apache Fineract™ – an Open Source system for core banking as a platform to offer financial services to the world's 2B under-banked and unbanked. - Apache Fineract 1.0.0 released http://fineract.apache.org/
Apache Flink™ – an Open Source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. - Apache Flink 1.3.0 released https://flink.apache.org/
Apache Jackrabbit™ – a fully compliant implementation of the Content Repository for Java(TM) Technology API, version 2.0 (JCR 2.0) as specified in the Java Specification Request 283 (JSR 283). - Apache Jackrabbit 2.14.1 and Jackrabbit Oak 1.7.0 released http://jackrabbit.apache.org/
Apache Tephra (incubating) – a transaction engine for distributed data stores like Apache HBase. - Apache Tephra-0.12.0-incubating released http://tephra.incubator.apache.org/
Apache Tika™ – a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. - Apache Tika 1.15 released http://tika.apache.org/
Did You Know?
- Did you know that the CFP for MesosCon North America closes on June 3rd? Submit your proposals to http://bit.ly/2rbVJhY
- Did you know that ARGO Labs uses Apache Airflow (incubating) towards automating Extract, Transform, Load (ETL) workflows for California water data from The California Data Collaborative? http://incubator.apache.org/projects/airflow.html
- Did you know Dutch multinational banking company Rabobank uses Apache Kafka for real-time financial alerts? http://kafka.apache.org/
- Join members of the Apache Apex, Beam, Flink, Hadoop, Kafka, Lucene, Solr, and Spark communities at Berlin Buzzwords 11-13 June in Berlin https://berlinbuzzwords.de/17/
- Meet members of Apache's Cloud community at Cloud Foundry Summit Silicon Valley 13-15 June in Santa Clara; enjoy 20% off registration rates using discount code CFSV17ASF20 https://goo.gl/Uq3g0t
- Catch the Apache Ignite and Spark communities at the In-Memory Computing Summit 20-21 June in Amsterdam and 24-25 October in San Francisco https://imcsummit.org/
- Find out how you can participate with Apache community/projects/activities -- opportunities open with Apache HTTP Server, Avro, ComDev (community development), Directory, Incubator, OODT, POI, Polygene, Syncope, Tika, Trafodion, and more! https://helpwanted.apache.org/
For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.
There is plenty of blame to go around for the WannaCry ransomware that spread throughout the Internet earlier this month, disrupting work at hospitals, factories, businesses, and universities. First, there are the writers of the malicious software, which blocks victims' access to their computers until they pay a fee. Then there are the users who didn't install the Windows security patch that would have prevented an attack. A small portion of the blame falls on Microsoft, which wrote the insecure code in the first place. One could certainly condemn the Shadow Brokers, a group of hackers with links to Russia who stole and published the National Security Agency attack tools that included the exploit code used in the ransomware. But before all of this, there was the NSA, which found the vulnerability years ago and decided to exploit it rather than disclose it.
All software contains bugs or errors in the code. Some of these bugs have security implications, granting an attacker unauthorized access to or control of a computer. These vulnerabilities are rampant in the software we all use. A piece of software as large and complex as Microsoft Windows will contain hundreds of them, maybe more. These vulnerabilities have obvious criminal uses that can be neutralized if patched. Modern software is patched all the time -- either on a fixed schedule, such as once a month with Microsoft, or whenever required, as with the Chrome browser.
When the US government discovers a vulnerability in a piece of software, however, it decides between two competing equities. It can keep it secret and use it offensively, to gather foreign intelligence, help execute search warrants, or deliver malware. Or it can alert the software vendor and see that the vulnerability is patched, protecting the country -- and, for that matter, the world -- from similar attacks by foreign governments and cybercriminals. It's an either-or choice. As former US Assistant Attorney General Jack Goldsmith has said, "Every offensive weapon is a (potential) chink in our defense -- and vice versa."
This is all well-trod ground, and in 2010 the US government put in place an interagency Vulnerabilities Equities Process (VEP) to help balance the trade-off. The details are largely secret, but a 2014 blog post by then President Barack Obama's cybersecurity coordinator, Michael Daniel, laid out the criteria that the government uses to decide when to keep a software flaw undisclosed. The post's contents were unsurprising, listing questions such as "How much is the vulnerable system used in the core Internet infrastructure, in other critical infrastructure systems, in the US economy, and/or in national security systems?" and "Does the vulnerability, if left unpatched, impose significant risk?" They were balanced by questions like "How badly do we need the intelligence we think we can get from exploiting the vulnerability?" Elsewhere, Daniel has noted that the US government discloses to vendors the "overwhelming majority" of the vulnerabilities that it discovers -- 91 percent, according to NSA Director Michael S. Rogers.
The particular vulnerability in WannaCry is code-named EternalBlue, and it was discovered by the US government -- most likely the NSA -- sometime before 2014. The Washington Post reported both how useful the bug was for attack and how much the NSA worried about it being used by others. It was a reasonable concern: many of our national security and critical infrastructure systems contain the vulnerable software, which imposed significant risk if left unpatched. And yet it was left unpatched.
There's a lot we don't know about the VEP. The Washington Post says that the NSA used EternalBlue "for more than five years," which implies that it was discovered after the 2010 process was put in place. It's not clear if all vulnerabilities are given such consideration, or if bugs are periodically reviewed to determine if they should be disclosed. That said, any VEP that allows something as dangerous as EternalBlue -- or the Cisco vulnerabilities that the Shadow Brokers leaked last August -- to remain unpatched for years isn't serving national security very well. As a former NSA employee said, the quality of intelligence that could be gathered was "unreal." But so was the potential damage. The NSA must avoid hoarding vulnerabilities.
Perhaps the NSA thought that no one else would discover EternalBlue. That's another one of Daniel's criteria: "How likely is it that someone else will discover the vulnerability?" This is often referred to as NOBUS, short for "nobody but us." Can the NSA discover vulnerabilities that no one else will? Or are vulnerabilities discovered by one intelligence agency likely to be discovered by another, or by cybercriminals?
In the past few months, the tech community has acquired some data about this question. In one study, two colleagues from Harvard and I examined over 4,300 disclosed vulnerabilities in common software and concluded that 15 to 20 percent of them are rediscovered within a year. Separately, researchers at the Rand Corporation looked at a different and much smaller data set and concluded that fewer than six percent of vulnerabilities are rediscovered within a year. The questions the two papers ask are slightly different and the results are not directly comparable (we'll both be discussing these results in more detail at the Black Hat Conference in July), but clearly, more research is needed.
People inside the NSA are quick to discount these studies, saying that the data don't reflect their reality. They claim that there are entire classes of vulnerabilities the NSA uses that are not known in the research world, making rediscovery less likely. This may be true, but the evidence we have from the Shadow Brokers is that the vulnerabilities that the NSA keeps secret aren't consistently different from those that researchers discover. And given the alarming ease with which both the NSA and CIA are having their attack tools stolen, rediscovery isn't limited to independent security research.
But even if it is difficult to make definitive statements about vulnerability rediscovery, it is clear that vulnerabilities are plentiful. Any vulnerabilities that are discovered and used for offense should only remain secret for as short a time as possible. I have proposed six months, with the right to appeal for another six months in exceptional circumstances. The United States should satisfy its offensive requirements through a steady stream of newly discovered vulnerabilities that, when fixed, also improve the country's defense.
The VEP needs to be reformed and strengthened as well. A report from last year by Ari Schwartz and Rob Knake, who both previously worked on cybersecurity policy at the White House National Security Council, makes some good suggestions on how to further formalize the process, increase its transparency and oversight, and ensure periodic review of the vulnerabilities that are kept secret and used for offense. This is the least we can do. A bill recently introduced in both the Senate and the House calls for this and more.
In the case of EternalBlue, the VEP did have some positive effects. When the NSA realized that the Shadow Brokers had stolen the tool, it alerted Microsoft, which released a patch in March. This prevented a true disaster when the Shadow Brokers exposed the vulnerability on the Internet. It was only unpatched systems that were susceptible to WannaCry a month later, including versions of Windows so old that Microsoft normally didn't support them. Although the NSA must take its share of the responsibility, no matter how good the VEP is, or how many vulnerabilities the NSA reports and the vendors fix, security won't improve unless users download and install patches, and organizations take responsibility for keeping their software and systems up to date. That is one of the important lessons to be learned from WannaCry.
This document is Part 2 of a set of documents that specifies a Financial API. It provides a profile of OAuth suitable for write access to financial data, also known as transactional access. To achieve this, this part of the document specifies controls against attacks such as authorization request tampering, authorization response tampering (including code injection and state injection), and token request phishing.
An Implementer’s Draft is a stable version of a specification providing intellectual property protections to implementers of the specification. This note starts the 45-day public review period for the specification drafts in accordance with the OpenID Foundation IPR policies and procedures. Unless issues are identified during the review that the working group believes must be addressed by revising the drafts, this review period will be followed by a seven-day voting period during which OpenID Foundation members will vote on whether to approve these drafts as OpenID Implementer’s Drafts. For the convenience of members, voting will actually begin a week before the start of the official voting period.
The relevant dates are:
Implementer’s Draft public review period: 2017-06-01 to 2017-07-16 (45 days)
Implementer’s Draft vote announcement: 2017-07-03
Implementer’s Draft voting period: 2017-07-10 to 2017-07-24 (7 days)*
* Note: Pre-voting before the start of the formal voting will be allowed.
Comments are to be submitted to the FAPI working group issue tracker; note that you must sign an IPR Contribution Agreement for the FAPI working group in order to file issues. See http://openid.net/intellectual-property/ for more details.
— Michael B. Jones – OpenID Foundation Board Secretary
Trying to psychologically cope with all the inequality in the world can be overwhelming. One way to deal with feelings of powerlessness is by doing something. You definitely can’t fix all the things but you can contribute to fixing one thing. So let me introduce you to Isaiah Lynn.
He’s pretty awesome. He studies at University College London, has an internship at JP Morgan this summer, and just got accepted into a course that will see him spend the third year of his undergraduate studies at Harvard and MIT. That’s the good news. The bad news – the funding he expected to get for the program from a charity just fell through, so he’s crowd-funding to raise the £64k (around $82k) he needs to pay for tuition. He recently hit the £10k mark, so there is plenty still to do, but things are looking good.
I first met Isaiah at Thingmonk, our IoT conference. He was one of our Diversity Scholars, in a program sponsored by the Eclipse Foundation, which involved crispy new business cards from Moo, free tickets and mentoring from great people like Thomas Otter of SuccessFactors SAP. We wanted to do more than just give out some free tickets, but to build a program of engagement before, during and after the event. Christie Fidura laid down the framework.
The scholarship intake at Thingmonk 2016 was stellar – bright, enthusiastic and personable. So many talented people joined us from underrepresented groups in tech. In our industry, diversity is sometimes shorthand for gender diversity, but Thingmonk really opened my eyes. Having a significant number of black people at your event makes it way better.
One thing that really struck me was that many of the young people who attended were already high achievers: internships at JP Morgan, fast-track management programs at Colgate-Palmolive, “clinical entrepreneurs” at the NHS. What could we do for them, when they were going to crush it anyway?
It turns out that one of the simplest things we did was include them, and inspire them. I talked to Isaiah earlier today and he said that attending ThingMonk had led him to radically reconsider the options open to him. Why shouldn’t an anthropology student, raised in Stratford, become a tech VC?
“Sometimes we are told that we cannot achieve an aspiration or that our aspirations are too high. We are told that due to our ethnicity, gender, sexual orientation, class, or disability, our dream is unachievable. No. What is important, as far as I am concerned, is not how our background can limit us, but rather how our background can propel us to our destination. Coming from a humble income, lone parent family, growing up in inner-city London could have stopped me reaching some of my aspirations. But, it has not. Instead, it has motivated me to explore just how far I can go with limited resources. I have come this far, and I still have a way to go.”
So why do I care? I care about creating opportunities for people because it’s who I am. I was raised that way. But I am not going to lie – my wife is black, and having mixed-race kids does make the lack of diversity in tech a bit more personal. I want my sons and daughter to be as welcome in tech as I have been, if they choose that route.
So back to Isaiah and crowd-funded education. One thing I learned in running our successful campaign to open the Village Hall was that people don’t fund things if they don’t get something back. It’s very hard to just say hey, please fund my education. But Isaiah is the talent, the pipeline, the future of business success. You don’t want to support him because he needs the money, but because he’s going to bring value to your business in future. So I thought why not reach out to VCs I know and see if they’d be willing to contribute, on the basis that he could join as a research assistant after graduation. Isaiah has already reached out to the Harvard Ventures group, and he’s all in. At UCL he’s been involved in their VC program too, and has already helped fund 4 companies with support from VCs including Accel Partners and Balderton Capital. Why not have a chat with him?
Obviously, if you happen to have the funds to contribute to his campaign, please do so – his crowd-funding page is here; please read it. But I figured going after some VCs I know was one way I might be able to help. Bottom line – there are not enough black people in tech, and definitely not enough in the VC world. We need to redress the balance. Isaiah Lynn is super smart and he will work himself to the bone to succeed. You don’t get a place at Harvard without doing the work, especially if you’re a black man from Stratford.
Safari Technology Preview Release 31 is now available for download for macOS Sierra. If you already have Safari Technology Preview installed, you can update from the Mac App Store’s Updates tab. This release covers WebKit revisions 216643-217207.
Web API
Added media and type attribute support for <link rel="preload"> (r217247)
Added support for DOMMatrix and DOMMatrixReadOnly (r216959); a short usage sketch follows this list
Fixed getElementById to return the correct element when a matching element is removed during beforeload event (r216978)
Fixed skipping <slot> children when collecting content for innerText (r216966)
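For anyone curious about the newly supported geometry interfaces mentioned above, here is a brief sketch of my own (not from the release notes, and illustrative only) of how the standard DOMMatrix and DOMMatrixReadOnly interfaces behave:

// Construct a matrix from a CSS transform list, then compose it fluently:
const m = new DOMMatrix("translate(10px, 20px)").scale(2);
console.log(m.is2D);       // true
console.log(m.a, m.d);     // 2 2   (scale components)
console.log(m.e, m.f);     // 10 20 (translation components)
// DOMMatrixReadOnly exposes the same data immutably:
const r = new DOMMatrixReadOnly([2, 0, 0, 2, 10, 20]);
console.log(r.toString()); // "matrix(2, 0, 0, 2, 10, 20)"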
JavaScript
Fixed a syntax error thrown when declaring a top level for-loop iteration variable the same as a function parameter (r217200)
Layout & Rendering
Added support for transform-box to switch sizing box in SVG (r217236)
Fixed clientX and clientY on mouse events to be relative to the layout viewport (r216824)
Fixed large animated images getting decoded as a large static image before receiving all of the data (r217262)
Fixed screen flickering caused by asynchronous image decoding for large images when interacting with the page (r216901)
Fixed element position when dragging jQuery Draggable elements with position:fixed after pinch zoom (r216803)
Fixed a timing issue causing a hardware-accelerated transform animation to misplace an element 50% of the time (r217075)
Prevented loading the active recording until a Timeline view needs to be shown (r217379)
Media
Added support for painting MSE video-element to canvas (r217185)
Fixed captions and subtitles not showing up in picture-in-picture for MSE content (r216951)
Fixed media element reporting hidden when in picture-in-picture mode and tab is backgrounded (r217223)
Web Driver
Fixed characters produced with the shift modifier on a QWERTY keyboard to be delivered as shift-down, char-down, char-up, and shift-up events (r217244)
Fixed navigator.webdriver to return false if the page is not controlled by automation (r217391)
WebCrypto
Replaced CryptoOperationData with BufferSource (r216992)
Security
Improved error message for Access-Control-Allow-Origin violations due to a misconfigured server (r217069)
Not many remember it, because the technology industry tends to focus on its future at the expense of its past, but in the beginning software was free – in both senses of the word: it was available at no cost, and the source typically came without restrictions. One of the earliest user groups, SHARE, founded in 1955, maintained a library of users’ patches, fixes and additions to IBM mainframe source code, like a proto-GitHub. The modifications SHARE maintained were extensive enough, in fact, that in 1959 SHARE released its own operating system – what we would today refer to as a distribution, the SHARE Operating System (SOS) – for IBM 709 hardware.
IBM made the software available at no cost and in source code form because for the company at that time, the software was not the product; the hardware was. It wasn’t until June 1969 that IBM announced, possibly in response to an anti-trust suit filed by the United States Justice Department in January of that year, that it would “unbundle” its software and hardware. Theoretically this would open the hardware platform to market competition, but the side effect was that IBM began charging for the software it had once shared for free. The side effect of this side effect, arguably, was the creation of the software business as we know it today.
In a bit of historical irony, the company that created the software industry missed by far its most lucrative market opportunity, the opportunity that Microsoft perceived. This was the idea that software not only had value, but due to hardware commoditization, the potential for far more value than the hardware it ran on. As has been documented here and elsewhere, Microsoft rode software to unprecedented heights, and in doing so ushered in a new model for software, one that was philosophically at the opposite end of the spectrum from its origins.
From its beginnings as a freely available and no cost good, the new standard was to protect source code by distributing it only in binary form and for profit. The latter model was dominant for years, but the old ways were never forgotten, precisely. While SHARE’s ability to produce its own flavor of IBM’s operating system may have been eliminated, the distributed development model that made it possible eventually evolved and inspired what today we simply call open source.
Where the proprietary model was once ascendant over something that looked a lot like open source, in the present day it is open source not proprietary that is the default – in the subset that is infrastructure software, at least.
Which is an outcome celebrated by millions in the technology industry, because more open source software means more capability and less friction for developers. But for all that the return of open source is welcomed, the industry is to some extent still grappling with one very basic question: where does the money come from?
In the early days, remember, software was subsidized by hardware. Sales of the latter paid for development of the former. In the model that followed, software paid for itself: software authors created a product, and that product was marketed and sold on a standalone basis. This was possible in large part because software vendors were no longer competing with free: by not making the source available, they ensured a role as the single point of access. This was and is the model that generates tens of billions of dollars annually for Microsoft from two software assets, Office and Windows, better known as the two most successful software products in history.
With proprietary software pressured and giving ground to open source competition, however, the process for selling software has become more challenging. It is possible, of course, to monetize open source software directly. A variety of mechanisms have been tried, from dual licensing to support and service to open core. It is inefficient and significantly less profitable than selling proprietary software was, however. Even the best in the industry depended heavily on volume to make up for the difficulty of converting users of free software into paid customers. MySQL, for example, reportedly was at its peak able to convert one in a thousand users to a paid product. Combine that with generally lower margins (though Pivotal might disagree) due to increased competition from other open source projects, and it’s not difficult to understand why it’s harder for commercial organizations to extract revenue relative to proprietary competitors. Red Hat, then, is the exception that proves the rule.
But if it’s true, as Cloudera’s Mike Olson has said, “you can’t build a successful stand-alone company purely on open source,” why open source the software at all? For those that do, he has an answer, “You can no longer win with a closed-source platform.”
You can’t win without open source, in other words, but you can’t win with it either. How is software going to be produced then? What kind of model balances the costs and benefits of both the open and proprietary models in such a fashion that buyer, developer and vendor are all satisfied?
From a market perspective, the clear answer is open source software delivered as a service. Instead of attempting to sell open source software as a standalone entity, you couple it with a platform and sell the two together. Consider a few of the advantages:
It can be difficult to compel developers or buyers to pay for software they can obtain for free. It is understood, by comparison, that that same software hosted will not be free.
Gone are misalignments of customer and vendor needs. The paradox of being tasked with delivering a quality product while deriving in some cases 100% of revenue from support of it, the tension between continual delivery and withholding features to incent upgrades, all are eliminated in service businesses. In a SaaS business, you’re simply paid to run the software and deliver ever better versions of it.
Developers or the customers they work for can still download and run a given open source project on their own if they choose, but increasingly are forced to address questions as to whether they’ll be able to operate it as well in house as the people who built it.
Not that the case for services is based strictly on theory. The available market evidence suggests that open source offered in a service context is increasingly the primary choice for growth. Businesses are acquiring OSS-aaS (e.g. IBM with Cloudant and Compose) or developing it internally (e.g. MongoDB and Atlas), and buyers are consuming it at an accelerating rate. PostgreSQL owes much of its recent resurgence to adoption by AWS, Heroku and lately, Google, while the MySQL-compatible but proprietary Aurora product is the fastest growing service in Amazon’s history.
As neatly as service-based open source offerings close loopholes and simplify customer conversion, however, they represent a model that can be less appetizing for startups, particularly small ones. Running a SaaS business is both more expensive and more challenging to manage than merely writing and selling software. For many commercial open source organizations, a hosted service is for reasons real or imagined not currently viewed as an option.
Which brings us back to licenses.
Commercial open source organizations today face significant challenges. On the one hand, open source itself is no longer viewed as an impediment to adoption; if anything, the opposite tends to be true. But it is difficult for such organizations to turn users into customers, as it has always been, and the competitive landscape is significantly more crowded than it has been in years past. Rampant fragmentation has led to significantly more choice arriving more quickly, and new models such as the previously discussed cloud offerings are options that didn’t exist a decade ago.
It should perhaps come as no surprise then that we’re seeing organizations look to new licensing mechanisms as a means of addressing questions of revenue.
While the history of software revenue models is rich in its variety, today most software is accurately classified as belonging to one of two categories: open source or proprietary. Free software advocates would likely take issue with being included in the open source bucket, and indeed that superset includes a multitude of approaches – many of which are radically distinct from one another. For our purposes here, it’s enough to recognize that most software is released under either a proprietary license or an open source license that meets the terms of the OSI’s Open Source Definition.
If Cockroach Labs, MariaDB or Sourcegraph have their way, however, we may have to add a third category to the list.
This category is not new; it borrows heavily from shareware (and crippleware, trialware, etc) models that preceded it, and similar efforts have been floated periodically. The model is clearly not a traditional proprietary approach, because the source code is made available. Neither can it be considered open source, however, because it does not meet the terms of the OSI’s definition of that term. The highest profile examples in this category are Sourcegraph’s Fair Source License (released March 2016), MariaDB’s Business Source License (August 2016) and Cockroach’s Community License (January 2017).
While these three licenses differ in their implementation details, they are all essentially trying to address the same need: balancing the availability of code that characterizes open source with the need to monetize the software to continue its development. Which is, of course, a company’s right. Those who write the software get to decide on the terms under which they make it available.
The question is not whether they have the right to use these neither open nor closed licenses, but rather if it’s a good strategic decision. While licenses are a matter of faith to many open source communities, and embody a given set of values, from an analyst’s perspective they are just tools. And different jobs may require different tools.
It’s not clear, however, that hybrid licenses – a term I’m using here as a placeholder until the market decides how to refer to them – are a worthwhile approach. There are several basic issues.
Legal Departments: Through a monumental effort by thousands of organizations worldwide over a period of better than a decade, most legal departments that deal with software have acquired at a minimum an understanding of and trust in the Open Source Initiative’s list of approved licenses. The most important consequence of which, typically, is that licenses on this list may be considered; everything else is explicitly disallowed. Hybrid licenses then face an uphill battle with legal departments worldwide.
Developer Antipathy: As they do, notably, with developer populations. The steady erosion of proprietary software businesses by open source competition speaks to the central role OSS plays in a developer’s life today. Attempts to subvert or compete with open source, then, tend to be met with skepticism in a best case scenario and hostility at worst. Which means that the single most important constituency when it comes to adopting new technology is likely predisposed to being anti-hybrid licenses.
Audits and Trust: In many cases, hybrid licenses explicitly refer to the honor system. Rather than building limitations into the software, they rely on companies to be honest in their self-reporting. But while enterprises tend to be scrupulous about paying for the software they use, there’s a reason that many vendors include audit teams. Software has a way of finding its way into places the people responsible for paying for it are unaware of. Advocates of these licenses tend to argue that this isn’t a problem because they won’t perform audits, but a) that can change and b) companies can get acquired by organizations with differing attitudes towards audits.
General Confusion: Licenses, even those that have become well understood, are friction. Developers increasingly are responding to this friction by simply not selecting any license. As of March 2015, fewer than 20% of the repositories on GitHub carried a license. In this type of climate, new models that impose new responsibilities – and which are generally incompatible with existing licenses and/or package repositories – are unlikely to see widespread adoption.
Limited Contributions: The importance of this can vary by project, but hybrid licenses are likely to significantly depress the volume of incoming contributions for projects governed by these terms. The historical trend at present is towards more permissive rather than more restrictive licenses, and developers and enterprises alike tend to be reluctant to contribute back to projects that contain significant licensing limitations. This is one of the most significant issues for projects such as MySQL that leverage a dual license model, as one example. Many of the most significant potential contributions to the core project cannot be merged because the patch authors refuse to assign copyright, prefer a more liberal license or both.
Ultimately, the question of whether hybrid licenses are necessary likely comes down to how they’re evaluated relative to their closest competition. Pure play open source is enormously popular with developers but difficult to commercialize, and open core has emerged as the default model. Open core essentially describes a product in which a proprietary layer with extra features or capabilities is sold on top of a fully functional open source foundation.
At issue, then, is which of these two options is preferable:
Option A: Having most of a project be available under an OSI approved license with certain features and the source code they’re based on held back as a binary with a proprietary license.
Option B: The entirety of a code base is made available and shared under a license that is not OSI approved and comes with more restrictions.
The bet here is that the costs of hybrid licenses and the restrictions they impose will outweigh the lack of full source access in open core projects, and that as a result the former will see limited adoption even amongst commercial open source organizations that are looking for every revenue opportunity they can find. As clever as hybrid licenses might appear as a solution, they come with a significant overhead for buyers and sellers alike, one insufficiently offset by the fact that more source code is visible.
Even if this proves false and hybrid licenses take hold counter to expectations, it’s likely that the future of the commercialization of open source has less to do with license models than it does with delivery methods. Commercial open source organizations looking to maximize revenue would therefore do well to think about whether improving their conversion rate at the margins is likely to generate a higher return than a business model built on the assumption that virtually every user is a paying user.
Licenses are always a tactic, but they’re rarely a strategy.
Disclosure: IBM, MariaDB, MongoDB, Pivotal and Red Hat are RedMonk customers. Cloudera, Cockroach Labs and Sourcegraph are not currently RedMonk customers.
The password manager 1Password has just implemented a travel mode that tries to protect users while crossing borders. It doesn't make much sense. To enable it, you have to create a list of passwords you feel safe traveling with, and then you can turn on the mode that only gives you access to those passwords. But since you can turn it off at will, a border official can just demand you do so. Better would be some sort of time lock where you are unable to turn it off at the border.
There are a bunch of tricks you can use to ensure that you are unable to decrypt your devices, even if someone demands that you do. Back in 2009, I described such a scheme, and mentioned some other tricks the year before. Here's more. They work with any password manager, including my own Password Safe.
There's a problem, though. Everything you do along these lines is problematic, because 1) you don't want to ever lie to a customs official, and 2) any steps you take to make your data inaccessible are themselves suspicious. Your best defense is not to have anything incriminating on your computer or in the various social media accounts you use. (This advice was given to Australian citizens by their Department of Immigration and Border Protection, specifically to Muslim pilgrims returning from hajj. Bizarrely, an Australian MP complained when Muslims repeated that advice.)
The EFF has a comprehensive guide to both the tech and policy of securing your electronics for border crossings.
I've just stuck my head around the door to offer refreshments to the gang of 13-year olds currently camping out around my son's wargame table.
"Remember that time the doorbell rang and there was nobody there...?"
"Um," I say, yes remembering but also, cringing. Oh shit. I'd forgotten that.
When that doorbell rang, I was in the middle of writing a fight scene for The Wreck of the Marissa. It's first person, and the main character is a foul-mouthed former mercenary NCO I vaguely imagine played by Daniel Craig.
You ever seen that experiment where you get somebody who swears they never dream and you wake them up from REM sleep?
And they say something like, "But the giant chickens are genuflecting"?
So when that doorbell went I swore. Then, juggling the paragraph-in-progress in my head - it's a bit like holding your breath - I got up from my desk and stomped through to the hall.
Nobody bloody there!
However, somebody, standing around the corner.. breathing.
I had a sudden sense of my vulnerability, that if this kind of pranking continued I'd not be able to settle to work outside school hours...
And then "Lucky" Jim Brandistock took over and growled, "IF THIS HAPPENS AGAIN I'LL NAIL YOUR FUCKING HEAD TO THE FUCKING WALL."
"...and then," continues Kurtzhau's friend, speaking faster, also remembering, "you said that you'd nail somebody's head to the wall? Well, I was there, but it was actually Other Kid..."
"That's OK," I say hurriedly.
"When he gets interrupted," cuts in Kurtzhau, "Dad sometimes gets grumpy and overreacts."
"Something like that," I say. "It's more to do with being um caught up in the story." I wince. "A writer thing. Now, who wants a cup of tea?"
Bad Dad.
Are writers bad parents? Is parenting bad for writers?
Last year, an Irish novelist caused a minor storm by stating that successful novelists are bad parents.
He may have been guilty of a sweeping statement, but I don't think he was entirely wrongheaded, because he was really talking about being self-employed. Time and energy are finite. Following your dreams can make you grumpy and defensive of your working time. There's bound to be an impact or a payoff.
I'd like to claim there are compensations to having a writer dad.
For example, I weave a good ad hoc story.
I kept Kurtzhau entertained with Roman yarns for years. Based in Cologne, Centurion Tertius and his scholar friend Marcellinus fought Germans, Skeletons, Dragons, Dinosaurs, Aliens, and - in what turned out to be a finale - time-travelling Nazis after the hammer of Thor.
For Morgenstern, my daughter, there was Adventure Girl and her sidekick Pillow Panda (who turns into a pillow when she senses danger). They rescued the Sword of Fate from the Evil Ninjas, and once hired Lizard Mercenaries to storm Pluto and retrieve the Crown of Destiny. The ancient alien they discovered still inhabits a Himalayan lake, enjoying the company of paddling pandas.
But writers aren't the only creative parents, and playing bard does not necessarily equate to good parenting. Perhaps I would have been as well prompting the kids to tell their own stories? And isn't it just as important to get them into sport, or woodcraft or origami or whatever?
The one thing I can solidly lay claim to is that being home-based and self-employed over the last few years has meant that I'm here for the kids after school, in the holidays, and - like today - when they are off school sick.
So, writing doesn't automatically make me a bad parent, and I'm a good(ish) parent to the extent of being both primary child carer and homemaker.
What's the price of that OK-I-guess parenting?
What you'd expect.
According to Kurtzhau, in Traveller terms I am stuck at Entertainer #1:
"Jeez Dad! Stop screwing up those advancement rolls!"
I estimate that each child has cost me about two years of productivity.
When they are young, kids eat the discretionary time when you might have squeezed in some writing, or else they destroy your sleep so you are useless when that time materialises. Because you chose to have them, they also create a moral obligation not to just walk out of the toxic job and live off lentils while you focus on your writing.
In this, having children is worse than having cats. At least Charlie can post pictures of a feline hanging onto his wrist as he types! Nobody wants to see the kind of disasters that used to get between me and my fictioneering. (We liked to rate nappies according to the Apocalypse Scale, as in from "One Horseman" through to the "Full St John".)
However, I swear the children have also made me into a better writer.
It's a cliche, but children help us see with fresh eyes. Through them I've rediscovered dinosaurs, and David Attenborough, and Science and Space and World War Two. I don't think I'd have sat through the new Cosmos without kids to share it with, or read up on the Polish 1939 Campaign if it hadn't caught Kurtzhau's imagination.
I've devoured books in order to keep up with my kids and that's affected my fiction. Shieldwall owes a lot to Kurtzhau asking how the Roman Empire fell. And my current Space Opera efforts owe much to Morgenstern's enthusiasm for stars and galaxies.
So in having kids, I've traded productivity for breadth and depth.
Was it worth it? That's not a meaningful question. I think most modern parents who "decide" to have kids are really obeying a primal imperative (one that people often don't really believe in if they don't feel it). If kids subsequently turn up, then it's no more a "lifestyle choice" than, say, being gay, and not really much more amenable to a cost-benefit analysis.
Can I still be a "successful" writer? Will the breadth and depth truly compensate for the loss of years? I've no idea. Ask me later in the year... I can hear the dice rattling right now.
What about you?
If you have kids, has your profession shaped your parenting? Has your parenting shaped your career?
If you don't have kids, how about your parents and their professions?
Remember that time, when we used to blithely use Docker and microservices interchangeably?
Docker has proved to be excellent at simplifying dev and test workflows, while Kubernetes has emerged as a strong alternative center of gravity for deployment of container-based apps. But it’s still early days in terms of production deployment of container-based apps, especially microservices-based ones.
What’s a service mesh? It’s the layer that focuses on solving all of those problems: routing, rerouting for graceful degradation as services fail, and secure inter-service communications – abstracting the networking and messaging in microservices-based deployments.
You never want to be the only person that turns up at a party. If there is only one of something, it’s pretty hard for people to understand it as a market category.
With that in mind, the launch last week of istio is notable because it joins linkerd, another service mesh platform, previously accepted as a project by the Cloud Native Computing Foundation. And then there were two. Or three, counting the Docker routing mesh.
Istio is described as “an open platform to connect, manage, and secure microservices. Istio provides an easy way to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, without requiring any changes in service code. You add Istio support to services by deploying a special sidecar proxy throughout your environment that intercepts all network communication between microservices, configured and managed using Istio’s control plane functionality”
Backers of the project are shall we say rather notable – IBM, Google and Lyft. At this point, unlike linkerd, istio is only based on production lessons from a single company – Lyft. IBM and Google are coming at the problem from a vendor platform perspective.
Istio reflects IBM’s increasing investment in the Kubernetes ecosystem. By working with Google, it is making a statement about regaining some industry engineering leadership.
As IBM’s Jason McGee said:
“IBM needs an engineering opinion [on building microservices], and we worked with partners that show that.”
That said – one interesting tension likely to emerge is that while the initial integration between Kubernetes and Istio is very crisp, there are plans in future to support serverless, Cloud Foundry and even bare metal deployments. Pivotal certainly expects to see Cloud Foundry support in the near future. Dormain Drewitz’s post on istio is really good, breaking it down into 3 key benefits – security (TLS for service-to-service authentication), intelligent traffic management (a proxy, deployed as a sidecar to the relevant service), and visibility (monitoring and tracing for troubleshooting and debugging).
So what about linkerd? It is led by a single vendor, Buoyant, which is building a classic VC-funded commercial open source business around the technology, based on lessons the founders learned while working at Twitter. The project’s key design principle is simple – that in a microservices world, failures are more often found in the interactions between services. This should not surprise us – it’s pretty much distributed computing 101. But the complexity of the microservices implementations at Web-scale natives is staggering enough to require a new generation of dedicated tooling. Microservices is hard – as the originator of the term Martin Fowler makes clear in this post, it is not for everybody, because of the overheads created in building and maintaining applications composed of independent services with strong module boundaries.
The chart of the Twitter topology above is one of my favourite visualisations of that fact.
Linkerd is described as “a transparent proxy that adds service discovery, routing, failure handling, and visibility to modern software applications.”
“By providing a consistent, uniform layer of instrumentation and control across services, linkerd frees service owners to choose whichever language is most appropriate for their service… Linkerd takes care of the difficult, error-prone parts of cross-service communication—including latency-aware load balancing, connection pooling, TLS, instrumentation, and request-level routing.”
So a service mesh is in effect a domain-specific router for microservices. It sits above TCP/IP and assumes Layer 3 and 4 networking. Not surprisingly, William Morgan does a bang-up job of explaining why you need one. Not surprising because Morgan is cofounder of Buoyant, which wants to win paying customers building synchronous, transaction-based systems composed of microservices with requirements for minimal latency – likely to be fintech and telco initially.
Istio seems more agnostic about supporting asynchronous microservices patterns, which could become a point of distinction between the two platforms going forward.
Service discovery and routing are two of the microservices questions that have yet to be comprehensively answered by either Docker Swarm or Kubernetes. In a distributed world, services are by definition and design going to fail – so how do you route around them when they do? Linkerd is already used in production at Ticketmaster and the UK challenger bank Monzo (originally called Mondo). At least one other major cloud company is doing some very interesting refactoring around linkerd, but that’s under NDA for now. Perhaps ironically, the smallest vendor, Buoyant, seems to have the most service mesh deployments at scale.
So we now have a new center of gravity, a couple of interesting new projects to consider when looking at how to build, monitor and manage microservices based apps, alongside open source tools like Prometheus and vendors like Datadog and Weaveworks. Service mesh – it’s a thing now.
Docker, Google, IBM, and Pivotal are clients. Weaveworks is in my coworking space.
In this post we will take you behind the scenes on how we built a state-of-the-art Optical Character Recognition (OCR) pipeline for our mobile document scanner. We used computer vision and deep learning advances such as bi-directional Long Short Term Memory (LSTMs), Connectionist Temporal Classification (CTC), convolutional neural nets (CNNs), and more. In addition, we will also dive deep into what it took to actually make our OCR pipeline production-ready at Dropbox scale. Read more...
Open Source Big Data machine learning platform in use at Cadent Technology and IBM Watson Health, among other organizations.
Forest Hill, MD –31 May 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® SystemML™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache SystemML is a machine learning platform optimal for Big Data that provides declarative, large-scale machine learning and deep learning. SystemML can be run on top of Apache Spark, where it automatically scales data, line by line, to determine whether code should be run on the driver or an Apache Spark cluster.
"Today, the machine learning revolution is leading to thousands of life-altering innovations such as self-driving cars and computers that detect cancer," said Deron Eriksson, Vice President of Apache SystemML. "Apache SystemML enables and simplifies this process by executing optimized high-level algorithms on Big Data using proven technologies such as Apache Spark and Apache Hadoop MapReduce."
The core of Apache SystemML has been created from the ground up with the following design principles in mind:
Performance and Scalability, as SystemML scales up on single nodes, and scales out on large clusters using Apache Spark or Apache Hadoop;
"Designed for data scientists", enabling data scientists to develop algorithms in a system with a strong foundation in linear algebra and statistical functions; and
Cost-based optimization for scalable execution plans, that significantly shortens and simplifies the development and deployment cycle of algorithms for varying data characteristics and system configurations.
Using Apache SystemML, data scientists are able to implement algorithms using high-level language concepts without knowledge of distributed programming. Depending on data characteristics such as data size/shape and data sparsity (dense/sparse), and cluster characteristics such as cluster size and memory configurations, SystemML's cost-based optimizing compiler automatically generates hybrid runtime execution plans that are composed of single-node and distributed operations on Apache Spark or Apache Hadoop clusters for best performance.
"SystemML allows Cadent to implement advanced numerical programming methods in Apache Spark, empowering us to leverage specialized algorithms in our predictive analysis software," said Michael Zargham, Chief Scientist at Cadent Technology.
"SystemML is like SQL for Machine Learning, it enables Data Scientists to concentrate on the problem at hand, working in a high-level script language like R, and all the optimizations and rewrites are handled by the very powerful SystemML optimizer that considers data and available resources to produce the best execution plan for the application," said Luciano Resende, Architect at the IBM Spark Technology Center and Apache SystemML Incubator Mentor.
"IBM Watson Health VBC is using Apache SystemML on Apache Spark to build risk models on a very large EHR data set to predict emergency department visits," said Steve Beier, Vice President of Value Based Care Platform and Analytics at IBM Watson Health. "The models identify high-risk patients so that they can be targeted with preemptive strategies, thus potentially reducing care costs while at the same time leading to optimal outcomes for patients."
SystemML originated at IBM Research - Almaden in 2010, and was submitted to the Apache Incubator in November 2015. SystemML initiated research on compressed linear algebra, a differentiating feature in SystemML, which received the VLDB 2016 Best Paper Award.
"The Apache Incubator is all about open collaboration and communication and was invaluable for everyone involved in SystemML," added Eriksson. "The Apache SystemML community sincerely encourages everyone interested in machine learning and deep learning to help build our community around this revolutionary technology."
Catch Apache SystemML in action at the Big Data Developers Silicon Valley MeetUp on 8 June 2017 in San Francisco, CA.
Availability and Oversight
Apache SystemML software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache SystemML, visit http://systemml.apache.org/ and https://twitter.com/ApacheSystemML
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server -- the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit https://www.apache.org/ and https://twitter.com/TheASF
The Unicode Technical Committee recently decided to change their long-standing guidance for the preferred number of REPLACEMENT CHARACTERs generated for bogus byte sequences when decoding UTF-8. I think this change is inappropriate: it was based on mere aesthetic considerations and on ICU’s behavior, and it goes against the behavior of multiple prominent existing implementations that implemented the long-standing previous guidance.
Background
Not all byte sequences are valid UTF-8. When decoding potentially invalid UTF-8 input into a valid Unicode representation, something has to be done about invalid input. One approach is to stop altogether and to signal an error upon finding invalid input. While this is a valid response for some applications, it is not our topic today. The topic at hand is what to do in the non-Draconian case where the decoder continues even after discovering invalid input.
The naïve answer is to ignore invalid input until finding valid input again (i.e. finding the next byte that has a lead-byte value), but this is dangerous and should never be done. The danger is that silently dropping bogus bytes might make a string that didn’t look dangerous with the bogus bytes present become valid active content. Most simply, <scr�ipt> (� standing in for a bogus byte) could become <script> if the error is ignored. So it’s non-controversial that every sequence of bogus bytes should result in at least one REPLACEMENT CHARACTER and that the next lead-valued byte is the first byte that’s no longer part of the invalid sequence.
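To make the danger concrete, here is a minimal sketch (in Rust, chosen only because the Rust standard library is one of the implementations surveyed later in this post). It contrasts a decoder that marks the error with a hypothetical decoder that silently drops the bogus byte; the dropping variant is deliberately wrong and shown only to illustrate the script-injection hazard.

```rust
fn main() {
    // "<scr" + one bogus byte (0xFF can never occur in UTF-8) + "ipt>"
    let bytes = b"<scr\xFFipt>";

    // A decoder that marks the error keeps the string visibly broken.
    let replaced = String::from_utf8_lossy(bytes);
    assert_eq!(replaced, "<scr\u{FFFD}ipt>");

    // A (deliberately wrong) decoder that silently drops the bogus byte
    // manufactures active content that was never present in the input.
    let dropped: String = bytes.iter().filter(|b| b.is_ascii()).map(|&b| b as char).collect();
    assert_eq!(dropped, "<script>");
}
```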
But how many REPLACEMENT CHARACTERs should be generated for a sequence of multiple bogus bytes?
Unicode 9.0.0 (page 127) says: “An ill-formed subsequence consisting of more than one code unit could be treated as a single error or as multiple errors. For example, in processing the UTF-8 code unit sequence <F0 80 80 41>, the only formal requirement mandated by Unicode conformance for a converter is that the <41> be processed and correctly interpreted as <U+0041>. The converter could return <U+FFFD, U+0041>, handling <F0 80 80> as a single error, or <U+FFFD, U+FFFD, U+FFFD, U+0041>, handling each byte of <F0 80 80> as a separate error, or could take other approaches to signalling <F0 80 80> as an ill-formed code unit subsequence.” So as far as Unicode is concerned, any number from one to the number of bogus bytes (inclusive) is OK. In other words, the precise number is implementation-defined as far as Unicode is concerned.
Yet, immediately after saying that there isn’t a conformance requirement for the precise number, the Unicode Standard proceeds to express a preference, motivating it by saying: “To promote interoperability in the implementation of conversion processes, the Unicode Standard recommends a particular best practice.” The “best practice” (until the recent change) was that a maximal invalid sequence of bytes that forms a prefix of a valid sequence is collapsed into one REPLACEMENT CHARACTER, and that otherwise there is one REPLACEMENT CHARACTER per bogus byte.
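As a concrete illustration of the old rule (a sketch, not normative text): Rust’s String::from_utf8_lossy, whose output the survey later in this post finds to match the old preference, collapses a truncated prefix of a valid sequence into one REPLACEMENT CHARACTER but emits one per byte when the bytes cannot begin any valid sequence.

```rust
fn main() {
    // <E1 80 41>: E1 80 is a truncated prefix of a valid three-byte sequence,
    // so the whole prefix collapses into a single U+FFFD.
    assert_eq!(String::from_utf8_lossy(&[0xE1, 0x80, 0x41]), "\u{FFFD}A");

    // <F0 80 80 41>: after F0 the next byte must be in 90..BF, so F0 80 is not
    // a prefix of any valid sequence; each bogus byte gets its own U+FFFD.
    assert_eq!(
        String::from_utf8_lossy(&[0xF0, 0x80, 0x80, 0x41]),
        "\u{FFFD}\u{FFFD}\u{FFFD}A"
    );
}
```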
Explaining the Old Preference in Terms of Implementation
The old preference makes sense when the UTF-8 decoder is viewed as a state machine that recognizes UTF-8 as a regular grammar based on the information presented in Table 3-7 “Well-Formed UTF-8 Byte Sequences” in the Unicode Standard (page 125 in version 9.0.0; quoted below) and exhibits the following behavior when encountering a byte that doesn’t fit the grammar at the current state:
If the state machine is in the start state, consume the bogus byte (which is never a lead byte since lead bytes are allowed in the start state) and emit a REPLACEMENT CHARACTER.
If the state machine is not in the start state, unconsume the bogus byte (which may be a lead byte), change state to the start state (i.e. the byte will be reprocessed in the start state shortly) and emit a REPLACEMENT CHARACTER.
The conclusion here is that when viewing the UTF-8 decoder as a state machine that encodes knowledge of what byte sequences are valid, the old preference makes perfect sense. In particular, the rule to collapse prefixes of valid sequences is not added complexity but the simple thing arising from not requiring the state machine to unconsume more than the one byte under examination.
Table 3-7. Well-Formed UTF-8 Byte Sequences

Code Points            First Byte   Second Byte   Third Byte   Fourth Byte
U+0000..U+007F         00..7F
U+0080..U+07FF         C2..DF       80..BF
U+0800..U+0FFF         E0           A0..BF        80..BF
U+1000..U+CFFF         E1..EC       80..BF        80..BF
U+D000..U+D7FF         ED           80..9F        80..BF
U+E000..U+FFFF         EE..EF       80..BF        80..BF
U+10000..U+3FFFF       F0           90..BF        80..BF       80..BF
U+40000..U+FFFFF       F1..F3       80..BF        80..BF       80..BF
U+100000..U+10FFFF     F4           80..8F        80..BF       80..BF
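To make the state-machine reading concrete, here is a hypothetical Rust sketch (not code from any of the implementations discussed in this post). It recognizes sequences per Table 3-7 and follows the two rules above: a bogus byte in the start state is consumed with one REPLACEMENT CHARACTER, and a bogus byte in any other state is left unconsumed, also with one REPLACEMENT CHARACTER. Because the grammar already encodes which sequences are well-formed, the valid case can simply copy bytes through without accumulating a scalar value.

```rust
// A hypothetical sketch of the state-machine view. Sequences are recognized
// per Table 3-7; a byte that does not fit the current state yields exactly
// one U+FFFD, and it is only consumed when the machine is in the start state.
fn decode_lossy_old_preference(input: &[u8]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        // Start state: classify the byte per the First Byte column of Table 3-7.
        let (len, second) = match b {
            0x00..=0x7F => { out.push(b as char); i += 1; continue; } // ASCII
            0xC2..=0xDF => (2, 0x80..=0xBF),
            0xE0 => (3, 0xA0..=0xBF),
            0xE1..=0xEC | 0xEE..=0xEF => (3, 0x80..=0xBF),
            0xED => (3, 0x80..=0x9F),
            0xF0 => (4, 0x90..=0xBF),
            0xF1..=0xF3 => (4, 0x80..=0xBF),
            0xF4 => (4, 0x80..=0x8F),
            // Bogus byte in the start state: consume it, emit one U+FFFD.
            _ => { out.push('\u{FFFD}'); i += 1; continue; }
        };
        // Non-start states: the second byte has a lead-specific range,
        // later bytes must be in 80..BF.
        let mut seen = 1; // bytes accepted so far, including the lead byte
        while seen < len {
            match input.get(i + seen) {
                Some(&t) if (seen == 1 && second.contains(&t))
                    || (seen > 1 && (0x80..=0xBF).contains(&t)) => seen += 1,
                _ => break, // byte does not fit the grammar (or input ended)
            }
        }
        if seen == len {
            // The whole sequence fit the grammar: copy it through verbatim
            // (UTF-8 in, UTF-8 out; no scalar-value accumulator needed).
            out.push_str(std::str::from_utf8(&input[i..i + len]).unwrap());
        } else {
            // Ill-formed prefix of a valid sequence: one U+FFFD for the whole
            // prefix; the offending byte is not consumed and gets reprocessed
            // in the start state on the next iteration.
            out.push('\u{FFFD}');
        }
        i += seen;
    }
    out
}

fn main() {
    // <E1 80> is a truncated prefix of a valid sequence: one U+FFFD.
    assert_eq!(decode_lossy_old_preference(&[0xE1, 0x80, 0x41]), "\u{FFFD}A");
    // <F0 80 80> never fits the grammar past F0: one U+FFFD per byte.
    assert_eq!(
        decode_lossy_old_preference(&[0xF0, 0x80, 0x80, 0x41]),
        "\u{FFFD}\u{FFFD}\u{FFFD}A"
    );
}
```

Running this on the <F0 80 80 41> example from the Unicode quote above yields three REPLACEMENT CHARACTERs followed by A, i.e. the old preferred behavior, without the machine ever having to unconsume more than the one byte under examination.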
The New Preference
On May 12, 2017, the Unicode Technical Committee accepted a proposal (for Unicode 11) to collapse sequences of bogus bytes to a single REPLACEMENT CHARACTER not only when they form a prefix of a valid sequence but also when the bogus bytes fit as a prefix of the general UTF-8 bit pattern. The bit pattern for one, two, three and four-byte sequences is given in Table 3-6 “UTF-8 Bit Distribution” in the Unicode Standard (page 125 in version 9.0.0; quoted below). The proposal is ambiguous about whether to do the same thing for five and six-byte sequences, whose bit pattern is not defined as existing in Unicode but was defined in now-obsolete RFCs for UTF-8, the last RFC defining them being RFC 2279.
If five and six-byte sequences are treated according to the logic of the newly-accepted proposal, the newly-accepted proposal matches the behavior of ICU. If the decoder is supposed to be unaware of five and six-byte patterns, which are non-existent as far as Unicode is concerned, I am not aware of any implementation matching the new guidance.
The rationale against the old guidance was “I believe the best practices are wrong” and the rationale in favor of the new guidance was “feels right”. (Really.)
Table 3-6. UTF-8 Bit Distribution

Scalar Value                   First Byte   Second Byte   Third Byte   Fourth Byte
00000000 0xxxxxxx              0xxxxxxx
00000yyy yyxxxxxx              110yyyyy     10xxxxxx
zzzzyyyy yyxxxxxx              1110zzzz     10yyyyyy      10xxxxxx
000uuuuu zzzzyyyy yyxxxxxx     11110uuu     10uuzzzz      10yyyyyy     10xxxxxx
Explaining the New Preference in Terms of Implementation
The new preference makes sense if the UTF-8 decoder is viewed as a bit accumulator that first consumes bytes according to the UTF-8 bit distribution pattern, masking and shifting the variable bits into an accumulator where they form a scalar value, and then, upon completing a sequence according to the bit distribution pattern, checks whether the scalar value is valid given the length of the sequence consumed. A scalar value in the surrogate range or above the Unicode range is always invalid; otherwise, a scalar value is invalid if it could have been represented as a shorter sequence of bytes than the sequence that was actually consumed.
It is worth noting that the concept of accumulating a scalar value during UTF-8 decoding is biased towards using UTF-16 or UTF-32 as the in-memory Unicode representation, since decoding to those forms necessarily involves the use of such an accumulator. When decoding UTF-8 to UTF-8 as the in-memory Unicode representation, it is possible to first accumulate the scalar value and then re-encode it as UTF-8, but doing so is unnecessary and inefficient, and the sort of validation state machine described above makes more sense. Sure, such a state machine could be extended to exhibit the outward behavior of the formulation that involves a scalar accumulator, but it would be extra complexity in service of replicating the behaviors arising from a different model.
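For contrast, here is an equally hypothetical sketch of the accumulator view (not ICU’s or anyone else’s actual code, and it treats the five- and six-byte lead bytes, whose handling the proposal leaves ambiguous, as plain bogus bytes). Because it consumes every byte that fits the Table 3-6 bit pattern before checking the scalar value, a run like <F0 80 80> naturally collapses into a single REPLACEMENT CHARACTER, which is exactly the newly preferred behavior.

```rust
// A hypothetical sketch of the accumulator view. Bytes are consumed by the
// Table 3-6 bit pattern first; only afterwards is the accumulated scalar
// value checked, so a pattern-shaped run collapses to a single U+FFFD even
// when the scalar value turns out to be invalid.
fn decode_lossy_accumulator(input: &[u8]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        // Lead byte: number of continuation bytes, initial accumulator bits,
        // and the smallest scalar value this length may legitimately encode.
        let (cont, mut acc, min) = match b {
            0x00..=0x7F => { out.push(b as char); i += 1; continue; } // ASCII
            0xC0..=0xDF => (1, u32::from(b & 0x1F), 0x80),     // 110yyyyy
            0xE0..=0xEF => (2, u32::from(b & 0x0F), 0x800),    // 1110zzzz
            0xF0..=0xF7 => (3, u32::from(b & 0x07), 0x1_0000), // 11110uuu
            _ => { out.push('\u{FFFD}'); i += 1; continue; }   // bogus byte
        };
        // Shift in continuation bytes matching the 10xxxxxx pattern.
        let mut taken = 0;
        while taken < cont {
            match input.get(i + 1 + taken) {
                Some(&t) if t & 0xC0 == 0x80 => {
                    acc = (acc << 6) | u32::from(t & 0x3F);
                    taken += 1;
                }
                _ => break, // the run stopped early; the byte is not consumed
            }
        }
        i += 1 + taken;
        // One U+FFFD whether the pattern ended early, the scalar value is a
        // surrogate, it is above the Unicode range, or the encoding is overlong.
        if taken < cont || (0xD800..=0xDFFF).contains(&acc) || acc > 0x10_FFFF || acc < min {
            out.push('\u{FFFD}');
        } else {
            out.push(std::char::from_u32(acc).unwrap());
        }
    }
    out
}

fn main() {
    // The accumulator view collapses <F0 80 80> into a single error, which is
    // the behavior the newly preferred guidance (and ICU) produce.
    assert_eq!(decode_lossy_accumulator(&[0xF0, 0x80, 0x80, 0x41]), "\u{FFFD}A");
}
```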
What’s Wrong with Changing a Mere Preference?
So who cares? It’s just a non-normative expression of preference.
There are multiple reasons to care.
There are multiple prominent implementations that follow the old guidance and it’s wrong to make them explain themselves (why they do not follow the new preference) or, worse, change behavior risking the introduction of bugs.
It sets a bad precedent to change the Unicode Standard instead of changing the Unicode Consortium-hosted implementation (ICU) when the two disagree but multiple prominent implementations follow the Standard and ICU is the outlier.
The old guidance isn’t a mere preference but a conformance requirement in the Web context.
The change was dodgy in terms of committee process, which sets a bad precedent.
Chrome has had a bug arising from different error handling behavior between two UTF-8 decoder implementations in the codebase.
If anything, “It’s not a requirement” should be taken as an argument for why the spec doesn’t need changing, not as an argument for why changes on flimsy grounds are OK. The realization that the Unicode Consortium seems to lack a strong objective reason to prefer a particular number of REPLACEMENT CHARACTERs could be taken to support the conclusion that it might be best for the Unicode Standard not to express a preference on this topic at all (it is called “best practice”, implying the preference is in some sense the “best” option), but it does not support the conclusion that changing the expressed behavior in whichever direction is OK.
But most importantly, even if the exact number of REPLACEMENT CHARACTERs is not in itself important enough to care about, and it might seem silly to write this much about it, I see changing a widely-implemented spec on flimsy grounds as poor standards stewardship. I wish the Unicode Consortium did better than this, so that this kind of failure to investigate what multiple implementations do does not repeat with something more important.
What Do Implementations Do, Then?
I tested the following implementations (Web browsers by visual inspection and others by performing the conversion and using diff on the output):
Firefox 53.0.3 (uconv conversion library)
Chrome 58.0.3029.110 (uses the Web Template Framework, forked from WebKit but since then modified, for UTF-8 despite using ICU for many legacy encodings)
Safari (uses Web Template Framework for UTF-8 despite using ICU for many legacy encodings)
Edge (EdgeHTML 14.14393 on Windows 10 1607)
IE11 (on Windows 10 1607)
ICU (55.1 as distributed on Ubuntu 16.04)
Win32 (MultiByteToWideChar on Windows 10 1607)
The Java standard library as implemented in OpenJDK 8 (1.8.0_131 as distributed on Ubuntu 16.04)
The Ruby standard library (2.3.1p112 as distributed on Ubuntu 16.04)
The Python 3 standard library (3.5.2 as distributed on Ubuntu 16.04)
The Python 2 standard library (2.7.12 wide build as distributed on Ubuntu 16.04)
The Perl 5 standard library (5.22.1 as distributed on Ubuntu 16.04)
The Rust standard library (1.17.0)
rust-encoding (0.2.33)
encoding_rs (0.6.10) (of interest to me, because I wrote it; not claiming prominence, yet, since it has not yet been shipped in Firefox 😉)
Why these? Browsers should obviously be considered. I already had copypaste-ready code for ICU, Win32, rust-encoding and the Rust standard library. Java, Ruby, Python 3, Python 2 and Perl 5 were trivial to test due to packaging on Ubuntu and either prior knowledge of how to test or the documentation being approachable. On the other hand, I timed out trying to find the right API entry point in Go documentation, and I timed out trying to get GLib to behave (the glibc layer does not handle REPLACEMENT CHARACTER emission). I figured that testing e.g. CoreFoundation (which I believe only wraps ICU but, who knows, could do something else for UTF-8 like WebKit does), Qt or .Net would have taken too much time for me. In any case, the above list should be broad enough to make statements about “multiple prominent implementations”.
I used a specially-crafted HTML document as test input. Since the file is malformed, if you follow the link in your browser, the number of REPLACEMENT CHARACTERs depends on the UTF-8 decoder in your browser.
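For the non-browser implementations, the conversions can be driven by small wrappers along the lines of the following hypothetical harness (this is not the actual test code used for this post; it merely shows the shape of the comparison, using the Rust standard library as the decoder):

```rust
use std::io::{self, Read, Write};

// A hypothetical wrapper: read raw bytes on stdin, lossy-decode them with the
// Rust standard library, and write the resulting UTF-8 to stdout so that
// outputs from different implementations can be compared with diff.
fn main() -> io::Result<()> {
    let mut bytes = Vec::new();
    io::stdin().read_to_end(&mut bytes)?;
    let decoded = String::from_utf8_lossy(&bytes);
    io::stdout().write_all(decoded.as_bytes())?;
    Ok(())
}
```

Feeding the same malformed test input to each implementation’s wrapper and diffing the outputs then reveals which implementations agree.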
Most implementations produced bit-identical results matching the old Unicode preference. Therefore, I’m providing only one copy of the output. This file is valid UTF-8, so the number of REPLACEMENT CHARACTERs is encoded in the file and does not depend on your browser. This is the result obtained with (by visual inspection only for browsers):
Firefox
Chrome
Ruby
Python 3
The Rust standard library
rust-encoding
encoding_rs
An interesting browser behavior (link to manual synthesis of valid UTF-8 that looks the way the test input shows up in these browsers) is to emit as many REPLACEMENT CHARACTERs as there are bogus bytes without collapsing sequences that are prefixes of a valid sequence. This is the behavior of:
Edge
IE
Safari
The rest all had mutually different results (links point to valid output created with these implementations):
Win32 seems to approximate the old guidance with the quirk that for three or four-byte invalid sequences that follow the UTF-8 bit pattern the number of REPLACEMENT CHARACTERs is one fewer than the number of bytes.
OpenJDK 8 follows the old guidance except it follows the new guidance for CESU-8-encoded surrogates.
Python 2 (both narrow and wide) exhibits non-conforming behavior by accepting CESU-8-encoded astral code points. Non-shortest two, three or four-byte sequences follow neither the old nor the new guidance.
Perl 5 follows the old guidance for non-shortest forms but follows the new guidance for CESU-8-encoded surrogates and values above the Unicode range (with knowledge of five and six-byte sequences).
ICU follows the new guidance, including for five and six-byte sequences for which the proposal and decision were ambiguous.
As you can see, ICU is the most different from the others. In particular, even though Edge, IE11, Safari, OpenJDK 8 and Perl 5 do not follow the old guidance for everything, they match the old guidance for non-shortest forms.
When there are multiple prominent implementations following the old guidance, only ICU following the new guidance if it is taken to include five and six-byte sequences, and no implementation (that I know of) following the new guidance if it is taken to exclude five and six-byte sequences, I think changing the spec shows poor standards stewardship.
It is wasteful if the implementors who followed the previous advice need to explain why they don’t follow the new “best practice”. The burden of explaining deviations from the “best practice” should fall on the developers of the implementations that deviated from it. It is even more wasteful if the change of the preference expressed by the Unicode Standard results in code changes in any of the implementations that implemented the old preference, since this would mean implementation and quality assurance work (potentially partially in the form of fixing bugs introduced as part of making changes).
A well-managed standard should not impose, for flimsy reasons, such waste on implementors who trusted the standard previously. Changes to widely-implemented, long-standing standards should have a very strong rationale. The change at hand lacks such a rationale.
Favoring the Unicode-Hosted Implementation
Now that ICU is a Unicode Consortium project, I think the Unicode Consortium should be particularly sensitive to biases arising from being both the source of the spec and the source of a popular implementation. If the way the Unicode Consortium resolves a discrepancy between ICU behavior and a well-known spec provision (even a mere expression of preference that isn’t a conformance requirement) is by changing the spec instead of changing ICU, it looks really bad, both in terms of the equal footing of ICU versus other implementations in how the standard is developed, and in terms of the reliability of the standard text, rather than ICU source code, as the source of truth that other implementors need to pay attention to.
The Mere Preference Has Been Elevated into a Requirement Elsewhere
Even though the Unicode Standard expresses the number of REPLACEMENT CHARACTERs as a mere non-normative preference, Web standards these days tend to avoid implementation-defined behavior, due to the painful experience of Web sites developing dependencies on the quirks of particular browsers in areas that old specs considered error situations not worth spelling out precise processing requirements for. Therefore, there has been a long push towards well-defined behavior on the Web even in error situations, without debating each particular error case individually to assess whether the case at hand is prone to sites developing dependencies on a particular behavior. (To be clear, I am not claiming that the number of REPLACEMENT CHARACTERs would be particularly prone to browser-specific site dependencies.)
As a result, the WHATWG Encoding Standard, which seeks to be normative over Web browsers, has precise requirements for REPLACEMENT CHARACTER emission. For UTF-8, these used to differ from the preference expressed by the Unicode Standard, but that was reported as a bug in 2012, and the WHATWG Encoding Standard was aligned with the preference expressed by the Unicode Standard, making it a requirement. Firefox was changed accordingly at the same time.
Chrome changed in 2016 to bring the Web Template Framework part of Chrome into consistency with V8, since a discrepancy between the two caused a bug! The change cited Unicode directly instead of citing the WHATWG Encoding Standard. This Chrome bug is the strongest evidence that I have seen that the precise behavior can actually matter.
Regrettably, the shortest path to making all browsers do the same thing existed before the Chrome change: back then, it would have been enough to make V8 and Firefox emit one REPLACEMENT CHARACTER per bogus byte. However, I am not advocating such a change now. The V8 consistency issue shows that UTF-8 decoding comes to browsers from more places in the code than one would expect, that some of those places are implemented directly downstream of the Unicode Standard instead of downstream of the WHATWG Encoding Standard, and that consistency between them can turn out to matter. In the case of Firefox, there is the Rust standard library in addition to the main encoding library (currently uconv, hopefully encoding_rs in the near future), and I don’t want to ask the Rust standard library to change. (Also, from a purely selfish perspective, replicating the Edge/Safari behavior in encoding_rs, while possible, would add complexity, because it would involve emitting multiple REPLACEMENT CHARACTERs retroactively for bytes that the decoder has already consumed as a valid-looking prefix.)
When two out of four of the major browsers match what the WHATWG Encoding Standard says about UTF-8 decoding and the two others are very close, the WHATWG spec is likely to stay as-is. It’s a shame if the Unicode change makes a conformance requirement for the Web differ from the non-requirement preference expressed by the Unicode Standard. There are already enough competing specs and stale spec versions around that making sure that browser developers read the right spec requires constant vigilance. It would be sad to have to add this particular part of the Unicode Standard to the “wrong specs to read” list. Even worse if the Unicode change leads to more bugs like the discrepancy between V8 and WTF within Chrome.
Process Issues
This is mostly of curiosity value but may be relevant for the purpose of getting the decision overturned. It appears that the agenda for a Unicode Technical Committee meeting is supposed to be set at least a week in advance of the meeting, but the proposal at issue here seems to have been submitted on shorter notice (the proposal is dated May 11 and was accepted on May 12). Also, the old preference was formulated as the outcome of a more heavy-weight Public Review Issue process, so it seems inappropriate to change the outcome of a heavy-weight process by using a lighter-weight decision process.
Where to Go from Here?
First, I hope that the decision to change the preference that the Unicode Standard expresses for the number of REPLACEMENT CHARACTERs is overturned on appeal for the above reasons and for the bad precedent that the change is suggestive of when viewed as a slippery slope towards changing more important things on flimsy grounds. (Or, alternatively, the Unicode Standard stops expressing a preference for the number of REPLACEMENT CHARACTERs altogether.)
Second, I hope that the Unicode Consortium takes steps to mitigate the risk of making decisions on flimsy grounds in the future by requiring proposals to change text concerning implementation behavior (regardless of whether an actual requirement or a mere expression of preference) to come with a survey of the behavior of a large number of prominent existing implementations. The more established a given behavior is in implementations, the stronger the rationale should be to change the required or preferred behavior. The Unicode Consortium-hosted implementation should have no special weight when considering the existing implementation landscape.
That is, I think the proposal to change the preferred behavior in this case should have come with the kind of investigation that I performed here, preferably considering even more implementations, instead of baiting someone other than the person making the proposal to do the investigation after the decision has already been taken.
He’s leaving and she’s dying. Still, these are happy pictures.
That’s Gareth Kirkby, a friend for decades, who came over for dinner because we’d drifted apart, it’s been a while, and because he’s leaving (has left now) for Asia, on a trip with no fixed end. He’s political, a good writer and a good person and full of surprises. We’ll miss him.
In his arms, Rune the Bengal cat, who is 19 years old and failing fast; a list of her ailments would fill too much sad space. But the interventions have (just barely) not reached abusive levels, and they happen without the hated trips to the vet. Spring is coming and she visibly enjoys the sunshine; but she won’t see another.
She’s been the best cat ever, full of stories; watch for her obituary in this space pretty soon. We’ll miss her.
We are waves climbing the beach, growing thinner, and who knows how far each will get? But a sunny spring evening on the porch stops time.
Ashley Hedberg has done us all a terrific service by compiling and sending to the Tufts administration reactions from former students to the decision by Tufts not to grant tenure to Ben Hescott. Ben is one of the finest teachers I’ve met. Tufts made a terrible mistake by not recognizing his extraordinary contributions and by not offering him a stable long-term position that would appropriately reflect what he has achieved and what he continues to contribute to Tufts teaching and research every day.
We’ve been thinking about Visual Studio Code, Microsoft’s relatively new open source cross-platform code editor, a fair bit lately. One reason Code is on our radar is that it keeps coming up positively in conversations with developers and making surprise appearances in demos at grassroots tech conferences. Folks from modern language communities really like the platform. It’s hit a sweet spot in that it’s lightweight and snappy like a code editor but offers some neat features that you’d expect from an IDE, such as code completion and debugging tools.
We’ve been seeing a fair bit of commentary like
Nice. So similar to my setup. Except I switched to VS Code recently – it's really fast, easy to write extensions, and great for typescript
Last week this post from NodeSource – Getting Started with VS Code for Node.js Development – bubbled up on Twitter. It points out some useful utilities for Node developers, like npm Intellisense and Debugger for Chrome (from within the editor), Docker support, and keymap overlays for other editors. Keymaps are crucial to winning converts, given that so much of development is muscle memory.
Code also benefits from excellent support for TypeScript, a superset of JavaScript developed by Microsoft. The project is led by Anders Hejlsberg, who is famous for building shit developers like to use, having been the lead developer of Turbo Pascal and then C#. TypeScript has shown notable growth in RedMonk’s most recent Programming Language Rankings.
But other languages are definitely in the mix too. Go devotees for example have been upbeat about Code. Given all the buzz, we were expecting to see some Code get some attention at Microsoft Build – so we ran some numbers to piggyback on the event.
Python is very interesting because of the language’s strong growth in data science. Integration with Jupyter Notebook, which grew out of IPython, could be a key driver there. Jupyter allows users to create and share docs including live code, data visualisations, and associated text. It’s increasingly popular in data science and machine learning communities.
So given there is broad-based interest outside the C# community, like I say, we expected more attention at Build. In fact Code got very little attention at all, which was kind of surprising and a bit disappointing. The conference used to be for Microsoft technology true believers, but these days it’s a more open affair, reflecting the open platform positioning of Microsoft CEO Satya Nadella, going to where the developers are. Obviously you can’t pack every technology into a conference, much though a grueling 3-hour keynote can make it feel like you have. To be fair, Microsoft announced general availability of Visual Studio for Mac at Build, and it deserved attention. A full-function version of Visual Studio running on the Mac is kind of a big deal, at least in terms of circling the wagons. Some folks on any good development team are going to be using Macs, after all.
But in terms of the Developer Experience funnel, attracting new users to Microsoft platforms, clearly Code is the new hotness.
I spoke to Ilya Dmitrichenko, developer experience engineer at Weaveworks about recent innovation in editors and toolchains last week. He charts the beginnings of networked IDEs doing something a standalone editor couldn’t as coming from ARM, of all companies, back in 2009. It released an IDE that made it easier to target a variety of hardware form factors without the usual low level gorp and downloads. One of the reasons we chatted about networked toolchains was the announcement that Red Hat is acquiring Codenvy to harden its openshift.io online IDE, CI/CD automated container deployment story.
But back to Code – Dmitrichenko said he liked the editor, but felt that closer integration with Azure Functions would be what made it really compelling for him. The serverless community is hungry for an experience offering better-integrated cloud-based IDE/dev and test – Paul Johnston goes on about this all the time. GitHub is heading in a similar direction with Atom and the GitHub Marketplace, with a code editor integrated with cloud-based back-end services. We shouldn’t forget AWS, which acquired Cloud9 but has been rather quiet since, apart from the odd teaser by Tim Wagner, GM of Serverless, about online testing integrated with the developer experience.
The opportunity is in client + services. If we ask ourselves who is going to be the Ubuntu of developer infrastructure, it’s going to be the platform that best packages both the local environment and hosted developer services. Code, rather than Visual Studio Classic, is (arguably) best placed from a Microsoft perspective to make a play like this happen. And the Code team is making some savvy AdWords purchases even if they weren’t featured heavily at Build 😉
AWS, Red Hat and Microsoft are all clients. Microsoft paid T&E to Build.
bonus update – I saw this exchange today and thought it was worth screenshotting
Don Thibeau, The OpenID Foundation, with Jeremy Crawford, CEO, Real Estate Standards Organization
One of the sure signs of increasing momentum is when other standards organizations adopt yours. And it’s particularly noteworthy when that adoption not only builds on OpenID Connect, but adds OpenID Connect conformance as part of the overall standards certification requirements in its innovative standards portfolio. Leading that effort in the real estate industry is RESO, a new member of the OpenID Foundation. But I’ll let Jeremy Crawford tell the story.
Jeremy: “The Real Estate Standards Organization (RESO) is tasked with the challenging goal of standardizing all of the real estate data in the US and Canada. This includes the data payload, the fields, formats, transport mechanism, and authentication/authorization. At the heart of our efforts is the RESO Data Dictionary, which serves as the real estate industry’s ‘Rosetta Stone’ for real estate data and the RESO Web API, the latest cutting edge standard for data delivery between endpoints. Hundreds of MLS, brokers and technology companies gather data. But what good is it if the data cannot be shared or understood? The Data Dictionary ensures that each system ‘speaks’ the same language. It is the common standard that defines real estate data in consistent terms and data structures.
“RESO standards all started with the Real Estate Transaction Standard (RETS), an 18-year-old, XML-based standard for transporting real estate data that virtually every major real estate website uses. But the world has changed quite a bit since 1999, and the industry desperately needed something new and easy to use that was mobile and developer friendly. Moreover, the initial learning curve for RETS can be a little daunting, and we want to attract new software companies and developers to our industry. We’ve created the RESO Web API standard to make life a little easier for everyone who needs to deal with real estate data.
The strategic approach
“Developing the RESO Web API presented us with two huge strategic opportunities: first, we could unshackle ourselves from the proprietary world of RETS and move into the global technology space where collaboration is key. The result is we have expanded our realm outside of residential real estate to power the data in other industries because there are many companies working in multiple industries, including real estate, and already using OData and/or OpenID Connect standards.
“Firms like DocuSign work across a plethora of industries and, in real estate, leverage RESO standards and OpenID Connect as they relate to transaction management. Prempoint is another great example; it relies on standards like OpenID Connect for the industries it serves, including community management, commercial and utility site operators, real estate management, and property preservation. These are the kinds of new businesses that are able to provide products and services to brokers and agents by relying on RESO standards because of our new RESO Web API.
“Second, RESO has emerged as a leader in standard collaboration. Personally, I’ve had the great privilege of sitting on the xDTM Standards advisory board, and now RESO is collaborating with ECCMA, MISMO, PRIA, BLDS, MITS, OASIS, OpenID Connect Foundation, and OSCRE standards organizations. Among our most productive collaborations has been with the United States Department of Energy (BEDES) and our efforts through new Better Buildings Home Energy Information Accelerator, alongside the Council of MLSs and others to help Multiple Listing Services (MLS) across America provide a consistent “energy-transparent” shopping experience for consumers. New technology from RESO, like our Web API, is connecting us globally and to more industries and their respective rich datasets than ever before.
The technical details
“Now how did we get there? RESO member Cal Heldenbrand from FBS, a security expert who does the authentication portion of the Spark API on development, describes the technical path we took for the creation of the Web API.”
Cal: “The data transport portion of the RESO API standard leverages the latest version of the OData standard, a global standard used worldwide for transporting data in an efficient and consistent manner. On the authorization side, we initially started using the OAuth2 standard around January 2014. At that time, OpenID Connect looked very cool as an extension to OAuth2, but the subject matter experts were hesitant to recommend it to RESO until it was a fully finalized, ratified standard.
“There are hundreds of software companies working together in our industry. Writing an interoperable OAuth2 protocol using the framework wasn’t trivial. Since there are so many options for implementing client-based user authentication with the OAuth2 standard, it seemed like every major installation in the world had its own spin on it. That’s not good. It also meant that we couldn’t just copy how someone else did it: we had to develop our own.
“Plus, the absence of endpoint metadata meant we had to create a document where everything lives, then ask clients to hard-code URLs for every OAuth2 provider. That’s a lot of busywork for a developer adding a new identity provider (IdP) to a software installation.
“After OpenID Connect became a finalized standard, we gave a presentation at a RESO conference highlighting how one website in our industry could accept identities from Google, Microsoft, Amazon, and also from our own OpenID Connect Provider, as was already being done by one of our charter members, FBS, the creators of Flexmls, on its Spark Platform. Since it’s an actual protocol standard, it was easy to demonstrate that you can simply plug in an IdP and, with a small configuration change, the OpenID Connect client libraries will handle the rest. That’s really powerful. In our industry, we are used to SSO integrations taking weeks to complete. With OpenID Connect, that turns into minutes.
“The certification process was pretty easy as well. I was expecting it to be more intensive! Our environment is Ruby on Rails, and I used Nov’s openid_connect Ruby gem for constructing ID Tokens. Other than that, my Provider is written from scratch. It took me about two weeks to have a very simple provider running for demo purposes, then another two weeks to have it fully compliant with the certification tools. This was also alongside my usual day-job tasks of web operations. I’d have to say this was a breeze compared to the old OpenID 2.0. Thanks for making a great standard!”
And thanks to Cal and the Real Estate Standards Organization (RESO) team for sharing their use case and feedback. We will be telling the OpenID Connect story at some of the real estate industry events this year.
Frontend United is a conference for web designers and developers, held this year in Athens. I went along to talk about JavaScript and ARIA.
My talk borrowed the classic quote “You’re only supposed to blow the bloody doors off!”, from The Italian Job starring Michael Caine. It looked at ways to avoid blowing things up with JavaScript and ARIA when you really didn’t mean to.
There were some great questions from the audience:
Is there a convention for informing screen reader users about JavaScript shortcuts?
There isn’t one that I know of. Twitter displays a notification to screen reader users that is hidden visually, and that does work for screen reader users at least. It would be good if browsers could indicate the availability of keyboard shortcuts to everyone – unless you know that the question mark is the keyboard convention for discovering which shortcuts are available, there’s a good chance sighted people will miss out on the shortcuts too.
When I test with Jaws and NVDA, how do I get them both to do the same thing?
This is a bit like the CSS cross-browser compatibility problem we had a few years ago. People would fret if a site didn’t look exactly the same in two different browsers, but eventually we realised it didn’t matter as long as the site looked good in both browsers. It’s the same with screen readers – the question isn’t how to get Jaws and NVDA to behave the same way, but to make sure the site works well with both screen readers.