Franklin Hu

Zipkin - distributed tracing for service-based applications

thechangelog:

As more applications are composed of multiple services, finding the bottlenecks between the data and the user becomes a hard problem in its own right. Twitter has added to its growing list of open source projects with Zipkin, a system for distributed tracing.

Zipkin architecture

Instrumented libraries in the stack send information to a Zipkin collector via Thrift. Data can then be queried and presented in Zipkin’s web interface.
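
To make the flow above concrete, here is a minimal sketch of services reporting timed spans to a collector that can then be queried for the bottleneck. It is not Zipkin's API: the `Span` fields, the `Collector` class, and the `slowest_child` query are hypothetical names, and an in-memory dictionary stands in for the Thrift transport and the storage behind the web interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation inside a trace (field names are illustrative)."""
    trace_id: str             # shared by every span in one request
    span_id: str              # unique within the trace
    parent_id: Optional[str]  # None for the root span
    service: str
    duration_ms: float


class Collector:
    """In-memory stand-in for a collector; the real transport is Thrift."""

    def __init__(self) -> None:
        self._by_trace: Dict[str, List[Span]] = {}

    def submit(self, span: Span) -> None:
        # An instrumented library would call this once per finished span.
        self._by_trace.setdefault(span.trace_id, []).append(span)

    def slowest_child(self, trace_id: str) -> Span:
        """Return the longest non-root span, a rough bottleneck hint."""
        children = [s for s in self._by_trace[trace_id] if s.parent_id is not None]
        return max(children, key=lambda s: s.duration_ms)


collector = Collector()
collector.submit(Span("t1", "a", None, "web", 120.0))
collector.submit(Span("t1", "b", "a", "auth", 15.0))
collector.submit(Span("t1", "c", "a", "timeline", 95.0))
print(collector.slowest_child("t1").service)
```

The root span is excluded from the query because it always covers the whole request; the interesting question is which downstream call took most of that time.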

Check the README or source on GitHub for more.

  • 11 months ago > thechangelog

vim-pasta - smarter, context aware indented pasting for Vim

thechangelog:

Marcin Kulik has released vim-pasta, a plugin for Vim that looks at the destination context to determine the indentation level for pasted text. Consider a paste with the cursor on line 1 of the following code block:

if i_were_president
<paste would land here>
  <when you really wanted it here>
  ...
end

Vim-pasta drops the text at the correct indentation level, in this case nested inside the if. It also keeps the relative indentation within the pasted code, so blocks like this one retain their alignment:

obj = {
       a: 1,
       b: 2,
     foo: 3,
  barbaz: 4
}
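
The behaviour described above can be sketched in a few lines of Python. This is an illustration of the idea rather than the plugin's implementation: `reindent` and `leading_spaces` are hypothetical helpers, the destination indentation is passed in instead of being read from the buffer, and only space indentation is handled.

```python
from typing import List


def leading_spaces(line: str) -> int:
    """Count leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def reindent(pasted: List[str], target_indent: int) -> List[str]:
    """Shift a block so its shallowest line sits at target_indent.

    Every line moves by the same amount, so alignment inside the
    block is preserved.
    """
    non_blank = [line for line in pasted if line.strip()]
    if not non_blank:
        return list(pasted)
    base = min(leading_spaces(line) for line in non_blank)
    pad = " " * target_indent
    return [pad + line[base:] if line.strip() else "" for line in pasted]


block = [
    "obj = {",
    "       a: 1,",
    "  barbaz: 4",
    "}",
]
for line in reindent(block, 2):
    print(line)
```

Pasting the block inside a body indented by two spaces moves every line right by two columns, so the colons stay aligned.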

Pasta is disabled by default for Python, CoffeeScript, and Markdown since the destination indentation level can’t easily be guessed for these languages. You can configure how vim-pasta handles other languages via its lists of disabled and enabled filetypes:

let g:pasta_disabled_filetypes = ['python', 'coffee', 'yaml']
let g:pasta_enabled_filetypes = ['ruby', 'javascript', 'css', 'sh']

See the README for advanced usage.

  • 1 year ago > thechangelog

‘h’ to heart

I made a Chrome extension so in addition to using ‘j’ and ‘k’ to navigate, you can hit ‘h’ to heart the current post.

The source is on GitHub, or you can download the extension directly here.

  • 1 year ago

Reimagining the HDFS NameNode

In an ideal world, distributed storage systems would scale easily to accommodate an arbitrary amount of data. HDFS’s use of a NameNode as the single arbiter of a cluster’s metadata is a huge scalability problem.

The NameNode holds all of the cluster’s metadata (file-to-blocks association, block locations, etc.) in memory and also persists it to disk for durability. This metadata grows with the amount of data in the cluster and will at some point exceed the amount of RAM on the node.

The solution so far has been federation. Instead of making the NameNode scalable, a single DataNode can respond to multiple NameNodes. Simply partition your data into separate namespaces, and you’re good to go. But this isn’t the ideal solution.

Suppose we rip out the actual metadata storage in the NameNode. Most of the metadata consists of mappings (file-to-blocks, block-to-DataNodes, etc.), so using a durable distributed hash table makes sense.

The implications:

  • Highly available/scalable metadata store
  • NameNode becomes essentially stateless (recovery just involves firing a new instance up)
  • Higher per-request latency

The per-request latency (network plus service response time, versus in-memory data structure lookups) could be in the millisecond range, but compared to the time spent actually doing computation during an MR job on a large data set, it’s essentially negligible.
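
A rough sketch of the idea, assuming nothing about real HDFS internals: `MetadataStore` is a plain dictionary standing in for a durable distributed hash table, and the `NameNode` class, its method names, and the `file:`/`block:` key prefixes are all hypothetical.

```python
from typing import Dict, List


class MetadataStore:
    """Dict-backed stand-in for a durable distributed hash table."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def put(self, key: str, value: List[str]) -> None:
        self._data[key] = list(value)

    def get(self, key: str) -> List[str]:
        return list(self._data.get(key, []))


class NameNode:
    """Holds no metadata itself; every lookup goes to the store."""

    def __init__(self, store: MetadataStore) -> None:
        self._store = store

    def add_file(self, path: str, blocks: Dict[str, List[str]]) -> None:
        # blocks maps block id -> DataNodes holding a replica
        self._store.put("file:" + path, list(blocks))
        for block_id, datanodes in blocks.items():
            self._store.put("block:" + block_id, datanodes)

    def locate(self, path: str) -> Dict[str, List[str]]:
        # Two kinds of lookup: file -> blocks, then block -> DataNodes.
        block_ids = self._store.get("file:" + path)
        return {b: self._store.get("block:" + b) for b in block_ids}


store = MetadataStore()
NameNode(store).add_file(
    "/logs/a", {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]}
)

# "Recovery": a brand-new instance serves the same metadata.
replacement = NameNode(store)
print(replacement.locate("/logs/a"))
```

Each call to `locate` costs one store round trip per file plus one per block, which is where the extra per-request latency comes from.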

  • 1 year ago

About

thisisfranklin.com

Me, Elsewhere

  • @thisisfranklin on Twitter
  • Facebook Profile
  • franklinhu on github