I am Joshua Poehls. (not so) silent thoughts

Earth's Water is Older Than the Sun

Astronomers have discovered that much of the water on Earth—and the solar system—predates the Sun.

This isn’t really surprising, right? I mean, it lines right up with the Genesis creation. Still, thinking about things like this blows my mind.

1 In the beginning God created the heaven and the earth. 2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters. 3 And God said, Let there be light: and there was light. 1

Genesis 1:1-3 (KJV)


  1. I’m no scholar but I read this as, “Let there be light: and there was the Sun.” ↩︎

⦿

Starting Sentences With I.E. or E.G.

A parenthetical statement that is a complete sentence should begin with a capital.

I used to debate this every time I’d use i.e. or e.g. in my writing. Not anymore!

Right way:

E.g., this is a proper example.

I.e., something like this is correct.

Wrong way: 1

e.g., This is a bad example.

Apparently it is also good practice to include the comma.


  1. Sadly this is how I’ve usually written it, except I almost always left off the comma. ↩︎

⦿

Hello World

Welcome to my new blog! 1 All of my blogging in the past has been in the form of longer articles, mostly technology related, and mostly trying to teach a tip or trick.

I’m taking a different approach this time. I’ll be posting more random thoughts. Sharing articles I find interesting. Overall I expect this to feel more like reading my Facebook wall and that’s what I’m going for. 2

Here goes nothing!

Hello World! Hallo Welt! שלום עולם

  1. This is no longer the first post as I’ve started migrating older content from my blog at zduck.com. ↩︎

  2. This is my plan to take control of my public content. I’ll still share on Facebook and Twitter, but I want to own the canonical version. ↩︎

⦿

Go 101: String or Byte Slice?

One of the first things you’ll notice in Go is that two different types are commonly used for representing text: string and []byte. A quick example is the regexp package, which has functions for both string and []byte.

What is the difference?

string is immutable and []byte is mutable. Both can contain arbitrary bytes.

The name “string” implies Unicode text but this is not enforced. Operating on string is like operating on []byte. You are working with bytes, not characters.

They are nearly identical and differ only in mutability. The strings and bytes packages are nearly identical apart from the type that they use.

Q: If strings are just arbitrary bytes, then how do you work with characters?

A: What you are thinking of as a character, Go calls a rune. One way to iterate the characters in a string is to use the for...range loop. Range will parse the string as UTF-8 and iterate the runes. Read the for loop section of Effective Go for more information.
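
Here is a minimal sketch of the difference between counting bytes and iterating runes. The runeCount helper is my own name for illustration, not something from the standard library.

```go
package main

import "fmt"

// runeCount walks s with for...range, which decodes the string as UTF-8
// and yields one rune per iteration.
func runeCount(s string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

func main() {
	s := "h\u00e9llo" // "héllo": the é takes two bytes in UTF-8

	fmt.Println(len(s))       // 6 (bytes)
	fmt.Println(runeCount(s)) // 5 (runes)

	// i is the byte offset where each rune starts, not a character index.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}
}
```

Note that the offsets jump from 1 to 3 because the second rune occupies two bytes.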

When to use string?

Ask not when to use string but rather, when to use []byte. Always start with string and switch to []byte when justified.

When to use []byte?

Use []byte when you need to make many changes to a string. Since string is immutable, any change will allocate a new string. You can get better performance by using []byte and avoiding the allocations.
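
As a small sketch of what "changing in place" looks like, here is a hypothetical upperASCII helper that edits a []byte without allocating. The same assignment on a string (s[i] = c) would not compile.

```go
package main

import "fmt"

// upperASCII upper-cases ASCII letters in place.
// It writes into the caller's slice, so no new slice is allocated.
func upperASCII(b []byte) {
	for i, c := range b {
		if c >= 'a' && c <= 'z' {
			b[i] = c - ('a' - 'A')
		}
	}
}

func main() {
	b := []byte("hello, gopher")
	upperASCII(b)
	fmt.Println(string(b)) // HELLO, GOPHER
}
```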

C# perspective: []byte is to System.StringBuilder as string is to System.String when it comes to performance.

Even if your code isn’t directly manipulating the string, you may want to use []byte if you are using packages which require it so you can avoid the conversion.

Converting to and from []byte is easy. Just remember that each conversion creates a copy of the value.

s := "some string"
b := []byte(s) // convert string -> []byte
s2 := string(b) // convert []byte -> string

Converting to/from string and []byte copies the entire value. Using lots of type conversions in your code is typically a warning sign that you need to reevaluate the types you are using. You want to minimize conversions both for performance and clean code.
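
You can see the copy for yourself with a sketch like this one. The roundTrip function is just an illustrative name, and it assumes a non-empty input.

```go
package main

import "fmt"

// roundTrip converts s to a []byte, changes the first byte, and converts back.
// The original string is untouched because each conversion copies the data.
func roundTrip(s string) (original, changed string) {
	b := []byte(s) // copy #1: string -> []byte
	b[0] = 'S'     // mutates only the copy
	return s, string(b) // copy #2: []byte -> string
}

func main() {
	original, changed := roundTrip("some string")
	fmt.Println(original) // some string
	fmt.Println(changed)  // Some string
}
```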

More about strings

The Go blog has posted in detail about strings, bytes, runes, and characters in Go. You should definitely read that post to fully understand the topic.

Update: Thanks to @mholt6 for reviewing the post and helping improve it!

⦿

Go 101: Methods on Pointers vs. Values

Methods can be declared on both pointers and values. The difference is subtle but important.

type Person struct {
     age int
}

// Method's receiver is the value, `Person`.
func (p Person) Age() int {
     return p.age
}

// Method's receiver is a pointer, `*Person`.
func (p *Person) SetAge(age int) {
     p.age = age
}

This is how you define getter and setter functions in Go. Notice that we defined the Age() function on the value but SetAge() on the pointer (i.e., *Person). This is important.

In reality, you’d only define getter and setter functions like this if you needed to implement additional logic. In an example like this you’d just make the Age field public.

Go always passes by value. Function parameters are always passed by copying them as opposed to passing a reference. (Read more.)

Even pointers are technically passed by value. The memory address is copied, the value it points to is not copied.
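
A quick sketch makes this concrete. The reassign and mutate functions below are hypothetical names for illustration: one replaces its local copy of the pointer, the other writes through it.

```go
package main

import "fmt"

type Person struct {
	age int
}

// reassign points its local copy of the pointer at a new Person.
// The caller's pointer is unaffected.
func reassign(p *Person) {
	p = &Person{age: 99}
}

// mutate writes through the copied pointer, so the caller sees the change.
func mutate(p *Person) {
	p.age = 10
}

func main() {
	p := &Person{}

	reassign(p)
	fmt.Println(p.age) // 0

	mutate(p)
	fmt.Println(p.age) // 10
}
```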

Here is the wrong way to define SetAge. Let’s see what happens.

func (p Person) SetAge(age int) {
     p.age = age
}

func main() {
     p := Person{}
     p.SetAge(10)
     fmt.Printf("Age: %v", p.Age()) // Age: 0
}

Notice that the output is 0 instead of 10? This is ‘pass by value’ in action.

Calling p.SetAge(10) passes a copy of p to the SetAge function. SetAge sets the age field on the copy of p that it received, and that copy is discarded when the function returns.

Now let’s do it the right way.

func (p *Person) SetAge(age int) {
     p.age = age
}

func main() {
     p := Person{}
     p.SetAge(10)
     fmt.Printf("Age: %v", p.Age()) // Age: 10
}

My rule of thumb is this: declare the method on the pointer unless your struct is such that you don’t use pointers to it.

Two reasons:

  1. Performance. Calling a method on a pointer will almost always be faster than copying the value. There may be cases where the copy is faster but those are edge cases.
  2. Consistency. It is common for at least one of your methods to need a pointer receiver and if any of the type’s methods are on the pointer then they all should be. This recommendation is direct from the FAQ.

Read the FAQ “Should I define methods on values or pointers?” for more insight.

Update: Thanks to the fine folks on reddit for suggesting some improvements.
Join the discussion on reddit!

Update 2: Here are even more rules of thumb to help you choose whether to use a value or pointer receiver.

⦿

Go 101: Constructors and Overloads

Go doesn’t have constructors in the traditional sense. The convention is to make the zero value useful whenever possible.

type Person struct {
     Age int
}

// These are equivalent.
// `p1` and `p2` are initialized to the zero value of Person.
// Neither of these is nil.
var p1 Person // type Person
p2 := Person{} // type Person

// You could also use `new` to allocate which returns a pointer
p3 := new(Person) // type *Person

It is most common to use the struct initializer, e.g., p := Person{} or p := &Person{} if you need the pointer.
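
To illustrate what a "useful zero value" means, here is a small sketch. bytes.Buffer is a standard library type that works without initialization. Counter is a hypothetical type of my own that follows the same idea by allocating its map lazily.

```go
package main

import (
	"bytes"
	"fmt"
)

// Counter is a hypothetical type whose zero value is ready to use.
type Counter struct {
	counts map[string]int
}

// Add increments the count for key and returns the new count.
// The map is allocated on first use, so no constructor is needed.
func (c *Counter) Add(key string) int {
	if c.counts == nil {
		c.counts = make(map[string]int)
	}
	c.counts[key]++
	return c.counts[key]
}

func main() {
	var buf bytes.Buffer // the zero value is an empty, usable buffer
	buf.WriteString("hello")
	fmt.Println(buf.String()) // hello

	var c Counter // no NewCounter required
	c.Add("go")
	fmt.Println(c.Add("go")) // 2
}
```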

Sometimes you want special initialization logic. If your type is named Person then the convention would be to create a function named NewPerson that returns a pointer to an initialized Person type.

func NewPerson(age int) *Person {
     p := Person{age}
     return &p
}

myPerson := NewPerson(10) // type *Person

Multiple constructors can be implemented by having multiple initializer functions. Go doesn’t support function overloads so you will need to name your functions intelligently.

import "time"

func NewPersonAge(age int) *Person {
     p := Person{age}
     return &p
}

func NewPersonBirthYear(birthYear int) *Person {
     p := Person{time.Now().Year() - birthYear}
     return &p
}

Read more in Effective Go.

Update: Thanks to Joe Shaw for the comments! I’ve updated the article with his suggestions.

⦿

Go and Package Versioning

I’ve been working with Go on personal projects for several months. During that time, Go’s intended strategy for package management and versioning has largely eluded me. My understanding really started to come together over the past week, which is why I wrote about my Go Project Structure and am now writing about package versioning.

My goal with this post is to help others who, like myself, have had a hard time reconciling Go’s approach to package dependencies and versioning with that of other languages.

Import Paths and go get

Other languages, particularly recent ones, tend to have a package manager that the community has adopted as the de facto standard. Ruby has gems. Node has npm. .NET has NuGet. Perl has CPAN. Haskell has Hackage. All of these have something in common: they are built around a central package hosting repository.

Go’s package manager is the go get command and it is completely decentralized.

How does this work?

When you reference a package in your Go source code you use an import path that usually looks like a URL, e.g., github.com/jpoehls/gophermail. When you go build your code, the Go tool uses this path to find the package in your GOPATH. If it can’t find the package, the build fails. So how do you pull down the package?

  1. Manually. You can git clone the package into your GOPATH.
  2. Use go get. This is where the URL import path convention is leveraged. go get simply treats the import path as a URL and attempts to retrieve it via HTTP or HTTPS. It is smart enough to handle Git, Mercurial, Bazaar, and Subversion. Go has special support for common hosts like GitHub and Bitbucket, but you can use it with any host and even custom URLs.

Finding Packages

As a Baby Gopher you may find it hard to locate the packages you need, e.g., “What’s a good image resizing package?” You feel lost without a central repository to search.

Sites like GoDoc.org and Go-Search.org fill the role of a central repository in this regard. They don’t host the packages, but they index packages stored on various hosting sites such as GitHub and Bitbucket. They even include packages from your custom servers.

So to answer the question, “what’s a good image resizing package”, all you have to do is a simple search.

GoDoc.org was recently adopted into the Go project which is very, very cool.

Package Versioning

go get is the Go package manager. We’ve seen how it works in a completely decentralized way and how package discovery is still possible without a central package hosting repository.

Besides locating and downloading packages, the other big role of a package manager is handling multiple versions of the same package. Go takes the most minimal and pragmatic approach of any package manager. There is no such thing as multiple versions of a Go package.

go get always pulls from the HEAD of the default branch in the repository. Always.

(Strictly speaking, this isn’t always true. If your repository has certain special branches then go get will pull from them instead. Specifically, if you have a go1 branch and are running Go 1+ it will pull from that branch.)

This has two important implications.

  1. As a package author, you must adhere to the stable HEAD philosophy. Your default branch must always be the stable, released version of your package. You must do work in feature branches and only merge when ready to release.
  2. New major versions of your package must have their own repository. Put simply, each major version of your package (following semantic versioning) would have its own repository and thus its own import path, e.g., github.com/jpoehls/gophermail-v1 and github.com/jpoehls/gophermail-v2.

It’s that simple.

Another way to phrase this is that each import path must point to a stable API. Major version bumps are for backwards incompatible API changes and thus each major version is a distinct stable API. In the Go world this warrants a distinct import path and because go get ties import paths to VCS repositories, it means a distinct repository.

This is exactly the explanation I wish was in the Go docs. They kind of dance around the idea but it isn’t stated as clearly as it could be IMO. This is the epiphany that I needed to really understand how Go works.

Version 2.0? New Repository.

As someone building an application in Go, the above philosophy really doesn’t have a downside. Every import path is a stable API. There are no version numbers to worry about. Awesome!

I think this must have been the Go authors’ focus when designing go get. It is a solid and simple design for Go applications. I love it. It’s only frustrating from the package authoring point of view, which I’ll elaborate on below.

The frustration lies with the authoring of packages. Putting different versions of your package in separate repositories is simply not the standard workflow. The standard flow is to maintain tags or branches for each version of your package.

As a package author, you think about your repository as the top level of your project and the ecosystem supports this (think GitHub). I’ll use gophermail as an example. I can tag bugs with version numbers and organize them into release milestones, moving features from V1 to V2, etc. I might use GitHub Pages to serve a website for the project and host the documentation.

Go shatters this world view by forcing you to break your project into multiple repositories as soon as you want a version 2.0. The level of abstraction is simply moved from the branch level to the repository level and the ecosystem doesn’t cater to this methodology.

The biggest hurdle is just understanding that this is the case. That this is how Go works.

The downside is all in code organization preference. The industry standard is to use tags and branches for marking multiple versions. Not separate repositories. go get thinks of branches as being used solely to target different versions of Go (i.e. the special go1 branch) and, honestly, I think that’s a pretty ugly hack on their part.

Multiple Versions in a Single Repository

There is a workaround that will allow you to keep multiple versions of your package in the same repository and use branches/tags for differentiating between them.

go get supports custom URLs and you can use this to insert a version number into your package’s import path. Granted, this is non-trivial. It means writing and hosting a proxy service that parses the URL and proxies the requests to the applicable branch/tag of your repository.

Fortunately, someone has already done the hard work for us. GoPkg.in does exactly what I’ve described.

I’m taking advantage of this for my gophermail package. All it means is that instead of people using github.com/jpoehls/gophermail to import my package, they use gopkg.in/jpoehls/gophermail.v0. The .v0 is because gophermail hasn’t reached 1.0 yet. When I release 1.0 and declare a stable API, the import path will change to gopkg.in/jpoehls/gophermail.v1.
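
The mapping between the two import paths is mechanical. Here is a rough sketch of it, using a hypothetical gopkgPath helper of my own (it is not part of GoPkg.in or the go tool) that only handles the github.com/USER/PKG form.

```go
package main

import (
	"fmt"
	"strings"
)

// gopkgPath is a hypothetical helper that turns a GitHub import path and a
// major version into the corresponding gopkg.in import path.
func gopkgPath(githubPath string, major int) (string, error) {
	parts := strings.Split(githubPath, "/")
	if len(parts) != 3 || parts[0] != "github.com" || parts[1] == "" || parts[2] == "" {
		return "", fmt.Errorf("expected github.com/USER/PKG, got %q", githubPath)
	}
	return fmt.Sprintf("gopkg.in/%s/%s.v%d", parts[1], parts[2], major), nil
}

func main() {
	for _, major := range []int{0, 1} {
		path, err := gopkgPath("github.com/jpoehls/gophermail", major)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(path)
	}
}
```

This prints gopkg.in/jpoehls/gophermail.v0 followed by gopkg.in/jpoehls/gophermail.v1, which are the paths users would put in their import statements.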

GoPkg.in is a fantastic service and I encourage anyone with a Go package to use it. All you have to do is tell your users to use the gopkg.in import path for your package. If this gets wide enough attention, hopefully similar functionality will be adopted into go get itself.

My dream is for go get to support a version number component in the import path. So github.com/jpoehls/gophermail would fetch the HEAD of the repository as it does today. github.com/jpoehls/gophermail#v1 would fetch the v1 branch or tag.

⦿

Go Project Structure and Dependencies

Go is built around the concept of a GOPATH, which is a common workspace for most (or all) of your Go source code. This works well but sometimes you don’t want the third party packages you depend on to be in your primary GOPATH. I can think of a few reasons for this:

• You have multiple Go apps and each depends on a different version of the same third party package. Ideally these packages would use a versioned URL so that the import paths are different but this isn’t always the case. (See http://gopkg.in, a great tool for package authors to use for providing versioned URLs.)

If you want to know more about how package versioning works in Go, you should read my other post: Go and Package Versioning

• You want to commit your third party packages as part of your application’s code so that everything needed for a build is in one place. This guarantees that your app will always build regardless of the state of the third party repo and eliminates a lot of headaches when working with a build server or multiple developers.

This is equivalent to committing your NuGet ./packages folder in .NET or your npm ./node_modules folder in Node.

I looked at many Go projects to find a solution I liked. There are a lot of approaches to this problem including many attempts at a package manager for Go. The community is very much in flux around this topic.

Personally, I felt any package manager tool was overkill for my needs. I was also discouraged by the fact that the community hasn’t rallied behind any single package management solution.

In the end, this is my recipe for Go application structure.

  • It uses all the standard go get, go build, and go test commands.
  • It doesn’t involve copying your source code into a temporary folder during the build.
  • It doesn’t require you to rewrite the import paths of third party packages.
  • It modifies your GOPATH environment variable in a very simple and isolated way. This is how we prioritize our _vendor directory so that third party packages are pulled from there instead of our primary GOPATH.
  • It wraps common commands in a Makefile. This can easily be replaced by a batch file on Windows or a make.go file that you can run with go run make.go.

Camlistore uses the make.go approach (see here) and I really like its cross platform consistency. I may switch to it at some point.

Solution:

For context, my system GOPATH is ~/mygo.

My project is at ~/mygo/src/bitbucket.org/USERNAME/PROJECT.

~/mygo/src/bitbucket.org/USERNAME/PROJECT
|-- .gitignore
|-- README
|-- Makefile
|-- _vendor # Third party packages. This is a typical GOPATH workspace.
|   `-- src
|       `-- github.com
|           |-- codegangsta
|           |   |-- inject
|           |   `-- martini
|           `-- jpoehls
|               `-- gophermail
|-- bin  # My app's binary output.
|-- docs # My app's documentation.
`-- src  # My app's source code.
    |-- main_app
    |-- some_pkg
    `-- some_other_pkg

My Makefile looks like this:

.PHONY: build doc fmt lint run test vendor_clean vendor_get vendor_update vet

# Prepend our _vendor directory to the system GOPATH
# so that import path resolution will prioritize
# our third party snapshots.
GOPATH := ${PWD}/_vendor:${GOPATH}
export GOPATH

default: build

build: vet
	go build -v -o ./bin/main_app ./src/main_app

doc:
	godoc -http=:6060 -index

# http://golang.org/cmd/go/#hdr-Run_gofmt_on_package_sources
fmt:
	go fmt ./src/...

# https://github.com/golang/lint
# go get github.com/golang/lint/golint
lint:
	golint ./src

run: build
	./bin/main_app

test:
	go test ./src/...

vendor_clean:
	rm -dRf ./_vendor/src

# We have to set GOPATH to just the _vendor
# directory to ensure that `go get` doesn't
# update packages in our primary GOPATH instead.
# This will happen if you already have the package
# installed in GOPATH since `go get` will use
# that existing location as the destination.
vendor_get: vendor_clean
	GOPATH=${PWD}/_vendor go get -d -u -v \
	github.com/jpoehls/gophermail \
	github.com/codegangsta/martini

vendor_update: vendor_get
	rm -rf `find ./_vendor/src -type d -name .git` \
	&& rm -rf `find ./_vendor/src -type d -name .hg` \
	&& rm -rf `find ./_vendor/src -type d -name .bzr` \
	&& rm -rf `find ./_vendor/src -type d -name .svn`

# http://godoc.org/code.google.com/p/go.tools/cmd/vet
# go get code.google.com/p/go.tools/cmd/vet
vet:
	go vet ./src/...

Notes:

  • The standard go get, go build, and go test commands are used. All we do is prepend our _vendor workspace to the GOPATH to prioritize our snapshot of the third party packages during import path resolution.
  • The vendor_update task in the Makefile is a shotgun approach. It will update all of your dependencies to their latest version. 80% of the time this is what I want. In the other 20% of cases, I simply update the individual packages manually.
  • The _vendor directory can be called anything you like but the underscore prefix is important. The go tool will ignore any files or directories that begin with . or _ (read more). This means we can run things like go test in our repo root without it picking up those third party packages.
  • I’ve included some useful ancillary tasks as well, such as the doc, fmt, lint, and vet tasks. The build task also runs go vet first.

How do you structure your Go projects? What does your build script look like? I’m always interested in improving my methods, so let me know what’s working for you!

⦿

Creating an ICO icon file for your Windows app

Every Windows app needs an icon. Even yours.

If you are a developer then this task probably sounds simpler than it may turn out to be. First off, a lot of graphics programs (Paint.NET, for example) don’t support saving as an ICO file, the format needed by Windows.

Don’t sweat. I’ve got you covered. Here is the developer cheat sheet for creating an app icon file.

Pro tip: ICO files can contain multiple icons. Often they will contain multiple versions of the same icon in different sizes. For example, a 64px and a 16px icon, each optimized for its respective size.

Wikipedia has a lot more interesting info on the ICO format.

  1. Create your icon. Make it square. Make it high res. Remember, you can always size down but it is harder to size up. I like to go with a minimum of 256px.
  2. Export your icon to PNG files at multiple sizes. My minimum recommendation would be the following: 16px, 32px, 48px, 64px, 96px.
  3. Use PNGGauntlet or PNGOut to compress those PNGs. You won’t lose quality but you will shed a lot of wasted bytes.
    Screenshot of PNGGauntlet
  4. Use Icon Maker to create your ICO file. Windows supports a single ICO file with multiple sizes embedded and that’s what we want to do. Create a single ICO file with each of your specifically sized PNGs in it. You could also just drag in the largest PNG (96px for example) and trust Windows to resize it nicely.
    Screenshot of Icon Maker
  5. Save out the ICO file and use it in your application.

That’s it! The biggest trick is knowing the tools. Icon Maker is a life saver. You should also understand what size icon(s) to use so that your app is well represented throughout the Windows UI. There is a good thread on StackOverflow with more details on this.

I’m not an expert in this area so I’m sure there are things I’m missing to further optimize your icon. Add any of your own tips in the comments!

⦿

Detecting ZIP files with PowerShell

Have you heard of magic numbers? Some file formats are designed such that files are always saved with a specific byte sequence in the header. JPEG, PDF, and ZIP are all such formats.

You could look for ZIP files by searching for all files with a .zip extension, but a better way is to look for files that have 50 4b 03 04 as their first 4 bytes. Ordinary (non-empty, non-spanned) ZIP files start with those bytes, and not all ZIP files have the .zip extension.

Here’s a Test-ZipFile PowerShell cmdlet that returns true or false depending on whether the specified file has this magic header. It is also a good example of accepting -Path and -LiteralPath arguments in an idiomatic way, including wildcard support for -Path.

View on GitHub →

function Test-ZipFile
{
<#
.SYNOPSIS
    Tests for the magic ZIP file header bytes.

.DESCRIPTION
    Inspired by http://stackoverflow.com/a/1887113/31308
#>
    [CmdletBinding()]
    param(
        [Parameter(
            ParameterSetName  = "Path",
            Mandatory = $true,
            ValueFromPipeline = $true,
            ValueFromPipelineByPropertyName = $true
        )]
        [string[]]$Path,

        [Alias("PSPath")]
        [Parameter(
            ParameterSetName = "LiteralPath",
            Mandatory = $true,
            ValueFromPipeline = $true,
            ValueFromPipelineByPropertyName = $true
        )]
        [string[]]$LiteralPath
    )

    process {
        $provider = $null

        # Only expand wildcards if the -Path parameter was used.
        if ($PSCmdlet.ParameterSetName -eq "Path") {
            $filePaths = $PSCmdlet.GetResolvedProviderPathFromPSPath($Path, [ref]$provider)
        }
        elseif ($PSCmdlet.ParameterSetName -eq "LiteralPath") {
            # Resolve each literal path without expanding wildcard characters.
            $filePaths = $LiteralPath | ForEach-Object { $PSCmdlet.GetUnresolvedProviderPathFromPSPath($_) }
        }

        foreach ($filePath in $filePaths) {
            $isZip = $false
            try {
                $stream = New-Object System.IO.StreamReader -ArgumentList @($filePath)
                $reader = New-Object System.IO.BinaryReader -ArgumentList @($stream.BaseStream)
                $bytes = $reader.ReadBytes(4)
                if ($bytes.Length -eq 4) {
                    if ($bytes[0] -eq 80 -and
                        $bytes[1] -eq 75 -and
                        $bytes[2] -eq 3 -and
                        $bytes[3] -eq 4) {
                        $isZip = $true
                    }
                }
            }
            finally {
                if ($reader) {
                    $reader.Dispose()
                }
                if ($stream) {
                    $stream.Dispose()
                }
            }

            Write-Output $isZip
        }
    }
}

Test-ZipFile is part of Poshato, my personal PowerShell module of miscellaneous goodness.

⦿